Re: [R] Post-hoc tests in MASS using glm.nb
PS I should have followed the example with one using with() for something that would often be done with attach(). Consider:

    with(polyData, {
        plot(x, y, pch = ".")
        o <- order(x)
        lines(x[o], eta[o], col = "red")
    })

I use this kind of dodge a lot, too, but now you can mostly use data= arguments on the individual functions.

Bill Venables

-----Original Message-----
From: Venables, Bill (CMIS, Dutton Park)
Sent: Wednesday, 18 May 2011 9:07 AM
To: 'Bert Gunter'; 'Peter Ehlers'
Cc: 'R list'
Subject: RE: [R] Post-hoc tests in MASS using glm.nb

[... quoted text snipped; the message appears in full below ...]

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Bert Gunter
Sent: Wednesday, 18 May 2011 12:08 AM
To: Peter Ehlers
Cc: R list
Subject: Re: [R] Post-hoc tests in MASS using glm.nb

[... quoted text snipped; the message appears in full below ...]

> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
"Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions."
-- Maimonides (1135-1204)

Bert Gunter
Genentech
Re: [R] Post-hoc tests in MASS using glm.nb
Amen to all of that, Bert. Nicely put. The Google style guide (not perfect, but a thoughtful contribution on these kinds of issues) has avoiding attach() as its very first line. See http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html

I would add, though, that not enough people seem yet to be aware of within(...), a companion of with(...) in a way, but used for modifying data frames or other kinds of list objects. It should be seen as a more flexible replacement for transform() (well, almost).

The difference between with() and within() is as follows:

with(data, expr, ...) allows you to evaluate 'expr' with 'data' providing the primary source for variables, and returns *the evaluated expression* as the result.

By contrast, within(data, expr, ...) again uses 'data' as the primary source for variables when evaluating 'expr', but now 'expr' is used to modify the variables in 'data', and it returns *the modified data set* as the result.

I use this a lot in the data preparation phase of a project especially, which is usually the longest, trickiest, most important, but least discussed aspect of any data analysis project. Here is a simple example using within() for something you cannot do in one step with transform():

    polyData <- within(data.frame(x = runif(500)), {
        x2 <- x^2
        x3 <- x * x2
        b <- runif(4)
        eta <- cbind(1, x, x2, x3) %*% b
        y <- eta + rnorm(x, sd = 0.5)
        rm(b)
    })

Check:

    > str(polyData)
    'data.frame':   500 obs. of 5 variables:
     $ x  : num 0.5185 0.185 0.5566 0.2467 0.0178 ...
     $ y  : num [1:500, 1] 1.343 0.888 0.583 0.187 0.855 ...
     $ eta: num [1:500, 1] 1.258 0.788 1.331 0.856 0.63 ...
     $ x3 : num 1.39e-01 6.33e-03 1.72e-01 1.50e-02 5.60e-06 ...
     $ x2 : num 0.268811 0.034224 0.309802 0.060844 0.000315 ...

Bill Venables
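Bill's "one step" remark is worth spelling out: the arguments of a single transform() call are all evaluated against the original data frame, so a column created in the same call is not visible to later arguments, and there is no way to drop a temporary object such as b. A sketch of the transform() route for comparison (an editorial illustration, same construction as the within() example above):

```r
## This fails: x2 does not exist yet when x3 = x * x2 is evaluated
## polyData <- transform(data.frame(x = runif(500)), x2 = x^2, x3 = x * x2)

## It takes one call per dependent column, and the temporary
## coefficient vector b has to live outside the data frame:
polyData <- transform(data.frame(x = runif(500)), x2 = x^2)
polyData <- transform(polyData, x3 = x * x2)
b <- runif(4)
polyData <- transform(polyData, eta = cbind(1, x, x2, x3) %*% b)
polyData <- transform(polyData, y = eta + rnorm(x, sd = 0.5))
rm(b)
```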
-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Bert Gunter
Sent: Wednesday, 18 May 2011 12:08 AM
To: Peter Ehlers
Cc: R list
Subject: Re: [R] Post-hoc tests in MASS using glm.nb

[... quoted text snipped; the message appears in full below ...]

Bert Gunter
Genentech Nonclinical Biostatistics
Re: [R] Post-hoc tests in MASS using glm.nb
Folks:

> Only if the user hasn't yet been introduced to the with() function,
> which is linked to on the ?attach page.
>
> Note also this sentence from the ?attach page:
> "attach can lead to confusion."
>
> I can't remember the last time I needed attach().
>
> Peter Ehlers

Yes. But perhaps it might be useful to flesh this out with a bit of commentary. To this end, I invite others to correct or clarify the following.

The potential "confusion" comes from requiring R to search for the data. There is a rigorous process by which this is done, of course, but it requires that the runtime environment be consistent with that process, and the programmer who wrote the code may not have control over that environment. The usual example is that one has an object named, say, "a" in the formula and in the attached data, and another "a" also in the global environment. Then the wrong "a" would be found. The same thing can happen if another data set gets attached in a position before the one of interest. (Like Peter, I haven't used attach() in so long that I don't know whether any warning messages are issued in such cases.)

Using the "data =" argument when available, or the with() function when not, avoids this potential confusion and tightly couples the data to be analyzed with the analysis.

I hope this clarifies the previous posters' comments.

Cheers,
Bert

> [... non-germane material snipped ...]

Bert Gunter
Genentech Nonclinical Biostatistics
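Bert's masking scenario is easy to reproduce; a minimal sketch (object names are illustrative). By default attach() inserts the data frame at position 2 of the search path, behind the global environment, so a global object of the same name wins:

```r
dat <- data.frame(a = 1:5, y = 2 * (1:5))
a <- 100                 # a global object that masks the attached column

attach(dat)              # R warns that 'a' is masked by .GlobalEnv
mean(a)                  # gives 100 -- the global 'a', not dat$a (whose mean is 3)
detach(dat)

## data= / with() remove the ambiguity:
with(dat, mean(a))       # gives 3
lm(y ~ a, data = dat)    # uses dat$a
```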
Re: [R] Post-hoc tests in MASS using glm.nb
On 2011-05-17 02:22, Timothy Bates wrote:

> Dear Bryony: the suggestion was not to change the name of the data object, but to explicitly tell glm.nb what dataset it should look in to find the variables you mention in the formula. So the salient difference is:
>
>     m1 <- glm.nb(Cells ~ Cryogel*Day, data = side)
>
> instead of
>
>     attach(side)
>     m1 <- glm.nb(Cells ~ Cryogel*Day)
>
> This works for other functions also, but not uniformly as yet. How I wish it did, and I could say
>
>     hist(x, data = side)
>
> instead of
>
>     hist(side$x)
>
> This inconsistency encourages the need for attach().

Only if the user hasn't yet been introduced to the with() function, which is linked to on the ?attach page.

Note also this sentence from the ?attach page: "attach can lead to confusion."

I can't remember the last time I needed attach().

Peter Ehlers

[... non-germane material snipped ...]
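The with() alternative Peter points to handles Tim's hist() wish without attach(); a minimal sketch (the data frame `side` here is a simulated stand-in for the poster's Side.txt):

```r
## Stand-in for: side <- read.table("Side.txt", header = TRUE)
side <- data.frame(x = rnorm(100))

with(side, hist(x))   # equivalent to hist(side$x), no attach() required
```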
Re: [R] Post-hoc tests in MASS using glm.nb
On May 17, 2011, at 4:09 AM, Bryony Tolhurst wrote:

> Dear Bill
>
> Many thanks. I will try this.
>
> One question: why is the attach() function problematic? I have always done it that way (well, in my very limited R-using capacity!) as dictated by 'Statistics, an Introduction using R' by Michael Crawley.

You should pick your dictators with more care.

> My dataset is called Side ('Side.txt'), so how do I import this data without using attach(data)?

The data object is there as soon as you execute the read.table function successfully.

> I have tried:
>
> side <- read.table('Side.txt', T)

# NOT attach(side)

The regression functions in R generally have a data argument, so you would use this (as Bill already told you):

    Model1 <- glm.nb(Cells ~ Cryogel*Day, data = side)

> instead of:
>
> data <- read.table('Side.txt', T)
> attach(data)
>
> But obviously I am still using the attach function, if not with 'data'!!

Right. There were two problems and you only addressed one of them.

-- David.

> Thanks again
>
> Bryony Tolhurst
>
> -----Original Message-----
> From: bill.venab...@csiro.au [mailto:bill.venab...@csiro.au]
> Sent: 17 May 2011 03:21
> To: Bryony Tolhurst; r-help@r-project.org
> Subject: RE: [R] Post-hoc tests in MASS using glm.nb
>
> [... quoted text snipped; the message appears in full below ...]
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of bryony
> Sent: Tuesday, 17 May 2011 3:46 AM
> To: r-help@r-project.org
> Subject: [R] Post-hoc tests in MASS using glm.nb
>
> [... quoted text snipped; the original post appears in full below ...]
Re: [R] Post-hoc tests in MASS using glm.nb
Dear Bryony: the suggestion was not to change the name of the data object, but to explicitly tell glm.nb what dataset it should look in to find the variables you mention in the formula. So the salient difference is:

    m1 <- glm.nb(Cells ~ Cryogel*Day, data = side)

instead of

    attach(side)
    m1 <- glm.nb(Cells ~ Cryogel*Day)

This works for other functions also, but not uniformly as yet. How I wish it did, and I could say

    hist(x, data = side)

instead of

    hist(side$x)

This inconsistency encourages the need for attach().

best, tim

On 17 May 2011, at 9:09 AM, Bryony Tolhurst wrote:

> [... quoted text snipped; the message appears in full below ...]
Re: [R] Post-hoc tests in MASS using glm.nb
Dear Bill

Many thanks. I will try this.

One question: why is the attach() function problematic? I have always done it that way (well, in my very limited R-using capacity!) as dictated by 'Statistics, an Introduction using R' by Michael Crawley. My dataset is called Side ('Side.txt'), so how do I import this data without using attach(data)? I have tried:

    side <- read.table('Side.txt', T)
    attach(side)

instead of:

    data <- read.table('Side.txt', T)
    attach(data)

But obviously I am still using the attach function, if not with 'data'!!

Thanks again

Bryony Tolhurst

-----Original Message-----
From: bill.venab...@csiro.au [mailto:bill.venab...@csiro.au]
Sent: 17 May 2011 03:21
To: Bryony Tolhurst; r-help@r-project.org
Subject: RE: [R] Post-hoc tests in MASS using glm.nb

[... quoted text snipped; the message appears in full below ...]
Re: [R] Post-hoc tests in MASS using glm.nb
?relevel

Also, you might want to fit the models as follows:

    Model1 <- glm.nb(Cells ~ Cryogel*Day, data = myData)

    myData2 <- within(myData, Cryogel <- relevel(Cryogel, ref = "2"))
    Model2 <- update(Model1, data = myData2)

&c

You should always specify the data set when you fit a model if at all possible. I would recommend you NEVER use attach() to put it on the search path (under all but the most exceptional circumstances).

You could fit your model as

    Model0 <- glm.nb(Cells ~ interaction(Cryogel, Day) - 1, data = myData)

This will give you the subclass means as the regression coefficients. You can then use vcov(Model0) to get the variance matrix and compare any two you like using directly calculated t-statistics. This is pretty straightforward as well.

Bill Venables

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of bryony
Sent: Tuesday, 17 May 2011 3:46 AM
To: r-help@r-project.org
Subject: [R] Post-hoc tests in MASS using glm.nb

[... quoted text snipped; the original post appears in full below ...]
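Bill's vcov() recipe can be sketched end-to-end; this is an editorial illustration with simulated data standing in for the poster's `myData`, and the comparison is made on the model's log link scale:

```r
library(MASS)

## Simulated stand-in for the poster's data
set.seed(42)
myData <- data.frame(
    Cryogel = factor(rep(1:4, each = 16)),
    Day     = factor(rep(c(0, 7, 14, 28), times = 16)),
    Cells   = rnbinom(64, mu = 20, size = 2)
)

## Cell-means fit: one coefficient per Cryogel x Day subclass, no intercept
Model0 <- glm.nb(Cells ~ interaction(Cryogel, Day) - 1, data = myData)

b <- coef(Model0)
V <- vcov(Model0)

## Wald comparison of any two subclass means (here coefficients 1 and 2),
## directly from the coefficient vector and variance matrix:
i <- 1; j <- 2
est <- b[i] - b[j]
se  <- sqrt(V[i, i] + V[j, j] - 2 * V[i, j])
z   <- est / se
p   <- 2 * pnorm(-abs(z))
```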
[R] Post-hoc tests in MASS using glm.nb
I am struggling to generate p values for comparisons of levels (post-hoc tests) in a glm with a negative binomial distribution.

I am trying to compare cell counts on different days as grown on different media (e.g. types of cryogel), so I have 2 explanatory variables (Day and Cryogel), which are both factors, and an over-dispersed count variable (number of cells) as the response. I know that both variables are significant, and that there is a significant interaction between them. However, I seem unable to generate multiple comparisons between the days and cryogels.

So my model is

    Model1 <- glm.nb(Cells ~ Cryogel + Day + Day:Cryogel)

The output gives me comparisons between levels of the factors relative to a reference level (Day 0 and Cryogel 1) as follows:

    Coefficients:
                   Estimate Std. Error z value Pr(>|z|)
    (Intercept)      1.2040     0.2743   4.389 1.14e-05 ***
    Day14            3.3226     0.3440   9.658  < 2e-16 ***
    Day28            3.3546     0.3440   9.752  < 2e-16 ***
    Day7             3.3638     0.3440   9.779  < 2e-16 ***
    Cryogel2         0.7097     0.3655   1.942  0.05215 .
    Cryogel3         0.7259     0.3651   1.988  0.04677 *
    Cryogel4         1.4191     0.3539   4.010 6.07e-05 ***
    Day14:Cryogel2  -0.7910     0.4689  -1.687  0.09162 .
    Day28:Cryogel2  -0.5272     0.4685  -1.125  0.26053
    Day7:Cryogel2   -1.1794     0.4694  -2.512  0.01199 *
    Day14:Cryogel3  -1.0833     0.4691  -2.309  0.02092 *
    Day28:Cryogel3   0.1735     0.4733   0.367  0.71395
    Day7:Cryogel3   -1.0907     0.4690  -2.326  0.02003 *
    Day14:Cryogel4  -1.2834     0.4655  -2.757  0.00583 **
    Day28:Cryogel4  -0.6300     0.4591  -1.372  0.16997
    Day7:Cryogel4   -1.3436     0.4596  -2.923  0.00347 **

HOWEVER, I want ALL the comparisons, e.g. Cryogel 2 versus 4, 3 versus 2, etc., on each of the days. I realise that such multiple comparisons need to be approached with care to avoid Type 1 error; however, it is easy to do this in other programmes (e.g. SPSS, Genstat) and I'm frustrated that it appears to be difficult in R. I have tried the glht (multcomp) function, but it gives me the same results.

I assume that there is some way of entering the data differently so as to tell R to use a different reference level each time and re-run the analysis for each level, but don't know what this is. Please help!

Many thanks for your input

Bryony

--
View this message in context: http://r.789695.n4.nabble.com/Post-hoc-tests-in-MASS-using-glm-nb-tp3526934p3526934.html
Sent from the R help mailing list archive at Nabble.com.
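For the record, the answers upthread can be combined into one route to all pairwise comparisons: fit Bill Venables' cell-means model on a combined factor and hand it to glht() from the multcomp package the poster already tried. This is an editorial sketch with simulated stand-in data; note that mcp() needs the combined factor to exist as a named column, not an inline interaction() call:

```r
library(MASS)
library(multcomp)

set.seed(42)
d <- data.frame(
    Cryogel = factor(rep(1:4, each = 16)),
    Day     = factor(rep(c(0, 7, 14, 28), times = 16)),
    Cells   = rnbinom(64, mu = 20, size = 2)
)
d$CryoDay <- interaction(d$Cryogel, d$Day)

## Cell-means fit on the combined factor
m <- glm.nb(Cells ~ CryoDay - 1, data = d)

## All pairwise (Tukey-style) comparisons, with multiplicity adjustment
summary(glht(m, linfct = mcp(CryoDay = "Tukey")))
```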