[R] ANCOVA
Hello, I am analyzing a data set that has possible covariates. It is also multivariate. Are there any R function which can do analysis of covariance? Thanks. John Cardinale [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ANCOVA
On Jan 6, 2007, at 8:34 AM, John Cardinale wrote: Are there any R function which can do analysis of covariance? ?lm RSiteSearch('ancova') _ Professor Michael Kubovy University of Virginia Department of Psychology USPS: P.O.Box 400400Charlottesville, VA 22904-4400 Parcels:Room 102Gilmer Hall McCormick RoadCharlottesville, VA 22903 Office:B011+1-434-982-4729 Lab:B019+1-434-982-4751 Fax:+1-434-982-4766 WWW:http://www.people.virginia.edu/~mk9y/ __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ANCOVA
On 1/6/07, Michael Kubovy [EMAIL PROTECTED] wrote: On Jan 6, 2007, at 8:34 AM, John Cardinale wrote: Are there any R function which can do analysis of covariance? ?lm RSiteSearch('ancova') Given the question, you'll probably need to find how to do an ancova with lm. Several documents in http://cran.r-project.org/other-docs.html will show you how (and why ancova is just one special case of linear model). In particular, I think Faraway's Practical regression and Anova using R has explicit chapters/sections for Ancova. Many of other standard texts on R/S do too. R. _ Professor Michael Kubovy University of Virginia Department of Psychology USPS: P.O.Box 400400Charlottesville, VA 22904-4400 Parcels:Room 102Gilmer Hall McCormick RoadCharlottesville, VA 22903 Office:B011+1-434-982-4729 Lab:B019+1-434-982-4751 Fax:+1-434-982-4766 WWW:http://www.people.virginia.edu/~mk9y/ __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ANCOVA
Please look at the help file ?ancova in the HH package. Please get HH_1.17 which was up on CRAN yesterday. The source file has propagated to the mirrors. As of a few minutes ago, the Windows binary is on cran.at.r-project.org but not yet at the mirrors. Be sure to have history recording on in the graphics window because it will be very helpful to toggle between the displays for the different models. Rich __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] ANCOVA in S-plus/R?
Dear R user: I have a question about doing ANCOVA in S-plus or R. I know that many users use lm to do the regression and check the ANCOVA. But is there a way to get the traditional Table form of the ANCOVA test through S-plus (like what we would get from SPSS or SAS)? The problem Im interested in is whether or not there is a treatment effect on some medical measurement. I will have the endpoint (E) and the baseline (B) of the measurement, together with a dummy variable (T) of what treatment the patients get (control, or new). Im thinking of using lm( (E-B) ~ B + T, data = ..) Question 1: if I get the regression results (E-B) = b0 + b1*B + b2*T and in order to check whether or not there is a treatment effect, all I need to do is to check whether the p-value associated with b2 is less than the significant level I set or theres more I need to do? Question 2: If I put anova( lm( (E-B) ~ B + T, data = ..) ), is the table I get here the one I want for an ANCOVA? My concern is that ANCOVA is supposed to work only on residual compared to ANOVA, but we are using the same command here. If this gives me the ANCOVA table, does the order of the variables in regression matter? Will anova( lm( (E-B) ~ T + B, data = ..) ) gives the same result? Question 3: if I have another dummy variable to indicate the location of hospital each patient goes to (L) and want to include it to my model, should I do lm( (E-B) ~ B + T + L, data = ..) or I need to consider the interactions? Thanks a lot! Cindy __ [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] ANCOVA in S-plus/R?
To get the ANCOVA table you need to look at the anova() function. The variables T and L must be factors to get the multi-degree-of-freedom anova tables you are looking for. Order matters. You get the same residual, but the sequential sums of squares differ. bt.aov - aov(E ~ B + T) anova(bt.aov) gives you the (t-1)df test of common intercept vs variable intercept. This is equivalent to comparing the adjusted means. tb.aov - aov(E ~ T + B) anova(tb.aov) gives you the 1df test of 0 slope (ordinary ANOVA) vs non-zero common slope. lm() and aov() are essentially the same program. The default output is different. When you bring in the blocking effect of hospital, you will most likely want that sequentially first. lbt.aov - aov(E ~ L + B + T) anova(lbt.aov) If you want to test for interaction with the hospital effect, you can use l.bt.aov - aov(E ~ L * (B + T)) anova(l.bt.aov) __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] ANCOVA, Ops.factor, singular fit???
[EMAIL PROTECTED] writes: I'm trying to perform ANCOVAs in R 1.14, on a Mac OS X, but I can't figure out ?! There's no version 1.14 of R. what I am doing wrong. Essentially, I'm testing whether a number of quantitative dental measurements (the response variables in each ANCOVA) show sexual dimorphism (the sexes are the groups) independently of the animal's size (the concomitant variable). I have attached a 13-column matrix as a data frame (so far, so good). But then I tried to do this: model-lm(ln2~sex*ln1) or this: model-lm(ln2~sex+ln1) and got this: Warning message: - not meaningful for factors in: Ops.factor(y, z$residuals) which I don't understand. (In my matrix, ln2 is the name of the second column, a response variable, and ln1 is the name of the first column, a concomitant variable. Sex is the rightmost column, indicating sex. The first 14 rows are measurements for male individuals, and the next 13 rows are measurements for female individuals.) The data output is bizarre, too--it's just so long, and everything begins with ln 11 or ln 12. How can I fix this? My best guess is that you have a data error so that ln1 and ln2 are not read as numeric variables. Nonstandard codes for missing will do that to you, for instance. -- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] ANCOVA, Ops.factor, singular fit???
On Sat, 2006-05-20 at 08:36 +0200, Peter Dalgaard wrote: [EMAIL PROTECTED] writes: I'm trying to perform ANCOVAs in R 1.14, on a Mac OS X, but I can't figure out ?! There's no version 1.14 of R. Looking at the Mac page on CRAN, I suspect that this is the GUI version number, rather than the R version number. This has confused me as well in prior posts. I don't use a Mac, so am unclear as to how this particular version number is obtained. Is there a Help - About type of menu on the Mac GUI where this value is displayed? Not sure if this would be considered user error, or if the GUI is unclear as to differing version numbering between R and the other components on Macs. what I am doing wrong. Essentially, I'm testing whether a number of quantitative dental measurements (the response variables in each ANCOVA) show sexual dimorphism (the sexes are the groups) independently of the animal's size (the concomitant variable). I have attached a 13-column matrix as a data frame This may be the problem. If the data source is a matrix, which can of course only contain a single data type, the inclusion of a gender column, which is presumably textual, will force all columns to text in the matrix and then to factors in the data frame: ln1 - 1:5 ln2 - 1:5 sex - c(Male, Female, Female, Male, Male) mat - cbind(ln1, sex, ln2) mat ln1 sex ln2 [1,] 1 Male 1 [2,] 2 Female 2 [3,] 3 Female 3 [4,] 4 Male 4 [5,] 5 Male 5 str(mat) chr [1:5, 1:3] 1 2 3 4 5 Male Female Female Male ... - attr(*, dimnames)=List of 2 ..$ : NULL ..$ : chr [1:3] ln1 sex ln2 Note that all columns in 'mat' are now character vectors. DF - as.data.frame(mat) DF ln1sex ln2 1 1 Male 1 2 2 Female 2 3 3 Female 3 4 4 Male 4 5 5 Male 5 str(DF) `data.frame': 5 obs. of 3 variables: $ ln1: Factor w/ 5 levels 1,2,3,4,..: 1 2 3 4 5 $ sex: Factor w/ 2 levels Female,Male: 2 1 1 2 2 $ ln2: Factor w/ 5 levels 1,2,3,4,..: 1 2 3 4 5 Note that all columns are factors, the default behavior of converting character vectors into a data frame. Now the model with interactions: model - lm(ln2 ~ sex * ln1, data = DF) Warning message: - not meaningful for factors in: Ops.factor(y, z$residuals) summary(model) Call: lm(formula = ln2 ~ sex * ln1, data = DF) Residuals: ALL 5 residuals are 0: no residual degrees of freedom! Coefficients: (5 not defined because of singularities) Estimate Std. Error t value Pr(|t|) (Intercept) 3 NA NA NA sexMale-2 NA NA NA ln12 -1 NA NA NA ln13 NA NA NA NA ln143 NA NA NA ln154 NA NA NA sexMale:ln12 NA NA NA NA sexMale:ln13 NA NA NA NA sexMale:ln14 NA NA NA NA sexMale:ln15 NA NA NA NA Residual standard error: NA on 0 degrees of freedom Multiple R-Squared:NA, Adjusted R-squared:NA F-statistic:NA on 4 and 0 DF, p-value: NA Warning message: ^ not meaningful for factors in: Ops.factor(r, 2) The long output that you refer to is presumably the multitude of terms and interactions at the various levels of the three factors. How did you read in the data initially? You need to be careful to maintain the data types, as Peter notes below. Reviewing the R Import/Export Manual may be helpful. HTH, Marc Schwartz (so far, so good). But then I tried to do this: model-lm(ln2~sex*ln1) or this: model-lm(ln2~sex+ln1) and got this: Warning message: - not meaningful for factors in: Ops.factor(y, z$residuals) which I don't understand. (In my matrix, ln2 is the name of the second column, a response variable, and ln1 is the name of the first column, a concomitant variable. Sex is the rightmost column, indicating sex. The first 14 rows are measurements for male individuals, and the next 13 rows are measurements for female individuals.) The data output is bizarre, too--it's just so long, and everything begins with ln 11 or ln 12. How can I fix this? My best guess is that you have a data error so that ln1 and ln2 are not read as numeric variables. Nonstandard codes for missing will do that to you, for instance. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] ANCOVA, Ops.factor, singular fit???
I'm trying to perform ANCOVAs in R 1.14, on a Mac OS X, but I can't figure out what I am doing wrong. Essentially, I'm testing whether a number of quantitative dental measurements (the response variables in each ANCOVA) show sexual dimorphism (the sexes are the groups) independently of the animal's size (the concomitant variable). I have attached a 13-column matrix as a data frame (so far, so good). But then I tried to do this: model-lm(ln2~sex*ln1) or this: model-lm(ln2~sex+ln1) and got this: Warning message: - not meaningful for factors in: Ops.factor(y, z$residuals) which I don't understand. (In my matrix, ln2 is the name of the second column, a response variable, and ln1 is the name of the first column, a concomitant variable. Sex is the rightmost column, indicating sex. The first 14 rows are measurements for male individuals, and the next 13 rows are measurements for female individuals.) The data output is bizarre, too--it's just so long, and everything begins with ln 11 or ln 12. How can I fix this? I have another question: If possible I would like to use a robust fit algorithm to fit the data. When I attempt to do this, substituting rlm() for lm(), the program returns another message: Error in rlm.default(x, y, weights, method = method, wt.method = wt.method, : 'x' is singular: singular fits are not implemented in rlm What is a singular fit and why is it, apparently, undesirable? I am new to R and any help would be greatly appreciated. Thanks so much, Rebecca Sealfon __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] ANCOVA with random factor
I don't understand: I just ran the first example in the aov help page, and it produced F ratios and p values. If this does not answer your question, I suggest you read Pinheiro and Bates (2000) Mixed-Effects Models in S and S-Plus (Springer) if you haven't already and try to use lme for what you want. If all your applications are perfectly balanced, then aov may be fine for you. However, the function aov codified routine practice for this kind of problem as of ~1980. However, there have been substantive developments in this area since then, and Pinheiro and Bates is the best reference I know on both the theory and the application -- as of 2000. hope this helps. spencer graves David Semmens wrote: I would like to know if there is a way of directly calculating the F-ratio of a random effect using the aov function. I have 2 factors in my model, population which is random and length which is the length of female fish within each population. The dependent variable is diam which is the average diameter of eggs produced by each female. At present I set up the model like this: model - aov(diam~population*length, data=data) anova(model) then using the output: ratio=MS-population/MS-interaction 1-pf(ratio, df-population,df-interaction) Is there a way of getting r to calculate the correct F-ratio for a random factor and present it in the original output? Thanks, David Semmens. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] ANCOVA with random factor
I would like to know if there is a way of directly calculating the F-ratio of a random effect using the aov function. I have 2 factors in my model, population which is random and length which is the length of female fish within each population. The dependent variable is diam which is the average diameter of eggs produced by each female. At present I set up the model like this: model - aov(diam~population*length, data=data) anova(model) then using the output: ratio=MS-population/MS-interaction 1-pf(ratio, df-population,df-interaction) Is there a way of getting r to calculate the correct F-ratio for a random factor and present it in the original output? Thanks, David Semmens. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] ANCOVA Post-hoc test
What search terms did you use? Have you considered the multcomp and multtest packages? spencer graves p.s. If you'd like more help from this list, I suggest you read the posting guide! www.R-project.org/posting-guide.html. Anecdotal evidence suggests that posts more closely aligned with this guide tend to get quicker, more useful replies. Nicolas Poulet wrote: Hello, Despite my search, I didn't find a post-hoc test for an ANCOVA. I used the functions aov() and lm() to run the ANCOVA then I tried TukeyHSD() but it didn't work (because of the covariable is a continuous variable?). Furthermore, I would like to plot the adjusted values (i.e. the values of the tested variable taking into account the covariable). Thanks for your help! N. Poulet [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Spencer Graves, PhD Senior Development Engineer PDF Solutions, Inc. 333 West San Carlos Street Suite 700 San Jose, CA 95110, USA [EMAIL PROTECTED] www.pdf.com http://www.pdf.com Tel: 408-938-4420 Fax: 408-280-7915 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] ANCOVA Post-hoc test
Hello, Despite my search, I didn't find a post-hoc test for an ANCOVA. I used the functions aov() and lm() to run the ANCOVA then I tried TukeyHSD() but it didn't work (because of the covariable is a continuous variable?). Furthermore, I would like to plot the adjusted values (i.e. the values of the tested variable taking into account the covariable). Thanks for your help! N. Poulet [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Ancova and lme use
Mon cher M. MENICACCI: It looks to me like you ultimately want to use lmer in library(lme4) [which also requires library(Matrix)]. For documentation, I suggest you start with Doug Bates (2005) Fitting Linear Mixed Models in R, R News, vol. 5/1: 27-30 (available from www.r-project.org - Newsletter). After install.packages(lme4), I suggest you read Implementation.pdf in ~R\library\lme4\doc. I also suggest you get Pinheiro and Bates (2000) Mixed-Effects Models in S and S-Plus (Springer). For me, this book was essential documentation for lme, the previous implementation of lmer. Studying that book might help you understand lmer. Also, I encourage you to use the extensive random number generation capabilities in R (including the nlme and lme4 packages) to produce simulated data like you expect to collect and try to analyze the simulated data. You should simulate both what you expect to see and the null hypothesis as well. If you encounter difficulties doing that, please submit another question to this listserve. Before submitting another post, I suggest you help yourself by reading the posting guide! www.R-project.org/posting-guide.html. Anecdotal evidence suggests that posts that are more consistent with this posting guide generally get more useful replies quicker. bon chance. spencer graves [EMAIL PROTECTED] wrote: Dear R-users, We expect to develop statistic procedures and environnement for the computational analysis of our experimental datas. To provide a proof of concept, we plan to implement a test for a given experiment. Its design split data into 10 groups (including a control one) with 2 mesures for each (ref at t0 and response at t1). We aim to compare each group response with control response (group 1) using a multiple comparison procedure (Dunnett test). Before achieving this, we have to normalize our data : response values cannot be compared if base line isn't corrected. Covariance analysis seems to represent the best way to do this. But how to perform this by using R ? Actually, we have identify some R functions of interest regarding this matter (lme(), lm() and glm()). For example we plan to do as describe : glm(response~baseline) and then simtest(response_corrected~group, type=Dunnett, ttype=logical) If a mixed model seems to better fit our experiment, we have some problems on using the lme function : lme(response~baseline) returns an error (Invalid formula for groups). So : Are fitted values represent the corrected response ? Is it relevant to perform these tests in our design ? And how to use lme in a glm like way ? If someone could bring us your its precious knowledge to validate our analytical protocol and to express its point of view on implementation strategy ? Best regards. Alexandre MENICACCI Bioinformatics - FOURNIER PHARMA 50, rue de Dijon - 21121 Daix - FRANCE [EMAIL PROTECTED] tél : 03.80.44.76.17 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Spencer Graves, PhD Senior Development Engineer PDF Solutions, Inc. 333 West San Carlos Street Suite 700 San Jose, CA 95110, USA [EMAIL PROTECTED] www.pdf.com http://www.pdf.com Tel: 408-938-4420 Fax: 408-280-7915 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Ancova and lme use
Dear R-users, We expect to develop statistic procedures and environnement for the computational analysis of our experimental datas. To provide a proof of concept, we plan to implement a test for a given experiment. Its design split data into 10 groups (including a control one) with 2 mesures for each (ref at t0 and response at t1). We aim to compare each group response with control response (group 1) using a multiple comparison procedure (Dunnett test). Before achieving this, we have to normalize our data : response values cannot be compared if base line isn't corrected. Covariance analysis seems to represent the best way to do this. But how to perform this by using R ? Actually, we have identify some R functions of interest regarding this matter (lme(), lm() and glm()). For example we plan to do as describe : glm(response~baseline) and then simtest(response_corrected~group, type=Dunnett, ttype=logical) If a mixed model seems to better fit our experiment, we have some problems on using the lme function : lme(response~baseline) returns an error (Invalid formula for groups). So : Are fitted values represent the corrected response ? Is it relevant to perform these tests in our design ? And how to use lme in a glm like way ? If someone could bring us your its precious knowledge to validate our analytical protocol and to express its point of view on implementation strategy ? Best regards. Alexandre MENICACCI Bioinformatics - FOURNIER PHARMA 50, rue de Dijon - 21121 Daix - FRANCE [EMAIL PROTECTED] tél : 03.80.44.76.17 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] ANCOVA
Dear Matt, The sequential sums of squares produced by anova() test for g ignoring x (and the interaction), x after g (and ignoring the interaction), and the x:g interaction after g and x. The second and third test are generally sensible, but the first doesn't adjust for x, which is probably not what you want in general (although you've constructed x to be independent of g, which is typically not the case). Note that you would not usually want to test for equal intercepts in the model that doesn't constrain the slopes to be equal (though you haven't done this): Among other things, 0 is outside the range of x. The Anova() function in the car package will produce so-called type-II tests. I hope that this helps. John -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Matt Oliver Sent: Friday, August 27, 2004 4:10 PM To: [EMAIL PROTECTED] Subject: [R] ANCOVA Dear R-help list, I am attempting to understand the proper formulation of ANCOVA's in R. I would like to test both parallelism and intercept equality for some data sets, so I have generated an artificial data set to ease my understanding. This is what I have done #Limits of random error added to vectors min - -0.1 max - 0.1 x - c(c(1:10), c(1:10))+runif(20, min, max) x1 - c(c(1:10), c(1:10))+runif(20, min, max) y - c(c(1:10), c(10:1))+runif(20, min, max) z - c(c(1:10), c(11:20))+runif(20, min, max) g - as.factor(c(rep(1, 10), rep(2, 10))) #x and x1 have similar slopes and have the similar intercepts, #x and y have different slopes and different intercepts #x and z have similar slopes with different intercepts #These are my full effects models fitx1x - lm(x1~g+x+x:g) fityx - lm(y~g+x+x:g) fitzx - lm(z~g+x+x:g) anova(fitx1x) Analysis of Variance Table Response: x1 Df Sum Sq Mean Sq F value Pr(F) g 1 0.002 0.002 0.3348 0.5709 x 1 163.927 163.927 23456.8319 2e-16 *** g:x1 0.002 0.002 0.2671 0.6123 Residuals 16 0.112 0.007 --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 These results confirm that x and x1 do not have significantly different means with respect to g; There is a significant linear relationship between x and x1 independent of g There is no evidence that the slopes of x and x1 in the two g groups is different anova(fityx) Analysis of Variance Table Response: y Df Sum Sq Mean Sq F value Pr(F) g 1 0.012 0.012 1.7344 0.2064 x 1 0.003 0.003 0.4399 0.5166 g:x 1 164.947 164.947 24274.4246 2e-16 *** Residuals 160.1090.007 --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 These results confirm that x and y do not have significantly different means with respect to g; There is a not a significant linear relationship between x and y independent of g There is evidence that the slopes of x and y in the two g groups is significantly different. anova(fitzx) Analysis of Variance Table Response: z DfSum Sq Mean Sq F value Pr(F) g 1 501.07501.07 52709.9073 2e-16 *** x 1 165.39165.39 17398.4057 2e-16 *** g:x10.02 0.02 1.7472 0.2048 Residuals 160.15 0.01 --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 These results confirm that x and z have significantly different means with respect to g; There is a a significant linear relationship between x and z independent of g There is no evidence that the slopes of x and z in the two g groups is significantly different. What I don't understand is how to formulate the model so that I can tell if the intercepts between the g groups are different. Also, how would I formulate an ANCOVA if I am dealing with Model II regressions? Any help would be greatly appreciated. Matt == When you reach an equilibrium in biology, you're dead. - A. Mandell == Matthew J. Oliver Institute of Marine and Coastal Sciences 71 Dudley Road, New Brunswick New Jersey, 08901 http://marine.rutgers.edu/cool/ __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
[R] ANCOVA
Dear R-help list, I am attempting to understand the proper formulation of ANCOVA's in R. I would like to test both parallelism and intercept equality for some data sets, so I have generated an artificial data set to ease my understanding. This is what I have done #Limits of random error added to vectors min - -0.1 max - 0.1 x - c(c(1:10), c(1:10))+runif(20, min, max) x1 - c(c(1:10), c(1:10))+runif(20, min, max) y - c(c(1:10), c(10:1))+runif(20, min, max) z - c(c(1:10), c(11:20))+runif(20, min, max) g - as.factor(c(rep(1, 10), rep(2, 10))) #x and x1 have similar slopes and have the similar intercepts, #x and y have different slopes and different intercepts #x and z have similar slopes with different intercepts #These are my full effects models fitx1x - lm(x1~g+x+x:g) fityx - lm(y~g+x+x:g) fitzx - lm(z~g+x+x:g) anova(fitx1x) Analysis of Variance Table Response: x1 Df Sum Sq Mean Sq F value Pr(F) g 1 0.002 0.002 0.3348 0.5709 x 1 163.927 163.927 23456.8319 2e-16 *** g:x1 0.002 0.002 0.2671 0.6123 Residuals 16 0.112 0.007 --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 These results confirm that x and x1 do not have significantly different means with respect to g; There is a significant linear relationship between x and x1 independent of g There is no evidence that the slopes of x and x1 in the two g groups is different anova(fityx) Analysis of Variance Table Response: y Df Sum Sq Mean Sq F value Pr(F) g 1 0.012 0.0121.7344 0.2064 x 1 0.003 0.0030.4399 0.5166 g:x 1 164.947 164.947 24274.4246 2e-16 *** Residuals 160.1090.007 --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 These results confirm that x and y do not have significantly different means with respect to g; There is a not a significant linear relationship between x and y independent of g There is evidence that the slopes of x and y in the two g groups is significantly different. anova(fitzx) Analysis of Variance Table Response: z DfSum Sq Mean Sq F value Pr(F) g 1 501.07501.0752709.9073 2e-16 *** x 1 165.39165.3917398.4057 2e-16 *** g:x10.02 0.02 1.7472 0.2048 Residuals 160.15 0.01 --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 These results confirm that x and z have significantly different means with respect to g; There is a a significant linear relationship between x and z independent of g There is no evidence that the slopes of x and z in the two g groups is significantly different. What I don't understand is how to formulate the model so that I can tell if the intercepts between the g groups are different. Also, how would I formulate an ANCOVA if I am dealing with Model II regressions? Any help would be greatly appreciated. Matt == When you reach an equilibrium in biology, you're dead. - A. Mandell == Matthew J. Oliver Institute of Marine and Coastal Sciences 71 Dudley Road, New Brunswick New Jersey, 08901 http://marine.rutgers.edu/cool/ __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] ANCOVA when you don't know factor levels
Hello people I am doing some thinking about how to analyse data on dimorphic animals - where different individuals of the same species have rather different morphology. An example of this is that some male beetles have large horns and small wings, and rely on beating the other guys up to get access to mates, whereas others have smaller horns and larger wings, and rely on mobility to find mates before the guys with the big horns turn up and beat them up. Normally what we do to see if this is happening is to plot a scatter plot of horn length vs body size. If it's a simple straight line relationship then no dimorphism, but if it's either a line with an obvious change of slope or two separate lines then you've got a dimorphic animal. If you've just got a change of slope then it's relatively easy to test for significance by breakpoint regression, which will also tell you the body size at which the beetles switch from 'big wings, small horns' strategy to the other. What is more problematic, however, is if you have a dataset that is essentially two linear regressions that overlap in both X and Y. Here you can't do a breakpoint regression. The solution that I have come up with is to put a straight line between the two clouds of data and to use that line to define a two-level factor and fit a model with body size, the factor and the interaction to the horn size data. The line can then be moved up and down, and the slope varied, and the fit of the model for each intercept/slope combination compared. The best fit model can then be compared with a straight linear model with an F-test, which gives you a significance test, and the line allows you to decide which morph a particular beetle conforms to. I've written some R code to do this (see below). What i would like to know is a) if this problem has been dealt with before elsewhere, b) if there's a better way to do it and c) if my R function could be improved at all - it's my first attempt at such a thing, so any input would be really helpful. If anyone wants a sample data set then I can email you one. Thanks for any input Rob Knell School of Biological Sciences Queen Mary, University of London 'Phone +44 (0)20 7882 7720 http://www.qmw.ac.uk/~ugbt794 http://www.mopane.org Here's the code: switchline- function(X,Y,Intmin,Intmax,Intgap,Slopemin,Slopemax,Slopegap) { # X = unimodal variable (e.g. filename$bodysize) # Y = bimodal variable (e.g. filename$hornsize) #Intmin = Minimum switchline intercept #Slopemin = Minimum switchline slope #Intmax = Maximum switchline intercept #Slopemax = Maximum switchline slope #Intgap = Interval between each intercept #Slopegap = Interval between each slope # This function written by Rob Knell 2003. It is an attempt to produce an #objective analysis for datasets of apparently dimorphic characters in which # there is overlap in both the X and Y variable. The routine divides the #individuals into one of two classes based on whether they are above or below # a line, and fits a linear model to the Y variable with the X variable as a #continuous explanatory variable and a single factor, Morph, which is based # on whether they were above the line or below it, plus an interaction term. # The line is then adjusted, the individuals reclassified and the process #repeated. The output is of the form Int - intercept of the line dividing # one morph from another, Slp - the slope and Rsqr, which gives the R- #squared value for the fitted model when the division into factors is based on # that particular line. # An error message probably means that one of your lines has missed the data #altogether - in other words, all of your data points are classified into a #single factor. # Lines.grid- expand.grid(Intercept=seq(Intmin,Intmax,Intgap),Slope=seq(Slopemin,Slope max,Slopegap)) bandwidth-length(Lines.grid$Intercept) Lines-matrix(nrow=bandwidth,ncol=3,data=NA) dimnames(Lines)-list(seq(1,bandwidth),c(Int,Slp,Rsqr)) Lines[,1]-Lines.grid$Intercept Lines[,2]-Lines.grid$Slope for (i in 1:bandwidth) { temp.data-matrix(nrow=length(X),ncol=4,data=NA) colnames(temp.data)-c(X1,Y1,switchvalue,Morph) temp.data[,1]-X temp.data[,2]-Y for(j in 1:length(X)) { temp.data[j,3]-(Lines[i,1]+Lines[i,2]*temp.data[j,1]) ifelse(temp.data[j,2]=temp.data[j,3], temp.data[j,4]-1, temp.data[j,4]-2) } temp.data-data.frame(temp.data) temp.data$Morph-factor(temp.data$Morph) model-lm(temp.data$Y1~temp.data$X1*temp.data$Morph) Lines[i,3]-round(summary(model)$r.squared,6) } print(Lines) Lines-data.frame(Lines) Max-max(Lines$Rsqr) MaxI-Lines$Int[Lines$Rsqr==Max] MaxS-Lines$Slp[Lines$Rsqr==Max]
RE: [R] ANCOVA when you don't know factor levels
This sounds like a good case for mixture regression, for which there's Fritz Leisch's `flexmix' package. However, I don't think flexmix has facility for testing whether the mixture has one vs. two components. Others on the list surely would know more than I do. HTH, Andy From: Rob Knell Hello people I am doing some thinking about how to analyse data on dimorphic animals - where different individuals of the same species have rather different morphology. An example of this is that some male beetles have large horns and small wings, and rely on beating the other guys up to get access to mates, whereas others have smaller horns and larger wings, and rely on mobility to find mates before the guys with the big horns turn up and beat them up. Normally what we do to see if this is happening is to plot a scatter plot of horn length vs body size. If it's a simple straight line relationship then no dimorphism, but if it's either a line with an obvious change of slope or two separate lines then you've got a dimorphic animal. If you've just got a change of slope then it's relatively easy to test for significance by breakpoint regression, which will also tell you the body size at which the beetles switch from 'big wings, small horns' strategy to the other. What is more problematic, however, is if you have a dataset that is essentially two linear regressions that overlap in both X and Y. Here you can't do a breakpoint regression. The solution that I have come up with is to put a straight line between the two clouds of data and to use that line to define a two-level factor and fit a model with body size, the factor and the interaction to the horn size data. The line can then be moved up and down, and the slope varied, and the fit of the model for each intercept/slope combination compared. The best fit model can then be compared with a straight linear model with an F-test, which gives you a significance test, and the line allows you to decide which morph a particular beetle conforms to. I've written some R code to do this (see below). What i would like to know is a) if this problem has been dealt with before elsewhere, b) if there's a better way to do it and c) if my R function could be improved at all - it's my first attempt at such a thing, so any input would be really helpful. If anyone wants a sample data set then I can email you one. Thanks for any input Rob Knell School of Biological Sciences Queen Mary, University of London 'Phone +44 (0)20 7882 7720 http://www.qmw.ac.uk/~ugbt794 http://www.mopane.org Here's the code: switchline- function(X,Y,Intmin,Intmax,Intgap,Slopemin,Slopemax,Slopegap) { # X = unimodal variable (e.g. filename$bodysize) # Y = bimodal variable (e.g. filename$hornsize) #Intmin = Minimum switchline intercept #Slopemin = Minimum switchline slope #Intmax = Maximum switchline intercept #Slopemax = Maximum switchline slope #Intgap = Interval between each intercept #Slopegap = Interval between each slope # This function written by Rob Knell 2003. It is an attempt to produce an #objective analysis for datasets of apparently dimorphic characters in which # there is overlap in both the X and Y variable. The routine divides the #individuals into one of two classes based on whether they are above or below # a line, and fits a linear model to the Y variable with the X variable as a #continuous explanatory variable and a single factor, Morph, which is based # on whether they were above the line or below it, plus an interaction term. # The line is then adjusted, the individuals reclassified and the process #repeated. The output is of the form Int - intercept of the line dividing # one morph from another, Slp - the slope and Rsqr, which gives the R- #squared value for the fitted model when the division into factors is based on # that particular line. # An error message probably means that one of your lines has missed the data #altogether - in other words, all of your data points are classified into a #single factor. # Lines.grid- expand.grid(Intercept=seq(Intmin,Intmax,Intgap),Slope=seq(Slop emin,Slope max,Slopegap)) bandwidth-length(Lines.grid$Intercept) Lines-matrix(nrow=bandwidth,ncol=3,data=NA) dimnames(Lines)-list(seq(1,bandwidth),c(Int,Slp,Rsqr)) Lines[,1]-Lines.grid$Intercept Lines[,2]-Lines.grid$Slope for (i in 1:bandwidth) { temp.data-matrix(nrow=length(X),ncol=4,data=NA) colnames(temp.data)-c(X1,Y1,switchvalue,Morph) temp.data[,1]-X temp.data[,2]-Y for(j in