Re: [R-sig-eco] Regression with few observations per factor level
On 23/10/2014, at 18:17 PM, Gavin Simpson wrote:

> On 22 October 2014 17:24, Chris Howden ch...@trickysolutions.com.au wrote:
>> A good place to start is by looking at your residuals to determine whether
>> the normality assumptions are being met; if not, then some form of GLM that
>> correctly models the residuals, or a non-parametric method, should be used.
>
> Doing that could be very tricky indeed; I defy anyone, without knowledge of
> how the data were generated, to detect departures from normality in such a
> small data set. Try qqnorm(rnorm(4)) a few times and you'll see what I mean.
>
> Second, one usually considers the distribution of the response when fitting
> a GLM, rather than deciding that residuals from an LM are non-Gaussian and
> then moving on. The decision to use a GLM should be motivated directly by
> the data and the question at hand. Perhaps sometimes we can get away with
> fitting the LM, but that usually involves some thought, in which case one
> has probably already thought about the GLM as well.

I agree completely with Gavin. If you have four data points, fit a two-parameter linear model, and in addition select a one-parameter exponential-family distribution (as implied in selecting a GLM family), you don't have many degrees of freedom left. I don't think you would get such models accepted in many journals. Forget the regression and get more data.

Some people suggested here that an acceptable model could be possible if your data points are not single observations but means of several observations. That is true: then you can proceed, but consult a statistician on the way to proceed.

Cheers, Jari Oksanen

_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
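[Editor's note: Gavin's qqnorm(rnorm(4)) suggestion can be tried directly. A minimal base-R sketch follows; the seed and the 2x2 plot layout are arbitrary choices for illustration.]

```r
## With n = 4, normal Q-Q plots are essentially uninformative:
## even truly Gaussian draws can look "non-normal". Re-run a few times.
set.seed(1)                          # arbitrary seed, for reproducibility
op <- par(mfrow = c(2, 2))           # four panels side by side
for (i in 1:4) {
  qqnorm(rnorm(4), main = paste("rnorm(4), draw", i))
}
par(op)                              # restore graphics settings
```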
Re: [R-sig-eco] Regression with few observations per factor level
I think there are actually 4 data points per level of some factor (after seeing some of the other non-threaded emails -- why can't people use email clients that preserve threads?**); but yes, either way this is a small data set, and trying to decide whether the residuals are normal is going to be nigh on impossible.

I like the suggestion someone made to actually do some simulation to work out whether you have any power to detect an effect of a given size. It seems pointless doing the analysis if your conclusion would be "well, I didn't detect an effect, but I have no power, so I don't even know whether I should have been able to detect an effect if one were present." You'd be no worse off then than if you hadn't run the analysis or collected the data.

G

** He says, hoping to heck that GMail preserves the threading information...

On 23 October 2014 14:00, Jari Oksanen jari.oksa...@oulu.fi wrote:
> [...]

--
Gavin Simpson, PhD
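[Editor's note: the simulation-based power check suggested above can be sketched in a few lines of base R. Everything here is a hypothetical design: 4 groups of 4 observations, group means spaced by an assumed effect size `delta`, and a one-way ANOVA F-test; adjust all of these to match the real study.]

```r
## Estimate power by simulation: generate data under an assumed effect
## size, fit the model, and count how often p < 0.05.
power_sim <- function(nsim = 1000, pg = 4, delta = 1, sd = 1) {
  trt <- factor(rep(paste0("trt", 1:4), each = pg))
  pvals <- replicate(nsim, {
    ## group means 0, delta, 2*delta, 3*delta -- an arbitrary spacing
    y <- rnorm(length(trt), mean = delta * (as.numeric(trt) - 1), sd = sd)
    anova(lm(y ~ trt))[["Pr(>F)"]][1]
  })
  mean(pvals < 0.05)   # proportion of significant runs = estimated power
}

set.seed(42)
power_sim(delta = 0)   # no true effect: should be near the 5% type I rate
power_sim(delta = 1)   # power to detect one-SD spacing with 4 per group
```

With only 4 observations per level the second number is typically sobering, which is exactly the point of running the check before the analysis.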
Re: [R-sig-eco] Regression with few observations per factor level
Why not take the opportunity to get to know ABC (Approximate Bayesian Computation) some more? Rasmus Bååth wrote a piece on "Tiny Data" and ABC which might suit your problem very well:

http://www.r-bloggers.com/tiny-data-approximate-bayesian-computation-and-the-socks-of-karl-broman/

Cheers /Lars

On 2014-10-22 08:19, V. Coudrain wrote:
>> With such a small data set, why not simulate some data sets with
>> reasonable effect sizes and see how an analysis performs? Krzysztof
>
> Dear Krzysztof, it is a good idea. Would you know some R functions that
> are well suited for this kind of simulation?
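[Editor's note: the linked post uses ABC rejection sampling. The sketch below is not from the post; it is a generic, minimal ABC sketch under invented assumptions -- a hypothetical tiny data set `obs`, a Normal(mu, 1) model, a flat prior on mu, and the sample mean as the summary statistic.]

```r
## ABC rejection sampling: draw mu from the prior, simulate data of the
## same size, and keep draws whose summary statistic lands close to the
## observed one. The kept draws approximate the posterior of mu.
set.seed(1)
obs     <- c(2.1, 3.4, 2.8, 3.0)        # hypothetical 4-point data set
n_draws <- 20000
mu_prior  <- runif(n_draws, -10, 10)    # flat prior for mu
sim_means <- vapply(mu_prior,
                    function(m) mean(rnorm(length(obs), m, 1)),
                    numeric(1))
keep      <- abs(sim_means - mean(obs)) < 0.1   # tolerance is a tuning choice
posterior <- mu_prior[keep]
quantile(posterior, c(0.025, 0.5, 0.975))       # approximate credible interval
```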
Re: [R-sig-eco] Regression with few observations per factor level
A good place to start is by looking at your residuals to determine whether the normality assumptions are being met; if not, then some form of GLM that correctly models the residuals, or a non-parametric method, should be used.

But just as important is considering how you intend to use your data and exactly what they are. Regardless of what the statistics say, if you only have 4 data points, are you really confident making broad generalisations with them? And writing a paper with your name on it? Just a couple of data points could change everything, particularly if the scale isn't bounded, so outliers can have a big impact. If the data points are some form of average, I would be more confident with only 4 of them; but if they are raw values, I would be very cautious about any conclusions you draw.

Another reason I would be cautious of a result based on only 4 data points is that the p-values may be very poorly estimated. Although not widely discussed, we often use the central limit theorem to assume parameter estimates are normally distributed when calculating the p-value (because parameters can be thought of as weighted averages, the CLT applies to them). With only 4 data points we can't invoke the magic of the CLT, and since there is no way to test whether the parameter estimates are normal, we take quite a risk in assuming we have accurate p-values at small sample sizes.

Chris Howden
Founding Partner, Tricky Solutions -- 4 Tricky Problems
Evidence Based Strategic Development, IP Commercialisation and Innovation, Data Analysis, Modelling and Training
(mobile) 0410 689 945
ch...@trickysolutions.com.au

On 22 Oct 2014, at 17:20, V. Coudrain v_coudr...@voila.fr wrote:
> [...]
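[Editor's note: Chris's CLT caution can be illustrated by simulating the sampling distribution of a regression slope under deliberately non-normal errors. The sample sizes, slope, and error distribution below are all illustrative choices, not anything from the thread.]

```r
## Sampling distribution of a slope estimate with n = 4 vs n = 100.
slope_dist <- function(n, nsim = 2000) {
  replicate(nsim, {
    x <- runif(n)
    y <- 2 * x + (rexp(n) - 1)       # skewed, mean-zero errors
    coef(lm(y ~ x))[2]               # slope estimate
  })
}
set.seed(99)
small <- slope_dist(4)
large <- slope_dist(100)
## With n = 100 the slopes look near-normal (the CLT at work); with n = 4
## they inherit the skew of the errors, so t-based p-values are suspect.
op <- par(mfrow = c(1, 2))
hist(small, main = "n = 4")
hist(large, main = "n = 100")
par(op)
```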
Re: [R-sig-eco] Regression with few observations per factor level
Dear All, please do not take any offence, but I would really like to be removed from this mailing list. Can someone let me know how this can be done?

Best Regards,

--
Nicholas Hamilton
School of Materials Science and Engineering
University of New South Wales (Australia)
www.ggtern.com

On 23 Oct 2014, at 10:24 am, Chris Howden ch...@trickysolutions.com.au wrote:
> [...]
Re: [R-sig-eco] Regression with few observations per factor level
With such a small data set, why not simulate some data sets with reasonable effect sizes and see how an analysis performs?

Krzysztof

On Mon, Oct 20, 2014 at 11:53 AM, V. Coudrain v_coudr...@voila.fr wrote:
> Thank you for this helpful thought. So if I get it correctly, it is
> hopeless to try testing an interaction, but we nevertheless may assess
> whether a covariate has an impact, provided it is the same in all
> treatments.
>
> Message du 20/10/14 à 16h46, de Elgin Perry:
>> If it is reasonable to assume that the slope of the covariate is the
>> same for all treatments, and you have numerous treatments, then you can
>> do this by specifying one slope parameter for all treatments, as you
>> gave in your example (e.g. lm(var ~ trt + cov)). By combining slope
>> information over treatments, you can obtain a reasonably precise
>> estimate. With so few observations per treatment, you will not be able
>> to estimate separate slopes for each treatment with any degree of
>> precision (e.g. lm(var ~ trt + trt:cov)).
>>
>> Elgin S. Perry, Ph.D.
>> Statistics Consultant
>> 377 Resolutions Rd., Colonial Beach, Va. 22443
>> ph. 410.610.1473
>>
>>> Hi, I would like to test the impact of a treatment on some variable
>>> using regression (e.g. lm(var ~ trt + cov)). However, I only have four
>>> observations per factor level. Is it still possible to apply a
>>> regression with such a small sample size? I think it would be difficult
>>> to correctly estimate the variance. Do you think I should rather
>>> compute a non-parametric test such as Kruskal-Wallis? However, I need
>>> to include covariables in my models, and I am not sure whether basic
>>> non-parametric tests are suitable for this. Thanks for any suggestion.
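[Editor's note: Elgin's pooled-slope versus separate-slopes advice can be sketched directly with the formulas from the thread. The data below are simulated under invented parameters (4 observations per level, a common slope of 2) purely to show the contrast in precision.]

```r
## Common slope (var ~ trt + cov) vs one slope per level (var ~ trt + trt:cov).
set.seed(7)
pg <- 4                                        # observations per level
d  <- data.frame(trt = factor(rep(paste0("trt", 1:4), each = pg)),
                 cov = runif(pg * 4))
d$var <- as.numeric(d$trt) + 2 * d$cov + rnorm(nrow(d), sd = 0.5)

common   <- lm(var ~ trt + cov, data = d)      # one pooled slope: estimable
separate <- lm(var ~ trt + trt:cov, data = d)  # four slopes: very noisy

summary(common)$coefficients["cov", ]          # pooled slope and its SE
summary(separate)$coefficients                 # per-level slopes, larger SEs
```

The pooled model borrows slope information across all 16 points; the interaction model must estimate each slope from 4 points, which is where the precision collapses.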
Re: [R-sig-eco] Regression with few observations per factor level
Thank you very much. If I get it right, the CIs get wider, my test has less power, and the probability of getting a significant relation decreases. What about the significant coefficients, are they reliable?

Message du 20/10/14 à 11h30, de Roman Luštrik:
> I think you can, but the confidence intervals will be rather large due to
> the number of samples. Notice how the standard errors change as the sample
> size (per group) goes from 4 to 30.
>
> pg <- 4  # pg = per group
> my.df <- data.frame(var = c(rnorm(pg, mean = 3), rnorm(pg, mean = 1),
>                             rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
>                     trt = rep(c("trt1", "trt2", "trt3", "trt4"), each = pg),
>                     cov = runif(pg * 4))  # 4 groups
> summary(lm(var ~ trt + cov, data = my.df))
>
> Call:
> lm(formula = var ~ trt + cov, data = my.df)
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -1.63861 -0.46080  0.03332  0.66380  1.27974
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)   1.2345     1.0218   1.208    0.252
> trttrt2      -0.7759     0.8667  -0.895    0.390
> trttrt3       7.8503     0.8308   9.449  1.3e-06 ***
> trttrt4      28.2685     0.9050  31.236  4.3e-12 ***
> cov           1.4027     1.1639   1.205    0.253
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 1.154 on 11 degrees of freedom
> Multiple R-squared: 0.9932, Adjusted R-squared: 0.9908
> F-statistic: 404.4 on 4 and 11 DF, p-value: 7.467e-12
>
> pg <- 30  # pg = per group
> my.df <- data.frame(var = c(rnorm(pg, mean = 3), rnorm(pg, mean = 1),
>                             rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
>                     trt = rep(c("trt1", "trt2", "trt3", "trt4"), each = pg),
>                     cov = runif(pg * 4))  # 4 groups
> summary(lm(var ~ trt + cov, data = my.df))
>
> Call:
> lm(formula = var ~ trt + cov, data = my.df)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -2.5778 -0.6584 -0.0185  0.6423  3.2077
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  2.76961    0.25232  10.977  < 2e-16 ***
> trttrt2     -1.75490    0.28546  -6.148 1.17e-08 ***
> trttrt3      8.40521    0.28251  29.752  < 2e-16 ***
> trttrt4     27.04095    0.28286  95.599  < 2e-16 ***
> cov          0.05129    0.32523   0.158    0.875
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 1.094 on 115 degrees of freedom
> Multiple R-squared: 0.9913, Adjusted R-squared: 0.991
> F-statistic: 3269 on 4 and 115 DF, p-value: < 2.2e-16
>
> On Mon, Oct 20, 2014 at 10:53 AM, V. Coudrain wrote:
>> [...]
>
> --
> In God we trust, all others bring data.
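[Editor's note: the widening of the intervals Roman describes can be read off directly with confint(). This sketch re-simulates his design in a small helper (`make_fit` is an invented name); the exact numbers will differ from his because the data are random.]

```r
## Confidence interval width with 4 vs 30 observations per group,
## using the same simulated design as above.
set.seed(123)
make_fit <- function(pg) {
  df <- data.frame(var = c(rnorm(pg, mean = 3),  rnorm(pg, mean = 1),
                           rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
                   trt = rep(c("trt1", "trt2", "trt3", "trt4"), each = pg),
                   cov = runif(pg * 4))
  lm(var ~ trt + cov, data = df)
}
confint(make_fit(4))    # wide 95% intervals with 4 per group
confint(make_fit(30))   # much narrower with 30 per group
```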
Re: [R-sig-eco] Regression with few observations per factor level
Hi,

Coefficients and their p-values are reliable if your data are OK and you know enough about the process that generated them, so that you can choose an appropriate model. With 4 points per line, it may be really difficult to identify a bad fit or outliers.

For example: simple linear regression needs constant variance of the normal distribution from which the residuals are drawn, all along the regression line, to work properly. With 4 points you can hardly check this, but if you know enough about the process that generated the data, you are safe. If you do not know, it is not easy to say anything about the nature of that process. If you know (or can assume) that there is a simple linear relationship, you can say "the slope of this relationship is such and such"; but if you want to estimate both the nature of the relationship (A *linearly* depends on B) and its magnitude (the slope), p-values will not help you much.

Of course, I may be wrong; I am not a statistician, just a user.

Best, Martin W.

On Mon, 20 Oct 2014 at 13:37 +0200, V. Coudrain wrote:
> [...]
Re: [R-sig-eco] Regression with few observations per factor level
You are more or less performing an ANOVA/ANCOVA on your data? As pointed out earlier, all of the normal-theory regression assumptions apply. Assuming all of those are satisfied, then even if you have large confidence intervals, where there are significant differences between groups I don't see why you couldn't correctly infer something about the treatments. Maybe I am missing something.

Stephen

On Mon, Oct 20, 2014 at 8:43 AM, Martin Weiser weis...@natur.cuni.cz wrote:
> [...]
Re: [R-sig-eco] Regression with few observations per factor level
Thank you for this helpful thought. So if I get it correctly, it is hopeless to try testing an interaction, but we nevertheless may assess whether a covariate has an impact, provided it is the same in all treatments.

Message du 20/10/14 à 16h46, de Elgin Perry:
> [...]
Re: [R-sig-eco] Regression with few observations per factor level
Yes, but as I fear, the residuals behave badly as soon as the model get a little bit more complex (e.g., with two covariables or an interactions). The scope for performing an ANCOVA is thus very limited. That's why I was thinking about a potential non-parametric model. But I do not want to artificially makes my data tell something if it cannot. Message du 20/10/14 à 16h50 De : stephen sefick A : Martin Weiser Copie à : V. Coudrain , r-sig-ecology Objet : Re: [R-sig-eco] Regression with few observations per factor level You are more or less preforming an ANOVA/ANCOVA on your data? As pointed out earlier, all of the normal theory regression assumptions apply. Assuming all of those things are satisfied then if you have large confidence intervals and there are significant differences between groups I don't see why you couldn't correctly infer something about the treatments. Maybe I am missing something. Stephen On Mon, Oct 20, 2014 at 8:43 AM, Martin Weiser wrote: Hi, coefficients and their p-values are reliable if your data are OK and you do know enough about the process that generated them, so you can choose appropriate model. With 4 points per line, it may be really difficult to identify bad fit or outliers. For example: simple linear regression needs constant variance of the normal distribution from which residuals are drawn - along the regression line - to work properly. With 4 points, you can hardly estimate this, but if you know enough about the process that generated the data, you are safe. If you do not know, it is not easy to say anything about the nature of the process that generated the data. If you know (or can assume) that there is simple linear relationship, you can say: slope of this relationship is such and such, but if you want to estimate both the nature of the relationship (A *linearly* depends on B) and its magnitude (the slope of this relationship is ...), p-values would not help you much. 
Of course, I may be wrong: I am not a statistician, just a user. Best, Martin W.

V. Coudrain wrote on Mon, 20 Oct 2014 at 13:37 +0200: Thank you very much. If I get it right, the CIs get wider, my test has less power and the probability of getting a significant relation decreases. What about the significant coefficients, are they reliable?

Message of 20/10/14 at 11:30 From: Roman Luštrik To: V. Coudrain Cc: r-sig-ecology@r-project.org Subject: Re: [R-sig-eco] Regression with few observations per factor level

I think you can, but the confidence intervals will be rather large due to the number of samples. Notice how the standard errors change as the sample size (per group) goes from 4 to 30.

pg <- 4  # pg = per group
my.df <- data.frame(var = c(rnorm(pg, mean = 3), rnorm(pg, mean = 1),
                            rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
                    trt = rep(c("trt1", "trt2", "trt3", "trt4"), each = pg),
                    cov = runif(pg * 4))  # 4 groups
summary(lm(var ~ trt + cov, data = my.df))

Call:
lm(formula = var ~ trt + cov, data = my.df)

Residuals:
     Min       1Q   Median       3Q      Max
-1.63861 -0.46080  0.03332  0.66380  1.27974

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.2345     1.0218   1.208    0.252
trttrt2      -0.7759     0.8667  -0.895    0.390
trttrt3       7.8503     0.8308   9.449  1.3e-06 ***
trttrt4      28.2685     0.9050  31.236  4.3e-12 ***
cov           1.4027     1.1639   1.205    0.253
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.154 on 11 degrees of freedom
Multiple R-squared: 0.9932, Adjusted R-squared: 0.9908
F-statistic: 404.4 on 4 and 11 DF, p-value: 7.467e-12

pg <- 30  # pg = per group
my.df <- data.frame(var = c(rnorm(pg, mean = 3), rnorm(pg, mean = 1),
                            rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
                    trt = rep(c("trt1", "trt2", "trt3", "trt4"), each = pg),
                    cov = runif(pg * 4))  # 4 groups
summary(lm(var ~ trt + cov, data = my.df))

Call:
lm(formula = var ~ trt + cov, data = my.df)

Residuals:
    Min      1Q  Median      3Q     Max
-2.5778 -0.6584 -0.0185  0.6423  3.2077

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.76961    0.25232  10.977  < 2e-16 ***
trttrt2     -1.75490    0.28546  -6.148 1.17e-08 ***
trttrt3      8.40521    0.28251  29.752  < 2e-16 ***
trttrt4     27.04095    0.28286  95.599  < 2e-16 ***
cov          0.05129    0.32523   0.158    0.875
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.094 on 115 degrees of freedom
Multiple R-squared: 0.9913, Adjusted R-squared: 0.991
F-statistic: 3269 on 4 and 115 DF, p-value: < 2.2e-16

On Mon, Oct 20, 2014 at 10:53 AM, V. Coudrain wrote: Hi, I would like to test the impact of a treatment
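Roman's point about shrinking standard errors can also be read off the confidence intervals directly. A small sketch (the seed and group means below are arbitrary assumptions, chosen to match his example):

```r
set.seed(42)
make_fit <- function(pg) {
  # Simulate 4 groups of pg observations each, then fit the ANCOVA model.
  d <- data.frame(
    var = rnorm(4 * pg, mean = rep(c(3, 1, 11, 30), each = pg)),
    trt = factor(rep(paste0("trt", 1:4), each = pg)),
    cov = runif(4 * pg)
  )
  lm(var ~ trt + cov, data = d)
}
# Width of the 95% CI for the trt2 contrast at n = 4 vs n = 30 per group:
ci4  <- diff(confint(make_fit(4))["trttrt2", ])
ci30 <- diff(confint(make_fit(30))["trttrt2", ])
ci4; ci30   # the n = 4 interval is expected to be much wider
```

The significant coefficients are still unbiased estimates; what suffers with n = 4 is their precision, hence the wide intervals.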
Re: [R-sig-eco] Regression with few observations per factor level
Yes, the analysis with a small sample size would be valid (under the assumption that the model, both fixed and random effects, is correctly specified), but at some point there must be a practical assessment of the desired precision and the costs of the consequences of either inadequate estimates or wrong acceptance or rejection of hypotheses. If it were just about the numbers from a sample and the resulting P-values, we would only need statisticians and no subject-matter experts (which is clearly not the case). And while I'm soapboxing: situations with low variability require fewer samples than situations with high variability, so one can't assess the adequacy of an analysis solely on the sample size. Jim

Jim Baldwin Station Statistician Pacific Southwest Research Station USDA Forest Service

-----Original Message----- From: r-sig-ecology-boun...@r-project.org [mailto:r-sig-ecology-boun...@r-project.org] On Behalf Of V. Coudrain Sent: Monday, October 20, 2014 8:54 AM To: ElginPerry Cc: r-sig-ecology@r-project.org Subject: Re: [R-sig-eco] Regression with few observations per factor level

Thank you for this helpful thought. So if I get it correctly, it is hopeless to try testing an interaction, but we nevertheless may assess whether a covariate has an impact, provided its effect is the same in all treatments.

Message of 20/10/14 at 16:46 From: Elgin Perry To: v_coudr...@voila.fr Cc: Subject: Regression with few observations per factor level

If it is reasonable to assume that the slope of the covariate is the same for all treatments and you have numerous treatments, then you can do this by specifying one slope parameter for all treatments, as you gave in your example (e.g. lm(var ~ trt + cov)). By combining slope information over treatments, you can obtain a reasonably precise estimate. With so few observations per treatment, you will not be able to estimate separate slopes for each treatment with any degree of precision (e.g. lm(var ~ trt + trt:cov)). Elgin S. Perry, Ph.D.
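Elgin's common-slope vs separate-slopes point shows up directly in the residual degrees of freedom each parameterization leaves over. A minimal sketch, with purely illustrative simulated data:

```r
set.seed(7)
pg <- 4                          # four observations per treatment
d <- data.frame(
  var = rnorm(4 * pg, mean = rep(c(3, 1, 11, 30), each = pg)),
  trt = factor(rep(paste0("trt", 1:4), each = pg)),
  cov = runif(4 * pg)
)
common   <- lm(var ~ trt + cov,     data = d)  # one slope shared by all treatments
separate <- lm(var ~ trt + trt:cov, data = d)  # one slope per treatment
df.residual(common)    # 16 obs - 5 parameters = 11 residual df
df.residual(separate)  # 16 obs - 8 parameters =  8 residual df
```

With separate slopes, each slope estimate rests on only 4 points, which is why its precision collapses even though the model still fits.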
Statistics Consultant, 377 Resolutions Rd., Colonial Beach, Va. 22443, ph. 410.610.1473

Date: Mon, 20 Oct 2014 10:53:41 +0200 (CEST) From: V. Coudrain v_coudr...@voila.fr To: r-sig-ecology@r-project.org Subject: [R-sig-eco] Regression with few observations per factor level

Hi, I would like to test the impact of a treatment on some variable using regression (e.g. lm(var ~ trt + cov)). However, I only have four observations per factor level. Is it still possible to apply a regression with such a small sample size? I think it should be difficult to correctly estimate the variance. Do you think that I should rather compute a non-parametric test such as Kruskal-Wallis? However, I need to include covariates in my models and I am not sure if basic non-parametric tests are suitable for this. Thanks for any suggestion.

___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology