Re: [R] low R square value from ANCOVA model
On May 8, 2012, at 05:10, array chip wrote:

> Hi, what does a low R-square value from an ANCOVA model mean? For example, if the R square from the model is about 0.2, does this mean the results should NOT be trusted? I checked the residuals of the model, and they looked fine...

It just means that your model has low predictive power (at the individual level). I.e. the noise (error) part of the model is large relative to the signal (systematic part). Statistical inferences are not compromised by that, except of course that large error variation is reflected in large standard errors of estimated regression coefficients.

> Thanks for any suggestion.
>
> John

--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
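The point that a low R-square leaves inference intact can be checked by simulation. The sketch below is not from the thread: it assumes a one-covariate linear model with made-up values (true slope 0.5, noise SD 3, n = 72) chosen so that R-square is very low, and then counts how often the usual 95% confidence interval covers the true slope.

```r
# Illustrative simulation; all numbers here are assumptions, not thread data.
set.seed(1)
n          <- 72      # observations per simulated data set
n_sim      <- 2000    # number of simulated data sets
true_slope <- 0.5
noise_sd   <- 3       # large relative to the signal, so R-square is low

covered <- logical(n_sim)
r2      <- numeric(n_sim)

for (i in seq_len(n_sim)) {
  x   <- rnorm(n)
  y   <- 1 + true_slope * x + rnorm(n, sd = noise_sd)
  fit <- lm(y ~ x)
  ci  <- confint(fit)["x", ]                      # 95% CI for the slope
  covered[i] <- ci[1] <= true_slope && true_slope <= ci[2]
  r2[i]      <- summary(fit)$r.squared
}

mean_r2  <- mean(r2)       # average R-square across simulations
coverage <- mean(covered)  # share of intervals containing the true slope

cat("mean R-squared:", round(mean_r2, 3), "\n")
cat("CI coverage   :", round(coverage, 3), "\n")
```

If the model assumptions hold, the coverage should stay close to the nominal 95% even though R-square is small; the price of the noise is wide intervals, not invalid ones.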
Re: [R] low R square value from ANCOVA model
On May 8, 2012, at 08:34, array chip wrote:

> Thank you Peter, so if I observe a significant coefficient, that significance still holds because the standard error of the coefficient has taken the residual error (which is large because large R square) into account, am I correct?

In essence, yes. One might quibble over the use of "large because", but it's not important for the main point.

-pd
Re: [R] low R square value from ANCOVA model
Thank you Peter, so if I observe a significant coefficient, that significance still holds because the standard error of the coefficient has taken the residual error (which is large because large R square) into account, am I correct?

John
Re: [R] low R square value from ANCOVA model
Thanks again Peter. What about the argument that, because a low R square (e.g. R^2 = 0.2) indicates that the variance was not sufficiently explained by the factors in the model, there might be additional factors that should be identified and included in the model? And if these additional factors were indeed included, it might change the significance of the factor of interest that previously showed a significant coefficient. In other words, if R square is low, the significant coefficient observed is not trustworthy. What's your opinion on this argument?

Many thanks!

John
Re: [R] low R square value from ANCOVA model
Hi Peter, I searched the old mail archive and found that this topic had been discussed before. The previous discussion was about a situation with a very large sample size, so even a small effect still showed up as significant despite the low R square of the model. In my case, the sample size is 72, and the significance of the group effect is due to a large effect relative to its standard error:

obj <- lm(y ~ age + sex + school + group, dat)
summary(obj)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 169.8634    13.4678  12.613   <2e-16
age          -0.3737     0.2762  -1.353   0.1805
sexM          2.1137     8.6585   0.244   0.8079
schoolS2      4.1711     8.1811   0.510   0.6118
groupG2     -20.8944    10.2807  -2.032   0.0461

Residual standard error: 32.13 on 67 degrees of freedom
Multiple R-squared: 0.1732, Adjusted R-squared: 0.1238
F-statistic: 3.509 on 4 and 67 DF, p-value: 0.01163

So R-squared is quite low (0.17). What's your opinion on the argument that the significant coefficient for group is not trustworthy because the variance was not sufficiently accounted for, and that if additional factors could be identified and included in the model, the effect of group might change from significant to insignificant?

Many thanks for sharing your thoughts.

John
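The figures in the summary above hang together arithmetically, which shows how the test for group already accounts for the residual variation. The sketch below uses only the numbers printed in the output (the raw data are not available); the variable names are my own.

```r
# Values copied from the printed summary(obj) output in the thread.
est      <- -20.8944   # estimate for groupG2
se       <- 10.2807    # its standard error
df_resid <- 67         # residual degrees of freedom
k        <- 4          # number of slope coefficients in the model
r2       <- 0.1732     # multiple R-squared

# t statistic and two-sided p-value for the group coefficient
t_val <- est / se                                # reported as -2.032
p_val <- 2 * pt(-abs(t_val), df = df_resid)      # reported as 0.0461

# 95% confidence interval for the group effect
ci <- est + c(-1, 1) * qt(0.975, df_resid) * se

# Overall F statistic and adjusted R-squared recovered from R-squared alone
f_val  <- (r2 / k) / ((1 - r2) / df_resid)               # reported as 3.509
f_p    <- pf(f_val, k, df_resid, lower.tail = FALSE)     # reported as 0.01163
adj_r2 <- 1 - (1 - r2) * (df_resid + k) / df_resid       # reported as 0.1238

print(round(c(t = t_val, p = p_val, F = f_val, F_p = f_p, adj_r2 = adj_r2), 4))
print(round(ci, 2))
```

The standard error of 10.28 is large because the residual standard error (32.13) is large, and the t statistic is the estimate divided by that standard error, so the noise behind the low R-square is already built into the p-value.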
Re: [R] low R square value from ANCOVA model
On Tue, May 8, 2012 at 3:45 PM, array chip arrayprof...@yahoo.com wrote:

> Thanks again Peter. What about the argument that, because a low R square (e.g. R^2 = 0.2) indicates that the variance was not sufficiently explained by the factors in the model, there might be additional factors that should be identified and included in the model? And if these additional factors were indeed included, it might change the significance of the factor of interest that previously showed a significant coefficient. In other words, if R square is low, the significant coefficient observed is not trustworthy. What's your opinion on this argument?

I think that argument is silly. I'm sorry if that is too blunt. It's just plain superficial. It reflects a poor understanding of what the linear model is all about. If you have other variables that might belong in the model, run them and test. The R-square, either low or high, does not have anything direct to say about whether those other variables exist.

Here's my authority. Arthur Goldberger (A Course in Econometrics, 1991, p.177):

“Nothing in the CR (Classical Regression) model requires that R^2 be high. Hence, a high R^2 is not evidence in favor of the model, and a low R^2 is not evidence against it.”

I found that reference in Anders Skrondal and Sophia Rabe-Hesketh, Generalized Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models, Boca Raton, FL: Chapman and Hall/CRC, 2004. From Section 8.5.2:

Furthermore, how badly the baseline model fits the data depends greatly on the magnitude of the parameters of the true model. For instance, consider estimating a simple parallel measurement model. If the true model is a congeneric measurement model (with considerable variation in factor loadings and measurement error variances between items), the fit index could be high simply because the null model fits very poorly, i.e. because the reliabilities of the items are high. However, if the true model is a parallel measurement model with low reliabilities the fit index could be low although we are estimating the correct model. Similarly, estimating a simple linear regression model can yield a high R^2 if the relationship is actually quadratic with a considerable linear trend and a low R^2 when the model is true but with a small slope (relative to the overall variance).

For a detailed explanation of why the R-square is not a way to decide whether a model is good or bad, see

King, Gary. (1986). How Not to Lie with Statistics: Avoiding Common Mistakes in Quantitative Political Science. American Journal of Political Science, 30(3), 666–687. doi:10.2307/2111095

pj

--
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504       Center for Research Methods
University of Kansas            University of Kansas
http://pj.freefaculty.org       http://quant.ku.edu
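The regression example in the Skrondal and Rabe-Hesketh passage is easy to reproduce. The following sketch is an illustration with invented parameter values, not code from the book or the thread: case A fits a straight line to data that are really quadratic with a strong linear trend, and case B fits the correct linear model to data with a small slope relative to the noise.

```r
# Illustrative simulation; coefficients and noise levels are assumptions.
set.seed(42)
n <- 200
x <- runif(n, 0, 10)

# Case A: true relationship is quadratic, but we fit only a linear term.
y_a   <- 2 * x + 0.3 * x^2 + rnorm(n, sd = 1)
fit_a <- lm(y_a ~ x)                      # misspecified model

# Case B: true relationship is linear with a small slope and large noise.
y_b   <- 0.2 * x + rnorm(n, sd = 3)
fit_b <- lm(y_b ~ x)                      # correctly specified model

r2_a <- summary(fit_a)$r.squared
r2_b <- summary(fit_b)$r.squared

# The misspecification in case A shows up when the quadratic term is tested,
# not in the R-square of the linear fit.
p_quad_a <- anova(fit_a, lm(y_a ~ x + I(x^2)))[2, "Pr(>F)"]

cat("R-squared, wrong model (A):  ", round(r2_a, 3), "\n")
cat("R-squared, correct model (B):", round(r2_b, 3), "\n")
cat("p-value for adding x^2 in A: ", signif(p_quad_a, 3), "\n")
```

Under these assumptions the wrong model has the much higher R-square, which is the sense in which R-square is neither evidence for nor against a specification; residual plots and tests of specific extra terms are what reveal the problem in case A.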
Re: [R] low R square value from ANCOVA model
Paul, thanks for your thoughts. Blunt? Not at all...

If I understand correctly, it doesn't help to speculate about whether additional variables might exist. Given the variables currently in the model, it's perfectly fine to draw conclusions based on significant coefficients, regardless of whether R-squared is high or low. Gary King's article is interesting...

John
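Paul's advice, "run them and test", can be sketched as a nested-model comparison. Everything below is hypothetical: the data are simulated, `z` stands in for a candidate extra covariate, and it is generated independently of `group` (as it would be, on average, under random assignment). If `z` were instead correlated with `group`, the group estimate could shift, but that is a question about confounding, which R-square does not answer.

```r
# Illustrative simulation; variable names and effect sizes are assumptions.
set.seed(7)
n     <- 500
group <- factor(sample(rep(c("G1", "G2"), each = n / 2)))
z     <- rnorm(n)   # hypothetical extra covariate, independent of group
y     <- 170 - 20 * (group == "G2") + 15 * z + rnorm(n, sd = 30)

fit_small <- lm(y ~ group)       # model without the extra covariate
fit_big   <- lm(y ~ group + z)   # model with it

r2_small <- summary(fit_small)$r.squared
r2_big   <- summary(fit_big)$r.squared
b_small  <- coef(fit_small)["groupG2"]
b_big    <- coef(fit_big)["groupG2"]

# F test of whether z belongs in the model
cmp <- anova(fit_small, fit_big)

cat("R-squared:      ", round(r2_small, 3), "->", round(r2_big, 3), "\n")
cat("group estimate: ", round(b_small, 2), "->", round(b_big, 2), "\n")
print(cmp)
```

In this setup adding `z` raises R-square, yet the group estimate should move only slightly, since the omitted covariate was unrelated to group. The comparison itself, rather than the size of R-square, is what tells you whether a candidate variable matters.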
Re: [R] low R square value from ANCOVA model
"It gets curiouser and curiouser," said Alice.

-- Bert

--
Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
[R] low R square value from ANCOVA model
Hi, what does a low R-square value from an ANCOVA model mean? For example, if the R square from the model is about 0.2, does this mean the results should NOT be trusted? I checked the residuals of the model, and they looked fine...

Thanks for any suggestion.

John