Re: [R] low R square value from ANCOVA model

2012-05-08 Thread peter dalgaard

On May 8, 2012, at 05:10 , array chip wrote:

> Hi, what does a low R-square value from an ANCOVA model mean? For example, if
> the R square from the model is about 0.2, does this mean the results should
> NOT be trusted? I checked the residuals of the model, and they looked fine...

It just means that your model has low predictive power (at the individual 
level). I.e. the noise (error) part of the model is large relative to the 
signal (systematic part). Statistical inferences are not compromised by that, 
except of course that large error variation is reflected in large standard 
errors of estimated regression coefficients. 
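Peter's point can be illustrated with a minimal simulation sketch in R. Everything in it is hypothetical: the sample size, the covariate `age`, the two-level factor `group` and the effect sizes are invented and are not John's data. The true model contains a real group effect, but the residual noise is large. As a result, R-squared comes out low while the group coefficient is still clearly significant, because its standard error already reflects that noise.

```r
# Hypothetical ANCOVA-style data: real but small effects, large residual noise.
set.seed(1)
n     <- 4000
group <- factor(rep(c("G1", "G2"), each = n / 2))
age   <- runif(n, min = 20, max = 60)

# Assumed true model: small age slope, group shift of 0.5, residual SD of 2
y <- 10 + 0.02 * age + 0.5 * (group == "G2") + rnorm(n, sd = 2)

fit <- lm(y ~ age + group)
s   <- summary(fit)

r2      <- s$r.squared                      # share of variance explained
p_group <- coef(s)["groupG2", "Pr(>|t|)"]   # t-test of the group effect

cat(sprintf("R-squared = %.3f, p-value for groupG2 = %.2g\n", r2, p_group))
```

With these settings, R-squared should land somewhere around 0.03, yet the p-value for `groupG2` is far below 0.05. Predictive power at the individual level is low, but the group difference is well determined.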

  

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] low R square value from ANCOVA model

2012-05-08 Thread peter dalgaard

On May 8, 2012, at 08:34 , array chip wrote:

> Thank you Peter, so if I observe a significant coefficient, that significance
> still holds because the standard error of the coefficient has taken the
> residual error (which is large because the R square is low) into account, am I
> correct?

In essence, yes. One might quibble over the use of "large because", but it's
not important for the main point.
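The following sketch in R uses simulated, hypothetical data to show why the significance survives a low R-squared. The standard error of each coefficient is the residual standard error times the square root of the matching diagonal element of (X'X)^-1. The noise that drives R-squared down is therefore already built into the t-test. The code recomputes the standard errors by hand, then scales the same noise pattern by 2 to show that every standard error doubles.

```r
# Hypothetical data; the names and effect sizes are invented for illustration.
set.seed(2)
n     <- 200
group <- factor(rep(c("G1", "G2"), each = n / 2))
age   <- runif(n, min = 20, max = 60)
noise <- rnorm(n)

# Same systematic part (exactly representable by the model), two noise levels
signal <- 50 - 0.3 * age - 5 * (group == "G2")
y_low  <- signal + 5 * noise     # residual SD about 5
y_high <- signal + 10 * noise    # same noise pattern, twice as large

fit_low  <- lm(y_low ~ age + group)
fit_high <- lm(y_high ~ age + group)

# SE(beta_j) = sigma_hat * sqrt([(X'X)^-1]_jj), where sigma_hat^2 = RSS / (n - p)
X       <- model.matrix(fit_low)
se_hand <- summary(fit_low)$sigma * sqrt(diag(solve(crossprod(X))))
se_lm   <- coef(summary(fit_low))[, "Std. Error"]
print(all.equal(unname(se_hand), unname(se_lm)))   # hand and lm() SEs agree

# More noise: R-squared drops and every SE doubles, so the t-tests account for it
r2       <- c(low = summary(fit_low)$r.squared, high = summary(fit_high)$r.squared)
se_ratio <- coef(summary(fit_high))[, "Std. Error"] / se_lm
print(round(r2, 3))
print(se_ratio)
```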

-pd

 John
 From: peter dalgaard pda...@gmail.com
 To: array chip arrayprof...@yahoo.com 
 Cc: r-help@r-project.org r-help@r-project.org 
 Sent: Monday, May 7, 2012 11:07 PM
 Subject: Re: [R] low R square value from ANCOVA model
 
 


Re: [R] low R square value from ANCOVA model

2012-05-08 Thread array chip
Thank you Peter, so if I observe a significant coefficient, that significance
still holds because the standard error of the coefficient has taken the
residual error (which is large because the R square is low) into account, am I
correct?

John
 




Re: [R] low R square value from ANCOVA model

2012-05-08 Thread array chip
Thanks again Peter. What about the argument that, because a low R square (e.g.
R^2=0.2) indicates that the variance is not sufficiently explained by the
factors in the model, there might be additional factors that should be
identified and included in the model? If these additional factors were
indeed included, they might change the significance of the factor of interest
that previously showed a significant coefficient. In other words, if R square is
low, the significant coefficient observed is not trustworthy.

What's your opinion on this argument?

Many thanks!

John




 From: peter dalgaard pda...@gmail.com

Cc: r-help@r-project.org r-help@r-project.org 
Sent: Monday, May 7, 2012 11:43 PM
Subject: Re: [R] low R square value from ANCOVA model




Re: [R] low R square value from ANCOVA model

2012-05-08 Thread array chip
Hi Peter, I searched the old mail archive and found that this topic had been
discussed before. The previous discussion was about a situation with a very
large sample size, so even a small effect showed up as significant despite the
low R square of the model.

In my case, the sample size is 72, and the significance of the group effect is
due to a large effect relative to its standard error:

obj <- lm(y ~ age + sex + school + group, data = dat)

summary(obj)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 169.8634    13.4678  12.613  < 2e-16
age          -0.3737     0.2762  -1.353   0.1805
sexM          2.1137     8.6585   0.244   0.8079
schoolS2      4.1711     8.1811   0.510   0.6118
groupG2     -20.8944    10.2807  -2.032   0.0461

Residual standard error: 32.13 on 67 degrees of freedom
Multiple R-squared: 0.1732, Adjusted R-squared: 0.1238
F-statistic: 3.509 on 4 and 67 DF,  p-value: 0.01163
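The numbers in this output are internally consistent, and checking them by hand makes Peter's earlier point concrete. This R sketch only re-derives printed statistics from other printed values: the estimate, the standard error, R-squared, n = 72 and the 4 model terms. Any small difference in the last digit comes from rounding in the printout.

```r
# Values copied from the summary(obj) output above
est <- -20.8944; se <- 10.2807     # groupG2 estimate and standard error
n   <- 72;       k  <- 4           # observations, non-intercept model terms
df_res <- n - k - 1                # residual degrees of freedom (67)
r2  <- 0.1732

t_group <- est / se                              # t value for groupG2
p_group <- 2 * pt(-abs(t_group), df = df_res)    # two-sided p-value

f_stat <- (r2 / k) / ((1 - r2) / df_res)         # overall F from R-squared
p_f    <- pf(f_stat, k, df_res, lower.tail = FALSE)
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_res        # adjusted R-squared

round(c(t = t_group, p = p_group, F = f_stat, p_F = p_f, adj_R2 = adj_r2), 4)
```

R-squared does not appear in the t-test for `groupG2` at all. That test uses only the estimate and its standard error, which already contains the residual standard error of 32.13. R-squared feeds only the overall F-test.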

So R-squared is quite low (0.17). What's your opinion on the argument that the
significant coefficient for group is not trustworthy because the model does not
account for enough of the variance, and that if additional factors could be
identified and included in the model, the effect of group might change from
significant to insignificant?

Many thanks for sharing your thoughts.

John






To: peter dalgaard pda...@gmail.com 
Cc: r-help@r-project.org r-help@r-project.org 
Sent: Tuesday, May 8, 2012 1:45 PM
Subject: Re: [R] low R square value from ANCOVA model




Re: [R] low R square value from ANCOVA model

2012-05-08 Thread Paul Johnson
On Tue, May 8, 2012 at 3:45 PM, array chip <arrayprof...@yahoo.com> wrote:
> Thanks again Peter. What about the argument that, because a low R square (e.g.
> R^2=0.2) indicates that the variance is not sufficiently explained by the
> factors in the model, there might be additional factors that should be
> identified and included in the model? If these additional factors were
> indeed included, they might change the significance of the factor of interest
> that previously showed a significant coefficient. In other words, if R square is
> low, the significant coefficient observed is not trustworthy.
>
> What's your opinion on this argument?

I think that argument is silly. I'm sorry if that is too blunt. It's just plain
superficial. It reflects a poor understanding of what the linear model is all
about. If you have other variables that might belong in the model, run them and
test. The R-square, either low or high, does not have anything direct to say
about whether those other variables exist.
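Paul's advice to run the candidate variables and test can be illustrated with a hypothetical simulation in R. The variables `z` (a predictor unrelated to `group`) and `w` (a confounder correlated with `group`) are invented, and so are all the effect sizes. In scenario A, the model without `z` has a low R-squared, yet its group estimate is fine. In scenario B, the model without `w` has a much higher R-squared, yet its group estimate is badly biased. R-squared by itself flags neither situation; fitting the extended model does.

```r
set.seed(3)
n     <- 4000
group <- factor(rep(c("G1", "G2"), each = n / 2))
g2    <- as.numeric(group == "G2")   # 0/1 indicator; true group effect is 0.5

# Scenario A: omitted predictor z is independent of group
z   <- rnorm(n)
y_a <- 0.5 * g2 + 3 * z + rnorm(n, sd = 1)
a_without <- lm(y_a ~ group)
a_with    <- lm(y_a ~ group + z)

# Scenario B: omitted predictor w is correlated with group (a confounder)
w   <- g2 + rnorm(n, sd = 0.5)
y_b <- 0.5 * g2 + 2 * w + rnorm(n, sd = 0.5)
b_without <- lm(y_b ~ group)
b_with    <- lm(y_b ~ group + w)

# R-squared and the group coefficient for one fitted model
fit_stats <- function(fit) {
  c(R2 = summary(fit)$r.squared, groupG2 = unname(coef(fit)["groupG2"]))
}

res <- rbind(A_without = fit_stats(a_without), A_with = fit_stats(a_with),
             B_without = fit_stats(b_without), B_with = fit_stats(b_with))
print(round(res, 3))
```

The group estimate should stay near the true 0.5 in both scenario A fits. In scenario B, it should be near 2.5 without `w` and back near 0.5 with it.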

Here's my authority.

Arthur Goldberger (A Course in Econometrics, 1991, p.177):
“Nothing in the CR (Classical Regression) model requires that R² be high. Hence,
a high R² is not evidence in favor of the model, and a low R² is not evidence
against it.”

I found that reference in Anders Skrondal and Sophia Rabe-Hesketh,
Generalized Latent Variable Modeling: Multilevel, Longitudinal,
and Structural Equation Models, Boca Raton, FL: Chapman and Hall/CRC, 2004.

From Section 8.5.2:

Furthermore, how badly the baseline model fits the data depends greatly on the
magnitude of the parameters of the true model. For instance, consider estimating
a simple parallel measurement model. If the true model is a congeneric
measurement model (with considerable variation in factor loadings and
measurement error variances between items), the fit index could be high simply
because the null model fits very poorly, i.e. because the reliabilities of the
items are high. However, if the true model is a parallel measurement model with
low reliabilities the fit index could be low although we are estimating the
correct model. Similarly, estimating a simple linear regression model can yield
a high R² if the relationship is actually quadratic with a considerable linear
trend and a low R² when the model is true but with a small slope (relative to
the overall variance).
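The last example in the quoted passage can be reproduced with a short simulation in R. The data are hypothetical, and the functions and constants are chosen only for illustration. A misspecified straight-line fit to a curved relationship gets a high R-squared. A correctly specified straight-line model with a small slope gets a low one.

```r
set.seed(4)
n <- 500
x <- runif(n, min = 0, max = 10)

# Case 1: true relationship is quadratic with a considerable linear trend
y_quad   <- 2 * x + 0.2 * x^2 + rnorm(n, sd = 1)
fit_line <- lm(y_quad ~ x)               # wrong (straight-line) model
fit_quad <- lm(y_quad ~ x + I(x^2))      # correct model

# Case 2: true relationship is linear, but the slope is small relative to noise
y_lin     <- 0.1 * x + rnorm(n, sd = 2)
fit_small <- lm(y_lin ~ x)               # correct model

r2_wrong_model   <- summary(fit_line)$r.squared
r2_correct_model <- summary(fit_small)$r.squared
p_curvature      <- coef(summary(fit_quad))["I(x^2)", "Pr(>|t|)"]

print(c(wrong_linear = r2_wrong_model, correct_linear = r2_correct_model))
print(p_curvature)   # the quadratic term is clearly needed in case 1
```

The model with the high R-squared is the wrong one. A residual plot (`plot(fit_line, which = 1)`) or the test of the quadratic term reveals this; R-squared does not.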

For a detailed explanation of why R-square is not a way to decide whether a
model is good or bad, see

King, Gary. (1986). How Not to Lie with Statistics: Avoiding Common Mistakes in
Quantitative Political Science. American Journal of Political Science,
30(3), 666–687. doi:10.2307/2111095

pj
-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu



Re: [R] low R square value from ANCOVA model

2012-05-08 Thread array chip
Paul, thanks for your thoughts. Blunt? Not at all.

If I understand correctly, it doesn't help anything to speculate about whether
additional variables might exist. Given the current variables in the model, it's
perfectly fine to draw conclusions based on significant coefficients regardless
of whether R-squared is high or low.

Gary King's article is interesting...

John




 From: Paul Johnson pauljoh...@gmail.com

Cc: peter dalgaard pda...@gmail.com; r-help@r-project.org 
r-help@r-project.org 
Sent: Tuesday, May 8, 2012 8:23 PM
Subject: Re: [R] low R square value from ANCOVA model




Re: [R] low R square value from ANCOVA model

2012-05-08 Thread Bert Gunter
It gets curiouser and curiouser, said Alice.

-- Bert

On Tue, May 8, 2012 at 9:07 PM, array chip <arrayprof...@yahoo.com> wrote:
> Paul, thanks for your thoughts. Blunt? Not at all.
>
> If I understand correctly, it doesn't help anything to speculate about whether
> additional variables might exist. Given the current variables in the model, it's
> perfectly fine to draw conclusions based on significant coefficients regardless
> of whether R-squared is high or low.
>
> Gary King's article is interesting...
>
> John

-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



[R] low R square value from ANCOVA model

2012-05-07 Thread array chip
Hi, what does a low R-square value from an ANCOVA model mean? For example, if
the R square from the model is about 0.2, does this mean the results should NOT
be trusted? I checked the residuals of the model, and they looked fine...
 
Thanks for any suggestion.
 
John