On Wed, 28 Sep 2005, Denis Chabot wrote:

> But what about another analogy, that of polynomials? You may not be sure what 
> degree of polynomial to use, and you have not decided before analysing your 
> data. You fit different polynomials to your data, checking whether added degrees 
> increase R^2 sufficiently by doing F-tests.

Yes, you can. And this procedure gives you incorrect p-values.

  They may not be very incorrect -- it depends on how much model selection 
you do, and how strongly the feature you are selecting on is related to 
the one you are testing.

For example, using step() to choose a polynomial in x even when x is 
unrelated to y and z inflates the Type I error rate by giving a biased 
estimate of the residual mean squared error:

## Returns the p-value for z after stepwise selection of polynomial
## terms in x, even though x is unrelated to both y and z.
once <- function() {
   y <- rnorm(50); x <- runif(50); z <- rep(0:1, 25)
   summary(step(lm(y ~ z),
         scope = list(lower = ~z, upper = ~z + x + I(x^2) + I(x^3) + I(x^4)),
         trace = 0))$coef["z", 4]
}
> p<-replicate(1000,once())
> mean(p<0.05)
[1] 0.072

which is significantly higher than you would expect for an honest level 
0.05 test.
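
For comparison, the sequential F-test procedure Denis describes might be 
sketched like this (simulated data for illustration; anova() on nested 
lm fits performs the F-tests, and the same caveat applies -- p-values 
reported after such selection are biased):

```r
## Hypothetical example: choose a polynomial degree by nested F-tests.
set.seed(1)
x <- runif(50)
y <- rnorm(50)   # y is in fact unrelated to x

fit1 <- lm(y ~ x)
fit2 <- lm(y ~ x + I(x^2))
fit3 <- lm(y ~ x + I(x^2) + I(x^3))

## Each row tests whether the added degree significantly
## reduces the residual sum of squares relative to the fit above it.
anova(fit1, fit2, fit3)
```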

        -thomas

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
