Hi all,

I'm still not quite happy with the NA and factor handling of lm and predict.lm in 
R 1.6.1, which forces me to keep using my own, not very skillfully crafted, patches.

Here is problem 1:
        > print(data<-data.frame(y=c(0.9,2.05,3.02,NA,5.2),x1=c(1:4,NA),x2=factor(c("blue","blue","green","green","green"),levels=c("blue","green"))))
             y x1    x2
        1 0.90  1  blue
        2 2.05  2  blue
        3 3.02  3 green
        4   NA  4 green
        5 5.20 NA green
        > fit<-lm(y~x1+x2,data=data,na.action=na.exclude)
        > predict(fit,data)
           1    2    3    4 
        0.90 2.05 3.02 4.17 
Interpretation:
There are two NAs, one in the response (row 4) and one in an explanatory variable 
(row 5). If I understand the action of na.exclude correctly, I would have expected
           1    2    3    4    5 
        0.90 2.05 3.02 4.17   NA 
and I think this is what predict.lm should return (my personalized version of 
predict.lm does so).
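
The expected result can also be obtained without patching predict.lm, by predicting only for rows whose explanatory variables are complete and padding the rest with NA. This is a minimal sketch of that idea, not the patch mentioned above; the names ok and pred are illustrative, and the predictor columns are assumed to be listed by hand.

```r
# Same data and fit as in problem 1.
data <- data.frame(y  = c(0.9, 2.05, 3.02, NA, 5.2),
                   x1 = c(1:4, NA),
                   x2 = factor(c("blue", "blue", "green", "green", "green"),
                               levels = c("blue", "green")))
fit <- lm(y ~ x1 + x2, data = data, na.action = na.exclude)

# Rows where every explanatory variable is present. The response is not
# needed for prediction, so row 4 (NA in y) still qualifies.
ok <- complete.cases(data[, c("x1", "x2")])

# Start from an all-NA vector with one entry per row of the data frame,
# then fill in predictions for the complete rows only.
pred <- rep(as.numeric(NA), nrow(data))
names(pred) <- rownames(data)
pred[ok] <- predict(fit, data[ok, , drop = FALSE])
print(pred)
```

For the data above this should reproduce the four predictions shown in the transcript for rows 1 to 4 and give NA for row 5.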

Here is problem 2:
        > print(data<-data.frame(y=c(0.9,2.05,3.02,NA,5.2),x1=c(1:4,NA),x2=factor(c("blue","blue","green","green","green"),levels=c("blue","green","yellow"))))
             y x1    x2
        1 0.90  1  blue
        2 2.05  2  blue
        3 3.02  3 green
        4   NA  4 green
        5 5.20 NA green
        > fit<-lm(y~x1+x2,data=data,na.action=na.exclude)
        > predict(fit,data)
        Error in model.frame.default(object, data, xlev = xlev) : 
                factor x2 has new level(s) yellow
Interpretation:
Since the level "yellow" never occurs in the data (it is in some sense missing), 
predict.lm stops with an error. This should not happen. Perhaps a warning should be 
given, but predict.lm should not quit with an error.
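
One possible workaround, sketched here rather than offered as the fix: rebuild the factor so that levels without observations are dropped before the data frame is handed to predict. Calling factor() on an existing factor keeps only the levels that actually occur. The name data2 is illustrative, and the prediction is restricted to rows with complete explanatory variables so that the NA issue from problem 1 does not interfere.

```r
# Same data and fit as in problem 2: "yellow" is a declared but unused level.
data <- data.frame(y  = c(0.9, 2.05, 3.02, NA, 5.2),
                   x1 = c(1:4, NA),
                   x2 = factor(c("blue", "blue", "green", "green", "green"),
                               levels = c("blue", "green", "yellow")))
fit <- lm(y ~ x1 + x2, data = data, na.action = na.exclude)

# Re-create x2 so that the unused level "yellow" disappears.
data2 <- data
data2$x2 <- factor(data2$x2)
print(levels(data2$x2))

# Predict for the rows whose explanatory variables are complete.
ok <- complete.cases(data2[, c("x1", "x2")])
p <- predict(fit, data2[ok, , drop = FALSE])
print(p)
```

This only sidesteps the error for levels that are declared but never observed; a level that really occurs in the new data still needs separate handling.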


Here is problem 3:
        > print(data<-data.frame(y=c(0.9,2.05,3.02,NA,5.2),x1=c(1:4,NA),x2=factor(c("blue","blue","green","green","green"),levels=c("blue","green"))))
             y x1    x2
        1 0.90  1  blue
        2 2.05  2  blue
        3 3.02  3 green
        4   NA  4 green
        5 5.20 NA green
        > fit<-lm(y~x1+x2,data=data,na.action=na.exclude)
        > print(newdata<-data.frame(y=c(0.9,2.05,3.02,NA,5.2),x1=c(1:4,NA),x2=factor(c("blue","blue","yellow","green","green"),levels=c("blue","green","yellow"))))
             y x1     x2
        1 0.90  1   blue
        2 2.05  2   blue
        3 3.02  3 yellow
        4   NA  4  green
        5 5.20 NA  green
        > predict(fit,newdata)
        Error in model.frame.default(object, data, xlev = xlev) : 
                factor x2 has new level(s) yellow
Interpretation:
Quite naturally, predict doesn't know what to do with a level that wasn't used in 
fitting the model, so the prediction for that row should be NA. Perhaps a warning 
should be given that the factor contains a new level, but predict.lm should not quit 
with an error.
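
The behaviour asked for here can be approximated outside predict.lm. The sketch below recodes the factor in the new data against the levels stored in the fit (the xlevels component of the lm object), so that unseen levels become NA, issues a warning, and then pads the predictions with NA. The names known, nd, unseen, ok and pred are illustrative, and the warning text is made up for this example.

```r
# Fit from problem 3: only "blue" and "green" are seen when fitting.
data <- data.frame(y  = c(0.9, 2.05, 3.02, NA, 5.2),
                   x1 = c(1:4, NA),
                   x2 = factor(c("blue", "blue", "green", "green", "green"),
                               levels = c("blue", "green")))
fit <- lm(y ~ x1 + x2, data = data, na.action = na.exclude)

# New data containing the unseen level "yellow" in row 3.
newdata <- data.frame(y  = c(0.9, 2.05, 3.02, NA, 5.2),
                      x1 = c(1:4, NA),
                      x2 = factor(c("blue", "blue", "yellow", "green", "green"),
                                  levels = c("blue", "green", "yellow")))

# Levels of x2 that the fit knows about.
known <- fit$xlevels$x2

# Recode x2 against the known levels; values such as "yellow" become NA.
nd <- newdata
nd$x2 <- factor(as.character(nd$x2), levels = known)
unseen <- is.na(nd$x2) & !is.na(newdata$x2)
if (any(unseen))
    warning("x2 has levels not seen when fitting; predictions set to NA")

# Predict only where all explanatory variables are usable, pad the rest.
ok <- complete.cases(nd[, c("x1", "x2")])
pred <- rep(as.numeric(NA), nrow(nd))
names(pred) <- rownames(nd)
pred[ok] <- predict(fit, nd[ok, , drop = FALSE])
print(pred)
```

With this new data, rows 3 (new level) and 5 (NA in x1) come out as NA, and the remaining rows are predicted as usual.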

Fixing these problems would make it much easier to explore residual patterns with 
respect to variables not included in the model. Any opinions? Any help in sight?

Thanks in advance,

Chris.


platform i386-pc-mingw32
arch     i386           
os       mingw32        
system   i386, mingw32  
status                  
major    1              
minor    6.1            
year     2002           
month    11             
day      01             
language R 

Christian Ritter
Functional Specialist Statistics
Shell Coordination Centre S.A.
Monnet Centre International Laboratory, Avenue Jean Monnet 1, B-1348 Louvain-La-Neuve, 
Belgium

Tel: +32 10 477  349 Fax: +32 10 477 219
Email: [EMAIL PROTECTED]
Internet: http://www.shell.com/chemicals


______________________________________________
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help