Re: [R] NA and NaN randomForest

2007-04-25 Thread Liaw, Andy
Hi Clayton,

If you use the formula interface, then it should do what you want:

R> library(randomForest)
randomForest 4.5-18 
Type rfNews() to see new features/changes/bug fixes.
R> iris1 <- iris[-(1:5),]
R> iris2 <- iris[1:5,]
R> iris2[1, 3] <- NA
R> iris2[3, 1] <- NA
R> iris.rf <- randomForest(Species ~ ., iris1)
R> predict(iris.rf, iris2[-5])
[1] <NA>   setosa <NA>   setosa setosa
Levels: setosa versicolor virginica

The problem, of course, is that the formula interface is not suitable
for data with a large number of variables.  I'll look into doing the same
thing in the default method.

Andy


From: [EMAIL PROTECTED]
> 
> Dear R-help,
> 
> This is about randomForest's handling of NAs and NaNs in test set data.
> Currently, if the test set data contains an NA or NaN then
> predict.randomForest will skip that row in the output.
> 
> I would like to change that behavior to outputting an NA.
> 
> Can this be done with flags to randomForest?
> If not, can some sort of wrapper be built to put the NAs back in?
> 
> thanks,
> 
> Clayton
> _
> 
> CONFIDENTIALITY NOTICE\ \ The information contained in this
> ...{{dropped}}
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.




Re: [R] NA and NaN randomForest

2007-04-25 Thread clayton . springer
Hi Andy,

It worked for classification, but not regression. For example:

> iris1 <- iris[-(1:5),]
> iris2 <- iris[(1:5),]
> iris2[1,3] <- NA
> iris2[3,1] <- NA
> # row-wise sum of the four measurements (one value per flower)
> iris_sum <- iris$Sepal.Length + iris$Sepal.Width + iris$Petal.Length +
+     iris$Petal.Width
> iris_sum1 <- iris_sum[-(1:5)]
> iris_sum2 <- iris_sum[(1:5)]
> iris_sum.rf <- randomForest(iris_sum1 ~ ., iris1[,c(1:4)])
> predict(iris_sum.rf, iris2[-5])
[1]  9.556591  9.589573 10.104155

# Just to be clear I was hoping for behavior like the linear model has:

> iris_sum.lm <- lm(iris_sum1 ~ ., iris1[,c(1:4)])
> predict(iris_sum.lm, iris2[-5])
   1    2    3    4    5 
  NA  9.5   NA  9.4 10.2 
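
For reference, the linear-model behavior described above comes from the na.action argument of predict.lm, which defaults to na.pass. The sketch below rebuilds the same setup from the built-in iris data (the object names fit, p_pass and p_omit are only illustrative) and shows both the padded and the row-dropping variants:

```r
# Same train/test split as in the post, rebuilt from the built-in iris data.
iris1 <- iris[-(1:5), ]
iris2 <- iris[1:5, ]
iris2[1, 3] <- NA
iris2[3, 1] <- NA

# Response: row-wise sum of the four measurements for the training rows.
iris_sum1 <- rowSums(iris1[, 1:4])
fit <- lm(iris_sum1 ~ ., data = iris1[, 1:4])

# Default na.action = na.pass: one prediction per row, NA where inputs are missing.
p_pass <- predict(fit, iris2[, 1:4])

# na.action = na.omit: rows with missing inputs are dropped from the result.
p_omit <- predict(fit, iris2[, 1:4], na.action = na.omit)

length(p_pass)  # 5
names(p_omit)   # "2" "4" "5"
```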

If this behavior is not available for regression in randomForest, is a
workaround possible?
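
One possible workaround, sketched here under assumptions rather than taken from the package: predict only on the rows without missing values and put NA back at the skipped positions. The helper name predict_keep_na is hypothetical (it is not part of randomForest), and it assumes predict() returns a plain vector or factor with one element per row, as with the default response type, not a matrix of probabilities.

```r
# Hypothetical helper (not part of randomForest): predict on complete rows
# only, then re-insert NA where a row contained NA or NaN.
predict_keep_na <- function(object, newdata, ...) {
  newdata <- as.data.frame(newdata)
  ok <- complete.cases(newdata)          # FALSE for rows holding NA or NaN
  if (!any(ok)) {
    # Nothing to predict on; fall back to a numeric all-NA result.
    return(setNames(rep(NA_real_, nrow(newdata)), rownames(newdata)))
  }
  pred <- predict(object, newdata[ok, , drop = FALSE], ...)
  # Indexing with NA yields an all-NA vector of the same type (and factor levels).
  out <- pred[rep(NA_integer_, nrow(newdata))]
  out[ok] <- pred
  names(out) <- rownames(newdata)
  out
}

# Usage with the regression example, via the default (x, y) interface.
# Runs only when the randomForest package is installed.
if (requireNamespace("randomForest", quietly = TRUE)) {
  iris1 <- iris[-(1:5), ]
  iris2 <- iris[1:5, ]
  iris2[1, 3] <- NA
  iris2[3, 1] <- NA
  y1 <- rowSums(iris1[, 1:4])
  rf <- randomForest::randomForest(x = iris1[, 1:4], y = y1)
  print(predict_keep_na(rf, iris2[, 1:4]))  # rows 1 and 3 come back as NA
}
```

Because the helper never passes incomplete rows to predict(), it does not depend on how a particular randomForest version treats NA in newdata.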



thanks,

Clayton





[R] NA and NaN randomForest

2007-04-24 Thread clayton . springer
Dear R-help,

This is about randomForest's handling of NAs and NaNs in test set data.
Currently, if the test set data contains an NA or NaN then
predict.randomForest will skip that row in the output.

I would like to change that behavior to outputting an NA.

Can this be done with flags to randomForest?
If not, can some sort of wrapper be built to put the NAs back in?
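
As a small illustration of which rows are affected, base R's complete.cases() flags rows containing either NA or NaN, which is the index a wrapper would need. The data frame name test and the choice of cells below are only for illustration:

```r
# Illustrative test set: first five iris rows with one NA and one NaN.
test <- iris[1:5, 1:4]
test[1, 3] <- NA
test[3, 1] <- NaN

# TRUE for rows with no missing values; NaN counts as missing here.
ok <- complete.cases(test)

which(!ok)  # rows 1 and 3 would be skipped by a predictor that drops such rows
```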

thanks,

Clayton