Re: [R-sig-eco] Getting output from predict.randomForest

Gavin Simpson Sat, 27 Sep 2008 08:56:41 -0700

On Fri, 2008-09-26 at 10:58 -0400, [EMAIL PROTECTED]
wrote:
> I have been trying to use randomForest and specifically predict for
> randomForest as follows:
> 
> for (y in 7:42){
>   data1 <- indata[c(1:5,y)]
>   test1 <- test[c(1:5,y)])]
>   data1 <- na.omit(data1))]
>   test1 <- na.omit(test1))]
>   set.seed(1234)
>   tree=randomForest(x=data1[,2:5], y=data1[,6], ntree=1000, mtry=3,
>      importance=TRUE, keep.forest=TRUE)
>      summary(tree)
>      print(tree)
>      tree.predict <- predict(tree, test1[,2:6], type="response",
> nodes=TRUE)
>      table(observed = test1, predicted = tree.predict)
>      varUsed(tree, count=TRUE)
> }
> 
> The data set, data1, has the following form, with ERClass and ChanClass
> being factors:
> 
>   FieldNum ERClass ChanClass DrainageArea PctFines Clinger
> 1  04LM099       5         1   10.2791962 0.000000      10
> 2  04LM127       5         1   44.9838181 0.000000      10
> 3  96SC002       3         1  668.9939004 0.000000      29
> 4  96SC037       3         1  241.9048792 0.000000      23
> 5  97LS051       3         1  342.3964136 0.000000      17
> .
> .
> .
> 
> In this example, FieldNum is a sample identifier that is not used in the
> analysis, Clinger is the dependent variable.  The other variables are
> the independent variables.  The data set, test1, is a subset of 12
> samples that were removed from data1 prior to the analysis with the same
> variables.
> 
> What I would like is to get a prediction of the characteristics (i.e.,
> something like ERClass = 3, ChanClass = 2 or 3, DrainageArea > 400,
> PctFines < 10 - although I have not found an example for a similar
> problem, so I am not sure what it will look like exactly) of the end
> nodes where the majority of the trees place each of these 12 samples.
> 
> However, the output I am currently getting is:
> 
> Call:
>  randomForest(x = data1[, 2:5], y = data1[, 6], ntree = 1000,
>      mtry = 3, importance = TRUE, keep.forest = TRUE)
>                Type of random forest: regression
>                      Number of trees: 1000
> No. of variables tried at each split: 3
> 
>           Mean of squared residuals: 17.6679
>                     % Var explained: 49.65
> Error in predict.randomForest(tree, test1[, 1:6], type = "response",
>     nodes = TRUE) :
>   Type of predictors in new data do not match that of the training data.
> 
> Clearly, something is wrong with my predict statement, but what?  Do I
> need to re-identify which variables are x and which variable is y?  If
> so, how?  Also, am I going to get the result I am looking for?  If not,
> how do I need to write this to get that?  The help pages I have found
> have been very inadequate.


Do str(indata) and str(test) report the same types for each variable?
If any of the variables used are factors, do those factors have the
same levels in indata and test?
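
A minimal sketch of that check, using toy stand-ins for indata and test
(the values are made up; only the object and column names come from the
post). It compares column classes and factor levels, then re-declares
the test factors with the training levels, which is one possible way to
line them up:

```r
# Toy stand-ins for indata and test (made-up values)
indata <- data.frame(ERClass      = factor(c(3, 5, 3, 5)),
                     ChanClass    = factor(c(1, 1, 2, 1)),
                     DrainageArea = c(10.3, 45.0, 669.0, 241.9))
test   <- data.frame(ERClass      = factor(c(3, 3)),
                     ChanClass    = factor(c(1, 2)),
                     DrainageArea = c(342.4, 120.0))

# Class of each column in the two data frames, side by side
col_classes <- data.frame(indata = sapply(indata, function(x) class(x)[1]),
                          test   = sapply(test,   function(x) class(x)[1]))
print(col_classes)

# For every factor column, are the level sets identical?
factor_cols  <- names(indata)[sapply(indata, is.factor)]
levels_match <- sapply(factor_cols, function(v)
  identical(levels(indata[[v]]), levels(test[[v]])))
print(levels_match)

# Re-declare the test factors using the levels of the training data
for (v in factor_cols) {
  test[[v]] <- factor(test[[v]], levels = levels(indata[[v]]))
}
```

In this toy case ERClass has the same class in both objects but a
different set of levels, which is the kind of mismatch to look for.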

I'd probably do this differently: store the test and training data in
the same data frame to start with, and then split it at random into
training and test set objects (or just index the main object, depending
on whether I want the training or test rows).

This way, the variables will have the same type/format/structure, as
they came from the same data frame to begin with.
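
Something along these lines, as a sketch only: alldata, train and
test_rows are names I have made up, and the data are random toy values.
The point is that row-subsetting a data frame keeps the column types
and the full set of factor levels:

```r
set.seed(1234)  # so the random split is reproducible

# Toy combined data frame holding all samples (made-up values)
alldata <- data.frame(ERClass      = factor(sample(c(3, 5), 30, replace = TRUE)),
                      DrainageArea = runif(30, 10, 700),
                      Clinger      = rpois(30, 15))

# Hold out 12 rows at random as the test set, as in the original post
test_rows <- sample(nrow(alldata), 12)
train <- alldata[-test_rows, ]
test  <- alldata[test_rows, ]

# Both pieces inherit their structure from alldata
str(train)
str(test)
```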

Also, I really don't follow your loop code. You seem to be indexing
indata without reference to columns/rows in the first line within the
loop. There also seem to be several syntax errors - too many "]"?

So start simple: set y <- 7 and perform the first run of the loop "by
hand"; once that works, run the loop in full.
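
For illustration, a first run "by hand" might look like the sketch
below. The data are random toy values laid out roughly like the post
(identifier in column 1, predictors in columns 2 to 5, a response in
column 7); alldata, Unused, fit and pred are names I have made up, and
I have left out nodes=TRUE to keep the first run as simple as possible:

```r
library(randomForest)

set.seed(1234)
n <- 60

# Toy data (made-up values); column 6 is a placeholder, since the post
# does not say what it holds
alldata <- data.frame(
  FieldNum     = sprintf("S%03d", seq_len(n)),
  ERClass      = factor(sample(c(3, 5), n, replace = TRUE)),
  ChanClass    = factor(sample(1:3, n, replace = TRUE)),
  DrainageArea = runif(n, 10, 700),
  PctFines     = runif(n, 0, 30),
  Unused       = NA_real_,
  Clinger      = rpois(n, 15)
)

# Split training and test rows from the same data frame
test_rows <- sample(n, 12)
indata <- alldata[-test_rows, ]
test   <- alldata[test_rows, ]

# First iteration of the loop, done by hand
y <- 7
data1 <- na.omit(indata[, c(1:5, y)])
test1 <- na.omit(test[, c(1:5, y)])

fit <- randomForest(x = data1[, 2:5], y = data1[, 6], ntree = 1000,
                    mtry = 3, importance = TRUE, keep.forest = TRUE)
print(fit)

# Predict from the same predictor columns that were used for fitting
pred <- predict(fit, newdata = test1[, 2:5], type = "response")
print(data.frame(FieldNum  = test1$FieldNum,
                 observed  = test1[, 6],
                 predicted = pred))
```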

G

> 
> Thanks for your help.
> 
> Michael
> 
> Michael B. Griffith, Ph.D.
> Research Ecologist
> 
> USEPA, NCEA (MS A-110)
> 26 W. Martin Luther King Dr.
> Cincinnati, OH  45268
> 
> telephone:  513 569-7034
> e-mail:  [EMAIL PROTECTED]
> 
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

