Hi all (and Andy!),

When I run randomForest in R with do.trace=T, the last part of the output looks like this:

    1993 |  0.04606   130.43 |
    1994 |  0.04605   130.40 |
    1995 |  0.04605   130.43 |
    1996 |  0.04605   130.43 |
    1997 |  0.04606   130.44 |
    1998 |  0.04607   130.47 |
    1999 |  0.04606   130.46 |
    2000 |  0.04605   130.42 |

The first column is the iteration, the second column is the OOB MSE, and the last column is the %Var(y).

If I calculate a "pseudo-R^2" from these numbers, I get:

    1 - (0.04605/1.3042) = 0.965

Here's the question: if I look at the summary of forest.rf (this same run), I get the following:

    randomForest(formula = Prev ~ ., data = plas, ntree = 2000, importance = TRUE, do.trace = T)
                   Type of random forest: regression
                         Number of trees: 2000
    No. of variables tried at each split: 5

              Mean of squared residuals: 0.04605177
                        % Var explained: -30.42

What does that -30.42 % Var explained relate to? I find it interesting that the %Var(y) is 130.42 and the % Var explained is a very similar number, but I have no idea how they are related. From my calculations, it seems like I have a good predictor set (pseudo-R^2 over 95%), but am I missing something?

Cheers,
Ryan

--
Ryan Harrigan, Ph.D.
Center for Tropical Research
Institute of the Environment
University of California, Los Angeles
La Kretz Hall, Suite 300
Box 951496
Los Angeles, CA 90095-1496
203-804-9505
ilu...@ucla.edu
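
P.S. For anyone who wants to check the arithmetic, here is a minimal sketch in base R of the two ways I can read the third column. The variable names are mine, and reading B (that %Var(y) is 100 * MSE / Var(y)) is only an assumption on my part; nothing in the output above states it.

```r
# Values copied from the final line of the do.trace output (tree 2000).
oob_mse     <- 0.04605   # second column: OOB MSE
pct_var_col <- 130.42    # third column, labelled %Var(y)

# Reading A (the one used in the post): treat 1.3042 as Var(y) itself.
pseudo_r2_a <- 1 - oob_mse / (pct_var_col / 100)

# Reading B (an assumption): the column is 100 * MSE / Var(y),
# so Var(y) is implied by the two columns together.
implied_var_y <- oob_mse / (pct_var_col / 100)
pseudo_r2_b   <- 1 - pct_var_col / 100

print(round(pseudo_r2_a, 3))        # 0.965
print(round(100 * pseudo_r2_b, 2))  # -30.42
print(round(implied_var_y, 4))      # 0.0353
```

Under reading B the result matches the -30.42 in the summary, which would mean the OOB MSE is larger than the variance of Prev. I would appreciate confirmation of which reading is the right one.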