Re: [R-sig-Geo] Prediction variance (map) for predictions derived using RandomForest package

Tomislav Hengl Mon, 24 Jun 2013 09:46:12 -0700


Dear Forrest,

Thanks a lot for your tip. I think quantregForest is what we were looking for. It takes much more time to compute, but the method looks sound (http://jmlr.org/papers/volume7/meinshausen06a/meinshausen06a.pdf). In the end I simplify everything and assume that I can derive the upper and lower confidence limits for +/- 1 s.d. (quantiles 0.15866 and 1-0.15866) and then use this interval as the prediction variance, but this is probably as good as it goes. Here is the revised code:


https://code.google.com/p/gsif/source/browse/trunk/meuse/RK_vs_RandomForestK.R
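The +/- 1 s.d. shortcut works because, for a normal distribution, the 0.15866 and 1-0.15866 quantiles lie one standard deviation below and above the mean (pnorm(-1) is about 0.15866). A minimal base-R sketch of the conversion, assuming the predictive distribution at each pixel is roughly normal; the helper name `quantiles_to_var` is hypothetical:

```r
# Probability level of a -1 s.d. deviation under a standard normal (~0.15866)
p_lo <- pnorm(-1)
p_hi <- 1 - p_lo

# Hypothetical helper: convert lower/upper quantile predictions into an
# approximate prediction variance, assuming a normal predictive distribution.
# Half the distance between the two quantiles approximates one s.d.
quantiles_to_var <- function(q_lo, q_hi) {
  sd_approx <- (q_hi - q_lo) / 2
  sd_approx^2
}

# Sanity check on a known normal distribution (mean 2, s.d. 0.5)
q_lo <- qnorm(p_lo, mean = 2, sd = 0.5)
q_hi <- qnorm(p_hi, mean = 2, sd = 0.5)
quantiles_to_var(q_lo, q_hi)  # approx. 0.25, i.e. 0.5^2
```

If the per-pixel distribution is skewed, the two tails differ and half the interval width is only a rough proxy for the s.d.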

Thank you all for your suggestions / opinions (very useful as usual).

cheers,

T. (Tom) Hengl
Url: http://www.wageningenur.nl/en/Persons/dr.-T-Tom-Hengl.htm
Network: http://profiles.google.com/tom.hengl
Publications: http://scholar.google.com/citations?user=2oYU7S8AAAAJ


On 23/06/2013 15:08, Forrest Stevens wrote:

Hi Tom, I've done something similar in the past to visualize the
distribution of the predictions attained for each observation across
the many trees within a random forest while looking at various aspects
of those ranges and correlating that with cross-validated prediction
errors.  It's relatively easy to generate and keep the predictions for
every tree for each observation (pixel in your case) using the
predict.all=TRUE argument:

predictions <- predict(random_forest, newdata=x_data_new, predict.all=TRUE)

Then to extract all of the individual trees' predictions for the first
observation (one row of the observations x trees matrix):

predictions$individual[1, ]

You can do this to get the mean, SD and CV for each observation (note that the
mean should match the value in predictions$aggregate):

y_data$rf_mean <- apply(predictions$individual, MARGIN=1, mean)
y_data$rf_sd <- apply(predictions$individual, MARGIN=1, sd)
y_data$rf_cv <- y_data$rf_sd / y_data$rf_mean  # coefficient of variation = SD / mean
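Since predictions$individual is a matrix with one row per observation and one column per tree, the same summaries (plus the range, mentioned below) can be wrapped in one helper. A base-R sketch; the function name `tree_spread` is hypothetical, and the small toy matrix merely stands in for predictions$individual:

```r
# Hypothetical helper: per-observation summaries of individual tree
# predictions (rows = observations, columns = trees), e.g. the
# `individual` element returned by predict(..., predict.all = TRUE).
tree_spread <- function(ind) {
  m <- rowMeans(ind)
  s <- apply(ind, MARGIN = 1, sd)
  data.frame(
    mean  = m,
    sd    = s,
    cv    = s / m,                                      # coefficient of variation
    range = apply(ind, MARGIN = 1, function(v) diff(range(v)))
  )
}

# Toy stand-in for predictions$individual: 2 observations x 3 trees
ind <- matrix(c(1, 2, 3,
                2, 4, 6), nrow = 2, byrow = TRUE)
tree_spread(ind)
# row 1: mean 2, sd 1, cv 0.5, range 2
# row 2: mean 4, sd 2, cv 0.5, range 4
```

On real output the call would be tree_spread(predictions$individual).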


In practice I've found during testing that the distribution of values
(assuming the continuous regression case, since you're looking at SD in
the first place) is highly skewed. The range, SD, CV and other
measures of the distribution of the individual trees' predictions do not
correlate well at all with prediction errors in my work. It kind of makes
intuitive sense: the power of the random forest algorithm lies in the
ensemble nature of the technique and in the randomness injected via
variable sampling at each node, and the measures of variation in the
predictions that I've looked at quickly become irrelevant as you scale
up the number of trees in the forest. So your mileage may vary, but
I'd be interested to know what you find.
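One way to check, as Forrest describes, whether the spread across trees tracks the actual errors is a rank correlation between a per-observation spread measure and the absolute cross-validation error. Spearman's rho is less sensitive to the skewed distributions mentioned above than Pearson's. A minimal sketch; `spread_error_cor` and the toy vectors are hypothetical, and how the cross-validated predictions are obtained is left to whatever CV scheme you use:

```r
# Hypothetical helper: Spearman correlation between a per-observation
# spread measure (e.g. SD of the tree predictions) and the absolute
# cross-validation error at the same observations.
spread_error_cor <- function(spread, observed, cv_predicted) {
  abs_err <- abs(observed - cv_predicted)
  cor(spread, abs_err, method = "spearman")
}

# Toy example in which a larger spread goes with a larger error
spread   <- c(0.1, 0.3, 0.2, 0.5)
observed <- c(2.0, 2.5, 1.8, 3.0)
cv_pred  <- c(2.1, 2.2, 2.0, 2.4)
spread_error_cor(spread, observed, cv_pred)  # 1 (perfect rank agreement)
```

A value near 0 on real data would match Forrest's experience that the spread says little about the error.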

You may also want to look at the excellent quantregForest package as
it produces a randomForest object but also produces information on the
quantiles and quantile range for each observation's prediction for
you, including some nice plots that I've found useful.
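A sketch of asking quantregForest for the quantiles at the +/- 1 s.d. levels, using the built-in mtcars data rather than meuse. It assumes a recent quantregForest release (1.3 or later), where predict() takes the quantile levels through the `what` argument; older releases used a different argument name. The block is guarded so it only runs when the package is installed:

```r
# Runs only if the CRAN package quantregForest is installed.
if (requireNamespace("quantregForest", quietly = TRUE)) {
  set.seed(1)
  x <- mtcars[, c("wt", "hp")]   # predictors (toy data, not meuse)
  y <- mtcars$mpg                # response
  qrf <- quantregForest::quantregForest(x = x, y = y)

  # Quantile levels at -1 s.d., median and +1 s.d. of a normal
  probs <- c(pnorm(-1), 0.5, 1 - pnorm(-1))
  q <- predict(qrf, newdata = x, what = probs)  # one column per level

  # Approximate per-observation s.d. from the outer quantiles
  sd_approx <- (q[, 3] - q[, 1]) / 2
  head(q)
}
```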

Sincerely,
Forrest

On Sun, Jun 23, 2013 at 5:51 AM, Tomislav Hengl
<[email protected]> wrote:


Dear list,

I have a question about the randomForest models. I'm trying to figure out a
way to estimate the prediction variance (spatially) for the randomForest
function (http://cran.r-project.org/web/packages/randomForest/).

If I fit a GLM, I can derive the prediction variance using:

library(sp)  # provides demo(meuse), over() and the Spatial* classes
demo(meuse, echo=FALSE)
meuse.ov <- over(meuse, meuse.grid)
meuse.ov <- cbind(meuse.ov, meuse@data)
omm0 <- glm(log1p(om)~dist+ffreq, meuse.ov, family=gaussian())
om.glm <- predict.glm(omm0, meuse.grid, se.fit=TRUE)
str(om.glm)

List of 3
  $ fit           : Named num [1:3103] 2.34 2.34 2.32 2.29 2.34 ...
   ..- attr(*, "names")= chr [1:3103] "1" "2" "3" "4" ...
  $ se.fit        : Named num [1:3103] 0.0491 0.0491 0.0481 0.046 0.0491 ...
   ..- attr(*, "names")= chr [1:3103] "1" "2" "3" "4" ...
  $ residual.scale: num 0.357
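Note that se.fit describes only the uncertainty of the fitted mean; the variance of a prediction for a new observation also adds the residual variance, residual.scale^2 (for the meuse example, om.glm$se.fit^2 + om.glm$residual.scale^2 on the log1p scale). A base-R sketch on the built-in cars data (not meuse), cross-checked against the prediction interval from predict.lm:

```r
# Gaussian GLM on a built-in dataset (stand-in for the meuse example)
fit_glm <- glm(dist ~ speed, data = cars, family = gaussian())
new <- data.frame(speed = c(10, 20))

p <- predict(fit_glm, newdata = new, se.fit = TRUE)
# Prediction variance = variance of the fitted mean + residual variance
pred_var <- p$se.fit^2 + p$residual.scale^2

# Cross-check: same model with lm() and a 95% prediction interval
fit_lm <- lm(dist ~ speed, data = cars)
pint <- predict(fit_lm, newdata = new, interval = "prediction")
half_width <- (pint[, "upr"] - pint[, "lwr"]) / 2
all.equal(unname(half_width),
          unname(qt(0.975, df.residual(fit_lm)) * sqrt(pred_var)))  # TRUE
```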

When I fit a randomForest model, I do not get any estimate of the model
uncertainty (for each pixel) but just the predictions:

library(randomForest)
meuse.ov <- meuse.ov[-omm0$na.action,]  # drop the rows with missing values
x <- randomForest(log1p(om)~dist+ffreq, meuse.ov)
om.rf <- predict(x, meuse.grid)
str(om.rf)

  Named num [1:3103] 2.49 2.49 2.51 2.44 2.49 ...
  - attr(*, "names")= chr [1:3103] "1" "2" "3" "4" ...

Does anyone have an idea how to map the prediction variance (i.e. estimated
or propagated error) for the randomForest models spatially?

I've tried deriving a propagated error for the randomForest models (every
fit gives a different model because of the random component):

l.rfk <- data.frame(om_1 = rep(NA, nrow(meuse.grid)))
for(i in 1:50){
  suppressWarnings(suppressMessages(x <- randomForest(log1p(om)~dist+ffreq, meuse.ov)))
  l.rfk[,paste("om",i,sep="_")] <- predict(x, meuse.grid)
} ## takes ca 1 minute

## om.rfk (the randomForest-kriging object) is created in the complete script linked below
meuse.grid$om.rfkvar <- om.rfk@predicted$var1.var + apply(l.rfk, 1, var)


but the prediction variance I get is rather small (much smaller than e.g.
the GLM variance). Here is the complete code with some plots:

R code:
https://code.google.com/p/gsif/source/browse/trunk/meuse/RK_vs_RandomForestK.R

Predictions UK vs randomForest-kriging:
https://gsif.googlecode.com/svn/trunk/meuse/Fig_meuse_RK_vs_RFK.png
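A possible reason the variance across the 50 refits is small: with the training data held fixed, each tree depends only on its own independent random draws (bootstrap sample, variables tried at each split), so the variance of the forest average across refits is about the per-tree variance divided by ntree (500 by default in randomForest). It measures the Monte Carlo noise of the ensemble rather than the prediction error. A base-R simulation sketch; the normal "tree" predictions and `tree_sd` are hypothetical stand-ins for real trees:

```r
set.seed(42)
tree_sd <- 0.3  # hypothetical spread of single-tree predictions at one pixel

# Variance, across n_refits refits, of a forest that averages ntree trees.
# Trees are simulated as i.i.d. draws around a fixed value, mimicking
# refits on the same training data.
refit_var <- function(ntree, n_refits = 50) {
  forest_means <- replicate(n_refits, mean(rnorm(ntree, mean = 2, sd = tree_sd)))
  var(forest_means)
}

v5   <- refit_var(5)
v500 <- refit_var(500)
# Compare the simulated variances with the theoretical tree_sd^2 / ntree
c(v5 = v5, theory_v5 = tree_sd^2 / 5, v500 = v500, theory_v500 = tree_sd^2 / 500)
```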

thanx,

T. (Tom) Hengl

_______________________________________________
R-sig-Geo mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

