Dear Philipp,

I cannot see anything wrong in your approach. The only problem is that RDA may 
not perform very well. It is very common for the calibrated values to fall 
outside the original range of the data, and if the predictions are poor, they 
can also be "unrealistic" values. RDA has no idea what is realistic for the 
given data. 

You get cross-validated values outside the data range even in the Dune meadow 
data you used. Here is an example that applies 4-fold cross-validation (in line 
with the 3/4 subsetting in your example):

library(vegan)
data(dune, dune.env)
## assign each site to one of four folds
vec <- rep(1:4, length.out = nrow(dune))
pred <- rep(NA, length.out = nrow(dune))
set.seed(4711)
take <- sample(vec)  # randomise fold membership
## fit the model on three folds, calibrate the held-out fold
for (i in 1:4)
    pred[take == i] <- calibrate(rda(dune ~ A1, dune.env, subset = take != i),
                                 newdata = subset(dune, take == i))
range(pred)  
## [1]  0.9104486 12.3583826
with(dune.env, range(A1))
## [1]  2.8 11.5

The predicted values extend nearly 2 cm below and about 1 cm above the range of 
the data values. In this case the predicted values were not negative, but that 
could happen, too. The tool (RDA) may not work very well in all cases, but the 
RMSE will tell you how good RDA is for the job. If it is bad, do not use it. It 
seems RDA is very poor in this dune example: the RMSE is larger than the 
standard deviation of A1 (but you could almost guess this from the low 
constrained eigenvalue of the model).
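
To put numbers on that claim, here is a minimal sketch continuing from the 
'pred' vector above (the exact values depend on the random fold assignment):

err <- pred - dune.env$A1
sqrt(mean(err^2))       ## cross-validated RMSE
with(dune.env, sd(A1))  ## standard deviation of A1 for comparison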

The code above uses the 'subset' argument of vegan::rda and the subset() 
function of R to make the code a bit easier to write.
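
If you prefer explicit indexing, the loop above is roughly equivalent to this 
sketch:

## the same cross-validation loop without the 'subset' shortcuts
for (i in 1:4) {
    m <- rda(dune[take != i, ] ~ A1, data = dune.env[take != i, ])
    pred[take == i] <- calibrate(m, newdata = dune[take == i, ])
}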

Cheers, Jari Oksanen


On 01/11/2013, at 15:53, Philipp01 wrote:

> Hello all,
> 
> I would like to cross-validate my rda()-derived model with the calibrate()
> function (vegan package) and calculate the RMSE as a performance measure. 
> For simplicity I use the example from the predict.cca {vegan} help.
> 
> library(ade4)
> library(vegan)
> 
> data(dune)
> data(dune.env)
> 
> nbr <- as.numeric(rownames(dune))
> 
> library(caret)
> inTrain <- createDataPartition(y=nbr, p=3/4, list=F, times=1)
> 
> train.dune <- dune[inTrain[,1],]
> test.dune  <- dune[-inTrain[,1],]
> 
> train.dune.env <- dune.env[inTrain[,1],]
> test.dune.env  <- dune.env[-inTrain[,1],]
> 
> 
> mod  <- rda(train.dune ~ A1, train.dune.env)
> cal <- calibrate(mod, newdata=test.dune)
> 
> with(test.dune.env, plot(A1, cal[,"A1"] - A1, ylab="Prediction Error"))
> abline(h=0)
> 
> error <- cal - test.dune.env$A1
> (rmse <- sqrt(mean(error^2)))
> 
> When I apply this code snippet to my own data I get positive and
> negative "cal" values, which would be unrealistic for parameters such as
> tree height. Therefore, I doubt that my approach is correct. How do
> you compute the RMSE for the rda()-derived model?
> 
> Regards, Philipp 
> 

_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology