Re: [R-sig-eco] Cross validate model with calibrate

2013-11-02 Thread Jari Oksanen
Dear Philipp,

I cannot see anything wrong in your approach. The only problem is that RDA may 
not do very well. It is very common that the calibrated values are outside the 
original range of the data and if the predictions are not very good, they can 
also be unrealistic values. RDA has no idea what is realistic for given data. 

You get cross-validated values outside data range even in the Dune meadow data 
you used. Here an example that applies 4-fold cross validation (in line with 
3/4 subsetting in your example):

library(vegan)
data(dune, dune.env)
vec - rep(1:4, length=nrow(dune))
pred - rep(NA, length=nrow(dune))
set.seed(4711)
take - sample(vec)
for(i in 1:4) pred[take==i] - calibrate(rda(dune ~ A1, dune.env, subset = take 
!= i), newdata = subset(dune, take==i))
range(pred)  
## [1]  0.9104486 12.3583826
with(dune.env, range(A1))
## [1]  2.8 11.5

The predicted values extend nearly 2cm below and 1cm above the data values. In 
this case the predicted values were not negative, but that could happen, too. 
The tool (RDA) may not work very well in all cases, but RMSE will tell you how 
good RDA is for the job. If it is bad, do not use it. It seems RDA is very poor 
in this dune example: RMSE is larger than standard deviation of A1 (but you 
could almost guess this by looking at the low constrained eigenvalues of the 
model).

The code above uses the 'subset' argument of vegan::rda and the subset() 
function of R to make the code a bit easier to write.

Cheers, Jari Oksanen


On 01/11/2013, at 15:53 PM, Philipp01 wrote:

 Hello all,
 
 I would like to cross validate my rda() derived model with the calibrate
 function (vegan package) and calculate the RMSE as value for performance
 measure. 
 For simplicity I use the example from the predict.cca {vegan} help.
 
 library(ade4)
 library(vegan)
 
 data(dune)
 data(dune.env)
 
 nbr - as.numeric(rownames(dune))
 
 library(caret)
 inTrain - createDataPartition(y=nbr, p=3/4, list=F, times=1)
 
 train.dune - dune[inTrain[,i],];
 test.dune  - dune[-inTrain[,i],];
 
 train.dune.env - dune.env[inTrain[,i],];
 test.dune.env  - dune.env[-inTrain[,i],];
 
 
 mod  - rda(train.dune ~ A1, train.dune.env)
 cal - calibrate(mod, newdata=test.dune)
 
 with(test.dune.env, plot(A1, cal[,A1] - A1, ylab=Prediction Error))
 abline(h=0)
 
 error - cal - test.dune.env$A1
 (rmse - sqrt(mean(error^2)))
 
 When I apply this code snippet to my very own data I get positive and
 negative cal values, which would be unrealistic for parameters such as
 tree height (etc.). Therefore, I doubt that my approach is correct. How do
 you compute the RMSE for the rda() derived model?
 
 Regards, Philipp 
 
 
 
 
 --
 View this message in context: 
 http://r-sig-ecology.471788.n2.nabble.com/Cross-validate-model-with-calibrate-tp7578486.html
 Sent from the r-sig-ecology mailing list archive at Nabble.com.
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


[R-sig-eco] Cross validate model with calibrate

2013-11-01 Thread Philipp01
Hello all,

I would like to cross validate my rda() derived model with the calibrate
function (vegan package) and calculate the RMSE as value for performance
measure. 
For simplicity I use the example from the predict.cca {vegan} help.

library(ade4)
library(vegan)

data(dune)
data(dune.env)

nbr - as.numeric(rownames(dune))

library(caret)
inTrain - createDataPartition(y=nbr, p=3/4, list=F, times=1)

train.dune - dune[inTrain[,i],];
test.dune  - dune[-inTrain[,i],];
  
train.dune.env - dune.env[inTrain[,i],];
test.dune.env  - dune.env[-inTrain[,i],];


mod  - rda(train.dune ~ A1, train.dune.env)
cal - calibrate(mod, newdata=test.dune)

with(test.dune.env, plot(A1, cal[,A1] - A1, ylab=Prediction Error))
abline(h=0)

error - cal - test.dune.env$A1
(rmse - sqrt(mean(error^2)))

When I apply this code snippet to my very own data I get positive and
negative cal values, which would be unrealistic for parameters such as
tree height (etc.). Therefore, I doubt that my approach is correct. How do
you compute the RMSE for the rda() derived model?

Regards, Philipp 




--
View this message in context: 
http://r-sig-ecology.471788.n2.nabble.com/Cross-validate-model-with-calibrate-tp7578486.html
Sent from the r-sig-ecology mailing list archive at Nabble.com.

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology