Re: [R] help: pls package
wu sz writes:

  trainSet = as.data.frame(scale(trainSet, center = T, scale = T))
  trainSet.plsr = mvr(formula, ncomp = 14, data = trainSet,
                      method = "kernelpls", CV = TRUE, validation = "LOO",
                      model = TRUE, x = TRUE, y = TRUE)

[Two side notes here:

1) Scaling of the data (with its sd) should be performed inside the
cross-validation. In the current version of 'pls', one can use

  cvplsr <- crossval(plsr(y ~ scale(X), ncomp = 14, data = mydata),
                     length.seg = 1)

(However, 'crossval' is slower than the built-in cross-validation in
'mvr'/'plsr'. In the development version of the package, scaling within the
cross-validation has been implemented in the built-in cross-validation. This
will hopefully be published shortly.)

2) The 'CV' argument is from the earlier 'pls.pcr' package, and is no longer
used. It is silently ignored.]

  i = 1; msep_element = c()
  while(i <= length(p)){
    msep_element[,i] = (p[i]-y)^2
    i = i + 1
  }

Hmm... I don't see how you got that code to run. (msep_element is initialised
as an empty vector, so the matrix indexing msep_element[,i] fails.) This
should work, though:

  msep_element <- (p - y)^2
  msep <- colMeans(msep_element)
  msep_sd <- sd(msep_element)

You will get much closer to the true value with

  sd(msep_element) / sqrt(length(y))

However, this will not produce an unbiased estimate of the sd of the
estimated MSEP, because it ignores the dependencies between the residuals.
E.g., the residual when sample 1 is predicted is not independent of the
residual when sample 2 is predicted. In general, I think, it will produce
underestimated sds. The effect should be largest for small data sets. This is
the reason the pls package currently doesn't estimate the se of
cross-validated MSEPs.

There is also the question of what the estimate should be conditioned on:
for leave-one-out cross-validation, sd(MSEP | trainData) = 0.

[If someone knows how to calculate unbiased estimates of the sd of
cross-validated MSEPs, please let me know. :-)]

--
Bjørn-Helge Mevik

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
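[Editor's sketch: spelled out, the corrected computation from the reply looks
like this on simulated stand-in data. The variable names and the simulated
data set are illustrative only, not from the original post; it assumes the
'pls' package is installed and that $validation$pred has its usual
n x 1 x ncomp shape.]

```r
library(pls)

set.seed(1)
## Simulated stand-in for the poster's data: 1200 observations,
## 14 predictors and one response
n <- 1200
X <- matrix(rnorm(n * 14), ncol = 14)
y <- drop(X %*% rnorm(14)) + rnorm(n)
d <- data.frame(y = y, X = I(X))

fit <- plsr(y ~ X, ncomp = 14, data = d, validation = "LOO")

## Cross-validated predictions come back as an n x 1 x ncomp array
p <- fit$validation$pred[, 1, ]                   # n x ncomp matrix

msep_element <- (p - y)^2                         # squared CV residuals
msep <- colMeans(msep_element)                    # per-component MSEP
se_naive <- apply(msep_element, 2, sd) / sqrt(n)  # naive se of the mean
```

The se_naive values are the sd(msep_element) / sqrt(length(y)) quantity from
the reply, computed per component; as the reply notes, they tend to
understate the true variability because the residuals are dependent.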
[R] help: pls package
Hello,

I have a data set with 15 variables (the first one is the response) and 1200
observations. Now I use the pls package to do the plsr with cross-validation
as below.

  trainSet = as.data.frame(scale(trainSet, center = T, scale = T))
  trainSet.plsr = mvr(formula, ncomp = 14, data = trainSet,
                      method = "kernelpls", CV = TRUE, validation = "LOO",
                      model = TRUE, x = TRUE, y = TRUE)

After that I wish to obtain the value of se, the estimated standard errors of
the cross-validation estimates, mentioned in the help for the MSEP function
but not implemented yet, so I wrote the program below to calculate it myself.
The results I got do not seem right, and I wonder which step is wrong.

  y = trainSet.plsr$y
  p = as.data.frame(trainSet.plsr$validation$pred)
  i = 1; msep_element = c()
  while(i <= length(p)){
    msep_element[,i] = (p[i]-y)^2
    i = i + 1
  }
  msep = colMeans(msep_element)
  msep_sd = sd(msep_element)

Then I compared msep with trainSet.plsr$validation$MSEP, and they are the
same, but the values of msep_sd seem much larger than I expected. Is it the
same as se? If not, how to calculate the se of cross-validation?

Thank you,
Shengzhe
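[Editor's sketch: to see why the naive estimate understates the variability,
one can compare it with the spread of the CV MSEP over repeated simulated
data sets. This is only a rough Monte Carlo illustration of the claim in the
reply above; the sample size, number of components, and replication count are
arbitrary, and it assumes the 'pls' package.]

```r
library(pls)

set.seed(2)
n <- 30; k <- 5; nc <- 3; reps <- 200
msep_cv <- naive_se <- numeric(reps)

for (r in 1:reps) {
  X <- matrix(rnorm(n * k), ncol = k)
  y <- drop(X %*% rnorm(k)) + rnorm(n)
  d <- data.frame(y = y, X = I(X))
  fit <- plsr(y ~ X, ncomp = nc, data = d, validation = "LOO")
  res2 <- (fit$validation$pred[, 1, nc] - y)^2  # squared LOO residuals
  msep_cv[r]  <- mean(res2)                     # CV MSEP at nc components
  naive_se[r] <- sd(res2) / sqrt(n)             # naive se, ignoring dependence
}

## Empirical sd of MSEP across data sets vs. the average naive se;
## the reply suggests the former will typically be the larger of the two
sd(msep_cv)
mean(naive_se)
```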