Re: [R] help: pls package

2005-07-22 Thread Bjørn-Helge Mevik
wu sz writes:

 trainSet = as.data.frame(scale(trainSet, center = T, scale = T))
 trainSet.plsr = mvr(formula, ncomp = 14, data = trainSet,
     method = "kernelpls", CV = TRUE, validation = "LOO",
     model = TRUE, x = TRUE, y = TRUE)

[Two side notes here:
 1) scaling of the data (with its sd) should be performed inside the
 cross-validation.  In the current version of 'pls', one can use
 cvplsr <- crossval(plsr(y ~ scale(X), ncomp = 14, data = mydata),
                    length.seg = 1)
 (However, 'crossval' is slower than the built-in cross-validation on
 'mvr'/'plsr'.  In the development version of the package, scaling
 within the cross-validation has been implemented in the built-in
 cross-validation.  This will hopefully be published shortly.)

 2) The 'CV' argument is from the earlier 'pls.pcr' package, and is no
 longer used.  It is silently ignored.]
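
As a concrete illustration of note 1, here is a minimal self-contained
sketch; the data set and the names 'X', 'y' and 'mydata' are invented
for the example:

library(pls)

## Invented example data: a response y and a matrix X of 14 predictors.
set.seed(1)
X <- matrix(rnorm(100 * 14), nrow = 100, ncol = 14)
y <- drop(X %*% rnorm(14)) + rnorm(100)
mydata <- data.frame(y = y, X = I(X))

## Because scale(X) is part of the formula, the scaling is recomputed
## on the training part of each segment when crossval() refits the model.
fit <- plsr(y ~ scale(X), ncomp = 14, data = mydata)
cvplsr <- crossval(fit, length.seg = 1)  # length.seg = 1 gives leave-one-out
MSEP(cvplsr)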

 i = 1; msep_element = c()
 while(i <= length(p)){
msep_element[,i] = (p[i]-y)^2
i = i + 1
 }

Hmm...  I don't see how you got that code to run.  This should work, though:

msep_element <- (p - y)^2

 msep = colMeans(msep_element)
 msep_sd = sd(msep_element)

You will get much closer to the true value with

sd(msep_element) / sqrt(length(y))

However, this will not produce an unbiased estimate of the sd of the
estimated MSEP, because it ignores the dependencies between the
residuals.  E.g., the residual when sample 1 is predicted is not
independent of the residual when sample 2 is predicted.  In general, I
think it will produce underestimated sds.  The effect should be
largest for small data sets.
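
To make this concrete: a minimal sketch of the whole computation,
assuming 'p' holds the cross-validated predictions as an n x ncomp
matrix (one column per number of components) and 'y' is the response
vector:

msep_element <- (p - y)^2        # squared CV residuals; y is recycled
                                 # down each column
msep <- colMeans(msep_element)   # MSEP per number of components
## Naive standard errors; apply() is used because sd() is not
## column-wise on matrices in newer versions of R.  As noted above,
## these ignore the dependence between residuals, so they will tend
## to be too small:
msep_se <- apply(msep_element, 2, sd) / sqrt(length(y))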

This is the reason the pls package currently doesn't estimate the se
of cross-validated MSEPs.  There is also the question of what the
estimate should be conditioned on: for leave-one-out
cross-validation, sd(MSEP | trainData) = 0.

[If someone knows how to calculate unbiased estimates of the sd of
cross-validated MSEPs, please let me know. :-)]

-- 
Bjørn-Helge Mevik

[R] help: pls package

2005-07-21 Thread wu sz
Hello,

I have a data set with 15 variables (the first one is the response)
and 1200 observations.  Now I use the pls package to do PLSR with
cross-validation, as below.

trainSet = as.data.frame(scale(trainSet, center = T, scale = T))
trainSet.plsr = mvr(formula, ncomp = 14, data = trainSet,
                    method = "kernelpls", CV = TRUE, validation = "LOO",
                    model = TRUE, x = TRUE, y = TRUE)

After that, I wish to obtain the value of se, the estimated standard
errors of the cross-validation estimates, which is mentioned in the
documentation of MSEP but not implemented yet, so I wrote the program
below to calculate it myself.  The results I get do not seem right,
and I wonder which step below is wrong.

y = trainSet.plsr$y
p = as.data.frame(trainSet.plsr$validation$pred)

i = 1; msep_element = c()
while(i <= length(p)){
   msep_element[,i] = (p[i]-y)^2
   i = i + 1
}

msep = colMeans(msep_element)
msep_sd = sd(msep_element)

Then I compare msep with trainSet.plsr$validation$MSEP, and they are
the same, but the values of msep_sd seem much larger than I expected.
Is msep_sd the same as se?  If not, how do I calculate the se of the
cross-validation?

Thank you,
Shengzhe
