Dear Cristoph, David, Torsten and Bjørn-Helge,

I think Bjørn-Helge has made explicit what I had in mind (which is also close, I think, to what David mentioned). Moreover, at the very least, not placing the PCA inside the cross-validation will underestimate the variance of the predictions, because the PCA directions are then held fixed across folds instead of being re-estimated from each training set.
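Below is a minimal sketch of what "placing the PCA inside the cross-validation" means. It is written in Python with NumPy rather than R. In each fold the PCA is refitted on the training rows only, and the left-out rows are projected with those loadings before the LDA predicts them. The function names (fit_pca, fit_lda, predict_lda, cv_error) and the synthetic data are hypothetical. The hand-rolled LDA is a plain pooled-covariance version, not MASS::lda.

```python
import numpy as np

def fit_pca(X, k):
    """PCA via SVD: return the column means and the first k loading vectors."""
    mu = X.mean(axis=0)
    # Right singular vectors of the centred data are the principal directions.
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, vt[:k].T

def pca_transform(X, mu, loadings):
    """Project rows of X onto the PCA directions estimated elsewhere."""
    return (X - mu) @ loadings

def fit_lda(Z, y):
    """Linear discriminant analysis with a pooled within-class covariance."""
    classes = np.unique(y)
    means = np.array([Z[y == c].mean(axis=0) for c in classes])
    resid = Z - means[np.searchsorted(classes, y)]
    cov = resid.T @ resid / (len(y) - len(classes))
    log_priors = np.log([np.mean(y == c) for c in classes])
    return classes, means, np.linalg.inv(cov), log_priors

def predict_lda(model, Z):
    classes, means, icov, log_priors = model
    # Discriminant score: z' S^-1 m_k - 0.5 m_k' S^-1 m_k + log(pi_k)
    scores = Z @ icov @ means.T - 0.5 * np.sum(means @ icov * means, axis=1) + log_priors
    return classes[np.argmax(scores, axis=1)]

def cv_error(X, y, k, n_folds=5, pca_inside=True, seed=0):
    """k-fold CV error rate of PCA (k components) followed by LDA."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    if not pca_inside:
        # Shortcut: the PCA is fitted once, so left-out rows influence it.
        mu_all, L_all = fit_pca(X, k)
    errors = 0
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        if pca_inside:
            mu, L = fit_pca(X[train], k)  # PCA sees only the training rows
        else:
            mu, L = mu_all, L_all
        model = fit_lda(pca_transform(X[train], mu, L), y[train])
        pred = predict_lda(model, pca_transform(X[test], mu, L))
        errors += np.sum(pred != y[test])
    return errors / len(y)

# Hypothetical example: 60 samples, 200 variables, class signal in 5 of them.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 200))
y = np.repeat([0, 1], 30)
X[y == 1, :5] += 1.0
print("PCA inside CV: ", cv_error(X, y, k=3, pca_inside=True))
print("PCA outside CV:", cv_error(X, y, k=3, pca_inside=False))
```

Setting pca_inside=False reproduces the shortcut, where the PCA is computed once on all samples. Comparing the two on real data is one way to probe the concern raised above empirically, although a single split on one data set says little about bias in general.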
Best,
R.

On Thursday 25 November 2004 15:05, Bjørn-Helge Mevik wrote:
> Torsten Hothorn writes:
> > as long as one does not use the information in the response (the class
> > variable, in this case) I don't think that one ends up with an
> > optimistically biased estimate of the error
>
> I would be a little careful, though. The left-out sample in the
> LDA cross-validation will still have influenced the PCA used to build
> the LDA on the rest of the samples. The sample will tend to lie
> closer to the centre of the "complete" PCA than to that of a PCA on the
> remaining samples. Also, if the sample has high leverage on the
> PCA, the directions of the two PCAs can be quite different. Thus, the
> LDA is built on data that "fits" the left-out sample better than it
> would if the sample were genuinely new.
>
> I have no proofs or numerical studies showing that this gives
> over-optimistic error rates, but I would not recommend placing the PCA
> "outside" the cross-validation. (The same holds for any resampling-based
> validation.)

--
Ramón Díaz-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900

http://ligarto.org/rdiaz
PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc)
