Re: [R] LDA decision boundaries
2007/8/20, Dani Valverde wrote:
> Hello,
> I would like to plot the results of an LDA, showing the discriminant scores with the decision boundaries drawn on them, using rggobi. I have GGobi already installed on my computer. I have three classes, so the plot would be LD1 x LD2 plus the decision boundaries. Here is the code I use to make the plot:
>
>   library(MASS)
>   data <- zgcppr273K.pca$x[,1:7]
>   Tumor <- c(rep("MM",23),rep("GBM",25),rep("LGG",17))
>   data.lda <- lda(data,Tumor)
>   data.ld <- predict(data.lda)
>   data.ldd <- data.frame(data.ld$x,data.ld$class)
>   library(rggobi)
>   data.g <- ggobi(data.ldd)
>
> The problem is that I do not know how to plot the decision boundaries on the graph I have made. I would appreciate an answer as soon as you can, as it is a bit urgent.
> Best regards,
> Dani
> Daniel Valverde Saubí, Grup d'Aplicacions Biomèdiques de la Ressonància Magnètica Nuclear (GABRMN), Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona

Maybe your answer is in the classifly package.
Rod.

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
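For readers who only need a static picture, the boundaries can also be drawn with base graphics instead of GGobi. The following is a minimal sketch, not the poster's method: it uses the built-in iris data as a stand-in for the PCA scores, refits an LDA in the LD1 x LD2 plane (which should give the same partition when all discriminants are kept), classifies a regular grid, and traces the class changes with contour(). The grid size and the names fit2, gx and gy are illustrative choices.

```r
library(MASS)

# Stand-in data: iris replaces the poster's PCA scores and tumour labels.
fit <- lda(Species ~ ., data = iris)
scores <- as.data.frame(predict(fit)$x)      # columns LD1 and LD2
scores$class <- iris$Species

# Refit in the discriminant plane so that boundaries can be drawn there.
fit2 <- lda(class ~ LD1 + LD2, data = scores)

# Classify every point of a regular grid covering the score plot.
gx <- seq(min(scores$LD1) - 1, max(scores$LD1) + 1, length.out = 200)
gy <- seq(min(scores$LD2) - 1, max(scores$LD2) + 1, length.out = 200)
grid <- expand.grid(LD1 = gx, LD2 = gy)
grid_class <- predict(fit2, newdata = grid)$class

# Scores coloured by class, with boundaries where the grid class changes.
plot(scores$LD1, scores$LD2, col = as.integer(scores$class), pch = 19,
     xlab = "LD1", ylab = "LD2")
z <- matrix(as.integer(grid_class), nrow = length(gx))
contour(gx, gy, z, levels = c(1.5, 2.5), add = TRUE, drawlabels = FALSE)
```

The contour levels 1.5 and 2.5 sit between the integer class codes 1, 2 and 3, so a line appears wherever the predicted class switches.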
Re: [R] LDA and RDA: different training errors
dominic senn wrote:
> Hello
> I am trying to fit an LDA and an RDA model to the same data, which has two classes. The problem is that the training error of the LDA model and the training error of the RDA model with alpha=0 are not the same. In my understanding they should be. Am I wrong? Can someone explain what the reason for this difference could be?

I assume lda from MASS? If you are using rda() from package rda, I do not know, since the help page is not very specific about which parameter means what (but I guess one of them should be 1). If you choose rda() from package klaR, the help page tells you that gamma=0, lambda=1 should produce results identical to LDA (lambda=1 means that the pooled covariance matrix is weighted with 1 while the group-specific covariance matrices are weighted with 0).

Uwe Ligges

> Here is my code:
>
> LDA model:
>   # x is a data frame
>   tmp = lda(response ~ ., data=x)
>   tmp.hat = predict(tmp)
>   tab = table(x$response, tmp.hat$class)
>   lda.training.err = 1 - sum(tab[row(tab)==col(tab)])/sum(tab)
>
> RDA model:
>   # x is converted into a matrix without the response
>   # variable. This matrix is then transposed
>   tmp = rda(x, y, alpha=0, delta=0)
>   rda.training.err = tmp$error / dim(x)[2]
>
>   # The training error provided by rda.cv() is also different
>   # from the training errors provided by lda() or rda()
>   tmp.cv = rda.cv(tmp, x=x, y=y, nfold=10)
>   tmp.cv$err / dim(x)[2] / 10
>
> Thanks a lot!
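The comparison suggested in the reply can be sketched as follows, assuming the built-in iris data rather than the poster's data. The klaR part runs only if that package is installed; the settings gamma = 0, lambda = 1 are the ones named in the reply, and the variable names are illustrative.

```r
library(MASS)

# Resubstitution (training) error of LDA.
fit <- lda(Species ~ ., data = iris)
tab <- table(truth = iris$Species, predicted = predict(fit)$class)
lda_err <- 1 - sum(diag(tab)) / sum(tab)

# Same quantity for klaR::rda with the settings that should mimic LDA.
if (requireNamespace("klaR", quietly = TRUE)) {
  rfit <- klaR::rda(Species ~ ., data = iris, gamma = 0, lambda = 1)
  rtab <- table(truth = iris$Species, predicted = predict(rfit)$class)
  rda_err <- 1 - sum(diag(rtab)) / sum(rtab)
  print(c(lda = lda_err, rda = rda_err))
}
```

If the two numbers differ for a given data set, checking that both fits use the same priors and the same rows is a reasonable first step.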
Re: [R] lda and maximum likelihood
On Mon, 6 Aug 2007, [EMAIL PROTECTED] wrote:
> I am trying to compare several methods for classifying data into groups. For that purpose I would like to do model comparison and selection using AIC. In the lda function of the MASS library, the maximum likelihood of the function is not given in the output and the script is not available.

The source _is_ available: it is part of the R tarball, and in the VR bundle on CRAN.

> Does anyone know how to extract or compute the maximum likelihood used in the lda function?

It does not maximize a likelihood: what it does do is described in the book for which this is support software.

--
Brian D. Ripley, Professor of Applied Statistics, University of Oxford
http://www.stats.ox.ac.uk/~ripley/
Re: [R] LDA newbie question
I am using lda for the first time, with version 2.3.1 of R. When I ran lda I did not get "Proportion of trace" in the output. Is there another way to get this, or is there a bug in my version?

Sarah Hodgson
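The thread records no reply. As a hedged note: in current versions of MASS the print method appears to show "Proportion of trace" only when there is more than one discriminant, so a two-class fit prints none. The figures can be recomputed from the `svd` component of the fitted object, as in this sketch on the built-in iris data (the name prop_trace is illustrative):

```r
library(MASS)

fit <- lda(Species ~ ., data = iris)

# Squared singular values, normalised to sum to one.
prop_trace <- fit$svd^2 / sum(fit$svd^2)
names(prop_trace) <- colnames(fit$scaling)
round(prop_trace, 4)
```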
Re: [R] lda plotting: labeling x axis and changing y-axis scale
--- Wade Wall wrote:
> Hi all,
> I have performed an lda on two groups and have plotted it using plot(x.lda), with x.lda being my lda results. I have forgotten how to change the labels of the x-axes (they are currently listed as Group1 and Group 13), and how to rescale the y-axis to reflect frequency. If anyone knows how to do it, I would greatly appreciate the information.
> Wade

Wade,
Are you asking about a specific plotting routine in lda, or just how to use the basic plot function? If the latter, try ?plot.default for what you need.
Re: [R] lda plotting: labeling x axis and changing y-axis scale
Sorry I wasn't clearer. I believe that it was a specialized function, but it may have been plot(). What I am basically trying to do is alter the y-axis to represent frequency and change the labels on the plot of the linear discriminant analysis results. I can't seem to do this with plot(); if you know another function, that would be great. Or if you know how to alter the y-axis and label the two group names, that would be great also. I have been working at it for a while and am kicking myself for not saving the commands as a script.

Thanks,
Wade Wall

On 12/22/06, John Kane wrote:
> Are you asking about a specific plotting routine in lda, or just how to use the basic plot function? If the latter, try ?plot.default for what you need.
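One way to get frequency-scaled histograms with custom panel titles is to bypass plot() on the lda object and draw the discriminant scores with base hist(). This is only a sketch under stated assumptions: iris with two species stands in for Wade's data, and the panel titles and bin count are arbitrary illustrative choices.

```r
library(MASS)

# Two-group stand-in data.
two <- droplevels(subset(iris, Species != "setosa"))
fit <- lda(Species ~ ., data = two)
ld1 <- predict(fit)$x[, 1]                 # the single discriminant score

# Common breaks so the two panels are comparable.
breaks <- pretty(range(ld1), n = 20)

op <- par(mfrow = c(2, 1))
for (g in levels(two$Species)) {
  hist(ld1[two$Species == g], breaks = breaks, freq = TRUE,
       main = paste("Group:", g), xlab = "LD1", ylab = "Frequency")
}
par(op)
```

Setting freq = TRUE puts counts rather than densities on the y-axis, and main controls the label of each panel.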
Re: [R] lda
Pieter Vermeesch wrote:
> I'm trying to do a linear discriminant analysis on a dataset of three classes (Affinities), using the MASS library:
>
>   data.frame2 <- na.omit(data.frame1)
>   data.ld = lda(AFFINITY ~ ., data.frame2, prior = c(1,1,1)/3)
>   Error in var(x - group.means[g, ]) : missing observations in cov/cor
>
> What does this error message mean and how can I get rid of it?

What does str(data.frame2) tell us?

Uwe Ligges

> Thanks!
> Pieter
Re: [R] lda
Pieter Vermeesch wrote on Mon, 16 Oct 2006 19:15:59 +0200:
> I'm trying to do a linear discriminant analysis on a dataset of three classes (Affinities), using the MASS library:

[on "library":] No, no! MASS *package* (please!)

>   data.frame2 <- na.omit(data.frame1)
>   data.ld = lda(AFFINITY ~ ., data.frame2, prior = c(1,1,1)/3)
>   Error in var(x - group.means[g, ]) : missing observations in cov/cor
>
> What does this error message mean and how can I get rid of it?

You have (+ or -) 'Inf' data values, which na.omit() does not omit, and 'x - group.means[g, ]' contains 'Inf - Inf', which is NaN. Ideally, MASS:::lda.default() would check for such a case and give a more user-friendly error message.

> Thanks!

you're welcome.
Martin Maechler, ETH Zurich
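A small sketch of the point made above, on a made-up data frame (the column names a, b and g are hypothetical): na.omit() keeps rows that contain Inf or -Inf, so a separate is.finite() filter is needed before calling lda().

```r
# Toy data: one -Inf, one NA.
df <- data.frame(a = c(1, 2, -Inf, 4),
                 b = c(0.5, NA, 1.5, 2.5),
                 g = factor(c("x", "y", "x", "y")))

nrow(na.omit(df))      # the row with -Inf is still there

# Keep only rows whose numeric columns are all finite
# (is.finite() is FALSE for NA, NaN, Inf and -Inf).
num <- vapply(df, is.numeric, logical(1))
finite_rows <- rowSums(!is.finite(as.matrix(df[num]))) == 0
clean <- df[finite_rows, ]

Inf - Inf              # NaN, the source of the cov/cor error
```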
Re: [R] lda
Dear Martin and Uwe,

I did indeed have a few -Inf values in my data frame. Few enough that I didn't notice them when I inspected my data. Thanks a lot for helping me better understand the MASS *package* :-)

Pieter

--
Pieter Vermeesch
ETH Zürich, Isotope Geology and Mineral Resources
Re: [R] lda discriminant functions
--- Leonardo Lami wrote:
> Hi list,
> I'm looking at the lda function. I'd like to know how to calculate the value of the discriminant functions for the original data. I see that in the lda result object there is $scaling, a matrix which transforms observations to discriminant functions, normalized so that the within-groups covariance matrix is spherical. I'd like to have the value of the discriminant function not normalized.

The information you need to do this is already there, but you need to understand how to manipulate your raw data in conjunction with the lda object to calculate the result you want. If you don't understand the matrix math involved and the statistics necessary, you probably won't be successful. I suggest you read ?lda, followed by MASS, followed by Professor Ripley's Pattern Recognition and Neural Networks, followed by any good book that shows you the matrix algebra for doing the discriminant calculations. It isn't hard to do.

If you understand the math to get the individual scores, normalized or unnormalized, but just do not understand the R way of doing things, check back again.

Dr. Marc R Feldesman
Professor and Chair Emeritus, Department of Anthropology, Portland State University, Portland, OR 97207
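As a hedged illustration of the R side only: the scores returned by predict() can be reproduced from `$scaling`, `$means` and `$prior`, following what predict.lda appears to do (centre on the prior-weighted mean of the group means, then multiply by the scaling matrix). The iris data and the names centre, scores_centred and scores_raw are assumptions for this sketch, and what "not normalized" should mean is left to the references.

```r
library(MASS)

fit <- lda(Species ~ ., data = iris)
X <- as.matrix(iris[, 1:4])

# Centre used for the scores: prior-weighted mean of the group means.
centre <- colSums(fit$prior * fit$means)

# Discriminant scores as returned by predict().
scores_centred <- scale(X, center = centre, scale = FALSE) %*% fit$scaling

# Same linear combination without centring: shifted by a constant per column.
scores_raw <- X %*% fit$scaling

all.equal(scores_centred, predict(fit)$x, check.attributes = FALSE)
```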
Re: [R] lda
On Sat, 14 May 2005, K. Steinmann wrote:
> If I am right, a discriminant analysis can be done with lda. My questions are:
> 1. What method to discriminate the groups is used by lda (Fisher's linear discriminant function, diagonal linear discriminant analysis, likelihood ratio discriminant rule, ...)?

None of those, but that due to Rao, which is (up to details of weighting of the covariance matrix) what is very widely called LDA. (Many people attribute to Fisher something he did not do, at least not in the paper they cite.)

lda() (in package MASS, uncredited) is support software for a book, so please refer to the book for the details: it is in the references for the help page.

> 2. How can I see which method is used? (Typing just lda does not give me any code.)

I get

  > lda
  function (x, ...)
  UseMethod("lda")
  <environment: namespace:MASS>

which _is_ code. Please look up `generic functions', for example in `An Introduction to R'. In this case

  getS3method("lda", "default")

will show you the guts of the code, but I don't think you will be able to understand it without the references.

--
Brian D. Ripley, Professor of Applied Statistics, University of Oxford
http://www.stats.ox.ac.uk/~ripley/
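The lookup described above can be tried directly. This short sketch lists the methods registered for the lda generic and retrieves the default method as a function object; the variable name lda_default is illustrative.

```r
library(MASS)

# Methods registered for the generic lda().
methods(lda)

# The default method, which contains the actual computations.
lda_default <- getS3method("lda", "default")
head(deparse(lda_default), 5)   # first few lines of its source
```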
Re: [R] lda (MASS)
> Now, I use my real dataset (900 instances, 21 attributes), whose 2 classes can be separated with accuracy no more than 80% (10xval) with KNN, SVM, C4.5 and the like.

I think these accuracies are based on cross-validation runs, whereas the 80% accuracy you report using LDA is not based on cross-validation runs as long as CV is not set to TRUE.

> PS: and does anybody know how to use the CV option of lda to make xval? I can't get it.

  z <- lda(Sp ~ ., Iris, CV = TRUE)
  table(Iris$Sp, z$class)

cheers
christoph
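A self-contained variant of the two lines above, assuming the built-in iris data frame in place of the `Iris` object from the MASS help page. With CV = TRUE, lda() returns leave-one-out predictions (class and posterior) instead of a fitted model object.

```r
library(MASS)

# Leave-one-out cross-validated predictions.
z <- lda(Species ~ ., data = iris, CV = TRUE)

# Confusion table and cross-validated error rate.
tab <- table(truth = iris$Species, predicted = z$class)
cv_err <- 1 - sum(diag(tab)) / sum(tab)
tab
cv_err
```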
Re: [R] LDA with previous PCA for dimensionality reduction
Dear Christoph, David, Torsten and Bjørn-Helge,

I think that Bjørn-Helge has made more explicit what I had in mind (which I think is close also to what David mentioned). As well, at the very least, not placing the PCA inside the cross-validation will underestimate the variance in the predictions.

Best,
R.

--
Ramón Díaz-Uriarte
Bioinformatics Unit, Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center), Madrid (Spain)
http://ligarto.org/rdiaz
Re: [R] LDA with previous PCA for dimensionality reduction
Torsten Hothorn writes:
> as long as one does not use the information in the response (the class variable, in this case) I don't think that one ends up with an optimistically biased estimate of the error

I would be a little careful, though. The left-out sample in the LDA cross-validation will still have influenced the PCA used to build the LDA on the rest of the samples. The sample will have a tendency to lie closer to the centre of the complete PCA than of a PCA on the remaining samples. Also, if the sample has a high leverage on the PCA, the directions of the two PCAs can be quite different. Thus, the LDA is built on data that fits the left-out sample better than if the sample were a completely new sample.

I have no proofs or numerical studies showing that this gives over-optimistic error rates, but I would not recommend placing the PCA outside the cross-validation. (The same goes for any resampling-based validation.)

--
Bjørn-Helge Mevik
Re: [R] LDA with previous PCA for dimensionality reduction
Dear Christoph,

I guess you want to assess the error rate of an LDA that has been fitted to a set of currently existing training data, and that in the future you will get some new observation(s) for which you want to make a prediction. Then, I'd say that you want to use the second approach. You might find that the first step turns out to be crucial and, after all, your whole subsequent LDA is contingent on the PC scores you obtain in the previous step.

Somewhat similar issues have been discussed in the microarray literature. Two references are:

@ARTICLE{ambroise-02,
  author = {Ambroise, C. and McLachlan, G. J.},
  title = {Selection bias in gene extraction on the basis of microarray gene-expression data},
  journal = {Proc Natl Acad Sci USA},
  year = {2002}, volume = {99}, pages = {6562--6566}, number = {10},
}

@ARTICLE{simon-03,
  author = {Simon, R. and Radmacher, M. D. and Dobbin, K. and McShane, L. M.},
  title = {Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification},
  journal = {Journal of the National Cancer Institute},
  year = {2003}, volume = {95}, pages = {14--18}, number = {1},
}

I am not sure, though, why you use PCA followed by LDA. But that's another story.

Best,
R.

On Wednesday 24 November 2004 11:16, Christoph Lehmann wrote:
> Dear all, not really an R question but:
> If I want to check the classification accuracy of an LDA with previous PCA for dimensionality reduction by means of the LOOCV method:
> Is it ok to do the PCA on the WHOLE dataset ONCE and then run the LDA with the CV option set to TRUE (runs LOOCV)
> -- OR --
> do I need
> - to compute for each 'test-bag' (the n-1 observations) a PCA (my.princomp.1),
> - then run the LDA on the test-bag scores (-> my.lda.1),
> - then compute the scores of the left-out observation using my.princomp.1 (-> my.scores.2),
> - and only then use predict.lda(my.lda.1, my.scores.2) on the scores of the left-out observation?
> I read some articles where they choose procedure 1, but I am not sure if this is really correct.
> many thanks for a hint
> Christoph

--
Ramón Díaz-Uriarte
Bioinformatics Unit, Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center), Madrid (Spain)
http://ligarto.org/rdiaz
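The second procedure in the question (PCA recomputed inside each leave-one-out fold) can be sketched as below. This is a minimal illustration under stated assumptions, not code from the thread: it uses prcomp() rather than princomp(), the built-in iris data, and an arbitrary choice of k = 2 retained components.

```r
library(MASS)

X <- as.matrix(iris[, 1:4])
y <- iris$Species
k <- 2                                   # retained components (arbitrary)

pred <- factor(rep(NA, nrow(X)), levels = levels(y))
for (i in seq_len(nrow(X))) {
  # 1. PCA on the n-1 retained observations only.
  pca <- prcomp(X[-i, , drop = FALSE], center = TRUE, scale. = FALSE)
  # 2. LDA on their scores.
  train <- data.frame(pca$x[, 1:k, drop = FALSE], y = y[-i])
  fit <- lda(y ~ ., data = train)
  # 3. Project the left-out observation with the same PCA.
  test <- as.data.frame(
    predict(pca, newdata = X[i, , drop = FALSE])[, 1:k, drop = FALSE])
  # 4. Classify it with the LDA built without it.
  pred[i] <- predict(fit, newdata = test)$class
}

loocv_err <- mean(pred != y)
loocv_err
```

Procedure 1 would instead run prcomp() once on all rows and call lda(..., CV = TRUE) on the scores; comparing the two error rates on one's own data shows how much the choice matters there.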
Re: [R] LDA with previous PCA for dimensionality reduction
On Wed, 24 Nov 2004, Ramon Diaz-Uriarte wrote:
> Dear Christoph,
> I guess you want to assess the error rate of an LDA that has been fitted to a set of currently existing training data, and that in the future you will get some new observation(s) for which you want to make a prediction. Then, I'd say that you want to use the second approach. You might find that the first step turns out to be crucial and, after all, your whole subsequent LDA is contingent on the PC scores you obtain in the previous step.

Ramon,

as long as one does not use the information in the response (the class variable, in this case) I don't think that one ends up with an optimistically biased estimate of the error (although leave-one-out is a suboptimal choice). Of course, when one starts to tune the method used for dimension reduction, a selection of the procedure with minimal error will produce a bias. Or am I missing something important?

Btw, `ipred::slda' implements something not completely unlike the procedure Christoph is interested in.

Best,
Torsten
Re: [R] LDA with previous PCA for dimensionality reduction
Thank you, Torsten; that's what I thought: as long as one does not use the 'class label' as a constraint in the dimension reduction, the procedure is ok. Of course it is computationally more demanding, since for each new observation (unknown with respect to the class label) one has to compute a new PCA as well.

Cheers
Christoph
Re: [R] Lda versus Rda
Julien Trolet wrote:
> Hello,
> I used the lda function from the MASS (VR) package and the rda function from the klaR package. I wanted to compare the results of these two functions using the same training set. Thus, I used the rda function with lambda=1 and gamma=0, which should emulate the lda function and give the same result. But this is not the case: the two results are very different. My training set is 70 observations by 10 variables, and I performed a leave-one-out for each observation. Does somebody have an idea of the cause(s) of this?

With the iris data, the following works for me:

  x1 <- predict(lda(Species ~ ., data=iris))
  x2 <- predict(rda(Species ~ ., data=iris, lambda=1, gamma=0))
  all(x1$class == x2$class)
  all.equal(x1$posterior, x2$posterior)

So, can you specify an example (including data + code, in a private message) please? If your analysis for your data is correct, the error is probably in rda(). I won't have time to look at it before Monday (and the author of the rda() code is in Auckland these days). It's always a good idea to ask the package maintainer first, BTW.

Uwe Ligges

> Thanks
> Trolet Julien
Re: [R] lda
On Tue, 2 Nov 2004, T. Murlidharan Nair wrote:
> Hi !!
> I am trying to analyze some of my data using linear discriminant analysis. I worked out the following example code in Venables and Ripley. It does not seem to be happy with it.

What is `it'? If you mean R, which version, and which version of the VR bundle?

>   library(MASS)
>   library(stats)

That line is definitely not in `Venables and Ripley'.

>   data(iris3)
>   ir <- rbind(iris3[,,1],iris3[,,2],iris3[,,3])
>   ir.species <- factor(c(rep("s",50),rep("c",50),rep("v",50)))
>   ir.lda <- lda(log(ir),ir.species)
>   ir.ld <- predict(ir.lda,dimen=2)$x
>   eqscplot(ir.ld, type="n", xlab = "First linear discriminant", ylab = "second linear discriminant")
>   text(ir.ld, labels= as.character(ir.species[-143]), col =3 +codes(ir.species),cex =0.8)
>
> eqscplot does not plot anything and it gives me an error saying codes is defunct. Have I missed anything there?

I have no idea why eqscplot is misbehaving (your example works up to the last line for me), but the R scripts which the book refers you to do work. See p.12 (and the R posting guide asking you to read the relevant section of the book).

--
Brian D. Ripley, Professor of Applied Statistics, University of Oxford
http://www.stats.ox.ac.uk/~ripley/
Re: [R] lda
T. Murlidharan Nair wrote:
> Hi !!
> I am trying to analyze some of my data using linear discriminant analysis. I worked out the following example code in Venables and Ripley. It does not seem to be happy with it.
>
>   library(MASS)
>   library(stats)
>   data(iris3)
>   ir <- rbind(iris3[,,1],iris3[,,2],iris3[,,3])
>   ir.species <- factor(c(rep("s",50),rep("c",50),rep("v",50)))
>   ir.lda <- lda(log(ir),ir.species)
>   ir.ld <- predict(ir.lda,dimen=2)$x
>   eqscplot(ir.ld, type="n", xlab = "First linear discriminant", ylab = "second linear discriminant")
>   text(ir.ld, labels= as.character(ir.species[-143]), col =3 +codes(ir.species),cex =0.8)
>
> eqscplot does not plot anything and it gives me an error saying codes is defunct. Have I missed anything there?
> Thanks../Murli

Murli,

eqscplot gives you nothing because you specified `type="n"'. You are not plotting anything because your call to text never completed. When I do this I get:

  > R.version.string
  [1] "R version 2.0.0, 2004-10-09"
  > text(ir.ld, labels= as.character(ir.species[-143]), col =3 +codes(ir.species),cex =0.8)
  Error: 'codes' is defunct. See help(Defunct)

This means codes is no longer available. Use ?as.integer instead.

--sundar

P.S. Please do read the posting guide and tell us what version of R you are using, etc.
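With the substitution suggested above, the example can be written as follows. This is an adaptation rather than the book's script: codes() is replaced by as.integer(), and the `[-143]` subsetting is dropped so that the labels match the 150 rows actually plotted.

```r
library(MASS)

# iris3 is a 50 x 4 x 3 array; stack the three species.
ir <- rbind(iris3[, , 1], iris3[, , 2], iris3[, , 3])
ir.species <- factor(c(rep("s", 50), rep("c", 50), rep("v", 50)))

ir.lda <- lda(log(ir), ir.species)
ir.ld <- predict(ir.lda, dimen = 2)$x

# type = "n" sets up the axes only; text() then draws the labels.
eqscplot(ir.ld, type = "n",
         xlab = "first linear discriminant",
         ylab = "second linear discriminant")
text(ir.ld, labels = as.character(ir.species),
     col = 3 + as.integer(ir.species), cex = 0.8)
```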
Re: [R] lda predict
On Tue, 21 Sep 2004, rob foxall (IFR) wrote:

> Dear R-helpers, I have a model created by lda, and I would like to use this model to make predictions for new or old data. The catch is, I want to do this without using the predict function, i.e. only using information directly from the foo.lda object to create my posterior probabilities.

Well, that's what predict.lda does, so please read its code.

> In anticipation of likely responses, I will be brushing up my lda knowledge using the given references when I have time, but am being hassled for an answer asap!

-- Brian D. Ripley, Professor of Applied Statistics, University of Oxford
Re: [R] lda() - again.
I remember doing this some time ago but forgot. Perhaps this might help you:

    MASS:::predict.lda

On Tue, 2004-07-13 at 23:56, marzban wrote:

> Hi. I asked a question about lda() and got some answers. However, one question remains (which is not independent of the earlier ones): What output does lda() produce which I can use to compute the posteriors? I know predict(lda())$posterior will give me precisely the posteriors, but suppose I'd like to compute them myself, outside of R. So far, I have not been able to use the coefficients of linear discriminants to do this, for they don't seem to be the alpha and beta in log(post) ~ alpha x + beta (this eqn being a caricature of LDA in Ripley).
>
> Caren
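To make the question concrete, here is a hedged sketch of how the posteriors can be rebuilt from the pieces stored in an lda object (prior, means, scaling). It follows my reading of the plug-in rule and is not a copy of predict.lda; the iris data and the variable names are assumptions chosen for illustration.

```r
library(MASS)

fit <- lda(Species ~ ., data = iris)
X   <- as.matrix(iris[, 1:4])

# Centre observations and group means at the prior-weighted grand mean,
# then project both onto the discriminant axes stored in fit$scaling.
centre <- colSums(fit$prior * fit$means)
z  <- scale(X,         center = centre, scale = FALSE) %*% fit$scaling
zm <- scale(fit$means, center = centre, scale = FALSE) %*% fit$scaling

# In this space the within-group covariance is the identity, so the log
# posterior of class k is, up to a constant, z . m_k - |m_k|^2 / 2 + log(prior_k).
score <- z %*% t(zm) -
  matrix(0.5 * rowSums(zm^2) - log(fit$prior), nrow(z), nrow(zm), byrow = TRUE)

# Normalise row by row (subtracting the row maximum first for stability).
score <- score - apply(score, 1, max)
post  <- exp(score) / rowSums(exp(score))

# Compare with what predict() reports.
all.equal(unname(post), unname(predict(fit)$posterior))
```

The same three ingredients (prior, group means, scaling matrix) are all that would need to be exported to repeat the calculation outside R.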
Re: [R] lda()
At 08:45 AM 7/12/2004, marzban wrote: Hello, For a simple problem with 1 predictor (x) and 2 classes (0 and 1), the linear discriminant function should be something like 2(mu_0 - mu_1)/var x+x-independent-terms where var is the common variance. Question 1: Why does lda() report only a single Coefficients of linear discriminants when there are in fact two coefficients (the x-dependent and the x-independent terms)? Question 2: And how is that single coefficient computed? It is certainly not equal to 2(mu_0 -mu_1)/var . Regards, Caren -- http://www.nhn.ou.edu/~marzban Perhaps some reading would be helpful. I suggest you look first at the help file for lda(). Second, I suggest you read Venables and Ripley, MASS, 4th Edition, where lda() is discussed extensively. Third, I suggest you read Ripley's Pattern Recognition and Neural Networks, where the theory is laid out clearly. Both of these latter books are referenced in lda's help file. Finally, you might want to tell us what version of lda() you're using, what version of R you're using, and what platform you're running on. For all we know, you're using a 2-year old version of R and lda, both long superceded by vastly improved programs and packages. __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] lda()
Perhaps some reading would be helpful. I suggest you look first at the help file for lda(). Second, I suggest you read Venables and Ripley, MASS, 4th Edition, where lda() is discussed extensively. Third, I suggest you read Ripley's Pattern Recognition and Neural Networks, where the theory is laid out clearly. Both of these latter books are referenced in lda's help file. I am quite familiar with all of these references. The theory behind LDA is not where the problem is - I'm comfortable with that. The problem is that I do not know what R is computing when it prints Coefficients of linear discriminants. (According to the source code (lda.R), it's x$scaling, but I don't know what that is either.) Finally, you might want to tell us what version of lda() you're using, what version of R you're using, and what platform you're running on. For all we know, you're using a 2-year old version of R and lda, both long superceded by vastly improved programs and packages. R-1.9.1 on linux Caren __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] lda()
I haven't done this in years but I think the `scaling' element in the list returned by lda is the original data matrix multiplied by the rotation matrix from the SVD. Taking a look at getAnywhere(lda.default) will probably answer your question. -roger marzban wrote: Perhaps some reading would be helpful. I suggest you look first at the help file for lda(). Second, I suggest you read Venables and Ripley, MASS, 4th Edition, where lda() is discussed extensively. Third, I suggest you read Ripley's Pattern Recognition and Neural Networks, where the theory is laid out clearly. Both of these latter books are referenced in lda's help file. I am quite familiar with all of these references. The theory behind LDA is not where the problem is - I'm comfortable with that. The problem is that I do not know what R is computing when it prints Coefficients of linear discriminants. (According to the source code (lda.R), it's x$scaling, but I don't know what that is either.) Finally, you might want to tell us what version of lda() you're using, what version of R you're using, and what platform you're running on. For all we know, you're using a 2-year old version of R and lda, both long superceded by vastly improved programs and packages. R-1.9.1 on linux Caren __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
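As a hedged illustration of what the single coefficient is in the one-predictor, two-class case: the lda help page describes scaling as normalized so that the within-groups covariance matrix is spherical, which with one predictor should reduce to one over the pooled within-group standard deviation (up to sign). The simulated data below are an assumption made only to check that reading.

```r
library(MASS)

set.seed(1)
x <- c(rnorm(30, mean = 0), rnorm(30, mean = 2))   # simulated predictor
g <- factor(rep(c("0", "1"), each = 30))           # two classes

fit <- lda(matrix(x, ncol = 1, dimnames = list(NULL, "x")), g)

# Pooled within-group standard deviation, computed by hand.
resid    <- x - ave(x, g)                          # subtract each group's mean
s.within <- sqrt(sum(resid^2) / (length(x) - nlevels(g)))

# If the reading above is right, this product is 1 (the sign is arbitrary).
abs(as.numeric(fit$scaling)) * s.within
```

So the printed coefficient is a rescaling of x, not the slope and intercept of the classification rule; the intercept depends on the priors and group means and is only formed at prediction time.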
Re: [R] lda
> i could not figure out how to write prior=c("uniform") in R. I would get an error every time. I think that it has something to do with uniform. Do you know what i use instead of uniform for R? I am trying to do a uniform distribution.

try ?runif (random uniform distribution)
Re: [R] lda
There is a help page for lda: please read it for yourself (as the posting guide requests you too). lda in R works the same way in R as it works in S-PLUS: in both it is support software for a book, and the posting guide also asks you to read that book. On Sat, 12 Jun 2004, Martin Willett wrote: I am trying to write the following code in R. The code works in S+ and i am trying to do the program in R. x=discrim(admit~gpa+gmat,prior=c(uniform),data=data.mm) i wrote the following in R: x=lda(admit~gpa+gmat,data=data.mm) i could not figure out how to write prior=c(uniform) in R. I would get an error every time. I think that it has something to do with uniform. Do you know what i use instead of uniform for R? I am trying to do a uniform distribution. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] lda
Seems rather straightforward to me. The prior="uniform" in discrim() says to use equal prior for each group. You can do the same by explicitly specifying the priors, one per group and summing to 1; e.g.,

    k <- nlevels(data.mm$admit)
    x <- lda(admit ~ gpa + gmat, data=data.mm, prior=rep(1/k, k))

HTH, Andy

From: Martin Willett

> I am trying to write the following code in R. The code works in S+ and i am trying to do the program in R.
>
>     x=discrim(admit~gpa+gmat,prior=c("uniform"),data=data.mm)
>
> i wrote the following in R:
>
>     x=lda(admit~gpa+gmat,data=data.mm)
>
> i could not figure out how to write prior=c("uniform") in R. I would get an error every time. I think that it has something to do with uniform. Do you know what i use instead of uniform for R? I am trying to do a uniform distribution. Thank you.
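A runnable version of the equal-priors idea, using the built-in iris data as a stand-in for data.mm (which is not available here). As far as I can tell, lda() expects a vector with one prior per group that sums to 1, so the sketch builds that vector explicitly.

```r
library(MASS)

k   <- nlevels(iris$Species)            # number of groups
fit <- lda(Species ~ ., data = iris,
           prior = rep(1 / k, k))       # equal ("uniform") prior for every group

fit$prior                               # each group gets 1/k
```

If prior is omitted, lda() falls back to the class proportions in the training data, which differ from equal priors whenever the groups are unbalanced.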
Re: [R] lda() called with data=subset() command
I presume this is lda from the uncredited package MASS and you ignored the advice to ask the maintainer?

The short answer is `don't ignore the warning', and set up a proper data frame with just the groups you actually want. As a quick fix, look in lda.default and alter the line that looks like

    cl <- factor(max.col(dist), levels=seq(along=lev1), labels=lev1)

to be exactly like that. (You will need fixInNamespace to do so.)

On Mon, 5 Jan 2004, Christoph Lehmann wrote:

> Hi
>
> I have a data.frame with a grouping variable having the levels C, mild AD, mod AD, O and S. Since I want to compute a lda only for the two groups 'C' and 'mod AD', I call lda with data=subset(mydata.pca, GROUP == 'mod AD' | GROUP == 'C'):
>
>     my.lda <- lda(GROUP ~ Comp.1 + Comp.2 + Comp.3 + Comp.4 + Comp.5 + Comp.6 + Comp.7 + Comp.8,
>                   data=subset(mydata.pca, GROUP == 'mod AD' | GROUP == 'C'), CV = TRUE)
>
> this results in the warning
>
>     group(s) mild AD O S are empty in: lda.default(x, grouping, ...)
>
> of course... my.lda$class now shows
>
>      [1] C       C       C       C       C       C       C       C       C
>     [10] C       C       C       C       C       C       C       C       C
>     [19] C       C       C       mild AD mild AD mild AD mild AD mild AD mild AD
>     [28] mild AD C       mild AD mild AD mild AD C       C       mild AD mild AD
>     [37] mild AD mild AD
>     Levels: C mild AD mod AD O S
>
> It seems it just took the second level (mild AD) for the second class, even though the second level was not used for the lda computation (only the first level (C) and the third level (mod AD)). What shall I do to resolve this (little) problem?
>
> thanks for a hint
> christoph

-- Brian D. Ripley, Professor of Applied Statistics, University of Oxford
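The advice to set up a data frame with just the groups you want can be sketched as below. The iris data stand in for mydata.pca, which is an assumption for illustration; the key step is re-creating the factor after subsetting so that the unused levels disappear.

```r
library(MASS)

# Keep two of the three groups; subset() alone leaves the empty level behind.
two <- subset(iris, Species %in% c("setosa", "virginica"))
nlevels(two$Species)                    # still 3 at this point

# Calling factor() on a factor drops the levels that no longer occur.
two$Species <- factor(two$Species)

cv <- lda(Species ~ ., data = two, CV = TRUE)
levels(cv$class)                        # only the two groups actually used
table(observed = two$Species, predicted = cv$class)
```

With the empty levels gone there is no warning, and the cross-validated class labels can only take values from the groups that were fitted.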
Re: [R] lda source code
Frank Gibbons wrote: Wei Geng, I asked the same question about six weeks ago, so let me try to answer it. The source for the entire package 'MASS' is in a single file, I believe (at least this is true on my Linux setup). The thread gets boring, but let me correct this belief: NO! There is one file including all R code, if you are looking into the binary installation of a package, and then the C/Fortran sources are not more visible. Look into the source distribution of a package for a more structured view of things! Uwe Ligges The exact location of that file you'll have to determine by searching the directory/folder where you installed it. The function 'lda' is implemented entirely in R itself, like much of its functionality. Look at the functions 'lda' and 'predict.lda' in this file for details. Comments are sparse in most R code, from what I gather, but if you look in Pattern Recognition and Neural Networks by Brian Ripley (one of the authors of this package), you'll find a discussion in section 2.4 'Predictive classification' that covers much of what's going on, from what I've been able to glean. I hope that helps. There are certainly others out there who are more au fait with this than me. -Frank Gibbons At 05:47 PM 10/1/2003, you wrote: Wei Geng wrote: I am new to R. Trying to find out how lda() {in MASS R1.8.0 Windows} was implemented in R. Does anyone know where to find out lda source code ? Thanks. Here: http://cran.r-project.org Hint: MASS is a *package*. You want to view its *source*. Same with most other R packages. Or just about anything else you want to know about R. Cheers Jason -- Indigo Industrial Controls Ltd. http://www.indigoindustrial.co.nz 64-21-343-545 [EMAIL PROTECTED] __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PhD, Computational Biologist, Harvard Medical School BCMP/SGM-322, 250 Longwood Ave, Boston MA 02115, USA. 
Tel: 617-432-3555 Fax: 617-432-3557 http://llama.med.harvard.edu/~fgibbons __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] lda source code
Wei Geng wrote: I am new to R. Trying to find out how lda() {in MASS R1.8.0 Windows} was implemented in R. Does anyone know where to find out lda source code ? Thanks. Here: http://cran.r-project.org Hint: MASS is a *package*. You want to view its *source*. Same with most other R packages. Or just about anything else you want to know about R. Cheers Jason -- Indigo Industrial Controls Ltd. http://www.indigoindustrial.co.nz 64-21-343-545 [EMAIL PROTECTED] __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] lda source code
Consider the following:

    > library(MASS)
    > lda
    function (x, ...)
    UseMethod("lda")
    <environment: namespace:MASS>
    > methods(lda)
    [1] lda.data.frame lda.default    lda.formula    lda.matrix

Now type lda.data.frame or lda.default, etc., at a command prompt to see the corresponding R code. Is this what you want?

spencer graves

Wei Geng wrote:

> I am new to R. Trying to find out how lda() {in MASS R1.8.0 Windows} was implemented in R. Does anyone know where to find out lda source code ? Thanks.
>
> Wei
RE: [R] lda source code
Hi Jason, Spencer,

Thanks for the prompt response. The strange thing about MASS is that it's not in Package Sources as most other R packages are. It seems to come with the binary R installation. I checked out Rxx/library/MASS on my laptop; there is source code (scripts) for Venables & Ripley's book but no source code for lda(). Typing lda.data.frame or lda.default at the prompt (after loading MASS with library(MASS)) gives

    Error: Object "lda.data.frame" not found

Wei
Re: [R] lda source code
Wei Geng, I asked the same question about six weeks ago, so let me try to answer it. The source for the entire package 'MASS' is in a single file, I believe (at least this is true on my Linux setup). The exact location of that file you'll have to determine by searching the directory/folder where you installed it. The function 'lda' is implemented entirely in R itself, like much of its functionality. Look at the functions 'lda' and 'predict.lda' in this file for details. Comments are sparse in most R code, from what I gather, but if you look in Pattern Recognition and Neural Networks by Brian Ripley (one of the authors of this package), you'll find a discussion in section 2.4 'Predictive classification' that covers much of what's going on, from what I've been able to glean. I hope that helps. There are certainly others out there who are more au fait with this than me. -Frank Gibbons At 05:47 PM 10/1/2003, you wrote: Wei Geng wrote: I am new to R. Trying to find out how lda() {in MASS R1.8.0 Windows} was implemented in R. Does anyone know where to find out lda source code ? Thanks. Here: http://cran.r-project.org Hint: MASS is a *package*. You want to view its *source*. Same with most other R packages. Or just about anything else you want to know about R. Cheers Jason -- Indigo Industrial Controls Ltd. http://www.indigoindustrial.co.nz 64-21-343-545 [EMAIL PROTECTED] __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PhD, Computational Biologist, Harvard Medical School BCMP/SGM-322, 250 Longwood Ave, Boston MA 02115, USA. Tel: 617-432-3555 Fax: 617-432-3557 http://llama.med.harvard.edu/~fgibbons __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] lda source code
With R 1.7.1 on Windows 2000, I got fine R source code for each of the 4 options. What version of R and what exactly did you do? I can't reproduce your error. hope this helps. spencer graves Wei Geng wrote: Hi Jason, Spencer, Thanks for the prompt response. The strange thing about MASS is that it's not in Package Sources as most of other R packages are. It seems to come with the binary R installation. I checked out the Rxx/library/MASS on my laptop, there are source code (script) for Venables Ripley's book but no source code for lda(). The lda.data.frame or lda.default at prompt (after loading MASS library(MASS)) has Error: Object lda.data.frame not found Wei __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] lda source code
Wei Geng wrote: Does anyone know where to find out lda source code ? Try typing lda.default at the prompt. That should get you started. Also see: methods(lda) as lda.default isn't the only bit of code used in lda() Alternatively, grab the source from CRAN and read it at your leisure. HTH Gav -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Gavin Simpson [T] +44 (0)20 7679 5522 ENSIS Research Fellow [F] +44 (0)20 7679 7565 ENSIS Ltd. ECRC [E] [EMAIL PROTECTED] UCL Department of Geography [W] http://www.ucl.ac.uk/~ucfagls/cv/ 26 Bedford Way[W] http://www.ucl.ac.uk/~ucfagls/ London. WC1H 0AP. %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] lda source code
Wei Geng wrote: Hi Jason, Spencer, Thanks for the prompt response. The strange thing about MASS is that it's not in Package Sources as most of other R packages are. It seems to come with the binary R installation. I checked out the Rxx/library/MASS on my laptop, there are source code (script) for Venables Ripley's book but no source code for lda(). Wei, MASS is actually distributed in a bundle called VR, which is on CRAN. VR as in Venables and Ripley, the authors of MASS (the book). The VR bundle contains MASS, nnet, spatial and class packages. The reason MASS comes with your binary installation is that the VR bundle has Recommended status - and should therefore be available in all binary distributions. HTH, G -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Gavin Simpson [T] +44 (0)20 7679 5522 ENSIS Research Fellow [F] +44 (0)20 7679 7565 ENSIS Ltd. ECRC [E] [EMAIL PROTECTED] UCL Department of Geography [W] http://www.ucl.ac.uk/~ucfagls/cv/ 26 Bedford Way[W] http://www.ucl.ac.uk/~ucfagls/ London. WC1H 0AP. %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] lda source code
Wei Geng wrote: Hi Jason, Spencer, Thanks for the prompt response. The strange thing about MASS is that it's not in Package Sources as most of other R packages are. It seems to come with the binary R installation. I checked out the Rxx/library/MASS on my laptop, there are source code (script) for Venables Ripley's book but no source code for lda(). Ah. On CRAN, the MASS library is part of a bundle called VR. Download the source for that. There are a few bundles on the CRAN source pages - if you encounter this problem again, just follow the Package Sources link, and search the page for MASS (or whatever the package name is). Use Edit-Find, or Ctrl-F, or whatever. In the case of MASS, you find this one: VR: Functions and datasets to support Venables and Ripley, `Modern Applied Statistics with S' (4th edition). Bundle of: MASS class nnet spatial It's the Bundle of part you're looking for. Cheers Jason -- Indigo Industrial Controls Ltd. http://www.indigoindustrial.co.nz 64-21-343-545 [EMAIL PROTECTED] __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] lda source code
You are using a *beta* version of R 1.8.0 and in that version lda.default is not visible to the user (hidden in a namespace). You can access it though by using the ::: (triple colon) operator, as in

    library(MASS)
    MASS:::lda.default

Actually, the first library() call is not necessary.

-roger

Wei Geng wrote:

> I am new to R. Trying to find out how lda() {in MASS R1.8.0 Windows} was implemented in R. Does anyone know where to find out lda source code ? Thanks.
>
> Wei
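The ways of reaching the hidden methods mentioned in this thread can be collected in one short session. This is only a sketch of standard base and utils calls; nothing here is specific to a particular MASS version.

```r
library(MASS)

# List the methods behind the generic lda().
print(methods(lda))

# Reach a method that is not exported from the namespace.
src <- MASS:::lda.default
is.function(src)

# getAnywhere() finds the same object without naming the package.
gf <- getAnywhere("lda.default")
gf$where                                # where the object was found

# Show the first few lines of the source.
head(deparse(src), 5)
```

Typing MASS:::lda.default on its own prints the whole function, which is usually the quickest way to read the R-level code of an installed package.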
Re: [R] LDA: normalization of eigenvectors (see SPSS)
The following satisfies some of your constraints but I don't know if it satisfies all of them. Let V = eigenvectors normalized so t(V) %*% V = I. Also, let D.5 = some square root matrix, so t(D.5) %*% D.5 = Derror, and Dm.5 = solve(D.5) = inverse of D.5. The Choleski decomposition (chol) provides one such solution, but you can construct a symmetric square root using eigen. Then Vstar = Dm.5 %*% V will have the property you mentioned below. Consider the following:

    > (Derror <- array(c(1,1,1,4), dim=c(2,2)))
         [,1] [,2]
    [1,]    1    1
    [2,]    1    4
    > D.5 <- chol(Derror)
    > t(D.5) %*% D.5
         [,1] [,2]
    [1,]    1    1
    [2,]    1    4
    > (Dm.5 <- solve(D.5))
         [,1]       [,2]
    [1,]    1 -0.5773503
    [2,]    0  0.5773503
    > (t(Dm.5) %*% Derror %*% Dm.5)
         [,1] [,2]
    [1,]    1    0
    [2,]    0    1

Thus, t(Vstar) %*% Derror %*% Vstar = t(V) %*% t(Dm.5) %*% Derror %*% Dm.5 %*% V = t(V) %*% V = I.

hope this helps. spencer graves

Christoph Lehmann wrote:

> Hi dear R-users
>
> I try to reproduce the steps included in a LDA. Concerning the eigenvectors there is a difference to SPSS. In my textbook (Bortz) it says that the matrix of eigenvectors V is usually not normalized to length 1, but in such a way that the following holds (SPSS does the same thing):
>
>     t(Vstar) %*% Derror %*% Vstar = I
>
> where Vstar are the normalized eigenvectors. Derror is an error or within sum-of-squares and cross-products matrix (sums of squares of the p variables on the diagonal, and the off-diagonal elements are the sums of the cross-products). For Derror the following holds: Dtotal = Dtreat + Derror.
>
> Since I assume that many of you are familiar with this transformation: can anybody of you tell me how to conduct this transformation in R? Would be very nice. Thanks a lot
>
> Cheers
> Christoph
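The symmetric square root mentioned as an alternative to chol can be sketched like this. The 2 x 2 Derror is the one from the message; the rotation matrix V is an arbitrary orthonormal matrix chosen here only to check the identity, not the eigenvectors of any real problem.

```r
Derror <- matrix(c(1, 1, 1, 4), nrow = 2)

# Symmetric square root via the spectral decomposition: D.5 %*% D.5 == Derror.
e    <- eigen(Derror, symmetric = TRUE)
D.5  <- e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)
Dm.5 <- solve(D.5)                      # inverse square root

# Any orthonormal V works for the check; here a rotation by 30 degrees.
th <- pi / 6
V  <- matrix(c(cos(th), sin(th), -sin(th), cos(th)), nrow = 2)

Vstar <- Dm.5 %*% V
round(t(Vstar) %*% Derror %*% Vstar, 10)   # identity matrix
```

Because D.5 is symmetric here, t(D.5) %*% D.5 and D.5 %*% D.5 coincide, so it satisfies the same defining property as the Choleski factor.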
Re: [R] LDA: normalization of eigenvectors (see SPSS)
Hi, Christoph:

1. I didn't see in your original email that you wanted V to be orthogonal, only that its columns have length 1. You have a solution satisfying the latter constraint, but not the former.

2. I don't have time now to sort out the details, and I don't have them on the top of my head. I just entered lda into R 1.6.2 [after library(MASS)] and got the following:

    > lda
    function (x, ...)
    {
        if (is.null(class(x)))
            class(x) <- data.class(x)
        UseMethod("lda", x, ...)
    }

To decode 'UseMethod("lda", ...)', I requested 'methods(lda)' with the following result:

    > methods(lda)
    [1] lda.data.frame lda.default    lda.formula    lda.matrix

Have you tried listing each of these 4 functions and working through them step by step? I think this should answer your question. Also see Venables and Ripley (2002) Modern Applied Statistics with S, index entry for lda.

hth. spencer graves

Christoph Lehmann wrote:

> thanks a lot, Spencer
>
> The problem is the following: my textbook has an example with the data:
>
> X
>
>     > x
>        x1 x2 x3
>     1   3  3  4
>     2   4  4  3
>     3   4  4  6
>     4   2  5  5
>     5   2  4  5
>     6   3  4  6
>     7   3  4  4
>     8   2  5  5
>     9   4  3  6
>     10  5  5  6
>     11  4  5  7
>     12  4  6  4
>     13  3  6  6
>     14  4  7  6
>     15  6  5  6
>
> --
>
>     > y
>      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
>      1  1  1  1  1  1  2  2  2  2  3  3  3  3  3
>
> --
>
>     > Dtot <- (t(x)%*%x-t(xbar)%*%xbar)
>     > Dtot
>           x1    x2    x3
>     x1 17.73  2.67  4.87
>     x2  2.67 17.33  4.33
>     x3  4.87  4.33 16.93
>
> --
>
>     > A <- cbind(tapply(x[,1],y,sum), tapply(x[,2],y,sum), tapply(x[,3],y,sum))
>     > A
>       [,1] [,2] [,3]
>     1   18   24   29
>     2   14   17   21
>     3   21   29   29
>     > G <- apply(x,2,sum)
>     > G
>     x1 x2 x3
>     53 70 79
>     > p <- ncol(x)
>     > k <- length(freq)
>     > N <- sum(freq)
>     > Dtreat <- array(0,c(p,p))
>     > k <- length(freq)
>     > for (i in 1:p)
>     + {
>     +   for (j in 1:k)
>     +   {
>     +     for (h in 1:k)
>     +     {
>     +       Dtreat[i,j] <- Dtreat[i,j] + A[h,i]*A[h,j]/freq[h]
>     +     }
>     +     Dtreat[i,j] <- Dtreat[i,j] - G[i]*G[j]/N
>     +   }
>     + }
>     > Dtreat
>          [,1] [,2] [,3]
>     [1,] 3.93 5.97 3.17
>     [2,] 5.97 9.78 4.78
>     [3,] 3.17 4.78 2.55
>
> --
>
>     > Derror <- Dtot-Dtreat
>     > Derror
>          x1    x2       x3
>     x1 13.8 -3.30  1.7
>     x2 -3.3  7.55 -0.45000
>     x3  1.7 -0.45 14.38333
>
> --
>
>     > eigen(Dtreat%*%solve(Derror))
>     $values
>     [1]  2.300398e+00  2.039672e-02 -1.907034e-15
>
>     $vectors
>                [,1]       [,2]       [,3]
>     [1,] -0.4870772  0.6813155 -0.6076020
>     [2,] -0.7809602 -0.4342229  0.1539928
>     [3,] -0.3909693  0.5892874  0.7791701
>
>     > V <- eigen(Dtreat%*%solve(Derror))$vectors
>     > V
>                [,1]       [,2]       [,3]
>     [1,] -0.4870772  0.6813155 -0.6076020
>     [2,] -0.7809602 -0.4342229  0.1539928
>     [3,] -0.3909693  0.5892874  0.7791701
>
> the textbook (SPSS) has similar eigenvalues, but only two!: lambda1 = 2.30048, lambda2 = 0.02091, but as I wrote in the last mail: different eigenvectors.
>
> Let's start here with your recommendation: first, it seems, since the last eigenvalue is almost 0, that the eigenvectors V are not orthogonal:
>
>     > t(V)%*%V
>                [,1]        [,2]        [,3]
>     [1,]  1.0000000 -0.22313575 -0.12894473
>     [2,] -0.2231357  1.00000000 -0.02168078
>     [3,] -0.1289447 -0.02168078  1.00000000
>
> let's continue anyway?
>
>     > D.5 <- chol(Derror)
>     > t(D.5) %*% D.5
>          x1    x2       x3
>     x1 13.8 -3.30  1.7
>     x2 -3.3  7.55 -0.45000
>     x3  1.7 -0.45 14.38333
>     > Dm.5 <- solve(D.5)
>     > t(Dm.5) %*% Derror %*% Dm.5
>                   x1            x2            x3
>     x1  1.000000e+00 -2.523481e-17 -1.097755e-18
>     x2 -6.625163e-18  1.000000e+00 -2.120970e-18
>     x3  4.501901e-18  4.460942e-19  1.000000e+00
>
> perfectly orthogonal
>
>     > t(V)%*%t(Dm.5)%*%Dfehler%*%Dm.5%*%V
>                [,1]        [,2]        [,3]
>     [1,]  1.0000000 -0.22313575 -0.12894473
>     [2,] -0.2231357  1.00000000 -0.02168078
>     [3,] -0.1289447 -0.02168078  1.00000000
>
> again, equals t(V)%*%V: not orthogonal.
>
> --
>
> I think it has to do with the fact that the textbook considers the third eigenvalue as = 0 and then gets the Vstar eigenvectors (which I try to reproduce):
>
>     Vstar =
>            [,1]    [,2]    [,3]
>     [1,] 0.1689  0.1419 -0.1825
>     [2,] 0.3498 -0.1597  0.0060
>     [3,] 0.0625  0.1422  0.2154
>
> Spencer, if you find some minutes time to help me reproduce this example, it would be very nice (the data are from Jones 1961. He investigated whether essays written by children from lower, middle, upper class differ in sentence length, chosen words, complexity of sentence).
>
> Cheers
> Christoph
>
> > The following satisfies some of your constraints but I don't know if it satisfies all of them. Let V = eigenvectors normalized so t(V) %*% V = I. Also, let D.5 = some square root matrix, so t(D.5) %*% D.5 = Derror, and Dm.5 = solve(D.5) = inverse of D.5. The Choleski decomposition (chol) provides one such solution, but you can construct a symmetric square root using eigen.
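For what it is worth, the textbook normalization appears to be reproducible by solving the generalized eigenproblem through the Choleski factor, rather than by taking eigenvectors of Dtreat %*% solve(Derror) directly. The sketch below re-enters the 15 observations quoted above; the route via the symmetric matrix t(Rinv) %*% Dtreat %*% Rinv is my suggestion, not something stated in the thread, and the sign of each column is arbitrary.

```r
# Data from the message: 15 essays, 3 variables, groups of size 6, 4 and 5.
x <- matrix(c(3,3,4, 4,4,3, 4,4,6, 2,5,5, 2,4,5, 3,4,6,
              3,4,4, 2,5,5, 4,3,6, 5,5,6,
              4,5,7, 4,6,4, 3,6,6, 4,7,6, 6,5,6),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("x1", "x2", "x3")))
y <- factor(rep(1:3, times = c(6, 4, 5)))

# Within-group (error), total and between-group (treatment) SSCP matrices.
within <- x - apply(x, 2, ave, y)            # subtract group means, column by column
Derror <- crossprod(within)
Dtot   <- crossprod(scale(x, scale = FALSE))
Dtreat <- Dtot - Derror

# Turn solve(Derror) %*% Dtreat into a symmetric problem via chol().
R    <- chol(Derror)                         # t(R) %*% R == Derror
Rinv <- solve(R)
e    <- eigen(t(Rinv) %*% Dtreat %*% Rinv, symmetric = TRUE)

# Back-transform: columns satisfy t(Vstar) %*% Derror %*% Vstar == I.
Vstar <- Rinv %*% e$vectors

e$values[1:2]                                # compare with lambda1, lambda2 above
round(abs(Vstar[, 1]), 4)                    # compare with the first textbook column
round(crossprod(Vstar, Derror %*% Vstar), 10)
```

The third eigenvalue is zero up to rounding because three groups span at most two discriminant directions, so only the first two columns of Vstar carry information.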
Re: [R] lda: how to get the eigenvalues
On Tue, 3 Jun 2003, Christoph Lehmann wrote:

> How can I get the eigenvalues out of an lda analysis?

It uses singular values not eigenvalues: see ?lda for a description of the output, and the print method for one way to use them.

-- Brian D. Ripley, Professor of Applied Statistics, University of Oxford
Re: [R] lda: how to get the eigenvalues
> On Tue, 3 Jun 2003, Christoph Lehmann wrote:
>
> > How can I get the eigenvalues out of an lda analysis?
>
> It uses singular values not eigenvalues: see ?lda for a description of the output, and the print method for one way to use them.

The function discrimin of the ade4 package performs discriminant analysis with eigen and so produces eigenvalues ($eig).

-- Stéphane DRAY, Biométrie et Biologie évolutive - Equipe Écologie Statistique, Université Lyon 1 - Bat 711 - 69622 Villeurbanne CEDEX - France. Tel : 04 72 43 27 56, Fax : 04 72 43 13 88, 04 72 43 27 57. E-mail : [EMAIL PROTECTED] Web: http://www.steph280.freesurf.fr/
Re: [R] lda: how to get the eigenvalues
let's compare lda and discrimin (ade4) using the iris data:

with lda I get:

    > lda1 <- lda(iris[,1:4], iris[,5])
    > lda1$svd
    [1] 48.642644  4.579983

with discrimin:

    > discrimin1 <- discrimin(dudi.pca(iris[,1:4], scan=F), iris[,5], scan=F)
    > discrimin1
    eigen values: 0.9699 0.222

so where and how is the relationship?

thanks
christoph

On Tue, 2003-06-03 at 13:01, Prof Brian Ripley wrote:

> On Tue, 3 Jun 2003, Stephane Dray wrote:
>
> > > On Tue, 3 Jun 2003, Christoph Lehmann wrote:
> > >
> > > > How can I get the eigenvalues out of an lda analysis?
> > >
> > > It uses singular values not eigenvalues: see ?lda for a description of the output, and the print method for one way to use them.
> >
> > the function discrimin of the ade4 package performs discriminant analysis with eigen and so produces eigenvalues ($eig)
>
> For those for whom squaring is too difficult, that is. Why recommend software using an inferior algorithm to avoid squaring?

-- Christoph Lehmann [EMAIL PROTECTED]
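One possible answer to the question, offered as a hedged sketch: the lda help page says the squared singular values are the canonical F-statistics, so rescaling them by (g - 1)/(n - g) should give the between/within eigenvalues, and lambda/(1 + lambda) the between/total ratios. That the ade4 figures are between/total ratios is my inference from the numbers quoted, not something confirmed in the thread; ade4 itself is not called here.

```r
library(MASS)

fit <- lda(iris[, 1:4], iris[, 5])
n   <- nrow(iris)                       # 150 observations
g   <- nlevels(iris$Species)            # 3 groups

fit$svd                                 # singular values reported by lda()

# Squared singular values are F-type ratios; undo the degrees of freedom
# to get the eigenvalues of solve(W) %*% B (between over within).
lambda <- fit$svd^2 * (g - 1) / (n - g)

# Between over total, which appears to be what discrimin printed.
ratio <- lambda / (1 + lambda)
round(ratio, 4)
```

On the iris data this gives values that agree with the 0.9699 and 0.222 quoted above to the printed precision.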
Re: [R] LDA
On Tue, 1 Apr 2003, array chip wrote:

> I used the lda function in the MASS library of S-Plus (R) to do a linear discriminant analysis, and got the linear coefficients, say b1 and b2 for the 2 predictors x1 and x2. I have trouble calculating the discriminant scores for each observation. I tried 3 ways to reproduce the scores returned by the predict function in S-Plus:
>
> 1. b1*x1 + b2*x2
> 2. b1*(x1 - mean of x1) + b2*(x2 - mean of x2)
> 3. b1*standardized x1 + b2*standardized x2 (standardized: mean 0, variance 1)
>
> None of the above procedures reproduces the scores returned by the predict function. However, methods 2 & 3 predict the classes correctly if using 0 as cutoff, just like using the predict function.

You've sent this to R-help, but S-PLUS (sic) and R are different as is my lda() function in each. MASS is a book, and it contains the details, as does the code.

> What should be the correct formula to compute the scores for each observation? BTW, how to retrieve the linear coefficients from an lda object? I can't retrieve it by using @coef, @coefficients, etc.

Of course not: in R these are not S4 classes. You could read the help page to see what lda() actually computes.

-- Brian D. Ripley, Professor of Applied Statistics, University of Oxford
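For the R version of lda(), the scores can be reproduced along the lines of method 2, except that the centring point is the prior-weighted average of the group means rather than the overall mean of each predictor (the two coincide only when the priors are the sample proportions). A minimal sketch on the iris data, which are an assumption for illustration; the coefficients are an ordinary list component, fit$scaling.

```r
library(MASS)

fit <- lda(Species ~ ., data = iris)
X   <- as.matrix(iris[, 1:4])

fit$scaling                             # the coefficients of linear discriminants

# Centre at the prior-weighted mean of the group means, then project.
centre <- colSums(fit$prior * fit$means)
scores <- scale(X, center = centre, scale = FALSE) %*% fit$scaling

# Should match the scores returned by predict().
all.equal(scores, predict(fit)$x, check.attributes = FALSE)
```

This applies to R's MASS; as noted in the reply above, the S-PLUS function may differ in its details.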
Re: [R] lda on curves
Hi, I recently work about linear dimension reduction for classification. There is a research report on ftp://ftp.stat.math.ethz.ch/Research-Reports/108.html In this report I discuss nine methods for linear dimension reduction, five of which are new. Four of the methods do not perform internal scaling which you want to avoid. Two of these have been published before by other authors. The coordinates are Young, Marco and Odell, Journal of Statistical Planning and Inference, 17 (1987), 307-319 and Hastie and Tibshirani, IEEE Transactions on Pattern Analysis and Machine Intelligence, 18 (1996), 607-616. I have R functions for all the methods, but I don't want to make them open before the corresponding paper is published. If you are interested, please contact me off list. There is more literature about unscaled canonical variates especially by W. Krzanowski, two references are Krzanowski, Journal of Chemometrics, 9 (1995), 509-520 Kiers and Krzanowski in Gaul, Opitz and Schader (Eds.) Data Analysis, Springer, Berlin 2000, 207-218. Best, Christian On Mon, 17 Feb 2003, Murray Jorgensen wrote: I'm working on a rather interesting consulting problem with a client. A number of physical variables are measured on a number of cricket bowlers in the performance of a delivery. An example variable might be a directional component of angular momentum for a particular joint measured at a large number (101) of equally spaced timepoints. Each bowler generates a (fairly smooth) curve for each variable measured. I decided to represent each curve by a few orthogonal polynomial constrasts. There are 4 groups of bowlers corresponding to various speeds of delivery. I want to use canonical variant analysis to find linear combinations of my transformed variables discriminating well between the groups of bowlers. 
I used lda() from the MASS library to do this, but examining the output I notice that the higher-order orthogonal polynomials are getting larger coefficients than the more important lower-order ones. This is clearly because some scaling of the variables is being done by lda(), and because the higher-order polynomial variable values are smaller, they are scaled up. I would like to turn off this scaling as it is not what is needed in this problem and will cause the tail to wag the dog. There is no obvious parameter to do this in lda(x, grouping, prior = proportions, tol = 1.0e-4, subset, na.action = na.fail, method, CV = FALSE, nu) so I thought that I might try a hack. However: lda function (x, ...) { if (is.null(class(x))) class(x) <- data.class(x); UseMethod("lda", x, ...) } which isn't very helpful. Any ideas about how to perform an unscaled canonical variates analysis? Cheers, Murray -- *** Christian Hennig Seminar fuer Statistik, ETH-Zentrum (LEO), CH-8092 Zuerich (currently) and Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg [EMAIL PROTECTED], http://stat.ethz.ch/~hennig/ [EMAIL PROTECTED], http://www.math.uni-hamburg.de/home/hennig/ ### I recommend www.boag.de
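As a side note on the coefficient sizes described above, the following sketch does not implement an unscaled canonical variates analysis; it only illustrates, on the built-in iris data (an assumption for illustration, not the bowling data), that raw lda() coefficients are expressed per unit of each variable. Shrinking a variable inflates its coefficient by the same factor while the discriminant scores stay the same, so a small-valued variable showing a large coefficient does not by itself mean that it dominates the discrimination. Multiplying each coefficient by the within-group standard deviation of its variable is one common way to put the coefficients on a comparable footing.

```r
library(MASS)

X <- as.matrix(iris[, 1:4])
g <- iris$Species
fit_raw <- lda(X, g)

# Shrink one variable by a factor of 10 and refit
X_small <- X
X_small[, "Petal.Width"] <- X_small[, "Petal.Width"] / 10
fit_small <- lda(X_small, g)

# The coefficient of the shrunken variable grows by the same factor
ratio <- fit_small$scaling["Petal.Width", ] / fit_raw$scaling["Petal.Width", ]
print(ratio)

# The scores are unchanged (axes are only defined up to sign, hence abs())
same_scores <- all.equal(abs(unname(predict(fit_raw, X)$x)),
                         abs(unname(predict(fit_small, X_small)$x)))
print(same_scores)

# Coefficients multiplied by the pooled within-group SD of each variable
resid <- X - fit_raw$means[as.character(g), ]
sd_within <- sqrt(colSums(resid^2) / (nrow(X) - nlevels(g)))
std_coef <- fit_raw$scaling * sd_within
print(std_coef)
```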
Re: [R] lda
R per se does not have an lda() function. Package MASS does, and MASS (the book) describes it in detail. If you use a package supporting a book (there are several), do expect to read the book for the fine details. On Mon, 10 Feb 2003, Luis Silva wrote: There are some versions of lda. I would like to know which one is behind R's lda function. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] LDA newbie question
At 01:10 PM 2/1/2003, Roland Goecke wrote: Hi, Is there a simple way to get the discriminant scores, or do I have to manually multiply the coefficients with the data? predict.lda will generate an object containing the discriminant scores (component x), among other things. Try ?predict.lda
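A short sketch of the suggestion above, using the built-in iris data as a stand-in (an assumption; the original poster's data are not shown). In MASS the value returned by predict() on an lda fit is a list, and the discriminant scores are in its component `x`, alongside `class` and `posterior`.

```r
library(MASS)

# Fit on a stand-in data set with three classes
fit <- lda(Species ~ ., data = iris)

# predict() dispatches to predict.lda and returns a list
pred <- predict(fit, iris)
names(pred)

# Discriminant scores: one row per observation, one column per discriminant
scores <- pred$x
head(scores)
```

Passing the data explicitly as the second argument is optional here; it simply avoids relying on predict() re-evaluating the original call.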