Mark,

What you are referring to deals with the selection of covariates, since PCA doesn't do dimensionality reduction in the sense of covariate selection. What Mark is asking for is how much each data point contributes to the individual PCs. I don't think that query makes much sense, unless he meant to ask which individuals have high/low scores on PC1/PC2. Here are some comments that may be tangentially related to Mark's question:
1. If one is worried about a few data points contributing heavily to the estimation of the PCs, one can use robust PCA, for example via robust covariance matrices. MASS has some tools for this.
2. The "biplot" of the first two PCs can give some insight.
3. The PCs, especially the last few, can be used to identify "outliers".

Hope this is helpful,
Ravi.

----------------------------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: [EMAIL PROTECTED]
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
----------------------------------------------------------------------------

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Mark Difford
Sent: Monday, July 02, 2007 1:55 PM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] Question about PCA with prcomp

Hi James,

Have a look at Cadima et al.'s subselect package [Cadima worked with/was a student of Prof. Jolliffe, one of _the_ experts on PCA; Jolliffe devotes part of a chapter to this question in his text (Principal Component Analysis, Springer)]. Then you should look at the psychometric literature: a good place to start would be Professor Revelle's psych package.

Best,
Mark.

James R. Graham wrote:
>
> Hello All,
>
> The basic premise of what I want to do is the following:
>
> I have 20 "entities" for which I have ~500 measurements each. So I
> have a matrix of 20 rows by ~500 columns.
>
> The 20 entities fall into two classes: "good" and "bad."
>
> I eventually would like to derive a model that would then be able to
> classify new entities as being in "good territory" or "bad territory"
> based upon my existing data set.
> I know that not all ~500 measurements are meaningful, so I thought
> the best place to begin would be to do a PCA in order to reduce the
> amount of data with which I have to work.
>
> I did this using the prcomp function and found that nearly 90% of the
> variance in the data is explained by PC1 and PC2.
>
> So far, so good.
>
> I would now like to find out which of the original ~500 measurements
> contribute to PC1 and PC2, and by how much.
>
> Any tips would be greatly appreciated! And apologies in advance if
> this turns out to be an idiotic question.
>
>
> james
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
View this message in context: http://www.nabble.com/Question-about-PCA-with-prcomp-tf4012919.html#a11398608
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
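[Editorial sketch, not part of the original exchange.] A minimal, self-contained R illustration of the points raised in the thread, using simulated data as a stand-in for James's 20 x ~500 matrix (only 50 columns here, to keep it small). It shows the scores (Ravi's reading of the question), the loadings in `pc$rotation` as the per-measurement contributions to PC1/PC2 (James's actual question), the biplot, and a PCA built from a robust covariance matrix via MASS, per Ravi's comment 1:

```r
set.seed(1)
X <- matrix(rnorm(20 * 50), nrow = 20, ncol = 50)  # 20 entities, 50 measurements
colnames(X) <- paste0("m", 1:50)

pc <- prcomp(X, scale. = TRUE)

## Scores: which individuals sit high/low on PC1/PC2
head(pc$x[, 1:2])

## Loadings: how much each original measurement contributes to PC1/PC2;
## a large |loading| means a large contribution to that component
contrib <- pc$rotation[, 1:2]
contrib[order(abs(contrib[, "PC1"]), decreasing = TRUE)[1:5], ]

## Biplot of the first two PCs (comment 2 in Ravi's reply)
biplot(pc)

## Robust alternative (comment 1): PCA from a robust covariance matrix.
## cov.rob needs more rows than columns, so only a subset of columns is used.
library(MASS)
rc <- cov.rob(X[, 1:10])
pc.rob <- princomp(covmat = rc$cov)
summary(pc.rob)
```

Note that with only 20 rows, at most 20 components are returned regardless of how many columns there are, which is one reason PCA alone cannot tell you which of the ~500 original measurements to discard; hence Mark's pointer to the subselect package for genuine variable selection.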