Hi Frank,

>> If anyone knows of better references for this please let me know.

Many thanks: I was not aware of the Witten paper. If I turn up anything else
I will be sure to let you know.

Best Regards, Mark.


Frank E Harrell Jr wrote:
>
> Mark Difford wrote:
>> Hi All,
>>
>> I beg to differ with Ravi Varadhan's perspective. While it is true that
>> principal component analysis does not itself do variable selection, it is
>> an important method for pointing the way to what to select. This is what
>> the methods in the subselect package rely on. (One of its authors was, I
>> believe, a student of Jolliffe's.) For a modern perspective on this, see
>> the following paper:
>>
>> Debashis Paul, Eric Bair, Trevor Hastie and Robert Tibshirani:
>> "Preconditioning" for feature selection and regression in
>> high-dimensional problems. We show that supervised principal components
>> followed by a variable selection procedure is an effective approach for
>> variable selection in very high dimension. Annals of Statistics 36(4),
>> 2008, 1595-1618.
>>
>> http://www-stat.stanford.edu/~hastie/Papers/Preconditioning_Annals.pdf
>>
>> Regards, Mark.
>
> Mark,
>
> Slightly more relevant are the unsupervised sparse principal component
> methods described in the following references. If anyone knows of
> better references for this please let me know. -Frank
>
>
> @Article{zou06spa,
>   author  = {Zou, Hui and Hastie, Trevor and Tibshirani, Robert},
>   title   = {Sparse principal component analysis},
>   journal = J Comp Graph Stat,
>   year    = 2006,
>   volume  = 15,
>   pages   = {265-286},
>   annote  = {gene microarray;lasso/elastic net;multivariate
>   analysis;data reduction;singular value
>   decomposition;thresholding;principal components analysis that shrinks
>   some loadings to zero}
> }
> @Article{wit08tes,
>   author  = {Witten, Daniela M. and Tibshirani, Robert},
>   title   = {Testing significance of features by lassoed principal
>   components},
>   journal = Annals Appl Stat,
>   year    = 2008,
>   volume  = 2,
>   number  = 3,
>   pages   = {986-1012},
>   annote  = {reduction in false discovery rates over using a vector of
>   t-statistics;borrowing strength across genes;``one would not expect a
>   single gene to be associated with the outcome, since, in practice, many
>   genes work together to effect a particular phenotype. LPC effectively
>   down-weights individual genes that are associated with the outcome but
>   that do not share an expression pattern with a larger group of genes,
>   and instead favors large groups of genes that appear to be
>   differentially-expressed.'';regress principal components on outcome}
> }
>
>>
>> Ravi Varadhan wrote:
>>> Principal components analysis does "dimensionality reduction" but NOT
>>> "variable reduction". However, Jolliffe's 2004 book on PCA does discuss
>>> the problem of selecting a subset of variables, with the goal of
>>> representing the internal variation of original multivariate vector as
>>> well as possible (see Section 6.3 of that book). I do not think that
>>> these methods can handle missing data. The most important issue is to
>>> think about the goal of variable reduction and then choose an
>>> appropriate optimality criterion for achieving that goal. In most
>>> instances of variable selection, the criterion that is optimized is
>>> never explicitly considered.
>>>
>>> Ravi.
>>>
>>> -----------------------------------------------------------------------
>>>
>>> Ravi Varadhan, Ph.D.
>>> Assistant Professor, The Center on Aging and Health
>>> Division of Geriatric Medicine and Gerontology
>>> Johns Hopkins University
>>> Ph: (410) 502-2619
>>> Fax: (410) 614-9625
>>> Email: [EMAIL PROTECTED]
>>> Webpage:
>>> http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
>>>
>>> -----------------------------------------------------------------------
>>>
>>>
>>> -----Original Message-----
>>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gabor Grothendieck
>>> Sent: Tuesday, December 09, 2008 8:00 AM
>>> To: Harsh
>>> Cc: [email protected]
>>> Subject: Re: [R] Pre-model Variable Reduction
>>>
>>> See:
>>>
>>> ?prcomp
>>> ?princomp
>>>
>>> On Tue, Dec 9, 2008 at 5:34 AM, Harsh <[EMAIL PROTECTED]> wrote:
>>>> Hello All,
>>>> I am trying to carry out variable reduction. I do not have information
>>>> about the dependent variable, and have only the X variables as it
>>>> were.
>>>> In selecting variables I wish to keep, I have considered the following
>>>> criteria.
>>>> 1) Percentage of missing value in each column/variable
>>>> 2) Variance of each variable, with a cut-off value.
>>>>
>>>> I recently came across Weka and found that there is an RWeka package
>>>> which would allow me to make use of Weka through R.
>>>> Weka provides a "Genetic search" variable reduction method, but I
>>>> could not find its R code implementation in the RWeka Pdf file on
>>>> CRAN.
>>>>
>>>> I looked for other R packages that allow me to do variable reduction
>>>> without considering a dependent variable. I came across 'dprep'
>>>> package but it does not have a Windows implementation.
>>>>
>>>> Moreover, I have a dataset that contains continuous and categorical
>>>> variables, some categorical variables having 3 levels, 10 levels and
>>>> so on, till a max 50 levels (E.g. States in the USA).
>>>>
>>>> Any suggestions in this regard will be much appreciated.
>>>>
>>>> Thank you
>>>>
>>>> Harsh Singhal
>>>> Decision Systems,
>>>> Mu Sigma, Inc.
>>>>
>>>> ______________________________________________
>>>> [email protected] mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>
>
> --
> Frank E Harrell Jr   Professor and Chair           School of Medicine
>                      Department of Biostatistics   Vanderbilt University
>

--
View this message in context: http://www.nabble.com/Pre-model-Variable-Reduction-tp20912229p20919501.html
Sent from the R help mailing list archive at Nabble.com.
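
For readers of the archive, here is a minimal sketch of the two screening criteria listed in the original question (share of missing values per column, and a variance cut-off), followed by the prcomp call that was suggested in the thread. The data frame, its column names, and both thresholds are made up for illustration and are not taken from any poster's data. The sketch only covers numeric columns; the categorical variables mentioned in the question are not handled here.

```r
# Hypothetical example data: 100 rows, four numeric columns (names are made up).
set.seed(1)
n <- 100
X <- data.frame(
  a = rnorm(n),
  b = rnorm(n, sd = 5),
  c = rep(1, n),                     # constant column: zero variance
  d = c(rnorm(40), rep(NA, 60))      # 60% missing
)
X$b[1:5] <- NA                       # 5% missing

# Assumed cut-offs; choose values that suit your own data.
max_missing  <- 0.5
min_variance <- 1e-8

# Criterion 1: proportion of missing values in each column.
prop_missing <- colMeans(is.na(X))

# Criterion 2: variance of each numeric column, ignoring missing values.
is_num  <- vapply(X, is.numeric, logical(1))
col_var <- vapply(X[is_num], var, numeric(1), na.rm = TRUE)

# Keep columns that pass both criteria.
keep_missing <- names(prop_missing)[prop_missing <= max_missing]
keep_var     <- names(col_var)[col_var > min_variance]
keep         <- intersect(keep_missing, keep_var)
X_reduced    <- X[, keep, drop = FALSE]

# PCA on the complete cases of the retained columns (see ?prcomp).
# Rows with any NA are dropped first via na.omit().
pc <- prcomp(na.omit(X_reduced), center = TRUE, scale. = TRUE)
print(summary(pc))    # proportion of variance per component
print(pc$rotation)    # loadings: how each variable enters each component
```

Note that a raw variance cut-off depends on the units of each variable, so the threshold above is only meaningful for catching (near-)constant columns. As pointed out in the thread, the components returned by prcomp are combinations of all retained variables, so this step reduces dimension but does not by itself select variables.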

