[Re-directing to R-devel, as I think this needs changes to the code.]

Can I suggest a modification to stats:::predict.princomp so that it will check for column (variable) names?
In src/library/stats/R/princomp-add.R, insert the following after line 4:

    if (!is.null(cn <- names(object$center)))
        newdata <- newdata[, cn]

Now Dana's example looks like:

> predict(pca1, frz)
Error in "[.data.frame"(newdata, , names(object$center)) : 
        undefined columns selected
> names(frz) <- c("x2", "x1")
> predict(pca1, frz)
        Comp.1      Comp.2
1  -3.29329963 -1.24675774
2   0.15760569  0.09364550
3   1.90206906  0.06292855
4  -0.92968723  0.64356801
5  -1.15298669  0.25451588
6   0.48466884 -0.87611668
7   0.98602646 -0.52156549
8  -1.53126034 -0.96259529
9  -0.79112984 -1.50831648
10  0.02997392 -0.18888807
> names(frz) <- c("x1", "x2")
> predict(pca1, frz)
        Comp.1      Comp.2
1   2.49603051 -2.42516162
2  -0.15633499  0.15754735
3  -1.77400454  0.81118427
4   1.05941012  0.23869214
5   1.11286213 -0.20669206
6  -0.83645436 -0.60720531
7  -1.15932677 -0.08488413
8   0.98526969 -1.47482877
9   0.09070675 -1.68781215
10 -0.14930067 -0.15239717

Best,
Andy

> From: Dana Honeycutt
>
> I am working with data sets in which the number and order of columns
> may vary, but each column is uniquely identified by its name. E.g.,
> one data set might have columns
>
>     MW logP Num_Rings Num_H_Donors
>
> while another has columns
>
>     Num_Rings Num_Atoms Num_H_Donors logP MW
>
> I would like to be able to perform a principal component analysis (PCA)
> on one data set and save the PCA object to a file. In a later R session,
> I would like to load the object and then apply the loadings to a new
> data set in order to compute the principal component (PC) values for
> each row of new data.
>
> I am trying to use the princomp method in R to do this. (I started
> with prcomp, but found that there is no predict method for objects
> created by prcomp.) The problem is that when using predict on a
> princomp object, R ignores the names of columns and simply assumes
> that the column order is the same as in the original data frame used
> to do the PCA.
> (This contrasts, for example, with the behavior of a model produced
> by lm, which is aware of column names in a data frame.)
>
> What I think I need to do is this:
>
> 1. After reloading the princomp object, extract the names and order
>    of columns that it expects. (If you look at the loadings for the
>    object, you can see that this info is there, but I would like to
>    get at it directly somehow.)
>
> 2. Reorder the columns in the new data set to correspond to this
>    expected order, and remove any extra columns.
>
> 3. Use the predict method to predict the PC values for the
>    new data set.
>
> Is this the best approach to achieve what I am attempting?
>
> If so, can anyone tell me how to accomplish steps 1 and 2 above?
>
> Thanks,
> Dana Honeycutt
>
> P.S. Here's a script that demonstrates the problem:
>
> x1 <- rnorm(10)
> x2 <- rnorm(10)
> y <- rnorm(10)
>
> frx <- data.frame(x1,x2)
> frxy <- data.frame(x1,x2,y)
>
> lm1 <- lm(y~x1+x2,frxy)
> pca1 <- princomp(frx)
>
> rm(x1,x2,y,frx,frxy)
>
> z1 <- rnorm(10)
> z2 <- rnorm(10)
> frz <- data.frame(z1,z2)
>
> predict(lm1, frz) # gives error: Object "x1" not found
> predict(pca1, frz) # gives no error, indicating column names ignored
>
> z3 <- rnorm(10)
> fr3z <- data.frame(frz,z3)
> predict(pca1,fr3z) # gives error due to unexpected number of columns
>
> loadings(pca1) # shows linear combos of variables corresponding to PCs
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

______________________________________________
R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
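Until predict.princomp itself matches by name, the same check can be done on the user side, which also covers Dana's steps 1 and 2. What follows is a minimal sketch, not a tested patch. predict_princomp_by_name is a hypothetical helper, not part of stats. It assumes the fit was made on data with column names, so that names(object$center) is non-NULL. Because those names are stored in the fitted object, they should survive saving it with save() or saveRDS() and reloading it in a later session.

```r
## Hypothetical helper, not part of stats: score new data with a princomp
## fit, matching columns by name instead of by position.
predict_princomp_by_name <- function(object, newdata, ...) {
  ## Step 1: the variable names, in the order the fit expects them.
  ## names(object$center) is what the one-line patch uses;
  ## rownames(loadings(object)) carries the same information.
  cn <- names(object$center)
  if (is.null(cn))
    stop("the fit has no variable names, so columns cannot be matched")
  absent <- setdiff(cn, colnames(newdata))
  if (length(absent) > 0)
    stop("newdata lacks column(s): ", paste(absent, collapse = ", "))
  ## Step 2: reorder to the expected order and drop any extra columns.
  newdata <- newdata[, cn, drop = FALSE]
  ## Step 3: the ordinary predict method now sees the right layout.
  predict(object, newdata, ...)
}

## Usage, in the spirit of Dana's script (data are simulated).
set.seed(1)
frx <- data.frame(x1 = rnorm(10), x2 = rnorm(10))
pca1 <- princomp(frx)

## Same variables, columns swapped, plus an unrelated extra column.
frz <- data.frame(x2 = frx$x2, extra = rnorm(10), x1 = frx$x1)
s1 <- predict_princomp_by_name(pca1, frz)
s2 <- predict(pca1, frx)
all.equal(s1, s2)  # should be TRUE
```

The explicit setdiff() check gives a clearer message than the "undefined columns selected" error from the one-line patch. Using drop = FALSE keeps a one-variable selection as a data frame rather than collapsing it to a vector.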