I am currently working on this (and on the predict method for prcomp, which does exist, BTW). It needs a bit more in the way of sanity checks.

Note that the predict method for lm is for a formula-driven fit, whereas that for princomp is not, hence some of the differences. It is not reasonable to apply the docs for predict.lm to predict.princomp, and they do not work the same way.

On Thu, 24 Mar 2005, Liaw, Andy wrote:

[Re-directing to R-devel, as I think this needs changes to the code.]

Can I suggest a modification to stats:::predict.princomp so that it will check
for column (variable) names?

In src/library/stats/R/princomp-add.R, insert the following after line 4:

   if (!is.null(cn <- names(object$center))) newdata <- newdata[, cn]

Now Dana's example looks like:

predict(pca1, frz)
Error in "[.data.frame"(newdata, , names(object$center)) :
       undefined columns selected
names(frz) <- c("x2", "x1")
predict(pca1, frz)
       Comp.1      Comp.2
1  -3.29329963 -1.24675774
2   0.15760569  0.09364550
3   1.90206906  0.06292855
4  -0.92968723  0.64356801
5  -1.15298669  0.25451588
6   0.48466884 -0.87611668
7   0.98602646 -0.52156549
8  -1.53126034 -0.96259529
9  -0.79112984 -1.50831648
10  0.02997392 -0.18888807
names(frz) <- c("x1", "x2")
predict(pca1, frz)
       Comp.1      Comp.2
1   2.49603051 -2.42516162
2  -0.15633499  0.15754735
3  -1.77400454  0.81118427
4   1.05941012  0.23869214
5   1.11286213 -0.20669206
6  -0.83645436 -0.60720531
7  -1.15932677 -0.08488413
8   0.98526969 -1.47482877
9   0.09070675 -1.68781215
10 -0.14930067 -0.15239717
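
For readers who cannot patch the stats sources, the column-name check proposed above can be sketched as a user-level wrapper. This is only an illustration: predict_by_name is a hypothetical helper name, not a function in the stats package, and it assumes the fitted object stores its variable names in names(object$center), as the suggested patch does.

```r
# Hypothetical helper (not part of stats): select columns by name before predicting.
predict_by_name <- function(object, newdata) {
    cn <- names(object$center)            # variable names stored at fit time
    if (is.null(cn))
        return(predict(object, newdata))  # fit had no names: fall back to position
    missing_cols <- setdiff(cn, colnames(newdata))
    if (length(missing_cols) > 0)
        stop("newdata lacks columns: ", paste(missing_cols, collapse = ", "))
    # drop = FALSE keeps a one-column selection two-dimensional
    predict(object, newdata[, cn, drop = FALSE])
}

set.seed(1)
frx <- data.frame(x1 = rnorm(10), x2 = rnorm(10))
pca1 <- princomp(frx)

# Same data with the columns swapped: the wrapper restores the fitted order.
swapped <- frx[, c("x2", "x1")]
all.equal(predict_by_name(pca1, swapped), predict(pca1, frx))
```

Stopping with an explicit message when columns are missing mirrors the "undefined columns selected" error shown above, but names the offending variables.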

Best,
Andy

From: Dana Honeycutt

I am working with data sets in which the number and order of columns
may vary, but each column is uniquely identified by its name.  E.g.,
one data set might have columns
        MW logP Num_Rings Num_H_Donors
while another has columns
        Num_Rings Num_Atoms Num_H_Donors logP MW

I would like to be able to perform a principal component analysis (PCA)
on one data set and save the PCA object to a file.  In a later R session,
I would like to load the object and then apply the loadings to a new
data set in order to compute the principal component (PC) values for
each row of new data.
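
A minimal sketch of the save-and-reload part of this workflow, assuming a version of R that provides saveRDS() and readRDS() (save() and load() would serve the same purpose). The simulated data and the temporary file path are illustrative only.

```r
set.seed(42)
# Illustrative training data using two of the column names mentioned above.
train <- data.frame(MW = rnorm(20, 300, 50), logP = rnorm(20, 2, 1))
pca <- princomp(train)

path <- tempfile(fileext = ".rds")   # illustrative location; use a real path in practice
saveRDS(pca, path)

# ... later, possibly in a new R session ...
pca_reloaded <- readRDS(path)

# The reloaded object carries the same loadings and centering vector.
all.equal(unclass(loadings(pca)), unclass(loadings(pca_reloaded)))
names(pca_reloaded$center)
```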

I am trying to use the princomp method in R to do this. (I started
with prcomp, but found that there is no predict method for objects
created by prcomp.)  The problem is that when using predict on a
princomp object, R ignores the names of columns and simply assumes
that the column order is the same as in the original data frame used
to do the PCA.  (This contrasts, for example, with the behavior of a
model produced by lm, which is aware of column names in a data frame.)

What I think I need to do is this:

1. After reloading the princomp object, extract the names and order
of columns that it expects. (If you look at the loadings for the
object, you can see that this info is there, but I would like to
get at it directly somehow.)

2. Reorder the columns in the new data set to correspond to this
expected order, and remove any extra columns.

3. Use the predict method to predict the PC values for the
new data set.

Is this the best approach to achieve what I am attempting?

If so, can anyone tell me how to accomplish steps 1 and 2 above?
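
One way steps 1 to 3 might look, as a sketch rather than a definitive answer. It assumes the princomp object was fitted on a data frame with column names, so that the names are available from rownames(loadings(pca)) (equivalently names(pca$center)). The data are simulated for illustration.

```r
set.seed(7)
# Original data set: four named descriptors.
old <- data.frame(MW = rnorm(15), logP = rnorm(15),
                  Num_Rings = rnorm(15), Num_H_Donors = rnorm(15))
pca <- princomp(old)

# Step 1: recover the names and order of the columns the object expects.
expected <- rownames(loadings(pca))

# New data set: different column order plus an extra column (Num_Atoms).
new <- data.frame(Num_Rings = rnorm(5), Num_Atoms = rnorm(5),
                  Num_H_Donors = rnorm(5), logP = rnorm(5), MW = rnorm(5))

# Step 2: check that nothing is missing, then reorder and drop extras.
stopifnot(all(expected %in% names(new)))
aligned <- new[, expected, drop = FALSE]

# Step 3: compute the PC values for each row of the new data.
scores <- predict(pca, aligned)
dim(scores)
```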

Thanks,
Dana Honeycutt

P.S. Here's a script that demonstrates the problem:

x1 <- rnorm(10)
x2 <- rnorm(10)
y <- rnorm(10)

frx <- data.frame(x1,x2)
frxy <- data.frame(x1,x2,y)

lm1 <- lm(y~x1+x2,frxy)
pca1 <- princomp(frx)

rm(x1,x2,y,frx,frxy)

z1 <- rnorm(10)
z2 <- rnorm(10)
frz <- data.frame(z1,z2)

predict(lm1, frz)  # gives error: Object "x1" not found
predict(pca1, frz) # gives no error, indicating column names ignored

z3 <- rnorm(10)
fr3z <- data.frame(frz,z3)
predict(pca1,fr3z) # gives error due to unexpected number of columns

loadings(pca1) # shows linear combos of variables corresponding to PCs

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html




______________________________________________
R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
