Re: [R] impute missing values in correlated variables: transcan?

roger koenker Tue, 30 Nov 2004 09:23:31 -0800

At the risk of stirring up a hornet's nest, I'd suggest that
means are dangerous in such applications: a single rater's
unusually high or low score can move them a long way.  A nice
paper on combining ratings is Gilbert Bassett and Joseph Persky,
"Rating Skating," JASA, 1994, pp. 1075-1079.
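To make the concern about means concrete, here is a small R sketch with made-up ratings (the numbers and object names are hypothetical). One rater's unusually low score is enough to reverse the order of two applicants under the mean but not under the median. The median is used here only as a familiar robust alternative; it is not claimed to be the procedure Bassett and Persky analyze.

```r
# Hypothetical ratings: rows are applicants, columns are the four raters.
ratings <- rbind(
  applicant_A = c(4, 4, 4, 1),          # three raters agree, one is far below
  applicant_B = c(3.5, 3.5, 3.5, 3.5)   # all raters agree
)
colnames(ratings) <- paste("R", 1:4, sep = "")

# Row means: the single low score pulls applicant_A below applicant_B.
mean_score <- rowMeans(ratings)

# Row medians: the single low score has little influence.
median_score <- apply(ratings, 1, median)

print(cbind(mean = mean_score, median = median_score))
```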


url:        www.econ.uiuc.edu/~roger                Roger Koenker
email   [EMAIL PROTECTED]                       Department of Economics
vox:    217-333-4558                            University of Illinois
fax:    217-244-6678                            Champaign, IL 61820

On Nov 30, 2004, at 10:52 AM, Jonathan Baron wrote:

I would like to impute missing data in a set of correlated
variables (columns of a matrix).  It looks like transcan() from
Hmisc is roughly what I want.  Its help page says, "transcan
automatically transforms continuous and categorical variables to
have maximum correlation with the best linear combination of the
other variables," and, "By default, transcan imputes NAs with
'best guess' expected values of transformed variables, back
transformed to the original scale."

But I can't get it to work.  I say

m1 <- matrix(1:20+rnorm(20),5,)  # four correlated variables
colnames(m1) <- paste("R",1:4,sep="")
m1[c(2,19)] <- NA                # simulate some missing data
library(Hmisc)
transcan(m1,data=m1)

and I get

Error in rcspline.eval(y, nk = nk, inclx = TRUE) :
      fewer than 6 non-missing observations with knots omitted

I've tried a few other things, but I think it is time to ask for
help.

The specific problem is a real one.  Our graduate admissions
committee (4 members) rates applications, and we average the
ratings to get an overall rating for each applicant.  Sometimes
one of the committee members is absent, or late; hence the
missing data.  The members differ in the way they use the rating
scale, in both slope and intercept (if you regress each on the
mean).  Many decisions end up depending on the second decimal
place of the averages, so we want to do better than just averaging
the non-missing ratings.
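A small simulation shows why averaging only the non-missing ratings is unfair when raters differ in intercept and slope. Everything below is hypothetical and noise-free for clarity: each rater's score is an assumed linear function of an applicant's latent merit, and one lenient rater (R4) misses some applicants.

```r
set.seed(1)
n <- 40
true_merit <- rnorm(n)                 # hypothetical latent quality of each applicant
intercepts <- c(0, 0.5, -0.5, 1)       # hypothetical per-rater offsets (R4 most lenient)
slopes     <- c(1, 0.8, 1.2, 0.6)      # hypothetical per-rater use of the scale
ratings <- sapply(1:4, function(j) intercepts[j] + slopes[j] * true_merit)
colnames(ratings) <- paste("R", 1:4, sep = "")

# Average each applicant would get if all four raters were present.
full_avg <- rowMeans(ratings)

# Rater R4 is absent for the first 10 applicants.
obs <- ratings
obs[1:10, "R4"] <- NA
naive_avg <- rowMeans(obs, na.rm = TRUE)

# Applicants R4 did not rate are scored on a different effective scale:
# they lose R4's offset, so their naive averages are shifted relative to the rest.
bias_missing  <- mean(naive_avg[1:10] - full_avg[1:10])
bias_complete <- mean(naive_avg[11:n] - full_avg[11:n])
print(c(missing = bias_missing, complete = bias_complete))
```

With these made-up parameters the shift for an applicant R4 did not rate is 0.1 times that applicant's merit minus 0.25, so it depends on the applicant as well as on who was absent.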

Maybe I'm just not seeing something really simple.  In fact, the
problem is simpler than transcan assumes, since we are willing to
assume linearity of the regression of each variable on the other
variables.  Other members proposed solutions that assumed this,
but they did not take into account the fact that missing data at
the high or low end of each variable (each member's ratings)
would change its mean.
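Under the linearity Jon is willing to assume, one simple alternative to transcan is to fill each missing rating with a least-squares prediction from the other raters' ratings, cycling through the raters until the fills settle. The helper below, impute_linear, is hypothetical (it is not part of Hmisc or base R) and uses only base R's lm.fit. It gives single "best guess" imputations and ignores imputation uncertainty. Because each rater's intercept and slope are estimated from the rows that rater actually rated, the fills are not distorted by where that rater's missing ratings fall, assuming the chance of a rating being missing depends on the other raters' scores rather than on the absent rater's own unrecorded score. Each rater needs at least as many observed rows as there are raters (an intercept plus one slope per other rater).

```r
# Hypothetical helper: iterative linear-regression imputation.
# Each column's missing entries are predicted from a regression of that
# column on all other columns, fitted on the rows where it was observed.
impute_linear <- function(x, iterations = 20) {
  miss <- is.na(x)
  # Start from each column's observed mean.
  for (j in seq_len(ncol(x))) {
    x[miss[, j], j] <- mean(x[!miss[, j], j])
  }
  for (it in seq_len(iterations)) {
    for (j in seq_len(ncol(x))) {
      if (!any(miss[, j])) next
      others <- x[, -j, drop = FALSE]
      # Least-squares fit with intercept on the rows where column j was observed.
      fit <- lm.fit(cbind(1, others[!miss[, j], , drop = FALSE]), x[!miss[, j], j])
      # Replace only the originally missing cells with the fitted predictions.
      x[miss[, j], j] <- cbind(1, others[miss[, j], , drop = FALSE]) %*% fit$coefficients
    }
  }
  x
}

# Usage on simulated ratings (all parameters are made up).
set.seed(2)
merit <- rnorm(30)
a <- c(0, 0.5, -0.5, 1)
b <- c(1, 0.8, 1.2, 0.6)
m2 <- sapply(1:4, function(j) a[j] + b[j] * merit + rnorm(30, sd = 0.2))
colnames(m2) <- paste("R", 1:4, sep = "")
truth <- m2
m2[c(3, 40, 75, 110)] <- NA          # one missing rating per rater
filled <- impute_linear(m2)
overall <- rowMeans(filled)          # every applicant now averaged over four raters
```

The fixed number of passes keeps the sketch short; a convergence check on the change in imputed values would be a natural refinement.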

Jon
--
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
R search page: http://finzi.psych.upenn.edu/

______________________________________________
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


