[R] Multiple imputation with plausible values already in the data

2007-07-17 Thread Ulrich Keller
Hello,

this is not really an R-related question, but since the posting guide does not
forbid asking non-R questions (and even encourages it to some degree), I thought
I'd give it a try.

I am currently doing some secondary analyses of the PISA (http://pisa.oecd.org)
student data. I would like to treat missing values properly, that is, using
multiple imputation (with the mix package). But I am not sure how to do the
imputation, since the data set provided by the OECD already contains variables
with plausible values.

Roughly, the situation is like this: for each of the cognitive (achievement)
scales, there are five variables holding plausible values. So for example, there
is not one variable for math achievement, but five, pv1math through pv5math.
There are, of course, no missing values on these variables.

Most other variables show some degree of missing data. For example, some
students did not report their parents' occupation, so there is no information
about the socio-economic background (HISEI). This is the kind of data I want to
impute.

My first thought was splitting the data into five datasets, each holding only
one of the plausible value variables, but all of the normal variables. So e.g.
the first data set would include pv1math, pv1read, HISEI, and gender; while the
second would include pv2math, pv2read, HISEI, and gender. I would run mix on the
five data sets independently and end up with five imputed data sets with no
missing values.

But is this a valid approach? There would actually be two imputation runs per
data set: one for the plausible values on the achievement scales (done by the
OECD under an unknown model), and one for the other variables (done by me with
mix). The second run would use data from the first. Would this not lead to an
overestimation of the imputation variance? What alternative approaches are 
there?
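To make the splitting idea concrete, here is a rough sketch of what it might look like with the mix package. This is only an illustration under assumptions: the file name and column set are made up, and the real PISA files contain many more variables.

```r
## Sketch of the splitting approach (hypothetical file and column names).
library(mix)  # Schafer's imputation for mixed categorical/continuous data

pisa <- read.csv("pisa_student.csv")   # assumed input; gender coded 1/2

imputed <- lapply(1:5, function(i) {
  d <- as.matrix(pisa[, c("gender",                # categorical column first
                          paste0("pv", i, "math"),
                          paste0("pv", i, "read"),
                          "HISEI")])
  s  <- prelim.mix(d, p = 1)           # first p columns are categorical
  th <- em.mix(s)                      # ML estimate as a starting value
  rngseed(1234 + i)                    # required before da.mix/imp.mix
  th <- da.mix(s, th, steps = 100)     # data augmentation
  imp.mix(s, th, d)                    # one completed data matrix
})
```

The result would be five completed data sets, one per plausible-value draw, as described above.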

Thank you in advance for your answers,

Uli

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multiple Imputation / Non Parametric Models / Combining Results

2006-12-11 Thread Michael Dewey
At 08:12 08/12/2006, Simon P. Kempf wrote:
Dear R-Users,



The following question is more general in nature than merely technical.
Nevertheless, I hope someone can give me some answers.



I am in no sense an expert in this area, but since 
it seems that no one else has answered so far, I 
wonder whether the mitools package from CRAN helps?



Michael Dewey
http://www.aghmed.fsnet.co.uk



[R] Multiple Imputation / Non Parametric Models / Combining Results

2006-12-08 Thread Simon P. Kempf
Dear R-Users,

 

The following question is more general in nature than merely technical.
Nevertheless, I hope someone can give me some answers.

 

I have been using the mice package to perform the multiple imputations. So
far, everything works fine with the standard regressions analysis. 

 

However, I am wondering whether it is theoretically correct to fit
nonparametric models (GAM, spline smoothing, etc.) to multiply imputed
datasets. If yes, how can I combine the results in order to reflect the
uncertainty?

 

In the research field of real estate economics, the problem of missing data
is often ignored or left unmentioned. However, GAM, spline smoothing,
etc. are becoming increasingly popular. In my research, I would like to use
multiply imputed datasets and GAM, but I am unsure how to present the
combined results.
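For what it is worth, one common way to proceed (a sketch under assumed variable names, not a definitive recipe) is to fit the GAM to each completed data set and pool any approximately normal scalar of interest with Rubin's rules:

```r
## Sketch: GAM on each mice-completed data set, then Rubin's rules for a
## scalar estimate.  'dat', 'price' and 'area' are made-up names.
library(mice)
library(mgcv)

imp  <- mice(dat, m = 5, printFlag = FALSE)     # 'dat' has missing values
fits <- lapply(1:5, function(i)
  gam(price ~ s(area), data = complete(imp, i)))

q <- sapply(fits, function(f) coef(f)["(Intercept)"])                # estimates
u <- sapply(fits, function(f) vcov(f)["(Intercept)", "(Intercept)"]) # variances
qbar  <- mean(q)                       # pooled point estimate
total <- mean(u) + (1 + 1/5) * var(q)  # within- + between-imputation variance
se    <- sqrt(total)
```

Pooling the fitted smooths themselves (curves, effective degrees of freedom) is less straightforward; only quantities with roughly normal sampling distributions pool cleanly this way.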

 

Again, I apologize that this is a theoretical statistical question rather
than a technical question about R.

 

Thanks in advance for any hints and advice.

 

Simon

Simon P. Kempf 

Dipl.-Kfm. MScRE Immobilienökonom (ebs)

Wissenschaftlicher Assistent

 

Büro:

IREBS Immobilienakademie

c/o ebs Immobilienakademie GmbH

Berliner Str. 26a

13507 Berlin

 

Privat:

Dunckerstraße 60

10439 Berlin

 

Mobil: 0176 7002 6687

Email: [EMAIL PROTECTED]

 





[R] multiple imputation

2006-09-27 Thread ozric
Hi,

is it correct that multiple imputation, as implemented in e.g. mice
(http://www.imputation.com), cannot be treated as a standard data-mining
task, because there is no generalization mechanism for applying the fitted
model to a completely new and bigger dataset with a predict method?

many thanks & regards,
christian



Re: [R] multiple imputation

2006-09-27 Thread Frank E Harrell Jr
[EMAIL PROTECTED] wrote:
 Hi,
 
 is it correct that multiple imputation, as implemented in e.g. mice
 (http://www.imputation.com), cannot be treated as a standard data-mining
 task, because there is no generalization mechanism for applying the fitted
 model to a completely new and bigger dataset with a predict method?
 
 many thanks & regards,
 christian

This is something we need.  I have not written a predict method for 
aregImpute in the Hmisc package yet (and soon a completely re-written 
version of aregImpute will be posted) but the framework in aregImpute 
may allow such a method to be written.  Volunteers welcome.

Frank

-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University



[R] Multiple imputation using mice with mean

2006-09-25 Thread Eleni Rapsomaniki

Hi

I am trying to impute missing values for my data.frame. As I intend to use the
complete data for prediction I am currently measuring the success of an
imputation method by its resulting classification error in my training data.

I have tried several approaches to replace missing values:
- mean/median substitution
- substitution by a value selected from the observed values of a variable
- MLE in the mix package
- all available methods for numerical data in the MICE package (i.e. pmm,
sample, mean, and norm)

I found that the smallest classification error results from using mice with
the mean option for numerical data. However, I am not sure how mean multiple
imputation differs from simple mean substitution. I tried to read some of
the documentation supporting the R package, but couldn't find much theory about
the mean imputation method. 

Are there any good papers to explain the background behind each imputation
option in MICE? 

I would really appreciate any comments on the above, as my understanding of
statistics is very limited. 

Many thanks
Eleni Rapsomaniki
Birkbeck College, UK



Re: [R] Multiple imputation using mice with mean

2006-09-25 Thread Ted Harding
On 25-Sep-06 Eleni Rapsomaniki wrote:
 Hi
 
 I am trying to impute missing values for my data.frame. As I
 intend to use the complete data for prediction I am currently
 measuring the success of an imputation method by its resulting
 classification error in my training data.
 
 I have tried several approaches to replace missing values:
 - mean/median substitution
 - substitution by a value selected from the observed values of
 a variable
 - MLE in the mix package
 - all available methods for numerical data in the MICE package
 (ie. pmm, sample, mean and norm)
 
 I found that the smallest classification error results from using
 mice with the mean option for numerical data. However, I am not
 sure how mean multiple imputation differs from simple
 mean substitution. I tried to read some of the documentation
 supporting the R package, but couldn't find much theory about
 the mean imputation method. 
 
 Are there any good papers to explain the background behind each
 imputation option in MICE? 

I agree that the MICE documentation tends to be silent about some
important questions, both in the R/S help pages and in the
MICE user's manual, which can be found at

  http://web.inter.nl.net/users/S.van.Buuren/mi/docs/Manual.pdf

Possibly it could be worth looking at some of the other relevant
reports listed at

  http://web.inter.nl.net/users/S.van.Buuren/mi/hmtl/mice.htm

but they do not look very hopeful.

That being said, my understanding relating to your query is as follows
(glossing over the technicalities of the Gibbs sampling methods
used in (b)):

a) mean/median substitution refers to the very basic method
   of substituting, for a missing value, the arithmetic mean
   of the non-missing values for that variable, possibly
   with selection of cases with non-missing values so as to
   approximately match the observed covariates of the case
   being imputed.
 
b) mean imputation in MICE (as far as I can infer it) means
   that the distribution of the missing value (conditional
   on its observed covariates) is inferred from the cases
   with non-missing values, and the mean of this conditional
   distribution is substituted for the missing value.

These two approaches will in general give different results.
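A tiny base-R illustration of the difference between (a) and (b), on toy data rather than anything from MICE itself:

```r
## (a) unconditional mean substitution vs (b) conditional-mean imputation.
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
y[1:10] <- NA                                   # introduce missing values

## (a) arithmetic mean of the non-missing values
y_a <- ifelse(is.na(y), mean(y, na.rm = TRUE), y)

## (b) mean of the conditional distribution of y given x,
##     estimated from the complete cases
fit <- lm(y ~ x, subset = !is.na(y))
y_b <- ifelse(is.na(y), predict(fit, newdata = data.frame(x = x)), y)

## The two imputations differ wherever x is far from its mean.
```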

Some further comments.

1. I would suggest that you consider the full multiple imputation
approach. Filling in missing values just once, and then using
the completed results (for prediction, in your case) in some
procedure which treats them as though they were observed values,
will not take into account the uncertainty as to what values
they should have (as opposed to the values they were imputed to have).

When multiple imputation is used, the variation from imputation
to imputation in the imputed values will represent this
uncertainty, and so a more realistic picture of the overall
uncertainty of prediction can be obtained.

2. You stated that one method tried was MLE in the mix package.
MLE (maximum likelihood estimation) using the EM algorithm is
implemented in the mix functions em.mix and ecm.mix, but neither
of these produces values to substitute for missing data. The
result is essentially just parameter estimation by MLE based
on the incomplete data.

Values to substitute for missing data are produced by other
functions, such as imp.mix; but these are randomly sampled
from the conditional distributions of the missing values and
therefore, each time it is done, the results are different.
In particular, the first value you sample will be random.
Hence the values you impute will be more or less good, in
terms of your training set, depending on the luck of the
draw when you use (say) imp.mix.

I don't know if I have understood what you meant by MLE in
the mix package, but if the above is a correct understanding
then the remarks under (1) apply: in particular, as just
noted, that comparing a single imputation with your training
set is an uncertain comparison.

Hoping this helps,
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 25-Sep-06   Time: 15:33:59
-- XFMail --



[R] multiple imputation of anova tables

2005-11-25 Thread Leo Gürtler
Dear list members,

how can multiple imputation be realized for ANOVA tables in R? Concretely,
how does one combine F-values and R^2 / R^2_adjusted from multiple
imputations in R?

Of course, the point estimates can be averaged, but how does one get
standard errors for F-values/R^2 etc. in R?
For linear models, lm.mids() works well, but according to Rubin's rules,
standard errors have to be used together with the estimates to get
unbiased estimates. The same is needed for lme models. For the
regression coefficients of lme there is no problem, because s.e.'s are
present. But how does one combine AIC/BIC and logLik, and especially how
does one proceed with the random effects in lme's? I assume there is a
general rule which can be applied to all these cases, but I cannot get it
right.
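The basic Rubin's-rules pooling that lm.mids() applies to regression coefficients can be sketched as follows (toy numbers; it applies to approximately normal scalar estimates, which is exactly why F-values and R^2 are awkward and need transformations or special multi-parameter combining rules):

```r
## Rubin's rules for a scalar estimate over m imputations (toy numbers).
q <- c(1.02, 0.97, 1.10, 1.05, 0.99)  # per-imputation point estimates
u <- c(0.04, 0.05, 0.04, 0.06, 0.05)  # per-imputation squared std. errors
m <- length(q)

qbar  <- mean(q)                      # pooled point estimate
b     <- var(q)                       # between-imputation variance
total <- mean(u) + (1 + 1/m) * b      # total variance
se    <- sqrt(total)                  # pooled standard error
```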

e.g.

  anova(limo1)
Analysis of Variance Table

Response: lverb.ona
           Df Sum Sq Mean Sq F value  Pr(>F)
klasse      6  301.6    50.3  2.0985 0.05514 .
Residuals 193 4623.3    24.0
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

or (from the manpage of lme)

   summary(fm2)
Linear mixed-effects model fit by REML
 Data: Orthodont
       AIC      BIC    logLik
  447.5125 460.7823 -218.7563

Random effects:
 Formula: ~1 | Subject
(Intercept) Residual
StdDev:1.807425 1.431592

Fixed effects: distance ~ age + Sex
                Value Std.Error DF   t-value p-value
(Intercept) 17.706713 0.8339225 80 21.233044  0.0000
age          0.660185 0.0616059 80 10.716263  0.0000
SexFemale   -2.321023 0.7614168 25 -3.048294  0.0054
 Correlation:
  (Intr) age  
age   -0.813  
SexFemale -0.372  0.000

Standardized Within-Group Residuals:
Min  Q1 Med  Q3 Max
-3.74889609 -0.55034466 -0.02516628  0.45341781  3.65746539

Number of Observations: 108
Number of Groups: 27
 

and the ANOVA of the lme:

  anova(fm2)
            numDF denDF  F-value p-value
(Intercept)     1    80 4123.156  <.0001
age             1    80  114.838  <.0001
Sex             1    25    9.292  0.0054

I am confused about that and I did not find any hint in norm, 
mice/pan/mix or Hmisc.

Any help and hints are appreciated,

best regards

Leo Gürtler / Germany



Re: [R] multiple imputation with fit.mult.impute in Hmisc

2003-07-28 Thread Jonathan Baron
Thanks for the quick reply!  One more question, below.

On 07/27/03 22:20, Frank E Harrell Jr wrote:
On Sun, 27 Jul 2003 14:47:30 -0400
Jonathan Baron [EMAIL PROTECTED] wrote:

 I have always avoided missing data by keeping my distance from
 the real world.  But I have a student who is doing a study of
 real patients.  We're trying to test regression models using
 multiple imputation.  We did the following (roughly):
 
 f <- aregImpute(~ [list of 32 variables, separated by + signs],
  n.impute=20, defaultLinear=T, data=t1)
 # I read that 20 is better than the default of 5.
 # defaultLinear makes sense for our data.
 
 fmp <- fit.mult.impute(Y ~ X1 + X2 ... [for the model of interest],
  xtrans=f, fitter=lm, data=t1)
 
 and all goes well (usually) except that we get the following
 message at the end of the last step:
 
  Warning message: Not using a Design fitting function;
  summary(fit) will use standard errors, t, P from last imputation
  only.  Use Varcov(fit) to get the correct covariance matrix,
  sqrt(diag(Varcov(fit))) to get s.e.
 
 I did try using sqrt(diag(Varcov(fmp))), as it suggested, and it
 didn't seem to change anything from when I did summary(fmp).
 
 But this Warning message sounds scary.  It sounds like the whole
 process of multiple imputation is being ignored, if only the last
 one is being used.

The warning message may be ignored.  But the advice to use Varcov(fmp) is faulty for 
lm fits - I will fix that in the next release of Hmisc.  You may get the 
imputation-corrected covariance matrix for now using fmp$var

Then it seems to me that summary(fmp) is also giving incorrect
std. errors, t, and p.  Right?  It seems to use Varcov(fmp) and not
fmp$var.

 So I discovered I could get rid of this warning by loading the
 Design library and then using ols instead of lm as the fitter in
 fit.mult.impute.  It seems that ols provides a variance/covariance
 matrix (or something) that fit.mult.impute can use.

That works too.

That gives me what I get if I use lm and then recalculate the t
values by hand from fmp$var.  Thus, ols seems like the way to
go for now, if only to avoid additional calculations.
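For anyone following along, the hand recalculation mentioned above might look roughly like this (a sketch; it assumes fmp came from fit.mult.impute with fitter=lm, so that fmp$var holds the imputation-corrected covariance matrix as described in this thread):

```r
## Imputation-corrected summary statistics from fmp$var (sketch).
b  <- coef(fmp)
se <- sqrt(diag(fmp$var))             # corrected standard errors
tv <- b / se
p  <- 2 * pt(-abs(tv), df = fmp$df.residual)
cbind(Estimate = b, SE = se, t = tv, p = p)
```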

Jon



Re: [R] multiple imputation with fit.mult.impute in Hmisc

2003-07-28 Thread Frank E Harrell Jr
On Mon, 28 Jul 2003 08:18:09 -0400
Jonathan Baron [EMAIL PROTECTED] wrote:

 Thanks for the quick reply!  One more question, below.
 
 On 07/27/03 22:20, Frank E Harrell Jr wrote:
 On Sun, 27 Jul 2003 14:47:30 -0400
 Jonathan Baron [EMAIL PROTECTED] wrote:
 
  I have always avoided missing data by keeping my distance from
  the real world.  But I have a student who is doing a study of
  real patients.  We're trying to test regression models using
  multiple imputation.  We did the following (roughly):
  
  f <- aregImpute(~ [list of 32 variables, separated by + signs],
   n.impute=20, defaultLinear=T, data=t1)
  # I read that 20 is better than the default of 5.
  # defaultLinear makes sense for our data.
  
  fmp <- fit.mult.impute(Y ~ X1 + X2 ... [for the model of interest],
   xtrans=f, fitter=lm, data=t1)
  
  and all goes well (usually) except that we get the following
  message at the end of the last step:
  
   Warning message: Not using a Design fitting function;
   summary(fit) will use standard errors, t, P from last imputation
   only.  Use Varcov(fit) to get the correct covariance matrix,
   sqrt(diag(Varcov(fit))) to get s.e.
  
  I did try using sqrt(diag(Varcov(fmp))), as it suggested, and it
  didn't seem to change anything from when I did summary(fmp).
  
  But this Warning message sounds scary.  It sounds like the whole
  process of multiple imputation is being ignored, if only the last
  one is being used.
 
 The warning message may be ignored.  But the advice to use Varcov(fmp) is faulty 
 for 
 lm fits - I will fix that in the next release of Hmisc.  You may get the 
 imputation-corrected covariance matrix for now using fmp$var
 
  Then it seems to me that summary(fmp) is also giving incorrect
  std. errors, t, and p.  Right?  It seems to use Varcov(fmp) and not
  fmp$var.

summary is using the usual lm output, for the last fit, so it is not adjusted for 
multiple imputation.  Varcov(fmp) is using what summary uses because I forgot to tell 
Varcov.lm to look for fmp$var first.

Frank

 
  So I discovered I could get rid of this warning by loading the
  Design library and then using ols instead of lm as the fitter in
  fit.mult.impute.  It seems that ols provides a variance/covariance
  matrix (or something) that fit.mult.impute can use.
 
 That works too.
 
 That gives me what I get if I use lm and then recalculate the t
 values by hand from fmp$var.  Thus, ols seems like the way to
 go for now, if only to avoid additional calculations.
 
 Jon
 


---
Frank E Harrell Jr  Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem.  Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat



[R] multiple imputation with fit.mult.impute in Hmisc

2003-07-27 Thread Jonathan Baron
I have always avoided missing data by keeping my distance from
the real world.  But I have a student who is doing a study of
real patients.  We're trying to test regression models using
multiple imputation.  We did the following (roughly):

f <- aregImpute(~ [list of 32 variables, separated by + signs],
 n.impute=20, defaultLinear=T, data=t1)
# I read that 20 is better than the default of 5.
# defaultLinear makes sense for our data.

fmp <- fit.mult.impute(Y ~ X1 + X2 ... [for the model of interest],
 xtrans=f, fitter=lm, data=t1)

and all goes well (usually) except that we get the following
message at the end of the last step:

 Warning message: Not using a Design fitting function;
 summary(fit) will use standard errors, t, P from last imputation
 only.  Use Varcov(fit) to get the correct covariance matrix,
 sqrt(diag(Varcov(fit))) to get s.e.

I did try using sqrt(diag(Varcov(fmp))), as it suggested, and it
didn't seem to change anything from when I did summary(fmp).

But this Warning message sounds scary.  It sounds like the whole
process of multiple imputation is being ignored, if only the last
one is being used.

So I discovered I could get rid of this warning by loading the
Design library and then using ols instead of lm as the fitter in
fit.mult.impute.  It seems that ols provides a variance/covariance
matrix (or something) that fit.mult.impute can use.

But here I am beyond my (very recently acquired) understanding of
what this is all about.

Should I worry about that warning message?  Or am I maybe off the
track in some larger way?

-- 
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page:http://www.sas.upenn.edu/~baron
R page:   http://finzi.psych.upenn.edu/



Re: [R] multiple imputation with fit.mult.impute in Hmisc

2003-07-27 Thread Frank E Harrell Jr
On Sun, 27 Jul 2003 14:47:30 -0400
Jonathan Baron [EMAIL PROTECTED] wrote:

 I have always avoided missing data by keeping my distance from
 the real world.  But I have a student who is doing a study of
 real patients.  We're trying to test regression models using
 multiple imputation.  We did the following (roughly):
 
 f <- aregImpute(~ [list of 32 variables, separated by + signs],
  n.impute=20, defaultLinear=T, data=t1)
 # I read that 20 is better than the default of 5.
 # defaultLinear makes sense for our data.
 
 fmp <- fit.mult.impute(Y ~ X1 + X2 ... [for the model of interest],
  xtrans=f, fitter=lm, data=t1)
 
 and all goes well (usually) except that we get the following
 message at the end of the last step:
 
  Warning message: Not using a Design fitting function;
  summary(fit) will use standard errors, t, P from last imputation
  only.  Use Varcov(fit) to get the correct covariance matrix,
  sqrt(diag(Varcov(fit))) to get s.e.
 
 I did try using sqrt(diag(Varcov(fmp))), as it suggested, and it
 didn't seem to change anything from when I did summary(fmp).
 
 But this Warning message sounds scary.  It sounds like the whole
 process of multiple imputation is being ignored, if only the last
 one is being used.

The warning message may be ignored.  But the advice to use Varcov(fmp) is faulty for 
lm fits - I will fix that in the next release of Hmisc.  You may get the 
imputation-corrected covariance matrix for now using fmp$var


 
 So I discovered I could get rid of this warning by loading the
 Design library and then using ols instead of lm as the fitter in
 fit.mult.impute.  It seems that ols provides a variance/covariance
 matrix (or something) that fit.mult.impute can use.

That works too.

Frank

 
 But here I am beyond my (very recently acquired) understanding of
 what this is all about.
 
 Should I worry about that warning message?  Or am I maybe off the
 track in some larger way?
 
 -- 
 Jonathan Baron, Professor of Psychology, University of Pennsylvania
 Home page:http://www.sas.upenn.edu/~baron
 R page:   http://finzi.psych.upenn.edu/
 


---
Frank E Harrell Jr  Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem.  Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat



[R] Multiple imputation

2003-06-12 Thread Jonck van der Kogel
Hi all,
I'm currently working with a dataset that has quite a few missing 
values and after some investigation I figured that multiple imputation 
is probably the best solution to handle the missing data in my case. I 
found several references to functions in S-Plus that perform multiple 
imputation (NORM, CAT, MIX, PAN). Does R have corresponding functions?
I searched the archives but was not able to find anything conclusive 
there.
Any help on this subject is much appreciated.
Thanks, Jonck



RE: [R] Multiple imputation

2003-06-12 Thread Simon Blomberg
There is also the mice package at http://www.multiple-imputation.com. 
CRAN has package norm.

Simon.

Simon Blomberg, PhD
Depression & Anxiety Consumer Research Unit
Centre for Mental Health Research
Australian National University
http://www.anu.edu.au/cmhr/
[EMAIL PROTECTED]  +61 (2) 6125 3379






Re: [R] Multiple imputation

2003-06-12 Thread Frank E Harrell Jr
On Thu, 12 Jun 2003 23:57:45 +0200
Jonck van der Kogel [EMAIL PROTECTED] wrote:

 Hi all,
 I'm currently working with a dataset that has quite a few missing 
 values and after some investigation I figured that multiple imputation 
 is probably the best solution to handle the missing data in my case. I 
 found several references to functions in S-Plus that perform multiple 
 imputation (NORM, CAT, MIX, PAN). Does R have corresponding functions?
 I searched the archives but was not able to find anything conclusive 
 there.
 Any help on this subject is much appreciated.
 Thanks, Jonck
 

Look at the aregImpute function in the Hmisc package 
(http://hesweb1.med.virginia.edu/biostat/s/Hmisc.html).  aregImpute uses the 
bootstrap, predictive mean matching, and flexible additive regression models to do 
multiple imputation.  In one simulation study it performs as well as MICE but it runs 
much faster and does not assume linearity in the imputation models.  I hope that 
someday we'll have simulation studies comparing aregImpute with NORM.
---
Frank E Harrell Jr  Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem.  Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat



Re: [R] Multiple imputation

2003-06-12 Thread John Fox
Dear Jonck,

In addition, there are ports of both norm and mix in the 
contributed-packages section of CRAN.

Regards,
 John
-
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: [EMAIL PROTECTED]
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox