[R] Multiple imputation with plausible values already in the data
Hello, this is not really an R-related question, but since the posting guide does not forbid asking non-R questions (and even encourages it to some degree), I thought I'd give it a try.

I am currently doing some secondary analyses of the PISA (http://pisa.oecd.org) student data. I would like to treat missing values properly, that is, using multiple imputation (with the mix package). But I am not sure how to do the imputation, since the data set provided by the OECD already contains variables with plausible values. Roughly, the situation is this: for each of the cognitive (achievement) scales, there are five variables holding plausible values. So, for example, there is not one variable for math achievement but five, pv1math through pv5math. There are, of course, no missing values on these variables. Most other variables show some degree of missing data. For example, some students did not report their parents' occupation, so there is no information about their socio-economic background (HISEI). This is the kind of data I want to impute.

My first thought was splitting the data into five datasets, each holding only one of the plausible value variables but all of the normal variables. So, e.g., the first data set would include pv1math, pv1read, HISEI, and gender, while the second would include pv2math, pv2read, HISEI, and gender. I would run mix on the five data sets independently and end up with five imputed data sets with no missing values.

But is this a valid approach? There would actually be two imputation runs per data set: one for the plausible values on the achievement scales (done by the OECD under an unknown model), and one for the other variables (done by me with mix). The second run would use data from the first. Would this not lead to an overestimation of the imputation variance? What alternative approaches are there?
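The splitting idea, as a minimal base-R sketch. The data frame, column names, and sizes below are invented stand-ins for the PISA extract; only the reshaping step is shown, not the mix run itself:

```r
# Hypothetical stand-in for the PISA extract: five plausible values
# for one scale, plus background variables with missing entries.
set.seed(1)
n <- 100
pisa <- data.frame(
  pv1math = rnorm(n, 500, 100), pv2math = rnorm(n, 500, 100),
  pv3math = rnorm(n, 500, 100), pv4math = rnorm(n, 500, 100),
  pv5math = rnorm(n, 500, 100),
  hisei   = replace(rnorm(n, 50, 15), sample(n, 20), NA),  # missing data
  gender  = sample(c("f", "m"), n, replace = TRUE)
)

pv_cols  <- paste0("pv", 1:5, "math")
other    <- setdiff(names(pisa), pv_cols)

# One data set per plausible value, each with all "normal" variables.
datasets <- lapply(1:5, function(m) {
  d <- pisa[, c(pv_cols[m], other)]
  names(d)[1] <- "math"  # common name so the same model runs on each
  d
})

length(datasets)         # 5 data sets, ready for 5 independent mix runs
sapply(datasets, ncol)   # math + hisei + gender in each
```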
Thank you in advance for your answers, Uli

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Multiple Imputation / Non Parametric Models / Combining Results
At 08:12 08/12/2006, Simon P. Kempf wrote:

> Dear R-Users, the following question is more of a general nature than a merely technical one. Nevertheless, I hope someone can give me some answers.

I am in no sense an expert in this area, but since it seems that no one else has answered so far, I wonder whether the mitools package from CRAN helps?

> I have been using the mice package to perform the multiple imputations. So far, everything works fine with the standard regression analysis. However, I am wondering whether it is theoretically correct to fit nonparametric models (GAM, spline smoothing, etc.) to multiply imputed datasets. If yes, how can I combine the results in order to show the uncertainty? In the research field of real estate economics, the problem of missing data is often ignored or left unmentioned. However, GAM, spline smoothing, etc. are becoming increasingly popular. In my research, I would like to use multiply imputed datasets and GAM, but I am unsure how to present a single set of results. Again, I apologize that this is a rather theoretical statistical question than a technical question on R. Thanks in advance for any hints and advice. Simon

Michael Dewey
http://www.aghmed.fsnet.co.uk
[R] Multiple Imputation / Non Parametric Models / Combining Results
Dear R-Users,

The following question is more of a general nature than a merely technical one. Nevertheless, I hope someone can give me some answers.

I have been using the mice package to perform the multiple imputations. So far, everything works fine with the standard regression analysis. However, I am wondering whether it is theoretically correct to fit nonparametric models (GAM, spline smoothing, etc.) to multiply imputed datasets. If yes, how can I combine the results in order to show the uncertainty?

In the research field of real estate economics, the problem of missing data is often ignored or left unmentioned. However, GAM, spline smoothing, etc. are becoming increasingly popular. In my research, I would like to use multiply imputed datasets and GAM, but I am unsure how to present a single set of results.

Again, I apologize that this is a rather theoretical statistical question than a technical question on R. Thanks in advance for any hints and advice.

Simon

Simon P. Kempf, Dipl.-Kfm. MScRE, Immobilienökonom (ebs), Wissenschaftlicher Assistent
Büro: IREBS Immobilienakademie, c/o ebs Immobilienakademie GmbH, Berliner Str. 26a, 13507 Berlin
Privat: Dunckerstraße 60, 10439 Berlin
Mobil: 0176 7002 6687
Email: [EMAIL PROTECTED]
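One way to combine nonparametric fits across imputations is to pool pointwise, applying Rubin's rules to the predictions at each evaluation point. A hedged base-R sketch: lm stands in for gam purely to keep the code self-contained, and the "imputed" data sets are simulated rather than produced by mice, but any fitter whose predict method returns se.fit pools the same way:

```r
# m completed data sets (simulated stand-ins for mice output).
set.seed(42)
m <- 5
imputed <- replicate(m, {
  x <- runif(50)
  data.frame(x = x, y = 2 * x + rnorm(50, sd = 0.3))
}, simplify = FALSE)

# Fit once per completed data set; predict on a common grid with SEs.
newx <- data.frame(x = seq(0, 1, length.out = 11))
fits <- lapply(imputed, function(d) lm(y ~ x, data = d))
prs  <- lapply(fits, function(f) predict(f, newx, se.fit = TRUE))

est <- sapply(prs, `[[`, "fit")       # grid-points x m predictions
se  <- sapply(prs, `[[`, "se.fit")

qbar     <- rowMeans(est)                 # pooled prediction
ubar     <- rowMeans(se^2)                # within-imputation variance
b        <- apply(est, 1, var)            # between-imputation variance
total_se <- sqrt(ubar + (1 + 1/m) * b)    # Rubin's rules, pointwise
```

Plotting qbar with bands from total_se then displays both the fitted curve and the imputation uncertainty.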
[R] multiple imputation
Hi, is it correct that multiple imputation, as in mice (http://www.imputation.com), cannot be treated as a standard data-mining task, because there is no generalization mechanism for applying the fitted imputation model, via a predict method, to a completely new and bigger dataset? Many thanks. Regards, Christian
Re: [R] multiple imputation
[EMAIL PROTECTED] wrote:

> Hi, is it correct that multiple imputation, as in mice (http://www.imputation.com), cannot be treated as a standard data-mining task, because there is no generalization mechanism for applying the fitted imputation model, via a predict method, to a completely new and bigger dataset?

This is something we need. I have not written a predict method for aregImpute in the Hmisc package yet (and soon a completely re-written version of aregImpute will be posted), but the framework in aregImpute may allow such a method to be written. Volunteers welcome.

Frank

--
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
Vanderbilt University School of Medicine
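The mechanism being asked for can be illustrated with a toy "trainable" imputer in base R: parameters are estimated once on the training data and then reused on new data. The column-mean rule here is only a stand-in for a real imputation model, but the train-then-predict shape is the point:

```r
# Learn imputation parameters from training data; return a function
# that applies those same parameters to any new data set.
make_mean_imputer <- function(train) {
  means <- colMeans(train, na.rm = TRUE)   # parameters learned once
  function(newdata) {
    for (j in names(means)) {
      miss <- is.na(newdata[[j]])
      newdata[[j]][miss] <- means[[j]]     # training means, not new ones
    }
    newdata
  }
}

train <- data.frame(a = c(1, 2, NA, 4), b = c(10, NA, 30, 40))
imp   <- make_mean_imputer(train)

test  <- data.frame(a = c(NA, 5), b = c(NA, 7))
imp(test)   # NAs filled with the training-set means
```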
[R] Multiple imputation using mice with mean
Hi,

I am trying to impute missing values for my data.frame. As I intend to use the completed data for prediction, I am currently measuring the success of an imputation method by its resulting classification error on my training data. I have tried several approaches to replace missing values:

- mean/median substitution
- substitution by a value selected from the observed values of a variable
- MLE in the mix package
- all available methods for numerical data in the MICE package (i.e. pmm, sample, mean and norm)

I found that the smallest classification error results from using mice with the mean option for numerical data. However, I am not sure how mean multiple imputation differs from simple mean substitution. I tried to read some of the documentation supporting the R package, but couldn't find much theory about the mean imputation method. Are there any good papers explaining the background behind each imputation option in MICE?

I would really appreciate any comments on the above, as my understanding of statistics is very limited.

Many thanks
Eleni Rapsomaniki
Birkbeck College, UK

__ R-help@stat.math.ethz.ch mailing list
Re: [R] Multiple imputation using mice with mean
On 25-Sep-06 Eleni Rapsomaniki wrote:

> I found that the smallest classification error results from using mice with the mean option for numerical data. However, I am not sure how mean multiple imputation differs from simple mean substitution. [...] Are there any good papers explaining the background behind each imputation option in MICE?

I agree that the MICE documentation tends to be silent about some important questions, both in the R/S help pages and in the MICE user's manual, which can be found at http://web.inter.nl.net/users/S.van.Buuren/mi/docs/Manual.pdf. Possibly it could be worth looking at some of the other relevant reports listed at http://web.inter.nl.net/users/S.van.Buuren/mi/hmtl/mice.htm, but they do not look very hopeful.

That being said, my understanding relating to your query is as follows (glossing over the technicalities of the Gibbs sampling methods used in (b)):

a) mean/median substitution is the very basic method of substituting, for a missing value, the arithmetic mean (or median) of the non-missing values for that variable, possibly selecting cases with non-missing values so as to approximately match the observed covariates of the case being imputed.

b) mean imputation in MICE (as far as I can infer) means that the distribution of the missing value (conditional on its observed covariates) is inferred from the cases with non-missing values, and the mean of this conditional distribution is substituted for the missing value.

These two approaches will in general give different results. Some further comments:

1. I would suggest that you consider the full multiple imputation approach. Filling in missing values just once, and then using the completed results (for prediction, in your case) in some procedure that treats them as though they were observed values, will not take into account the uncertainty about what values they should have (as opposed to the values they were imputed to have). When multiple imputation is used, the variation from imputation to imputation in the imputed values represents this uncertainty, and so a more realistic picture of the overall uncertainty of prediction can be obtained.

2. You stated that one method tried was MLE in the mix package. MLE (maximum likelihood estimation) using the EM algorithm is implemented in the mix functions em.mix and ecm.mix, but neither of these produces values to substitute for missing data; the result is essentially just parameter estimation by MLE based on the incomplete data. Values to substitute for missing data are produced by other functions, such as imp.mix, but these are randomly sampled from the conditional distributions of the missing values, so each time it is done the results are different. In particular, the first value you sample will be random. Hence the values you impute will be more or less good, in terms of your training set, depending on the luck of the draw when you use (say) imp.mix.

I don't know if I have understood what you meant by MLE in the mix package, but if the above is a correct understanding, then the remarks under (1) apply: in particular, as just noted, comparing a single imputation with your training set is an uncertain comparison.

Hoping this helps,
Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 25-Sep-06 Time: 15:33:59
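The distinction between (a) and (b) above can be made concrete with a small base-R experiment. The data are simulated, and lm plays the role of the conditional model:

```r
# Toy data: y depends strongly on x; 40 values of y go missing.
set.seed(7)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
miss  <- sample(n, 40)
y_obs <- replace(y, miss, NA)

# (a) marginal mean substitution: one value for every missing case
y_a <- y_obs
y_a[miss] <- mean(y_obs, na.rm = TRUE)

# (b) conditional-mean imputation: predict missing y from observed x
fit <- lm(y_obs ~ x)             # NA rows are dropped automatically
y_b <- y_obs
y_b[miss] <- predict(fit, data.frame(x = x[miss]))

# When y depends on x, (b) recovers the held-out values much better
mse_a <- mean((y_a[miss] - y[miss])^2)
mse_b <- mean((y_b[miss] - y[miss])^2)
mse_b < mse_a   # TRUE with these simulated data
```

Note that neither (a) nor (b) conveys imputation uncertainty; that is what the full multiple imputation approach in comment (1) adds.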
[R] multiple imputation of anova tables
Dear list members,

How can multiple imputation be realized for ANOVA tables in R? Concretely, how can F-values, R^2 and adjusted R^2 be combined across multiple imputations in R? Of course, the point estimates can be averaged, but how does one get standard errors for F-values, R^2, etc.? For linear models, lm.mids() works well, but according to Rubin's rules, standard errors have to be used together with the estimates to get unbiased estimates.

The same is needed for lme models. For the regression coefficients of an lme, it is no problem, because s.e.'s are present. But how does one combine AIC/BIC and logLik, and especially how does one proceed with the random effects in lme's? I assume there is a general rule that can be applied to all these cases, but I do not get it right. E.g.:

anova(limo1)
Analysis of Variance Table
Response: lverb.ona
           Df Sum Sq Mean Sq F value  Pr(>F)
klasse      6  301.6    50.3  2.0985 0.05514 .
Residuals 193 4623.3    24.0
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

or (from the man page of lme):

summary(fm2)
Linear mixed-effects model fit by REML
 Data: Orthodont
      AIC      BIC    logLik
 447.5125 460.7823 -218.7563

Random effects:
 Formula: ~1 | Subject
        (Intercept) Residual
StdDev:    1.807425 1.431592

Fixed effects: distance ~ age + Sex
                Value Std.Error DF   t-value p-value
(Intercept) 17.706713 0.8339225 80 21.233044  0.0000
age          0.660185 0.0616059 80 10.716263  0.0000
SexFemale   -2.321023 0.7614168 25 -3.048294  0.0054
 Correlation:
          (Intr)    age
age       -0.813
SexFemale -0.372  0.000

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max
-3.74889609 -0.55034466 -0.02516628  0.45341781  3.65746539

Number of Observations: 108
Number of Groups: 27

and the ANOVA of the lme:

anova(fm2)
            numDF denDF  F-value p-value
(Intercept)     1    80 4123.156  <.0001
age             1    80  114.838  <.0001
Sex             1    25    9.292  0.0054

I am confused about this, and I did not find any hint in norm, mice/pan/mix or Hmisc.

Any help and hints are appreciated.
Best regards,
Leo Gürtler / Germany
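The general rule being asked about is Rubin's rules, which need a standard error alongside each per-imputation estimate. A minimal base-R version for a scalar parameter follows; the numbers in the example call are made up:

```r
# Rubin's rules for a scalar parameter estimated in m imputations.
pool_rubin <- function(est, se) {
  m    <- length(est)
  qbar <- mean(est)              # pooled point estimate
  ubar <- mean(se^2)             # average within-imputation variance
  b    <- var(est)               # between-imputation variance
  tvar <- ubar + (1 + 1/m) * b   # total variance
  c(estimate = qbar, se = sqrt(tvar))
}

pool_rubin(est = c(1.9, 2.1, 2.0, 2.2, 1.8), se = rep(0.1, 5))
# estimate 2.0, se 0.2
```

This is also why F-values, R^2, AIC and logLik cannot be dropped into the rule as reported: they come without standard errors, so they first need a transformation with an approximate s.e. (or a dedicated combining procedure) before pooling.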
Re: [R] multiple imputation with fit.mult.impute in Hmisc
Thanks for the quick reply! One more question, below.

On 07/27/03 22:20, Frank E Harrell Jr wrote:

> On Sun, 27 Jul 2003 14:47:30 -0400 Jonathan Baron [EMAIL PROTECTED] wrote:
>
>> We're trying to test regression models using multiple imputation. We did the following (roughly):
>>
>> f <- aregImpute(~ [list of 32 variables, separated by + signs], n.impute=20, defaultLinear=T, data=t1)
>> fmp <- fit.mult.impute(Y ~ X1 + X2 ... [for the model of interest], xtrans=f, fitter=lm, data=t1)
>>
>> and all goes well (usually) except that we get the following message at the end of the last step:
>>
>> Warning message: Not using a Design fitting function; summary(fit) will use standard errors, t, P from last imputation only. Use Varcov(fit) to get the correct covariance matrix, sqrt(diag(Varcov(fit))) to get s.e.
>>
>> I did try using sqrt(diag(Varcov(fmp))), as it suggested, and it didn't seem to change anything from when I did summary(fmp). But this warning message sounds scary. It sounds like the whole process of multiple imputation is being ignored, if only the last one is being used.
>
> The warning message may be ignored. But the advice to use Varcov(fmp) is faulty for lm fits - I will fix that in the next release of Hmisc. You may get the imputation-corrected covariance matrix for now using fmp$var

Then it seems to me that summary(fmp) is also giving incorrect std. err., t, and p. Right? It seems to use Varcov(fmp) and not fmp$var.

>> So I discovered I could get rid of this warning by loading the Design library and then using ols instead of lm as the fitter in fit.mult.impute. It seems that ols provides a variance/covariance matrix (or something) that fit.mult.impute can use.
>
> That works too.

That gives me what I get if I use lm and then recalculate the t values by hand from fmp$var. Thus, ols seems like the way to go for now, if only to avoid additional calculations.

Jon
Re: [R] multiple imputation with fit.mult.impute in Hmisc
On Mon, 28 Jul 2003 08:18:09 -0400 Jonathan Baron [EMAIL PROTECTED] wrote:

> Then it seems to me that summary(fmp) is also giving incorrect std. err., t, and p. Right? It seems to use Varcov(fmp) and not fmp$var.

summary is using the usual lm output, for the last fit, so it is not adjusted for multiple imputation. Varcov(fmp) is using what summary uses because I forgot to tell Varcov.lm to look for fmp$var first.

Frank

---
Frank E Harrell Jr
Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem.
Dept. of Health Evaluation Sciences
U. Virginia School of Medicine
http://hesweb1.med.virginia.edu/biostat
[R] multiple imputation with fit.mult.impute in Hmisc
I have always avoided missing data by keeping my distance from the real world. But I have a student who is doing a study of real patients. We're trying to test regression models using multiple imputation. We did the following (roughly):

f <- aregImpute(~ [list of 32 variables, separated by + signs], n.impute=20, defaultLinear=T, data=t1)
# I read that 20 is better than the default of 5.
# defaultLinear makes sense for our data.
fmp <- fit.mult.impute(Y ~ X1 + X2 ... [for the model of interest], xtrans=f, fitter=lm, data=t1)

and all goes well (usually) except that we get the following message at the end of the last step:

Warning message: Not using a Design fitting function; summary(fit) will use standard errors, t, P from last imputation only. Use Varcov(fit) to get the correct covariance matrix, sqrt(diag(Varcov(fit))) to get s.e.

I did try using sqrt(diag(Varcov(fmp))), as it suggested, and it didn't seem to change anything from when I did summary(fmp). But this warning message sounds scary. It sounds like the whole process of multiple imputation is being ignored, if only the last one is being used.

So I discovered I could get rid of this warning by loading the Design library and then using ols instead of lm as the fitter in fit.mult.impute. It seems that ols provides a variance/covariance matrix (or something) that fit.mult.impute can use. But here I am beyond my (very recently acquired) understanding of what this is all about. Should I worry about that warning message? Or am I maybe off the track in some larger way?

-- Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
R page: http://finzi.psych.upenn.edu/
Re: [R] multiple imputation with fit.mult.impute in Hmisc
On Sun, 27 Jul 2003 14:47:30 -0400 Jonathan Baron [EMAIL PROTECTED] wrote:

> and all goes well (usually) except that we get the following message at the end of the last step: Warning message: Not using a Design fitting function; summary(fit) will use standard errors, t, P from last imputation only. Use Varcov(fit) to get the correct covariance matrix, sqrt(diag(Varcov(fit))) to get s.e. [...] But this warning message sounds scary. It sounds like the whole process of multiple imputation is being ignored, if only the last one is being used.

The warning message may be ignored. But the advice to use Varcov(fmp) is faulty for lm fits - I will fix that in the next release of Hmisc. You may get the imputation-corrected covariance matrix for now using fmp$var

> So I discovered I could get rid of this warning by loading the Design library and then using ols instead of lm as the fitter in fit.mult.impute. It seems that ols provides a variance/covariance matrix (or something) that fit.mult.impute can use.

That works too.

Frank

---
Frank E Harrell Jr
Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem.
Dept. of Health Evaluation Sciences
U. Virginia School of Medicine
http://hesweb1.med.virginia.edu/biostat
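The quantity discussed in this thread, an imputation-corrected covariance matrix, is the vector form of Rubin's rules. A base-R sketch of how such a matrix is assembled follows; the fits below are to independent simulated data sets, standing in for the m completed data sets a real multiple-imputation run would produce:

```r
# m fitted models, one per (simulated) completed data set.
set.seed(3)
m <- 20
fits <- replicate(m, {
  x <- rnorm(30)
  y <- 1 + x + rnorm(30)
  lm(y ~ x)
}, simplify = FALSE)

coefs <- t(sapply(fits, coef))        # m x p matrix of estimates
vcovs <- lapply(fits, vcov)           # per-fit covariance matrices

qbar  <- colMeans(coefs)              # pooled coefficients
ubar  <- Reduce(`+`, vcovs) / m       # average within-imputation cov.
b     <- cov(coefs)                   # between-imputation covariance
total <- ubar + (1 + 1/m) * b         # imputation-corrected covariance

sqrt(diag(total))                     # corrected standard errors
```

The corrected standard errors are never smaller than the naive within-fit ones, which is the variance inflation the warning message is about.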
[R] Multiple imputation
Hi all, I'm currently working with a dataset that has quite a few missing values, and after some investigation I figured that multiple imputation is probably the best solution to handle the missing data in my case. I found several references to functions in S-Plus that perform multiple imputation (NORM, CAT, MIX, PAN). Does R have corresponding functions? I searched the archives but was not able to find anything conclusive there. Any help on this subject is much appreciated. Thanks, Jonck
RE: [R] Multiple imputation
There is also the mice package at http://www.multiple-imputation.com. CRAN has package norm.

Simon.

Simon Blomberg, PhD
Depression & Anxiety Consumer Research Unit
Centre for Mental Health Research
Australian National University
http://www.anu.edu.au/cmhr/
[EMAIL PROTECTED] +61 (2) 6125 3379

-----Original Message-----
From: Jonck van der Kogel [mailto:[EMAIL PROTECTED]]
Sent: Friday, 13 June 2003 7:58 AM
To: [EMAIL PROTECTED]
Subject: [R] Multiple imputation

> Hi all, I'm currently working with a dataset that has quite a few missing values [...] Does R have corresponding functions? [...] Thanks, Jonck
Re: [R] Multiple imputation
On Thu, 12 Jun 2003 23:57:45 +0200 Jonck van der Kogel [EMAIL PROTECTED] wrote:

> Hi all, I'm currently working with a dataset that has quite a few missing values [...] I found several references to functions in S-Plus that perform multiple imputation (NORM, CAT, MIX, PAN). Does R have corresponding functions?

Look at the aregImpute function in the Hmisc package (http://hesweb1.med.virginia.edu/biostat/s/Hmisc.html). aregImpute uses the bootstrap, predictive mean matching, and flexible additive regression models to do multiple imputation. In one simulation study it performed as well as MICE, but it runs much faster and does not assume linearity in the imputation models. I hope that someday we'll have simulation studies comparing aregImpute with NORM.

---
Frank E Harrell Jr
Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem.
Dept. of Health Evaluation Sciences
U. Virginia School of Medicine
http://hesweb1.med.virginia.edu/biostat
Re: [R] Multiple imputation
Dear Jonck,

In addition, there are ports of both norm and mix in the contributed-packages section of CRAN.

Regards,
John

At 07:48 PM 6/12/2003 -0400, Frank E Harrell Jr wrote:

> Look at the aregImpute function in the Hmisc package (http://hesweb1.med.virginia.edu/biostat/s/Hmisc.html). aregImpute uses the bootstrap, predictive mean matching, and flexible additive regression models to do multiple imputation. [...]

-----
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: [EMAIL PROTECTED]
phone: 905-525-9140 x23604
web: www.socsci.mcmaster.ca/jfox