Hello Ya,

I am no expert, so I am eager to read suggestions from other people on the mailing list. Still, here are a few pointers I am (somewhat) sure of:
You can try this package:
http://cran.r-project.org/web/packages/imputation/imputation.pdf
and use something like kNNImpute. Strictly speaking, kNN imputation is not an EM algorithm (it fills in each missing value from the most similar rows rather than from a fitted model), but it also fills in the missing values directly, which is what you are after.

In any event, an imputation based on EM also rests on some assumption about the underlying distribution of the data (observed and missing). From what I see here:
http://www.youtube.com/watch?v=xEkJxl6mmQ0
it seems that EM in SPSS usually assumes a (multivariate?) normal distribution of the data, which is a stronger assumption than what kNN uses.

The package I linked to also has a CV (cross-validation) option for checking how stable the imputation process is.

If you are looking for more options, just google R+imputation. There are numerous packages and functions for this.

Good luck,
Tal

----------------Contact Details:----------------
Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
------------------------------------------------

On Sat, Jul 21, 2012 at 2:55 PM, ya <xinxi...@163.com> wrote:

> Hi list,
>
> I am wondering if there is a way to use the EM algorithm to handle missing
> data and get a completed data set in R?
>
> I usually do it in SPSS because EM in SPSS kind of "fills in" the estimated
> values for the missing data, and then the completed data set can be saved and
> used for further analysis. But I have not found a way to get a completed
> data set like this in R or SAS. With Amelia or MICE, the missing data set
> is imputed a couple of times, and the new imputed data sets are not
> combined. I understand that the parameter estimation can still be done by
> combining the estimates from each imputed data set, but it would be more
> convenient to have a combined data set to do some analysis, for example,
> ANOVA with IVs having more than two categories.
> In this case, the only way to get the main effect of the whole IV is to
> estimate parameters in a single data set (as far as I know). If the separate
> imputed data sets are used, then the main effects shown in the result are
> for each category of the IV, respectively. I figure that readers and
> reviewers would sometimes like to see how big the effect is for the whole
> IV instead of the effect of each category of that IV.
>
> This is one of the reasons I cannot fully move to R from SPSS. So any
> suggestions?
>
> Thank you very much.
>
> ya
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
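
P.S. In case it helps to see what a kNN fill-in does, here is a minimal base-R sketch. It is not the code of kNNImpute from the imputation package: knn_impute_sketch, the toy matrix dat and k = 3 are all made up for illustration, and I assume a numeric matrix and a plain root-mean-square distance over the columns two rows have in common. Each missing cell is replaced by the mean of that column over the k closest rows that have it observed.

```r
# Illustrative sketch only: knn_impute_sketch is a hypothetical helper,
# not a function from the imputation package.
knn_impute_sketch <- function(x, k = 3) {
  stopifnot(is.matrix(x), is.numeric(x), k >= 1)
  completed <- x
  incomplete_rows <- which(rowSums(is.na(x)) > 0)
  for (i in incomplete_rows) {
    others <- setdiff(seq_len(nrow(x)), i)
    # Root-mean-square distance over the columns observed in both rows.
    d <- vapply(others, function(j) {
      shared <- !is.na(x[i, ]) & !is.na(x[j, ])
      if (!any(shared)) return(Inf)
      sqrt(mean((x[i, shared] - x[j, shared])^2))
    }, numeric(1))
    for (col in which(is.na(x[i, ]))) {
      # Donors must be comparable to row i and have this column observed.
      ok <- is.finite(d) & !is.na(x[others, col])
      if (!any(ok)) next  # no usable donor: leave the value missing
      donors <- head(others[ok][order(d[ok])], k)
      completed[i, col] <- mean(x[donors, col])
    }
  }
  completed
}

# Toy data (made up): 10 rows, 4 numeric variables, two missing cells.
set.seed(1)
dat <- matrix(rnorm(40), nrow = 10, dimnames = list(NULL, paste0("v", 1:4)))
dat[2, 3] <- NA
dat[7, 1] <- NA

completed <- knn_impute_sketch(dat, k = 3)
```

The result is one filled-in matrix with the observed values left untouched, so it can be passed on to further analysis the way a saved SPSS data set would be. Note that a single filled-in data set like this does not carry the imputation uncertainty along with it.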