Hey Andy, I don't think it is so bad to code. Perhaps I could do a related projects repo first and then see if something similar can be moved in officially?
I would essentially treat missing predictor values as a response variable, and have a regression model predict missing x_1, then another model predict missing x_2, and so on to x_n. I would then repeat this process m times.

Ouwen

On Fri, Oct 23, 2015, 10:03 AM <josef.p...@gmail.com> wrote:

> On Fri, Oct 23, 2015 at 9:44 AM, Andy <t3k...@gmail.com> wrote:
>
>> Hi Ouwen.
>> I think this looks interesting, and it would be good to have more
>> non-trivial imputation methods.
>>
>> Is anyone familiar with the method? I don't have time to go into the
>> details of the paper at the moment.
>>
>
> statsmodels had a GSOC last year to implement MICE, which hasn't been
> merged yet.
>
> For statistics there are two parts to MICE: impute, and combine to adjust
> inference.
>
> The core is to cycle through all variables, features and dependent, with
> missing values and impute them based on the other variables, either nearest
> neighbor or with a full regression model for that variable.
>
> This would create "fake" data that would mess up the inference.
> Inference is based on imputing several times through a cycle that is
> similar to Gibbs sampling or MCMC but simplified. Then we combine the
> random imputations to get the results for the model that we are actually
> interested in.
>
> If you don't need inference, then I guess it could be as simple as cycling
> several times through a nearest neighbor search.
>
> Josef
>
>
>>
>> Adding something like this to sklearn is probably a major undertaking. It
>> would likely be a good addition,
>> but getting it merged may take a lot of effort and patience.
>> You might want to try tackling an easy issue first to become familiar
>> with our development practices.
>>
>> Cheers,
>> Andy
>>
>>
>>
>> On 10/21/2015 05:13 PM, Ouwen Huang wrote:
>>
>> Hello all,
>>
>> MICE is a recent imputation method that is supported by a package in R.
>> However, I would like it to be a part of scikit-learn. I see there exists
>> an imputer that fills in mean, median, and most frequent. Would an added
>> imputation method 'mice' be acceptable? If so, what are the steps to
>> creating this addition for scikit-learn (new to the community)?
>>
>> MICE reference: http://www.jstatsoft.org/article/view/v045i03/v45i03.pdf
>>
>> Best,
>> Ouwen
>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
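
P.S. A rough sketch of the cycle I described at the top of this message, to make it concrete. This is only an illustration under my own assumptions, not the R package's algorithm and not an existing scikit-learn or statsmodels API: the function name chained_regression_impute and the n_cycles parameter are made up, each column is assumed to have at least one observed value, plain least squares stands in for "a regression model", and the fill-ins are deterministic, so it covers only the "if you don't need inference" case Josef mentions (no random draws, no combining step).

```python
import numpy as np


def chained_regression_impute(X, n_cycles=10):
    """Fill NaNs by cycling per-column least-squares regressions.

    Hypothetical helper for illustration only. Assumes every column
    has at least one observed (non-NaN) value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()

    # Start from column means so every regression sees complete inputs.
    col_means = np.nanmean(X, axis=0)
    filled[missing] = np.take(col_means, np.nonzero(missing)[1])

    for _ in range(n_cycles):
        for j in range(X.shape[1]):
            rows = missing[:, j]
            if not rows.any():
                continue  # nothing to impute in this column
            # Treat column j as the response, all other columns as predictors.
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(len(others)), others])
            # Fit only on rows where column j was actually observed.
            coef, *_ = np.linalg.lstsq(design[~rows], filled[~rows, j], rcond=None)
            # Overwrite the current guesses for the missing entries.
            filled[rows, j] = design[rows] @ coef
    return filled


# Toy usage: two columns, each with one missing entry.
X_demo = np.array([[1.0, 3.0], [2.0, 5.0], [3.0, 7.0], [4.0, np.nan], [np.nan, 11.0]])
X_demo_filled = chained_regression_impute(X_demo, n_cycles=5)
```

Observed entries are never modified; only the originally missing cells are updated on each pass. Getting from here to MICE proper would mean drawing each imputation from a predictive distribution instead of using the point prediction, and producing several completed datasets to combine afterwards.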