Hey Andy,

I don't think it would be too hard to code. Perhaps I could start with a
related-projects repo and then see whether something similar can be moved in
officially?

I would essentially treat each predictor with missing values as a response
variable: one regression model predicts the missing x_1 from the other
predictors, then another model predicts the missing x_2, and so on to x_n.

I would then repeat this process m times.
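
Roughly, the loop I have in mind might look like the sketch below. It is only
an illustration under several assumptions: the function names chained_impute
and multiple_impute are made up for this example, each per-column model is a
plain least-squares regression with Gaussian noise added to the predictions,
and "repeat m times" is taken to mean producing m completed datasets. It is
not the full procedure from the MICE paper, which also draws the regression
parameters themselves.

```python
import numpy as np


def chained_impute(X, n_cycles=5, rng=None):
    """Hypothetical helper: fill NaNs column by column with regressions."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()

    # Start from column means so every regression sees complete inputs.
    col_means = np.nanmean(X, axis=0)
    filled[missing] = np.take(col_means, np.nonzero(missing)[1])

    for _ in range(n_cycles):
        for j in range(X.shape[1]):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue  # nothing to impute in this column
            obs = ~miss_j

            # Treat column j as the response, all other columns as predictors.
            others = np.delete(filled, j, axis=1)
            A = np.column_stack([np.ones(len(others)), others])
            coef, *_ = np.linalg.lstsq(A[obs], filled[obs, j], rcond=None)

            # Assumption: add residual-scale noise so repeated runs differ.
            resid = filled[obs, j] - A[obs] @ coef
            dof = max(int(obs.sum()) - A.shape[1], 1)
            sigma = np.sqrt(resid @ resid / dof)
            noise = rng.normal(0.0, sigma, int(miss_j.sum()))
            filled[miss_j, j] = A[miss_j] @ coef + noise
    return filled


def multiple_impute(X, m=5, n_cycles=5, seed=0):
    """Hypothetical helper: repeat the chained imputation m times."""
    rng = np.random.default_rng(seed)
    return [chained_impute(X, n_cycles=n_cycles, rng=rng) for _ in range(m)]
```

Calling multiple_impute(X, m=5) on an array with NaNs would return five
completed copies of X. The observed entries are left untouched and only the
originally missing cells differ between copies.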

Ouwen

On Fri, Oct 23, 2015, 10:03 AM  <josef.p...@gmail.com> wrote:

> On Fri, Oct 23, 2015 at 9:44 AM, Andy <t3k...@gmail.com> wrote:
>
>> Hi Ouwen.
>> I think this looks interesting, and it would be good to have more
>> non-trivial imputation methods.
>>
>> Is anyone familiar with the method? I don't have time to go into the
>> details of the paper at the moment.
>>
>
> statsmodels had a GSOC last year to implement MICE, which hasn't been
> merged yet.
>
> For statistics there are two parts to MICE: imputing, and combining the
> imputations to adjust inference.
>
> The core is to cycle through all variables with missing values, both
> features and dependent, and impute each one based on the other variables,
> using either nearest neighbors or a full regression model for that variable.
>
> Treated as real observations, this "fake" data would distort the inference.
> Inference is based on imputing several times through a cycle that is
> similar to Gibbs sampling or MCMC but simplified. Then we combine the
> random imputations to get the results for the model that we are actually
> interested in.
>
> If you don't need inference, then I guess it could be as simple as cycling
> several times through a nearest neighbor search.
>
> Josef
>
>
>>
>> Adding something like this to sklearn is probably a major undertaking. It
>> would likely be a good addition, but getting it merged may take a lot of
>> effort and patience.
>> You might want to try tackling an easy issue first to become familiar
>> with our development practices.
>>
>> Cheers,
>> Andy
>>
>>
>>
>> On 10/21/2015 05:13 PM, Ouwen Huang wrote:
>>
>> Hello all,
>>
>> MICE is a recent imputation method that is supported by a package in R.
>> However, I would like it to be a part of scikit-learn. I see there exists
>> an imputer that fills in mean, median, and most frequent. Would an added
>> imputation method 'mice' be acceptable? If so, what are the steps for
>> creating this addition for scikit-learn (I am new to the community)?
>>
>> MICE reference: http://www.jstatsoft.org/article/view/v045i03/v45i03.pdf
>>
>> Best,
>> Ouwen
>>
>>
>> ------------------------------------------------------------------------------
>>
>>
>>
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
