Hi Gaël,

[...] the other one that
amounts to imputation using a forest, and can be done in scikit-learn
by setting up the IterativeImputer to use forests as a base learner
(this will however be slow).

The main difference is that with the IterativeImputer in scikit-learn, I still have to apply the imputation to the test set before I can predict with the RF. Other implementations do not impute missing values at all; instead, they split up the test instance when it reaches a node whose split attribute is missing.
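For reference, this is roughly the scikit-learn setup I mean (a minimal sketch; X_train, y_train, X_test are placeholders and the forest settings are arbitrary):

from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.impute import IterativeImputer
from sklearn.pipeline import make_pipeline

# The imputer is fitted on the training data and re-applied to the test
# data by the pipeline; a forest serves as the base learner for imputation.
model = make_pipeline(
    IterativeImputer(estimator=RandomForestRegressor(n_estimators=50)),
    RandomForestClassifier(),
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)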

In my experience this makes a big difference: it lets you use features where the majority of values is missing, even when the class ratio of the examples with missing values differs strongly from that of the examples without missing values.
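To make the split-up approach concrete, here is a toy sketch of mine of the prediction step (the node structure with is_leaf, feature, threshold, n_left, n_right and class_distribution is hypothetical, not WEKA or scikit-learn code). When the split attribute is missing, the instance is sent down both branches, weighted by the fraction of training instances that went each way, and the resulting class distributions are averaged:

import numpy as np

def predict_proba(node, x):
    # Leaf: return the stored class distribution, e.g. np.array([p0, p1]).
    if node.is_leaf:
        return node.class_distribution
    # Split attribute missing: follow BOTH children, weighted by the
    # fraction of training instances that went left at this node.
    if np.isnan(x[node.feature]):
        w = node.n_left / (node.n_left + node.n_right)
        return (w * predict_proba(node.left, x)
                + (1.0 - w) * predict_proba(node.right, x))
    # Otherwise: the usual single-branch descent.
    child = node.left if x[node.feature] <= node.threshold else node.right
    return predict_proba(child, x)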

Kind regards,
Martin





On 03.03.2023 15:41, Gael Varoquaux wrote:
> On Fri, Mar 03, 2023 at 10:22:04AM +0000, Martin Gütlein wrote:
> > > 2. Ignores whether a value is missing or not for the inference
> > What I meant is rather that the missing value should NOT be treated as
> > another possible value of the variable (this is, e.g., what the
> > HistGradientBoostingClassifier implementation in scikit-learn does).
> > Instead, multiple predictions could be made when a split attribute is
> > missing, and those can be averaged.
> >
> > This is how it is implemented in WEKA, for example (we cannot switch
> > to Java, though ;-):
> > http://web.archive.org/web/20080601175721/http://wekadocs.com/node/2/#_edn4
> > and described by the inventors of the RF:
> > https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#missing1
>
> The text that you link to describes two types of strategies, one that
> is similar to that done in HistGradientBoosting, the other one that
> amounts to imputation using a forest, and can be done in scikit-learn
> by setting up the IterativeImputer to use forests as a base learner
> (this will however be slow).
>
> Cheers,
>
> Gaël