On Fri, Mar 03, 2023 at 10:22:04AM +0000, Martin Gütlein wrote:
> > 2. Ignores whether a value is missing or not for the inference
>
> What I meant is rather that the missing value should NOT be treated
> as another possible value of the variable (this is, e.g., what the
> HistGradientBoostingClassifier implementation in scikit-learn does).
> Instead, multiple predictions could be made when a split attribute
> is missing, and those can be averaged.
> This is how it is implemented in WEKA, for example (we cannot switch
> to Java, though ;-):
> http://web.archive.org/web/20080601175721/http://wekadocs.com/node/2/#_edn4
> and it is described by the inventors of the RF:
> https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#missing1

The text that you link to describes two strategies: one is similar to
what is done in HistGradientBoosting; the other amounts to imputation
using a forest, which can be done in scikit-learn by setting up the
IterativeImputer to use forests as a base learner (this will, however,
be slow).
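For instance, something along these lines should work (an untested
sketch; the toy data is made up purely for illustration):

    import numpy as np
    from sklearn.ensemble import (
        HistGradientBoostingClassifier,
        RandomForestClassifier,
        RandomForestRegressor,
    )
    from sklearn.experimental import enable_iterative_imputer  # noqa
    from sklearn.impute import IterativeImputer
    from sklearn.pipeline import make_pipeline

    # Toy data with roughly 10% missing values
    rng = np.random.RandomState(0)
    X = rng.randn(200, 4)
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    X[rng.rand(200, 4) < 0.1] = np.nan

    # First strategy: native missing-value support, as in
    # HistGradientBoosting (samples with a missing value are sent to
    # the side of each split that was found best during training)
    clf = HistGradientBoostingClassifier().fit(X, y)

    # Second strategy: forest-based imputation via IterativeImputer
    # with a forest as base learner, followed by a forest classifier;
    # expect this to be slow on real data
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=50, random_state=0),
        random_state=0,
    )
    model = make_pipeline(imputer, RandomForestClassifier(random_state=0))
    model.fit(X, y)

Cheers,
Gaël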