On Fri, Mar 03, 2023 at 10:22:04AM +0000, Martin Gütlein wrote:
> > 2. Ignores whether a value is missing or not for the inference
>
> What I meant is rather that the missing value should NOT be treated
> as another possible value of the variable (this is, e.g., what the
> HistGradientBoostingClassifier implementation in scikit-learn does).
> Instead, multiple predictions could be made when a split attribute
> is missing, and those can be averaged.
> This is how it is implemented in WEKA, for example (we cannot switch
> to Java, though ;-):
> http://web.archive.org/web/20080601175721/http://wekadocs.com/node/2/#_edn4
> and it is described by the inventors of the RF:
> https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#missing1

The text that you link to describes two strategies: one is similar to
what is done in HistGradientBoosting; the other amounts to imputation
using a forest, which can be done in scikit-learn by setting up the
IterativeImputer to use forests as a base learner (this will, however,
be slow).
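For instance, something along these lines should work (an untested
sketch; the toy data is made up purely for illustration):

    import numpy as np
    from sklearn.ensemble import (
        HistGradientBoostingClassifier,
        RandomForestClassifier,
        RandomForestRegressor,
    )
    from sklearn.experimental import enable_iterative_imputer  # noqa
    from sklearn.impute import IterativeImputer
    from sklearn.pipeline import make_pipeline

    # Toy data with roughly 10% missing values
    rng = np.random.RandomState(0)
    X = rng.randn(200, 4)
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    X[rng.rand(200, 4) < 0.1] = np.nan

    # First strategy: native missing-value support, as in
    # HistGradientBoosting (samples with a missing value are sent to
    # the side of each split that was found best during training)
    clf = HistGradientBoostingClassifier().fit(X, y)

    # Second strategy: forest-based imputation via IterativeImputer
    # with a forest as base learner, followed by a forest classifier;
    # expect this to be slow on real data
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=50, random_state=0),
        random_state=0,
    )
    model = make_pipeline(imputer, RandomForestClassifier(random_state=0))
    model.fit(X, y)

Cheers,
Gaël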