You can simply make a new binary feature (per feature that might have a
missing value) that is 1 if the value is missing and 0 otherwise.  The RF
can then work out what to do with this information.

I don't know how this compares in practice to more sophisticated approaches.
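
A minimal sketch of the indicator-feature idea, assuming the data is a NumPy array with missing values encoded as NaN. The helper name add_missing_indicators and its fill_value parameter are hypothetical, not part of scikit-learn. The constant fill is my own assumption: it is only a placeholder so the array contains no NaN, and the indicator columns carry the actual missingness information.

```python
import numpy as np


def add_missing_indicators(X, fill_value=0.0):
    """Append one 0/1 indicator column per feature that has missing values.

    Hypothetical helper for illustration. NaNs in the original columns are
    replaced by `fill_value` (an assumed placeholder, not a real imputation).
    Returns the augmented array and the indices of the flagged features.
    """
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)                        # True where a value is missing
    cols = np.flatnonzero(mask.any(axis=0))   # features with any missing value
    X_filled = np.where(mask, fill_value, X)  # placeholder for missing entries
    indicators = mask[:, cols].astype(float)  # 1 if missing, 0 otherwise
    return np.hstack([X_filled, indicators]), cols


X = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan],
              [7.0, 8.0, 9.0]])
X_aug, cols = add_missing_indicators(X)
```

X_aug can then be passed to an estimator such as sklearn.ensemble.RandomForestClassifier in place of X. The same indicator columns (cols) would need to be built for any data used at prediction time, so that the feature layout matches the training data.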


On Thursday, October 13, 2016, Stuart Reynolds wrote:

> I'm looking for a decision tree and RF implementation that supports
> missing data (without imputation) -- ideally in Python, Java/Scala or C++.
> It seems that scikit's decision tree algorithm doesn't allow this --
> which is disappointing because it's one of the few methods that should be
> able to sensibly handle problems with high amounts of missingness.
> Are there plans to allow missing data in scikit's decision trees?
> Also, is there any particular reason why missing values weren't supported
> originally (e.g. integrates poorly with other features)?
> Regards
> - Stuart
scikit-learn mailing list
