Hi Stuart Reynolds,

Like Jacob said, we have an active PR at https://github.com/scikit-learn/scikit-learn/pull/5974
You could do

    git fetch https://github.com/raghavrv/scikit-learn.git missing_values_rf:missing_values_rf
    git checkout missing_values_rf
    python setup.py install

and try it out. I warn you though, there are some memory leaks I'm trying to debug. But for the most part it works well and outperforms basic imputation techniques. Please let us know if it breaks or doesn't solve your use case. Your input as a user of that feature would be invaluable!

> I ran into this several times as well with scikit-learn's implementation of GBM. Look at xgboost if you have not already (is there someone out there that hasn't? :) - it deals with missing values in the predictor space in a very elegant manner. http://xgboost.readthedocs.io/en/latest/python/python_intro.html

The PR handles it in a conceptually similar approach. It is currently implemented for DecisionTreeClassifier. After review and integration, DecisionTreeRegressor would also support missing values. Once that happens, enabling it in gradient boosting will be possible.

Thanks for the interest!!

On Thu, Oct 13, 2016 at 8:33 PM, Raphael C <drr...@gmail.com> wrote:

> You can simply make a new binary feature (per feature that might have a
> missing value) that is 1 if the value is missing and 0 otherwise. The RF
> can then work out what to do with this information.
>
> I don't know how this compares in practice to more sophisticated
> approaches.
>
> Raphael
>
> On Thursday, October 13, 2016, Stuart Reynolds <stu...@stuartreynolds.net> wrote:
>
>> I'm looking for a decision tree and RF implementation that supports
>> missing data (without imputation) -- ideally in Python, Java/Scala or C++.
>>
>> It seems that scikit's decision tree algorithm doesn't allow this --
>> which is disappointing because it's one of the few methods that should be
>> able to sensibly handle problems with high amounts of missingness.
>>
>> Are there plans to allow missing data in scikit's decision trees?
>>
>> Also, is there any particular reason why missing values weren't supported
>> originally (e.g. integrates poorly with other features)?
>>
>> Regards
>> - Stuart
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
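For reference, the binary-indicator trick Raphael describes can be sketched in a few lines of plain NumPy. This is just a minimal illustration of the idea, not code from the PR; `add_missing_indicators` is a made-up helper name:

```python
import numpy as np

def add_missing_indicators(X):
    """Append one binary column per feature: 1 where the value was missing, else 0.

    NaNs in the original columns are replaced with 0 so that learners which
    reject NaN (like scikit-learn's trees) accept the matrix; the appended
    indicator columns preserve the missingness information for the model.
    """
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    X_filled = np.where(mask, 0.0, X)
    return np.hstack([X_filled, mask.astype(float)])

X = np.array([[1.0, np.nan],
              [2.0, 3.0]])
print(add_missing_indicators(X))
# [[1. 0. 0. 1.]
#  [2. 3. 0. 0.]]
```

The fill value (0 here) is arbitrary; a tree can split on the indicator column first, so the filled value mostly matters only for rows where the feature is present.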