I ran into this several times as well with scikit-learn's implementation of GBM. Look at xgboost if you haven't already (is there anyone out there who hasn't? :) -- it deals with missing values in the predictor space in a very elegant manner.

http://xgboost.readthedocs.io/en/latest/python/python_intro.html

https://arxiv.org/abs/1603.02754
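
For reference, here is a minimal sketch of what that looks like with the xgboost Python API. The toy data and parameters below are mine, not from either link; the point is just that NaN is treated as "missing" by default and each split learns a default direction for it:

# Minimal sketch: XGBoost handles NaN natively, no imputation needed.
import numpy as np
import xgboost as xgb

# Toy data with missing values in the predictor space
X = np.array([[1.0, np.nan],
              [2.0, 3.0],
              [np.nan, 4.0],
              [5.0, 6.0]])
y = np.array([0, 1, 0, 1])

# DMatrix takes a `missing` marker; np.nan is the default
dtrain = xgb.DMatrix(X, label=y, missing=np.nan)

params = {"objective": "binary:logistic", "max_depth": 2}
bst = xgb.train(params, dtrain, num_boost_round=10)

# Prediction also works on rows containing NaN
preds = bst.predict(xgb.DMatrix(X, missing=np.nan))
print(preds)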


Jeff



On 10/13/2016 2:14 PM, Stuart Reynolds wrote:
I'm looking for a decision tree and RF implementation that supports missing data (without imputation) -- ideally in Python, Java/Scala or C++.

It seems that scikit's decision tree algorithm doesn't allow this -- which is disappointing, because it's one of the few methods that should be able to sensibly handle problems with high rates of missingness.

Are there plans to allow missing data in scikit's decision trees?

Also, is there any particular reason why missing values weren't supported originally (e.g., does it integrate poorly with other features)?

Regards
- Stuart


_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
