I ran into this several times as well with the scikit-learn implementation
of GBM. Take a look at xgboost if you haven't already (is there anyone out
there who hasn't? :) -- it handles missing values in the predictor
space in a very elegant manner (quick sketch below).
http://xgboost.readthedocs.io/en/latest/python/python_intro.html
https://arxiv.org/abs/1603.02754
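
For what it's worth, here's a minimal sketch of what that looks like with
the xgboost Python API -- NaN is treated as missing by default, and each
split learns a default direction for missing values. Parameter names are
from memory, so double-check against the docs linked above:

import numpy as np
import xgboost as xgb

# Toy data with missing predictor values left as NaN (no imputation).
X = np.array([[1.0, np.nan],
              [2.0, 3.0],
              [np.nan, 1.5],
              [4.0, 0.5]])
y = np.array([0, 1, 0, 1])

# DMatrix treats NaN as missing; the learned trees route missing
# values down a default branch at each split.
dtrain = xgb.DMatrix(X, label=y, missing=np.nan)
params = {"objective": "binary:logistic", "max_depth": 2}
booster = xgb.train(params, dtrain, num_boost_round=10)

# Prediction also accepts NaN in the features.
print(booster.predict(xgb.DMatrix(X, missing=np.nan)))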
Jeff
On 10/13/2016 2:14 PM, Stuart Reynolds wrote:
I'm looking for a decision tree and RF implementation that supports
missing data (without imputation) -- ideally in Python, Java/Scala or
C++.
It seems that scikit's decision tree algorithm doesn't allow this --
which is disappointing because it's one of the few methods that should
be able to sensibly handle problems with a high amount of missingness.
Are there plans to allow missing data in scikit's decision trees?
Also, is there any particular reason why missing values weren't
supported originally (e.g., did it integrate poorly with other features)?
Regards
- Stuart
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn