Re: [scikit-learn] Missing data and decision trees

Dale T Smith Thu, 13 Oct 2016 11:53:03 -0700

Please define “sensibly”. I would be strongly opposed to modifying any models 
to incorporate “missingness”. No model handles missing data for you. That is 
for you to decide based on your individual problem domain.
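
As one illustration of making that decision explicitly (a minimal sketch, not something taken from this thread): fill each feature with its median, add an indicator column recording which entries were missing, and fit a tree on the result. `SimpleImputer` with `add_indicator=True` comes from current scikit-learn releases, which postdate this message, and the toy data below is made up.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: np.nan marks a missing entry.
X = np.array([
    [1.0, np.nan],
    [2.0, 0.5],
    [np.nan, 1.5],
    [4.0, 2.0],
    [5.0, np.nan],
    [6.0, 3.0],
])
y = np.array([0, 0, 0, 1, 1, 1])

# Median-impute each column and append one 0/1 indicator column per
# feature that had missing values, so the tree can split on "was missing".
model = make_pipeline(
    SimpleImputer(strategy="median", add_indicator=True),
    DecisionTreeClassifier(random_state=0),
)
model.fit(X, y)

# The imputed matrix the tree actually sees: 2 features + 2 indicators.
X_filled = model.named_steps["simpleimputer"].transform(X)
predictions = model.predict(X)
```

Whether median imputation plus an indicator is appropriate depends on why the values are missing; that is a domain choice, not something the pipeline decides for you.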


Take a look at a talk from last winter on missing data by Nina Zumel. Nina 
defines “sensibly” in several ways.

https://www.r-bloggers.com/prepping-data-for-analysis-using-r/



__________________________________________________________________________________________
Dale Smith | Macy's Systems and Technology | IFS eCommerce | Data Science
770-658-5176 | 5985 State Bridge Road, Johns Creek, GA 30097 | 
[email protected]

From: scikit-learn 
[mailto:[email protected]] On Behalf Of 
Stuart Reynolds
Sent: Thursday, October 13, 2016 2:14 PM
To: [email protected]
Subject: [scikit-learn] Missing data and decision trees

I'm looking for a decision tree and RF implementation that supports missing 
data (without imputation) -- ideally in Python, Java/Scala or C++.

It seems that scikit's decision tree algorithm doesn't allow this -- which is 
disappointing because it's one of the few methods that should be able to 
sensibly handle problems with high amounts of missingness.

Are there plans to allow missing data in scikit's decision trees?

Also, is there any particular reason why missing values weren't supported 
originally (e.g. because they integrate poorly with other features)?

Regards
- Stuart

