Re: [scikit-learn] Missing data and decision trees

2016-10-13 Thread Raghav R V
Hi Stuart Reynold,

As Jacob said, we have an active PR at https://github.com/scikit-learn/scikit-learn/pull/5974

You could do

    git fetch https://github.com/raghavrv/scikit-learn.git missing_values_rf:missing_values_rf
    git checkout missing_values_rf
    python setup.py install

and try it out. I

Re: [scikit-learn] Missing data and decision trees

2016-10-13 Thread Dale T Smith
Please define “sensibly”. I would be strongly opposed to modifying any models to incorporate “missingness”. No model handles missing data for you. That is for you to decide based on your individual problem domain. Take a look at a talk from last winter on missing data by Nina Zumel. Nina

Re: [scikit-learn] Missing data and decision trees

2016-10-13 Thread Raphael C
You can simply make a new binary feature (one for each feature that might have a missing value) that is 1 if the value is missing and 0 otherwise. The RF can then work out what to do with this information. I don't know how this compares in practice to more sophisticated approaches.

Raphael

On Thursday,
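A minimal sketch of the indicator-feature idea above, assuming the data is a NumPy array with missing entries encoded as NaN. The helper name `add_missing_indicators` and the constant `fill_value` used to plug the NaNs are illustrative assumptions, not part of any scikit-learn API; a mean/median or an out-of-range sentinel are equally plausible fill choices for a tree model.

```python
import numpy as np


def add_missing_indicators(X, fill_value=0.0):
    """Hypothetical helper: return X with NaNs replaced by `fill_value`,
    plus one 0/1 column per original feature that contains at least one NaN."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)                                # True where a value is missing
    has_missing = missing.any(axis=0)                    # which features ever go missing
    indicators = missing[:, has_missing].astype(float)   # 1.0 = value was missing
    X_filled = np.where(missing, fill_value, X)          # placeholder for the NaNs
    return np.hstack([X_filled, indicators])


X = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan],
              [7.0, 8.0, 9.0]])
X_aug = add_missing_indicators(X)
print(X_aug.shape)  # (3, 5): 3 original columns + 2 indicator columns
```

The augmented matrix can then be passed to a random forest as usual, which is free to split on the indicator columns. Later scikit-learn releases also provide `sklearn.impute.MissingIndicator` and `SimpleImputer(add_indicator=True)`, which build the same kind of indicator columns.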

Re: [scikit-learn] Missing data and decision trees

2016-10-13 Thread Jason Rudy
It's not a decision tree, but py-earth may also do what you need. It handles missingness as described in section 3.4 here: http://media.salford-systems.com/library/MARS_V2_JHF_LCS-108.pdf. Basically, missingness is considered potentially predictive. On Thu, Oct 13, 2016 at 11:20 AM, Jeff
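To illustrate the general idea that missingness can enter a MARS-style model as a predictive term of its own, here is a rough sketch that builds, for one variable and one knot, a "was missing" basis column plus a pair of hinge functions that are switched off whenever the value is missing. The function name and the knot value are hypothetical, and this is a simplified illustration under those assumptions, not py-earth's implementation or a transcription of section 3.4 of the linked document.

```python
import numpy as np


def missing_aware_hinges(x, knot):
    """Hypothetical sketch: basis columns [missing, present*max(0, x-knot),
    present*max(0, knot-x)] for a single variable x that may contain NaN."""
    x = np.asarray(x, dtype=float)
    present = (~np.isnan(x)).astype(float)   # 1.0 where x is observed
    missing = 1.0 - present                  # 1.0 where x is missing
    # Any number works here: where x is missing the hinge is multiplied by 0.
    x_safe = np.where(np.isnan(x), 0.0, x)
    hinge_pos = present * np.maximum(0.0, x_safe - knot)
    hinge_neg = present * np.maximum(0.0, knot - x_safe)
    return np.column_stack([missing, hinge_pos, hinge_neg])


B = missing_aware_hinges([1.0, np.nan, 4.0], knot=2.0)
print(B.tolist())  # [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
```

A linear model fitted on these columns gets a separate coefficient for "value was missing", so missingness itself can carry predictive signal instead of being imputed away.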

Re: [scikit-learn] Missing data and decision trees

2016-10-13 Thread Jeff
I ran into this several times as well with the scikit-learn implementation of GBM. Look at xgboost if you have not already (is there anyone out there who hasn't? :)). It deals with missing values in the predictor space in a very elegant manner.
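xgboost's approach is to learn, at each split, a default direction that rows with a missing value are sent to. The sketch below shows that idea for a single fixed threshold, using the plain squared-error reduction of a regression tree as the split cost; this is an assumption for readability, since xgboost itself scores splits with a gradient/hessian-based gain. The names `sse` and `best_default_direction` are hypothetical, not xgboost functions.

```python
import numpy as np


def sse(y):
    """Sum of squared errors around the mean (0 for an empty node)."""
    return float(((y - y.mean()) ** 2).sum()) if y.size else 0.0


def best_default_direction(x, y, threshold):
    """For a fixed split `x < threshold`, decide whether rows with a missing x
    should go left or right by picking the side with the lower total SSE."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    missing = np.isnan(x)
    left = ~missing & (x < threshold)
    right = ~missing & (x >= threshold)
    costs = {
        "left": sse(y[left | missing]) + sse(y[right]),   # send missing rows left
        "right": sse(y[left]) + sse(y[right | missing]),  # send missing rows right
    }
    best = min(costs, key=costs.get)
    return best, costs[best]


x = [1.0, 2.0, np.nan, 8.0, 9.0]
y = [0.0, 0.0, 10.0, 10.0, 10.0]
print(best_default_direction(x, y, threshold=5.0))  # ('right', 0.0)
```

Because the direction is chosen per split from the training data, the tree can route missing values to whichever child they statistically resemble, without any imputation step.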

Re: [scikit-learn] Missing data and decision trees

2016-10-13 Thread Jacob Schreiber
I think Raghav is working on it in this PR: https://github.com/scikit-learn/scikit-learn/pull/5974 The reason they weren't initially supported is likely that it involves a lot of work and design choices to handle missing values appropriately, and the discussion on the best way to handle it was

Re: [scikit-learn] Permission for creating new labels

2016-10-13 Thread Nelle Varoquaux
On 13 October 2016 at 08:36, Andreas Mueller wrote:
> going to the mailing list
>
> On 10/13/2016 01:35 AM, Raghav R V wrote:
> > Thanks for the messages {Ga|Jo}el. ;)
> >> We can use "needs second review" as an alternative to "MRG+1" but I don't
> >> see the point of using both.

Re: [scikit-learn] Permission for creating new labels

2016-10-13 Thread Andreas Mueller
going to the mailing list

On 10/13/2016 01:35 AM, Raghav R V wrote:
Thanks for the messages {Ga|Jo}el. ;)
> We can use "needs second review" as an alternative to "MRG+1" but I don't see the point of using both.
I see the system of MRG+1 and MRG+2 as a more robust way of tracking approvals