Re: [scikit-learn] A necessary feature for Decision trees

2018-01-04 Thread Andreas Mueller
Your contribution would be very welcome; I think the current work has stalled. On 01/04/2018 10:02 AM, Julio Antonio Soto de Vicente wrote: Hi Yang Li, I have to agree with you. Bitsets and/or one-hot encoding are just hacks that should not be necessary for decision tree learners. There is

Re: [scikit-learn] A necessary feature for Decision trees

2018-01-04 Thread Julio Antonio Soto de Vicente
Hi Yang Li, I have to agree with you. Bitsets and/or one-hot encoding are just hacks that should not be necessary for decision tree learners. There is some WIP on an implementation for natural handling of categorical features in trees: please take a look at https://github.com/scikit-learn/scik

Re: [scikit-learn] A necessary feature for Decision trees

2018-01-04 Thread 李扬
Dear J.B., Thanks for your advice! Yeah, I have considered using a bitstring or a sequence number, but the problem is the algorithm, not the representation of categorical data. Take the regression tree as an example: the algorithm in sklearn finds a split value of the feature, and finds the best spl
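
To make the point about split values concrete, here is a minimal pure-Python sketch of a threshold search on a single feature. It is not scikit-learn's implementation; the function name `best_threshold_split`, the squared-error criterion, and the sequence-number coding of the categories are all assumptions made for illustration.

```python
def best_threshold_split(x, y):
    """Return (threshold, cost) for the binary split x <= t with the lowest
    summed squared error. Hypothetical helper, not a scikit-learn function."""
    def sse(vals):
        # Sum of squared deviations from the mean; 0.0 for an empty side.
        if not vals:
            return 0.0
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals)

    distinct = sorted(set(x))
    best = (None, float("inf"))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        cost = sse(left) + sse(right)
        if cost < best[1]:
            best = (t, cost)
    return best


# Assumed coding for illustration: 0 = Facebook, 1 = Twitter, 2 = Google.
x = [0, 0, 1, 1, 2, 2]
y = [10.0, 10.0, 0.0, 0.0, 10.0, 10.0]
threshold, cost = best_threshold_split(x, y)
```

With this toy data, no single threshold can separate the middle code (1) from the other two, so the best threshold split still leaves error, whereas a split on the category subset {1} versus {0, 2} would fit the data exactly. That is one way an ordered split value can be limiting when the codes carry no real order.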

Re: [scikit-learn] A necessary feature for Decision trees

2018-01-03 Thread Brown J.B. via scikit-learn
Dear Yang Li, > Neither the classificationTree nor the regressionTree supports categorical features. That means the Decision trees model can only accept continuous features. Consider either manually encoding your categories in bitstrings (e.g., "Facebook" = 001, "Twitter" = 010, "Google" = 100), or
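
The manual bitstring encoding suggested above can be sketched in plain Python as follows. The helper name `one_hot_encode` is hypothetical, and the bit position assigned to each category comes from sorted order, so it differs from the assignment in the message; any fixed assignment works as long as it is used consistently.

```python
def one_hot_encode(values):
    """Map each distinct category to a list with a single 1 bit.
    Hypothetical helper for illustration only."""
    categories = sorted(set(values))  # fixed, reproducible column order
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # set the bit for this row's category
        encoded.append(row)
    return categories, encoded


sources = ["Facebook", "Twitter", "Google", "Twitter"]
categories, encoded = one_hot_encode(sources)
```

Each row of `encoded` can then be passed to a tree learner as a set of 0/1 numeric columns, one per category.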

[scikit-learn] A necessary feature for Decision trees

2018-01-03 Thread 李扬
Hi, I'm a graduate student using sklearn for some data work. While handling the data with the Decision Trees library, I found an inconvenience: neither the classificationTree nor the regressionTree supports categorical features. That means the Decision trees model can only