Your contribution would be very welcome; I think the current work has
stalled.
On 01/04/2018 10:02 AM, Julio Antonio Soto de Vicente wrote:
> Hi Yang Li,
> I have to agree with you. Bitset and/or one hot encoding are just
> hacks which should not be necessary for decision tree learners.
> There is
Hi Yang Li,
I have to agree with you. Bitset and/or one hot encoding are just hacks which
should not be necessary for decision tree learners.
There is some WIP on an implementation for natural handling of categorical
features in trees: please take a look at
https://github.com/scikit-learn/scik
Dear J.B.,
Thanks for your advice!
Yeah, I have considered using bitstrings or sequence numbers, but the problem is
the algorithm, not the representation of categorical data.
Take the regression tree as an example: the algorithm in sklearn finds a split
value of the feature, and finds the best spl
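
To make the point about the algorithm concrete, here is a minimal sketch of a threshold-based split search for a regression tree. It is not sklearn's actual implementation; the function names (`sse`, `best_threshold_split`) and the toy data are assumptions made up for illustration. Categories are encoded as sequence numbers 0, 1, 2, and a single threshold can only separate contiguous ranges of those codes.

```python
def sse(ys):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_threshold_split(xs, ys):
    """Try the midpoint between each pair of adjacent distinct values of x
    and return the (threshold, cost) pair with the lowest total SSE."""
    values = sorted(set(xs))
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


# Hypothetical data: three categories encoded as sequence numbers 0, 1, 2.
# Categories 0 and 2 behave alike; category 1 is the odd one out.
xs = [0, 0, 1, 1, 2, 2]
ys = [10.0, 10.0, 0.0, 0.0, 10.0, 10.0]

threshold, cost = best_threshold_split(xs, ys)

# A categorical (subset) split {1} vs {0, 2} would separate the data perfectly,
# but no single threshold on the codes can express it.
subset_cost = sse([y for x, y in zip(xs, ys) if x == 1]) + sse(
    [y for x, y in zip(xs, ys) if x != 1]
)
```

In this toy case every threshold leaves a mixed group on one side, while the subset split has zero error, which is why changing the encoding alone does not help.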
Dear Yang Li,
> Neither the classificationTree nor the regressionTree supports
> categorical features. That means the Decision trees model can only accept
> continuous features.
Consider either manually encoding your categories in bitstrings (e.g.,
"Facebook" = 001, "Twitter" = 010, "Google" = 100), or
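
The manual bitstring encoding mentioned above could look roughly like this. This is only a sketch in plain Python; the helper name `one_hot_encode` and the category order are assumptions chosen so that the bit patterns match the example (Facebook = 001, Twitter = 010, Google = 100).

```python
def one_hot_encode(values, categories):
    """Map each value to a list of 0/1 flags, one flag per known category."""
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # raises KeyError for an unknown category
        encoded.append(row)
    return encoded


# Category order is an assumption, picked to reproduce the bit patterns above.
categories = ["Google", "Twitter", "Facebook"]
rows = one_hot_encode(["Facebook", "Twitter", "Google"], categories)
```

Each resulting column can then be passed to the tree as an ordinary numeric feature.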
Hi, I'm a graduate student using sklearn for some data work.
While handling the data with the Decision Trees library, I found an
inconvenience:
Neither the classificationTree nor the regressionTree supports categorical
features. That means the Decision trees model can only