[Scikit-learn-general] DecisionTree: How to split categorical features into two subsets instead of a single value and the rest?

Rex X Fri, 11 Sep 2015 20:02:56 -0700

Given categorical attributes, for instance
city = ['a', 'b', 'c', 'd', 'e', 'f']


With DictVectorizer(), we can transform "city" into a sparse matrix using a
1-of-k (one-hot) representation.
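
For concreteness, here is a minimal sketch of that encoding. The records are
made-up sample data (one row per city value); only the "city" attribute comes
from the example above.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical sample records: one row per city value from the example.
city = ['a', 'b', 'c', 'd', 'e', 'f']
records = [{'city': c} for c in city]

vec = DictVectorizer()          # returns a scipy sparse matrix by default
X = vec.fit_transform(records)  # one binary column per distinct city value

# Column names have the form 'city=<value>'.
feature_names = sorted(vec.vocabulary_, key=vec.vocabulary_.get)
print(feature_names)
print(X.toarray())
```

Each row has exactly one nonzero entry, in the column for its own city.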

But at each split, the decision tree evaluates only a single attribute, say
city == 'a' - True or False?

What I want is to ask whether the city is in a subset:
city in ['a', 'b', 'c'] - True or False?
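
To make the desired test concrete, here is a small sketch that builds the
subset-membership indicator by hand and fits a tree on it. The subset and the
labels are hypothetical and chosen only for illustration. This is a manual
workaround, not something the tree discovers on its own: the subset has to be
picked in advance.

```python
from sklearn.tree import DecisionTreeClassifier

city = ['a', 'b', 'c', 'd', 'e', 'f']

# Hypothetical subset, fixed by hand; the tree does not search over subsets.
subset = {'a', 'b', 'c'}

# One binary column: 1 if the city is in the subset, else 0.
in_subset = [int(c in subset) for c in city]
print(in_subset)

# Made-up labels that happen to coincide with subset membership.
y = [1, 1, 1, 0, 0, 0]

# With this single indicator column, one split separates the two classes.
X_subset = [[v] for v in in_subset]
clf = DecisionTreeClassifier(random_state=0).fit(X_subset, y)
print(clf.tree_.node_count)  # root plus two leaves
```

With the 1-of-k columns alone, separating {'a', 'b', 'c'} from the rest would
take several single-value splits in a row instead of one.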


As far as I know, the decision tree implementation in Spark's MLlib can do this.

Can we do this within scikit-learn?


Best,
Rex

------------------------------------------------------------------------------

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
