Hi list,
I'm a beginner in machine learning and I'm trying to write a classifier using
a training set that contains categorical values.
From the document [1] I learned that I need to encode (vectorize) my
categorical features before the classifier can learn from them, so I used
`DictVectorizer` to do this.
The code I'm using: http://pastie.org/8318625
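Roughly, it boils down to something like this (a simplified sketch with
made-up toy data, not my real set):

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.tree import DecisionTreeClassifier, export_graphviz

    # each sample is a dict of categorical feature -> value
    data = [
        {"color": "red",   "shape": "circle"},
        {"color": "green", "shape": "square"},
        {"color": "red",   "shape": "square"},
    ]
    labels = ["yes", "no", "no"]

    vec = DictVectorizer(sparse=False)
    X = vec.fit_transform(data)   # one column per (feature, value) pair

    clf = DecisionTreeClassifier()
    clf.fit(X, labels)

    # export the tree for graphviz; the node labels come out as X[i] <= 0.5
    with open("tree.dot", "w") as f:
        export_graphviz(clf, out_file=f)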
But the resulting graph of the decision tree doesn't make much sense to me. What I
expected was each node labelled with something like `FEATURE_1 == VALUE_1`, instead of
`X[1] <= 0.5`.
So here are my questions:
1. Am I handling features with categorical values the right way?
2. If the answer to the previous question is yes, is it possible to `un-vectorize` the
final tree graph so that I don't need to know that `X[1]` and `X[2]` together
represent a single feature? (See the sketch after this list for what I'm hoping for.)
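For question 2, what I'm hoping for is something closer to this (continuing the
toy sketch above; I believe `export_graphviz` takes a `feature_names` argument and
`DictVectorizer` has `get_feature_names()`, so this is just a guess at what would
help):

    # reuse `clf` and `vec` from the snippet above
    from sklearn.tree import export_graphviz

    with open("tree_named.dot", "w") as f:
        export_graphviz(clf, out_file=f,
                        feature_names=vec.get_feature_names())

    # this would at least label nodes like "color=red <= 0.5",
    # though ideally the graph would simply read "color == red"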
IMHO WEKA handles categorical values much better than scikit-learn does: I don't
need to vectorize the training set myself, and the resulting graph makes more sense
to a beginner.
[1]:
http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
--
yegle
http://about.me/yegle