Hi list,
I'm a beginner in machine learning and I'm trying to write a classifier using
a training set that contains categorical values.
From the document [1] I learned that I need to encode (vectorize) my
categorical features before the classifier can learn from them, so I used
`DictVectorizer` to do this.
The code I'm using: http://pastie.org/8318625
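Roughly, it boils down to something like this (a simplified sketch with
made-up toy data, not my real set):

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.tree import DecisionTreeClassifier, export_graphviz

    # each sample is a dict of categorical feature -> value
    data = [
        {"color": "red",   "shape": "circle"},
        {"color": "green", "shape": "square"},
        {"color": "red",   "shape": "square"},
    ]
    labels = ["yes", "no", "no"]

    vec = DictVectorizer(sparse=False)
    X = vec.fit_transform(data)   # one column per (feature, value) pair

    clf = DecisionTreeClassifier()
    clf.fit(X, labels)

    # export the tree for graphviz; the node labels come out as X[i] <= 0.5
    with open("tree.dot", "w") as f:
        export_graphviz(clf, out_file=f)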
But the resulting graph of the decision tree doesn't make much sense to me. What I
expected was each node labelled with something like `FEATURE_1 == VALUE_1`, instead of
`X[1] <= 0.5`.
So here are my questions:
1. Am I handling features with categorical values the right way?
2. If the answer to the previous question is yes, is it possible to `un-vectorize` the
final tree graph so that I don't need to know that `X[1]` and `X[2]` together
represent a single feature? (See the sketch after this list for what I'm hoping for.)
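For question 2, what I'm hoping for is something closer to this (continuing the
toy sketch above; I believe `export_graphviz` takes a `feature_names` argument and
`DictVectorizer` has `get_feature_names()`, so this is just a guess at what would
help):

    # reuse `clf` and `vec` from the snippet above
    from sklearn.tree import export_graphviz

    with open("tree_named.dot", "w") as f:
        export_graphviz(clf, out_file=f,
                        feature_names=vec.get_feature_names())

    # this would at least label nodes like "color=red <= 0.5",
    # though ideally the graph would simply read "color == red"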
IMHO WEKA handles categorical values much better than scikit-learn does: I don't
need to vectorize the training set myself, and the resulting graph makes more sense
to a beginner.
[1]:
http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
--
yegle
http://about.me/yegle