Re: [Scikit-learn-general] Categorical values and decision tree classifier

2013-09-12 Thread yegle
Hi Kyle, Thank you for the hint, really a good idea :-) -- yegle http://about.me/yegle On Sep 12, 2013, at 10:14 PM, Kyle Kastner wrote: > You can do some regexes for the tree in string form directly to modify the > output, then write out the modified dotfile. As an example (from some code I

Re: [Scikit-learn-general] Categorical values and decision tree classifier

2013-09-12 Thread Kyle Kastner
You can do some regexes for the tree in string form directly to modify the output, then write out the modified dotfile. As an example (from some code I recently did): def customize_graphviz(string, true="Output = True" , false="Output = False"): import re string = re.sub(r'entropy = [0-9]+

Re: [Scikit-learn-general] Categorical values and decision tree classifier

2013-09-12 Thread yegle
Thank you Olivier, this does solves my problem. Although not very perfect. Now the node was marked with "humidity=high <= 0.5000". It'll be great if I can remove the "<=0.5000" part. -- yegle http://about.me/yegle On Sep 12, 2013, at 3:16 AM, Olivier Grisel wrote: > You can pass a feature_nam

Re: [Scikit-learn-general] Categorical values and decision tree classifier

2013-09-12 Thread Olivier Grisel
You can pass a feature_names param to the tree export method. To get the vectorized feature names, use the vectorizer.get_feature_names() method. -- How ServiceNow helps IT people transform IT departments: 1. Consolidate l

Re: [Scikit-learn-general] Categorical values and decision tree classifier

2013-09-12 Thread Gilles Louppe
Dear Yegle, 1) What does your data represent? Are your features numbers or concepts? In the first case, you should try to build your estimator without encoding anything. In the second case, it might also not be necessary to one-hot encode your categorical features. Try with and without encoding a

[Scikit-learn-general] Categorical values and decision tree classifier

2013-09-11 Thread yegle
Hi list, I'm a beginner in Machine Learning and trying to write a classifier using training set containing categorical values. From the document [1] I learned that I need to encode (vectorize) my categorical features in order to be learned by the classifier. So I uses `DictVectorizer` to do th