Hi, Gregory,

> I don't use the iris dataset. My classes are distributed in my Y array. 

Yeah, I just used this as a simple example :). 

> the nodes of the graphical tree seem to be filled with the predominant class

I think that’s right, it gets the class name of the majority class at each node 
via "class_name = class_names[np.argmax(value)]" 
(https://github.com/scikit-learn/scikit-learn/blob/3a106fc792eb8e70e1fd078e351ba42487d3214d/sklearn/tree/export.py#L286)
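
As a minimal sketch of what that line does (the `value` counts and the 
`class_names` list below are made up for illustration, not taken from a real 
tree):

```python
import numpy as np

# Hypothetical per-class sample counts at one node, in the same order
# as clf.classes_ (illustrative numbers only)
value = np.array([[0, 49, 5]])
class_names = ['setosa', 'versicolor', 'virginica']

# np.argmax returns the index of the largest count, which selects the
# name of the majority class for that node
class_name = class_names[np.argmax(value)]
print(class_name)
```

So the label shown in a node only lines up with the counts if `class_names` 
is given in the same order as the columns of `value`.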

> in a vector with the classes in alphabetical order (the same order as in 
> clf.classes_)

Yes, it should be in ascending alphanumeric order. I am not sure if this is 
still a general recommendation in sklearn 0.18, but I typically convert 
string class labels to integers before I feed them to a classifier (it seems 
to work either way now):

-> from sklearn.preprocessing import LabelEncoder
-> le = LabelEncoder()
-> y = le.fit_transform(labels)
-> le.classes_

array(['Setosa', 'Versicolor', 'Virginica'], 
      dtype='<U21')

-> import numpy as np
-> np.bincount(y)

array([50, 50, 50])
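
Putting it together, a rough sketch of the whole workflow could look like the 
following. It assumes the Iris data from sklearn.datasets as a stand-in for 
your X and Y, and it uses out_file=None (which returns the DOT source as a 
string) instead of your dot_data buffer. The variable names are only 
illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier, export_graphviz

iris = load_iris()
# Build string labels to mimic a Y array of class names (stand-in data)
labels = iris.target_names[iris.target]

# Encode the strings as integers; le.classes_ keeps the sorted names
le = LabelEncoder()
y = le.fit_transform(labels)

clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, y)

# Pass the names in the same order as the encoded classes so that each
# node is labeled with the name of its majority class
dot_data = export_graphviz(
    clf,
    out_file=None,
    feature_names=iris.feature_names,
    class_names=le.classes_.tolist(),
)
print(dot_data[:200])
```

Since le.classes_ is sorted, integer label i corresponds to le.classes_[i], 
which is the order export_graphviz expects for class_names.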

Best,
Sebastian

> On Oct 25, 2016, at 3:00 AM, greg g <greg...@hotmail.fr> wrote:
> 
> Hi Sebastian,
> Thanks for your answer.
> I don't use the iris dataset. My classes are distributed in my Y array. 
> It seems that I can get the classes in alphabetical order with 
> > clf.classes_  
> where clf is my tree.
> And with
> > export_graphviz(clf, 
> > out_file=dot_data,feature_names=FEATURES,class_names=clf.classes_)
> the nodes of the graphical tree seem to be filled with the predominant class 
> and the sample distribution in a vector with the classes in alphabetical 
> order (the same order as in clf.classes_)
> I have to confirm that with more classes.
> 
> Regards
> Gregory
> 
> From: scikit-learn <scikit-learn-bounces+greg315=hotmail...@python.org> on 
> behalf of Sebastian Raschka <se.rasc...@gmail.com>
> Sent: Monday, October 24, 2016 17:47
> To: Scikit-learn user and developer mailing list
> Subject: Re: [scikit-learn] tree visualization with class names in leaves
>  
> Hi, Greg,
> if you provide the `class_names` argument, a “class” label of the majority 
> class will be added at the bottom of each node. For instance, if you have the 
> Iris dataset, with class labels 0, 1, 2, you can provide the `class_names` as 
> ['setosa', 'versicolor', 'virginica'], where 0 -> 'setosa', 1 -> 
> 'versicolor', 2 -> 'virginica'.
> 
> Best,
> Sebastian
> 
> > On Oct 24, 2016, at 10:18 AM, greg g <greg...@hotmail.fr> wrote:
> > 
> > Hi,
> > I am just getting started with scikit-learn and would like to visualize a 
> > classification tree with class names displayed in the leaves, as shown in 
> > the scikit-learn tree documentation 
> > http://scikit-learn.org/stable/modules/tree.html where we find 
> > class='virginica' etc.
> 
> > I made a tree providing a 2D array X (n1 samples , n2 features) and 1D 
> > array Y (n1 corresponding classes ) such that Y(i) is the class of the 
> > sample X(i, …)
> > After that I have correct predictions using predict()
> > Then I use the function 
> > export_graphviz(clf, out_file=dot_data,feature_names=FEATURES)
> > with FEATURES being the array of my n2 features names in the same order as 
> > in X 
> > I obtain the tree .png but can’t find a way to have the correct class names 
> > in the leaves…
> > In export_graphviz(), should I use the optional class_names parameter, 
> > and if so, how?
> > Thanks for any help
> >  
> > Gregory, Toulouse FRANCE
> > 
> > 
> > 
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn@python.org
> > https://mail.python.org/mailman/listinfo/scikit-learn