Hm, best practices for dealing with class imbalance are (still) a tricky business, I think. Typically, you see people using different sampling techniques to shift the bias towards the minority class (most often by oversampling). I think the class_weight parameter in scikit-learn's tree ensembles has a (very) similar effect here. I also remember an interesting paper about a "skew-insensitive splitting criterion" using Gini and ROC, but I haven't implemented/tried it yet: "The Geometry of ROC Space: Understanding Machine Learning Metrics through ROC Isometrics" (Flach, ICML 2003), https://www.cs.bris.ac.uk/~flach/papers/icml03-226.pdf
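Something along these lines might be a useful starting point for comparing the two options on your end. This is just a quick sketch on synthetic data, not tested against your setup (the make_classification call, the 90/10 class ratio, and the random_state values are all made up for illustration); note also that newer scikit-learn versions spell class_weight='auto' as 'balanced':

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic data with roughly 90% class 0 and 10% class 1.
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.9, 0.1], random_state=0)

# Option 1: reweight classes inversely proportional to their frequencies
# ('balanced' is the newer spelling of 'auto').
etw = ExtraTreesClassifier(n_estimators=100, max_depth=5,
                           class_weight='balanced', random_state=0)
etw.fit(X, y)

# Option 2: leave the class weights alone and shift the bias by naively
# oversampling the minority class with replacement instead.
rng = np.random.RandomState(0)
minority = np.where(y == 1)[0]
extra = rng.choice(minority, size=len(y) - 2 * len(minority))  # balances the counts
idx = np.concatenate([np.arange(len(y)), extra])
et = ExtraTreesClassifier(n_estimators=100, max_depth=5, random_state=0)
et.fit(X[idx], y[idx])

# Compare how the two choices shift the impurity-based importances.
print(etw.feature_importances_)
print(et.feature_importances_)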
Best,
Sebastian

> On Sep 26, 2015, at 8:50 AM, Luca Puggini <lucapug...@gmail.com> wrote:
>
> Hi,
>
> I have a binary output y where class 0 has many more samples than class 1.
> I am trying to understand the importance of each predictor.
>
> I do not know if the class weights should be used or not when the tree is
> trained, i.e.
>
> etw = ExtraTreesClassifier(n_estimators=n_estimators, max_depth=5,
>                            class_weight='auto')
> or
> et = ExtraTreesClassifier(n_estimators=n_estimators, max_depth=5)
>
> Is there a preferred option or some literature about this?
>
> Thanks,
> Luca
> --
> Sent by mobile phone