Hm, 
best practices for dealing with class imbalance are (still) a tricky business, 
I think. Typically, you see people using different sampling techniques to shift 
the bias towards the minority class (most often by oversampling). I think the 
class_weight parameter in scikit-learn has a (very) similar effect here, since 
it reweights samples inversely proportional to the class frequencies. I also 
remember an interesting paper about a "skew-insensitive splitting criterion" 
using Gini and ROC, but I haven't implemented/tried it yet: 
https://www.cs.bris.ac.uk/~flach/papers/icml03-226.pdf (The Geometry of ROC 
Space: Understanding Machine Learning Metrics through ROC Isometrics).
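
Just to illustrate, here is a minimal sketch comparing the feature importances 
of a weighted vs. an unweighted forest (assumptions: scikit-learn >= 0.17, 
where class_weight='balanced' replaces the now-deprecated 'auto'; older 
versions need 'auto' instead, and the synthetic dataset is only for 
demonstration):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import ExtraTreesClassifier

    # synthetic 95% / 5% imbalanced binary problem (demo data only)
    X, y = make_classification(n_samples=2000, n_features=10,
                               n_informative=3, weights=[0.95, 0.05],
                               random_state=0)

    # unweighted: splits tend to be dominated by the majority class
    et = ExtraTreesClassifier(n_estimators=100, max_depth=5,
                              random_state=0).fit(X, y)

    # 'balanced' weights samples inversely proportional to class
    # frequencies, which has an effect similar to oversampling the
    # minority class
    etw = ExtraTreesClassifier(n_estimators=100, max_depth=5,
                               class_weight='balanced',
                               random_state=0).fit(X, y)

    print(et.feature_importances_)
    print(etw.feature_importances_)

Comparing the two importance vectors gives you a feel for how much the 
weighting changes which predictors the trees rely on.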

Best,
Sebastian


> On Sep 26, 2015, at 8:50 AM, Luca Puggini <lucapug...@gmail.com> wrote:
> 
> Hi, 
> 
> I have a binary output y where class 0 has many more samples than class 1. 
> I am trying to understand the importance of each predictor. 
> 
> I do not know if the class weights should be used or not when the tree is 
> trained, i.e.
> 
> etw = ExtraTreesClassifier(n_estimators=n_estimators, max_depth=5, 
> class_weight='auto') 
> or 
> et = ExtraTreesClassifier(n_estimators=n_estimators, max_depth=5) 
> 
> Is there a preferred option or some literature about this?
> 
> Thanks,
> Luca
> -- 
> Sent by mobile phone
> 

