Hi All,
I asked this question a couple of weeks ago on the list. I have a two-class
problem where my positive class (Class 1) and negative class (Class 0) are
imbalanced. In addition, I care much less about the negative class. So I
specified both a class weight (on the random forest classifier) and a sample
weight (in the fit call) to give more importance to my positive class.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Per-class weights (weight1/weight2 are placeholders for the values I try)
cl_weight = {0: weight1, 1: weight2}
clf = RandomForestClassifier(n_estimators=400, max_depth=None,
                             min_samples_split=2, random_state=0,
                             oob_score=True, class_weight=cl_weight,
                             criterion="gini")
# Up-weight positive samples; negatives keep weight 1
sample_weight = np.array([weight if m == 1 else 1 for m in df_tr[label_column]])
y_pred = clf.fit(X_tr, y_tr, sample_weight=sample_weight).predict(X_te)
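As I understand the scikit-learn docs, when both are given, the class weights
are multiplied with the per-sample weights inside fit, so the trees effectively
see the product of the two. A quick sketch (reusing the arrays defined above)
to inspect what the forest actually receives:

# Effective per-sample weight = sample_weight * class_weight[label]
effective_weight = sample_weight * np.array([cl_weight[c] for c in y_tr])
print(effective_weight[:10])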
Despite specifying dramatically different class weights, I do not observe much
difference in the predictions.
Example :: cl_weight = {0:0.001, 1:0.999} vs. cl_weight = {0:0.50, 1:0.50}.
Am I passing the class weight correctly?
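As a separate sanity check, I know I could also move the decision threshold on
predict_proba rather than relying on the weights; a minimal sketch, where the
0.3 cutoff is just an arbitrary illustration, not a tuned value:

proba = clf.predict_proba(X_te)[:, 1]    # P(class 1); column order follows clf.classes_
y_pred_low = (proba >= 0.3).astype(int)  # cutoff below 0.5 trades precision for class-1 recall

But my main question remains whether the weights are being applied at all.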
Below are confusion matrices for two folds (Fold 1 and Fold 5) from each of
these two runs.
## cl_weight = {0:0.001, 1:0.999}
Fold_1 confusion matrix (rows = true class, columns = predicted class):
         pred 0   pred 1
true 0     1681       26
true 1      636      149
Fold_5 confusion matrix:
         pred 0   pred 1
true 0     1670       15
true 1      734      160
## cl_weight = {0:0.50, 1:0.50}
Fold_1 confusion matrix:
         pred 0   pred 1
true 0     1690       15
true 1      630      163
Fold_5 confusion matrix:
         pred 0   pred 1
true 0     1676       14
true 1      709      170
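To quantify "not much difference", here is the class-1 recall (TP / (TP + FN))
computed from the four matrices above:

for name, tp, fn in [("0.999 Fold_1", 149, 636), ("0.999 Fold_5", 160, 734),
                     ("0.50  Fold_1", 163, 630), ("0.50  Fold_5", 170, 709)]:
    print(name, round(tp / (tp + fn), 3))
# -> 0.19, 0.179, 0.206, 0.193: the heavier weighting barely moves recall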
Thanks,
Mamun