Hi everyone,
I am trying to run a RandomForest classifier with and without sample weights in
a custom 5-fold cross-validation.
My data has two classes [0 = Negative, 1 = Positive].
The ratio of positive to negative samples is 1:22 (i.e. the negative class is
22 times larger than the positive class).
I do not observe any performance difference between the weighted and
non-weighted runs:
Weighted run:     Mean AUC = 0.64
Non-weighted run: Mean AUC = 0.64
*** I ran the weighted and non-weighted runs separately.
Any comments and suggestions are appreciated.
Thanks,
Mamun
Code Segment
==============
import numpy as np
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=250, max_depth=None,
                             min_samples_split=2, random_state=0, oob_score=True)

# Negative-to-positive ratio in the training fold (true division in Python 3)
weight = tr_neg_data.shape[0] / tr_pos_data.shape[0]
print("Neg to Pos Class Ratio ::", weight)

# Positives get `weight`, negatives get 1; built from y_tr so the
# weights follow the same row order as X_tr
fold_sample_weight = np.array([weight if m == 1 else 1 for m in y_tr])

## Clf Non Weighted Run ##
clf.fit(X_tr, y_tr)                  # fit once, reuse for both predictions
y_pred = clf.predict(X_te)
y_prob = clf.predict_proba(X_te)     # column 1 = P(class 1); pass y_prob[:, 1] to the AUC

## Clf in Weighted Run ##
clf.fit(X_tr, y_tr, sample_weight=fold_sample_weight)
y_pred = clf.predict(X_te)
y_prob = clf.predict_proba(X_te)
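As a quick sanity check of the weighting scheme, here is a minimal sketch using a hypothetical helper `balanced_sample_weight` and toy labels with the 1:22 ratio from the post (not the poster's data). Each positive gets weight n_neg / n_pos and each negative gets 1, so both classes carry the same total weight in the fold.

```python
import numpy as np

def balanced_sample_weight(y, positive_label=1):
    """Hypothetical helper: positives get n_neg / n_pos, negatives get 1."""
    y = np.asarray(y)
    n_pos = np.sum(y == positive_label)
    n_neg = y.shape[0] - n_pos
    ratio = n_neg / n_pos  # true division in Python 3
    return np.where(y == positive_label, ratio, 1.0)

# Toy fold labels: 10 positives, 220 negatives (ratio 1:22)
y_tr = np.array([1] * 10 + [0] * 220)
w = balanced_sample_weight(y_tr)

# Total weight per class is now equal
print(w[y_tr == 1].sum(), w[y_tr == 0].sum())  # 220.0 220.0
```

For comparison, scikit-learn's `sklearn.utils.class_weight.compute_sample_weight("balanced", y)` yields weights in the same 22:1 ratio, just scaled differently.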
Version
===========
python : 3.5.0 ; Sklearn : 0.17
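A self-contained sketch of the weighted vs non-weighted comparison, assuming scikit-learn 0.18 or later (where `StratifiedKFold` lives in `sklearn.model_selection`; 0.17 used `sklearn.cross_validation`). The data is synthetic from `make_classification` with roughly one positive per 22 negatives, standing in for the real data; `mean_cv_auc` is a hypothetical helper. Each fold fits the forest once and scores AUC on the positive-class probability column.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in data: about 1 positive per 22 negatives
X, y = make_classification(n_samples=2300, n_features=20, n_informative=5,
                           weights=[22 / 23], flip_y=0.05, random_state=0)

def mean_cv_auc(X, y, weighted, n_splits=5, seed=0):
    """Mean ROC AUC over stratified folds, with or without sample weights."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    aucs = []
    for tr_idx, te_idx in skf.split(X, y):
        X_tr, X_te = X[tr_idx], X[te_idx]
        y_tr, y_te = y[tr_idx], y[te_idx]
        clf = RandomForestClassifier(n_estimators=50, random_state=seed)
        if weighted:
            # Per-fold ratio so both classes carry equal total weight
            ratio = np.sum(y_tr == 0) / np.sum(y_tr == 1)
            clf.fit(X_tr, y_tr, sample_weight=np.where(y_tr == 1, ratio, 1.0))
        else:
            clf.fit(X_tr, y_tr)
        # classes_ is [0, 1], so column 1 is P(positive)
        y_score = clf.predict_proba(X_te)[:, 1]
        aucs.append(roc_auc_score(y_te, y_score))
    return float(np.mean(aucs))

auc_plain = mean_cv_auc(X, y, weighted=False)
auc_weighted = mean_cv_auc(X, y, weighted=True)
print("non-weighted AUC: %.3f, weighted AUC: %.3f" % (auc_plain, auc_weighted))
```

One possible explanation for identical AUCs (not verified on the real data): ROC AUC depends only on how the scores rank the test samples. With `max_depth=None` and `min_samples_split=2` the trees are grown until their leaves are (nearly) pure, so each tree's vote is essentially 0 or 1 whatever the weights. The weights then mostly influence split selection, which may change the ranking very little.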
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general