Hi everyone,

I am trying to run a RandomForest classifier with and without sample weights 
using a custom 5-fold cross-validation. 
My data has two classes [ 0 = Negative and 1 = Positive ].

The ratio of positive to negative samples is 1/22 (i.e. the negative class 
is 22 times the size of the positive class). 
I do not observe any performance difference between the weighted and 
non-weighted runs. 

Weighted run AUC : Mean AUC = 0.64
Non Weighted run AUC : Mean AUC = 0.64

*** I ran the weighted and non-weighted runs separately. 

Any comments and suggestions are appreciated. 

Thanks,
Mamun

Code Segment
==============

import numpy as np
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=250, max_depth=None,
                             min_samples_split=2, random_state=0,
                             oob_score=True)
# Negative-to-positive ratio in the training fold (true division on Python 3)
weight = tr_neg_data.shape[0] / tr_pos_data.shape[0]
print("Neg to Pos Class Ratio :: ", weight)
# Positive samples get the class ratio as their weight; negatives get 1
fold_sample_weight = np.array([weight if m == 1 else 1
                               for m in df_tr[label_column]])

## Clf Non Weighted Run ##
clf.fit(X_tr, y_tr)
y_pred = clf.predict(X_te)
y_prob = clf.predict_proba(X_te)

## Clf in Weighted Run ##
clf.fit(X_tr, y_tr, sample_weight=fold_sample_weight)
y_pred = clf.predict(X_te)
y_prob = clf.predict_proba(X_te)
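
To make the weighting scheme in the code segment above easier to check, here is a minimal sketch in plain Python with no scikit-learn dependency. The helper name make_sample_weights and the toy label list are hypothetical and only for illustration; they are not part of my actual pipeline. It shows that weighting each positive sample by the negative-to-positive ratio makes the two classes carry the same total weight.

```python
def make_sample_weights(labels, positive_label=1):
    """Return (ratio, weights) for a list of class labels.

    ratio is n_neg / n_pos; each positive sample is weighted by ratio
    and each negative sample by 1.0. (Hypothetical helper, illustration only.)
    """
    n_pos = sum(1 for y in labels if y == positive_label)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no positive samples in this fold")
    ratio = n_neg / n_pos  # true division on Python 3
    weights = [ratio if y == positive_label else 1.0 for y in labels]
    return ratio, weights


# Toy labels (made up): 2 positives and 44 negatives, i.e. the same 1/22 ratio
toy_labels = [1] * 2 + [0] * 44
ratio, weights = make_sample_weights(toy_labels)

pos_total = sum(w for w, y in zip(weights, toy_labels) if y == 1)
neg_total = sum(w for w, y in zip(weights, toy_labels) if y == 0)

print("Neg to Pos Class Ratio ::", ratio)
print("Total positive weight  ::", pos_total)
print("Total negative weight  ::", neg_total)
```

On the toy labels the ratio is 22.0 and both classes end up with a total weight of 44.0, which is the balance I intended to pass through sample_weight.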

Version 
===========
Python : 3.5.0 ; scikit-learn : 0.17

 
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
