Hi Paul,

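a) Scaling has no effect on decision trees, so it is not needed for a
RandomForest: splits depend only on the ordering of the values within
each feature, and rescaling a feature does not change that ordering.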

b) You shouldn't set max_depth=5. Instead, build fully developed trees
(max_depth=None) or rather tune min_samples_split using
cross-validation.
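
A minimal sketch of what tuning min_samples_split by cross-validation could
look like. Everything specific here is an assumption for illustration: the
data is synthetic (make_classification, with roughly the 454/168 class ratio
from the question), the grid values are arbitrary, AUC is picked as the
score only because the question reports it, and the imports use the
sklearn.model_selection module of current scikit-learn releases rather than
sklearn.cross_validation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the real descriptors: about 73% class 0, 27% class 1.
X, y = make_classification(n_samples=622, n_features=20, n_informative=8,
                           weights=[0.73, 0.27], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)

# Fully developed trees (max_depth=None); only min_samples_split is tuned.
forest = RandomForestClassifier(n_estimators=50, max_depth=None,
                                random_state=0)
param_grid = {"min_samples_split": [2, 5, 10, 20]}  # illustrative values

# 5-fold cross-validation on the training part only, scored by ROC AUC.
search = GridSearchCV(forest, param_grid, cv=5, scoring="roc_auc")
search.fit(X_train, y_train)

print("best min_samples_split:", search.best_params_["min_samples_split"])
print("cross-validated AUC:", search.best_score_)
# With scoring="roc_auc", score() also reports AUC on the held-out set.
print("held-out AUC:", search.score(X_test, y_test))
```

The test set is left out of the search entirely, so the last number is an
estimate on data that played no part in choosing the parameter.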

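Point a) can be checked empirically. The sketch below, again on synthetic
data (an assumption, not the data from the question), fits two forests with
the same random_state, one on raw features with deliberately different
scales and one on the output of preprocessing.scale, and compares their
predictions. They should agree on essentially every sample, apart from
possible floating-point edge cases.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import scale

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
# Give the columns very different magnitudes so that scaling is non-trivial.
X = X * np.logspace(0, 3, 10)
X_scaled = scale(X)  # zero mean, unit variance per feature

# Same seed, so both forests draw the same bootstrap samples and features.
raw = RandomForestClassifier(n_estimators=50, random_state=0)
raw.fit(X[:300], y[:300])
scaled = RandomForestClassifier(n_estimators=50, random_state=0)
scaled.fit(X_scaled[:300], y[:300])

# Fraction of held-out samples on which the two forests predict the same class.
agreement = np.mean(raw.predict(X[300:]) == scaled.predict(X_scaled[300:]))
print("fraction of identical predictions:", agreement)
```
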
Hope this helps.

Gilles

On 6 November 2012 16:21,  <[email protected]> wrote:
>
> Dear SciKitters,
>
> given a rather unbalanced data set (454 samples with classification "0" and
> 168 samples with classification "1"), I would like to train a RandomForest.
>
> For my data set, I have calculated 177 features per sample.
> In a first step, I have preprocessed my data set:
> "
> from sklearn import preprocessing
> dataDescrs_array_scaled = preprocessing.scale(dataDescrs_array)
> "
>
> Or is preprocessing not necessary if one uses a RandomForest classifier?
> In the documentation
> (http://scikit-learn.org/stable/modules/preprocessing.html), RF is not
> explicitly mentioned, but at least machine learning in general is sensitive
> to the distribution of the feature space.
>
> For the training/test set split, I make use of the train_test_split module:
> "
> from sklearn.cross_validation import train_test_split
> X_train, X_test, y_train, y_test = train_test_split(
>     dataDescrs_array_scaled, data_activities, test_size=.4)
> "
>
> RF is trained as follows
> "
> from sklearn import metrics
> from sklearn.ensemble import RandomForestClassifier
> clf_RF = RandomForestClassifier(n_estimators=100,
> max_depth=5,random_state=0,n_jobs=1)
> clf_RF = clf_RF.fit(X_train,y_train)
> y_predict = clf_RF.predict(X_test)
> accuracy  = clf_RF.score(X_test,y_test)
> fpr, tpr, thresholds = metrics.roc_curve(y_test, y_predict)
> print metrics.confusion_matrix(y_test, y_predict), "\n", accuracy, "\n", metrics.auc(fpr, tpr)
> "
>
> The performance is rather modest:
> "
> [[175   7]
>  [ 53  14]]
> 0.759036144578
> 0.58524684271
> "
>
> In one of my former mails, it was recommended to make use of reweighting and
> subsampling:
> http://www.mail-archive.com/[email protected]/msg04975.html
> In another thread, the flag "class_weight=auto" was mentioned:
> http://www.mail-archive.com/[email protected]/msg03759.html
> However, this does not work in conjunction with "RandomForestClassifier" -
> did I miss something?
>
>
> Cheers & Thanks,
> Paul
>
> ------------------------------------------------------------------------------
> LogMeIn Central: Instant, anywhere, Remote PC access and management.
> Stay in control, update software, and manage PCs from one command center
> Diagnose problems and improve visibility into emerging IT issues
> Automate, monitor and manage. Do more in less time with Central
> http://p.sf.net/sfu/logmein12331_d2d
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
