Re: [Scikit-learn-general] RandomForest - optimisation of min_samples_split

Andreas Mueller Wed, 07 Nov 2012 06:58:49 -0800

Am 07.11.2012 15:48, schrieb [email protected]:
> However, f1_score is not found. I would have expected it to work
> analogously to recall_score.
What do you mean, it is not found? It is in sklearn.metrics.
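A quick sanity check, assuming a current scikit-learn: f1_score and recall_score are imported from sklearn.metrics in the same way and take the same (y_true, y_pred) arguments. The labels below are made up for illustration.

```python
from sklearn.metrics import f1_score, recall_score

# Made-up binary labels: 3 true positives in y_true, 2 of them predicted.
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

rec = recall_score(y_true, y_pred)  # 2 found / 3 actual positives
f1 = f1_score(y_true, y_pred)       # harmonic mean of precision (1.0) and recall (2/3)
print(rec, f1)
```

Here precision is 1.0 (both predicted positives are correct) and recall is 2/3, so F1 = 2 * 1.0 * (2/3) / (1.0 + 2/3) = 0.8.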
>> If you want your confusion matrix to be more balanced, you can try two
>> things (as class weights are not implemented yet afaik):
>> - set a different decision threshold: classify as positive everything
>> that has a probability of being positive of over 0.20 (for example).
>> - stratify the dataset, meaning make sure there is the same
>> number of samples from both classes.
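The threshold idea can be sketched with predict_proba. This is a minimal sketch assuming a current scikit-learn (train_test_split lives in sklearn.model_selection today) and a synthetic imbalanced dataset standing in for the real one; predict_with_threshold is a hypothetical helper, not an sklearn function.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic, roughly 90% / 10% imbalanced binary problem (placeholder for real data).
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

def predict_with_threshold(clf, X, threshold=0.2):
    """Hypothetical helper: label positive if P(class 1) > threshold instead of 0.5."""
    # predict_proba columns follow clf.classes_, so look up the column for label 1.
    pos_col = list(clf.classes_).index(1)
    proba = clf.predict_proba(X)[:, pos_col]
    return (proba > threshold).astype(int)

y_default = clf.predict(X_test)                          # effectively a 0.5 cut-off
y_low = predict_with_threshold(clf, X_test, threshold=0.2)
print("positives at default threshold:", y_default.sum())
print("positives at threshold 0.2:   ", y_low.sum())
```

Lowering the threshold can only add positive predictions, trading precision for recall on the minority class; the value 0.2 is arbitrary and would normally be tuned on held-out data.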
> Indeed, undersampling gives better performance for the underrepresented
> class. However, I have been doing this in a pre-processing step
> outside sklearn.
> => How can I do this within sklearn?
>
AFAIK there is no function to do this in sklearn yet.
Having one would be convenient, but it shouldn't be hard to do
in numpy.
PR welcome ;)
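Random undersampling in plain numpy could look like the following. This is a minimal sketch: undersample is a hypothetical helper (not part of sklearn), and it assumes X and y are numpy arrays with one row of X per label in y.

```python
import numpy as np

def undersample(X, y, random_state=None):
    """Hypothetical helper: randomly drop samples so every class has as many
    samples as the smallest class."""
    rng = np.random.RandomState(random_state)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    # For each class, pick n_min distinct row indices without replacement.
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    rng.shuffle(keep)  # avoid returning the samples grouped by class
    return X[keep], y[keep]

# Usage: 8 samples of class 0, 2 of class 1 -> 2 of each after undersampling.
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)
X_bal, y_bal = undersample(X, y, random_state=0)
print(np.bincount(y_bal))
```

Undersampling should be applied to the training split only, so that the test set keeps the real class distribution.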


Andy

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
