Re: [scikit-learn] random forests and multi-class probability

Guillaume Lemaître Tue, 27 Jul 2021 03:04:09 -0700

As far as I remember, `precision_recall_curve` and `roc_curve` do not support 
multiclass targets. They are designed to work only with binary classification.
However, we provide a precision-recall example that shows one way to compute 
precision-recall curves via averaging: 
https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html#sphx-glr-auto-examples-model-selection-plot-precision-recall-py
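A minimal sketch of the one-vs-rest idea for curves, not taken from the linked example: the forest is fitted once on the multiclass target, the test labels are binarized, and `precision_recall_curve` is called on one column at a time. The synthetic dataset, the variable names and the micro-averaging step are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Synthetic 3-class problem (illustrative data, not from the thread).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Fit the forest directly on the multiclass target, without any wrapper.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)  # one column per class

# Binarize the true labels so each column is a "class k vs rest" target.
Y_test = label_binarize(y_test, classes=clf.classes_)

# One binary precision-recall curve per class.
curves = {}
for k, label in enumerate(clf.classes_):
    precision, recall, _ = precision_recall_curve(Y_test[:, k], proba[:, k])
    curves[label] = (precision, recall)

# Micro-average: pool every (sample, class) pair into one binary problem.
precision_micro, recall_micro, _ = precision_recall_curve(
    Y_test.ravel(), proba.ravel()
)
ap_micro = average_precision_score(Y_test, proba, average="micro")
```

In this sketch only the labels are treated one-vs-rest; the classifier itself is trained once on all three classes.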
--
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/


> On 27 Jul 2021, at 11:42, Sole Galli via scikit-learn 
> <scikit-learn@python.org> wrote:
> 
> Thank you!
> 
> So when the multiclass documentation says that the algorithms with 
> intrinsic multiclass support, which are listed here 
> <https://scikit-learn.org/stable/modules/multiclass.html>, do not need to 
> be wrapped by the OneVsRestClassifier, it means there is no need, because 
> they can indeed handle multiclass targets, each in its own way.
> 
> But if I want to plot PR curves or ROC curves, then I do need to wrap them, 
> because those metrics are calculated in a one-vs-rest manner, and this is 
> not how the algorithms handle multiclass. Is my understanding correct?
> 
> Thank you!
> 
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Tuesday, July 27th, 2021 at 11:33 AM, Nicolas Hug <nio...@gmail.com> wrote:
>> To add to Guillaume's answer: the native multiclass support for 
>> forests/trees is described here: 
>> https://scikit-learn.org/stable/modules/tree.html#multi-output-problems 
>> It's not a one-vs-rest strategy and can be summed up as:
>> 
>> 
>>> Store n output values in leaves, instead of 1;
>>> 
>>> Use splitting criteria that compute the average reduction across all n 
>>> outputs.
>>> 
>> 
>> 
>> Nicolas
>> 
>> On 27/07/2021 10:22, Guillaume Lemaître wrote:
>>>> On 27 Jul 2021, at 11:08, Sole Galli via scikit-learn 
>>>> <scikit-learn@python.org> wrote:
>>>> 
>>>> Hello community,
>>>> 
>>>> Do I understand correctly that Random Forests are trained one-vs-rest 
>>>> when the target has more than 2 classes? Say the target takes values 0, 1 
>>>> and 2, then the model would train 3 estimators, 1 per class, under the hood?
>>> Each decision tree in the forest natively supports multiclass targets.
>>> 
>>>> The predict_proba output is an array with 3 columns, containing the 
>>>> probability of each class. If it is one-vs-rest, am I correct to assume 
>>>> that the probabilities for the 3 classes do not necessarily add up to 1? 
>>>> Are they normalized? How is it done so that they do add up to 1?
>>> Given the answer above, each row of the array returned by `predict_proba` 
>>> sums to 1.
>>> According to the documentation, the probabilities are computed as:
>>> 
>>> The predicted class probabilities of an input sample are computed as the 
>>> mean predicted class probabilities of the trees in the forest. The class 
>>> probability of a single tree is the fraction of samples of the same class 
>>> in a leaf.
>>> 
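The documented averaging can be checked with a short sketch. The synthetic dataset and variable names below are assumptions for illustration; the check compares the forest's `predict_proba` with the mean of the per-tree `predict_proba` outputs taken from `estimators_`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic 3-class problem (illustrative data, not from the thread).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, n_classes=3, random_state=0
)

forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)
proba = forest.predict_proba(X)  # shape (n_samples, 3)

# Class probabilities of each individual tree, stacked along a new first axis.
per_tree = np.stack([tree.predict_proba(X) for tree in forest.estimators_])
mean_of_trees = per_tree.mean(axis=0)

# Every row is a probability distribution over the 3 classes.
rows_sum_to_one = np.allclose(proba.sum(axis=1), 1.0)
# The forest output matches the average over its trees.
matches_tree_average = np.allclose(proba, mean_of_trees)

print(rows_sum_to_one, matches_tree_average)
```

Each tree already returns a distribution over the classes, so their average is also a distribution and no separate normalization step is needed.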
>>>> Thank you
>>>> Sole
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> scikit-learn mailing list
>>>> scikit-learn@python.org <mailto:scikit-learn@python.org>
>>>> https://mail.python.org/mailman/listinfo/scikit-learn 
>>>> <https://mail.python.org/mailman/listinfo/scikit-learn>
> 

