Re: [scikit-learn] random forests and multi-class probability

2021-08-14 Thread Francois Dion
Yellowbrick has multi-label precision-recall curves and multiclass ROC/AUC 
built in:
https://www.scikit-yb.org/en/latest/api/classifier/rocauc.html


Sent from my iPad

> On Jul 27, 2021, at 6:03 AM, Guillaume Lemaître  
> wrote:
> 
> As far as I remember, `precision_recall_curve` and `roc_curve` do not 
> support multiclass. They are designed to work only with binary classification.
> Then, we provide an example for precision-recall that shows one way to 
> compute a precision-recall curve via averaging: 
> https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html#sphx-glr-auto-examples-model-selection-plot-precision-recall-py
> --
> Guillaume Lemaitre
> Scikit-learn @ Inria Foundation
> https://glemaitre.github.io/
> 
>> On 27 Jul 2021, at 11:42, Sole Galli via scikit-learn 
>>  wrote:
>> 
>> Thank you!
>> 
>> So when the multiclass documentation says that the algorithms with 
>> intrinsic multiclass support, which are listed there, do not need to be 
>> wrapped by OneVsRest, it means there is no need: they can indeed handle 
>> multiple classes, each in its own way.
>> 
>> But if I want to plot PR curves or ROC curves, then I do need to wrap them, 
>> because those metrics are calculated in a one-vs-rest manner, which is not 
>> how the algorithms handle it. Is my understanding correct?
>> 
>> Thank you!
>> 
>> ‐‐‐ Original Message ‐‐‐
>> On Tuesday, July 27th, 2021 at 11:33 AM, Nicolas Hug  
>> wrote:
>>> To add to Guillaume's answer: the native multiclass support for 
>>> forests/trees is described here: 
>>> https://scikit-learn.org/stable/modules/tree.html#multi-output-problems
>>> 
>>> It's not a one-vs-rest strategy and can be summed up as:
>>>
>>> - Store n output values in leaves, instead of 1;
>>> - Use splitting criteria that compute the average reduction across all n 
>>>   outputs.
>>>
>>> Nicolas
>>> 
>>> On 27/07/2021 10:22, Guillaume Lemaître wrote:
>>>> On 27 Jul 2021, at 11:08, Sole Galli via scikit-learn wrote:
>>>>
>>>>> Hello community,
>>>>>
>>>>> Do I understand correctly that Random Forests are trained as one-vs-rest 
>>>>> when the target has more than 2 classes? Say the target takes values 0, 
>>>>> 1 and 2; would the model then train 3 estimators, one per class, under 
>>>>> the hood?
>>>> Each decision tree of the forest natively supports multiple classes.
>>>>
>>>>> The predict_proba output is an array with 3 columns containing the 
>>>>> probability of each class. If it is one-vs-rest, am I correct to assume 
>>>>> that the sum of the probabilities for the 3 classes does not necessarily 
>>>>> add up to 1? Are they normalized? How is it done so that they do add up 
>>>>> to 1?
>>>> According to the above answer, each row of the array given by 
>>>> `predict_proba` will sum to 1.
>>>> According to the documentation, the probabilities are computed as:
>>>>
>>>> The predicted class probabilities of an input sample are computed as the 
>>>> mean predicted class probabilities of the trees in the forest. The class 
>>>> probability of a single tree is the fraction of samples of the same class 
>>>> in a leaf.
>>>>
>>>>> Thank you
>>>>> Sole
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] random forests and multi-class probability

2021-08-14 Thread Matteo Caorsi
Greetings!

I am currently out of office, with limited access to emails, till August the 
30th.
Please contact supp...@giotto.ai for technical issues concerning Giotto 
Platform.
Otherwise, I will reply to your email as soon as possible upon my return.

With best regards,

Matteo






Re: [scikit-learn] random forests and multi-class probability

2021-07-27 Thread Brown J.B. via scikit-learn
2021年7月27日(火) 12:03 Guillaume Lemaître :

> As far as I remember, `precision_recall_curve` and `roc_curve` do not
> support multiclass. They are designed to work only with binary
> classification.
>

Correct, the TPR-FPR curve (ROC) was originally intended for tuning a free
parameter in signal detection, and it is an inherently binary metric.
For ML problems, it lets you tune/determine an estimator's output-value
threshold (e.g., a probability, or a raw discriminant value such as in SVM)
to arrive at an optimized model that will give a final, binary-discretized
answer in new prediction tasks.

Hope this helps, J.B.
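The threshold-tuning idea above can be sketched like this (an illustrative setup, not from the thread: a synthetic binary problem with logistic regression, and Youden's J statistic as one common, but not the only, choice of optimum):

```python
# Illustrative sketch: pick a decision threshold from the ROC curve
# of a binary classifier (synthetic data, logistic regression).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]      # probability of the positive class

fpr, tpr, thresholds = roc_curve(y_te, scores)
# Youden's J statistic: keep the threshold that maximizes TPR - FPR
best = thresholds[np.argmax(tpr - fpr)]
y_pred = (scores >= best).astype(int)       # binary-discretized answers
```

In practice you would tune the threshold on a validation split rather than on the same test set used for final evaluation.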
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] random forests and multi-class probability

2021-07-27 Thread Guillaume Lemaître
As far as I remember, `precision_recall_curve` and `roc_curve` do not support 
multiclass. They are designed to work only with binary classification.
Then, we provide an example for precision-recall that shows one way to compute 
a precision-recall curve via averaging: 
https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html#sphx-glr-auto-examples-model-selection-plot-precision-recall-py
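A minimal sketch of the micro-averaging approach from the linked example (the dataset and classifier here are illustrative stand-ins, not the example's own): binarize the labels, then treat every (sample, class) pair as one large binary problem.

```python
# Illustrative sketch: micro-averaged precision-recall for a 3-class problem.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
y_score = clf.predict_proba(X_te)                   # shape (n_samples, 3)
y_bin = label_binarize(y_te, classes=clf.classes_)  # indicator matrix

# Micro-average: flatten both matrices so every (sample, class) pair
# becomes one binary decision, then compute a single PR curve.
precision, recall, _ = precision_recall_curve(y_bin.ravel(), y_score.ravel())
ap_micro = average_precision_score(y_bin, y_score, average="micro")
```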

--
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/

> On 27 Jul 2021, at 11:42, Sole Galli via scikit-learn 
>  wrote:
> 
> Thank you!
> 
> So when the multiclass documentation says that the algorithms with 
> intrinsic multiclass support, which are listed here 
> (https://scikit-learn.org/stable/modules/multiclass.html), do not need to 
> be wrapped by OneVsRest, it means there is no need: they can indeed handle 
> multiple classes, each in its own way.
> 
> But if I want to plot PR curves or ROC curves, then I do need to wrap them, 
> because those metrics are calculated in a one-vs-rest manner, which is not 
> how the algorithms handle it. Is my understanding correct?
> 
> Thank you!
> 
> ‐‐‐ Original Message ‐‐‐
> On Tuesday, July 27th, 2021 at 11:33 AM, Nicolas Hug  wrote:
>> To add to Guillaume's answer: the native multiclass support for 
>> forests/trees is described here: 
>> https://scikit-learn.org/stable/modules/tree.html#multi-output-problems
>> 
>> It's not a one-vs-rest strategy and can be summed up as:
>> 
>> - Store n output values in leaves, instead of 1;
>> - Use splitting criteria that compute the average reduction across all n 
>>   outputs.
>> Nicolas
>> 
>> On 27/07/2021 10:22, Guillaume Lemaître wrote:
>>> On 27 Jul 2021, at 11:08, Sole Galli via scikit-learn wrote:
>>>
>>>> Hello community,
>>>>
>>>> Do I understand correctly that Random Forests are trained as one-vs-rest 
>>>> when the target has more than 2 classes? Say the target takes values 0, 
>>>> 1 and 2; would the model then train 3 estimators, one per class, under 
>>>> the hood?
>>> Each decision tree of the forest natively supports multiple classes.
>>>
>>>> The predict_proba output is an array with 3 columns containing the 
>>>> probability of each class. If it is one-vs-rest, am I correct to assume 
>>>> that the sum of the probabilities for the 3 classes does not necessarily 
>>>> add up to 1? Are they normalized? How is it done so that they do add up 
>>>> to 1?
>>> According to the above answer, each row of the array given by 
>>> `predict_proba` will sum to 1.
>>> According to the documentation, the probabilities are computed as:
>>>
>>> The predicted class probabilities of an input sample are computed as the 
>>> mean predicted class probabilities of the trees in the forest. The class 
>>> probability of a single tree is the fraction of samples of the same class 
>>> in a leaf.
>>>
>>>> Thank you
>>>> Sole

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] random forests and multi-class probability

2021-07-27 Thread Sole Galli via scikit-learn
Thank you!

So when the multiclass documentation says that the algorithms with intrinsic 
multiclass support, which are listed 
[here](https://scikit-learn.org/stable/modules/multiclass.html), do not need 
to be wrapped by OneVsRest, it means there is no need: they can indeed handle 
multiple classes, each in its own way.

But if I want to plot PR curves or ROC curves, then I do need to wrap them, 
because those metrics are calculated in a one-vs-rest manner, which is not how 
the algorithms handle it. Is my understanding correct?
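For reference, one way to get one-vs-rest curves without wrapping the estimator is to binarize only the labels and use the probability columns directly (an illustrative sketch; the dataset and classifier are placeholders, not from this thread):

```python
# Illustrative sketch: per-class one-vs-rest ROC curves from predict_proba
# columns -- the estimator itself is never wrapped, only the labels are
# binarized for the metric.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
y_score = clf.predict_proba(X_te)                   # (n_samples, 3)
y_bin = label_binarize(y_te, classes=clf.classes_)

# One ROC curve per class: column i of y_bin vs column i of y_score
curves = {c: roc_curve(y_bin[:, i], y_score[:, i])
          for i, c in enumerate(clf.classes_)}
auc_ovr = roc_auc_score(y_te, y_score, multi_class="ovr")
```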

Thank you!

‐‐‐ Original Message ‐‐‐
On Tuesday, July 27th, 2021 at 11:33 AM, Nicolas Hug  wrote:

> To add to Guillaume's answer: the native multiclass support for forests/trees 
> is described here: 
> https://scikit-learn.org/stable/modules/tree.html#multi-output-problems
>
> It's not a one-vs-rest strategy and can be summed up as:
>
> - Store n output values in leaves, instead of 1;
> - Use splitting criteria that compute the average reduction across all n 
>   outputs.
>
> Nicolas
>
> On 27/07/2021 10:22, Guillaume Lemaître wrote:
>
>>> On 27 Jul 2021, at 11:08, Sole Galli via scikit-learn 
>>> wrote:
>>>
>>> Hello community,
>>>
>>> Do I understand correctly that Random Forests are trained as one-vs-rest 
>>> when the target has more than 2 classes? Say the target takes values 0, 1 
>>> and 2; would the model then train 3 estimators, one per class, under the 
>>> hood?
>>
>> Each decision tree of the forest natively supports multiple classes.
>>
>>> The predict_proba output is an array with 3 columns containing the 
>>> probability of each class. If it is one-vs-rest, am I correct to assume 
>>> that the sum of the probabilities for the 3 classes does not necessarily 
>>> add up to 1? Are they normalized? How is it done so that they do add up 
>>> to 1?
>>
>> According to the above answer, each row of the array given by 
>> `predict_proba` will sum to 1.
>> According to the documentation, the probabilities are computed as:
>>
>> The predicted class probabilities of an input sample are computed as the 
>> mean predicted class probabilities of the trees in the forest. The class 
>> probability of a single tree is the fraction of samples of the same class in 
>> a leaf.
>>
>>> Thank you
>>> Sole
>>>
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] random forests and multi-class probability

2021-07-27 Thread Nicolas Hug
To add to Guillaume's answer: the native multiclass support for 
forests/trees is described here: 
https://scikit-learn.org/stable/modules/tree.html#multi-output-problems


It's not a one-vs-rest strategy and can be summed up as:


- Store n output values in leaves, instead of 1;
- Use splitting criteria that compute the average reduction across all n 
  outputs.
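As an illustrative sketch of the mechanism (assuming the iris dataset; the depth is arbitrary), a single tree exposes those per-leaf class fractions through `predict_proba`, with no one-vs-rest wrapping involved:

```python
# Illustrative sketch: one DecisionTreeClassifier handles 3 classes
# natively; predict_proba returns the per-leaf class fractions.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

proba = tree.predict_proba(X)   # shape (150, 3): one column per class
row_sums_ok = np.allclose(proba.sum(axis=1), 1.0)
```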



Nicolas

On 27/07/2021 10:22, Guillaume Lemaître wrote:

On 27 Jul 2021, at 11:08, Sole Galli via scikit-learn wrote:

Hello community,

Do I understand correctly that Random Forests are trained as one-vs-rest when 
the target has more than 2 classes? Say the target takes values 0, 1 and 2; 
would the model then train 3 estimators, one per class, under the hood?

Each decision tree of the forest natively supports multiple classes.

The predict_proba output is an array with 3 columns containing the probability 
of each class. If it is one-vs-rest, am I correct to assume that the sum of the 
probabilities for the 3 classes does not necessarily add up to 1? Are they 
normalized? How is it done so that they do add up to 1?

According to the above answer, each row of the array given by `predict_proba` 
will sum to 1.
According to the documentation, the probabilities are computed as:

The predicted class probabilities of an input sample are computed as the mean 
predicted class probabilities of the trees in the forest. The class probability 
of a single tree is the fraction of samples of the same class in a leaf.

Thank you
Sole



___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] random forests and multi-class probability

2021-07-27 Thread Sole Galli via scikit-learn
Thank you!

I was confused because the multiclass documentation says that for estimators 
with built-in multiclass support, like decision trees and random forests, we 
do not need to use wrapper classes like OneVsRest.

Thus my question: if I want to determine the PR curves or the ROC curve, say 
with micro-averaging, do I need to wrap them with one-vs-rest, or does it not 
matter? The probability values do change slightly.

Thank you!





‐‐‐ Original Message ‐‐‐

On Tuesday, July 27th, 2021 at 11:22 AM, Guillaume Lemaître 
 wrote:

> > On 27 Jul 2021, at 11:08, Sole Galli via scikit-learn wrote:
> >
> > Hello community,
> >
> > Do I understand correctly that Random Forests are trained as one-vs-rest 
> > when the target has more than 2 classes? Say the target takes values 0, 1 
> > and 2; would the model then train 3 estimators, one per class, under the 
> > hood?
>
> Each decision tree of the forest natively supports multiple classes.
>
> > The predict_proba output is an array with 3 columns containing the 
> > probability of each class. If it is one-vs-rest, am I correct to assume 
> > that the sum of the probabilities for the 3 classes does not necessarily 
> > add up to 1? Are they normalized? How is it done so that they do add up 
> > to 1?
>
> According to the above answer, each row of the array given by 
> `predict_proba` will sum to 1.
>
> According to the documentation, the probabilities are computed as:
>
> The predicted class probabilities of an input sample are computed as the mean 
> predicted class probabilities of the trees in the forest. The class 
> probability of a single tree is the fraction of samples of the same class in 
> a leaf.
>
> > Thank you
> >
> > Sole
> >
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] random forests and multi-class probability

2021-07-27 Thread Guillaume Lemaître


> On 27 Jul 2021, at 11:08, Sole Galli via scikit-learn 
>  wrote:
> 
> Hello community,
> 
> Do I understand correctly that Random Forests are trained as one-vs-rest 
> when the target has more than 2 classes? Say the target takes values 0, 1 
> and 2; would the model then train 3 estimators, one per class, under the 
> hood?

Each decision tree of the forest natively supports multiple classes.

> 
> The predict_proba output is an array with 3 columns containing the 
> probability of each class. If it is one-vs-rest, am I correct to assume 
> that the sum of the probabilities for the 3 classes does not necessarily 
> add up to 1? Are they normalized? How is it done so that they do add up 
> to 1?

According to the above answer, each row of the array given by 
`predict_proba` will sum to 1.
According to the documentation, the probabilities are computed as:

The predicted class probabilities of an input sample are computed as the mean 
predicted class probabilities of the trees in the forest. The class probability 
of a single tree is the fraction of samples of the same class in a leaf.
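That documented behaviour can be checked directly. An illustrative sketch (iris dataset and 25 trees are arbitrary choices): the forest's probabilities are the mean of its trees' leaf fractions, and each row is a proper distribution.

```python
# Illustrative check: forest predict_proba rows sum to 1, and the values
# equal the mean of the per-tree predicted probabilities.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

proba = forest.predict_proba(X)                       # shape (150, 3)
per_tree_mean = np.mean(
    [t.predict_proba(X) for t in forest.estimators_], axis=0
)

rows_sum_to_one = np.allclose(proba.sum(axis=1), 1.0)
matches_tree_mean = np.allclose(proba, per_tree_mean)
```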

> 
> Thank you
> Sole
> 
> 
> 

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


[scikit-learn] random forests and multi-class probability

2021-07-27 Thread Sole Galli via scikit-learn
Hello community,

Do I understand correctly that Random Forests are trained as one-vs-rest when 
the target has more than 2 classes? Say the target takes values 0, 1 and 2; 
would the model then train 3 estimators, one per class, under the hood?

The predict_proba output is an array with 3 columns containing the probability 
of each class. If it is one-vs-rest, am I correct to assume that the sum of the 
probabilities for the 3 classes does not necessarily add up to 1? Are they 
normalized? How is it done so that they do add up to 1?

Thank you
Sole
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn