Thank you guys! Very helpful :)

Soledad Galli
https://www.trainindata.com/

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, December 4, 2020 12:06 PM, mrschots <maykonsch...@gmail.com> wrote:

> I have been using both in time-series classification. I put an exponential 
> decay in sample_weights AND class weights as a dictionary.
>
> BR/Schots
>
> On Fri, Dec 4, 2020 at 12:01, Nicolas Hug <nio...@gmail.com> wrote:
>
>> Basically, passing class weights should be equivalent to passing 
>> per-class-constant sample weights.
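A minimal sketch of that equivalence (my own example, using LogisticRegression; other estimators that accept both should behave the same way):

```python
# Sketch: class_weight={0: 1, 1: 3} should be equivalent to
# per-sample weights that are constant within each class.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)

# Option 1: class weights as a dict passed to init
clf_cw = LogisticRegression(class_weight={0: 1, 1: 3}).fit(X, y)

# Option 2: the same weights expanded into per-sample weights in fit
sw = np.where(y == 1, 3.0, 1.0)
clf_sw = LogisticRegression().fit(X, y, sample_weight=sw)

print(np.allclose(clf_cw.coef_, clf_sw.coef_))  # True
```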
>>
>>> Why do some estimators allow passing weights both as a dict in init and as 
>>> sample weights in fit? What's the logic?
>>
>> sample_weight is a per-sample property (aligned with X and y), so we avoid 
>> passing it to init: the data isn't known when the estimator is initialized, 
>> only when fit is called. In general we avoid passing data-related info into 
>> init so that the same instance can be fitted on any data (a different 
>> number of samples, different classes, etc.).
>>
>> We allow passing class_weight in init because the 'balanced' option is 
>> data-agnostic. Arguably, allowing a dict with actual class values violates 
>> the above argument (of not having data-related stuff in init), so I guess 
>> that's where the logic ends ;)
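For reference, the 'balanced' heuristic derives the weights from y at fit time as n_samples / (n_classes * bincount(y)); it can be inspected with scikit-learn's compute_class_weight helper (my own example):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0, 0, 0, 0, 1])  # imbalanced: four 0s, one 1
w = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)
# n_samples / (n_classes * bincount(y)) = [5/(2*4), 5/(2*1)]
print(w)  # -> 0.625 for class 0, 2.5 for class 1
```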
>>
>> As to why one would use both, I'm not so sure honestly.
>>
>> Nicolas
>>
>> On 12/4/20 10:40 AM, Sole Galli via scikit-learn wrote:
>>
>>> Actually, I found the answer. Both seem to be optimising the loss function 
>>> for the various algorithms, below I include some links.
>>>
>>> If we pass both class_weight and sample_weight, then the final cost / 
>>> weight is a combination of both.
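That combination can be checked empirically; a quick sketch of my own (per the linked answers, the two sets of weights are multiplied):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
rng = np.random.RandomState(0)
sw = rng.uniform(0.5, 2.0, size=len(y))  # arbitrary per-sample weights

# Passing class_weight and sample_weight together...
clf_both = LogisticRegression(class_weight={0: 1, 1: 3}).fit(
    X, y, sample_weight=sw
)

# ...matches multiplying them by hand and passing only sample_weight.
combined = sw * np.where(y == 1, 3.0, 1.0)
clf_mult = LogisticRegression().fit(X, y, sample_weight=combined)

print(np.allclose(clf_both.coef_, clf_mult.coef_))  # True
```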
>>>
>>> I have a follow-up question: in which scenario would we use both? Why do 
>>> some estimators allow passing weights both as a dict in init and as 
>>> sample weights in fit? What's the logic? I found it a bit confusing at 
>>> the beginning.
>>>
>>> Thank you!
>>>
>>> https://stackoverflow.com/questions/30805192/scikit-learn-random-forest-class-weight-and-sample-weight-parameters
>>>
>>> https://stackoverflow.com/questions/30972029/how-does-the-class-weight-parameter-in-scikit-learn-work/30982811#30982811
>>>
>>> Soledad Galli
>>> https://www.trainindata.com/
>>>
>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>> On Thursday, December 3, 2020 11:55 AM, Sole Galli via scikit-learn 
>>> [<scikit-learn@python.org>](mailto:scikit-learn@python.org) wrote:
>>>
>>>> Hello team,
>>>>
>>>> What is the difference in the implementation of class_weight and 
>>>> sample_weight in those algorithms that support both, like random forest 
>>>> or logistic regression?
>>>>
>>>> Are both modifying the loss function? In a similar way?
>>>>
>>>> Thank you!
>>>>
>>>> Sole
>>>
>>> _______________________________________________
>>> scikit-learn mailing list
>>> scikit-learn@python.org
>>>
>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>
> --
> Schots
