It depends a bit on the classifier, but usually you'd have a loss that is
(something like)

       A   B
c(A)  0   1
c(B)  1   0

so a class weight of 10 on class A would scale the whole true-class-A column:

       A   B
c(A)  0   1
c(B) 10   0
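In code, a minimal sketch of both suggestions from the earlier mail (the dataset, the 0/1 labels standing in for A/B, and the 10:1 costs are all made up for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.utils.class_weight import compute_sample_weight

X, y = make_classification(n_samples=500, random_state=0)

# Option 1: per-class weights passed as sample weights to fit().
# This scales the cost of every mistake on a given *true* class,
# i.e. it rescales a column of the loss matrix, not a row.
sw = compute_sample_weight(class_weight={0: 10.0, 1: 1.0}, y=y)
clf = AdaBoostClassifier(random_state=0).fit(X, y, sample_weight=sw)

# Option 2: train with uniform weights, then pick the prediction
# that minimizes expected cost under an arbitrary cost matrix.
# cost[i, j] = cost of predicting class i when the true class is j.
cost = np.array([[0.0, 1.0],    # predicting 0 ("A"): cheap mistake
                 [10.0, 0.0]])  # predicting 1 ("B"): costly mistake on true 0
clf2 = AdaBoostClassifier(random_state=0).fit(X, y)
proba = clf2.predict_proba(X)       # shape (n_samples, 2)
expected_cost = proba @ cost.T      # expected cost of each prediction
y_pred = expected_cost.argmin(axis=1)
```

Option 1 rescales a whole column of the loss matrix (every mistake on a given
true class), which is why class weights alone can't express a row-asymmetric
matrix; option 2 handles an arbitrary cost matrix at prediction time.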



On 08/04/2015 11:59 PM, Simon Burton wrote:
> Hi Andy,
>
> thank you for your comments. I am still a bit confused
> about this, so let me try again to explain what I am thinking.
>
> Here is a little table showing how we would normally score
> classifier performance on a single sample:
>
>        A   B
> c(A) +1  -1
> c(B) -1  +1
>
> I.e., if the true class is "A" and the classifier prediction c(A)
> is also "A" then score +1, but if c(A) is "B" then score -1, etc.
>
> It would seem that if we re-weighted the classes so that A was
> 10 times more important, then this would be our new score table:
>
>        A   B
> c(A) +10 -1
> c(B) -10 +1
>
> Is that correct?
>
> However, I am looking to classify data where a mistake in c(B)
> is much worse than a mistake in c(A):
>
>        A   B
> c(A) +1  -1
> c(B) -10 +1
>
> So I don't see how to achieve this with class weights. Am I
> missing something?
>
> Perhaps my approach is completely wrong and I should
> be doing something else entirely, like regression.
>
> Many thanks,
>
> Simon.
>
>
>
> On Tue, 04 Aug 2015 11:36:31 -0400
> Andreas Mueller <t3k...@gmail.com> wrote:
>
>> Hi Simon.
>> In general in scikit-learn you could use class-weights to make one class
>> more important than the other.
>> Unfortunately that is not implemented for AdaBoost yet. You can however
>> use the sample_weight parameter of the fit method,
>> and create sample weights either by hand based on the class, or use the
>> compute_sample_weight function:
>> https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/class_weight.py#L82
>>
>> The other possibility is to simply threshold the predict_proba of your
>> classifier differently based on your cost.
>>
>> Best,
>> Andy
>>
>>
>> On 08/04/2015 10:29 AM, Simon Burton wrote:
>>> Hi,
>>>
>>> I am attempting to build some classification models where false-positives
>>> are much worse than false-negatives. Normally these two outcomes are
>>> treated equally (equal loss) in the training procedure,
>>> but I would like to be able to customize this.
>>>
>>> I've been using the AdaBoost classifier, which works well as a general
>>> data-miner, except for this issue. I tried hacking a bit on the
>>> code by only boosting the false-positive samples,
>>> but I don't really know if that makes any sense (it tends to forget
>>> about the false-negatives).
>>>
>>> Googling around I found a paper [1] but it's not clear to me
>>> if this is what I am looking for.
>>>
>>> Thank you for any suggestions.
>>>
>>> Simon.
>>>
>>>
>>> [1] McCane, Brendan; Novins, Kevin; Albert, Michael (2005). "Optimizing
>>> cascade classifiers."
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> Scikit-learn-general@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>

