Hi Andy,

Thank you for your comments. I am still a bit confused about this,
so let me try again to explain what I am thinking.

Here is a little table showing how we would normally score a
classifier's prediction on a single sample:

        A    B
c(A)   +1   -1
c(B)   -1   +1

I.e. the rows give the true class and the columns give the prediction:
if the true class is "A" and the prediction c(A) is also "A" then we
score +1, but if c(A) is "B" then we score -1, etc.

It would seem that if we re-weighted the classes so that A was 10
times more important, then this would be our new score table:

        A    B
c(A)  +10   -1
c(B)  -10   +1

Is that correct?
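
(For concreteness, my understanding of the class-weight idea, via the
sample_weight route you mentioned, is something like the sketch below;
the data and the weight values are made up, purely to illustrate.)

    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier

    # toy data, just so the snippet runs
    X_train = np.array([[0.0], [0.1], [0.9], [1.0]])
    y_train = np.array(["A", "A", "B", "B"])

    # give every true-"A" sample 10x the weight of a true-"B" sample
    # (my reading of what a class weight of 10 on "A" would mean)
    sample_weight = np.where(y_train == "A", 10.0, 1.0)

    clf = AdaBoostClassifier(n_estimators=10)
    clf.fit(X_train, y_train, sample_weight=sample_weight)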

However, I am looking to classify data where a mistake on a true "B"
sample (a mistake in c(B)) is much worse than a mistake in c(A):

        A    B
c(A)   +1   -1
c(B)  -10   +1

So I don't see how to achieve this with class weights. Am I
missing something?
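
Written out as a little scoring function, the behaviour I am after is
roughly this (just a sketch, assuming the rows of my tables are the
true class and the columns are the prediction):

    # (true class, predicted class) -> score, matching the table above
    COST = {("A", "A"): +1, ("A", "B"): -1,
            ("B", "A"): -10, ("B", "B"): +1}

    def my_score(y_true, y_pred):
        # the -10 entry is the expensive mistake:
        # predicting "A" on a true "B" sample
        return sum(COST[(t, p)] for t, p in zip(y_true, y_pred))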

Perhaps my approach is completely wrong and I should be doing
something else entirely, such as regression.
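
Or maybe the right move is your other suggestion, thresholding the
predict_proba output according to the cost? I imagine something like
the sketch below, though the 0.9 threshold is made up (it would need
to be derived from the -10/-1 costs somehow):

    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier

    # toy data, just so the snippet runs
    X = np.array([[0.0], [0.1], [0.9], [1.0]])
    y = np.array(["A", "A", "B", "B"])

    clf = AdaBoostClassifier(n_estimators=10).fit(X, y)

    # only predict "A" when the model is very confident it is "A"
    col_A = list(clf.classes_).index("A")
    proba_A = clf.predict_proba(X)[:, col_A]
    y_pred = np.where(proba_A > 0.9, "A", "B")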

Many thanks,

Simon.



On Tue, 04 Aug 2015 11:36:31 -0400
Andreas Mueller <t3k...@gmail.com> wrote:

> Hi Simon.
> In general in scikit-learn you could use class-weights to make one class 
> more important than the other.
> Unfortunately that is not implemented for AdaBoost yet. You can however 
> use the sample_weights parameter of the fit method,
> and create sample weights either by hand based on the class, or use the 
> compute_sample_weights function: 
> https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/class_weight.py#L82
> 
> The other possibility is to simply threshold the predict_proba of your 
> classifier differently based on your cost.
> 
> Best,
> Andy
> 
> 
> On 08/04/2015 10:29 AM, Simon Burton wrote:
> > Hi,
> >
> > I am attempting to build some classification models where
> > false-positives are much worse than false-negatives. Normally these
> > two outcomes are treated equally (equal loss) in the training
> > procedure, but I would like to be able to customize this.
> >
> > I've been using the AdaBoost classifier, which works well as a general
> > data-miner, except for this issue. I tried hacking a bit on the
> > code by only boosting the false-positive samples,
> > but I don't really know if that makes any sense (it tends to forget
> > about the false-negatives).
> >
> > Googling around I found a paper [1] but it's not clear to me
> > if this is what I am looking for.
> >
> > Thank you for any suggestions.
> >
> > Simon.
> >
> >
> > [1] McCane, Brendan; Novins, Kevin; Albert, Michael (2005).
> > "Optimizing cascade classifiers."
> >
> >