Hi Andy,
Thank you for your comments. I am still a bit confused
about this, so let me try again to explain what I am thinking.
Here is a little table showing how we would normally score
classifier performance on a single sample:
            A    B
    c(A)   +1   -1
    c(B)   -1   +1
I.e., if the true class is "A" and the classifier prediction c(A)
is also "A", then score +1, but if c(A) is "B", then score -1, etc.
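To make that concrete, this is roughly how I would tally the total score
of a set of predictions against such a table, with rows as the true class
and columns as the prediction (untested, toy labels just for illustration):

    import numpy as np
    from sklearn.metrics import confusion_matrix

    # score table: rows = true class (A, B), columns = predicted class (A, B)
    score_table = np.array([[+1, -1],
                            [-1, +1]])

    # toy labels, just for illustration
    y_true = np.array(["A", "A", "B", "B", "B"])
    y_pred = np.array(["A", "B", "B", "B", "A"])

    # confusion_matrix also uses rows = true class, columns = predicted class
    counts = confusion_matrix(y_true, y_pred, labels=["A", "B"])
    total_score = (counts * score_table).sum()   # = 1 for this toy example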
It would seem that if we re-weighted the classes so that A was
10 times more important, then this would be our new score table:
            A    B
    c(A)  +10   -1
    c(B)  -10   +1
Is that correct?
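Going by your suggestion, I imagine the reweighting would be done roughly
like this with compute_sample_weight and the sample_weight argument of fit
(untested, toy data just to make it concrete):

    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.utils.class_weight import compute_sample_weight

    # toy data with labels "A" and "B", just to make the snippet runnable
    rng = np.random.RandomState(0)
    X = rng.randn(100, 4)
    y = np.where(X[:, 0] > 0, "A", "B")

    # make every class-"A" sample count 10 times as much as a class-"B" sample
    sw = compute_sample_weight(class_weight={"A": 10.0, "B": 1.0}, y=y)

    clf = AdaBoostClassifier(n_estimators=50)
    clf.fit(X, y, sample_weight=sw)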
However, I am looking to classify data where a mistake in c(B)
is much worse than a mistake in c(A):
            A    B
    c(A)   +1   -1
    c(B)  -10   +1
So I don't see how to achieve this with class weights. Am I
missing something?
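Your other suggestion, thresholding predict_proba, does look like it could
express this: working through the expected scores from my table, predicting A
pays off only when P(A|x)*(+1) + P(B|x)*(-10) > P(A|x)*(-1) + P(B|x)*(+1),
which works out to P(A|x) > 11/13. In code I imagine something like the
following (untested, toy data just for illustration):

    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier

    # toy data, just so the snippet runs end to end
    rng = np.random.RandomState(0)
    X = rng.randn(200, 4)
    y = np.where(X[:, 0] + 0.5 * rng.randn(200) > 0, "A", "B")

    clf = AdaBoostClassifier(n_estimators=50)
    clf.fit(X[:150], y[:150])

    # expected score of predicting A:  P(A|x)*(+1) + P(B|x)*(-10)
    # expected score of predicting B:  P(A|x)*(-1) + P(B|x)*(+1)
    # so predict A only when P(A|x) > 11/13
    proba = clf.predict_proba(X[150:])
    p_a = proba[:, list(clf.classes_).index("A")]
    y_pred = np.where(p_a > 11.0 / 13.0, "A", "B")

Though that only changes the decision rule after training, not the training
itself, which is part of what I was hoping to customize.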
Or perhaps my approach is completely wrong and I should
be doing something else entirely, such as regression.
Many thanks,
Simon.
On Tue, 04 Aug 2015 11:36:31 -0400
Andreas Mueller <[email protected]> wrote:
> Hi Simon.
> In general in scikit-learn you could use class weights to make one class
> more important than the other.
> Unfortunately that is not implemented for AdaBoost yet. You can, however,
> use the sample_weight parameter of the fit method,
> and create sample weights either by hand based on the class, or use the
> compute_sample_weight function:
> https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/class_weight.py#L82
>
> The other possibility is to simply threshold the predict_proba of your
> classifier differently based on your cost.
>
> Best,
> Andy
>
>
> On 08/04/2015 10:29 AM, Simon Burton wrote:
> > Hi,
> >
> > I am attempting to build some classification models where false-positives
> > are much worse than false-negatives. Normally these two outcomes are
> > treated equally (equal loss) in the training procedure,
> > but I would like to be able to customize this.
> >
> > I've been using the AdaBoost classifier, which works well as a general
> > data-miner, except for this issue. I tried hacking a bit on the
> > code by only boosting the false-positive samples,
> > but I don't really know if that makes any sense (it tends to forget
> > about the false-negatives).
> >
> > Googling around I found a paper [1] but it's not clear to me
> > if this is what I am looking for.
> >
> > Thank you for any suggestions.
> >
> > Simon.
> >
> >
> > [1] McCane, Brendan; Novins, Kevin; Albert, Michael (2005). "Optimizing
> > cascade classifiers."
> >
> >
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general