It depends a bit on the classifier, but usually you'd have a loss that is
(something like)

          A    B
    c(A)  0    1
    c(B)  1    0

so class weights (here, a weight of 10 on one class) would make that

          A    B
    c(A)  0    1
    c(B) 10    0
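A rough sketch of both routes in scikit-learn follows. The blob data, the
"A"/"B" labels, the weight of 10, and the choice of LogisticRegression are
just placeholders (LogisticRegression is only an example of an estimator
that has a class_weight parameter), not anything specific to your setup:

    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.utils.class_weight import compute_sample_weight

    # Toy data: two overlapping blobs labelled "A" and "B".
    rng = np.random.RandomState(0)
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
    y = np.array(["A"] * 100 + ["B"] * 100)

    # Estimators with a class_weight parameter take the weights directly;
    # errors on class "A" samples now count 10x during training.
    logreg = LogisticRegression(class_weight={"A": 10, "B": 1}).fit(X, y)

    # AdaBoostClassifier has no class_weight parameter, but its fit()
    # accepts per-sample weights, which compute_sample_weight can build
    # from the same class-weight dict.
    sw = compute_sample_weight({"A": 10, "B": 1}, y)
    ada = AdaBoostClassifier().fit(X, y, sample_weight=sw)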
On 08/04/2015 11:59 PM, Simon Burton wrote:
> Hi Andy,
>
> thank you for your comments. I am still a bit confused
> about this, so let me try again to explain what I am thinking.
>
> Here is a little table showing how we would normally score
> classifier performance on a single sample:
>
>           A    B
>     c(A) +1   -1
>     c(B) -1   +1
>
> I.e. if the true class is "A" and the classifier prediction c(A)
> is also "A" then score +1, but if c(A) is "B" then score -1, etc.
>
> It would seem that if we re-weighted the classes so that A was
> 10 times more important, this would be our new score table:
>
>           A    B
>     c(A) +10  -1
>     c(B) -10  +1
>
> Is that correct?
>
> However, I am looking to classify data where a mistake in c(B)
> is much worse than a mistake in c(A):
>
>           A    B
>     c(A) +1   -1
>     c(B) -10  +1
>
> So I don't see how to achieve this with class weights. Am I
> missing something?
>
> Perhaps my approach is completely wrong and I should
> be doing something else, like regression or something.
>
> Many thanks,
>
> Simon.
>
>
> On Tue, 04 Aug 2015 11:36:31 -0400
> Andreas Mueller <t3k...@gmail.com> wrote:
>
>> Hi Simon.
>> In general in scikit-learn you could use class weights to make one class
>> more important than the other.
>> Unfortunately that is not implemented for AdaBoost yet. You can, however,
>> use the sample_weight parameter of the fit method,
>> and create the sample weights either by hand based on the class, or use
>> the compute_sample_weight function:
>> https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/class_weight.py#L82
>>
>> The other possibility is to simply threshold the predict_proba of your
>> classifier differently based on your cost. [A sketch of this follows
>> below the quoted thread.]
>>
>> Best,
>> Andy
>>
>>
>> On 08/04/2015 10:29 AM, Simon Burton wrote:
>>> Hi,
>>>
>>> I am attempting to build some classification models where false positives
>>> are much worse than false negatives. Normally these two outcomes are
>>> treated equally (equal loss) in the training procedure,
>>> but I would like to be able to customize this.
>>>
>>> I've been using the AdaBoost classifier, which works well as a general
>>> data miner, except for this issue. I tried hacking a bit on the
>>> code by only boosting the false-positive samples,
>>> but I don't really know if that makes any sense (it tends to forget
>>> about the false negatives).
>>>
>>> Googling around I found a paper [1] but it's not clear to me
>>> if this is what I am looking for.
>>>
>>> Thank you for any suggestions.
>>>
>>> Simon.
>>>
>>>
>>> [1] McCane, Brendan; Novins, Kevin; Albert, Michael (2005). "Optimizing
>>> cascade classifiers."
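And to illustrate the predict_proba thresholding suggestion from the quoted
message above, here is a similar sketch. The threshold of 0.9 and the choice
of "B" as the prediction whose mistakes are expensive are placeholders you
would tune against your actual costs:

    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier

    # Same toy data as above.
    rng = np.random.RandomState(0)
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
    y = np.array(["A"] * 100 + ["B"] * 100)

    clf = AdaBoostClassifier().fit(X, y)

    proba = clf.predict_proba(X)            # shape (n_samples, n_classes)
    b_col = list(clf.classes_).index("B")   # column holding P(class == "B")

    # Only predict "B" when the model is confident enough, otherwise fall
    # back to "A". Raising the threshold trades "B" false positives for
    # "B" false negatives.
    threshold = 0.9
    pred = np.where(proba[:, b_col] >= threshold, "B", "A")

Sweeping the threshold on a validation set and picking the value that
minimizes the expected cost (10 for the expensive mistake, 1 for the other,
in your score table) is probably the most direct way to encode the asymmetry.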