Hi Andy, thank you for your comments. I am still a bit confused about this, so let me try again to explain what I am thinking.
Here is a little table showing how we would normally score classifier performance on a single sample:

              A    B
    c(A)     +1   -1
    c(B)     -1   +1

I.e. if the true class is "A" and the classifier prediction c(A) is also "A", then score +1, but if c(A) is "B" then score -1, etc.

It would seem that if we re-weighted the classes, so that A was 10 times more important, then this would be our new score table:

              A    B
    c(A)    +10   -1
    c(B)    -10   +1

Is that correct?

However, I am looking to classify data where a mistake in c(B) is much worse than a mistake in c(A):

              A    B
    c(A)     +1   -1
    c(B)    -10   +1

So I don't see how to achieve this with class weights. Am I missing something? Perhaps my approach is completely wrong and I should be doing something else, like regression.

Many thanks,

Simon.


On Tue, 04 Aug 2015 11:36:31 -0400
Andreas Mueller <t3k...@gmail.com> wrote:

> Hi Simon.
> In general in scikit-learn you could use class weights to make one class
> more important than the other.
> Unfortunately that is not implemented for AdaBoost yet. You can however
> use the sample_weight parameter of the fit method,
> and create sample weights either by hand based on the class, or use the
> compute_sample_weight function:
> https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/class_weight.py#L82
>
> The other possibility is to simply threshold the predict_proba of your
> classifier differently based on your cost.
>
> Best,
> Andy
>
>
> On 08/04/2015 10:29 AM, Simon Burton wrote:
> > Hi,
> >
> > I am attempting to build some classification models where false positives
> > are much worse than false negatives. Normally these two outcomes are
> > treated equally (equal loss) in the training procedure,
> > but I would like to be able to customize this.
> >
> > I've been using the AdaBoost classifier, which works well as a general
> > data miner, except for this issue. I tried hacking a bit on the
> > code by only boosting the false-positive samples,
> > but I don't really know if that makes any sense (it tends to forget
> > about the false negatives).
> >
> > Googling around I found a paper [1], but it's not clear to me
> > whether it is what I am looking for.
> >
> > Thank you for any suggestions.
> >
> > Simon.
> >
> >
> > [1] McCane, Brendan; Novins, Kevin; Albert, Michael (2005). "Optimizing
> > cascade classifiers."
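
A minimal sketch of the sample_weight route Andy describes above (illustrative only, not code from either poster): the toy X/y data and the 10x weight are stand-ins that mirror the cost tables in the reply, and the helper is the compute_sample_weight function in the linked class_weight.py.

    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.utils.class_weight import compute_sample_weight

    # toy stand-in data: 100 samples, 5 features, labels "A"/"B"
    rng = np.random.RandomState(0)
    X = rng.rand(100, 5)
    y = rng.choice(["A", "B"], size=100)

    # weight each sample by its true class so mistakes on "B" cost 10x;
    # either build the weights by hand...
    sample_weight = np.where(y == "B", 10.0, 1.0)
    # ...or derive them from a class_weight dict with the helper:
    sample_weight = compute_sample_weight({"A": 1, "B": 10}, y)

    clf = AdaBoostClassifier(n_estimators=100)
    clf.fit(X, y, sample_weight=sample_weight)

Note that this scales the whole loss term for class "B" samples, rewards as well as penalties, rather than only the penalty for misclassifying them, which is the distinction the last cost table is getting at.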
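A companion sketch of the predict_proba thresholding idea, reusing the fitted clf and X from the block above: instead of the default argmax, pick the label with the lowest expected cost under a cost matrix built from the last score table (the -10/-1 scores recast as positive costs 10/1).

    # cost[i, j] = cost of predicting class j when the true class is i,
    # rows/columns ordered like clf.classes_ (["A", "B"] here)
    cost = np.array([[ 0.0,  1.0],    # true A: correct = 0, called "B" = 1
                     [10.0,  0.0]])   # true B: called "A" = 10, correct = 0

    proba = clf.predict_proba(X)       # columns follow clf.classes_
    expected_cost = proba.dot(cost)    # expected cost of each possible prediction
    pred = clf.classes_[np.argmin(expected_cost, axis=1)]

For two classes this amounts to moving the usual 0.5 threshold: with these numbers "A" is predicted only when P(A) exceeds 10/11, i.e. when the risk of an expensive false "A" is outweighed.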