On Jan 8, 2007, at 10:51 AM, Tom Fawcett wrote:

Just to add a note here: Ken is correct -- both NB and SVMs are known to be rather poor at providing accurate probabilities; their scores tend to be too extreme. Producing good probabilities from these scores is called calibrating the classifier, and it's more involved than just taking a root of the score. There are several methods for calibrating scores. The good news is that there's an effective one called isotonic regression (also known as Pool Adjacent Violators, or PAV) which is pretty easy and fast. The bad news is that there's no plug-in (i.e., CPAN-ready) Perl implementation of it (I've got a simple implementation which I should convert and contribute someday).
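
For concreteness, here's a rough sketch of the PAV idea in Perl -- this isn't the implementation I mentioned, just a toy illustration (pav_calibrate() and the example data are made up). It sorts the (score, label) pairs by score and repeatedly merges adjacent blocks whose averages violate monotonicity; the result is a nondecreasing step function mapping score ranges to probabilities.

#!/usr/bin/perl
use strict;
use warnings;

# Pool Adjacent Violators: turn (score, 0/1 label) pairs into a nondecreasing
# step function mapping score ranges to calibrated probabilities.
sub pav_calibrate {
    my @pairs = sort { $a->[0] <=> $b->[0] } @_;    # sort by raw classifier score

    my @stack;    # each block holds: label sum, count, and the score range it covers
    for my $p (@pairs) {
        push @stack, { sum => $p->[1], n => 1, lo => $p->[0], hi => $p->[0] };
        # Merge adjacent blocks while their means violate monotonicity.
        while (@stack >= 2
               and $stack[-2]{sum} / $stack[-2]{n} > $stack[-1]{sum} / $stack[-1]{n}) {
            my $cur  = pop @stack;
            my $prev = pop @stack;
            push @stack, {
                sum => $prev->{sum} + $cur->{sum},
                n   => $prev->{n}   + $cur->{n},
                lo  => $prev->{lo},
                hi  => $cur->{hi},
            };
        }
    }
    # Return [low score, high score, calibrated probability] for each block.
    return map { [ $_->{lo}, $_->{hi}, $_->{sum} / $_->{n} ] } @stack;
}

# Toy example: overconfident scores paired with their true 0/1 labels.
my @data = ( [0.02, 0], [0.10, 0], [0.15, 1], [0.30, 0], [0.55, 1], [0.80, 1], [0.95, 1] );
for my $block ( pav_calibrate(@data) ) {
    printf "scores %.2f..%.2f -> p = %.3f\n", @$block;
}

On this toy input the third and fourth points get pooled into a single block with probability 0.5, and everything else stays put, so the output is monotone in the score.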

If you want to read about classifier calibration, google one of these titles:

"Transforming classifier scores into accurate multiclass probability estimates"
by Bianca Zadrozny and Charles Elkan

"Predicting Good Probabilities With Supervised Learning"
by Alexandru Niculescu-Mizil and Rich Caruana


Cool, thanks for the references. It might be nice to add some such scheme to Algorithm::NaiveBayes (and friends), so that the user has a choice of several normalization schemes, including "none". If I get a surplus of tuits I'll add it, or if you feel like contributing your stuff, that would be great too.
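
Just thinking out loud, something like this is roughly what I have in mind -- a calibration step layered on top of the module's predict() output, reusing the pav_calibrate() sketch above. The toy data, the 'spam'/'ham' labels, and calibrated_prob() are all made up for illustration; nothing like this exists in Algorithm::NaiveBayes today.

#!/usr/bin/perl
use strict;
use warnings;
use Algorithm::NaiveBayes;

my $nb = Algorithm::NaiveBayes->new;

# Toy training set: word counts labelled 'spam' or 'ham'.
my @train = (
    [ { viagra => 2, cheap => 1 },   'spam' ],
    [ { meeting => 1, agenda => 1 }, 'ham'  ],
    [ { cheap => 1, pills => 1 },    'spam' ],
    [ { lunch => 1, meeting => 1 },  'ham'  ],
);
$nb->add_instance(attributes => $_->[0], label => $_->[1]) for @train;
$nb->train;

# Fit the calibration on held-out data: (raw spam score, 0/1 label) pairs
# fed to pav_calibrate() -- the sub from the sketch earlier in the thread.
my @heldout = (
    [ { cheap => 1 },  'spam' ],
    [ { agenda => 1 }, 'ham'  ],
);
my @steps = pav_calibrate(
    map {
        my $scores = $nb->predict(attributes => $_->[0]);
        [ $scores->{spam} || 0, $_->[1] eq 'spam' ? 1 : 0 ];
    } @heldout
);

# Map a raw score onto the step function: take the last block whose low end
# the score reaches.
sub calibrated_prob {
    my ($score, @steps) = @_;
    my $p = $steps[0][2];
    for my $s (@steps) {
        $p = $s->[2] if $score >= $s->[0];
    }
    return $p;
}

my $raw = $nb->predict(attributes => { cheap => 1, pills => 1 })->{spam} || 0;
printf "raw spam score %.3f -> calibrated %.3f\n", $raw, calibrated_prob($raw, @steps);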

 -Ken
