[scikit-learn] PR #8190: "Implement Complement Naive Bayes."

Michael Alcorn Fri, 20 Jan 2017 08:39:08 -0800

Hi all,

I would appreciate it if a couple of maintainers could take a look at my
pull request (https://github.com/scikit-learn/scikit-learn/pull/8190)
implementing the Complement Naive Bayes (CNB) classifier described in
Rennie et al. (2003). CNB regularly outperforms the standard Multinomial
Naive Bayes (MNB) classifier on real-world data sets, because such data
sets tend to suffer from class imbalance. Apache Mahout
offers its own implementation of CNB alongside MNB, but it would be nice to
have an easily usable CNB implementation available in scikit-learn.
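
For anyone unfamiliar with the idea, here is a minimal NumPy sketch of the
core of CNB as I read Rennie et al. (2003): each class's weights are
estimated from the token counts of all *other* classes, and a document is
assigned to the class whose complement fits it worst. This is only an
illustration, not the code in the pull request. The names fit_cnb and
predict_cnb are hypothetical, the smoothing value alpha=1.0 is an
assumption, and the weight normalization step from the paper is left out.

```python
import numpy as np


def fit_cnb(X, y, alpha=1.0):
    """Estimate complement weights (hypothetical helper, not the PR's API).

    X: (n_docs, n_tokens) array of token counts.
    y: (n_docs,) array of class labels.
    alpha: additive smoothing constant (assumed value).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    total_counts = X.sum(axis=0)
    weights = np.empty((classes.size, X.shape[1]))
    for k, c in enumerate(classes):
        # Token counts from every document NOT in class c, plus smoothing.
        complement = total_counts - X[y == c].sum(axis=0) + alpha
        # Log of the smoothed complement token probabilities.
        weights[k] = np.log(complement / complement.sum())
    return classes, weights


def predict_cnb(X, classes, weights):
    """Pick the class whose complement explains the document worst."""
    X = np.asarray(X, dtype=float)
    scores = X @ weights.T  # shape: (n_docs, n_classes)
    return classes[np.argmin(scores, axis=1)]


# Tiny imbalanced toy data set: three documents of class 0, one of class 1.
X_train = [[2, 0], [3, 0], [4, 1], [0, 2]]
y_train = [0, 0, 0, 1]
classes, weights = fit_cnb(X_train, y_train)
predictions = predict_cnb([[1, 0], [0, 1]], classes, weights)
```

Because each class's parameters come from the complement, the minority
class is estimated from the larger pool of majority-class documents, which
is where the robustness to imbalance is supposed to come from.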


Training the CNB classifier on a data set of 493,038 documents (median
length 87 tokens; 1,155,784 distinct tokens) took around 8.5 seconds. For
comparison, the MNB classifier took around 4.5 seconds to train, but CNB
had a 10% lower error rate, which seems like a worthwhile tradeoff.

Happy to answer any questions or discuss further.

Thanks,
Michael A. Alcorn

_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn
