On 11/04/2016 05:45 AM, Marcin Mirończuk wrote:
Hi,
In our experiments we use Multinomial Naive Bayes (MNB). Traditional
MNB assumes term-frequency (TF) weights for the words. The
documentation at http://scikit-learn.org/stable/modules/naive_bayes.html
describes Multinomial Naive Bayes as a model "... where the data are
typically represented as word vector counts, although tf-idf vectors
are also known to work well in practice". The "word vector counts" are
TF, which is well known. Our question concerns the "tf-idf vectors".
For the tf-idf case, does scikit-learn implement the approach of
D. M. Rennie et al., "Tackling the Poor Assumptions of Naive Bayes
Text Classification"? The documentation does not cite this approach.

No, I think that paper implements something slightly different. The
documentation says that you can apply the TfidfVectorizer instead of
CountVectorizer and it can still work.
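
As a minimal sketch of what that sentence in the documentation means,
swapping the vectorizer could look roughly like the following. The toy
documents, labels, and variable names are made up for illustration and
are not taken from the documentation or from your experiments.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy corpus, invented purely for illustration.
docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the new laptop has a fast processor",
    "the phone ships with a better camera",
]
labels = ["sport", "sport", "tech", "tech"]

# Classic setup: raw term counts (TF) fed into MultinomialNB.
count_clf = make_pipeline(CountVectorizer(), MultinomialNB())
count_clf.fit(docs, labels)

# Same estimator, with tf-idf weights in place of raw counts.
# tf-idf values are non-negative, so MultinomialNB accepts them.
tfidf_clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
tfidf_clf.fit(docs, labels)

new_docs = ["a late goal decided the match"]
count_pred = count_clf.predict(new_docs)
tfidf_pred = tfidf_clf.predict(new_docs)
print(count_pred, tfidf_pred)
```

Here only the feature weights passed to the estimator change; the
MultinomialNB model itself is the same in both pipelines, which is why
I would not call this an implementation of the method in that paper.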