Hello,

Could anybody tell me the difference between augmented frequency
(a term-frequency weighting meant to remove the bias towards longer
documents) and cosine normalization (the L2 norm that scikit-learn uses
in TfidfTransformer)?

Augmented frequency is given by the following equation. It divides
the natural term frequency by the maximum frequency of any term in the
document.

[image: Inline image 1]

Do they both do the same thing when it comes to eliminating the bias
towards longer documents? As I understand it, scikit-learn uses the
natural term frequency, and cosine normalization is enabled with
norm='l2'.
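
To make the comparison concrete, here is a minimal pure-Python sketch of
the two schemes as I understand them. It does not use scikit-learn. The
function names are hypothetical, and the smoothing constant k=0.5 is an
assumption based on the commonly quoted form of augmented frequency,
k + (1 - k) * f / max_f. It leaves out idf weighting entirely.

```python
import math
from collections import Counter


def augmented_tf(tokens, k=0.5):
    """Augmented frequency: k + (1 - k) * f / max_f.

    k=0.5 is an assumed smoothing constant (the commonly quoted value).
    """
    counts = Counter(tokens)
    max_count = max(counts.values())
    return {term: k + (1 - k) * c / max_count for term, c in counts.items()}


def l2_normalized_tf(tokens):
    """Natural term counts scaled so the document vector has unit L2 norm."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()}


doc = "the cat sat on the mat".split()
doc_doubled = doc * 2  # same content, twice as long

aug_short, aug_long = augmented_tf(doc), augmented_tf(doc_doubled)
l2_short, l2_long = l2_normalized_tf(doc), l2_normalized_tf(doc_doubled)

for term in sorted(aug_short):
    print(term, aug_short[term], aug_long[term], l2_short[term], l2_long[term])
```

In this sketch both schemes give the same weights for a document and for
its doubled copy, so both remove the raw length effect in that sense. The
values differ, though: augmented frequency rescales each count against
the single most frequent term, while the L2 norm rescales the whole
vector at once.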

Any help would be appreciated!

- Apurva
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
