Hello, Could anybody tell me the difference between using augmented frequency (which weights term frequencies to reduce the bias towards longer documents) and cosine normalization (the l2 norm, which scikit-learn uses for TfidfTransformer)? Augmented frequency is given by the following equation. It divides the natural term frequency by the maximum frequency of any term in the document.
[image: Inline image 1]

Do they both do the same thing when it comes to eliminating the bias towards longer documents? I suppose scikit-learn uses the natural term frequency, and that cosine normalization is enabled by setting norm='l2'.

Any help would be appreciated!

- Apurva
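
To make the comparison concrete, here is a minimal sketch in plain Python that applies both schemes to raw term counts. The equation image is not available here, so the augmented form used below, k + (1 - k) * f / max(f) with k = 0.5, is the commonly cited one and is an assumption, as is the choice to leave absent terms at zero. The function names and the toy count vectors are hypothetical and are not part of scikit-learn.

```python
import math


def augmented_tf(counts, k=0.5):
    """Augmented frequency for one document's raw term counts.

    Assumed form: k + (1 - k) * f / max(f) for terms that occur,
    and 0 for terms that do not occur (an assumption of this sketch).
    """
    max_count = max(counts)
    if max_count == 0:
        return [0.0 for _ in counts]
    return [k + (1 - k) * c / max_count if c > 0 else 0.0 for c in counts]


def l2_normalize(counts):
    """Cosine (l2) normalization: divide each count by the vector's Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in counts))
    if norm == 0:
        return [0.0 for _ in counts]
    return [c / norm for c in counts]


# Two toy documents with the same term proportions; the second is ten times longer.
short_doc = [2, 1, 0]
long_doc = [20, 10, 0]

print("augmented:", augmented_tf(short_doc), augmented_tf(long_doc))
print("l2       :", l2_normalize(short_doc), l2_normalize(long_doc))
```

In this toy case both schemes map the short and the long document to (approximately) the same vector, so both remove the effect of document length. They are not the same transformation, though: augmented frequency rescales by the largest single count, so the result does not have unit length, while l2 normalization rescales by the Euclidean norm of the whole vector, so the result always has length 1.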
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn