date:20240528

[scikit-learn] why the modification in the df-idf formula?

2024-05-28 Thread Sole Galli via scikit-learn

Hi guys, I'd like to understand why sklearn's implementation of tf-idf differs from the standard textbook notation, as described in the docs: https://scikit-learn.org/stable/modules/feature_extraction.html#tfidf-term-weighting Do you have any reference that I could take a look at? I didn't

Re: [scikit-learn] why the modification in the df-idf formula?

2024-05-28 Thread Sebastian Raschka

Hi Sole, It’s been a long time, but I remember helping to draft the tf-idf text in the documentation as part of a scikit-learn sprint at SciPy a looong time ago, where I mentioned this difference (since it initially surprised me, because I couldn’t get it to match my from-scratch implementa
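
For readers following the thread, here is a minimal from-scratch sketch of the difference being discussed. It does not call scikit-learn; it only re-implements the formulas as stated on the docs page linked above, namely idf(t) = ln((1 + n) / (1 + df(t))) + 1 with smoothing, idf(t) = ln(n / df(t)) + 1 without, followed by L2 normalization of each row, against the textbook idf(t) = ln(n / (1 + df(t))) quoted there. The function names and the small count matrix are illustrative assumptions, not part of any library.

```python
import math

# Toy term-count matrix: 6 documents (rows) x 3 terms (columns).
# Term 0 occurs in every document, so it carries little information.
counts = [
    [3, 0, 1],
    [2, 0, 0],
    [3, 0, 0],
    [4, 0, 0],
    [3, 2, 0],
    [3, 0, 2],
]


def document_frequencies(counts):
    """Number of documents in which each term occurs at least once."""
    n_terms = len(counts[0])
    return [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]


def idf_textbook(n_docs, df):
    """Textbook variant as quoted in the docs: ln(n / (1 + df))."""
    return math.log(n_docs / (1 + df))


def idf_documented(n_docs, df, smooth_idf=True):
    """idf as written in the scikit-learn docs (hypothetical helper name)."""
    if smooth_idf:
        return math.log((1 + n_docs) / (1 + df)) + 1
    # Unsmoothed form; assumes df > 0 for every term.
    return math.log(n_docs / df) + 1


def tfidf_row(row, idfs):
    """Raw tf * idf for one document, then L2 normalization."""
    raw = [tf * idf for tf, idf in zip(row, idfs)]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]


n_docs = len(counts)
dfs = document_frequencies(counts)

textbook_idfs = [idf_textbook(n_docs, df) for df in dfs]
smooth_idfs = [idf_documented(n_docs, df, smooth_idf=True) for df in dfs]
plain_idfs = [idf_documented(n_docs, df, smooth_idf=False) for df in dfs]

first_smooth = tfidf_row(counts[0], smooth_idfs)
first_plain = tfidf_row(counts[0], plain_idfs)

print("textbook idf:", [round(v, 4) for v in textbook_idfs])
print("documented idf (smooth):", [round(v, 4) for v in smooth_idfs])
print("first doc, smooth_idf=True:", [round(v, 4) for v in first_smooth])
print("first doc, smooth_idf=False:", [round(v, 4) for v in first_plain])
```

In this toy matrix the textbook idf of term 0 is negative, because it appears in all six documents, whereas the documented variants add 1 so that such a term keeps an idf of exactly 1 instead of being zeroed out or flipped in sign. That "+ 1", together with the row normalization, is the kind of detail that can keep a from-scratch implementation from matching the library output.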