Hello all,

You can use sklearn.feature_extraction.text.TfidfVectorizer to fit a
corpus of documents and then rank those documents by relevance to a new,
previously unseen query.
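
For reference, here is a minimal sketch of that ranking workflow. The toy
corpus and query are made up for illustration, and cosine similarity is just
one reasonable choice of relevance score:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Toy corpus, invented purely for illustration.
corpus = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "stock markets fell sharply today",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(corpus)  # learn vocabulary and IDF weights

# Project the unseen query onto the vocabulary learned from the corpus.
query_vec = vectorizer.transform(["cat on a mat"])

# TfidfVectorizer L2-normalises rows by default, so a plain dot product
# is the cosine similarity between the query and each document.
scores = linear_kernel(query_vec, doc_matrix).ravel()
ranking = scores.argsort()[::-1]  # document indices, most relevant first
```

Only the first document shares any terms with the query, so it comes first
in the ranking.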

BM25 works in a similar manner to TfidfVectorizer, but is more complex (it
adds term-frequency saturation and document-length normalization) and is
widely considered one of the most successful ranking functions in
information retrieval.

I currently have a fairly efficient BM25 implementation that fits a corpus
of documents, and I would like to port it to sklearn's fit/transform API. I
think it could fit neatly into the current codebase.
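
To make the idea concrete, here is a rough sketch of what such a transformer
might look like, modelled loosely on how TfidfTransformer reweights a count
matrix. This is not my actual implementation: the class name BM25Transformer,
the defaults k1=1.5 and b=0.75, and the smoothed non-negative IDF variant are
all assumptions chosen for illustration.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer


class BM25Transformer(BaseEstimator, TransformerMixin):
    """Hypothetical sketch: reweight a term-count matrix with Okapi BM25."""

    def __init__(self, k1=1.5, b=0.75):
        self.k1 = k1  # term-frequency saturation (assumed default)
        self.b = b    # strength of length normalisation (assumed default)

    def fit(self, X, y=None):
        # Assumes X holds raw counts with no explicitly stored zeros,
        # as produced by CountVectorizer.
        X = sp.csr_matrix(X)
        n_docs = X.shape[0]
        # Document frequency: number of documents containing each term.
        df = np.bincount(X.indices, minlength=X.shape[1])
        # Smoothed IDF variant that never goes negative (an assumption).
        self.idf_ = np.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
        self.avgdl_ = X.sum() / n_docs  # average document length
        return self

    def transform(self, X):
        X = sp.csr_matrix(X, dtype=np.float64, copy=True)
        doc_len = np.asarray(X.sum(axis=1)).ravel()
        # Per-document length normalisation term.
        norm = self.k1 * (1.0 - self.b + self.b * doc_len / self.avgdl_)
        # Repeat each row's term once per stored entry in that row.
        row_norm = np.repeat(norm, np.diff(X.indptr))
        tf = X.data
        X.data = self.idf_[X.indices] * tf * (self.k1 + 1.0) / (tf + row_norm)
        return X


# Toy usage; the corpus and query are invented for illustration.
corpus = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "stock markets fell sharply today",
]
counter = CountVectorizer()
doc_counts = counter.fit_transform(corpus)
doc_weights = BM25Transformer().fit(doc_counts).transform(doc_counts)

query = counter.transform(["cat on a mat"])
query.data[:] = 1  # each distinct query term contributes once
# BM25 score of a document = sum of its weights over the query terms.
bm25_scores = (doc_weights @ query.T).toarray().ravel()
```

Splitting the work this way (counts from CountVectorizer, reweighting in a
separate transformer) mirrors the existing CountVectorizer/TfidfTransformer
pair, which is one reason I think it could slot in without much friction.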

Questions:
1.) Would this be a desirable feature?
2.) Any advice for how to proceed with this? Things to watch out for?

Any and all advice is welcome.

Thanks!
Basil
_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn
