Re: [scikit-learn] Adding BM25 relevance function to sklearn.feature_extraction.text

Basil Beirouti Mon, 13 Jun 2016 19:51:12 -0700

Hello all,

You can use sklearn.feature_extraction.text.TfidfVectorizer to fit a
corpus of documents and then rank those documents by relevance to a new,
previously unseen query.
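
For illustration, here is a minimal sketch of that ranking workflow. The
toy corpus and query are made up, and cosine similarity (via linear_kernel
on the L2-normalized TF-IDF rows) is one common choice of relevance score,
not the only one.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Toy corpus, invented purely for illustration.
corpus = [
    "the cat sat on the mat",
    "dogs chase cats in the yard",
    "stock markets fell sharply today",
]

# Learn the vocabulary and IDF weights from the corpus.
vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(corpus)

# Project an unseen query into the same TF-IDF space.
query_vec = vectorizer.transform(["cat on a mat"])

# TF-IDF rows are L2-normalized by default, so the dot product
# equals the cosine similarity between the query and each document.
scores = linear_kernel(query_vec, doc_matrix).ravel()

# Document indices, most relevant first.
ranking = scores.argsort()[::-1]
```

Query terms that never appeared in the corpus are silently ignored by
transform(), so a query with no known terms scores zero everywhere.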


BM25 works in a similar manner to TfidfVectorizer, but it is more complex
(it saturates term frequency and normalizes for document length) and is
considered one of the most successful information retrieval algorithms.
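
To make the comparison concrete, here is a small pure-Python sketch of the
Okapi BM25 score. It is not the code referred to in this message. The
function name bm25_scores, the whitespace tokenizer, the defaults k1=1.5
and b=0.75, and the non-negative "+ 1" IDF variant are all assumptions
chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in docs against query with Okapi BM25.

    Hypothetical helper: tokenization is a plain lowercase split.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter(term for d in tokenized for term in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(d) / avgdl)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Non-negative IDF variant (assumption; other variants exist).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturates as tf grows, controlled by k1.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


docs = [
    "the cat sat on the mat",
    "dogs chase cats in the yard",
    "stock markets fell sharply today",
]
example_scores = bm25_scores("cat mat", docs)
```

Compared with TF-IDF, the two extra parameters do the work: k1 controls
how quickly repeated occurrences of a term stop adding to the score, and
b controls how strongly long documents are penalized.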

I currently have code that implements BM25 quite efficiently to learn a
corpus of documents and I want to modify/port it to align with the
fit-transform framework of sklearn. I think it could fit neatly into the
current codebase.
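
One possible shape for such a port, sketched under assumptions: a
transformer that reweights a term-count matrix, in the same spirit as
TfidfTransformer. The class name BM25Transformer, its parameters, and the
IDF variant are hypothetical. This is not an existing scikit-learn class
and not the implementation mentioned above.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer


class BM25Transformer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: reweight term counts with BM25."""

    def __init__(self, k1=1.5, b=0.75):
        self.k1 = k1
        self.b = b

    def fit(self, X, y=None):
        # Learn corpus statistics: per-term IDF and average document length.
        X = sp.csr_matrix(X)
        n_docs = X.shape[0]
        # Stored entries per column = document frequency
        # (assumes no explicit zeros, as with CountVectorizer output).
        df = np.bincount(X.indices, minlength=X.shape[1])
        self.idf_ = np.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        self.avgdl_ = float(X.sum(axis=1).mean())
        return self

    def transform(self, X):
        X = sp.csr_matrix(X, dtype=np.float64, copy=True)
        doc_len = np.asarray(X.sum(axis=1)).ravel()
        norm = self.k1 * (1 - self.b + self.b * doc_len / self.avgdl_)
        # Row index of every stored entry, to look up its length norm.
        rows = np.repeat(np.arange(X.shape[0]), np.diff(X.indptr))
        X.data = X.data * (self.k1 + 1) / (X.data + norm[rows])
        X.data *= self.idf_[X.indices]
        return X


corpus = [
    "the cat sat on the mat",
    "dogs chase cats in the yard",
    "stock markets fell sharply today",
]
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(corpus)
doc_weights = BM25Transformer().fit(counts).transform(counts)

# Summing the BM25 weights of the query's terms gives each document's score.
query_counts = vectorizer.transform(["cat on a mat"])
scores = np.asarray((doc_weights @ query_counts.T).todense()).ravel()
```

Keeping the statistics in fit() and the reweighting in transform() lets
the class drop into a Pipeline after CountVectorizer, which is how
TfidfTransformer is typically used.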

Questions:
1.) Would this be a desirable feature?
2.) Any advice for how to proceed with this? Things to watch out for?

Any and all advice is welcome.

Thanks!
Basil

_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn
