Hello all,

You can use sklearn.feature_extraction.text.TfidfVectorizer to learn term weights from a corpus of documents and then rank those documents by relevance to a new, previously unseen query (for example, by cosine similarity between the TF-IDF vectors).
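A minimal sketch of that TF-IDF workflow, assuming cosine similarity as the relevance score (the post does not name one). The three-document corpus and the query are made up for illustration:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Toy corpus, invented for this example
corpus = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock markets fell sharply today",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(corpus)  # learn vocabulary and IDF from the corpus

# Terms never seen during fit are simply ignored by transform()
query_vec = vectorizer.transform(["cat on a mat"])

# Rows are L2-normalized by default (norm="l2"), so the dot product
# computed by linear_kernel equals cosine similarity
scores = linear_kernel(query_vec, doc_matrix).ravel()
ranking = np.argsort(scores)[::-1]  # most relevant document first

for idx in ranking:
    print(f"{scores[idx]:.3f}  {corpus[idx]}")
```

Note that "cats" in the second document does not match "cat" in the query, because no stemming is applied by default.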
BM25 is similar in spirit to TF-IDF weighting (it also combines term frequency with inverse document frequency), but it is more complex: it saturates term frequency and normalizes for document length. It is considered one of the most successful information retrieval algorithms.

I currently have code that implements BM25 quite efficiently for learning a corpus of documents, and I want to modify/port it so that it follows sklearn's fit/transform framework. I think it could fit neatly into the current codebase.

Questions:
1.) Would this be a desirable feature?
2.) Any advice on how to proceed with this? Things to watch out for?

Any and all advice is welcome.

Thanks!
Basil
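For reference, here is one possible shape for a BM25 port into the fit/transform framework. This is a sketch, not Basil's code and not an existing scikit-learn API: `BM25Transformer` is a hypothetical name, k1=1.5 and b=0.75 are merely conventional defaults, and the IDF used is the Lucene-style non-negative variant, one of several in use. `fit()` learns per-term IDF and the average document length from a count matrix; `transform()` turns counts into BM25 term weights, so scoring a query becomes a sparse dot product.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer


class BM25Transformer(BaseEstimator, TransformerMixin):
    """Hypothetical BM25 weighting in sklearn's fit/transform style."""

    def __init__(self, k1=1.5, b=0.75):
        self.k1 = k1
        self.b = b

    def fit(self, X, y=None):
        X = sp.csr_matrix(X, dtype=np.float64)
        n_docs = X.shape[0]
        # Document frequency: number of documents containing each term
        df = np.asarray((X > 0).sum(axis=0)).ravel()
        # Lucene-style IDF, always non-negative (an assumed variant)
        self.idf_ = np.log1p((n_docs - df + 0.5) / (df + 0.5))
        self.avgdl_ = float(X.sum()) / n_docs  # average document length
        return self

    def transform(self, X):
        X = sp.csr_matrix(X, dtype=np.float64, copy=True)
        X.sum_duplicates()
        doc_len = np.asarray(X.sum(axis=1)).ravel()
        # Row index of every stored entry, so each tf gets its document's length
        row_of_entry = np.repeat(np.arange(X.shape[0]), np.diff(X.indptr))
        norm = self.k1 * (1 - self.b + self.b * doc_len[row_of_entry] / self.avgdl_)
        tf = X.data
        # Saturated term frequency times IDF
        X.data = tf * (self.k1 + 1) / (tf + norm) * self.idf_[X.indices]
        return X


# Usage on a toy corpus (invented for this example)
corpus = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock markets fell sharply today",
]
counter = CountVectorizer()
counts = counter.fit_transform(corpus)
bm25 = BM25Transformer(k1=1.5, b=0.75).fit(counts)
doc_weights = bm25.transform(counts)

query = counter.transform(["cat on a mat"])
query.data[:] = 1  # count each query term once (binary query)
scores = doc_weights.dot(query.T).toarray().ravel()
ranking = np.argsort(-scores)  # most relevant document first
```

Splitting tokenization (CountVectorizer) from weighting mirrors how TfidfTransformer relates to TfidfVectorizer, which is one plausible way such a feature could slot into the existing codebase.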
_______________________________________________
scikit-learn mailing list
https://mail.python.org/mailman/listinfo/scikit-learn
