[scikit-learn] BM25 Pull Request

2016-08-04 Thread Basil Beirouti
Hi all, just sending an email for visibility. I've made a pull request to add BM25 capabilities to complement TF-IDF in feature_extraction.text. All tests pass. Sincerely, Basil Beirouti

Re: [scikit-learn] Bm25 pull request

2016-07-11 Thread Joel Nothman
CircleCI checks the documentation build (although apparently it ignores changes only to docstrings). Travis runs all tests on a Linux system. AppVeyor tests on Windows.

[scikit-learn] Bm25 pull request

2016-07-11 Thread Basil Beirouti
Hi, Joel, thanks for pointing out the indentation issue. I have fixed it. Can someone explain what the three tests that were automatically run on my code are? And why did the AppVeyor and Travis ones fail? Sincerely, Basil Beirouti

Re: [scikit-learn] Bm25

2016-07-01 Thread Vlad Niculae
For the first question, look up the possible ways to construct scipy.sparse.csr_matrix objects; one of them will take (data, indices, indptr). Just pass a new array for data, and take the latter two from X. For the second question, you can just do the elementwise operation in place on the data
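
The two suggestions in this reply can be sketched roughly as follows. This is only an illustration, not code from the pull request: the small matrix X and the constant 1.5 are made-up values, and the (data, indices, indptr) constructor is the form of scipy.sparse.csr_matrix the reply refers to.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy term-count matrix (illustrative values only).
X = csr_matrix(np.array([[1.0, 0.0, 2.0],
                         [0.0, 3.0, 0.0]]))

# First question: a matrix with the same sparsity pattern as X but new values.
# Pass a fresh data array and reuse X's indices and indptr.
new_data = np.ones_like(X.data)
Y = csr_matrix((new_data, X.indices, X.indptr), shape=X.shape)

# Second question: an elementwise operation on the stored entries only.
# Work on a copy so X itself is left untouched; 1.5 is an arbitrary constant.
Z = X.copy()
Z.data /= (Z.data + 1.5)
```

Because only the .data array is modified, the implicit zeros are never touched and the result stays sparse.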

Re: [scikit-learn] Bm25

2016-07-01 Thread Basil Beirouti
Oh yes, that's exactly what I was looking for. So how do I initialize an array with the same sparsity pattern as X? And then how do I do an elementwise divide of the numerator over the denominator, when both are sparse matrices? Like you said it should only do this operation on the non zero elem

Re: [scikit-learn] Bm25

2016-07-01 Thread Vlad Niculae
In the denominator you mean? It looks like you only need to add that to nonzero elements, since the others would all have a 0 in the numerator, right? So the final value would be zero there. Or am I missing something? You can initialize an array with the same sparsity pattern as X, but its data
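
The point about the denominator can be illustrated with a minimal sketch of the BM25 term-frequency weighting applied to stored entries only. Everything here is an assumption for illustration, not the code in the pull request: the function name bm25_tf is hypothetical, the formula is the common Okapi form tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl)), the defaults k1=1.2 and b=0.75 are conventional choices, and the IDF factor is left out.

```python
import numpy as np
from scipy.sparse import csr_matrix


def bm25_tf(X, k1=1.2, b=0.75):
    """Hypothetical helper: BM25 term-frequency weighting on nonzeros only."""
    X = csr_matrix(X, dtype=np.float64, copy=True)
    # Document length = sum of term counts in each row.
    doc_len = np.asarray(X.sum(axis=1)).ravel()
    avg_len = doc_len.mean()
    # The "scalar" added in the denominator is constant within a row.
    row_term = k1 * (1.0 - b + b * doc_len / avg_len)
    # Repeat each row's value once per stored entry in that row.
    per_entry = np.repeat(row_term, np.diff(X.indptr))
    # Entries that are zero have a zero numerator, so they can be skipped.
    X.data = X.data * (k1 + 1.0) / (X.data + per_entry)
    return X


counts = csr_matrix(np.array([[1, 0, 2],
                              [0, 3, 0]]))
weighted = bm25_tf(counts)
```

The addition never has to be applied to the implicit zeros, so no dense matrix is created and the sparsity pattern of the input is preserved.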

[scikit-learn] Bm25

2016-07-01 Thread Basil Beirouti
Hi Vlad, Thanks for the quick reply. Unfortunately there's still the question of adding a scalar to every element in a sparse matrix, which is not allowed for sparse matrices, and which is not possible to avoid in the equation. Sincerely, Basil Beirouti