Re: [Scikit-learn-general] delta idf and bm25

2014-10-11 Thread Pavel Soriano
Hi Lars, Thanks for the gist. For my part, I will look into it and propose some ideas as soon as possible. Cheers, Pavel S. On Thu, Oct 9, 2014 at 1:22 PM, Lars Buitinck wrote: > 2014-08-23 21:25 GMT+02:00 Lars Buitinck: > > I was just implementing tf-chi2 today (I have a text classificati

Re: [Scikit-learn-general] delta idf and bm25

2014-10-09 Thread Lars Buitinck
2014-08-23 21:25 GMT+02:00 Lars Buitinck: > I was just implementing tf-chi2 today (I have a text classification > task to improve anyway), so I might send a PR somewhere over the next > week to at least establish the API. Supervised term weighting is > pretty big, with hundreds of citations for th
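A rough illustration of tf-chi2 weighting: each raw term count in a document is multiplied by the chi-squared statistic of that term against a class, computed from the classic 2x2 table of document counts. This is a plain-Python sketch, not Lars's implementation; the function names, the binary-class setup and the use of document presence (rather than summed counts) in the table are assumptions. scikit-learn's sklearn.feature_selection.chi2 uses a different formulation based on summed feature values per class, so its scores will not match these.

```python
from collections import Counter


def chi2_term_scores(docs, labels, positive=1):
    """Chi-squared statistic of each term against one class (hypothetical helper).

    Uses the 2x2 contingency table of document presence:
        A = positive docs containing t,  B = other docs containing t,
        C = positive docs without t,     D = other docs without t,
        chi2 = N (AD - BC)^2 / ((A+B)(C+D)(A+C)(B+D)).
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    in_pos, in_all = Counter(), Counter()
    for doc, y in zip(docs, labels):
        terms = set(doc)  # presence only, repeated tokens are ignored here
        in_all.update(terms)
        if y == positive:
            in_pos.update(terms)

    scores = {}
    for t, df in in_all.items():
        a = in_pos[t]
        b = df - a
        c = n_pos - a
        d = n - n_pos - b
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # a zero margin (e.g. a term present in every document) carries no signal
        scores[t] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def tf_chi2(doc, scores):
    """Weight the raw term counts of one tokenized document by their chi2 scores."""
    return {t: f * scores.get(t, 0.0) for t, f in Counter(doc).items()}
```

For the four toy documents "great movie", "great plot", "awful movie", "awful plot" with labels 1, 1, 0, 0, both great and awful get a score of 4.0 and movie gets 0.0: the statistic measures how informative a term is, not which class it points to.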

Re: [Scikit-learn-general] delta idf and bm25

2014-08-26 Thread Pavel Soriano
Greetings all! Sorry for the delay. Thanks for all your answers! Vlad, regarding the multi-class (or multi-label) options, as far as I know, there is not a lot of documentation on the topic. I once implemented the multi-class option for the project I was working on, in an OvR fashion. In th

Re: [Scikit-learn-general] delta idf and bm25

2014-08-25 Thread Lars Buitinck
2014-08-23 17:06 GMT+02:00 Lars Buitinck: > [3] This paper from a guy at HP Research that I cannot find right now. Found it: Forman et al., Feature Shaping for Linear SVM Classifiers, http://www.hpl.hp.com/techreports/2009/HPL-2009-31R1.pdf (SIGKDD 2009).

Re: [Scikit-learn-general] delta idf and bm25

2014-08-23 Thread Lars Buitinck
2014-08-23 20:41 GMT+02:00 Gael Varoquaux: > Interesting discussion. Of course, the danger here is that it might be > borderline for the scope of scikit-learn. In case somebody is going to > do a PR on these topics, I would advise working on the docstring > and narrative documentation to

Re: [Scikit-learn-general] delta idf and bm25

2014-08-23 Thread Gael Varoquaux
Hey there, Interesting discussion. Of course, the danger here is that it might be borderline for the scope of scikit-learn. In case somebody is going to do a PR on these topics, I would advise working on the docstring and narrative documentation to explain well why this can be useful not

Re: [Scikit-learn-general] delta idf and bm25

2014-08-23 Thread Joel Nothman
I agree with Vlad that delta-IDF is interesting, but it is not well supported by the community, and I'm not sure it is worth including ... yet. As Lars points out (and as you suggest), there are other ways to supervise feature weighting. I agree this has to be a separate transformer (SupervisedTFID
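As a rough illustration of the delta-idf idea discussed here: a term's weight is its count in the document multiplied by the difference between its idf computed on the positive documents and its idf computed on the negative documents, so terms spread evenly over both classes end up near zero. The sketch assumes binary 0/1 labels with both classes present, log base 2, and +0.5 smoothing of the per-class document frequencies; the function names are hypothetical and this is not the code from Pavel's gist.

```python
import math
from collections import Counter


def delta_idf(docs, labels):
    """Per-term delta-idf: idf on positive docs minus idf on negative docs.

    Sketch only: labels are assumed to be 1 (positive) or 0 (negative), with
    at least one document of each class; 0.5 is added to the per-class
    document frequencies so a term absent from one class does not divide by zero.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    pos_df, neg_df = Counter(), Counter()
    for doc, y in zip(docs, labels):
        # count each term at most once per document
        (pos_df if y == 1 else neg_df).update(set(doc))

    vocab = set(pos_df) | set(neg_df)
    return {
        t: math.log2(n_pos / (pos_df[t] + 0.5)) - math.log2(n_neg / (neg_df[t] + 0.5))
        for t in vocab
    }


def transform(doc, weights):
    """Scale the raw term counts of one tokenized document; unseen terms get 0."""
    return {t: c * weights.get(t, 0.0) for t, c in Counter(doc).items()}
```

On toy data such as "great movie", "great plot" (label 1) and "awful movie", "awful plot" (label 0), movie and plot get exactly 0, great gets a negative weight and awful an equally large positive one. Because fitting needs the labels, this cannot simply replace the unsupervised idf, which is consistent with the point above that it would have to be a separate transformer.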

Re: [Scikit-learn-general] delta idf and bm25

2014-08-23 Thread Lars Buitinck
2014-08-23 15:44 GMT+02:00 Pavel Soriano: > I don't know if this would be helpful to anybody or if this was already > discussed. That is why I am asking whether it is worth a pull request. > Gist URL: > https://gist.github.com/psorianom/0b9d8a742fe0efe0fe82 Yes! BM25 is high on my wishlist. I
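For reference, BM25 used as a term-weighting scheme (rather than as a ranking function) can be sketched in plain Python as below. This is not the code from the gist and not a scikit-learn API; the function name bm25_weights, the defaults k1=1.2 and b=0.75, and the non-negative idf variant log(1 + (N - n + 0.5) / (n + 0.5)) are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_weights(docs, k1=1.2, b=0.75):
    """Return one {term: weight} dict per tokenized document (hypothetical helper).

    Okapi BM25 document-side weight:
        idf(t) * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avgdl))
    with the non-negative idf variant log(1 + (N - n_t + 0.5) / (n_t + 0.5)).
    Assumes a non-empty list of documents.
    """
    n_docs = len(docs)
    # average document length; fall back to 1.0 if every document is empty
    avgdl = sum(len(d) for d in docs) / n_docs or 1.0
    # document frequency: number of documents containing each term
    df = Counter(term for d in docs for term in set(d))
    idf = {t: math.log(1 + (n_docs - n + 0.5) / (n + 0.5)) for t, n in df.items()}

    weighted = []
    for d in docs:
        tf = Counter(d)
        # length normalization: documents longer than average get damped tf
        norm = k1 * (1 - b + b * len(d) / avgdl)
        weighted.append({t: idf[t] * f * (k1 + 1) / (f + norm) for t, f in tf.items()})
    return weighted
```

The tf factor saturates: however often a term occurs, f * (k1 + 1) / (f + norm) stays below k1 + 1, so a rare term can outweigh a repeated common one. The parameter b controls how strongly document length is normalized, and b=0 disables it.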

Re: [Scikit-learn-general] delta idf and bm25

2014-08-23 Thread Vlad Niculae
Hi Pavel, First of all, this is an interesting subject; thanks for bringing it up! I fear that it's too domain-specific to go very deep in this direction. That being said, and trying to interpret your benchmarks, it seems that Delta-idf might actually be interesting. Or, more generally, the idea o