[Scikit-learn-general] partial-fit in gradient boosting

2014-08-23 Thread Mahendra Kariya
Hello All, I have a 12G dataset on which I want to run GradientBoostingRegressor. But loading such a large dataset in memory is practically impossible. I can load it in chunks and train the model in batch mode, but I don't see any partial_fit method in gradient boosting. Is there any other

[Scikit-learn-general] delta idf and bm25

2014-08-23 Thread Pavel Soriano
Greetings scikit, Last year I used delta idf and bm25 text weighting schemes with scikit classifiers for an opinion classification task. Today I decided to clean them and recheck them in order to propose them to scikit-learn text feature extractors. I only implemented delta idf and bm25 tf and

Re: [Scikit-learn-general] delta idf and bm25

2014-08-23 Thread Vlad Niculae
Hi Pavel, First of all, this is an interesting subject, thanks for bringing it up! I fear that it's too domain-specific to go very deep in this direction. That being said, and trying to interpret your benchmarks, it seems that Delta-idf might actually be interesting. Or, more generally, the idea

Re: [Scikit-learn-general] delta idf and bm25

2014-08-23 Thread Lars Buitinck
2014-08-23 15:44 GMT+02:00 Pavel Soriano sorianopa...@gmail.com: I don't know if this would be helpful to anybody or if this was already discussed. That is why I am asking if it is worthy to be pull requested. Gist URL : https://gist.github.com/psorianom/0b9d8a742fe0efe0fe82 Yes! BM25 is high

Re: [Scikit-learn-general] delta idf and bm25

2014-08-23 Thread Joel Nothman
I agree with Vlad that delta-IDF is interesting; but it is not well supported by the community, and I'm not sure it is worth including ... yet. As Lars points out (and as you suggest), there are other ways to supervise feature weighting. I agree this has to be a separate transformer

Re: [Scikit-learn-general] delta idf and bm25

2014-08-23 Thread Gael Varoquaux
Hey there, Interesting discussion. Of course, the danger here is that it might be borderline for the scope of scikit-learn. In case somebody is going to do a PR on these topics, I would advise to work on the docstring and narrative documentation to explain well why this can be useful not

Re: [Scikit-learn-general] delta idf and bm25

2014-08-23 Thread Lars Buitinck
2014-08-23 20:41 GMT+02:00 Gael Varoquaux gael.varoqu...@normalesup.org: Interesting discussion. Of course, the danger here is that it might be borderline for the scope of scikit-learn. In case somebody is going to do a PR on these topics, I would advise to work on the docstring and