Hi Lars,
Thanks for the gist. For my part, I will look into it and propose some
ideas as soon as possible.
Cheers,
Pavel S.
On Thu, Oct 9, 2014 at 1:22 PM, Lars Buitinck wrote:
> 2014-08-23 21:25 GMT+02:00 Lars Buitinck:
> > I was just implementing tf-chi2 today (I have a text classificati
2014-08-23 21:25 GMT+02:00 Lars Buitinck:
> I was just implementing tf-chi2 today (I have a text classification
> task to improve anyway), so I might send a PR somewhere over the next
> week to at least establish the API. Supervised term weighting is
> pretty big, with hundreds of citations for th
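For readers following the thread, here is a minimal sketch of what tf-chi2 weighting could look like. It is not the code from the PR mentioned above. It assumes raw term counts from CountVectorizer and scores terms with sklearn.feature_selection.chi2; the class name TfChi2Transformer and its weights_ attribute are hypothetical, chosen only for illustration.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import chi2


class TfChi2Transformer(BaseEstimator, TransformerMixin):
    """Hypothetical supervised term weighting: tf scaled by a chi2 score."""

    def fit(self, X, y):
        # chi2 returns (scores, p-values); only the per-term scores are used.
        scores, _ = chi2(X, y)
        # Terms that never occur can yield NaN; treat them as weight 0.
        self.weights_ = np.nan_to_num(scores)
        return self

    def transform(self, X):
        # Scale each column (term) of the count matrix by its chi2 weight.
        return sp.csr_matrix(X) @ sp.diags(self.weights_)


docs = [
    "good great film",
    "great acting good plot",
    "bad boring film",
    "boring bad plot",
]
y = [1, 1, 0, 0]

vec = CountVectorizer()
X = vec.fit_transform(docs)            # raw term frequencies
tfchi2 = TfChi2Transformer().fit(X, y)
X_weighted = tfchi2.transform(X)       # same shape as X, columns rescaled
```

Unlike IDF, the weights here depend on the labels, so fit needs y. That is one reason a separate transformer seems a more natural fit than an extra option on the existing unsupervised one.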
Greetings all!
Sorry for the delay. Thanks for all your answers!
Vlad, regarding the multi-class (or multi-label) options, as far as I
know, there is not a lot of documentation on the topic. I once
implemented the multi-class option for the project I was working on, in an
OvR fashion. In th
2014-08-23 17:06 GMT+02:00 Lars Buitinck:
> [3] This paper from a guy at HP Research that I cannot find right now.
Found it: Forman et al., Feature Shaping for Linear SVM Classifiers,
http://www.hpl.hp.com/techreports/2009/HPL-2009-31R1.pdf (SIGKDD
2009).
2014-08-23 20:41 GMT+02:00 Gael Varoquaux:
> Interesting discussion. Of course, the danger here is that it might be
> borderline for the scope of scikit-learn. In case somebody is going to
> do a PR on these topics, I would advise to work on the docstring
> and narrative documentation to
Hey there,
Interesting discussion. Of course, the danger here is that it might be
borderline for the scope of scikit-learn. In case somebody is going to
do a PR on these topics, I would advise to work on the docstring
and narrative documentation to explain well why this can be useful
not
I agree with Vlad that delta-IDF is interesting; but it is not well
supported by the community, and I'm not sure it is worth including ... yet.
As Lars points out (and as you suggest), there are other ways to supervise
feature weighting. I agree this has to be a separate transformer
(SupervisedTFID
2014-08-23 15:44 GMT+02:00 Pavel Soriano:
> I don't know if this would be helpful to anybody or if this was already
> discussed. That is why I am asking whether it is worth a pull request.
> Gist URL:
> https://gist.github.com/psorianom/0b9d8a742fe0efe0fe82
Yes! BM25 is high on my wishlist. I
Hi Pavel,
First of all, this is an interesting subject, thanks for bringing it
up! I fear that it's too domain-specific to go very deep in this
direction.
That being said, and trying to interpret your benchmarks, it seems
that Delta-idf might actually be interesting.
Or, more generally, the idea o