Re: word weights using BM25

2014-10-02 Thread Pat Ferrel
We are moving to higher-performance platforms than Hadoop MapReduce, such as Spark. You can still write map/reduce-style code, but Mahout is not taking new Hadoop MR code. On Oct 1, 2014, at 6:30 AM, Arian Pasquali ar...@arianpasquali.com wrote: Yes Suneel, indeed it is in MR fashion. What exactly do

Re: word weights using BM25

2014-10-01 Thread Arian Pasquali
Hey guys, I think it is fair to give you some feedback. I managed to implement the BM25+ (http://en.wikipedia.org/wiki/Okapi_BM25) term score on Mahout. It was straightforward, using the current TFIDF implementation as an example. Basically, what I did was implement the interface
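For reference, one common way to write the BM25+ weight of a term t in a document D, following the Wikipedia page linked above: f(t,D) is the term count in D, |D| the document length, avgdl the average document length over the corpus, N the number of documents and n(t) the document frequency of t. k1 and b are free parameters (k1 between 1.2 and 2.0 and b = 0.75 are common defaults), and δ (often 1.0) is the BM25+ lower bound for terms present in D; setting δ = 0 recovers plain BM25. This is a sketch of the standard formula with one common IDF choice, not necessarily the exact variant implemented in the message above.

```latex
w(t, D) = \mathrm{IDF}(t)\left[\frac{f(t,D)\,(k_1+1)}{f(t,D) + k_1\left(1 - b + b\,\dfrac{|D|}{\mathrm{avgdl}}\right)} + \delta\right],
\qquad
\mathrm{IDF}(t) = \ln\!\left(\frac{N - n(t) + 0.5}{n(t) + 0.5} + 1\right)
```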

Re: word weights using BM25

2014-10-01 Thread Suneel Marthi
How did you implement BM25PartialVectorReducer and BM25Converter? The present implementations of TFIDFConverter and the reducer are MR, and Mahout is not accepting any new MapReduce code. On Wed, Oct 1, 2014 at 7:18 AM, Arian Pasquali ar...@arianpasquali.com wrote: Hey guys, I think it is fair to give

Re: word weights using BM25

2014-10-01 Thread Ted Dunning
Thanks so much for the feedback. Glad to hear it was straightforward. But the important question is how did BM25 work for you? On Wed, Oct 1, 2014 at 6:18 AM, Arian Pasquali ar...@arianpasquali.com wrote: Hey guys, I think it is fair to give you some feedback. I managed to

Re: word weights using BM25

2014-10-01 Thread Arian Pasquali
Hi Ted, my dataset is a collection of documents in German, and I can say that the scores seem better than my TFIDF scores. The results make more sense now, especially for my bigrams. Arian Pasquali http://about.me/arianpasquali 2014-10-01 13:09 GMT+01:00 Ted Dunning ted.dunn...@gmail.com:
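Whether BM25 works better is ultimately an empirical question, but one general difference helps explain why the two weightings rank terms differently: BM25's term-frequency component saturates toward k1 + 1 and is normalized by document length, while the classic tf-idf factor sqrt(tf) keeps growing. The sketch below (Python, assuming the common defaults k1 = 1.2 and b = 0.75, with the document length set equal to the average) prints both factors side by side. It illustrates the formulas only; it is not an analysis of the German corpus or the bigram results reported above.

```python
import math


def tfidf_tf_part(tf):
    # Classic Lucene-style tf factor: grows without bound as tf increases.
    return math.sqrt(tf)


def bm25_tf_part(tf, k1=1.2, b=0.75, length=100, avg_len=100):
    # BM25 tf factor: saturates toward k1 + 1, scaled by relative document length.
    norm = k1 * (1 - b + b * length / avg_len)
    return tf * (k1 + 1) / (tf + norm)


for tf in (1, 2, 5, 10, 50, 100):
    print(f"tf={tf:>3}  sqrt(tf)={tfidf_tf_part(tf):6.2f}  bm25={bm25_tf_part(tf):5.2f}")
```

With these defaults a term seen 100 times gets less than 2.2 times the BM25 tf factor of a term seen once, whereas sqrt(tf) gives it 10 times as much.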

Re: word weights using BM25

2014-10-01 Thread Arian Pasquali
Yes Suneel, indeed it is in MR fashion. What exactly do you mean when you say Mahout is not accepting any new MapReduce code? Do you mean when submitting a patch? I'm sure there are better ways to implement it, but I'm more interested in the results right now. What would be your

Re: word weights using BM25

2014-10-01 Thread Ted Dunning
On Wed, Oct 1, 2014 at 7:52 AM, Arian Pasquali ar...@arianpasquali.com wrote: My dataset is a collection of documents in German, and I can say that the scores seem better than my TFIDF scores. The results make more sense now, especially for my bigrams. OK. I will take note.

Re: word weights using BM25

2014-09-24 Thread Arian Pasquali
Yes, I'm studying his work (http://nlp.uned.es/~jperezi/Lucene-BM25/) and Mahout's current TF-IDF code, trying to understand how I would port that to MR. I'll try to share something if I succeed. Arian Pasquali http://about.me/arianpasquali 2014-09-24 5:12 GMT+01:00 Suneel Marthi
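Porting BM25 to a map/reduce job mostly comes down to gathering corpus statistics first. BM25 needs the same document frequencies as tf-idf plus one extra statistic, the average document length, so a first map/reduce pass can emit both, and a second, map-only pass can turn each document's term counts into weights. The sketch below simulates the two passes with plain Python generators; it uses no Hadoop or Mahout APIs, and the function names and the toy German corpus are illustrative assumptions. The same mapper/reducer shape would carry over to Spark's map/reduceByKey style as well.

```python
import math
from collections import Counter


def map_doc(tokens):
    # Pass 1 mapper: one ("df", term) per distinct term, plus length and doc count.
    for term in set(tokens):
        yield ("df", term), 1
    yield ("len", None), len(tokens)
    yield ("docs", None), 1


def reduce_counts(pairs):
    # Pass 1 reducer: sum the values emitted for each key.
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def bm25_vector(tokens, stats, k1=1.2, b=0.75):
    # Pass 2 (map-only): turn one document's term counts into BM25 weights.
    num_docs = stats[("docs", None)]
    avg_len = stats[("len", None)] / num_docs
    norm = k1 * (1 - b + b * len(tokens) / avg_len)
    vector = {}
    for term, tf in Counter(tokens).items():
        df = stats[("df", term)]
        idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1)
        vector[term] = idf * tf * (k1 + 1) / (tf + norm)
    return vector


corpus = [
    "der hund bellt".split(),
    "der hund und die katze".split(),
    "die katze miaut".split(),
]
stats = reduce_counts(pair for doc in corpus for pair in map_doc(doc))
vectors = [bm25_vector(doc, stats) for doc in corpus]
```

In the first document, "bellt" (in one document) ends up weighted above "der" (in two), since both occur once there and only the IDF differs.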

Re: word weights using BM25

2014-09-24 Thread Marko
Hello everyone, I'm very sorry to barge in like this. I have been added to the mailing list (I think), but it seems that I'm somehow unable to ask a question: I have asked a question several times and got no answer. I hope this way will work. I'm new to Mahout and I've been struggling with

Re: word weights using BM25

2014-09-24 Thread Suneel Marthi
@Marko, Subject: Streaming KMeans. See http://stackoverflow.com/questions/17272296/how-to-use-mahout-streaming-k-means/18090471#18090471 for how to invoke Streaming KMeans. Also look at examples/bin/cluster-reuters.sh for the Streaming KMeans option. On Wed, Sep 24, 2014 at 11:34 AM, Marko

Re: word weights using BM25

2014-09-24 Thread Ted Dunning
Marko, Suneel's answer is much better than mine. On Wed, Sep 24, 2014 at 10:10 PM, Suneel Marthi suneel.mar...@gmail.com wrote: @Marko, Subject: Streaming KMeans See http://stackoverflow.com/questions/17272296/how-to-use-mahout-streaming-k-means/18090471#18090471 for how to invoke

word weights using BM25

2014-09-23 Thread Arian Pasquali
Hi, I was wondering whether it would be possible to support BM25 term weighting by extending Mahout's tf-idf implementation. Has anyone here already tried to do so? If not, what would you suggest for such an implementation in Mahout? Arian Pasquali
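As far as I recall, Mahout's vectorizer computes per-term weights through a small interface that takes the term frequency, document frequency, document length and number of documents; the Python sketch below mirrors that shape, but the class and method names are hypothetical stand-ins, not Mahout's API. The tf-idf baseline assumes the classic Lucene formula sqrt(tf) * (1 + ln(N / (df + 1))). The main design point it shows: BM25 also needs the corpus-wide average document length, which such a per-term signature does not carry, so it has to be computed beforehand and passed in (here via the constructor).

```python
import math
from abc import ABC, abstractmethod


class Weight(ABC):
    """Hypothetical stand-in for a tf/df-based term-weighting interface."""

    @abstractmethod
    def calculate(self, tf: int, df: int, length: int, num_docs: int) -> float:
        ...


class TFIDF(Weight):
    """Classic Lucene-style tf-idf (assumed baseline): sqrt(tf) * (1 + ln(N / (df + 1)))."""

    def calculate(self, tf, df, length, num_docs):
        return math.sqrt(tf) * (1.0 + math.log(num_docs / (df + 1.0)))


class BM25(Weight):
    """Okapi BM25 term weight; avg_len must be computed over the corpus beforehand."""

    def __init__(self, avg_len: float, k1: float = 1.2, b: float = 0.75, delta: float = 0.0):
        self.avg_len = avg_len
        self.k1 = k1
        self.b = b
        self.delta = delta  # delta > 0 gives the BM25+ variant

    def calculate(self, tf, df, length, num_docs):
        if tf == 0:
            return 0.0  # absent terms get no weight, even under BM25+
        idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
        norm = self.k1 * (1.0 - self.b + self.b * length / self.avg_len)
        return idf * (tf * (self.k1 + 1.0) / (tf + norm) + self.delta)
```

For example, `BM25(avg_len=100.0).calculate(3, 10, 120, 1000)` weights a term that occurs 3 times in a 120-token document and in 10 of 1,000 documents; passing `delta=1.0` gives BM25+.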

Re: word weights using BM25

2014-09-23 Thread Ted Dunning
Should be pretty easy. I haven't heard of anyone doing it. On Sep 23, 2014, at 18:53, Arian Pasquali ar...@arianpasquali.com wrote: Hi, I was wondering if it would be possible to support BM25 term weighting extending Mahout's tf-idf implementation. I was curious to

Re: word weights using BM25

2014-09-23 Thread Suneel Marthi
Lucene 4.x supports Okapi BM25, so it should be easy to implement. On Tue, Sep 23, 2014 at 11:57 PM, Ted Dunning ted.dunn...@gmail.com wrote: Should be pretty easy. I haven't heard of anyone doing it. On Sep 23, 2014, at 18:53, Arian Pasquali ar...@arianpasquali.com
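Lucene's BM25Similarity documents its IDF as log(1 + (N - df + 0.5) / (df + 0.5)), with defaults k1 = 1.2 and b = 0.75. The "+1 inside the log" is worth keeping when porting: the classic Robertson/Sparck Jones form without it turns negative for terms that occur in more than half of the documents, which would give very common terms negative weights in a document vector. The Python sketch below compares the two forms; the function names are illustrative.

```python
import math


def idf_robertson(df, num_docs):
    # Classic Robertson/Sparck Jones form: negative once df > num_docs / 2.
    return math.log((num_docs - df + 0.5) / (df + 0.5))


def idf_plus_one(df, num_docs):
    # "+1 inside the log" form, as documented for Lucene's BM25Similarity: always > 0.
    return math.log(1 + (num_docs - df + 0.5) / (df + 0.5))


for df in (1, 10, 500, 900):
    print(df, round(idf_robertson(df, 1000), 3), round(idf_plus_one(df, 1000), 3))
```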