We are moving to higher-performance platforms than Hadoop MapReduce, such as Spark. You can still write map/reduce-style code, but Mahout is not taking new Hadoop MR code.

On Oct 1, 2014, at 6:30 AM, Arian Pasquali <[email protected]> wrote:

Yes Suneel,
Indeed, it is in MR fashion.

What exactly do you mean when you said Mahout is not accepting any new MapReduce code? Do you mean for submitting a patch?

I'm sure there might be better ways to implement it, but I'm more interested in the results right now.
What would be your suggestion?

best

Arian Pasquali
http://about.me/arianpasquali

2014-10-01 13:10 GMT+01:00 Suneel Marthi <[email protected]>:

> How did u implement BM25PartialVectorReducer and BM25Converter?? The
> present implementations for TFIDFConverter and Reducer are MR.
> Mahout is not accepting any new MapReduce code.
>
> On Wed, Oct 1, 2014 at 7:18 AM, Arian Pasquali <[email protected]>
> wrote:
>
>> Hey guys,
>> I think it is fair to give you some feedback.
>> I managed to implement BM25+ <http://en.wikipedia.org/wiki/Okapi_BM25>
>> term score on Mahout.
>> It was straightforward using the current TFIDF implementation as an
>> example.
>>
>> Basically what I did was implement the interface
>> org.apache.mahout.vectorizer.Weight, create a BM25Converter and
>> BM25PartialVectorReducer similar to TFIDFConverter
>> <https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/vectorizer/tfidf/TFIDFConverter.html>
>> and TFIDFPartialVectorReducer
>> <https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/vectorizer/tfidf/TFIDFPartialVectorReducer.html>
>> respectively.
>>
>> cheers
>> Arian
>>
>> Arian Pasquali
>> http://about.me/arianpasquali
>>
>> 2014-09-24 14:14 GMT+01:00 Arian Pasquali <[email protected]>:
>>
>>> Yes,
>>> I'm studying his work <http://nlp.uned.es/~jperezi/Lucene-BM25/> and the
>>> current mahout's tfidf code.
>>> Trying to understand how I would port that to mr.
>>> I'll try to share something if I succeed.
>>>
>>> Arian Pasquali
>>> http://about.me/arianpasquali
>>>
>>> 2014-09-24 5:12 GMT+01:00 Suneel Marthi <[email protected]>:
>>>
>>>> Lucene 4.x supports okapi-bm25. So it should be easy to implement.
>>>>
>>>> On Tue, Sep 23, 2014 at 11:57 PM, Ted Dunning <[email protected]>
>>>> wrote:
>>>>
>>>>> Should be pretty easy. I haven't heard of anyone doing it.
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>>> On Sep 23, 2014, at 18:53, Arian Pasquali <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>> I was wondering if it would be possible to support bm25 term weighting
>>>>>> extending Mahout's tf-idf implementation.
>>>>>>
>>>>>> I was curious to know if anyone here has already tried to do so.
>>>>>> If not, what would be your suggestion for such implementation on
>>>>>> Mahout?
>>>>>>
>>>>>>
>>>>>> Arian Pasquali
>>>>>> http://about.me/arianpasquali
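
For readers following the thread, here is a minimal standalone sketch of what a BM25 term weight in the spirit of org.apache.mahout.vectorizer.Weight might look like. It is not Arian's code and not Mahout code. The class name Bm25Weight, the constructor parameters, and the defaults k1 = 1.2 and b = 0.75 are assumptions for illustration. The calculate(tf, df, length, numDocs) shape is modeled on what the Weight interface is believed to expose, so check it against the Mahout javadoc before relying on it. The average document length is not part of that signature, so this sketch takes it in the constructor. The IDF uses the non-negative form ln(1 + (N - df + 0.5) / (df + 0.5)); BM25+ would additionally add a constant delta inside the parentheses, which is left out here.

```java
// Standalone sketch of a BM25 term weight. Hypothetical class, not part of Mahout.
final class Bm25Weight {
    private final double k1;            // term-frequency saturation (assumed default 1.2)
    private final double b;             // length-normalization strength (assumed default 0.75)
    private final double avgDocLength;  // average document length over the corpus

    Bm25Weight(double avgDocLength) {
        this(1.2, 0.75, avgDocLength);
    }

    Bm25Weight(double k1, double b, double avgDocLength) {
        if (avgDocLength <= 0.0) {
            throw new IllegalArgumentException("avgDocLength must be positive");
        }
        this.k1 = k1;
        this.b = b;
        this.avgDocLength = avgDocLength;
    }

    // tf: term frequency in the document, df: number of documents containing the term,
    // length: document length in terms, numDocs: number of documents in the corpus.
    double calculate(int tf, int df, int length, int numDocs) {
        if (tf <= 0) {
            return 0.0;
        }
        // Non-negative IDF variant.
        double idf = Math.log(1.0 + (numDocs - df + 0.5) / (df + 0.5));
        // Length normalization: 1.0 when the document has average length.
        double norm = 1.0 - b + b * (length / avgDocLength);
        return idf * (tf * (k1 + 1.0)) / (tf + k1 * norm);
    }

    public static void main(String[] args) {
        Bm25Weight weight = new Bm25Weight(100.0);
        System.out.println(weight.calculate(3, 5, 120, 1000));
    }
}
```

In an MR-style port like the one described above, the average document length would have to be computed in an earlier pass and handed to the reducer, for example through the job configuration; that wiring is outside this sketch.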
