Re: Cosine Similarity and LogLikelihood not helpful for implicit feedback!

2014-10-01 Thread Ted Dunning
Here is a paper that includes an analysis of voting patterns using LDA. http://arxiv.org/pdf/math/0604410.pdf On Tue, Sep 30, 2014 at 7:04 PM, Parimi Rohit rohit.par...@gmail.com wrote: Ted, I know LDA can be used to model text data but have never used it in this setting. Can you please give

Re: word weights using BM25

2014-10-01 Thread Arian Pasquali
Hey guys, I think it is fair to give you some feedback. I managed to implement the BM25+ (http://en.wikipedia.org/wiki/Okapi_BM25) term score in Mahout. It was straightforward using the current TFIDF implementation as an example. Basically what I did was implement the interface
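For readers unfamiliar with the term score mentioned above, here is a minimal sketch of a per-term BM25+ weight. It is not the code from this thread and not a Mahout API: the function name `bm25_plus_weight`, its parameter names, the defaults `k1=1.2`, `b=0.75`, `delta=1.0`, and the IDF variant `log((N + 1) / df)` are all assumptions chosen for illustration.

```python
import math


def bm25_plus_weight(tf, doc_len, avg_doc_len, num_docs, doc_freq,
                     k1=1.2, b=0.75, delta=1.0):
    """Hypothetical helper: BM25+ weight of one term in one document.

    tf          -- raw count of the term in the document
    doc_len     -- number of tokens in the document
    avg_doc_len -- average document length in the collection
    num_docs    -- number of documents in the collection
    doc_freq    -- number of documents containing the term
    """
    # A term that does not occur gets no weight (delta applies only to matches).
    if tf <= 0 or doc_freq <= 0:
        return 0.0
    # Assumed IDF variant; other BM25 formulations use a different IDF.
    idf = math.log((num_docs + 1) / doc_freq)
    # Length normalization: longer-than-average documents are penalized.
    norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # Saturating term-frequency component of BM25.
    tf_part = (tf * (k1 + 1.0)) / (tf + k1 * norm)
    # BM25+ adds the lower bound delta to the term-frequency component.
    return idf * (tf_part + delta)
```

Unlike plain TF-IDF, the term-frequency component saturates as `tf` grows, and `delta` keeps a matching term in a very long document from being scored close to zero.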

Re: word weights using BM25

2014-10-01 Thread Suneel Marthi
How did you implement BM25PartialVectorReducer and BM25Converter? The present implementations of TFIDFConverter and Reducer are MR. Mahout is not accepting any new MapReduce code. On Wed, Oct 1, 2014 at 7:18 AM, Arian Pasquali ar...@arianpasquali.com wrote: Hey guys, I think it is fair to give

Re: word weights using BM25

2014-10-01 Thread Ted Dunning
Thanks so much for the feedback. Glad to hear it was straightforward. But the important question is how did BM25 work for you? On Wed, Oct 1, 2014 at 6:18 AM, Arian Pasquali ar...@arianpasquali.com wrote: Hey guys, I think it is fair to give you some feedback. I managed to

Re: word weights using BM25

2014-10-01 Thread Arian Pasquali
Hi Ted, My dataset is a collection of documents in German and I can say that the scores seem better compared to my TFIDF scores. Results make more sense now, especially my bi-grams. Arian Pasquali http://about.me/arianpasquali 2014-10-01 13:09 GMT+01:00 Ted Dunning ted.dunn...@gmail.com:

Re: Cosine Similarity and LogLikelihood not helpful for implicit feedback!

2014-10-01 Thread Parimi Rohit
Thanks Ted! Will look into it. Rohit On Wed, Oct 1, 2014 at 1:04 AM, Ted Dunning ted.dunn...@gmail.com wrote: Here is a paper that includes an analysis of voting patterns using LDA. http://arxiv.org/pdf/math/0604410.pdf On Tue, Sep 30, 2014 at 7:04 PM, Parimi Rohit rohit.par...@gmail.com

Re: word weights using BM25

2014-10-01 Thread Arian Pasquali
Yes Suneel, indeed it is in MR fashion. What exactly do you mean when you say Mahout is not accepting any new MapReduce code? Do you mean for submitting a patch? I'm sure there might be better ways to implement it, but I'm more interested in the results right now. What would be your

Re: word weights using BM25

2014-10-01 Thread Ted Dunning
On Wed, Oct 1, 2014 at 7:52 AM, Arian Pasquali ar...@arianpasquali.com wrote: My dataset is a collection of documents in German and I can say that the scores seem better compared to my TFIDF scores. Results make more sense now, especially my bi-grams. OK. I will take note.

Re: how to get recommendations by using user-user correlation for the given table in this mail

2014-10-01 Thread Pat Ferrel
First, I agree with Ted that LLR is better. I've tried all of the similarity methods in Mahout on exactly the same dataset and got far higher cross-validation scores for LLR. You may still use Pearson with Mahout 0.9 and 1.0, but it is not supported in the Mahout 1.0 Spark jobs. If you have
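As background for the LLR comparison above, the following is a minimal sketch of the log-likelihood ratio (G²) statistic on a 2x2 co-occurrence table, written in the entropy form. The function names (`x_log_x`, `entropy`, `llr`) are hypothetical, and the reading of the counts is an assumption: `k11` is taken as the number of users who interacted with both items, `k12` and `k21` as users who interacted with only one of them, and `k22` as users who interacted with neither.

```python
import math


def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Hypothetical helper: G^2 statistic for a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))
```

A score near zero means the two items co-occur about as often as independence would predict; larger scores indicate co-occurrence that is less likely to be chance. Because only counts are used, no rating values are needed, which is why the statistic suits implicit feedback.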

Re: how to get recommendations by using user-user correlation for the given table in this mail

2014-10-01 Thread vinayakb malagatti
Hi Pat, If I am wrong plz correct me, if we take table 2 (user2) then he rated for vendor 1 - vendor 3, 1. I am going assign for each user an ID starting from 1 - N. 2. Vendors will have the ID with 601,602,603 3. Services will have the ID with 501,502,503. 4. If I translate