Here is a paper that includes an analysis of voting patterns using LDA.
http://arxiv.org/pdf/math/0604410.pdf
On Tue, Sep 30, 2014 at 7:04 PM, Parimi Rohit rohit.par...@gmail.com
wrote:
Ted,
I know LDA can be used to model text data, but I have never used it in this
setting. Can you please give
Hey guys,
I think it is fair to give you some feedback.
I managed to implement BM25+ (http://en.wikipedia.org/wiki/Okapi_BM25) term
scoring on Mahout.
It was straightforward using the current TFIDF implementation as an example.
Basically what I did was implement the interface
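For readers following along, here is a minimal sketch of a per-term BM25+ weight, based on the formula on the Wikipedia page linked above. It is not the Mahout code discussed in this thread. The function name, the parameter names, and the defaults (k1=1.2, b=0.75, delta=1.0) are assumptions chosen for illustration.

```python
import math

def bm25_plus_weight(tf, df, num_docs, doc_len, avg_doc_len,
                     k1=1.2, b=0.75, delta=1.0):
    """Illustrative BM25+ weight of one term in one document.

    tf: term frequency in the document
    df: number of documents containing the term
    num_docs: total number of documents
    doc_len, avg_doc_len: document length and average document length
    k1, b, delta: free parameters (assumed defaults, tune for your corpus)
    """
    if tf <= 0:
        return 0.0
    # IDF variant that stays positive, as given on the Wikipedia page.
    idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
    # Length normalisation: longer-than-average documents are penalised.
    norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # BM25+ adds delta so a matching term never scores near zero
    # just because the document is long.
    return idf * ((tf * (k1 + 1.0)) / (tf + k1 * norm) + delta)
```

Unlike TFIDF, the weight saturates as tf grows, so repeating a term many times adds progressively less.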
How did you implement BM25PartialVectorReducer and BM25Converter? The
present implementations of TFIDFConverter and Reducer are MR.
Mahout is not accepting any new MapReduce code.
On Wed, Oct 1, 2014 at 7:18 AM, Arian Pasquali ar...@arianpasquali.com
wrote:
Hey guys,
I think it is fair to give
Thanks so much for the feedback. Glad to hear it was straightforward.
But the important question is
how did BM25 work for you?
On Wed, Oct 1, 2014 at 6:18 AM, Arian Pasquali ar...@arianpasquali.com
wrote:
Hey guys,
I think it is fair to give you some feedback.
I managed to
Hi Ted,
My dataset is a collection of documents in German and I can say that the
scores seem better compared to my TFIDF scores. Results make more sense
now, especially my bi-grams.
Arian Pasquali
http://about.me/arianpasquali
2014-10-01 13:09 GMT+01:00 Ted Dunning ted.dunn...@gmail.com:
Thanks Ted! Will look into it.
Rohit
On Wed, Oct 1, 2014 at 1:04 AM, Ted Dunning ted.dunn...@gmail.com wrote:
Here is a paper that includes an analysis of voting patterns using LDA.
http://arxiv.org/pdf/math/0604410.pdf
On Tue, Sep 30, 2014 at 7:04 PM, Parimi Rohit rohit.par...@gmail.com
Yes Suneel,
Indeed, it is in MR fashion.
What exactly did you mean when you said Mahout is not accepting any new
MapReduce code?
Do you mean for submitting a patch?
I'm sure there might be better ways to implement it, but I'm more
interested in the results right now.
What would be your
On Wed, Oct 1, 2014 at 7:52 AM, Arian Pasquali ar...@arianpasquali.com
wrote:
My dataset is a collection of documents in German and I can say that the
scores seem better compared to my TFIDF scores. Results make more sense
now, especially my bi-grams.
OK.
I will take note.
First, I agree with Ted that LLR is better. I've tried all of the similarity
methods in Mahout on exactly the same dataset and got far higher
cross-validation scores for LLR. You may still use Pearson with Mahout 0.9 and
1.0, but it is not supported in the Mahout 1.0 Spark jobs.
If you have
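As background on the LLR score mentioned above, here is a minimal sketch of the log-likelihood ratio (G^2) test on a 2x2 co-occurrence table. It is written from the standard definition of the statistic and is not copied from Mahout. The function names and the k11..k22 argument names are assumptions for illustration.

```python
import math

def x_log_x(x):
    # By convention 0 * log(0) is taken as 0.
    return 0.0 if x == 0 else x * math.log(x)

def unnormalized_entropy(*counts):
    # sum(counts) * log(sum(counts)) - sum(c * log(c) for each count c)
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table.

    k11: times both events occurred together
    k12: times event A occurred without B
    k21: times event B occurred without A
    k22: times neither occurred
    """
    row = unnormalized_entropy(k11 + k12, k21 + k22)
    col = unnormalized_entropy(k11 + k21, k12 + k22)
    mat = unnormalized_entropy(k11, k12, k21, k22)
    # Clamp at zero to absorb floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))
```

A score near zero means the two events co-occur about as often as chance predicts. Larger scores mean the co-occurrence is more surprising, which is what makes LLR useful as a similarity signal.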
Hi Pat,
Please correct me if I am wrong. If we take table 2 (user2), then he rated
vendor 1 through vendor 3.
1. I am going to assign each user an ID from 1 to N.
2. Vendors will have the IDs 601, 602, 603.
3. Services will have the IDs 501, 502, 503.
4. If I translate
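The ID scheme in the list above can be sketched as follows. This is only an illustration: the helper name assign_ids and the sample user, vendor, and service names are hypothetical. One thing the sketch makes visible is that fixed starting offsets collide once a group outgrows its range (for example more than 100 services, or 501 or more users), so it checks for overlap.

```python
def assign_ids(names, start):
    """Map each name to a consecutive integer ID beginning at `start`."""
    return {name: start + i for i, name in enumerate(names)}

def build_id_maps(users, vendors, services):
    # Offsets follow the scheme in the message: users from 1,
    # services from 501, vendors from 601.
    user_ids = assign_ids(users, 1)
    service_ids = assign_ids(services, 501)
    vendor_ids = assign_ids(vendors, 601)

    # Guard against ID ranges running into each other.
    all_ids = (list(user_ids.values()) + list(service_ids.values())
               + list(vendor_ids.values()))
    if len(all_ids) != len(set(all_ids)):
        raise ValueError("ID ranges overlap; choose larger offsets")
    return user_ids, vendor_ids, service_ids

# Hypothetical sample data, for illustration only.
user_ids, vendor_ids, service_ids = build_id_maps(
    ["user1", "user2"],
    ["vendor1", "vendor2", "vendor3"],
    ["service1", "service2", "service3"],
)
```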