Ted,

Thanks for your response. Here is the information about the approach and
the datasets:

I am using ItemSimilarityJob and passing it (itemID, userID, prefCount)
tuples as input, i.e., with the user and item columns swapped, so that the
job computes user-user similarity using LLR. I picked up this approach from
an answer to a Stack Overflow question on calculating user similarity with
Mahout.
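To make the column swap concrete, here is a minimal sketch of the
transposition step (the file contents and IDs below are hypothetical, not
my actual data): ItemSimilarityJob normally reads (userID, itemID, count)
triples, so feeding it (itemID, userID, count) makes it treat the users as
"items" and emit user-user similarities.

```python
def transpose_prefs(lines):
    """Turn 'userID,itemID,count' lines into 'itemID,userID,count' lines,
    so ItemSimilarityJob scores similarity over users instead of items."""
    out = []
    for line in lines:
        user, item, count = line.strip().split(",")
        out.append(f"{item},{user},{count}")
    return out

print(transpose_prefs(["u1,i1,3", "u1,i2,1", "u2,i1,2"]))
# → ['i1,u1,3', 'i2,u1,1', 'i1,u2,2']
```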


Following are the stats for the datasets:

Coauthor dataset:

users = 29189
items = 140091
averageItemsClicked = 15.808660796875536

Conference dataset:

users = 29189
items = 2393
averageItemsClicked = 7.265099866388023

Reference dataset:

users = 29189
items = 201570
averageItemsClicked = 61.08564870327863

By scale, did you mean the rating scale? If so, I am using preference
counts, not ratings.
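For reference, the score behind the LLR similarity can be sketched as the
standard G-test on the 2x2 co-occurrence table between two users. This is a
minimal sketch of the statistic, not Mahout's actual implementation; here
k11 = items both users preferred, k12/k21 = items only one of them
preferred, k22 = items neither preferred.

```python
from math import log

def llr(k11, k12, k21, k22):
    """G-test (log-likelihood ratio) for a 2x2 contingency table.
    Returns 0 when the observed counts match independence exactly."""
    n = k11 + k12 + k21 + k22
    row1, row2 = k11 + k12, k21 + k22
    col1, col2 = k11 + k21, k12 + k22
    g = 0.0
    for obs, exp in [(k11, row1 * col1 / n), (k12, row1 * col2 / n),
                     (k21, row2 * col1 / n), (k22, row2 * col2 / n)]:
        if obs > 0:
            # Each cell contributes obs * ln(observed / expected).
            g += obs * log(obs / exp)
    return 2.0 * g
```

Note that the statistic depends only on the co-occurrence counts, which is
consistent with my understanding that LLR ignores the preference magnitudes.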

Thanks,
Rohit


On Tue, Sep 30, 2014 at 12:08 AM, Ted Dunning <[email protected]> wrote:

> How are you using LLR to compute user similarity? It is normally used to
> compute item similarity.
>
> Also, what is your scale? How many users? How many items? How many
> actions per user?
>
>
>
> On Mon, Sep 29, 2014 at 6:24 PM, Parimi Rohit <[email protected]>
> wrote:
>
> > Hi,
> >
> > I am exploring a random-walk based algorithm for recommender systems
> > which works by propagating the item preferences for users on the
> > user-user graph. To do this, I have to compute user-user similarity and
> > form a neighborhood. I have tried the following three simple techniques
> > to compute the score between two users and find the neighborhood:
> >
> > 1. Score = (items common to users A and B) / (items preferred by A +
> > items preferred by B)
> > 2. Scoring based on Mahout's Cosine Similarity
> > 3. Scoring based on Mahout's LogLikelihood similarity.
> >
> > My understanding is that similarity based on LogLikelihood is more
> > robust; however, I get better results using the naive approach
> > (technique 1 from the above list). The problems I am addressing are
> > collaborator recommendation, conference recommendation, and reference
> > recommendation, and the data has implicit feedback.
> >
> > So, my question is: are there any cases where the cosine similarity and
> > log-likelihood metrics fail to capture similarity? For example, in the
> > problems stated above, users only collaborate with a few other users
> > (based on area of interest), publish in only a few conferences (again
> > based on area of interest), and refer to publications in a specific
> > domain, so the preference counts are fairly small compared to other
> > domains (music/video etc.).
> >
> > Secondly, for CosineSimilarity, should I treat the preferences as
> > boolean or use the counts? (I think the log-likelihood metric does not
> > take the preference counts into account... correct me if I am wrong.)
> >
> > Any insight into this is much appreciated.
> >
> > Thanks,
> > Rohit
> >
> > p.s. Ted, Pat: I am following the discussion on the thread
> > "LogLikelihoodSimilarity Calculation" and your answers helped me a lot to
> > understand how it works and made me wonder why things are different in my
> > case.
> >
>
