Re: Sparse data & Item Similarity

Chris Schilling Wed, 16 Feb 2011 14:32:23 -0800

Matthew,

I was running into a similar issue with my data. I discussed it with Sean Owen
offline, and his advice was, in a nutshell, to use the log-likelihood similarity
metric. Since you describe your users as having only links, I assume you are
not dealing with preference data. With boolean data like that, the
log-likelihood metric works very well (it did in my case, and I am also dealing
with very sparse data). How do your results look if you try the log-likelihood
approach?
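
A minimal Python sketch of the log-likelihood idea, under stated assumptions:
the helper names and data are hypothetical, the statistic is Dunning's
log-likelihood ratio computed on a 2x2 table of user counts, and the final
1 - 1/(1 + LLR) mapping onto [0, 1) is an assumption about how a library might
turn the ratio into a similarity, not a quote of Mahout's code. Unlike a
distance over preference values, it only uses whether users touched each item,
and it compares the observed overlap with what chance would predict given each
item's popularity and the total number of users.

```python
import math


def x_log_x(x: int) -> float:
    # x * ln(x), using the convention 0 * ln(0) = 0
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts: int) -> float:
    # Unnormalized Shannon entropy: N ln N - sum(k ln k), where N = sum(counts)
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    # Dunning's G^2 statistic for a 2x2 contingency table of users:
    #   k11 = linked to both items, k12 = item A only,
    #   k21 = item B only,          k22 = neither item
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off
    return max(0.0, 2.0 * (row + col - mat))


def llr_similarity(users_a: set, users_b: set, num_users: int) -> float:
    # Boolean data only: which users touched each item, no preference values
    both = len(users_a & users_b)
    only_a = len(users_a) - both
    only_b = len(users_b) - both
    neither = num_users - both - only_a - only_b
    llr = log_likelihood_ratio(both, only_a, only_b, neither)
    # Assumed mapping of LLR in [0, inf) onto a similarity in [0, 1)
    return 1.0 - 1.0 / (1.0 + llr)


# Hypothetical data: 1000 users, two items linked to the same three users
print(llr_similarity({1, 2, 3}, {1, 2, 3}, 1000))
# Two items whose overlap is exactly what chance predicts (25 of 100 users)
print(llr_similarity(set(range(50)), set(range(0, 100, 2)), 100))
```

The first pair scores close to 1, while the second, whose overlap carries no
information beyond item popularity, scores essentially 0.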


Hope this helps,
Chris

 
On Feb 16, 2011, at 2:24 PM, Matthew Runo wrote:

> Hello folks -
> 
> (I think) I'm running into an issue with my user data being too
> sparse for my item-item similarity calculations. A typical item_id in
> my data might have about 2000 links to other items, but very few
> "combinations" of users have viewed the same products.
> 
> For example, we have two items, 1244 and 2319, and there are only
> three users in common between them.
> 
> So there are only those three users who viewed both items. I'm
> assigning preferences to different types of actions in my data, and
> since all three users did the same action towards the items, they have
> the same preference value. Maybe I just need to start with a bigger
> set of data to get more links between items in different "actions" in
> order to spread out the generated similarities? I'm using
> EuclideanDistanceSimilarity to do the final computation.
> 
> I think this is leading to a huge number of "1" values being returned.
> Nearly 72% of my item-item similarities are 1.0. I feel that this is
> invalid, but I'm not quite sure of the best way to attack it.
> 
> There are some similarities of 1 where the items do not appear to be
> similar at all, and the best explanation I've been able to come up with
> is that only one user had a link between them, so that one user's
> preferences alone determined the similarity.
> 
> How many item-user-item combinations per item pair does it take to get
> good output?
> 
> Sorry if I'm not quite describing my problem in the proper terms.
> 
> --Matthew Runo
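
To see why so many pairs come out at exactly 1.0, here is a minimal Python
sketch. It is a hypothetical helper, not Mahout's EuclideanDistanceSimilarity
source, and it assumes a 1 / (1 + d) mapping from distance to similarity. Under
that mapping, once every co-rating user has the same preference value for both
items the distance is zero and the similarity saturates at 1.0; users outside
the overlap (the hypothetical u4 and u9 below) have no effect at all.

```python
import math


def euclidean_similarity(prefs_a: dict, prefs_b: dict) -> float:
    # prefs_a / prefs_b map user ID -> preference value for one item.
    # Only users who have a preference for BOTH items are compared.
    common = prefs_a.keys() & prefs_b.keys()
    if not common:
        return float("nan")  # no overlap: treated as undefined here
    dist = math.sqrt(sum((prefs_a[u] - prefs_b[u]) ** 2 for u in common))
    # Assumed mapping of distance onto similarity: 1 / (1 + d)
    return 1.0 / (1.0 + dist)


# Hypothetical preferences: three shared users, all with the same action value
item_1244 = {"u1": 3.0, "u2": 3.0, "u3": 3.0, "u4": 5.0}
item_2319 = {"u1": 3.0, "u2": 3.0, "u3": 3.0, "u9": 1.0}
print(euclidean_similarity(item_1244, item_2319))  # 1.0: distance is zero

# A single shared user with the same value also yields 1.0
print(euclidean_similarity({"u7": 2.0}, {"u7": 2.0}))
```

This is why a handful of identical co-occurrences, or even a single one, is
enough to produce a similarity of 1.0 under this kind of metric.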
