Quite an interesting question indeed.

On Thu, Sep 23, 2010 at 2:05 AM, Stanley Ipkiss <[email protected]> wrote:
> On the other hand when using user based collaborative filtering, you always
> use the same neighborhood set, irrespective of whether that user has rated
> the particular item or not. You check for this in the code, but your

What is "the particular item" here?
In user-based recommendation, you pick your neighborhood, then find
all items connected to users in that neighborhood, and do a weighted
average over users in the neighborhood to estimate a preference for
each of those items.


> item-item cf -
> r(user, item_i) = SUM_(all j rated by user) s(item_i, item_j) * r(user, item_j)

Correct (except perhaps for the normalization -- dividing by the sum
of the similarity weights)
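Concretely, that estimate with the normalization might look like the
following -- a minimal sketch only; the function name, data layout, and
similarity function are all made up for illustration:

```python
def estimate_item_based(user_ratings, item_similarity, target_item):
    """Estimate one user's preference for target_item.

    user_ratings: {item_id: rating} for that one user.
    item_similarity: function (item_a, item_b) -> float.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for item_j, rating in user_ratings.items():
        s = item_similarity(target_item, item_j)
        weighted_sum += s * rating
        weight_total += abs(s)
    if weight_total == 0.0:
        return None  # no rated items to estimate from
    return weighted_sum / weight_total
```

The candidate items fall out implicitly: you only ever score items
similar to something the user has already rated.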


> user-user cf -
> r(user_i, item) = SUM_(all j in user_i neighborhood) s(user_i, user_j) * r(user_j, item)

Yes, same.
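And the user-based counterpart, again normalized by the sum of
weights. Note how a neighbor who never rated the item simply
contributes nothing -- the neighborhood is fixed regardless of the
item. Names and data layout are illustrative only:

```python
def estimate_user_based(neighborhood, user_similarity, ratings,
                        user_i, item):
    """Estimate user_i's preference for item from a fixed neighborhood.

    neighborhood: iterable of user ids near user_i.
    ratings: {(user_id, item_id): rating}; missing pairs are unrated.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for user_j in neighborhood:
        r = ratings.get((user_j, item))
        if r is None:
            continue  # this neighbor never rated the item
        s = user_similarity(user_i, user_j)
        weighted_sum += s * r
        weight_total += abs(s)
    if weight_total == 0.0:
        return None  # nobody in the neighborhood rated it
    return weighted_sum / weight_total
```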


> Why don't we use the same approach in user to user cf? Why not try to
> retrieve those users who have rated the particular item in question and then
> either take the k closest ones or based on some threshold? Or, why not use
> the same user to user logic in item-item cf?

A dumb answer is that this is simply not how these canonical
algorithms are described, and the idea was to provide an
implementation of the stock algorithms.

A more intelligent answer is that this wouldn't scale as well. If I
understand you, by tweaking it that way, you lose the implicit
definition of the set of candidate items. The "SUM" becomes over all
items, not those in the neighborhood. I believe that's the essential
motivation.

I am not as sure about the inverse proposal, but I believe there is
a similar issue... OK, in an item-based recommender, I start with a
fixed neighborhood of similar items around each item and... wait,
which item am I starting from? The input is a user, not an item.

>
> I know that you can argue users tend to be similar because of their general
> taste in items. And this similarity is not much determined by individual
> similar items. So, it makes more sense to use a constant neighborhood for
> users, irrespective of which item we are trying to rate. But, the same logic
> can be used when it comes to item-item cf. The items inherent features are
> determined by how many users rate it together, and blah blah..

I don't know if that's the argument. It's because the neighborhood
provides a nice, bounded subset of the universe to compute over in the
outermost loop of the algorithm.

These two approaches are not quite symmetrical and it's a subtle and
interesting notion.

I'd argue there's no point in such symmetry. If you want to use
exactly item-based recommendation but substitute "user" for "item",
then you can easily do so already by transposing your input data (swap
user, item ID). You are actually solving a different problem then --
recommend users to items.
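Transposing the input really is just swapping the first two fields of
every preference -- the same algorithm then recommends users to items.
A toy illustration (data made up):

```python
# Each preference is a (user_id, item_id, value) triple.
prefs = [("alice", "matrix", 5.0),
         ("bob", "matrix", 4.0),
         ("alice", "alien", 3.0)]

# Swap user and item IDs; "items" are now people being recommended.
transposed = [(item, user, value) for (user, item, value) in prefs]
```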

And as above, if you try to swap them but solve the same problem, I
think the details of the algorithm may fall apart -- either don't make
sense or don't scale.

One tangential reason there is asymmetry is that item-item
similarities tend to be stable and converge. That is, two movies will
never change what movie they are. However users can change, or at
least, an understanding of a user can evolve rapidly as more
information is gained. So item-item similarity lends itself more to
precomputation.
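To illustrate that, precomputing could be as simple as the sketch
below -- a made-up cosine similarity over co-rated users, computed once
for every item pair and reused across many recommendations. All names
are illustrative:

```python
from itertools import combinations
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two rating vectors, over co-rated users.

    a, b: {user_id: rating}.
    """
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = sqrt(sum(a[u] ** 2 for u in common))
    norm_b = sqrt(sum(b[u] ** 2 for u in common))
    return dot / (norm_a * norm_b)

def precompute_similarities(item_vectors):
    """item_vectors: {item_id: {user_id: rating}} -> {(i, j): sim}."""
    return {(i, j): cosine(item_vectors[i], item_vectors[j])
            for i, j in combinations(sorted(item_vectors), 2)}
```

Nothing comparable is safe to cache for user-user similarities, since
they shift as new ratings arrive.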
