Re: Any general performance tips for job RowSimilarityJob-CooccurrencesMapper-SimilarityReducer?

Sean Owen Tue, 18 Oct 2011 02:09:47 -0700

Nice question. I have answers I like.

Really, it would be better to find words that mean
thing-being-recommended-to and thing-being-recommended. I couldn't
find easy, general terms that were more intuitive than "user" and
"item". Even though these things need not be actual people or
products, and so are inaccurate terms, they connote the right sorts of
ways of thinking about what they are and how they work.

You could also say that since both can be anything, there should be at
most one term for both -- a thing or entity. I don't like this on the
same grounds: it makes things harder to think about in practice.
Is that "thingID" in the code the thing being recommended, or the
thing being recommended to...?

More importantly, I don't think users and items are entirely symmetric,
even though you could plug items in for users and vice versa. For
instance, one is 'causing' the ratings and the other isn't. It's
harder to make future predictions about the black-box source of new
surprising data. That is, I may learn something quite new about you in
your 1000th rating, when you rate your first classical music album
ever; the 1000th rating for that same album probably didn't add much
new info. Users, the causers, are more variable.

And I think you do tend to have an independent/dependent variable, so
to speak, in any setup. And, the algorithms sort of embed that
asymmetry. Item-based recommenders aren't quite the same as user-based
ones. For example, the item-based approach rather encourages you to
pre-compute item-item similarity, since this is likely to be relatively
fixed, items being the dependent variable.
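
To make the pre-computation point concrete, here is a minimal sketch in
plain Python. It is not Mahout's implementation: the function name, the
nested-dict input layout and the choice of cosine similarity are all
assumptions made for illustration. It builds an item-item similarity
table once from user ratings, which could then be reused across many
recommendation requests.

```python
from collections import defaultdict
from math import sqrt


def precompute_item_similarity(ratings):
    """Build a cosine-similarity table for co-rated item pairs.

    `ratings` is a hypothetical layout: {user_id: {item_id: rating}}.
    Item IDs are assumed to be mutually comparable (e.g. all strings).
    Returns {(item_a, item_b): similarity} with item_a < item_b.
    """
    dot = defaultdict(float)      # sum of rating products per item pair
    norm_sq = defaultdict(float)  # sum of squared ratings per item

    for prefs in ratings.values():
        items = sorted(prefs)
        for i, a in enumerate(items):
            norm_sq[a] += prefs[a] ** 2
            for b in items[i + 1:]:
                dot[(a, b)] += prefs[a] * prefs[b]

    table = {}
    for (a, b), d in dot.items():
        denom = sqrt(norm_sq[a]) * sqrt(norm_sq[b])
        if denom > 0:  # skip items whose ratings are all zero
            table[(a, b)] = d / denom
    return table


# Made-up example data: two users, three items.
ratings = {
    "alice": {"album1": 5.0, "album2": 3.0},
    "bob": {"album1": 4.0, "album3": 2.0},
}
similarities = precompute_item_similarity(ratings)
```

The table only changes when ratings change, so under the assumption
that item-item relationships are fairly stable it can be rebuilt
offline and looked up at request time.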

On Tue, Oct 18, 2011 at 9:24 AM, Dan Brickley <[email protected]> wrote:
> As an aside, I've noticed this 'users' terminology lurking in the
> background of RowSimilarityJob (eg. in JIRA discussion).
>
> My use of it last week seemed perfectly reasonable; but rows were
> books (or bibliographic records), with feature columns from library
> topic codes. Does the 'user' terminology suggest it's really focussed
> on recommendations?
>
> I'm used to seeing this in the Taste part of Mahout, where sometimes
> it's suggested we can re-use recommender pieces by eg. thinking more
> broadly and 'recommending topics to books' or vice versa. This makes
> sense but introduces an extra layer of conceptual confusion. Is there
> any important sense in which rows (or columns?) in RowSimilarityJob
> ought to be thought of as users? Or the values/weights as preferences?
>
> cheers,
>
> Dan
>
