Re: Any general performance tips for job RowSimilarityJob-CooccurrencesMapper-SimilarityReducer?

Sean Owen Thu, 20 Oct 2011 04:25:12 -0700

I misunderstood the original question then.

Thing-thing similarity is a key piece of most recommender algorithms.
RowSimilarityJob is reused for the distributed computation. In those
senses, the answer is 'yes'.


But to the question you are actually asking, I think the answer is 'no'.
The similarity metrics are not derived from recommenders and are used
in other contexts. You can compute
thing-thing similarity for other reasons. There is no user-item
asymmetry at this level, I think. Using the user-item terms is not
well motivated here.
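Here is a minimal, single-machine sketch of what row similarity computes. It assumes cosine similarity and rows stored as sparse dicts. It is not Mahout's API: `cosine`, `most_similar` and the toy `books` data are hypothetical. The data follows Dan's use case, with books as rows and library topic codes as columns. Nothing in the code refers to users or preferences; the rows are just labelled vectors.

```python
import math
from typing import Dict, List, Tuple

# A row is a sparse vector {column_key: weight}. Nothing here assumes rows are
# "users" or columns are "items"; they are just labelled vectors.
SparseRow = Dict[str, float]


def cosine(a: SparseRow, b: SparseRow) -> float:
    """Cosine similarity of two sparse rows (0.0 if either row is empty)."""
    # Iterate over the smaller row; only shared columns add to the dot product.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(rows: Dict[str, SparseRow], k: int) -> Dict[str, List[Tuple[str, float]]]:
    """For every row, return its k most similar other rows, best first (ties by name)."""
    result = {}
    for rid, vec in rows.items():
        scores = [(other, cosine(vec, ovec)) for other, ovec in rows.items() if other != rid]
        scores.sort(key=lambda pair: (-pair[1], pair[0]))
        result[rid] = scores[:k]
    return result


# Hypothetical toy data: books as rows, library topic codes as feature columns.
books = {
    "book_a": {"QA76": 1.0, "Z699": 1.0},
    "book_b": {"QA76": 1.0, "Z699": 1.0, "HM851": 1.0},
    "book_c": {"PR6003": 1.0},
}

if __name__ == "__main__":
    for book, neighbours in most_similar(books, k=2).items():
        print(book, neighbours)
```

The brute-force pairwise loop is only for illustration. RowSimilarityJob exists to distribute this kind of computation over large matrices.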

At the same time, I don't think it hurts much, and it makes the
relation to recommendations (a primary user of this general
component) easier to understand.
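The relation to recommendations can be sketched the same way. Item-item similarity is ordinary row similarity applied to the transposed user-item matrix, so each item becomes a row whose columns are the users who rated it. This is a hypothetical Python sketch, not Mahout code. The names `transpose` and `score`, the toy `prefs` data and the similarity-weighted-average scoring rule are illustrative assumptions.

```python
import math
from collections import defaultdict
from typing import Dict

SparseRow = Dict[str, float]


def cosine(a: SparseRow, b: SparseRow) -> float:
    """Cosine similarity of two sparse rows (0.0 if either row is empty)."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def transpose(rows: Dict[str, SparseRow]) -> Dict[str, SparseRow]:
    """Turn {row: {col: w}} into {col: {row: w}}."""
    cols: Dict[str, SparseRow] = defaultdict(dict)
    for r, vec in rows.items():
        for c, w in vec.items():
            cols[c][r] = w
    return dict(cols)


# Hypothetical preferences: users are rows, items are columns.
prefs = {
    "alice": {"i1": 5.0, "i2": 3.0},
    "bob": {"i1": 4.0, "i2": 2.0, "i3": 5.0},
    "carol": {"i2": 4.0, "i3": 4.0},
}

# Item-item similarity is plain row similarity on the transposed matrix.
items = transpose(prefs)
item_sim = {(a, b): cosine(items[a], items[b]) for a in items for b in items if a != b}


def score(user: str, candidate: str) -> float:
    """Similarity-weighted average of the user's ratings on other items."""
    pairs = [
        (item_sim[(rated, candidate)], rating)
        for rated, rating in prefs[user].items()
        if rated != candidate
    ]
    total = sum(abs(sim) for sim, _ in pairs)
    if total == 0.0:
        return 0.0
    return sum(sim * rating for sim, rating in pairs) / total


if __name__ == "__main__":
    print(score("alice", "i3"))
```

Only the interpretation changes. The `cosine` function never sees the words 'user' or 'item', so that vocabulary belongs to the recommender layer rather than to the similarity computation itself.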


On Thu, Oct 20, 2011 at 12:06 PM, Dan Brickley <[email protected]> wrote:
> In general, I completely agree with your perspective here. Even when
> everything bottoms out as matrix maths underneath, that doesn't mean
> that developers should only ever see that abstraction in their
> day-to-day hacking. Mahout lets you adopt it at various levels; Taste
> gives almost a drop-in running service; the bin/mahout utility and
> recommender APIs give a variety of high level entry points, and then
> of course being opensource, Java developers can jump into the code at
> any level that suits their need. For lots of those entry points,
> 'user' and 'item' are a great way to present things.
>
> Anyhow, I think my question still holds: is the 'bin/mahout
> rowsimilarity' piece of Mahout something that should be understood
> primarily as a recommendations-oriented component? For my application
> I was seeking just 'the most similar books' for any given book, to
> feed those affinities to Gephi for visual mapping. I could
> conceptualise this in terms of recommending I guess; but I didn't. So
> that's why I was mildly surprised when I noticed that others in Jira
> and email did seem to think of RowSimilarityJob in
> recommendation-oriented terms (ie. users and items). I completely
> agree that those are useful notions to have in the APIs and utilities,
> I just somehow wasn't expecting it right there (just as I wouldn't
> expect it on the more mathsy APIs either).
>
> cheers,
>
> Dan
>
> ps. as an aside, your points here also remind me of a few passages in
> http://en.wikipedia.org/wiki/Six_Degrees:_The_Science_of_a_Connected_Age
> that emphasise how a purely mathematical perspective on
> networks/graphs can obscure the ways in which different kinds of
> network can usefully be understood, and that sometimes you do need to
> think about the social context alongside the maths...
>
>> On Tue, Oct 18, 2011 at 9:24 AM, Dan Brickley <[email protected]> wrote:
>>> As an aside, I've noticed this 'users' terminology lurking in the
>>> background of RowSimilarityJob (eg. in JIRA discussion).
>>>
>>> My use of it last week seemed perfectly reasonable; but rows were
>>> books (or bibliographic records), with feature columns from library
>>> topic codes. Does the 'user' terminology suggest it's really focussed
>>> on recommendations?
>>>
>>> I'm used to seeing this in the Taste part of Mahout, where sometimes
>>> it's suggested we can re-use recommender pieces by eg. thinking more
>>> broadly and 'recommending topics to books' or vice versa. This makes
>>> sense but introduces an extra layer of conceptual confusion. Is there
>>> any important sense in which rows (or columns?) in RowSimilarityJob
>>> ought to be thought of as users? Or the values/weights as preferences?
>>>
>>> cheers,
>>>
>>> Dan
>
