Hi Sean,

On 18 October 2011 11:09, Sean Owen <[email protected]> wrote:
> Nice question. I have answers I like.
>
> Really, it would be better to find words that mean
> thing-being-recommended-to and thing-being-recommended. I couldn't
> find easy, general terms that were more intuitive than "user" and
> "item". Even though these things need not be actual people or
> products, and so are inaccurate terms, they connote the right sorts of
> ways of thinking about what they are and how they work.
>
> You could also say that since both can be anything, there should be at
> best one term for both -- a thing or entity. I don't like this on the
> same grounds that it makes things harder to think about in practice.
> Is that "thingID" the thing being recommended or recommended to in the
> code...?
>
> More important I don't think users and items are entirely symmetric,
> even though you could plug items in for users and vice versa. For
> instance, one is 'causing' the ratings and the other isn't. It's
> harder to make future predictions about the black-box source of new
> surprising data. That is, I may learn something quite new about you in
> your 1000th rating, when you rate your first classical music album
> ever; the 1000th rating for that same album probably didn't add much
> new info. Users, the causers, are more variable.
>
> And I think you do tend to have an independent/dependent variable, so
> to speak, in any setup. And, the algorithms sort of embed that
> asymmetry. Item-based recommenders aren't quite the same. For example
> it rather encourages you to pre-compute item-item similarity since
> this is likely to be relatively fixed, being the dependent variable.

In general, I completely agree with your perspective here. Even when
everything bottoms out as matrix maths underneath, that doesn't mean
that developers should only ever see that abstraction in their
day-to-day hacking. Mahout lets you adopt it at various levels: Taste
gives almost a drop-in running service; the bin/mahout utility and
recommender APIs give a variety of high-level entry points; and then of
course, being open source, Java developers can jump into the code at any
level that suits their needs. For lots of those entry points, 'user' and
'item' are a great way to present things.

Anyhow, I think my question still holds: is the 'bin/mahout
rowsimilarity' piece of Mahout something that should be understood
primarily as a recommendations-oriented component? For my application I
was seeking just 'the most similar books' for any given book, to feed
those affinities to Gephi for visual mapping. I could conceptualise this
in terms of recommending, I guess; but I didn't. So that's why I was
mildly surprised when I noticed that others in Jira and email did seem
to think of RowSimilarityJob in recommendation-oriented terms (i.e.
users and items). I completely agree that those are useful notions to
have in the APIs and utilities; I just somehow wasn't expecting them
right there (just as I wouldn't expect them on the more mathsy APIs
either).

cheers,

Dan

ps. as an aside, your points here also remind me of a few passages in
http://en.wikipedia.org/wiki/Six_Degrees:_The_Science_of_a_Connected_Age
that emphasise how a purely mathematical perspective on networks/graphs
can obscure the ways in which different kinds of network can usefully be
understood, and that sometimes you do need to think about the social
context alongside the maths...

> On Tue, Oct 18, 2011 at 9:24 AM, Dan Brickley <[email protected]> wrote:
>> As an aside, I've noticed this 'users' terminology lurking in the
>> background of RowSimilarityJob (e.g. in JIRA discussion).
>>
>> My use of it last week seemed perfectly reasonable; but rows were
>> books (or bibliographic records), with feature columns from library
>> topic codes. Does the 'user' terminology suggest it's really focussed
>> on recommendations?
>>
>> I'm used to seeing this in the Taste part of Mahout, where sometimes
>> it's suggested we can re-use recommender pieces by e.g. thinking more
>> broadly and 'recommending topics to books' or vice versa. This makes
>> sense but introduces an extra layer of conceptual confusion. Is there
>> any important sense in which rows (or columns?) in RowSimilarityJob
>> ought to be thought of as users? Or the values/weights as preferences?
>>
>> cheers,
>>
>> Dan
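
As a footnote to the book-similarity use case in the reply above: the
'most similar rows' idea can be written down with no user/item
vocabulary at all. What follows is only a minimal Python sketch, not
Mahout's implementation. The book IDs, topic codes, weights and the
function names cosine and most_similar are all hypothetical, and cosine
similarity is just one assumed choice of measure.

```python
from math import sqrt

# Hypothetical data: each row is a book, each column a topic code.
# Rows are stored sparsely as {topic_code: weight}.
books = {
    "book_a": {"T1": 1.0, "T2": 1.0},
    "book_b": {"T1": 1.0, "T2": 1.0, "T3": 1.0},
    "book_c": {"T4": 1.0},
}


def cosine(u, v):
    """Cosine similarity between two sparse row vectors."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(rows, row_id, top_n=2):
    """Return up to top_n (other_row_id, similarity) pairs, best first."""
    target = rows[row_id]
    scored = [
        (other, cosine(target, vec))
        for other, vec in rows.items()
        if other != row_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    # Each (row, neighbour, weight) triple could be written out as one
    # edge of a similarity graph.
    for book_id in books:
        for other, score in most_similar(books, book_id):
            print(book_id, other, round(score, 3))
```

Nothing here depends on which side is 'recommended to': rows are just
rows, and the output is a weighted list of row-to-row affinities.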
