Ok, you guys got me convinced :)

From a technical point of view, two ways to implement that filter come to mind:
1) Load the user/item pairs to filter into memory in the AggregateAndRecommendReducer, as Han Hui suggested (easy, but might not scale).
2) Have the AggregateAndRecommendReducer write all predicted preferences to disk instead of picking only the top-K recommendations, then add another M/R step that joins the recommendations with the user/item filter pairs to allow for custom rescoring/filtering.

--sebastian

On 24.08.2010 06:07, Ted Dunning wrote:
> Sorry to chime in late, but removing items after recommendation isn't such a
> crazy thing to do.
>
> In particular, it is common to remove previously viewed items (for a period
> of time). Likewise, if the user says "don't show this again", it makes
> sense to backstop the actual recommendation system with a UI limitation that
> does a post-recommendation elimination.
>
> Moreover, this approach has the great benefit that the results are very
> predictable. Exactly the requested/seen items will be eliminated and no
> surprising effect on recommendations will occur.
>
> That predictability is exactly the problem, though. Generally you want a
> bit more systemic effect for negative recommendations. This is a really
> sticky area, however, because negative recommendations often impart
> information about positive preferences in addition to some level of negative
> information.
>
> I used an explicit filter at both Musicmatch and Veoh. Both systems
> worked well. Especially at Veoh, there was a lot of additional machinery
> required to handle the related problem of anti-flooding. That was done at
> the UI level as well.
>
> On Mon, Aug 23, 2010 at 8:16 PM, Sean Owen wrote:
>
>> (Uncanny, I was just minutes before researching Grooveshark for
>> unrelated reasons... Good to hear from any company that does
>> recommendations and is willing to talk about it. I know of a number
>> that, unfortunately, can't or won't.)
>>
>> Yeah, sounds like we're all on the same page.
>> One key point in what I think everyone is talking about is that this
>> is not simply removing items *after* recommendations are computed:
>> filtering an already-chosen top-N list risks removing most or all of
>> the recommended items. It needs to be done during the process of
>> selecting recommendations.
>>
>> But beyond that, it's a simple idea and just a question of
>> implementation. It's "Rescorer" in the non-Hadoop code, which does
>> more than remove items: more generally, it rearranges
>> recommendations according to some logic. I think it would be easy and
>> useful to imitate this with a simple optional Mapper/Reducer phase in
>> this nascent "RecommenderJob" pipeline that Sebastian is now helping
>> expand into something more configurable and general-purpose.
>>
>> Sean
>>
>> On Mon, Aug 23, 2010 at 8:25 PM, Chris Bates wrote:
>>
>>> Hi all,
>>>
>>> I'm new to this forum and haven't seen the code you are talking about, so
>>> take this with a grain of salt. The way we handle "banned items" at
>>> Grooveshark is to post-process the itemID pairs in Hive. If a user
>>> dislikes a recommended song/artist, the user-item pair is stored in HDFS,
>>> and when the recs are computed, those banned user-item pairs are taken
>>> into account. Here is an example query:
>>>
>>> -- Join on both columns so one user's ban doesn't flag other users' pairs.
>>> SELECT DISTINCT st.uid, st.simuid, IF(b.uid IS NULL, 0, 1) AS banned
>>> FROM streams_u2u st
>>> LEFT OUTER JOIN bannedsimusers b
>>>   ON (b.uid = st.uid AND b.simuid = st.simuid);
>>>
>>> That query outputs 1 if the recommended pair is banned and 0 if it is
>>> not. Hive also supports CASE expressions, so you could compute a range
>>> of "banned-ness" instead of a 0/1 flag. Just another solution to the
>>> "dislike" problem.
>>>
>>> Chris
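Sean's point, that filtering has to happen while the top-K list is being chosen rather than afterwards, and Sebastian's option 1 (holding the filter pairs in memory in the reducer) can be illustrated with a small Python sketch for a single user. The function names, the toy scores, and the optional `rescore` callback are hypothetical: the callback is only loosely modeled on the idea of the non-Hadoop Rescorer, not on its actual API, and none of this is AggregateAndRecommendReducer code.

```python
import heapq
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical helpers for illustration only; these are not Mahout APIs.
Scores = Dict[int, float]  # item ID -> predicted preference for one user


def top_k_post_filter(scores: Scores, banned: Set[int], k: int) -> List[Tuple[int, float]]:
    """Pick the top k items first, then drop banned ones (may return fewer than k)."""
    top = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
    return [(item, s) for item, s in top if item not in banned]


def top_k_with_filter(
    scores: Scores,
    banned: Set[int],
    k: int,
    rescore: Optional[Callable[[int, float], float]] = None,
) -> List[Tuple[int, float]]:
    """Skip banned items, optionally rescore the rest, and only then pick the top k."""
    candidates = (
        (item, rescore(item, s) if rescore is not None else s)
        for item, s in scores.items()
        if item not in banned
    )
    return heapq.nlargest(k, candidates, key=lambda kv: kv[1])


if __name__ == "__main__":
    # Toy predicted preferences for one user (made-up numbers).
    predicted = {1: 4.9, 2: 4.8, 3: 4.1, 4: 3.7, 5: 2.5}
    banned_for_user = {1, 2}
    print(top_k_post_filter(predicted, banned_for_user, k=2))  # → []
    print(top_k_with_filter(predicted, banned_for_user, k=2))  # → [(3, 4.1), (4, 3.7)]
```

With k=2, selecting first and filtering afterwards returns nothing because both top items are banned, while skipping banned items during selection still fills the list. The cost of the in-memory approach is that the banned set has to fit in the reducer's memory, which is the scalability concern Sebastian raises.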

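Sebastian's option 2, like Chris's Hive LEFT OUTER JOIN, moves filtering into a separate join step over *all* predicted preferences, so top-K selection only happens after the banned pairs are gone. Below is a minimal sketch of that as a reduce-side join, simulated in plain Python: the map phase keys both inputs by user ID and tags them, and the reducer drops banned items before picking the top K. The record layouts, tags, and function names are hypothetical and do not reflect Mahout's actual formats or classes.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layouts (not Mahout's actual formats):
#   predictions: (user_id, item_id, score) for *all* predicted preferences
#   banned:      (user_id, item_id) pairs to filter out
Prediction = Tuple[int, int, float]
BannedPair = Tuple[int, int]
Tagged = Tuple[str, int, float]  # (tag, item_id, score); score unused for "BAN"


def map_phase(predictions: Iterable[Prediction],
              banned: Iterable[BannedPair]) -> Iterator[Tuple[int, Tagged]]:
    """Key both inputs by user ID and tag them so they meet in the same reduce call."""
    for user, item, score in predictions:
        yield user, ("PRED", item, score)
    for user, item in banned:
        yield user, ("BAN", item, 0.0)


def reduce_phase(user: int, records: Iterable[Tagged],
                 k: int) -> Tuple[int, List[Tuple[int, float]]]:
    """Reduce-side join: drop this user's banned items, then keep the top k of the rest."""
    buffered = list(records)  # a real job might use a secondary sort instead of buffering
    banned_items = {item for tag, item, _ in buffered if tag == "BAN"}
    kept = ((item, score) for tag, item, score in buffered
            if tag == "PRED" and item not in banned_items)
    return user, heapq.nlargest(k, kept, key=lambda kv: kv[1])


def run_job(predictions: Iterable[Prediction], banned: Iterable[BannedPair],
            k: int) -> Dict[int, List[Tuple[int, float]]]:
    """Simulate map -> shuffle (group by user) -> reduce, all in memory."""
    grouped: Dict[int, List[Tagged]] = defaultdict(list)
    for user, record in map_phase(predictions, banned):
        grouped[user].append(record)
    return dict(reduce_phase(user, recs, k) for user, recs in grouped.items())


if __name__ == "__main__":
    predictions = [(1, 10, 4.5), (1, 11, 4.0), (1, 12, 3.0), (2, 10, 5.0), (2, 13, 1.0)]
    banned = [(1, 10), (2, 13)]
    print(run_job(predictions, banned, k=2))
    # → {1: [(11, 4.0), (12, 3.0)], 2: [(10, 5.0)]}
```

Keying by user means each reduce call only needs one user's banned items in memory rather than the whole filter set, which is the scalability advantage of option 2; the price is writing out and shuffling all predicted preferences instead of just the top K.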