On Tue, May 24, 2011 at 10:17 AM, Uwe Reimann <[email protected]> wrote:
> Since the user provides new preferences at a high rate, I expect to change
> the neighborhood of an individual user rapidly. Using CachingUserSimilarity
> or CachingUserNeighborhood probably won't work here. Using a
> ClusteringRecommender seems to be an option here in order to search against
> some clusters instead against many users. The cluster should be recalculated
> periodically in the background.
(You can have the cache clear just the entries for the current user.)

Neighborhoods ought to be stable-ish. I would not expect one new data point to change significantly who your most similar users are. So you can probably get away with recomputing these periodically, perhaps frequently, but not necessarily at every update. You do need to use the latest preferences in recommendation, of course, but that's separate from calculating a neighborhood.

> Dislikes should be considered during similarity search. I'd like to express
> those as negative preference values. PearsonCorrelationSimilarity should be
> ok with that, right?

Yes. Pearson correlation works on each user's deviations from their mean, so negative values are handled like any others.

> Since I expect to have very low overlap in items between (especially new)
> users, I'd like to take the item's category into account during similarity
> search. User u1, who likes items i1 of category c1 should get item i2 of
> category c1 recommended if user u2 likes that. Both users would have a
> preference value for category c1 in common. This should clearly be possible
> by just providing the calculated preference values for the category items.

You are describing something closer to an item-based recommender, and indeed I think that could be better here, since it copes with the cold-start problem better. (I prefer it as well.) You might look at GenericItemBasedRecommender and ItemSimilarity instead. Your thinking about using Lucene almost surely also applies to item-item similarity.

> I think I need to provide different DataModels to the different stages of
> recommendation calculation: 1) one which includes likes and dislike for
> items and categories for similarity search, 2) one which includes just the
> liked items to pick the recommendations from and 3) one which includes all
> items of a user (liked, disliked and skipped ones) for filtering out the
> user's items using an IDRescorer.

I think one DataModel is fine. You want to include all data in similarity calculations (1).
It is also good to have all items available for recommendation (2); you don't want to exclude an item just because someone didn't like it. And for (3), you do not need to filter out items the user has rated; the recommender already does that.
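To make the idea of recomputing neighborhoods only periodically, and clearing only the current user's cache entry, more concrete, here is a minimal plain-Java sketch. It is not Mahout's CachingUserNeighborhood; the class name NeighborhoodCacheSketch, the computeNeighborhood function and the maxUpdates threshold are all hypothetical. It keeps a user's neighborhood until that user has added maxUpdates new preferences, then drops just that user's entry so the next request recomputes it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-user neighborhood cache (not Mahout's CachingUserNeighborhood).
// A user's neighborhood is recomputed only after `maxUpdates` new preferences from that user.
final class NeighborhoodCacheSketch {
    private final Function<Long, List<Long>> computeNeighborhood; // e.g. a nearest-N user search
    private final int maxUpdates;                                  // assumed refresh threshold
    private final Map<Long, List<Long>> cache = new HashMap<>();
    private final Map<Long, Integer> updatesSinceCompute = new HashMap<>();

    NeighborhoodCacheSketch(Function<Long, List<Long>> computeNeighborhood, int maxUpdates) {
        this.computeNeighborhood = computeNeighborhood;
        this.maxUpdates = maxUpdates;
    }

    // Call whenever the user adds a preference. Only this user's entry is
    // cleared, and only once enough new data points have accumulated.
    void onNewPreference(long userID) {
        int count = updatesSinceCompute.merge(userID, 1, Integer::sum);
        if (count >= maxUpdates) {
            cache.remove(userID);
            updatesSinceCompute.put(userID, 0);
        }
    }

    // Returns the cached neighborhood, computing it on first use or after a reset.
    List<Long> neighborhood(long userID) {
        return cache.computeIfAbsent(userID, computeNeighborhood);
    }
}
```

Recommendations would still read the latest preferences; only the neighborhood lookup is cached. The sketch is not thread-safe, so a real service would need a concurrent map or locking.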
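On encoding dislikes as negative values: the textbook Pearson formula, computed over the items two users have both rated, shows why the sign is not a problem, since only each user's deviations from their mean (over the co-rated items) matter. This is a standalone sketch, not the source of Mahout's PearsonCorrelationSimilarity; the class and method names are hypothetical, and returning NaN when fewer than two items overlap is an assumption of this sketch.

```java
import java.util.Map;

// Hypothetical helper: textbook Pearson correlation over co-rated items only.
// Dislikes are simply negative values; the formula centers on the mean anyway.
final class PearsonSketch {
    static double pearson(Map<String, Double> u, Map<String, Double> v) {
        int n = 0;
        double sumX = 0, sumY = 0, sumXY = 0, sumX2 = 0, sumY2 = 0;
        for (Map.Entry<String, Double> e : u.entrySet()) {
            Double y = v.get(e.getKey());
            if (y == null) {
                continue; // item not rated by both users
            }
            double x = e.getValue();
            n++;
            sumX += x;
            sumY += y;
            sumXY += x * y;
            sumX2 += x * x;
            sumY2 += y * y;
        }
        if (n < 2) {
            return Double.NaN; // too little overlap to correlate (assumed cutoff)
        }
        // Covariance and variances, each scaled by n, from the running sums
        double cov = sumXY - sumX * sumY / n;
        double varX = sumX2 - sumX * sumX / n;
        double varY = sumY2 - sumY * sumY / n;
        double denom = Math.sqrt(varX * varY);
        return denom == 0 ? Double.NaN : cov / denom;
    }
}
```

Two users who share both likes and dislikes come out positively correlated; if one user's likes are the other's dislikes, the correlation is negative.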
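One way to realize the category idea, sketched under assumptions: derive a preference for each category as the mean of the user's preferences for items in that category, and store it under a pseudo-item key. The "cat:" key prefix, the averaging rule and the helper names are hypothetical. Mahout's DataModel uses long IDs, so real category pseudo-items would need IDs that cannot collide with item IDs; strings are used here only for readability. Two users with no items in common end up overlapping on category c1.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: adds one pseudo-item per category ("cat:" + category),
// valued at the mean of the user's preferences for items in that category.
final class CategoryPrefsSketch {
    static Map<String, Double> withCategoryPrefs(Map<String, Double> itemPrefs,
                                                 Map<String, String> itemCategory) {
        Map<String, Double> sums = new HashMap<>();
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, Double> e : itemPrefs.entrySet()) {
            String cat = itemCategory.get(e.getKey());
            if (cat == null) {
                continue; // item without a known category
            }
            sums.merge(cat, e.getValue(), Double::sum);
            counts.merge(cat, 1, Integer::sum);
        }
        Map<String, Double> result = new HashMap<>(itemPrefs);
        for (Map.Entry<String, Double> e : sums.entrySet()) {
            result.put("cat:" + e.getKey(), e.getValue() / counts.get(e.getKey()));
        }
        return result;
    }

    // Number of keys (items or category pseudo-items) both users have a preference for.
    static int overlap(Map<String, Double> a, Map<String, Double> b) {
        int n = 0;
        for (String key : a.keySet()) {
            if (b.containsKey(key)) {
                n++;
            }
        }
        return n;
    }
}
```

Because dislikes are negative, a category the user mostly dislikes gets a negative pseudo-preference, which a similarity such as Pearson can then use like any other value.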
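For the item-based route and point (3), the sketch below estimates a preference for each unrated item as a similarity-weighted average of the user's existing preferences, and never considers items the user has already rated. It only illustrates the general idea behind an item-based recommender such as GenericItemBasedRecommender; the ItemSim interface, the method names and the choice to skip non-positive similarities are assumptions of this sketch, not Mahout's API or its exact algorithm.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical item-based sketch, not Mahout's GenericItemBasedRecommender.
final class ItemBasedSketch {
    // Assumed item-item similarity function supplied by the caller.
    interface ItemSim {
        double similarity(String itemA, String itemB);
    }

    // Similarity-weighted average of the user's preferences for items similar to `candidate`.
    static double estimate(Map<String, Double> userPrefs, String candidate, ItemSim sim) {
        double weighted = 0, total = 0;
        for (Map.Entry<String, Double> e : userPrefs.entrySet()) {
            double s = sim.similarity(candidate, e.getKey());
            if (Double.isNaN(s) || s <= 0) {
                continue; // sketch choice: ignore unknown or non-positive similarities
            }
            weighted += s * e.getValue();
            total += s;
        }
        return total == 0 ? Double.NaN : weighted / total;
    }

    // Ranks unrated items by estimated preference. Rated items (liked, disliked
    // or skipped) are never candidates, so no separate filtering step is needed.
    static List<String> recommend(Map<String, Double> userPrefs, Set<String> allItems,
                                  ItemSim sim, int howMany) {
        Map<String, Double> scores = new HashMap<>();
        for (String item : allItems) {
            if (userPrefs.containsKey(item)) {
                continue;
            }
            double est = estimate(userPrefs, item, sim);
            if (!Double.isNaN(est)) {
                scores.put(item, est);
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> Double.compare(scores.get(b), scores.get(a))); // highest first
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }
}
```

Since every candidate is scored against the user's own items, a brand-new item only needs similarity to something the user has rated, which is why the item-based approach tends to cope better with sparse users.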
