I always use SIMILARITY_LOGLIKELIHOOD. In my experience LLR (log-likelihood ratio) almost always works best wherever a “similarity” or “distance” measure is called for.
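For anyone unfamiliar with LLR, here is a minimal Python sketch of the score for one pair of items, using the standard entropy form of the log-likelihood ratio test on a 2x2 co-occurrence table. The function names and the meaning given to k11..k22 are my own labels for illustration; I believe Mahout's SIMILARITY_LOGLIKELIHOOD is built on the same formula, but treat this as a sketch, not as the library's code.

```python
import math

def x_log_x(x):
    # x * ln(x), with the usual convention that 0 * ln(0) = 0
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    # Unnormalized Shannon entropy of a list of counts: N ln N - sum(k ln k)
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table of user counts.

    Assumed meaning of the cells (hypothetical labels):
      k11 = users who interacted with both item A and item B
      k12 = users who interacted with A but not B
      k21 = users who interacted with B but not A
      k22 = users who interacted with neither
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp at zero to absorb floating-point round-off
    return max(0.0, 2.0 * (row + col - mat))

if __name__ == "__main__":
    print(llr(10, 10, 10, 10))  # independent items: score near 0
    print(llr(10, 0, 0, 10))    # items that always co-occur: large score
```

A score near zero means the co-occurrence is about what chance would give; a large score means the two items appear together (or apart) far more consistently than chance. Note that the score depends only on counts of users, not on rating values.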
1000 people isn’t very many. How many items? Count the unique users and the unique items in your data: users * items gives the size of the input matrix. The number of interactions then tells you how sparse that matrix is. So if you have 1000 items to go with your 1000 users, you have a 1000 by 1000 input matrix, most of which will be empty and therefore “sparse”. But if there aren’t enough interactions (non-blank cells in the matrix), you will not have enough data to return recs for every user.

Collaborative filtering works well if you have long-lived items and enough users interacting with them. To get a handle on whether your data supports CF, ask yourself:

How many interactions? Every unique input (userID, itemID) is an interaction.
How many people interacted with each item?
How many people total?
How many people interacted with more than one item?

Another way is to run the Hadoop version of the recommender (using LLR) and see how many people get recommendations. LLR uses the counts mentioned above in calculating recs, so the number of people that get recs is an indirect way of telling how dense your data is.

On Aug 25, 2014, at 1:51 AM, Wei Li <[email protected]> wrote:

Thanks Sharma, does all similarity measures have this problem or only some
specific similarity measures have?

On Mon, Aug 25, 2014 at 4:48 PM, Yash Sharma <[email protected]> wrote:

> Pearson Coefficient Similarity does not go very well with small datasets
> with less similarities - and removes those from output. Since you are using
> co-occurrence similarity this is not the case.
>
>
> On Mon, Aug 25, 2014 at 2:11 PM, Peng Zhang <[email protected]> wrote:
>
>> If there are no suitable recommendations for a user, the output will not
>> contain any records related to this user.
>>
>>
>> Peng Zhang
>>
>>
>> On Aug 25, 2014, at 4:38 PM, Wei Li <[email protected]> wrote:
>>
>>> thanks Peng's answers. Yes, I know this case, but RecommenderJob does not
>>> output these records?
>>>
>>>
>>> On Mon, Aug 25, 2014 at 3:37 PM, Peng Zhang <[email protected]> wrote:
>>>
>>>> If an item is not similar to anyone else, and a user only connects with
>>>> this item, this user doesnt get any recommended items.
>>>>
>>>> This is just one example.
>>>>
>>>> Peng Zhang
>>>>
>>>> --
>>>> Sent from my iPhone
>>>>
>>>>> On Aug 25, 2014, at 2:22 PM, Wei Li <[email protected]> wrote:
>>>>>
>>>>> Hi Mahout users:
>>>>>
>>>>> We have tried the item-based CF recommender with a user_id, item_id,
>>>>> rating data. while the recommendation output is less than our expected, for
>>>>> example, if we have 1000 users, the output should have 1000 records, one
>>>>> for each user, right?
>>>>>
>>>>> Best
>>>>> Wei
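The density questions in the reply at the top of this message (how many interactions, how many users per item, how many users with more than one item) can be answered with a short script before running any recommender. The following is a minimal Python sketch, assuming the input has already been parsed into (user_id, item_id) pairs with the rating column dropped; the function name density_report and the keys of the returned dictionary are hypothetical names chosen for illustration.

```python
from collections import defaultdict

def density_report(interactions):
    """Summarize how dense a set of (user_id, item_id) interactions is."""
    # Every unique (user, item) pair counts as one interaction
    unique = set(interactions)

    items_by_user = defaultdict(set)
    users_by_item = defaultdict(set)
    for user, item in unique:
        items_by_user[user].add(item)
        users_by_item[item].add(user)

    n_users = len(items_by_user)
    n_items = len(users_by_item)
    cells = n_users * n_items  # size of the user-by-item input matrix

    return {
        "interactions": len(unique),
        "users": n_users,
        "items": n_items,
        # Fraction of matrix cells that are non-blank
        "density": len(unique) / cells if cells else 0.0,
        # Users with one item can only get recs through that single item
        "users_with_multiple_items": sum(
            1 for items in items_by_user.values() if len(items) > 1
        ),
        # Items seen by one user cannot co-occur with anything via other users
        "items_with_single_user": sum(
            1 for users in users_by_item.values() if len(users) == 1
        ),
    }

if __name__ == "__main__":
    sample = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u3", "c"), ("u1", "a")]
    for key, value in density_report(sample).items():
        print(key, value)
```

In the sample, user u3 touches only item c and nobody else touches c, which is the situation Peng describes: that user would likely get no recommendations from an item-based recommender.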
