Understanding Mahout's recommendation system parameters

Kris Jack Thu, 14 Jul 2011 07:12:38 -0700

Hello,

I'm trying to get a better understanding of the following two RecommenderJob
parameters:
1) --maxCooccurrencesPerItem (integer): Maximum number of cooccurrences
considered per item (100)
2) --maxSimilaritiesPerItem (integer): Maximum number of similarities
considered per item (100)
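To pin down what "cooccurrence" means in the questions below, here is a minimal Python sketch (not Mahout code). It assumes a cooccurrence is simply two items appearing in the same user's item set, counted once per user. The toy data and the function name `cooccurrence_counts` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each user's set of item IDs.
user_items = {
    "u1": {1, 2, 3},
    "u2": {2, 3},
    "u3": {1, 3, 4},
}

def cooccurrence_counts(user_items):
    """Count, over all users, how many users hold each unordered item pair."""
    counts = Counter()
    for items in user_items.values():
        # Every unordered pair within one user's set is one cooccurrence.
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return counts

counts = cooccurrence_counts(user_items)
print(counts[(2, 3)])  # 2: u1 and u2 both hold items 2 and 3
print(counts[(1, 4)])  # 1: only u3 holds both
```

Under this definition, a user with n items contributes n(n-1)/2 pairs, and each of those items cooccurs with the other n-1.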


Could you please help me to understand these in terms of a recommender job
where we are trying to recommend items to users?

From what I see, maxCooccurrencesPerItem is first used in job 4/12 of the
pipeline, the MaybePruneRowsMapper job.  Does maxCooccurrencesPerItem limit
the number of cooccurrences that are kept for each item?  Is this limit
applied within a single user's set of items or globally across all users?
For example, if a user has 100 items, then each item can be seen to cooccur
with the 99 other items.  Taking all user libraries together, however, assume
that it cooccurs with 1,000,000 other items.  Does maxCooccurrencesPerItem
limit the number of cooccurrences per user item set, or is it applied to the
full set of items that the item cooccurs with across all user libraries?
Also, how is the selection made (most frequent or first found)?
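The two readings in the question can be contrasted with a small Python sketch. Both functions are hypothetical illustrations, not MaybePruneRowsMapper's actual logic: `prune_per_user` caps each user's item set before pairs are formed, while `cap_global` counts pairs across all users and then keeps each item's most frequent partners. The tie-breaking rule in `prune_per_user` (keep the lowest item IDs) is an arbitrary assumption, since which items survive is exactly what is being asked.

```python
from collections import Counter
from itertools import combinations

def pairs_contributed(n_items):
    """Unordered item pairs contributed by one user holding n_items items."""
    return n_items * (n_items - 1) // 2

def prune_per_user(user_items, max_items):
    """Reading A (hypothetical): truncate each user's item set before pairing.
    Arbitrarily keeps the lowest item IDs."""
    return {user: set(sorted(items)[:max_items])
            for user, items in user_items.items()}

def cap_global(user_items, max_partners):
    """Reading B (hypothetical): count cooccurrences over all users, then keep
    each item's most frequent partners."""
    partners = {}
    for items in user_items.values():
        for a, b in combinations(sorted(items), 2):
            partners.setdefault(a, Counter())[b] += 1
            partners.setdefault(b, Counter())[a] += 1
    return {item: [other for other, _ in counter.most_common(max_partners)]
            for item, counter in partners.items()}

# Hypothetical toy data.
user_items = {"u1": {1, 2, 3, 4}, "u2": {1, 2}, "u3": {1, 2, 3}}

print(pairs_contributed(100))               # 4950
print(prune_per_user(user_items, 2)["u1"])  # {1, 2}
print(cap_global(user_items, 2)[1])         # [2, 3]
```

The arithmetic shows why the distinction matters: a per-user cap bounds the quadratic number of pairs each user can generate, whereas a global cap only trims the final list after every pair has already been counted.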

maxSimilaritiesPerItem is first used in job 7/12 of the pipeline,
EntriesToVectorsReducer.  Does this cap the number of rows that are compared
with one another?  Are the rows cooccurrence vectors of items for a given
user by this point in the process?
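One possible reading of maxSimilaritiesPerItem, sketched in Python under that assumption only: once item-item similarity scores exist, keep just the N highest-scoring neighbours of each item. The function name `top_n_similar` and the scores are hypothetical; this is not taken from EntriesToVectorsReducer.

```python
import heapq

def top_n_similar(similarities, n):
    """Keep only the n highest-scoring neighbours of each item (hypothetical)."""
    return {item: heapq.nlargest(n, scores.items(), key=lambda kv: kv[1])
            for item, scores in similarities.items()}

# Hypothetical similarity scores between item 1 and its neighbours.
sims = {1: {2: 0.9, 3: 0.4, 4: 0.7}}
print(top_n_similar(sims, 2))  # {1: [(2, 0.9), (4, 0.7)]}
```

If the parameter instead limits how many rows are compared with one another, the cap would apply before any scores are computed, which is a different point in the pipeline than this sketch assumes.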

Thanks,
Kris
