Hello,

I'm trying to get a better understanding of the following two RecommenderJob parameters:

1) --maxCooccurrencesPerItem (integer): Maximum number of cooccurrences considered per item (100)
2) --maxSimilaritiesPerItem (integer): Maximum number of similarities considered per item (100)

Could you please help me understand these in the context of a recommender job where we are trying to recommend items to users?

From what I see, maxCooccurrencesPerItem is first used in job 4/12 of the pipeline, the MaybePruneRowsMapper job. Does maxCooccurrencesPerItem limit the number of cooccurrences that are kept for that item? Is this limit applied within a single user's set of items or globally across all users? For example, if a user has 100 items, then each item can be seen to cooccur with the 99 other items. Across all user libraries, however, assume that the item cooccurs with 1,000,000 other items. Does maxCooccurrencesPerItem limit the number of cooccurrences per user item set, or is it applied to the set of items with which the item cooccurs across all user libraries? Also, how is the selection made (most frequent or first found)?

maxSimilaritiesPerItem is first used in job 7/12 of the pipeline, EntriesToVectorsReducer. Does this cap the number of rows that are compared with one another? Are the rows cooccurrence vectors of items for a given user by this point in the process?

Thanks,
Kris
