Hi Brian,

Happy to give you some details. Starting from a matrix A (user x item) that holds the user-item interactions, the algorithm first computes a matrix S (item x item) of item similarities and then uses those similarities to compute recommendations for users.
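To make the two phases concrete, here is a minimal single-machine sketch in Python. It is not the distributed Hadoop job itself: the function names are made up for illustration, cosine similarity over binary interactions is an assumed choice of measure, and the two caps (`max_prefs_per_user`, `max_similar_items`) are hypothetical arguments that only loosely mirror the options discussed in this thread. Items are assumed to be strings so they can be sorted.

```python
import random
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarities(interactions, max_prefs_per_user=500,
                      max_similar_items=50, seed=0):
    """Phase 1: build the item x item matrix S from the user x item matrix A.

    interactions: dict mapping user -> set of items (the rows of A).
    Returns a dict mapping item -> list of (other_item, similarity), best first.
    """
    rng = random.Random(seed)
    cooccurrence = defaultdict(int)  # (item_a, item_b) -> users who have both
    item_counts = defaultdict(int)   # item -> users (after sampling) who have it

    for items in interactions.values():
        row = sorted(items)
        # Down-sample very long rows so a single power user cannot
        # dominate the pairwise work below.
        if len(row) > max_prefs_per_user:
            row = sorted(rng.sample(row, max_prefs_per_user))
        for item in row:
            item_counts[item] += 1
        for a, b in combinations(row, 2):
            cooccurrence[(a, b)] += 1

    similar = defaultdict(list)
    for (a, b), both in cooccurrence.items():
        # Cosine similarity for binary interactions (assumed measure).
        sim = both / sqrt(item_counts[a] * item_counts[b])
        similar[a].append((b, sim))
        similar[b].append((a, sim))

    # Keep only the most similar items per row of S.
    return {
        item: sorted(neigh, key=lambda p: (-p[1], p[0]))[:max_similar_items]
        for item, neigh in similar.items()
    }


def recommend(user_items, similar, top_n=10):
    """Phase 2: score unseen items by summing similarities to the user's items."""
    scores = defaultdict(float)
    for item in user_items:
        for other, sim in similar.get(item, []):
            if other not in user_items:  # skip items the user already has
                scores[other] += sim
    ranked = sorted(scores.items(), key=lambda p: (-p[1], p[0]))
    return [item for item, _ in ranked[:top_n]]
```

A toy run could look like `S = item_similarities({"u1": {"a", "b"}, "u2": {"a", "b", "c"}})` followed by `recommend({"a"}, S)`. The sketch shows why the caps matter: the pairwise loop grows quadratically with the length of a user's row, and the truncation of S bounds the work done per item in the second phase.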
The parameters refer to the following:

'maxPrefsPerUserInItemSimilarity': the maximum number of interactions per user to take into account when computing S (i.e. the maximum number of entries to look at per row in A, selected at random). Single power users with an anomalous number of interactions can heavily increase the computation time without contributing to the actual quality of the output. Setting this to something like 500 should give you reasonable performance and results.

'maxSimilaritiesPerItem': the maximum number of similar items to look at per item (i.e. the maximum number of entries per row in S). Research papers reported good results with something between 20 and 100.

'maxPrefsPerUser': how many interactions per user to take into account in the final recommendation phase. This option is probably buggy and should be set to a very high number (as large as the maximum number of interactions per user, or larger); otherwise you might see items in the recommendations that the user already knows.

In general, the only way to get a picture of the quality of a recommender is to test it in a live system with real users. You can of course do some hold-out tests or cross-validation offline, but good performance there does not necessarily correlate with good performance in a real system.

I suggest you start with the default values. Do you use trunk or 0.8?

Best,
Sebastian

2013/9/11 Brian Arnold <[email protected]>

> Hi,
>
> I am currently trying to run the distributed Item Based Collaborative
> filtering algorithm on our Hadoop cluster, and I have a few questions
> regarding tweaking the various properties of the algorithm. For the
> maxPrefsPerUser,maxSimilaritiesPerItem, and maxPrefsPerUserItemSimilarity
> properties I was wondering if I could get a more detailed explanation of
> what these properties control. I saw the description in the code, but I am
> just wondering how changing these values will affect the results of the
> algorithm, and will increasing them result in a better recommendation.
>
> Thanks
>
