Hi Rafal, you are right: unfortunately there is no tooling available for doing holdout tests with RecommenderJob. It would be an awesome contribution to Mahout, though.
Ideally, you would want to split your dataset in a way that retains some portion of the interactions of each user, and then see how many of the held-out interactions you can reproduce. You should be aware that this is basically a test of how well a recommender can reproduce what has already happened. If you get recommendations for items that are not in your held-out data, this does not automatically mean they are wrong. They might be very interesting items that the user simply hasn't had a chance to look at yet. The real "performance" of a recommender can only be found via extensive A/B testing in production systems.

Btw, I would strongly recommend that you use a more sophisticated similarity measure than cooccurrence count, e.g. the log-likelihood ratio.

Best,
Sebastian

2013/8/8 Rafal Lukawiecki <[email protected]>

> I'd like to compare the accuracy, precision and recall of various vector
> similarity measures with regard to our data sets. Ideally, I'd like to do
> that for RecommenderJob, including CooccurrenceCount. However, I don't
> think RecommenderJob supports calculation of the performance metrics.
>
> Alternatively, I could use the evaluator logic in the non-Hadoop-based
> item-based recommenders, but they do not seem to support the option of
> using CooccurrenceCount as a measure, or am I wrong?
>
> Reading archived conversations from here, I can see others have asked a
> similar question in 2011
> (http://comments.gmane.org/gmane.comp.apache.mahout.user/9758), but there
> seems to be no clear guidance. I am also unsure whether it is valid to
> split the data set into training/testing that way, as the testing users'
> key characteristic is the items they have preferred, and there is no
> "model" to fit them to, so to speak; they would become anonymous users if
> we stripped their preferences.
>
> Am I right in thinking that I could test RecommenderJob by feeding it X
> random preferences of a user, having hidden the remainder of their
> preferences, and seeing whether the hidden items/preferences become their
> recommendations? However, that approach would change what a user "likes"
> (by hiding their preferences for testing purposes), and I'd be concerned
> about the value of the recommendation. Am I in a loop? Is there a way to
> somehow tap into the recommendation process to get an accuracy metric out?
>
> Did anyone, perhaps, share a method or a script (R, Python, Java) for
> evaluating RecommenderJob results?
>
> Many thanks,
> Rafal Lukawiecki
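For what it's worth, the per-user holdout scheme discussed in this thread can be sketched in a few lines of Python. This is a standalone, hypothetical script, not anything shipped with Mahout: the idea is to hide a fraction of each user's interactions, feed the remainder to RecommenderJob as the training input, and then score the job's output lists against the hidden items with precision/recall at k.

```python
import random
from collections import defaultdict


def holdout_split(interactions, holdout_fraction=0.3, seed=42):
    """Split each user's interactions into training and held-out sets.

    interactions: iterable of (user, item) pairs.
    Returns (train, heldout), each a dict mapping user -> set of items.
    At least one item per user is held out.
    """
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for user, item in interactions:
        by_user[user].append(item)

    train, heldout = {}, {}
    for user, items in by_user.items():
        rng.shuffle(items)
        n_hold = max(1, int(len(items) * holdout_fraction))
        heldout[user] = set(items[:n_hold])
        train[user] = set(items[n_hold:])
    return train, heldout


def precision_recall_at_k(recommendations, heldout, k):
    """Average precision@k and recall@k over all users with held-out items.

    recommendations: dict user -> ranked list of recommended items
                     (e.g. parsed from RecommenderJob's output files).
    heldout: dict user -> set of hidden items, from holdout_split().
    """
    precisions, recalls = [], []
    for user, hidden in heldout.items():
        top_k = recommendations.get(user, [])[:k]
        hits = len(set(top_k) & hidden)
        precisions.append(hits / k)
        recalls.append(hits / len(hidden))
    n = len(heldout)
    return sum(precisions) / n, sum(recalls) / n
```

For example, if a user's hidden set is {"x"} and the recommender returns ["x", "y"], then precision@2 is 0.5 and recall is 1.0. As Sebastian notes, low scores here do not necessarily mean bad recommendations, only that the recommender failed to reproduce past behaviour.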
