On Wed, Oct 9, 2013 at 12:54 PM, Michael Sokolov <[email protected]> wrote:
>> BTW, lest we forget, this does not imply the Solr-recommender is better
>> than Myrrix or the Mahout-only recommenders. There needs to be some
>> careful comparison of results. Michael, did you do offline or A/B tests
>> during your implementation?
>
> I ran some offline tests using our historical data, but I don't have a lot
> of faith in these beyond the fact that they indicate we didn't make any
> obvious implementation errors. We haven't attempted A/B testing yet since
> our site is so new, and we need to get a meaningful baseline going and
> sort out a lot of other more pressing issues on the site - recommendations
> are only one piece, albeit an important one.
>
> Actually, there was an interesting article posted recently about the
> difficulty of comparing results across systems in this field:
> http://www.docear.org/2013/09/23/research-paper-recommender-system-evaluation-a-quantitative-literature-survey/
> but that's no excuse not to do better. I'll certainly share when I know
> more :)

I tend to be a pessimist with regard to off-line evaluation. It is fine to
do, but if a system is anywhere near best, I think it should be considered
for A/B testing.
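For what it's worth, the kind of offline check described above can be sketched roughly as follows. This is a minimal, hypothetical example: it assumes a leave-out evaluation where each user's held-out historical interactions are scored against the recommender's top-k list with precision@k; the data and function names are illustrative, not from any of the systems mentioned.

```python
# Sketch of an offline hold-out evaluation for a recommender.
# All names and data here are hypothetical; a real test would replay
# held-out historical interactions against the actual recommender.

def precision_at_k(recommended, held_out, k=5):
    """Fraction of the top-k recommendations the user actually interacted with."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in held_out)
    return hits / k

# Toy data: per-user ranked recommendations and held-out interactions.
recommendations = {
    "u1": ["a", "b", "c", "d", "e"],
    "u2": ["c", "a", "e", "b", "d"],
}
held_out = {
    "u1": {"a", "d"},   # u1 later touched items a and d -> 2/5 hits
    "u2": {"x"},        # u2 touched an item we never recommended -> 0/5
}

scores = [precision_at_k(recommendations[u], held_out[u]) for u in recommendations]
mean_precision = sum(scores) / len(scores)
print(f"mean precision@5 = {mean_precision:.2f}")  # -> 0.20
```

As the thread notes, a decent score here mainly rules out gross implementation errors; it says little about which system would actually win an A/B test.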
