We do not use these for recommenders. Precision is often low in absolute terms even when the lift in your KPI, such as sales, is relatively high. This is not like classification.
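To make the difference from classification concrete, here is a minimal sketch for a single user. The catalog size, item ids, and the `confusion_counts` helper are all hypothetical, chosen only for illustration. It counts every item that was neither recommended nor converted on as a true negative, which is one common way to read the question.

```python
def confusion_counts(recommended, converted, catalog_size):
    """Confusion-matrix counts for one user's recommendation list.

    Assumption: every catalog item that was neither recommended nor
    converted on counts as a true negative.
    """
    recommended = set(recommended)
    converted = set(converted)
    tp = len(recommended & converted)   # recommended and converted
    fp = len(recommended - converted)   # recommended, not converted
    fn = len(converted - recommended)   # converted, not recommended
    tn = catalog_size - tp - fp - fn    # everything else in the catalog
    return tp, fp, fn, tn


# Hypothetical numbers: 10,000 items, 10 recs shown, 3 test conversions.
catalog_size = 10_000
recommended = list(range(10))      # item ids 0..9
converted = [3, 500, 7_000]        # only item 3 was recommended

tp, fp, fn, tn = confusion_counts(recommended, converted, catalog_size)
precision = tp / (tp + fp)
false_positive_rate = fp / (fp + tn)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision={precision:.3f} FPR={false_positive_rate:.5f}")
```

Under these assumptions almost the whole catalog lands in the true-negative cell, so the false positive rate stays close to zero however the short list is ordered, while precision on that list looks low by classification standards.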
We use MAP@k with increasing values of k. This should yield a chart in which mean average precision diminishes as k increases. It tells you two things:

1) You are guessing in the right order. MAP@1 greater than MAP@2 means your first guess is better than your second, and the rate of decrease tells you how fast precision drops off at higher k.
2) It gives you a baseline MAP@k for future comparisons, whether you are tuning your engine or running champion/challenger comparisons before putting anything into A/B tests.

Also note that RMSE has been pretty much discarded as an offline metric for recommenders. It only really gives you a metric for predicted ratings, and who cares about that? No one wants to optimize rating guesses anymore. Conversions are all that matter, and precision is the way to measure potential conversion, since it measures how precise our guesses are about what the user actually converted on in the test set.

Ranking is the next most important thing: you have a limited number of recommendations to show, so you want the best ranked first. MAP@k over a range of k captures this, but clients often try to read sales lift into it, and there is no absolute relationship. You can estimate one once you have A/B test results. You should also compare against non-recommender baselines such as random recs or popular recs. If MAP is lower than or close to these, you may not have a good recommender or good data.

AUC is not for every task. In this case the only positive is a conversion in the test data, the only negative is the absence of a conversion, and the ROC curve will be nearly useless.

From: Nasos Papageorgiou <[email protected]>
Reply: [email protected] <[email protected]>
Date: June 12, 2018 at 7:17:04 AM
To: [email protected] <[email protected]>
Subject: True Negative - ROC Curve

Hi all,

I want to use the ROC curve (AUC - Area Under the Curve) to evaluate a recommender system for a retailer. Could you please give an example of a True Negative value? i.e.
True Positive is the number of items on the recommended list that appear in the test data set, where the test data set may be 20% of the full data.

Thank you.
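For reference, the MAP@k evaluation described in the reply can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation of any particular engine: the function names, the toy users and items, and the "popular" baseline lists are hypothetical, and AP@k is normalized by min(k, number of test conversions), which is one common convention among several. Other normalizations change the shape of the curve.

```python
def average_precision_at_k(ranked, relevant, k):
    """AP@k for one user.

    ranked:   recommended item ids, best first (assumed free of duplicates)
    relevant: items the user converted on in the held-out test set
    Assumption: normalize by min(k, len(relevant)).
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    for position, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / position   # precision at this hit position
    return score / min(k, len(relevant))


def map_at_k(recs_by_user, test_by_user, k):
    """Mean of AP@k over users with at least one test conversion."""
    users = [u for u, items in test_by_user.items() if items]
    if not users:
        return 0.0
    total = sum(
        average_precision_at_k(recs_by_user.get(u, []), test_by_user[u], k)
        for u in users
    )
    return total / len(users)


# Hypothetical toy data: held-out conversions and two sets of ranked lists.
test_conversions = {"u1": {"a", "c"}, "u2": {"b"}}
engine_recs = {"u1": ["a", "b", "c"], "u2": ["b", "a", "c"]}
popular_recs = {"u1": ["c", "b", "a"], "u2": ["c", "b", "a"]}  # same list for all

for k in (1, 2, 3):
    engine = map_at_k(engine_recs, test_conversions, k)
    popular = map_at_k(popular_recs, test_conversions, k)
    print(f"k={k}  engine MAP@k={engine:.3f}  popular MAP@k={popular:.3f}")
```

Plotting the engine column against k gives the chart discussed in the reply, and the popular column is the kind of non-recommender baseline it suggests comparing against.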
