I'm trying to evaluate a recommendation model, and found that Spark and
Rival <http://dl.acm.org/citation.cfm?id=2645712> give different results;
Rival's numbers match the definition of NDCG that Kaggle uses.
Am I using RankingMetrics in the wrong way, or is Spark's implementation incorrect?
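For reference, here is roughly how I am computing it (a minimal sketch
with made-up data; `sc` is an existing SparkContext):

    import org.apache.spark.mllib.evaluation.RankingMetrics

    // Each record pairs the recommended item IDs (in ranked order)
    // with the ground-truth relevant item IDs.
    val predictionAndLabels = sc.parallelize(Seq(
      (Array(1, 2, 3, 4, 5), Array(1, 3, 6)),
      (Array(4, 1, 5, 6, 2), Array(2, 4))
    ))

    val metrics = new RankingMetrics(predictionAndLabels)
    println(metrics.ndcgAt(5))

As far as I can tell, there is no place to supply relevance values
through this API, either.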
To my knowledge, NDCG should depend on the relevance (or preference)
values, but Spark's implementation does not seem to: it uses a gain of
1.0 where the standard formula uses 2^relevance - 1, probably assuming
that every relevance value is 1.0. I also tried tweaking it, but its
method for obtaining the ideal DCG also seems wrong.
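For comparison, this is a minimal sketch of the graded-relevance NDCG
as Kaggle and Rival compute it (the `dcg`/`ndcgAt` names and the
Map-based relevance lookup are mine, not MLlib's):

    // DCG with the standard gain 2^relevance - 1 and log2 discount.
    def dcg(gains: Seq[Double]): Double =
      gains.zipWithIndex.map { case (g, i) =>
        (math.pow(2, g) - 1) / (math.log(i + 2) / math.log(2))
      }.sum

    // `ranked` is the model's item ordering; `relevance` maps item ID
    // to its preference value.
    def ndcgAt(ranked: Seq[Int], relevance: Map[Int, Double], k: Int): Double = {
      val actual = dcg(ranked.take(k).map(relevance.getOrElse(_, 0.0)))
      // Ideal DCG: the k largest relevance values in descending order,
      // not simply k gains of 1.0.
      val ideal = dcg(relevance.values.toSeq.sortBy(-_).take(k))
      if (ideal == 0.0) 0.0 else actual / ideal
    }

With every relevance equal to 1.0, the gain 2^1 - 1 reduces to 1.0,
which would explain why the two implementations agree only in the
binary case.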
Any feedback from MLlib developers would be appreciated. I have made a
modified/extended version of RankingMetrics that produces numbers
identical to Kaggle's and Rival's results, and I'm wondering whether it
would be appropriate to contribute it back to MLlib.