I'm trying to evaluate a recommendation model, and I found that Spark and
Rival <http://dl.acm.org/citation.cfm?id=2645712> give different NDCG
results; Rival's number matches the definition Kaggle uses.

Am I using RankingMetrics in a wrong way, or is Spark's implementation
incorrect?

To my knowledge, NDCG should depend on the relevance (or preference)
values, but Spark's implementation does not seem to; it uses a gain of 1.0
where it should be 2^(relevance) - 1, probably assuming that all relevance
values are 1.0. I also tried tweaking it, but its method of computing the
ideal DCG also seems wrong.
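To make the difference concrete, here is a minimal sketch of the graded
NDCG as I understand the Kaggle/Rival definition: gain 2^rel - 1 with a
log2(rank + 1) discount, normalized by the DCG of the relevance-sorted
ideal ranking. The function names are mine, not Spark's API; this is only
to illustrate why binarizing relevance to 1.0 changes the score.

```python
import math

def dcg(relevances):
    # Graded gain 2^rel - 1, discounted by log2(rank + 1) (rank is 1-based).
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances))

def ndcg(ranked_relevances):
    # Ideal DCG comes from the same relevances sorted in descending order.
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0

# With graded relevances, a mis-ordered ranking is penalized:
graded = ndcg([3.0, 1.0, 2.0])   # < 1.0, since the ideal order is [3, 2, 1]
# Collapsing every relevance to 1.0 hides that penalty entirely:
binary = ndcg([1.0, 1.0, 1.0])   # 1.0
```

With gains fixed at 1.0, any permutation of the same retrieved items scores
identically, which is consistent with the behavior I observed in Spark.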

Any feedback from MLlib developers would be appreciated. I made a
modified/extended version of RankingMetrics that produces numbers identical
to Kaggle's and Rival's results, and I'm wondering whether it would be
appropriate to contribute back to MLlib.

Jong Wook
