Hi Jong,

I think the definition from Kaggle is correct. I'm working on
implementing ranking metrics in Spark ML now, but the timeline is
unknown. Feel free to submit a PR for this in MLlib.



DB Tsai
Web: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D

On Sun, Sep 18, 2016 at 8:42 PM, Jong Wook Kim <jongw...@nyu.edu> wrote:
> Hi,
> I'm trying to evaluate a recommendation model, and found that Spark and
> Rival give different results, and it seems that Rival's one is what Kaggle
> defines: https://gist.github.com/jongwook/5d4e78290eaef22cb69abbf68b52e597
> Am I using RankingMetrics the wrong way, or is Spark's implementation
> incorrect?
> To my knowledge, NDCG should depend on the relevance (or preference)
> values, but Spark's implementation seems not to; it uses a gain of 1.0 where
> it should be 2^(relevance) - 1, probably assuming that all relevances are
> 1.0. I also tried tweaking it, but its method of obtaining the ideal DCG
> also seems wrong.
> Any feedback from MLlib developers would be appreciated. I made a
> modified/extended version of RankingMetrics that produces numbers identical
> to Kaggle's and Rival's results, and I'm wondering whether it would be
> appropriate to contribute back to MLlib.
> Jong Wook
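For reference, the exponential-gain NDCG that Jong Wook describes (the
Kaggle/Rival convention) can be sketched as below. This is a minimal
illustration, not Spark's actual code; the function names are made up for the
example. The key differences from a binary formulation are the gain term
2^(relevance) - 1 and the ideal DCG being computed from all known relevances
sorted in descending order:

```python
import math

def dcg_at_k(relevances, k):
    # DCG with exponential gain: gain = 2^rel - 1,
    # discount = log2(rank + 1) with 1-based ranks.
    # With all relevances equal to 1, this reduces to the binary form,
    # since 2^1 - 1 = 1.
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(predicted_relevances, all_relevances, k):
    # predicted_relevances: true relevance of each recommended item,
    #                       in the order the model ranked them.
    # all_relevances:       every known relevance for this user; the
    #                       ideal DCG sorts these descending, rather
    #                       than reusing only the predicted items' order.
    ideal = dcg_at_k(sorted(all_relevances, reverse=True), k)
    return dcg_at_k(predicted_relevances, k) / ideal if ideal > 0 else 0.0
```

A ranking that already lists items in descending-relevance order scores 1.0,
while any inversion lowers the score; an implementation that fixes the gain at
1.0 instead collapses graded relevances to a hit/miss metric.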
