Thanks for the clarification and the relevant links. I overlooked the
comments explicitly saying that the relevance is binary.
I understand that the label is not a relevance score, but I have been using
it as one, and I think many people do the same in the implicit feedback
context, where an exact user-provided label is not defined anyway. I think
that's why RiVal <https://github.com/recommenders/rival> uses the term
"preference" for both the label for MAE and the relevance for NDCG.
At the same time, I see why Spark decided to assume the relevance is
binary, in part to conform to the constructor of the RankingMetrics class. I
think it would be nice if the upcoming DataFrame-based RankingEvaluator could
optionally take a "relevance column" holding non-binary relevance values,
defaulting otherwise to either 1.0 or the label column.
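To make the difference concrete, here is a minimal plain-Python sketch of
NDCG@k with a switch between graded gain (2^relevance - 1) and binary gain
(1 for any relevant item). It is only an illustration under my own
assumptions, not Spark's or RiVal's code: the function names, the dict
standing in for a "relevance column", and the sample data are all
hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain of per-position gains (rank 1 first)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranked_items, relevance, k, graded=True):
    """NDCG@k for a single user.

    ranked_items: predicted item ids, best first.
    relevance: dict of item id -> relevance value (a hypothetical stand-in
               for a "relevance column"); missing items count as irrelevant.
    graded: True uses gain 2^rel - 1, False uses gain 1 for any relevant item.
    """
    def gain(rel):
        if graded:
            return 2.0 ** rel - 1.0
        return 1.0 if rel > 0 else 0.0

    # Gains of the items actually recommended, in recommended order.
    actual = [gain(relevance.get(item, 0.0)) for item in ranked_items[:k]]
    # Ideal ordering: all known relevances sorted by gain, best first.
    ideal = sorted((gain(r) for r in relevance.values()), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(actual) / ideal_dcg if ideal_dcg > 0 else 0.0


# Made-up example: the most relevant item "a" is ranked last.
relevance = {"a": 3.0, "b": 1.0, "c": 2.0}
ranking = ["b", "c", "a"]

print("graded NDCG@3:", ndcg_at_k(ranking, relevance, k=3, graded=True))
print("binary NDCG@3:", ndcg_at_k(ranking, relevance, k=3, graded=False))
```

With binary gain this ranking is scored as perfect, because every
recommended item is relevant. With graded gain it is penalised for putting
the most relevant item last, which is the kind of gap I was seeing.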
My extended version of RankingMetrics is here:
https://github.com/jongwook/spark-ranking-metrics . It has a test case
checking that the numbers are the same as RiVal's.
On 19 September 2016 at 03:13, Sean Owen <so...@cloudera.com> wrote:
> Yes, relevance is always 1. The label is not a relevance score, so I
> don't think it's valid to use it as such.
> On Mon, Sep 19, 2016 at 4:42 AM, Jong Wook Kim <jongw...@nyu.edu> wrote:
> > Hi,
> > I'm trying to evaluate a recommendation model, and found that Spark and
> > Rival give different results, and it seems that Rival's one is what
> > defines: https://gist.github.com/jongwook/5d4e78290eaef22cb69abbf68b52e5
> > Am I using RankingMetrics in a wrong way, or is Spark's implementation
> > incorrect?
> > To my knowledge, NDCG should be dependent on the relevance (or
> > values, but Spark's implementation seems not; it uses 1.0 where it should be
> > 2^(relevance) - 1, probably assuming that relevance is all 1.0? I also
> > tweaking, but its method to obtain the ideal DCG also seems wrong.
> > Any feedback from MLlib developers would be appreciated. I made a
> > modified/extended version of RankingMetrics that produces the identical
> > numbers to Kaggle and Rival's results, and I'm wondering if it is
> > appropriate to be added back to MLlib.
> > Jong Wook