This might be a question for Xiangrui. Recently I was using
BinaryClassificationMetrics to build an AUC curve for a classifier
over a reasonably large number of points (~12M). The scores were all
probabilities, so tended to be almost entirely unique.
The computation does some operations by key, which gets expensive when
nearly every score is a distinct key.
Yes, if there are many distinct values, we need binning to compute the
AUC curve. Usually the scores are not evenly distributed, so we cannot
simply truncate the digits. Estimating the quantiles for binning is
necessary, similar to RangePartitioner:
Agree, rounding only makes sense if the values are roughly evenly
distributed -- in my case they were in [0,1]. I will put it on my to-do
list to look at, yes. Thanks for the confirmation.
On Sun, Nov 2, 2014 at 7:44 PM, Xiangrui Meng men...@gmail.com wrote:
Does this happen if you clean and recompile? I've seen failures on and
off, but haven't been able to find one that I could reproduce from a
clean build such that we could hand it to the scala team.
- Patrick
On Sun, Nov 2, 2014 at 7:25 PM, Imran Rashid im...@therashids.com wrote:
I often see this when I first build the whole Spark project with SBT, then
modify some code and try to build and debug within IDEA, or vice versa. A
clean rebuild always solves this.
On Mon, Nov 3, 2014 at 11:28 AM, Patrick Wendell pwend...@gmail.com wrote:
Yes, I have seen this same error - and for team members as well - repeatedly
since June. As Patrick and Cheng mentioned, the next step is to do an sbt
clean.
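For reference, assuming the Spark source layout of that era (with its bundled `sbt/sbt` launcher script), the clean round trip looks something like this -- a sketch of the workflow being recommended, not an exact transcript:

```shell
# Wipe stale classfiles left over from a mixed SBT/IDEA build,
# then rebuild from scratch before retrying in either tool.
sbt/sbt clean
sbt/sbt compile
```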
2014-11-02 19:37 GMT-08:00 Cheng Lian lian.cs@gmail.com:
By the way - we can report issues to the Scala/Typesafe team if we
have a way to reproduce this. I just haven't found a reliable
reproduction yet.
- Patrick
On Sun, Nov 2, 2014 at 7:48 PM, Stephen Boesch java...@gmail.com wrote: