OOM when making bins in BinaryClassificationMetrics ?

2014-11-02 Thread Sean Owen
This might be a question for Xiangrui. Recently I was using BinaryClassificationMetrics to build an AUC curve for a classifier over a reasonably large number of points (~12M). The scores were all probabilities, so tended to be almost entirely unique. The computation does some operations by key,

Re: OOM when making bins in BinaryClassificationMetrics ?

2014-11-02 Thread Xiangrui Meng
Yes, if there are many distinct values, we need binning to compute the AUC curve. Usually, the scores are not evenly distributed, so we cannot simply truncate the digits. Estimating the quantiles for binning is necessary, similar to RangePartitioner:

Re: OOM when making bins in BinaryClassificationMetrics ?

2014-11-02 Thread Sean Owen
Agree, just rounding only makes sense if the values are sort of evenly distributed -- in my case they were in [0,1]. I will put it on my to-do list to look at, yes. Thanks for the confirmation. On Sun, Nov 2, 2014 at 7:44 PM, Xiangrui Meng men...@gmail.com wrote: Yes, if there are many distinct
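
To illustrate the binning idea discussed in this thread, here is a minimal sketch in plain Python. It is not the Spark implementation and does not use any Spark API. The function names (`quantile_boundaries`, `binned_auc`) and the parameters (`num_bins`, `sample_size`, `seed`) are assumptions made up for this example. It estimates bin boundaries from a random sample of the scores, which is only loosely in the spirit of the sampling approach mentioned above, then counts positives and negatives per bin and integrates the ROC curve with the trapezoid rule.

```python
import bisect
import random


def quantile_boundaries(scores, num_bins, sample_size=10_000, seed=0):
    """Estimate bin cut points from a random sample of the scores.

    Hypothetical helper: returns at most num_bins - 1 distinct cut points
    taken at evenly spaced ranks of the sorted sample.
    """
    rng = random.Random(seed)
    sample = sorted(rng.sample(scores, min(sample_size, len(scores))))
    cuts = [sample[(i * len(sample)) // num_bins] for i in range(1, num_bins)]
    return sorted(set(cuts))


def binned_auc(scores, labels, num_bins=100):
    """Approximate the area under the ROC curve from per-bin counts.

    Memory use depends on num_bins, not on the number of distinct scores.
    """
    bounds = quantile_boundaries(scores, num_bins)
    pos = [0] * (len(bounds) + 1)
    neg = [0] * (len(bounds) + 1)
    for score, label in zip(scores, labels):
        b = bisect.bisect_right(bounds, score)  # index of the bin for this score
        if label:
            pos[b] += 1
        else:
            neg[b] += 1

    total_pos, total_neg = sum(pos), sum(neg)
    if total_pos == 0 or total_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Sweep bins from the highest scores to the lowest, accumulating
    # true/false positive counts and adding one trapezoid per bin.
    area = 0.0
    tp = fp = 0
    for p, n in zip(reversed(pos), reversed(neg)):
        new_tp, new_fp = tp + p, fp + n
        area += (new_fp - fp) * (tp + new_tp) / 2.0
        tp, fp = new_tp, new_fp
    return area / (total_pos * total_neg)
```

With `num_bins` much smaller than the number of points, examples that fall in the same bin are treated as ties, so the result is an approximation whose accuracy depends on how well the sampled boundaries follow the score distribution.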

Re: sbt scala compiler crashes on spark-sql

2014-11-02 Thread Patrick Wendell
Does this happen if you clean and recompile? I've seen failures on and off, but haven't been able to find one that I could reproduce from a clean build such that we could hand it to the scala team. - Patrick On Sun, Nov 2, 2014 at 7:25 PM, Imran Rashid im...@therashids.com wrote: I'm finding

Re: sbt scala compiler crashes on spark-sql

2014-11-02 Thread Cheng Lian
I often see this when I first build the whole Spark project with SBT, then modify some code and try to build and debug within IDEA, or vice versa. A clean rebuild can always solve this. On Mon, Nov 3, 2014 at 11:28 AM, Patrick Wendell pwend...@gmail.com wrote: Does this happen if you clean

Re: sbt scala compiler crashes on spark-sql

2014-11-02 Thread Stephen Boesch
Yes, I have seen this same error - and for team members as well - repeatedly since June. As Patrick and Cheng mentioned, the next step is to do an sbt clean. 2014-11-02 19:37 GMT-08:00 Cheng Lian lian.cs@gmail.com: I often see this when I first build the whole Spark project with SBT, then

Re: sbt scala compiler crashes on spark-sql

2014-11-02 Thread Patrick Wendell
By the way - we can report issues to the Scala/Typesafe team if we have a way to reproduce this. I just haven't found a reliable reproduction yet. - Patrick On Sun, Nov 2, 2014 at 7:48 PM, Stephen Boesch java...@gmail.com wrote: Yes I have seen this same error - and for team members as well -