Re: sbt scala compiler crashes on spark-sql
By the way - we can report issues to the Scala/Typesafe team if we have a way to reproduce this. I just haven't found a reliable reproduction yet.

- Patrick

On Sun, Nov 2, 2014 at 7:48 PM, Stephen Boesch wrote:
> Yes, I have seen this same error - and for team members as well -
> repeatedly since June. As Patrick and Cheng mentioned, the next step is
> to do an sbt clean.
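For reference, the clean rebuild that the thread converges on looks roughly like this from the top of a Spark checkout (a sketch: `sbt/sbt` was the launcher script bundled with Spark at the time, and the project name follows the crash log in the original message):

```shell
# Remove stale .class files left over from a previous SBT or IDEA build,
# then recompile only the spark-sql project. Switching build tools without
# cleaning is what seems to trigger the crashes discussed in this thread.
sbt/sbt "project sql" clean compile
```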
Re: sbt scala compiler crashes on spark-sql
Yes, I have seen this same error - and for team members as well - repeatedly since June. As Patrick and Cheng mentioned, the next step is to do an sbt clean.

2014-11-02 19:37 GMT-08:00 Cheng Lian:
> I often see this when I first build the whole Spark project with SBT,
> then modify some code and try to build and debug within IDEA, or vice
> versa. A clean rebuild always solves this.
Re: sbt scala compiler crashes on spark-sql
I often see this when I first build the whole Spark project with SBT, then modify some code and try to build and debug within IDEA, or vice versa. A clean rebuild always solves this.

On Mon, Nov 3, 2014 at 11:28 AM, Patrick Wendell wrote:
> Does this happen if you clean and recompile? I've seen failures on and
> off, but haven't been able to find one that I could reproduce from a
> clean build such that we could hand it to the scala team.
Re: sbt scala compiler crashes on spark-sql
Does this happen if you clean and recompile? I've seen failures on and off, but haven't been able to find one that I could reproduce from a clean build such that we could hand it to the scala team.

- Patrick

On Sun, Nov 2, 2014 at 7:25 PM, Imran Rashid wrote:
> I'm finding the scala compiler crashes when I compile the spark-sql
> project in sbt. This happens in both the 1.1 branch and master (full
> error below). The other projects build fine in sbt, and everything
> builds fine in maven. Is there some sbt option I'm forgetting? Anyone
> else experiencing this?
sbt scala compiler crashes on spark-sql
I'm finding the scala compiler crashes when I compile the spark-sql project in sbt. This happens in both the 1.1 branch and master (full error below). The other projects build fine in sbt, and everything builds fine in maven. Is there some sbt option I'm forgetting? Anyone else experiencing this?

Also, are there up-to-date instructions on how to do common dev tasks in both sbt & maven? I have only found these instructions on building with maven:

http://spark.apache.org/docs/latest/building-with-maven.html

and some general info here:

https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

but I think this doesn't walk through a lot of the steps of a typical dev cycle, e.g. continuous compilation, running one test, running one main class, etc. (especially since it seems people still favor sbt for dev). If it doesn't already exist somewhere, I could try to put together a brief doc on how to do the basics. (I'm returning to spark dev after a little hiatus myself, and I'm hitting some stumbling blocks that are probably common knowledge to everyone still dealing with it all the time.)

thanks,
Imran

--
full crash info from sbt:

> project sql
[info] Set current project to spark-sql (in build file:/Users/imran/spark/spark/)
> compile
[info] Compiling 62 Scala sources to /Users/imran/spark/spark/sql/catalyst/target/scala-2.10/classes...
[info] Compiling 45 Scala sources and 39 Java sources to /Users/imran/spark/spark/sql/core/target/scala-2.10/classes...
[error]
[error]      while compiling: /Users/imran/spark/spark/sql/core/src/main/scala/org/apache/spark/sql/types/util/DataTypeConversions.scala
[error]         during phase: jvm
[error]      library version: version 2.10.4
[error]     compiler version: version 2.10.4
[error]   reconstructed args: -classpath /Users/imran/spark/spark/sql/core/target/scala-2.10/classes:/Users/imran/spark/spark/core/target/scala-2.10/classes:/Users/imran/spark/spark/sql/catalyst/target/scala-2.10/classes:/Users/imran/spark/spark/lib_managed/jars/hadoop-client-1.0.4.jar:/Users/imran/spark/spark/lib_managed/jars/hadoop-core-1.0.4.jar:/Users/imran/spark/spark/lib_managed/jars/xmlenc-0.52.jar:/Users/imran/spark/spark/lib_managed/jars/commons-math-2.1.jar:/Users/imran/spark/spark/lib_managed/jars/commons-configuration-1.6.jar:/Users/imran/spark/spark/lib_managed/jars/commons-collections-3.2.1.jar:/Users/imran/spark/spark/lib_managed/jars/commons-lang-2.4.jar:/Users/imran/spark/spark/lib_managed/jars/commons-logging-1.1.1.jar:/Users/imran/spark/spark/lib_managed/jars/commons-digester-1.8.jar:/Users/imran/spark/spark/lib_managed/jars/commons-beanutils-1.7.0.jar:/Users/imran/spark/spark/lib_managed/jars/commons-beanutils-core-1.8.0.jar:/Users/imran/spark/spark/lib_managed/jars/commons-net-2.2.jar:/Users/imran/spark/spark/lib_managed/jars/commons-el-1.0.jar:/Users/imran/spark/spark/lib_managed/jars/hsqldb-1.8.0.10.jar:/Users/imran/spark/spark/lib_managed/jars/oro-2.0.8.jar:/Users/imran/spark/spark/lib_managed/jars/jets3t-0.7.1.jar:/Users/imran/spark/spark/lib_managed/jars/commons-httpclient-3.1.jar:/Users/imran/spark/spark/lib_managed/bundles/curator-recipes-2.4.0.jar:/Users/imran/spark/spark/lib_managed/bundles/curator-framework-2.4.0.jar:/Users/imran/spark/spark/lib_managed/bundles/curator-client-2.4.0.jar:/Users/imran/spark/spark/lib_managed/jars/zookeeper-3.4.5.jar:/Users/imran/spark/spark/lib_managed/jars/slf4j-log4j12-1.7.5.jar:/Users/imran/spark/spark/lib_managed/bundles/log4j-1.2.17.jar:/Users/imran/spark/spark/lib_managed/jars/jline-0.9.94.jar:/Users/imran/spark/spark/lib_managed/bundles/guava-14.0.1.jar:/Users/imran/spark/spark/lib_managed/jars/jetty-plus-8.1.14.v20131031.jar:/Users/imran/spark/spark/lib_managed/orbits/javax.transaction-1.1.1.v201105210645.jar:/Users/imran/spark/spark/lib_managed/jars/jetty-webapp-8.1.14.v20131031.jar:/Users/imran/spark/spark/lib_managed/jars/jetty-xml-8.1.14.v20131031.jar:/Users/imran/spark/spark/lib_managed/jars/jetty-util-8.1.14.v20131031.jar:/Users/imran/spark/spark/lib_managed/jars/jetty-servlet-8.1.14.v20131031.jar:/Users/imran/spark/spark/lib_managed/jars/jetty-security-8.1.14.v20131031.jar:/Users/imran/spark/spark/lib_managed/jars/jetty-server-8.1.14.v20131031.jar:/Users/imran/spark/spark/lib_managed/orbits/javax.servlet-3.0.0.v201112011016.jar:/Users/imran/spark/spark/lib_managed/jars/jetty-continuation-8.1.14.v20131031.jar:/Users/imran/spark/spark/lib_managed/jars/jetty-http-8.1.14.v20131031.jar:/Users/imran/spark/spark/lib_managed/jars/jetty-io-8.1.14.v20131031.jar:/Users/imran/spark/spark/lib_managed/jars/jetty-jndi-8.1.14.v20131031.jar:/Users/imran/spark/spark/lib_managed/orbits/javax.mail.glassfish-1.4.1.v201005082020.jar:/Users/imran/spark/spark/lib_managed/orbits/javax.activation-1.1.0.v201105071233.jar:/Users/imran/spark/spark/lib_managed/jars/commons-lang3-3.3.2.jar:/Users/imran/spark/spark/lib_managed/jars/jsr305-1.3.9.jar:/Users/imran/spark/spark/lib_managed/jars/slf
Re: OOM when making bins in BinaryClassificationMetrics ?
Agree - rounding alone only makes sense if the values are more or less evenly distributed; in my case they were in [0, 1]. I will put it on my to-do list to look at, yes. Thanks for the confirmation.

On Sun, Nov 2, 2014 at 7:44 PM, Xiangrui Meng wrote:
> Yes, if there are many distinct values, we need binning to compute the
> AUC curve. Usually the scores are not evenly distributed, so we cannot
> simply truncate the digits. Estimating the quantiles for binning is
> necessary, similar to RangePartitioner:
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/Partitioner.scala#L104
> Limiting the number of bins is definitely useful. Do you have time to
> work on it? -Xiangrui
Re: OOM when making bins in BinaryClassificationMetrics ?
Yes, if there are many distinct values, we need binning to compute the AUC curve. Usually the scores are not evenly distributed, so we cannot simply truncate the digits. Estimating the quantiles for binning is necessary, similar to RangePartitioner:

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/Partitioner.scala#L104

Limiting the number of bins is definitely useful. Do you have time to work on it?

-Xiangrui

On Sun, Nov 2, 2014 at 9:34 AM, Sean Owen wrote:
> This might be a question for Xiangrui. Recently I was using
> BinaryClassificationMetrics to build an AUC curve for a classifier
> over a reasonably large number of points (~12M). The scores were all
> probabilities, so tended to be almost entirely unique.
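Xiangrui's suggestion - estimate quantiles from a sample and use them as bin boundaries, much as RangePartitioner picks range bounds from a sample - could be sketched in plain Scala like this (illustrative names only, not the eventual Spark API):

```scala
import scala.util.Random

object QuantileBins {
  // Pick (numBins - 1) bin boundaries as evenly spaced order statistics
  // of a sorted (and, for large inputs, sampled) copy of the scores.
  def boundaries(scores: Seq[Double], numBins: Int, sampleSize: Int = 10000): Array[Double] = {
    val rng = new Random(42)
    val sample: Array[Double] =
      (if (scores.length <= sampleSize) scores
       else Seq.fill(sampleSize)(scores(rng.nextInt(scores.length)))).sorted.toArray
    (1 until numBins).map(i => sample(i * sample.length / numBins)).toArray
  }

  // Map a score to its bin index by binary search over the boundaries;
  // a negative result encodes the insertion point as -(point) - 1.
  def binOf(score: Double, bounds: Array[Double]): Int = {
    val i = java.util.Arrays.binarySearch(bounds, score)
    if (i >= 0) i else -i - 1
  }

  def main(args: Array[String]): Unit = {
    val scores = (0 until 1000).map(_ / 1000.0)
    val bounds = boundaries(scores, 4)
    println(bounds.mkString(", "))   // quartile boundaries of the sample
  }
}
```

Unlike fixed-width rounding, this keeps the bins equally populated even when the scores are heavily skewed, which is exactly the case Xiangrui raises.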
OOM when making bins in BinaryClassificationMetrics ?
This might be a question for Xiangrui. Recently I was using BinaryClassificationMetrics to build an AUC curve for a classifier over a reasonably large number of points (~12M). The scores were all probabilities, so tended to be almost entirely unique.

The computation does some operations by key, and this ran out of memory. It's something you can solve with more than the default amount of memory, but in this case it seemed unhelpful to create an AUC curve with such fine-grained resolution.

I ended up just binning the scores so there were ~1000 unique values, and then it was fine.

Does that sound generally useful as some kind of parameter? Or am I missing a trick here?

Sean

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
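The workaround described above - binning scores down to ~1000 unique values before the by-key aggregation - can be sketched in plain Scala. The object and method names are illustrative, not the MLlib API:

```scala
object ScoreRounding {
  // Round a probability in [0, 1] to 3 decimal places, so the by-key
  // step sees at most 1001 distinct score keys instead of millions.
  // Only reasonable when scores are bounded and reasonably spread out,
  // as noted in the replies above.
  def bin(score: Double): Double = math.rint(score * 1000) / 1000

  def main(args: Array[String]): Unit = {
    val scores = Seq(0.1234567, 0.1230001, 0.9996)
    // Nearby scores collapse into the same bin after rounding.
    println(scores.map(bin).mkString(", "))
  }
}
```

Applied to ~12M mostly unique probabilities, this caps the key cardinality and avoids the OOM without changing the shape of the AUC curve much.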