Re: sbt scala compiler crashes on spark-sql

2014-11-02 Thread Patrick Wendell
By the way - we can report issues to the Scala/Typesafe team if we
have a way to reproduce this. I just haven't found a reliable
reproduction yet.

- Patrick


Re: sbt scala compiler crashes on spark-sql

2014-11-02 Thread Stephen Boesch
Yes, I have seen this same error - and for team members as well - repeatedly
since June. As Patrick and Cheng mentioned, the next step is to do an sbt
clean.


Re: sbt scala compiler crashes on spark-sql

2014-11-02 Thread Cheng Lian
I often see this when I first build the whole Spark project with SBT, then
modify some code and try to build and debug within IDEA, or vice versa. A
clean rebuild always solves it for me.
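
For what it's worth, a minimal sketch of that clean rebuild from the sbt
console (scoped to the sql project purely as an illustration; a top-level
clean works too):

> project sql
> clean
> compile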


Re: sbt scala compiler crashes on spark-sql

2014-11-02 Thread Patrick Wendell
Does this happen if you clean and recompile? I've seen failures on and
off, but haven't been able to find one that I could reproduce from a
clean build such that we could hand it to the Scala team.

- Patrick


sbt scala compiler crashes on spark-sql

2014-11-02 Thread Imran Rashid
I'm finding that the Scala compiler crashes when I compile the spark-sql
project in sbt.  This happens in both the 1.1 branch and master (full error
below).  The other projects build fine in sbt, and everything builds fine
in maven.  Is there some sbt option I'm forgetting?  Anyone else
experiencing this?

Also, are there up-to-date instructions on how to do common dev tasks in
both sbt & maven?  I have only found these instructions on building with
maven:

http://spark.apache.org/docs/latest/building-with-maven.html

and some general info here:

https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

but these don't walk through a lot of the steps of a typical dev
cycle, e.g., continuous compilation, running one test, running one main
class, etc.  (Especially since it seems like people still favor sbt for
dev.)  If it doesn't already exist somewhere, I could try to put together a
brief doc for how to do the basics -- a rough sketch of the sbt tasks I
mean is below.  (I'm returning to Spark dev after a
little hiatus myself, and I'm hitting some stumbling blocks that are
probably common knowledge to everyone still dealing with it all the time.)
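
Concretely, assuming sbt 0.13, and with the suite and class names below as
made-up placeholders:

> project sql
> ~compile
> test-only org.apache.spark.sql.SomeSuite
> run-main org.apache.spark.sql.SomeMain

where ~compile recompiles on every source change, test-only runs a single
test suite, and run-main runs a single main class.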

thanks,
Imran

--
full crash info from sbt:

> project sql
[info] Set current project to spark-sql (in build
file:/Users/imran/spark/spark/)
> compile
[info] Compiling 62 Scala sources to
/Users/imran/spark/spark/sql/catalyst/target/scala-2.10/classes...
[info] Compiling 45 Scala sources and 39 Java sources to
/Users/imran/spark/spark/sql/core/target/scala-2.10/classes...
[error]
[error]  while compiling:
/Users/imran/spark/spark/sql/core/src/main/scala/org/apache/spark/sql/types/util/DataTypeConversions.scala
[error] during phase: jvm
[error]  library version: version 2.10.4
[error] compiler version: version 2.10.4
[error]   reconstructed args: -classpath
/Users/imran/spark/spark/sql/core/target/scala-2.10/classes:/Users/imran/spark/spark/core/target/scala-2.10/classes:/Users/imran/spark/spark/sql/catalyst/target/scala-2.10/classes:/Users/imran/spark/spark/lib_managed/jars/hadoop-client-1.0.4.jar:/Users/imran/spark/spark/lib_managed/jars/hadoop-core-1.0.4.jar:/Users/imran/spark/spark/lib_managed/jars/xmlenc-0.52.jar:/Users/imran/spark/spark/lib_managed/jars/commons-math-2.1.jar:/Users/imran/spark/spark/lib_managed/jars/commons-configuration-1.6.jar:/Users/imran/spark/spark/lib_managed/jars/commons-collections-3.2.1.jar:/Users/imran/spark/spark/lib_managed/jars/commons-lang-2.4.jar:/Users/imran/spark/spark/lib_managed/jars/commons-logging-1.1.1.jar:/Users/imran/spark/spark/lib_managed/jars/commons-digester-1.8.jar:/Users/imran/spark/spark/lib_managed/jars/commons-beanutils-1.7.0.jar:/Users/imran/spark/spark/lib_managed/jars/commons-beanutils-core-1.8.0.jar:/Users/imran/spark/spark/lib_managed/jars/commons-net-2.2.jar:/Users/imran/spark/spark/lib_managed/jars/commons-el-1.0.jar:/Users/imran/spark/spark/lib_managed/jars/hsqldb-1.8.0.10.jar:/Users/imran/spark/spark/lib_managed/jars/oro-2.0.8.jar:/Users/imran/spark/spark/lib_managed/jars/jets3t-0.7.1.jar:/Users/imran/spark/spark/lib_managed/jars/commons-httpclient-3.1.jar:/Users/imran/spark/spark/lib_managed/bundles/curator-recipes-2.4.0.jar:/Users/imran/spark/spark/lib_managed/bundles/curator-framework-2.4.0.jar:/Users/imran/spark/spark/lib_managed/bundles/curator-client-2.4.0.jar:/Users/imran/spark/spark/lib_managed/jars/zookeeper-3.4.5.jar:/Users/imran/spark/spark/lib_managed/jars/slf4j-log4j12-1.7.5.jar:/Users/imran/spark/spark/lib_managed/bundles/log4j-1.2.17.jar:/Users/imran/spark/spark/lib_managed/jars/jline-0.9.94.jar:/Users/imran/spark/spark/lib_managed/bundles/guava-14.0.1.jar:/Users/imran/spark/spark/lib_managed/jars/jetty-plus-8.1.14.v20131031.jar:/Users/imran/spark/spark/lib_managed/orbits/javax.transaction-1.1.1.v201105210645.jar:/Users/imran/spark/spark/lib_managed/jars/jetty-webapp-8.1.14.v20131031.jar:/Users/imran/spark/spark/lib_managed/jars/jetty-xml-8.1.14.v20131031.jar:/Users/imran/spark/spark/lib_managed/jars/jetty-util-8.1.14.v20131031.jar:/Users/imran/spark/spark/lib_managed/jars/jetty-servlet-8.1.14.v20131031.jar:/Users/imran/spark/spark/lib_managed/jars/jetty-security-8.1.14.v20131031.jar:/Users/imran/spark/spark/lib_managed/jars/jetty-server-8.1.14.v20131031.jar:/Users/imran/spark/spark/lib_managed/orbits/javax.servlet-3.0.0.v201112011016.jar:/Users/imran/spark/spark/lib_managed/jars/jetty-continuation-8.1.14.v20131031.jar:/Users/imran/spark/spark/lib_managed/jars/jetty-http-8.1.14.v20131031.jar:/Users/imran/spark/spark/lib_managed/jars/jetty-io-8.1.14.v20131031.jar:/Users/imran/spark/spark/lib_managed/jars/jetty-jndi-8.1.14.v20131031.jar:/Users/imran/spark/spark/lib_managed/orbits/javax.mail.glassfish-1.4.1.v201005082020.jar:/Users/imran/spark/spark/lib_managed/orbits/javax.activation-1.1.0.v201105071233.jar:/Users/imran/spark/spark/lib_managed/jars/commons-lang3-3.3.2.jar:/Users/imran/spark/spark/lib_managed/jars/jsr305-1.3.9.jar:/Users/imran/spark/spark/lib_managed/jars/slf

Re: OOM when making bins in BinaryClassificationMetrics ?

2014-11-02 Thread Sean Owen
Agreed -- rounding only makes sense if the values are roughly evenly
distributed; in my case they were in [0,1]. I will put it on my to-do
list to look at, yes. Thanks for the confirmation.


Re: OOM when making bins in BinaryClassificationMetrics ?

2014-11-02 Thread Xiangrui Meng
Yes, if there are many distinct values, we need binning to compute the
AUC curve. Usually the scores are not evenly distributed, so we cannot
simply truncate the digits. Estimating the quantiles for binning is
necessary, similar to RangePartitioner:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/Partitioner.scala#L104
Limiting the number of bins is definitely useful. Do you have time
to work on it?
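
To make the idea concrete, here is a rough sketch of sample-based quantile
binning -- not Spark's actual implementation; numBins, the 0.01 sampling
fraction, and the helper names are all made up for illustration:

import org.apache.spark.rdd.RDD

def quantileBin(scoreAndLabels: RDD[(Double, Double)],
                numBins: Int): RDD[(Double, Double)] = {
  // Sample scores to estimate bin boundaries, in the spirit of
  // RangePartitioner's sampled range bounds.
  val sampled = scoreAndLabels.map(_._1)
    .sample(withReplacement = false, fraction = 0.01)
    .collect().sorted
  val step = math.max(1, sampled.length / numBins)
  // Take every step-th sampled score as a bin lower bound.
  val bounds = sampled.grouped(step).map(_.head).toArray
  // Map each score to the lower bound of its bin.
  val binOf = (s: Double) => {
    val i = java.util.Arrays.binarySearch(bounds, s)
    bounds(math.max(if (i >= 0) i else -i - 2, 0))
  }
  scoreAndLabels.map { case (s, l) => (binOf(s), l) }
}

The binned (score, label) pairs could then be fed to
BinaryClassificationMetrics as usual.

-Xiangrui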


OOM when making bins in BinaryClassificationMetrics ?

2014-11-02 Thread Sean Owen
This might be a question for Xiangrui. Recently I was using
BinaryClassificationMetrics to build an AUC curve for a classifier
over a reasonably large number of points (~12M). The scores were all
probabilities, so they tended to be almost entirely unique.

The computation does some operations by key, and this ran out of
memory. It's something you can solve with more than the default amount
of memory, but in this case it seemed not useful to create an AUC curve
with such fine-grained resolution.

I ended up just binning the scores so there were ~1000 unique values
and then it was fine.
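
Roughly what I did -- a minimal sketch, assuming the scores are in [0,1]
and that scoreAndLabels is the RDD[(Double, Double)] of (score, label)
pairs; the variable names are made up:

import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

// Round each probability to 3 decimal places, leaving at most ~1001
// distinct score values.
val binned = scoreAndLabels.map { case (score, label) =>
  (math.rint(score * 1000) / 1000.0, label)
}
val metrics = new BinaryClassificationMetrics(binned)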

Does that sound generally useful as some kind of parameter? Or am I
missing a trick here?

Sean
