DataFrame. SparkPlan / Project serialization issue: ArrayIndexOutOfBounds.

2015-08-21 Thread Eugene Morozov
Hi, I'm using Spark 1.3.1 built against Hadoop 1.0.4 and Java 1.7, and I'm trying to save my data frame to Parquet. The issue I'm stuck on looks like serialization doing a pretty weird thing: it tries to write to an empty array. The last (through stack trace) line of Spark code that leads to

Tungsten and sun.misc.Unsafe

2015-08-21 Thread Marek Kolodziej
Hello, I attended the Tungsten-related presentations at Spark Summit (by Josh Rosen) and at Big Data Scala (by Matei Zaharia). Needless to say, this project holds great promise for major performance improvements. At Josh's talk, I heard about the use of sun.misc.Unsafe as a way of achieving some

Re: [VOTE] Release Apache Spark 1.5.0 (RC1)

2015-08-21 Thread Sean Owen
Signatures, license, etc. look good. I'm getting some fairly consistent failures using Java 7 + Ubuntu 15 + -Pyarn -Phive -Phive-thriftserver -Phadoop-2.6 -- does anyone else see these? They are likely just test problems, but worth asking. Stack traces are at the end. There are currently 79

Re: DataFrame. SparkPlan / Project serialization issue: ArrayIndexOutOfBounds.

2015-08-21 Thread Reynold Xin
You've probably hit this bug: https://issues.apache.org/jira/browse/SPARK-7180 It's fixed in Spark 1.4.1+. Try setting spark.serializer.extraDebugInfo to false and see if it goes away. On Fri, Aug 21, 2015 at 3:37 AM, Eugene Morozov evgeny.a.moro...@gmail.com wrote: Hi, I'm using spark
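
A minimal sketch of the workaround suggested in the reply above, assuming the setting is applied programmatically through SparkConf before the context is created. The application name is hypothetical, and the master URL is assumed to be supplied by spark-submit.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Disable the extra serialization debug info, as suggested for SPARK-7180.
val conf = new SparkConf()
  .setAppName("parquet-save-example") // hypothetical application name
  .set("spark.serializer.extraDebugInfo", "false")

// Assumes the master is provided externally (e.g. by spark-submit).
val sc = new SparkContext(conf)
```

The same property could presumably be passed on the command line instead, as --conf spark.serializer.extraDebugInfo=false.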

RE: Dataframe aggregation with Tungsten unsafe

2015-08-21 Thread Ulanov, Alexander
I’ve made a few experiments in different settings based on the same code that you used. 1) Created two datasets in hdfs on a cluster of 5 worker nodes and copied them to local fs: val size = 1 val partitions = 10 val repetitions = 5 val data = sc.parallelize(1 to size, partitions).map(x =

Re: Tungsten and sun.misc.Unsafe

2015-08-21 Thread Marek Kolodziej
Thanks Reynold, that helps a lot. I'm glad you're involved with that Google Doc community effort. I think it's because of that doc that the JEP's wording and scope changed for the better since it originally got introduced. Marek On Fri, Aug 21, 2015 at 11:18 AM, Reynold Xin r...@databricks.com

Re: Tungsten and sun.misc.Unsafe

2015-08-21 Thread Reynold Xin
I'm actually somewhat involved with the Google Docs you linked to. I don't think Oracle will remove Unsafe in JVM 9. As you said, JEP 260 already proposes making Unsafe available. Given the widespread use of Unsafe for performance and advanced functionalities, I don't think Oracle can just remove

Re: Tungsten and sun.misc.Unsafe

2015-08-21 Thread Steve Loughran
On 21 Aug 2015, at 05:29, Marek Kolodziej mkolod@gmail.com wrote: I doubt that Oracle would want to make life difficult for everyone. In addition to Spark's code base, projects such as Akka, Cassandra, Hibernate, Netty, Neo4j and Spring (among many others)

Re: [VOTE] Release Apache Spark 1.5.0 (RC1)

2015-08-21 Thread mkhaitman
Just a heads up that this RC1 release is still appearing as 1.5.0-SNAPSHOT (Not just me right..?)

Re: [VOTE] Release Apache Spark 1.5.0 (RC1)

2015-08-21 Thread Marcelo Vanzin
The pom files look correct, but this file is not: https://github.com/apache/spark/blob/4c56ad772637615cc1f4f88d619fac6c372c8552/core/src/main/scala/org/apache/spark/package.scala So, I guess, -1? On Fri, Aug 21, 2015 at 2:17 PM, mkhaitman mark.khait...@chango.com wrote: Just a heads up that

Re: [VOTE] Release Apache Spark 1.5.0 (RC1)

2015-08-21 Thread Reynold Xin
Problem noted. Apparently the release script doesn't automate the replacement of all version strings yet. I'm going to publish a new RC over the weekend with the release version properly assigned. Please continue the testing and report any problems you find. Thanks! On Fri, Aug 21, 2015 at 2:20

Re: [VOTE] Release Apache Spark 1.5.0 (RC1)

2015-08-21 Thread Ted Yu
I pointed hbase-spark module (in HBase project) to 1.5.0-rc1 and was able to build the module (with proper maven repo). FYI On Fri, Aug 21, 2015 at 2:17 PM, mkhaitman mark.khait...@chango.com wrote: Just a heads up that this RC1 release is still appearing as 1.5.0-SNAPSHOT (Not just me