[ https://issues.apache.org/jira/browse/SPARK-6520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-6520.
------------------------------
    Resolution: Won't Fix

Yes, I think this is a function of how {{:paste}}d code is evaluated and how that interacts with what Kryo expects. I don't know that it's realistic to expect that to change; spark-shell is just quite different in how classes are defined on the fly. You can run a compiled program instead, or paste your class definitions separately first if you have to.

> Kryo serialization broken in the shell
> --------------------------------------
>
>                 Key: SPARK-6520
>                 URL: https://issues.apache.org/jira/browse/SPARK-6520
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell
>    Affects Versions: 1.3.0
>            Reporter: Aaron Defazio
>
> If I start spark-shell as follows:
> {quote}
> ~/spark-1.3.0-bin-hadoop2.4/bin/spark-shell --master local[1] --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer"
> {quote}
> then, using :paste, run:
> {quote}
> case class Example(foo : String, bar : String)
> val ex = sc.parallelize(List(Example("foo1", "bar1"), Example("foo2", "bar2"))).collect()
> {quote}
> I get the error:
> {quote}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.io.IOException: com.esotericsoftware.kryo.KryoException: Error constructing instance of class: $line3.$read
> Serialization trace:
> $VAL10 ($iwC)
> $outer ($iwC$$iwC)
> $outer ($iwC$$iwC$Example)
> 	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1140)
> 	at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:70)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:979)
> 	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1873)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1777)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
> 	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1970)
> 	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1895)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1777)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
> 	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:349)
> 	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
> 	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> {quote}
> As far as I can tell, when using :paste, Kryo serialization doesn't work for classes defined within the same paste. It does work when the statements are entered without :paste.
> This issue seems serious to me, since Kryo serialization is virtually mandatory for performance (20x slower with default serialization on my problem), and I'm assuming feature parity between spark-shell and spark-submit is a goal.
> Note that this is different from SPARK-6497, which covers the case where Kryo is set to require registration.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
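A minimal, Spark-free sketch of what the serialization trace is showing (the names `Wrapper` and `OuterCaptureDemo` are illustrative, not from Spark): a class defined inside another class, as :paste-d code is inside the REPL's generated `$iwC` wrapper objects, carries a synthetic `$outer` reference to its enclosing instance, so a serializer must also reconstruct that wrapper when deserializing the inner object.

```scala
// Illustrative sketch: an inner case class captures a reference to its
// enclosing instance, just as a :paste-defined class is nested inside the
// REPL's $iwC wrappers. Kryo must construct that enclosing instance too,
// which is where "Error constructing instance of class: $line3.$read" arises.
class Wrapper {
  case class Example(foo: String, bar: String) // inner class: captures $outer
}

object OuterCaptureDemo {
  def main(args: Array[String]): Unit = {
    val w  = new Wrapper
    val ex = new w.Example("foo1", "bar1")
    // Reflection reveals the compiler-generated $outer field:
    val hasOuter = ex.getClass.getDeclaredFields.exists(_.getName == "$outer")
    println(hasOuter) // prints "true"
  }
}
```

Pasting the case class in a separate :paste (or compiling it into a jar on the classpath) puts it in a different wrapper from the code that uses it, which is why the workaround in the resolution comment helps.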