Thanks, Jeff, for your quick answer. Yes, the tasks should be serializable, and I believe they are.

My test script has two tasks (doing the same job): one is a closure, the other is an org.apache.spark.api.java.function.Function. According to a small test in my script, both are serializable from Java/Groovy. I am a bit puzzled/stuck here.
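In case it helps, here is a stripped-down version of the two tasks (the full script is in the gist quoted below; names and the input file here are illustrative, not the exact code):

import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.api.java.function.Function

def sc = new JavaSparkContext("local", "GroovySparkWordcount")
def lines = sc.textFile("data.txt")

// Task 1: a Groovy closure coerced to Spark's Function interface.
def task1 = { String s -> s.contains("a") } as Function

// Task 2: an anonymous inner class. It compiles to GroovySparkWordcount$1,
// which keeps an implicit this$0 reference back to the script instance --
// the very field named in the serialization stack quoted below.
def task2 = new Function<String, Boolean>() {
    Boolean call(String s) { s.contains("a") }
}

println lines.filter(task1).count()
println lines.filter(task2).count()   // -> org.apache.spark.SparkException: Task not serializable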
On 26 July 2015 at 10:34, Jeff MAURY <jeffma...@jeffmaury.com> wrote:

> Spark distributes tasks to the cluster nodes, so the tasks need to be
> serializable. It appears that your task is a Groovy closure, so you must
> make it serializable.
>
> Jeff
>
> On Sun, Jul 26, 2015 at 11:12 AM, tog <guillaume.all...@gmail.com> wrote:
>
>> Hi
>>
>> I am starting to play with Apache Spark using Groovy. I have a small
>> script <https://gist.github.com/galleon/d6540327c418aa8a479f> that I use
>> for that purpose.
>>
>> When the script is transformed into a class and launched with java, it
>> works fine, but it fails when run as a script.
>>
>> Any idea what I am doing wrong? Maybe some of you have already come
>> across this problem.
>>
>> $ groovy -version
>>
>> Groovy Version: 2.4.3 JVM: 1.8.0_40 Vendor: Oracle Corporation OS: Mac OS X
>>
>> $ groovy GroovySparkWordcount.groovy
>>
>> class org.apache.spark.api.java.JavaRDD
>>
>> true
>>
>> true
>>
>> Caught: org.apache.spark.SparkException: Task not serializable
>>
>> org.apache.spark.SparkException: Task not serializable
>>
>> at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:315)
>>
>> at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:305)
>>
>> at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:132)
>>
>> at org.apache.spark.SparkContext.clean(SparkContext.scala:1893)
>>
>> at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:311)
>>
>> at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:310)
>>
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>>
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>>
>> at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>>
>> at org.apache.spark.rdd.RDD.filter(RDD.scala:310)
>>
>> at org.apache.spark.api.java.JavaRDD.filter(JavaRDD.scala:78)
>>
>> at org.apache.spark.api.java.JavaRDD$filter$0.call(Unknown Source)
>>
>> at GroovySparkWordcount.run(GroovySparkWordcount.groovy:27)
>>
>> Caused by: java.io.NotSerializableException: GroovySparkWordcount
>>
>> Serialization stack:
>>
>> - object not serializable (class: GroovySparkWordcount, value: GroovySparkWordcount@57c6feea)
>>
>> - field (class: GroovySparkWordcount$1, name: this$0, type: class GroovySparkWordcount)
>>
>> - object (class GroovySparkWordcount$1, GroovySparkWordcount$1@3db1ce78)
>>
>> - field (class: org.apache.spark.api.java.JavaRDD$$anonfun$filter$1, name: f$1, type: interface org.apache.spark.api.java.function.Function)
>>
>> - object (class org.apache.spark.api.java.JavaRDD$$anonfun$filter$1, <function1>)
>>
>> at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
>>
>> at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
>>
>> at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:81)
>>
>> at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:312)
>>
>> ... 12 more
>
> --
> Jeff MAURY
>
> "Legacy code" often differs from its suggested alternative by actually
> working and scaling.
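PS: for the archives, here are the two changes I plan to try next, based on the this$0 field called out in the serialization stack above. This is an untested sketch; dehydrate() is the usual suggestion for shipping Groovy closures, but I have not verified it against Spark yet:

import org.apache.spark.api.java.function.Function

// (a) For the closure: dehydrate() returns a copy with owner, delegate and
// thisObject set to null, so serializing the task should no longer drag the
// whole (non-serializable) script instance along.
def task1 = { String s -> s.contains("a") }.dehydrate() as Function

// (b) For the Java-style task: a named top-level class instead of an
// anonymous inner class, so no implicit this$0 field is generated.
// Spark's Function interface already extends java.io.Serializable.
class ContainsA implements Function<String, Boolean> {
    Boolean call(String s) { s.contains("a") }
}
def task2 = new ContainsA()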
> - Bjarne Stroustrup
>
> http://www.jeffmaury.com
> http://riadiscuss.jeffmaury.com
> http://www.twitter.com/jeffmaury

--
PGP KeyID: 2048R/EA31CFC9
subkeys.pgp.net