A closure keeps a reference to its owner/thisObject, which in your case is the script, and the script is not serializable. closure.dehydrate() returns a copy of the closure whose owner, delegate and thisObject are null. If you pass that copy to Spark instead of the original closure, it no longer references the script and should be serializable.
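Here is a minimal sketch of that idea, meant to be run as a plain Groovy script without Spark. The helper canSerialize and the startsWithA predicate are hypothetical names made up for illustration; they are not taken from your GroovySparkWordcount script.

```groovy
// Hypothetical helper: reports whether plain Java serialization of obj succeeds.
boolean canSerialize(Object obj) {
    try {
        new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
        return true
    } catch (NotSerializableException ignored) {
        return false
    }
}

// Illustrative closure defined at script level, so its owner is the script instance.
def startsWithA = { String s -> s.startsWith('a') }

// dehydrate() does not modify startsWithA; it returns a detached copy.
def detached = startsWithA.dehydrate()

println "original owner class:   ${startsWithA.owner.getClass().name}"
println "detached owner:         ${detached.owner}"
println "original serializable:  ${canSerialize(startsWithA)}"
println "detached serializable:  ${canSerialize(detached)}"
println "detached still callable: ${detached('apple')}"
```

Note that a dehydrated closure can no longer resolve script-level variables or methods through its owner, so this only suits closures that work purely on their own parameters and captured local variables.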
2015-07-26 11:57 GMT+02:00 Jeff MAURY <jeffma...@jeffmaury.com>:
> So it may be an object stored in your task that is not
>
> Jeff
>
> On 26 Jul 2015 11:42, "tog" <guillaume.all...@gmail.com> wrote:
>>
>> Thanks Jeff for your quick answer.
>>
>> Yes, the tasks must be serializable and I believe they are.
>>
>> My test script has 2 tasks (doing the same job): one is a closure, the
>> other is an org.apache.spark.api.java.function.Function, and according to a
>> small test in my script both are serializable for Java/Groovy.
>>
>> I am a bit puzzled/stuck here.
>>
>> On 26 July 2015 at 10:34, Jeff MAURY <jeffma...@jeffmaury.com> wrote:
>>>
>>> Spark distributes tasks to cluster nodes, so the task needs to be
>>> serializable. It appears that your task is a Groovy closure, so you must
>>> make it serializable.
>>>
>>> Jeff
>>>
>>> On Sun, Jul 26, 2015 at 11:12 AM, tog <guillaume.all...@gmail.com> wrote:
>>>>
>>>> Hi
>>>>
>>>> I am starting to play with Apache Spark using Groovy. I have a small
>>>> script that I use for that purpose.
>>>>
>>>> When the script is transformed into a class and launched with java, it
>>>> works fine, but it fails when run as a script.
>>>>
>>>> Any idea what I am doing wrong? Maybe some of you have already come
>>>> across this problem.
>>>>
>>>> $ groovy -version
>>>> Groovy Version: 2.4.3 JVM: 1.8.0_40 Vendor: Oracle Corporation OS: Mac OS X
>>>>
>>>> $ groovy GroovySparkWordcount.groovy
>>>> class org.apache.spark.api.java.JavaRDD
>>>> true
>>>> true
>>>> Caught: org.apache.spark.SparkException: Task not serializable
>>>> org.apache.spark.SparkException: Task not serializable
>>>>     at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:315)
>>>>     at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:305)
>>>>     at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:132)
>>>>     at org.apache.spark.SparkContext.clean(SparkContext.scala:1893)
>>>>     at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:311)
>>>>     at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:310)
>>>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>>>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>>>>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>>>>     at org.apache.spark.rdd.RDD.filter(RDD.scala:310)
>>>>     at org.apache.spark.api.java.JavaRDD.filter(JavaRDD.scala:78)
>>>>     at org.apache.spark.api.java.JavaRDD$filter$0.call(Unknown Source)
>>>>     at GroovySparkWordcount.run(GroovySparkWordcount.groovy:27)
>>>> Caused by: java.io.NotSerializableException: GroovySparkWordcount
>>>> Serialization stack:
>>>>     - object not serializable (class: GroovySparkWordcount, value: GroovySparkWordcount@57c6feea)
>>>>     - field (class: GroovySparkWordcount$1, name: this$0, type: class GroovySparkWordcount)
>>>>     - object (class GroovySparkWordcount$1, GroovySparkWordcount$1@3db1ce78)
>>>>     - field (class: org.apache.spark.api.java.JavaRDD$$anonfun$filter$1, name: f$1, type: interface org.apache.spark.api.java.function.Function)
>>>>     - object (class org.apache.spark.api.java.JavaRDD$$anonfun$filter$1, <function1>)
>>>>     at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
>>>>     at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
>>>>     at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:81)
>>>>     at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:312)
>>>>     ... 12 more
>>>
>>> --
>>> Jeff MAURY
>>>
>>> "Legacy code" often differs from its suggested alternative by actually
>>> working and scaling.
>>>  - Bjarne Stroustrup
>>>
>>> http://www.jeffmaury.com
>>> http://riadiscuss.jeffmaury.com
>>> http://www.twitter.com/jeffmaury
>>
>> --
>> PGP KeyID: 2048R/EA31CFC9 subkeys.pgp.net