Spark distributes tasks across cluster nodes, so the task needs to be serializable. It appears that your task is a Groovy closure, so you must make it serializable.
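As a Spark-free sketch of the failure mode in the trace below: under plain Java serialization, an anonymous inner class keeps a hidden `this$0` reference to its enclosing instance, so it only serializes if the enclosing class does too — which is exactly the `GroovySparkWordcount$1` / `this$0` pair the serialization stack reports. The class names here (`SerializationDemo`, `Outer`, `StandaloneTask`) are illustrative, not from your gist.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationDemo {

    // A task interface that is itself Serializable, analogous to Spark's
    // org.apache.spark.api.java.function.Function.
    interface SerializableTask extends Runnable, Serializable {}

    // Stand-in for the script class: note it is NOT Serializable.
    static class Outer {
        // An anonymous inner class created here keeps a hidden this$0
        // reference to the enclosing Outer instance.
        SerializableTask makeInnerTask() {
            return new SerializableTask() {
                @Override public void run() {}
            };
        }
    }

    // A standalone task captures no enclosing instance.
    static class StandaloneTask implements SerializableTask {
        @Override public void run() {}
    }

    // True if Java serialization accepts the object.
    static boolean serializes(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) { // NotSerializableException is an IOException
            return false;
        }
    }

    public static void main(String[] args) {
        // Fails: this$0 points at the non-serializable Outer instance,
        // like GroovySparkWordcount in the trace below.
        System.out.println(serializes(new Outer().makeInnerTask())); // false
        // Succeeds: nothing non-serializable is reachable.
        System.out.println(serializes(new StandaloneTask()));        // true
    }
}
```

The fix follows the same shape: define the filter function as a standalone class implementing Serializable (or otherwise break the reference back to the script object), rather than as an inner class of the script.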
Jeff

On Sun, Jul 26, 2015 at 11:12 AM, tog <guillaume.all...@gmail.com> wrote:
> Hi
>
> I am starting to play with Apache Spark using Groovy. I have a small
> script <https://gist.github.com/galleon/d6540327c418aa8a479f> that I use
> for that purpose.
>
> When the script is transformed into a class and launched with java, it
> works fine, but it fails when run as a script.
>
> Any idea what I am doing wrong? Maybe some of you have already come
> across that problem.
>
> $ groovy -version
> Groovy Version: 2.4.3 JVM: 1.8.0_40 Vendor: Oracle Corporation OS: Mac OS X
>
> $ groovy GroovySparkWordcount.groovy
> class org.apache.spark.api.java.JavaRDD
> true
> true
> Caught: org.apache.spark.SparkException: Task not serializable
> org.apache.spark.SparkException: Task not serializable
>     at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:315)
>     at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:305)
>     at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:132)
>     at org.apache.spark.SparkContext.clean(SparkContext.scala:1893)
>     at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:311)
>     at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:310)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>     at org.apache.spark.rdd.RDD.filter(RDD.scala:310)
>     at org.apache.spark.api.java.JavaRDD.filter(JavaRDD.scala:78)
>     at org.apache.spark.api.java.JavaRDD$filter$0.call(Unknown Source)
>     at GroovySparkWordcount.run(GroovySparkWordcount.groovy:27)
> Caused by: java.io.NotSerializableException: GroovySparkWordcount
> Serialization stack:
>     - object not serializable (class: GroovySparkWordcount, value: GroovySparkWordcount@57c6feea)
>     - field (class: GroovySparkWordcount$1, name: this$0, type: class GroovySparkWordcount)
>     - object (class GroovySparkWordcount$1, GroovySparkWordcount$1@3db1ce78)
>     - field (class: org.apache.spark.api.java.JavaRDD$$anonfun$filter$1, name: f$1, type: interface org.apache.spark.api.java.function.Function)
>     - object (class org.apache.spark.api.java.JavaRDD$$anonfun$filter$1, <function1>)
>     at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
>     at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
>     at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:81)
>     at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:312)
>     ... 12 more

--
Jeff MAURY

"Legacy code" often differs from its suggested alternative by actually
working and scaling.
 - Bjarne Stroustrup

http://www.jeffmaury.com
http://riadiscuss.jeffmaury.com
http://www.twitter.com/jeffmaury