I think you have stumbled across this idiosyncrasy:

http://erikerlandson.github.io/blog/2015/03/31/hygienic-closures-for-scala-function-serialization/
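The short version of the workaround described there: copy the value you need into a local val before building the closure, so the lambda captures only that value instead of the enclosing instance. Below is a minimal, Spark-free sketch (the `Enclosing`/`SerCheck` names are illustrative, not from your code); plain Java serialization stands in for Spark's closure serializer, and a regular class plays the role of the REPL's `$iwC` wrapper objects:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Helper: can Java serialization (which Spark's default closure
// serializer is built on) write this object?
object SerCheck {
  def serializes(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }
}

// Mimics `testing` in the question: a non-serializable enclosing instance
// holding a serializable inner object. In the REPL, the $iwC wrapper
// classes play this role.
class Enclosing {
  object foo extends Serializable { val v = 42 }

  // `foo` is a member of the enclosing instance, so referencing `foo.v`
  // inside the lambda captures `Enclosing.this` -- the whole outer object,
  // which is not serializable.
  def badClosure: Int => Int = x => x + foo.v

  // Hygienic version: copy the value into a local val first, so the lambda
  // captures only a plain Int.
  def goodClosure: Int => Int = {
    val v = foo.v
    x => x + v
  }
}
```

The same trick applies in your `func`: assign `val v = foo.v` before calling `rdd.foreachPartition`, and reference `v` inside the closure.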




----- Original Message -----
> I am not sure whether this is a question about Spark or just Scala, but I am
> posting it here.
> 
> The code snippet below shows an example of referencing an object from a
> closure passed to the rdd.foreachPartition method.
> 
> ```scala
> object testing {
>   object foo extends Serializable {
>     val v = 42
>   }
>   val list = List(1, 2, 3)
>   val rdd = sc.parallelize(list)
>   def func = {
>     val after = rdd.foreachPartition {
>       it => println(foo.v)
>     }
>   }
> }
> ```
> When running this code, I get an exception:
> 
> ```
> Caused by: java.io.NotSerializableException:
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$
> Serialization stack:
> - object not serializable (class:
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$, value:
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$@10b7e824)
> - field (class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
> name: $outer, type: class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$)
> - object (class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
> <function1>)
> ```
> 
> It looks like Spark needs to serialize the `testing` object. Why is it
> serializing `testing` even though I only reference `foo` (another,
> serializable object) in the closure?
> 
> A more general question: how can I prevent Spark from serializing the
> parent object where the RDD is defined, while still being able to pass in
> functions defined in other classes?
> 
> --
> Chen Song
> 
