I think you have stumbled across this idiosyncrasy:
http://erikerlandson.github.io/blog/2015/03/31/hygienic-closures-for-scala-function-serialization/
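The short version of the fix described in that post: in the REPL, a nested object like `foo` carries a reference to its enclosing object, so the closure's `$outer` chain drags `testing` (and the REPL's `$iwC` wrappers) into serialization. Copying the value into a local val before the closure makes the lambda capture only that local. A minimal sketch, assuming a live SparkContext `sc` as in your snippet:

```scala
object testing {
  object foo extends Serializable {
    val v = 42
  }
  val list = List(1, 2, 3)
  val rdd = sc.parallelize(list)
  def func = {
    // Hygienic closure: copy the needed value into a local val so the
    // lambda captures only this Int, not the enclosing `testing` object.
    val localV = foo.v
    rdd.foreachPartition { it =>
      println(localV)
    }
  }
}
```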
----- Original Message -----
> I am not sure whether this is a question about Spark or just Scala, but I
> am posting it here.
>
> The code snippet below shows an example of passing a reference to a closure
> in the rdd.foreachPartition method.
>
> ```
> object testing {
>   object foo extends Serializable {
>     val v = 42
>   }
>   val list = List(1, 2, 3)
>   val rdd = sc.parallelize(list)
>   def func = {
>     val after = rdd.foreachPartition {
>       it => println(foo.v)
>     }
>   }
> }
> ```
> When running this code, I get an exception:
>
> ```
> Caused by: java.io.NotSerializableException:
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$
> Serialization stack:
> - object not serializable (class:
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$, value:
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$@10b7e824)
> - field (class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
> name: $outer, type: class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$)
> - object (class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
> <function1>)
> ```
>
> It looks like Spark needs to serialize the `testing` object. Why does it
> serialize `testing` even though the closure only references `foo` (which
> is itself serializable)?
>
> A more general question is: how can I prevent Spark from serializing the
> parent class in which the RDD is defined, while still being able to pass
> in functions defined in other classes?
>
> --
> Chen Song
>
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]