Thanks Erik. I saw the document too. That is why I am confused: as per the
article, it should be fine as long as *foo* is serializable. However, what I
have seen is that it works if *testing* is serializable, even if *foo* is not
serializable, as shown below. I don't know if there is something specific to
Spark here. For example, the code below works.
```
object testing extends Serializable {
  object foo {
    val v = 42
  }
  val list = List(1, 2, 3)
  val rdd = sc.parallelize(list)
  def func = {
    val after = rdd.foreachPartition {
      it => println(foo.v)
    }
  }
}
```
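By the way, I believe the variant below should also work without marking
anything Serializable. This is just my sketch of the local-copy idea from
Erik's post, as I understand it (the name localV is mine): copying the value
into a local val before the closure means the function captures only that
local, instead of reaching back through the enclosing testing object.

```
object testing {
  object foo {
    val v = 42
  }
  val list = List(1, 2, 3)
  val rdd = sc.parallelize(list)
  def func = {
    // Copy the value into a local val on the driver; the closure below then
    // captures only this Int and needs no $outer reference back to testing.
    val localV = foo.v
    rdd.foreachPartition { it => println(localV) }
  }
}
```

With that version neither testing nor foo needs to extend Serializable, since
the task closure only carries the captured Int.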
On Thu, Jul 9, 2015 at 12:11 PM, Erik Erlandson <[email protected]> wrote:
> I think you have stumbled across this idiosyncrasy:
>
>
> http://erikerlandson.github.io/blog/2015/03/31/hygienic-closures-for-scala-function-serialization/
>
> ----- Original Message -----
> > I am not sure whether this is more of a Spark question or a Scala
> > question, but I am posting it here.
> >
> > The code snippet below shows an example of referencing an object from a
> > closure passed to the rdd.foreachPartition method.
> >
> > ```
> > object testing {
> >   object foo extends Serializable {
> >     val v = 42
> >   }
> >   val list = List(1, 2, 3)
> >   val rdd = sc.parallelize(list)
> >   def func = {
> >     val after = rdd.foreachPartition {
> >       it => println(foo.v)
> >     }
> >   }
> > }
> > ```
> > When I run this code, I get the following exception:
> >
> > ```
> > Caused by: java.io.NotSerializableException:
> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$
> > Serialization stack:
> >   - object not serializable (class:
> >     $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$, value:
> >     $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$@10b7e824)
> >   - field (class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
> >     name: $outer, type: class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$)
> >   - object (class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
> >     <function1>)
> > ```
> >
> > It looks like Spark needs to serialize the `testing` object. Why is it
> > serializing `testing` even though I only reference `foo` (a serializable
> > object) in the closure?
> >
> > A more general question is: how can I prevent Spark from serializing the
> > parent object where the RDD is defined, while still being able to use
> > functions and objects defined in other classes inside the closure?
> >
> > --
> > Chen Song
> >
>
--
Chen Song