Reposting the code example:
object testing extends Serializable {
  object foo {
    val v = 42
  }
  val list = List(1, 2, 3)
  val rdd = sc.parallelize(list)
  def func = {
    val after = rdd.foreachPartition {
      it => println(foo.v)
    }
  }
}
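
For anyone hitting the same thing, here is a sketch of the usual workaround (the name localV is mine, and sc is assumed to be an in-scope SparkContext, as in the examples in this thread). The closure `it => println(foo.v)` reaches foo through the enclosing testing instance ($outer), which is why Spark tries to serialize testing. Copying the value into a local val first means the closure captures only that local:

object testing extends Serializable {
  object foo {
    val v = 42
  }
  val list = List(1, 2, 3)
  val rdd = sc.parallelize(list)
  def func = {
    // Copy the field into a local val on the driver; the closure then
    // captures only this Int, not `foo` via the enclosing `testing`.
    val localV = foo.v
    rdd.foreachPartition { it =>
      println(localV)
    }
  }
}

With that change neither testing nor foo has to be serializable, since only the captured Int is shipped with the closure.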
On Thu, Jul 9, 2015 at 4:09 PM, Chen Song <[email protected]> wrote:
> Thanks Erik. I saw that document too. That is why I am confused: as per
> the article, it should work as long as *foo* is serializable. However,
> what I have seen is that it works if *testing* is serializable, even if
> *foo* is not, as shown below. I don't know if there is something
> specific to Spark here.
>
> For example, the code below works.
>
> object testing extends Serializable {
>   object foo {
>     val v = 42
>   }
>   val list = List(1, 2, 3)
>   val rdd = sc.parallelize(list)
>   def func = {
>     val after = rdd.foreachPartition {
>       it => println(foo.v)
>     }
>   }
> }
>
> On Thu, Jul 9, 2015 at 12:11 PM, Erik Erlandson <[email protected]> wrote:
>
>> I think you have stumbled across this idiosyncrasy:
>>
>>
>> http://erikerlandson.github.io/blog/2015/03/31/hygienic-closures-for-scala-function-serialization/
>>
>> ----- Original Message -----
>> > I am not sure whether this is a question about Spark or just Scala,
>> > but I am posting it here.
>> >
>> > The code snippet below shows an example of passing a reference into
>> > the closure used in the rdd.foreachPartition method.
>> >
>> > ```
>> > object testing {
>> >   object foo extends Serializable {
>> >     val v = 42
>> >   }
>> >   val list = List(1, 2, 3)
>> >   val rdd = sc.parallelize(list)
>> >   def func = {
>> >     val after = rdd.foreachPartition {
>> >       it => println(foo.v)
>> >     }
>> >   }
>> > }
>> > ```
>> > When running this code, I got the following exception:
>> >
>> > ```
>> > Caused by: java.io.NotSerializableException:
>> >   $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$
>> > Serialization stack:
>> >   - object not serializable (class:
>> >     $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$, value:
>> >     $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$@10b7e824)
>> >   - field (class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
>> >     name: $outer, type: class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$)
>> >   - object (class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
>> >     <function1>)
>> > ```
>> >
>> > It looks like Spark needs to serialize the `testing` object. Why is
>> > it serializing `testing` even though I only reference `foo` (a
>> > serializable object) in the closure?
>> >
>> > A more general question: how can I prevent Spark from serializing
>> > the parent object where the RDD is defined, while still being able
>> > to pass in functions defined in other classes?
>> >
>> > --
>> > Chen Song
>> >
>>
>
> --
> Chen Song
>
--
Chen Song