Hi Chen, I believe the issue is that `object foo` is a member of `object testing`, so the only way to access `foo` is to first pull `testing` into the closure and then follow a pointer to reach `foo`. There are two workarounds that I'm aware of:
(1) Move `object foo` outside of `object testing`. The problem only arises because the objects are nested. It also arguably makes the design simpler to reason about, but that's a separate discussion.

(2) Create a local variable for `foo.v`. If all your closure cares about is the integer, it makes sense to add a `val v = foo.v` inside `func` and reference that in your closure instead. This avoids pulling any `$outer` pointers into your closure at all, since the closure then only references a local variable.

As others have commented, I think this is more of a Scala problem than a Spark one.

Let me know if these work,
-Andrew

2015-07-09 13:36 GMT-07:00 Richard Marscher <rmarsc...@localytics.com>:

> Reading that article and applying it to your observations of what happens
> at runtime: shouldn't the closure require serializing `testing`? The `foo`
> singleton object is a member of `testing`, and you reference this `foo`
> value in the closure `func` and further in the `foreachPartition` closure.
> So, following that article, Scala will attempt to serialize the containing
> object/class `testing` to get at the `foo` instance.
>
> On Thu, Jul 9, 2015 at 4:11 PM, Chen Song <chen.song...@gmail.com> wrote:
>
>> Reposting the code example:
>>
>> object testing extends Serializable {
>>   object foo {
>>     val v = 42
>>   }
>>   val list = List(1, 2, 3)
>>   val rdd = sc.parallelize(list)
>>   def func = {
>>     val after = rdd.foreachPartition {
>>       it => println(foo.v)
>>     }
>>   }
>> }
>>
>> On Thu, Jul 9, 2015 at 4:09 PM, Chen Song <chen.song...@gmail.com> wrote:
>>
>>> Thanks Erik. I saw the document too. That is why I am confused: per the
>>> article, it should be fine as long as *foo* is serializable. However,
>>> what I have seen is that it works as long as *testing* is serializable,
>>> even when *foo* is not, as shown below. I don't know if there is
>>> something specific to Spark.
>>>
>>> For example, the code example below works.
>>>
>>> object testing extends Serializable {
>>>   object foo {
>>>     val v = 42
>>>   }
>>>   val list = List(1, 2, 3)
>>>   val rdd = sc.parallelize(list)
>>>   def func = {
>>>     val after = rdd.foreachPartition {
>>>       it => println(foo.v)
>>>     }
>>>   }
>>> }
>>>
>>> On Thu, Jul 9, 2015 at 12:11 PM, Erik Erlandson <e...@redhat.com> wrote:
>>>
>>>> I think you have stumbled across this idiosyncrasy:
>>>>
>>>> http://erikerlandson.github.io/blog/2015/03/31/hygienic-closures-for-scala-function-serialization/
>>>>
>>>> ----- Original Message -----
>>>> > I am not sure whether this is more of a Spark question or a Scala
>>>> > one, but I am posting it here.
>>>> >
>>>> > The code snippet below shows an example of passing a reference into a
>>>> > closure in the rdd.foreachPartition method.
>>>> >
>>>> > ```
>>>> > object testing {
>>>> >   object foo extends Serializable {
>>>> >     val v = 42
>>>> >   }
>>>> >   val list = List(1, 2, 3)
>>>> >   val rdd = sc.parallelize(list)
>>>> >   def func = {
>>>> >     val after = rdd.foreachPartition {
>>>> >       it => println(foo.v)
>>>> >     }
>>>> >   }
>>>> > }
>>>> > ```
>>>> >
>>>> > When running this code, I get the following exception:
>>>> >
>>>> > ```
>>>> > Caused by: java.io.NotSerializableException:
>>>> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$
>>>> > Serialization stack:
>>>> > - object not serializable (class:
>>>> >   $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$, value:
>>>> >   $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$@10b7e824)
>>>> > - field (class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
>>>> >   name: $outer, type: class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$)
>>>> > - object (class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
>>>> >   <function1>)
>>>> > ```
>>>> >
>>>> > It looks like Spark needs to serialize the `testing` object.
>>>> > Why is it serializing `testing` even though I only reference `foo`
>>>> > (another serializable object) in the closure?
>>>> >
>>>> > A more general question: how can I prevent Spark from serializing the
>>>> > parent class where the RDD is defined, while still being able to pass
>>>> > in functions defined in other classes?
>>>> >
>>>> > --
>>>> > Chen Song
>>>
>>> --
>>> Chen Song
>>
>> --
>> Chen Song
>
> --
> *Richard Marscher*
> Software Engineer
> Localytics
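The `$outer` capture discussed above, and the local-variable workaround in (2), can be sketched without Spark or the REPL. This is an illustrative sketch, not code from the thread: the `Wrapper` class below is a hypothetical stand-in for the REPL's non-serializable `$iwC` wrapper (in the REPL, `object testing` is compiled as a member of such a wrapper instance), and `canSerialize` is an assumed helper that simply attempts plain Java serialization.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for the REPL's $iwC wrapper: a non-serializable
// class containing the nested serializable object, mirroring the thread.
class Wrapper {
  object foo extends Serializable { val v = 42 }

  // This closure reads foo.v directly. foo is a member of Wrapper, so the
  // closure must capture Wrapper's `this` as $outer to reach it.
  def closureWithOuter: () => Int = () => foo.v

  // Workaround (2): copy the value into a local first. The closure then
  // captures only the Int, not the enclosing instance.
  def closureWithLocal: () => Int = {
    val v = foo.v
    () => v
  }
}

object SerializationCheck {
  // Attempts plain Java serialization of obj; false on NotSerializableException.
  def canSerialize(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val w = new Wrapper
    // Expected to fail: the closure drags in the non-serializable Wrapper via $outer.
    println(canSerialize(w.closureWithOuter))
    // Expected to succeed: only a captured Int is serialized.
    println(canSerialize(w.closureWithLocal))
  }
}
```

The first closure can only reach `foo` through the enclosing instance, so serializing it pulls in the whole wrapper, just as Spark's `ClosureCleaner` reports in the `NotSerializableException` stack above; copying `foo.v` into a local `val` first leaves nothing but the integer for the closure to capture.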