TaskNotSerializableException when running through Spark shell

2014-10-16 Thread Akshat Aranya
Hi,

Can anyone explain how things get captured in a closure when running
through the REPL?  For example:

def foo(..) = { .. }

rdd.map(foo)

sometimes complains about classes not being serializable that are
completely unrelated to foo.  This happens even when I write it like this:

object Foo {
  def foo(..) = { .. }
}

rdd.map(Foo.foo)

It also doesn't happen all the time.
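
A minimal sketch of the kind of session that can trigger this (names are
illustrative; sc is the SparkContext the shell provides):

// Typed line by line into spark-shell.
class Handle { val socket = new java.net.Socket() }  // not serializable
val handle = new Handle              // lives in a REPL wrapper object

def foo(s: String): Int = s.length   // never touches handle

val rdd = sc.parallelize(Seq("a", "bb", "ccc"))

// rdd.map(foo) eta-expands to rdd.map(x => foo(x)). That closure keeps a
// reference to the REPL wrapper that defines foo, and the wrapper can
// transitively reach the one holding handle, which is how a class
// completely unrelated to foo ends up in the serialization graph.
rdd.map(foo).collect()   // sometimes: SparkException: Task not serializable

// Binding the logic to a plain function value usually avoids the capture:
val f: String => Int = _.length
rdd.map(f).collect()     // Array(1, 2, 3)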


Re: TaskNotSerializableException when running through Spark shell

2014-10-16 Thread Jimmy McErlain
I actually only ran into this issue recently, after we upgraded to Spark
1.1.  Within the REPL for Spark 1.0 everything works fine, but within the
REPL for 1.1 it does not.  FYI, I am also only doing simple regex-matching
functions within an RDD.  When I run the same code as an app, everything
works fine... it leads me to believe that it is a bug in the REPL for 1.1.

Can anyone else confirm this?
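
A minimal sketch one could paste into both shells to compare (names are
illustrative; everything here is serializable on its face):

val pattern = "[0-9]+".r    // scala.util.matching.Regex is Serializable
val data = sc.parallelize(Seq("order 42", "no digits here", "id 7"))
data.map(s => pattern.findFirstIn(s).isDefined).collect()
// expected Array(true, false, true); a "Task not serializable" failure
// here would show what the REPL wrapper dragged into the closure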

Jimmy McErlain
Data Scientist (Nerd)

E: ji...@sellpoints.com

M: 510.303.7751
