Hi,

I want to use an object x in my RDD processing as follows:

val x = new X()
val result = rdd.map(row => x.doSomethingWith(row))
println(result.count())
x.shutdown()

The problem is that X is not serializable, so while this works
locally, it does not work in a cluster setup. I thought I could do

rdd.mapPartitions(iter => {
  val x = new X()
  val result = iter.map(row => x.doSomethingWith(row))
  x.shutdown()
  result
})

to create an instance of X on each worker, but since iter.map is lazy,
x.shutdown() is called before the first row is processed.
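
To illustrate the ordering problem without Spark, here is a minimal sketch
using a plain Scala Iterator. The class X below is only a hypothetical
stand-in for my real class (it just records calls in a log), and
processPartition is an illustrative name for the function body I pass to
mapPartitions. It can be pasted into the Scala REPL or run as a script:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for the real, non-serializable X:
// it only records the order in which its methods are called.
class X(log: ArrayBuffer[String]) {
  def doSomethingWith(row: Int): Int = { log += s"process $row"; row * 2 }
  def shutdown(): Unit = log += "shutdown"
}

// Same shape as the function passed to mapPartitions above.
def processPartition(iter: Iterator[Int], log: ArrayBuffer[String]): Iterator[Int] = {
  val x = new X(log)
  val result = iter.map(row => x.doSomethingWith(row)) // lazy: no row is processed yet
  x.shutdown()                                          // runs immediately
  result
}

val log = ArrayBuffer.empty[String]
// The rows are only processed here, when the iterator is consumed.
val out = processPartition(Iterator(1, 2, 3), log).toList
log.foreach(println)
```

The log shows "shutdown" first and the "process ..." entries afterwards,
which is the same ordering I see inside mapPartitions.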

How can I specify these node-local setup/teardown functions or how do
I deal in general with non-serializable classes?

Thanks
Tobias
