Hi Tobias,

One hack you can try is:

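rdd.mapPartitions(iter => {
  // One X per partition, created on the worker, so it is never serialized.
  val x = new X()
  // The right-hand side of ++ is evaluated lazily, so shutdown() runs
  // only after the mapped rows have been consumed.
  iter.map(row => x.doSomethingWith(row)) ++ { x.shutdown(); Iterator.empty }
})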
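
To see why this works, here is a minimal sketch on a plain Scala Iterator, without Spark. The class X below is only a hypothetical stand-in for your non-serializable class, and process is a made-up helper with the same shape as the function passed to mapPartitions. Note that the shutdown only happens if the iterator is consumed to the end, so treat this as a sketch rather than a robust teardown mechanism.

```scala
// Hypothetical stand-in for the non-serializable class X from this thread.
class X {
  var closed = false
  def doSomethingWith(row: Int): Int = {
    // Fails loudly if a row is processed after shutdown().
    require(!closed, "x used after shutdown")
    row * 2
  }
  def shutdown(): Unit = { closed = true }
}

// Same shape as the function passed to mapPartitions, on a plain Iterator.
// Iterator.++ takes its argument by name, so the block on the right is
// evaluated only once the left-hand iterator has no more elements.
def process(iter: Iterator[Int], x: X): Iterator[Int] =
  iter.map(row => x.doSomethingWith(row)) ++ { x.shutdown(); Iterator.empty }

val x = new X()
val out = process(Iterator(1, 2, 3), x)
val closedBefore = x.closed // nothing has been consumed yet
val result = out.toList     // drains all rows, then evaluates the right-hand side
val closedAfter = x.closed
```

If the consumer stops early (for example with take) or doSomethingWith throws, the right-hand side is never evaluated and shutdown() is not called.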

Best,
Xiangrui

On Thu, May 29, 2014 at 11:38 PM, Tobias Pfeiffer <t...@preferred.jp> wrote:
> Hi,
>
> I want to use an object x in my RDD processing as follows:
>
> val x = new X()
> rdd.map(row => x.doSomethingWith(row))
> println(rdd.count())
> x.shutdown()
>
> Now the problem is that X is non-serializable, so while this works
> locally, it does not work in a cluster setup. I thought I could do
>
> rdd.mapPartitions(iter => {
>   val x = new X()
>   val result = iter.map(row => x.doSomethingWith(row))
>   x.shutdown()
>   result
> })
>
> to create an instance of X locally, but obviously x.shutdown() is
> called before the first row is processed.
>
> How can I specify these node-local setup/teardown functions or how do
> I deal in general with non-serializable classes?
>
> Thanks
> Tobias
