Hi, I want to use an object x in my RDD processing as follows:
val x = new X() rdd.map(row => x.doSomethingWith(row)) println(rdd.count()) x.shutdown() Now the problem is that X is non-serializable, so while this works locally, it does not work in cluster setup. I thought I could do rdd.mapPartitions(iter => { val x = new X() val result = iter.map(row => x.doSomethingWith(row)) x.shutdown() result }) to create an instance of X locally, but obviously x.shutdown() is called before the first row is processed. How can I specify these node-local setup/teardown functions or how do I deal in general with non-serializable classes? Thanks Tobias