Is there any workaround for distributing a non-serializable object in an RDD
transformation or as a broadcast variable?
Say I have an object of class C which is not serializable. Class C comes from
a jar package that I have no control over. Now I need to distribute it either
through an RDD transformation or through a broadcast.
If the object cannot be serialized, then I don't think broadcast will make
it magically serializable. You can't transfer data structures between nodes
without serializing them somehow.
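To make the point above concrete, here is a minimal sketch of what plain Java
serialization (Spark's default serializer) does with such an object. The class
`C` below is only a hypothetical stand-in for the third-party class, and
`SerializationCheck` / `isJavaSerializable` are illustrative names, not Spark
APIs.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for the third-party class C: it does not extend Serializable.
class C(val value: Int)

object SerializationCheck {
  // Returns true if Java serialization accepts the object, false if it rejects it.
  def isJavaSerializable(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    println(isJavaSerializable(new C(1)))    // C is rejected
    println(isJavaSerializable("a string"))  // String implements Serializable
  }
}
```

Wrapping the object in a broadcast variable does not change this check; the
same serialization step still has to succeed.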
On Fri, Aug 7, 2015 at 7:31 AM, Sujit Pal sujitatgt...@gmail.com wrote:
Hi Hao,
I think sc.broadcast will
If the object is something like a utility object (say a DB connection
handler), I often use:
@transient lazy val someObj = MyFactory.getObj(...)
So basically `@transient` keeps the field from being serialized with the
closure, and the `lazy val` allows it to be initialized on each executor upon its
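The `@transient lazy val` pattern can be tried out without a cluster. The
sketch below is an assumption-laden illustration, not Spark code: it
round-trips a serializable holder through Java serialization to mimic a closure
being shipped to an executor. `NonSerializableClient`, `ClientFactory`,
`ClientHolder` and `roundTrip` are hypothetical names standing in for class C
and `MyFactory.getObj(...)`.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for the third-party class: it does not extend Serializable.
class NonSerializableClient(val name: String) {
  def greet(x: Int): String = s"$name handled $x"
}

// Hypothetical factory, playing the role of MyFactory.getObj(...).
object ClientFactory {
  def create(): NonSerializableClient = new NonSerializableClient("client")
}

// The holder is Serializable; the wrapped client is skipped during serialization
// and rebuilt lazily on first access after deserialization.
class ClientHolder extends Serializable {
  @transient lazy val client: NonSerializableClient = ClientFactory.create()
}

object TransientLazyDemo {
  // Round-trips an object through Java serialization, roughly what happens
  // when a closure that references the holder is sent to another JVM.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val copy = roundTrip(new ClientHolder)
    // The client is created here, on the receiving side, at first access.
    println(copy.client.greet(1))
  }
}
```

Note that this only works when the object can be rebuilt from scratch on the
receiving side; any state held by the original instance is not carried over.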
Hao,
I’d say there are a few possible ways to achieve that:
1. Use KryoSerializer.
The flaw of KryoSerializer is that the current version (2.21) has an issue
with internal state, so it might not work for some objects. Spark gets the Kryo
dependency transitively through chill, and it will not be resolved