This could probably also be done by doing the initialization in a Scala object, so that it happens once on each node, the first time the object is used there.
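A minimal sketch of that idea, under some assumptions: `Expensive` here is a stand-in stub for the third-party, non-serializable class, and `ExpensiveHolder`, `Transformations` and `processPartition` are names made up for illustration. The Spark call itself only appears in a comment, so the snippet runs without Spark on the classpath.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Stand-in for the third-party class (hypothetical stub, not a real API).
// The counter exists only so we can observe how often the constructor runs.
object Expensive {
  val constructed = new AtomicInteger(0)
}

class Expensive {
  Expensive.constructed.incrementAndGet() // pretend this is the heavy setup
  def foo(x: String): String = x.toUpperCase
}

// A Scala object is initialized at most once per JVM, on first access.
// The lazy val defers construction until the first record is processed.
object ExpensiveHolder {
  lazy val instance: Expensive = new Expensive()
}

object Transformations {
  // The function refers to ExpensiveHolder statically instead of capturing
  // an Expensive instance, so no Expensive object has to be serialized.
  def processPartition(iter: Iterator[String]): Iterator[String] =
    iter.map(x => ExpensiveHolder.instance.foo(x))
}

// Assumed usage with an RDD[String] named `data` (not executed here):
//   data.mapPartitions(Transformations.processPartition)
// or, per element:
//   data.map(x => ExpensiveHolder.instance.foo(x))
```

Since the holder is shared by every task running in the same JVM, this only works as-is if the third-party object can safely be used from several threads at once.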
On Tue, Nov 12, 2013 at 10:44 PM, Jason Lenderman <[email protected]> wrote:

> You probably want to look at the mapPartitions method of RDD. The usage
> might look something like:
>
>   data.mapPartitions { iter =>
>     val o = new Expensive()
>     for (x <- iter) yield o.foo(x)
>   }
>
> Note that each split of data is processed using a single instance of
> Expensive.
>
> Another method of RDD that you want to be aware of is
> mapPartitionsWithIndex. This method enables you to use the index of the
> split in your transformation of the Iterator.
>
>
> On Tue, Nov 12, 2013 at 7:43 PM, Pranay Tonpay <
> [email protected]> wrote:
>
>> When I use Spark Streaming for real-time analytics, there is a
>> limitation that I encounter.
>>
>> *Scenario –*
>>
>> I have a third-party class and have to use some APIs from that class.
>>
>> I create the object once in the driver method, pass this object to the
>> map method, and use the function of the object inside the "call" method.
>>
>> I am able to do this if the class is serializable. If it's not, I am
>> forced to create that object inside the call method itself, which is a
>> heavy operation because the constructor is pretty heavy. Remember that I
>> am doing real-time analytics, so the number of times this would get
>> invoked is very high and frequent (and since the class is part of a
>> third-party jar, making it Serializable is not convenient and may not be
>> possible at all).
>>
>> I know the reason for the need for serializability in Spark, but is there
>> a way to get over the above limitation (keeping serialization intact)? If
>> you see, Storm does provide a way to handle this by providing a "prepare"
>> function in a bolt, where I can create the object only once. If not, I
>> think it could be a very useful enhancement to have (if possible).
>>
>> Pls let me know
>>
>> Thx
>> pranay
