When I use Spark Streaming for real-time analytics, I run into the following
limitation.

Scenario:
I have to use some APIs from a third-party class. I create the object once in
the driver, pass it to the map function, and use it inside the "call" method.
This works if the class is serializable. If it is not, I am forced to create
the object inside the "call" method itself, which is expensive because the
constructor is heavy. Since this is real-time analytics, "call" is invoked
very frequently. And because the class is part of a third-party jar, making it
Serializable is inconvenient and may not be possible at all.

I understand why Spark needs serialization, but is there a way around this
limitation while keeping serialization intact? Storm, for example, handles
this with a "prepare" method in a bolt, where I can create the object only
once. If Spark has nothing similar, I think it would be a very useful
enhancement, if it is possible.
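
To make concrete what I mean by creating the object only once, here is a
minimal plain-Java sketch with no Spark dependency. All names in it
(HeavyClient, HeavyClientHolder, RecordFunction, CapturingFunction,
HolderFunction) are hypothetical stand-ins, not Spark or third-party APIs. It
shows that a function holding the object as a field cannot be serialized,
while a function that fetches it from a lazily initialized static holder can,
and that the constructor then runs at most once per JVM. Whether this is a
reasonable pattern on a Spark cluster is exactly what I am unsure about.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.atomic.AtomicInteger;

public class LazyHolderSketch {

    /** Hypothetical stand-in for the third-party class: costly to build, not Serializable. */
    static class HeavyClient {
        static final AtomicInteger CONSTRUCTIONS = new AtomicInteger();

        HeavyClient() {
            CONSTRUCTIONS.incrementAndGet(); // imagine an expensive setup here
        }

        String process(String record) {
            return record.toUpperCase();
        }
    }

    /** Initialization-on-demand holder: the JVM builds INSTANCE once, on first access. */
    static class HeavyClientHolder {
        private static class Lazy {
            static final HeavyClient INSTANCE = new HeavyClient();
        }

        static HeavyClient get() {
            return Lazy.INSTANCE;
        }
    }

    /** Hypothetical stand-in for a serializable function type with a "call" method. */
    interface RecordFunction extends Serializable {
        String call(String record);
    }

    /** Keeps the client as a field, so serializing the function fails. */
    static class CapturingFunction implements RecordFunction {
        private final HeavyClient client;

        CapturingFunction(HeavyClient client) {
            this.client = client;
        }

        public String call(String record) {
            return client.process(record);
        }
    }

    /** Holds no reference to the client; looks it up inside call(). */
    static class HolderFunction implements RecordFunction {
        public String call(String record) {
            return HeavyClientHolder.get().process(record);
        }
    }

    /** Returns true if Java serialization accepts the object, false if it is rejected. */
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        RecordFunction viaHolder = new HolderFunction();
        for (String record : new String[] {"a", "b", "c"}) {
            viaHolder.call(record); // the holder constructs HeavyClient on the first call only
        }
        System.out.println("holder function serializable: " + canSerialize(viaHolder));
        System.out.println("capturing function serializable: "
                + canSerialize(new CapturingFunction(new HeavyClient())));
    }
}
```

The holder relies on static initialization, so the object would be shared by
every task running in the same JVM. That is only acceptable if the
third-party class is thread-safe, which I am assuming here.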

Please let me know.

Thanks,
pranay
