You should really only be concerned with serialization when it comes to the 
values you use in tuples. If you are using all primitives, then you shouldn't 
have anything to worry about. If you are passing custom objects around in 
tuples, then you should consider registering a custom ryo serializer for those 
objects, and keep them immutable if possible.

In terms of instance variables of components, if they are not serializable, 
then they should be marked as transient and initialized in the prepare() method 
of the component.

So if you have "heavy" resources like a database connection, guava cache, etc., 
initialize those in the prepare() method and keep everything as stateless and 
idempotent as possible for your use case.

Keep in mind that the Topology object created when you call the build()method  
is going to be serialized, including all the components in the topology graph.

A Storm topology is really only a static, serializable data structure (Thrift) 
containing instructions telling Storm how to deploy and run it in a cluster. 
Only when it is deployed (and serialized in the process) and initialized (i.e. 
prepare() and other life cycle methods are called on components) does it really 
do anything in terms of message processing.

Hope this helps.

-Taylor

> On Jun 25, 2014, at 8:38 PM, "Cody A. Ray" <[email protected]> wrote:
> 
> I have a serialization question.
> 
> As long as your tuple values are all primitive/built-in types (strings, byte 
> arrays, longs, etc), is Java serialization only used at (re)build time (i.e., 
> when the nimbus distributes the code of your topology to the supervisors, and 
> when they create worker jvms)?
> 
> Another way of asking: will using a heavy library (e.g., guava or apache 
> commons) in your trident functions (but not tuple types) incur a runtime 
> overhead (other than when you lose a worker/supervisor)?
> 
> -Cody
> 
> -- 
> Cody A. Ray, LEED AP
> [email protected]
> 215.501.7891

Reply via email to