You should really only be concerned with serialization when it comes to the values you use in tuples. If you are using all primitives, then you shouldn't have anything to worry about. If you are passing custom objects around in tuples, then you should consider registering a custom ryo serializer for those objects, and keep them immutable if possible.
In terms of instance variables of components, if they are not serializable, then they should be marked as transient and initialized in the prepare() method of the component. So if you have "heavy" resources like a database connection, guava cache, etc., initialize those in the prepare() method and keep everything as stateless and idempotent as possible for your use case. Keep in mind that the Topology object created when you call the build()method is going to be serialized, including all the components in the topology graph. A Storm topology is really only a static, serializable data structure (Thrift) containing instructions telling Storm how to deploy and run it in a cluster. Only when it is deployed (and serialized in the process) and initialized (i.e. prepare() and other life cycle methods are called on components) does it really do anything in terms of message processing. Hope this helps. -Taylor > On Jun 25, 2014, at 8:38 PM, "Cody A. Ray" <[email protected]> wrote: > > I have a serialization question. > > As long as your tuple values are all primitive/built-in types (strings, byte > arrays, longs, etc), is Java serialization only used at (re)build time (i.e., > when the nimbus distributes the code of your topology to the supervisors, and > when they create worker jvms)? > > Another way of asking: will using a heavy library (e.g., guava or apache > commons) in your trident functions (but not tuple types) incur a runtime > overhead (other than when you lose a worker/supervisor)? > > -Cody > > -- > Cody A. Ray, LEED AP > [email protected] > 215.501.7891
