Itai & Jens,

Thank you for sharing your thoughts. My requirement is what Jens referred to as "export": getting data out of my topology.
I can clearly see the benefits of moving this functionality into a separate bolt, e.g. to scale it independently of the processing bolts, or to accommodate future changes. The only drawback (if it is one) seems to be the larger number of bolt instances running in the topology. I understand this can be addressed with more hardware and Storm's horizontal scalability. It might also be hard to quantify that overhead precisely, given the different scaling requirements of processing bolts and I/O-bound bolts. Do you see this as a concern?

Thanks,
Hemanth

On Wed, Jan 7, 2015 at 9:39 PM, Jens-U. Mozdzen wrote:
> Hi Hemanth,
>
> Quoting Hemanth Yamijala:
>
>> Hi all,
>>
>> I guess it is common to build topologies where message processing in
>> Storm produces data that must be stored in external stores such as NoSQL
>> DBs or message queues like Kafka.
>>
>> There are two broad approaches to handling this storage:
>>
>> 1) Inline the storage functionality with the processing functionality,
>> i.e. the bolt generating the info to be stored also takes care of storing
>> it.
>> 2) Separate the two and make a downstream bolt responsible for the
>> storage.
>>
>> I just wanted to see whether people on the list think there are advantages
>> to favouring one approach over the other, and whether there are pitfalls
>> to watch for in one case but not the other.
>
> I'd say: it depends ;) In the case of aggregation bolts that persist their
> state, you may want to limit the memory footprint of each bolt instance.
> Implementing an in-memory cache for persisted data is then pretty helpful,
> but it means incorporating persistence access into each bolt.
>
> OTOH, if you plan to "export" data from your topology (which seems to be
> the main focus of your question), separating calculation and "export" into
> separate bolts seems a natural choice to me, especially when you consider
> future changes (i.e. to support a different, or possibly *additional*,
> export path: you can keep the "tuple interface" as it is and simply
> connect different and/or additional export bolts).
>
> Regards,
> Jens
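A minimal, self-contained sketch of approach 2, written in plain Java and deliberately not against the Storm API so that it runs standalone. All class and method names here are hypothetical: `ProcessingStage` plays the processing bolt, `ExportSink` implementations play the export bolts, and `connect` stands in for subscribing a bolt to a stream. It illustrates the point about the "tuple interface": the processing side emits the same fields regardless of who consumes them, so an additional export path is just one more subscriber.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of "processing bolt -> export bolt(s)"; this is NOT the Storm API.
// Tuple, ExportSink, ProcessingStage, KeyValueExport and QueueExport are hypothetical names.
class ExportTopologySketch {

    // The "tuple interface" between processing and export: fixed, named fields.
    record Tuple(String key, long count) {}

    // Anything that ships tuples out of the topology (a NoSQL DB, Kafka, ...).
    interface ExportSink {
        void execute(Tuple tuple);
    }

    // Processing stage: keeps a running count per word and emits the result.
    // It knows nothing about where (or whether) the data gets stored.
    static final class ProcessingStage {
        private final Map<String, Long> counts = new HashMap<>();
        private final List<ExportSink> subscribers = new ArrayList<>();

        // Stands in for subscribing a downstream bolt to this stage's stream.
        void connect(ExportSink sink) {
            subscribers.add(sink);
        }

        void process(String word) {
            long count = counts.merge(word, 1L, Long::sum);
            Tuple out = new Tuple(word, count);
            // Every subscribed export sink receives the same tuple.
            for (ExportSink sink : subscribers) {
                sink.execute(out);
            }
        }
    }

    // Hypothetical export path #1: upserts the latest count per key,
    // standing in for a write to a key-value / NoSQL store.
    static final class KeyValueExport implements ExportSink {
        final Map<String, Long> store = new TreeMap<>();

        @Override
        public void execute(Tuple tuple) {
            store.put(tuple.key(), tuple.count());
        }
    }

    // Hypothetical export path #2: appends one message per tuple,
    // standing in for a producer writing to a queue such as Kafka.
    static final class QueueExport implements ExportSink {
        final List<String> messages = new ArrayList<>();

        @Override
        public void execute(Tuple tuple) {
            messages.add(tuple.key() + "=" + tuple.count());
        }
    }

    public static void main(String[] args) {
        ProcessingStage counter = new ProcessingStage();
        KeyValueExport kv = new KeyValueExport();
        QueueExport queue = new QueueExport();

        counter.connect(kv);    // the original export path
        counter.connect(queue); // an additional path, added without touching ProcessingStage

        for (String word : new String[] {"storm", "kafka", "storm"}) {
            counter.process(word);
        }
        System.out.println(kv.store);       // {kafka=1, storm=2}
        System.out.println(queue.messages); // [storm=1, kafka=1, storm=2]
    }
}
```

In an actual Storm topology, each `connect` call would roughly correspond to declaring an export bolt with `TopologyBuilder.setBolt(...)` and subscribing it to the processing bolt's stream with a grouping. Because each `setBolt` call takes its own parallelism hint, the I/O-bound export bolts can be sized separately from the processing bolts. That is the upside of the extra bolt instances discussed in this thread.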
