Re: Storm patterns vis-a-vis external data storage

Jens-U. Mozdzen Wed, 07 Jan 2015 08:11:36 -0800

Hi Hemanth,

Quoting Hemanth Yamijala <[email protected]>:

Hi all,
I guess it is common to build topologies where message processing in Storm results in data that should be stored in external stores like NoSQL DBs or message queues like Kafka.
There are two broad approaches to handle this storage:
1) Inline the storage functionality with the processing functionality - i.e. the bolt generating the info to be stored also takes care of storing it.
2) Separate out the two and make a downstream bolt responsible for the storage.
Just wanted to see whether people on the list think there are advantages that favour one approach over the other, or any pitfalls to take care of in one case versus the other.

I'd say: it depends ;) In the case of aggregation bolts that persist their state, you may want to limit the memory footprint of each bolt instance. Implementing an in-memory cache in front of the persisted data is therefore pretty helpful, but it means incorporating persistence access into each bolt.
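
A minimal sketch of that per-bolt cache idea, in plain Python with no Storm API involved. Everything here is an assumption made for illustration: `DictStore` is a hypothetical stand-in for the external store, `CountingAggregator` is a hypothetical aggregation step, and write-through with least-recently-used eviction is just one possible policy.

```python
from collections import OrderedDict


class DictStore:
    """Hypothetical stand-in for an external store (e.g. a NoSQL DB)."""

    def __init__(self):
        self.data = {}
        self.reads = 0
        self.writes = 0

    def get(self, key):
        self.reads += 1
        return self.data.get(key, 0)

    def put(self, key, value):
        self.writes += 1
        self.data[key] = value


class CountingAggregator:
    """Counts keys; keeps at most max_entries counts in memory."""

    def __init__(self, store, max_entries):
        self.store = store
        self.max_entries = max_entries
        self.cache = OrderedDict()  # key -> count, least recently used first

    def process(self, key):
        if key in self.cache:
            # Cache hit: mark the key as most recently used.
            self.cache.move_to_end(key)
            count = self.cache[key]
        else:
            # Cache miss: fall back to the persisted state.
            count = self.store.get(key)
        count += 1
        self.cache[key] = count
        # Write-through: the store always holds the latest count,
        # so evicting a cache entry never loses data.
        self.store.put(key, count)
        if len(self.cache) > self.max_entries:
            self.cache.popitem(last=False)  # evict least recently used
        return count
```

The memory bound comes from `max_entries`; the price is that the aggregation code itself has to talk to the store on every miss and every update.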

OTOH, if you plan to "export" data from your topology (which seems to be the main focus of your question), separating calculation and "export" into separate bolts seems a natural choice to me - especially when you consider future changes (e.g. supporting a different or possibly *additional* export path): you can keep the "tuple interface" as it is and simply connect different and/or additional export bolts.
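
The separation can be sketched in the same dependency-free way. This is plain Python, not Storm code: the class names are hypothetical, and `subscribe` only mimics wiring a downstream bolt to a stream in the topology definition. The point is that the calculation step emits a fixed tuple shape and knows nothing about where the data ends up.

```python
class WordLengthBolt:
    """Calculation only: emits (word, length) tuples to its subscribers."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, bolt):
        # Stand-in for connecting a downstream bolt in the topology.
        self.subscribers.append(bolt)

    def process(self, word):
        tup = (word, len(word))  # the "tuple interface" stays fixed
        for bolt in self.subscribers:
            bolt.process(tup)


class ListExportBolt:
    """Hypothetical export path 1: collects rows, e.g. for a DB write."""

    def __init__(self):
        self.rows = []

    def process(self, tup):
        self.rows.append(tup)


class LineExportBolt:
    """Hypothetical export path 2: formats lines, e.g. for a queue."""

    def __init__(self):
        self.lines = []

    def process(self, tup):
        word, length = tup
        self.lines.append("%s=%d" % (word, length))


calc = WordLengthBolt()
db_export = ListExportBolt()
calc.subscribe(db_export)
calc.process("storm")

# Later: add a second export path without touching WordLengthBolt.
queue_export = LineExportBolt()
calc.subscribe(queue_export)
calc.process("bolt")
```

Adding `queue_export` required no change to the calculation class, which is the benefit of keeping the export in its own bolt.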


Regards,
Jens
