Hi Hemanth,

Quoting Hemanth Yamijala <[email protected]>:
Hi all,

I guess it is common to build topologies where message processing in Storm results in data that should be stored in external stores such as NoSQL DBs or message queues such as Kafka.

There are two broad approaches to handle this storage:

1) Inline the storage functionality with the processing functionality, i.e. the bolt generating the info to be stored also takes care of storing it.
2) Separate out the two and make a downstream bolt responsible for the storage.

Just wanted to see whether people on the list think there are advantages that favour one approach over the other, or any pitfalls to take care of in either case.

I'd say: it depends ;) For aggregation bolts that persist their state, you may want to limit the memory footprint of each bolt instance. A bounded in-memory cache in front of the persisted data is pretty helpful for that, but it means each bolt has to access the persistence layer itself.
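
To make the first case concrete, here is a minimal, framework-free sketch in Python. It is not Storm API code: `DictStore`, `CountingBolt` and `max_entries` are hypothetical names invented for illustration, and the "store" is just a dict standing in for a real database. It only shows why a bounded cache pulls persistence access into the bolt itself.

```python
from collections import OrderedDict


class DictStore:
    """Hypothetical stand-in for an external store (e.g. a NoSQL DB)."""

    def __init__(self):
        self.data = {}

    def get(self, key, default=0):
        return self.data.get(key, default)

    def put(self, key, value):
        self.data[key] = value


class CountingBolt:
    """Model of an aggregation bolt with a bounded write-back cache.

    Assumption: counts per key are the aggregated state. Only
    max_entries keys are held in memory; the rest live in the store.
    """

    def __init__(self, store, max_entries):
        self.store = store
        self.max_entries = max_entries
        self.cache = OrderedDict()  # least recently used key comes first

    def execute(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as recently used
        else:
            # Cache miss: the bolt itself has to read the persisted state.
            self.cache[key] = self.store.get(key)
        self.cache[key] += 1
        if len(self.cache) > self.max_entries:
            # Evict the least recently used entry and persist it.
            old_key, old_count = self.cache.popitem(last=False)
            self.store.put(old_key, old_count)

    def flush(self):
        # Persist whatever is still cached; until then it exists only in memory.
        for key, count in self.cache.items():
            self.store.put(key, count)


store = DictStore()
bolt = CountingBolt(store, max_entries=2)
for word in ["a", "b", "a", "c", "b", "a"]:
    bolt.execute(word)
bolt.flush()
```

Both the reads on a cache miss and the writes on eviction happen inside the bolt, which is the coupling I mean.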

OTOH, if you plan to "export" data from your topology (which seems to be the main focus of your question), splitting calculation and "export" into separate bolts seems a natural choice to me, especially when you consider future changes (e.g. supporting a different or possibly *additional* export path): you can keep the "tuple interface" as it is and simply connect different and/or additional export bolts.
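
A rough sketch of that second case, again in plain Python rather than real Storm code. `WordLengthBolt`, `ListSink` and `subscribe` are hypothetical names; in an actual topology the wiring would be done by the topology builder, not by the bolt. The point is only that the calculation bolt emits a fixed tuple shape and never learns where the data ends up.

```python
class ListSink:
    """Hypothetical export bolt; a list stands in for a DB or Kafka writer."""

    def __init__(self):
        self.received = []

    def execute(self, tup):
        self.received.append(tup)


class WordLengthBolt:
    """Calculation bolt: emits (word, length) tuples, knows nothing about storage."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, bolt):
        # Stand-in for connecting a downstream bolt in the topology.
        self.subscribers.append(bolt)

    def execute(self, word):
        tup = (word, len(word))  # the "tuple interface" stays fixed
        for bolt in self.subscribers:
            bolt.execute(tup)


compute = WordLengthBolt()
db_export = ListSink()
compute.subscribe(db_export)
compute.execute("storm")

# Later: add a second export path without touching WordLengthBolt.
queue_export = ListSink()
compute.subscribe(queue_export)
compute.execute("kafka")
```

Adding `queue_export` required no change to the calculation code, only one more connection.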

Regards,
Jens
