Hi Hemanth,
Quoting Hemanth Yamijala <[email protected]>:
Hi all,
I guess it is common to build topologies where message processing in
Storm results in data that should be stored in external stores like
NoSQL DBs or message queues like Kafka.
There are two broad approaches to handle this storage:
1) Inline the storage functionality with the processing
functionality - i.e. the bolt generating the info to be stored also
takes care of storing it.
2) Separate out the two and make a downstream bolt responsible for
the storage.
Just wanted to see if people on the list think there are
advantages to favouring one approach over the other, or any pitfalls
to watch out for in one case versus the other.
I'd say: it depends ;) With aggregation bolts that persist their
state, you may want to limit the memory footprint of each bolt
instance. An in-memory cache in front of the persisted data is
pretty helpful for that, but it means building persistence access
into each bolt.
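
To make the caching idea a bit more concrete, here is a rough sketch in plain Java (no Storm API involved). It assumes a simple per-key counter as the aggregation state; CountStore, MapStore and CachedCounter are names I made up for illustration, and the write-through policy is just one possible choice, not the only way to do it.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the external store (NoSQL DB etc.).
interface CountStore {
    Long load(String key);              // null if the key was never stored
    void save(String key, long value);
}

// In-memory stand-in so the sketch runs without a real database.
final class MapStore implements CountStore {
    final Map<String, Long> data = new HashMap<>();
    public Long load(String key) { return data.get(key); }
    public void save(String key, long value) { data.put(key, value); }
}

// Aggregation state with a bounded LRU cache in front of the store.
final class CachedCounter {
    private final CountStore store;
    private final LinkedHashMap<String, Long> cache;

    CachedCounter(CountStore store, final int maxEntries) {
        this.store = store;
        // accessOrder=true: the eldest entry is the least recently used one.
        this.cache = new LinkedHashMap<String, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Would be called once per incoming tuple; returns the new count.
    long increment(String key) {
        Long current = cache.get(key);
        if (current == null) {
            current = store.load(key);   // cache miss: fall back to the store
            if (current == null) {
                current = 0L;
            }
        }
        long updated = current + 1;
        store.save(key, updated);        // write-through, so eviction loses nothing
        cache.put(key, updated);
        return updated;
    }

    int cachedEntries() { return cache.size(); }
}
```

The cache size caps the memory used per bolt instance, while the store keeps the full state. The price is the one mentioned above: the aggregating code has to know about the store.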
OTOH, if you plan to "export" data from your topology (which seems to
be the main focus of your question), separating calculation and
"export" into separate bolts seems a natural choice to me - especially
when you consider future changes (e.g. supporting a different or
possibly *additional* export path): you can keep the "tuple
interface" as it is and simply connect different and/or additional
export bolts.
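
The separation can be sketched like this, again in plain Java rather than real bolts. WordCount plays the role of the "tuple interface", Calculator stands in for the calculating bolt and each Exporter for one export bolt; all of these names are hypothetical and only meant to show the shape of the idea.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical "tuple interface" between calculation and export.
final class WordCount {
    final String word;
    final long count;
    WordCount(String word, long count) {
        this.word = word;
        this.count = count;
    }
}

// One implementation per export path; stands in for an export bolt.
interface Exporter {
    void export(WordCount tuple);
}

// Example export path that just collects formatted lines in memory.
final class LineExporter implements Exporter {
    final List<String> lines = new ArrayList<>();
    private final String prefix;
    LineExporter(String prefix) { this.prefix = prefix; }
    public void export(WordCount t) {
        lines.add(prefix + t.word + "=" + t.count);
    }
}

// Stands in for the calculating bolt: it knows nothing about export targets.
final class Calculator {
    private final List<Exporter> downstream = new ArrayList<>();

    // Adding an export path does not touch the calculation code.
    void connect(Exporter exporter) { downstream.add(exporter); }

    void process(String word, long count) {
        WordCount tuple = new WordCount(word, count);
        for (Exporter exporter : downstream) {
            exporter.export(tuple);
        }
    }
}
```

Connecting a second LineExporter (or any other Exporter) leaves Calculator and WordCount unchanged. In a real topology the wiring would happen in the topology definition instead of a connect() call, with each export bolt subscribing to the calculating bolt's stream.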
Regards,
Jens