Re: Global Topology Cache

Bipin Prasad via user Sun, 07 Aug 2022 10:16:47 -0700

Does the Cassandra update trigger use a class that generates events for a
Storm topology spout? If so, those events could be routed with fieldsGrouping
instead of the shuffleGrouping that your description suggests you are using.
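Here is a minimal sketch of why key-based routing helps. It uses plain Java with no Storm dependency. Each simulated executor owns a private map, and both updates and lookups for a key go to the task chosen by hashing that key, so they always meet in the same cache. Storm's fieldsGrouping routes tuples by a hash of the grouping fields in a similar spirit, although the exact hash it uses may differ. The names `PartitionedCache`, `taskFor`, `update` and `lookup` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical, Storm-free model of a key-partitioned cache.
// Each element of taskCaches stands in for the private memory of one executor.
class PartitionedCache<K, V> {
    private final List<Map<K, V>> taskCaches = new ArrayList<>();

    PartitionedCache(int numTasks) {
        if (numTasks <= 0) {
            throw new IllegalArgumentException("numTasks must be positive");
        }
        for (int i = 0; i < numTasks; i++) {
            taskCaches.add(new HashMap<>());
        }
    }

    // Route by hashing the key: the same key always lands on the same task.
    // floorMod keeps the index non-negative even when hashCode() is negative.
    int taskFor(K key) {
        return Math.floorMod(Objects.hashCode(key), taskCaches.size());
    }

    // An update for a key only touches the task that owns that key.
    void update(K key, V value) {
        taskCaches.get(taskFor(key)).put(key, value);
    }

    // A lookup goes to the same owning task, so it sees the latest update.
    V lookup(K key) {
        return taskCaches.get(taskFor(key)).get(key);
    }

    // Number of entries held by one simulated task.
    int sizeOfTask(int taskIndex) {
        return taskCaches.get(taskIndex).size();
    }
}
```

In a real topology the equivalent wiring might look like `builder.setBolt("cache", new CacheBolt(), 4).fieldsGrouping("updates", new Fields("key"))`. The component ids, `CacheBolt`, and the `key` field are placeholders. The trade-off is that queries must be routed on the same field too, because no other executor holds that entry.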


A very minimal version of the topology builder code would be useful for the
discussion.

Sent from Yahoo Mail for iPhone


On Sunday, August 7, 2022, 9:32 AM, Nadav Glickman 
<[email protected]> wrote:

Thanks for the quick answer!
Our DB is Cassandra. It's updated by Storm as well as by another service, and
it's also read by both Storm and the other service. One of the main issues we
are facing is that when the service updates a table, Storm detects it in one
of the executors, but the others remain oblivious to the change because the
cache is not shared between them. The cache would be updated and read quite
frequently and should be able to handle roughly 50-100k entries, each being a
complex Java class. Naturally we're looking for minimal network overhead, which
is why a caching solution that is part of the topology would be ideal.
How would you update all the executors but query only one of them? Is there a
way to partition data between them, similar to bolt grouping but for the
entire executor?
On Sun, Aug 7, 2022 at 6:59 PM Bipin Prasad via user <[email protected]> 
wrote:

Hello Nadav,
Is the database updated by some other data flow other that this topology? How 
are the database changes detected and the “cache” updated? The size of the 
cached data and update volume will also influence the design. 
Without knowing some crucial details of data, size, update frequency, natural 
partitioning, network speed, etc it is hard to give a general answer.
But assuming you want the Storm topology itself to serve as the cache provider
and the cache is small, one “possible” way to do this would be to have updates
hit all executors, while queries hit only one of the “many” executors.
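The following is a minimal, Storm-free Java sketch of "updates hit all executors, queries hit one". It assumes every executor keeps a full replica. The update path loops over all replicas, which mirrors what allGrouping does with a tuple. The query path picks one replica at random, in the spirit of shuffleGrouping. `ReplicatedCache` and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical, Storm-free model of "updates hit all executors,
// queries hit one": every simulated executor keeps a full replica.
class ReplicatedCache<K, V> {
    private final List<Map<K, V>> replicas = new ArrayList<>();
    private final Random random;

    ReplicatedCache(int numExecutors, long seed) {
        if (numExecutors <= 0) {
            throw new IllegalArgumentException("numExecutors must be positive");
        }
        for (int i = 0; i < numExecutors; i++) {
            replicas.add(new HashMap<>());
        }
        this.random = new Random(seed); // seeded so runs are reproducible
    }

    // Broadcast: apply the update to every replica, like a stream
    // consumed with allGrouping.
    void update(K key, V value) {
        for (Map<K, V> replica : replicas) {
            replica.put(key, value);
        }
    }

    // Any single replica can answer; pick one at random, like shuffleGrouping.
    V query(K key) {
        return replicas.get(random.nextInt(replicas.size())).get(key);
    }

    // Direct access to one replica, for inspection.
    V queryReplica(int index, K key) {
        return replicas.get(index).get(key);
    }
}
```

In Storm terms this would roughly be a single bolt declared with `.allGrouping("updates").shuffleGrouping("queries")`, where both component ids are placeholders. Memory use and update work grow with the number of executors, which is why this approach only suits a small cache. Updates and queries also arrive on different streams, so a query may reach a replica before an in-flight update does.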


On Sunday, August 7, 2022, 8:25 AM, Nadav Glickman 
<[email protected]> wrote:

Hi all,
We're looking for a caching solution to cache data read from the DB and make
it available to the entire topology. We need the data to be up to date and the
same for all the bolts, so we can't have the same cache split among different
executors. Ideally we can have some in-memory solution within Storm. We tried an
enum and singletons, but they aren't shared between executors.
I know distributed caching DBs like memcached and Redis are viable options, but 
I'd really like to find a solution that won't require another machine and 
another piece of technology in our stack.
Looking forward to your ideas!
Thanks,
Nadav
