Thanks for the quick answer! Our DB is Cassandra. It's updated by Storm as well as by another service, and it's also read by both Storm and the other service. One of the main issues we are facing is that when a table is updated by the service, Storm detects the change in one of the executors, but the others remain oblivious to it because the cache is not shared between them. The cache would be updated and read quite frequently and should be able to handle roughly 50-100k entries, each an instance of a complex Java class. Naturally we're looking for minimal network overhead, which is why a caching solution that is part of the topology would be ideal.
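A minimal sketch of the failure mode above and one way around it, assuming each executor keeps its own in-memory copy of the cache. It simulates executors with plain Java collections rather than Storm classes; `BroadcastCacheDemo`, `updateOne`, `broadcastUpdate` and `read` are hypothetical names. Delivering every change to every copy is roughly what Storm's all grouping (`allGrouping` on the cache bolt's input) does for an update stream. The LRU bound via `LinkedHashMap.removeEldestEntry` is an added assumption to cap memory near the stated 50-100k entries, not something from this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: simulates N executors, each holding its own bounded
// LRU cache. No Storm classes are used; broadcasting an update to every copy
// only mimics what an all grouping on an update stream would do.
class BroadcastCacheDemo {

    // Access-ordered LinkedHashMap that evicts the least recently used entry
    // once the size exceeds maxEntries.
    static <K, V> Map<K, V> newLruCache(int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // One cache instance per simulated executor (not shared between them).
    final List<Map<String, String>> executorCaches = new ArrayList<>();

    BroadcastCacheDemo(int executors, int maxEntriesPerExecutor) {
        for (int i = 0; i < executors; i++) {
            executorCaches.add(BroadcastCacheDemo.<String, String>newLruCache(maxEntriesPerExecutor));
        }
    }

    // Problem case: only the executor that noticed the change updates its copy,
    // so every other executor keeps serving the stale value.
    void updateOne(int executor, String key, String value) {
        executorCaches.get(executor).put(key, value);
    }

    // Fix: apply the same update to every executor's copy.
    void broadcastUpdate(String key, String value) {
        for (Map<String, String> cache : executorCaches) {
            cache.put(key, value);
        }
    }

    // Reads hit only the given executor's local copy (no network hop).
    String read(int executor, String key) {
        return executorCaches.get(executor).get(key);
    }
}
```

With full replication every cache instance holds all entries, so memory grows with the number of executors; that is the price paid for local, network-free reads.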
How would you update all the executors but query only one of them? Is there a way to partition data between them, similar to bolt grouping but for the entire executor?

On Sun, Aug 7, 2022 at 6:59 PM Bipin Prasad via user <[email protected]> wrote:
> Hello Nadav,
>
> Is the database updated by any data flow other than this topology?
> How are the database changes detected and the “cache” updated? The size of
> the cached data and the update volume will also influence the design.
>
> Without knowing some crucial details (data, size, update frequency,
> natural partitioning, network speed, etc.) it is hard to give a general
> answer.
>
> But assuming that you want the Storm topology itself to serve as the
> cache provider, and that the cache is small, one “possible” way to do this
> would be to have the updates hit all executors, but queries hit only one of
> the “many” executors.
>
> On Sunday, August 7, 2022, 8:25 AM, Nadav Glickman
> <[email protected]> wrote:
>
> Hi all,
>
> We're looking for a caching solution to cache data read from the DB and
> have it available to the entire topology.
> We need the data to be up to date and the same for all the bolts, so we
> can't have the same cache split among different executors.
> Ideally we can have some in-memory solution within Storm. We tried an
> enum and singletons, but they aren't shared between executors.
>
> I know distributed caching DBs like memcached and Redis are viable
> options, but I'd really like to find a solution that won't require another
> machine and another piece of technology in our stack.
>
> Looking forward to your ideas!
> Thanks,
> Nadav
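On the partitioning question, a minimal sketch, again simulated with plain Java collections rather than Storm classes (`PartitionedCacheDemo`, `ownerOf`, `update` and `query` are hypothetical names). Routing both updates and queries by a hash of the key reproduces the property Storm's fields grouping gives per task: the same key always reaches the same task, so each one holds only its own shard instead of a full copy. In a real topology this would roughly correspond to declaring the cache bolt's inputs with `fieldsGrouping(..., new Fields("key"))` on both streams, or, for the variant suggested in the quoted reply, `allGrouping` on the update stream and `fieldsGrouping` on the query stream. The `"key"` field name is an assumption, and Storm computes its own task index rather than the `floorMod` used here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a key-partitioned cache across N simulated executors.
// Updates and queries for a key are routed to the same shard, mimicking the
// "same key, same task" property of a fields grouping on that key.
class PartitionedCacheDemo {
    // One shard per simulated executor; each holds only the keys it owns.
    private final List<Map<String, String>> shards = new ArrayList<>();

    PartitionedCacheDemo(int executors) {
        for (int i = 0; i < executors; i++) {
            shards.add(new HashMap<>());
        }
    }

    // Illustrative routing only: floorMod keeps the index non-negative even
    // when hashCode() is negative.
    int ownerOf(String key) {
        return Math.floorMod(key.hashCode(), shards.size());
    }

    // Only the owning executor stores (or overwrites) the entry.
    void update(String key, String value) {
        shards.get(ownerOf(key)).put(key, value);
    }

    // The query goes to the same owner, so it always sees the latest update.
    String query(String key) {
        return shards.get(ownerOf(key)).get(key);
    }

    int shardSize(int executor) {
        return shards.get(executor).size();
    }
}
```

The trade-off versus full replication: each entry is stored once, so the 50-100k entries are spread across executors, but any bolt needing a cached value must send its query through the cache bolt (one stream hop) instead of reading a local copy.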
