Hi there,

We're looking for guidance on patterns and implementation for a server that involves two Raft clusters: a primary and a secondary.
The primary cluster is responsible for front-line communications and managing database state. The secondary cluster is responsible for offline processing, meaning anything that's CPU- or I/O-intensive. The primary logs user requests and delegates work to the secondary, which reports back after each step of the processing chain and waits for the primary to update state and trigger the next step.

This design has led to a few problems in our implementation:

- The client is instantiated as part of the state machine, since messages are sent between primary and secondary throughout the process. Because the client is created before a leader is established, it seems to have trouble communicating with the leader. To work around this, we've created a wrapper around the client that refreshes itself whenever the leader changes.
- In both the primary and the secondary, only the leader triggers a callback to the other cluster, since otherwise every node would send its own callback and the number of messages would grow exponentially.
- For reasons we haven't identified, any rebuild causes the server to hang indefinitely, with no logging to indicate a crash (e.g., no exceptions caught by StateMachineUpdater).

We're still at a relatively early stage, though we have a working product that uses Ratis for distributed consensus in the back end. Before we get too deep into the implementation, we'd love some guidance (or warnings!) on how to manage a primary/secondary cluster pair and keep it healthy. Has anyone built something like this before?

All the best,
Adam Zionts
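A minimal sketch of the two workarounds described in the post: a wrapper that rebuilds its client whenever the other cluster's leader changes, and a gate so that only the local leader sends cross-cluster callbacks. It is plain Java with no Ratis dependency. `PeerClient`, `LeaderAwareClient`, `LeaderGatedNotifier`, `onLeaderChanged`, and `setLocalLeader` are hypothetical names. Wiring them to the state machine's real leader-change notifications is assumed and not shown.

```java
import java.util.Objects;
import java.util.function.Function;

/** Hypothetical sketch: none of these types are Ratis APIs. */
public final class CrossClusterSketch {

    /** Hypothetical stand-in for a client that talks to the *other* cluster. */
    interface PeerClient extends AutoCloseable {
        String send(String message);

        @Override
        void close(); // narrowed: no checked exception
    }

    /** Rebuilds the underlying client whenever the peer cluster's leader changes. */
    static final class LeaderAwareClient {
        private final Function<String, PeerClient> factory; // leaderId -> new client
        private String leaderId;            // guarded by this
        private volatile PeerClient client; // read without locking in send()

        LeaderAwareClient(Function<String, PeerClient> factory) {
            this.factory = Objects.requireNonNull(factory);
        }

        /** Assumed to be called from a leader-change notification for the peer cluster. */
        synchronized void onLeaderChanged(String newLeaderId) {
            if (Objects.equals(leaderId, newLeaderId)) {
                return; // same leader: keep the existing client
            }
            leaderId = newLeaderId;
            PeerClient old = client;
            client = factory.apply(newLeaderId);
            if (old != null) {
                old.close(); // release the client bound to the stale leader
            }
        }

        String send(String message) {
            PeerClient c = client;
            if (c == null) {
                throw new IllegalStateException("peer cluster leader not known yet");
            }
            return c.send(message);
        }
    }

    /** Sends a callback to the peer cluster only if this node leads its own cluster. */
    static final class LeaderGatedNotifier {
        private final LeaderAwareClient peer;
        private volatile boolean isLocalLeader;

        LeaderGatedNotifier(LeaderAwareClient peer) {
            this.peer = Objects.requireNonNull(peer);
        }

        /** Assumed to be updated from the local cluster's leader-change notification. */
        void setLocalLeader(boolean leader) {
            isLocalLeader = leader;
        }

        /** Returns true if sent; followers apply the same log entry but stay silent. */
        boolean notifyPeer(String message) {
            if (!isLocalLeader) {
                return false;
            }
            peer.send(message);
            return true;
        }
    }

    public static void main(String[] args) {
        // Fake factory: each client tags replies with the leader it was built for.
        Function<String, PeerClient> factory = leader -> new PeerClient() {
            public String send(String m) { return leader + ":" + m; }
            public void close() {}
        };
        LeaderAwareClient peer = new LeaderAwareClient(factory);
        peer.onLeaderChanged("s1");
        System.out.println(peer.send("step-1")); // s1:step-1
        peer.onLeaderChanged("s2");
        System.out.println(peer.send("step-2")); // s2:step-2

        LeaderGatedNotifier notifier = new LeaderGatedNotifier(peer);
        System.out.println(notifier.notifyPeer("done")); // false (not leader)
        notifier.setLocalLeader(true);
        System.out.println(notifier.notifyPeer("done")); // true
    }
}
```

Gating on leadership alone has one caveat. During a leader change, a newly elected leader may re-send a callback that the old leader already sent. The receiving cluster should therefore treat callbacks as idempotent, for example by deduplicating on a request ID. That deduplication scheme is an assumption and is not shown in the sketch.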
