Snapshots are taken by the StateMachine. If it is implemented correctly, the snapshots can be used to bring up new nodes. The considerations to be aware of are that (1) a snapshot should not contain information specific to a particular machine, (2) each snapshot should have a checksum for integrity checking, and (3) a snapshot should be as small as possible.

Also, the snapshot operation must be synchronous -- it should capture the state at a particular log index and not allow the StateMachine to mutate while the snapshot is being taken. This can be tricky when the snapshot is large. For example, when HDFS takes a snapshot, it holds a lock to pause client operations and dumps the memory to disk. If the state is large, clients may time out and fail. HDFS solves the problem by taking snapshots on a secondary/standby machine.
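To make points (2) and the log-index requirement concrete, here is a minimal sketch in plain Java. It is not the Ratis StateMachine API: the class KvStateMachineSketch, its Snapshot type, and the serialization format are all made up for illustration. It assumes the state is small enough to copy in memory, so the lock is held only for the copy and the serialization and checksum happen afterwards.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical key-value state machine; an illustration only, not the Ratis API. */
class KvStateMachineSketch {
  /** Snapshot bytes, the log index they correspond to, and a SHA-256 checksum. */
  static final class Snapshot {
    final long lastAppliedIndex;
    final byte[] data;
    final byte[] sha256;

    Snapshot(long lastAppliedIndex, byte[] data, byte[] sha256) {
      this.lastAppliedIndex = lastAppliedIndex;
      this.data = data;
      this.sha256 = sha256;
    }
  }

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final TreeMap<String, String> state = new TreeMap<>();
  private long lastAppliedIndex = -1;

  /** Applies one committed log entry; writers are blocked while a snapshot copy is made. */
  void apply(long index, String key, String value) {
    lock.writeLock().lock();
    try {
      state.put(key, value);
      lastAppliedIndex = index;
    } finally {
      lock.writeLock().unlock();
    }
  }

  String get(String key) {
    lock.readLock().lock();
    try {
      return state.get(key);
    } finally {
      lock.readLock().unlock();
    }
  }

  long getLastAppliedIndex() {
    lock.readLock().lock();
    try {
      return lastAppliedIndex;
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Captures the state at exactly one log index, then serializes outside the lock. */
  Snapshot takeSnapshot() {
    final TreeMap<String, String> copy;
    final long index;
    lock.readLock().lock();
    try {
      copy = new TreeMap<>(state); // state and index are read under the same lock
      index = lastAppliedIndex;
    } finally {
      lock.readLock().unlock();
    }
    final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeLong(index);
      out.writeInt(copy.size());
      for (Map.Entry<String, String> e : copy.entrySet()) {
        out.writeUTF(e.getKey());
        out.writeUTF(e.getValue());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    final byte[] data = bytes.toByteArray();
    return new Snapshot(index, data, sha256(data));
  }

  /** Builds a new node's state from a snapshot, rejecting it if the checksum differs. */
  static KvStateMachineSketch restore(Snapshot snapshot) {
    if (!MessageDigest.isEqual(sha256(snapshot.data), snapshot.sha256)) {
      throw new IllegalArgumentException("Snapshot checksum mismatch");
    }
    final KvStateMachineSketch sm = new KvStateMachineSketch();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(snapshot.data))) {
      sm.lastAppliedIndex = in.readLong();
      final int size = in.readInt();
      for (int i = 0; i < size; i++) {
        final String key = in.readUTF();
        final String value = in.readUTF();
        sm.state.put(key, value);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sm;
  }

  private static byte[] sha256(byte[] data) {
    try {
      return MessageDigest.getInstance("SHA-256").digest(data);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 is not available", e);
    }
  }
}
```

Note that the snapshot bytes contain only the applied state and its log index, nothing about the machine that produced them, so any node can restore from them. If the state is too large to copy while writers are blocked, this approach hits the same timeout problem described above, and taking the snapshot on a node that is not serving clients may be the better option.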
Tsz-Wo

On Wed, Jul 26, 2023 at 6:27 AM Asad Awadia <[email protected]> wrote:

> If the leader does a snapshot and uploads it to S3
>
> Are there any situations in which the snapshot cannot be used or should
> not be used when starting up a new node?
>
> Any considerations we should be aware of and take care of when using
> snapshots to bring up new nodes?
>
