Thanks Ilya,
1. "So all nodes will know when node A begins hosting that partition as primary": how is that consensus achieved? Will it result in a partition map exchange and a new topology version?
2. What I actually meant is that it is impossible to know when node A is fully caught up to node B unless you stop all the writes to node B while node A is catching up. So how does Ignite know that it is safe to make A primary again?

On Mon, Sep 17, 2018 at 8:48 AM Ilya Kasnacheev <[email protected]> wrote:

> Hello!
>
> Apache Ignite is NOT "eventually consistent", if that is what you are asking.
> Apache Ignite is strongly consistent. It has a discovery ring (or a discovery
> star with ZooKeeper) which allows messages to be sent to and acknowledged by
> all nodes.
>
> So all nodes will know when node A begins hosting that partition as
> primary.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> Mon, 17 Sep 2018 at 15:45, eugene miretsky <[email protected]>:
>
>> How is "finish syncing" defined? Since it is a distributed system, there is
>> no way to guarantee that node A is 100% caught up to node B. In Kafka there
>> is a replica.lag.time.max.ms setting; is there something similar in
>> Ignite?
>>
>>
>> On Mon, Sep 17, 2018 at 8:37 AM Ilya Kasnacheev <[email protected]> wrote:
>>
>>> Hello!
>>>
>>> Node A will have two choices: either drop the partition completely and
>>> re-download it from B, or replicate the recent changes onto it. Which one
>>> is used is chosen internally.
>>> Node A will only become primary again when it finishes syncing that
>>> partition.
>>>
>>> Regards,
>>> --
>>> Ilya Kasnacheev
>>>
>>>
>>> Fri, 14 Sep 2018 at 22:23, eugene miretsky <[email protected]>:
>>>
>>>> What is the process when a node goes down and then restarts?
>>>>
>>>> Say backups = 1. We have node A that is primary for some key, and node
>>>> B that is the backup.
>>>>
>>>> Node A goes down and then restarts after 5 min. What are the steps?
>>>> 1) Node A is serving all traffic for key X
>>>> 2) Node A goes down
>>>> 3) Node B starts serving all traffic for key X (I guess the clients
>>>>    detect the failover and start calling node B)
>>>> 4) Node A comes back up
>>>> 5) WAL replication is initiated
>>>>
>>>> What happens next? When does node A become the primary again? How are
>>>> in-flight updates handled?
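Ilya's reply describes two ways node A can catch up on a partition after it restarts: drop the partition and re-download it from B, or replay only the changes it missed. The sketch below shows how such a choice could be made, assuming each partition carries a monotonically increasing update counter and that B keeps only a bounded window of WAL history. `RebalanceMode` and `choose_rebalance_mode` are hypothetical names; this illustrates the idea and is not Ignite's internal API, and a real implementation may weigh other factors (for example, partition size).

```python
from enum import Enum


class RebalanceMode(Enum):
    HISTORICAL = "historical"  # replay only the missed updates from B's WAL
    FULL = "full"              # drop the local partition and re-download it


def choose_rebalance_mode(demander_counter: int,
                          supplier_counter: int,
                          supplier_wal_oldest: int) -> RebalanceMode:
    """Hypothetical decision for how a returning node (A) catches up one partition.

    demander_counter:    last update counter A applied before it went down
    supplier_counter:    current update counter on the supplier (B)
    supplier_wal_oldest: oldest update counter still present in B's WAL history
    """
    if demander_counter >= supplier_counter:
        # A missed nothing, so an empty replay is enough.
        return RebalanceMode.HISTORICAL
    # A missed updates demander_counter+1 .. supplier_counter. Replaying them is
    # only possible if B's WAL still reaches back to the first missed update.
    first_missed = demander_counter + 1
    if first_missed >= supplier_wal_oldest:
        return RebalanceMode.HISTORICAL
    return RebalanceMode.FULL


if __name__ == "__main__":
    # A was down for 5 minutes and missed updates 1001..1500.
    print(choose_rebalance_mode(1000, 1500, supplier_wal_oldest=800))   # RebalanceMode.HISTORICAL
    print(choose_rebalance_mode(1000, 1500, supplier_wal_oldest=1200))  # RebalanceMode.FULL
```

Replaying missed updates is the cheaper path when A was down only briefly, as in the 5-minute example; a full re-download is the fallback once B's WAL no longer reaches back far enough.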

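Eugene's second question (how A can be fully caught up without stopping writes to B) has one common answer in replicated systems: A starts receiving new writes forwarded from B before it replays the updates it missed, and a per-key version stops a replayed (older) value from overwriting a newer live one. Once the finite replay finishes, A holds exactly B's state, so no lag threshold like Kafka's replica.lag.time.max.ms is needed to decide that it is in sync. The thread does not say whether Ignite works exactly this way. The following is a minimal single-threaded sketch, assuming last-writer-wins by version; `Replica`, `catch_up`, and the interleaving are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Update = Tuple[str, int, str]  # (key, version, value)


@dataclass
class Replica:
    """Hypothetical copy of one partition: key -> (version, value)."""
    data: Dict[str, Tuple[int, str]] = field(default_factory=dict)

    def apply(self, key: str, version: int, value: str) -> None:
        # Apply an update only if it is newer than what is stored, so a
        # replayed (older) update can never overwrite a newer live write.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)


def catch_up(primary: Replica, joining: Replica,
             missed: List[Update], live_writes: List[Update]) -> None:
    """Replay the updates `joining` missed while the primary keeps accepting
    new writes and forwards each one to `joining` (writes are never paused)."""
    for i in range(max(len(missed), len(live_writes))):
        if i < len(live_writes):
            key, version, value = live_writes[i]
            primary.apply(key, version, value)  # client write hits the primary
            joining.apply(key, version, value)  # and is forwarded to the joining node
        if i < len(missed):
            joining.apply(*missed[i])           # next update from the replay stream


if __name__ == "__main__":
    b, a = Replica(), Replica()
    for replica in (a, b):
        replica.apply("x", 1, "before-crash")   # state both nodes had before A went down
    missed = [("x", 2, "while-down"), ("y", 3, "while-down")]
    for update in missed:
        b.apply(*update)                        # B served these while A was down
    live = [("x", 4, "live"), ("z", 5, "live")]
    catch_up(b, a, missed, live)
    print(a.data == b.data)  # True
```

The property that matters is ordering: because forwarding of live writes starts before the set of missed updates is fixed, no update can fall into a gap, and the replay is finite. When it finishes, A is exactly in sync, so handing the primary role back becomes a cluster-wide announcement (which, per Ilya, the discovery layer delivers to all nodes) rather than a lag-based guess.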