Re: Backup failover with persistence
Hello!

1. A thing called "late affinity assignment" will happen.
2. Before "late affinity assignment" happens, Node A is not primary. Once it happens, Node A is primary.

Regards,
--
Ilya Kasnacheev

Mon, 17 Sep 2018 at 16:07, eugene miretsky:
> Thanks Ilya,
>
> 1. "So all nodes will know when node A begins hosting that partition as primary" - how is that consensus achieved? Will it result in partition map exchange and new topology version?
> 2. What I actually meant is that it is impossible to know when Node A is fully caught up to node B unless you stop all the writes to Node B while node A is catching up. So how does Ignite know that it is safe to set A to primary again?
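The ordering described in this reply (Node A rejoins, Node B stays primary, and the switch back happens only at "late affinity assignment") can be sketched as a small state model. This is a toy illustration, not Ignite code: the `Partition` class and its method names are hypothetical, and treating the end of synchronization as the trigger follows the earlier statement in this thread that Node A becomes primary only after it finishes syncing.

```python
# Toy model only; none of these names exist in Apache Ignite.
class Partition:
    def __init__(self, ideal_primary, backup):
        self.ideal_primary = ideal_primary  # owner preferred by the affinity function
        self.backup = backup
        self.primary = ideal_primary        # node currently acting as primary
        self.syncing = set()                # nodes that rejoined but are still catching up

    def node_left(self, node):
        # If the acting primary leaves, the backup takes over.
        if node == self.primary:
            self.primary = self.backup

    def node_rejoined(self, node):
        # Rejoining alone does not change who is primary.
        self.syncing.add(node)

    def sync_finished(self, node):
        # "Late affinity assignment": the switch happens only after syncing ends.
        self.syncing.discard(node)
        if node == self.ideal_primary:
            self.primary = node


p = Partition(ideal_primary="A", backup="B")
p.node_left("A")
after_failure = p.primary
p.node_rejoined("A")
after_rejoin = p.primary
p.sync_finished("A")
after_sync = p.primary
```

In this model `after_rejoin` is still Node B; only `sync_finished` moves the primary role back to Node A.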
Re: Backup failover with persistence
Thanks Ilya,

1. "So all nodes will know when node A begins hosting that partition as primary" - how is that consensus achieved? Will it result in partition map exchange and new topology version?
2. What I actually meant is that it is impossible to know when Node A is fully caught up to node B unless you stop all the writes to Node B while node A is catching up. So how does Ignite know that it is safe to set A to primary again?

On Mon, Sep 17, 2018 at 8:48 AM Ilya Kasnacheev wrote:
> Hello!
>
> Apache Ignite is NOT "eventually consistent", if that is what you are asking. Apache Ignite is strongly consistent. It has a discovery ring (or a discovery star with ZooKeeper) which allows messages to be sent to and acknowledged by all nodes.
>
> So all nodes will know when node A begins hosting that partition as primary.
>
> Regards,
> --
> Ilya Kasnacheev
Re: Backup failover with persistence
Hello!

Apache Ignite is NOT "eventually consistent", if that is what you are asking. Apache Ignite is strongly consistent. It has a discovery ring (or a discovery star with ZooKeeper) which allows messages to be sent to and acknowledged by all nodes.

So all nodes will know when node A begins hosting that partition as primary.

Regards,
--
Ilya Kasnacheev

Mon, 17 Sep 2018 at 15:45, eugene miretsky:
> How is "finish syncing" defined? Since it is a distributed system, there is no way to guarantee that node A is 100% caught up to node B. In Kafka there is a replica.lag.time.max.ms setting; is there something similar in Ignite?
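The idea that a message is sent to and acknowledged by all nodes can be illustrated with a minimal ring pass. This is a simplified sketch and not Ignite's discovery implementation: the node dictionaries, the `ring_broadcast` function, and the message text are all hypothetical, and node failures during the pass are ignored.

```python
# Toy ring broadcast; not Ignite's discovery protocol.
def ring_broadcast(ring, message):
    """Pass `message` once around the ring and collect one ack per node."""
    acks = []
    for node in ring:                  # the message visits every node in ring order
        node["seen"].append(message)   # the node records the message
        acks.append(node["name"])      # and acknowledges it
    # The change counts as agreed only when every node has acknowledged it.
    return len(acks) == len(ring), acks


ring = [{"name": name, "seen": []} for name in ("A", "B", "C")]
message = "partition 7: primary is now A"   # hypothetical message
delivered_to_all, acks = ring_broadcast(ring, message)
```

After the pass, every node in `ring` holds the same message, which is the property the reply relies on: no node can miss the moment Node A becomes primary.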
Re: Backup failover with persistence
How is "finish syncing" defined? Since it is a distributed system, there is no way to guarantee that node A is 100% caught up to node B. In Kafka there is a replica.lag.time.max.ms setting; is there something similar in Ignite?

On Mon, Sep 17, 2018 at 8:37 AM Ilya Kasnacheev wrote:
> Hello!
>
> Node A will have two choices: either drop the partition completely and re-download it from B, or replicate the recent changes to it. One of the two will be chosen internally.
> Node A will only become primary again when it finishes syncing that partition.
>
> Regards,
> --
> Ilya Kasnacheev
Re: Backup failover with persistence
Hello!

Node A will have two choices: either drop the partition completely and re-download it from B, or replicate the recent changes to it. One of the two will be chosen internally.
Node A will only become primary again when it finishes syncing that partition.

Regards,
--
Ilya Kasnacheev

Fri, 14 Sep 2018 at 22:23, eugene miretsky:
> What is the process when a node goes down and then restarts?
>
> Say backups = 1. We have node A that is primary for some key, and node B that is backup.
>
> Node A goes down and then restarts after 5 min. What are the steps?
> 1) Node A is servicing all traffic for key X
> 2) Node A goes down
> 3) Node B starts serving all traffic for key X (I guess the clients detect the failover and start calling node B)
> 4) Node A comes back up
> 5) WAL replication is initiated
>
> What happens next? When does node A become the primary again? How do in-flight updates happen?
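The two options named in this reply (a full re-download versus replaying only the recent changes) can be sketched as follows. This is a toy model under stated assumptions, not Ignite's algorithm: the reply only says the choice is made internally, so the criterion used here (whether Node B still retains every update that Node A missed) is an assumption, as are the `sync_partition` function and the per-update counters.

```python
# Toy model; the selection rule below is an assumption for illustration.
def sync_partition(a_data, a_counter, b_data, b_log):
    """Bring Node A's copy of a partition up to date with Node B's.

    a_counter: counter of the last update Node A applied before going down.
    b_log: (counter, key, value) tuples that Node B still retains, ascending.
    Returns "delta" or "full" to show which path was taken.
    """
    if b_log and b_log[0][0] <= a_counter + 1:
        # B still holds every update A missed: replay only those.
        for counter, key, value in b_log:
            if counter > a_counter:
                a_data[key] = value
        return "delta"
    # Otherwise drop A's copy and re-download the whole partition from B.
    a_data.clear()
    a_data.update(b_data)
    return "full"


b_data = {"x": 3, "y": 1}
b_log = [(5, "x", 2), (6, "x", 3)]

a_recent = {"x": 1, "y": 1}
mode_recent = sync_partition(a_recent, 4, b_data, b_log)   # A missed updates 5 and 6

a_stale = {"x": 0, "z": 9}
mode_stale = sync_partition(a_stale, 2, b_data, b_log)     # update 3 and 4 are no longer retained
```

In both cases Node A ends with the same contents as Node B; the difference is only how much data has to move.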
Re: Backup failover with persistence
Hello,

After step 4 above, Ignite will detect that the original primary, Node A, is up. All the updates that happened while Node A was down will be applied to Node A so that it is up to date, and it will be marked as primary again. Until this process is completed, Node B will still be considered primary.

Regards,
Gaurav

On 14-Sep-2018 9:23 PM, "eugene miretsky" wrote:
> What is the process when a node goes down and then restarts?
>
> Say backups = 1. We have node A that is primary for some key, and node B that is backup.
>
> Node A goes down and then restarts after 5 min. What are the steps?
> 1) Node A is servicing all traffic for key X
> 2) Node A goes down
> 3) Node B starts serving all traffic for key X (I guess the clients detect the failover and start calling node B)
> 4) Node A comes back up
> 5) WAL replication is initiated
>
> What happens next? When does node A become the primary again? How do in-flight updates happen?
Backup failover with persistence
What is the process when a node goes down and then restarts?

Say backups = 1. We have node A that is primary for some key, and node B that is backup.

Node A goes down and then restarts after 5 min. What are the steps?
1) Node A is servicing all traffic for key X
2) Node A goes down
3) Node B starts serving all traffic for key X (I guess the clients detect the failover and start calling node B)
4) Node A comes back up
5) WAL replication is initiated

What happens next? When does node A become the primary again? How do in-flight updates happen?