Re: Backup failover with persistence

2018-09-17 Thread Ilya Kasnacheev
Hello!

1. A process called "late affinity assignment" will take place.
2. Before "late affinity assignment" happens, Node A is not primary. Once it
happens, Node A is primary.
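
A minimal sketch of the switch described in the two points above, as a toy model rather than Ignite code. The class, its method names, and the (major, minor) version numbering are assumptions made for illustration only:

```python
class ToyPartition:
    """Toy model of one partition whose primary changes only after rebalancing."""

    def __init__(self, ideal_primary, current_primary):
        self.ideal_primary = ideal_primary   # node that should be primary in the end
        self.primary = current_primary       # node that is primary right now
        self.topology_version = (1, 0)       # hypothetical (major, minor) numbering
        self.rebalancing = False

    def node_rejoined(self):
        # The restarted node joins: the topology changes, but the current
        # primary keeps its role while the rejoined node catches up.
        major, _ = self.topology_version
        self.topology_version = (major + 1, 0)
        self.rebalancing = True

    def rebalance_finished(self):
        # Assumed behaviour of this sketch: the switch to the ideal primary
        # is announced as a new (minor) version that every node observes.
        major, minor = self.topology_version
        self.topology_version = (major, minor + 1)
        self.rebalancing = False
        self.primary = self.ideal_primary


partition = ToyPartition(ideal_primary="A", current_primary="B")
partition.node_rejoined()
primary_while_syncing = partition.primary
partition.rebalance_finished()
primary_after_sync = partition.primary
```

In this model B stays primary for as long as `rebalancing` is true, and A takes over only in `rebalance_finished`.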

Regards,
-- 
Ilya Kasnacheev


Re: Backup failover with persistence

2018-09-17 Thread eugene miretsky
Thanks Ilya,


   1. "So all nodes will know when node A begins hosting that partition as
   primary" - how is that consensus achieved? Will it result in a partition
   map exchange and a new topology version?
   2. What I actually meant is that it is impossible to know when Node A is
   fully caught up to node B unless you stop all writes to Node B while
   node A is catching up. So how does Ignite know that it is safe to make A
   primary again?


Re: Backup failover with persistence

2018-09-17 Thread Ilya Kasnacheev
Hello!

Apache Ignite is NOT "eventually consistent", if that is what you are asking.
Apache Ignite is strongly consistent. It has a discovery ring (or a discovery
star with ZooKeeper) which allows messages to be sent to and acknowledged by
all nodes.

So all nodes will know when node A begins hosting that partition as primary.
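
The idea of a message travelling around a ring and being seen by every node can be sketched as follows. This is a toy model with no failures or timeouts, not the Ignite discovery implementation; `RingNode` and `broadcast` are hypothetical names:

```python
class RingNode:
    def __init__(self, name):
        self.name = name
        self.log = []          # messages in the order this node saw them


def broadcast(ring, message):
    """Pass one message around the ring; True once it is back at the start."""
    hops = 0
    for node in ring:          # each node records the message and forwards it
        node.log.append(message)
        hops += 1
    # After len(ring) hops the message has returned to where it started,
    # which this sketch treats as the acknowledgment that every node saw it.
    return hops == len(ring)


ring = [RingNode(name) for name in ("coordinator", "A", "B")]
acks = [
    broadcast(ring, "node A joined"),
    broadcast(ring, "node A is primary for partition 7"),
]
logs = [node.log for node in ring]
```

Because every node sees the same messages in the same order, all of them reach the same conclusion about which node is primary.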

Regards,
-- 
Ilya Kasnacheev


Re: Backup failover with persistence

2018-09-17 Thread eugene miretsky
How is "finish syncing" defined? Since it is a distributed system, there is
no way to guarantee that node A is 100% caught up to node B. In Kafka there
is a replica.lag.time.max.ms setting; is there something similar in
Ignite?



Re: Backup failover with persistence

2018-09-17 Thread Ilya Kasnacheev
Hello!

Node A will have two choices: either drop the partition completely and
re-download it from B, or replicate the recent changes to it. One of the
two will be chosen internally.
Node A will only become primary again when it finishes syncing that
partition.
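
The choice between the two options could be sketched like this. The function, its parameters, and the decision rule are assumptions for illustration; the thread does not describe the criteria Ignite actually uses internally:

```python
def choose_rebalance(node_counter, supplier_counter, oldest_counter_in_history):
    """Pick a catch-up strategy for one partition (hypothetical rule).

    node_counter: last update counter applied on the restarted node
    supplier_counter: current update counter on the node with the latest data
    oldest_counter_in_history: oldest update the supplier can still replay
    """
    if node_counter >= supplier_counter:
        return "none"          # nothing was missed
    if node_counter + 1 >= oldest_counter_in_history:
        return "historical"    # every missed update is still available to replay
    return "full"              # history has a gap: drop and re-download


examples = {
    "up to date": choose_rebalance(100, 100, 50),
    "short outage": choose_rebalance(90, 100, 50),
    "long outage": choose_rebalance(10, 100, 50),
}
```

Replaying only the missed updates is cheaper when the outage was short, while a full re-download is the fallback when the supplier no longer holds all of the missed changes.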

Regards,
-- 
Ilya Kasnacheev


Re: Backup failover with persistence

2018-09-15 Thread Gaurav Bajaj
Hello,

After step 4 above, Ignite will detect that the original primary, Node A, is
up, so all the updates that happened while Node A was down will be applied to
Node A so that it is up to date, and it will be marked as primary again. Until
this process is completed, Node B will still be considered primary.
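
One common way to apply missed updates without pausing new writes is to version every update and ignore anything older than what a node already holds. The sketch below illustrates that idea only; the per-key version numbers and `apply_update` are assumptions and are not taken from Ignite:

```python
def apply_update(store, key, value, version):
    """Apply an update only if it is newer than what the store already holds."""
    current = store.get(key)
    if current is None or version > current[1]:
        store[key] = (value, version)
        return True
    return False


# Node B's state after serving writes while A was down: key -> (value, version)
node_b = {"x": ("x3", 3), "y": ("y2", 2)}
# Node A's stale state from before it went down
node_a = {"x": ("x1", 1)}

# A write that arrives while A is still catching up is applied on both nodes.
apply_update(node_b, "x", "x4", 4)
apply_update(node_a, "x", "x4", 4)

# Catch-up stream: the updates A missed, delivered after the live write above.
missed = [("x", "x2", 2), ("x", "x3", 3), ("y", "y2", 2)]
for key, value, version in missed:
    apply_update(node_a, key, value, version)

caught_up = node_a == node_b
```

The stale replayed values for `x` are skipped because A already holds version 4, so the live write is not overwritten by older history.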


Regards,
Gaurav


Backup failover with persistence

2018-09-14 Thread eugene miretsky
What is the process when a node goes down and then restarts?

Say backups = 1. We have node A that is primary for some key, and node B
that is the backup.

Node A goes down and then restarts after 5 min. What are the steps?
1) Node A is servicing all traffic for key X
2) Node A goes down
3) Node B starts serving all traffic for key X (I guess the clients detect
the failover and start calling node B)
4) Node A comes back up
5) WAL replication is initiated
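
The steps above can be sketched with a toy model in which the primary is the first owner that is currently in sync. `pick_primary` and the `in_sync` set are hypothetical names, and treating a restarted node as out of sync until it has caught up is an assumption of this sketch:

```python
def pick_primary(owners, in_sync):
    """Return the first owner (in affinity order) that is alive and up to date."""
    for node in owners:
        if node in in_sync:
            return node
    return None  # no usable owner: the partition is unavailable


owners = ["A", "B"]            # A is the preferred primary, B is the backup
in_sync = {"A", "B"}

timeline = []
timeline.append(pick_primary(owners, in_sync))   # 1) A serves key X
in_sync.discard("A")                             # 2) A goes down
timeline.append(pick_primary(owners, in_sync))   # 3) B takes over
# 4) A restarts with stale data, so it is not added back to in_sync yet
timeline.append(pick_primary(owners, in_sync))   # 5) B is still primary
```

What has to happen before A can rejoin `in_sync` is exactly the open question asked below.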

What happens next? When does node A become the primary again? How are
in-flight updates handled?