[jira] [Updated] (IGNITE-17738) Cluster must be able to fix the inconsistency on restart/node_join by itself

2022-10-19 Thread Maxim Muzafarov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-17738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Muzafarov updated IGNITE-17738:
-
Fix Version/s: 2.15

> Cluster must be able to fix the inconsistency on restart/node_join by itself
> 
>
> Key: IGNITE-17738
> URL: https://issues.apache.org/jira/browse/IGNITE-17738
> Project: Ignite
> Issue Type: Sub-task
> Reporter: Anton Vinogradov
> Assignee: Maxim Muzafarov
> Priority: Major
> Labels: iep-31, ise
> Fix For: 2.15
>
> Attachments: PartialHistoricalRebalanceTest.java, 
> SkippedRebalanceBecauseOfTheSameLwmTest.java
>
>
> After a cluster restart (caused by a power-off, OOM, or some other problem),
> the PDS may be inconsistent: primary partitions may contain operations that
> are missing on backups, and counters may contain gaps even on the primary.
> 1) Currently, "historical rebalance" can sync the data only up to the highest
> LWM for every partition.
> Most likely, the primary will be chosen as the rebalance source, but the data
> after the LWM will not be rebalanced, so the updates between LWM and HWM
> will not be synchronized.
> See [^PartialHistoricalRebalanceTest.java]
> Such a partition may be rebalanced correctly "later" if a full rebalance is
> triggered at some point.
> 2) If the LWM is the same on the primary and the backup, rebalance is skipped
> for that partition.
> See [^SkippedRebalanceBecauseOfTheSameLwmTest.java]
> Proposals:
> 1) Cheap fix
> A possible solution for the case when the cluster failed and restarted with
> the same baseline is to fix the counters automatically, provided the cluster
> composition equals the baseline specified before the crash.
> Counters should be set to
>  - the HWM on the primary and the LWM on backups, for caches with 2+ backups,
>  - the LWM on the primary and the HWM on backups, for caches with a single
>    backup.
> 2) Correct fix
> Rebalance must honor the whole counter state (LWM, HWM, gaps).
> 2.0) The primary HWM must be set to the highest HWM across the copies to
> avoid reapplying update counters that have already been applied on backups.
> 2.1) When the WAL is available, all entries between LWM and HWM (inclusive)
> must be rebalanced to the other nodes where they are required, even from
> backups to the primary.
> 2.2) Full rebalance must be restricted when it would cause any loss of
> updates.
> For example, it's
>  - ok to replace B with A when
> A[lwm=100, gaps=[142], hwm=200] and B[lwm=50, gaps=[76,99,111], hwm=120],
> because A contains the whole of B.
>  - NOT ok to replace B with A when
> A[lwm=100, gaps=[142], hwm=200] and B[lwm=50, gaps=[76,99,111], hwm=*148*],
> because update *142* would be lost.
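
The check in 2.2 can be illustrated with a small Python sketch. This is a toy model, not Ignite code: the `CounterState` class and the `lost_updates` function are hypothetical names, and the sketch assumes a copy holds every update counter from 1 up to its HWM except the listed gaps (the LWM is carried along but not used).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterState:
    """Toy counter state of one partition copy (assumed model, not Ignite's)."""
    lwm: int                       # carried for completeness, unused below
    hwm: int
    gaps: frozenset = frozenset()  # counters at or below hwm that are missing


def lost_updates(source, target):
    """Counters held by `target` that would vanish if `source` replaced it."""
    # Gaps of the source that the target has actually applied.
    from_gaps = [c for c in source.gaps
                 if c <= target.hwm and c not in target.gaps]
    # Counters above the source HWM that the target has applied.
    above_hwm = [c for c in range(source.hwm + 1, target.hwm + 1)
                 if c not in target.gaps]
    return sorted(from_gaps + above_hwm)


# The two cases from the description above.
a = CounterState(lwm=100, hwm=200, gaps=frozenset({142}))
b_ok = CounterState(lwm=50, hwm=120, gaps=frozenset({76, 99, 111}))
b_bad = CounterState(lwm=50, hwm=148, gaps=frozenset({76, 99, 111}))

print(lost_updates(a, b_ok))   # [] : A contains the whole of B
print(lost_updates(a, b_bad))  # [142] : replacing B with A would lose 142
```

Under this model, a full rebalance from A to B would be allowed only when the returned list is empty.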



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-17738) Cluster must be able to fix the inconsistency on restart/node_join by itself

2022-10-18 Thread Anton Vinogradov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-17738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Vinogradov updated IGNITE-17738:
--
Description: 
After a cluster restart (caused by a power-off, OOM, or some other problem),
the PDS may be inconsistent: primary partitions may contain operations that
are missing on backups, and counters may contain gaps even on the primary.

1) Currently, "historical rebalance" can sync the data only up to the highest
LWM for every partition.
Most likely, the primary will be chosen as the rebalance source, but the data
after the LWM will not be rebalanced, so the updates between LWM and HWM will
not be synchronized.
See [^PartialHistoricalRebalanceTest.java]

Such a partition may be rebalanced correctly "later" if a full rebalance is
triggered at some point.

2) If the LWM is the same on the primary and the backup, rebalance is skipped
for that partition.
See [^SkippedRebalanceBecauseOfTheSameLwmTest.java]

Proposals:

1) Cheap fix
A possible solution for the case when the cluster failed and restarted with
the same baseline is to fix the counters automatically, provided the cluster
composition equals the baseline specified before the crash.

Counters should be set to
 - the HWM on the primary and the LWM on backups, for caches with 2+ backups,
 - the LWM on the primary and the HWM on backups, for caches with a single
   backup.

2) Correct fix
Rebalance must honor the whole counter state (LWM, HWM, gaps).
2.0) The primary HWM must be set to the highest HWM across the copies to avoid
reapplying update counters that have already been applied on backups.
2.1) When the WAL is available, all entries between LWM and HWM (inclusive)
must be rebalanced to the other nodes where they are required, even from
backups to the primary.
2.2) Full rebalance must be restricted when it would cause any loss of updates.
For example, it's
 - ok to replace B with A when
A[lwm=100, gaps=[142], hwm=200] and B[lwm=50, gaps=[76,99,111], hwm=120],
because A contains the whole of B.
 - NOT ok to replace B with A when
A[lwm=100, gaps=[142], hwm=200] and B[lwm=50, gaps=[76,99,111], hwm=*148*],
because update *142* would be lost.

  was:
After a cluster restart (caused by a power-off, OOM, or some other problem),
the PDS may be inconsistent: primary partitions may contain operations that
are missing on backups, and counters may contain gaps even on the primary.

1) Currently, "historical rebalance" can sync the data only up to the highest
LWM for every partition.
Most likely, the primary will be chosen as the rebalance source, but the data
after the LWM will not be rebalanced, so the updates between LWM and HWM will
not be synchronized.
See [^PartialHistoricalRebalanceTest.java]

Such a partition may be rebalanced correctly "later" if a full rebalance is
triggered at some point.

2) If the LWM is the same on the primary and the backup, rebalance is skipped
for that partition.
See [^SkippedRebalanceBecauseOfTheSameLwmTest.java]

Proposals:

1) Cheap fix
A possible solution for the case when the cluster failed and restarted with
the same baseline is to fix the counters automatically, provided the cluster
composition equals the baseline specified before the crash.

Counters should be set to
 - the HWM on the primary and the LWM on backups, for caches with 2+ backups,
 - the LWM on the primary and the HWM on backups, for caches with a single
   backup.

2) Correct fix
Rebalance must honor the whole counter state (LWM, HWM, gaps).
2.1) When the WAL is available, all entries between LWM and HWM (inclusive)
must be rebalanced to the other nodes where they are required, even from
backups to the primary.
2.2) Full rebalance must be restricted when it would cause any loss of updates.
For example, it's
 - ok to replace B with A when
A[lwm=100, gaps=[142], hwm=200] and B[lwm=50, gaps=[76,99,111], hwm=120],
because A contains the whole of B.
 - NOT ok to replace B with A when
A[lwm=100, gaps=[142], hwm=200] and B[lwm=50, gaps=[76,99,111], hwm=*148*],
because update *142* would be lost.
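
Item 2.1 (entries must flow to every node that lacks them, even from backups to the primary) can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Copy` class, the node names, and `rebalance_plan` are hypothetical, WAL availability is not modeled, and a copy is assumed to hold every update counter from 1 up to its HWM except its gaps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """Toy model of one partition copy (assumed, not Ignite's structures)."""
    node: str                      # hypothetical node name
    hwm: int
    gaps: frozenset = frozenset()  # counters at or below hwm that are missing

    def applied(self):
        # Every counter from 1 to hwm, minus the gaps.
        return set(range(1, self.hwm + 1)) - self.gaps


def rebalance_plan(copies):
    """Map each node to the sorted counters it lacks but another copy holds."""
    everything = set().union(*(c.applied() for c in copies))
    return {c.node: sorted(everything - c.applied()) for c in copies}


# The primary has a gap at 7; the backup stopped at 8 but did apply 7.
primary = Copy(node="primary", hwm=10, gaps=frozenset({7}))
backup = Copy(node="backup", hwm=8)

plan = rebalance_plan([primary, backup])
print(plan)  # {'primary': [7], 'backup': [9, 10]}
```

In this example the backup has to supply counter 7 to the primary, which is the "even from backups to the primary" case.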



[jira] [Updated] (IGNITE-17738) Cluster must be able to fix the inconsistency on restart/node_join by itself

2022-10-17 Thread Anton Vinogradov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-17738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Vinogradov updated IGNITE-17738:
--
Summary: Cluster must be able to fix the inconsistency on restart/node_join 
by itself  (was: Cluster must be able to fix the inconsistency on restart by 
itself)



