[ https://issues.apache.org/jira/browse/IGNITE-17738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maxim Muzafarov reassigned IGNITE-17738:
----------------------------------------

    Assignee: Maxim Muzafarov

> Cluster must be able to fix the inconsistency on restart/node_join by itself
> ----------------------------------------------------------------------------
>
>                 Key: IGNITE-17738
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17738
>             Project: Ignite
>          Issue Type: Sub-task
>            Reporter: Anton Vinogradov
>            Assignee: Maxim Muzafarov
>            Priority: Major
>              Labels: iep-31, ise
>         Attachments: PartialHistoricalRebalanceTest.java,
> SkippedRebalanceBecauseOfTheSameLwmTest.java
>
>
> On cluster restart (because of power-off, OOM or some other problem) the
> PDS may become inconsistent: primary partitions may contain operations that
> are missing on backups, and counters may contain gaps even on the primary.
> 1) Currently, "historical rebalance" is able to sync the data up to the
> highest LWM for every partition.
> Most likely, a primary will be chosen as the rebalance source, but the data
> after the LWM will not be rebalanced. So, all updates between LWM and HWM
> will not be synchronized.
> See [^PartialHistoricalRebalanceTest.java]
> Such a partition may be rebalanced correctly "later" if a full rebalance
> is triggered at some point.
> 2) If the LWM is the same on primary and backup, rebalance will be skipped
> for such a partition.
> See [^SkippedRebalanceBecauseOfTheSameLwmTest.java]
> Proposals:
> 1) Cheap fix
> A possible solution for the case when the cluster failed and restarted (same
> baseline) is to fix the counters automatically (when the cluster composition
> is equal to the baseline specified before the crash).
> Counters should be set as
> - HWM at primary and as LWM at backups for caches with 2+ backups,
> - LWM at primary and as HWM at backups for caches with a single backup.
> 2) Correct fix
> Rebalance must honor the whole counter state (LWM, HWM, gaps).
> 2.0) The primary HWM must be set to the highest HWM across the copies to
> avoid reapplying already applied update counters on backups.
> 2.1) When WAL is available, all entries between LWM and HWM (inclusive)
> must be rebalanced to the other nodes where they are required,
> even from backups to the primary.
> 2.2) Full rebalance must be restricted when it would cause any update loss.
> For example, it's
> - ok to replace B with A when
> A[lwm=100, gaps=[142], hwm=200] and B[lwm=50, gaps=[76,99,111], hwm=120],
> because A contains the whole of B.
> - NOT ok to replace B with A when
> A[lwm=100, gaps=[142], hwm=200] and B[lwm=50, gaps=[76,99,111], hwm=*148*],
> because update *142* would be lost.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
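
The "ok / NOT ok to replace B with A" rule in 2.2 can be illustrated with a
small sketch. This is not Ignite code: the CounterState class and its holds()
and covers() methods are hypothetical names introduced here, and the sketch
assumes that a copy holds update c whenever 1 <= c <= hwm and c is not listed
in gaps, which is one reading of the two examples in the description.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of one partition copy's counter state; not an Ignite class. */
final class CounterState {
    final long lwm;                          // kept for completeness, unused below
    final long hwm;
    final Set<Long> gaps = new TreeSet<>();

    CounterState(long lwm, long hwm, long... gaps) {
        this.lwm = lwm;
        this.hwm = hwm;
        for (long g : gaps)
            this.gaps.add(g);
    }

    /** Assumption: the copy holds update c if 1 <= c <= hwm and c is not a gap. */
    boolean holds(long c) {
        return c >= 1 && c <= hwm && !gaps.contains(c);
    }

    /** Returns true if replacing copy b with copy a loses no update that b holds. */
    static boolean covers(CounterState a, CounterState b) {
        // An update missing on a (a gap) but held by b would be lost.
        for (long g : a.gaps)
            if (b.holds(g))
                return false;

        // An update above a's HWM that b holds would be lost as well.
        for (long c = a.hwm + 1; c <= b.hwm; c++)
            if (b.holds(c))
                return false;

        return true;
    }

    public static void main(String[] args) {
        CounterState a = new CounterState(100, 200, 142);
        CounterState b1 = new CounterState(50, 120, 76, 99, 111);
        CounterState b2 = new CounterState(50, 148, 76, 99, 111);

        System.out.println(covers(a, b1)); // true: every update held by B is on A
        System.out.println(covers(a, b2)); // false: B holds 142, which A is missing
    }
}
```

Under the same assumption, a check of this kind would have to run before a
full rebalance is allowed to overwrite a copy; how the real counter state is
read from a partition is outside the scope of this sketch.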