[jira] [Commented] (IGNITE-13193) Implement fallback to full partition rebalancing in case historical supplier failed to read all necessary data updates from WAL

2020-07-03 Thread Vladislav Pyatkov (Jira)


[ https://issues.apache.org/jira/browse/IGNITE-13193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151022#comment-17151022 ]

Vladislav Pyatkov commented on IGNITE-13193:


LGTM.

> Implement fallback to full partition rebalancing in case historical supplier 
> failed to read all necessary data updates from WAL
> ---
>
> Key: IGNITE-13193
> URL: https://issues.apache.org/jira/browse/IGNITE-13193
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.8.1
>Reporter: Vyacheslav Koptilin
>Assignee: Vyacheslav Koptilin
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Historical rebalance may fail for several reasons:
> 1) The WAL on the supplier node is corrupted. In the current implementation,
> the supplier triggers a failure handler.
> 2) After iterating over the WAL, the demander node did not receive all the
> updates needed to bring the MOVING partition up to date (the resulting update
> counter did not converge with the expected update counter of the OWNING
> partition). In the current implementation, the demander silently ignores the
> missing updates.
> Such behavior hurts cluster stability: an unsuitable state of the historical
> WAL is not a reason to fail a supplier node.
> A more appropriate way to handle this scenario is to either:
>  - try to rebalance the partition historically from another supplier, or
>  - fall back to full rebalance of the problem partition.
> Once a supplier fails to provide data from part of its WAL, the
> corresponding sequence of checkpoints should be marked as inapplicable for
> historical rebalance in order to prevent further errors.
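The fallback policy described in the issue can be sketched in Java. This is a minimal, hypothetical illustration, not the code from the PR: the class `RebalanceFallbackSketch`, its nested `Supplier`/`Plan`/`Mode` types, and the simplification of tracking unusable history per supplier (rather than per checkpoint sequence) are all assumptions made for brevity.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the fallback described in IGNITE-13193.
// All names are illustrative assumptions, not Ignite's internal API.
public class RebalanceFallbackSketch {

    enum Mode { HISTORICAL, FULL }

    // A candidate supplier (an OWNING node for the partition). 'hasHistory'
    // is assumed to mean its reserved WAL covers the required range.
    static final class Supplier {
        final String id;
        final boolean hasHistory;

        Supplier(String id, boolean hasHistory) {
            this.id = id;
            this.hasHistory = hasHistory;
        }
    }

    // Planning outcome: which supplier to ask, and in which mode.
    static final class Plan {
        final String supplierId;
        final Mode mode;

        Plan(String supplierId, Mode mode) {
            this.supplierId = supplierId;
            this.mode = mode;
        }
    }

    // Suppliers whose history was marked inapplicable after a failed
    // historical attempt (simplification: the issue speaks of marking
    // the corresponding sequence of checkpoints, not the whole supplier).
    private final Set<String> inapplicableHistory = new HashSet<>();

    // Failure case 2: after WAL iteration, the MOVING partition's update
    // counter must reach the expected counter of the OWNING partition.
    static boolean historyConverged(long receivedCntr, long expectedCntr) {
        return receivedCntr == expectedCntr;
    }

    // Instead of failing the supplier node (case 1) or silently ignoring
    // missing updates (case 2), stop using this supplier's history.
    void onHistoricalFailure(String supplierId) {
        inapplicableHistory.add(supplierId);
    }

    // Prefer historical rebalance from a supplier whose history is still
    // usable; otherwise fall back to full rebalance of the partition.
    Plan nextPlan(List<Supplier> owners) {
        for (Supplier s : owners) {
            if (s.hasHistory && !inapplicableHistory.contains(s.id))
                return new Plan(s.id, Mode.HISTORICAL);
        }
        if (owners.isEmpty())
            throw new IllegalStateException("No owners to rebalance from");
        return new Plan(owners.get(0).id, Mode.FULL);
    }

    public static void main(String[] args) {
        RebalanceFallbackSketch planner = new RebalanceFallbackSketch();
        List<Supplier> owners = List.of(new Supplier("A", true), new Supplier("B", true));

        Plan first = planner.nextPlan(owners);
        // Supplier A finished its WAL iteration, but the counter stopped short.
        if (!historyConverged(90, 100))
            planner.onHistoricalFailure(first.supplierId);

        Plan second = planner.nextPlan(owners);
        // Supplier B fails too (e.g. corrupted WAL).
        planner.onHistoricalFailure(second.supplierId);

        Plan third = planner.nextPlan(owners);
        System.out.println(first.mode + " " + first.supplierId + ", "
            + second.mode + " " + second.supplierId + ", "
            + third.mode + " " + third.supplierId);
    }
}
```

Running `main` prints `HISTORICAL A, HISTORICAL B, FULL A`: the demander first retries historically from another supplier and only falls back to full rebalance once no supplier with usable history remains, which mirrors the two options listed in the issue.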



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-13193) Implement fallback to full partition rebalancing in case historical supplier failed to read all necessary data updates from WAL

2020-07-03 Thread Ignite TC Bot (Jira)


[ https://issues.apache.org/jira/browse/IGNITE-13193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151020#comment-17151020 ]

Ignite TC Bot commented on IGNITE-13193:


{panel:title=Branch: [pull/7971/head] Base: [master] : Possible Blockers 
(1)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}
{color:#d04437}PDS (Indexing){color} [[tests 0 Exit Code 
|https://ci.ignite.apache.org/viewLog.html?buildId=5436543]]

{panel}
{panel:title=Branch: [pull/7971/head] Base: [master] : New Tests 
(8)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#8b}Service Grid{color} [tests 4]
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.requestId[Test event=IgniteBiTuple 
[val1=DiscoveryEvent [evtNode=fab90e3a-7a80-42d5-aace-d8bcd082be28, topVer=0, 
nodeId8=de225e46, msg=, type=NODE_JOINED, tstamp=1593735466437], 
val2=AffinityTopologyVersion [topVer=2720052317725509699, minorTopVer=0]]] - 
PASSED{color}
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.topologyVersion[Test event=IgniteBiTuple 
[val1=DiscoveryCustomEvent [customMsg=ServiceChangeBatchRequest 
[id=9cd49021371-6d964edf-1554-43c4-9610-ba4d1175766c, reqs=SingletonList 
[ServiceUndeploymentRequest []]], affTopVer=null, super=DiscoveryEvent 
[evtNode=b6e73219-cc3c-4851-b05e-8c9f4d4e04fc, topVer=0, nodeId8=b6e73219, 
msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1593735466437]], 
val2=AffinityTopologyVersion [topVer=-9005077219046006491, minorTopVer=0]]] - 
PASSED{color}
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.requestId[Test event=IgniteBiTuple 
[val1=DiscoveryCustomEvent [customMsg=ServiceChangeBatchRequest 
[id=9cd49021371-6d964edf-1554-43c4-9610-ba4d1175766c, reqs=SingletonList 
[ServiceUndeploymentRequest []]], affTopVer=null, super=DiscoveryEvent 
[evtNode=b6e73219-cc3c-4851-b05e-8c9f4d4e04fc, topVer=0, nodeId8=b6e73219, 
msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1593735466437]], 
val2=AffinityTopologyVersion [topVer=-9005077219046006491, minorTopVer=0]]] - 
PASSED{color}
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.topologyVersion[Test event=IgniteBiTuple 
[val1=DiscoveryEvent [evtNode=fab90e3a-7a80-42d5-aace-d8bcd082be28, topVer=0, 
nodeId8=de225e46, msg=, type=NODE_JOINED, tstamp=1593735466437], 
val2=AffinityTopologyVersion [topVer=2720052317725509699, minorTopVer=0]]] - 
PASSED{color}

{color:#8b}Service Grid (legacy mode){color} [tests 4]
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.topologyVersion[Test event=IgniteBiTuple 
[val1=DiscoveryEvent [evtNode=d6e5484b-0c04-4958-af8c-87060631298e, topVer=0, 
nodeId8=0b6cb16a, msg=, type=NODE_JOINED, tstamp=1593735530785], 
val2=AffinityTopologyVersion [topVer=8840357433858611253, minorTopVer=0]]] - 
PASSED{color}
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.requestId[Test event=IgniteBiTuple 
[val1=DiscoveryEvent [evtNode=d6e5484b-0c04-4958-af8c-87060631298e, topVer=0, 
nodeId8=0b6cb16a, msg=, type=NODE_JOINED, tstamp=1593735530785], 
val2=AffinityTopologyVersion [topVer=8840357433858611253, minorTopVer=0]]] - 
PASSED{color}
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.topologyVersion[Test event=IgniteBiTuple 
[val1=DiscoveryCustomEvent [customMsg=ServiceChangeBatchRequest 
[id=d1db4121371-52eedea3-cb4b-4f87-ad96-23a96c33063b, reqs=SingletonList 
[ServiceUndeploymentRequest []]], affTopVer=null, super=DiscoveryEvent 
[evtNode=39549e86-8440-4fda-aa43-8f588a7e8ac8, topVer=0, nodeId8=39549e86, 
msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1593735530785]], 
val2=AffinityTopologyVersion [topVer=2935041773177708175, minorTopVer=0]]] - 
PASSED{color}
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.requestId[Test event=IgniteBiTuple 
[val1=DiscoveryCustomEvent [customMsg=ServiceChangeBatchRequest 
[id=d1db4121371-52eedea3-cb4b-4f87-ad96-23a96c33063b, reqs=SingletonList 
[ServiceUndeploymentRequest []]], affTopVer=null, super=DiscoveryEvent 
[evtNode=39549e86-8440-4fda-aa43-8f588a7e8ac8, topVer=0, nodeId8=39549e86, 
msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1593735530785]], 
val2=AffinityTopologyVersion [topVer=2935041773177708175, minorTopVer=0]]] - 
PASSED{color}

{panel}
[TeamCity *-- Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=5436032&buildTypeId=IgniteTests24Java8_RunAll]


[jira] [Commented] (IGNITE-13193) Implement fallback to full partition rebalancing in case historical supplier failed to read all necessary data updates from WAL

2020-07-03 Thread Vyacheslav Koptilin (Jira)


[ https://issues.apache.org/jira/browse/IGNITE-13193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151021#comment-17151021 ]

Vyacheslav Koptilin commented on IGNITE-13193:

Hello [~v.pyatkov],

I have addressed your comments in the PR. Please take a look.






[jira] [Commented] (IGNITE-13193) Implement fallback to full partition rebalancing in case historical supplier failed to read all necessary data updates from WAL

2020-07-02 Thread Vladislav Pyatkov (Jira)


[ https://issues.apache.org/jira/browse/IGNITE-13193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17150228#comment-17150228 ]

Vladislav Pyatkov commented on IGNITE-13193:


[~slava.koptilin] I left three comments in the PR.

Please take a look at them.

> Implement fallback to full partition rebalancing in case historical supplier 
> failed to read all necessary data updates from WAL
> ---
>
> Key: IGNITE-13193
> URL: https://issues.apache.org/jira/browse/IGNITE-13193
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.8.1
>Reporter: Vyacheslav Koptilin
>Assignee: Vyacheslav Koptilin
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>





[jira] [Commented] (IGNITE-13193) Implement fallback to full partition rebalancing in case historical supplier failed to read all necessary data updates from WAL

2020-06-30 Thread Ignite TC Bot (Jira)


[ https://issues.apache.org/jira/browse/IGNITE-13193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17148704#comment-17148704 ]

Ignite TC Bot commented on IGNITE-13193:


{panel:title=Branch: [pull/7971/head] Base: [master] : No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/7971/head] Base: [master] : New Tests 
(12)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#8b}PDS (Indexing){color} [tests 4]
* {color:#013220}IgnitePdsWithIndexingCoreTestSuite: 
IgniteWalRebalanceTest.testSwitchHistoricalRebalanceToFullAndClientJoin - 
PASSED{color}
* {color:#013220}IgnitePdsWithIndexingCoreTestSuite: 
IgniteWalRebalanceTest.testMultipleNodesFailHistoricalRebalance - PASSED{color}
* {color:#013220}IgnitePdsWithIndexingCoreTestSuite: 
IgniteWalRebalanceTest.testSwitchHistoricalRebalanceToFullDueToFailOnCreatingWalIterator
 - PASSED{color}
* {color:#013220}IgnitePdsWithIndexingCoreTestSuite: 
IgniteWalRebalanceTest.testSwitchHistoricalRebalanceToFullWhileIteratingOverWAL 
- PASSED{color}

{color:#8b}Service Grid{color} [tests 4]
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.topologyVersion[Test event=IgniteBiTuple 
[val1=DiscoveryEvent [evtNode=d4fb8cb4-f5b4-40c5-aa31-78a86f176a39, topVer=0, 
nodeId8=988d3ef2, msg=, type=NODE_JOINED, tstamp=1593484888764], 
val2=AffinityTopologyVersion [topVer=2250924009422792828, minorTopVer=0]]] - 
PASSED{color}
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.requestId[Test event=IgniteBiTuple 
[val1=DiscoveryEvent [evtNode=d4fb8cb4-f5b4-40c5-aa31-78a86f176a39, topVer=0, 
nodeId8=988d3ef2, msg=, type=NODE_JOINED, tstamp=1593484888764], 
val2=AffinityTopologyVersion [topVer=2250924009422792828, minorTopVer=0]]] - 
PASSED{color}
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.topologyVersion[Test event=IgniteBiTuple 
[val1=DiscoveryCustomEvent [customMsg=ServiceChangeBatchRequest 
[id=0cac9130371-e91bd767-ce93-405f-8c4c-e65a6af284f1, reqs=SingletonList 
[ServiceUndeploymentRequest []]], affTopVer=null, super=DiscoveryEvent 
[evtNode=f96e45f8-ee71-45c8-b086-4f12b58b4e47, topVer=0, nodeId8=f96e45f8, 
msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1593484888764]], 
val2=AffinityTopologyVersion [topVer=6194541553269355410, minorTopVer=0]]] - 
PASSED{color}
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.requestId[Test event=IgniteBiTuple 
[val1=DiscoveryCustomEvent [customMsg=ServiceChangeBatchRequest 
[id=0cac9130371-e91bd767-ce93-405f-8c4c-e65a6af284f1, reqs=SingletonList 
[ServiceUndeploymentRequest []]], affTopVer=null, super=DiscoveryEvent 
[evtNode=f96e45f8-ee71-45c8-b086-4f12b58b4e47, topVer=0, nodeId8=f96e45f8, 
msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1593484888764]], 
val2=AffinityTopologyVersion [topVer=6194541553269355410, minorTopVer=0]]] - 
PASSED{color}

{color:#8b}Service Grid (legacy mode){color} [tests 4]
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.topologyVersion[Test event=IgniteBiTuple 
[val1=DiscoveryEvent [evtNode=7edce5fc-46de-4493-a2cf-e515e4a06cb3, topVer=0, 
nodeId8=dae95a9e, msg=, type=NODE_JOINED, tstamp=1593485037885], 
val2=AffinityTopologyVersion [topVer=-5513012272294030853, minorTopVer=0]]] - 
PASSED{color}
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.requestId[Test event=IgniteBiTuple 
[val1=DiscoveryEvent [evtNode=7edce5fc-46de-4493-a2cf-e515e4a06cb3, topVer=0, 
nodeId8=dae95a9e, msg=, type=NODE_JOINED, tstamp=1593485037885], 
val2=AffinityTopologyVersion [topVer=-5513012272294030853, minorTopVer=0]]] - 
PASSED{color}
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.topologyVersion[Test event=IgniteBiTuple 
[val1=DiscoveryCustomEvent [customMsg=ServiceChangeBatchRequest 
[id=f9156230371-9f284914-20cf-402c-a951-de72c3dce064, reqs=SingletonList 
[ServiceUndeploymentRequest []]], affTopVer=null, super=DiscoveryEvent 
[evtNode=3c1ef5dc-b557-424e-84c4-ef739a37e3fb, topVer=0, nodeId8=3c1ef5dc, 
msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1593485037885]], 
val2=AffinityTopologyVersion [topVer=4163126190682535866, minorTopVer=0]]] - 
PASSED{color}
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.requestId[Test event=IgniteBiTuple 
[val1=DiscoveryCustomEvent [customMsg=ServiceChangeBatchRequest 
[id=f9156230371-9f284914-20cf-402c-a951-de72c3dce064, reqs=SingletonList 
[ServiceUndeploymentRequest []]], affTopVer=null, super=DiscoveryEvent 
[evtNode=3c1ef5dc-b557-424e-84c4-ef739a37e3fb, topVer=0, nodeId8=3c1ef5dc, 
msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1593485037885]], 
val2=AffinityTopologyVersion [topVer=4163126190682535866, minorTopVer=0]]] - 
PASSED{color}

{panel}
[TeamCity *-- Run :: All*