[jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2019-12-20 Thread Anton Vinogradov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000857#comment-17000857
 ] 

Anton Vinogradov commented on IGNITE-9913:
--

Merged to ignite-2.8 branch.

> Prevent data updates blocking in case of backup BLT server node leave
> -
>
> Key: IGNITE-9913
> URL: https://issues.apache.org/jira/browse/IGNITE-9913
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Ivan Rakov
>Assignee: Anton Vinogradov
>Priority: Major
> Fix For: 2.8, 2.9
>
> Attachments: 9913_yardstick.png, master_yardstick.png
>
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> An Ignite cluster performs a distributed partition map exchange (PME) when any 
> server node leaves or joins the topology.
> Distributed PME blocks all updates and may take a long time. If all 
> partitions are assigned according to the baseline topology and a server node 
> leaves, there's no actual need to perform distributed PME: every cluster node 
> is able to recalculate new affinity assignments and partition states locally. 
> If we implement such a lightweight PME and handle mapping and lock requests 
> on the new topology version correctly, updates won't be stopped (except updates 
> of partitions that lost their primary copy).
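
The idea in the description (every node recalculates the new assignment locally, so no distributed exchange is needed) can be illustrated with a minimal sketch. This is a hypothetical, simplified model and not Ignite's affinity code: the class name, the method names, and the use of plain strings as node IDs are assumptions made for illustration. Because the function is deterministic, every node that applies it to the same input arrives at the same result.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of local affinity recalculation; not Ignite's implementation. */
final class LocalAffinitySketch {
    /**
     * Drops the left node from every partition's owner list. Index 0 of each list is the primary,
     * so a former first backup is promoted when the primary is the node that left.
     */
    static List<List<String>> onNodeLeft(List<List<String>> assignment, String leftNode) {
        List<List<String>> res = new ArrayList<>(assignment.size());

        for (List<String> owners : assignment) {
            List<String> remaining = new ArrayList<>(owners);

            remaining.remove(leftNode);
            res.add(remaining);
        }

        return res;
    }

    /** Partitions whose primary was the left node, i.e. the ones that lost their primary copy. */
    static List<Integer> partitionsWithChangedPrimary(List<List<String>> assignment, String leftNode) {
        List<Integer> res = new ArrayList<>();

        for (int p = 0; p < assignment.size(); p++) {
            List<String> owners = assignment.get(p);

            if (!owners.isEmpty() && owners.get(0).equals(leftNode))
                res.add(p);
        }

        return res;
    }

    public static void main(String[] args) {
        // Three partitions, one backup each; node "B" leaves.
        List<List<String>> assignment = List.of(List.of("A", "B"), List.of("B", "C"), List.of("C", "A"));

        System.out.println(onNodeLeft(assignment, "B"));
        System.out.println(partitionsWithChangedPrimary(assignment, "B"));
    }
}
```

In this model only the partitions returned by the second method would need their updates held back; all others keep the same primary.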



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2019-12-19 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000260#comment-17000260
 ] 

Ignite TC Bot commented on IGNITE-9913:
---

{panel:title=Branch: [pull/7165/head] Base: [pull/7102/head] : No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *-- Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=4855614&buildTypeId=IgniteTests24Java8_RunAll]

> Prevent data updates blocking in case of backup BLT server node leave
> -
>
> Key: IGNITE-9913
> URL: https://issues.apache.org/jira/browse/IGNITE-9913
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Ivan Rakov
>Assignee: Anton Vinogradov
>Priority: Major
> Fix For: 2.9
>
> Attachments: 9913_yardstick.png, master_yardstick.png
>
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>


[jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2019-12-19 Thread Anton Vinogradov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1629#comment-1629
 ] 

Anton Vinogradov commented on IGNITE-9913:
--

Merged to master branch.

> Prevent data updates blocking in case of backup BLT server node leave
> -
>
> Key: IGNITE-9913
> URL: https://issues.apache.org/jira/browse/IGNITE-9913
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Ivan Rakov
>Assignee: Anton Vinogradov
>Priority: Major
> Fix For: 2.9
>
> Attachments: 9913_yardstick.png, master_yardstick.png
>
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>


[jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2019-12-13 Thread Anton Vinogradov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995574#comment-16995574
 ] 

Anton Vinogradov commented on IGNITE-9913:
--

Yardstick run found no performance drop.

> Prevent data updates blocking in case of backup BLT server node leave
> -
>
> Key: IGNITE-9913
> URL: https://issues.apache.org/jira/browse/IGNITE-9913
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Ivan Rakov
>Assignee: Anton Vinogradov
>Priority: Major
> Attachments: 9913_yardstick.png, master_yardstick.png
>
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>


[jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2019-12-12 Thread Alexei Scherbakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994406#comment-16994406
 ] 

Alexei Scherbakov commented on IGNITE-9913:
---

[~avinogradov]

1. I've come to the conclusion that having the rebalanced state calculated on the 
coordinator is the most robust way to say the grid is rebalanced. Let's keep it.
2. I've left two comments in your PR regarding the change.
3. OK.

> Prevent data updates blocking in case of backup BLT server node leave
> -
>
> Key: IGNITE-9913
> URL: https://issues.apache.org/jira/browse/IGNITE-9913
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Ivan Rakov
>Assignee: Anton Vinogradov
>Priority: Major
> Attachments: 9913_yardstick.png, master_yardstick.png
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>


[jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2019-12-11 Thread Anton Vinogradov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993678#comment-16993678
 ] 

Anton Vinogradov commented on IGNITE-9913:
--

[~ascherbakov]
The fixes are ready to be reviewed. 
Could you please check?

>> Can we get rid of sending any assignments for protocol v3 ?
Done

>> Also could you add a test ...
Done

> Prevent data updates blocking in case of backup BLT server node leave
> -
>
> Key: IGNITE-9913
> URL: https://issues.apache.org/jira/browse/IGNITE-9913
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Ivan Rakov
>Assignee: Anton Vinogradov
>Priority: Major
> Attachments: 9913_yardstick.png, master_yardstick.png
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>


[jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2019-12-11 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993675#comment-16993675
 ] 

Ignite TC Bot commented on IGNITE-9913:
---

{panel:title=Branch: [pull/7069/head] Base: [master] : No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *-- Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=4829803&buildTypeId=IgniteTests24Java8_RunAll]

> Prevent data updates blocking in case of backup BLT server node leave
> -
>
> Key: IGNITE-9913
> URL: https://issues.apache.org/jira/browse/IGNITE-9913
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Ivan Rakov
>Assignee: Anton Vinogradov
>Priority: Major
> Attachments: 9913_yardstick.png, master_yardstick.png
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>


[jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2019-12-10 Thread Anton Vinogradov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993233#comment-16993233
 ] 

Anton Vinogradov commented on IGNITE-9913:
--

[~ascherbakov]

>>  for this exchange forceAffReassignment=true and 
>> GridDhtPartitionsFullMessage#idealAffinityDiff().isEmpty()
Unfortunately, idealAffinityDiff().isEmpty() does not mean the cluster is 
rebalanced; it only shows that the primaries are rebalanced.
The cluster is rebalanced when waitinfo is empty, and waitinfo is calculated only 
on the coordinator.
So, it seems, the flag is required.

>> Can we get rid of sending any assignments for protocol v3 ?
It seems it's possible, will check.

>> Also could you add a test ...
Sure 
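
The distinction drawn in the first reply above (primaries sitting at their ideal positions versus the whole cluster being rebalanced) can be sketched with a toy model. Everything here is hypothetical: the class, the method names, and the per-partition "owning" sets are illustrative stand-ins, and the real coordinator-side wait info is replaced by a plain check that every ideal owner already holds the data.

```java
import java.util.List;
import java.util.Set;

/** Toy model; owner lists are assumed non-empty, with the primary at index 0. */
final class RebalancedCheckSketch {
    /** True when each partition's current primary equals its ideal primary. */
    static boolean primariesAtIdeal(List<List<String>> ideal, List<List<String>> current) {
        for (int p = 0; p < ideal.size(); p++) {
            if (!ideal.get(p).get(0).equals(current.get(p).get(0)))
                return false;
        }

        return true;
    }

    /** True when every ideal owner (primary and backups) already holds the partition data. */
    static boolean fullyRebalanced(List<List<String>> ideal, List<Set<String>> owning) {
        for (int p = 0; p < ideal.size(); p++) {
            if (!owning.get(p).containsAll(ideal.get(p)))
                return false;
        }

        return true;
    }

    public static void main(String[] args) {
        List<List<String>> ideal = List.of(List.of("A", "B"), List.of("B", "A"));

        // Backup "B" of partition 0 is still being rebalanced, so it does not own the data yet.
        List<Set<String>> owning = List.of(Set.of("A"), Set.of("A", "B"));

        System.out.println(primariesAtIdeal(ideal, ideal));
        System.out.println(fullyRebalanced(ideal, owning));
    }
}
```

In this model the first check passes while the second fails, which is the situation the reply describes: matching primaries alone do not prove that backups have finished rebalancing.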

> Prevent data updates blocking in case of backup BLT server node leave
> -
>
> Key: IGNITE-9913
> URL: https://issues.apache.org/jira/browse/IGNITE-9913
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Ivan Rakov
>Assignee: Anton Vinogradov
>Priority: Major
> Attachments: 9913_yardstick.png, master_yardstick.png
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>


[jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2019-12-10 Thread Alexei Scherbakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992697#comment-16992697
 ] 

Alexei Scherbakov commented on IGNITE-9913:
---

[~avinogradov]

I've reviewed the PR. Overall, the idea and the implementation look valid.

My questions are:

1. You've introduced a flag _rebalanced_ indicating the previous exchange 
future was completed after everything is rebalanced.
Seems the flag is not necessary. The _rebalanced_ state can be figured out by 
the conditions:
a) this exchange is triggered by CacheAffinityChangeMessage 
b) for this exchange forceAffReassignment=true and 
GridDhtPartitionsFullMessage#idealAffinityDiff().isEmpty()

Can we get rid of the flag ?

2. It seems CacheAffinityChangeMessage no longer contains any useful 
assignments when it is triggered by switching from the late to the ideal state.
Can we get rid of sending any assignments for protocol v3 ?

Also, could you add a test where all owners of a partition leave one by one 
under load, and make sure updates to other partitions work as expected 
without PME, using different loss policy modes and numbers of backups ?

> Prevent data updates blocking in case of backup BLT server node leave
> -
>
> Key: IGNITE-9913
> URL: https://issues.apache.org/jira/browse/IGNITE-9913
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Ivan Rakov
>Assignee: Anton Vinogradov
>Priority: Major
> Attachments: 9913_yardstick.png, master_yardstick.png
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>


[jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2019-12-05 Thread Anton Vinogradov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988798#comment-16988798
 ] 

Anton Vinogradov commented on IGNITE-9913:
--

[~ascherbakov] [~agoncharuk],
Issue ready to be reviewed.
Could you please check the solution?

Fix description can be found here: 
http://apache-ignite-developers.2346864.n4.nabble.com/Non-blocking-PME-Phase-One-Node-fail-tp43531p44586.html

> Prevent data updates blocking in case of backup BLT server node leave
> -
>
> Key: IGNITE-9913
> URL: https://issues.apache.org/jira/browse/IGNITE-9913
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Ivan Rakov
>Assignee: Anton Vinogradov
>Priority: Major
> Attachments: 9913_yardstick.png, master_yardstick.png
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>


[jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2019-12-05 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988566#comment-16988566
 ] 

Ignite TC Bot commented on IGNITE-9913:
---

{panel:title=Branch: [pull/7069/head] Base: [master] : No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *-- Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=4811839&buildTypeId=IgniteTests24Java8_RunAll]

> Prevent data updates blocking in case of backup BLT server node leave
> -
>
> Key: IGNITE-9913
> URL: https://issues.apache.org/jira/browse/IGNITE-9913
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Ivan Rakov
>Assignee: Anton Vinogradov
>Priority: Major
> Attachments: 9913_yardstick.png, master_yardstick.png
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>


[jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2019-10-12 Thread Alexei Scherbakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950019#comment-16950019
 ] 

Alexei Scherbakov commented on IGNITE-9913:
---

[~avinogradov]

I've reviewed the changes. It seems they follow the architecture we discussed privately.
I left comments in the PR.
Besides that, you should definitely add more tests.
Important scenarios might be:
1. A baseline node leaves under tx load while rebalancing is in progress 
(rebalancing is due to other nodes joining).
2. Owners leave one by one under tx load until a subset of partitions 
has a single owner.
3. Owners leave one by one under tx load until a subset of partitions 
has no owner. Validate partition loss.

All tests should check partition integrity: see 
org.apache.ignite.testframework.junits.common.GridCommonAbstractTest#assertPartitionsSame
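
The expectations for scenarios 2 and 3 above can be worked out ahead of time with a small model. The sketch below is hypothetical and does not use the Ignite test framework: the class and method names are illustrative, and node IDs are plain strings. It only computes which partitions should end up with a given number of owners after a sequence of node leaves, which is the kind of expectation a real test could then compare against the cluster state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical model of owners leaving one by one; not a GridCommonAbstractTest. */
final class OwnerLeaveScenarioSketch {
    /** Returns the owner lists that remain after all the given nodes have left. */
    static List<List<String>> afterLeaves(List<List<String>> assignment, List<String> leftNodes) {
        List<List<String>> res = new ArrayList<>(assignment.size());

        for (List<String> owners : assignment) {
            List<String> remaining = new ArrayList<>(owners);

            remaining.removeAll(leftNodes);
            res.add(remaining);
        }

        return res;
    }

    /** Partitions that have exactly ownerCnt owners (0 means the partition is lost). */
    static SortedSet<Integer> partitionsWithOwnerCount(List<List<String>> assignment, int ownerCnt) {
        SortedSet<Integer> res = new TreeSet<>();

        for (int p = 0; p < assignment.size(); p++) {
            if (assignment.get(p).size() == ownerCnt)
                res.add(p);
        }

        return res;
    }

    public static void main(String[] args) {
        List<List<String>> assignment = List.of(List.of("A", "B"), List.of("B", "C"), List.of("C", "A"));

        // Scenario 2: after "A" leaves, some partitions have a single owner.
        System.out.println(partitionsWithOwnerCount(afterLeaves(assignment, List.of("A")), 1));

        // Scenario 3: after "A" and "B" leave, some partitions have no owner.
        System.out.println(partitionsWithOwnerCount(afterLeaves(assignment, List.of("A", "B")), 0));
    }
}
```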

Do you have plans to implement non-blocking mapping for transactions not 
affected by the topology change in the same ticket ?




> Prevent data updates blocking in case of backup BLT server node leave
> -
>
> Key: IGNITE-9913
> URL: https://issues.apache.org/jira/browse/IGNITE-9913
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Ivan Rakov
>Assignee: Anton Vinogradov
>Priority: Major
> Fix For: 2.8
>
> Attachments: 9913_yardstick.png, master_yardstick.png
>
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>


[jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2019-10-09 Thread Anton Vinogradov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16947637#comment-16947637
 ] 

Anton Vinogradov commented on IGNITE-9913:
--

Alexey Goncharuk,

1. It seems I understand the situation now; please check my understanding.

Prepared txs can be located on backup nodes whose partitions are in any of these states: 
{{[state == MOVING || state == OWNING || state == RENTING]}}
 So, we have to cover all these cases.
 For example, we should repair all partitions with an unfinished rebalance 
(moving), correct?

So, {{Set failedPrimaries = 
aff.primaryPartitions(fut.exchangeId().eventNode().id(), aff.lastVersion());}} 
is a correct calculation, but 
 {{Set locBackups = 
aff.backupPartitions(fut.sharedContext().localNodeId(), aff.lastVersion());}} 
should be replaced with dht.localPartitions() usage?
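
The proposal above (select recovery candidates from the node's local partitions by state, rather than from the affinity backup list) can be sketched as follows. This is a hypothetical model: the enum only mirrors the state names mentioned in this comment plus two extra illustrative states, and the class, method, and parameter names are assumptions rather than Ignite code.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical sketch of choosing partitions to recover by local partition state. */
final class RecoveryCandidatesSketch {
    /** Illustrative states; MOVING, OWNING and RENTING are the ones named in the comment. */
    enum State { MOVING, OWNING, RENTING, EVICTED, LOST }

    /** States in which a local partition copy may still hold prepared transactions. */
    static final Set<State> MAY_HOLD_PREPARED_TX = EnumSet.of(State.MOVING, State.OWNING, State.RENTING);

    /**
     * Returns local partitions whose primary was on the failed node and whose local state
     * means they may still hold prepared transactions.
     */
    static SortedSet<Integer> partitionsToRecover(Set<Integer> failedPrimaries, Map<Integer, State> localParts) {
        SortedSet<Integer> res = new TreeSet<>();

        for (Map.Entry<Integer, State> e : localParts.entrySet()) {
            if (failedPrimaries.contains(e.getKey()) && MAY_HOLD_PREPARED_TX.contains(e.getValue()))
                res.add(e.getKey());
        }

        return res;
    }

    public static void main(String[] args) {
        Map<Integer, State> localParts =
            Map.of(1, State.OWNING, 2, State.RENTING, 3, State.EVICTED, 4, State.MOVING);

        System.out.println(partitionsToRecover(Set.of(1, 2, 3), localParts));
    }
}
```

A RENTING partition is selected here even though the node would no longer be listed as a backup in the affinity assignment, which is the case the affinity-based calculation would miss.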

> Prevent data updates blocking in case of backup BLT server node leave
> -
>
> Key: IGNITE-9913
> URL: https://issues.apache.org/jira/browse/IGNITE-9913
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Ivan Rakov
>Assignee: Anton Vinogradov
>Priority: Major
> Fix For: 2.8
>
> Attachments: 9913_yardstick.png, master_yardstick.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>


[jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2019-10-09 Thread Anton Vinogradov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16947604#comment-16947604
 ] 

Anton Vinogradov commented on IGNITE-9913:
--

[~agoncharuk]
 Thanks for joining!

1. I'm not sure I understand the issue.
 As far as I can see, {{aff.[primary,backup]Partitions}} uses {{aff.assignment}} 
(not {{idealAssignment}}) to calculate the list of nodes.
 Given that the baseline is enabled and has not changed, we should just check the 
latest assignment, which was calculated using part2node during the latest 
finished regular PME.
 Have I missed something? Could you please explain the situation again?

2. Non-affected nodes finish PME immediately. 
 So, we will block new operations only on affected nodes and only during 
recovery.
 Benchmarks are in progress; I will provide the results once they are ready.
 But the main improvement here should be the ability to skip waiting for 
already started operations to complete.

> Prevent data updates blocking in case of backup BLT server node leave
> -
>
> Key: IGNITE-9913
> URL: https://issues.apache.org/jira/browse/IGNITE-9913
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Ivan Rakov
>Assignee: Anton Vinogradov
>Priority: Major
> Fix For: 2.8
>
> Attachments: 9913_yardstick.png, master_yardstick.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>


[jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2019-10-09 Thread Alexey Goncharuk (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16947584#comment-16947584
 ] 

Alexey Goncharuk commented on IGNITE-9913:
--

[~NSAmelchev], [~avinogradov], a few comments for the PR:
 * The {{localRecoveryNeeded}} check does not seem right: you check the list of 
partitions from the affinity assignment cache. There may be a case when a node 
still owns a partition but is not an assigned backup for it 
(this happens right after a late affinity assignment change, when the affinity 
cache is changed to the ideal assignment but the node has not yet RENTed the 
partition). In this case, the partition will not be reported in the list of 
partitions and recovery will be skipped.
 * Do I understand correctly that *all* new transactions will still wait for 
this optimized PME to complete? If yes, what is the actual time boost that this 
change gives? Do you have any benchmark numbers? If no, how do you order 
transactions on a new primary node with the backup transactions on the same 
node that did not finish recovery yet?

> Prevent data updates blocking in case of backup BLT server node leave
> -
>
> Key: IGNITE-9913
> URL: https://issues.apache.org/jira/browse/IGNITE-9913
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Ivan Rakov
>Assignee: Anton Vinogradov
>Priority: Major
> Fix For: 2.8
>
> Attachments: 9913_yardstick.png, master_yardstick.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>


[jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2019-06-24 Thread Amelchev Nikita (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871096#comment-16871096
 ] 

Amelchev Nikita commented on IGNITE-9913:
-

Hi, [~ivan.glukos].
I found two possible blockers to doing such a lightweight PME without blocking 
updates:

1. Finalizing partition counters. It seems that we can't correctly collect gaps 
and process them without completing all txs. See the 
{{GridDhtPartitionTopologyImpl#finalizeUpdateCounters}} method.

2. Applying update counters. We can't correctly set the {{HWM}} counter if the 
primary left the cluster after sending updates to only some of the backups. Such 
updates can be processed later and break the guarantee that {{LWM<=HWM}}.

Could you take a look?
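
The second concern above can be illustrated with a toy counter. This is a hypothetical model and not Ignite's update counter implementation: the class and method names are made up, and the rule that a promoted backup fixes HWM at its current LWM is an assumption chosen only to make the problem visible. LWM here is the highest counter below which every update has been applied; updates may arrive out of order and leave gaps.

```java
import java.util.TreeSet;

/** Toy model of a partition update counter; names and promotion rule are assumptions. */
final class UpdateCounterSketch {
    /** Low watermark: all updates with counter <= lwm have been applied. */
    private long lwm;

    /** High watermark: fixed at promotion time in this model. */
    private long hwm;

    /** Applied updates above LWM that are still separated from it by a gap. */
    private final TreeSet<Long> pending = new TreeSet<>();

    /** Applies an update on a backup copy; counters start at 1 and may arrive out of order. */
    void applyOnBackup(long cntr) {
        if (cntr > lwm)
            pending.add(cntr);

        // Advance LWM over every gap that has just been closed.
        while (pending.remove(lwm + 1))
            lwm++;
    }

    /** Assumed promotion rule: the new primary starts reserving counters from its LWM. */
    void promoteToPrimary() {
        hwm = lwm;
    }

    long lwm() {
        return lwm;
    }

    long hwm() {
        return hwm;
    }

    boolean invariantHolds() {
        return lwm <= hwm;
    }

    public static void main(String[] args) {
        UpdateCounterSketch cntr = new UpdateCounterSketch();

        cntr.applyOnBackup(1);
        cntr.applyOnBackup(2);
        cntr.promoteToPrimary(); // The old primary has left the cluster.
        cntr.applyOnBackup(3);   // A late update from the old primary is processed afterwards.

        System.out.println("LWM=" + cntr.lwm() + ", HWM=" + cntr.hwm() + ", ok=" + cntr.invariantHolds());
    }
}
```

In this model the late update pushes LWM past the HWM chosen at promotion, which is the broken guarantee the comment describes.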

> Prevent data updates blocking in case of backup BLT server node leave
> -
>
> Key: IGNITE-9913
> URL: https://issues.apache.org/jira/browse/IGNITE-9913
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Ivan Rakov
>Assignee: Amelchev Nikita
>Priority: Major
> Fix For: 2.8
>
> Attachments: 9913_yardstick.png, master_yardstick.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2019-05-24 Thread Amelchev Nikita (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847456#comment-16847456
 ] 

Amelchev Nikita commented on IGNITE-9913:
-

I'm investigating the issue with MOVING partitions: should we allow lightweight 
PME if the cluster has such partitions? 
[Dev-list 
discussion.|http://apache-ignite-developers.2346864.n4.nabble.com/Lightweight-version-of-partitions-map-exchange-td41551.html]

> Prevent data updates blocking in case of backup BLT server node leave
> -
>
> Key: IGNITE-9913
> URL: https://issues.apache.org/jira/browse/IGNITE-9913
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Ivan Rakov
>Assignee: Amelchev Nikita
>Priority: Major
> Fix For: 2.8
>
> Attachments: 9913_yardstick.png, master_yardstick.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>


[jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2019-04-03 Thread Amelchev Nikita (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808482#comment-16808482
 ] 

Amelchev Nikita commented on IGNITE-9913:
-

[~Jokser], Thank you for the review!
For now, the coordinator sets partition states based on their availability if the 
cluster has moving partitions. In theory, we can calculate this locally too, but 
it should be consistent on each node. I suggest doing it together with other 
optimizations, such as handling a non-baseline node leaving with persistence enabled.

> Prevent data updates blocking in case of backup BLT server node leave
> -
>
> Key: IGNITE-9913
> URL: https://issues.apache.org/jira/browse/IGNITE-9913
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Ivan Rakov
>Assignee: Amelchev Nikita
>Priority: Major
> Fix For: 2.8
>
> Attachments: 9913_yardstick.png, master_yardstick.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h


[jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2019-04-01 Thread Pavel Kovalenko (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806597#comment-16806597
 ] 

Pavel Kovalenko commented on IGNITE-9913:
-

[~NSAmelchev]
I've reviewed your changes. I have a question regarding conditions when this 
optimization is disabled.
Why is local affinity calculation turned off when there are moving 
partitions in the topology and affinity assignments are not equal to the ideal ones?


> Prevent data updates blocking in case of backup BLT server node leave
> -
>
> Key: IGNITE-9913
> URL: https://issues.apache.org/jira/browse/IGNITE-9913
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Ivan Rakov
>Assignee: Amelchev Nikita
>Priority: Major
> Fix For: 2.8
>
> Attachments: 9913_yardstick.png, master_yardstick.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h


[jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2019-03-29 Thread Alexey Goncharuk (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804711#comment-16804711
 ] 

Alexey Goncharuk commented on IGNITE-9913:
--

[~NSAmelchev], I see that your optimization is disabled when in-memory caches 
are present in the cluster. However, this will no longer be the 
case once IGNITE-11188 is merged. I think we need to coordinate these two 
changes to make the most of both.

[~DmitriyGovorukhin], [~ibessonov], can you take a look at this change and 
coordinate with Nikita on merge order?

> Prevent data updates blocking in case of backup BLT server node leave
> -
>
> Key: IGNITE-9913
> URL: https://issues.apache.org/jira/browse/IGNITE-9913
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Ivan Rakov
>Assignee: Amelchev Nikita
>Priority: Major
> Fix For: 2.8
>
> Attachments: 9913_yardstick.png, master_yardstick.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h


[jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2019-03-28 Thread Amelchev Nikita (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804006#comment-16804006
 ] 

Amelchev Nikita commented on IGNITE-9913:
-

Hi, [~ilantukh], thank you for taking a look!

Yes, updates may be blocked until the discovery event is processed on the 
primary node. In the worst case, this is the time the node-leave event takes 
to pass around the ring (TcpDiscovery).

I was thinking about introducing a message with topology information that the 
coordinator would send via the communication SPI. But in my opinion, it is not 
quite correct to forward information about the new node topology for affinity 
via communication. If it makes sense, I can investigate it.

> Prevent data updates blocking in case of backup BLT server node leave
> -
>
> Key: IGNITE-9913
> URL: https://issues.apache.org/jira/browse/IGNITE-9913
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Ivan Rakov
>Assignee: Amelchev Nikita
>Priority: Major
> Fix For: 2.8
>
> Attachments: 9913_yardstick.png, master_yardstick.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h


[jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2019-03-28 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803972#comment-16803972
 ] 

Ilya Lantukh commented on IGNITE-9913:
--

Hi [~NSAmelchev],

Thanks for the contribution! I've added some comments on your PR on GitHub.

In general, I think that what you have done doesn't match the ticket's 
description. PME should definitely be faster now, because you removed the 
distributed exchange phase from it. But cache operations might still be 
blocked until PME is finished on all nodes. In large clusters it might take a 
significant amount of time for the NODE_LEFT event to reach all nodes, and 
during that time some nodes will have topVer == X while others will have 
topVer == X-1. If a cache operation involves nodes from both subsets, it will 
be blocked until the node with the lower version updates to the higher version.
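
The version mismatch described above can be modelled with a toy sketch. It is only an illustration under simplifying assumptions: the class and method names are hypothetical, topology versions are plain numbers, and an operation is treated as blocked whenever at least one participating node is still behind the highest version seen among the participants. Ignite's real mapping and locking logic is more involved.

```java
/** Hypothetical model of topology-version skew; not Ignite code. */
public class TopologyVersionSketch {
    /** The version every participant must reach: the highest one among them. */
    public static long requiredVersion(long[] topVers) {
        long max = Long.MIN_VALUE;

        for (long v : topVers)
            max = Math.max(max, v);

        return max;
    }

    /** An operation is blocked while any participant lags behind the required version. */
    public static boolean isBlocked(long[] topVers) {
        long required = requiredVersion(topVers);

        for (long v : topVers) {
            if (v < required)
                return true;
        }

        return false;
    }

    public static void main(String[] args) {
        long x = 10;

        // One node has processed NODE_LEFT (version X), the other has not (X-1).
        System.out.println(isBlocked(new long[] {x, x - 1})); // true

        // Both nodes have processed the event.
        System.out.println(isBlocked(new long[] {x, x})); // false
    }
}
```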

 

[~ivan.glukos], do you agree with that?

> Prevent data updates blocking in case of backup BLT server node leave
> -
>
> Key: IGNITE-9913
> URL: https://issues.apache.org/jira/browse/IGNITE-9913
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Ivan Rakov
>Assignee: Amelchev Nikita
>Priority: Major
> Fix For: 2.8
>
> Attachments: 9913_yardstick.png, master_yardstick.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h


[jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2019-03-28 Thread Amelchev Nikita (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803849#comment-16803849
 ] 

Amelchev Nikita commented on IGNITE-9913:
-

I have implemented lightweight PME (based on the PR by Ivan Rakov) for the case 
when a baseline server node leaves the topology.

I have benchmarked it against master under yardstick load 
(IgniteGetAndPutTxBenchmark, 6 servers, 2 clients with 64 threads each):
master:
 !master_yardstick.png! 
with my changes:
 !9913_yardstick.png! 

PME duration:
master: 1440±35 ms (servers); 989±87 ms (clients)
with changes: 117±10 ms (servers and clients)

Also, the maximum transaction latency decreased: 
master: 1439 ms
with changes: 293 ms

In summary, PME duration decreased roughly 10 times (about 12 times on servers 
and 8 times on clients), and the maximum transaction latency decreased almost 
5 times.
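
As a quick check of the summary, the ratios can be recomputed from the mean values reported in this comment. The class and method names below are made up for the example, and the error margins are ignored.

```java
import java.util.Locale;

/** Recomputes the speedup ratios from the mean values quoted in the comment. */
public class PmeSpeedupSketch {
    /** How many times smaller the "after" value is compared to the "before" value. */
    public static double ratio(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        // PME duration in ms: master vs. the patched build.
        System.out.printf(Locale.ROOT, "PME, servers: %.1fx%n", ratio(1440, 117)); // 12.3x
        System.out.printf(Locale.ROOT, "PME, clients: %.1fx%n", ratio(989, 117));  // 8.5x

        // Maximum transaction latency in ms.
        System.out.printf(Locale.ROOT, "Max tx latency: %.1fx%n", ratio(1439, 293)); // 4.9x
    }
}
```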

TC tests look good (testRebalancingDuringLoad_N can be muted until 
IGNITE-11623 is resolved). 


> Prevent data updates blocking in case of backup BLT server node leave
> -
>
> Key: IGNITE-9913
> URL: https://issues.apache.org/jira/browse/IGNITE-9913
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Ivan Rakov
>Assignee: Amelchev Nikita
>Priority: Major
> Fix For: 2.8
>
> Attachments: 9913_yardstick.png, master_yardstick.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h


[jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2019-03-28 Thread Amelchev Nikita (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803848#comment-16803848
 ] 

Amelchev Nikita commented on IGNITE-9913:
-

It seems my changes increase the failure probability of these tests (lightweight 
PME completes and the topology changes faster). The same failure in this group 
of tests also happens in master. I have filed IGNITE-11623 for this case.

> Prevent data updates blocking in case of backup BLT server node leave
> -
>
> Key: IGNITE-9913
> URL: https://issues.apache.org/jira/browse/IGNITE-9913
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Ivan Rakov
>Assignee: Amelchev Nikita
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h


[jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2019-03-28 Thread Ignite TC Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803843#comment-16803843
 ] 

Ignite TC Bot commented on IGNITE-9913:
---

{panel:title=-- Run :: All: Possible 
Blockers|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}
{color:#d04437}MVCC PDS 4{color} [[tests 
7|https://ci.ignite.apache.org/viewLog.html?buildId=3442305]]
* IgnitePdsMvccTestSuite4: 
IgnitePdsContinuousRestartTestWithSharedGroupAndIndexes.testRebalancingDuringLoad_1000_500_1_1
 - 0,0% fails in last 394 master runs.
* IgnitePdsMvccTestSuite4: 
IgnitePdsContinuousRestartTestWithSharedGroupAndIndexes.testRebalancingDuringLoad_1000_500_8_16
 - 0,0% fails in last 394 master runs.
* IgnitePdsMvccTestSuite4: 
IgnitePdsContinuousRestartTestWithSharedGroupAndIndexes.testRebalancingDuringLoad_8000_500_8_1
 - 0,0% fails in last 394 master runs.
* IgnitePdsMvccTestSuite4: 
IgnitePdsContinuousRestartTestWithSharedGroupAndIndexes.testRebalancingDuringLoad_8000_500_1_1
 - 0,0% fails in last 394 master runs.
* IgnitePdsMvccTestSuite4: 
IgnitePdsContinuousRestartTestWithSharedGroupAndIndexes.testRebalancingDuringLoad_1000_2_1_1
 - 0,0% fails in last 394 master runs.
* IgnitePdsMvccTestSuite4: 
IgnitePdsContinuousRestartTestWithSharedGroupAndIndexes.testRebalancingDuringLoad_1000_2_8_1
 - 0,0% fails in last 394 master runs.

{color:#d04437}MVCC PDS 3{color} [[tests 
4|https://ci.ignite.apache.org/viewLog.html?buildId=3394663]]
* IgnitePdsMvccTestSuite3: 
IgnitePdsContinuousRestartTest.testRebalancingDuringLoad_1000_500_1_1 - 0,0% 
fails in last 399 master runs.
* IgnitePdsMvccTestSuite3: 
IgnitePdsContinuousRestartTest.testRebalancingDuringLoad_1000_2_8_1 - 0,0% 
fails in last 399 master runs.
* IgnitePdsMvccTestSuite3: 
IgnitePdsContinuousRestartTest.testRebalancingDuringLoad_1000_2_1_1 - 0,0% 
fails in last 399 master runs.
* IgnitePdsMvccTestSuite3: 
IgnitePdsContinuousRestartTest.testRebalancingDuringLoad_1000_2_8_16 - 0,0% 
fails in last 399 master runs.

{panel}
[TeamCity *-- Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=3383266buildTypeId=IgniteTests24Java8_RunAll]

> Prevent data updates blocking in case of backup BLT server node leave
> -
>
> Key: IGNITE-9913
> URL: https://issues.apache.org/jira/browse/IGNITE-9913
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Ivan Rakov
>Assignee: Amelchev Nikita
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h


[jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2019-03-28 Thread Aleksey Plekhanov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803706#comment-16803706
 ] 

Aleksey Plekhanov commented on IGNITE-9913:
---

[~NSAmelchev] I've looked at your patch, it looks good to me.

> Prevent data updates blocking in case of backup BLT server node leave
> -
>
> Key: IGNITE-9913
> URL: https://issues.apache.org/jira/browse/IGNITE-9913
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Ivan Rakov
>Assignee: Amelchev Nikita
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h


[jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2019-02-20 Thread Amelchev Nikita (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772876#comment-16772876
 ] 

Amelchev Nikita commented on IGNITE-9913:
-

[~ivan.glukos], hi. Do you mind if I keep working on the issue based on your 
PR? I have investigated the issue and plan to prepare a PR to improve it in 
the near future. 

> Prevent data updates blocking in case of backup BLT server node leave
> -
>
> Key: IGNITE-9913
> URL: https://issues.apache.org/jira/browse/IGNITE-9913
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Ivan Rakov
>Assignee: Ivan Rakov
>Priority: Major
> Fix For: 2.8