[jira] [Commented] (IGNITE-10799) Optimize affinity initialization/re-calculation

2019-04-02 Thread Alexey Goncharuk (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807640#comment-16807640
 ] 

Alexey Goncharuk commented on IGNITE-10799:
---

[~Jokser], looks good, please proceed with merge.

> Optimize affinity initialization/re-calculation
> ---
>
> Key: IGNITE-10799
> URL: https://issues.apache.org/jira/browse/IGNITE-10799
> Project: Ignite
> Issue Type: Improvement
> Components: cache
> Affects Versions: 2.4
> Reporter: Pavel Kovalenko
> Assignee: Pavel Kovalenko
> Priority: Major
> Fix For: 2.8
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When persistence is enabled and a baseline is set, affinity is recalculated
> through 2 main entry points:
> {noformat}
> org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager#onServerJoinWithExchangeMergeProtocol
> org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager#onServerLeftWithExchangeMergeProtocol
> {noformat}
> Both follow the same recalculation approach:
> 1) Take the current baseline (ideal assignment).
> 2) Filter out offline nodes from it.
> 3) Choose new primary nodes if the previous ones went away.
> 4) Place the temporary primary nodes into the late affinity assignment set.
> Looking at the implementation details, we can see that we do a lot of
> unnecessary online nodes cache lookups and array list copies. Performance
> becomes too slow when we recalculate affinity for replicated caches (the
> cost is P * N on each node, where P is the partition count and N is the
> number of nodes in the cluster). With a large partition count or a large
> cluster, it may take a few seconds, which is unacceptable because this
> process happens during PME (partition map exchange) and freezes ongoing
> cluster operations.
> We should investigate possible bottlenecks and improve the performance of
> affinity recalculation.
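
The four steps in the description can be sketched roughly as follows. This is only an illustration under stated assumptions, not Ignite's actual code: the class `AffinityRecalcSketch`, the method `recalculate`, the use of plain strings as node IDs, and the `waitParts` set standing in for the late affinity assignment set are all hypothetical. The sketch also assumes the first owner in each list is the primary and covers only the case where nodes have left.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the recalculation steps; not Ignite's API. */
class AffinityRecalcSketch {
    /**
     * @param ideal Ideal (baseline) assignment: ideal.get(p) lists the owners of partition p, primary first.
     * @param alive IDs of the nodes that are currently online.
     * @param waitParts Output: partitions whose current primary differs from the ideal one.
     * @return Current assignment containing online owners only.
     */
    static List<List<String>> recalculate(List<List<String>> ideal, Set<String> alive, Set<Integer> waitParts) {
        List<List<String>> result = new ArrayList<>(ideal.size());

        for (int p = 0; p < ideal.size(); p++) {
            // Step 1: take the ideal assignment for this partition.
            List<String> idealOwners = ideal.get(p);

            // Step 2: filter out offline nodes, keeping the original order.
            List<String> owners = new ArrayList<>(idealOwners.size());

            for (String node : idealOwners) {
                if (alive.contains(node))
                    owners.add(node);
            }

            // Step 3: if the ideal primary is gone, the first surviving owner
            // ends up at index 0 and therefore acts as the new primary.
            boolean primaryChanged = !owners.isEmpty() && !owners.get(0).equals(idealOwners.get(0));

            // Step 4: remember partitions that are served by a temporary primary.
            if (primaryChanged)
                waitParts.add(p);

            result.add(owners);
        }

        return result;
    }

    public static void main(String[] args) {
        List<List<String>> ideal = Arrays.asList(
            Arrays.asList("A", "B"),
            Arrays.asList("B", "C"),
            Arrays.asList("C", "A"));

        Set<Integer> waitParts = new HashSet<>();

        // Node B is offline in this example.
        List<List<String>> current = recalculate(ideal, new HashSet<>(Arrays.asList("A", "C")), waitParts);

        System.out.println(current + " waiting: " + waitParts);
    }
}
```

For a replicated cache every partition lists all N nodes as owners, so the inner loop runs P * N times in total, which matches the cost quoted in the description.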



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-10799) Optimize affinity initialization/re-calculation

2019-04-01 Thread Pavel Kovalenko (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806922#comment-16806922
 ] 

Pavel Kovalenko commented on IGNITE-10799:
--

Blockers from Ignite TC Bot are not related to my changes.
[~agoncharuk] Your comments have been addressed. Could you please look at the change again?



[jira] [Commented] (IGNITE-10799) Optimize affinity initialization/re-calculation

2019-04-01 Thread Ignite TC Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806920#comment-16806920
 ] 

Ignite TC Bot commented on IGNITE-10799:


{panel:title=-- Run :: All: Possible Blockers|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}
{color:#d04437}Platform C++ (Linux){color} [[tests 0 Exit Code |https://ci.ignite.apache.org/viewLog.html?buildId=3479549]]

{color:#d04437}Thin client: Python{color} [[tests 0 Exit Code |https://ci.ignite.apache.org/viewLog.html?buildId=3479602]]

{panel}
[TeamCity *-- Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=3479625&buildTypeId=IgniteTests24Java8_RunAll]



[jira] [Commented] (IGNITE-10799) Optimize affinity initialization/re-calculation

2019-03-29 Thread Alexey Goncharuk (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16805080#comment-16805080
 ] 

Alexey Goncharuk commented on IGNITE-10799:
---

[~Jokser] a few comments on 
{{onServerLeftWithExchangeMergeProtocolLightweight}}:
1) Can we split the internals of the closure inside the 
{{forAllRegisteredCacheGroups}} call into several methods? Currently it is hard 
to follow what is going on in the two loops.
2) Let's make {{aliveNodes}} a hash set; otherwise multiple {{contains}} calls 
on this collection for each cache group and each partition may consume 
significant time on large topologies and with {{REPLICATED}} caches.
3) In {{GridAffinityAssignmentV2}} there is an unnecessary check for 
{{idealPrimary == null}}.

Otherwise looks good.
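
To illustrate point 2, here is a minimal sketch of the hash set suggestion. It is not taken from the patch: the class `AliveNodesSketch`, the method `filterAlive`, and the string node IDs are hypothetical stand-ins. The idea is that the alive-node collection is converted to a `HashSet` once, so every later `contains` call is an average constant-time lookup rather than a linear scan of a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration; names and types are not Ignite's API. */
class AliveNodesSketch {
    /** Keeps only the alive owners of every partition, preserving owner order. */
    static List<List<String>> filterAlive(List<List<String>> assignment, Collection<String> aliveNodes) {
        // Build the hash set once and reuse it for all partitions;
        // contains() on a list would rescan the whole list on every call.
        Set<String> aliveSet = new HashSet<>(aliveNodes);

        List<List<String>> filtered = new ArrayList<>(assignment.size());

        for (List<String> owners : assignment) {
            List<String> aliveOwners = new ArrayList<>(owners.size());

            for (String node : owners) {
                if (aliveSet.contains(node))
                    aliveOwners.add(node);
            }

            filtered.add(aliveOwners);
        }

        return filtered;
    }

    public static void main(String[] args) {
        // A replicated cache: every partition is owned by every node.
        List<List<String>> replicated = Arrays.asList(
            Arrays.asList("A", "B", "C"),
            Arrays.asList("B", "C", "A"));

        System.out.println(filterAlive(replicated, Arrays.asList("A", "C")));
    }
}
```

With a list, each of the P * N membership checks for a replicated cache can itself cost up to N comparisons, which is why the choice of collection matters on large topologies.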



[jira] [Commented] (IGNITE-10799) Optimize affinity initialization/re-calculation

2019-03-13 Thread Ignite TC Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791719#comment-16791719
 ] 

Ignite TC Bot commented on IGNITE-10799:


{panel:title=-- Run :: All: No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *-- Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=3256705&buildTypeId=IgniteTests24Java8_RunAll]
