[jira] [Assigned] (IGNITE-9433) Refactoring to improve constant usage for file suffixes

2018-08-31 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin reassigned IGNITE-9433:
--

Assignee: Pavel Voronkin

> Refactoring to improve constant usage for file suffixes
> ---
>
> Key: IGNITE-9433
> URL: https://issues.apache.org/jira/browse/IGNITE-9433
> Project: Ignite
>  Issue Type: Task
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Fix For: 2.7
>
>
> We need to extract file suffix constants to avoid duplicating string 
> constants such as ".zip" and ".tmp" across the project.
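A minimal sketch of what such a constants holder could look like; the class and method names here are illustrative, not the actual Ignite API:

```java
// Illustrative sketch only: names are hypothetical, not Ignite's real classes.
// Centralizing suffixes avoids scattering ".zip"/".tmp" literals across modules.
public final class FileSuffixes {
    public static final String ZIP_SUFFIX = ".zip";
    public static final String TMP_SUFFIX = ".tmp";

    private FileSuffixes() {
        // Constants holder; no instances.
    }

    /** @return {@code true} if the file name ends with the zip suffix. */
    public static boolean isZip(String fileName) {
        return fileName.endsWith(ZIP_SUFFIX);
    }
}
```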



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-9870) GridDhtPartitionsFullMessage#prepareMarshal compression parallelization.

2018-10-22 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658970#comment-16658970
 ] 

Pavel Voronkin commented on IGNITE-9870:


We can run the marshalling and zipping of "parts", "partCntrs", "partCntrs2", 
"partHistSuppliers" and "partsToReload" as separate tasks in a thread pool and 
speed up message preparation. 
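The idea can be sketched roughly as follows, under the assumption that each collection is marshalled to a byte array independently; the helper names are hypothetical, not the actual GridDhtPartitionsFullMessage code:

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.zip.Deflater;

public class ParallelCompressSketch {
    /** Compresses one marshalled part with java.util.zip.Deflater. */
    static byte[] zip(byte[] marshalled) {
        Deflater deflater = new Deflater();
        deflater.setInput(marshalled);
        deflater.finish();
        byte[] buf = new byte[Math.max(64, marshalled.length)];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Submits one zip task per part and collects results in order. */
    static List<byte[]> zipAll(List<byte[]> parts, ExecutorService pool) throws Exception {
        List<Future<byte[]>> futs = new ArrayList<>();
        for (byte[] p : parts)
            futs.add(pool.submit(() -> zip(p)));

        List<byte[]> res = new ArrayList<>();
        for (Future<byte[]> f : futs)
            res.add(f.get()); // preserves part order; propagates task failures

        return res;
    }
}
```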

 

> GridDhtPartitionsFullMessage#prepareMarshal compression parallelization.
> 
>
> Key: IGNITE-9870
> URL: https://issues.apache.org/jira/browse/IGNITE-9870
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Affects Versions: 2.6
>Reporter: Stanilovsky Evgeny
>Assignee: Stanilovsky Evgeny
>Priority: Major
>
> In huge topologies (~200 cluster nodes, 100 caches, 32k partitions per 
> cache), full map generation takes about 3 seconds; parallelization seems like 
> the correct approach here.





[jira] [Updated] (IGNITE-9970) Add ability to set nodeId for VisorIdleVerifyDumpTask executed from ./control.sh --host HOST --cache idle_verify

2018-10-23 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-9970:
---
Description: 
--cache idle_verify sets a null nodeId on invocation of 

private <R> R executeTaskByNameOnNode(GridClient client, String taskClsName, 
Object taskArgs, UUID nodeId
) throws GridClientException {

which causes the reduce phase to be assigned to a random node; we want to dump 
the results on a particular node.

> Add ability to set nodeId for VisorIdleVerifyDumpTask executed from 
> ./control.sh --host HOST --cache idle_verify
> 
>
> Key: IGNITE-9970
> URL: https://issues.apache.org/jira/browse/IGNITE-9970
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Priority: Major
>
> --cache idle_verify sets a null nodeId on invocation of 
> private <R> R executeTaskByNameOnNode(GridClient client, String taskClsName, 
> Object taskArgs, UUID nodeId
> ) throws GridClientException {
>
> which causes the reduce phase to be assigned to a random node; we want to 
> dump the results on a particular node.





[jira] [Created] (IGNITE-9970) Add ability to set nodeId for VisorIdleVerifyDumpTask executed from ./control.sh --host HOST --cache idle_verify

2018-10-23 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-9970:
--

 Summary: Add ability to set nodeId for VisorIdleVerifyDumpTask 
executed from ./control.sh --host HOST --cache idle_verify
 Key: IGNITE-9970
 URL: https://issues.apache.org/jira/browse/IGNITE-9970
 Project: Ignite
  Issue Type: Improvement
Reporter: Pavel Voronkin








[jira] [Created] (IGNITE-10085) Make compressed wal archives user friendly.

2018-10-31 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-10085:
---

 Summary: Make compressed wal archives user friendly.
 Key: IGNITE-10085
 URL: https://issues.apache.org/jira/browse/IGNITE-10085
 Project: Ignite
  Issue Type: Improvement
Reporter: Pavel Voronkin


Compressed wal archives are created with ZipEntry(""). In some ZIP GUIs those 
archives are shown as empty, which can really confuse users.
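The difference can be sketched with java.util.zip; the segment entry name below is illustrative, not the actual name Ignite would use:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class WalZipSketch {
    /** Writes the payload under the given entry name and returns the archive bytes. */
    static byte[] writeArchive(String entryName, byte[] payload) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            // ZipEntry("") produces an unnamed entry that many GUIs hide;
            // a real segment name makes the archive browsable.
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write(payload);
            zos.closeEntry();
        }
        return bos.toByteArray();
    }

    /** Reads back the first entry name, as a ZIP GUI would list it. */
    static String firstEntryName(byte[] archive) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry e = zis.getNextEntry();
            return e == null ? null : e.getName();
        }
    }
}
```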





[jira] [Updated] (IGNITE-10085) Make compressed wal archives user friendly.

2018-10-31 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10085:

Priority: Minor  (was: Major)

> Make compressed wal archives user friendly.
> ---
>
> Key: IGNITE-10085
> URL: https://issues.apache.org/jira/browse/IGNITE-10085
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Priority: Minor
>
> Compressed wal archives are created with ZipEntry(""). In some ZIP GUIs 
> those archives are shown as empty, which can really confuse users.





[jira] [Created] (IGNITE-9942) We need ability in WebConsole to disable selfregistration feature.

2018-10-19 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-9942:
--

 Summary: We need ability in WebConsole to disable selfregistration 
feature.
 Key: IGNITE-9942
 URL: https://issues.apache.org/jira/browse/IGNITE-9942
 Project: Ignite
  Issue Type: Improvement
Reporter: Pavel Voronkin








[jira] [Updated] (IGNITE-9942) We need ability in WebConsole to disable selfregistration feature.

2018-10-19 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-9942:
---
Component/s: general

> We need ability in WebConsole to disable selfregistration feature.
> --
>
> Key: IGNITE-9942
> URL: https://issues.apache.org/jira/browse/IGNITE-9942
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Affects Versions: 2.6
>Reporter: Pavel Voronkin
>Priority: Major
> Fix For: 2.8
>
>






[jira] [Updated] (IGNITE-9942) We need ability in WebConsole to disable selfregistration feature.

2018-10-19 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-9942:
---
Affects Version/s: 2.6

> We need ability in WebConsole to disable selfregistration feature.
> --
>
> Key: IGNITE-9942
> URL: https://issues.apache.org/jira/browse/IGNITE-9942
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.6
>Reporter: Pavel Voronkin
>Priority: Major
>






[jira] [Updated] (IGNITE-9942) We need ability in WebConsole to disable selfregistration feature.

2018-10-19 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-9942:
---
Fix Version/s: 2.8

> We need ability in WebConsole to disable selfregistration feature.
> --
>
> Key: IGNITE-9942
> URL: https://issues.apache.org/jira/browse/IGNITE-9942
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Affects Versions: 2.6
>Reporter: Pavel Voronkin
>Priority: Major
> Fix For: 2.8
>
>






[jira] [Updated] (IGNITE-9942) We need ability in WebConsole to disable selfregistration feature.

2018-10-19 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-9942:
---
Description: The WebConsole administrator lacks the functionality to disable 
self-registration in WebConsole.

> We need ability in WebConsole to disable selfregistration feature.
> --
>
> Key: IGNITE-9942
> URL: https://issues.apache.org/jira/browse/IGNITE-9942
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Affects Versions: 2.6
>Reporter: Pavel Voronkin
>Priority: Major
> Fix For: 2.8
>
>
> The WebConsole administrator lacks the functionality to disable 
> self-registration in WebConsole.





[jira] [Assigned] (IGNITE-10228) Start multiple caches in parallel may lead to the fact that some of the caches won't be registered.

2018-11-12 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin reassigned IGNITE-10228:
---

Assignee: Pavel Voronkin

> Start multiple caches in parallel may lead to the fact that some of the 
> caches won't be registered.
> ---
>
> Key: IGNITE-10228
> URL: https://issues.apache.org/jira/browse/IGNITE-10228
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.6
>Reporter: Vyacheslav Koptilin
>Assignee: Pavel Voronkin
>Priority: Major
> Attachments: CacheStartingParallelTest.java
>
>
> It looks like the root cause of the issue is that 
> {{CacheGroupContext.addCacheContext()}} (which is called in parallel) does 
> not use a lock/semaphore in order to synchronize {{caches}} updates.
>  
> {code:java}
> private void addCacheContext(GridCacheContext cctx) {
> ArrayList<GridCacheContext> caches = new ArrayList<>(this.caches);
> boolean add = caches.add(cctx);
> ...
> this.caches = caches;
> }
> {code}
>  
> The possible workaround is to disable parallel start of caches by setting the 
> {{IGNITE_ALLOW_START_CACHES_IN_PARALLEL}} property to {{false}}.
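The lost-update hazard in this copy-on-write pattern, and one way to serialize it with a lock, can be sketched generically (the class below is an illustration, not the actual CacheGroupContext code):

```java
import java.util.ArrayList;
import java.util.List;

public class CowListSketch<C> {
    private volatile List<C> caches = new ArrayList<>();
    private final Object mux = new Object();

    /**
     * Without the lock, two threads may copy the same 'caches' snapshot
     * concurrently and one of the two additions is silently lost. The lock
     * serializes the copy-modify-publish sequence, while readers still get
     * an immutable snapshot without locking.
     */
    public void add(C cctx) {
        synchronized (mux) {
            List<C> copy = new ArrayList<>(caches);
            copy.add(cctx);
            caches = copy; // publish the new snapshot
        }
    }

    public List<C> snapshot() {
        return caches;
    }
}
```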





[jira] [Created] (IGNITE-10295) Rework Sending Full Message logging.

2018-11-16 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-10295:
---

 Summary: Rework Sending Full Message logging.
 Key: IGNITE-10295
 URL: https://issues.apache.org/jira/browse/IGNITE-10295
 Project: Ignite
  Issue Type: Improvement
Reporter: Pavel Voronkin








[jira] [Updated] (IGNITE-10295) Rework Sending Full Message logging.

2018-11-16 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10295:

Description: 
[16:33:34,410][INFO][sys-#122][GridDhtPartitionsExchangeFuture] Sending Full 
Message performed in 0 ms.

[16:33:34,410][INFO][sys-#122][GridDhtPartitionsExchangeFuture] Sending Full 
Message to all nodes performed in 0 ms.

[16:33:32,993][INFO][exchange-worker-#66][GridDhtPartitionsExchangeFuture] 
Sending Single Message performed in 150 ms.

 

As the number of caches, server nodes or partitions varies, the reported 
Single Message time changes accordingly, but the reported Full Message time is 
always printed as 0 or 1 ms. It seems to be calculated incorrectly because the 
message is sent asynchronously.

Most of the time in this method is actually spent opening the connection on 
send().
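Why the reported time comes out as ~0 ms can be sketched as follows: timing around an asynchronous submit only measures the hand-off to the pool, not the send itself. The send task here is simulated with a sleep; none of this is the actual exchange-future code:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

public class AsyncTimingSketch {
    /** Measures only the hand-off to the pool; returns almost immediately. */
    static long timeSubmitOnlyMs(ExecutorService pool, Runnable send) {
        long start = System.nanoTime();
        CompletableFuture.runAsync(send, pool); // async: returns right away
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Measures until the asynchronous send actually completes. */
    static long timeUntilCompleteMs(ExecutorService pool, Runnable send) {
        long start = System.nanoTime();
        CompletableFuture.runAsync(send, pool).join(); // wait for completion
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```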

 

 

 

> Rework Sending Full Message logging.
> 
>
> Key: IGNITE-10295
> URL: https://issues.apache.org/jira/browse/IGNITE-10295
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Priority: Major
>
> [16:33:34,410][INFO][sys-#122][GridDhtPartitionsExchangeFuture] Sending Full 
> Message performed in 0 ms.
> [16:33:34,410][INFO][sys-#122][GridDhtPartitionsExchangeFuture] Sending Full 
> Message to all nodes performed in 0 ms.
> [16:33:32,993][INFO][exchange-worker-#66][GridDhtPartitionsExchangeFuture] 
> Sending Single Message performed in 150 ms.
>  
> As the number of caches, server nodes or partitions varies, the reported 
> Single Message time changes accordingly, but the reported Full Message time 
> is always printed as 0 or 1 ms. It seems to be calculated incorrectly 
> because the message is sent asynchronously.
> Most of the time in this method is actually spent opening the connection on 
> send().
>  
>  
>  





[jira] [Assigned] (IGNITE-10295) Rework Sending Full Message logging.

2018-11-16 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin reassigned IGNITE-10295:
---

Assignee: Pavel Voronkin

> Rework Sending Full Message logging.
> 
>
> Key: IGNITE-10295
> URL: https://issues.apache.org/jira/browse/IGNITE-10295
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
>
> [16:33:34,410][INFO][sys-#122][GridDhtPartitionsExchangeFuture] Sending Full 
> Message performed in 0 ms.
> [16:33:34,410][INFO][sys-#122][GridDhtPartitionsExchangeFuture] Sending Full 
> Message to all nodes performed in 0 ms.
> [16:33:32,993][INFO][exchange-worker-#66][GridDhtPartitionsExchangeFuture] 
> Sending Single Message performed in 150 ms.
>  
> As the number of caches, server nodes or partitions varies, the reported 
> Single Message time changes accordingly, but the reported Full Message time 
> is always printed as 0 or 1 ms. It seems to be calculated incorrectly 
> because the message is sent asynchronously.
> Most of the time in this method is actually spent opening the connection on 
> send().
>  
>  
>  





[jira] [Commented] (IGNITE-10285) U.doInParallel may lead to deadlock

2018-11-16 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16689391#comment-16689391
 ] 

Pavel Voronkin commented on IGNITE-10285:
-

Looks good to me.

> U.doInParallel may lead to deadlock
> ---
>
> Key: IGNITE-10285
> URL: https://issues.apache.org/jira/browse/IGNITE-10285
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Assignee: Dmitriy Govorukhin
>Priority: Critical
> Fix For: 2.8
>
> Attachments: dump.rtf
>
>
> There is a case where we can get a deadlock on the thread pool.
> If we call doInParallel from sys-pool threads and the number of such threads 
> equals sys-pool.size, we get a deadlock: the threads in the sys-pool try to 
> run doInParallel through the same sys-pool and wait on the futures 
> infinitely, because no thread can complete a doInParallel operation that 
> itself requires threads from the sys-pool.
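The hazard and one common mitigation (running tasks inline when the caller is already a pool thread) can be sketched generically; this is an illustration of the pattern, not the actual U.doInParallel code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

public class DoInParallelSketch {
    /**
     * If every pool thread blocks here waiting on futures that need the same
     * pool, nothing can make progress: that's the deadlock described above.
     * The callerIsPoolThread fallback runs tasks inline to stay safe.
     */
    static <R> List<R> doInParallel(ExecutorService pool, boolean callerIsPoolThread,
                                    List<Callable<R>> tasks) throws Exception {
        List<R> res = new ArrayList<>();

        if (callerIsPoolThread) {
            for (Callable<R> t : tasks)
                res.add(t.call()); // inline: consumes no extra pool threads

            return res;
        }

        List<Future<R>> futs = new ArrayList<>();
        for (Callable<R> t : tasks)
            futs.add(pool.submit(t));

        for (Future<R> f : futs)
            res.add(f.get()); // safe: the caller is outside the pool

        return res;
    }
}
```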





[jira] [Created] (IGNITE-10300) control.sh: incorrect error message after three tries on unsuccessful authorization

2018-11-16 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-10300:
---

 Summary: control.sh: incorrect error message after three tries on 
unsuccessful authorization
 Key: IGNITE-10300
 URL: https://issues.apache.org/jira/browse/IGNITE-10300
 Project: Ignite
  Issue Type: Bug
Reporter: Pavel Voronkin


1. start grid with security enabled
2. try to issue control.sh --cache
authentication credentials are asked
3. enter incorrect credentials three times

expected: Authentication error printed and logged
actual: Latest topology update failed error printed
{noformat}
IGNITE_HOME=`pwd` bin/control.sh --cache list .
Control utility [ver. 2.5.1-p160#20181113-sha1:5f845ca7]
2018 Copyright(C) Apache Software Foundation
User: mshonichev

Authentication error, try connection again.
user: 
password: 
Authentication error, try connection again.
user: 
password: 
Authentication error, try connection again.
user: 
password: 
Authentication error.
Error: Latest topology update failed.

{noformat}





[jira] [Updated] (IGNITE-10300) control.sh: incorrect error message after three tries on unsuccessful authorization.

2018-11-16 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10300:

Summary: control.sh: incorrect error message after three tries on 
unsuccessful authorization.  (was: control.sh: incorrect error message after 
three tries on unsuccessful authorization)

> control.sh: incorrect error message after three tries on unsuccessful 
> authorization.
> 
>
> Key: IGNITE-10300
> URL: https://issues.apache.org/jira/browse/IGNITE-10300
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
>
> 1. start grid with security enabled
> 2. try to issue control.sh --cache
> authentication credentials are asked
> 3. enter incorrect credentials three times
> expected: Authentication error printed and logged
> actual: Latest topology update failed error printed
> {noformat}
> IGNITE_HOME=`pwd` bin/control.sh --cache list .
> Control utility [ver. 2.5.1-p160#20181113-sha1:5f845ca7]
> 2018 Copyright(C) Apache Software Foundation
> User: mshonichev
> 
> Authentication error, try connection again.
> user: 
> password: 
> Authentication error, try connection again.
> user: 
> password: 
> Authentication error, try connection again.
> user: 
> password: 
> Authentication error.
> Error: Latest topology update failed.
> {noformat}





[jira] [Assigned] (IGNITE-10300) control.sh: incorrect error message after three tries on unsuccessful authorization

2018-11-16 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin reassigned IGNITE-10300:
---

Assignee: Pavel Voronkin

> control.sh: incorrect error message after three tries on unsuccessful 
> authorization
> ---
>
> Key: IGNITE-10300
> URL: https://issues.apache.org/jira/browse/IGNITE-10300
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
>
> 1. start grid with security enabled
> 2. try to issue control.sh --cache
> authentication credentials are asked
> 3. enter incorrect credentials three times
> expected: Authentication error printed and logged
> actual: Latest topology update failed error printed
> {noformat}
> IGNITE_HOME=`pwd` bin/control.sh --cache list .
> Control utility [ver. 2.5.1-p160#20181113-sha1:5f845ca7]
> 2018 Copyright(C) Apache Software Foundation
> User: mshonichev
> 
> Authentication error, try connection again.
> user: 
> password: 
> Authentication error, try connection again.
> user: 
> password: 
> Authentication error, try connection again.
> user: 
> password: 
> Authentication error.
> Error: Latest topology update failed.
> {noformat}





[jira] [Commented] (IGNITE-10228) Start multiple caches in parallel may lead to the fact that some of the caches won't be registered.

2018-11-14 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16686396#comment-16686396
 ] 

Pavel Voronkin commented on IGNITE-10228:
-

https://reviews.ignite.apache.org/ignite/review/IGNT-CR-963

> Start multiple caches in parallel may lead to the fact that some of the 
> caches won't be registered.
> ---
>
> Key: IGNITE-10228
> URL: https://issues.apache.org/jira/browse/IGNITE-10228
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.6
>Reporter: Vyacheslav Koptilin
>Assignee: Pavel Voronkin
>Priority: Major
> Fix For: 2.8
>
> Attachments: CacheStartingParallelTest.java
>
>
> It looks like the root cause of the issue is that 
> {{CacheGroupContext.addCacheContext()}} (which is called in parallel) does 
> not use a lock/semaphore in order to synchronize {{caches}} updates.
>  
> {code:java}
> private void addCacheContext(GridCacheContext cctx) {
> ArrayList<GridCacheContext> caches = new ArrayList<>(this.caches);
> boolean add = caches.add(cctx);
> ...
> this.caches = caches;
> }
> {code}
>  
> The possible workaround is to disable parallel start of caches by setting the 
> {{IGNITE_ALLOW_START_CACHES_IN_PARALLEL}} property to {{false}}.





[jira] [Updated] (IGNITE-9356) Ignite rest command http://localhost:8080/ignite?cmd=log&from=n&to=m return more line in linux than windows

2018-11-15 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-9356:
---
Priority: Minor  (was: Major)

> Ignite rest command http://localhost:8080/ignite?cmd=log&from=n&to=m return 
> more line in linux than windows  
> -
>
> Key: IGNITE-9356
> URL: https://issues.apache.org/jira/browse/IGNITE-9356
> Project: Ignite
>  Issue Type: Improvement
>  Components: rest
>Affects Versions: 2.5
> Environment: Centos/ Windows10
>Reporter: ARomantsov
>Priority: Minor
> Fix For: 2.8
>
>
> I ran the cluster in different configurations (CentOS and Windows 10) and 
> noticed that the log command returns a different number of rows for the same 
> from and to.
> The Windows REST endpoint returns one less row.





[jira] [Commented] (IGNITE-10085) Make compressed wal archives user friendly.

2018-11-08 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16681005#comment-16681005
 ] 

Pavel Voronkin commented on IGNITE-10085:
-

Looks good to me

> Make compressed wal archives user friendly.
> ---
>
> Key: IGNITE-10085
> URL: https://issues.apache.org/jira/browse/IGNITE-10085
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Sergey Antonov
>Priority: Minor
>
> Compressed wal archives are created with ZipEntry(""). In some ZIP GUIs 
> those archives are shown as empty, which can really confuse users.





[jira] [Updated] (IGNITE-9793) Deactivation, segmentation of one node, activation may lead to hang activation forever

2018-10-04 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-9793:
---
Description: 
There is a topology with a ring of nodes:

*665(coordinator) -> 601 -> 724 -> 910 -> 655 -> ...* deactivated

*665(coordinator) -> 601 -> {color:#ff}724{color} -> 910 -> 655 -> ...* node failed

*665(coordinator) -> 601 -> 910 -> 655 -> ...* activated

During activation node 910 hasn't received the StateChangedMessage, however 655 
and all subsequent nodes received it and responded to the coordinator.

So the coordinator expects 154 messages but received only 153, which is why 
activation hangs.

Details below:

 

Coordinator deactivated:

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
 2018-09-24 15:09:01.620 
[DEBUG][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.l.ExchangeLatchManager]
 Server latch is created [latch=CompletableLatchUid

{id='exchange', topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]}

, participantsSize=160]
 2018-09-24 15:09:01.621 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%]

nodes 601, 724, 910, 655 were deactivated:

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.328 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=22a58223-47b5-43c2-897b-e70e8e50edf7, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.334 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=973eb8ce-3b8c-463d-a6ab-00ac66d93f13, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.332 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=a904bac4-aaed-4f69-90f3-c13bc4d331d1, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

Spi on coordinator received node 724 failed message:

2018-09-24 15:17:00.220 [WARN 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Node FAILED: TcpDiscoveryNode [id=a904bac4-aaed-4f69-90f3-c13bc4d331d1, 
addrs=ArrayList [10.116.206.98], sockAddrs=HashSet 
[grid724.domain/10.116.206.98:47500], discPort=47500, order=110, intOrder=110, 
lastExchangeTime=1537528210290, loc=false, ver=2.5.1#20180906-sha1:ebde6c79, 
isClient=false]
 2018-09-24 15:17:00.221 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Topology snapshot [ver=187, servers=156, clients=0, CPUs=8736, 
offheap=19.0GB, heap=4800.0GB]
 2018-09-24 15:17:00.221 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 ^-- Node [id=E002E011-8D1C-4353-A0F3-B71264C5B0F4, clusterState=INACTIVE]
 2018-09-24 15:17:00.221 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 ^-- Baseline [id=6, size=160, online=156, offline=4]

topology rolled to version 187, then another node 931 failed:

2018-09-24 15:17:00.466 [WARN 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Node FAILED: TcpDiscoveryNode [id=83536b6d-8aa3-4c85-b3da-5e577ae37ac6, 
addrs=ArrayList [10.116.215.3], sockAddrs=HashSet 
[grid931.ca.sbrf.ru/10.116.215.3:47500], discPort=47500, order=73, intOrder=73, 
lastExchangeTime=1537528186599, loc=false, ver=2.5.1#20180906-sha1:ebde6c79, 
isClient=false]
 2018-09-24 15:17:00.467 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Topology snapshot [ver=188, servers=155, clients=0, CPUs=8680, 
offheap=19.0GB, heap=4800.0GB]
 2018-09-24 15:17:00.467 [INFO 

[jira] [Updated] (IGNITE-9793) Deactivation, segmentation of one node, activation may lead to hang activation forever

2018-10-04 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-9793:
---
Description: 
There is a topology with a ring of nodes:

*665(coordinator) -> 601 -> 724 -> 910 -> 655 -> ...* deactivated

*665(coordinator) -> 601 -> {color:#ff}724{color} -> 910 -> 655 -> ...* node failed

*665(coordinator) -> 601 -> 910 -> 655 -> ...* activated

During activation node 910 hasn't received the StateChangedMessage, however 655 
and all subsequent nodes received it and responded to the coordinator.

So the coordinator expects 154 messages but received only 153, which is why 
activation hangs.

Details below:

 

Coordinator deactivated:

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
 2018-09-24 15:09:01.620 
[DEBUG][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.l.ExchangeLatchManager]
 Server latch is created [latch=CompletableLatchUid

{id='exchange', topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]}

, participantsSize=160]
 2018-09-24 15:09:01.621 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%]

nodes 601, 724, 910, 655 were deactivated:

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.328 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=22a58223-47b5-43c2-897b-e70e8e50edf7, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.334 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=973eb8ce-3b8c-463d-a6ab-00ac66d93f13, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.332 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=a904bac4-aaed-4f69-90f3-c13bc4d331d1, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

Spi on coordinator received node 724 failed message:

2018-09-24 15:17:00.220 [WARN 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Node FAILED: TcpDiscoveryNode [id=a904bac4-aaed-4f69-90f3-c13bc4d331d1, 
addrs=ArrayList [10.116.206.98], sockAddrs=HashSet 
[grid724.domain/10.116.206.98:47500], discPort=47500, order=110, intOrder=110, 
lastExchangeTime=1537528210290, loc=false, ver=2.5.1#20180906-sha1:ebde6c79, 
isClient=false]
 2018-09-24 15:17:00.221 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Topology snapshot [ver=187, servers=156, clients=0, CPUs=8736, 
offheap=19.0GB, heap=4800.0GB]
 2018-09-24 15:17:00.221 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 ^-- Node [id=E002E011-8D1C-4353-A0F3-B71264C5B0F4, clusterState=INACTIVE]
 2018-09-24 15:17:00.221 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 ^-- Baseline [id=6, size=160, online=156, offline=4]

topology rolled to version 187, then another node 931 failed:

2018-09-24 15:17:00.466 [WARN 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Node FAILED: TcpDiscoveryNode [id=83536b6d-8aa3-4c85-b3da-5e577ae37ac6, 
addrs=ArrayList [10.116.215.3], sockAddrs=HashSet 
[grid931.ca.sbrf.ru/10.116.215.3:47500], discPort=47500, order=73, intOrder=73, 
lastExchangeTime=1537528186599, loc=false, ver=2.5.1#20180906-sha1:ebde6c79, 
isClient=false]
 2018-09-24 15:17:00.467 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Topology snapshot [ver=188, servers=155, clients=0, CPUs=8680, 
offheap=19.0GB, heap=4800.0GB]
 2018-09-24 15:17:00.467 [INFO 

[jira] [Updated] (IGNITE-9793) Deactivation, segmentation of one node, activation may lead to hang activation forever

2018-10-04 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-9793:
---
Description: 
There is a topology with a ring of nodes:

*665(coordinator) > 601 > 724 > 910 > 655 > ...* deactivated

*665(coordinator) > 601 > {color:#ff}724{color} > 910 > 655 > ...* node failed

*665(coordinator) > 601 > 910 > 655 > ...* activated

During activation node 910 did not receive the StateChangedMessage; however, 655 
and all subsequent nodes received it and responded to the coordinator.

So the coordinator expects 154 messages but received only 153, which is why 
activation hangs.

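The hang described above behaves like a countdown latch that is never fully counted down: the coordinator sizes the latch to the number of expected acknowledgements, and one node that never received the StateChangedMessage never acks. A minimal, hypothetical Java sketch of that failure mode (class and variable names are illustrative, not Ignite's actual ExchangeLatchManager API):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical simplification of the reported hang: 154 acks are expected,
// node 910 never responds, so only 153 arrive and the latch stays at 1.
public class LatchHangSketch {
    public static void main(String[] args) throws InterruptedException {
        int expectedAcks = 154;  // coordinator waits for 154 StateChanged acks
        int receivedAcks = 153;  // node 910 never sends its ack

        CountDownLatch latch = new CountDownLatch(expectedAcks);

        for (int i = 0; i < receivedAcks; i++)
            latch.countDown();

        // A timeout is used here only to demonstrate the stall; an await()
        // without a timeout would block forever -- the observed hang.
        boolean completed = latch.await(100, TimeUnit.MILLISECONDS);

        System.out.println("completed=" + completed
            + ", remaining=" + latch.getCount());
    }
}
```

Since the real exchange latch has no such timeout path for a participant that silently missed the message, activation waits indefinitely.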
Details below:

 

Coordinator deactivated:

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
 2018-09-24 15:09:01.620 
[DEBUG][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.l.ExchangeLatchManager]
 Server latch is created [latch=CompletableLatchUid

{id='exchange', topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]}

, participantsSize=160]
 2018-09-24 15:09:01.621 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%]

Nodes 601, 724, 910, 655 were deactivated:

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.328 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=22a58223-47b5-43c2-897b-e70e8e50edf7, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.334 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=973eb8ce-3b8c-463d-a6ab-00ac66d93f13, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.332 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=a904bac4-aaed-4f69-90f3-c13bc4d331d1, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

SPI on coordinator received node 724 FAILED message:

2018-09-24 15:17:00.220 [WARN 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Node FAILED: TcpDiscoveryNode [id=a904bac4-aaed-4f69-90f3-c13bc4d331d1, 
addrs=ArrayList [10.116.206.98], sockAddrs=HashSet 
[grid724.domain/10.116.206.98:47500], discPort=47500, order=110, intOrder=110, 
lastExchangeTime=1537528210290, loc=false, ver=2.5.1#20180906-sha1:ebde6c79, 
isClient=false]
 2018-09-24 15:17:00.221 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Topology snapshot [ver=187, servers=156, clients=0, CPUs=8736, 
offheap=19.0GB, heap=4800.0GB]
 2018-09-24 15:17:00.221 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 ^-- Node [id=E002E011-8D1C-4353-A0F3-B71264C5B0F4, clusterState=INACTIVE]
 2018-09-24 15:17:00.221 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 ^-- Baseline [id=6, size=160, online=156, offline=4]

Topology rolled to version 187, then another node, 931, failed:

2018-09-24 15:17:00.466 [WARN 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Node FAILED: TcpDiscoveryNode [id=83536b6d-8aa3-4c85-b3da-5e577ae37ac6, 
addrs=ArrayList [10.116.215.3], sockAddrs=HashSet 
[grid931.ca.sbrf.ru/10.116.215.3:47500], discPort=47500, order=73, intOrder=73, 
lastExchangeTime=1537528186599, loc=false, ver=2.5.1#20180906-sha1:ebde6c79, 
isClient=false]
 2018-09-24 15:17:00.467 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Topology snapshot [ver=188, servers=155, clients=0, CPUs=8680, 
offheap=19.0GB, heap=4800.0GB]
 2018-09-24 15:17:00.467 [INFO 

[jira] [Updated] (IGNITE-9793) Deactivation, segmentation of one node, activation may lead to hang activation forever

2018-10-04 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-9793:
---
Description: 
There is a topology with a ring of nodes:

*665(coordinator) > 601 > 724 > 910 > 655 > ...* deactivated

*665(coordinator) > 601 > {color:#FF}724{color} > 910 > 655 > ...* node failed

*665(coordinator) > 601 > 910 > 655 > ...* activated

During activation node 910 did not receive the StateChangedMessage; however, 655 
and all subsequent nodes received it and responded to the coordinator.

So the coordinator expects 154 messages but received only 153, which is why 
activation hangs.

Details below:

 

Coordinator deactivated:

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
 2018-09-24 15:09:01.620 
[DEBUG][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.l.ExchangeLatchManager]
 Server latch is created [latch=CompletableLatchUid

{id='exchange', topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]}

, participantsSize=160]
 2018-09-24 15:09:01.621 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%]

Nodes 601, 724, 910, 655 were deactivated:

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.328 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=22a58223-47b5-43c2-897b-e70e8e50edf7, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.334 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=973eb8ce-3b8c-463d-a6ab-00ac66d93f13, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.332 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=a904bac4-aaed-4f69-90f3-c13bc4d331d1, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

SPI on coordinator received node 724 FAILED message:

2018-09-24 15:17:00.220 [WARN 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Node FAILED: TcpDiscoveryNode [id=a904bac4-aaed-4f69-90f3-c13bc4d331d1, 
addrs=ArrayList [10.116.206.98], sockAddrs=HashSet 
[grid724.domain/10.116.206.98:47500], discPort=47500, order=110, intOrder=110, 
lastExchangeTime=1537528210290, loc=false, ver=2.5.1#20180906-sha1:ebde6c79, 
isClient=false]
 2018-09-24 15:17:00.221 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Topology snapshot [ver=187, servers=156, clients=0, CPUs=8736, 
offheap=19.0GB, heap=4800.0GB]
 2018-09-24 15:17:00.221 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 ^-- Node [id=E002E011-8D1C-4353-A0F3-B71264C5B0F4, clusterState=INACTIVE]
 2018-09-24 15:17:00.221 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 ^-- Baseline [id=6, size=160, online=156, offline=4]

Topology rolled to version 187, then another node, 931, failed:

2018-09-24 15:17:00.466 [WARN 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Node FAILED: TcpDiscoveryNode [id=83536b6d-8aa3-4c85-b3da-5e577ae37ac6, 
addrs=ArrayList [10.116.215.3], sockAddrs=HashSet 
[grid931.ca.sbrf.ru/10.116.215.3:47500], discPort=47500, order=73, intOrder=73, 
lastExchangeTime=1537528186599, loc=false, ver=2.5.1#20180906-sha1:ebde6c79, 
isClient=false]
 2018-09-24 15:17:00.467 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Topology snapshot [ver=188, servers=155, clients=0, CPUs=8680, 
offheap=19.0GB, heap=4800.0GB]
 2018-09-24 15:17:00.467 [INFO 

[jira] [Updated] (IGNITE-9793) Deactivation, segmentation of one node, activation may lead to hang activation forever

2018-10-04 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-9793:
---
Attachment: 601_gc_server_memory.log.0.current.7z

> Deactivation, segmentation of one node, activation may lead to hang 
> activation forever
> --
>
> Key: IGNITE-9793
> URL: https://issues.apache.org/jira/browse/IGNITE-9793
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.5
>Reporter: Pavel Voronkin
>Priority: Major
> Attachments: 601_gc_server_memory.log.0.current.7z
>
>
> There is a topology with a ring of nodes:
> *665(coordinator) > 601 > 724 > 910 > 655 > ...* deactivated
> *665(coordinator) > 601 > {color:#FF}724{color} > 910 > 655 > ...* node failed
> *665(coordinator) > 601 > 910 > 655 > ...* activated
> During activation node 910 did not receive the StateChangedMessage; however, 655 
> and all subsequent nodes received it and responded to the coordinator.
> So the coordinator expects 154 messages but received only 153, which is why 
> activation hangs.
> Details below:
>  
> Coordinator deactivated:
> 2018-09-24 15:09:01.609 [INFO 
> ][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
>  Successfully deactivated data structures, services and caches 
> [nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
> topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
>  2018-09-24 15:09:01.620 
> [DEBUG][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.l.ExchangeLatchManager]
>  Server latch is created [latch=CompletableLatchUid
> {id='exchange', topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]}
> , participantsSize=160]
>  2018-09-24 15:09:01.621 [INFO 
> ][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%]
> Nodes 601, 724, 910, 655 were deactivated:
> 2018-09-24 15:09:01.609 [INFO 
> ][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
>  Successfully deactivated data structures, services and caches 
> [nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
> topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
> 2018-09-24 15:09:03.328 [INFO 
> ][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
>  Successfully deactivated data structures, services and caches 
> [nodeId=22a58223-47b5-43c2-897b-e70e8e50edf7, client=false, 
> topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
> 2018-09-24 15:09:03.334 [INFO 
> ][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
>  Successfully deactivated data structures, services and caches 
> [nodeId=973eb8ce-3b8c-463d-a6ab-00ac66d93f13, client=false, 
> topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
> 2018-09-24 15:09:03.332 [INFO 
> ][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
>  Successfully deactivated data structures, services and caches 
> [nodeId=a904bac4-aaed-4f69-90f3-c13bc4d331d1, client=false, 
> topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
> SPI on coordinator received node 724 FAILED message:
> 2018-09-24 15:17:00.220 [WARN 
> ][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
>  Node FAILED: TcpDiscoveryNode [id=a904bac4-aaed-4f69-90f3-c13bc4d331d1, 
> addrs=ArrayList [10.116.206.98], sockAddrs=HashSet 
> [grid724.domain/10.116.206.98:47500], discPort=47500, order=110, 
> intOrder=110, lastExchangeTime=1537528210290, loc=false, 
> ver=2.5.1#20180906-sha1:ebde6c79, isClient=false]
>  2018-09-24 15:17:00.221 [INFO 
> ][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
>  Topology snapshot [ver=187, servers=156, clients=0, CPUs=8736, 
> offheap=19.0GB, heap=4800.0GB]
>  2018-09-24 15:17:00.221 [INFO 
> ][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
>  ^-- Node [id=E002E011-8D1C-4353-A0F3-B71264C5B0F4, clusterState=INACTIVE]
>  2018-09-24 15:17:00.221 [INFO 
> ][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
>  ^-- Baseline [id=6, size=160, online=156, offline=4]
> Topology rolled to version 187, then another node, 931, failed:
> 2018-09-24 15:17:00.466 [WARN 
> 

[jira] [Updated] (IGNITE-9793) Deactivation, segmentation of one node, activation may lead to hang activation forever

2018-10-04 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-9793:
---
Attachment: 724_gc_server_memory.log.0.current.7z

> Deactivation, segmentation of one node, activation may lead to hang 
> activation forever
> --
>
> Key: IGNITE-9793
> URL: https://issues.apache.org/jira/browse/IGNITE-9793
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.5
>Reporter: Pavel Voronkin
>Priority: Major
> Attachments: 601_gc_server_memory.log.0.current.7z, 
> 724_gc_server_memory.log.0.current.7z
>
>
> There is a topology with a ring of nodes:
> *665(coordinator) > 601 > 724 > 910 > 655 > ...* deactivated
> *665(coordinator) > 601 > {color:#FF}724{color} > 910 > 655 > ...* node failed
> *665(coordinator) > 601 > 910 > 655 > ...* activated
> During activation node 910 did not receive the StateChangedMessage; however, 655 
> and all subsequent nodes received it and responded to the coordinator.
> So the coordinator expects 154 messages but received only 153, which is why 
> activation hangs.
> Details below:
>  
> Coordinator deactivated:
> 2018-09-24 15:09:01.609 [INFO 
> ][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
>  Successfully deactivated data structures, services and caches 
> [nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
> topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
>  2018-09-24 15:09:01.620 
> [DEBUG][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.l.ExchangeLatchManager]
>  Server latch is created [latch=CompletableLatchUid
> {id='exchange', topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]}
> , participantsSize=160]
>  2018-09-24 15:09:01.621 [INFO 
> ][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%]
> Nodes 601, 724, 910, 655 were deactivated:
> 2018-09-24 15:09:01.609 [INFO 
> ][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
>  Successfully deactivated data structures, services and caches 
> [nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
> topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
> 2018-09-24 15:09:03.328 [INFO 
> ][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
>  Successfully deactivated data structures, services and caches 
> [nodeId=22a58223-47b5-43c2-897b-e70e8e50edf7, client=false, 
> topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
> 2018-09-24 15:09:03.334 [INFO 
> ][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
>  Successfully deactivated data structures, services and caches 
> [nodeId=973eb8ce-3b8c-463d-a6ab-00ac66d93f13, client=false, 
> topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
> 2018-09-24 15:09:03.332 [INFO 
> ][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
>  Successfully deactivated data structures, services and caches 
> [nodeId=a904bac4-aaed-4f69-90f3-c13bc4d331d1, client=false, 
> topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
> SPI on coordinator received node 724 FAILED message:
> 2018-09-24 15:17:00.220 [WARN 
> ][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
>  Node FAILED: TcpDiscoveryNode [id=a904bac4-aaed-4f69-90f3-c13bc4d331d1, 
> addrs=ArrayList [10.116.206.98], sockAddrs=HashSet 
> [grid724.domain/10.116.206.98:47500], discPort=47500, order=110, 
> intOrder=110, lastExchangeTime=1537528210290, loc=false, 
> ver=2.5.1#20180906-sha1:ebde6c79, isClient=false]
>  2018-09-24 15:17:00.221 [INFO 
> ][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
>  Topology snapshot [ver=187, servers=156, clients=0, CPUs=8736, 
> offheap=19.0GB, heap=4800.0GB]
>  2018-09-24 15:17:00.221 [INFO 
> ][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
>  ^-- Node [id=E002E011-8D1C-4353-A0F3-B71264C5B0F4, clusterState=INACTIVE]
>  2018-09-24 15:17:00.221 [INFO 
> ][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
>  ^-- Baseline [id=6, size=160, online=156, offline=4]
> Topology rolled to version 187, then another node, 931, failed:
> 2018-09-24 15:17:00.466 [WARN 
> 

[jira] [Updated] (IGNITE-9793) Deactivation, segmentation of one node, activation may lead to hang activation forever

2018-10-04 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-9793:
---
Attachment: 910_gc_server_memory.log.0.current.7z

> Deactivation, segmentation of one node, activation may lead to hang 
> activation forever
> --
>
> Key: IGNITE-9793
> URL: https://issues.apache.org/jira/browse/IGNITE-9793
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.5
>Reporter: Pavel Voronkin
>Priority: Major
> Attachments: 601_gc_server_memory.log.0.current.7z, 
> 724_gc_server_memory.log.0.current.7z, 910_gc_server_memory.log.0.current.7z
>
>
> There is a topology with a ring of nodes:
> *665(coordinator) > 601 > 724 > 910 > 655 > ...* deactivated
> *665(coordinator) > 601 > {color:#FF}724{color} > 910 > 655 > ...* node failed
> *665(coordinator) > 601 > 910 > 655 > ...* activated
> During activation node 910 did not receive the StateChangedMessage; however, 655 
> and all subsequent nodes received it and responded to the coordinator.
> So the coordinator expects 154 messages but received only 153, which is why 
> activation hangs.
> Details below:
>  
> Coordinator deactivated:
> 2018-09-24 15:09:01.609 [INFO 
> ][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
>  Successfully deactivated data structures, services and caches 
> [nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
> topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
>  2018-09-24 15:09:01.620 
> [DEBUG][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.l.ExchangeLatchManager]
>  Server latch is created [latch=CompletableLatchUid
> {id='exchange', topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]}
> , participantsSize=160]
>  2018-09-24 15:09:01.621 [INFO 
> ][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%]
> Nodes 601, 724, 910, 655 were deactivated:
> 2018-09-24 15:09:01.609 [INFO 
> ][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
>  Successfully deactivated data structures, services and caches 
> [nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
> topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
> 2018-09-24 15:09:03.328 [INFO 
> ][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
>  Successfully deactivated data structures, services and caches 
> [nodeId=22a58223-47b5-43c2-897b-e70e8e50edf7, client=false, 
> topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
> 2018-09-24 15:09:03.334 [INFO 
> ][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
>  Successfully deactivated data structures, services and caches 
> [nodeId=973eb8ce-3b8c-463d-a6ab-00ac66d93f13, client=false, 
> topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
> 2018-09-24 15:09:03.332 [INFO 
> ][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
>  Successfully deactivated data structures, services and caches 
> [nodeId=a904bac4-aaed-4f69-90f3-c13bc4d331d1, client=false, 
> topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
> SPI on coordinator received node 724 FAILED message:
> 2018-09-24 15:17:00.220 [WARN 
> ][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
>  Node FAILED: TcpDiscoveryNode [id=a904bac4-aaed-4f69-90f3-c13bc4d331d1, 
> addrs=ArrayList [10.116.206.98], sockAddrs=HashSet 
> [grid724.domain/10.116.206.98:47500], discPort=47500, order=110, 
> intOrder=110, lastExchangeTime=1537528210290, loc=false, 
> ver=2.5.1#20180906-sha1:ebde6c79, isClient=false]
>  2018-09-24 15:17:00.221 [INFO 
> ][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
>  Topology snapshot [ver=187, servers=156, clients=0, CPUs=8736, 
> offheap=19.0GB, heap=4800.0GB]
>  2018-09-24 15:17:00.221 [INFO 
> ][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
>  ^-- Node [id=E002E011-8D1C-4353-A0F3-B71264C5B0F4, clusterState=INACTIVE]
>  2018-09-24 15:17:00.221 [INFO 
> ][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
>  ^-- Baseline [id=6, size=160, online=156, offline=4]
> Topology rolled to version 187, then another node, 931, failed:
> 2018-09-24 15:17:00.466 [WARN 
> 

[jira] [Updated] (IGNITE-9793) Deactivation, segmentation of one node, activation may lead to hang activation forever

2018-10-04 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-9793:
---
Description: 
There is a topology with a ring of nodes:

*665(coordinator) > 601 > 724 > 910 > 655 > ...* deactivated

*665(coordinator) > 601 > {color:#ff}724{color} > 910 > 655 > ...* node failed

*665(coordinator) > 601 > 910 > 655 > ...* activated

During activation node 910 did not receive the StateChangedMessage; however, 655 
and all subsequent nodes received it and responded to the coordinator.

So the coordinator expects 154 messages but received only 153, which is why 
activation hangs.

Details below:

*Coordinator deactivated:*

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
 2018-09-24 15:09:01.620 
[DEBUG][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.l.ExchangeLatchManager]
 Server latch is created [latch=CompletableLatchUid

{id='exchange', topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]}

, participantsSize=160]
 2018-09-24 15:09:01.621 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%]

*Nodes 601, 724, 910, 655 were deactivated:*

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.328 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=22a58223-47b5-43c2-897b-e70e8e50edf7, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.334 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=973eb8ce-3b8c-463d-a6ab-00ac66d93f13, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.332 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=a904bac4-aaed-4f69-90f3-c13bc4d331d1, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

*SPI on coordinator received node 724 FAILED message:*

2018-09-24 15:17:00.220 [WARN 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Node FAILED: TcpDiscoveryNode [id=a904bac4-aaed-4f69-90f3-c13bc4d331d1, 
addrs=ArrayList [10.116.206.98], sockAddrs=HashSet 
[grid724.domain/10.116.206.98:47500], discPort=47500, order=110, intOrder=110, 
lastExchangeTime=1537528210290, loc=false, ver=2.5.1#20180906-sha1:ebde6c79, 
isClient=false]
 2018-09-24 15:17:00.221 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Topology snapshot [ver=187, servers=156, clients=0, CPUs=8736, 
offheap=19.0GB, heap=4800.0GB]
 2018-09-24 15:17:00.221 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 ^-- Node [id=E002E011-8D1C-4353-A0F3-B71264C5B0F4, clusterState=INACTIVE]
 2018-09-24 15:17:00.221 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 ^-- Baseline [id=6, size=160, online=156, offline=4]

*Topology rolled to version 187, then another node, 931, failed:*

2018-09-24 15:17:00.466 [WARN 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Node FAILED: TcpDiscoveryNode [id=83536b6d-8aa3-4c85-b3da-5e577ae37ac6, 
addrs=ArrayList [10.116.215.3], sockAddrs=HashSet 
[grid931.ca.sbrf.ru/10.116.215.3:47500], discPort=47500, order=73, intOrder=73, 
lastExchangeTime=1537528186599, loc=false, ver=2.5.1#20180906-sha1:ebde6c79, 
isClient=false]
 2018-09-24 15:17:00.467 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Topology snapshot [ver=188, servers=155, clients=0, CPUs=8680, 
offheap=19.0GB, heap=4800.0GB]
 2018-09-24 15:17:00.467 [INFO 

[jira] [Updated] (IGNITE-9793) Deactivation, segmentation of one node, activation may lead to hang activation forever

2018-10-04 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-9793:
---
Description: 
There is a topology with a ring of nodes:

*665(coordinator) > 601 > 724 > 910 > 655 > ...* deactivated

*665(coordinator) > 601 > {color:#ff}724{color} > 910 > 655 > ...* node failed

*665(coordinator) > 601 > 910 > 655 > ...* activated

During activation node 910 did not receive the StateChangedMessage; however, 655 
and all subsequent nodes received it and responded to the coordinator.

So the coordinator expects 154 messages but received only 153, which is why 
activation hangs.

Details below:

*Coordinator deactivated:*

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
 2018-09-24 15:09:01.620 
[DEBUG][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.l.ExchangeLatchManager]
 Server latch is created [latch=CompletableLatchUid

{id='exchange', topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]}

, participantsSize=160]
 2018-09-24 15:09:01.621 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%]

*Nodes 601, 724, 910, 655 were deactivated:*

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.328 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=22a58223-47b5-43c2-897b-e70e8e50edf7, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.334 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=973eb8ce-3b8c-463d-a6ab-00ac66d93f13, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.332 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=a904bac4-aaed-4f69-90f3-c13bc4d331d1, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

*Spi on coordinator received node 724 failed message:*

2018-09-24 15:17:00.220 [WARN 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Node FAILED: TcpDiscoveryNode [id=a904bac4-aaed-4f69-90f3-c13bc4d331d1, 
addrs=ArrayList [10.116.206.98], sockAddrs=HashSet 
[grid724.domain/10.116.206.98:47500], discPort=47500, order=110, intOrder=110, 
lastExchangeTime=1537528210290, loc=false, ver=2.5.1#20180906-sha1:ebde6c79, 
isClient=false]
 2018-09-24 15:17:00.221 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Topology snapshot [ver=187, servers=156, clients=0, CPUs=8736, 
offheap=19.0GB, heap=4800.0GB]
 2018-09-24 15:17:00.221 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 ^-- Node [id=E002E011-8D1C-4353-A0F3-B71264C5B0F4, clusterState=INACTIVE]
 2018-09-24 15:17:00.221 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 ^-- Baseline [id=6, size=160, online=156, offline=4]

*topology rolled to version 187, then another node 931 failed:*

2018-09-24 15:17:00.466 [WARN 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Node FAILED: TcpDiscoveryNode [id=83536b6d-8aa3-4c85-b3da-5e577ae37ac6, 
addrs=ArrayList [10.116.215.3], sockAddrs=HashSet 
[grid931.ca.sbrf.ru/10.116.215.3:47500], discPort=47500, order=73, intOrder=73, 
lastExchangeTime=1537528186599, loc=false, ver=2.5.1#20180906-sha1:ebde6c79, 
isClient=false]
 2018-09-24 15:17:00.467 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Topology snapshot [ver=188, servers=155, clients=0, CPUs=8680, 
offheap=19.0GB, heap=4800.0GB]
 2018-09-24 15:17:00.467 [INFO 

[jira] [Updated] (IGNITE-9793) Deactivation, segmentation of one node, activation may lead to hang activation forever

2018-10-04 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-9793:
---
Description: 
There is coordinator and ring of nodes

coordinator -> 1 -> 2 -> 3 -> 4

coordinator deactivated:

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
 2018-09-24 15:09:01.620 
[DEBUG][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.l.ExchangeLatchManager]
 Server latch is created [latch=CompletableLatchUid{id='exchange', 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]}, 
participantsSize=160]
 2018-09-24 15:09:01.621 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%]

nodes 1, 2, 3, 4 were deactivated:

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.328 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=22a58223-47b5-43c2-897b-e70e8e50edf7, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.334 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=973eb8ce-3b8c-463d-a6ab-00ac66d93f13, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.332 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=a904bac4-aaed-4f69-90f3-c13bc4d331d1, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

Node 2 SEGMENTED

2018-09-24 15:17:50.068 [WARN 
][tcp-disco-msg-worker-#2%DPL_GRID%DplGridNodeName%|#2%DPL_GRID%DplGridNodeName%][o.a.i.s.d.tcp.TcpDiscoverySpi]
 Node is out of topology (probably, due to short-time network problems).
 2018-09-24 15:17:50.069 [WARN 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Local node SEGMENTED: TcpDiscoveryNode 
[id=a904bac4-aaed-4f69-90f3-c13bc4d331d1, addrs=ArrayList [10.116.206.98], 
sockAddrs=HashSet [grid724.domain/10.116.206.98:47500], discPort=47500, 
order=110, intOrder=110, lastExchangeTime=1537791470063, loc=true, 
ver=2.5.1#20180906-sha1:ebde6c79, isClient=false]

Coordinator started activation on topology without node 2.

2018-09-24 15:19:48.686 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Start activation process [nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, 
client=false, topVer=AffinityTopologyVersion [topVer=188, minorTopVer=1]]

But node 3, which is next to node 2, did not receive the activation message.

Coordinator sent activation to all except 3.

2018-09-24 15:24:25.911 [INFO 
][sys-#28144%DPL_GRID%DplGridNodeName%|#28144%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Coordinator received single message [ver=AffinityTopologyVersion [topVer=188, 
minorTopVer=1], node=073f1598-6b70-49df-8f45-126735611775, allReceived=false]

GridDhtPartitionsExchangeFuture hangs forever.
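The hang can be modeled with a plain countdown latch. This is a simplified sketch of the counting behavior only (Ignite's real ExchangeLatchManager and CompletableLatch are more involved; the class name LatchHangSketch and the hard-coded counts are illustrative, taken from the 154 vs. 153 figures above): the coordinator waits for one ack per expected participant, so a single node that never receives the StateChangedMessage leaves the wait incomplete.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchHangSketch {
    public static void main(String[] args) throws InterruptedException {
        int expected = 154;                    // acks the coordinator waits for
        CountDownLatch latch = new CountDownLatch(expected);

        // Only 153 nodes received StateChangedMessage and acknowledged it.
        for (int i = 0; i < 153; i++)
            latch.countDown();

        // One ack is missing, so the coordinator's wait can never complete.
        boolean completed = latch.await(100, TimeUnit.MILLISECONDS);
        System.out.println(completed);         // prints "false"
    }
}
```

In the real exchange there is no timeout on this wait, which is why the future hangs forever instead of failing fast.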

 

 

 

  was:
There is coordinator and ring of nodes

coordinator -> 1 -> 2 -> 3 -> 4

coordinator deactivated:

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
2018-09-24 15:09:01.620 
[DEBUG][exchange-worker-#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.l.ExchangeLatchManager]
 Server latch is created [latch=CompletableLatchUid{id='exchange', 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]}, 
participantsSize=160]
2018-09-24 15:09:01.621 [INFO ][exchange-worker-#153%DPL_GRID%DplGridNodeName%]

nodes 1, 2, 3, 4 were deactivated:

2018-09-24 15:09:01.609 [INFO 

[jira] [Created] (IGNITE-9793) Deactivation, segmentation of one node, activation may lead to hang activation forever

2018-10-04 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-9793:
--

 Summary: Deactivation, segmentation of one node, activation may 
lead to hang activation forever
 Key: IGNITE-9793
 URL: https://issues.apache.org/jira/browse/IGNITE-9793
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.5
Reporter: Pavel Voronkin


There is coordinator and ring of nodes

coordinator -> 1 -> 2 -> 3 -> 4

coordinator deactivated:

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
2018-09-24 15:09:01.620 
[DEBUG][exchange-worker-#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.l.ExchangeLatchManager]
 Server latch is created [latch=CompletableLatchUid{id='exchange', 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]}, 
participantsSize=160]
2018-09-24 15:09:01.621 [INFO ][exchange-worker-#153%DPL_GRID%DplGridNodeName%]

nodes 1, 2, 3, 4 were deactivated:

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.328 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=22a58223-47b5-43c2-897b-e70e8e50edf7, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.334 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=973eb8ce-3b8c-463d-a6ab-00ac66d93f13, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.332 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=a904bac4-aaed-4f69-90f3-c13bc4d331d1, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

Node 2 SEGMENTED

2018-09-24 15:17:50.068 [WARN 
][tcp-disco-msg-worker-#2%DPL_GRID%DplGridNodeName%][o.a.i.s.d.tcp.TcpDiscoverySpi]
 Node is out of topology (probably, due to short-time network problems).
2018-09-24 15:17:50.069 [WARN 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Local node SEGMENTED: TcpDiscoveryNode 
[id=a904bac4-aaed-4f69-90f3-c13bc4d331d1, addrs=ArrayList [10.116.206.98], 
sockAddrs=HashSet [grid724.domain/10.116.206.98:47500], discPort=47500, 
order=110, intOrder=110, lastExchangeTime=1537791470063, loc=true, 
ver=2.5.1#20180906-sha1:ebde6c79, isClient=false]

Coordinator started activation on topology without node 2.

2018-09-24 15:19:48.686 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Start activation process [nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, 
client=false, topVer=AffinityTopologyVersion [topVer=188, minorTopVer=1]]

But node 3, which is next to node 2, did not receive the activation message.

Coordinator sent activation to all except 3.

2018-09-24 15:24:25.911 [INFO 
][sys-#28144%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Coordinator received single message [ver=AffinityTopologyVersion [topVer=188, 
minorTopVer=1], node=073f1598-6b70-49df-8f45-126735611775, allReceived=false]

GridDhtPartitionsExchangeFuture hangs forever.

 

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-9793) Deactivation, segmentation of one node, activation may lead to hang activation forever

2018-10-04 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-9793:
---
Description: 
There is coordinator and ring of nodes

coordinator -> 1 -> 2 -> 3 -> 4

coordinator deactivated:

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
 2018-09-24 15:09:01.620 
[DEBUG][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.l.ExchangeLatchManager]
 Server latch is created [latch=CompletableLatchUid{id='exchange', topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]}, participantsSize=160]
 2018-09-24 15:09:01.621 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%]

nodes 1, 2, 3, 4 were deactivated:

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.328 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=22a58223-47b5-43c2-897b-e70e8e50edf7, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.334 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=973eb8ce-3b8c-463d-a6ab-00ac66d93f13, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.332 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=a904bac4-aaed-4f69-90f3-c13bc4d331d1, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

Node 2 SEGMENTED

2018-09-24 15:17:50.068 [WARN 
][tcp-disco-msg-worker-#2%DPL_GRID%DplGridNodeName%|#2%DPL_GRID%DplGridNodeName%][o.a.i.s.d.tcp.TcpDiscoverySpi]
 Node is out of topology (probably, due to short-time network problems).
 2018-09-24 15:17:50.069 [WARN 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Local node SEGMENTED: TcpDiscoveryNode 
[id=a904bac4-aaed-4f69-90f3-c13bc4d331d1, addrs=ArrayList [10.116.206.98], 
sockAddrs=HashSet [grid724.domain/10.116.206.98:47500], discPort=47500, 
order=110, intOrder=110, lastExchangeTime=1537791470063, loc=true, 
ver=2.5.1#20180906-sha1:ebde6c79, isClient=false]

Coordinator started activation on topology without node 2.

2018-09-24 15:19:48.686 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Start activation process [nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, 
client=false, topVer=AffinityTopologyVersion [topVer=188, minorTopVer=1]]

But node 3, which is next to node 2, did not receive the activation message.

Coordinator sent activation to all except 3.

2018-09-24 15:24:25.911 [INFO 
][sys-#28144%DPL_GRID%DplGridNodeName%|#28144%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Coordinator received single message [ver=AffinityTopologyVersion [topVer=188, 
minorTopVer=1], node=073f1598-6b70-49df-8f45-126735611775, allReceived=false]

GridDhtPartitionsExchangeFuture hangs forever.

 

 

  was:
There is coordinator and ring of nodes

coordinator -> 1 -> 2 -> 3 -> 4

coordinator deactivated:

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
 2018-09-24 15:09:01.620 
[DEBUG][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.l.ExchangeLatchManager]
 Server latch is created [latch=CompletableLatchUid{id='exchange', 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]}, 
participantsSize=160]
 2018-09-24 15:09:01.621 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%]

nodes 1, 2, 3, 4 were deactivated:

2018-09-24 

[jira] [Updated] (IGNITE-9793) Deactivation, segmentation of one node, activation may lead to hang activation forever

2018-10-04 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-9793:
---
Description: 
There is coordinator and ring of nodes

665(coordinator) -> 601 -> 724 -> 910 -> 655

coordinator deactivated:

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
 2018-09-24 15:09:01.620 
[DEBUG][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.l.ExchangeLatchManager]
 Server latch is created [latch=CompletableLatchUid{id='exchange', topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]}, participantsSize=160]
 2018-09-24 15:09:01.621 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%]

nodes 601, 724, 910, 655 were deactivated:

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.328 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=22a58223-47b5-43c2-897b-e70e8e50edf7, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.334 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=973eb8ce-3b8c-463d-a6ab-00ac66d93f13, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.332 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=a904bac4-aaed-4f69-90f3-c13bc4d331d1, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

Spi on coordinator received node 724 failed message:

2018-09-24 15:17:00.220 [WARN 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Node FAILED: TcpDiscoveryNode [id=a904bac4-aaed-4f69-90f3-c13bc4d331d1, 
addrs=ArrayList [10.116.206.98], sockAddrs=HashSet 
[grid724.domain/10.116.206.98:47500], discPort=47500, order=110, intOrder=110, 
lastExchangeTime=1537528210290, loc=false, ver=2.5.1#20180906-sha1:ebde6c79, 
isClient=false]
2018-09-24 15:17:00.221 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Topology snapshot [ver=187, servers=156, clients=0, CPUs=8736, 
offheap=19.0GB, heap=4800.0GB]
2018-09-24 15:17:00.221 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 ^-- Node [id=E002E011-8D1C-4353-A0F3-B71264C5B0F4, clusterState=INACTIVE]
2018-09-24 15:17:00.221 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 ^-- Baseline [id=6, size=160, online=156, offline=4] 

topology rolled to version 187, then another node 931 failed:

2018-09-24 15:17:00.466 [WARN 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Node FAILED: TcpDiscoveryNode [id=83536b6d-8aa3-4c85-b3da-5e577ae37ac6, 
addrs=ArrayList [10.116.215.3], sockAddrs=HashSet 
[grid931.ca.sbrf.ru/10.116.215.3:47500], discPort=47500, order=73, intOrder=73, 
lastExchangeTime=1537528186599, loc=false, ver=2.5.1#20180906-sha1:ebde6c79, 
isClient=false]
2018-09-24 15:17:00.467 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Topology snapshot [ver=188, servers=155, clients=0, CPUs=8680, 
offheap=19.0GB, heap=4800.0GB]
2018-09-24 15:17:00.467 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 ^-- Node [id=E002E011-8D1C-4353-A0F3-B71264C5B0F4, clusterState=INACTIVE]
2018-09-24 15:17:00.467 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 ^-- Baseline [id=6, size=160, online=155, offline=5]

topology rolled to version 188.

Node 724 SEGMENTED before activation starts:

2018-09-24 15:17:50.068 [WARN 
][tcp-disco-msg-worker-#2%DPL_GRID%DplGridNodeName%|#2%DPL_GRID%DplGridNodeName%][o.a.i.s.d.tcp.TcpDiscoverySpi]
 Node is out of topology (probably, due to short-time network problems).
 2018-09-24 15:17:50.069 [WARN 

[jira] [Updated] (IGNITE-9793) Deactivation, segmentation of one node, activation may lead to hang activation forever

2018-10-04 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-9793:
---
Description: 
There is coordinator and ring of nodes

coordinator -> 1 -> 2 -> 3 -> 4

coordinator deactivated:

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
 2018-09-24 15:09:01.620 
[DEBUG][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.l.ExchangeLatchManager]
 Server latch is created [latch=CompletableLatchUid{id='exchange', topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]}, participantsSize=160]
 2018-09-24 15:09:01.621 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%]

nodes 1, 2, 3, 4 were deactivated:

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.328 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=22a58223-47b5-43c2-897b-e70e8e50edf7, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.334 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=973eb8ce-3b8c-463d-a6ab-00ac66d93f13, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.332 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=a904bac4-aaed-4f69-90f3-c13bc4d331d1, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

Node 2 SEGMENTED

2018-09-24 15:17:50.068 [WARN 
][tcp-disco-msg-worker-#2%DPL_GRID%DplGridNodeName%|#2%DPL_GRID%DplGridNodeName%][o.a.i.s.d.tcp.TcpDiscoverySpi]
 Node is out of topology (probably, due to short-time network problems).
 2018-09-24 15:17:50.069 [WARN 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Local node SEGMENTED: TcpDiscoveryNode 
[id=a904bac4-aaed-4f69-90f3-c13bc4d331d1, addrs=ArrayList [10.116.206.98], 
sockAddrs=HashSet [grid724.domain/10.116.206.98:47500], discPort=47500, 
order=110, intOrder=110, lastExchangeTime=1537791470063, loc=true, 
ver=2.5.1#20180906-sha1:ebde6c79, isClient=false]

Coordinator started activation on topology without node 2.

2018-09-24 15:19:48.686 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Start activation process [nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, 
client=false, topVer=AffinityTopologyVersion [topVer=188, minorTopVer=1]]

But node 3, which is next to node 2, did not receive the activation message.

Coordinator sent activation to all except 3.

2018-09-24 15:24:25.911 [INFO 
][sys-#28144%DPL_GRID%DplGridNodeName%|#28144%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Coordinator received single message [ver=AffinityTopologyVersion [topVer=188, 
minorTopVer=1], node=073f1598-6b70-49df-8f45-126735611775, allReceived=false]

GridDhtPartitionsExchangeFuture hangs forever.

 So one node in the ring missed the message; however, all other nodes in the 
topology got it. How is that possible?
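One mechanism that can produce exactly this picture: in ring-based discovery, the sender forwards a message past nodes it considers unreachable, so downstream nodes still receive it. A minimal sketch under that assumption (the `deliver` helper and the skip logic are hypothetical illustrations, not Ignite's TcpDiscovery API): if 910 were also treated as unreachable during forwarding, it would be skipped while 655 still gets the message and acks it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

public class RingForwardSketch {
    // Hypothetical model of ring delivery: skip nodes the sender believes
    // are dead and forward the message to each skipped node's successor.
    static List<String> deliver(List<String> ring, Set<String> unreachable) {
        List<String> received = new ArrayList<>();
        for (String node : ring) {
            if (unreachable.contains(node))
                continue;                 // message is forwarded past this node
            received.add(node);           // node processes the message and acks
        }
        return received;
    }

    public static void main(String[] args) {
        List<String> ring = Arrays.asList("601", "724", "910", "655");
        // 724 failed; if 910 is also (wrongly) skipped, 655 still receives it.
        System.out.println(deliver(ring, Set.of("724", "910")));  // prints "[601, 655]"
    }
}
```

Under this model the coordinator would count acks from every node except the skipped one, matching the 154-expected / 153-received counts in the logs.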

 

  was:
There is coordinator and ring of nodes

coordinator -> 1 -> 2 -> 3 -> 4

coordinator deactivated:

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
 2018-09-24 15:09:01.620 
[DEBUG][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.l.ExchangeLatchManager]
 Server latch is created [latch=CompletableLatchUid{id='exchange', topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]}, participantsSize=160]
 2018-09-24 15:09:01.621 [INFO 

[jira] [Updated] (IGNITE-9793) Deactivation, segmentation of one node, activation may lead to hang activation forever

2018-10-04 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-9793:
---
Description: 
There is topology with ring of nodes:

*665(coordinator) -> 601 -> 724 -> 910 -> 655 -> ...* deactivated

*665(coordinator) -> 601 -> {color:#FF}724{color} -> 910 -> 655 -> ...* node failed

*665(coordinator) -> 601 -> 910 -> 655 -> ...* activated

During activation node 910 did not receive the StateChangedMessage; however, 655 
and all subsequent nodes received it and responded to the coordinator.

So the coordinator expected 154 messages but received only 153, which is why 
activation hangs.

Details below:

 

Coordinator deactivated:

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
 2018-09-24 15:09:01.620 
[DEBUG][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.l.ExchangeLatchManager]
 Server latch is created [latch=CompletableLatchUid{id='exchange', topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]}, participantsSize=160]
 2018-09-24 15:09:01.621 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%]

nodes 601, 724, 910, 655 were deactivated:

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.328 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=22a58223-47b5-43c2-897b-e70e8e50edf7, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.334 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=973eb8ce-3b8c-463d-a6ab-00ac66d93f13, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.332 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=a904bac4-aaed-4f69-90f3-c13bc4d331d1, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

Spi on coordinator received node 724 failed message:

2018-09-24 15:17:00.220 [WARN 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Node FAILED: TcpDiscoveryNode [id=a904bac4-aaed-4f69-90f3-c13bc4d331d1, 
addrs=ArrayList [10.116.206.98], sockAddrs=HashSet 
[grid724.domain/10.116.206.98:47500], discPort=47500, order=110, intOrder=110, 
lastExchangeTime=1537528210290, loc=false, ver=2.5.1#20180906-sha1:ebde6c79, 
isClient=false]
 2018-09-24 15:17:00.221 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Topology snapshot [ver=187, servers=156, clients=0, CPUs=8736, 
offheap=19.0GB, heap=4800.0GB]
 2018-09-24 15:17:00.221 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 ^-- Node [id=E002E011-8D1C-4353-A0F3-B71264C5B0F4, clusterState=INACTIVE]
 2018-09-24 15:17:00.221 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 ^-- Baseline [id=6, size=160, online=156, offline=4]

topology rolled to version 187, then another node 931 failed:

2018-09-24 15:17:00.466 [WARN 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Node FAILED: TcpDiscoveryNode [id=83536b6d-8aa3-4c85-b3da-5e577ae37ac6, 
addrs=ArrayList [10.116.215.3], sockAddrs=HashSet 
[grid931.ca.sbrf.ru/10.116.215.3:47500], discPort=47500, order=73, intOrder=73, 
lastExchangeTime=1537528186599, loc=false, ver=2.5.1#20180906-sha1:ebde6c79, 
isClient=false]
 2018-09-24 15:17:00.467 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Topology snapshot [ver=188, servers=155, clients=0, CPUs=8680, 
offheap=19.0GB, heap=4800.0GB]
 2018-09-24 15:17:00.467 [INFO 

[jira] [Updated] (IGNITE-9793) Deactivation, segmentation of one node, activation may lead to hang activation forever

2018-10-04 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-9793:
---
Description: 
There is a topology with a ring of nodes:

*665(coordinator) > 601 > 724 > 910 > 655 > ...* deactivated

*665(coordinator)* *> 601 > {color:#ff}724{color} > 910 > 655 > ...* 
node failed

*665(coordinator) > 601 > 910 > 655 > ...* activated

During activation, node 910 did not receive the StateChangedMessage; however, 
node 655 and all subsequent nodes received it and responded to the coordinator.

So the coordinator expected 154 messages but received only 153, which is why 
the activation hangs.
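The counting that hangs here can be sketched as a minimal server-side latch. This is illustrative only, not Ignite's actual ExchangeLatchManager: the latch completes only once every expected participant has acked, so a participant that never receives the message and is never dropped from the pending set leaves the latch waiting forever.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Minimal sketch of a server-side exchange latch; names and structure are
// illustrative, not Ignite's actual ExchangeLatchManager implementation.
public class ServerLatchSketch {
    private final Set<UUID> pending;

    public ServerLatchSketch(Set<UUID> participants) {
        // The latch starts with one pending ack per expected participant.
        this.pending = new HashSet<>(participants);
    }

    /** Registers an ack from a node; returns true once all acks have arrived. */
    public synchronized boolean ack(UUID nodeId) {
        pending.remove(nodeId);
        return pending.isEmpty();
    }

    /**
     * A node that failed before acking must also be dropped from the pending
     * set; if it is not (the situation described in this issue), the latch
     * never completes.
     */
    public synchronized boolean onNodeFailed(UUID nodeId) {
        return ack(nodeId);
    }

    public synchronized int pendingCount() {
        return pending.size();
    }
}
```

In this sketch, the hang corresponds to a node that neither acks nor gets onNodeFailed() called for it: pendingCount() stays above zero indefinitely.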

Details below:

*Coordinator deactivated:*

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]
 2018-09-24 15:09:01.620 
[DEBUG][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.l.ExchangeLatchManager]
 Server latch is created [latch=CompletableLatchUid

{id='exchange', topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]}

, participantsSize=160]
 2018-09-24 15:09:01.621 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%]

*nodes 601, 724, 910, 655 were deactivated:*

2018-09-24 15:09:01.609 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=e002e011-8d1c-4353-a0f3-b71264c5b0f4, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.328 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=22a58223-47b5-43c2-897b-e70e8e50edf7, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.334 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=973eb8ce-3b8c-463d-a6ab-00ac66d93f13, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

2018-09-24 15:09:03.332 [INFO 
][exchange-worker-#153%DPL_GRID%DplGridNodeName%|#153%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
 Successfully deactivated data structures, services and caches 
[nodeId=a904bac4-aaed-4f69-90f3-c13bc4d331d1, client=false, 
topVer=AffinityTopologyVersion [topVer=183, minorTopVer=1]]

*Spi on coordinator received node 724 failed message:*

2018-09-24 15:17:00.220 [WARN 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Node FAILED: TcpDiscoveryNode [id=a904bac4-aaed-4f69-90f3-c13bc4d331d1, 
addrs=ArrayList [10.116.206.98], sockAddrs=HashSet 
[grid724.domain/10.116.206.98:47500], discPort=47500, order=110, intOrder=110, 
lastExchangeTime=1537528210290, loc=false, ver=2.5.1#20180906-sha1:ebde6c79, 
isClient=false]
 2018-09-24 15:17:00.221 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Topology snapshot [ver=187, servers=156, clients=0, CPUs=8736, 
offheap=19.0GB, heap=4800.0GB]
 2018-09-24 15:17:00.221 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 ^-- Node [id=E002E011-8D1C-4353-A0F3-B71264C5B0F4, clusterState=INACTIVE]
 2018-09-24 15:17:00.221 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 ^-- Baseline [id=6, size=160, online=156, offline=4]

*topology rolled to version 187, then another node 931 failed:*

2018-09-24 15:17:00.466 [WARN 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Node FAILED: TcpDiscoveryNode [id=83536b6d-8aa3-4c85-b3da-5e577ae37ac6, 
addrs=ArrayList [10.116.215.3], sockAddrs=HashSet 
[grid931.domain/10.116.215.3:47500], discPort=47500, order=73, intOrder=73, 
lastExchangeTime=1537528186599, loc=false, ver=2.5.1#20180906-sha1:ebde6c79, 
isClient=false]
 2018-09-24 15:17:00.467 [INFO 
][disco-event-worker-#152%DPL_GRID%DplGridNodeName%|#152%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
 Topology snapshot [ver=188, servers=155, clients=0, CPUs=8680, 
offheap=19.0GB, heap=4800.0GB]
 2018-09-24 15:17:00.467 [INFO 


[jira] [Updated] (IGNITE-9433) Refactoring to improve constant usage for file suffixes

2018-08-30 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-9433:
---
Description: We need to extract file suffix constants to avoid duplication of 
string constants for zip files, like ".zip" and ".tmp", across the project  
(was: We need to extract file suffix constants to avoid duplication of string 
constants for zip files, like ".zip", across the project)
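The refactoring can be as small as a single constants holder referenced from every call site. The class and method names below are illustrative, not Ignite's actual identifiers:

```java
// Sketch of the requested refactoring: one place for file-suffix string
// constants instead of ".zip"/".tmp" literals scattered across the code base.
// Names are illustrative, not Ignite's actual constants.
public final class FileSuffixes {
    public static final String ZIP_SUFFIX = ".zip";
    public static final String TMP_SUFFIX = ".tmp";

    private FileSuffixes() {
        // No instances: constants holder only.
    }

    /** Example call site: a temporary file for a compressed segment. */
    public static String tmpZipName(String segmentName) {
        return segmentName + ZIP_SUFFIX + TMP_SUFFIX;
    }
}
```

With this in place, a rename of the suffix touches one line instead of every literal in the project.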

> Refactoring to improve constant usage for file suffixes
> ---
>
> Key: IGNITE-9433
> URL: https://issues.apache.org/jira/browse/IGNITE-9433
> Project: Ignite
>  Issue Type: Task
>Reporter: Pavel Voronkin
>Priority: Major
> Fix For: 2.7
>
>
> We need to extract file suffix constants to avoid duplication of string 
> constants for zip files, like ".zip" and ".tmp", across the project



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-9433) Refactoring to improve constant usage for file suffixes

2018-08-30 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-9433:
--

 Summary: Refactoring to improve constant usage for file suffixes
 Key: IGNITE-9433
 URL: https://issues.apache.org/jira/browse/IGNITE-9433
 Project: Ignite
  Issue Type: Task
Reporter: Pavel Voronkin
 Fix For: 2.7






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-9433) Refactoring to improve constant usage for file suffixes

2018-08-30 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-9433:
---
Description: We need to extract file suffix constants to avoid duplication of 
string constants for zip files, like ".zip", across the project

> Refactoring to improve constant usage for file suffixes
> ---
>
> Key: IGNITE-9433
> URL: https://issues.apache.org/jira/browse/IGNITE-9433
> Project: Ignite
>  Issue Type: Task
>Reporter: Pavel Voronkin
>Priority: Major
> Fix For: 2.7
>
>
> We need to extract file suffix constants to avoid duplication of string 
> constants for zip files, like ".zip", across the project



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-9433) Refactoring to improve constant usage for file suffixes

2018-09-03 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602106#comment-16602106
 ] 

Pavel Voronkin commented on IGNITE-9433:


[~DmitriyGovorukhin], please assist with merge

> Refactoring to improve constant usage for file suffixes
> ---
>
> Key: IGNITE-9433
> URL: https://issues.apache.org/jira/browse/IGNITE-9433
> Project: Ignite
>  Issue Type: Task
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Fix For: 2.7
>
>
> We need to extract file suffix constants to avoid duplication of string 
> constants for zip files, like ".zip" and ".tmp", across the project



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-9433) Refactoring to improve constant usage for file suffixes

2018-09-04 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-9433:
---
Ignite Flags:   (was: Docs Required)

> Refactoring to improve constant usage for file suffixes
> ---
>
> Key: IGNITE-9433
> URL: https://issues.apache.org/jira/browse/IGNITE-9433
> Project: Ignite
>  Issue Type: Task
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Fix For: 2.7
>
>
> We need to extract file suffix constants to avoid duplication of string 
> constants for zip files, like ".zip" and ".tmp", across the project



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-10851) Improve WalCompactionSwitchOnTest to rely on rollOver() instead of hardcoded values.

2018-12-29 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10851:

Description: 
WalCompactionSwitchOnTest expects a hardcoded number of segments after load.

This is a flaky approach; we need to calculate the expected size using WAL 
rollOver() and archive compression events.
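One way to remove the hardcoded expectation is to derive it from observed events, sketched here under the assumption that the test can count each rollOver() and each archive-compression event (counter names are illustrative, not Ignite's actual API):

```java
// Sketch of the test improvement: derive the expected number of compacted
// segments from observed WAL events instead of hardcoding a count.
// Counter names are illustrative, not Ignite's actual API.
public class WalSegmentExpectation {
    private long rollovers;
    private long compacted;

    public void onRollOver() { rollovers++; }

    public void onSegmentCompacted() { compacted++; }

    /**
     * Every completed rollover produces one archivable segment; the currently
     * active segment is not archived yet, so compaction may lag the rollover
     * count but must never exceed it.
     */
    public boolean isConsistent() { return compacted <= rollovers; }

    public long expectedMaxCompacted() { return rollovers; }
}
```

A test built on this asserts a relation between counters instead of a magic number, so it stays valid when the load or segment size changes.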

> Improve WalCompactionSwitchOnTest to rely on rollOver() instead of hardcoded 
> values.
> 
>
> Key: IGNITE-10851
> URL: https://issues.apache.org/jira/browse/IGNITE-10851
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Priority: Major
>
> WalCompactionSwitchOnTest expects hardcoded number of segments after load.
> This is flaky approach, we need to calculate expected size using WAL 
> rollOver() and archive compressed events.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10851) Improve WalCompactionSwitchOnTest to rely on rollOver() instead of hardcoded values.

2018-12-29 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-10851:
---

 Summary: Improve WalCompactionSwitchOnTest to rely on rollOver() 
instead of hardcoded values.
 Key: IGNITE-10851
 URL: https://issues.apache.org/jira/browse/IGNITE-10851
 Project: Ignite
  Issue Type: Improvement
Reporter: Pavel Voronkin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-17 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10877:

Attachment: image-2019-01-17-12-58-07-382.png

> GridAffinityAssignment.initPrimaryBackupMaps memory pressure
> 
>
> Key: IGNITE-10877
> URL: https://issues.apache.org/jira/browse/IGNITE-10877
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr, 
> image-2019-01-17-12-58-07-382.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> 1) While running tests with JFR we observe huge memory allocation pressure 
> produced by:
> *Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)*
> java.util.HashMap.newNode(int, Object, Object, HashMap$Node) 481 298 044 784 
> 100
>  java.util.HashMap.putVal(int, Object, Object, boolean, boolean) 481 298 044 
> 784 100
>  java.util.HashMap.put(Object, Object) 481 298 044 784 100
>  java.util.HashSet.add(Object) 480 297 221 040 99,724
>  
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.initPrimaryBackupMaps()
>  1 823 744 0,276
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.(AffinityTopologyVersion,
>  List, List) 1 823 744 0,276
> *Allocation stats*
> Class Average Object Size(bytes) Total Object Size(bytes) TLABs Average TLAB 
> Size(bytes) Total TLAB Size(bytes) Pressure(%)
>  java.util.HashMap$Node 32 15 392 481 619 635,726 298 044 784 32,876
>  java.lang.Object[] 1 470,115 461 616 314 655 019,236 205 676 040 22,687
>  java.util.HashMap$Node[] 41 268,617 6 149 024 149 690 046,067 102 816 864 
> 11,341
>  java.lang.Integer 16 1 456 91 662 911,385 60 324 936 6,654
>  java.util.ArrayList 24 1 608 67 703 389,97 47 127 128 5,198
> 2) Also another hot place found
> Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)
>  java.util.ArrayList.grow(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureExplicitCapacity(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureCapacityInternal(int) 7 5 766 448 9,554
>  java.util.ArrayList.add(Object) 7 5 766 448 9,554
>  
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.nodes(int,
>  AffinityTopologyVersion, GridDhtPartitionState, GridDhtPartitionState[]) 7 5 
> 766 448 9,554
> The reason of that is detailed above.
> I think we need to improve memory efficiency by switching from Sets to 
> BitSets
>  
> JFR attached, see Allocations in 12:50:28 - 12:50:29
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-17 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10877:

Attachment: image-2019-01-17-12-59-52-137.png

> GridAffinityAssignment.initPrimaryBackupMaps memory pressure
> 
>
> Key: IGNITE-10877
> URL: https://issues.apache.org/jira/browse/IGNITE-10877
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr, 
> image-2019-01-17-12-58-07-382.png, image-2019-01-17-12-59-52-137.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> 1) While running tests with JFR we observe huge memory allocation pressure 
> produced by:
> *Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)*
> java.util.HashMap.newNode(int, Object, Object, HashMap$Node) 481 298 044 784 
> 100
>  java.util.HashMap.putVal(int, Object, Object, boolean, boolean) 481 298 044 
> 784 100
>  java.util.HashMap.put(Object, Object) 481 298 044 784 100
>  java.util.HashSet.add(Object) 480 297 221 040 99,724
>  
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.initPrimaryBackupMaps()
>  1 823 744 0,276
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.(AffinityTopologyVersion,
>  List, List) 1 823 744 0,276
> *Allocation stats*
> Class Average Object Size(bytes) Total Object Size(bytes) TLABs Average TLAB 
> Size(bytes) Total TLAB Size(bytes) Pressure(%)
>  java.util.HashMap$Node 32 15 392 481 619 635,726 298 044 784 32,876
>  java.lang.Object[] 1 470,115 461 616 314 655 019,236 205 676 040 22,687
>  java.util.HashMap$Node[] 41 268,617 6 149 024 149 690 046,067 102 816 864 
> 11,341
>  java.lang.Integer 16 1 456 91 662 911,385 60 324 936 6,654
>  java.util.ArrayList 24 1 608 67 703 389,97 47 127 128 5,198
> 2) Also another hot place found
> Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)
>  java.util.ArrayList.grow(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureExplicitCapacity(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureCapacityInternal(int) 7 5 766 448 9,554
>  java.util.ArrayList.add(Object) 7 5 766 448 9,554
>  
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.nodes(int,
>  AffinityTopologyVersion, GridDhtPartitionState, GridDhtPartitionState[]) 7 5 
> 766 448 9,554
> The reason of that is detailed above.
> I think we need to improve memory efficiency by switching from Sets to 
> BitSets
>  
> JFR attached, see Allocations in 12:50:28 - 12:50:29
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-17 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744853#comment-16744853
 ] 

Pavel Voronkin commented on IGNITE-10877:
-

Here is the result of the optimization on:

8K partitions, 200 caches, 20 groups, 4 nodes

Before:

!image-2019-01-17-12-58-07-382.png!

After:

 

!image-2019-01-17-12-59-52-137.png!

We see compaction of up to 250x for backup collections.

Moreover, if there are fewer than 6 backups we won't allocate the List> at 
all, using viewReadOnly(ArrayList) to avoid allocating and keeping extra 
objects.
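The Sets-to-BitSets direction mentioned in the description can be sketched like this, assuming nodes are addressed by a compact per-topology order index (illustrative only, not the actual GridAffinityAssignment code):

```java
import java.util.BitSet;

// Sketch of the memory optimization direction: store the nodes owning a
// partition as bits over compact node indexes instead of a HashSet<UUID>,
// replacing ~32-byte HashMap.Node entries with one bit per node.
// Illustrative only, not the actual GridAffinityAssignment code.
public class PartitionNodesBitSet {
    private final BitSet nodes = new BitSet();

    /** Marks the node with the given compact index as an owner. */
    public void add(int nodeIdx) { nodes.set(nodeIdx); }

    /** Membership test is a single bit probe, no hashing or boxing. */
    public boolean contains(int nodeIdx) { return nodes.get(nodeIdx); }

    /** Number of owners = number of set bits. */
    public int size() { return nodes.cardinality(); }
}
```

For a 160-node topology each partition's owner set fits in 20 bytes of bit storage, versus hundreds of bytes of HashMap nodes and boxed keys.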


> GridAffinityAssignment.initPrimaryBackupMaps memory pressure
> 
>
> Key: IGNITE-10877
> URL: https://issues.apache.org/jira/browse/IGNITE-10877
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr, 
> image-2019-01-17-12-58-07-382.png, image-2019-01-17-12-59-52-137.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> 1) While running tests with JFR we observe huge memory allocation pressure 
> produced by:
> *Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)*
> java.util.HashMap.newNode(int, Object, Object, HashMap$Node) 481 298 044 784 
> 100
>  java.util.HashMap.putVal(int, Object, Object, boolean, boolean) 481 298 044 
> 784 100
>  java.util.HashMap.put(Object, Object) 481 298 044 784 100
>  java.util.HashSet.add(Object) 480 297 221 040 99,724
>  
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.initPrimaryBackupMaps()
>  1 823 744 0,276
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.(AffinityTopologyVersion,
>  List, List) 1 823 744 0,276
> *Allocation stats*
> Class Average Object Size(bytes) Total Object Size(bytes) TLABs Average TLAB 
> Size(bytes) Total TLAB Size(bytes) Pressure(%)
>  java.util.HashMap$Node 32 15 392 481 619 635,726 298 044 784 32,876
>  java.lang.Object[] 1 470,115 461 616 314 655 019,236 205 676 040 22,687
>  java.util.HashMap$Node[] 41 268,617 6 149 024 149 690 046,067 102 816 864 
> 11,341
>  java.lang.Integer 16 1 456 91 662 911,385 60 324 936 6,654
>  java.util.ArrayList 24 1 608 67 703 389,97 47 127 128 5,198
> 2) Also another hot place found
> Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)
>  java.util.ArrayList.grow(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureExplicitCapacity(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureCapacityInternal(int) 7 5 766 448 9,554
>  java.util.ArrayList.add(Object) 7 5 766 448 9,554
>  
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.nodes(int,
>  AffinityTopologyVersion, GridDhtPartitionState, GridDhtPartitionState[]) 7 5 
> 766 448 9,554
> The reason of that is detailed above.
> I think we need to improve memory efficiency by switching from Sets to 
> BitSets
>  
> JFR attached, see Allocations in 12:50:28 - 12:50:29
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-15 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743205#comment-16743205
 ] 

Pavel Voronkin commented on IGNITE-10877:
-

Benchmark Mode Cnt Score Error Units
SmallHashSetsVsReadOnlyViewBenchmark.hashSetContainsRandom thrpt 20 
26221395,193 ± 240929,392 ops/s
SmallHashSetsVsReadOnlyViewBenchmark.hashSetIteratorRandom thrpt 20 
12626598,194 ± 1742223,886 ops/s
SmallHashSetsVsReadOnlyViewBenchmark.readOnlyViewContainsRandom thrpt 20 
23301229,681 ± 534549,170 ops/s
SmallHashSetsVsReadOnlyViewBenchmark.readOnlyViewIteratorRandom thrpt 20 
21134614,093 ± 666488,488 ops/s

we see a 2x improvement in iterator and a slight reduction in contains(), in 
addition to reduced allocations.

 

> GridAffinityAssignment.initPrimaryBackupMaps memory pressure
> 
>
> Key: IGNITE-10877
> URL: https://issues.apache.org/jira/browse/IGNITE-10877
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr
>
>
> 1) While running tests with JFR we observe huge memory allocation pressure 
> produced by:
> *Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)*
> java.util.HashMap.newNode(int, Object, Object, HashMap$Node) 481 298 044 784 
> 100
>  java.util.HashMap.putVal(int, Object, Object, boolean, boolean) 481 298 044 
> 784 100
>  java.util.HashMap.put(Object, Object) 481 298 044 784 100
>  java.util.HashSet.add(Object) 480 297 221 040 99,724
>  
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.initPrimaryBackupMaps()
>  1 823 744 0,276
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.(AffinityTopologyVersion,
>  List, List) 1 823 744 0,276
> *Allocation stats*
> Class Average Object Size(bytes) Total Object Size(bytes) TLABs Average TLAB 
> Size(bytes) Total TLAB Size(bytes) Pressure(%)
>  java.util.HashMap$Node 32 15 392 481 619 635,726 298 044 784 32,876
>  java.lang.Object[] 1 470,115 461 616 314 655 019,236 205 676 040 22,687
>  java.util.HashMap$Node[] 41 268,617 6 149 024 149 690 046,067 102 816 864 
> 11,341
>  java.lang.Integer 16 1 456 91 662 911,385 60 324 936 6,654
>  java.util.ArrayList 24 1 608 67 703 389,97 47 127 128 5,198
> 2) Also another hot place found
> Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)
>  java.util.ArrayList.grow(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureExplicitCapacity(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureCapacityInternal(int) 7 5 766 448 9,554
>  java.util.ArrayList.add(Object) 7 5 766 448 9,554
>  
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.nodes(int,
>  AffinityTopologyVersion, GridDhtPartitionState, GridDhtPartitionState[]) 7 5 
> 766 448 9,554
> The reason of that is detailed above.
> I think we need to improve memory efficiency by switching from Sets to 
> BitSets
>  
> JFR attached, see Allocations in 12:50:28 - 12:50:29
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-15 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743205#comment-16743205
 ] 

Pavel Voronkin edited comment on IGNITE-10877 at 1/15/19 4:41 PM:
--

Benchmark Mode Cnt Score Error Units
SmallHashSetsVsReadOnlyViewBenchmark.hashSetContainsRandom thrpt 20 
25690717,349 ± 200741,979 ops/s
SmallHashSetsVsReadOnlyViewBenchmark.hashSetIteratorRandom thrpt 20 
12836581,770 ± 248020,906 ops/s
SmallHashSetsVsReadOnlyViewBenchmark.readOnlyViewContainsRandom thrpt 20 
22278517,368 ± 339376,502 ops/s
SmallHashSetsVsReadOnlyViewBenchmark.readOnlyViewIteratorRandom thrpt 20 
19959598,363 ± 709696,316 ops/s

we see a 2x improvement in iterator and a slight reduction in contains(), in 
addition to reduced allocations.

 


was (Author: voropava):
Benchmark Mode Cnt Score Error Units
SmallHashSetsVsReadOnlyViewBenchmark.hashSetContainsRandom thrpt 20 
26221395,193 ± 240929,392 ops/s
SmallHashSetsVsReadOnlyViewBenchmark.hashSetIteratorRandom thrpt 20 
12626598,194 ± 1742223,886 ops/s
SmallHashSetsVsReadOnlyViewBenchmark.readOnlyViewContainsRandom thrpt 20 
23301229,681 ± 534549,170 ops/s
SmallHashSetsVsReadOnlyViewBenchmark.readOnlyViewIteratorRandom thrpt 20 
21134614,093 ± 666488,488 ops/s

we see a 2x improvement in iterator and a slight reduction in contains(), in 
addition to reduced allocations.

 

> GridAffinityAssignment.initPrimaryBackupMaps memory pressure
> 
>
> Key: IGNITE-10877
> URL: https://issues.apache.org/jira/browse/IGNITE-10877
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr
>
>
> 1) While running tests with JFR we observe huge memory allocation pressure 
> produced by:
> *Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)*
> java.util.HashMap.newNode(int, Object, Object, HashMap$Node) 481 298 044 784 
> 100
>  java.util.HashMap.putVal(int, Object, Object, boolean, boolean) 481 298 044 
> 784 100
>  java.util.HashMap.put(Object, Object) 481 298 044 784 100
>  java.util.HashSet.add(Object) 480 297 221 040 99,724
>  
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.initPrimaryBackupMaps()
>  1 823 744 0,276
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.(AffinityTopologyVersion,
>  List, List) 1 823 744 0,276
> *Allocation stats*
> Class Average Object Size(bytes) Total Object Size(bytes) TLABs Average TLAB 
> Size(bytes) Total TLAB Size(bytes) Pressure(%)
>  java.util.HashMap$Node 32 15 392 481 619 635,726 298 044 784 32,876
>  java.lang.Object[] 1 470,115 461 616 314 655 019,236 205 676 040 22,687
>  java.util.HashMap$Node[] 41 268,617 6 149 024 149 690 046,067 102 816 864 
> 11,341
>  java.lang.Integer 16 1 456 91 662 911,385 60 324 936 6,654
>  java.util.ArrayList 24 1 608 67 703 389,97 47 127 128 5,198
> 2) Also another hot place found
> Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)
>  java.util.ArrayList.grow(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureExplicitCapacity(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureCapacityInternal(int) 7 5 766 448 9,554
>  java.util.ArrayList.add(Object) 7 5 766 448 9,554
>  
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.nodes(int,
>  AffinityTopologyVersion, GridDhtPartitionState, GridDhtPartitionState[]) 7 5 
> 766 448 9,554
> The reason of that is detailed above.
> I think we need to improve memory efficiency by switching from Sets to 
> BitSets
>  
> JFR attached, see Allocations in 12:50:28 - 12:50:29
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-14 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin reassigned IGNITE-10877:
---

Assignee: Pavel Voronkin

> GridAffinityAssignment.initPrimaryBackupMaps memory pressure
> 
>
> Key: IGNITE-10877
> URL: https://issues.apache.org/jira/browse/IGNITE-10877
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr
>
>
> 1) While running tests with JFR we observe huge memory allocation pressure 
> produced by:
> *Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)*
> java.util.HashMap.newNode(int, Object, Object, HashMap$Node) 481 298 044 784 
> 100
>  java.util.HashMap.putVal(int, Object, Object, boolean, boolean) 481 298 044 
> 784 100
>  java.util.HashMap.put(Object, Object) 481 298 044 784 100
>  java.util.HashSet.add(Object) 480 297 221 040 99,724
>  
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.initPrimaryBackupMaps()
>  1 823 744 0,276
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.<init>(AffinityTopologyVersion,
>  List, List) 1 823 744 0,276
> *Allocation stats*
> Class | Average Object Size (bytes) | Total Object Size (bytes) | TLABs | Average TLAB Size (bytes) | Total TLAB Size (bytes) | Pressure (%)
>  java.util.HashMap$Node | 32 | 15 392 | 481 | 619 635,726 | 298 044 784 | 32,876
>  java.lang.Object[] | 1 470,115 | 461 616 | 314 | 655 019,236 | 205 676 040 | 22,687
>  java.util.HashMap$Node[] | 41 268,617 | 6 149 024 | 149 | 690 046,067 | 102 816 864 | 11,341
>  java.lang.Integer | 16 | 1 456 | 91 | 662 911,385 | 60 324 936 | 6,654
>  java.util.ArrayList | 24 | 1 608 | 67 | 703 389,97 | 47 127 128 | 5,198
> 2) Also another hot place found
> Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)
>  java.util.ArrayList.grow(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureExplicitCapacity(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureCapacityInternal(int) 7 5 766 448 9,554
>  java.util.ArrayList.add(Object) 7 5 766 448 9,554
>  
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.nodes(int,
>  AffinityTopologyVersion, GridDhtPartitionState, GridDhtPartitionState[]) 7 5 
> 766 448 9,554
> The reason for that is detailed in the allocation stats above.
> I think we need to improve memory efficiency by switching from Sets to 
> BitSets.
>  
> JFR attached, see Allocations in 12:50:28 - 12:50:29
>  
>  
>  
>  





[jira] [Updated] (IGNITE-11016) RecoveryLastReceivedMessage(NEED_WAIT) write message failed in case of SSL

2019-01-21 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11016:

Summary: RecoveryLastReceivedMessage(NEED_WAIT) write message failed in 
case of SSL  (was: NEED_WAIT write message failed in case of SSL)

> RecoveryLastReceivedMessage(NEED_WAIT) write message failed in case of SSL
> --
>
> Key: IGNITE-11016
> URL: https://issues.apache.org/jira/browse/IGNITE-11016
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>






[jira] [Updated] (IGNITE-11016) RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data (SSL engine error)".

2019-01-21 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11016:

Description: 
Problem: 

When the initiator node hasn't joined the topology yet (it doesn't exist in 
DiscoCache, but exists in the TcpDiscovery ring),

we write back a new RecoveryLastReceivedMessage(NEED_WAIT) in the else clause 
below:

if (unknownNode) {
    U.warn(log, "Close incoming connection, unknown node [nodeId=" + sndId +
        ", ses=" + ses + ']');

    ses.close();
}
else {
    ses.send(new RecoveryLastReceivedMessage(NEED_WAIT)).listen(
        new CI1<IgniteInternalFuture<?>>() {
            @Override public void apply(IgniteInternalFuture<?> fut) {
                ses.close();
            }
        });
}

With SSL, this code encrypts and sends concurrently with session.close(), 
which results in an exception:


 javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) 
[status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl 
[worker=GridWorker [name=grid-nio-worker-tcp-comm-10, 
igniteInstanceName=DPL_GRID%DplGridNodeName, finished=false, 
hashCode=1324367867, interrupted=false, 
runner=grid-nio-worker-tcp-comm-10-#130%DPL_GRID%DplGridNodeName%|#130%DPL_GRID%DplGridNodeName%]AbstractNioClientWorker
 [idx=10, bytesRcvd=121406754, bytesSent=0, bytesRcvd0=16659, bytesSent0=0, 
select=true, super=]DirectNioClientWorker [super=], 
writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768], 
readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=null, 
outRecovery=null, super=GridNioSessionImpl [locAddr=/10.116.69.208:47100, 
rmtAddr=/10.53.15.23:55380, createTime=1544502852482, closeTime=0, 
bytesSent=4076, bytesRcvd=4346, bytesSent0=4076, bytesRcvd0=4346, 
sndSchedTime=1544502852522, lastSndTime=1544502852522, 
lastRcvTime=1544502852522, readsPaused=false, 
filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL 
filter], accepted=true, markedForClose=true]]]
                 at 
org.apache.ignite.internal.util.nio.ssl.GridNioSslHandler.encrypt(GridNioSslHandler.java:380)
                 at 
org.apache.ignite.internal.util.nio.ssl.GridNioSslFilter.encrypt(GridNioSslFilter.java:270)
                 at 
org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWriteSsl(GridNioServer.java:1465)
                 at 
org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite(GridNioServer.java:1326)
                 at 
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2374)
                 at 
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2138)
                 at 
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1792)
                 at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
                 at java.lang.Thread.run(Thread.java:745)
  
So the initiator receives a closed-connection exception instead of the NEED_WAIT 
message, which triggers the failure scenario.

As a result, instead of entering the NEED_WAIT wait loop, we retry with an 
exception N times and then fail.

 

  was:
The problem is that when the initiator node doesn't exist in DiscoCache but 
exists in the TcpDiscovery ring,

we write back a new RecoveryLastReceivedMessage(NEED_WAIT) in the else clause 
below:

 

if (unknownNode) {
 U.warn(log, "Close incoming connection, unknown node [nodeId=" + sndId + ", 
ses=" + ses + ']');

 ses.close();
}
else {
 ses.send(new RecoveryLastReceivedMessage(NEED_WAIT)).listen(new 
CI1>() {
 @Override public void apply(IgniteInternalFuture fut) {
 ses.close();
 }
 });
}

With SSL, this code encrypts and sends concurrently with close, which 
results in:

 
javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) 
[status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl 
[worker=GridWorker [name=grid-nio-worker-tcp-comm-10, 
igniteInstanceName=DPL_GRID%DplGridNodeName, finished=false, 
hashCode=1324367867, interrupted=false, 
runner=grid-nio-worker-tcp-comm-10-#130%DPL_GRID%DplGridNodeName%]AbstractNioClientWorker
 [idx=10, bytesRcvd=121406754, bytesSent=0, bytesRcvd0=16659, bytesSent0=0, 
select=true, super=]DirectNioClientWorker [super=], 
writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768], 
readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=null, 
outRecovery=null, super=GridNioSessionImpl [locAddr=/10.116.69.208:47100, 
rmtAddr=/10.53.15.23:55380, createTime=1544502852482, closeTime=0, 
bytesSent=4076, bytesRcvd=4346, bytesSent0=4076, bytesRcvd0=4346, 
sndSchedTime=1544502852522, lastSndTime=1544502852522, 
lastRcvTime=1544502852522, readsPaused=false, 
filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL 
filter], accepted=true, markedForClose=true]]]
                at 

[jira] [Updated] (IGNITE-11016) RecoveryLastReceivedMessage(NEED_WAIT) write message failed in case of SSL

2019-01-21 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11016:

Description: 
The problem is that when the initiator node doesn't exist in DiscoCache but 
exists in the TcpDiscovery ring,

we write back a new RecoveryLastReceivedMessage(NEED_WAIT) in the else clause 
below:

 

if (unknownNode) {
 U.warn(log, "Close incoming connection, unknown node [nodeId=" + sndId + ", 
ses=" + ses + ']');

 ses.close();
}
else {
 ses.send(new RecoveryLastReceivedMessage(NEED_WAIT)).listen(new 
CI1>() {
 @Override public void apply(IgniteInternalFuture fut) {
 ses.close();
 }
 });
}

With SSL, this code encrypts and sends concurrently with close, which 
results in:

 
javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) 
[status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl 
[worker=GridWorker [name=grid-nio-worker-tcp-comm-10, 
igniteInstanceName=DPL_GRID%DplGridNodeName, finished=false, 
hashCode=1324367867, interrupted=false, 
runner=grid-nio-worker-tcp-comm-10-#130%DPL_GRID%DplGridNodeName%]AbstractNioClientWorker
 [idx=10, bytesRcvd=121406754, bytesSent=0, bytesRcvd0=16659, bytesSent0=0, 
select=true, super=]DirectNioClientWorker [super=], 
writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768], 
readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=null, 
outRecovery=null, super=GridNioSessionImpl [locAddr=/10.116.69.208:47100, 
rmtAddr=/10.53.15.23:55380, createTime=1544502852482, closeTime=0, 
bytesSent=4076, bytesRcvd=4346, bytesSent0=4076, bytesRcvd0=4346, 
sndSchedTime=1544502852522, lastSndTime=1544502852522, 
lastRcvTime=1544502852522, readsPaused=false, 
filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL 
filter], accepted=true, markedForClose=true]]]
                at 
org.apache.ignite.internal.util.nio.ssl.GridNioSslHandler.encrypt(GridNioSslHandler.java:380)
                at 
org.apache.ignite.internal.util.nio.ssl.GridNioSslFilter.encrypt(GridNioSslFilter.java:270)
                at 
org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWriteSsl(GridNioServer.java:1465)
                at 
org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite(GridNioServer.java:1326)
                at 
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2374)
                at 
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2138)
                at 
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1792)
                at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
                at java.lang.Thread.run(Thread.java:745)
 
 
 

 

 

 

 

> RecoveryLastReceivedMessage(NEED_WAIT) write message failed in case of SSL
> --
>
> Key: IGNITE-11016
> URL: https://issues.apache.org/jira/browse/IGNITE-11016
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>
> The problem is that when the initiator node doesn't exist in DiscoCache 
> but exists in the TcpDiscovery ring,
> we write back a new RecoveryLastReceivedMessage(NEED_WAIT) in the else clause 
> below:
>  
> if (unknownNode) {
>  U.warn(log, "Close incoming connection, unknown node [nodeId=" + sndId + ", 
> ses=" + ses + ']');
>  ses.close();
> }
> else {
>  ses.send(new RecoveryLastReceivedMessage(NEED_WAIT)).listen(new 
> CI1>() {
>  @Override public void apply(IgniteInternalFuture fut) {
>  ses.close();
>  }
>  });
> }
> With SSL, this code encrypts and sends concurrently with close, which 
> results in:
>  
> javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) 
> [status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl 
> [worker=GridWorker [name=grid-nio-worker-tcp-comm-10, 
> igniteInstanceName=DPL_GRID%DplGridNodeName, finished=false, 
> hashCode=1324367867, interrupted=false, 
> runner=grid-nio-worker-tcp-comm-10-#130%DPL_GRID%DplGridNodeName%]AbstractNioClientWorker
>  [idx=10, bytesRcvd=121406754, bytesSent=0, bytesRcvd0=16659, bytesSent0=0, 
> select=true, super=]DirectNioClientWorker [super=], 
> writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768], 
> readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], 
> inRecovery=null, outRecovery=null, super=GridNioSessionImpl 
> [locAddr=/10.116.69.208:47100, rmtAddr=/10.53.15.23:55380, 
> createTime=1544502852482, closeTime=0, bytesSent=4076, bytesRcvd=4346, 
> bytesSent0=4076, bytesRcvd0=4346, sndSchedTime=1544502852522, 
> lastSndTime=1544502852522, lastRcvTime=1544502852522, readsPaused=false, 

[jira] [Updated] (IGNITE-11016) RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data (SSL engine error)".

2019-01-21 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11016:

Summary: RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to 
encrypt data (SSL engine error)".  (was: RecoveryLastReceivedMessage(NEED_WAIT) 
write message failed in case of SSL)

> RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data 
> (SSL engine error)".
> --
>
> Key: IGNITE-11016
> URL: https://issues.apache.org/jira/browse/IGNITE-11016
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>
> The problem is that when the initiator node doesn't exist in DiscoCache 
> but exists in the TcpDiscovery ring,
> we write back a new RecoveryLastReceivedMessage(NEED_WAIT) in the else clause 
> below:
>  
> if (unknownNode) {
>  U.warn(log, "Close incoming connection, unknown node [nodeId=" + sndId + ", 
> ses=" + ses + ']');
>  ses.close();
> }
> else {
>  ses.send(new RecoveryLastReceivedMessage(NEED_WAIT)).listen(new 
> CI1>() {
>  @Override public void apply(IgniteInternalFuture fut) {
>  ses.close();
>  }
>  });
> }
> With SSL, this code encrypts and sends concurrently with close, which 
> results in:
>  
> javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) 
> [status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl 
> [worker=GridWorker [name=grid-nio-worker-tcp-comm-10, 
> igniteInstanceName=DPL_GRID%DplGridNodeName, finished=false, 
> hashCode=1324367867, interrupted=false, 
> runner=grid-nio-worker-tcp-comm-10-#130%DPL_GRID%DplGridNodeName%]AbstractNioClientWorker
>  [idx=10, bytesRcvd=121406754, bytesSent=0, bytesRcvd0=16659, bytesSent0=0, 
> select=true, super=]DirectNioClientWorker [super=], 
> writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768], 
> readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], 
> inRecovery=null, outRecovery=null, super=GridNioSessionImpl 
> [locAddr=/10.116.69.208:47100, rmtAddr=/10.53.15.23:55380, 
> createTime=1544502852482, closeTime=0, bytesSent=4076, bytesRcvd=4346, 
> bytesSent0=4076, bytesRcvd0=4346, sndSchedTime=1544502852522, 
> lastSndTime=1544502852522, lastRcvTime=1544502852522, readsPaused=false, 
> filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL 
> filter], accepted=true, markedForClose=true]]]
>                 at 
> org.apache.ignite.internal.util.nio.ssl.GridNioSslHandler.encrypt(GridNioSslHandler.java:380)
>                 at 
> org.apache.ignite.internal.util.nio.ssl.GridNioSslFilter.encrypt(GridNioSslFilter.java:270)
>                 at 
> org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWriteSsl(GridNioServer.java:1465)
>                 at 
> org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite(GridNioServer.java:1326)
>                 at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2374)
>                 at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2138)
>                 at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1792)
>                 at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>                 at java.lang.Thread.run(Thread.java:745)
>  
>  
>  
>  
>  
>  
>  



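The write-vs-close race described above can be sketched outside Ignite's NIO stack: the unsafe ordering closes the session while an (SSL-encrypting) write is still in flight, while the safe ordering closes only from the write future's completion callback. This is a minimal analogy using CompletableFuture, not Ignite's actual fix; names are illustrative.

```java
import java.util.concurrent.CompletableFuture;

// Minimal analogy of the race: `send` stands in for ses.send(...), whose
// SSL encryption runs asynchronously on a worker thread. Closing the
// session before that write completes is what yields the "Failed to
// encrypt data" failure; closing from the completion callback serializes
// the two operations.
public class WriteThenCloseSketch {
    static CompletableFuture<Void> send(String msg) {
        return CompletableFuture.runAsync(() -> {
            // encryption + flush would happen here
            System.out.println("flushed: " + msg);
        });
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();

        send("NEED_WAIT")
            .thenRun(() -> log.append("closed after flush"))
            .join(); // wait so this demo observes the ordering

        System.out.println(log);
    }
}
```

The point is only the ordering guarantee: thenRun never executes before the write stage completes, so the close can no longer race the encryption step.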


[jira] [Created] (IGNITE-11016) NEED_WAIT write message failed in case of SSL

2019-01-21 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-11016:
---

 Summary: NEED_WAIT write message failed in case of SSL
 Key: IGNITE-11016
 URL: https://issues.apache.org/jira/browse/IGNITE-11016
 Project: Ignite
  Issue Type: Bug
Reporter: Pavel Voronkin








[jira] [Updated] (IGNITE-11016) RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data (SSL engine error)".

2019-01-21 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11016:

Attachment: IgniteClientConnectSslTest.java

> RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data 
> (SSL engine error)".
> --
>
> Key: IGNITE-11016
> URL: https://issues.apache.org/jira/browse/IGNITE-11016
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
> Attachments: IgniteClientConnectSslTest.java
>
>
> Problem: 
> When the initiator node hasn't joined the topology yet (it doesn't exist in 
> DiscoCache, but exists in the TcpDiscovery ring),
> we write back a new RecoveryLastReceivedMessage(NEED_WAIT) in the else clause 
> below:
> if (unknownNode)
> { U.warn(log, "Close incoming connection, unknown node [nodeId=" + sndId + ", 
> ses=" + ses + ']'); ses.close(); }
> else {
>  ses.send(new RecoveryLastReceivedMessage(NEED_WAIT)).listen(new 
> CI1>() {
>  @Override public void apply(IgniteInternalFuture fut)
> { ses.close(); }
> });
>  }
> With SSL, this code encrypts and sends concurrently with 
> session.close(), which results in an exception:
>  javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) 
> [status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl 
> [worker=GridWorker [name=grid-nio-worker-tcp-comm-10, 
> igniteInstanceName=DPL_GRID%DplGridNodeName, finished=false, 
> hashCode=1324367867, interrupted=false, 
> runner=grid-nio-worker-tcp-comm-10-#130%DPL_GRID%DplGridNodeName%|#130%DPL_GRID%DplGridNodeName%]AbstractNioClientWorker
>  [idx=10, bytesRcvd=121406754, bytesSent=0, bytesRcvd0=16659, bytesSent0=0, 
> select=true, super=]DirectNioClientWorker [super=], 
> writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768], 
> readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], 
> inRecovery=null, outRecovery=null, super=GridNioSessionImpl 
> [locAddr=/10.116.69.208:47100, rmtAddr=/10.53.15.23:55380, 
> createTime=1544502852482, closeTime=0, bytesSent=4076, bytesRcvd=4346, 
> bytesSent0=4076, bytesRcvd0=4346, sndSchedTime=1544502852522, 
> lastSndTime=1544502852522, lastRcvTime=1544502852522, readsPaused=false, 
> filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL 
> filter], accepted=true, markedForClose=true]]]
>                  at 
> org.apache.ignite.internal.util.nio.ssl.GridNioSslHandler.encrypt(GridNioSslHandler.java:380)
>                  at 
> org.apache.ignite.internal.util.nio.ssl.GridNioSslFilter.encrypt(GridNioSslFilter.java:270)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWriteSsl(GridNioServer.java:1465)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite(GridNioServer.java:1326)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2374)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2138)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1792)
>                  at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>                  at java.lang.Thread.run(Thread.java:745)
>   
> So the initiator receives a closed-connection exception instead of the 
> NEED_WAIT message, which triggers the failure scenario.
> As a result, instead of entering the NEED_WAIT wait loop, we retry with an 
> exception N times and then fail.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-11016) RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data (SSL engine error)".

2019-01-21 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748456#comment-16748456
 ] 

Pavel Voronkin commented on IGNITE-11016:
-

Reproducer attached.

> RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data 
> (SSL engine error)".
> --
>
> Key: IGNITE-11016
> URL: https://issues.apache.org/jira/browse/IGNITE-11016
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
> Attachments: IgniteClientConnectSslTest.java
>
>
> Problem: 
> When the initiator node hasn't joined the topology yet (it doesn't exist in 
> DiscoCache, but exists in the TcpDiscovery ring),
> we write back a new RecoveryLastReceivedMessage(NEED_WAIT) in the else clause 
> below:
> if (unknownNode)
> { U.warn(log, "Close incoming connection, unknown node [nodeId=" + sndId + ", 
> ses=" + ses + ']'); ses.close(); }
> else {
>  ses.send(new RecoveryLastReceivedMessage(NEED_WAIT)).listen(new 
> CI1>() {
>  @Override public void apply(IgniteInternalFuture fut)
> { ses.close(); }
> });
>  }
> With SSL, this code encrypts and sends concurrently with 
> session.close(), which results in an exception:
>  javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) 
> [status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl 
> [worker=GridWorker [name=grid-nio-worker-tcp-comm-10, 
> igniteInstanceName=DPL_GRID%DplGridNodeName, finished=false, 
> hashCode=1324367867, interrupted=false, 
> runner=grid-nio-worker-tcp-comm-10-#130%DPL_GRID%DplGridNodeName%|#130%DPL_GRID%DplGridNodeName%]AbstractNioClientWorker
>  [idx=10, bytesRcvd=121406754, bytesSent=0, bytesRcvd0=16659, bytesSent0=0, 
> select=true, super=]DirectNioClientWorker [super=], 
> writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768], 
> readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], 
> inRecovery=null, outRecovery=null, super=GridNioSessionImpl 
> [locAddr=/10.116.69.208:47100, rmtAddr=/10.53.15.23:55380, 
> createTime=1544502852482, closeTime=0, bytesSent=4076, bytesRcvd=4346, 
> bytesSent0=4076, bytesRcvd0=4346, sndSchedTime=1544502852522, 
> lastSndTime=1544502852522, lastRcvTime=1544502852522, readsPaused=false, 
> filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL 
> filter], accepted=true, markedForClose=true]]]
>                  at 
> org.apache.ignite.internal.util.nio.ssl.GridNioSslHandler.encrypt(GridNioSslHandler.java:380)
>                  at 
> org.apache.ignite.internal.util.nio.ssl.GridNioSslFilter.encrypt(GridNioSslFilter.java:270)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWriteSsl(GridNioServer.java:1465)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite(GridNioServer.java:1326)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2374)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2138)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1792)
>                  at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>                  at java.lang.Thread.run(Thread.java:745)
>   
> So the initiator receives a closed-connection exception instead of the 
> NEED_WAIT message, which triggers the failure scenario.
> As a result, instead of entering the NEED_WAIT wait loop, we retry with an 
> exception N times and then fail.
>  





[jira] [Updated] (IGNITE-11017) OffheapEntriesCount metrics calculate size on all not EVICTED partitions

2019-01-21 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11017:

Description: GridDhtPartitionTopologyImpl.CurrentPartitionsIterator 
iterates over all non-EVICTED partitions when calculating the entries count.

> OffheapEntriesCount metrics calculate size on all not EVICTED partitions
> 
>
> Key: IGNITE-11017
> URL: https://issues.apache.org/jira/browse/IGNITE-11017
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>
> GridDhtPartitionTopologyImpl.CurrentPartitionsIterator iterates over all 
> non-EVICTED partitions when calculating the entries count.



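The metric fix implied by the ticket can be sketched as filtering by an explicit set of accepted partition states (e.g. only OWNING), rather than "everything not EVICTED". Types and names below are illustrative stand-ins, not Ignite's internal API.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: count offheap entries only for partitions whose
// state the caller accepts, instead of every partition that is merely
// not EVICTED (which also pulls in MOVING/RENTING partitions).
public class OffheapEntriesSketch {
    enum PartState { OWNING, MOVING, RENTING, EVICTED }

    static final class Part {
        final PartState state;
        final long size;
        Part(PartState state, long size) { this.state = state; this.size = size; }
    }

    static long entriesCount(List<Part> parts, Set<PartState> accepted) {
        long sum = 0;
        for (Part p : parts)
            if (accepted.contains(p.state))
                sum += p.size;
        return sum;
    }

    public static void main(String[] args) {
        List<Part> parts = Arrays.asList(
            new Part(PartState.OWNING, 100),
            new Part(PartState.MOVING, 50),
            new Part(PartState.EVICTED, 10));

        // Counting only OWNING partitions yields 100, not 150.
        System.out.println(entriesCount(parts, EnumSet.of(PartState.OWNING)));
    }
}
```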


[jira] [Created] (IGNITE-11017) OffheapEntriesCount metrics calculate size on all not EVICTED partitions

2019-01-21 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-11017:
---

 Summary: OffheapEntriesCount metrics calculate size on all not 
EVICTED partitions
 Key: IGNITE-11017
 URL: https://issues.apache.org/jira/browse/IGNITE-11017
 Project: Ignite
  Issue Type: Bug
Reporter: Pavel Voronkin








[jira] [Assigned] (IGNITE-11016) RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data (SSL engine error)".

2019-01-22 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin reassigned IGNITE-11016:
---

Assignee: Pavel Voronkin

> RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data 
> (SSL engine error)".
> --
>
> Key: IGNITE-11016
> URL: https://issues.apache.org/jira/browse/IGNITE-11016
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Attachments: IgniteClientConnectSslTest.java
>
>
> Problem: 
> When the initiator node hasn't joined the topology yet (it doesn't exist in 
> DiscoCache, but exists in the TcpDiscovery ring),
> we write back a new RecoveryLastReceivedMessage(NEED_WAIT) in the else clause 
> below:
> if (unknownNode)
> { U.warn(log, "Close incoming connection, unknown node [nodeId=" + sndId + ", 
> ses=" + ses + ']'); ses.close(); }
> else {
>  ses.send(new RecoveryLastReceivedMessage(NEED_WAIT)).listen(new 
> CI1>() {
>  @Override public void apply(IgniteInternalFuture fut)
> { ses.close(); }
> });
>  }
> With SSL, this code encrypts and sends concurrently with 
> session.close(), which results in an exception:
>  javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) 
> [status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl 
> [worker=GridWorker [name=grid-nio-worker-tcp-comm-10, 
> igniteInstanceName=DPL_GRID%DplGridNodeName, finished=false, 
> hashCode=1324367867, interrupted=false, 
> runner=grid-nio-worker-tcp-comm-10-#130%DPL_GRID%DplGridNodeName%|#130%DPL_GRID%DplGridNodeName%]AbstractNioClientWorker
>  [idx=10, bytesRcvd=121406754, bytesSent=0, bytesRcvd0=16659, bytesSent0=0, 
> select=true, super=]DirectNioClientWorker [super=], 
> writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768], 
> readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], 
> inRecovery=null, outRecovery=null, super=GridNioSessionImpl 
> [locAddr=/10.116.69.208:47100, rmtAddr=/10.53.15.23:55380, 
> createTime=1544502852482, closeTime=0, bytesSent=4076, bytesRcvd=4346, 
> bytesSent0=4076, bytesRcvd0=4346, sndSchedTime=1544502852522, 
> lastSndTime=1544502852522, lastRcvTime=1544502852522, readsPaused=false, 
> filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL 
> filter], accepted=true, markedForClose=true]]]
>                  at 
> org.apache.ignite.internal.util.nio.ssl.GridNioSslHandler.encrypt(GridNioSslHandler.java:380)
>                  at 
> org.apache.ignite.internal.util.nio.ssl.GridNioSslFilter.encrypt(GridNioSslFilter.java:270)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWriteSsl(GridNioServer.java:1465)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite(GridNioServer.java:1326)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2374)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2138)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1792)
>                  at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>                  at java.lang.Thread.run(Thread.java:745)
>   
> So the initiator receives a closed-connection exception instead of the 
> NEED_WAIT message, which triggers the failure scenario.
> As a result, instead of entering the NEED_WAIT wait loop, we retry with an 
> exception N times and then fail.
>  





[jira] [Updated] (IGNITE-11026) Support TcpCommunicationSpi.NeedWaitDelay, TcpCommunicationSpi.MaxNeedWaitDelay in .NET.

2019-01-22 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11026:

Summary: Support TcpCommunicationSpi.NeedWaitDelay, 
TcpCommunicationSpi.MaxNeedWaitDelay in .NET.  (was: Support 
TcpCommunicationSpi.NeedWaitDelay, TcpCommunicationSpi.MaxNeedWaitDelay.)

> Support TcpCommunicationSpi.NeedWaitDelay, 
> TcpCommunicationSpi.MaxNeedWaitDelay in .NET.
> 
>
> Key: IGNITE-11026
> URL: https://issues.apache.org/jira/browse/IGNITE-11026
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Priority: Major
>






[jira] [Created] (IGNITE-11026) Support TcpCommunicationSpi.NeedWaitDelay, TcpCommunicationSpi.MaxNeedWaitDelay.

2019-01-22 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-11026:
---

 Summary: Support TcpCommunicationSpi.NeedWaitDelay, 
TcpCommunicationSpi.MaxNeedWaitDelay.
 Key: IGNITE-11026
 URL: https://issues.apache.org/jira/browse/IGNITE-11026
 Project: Ignite
  Issue Type: Improvement
Reporter: Pavel Voronkin






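The NEED_WAIT handshake these two properties would tune can be sketched as a bounded retry loop with a capped, growing delay. The names needWaitDelay and maxNeedWaitDelay mirror the ticket summary; the doubling logic and all values are assumptions for illustration, not Ignite's actual implementation.

```java
// Hypothetical sketch of a NEED_WAIT retry loop with a capped delay.
public class NeedWaitBackoffSketch {
    public static void main(String[] args) throws InterruptedException {
        long needWaitDelay = 100;      // initial delay, ms (assumed)
        long maxNeedWaitDelay = 1000;  // cap, ms (assumed)

        long delay = needWaitDelay;
        for (int attempt = 1; attempt <= 5; attempt++) {
            boolean joined = attempt == 4; // stand-in for "remote answered OK"
            if (joined) {
                System.out.println("connected after attempt " + attempt);
                return;
            }
            System.out.println("NEED_WAIT, sleeping " + delay + " ms");
            Thread.sleep(delay);
            delay = Math.min(delay * 2, maxNeedWaitDelay); // grow, but never past cap
        }
        throw new IllegalStateException("gave up waiting for remote to join");
    }
}
```

Making both the initial delay and the cap configurable (including from .NET, per the summary) lets slow-joining topologies wait longer without hammering the remote node.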


[jira] [Commented] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-20 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747703#comment-16747703
 ] 

Pavel Voronkin commented on IGNITE-10877:
-

I don't think it breaks compatibility, because we have an Ignite property to roll 
back to the original behaviour for mixed environments.

Moreover, GridAffinityAssignment serialization is broken right now. See 
IGNITE-10925; we need to fix that issue there.

 

> GridAffinityAssignment.initPrimaryBackupMaps memory pressure
> 
>
> Key: IGNITE-10877
> URL: https://issues.apache.org/jira/browse/IGNITE-10877
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Fix For: 2.8
>
> Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr, 
> image-2019-01-17-12-58-07-382.png, image-2019-01-17-12-59-52-137.png, 
> image-2019-01-17-15-45-49-561.png, image-2019-01-17-15-45-53-043.png, 
> image-2019-01-17-15-46-32-872.png, image-2019-01-18-11-36-57-451.png, 
> image-2019-01-18-11-38-39-410.png, image-2019-01-18-11-55-39-496.png, 
> image-2019-01-18-11-56-10-339.png, image-2019-01-18-11-56-18-040.png, 
> image-2019-01-18-12-09-04-835.png, image-2019-01-18-12-09-32-876.png
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> 1) While running tests with JFR we observe huge memory allocation pressure 
> produced by:
> *Stack Trace | TLABs | Total TLAB Size (bytes) | Pressure (%)*
> java.util.HashMap.newNode(int, Object, Object, HashMap$Node) | 481 | 298 044 784 | 100
> java.util.HashMap.putVal(int, Object, Object, boolean, boolean) | 481 | 298 044 784 | 100
> java.util.HashMap.put(Object, Object) | 481 | 298 044 784 | 100
> java.util.HashSet.add(Object) | 480 | 297 221 040 | 99,724
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.initPrimaryBackupMaps() | 1 | 823 744 | 0,276
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.<init>(AffinityTopologyVersion, List, List) | 1 | 823 744 | 0,276
> *Allocation stats*
> Class | Average Object Size (bytes) | Total Object Size (bytes) | TLABs | Average TLAB Size (bytes) | Total TLAB Size (bytes) | Pressure (%)
> java.util.HashMap$Node | 32 | 15 392 | 481 | 619 635,726 | 298 044 784 | 32,876
> java.lang.Object[] | 1 470,115 | 461 616 | 314 | 655 019,236 | 205 676 040 | 22,687
> java.util.HashMap$Node[] | 41 268,617 | 6 149 024 | 149 | 690 046,067 | 102 816 864 | 11,341
> java.lang.Integer | 16 | 1 456 | 91 | 662 911,385 | 60 324 936 | 6,654
> java.util.ArrayList | 24 | 1 608 | 67 | 703 389,97 | 47 127 128 | 5,198
> 2) Another hot place was found:
> Stack Trace | TLABs | Total TLAB Size (bytes) | Pressure (%)
> java.util.ArrayList.grow(int) | 7 | 5 766 448 | 9,554
> java.util.ArrayList.ensureExplicitCapacity(int) | 7 | 5 766 448 | 9,554
> java.util.ArrayList.ensureCapacityInternal(int) | 7 | 5 766 448 | 9,554
> java.util.ArrayList.add(Object) | 7 | 5 766 448 | 9,554
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.nodes(int, AffinityTopologyVersion, GridDhtPartitionState, GridDhtPartitionState[]) | 7 | 5 766 448 | 9,554
> The reason for that is ArrayList resizing with the default capacity.
> I think we need to improve memory efficiency by switching from Sets to 
> BitSets.
>  
> JFR attached, see Allocations in 12:50:28 - 12:50:29
>  
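The proposed switch from Sets to BitSets can be sketched as follows (a minimal illustration, not Ignite's actual GridAffinityAssignment code; the class and method names here are made up): a BitSet stores one bit per possible partition id, so membership tracking avoids allocating a HashMap.Node plus a boxed Integer per partition.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

/**
 * Illustrative sketch (not Ignite code): tracking the partition ids owned
 * by a node with a BitSet instead of a HashSet<Integer>.
 */
public class PartitionSetSketch {
    static final int PARTS = 65536;

    // HashSet-based tracking: one HashMap.Node (~32 bytes) plus a boxed
    // Integer per partition id added.
    static Set<Integer> hashSetApproach(int[] ownedParts) {
        Set<Integer> set = new HashSet<>();
        for (int p : ownedParts)
            set.add(p);
        return set;
    }

    // BitSet-based tracking: one bit per possible partition id, i.e.
    // PARTS / 8 bytes of long[] words regardless of how many ids are set.
    static BitSet bitSetApproach(int[] ownedParts) {
        BitSet set = new BitSet(PARTS);
        for (int p : ownedParts)
            set.set(p);
        return set;
    }

    public static void main(String[] args) {
        int[] parts = {0, 1, 42, 65535};
        Set<Integer> hs = hashSetApproach(parts);
        BitSet bs = bitSetApproach(parts);
        // Both structures answer the same membership queries.
        System.out.println(hs.contains(42) == bs.get(42));   // true
        System.out.println(hs.size() == bs.cardinality());   // true
    }
}
```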



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-20 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747703#comment-16747703
 ] 

Pavel Voronkin edited comment on IGNITE-10877 at 1/21/19 6:59 AM:
--

I don't think it breaks compatibility, because we have an Ignite property to roll back 
to the original behaviour for mixed environments.

Moreover, GridAffinityAssignment serialization is broken right now. See 
IGNITE-10925; we need to fix all the issues there.

 


was (Author: voropava):
I don't think it breaks compatibility, because we have an Ignite property to roll back 
to the original behaviour for mixed environments.

Moreover, GridAffinityAssignment serialization is broken right now. See 
IGNITE-10925; we need to fix the issue there.

 




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-17 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10877:

Attachment: image-2019-01-17-15-45-53-043.png




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-17 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10877:

Attachment: image-2019-01-17-15-45-49-561.png




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-17 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745009#comment-16745009
 ] 

Pavel Voronkin edited comment on IGNITE-10877 at 1/17/19 12:47 PM:
---

65k partitions, 160 nodes, 3 backups

 

!image-2019-01-17-15-45-53-043.png!

!image-2019-01-17-15-46-32-872.png!

 


was (Author: voropava):
65k partitions 160 nodes

 

!image-2019-01-17-15-45-53-043.png!

!image-2019-01-17-15-46-32-872.png!

 




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-17 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10877:

Attachment: image-2019-01-17-15-46-32-872.png




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-17 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745009#comment-16745009
 ] 

Pavel Voronkin commented on IGNITE-10877:
-

65k partitions, 160 nodes

 

!image-2019-01-17-15-45-53-043.png!

!image-2019-01-17-15-46-32-872.png!

 




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-18 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10877:

Attachment: image-2019-01-18-11-56-10-339.png




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-18 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10877:

Attachment: image-2019-01-18-11-55-39-496.png




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-18 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10877:

Attachment: image-2019-01-18-11-56-18-040.png




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-18 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745931#comment-16745931
 ] 

Pavel Voronkin commented on IGNITE-10877:
-

1024 partitions, 4k nodes

HashSet

!image-2019-01-18-11-56-10-339.png!

BitSet

!image-2019-01-18-11-56-18-040.png!

On such a small number of partitions, BitSet is roughly the same as HashSet.
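A back-of-envelope sizing sketch supports this (the per-entry costs below are assumptions for illustration, not measurements, and the class is hypothetical): at 1024 partitions both structures are tiny, so overall allocation pressure is dominated elsewhere, while at 65k partitions the BitSet is clearly smaller.

```java
/**
 * Rough sizing sketch. Assumed costs: ~32 bytes per HashMap.Node plus
 * ~16 bytes per boxed Integer for HashSet<Integer> (backing table array
 * ignored); one bit per possible partition id for BitSet.
 */
public class SetSizingSketch {
    static long hashSetBytes(int entries) {
        return entries * (32L + 16L); // Node + boxed Integer per entry
    }

    static long bitSetBytes(int totalPartitions) {
        return (totalPartitions + 7) / 8; // one bit per partition id
    }

    public static void main(String[] args) {
        // 1024 partitions, a node owning ~256 of them: both are small.
        System.out.println(hashSetBytes(256));   // 12288
        System.out.println(bitSetBytes(1024));   // 128
        // 65536 partitions, ~16k owned: BitSet is far smaller.
        System.out.println(hashSetBytes(16384)); // 786432
        System.out.println(bitSetBytes(65536));  // 8192
    }
}
```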

> GridAffinityAssignment.initPrimaryBackupMaps memory pressure
> 
>
> Key: IGNITE-10877
> URL: https://issues.apache.org/jira/browse/IGNITE-10877
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr, 
> image-2019-01-17-12-58-07-382.png, image-2019-01-17-12-59-52-137.png, 
> image-2019-01-17-15-45-49-561.png, image-2019-01-17-15-45-53-043.png, 
> image-2019-01-17-15-46-32-872.png, image-2019-01-18-11-36-57-451.png, 
> image-2019-01-18-11-38-39-410.png, image-2019-01-18-11-55-39-496.png, 
> image-2019-01-18-11-56-10-339.png, image-2019-01-18-11-56-18-040.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> 1) While running tests with JFR we observe huge memory allocation pressure 
> produced by:
> *Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)*
> java.util.HashMap.newNode(int, Object, Object, HashMap$Node) 481 298 044 784 
> 100
>  java.util.HashMap.putVal(int, Object, Object, boolean, boolean) 481 298 044 
> 784 100
>  java.util.HashMap.put(Object, Object) 481 298 044 784 100
>  java.util.HashSet.add(Object) 480 297 221 040 99,724
>  
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.initPrimaryBackupMaps()
>  1 823 744 0,276
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.<init>(AffinityTopologyVersion,
>  List, List) 1 823 744 0,276
> *Allocation stats*
> Class Average Object Size(bytes) Total Object Size(bytes) TLABs Average TLAB 
> Size(bytes) Total TLAB Size(bytes) Pressure(%)
>  java.util.HashMap$Node 32 15 392 481 619 635,726 298 044 784 32,876
>  java.lang.Object[] 1 470,115 461 616 314 655 019,236 205 676 040 22,687
>  java.util.HashMap$Node[] 41 268,617 6 149 024 149 690 046,067 102 816 864 
> 11,341
>  java.lang.Integer 16 1 456 91 662 911,385 60 324 936 6,654
>  java.util.ArrayList 24 1 608 67 703 389,97 47 127 128 5,198
> 2) Also another hot place found
> Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)
>  java.util.ArrayList.grow(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureExplicitCapacity(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureCapacityInternal(int) 7 5 766 448 9,554
>  java.util.ArrayList.add(Object) 7 5 766 448 9,554
>  
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.nodes(int,
>  AffinityTopologyVersion, GridDhtPartitionState, GridDhtPartitionState[]) 7 5 
> 766 448 9,554
> The reason for that is detailed above.
> I think we need to improve memory efficiency by switching from Sets to 
> BitSets
>  
> JFR attached, see Allocations in 12:50:28 - 12:50:29
>  
>  
>  
>  
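The second hot spot above is `ArrayList.grow()` churn inside `GridDhtPartitionTopologyImpl.nodes()`. A minimal sketch of the usual fix (hypothetical signature, not the real Ignite method): presize the result list so `add()` never triggers `ensureCapacity`/`grow` and the associated `Object[]` copies.

```java
import java.util.ArrayList;
import java.util.List;

public class PresizedList {
    // Hypothetical sketch: size the result list up front from the known
    // input sizes, so appending never reallocates the backing Object[].
    static List<String> nodes(List<String> affNodes, List<String> extra) {
        List<String> res = new ArrayList<>(affNodes.size() + extra.size());
        res.addAll(affNodes);
        res.addAll(extra);
        return res;
    }

    public static void main(String[] args) {
        System.out.println(nodes(List.of("n1", "n2"), List.of("n3"))); // [n1, n2, n3]
    }
}
```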



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-18 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10877:

Attachment: image-2019-01-18-12-09-04-835.png

> GridAffinityAssignment.initPrimaryBackupMaps memory pressure
> 
>
> Key: IGNITE-10877
> URL: https://issues.apache.org/jira/browse/IGNITE-10877
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr, 
> image-2019-01-17-12-58-07-382.png, image-2019-01-17-12-59-52-137.png, 
> image-2019-01-17-15-45-49-561.png, image-2019-01-17-15-45-53-043.png, 
> image-2019-01-17-15-46-32-872.png, image-2019-01-18-11-36-57-451.png, 
> image-2019-01-18-11-38-39-410.png, image-2019-01-18-11-55-39-496.png, 
> image-2019-01-18-11-56-10-339.png, image-2019-01-18-11-56-18-040.png, 
> image-2019-01-18-12-09-04-835.png, image-2019-01-18-12-09-32-876.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-18 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10877:

Attachment: image-2019-01-18-12-09-32-876.png

> GridAffinityAssignment.initPrimaryBackupMaps memory pressure
> 
>
> Key: IGNITE-10877
> URL: https://issues.apache.org/jira/browse/IGNITE-10877
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr, 
> image-2019-01-17-12-58-07-382.png, image-2019-01-17-12-59-52-137.png, 
> image-2019-01-17-15-45-49-561.png, image-2019-01-17-15-45-53-043.png, 
> image-2019-01-17-15-46-32-872.png, image-2019-01-18-11-36-57-451.png, 
> image-2019-01-18-11-38-39-410.png, image-2019-01-18-11-55-39-496.png, 
> image-2019-01-18-11-56-10-339.png, image-2019-01-18-11-56-18-040.png, 
> image-2019-01-18-12-09-04-835.png, image-2019-01-18-12-09-32-876.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-18 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746083#comment-16746083
 ] 

Pavel Voronkin commented on IGNITE-10877:
-

16 parts, 16 nodes

HashSet

!image-2019-01-18-12-09-32-876.png!

BitSet

!image-2019-01-18-12-09-04-835.png!

 

In total we have:

N - number of nodes

P - number of parts

low P, low N  - BitSet better

high P, low N - BitSet better

low P, high N - BitSet slightly better

high P, high N - HashSet is better

I suggest using a threshold of 500 nodes.
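The threshold idea above can be sketched as a small factory that picks the set representation from the cluster size (the constant 500 is the value suggested in this comment; the method name is hypothetical):

```java
import java.util.BitSet;
import java.util.HashSet;

public class NodeSetFactory {
    // Assumption: 500 is the node-count threshold proposed above. Below
    // it a BitSet over node indices is at least as cheap; above it, with
    // only a few owners per partition, a HashSet allocates less.
    static final int BITSET_THRESHOLD = 500;

    static Object newNodeSet(int totalNodes) {
        return totalNodes <= BITSET_THRESHOLD
            ? new BitSet(totalNodes)
            : new HashSet<Integer>();
    }

    public static void main(String[] args) {
        System.out.println(newNodeSet(16).getClass().getSimpleName());   // BitSet
        System.out.println(newNodeSet(4096).getClass().getSimpleName()); // HashSet
    }
}
```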

 

 

> GridAffinityAssignment.initPrimaryBackupMaps memory pressure
> 
>
> Key: IGNITE-10877
> URL: https://issues.apache.org/jira/browse/IGNITE-10877
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr, 
> image-2019-01-17-12-58-07-382.png, image-2019-01-17-12-59-52-137.png, 
> image-2019-01-17-15-45-49-561.png, image-2019-01-17-15-45-53-043.png, 
> image-2019-01-17-15-46-32-872.png, image-2019-01-18-11-36-57-451.png, 
> image-2019-01-18-11-38-39-410.png, image-2019-01-18-11-55-39-496.png, 
> image-2019-01-18-11-56-10-339.png, image-2019-01-18-11-56-18-040.png, 
> image-2019-01-18-12-09-04-835.png, image-2019-01-18-12-09-32-876.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-18 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746083#comment-16746083
 ] 

Pavel Voronkin edited comment on IGNITE-10877 at 1/18/19 9:34 AM:
--

16 parts, 16 nodes

HashSet

!image-2019-01-18-12-09-32-876.png!

BitSet

!image-2019-01-18-12-09-04-835.png!

 

In total we have:

N - number of nodes

P - number of parts

low P, low N  - BitSet better

high P, low N - BitSet better

low P, high N - BitSet slightly better

high P, high N - HashSet is better

At more than 500 nodes we need a compacted BitSet, see

 

 


was (Author: voropava):
16part, 16 nodes

HashSet

!image-2019-01-18-12-09-32-876.png!

BitSet

!image-2019-01-18-12-09-04-835.png!

 

In total we have:

N - number of nodes

P - number of parts

low P, low N  - BitSet better

high P, low N - BitSet better

low P, high N - BitSet slightly better

high P, high N - HashSet is better

I suggest to have threshold of 500.

 

 

> GridAffinityAssignment.initPrimaryBackupMaps memory pressure
> 
>
> Key: IGNITE-10877
> URL: https://issues.apache.org/jira/browse/IGNITE-10877
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr, 
> image-2019-01-17-12-58-07-382.png, image-2019-01-17-12-59-52-137.png, 
> image-2019-01-17-15-45-49-561.png, image-2019-01-17-15-45-53-043.png, 
> image-2019-01-17-15-46-32-872.png, image-2019-01-18-11-36-57-451.png, 
> image-2019-01-18-11-38-39-410.png, image-2019-01-18-11-55-39-496.png, 
> image-2019-01-18-11-56-10-339.png, image-2019-01-18-11-56-18-040.png, 
> image-2019-01-18-12-09-04-835.png, image-2019-01-18-12-09-32-876.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-18 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10877:

Attachment: image-2019-01-18-11-36-57-451.png

> GridAffinityAssignment.initPrimaryBackupMaps memory pressure
> 
>
> Key: IGNITE-10877
> URL: https://issues.apache.org/jira/browse/IGNITE-10877
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr, 
> image-2019-01-17-12-58-07-382.png, image-2019-01-17-12-59-52-137.png, 
> image-2019-01-17-15-45-49-561.png, image-2019-01-17-15-45-53-043.png, 
> image-2019-01-17-15-46-32-872.png, image-2019-01-18-11-36-57-451.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-18 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10877:

Attachment: image-2019-01-18-11-38-39-410.png

> GridAffinityAssignment.initPrimaryBackupMaps memory pressure
> 
>
> Key: IGNITE-10877
> URL: https://issues.apache.org/jira/browse/IGNITE-10877
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr, 
> image-2019-01-17-12-58-07-382.png, image-2019-01-17-12-59-52-137.png, 
> image-2019-01-17-15-45-49-561.png, image-2019-01-17-15-45-53-043.png, 
> image-2019-01-17-15-46-32-872.png, image-2019-01-18-11-36-57-451.png, 
> image-2019-01-18-11-38-39-410.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-18 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745911#comment-16745911
 ] 

Pavel Voronkin commented on IGNITE-10877:
-

64k parts, 4k nodes 

HashSets

!image-2019-01-18-11-38-39-410.png!

BitSets

!image-2019-01-18-11-36-57-451.png!

 

We see that HashSet is the clear winner at a high number of cluster nodes 
combined with a high number of partitions.
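A rough back-of-envelope model of why the winner flips (illustrative constants, not measurements: ~32 bytes per `HashMap$Node` as in the allocation stats above, 3 owners per partition for 1 primary + 2 backups): a BitSet costs a fixed N/8 bytes per partition regardless of occupancy, while a HashSet pays only per stored owner.

```java
public class SetMemoryModel {
    // BitSet: fixed ceil(nodes/8) bytes per partition, occupied or not.
    static long bitSetBytes(int parts, int nodes) {
        return (long) parts * ((nodes + 7) / 8);
    }

    // HashSet: ~32 bytes per stored entry (HashMap$Node), owners only.
    static long hashSetBytes(int parts, int ownersPerPart) {
        return (long) parts * ownersPerPart * 32;
    }

    public static void main(String[] args) {
        // 1024 parts, 16 nodes: BitSet wins (2048 vs 98304 bytes).
        System.out.println(bitSetBytes(1024, 16));    // 2048
        System.out.println(hashSetBytes(1024, 3));    // 98304
        // 64k parts, 4k nodes: HashSet wins (33554432 vs 6291456 bytes).
        System.out.println(bitSetBytes(65536, 4096)); // 33554432
        System.out.println(hashSetBytes(65536, 3));   // 6291456
    }
}
```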

 

> GridAffinityAssignment.initPrimaryBackupMaps memory pressure
> 
>
> Key: IGNITE-10877
> URL: https://issues.apache.org/jira/browse/IGNITE-10877
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr, 
> image-2019-01-17-12-58-07-382.png, image-2019-01-17-12-59-52-137.png, 
> image-2019-01-17-15-45-49-561.png, image-2019-01-17-15-45-53-043.png, 
> image-2019-01-17-15-46-32-872.png, image-2019-01-18-11-36-57-451.png, 
> image-2019-01-18-11-38-39-410.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-22 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16749517#comment-16749517
 ] 

Pavel Voronkin commented on IGNITE-10877:
-

Thanks for your feedback [~ascherbakov], I've resolved your comments.

> GridAffinityAssignment.initPrimaryBackupMaps memory pressure
> 
>
> Key: IGNITE-10877
> URL: https://issues.apache.org/jira/browse/IGNITE-10877
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Fix For: 2.8
>
> Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr, 
> image-2019-01-17-12-58-07-382.png, image-2019-01-17-12-59-52-137.png, 
> image-2019-01-17-15-45-49-561.png, image-2019-01-17-15-45-53-043.png, 
> image-2019-01-17-15-46-32-872.png, image-2019-01-18-11-36-57-451.png, 
> image-2019-01-18-11-38-39-410.png, image-2019-01-18-11-55-39-496.png, 
> image-2019-01-18-11-56-10-339.png, image-2019-01-18-11-56-18-040.png, 
> image-2019-01-18-12-09-04-835.png, image-2019-01-18-12-09-32-876.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-10644) CorruptedTreeException might occur after force node kill during transaction

2018-12-11 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10644:

Description: 
Partition eviction process on the other hand:

 

2018-12-10 20:59:24.426 
[ERROR]sys-#204%_GRID%GridNodeName%[o.a.i.i.p.c.d.d.t.PartitionsEvictManager] 
Partition eviction failed, this can cause grid hang.
org.h2.message.DbException: General error: "class 
org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
 Runtime failure on search row: Row@3580787f[ key: 4071535538120363041, val: 
X.common.dpl.model.backstream.DBackStreamMessage_DPL_PROXY [idHash=1961442513, 
hash=529139710, colocationKey=14465, entityType=I, 
lastChangeDate=1544464745135, errorMessage=No api 
[X.scripts.ucp.retail.propagate.publicapi.ClientPropagateService] services 
available for route: [*][*][kbt] (zone-node-module).IP: [*]. 
List of services violations:
NODE MODULE FILTER VIOLATIONS 
No services or violations were found for routing, partition_DPL_id=5, 
messageId=1211871172446406939, entityId=1211871174131851324, ownerId=ucp, 
responseDate=null, entityVersion=1, isDeleted=false, requestDate=Mon Dec 10 
20:59:05 MSK 2018, id=4071535538120363041], ver: GridCacheVersion 
[topVer=155940834, order=1544596983071, nodeOrder=114] ][ I, null, 
1211871172446406939, 1211871174131851324, null, 1, 2018-12-10 20:59:05.115, No 
api [X.scripts.ucp.retail.propagate.publicapi.ClientPropagateService] services 
available for route: [*][*][kbt] (zone-node-module).IP: [*]. 
List of services violations:
NODE MODULE FILTER VIOLATIONS 
No services or violations were found for routing, 4071535538120363041, FALSE, 5 
]" [5-195]
at org.h2.message.DbException.get(DbException.java:168)
at org.h2.message.DbException.convert(DbException.java:295)
at 
org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.removex(H2TreeIndex.java:293)
at 
org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.remove(GridH2Table.java:515)
at 
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.remove(IgniteH2Indexing.java:738)
at 
org.apache.ignite.internal.processors.query.GridQueryProcessor.remove(GridQueryProcessor.java:2487)
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.remove(GridCacheQueryManager.java:433)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishRemove(IgniteCacheOffheapManagerImpl.java:1465)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.remove(IgniteCacheOffheapManagerImpl.java:1435)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.remove(GridCacheOffheapManager.java:1633)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.remove(IgniteCacheOffheapManagerImpl.java:383)
at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.removeValue(GridCacheMapEntry.java:3706)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.clearInternal(GridDhtCacheEntry.java:652)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:1079)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.tryClear(GridDhtLocalPartition.java:915)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.topology.PartitionsEvictManager$PartitionEvictionTask.run(PartitionsEvictManager.java:423)
at 
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6782)
at 
org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.h2.jdbc.JdbcSQLException: General error: "class 
org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
 Runtime failure on search row: Row@3580787f[ key: 4071535538120363041, val: 
X.common.dpl.model.backstream.DBackStreamMessage_DPL_PROXY [idHash=1961442513, 
hash=529139710, colocationKey=14465, entityType=I, 
lastChangeDate=1544464745135, errorMessage=No api 
[X.scripts.ucp.retail.propagate.publicapi.ClientPropagateService] services 
available for route: [*][*][kbt] (zone-node-module).IP: [*]. 
List of services violations:
NODE MODULE FILTER VIOLATIONS 
No services or violations were found for routing, partition_DPL_id=5, 
messageId=1211871172446406939, entityId=1211871174131851324, ownerId=ucp, 
responseDate=null, entityVersion=1, isDeleted=false, requestDate=Mon Dec 10 
20:59:05 MSK 2018, id=4071535538120363041], ver: 

[jira] [Updated] (IGNITE-10644) CorruptedTreeException might occur after force node kill during transaction

2018-12-11 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10644:

Description: 
Partition eviction process on the other hand:

 

2018-12-10 20:59:24.426 
[ERROR]sys-#204%_GRID%GridNodeName%[o.a.i.i.p.c.d.d.t.PartitionsEvictManager] 
Partition eviction failed, this can cause grid hang.
org.h2.message.DbException: General error: "class 
org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
 Runtime failure on search row: Row@3580787f[ key: 4071535538120363041, val: 
X.common.dpl.model.backstream.DBackStreamMessage_DPL_PROXY [idHash=1961442513, 
hash=529139710, colocationKey=14465, entityType=I, 
lastChangeDate=1544464745135, errorMessage=No api 
[X.scripts.ucp.retail.propagate.publicapi.ClientPropagateService] services 
available for route: [*][*][kbt] (zone-node-module).IP: [*]. 
List of services violations:
NODE MODULE FILTER VIOLATIONS 
No services or violations were found for routing, partition_X_id=5, 
messageId=1211871172446406939, entityId=1211871174131851324, ownerId=ucp, 
responseDate=null, entityVersion=1, isDeleted=false, requestDate=Mon Dec 10 
20:59:05 MSK 2018, id=4071535538120363041], ver: GridCacheVersion 
[topVer=155940834, order=1544596983071, nodeOrder=114] ][ I, null, 
1211871172446406939, 1211871174131851324, null, 1, 2018-12-10 20:59:05.115, No 
api [X.scripts.ucp.retail.propagate.publicapi.ClientPropagateService] services 
available for route: [*][*][kbt] (zone-node-module).IP: [*]. 
List of services violations:
NODE MODULE FILTER VIOLATIONS 
No services or violations were found for routing, 4071535538120363041, FALSE, 5 
]" [5-195]
at org.h2.message.DbException.get(DbException.java:168)
at org.h2.message.DbException.convert(DbException.java:295)
at 
org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.removex(H2TreeIndex.java:293)
at 
org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.remove(GridH2Table.java:515)
at 
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.remove(IgniteH2Indexing.java:738)
at 
org.apache.ignite.internal.processors.query.GridQueryProcessor.remove(GridQueryProcessor.java:2487)
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.remove(GridCacheQueryManager.java:433)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishRemove(IgniteCacheOffheapManagerImpl.java:1465)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.remove(IgniteCacheOffheapManagerImpl.java:1435)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.remove(GridCacheOffheapManager.java:1633)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.remove(IgniteCacheOffheapManagerImpl.java:383)
at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.removeValue(GridCacheMapEntry.java:3706)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.clearInternal(GridDhtCacheEntry.java:652)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:1079)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.tryClear(GridDhtLocalPartition.java:915)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.topology.PartitionsEvictManager$PartitionEvictionTask.run(PartitionsEvictManager.java:423)
at 
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6782)
at 
org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.h2.jdbc.JdbcSQLException: General error: "class 
org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
 Runtime failure on search row: Row@3580787f[ key: 4071535538120363041, val: 
X.common.X.model.backstream.DBackStreamMessage_X_PROXY [idHash=1961442513, 
hash=529139710, colocationKey=14465, entityType=I, 
lastChangeDate=1544464745135, errorMessage=No api 
[X.scripts.ucp.retail.propagate.publicapi.ClientPropagateService] services 
available for route: [*][*][kbt] (zone-node-module).IP: [*]. 
List of services violations:
NODE MODULE FILTER VIOLATIONS 
No services or violations were found for routing, partition_X_id=5, 
messageId=1211871172446406939, entityId=1211871174131851324, ownerId=ucp, 
responseDate=null, entityVersion=1, isDeleted=false, requestDate=Mon Dec 10 
20:59:05 MSK 2018, id=4071535538120363041], ver: GridCacheVersion 

[jira] [Updated] (IGNITE-10644) CorruptedTreeException might occur after force node kill during transaction

2018-12-11 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10644:

Description: 
Partition eviction process on the other hand:

 

2018-12-10 20:59:24.426 
[ERROR][sys-#204%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.t.PartitionsEvictManager]
 Partition eviction failed, this can cause grid hang.
org.h2.message.DbException: General error: "class 
org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
 Runtime failure on search row: Row@3580787f[ key: 4071535538120363041, val: 
com.sbt.bm.ucp.common.dpl.model.backstream.DBackStreamMessage_DPL_PROXY 
[idHash=1961442513, hash=529139710, colocationKey=14465, entityType=I, 
lastChangeDate=1544464745135, errorMessage=No api 
[ru.sbt.integration.orchestration.scripts.ucp.retail.propagate.publicapi.ClientPropagateService]
 services available for route: [*]-[*]-[kbt] (zone-node-module).IP: [*]. 
 List of services violations:
NODE MODULE FILTER VIOLATIONS 
No services or violations were found for routing, partition_DPL_id=5, 
messageId=1211871172446406939, entityId=1211871174131851324, ownerId=ucp, 
responseDate=null, entityVersion=1, isDeleted=false, requestDate=Mon Dec 10 
20:59:05 MSK 2018, id=4071535538120363041], ver: GridCacheVersion 
[topVer=155940834, order=1544596983071, nodeOrder=114] ][ I, null, 
1211871172446406939, 1211871174131851324, null, 1, 2018-12-10 20:59:05.115, No 
api 
[ru.sbt.integration.orchestration.scripts.ucp.retail.propagate.publicapi.ClientPropagateService]
 services available for route: [*]-[*]-[kbt] (zone-node-module).IP: [*]. 
 List of services violations:
NODE MODULE FILTER VIOLATIONS 
No services or violations were found for routing, 4071535538120363041, FALSE, 5 
]" [5-195]
 at org.h2.message.DbException.get(DbException.java:168)
 at org.h2.message.DbException.convert(DbException.java:295)
 at 
org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.removex(H2TreeIndex.java:293)
 at 
org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.remove(GridH2Table.java:515)
 at 
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.remove(IgniteH2Indexing.java:738)
 at 
org.apache.ignite.internal.processors.query.GridQueryProcessor.remove(GridQueryProcessor.java:2487)
 at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.remove(GridCacheQueryManager.java:433)
 at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishRemove(IgniteCacheOffheapManagerImpl.java:1465)
 at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.remove(IgniteCacheOffheapManagerImpl.java:1435)
 at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.remove(GridCacheOffheapManager.java:1633)
 at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.remove(IgniteCacheOffheapManagerImpl.java:383)
 at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.removeValue(GridCacheMapEntry.java:3706)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.clearInternal(GridDhtCacheEntry.java:652)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:1079)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.tryClear(GridDhtLocalPartition.java:915)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.topology.PartitionsEvictManager$PartitionEvictionTask.run(PartitionsEvictManager.java:423)
 at 
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6782)
 at 
org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827)
 at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
Caused by: org.h2.jdbc.JdbcSQLException: General error: "class 
org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
 Runtime failure on search row: Row@3580787f[ key: 4071535538120363041, val: 
com.sbt.bm.ucp.common.dpl.model.backstream.DBackStreamMessage_DPL_PROXY 
[idHash=1961442513, hash=529139710, colocationKey=14465, entityType=I, 
lastChangeDate=1544464745135, errorMessage=No api 
[ru.sbt.integration.orchestration.scripts.ucp.retail.propagate.publicapi.ClientPropagateService]
 services available for route: [*]-[*]-[kbt] (zone-node-module).IP: [*]. 
 List of services violations:
NODE MODULE FILTER VIOLATIONS 
No services or violations were found for routing, partition_DPL_id=5, 
messageId=1211871172446406939, 

[jira] [Updated] (IGNITE-10671) Double initialization of segmentAware and FileArchiver lead to race breaking file compression.

2018-12-13 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10671:

Attachment: WalCompactionSwitchOverTest.java

> Double initialization of segmentAware and FileArchiver lead to race breaking 
> file compression.
> --
>
> Key: IGNITE-10671
> URL: https://issues.apache.org/jira/browse/IGNITE-10671
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Critical
> Attachments: WalCompactionSwitchOverTest.java
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10671) Double initialization of segmentAware and FileArchiver lead to race breaking file compression.

2018-12-13 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-10671:
---

 Summary: Double initialization of segmentAware and FileArchiver 
lead to race breaking file compression.
 Key: IGNITE-10671
 URL: https://issues.apache.org/jira/browse/IGNITE-10671
 Project: Ignite
  Issue Type: Bug
Reporter: Pavel Voronkin
 Attachments: WalCompactionSwitchOverTest.java







[jira] [Updated] (IGNITE-10671) Double initialization of segmentAware and FileArchiver lead to race breaking file compression.

2018-12-13 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10671:

Description: 
Race is painful when you switch your cluster from walCompaction=false to 
walCompaction=true.

The same FileCompressor instance will use different segmentAware instances 
because start0() is called twice, which leads to inconsistent behaviour and 
errors during compaction: basically, we will try to archive files twice, concurrently.
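The race can be avoided by making start0() idempotent. A minimal sketch of that guard follows; the class and field names here are hypothetical stand-ins, not Ignite's actual implementation, and the "initialization" is reduced to a counter:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: guard start0() so a second call cannot re-create
// the segmentAware state that a FileCompressor already captured.
public class IdempotentStarter {
    private final AtomicBoolean started = new AtomicBoolean(false);

    // Visible for testing: counts how many times initialization really ran.
    final AtomicInteger initCount = new AtomicInteger();

    private Object segmentAware;

    public void start0() {
        // Only the first caller wins; later calls observe the same state
        // instead of re-initializing it.
        if (!started.compareAndSet(false, true))
            return;

        segmentAware = new Object(); // stands in for real initialization
        initCount.incrementAndGet();
    }

    public static void main(String[] args) {
        IdempotentStarter s = new IdempotentStarter();
        s.start0();
        s.start0(); // second call is a no-op

        System.out.println("initCount=" + s.initCount.get());
    }
}
```

With this guard, concurrent or repeated start0() calls all share one segmentAware, so the compressor and archiver cannot disagree about which files to process.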

> Double initialization of segmentAware and FileArchiver lead to race breaking 
> file compression.
> --
>
> Key: IGNITE-10671
> URL: https://issues.apache.org/jira/browse/IGNITE-10671
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Critical
> Attachments: WalCompactionSwitchOverTest.java
>
>
> Race is painful when you switch your cluster from walCompaction=false to 
> walCompaction=true.
> The same FileCompressor instance will use different segmentAware instances 
> because start0() is called twice, which leads to inconsistent behaviour and 
> errors during compaction: basically, we will try to archive files twice, concurrently.





[jira] [Updated] (IGNITE-10324) Disallow fallback to Scanner in control.sh when asking password

2018-12-20 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10324:

Ignite Flags:   (was: Docs Required)

> Disallow fallback to Scanner in control.sh when asking password
> ---
>
> Key: IGNITE-10324
> URL: https://issues.apache.org/jira/browse/IGNITE-10324
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Alexand Polyakov
>Priority: Major
>
> After implementing IGNITE-9990 we can still fall back to Scanner when the 
> Console is not available; a user can easily fall back to non-secure mode by 
> using some java agent. We should not allow this, because otherwise all the 
> effort in IGNITE-9990 is useless.
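The intended fail-fast behaviour can be sketched as follows; the method and class names are hypothetical (Ignite's actual control.sh code differs), and the point is only that a missing Console should abort instead of degrading to a Scanner over System.in, which would echo the password:

```java
import java.io.Console;

// Hypothetical sketch: accept a password only from an interactive Console,
// refusing any fallback to Scanner-based (echoed) input.
public final class SecurePasswordReader {
    static char[] readPassword(Console console) {
        if (console == null)
            throw new IllegalStateException(
                "No interactive console; refusing insecure password input");

        // readPassword suppresses echo, unlike Scanner over System.in.
        return console.readPassword("Password: ");
    }

    public static void main(String[] args) {
        try {
            // Passing null simulates running without an attached terminal.
            readPassword(null);
            System.out.println("unexpected: no exception");
        }
        catch (IllegalStateException e) {
            System.out.println("refused insecure input: " + e.getMessage());
        }
    }
}
```

System.console() returns null whenever the JVM has no attached terminal (e.g. under a java agent redirecting stdin), which is exactly the case the issue asks to reject.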





[jira] [Closed] (IGNITE-10324) Disallow fallback to Scanner in control.sh when asking password

2018-12-20 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin closed IGNITE-10324.
---

> Disallow fallback to Scanner in control.sh when asking password
> ---
>
> Key: IGNITE-10324
> URL: https://issues.apache.org/jira/browse/IGNITE-10324
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Alexand Polyakov
>Priority: Major
>
> After implementing IGNITE-9990 we can still fall back to Scanner when the 
> Console is not available; a user can easily fall back to non-secure mode by 
> using some java agent. We should not allow this, because otherwise all the 
> effort in IGNITE-9990 is useless.





[jira] [Commented] (IGNITE-10671) Double initialization of segmentAware and FileArchiver lead to race breaking file compression.

2018-12-20 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725886#comment-16725886
 ] 

Pavel Voronkin commented on IGNITE-10671:
-

[~akalashnikov] Thanks, I've addressed your comments.

> Double initialization of segmentAware and FileArchiver lead to race breaking 
> file compression.
> --
>
> Key: IGNITE-10671
> URL: https://issues.apache.org/jira/browse/IGNITE-10671
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Critical
> Attachments: WalCompactionSwitchOverTest.java
>
>
> Race is painful when you switch your cluster from walCompaction=false to 
> walCompaction=true.
> The same FileCompressor instance will use different segmentAware instances 
> because start0() is called twice, which leads to inconsistent behaviour and 
> errors during compaction: basically, we will try to archive files twice, concurrently.
> See reproducer in attachment.





[jira] [Resolved] (IGNITE-9618) Need to be replace the data compression algorithm

2018-12-20 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin resolved IGNITE-9618.

Resolution: Later

> Need to be replace the data compression algorithm
> -
>
> Key: IGNITE-9618
> URL: https://issues.apache.org/jira/browse/IGNITE-9618
> Project: Ignite
>  Issue Type: New Feature
>  Components: persistence
>Reporter: Alexand Polyakov
>Assignee: Alexand Polyakov
>Priority: Major
>
> Currently zip is used, and it is slow.
> Alternatives exist, and in performance tests they showed themselves to be 
> better.
> source file wal 1Gb
> result
> ||algorithm||time, ms||size, bytes||
> |zip|18 889|79 950 283|
> |[Snappy|https://github.com/xerial/snappy-java]|3 372|156 482 623|
> |[lz4|https://github.com/lz4/lz4-java]|2 047|128 591 795|
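The zip measurement above can be reproduced in spirit with the JDK's java.util.zip alone (Snappy and LZ4 need the third-party jars linked in the table). The sketch below only illustrates the measurement approach, a timed compress plus a round-trip check, not the exact benchmark the reporter ran:

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflateBench {
    public static void main(String[] args) throws Exception {
        byte[] src = new byte[1 << 20];        // 1 MiB of test data
        new Random(42).nextBytes(src);
        Arrays.fill(src, 0, src.length / 2, (byte) 0); // half zeros: compressible

        long t0 = System.nanoTime();
        byte[] packed = deflate(src);
        long ms = (System.nanoTime() - t0) / 1_000_000;

        byte[] restored = inflate(packed, src.length);
        if (!Arrays.equals(src, restored))
            throw new AssertionError("round-trip mismatch");

        System.out.println("compressed " + src.length + " -> " + packed.length
            + " bytes in " + ms + " ms");
    }

    static byte[] deflate(byte[] src) {
        Deflater d = new Deflater(Deflater.BEST_SPEED);
        d.setInput(src);
        d.finish();

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];

        while (!d.finished())
            out.write(buf, 0, d.deflate(buf));

        d.end();
        return out.toByteArray();
    }

    static byte[] inflate(byte[] packed, int origLen) throws Exception {
        Inflater inf = new Inflater();
        inf.setInput(packed);

        byte[] res = new byte[origLen];
        int off = 0;

        while (off < origLen)
            off += inf.inflate(res, off, origLen - off);

        inf.end();
        return res;
    }
}
```

Swapping deflate() for a Snappy or LZ4 codec in the same harness gives the apples-to-apples comparison shown in the table.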





[jira] [Commented] (IGNITE-10671) Double initialization of segmentAware and FileArchiver lead to race breaking file compression.

2018-12-21 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726537#comment-16726537
 ] 

Pavel Voronkin commented on IGNITE-10671:
-

There is an error with the Ignite bot; I can't paste the visa: 
[https://mtcga.gridgain.com/pr.html?serverId=apache=IgniteTests24Java8_RunAll==pull/5665/head=Latest.]
Can you please check? I've rerun the blockers.

> Double initialization of segmentAware and FileArchiver lead to race breaking 
> file compression.
> --
>
> Key: IGNITE-10671
> URL: https://issues.apache.org/jira/browse/IGNITE-10671
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Critical
> Attachments: WalCompactionSwitchOverTest.java
>
>
> Race is painful when you switch your cluster from walCompaction=false to 
> walCompaction=true.
> The same FileCompressor instance will use different segmentAware instances 
> because start0() is called twice, which leads to inconsistent behaviour and 
> errors during compaction: basically, we will try to archive files twice, concurrently.
> See reproducer in attachment.





[jira] [Commented] (IGNITE-10648) Ignite hang to stop if node wasn't started completely. GridTcpRestNioListener hangs on latch.

2018-12-21 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726670#comment-16726670
 ] 

Pavel Voronkin commented on IGNITE-10648:
-

[~v.pyatkov] Looks good to me.

> Ignite hang to stop if node wasn't started completely. GridTcpRestNioListener 
> hangs on latch.
> -
>
> Key: IGNITE-10648
> URL: https://issues.apache.org/jira/browse/IGNITE-10648
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Vladislav Pyatkov
>Priority: Major
>
> If Ignition.start waits on rebalance, GridRestProcessor is not started yet; 
> then we call Ignition.stop and 
> GridTcpRestNioListener hangs on 
> if (marshMapLatch.getCount() > 0)
>  U.awaitQuiet(marshMapLatch);
> because the latch wasn't counted down on start.





[jira] [Created] (IGNITE-10648) Ignite hang to stop if node wasn't started completely. GridTcpRestNioListener hangs on latch.

2018-12-11 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-10648:
---

 Summary: Ignite hang to stop if node wasn't started completely. 
GridTcpRestNioListener hangs on latch.
 Key: IGNITE-10648
 URL: https://issues.apache.org/jira/browse/IGNITE-10648
 Project: Ignite
  Issue Type: Bug
Reporter: Pavel Voronkin








[jira] [Updated] (IGNITE-10648) Ignite hang to stop if node wasn't started completely. GridTcpRestNioListener hangs on latch.

2018-12-11 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10648:

Description: 
If Ignition.start waits on rebalance, GridRestProcessor is not started yet; 
then we call Ignition.stop and 

GridTcpRestNioListener hangs on 

if (marshMapLatch.getCount() > 0)
 U.awaitQuiet(marshMapLatch);

because the latch wasn't counted down on start.
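The hang can be modelled in miniature with a plain CountDownLatch: a listener awaits a latch that an interrupted start never counted down. The fix sketched below, releasing the latch from stop(), is an assumption about the intended behaviour; the names are illustrative, not Ignite's actual code:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Minimal model of the hang: awaiting a latch that start never counted
// down blocks forever unless stop itself releases it.
public class LatchStopDemo {
    private final CountDownLatch marshMapLatch = new CountDownLatch(1);

    // What the listener effectively does; a bounded await stands in for
    // U.awaitQuiet so the demo cannot itself hang.
    boolean awaitMarshallers(long timeout, TimeUnit unit) throws InterruptedException {
        if (marshMapLatch.getCount() > 0)
            return marshMapLatch.await(timeout, unit);

        return true;
    }

    // Sketch of the fix: stop() releases the latch even if start never
    // completed, so any pending await returns.
    void stop() {
        marshMapLatch.countDown();
    }

    public static void main(String[] args) throws Exception {
        LatchStopDemo demo = new LatchStopDemo();

        // Without stop(), the await times out: the hang in miniature.
        boolean beforeStop = demo.awaitMarshallers(100, TimeUnit.MILLISECONDS);

        demo.stop();
        boolean afterStop = demo.awaitMarshallers(100, TimeUnit.MILLISECONDS);

        System.out.println("beforeStop=" + beforeStop + " afterStop=" + afterStop);
    }
}
```

Counting the latch down on the stop path (or using a bounded await) guarantees node stop cannot block indefinitely on a partially started GridRestProcessor.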

> Ignite hang to stop if node wasn't started completely. GridTcpRestNioListener 
> hangs on latch.
> -
>
> Key: IGNITE-10648
> URL: https://issues.apache.org/jira/browse/IGNITE-10648
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>
> If Ignition.start waits on rebalance, GridRestProcessor is not started 
> yet; then we call Ignition.stop and 
> GridTcpRestNioListener hangs on 
> if (marshMapLatch.getCount() > 0)
>  U.awaitQuiet(marshMapLatch);
> because the latch wasn't counted down on start.





[jira] [Created] (IGNITE-10638) Improve CacheNoAffinityExchangeTest.testNoAffinityChangeOnClientLeftWithMergedExchanges to cover persistence case

2018-12-11 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-10638:
---

 Summary: Improve 
CacheNoAffinityExchangeTest.testNoAffinityChangeOnClientLeftWithMergedExchanges 
to cover persistence case
 Key: IGNITE-10638
 URL: https://issues.apache.org/jira/browse/IGNITE-10638
 Project: Ignite
  Issue Type: Improvement
Reporter: Pavel Voronkin








[jira] [Updated] (IGNITE-10644) CorruptedTreeException might occur after force node kill during transaction

2018-12-17 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10644:

Attachment: (was: IndexingTest.java)

> CorruptedTreeException might occur after force node kill during transaction
> ---
>
> Key: IGNITE-10644
> URL: https://issues.apache.org/jira/browse/IGNITE-10644
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>
> Partition eviction process on the other hand:
>  
> 2018-12-10 20:59:24.426 
> [ERROR]sys-#204%_GRID%GridNodeName%[o.a.i.i.p.c.d.d.t.PartitionsEvictManager] 
> Partition eviction failed, this can cause grid hang.
> org.h2.message.DbException: General error: "class 
> org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
>  Runtime failure on search row: Row@3580787f[ key: 4071535538120363041, val: 
> X.common.dpl.model.backstream.DBackStreamMessage_DPL_PROXY 
> [idHash=1961442513, hash=529139710, colocationKey=14465, entityType=I, 
> lastChangeDate=1544464745135, errorMessage=No api 
> [X.scripts.ucp.retail.propagate.publicapi.ClientPropagateService] services 
> available for route: [*][*][kbt] (zone-node-module).IP: [*]. 
> List of services violations:
> NODE MODULE FILTER VIOLATIONS 
> No services or violations were found for routing, partition_X_id=5, 
> messageId=1211871172446406939, entityId=1211871174131851324, ownerId=ucp, 
> responseDate=null, entityVersion=1, isDeleted=false, requestDate=Mon Dec 10 
> 20:59:05 MSK 2018, id=4071535538120363041], ver: GridCacheVersion 
> [topVer=155940834, order=1544596983071, nodeOrder=114] ][ I, null, 
> 1211871172446406939, 1211871174131851324, null, 1, 2018-12-10 20:59:05.115, 
> No api [X.scripts.ucp.retail.propagate.publicapi.ClientPropagateService] 
> services available for route: [*][*][kbt] (zone-node-module).IP: [*]. 
> List of services violations:
> NODE MODULE FILTER VIOLATIONS 
> No services or violations were found for routing, 4071535538120363041, FALSE, 
> 5 ]" [5-195]
> at org.h2.message.DbException.get(DbException.java:168)
> at org.h2.message.DbException.convert(DbException.java:295)
> at 
> org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.removex(H2TreeIndex.java:293)
> at 
> org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.remove(GridH2Table.java:515)
> at 
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.remove(IgniteH2Indexing.java:738)
> at 
> org.apache.ignite.internal.processors.query.GridQueryProcessor.remove(GridQueryProcessor.java:2487)
> at 
> org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.remove(GridCacheQueryManager.java:433)
> at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishRemove(IgniteCacheOffheapManagerImpl.java:1465)
> at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.remove(IgniteCacheOffheapManagerImpl.java:1435)
> at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.remove(GridCacheOffheapManager.java:1633)
> at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.remove(IgniteCacheOffheapManagerImpl.java:383)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry.removeValue(GridCacheMapEntry.java:3706)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.clearInternal(GridDhtCacheEntry.java:652)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:1079)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.tryClear(GridDhtLocalPartition.java:915)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.PartitionsEvictManager$PartitionEvictionTask.run(PartitionsEvictManager.java:423)
> at 
> org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6782)
> at 
> org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827)
> at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.h2.jdbc.JdbcSQLException: General error: "class 
> org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
>  Runtime failure on search row: Row@3580787f[ key: 4071535538120363041, val: 
> X.common.X.model.backstream.DBackStreamMessage_X_PROXY [idHash=1961442513, 
> hash=529139710, 

[jira] [Issue Comment Deleted] (IGNITE-10644) CorruptedTreeException might occur after force node kill during transaction

2018-12-17 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10644:

Comment: was deleted

(was: Reproducer attached.)

> CorruptedTreeException might occur after force node kill during transaction
> ---
>
> Key: IGNITE-10644
> URL: https://issues.apache.org/jira/browse/IGNITE-10644
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>
> Partition eviction process on the other hand:
>  
> 2018-12-10 20:59:24.426 
> [ERROR]sys-#204%_GRID%GridNodeName%[o.a.i.i.p.c.d.d.t.PartitionsEvictManager] 
> Partition eviction failed, this can cause grid hang.
> org.h2.message.DbException: General error: "class 
> org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
>  Runtime failure on search row: Row@3580787f[ key: 4071535538120363041, val: 
> X.common.dpl.model.backstream.DBackStreamMessage_DPL_PROXY 
> [idHash=1961442513, hash=529139710, colocationKey=14465, entityType=I, 
> lastChangeDate=1544464745135, errorMessage=No api 
> [X.scripts.ucp.retail.propagate.publicapi.ClientPropagateService] services 
> available for route: [*][*][kbt] (zone-node-module).IP: [*]. 
> List of services violations:
> NODE MODULE FILTER VIOLATIONS 
> No services or violations were found for routing, partition_X_id=5, 
> messageId=1211871172446406939, entityId=1211871174131851324, ownerId=ucp, 
> responseDate=null, entityVersion=1, isDeleted=false, requestDate=Mon Dec 10 
> 20:59:05 MSK 2018, id=4071535538120363041], ver: GridCacheVersion 
> [topVer=155940834, order=1544596983071, nodeOrder=114] ][ I, null, 
> 1211871172446406939, 1211871174131851324, null, 1, 2018-12-10 20:59:05.115, 
> No api [X.scripts.ucp.retail.propagate.publicapi.ClientPropagateService] 
> services available for route: [*][*][kbt] (zone-node-module).IP: [*]. 
> List of services violations:
> NODE MODULE FILTER VIOLATIONS 
> No services or violations were found for routing, 4071535538120363041, FALSE, 
> 5 ]" [5-195]
> at org.h2.message.DbException.get(DbException.java:168)
> at org.h2.message.DbException.convert(DbException.java:295)
> at 
> org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.removex(H2TreeIndex.java:293)
> at 
> org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.remove(GridH2Table.java:515)
> at 
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.remove(IgniteH2Indexing.java:738)
> at 
> org.apache.ignite.internal.processors.query.GridQueryProcessor.remove(GridQueryProcessor.java:2487)
> at 
> org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.remove(GridCacheQueryManager.java:433)
> at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishRemove(IgniteCacheOffheapManagerImpl.java:1465)
> at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.remove(IgniteCacheOffheapManagerImpl.java:1435)
> at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.remove(GridCacheOffheapManager.java:1633)
> at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.remove(IgniteCacheOffheapManagerImpl.java:383)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry.removeValue(GridCacheMapEntry.java:3706)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.clearInternal(GridDhtCacheEntry.java:652)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:1079)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.tryClear(GridDhtLocalPartition.java:915)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.PartitionsEvictManager$PartitionEvictionTask.run(PartitionsEvictManager.java:423)
> at 
> org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6782)
> at 
> org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827)
> at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.h2.jdbc.JdbcSQLException: General error: "class 
> org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
>  Runtime failure on search row: Row@3580787f[ key: 4071535538120363041, val: 
> X.common.X.model.backstream.DBackStreamMessage_X_PROXY [idHash=1961442513, 
> hash=529139710, 
