[jira] [Commented] (IGNITE-9238) Test GridTaskFailoverAffinityRunTest.testNodeRestartClient hangs when coordinator forces client to reconnect on grid startup.

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716537#comment-16716537
 ] 

ASF GitHub Bot commented on IGNITE-9238:


Github user xtern closed the pull request at:

https://github.com/apache/ignite/pull/4503


> Test GridTaskFailoverAffinityRunTest.testNodeRestartClient hangs when 
> coordinator forces client to reconnect on grid startup.
> -
>
> Key: IGNITE-9238
> URL: https://issues.apache.org/jira/browse/IGNITE-9238
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.6
>Reporter: Pavel Pereslegin
>Assignee: Pavel Pereslegin
>Priority: Major
> Fix For: 2.7
>
> Attachments: Reproducer.java
>
>
> Example of such hang on TC: 
> https://ci.ignite.apache.org/viewLog.html?buildId=1605243=buildResultsDiv=IgniteTests24Java8_ComputeGrid
> Log output:
> {noformat}
> ...
> [2018-08-07 12:20:09,804][WARN 
> ][sys-#12799%internal.GridTaskFailoverAffinityRunTest1%][GridCachePartitionExchangeManager]
>  Client node tries to connect but its exchange info is cleaned up from 
> exchange history. Consider increasing 'IGNITE_EXCHANGE_HISTORY_SIZE' property 
> or start clients in  smaller batches. Current settings and versions: 
> [IGNITE_EXCHANGE_HISTORY_SIZE=1000, initVer=AffinityTopologyVersion 
> [topVer=3, minorTopVer=0], readyVer=AffinityTopologyVersion [topVer=4, 
> minorTopVer=0]].
> [2018-08-07 12:20:09,804][INFO 
> ][exchange-worker-#12782%internal.GridTaskFailoverAffinityRunTest1%][GridDhtPartitionsExchangeFuture]
>  Completed partition exchange 
> [localNode=511d5932-5f22-4919-807d-575c7f61, 
> exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion 
> [topVer=3, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode 
> [id=6b9a7a1d-07bf-4d20-882a-8462ada3, addrs=ArrayList [127.0.0.1], 
> sockAddrs=HashSet [/127.0.0.1:47502], discPort=47502, order=3, intOrder=3, 
> lastExchangeTime=1533644409739, loc=false, ver=2.7.0#20180807-sha1:e96616f5, 
> isClient=false], done=true], topVer=AffinityTopologyVersion [topVer=4, 
> minorTopVer=0], durationFromInit=21]
> [2018-08-07 12:20:09,806][INFO 
> ][exchange-worker-#12782%internal.GridTaskFailoverAffinityRunTest1%][time] 
> Finished exchange init [topVer=AffinityTopologyVersion [topVer=3, 
> minorTopVer=0], crd=true]
> [2018-08-07 12:20:09,807][INFO 
> ][exchange-worker-#12782%internal.GridTaskFailoverAffinityRunTest1%][GridCachePartitionExchangeManager]
>  Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion 
> [topVer=4, minorTopVer=0], force=false, evt=NODE_JOINED, 
> node=6b9a7a1d-07bf-4d20-882a-8462ada3]
> [2018-08-07 12:20:09,811][INFO 
> ][sys-#12798%internal.GridTaskFailoverAffinityRunTest2%][GridDhtPartitionsExchangeFuture]
>  Finish exchange future [startVer=AffinityTopologyVersion [topVer=4, 
> minorTopVer=0], resVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], 
> err=null]
> [2018-08-07 12:20:09,813][INFO 
> ][sys-#12798%internal.GridTaskFailoverAffinityRunTest2%][GridDhtPartitionsExchangeFuture]
>  Completed partition exchange 
> [localNode=a3206c1f-6d57-4fd6-8aa5-e22f3b42, 
> exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion 
> [topVer=4, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode 
> [id=a3206c1f-6d57-4fd6-8aa5-e22f3b42, addrs=ArrayList [127.0.0.1], 
> sockAddrs=HashSet [/127.0.0.1:47503], discPort=47503, order=4, intOrder=4, 
> lastExchangeTime=1533644409779, loc=true, ver=2.7.0#20180807-sha1:e96616f5, 
> isClient=false], done=true], topVer=AffinityTopologyVersion [topVer=4, 
> minorTopVer=0], durationFromInit=41]
> [2018-08-07 12:20:09,814][INFO 
> ][grid-starter-testNodeRestartClient-1][GridTaskFailoverAffinityRunTest1] To 
> start Console Management & Monitoring run ignitevisorcmd.{sh|bat}
> [2018-08-07 12:20:09,815][INFO 
> ][grid-starter-testNodeRestartClient-1][GridTaskFailoverAffinityRunTest1] 
> [2018-08-07 12:20:09,815][INFO 
> ][grid-starter-testNodeRestartClient-1][GridTaskFailoverAffinityRunTest1] 
> >>> +---+
> >>> Ignite ver. 
> >>> 2.7.0-SNAPSHOT#20180807-sha1:e96616f580930f267eab44f75d410fa29a876bcb
> >>> +---+
> >>> OS name: Linux 4.4.0-128-generic amd64
> >>> CPU(s): 5
> >>> Heap: 2.0GB
> >>> VM name: 20126@8790182f15a5
> >>> Ignite instance name: internal.GridTaskFailoverAffinityRunTest1
> >>> Local node [ID=511D5932-5F22-4919-807D-575C7F61, order=2, 
> >>> clientMode=false]
> >>> Local node addresses: [127.0.0.1]
> >>> Local ports: TCP:10801 TCP:45821 TCP:47501 
> [2018-08-07 12:20:09,816][INFO 
> 

[jira] [Commented] (IGNITE-9238) Test GridTaskFailoverAffinityRunTest.testNodeRestartClient hangs when coordinator forces client to reconnect on grid startup.

2018-09-04 Thread Alexey Goncharuk (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602922#comment-16602922
 ] 

Alexey Goncharuk commented on IGNITE-9238:
--

[~xtern], I do not understand how {{initVer.compareTo(readyVer) < 0}} can hold 
while the exchange for the client is not completed. 
If a client joins on topology version {{initVer}}, then {{initVer <= 
readyVer}}. 
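This argument depends on how topology versions are ordered. Below is a minimal sketch, assuming versions compare by `topVer` first and by `minorTopVer` only on ties. `TopVer` is a hypothetical stand-in, not Ignite's `AffinityTopologyVersion` class:

```java
// Hypothetical, simplified stand-in for a topology version such as
// AffinityTopologyVersion [topVer=3, minorTopVer=0]. The ordering below
// (major version first, then minor version) is an assumption of this sketch.
final class TopVer implements Comparable<TopVer> {
    final long topVer;
    final int minorTopVer;

    TopVer(long topVer, int minorTopVer) {
        this.topVer = topVer;
        this.minorTopVer = minorTopVer;
    }

    @Override
    public int compareTo(TopVer other) {
        // Compare major topology versions first; minor versions only break ties.
        int cmp = Long.compare(topVer, other.topVer);
        return cmp != 0 ? cmp : Integer.compare(minorTopVer, other.minorTopVer);
    }

    public static void main(String[] args) {
        // Values from the warning in the log: initVer=[3, 0], readyVer=[4, 0].
        TopVer initVer = new TopVer(3, 0);
        TopVer readyVer = new TopVer(4, 0);

        // Per the comment, a client that joined on initVer implies initVer <= readyVer.
        System.out.println(initVer.compareTo(readyVer) <= 0); // prints "true"
    }
}
```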


[jira] [Commented] (IGNITE-9238) Test GridTaskFailoverAffinityRunTest.testNodeRestartClient hangs when coordinator forces client to reconnect on grid startup.

2018-08-24 Thread Anton Kalashnikov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591769#comment-16591769
 ] 

Anton Kalashnikov commented on IGNITE-9238:
---

Looks good to me. Test results are also good. I think it can be merged. 
[~agoncharuk], can you help with the merge?


[jira] [Commented] (IGNITE-9238) Test GridTaskFailoverAffinityRunTest.testNodeRestartClient hangs when coordinator forces client to reconnect on grid startup.

2018-08-24 Thread Pavel Pereslegin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591451#comment-16591451
 ] 

Pavel Pereslegin commented on IGNITE-9238:
--

Hello [~akalashnikov].
Take a look at this fix, please.
Test results look good.


[jira] [Commented] (IGNITE-9238) Test GridTaskFailoverAffinityRunTest.testNodeRestartClient hangs when coordinator forces client to reconnect on grid startup.

2018-08-13 Thread Pavel Pereslegin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578006#comment-16578006
 ] 

Pavel Pereslegin commented on IGNITE-9238:
--

Hello [~amashenkov],
I'm not sure how a client in server mode should properly handle 
IgniteNeedReconnectException; maybe the coordinator shouldn't send it to such 
"clients" at all.
But, as I described earlier, the main reason for the current failure is that the 
coordinator incorrectly concludes that the exchange history is missing and forces 
the client to reconnect even though the exchange history is available.




[jira] [Commented] (IGNITE-9238) Test GridTaskFailoverAffinityRunTest.testNodeRestartClient hangs when coordinator forces client to reconnect on grid startup.

2018-08-12 Thread Andrew Mashenkov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577851#comment-16577851
 ] 

Andrew Mashenkov commented on IGNITE-9238:
--

There is a known issue with clients in force server mode [1]. 
Could this be caused by an incorrect "is the node a client" check somewhere 
in the code? 

[1] https://issues.apache.org/jira/browse/IGNITE-9241


[jira] [Commented] (IGNITE-9238) Test GridTaskFailoverAffinityRunTest.testNodeRestartClient hangs when coordinator forces client to reconnect on grid startup.

2018-08-10 Thread Pavel Pereslegin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575932#comment-16575932
 ] 

Pavel Pereslegin commented on IGNITE-9238:
--

Hello [~Jokser],
please review this fix.

When the coordinator checks the exchange history, it can already see the updated 
affinity version even though the exchange future that updated this version has not 
fully completed yet.
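The race described in this comment can be illustrated with a small, self-contained model. Everything in it is hypothetical: `ExchangeHistoryModel`, its methods, and the `Decision` values are invented for illustration, and topology versions are reduced to a single `long`. The "buggy" check treats "the version is already ready but not found in history" as "evicted from history". The alternative check is one possible way to avoid the false reconnect, not necessarily what the pull request implements: it forces a reconnect only when the version is older than the eviction boundary, and otherwise waits for the exchange to finish.

```java
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical model of the coordinator's exchange-history check; not Ignite code.
// Topology versions are simplified to a single long.
final class ExchangeHistoryModel {
    enum Decision { USE_HISTORY, WAIT_FOR_EXCHANGE, FORCE_RECONNECT }

    // Versions whose exchange futures have fully completed and are still retained.
    private final NavigableSet<Long> completed = new ConcurrentSkipListSet<>();
    // Lowest version that history eviction has not discarded yet.
    private volatile long oldestRetained = 1;
    // Affinity version already published as ready.
    private volatile long readyVer;

    // Step 1 of finishing an exchange: the new affinity version becomes visible.
    void publishReadyVersion(long ver) { readyVer = ver; }

    // Step 2, which may lag behind step 1: the future is recorded in history.
    void completeExchange(long ver) { completed.add(ver); }

    // Simulates history eviction of every version below the given one.
    void evictBelow(long ver) {
        oldestRetained = ver;
        completed.removeIf(v -> v < ver);
    }

    // Buggy variant: "not in history" plus "older than ready" is read as "evicted".
    Decision checkBuggy(long initVer) {
        if (completed.contains(initVer)) return Decision.USE_HISTORY;
        return initVer < readyVer ? Decision.FORCE_RECONNECT : Decision.WAIT_FOR_EXCHANGE;
    }

    // Alternative: only versions below the eviction boundary are treated as lost.
    Decision checkFixed(long initVer) {
        if (completed.contains(initVer)) return Decision.USE_HISTORY;
        return initVer < oldestRetained ? Decision.FORCE_RECONNECT : Decision.WAIT_FOR_EXCHANGE;
    }

    public static void main(String[] args) {
        ExchangeHistoryModel crd = new ExchangeHistoryModel();
        // The window from the log: ready version is already 4, but the
        // exchange future for version 3 has not been recorded in history yet.
        crd.publishReadyVersion(4);
        System.out.println(crd.checkBuggy(3)); // FORCE_RECONNECT
        System.out.println(crd.checkFixed(3)); // WAIT_FOR_EXCHANGE
        crd.completeExchange(3);
        System.out.println(crd.checkFixed(3)); // USE_HISTORY
    }
}
```

With the ready version already at 4 and the exchange for version 3 not yet recorded, the buggy check returns `FORCE_RECONNECT`, which corresponds to the "exchange info is cleaned up from exchange history" warning in the log, while the alternative check waits for the exchange instead.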
