[jira] [Comment Edited] (IGNITE-8131) ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC

2018-07-23 Thread Sergey Chugunov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552567#comment-16552567
 ] 

Sergey Chugunov edited comment on IGNITE-8131 at 7/23/18 9:42 AM:
--

[~garus.d.g],

Changes look good to me as well. I run session expire tests locally and didn't 
manage to reproduce the issue.

Thanks for contribution!

[~dpavlov], could you please also review and merge if you're OK with the patch?


was (Author: sergey-chugunov):
[~garus.d.g],

Changes look good to me as well. I run session expire tests locally and didn't 
manage to reproduce the issue.

[~dpavlov], could you please also review and merge if you're OK with the patch?

Thanks for contribution!

> ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC
> 
>
> Key: IGNITE-8131
> URL: https://issues.apache.org/jira/browse/IGNITE-8131
> Project: Ignite
>  Issue Type: Bug
>  Components: zookeeper
>Reporter: Sergey Chugunov
>Assignee: Denis Garus
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.7
>
> Attachments: ZK_client_reconnect_failure.log, 
> ZK_client_reconnect_success.log
>
>
> Two tests always fail on TC with the assertion
> {noformat}
> junit.framework.AssertionFailedError: Failed to wait for disconnect/reconnect 
> event.
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.waitReconnectEvent(ZookeeperDiscoverySpiTest.java:4221)
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.reconnectClientNodes(ZookeeperDiscoverySpiTest.java:4183)
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.clientReconnectSessionExpire(ZookeeperDiscoverySpiTest.java:2231)
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.testClientReconnectSessionExpire1_1(ZookeeperDiscoverySpiTest.java:2206)
> {noformat}
> from client disconnect/reconnect events check. Obviously client doesn't 
> generate these events as it supposed to do.
> (TC runs can be found 
> [here|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_IgniteZooKeeperDiscovery_IgniteTests24Java8=pull%2F3730%2Fhead=buildTypeStatusDiv]).
> It is possible to reproduce test failure locally as well, but with low 
> probability: one failure for 50 or even 300 successful executions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (IGNITE-8131) ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC

2018-07-18 Thread Denis Garus (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547576#comment-16547576
 ] 

Denis Garus edited comment on IGNITE-8131 at 7/18/18 9:30 AM:
--

[~sergey-chugunov]

If we'll add field ZkRuntimeState.topVer, we must use it everywhere rather then 
ZkDiscoveryEventsData#topVer (about 20 usages), 
else it would be confusing. This would be the reason some refactorings that 
touched plenty code base.
If we'll choose this way, I think, we should assign to ZkRuntimeState.topVer 
default value 1L and add follow condition into 
ZookeeperDiscoveryImpl#processNewEvents(byte[]):
{code:java}
if (rtState.joined)
   rtState.evtsData = newEvts;
else
   rtState.topVer = 1L; //may be it will be constant
{code}
I would like offer to move the assigning of rtState.evtsData 
to ZookeeperDiscoveryImpl#processNewEvents(ZkDiscoveryEventsData) before 
onEventProcessed call, like this:
{code:java}
if (rtState.joined)
   rtState.evtsData = evtsData;

if (rtState.crd)
   handleProcessedEvents("procEvt");
else
   onEventProcessed(rtState, updateNodeInfo, evtProcessed);
{code}
It looks like small change and also fix the issue.

What do you think?


was (Author: garus.d.g):
[~sergey-chugunov]

If we'll add field ZkRuntimeState.topVer, we must use it everywhere rather then 
ZkDiscoveryEventsData#topVer (about 20 usages), 
else it would be confusing. This would be the reason some refactorings that 
touched plenty code base.
If we'll choose this way, I think, we should assign to ZkRuntimeState.topVer 
default value 1L and add follow condition into 
ZookeeperDiscoveryImpl#processNewEvents(byte[]):
{code:java}
if (rtState.joined)
   rtState.evtsData = newEvts;
else
   rtState.topVer = 1L; //may be it will be constant
{code}
I would like offer to replace the assigning of rtState.evtsData 
to ZookeeperDiscoveryImpl#processNewEvents(ZkDiscoveryEventsData) before 
onEventProcessed call, like this:
{code:java}
if (rtState.joined)
   rtState.evtsData = evtsData;

if (rtState.crd)
   handleProcessedEvents("procEvt");
else
   onEventProcessed(rtState, updateNodeInfo, evtProcessed);
{code}
It looks like small change and also fix the issue.

What do you think?

> ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC
> 
>
> Key: IGNITE-8131
> URL: https://issues.apache.org/jira/browse/IGNITE-8131
> Project: Ignite
>  Issue Type: Bug
>  Components: zookeeper
>Reporter: Sergey Chugunov
>Assignee: Denis Garus
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.7
>
> Attachments: ZK_client_reconnect_failure.log, 
> ZK_client_reconnect_success.log
>
>
> Two tests always fail on TC with the assertion
> {noformat}
> junit.framework.AssertionFailedError: Failed to wait for disconnect/reconnect 
> event.
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.waitReconnectEvent(ZookeeperDiscoverySpiTest.java:4221)
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.reconnectClientNodes(ZookeeperDiscoverySpiTest.java:4183)
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.clientReconnectSessionExpire(ZookeeperDiscoverySpiTest.java:2231)
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.testClientReconnectSessionExpire1_1(ZookeeperDiscoverySpiTest.java:2206)
> {noformat}
> from client disconnect/reconnect events check. Obviously client doesn't 
> generate these events as it supposed to do.
> (TC runs can be found 
> [here|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_IgniteZooKeeperDiscovery_IgniteTests24Java8=pull%2F3730%2Fhead=buildTypeStatusDiv]).
> It is possible to reproduce test failure locally as well, but with low 
> probability: one failure for 50 or even 300 successful executions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (IGNITE-8131) ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC

2018-07-18 Thread Denis Garus (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547576#comment-16547576
 ] 

Denis Garus edited comment on IGNITE-8131 at 7/18/18 9:26 AM:
--

[~sergey-chugunov]

If we'll add field ZkRuntimeState.topVer, we must use it everywhere rather then 
ZkDiscoveryEventsData#topVer (about 20 usages), 
else it would be confusing. This would be the reason some refactorings that 
touched plenty code base.
If we'll choose this way, I think, we should assign to ZkRuntimeState.topVer 
default value 1L and add follow condition into 
ZookeeperDiscoveryImpl#processNewEvents(byte[]):
{code:java}
if (rtState.joined)
   rtState.evtsData = newEvts;
else
   rtState.topVer = 1L; //may be it will be constant
{code}
I would like offer to replace the assigning of rtState.evtsData 
to ZookeeperDiscoveryImpl#processNewEvents(ZkDiscoveryEventsData) before 
onEventProcessed call, like this:
{code:java}
if (rtState.joined)
   rtState.evtsData = evtsData;

if (rtState.crd)
   handleProcessedEvents("procEvt");
else
   onEventProcessed(rtState, updateNodeInfo, evtProcessed);
{code}
It looks like small change and also fix the issue.

What do you think?


was (Author: garus.d.g):
[~sergey-chugunov]

If we'll add field ZkRuntimeState.topVer, we must use it everywhere rather then 
ZkDiscoveryEventsData#topVer (about 20 usages), 
else it would be confusing. This would be the reason some refactorings that 
touched plenty code base.
If we'll choice this way, I think, we should assigne to ZkRuntimeState.topVer 
default value 1L and add follow condition into 
ZookeeperDiscoveryImpl#processNewEvents(byte[]):
{code:java}
if (rtState.joined)
   rtState.evtsData = newEvts;
else
   rtState.topVer = 1L; //may be it will be constant
{code}
I would like offer to replace the assigning of rtState.evtsData 
to ZookeeperDiscoveryImpl#processNewEvents(ZkDiscoveryEventsData) before 
onEventProcessed call, like this:
{code:java}
if (rtState.joined)
   rtState.evtsData = evtsData;

if (rtState.crd)
   handleProcessedEvents("procEvt");
else
   onEventProcessed(rtState, updateNodeInfo, evtProcessed);
{code}
It looks like small change.

What do you think?

> ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC
> 
>
> Key: IGNITE-8131
> URL: https://issues.apache.org/jira/browse/IGNITE-8131
> Project: Ignite
>  Issue Type: Bug
>  Components: zookeeper
>Reporter: Sergey Chugunov
>Assignee: Denis Garus
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.7
>
> Attachments: ZK_client_reconnect_failure.log, 
> ZK_client_reconnect_success.log
>
>
> Two tests always fail on TC with the assertion
> {noformat}
> junit.framework.AssertionFailedError: Failed to wait for disconnect/reconnect 
> event.
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.waitReconnectEvent(ZookeeperDiscoverySpiTest.java:4221)
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.reconnectClientNodes(ZookeeperDiscoverySpiTest.java:4183)
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.clientReconnectSessionExpire(ZookeeperDiscoverySpiTest.java:2231)
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.testClientReconnectSessionExpire1_1(ZookeeperDiscoverySpiTest.java:2206)
> {noformat}
> from client disconnect/reconnect events check. Obviously client doesn't 
> generate these events as it supposed to do.
> (TC runs can be found 
> [here|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_IgniteZooKeeperDiscovery_IgniteTests24Java8=pull%2F3730%2Fhead=buildTypeStatusDiv]).
> It is possible to reproduce test failure locally as well, but with low 
> probability: one failure for 50 or even 300 successful executions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (IGNITE-8131) ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC

2018-07-18 Thread Denis Garus (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547576#comment-16547576
 ] 

Denis Garus edited comment on IGNITE-8131 at 7/18/18 9:07 AM:
--

[~sergey-chugunov]

If we'll add field ZkRuntimeState.topVer, we must use it everywhere rather then 
ZkDiscoveryEventsData#topVer (about 20 usages), 
else it would be confusing. This would be the reason some refactorings that 
touched plenty code base.
If we'll choice this way, I think, we should assigne to ZkRuntimeState.topVer 
default value 1L and add follow condition into 
ZookeeperDiscoveryImpl#processNewEvents(byte[]):
{code:java}
if (rtState.joined)
   rtState.evtsData = newEvts;
else
   rtState.topVer = 1L; //may be it will be constant
{code}
I would like offer to replace the assigning of rtState.evtsData 
to ZookeeperDiscoveryImpl#processNewEvents(ZkDiscoveryEventsData) before 
onEventProcessed call, like this:
{code:java}
if (rtState.joined)
   rtState.evtsData = evtsData;

if (rtState.crd)
   handleProcessedEvents("procEvt");
else
   onEventProcessed(rtState, updateNodeInfo, evtProcessed);
{code}
It looks like small change.

What do you think?


was (Author: garus.d.g):
Sergey Chugunov​

If we'll add field ZkRuntimeState.topVer, we must use it everywhere rather then 
ZkDiscoveryEventsData#topVer (about 20 usages), 
else it would be confusing. This would be the reason some refactorings that 
touched plenty code base.
If we'll choice this way, I think, we should assigne to ZkRuntimeState.topVer 
default value 1L and add follow condition into 
ZookeeperDiscoveryImpl#processNewEvents(byte[]):
{code}
if (rtState.joined)
   rtState.evtsData = newEvts;
else
   rtState.topVer = 1L; //may be it will be constant
{code} 

I would like offer to replace the assigning of rtState.evtsData 
to ZookeeperDiscoveryImpl#processNewEvents(ZkDiscoveryEventsData) before 
onEventProcessed call, like this:
{code}
if (rtState.joined)
   rtState.evtsData = evtsData;

if (rtState.crd)
   handleProcessedEvents("procEvt");
else
   onEventProcessed(rtState, updateNodeInfo, evtProcessed);
{code} 

It looks like small change. 

What do you think?

> ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC
> 
>
> Key: IGNITE-8131
> URL: https://issues.apache.org/jira/browse/IGNITE-8131
> Project: Ignite
>  Issue Type: Bug
>  Components: zookeeper
>Reporter: Sergey Chugunov
>Assignee: Denis Garus
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.7
>
> Attachments: ZK_client_reconnect_failure.log, 
> ZK_client_reconnect_success.log
>
>
> Two tests always fail on TC with the assertion
> {noformat}
> junit.framework.AssertionFailedError: Failed to wait for disconnect/reconnect 
> event.
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.waitReconnectEvent(ZookeeperDiscoverySpiTest.java:4221)
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.reconnectClientNodes(ZookeeperDiscoverySpiTest.java:4183)
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.clientReconnectSessionExpire(ZookeeperDiscoverySpiTest.java:2231)
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.testClientReconnectSessionExpire1_1(ZookeeperDiscoverySpiTest.java:2206)
> {noformat}
> from client disconnect/reconnect events check. Obviously client doesn't 
> generate these events as it supposed to do.
> (TC runs can be found 
> [here|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_IgniteZooKeeperDiscovery_IgniteTests24Java8=pull%2F3730%2Fhead=buildTypeStatusDiv]).
> It is possible to reproduce test failure locally as well, but with low 
> probability: one failure for 50 or even 300 successful executions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (IGNITE-8131) ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC

2018-07-17 Thread Sergey Chugunov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16546603#comment-16546603
 ] 

Sergey Chugunov edited comment on IGNITE-8131 at 7/17/18 1:28 PM:
--

[~garus.d.g],

I finally got the whole picture around the issue.

As part of execution of *ZookeeperDiscoveryImpl#processNewEvents(byte[] data)* 
client tries to save some data to ZooKeeper (step 0) in 
*ZookeeperClient#setData(String path, byte[] data, int ver)* but fails because 
of *KeeperException$SessionExpiredException*.
Program flow doesn't return to *processNewEvents* and goes to 
*ZookeeperClient#onZookeeperError* where reconnect closure is scheduled and 
exception is thrown.

But *processNewEvents* didn't finish some logic crucial for successful 
reconnect: it hasn't initialized *rtState.evtsData* where topVer for reconnect 
event is obtained from.

Proposed fix tries to close this race by postponing saving data to ZooKeeper at 
step 0 so *rtState.evtsData* will be initialized to the moment when session 
expiration is detected. To me it is risky because it looks like all acks from 
client nodes will be postponed for a significant amount of time (one minute by 
default).

As we need *rtState.evtsData* only to retrieve topVer from it on reconnect, why 
not save topVer earlier, before going to code which could face session 
expiration? 
So I opened a [PR|https://github.com/apache/ignite/pull/4366] with a quick fix 
for the problem, could you take a look at it, please, and share your opinion?
It may be not a perfect place to do such early initialization, but we could 
change it - the question is more about approach in general.




was (Author: sergey-chugunov):
[~garus.d.g],

I finally got the whole picture around the issue.

As part of execution of *ZookeeperDiscoveryImpl#processNewEvents(byte[] data)* 
client tries to save some data to ZooKeeper (step 0) in 
*ZookeeperClient#setData(String path, byte[] data, int ver)* but fails because 
of *KeeperException$SessionExpiredException*.
Program flow doesn't return to *processNewEvents* and goes to 
*ZookeeperClient#onZookeeperError* where reconnect closure is scheduled and 
exception is thrown.

But *processNewEvents* didn't finish some logic crucial for successful 
reconnect: it hasn't initialized *rtState.evtsData* where topVer for reconnect 
is obtained from.

Proposed fix tries to avoid this race by postponing saving data to ZooKeeper at 
step 0 so *rtState.evtsData* will be initialized to the moment when session 
expiration is detected. To me it is risky because it looks like all acks from 
client nodes will be postponed for a significant amount of time (one minute by 
default).

As we need *rtState.evtsData* only to retrieve topVer from it on reconnect, why 
not save topVer earlier, before going to code which could face session 
expiration? 
So I opened a [PR|https://github.com/apache/ignite/pull/4366] with a quick fix 
for the problem, could you take a look at it, please, and share your opinion?
It may be not a perfect place to do such early initialization, but we could 
change it - the question is more about approach in general.



> ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC
> 
>
> Key: IGNITE-8131
> URL: https://issues.apache.org/jira/browse/IGNITE-8131
> Project: Ignite
>  Issue Type: Bug
>  Components: zookeeper
>Reporter: Sergey Chugunov
>Assignee: Denis Garus
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.7
>
> Attachments: ZK_client_reconnect_failure.log, 
> ZK_client_reconnect_success.log
>
>
> Two tests always fail on TC with the assertion
> {noformat}
> junit.framework.AssertionFailedError: Failed to wait for disconnect/reconnect 
> event.
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.waitReconnectEvent(ZookeeperDiscoverySpiTest.java:4221)
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.reconnectClientNodes(ZookeeperDiscoverySpiTest.java:4183)
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.clientReconnectSessionExpire(ZookeeperDiscoverySpiTest.java:2231)
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.testClientReconnectSessionExpire1_1(ZookeeperDiscoverySpiTest.java:2206)
> {noformat}
> from client disconnect/reconnect events check. Obviously client doesn't 
> generate these events as it supposed to do.
> (TC runs can be found 
> [here|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_IgniteZooKeeperDiscovery_IgniteTests24Java8=pull%2F3730%2Fhead=buildTypeStatusDiv]).
> It is possible to reproduce test failure locally as well, but with 

[jira] [Comment Edited] (IGNITE-8131) ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC

2018-07-17 Thread Sergey Chugunov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16546603#comment-16546603
 ] 

Sergey Chugunov edited comment on IGNITE-8131 at 7/17/18 1:27 PM:
--

[~garus.d.g],

I finally got the whole picture around the issue.

As part of execution of *ZookeeperDiscoveryImpl#processNewEvents(byte[] data)* 
client tries to save some data to ZooKeeper (step 0) in 
*ZookeeperClient#setData(String path, byte[] data, int ver)* but fails because 
of *KeeperException$SessionExpiredException*.
Program flow doesn't return to *processNewEvents* and goes to 
*ZookeeperClient#onZookeeperError* where reconnect closure is scheduled and 
exception is thrown.

But *processNewEvents* didn't finish some logic crucial for successful 
reconnect: it hasn't initialized *rtState.evtsData* where topVer for reconnect 
is obtained from.

Proposed fix tries to avoid this race by postponing saving data to ZooKeeper at 
step 0 so *rtState.evtsData* will be initialized to the moment when session 
expiration is detected. To me it is risky because it looks like all acks from 
client nodes will be postponed for a significant amount of time (one minute by 
default).

As we need *rtState.evtsData* only to retrieve topVer from it on reconnect, why 
not save topVer earlier, before going to code which could face session 
expiration? 
So I opened a [PR|https://github.com/apache/ignite/pull/4366] with a quick fix 
for the problem, could you take a look at it, please, and share your opinion?
It may be not a perfect place to do such early initialization, but we could 
change it - the question is more about approach in general.




was (Author: sergey-chugunov):
[~garus.d.g],

I finally got the whole picture around the issue.

As part of execution of *ZookeeperDiscoveryImpl#processNewEvents(byte[] data)* 
client tries to save some data to ZooKeeper (step 0) in 
*ZookeeperClient#setData(String path, byte[] data, int ver)* but fails because 
of *KeeperException$SessionExpiredException*.
Program flow doesn't return to *processNewEvents* but goes to 
*ZookeeperClient#onZookeeperError* where reconnect closure is scheduled and 
exception is thrown.

But *processNewEvents* didn't finish some logic crucial for successful 
reconnect: it hasn't initialized *rtState.evtsData* where topVer for reconnect 
is obtained from.

Proposed fix tries to avoid this race by postponing saving data to ZooKeeper at 
step 0 so *rtState.evtsData* will be initialized to the moment when session 
expiration is detected. To me it is risky because it looks like all acks from 
client nodes will be postponed for a significant amount of time (one minute by 
default).

As we need *rtState.evtsData* only to retrieve topVer from it on reconnect, why 
not save topVer earlier, before going to code which could face session 
expiration? 
So I opened a [PR|https://github.com/apache/ignite/pull/4366] with a quick fix 
for the problem, could you take a look at it, please, and share your opinion?
It may be not a perfect place to do such early initialization, but we could 
change it - the question is more about approach in general.



> ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC
> 
>
> Key: IGNITE-8131
> URL: https://issues.apache.org/jira/browse/IGNITE-8131
> Project: Ignite
>  Issue Type: Bug
>  Components: zookeeper
>Reporter: Sergey Chugunov
>Assignee: Denis Garus
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.7
>
> Attachments: ZK_client_reconnect_failure.log, 
> ZK_client_reconnect_success.log
>
>
> Two tests always fail on TC with the assertion
> {noformat}
> junit.framework.AssertionFailedError: Failed to wait for disconnect/reconnect 
> event.
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.waitReconnectEvent(ZookeeperDiscoverySpiTest.java:4221)
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.reconnectClientNodes(ZookeeperDiscoverySpiTest.java:4183)
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.clientReconnectSessionExpire(ZookeeperDiscoverySpiTest.java:2231)
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.testClientReconnectSessionExpire1_1(ZookeeperDiscoverySpiTest.java:2206)
> {noformat}
> from client disconnect/reconnect events check. Obviously client doesn't 
> generate these events as it supposed to do.
> (TC runs can be found 
> [here|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_IgniteZooKeeperDiscovery_IgniteTests24Java8=pull%2F3730%2Fhead=buildTypeStatusDiv]).
> It is possible to reproduce test failure locally as well, but with low 

[jira] [Comment Edited] (IGNITE-8131) ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC

2018-06-13 Thread Sergey Chugunov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510835#comment-16510835
 ] 

Sergey Chugunov edited comment on IGNITE-8131 at 6/13/18 9:25 AM:
--

[~garus.d.g], thanks for your efforts.

I double checked your result and managed to reproduce the issue locally: I've 
run this single test with "Until failure" configuration and observed failure on 
120th execution.

I found another detail about failure: in all successful executions ZooKeeper 
client is closed because of session timeout reported by this line in logs:
{noformat}
[WARN ][zk-client-timer-internal.ZookeeperDiscoverySpiTest1][ZookeeperClient] 
Failed to establish ZooKeeper connection, close client [timeout=2000]
{noformat}

But in failed execution I don't see this line in the logs. It looks worth 
investigating what causes this behavior and its implications.

Could you please try to reproduce the failure and figure out what's going on? 
Or if it is more convenient for you I can share full logs from my local run for 
both successful and failed executions.


was (Author: sergey-chugunov):
[~garus.d.g], thanks for your efforts.

I double checked your result and managed to reproduce the issue locally: I've 
run this single test with "Until failure" configuration and observed failure on 
120th execution.

I found another detail about failure: in all successful executions ZooKeeper 
client is closed because of session timeout reported by this line in logs:
{noformat}
[WARN ][zk-client-timer-internal.ZookeeperDiscoverySpiTest1][ZookeeperClient] 
Failed to establish ZooKeeper connection, close client [timeout=2000]
{noformat}

But in failed execution I don't see this line in the logs. It looks worth 
investigating what causes this behavior and its implications.

> ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC
> 
>
> Key: IGNITE-8131
> URL: https://issues.apache.org/jira/browse/IGNITE-8131
> Project: Ignite
>  Issue Type: Bug
>  Components: zookeeper
>Reporter: Sergey Chugunov
>Assignee: Denis Garus
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.6
>
>
> Two tests always fail on TC with the assertion
> {noformat}
> junit.framework.AssertionFailedError: Failed to wait for disconnect/reconnect 
> event.
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.waitReconnectEvent(ZookeeperDiscoverySpiTest.java:4221)
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.reconnectClientNodes(ZookeeperDiscoverySpiTest.java:4183)
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.clientReconnectSessionExpire(ZookeeperDiscoverySpiTest.java:2231)
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.testClientReconnectSessionExpire1_1(ZookeeperDiscoverySpiTest.java:2206)
> {noformat}
> from client disconnect/reconnect events check. Obviously client doesn't 
> generate these events as it supposed to do.
> (TC runs can be found 
> [here|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_IgniteZooKeeperDiscovery_IgniteTests24Java8=pull%2F3730%2Fhead=buildTypeStatusDiv]).
> It is possible to reproduce test failure locally as well, but with low 
> probability: one failure for 50 or even 300 successful executions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (IGNITE-8131) ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC

2018-06-13 Thread Sergey Chugunov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510835#comment-16510835
 ] 

Sergey Chugunov edited comment on IGNITE-8131 at 6/13/18 9:24 AM:
--

[~garus.d.g], thanks for your efforts.

I double checked your result and managed to reproduce the issue locally: I've 
run this single test with "Until failure" configuration and observed failure on 
120th execution.

I found another detail about failure: in all successful executions ZooKeeper 
client is closed because of session timeout reported by this line in logs:
{noformat}
[WARN ][zk-client-timer-internal.ZookeeperDiscoverySpiTest1][ZookeeperClient] 
Failed to establish ZooKeeper connection, close client [timeout=2000]
{noformat}

But in failed execution I don't see this line in the logs. It looks worth 
investigating what causes this behavior and its implications.


was (Author: sergey-chugunov):
[~garus.d.g],

I double checked your result and managed to reproduce the issue locally: I've 
run this single test with "Until failure" configuration and observed failure on 
120th execution.

I found another detail about failure: in all successful executions ZooKeeper 
client is closed because of session timeout reported by this line in logs:
{noformat}
[WARN ][zk-client-timer-internal.ZookeeperDiscoverySpiTest1][ZookeeperClient] 
Failed to establish ZooKeeper connection, close client [timeout=2000]
{noformat}

But in failed execution I don't see this line in the logs. It looks worth 
investigating what causes this behavior and its implications.

> ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC
> 
>
> Key: IGNITE-8131
> URL: https://issues.apache.org/jira/browse/IGNITE-8131
> Project: Ignite
>  Issue Type: Bug
>  Components: zookeeper
>Reporter: Sergey Chugunov
>Assignee: Denis Garus
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.6
>
>
> Two tests always fail on TC with the assertion
> {noformat}
> junit.framework.AssertionFailedError: Failed to wait for disconnect/reconnect 
> event.
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.waitReconnectEvent(ZookeeperDiscoverySpiTest.java:4221)
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.reconnectClientNodes(ZookeeperDiscoverySpiTest.java:4183)
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.clientReconnectSessionExpire(ZookeeperDiscoverySpiTest.java:2231)
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.testClientReconnectSessionExpire1_1(ZookeeperDiscoverySpiTest.java:2206)
> {noformat}
> from client disconnect/reconnect events check. Obviously client doesn't 
> generate these events as it supposed to do.
> (TC runs can be found 
> [here|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_IgniteZooKeeperDiscovery_IgniteTests24Java8=pull%2F3730%2Fhead=buildTypeStatusDiv]).
> It is possible to reproduce test failure locally as well, but with low 
> probability: one failure for 50 or even 300 successful executions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (IGNITE-8131) ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC

2018-06-09 Thread Denis Garus (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506969#comment-16506969
 ] 

Denis Garus edited comment on IGNITE-8131 at 6/9/18 1:01 PM:
-

[~sergey-chugunov], 
I deleted a fail line in tests and ran TC. Everything seems to be OK.
Should we close this ticket?
Test history:
https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-1700882236921635901=testDetails

https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=1337433329747388373=testDetails


was (Author: garus.d.g):
[~sergey-chugunov], 
I deleted a fail line in tests and ran TC. Everything seems to be OK.
Should we close this ticket?
Test history:
https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-1700882236921635901=testDetails

> ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC
> 
>
> Key: IGNITE-8131
> URL: https://issues.apache.org/jira/browse/IGNITE-8131
> Project: Ignite
>  Issue Type: Bug
>  Components: zookeeper
>Reporter: Sergey Chugunov
>Assignee: Denis Garus
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
>
> Two tests always fail on TC with the assertion
> {noformat}
> junit.framework.AssertionFailedError: Failed to wait for disconnect/reconnect 
> event.
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.waitReconnectEvent(ZookeeperDiscoverySpiTest.java:4221)
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.reconnectClientNodes(ZookeeperDiscoverySpiTest.java:4183)
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.clientReconnectSessionExpire(ZookeeperDiscoverySpiTest.java:2231)
> at 
> org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.testClientReconnectSessionExpire1_1(ZookeeperDiscoverySpiTest.java:2206)
> {noformat}
> from client disconnect/reconnect events check. Obviously client doesn't 
> generate these events as it supposed to do.
> (TC runs can be found 
> [here|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_IgniteZooKeeperDiscovery_IgniteTests24Java8=pull%2F3730%2Fhead=buildTypeStatusDiv]).
> It is possible to reproduce test failure locally as well, but with low 
> probability: one failure for 50 or even 300 successful executions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)