[jira] [Comment Edited] (IGNITE-8131) ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC
[ https://issues.apache.org/jira/browse/IGNITE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552567#comment-16552567 ] Sergey Chugunov edited comment on IGNITE-8131 at 7/23/18 9:42 AM: -- [~garus.d.g], Changes look good to me as well. I run session expire tests locally and didn't manage to reproduce the issue. Thanks for contribution! [~dpavlov], could you please also review and merge if you're OK with the patch? was (Author: sergey-chugunov): [~garus.d.g], Changes look good to me as well. I run session expire tests locally and didn't manage to reproduce the issue. [~dpavlov], could you please also review and merge if you're OK with the patch? Thanks for contribution! > ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC > > > Key: IGNITE-8131 > URL: https://issues.apache.org/jira/browse/IGNITE-8131 > Project: Ignite > Issue Type: Bug > Components: zookeeper >Reporter: Sergey Chugunov >Assignee: Denis Garus >Priority: Major > Labels: MakeTeamcityGreenAgain > Fix For: 2.7 > > Attachments: ZK_client_reconnect_failure.log, > ZK_client_reconnect_success.log > > > Two tests always fail on TC with the assertion > {noformat} > junit.framework.AssertionFailedError: Failed to wait for disconnect/reconnect > event. > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.waitReconnectEvent(ZookeeperDiscoverySpiTest.java:4221) > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.reconnectClientNodes(ZookeeperDiscoverySpiTest.java:4183) > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.clientReconnectSessionExpire(ZookeeperDiscoverySpiTest.java:2231) > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.testClientReconnectSessionExpire1_1(ZookeeperDiscoverySpiTest.java:2206) > {noformat} > from client disconnect/reconnect events check. Obviously client doesn't > generate these events as it supposed to do. > (TC runs can be found > [here|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_IgniteZooKeeperDiscovery&branch_IgniteTests24Java8=pull%2F3730%2Fhead&tab=buildTypeStatusDiv]). > It is possible to reproduce test failure locally as well, but with low > probability: one failure for 50 or even 300 successful executions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (IGNITE-8131) ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC
[ https://issues.apache.org/jira/browse/IGNITE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16547576#comment-16547576 ] Denis Garus edited comment on IGNITE-8131 at 7/18/18 9:30 AM: -- [~sergey-chugunov] If we'll add field ZkRuntimeState.topVer, we must use it everywhere rather then ZkDiscoveryEventsData#topVer (about 20 usages), else it would be confusing. This would be the reason some refactorings that touched plenty code base. If we'll choose this way, I think, we should assign to ZkRuntimeState.topVer default value 1L and add follow condition into ZookeeperDiscoveryImpl#processNewEvents(byte[]): {code:java} if (rtState.joined) rtState.evtsData = newEvts; else rtState.topVer = 1L; //may be it will be constant {code} I would like offer to move the assigning of rtState.evtsData to ZookeeperDiscoveryImpl#processNewEvents(ZkDiscoveryEventsData) before onEventProcessed call, like this: {code:java} if (rtState.joined) rtState.evtsData = evtsData; if (rtState.crd) handleProcessedEvents("procEvt"); else onEventProcessed(rtState, updateNodeInfo, evtProcessed); {code} It looks like small change and also fix the issue. What do you think? was (Author: garus.d.g): [~sergey-chugunov] If we'll add field ZkRuntimeState.topVer, we must use it everywhere rather then ZkDiscoveryEventsData#topVer (about 20 usages), else it would be confusing. This would be the reason some refactorings that touched plenty code base. If we'll choose this way, I think, we should assign to ZkRuntimeState.topVer default value 1L and add follow condition into ZookeeperDiscoveryImpl#processNewEvents(byte[]): {code:java} if (rtState.joined) rtState.evtsData = newEvts; else rtState.topVer = 1L; //may be it will be constant {code} I would like offer to replace the assigning of rtState.evtsData to ZookeeperDiscoveryImpl#processNewEvents(ZkDiscoveryEventsData) before onEventProcessed call, like this: {code:java} if (rtState.joined) rtState.evtsData = evtsData; if (rtState.crd) handleProcessedEvents("procEvt"); else onEventProcessed(rtState, updateNodeInfo, evtProcessed); {code} It looks like small change and also fix the issue. What do you think? > ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC > > > Key: IGNITE-8131 > URL: https://issues.apache.org/jira/browse/IGNITE-8131 > Project: Ignite > Issue Type: Bug > Components: zookeeper >Reporter: Sergey Chugunov >Assignee: Denis Garus >Priority: Major > Labels: MakeTeamcityGreenAgain > Fix For: 2.7 > > Attachments: ZK_client_reconnect_failure.log, > ZK_client_reconnect_success.log > > > Two tests always fail on TC with the assertion > {noformat} > junit.framework.AssertionFailedError: Failed to wait for disconnect/reconnect > event. > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.waitReconnectEvent(ZookeeperDiscoverySpiTest.java:4221) > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.reconnectClientNodes(ZookeeperDiscoverySpiTest.java:4183) > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.clientReconnectSessionExpire(ZookeeperDiscoverySpiTest.java:2231) > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.testClientReconnectSessionExpire1_1(ZookeeperDiscoverySpiTest.java:2206) > {noformat} > from client disconnect/reconnect events check. Obviously client doesn't > generate these events as it supposed to do. > (TC runs can be found > [here|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_IgniteZooKeeperDiscovery&branch_IgniteTests24Java8=pull%2F3730%2Fhead&tab=buildTypeStatusDiv]). > It is possible to reproduce test failure locally as well, but with low > probability: one failure for 50 or even 300 successful executions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (IGNITE-8131) ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC
[ https://issues.apache.org/jira/browse/IGNITE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16547576#comment-16547576 ] Denis Garus edited comment on IGNITE-8131 at 7/18/18 9:26 AM: -- [~sergey-chugunov] If we'll add field ZkRuntimeState.topVer, we must use it everywhere rather then ZkDiscoveryEventsData#topVer (about 20 usages), else it would be confusing. This would be the reason some refactorings that touched plenty code base. If we'll choose this way, I think, we should assign to ZkRuntimeState.topVer default value 1L and add follow condition into ZookeeperDiscoveryImpl#processNewEvents(byte[]): {code:java} if (rtState.joined) rtState.evtsData = newEvts; else rtState.topVer = 1L; //may be it will be constant {code} I would like offer to replace the assigning of rtState.evtsData to ZookeeperDiscoveryImpl#processNewEvents(ZkDiscoveryEventsData) before onEventProcessed call, like this: {code:java} if (rtState.joined) rtState.evtsData = evtsData; if (rtState.crd) handleProcessedEvents("procEvt"); else onEventProcessed(rtState, updateNodeInfo, evtProcessed); {code} It looks like small change and also fix the issue. What do you think? was (Author: garus.d.g): [~sergey-chugunov] If we'll add field ZkRuntimeState.topVer, we must use it everywhere rather then ZkDiscoveryEventsData#topVer (about 20 usages), else it would be confusing. This would be the reason some refactorings that touched plenty code base. If we'll choice this way, I think, we should assigne to ZkRuntimeState.topVer default value 1L and add follow condition into ZookeeperDiscoveryImpl#processNewEvents(byte[]): {code:java} if (rtState.joined) rtState.evtsData = newEvts; else rtState.topVer = 1L; //may be it will be constant {code} I would like offer to replace the assigning of rtState.evtsData to ZookeeperDiscoveryImpl#processNewEvents(ZkDiscoveryEventsData) before onEventProcessed call, like this: {code:java} if (rtState.joined) rtState.evtsData = evtsData; if (rtState.crd) handleProcessedEvents("procEvt"); else onEventProcessed(rtState, updateNodeInfo, evtProcessed); {code} It looks like small change. What do you think? > ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC > > > Key: IGNITE-8131 > URL: https://issues.apache.org/jira/browse/IGNITE-8131 > Project: Ignite > Issue Type: Bug > Components: zookeeper >Reporter: Sergey Chugunov >Assignee: Denis Garus >Priority: Major > Labels: MakeTeamcityGreenAgain > Fix For: 2.7 > > Attachments: ZK_client_reconnect_failure.log, > ZK_client_reconnect_success.log > > > Two tests always fail on TC with the assertion > {noformat} > junit.framework.AssertionFailedError: Failed to wait for disconnect/reconnect > event. > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.waitReconnectEvent(ZookeeperDiscoverySpiTest.java:4221) > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.reconnectClientNodes(ZookeeperDiscoverySpiTest.java:4183) > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.clientReconnectSessionExpire(ZookeeperDiscoverySpiTest.java:2231) > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.testClientReconnectSessionExpire1_1(ZookeeperDiscoverySpiTest.java:2206) > {noformat} > from client disconnect/reconnect events check. Obviously client doesn't > generate these events as it supposed to do. > (TC runs can be found > [here|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_IgniteZooKeeperDiscovery&branch_IgniteTests24Java8=pull%2F3730%2Fhead&tab=buildTypeStatusDiv]). > It is possible to reproduce test failure locally as well, but with low > probability: one failure for 50 or even 300 successful executions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (IGNITE-8131) ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC
[ https://issues.apache.org/jira/browse/IGNITE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16547576#comment-16547576 ] Denis Garus edited comment on IGNITE-8131 at 7/18/18 9:07 AM: -- [~sergey-chugunov] If we'll add field ZkRuntimeState.topVer, we must use it everywhere rather then ZkDiscoveryEventsData#topVer (about 20 usages), else it would be confusing. This would be the reason some refactorings that touched plenty code base. If we'll choice this way, I think, we should assigne to ZkRuntimeState.topVer default value 1L and add follow condition into ZookeeperDiscoveryImpl#processNewEvents(byte[]): {code:java} if (rtState.joined) rtState.evtsData = newEvts; else rtState.topVer = 1L; //may be it will be constant {code} I would like offer to replace the assigning of rtState.evtsData to ZookeeperDiscoveryImpl#processNewEvents(ZkDiscoveryEventsData) before onEventProcessed call, like this: {code:java} if (rtState.joined) rtState.evtsData = evtsData; if (rtState.crd) handleProcessedEvents("procEvt"); else onEventProcessed(rtState, updateNodeInfo, evtProcessed); {code} It looks like small change. What do you think? was (Author: garus.d.g): Sergey Chugunov If we'll add field ZkRuntimeState.topVer, we must use it everywhere rather then ZkDiscoveryEventsData#topVer (about 20 usages), else it would be confusing. This would be the reason some refactorings that touched plenty code base. If we'll choice this way, I think, we should assigne to ZkRuntimeState.topVer default value 1L and add follow condition into ZookeeperDiscoveryImpl#processNewEvents(byte[]): {code} if (rtState.joined) rtState.evtsData = newEvts; else rtState.topVer = 1L; //may be it will be constant {code} I would like offer to replace the assigning of rtState.evtsData to ZookeeperDiscoveryImpl#processNewEvents(ZkDiscoveryEventsData) before onEventProcessed call, like this: {code} if (rtState.joined) rtState.evtsData = evtsData; if (rtState.crd) handleProcessedEvents("procEvt"); else onEventProcessed(rtState, updateNodeInfo, evtProcessed); {code} It looks like small change. What do you think? > ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC > > > Key: IGNITE-8131 > URL: https://issues.apache.org/jira/browse/IGNITE-8131 > Project: Ignite > Issue Type: Bug > Components: zookeeper >Reporter: Sergey Chugunov >Assignee: Denis Garus >Priority: Major > Labels: MakeTeamcityGreenAgain > Fix For: 2.7 > > Attachments: ZK_client_reconnect_failure.log, > ZK_client_reconnect_success.log > > > Two tests always fail on TC with the assertion > {noformat} > junit.framework.AssertionFailedError: Failed to wait for disconnect/reconnect > event. > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.waitReconnectEvent(ZookeeperDiscoverySpiTest.java:4221) > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.reconnectClientNodes(ZookeeperDiscoverySpiTest.java:4183) > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.clientReconnectSessionExpire(ZookeeperDiscoverySpiTest.java:2231) > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.testClientReconnectSessionExpire1_1(ZookeeperDiscoverySpiTest.java:2206) > {noformat} > from client disconnect/reconnect events check. Obviously client doesn't > generate these events as it supposed to do. > (TC runs can be found > [here|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_IgniteZooKeeperDiscovery&branch_IgniteTests24Java8=pull%2F3730%2Fhead&tab=buildTypeStatusDiv]). > It is possible to reproduce test failure locally as well, but with low > probability: one failure for 50 or even 300 successful executions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (IGNITE-8131) ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC
[ https://issues.apache.org/jira/browse/IGNITE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546603#comment-16546603 ] Sergey Chugunov edited comment on IGNITE-8131 at 7/17/18 1:28 PM: -- [~garus.d.g], I finally got the whole picture around the issue. As part of execution of *ZookeeperDiscoveryImpl#processNewEvents(byte[] data)* client tries to save some data to ZooKeeper (step 0) in *ZookeeperClient#setData(String path, byte[] data, int ver)* but fails because of *KeeperException$SessionExpiredException*. Program flow doesn't return to *processNewEvents* and goes to *ZookeeperClient#onZookeeperError* where reconnect closure is scheduled and exception is thrown. But *processNewEvents* didn't finish some logic crucial for successful reconnect: it hasn't initialized *rtState.evtsData* where topVer for reconnect event is obtained from. Proposed fix tries to close this race by postponing saving data to ZooKeeper at step 0 so *rtState.evtsData* will be initialized to the moment when session expiration is detected. To me it is risky because it looks like all acks from client nodes will be postponed for a significant amount of time (one minute by default). As we need *rtState.evtsData* only to retrieve topVer from it on reconnect, why not save topVer earlier, before going to code which could face session expiration? So I opened a [PR|https://github.com/apache/ignite/pull/4366] with a quick fix for the problem, could you take a look at it, please, and share your opinion? It may be not a perfect place to do such early initialization, but we could change it - the question is more about approach in general. was (Author: sergey-chugunov): [~garus.d.g], I finally got the whole picture around the issue. As part of execution of *ZookeeperDiscoveryImpl#processNewEvents(byte[] data)* client tries to save some data to ZooKeeper (step 0) in *ZookeeperClient#setData(String path, byte[] data, int ver)* but fails because of *KeeperException$SessionExpiredException*. Program flow doesn't return to *processNewEvents* and goes to *ZookeeperClient#onZookeeperError* where reconnect closure is scheduled and exception is thrown. But *processNewEvents* didn't finish some logic crucial for successful reconnect: it hasn't initialized *rtState.evtsData* where topVer for reconnect is obtained from. Proposed fix tries to avoid this race by postponing saving data to ZooKeeper at step 0 so *rtState.evtsData* will be initialized to the moment when session expiration is detected. To me it is risky because it looks like all acks from client nodes will be postponed for a significant amount of time (one minute by default). As we need *rtState.evtsData* only to retrieve topVer from it on reconnect, why not save topVer earlier, before going to code which could face session expiration? So I opened a [PR|https://github.com/apache/ignite/pull/4366] with a quick fix for the problem, could you take a look at it, please, and share your opinion? It may be not a perfect place to do such early initialization, but we could change it - the question is more about approach in general. > ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC > > > Key: IGNITE-8131 > URL: https://issues.apache.org/jira/browse/IGNITE-8131 > Project: Ignite > Issue Type: Bug > Components: zookeeper >Reporter: Sergey Chugunov >Assignee: Denis Garus >Priority: Major > Labels: MakeTeamcityGreenAgain > Fix For: 2.7 > > Attachments: ZK_client_reconnect_failure.log, > ZK_client_reconnect_success.log > > > Two tests always fail on TC with the assertion > {noformat} > junit.framework.AssertionFailedError: Failed to wait for disconnect/reconnect > event. > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.waitReconnectEvent(ZookeeperDiscoverySpiTest.java:4221) > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.reconnectClientNodes(ZookeeperDiscoverySpiTest.java:4183) > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.clientReconnectSessionExpire(ZookeeperDiscoverySpiTest.java:2231) > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.testClientReconnectSessionExpire1_1(ZookeeperDiscoverySpiTest.java:2206) > {noformat} > from client disconnect/reconnect events check. Obviously client doesn't > generate these events as it supposed to do. > (TC runs can be found > [here|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_IgniteZooKeeperDiscovery&branch_IgniteTests24Java8=pull%2F3730%2Fhead&tab=buildTypeStatusDiv]). > It is possible to reproduce test failur
[jira] [Comment Edited] (IGNITE-8131) ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC
[ https://issues.apache.org/jira/browse/IGNITE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546603#comment-16546603 ] Sergey Chugunov edited comment on IGNITE-8131 at 7/17/18 1:27 PM: -- [~garus.d.g], I finally got the whole picture around the issue. As part of execution of *ZookeeperDiscoveryImpl#processNewEvents(byte[] data)* client tries to save some data to ZooKeeper (step 0) in *ZookeeperClient#setData(String path, byte[] data, int ver)* but fails because of *KeeperException$SessionExpiredException*. Program flow doesn't return to *processNewEvents* and goes to *ZookeeperClient#onZookeeperError* where reconnect closure is scheduled and exception is thrown. But *processNewEvents* didn't finish some logic crucial for successful reconnect: it hasn't initialized *rtState.evtsData* where topVer for reconnect is obtained from. Proposed fix tries to avoid this race by postponing saving data to ZooKeeper at step 0 so *rtState.evtsData* will be initialized to the moment when session expiration is detected. To me it is risky because it looks like all acks from client nodes will be postponed for a significant amount of time (one minute by default). As we need *rtState.evtsData* only to retrieve topVer from it on reconnect, why not save topVer earlier, before going to code which could face session expiration? So I opened a [PR|https://github.com/apache/ignite/pull/4366] with a quick fix for the problem, could you take a look at it, please, and share your opinion? It may be not a perfect place to do such early initialization, but we could change it - the question is more about approach in general. was (Author: sergey-chugunov): [~garus.d.g], I finally got the whole picture around the issue. As part of execution of *ZookeeperDiscoveryImpl#processNewEvents(byte[] data)* client tries to save some data to ZooKeeper (step 0) in *ZookeeperClient#setData(String path, byte[] data, int ver)* but fails because of *KeeperException$SessionExpiredException*. Program flow doesn't return to *processNewEvents* but goes to *ZookeeperClient#onZookeeperError* where reconnect closure is scheduled and exception is thrown. But *processNewEvents* didn't finish some logic crucial for successful reconnect: it hasn't initialized *rtState.evtsData* where topVer for reconnect is obtained from. Proposed fix tries to avoid this race by postponing saving data to ZooKeeper at step 0 so *rtState.evtsData* will be initialized to the moment when session expiration is detected. To me it is risky because it looks like all acks from client nodes will be postponed for a significant amount of time (one minute by default). As we need *rtState.evtsData* only to retrieve topVer from it on reconnect, why not save topVer earlier, before going to code which could face session expiration? So I opened a [PR|https://github.com/apache/ignite/pull/4366] with a quick fix for the problem, could you take a look at it, please, and share your opinion? It may be not a perfect place to do such early initialization, but we could change it - the question is more about approach in general. > ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC > > > Key: IGNITE-8131 > URL: https://issues.apache.org/jira/browse/IGNITE-8131 > Project: Ignite > Issue Type: Bug > Components: zookeeper >Reporter: Sergey Chugunov >Assignee: Denis Garus >Priority: Major > Labels: MakeTeamcityGreenAgain > Fix For: 2.7 > > Attachments: ZK_client_reconnect_failure.log, > ZK_client_reconnect_success.log > > > Two tests always fail on TC with the assertion > {noformat} > junit.framework.AssertionFailedError: Failed to wait for disconnect/reconnect > event. > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.waitReconnectEvent(ZookeeperDiscoverySpiTest.java:4221) > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.reconnectClientNodes(ZookeeperDiscoverySpiTest.java:4183) > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.clientReconnectSessionExpire(ZookeeperDiscoverySpiTest.java:2231) > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.testClientReconnectSessionExpire1_1(ZookeeperDiscoverySpiTest.java:2206) > {noformat} > from client disconnect/reconnect events check. Obviously client doesn't > generate these events as it supposed to do. > (TC runs can be found > [here|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_IgniteZooKeeperDiscovery&branch_IgniteTests24Java8=pull%2F3730%2Fhead&tab=buildTypeStatusDiv]). > It is possible to reproduce test failure loca
[jira] [Comment Edited] (IGNITE-8131) ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC
[ https://issues.apache.org/jira/browse/IGNITE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510835#comment-16510835 ] Sergey Chugunov edited comment on IGNITE-8131 at 6/13/18 9:25 AM: -- [~garus.d.g], thanks for your efforts. I double checked your result and managed to reproduce the issue locally: I've run this single test with "Until failure" configuration and observed failure on 120th execution. I found another detail about failure: in all successful executions ZooKeeper client is closed because of session timeout reported by this line in logs: {noformat} [WARN ][zk-client-timer-internal.ZookeeperDiscoverySpiTest1][ZookeeperClient] Failed to establish ZooKeeper connection, close client [timeout=2000] {noformat} But in failed execution I don't see this line in the logs. It looks worth investigating what causes this behavior and its implications. Could you please try to reproduce the failure and figure out what's going on? Or if it is more convenient for you I can share full logs from my local run for both successful and failed executions. was (Author: sergey-chugunov): [~garus.d.g], thanks for your efforts. I double checked your result and managed to reproduce the issue locally: I've run this single test with "Until failure" configuration and observed failure on 120th execution. I found another detail about failure: in all successful executions ZooKeeper client is closed because of session timeout reported by this line in logs: {noformat} [WARN ][zk-client-timer-internal.ZookeeperDiscoverySpiTest1][ZookeeperClient] Failed to establish ZooKeeper connection, close client [timeout=2000] {noformat} But in failed execution I don't see this line in the logs. It looks worth investigating what causes this behavior and its implications. > ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC > > > Key: IGNITE-8131 > URL: https://issues.apache.org/jira/browse/IGNITE-8131 > Project: Ignite > Issue Type: Bug > Components: zookeeper >Reporter: Sergey Chugunov >Assignee: Denis Garus >Priority: Major > Labels: MakeTeamcityGreenAgain > Fix For: 2.6 > > > Two tests always fail on TC with the assertion > {noformat} > junit.framework.AssertionFailedError: Failed to wait for disconnect/reconnect > event. > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.waitReconnectEvent(ZookeeperDiscoverySpiTest.java:4221) > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.reconnectClientNodes(ZookeeperDiscoverySpiTest.java:4183) > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.clientReconnectSessionExpire(ZookeeperDiscoverySpiTest.java:2231) > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.testClientReconnectSessionExpire1_1(ZookeeperDiscoverySpiTest.java:2206) > {noformat} > from client disconnect/reconnect events check. Obviously client doesn't > generate these events as it supposed to do. > (TC runs can be found > [here|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_IgniteZooKeeperDiscovery&branch_IgniteTests24Java8=pull%2F3730%2Fhead&tab=buildTypeStatusDiv]). > It is possible to reproduce test failure locally as well, but with low > probability: one failure for 50 or even 300 successful executions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (IGNITE-8131) ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC
[ https://issues.apache.org/jira/browse/IGNITE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510835#comment-16510835 ] Sergey Chugunov edited comment on IGNITE-8131 at 6/13/18 9:24 AM: -- [~garus.d.g], thanks for your efforts. I double checked your result and managed to reproduce the issue locally: I've run this single test with "Until failure" configuration and observed failure on 120th execution. I found another detail about failure: in all successful executions ZooKeeper client is closed because of session timeout reported by this line in logs: {noformat} [WARN ][zk-client-timer-internal.ZookeeperDiscoverySpiTest1][ZookeeperClient] Failed to establish ZooKeeper connection, close client [timeout=2000] {noformat} But in failed execution I don't see this line in the logs. It looks worth investigating what causes this behavior and its implications. was (Author: sergey-chugunov): [~garus.d.g], I double checked your result and managed to reproduce the issue locally: I've run this single test with "Until failure" configuration and observed failure on 120th execution. I found another detail about failure: in all successful executions ZooKeeper client is closed because of session timeout reported by this line in logs: {noformat} [WARN ][zk-client-timer-internal.ZookeeperDiscoverySpiTest1][ZookeeperClient] Failed to establish ZooKeeper connection, close client [timeout=2000] {noformat} But in failed execution I don't see this line in the logs. It looks worth investigating what causes this behavior and its implications. > ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC > > > Key: IGNITE-8131 > URL: https://issues.apache.org/jira/browse/IGNITE-8131 > Project: Ignite > Issue Type: Bug > Components: zookeeper >Reporter: Sergey Chugunov >Assignee: Denis Garus >Priority: Major > Labels: MakeTeamcityGreenAgain > Fix For: 2.6 > > > Two tests always fail on TC with the assertion > {noformat} > junit.framework.AssertionFailedError: Failed to wait for disconnect/reconnect > event. > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.waitReconnectEvent(ZookeeperDiscoverySpiTest.java:4221) > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.reconnectClientNodes(ZookeeperDiscoverySpiTest.java:4183) > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.clientReconnectSessionExpire(ZookeeperDiscoverySpiTest.java:2231) > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.testClientReconnectSessionExpire1_1(ZookeeperDiscoverySpiTest.java:2206) > {noformat} > from client disconnect/reconnect events check. Obviously client doesn't > generate these events as it supposed to do. > (TC runs can be found > [here|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_IgniteZooKeeperDiscovery&branch_IgniteTests24Java8=pull%2F3730%2Fhead&tab=buildTypeStatusDiv]). > It is possible to reproduce test failure locally as well, but with low > probability: one failure for 50 or even 300 successful executions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (IGNITE-8131) ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC
[ https://issues.apache.org/jira/browse/IGNITE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506969#comment-16506969 ] Denis Garus edited comment on IGNITE-8131 at 6/9/18 1:01 PM: - [~sergey-chugunov], I deleted a fail line in tests and ran TC. Everything seems to be OK. Should we close this ticket? Test history: https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-1700882236921635901&tab=testDetails https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=1337433329747388373&tab=testDetails was (Author: garus.d.g): [~sergey-chugunov], I deleted a fail line in tests and ran TC. Everything seems to be OK. Should we close this ticket? Test history: https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-1700882236921635901&tab=testDetails > ZookeeperDiscoverySpiTest#testClientReconnectSessionExpire* tests fail on TC > > > Key: IGNITE-8131 > URL: https://issues.apache.org/jira/browse/IGNITE-8131 > Project: Ignite > Issue Type: Bug > Components: zookeeper >Reporter: Sergey Chugunov >Assignee: Denis Garus >Priority: Major > Labels: MakeTeamcityGreenAgain > > Two tests always fail on TC with the assertion > {noformat} > junit.framework.AssertionFailedError: Failed to wait for disconnect/reconnect > event. > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.waitReconnectEvent(ZookeeperDiscoverySpiTest.java:4221) > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.reconnectClientNodes(ZookeeperDiscoverySpiTest.java:4183) > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.clientReconnectSessionExpire(ZookeeperDiscoverySpiTest.java:2231) > at > org.apache.ignite.spi.discovery.zk.internal.ZookeeperDiscoverySpiTest.testClientReconnectSessionExpire1_1(ZookeeperDiscoverySpiTest.java:2206) > {noformat} > from client disconnect/reconnect events check. Obviously client doesn't > generate these events as it supposed to do. > (TC runs can be found > [here|https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_IgniteZooKeeperDiscovery&branch_IgniteTests24Java8=pull%2F3730%2Fhead&tab=buildTypeStatusDiv]). > It is possible to reproduce test failure locally as well, but with low > probability: one failure for 50 or even 300 successful executions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)