[jira] [Commented] (YARN-3242) Old ZK client session watcher event messed up new ZK client session due to ZooKeeper asynchronously closing client session.
[ https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332105#comment-14332105 ]

Hadoop QA commented on YARN-3242:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12700085/YARN-3242.001.patch
  against trunk revision fe7a302.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6691//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6691//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6691//console

This message is automatically generated.

> Old ZK client session watcher event messed up new ZK client session due to
> ZooKeeper asynchronously closing client session.
> ---
>
> Key: YARN-3242
> URL: https://issues.apache.org/jira/browse/YARN-3242
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: zhihai xu
> Assignee: zhihai xu
> Priority: Critical
> Attachments: YARN-3242.000.patch, YARN-3242.001.patch
>
>
> Old ZK client session watcher event messed up new ZK client session due to
> ZooKeeper asynchronously closing client session.
> A watcher event from the old ZK client session can still be delivered to
> ZKRMStateStore after the old ZK client session is closed.
> This causes a serious problem: ZKRMStateStore gets out of sync with the
> ZooKeeper session.
> There is only one ZKRMStateStore, but there can be multiple ZK client sessions.
> Currently ZKRMStateStore#processWatchEvent doesn't check whether a watcher
> event comes from the current session, so a watcher event from an old ZK
> client session that has just been closed is still processed.
> For example, if a Disconnected event is received from the old session after
> the new session is connected, zkClient is set to null:
> {code}
>   case Disconnected:
>     LOG.info("ZKRMStateStore Session disconnected");
>     oldZkClient = zkClient;
>     zkClient = null;
>     break;
> {code}
> ZKRMStateStore then never receives a SyncConnected event from the new
> session, because the new session is already in the SyncConnected state and
> won't send another SyncConnected event until it is disconnected and
> connected again.
> From then on, all ZKRMStateStore operations fail with the IOException
> "Wait for ZKClient creation timed out" until the RM shuts down.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
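
The failure sequence in the issue description above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: MiniStateStore, KeeperState, and StaleEventDemo are hypothetical stand-ins written for illustration, not Hadoop or ZooKeeper classes, and the event handling is reduced to the two states the description mentions.

```java
// Simplified model of the race; all types here are hypothetical stand-ins,
// not the real ZKRMStateStore or ZooKeeper classes.
enum KeeperState { SyncConnected, Disconnected }

class MiniStateStore {
    // Client the store believes is usable; null means "wait for a reconnect".
    Object zkClient;
    // Most recently created session object.
    Object pendingClient;

    void createNewSession(Object newClient) {
        pendingClient = newClient;
    }

    // Unguarded handler: it never asks which session produced the event.
    void processWatchEvent(KeeperState state) {
        switch (state) {
            case SyncConnected:
                zkClient = pendingClient;
                break;
            case Disconnected:
                zkClient = null;
                break;
        }
    }

    boolean isUsable() {
        return zkClient != null;
    }
}

class StaleEventDemo {
    // Replays the order of events seen in the description and reports
    // whether the store still has a usable client afterwards.
    static boolean run() {
        MiniStateStore store = new MiniStateStore();

        store.createNewSession(new Object());               // old session
        store.processWatchEvent(KeeperState.SyncConnected); // old session connects

        store.createNewSession(new Object());               // old session closed, new one created
        store.processWatchEvent(KeeperState.SyncConnected); // new session connects first
        store.processWatchEvent(KeeperState.Disconnected);  // late event from the OLD session

        return store.isUsable();
    }
}
```

In this model StaleEventDemo.run() returns false: the late Disconnected event clears zkClient even though the new session is connected, and no further SyncConnected event arrives to restore it.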
[jira] [Commented] (YARN-3242) Old ZK client session watcher event messed up new ZK client session due to ZooKeeper asynchronously closing client session.
[ https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332077#comment-14332077 ]

zhihai xu commented on YARN-3242:
---------------------------------

I found that oldZkClient is no longer needed, because the newly added activeZkClient can replace it.
I uploaded a new patch, YARN-3242.001.patch, which removes oldZkClient.
[jira] [Commented] (YARN-3242) Old ZK client session watcher event messed up new ZK client session due to ZooKeeper asynchronously closing client session.
[ https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332060#comment-14332060 ]

Hadoop QA commented on YARN-3242:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12700076/YARN-3242.000.patch
  against trunk revision fe7a302.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

                  org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6690//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6690//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6690//console

This message is automatically generated.
[jira] [Commented] (YARN-3242) Old ZK client session watcher event messed up new ZK client session due to ZooKeeper asynchronously closing client session.
[ https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332037#comment-14332037 ]

zhihai xu commented on YARN-3242:
---------------------------------

I uploaded a draft patch that processes watcher events only from the current ZooKeeper client session.
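
The idea of processing only events from the current session could look roughly like the sketch below. It is not the content of YARN-3242.000.patch, which is not shown in this thread. The field name activeZkClient is borrowed from the comment about YARN-3242.001.patch; GuardedStateStore, SessionWatcher, and SessionEvent are hypothetical names, and the sketch assumes each session gets its own watcher object so an event can be traced back to the session that produced it.

```java
// Hypothetical sketch of session-scoped event filtering; not the actual patch.
enum SessionEvent { SyncConnected, Disconnected }

class GuardedStateStore {
    Object zkClient;        // usable client, or null while disconnected
    Object activeZkClient;  // the only session whose events are honored

    synchronized void createNewSession(Object newClient) {
        activeZkClient = newClient;
    }

    // Returns true if the event was applied, false if it was dropped as stale.
    synchronized boolean processWatchEvent(Object source, SessionEvent event) {
        if (source != activeZkClient) {
            // Event from a closed or superseded session: ignore it.
            return false;
        }
        switch (event) {
            case SyncConnected:
                zkClient = activeZkClient;
                break;
            case Disconnected:
                zkClient = null;
                break;
        }
        return true;
    }
}

// One watcher per session, so every event carries its origin.
class SessionWatcher {
    private final GuardedStateStore store;
    private final Object session;

    SessionWatcher(GuardedStateStore store, Object session) {
        this.store = store;
        this.session = session;
    }

    boolean process(SessionEvent event) {
        return store.processWatchEvent(session, event);
    }
}
```

With this guard, a late Disconnected event delivered through the old session's watcher is dropped, and zkClient keeps pointing at the new session. Comparing by reference is a deliberate choice in the sketch: two sessions are never "equal" just because they talk to the same ensemble.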
[jira] [Commented] (YARN-3242) Old ZK client session watcher event messed up new ZK client session due to ZooKeeper asynchronously closing client session.
[ https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332030#comment-14332030 ]

zhihai xu commented on YARN-3242:
---------------------------------

The following ZooKeeper client logs in the RM show this error:
{code}
// old session closed
2015-02-16 06:01:12,985 INFO org.apache.zookeeper.ZooKeeper: Session: 0x24b8df4044005d4 closed
// new session created and connected
2015-02-16 06:01:12,991 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete sessionid = 0x24b8df4044005d8, negotiated timeout = 1
2015-02-16 06:01:12,994 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: None with state:SyncConnected for path:null for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
// old session disconnected and EventThread shutdown
2015-02-16 06:01:12,995 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: None with state:Disconnected for path:null for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
2015-02-16 06:01:12,995 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
// Error: Wait for ZKClient creation timed out and RM shutdown
2015-02-16 06:01:13,095 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Storing application with id application_1424095053378_0010
2015-02-16 06:01:33,100 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error storing app: application_1424095053378_0010
java.io.IOException: Wait for ZKClient creation timed out
2015-02-16 06:01:33,107 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
{code}
The following ZooKeeper server logs show that the new session 0x24b8df4044005d8 stayed connected until the RM shut down at 2015-02-16 06:01:33.
{code}
2015-02-16 06:01:12,991 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x24b8df4044005d8 with negotiated timeout 1 for client
2015-02-16 06:01:33,886 WARN org.apache.zookeeper.server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x24b8df4044005d8, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:744)
2015-02-16 06:01:33,888 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client which had sessionid 0x24b8df4044005d8
{code}