[jira] [Commented] (YARN-3242) Old ZK client session watcher event messed up new ZK client session due to ZooKeeper asynchronously closing client session.

2015-02-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332105#comment-14332105
 ] 

Hadoop QA commented on YARN-3242:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700085/YARN-3242.001.patch
  against trunk revision fe7a302.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6691//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6691//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6691//console

This message is automatically generated.

> Old ZK client session watcher event messed up new ZK client session due to 
> ZooKeeper asynchronously closing client session.
> ---
>
> Key: YARN-3242
> URL: https://issues.apache.org/jira/browse/YARN-3242
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: zhihai xu
> Assignee: zhihai xu
> Priority: Critical
> Attachments: YARN-3242.000.patch, YARN-3242.001.patch
>
>
> A watcher event from an old ZK client session can corrupt the state of a new 
> ZK client session, because ZooKeeper closes client sessions asynchronously.
> A watcher event from the old ZK client session can still be delivered to 
> ZKRMStateStore after the old ZK client session is closed.
> This causes a serious problem: ZKRMStateStore gets out of sync with the 
> ZooKeeper session.
> There is only one ZKRMStateStore, but there can be multiple ZK client sessions.
> Currently ZKRMStateStore#processWatchEvent doesn't check whether a watcher 
> event comes from the current session, so a watcher event from an old ZK 
> client session that was just closed is still processed.
> For example, if a Disconnected event is received from the old session after 
> the new session is connected, zkClient is set to null:
> {code}
> case Disconnected:
>   LOG.info("ZKRMStateStore Session disconnected");
>   oldZkClient = zkClient;
>   zkClient = null;
>   break;
> {code}
> ZKRMStateStore then never receives a SyncConnected event from the new 
> session, because the new session is already in the SyncConnected state and 
> won't send another SyncConnected event until it is disconnected and 
> connected again.
> As a result, all ZKRMStateStore operations fail with IOException 
> "Wait for ZKClient creation timed out" until the RM shuts down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3242) Old ZK client session watcher event messed up new ZK client session due to ZooKeeper asynchronously closing client session.

2015-02-22 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332077#comment-14332077
 ] 

zhihai xu commented on YARN-3242:
-

I found that oldZkClient is no longer needed; the newly added activeZkClient 
can replace it.
I uploaded a new patch, YARN-3242.001.patch, which removes oldZkClient.



[jira] [Commented] (YARN-3242) Old ZK client session watcher event messed up new ZK client session due to ZooKeeper asynchronously closing client session.

2015-02-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332060#comment-14332060
 ] 

Hadoop QA commented on YARN-3242:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700076/YARN-3242.000.patch
  against trunk revision fe7a302.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6690//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6690//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6690//console

This message is automatically generated.



[jira] [Commented] (YARN-3242) Old ZK client session watcher event messed up new ZK client session due to ZooKeeper asynchronously closing client session.

2015-02-21 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332037#comment-14332037
 ] 

zhihai xu commented on YARN-3242:
-

I uploaded a draft patch that processes watcher events only from the current 
ZooKeeper client session.
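
The idea can be illustrated with a minimal stand-alone sketch. This is not the attached patch: Client, State and SessionGuardedStore are hypothetical stand-ins rather than the ZooKeeper or ZKRMStateStore classes, and the handler is reduced to two transitions. The sketch assumes the store remembers the most recently created client (called activeZkClient here) and drops any event whose source is a different client.

```java
public class SessionGuardedStore {
  /** Simplified connection states; the real ZooKeeper enum has more values. */
  enum State { SyncConnected, Disconnected }

  /** Hypothetical stand-in for a ZooKeeper client handle; only identity matters. */
  static final class Client {
    final String sessionId;
    Client(String sessionId) { this.sessionId = sessionId; }
  }

  private Client activeZkClient; // most recently created client
  private Client zkClient;       // usable client, null while disconnected

  /** Called whenever a new client session is created. */
  synchronized void startSession(Client fresh) {
    activeZkClient = fresh;
    zkClient = null; // not usable until its SyncConnected event arrives
  }

  /** Returns true if the event was applied, false if it was ignored as stale. */
  synchronized boolean processWatchEvent(Client source, State state) {
    if (source != activeZkClient) {
      return false; // event belongs to an older session: ignore it
    }
    switch (state) {
      case SyncConnected:
        zkClient = source;
        break;
      case Disconnected:
        zkClient = null;
        break;
    }
    return true;
  }

  synchronized Client current() { return zkClient; }

  public static void main(String[] args) {
    Client oldSession = new Client("0x24b8df4044005d4");
    Client newSession = new Client("0x24b8df4044005d8");
    SessionGuardedStore store = new SessionGuardedStore();

    store.startSession(oldSession);
    store.processWatchEvent(oldSession, State.SyncConnected);

    // The old session is closed and a new one is created and connected.
    store.startSession(newSession);
    store.processWatchEvent(newSession, State.SyncConnected);

    // The late Disconnected event from the old session is dropped.
    boolean applied = store.processWatchEvent(oldSession, State.Disconnected);
    System.out.println("stale event applied: " + applied);
    System.out.println("still on new session: " + (store.current() == newSession));
  }
}
```

Comparing by object identity keeps the check cheap and avoids relying on session ids, which may not be assigned yet when the first event arrives.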



[jira] [Commented] (YARN-3242) Old ZK client session watcher event messed up new ZK client session due to ZooKeeper asynchronously closing client session.

2015-02-21 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332030#comment-14332030
 ] 

zhihai xu commented on YARN-3242:
-

The following ZooKeeper client logs from the RM show this error:
{code}
// old session closed
2015-02-16 06:01:12,985 INFO org.apache.zookeeper.ZooKeeper: Session: 0x24b8df4044005d4 closed

// new session created and connected
2015-02-16 06:01:12,991 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete sessionid = 0x24b8df4044005d8, negotiated timeout = 1
2015-02-16 06:01:12,994 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: None with state:SyncConnected for path:null for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED

// old session disconnected and EventThread shutdown
2015-02-16 06:01:12,995 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: None with state:Disconnected for path:null for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
2015-02-16 06:01:12,995 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down

// Error: Wait for ZKClient creation timed out and RM shutdown
2015-02-16 06:01:13,095 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Storing application with id application_1424095053378_0010
2015-02-16 06:01:33,100 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error storing app: application_1424095053378_0010
java.io.IOException: Wait for ZKClient creation timed out
2015-02-16 06:01:33,107 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
{code}
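
The event ordering in the client log above can be replayed against a small stand-alone model. This is only a sketch: Client, State and UnguardedStore are hypothetical stand-ins (not the ZooKeeper or ZKRMStateStore classes), and the handler is reduced to the two transitions that matter here. It assumes, as the issue description states, that the handler applies every event without checking which session sent it.

```java
public class StaleWatcherReplay {
  /** Simplified connection states; the real ZooKeeper enum has more values. */
  enum State { SyncConnected, Disconnected }

  /** Hypothetical stand-in for a ZooKeeper client handle; only identity matters. */
  static final class Client {
    final String sessionId;
    Client(String sessionId) { this.sessionId = sessionId; }
  }

  /** Applies every watcher event, regardless of which session sent it. */
  static final class UnguardedStore {
    Client zkClient; // null means "no usable client"

    void processWatchEvent(Client source, State state) {
      switch (state) {
        case SyncConnected:
          zkClient = source;
          break;
        case Disconnected:
          zkClient = null; // also runs for events from a closed session
          break;
      }
    }
  }

  public static void main(String[] args) {
    Client oldSession = new Client("0x24b8df4044005d4");
    Client newSession = new Client("0x24b8df4044005d8");
    UnguardedStore store = new UnguardedStore();

    store.processWatchEvent(oldSession, State.SyncConnected);
    // 06:01:12,985 old session closed
    // 06:01:12,994 new session reports SyncConnected
    store.processWatchEvent(newSession, State.SyncConnected);
    // 06:01:12,995 late Disconnected event from the old session
    store.processWatchEvent(oldSession, State.Disconnected);

    // The new session is healthy, but the store no longer has a client.
    System.out.println("zkClient is null: " + (store.zkClient == null));
  }
}
```

In the model, nothing after the last event ever restores zkClient, which corresponds to the 20-second wait and the "Wait for ZKClient creation timed out" error in the log.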

The following ZooKeeper server logs show that the new session 
0x24b8df4044005d8 stayed connected until the RM shut down at 2015-02-16 06:01:33.
{code}
2015-02-16 06:01:12,991 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x24b8df4044005d8 with negotiated timeout 1 for client

2015-02-16 06:01:33,886 WARN org.apache.zookeeper.server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x24b8df4044005d8, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:744)
2015-02-16 06:01:33,888 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client which had sessionid 0x24b8df4044005d8
{code}
