[jira] [Updated] (YARN-3242) Asynchrony in ZK-close can lead to ZKRMStateStore watcher receiving events for old client

2015-09-01 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3242:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation and 
TestZKRMStateStoreZKClientConnections before the push. Patch applied cleanly.

> Asynchrony in ZK-close can lead to ZKRMStateStore watcher receiving events 
> for old client
> -
>
>                 Key: YARN-3242
>                 URL: https://issues.apache.org/jira/browse/YARN-3242
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>            Priority: Critical
>              Labels: 2.6.1-candidate
>             Fix For: 2.7.0, 2.6.1
>
> Attachments: YARN-3242.000.patch, YARN-3242.001.patch, 
> YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch
>
>
> A watcher event from an old ZK client session can corrupt the state of a new 
> ZK client session, because ZooKeeper closes client sessions asynchronously.
> A watcher event from the old ZK client session can still be delivered to 
> ZKRMStateStore after the old ZK client session is closed.
> This causes a serious problem: ZKRMStateStore gets out of sync with the 
> ZooKeeper session.
> We have only one ZKRMStateStore, but we can have multiple ZK client sessions.
> Currently ZKRMStateStore#processWatchEvent doesn't check whether a watcher 
> event comes from the current session, so a watcher event from an old ZK 
> client session that has just been closed is still processed.
> For example, if a Disconnected event is received from the old session after 
> the new session is connected, zkClient is set to null:
> {code}
> case Disconnected:
>   LOG.info("ZKRMStateStore Session disconnected");
>   oldZkClient = zkClient;
>   zkClient = null;
>   break;
> {code}
> ZKRMStateStore then won't receive a SyncConnected event from the new session, 
> because the new session is already in the SyncConnected state and won't send 
> another SyncConnected event until it is disconnected and connected again.
> From then on, all ZKRMStateStore operations fail with the IOException 
> "Wait for ZKClient creation timed out" until the RM shuts down.
> The following code from ZooKeeper (ClientCnxn#EventThread) shows that, even 
> after receiving eventOfDeath, EventThread still processes all events until 
> the waitingEvents queue is empty.
> {code}
> while (true) {
>   Object event = waitingEvents.take();
>   if (event == eventOfDeath) {
>     wasKilled = true;
>   } else {
>     processEvent(event);
>   }
>   if (wasKilled)
>     synchronized (waitingEvents) {
>       if (waitingEvents.isEmpty()) {
>         isRunning = false;
>         break;
>       }
>     }
> }
>
> private void processEvent(Object event) {
>   try {
>     if (event instanceof WatcherSetEventPair) {
>       // each watcher will process the event
>       WatcherSetEventPair pair = (WatcherSetEventPair) event;
>       for (Watcher watcher : pair.watchers) {
>         try {
>           watcher.process(pair.event);
>         } catch (Throwable t) {
>           LOG.error("Error while calling watcher ", t);
>         }
>       }
>     } else {
>       // ... (the rest of processEvent is cut off in this report)
>
> public void disconnect() {
>   if (LOG.isDebugEnabled()) {
>     LOG.debug("Disconnecting client for session: 0x"
>         + Long.toHexString(getSessionId()));
>   }
>   sendThread.close();
>   eventThread.queueEventOfDeath();
> }
>
> public void close() throws IOException {
>   if (LOG.isDebugEnabled()) {
>     LOG.debug("Closing client for session: 0x"
>         + Long.toHexString(getSessionId()));
>   }
>   try {
>     RequestHeader h = new RequestHeader();
>     h.setType(ZooDefs.OpCode.closeSession);
>     submitRequest(h, null, null, null);
>   } catch (InterruptedException e) {
>     // ignore, close the send/event threads
>   } finally {
>     disconnect();
>   }
> }
> {code}
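
The failure mode described above can be reproduced without ZooKeeper. The following is a minimal sketch, not the content of the attached patches: FakeClient, SessionState, GuardedStore and StaleEventDemo are hypothetical stand-ins, and the guard (compare the client that produced the event with the currently active client, and drop the event on mismatch) is only one possible way to make processWatchEvent session-aware.

```java
/** Hypothetical stand-in for a ZooKeeper client handle; only object identity matters. */
final class FakeClient {
    final String name;
    FakeClient(String name) { this.name = name; }
}

/** Hypothetical subset of the connection states mentioned in the report. */
enum SessionState { SyncConnected, Disconnected }

/** Hypothetical store whose watcher callback is told which client produced the event. */
final class GuardedStore {
    private FakeClient activeClient;
    private final boolean guardEnabled;

    GuardedStore(boolean guardEnabled) { this.guardEnabled = guardEnabled; }

    synchronized void setActiveClient(FakeClient client) { activeClient = client; }

    synchronized FakeClient getActiveClient() { return activeClient; }

    synchronized void processWatchEvent(FakeClient source, SessionState state) {
        // Assumed guard: ignore events that did not come from the active client.
        if (guardEnabled && source != activeClient) {
            return;
        }
        if (state == SessionState.Disconnected) {
            // Mirrors the "zkClient = null" branch quoted in the report.
            activeClient = null;
        }
    }
}

final class StaleEventDemo {
    /** Returns true if the new client is still active after a late event from the old one. */
    static boolean survivesStaleDisconnect(boolean guardEnabled) {
        GuardedStore store = new GuardedStore(guardEnabled);
        FakeClient oldClient = new FakeClient("old");
        FakeClient newClient = new FakeClient("new");

        store.setActiveClient(oldClient);
        // The old client is closed and a new session takes over.
        store.setActiveClient(newClient);
        // The old client's event thread is still draining its queue
        // and delivers a late Disconnected event.
        store.processWatchEvent(oldClient, SessionState.Disconnected);

        return store.getActiveClient() == newClient;
    }

    public static void main(String[] args) {
        System.out.println("without guard: " + survivesStaleDisconnect(false));
        System.out.println("with guard:    " + survivesStaleDisconnect(true));
    }
}
```

Without the guard, the late Disconnected event clears the active client even though the new session is healthy, which matches the "Wait for ZKClient creation timed out" symptom. With the guard, the stale event is dropped and the new client stays in place.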



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3242) Asynchrony in ZK-close can lead to ZKRMStateStore watcher receiving events for old client

2015-07-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3242:
--
Labels: 2.6.1-candidate  (was: )

> Asynchrony in ZK-close can lead to ZKRMStateStore watcher receiving events 
> for old client
> -
>
>                 Key: YARN-3242
>                 URL: https://issues.apache.org/jira/browse/YARN-3242
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>            Priority: Critical
>              Labels: 2.6.1-candidate
>             Fix For: 2.7.0
>
> Attachments: YARN-3242.000.patch, YARN-3242.001.patch, 
> YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch




[jira] [Updated] (YARN-3242) Asynchrony in ZK-close can lead to ZKRMStateStore watcher receiving events for old client

2015-03-04 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3242:
---
Summary: Asynchrony in ZK-close can lead to ZKRMStateStore watcher 
receiving events for old client  (was: Old ZK client session watcher event 
causes ZKRMStateStore out of sync with current ZK client session due to 
ZooKeeper asynchronously closing client session.)

> Asynchrony in ZK-close can lead to ZKRMStateStore watcher receiving events 
> for old client
> -
>
>                 Key: YARN-3242
>                 URL: https://issues.apache.org/jira/browse/YARN-3242
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>            Priority: Critical
> Attachments: YARN-3242.000.patch, YARN-3242.001.patch, 
> YARN-3242.002.patch, YARN-3242.003.patch, YARN-3242.004.patch

