[jira] [Commented] (YARN-3795) ZKRMStateStore crashes due to IOException: Broken pipe

2016-11-25 Thread stefanlee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15695267#comment-15695267
 ] 

stefanlee commented on YARN-3795:
-

Hi, I have the same problem, but my scenario is different. When I failed over 
from RM2 to RM1, the ZooKeeper on RM1 reported a large number of watchers, 
while RM1 itself was healthy. I then rebooted the ZooKeeper on RM1. After that, 
RM1's web UI could not be accessed, a lot of "Broken pipe" messages appeared in 
RM1's log, and "java.io.IOException: Len error" appeared in the ZK server's 
log. So I want to know: was your ZK healthy when the above problem occurred?

> ZKRMStateStore crashes due to IOException: Broken pipe
> --
>
> Key: YARN-3795
> URL: https://issues.apache.org/jira/browse/YARN-3795
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: lachisis
>Priority: Critical
> Fix For: 2.7.1
>
>
> 2015-06-05 06:06:54,848 INFO org.apache.zookeeper.ClientCnxn: Socket 
> connection established to dap88/134.41.33.88:2181, initiating session
> 2015-06-05 06:06:54,876 INFO org.apache.zookeeper.ClientCnxn: Session 
> establishment complete on server dap88/134.41.33.88:2181, sessionid = 
> 0x34db2f72ac50c86, negotiated timeout = 1
> 2015-06-05 06:06:54,881 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> Watcher event type: None with state:SyncConnected for path:null for Service 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
> 2015-06-05 06:06:54,881 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session connected
> 2015-06-05 06:06:54,881 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session restored
> 2015-06-05 06:06:54,881 WARN org.apache.zookeeper.ClientCnxn: Session 
> 0x34db2f72ac50c86 for server dap88/134.41.33.88:2181, unexpected error, 
> closing socket connection and attempting reconnect
> java.io.IOException: Broken pipe
>   at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>   at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>   at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:94)
>   at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:450)
>   at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:117)
>   at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1075)
> 2015-06-05 06:06:54,986 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> Watcher event type: None with state:Disconnected for path:null for Service 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
> 2015-06-05 06:06:54,986 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session disconnected
> 2015-06-05 06:06:55,278 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
> connection to server dap87/134.41.33.87:2181. Will not attempt to 
> authenticate using SASL (unknown error)
> 2015-06-05 06:06:55,278 INFO org.apache.zookeeper.ClientCnxn: Socket 
> connection established to dap87/134.41.33.87:2181, initiating session
> 2015-06-05 06:06:55,330 INFO org.apache.zookeeper.ClientCnxn: Session 
> establishment complete on server dap87/134.41.33.87:2181, sessionid = 
> 0x34db2f72ac50c86, negotiated timeout = 1
> 2015-06-05 06:06:55,343 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> Watcher event type: None with state:SyncConnected for path:null for Service 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
> 2015-06-05 06:06:55,343 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session connected
> 2015-06-05 06:06:55,344 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session restored
> 2015-06-05 06:06:55,345 WARN org.apache.zookeeper.ClientCnxn: Session 
> 0x34db2f72ac50c86 for server dap87/134.41.33.87:2181, unexpected error, 
> closing socket connection and attempting reconnect
> java.io.IOException: Broken pipe
>   at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>   at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>   at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:94)
>   at 

[jira] [Commented] (YARN-3795) ZKRMStateStore crashes due to IOException: Broken pipe

2015-06-11 Thread lachisis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581576#comment-14581576
 ] 

lachisis commented on YARN-3795:


Yes, I found the "Len error" in the ZooKeeper server log, as follows:
2015-06-05 06:06:52,976 [myid:2] - INFO 
[NIOServerCnxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@897] - auth success 
/134.41.33.88:49189
2015-06-05 06:06:53,007 [myid:2] - WARN 
[NIOServerCnxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
causing close of session 0x34db2f72ac50c86 due to java.io.IOException: Len 
error 1113979
2015-06-05 06:06:53,008 [myid:2] - WARN 
[NIOServerCnxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Close socket 
connection for client /134.41.33.88:49189 which had sessionid 0x34db2f72ac50c86 
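
A note on the number in that log: as far as I know, the ZooKeeper server rejects any request whose length prefix is negative or larger than jute.maxbuffer, and the default for that setting is 0xfffff (1048575) bytes. The reported length of 1113979 is above that default. The sketch below only illustrates that comparison. It is not the ZooKeeper source; the class and method names are hypothetical, and the default value is an assumption if your server overrides jute.maxbuffer.

```java
public class LenErrorCheck {
    // Assumed default for jute.maxbuffer: 0xfffff bytes, just under 1 MiB.
    public static final int DEFAULT_JUTE_MAXBUFFER = 0xfffff;

    // Illustrates the kind of bounds check that leads to "Len error <n>":
    // a length that is negative or above the configured maximum is rejected.
    public static boolean isRejected(int len, int maxBuffer) {
        return len < 0 || len > maxBuffer;
    }

    public static void main(String[] args) {
        int reported = 1113979; // length taken from the server log above
        System.out.println("limit    = " + DEFAULT_JUTE_MAXBUFFER);
        System.out.println("reported = " + reported);
        System.out.println("rejected = " + isRejected(reported, DEFAULT_JUTE_MAXBUFFER));
    }
}
```

So the request that the server closed the connection on was only slightly larger than the default limit, which fits a payload that grew gradually rather than a single oversized znode.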

[jira] [Commented] (YARN-3795) ZKRMStateStore crashes due to IOException: Broken pipe

2015-06-11 Thread lachisis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581580#comment-14581580
 ] 

lachisis commented on YARN-3795:


But I do not think changing the jute.maxbuffer size is a good approach, 
because there are no large znodes in ZKRMStateStore. This exception is caused 
by the large number of watchers.
And I think these watchers are not necessary.
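
To make the watcher argument concrete: my understanding (see ZOOKEEPER-706) is that when a client reconnects it re-registers its outstanding watches in a single request, so the request size grows with the number and length of the watched paths rather than with any znode's data. The sketch below is a rough estimate only. It assumes each path is sent as a 4-byte length prefix plus its UTF-8 bytes, ignores fixed headers, and uses made-up paths; the class name and the path pattern are hypothetical and not taken from ZKRMStateStore.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class WatchPayloadEstimate {
    // Assumed encoding of one path: 4-byte length prefix plus UTF-8 bytes.
    public static long encodedSize(String path) {
        return 4L + path.getBytes(StandardCharsets.UTF_8).length;
    }

    // Sum of the encoded sizes of all watched paths (headers ignored).
    public static long estimate(List<String> watchedPaths) {
        long total = 0;
        for (String p : watchedPaths) {
            total += encodedSize(p);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical paths, only to show how the total scales with count.
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            paths.add(String.format("/rmstore/hypothetical/app_%010d", i));
        }
        long bytes = estimate(paths);
        System.out.println(paths.size() + " watches, roughly " + bytes + " bytes");
        System.out.println("over default limit: " + (bytes > 0xfffff));
    }
}
```

Under these assumptions the total depends only on how many watches are outstanding and how long their paths are, which is why removing unneeded watchers addresses the cause while raising jute.maxbuffer only moves the threshold.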


[jira] [Commented] (YARN-3795) ZKRMStateStore crashes due to IOException: Broken pipe

2015-06-11 Thread lachisis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581584#comment-14581584
 ] 

lachisis commented on YARN-3795:


Oh, I checked YARN-3469. It seems to resolve the problem. 
One moment...




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3795) ZKRMStateStore crashes due to IOException: Broken pipe

2015-06-11 Thread lachisis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581589#comment-14581589
 ] 

lachisis commented on YARN-3795:


Hmm, could anyone tell me how to close the issue?
I found that YARN-3469 has resolved the problem.


[jira] [Commented] (YARN-3795) ZKRMStateStore crashes due to IOException: Broken pipe

2015-06-11 Thread lachisis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581534#comment-14581534
 ] 

lachisis commented on YARN-3795:


It would be better if ZooKeeper fixed ZOOKEEPER-706. 


[jira] [Commented] (YARN-3795) ZKRMStateStore crashes due to IOException: Broken pipe

2015-06-11 Thread lachisis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581537#comment-14581537
 ] 

lachisis commented on YARN-3795:


But I think most of these watchers in ZKRMStateStore are not necessary.


[jira] [Commented] (YARN-3795) ZKRMStateStore crashes due to IOException: Broken pipe

2015-06-11 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581553#comment-14581553
 ] 

zhihai xu commented on YARN-3795:
-

Hi [~lachisis], thanks for reporting this issue.
Most likely, the Broken pipe is caused by a Len error at the ZooKeeper server.
To confirm this, could you check the ZooKeeper server logs for a message like 
the following:
{code}
WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of 
session 0x due to java.io.IOException: Len error ???
{code}

You can work around the Len error by increasing jute.maxbuffer at the 
ZooKeeper server, or you can try YARN-3469.
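
To make the workaround concrete, here is a minimal sketch of the kind of length check that produces the Len error. It is an illustration only, not ZooKeeper's actual source: the class LenCheckSketch and the methods maxBuffer and checkLength are hypothetical names, and the only facts assumed are that the limit is read from the jute.maxbuffer system property and defaults to 0xfffff bytes (just under 1 MB).

```java
import java.io.IOException;

// Illustrative sketch only; class and method names are hypothetical.
public class LenCheckSketch {
    // Default limit assumed here: 0xfffff bytes, just under 1 MB.
    static final int DEFAULT_MAX_BUFFER = 0xfffff;

    // Read the limit from the jute.maxbuffer system property, falling back to the default.
    static int maxBuffer() {
        return Integer.getInteger("jute.maxbuffer", DEFAULT_MAX_BUFFER);
    }

    // Reject a request whose declared length is negative or above the limit.
    static void checkLength(int len, int limit) throws IOException {
        if (len < 0 || len > limit) {
            throw new IOException("Len error " + len);
        }
    }

    public static void main(String[] args) {
        int limit = maxBuffer();
        int[] sizes = {512 * 1024, 2 * 1024 * 1024};
        for (int size : sizes) {
            try {
                checkLength(size, limit);
                System.out.println(size + " bytes: accepted");
            } catch (IOException e) {
                System.out.println(size + " bytes: rejected (" + e.getMessage() + ")");
            }
        }
    }
}
```

Running the sketch with a larger limit, for example `java -Djute.maxbuffer=4194304 LenCheckSketch`, lets the 2 MB request through, which is the effect the workaround aims for on the server side.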
 


[jira] [Commented] (YARN-3795) ZKRMStateStore crashes due to IOException: Broken pipe

2015-06-11 Thread lachisis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581517#comment-14581517
 ] 

lachisis commented on YARN-3795:


This exception appeared two days ago on a YARN platform.
There are about 7000+ historical jobs in rmstore. At one point, the active 
ResourceManager found that its session had expired and ran transitionToStandby. 
Meanwhile, the standby ResourceManager started transitionToActive but threw 
the exception attached above.


[jira] [Commented] (YARN-3795) ZKRMStateStore crashes due to IOException: Broken pipe

2015-06-11 Thread lachisis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581531#comment-14581531
 ] 

lachisis commented on YARN-3795:


I have found ZOOKEEPER-706. It means that if the ZooKeeper server receives a 
request whose body is larger than 1 MB, the server rejects the request and the 
client sees the Broken pipe exception.
This limit is used to restrict the body size of a znode.

Scanning the ZooKeeper snapshot, I did not find any znode created by 
ZKRMStateStore with a large data size. 
Analyzing the code, I then found that a large number of watchers are set when 
loadRMAppState and loadApplicationAttemptState are called. 
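
As a rough check of whether watcher paths alone could exceed the limit, the following sketch estimates the size of a single request that re-registers every watch on reconnect. Everything in it is an assumption for illustration: the class SetWatchesEstimate, the znode path layout, one application path plus one attempt path per job, and the size model (an 8-byte zxid, three 4-byte list counts, and a 4-byte length prefix plus the UTF-8 bytes for each path).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative estimate only; path layout and size model are assumptions.
public class SetWatchesEstimate {
    // Fixed overhead assumed: 8-byte zxid plus three 4-byte list counts.
    static final long FIXED_OVERHEAD = 8 + 3 * 4;

    // Sum the assumed per-path cost: 4-byte length prefix plus UTF-8 bytes.
    static long estimateBytes(List<String> watchedPaths) {
        long total = FIXED_OVERHEAD;
        for (String path : watchedPaths) {
            total += 4 + path.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    // Build hypothetical application and attempt paths for the given number of jobs.
    static List<String> samplePaths(int jobs) {
        List<String> paths = new ArrayList<>();
        String root = "/rmstore/ZKRMStateRoot/RMAppRoot/";
        for (int i = 1; i <= jobs; i++) {
            String id = String.format("1433000000000_%04d", i);
            String app = root + "application_" + id;
            paths.add(app);
            paths.add(app + "/appattempt_" + id + "_000001");
        }
        return paths;
    }

    public static void main(String[] args) {
        long estimate = estimateBytes(samplePaths(7000));
        long limit = 0xfffff; // assumed default jute.maxbuffer
        System.out.println("estimated bytes: " + estimate + ", limit: " + limit);
        System.out.println(estimate > limit ? "exceeds limit" : "within limit");
    }
}
```

Under these assumptions, 7000 jobs are enough to push the request past the default limit even though no single znode is large, which would be consistent with the snapshot scan above.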


