[jira] [Commented] (YARN-3795) ZKRMStateStore crashes due to IOException: Broken pipe
[ https://issues.apache.org/jira/browse/YARN-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15695267#comment-15695267 ]

stefanlee commented on YARN-3795:

Hi, I have the same problem, but my scenario is different. When I failed over from RM2 to RM1, the ZooKeeper on RM1 reported a large number of watchers while RM1 itself was healthy. I then rebooted the ZooKeeper on RM1. After that, RM1's web UI could not be accessed, RM1's log contained many "Broken pipe" messages, and "java.io.IOException: Len error" appeared in the ZK server's log. So I would like to know: was your ZK healthy when the above problem occurred?

> ZKRMStateStore crashes due to IOException: Broken pipe
> ------------------------------------------------------
>
>                 Key: YARN-3795
>                 URL: https://issues.apache.org/jira/browse/YARN-3795
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.5.0
>            Reporter: lachisis
>            Priority: Critical
>             Fix For: 2.7.1
>
>
> 2015-06-05 06:06:54,848 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to dap88/134.41.33.88:2181, initiating session
> 2015-06-05 06:06:54,876 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server dap88/134.41.33.88:2181, sessionid = 0x34db2f72ac50c86, negotiated timeout = 1
> 2015-06-05 06:06:54,881 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: None with state:SyncConnected for path:null for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
> 2015-06-05 06:06:54,881 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected
> 2015-06-05 06:06:54,881 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored
> 2015-06-05 06:06:54,881 WARN org.apache.zookeeper.ClientCnxn: Session 0x34db2f72ac50c86 for server dap88/134.41.33.88:2181, unexpected error, closing socket connection and attempting reconnect
> java.io.IOException: Broken pipe
>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:94)
>     at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:450)
>     at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:117)
>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1075)
> 2015-06-05 06:06:54,986 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: None with state:Disconnected for path:null for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
> 2015-06-05 06:06:54,986 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session disconnected
> 2015-06-05 06:06:55,278 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server dap87/134.41.33.87:2181. Will not attempt to authenticate using SASL (unknown error)
> 2015-06-05 06:06:55,278 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to dap87/134.41.33.87:2181, initiating session
> 2015-06-05 06:06:55,330 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server dap87/134.41.33.87:2181, sessionid = 0x34db2f72ac50c86, negotiated timeout = 1
> 2015-06-05 06:06:55,343 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: None with state:SyncConnected for path:null for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
> 2015-06-05 06:06:55,343 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected
> 2015-06-05 06:06:55,344 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored
> 2015-06-05 06:06:55,345 WARN org.apache.zookeeper.ClientCnxn: Session 0x34db2f72ac50c86 for server dap87/134.41.33.87:2181, unexpected error, closing socket connection and attempting reconnect
> java.io.IOException: Broken pipe
>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:94)
>     at sun.ni
[jira] [Commented] (YARN-3795) ZKRMStateStore crashes due to IOException: Broken pipe
[ https://issues.apache.org/jira/browse/YARN-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581589#comment-14581589 ]

lachisis commented on YARN-3795:

Hmm, could anyone tell me how to close this issue? I find that YARN-3469 has resolved the problem.
[jira] [Commented] (YARN-3795) ZKRMStateStore crashes due to IOException: Broken pipe
[ https://issues.apache.org/jira/browse/YARN-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581584#comment-14581584 ]

lachisis commented on YARN-3795:

Oh, I checked YARN-3469. It seems to resolve the problem. One moment...
[jira] [Commented] (YARN-3795) ZKRMStateStore crashes due to IOException: Broken pipe
[ https://issues.apache.org/jira/browse/YARN-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581580#comment-14581580 ]

lachisis commented on YARN-3795:

But I don't think changing the jute.maxbuffer size is a good approach, because there are no large znodes in ZKRMStateStore. This exception is caused by the large number of watchers, and I think these watchers are not necessary.
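To illustrate why a large number of watchers alone could push a request past the buffer limit, here is a rough sketch. It assumes that on reconnect the client resends all registered watch paths in a single request, and that each path is serialized as a 4-byte length followed by its UTF-8 bytes; request headers are ignored. The class name, the znode path layout, and the watcher count are hypothetical, chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SetWatchesEstimate {
    // Approximate payload size of one request that carries every watched path.
    // Assumption: 8 bytes for a zxid, 4 bytes for each of three list counts,
    // then a 4-byte length prefix plus the UTF-8 bytes for every path.
    static long estimateBytes(List<String> watchedPaths) {
        long total = 8 + 3 * 4;
        for (String path : watchedPaths) {
            total += 4 + path.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical watch paths; the real znode layout may differ.
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 20000; i++) {
            paths.add("/rmstore/ZKRMStateRoot/RMAppRoot/application_1433000000000_"
                    + String.format("%04d", i));
        }
        System.out.println("watches=" + paths.size()
                + " estimatedBytes=" + estimateBytes(paths));
    }
}
```

Under these assumptions the size grows linearly with the watcher count, so no single large znode is needed to exceed the limit.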
[jira] [Commented] (YARN-3795) ZKRMStateStore crashes due to IOException: Broken pipe
[ https://issues.apache.org/jira/browse/YARN-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581576#comment-14581576 ]

lachisis commented on YARN-3795:

Yes, I have found "Len error" in the ZooKeeper server log, as follows:

2015-06-05 06:06:52,976 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZookeeperServer@897] - auth success /134.41.33.88:49189
2015-06-05 06:06:53,007 [myid:2] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x34db2f72ac50c86 due to java.io.IOException: Len error 1113979
2015-06-05 06:06:53,008 [myid:2] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Close socket connection for client /134.41.33.88:49189 which had sessionid 0x34db2f72ac50c86
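For reference, a small sketch of how the reported length compares with the buffer limit. It assumes the server is running with the documented default for jute.maxbuffer (0xfffff bytes, just under 1 MB); the class and method names are made up for illustration and are not ZooKeeper code.

```java
public class LenErrorCheck {
    // Documented ZooKeeper default for jute.maxbuffer (assumed to be in effect here).
    static final int DEFAULT_JUTE_MAXBUFFER = 0xfffff;

    // True if a request of requestLen bytes falls outside the allowed range.
    static boolean exceedsLimit(int requestLen, int maxBuffer) {
        return requestLen < 0 || requestLen > maxBuffer;
    }

    public static void main(String[] args) {
        int reported = 1113979; // length taken from the "Len error" line in the server log
        System.out.println("limit=" + DEFAULT_JUTE_MAXBUFFER
                + " reported=" + reported
                + " exceeds=" + exceedsLimit(reported, DEFAULT_JUTE_MAXBUFFER)
                + " overBy=" + (reported - DEFAULT_JUTE_MAXBUFFER));
    }
}
```

If the default is in effect, the reported request is larger than the limit, which would be consistent with the server closing the session.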
[jira] [Commented] (YARN-3795) ZKRMStateStore crashes due to IOException: Broken pipe
[ https://issues.apache.org/jira/browse/YARN-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581553#comment-14581553 ]

zhihai xu commented on YARN-3795:

Hi [~lachisis], thanks for reporting this issue.
Most likely, the Broken pipe is caused by a Len error at the ZooKeeper server. To confirm this, could you check the ZooKeeper server logs for the following message:
{code}
WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x due to java.io.IOException: Len error ???
{code}
You can work around the Len error by increasing the jute.maxbuffer size on the ZooKeeper server, or you can try YARN-3469.
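The log check suggested above could be scripted roughly as follows. This is only a sketch: the class name is hypothetical, the log file path is passed as an argument, and it simply looks for the text "Len error" followed by a number on each line.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LenErrorScan {
    // Matches the length reported in a "Len error <n>" message.
    private static final Pattern LEN_ERROR = Pattern.compile("Len error (\\d+)");

    // Returns the reported lengths of all "Len error" lines, in order of appearance.
    static List<Integer> findLenErrors(List<String> logLines) {
        List<Integer> lengths = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = LEN_ERROR.matcher(line);
            if (m.find()) {
                lengths.add(Integer.parseInt(m.group(1)));
            }
        }
        return lengths;
    }

    public static void main(String[] args) throws Exception {
        // With an argument, scan that file; otherwise scan a made-up sample line.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList("WARN sample: java.io.IOException: Len error 12345");
        System.out.println("Len error lengths: " + findLenErrors(lines));
    }
}
```

Any length it reports can then be compared against the jute.maxbuffer value configured on the server.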
[jira] [Commented] (YARN-3795) ZKRMStateStore crashes due to IOException: Broken pipe
[ https://issues.apache.org/jira/browse/YARN-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581537#comment-14581537 ]

lachisis commented on YARN-3795:

But I think most of these watchers in ZKRMStateStore are not necessary.
[jira] [Commented] (YARN-3795) ZKRMStateStore crashes due to IOException: Broken pipe
[ https://issues.apache.org/jira/browse/YARN-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581534#comment-14581534 ]

lachisis commented on YARN-3795:

It would be better if ZooKeeper fixed ZOOKEEPER-706.
[jira] [Commented] (YARN-3795) ZKRMStateStore crashes due to IOException: Broken pipe
[ https://issues.apache.org/jira/browse/YARN-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581531#comment-14581531 ]

lachisis commented on YARN-3795:

I have found ZOOKEEPER-706. It means that if the ZooKeeper server receives a request whose body is larger than 1 MB, the server rejects the request, which shows up on the client as the "Broken pipe" exception. This limit exists to cap the data size of a znode. Scanning the ZooKeeper snapshot, I did not find any znode created by ZKRMStateStore with a large data size. Analyzing the code, I then found that a large number of watchers are set when the functions "loadRMAppState" and "loadApplicationAttemptState" are called.
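The size limit described in the comment above (focusedCommentId=14581531) can be illustrated with a rough estimate. The following is only a sketch under stated assumptions, not code from ZooKeeper or Hadoop: it assumes that on reconnect the client re-registers all of its watches in a single SetWatches request, that each watched path is encoded as a 4-byte length plus its UTF-8 bytes, and that the server limit is the default jute.maxbuffer of 0xfffff bytes. The class name, the helper methods, and the znode paths (including the cluster timestamp in them) are hypothetical and chosen only for illustration; the count of 7000 applications comes from the report in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SetWatchesSizeEstimate {
    // Default jute.maxbuffer on the ZooKeeper server: 0xfffff bytes, just under 1 MiB.
    static final int DEFAULT_JUTE_MAXBUFFER = 0xfffff;

    // Assumed wire encoding of one path: 4-byte length prefix plus UTF-8 bytes.
    static long encodedStringSize(String s) {
        return 4L + s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Rough size of one request that re-registers every watched path.
    // Assumed layout: xid (4) + type (4) + relativeZxid (8)
    // + element counts of the three watch lists (3 * 4) + the encoded paths.
    static long estimateSetWatchesSize(List<String> watchedPaths) {
        long size = 4 + 4 + 8 + 3 * 4;
        for (String path : watchedPaths) {
            size += encodedStringSize(path);
        }
        return size;
    }

    // Hypothetical paths: one watch per application znode and one per attempt znode.
    static List<String> hypotheticalWatchedPaths(int appCount) {
        List<String> paths = new ArrayList<>();
        for (int i = 1; i <= appCount; i++) {
            String app = String.format(
                "/rmstore/ZKRMStateRoot/RMAppRoot/application_1433000000000_%04d", i);
            paths.add(app);
            paths.add(app + String.format("/appattempt_1433000000000_%04d_000001", i));
        }
        return paths;
    }

    public static void main(String[] args) {
        long size = estimateSetWatchesSize(hypotheticalWatchedPaths(7000));
        System.out.println("estimated SetWatches size: " + size + " bytes");
        System.out.println("jute.maxbuffer default:    " + DEFAULT_JUTE_MAXBUFFER + " bytes");
        System.out.println("exceeds limit: " + (size > DEFAULT_JUTE_MAXBUFFER));
    }
}
```

With these assumed path lengths, 7000 applications with one attempt each already push the estimate above the default limit, even though no single znode holds a large amount of data. That would be consistent with the observation that the snapshot contains no oversized znode.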
[jira] [Commented] (YARN-3795) ZKRMStateStore crashes due to IOException: Broken pipe
[ https://issues.apache.org/jira/browse/YARN-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581517#comment-14581517 ]

lachisis commented on YARN-3795:

This exception appeared two days ago on a YARN platform. There are about 7000+ history jobs in rmstore. At one point, the active ResourceManager detected a session expiry and transitioned to standby (transitionToStandby). Meanwhile, the standby ResourceManager started to transition to active (transitionToActive), but threw the exception attached above.