I haven't see any issues with large hbase clusters, however the the hbase community might be a better place to get more insight into that.
Patrick On Wed, Sep 17, 2014 at 1:55 AM, tobe <[email protected]> wrote: > @patrick Thanks for your concern and I feel sorry about this. The reason of > my problem is nothing about ZooKeeper. It's TaoKeeper which is used to > monitor ZooKeeper and send these commands periodically. Large number of > watches(more than 700K) may cause Broken Pipe when using wchs to get the > detailed status of znodes. > > The large number of znodes are from HBase. We will store all the files for > replication in ZooKeeper and it will be large when the replication is out > of date. And I'm wondering if the HBase cluster scales to thousands of > servers, can ZooKeeper handle all the meta data of HBase? > > > On Wed, Sep 17, 2014 at 2:47 PM, tobe <[email protected]> wrote: > >> @patrick The ports of these commands are sequential. Here're 48668, 48669, >> 48670. So is it the periodically check of ZooKeeper? We mainly use >> ZooKeeper for HDFS and HBase. Other processes in this server will not sent >> 4lw to ZooKeeper. >> >> >> On Wed, Sep 17, 2014 at 2:19 PM, tobe <[email protected]> wrote: >> >>> >>> I have found that the command is called from the same host and run >>> `stat`, `wchs`, `wchc` and `cons` as well. Here'is log from another >>> ZooKeeper cluster. It works well because it doesn't have a lot of znodes. >>> But I wonder why it call the commands frequently? We're using >>> ZooKeeper-3.4.4 and any patch about this? >>> >>> Log from 10.0.4.161 ZooKeeper: >>> 2014-09-17,08:26:38,336 INFO org.apache.zookeeper.server.NIOServerCnxn: >>> [myid:0] Processing stat command from /10.0.4.161:48668 >>> 2014-09-17,08:26:38,336 INFO org.apache.zookeeper.server.NIOServerCnxn: >>> [myid:0] Stat command output >>> 2014-09-17,08:26:38,337 INFO org.apache.zookeeper.server.NIOServerCnxn: >>> [myid:0] Closed socket connection for client /10.0.4.161:48668 (no >>> session established for client) >>> 2014-09-17,08:26:38,398 INFO >>> org.apache.zookeeper.server.NIOServerCnxnFactory: [myid:0] Accepted socket >>> connection from /10.0.4.172:62393 >>> 2014-09-17,08:26:38,398 INFO org.apache.zookeeper.server.ZooKeeperServer: >>> [myid:0] Client attempting to establish new session at /10.0.4.172:62393 >>> 2014-09-17,08:26:38,399 INFO org.apache.zookeeper.server.ZooKeeperServer: >>> [myid:0] Established session 0x4833538b30989d with negotiated timeout 30000 >>> for client /10.0.4.172:62393 >>> 2014-09-17,08:26:38,400 INFO >>> org.apache.zookeeper.server.auth.SaslServerCallbackHandler: [myid:0] >>> Successfully authenticated client: >>> authenticationID=hbase_srv/[email protected]; >>> authorizationID=hbase_srv/[email protected]. >>> 2014-09-17,08:26:38,400 INFO >>> org.apache.zookeeper.server.auth.SaslServerCallbackHandler: [myid:0] >>> Setting authorizedID: hbase_srv >>> 2014-09-17,08:26:38,400 INFO org.apache.zookeeper.server.ZooKeeperServer: >>> [myid:0] adding SASL authorization for authorizationID: hbase_srv >>> 2014-09-17,08:26:38,403 INFO org.apache.zookeeper.server.NIOServerCnxn: >>> [myid:0] Closed socket connection for client /10.0.4.172:62393 which had >>> sessionid 0x4833538b30989d >>> 2014-09-17,08:26:38,526 INFO >>> org.apache.zookeeper.server.NIOServerCnxnFactory: [myid:0] Accepted socket >>> connection from /10.0.4.161:48669 >>> 2014-09-17,08:26:38,526 INFO org.apache.zookeeper.server.NIOServerCnxn: >>> [myid:0] Processing wchs command from /10.0.4.161:48669 >>> 2014-09-17,08:26:38,527 INFO org.apache.zookeeper.server.NIOServerCnxn: >>> [myid:0] Closed socket connection for client /10.0.4.161:48669 (no >>> session established for client) >>> 2014-09-17,08:26:38,587 INFO >>> org.apache.zookeeper.server.NIOServerCnxnFactory: [myid:0] Accepted socket >>> connection from /10.0.4.171:50026 >>> 2014-09-17,08:26:38,588 INFO org.apache.zookeeper.server.ZooKeeperServer: >>> [myid:0] Client attempting to establish new session at /10.0.4.171:50026 >>> 2014-09-17,08:26:38,588 INFO org.apache.zookeeper.server.ZooKeeperServer: >>> [myid:0] Established session 0x4833538b30989e with negotiated timeout 30000 >>> for client /10.0.4.171:50026 >>> 2014-09-17,08:26:38,589 INFO >>> org.apache.zookeeper.server.auth.SaslServerCallbackHandler: [myid:0] >>> Successfully authenticated client: >>> authenticationID=hbase_srv/[email protected]; >>> authorizationID=hbase_srv/[email protected]. >>> 2014-09-17,08:26:38,589 INFO >>> org.apache.zookeeper.server.auth.SaslServerCallbackHandler: [myid:0] >>> Setting authorizedID: hbase_srv >>> 2014-09-17,08:26:38,589 INFO org.apache.zookeeper.server.ZooKeeperServer: >>> [myid:0] adding SASL authorization for authorizationID: hbase_srv >>> 2014-09-17,08:26:38,592 INFO org.apache.zookeeper.server.NIOServerCnxn: >>> [myid:0] Closed socket connection for client /10.0.4.171:50026 which had >>> sessionid 0x4833538b30989e >>> 2014-09-17,08:26:38,613 INFO >>> org.apache.zookeeper.server.NIOServerCnxnFactory: [myid:0] Accepted socket >>> connection from /10.0.4.172:62397 >>> 2014-09-17,08:26:38,614 INFO org.apache.zookeeper.server.ZooKeeperServer: >>> [myid:0] Client attempting to establish new session at /10.0.4.172:62397 >>> 2014-09-17,08:26:38,614 INFO org.apache.zookeeper.server.ZooKeeperServer: >>> [myid:0] Established session 0x4833538b30989f with negotiated timeout 30000 >>> for client /10.0.4.172:62397 >>> 2014-09-17,08:26:38,615 INFO >>> org.apache.zookeeper.server.auth.SaslServerCallbackHandler: [myid:0] >>> Successfully authenticated client: >>> authenticationID=hbase_srv/[email protected]; >>> authorizationID=hbase_srv/[email protected]. >>> 2014-09-17,08:26:38,615 INFO >>> org.apache.zookeeper.server.auth.SaslServerCallbackHandler: [myid:0] >>> Setting authorizedID: hbase_srv >>> 2014-09-17,08:26:38,615 INFO org.apache.zookeeper.server.ZooKeeperServer: >>> [myid:0] adding SASL authorization for authorizationID: hbase_srv >>> 2014-09-17,08:26:38,618 INFO org.apache.zookeeper.server.NIOServerCnxn: >>> [myid:0] Closed socket connection for client /10.0.4.172:62397 which had >>> sessionid 0x4833538b30989f >>> 2014-09-17,08:26:38,707 INFO >>> org.apache.zookeeper.server.NIOServerCnxnFactory: [myid:0] Accepted socket >>> connection from /10.0.4.161:48670 >>> 2014-09-17,08:26:38,708 INFO org.apache.zookeeper.server.NIOServerCnxn: >>> [myid:0] Processing wchc command from /10.0.4.161:48670 >>> >>> >>> On Wed, Sep 17, 2014 at 1:00 PM, Patrick Hunt <[email protected]> wrote: >>> >>>> Looks like a client is calling the "dump" 4lw and not waiting for the >>>> results before closing the socket. Try tracking down where that >>>> command is called from. The logs should have something like this in >>>> it: >>>> >>>> 2014-09-16 21:59:11,407 [myid:1] - INFO >>>> [NIOWorkerThread-2:NIOServerCnxn@835] - Processing dump command from >>>> /127.0.0.1:51740 >>>> >>>> (note the ip address) >>>> >>>> Patrick >>>> >>>> On Tue, Sep 16, 2014 at 7:57 PM, tobe <[email protected]> wrote: >>>> > Can any one help to explain why I get so many "Broken pipe IOE" in >>>> > ZooKeeper log? >>>> > >>>> > ZooKeeper throws this exception almost every minute. I don't think we >>>> use >>>> > the four letter command to dumpWatches so frequently. So what does this >>>> > mean? >>>> > >>>> > 2014-09-17,10:52:09,179 ERROR >>>> org.apache.zookeeper.server.NIOServerCnxn: >>>> > [myid:0] Error sending data synchronously >>>> > java.io.IOException: Broken pipe >>>> > at sun.nio.ch.FileDispatcher.write0(Native Method) >>>> > at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29) >>>> > at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:69) >>>> > at sun.nio.ch.IOUtil.write(IOUtil.java:40) >>>> > at >>>> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:336) >>>> > at >>>> > >>>> org.apache.zookeeper.server.NIOServerCnxn.sendBufferSync(NIOServerCnxn.java:138) >>>> > at >>>> > >>>> org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.checkFlush(NIOServerCnxn.java:453) >>>> > at >>>> > >>>> org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.write(NIOServerCnxn.java:474) >>>> > at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:111) >>>> > at java.io.BufferedWriter.write(BufferedWriter.java:212) >>>> > at java.io.PrintWriter.write(PrintWriter.java:412) >>>> > at java.io.PrintWriter.write(PrintWriter.java:429) >>>> > at java.io.PrintWriter.print(PrintWriter.java:559) >>>> > at java.io.PrintWriter.println(PrintWriter.java:695) >>>> > at >>>> > >>>> org.apache.zookeeper.server.WatchManager.dumpWatches(WatchManager.java:166) >>>> > at >>>> > org.apache.zookeeper.server.DataTree.dumpWatches(DataTree.java:1240) >>>> > at >>>> > >>>> org.apache.zookeeper.server.NIOServerCnxn$WatchCommand.commandRun(NIOServerCnxn.java:722) >>>> > at >>>> > >>>> org.apache.zookeeper.server.NIOServerCnxn$CommandThread.run(NIOServerCnxn.java:496) >>>> > 2014-09-17,10:52:09,179 INFO org.apache.zookeeper.server.NIOServerCnxn: >>>> > [myid:0] Closed socket connection for client /10.20.201.234:53756 >>>> which had >>>> > sessionid 0x34840357f664081 >>>> > 2014-09-17,10:52:09,179 ERROR >>>> org.apache.zookeeper.server.NIOServerCnxn: >>>> > [myid:0] Error sending data synchronously >>>> > java.io.IOException: Broken pipe >>>> > at sun.nio.ch.FileDispatcher.write0(Native Method) >>>> > at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29) >>>> > at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:69) >>>> > at sun.nio.ch.IOUtil.write(IOUtil.java:40) >>>> > at >>>> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:336) >>>> > at >>>> > >>>> org.apache.zookeeper.server.NIOServerCnxn.sendBufferSync(NIOServerCnxn.java:138) >>>> > at >>>> > >>>> org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.checkFlush(NIOServerCnxn.java:453) >>>> > at >>>> > >>>> org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.write(NIOServerCnxn.java:474) >>>> > at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:111) >>>> > at java.io.BufferedWriter.flush(BufferedWriter.java:235) >>>> > at java.io.PrintWriter.flush(PrintWriter.java:276) >>>> > at >>>> > >>>> org.apache.zookeeper.server.NIOServerCnxn.cleanupWriterSocket(NIOServerCnxn.java:424) >>>> > at >>>> > >>>> org.apache.zookeeper.server.NIOServerCnxn.access$000(NIOServerCnxn.java:60) >>>> > at >>>> > >>>> org.apache.zookeeper.server.NIOServerCnxn$CommandThread.run(NIOServerCnxn.java:500) >>>> >>> >>> >>
