jingych, inline:
On Wed, Nov 13, 2013 at 7:06 PM, jingych <[email protected]> wrote:

> Thanks, Esteban and Stack!
>
> As Esteban said, the problem was solved.
>
> My config is below:
> <code>
> conf.setInt("hbase.client.retries.number", 1);
> conf.setInt("zookeeper.session.timeout", 5000);
> conf.setInt("zookeeper.recovery.retry", 1);
> conf.setInt("zookeeper.recovery.retry.intervalmill", 50);
> </code>
> But it still costs 46 seconds.
> And the log prints:
> <log>
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/root-region-server
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/root-region-server
> </log>
> It still tried to build the 4 connections above.

The client (via HConnectionManager) needs to set a watcher on each of those
three znodes in ZK. Each attempt has a max timeout of 5 seconds (you have a
single ZK server) plus 10 seconds for the second attempt:
3 * (5 * 2^0) + 3 * (5 * 2^1) = 15 + 30 = 45 seconds, and the extra second
should come from a hardcoded sleep in the RPC implementation during a retry.
Setting zookeeper.recovery.retry=0 can make it fail faster, but in case of a
transient failure you will then have to handle the reconnection in your code.
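To make that last point concrete: with zookeeper.recovery.retry=0 a transient
ZK hiccup surfaces immediately as an exception, so the retry loop moves into
your application. A rough, untested sketch of what that could look like (the
attempt cap and the sleep are made-up numbers, and I'm assuming the static
HBaseAdmin.checkHBaseAvailable(Configuration) from the 0.94-era client API):

<code>
// imports: org.apache.hadoop.conf.Configuration,
//          org.apache.hadoop.hbase.HBaseConfiguration,
//          org.apache.hadoop.hbase.client.HBaseAdmin
Configuration conf = HBaseConfiguration.create();
conf.setInt("hbase.client.retries.number", 1);
conf.setInt("zookeeper.session.timeout", 5000);
conf.setInt("zookeeper.recovery.retry", 0);    // no ZK-level retries: fail fast

boolean available = false;
for (int attempt = 1; attempt <= 3 && !available; attempt++) { // 3 is arbitrary
  try {
    HBaseAdmin.checkHBaseAvailable(conf);      // throws if ZK or the master is unreachable
    available = true;
  } catch (Exception e) {                      // a transient ConnectionLoss lands here too
    try { Thread.sleep(500); }                 // brief application-level backoff
    catch (InterruptedException ie) { Thread.currentThread().interrupt(); break; }
  }
}
</code>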
> Could you please explain why ZK does this? (I'm really new to the HBase
> world.)
> If I set the ZK session timeout to 1s, is that OK?

You *could*, but you don't want clients to overwhelm ZK by re-establishing
connections over and over.

> And what do you mean by "depending on the number of ZK servers you have
> running the socket level timeout in the client to a ZK server will be
> zookeeper.session.timeout/#ZKs"?
> Does it mean that if I have 3 ZooKeepers and zookeeper.session.timeout=5000,
> each connection will time out after 5000/3 ms?

That's correct: the timeout to establish a connection to ZK will be around
1.6 seconds (5000 milliseconds / 3) with 3 ZKs.
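And putting both replies together in one place, a fail-fast setup touches the
HBase-client knobs as well as the ZK knobs. Something like this (the values
are simply the ones from this thread, not recommendations):

<code>
Configuration conf = HBaseConfiguration.create();
// HBase client side (see my earlier reply below)
conf.setInt("hbase.client.retries.number", 1);    // a single attempt at the HBase level
conf.setInt("ipc.socket.timeout", 5000);          // ms to open the RPC socket
conf.setInt("hbase.rpc.timeout", 5000);           // ms to get an answer back from HBase
// ZooKeeper side
conf.setInt("zookeeper.session.timeout", 5000);   // per-server connect timeout = this / #ZKs
conf.setInt("zookeeper.recovery.retry", 1);       // ZK-level retries (0 = fail fast)
conf.setInt("zookeeper.recovery.retry.intervalmill", 50);
try {
  HBaseAdmin.checkHBaseAvailable(conf);           // throws if the cluster is unreachable
  System.out.println("HBase is up");
} catch (Exception e) {                           // MasterNotRunning / ZooKeeperConnection
  System.out.println("HBase is not reachable: " + e.getMessage());
}
</code>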
> I'm running ZK and the HBase Master on one node in pseudo-distributed mode.
>
> Best Regards!
>
> jingych
>
> 2013-11-14
>
> *From:* Esteban Gutierrez <[email protected]>
> *Sent:* 2013-11-14 06:10
> *To:* Stack <[email protected]>
> *Cc:* Hbase-User <[email protected]>; jingych <[email protected]>
> *Subject:* Re: Re: HBaseAdmin#checkHBaseAvailable COST ABOUT 1 MINUTE TO
> CHECK A DEAD(OR NOT EXISTS) HBASE MASTER
>
> jingych,
>
> That timeout comes from ZooKeeper. Are you running ZK on the same node you
> are running the HBase Master on? If your environment requires failing fast
> even for ZK connection timeouts, then you need to reduce
> zookeeper.recovery.retry.intervalmill and zookeeper.recovery.retry, since
> the retries are done via an exponential backoff (1 second, 2 seconds, 8
> seconds). Also, depending on the number of ZK servers you have running, the
> socket-level timeout in the client to a ZK server will be
> zookeeper.session.timeout/#ZKs.
>
> cheers,
> esteban.
>
> --
> Cloudera, Inc.
>
> On Wed, Nov 13, 2013 at 7:21 AM, Stack <[email protected]> wrote:
>
>> More of the log and the version of HBase involved, please. Thanks.
>> St.Ack
>>
>> On Wed, Nov 13, 2013 at 1:07 AM, jingych <[email protected]> wrote:
>>
>>> Thanks, Esteban!
>>>
>>> I've tried, but it did not work.
>>>
>>> I first load the custom hbase-site.xml and then try to check the HBase
>>> server.
>>> So my code is like this:
>>> <code>
>>> conf.setInt("hbase.client.retries.number", 1);
>>> conf.setInt("hbase.client.pause", 5);
>>> conf.setInt("ipc.socket.timeout", 5000);
>>> conf.setInt("hbase.rpc.timeout", 5000);
>>> </code>
>>>
>>> The log prints: Sleeping 4000ms before retry #2...
>>>
>>> If the ZooKeeper quorum is a wrong address, the process takes a very
>>> long time.
>>>
>>> Jing Yucheng (井玉成)
>>> Fundamental Software Business Unit
>>> Neusoft Corporation
>>> Mobile: 13889491801
>>> Tel: 0411-84835702
>>> Room 217, Building D1, No. 901 Huangpu Road, Ganjingzi District, Dalian
>>> Postcode: 116085
>>> Email: [email protected]
>>>
>>> From: Esteban Gutierrez
>>> Date: 2013-11-13 11:12
>>> To: [email protected]; jingych
>>> Subject: Re: HBaseAdmin#checkHBaseAvailable COST ABOUT 1 MINUTE TO CHECK
>>> A DEAD(OR NOT EXISTS) HBASE MASTER
>>>
>>> jingych,
>>>
>>> The behavior is driven by the number of retries
>>> (hbase.client.retries.number), the length of the pause between retries
>>> (hbase.client.pause), the timeout to establish a connection
>>> (ipc.socket.timeout), and the time to get some data back from HBase
>>> (hbase.rpc.timeout). Lowering the RPC timeout and the IPC socket timeout
>>> should help you fail the operation fast when the HBase Master is not
>>> responsive.
>>>
>>> cheers,
>>> esteban.
>>>
>>> --
>>> Cloudera, Inc.
>>>
>>> On Tue, Nov 12, 2013 at 6:49 PM, jingych <[email protected]> wrote:
>>>
>>> > Hi,
>>> >
>>> > I wonder whether there is any way to limit how long the
>>> > HBaseAdmin#checkHBaseAvailable method takes.
>>> >
>>> > I use HBaseAdmin#checkHBaseAvailable to test whether the HBase master
>>> > is reachable.
>>> > But if the target master is dead or does not exist at all, the method
>>> > takes about 1 minute before it returns.
>>> >
>>> > jingych
>>> > 2013-11-13
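P.S. If what you ultimately want is a hard upper bound on the whole check, no
matter how the retry settings multiply out, you can always cap it from the
outside with plain JDK concurrency. This is a generic sketch (nothing
HBase-specific about it, and the 10-second cap is an arbitrary choice):

<code>
// imports: java.util.concurrent.*
final Configuration conf = HBaseConfiguration.create(); // configured as discussed above
ExecutorService pool = Executors.newSingleThreadExecutor();
Future<?> check = pool.submit(new Runnable() {
  public void run() {
    try {
      HBaseAdmin.checkHBaseAvailable(conf);  // the potentially slow call
    } catch (Exception e) {
      throw new RuntimeException(e);         // surface the failure through Future.get()
    }
  }
});
try {
  check.get(10, TimeUnit.SECONDS);           // hard 10-second deadline for the whole check
  System.out.println("HBase is up");
} catch (TimeoutException te) {
  check.cancel(true);                        // deadline hit: treat the cluster as down
} catch (Exception e) {
  // InterruptedException or ExecutionException: the check itself failed
} finally {
  pool.shutdownNow();                        // don't leak the helper thread
}
</code>

One caveat: cancel(true) and shutdownNow() only interrupt the helper thread,
so if the HBase client ignores the interrupt that thread may keep running
until its internal timeouts expire; your caller still gets control back after
10 seconds, though.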
