No worries. Regarding stopping the ZK client from trying to connect, closing the connections using HConnectionManager will stop the client.
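For HBase 0.90-era clients, that teardown goes through `HConnectionManager`. A minimal sketch of the idea, assuming the 0.90 `deleteConnection(Configuration, boolean)` signature and a hypothetical table name (not verified against a live cluster):

```java
// Sketch only -- assumes an HBase 0.90-era client API and a running cluster.
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "mytable"); // hypothetical table
try {
    // ... gets/puts against the table ...
} finally {
    table.close();
    // Drop the shared connection so its embedded ZK client
    // stops trying to reconnect after the tool is done.
    HConnectionManager.deleteConnection(conf, true);
}
```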
J-D

On Thu, Apr 14, 2011 at 12:23 PM, Sandy Pratt <[email protected]> wrote:
> Actually, upon looking at it further, I think this one has more to do with
> SSH tunnels than with ZK per se. Let me explain.
>
> This tool runs as a cron job. It locks locally to prevent overruns. It
> establishes an SSH dynamic proxy (-D portnum) for the Hadoop and HBase
> clients to use, as well as a direct tunnel to ZK on 21811. Here's the
> sequence of events:
>
> 1) Last run is finishing up
> 2) New run starts, initializes clients before looking for the lock that the
> last run holds
> 3) New clients find all the network access they need using the old run's
> SSH process
> 4) Last run finishes, closing the SSH client and releasing the lock
> 5) New run checks the lock, acquires it, proceeds
> 6) HBase client fails a get because it can't find a region server (the SSH
> tunnel it found during init is gone, and a new one couldn't be established
> because the ports were in use)
> 7) The reconnect would likely have succeeded if the SSH tunnel were in
> place (or simply not needed)
>
> To sum up, I'm pretty certain that this is not an HBase or ZK problem,
> except insofar as I'd like to be able to tell ZK to stop retrying at some
> point, and I'm not sure how to do that. But I certainly need to change the
> bounds of my locks, and longer term get off cron jobs and SSH tunnels.
> Thanks for taking a look though, J-D, and I'm sorry if you wasted too much
> time on it.
>
> Sandy
>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of
>> Jean-Daniel Cryans
>> Sent: Monday, April 11, 2011 17:34
>> To: [email protected]
>> Subject: Re: Catching ZK ConnectionLoss with HTable
>>
>> I thought a lot more about this issue, and it could be a bigger
>> undertaking than I thought: basically any HTable operation can throw
>> ZK-related errors, and I think they should be considered fatal.
>>
>> In the meantime HBase could improve the situation a bit.
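The root cause in the sequence above is ordering: the clients were initialized before the run lock was checked, so they bound to the previous run's tunnels. One way to fix the lock bounds is to take the lock first and only then open tunnels and initialize clients. A runnable sketch of that ordering using a plain `java.nio` file lock (the lock-file path and class name here are illustrative, not from the original tool):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

public class LockFirst {
    // Try to take the run lock; returns null if another run still holds it.
    static FileLock tryRunLock(File lockFile) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
        FileLock lock = raf.getChannel().tryLock();
        if (lock == null) {
            raf.close(); // another process holds the lock
        }
        return lock;
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("cronjob", ".lock"); // illustrative path
        FileLock lock = tryRunLock(f);
        if (lock == null) {
            // Previous run still active: exit before touching tunnels/clients.
            System.out.println("previous run still active; exiting");
            return;
        }
        try {
            // Only now, with the lock held, open the SSH tunnels and
            // initialize the Hadoop/HBase clients (omitted here).
            System.out.println("lock held; safe to init clients");
        } finally {
            lock.release();
        }
    }
}
```

With this ordering, a new run that starts while the old one is finishing simply exits; it never gets a chance to initialize clients against tunnels that are about to disappear.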
>> You say it was spinning, do you know where exactly? Looking at the 0.90
>> code, if there's a ConnectionLoss it will be eaten by
>> HCM.prefetchRegionCache and then the normal .META. querying will take
>> place, so I don't see where it could be spinning.
>>
>> J-D
>>
>> On Mon, Apr 11, 2011 at 2:13 PM, Sandy Pratt <[email protected]> wrote:
>> > Thanks J-D. I'll keep an eye on the Jira.
>> >
>> >> -----Original Message-----
>> >> From: [email protected] [mailto:[email protected]] On Behalf Of
>> >> Jean-Daniel Cryans
>> >> Sent: Monday, April 11, 2011 11:52
>> >> To: [email protected]
>> >> Subject: Re: Catching ZK ConnectionLoss with HTable
>> >>
>> >> I'm cleaning this up in this jira:
>> >> https://issues.apache.org/jira/browse/HBASE-3755
>> >>
>> >> But it's a failure case I haven't seen before, really interesting.
>> >> There's an HTable that's created in the guts of HCM that will throw a
>> >> ZooKeeperConnectionException, but it will bubble up as an IOE. I'll
>> >> try to address this too in 3755.
>> >>
>> >> J-D
>> >
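The practical consequence of a ZooKeeperConnectionException bubbling up as an IOE is that caller code around HTable operations generally has to catch IOException and treat it as potentially fatal. A hedged sketch of that catch ordering (assumes a 0.90-era API with `table` and the row key already in scope; note ZooKeeperConnectionException extends IOException, so it must be caught first):

```java
// Sketch only: in 0.90, ZK failures inside HCM can surface as plain IOEs.
try {
    Result r = table.get(new Get(Bytes.toBytes("somerow"))); // hypothetical row
} catch (ZooKeeperConnectionException zke) {
    // Explicit ZK failure: treat as fatal, clean up connections and bail out.
} catch (IOException ioe) {
    // May still wrap a ZK problem (e.g. ConnectionLoss) from HCM internals.
}
```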
