Actually, upon looking at it further, I think this one has more to do with SSH 
tunnels than with ZK per se.  Let me explain.

This tool runs as a cron job.  It locks locally to prevent overruns.  It 
establishes an SSH dynamic proxy (-D portnum) for Hadoop and HBase clients to 
use as well as a direct tunnel to ZK on 21811.  Here's the sequence of events:

1) Last run is finishing up
2) New run starts and initializes its clients before checking for the lock that 
the last run holds
3) New clients find all the network access they need via the old run's SSH process
4) Last run finishes, closing SSH client and releasing lock
5) New run checks lock, acquires, proceeds
6) HBase client fails a get as it can't find a region server (the SSH tunnel it 
found during init is gone, and the new one couldn't be established because 
ports were in use)
7) The reconnect would likely have succeeded if the SSH tunnel were in place 
(or just not needed)
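
The race in steps 2-5 comes from initializing clients before the lock is held. Below is a minimal sketch of the corrected ordering, written in Python for brevity and not taken from the actual tool: the names try_lock, run_once, open_tunnel, init_clients and do_work are hypothetical, and the tunnel object is assumed to expose a close() method. The point is only that the lock brackets everything, including the tunnel's lifetime.

```python
import fcntl
import os


def try_lock(path):
    """Try to take an exclusive, non-blocking flock on `path`.

    Returns the open file descriptor on success, or None if another
    run already holds the lock.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def run_once(lock_path, open_tunnel, init_clients, do_work):
    """Run one job. Returns False, without touching the network, if the
    previous run still holds the lock."""
    fd = try_lock(lock_path)
    if fd is None:
        return False
    try:
        # Only now bring up the tunnel (hypothetical callable, e.g. one that
        # starts the SSH process), so clients never latch onto the old run's.
        tunnel = open_tunnel()
        try:
            do_work(init_clients())
        finally:
            tunnel.close()  # tear the tunnel down before releasing the lock
    finally:
        os.close(fd)  # closing the descriptor releases the flock
    return True
```

With this ordering a new run either owns both the lock and its own tunnel or exits immediately, so the ports are free by the time the next run acquires the lock.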

To sum up, I'm pretty certain that this is not an HBase or ZK problem, except 
inasmuch as I'd like to be able to tell ZK to stop trying at some point and 
I'm not sure how to do that.  But, I certainly need to change the bounds of my 
locks, and longer term get off cronjobs and SSH tunnels.  Thanks for taking a 
look though, J-D, and I'm sorry if you wasted too much time on it.

Sandy

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of
> Jean-Daniel Cryans
> Sent: Monday, April 11, 2011 17:34
> To: [email protected]
> Subject: Re: Catching ZK ConnectionLoss with HTable
> 
> I thought a lot more about this issue and it could be a bigger undertaking
> than I thought. Basically, any HTable operation can throw ZK-related errors,
> and I think they should be considered fatal.
> 
> In the meantime, HBase could improve the situation a bit. You say it was
> spinning; do you know where exactly? Looking at the 0.90 code, if there's a
> ConnectionLoss it will be eaten by HCM.prefetchRegionCache and then the
> normal .META. querying will take place so I don't see where it could be
> spinning.
> 
> J-D
> 
> On Mon, Apr 11, 2011 at 2:13 PM, Sandy Pratt <[email protected]> wrote:
> > Thanks J-D.  I'll keep an eye on the Jira.
> >
> >> -----Original Message-----
> >> From: [email protected] [mailto:[email protected]] On Behalf Of
> >> Jean-Daniel Cryans
> >> Sent: Monday, April 11, 2011 11:52
> >> To: [email protected]
> >> Subject: Re: Catching ZK ConnectionLoss with HTable
> >>
> >> I'm cleaning this up in this jira
> >> https://issues.apache.org/jira/browse/HBASE-3755
> >>
> >> But it's a failure case I haven't seen before, really interesting.
> >> There's an HTable that's created in the guts of HCM that will throw a
> >> ZookeeperConnectionException but it will bubble up as an IOE. I'll
> >> try to address this too in 3755.
> >>
> >> J-D
> >>
