No worries. Regarding stopping the ZK client from trying to connect, closing the connections using HConnectionManager will stop the client.
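For HBase 0.90-era clients, that teardown goes through `HConnectionManager`. A minimal sketch of the idea, assuming the 0.90 `deleteConnection(Configuration, boolean)` signature and a hypothetical table name (not verified against a live cluster):

```java
// Sketch only -- assumes an HBase 0.90-era client API and a running cluster.
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "mytable"); // hypothetical table
try {
    // ... gets/puts against the table ...
} finally {
    table.close();
    // Drop the shared connection so its embedded ZK client
    // stops trying to reconnect after the tool is done.
    HConnectionManager.deleteConnection(conf, true);
}
```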
J-D

On Thu, Apr 14, 2011 at 12:23 PM, Sandy Pratt <[email protected]> wrote:
> Actually, upon looking at it further, I think this one has more to do with
> SSH tunnels than with ZK per se. Let me explain.
>
> This tool runs as a cron job. It locks locally to prevent overruns. It
> establishes an SSH dynamic proxy (-D portnum) for the Hadoop and HBase
> clients to use, as well as a direct tunnel to ZK on 21811. Here's the
> sequence of events:
>
> 1) Last run is finishing up
> 2) New run starts, initializes clients before looking for the lock that the
> last run holds
> 3) New clients find all the network access they need using the old run's
> SSH process
> 4) Last run finishes, closing the SSH client and releasing the lock
> 5) New run checks the lock, acquires it, proceeds
> 6) HBase client fails a get because it can't find a region server (the SSH
> tunnel it found during init is gone, and a new one couldn't be established
> because the ports were in use)
> 7) The reconnect would likely have succeeded if the SSH tunnel were in
> place (or simply not needed)
>
> To sum up, I'm pretty certain that this is not an HBase or ZK problem,
> except insofar as I'd like to be able to tell ZK to stop retrying at some
> point, and I'm not sure how to do that. But I certainly need to change the
> bounds of my locks, and longer term get off cron jobs and SSH tunnels.
> Thanks for taking a look though, J-D, and I'm sorry if you wasted too much
> time on it.
>
> Sandy
>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of
>> Jean-Daniel Cryans
>> Sent: Monday, April 11, 2011 17:34
>> To: [email protected]
>> Subject: Re: Catching ZK ConnectionLoss with HTable
>>
>> I thought a lot more about this issue, and it could be a bigger
>> undertaking than I thought: basically any HTable operation can throw
>> ZK-related errors, and I think they should be considered fatal.
>>
>> In the meantime HBase could improve the situation a bit.
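The root cause in the sequence above is ordering: the clients were initialized before the run lock was checked, so they bound to the previous run's tunnels. One way to fix the lock bounds is to take the lock first and only then open tunnels and initialize clients. A runnable sketch of that ordering using a plain `java.nio` file lock (the lock-file path and class name here are illustrative, not from the original tool):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

public class LockFirst {
    // Try to take the run lock; returns null if another run still holds it.
    static FileLock tryRunLock(File lockFile) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
        FileLock lock = raf.getChannel().tryLock();
        if (lock == null) {
            raf.close(); // another process holds the lock
        }
        return lock;
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("cronjob", ".lock"); // illustrative path
        FileLock lock = tryRunLock(f);
        if (lock == null) {
            // Previous run still active: exit before touching tunnels/clients.
            System.out.println("previous run still active; exiting");
            return;
        }
        try {
            // Only now, with the lock held, open the SSH tunnels and
            // initialize the Hadoop/HBase clients (omitted here).
            System.out.println("lock held; safe to init clients");
        } finally {
            lock.release();
        }
    }
}
```

With this ordering, a new run that starts while the old one is finishing simply exits; it never gets a chance to initialize clients against tunnels that are about to disappear.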
>> You say it was spinning, do you know where exactly? Looking at the 0.90
>> code, if there's a ConnectionLoss it will be eaten by
>> HCM.prefetchRegionCache and then the normal .META. querying will take
>> place, so I don't see where it could be spinning.
>>
>> J-D
>>
>> On Mon, Apr 11, 2011 at 2:13 PM, Sandy Pratt <[email protected]> wrote:
>> > Thanks J-D. I'll keep an eye on the Jira.
>> >
>> >> -----Original Message-----
>> >> From: [email protected] [mailto:[email protected]] On Behalf Of
>> >> Jean-Daniel Cryans
>> >> Sent: Monday, April 11, 2011 11:52
>> >> To: [email protected]
>> >> Subject: Re: Catching ZK ConnectionLoss with HTable
>> >>
>> >> I'm cleaning this up in this jira:
>> >> https://issues.apache.org/jira/browse/HBASE-3755
>> >>
>> >> But it's a failure case I haven't seen before, really interesting.
>> >> There's an HTable that's created in the guts of HCM that will throw a
>> >> ZooKeeperConnectionException, but it will bubble up as an IOE. I'll
>> >> try to address this too in 3755.
>> >>
>> >> J-D
>> >
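The practical consequence of a ZooKeeperConnectionException bubbling up as an IOE is that caller code around HTable operations generally has to catch IOException and treat it as potentially fatal. A hedged sketch of that catch ordering (assumes a 0.90-era API with `table` and the row key already in scope; note ZooKeeperConnectionException extends IOException, so it must be caught first):

```java
// Sketch only: in 0.90, ZK failures inside HCM can surface as plain IOEs.
try {
    Result r = table.get(new Get(Bytes.toBytes("somerow"))); // hypothetical row
} catch (ZooKeeperConnectionException zke) {
    // Explicit ZK failure: treat as fatal, clean up connections and bail out.
} catch (IOException ioe) {
    // May still wrap a ZK problem (e.g. ConnectionLoss) from HCM internals.
}
```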
