It follows an exponential backoff. Each pause is longer than the last one, and together they add up to close to 600 seconds.
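For illustration, here is a minimal sketch of how the retry pauses compound. The multiplier table, base pause, and retry count below are assumptions made for the arithmetic only; the real values come from HConstants.RETRY_BACKOFF, hbase.client.pause, and hbase.client.retries.number in the version you actually run, which is why the total can land anywhere from tens of seconds to roughly 600 seconds depending on configuration. (A second sketch, of the retry/interrupt pattern under discussion, follows the quoted thread below.)

// Minimal sketch with assumed values, not necessarily your cluster's defaults:
// total client-side sleep is roughly pause * sum(RETRY_BACKOFF[0..numRetries-1]).
public class BackoffSum {
  // Hypothetical multiplier table in the spirit of HConstants.RETRY_BACKOFF.
  static final int[] RETRY_BACKOFF = {1, 1, 1, 2, 2, 4, 4, 8, 16, 32};
  static final long PAUSE_MS = 1000L;   // assumed hbase.client.pause
  static final int NUM_RETRIES = 10;    // assumed hbase.client.retries.number

  public static void main(String[] args) {
    long totalMs = 0;
    for (int tries = 0; tries < NUM_RETRIES; tries++) {
      int idx = Math.min(tries, RETRY_BACKOFF.length - 1);
      long sleepMs = PAUSE_MS * RETRY_BACKOFF[idx];
      totalMs += sleepMs;
      System.out.printf("attempt %d failed -> sleep %d ms (cumulative %d ms)%n",
          tries + 1, sleepMs, totalMs);
    }
    System.out.printf("total backoff across %d retries: ~%d s%n",
        NUM_RETRIES, totalMs / 1000);
  }
}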
On Thu, Aug 18, 2011 at 12:09 PM, Srikanth P. Shreenivas
<[email protected]> wrote:
> My apologies, I may not be reading the code right.
>
> You are right, it is the GridGain timeout that is making line 1255 execute.
> However, the question is what would make an HTable.get() take close to 10
> minutes and induce a timeout in the GridGain task.
>
> The value of numRetries at line 1236 should be 10 (the default), and if we
> go with the default value of HConstants.RETRY_BACKOFF, then the sleep time
> added across all retries will be only 61 seconds, not close to 600 seconds
> as is the case in our code.
>
> Regards,
> Srikanth
>
> ________________________________________
> From: Srikanth P. Shreenivas
> Sent: Friday, August 19, 2011 12:21 AM
> To: [email protected]
> Subject: RE: Query regarding HTable.get and timeouts
>
> Please note that the line numbers I am referencing are from the file:
> https://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
>
> ________________________________________
> From: Srikanth P. Shreenivas
> Sent: Friday, August 19, 2011 12:19 AM
> To: [email protected]
> Subject: RE: Query regarding HTable.get and timeouts
>
> Hi Stack,
>
> Thanks a lot for your reply. It's always a comforting feeling to see such
> an active community, and especially your prompt replies to the queries.
>
> Yes, I am running it as a GridGain task, so it runs in GridGain's thread
> pool. In this case, we can imagine GridGain as something that hands off
> work to various worker threads and waits asynchronously for it to
> complete. I have a 10-minute timeout after which GridGain considers the
> work timed out.
>
> What we are observing is that our tasks are timing out at the 10-minute
> boundary, and the delay seems to be caused by the part of the work that is
> doing the HTable.get.
>
> My suspicion is that line 1255 in HConnectionManager.java is calling
> Thread.currentThread().interrupt(), due to which the GridGain thread more
> or less stops doing what it was meant to do and never responds to the
> master node, resulting in a timeout on the master.
>
> In order for line 1255 to execute, we have to assume that all retries were
> exhausted. Hence my query: what would cause an HTable.get() to get into a
> situation wherein
> HConnectionManager$HConnectionImplementation.getRegionServerWithRetries
> gets to line 1255?
>
> Regards,
> Srikanth
>
> ________________________________________
> From: [email protected] [[email protected]] on behalf of Stack
> [[email protected]]
> Sent: Friday, August 19, 2011 12:03 AM
> To: [email protected]
> Subject: Re: Query regarding HTable.get and timeouts
>
> Is your client running inside a container of some form, and could the
> container be doing the interrupting? I've not come across client-side
> thread interrupts before.
> St.Ack
>
> On Thu, Aug 18, 2011 at 7:37 AM, Srikanth P. Shreenivas
> <[email protected]> wrote:
> > Hi,
> >
> > We are experiencing an issue in our HBase cluster wherein some of the
> > gets are timing out at:
> >
> > java.io.IOException: Giving up trying to get region server: thread is
> > interrupted.
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1016)
> >         at org.apache.hadoop.hbase.client.HTable.get(HTable.java:546)
> >
> > When we look at the logs of the master, ZooKeeper, and the region
> > servers, there is nothing that indicates anything abnormal.
> >
> > I tried looking up the functions below, but at this point could not make
> > much out of them:
> > https://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
> >   - getRegionServerWithRetries starts at line 1233
> > https://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java
> >   - HTable.get starts at line 611
> >
> > Could you please suggest in what scenarios all retries can get exhausted,
> > resulting in the thread interruption?
> >
> > We have seen this issue in two of our HBase clusters, where the load is
> > quite light: about 20 reads per minute, 1 ZooKeeper, and 4 region servers
> > in fully-distributed mode (Hadoop). We are using CDH3.
> >
> > Thanks,
> > Srikanth
> >
> > ________________________________
> >
> > http://www.mindtree.com/email/disclaimer.html
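For reference, here is an illustrative sketch of the retry/interrupt pattern being discussed in the quoted thread. It is not the actual getRegionServerWithRetries code; the class name, helper name, and signature are invented for the example.

import java.io.IOException;
import java.util.concurrent.Callable;

// Illustrative only: a retry loop that sleeps between attempts and, if the
// sleeping thread is interrupted, restores the interrupt flag and gives up.
// A worker thread (e.g. a GridGain task thread) that returns from here with
// its interrupt status set may then be treated by its framework as cancelled
// or timed out, which matches the behaviour described above.
public class RetryInterruptSketch {
  static <T> T callWithRetries(Callable<T> call, int numRetries, long pauseMs)
      throws IOException {
    IOException last = null;
    for (int tries = 0; tries < numRetries; tries++) {
      try {
        return call.call();                      // the actual attempt, e.g. a get
      } catch (Exception e) {
        last = (e instanceof IOException) ? (IOException) e : new IOException(e);
      }
      if (tries + 1 < numRetries) {
        try {
          Thread.sleep(pauseMs);                 // back off before the next attempt
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();    // restore the flag for the caller
          throw new IOException(
              "Giving up trying to get region server: thread is interrupted.", last);
        }
      }
    }
    throw new IOException("retries exhausted", last);
  }
}

In this pattern, if something outside the client (a container, or a framework such as GridGain cancelling a task) interrupts the worker thread, the sleep fails immediately, the remaining retries are skipped, and the caller sees the "thread is interrupted" IOException from the stack trace above, which is in line with Stack's question about whether the container is doing the interrupting.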
