RE: Average RPC Queue Time

Jean-Marc Spaggiari Wed, 20 Nov 2013 09:25:07 -0800

But that will depend on the timeout that they have configured, right?

I have seen some third-party applications recommending increasing timeouts
to 1h30...


JMS
On 2013-11-20 12:08, "Vladimir Rodionov" <[email protected]> wrote:

> >>The RpcQueueTime metrics are a measurement of how long individual calls
> >>stay in this queued state.  If your handlers were never 100% occupied, this
> >>value would be 0.  An average of 3 hours is concerning; it basically means
> >>that when a call comes into the RegionServer it takes on average 3 hours to
> >>start processing, because handlers are all occupied for that amount of
> >>time.
>
> Definitely. This metric is meaningless, because the default RPC timeout is
> 60 sec, and under no circumstances can call data survive those 60 sec in a
> callQueue unless we have a bug.
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: [email protected]
>
> ________________________________________
> From: Bryan Beaudreault [[email protected]]
> Sent: Wednesday, November 20, 2013 8:55 AM
> To: [email protected]
> Subject: Re: Average RPC Queue Time
>
> A RegionServer is configured with a certain number of RPC handlers
> (hbase.regionserver.handler.count).  When these handlers are all occupied,
> calls back up into a callQueue.  This call queue is bounded by
> ipc.server.max.callqueue.size (defaulting to 1GB of serialized requests)
> and ipc.server.max.callqueue.length (defaulting to 10 * numHandlers).  So,
> with 5 handlers, at most 50 calls will be queued before further requests
> are rejected outright.
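A simplified model of the two bounds described above may make the arithmetic concrete. This is a sketch, not HBase's actual server code: the class name `BoundedCallQueue` and its methods are hypothetical, and it assumes a call is rejected as soon as either the length bound or the byte bound would be exceeded.

```python
from collections import deque


class BoundedCallQueue:
    """Hypothetical model of a RegionServer callQueue (not HBase source).

    Calls wait here only while every handler is busy. A new call is
    rejected if it would exceed either the maximum number of queued calls
    or the maximum total serialized bytes queued.
    """

    def __init__(self, num_handlers: int, per_handler: int = 10,
                 max_bytes: int = 1024 ** 3):
        # Modeled on ipc.server.max.callqueue.length (10 * numHandlers).
        self.max_length = per_handler * num_handlers
        # Modeled on ipc.server.max.callqueue.size (1GB of serialized requests).
        self.max_bytes = max_bytes
        self.calls = deque()
        self.queued_bytes = 0

    def offer(self, call_bytes: int) -> bool:
        """Try to enqueue a call; return False if it is rejected outright."""
        if len(self.calls) >= self.max_length:
            return False
        if self.queued_bytes + call_bytes > self.max_bytes:
            return False
        self.calls.append(call_bytes)
        self.queued_bytes += call_bytes
        return True

    def take(self) -> int:
        """A handler that becomes free removes the oldest queued call (FIFO)."""
        call_bytes = self.calls.popleft()
        self.queued_bytes -= call_bytes
        return call_bytes


if __name__ == "__main__":
    # 5 handlers, all busy, so nothing is taken from the queue.
    q = BoundedCallQueue(num_handlers=5)
    accepted = sum(q.offer(1024) for _ in range(60))
    print(accepted)  # 50 calls fit; the other 10 are rejected
```

With 5 handlers and small requests, the length bound (50) is hit long before the 1GB byte bound matters. With a few very large requests, the byte bound could trigger first.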
>
> The RpcQueueTime metrics are a measurement of how long individual calls
> stay in this queued state.  If your handlers were never 100% occupied, this
> value would be 0.  An average of 3 hours is concerning; it basically means
> that when a call comes into the RegionServer it takes on average 3 hours to
> start processing, because handlers are all occupied for that amount of
> time.
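To see why the metric stays at 0 until every handler is busy, here is a small FIFO simulation of queue time, measured from a call's arrival until a handler starts it. It is an illustrative sketch under stated assumptions: the function `queue_times` is hypothetical, arrivals are assumed to be sorted, and each call goes to whichever handler frees up first. It is not how HBase computes RpcQueueTime internally.

```python
import heapq
from typing import List


def queue_times(arrivals: List[float], durations: List[float],
                handlers: int) -> List[float]:
    """Seconds each call spends queued before a handler starts it.

    Hypothetical model: arrivals are sorted, and calls are dispatched
    FIFO to whichever handler becomes idle first.
    """
    free_at = [0.0] * handlers  # time at which each handler is next idle
    heapq.heapify(free_at)
    waits = []
    for arrived, duration in zip(arrivals, durations):
        idle_at = heapq.heappop(free_at)  # earliest available handler
        start = max(arrived, idle_at)     # a call waits only if all handlers are busy
        waits.append(start - arrived)
        heapq.heappush(free_at, start + duration)
    return waits


if __name__ == "__main__":
    burst = [0.0, 0.0, 0.0]  # three calls arrive at once
    work = [1.0, 1.0, 1.0]   # each takes 1 second of handler time
    print(queue_times(burst, work, handlers=1))  # [0.0, 1.0, 2.0]
    print(queue_times(burst, work, handlers=3))  # [0.0, 0.0, 0.0]
```

With one handler, the average wait in this burst is 1 second. With three handlers, no call ever finds all handlers occupied, so every wait is 0.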
>
> You can lower this time in a few ways:
>
> - Raise the number of handlers (but beware of using too many: this just
> shifts load to the disks and also uses more memory)
> - Make your requests smaller (use scanner caching or batching to return
> less data per RPC call)
> - Lower your client-side timeouts, so that you can handle the issue on the
> client side (e.g., with retries)
> - Investigate disk or network issues that could be causing extremely slow
> response times (also ensure data locality is 100%)
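The trade-off behind "make your requests smaller" can be estimated roughly. The real client-side knobs are `Scan.setCaching(int)` (Results returned per `next()` RPC) and `Scan.setBatch(int)` (maximum cells per Result). The helper `scan_rpc_estimate` below is hypothetical and simplified: it ignores scanner open/close calls, region boundaries, and any server-side result-size limits.

```python
import math


def scan_rpc_estimate(rows: int, cells_per_row: int, cell_bytes: int,
                      caching: int, batch: int = 0):
    """Rough (rpc_count, max_bytes_per_rpc) for a full scan.

    Hypothetical model: each Result carries at most `batch` cells of one
    row (batch=0 means whole rows), and each next() RPC returns up to
    `caching` Results.
    """
    cells_per_result = min(batch, cells_per_row) if batch > 0 else cells_per_row
    results = rows * math.ceil(cells_per_row / cells_per_result)
    rpcs = math.ceil(results / caching)
    # Upper bound: the final RPC of a scan may carry fewer Results.
    max_bytes_per_rpc = caching * cells_per_result * cell_bytes
    return rpcs, max_bytes_per_rpc


if __name__ == "__main__":
    # 10,000 rows x 20 cells x 100 bytes per cell.
    print(scan_rpc_estimate(10_000, 20, 100, caching=1000))          # (10, 2000000)
    print(scan_rpc_estimate(10_000, 20, 100, caching=100, batch=5))  # (400, 50000)
```

Smaller caching and batch values mean each call holds a handler for less time, at the cost of more round trips. That is the same balance a handler-bound RegionServer needs.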
>
> Just for perspective, the nominal operating value of this probably varies
> greatly with the workload/environment, but in our clusters we have an
> Average RPC Queue Time near 0.  We only see the callQueue fill up in the
> case of real problems, and almost always respond with immediate
> redistribution of data to other servers.
>
> HTH
>
>  - Bryan
>
>
> On Wed, Nov 20, 2013 at 11:31 AM, Shawn Hermans <[email protected]> wrote:
>
> > I am using CDH 4.3.1 with HBase 0.94.6.  Using Cloudera Manager, I notice
> > that a metric called Average RPC Queue Time is abnormally high.  It is
> > normally over 3 hours and drops to a few minutes during off-peak times.
> > What is the meaning of this metric?  Are these high queue times normal?
> >
> > Thanks,
> > Shawn
> >
>
