Shouldn't be.  It looks like Cloudera Manager just converts the raw value to
friendlier units.  The actual peak value for Average RPC queue time is
14438088.62 ms, which is roughly 4 hours.
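
For anyone who wants to check the units, here is a minimal Python sketch of the conversion. The helper name ms_to_hms is made up for illustration and is not anything Cloudera Manager or HBase exposes; the 60-second figure is the hbase.rpc.timeout mentioned further down the thread.

```python
def ms_to_hms(ms):
    """Convert a millisecond value to whole (hours, minutes, seconds)."""
    total_seconds = int(ms // 1000)          # drop the fractional second
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return hours, minutes, seconds


# Peak value reported for Average RPC queue time, in milliseconds.
peak_ms = 14438088.62

# hbase.rpc.timeout from this thread: 60 seconds, expressed in milliseconds.
rpc_timeout_ms = 60_000

hours, minutes, seconds = ms_to_hms(peak_ms)
print(f"peak queue time: {hours}h {minutes}m {seconds}s")

# How many times larger than the RPC timeout the reported average is.
ratio = peak_ms / rpc_timeout_ms
print(f"that is about {ratio:.0f}x the RPC timeout")
```

So the reported average is a couple of hundred times the configured timeout, which is why the number looks suspicious.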


On Wed, Nov 20, 2013 at 11:51 AM, Bryan Beaudreault <
[email protected]> wrote:

> I'm not sure about the Cloudera Manager UI, but the metric posted to JMX is
> in milliseconds.  Could that account for the confusion?
>
>
> On Wed, Nov 20, 2013 at 12:46 PM, Shawn Hermans <[email protected]
> >wrote:
>
> > Our hbase.rpc.timeout is set to 60 seconds.  I'm confused as to why I would
> > see such large values for the average RPC queue time.  Are there any other
> > metrics?  The RPC call queue length is consistently between 150 and 200
> > during peak usage time.  Is this normal?
> >
> > Regards,
> > Shawn
> >
> >
> > On Wed, Nov 20, 2013 at 11:24 AM, Jean-Marc Spaggiari <
> > [email protected]> wrote:
> >
> > > But that will depend on the timeout that they have configured, right?
> > >
> > > I have seen some third-party applications recommend increasing timeouts
> > > to 1h30...
> > >
> > > JMS
> > > On 2013-11-20 12:08, "Vladimir Rodionov" <[email protected]> wrote:
> > >
> > > > >>The RpcQueueTime metrics are a measurement of how long individual calls
> > > > >>stay in this queued state.  If your handlers were never 100% occupied,
> > > > >>this value would be 0.  An average of 3 hours is concerning, it basically
> > > > >>means that when a call comes into the RegionServer it takes on average 3
> > > > >>hours to start processing, because handlers are all occupied for that
> > > > >>amount of time.
> > > >
> > > > Definitely, this metric is meaningless, because the default RPC timeout is
> > > > 60 sec and under no circumstances can call data survive more than 60 sec
> > > > in a callQueue unless we have a bug.
> > > >
> > > > Best regards,
> > > > Vladimir Rodionov
> > > > Principal Platform Engineer
> > > > Carrier IQ, www.carrieriq.com
> > > > e-mail: [email protected]
> > > >
> > > > ________________________________________
> > > > From: Bryan Beaudreault [[email protected]]
> > > > Sent: Wednesday, November 20, 2013 8:55 AM
> > > > To: [email protected]
> > > > Subject: Re: Average RPC Queue Time
> > > >
> > > > A regionserver is configured with a certain number of RPC handlers
> > > > (hbase.regionserver.handler.count).  When these handlers are all occupied,
> > > > the calls back up into a callQueue.  This call queue is bounded by
> > > > ipc.server.max.callqueue.size (defaulting to 1GB of serialized requests)
> > > > and ipc.server.max.callqueue.length (10 * numHandlers).  So, with 5
> > > > handlers a maximum of 50 calls will be queued up before requests are
> > > > rejected outright.
> > > >
> > > > The RpcQueueTime metrics are a measurement of how long individual calls
> > > > stay in this queued state.  If your handlers were never 100% occupied,
> > > > this value would be 0.  An average of 3 hours is concerning, it basically
> > > > means that when a call comes into the RegionServer it takes on average 3
> > > > hours to start processing, because handlers are all occupied for that
> > > > amount of time.
> > > >
> > > > You can lower the queue time through a few options:
> > > >
> > > > - Up the max number of handlers (beware of using too many, as this just
> > > >   shifts load to the disks and also takes more memory)
> > > > - Make your requests smaller (use caching or batching on a scan to return
> > > >   less data per RPC call)
> > > > - Lower your client-side timeouts, so that you can handle the issue on the
> > > >   client side (i.e. retries)
> > > > - Investigate disk or network issues that could be causing extremely slow
> > > >   response times (ensure data is 100% local, too)
> > > >
> > > > Just for perspective, the nominal operating value of this probably varies
> > > > greatly with the workload/environment, but in our clusters we have an
> > > > Average RPC Queue Time of near 0.  We only see the callQueue fill up in
> > > > the case of real problems, and almost always respond with immediate
> > > > redistribution of data to other servers.
> > > >
> > > > HTH
> > > >
> > > >  - Bryan
> > > >
> > > >
> > > > On Wed, Nov 20, 2013 at 11:31 AM, Shawn Hermans <
> > [email protected]
> > > > >wrote:
> > > >
> > > > > I am using CDH 4.3.1 with HBase 0.94.6.  Using Cloudera manager, I
> > > > > notice a metric called Average RPC Queue Time is abnormal.  It is over
> > > > > 3 hours normally and drops to a few minutes during non-peak times.  What
> > > > > is the meaning of this metric? Are these high queue times normal?
> > > > >
> > > > > Thanks,
> > > > > Shawn
> > > > >
> > > >
> > > >
> > >
> >
>
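
The call queue bound Bryan describes in the quoted text above (10 * numHandlers, so 5 handlers give at most 50 queued calls) can be worked through with a small Python sketch. The function names are hypothetical, and the multiplier of 10 is simply the default stated in the thread; if ipc.server.max.callqueue.length has been overridden, or if the Cloudera Manager queue-length metric counts something else, the second calculation does not apply.

```python
def max_callqueue_length(handler_count, per_handler=10):
    """Upper bound on queued calls, assuming the default 10 * numHandlers."""
    return per_handler * handler_count


def min_handlers_for_queue_length(observed_length, per_handler=10):
    """Smallest handler count whose default bound allows the observed length."""
    # Ceiling division without importing math.
    return -(-observed_length // per_handler)


# Bryan's example from the thread: 5 handlers.
print(max_callqueue_length(5))

# Shawn reports a call queue length between 150 and 200 at peak.
for observed in (150, 200):
    print(observed, "queued calls need at least",
          min_handlers_for_queue_length(observed), "handlers")
```

Under those assumptions, a queue length of 200 is only possible with 20 or more handlers, which may be worth comparing against the configured hbase.regionserver.handler.count.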
