Shouldn't be. It looks like Cloudera Manager just converts the raw value to friendlier units. The actual peak value for Average RPC Queue Time is 14438088.62 ms, which is roughly 4 hours.
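For anyone who wants to sanity-check the units, here is a minimal sketch of the conversion. It assumes the JMX value really is in milliseconds (as suggested in the quoted reply) and that hbase.rpc.timeout is 60 seconds; the function and variable names are made up for illustration.

```python
def ms_to_hours(ms: float) -> float:
    """Convert a duration in milliseconds to hours."""
    return ms / (1000 * 60 * 60)


# Peak value reported for Average RPC Queue Time (assumed to be milliseconds).
peak_queue_ms = 14438088.62

# hbase.rpc.timeout of 60 seconds, as stated in the thread.
rpc_timeout_ms = 60 * 1000

peak_hours = ms_to_hours(peak_queue_ms)
# How many times longer than the RPC timeout the reported queue time is.
timeout_multiple = peak_queue_ms / rpc_timeout_ms

print(f"peak queue time: {peak_hours:.2f} hours")
print(f"that is about {timeout_multiple:.0f}x the RPC timeout")
```

If the metric is in milliseconds, the reported average is a couple of hundred times larger than the timeout, which is why the number looks suspicious.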
On Wed, Nov 20, 2013 at 11:51 AM, Bryan Beaudreault <[email protected]> wrote:

> I'm not sure about the cloudera manager ui, but the metric posted to JMX is
> in milliseconds. Are we sure that is not accounting for the confusion?
>
>
> On Wed, Nov 20, 2013 at 12:46 PM, Shawn Hermans <[email protected]> wrote:
>
> > Our hbase.rpc.timeout is set for 60 seconds. Confused as to why I would
> > see such large values for the average rpc queue time. Are there any other
> > metrics? The RPC call queue length is consistently between 150 and 200
> > during peak usage time. Is this normal?
> >
> > Regards,
> > Shawn
> >
> >
> > On Wed, Nov 20, 2013 at 11:24 AM, Jean-Marc Spaggiari <[email protected]> wrote:
> >
> > > But that will depend on the timeout that they have configured, right?
> > >
> > > I have seen some third party applications recommending to increase
> > > timeouts to 1h30...
> > >
> > > JMS
> > >
> > > On 2013-11-20 12:08, "Vladimir Rodionov" <[email protected]> wrote:
> > >
> > > > >>The RpcQueueTime metrics are a measurement of how long individual calls
> > > > >>stay in this queued state. If your handlers were never 100% occupied,
> > > > >>this value would be 0. An average of 3 hours is concerning, it basically
> > > > >>means that when a call comes into the RegionServer it takes on average
> > > > >>3 hours to start processing, because handlers are all occupied for that
> > > > >>amount of time.
> > > >
> > > > Definitely, this metric is meaningless because default RPC timeout is 60
> > > > sec and under no circumstances call data can survive this 60 sec in a
> > > > callQueue unless we have a bug.
> > > >
> > > > Best regards,
> > > > Vladimir Rodionov
> > > > Principal Platform Engineer
> > > > Carrier IQ, www.carrieriq.com
> > > > e-mail: [email protected]
> > > >
> > > > ________________________________________
> > > > From: Bryan Beaudreault [[email protected]]
> > > > Sent: Wednesday, November 20, 2013 8:55 AM
> > > > To: [email protected]
> > > > Subject: Re: Average RPC Queue Time
> > > >
> > > > A regionserver is configured with a certain number of RPC handlers
> > > > (hbase.regionserver.handler.count). When these handlers are all occupied,
> > > > the calls back up into a callQueue. This call queue is bounded by
> > > > ipc.server.max.callqueue.size (defaulting to 1GB of serialized requests)
> > > > and ipc.server.max.callqueue.length (10 * numHandlers). So, with 5
> > > > handlers a maximum of 50 calls will be queued up before requests are
> > > > rejected outright.
> > > >
> > > > The RpcQueueTime metrics are a measurement of how long individual calls
> > > > stay in this queued state. If your handlers were never 100% occupied, this
> > > > value would be 0. An average of 3 hours is concerning, it basically means
> > > > that when a call comes into the RegionServer it takes on average 3 hours to
> > > > start processing, because handlers are all occupied for that amount of
> > > > time.
> > > >
> > > > You can lower time through a few options:
> > > >
> > > > - Up the max number of handlers (beware using too many, as this just shifts
> > > >   load to the disks, and also takes more memory)
> > > > - Make your requests smaller (use caching or batching on a scan to return
> > > >   less data per RPC call)
> > > > - Lower your client-side timeouts, so that you can handle the issue on the
> > > >   client side (i.e. retries)
> > > > - Investigate disk or network issues that could be causing extremely slow
> > > >   response times (ensure data is 100% local, too)
> > > >
> > > > Just for perspective, the nominal operating value of this probably varies
> > > > greatly with the workload/environment, but in our clusters we have an
> > > > Average RPC Queue Time of near 0. We only see the callQueue fill up in the
> > > > case of real problems, and almost always respond with immediate
> > > > redistribution of data to other servers.
> > > >
> > > > HTH
> > > >
> > > > - Bryan
> > > >
> > > >
> > > > On Wed, Nov 20, 2013 at 11:31 AM, Shawn Hermans <[email protected]> wrote:
> > > >
> > > > > I am using CDH 4.3.1 with HBase 0.94.6. Using Cloudera manager, I notice a
> > > > > metric called Average RPC Queue Time is abnormal. It is over 3 hours
> > > > > normally and drops to a few minutes during non-peak times. What is the
> > > > > meaning of this metric? Are these high queue times normal?
> > > > >
> > > > > Thanks,
> > > > > Shawn
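As a footnote to the quoted explanation of the call queue bound, here is a small sketch of the arithmetic. It assumes the bound really is 10 * numHandlers, as described in the quoted message for ipc.server.max.callqueue.length; I have not checked that against the source for this HBase version, and the helper names are hypothetical.

```python
def max_callqueue_length(handler_count: int, calls_per_handler: int = 10) -> int:
    """Upper bound on queued calls, assuming the 10 * numHandlers rule from the thread."""
    if handler_count < 1:
        raise ValueError("handler_count must be at least 1")
    return handler_count * calls_per_handler


def min_handlers_for_queue(observed_queue_length: int, calls_per_handler: int = 10) -> int:
    """Smallest handler count whose bound would allow the observed queue length."""
    # Ceiling division without importing math.
    return -(-observed_queue_length // calls_per_handler)


# The example from the quoted message: 5 handlers allow at most 50 queued calls.
print(max_callqueue_length(5))

# The observed queue length of 150 to 200 during peak usage.
print(min_handlers_for_queue(150), min_handlers_for_queue(200))
```

Under that assumption, a queue length that sits between 150 and 200 would only be possible with roughly 15 to 20 handlers or more, so it may be worth comparing against the configured hbase.regionserver.handler.count.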
