But that will depend on the timeout they have configured, right? I have seen some third-party applications recommend increasing timeouts to 1h30...
JMS

On 2013-11-20 12:08, "Vladimir Rodionov" <[email protected]> wrote:

> >>The RpcQueueTime metrics are a measurement of how long individual calls
> >>stay in this queued state. If your handlers were never 100% occupied, this
> >>value would be 0. An average of 3 hours is concerning, it basically means
> >>that when a call comes into the RegionServer it takes on average 3 hours to
> >>start processing, because handlers are all occupied for that amount of time.
>
> Definitely, this metric is meaningless because default RPC timeout is 60 sec
> and under no circumstances call data can survive this 60 sec in a callQueue
> unless we have a bug.
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: [email protected]
>
> ________________________________________
> From: Bryan Beaudreault [[email protected]]
> Sent: Wednesday, November 20, 2013 8:55 AM
> To: [email protected]
> Subject: Re: Average RPC Queue Time
>
> A regionserver is configured with a certain number of RPC handlers
> (hbase.regionserver.handler.count). When these handlers are all occupied,
> the calls back up into a callQueue. This call queue is bounded by
> ipc.server.max.callqueue.size (defaulting to 1GB of serialized requests)
> and ipc.server.max.callqueue.length (10 * numHandlers). So, with 5
> handlers a maximum of 50 calls will be queued up before requests are
> rejected outright.
>
> The RpcQueueTime metrics are a measurement of how long individual calls
> stay in this queued state. If your handlers were never 100% occupied, this
> value would be 0. An average of 3 hours is concerning, it basically means
> that when a call comes into the RegionServer it takes on average 3 hours to
> start processing, because handlers are all occupied for that amount of
> time.
>
> You can lower time through a few options:
>
> - Up the max number of handlers (beware using too many, as this just shifts
>   load to the disks, and also takes more memory)
> - Make your requests smaller (use caching or batching on a scan to return
>   less data per RPC call)
> - Lower your client-side timeouts, so that you can handle the issue on the
>   client side (i.e. retries)
> - Investigate disk or network issues that could be causing extremely slow
>   response times (ensure data is 100% local, too)
>
> Just for perspective, the nominal operating value of this probably varies
> greatly with the workload/environment, but in our clusters we have an
> Average RPC Queue Time of near 0. We only see the callQueue fill up in the
> case of real problems, and almost always respond with immediate
> redistribution of data to other servers.
>
> HTH
>
> - Bryan
>
>
> On Wed, Nov 20, 2013 at 11:31 AM, Shawn Hermans <[email protected]> wrote:
>
> > I am using CDH 4.3.1 with HBase 0.94.6. Using Cloudera manager, I notice a
> > metric called Average RPC Queue Time is abnormal. It is over 3 hours
> > normally and drops to a few minutes during non-peak times. What is the
> > meaning of this metric? Are these high queue times normal?
> >
> > Thanks,
> > Shawn
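
For reference, the queue bounds Bryan describes can be sketched as a small back-of-the-envelope model. This is only an illustration, not HBase code: the function names are made up, "1GB" is assumed to mean 1024^3 bytes, and the wait estimate assumes a FIFO queue with every call taking the same time to serve.

```python
# Back-of-the-envelope model of the call queue bounds described in the thread.
# Assumptions (not taken from the HBase source): the function names, a FIFO
# queue, and a uniform per-call service time.

GIB = 1024 ** 3  # assumed reading of "1GB of serialized requests"


def max_queue_length(handler_count, calls_per_handler=10):
    """Queue length bound as described in the thread: 10 * numHandlers."""
    return calls_per_handler * handler_count


def is_rejected(queued_calls, queued_bytes, handler_count, max_bytes=GIB):
    """True if a new call would be rejected because either bound is reached."""
    return (queued_calls >= max_queue_length(handler_count)
            or queued_bytes >= max_bytes)


def rough_queue_wait(position, handler_count, service_seconds):
    """Rough wait for the call at 1-based `position` when all handlers are busy."""
    return (position / handler_count) * service_seconds


if __name__ == "__main__":
    handlers = 5
    print(max_queue_length(handlers))           # 50
    print(rough_queue_wait(50, handlers, 2.0))  # 20.0
```

Under these assumptions, a call at the back of a full queue waits roughly 10 service times regardless of the handler count, so a 3-hour wait there would imply calls averaging about 18 minutes each. That is one way to see why both replies treat the 3-hour figure as suspicious.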
