Thanks for all the help. A follow-up question: is it normal for the average RPC call queue length to stay above 100 during peak usage?

On Wed, Nov 20, 2013 at 12:09 PM, Bryan Beaudreault <[email protected]> wrote:

> I'm not sure why it is so much higher than your rpc timeout. Enabling
> DEBUG log level on org.apache.hadoop.ipc.HBaseServer.trace and
> org.apache.hadoop.ipc.HBaseServer loggers might provide you with some
> insight.
>
> On Wed, Nov 20, 2013 at 12:55 PM, Shawn Hermans <[email protected]> wrote:
>
> > Shouldn't be. Looks like Cloudera just converts it to nicer values. So
> > the actual peak value is 14438088.62 ms for Average RPC queue time.
> >
> > On Wed, Nov 20, 2013 at 11:51 AM, Bryan Beaudreault <[email protected]> wrote:
> >
> > > I'm not sure about the cloudera manager ui, but the metric posted to
> > > JMX is in milliseconds. Are we sure that is not accounting for the
> > > confusion?
> > >
> > > On Wed, Nov 20, 2013 at 12:46 PM, Shawn Hermans <[email protected]> wrote:
> > >
> > > > Our hbase.rpc.timeout is set for 60 seconds. Confused as to why I
> > > > would see such large values for the average rpc queue time. Are
> > > > there any other metrics? The RPC call queue length is consistently
> > > > between 150 and 200 during peak usage time. Is this normal?
> > > >
> > > > Regards,
> > > > Shawn
> > > >
> > > > On Wed, Nov 20, 2013 at 11:24 AM, Jean-Marc Spaggiari <[email protected]> wrote:
> > > >
> > > > > But that will depend on the timeout that they have configured,
> > > > > right?
> > > > >
> > > > > I have seen some third party applications recommending to
> > > > > increase timeouts to 1h30...
> > > > >
> > > > > JMS
> > > > >
> > > > > On 2013-11-20 12:08, "Vladimir Rodionov" <[email protected]> wrote:
> > > > >
> > > > > > >>The RpcQueueTime metrics are a measurement of how long
> > > > > > >>individual calls stay in this queued state. If your handlers
> > > > > > >>were never 100% occupied, this value would be 0. An average
> > > > > > >>of 3 hours is concerning, it basically means that when a call
> > > > > > >>comes into the RegionServer it takes on average 3 hours to
> > > > > > >>start processing, because handlers are all occupied for that
> > > > > > >>amount of time.
> > > > > >
> > > > > > Definitely, this metric is meaningless because default RPC
> > > > > > timeout is 60 sec and under no circumstances call data can
> > > > > > survive this 60 sec in a callQueue unless we have a bug.
> > > > > >
> > > > > > Best regards,
> > > > > > Vladimir Rodionov
> > > > > > Principal Platform Engineer
> > > > > > Carrier IQ, www.carrieriq.com
> > > > > > e-mail: [email protected]
> > > > > >
> > > > > > ________________________________________
> > > > > > From: Bryan Beaudreault [[email protected]]
> > > > > > Sent: Wednesday, November 20, 2013 8:55 AM
> > > > > > To: [email protected]
> > > > > > Subject: Re: Average RPC Queue Time
> > > > > >
> > > > > > A regionserver is configured with a certain number of RPC
> > > > > > handlers (hbase.regionserver.handler.count). When these
> > > > > > handlers are all occupied, the calls back up into a callQueue.
> > > > > > This call queue is bounded by ipc.server.max.callqueue.size
> > > > > > (defaulting to 1GB of serialized requests) and
> > > > > > ipc.server.max.callqueue.length (10 * numHandlers). So, with 5
> > > > > > handlers a maximum of 50 calls will be queued up before
> > > > > > requests are rejected outright.
> > > > > >
> > > > > > The RpcQueueTime metrics are a measurement of how long
> > > > > > individual calls stay in this queued state. If your handlers
> > > > > > were never 100% occupied, this value would be 0. An average of
> > > > > > 3 hours is concerning, it basically means that when a call
> > > > > > comes into the RegionServer it takes on average 3 hours to
> > > > > > start processing, because handlers are all occupied for that
> > > > > > amount of time.
> > > > > >
> > > > > > You can lower time through a few options:
> > > > > >
> > > > > > - Up the max number of handlers (beware using too many, as this
> > > > > >   just shifts load to the disks, and also takes more memory)
> > > > > > - Make your requests smaller (use caching or batching on a scan
> > > > > >   to return less data per RPC call)
> > > > > > - Lower your client-side timeouts, so that you can handle the
> > > > > >   issue on the client side (i.e. retries)
> > > > > > - Investigate disk or network issues that could be causing
> > > > > >   extremely slow response times (ensure data is 100% local, too)
> > > > > >
> > > > > > Just for perspective, the nominal operating value of this
> > > > > > probably varies greatly with the workload/environment, but in
> > > > > > our clusters we have an Average RPC Queue Time of near 0. We
> > > > > > only see the callQueue fill up in the case of real problems,
> > > > > > and almost always respond with immediate redistribution of data
> > > > > > to other servers.
> > > > > >
> > > > > > HTH
> > > > > >
> > > > > > - Bryan
> > > > > >
> > > > > > On Wed, Nov 20, 2013 at 11:31 AM, Shawn Hermans <[email protected]> wrote:
> > > > > >
> > > > > > > I am using CDH 4.3.1 with HBase 0.94.6. Using Cloudera
> > > > > > > manager, I notice a metric called Average RPC Queue Time is
> > > > > > > abnormal. It is over 3 hours normally and drops to a few
> > > > > > > minutes during non-peak times. What is the meaning of this
> > > > > > > metric? Are these high queue times normal?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Shawn
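
For anyone following the numbers in this thread, here is a rough sketch of the arithmetic, not anything taken from HBase itself. It assumes the bound Bryan describes (ipc.server.max.callqueue.length = 10 * numHandlers) is in effect and that the JMX value is in milliseconds, as stated above. The function names are made up for illustration.

```python
# Back-of-the-envelope helpers for the figures quoted in this thread.
# All names are hypothetical; nothing here talks to HBase or JMX.
import math

MS_PER_HOUR = 60 * 60 * 1000


def ms_to_hours(milliseconds: float) -> float:
    """Convert a queue-time metric reported in milliseconds to hours."""
    return milliseconds / MS_PER_HOUR


def max_call_queue_length(handler_count: int, calls_per_handler: int = 10) -> int:
    """Queue length bound, assuming the 10 * numHandlers rule from the thread."""
    return handler_count * calls_per_handler


def min_handlers_for_queue_length(observed_length: int,
                                  calls_per_handler: int = 10) -> int:
    """Smallest handler count whose bound could hold the observed queue length."""
    return math.ceil(observed_length / calls_per_handler)


if __name__ == "__main__":
    # Peak value Shawn reported for Average RPC queue time, in ms.
    print(f"{ms_to_hours(14438088.62):.2f} hours")
    # Bryan's example: 5 handlers.
    print(max_call_queue_length(5))
    # Queue lengths Shawn observed during peak usage.
    print(min_handlers_for_queue_length(150), min_handlers_for_queue_length(200))
```

Under that assumption, a queue length that sits between 150 and 200 would imply at least 15 to 20 handlers configured. If hbase.regionserver.handler.count is lower than that, the queue length limit has presumably been overridden, which is worth checking before reading too much into the metric.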
