I'm still struggling with the "slow" requests described above. As a
temporary workaround, I want to limit request execution time and simply
drop the requests that take HBase too long to answer. Could you advise
which timeout fits this task best? I see the following options (a sketch
of how I'd set them follows the list):

* hbase.rpc.timeout
* hbase.client.operation.timeout
* something else I missed?
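For reference, this is roughly how I'd expect to set them on the client
side. This is only a minimal sketch against the HBase 1.1 client API; the
timeout values, table name, and row key are illustrative, not from our
setup:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TimeoutDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Upper bound on a single RPC to a region server (illustrative value).
        conf.setInt("hbase.rpc.timeout", 800);
        // Upper bound on the whole client operation, retries included.
        conf.setInt("hbase.client.operation.timeout", 2000);
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("some_table"))) {
            Result result = table.get(new Get(Bytes.toBytes("some-row")));
            System.out.println("got " + result.size() + " cells");
        }
    }
}

As I understand it, hbase.rpc.timeout caps each individual RPC while
hbase.client.operation.timeout caps the whole operation including retries,
so exceeding either should fail the request instead of letting it run
long -- please correct me if that's wrong.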
On Tue, Oct 25, 2016 at 1:21 PM, Alexander Ilyin <[email protected]> wrote:

> Hi Ted, Stack,
>
> thanks for the answers.
>
> We're not using Phoenix, but the server process was running on some
> datanodes. We stopped it to be sure; nothing changed.
>
> Ted, I suppose you meant region server jstack traces, here they are:
> http://pastebin.com/CHzyrBp9
> http://pastebin.com/PdNczfVa
>
> Stack, our application pulls messages from a RabbitMQ queue and
> processes them using HBase, so there is no dependency on external
> spikes in load. At least no obvious one.
> It looks like random requests take much more time to complete than the
> others. It has been happening all the time since Friday, not during
> some specific time windows; several percent of requests are just slow.
> Region server GC pauses look normal -- less than 0.1 second every
> several seconds.
>
> The fact that most of the "long" requests have execution times in the
> range of 800-900ms makes me think that we're hitting some timeout which
> is not logged properly. But this is just an idea.
>
>
> On Mon, Oct 24, 2016 at 7:20 PM, Stack <[email protected]> wrote:
>
>> Any correlation with a spike in load -- more i/os, more traffic, other
>> tasks running on the box -- or a long GC? Is it random requests, or do
>> all requests over a particular period suffer the 800ms penalty?
>> Thanks,
>> St.Ack
>>
>> On Mon, Oct 24, 2016 at 5:44 AM, Alexander Ilyin <[email protected]>
>> wrote:
>>
>> > Hi,
>> >
>> > I'm experiencing some spikes in HBase response time. The problem
>> > appeared at one moment: before, all requests used to take 4-30ms to
>> > complete, and now some of them (several percent) take as long as
>> > 800ms, significantly increasing the average processing time.
>> >
>> > This happens with requests to all the tables I have, so it doesn't
>> > seem to be related to a problem in the data schema -- which hasn't
>> > changed anyway.
>> >
>> > I don't see any problems with a lack of resources: there is enough
>> > CPU and memory on the datanodes.
>> >
>> > Automatic splitting is off and no cluster balancing is taking place,
>> > so it shouldn't be client retries either.
>> >
>> > There are some stack traces in the datanode log, but as far as I
>> > understand these are normal, and in any case they existed before the
>> > problem appeared:
>> > 2016-10-24 05:42:19,186 INFO
>> > [B.defaultRpcServer.handler=17,queue=2,port=16020]
>> > shortcircuit.ShortCircuitCache: ShortCircuitCache(0x3a130ae2): could
>> > not load 1075792477_BP-415105157-10.190.240.206-1468423545420 due to
>> > InvalidToken exception.
>> > org.apache.hadoop.security.token.SecretManager$InvalidToken: access
>> > control error while attempting to set up short-circuit access to
>> > /apps/hbase/data/data/profile/DailyUserScores/84105becaefcf071e43ebc4972d0249f/d/c7653cc98860451cb79e678e9c3f0e85
>> > Block token with block_token_identifier (expiryDate=1477279957879,
>> > keyId=2039256459, userId=hbase,
>> > blockPoolId=BP-415105157-10.190.240.206-1468423545420,
>> > blockId=1075792477, access modes=[READ]) is expired.
>> >
>> > At the moment I'm a bit stuck, so any advice on what to check next is
>> > welcome.
>> >
>> > PS. HBase version is 1.1.2
>> >
>> >
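PS. To confirm the 800-900ms clustering I mentioned above, one could time
the requests on the client side with a small wrapper along these lines.
The helper name and the threshold are made up for illustration; only
Table.get() is the real API:

import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;

public final class SlowGetLogger {
    // Assumption: anything over 500 ms counts as "slow" against a
    // 4-30ms baseline; tune this to your own numbers.
    private static final long SLOW_THRESHOLD_MS = 500;

    // Wraps Table.get() and logs calls that exceed the threshold.
    public static Result timedGet(Table table, Get get) throws IOException {
        long start = System.nanoTime();
        try {
            return table.get(get);
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
            if (elapsedMs > SLOW_THRESHOLD_MS) {
                System.err.printf("slow get: %d ms, row=%s%n",
                        elapsedMs, Arrays.toString(get.getRow()));
            }
        }
    }
}

If the logged latencies pile up just under a retry or timeout boundary,
that would support the "hitting some timeout" idea.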
