Thank you very much for your reply. These slow queries occur only rarely, but they have a great impact on the business. Since they are so infrequent, do I need to keep dumping jstack all the time? Have you ever run into this situation?
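For context, here is a rough sketch of what I am considering: since the responseTooSlow line is only written after the slow call has finished, triggering a dump from the log would come too late, so instead I would sample jstack every few seconds and keep a rolling window of dumps. With the slow calls lasting 20+ seconds, several samples should land inside one. The output directory, interval, and retention values below are just assumptions, not anything HBase-specific:

```shell
#!/usr/bin/env bash
# Hypothetical jstack sampler (a sketch, not a tested tool).
# Takes a thread dump of the RegionServer JVM at a fixed interval and
# prunes old dumps, so dump timestamps can later be matched against
# responseTooSlow entries in the RS log.
sample_jstacks() {
  local out_dir=${OUT_DIR:-/tmp/rs-jstacks}   # assumed output location
  local interval=${INTERVAL:-5}               # seconds between samples
  local retain_min=${RETAIN_MIN:-60}          # minutes of dumps to keep
  local samples=${SAMPLES:-0}                 # 0 = sample forever
  local n=0 pid
  mkdir -p "$out_dir"
  while [ "$samples" -eq 0 ] || [ "$n" -lt "$samples" ]; do
    # jps ships with the JDK; grab the RegionServer JVM's pid.
    pid=$(jps 2>/dev/null | awk '/HRegionServer/ {print $1}')
    if [ -n "$pid" ]; then
      jstack "$pid" > "$out_dir/jstack-$(date +%Y%m%d-%H%M%S.%N).txt"
    fi
    # Prune old dumps so the disk doesn't fill up.
    find "$out_dir" -name 'jstack-*.txt' -mmin +"$retain_min" -delete
    n=$((n + 1))
    sleep "$interval"
  done
}

# Example: run in the background on the RS host, e.g.
#   sample_jstacks &
# then, after a slow query is logged, inspect the dumps whose timestamps
# fall inside the [starttimems, starttimems + processingtimems] window.
```

Would something along these lines be a reasonable compromise, or is there a better-established way to catch these?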
------------------ Original Message ------------------
From: "Yu Li" <[email protected]>
Date: Tuesday, 23 January 2018, 4:23 PM
To: "Hbase-User" <[email protected]>
Subject: Re: Query occasionally respond very slowly

0.98.6 is a really old version and doesn't include some later improvements which could help locate the issue, such as HBASE-16033 <https://issues.apache.org/jira/browse/HBASE-16033> (include the row of the slow query in the log message, so we could repeat the query in hbase shell and try reproducing the issue; available from 0.98.21) and HBASE-15160 <https://issues.apache.org/jira/browse/HBASE-15160> (add metrics on HDFS operations, so we could check whether any IO spike coincides with the slow response; available from 1.4.0). So my first suggestion is to upgrade your HBase version (especially since branch-0.98 is already EOL, FYI), or to manually backport these patches to your version and try.

If upgrading is impossible, from the limited information posted I can only say the DN log seems unrelated to the issue. In my view the most effective way to locate the problem is to dump the jstack of the RS while a slow query is happening and check where it's waiting (the slow queries last more than 20 seconds, so if they happen frequently, there's a high chance of catching one).

Hope this information helps, and good luck.

Best Regards,
Yu

On 23 January 2018 at 15:46, <[email protected]> wrote:
> The HBase version is 0.98.6-cdh5.2.0.
> The HDFS version is 2.5.0-cdh5.2.0.
>
>
> ------------------ Original Message ------------------
> From: <[email protected]>
> Date: Tuesday, 23 January 2018, 2:50 PM
> To: "user" <[email protected]>
> Subject: Query occasionally respond very slowly
>
> Recently, queries have occasionally been responding very slowly. They usually return quickly, within a few milliseconds, but occasionally a query gets very slow, taking more than 20 seconds. I looked at the GC log and there was no full GC happening.
>
> A regionserver log is as follows:
>
> 2018-01-22 16:38:13,580 WARN [B.defaultRpcServer.handler=35,queue=5,port=60020] ipc.RpcServer: (responseTooSlow): {"processingtimems":23513,"call":"Get(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest)","client":"10.94.76.216:34324","starttimems":1516610270064,"queuetimems":0,"class":"HRegionServer","responsesize":412,"method":"Get"}
>
> One of the datanode logs is as follows:
>
> 2018-01-22 16:37:42,417 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.90.18.70:50010, dest: /10.90.18.70:54469, bytes: 12288, op: HDFS_READ, cliID: DFSClient_hb_rs_l-hbase50.dba.cn2.qunar.com,60020,1505725242560_-1708409423_37, offset: 948224, srvID: ab75b2a1-af8b-4fcf-a93a-6245aab9241c, blockid: BP-1760821987-10.90.18.66-1447407547902:blk_1121353497_47612799, duration: 9866301
> 2018-01-22 16:37:42,499 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1760821987-10.90.18.66-1447407547902:blk_1121355749_47615051 src: /10.90.18.69:36293 dest: /10.90.18.70:50010
> 2018-01-22 16:37:42,499 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-1760821987-10.90.18.66-1447407547902:blk_1121355749_47615051 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1760821987-10.90.18.66-1447407547902:blk_1121355749_47615051 already exists in state FINALIZED and thus cannot be created.
> 2018-01-22 16:37:42,499 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: l-hbase50.dba.cn2:50010:DataXceiver error processing WRITE_BLOCK operation src: /10.90.18.69:36293 dst: /10.90.18.70:50010; org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1760821987-10.90.18.66-1447407547902:blk_1121355749_47615051 already exists in state FINALIZED and thus cannot be created.
> 2018-01-22 16:37:42,506 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.90.18.70:50010, dest: /10.90.18.70:54510, bytes: 12288, op: HDFS_READ, cliID: DFSClient_hb_rs_l-hbase50.dba.cn2.qunar.com,60020,1505725242560_-1708409423_37, offset: 34276352, srvID: ab75b2a1-af8b-4fcf-a93a-6245aab9241c, blockid: BP-1760821987-10.90.18.66-1447407547902:blk_1121354564_47613866, duration: 7418016
>
> Another datanode log is as follows:
>
> 2018-01-22 16:37:42,497 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.90.18.69, datanodeUuid=95aafbc6-239c-4661-ba37-4687ae9e663b, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=CID-1fa1156b-bd6f-4113-8d02-3af80df935c3;nsid=470632750;c=0) Starting thread to transfer BP-1760821987-10.90.18.66-1447407547902:blk_1121355749_47615051 to 10.90.18.70:50010
> 2018-01-22 16:37:42,499 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.90.18.69, datanodeUuid=95aafbc6-239c-4661-ba37-4687ae9e663b, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=CID-1fa1156b-bd6f-4113-8d02-3af80df935c3;nsid=470632750;c=0):Failed to transfer BP-1760821987-10.90.18.66-1447407547902:blk_1121355749_47615051 to 10.90.18.70:50010 got
> java.net.SocketException: Original Exception : java.io.IOException: Connection reset by peer
>     at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
>     at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:433)
>     at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:565)
>     at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:223)
>     at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
>     at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1805)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: java.io.IOException: Connection reset by peer
>     ... 8 more
> 2018-01-22 16:37:42,520 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.90.18.69:50010, dest: /10.90.18.69:49343, bytes: 14848, op: HDFS_READ, cliID: DFSClient_hb_rs_l-hbase49.dba.cn2.qunar.com,60020,1464835349894_1899722521_37, offset: 61291520, srvID: 95aafbc6-239c-4661-ba37-4687ae9e663b, blockid: BP-1760821987-10.90.18.66-1447407547902:blk_1121217415_47476717, duration: 5939553
>
> This problem has me confused. What could be causing it, and how do we solve it?
