Thank you very much for your reply. These slow queries occur only rarely, but they have a great impact on the business. Since they are so infrequent, do I need to keep dumping jstack all the time? Have you ever run into this situation?
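For context, here is a rough sketch of what I am considering: since the responseTooSlow line is only written after the slow call has finished, triggering a dump from the log would come too late, so instead I would sample jstack every few seconds and keep a rolling window of dumps. With the slow calls lasting 20+ seconds, several samples should land inside one. The output directory, interval, and retention values below are just assumptions, not anything HBase-specific:

```shell
#!/usr/bin/env bash
# Hypothetical jstack sampler (a sketch, not a tested tool).
# Takes a thread dump of the RegionServer JVM at a fixed interval and
# prunes old dumps, so dump timestamps can later be matched against
# responseTooSlow entries in the RS log.
sample_jstacks() {
  local out_dir=${OUT_DIR:-/tmp/rs-jstacks}   # assumed output location
  local interval=${INTERVAL:-5}               # seconds between samples
  local retain_min=${RETAIN_MIN:-60}          # minutes of dumps to keep
  local samples=${SAMPLES:-0}                 # 0 = sample forever
  local n=0 pid
  mkdir -p "$out_dir"
  while [ "$samples" -eq 0 ] || [ "$n" -lt "$samples" ]; do
    # jps ships with the JDK; grab the RegionServer JVM's pid.
    pid=$(jps 2>/dev/null | awk '/HRegionServer/ {print $1}')
    if [ -n "$pid" ]; then
      jstack "$pid" > "$out_dir/jstack-$(date +%Y%m%d-%H%M%S.%N).txt"
    fi
    # Prune old dumps so the disk doesn't fill up.
    find "$out_dir" -name 'jstack-*.txt' -mmin +"$retain_min" -delete
    n=$((n + 1))
    sleep "$interval"
  done
}

# Example: run in the background on the RS host, e.g.
#   sample_jstacks &
# then, after a slow query is logged, inspect the dumps whose timestamps
# fall inside the [starttimems, starttimems + processingtimems] window.
```

Would something along these lines be a reasonable compromise, or is there a better-established way to catch these?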
------------------ Original Message ------------------
From: "Yu Li" <[email protected]>
Date: Tuesday, 23 January 2018, 4:23 PM
To: "Hbase-User" <[email protected]>
Subject: Re: Query occasionally respond very slowly

0.98.6 is a really old version and doesn't include some later improvements which could help locate the issue, such as HBASE-16033 <https://issues.apache.org/jira/browse/HBASE-16033> (include the row of the slow query in the log message, so we could repeat the query in hbase shell and try reproducing the issue; available from 0.98.21) and HBASE-15160 <https://issues.apache.org/jira/browse/HBASE-15160> (add metrics on HDFS operations, so we could check whether any IO spike coincides with the slow response; available from 1.4.0). So my first suggestion is to upgrade your HBase version (especially since branch-0.98 is already EOL, FYI), or to manually backport these patches to your version and try.

If upgrading is impossible, from the limited information posted I can only say the DN log seems unrelated to the issue. In my view the most effective way to locate the problem is to dump the jstack of the RS while a slow query is happening and check where it's waiting (the slow queries last more than 20 seconds, so if they happen frequently, there's a high chance of catching one).

Hope this information helps, and good luck.

Best Regards,
Yu

On 23 January 2018 at 15:46, <[email protected]> wrote:
> The HBase version is 0.98.6-cdh5.2.0.
> The HDFS version is 2.5.0-cdh5.2.0.
>
>
> ------------------ Original Message ------------------
> From: <[email protected]>
> Date: Tuesday, 23 January 2018, 2:50 PM
> To: "user" <[email protected]>
> Subject: Query occasionally respond very slowly
>
> Recently, queries have occasionally been responding very slowly. They usually return quickly, within a few milliseconds, but occasionally a query gets very slow, taking more than 20 seconds. I looked at the GC log and there was no full GC happening.
>
> A regionserver log is as follows:
>
> 2018-01-22 16:38:13,580 WARN [B.defaultRpcServer.handler=35,queue=5,port=60020] ipc.RpcServer: (responseTooSlow): {"processingtimems":23513,"call":"Get(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest)","client":"10.94.76.216:34324","starttimems":1516610270064,"queuetimems":0,"class":"HRegionServer","responsesize":412,"method":"Get"}
>
> One of the datanode logs is as follows:
>
> 2018-01-22 16:37:42,417 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.90.18.70:50010, dest: /10.90.18.70:54469, bytes: 12288, op: HDFS_READ, cliID: DFSClient_hb_rs_l-hbase50.dba.cn2.qunar.com,60020,1505725242560_-1708409423_37, offset: 948224, srvID: ab75b2a1-af8b-4fcf-a93a-6245aab9241c, blockid: BP-1760821987-10.90.18.66-1447407547902:blk_1121353497_47612799, duration: 9866301
> 2018-01-22 16:37:42,499 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1760821987-10.90.18.66-1447407547902:blk_1121355749_47615051 src: /10.90.18.69:36293 dest: /10.90.18.70:50010
> 2018-01-22 16:37:42,499 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-1760821987-10.90.18.66-1447407547902:blk_1121355749_47615051 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1760821987-10.90.18.66-1447407547902:blk_1121355749_47615051 already exists in state FINALIZED and thus cannot be created.
> 2018-01-22 16:37:42,499 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: l-hbase50.dba.cn2:50010:DataXceiver error processing WRITE_BLOCK operation src: /10.90.18.69:36293 dst: /10.90.18.70:50010; org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1760821987-10.90.18.66-1447407547902:blk_1121355749_47615051 already exists in state FINALIZED and thus cannot be created.
> 2018-01-22 16:37:42,506 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.90.18.70:50010, dest: /10.90.18.70:54510, bytes: 12288, op: HDFS_READ, cliID: DFSClient_hb_rs_l-hbase50.dba.cn2.qunar.com,60020,1505725242560_-1708409423_37, offset: 34276352, srvID: ab75b2a1-af8b-4fcf-a93a-6245aab9241c, blockid: BP-1760821987-10.90.18.66-1447407547902:blk_1121354564_47613866, duration: 7418016
>
> Another datanode log is as follows:
>
> 2018-01-22 16:37:42,497 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.90.18.69, datanodeUuid=95aafbc6-239c-4661-ba37-4687ae9e663b, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=CID-1fa1156b-bd6f-4113-8d02-3af80df935c3;nsid=470632750;c=0) Starting thread to transfer BP-1760821987-10.90.18.66-1447407547902:blk_1121355749_47615051 to 10.90.18.70:50010
> 2018-01-22 16:37:42,499 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.90.18.69, datanodeUuid=95aafbc6-239c-4661-ba37-4687ae9e663b, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=CID-1fa1156b-bd6f-4113-8d02-3af80df935c3;nsid=470632750;c=0):Failed to transfer BP-1760821987-10.90.18.66-1447407547902:blk_1121355749_47615051 to 10.90.18.70:50010 got
> java.net.SocketException: Original Exception : java.io.IOException: Connection reset by peer
>     at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
>     at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:433)
>     at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:565)
>     at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:223)
>     at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
>     at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1805)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: java.io.IOException: Connection reset by peer
>     ... 8 more
> 2018-01-22 16:37:42,520 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.90.18.69:50010, dest: /10.90.18.69:49343, bytes: 14848, op: HDFS_READ, cliID: DFSClient_hb_rs_l-hbase49.dba.cn2.qunar.com,60020,1464835349894_1899722521_37, offset: 61291520, srvID: 95aafbc6-239c-4661-ba37-4687ae9e663b, blockid: BP-1760821987-10.90.18.66-1447407547902:blk_1121217415_47476717, duration: 5939553
>
> This problem has me confused. What could be causing it, and how do we solve it?
