Jingcheng Du created HDFS-9668: ---------------------------------- Summary: Many long-time BLOCKED threads on FsDatasetImpl in a tiered storage Key: HDFS-9668 URL: https://issues.apache.org/jira/browse/HDFS-9668 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Jingcheng Du Assignee: Jingcheng Du
During the HBase test on a tiered storage of HDFS (WAL is stored in SSD/RAMDISK, and all other files are stored in HDD), we observe many long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part of the jstack result: {noformat} "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at /192.168.50.16:48521 [Receiving block BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread t@93336 java.lang.Thread.State: BLOCKED at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1111) - waiting to lock <18324c9> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at /192.168.50.16:48520 [Receiving block BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335 at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:745) Locked ownable synchronizers: - None "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at /192.168.50.16:48520 [Receiving block BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread t@93335 java.lang.Thread.State: RUNNABLE at java.io.UnixFileSystem.createFileExclusively(Native Method) at java.io.File.createNewFile(File.java:1012) at org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140) - locked <18324c9> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:745) Locked ownable synchronizers: - None {noformat} We measured the execution of some operations in FsDatasetImpl during the test. Here following is the result. !execution_time.png! The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy load take a really long time. It means one slow operation of finalizeBlock, addBlock and createRbw in a slow storage can block all the other same operations in the same DataNode, especially in HBase when many wal/flusher/compactor are configured. We need a finer grained lock mechanism in a new FsDatasetImpl implementation and users can choose the implementation by configuring "dfs.datanode.fsdataset.factory" in DataNode. We can implement the lock by either storage level or block-level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)