[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-11-30 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-26.patch

Upload a new patch V26 to address [~arpitagarwal]'s comments except the ones 
with questions. Thanks.

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-13.patch, 
> HDFS-9668-14.patch, HDFS-9668-14.patch, HDFS-9668-15.patch, 
> HDFS-9668-16.patch, HDFS-9668-17.patch, HDFS-9668-18.patch, 
> HDFS-9668-19.patch, HDFS-9668-19.patch, HDFS-9668-2.patch, 
> HDFS-9668-20.patch, HDFS-9668-21.patch, HDFS-9668-22.patch, 
> HDFS-9668-23.patch, HDFS-9668-23.patch, HDFS-9668-24.patch, 
> HDFS-9668-25.patch, HDFS-9668-26.patch, HDFS-9668-3.patch, HDFS-9668-4.patch, 
> HDFS-9668-5.patch, HDFS-9668-6.patch, HDFS-9668-7.patch, HDFS-9668-8.patch, 
> HDFS-9668-9.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-11-23 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-25.patch

Upload a new patch V25 to fix the checkstyle issue.

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-13.patch, 
> HDFS-9668-14.patch, HDFS-9668-14.patch, HDFS-9668-15.patch, 
> HDFS-9668-16.patch, HDFS-9668-17.patch, HDFS-9668-18.patch, 
> HDFS-9668-19.patch, HDFS-9668-19.patch, HDFS-9668-2.patch, 
> HDFS-9668-20.patch, HDFS-9668-21.patch, HDFS-9668-22.patch, 
> HDFS-9668-23.patch, HDFS-9668-23.patch, HDFS-9668-24.patch, 
> HDFS-9668-25.patch, HDFS-9668-3.patch, HDFS-9668-4.patch, HDFS-9668-5.patch, 
> HDFS-9668-6.patch, HDFS-9668-7.patch, HDFS-9668-8.patch, HDFS-9668-9.patch, 
> execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-11-23 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-24.patch

Upload a new patch V24 that uses read-write lock to address [~arpitagarwal]'s 
comments. Hi guys, please help review. Thanks.

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-13.patch, 
> HDFS-9668-14.patch, HDFS-9668-14.patch, HDFS-9668-15.patch, 
> HDFS-9668-16.patch, HDFS-9668-17.patch, HDFS-9668-18.patch, 
> HDFS-9668-19.patch, HDFS-9668-19.patch, HDFS-9668-2.patch, 
> HDFS-9668-20.patch, HDFS-9668-21.patch, HDFS-9668-22.patch, 
> HDFS-9668-23.patch, HDFS-9668-23.patch, HDFS-9668-24.patch, 
> HDFS-9668-3.patch, HDFS-9668-4.patch, HDFS-9668-5.patch, HDFS-9668-6.patch, 
> HDFS-9668-7.patch, HDFS-9668-8.patch, HDFS-9668-9.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-11-06 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-23.patch

I checked the latest build, and these issues came up in other build too. The 
failures should not be related with this patch.
Re-upload the patch V23 (the same with the existing one) to trigger the Hadoop 
QA.

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-13.patch, 
> HDFS-9668-14.patch, HDFS-9668-14.patch, HDFS-9668-15.patch, 
> HDFS-9668-16.patch, HDFS-9668-17.patch, HDFS-9668-18.patch, 
> HDFS-9668-19.patch, HDFS-9668-19.patch, HDFS-9668-2.patch, 
> HDFS-9668-20.patch, HDFS-9668-21.patch, HDFS-9668-22.patch, 
> HDFS-9668-23.patch, HDFS-9668-23.patch, HDFS-9668-3.patch, HDFS-9668-4.patch, 
> HDFS-9668-5.patch, HDFS-9668-6.patch, HDFS-9668-7.patch, HDFS-9668-8.patch, 
> HDFS-9668-9.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-11-03 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-23.patch

Upload a new patch V23 to address [~eddyxu]'s comments. Thanks!

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-13.patch, 
> HDFS-9668-14.patch, HDFS-9668-14.patch, HDFS-9668-15.patch, 
> HDFS-9668-16.patch, HDFS-9668-17.patch, HDFS-9668-18.patch, 
> HDFS-9668-19.patch, HDFS-9668-19.patch, HDFS-9668-2.patch, 
> HDFS-9668-20.patch, HDFS-9668-21.patch, HDFS-9668-22.patch, 
> HDFS-9668-23.patch, HDFS-9668-3.patch, HDFS-9668-4.patch, HDFS-9668-5.patch, 
> HDFS-9668-6.patch, HDFS-9668-7.patch, HDFS-9668-8.patch, HDFS-9668-9.patch, 
> execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-11-02 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-22.patch

Upload a new patch V22, which adds configurations to hdfs-default.xml to fix 
the test failure in 
TestHdfsConfigFields.testCompareConfigurationClassAgainstXml.

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-13.patch, 
> HDFS-9668-14.patch, HDFS-9668-14.patch, HDFS-9668-15.patch, 
> HDFS-9668-16.patch, HDFS-9668-17.patch, HDFS-9668-18.patch, 
> HDFS-9668-19.patch, HDFS-9668-19.patch, HDFS-9668-2.patch, 
> HDFS-9668-20.patch, HDFS-9668-21.patch, HDFS-9668-22.patch, 
> HDFS-9668-3.patch, HDFS-9668-4.patch, HDFS-9668-5.patch, HDFS-9668-6.patch, 
> HDFS-9668-7.patch, HDFS-9668-8.patch, HDFS-9668-9.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-11-01 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-21.patch

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-13.patch, 
> HDFS-9668-14.patch, HDFS-9668-14.patch, HDFS-9668-15.patch, 
> HDFS-9668-16.patch, HDFS-9668-17.patch, HDFS-9668-18.patch, 
> HDFS-9668-19.patch, HDFS-9668-19.patch, HDFS-9668-2.patch, 
> HDFS-9668-20.patch, HDFS-9668-21.patch, HDFS-9668-3.patch, HDFS-9668-4.patch, 
> HDFS-9668-5.patch, HDFS-9668-6.patch, HDFS-9668-7.patch, HDFS-9668-8.patch, 
> HDFS-9668-9.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-10-30 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-20.patch

Rebase the code and upload a new patch V20.
The code had been changed since HADOOP-13702 was committed and the rebase, 
[~eddyxu] would you mind taking a look at the latest patch again? Thanks a lot!

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-13.patch, 
> HDFS-9668-14.patch, HDFS-9668-14.patch, HDFS-9668-15.patch, 
> HDFS-9668-16.patch, HDFS-9668-17.patch, HDFS-9668-18.patch, 
> HDFS-9668-19.patch, HDFS-9668-19.patch, HDFS-9668-2.patch, 
> HDFS-9668-20.patch, HDFS-9668-3.patch, HDFS-9668-4.patch, HDFS-9668-5.patch, 
> HDFS-9668-6.patch, HDFS-9668-7.patch, HDFS-9668-8.patch, HDFS-9668-9.patch, 
> execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-10-28 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-19.patch

Re-upload the patch V19 to trigger the Hadoop QA.

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-13.patch, 
> HDFS-9668-14.patch, HDFS-9668-14.patch, HDFS-9668-15.patch, 
> HDFS-9668-16.patch, HDFS-9668-17.patch, HDFS-9668-18.patch, 
> HDFS-9668-19.patch, HDFS-9668-19.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, 
> HDFS-9668-4.patch, HDFS-9668-5.patch, HDFS-9668-6.patch, HDFS-9668-7.patch, 
> HDFS-9668-8.patch, HDFS-9668-9.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-10-26 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Status: Patch Available  (was: Open)

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-13.patch, 
> HDFS-9668-14.patch, HDFS-9668-14.patch, HDFS-9668-15.patch, 
> HDFS-9668-16.patch, HDFS-9668-17.patch, HDFS-9668-18.patch, 
> HDFS-9668-19.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, HDFS-9668-4.patch, 
> HDFS-9668-5.patch, HDFS-9668-6.patch, HDFS-9668-7.patch, HDFS-9668-8.patch, 
> HDFS-9668-9.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> We can implement the lock by either storage level or 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-10-26 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Status: Open  (was: Patch Available)

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-13.patch, 
> HDFS-9668-14.patch, HDFS-9668-14.patch, HDFS-9668-15.patch, 
> HDFS-9668-16.patch, HDFS-9668-17.patch, HDFS-9668-18.patch, 
> HDFS-9668-19.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, HDFS-9668-4.patch, 
> HDFS-9668-5.patch, HDFS-9668-6.patch, HDFS-9668-7.patch, HDFS-9668-8.patch, 
> HDFS-9668-9.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> We can implement the lock by either storage level or 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-10-26 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-19.patch

Upload the patch V19 to fix the warnings in check style.

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-13.patch, 
> HDFS-9668-14.patch, HDFS-9668-14.patch, HDFS-9668-15.patch, 
> HDFS-9668-16.patch, HDFS-9668-17.patch, HDFS-9668-18.patch, 
> HDFS-9668-19.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, HDFS-9668-4.patch, 
> HDFS-9668-5.patch, HDFS-9668-6.patch, HDFS-9668-7.patch, HDFS-9668-8.patch, 
> HDFS-9668-9.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-10-26 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-18.patch

Upload a new patch V18 according to [~arpitagarwal] and [~eddyxu] after the 
HADOOP-13702 was committed. Thanks a lot for the comments!

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-13.patch, 
> HDFS-9668-14.patch, HDFS-9668-14.patch, HDFS-9668-15.patch, 
> HDFS-9668-16.patch, HDFS-9668-17.patch, HDFS-9668-18.patch, 
> HDFS-9668-2.patch, HDFS-9668-3.patch, HDFS-9668-4.patch, HDFS-9668-5.patch, 
> HDFS-9668-6.patch, HDFS-9668-7.patch, HDFS-9668-8.patch, HDFS-9668-9.patch, 
> execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-09-29 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-17.patch

Upload a new patch V17 to fix the concurrent issue in InstrumentedReadLock. 
Thanks [~arpitagarwal].

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-13.patch, 
> HDFS-9668-14.patch, HDFS-9668-14.patch, HDFS-9668-15.patch, 
> HDFS-9668-16.patch, HDFS-9668-17.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, 
> HDFS-9668-4.patch, HDFS-9668-5.patch, HDFS-9668-6.patch, HDFS-9668-7.patch, 
> HDFS-9668-8.patch, HDFS-9668-9.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-09-29 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-16.patch

Upload a new patch V16 to refine the code in unit tests.

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-13.patch, 
> HDFS-9668-14.patch, HDFS-9668-14.patch, HDFS-9668-15.patch, 
> HDFS-9668-16.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, HDFS-9668-4.patch, 
> HDFS-9668-5.patch, HDFS-9668-6.patch, HDFS-9668-7.patch, HDFS-9668-8.patch, 
> HDFS-9668-9.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> We can implement the lock by either storage level or 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-09-28 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-15.patch

Upload a new patch V15 to fix the newly added checkstyle warning.

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-13.patch, 
> HDFS-9668-14.patch, HDFS-9668-14.patch, HDFS-9668-15.patch, 
> HDFS-9668-2.patch, HDFS-9668-3.patch, HDFS-9668-4.patch, HDFS-9668-5.patch, 
> HDFS-9668-6.patch, HDFS-9668-7.patch, HDFS-9668-8.patch, HDFS-9668-9.patch, 
> execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> We can implement the lock by either storage level or block-level.



--

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-09-28 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-14.patch

The failure of unit tests should not be related with this patch. Reattach the 
patch V14 to run the Haoop QA again.

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-13.patch, 
> HDFS-9668-14.patch, HDFS-9668-14.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, 
> HDFS-9668-4.patch, HDFS-9668-5.patch, HDFS-9668-6.patch, HDFS-9668-7.patch, 
> HDFS-9668-8.patch, HDFS-9668-9.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> We can implement the lock by either storage 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-09-28 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: (was: HDFS-9668-14.patch)

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-13.patch, 
> HDFS-9668-14.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, HDFS-9668-4.patch, 
> HDFS-9668-5.patch, HDFS-9668-6.patch, HDFS-9668-7.patch, HDFS-9668-8.patch, 
> HDFS-9668-9.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> We can implement the lock by either storage level or block-level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-09-28 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-14.patch

Re-attach the patch V14 to trigger the Hadoop QA.

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-13.patch, 
> HDFS-9668-14.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, HDFS-9668-4.patch, 
> HDFS-9668-5.patch, HDFS-9668-6.patch, HDFS-9668-7.patch, HDFS-9668-8.patch, 
> HDFS-9668-9.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> We can implement the lock by either storage level or block-level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-09-28 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-14.patch

Refine the comments mentioned in HDFS-10828 by [~drankye] in a new patch V14.

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-13.patch, 
> HDFS-9668-14.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, HDFS-9668-4.patch, 
> HDFS-9668-5.patch, HDFS-9668-6.patch, HDFS-9668-7.patch, HDFS-9668-8.patch, 
> HDFS-9668-9.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> We can implement the lock by either storage level or block-level.



--
This message was sent by 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-09-28 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-13.patch

Rebase the code after HDFS-10828, and upload a new patch V13.

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-13.patch, 
> HDFS-9668-2.patch, HDFS-9668-3.patch, HDFS-9668-4.patch, HDFS-9668-5.patch, 
> HDFS-9668-6.patch, HDFS-9668-7.patch, HDFS-9668-8.patch, HDFS-9668-9.patch, 
> execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> We can implement the lock by either storage level or block-level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-09-20 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-12.patch

Upload a new patch V12 to fix the checkstyle warnings.
Again, the warning ( FsDatasetImpl.java:2234:  @Override:3: Method length is 
170 lines (max allowed is 150).) won't be fixed.

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, 
> HDFS-9668-4.patch, HDFS-9668-5.patch, HDFS-9668-6.patch, HDFS-9668-7.patch, 
> HDFS-9668-8.patch, HDFS-9668-9.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> We can implement the lock by either storage 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-09-19 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-11.patch

Re-attach the patch V11 with the correct name.

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, HDFS-9668-4.patch, 
> HDFS-9668-5.patch, HDFS-9668-6.patch, HDFS-9668-7.patch, HDFS-9668-8.patch, 
> HDFS-9668-9.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> We can implement the lock by either storage level or block-level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-09-19 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: hdfs-9668-11.patch

Upload a new patch V11 to fix most of the new checkstyle issues .
Part of the issues are new, but part of them are introduced in previous commits 
that are still regarded as new.
This one (FsDatasetImpl.java:2219:  @Override:3: Method length is 168 lines 
(max allowed is 150).) won't be fixed. This method was longer than 150 lines 
before I modified it(I added two lines to this method in the patch), and most 
of the lines were taken by formats. And I don't think it is necessary to 
address this in this patch. Thanks.


> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-2.patch, HDFS-9668-3.patch, HDFS-9668-4.patch, HDFS-9668-5.patch, 
> HDFS-9668-6.patch, HDFS-9668-7.patch, HDFS-9668-8.patch, HDFS-9668-9.patch, 
> execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-09-19 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: (was: hdfs-9668-11.patch)

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-2.patch, HDFS-9668-3.patch, HDFS-9668-4.patch, HDFS-9668-5.patch, 
> HDFS-9668-6.patch, HDFS-9668-7.patch, HDFS-9668-8.patch, HDFS-9668-9.patch, 
> execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> We can implement the lock by either storage level or block-level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-09-18 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-10.patch

Upload a new patch V10 according to [~eddyxu]'s comments.

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-2.patch, HDFS-9668-3.patch, HDFS-9668-4.patch, HDFS-9668-5.patch, 
> HDFS-9668-6.patch, HDFS-9668-7.patch, HDFS-9668-8.patch, HDFS-9668-9.patch, 
> execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> We can implement the lock by either storage level or block-level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-09-14 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-9.patch

Upload a new patch V9 to fix the whitespace warning.

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, 
> HDFS-9668-4.patch, HDFS-9668-5.patch, HDFS-9668-6.patch, HDFS-9668-7.patch, 
> HDFS-9668-8.patch, HDFS-9668-9.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> We can implement the lock by either storage level or block-level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-09-14 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-8.patch

Rebase the code for the updates in HDFS-10636 and upload a new patch V8, and 
fix the issues in the unit tests.

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, 
> HDFS-9668-4.patch, HDFS-9668-5.patch, HDFS-9668-6.patch, HDFS-9668-7.patch, 
> HDFS-9668-8.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> We can implement the lock by either storage level or block-level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-09-13 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-9668:

Fix Version/s: (was: 3.0.0-alpha2)

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, 
> HDFS-9668-4.patch, HDFS-9668-5.patch, HDFS-9668-6.patch, HDFS-9668-7.patch, 
> execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> We can implement the lock by either storage level or block-level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-09-13 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-7.patch

Rebase and upload the new patch V7 to fix the merging issues.
Hi guys, I beg you for the reviews. Thanks a lot.

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Fix For: 3.0.0-alpha2
>
> Attachments: HDFS-9668-1.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, 
> HDFS-9668-4.patch, HDFS-9668-5.patch, HDFS-9668-6.patch, HDFS-9668-7.patch, 
> execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> We can implement the lock by either storage level or block-level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-08-31 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Status: Patch Available  (was: Open)

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Fix For: 3.0.0-alpha2
>
> Attachments: HDFS-9668-1.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, 
> HDFS-9668-4.patch, HDFS-9668-5.patch, HDFS-9668-6.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> We can implement the lock by either storage level or block-level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-08-31 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Status: Open  (was: Patch Available)

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Fix For: 3.0.0-alpha2
>
> Attachments: HDFS-9668-1.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, 
> HDFS-9668-4.patch, HDFS-9668-5.patch, HDFS-9668-6.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> We can implement the lock by either storage level or block-level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-08-31 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-6.patch

Rebase and upload the new patch V6 to fix the merging issues.
[~drankye], thanks for submitting the patch.

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Fix For: 3.0.0-alpha2
>
> Attachments: HDFS-9668-1.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, 
> HDFS-9668-4.patch, HDFS-9668-5.patch, HDFS-9668-6.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> We can implement the lock by either storage level or block-level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-08-30 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated HDFS-9668:

Fix Version/s: 3.0.0-alpha2
   Status: Patch Available  (was: Open)

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Fix For: 3.0.0-alpha2
>
> Attachments: HDFS-9668-1.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, 
> HDFS-9668-4.patch, HDFS-9668-5.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> We can implement the lock by either storage level or block-level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-08-28 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-5.patch

Upload a new patch V5 according to [~eddyxu]'s comments.
Any comments are welcome, thanks a lot.

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, 
> HDFS-9668-4.patch, HDFS-9668-5.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> We can implement the lock by either storage level or block-level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-08-24 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-4.patch

Upload a new patch V4.
In this patch, AutoCloseableReadLock and AutoCloseableWriteLock are added.
The AutoCloseableLock is replaced by the AutoCloseableReadLock and 
AutoCloseableWriteLock, and the synchronization locks by block ids are added 
when AutoCloseableReadLock is used.

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, 
> HDFS-9668-4.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> We can implement the lock by either storage level or block-level.



--
This message was sent by Atlassian 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-08-18 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-3.patch

Upload a new patch V3. Sorry for the late updates.
This patch is much simpler. According to [~cmccabe] 's suggestion, the disk 
operations are moved out of the lock scope. Thanks.

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, 
> execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> We can implement the lock by either storage level or block-level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: 

[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-03-14 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-9668:
---
Summary: Optimize the locking in FsDatasetImpl  (was: Many long-time 
BLOCKED threads on FsDatasetImpl in a tiered storage test)

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-2.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution of some operations in FsDatasetImpl during the 
> test. Here following is the result.
> !execution_time.png!
> The operations of finalizeBlock, addBlock and createRbw on HDD in a heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage can block all the other same operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl implementation 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> We can implement the lock by either storage level or block-level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)