[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642695#comment-16642695 ]

Yiqun Lin commented on HDFS-13768:
----------------------------------

Thanks for sharing the results. +1 for the v03 patch for branch-2. Committing...

> Adding replicas to volume map makes DataNode start slowly
> ---------------------------------------------------------
>
> Key: HDFS-13768
> URL: https://issues.apache.org/jira/browse/HDFS-13768
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: Yiqun Lin
> Assignee: Surendra Singh Lilhore
> Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: HDFS-13768-branch-2.01.patch, HDFS-13768-branch-2.02.patch,
> HDFS-13768-branch-2.03.patch, HDFS-13768.01-branch-2.patch, HDFS-13768.01.patch,
> HDFS-13768.02.patch, HDFS-13768.03.patch, HDFS-13768.04.patch, HDFS-13768.05.patch,
> HDFS-13768.06.patch, HDFS-13768.07.patch, HDFS-13768.patch, screenshot-1.png
>
> We found the DataNode (DN) starting very slowly when rolling-upgrading our cluster.
> When we restart DNs, they start slowly and do not register to the NN immediately,
> which causes a lot of the following errors:
> {noformat}
> DataXceiver error processing WRITE_BLOCK operation src: /xx.xx.xx.xx:64360 dst: /xx.xx.xx.xx:50010
> java.io.IOException: Not ready to serve the block pool, BP-1508644862-xx.xx.xx.xx-1493781183457.
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAndWaitForBP(DataXceiver.java:1290)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1298)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:630)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Looking into the DN startup logic, the DN performs the initial block pool operation
> before registration, and during block pool initialization we found that adding
> replicas to the volume map is the most expensive operation. Related log:
> {noformat}
> 2018-07-26 10:46:23,771 INFO [Thread-105] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on volume /home/hard_disk/1/dfs/dn/current: 242722ms
> 2018-07-26 10:46:26,231 INFO [Thread-109] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on volume /home/hard_disk/5/dfs/dn/current: 245182ms
> 2018-07-26 10:46:32,146 INFO [Thread-112] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on volume /home/hard_disk/8/dfs/dn/current: 251097ms
> 2018-07-26 10:47:08,283 INFO [Thread-106] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on volume /home/hard_disk/2/dfs/dn/current: 287235ms
> {noformat}
> Currently the DN uses an independent thread to scan and add replicas for each volume,
> but we still need to wait for the slowest thread to finish its work. So the main
> problem here is how to make these threads run faster.
> The jstack we got when the DN was blocked in adding replicas:
> {noformat}
> "Thread-113" #419 daemon prio=5 os_prio=0 tid=0x7f40879ff000 nid=0x145da runnable [0x7f4043a38000]
>    java.lang.Thread.State: RUNNABLE
>         at java.io.UnixFileSystem.list(Native Method)
>         at java.io.File.list(File.java:1122)
>         at java.io.File.listFiles(File.java:1207)
>         at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1165)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:445)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.getVolumeMap(BlockPoolSlice.java:342)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getVolumeMap(FsVolumeImpl.java:864)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$1.run(FsVolumeList.java:191)
> {noformat}
> One improvement may be to use a ForkJoinPool to do this recursive task, rather than
> a sync way.
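The ForkJoinPool idea proposed above can be sketched with a minimal, self-contained example. This is not the actual HDFS-13768 patch; the class name and the file-counting payload are hypothetical stand-ins for `BlockPoolSlice.addToReplicasMap`. The point it illustrates is that each subdirectory becomes a forked subtask, so the recursive walk proceeds in parallel instead of one `listFiles()` at a time:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical sketch: replace a serial recursive directory walk with a
// RecursiveTask that forks one subtask per subdirectory. In the real code
// the "payload" would be adding each replica file to the volume map; here
// we just count files so the example is self-contained.
class ParallelDirScan extends RecursiveTask<Long> {
  private final File dir;

  ParallelDirScan(File dir) {
    this.dir = dir;
  }

  @Override
  protected Long compute() {
    long files = 0;
    File[] entries = dir.listFiles();   // the expensive native list() call
    if (entries == null) {
      return 0L;                        // not a directory, or an I/O error
    }
    List<ParallelDirScan> subtasks = new ArrayList<>();
    for (File entry : entries) {
      if (entry.isDirectory()) {
        ParallelDirScan task = new ParallelDirScan(entry);
        task.fork();                    // scan subdirectory concurrently
        subtasks.add(task);
      } else {
        files++;                        // real code: add replica to the map
      }
    }
    for (ParallelDirScan task : subtasks) {
      files += task.join();             // wait for the forked subtasks
    }
    return files;
  }

  public static void main(String[] args) {
    File root = new File(args.length > 0 ? args[0] : ".");
    long count = new ForkJoinPool().invoke(new ParallelDirScan(root));
    System.out.println("files: " + count);
  }
}
```

With work stealing, idle worker threads pick up subdirectory tasks from busy ones, which is why deep or skewed directory trees (like a block pool's subdir layout) finish faster than a single thread walking them in order.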
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641595#comment-16641595 ]

Surendra Singh Lilhore commented on HDFS-13768:
-----------------------------------------------

Thanks [~linyiqun] for the review. I uploaded the V3 patch for branch-2; I missed one change. Both tests pass with the V3 patch:
{noformat}
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList
[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13.833 s - in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList
[INFO] Running org.apache.hadoop.hdfs.server.namenode.TestFileTruncate
[INFO] Tests run: 19, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 92.182 s - in org.apache.hadoop.hdfs.server.namenode.TestFileTruncate
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0
{noformat}
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641276#comment-16641276 ]

Yiqun Lin commented on HDFS-13768:
----------------------------------

The change LGTM. [~surendrasingh], could you please verify the following related UTs:
* TestFsVolumeList#testAddRplicaProcessorForAddingReplicaInMap
* TestFileTruncate (since we have some changes in the util class FsDatasetTestUtils#getStoredGenerationStamp that is used by this test)
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641191#comment-16641191 ]

Surendra Singh Lilhore commented on HDFS-13768:
-----------------------------------------------

Attached the V2 patch for branch-2.
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16639516#comment-16639516 ]

Yiqun Lin commented on HDFS-13768:
----------------------------------

I have filed HDFS-13962 to track the remaining work. It is just some minor work, so I assigned it to you, [~surendrasingh]. I will help with the review there as well, :).
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16639504#comment-16639504 ]

Surendra Singh Lilhore commented on HDFS-13768:
-----------------------------------------------

Thanks [~linyiqun]. I will fix the review comments in the next patch.
{quote}
I plan to file another JIRA to fix some places that still need improvement for trunk.
{quote}
Sure, please file the new JIRA...
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16639451#comment-16639451 ]

Yiqun Lin commented on HDFS-13768:
----------------------------------

Thanks [~surendrasingh] for attaching the patch for branch-2. While reviewing it, I found some differences between the two patches, and also caught some minor changes I had missed in the trunk patch. For the branch-2 patch, some minor comments:

*ReplicaMap#addAndGet*
For logic consistency, I prefer the following way to get the old replica info instead of {{m.get(replicaInfo);}}
{code:java}
ReplicaInfo oldReplicaInfo = m.get(new Block(replicaInfo.getBlockId()));
{code}

*BlockPoolSlice*
Can we add an additional null check of {{addReplicaThreadPool}} before invoking initializeAddReplicaPool? That would avoid acquiring the synchronized lock when the pool is already initialized.

*FsDatasetImplTestUtils#getStoredGenerationStamp*
We should also use {{FILE_COMPARATOR}} to sort the listed files.

*TestFsVolumeList.java*
The following lines look misaligned; can you format them?
{code}
+RamDiskReplicaTracker ramDiskReplicaMap = RamDiskReplicaTracker
+.getInstance(conf, fsDataset);
+FsVolumeImpl vol = (FsVolumeImpl) fsDataset.getFsVolumeReferences().get(0);
+ String bpid = cluster.getNamesystem().getBlockPoolId();
+// It will create BlockPoolSlice.AddReplicaProcessor task's and lunch in
+// ForkJoinPool recursively
{code}

I noticed the wrong patch name for branch-2; the right form is HDFS-13768-branch-2.xx.patch. Attaching the same patch again. I plan to file another JIRA to fix some places that still need improvement for trunk.
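The null-check suggestion in the review above is the classic double-checked initialization pattern. A minimal sketch follows; the class and field names are hypothetical stand-ins, not the actual BlockPoolSlice code, and the key detail is that the field must be volatile for the unsynchronized fast-path read to be safe:

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical sketch of the review suggestion: check the field before
// entering the synchronized initializer, so callers that find the pool
// already created never contend on the lock. The second check inside the
// lock handles the race where two threads pass the first check together.
class AddReplicaPoolHolder {
  private static volatile ForkJoinPool addReplicaThreadPool;

  static ForkJoinPool getPool(int parallelism) {
    if (addReplicaThreadPool == null) {   // fast path: no lock once initialized
      initializeAddReplicaPool(parallelism);
    }
    return addReplicaThreadPool;
  }

  private static synchronized void initializeAddReplicaPool(int parallelism) {
    if (addReplicaThreadPool == null) {   // re-check under the lock
      addReplicaThreadPool = new ForkJoinPool(parallelism);
    }
  }
}
```

Once the pool exists, every subsequent `getPool` call returns on the volatile read alone, which is exactly the lock-acquisition cost the review comment wants to avoid on the hot path.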
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16638050#comment-16638050 ] Surendra Singh Lilhore commented on HDFS-13768: --- Attached branch-2 patch.
> Adding replicas to volume map makes DataNode start slowly
> ---
>
> Key: HDFS-13768
> URL: https://issues.apache.org/jira/browse/HDFS-13768
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: Yiqun Lin
> Assignee: Surendra Singh Lilhore
> Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: HDFS-13768.01-branch-2.patch, HDFS-13768.01.patch, HDFS-13768.02.patch, HDFS-13768.03.patch, HDFS-13768.04.patch, HDFS-13768.05.patch, HDFS-13768.06.patch, HDFS-13768.07.patch, HDFS-13768.patch, screenshot-1.png
>
> We found DNs starting very slowly when rolling-upgrading our cluster. After a restart the DNs do not register to the NN immediately, and this causes a lot of errors like the following:
> {noformat}
> DataXceiver error processing WRITE_BLOCK operation src: /xx.xx.xx.xx:64360 dst: /xx.xx.xx.xx:50010
> java.io.IOException: Not ready to serve the block pool, BP-1508644862-xx.xx.xx.xx-1493781183457.
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAndWaitForBP(DataXceiver.java:1290)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1298)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:630)
> at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
> at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Looking into the DN startup logic, it performs the initial block pool operations before registration, and during block pool initialization we found that adding replicas to the volume map is the most expensive operation. Related log:
> {noformat}
> 2018-07-26 10:46:23,771 INFO [Thread-105] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on volume /home/hard_disk/1/dfs/dn/current: 242722ms
> 2018-07-26 10:46:26,231 INFO [Thread-109] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on volume /home/hard_disk/5/dfs/dn/current: 245182ms
> 2018-07-26 10:46:32,146 INFO [Thread-112] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on volume /home/hard_disk/8/dfs/dn/current: 251097ms
> 2018-07-26 10:47:08,283 INFO [Thread-106] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on volume /home/hard_disk/2/dfs/dn/current: 287235ms
> {noformat}
> Currently the DN uses an independent thread per volume to scan and add replicas, but startup still has to wait for the slowest thread to finish its work. So the main opportunity here is to make each thread run faster.
> The jstack we captured while the DN was blocked adding replicas:
> {noformat}
> "Thread-113" #419 daemon prio=5 os_prio=0 tid=0x7f40879ff000 nid=0x145da runnable [0x7f4043a38000]
>    java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.list(Native Method)
> at java.io.File.list(File.java:1122)
> at java.io.File.listFiles(File.java:1207)
> at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1165)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:445)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.getVolumeMap(BlockPoolSlice.java:342)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getVolumeMap(FsVolumeImpl.java:864)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$1.run(FsVolumeList.java:191)
> {noformat}
> One improvement: we could use a ForkJoinPool to do this recursive task, rather than scanning synchronously. This would be a great improvement because it can greatly speed up the recovery process.
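The ForkJoinPool idea quoted above can be sketched roughly as follows. This is only an illustration of the technique, not the committed HDFS patch: the class and method names here are invented, and it simply counts files, whereas the real code builds `ReplicaInfo` entries. The point it demonstrates is that, instead of the sequential depth-first `File.listFiles()` walk visible in the jstack, each subdirectory can be forked as a subtask so sibling directories on a volume are scanned in parallel.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Illustrative sketch only -- not the actual HDFS-13768 patch.
public class ParallelScan {

  static class CountTask extends RecursiveTask<Long> {
    private final File dir;

    CountTask(File dir) {
      this.dir = dir;
    }

    @Override
    protected Long compute() {
      File[] children = dir.listFiles();
      if (children == null) {
        return 0L; // not a directory, or unreadable
      }
      long files = 0;
      List<CountTask> forked = new ArrayList<>();
      for (File child : children) {
        if (child.isDirectory()) {
          CountTask task = new CountTask(child);
          task.fork();        // scan the subdirectory on another worker
          forked.add(task);
        } else {
          files++;            // e.g. a block or .meta file at this level
        }
      }
      for (CountTask task : forked) {
        files += task.join(); // gather results from forked subdirectories
      }
      return files;
    }
  }

  /** Counts regular files under root using the common ForkJoinPool. */
  public static long countFiles(File root) {
    return ForkJoinPool.commonPool().invoke(new CountTask(root));
  }

  public static void main(String[] args) throws Exception {
    Path root = Files.createTempDirectory("bp");
    Files.createFile(root.resolve("blk_1"));
    Path sub = Files.createDirectory(root.resolve("subdir0"));
    Files.createFile(sub.resolve("blk_2"));
    Files.createFile(sub.resolve("blk_2.meta"));
    System.out.println(countFiles(root.toFile())); // prints 3
  }
}
```

The work-stealing pool keeps all workers busy even when one subtree (one `subdirN`) is much larger than its siblings, which is exactly the case that makes a single-threaded walk of a dense volume slow.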
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635132#comment-16635132 ] Surendra Singh Lilhore commented on HDFS-13768: --- Thanks [~linyiqun]. {quote}would you mind attaching the patch for branch-2?{quote} Sure, I will attach soon...
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634848#comment-16634848 ] Hudson commented on HDFS-13768: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15090 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/15090/]) HDFS-13768. Adding replicas to volume map makes DataNode start slowly. (yqlin: rev 5689355783de005ebc604f4403dc5129a286bfca)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/ReplicaMap.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsVolumeList.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetUtil.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java
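The committed file list includes DFSConfigKeys.java and hdfs-default.xml, which indicates the parallel replica loading is exposed as a configuration knob. The fragment below is a hypothetical illustration only: the property name and default value shown are assumptions and should be verified against the hdfs-default.xml in the committed revision before use.

```xml
<!-- Hypothetical example; verify the real key and default in the
     committed hdfs-default.xml before relying on it. -->
<property>
  <name>dfs.datanode.volumes.replica-add.threadpool.size</name>
  <value>8</value>
  <description>Number of threads used to add replicas to the volume map
  during DataNode startup.</description>
</property>
```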
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634838#comment-16634838 ] Yiqun Lin commented on HDFS-13768: --- I have committed this to trunk and branch-3.1, but there are some conflicts when backporting to branch-2. [~surendrasingh], would you mind attaching the patch for branch-2? I think this will be nice to have in 2.x versions.
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634835#comment-16634835 ] Yiqun Lin commented on HDFS-13768: --- The failed UTs are not related. +1. Committing this.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631396#comment-16631396 ] Hadoop QA commented on HDFS-13768: --
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 28s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 11s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 83m 48s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}141m 26s{color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
| | hadoop.hdfs.client.impl.TestBlockReaderLocal |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | HDFS-13768 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941631/HDFS-13768.07.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux 08c78a2857ae 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 5c8d907 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/25159/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/25159/testReport/ |
| Max. process+thread count | 4043 (vs. ulimit of 1) |
| modules | C:
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631305#comment-16631305 ] Yiqun Lin commented on HDFS-13768: --- Thanks [~surendrasingh] for sharing the data. The improvement looks great! Jenkins is fine now; I'd like to attach the same patch as v06 to re-trigger it. +1 from me. I will hold off the commit for a couple of days in case [~arpitagarwal] or other folks have comments on this.
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630576#comment-16630576 ] Surendra Singh Lilhore commented on HDFS-13768: --- Thanks [~linyiqun]. Attached v6 patch and fixed the whitespace warnings. {quote}BTW, [~surendrasingh], would you mind making a new test based on the latest patch? I am curious about the current rate compared with the data you gave before.{quote} After the latest patch it took 3465ms, almost 80% faster compared to the initial time (16772ms).
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629686#comment-16629686 ] Yiqun Lin commented on HDFS-13768: -- Thanks [~surendrasingh] for addressing the comments. The latest patch almost looks good to me; only the following generated whitespace warnings need to be fixed. +1 for the rest. {noformat} ./hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml:1288: block in volume. Default value for this configuration is max of ./hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml:1289: (volume * number of bp_service, number of processor) {noformat} BTW, [~surendrasingh], would you mind making a new test based on the latest patch? I am curious about the current rate compared with the data you gave before. {quote}Without this fix the DN took 16772ms to load 101260 blocks. With this patch and 8 processors (the default) it took 9823ms.{quote}
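The hdfs-default.xml lines flagged above describe the new thread-pool sizing knob introduced by the patch. A sketch of what such an entry looks like is below; the property name shown is my reading of the patch and should be verified against the committed hdfs-default.xml, and the description text follows the wording quoted in the warning:

```xml
<property>
  <name>dfs.datanode.volumes.replica-add.threadpool.size</name>
  <value></value>
  <description>
    Maximum number of threads to use for adding blocks in a volume.
    Default value for this configuration is
    max(volumes * number of bp_service, number of processors).
  </description>
</property>
```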
> at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAndWaitForBP(DataXceiver.java:1290) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1298) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:630) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246) > at java.lang.Thread.run(Thread.java:745) > {noformat} > Looking into the logic of DN startup, it will do the initial block pool > operation before the registration. And during initializing block pool > operation, we found the adding replicas to volume map is the most expensive > operation. Related log: > {noformat} > 2018-07-26 10:46:23,771 INFO [Thread-105] > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to > add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on > volume /home/hard_disk/1/dfs/dn/current: 242722ms > 2018-07-26 10:46:26,231 INFO [Thread-109] > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to > add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on > volume /home/hard_disk/5/dfs/dn/current: 245182ms > 2018-07-26 10:46:32,146 INFO [Thread-112] > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to > add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on > volume /home/hard_disk/8/dfs/dn/current: 251097ms > 2018-07-26 10:47:08,283 INFO [Thread-106] > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to > add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on > volume /home/hard_disk/2/dfs/dn/current: 287235ms > {noformat} > Currently DN uses independent thread to scan and add replica for each volume, > but we still 
need to wait the slowest thread to finish its work. So the main > problem here is that we could make the thread to run faster. > The jstack we get when DN blocking in the adding replica: > {noformat} > "Thread-113" #419 daemon prio=5 os_prio=0 tid=0x7f40879ff000 nid=0x145da > runnable [0x7f4043a38000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.list(Native Method) > at java.io.File.list(File.java:1122) > at java.io.File.listFiles(File.java:1207) > at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1165) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:445) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448) > at >
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628977#comment-16628977 ] Hadoop QA commented on HDFS-13768: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 15m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 33s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch 2 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 45s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}126m 23s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}193m 33s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestBootstrapStandby | | | hadoop.hdfs.server.namenode.sps.TestBlockStorageMovementAttemptedItems | | | hadoop.hdfs.TestLeaseRecovery2 | | | hadoop.hdfs.client.impl.TestBlockReaderLocal | | | hadoop.hdfs.web.TestWebHdfsTimeouts | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | HDFS-13768 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941381/HDFS-13768.05.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux eead017d7a2c 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e5287a4 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | whitespace |
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628681#comment-16628681 ] Surendra Singh Lilhore commented on HDFS-13768: --- Attached updated patch v5, fixing all the above comments. Please review.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628302#comment-16628302 ] Yiqun Lin commented on HDFS-13768: -- {quote}We need the old replica to resolve duplicate replicas, so we can't change this method.{quote} Agreed, I missed this, [~surendrasingh].
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627630#comment-16627630 ] Arpit Agarwal commented on HDFS-13768: -- bq. FsDatasetUtil.getGenerationStampFromFile() loops over all the files to find the meta file, which is time consuming. If we sort the list of files, the meta file will be next to its block file in the sorted list, so we no longer need to loop to find it. That's a nice find, [~surendrasingh].
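The sorted-listing trick quoted above can be illustrated with a short sketch. This is not the actual FsDatasetUtil code; the helper name and the fallback value are illustrative. The idea: after sorting, a block file `blk_<id>` is immediately followed by its meta file `blk_<id>_<genstamp>.meta`, so the generation stamp can be read from the adjacent entry instead of re-scanning the whole listing for every block.

```java
import java.io.File;
import java.util.Arrays;

// Illustrative sketch: pair each block file with the meta file that follows
// it in a name-sorted listing, turning an O(n) scan per block into an O(1)
// neighbor lookup.
public class SortedMetaLookup {

  static long genStampFromSortedList(File[] sorted, int blockIdx) {
    String blockName = sorted[blockIdx].getName();      // e.g. blk_1001
    if (blockIdx + 1 < sorted.length) {
      String next = sorted[blockIdx + 1].getName();     // e.g. blk_1001_1005.meta
      if (next.startsWith(blockName + "_") && next.endsWith(".meta")) {
        String stamp = next.substring(blockName.length() + 1,
            next.length() - ".meta".length());
        return Long.parseLong(stamp);
      }
    }
    return 0;  // stand-in for a "grandfather" generation stamp fallback
  }

  public static void main(String[] args) {
    File[] files = {
        new File("blk_1001"), new File("blk_1001_1005.meta"),
        new File("blk_1002"), new File("blk_1002_1007.meta"),
    };
    Arrays.sort(files);  // File sorts lexicographically by path name
    System.out.println(genStampFromSortedList(files, 0));  // prints 1005
    System.out.println(genStampFromSortedList(files, 2));  // prints 1007
  }
}
```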
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627486#comment-16627486 ] Surendra Singh Lilhore commented on HDFS-13768: --- Thanks [~linyiqun] and [~knanasi] for the review. {quote}*ReplicaMap#addAndGet* For the new method {{addAndGet}}, I would prefer to return a boolean value indicating whether the new replicaInfo was added. For example, if an old replicaInfo exists, we return false. Then we wouldn't need to check the reference of the returned replicaInfo in {{BlockPoolSlice#addReplicaToReplicasMap}}, which looks a little confusing.{quote} We need the old replica to resolve duplicate replicas, so we can't change this method.
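The point being made — that `addAndGet` must return the *old* replica, not a boolean, so the caller can resolve duplicates — can be sketched as below. This is not the real ReplicaMap code: the class is hypothetical and a generation-stamp long stands in for a full ReplicaInfo.

```java
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of addAndGet-style semantics: return the previously
// mapped value (if any) so the caller can decide which of two duplicate
// replicas to keep, e.g. the one with the newer generation stamp.
public class ReplicaMapSketch {
  // blockId -> generation stamp, standing in for blockId -> ReplicaInfo
  private final ConcurrentHashMap<Long, Long> map = new ConcurrentHashMap<>();

  /** Inserts if absent; returns the old value if one existed, else null. */
  Long addAndGet(long blockId, long genStamp) {
    return map.putIfAbsent(blockId, genStamp);
  }

  public static void main(String[] args) {
    ReplicaMapSketch m = new ReplicaMapSketch();
    System.out.println(m.addAndGet(1001L, 1005L));  // prints null (new entry)
    Long old = m.addAndGet(1001L, 1007L);           // duplicate block id
    System.out.println(old);                        // prints 1005 (old replica)
    // With only a boolean return, the caller would know a duplicate exists
    // but would have no handle on the old replica to compare against.
  }
}
```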
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626406#comment-16626406 ] Kitti Nanasi commented on HDFS-13768: - Thanks [~surendrasingh] for the patch and [~linyiqun] for reviewing it! I have some minor comments:
- The TestDataNodeVolumeFailureReporting test failure does seem related to the patch.
- It would be better to rename the forkJoinTasks from f1, f2 to something meaningful, matching the comments before them; the comments can then be deleted.
- TestFsVolumeList#testAddRplicaProcessorForAddingReplicaInMap could have a timeout.
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625737#comment-16625737 ] Yiqun Lin commented on HDFS-13768: -- Thanks [~surendrasingh] for updating the patch! Some minor comments from me:
*ReplicaMap#addAndGet*
For the new method {{addAndGet}}, I would prefer to return a boolean value indicating whether the new replicaInfo was added. For example, if an old replicaInfo already exists, we return false. Then we don't need to check the reference of the returned replicaInfo in {{BlockPoolSlice#addReplicaToReplicasMap}}, which looks a little confusing.
*FsDatasetUtil#getGenerationStampFromFile*
* The following loop can be deleted, if I understand correctly. Since the listed files are ordered, the meta file can only be the next file, if it exists at all.
{code:java}
for (int j = 0; j < listdir.length; j++) {
  String path = listdir[j].getName();
  if (!path.startsWith(blockName)) {
    continue;
  }
  if (blockFile.getCanonicalPath().equals(listdir[j].getCanonicalPath())) {
    continue;
  }
  return Block.getGenerationStamp(listdir[j].getName());
}
{code}
* Please complete the javadoc for this method.
Other minor comments:
* FsVolumeList.java: a whitespace is needed after 'pool'.
* hdfs-default.xml: the description of this config should be updated, since we actually use the max of (volumes * number of bp_service, number of processors), not always the configured value.
* TestFsVolumeList.java:
1. We can use {{Configuration#setInt}} to set {{DFS_REPLICATION_KEY}}.
2. I prefer to explicitly set the thread pool size (e.g. 5 or 10) to ensure the add-replica work really runs concurrently in the ForkJoinPool.
* Please fix the findbugs error and the whitespace warning at {{DFSConfigKeys.java:368}}.
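The boolean-returning {{addAndGet}} variant suggested above can be sketched like this. This is a hypothetical, simplified stand-in: a plain ConcurrentHashMap replaces the FoldedTreeSet-backed ReplicaMap, a String replaces ReplicaInfo, and the class and method names are illustrative only.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified stand-in for ReplicaMap: bpid -> (blockId -> replica info). */
class ReplicaMapSketch {
  private final Map<String, Map<Long, String>> map = new ConcurrentHashMap<>();

  /**
   * Add replica info if absent. Returns true when a new entry was added and
   * false when one already existed, so a caller like
   * BlockPoolSlice#addReplicaToReplicasMap can branch on the boolean instead
   * of comparing returned references.
   */
  boolean add(String bpid, long blockId, String replicaInfo) {
    Map<Long, String> set =
        map.computeIfAbsent(bpid, k -> new ConcurrentHashMap<>());
    // putIfAbsent returns null only when no mapping existed before.
    return set.putIfAbsent(blockId, replicaInfo) == null;
  }
}
```

A caller would then branch directly on the result, e.g. `if (!replicaMap.add(bpid, id, info)) { /* duplicate: resolve against the existing replica */ }`.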
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624974#comment-16624974 ] Hadoop QA commented on HDFS-13768: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 7s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 6s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 3s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 93m 7s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}149m 20s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs-project/hadoop-hdfs | | | Possible doublecheck on org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addReplicaThreadPool in new org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice(String, FsVolumeImpl, File, Configuration, Timer) At BlockPoolSlice.java:new org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice(String, FsVolumeImpl, File, Configuration, Timer) At BlockPoolSlice.java:[lines 186-188] | | Failed junit tests | hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized | | | hadoop.hdfs.TestLeaseRecovery2 | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | HDFS-13768 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12940946/HDFS-13768.04.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624923#comment-16624923 ] Surendra Singh Lilhore commented on HDFS-13768: --- fixed checkstyle..
> The jstack we get when DN blocking in the adding replica: > {noformat} > "Thread-113" #419 daemon prio=5 os_prio=0 tid=0x7f40879ff000 nid=0x145da > runnable [0x7f4043a38000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.list(Native Method) > at java.io.File.list(File.java:1122) > at java.io.File.listFiles(File.java:1207) > at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1165) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:445) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.getVolumeMap(BlockPoolSlice.java:342) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getVolumeMap(FsVolumeImpl.java:864) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$1.run(FsVolumeList.java:191) > {noformat} > One improvement maybe we can use ForkJoinPool to do this recursive task, > rather than a sync way. This will be a great improvement because it can > greatly speed up recovery process. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail:
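The ForkJoinPool proposal quoted above can be sketched as a RecursiveTask that forks one subtask per subdirectory, so idle workers steal work from deep or slow directory trees instead of one thread walking the whole volume sequentially. This is only an illustration that counts files; the actual patch wires a pool into BlockPoolSlice, and the ScanTask name here is hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.RecursiveTask;

/**
 * Sketch of the ForkJoinPool idea: each subdirectory becomes a forked
 * subtask, so the recursive scan of a block-pool directory tree is
 * parallelized via work stealing rather than done synchronously.
 */
class ScanTask extends RecursiveTask<Long> {
  private final File dir;

  ScanTask(File dir) { this.dir = dir; }

  @Override
  protected Long compute() {
    File[] children = dir.listFiles();
    if (children == null) {
      return 0L;  // not a directory, or an I/O error
    }
    long files = 0;
    List<ScanTask> subtasks = new ArrayList<>();
    for (File c : children) {
      if (c.isDirectory()) {
        ScanTask t = new ScanTask(c);  // fork one subtask per subdirectory
        t.fork();
        subtasks.add(t);
      } else {
        files++;  // a block or meta file: the real code would add it to the replica map
      }
    }
    for (ScanTask t : subtasks) {
      files += t.join();  // wait for forked subtasks and accumulate
    }
    return files;
  }
}
```

Usage would look like `ForkJoinPool.commonPool().invoke(new ScanTask(new File(volumeDir)))`, where `volumeDir` is the block-pool `current` directory.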
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624795#comment-16624795 ] Hadoop QA commented on HDFS-13768: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 1s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 8s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 56s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 4 new + 715 unchanged - 0 fixed = 719 total (was 715) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 20s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 59s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 92m 23s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}149m 10s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs-project/hadoop-hdfs | | | Possible doublecheck on org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addReplicaThreadPool in new org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice(String, FsVolumeImpl, File, Configuration, Timer) At BlockPoolSlice.java:new org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice(String, FsVolumeImpl, File, Configuration, Timer) At BlockPoolSlice.java:[lines 186-188] | | Failed junit tests | hadoop.hdfs.TestLeaseRecovery2 | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | HDFS-13768 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12940917/HDFS-13768.03.patch | | Optional Tests | dupname asflicense compile javac javadoc
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624745#comment-16624745 ] Surendra Singh Lilhore commented on HDFS-13768: --- Thanks [~linyiqun]. Attached updated patch. Please review...
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618683#comment-16618683 ] Yiqun Lin commented on HDFS-13768: -- Makes sense to me. Thanks for looking into this, [~surendrasingh].
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618651#comment-16618651 ] Surendra Singh Lilhore commented on HDFS-13768: ---

[~linyiqun] and [~arpitagarwal], I found some more improvement points. !screenshot-1.png! As the screenshot shows, {{FsDatasetUtil.getGenerationStampFromFile()}} and {{lock()}} take most of the time. Both can be improved.

1. {{FsDatasetUtil.getGenerationStampFromFile()}}: this method loops over all the files to find the meta file, which is time consuming. If we sort the list of files, the meta file appears right next to its block file in the sorted list, so we no longer need to loop to find it.

{code:java}
/**
 * Find the meta-file for the specified block file
 * and then return the generation stamp from the name of the meta-file.
 */
static long getGenerationStampFromFile(List<File> files, File blockFile,
    int index) throws IOException {
  String blockName = blockFile.getName();
  if ((index + 1) < files.size()) {
    // Check if the next file in the sorted list is the meta file
    String metaFile = files.get(index + 1).getName();
    if (metaFile.startsWith(blockName)) {
      return Block.getGenerationStamp(metaFile);
    }
  }
  // Fall back: search for the meta file in the list
  for (int j = 0; j < files.size(); j++) {
    ..
    ..
    return Block.getGenerationStamp(files.get(j).getName());
  }
  FsDatasetImpl.LOG.warn("Block " + blockFile + " does not have a metafile!");
  return HdfsConstants.GRANDFATHER_GENERATION_STAMP;
}
{code}

After this change, DataNode startup time is reduced by *60%* in my cluster.

2. {{BlockPoolSlice.addReplicaToReplicasMap()}}: this method first looks up the old replica in the {{ReplicaMap}} and, if it is null, adds the new ReplicaInfo to the {{ReplicaMap}}. That acquires the lock twice. It is better to add one method to ReplicaMap so the whole check-then-add can be done under a single lock.
*Example:*

{code:java}
/**
 * Add a replica's meta information into the map; if it already
 * exists, return the old replicaInfo.
 *
 * @param bpid block pool id
 * @param replicaInfo replica to add
 * @return the existing replica if present, otherwise the one just added
 */
ReplicaInfo addAndGet(String bpid, ReplicaInfo replicaInfo) {
  checkBlockPool(bpid);
  checkBlock(replicaInfo);
  try (AutoCloseableLock l = lock.acquire()) {
    FoldedTreeSet<ReplicaInfo> set = map.get(bpid);
    if (set == null) {
      // Add an entry for the block pool if it does not exist already
      set = new FoldedTreeSet<>();
      map.put(bpid, set);
    }
    ReplicaInfo oldReplicaInfo =
        set.get(replicaInfo.getBlockId(), LONG_AND_BLOCK_COMPARATOR);
    if (oldReplicaInfo != null) {
      return oldReplicaInfo;
    }
    set.add(replicaInfo);
    return replicaInfo;
  }
}
{code}

If you both agree with this change, I will add it in the next patch.
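The sorted-list trick in point 1 works because HDFS meta files are named by appending a generation stamp and `.meta` suffix to the block file name, so a lexicographic sort places each meta file immediately after its block file (assuming no unrelated file name shares the block name as a prefix). A minimal standalone sketch of that lookup; `metaFileAfter` is an illustrative name, not the patch's method:

```java
import java.util.List;

class MetaLookupDemo {
  /** Returns the meta-file name for sortedNames.get(index), or null if the
   *  next entry in the sorted list is not this block's meta file (the caller
   *  would then fall back to a full scan of the list). */
  static String metaFileAfter(List<String> sortedNames, int index) {
    String blockName = sortedNames.get(index);
    if (index + 1 < sortedNames.size()) {
      String next = sortedNames.get(index + 1);
      if (next.startsWith(blockName)) {
        return next;   // meta file sits right next to its block file
      }
    }
    return null;
  }
}
```

This turns the per-block meta-file lookup from O(n) into O(1) after one O(n log n) sort per directory, which is why the effect on startup time is so large.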
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618628#comment-16618628 ] Hadoop QA commented on HDFS-13768: --

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 5s | HDFS-13768 does not apply to trunk. Rebase required? Wrong branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-13768 |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/25090/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618618#comment-16618618 ] Surendra Singh Lilhore commented on HDFS-13768: ---

Hi [~arpitagarwal],
bq. the default should probably be the same as today i.e. number of volumes.
To keep it the same as today, we would need a default value of *volumes × blockPoolSlices*.
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16617817#comment-16617817 ] Arpit Agarwal commented on HDFS-13768: --

Hi [~surendrasingh], the default should probably be the same as today, i.e. the number of volumes. Otherwise we risk degrading performance when the number of disks > the number of processors.
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16617143#comment-16617143 ] Surendra Singh Lilhore commented on HDFS-13768: ---

Thanks [~linyiqun] for the review.
bq. This comment seems not fully addressed. I mean we can also make AddReplicaProcessor in an Asynchronous mode.
Sorry, I missed it.
bq. Could you please add the UT for this improvement?
Yes, I will fix both comments in the next patch.
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16617137#comment-16617137 ] Yiqun Lin commented on HDFS-13768: --

Thanks [~surendrasingh] for updating the patch. It almost looks good to me. Two comments from me:
{quote}The adding replica operation can still speed up. Since the ReplicaMap is thread-safe to add replicas, we can also make AddReplicaProcessor of dir finalizedDir and rbwDir in an async runnable way. And getting their Future objects and wait for completion.{quote}
This comment seems not fully addressed. I mean we can also make AddReplicaProcessor run in an asynchronous mode; the call ForkJoinPool#invoke is synchronous. Based on the v02 patch, we can improve this as follows:
{code:java}
// add finalized replicas
AddReplicaProcessor task = new AddReplicaProcessor(volumeMap, finalizedDir,
    lazyWriteReplicaMap, true, exceptions, subTaskQueue);
ForkJoinTask<?> f1 = addReplicaThreadPool.submit(task);
// add rbw replicas
task = new AddReplicaProcessor(volumeMap, rbwDir, lazyWriteReplicaMap, false,
    exceptions, subTaskQueue);
ForkJoinTask<?> f2 = addReplicaThreadPool.submit(task);
try {
  f1.get();
  f2.get();
} catch (InterruptedException | ExecutionException e) {
  // exception handling
}
// wait for all the sub-tasks to finish
waitForSubTaskToFinish(subTaskQueue, exceptions);
{code}
Could you please add the UT for this improvement?
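The distinction behind Yiqun's suggestion is that ForkJoinPool#invoke blocks the caller until the task completes, while submit() returns a ForkJoinTask handle immediately, so the finalized and rbw scans can overlap. A toy standalone illustration of that submit-then-wait pattern (runBoth and the two-worker pool size are hypothetical, chosen just for the demo):

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;

class AsyncSubmitDemo {
  /** Submits two independent tasks and waits for both. Unlike invoke(),
   *  submit() does not block the caller, so the second task is queued
   *  before the first finishes and the two run concurrently. */
  static void runBoth(Runnable r1, Runnable r2)
      throws ExecutionException, InterruptedException {
    ForkJoinPool pool = new ForkJoinPool(2);   // two workers for the demo
    ForkJoinTask<?> f1 = pool.submit(r1);      // returns immediately
    ForkJoinTask<?> f2 = pool.submit(r2);
    f1.get();                                  // now wait for completion
    f2.get();
    pool.shutdown();
  }
}
```

With two directory trees of similar size, the wall-clock cost approaches max(t1, t2) instead of t1 + t2, which is exactly the improvement over the synchronous invoke() call in the v02 patch.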
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612134#comment-16612134 ] Hadoop QA commented on HDFS-13768: --

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 11s | Docker mode activated. |
|| Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| trunk Compile Tests ||
| +1 | mvninstall | 18m 59s | trunk passed |
| +1 | compile | 0m 57s | trunk passed |
| +1 | checkstyle | 0m 57s | trunk passed |
| +1 | mvnsite | 1m 3s | trunk passed |
| +1 | shadedclient | 13m 18s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 7s | trunk passed |
| +1 | javadoc | 0m 51s | trunk passed |
|| Patch Compile Tests ||
| +1 | mvninstall | 1m 6s | the patch passed |
| +1 | compile | 1m 0s | the patch passed |
| +1 | javac | 1m 0s | the patch passed |
| -0 | checkstyle | 0m 57s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 466 unchanged - 0 fixed = 468 total (was 466) |
| +1 | mvnsite | 1m 0s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 12m 14s | patch has no errors when building and testing our client artifacts. |
| -1 | findbugs | 2m 3s | hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 | javadoc | 0m 45s | the patch passed |
|| Other Tests ||
| +1 | unit | 81m 4s | hadoop-hdfs in the patch passed. |
| +1 | asflicense | 0m 30s | The patch does not generate ASF License warnings. |
| | | 138m 47s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
| | Possible doublecheck on org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addReplicaThreadPool in new org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice(String, FsVolumeImpl, File, Configuration, Timer) At BlockPoolSlice.java:[lines 176-178] |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | HDFS-13768 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12939392/HDFS-13768.02.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611948#comment-16611948 ] Surendra Singh Lilhore commented on HDFS-13768:
---
Thanks [~linyiqun] and [~arpitagarwal] for the review. Attached an updated patch that fixes all of [~linyiqun]'s comments above.
{quote}Looks like this is going to use number processors x num disks threads by default.{quote}
No, the pool object is static, so by default it uses only (number of processors) threads in total, not a separate pool per volume.
{quote}Any idea what kind of speedup you get with lower number of threads. e.g. 2?{quote}
* Without this fix the DN took *16772ms* to load 101260 blocks
* With this patch and 8 processors (the default pool size) it took *9823ms*
* With the pool size configured to 2 it took *8766ms*
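The recursive directory walk that dominates startup is the work this issue parallelizes. Below is a minimal, self-contained sketch of the ForkJoinPool approach proposed in the description; the class names, the `blk_` filename check, and the flat `List<File>` result are illustration-only assumptions, not the patch's actual code.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ParallelScan {
  // One task per directory: subdirectories are forked so deep trees
  // are walked by the pool's workers in parallel instead of serially.
  public static class ScanTask extends RecursiveTask<List<File>> {
    private final File dir;

    public ScanTask(File dir) {
      this.dir = dir;
    }

    @Override
    protected List<File> compute() {
      List<File> blocks = new ArrayList<>();
      File[] children = dir.listFiles();
      if (children == null) {
        return blocks;                    // unreadable or not a directory
      }
      List<ScanTask> subTasks = new ArrayList<>();
      for (File f : children) {
        if (f.isDirectory()) {
          ScanTask t = new ScanTask(f);   // fork a subtask per subdir
          t.fork();
          subTasks.add(t);
        } else if (f.getName().startsWith("blk_")) {
          blocks.add(f);                  // a block or block-meta file
        }
      }
      for (ScanTask t : subTasks) {
        blocks.addAll(t.join());          // collect forked results
      }
      return blocks;
    }
  }

  public static void main(String[] args) {
    File root = new File(args.length > 0 ? args[0] : ".");
    ForkJoinPool pool = new ForkJoinPool(
        Runtime.getRuntime().availableProcessors());
    List<File> blocks = pool.invoke(new ScanTask(root));
    System.out.println("scanned " + blocks.size() + " block files");
    pool.shutdown();
  }
}
```

Because ForkJoinPool work-stealing keeps all workers busy even when one subtree is much deeper than the others, the slowest-volume bottleneck described above shrinks.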
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611051#comment-16611051 ] Arpit Agarwal commented on HDFS-13768:
--
The patch did not apply cleanly for me. Can you please rebase it?
Looks like this is going to use (number of processors) x (number of disks) threads by default. Any idea what kind of speedup you get with a lower number of threads, e.g. 2?
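The concern above is thread-count blow-up: a pool created per volume slice would give (processors x volumes) threads. A minimal sketch of the shared-static-pool design that bounds this, assuming hypothetical names (`VolumeSlice`, `submitScan` are illustration only, not the patch's actual code):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class VolumeSlice {
  // A single process-wide pool: no matter how many volume slices
  // exist, at most (number of processors) scan threads run at once.
  private static final ExecutorService SCAN_POOL =
      Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

  private final String volumeDir;

  public VolumeSlice(String volumeDir) {
    this.volumeDir = volumeDir;
  }

  // Scans queue up on the shared pool instead of spawning new threads.
  public Future<?> submitScan(Runnable scan) {
    return SCAN_POOL.submit(scan);
  }

  public static void shutdown() {
    SCAN_POOL.shutdown();
  }

  public static void main(String[] args) throws Exception {
    VolumeSlice v1 = new VolumeSlice("/data/1");
    VolumeSlice v2 = new VolumeSlice("/data/2");
    v1.submitScan(() -> System.out.println("scanning volume 1")).get();
    v2.submitScan(() -> System.out.println("scanning volume 2")).get();
    shutdown();
  }
}
```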
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610178#comment-16610178 ] Yiqun Lin commented on HDFS-13768:
--
The patch almost looks good to me. Some initial comments:
*DFSConfigKeys.java*
* I prefer to rename {{dfs.datanode.volumes.replica.add.threads}} to {{dfs.datanode.volumes.replica-add.threadpool.size}}.
* The default pool size, {{Runtime.getRuntime().availableProcessors()}}, is better placed in class {{BlockPoolSlice}} than in the config class.
*BlockPoolSlice.java*
* The replica-adding operation can be sped up further. Since the ReplicaMap is thread-safe for adding replicas, we can also run the {{AddReplicaProcessor}} for the finalizedDir and rbwDir of a dir asynchronously, then get their Future objects and wait for completion.
* We can use {{MultipleIOException}} to return the list of IOExceptions, changing
{code}
if (!exceptions.isEmpty()) {
  throw exceptions.get(0);
}
{code}
to
{code}
if (!exceptions.isEmpty()) {
  throw MultipleIOException.createIOException(exceptions);
}
{code}
* Please complete the javadoc of the method {{waitForSubTaskToFinish}}.

Please fix the other checkstyle issues and rebase the patch.
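The two review suggestions above, running the finalized and rbw scans concurrently and reporting every failure rather than only the first, can be sketched with plain JDK primitives. Assumptions: `scanDir` is a stub standing in for the real per-directory replica scan, and `Throwable.addSuppressed` stands in for Hadoop's `org.apache.hadoop.io.MultipleIOException.createIOException`, which the actual patch would use.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DualScan {
  // Placeholder for the real per-directory replica scan.
  static void scanDir(String name) throws IOException {
    if (name.isEmpty()) {
      throw new IOException("cannot scan unnamed dir");
    }
  }

  static void scanBoth(String finalizedDir, String rbwDir)
      throws IOException, InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    List<Future<?>> futures = new ArrayList<>();
    for (String d : new String[] { finalizedDir, rbwDir }) {
      // Submit both scans before waiting, so they overlap.
      futures.add(pool.submit(() -> { scanDir(d); return null; }));
    }
    pool.shutdown();
    List<IOException> failures = new ArrayList<>();
    for (Future<?> f : futures) {
      try {
        f.get();                                   // wait for completion
      } catch (ExecutionException e) {
        failures.add((IOException) e.getCause());  // collect, don't rethrow yet
      }
    }
    if (!failures.isEmpty()) {
      // Aggregate all failures instead of throwing only failures.get(0).
      IOException all = new IOException(failures.size() + " scan(s) failed");
      failures.forEach(all::addSuppressed);
      throw all;
    }
  }

  public static void main(String[] args) throws Exception {
    scanBoth("finalized", "rbw");
    System.out.println("both scans completed");
  }
}
```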
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610138#comment-16610138 ] Hadoop QA commented on HDFS-13768:
--
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 12s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 57s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 11 new + 466 unchanged - 0 fixed = 477 total (was 466) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch 6 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 13m 33s{color} | {color:red} patch has errors when building and testing our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 30s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}108m 32s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}169m 17s{color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
| | Possible doublecheck on org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.forkJoinPool in new org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice(String, FsVolumeImpl, File, Configuration, Timer) At BlockPoolSlice.java:new org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice(String, FsVolumeImpl, File, Configuration, Timer) At BlockPoolSlice.java:[lines 173-175] |
| Failed junit tests | hadoop.hdfs.TestLeaseRecovery2 |
| | hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations |
| | hadoop.hdfs.server.namenode.ha.TestHAAppend |
| | hadoop.hdfs.TestMaintenanceState |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDFS-13768 |
| JIRA Patch URL |
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610125#comment-16610125 ] Hadoop QA commented on HDFS-13768:
--
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 25s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 47s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 12 new + 16 unchanged - 0 fixed = 28 total (was 16) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 17s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}109m 44s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 41s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}171m 11s{color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDFSUpgradeFromImage |
| | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
| | hadoop.hdfs.TestPersistBlocks |
| | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
| | hadoop.hdfs.TestDatanodeStartupFixesLegacyStorageIDs |
| | hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade |
| | hadoop.hdfs.TestDatanodeLayoutUpgrade |
| | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
| | hadoop.hdfs.server.datanode.TestDeleteBlockPool |
| | hadoop.hdfs.server.datanode.TestDirectoryScanner |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDFS-13768 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12937996/HDFS-13768.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 69ec7d898a3c 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
|
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610057#comment-16610057 ] Yiqun Lin commented on HDFS-13768:
--
Thanks for working on this, [~RANith] and [~surendrasingh]. I will find a chance to review it soon, :).
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610055#comment-16610055 ] Surendra Singh Lilhore commented on HDFS-13768:
---
Added an initial patch; I will add a test case in the next patch.
*+_Initial test report_+*
# Before fix: restarted a datanode with 101260 blocks; it took *16203ms*
# After fix: restarted a datanode with 101260 blocks; it took *9693ms*
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610040#comment-16610040 ] Surendra Singh Lilhore commented on HDFS-13768: --- Discussed with [~RANith] offline; assigning the jira to myself, I will upload an updated patch.
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602728#comment-16602728 ] Surendra Singh Lilhore commented on HDFS-13768: --- Thanks [~RANith] for the patch. Some review comments: 1. You can start your first task in BlockPoolSlice#getVolumeMap(), and then BlockPoolSlice#addToReplicasMap() can submit the sub-task when the item is a directory. {code} if (!success) { // add finalized replicas addToReplicasMap(volumeMap, finalizedDir, lazyWriteReplicaMap, true); // add rbw replicas addToReplicasMap(volumeMap, rbwDir, lazyWriteReplicaMap, false); } {code} This can be changed to {code} if (!success) { // add finalized replicas AddReplicaProcessor task = new AddReplicaProcessor(volumeMap, finalizedDir, lazyWriteReplicaMap, true); forkJoinPool.invoke(task); // add rbw replicas task = new AddReplicaProcessor(volumeMap, rbwDir, lazyWriteReplicaMap, false); forkJoinPool.invoke(task); } {code} 2. You used {{forkJoinPool.invoke()}} in BlockPoolSlice#addToReplicasMap(). It will wait until the task is finished; you need to use {{task.fork()}} here. {code} +forkJoinPool.invoke(new AddReplicaProcessor(volumeMap, file, +lazyWriteReplicaMap, isFinalized)); {code} Instead of this, use {code} +AddReplicaProcessor task = new AddReplicaProcessor(volumeMap, file, +lazyWriteReplicaMap, isFinalized); +task.fork(); {code} 3. At line 78, you shut down the {{forkJoinPool}}. This may be a mistake; please check once. {code} + forkJoinPool.shutdown(); {code} 4. What if an exception occurs in {{AddReplicaProcessor}}? You need to report it back to the next level.
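The fork()-versus-invoke() point in review comment 2 can be illustrated with a standalone sketch (SumTask and the array contents are invented for illustration, not from the patch): inside compute(), invoke() would block the worker until the subtask finishes, while the idiomatic pattern is to fork() one half, compute the other inline, and join() only when the result is actually needed.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Illustration of the reviewer's point: fork() schedules a subtask so
// another worker can steal it; invoke() inside compute() would block.
class SumTask extends RecursiveTask<Long> {
    private final long[] data;
    private final int lo, hi;
    SumTask(long[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= 4) {                 // small enough: sum directly
            long s = 0;
            for (int i = lo; i < hi; i++) s += data[i];
            return s;
        }
        int mid = (lo + hi) / 2;
        SumTask left = new SumTask(data, lo, mid);
        left.fork();                        // async: a free worker may steal it
        long right = new SumTask(data, mid, hi).compute();  // do one half here
        return left.join() + right;         // wait only when the result is needed
    }
}

public class ForkVsInvokeDemo {
    public static void main(String[] args) {
        long[] data = new long[100];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;  // 1..100
        long sum = new ForkJoinPool().invoke(new SumTask(data, 0, data.length));
        System.out.println(sum);  // 5050
    }
}
```

Replacing the fork()/join() pair with invoke() on each subtask would make each worker wait for every child before continuing, serializing the walk again, which is exactly the problem the review comment flags.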
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16601796#comment-16601796 ] Hadoop QA commented on HDFS-13768: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 24s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 5s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 47s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 12 new + 16 unchanged - 0 fixed = 28 total (was 16) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 15s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 96m 53s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}154m 40s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestDFSUpgradeFromImage | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy | | | hadoop.hdfs.TestPersistBlocks | | | hadoop.hdfs.TestDatanodeStartupFixesLegacyStorageIDs | | | hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade | | | hadoop.hdfs.TestDatanodeLayoutUpgrade | | | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes | | | hadoop.hdfs.server.datanode.TestDeleteBlockPool | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | HDFS-13768 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12937996/HDFS-13768.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 439cf7342730 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / ff036e4 | |
[jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599573#comment-16599573 ] Ranith Sardar commented on HDFS-13768: -- In the current code structure, each volume's thread calls getVolumeMap(), and adding the replicas under a given directory to the volume map is a recursive call. As a result, each thread's result is held up until the whole recursive walk completes, and the recursive call in addToReplicasMap() takes more time the more sub-dirs there are. *Solutions:* * Declare a common ForkJoinPool in BlockPoolSlice. * Treat each call to addToReplicasMap() as a single task and submit it to the common ForkJoinPool. * The fork-join pool size is set to half the core count, because the processor will be busy with other work.
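The proposed pool sizing can be sketched as follows; the class and field names here are hypothetical, not taken from the attached patch.

```java
import java.util.concurrent.ForkJoinPool;

// Sketch of the sizing described above: one shared pool whose parallelism is
// half the available cores (minimum 1), leaving headroom for the DataNode's
// other startup work. Names are illustrative only.
public class SharedScanPool {
    static final int PARALLELISM =
        Math.max(1, Runtime.getRuntime().availableProcessors() / 2);
    static final ForkJoinPool ADD_REPLICA_POOL = new ForkJoinPool(PARALLELISM);

    public static void main(String[] args) {
        // The pool reports the parallelism it was configured with.
        System.out.println(ADD_REPLICA_POOL.getParallelism() == PARALLELISM);  // true
    }
}
```

Sharing one pool across all volumes keeps the total scan concurrency bounded, rather than letting every volume spawn its own unbounded set of threads.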