[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293977#comment-17293977 ]

Stephen O'Donnell commented on HDFS-13677:
------------------------------------------

I pushed this change to branch-2.10, so it will be in 2.10.2 when it is released. Thanks for letting us know about this, [~seys].

> Dynamic refresh Disk configuration results in overwriting VolumeMap
> -------------------------------------------------------------------
>
>                 Key: HDFS-13677
>                 URL: https://issues.apache.org/jira/browse/HDFS-13677
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: xuzq
>            Assignee: xuzq
>            Priority: Blocker
>             Fix For: 3.3.0, 2.8.6, 3.2.1, 2.9.3, 3.1.3, 2.10.2
>
>         Attachments: HDFS-13677-001.patch, HDFS-13677-002-2.9-branch.patch, HDFS-13677-002.patch, image-2018-06-14-13-05-54-354.png, image-2018-06-14-13-10-24-032.png
>
>
> When I added a new disk by dynamically refreshing the configuration, an exception "FileNotFound while finding block" was raised.
>
> The steps are as follows:
> 1. Change the hdfs-site.xml of the DataNode to add a new disk.
> 2. Refresh the configuration by "./bin/hdfs dfsadmin -reconfig datanode :50020 start"
>
> The error is like:
> ```
> VolumeScannerThread(/media/disk5/hdfs/dn): FileNotFound while finding block BP-233501496-*.*.*.*-1514185698256:blk_1620868560_547245090 on volume /media/disk5/hdfs/dn
> org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-1997955181-*.*.*.*-1514186468560:blk_1090885868_17145082
>     at org.apache.hadoop.hdfs.server.datanode.BlockSender.getReplica(BlockSender.java:471)
>     at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:240)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:553)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:148)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:254)
>     at java.lang.Thread.run(Thread.java:748)
> ```
> I added some logs for confirmation, as follows:
> Log code like:
> !image-2018-06-14-13-05-54-354.png!
> And the result is like:
> !image-2018-06-14-13-10-24-032.png!
> The size of the volumeMap has been reduced, and we found the volumeMap to be overwritten with the new disk's blocks by the method 'ReplicaMap.addAll(ReplicaMap other)'.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
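The failure mode described in the issue can be illustrated with a simplified, hypothetical sketch. The class below is a stand-in for the DataNode's replica map (block-pool ID to the set of block IDs known on this node), not the real HDFS `ReplicaMap`; the method names are chosen to mirror the discussion, and only the merge logic is modeled. When the map built by scanning only the newly added disk *replaces* the per-block-pool entry wholesale, the replicas registered by the existing disks vanish, which is what produces the `ReplicaNotFoundException` above:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for the DataNode's replica map:
// block-pool ID -> set of block IDs known on this DataNode.
class SimpleReplicaMap {
    final Map<String, Set<Long>> map = new HashMap<>();

    void add(String bpid, long blockId) {
        map.computeIfAbsent(bpid, k -> new HashSet<>()).add(blockId);
    }

    // Buggy merge, analogous to the pre-fix behavior described in the
    // issue: the per-block-pool entry from 'other' replaces the existing
    // entry outright, so blocks on the old disks drop out of the map.
    void addAllBuggy(SimpleReplicaMap other) {
        map.putAll(other.map);
    }

    // Merge that unions the per-block-pool sets instead of replacing them,
    // which is the behavior a hot-swapped volume needs.
    void mergeAll(SimpleReplicaMap other) {
        for (Map.Entry<String, Set<Long>> e : other.map.entrySet()) {
            map.computeIfAbsent(e.getKey(), k -> new HashSet<>())
               .addAll(e.getValue());
        }
    }

    int size(String bpid) {
        Set<Long> s = map.get(bpid);
        return s == null ? 0 : s.size();
    }
}

public class VolumeMapOverwriteDemo {
    public static void main(String[] args) {
        String bpid = "BP-233501496";

        // Replica already registered by an existing disk.
        SimpleReplicaMap buggy = new SimpleReplicaMap();
        buggy.add(bpid, 1620868560L);

        // Map built by scanning only the newly added disk.
        SimpleReplicaMap newDiskMap = new SimpleReplicaMap();
        newDiskMap.add(bpid, 1090885868L);

        buggy.addAllBuggy(newDiskMap);
        System.out.println("after buggy addAll: " + buggy.size(bpid)); // 1: old replica lost

        SimpleReplicaMap fixed = new SimpleReplicaMap();
        fixed.add(bpid, 1620868560L);
        fixed.mergeAll(newDiskMap);
        System.out.println("after mergeAll: " + fixed.size(bpid)); // 2: both replicas kept
    }
}
```

This matches the observation in the description that the size of the volumeMap shrank after the refresh: a replacing merge can only ever leave the new disk's blocks behind.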
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293891#comment-17293891 ]

Stephen O'Donnell commented on HDFS-13677:
------------------------------------------

Hmm, this change is in branches 2.8, 2.9, 3.1, 3.2 and 3.3, but it has somehow missed branch-2.10. I will see if I can cherry-pick it over to the 2.10 release line later.
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293824#comment-17293824 ]

chad commented on HDFS-13677:
-----------------------------

Hi all, we just encountered this bug, or one exactly like it, in 2.10.1. Maybe the patch missed being merged for 2.10.0. It is not a big deal for us; we will simply stop and start the DataNode daemon.
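For reference, the reconfiguration flow from the issue description and the restart workaround mentioned above look roughly like the following. This is an illustrative sketch: `<dn_host>:<ipc_port>` is a placeholder for the DataNode's IPC address (port 50020 in the report), and the daemon script name and location vary by installation (the `hadoop-daemon.sh` form shown is the Hadoop 2.x layout; Hadoop 3.x uses `hdfs --daemon stop|start datanode`).

```shell
# 1. Add the new directory to dfs.datanode.data.dir in the DataNode's
#    hdfs-site.xml, then trigger the live reconfiguration:
hdfs dfsadmin -reconfig datanode <dn_host>:<ipc_port> start

# 2. Poll until the reconfiguration task reports completion:
hdfs dfsadmin -reconfig datanode <dn_host>:<ipc_port> status

# Workaround on releases that still carry this bug (e.g. 2.10.1): skip the
# live reconfig and restart the DataNode daemon instead, so the volumeMap
# is rebuilt from scratch (assumed Hadoop 2.x script layout):
hadoop-daemon.sh stop datanode
hadoop-daemon.sh start datanode
```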
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902885#comment-16902885 ]

Hadoop QA commented on HDFS-13677:
----------------------------------

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 17s | https://github.com/apache/hadoop/pull/780 does not apply to trunk. Rebase required? Wrong branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| GITHUB PR | https://github.com/apache/hadoop/pull/780 |
| JIRA Issue | HDFS-13677 |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-780/8/console |
| versions | git=2.7.4 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902624#comment-16902624 ]

Hadoop QA commented on HDFS-13677:
----------------------------------

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 11s | https://github.com/apache/hadoop/pull/780 does not apply to trunk. Rebase required? Wrong branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| GITHUB PR | https://github.com/apache/hadoop/pull/780 |
| JIRA Issue | HDFS-13677 |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-780/7/console |
| versions | git=2.7.4 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898460#comment-16898460 ]

Hadoop QA commented on HDFS-13677:
----------------------------------

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 12s | https://github.com/apache/hadoop/pull/780 does not apply to trunk. Rebase required? Wrong branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| GITHUB PR | https://github.com/apache/hadoop/pull/780 |
| JIRA Issue | HDFS-13677 |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-780/6/console |
| versions | git=2.7.4 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893857#comment-16893857 ]

Hadoop QA commented on HDFS-13677:
----------------------------------

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 11s | https://github.com/apache/hadoop/pull/780 does not apply to trunk. Rebase required? Wrong branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| GITHUB PR | https://github.com/apache/hadoop/pull/780 |
| JIRA Issue | HDFS-13677 |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-780/5/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1650 ]

Hadoop QA commented on HDFS-13677:
----------------------------------

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 12s | https://github.com/apache/hadoop/pull/780 does not apply to trunk. Rebase required? Wrong branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| GITHUB PR | https://github.com/apache/hadoop/pull/780 |
| JIRA Issue | HDFS-13677 |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-780/4/console |
| versions | git=2.7.4 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830191#comment-16830191 ]

Stephen O'Donnell commented on HDFS-13677:
------------------------------------------

[~arpitagarwal] I have uploaded a patch based on the 2.9 branch. Two changes were needed:

1. Remove the lambdas and replace them with nested for loops.
2. Replace String.join with StringUtils.join, as String.join does not exist in Java 7.

I ran both of the changed test classes locally and they all passed, and the code compiles under Java 7.
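The two kinds of Java 7 compatibility changes described in that comment can be sketched as follows. These snippets are illustrative stand-ins, not the actual patch hunks: `joinJava7` hand-rolls the join to stay self-contained, where the real branch-2.9 patch used `org.apache.commons.lang.StringUtils.join`, and the comparator is a generic example of a lambda rewritten for Java 7 (the real patch replaced its lambdas with nested for loops).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class Java7BackportDemo {

    // Java 8+ style: String.join, not available on Java 7.
    static String joinJava8(List<String> parts) {
        return String.join(",", parts);
    }

    // Java 7-compatible replacement with the same behavior; the actual
    // patch used StringUtils.join from commons-lang instead.
    static String joinJava7(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(parts.get(i));
        }
        return sb.toString();
    }

    // Java 8 style would be: Collections.sort(vols, (x, y) -> x.compareTo(y));
    // On Java 7 the lambda becomes an anonymous Comparator.
    static void sortJava7(List<String> vols) {
        Collections.sort(vols, new Comparator<String>() {
            @Override
            public int compare(String x, String y) {
                return x.compareTo(y);
            }
        });
    }

    public static void main(String[] args) {
        List<String> vols = Arrays.asList("/media/disk5", "/media/disk1");
        sortJava7(vols);
        System.out.println(vols);               // [/media/disk1, /media/disk5]
        System.out.println(joinJava7(vols));    // /media/disk1,/media/disk5
    }
}
```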
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829766#comment-16829766 ]

Hudson commented on HDFS-13677:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16478 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16478/])

HDFS-13677. Dynamic refresh Disk configuration results in overwriting (arp: rev 4b4200f1f87ad40d9c19ba160f706ffd0470a8d4)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestReplicaMap.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/ReplicaMap.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeHotSwapVolumes.java
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829745#comment-16829745 ] Arpit Agarwal commented on HDFS-13677: -- I am going to commit this shortly.

> Dynamic refresh Disk configuration results in overwriting VolumeMap
> ---
>
> Key: HDFS-13677
> URL: https://issues.apache.org/jira/browse/HDFS-13677
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.8.0, 2.9.0, 3.0.0, 3.1.0, 3.2.0
> Reporter: xuzq
> Assignee: xuzq
> Priority: Blocker
> Attachments: HDFS-13677-001.patch, HDFS-13677-002.patch, image-2018-06-14-13-05-54-354.png, image-2018-06-14-13-10-24-032.png
>
> When I added a new disk by dynamically refreshing the configuration, it caused a "FileNotFound while finding block" exception.
>
> The steps are as follows:
> 1. Change the hdfs-site.xml of the DataNode to add a new disk.
> 2. Refresh the configuration with "./bin/hdfs dfsadmin -reconfig datanode <host>:50020 start"
>
> The error looks like:
> ```
> VolumeScannerThread(/media/disk5/hdfs/dn): FileNotFound while finding block BP-233501496-*.*.*.*-1514185698256:blk_1620868560_547245090 on volume /media/disk5/hdfs/dn
> org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-1997955181-*.*.*.*-1514186468560:blk_1090885868_17145082
> at org.apache.hadoop.hdfs.server.datanode.BlockSender.getReplica(BlockSender.java:471)
> at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:240)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:553)
> at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:148)
> at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:254)
> at java.lang.Thread.run(Thread.java:748)
> ```
> I added some logs for confirmation, as follows:
> Log code like:
> !image-2018-06-14-13-05-54-354.png!
> And the result is like:
> !image-2018-06-14-13-10-24-032.png!
> The size of the 'VolumeMap' has been reduced, and we found that the 'VolumeMap' is overwritten with the new disk's blocks by the method 'ReplicaMap.addAll(ReplicaMap other)'.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
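The overwrite the reporter describes can be shown in isolation. The sketch below is illustrative only, with simplified, hypothetical stand-in types (it is not the actual ReplicaMap code): a bulk-add that replaces the per-block-pool map drops the replicas already registered for that pool, while a merge keeps them.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a replica map: block-pool ID -> (blockId -> replica info).
class SimpleReplicaMap {
  final Map<String, Map<Long, String>> map = new HashMap<>();

  void add(String bpid, long blockId, String replica) {
    map.computeIfAbsent(bpid, k -> new HashMap<>()).put(blockId, replica);
  }

  // Buggy bulk-add: replaces the whole per-pool map, dropping existing replicas.
  void addAll(SimpleReplicaMap other) {
    map.putAll(other.map);
  }

  // Fixed bulk-add: merges the new volume's replicas into the existing pool map.
  void mergeAll(SimpleReplicaMap other) {
    for (Map.Entry<String, Map<Long, String>> e : other.map.entrySet()) {
      map.computeIfAbsent(e.getKey(), k -> new HashMap<>()).putAll(e.getValue());
    }
  }

  int size(String bpid) {
    Map<Long, String> m = map.get(bpid);
    return m == null ? 0 : m.size();
  }
}

public class ReplicaMapMergeDemo {
  public static void main(String[] args) {
    // Existing volume map: two replicas already registered under pool BP-1.
    SimpleReplicaMap volumeMap = new SimpleReplicaMap();
    volumeMap.add("BP-1", 100L, "disk1");
    volumeMap.add("BP-1", 101L, "disk1");

    // Replicas scanned from the newly added disk, same block pool.
    SimpleReplicaMap newVolume = new SimpleReplicaMap();
    newVolume.add("BP-1", 200L, "disk5");

    SimpleReplicaMap overwritten = new SimpleReplicaMap();
    overwritten.add("BP-1", 100L, "disk1");
    overwritten.add("BP-1", 101L, "disk1");
    overwritten.addAll(newVolume);
    System.out.println("after addAll:   " + overwritten.size("BP-1")); // 1 - old replicas lost

    volumeMap.mergeAll(newVolume);
    System.out.println("after mergeAll: " + volumeMap.size("BP-1"));   // 3 - old replicas kept
  }
}
```

Here `addAll` mirrors the `Map.putAll`-style replacement behaviour and `mergeAll` mirrors the shape of the fix; all class and method names are hypothetical.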
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829686#comment-16829686 ] Siyao Meng commented on HDFS-13677: --- [~xuzq_zander] Nice work. LGTM on patch rev 002. Ran the unit test locally with and without the fix. FsDatasetImpl#activateVolume() was introduced in HDFS-9715, so I believe we need to backport this to 2.8.x and 2.9.x.
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829646#comment-16829646 ] Arpit Agarwal commented on HDFS-13677: -- Thank you for reporting and fixing this [~xuzq_zander]. Also thanks [~sodonnell]! I am +1 on the latest patch, I especially like the targeted unit test for mergeAll. Will hold off committing for a day or two in case someone else has comments.
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829073#comment-16829073 ] xuzq commented on HDFS-13677: - Thanks [~sodonnell]!
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829061#comment-16829061 ] Stephen O'Donnell commented on HDFS-13677: -- Hi [~xuzq_zander] - Thanks for the additional revision. I see you have added the test I gave above into the patch - that is the correct thing to do: if the test is in the patch it will get executed by the automated build run. The 002 patch looks good to me, and the tests which failed seem unrelated to the change (there are often some tests that fail intermittently due to load on the build server etc).
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16828806#comment-16828806 ] Hadoop QA commented on HDFS-13677: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 17s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 28s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 80m 7s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}139m 40s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestMultipleNNPortQOP | | | hadoop.hdfs.web.TestWebHdfsTimeouts | | | hadoop.fs.viewfs.TestViewFileSystemHdfs | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | HDFS-13677 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967322/HDFS-13677-002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux f12a0f1c7d76 4.4.0-143-generic #169~14.04.2-Ubuntu SMP Wed Feb 13 15:00:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 43b2a4b | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/26722/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/26722/testReport/ | | Max. process+thread count | 3744 (vs. ulimit of 1) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output |
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16828800#comment-16828800 ] Hadoop QA commented on HDFS-13677: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 3s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 49s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 38s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}131m 54s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks | | | hadoop.hdfs.web.TestWebHdfsTimeouts | | | hadoop.hdfs.TestRollingUpgrade | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-780/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/780 | | JIRA Issue | HDFS-13677 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 3140035b8a9a 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / 43b2a4b | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-780/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-780/3/testReport/ | | Max. process+thread count | 3989 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output |
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16828043#comment-16828043 ] xuzq commented on HDFS-13677: - Hi [~sodonnell] - I will drop the lock in mergeAll(). The testReAddVolumeWithBlocks() is very good. Do I need to add the code into the patch? This is my first MR, so there are some things I don't understand.
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16828026#comment-16828026 ] Stephen O'Donnell commented on HDFS-13677: -- That failing test seems to be flaky. It failed sometimes and passed sometimes when I ran it locally, even without the patch in place. Borrowing a lot from the existing tests in TestDataNodeHotSwapVolumes, here is a test that reproduces the issue without the patch and passes with the patch in place:

{code}
  /**
   * Test re-adding one volume with some blocks on a running MiniDFSCluster
   * with only one NameNode to reproduce HDFS-13677.
   */
  @Test(timeout=60000)
  public void testReAddVolumeWithBlocks()
      throws IOException, ReconfigurationException,
      InterruptedException, TimeoutException {
    startDFSCluster(1, 1);
    String bpid = cluster.getNamesystem().getBlockPoolId();
    final int numBlocks = 10;

    Path testFile = new Path("/test");
    createFile(testFile, numBlocks);

    List<Map<DatanodeStorage, BlockListAsLongs>> blockReports =
        cluster.getAllBlockReports(bpid);
    assertEquals(1, blockReports.size());  // 1 DataNode
    assertEquals(2, blockReports.get(0).size());  // 2 volumes

    // Now remove the second volume
    DataNode dn = cluster.getDataNodes().get(0);
    Collection<String> oldDirs = getDataDirs(dn);
    String newDirs = oldDirs.iterator().next();  // Keep the first volume.
    assertThat(
        "DN did not update its own config",
        dn.reconfigurePropertyImpl(
            DFSConfigKeys.DFS_DATANODE_DATA_DIR_KEY, newDirs),
        is(dn.getConf().get(DFS_DATANODE_DATA_DIR_KEY)));
    assertFileLocksReleased(
        new ArrayList<String>(oldDirs).subList(1, oldDirs.size()));

    // Now create another file - the first volume should have 15 blocks
    // and 5 blocks remain on the previously removed volume
    createFile(new Path("/test2"), numBlocks);
    dn.scheduleAllBlockReport(0);
    blockReports = cluster.getAllBlockReports(bpid);
    assertEquals(1, blockReports.size());  // 1 DataNode
    assertEquals(1, blockReports.get(0).size());  // 1 volume
    for (BlockListAsLongs blockList : blockReports.get(0).values()) {
      assertEquals(15, blockList.getNumberOfBlocks());
    }

    // Now add the original volume back again and ensure 15 blocks are reported
    assertThat(
        "DN did not update its own config",
        dn.reconfigurePropertyImpl(
            DFSConfigKeys.DFS_DATANODE_DATA_DIR_KEY, String.join(",", oldDirs)),
        is(dn.getConf().get(DFS_DATANODE_DATA_DIR_KEY)));
    dn.scheduleAllBlockReport(0);
    blockReports = cluster.getAllBlockReports(bpid);
    assertEquals(1, blockReports.size());  // 1 DataNode
    assertEquals(2, blockReports.get(0).size());  // 2 volumes

    // The order of the block reports is not guaranteed. As we expect 2, get the
    // max block count and the min block count and then assert on that.
    int minNumBlocks = Integer.MAX_VALUE;
    int maxNumBlocks = Integer.MIN_VALUE;
    for (BlockListAsLongs blockList : blockReports.get(0).values()) {
      minNumBlocks = Math.min(minNumBlocks, blockList.getNumberOfBlocks());
      maxNumBlocks = Math.max(maxNumBlocks, blockList.getNumberOfBlocks());
    }
    assertEquals(5, minNumBlocks);
    assertEquals(15, maxNumBlocks);
  }
{code}

Without the patch, the second-last assertEquals will fail, as zero blocks will be reported from the volume that was not removed instead of 15. Feel free to use the above test in the patch or refactor it as needed.
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827984#comment-16827984 ] Stephen O'Donnell commented on HDFS-13677: -- Hi [~xuzq_zander] - The new patch looks better. A couple of comments:

1. I don't think we need the lock in the mergeAll method, as there is a lock in the add method it calls which protects the structures. With a lock in mergeAll, if the disk being added has a lot of blocks, it could block the DN from adding anything else to the volumeMap (eg new blocks being created) for some time while all the volume's blocks are loaded.

2. Do you think we could add a test in TestDataNodeHotSwapVolumes that reproduces the issue? Eg have a DN with 1 volume and 5 blocks, then add another volume with 2 blocks and ensure it reports 7 blocks rather than 2.

3. One test has failed in TestDataNodeHotSwapVolumes. Not sure if it's related to this change or not, so we will need to dig into it and see.
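The locking point in comment 1 can be sketched as follows. This is a simplified, hypothetical illustration (not the actual FsDatasetImpl or ReplicaMap code): the merge loop itself takes no outer lock, because each add() holds a short lock that keeps the map consistent, so concurrent writers can interleave between entries instead of waiting for a whole volume's blocks to load.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: blockId -> replica info, guarded by a per-map mutex.
class LockedReplicaMap {
  private final Object mutex = new Object();
  private final Map<Long, String> replicas = new HashMap<>();

  void add(long blockId, String replica) {
    synchronized (mutex) {  // short critical section per replica
      replicas.put(blockId, replica);
    }
  }

  // No synchronized block here: correctness comes from add()'s own lock,
  // and other threads can slip in between iterations of a long merge.
  void mergeAll(Map<Long, String> otherVolume) {
    for (Map.Entry<Long, String> e : otherVolume.entrySet()) {
      add(e.getKey(), e.getValue());
    }
  }

  int size() {
    synchronized (mutex) {
      return replicas.size();
    }
  }
}
```

Holding the mutex across the whole mergeAll loop would also be thread-safe, but it would stall every concurrent add() for the duration of the volume load, which is the behaviour the review comment asks to avoid.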
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827850#comment-16827850 ] Hadoop QA commented on HDFS-13677: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 14s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 37s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 90 unchanged - 0 fixed = 91 total (was 90) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 20s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}100m 25s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 35s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}164m 16s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.TestNameNodeMXBean | | | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes | | | hadoop.hdfs.qjournal.server.TestJournalNodeSync | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | HDFS-13677 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967305/HDFS-13677-001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux dab98239121c 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 43b2a4b | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/26720/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/26720/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results |
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827848#comment-16827848 ] Hadoop QA commented on HDFS-13677: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 13m 34s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 56s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 41s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 90 unchanged - 0 fixed = 91 total (was 90) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 24s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 95m 14s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}168m 41s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestReconstructStripedFile | | | hadoop.hdfs.server.namenode.ha.TestHASafeMode | | | hadoop.hdfs.web.TestWebHdfsTimeouts | | | hadoop.hdfs.tools.TestDFSZKFailoverController | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-780/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/780 | | JIRA Issue | HDFS-13677 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux b0370166451e 4.4.0-141-generic #167~14.04.1-Ubuntu SMP Mon Dec 10 13:20:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / 43b2a4b | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/hadoop-multibranch/job/PR-780/1/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-780/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | |
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827847#comment-16827847 ] Hadoop QA commented on HDFS-13677: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 14m 37s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 11s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 38s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 90 unchanged - 0 fixed = 91 total (was 90) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 14s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 80m 7s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 53s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}160m 38s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.datanode.TestBPOfferService | | | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks | | | hadoop.hdfs.server.datanode.TestDirectoryScanner | | | hadoop.hdfs.web.TestWebHdfsTimeouts | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-780/2/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/780 | | JIRA Issue | HDFS-13677 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 6172042187a3 4.4.0-143-generic #169~14.04.2-Ubuntu SMP Wed Feb 13 15:00:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / 43b2a4b | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/hadoop-multibranch/job/PR-780/2/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | unit |
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827798#comment-16827798 ] xuzq commented on HDFS-13677: - {quote}1. I wonder if we should change the addAll method as other parts of the code may be using it and expecting it to simply replace the blockpool? It would be worth a look around to see if its being used in any other places. Perhaps we should add a new method "mergeAll" which does what we need here and better describes its purpose? 2. Rather than the new method addAndNotReplace, we should just call the existing method add: {quote} [~sodonnell] I think this is a good idea, and I will upload a new version of the code based on the current trunk.
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827730#comment-16827730 ] Stephen O'Donnell commented on HDFS-13677: -- Looking at the patch [~xuzq_zander] uploaded some time ago:

1. I wonder if we should change the addAll method, as other parts of the code may be using it and expecting it to simply replace the blockpool? It would be worth a look around to see if it's being used in any other places. Perhaps we should add a new method "mergeAll" which does what we need here and better describes its purpose?

2. Rather than the new method addAndNotReplace, we should just call the existing method add:
{code}
ReplicaInfo add(String bpid, ReplicaInfo replicaInfo) {
  checkBlockPool(bpid);
  checkBlock(replicaInfo);
  try (AutoCloseableLock l = lock.acquire()) {
    FoldedTreeSet<ReplicaInfo> set = map.get(bpid);
    if (set == null) {
      // Add an entry for block pool if it does not exist already
      set = new FoldedTreeSet<>();
      map.put(bpid, set);
    }
    return set.addOrReplace(replicaInfo);
  }
}
{code}
It handles adding the blockpool entry if it is needed, and also puts the required locking around the calls to make it thread-safe.
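The replace-vs-merge distinction discussed above can be sketched outside Hadoop. The class below is only a toy stand-in for ReplicaMap (a plain HashMap of block-pool ID to block IDs, with none of Hadoop's FoldedTreeSet or locking); mergeAll here illustrates the proposed semantics and is not an existing Hadoop method.

```java
import java.util.*;

// Toy stand-in for ReplicaMap: block-pool ID -> set of block IDs.
public class MergeDemo {
    final Map<String, Set<Long>> map = new HashMap<>();

    // Mirrors the existing ReplicaMap.add: create the pool entry if
    // absent, then add the single replica.
    void add(String bpid, long blockId) {
        map.computeIfAbsent(bpid, k -> new HashSet<>()).add(blockId);
    }

    // The buggy semantics: per-pool sets from 'other' replace ours.
    void addAll(MergeDemo other) {
        map.putAll(other.map);
    }

    // The proposed "mergeAll" semantics: fold 'other' into existing pools.
    void mergeAll(MergeDemo other) {
        for (Map.Entry<String, Set<Long>> e : other.map.entrySet()) {
            for (long blockId : e.getValue()) {
                add(e.getKey(), blockId);
            }
        }
    }

    // Master volumeMap with 10 blocks on existing disks.
    static MergeDemo tenBlocks() {
        MergeDemo m = new MergeDemo();
        for (long b = 1; b <= 10; b++) m.add("BP-1", b);
        return m;
    }

    // tempVolumeMap built for a newly added disk holding 2 blocks.
    static MergeDemo newDiskMap() {
        MergeDemo temp = new MergeDemo();
        temp.add("BP-1", 100L);
        temp.add("BP-1", 101L);
        return temp;
    }

    public static int buggySize() {
        MergeDemo m = tenBlocks();
        m.addAll(newDiskMap());
        return m.map.get("BP-1").size();   // the 10 existing blocks are lost
    }

    public static int mergedSize() {
        MergeDemo m = tenBlocks();
        m.mergeAll(newDiskMap());
        return m.map.get("BP-1").size();   // all 12 blocks survive
    }

    public static void main(String[] args) {
        System.out.println("addAll:   " + buggySize());   // 2
        System.out.println("mergeAll: " + mergedSize());  // 12
    }
}
```

With addAll the pool ends up holding only the new disk's 2 blocks; with mergeAll it holds all 12, which is the behaviour the reconfig path needs.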
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827725#comment-16827725 ] Stephen O'Donnell commented on HDFS-13677: -- Yes, I just thought that perhaps the reason the problem does not reproduce is because my new disk is empty. Re-adding a non-empty disk is what causes this problem to manifest. So, steps to reproduce:

1. Single-node local cluster with 1 storage configured. Add 10 files or so, giving a known number of blocks.
2. Reconfigure to add a new second storage. Add say 5 more files to put 2 or 3 blocks on the new disk.
3. Reconfigure to remove the second storage, then add it again. The DN should then generate a FBR with only 2 or 3 blocks (ie missing those from the original storage) due to this bug.

{code}
2019-04-27 19:50:12,801 INFO datanode.DataNode: Successfully sent block report 0x230a1c633c634987, containing 1 storage report(s), of which we sent 1. The reports had 10 total blocks and used 1 RPC(s). This took 3 msec to generate and 20 msecs for RPC and NN processing. Got back one command: FinalizeCommand/5.
{code}

Now I add an empty disk and we see no issues, still 10 blocks reported:

{code}
2019-04-27 19:52:18,085 INFO impl.FsDatasetImpl: Added volume - [DISK]file:/tmp/hadoop-sodonnell/dfs/data2, StorageType: DISK
2019-04-27 19:52:18,085 INFO datanode.DataNode: Successfully added volume: [DISK]file:/tmp/hadoop-sodonnell/dfs/data2
2019-04-27 19:52:18,086 INFO datanode.DataNode: Block pool BP-1999061334-192.168.0.24-1556390848658 (Datanode Uuid 242e590a-b1e3-4a1e-9b32-e67cc095bb0f): scheduling a full block report.
2019-04-27 19:52:18,087 INFO datanode.DataNode: Forcing a full block report to localhost/127.0.0.1:8020
2019-04-27 19:52:18,087 INFO conf.ReconfigurableBase: Property rpc.engine.org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolPB is not configurable: old value: org.apache.hadoop.ipc.ProtobufRpcEngine, new value: null
2019-04-27 19:52:18,089 INFO datanode.DataNode: Successfully sent block report 0x230a1c633c634988, containing 2 storage report(s), of which we sent 2. The reports had 10 total blocks and used 1 RPC(s). This took 1 msec to generate and 2 msecs for RPC and NN processing. Got back no commands.
{code}

Now add 5 more files, giving 15 blocks total, and then remove the second volume. We can see 13 blocks reported, as 2 are on the removed disk, still as expected:

{code}
2019-04-27 19:54:19,740 INFO impl.FsDatasetImpl: Removed volume: /tmp/hadoop-sodonnell/dfs/data2
2019-04-27 19:54:19,740 INFO impl.FsDatasetImpl: Volume reference is released.
2019-04-27 19:54:19,741 INFO common.Storage: Removing block level storage: /tmp/hadoop-sodonnell/dfs/data2/current/BP-1999061334-192.168.0.24-1556390848658
2019-04-27 19:54:19,743 INFO datanode.DataNode: Block pool BP-1999061334-192.168.0.24-1556390848658 (Datanode Uuid 242e590a-b1e3-4a1e-9b32-e67cc095bb0f): scheduling a full block report.
2019-04-27 19:54:19,743 INFO datanode.DataNode: Forcing a full block report to localhost/127.0.0.1:8020
2019-04-27 19:54:19,743 INFO conf.ReconfigurableBase: Property rpc.engine.org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolPB is not configurable: old value: org.apache.hadoop.ipc.ProtobufRpcEngine, new value: null
2019-04-27 19:54:19,746 INFO datanode.DataNode: Successfully sent block report 0x230a1c633c634989, containing 1 storage report(s), of which we sent 1. The reports had 13 total blocks and used 1 RPC(s). This took 0 msec to generate and 2 msecs for RPC and NN processing. Got back no commands.
{code}

Finally, add the disk back in and the problem appears, as it's no longer an empty disk. The FBR reports only 2 blocks from the new disk instead of the 15 it should:

{code}
2019-04-27 19:57:24,710 INFO impl.FsDatasetImpl: Added volume - [DISK]file:/tmp/hadoop-sodonnell/dfs/data2, StorageType: DISK
2019-04-27 19:57:24,710 INFO datanode.DataNode: Successfully added volume: [DISK]file:/tmp/hadoop-sodonnell/dfs/data2
2019-04-27 19:57:24,711 INFO datanode.DataNode: Block pool BP-1999061334-192.168.0.24-1556390848658 (Datanode Uuid 242e590a-b1e3-4a1e-9b32-e67cc095bb0f): scheduling a full block report.
2019-04-27 19:57:24,711 INFO datanode.DataNode: Forcing a full block report to localhost/127.0.0.1:8020
2019-04-27 19:57:24,711 INFO conf.ReconfigurableBase: Property rpc.engine.org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolPB is not configurable: old value: org.apache.hadoop.ipc.ProtobufRpcEngine, new value: null
2019-04-27 19:57:24,713 INFO datanode.DataNode: Successfully sent block report 0x230a1c633c63498a, containing 2 storage report(s), of which we sent 2. The reports had 2 total blocks and used 1 RPC(s). This took 0 msec to generate and 2 msecs for RPC and NN processing. Got back no commands.
{code}
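The block counts in the logs above (10, then 13, then the collapsed 2) can be reproduced with a small simulation, assuming - as the thread concludes - that the merge simply replaces each block-pool key. Everything here is a toy model; the class and method names are illustrative, not Hadoop's. It also shows why adding an *empty* disk looked fine: an empty volume contributes no entry for the pool, so the replace has nothing to replace.

```java
import java.util.*;

public class ReproSimulation {
    static final String BP = "BP-1999061334";
    static Map<String, Set<Long>> volumeMap;

    // The buggy merge: putAll replaces the whole per-pool set whenever
    // the new volume's temp map carries an entry for that pool.
    static void buggyAddAll(Map<String, Set<Long>> tempVolumeMap) {
        volumeMap.putAll(tempVolumeMap);
    }

    static int reported() {
        return volumeMap.getOrDefault(BP, Collections.emptySet()).size();
    }

    public static List<Integer> run() {
        volumeMap = new HashMap<>();
        List<Integer> reports = new ArrayList<>();

        // 10 blocks on the original storage.
        Set<Long> blocks = new HashSet<>();
        for (long b = 1; b <= 10; b++) blocks.add(b);
        volumeMap.put(BP, blocks);

        // Add an *empty* second disk: its temp map has no entry for the
        // pool yet, so putAll replaces nothing - the FBR still shows 10.
        buggyAddAll(new HashMap<>());
        reports.add(reported());

        // 5 more files; suppose blocks 14 and 15 land on the new disk.
        for (long b = 11; b <= 15; b++) volumeMap.get(BP).add(b);

        // Remove the second disk: its 2 blocks drop out, 13 remain.
        volumeMap.get(BP).removeAll(Set.of(14L, 15L));
        reports.add(reported());

        // Re-add the now non-empty disk: its temp map carries the 2 blocks
        // still on it, and the replace wipes the other 13.
        Map<String, Set<Long>> temp = new HashMap<>();
        temp.put(BP, new HashSet<>(Set.of(14L, 15L)));
        buggyAddAll(temp);
        reports.add(reported());
        return reports;
    }

    public static void main(String[] args) {
        System.out.println(run());  // [10, 13, 2]
    }
}
```

The three reported counts match the three block reports in the logs, under this model of the merge.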
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827705#comment-16827705 ] Virajith Jalaparti commented on HDFS-13677: --- The issue seems to happen when the new volume being added has some blocks in a block pool that exists in other volumes in the Datanode. As mentioned by [~xuzq_zander], the bug is due to the way {{ReplicaMap#addAll}} is implemented. I was able to reproduce this bug on trunk by modifying one of the unit tests in {{TestDataNodeHotSwapVolumes}}. [~arpitagarwal] - did you see this problem?
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827656#comment-16827656 ] Stephen O'Donnell commented on HDFS-13677: -- I tried to reproduce this with current trunk (namenode and one DN running locally), as I had it built already, and the problem does not occur. If I:
# Format the NN
# Add 10 files, giving me 10 blocks
# Add a disk and reconfig - what I expected to happen due to this bug is that it will 'forget' the 10 blocks when it issues a FBR, but it does not:
{code}
...
2019-04-27 17:06:54,561 INFO datanode.DataNode: Successfully sent block report 0x6850d9b5dfeab332, containing 1 storage report(s), of which we sent 1. The reports had 10 total blocks and used 1 RPC(s). This took 3 msec to generate and 20 msecs for RPC and NN processing. Got back one command: FinalizeCommand/5.
...
2019-04-27 17:13:54,111 INFO datanode.DataNode: Reconfiguring dfs.datanode.data.dir to /tmp/hadoop-sodonnell/dfs/data,/tmp/hadoop-sodonnell/dfs/data2
2019-04-27 17:13:54,122 INFO datanode.DataNode: Adding new volumes: [DISK]file:/tmp/hadoop-sodonnell/dfs/data2
2019-04-27 17:13:54,123 INFO common.Storage: /private/tmp/hadoop-sodonnell/dfs/data2 does not exist. Creating ...
2019-04-27 17:13:54,281 INFO common.Storage: Lock on /tmp/hadoop-sodonnell/dfs/data2/in_use.lock acquired by nodename 7155@SOdonnell-MBP15.local
2019-04-27 17:13:54,281 INFO common.Storage: Storage directory with location [DISK]file:/tmp/hadoop-sodonnell/dfs/data2 is not formatted for namespace 79966483. Formatting...
2019-04-27 17:13:54,282 INFO common.Storage: Generated new storageID DS-9449564c-f688-46b8-b88d-2aacd53810f4 for directory /tmp/hadoop-sodonnell/dfs/data2
2019-04-27 17:13:54,304 INFO common.Storage: Locking is disabled for /tmp/hadoop-sodonnell/dfs/data2/current/BP-615868015-192.168.0.24-1556381051528
2019-04-27 17:13:54,304 INFO common.Storage: Block pool storage directory for location [DISK]file:/tmp/hadoop-sodonnell/dfs/data2 and block pool id BP-615868015-192.168.0.24-1556381051528 is not formatted. Formatting ...
2019-04-27 17:13:54,304 INFO common.Storage: Formatting block pool BP-615868015-192.168.0.24-1556381051528 directory /tmp/hadoop-sodonnell/dfs/data2/current/BP-615868015-192.168.0.24-1556381051528/current
2019-04-27 17:13:54,324 INFO impl.BlockPoolSlice: Replica Cache file: /tmp/hadoop-sodonnell/dfs/data2/current/BP-615868015-192.168.0.24-1556381051528/current/replicas doesn't exist
2019-04-27 17:13:54,325 INFO impl.FsDatasetImpl: Added new volume: DS-9449564c-f688-46b8-b88d-2aacd53810f4
2019-04-27 17:13:54,325 INFO impl.FsDatasetImpl: Added volume - [DISK]file:/tmp/hadoop-sodonnell/dfs/data2, StorageType: DISK
2019-04-27 17:13:54,325 INFO datanode.DataNode: Successfully added volume: [DISK]file:/tmp/hadoop-sodonnell/dfs/data2
2019-04-27 17:13:54,326 INFO datanode.DataNode: Block pool BP-615868015-192.168.0.24-1556381051528 (Datanode Uuid 93ca3bd5-ee44-4506-b952-dc243eac4d18): scheduling a full block report.
2019-04-27 17:13:54,326 INFO datanode.DataNode: Forcing a full block report to localhost/127.0.0.1:8020
2019-04-27 17:13:54,326 INFO conf.ReconfigurableBase: Property rpc.engine.org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolPB is not configurable: old value: org.apache.hadoop.ipc.ProtobufRpcEngine, new value: null
2019-04-27 17:13:54,329 INFO datanode.DataNode: Successfully sent block report 0x6850d9b5dfeab333, containing 2 storage report(s), of which we sent 2. The reports had 10 total blocks and used 1 RPC(s). This took 0 msec to generate and 3 msecs for RPC and NN processing. Got back no commands. < ^^^ Note still 10 blocks reported as expected >
{code}
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827648#comment-16827648 ] Stephen O'Donnell commented on HDFS-13677: -- Ah, I thought the variable VolumeMap was a per-volume set, but it's actually a map of BPID -> Replica Info. So when we build the new tempVolumeMap, as you said, it only contains blocks for the new storage, and when it gets merged into the master VolumeMap it wipes out all other blocks in the map, because it replaces the original BPID key. If I am reading this correctly, does that mean the dynamic disk add feature basically doesn't work at all, and that this is easily reproducible even with a single block pool? If I get some time I will see if I can reproduce this on a test cluster.
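The overwrite described above can be shown in isolation (a hypothetical minimal sketch using a plain HashMap keyed by BPID, not the actual HDFS ReplicaMap class or real block IDs):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class VolumeMapOverwriteDemo {

    // The buggy merge: Map.putAll replaces the whole value stored under
    // each BPID key, rather than merging the replica sets.
    static void addAll(Map<String, Set<String>> master,
                       Map<String, Set<String>> other) {
        master.putAll(other);
    }

    public static void main(String[] args) {
        // Master volumeMap: BPID -> replicas known across ALL volumes.
        Map<String, Set<String>> volumeMap = new HashMap<>();
        volumeMap.put("BP-1", new HashSet<>(Set.of("blk_1", "blk_2", "blk_3")));

        // tempVolumeMap built while adding the new volume: it holds only
        // the replicas found on the new disk for BP-1.
        Map<String, Set<String>> tempVolumeMap = new HashMap<>();
        tempVolumeMap.put("BP-1", new HashSet<>(Set.of("blk_4")));

        addAll(volumeMap, tempVolumeMap);

        // blk_1..blk_3 have vanished from the map even though they still
        // exist on the other disks, so later reads of them would fail with
        // ReplicaNotFoundException.
        System.out.println(volumeMap.get("BP-1")); // prints [blk_4]
    }
}
```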
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827614#comment-16827614 ] xuzq commented on HDFS-13677: - [~sodonnell] Thanks for your reply. The reproduction steps are as below:
# A machine has many disks, like /media/disk1, /media/disk2, /media/disk3.
# Each disk has some block pools, like /media/disk*/dn/current/BP-1, /media/disk*/dn/current/BP-2, /media/disk*/dn/current/BP-3.
# There is a lot of data in every BP on every disk.
# We drop /media/disk1 through "reconfig datanode".
# Wait for some time.
# We add /media/disk1 back into production through "reconfig datanode".

The relevant part of the code in 'addVolume()' from current trunk looks like:
{code:java}
final ReplicaMap tempVolumeMap = new ReplicaMap(new AutoCloseableLock());
ArrayList<IOException> exceptions = Lists.newArrayList();
for (final NamespaceInfo nsInfo : nsInfos) {
  String bpid = nsInfo.getBlockPoolID();
  try {
    fsVolume.addBlockPool(bpid, this.conf, this.timer);
    fsVolume.getVolumeMap(bpid, tempVolumeMap, ramDiskReplicaTracker);
  } catch (IOException e) {
    LOG.warn("Caught exception when adding " + fsVolume +
        ". Will throw later.", e);
    exceptions.add(e);
  }
}
if (!exceptions.isEmpty()) {
  try {
    sd.unlock();
  } catch (IOException e) {
    exceptions.add(e);
  }
  throw MultipleIOException.createIOException(exceptions);
}
final FsVolumeReference ref = fsVolume.obtainReference();
setupAsyncLazyPersistThread(fsVolume);
builder.build();
activateVolume(tempVolumeMap, sd, storageType, ref);
LOG.info("Added volume - " + location + ", StorageType: " + storageType);
{code}
As we know, tempVolumeMap contains only the blocks for each BP of the newly added storage. activateVolume() then adds tempVolumeMap into volumeMap, as below:
{code:java}
/**
 * Add all entries from the given replica map into the local replica map.
 */
void addAll(ReplicaMap other) {
  map.putAll(other.map);
}
{code}
But map.putAll(otherMap) uses the new value from otherMap to replace the old value in map for the same key. The documentation of putAll() says: "These mappings will replace any mappings that this hashtable had for any of the keys currently in the specified map."
{code:java}
/**
 * Copies all of the mappings from the specified map to this hashtable.
 * These mappings will replace any mappings that this hashtable had for any
 * of the keys currently in the specified map.
 *
 * @param t mappings to be stored in this map
 * @throws NullPointerException if the specified map is null
 * @since 1.2
 */
public synchronized void putAll(Map<? extends K, ? extends V> t) {
  for (Map.Entry<? extends K, ? extends V> e : t.entrySet())
    put(e.getKey(), e.getValue());
}
{code}
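One way to avoid the overwrite is to merge the per-BPID inner maps instead of replacing the BPID entry wholesale. The following is a hypothetical sketch with simplified types (BPID -> (block ID -> replica path)), not the attached patch or the actual ReplicaMap API:

```java
import java.util.HashMap;
import java.util.Map;

public class ReplicaMergeDemo {

    // Merge the per-block-pool replica maps entry by entry, so replicas
    // already registered for a BPID on other volumes are preserved.
    static void mergeAll(Map<String, Map<Long, String>> master,
                         Map<String, Map<Long, String>> other) {
        for (Map.Entry<String, Map<Long, String>> e : other.entrySet()) {
            master.computeIfAbsent(e.getKey(), k -> new HashMap<>())
                  .putAll(e.getValue());
        }
    }

    public static void main(String[] args) {
        // Existing replicas for BP-1 on the disks that were never removed.
        Map<String, Map<Long, String>> volumeMap = new HashMap<>();
        volumeMap.put("BP-1", new HashMap<>(
            Map.of(1L, "/media/disk2/dn", 2L, "/media/disk3/dn")));

        // Replicas found on the re-added disk.
        Map<String, Map<Long, String>> tempVolumeMap = new HashMap<>();
        tempVolumeMap.put("BP-1", new HashMap<>(Map.of(3L, "/media/disk1/dn")));

        mergeAll(volumeMap, tempVolumeMap);

        // BP-1 now holds all three replicas instead of only the new one.
        System.out.println(volumeMap.get("BP-1").size()); // prints 3
    }
}
```

Whatever the real fix looks like, the key point is the same: the merge must combine the contents under each block pool ID rather than replace the block pool's entire entry.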
[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap
[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827566#comment-16827566 ] Stephen O'Donnell commented on HDFS-13677: -- [~xuzq_zander] This is an interesting find. I have seen this happen a couple of times when a disk was removed and then added back to the cluster, but I was never able to reproduce it. It seemed to happen somewhat randomly and resulted in the DN reporting far fewer blocks to the NN than expected, hence causing missing blocks. A DN restart fixed it. Are you able to reproduce this problem every time, or is it caused by some sort of race condition and only happens sometimes? Looking at the source just before the activateVolume() call, the new temporary map should be populated with all the replicas on the volume by getVolumeMap() before activateVolume() is called:
{code}
ReplicaMap tempVolumeMap = new ReplicaMap(datasetLock);
fsVolume.getVolumeMap(tempVolumeMap, ramDiskReplicaTracker);
activateVolume(tempVolumeMap, sd, storageLocation.getStorageType(), ref);
{code}
If the new map is populated correctly, do you know why the call to volumeMap.addAll() causes the issue? Is it possible the tempVolumeMap object is somehow not fully populated?