[jira] [Commented] (HDFS-14849) Erasure Coding: the internal block is replicated many times when datanode is decommissioning
[ https://issues.apache.org/jira/browse/HDFS-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944096#comment-16944096 ]

Wei-Chiu Chuang commented on HDFS-14849:

Cherry-picked the commit to branch-3.2 without conflicts. There was a trivial conflict for branch-3.1, so I attached a patch [^HDFS-14849.branch-3.1.patch] for posterity.

> Erasure Coding: the internal block is replicated many times when datanode is decommissioning
>
> Key: HDFS-14849
> URL: https://issues.apache.org/jira/browse/HDFS-14849
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ec, erasure-coding
> Affects Versions: 3.3.0
> Reporter: HuangTao
> Assignee: HuangTao
> Priority: Major
> Labels: EC, HDFS, NameNode
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14849.001.patch, HDFS-14849.002.patch, HDFS-14849.branch-3.1.patch, fsck-file.png, liveBlockIndices.png, scheduleReconstruction.png
>
> When a datanode stays in DECOMMISSION_INPROGRESS status, the EC internal block on that datanode is replicated many times.
> // added 2019/09/19
> I reproduced this scenario in a 163-node cluster by decommissioning 100 nodes simultaneously.
> !scheduleReconstruction.png!
> !fsck-file.png!

--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14849) Erasure Coding: the internal block is replicated many times when datanode is decommissioning
[ https://issues.apache.org/jira/browse/HDFS-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940099#comment-16940099 ]

Hudson commented on HDFS-14849:

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17410 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17410/])
Revert "HDFS-14849. Erasure Coding: the internal block is replicated (ayushsaxena: rev 0d5d0b914ac959ce2c41f483ac5b74f58053cd00)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockMissingException.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
HDFS-14849. Erasure Coding: the internal block is replicated many times (ayushsaxena: rev c4c8d5fd0e3c17ccdcf18ece8e005f510328b060)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
[jira] [Commented] (HDFS-14849) Erasure Coding: the internal block is replicated many times when datanode is decommissioning
[ https://issues.apache.org/jira/browse/HDFS-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940074#comment-16940074 ]

Ayush Saxena commented on HDFS-14849:

FYI: the commit also picked up changes from another JIRA. Re-committing with only the changes from this JIRA.
[jira] [Commented] (HDFS-14849) Erasure Coding: the internal block is replicated many times when datanode is decommissioning
[ https://issues.apache.org/jira/browse/HDFS-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939469#comment-16939469 ]

Hudson commented on HDFS-14849:

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17407 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17407/])
HDFS-14849. Erasure Coding: the internal block is replicated many times (ayushsaxena: rev ce58c05f1d89a72c787f3571f78a9464d0ab3933)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockMissingException.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
[jira] [Commented] (HDFS-14849) Erasure Coding: the internal block is replicated many times when datanode is decommissioning
[ https://issues.apache.org/jira/browse/HDFS-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939456#comment-16939456 ]

Ayush Saxena commented on HDFS-14849:

Committed to trunk. Thanks [~marvelrock] for the contribution and [~ferhui] for the review!
[jira] [Commented] (HDFS-14849) Erasure Coding: the internal block is replicated many times when datanode is decommissioning
[ https://issues.apache.org/jira/browse/HDFS-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939452#comment-16939452 ]

Ayush Saxena commented on HDFS-14849:

Thanks [~marvelrock] and [~ferhui]. v002 LGTM, +1.
[jira] [Commented] (HDFS-14849) Erasure Coding: the internal block is replicated many times when datanode is decommissioning
[ https://issues.apache.org/jira/browse/HDFS-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939084#comment-16939084 ]

Fei Hui commented on HDFS-14849:

+1 from me. This fix is similar to how the function countReplicasForStripedBlock is implemented:

{code}
/**
 * For a striped block, it is possible it contains full number of internal
 * blocks (i.e., 9 by default), but with duplicated replicas of the same
 * internal block. E.g., for the following list of internal blocks
 * b0, b0, b1, b2, b3, b4, b5, b6, b7
 * we have 9 internal blocks but we actually miss b8.
 * We should use this method to detect the above scenario and schedule
 * necessary reconstruction.
 */
private void countReplicasForStripedBlock(NumberReplicas counters,
    BlockInfoStriped block, Collection nodesCorrupt,
    boolean inStartupSafeMode) {
  // One bit per internal block index; set when a LIVE copy is first seen.
  BitSet bitSet = new BitSet(block.getTotalBlockNum());
  for (StorageAndBlockIndex si : block.getStorageAndIndexInfos()) {
    StoredReplicaState state = checkReplicaOnStorage(counters, block,
        si.getStorage(), nodesCorrupt, inStartupSafeMode);
    if (state == StoredReplicaState.LIVE) {
      if (!bitSet.get(si.getBlockIndex())) {
        bitSet.set(si.getBlockIndex());
      } else {
        // Duplicate copy of an index already counted: reclassify as REDUNDANT.
        counters.subtract(StoredReplicaState.LIVE, 1);
        counters.add(StoredReplicaState.REDUNDANT, 1);
      }
    }
  }
}
{code}

[~ayushtkn] Could you please take a look?
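The Javadoc in countReplicasForStripedBlock describes counting each internal block index at most once with a BitSet. A minimal standalone Java sketch of that idea follows; it is not the actual BlockManager code or the HDFS-14849 patch. `StripedReplicaCountSketch`, `Counts`, and `count` are hypothetical names, the replica states are reduced to two int counters standing in for NumberReplicas, and the scan for missing indices is an illustrative addition that the quoted method does not perform.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/**
 * Hypothetical sketch of BitSet-based de-duplication of striped replicas.
 * Only LIVE replicas are passed in; each is identified by its internal block index.
 */
public class StripedReplicaCountSketch {

    /** Simplified stand-in for NumberReplicas (hypothetical). */
    static final class Counts {
        int live;
        int redundant;
        final List<Integer> missingIndices = new ArrayList<>();
    }

    /**
     * @param totalBlockNum number of internal blocks in the group (9 by default, per the Javadoc)
     * @param liveIndices   internal block index reported by each LIVE replica
     */
    static Counts count(int totalBlockNum, int[] liveIndices) {
        Counts counts = new Counts();
        BitSet seen = new BitSet(totalBlockNum);
        for (int index : liveIndices) {
            // Every live replica is first counted as LIVE, like checkReplicaOnStorage does.
            counts.live++;
            if (!seen.get(index)) {
                seen.set(index);       // first copy of this internal block
            } else {
                counts.live--;         // duplicate copy: move it from LIVE
                counts.redundant++;    // to REDUNDANT
            }
        }
        // Illustrative extra step: indices never seen need reconstruction.
        for (int i = seen.nextClearBit(0); i < totalBlockNum; i = seen.nextClearBit(i + 1)) {
            counts.missingIndices.add(i);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The Javadoc's example: b0, b0, b1, ..., b7 (b8 is missing).
        Counts c = count(9, new int[] {0, 0, 1, 2, 3, 4, 5, 6, 7});
        System.out.println("live=" + c.live + " redundant=" + c.redundant
            + " missing=" + c.missingIndices);
    }
}
```

Running `main` prints `live=8 redundant=1 missing=[8]`: nine live replicas are reported, but only eight distinct internal blocks exist, so the duplicate b0 is counted as redundant rather than hiding the missing b8.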