[jira] [Commented] (HDFS-15386) ReplicaNotFoundException keeps happening in DN after removing multiple DN's data directories
[ https://issues.apache.org/jira/browse/HDFS-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129782#comment-17129782 ]

Stephen O'Donnell commented on HDFS-15386:
------------------------------------------

[~brfrn169] I have merged this change to 2.10 too. Thanks for the contribution and for tracking down this long-standing issue.

> ReplicaNotFoundException keeps happening in DN after removing multiple DN's
> data directories
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-15386
>                 URL: https://issues.apache.org/jira/browse/HDFS-15386
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Toshihiro Suzuki
>            Assignee: Toshihiro Suzuki
>            Priority: Major
>             Fix For: 3.0.4, 3.2.2, 2.10.1, 3.3.1, 3.4.0, 3.1.5
>
>
> When removing volumes, we need to invalidate all the blocks in those volumes.
> In the following code (FsDatasetImpl), we collect the blocks to be
> invalidated in the *blkToInvalidate* map. However, because the map is keyed
> only by *bpid* (Block Pool ID), each removed volume overwrites the entry
> written for the previous one. As a result, the map ends up holding only the
> blocks of the last volume being removed, and only those blocks are
> invalidated:
> {code:java}
> for (String bpid : volumeMap.getBlockPoolList()) {
>   List<ReplicaInfo> blocks = new ArrayList<>();
>   for (Iterator<ReplicaInfo> it =
>         volumeMap.replicas(bpid).iterator(); it.hasNext();) {
>     ReplicaInfo block = it.next();
>     final StorageLocation blockStorageLocation =
>         block.getVolume().getStorageLocation();
>     LOG.trace("checking for block " + block.getBlockId() +
>         " with storageLocation " + blockStorageLocation);
>     if (blockStorageLocation.equals(sdLocation)) {
>       blocks.add(block);
>       it.remove();
>     }
>   }
>   blkToInvalidate.put(bpid, blocks);
> }
> {code}
> [https://github.com/apache/hadoop/blob/704409d53bf7ebf717a3c2e988ede80f623bbad3/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L580-L595]
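A stand-alone sketch (not the committed HDFS-15386 patch) of the overwrite problem described above: it reproduces the bug with a plain HashMap and shows how merging into the list already stored for a bpid, instead of calling put(), preserves the blocks of every removed volume. All class and variable names below are illustrative only.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BlkToInvalidateDemo {
  public static void main(String[] args) {
    String bpid = "BP-1";                        // single block pool
    List<List<String>> removedVolumes = List.of(
        List.of("blk_1", "blk_2"),               // blocks on removed volume 1
        List.of("blk_3", "blk_4"));              // blocks on removed volume 2

    // Buggy pattern: put() replaces the list stored by the previous volume,
    // so only the last volume's blocks would be invalidated.
    Map<String, List<String>> buggy = new HashMap<>();
    for (List<String> volumeBlocks : removedVolumes) {
      buggy.put(bpid, new ArrayList<>(volumeBlocks));
    }

    // Merge-style pattern: add to the list already present for the bpid,
    // so blocks from every removed volume are kept.
    Map<String, List<String>> fixed = new HashMap<>();
    for (List<String> volumeBlocks : removedVolumes) {
      fixed.computeIfAbsent(bpid, k -> new ArrayList<>())
           .addAll(volumeBlocks);
    }

    System.out.println("buggy: " + buggy.get(bpid));  // [blk_3, blk_4]
    System.out.println("fixed: " + fixed.get(bpid));  // [blk_1, blk_2, blk_3, blk_4]
  }
}
{code}

With the merge-style update, every removed volume's blocks remain in the map and can all be passed on for invalidation; the actual change in FsDatasetImpl may be structured differently.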
[jira] [Commented] (HDFS-15386) ReplicaNotFoundException keeps happening in DN after removing multiple DN's data directories
[ https://issues.apache.org/jira/browse/HDFS-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128753#comment-17128753 ]

Toshihiro Suzuki commented on HDFS-15386:
------------------------------------------

[~sodonnell] I created a PR for branch-2.10, but it looks like QA is broken: https://github.com/apache/hadoop/pull/2054
Can you please help me?
[jira] [Commented] (HDFS-15386) ReplicaNotFoundException keeps happening in DN after removing multiple DN's data directories
[ https://issues.apache.org/jira/browse/HDFS-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126715#comment-17126715 ]

Stephen O'Donnell commented on HDFS-15386:
------------------------------------------

Please create a PR for branch-2.10 and then we can backport to the other 2.x branches from there. This is now committed to branch-3.0, branch-3.1, branch-3.2, branch-3.3 and trunk.
[jira] [Commented] (HDFS-15386) ReplicaNotFoundException keeps happening in DN after removing multiple DN's data directories
[ https://issues.apache.org/jira/browse/HDFS-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126647#comment-17126647 ]

Toshihiro Suzuki commented on HDFS-15386:
------------------------------------------

[~sodonnell] Thank you for merging the PR to trunk! For branch 2, which branch should I create a PR for? Thanks.
[jira] [Commented] (HDFS-15386) ReplicaNotFoundException keeps happening in DN after removing multiple DN's data directories
[ https://issues.apache.org/jira/browse/HDFS-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126630#comment-17126630 ]

Hudson commented on HDFS-15386:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18329 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/18329/])
HDFS-15386 ReplicaNotFoundException keeps happening in DN after removing (github: rev 545a0a147c5256c44911ba57b4898e01d786d836)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java