[jira] [Commented] (HDFS-15386) ReplicaNotFoundException keeps happening in DN after removing multiple DN's data directories

2020-06-09 Thread Stephen O'Donnell (Jira)


[ https://issues.apache.org/jira/browse/HDFS-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129782#comment-17129782 ]

Stephen O'Donnell commented on HDFS-15386:
--

[~brfrn169] I have merged this change to 2.10 too. Thanks for the contribution 
and for tracking down this long-standing issue.

> ReplicaNotFoundException keeps happening in DN after removing multiple DN's 
> data directories
> 
>
> Key: HDFS-15386
> URL: https://issues.apache.org/jira/browse/HDFS-15386
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Toshihiro Suzuki
>Assignee: Toshihiro Suzuki
>Priority: Major
> Fix For: 3.0.4, 3.2.2, 2.10.1, 3.3.1, 3.4.0, 3.1.5
>
>
> When removing volumes, we need to invalidate all the blocks in those volumes. 
> In the following code (FsDatasetImpl), we keep the blocks to be invalidated in 
> the *blkToInvalidate* map. However, as the map is keyed by *bpid* (Block Pool 
> ID), each removed volume overwrites the entry written by the previous one. As 
> a result, the map ends up holding only the blocks of the last volume being 
> removed, and only those blocks are invalidated:
> {code:java}
> for (String bpid : volumeMap.getBlockPoolList()) {
>   List<ReplicaInfo> blocks = new ArrayList<>();
>   for (Iterator<ReplicaInfo> it =
>       volumeMap.replicas(bpid).iterator(); it.hasNext();) {
>     ReplicaInfo block = it.next();
>     final StorageLocation blockStorageLocation =
>         block.getVolume().getStorageLocation();
>     LOG.trace("checking for block " + block.getBlockId() +
>         " with storageLocation " + blockStorageLocation);
>     if (blockStorageLocation.equals(sdLocation)) {
>       blocks.add(block);
>       it.remove();
>     }
>   }
>   blkToInvalidate.put(bpid, blocks);
> }
> {code}
> [https://github.com/apache/hadoop/blob/704409d53bf7ebf717a3c2e988ede80f623bbad3/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L580-L595]
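
To make the overwrite concrete, here is a small self-contained sketch. It is 
not the actual FsDatasetImpl code or the committed patch: the block pool id, 
class name and block names below are made-up placeholders, and the second loop 
only illustrates one possible way to accumulate blocks per block pool (e.g. via 
computeIfAbsent) instead of replacing the list for every removed volume.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-alone illustration of the bug described above; all names are placeholders.
public class BlkToInvalidateDemo {
  public static void main(String[] args) {
    Map<String, List<String>> blkToInvalidate = new HashMap<>();
    String bpid = "BP-1";
    // Blocks found on two volumes that are both being removed.
    List<List<String>> removedVolumes = Arrays.asList(
        Arrays.asList("blk_1", "blk_2"),
        Arrays.asList("blk_3"));

    // Buggy pattern: each volume builds its own list and then calls put(),
    // replacing whatever the previous volume stored under the same bpid.
    for (List<String> volumeBlocks : removedVolumes) {
      List<String> blocks = new ArrayList<>(volumeBlocks);
      blkToInvalidate.put(bpid, blocks);
    }
    System.out.println(blkToInvalidate.get(bpid)); // [blk_3] -- blk_1/blk_2 lost

    // Accumulating pattern: reuse the list already stored for the block pool,
    // so blocks from every removed volume survive until invalidation.
    blkToInvalidate.clear();
    for (List<String> volumeBlocks : removedVolumes) {
      blkToInvalidate
          .computeIfAbsent(bpid, k -> new ArrayList<>())
          .addAll(volumeBlocks);
    }
    System.out.println(blkToInvalidate.get(bpid)); // [blk_1, blk_2, blk_3]
  }
}
{code}

Any fix along these lines has to merge the per-volume results for the same 
*bpid* rather than put() a fresh list each time; the exact shape of the 
committed change may differ.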






[jira] [Commented] (HDFS-15386) ReplicaNotFoundException keeps happening in DN after removing multiple DN's data directories

2020-06-08 Thread Toshihiro Suzuki (Jira)


[ https://issues.apache.org/jira/browse/HDFS-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128753#comment-17128753 ]

Toshihiro Suzuki commented on HDFS-15386:
-

[~sodonnell] I created a PR for branch-2.10, but it looks like the QA build is broken:
https://github.com/apache/hadoop/pull/2054

Can you please help me?

> ReplicaNotFoundException keeps happening in DN after removing multiple DN's 
> data directories
> 
>
> Key: HDFS-15386
> URL: https://issues.apache.org/jira/browse/HDFS-15386
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Toshihiro Suzuki
>Assignee: Toshihiro Suzuki
>Priority: Major
> Fix For: 3.0.4, 3.2.2, 3.3.1, 3.4.0, 3.1.5
>
>
> When removing volumes, we need to invalidate all the blocks in those volumes. 
> In the following code (FsDatasetImpl), we keep the blocks to be invalidated in 
> the *blkToInvalidate* map. However, as the map is keyed by *bpid* (Block Pool 
> ID), each removed volume overwrites the entry written by the previous one. As 
> a result, the map ends up holding only the blocks of the last volume being 
> removed, and only those blocks are invalidated:
> {code:java}
> for (String bpid : volumeMap.getBlockPoolList()) {
>   List<ReplicaInfo> blocks = new ArrayList<>();
>   for (Iterator<ReplicaInfo> it =
>       volumeMap.replicas(bpid).iterator(); it.hasNext();) {
>     ReplicaInfo block = it.next();
>     final StorageLocation blockStorageLocation =
>         block.getVolume().getStorageLocation();
>     LOG.trace("checking for block " + block.getBlockId() +
>         " with storageLocation " + blockStorageLocation);
>     if (blockStorageLocation.equals(sdLocation)) {
>       blocks.add(block);
>       it.remove();
>     }
>   }
>   blkToInvalidate.put(bpid, blocks);
> }
> {code}
> [https://github.com/apache/hadoop/blob/704409d53bf7ebf717a3c2e988ede80f623bbad3/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L580-L595]






[jira] [Commented] (HDFS-15386) ReplicaNotFoundException keeps happening in DN after removing multiple DN's data directories

2020-06-05 Thread Stephen O'Donnell (Jira)


[ https://issues.apache.org/jira/browse/HDFS-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126715#comment-17126715 ]

Stephen O'Donnell commented on HDFS-15386:
--

Please create a PR for branch-2.10 and then we can backport to other 2.x 
branches from there.

This is now committed to branches 3.0, 3.1, 3.2, 3.3 and trunk.

> ReplicaNotFoundException keeps happening in DN after removing multiple DN's 
> data directories
> 
>
> Key: HDFS-15386
> URL: https://issues.apache.org/jira/browse/HDFS-15386
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Toshihiro Suzuki
>Assignee: Toshihiro Suzuki
>Priority: Major
>
> When removing volumes, we need to invalidate all the blocks in those volumes. 
> In the following code (FsDatasetImpl), we keep the blocks to be invalidated in 
> the *blkToInvalidate* map. However, as the map is keyed by *bpid* (Block Pool 
> ID), each removed volume overwrites the entry written by the previous one. As 
> a result, the map ends up holding only the blocks of the last volume being 
> removed, and only those blocks are invalidated:
> {code:java}
> for (String bpid : volumeMap.getBlockPoolList()) {
>   List<ReplicaInfo> blocks = new ArrayList<>();
>   for (Iterator<ReplicaInfo> it =
>       volumeMap.replicas(bpid).iterator(); it.hasNext();) {
>     ReplicaInfo block = it.next();
>     final StorageLocation blockStorageLocation =
>         block.getVolume().getStorageLocation();
>     LOG.trace("checking for block " + block.getBlockId() +
>         " with storageLocation " + blockStorageLocation);
>     if (blockStorageLocation.equals(sdLocation)) {
>       blocks.add(block);
>       it.remove();
>     }
>   }
>   blkToInvalidate.put(bpid, blocks);
> }
> {code}
> [https://github.com/apache/hadoop/blob/704409d53bf7ebf717a3c2e988ede80f623bbad3/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L580-L595]






[jira] [Commented] (HDFS-15386) ReplicaNotFoundException keeps happening in DN after removing multiple DN's data directories

2020-06-05 Thread Toshihiro Suzuki (Jira)


[ https://issues.apache.org/jira/browse/HDFS-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126647#comment-17126647 ]

Toshihiro Suzuki commented on HDFS-15386:
-

[~sodonnell] Thank you for merging the PR to trunk!

For branch-2, which branch should I create a PR for? Thanks.

> ReplicaNotFoundException keeps happening in DN after removing multiple DN's 
> data directories
> 
>
> Key: HDFS-15386
> URL: https://issues.apache.org/jira/browse/HDFS-15386
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Toshihiro Suzuki
>Assignee: Toshihiro Suzuki
>Priority: Major
>
> When removing volumes, we need to invalidate all the blocks in those volumes. 
> In the following code (FsDatasetImpl), we keep the blocks to be invalidated in 
> the *blkToInvalidate* map. However, as the map is keyed by *bpid* (Block Pool 
> ID), each removed volume overwrites the entry written by the previous one. As 
> a result, the map ends up holding only the blocks of the last volume being 
> removed, and only those blocks are invalidated:
> {code:java}
> for (String bpid : volumeMap.getBlockPoolList()) {
>   List<ReplicaInfo> blocks = new ArrayList<>();
>   for (Iterator<ReplicaInfo> it =
>       volumeMap.replicas(bpid).iterator(); it.hasNext();) {
>     ReplicaInfo block = it.next();
>     final StorageLocation blockStorageLocation =
>         block.getVolume().getStorageLocation();
>     LOG.trace("checking for block " + block.getBlockId() +
>         " with storageLocation " + blockStorageLocation);
>     if (blockStorageLocation.equals(sdLocation)) {
>       blocks.add(block);
>       it.remove();
>     }
>   }
>   blkToInvalidate.put(bpid, blocks);
> }
> {code}
> [https://github.com/apache/hadoop/blob/704409d53bf7ebf717a3c2e988ede80f623bbad3/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L580-L595]






[jira] [Commented] (HDFS-15386) ReplicaNotFoundException keeps happening in DN after removing multiple DN's data directories

2020-06-05 Thread Hudson (Jira)


[ https://issues.apache.org/jira/browse/HDFS-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126630#comment-17126630 ]

Hudson commented on HDFS-15386:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18329 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18329/])
HDFS-15386 ReplicaNotFoundException keeps happening in DN after removing 
(github: rev 545a0a147c5256c44911ba57b4898e01d786d836)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java


> ReplicaNotFoundException keeps happening in DN after removing multiple DN's 
> data directories
> 
>
> Key: HDFS-15386
> URL: https://issues.apache.org/jira/browse/HDFS-15386
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Toshihiro Suzuki
>Assignee: Toshihiro Suzuki
>Priority: Major
>
> When removing volumes, we need to invalidate all the blocks in those volumes. 
> In the following code (FsDatasetImpl), we keep the blocks to be invalidated in 
> the *blkToInvalidate* map. However, as the map is keyed by *bpid* (Block Pool 
> ID), each removed volume overwrites the entry written by the previous one. As 
> a result, the map ends up holding only the blocks of the last volume being 
> removed, and only those blocks are invalidated:
> {code:java}
> for (String bpid : volumeMap.getBlockPoolList()) {
>   List<ReplicaInfo> blocks = new ArrayList<>();
>   for (Iterator<ReplicaInfo> it =
>       volumeMap.replicas(bpid).iterator(); it.hasNext();) {
>     ReplicaInfo block = it.next();
>     final StorageLocation blockStorageLocation =
>         block.getVolume().getStorageLocation();
>     LOG.trace("checking for block " + block.getBlockId() +
>         " with storageLocation " + blockStorageLocation);
>     if (blockStorageLocation.equals(sdLocation)) {
>       blocks.add(block);
>       it.remove();
>     }
>   }
>   blkToInvalidate.put(bpid, blocks);
> }
> {code}
> [https://github.com/apache/hadoop/blob/704409d53bf7ebf717a3c2e988ede80f623bbad3/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L580-L595]


