subject:"\[jira\] \[Work logged\] $HDFS\-16182$ numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage"

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

2021-11-16 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16182?focusedWorklogId=681915=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-681915
 ]

ASF GitHub Bot logged work on HDFS-16182:
-

Author: ASF GitHub Bot
Created on: 16/Nov/21 10:22
Start Date: 16/Nov/21 10:22
Worklog Time Spent: 10m 
  Work Description: jojochuang merged pull request #3320:
URL: https://github.com/apache/hadoop/pull/3320


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 681915)
Time Spent: 3.5h  (was: 3h 20m)

> numOfReplicas is given the wrong value in  
> BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with 
> Heterogeneous Storage  
> ---
>
> Key: HDFS-16182
> URL: https://issues.apache.org/jira/browse/HDFS-16182
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namanode
>Affects Versions: 3.4.0
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-16182.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> In our hdfs cluster, we use heterogeneous storage to store data in SSD  for a 
> better performance. Sometimes  hdfs client transfer data in pipline,  it will 
> throw IOException and exit.  Exception logs are below: 
> ```
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK],
>  
> DatanodeInfoWithStorage[dn03_ip:5004,DS-a388c067-76a4-4014-a16c-ccc49c8da77b,SSD],
>  
> DatanodeInfoWithStorage[dn04_ip:5004,DS-b81da262-0dd9-4567-a498-c516fab84fe0,SSD],
>  
> DatanodeInfoWithStorage[dn05_ip:5004,DS-34e3af2e-da80-46ac-938c-6a3218a646b9,SSD]],
>  
> original=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK]]).
>  The current failed datanode replacement policy is DEFAULT, and a client may 
> configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
> ```
> After search it,   I found when existing pipline need replace new dn to 
> transfer data, the client will get one additional dn from namenode  and check 
> that the number of dn is the original number + 1.
> ```
> ## DataStreamer$findNewDatanode
> if (nodes.length != original.length + 1) {
>  throw new IOException(
>  "Failed to replace a bad datanode on the existing pipeline "
>  + "due to no more good datanodes being available to try. "
>  + "(Nodes: current=" + Arrays.asList(nodes)
>  + ", original=" + Arrays.asList(original) + "). "
>  + "The current failed datanode replacement policy is "
>  + dfsClient.dtpReplaceDatanodeOnFailure
>  + ", and a client may configure this via '"
>  + BlockWrite.ReplaceDatanodeOnFailure.POLICY_KEY
>  + "' in its configuration.");
> }
> ```
> The root cause is that Namenode$getAdditionalDatanode returns multi datanodes 
> , not one in DataStreamer.addDatanode2ExistingPipeline. 
>  
> Maybe we can fix it in BlockPlacementPolicyDefault$chooseTarget.  I think 
> numOfReplicas should not be assigned by requiredStorageTypes.
>  
>    
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

2021-09-01 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16182?focusedWorklogId=644885=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-644885
 ]

ASF GitHub Bot logged work on HDFS-16182:
-

Author: ASF GitHub Bot
Created on: 01/Sep/21 09:14
Start Date: 01/Sep/21 09:14
Worklog Time Spent: 10m 
  Work Description: Neilxzn commented on pull request #3320:
URL: https://github.com/apache/hadoop/pull/3320#issuecomment-910066888


   cc @jojochuang .  Failed junit tests seem unrelated.  Can you review this 
patch again?  Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 644885)
Time Spent: 3h 20m  (was: 3h 10m)

> numOfReplicas is given the wrong value in  
> BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with 
> Heterogeneous Storage  
> ---
>
> Key: HDFS-16182
> URL: https://issues.apache.org/jira/browse/HDFS-16182
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namanode
>Affects Versions: 3.4.0
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-16182.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> In our hdfs cluster, we use heterogeneous storage to store data in SSD  for a 
> better performance. Sometimes  hdfs client transfer data in pipline,  it will 
> throw IOException and exit.  Exception logs are below: 
> ```
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK],
>  
> DatanodeInfoWithStorage[dn03_ip:5004,DS-a388c067-76a4-4014-a16c-ccc49c8da77b,SSD],
>  
> DatanodeInfoWithStorage[dn04_ip:5004,DS-b81da262-0dd9-4567-a498-c516fab84fe0,SSD],
>  
> DatanodeInfoWithStorage[dn05_ip:5004,DS-34e3af2e-da80-46ac-938c-6a3218a646b9,SSD]],
>  
> original=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK]]).
>  The current failed datanode replacement policy is DEFAULT, and a client may 
> configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
> ```
> After search it,   I found when existing pipline need replace new dn to 
> transfer data, the client will get one additional dn from namenode  and check 
> that the number of dn is the original number + 1.
> ```
> ## DataStreamer$findNewDatanode
> if (nodes.length != original.length + 1) {
>  throw new IOException(
>  "Failed to replace a bad datanode on the existing pipeline "
>  + "due to no more good datanodes being available to try. "
>  + "(Nodes: current=" + Arrays.asList(nodes)
>  + ", original=" + Arrays.asList(original) + "). "
>  + "The current failed datanode replacement policy is "
>  + dfsClient.dtpReplaceDatanodeOnFailure
>  + ", and a client may configure this via '"
>  + BlockWrite.ReplaceDatanodeOnFailure.POLICY_KEY
>  + "' in its configuration.");
> }
> ```
> The root cause is that Namenode$getAdditionalDatanode returns multi datanodes 
> , not one in DataStreamer.addDatanode2ExistingPipeline. 
>  
> Maybe we can fix it in BlockPlacementPolicyDefault$chooseTarget.  I think 
> numOfReplicas should not be assigned by requiredStorageTypes.
>  
>    
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

2021-09-01 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16182?focusedWorklogId=644818=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-644818
 ]

ASF GitHub Bot logged work on HDFS-16182:
-

Author: ASF GitHub Bot
Created on: 01/Sep/21 08:36
Start Date: 01/Sep/21 08:36
Worklog Time Spent: 10m 
  Work Description: Neilxzn commented on pull request #3320:
URL: https://github.com/apache/hadoop/pull/3320#issuecomment-910066888


   cc @jojochuang .  Failed junit tests seem unrelated.  Can you review this 
patch again?  Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 644818)
Time Spent: 3h 10m  (was: 3h)

> numOfReplicas is given the wrong value in  
> BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with 
> Heterogeneous Storage  
> ---
>
> Key: HDFS-16182
> URL: https://issues.apache.org/jira/browse/HDFS-16182
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namanode
>Affects Versions: 3.4.0
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-16182.patch
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> In our hdfs cluster, we use heterogeneous storage to store data in SSD  for a 
> better performance. Sometimes  hdfs client transfer data in pipline,  it will 
> throw IOException and exit.  Exception logs are below: 
> ```
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK],
>  
> DatanodeInfoWithStorage[dn03_ip:5004,DS-a388c067-76a4-4014-a16c-ccc49c8da77b,SSD],
>  
> DatanodeInfoWithStorage[dn04_ip:5004,DS-b81da262-0dd9-4567-a498-c516fab84fe0,SSD],
>  
> DatanodeInfoWithStorage[dn05_ip:5004,DS-34e3af2e-da80-46ac-938c-6a3218a646b9,SSD]],
>  
> original=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK]]).
>  The current failed datanode replacement policy is DEFAULT, and a client may 
> configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
> ```
> After search it,   I found when existing pipline need replace new dn to 
> transfer data, the client will get one additional dn from namenode  and check 
> that the number of dn is the original number + 1.
> ```
> ## DataStreamer$findNewDatanode
> if (nodes.length != original.length + 1) {
>  throw new IOException(
>  "Failed to replace a bad datanode on the existing pipeline "
>  + "due to no more good datanodes being available to try. "
>  + "(Nodes: current=" + Arrays.asList(nodes)
>  + ", original=" + Arrays.asList(original) + "). "
>  + "The current failed datanode replacement policy is "
>  + dfsClient.dtpReplaceDatanodeOnFailure
>  + ", and a client may configure this via '"
>  + BlockWrite.ReplaceDatanodeOnFailure.POLICY_KEY
>  + "' in its configuration.");
> }
> ```
> The root cause is that Namenode$getAdditionalDatanode returns multi datanodes 
> , not one in DataStreamer.addDatanode2ExistingPipeline. 
>  
> Maybe we can fix it in BlockPlacementPolicyDefault$chooseTarget.  I think 
> numOfReplicas should not be assigned by requiredStorageTypes.
>  
>    
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

2021-08-26 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16182?focusedWorklogId=642360=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642360
 ]

ASF GitHub Bot logged work on HDFS-16182:
-

Author: ASF GitHub Bot
Created on: 26/Aug/21 14:12
Start Date: 26/Aug/21 14:12
Worklog Time Spent: 10m 
  Work Description: Neilxzn commented on pull request #3320:
URL: https://github.com/apache/hadoop/pull/3320#issuecomment-906450340


   > @Neilxzn Please fix checkstyle and check failed unit tests first. Thanks
   @Hexiaoqiao Thank you for your review.
   1. About BlockPlacementPolicyDefault$chooseTarget,  ParameterNumber 
checkstyle warning is not generated by this patch and it is hard to fix. Maybe 
we ignore it this patch?
   2. I checked these failed tests agian and these tests run pass locally. And 
these tests seem unrelated. Please review these tests again. Thanks.
   
   > hadoop.hdfs.server.namenode.ha.TestEditLogTailer
   > hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped
   > hadoop.hdfs.TestHDFSFileSystemContract


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 642360)
Time Spent: 3h  (was: 2h 50m)

> numOfReplicas is given the wrong value in  
> BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with 
> Heterogeneous Storage  
> ---
>
> Key: HDFS-16182
> URL: https://issues.apache.org/jira/browse/HDFS-16182
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namanode
>Affects Versions: 3.4.0
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-16182.patch
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> In our hdfs cluster, we use heterogeneous storage to store data in SSD  for a 
> better performance. Sometimes  hdfs client transfer data in pipline,  it will 
> throw IOException and exit.  Exception logs are below: 
> ```
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK],
>  
> DatanodeInfoWithStorage[dn03_ip:5004,DS-a388c067-76a4-4014-a16c-ccc49c8da77b,SSD],
>  
> DatanodeInfoWithStorage[dn04_ip:5004,DS-b81da262-0dd9-4567-a498-c516fab84fe0,SSD],
>  
> DatanodeInfoWithStorage[dn05_ip:5004,DS-34e3af2e-da80-46ac-938c-6a3218a646b9,SSD]],
>  
> original=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK]]).
>  The current failed datanode replacement policy is DEFAULT, and a client may 
> configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
> ```
> After search it,   I found when existing pipline need replace new dn to 
> transfer data, the client will get one additional dn from namenode  and check 
> that the number of dn is the original number + 1.
> ```
> ## DataStreamer$findNewDatanode
> if (nodes.length != original.length + 1) {
>  throw new IOException(
>  "Failed to replace a bad datanode on the existing pipeline "
>  + "due to no more good datanodes being available to try. "
>  + "(Nodes: current=" + Arrays.asList(nodes)
>  + ", original=" + Arrays.asList(original) + "). "
>  + "The current failed datanode replacement policy is "
>  + dfsClient.dtpReplaceDatanodeOnFailure
>  + ", and a client may configure this via '"
>  + BlockWrite.ReplaceDatanodeOnFailure.POLICY_KEY
>  + "' in its configuration.");
> }
> ```
> The root cause is that Namenode$getAdditionalDatanode returns multi datanodes 
> , not one in DataStreamer.addDatanode2ExistingPipeline. 
>  
> Maybe we can fix it in BlockPlacementPolicyDefault$chooseTarget.  I think 
> numOfReplicas should not be assigned by requiredStorageTypes.
>  
>    
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

2021-08-26 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16182?focusedWorklogId=642291=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642291
 ]

ASF GitHub Bot logged work on HDFS-16182:
-

Author: ASF GitHub Bot
Created on: 26/Aug/21 12:00
Start Date: 26/Aug/21 12:00
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3320:
URL: https://github.com/apache/hadoop/pull/3320#issuecomment-906344882


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m  1s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  33m 13s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 23s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   1m 18s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m  0s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 23s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 56s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 25s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   3m 16s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  18m 44s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 18s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   1m 18s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  9s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   1m  9s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 53s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3320/4/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 153 unchanged 
- 2 fixed = 154 total (was 155)  |
   | +1 :green_heart: |  mvnsite  |   1m 21s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 48s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 17s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   3m 21s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  18m 56s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 360m 45s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3320/4/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 39s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 452m 43s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
   |   | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
   |   | hadoop.hdfs.TestHDFSFileSystemContract |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3320/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3320 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux b22575c707a1 4.15.0-143-generic #147-Ubuntu SMP Wed Apr 14 
16:10:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 9bb5daa16c3aef93b81499ec512bd7eca21b5868 |
   | Default Java | Private

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

2021-08-25 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16182?focusedWorklogId=642133=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642133
 ]

ASF GitHub Bot logged work on HDFS-16182:
-

Author: ASF GitHub Bot
Created on: 26/Aug/21 03:55
Start Date: 26/Aug/21 03:55
Worklog Time Spent: 10m 
  Work Description: Hexiaoqiao commented on pull request #3320:
URL: https://github.com/apache/hadoop/pull/3320#issuecomment-906072715


   @Neilxzn  Please fix checkstyle and check failed unit tests first. Thanks 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 642133)
Time Spent: 2h 40m  (was: 2.5h)

> numOfReplicas is given the wrong value in  
> BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with 
> Heterogeneous Storage  
> ---
>
> Key: HDFS-16182
> URL: https://issues.apache.org/jira/browse/HDFS-16182
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namanode
>Affects Versions: 3.4.0
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-16182.patch
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> In our hdfs cluster, we use heterogeneous storage to store data in SSD  for a 
> better performance. Sometimes  hdfs client transfer data in pipline,  it will 
> throw IOException and exit.  Exception logs are below: 
> ```
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK],
>  
> DatanodeInfoWithStorage[dn03_ip:5004,DS-a388c067-76a4-4014-a16c-ccc49c8da77b,SSD],
>  
> DatanodeInfoWithStorage[dn04_ip:5004,DS-b81da262-0dd9-4567-a498-c516fab84fe0,SSD],
>  
> DatanodeInfoWithStorage[dn05_ip:5004,DS-34e3af2e-da80-46ac-938c-6a3218a646b9,SSD]],
>  
> original=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK]]).
>  The current failed datanode replacement policy is DEFAULT, and a client may 
> configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
> ```
> After search it,   I found when existing pipline need replace new dn to 
> transfer data, the client will get one additional dn from namenode  and check 
> that the number of dn is the original number + 1.
> ```
> ## DataStreamer$findNewDatanode
> if (nodes.length != original.length + 1) {
>  throw new IOException(
>  "Failed to replace a bad datanode on the existing pipeline "
>  + "due to no more good datanodes being available to try. "
>  + "(Nodes: current=" + Arrays.asList(nodes)
>  + ", original=" + Arrays.asList(original) + "). "
>  + "The current failed datanode replacement policy is "
>  + dfsClient.dtpReplaceDatanodeOnFailure
>  + ", and a client may configure this via '"
>  + BlockWrite.ReplaceDatanodeOnFailure.POLICY_KEY
>  + "' in its configuration.");
> }
> ```
> The root cause is that Namenode$getAdditionalDatanode returns multi datanodes 
> , not one in DataStreamer.addDatanode2ExistingPipeline. 
>  
> Maybe we can fix it in BlockPlacementPolicyDefault$chooseTarget.  I think 
> numOfReplicas should not be assigned by requiredStorageTypes.
>  
>    
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

2021-08-25 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16182?focusedWorklogId=642109=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642109
 ]

ASF GitHub Bot logged work on HDFS-16182:
-

Author: ASF GitHub Bot
Created on: 26/Aug/21 02:29
Start Date: 26/Aug/21 02:29
Worklog Time Spent: 10m 
  Work Description: Neilxzn commented on pull request #3320:
URL: https://github.com/apache/hadoop/pull/3320#issuecomment-906027161


   Failed junit tests seem unrelated. And I try these tests in IDEA locally, 
these tests run pass. 
   
   - hadoop.hdfs.server.namenode.ha.TestEditLogTailer
   
   - hadoop.hdfs.TestHDFSFileSystemContract
   
   - hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 642109)
Time Spent: 2.5h  (was: 2h 20m)

> numOfReplicas is given the wrong value in  
> BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with 
> Heterogeneous Storage  
> ---
>
> Key: HDFS-16182
> URL: https://issues.apache.org/jira/browse/HDFS-16182
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namanode
>Affects Versions: 3.4.0
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-16182.patch
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> In our hdfs cluster, we use heterogeneous storage to store data in SSD  for a 
> better performance. Sometimes  hdfs client transfer data in pipline,  it will 
> throw IOException and exit.  Exception logs are below: 
> ```
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK],
>  
> DatanodeInfoWithStorage[dn03_ip:5004,DS-a388c067-76a4-4014-a16c-ccc49c8da77b,SSD],
>  
> DatanodeInfoWithStorage[dn04_ip:5004,DS-b81da262-0dd9-4567-a498-c516fab84fe0,SSD],
>  
> DatanodeInfoWithStorage[dn05_ip:5004,DS-34e3af2e-da80-46ac-938c-6a3218a646b9,SSD]],
>  
> original=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK]]).
>  The current failed datanode replacement policy is DEFAULT, and a client may 
> configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
> ```
> After search it,   I found when existing pipline need replace new dn to 
> transfer data, the client will get one additional dn from namenode  and check 
> that the number of dn is the original number + 1.
> ```
> ## DataStreamer$findNewDatanode
> if (nodes.length != original.length + 1) {
>  throw new IOException(
>  "Failed to replace a bad datanode on the existing pipeline "
>  + "due to no more good datanodes being available to try. "
>  + "(Nodes: current=" + Arrays.asList(nodes)
>  + ", original=" + Arrays.asList(original) + "). "
>  + "The current failed datanode replacement policy is "
>  + dfsClient.dtpReplaceDatanodeOnFailure
>  + ", and a client may configure this via '"
>  + BlockWrite.ReplaceDatanodeOnFailure.POLICY_KEY
>  + "' in its configuration.");
> }
> ```
> The root cause is that Namenode$getAdditionalDatanode returns multi datanodes 
> , not one in DataStreamer.addDatanode2ExistingPipeline. 
>  
> Maybe we can fix it in BlockPlacementPolicyDefault$chooseTarget.  I think 
> numOfReplicas should not be assigned by requiredStorageTypes.
>  
>    
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

2021-08-25 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16182?focusedWorklogId=642070=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642070
 ]

ASF GitHub Bot logged work on HDFS-16182:
-

Author: ASF GitHub Bot
Created on: 26/Aug/21 00:25
Start Date: 26/Aug/21 00:25
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3320:
URL: https://github.com/apache/hadoop/pull/3320#issuecomment-905964389


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m  6s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  33m 34s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 28s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   1m 16s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m  0s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 25s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 57s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 30s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   3m 20s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  19m  6s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 18s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   1m 18s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 11s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   1m 11s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 54s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3320/3/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 153 unchanged 
- 2 fixed = 155 total (was 155)  |
   | +1 :green_heart: |  mvnsite  |   1m 18s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 49s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 19s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   3m 21s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  18m 57s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 353m 51s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3320/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 38s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 447m 30s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
   |   | hadoop.hdfs.TestHDFSFileSystemContract |
   |   | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3320/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3320 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux f0c899902df0 4.15.0-143-generic #147-Ubuntu SMP Wed Apr 14 
16:10:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 90fcf70f258934eb56c25cea56533b3aaa481e5d |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

2021-08-25 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16182?focusedWorklogId=641826=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641826
 ]

ASF GitHub Bot logged work on HDFS-16182:
-

Author: ASF GitHub Bot
Created on: 25/Aug/21 17:04
Start Date: 25/Aug/21 17:04
Worklog Time Spent: 10m 
  Work Description: Neilxzn commented on pull request #3320:
URL: https://github.com/apache/hadoop/pull/3320#issuecomment-905714180


   > Thanks @Neilxzn for your works. Great catch here! LGTM. +1 once leaved 
nits comments fixed.
   
   I think it is a bug  just when fallback storage policy happens.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 641826)
Time Spent: 2h 10m  (was: 2h)

> numOfReplicas is given the wrong value in  
> BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with 
> Heterogeneous Storage  
> ---
>
> Key: HDFS-16182
> URL: https://issues.apache.org/jira/browse/HDFS-16182
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namanode
>Affects Versions: 3.4.0
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-16182.patch
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> In our hdfs cluster, we use heterogeneous storage to store data in SSD  for a 
> better performance. Sometimes  hdfs client transfer data in pipline,  it will 
> throw IOException and exit.  Exception logs are below: 
> ```
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK],
>  
> DatanodeInfoWithStorage[dn03_ip:5004,DS-a388c067-76a4-4014-a16c-ccc49c8da77b,SSD],
>  
> DatanodeInfoWithStorage[dn04_ip:5004,DS-b81da262-0dd9-4567-a498-c516fab84fe0,SSD],
>  
> DatanodeInfoWithStorage[dn05_ip:5004,DS-34e3af2e-da80-46ac-938c-6a3218a646b9,SSD]],
>  
> original=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK]]).
>  The current failed datanode replacement policy is DEFAULT, and a client may 
> configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
> ```
> After search it,   I found when existing pipline need replace new dn to 
> transfer data, the client will get one additional dn from namenode  and check 
> that the number of dn is the original number + 1.
> ```
> ## DataStreamer$findNewDatanode
> if (nodes.length != original.length + 1) {
>  throw new IOException(
>  "Failed to replace a bad datanode on the existing pipeline "
>  + "due to no more good datanodes being available to try. "
>  + "(Nodes: current=" + Arrays.asList(nodes)
>  + ", original=" + Arrays.asList(original) + "). "
>  + "The current failed datanode replacement policy is "
>  + dfsClient.dtpReplaceDatanodeOnFailure
>  + ", and a client may configure this via '"
>  + BlockWrite.ReplaceDatanodeOnFailure.POLICY_KEY
>  + "' in its configuration.");
> }
> ```
> The root cause is that Namenode$getAdditionalDatanode returns multi datanodes 
> , not one in DataStreamer.addDatanode2ExistingPipeline. 
>  
> Maybe we can fix it in BlockPlacementPolicyDefault$chooseTarget.  I think 
> numOfReplicas should not be assigned by requiredStorageTypes.
>  
>    
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

2021-08-25 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16182?focusedWorklogId=641819=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641819
 ]

ASF GitHub Bot logged work on HDFS-16182:
-

Author: ASF GitHub Bot
Created on: 25/Aug/21 16:59
Start Date: 25/Aug/21 16:59
Worklog Time Spent: 10m 
  Work Description: Neilxzn removed a comment on pull request #3320:
URL: https://github.com/apache/hadoop/pull/3320#issuecomment-905710927


   Just wonder if this is common bug for every chooseTarget invokes when assign 
requiredStorageTypes.size() to numOfReplicas here? or just for set 
storagePolicy?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 641819)
Time Spent: 2h  (was: 1h 50m)

> numOfReplicas is given the wrong value in  
> BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with 
> Heterogeneous Storage  
> ---
>
> Key: HDFS-16182
> URL: https://issues.apache.org/jira/browse/HDFS-16182
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namanode
>Affects Versions: 3.4.0
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-16182.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> In our hdfs cluster, we use heterogeneous storage to store data in SSD  for a 
> better performance. Sometimes  hdfs client transfer data in pipline,  it will 
> throw IOException and exit.  Exception logs are below: 
> ```
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK],
>  
> DatanodeInfoWithStorage[dn03_ip:5004,DS-a388c067-76a4-4014-a16c-ccc49c8da77b,SSD],
>  
> DatanodeInfoWithStorage[dn04_ip:5004,DS-b81da262-0dd9-4567-a498-c516fab84fe0,SSD],
>  
> DatanodeInfoWithStorage[dn05_ip:5004,DS-34e3af2e-da80-46ac-938c-6a3218a646b9,SSD]],
>  
> original=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK]]).
>  The current failed datanode replacement policy is DEFAULT, and a client may 
> configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
> ```
> After search it,   I found when existing pipline need replace new dn to 
> transfer data, the client will get one additional dn from namenode  and check 
> that the number of dn is the original number + 1.
> ```
> ## DataStreamer$findNewDatanode
> if (nodes.length != original.length + 1) {
>  throw new IOException(
>  "Failed to replace a bad datanode on the existing pipeline "
>  + "due to no more good datanodes being available to try. "
>  + "(Nodes: current=" + Arrays.asList(nodes)
>  + ", original=" + Arrays.asList(original) + "). "
>  + "The current failed datanode replacement policy is "
>  + dfsClient.dtpReplaceDatanodeOnFailure
>  + ", and a client may configure this via '"
>  + BlockWrite.ReplaceDatanodeOnFailure.POLICY_KEY
>  + "' in its configuration.");
> }
> ```
> The root cause is that Namenode$getAdditionalDatanode returns multi datanodes 
> , not one in DataStreamer.addDatanode2ExistingPipeline. 
>  
> Maybe we can fix it in BlockPlacementPolicyDefault$chooseTarget.  I think 
> numOfReplicas should not be assigned by requiredStorageTypes.
>  
>    
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

2021-08-25 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16182?focusedWorklogId=641818=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641818
 ]

ASF GitHub Bot logged work on HDFS-16182:
-

Author: ASF GitHub Bot
Created on: 25/Aug/21 16:58
Start Date: 25/Aug/21 16:58
Worklog Time Spent: 10m 
  Work Description: Neilxzn commented on pull request #3320:
URL: https://github.com/apache/hadoop/pull/3320#issuecomment-905710927


   Just wonder if this is common bug for every chooseTarget invokes when assign 
requiredStorageTypes.size() to numOfReplicas here? or just for set 
storagePolicy?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 641818)
Time Spent: 1h 50m  (was: 1h 40m)

> numOfReplicas is given the wrong value in  
> BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with 
> Heterogeneous Storage  
> ---
>
> Key: HDFS-16182
> URL: https://issues.apache.org/jira/browse/HDFS-16182
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namanode
>Affects Versions: 3.4.0
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-16182.patch
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> In our hdfs cluster, we use heterogeneous storage to store data in SSD  for a 
> better performance. Sometimes  hdfs client transfer data in pipline,  it will 
> throw IOException and exit.  Exception logs are below: 
> ```
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK],
>  
> DatanodeInfoWithStorage[dn03_ip:5004,DS-a388c067-76a4-4014-a16c-ccc49c8da77b,SSD],
>  
> DatanodeInfoWithStorage[dn04_ip:5004,DS-b81da262-0dd9-4567-a498-c516fab84fe0,SSD],
>  
> DatanodeInfoWithStorage[dn05_ip:5004,DS-34e3af2e-da80-46ac-938c-6a3218a646b9,SSD]],
>  
> original=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK]]).
>  The current failed datanode replacement policy is DEFAULT, and a client may 
> configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
> ```
> After search it,   I found when existing pipline need replace new dn to 
> transfer data, the client will get one additional dn from namenode  and check 
> that the number of dn is the original number + 1.
> ```
> ## DataStreamer$findNewDatanode
> if (nodes.length != original.length + 1) {
>  throw new IOException(
>  "Failed to replace a bad datanode on the existing pipeline "
>  + "due to no more good datanodes being available to try. "
>  + "(Nodes: current=" + Arrays.asList(nodes)
>  + ", original=" + Arrays.asList(original) + "). "
>  + "The current failed datanode replacement policy is "
>  + dfsClient.dtpReplaceDatanodeOnFailure
>  + ", and a client may configure this via '"
>  + BlockWrite.ReplaceDatanodeOnFailure.POLICY_KEY
>  + "' in its configuration.");
> }
> ```
> The root cause is that Namenode$getAdditionalDatanode returns multi datanodes 
> , not one in DataStreamer.addDatanode2ExistingPipeline. 
>  
> Maybe we can fix it in BlockPlacementPolicyDefault$chooseTarget.  I think 
> numOfReplicas should not be assigned by requiredStorageTypes.
>  
>    
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

2021-08-25 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16182?focusedWorklogId=641816=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641816
 ]

ASF GitHub Bot logged work on HDFS-16182:
-

Author: ASF GitHub Bot
Created on: 25/Aug/21 16:58
Start Date: 25/Aug/21 16:58
Worklog Time Spent: 10m 
  Work Description: Neilxzn commented on a change in pull request #3320:
URL: https://github.com/apache/hadoop/pull/3320#discussion_r695940727



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockStoragePolicy.java
##
@@ -1337,6 +1337,50 @@ public void testChooseSsdOverDisk() throws Exception {
 Assert.assertEquals(StorageType.DISK, targets[1].getStorageType());
   }
 
+  @Test

Review comment:
   Done. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 641816)
Time Spent: 1h 40m  (was: 1.5h)

> numOfReplicas is given the wrong value in  
> BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with 
> Heterogeneous Storage  
> ---
>
> Key: HDFS-16182
> URL: https://issues.apache.org/jira/browse/HDFS-16182
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namanode
>Affects Versions: 3.4.0
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-16182.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> In our hdfs cluster, we use heterogeneous storage to store data in SSD  for a 
> better performance. Sometimes  hdfs client transfer data in pipline,  it will 
> throw IOException and exit.  Exception logs are below: 
> ```
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK],
>  
> DatanodeInfoWithStorage[dn03_ip:5004,DS-a388c067-76a4-4014-a16c-ccc49c8da77b,SSD],
>  
> DatanodeInfoWithStorage[dn04_ip:5004,DS-b81da262-0dd9-4567-a498-c516fab84fe0,SSD],
>  
> DatanodeInfoWithStorage[dn05_ip:5004,DS-34e3af2e-da80-46ac-938c-6a3218a646b9,SSD]],
>  
> original=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK]]).
>  The current failed datanode replacement policy is DEFAULT, and a client may 
> configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
> ```
> After search it,   I found when existing pipline need replace new dn to 
> transfer data, the client will get one additional dn from namenode  and check 
> that the number of dn is the original number + 1.
> ```
> ## DataStreamer$findNewDatanode
> if (nodes.length != original.length + 1) {
>  throw new IOException(
>  "Failed to replace a bad datanode on the existing pipeline "
>  + "due to no more good datanodes being available to try. "
>  + "(Nodes: current=" + Arrays.asList(nodes)
>  + ", original=" + Arrays.asList(original) + "). "
>  + "The current failed datanode replacement policy is "
>  + dfsClient.dtpReplaceDatanodeOnFailure
>  + ", and a client may configure this via '"
>  + BlockWrite.ReplaceDatanodeOnFailure.POLICY_KEY
>  + "' in its configuration.");
> }
> ```
> The root cause is that Namenode$getAdditionalDatanode returns multi datanodes 
> , not one in DataStreamer.addDatanode2ExistingPipeline. 
>  
> Maybe we can fix it in BlockPlacementPolicyDefault$chooseTarget.  I think 
> numOfReplicas should not be assigned by requiredStorageTypes.
>  
>    
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

2021-08-25 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16182?focusedWorklogId=641707=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641707
 ]

ASF GitHub Bot logged work on HDFS-16182:
-

Author: ASF GitHub Bot
Created on: 25/Aug/21 14:12
Start Date: 25/Aug/21 14:12
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3320:
URL: https://github.com/apache/hadoop/pull/3320#issuecomment-905536745


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m 13s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  34m  5s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 26s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   1m 15s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m  0s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 26s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 26s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   3m 23s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  19m 41s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 26s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 20s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 58s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3320/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 153 unchanged 
- 2 fixed = 155 total (was 155)  |
   | +1 :green_heart: |  mvnsite  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 51s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 26s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   3m 42s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  19m 38s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 344m 21s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3320/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 39s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 440m 16s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3320/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3320 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 90ed77414ee5 4.15.0-143-generic #147-Ubuntu SMP Wed Apr 14 
16:10:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 53e9dcb4e3e96a55627a7c8c2825c5e575c52d1e |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

2021-08-25 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16182?focusedWorklogId=641602=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641602
 ]

ASF GitHub Bot logged work on HDFS-16182:
-

Author: ASF GitHub Bot
Created on: 25/Aug/21 10:13
Start Date: 25/Aug/21 10:13
Worklog Time Spent: 10m 
  Work Description: Hexiaoqiao commented on a change in pull request #3320:
URL: https://github.com/apache/hadoop/pull/3320#discussion_r695602966



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockStoragePolicy.java
##
@@ -1337,6 +1337,50 @@ public void testChooseSsdOverDisk() throws Exception {
 Assert.assertEquals(StorageType.DISK, targets[1].getStorageType());
   }
 
+  @Test

Review comment:
   Add some javadoc here will be more friendly here.

##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
##
@@ -469,7 +469,7 @@ private Node chooseTarget(int numOfReplicas,
 LOG.trace("storageTypes={}", storageTypes);
 
 try {
-  if ((numOfReplicas = requiredStorageTypes.size()) == 0) {

Review comment:
   Just wonder if this is common bug for every `chooseTarget` invokes when 
assign `requiredStorageTypes.size()` to `numOfReplicas` here? or just for set 
storagePolicy?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 641602)
Time Spent: 1h 20m  (was: 1h 10m)

> numOfReplicas is given the wrong value in  
> BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with 
> Heterogeneous Storage  
> ---
>
> Key: HDFS-16182
> URL: https://issues.apache.org/jira/browse/HDFS-16182
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namanode
>Affects Versions: 3.4.0
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-16182.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In our hdfs cluster, we use heterogeneous storage to store data in SSD  for a 
> better performance. Sometimes  hdfs client transfer data in pipline,  it will 
> throw IOException and exit.  Exception logs are below: 
> ```
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK],
>  
> DatanodeInfoWithStorage[dn03_ip:5004,DS-a388c067-76a4-4014-a16c-ccc49c8da77b,SSD],
>  
> DatanodeInfoWithStorage[dn04_ip:5004,DS-b81da262-0dd9-4567-a498-c516fab84fe0,SSD],
>  
> DatanodeInfoWithStorage[dn05_ip:5004,DS-34e3af2e-da80-46ac-938c-6a3218a646b9,SSD]],
>  
> original=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK]]).
>  The current failed datanode replacement policy is DEFAULT, and a client may 
> configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
> ```
> After search it,   I found when existing pipline need replace new dn to 
> transfer data, the client will get one additional dn from namenode  and check 
> that the number of dn is the original number + 1.
> ```
> ## DataStreamer$findNewDatanode
> if (nodes.length != original.length + 1) {
>  throw new IOException(
>  "Failed to replace a bad datanode on the existing pipeline "
>  + "due to no more good datanodes being available to try. "
>  + "(Nodes: current=" + Arrays.asList(nodes)
>  + ", original=" + Arrays.asList(original) + "). "
>  + "The current failed datanode replacement policy is "
>  + dfsClient.dtpReplaceDatanodeOnFailure
>  + ", and a client may configure this via '"
>  + BlockWrite.ReplaceDatanodeOnFailure.POLICY_KEY
>  + "' in its configuration.");
> }
> ```
> The root cause is that Namenode$getAdditionalDatanode returns multi datanodes 
> , not one in DataStreamer.addDatanode2ExistingPipeline. 
>  
> Maybe we can fix it in BlockPlacementPolicyDefault$chooseTarget.  I think 
> numOfReplicas should not be assigned by requiredStorageTypes.
>  
>    
>  
>  
>  
>  



--

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

2021-08-25 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16182?focusedWorklogId=641527=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641527
 ]

ASF GitHub Bot logged work on HDFS-16182:
-

Author: ASF GitHub Bot
Created on: 25/Aug/21 06:53
Start Date: 25/Aug/21 06:53
Worklog Time Spent: 10m 
  Work Description: Neilxzn commented on a change in pull request #3320:
URL: https://github.com/apache/hadoop/pull/3320#discussion_r695451621



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
##
@@ -469,7 +469,7 @@ private Node chooseTarget(int numOfReplicas,
 LOG.trace("storageTypes={}", storageTypes);
 
 try {
-  if ((numOfReplicas = requiredStorageTypes.size()) == 0) {

Review comment:
   done. Declare numOfReplicas a final variable at line 438.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 641527)
Time Spent: 1h 10m  (was: 1h)

> numOfReplicas is given the wrong value in  
> BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with 
> Heterogeneous Storage  
> ---
>
> Key: HDFS-16182
> URL: https://issues.apache.org/jira/browse/HDFS-16182
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namanode
>Affects Versions: 3.4.0
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-16182.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In our hdfs cluster, we use heterogeneous storage to store data in SSD  for a 
> better performance. Sometimes  hdfs client transfer data in pipline,  it will 
> throw IOException and exit.  Exception logs are below: 
> ```
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK],
>  
> DatanodeInfoWithStorage[dn03_ip:5004,DS-a388c067-76a4-4014-a16c-ccc49c8da77b,SSD],
>  
> DatanodeInfoWithStorage[dn04_ip:5004,DS-b81da262-0dd9-4567-a498-c516fab84fe0,SSD],
>  
> DatanodeInfoWithStorage[dn05_ip:5004,DS-34e3af2e-da80-46ac-938c-6a3218a646b9,SSD]],
>  
> original=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK]]).
>  The current failed datanode replacement policy is DEFAULT, and a client may 
> configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
> ```
> After search it,   I found when existing pipline need replace new dn to 
> transfer data, the client will get one additional dn from namenode  and check 
> that the number of dn is the original number + 1.
> ```
> ## DataStreamer$findNewDatanode
> if (nodes.length != original.length + 1) {
>  throw new IOException(
>  "Failed to replace a bad datanode on the existing pipeline "
>  + "due to no more good datanodes being available to try. "
>  + "(Nodes: current=" + Arrays.asList(nodes)
>  + ", original=" + Arrays.asList(original) + "). "
>  + "The current failed datanode replacement policy is "
>  + dfsClient.dtpReplaceDatanodeOnFailure
>  + ", and a client may configure this via '"
>  + BlockWrite.ReplaceDatanodeOnFailure.POLICY_KEY
>  + "' in its configuration.");
> }
> ```
> The root cause is that Namenode$getAdditionalDatanode returns multi datanodes 
> , not one in DataStreamer.addDatanode2ExistingPipeline. 
>  
> Maybe we can fix it in BlockPlacementPolicyDefault$chooseTarget.  I think 
> numOfReplicas should not be assigned by requiredStorageTypes.
>  
>    
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

2021-08-24 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16182?focusedWorklogId=641488=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641488
 ]

ASF GitHub Bot logged work on HDFS-16182:
-

Author: ASF GitHub Bot
Created on: 25/Aug/21 03:39
Start Date: 25/Aug/21 03:39
Worklog Time Spent: 10m 
  Work Description: jojochuang commented on a change in pull request #3320:
URL: https://github.com/apache/hadoop/pull/3320#discussion_r695335223



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
##
@@ -469,7 +469,7 @@ private Node chooseTarget(int numOfReplicas,
 LOG.trace("storageTypes={}", storageTypes);
 
 try {
-  if ((numOfReplicas = requiredStorageTypes.size()) == 0) {

Review comment:
   better to declare numOfReplicas a final variable at line 438.

##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockStoragePolicy.java
##
@@ -1337,6 +1337,51 @@ public void testChooseSsdOverDisk() throws Exception {
 Assert.assertEquals(StorageType.DISK, targets[1].getStorageType());
   }
 
+  @Test
+  public void testAddDatanode2ExistingPipelineInSsd() throws Exception {
+BlockStoragePolicy policy = POLICY_SUITE.getPolicy(ALLSSD);
+
+final String[] racks = {"/d1/r1", "/d2/r2", "/d3/r3", "/d4/r4", "/d5/r5",
+"/d6/r6", "/d7/r7"};
+final String[] hosts = {"host1", "host2", "host3", "host4", "host5",
+"host6", "host7"};
+final StorageType[] disks = {StorageType.DISK, StorageType.DISK, 
StorageType.DISK};
+
+final DatanodeStorageInfo[] diskStorages
+= DFSTestUtil.createDatanodeStorageInfos(7, racks, hosts, disks);
+final DatanodeDescriptor[] dataNodes
+= DFSTestUtil.toDatanodeDescriptor(diskStorages);
+for(int i = 0; i < dataNodes.length ; i++) {
+  BlockManagerTestUtil.updateStorage(dataNodes[i],
+  new DatanodeStorage("ssd" + i + 1, DatanodeStorage.State.NORMAL,
+  StorageType.SSD));
+}
+
+FileSystem.setDefaultUri(conf, "hdfs://localhost:0");
+conf.set(DFSConfigKeys.DFS_NAMENODE_HTTP_ADDRESS_KEY, "0.0.0.0:0");
+File baseDir = PathUtils.getTestDir(TestReplicationPolicy.class);
+conf.set(DFSConfigKeys.DFS_NAMENODE_NAME_DIR_KEY,
+new File(baseDir, "name").getPath());
+DFSTestUtil.formatNameNode(conf);
+NameNode namenode = new NameNode(conf);
+
+final BlockManager bm = namenode.getNamesystem().getBlockManager();
+BlockPlacementPolicy replicator = bm.getBlockPlacementPolicy();
+NetworkTopology cluster = bm.getDatanodeManager().getNetworkTopology();
+for (DatanodeDescriptor datanode : dataNodes) {
+  cluster.add(datanode);
+}
+// chsenDs are DISK StorageType to simulate not enough SDD Storage
+List chsenDs = new ArrayList<>();
+chsenDs.add(diskStorages[0]);
+chsenDs.add(diskStorages[1]);
+DatanodeStorageInfo[] targets = replicator.chooseTarget("/foo", 1,
+null, chsenDs, true,
+new HashSet(), 0, policy, null);
+System.out.println(policy.getName() + ": " + Arrays.asList(targets));

Review comment:
   Please use log4j to log the message.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 641488)
Time Spent: 1h  (was: 50m)

> numOfReplicas is given the wrong value in  
> BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with 
> Heterogeneous Storage  
> ---
>
> Key: HDFS-16182
> URL: https://issues.apache.org/jira/browse/HDFS-16182
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namanode
>Affects Versions: 3.4.0
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-16182.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In our hdfs cluster, we use heterogeneous storage to store data in SSD  for a 
> better performance. Sometimes  hdfs client transfer data in pipline,  it will 
> throw IOException and exit.  Exception logs are below: 
> ```
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
>

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

2021-08-23 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16182?focusedWorklogId=640933=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-640933
 ]

ASF GitHub Bot logged work on HDFS-16182:
-

Author: ASF GitHub Bot
Created on: 24/Aug/21 04:06
Start Date: 24/Aug/21 04:06
Worklog Time Spent: 10m 
  Work Description: Neilxzn edited a comment on pull request #3320:
URL: https://github.com/apache/hadoop/pull/3320#issuecomment-904304994


   @jojochuang   Agree with you.  I think we should fix it. 
   
   In my cluster, we use  BlockPlacementPolicyDefault to choose dn and the 
number of SSD DN is much less than DISK DN. It may cause to  some block  that 
should be placed to SSD DNs fallback to place DISK DNs when SSD DNs are too 
busy or no enough place.  Consider the following scenario.
   
1. Create  empty file   /foo_file 
2. Set its storagepolicy to All_SSD
   3. Put data to /foo_file
   4. /foo_file  gets 3  DISK  dns for pipeline because SSD dns are too busy at 
the beginning. 
   5. When it transfers data in pipeline,  one of 3 DISK dns shut down.
   6. The client  need to get one new dn for existing pipeline in 
DataStreamer$addDatanode2ExistingPipeline. 
   7. If SSD dns are available at the moment,  namenode will choose the 3 SSD 
dns and return it to the client. However, the client just need one new dn,  
namenode returns 3 new SSD dn and  the client threw exception in 
DataStreamer$findNewDatanode.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 640933)
Time Spent: 50m  (was: 40m)

> numOfReplicas is given the wrong value in  
> BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with 
> Heterogeneous Storage  
> ---
>
> Key: HDFS-16182
> URL: https://issues.apache.org/jira/browse/HDFS-16182
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namanode
>Affects Versions: 3.4.0
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-16182.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In our hdfs cluster, we use heterogeneous storage to store data in SSD  for a 
> better performance. Sometimes  hdfs client transfer data in pipline,  it will 
> throw IOException and exit.  Exception logs are below: 
> ```
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK],
>  
> DatanodeInfoWithStorage[dn03_ip:5004,DS-a388c067-76a4-4014-a16c-ccc49c8da77b,SSD],
>  
> DatanodeInfoWithStorage[dn04_ip:5004,DS-b81da262-0dd9-4567-a498-c516fab84fe0,SSD],
>  
> DatanodeInfoWithStorage[dn05_ip:5004,DS-34e3af2e-da80-46ac-938c-6a3218a646b9,SSD]],
>  
> original=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK]]).
>  The current failed datanode replacement policy is DEFAULT, and a client may 
> configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
> ```
> After search it,   I found when existing pipline need replace new dn to 
> transfer data, the client will get one additional dn from namenode  and check 
> that the number of dn is the original number + 1.
> ```
> ## DataStreamer$findNewDatanode
> if (nodes.length != original.length + 1) {
>  throw new IOException(
>  "Failed to replace a bad datanode on the existing pipeline "
>  + "due to no more good datanodes being available to try. "
>  + "(Nodes: current=" + Arrays.asList(nodes)
>  + ", original=" + Arrays.asList(original) + "). "
>  + "The current failed datanode replacement policy is "
>  + dfsClient.dtpReplaceDatanodeOnFailure
>  + ", and a client may configure this via '"
>  + BlockWrite.ReplaceDatanodeOnFailure.POLICY_KEY
>  + "' in its configuration.");
> }
> ```
> The root cause is that Namenode$getAdditionalDatanode returns multi datanodes 
> , not one in DataStreamer.addDatanode2ExistingPipeline. 
>  
> Maybe we can fix it in BlockPlacementPolicyDefault$chooseTarget.  I think 
> numOfReplicas should not be

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

2021-08-23 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16182?focusedWorklogId=640932=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-640932
 ]

ASF GitHub Bot logged work on HDFS-16182:
-

Author: ASF GitHub Bot
Created on: 24/Aug/21 04:04
Start Date: 24/Aug/21 04:04
Worklog Time Spent: 10m 
  Work Description: Neilxzn commented on pull request #3320:
URL: https://github.com/apache/hadoop/pull/3320#issuecomment-904304994


   Agree it.  I think we should fix it. 
   
   In my cluster, we use  BlockPlacementPolicyDefault to choose dn and the 
number of SSD DN is much less than DISK DN. It may cause to  some block  that 
should be placed to SSD DNs fallback to place DISK DNs when SSD DNs are too 
busy or no enough place.  Consider the following scenario.
   
1. Create  empty file   /foo_file 
2. Set its storagepolicy to All_SSD
   3. Put data to /foo_file
   4. /foo_file  gets 3  DISK  dns for pipeline because SSD dns are too busy at 
the beginning. 
   5. When it transfers data in pipeline,  one of 3 DISK dns shut down.
   6. The client  need to get one new dn for existing pipeline in 
DataStreamer$addDatanode2ExistingPipeline. 
   7. If SSD dns are available at the moment,  namenode will choose the 3 SSD 
dns and return it to the client. However, the client just need one new dn,  
namenode returns 3 new SSD dn and  the client threw exception in 
DataStreamer$findNewDatanode.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 640932)
Time Spent: 40m  (was: 0.5h)

> numOfReplicas is given the wrong value in  
> BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with 
> Heterogeneous Storage  
> ---
>
> Key: HDFS-16182
> URL: https://issues.apache.org/jira/browse/HDFS-16182
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namanode
>Affects Versions: 3.4.0
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-16182.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In our hdfs cluster, we use heterogeneous storage to store data in SSD  for a 
> better performance. Sometimes  hdfs client transfer data in pipline,  it will 
> throw IOException and exit.  Exception logs are below: 
> ```
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK],
>  
> DatanodeInfoWithStorage[dn03_ip:5004,DS-a388c067-76a4-4014-a16c-ccc49c8da77b,SSD],
>  
> DatanodeInfoWithStorage[dn04_ip:5004,DS-b81da262-0dd9-4567-a498-c516fab84fe0,SSD],
>  
> DatanodeInfoWithStorage[dn05_ip:5004,DS-34e3af2e-da80-46ac-938c-6a3218a646b9,SSD]],
>  
> original=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK]]).
>  The current failed datanode replacement policy is DEFAULT, and a client may 
> configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
> ```
> After search it,   I found when existing pipline need replace new dn to 
> transfer data, the client will get one additional dn from namenode  and check 
> that the number of dn is the original number + 1.
> ```
> ## DataStreamer$findNewDatanode
> if (nodes.length != original.length + 1) {
>  throw new IOException(
>  "Failed to replace a bad datanode on the existing pipeline "
>  + "due to no more good datanodes being available to try. "
>  + "(Nodes: current=" + Arrays.asList(nodes)
>  + ", original=" + Arrays.asList(original) + "). "
>  + "The current failed datanode replacement policy is "
>  + dfsClient.dtpReplaceDatanodeOnFailure
>  + ", and a client may configure this via '"
>  + BlockWrite.ReplaceDatanodeOnFailure.POLICY_KEY
>  + "' in its configuration.");
> }
> ```
> The root cause is that Namenode$getAdditionalDatanode returns multi datanodes 
> , not one in DataStreamer.addDatanode2ExistingPipeline. 
>  
> Maybe we can fix it in BlockPlacementPolicyDefault$chooseTarget.  I think 
> numOfReplicas should not be assigned by

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

2021-08-23 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16182?focusedWorklogId=640930=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-640930
 ]

ASF GitHub Bot logged work on HDFS-16182:
-

Author: ASF GitHub Bot
Created on: 24/Aug/21 03:49
Start Date: 24/Aug/21 03:49
Worklog Time Spent: 10m 
  Work Description: jojochuang commented on pull request #3320:
URL: https://github.com/apache/hadoop/pull/3320#issuecomment-904299806


   Don't really understand the code but seems like an ancient regression from 
HDFS-6686. As a general code advice, we should not update a parameter variable 
and pass it on.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 640930)
Time Spent: 0.5h  (was: 20m)

> numOfReplicas is given the wrong value in  
> BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with 
> Heterogeneous Storage  
> ---
>
> Key: HDFS-16182
> URL: https://issues.apache.org/jira/browse/HDFS-16182
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namanode
>Affects Versions: 3.4.0
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-16182.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In our hdfs cluster, we use heterogeneous storage to store data in SSD  for a 
> better performance. Sometimes  hdfs client transfer data in pipline,  it will 
> throw IOException and exit.  Exception logs are below: 
> ```
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK],
>  
> DatanodeInfoWithStorage[dn03_ip:5004,DS-a388c067-76a4-4014-a16c-ccc49c8da77b,SSD],
>  
> DatanodeInfoWithStorage[dn04_ip:5004,DS-b81da262-0dd9-4567-a498-c516fab84fe0,SSD],
>  
> DatanodeInfoWithStorage[dn05_ip:5004,DS-34e3af2e-da80-46ac-938c-6a3218a646b9,SSD]],
>  
> original=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK]]).
>  The current failed datanode replacement policy is DEFAULT, and a client may 
> configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
> ```
> After search it,   I found when existing pipline need replace new dn to 
> transfer data, the client will get one additional dn from namenode  and check 
> that the number of dn is the original number + 1.
> ```
> ## DataStreamer$findNewDatanode
> if (nodes.length != original.length + 1) {
>  throw new IOException(
>  "Failed to replace a bad datanode on the existing pipeline "
>  + "due to no more good datanodes being available to try. "
>  + "(Nodes: current=" + Arrays.asList(nodes)
>  + ", original=" + Arrays.asList(original) + "). "
>  + "The current failed datanode replacement policy is "
>  + dfsClient.dtpReplaceDatanodeOnFailure
>  + ", and a client may configure this via '"
>  + BlockWrite.ReplaceDatanodeOnFailure.POLICY_KEY
>  + "' in its configuration.");
> }
> ```
> The root cause is that Namenode$getAdditionalDatanode returns multi datanodes 
> , not one in DataStreamer.addDatanode2ExistingPipeline. 
>  
> Maybe we can fix it in BlockPlacementPolicyDefault$chooseTarget.  I think 
> numOfReplicas should not be assigned by requiredStorageTypes.
>  
>    
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

2021-08-23 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16182?focusedWorklogId=640730=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-640730
 ]

ASF GitHub Bot logged work on HDFS-16182:
-

Author: ASF GitHub Bot
Created on: 23/Aug/21 15:01
Start Date: 23/Aug/21 15:01
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3320:
URL: https://github.com/apache/hadoop/pull/3320#issuecomment-903852770


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m 18s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  33m 36s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 24s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   1m 15s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m  1s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 28s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 23s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   3m 18s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  19m 10s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 18s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 19s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 12s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 51s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3320/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 154 unchanged 
- 1 fixed = 155 total (was 155)  |
   | +1 :green_heart: |  mvnsite  |   1m 18s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 50s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 20s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   3m 22s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  19m  3s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 356m  5s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3320/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 40s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 449m 41s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3320/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3320 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 5ca83a692447 4.15.0-143-generic #147-Ubuntu SMP Wed Apr 14 
16:10:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 74f75f9545382b06b07d18e9a657ecfb9ab115a9 |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

2021-08-23 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16182?focusedWorklogId=640600=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-640600
 ]

ASF GitHub Bot logged work on HDFS-16182:
-

Author: ASF GitHub Bot
Created on: 23/Aug/21 07:30
Start Date: 23/Aug/21 07:30
Worklog Time Spent: 10m 
  Work Description: Neilxzn opened a new pull request #3320:
URL: https://github.com/apache/hadoop/pull/3320


   
   
   ### Description of PR
   https://issues.apache.org/jira/browse/HDFS-16182
   
   ### How was this patch tested?
   add  TestBlockStoragePolicy.testAddDatanode2ExistingPipelineInSsd
   
   ### For code changes:
   
   - [ ] Does the title or this PR starts with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 640600)
Remaining Estimate: 0h
Time Spent: 10m

> numOfReplicas is given the wrong value in  
> BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with 
> Heterogeneous Storage  
> ---
>
> Key: HDFS-16182
> URL: https://issues.apache.org/jira/browse/HDFS-16182
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namanode
>Affects Versions: 3.4.0
>Reporter: Max  Xie
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In our hdfs cluster, we use heterogeneous storage to store data in SSD  for a 
> better performance. Sometimes  hdfs client transfer data in pipline,  it will 
> throw IOException and exit.  Exception logs are below: 
> ```
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK],
>  
> DatanodeInfoWithStorage[dn03_ip:5004,DS-a388c067-76a4-4014-a16c-ccc49c8da77b,SSD],
>  
> DatanodeInfoWithStorage[dn04_ip:5004,DS-b81da262-0dd9-4567-a498-c516fab84fe0,SSD],
>  
> DatanodeInfoWithStorage[dn05_ip:5004,DS-34e3af2e-da80-46ac-938c-6a3218a646b9,SSD]],
>  
> original=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK]]).
>  The current failed datanode replacement policy is DEFAULT, and a client may 
> configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
> ```
> After search it,   I found when existing pipline need replace new dn to 
> transfer data, the client will get one additional dn from namenode  and check 
> that the number of dn is the original number + 1.
> ```
> ## DataStreamer$findNewDatanode
> if (nodes.length != original.length + 1) {
>  throw new IOException(
>  "Failed to replace a bad datanode on the existing pipeline "
>  + "due to no more good datanodes being available to try. "
>  + "(Nodes: current=" + Arrays.asList(nodes)
>  + ", original=" + Arrays.asList(original) + "). "
>  + "The current failed datanode replacement policy is "
>  + dfsClient.dtpReplaceDatanodeOnFailure
>  + ", and a client may configure this via '"
>  + BlockWrite.ReplaceDatanodeOnFailure.POLICY_KEY
>  + "' in its configuration.");
> }
> ```
> The root cause is that Namenode$getAdditionalDatanode returns multi datanodes 
> , not one in DataStreamer.addDatanode2ExistingPipeline. 
>  
> Maybe we can fix it in BlockPlacementPolicyDefault$chooseTarget.  I think 
> numOfReplicas should not be assigned by requiredStorageTypes.
>  
>    
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail:

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

[jira] [Work logged] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

21 matches

Site Navigation

Mail list logo

Footer information