[jira] [Commented] (HDFS-16101) Remove unuse variable and IoException in ProvidedStorageMap

2021-07-04 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17374521#comment-17374521
 ] 

Hudson commented on HDFS-16101:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
37s{color} |  | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} yetus {color} | {color:red}  0m  7s{color} 
|  | {color:red} Unprocessed flag(s): --mvn-custom-repos-dir {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/665/artifact/out/Dockerfile
 |
| JIRA Issue | HDFS-16101 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13027419/HDFS-16101.001.patch |
| Console output | 
https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/665/console |
| versions | git=2.25.1 |
| Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> Remove unuse variable and IoException in ProvidedStorageMap
> ---
>
> Key: HDFS-16101
> URL: https://issues.apache.org/jira/browse/HDFS-16101
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16101.001.patch
>
>
> Remove unuse variable and IoException in ProvidedStorageMap



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16108) Incorrect log placeholders used in JournalNodeSyncer

2021-07-04 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-16108:
-
Fix Version/s: 3.3.2
   3.2.3

Backported to branch-3.3 and branch-3.2

> Incorrect log placeholders used in JournalNodeSyncer
> 
>
> Key: HDFS-16108
> URL: https://issues.apache.org/jira/browse/HDFS-16108
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The Journal sync thread uses incorrect log placeholders in two places:
>  # When it fails to create the dir for downloading log segments
>  # When it fails to move the tmp editFile to the current dir
> Since these failure logs are important for debugging JN sync issues, we should 
> fix these incorrect placeholders.






[jira] [Commented] (HDFS-16101) Remove unuse variable and IoException in ProvidedStorageMap

2021-07-04 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17374513#comment-17374513
 ] 

Hudson commented on HDFS-16101:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
38s{color} |  | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} yetus {color} | {color:red}  0m  7s{color} 
|  | {color:red} Unprocessed flag(s): --brief-report-file 
--spotbugs-strict-precheck --html-report-file --mvn-custom-repos --shelldocs 
--mvn-javadoc-goals --mvn-custom-repos-dir {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/664/artifact/out/Dockerfile
 |
| JIRA Issue | HDFS-16101 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13027419/HDFS-16101.001.patch |
| Console output | 
https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/664/console |
| versions | git=2.25.1 |
| Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> Remove unuse variable and IoException in ProvidedStorageMap
> ---
>
> Key: HDFS-16101
> URL: https://issues.apache.org/jira/browse/HDFS-16101
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16101.001.patch
>
>
> Remove unuse variable and IoException in ProvidedStorageMap






[jira] [Commented] (HDFS-16101) Remove unuse variable and IoException in ProvidedStorageMap

2021-07-04 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17374511#comment-17374511
 ] 

Hudson commented on HDFS-16101:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
40s{color} |  | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} yetus {color} | {color:red}  0m  7s{color} 
|  | {color:red} Unprocessed flag(s): --brief-report-file 
--spotbugs-strict-precheck --html-report-file --mvn-custom-repos --shelldocs 
--mvn-javadoc-goals --mvn-custom-repos-dir {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/662/artifact/out/Dockerfile
 |
| JIRA Issue | HDFS-16101 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13027419/HDFS-16101.001.patch |
| Console output | 
https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/662/console |
| versions | git=2.25.1 |
| Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> Remove unuse variable and IoException in ProvidedStorageMap
> ---
>
> Key: HDFS-16101
> URL: https://issues.apache.org/jira/browse/HDFS-16101
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16101.001.patch
>
>
> Remove unuse variable and IoException in ProvidedStorageMap






[jira] [Work logged] (HDFS-16110) Remove unused method reportChecksumFailure in DFSClient

2021-07-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16110?focusedWorklogId=618499&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-618499
 ]

ASF GitHub Bot logged work on HDFS-16110:
-

Author: ASF GitHub Bot
Created on: 05/Jul/21 04:06
Start Date: 05/Jul/21 04:06
Worklog Time Spent: 10m 
  Work Description: tomscut commented on pull request #3174:
URL: https://github.com/apache/hadoop/pull/3174#issuecomment-873767890


   Hi @aajisaka @tasanuma @jojochuang @ferhui , could you please review the 
code? Thanks a lot.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 618499)
Time Spent: 40m  (was: 0.5h)

> Remove unused method reportChecksumFailure in DFSClient
> ---
>
> Key: HDFS-16110
> URL: https://issues.apache.org/jira/browse/HDFS-16110
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Remove unused method reportChecksumFailure and fix some code styles by the 
> way in DFSClient.






[jira] [Created] (HDFS-16113) Improve CallQueueManager#swapQueue() execution performance

2021-07-04 Thread JiangHua Zhu (Jira)
JiangHua Zhu created HDFS-16113:
---

 Summary: Improve CallQueueManager#swapQueue() execution performance
 Key: HDFS-16113
 URL: https://issues.apache.org/jira/browse/HDFS-16113
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: JiangHua Zhu


In CallQueueManager#swapQueue(), there are some codes:
CallQueueManager#swapQueue() {
..
while (!queueIsReallyEmpty(oldQ)) {}
..
}
In queueIsReallyEmpty():
..
for (int i = 0; i < ...

[jira] [Assigned] (HDFS-16113) Improve CallQueueManager#swapQueue() execution performance

2021-07-04 Thread JiangHua Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JiangHua Zhu reassigned HDFS-16113:
---

Assignee: JiangHua Zhu

> Improve CallQueueManager#swapQueue() execution performance
> --
>
> Key: HDFS-16113
> URL: https://issues.apache.org/jira/browse/HDFS-16113
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Minor
>
> In CallQueueManager#swapQueue(), there are some codes:
> CallQueueManager#swapQueue() {
> ..
> while (!queueIsReallyEmpty(oldQ)) {}
> ..
> }
> In queueIsReallyEmpty():
> ..
> for (int i = 0; i < ...
> We found that this implementation imposes a noticeable performance penalty in 
> real clusters.
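The pattern described above can be sketched as follows. The class, constants, and method bodies are illustrative, not the actual CallQueueManager code: a spin loop that re-checks emptiness with no pause keeps the swapping thread at full CPU until the old queue drains, and sleeping briefly between checks is one possible mitigation.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hedged sketch of the swapQueue() busy-wait under discussion.
public class SwapQueueSketch {
    static final int RECHECKS = 20; // illustrative re-check count

    // "Really empty" means the queue stays empty across several checks.
    static boolean queueIsReallyEmpty(BlockingQueue<?> q) {
        for (int i = 0; i < RECHECKS; i++) {
            if (!q.isEmpty()) {
                return false;
            }
        }
        return true;
    }

    // The pattern the issue describes: a tight spin that burns a CPU core
    // while handlers drain the old queue.
    static void drainBySpinning(BlockingQueue<?> oldQ) {
        while (!queueIsReallyEmpty(oldQ)) {
            // busy-wait
        }
    }

    // One possible improvement: sleep briefly between checks so the swap
    // thread yields the CPU instead of spinning.
    static void drainWithBackoff(BlockingQueue<?> oldQ) throws InterruptedException {
        while (!queueIsReallyEmpty(oldQ)) {
            Thread.sleep(1); // tune the interval to trade latency for CPU
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> q = new LinkedBlockingQueue<>();
        q.add(1);
        System.out.println(queueIsReallyEmpty(q)); // false while an entry remains
        q.poll();
        drainWithBackoff(q); // returns once the queue has drained
        System.out.println(queueIsReallyEmpty(q)); // true
    }
}
```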






[jira] [Work logged] (HDFS-16088) Standby NameNode process getLiveDatanodeStorageReport request to reduce Active load

2021-07-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16088?focusedWorklogId=618493&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-618493
 ]

ASF GitHub Bot logged work on HDFS-16088:
-

Author: ASF GitHub Bot
Created on: 05/Jul/21 03:36
Start Date: 05/Jul/21 03:36
Worklog Time Spent: 10m 
  Work Description: ferhui commented on pull request #3140:
URL: https://github.com/apache/hadoop/pull/3140#issuecomment-873759215


   LGTM. @Hexiaoqiao Could you please take another look?




Issue Time Tracking
---

Worklog Id: (was: 618493)
Time Spent: 2h 20m  (was: 2h 10m)

> Standby NameNode process getLiveDatanodeStorageReport request to reduce 
> Active load
> ---
>
> Key: HDFS-16088
> URL: https://issues.apache.org/jira/browse/HDFS-16088
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Attachments: standyby-ipcserver.jpg
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> As with HDFS-13183, NameNodeConnector#getLiveDatanodeStorageReport() can also 
> be requested from the SNN to reduce the load on the ANN.
> Two points are worth mentioning:
>  1. FSNamesystem#getDatanodeStorageReport() is OperationCategory.UNCHECKED, 
> so we can access the SNN directly.
>  2. We can share the same UT (testBalancerRequestSBNWithHA) with 
> NameNodeConnector#getBlocks().






[jira] [Work started] (HDFS-16023) Improve blockReportLeaseId acquisition to avoid repeated FBR

2021-07-04 Thread JiangHua Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-16023 started by JiangHua Zhu.
---
> Improve blockReportLeaseId acquisition to avoid repeated FBR
> 
>
> Key: HDFS-16023
> URL: https://issues.apache.org/jira/browse/HDFS-16023
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When the NameNode receives an FBR from a DataNode, it puts the report in a 
> queue (BlockReportProcessingThread#queue), and worker threads process the 
> queued reports afterwards.
> Before a DataNode sends an FBR to the NameNode, it first obtains a 
> blockReportLeaseId from the NameNode. If that DataNode's report is already in 
> the queue, there is no need to assign it a new blockReportLeaseId.
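The proposed check can be sketched as below. All names are illustrative and the real BlockReportLeaseManager logic is considerably more involved; the point is only that a lease is withheld while the DataNode's report is still queued, so the DataNode does not resend a full block report that the NameNode already holds.

```java
import java.util.HashSet;
import java.util.Set;

// Hedged sketch of deduplicating block-report leases (illustrative names).
public class LeaseSketch {
    private final Set<String> queuedReports = new HashSet<>(); // DN UUIDs with an FBR already queued
    private long nextId = 1;

    // Issue a fresh lease only when the DataNode has no report pending;
    // returning 0 tells the DataNode not to resend its FBR.
    synchronized long requestLease(String dnUuid) {
        if (queuedReports.contains(dnUuid)) {
            return 0;
        }
        return nextId++;
    }

    // Called when a DataNode's FBR lands in the processing queue.
    synchronized void markQueued(String dnUuid) {
        queuedReports.add(dnUuid);
    }

    // Called after the queued report has been processed.
    synchronized void markProcessed(String dnUuid) {
        queuedReports.remove(dnUuid);
    }

    public static void main(String[] args) {
        LeaseSketch leases = new LeaseSketch();
        System.out.println(leases.requestLease("dn-1")); // fresh lease: 1
        leases.markQueued("dn-1");
        System.out.println(leases.requestLease("dn-1")); // 0: report already queued
        leases.markProcessed("dn-1");
        System.out.println(leases.requestLease("dn-1")); // fresh lease again: 2
    }
}
```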






[jira] [Work logged] (HDFS-16110) Remove unused method reportChecksumFailure in DFSClient

2021-07-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16110?focusedWorklogId=618486&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-618486
 ]

ASF GitHub Bot logged work on HDFS-16110:
-

Author: ASF GitHub Bot
Created on: 05/Jul/21 02:35
Start Date: 05/Jul/21 02:35
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3174:
URL: https://github.com/apache/hadoop/pull/3174#issuecomment-873738971


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 34s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  30m 50s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m  0s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 55s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 30s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 59s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 38s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 25s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  15m 33s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 48s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 53s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 53s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 44s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 44s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 19s |  |  
hadoop-hdfs-project/hadoop-hdfs-client: The patch generated 0 new + 41 
unchanged - 5 fixed = 41 total (was 46)  |
   | +1 :green_heart: |  mvnsite  |   0m 47s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 32s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 29s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  15m 31s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 20s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 33s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  78m 56s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3174/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3174 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux b1eab1986fb6 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 74f772a4c74c6a880ebfb71feda4cae935175983 |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3174/2/testReport/ |
   | Max. process+thread count | 545 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs-client U: 
hadoop-hdfs-project/hadoop-hdfs-client |
   | 

[jira] [Work started] (HDFS-16107) Split RPC configuration to isolate RPC

2021-07-04 Thread JiangHua Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-16107 started by JiangHua Zhu.
---
> Split RPC configuration to isolate RPC
> --
>
> Key: HDFS-16107
> URL: https://issues.apache.org/jira/browse/HDFS-16107
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> RPC servers listening on different ports share some common configurations, 
> such as:
> ipc.server.read.threadpool.size
> ipc.server.read.connection-queue.size
> ipc.server.handler.queue.size
> Once we set these values, they affect all requests (both client requests and 
> requests within the cluster).
> We should split these configurations so they can be tuned per port, such as:
> ipc.8020.server.read.threadpool.size
> ipc.8021.server.read.threadpool.size
> ipc.8020.server.read.connection-queue.size
> ipc.8021.server.read.connection-queue.size
> The advantage is that each RPC server is isolated and can absorb request 
> pressure from its own clients independently.
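A natural way to implement such port-qualified keys is a lookup that prefers the port-specific key and falls back to the shared one. The sketch below uses a plain Map instead of Hadoop's Configuration class, and the key names follow the proposal in the description; the actual patch may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: prefer "ipc.<port>.server.<suffix>", fall back to
// the shared "ipc.server.<suffix>" (key scheme from the proposal above).
public class PerPortConfSketch {
    static int getPortScopedInt(Map<String, String> conf, String suffix,
                                int port, int defaultValue) {
        String portKey = "ipc." + port + ".server." + suffix;   // e.g. ipc.8020.server.read.threadpool.size
        String sharedKey = "ipc.server." + suffix;
        String v = conf.getOrDefault(portKey, conf.get(sharedKey));
        return v == null ? defaultValue : Integer.parseInt(v);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("ipc.server.read.threadpool.size", "1");
        conf.put("ipc.8020.server.read.threadpool.size", "8");
        // Port 8020 gets its own value; 8021 falls back to the shared key.
        System.out.println(getPortScopedInt(conf, "read.threadpool.size", 8020, 1)); // 8
        System.out.println(getPortScopedInt(conf, "read.threadpool.size", 8021, 1)); // 1
    }
}
```

Existing deployments keep working unchanged because the shared key remains the fallback when no port-qualified key is set.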






[jira] [Commented] (HDFS-16107) Split RPC configuration to isolate RPC

2021-07-04 Thread JiangHua Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17374447#comment-17374447
 ] 

JiangHua Zhu commented on HDFS-16107:
-

[~weichiu] [~sodonnell] [~hexiaoqiao], do you have any new suggestions?
Also, if possible, please help review the code I submitted. Thank you very much.

> Split RPC configuration to isolate RPC
> --
>
> Key: HDFS-16107
> URL: https://issues.apache.org/jira/browse/HDFS-16107
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> RPC servers listening on different ports share some common configurations, 
> such as:
> ipc.server.read.threadpool.size
> ipc.server.read.connection-queue.size
> ipc.server.handler.queue.size
> Once we set these values, they affect all requests (both client requests and 
> requests within the cluster).
> We should split these configurations so they can be tuned per port, such as:
> ipc.8020.server.read.threadpool.size
> ipc.8021.server.read.threadpool.size
> ipc.8020.server.read.connection-queue.size
> ipc.8021.server.read.connection-queue.size
> The advantage is that each RPC server is isolated and can absorb request 
> pressure from its own clients independently.






[jira] [Work logged] (HDFS-16088) Standby NameNode process getLiveDatanodeStorageReport request to reduce Active load

2021-07-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16088?focusedWorklogId=618470&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-618470
 ]

ASF GitHub Bot logged work on HDFS-16088:
-

Author: ASF GitHub Bot
Created on: 05/Jul/21 01:54
Start Date: 05/Jul/21 01:54
Worklog Time Spent: 10m 
  Work Description: tomscut commented on pull request #3140:
URL: https://github.com/apache/hadoop/pull/3140#issuecomment-873724526


   Hi @ferhui @Hexiaoqiao , I extracted a new method and added a separate UT. 
Could you please review again? Thanks a lot.




Issue Time Tracking
---

Worklog Id: (was: 618470)
Time Spent: 2h 10m  (was: 2h)

> Standby NameNode process getLiveDatanodeStorageReport request to reduce 
> Active load
> ---
>
> Key: HDFS-16088
> URL: https://issues.apache.org/jira/browse/HDFS-16088
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Attachments: standyby-ipcserver.jpg
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> As with HDFS-13183, NameNodeConnector#getLiveDatanodeStorageReport() can also 
> be requested from the SNN to reduce the load on the ANN.
> Two points are worth mentioning:
>  1. FSNamesystem#getDatanodeStorageReport() is OperationCategory.UNCHECKED, 
> so we can access the SNN directly.
>  2. We can share the same UT (testBalancerRequestSBNWithHA) with 
> NameNodeConnector#getBlocks().






[jira] [Work logged] (HDFS-16088) Standby NameNode process getLiveDatanodeStorageReport request to reduce Active load

2021-07-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16088?focusedWorklogId=618469&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-618469
 ]

ASF GitHub Bot logged work on HDFS-16088:
-

Author: ASF GitHub Bot
Created on: 05/Jul/21 01:51
Start Date: 05/Jul/21 01:51
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3140:
URL: https://github.com/apache/hadoop/pull/3140#issuecomment-873723601


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 32s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  30m 48s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 22s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   1m 17s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m  3s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 21s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 57s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 26s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   3m  5s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  16m 14s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 11s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 14s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   1m 14s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  6s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   1m  6s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 54s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 47s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 21s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   3m  9s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  15m 49s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 230m 32s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 45s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 314m  1s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3140/5/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3140 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 80e59a19ec5e 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 
05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / aa944c03b6a5817905ef89506820dd512b47a1bf |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3140/5/testReport/ |
   | Max. process+thread count | 3110 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3140/5/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.

[jira] [Comment Edited] (HDFS-16112) Fix flaky unit test TestDecommissioningStatusWithBackoffMonitor

2021-07-04 Thread tomscut (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17374443#comment-17374443
 ] 

tomscut edited comment on HDFS-16112 at 7/5/21, 1:50 AM:
-

Hi [~sodonnell], these unit tests 
TestDecommissioningStatusWithBackoffMonitor#testDecommissionStatus and 
TestDecommissioningStatus#testDecommissionStatus have recently seemed a little 
flaky; could you please take a look when you have time? Thanks a lot.


was (Author: tomscut):
Hi [~sodonnell], the unit test 
TestDecommissioningStatusWithBackoffMonitor#testDecommissionStatus and 
TestDecommissioningStatus#testDecommissionStatus recently seems a little flaky, 
could you please take a look when you have time. Thanks a lot.

> Fix flaky unit test TestDecommissioningStatusWithBackoffMonitor 
> 
>
> Key: HDFS-16112
> URL: https://issues.apache.org/jira/browse/HDFS-16112
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Priority: Minor
>
> These unit tests 
> TestDecommissioningStatusWithBackoffMonitor#testDecommissionStatus and 
> TestDecommissioningStatus#testDecommissionStatus have recently seemed a 
> little flaky; we should fix them.






[jira] [Updated] (HDFS-16112) Fix flaky unit test TestDecommissioningStatusWithBackoffMonitor

2021-07-04 Thread tomscut (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tomscut updated HDFS-16112:
---
Description: These unit tests 
TestDecommissioningStatusWithBackoffMonitor#testDecommissionStatus and 
TestDecommissioningStatus#testDecommissionStatus recently seems a little flaky, 
we should fix them.  (was: The unit test 
TestDecommissioningStatusWithBackoffMonitor#testDecommissionStatus and 
TestDecommissioningStatus#testDecommissionStatus recently seems a little flaky, 
we should fix them.)

> Fix flaky unit test TestDecommissioningStatusWithBackoffMonitor 
> 
>
> Key: HDFS-16112
> URL: https://issues.apache.org/jira/browse/HDFS-16112
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Priority: Minor
>
> These unit tests 
> TestDecommissioningStatusWithBackoffMonitor#testDecommissionStatus and 
> TestDecommissioningStatus#testDecommissionStatus have recently seemed a 
> little flaky; we should fix them.






[jira] [Commented] (HDFS-16112) Fix flaky unit test TestDecommissioningStatusWithBackoffMonitor

2021-07-04 Thread tomscut (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17374443#comment-17374443
 ] 

tomscut commented on HDFS-16112:


Hi [~sodonnell], the unit tests 
TestDecommissioningStatusWithBackoffMonitor#testDecommissionStatus and 
TestDecommissioningStatus#testDecommissionStatus have recently seemed a little 
flaky; could you please take a look when you have time? Thanks a lot.

> Fix flaky unit test TestDecommissioningStatusWithBackoffMonitor 
> 
>
> Key: HDFS-16112
> URL: https://issues.apache.org/jira/browse/HDFS-16112
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Priority: Minor
>
> The unit tests 
> TestDecommissioningStatusWithBackoffMonitor#testDecommissionStatus and 
> TestDecommissioningStatus#testDecommissionStatus recently seem a little 
> flaky; we should fix them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16112) Fix flaky unit test TestDecommissioningStatusWithBackoffMonitor

2021-07-04 Thread tomscut (Jira)
tomscut created HDFS-16112:
--

 Summary: Fix flaky unit test 
TestDecommissioningStatusWithBackoffMonitor 
 Key: HDFS-16112
 URL: https://issues.apache.org/jira/browse/HDFS-16112
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut


The unit tests 
TestDecommissioningStatusWithBackoffMonitor#testDecommissionStatus and 
TestDecommissioningStatus#testDecommissionStatus recently seem a little flaky; 
we should fix them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16108) Incorrect log placeholders used in JournalNodeSyncer

2021-07-04 Thread Hui Fei (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Fei resolved HDFS-16108.

Fix Version/s: 3.4.0
   Resolution: Fixed

> Incorrect log placeholders used in JournalNodeSyncer
> 
>
> Key: HDFS-16108
> URL: https://issues.apache.org/jira/browse/HDFS-16108
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The Journal sync thread uses incorrect log placeholders in two places:
>  # When it fails to create dir for downloading log segments
>  # When it fails to move tmp editFile to current dir
> Since these failure logs are important to debug JN sync issues, we should fix 
> these incorrect placeholders.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
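The placeholder bug fixed in HDFS-16108 is a common SLF4J pitfall: a log template whose placeholder style or count does not match its arguments, so the values are silently dropped from the message. The sketch below mimics SLF4J's "{}" substitution with a small format helper that is purely illustrative (it is not the SLF4J API; JournalNodeSyncer itself logs through org.slf4j.Logger), so it runs without any external jar:

```java
// Minimal illustration of the placeholder-mismatch bug class described above.
// format() is a hypothetical stand-in that mimics SLF4J's "{}" substitution.
public class PlaceholderDemo {
    // Replace each "{}" in the template with the next argument, like SLF4J does.
    static String format(String template, Object... args) {
        StringBuilder out = new StringBuilder();
        int argIdx = 0;
        int i = 0;
        while (i < template.length()) {
            if (i + 1 < template.length() && template.charAt(i) == '{'
                    && template.charAt(i + 1) == '}' && argIdx < args.length) {
                out.append(args[argIdx++]);  // substitute the next argument
                i += 2;
            } else {
                out.append(template.charAt(i++));  // copy ordinary characters
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String dir = "/data/jn/edits.sync";
        // Buggy: "%s" is a printf-style placeholder, so SLF4J never
        // substitutes the argument and the log loses the directory name.
        System.out.println(format("Unable to create directory %s", dir));
        // Fixed: the SLF4J-style "{}" placeholder substitutes the argument.
        System.out.println(format("Unable to create directory {}", dir));
    }
}
```

Running it prints the buggy message with a literal "%s" and the fixed one with the directory path, which is exactly why these failure logs were useless for debugging JN sync issues before the fix.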



[jira] [Work logged] (HDFS-16108) Incorrect log placeholders used in JournalNodeSyncer

2021-07-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16108?focusedWorklogId=618467&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-618467
 ]

ASF GitHub Bot logged work on HDFS-16108:
-

Author: ASF GitHub Bot
Created on: 05/Jul/21 01:23
Start Date: 05/Jul/21 01:23
Worklog Time Spent: 10m 
  Work Description: ferhui commented on pull request #3169:
URL: https://github.com/apache/hadoop/pull/3169#issuecomment-873715651


   @virajjasani Thanks for the contribution. @aajisaka @tomscut Thanks for the review.
   Merged to trunk.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 618467)
Time Spent: 1h 20m  (was: 1h 10m)

> Incorrect log placeholders used in JournalNodeSyncer
> 
>
> Key: HDFS-16108
> URL: https://issues.apache.org/jira/browse/HDFS-16108
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The Journal sync thread uses incorrect log placeholders in two places:
>  # When it fails to create dir for downloading log segments
>  # When it fails to move tmp editFile to current dir
> Since these failure logs are important to debug JN sync issues, we should fix 
> these incorrect placeholders.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16108) Incorrect log placeholders used in JournalNodeSyncer

2021-07-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16108?focusedWorklogId=618468&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-618468
 ]

ASF GitHub Bot logged work on HDFS-16108:
-

Author: ASF GitHub Bot
Created on: 05/Jul/21 01:23
Start Date: 05/Jul/21 01:23
Worklog Time Spent: 10m 
  Work Description: ferhui merged pull request #3169:
URL: https://github.com/apache/hadoop/pull/3169


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 618468)
Time Spent: 1.5h  (was: 1h 20m)

> Incorrect log placeholders used in JournalNodeSyncer
> 
>
> Key: HDFS-16108
> URL: https://issues.apache.org/jira/browse/HDFS-16108
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The Journal sync thread uses incorrect log placeholders in two places:
>  # When it fails to create dir for downloading log segments
>  # When it fails to move tmp editFile to current dir
> Since these failure logs are important to debug JN sync issues, we should fix 
> these incorrect placeholders.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16109) Fix some flaky unit tests since they often time out

2021-07-04 Thread tomscut (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17374440#comment-17374440
 ] 

tomscut commented on HDFS-16109:


Thanks [~aajisaka] for the merge.

> Fix some flaky unit tests since they often time out
> --
>
> Key: HDFS-16109
> URL: https://issues.apache.org/jira/browse/HDFS-16109
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Increase the timeouts for TestBootstrapStandby, TestFsVolumeList and 
> TestDecommissionWithBackoffMonitor since they often time out.
>  
> TestBootstrapStandby:
> {code:java}
> [ERROR] Tests run: 8, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 
> 159.474 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby[ERROR] Tests 
> run: 8, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 159.474 s <<< 
> FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby[ERROR] 
> testRateThrottling(org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby)
>   Time elapsed: 31.262 s  <<< 
> ERROR!org.junit.runners.model.TestTimedOutException: test timed out after 
> 3 milliseconds at java.io.RandomAccessFile.writeBytes(Native Method) at 
> java.io.RandomAccessFile.write(RandomAccessFile.java:512) at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:947)
>  at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:910)
>  at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:699)
>  at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:642)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:387)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:243)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1224)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:795)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:673)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:760) 
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:1014) 
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:989) 
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1763)
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:2261)
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:2231)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby.testRateThrottling(TestBootstrapStandby.java:297)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.lang.Thread.run(Thread.java:748)
> {code}
> TestFsVolumeList:
> {code:java}
> [ERROR] Tests run: 12, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 
> 190.294 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList[ERROR] 
> Tests run: 12, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 190.294 s 
> <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList[ERROR] 
> testAddRplicaProcessorForAddingReplicaInMap(org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList)
>   Time elapsed: 60.028 s  <<< 
> ERROR!org.junit.runners.model.TestTimedOutException: test timed out after 
> 6 milliseconds at sun.misc.Unsafe.park(Native Method) at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at 
> java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429) at 
> 
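The TestTimedOutException traces quoted in HDFS-16109 come from JUnit's FailOnTimeout machinery (visible in the frames above): the test body runs inside a FutureTask and the runner waits at most the configured number of milliseconds, which is why raising the timeout makes these tests pass on slow CI hosts. The sketch below shows that mechanism with a hypothetical finishesWithin helper; it is not the JUnit API, just a self-contained model of it:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Model of how JUnit's FailOnTimeout enforces a per-test timeout:
// run the body on another thread and wait with a deadline.
public class TimeoutDemo {
    // Return true iff `body` finishes (normally or exceptionally)
    // within `timeoutMillis`; false if the deadline expires first.
    static boolean finishesWithin(Runnable body, long timeoutMillis) {
        FutureTask<Void> task = new FutureTask<>(body, null);
        Thread t = new Thread(task);
        t.setDaemon(true);  // do not keep the JVM alive for a stuck test body
        t.start();
        try {
            task.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (ExecutionException e) {
            return true;   // the body threw, but it did finish in time
        } catch (TimeoutException e) {
            return false;  // this is what surfaces as TestTimedOutException
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
    }

    public static void main(String[] args) {
        System.out.println(finishesWithin(() -> { }, 1000));        // fast body passes
        System.out.println(finishesWithin(() -> sleep(500), 50));   // slow body times out
    }
}
```

A fix like the one committed here simply increases the deadline passed to the wait, trading a longer worst-case run for fewer spurious failures on loaded machines.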

[jira] [Updated] (HDFS-16109) Fix some flaky unit tests since they often time out

2021-07-04 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-16109:
-
Issue Type: Bug  (was: Wish)

> Fix some flaky unit tests since they often time out
> --
>
> Key: HDFS-16109
> URL: https://issues.apache.org/jira/browse/HDFS-16109
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Increase the timeouts for TestBootstrapStandby, TestFsVolumeList and 
> TestDecommissionWithBackoffMonitor since they often time out.
>  
> TestBootstrapStandby:
> {code:java}
> [ERROR] Tests run: 8, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 
> 159.474 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby[ERROR] Tests 
> run: 8, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 159.474 s <<< 
> FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby[ERROR] 
> testRateThrottling(org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby)
>   Time elapsed: 31.262 s  <<< 
> ERROR!org.junit.runners.model.TestTimedOutException: test timed out after 
> 3 milliseconds at java.io.RandomAccessFile.writeBytes(Native Method) at 
> java.io.RandomAccessFile.write(RandomAccessFile.java:512) at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:947)
>  at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:910)
>  at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:699)
>  at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:642)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:387)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:243)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1224)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:795)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:673)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:760) 
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:1014) 
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:989) 
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1763)
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:2261)
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:2231)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby.testRateThrottling(TestBootstrapStandby.java:297)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.lang.Thread.run(Thread.java:748)
> {code}
> TestFsVolumeList:
> {code:java}
> [ERROR] Tests run: 12, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 
> 190.294 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList[ERROR] 
> Tests run: 12, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 190.294 s 
> <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList[ERROR] 
> testAddRplicaProcessorForAddingReplicaInMap(org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList)
>   Time elapsed: 60.028 s  <<< 
> ERROR!org.junit.runners.model.TestTimedOutException: test timed out after 
> 6 milliseconds at sun.misc.Unsafe.park(Native Method) at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at 
> java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429) at 
> java.util.concurrent.FutureTask.get(FutureTask.java:191) at 
> 

[jira] [Updated] (HDFS-16109) Fix some flaky unit tests since they often time out

2021-07-04 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-16109:
-
Component/s: test

> Fix some flaky unit tests since they often time out
> --
>
> Key: HDFS-16109
> URL: https://issues.apache.org/jira/browse/HDFS-16109
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Increase the timeouts for TestBootstrapStandby, TestFsVolumeList and 
> TestDecommissionWithBackoffMonitor since they often time out.
>  
> TestBootstrapStandby:
> {code:java}
> [ERROR] Tests run: 8, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 
> 159.474 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby[ERROR] Tests 
> run: 8, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 159.474 s <<< 
> FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby[ERROR] 
> testRateThrottling(org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby)
>   Time elapsed: 31.262 s  <<< 
> ERROR!org.junit.runners.model.TestTimedOutException: test timed out after 
> 3 milliseconds at java.io.RandomAccessFile.writeBytes(Native Method) at 
> java.io.RandomAccessFile.write(RandomAccessFile.java:512) at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:947)
>  at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:910)
>  at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:699)
>  at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:642)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:387)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:243)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1224)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:795)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:673)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:760) 
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:1014) 
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:989) 
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1763)
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:2261)
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:2231)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby.testRateThrottling(TestBootstrapStandby.java:297)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.lang.Thread.run(Thread.java:748)
> {code}
> TestFsVolumeList:
> {code:java}
> [ERROR] Tests run: 12, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 
> 190.294 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList[ERROR] 
> Tests run: 12, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 190.294 s 
> <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList[ERROR] 
> testAddRplicaProcessorForAddingReplicaInMap(org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList)
>   Time elapsed: 60.028 s  <<< 
> ERROR!org.junit.runners.model.TestTimedOutException: test timed out after 
> 6 milliseconds at sun.misc.Unsafe.park(Native Method) at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at 
> java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429) at 
> 

[jira] [Resolved] (HDFS-16109) Fix some flaky unit tests since they often time out

2021-07-04 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka resolved HDFS-16109.
--
Fix Version/s: 3.3.2
   3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Committed to trunk and branch-3.3. Thank you [~tomscut] for your contribution.

> Fix some flaky unit tests since they often time out
> --
>
> Key: HDFS-16109
> URL: https://issues.apache.org/jira/browse/HDFS-16109
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Increase the timeouts for TestBootstrapStandby, TestFsVolumeList and 
> TestDecommissionWithBackoffMonitor since they often time out.
>  
> TestBootstrapStandby:
> {code:java}
> [ERROR] Tests run: 8, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 
> 159.474 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby[ERROR] Tests 
> run: 8, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 159.474 s <<< 
> FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby[ERROR] 
> testRateThrottling(org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby)
>   Time elapsed: 31.262 s  <<< 
> ERROR!org.junit.runners.model.TestTimedOutException: test timed out after 
> 3 milliseconds at java.io.RandomAccessFile.writeBytes(Native Method) at 
> java.io.RandomAccessFile.write(RandomAccessFile.java:512) at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:947)
>  at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:910)
>  at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:699)
>  at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:642)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:387)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:243)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1224)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:795)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:673)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:760) 
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:1014) 
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:989) 
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1763)
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:2261)
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:2231)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby.testRateThrottling(TestBootstrapStandby.java:297)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.lang.Thread.run(Thread.java:748)
> {code}
> TestFsVolumeList:
> {code:java}
> [ERROR] Tests run: 12, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 
> 190.294 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList[ERROR] 
> Tests run: 12, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 190.294 s 
> <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList[ERROR] 
> testAddRplicaProcessorForAddingReplicaInMap(org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList)
>   Time elapsed: 60.028 s  <<< 
> ERROR!org.junit.runners.model.TestTimedOutException: test timed out after 
> 6 milliseconds at sun.misc.Unsafe.park(Native Method) at 
> 

[jira] [Work logged] (HDFS-16109) Fix some flaky unit tests since they often time out

2021-07-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16109?focusedWorklogId=618461&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-618461
 ]

ASF GitHub Bot logged work on HDFS-16109:
-

Author: ASF GitHub Bot
Created on: 04/Jul/21 23:14
Start Date: 04/Jul/21 23:14
Worklog Time Spent: 10m 
  Work Description: aajisaka merged pull request #3172:
URL: https://github.com/apache/hadoop/pull/3172


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 618461)
Time Spent: 40m  (was: 0.5h)

> Fix some flaky unit tests since they often time out
> --
>
> Key: HDFS-16109
> URL: https://issues.apache.org/jira/browse/HDFS-16109
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Increase the timeouts for TestBootstrapStandby, TestFsVolumeList and 
> TestDecommissionWithBackoffMonitor since they often time out.
>  
> TestBootstrapStandby:
> {code:java}
> [ERROR] Tests run: 8, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 
> 159.474 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby[ERROR] Tests 
> run: 8, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 159.474 s <<< 
> FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby[ERROR] 
> testRateThrottling(org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby)
>   Time elapsed: 31.262 s  <<< 
> ERROR!org.junit.runners.model.TestTimedOutException: test timed out after 
> 3 milliseconds at java.io.RandomAccessFile.writeBytes(Native Method) at 
> java.io.RandomAccessFile.write(RandomAccessFile.java:512) at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:947)
>  at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:910)
>  at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:699)
>  at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:642)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:387)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:243)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1224)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:795)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:673)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:760) 
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:1014) 
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:989) 
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1763)
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:2261)
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:2231)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby.testRateThrottling(TestBootstrapStandby.java:297)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.lang.Thread.run(Thread.java:748)
> {code}
> TestFsVolumeList:
> {code:java}
> [ERROR] Tests run: 12, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 
> 190.294 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList[ERROR] 
> 

[jira] [Updated] (HDFS-16111) Add a configuration to RoundRobinVolumeChoosingPolicy to avoid failed volumes at datanodes.

2021-07-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-16111:
--
Labels: pull-request-available  (was: )

> Add a configuration to RoundRobinVolumeChoosingPolicy to avoid failed volumes 
> at datanodes.
> ---
>
> Key: HDFS-16111
> URL: https://issues.apache.org/jira/browse/HDFS-16111
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Zhihai Xu
>Assignee: Zhihai Xu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When we upgraded our Hadoop cluster from 2.6.0 to 3.2.2, we got 
> failed volumes on a lot of datanodes, which caused some missing blocks at that 
> time. Although we later recovered all the missing blocks by symlinking the 
> path (dfs/dn/current) on each failed volume to a new directory and copying all 
> the data to the new directory, we missed our SLA and delayed the upgrade 
> of our production cluster by several hours.
> When this issue happened, we saw many exceptions like the following before the 
> volume failed on the datanode:
>  [DataXceiver for client  at /[XX.XX.XX.XX:XXX|http://10.104.103.159:33986/] 
> [Receiving block BP-XX-XX.XX.XX.XX-XX:blk_X_XXX]] 
> datanode.DataNode (BlockReceiver.java:(289)) - IOException in 
> BlockReceiver constructor :Possible disk error: Failed to create 
> /XXX/dfs/dn/current/BP-XX-XX.XX.XX.XX-X/tmp/blk_XX. Cause 
> is
> java.io.IOException: No space left on device
>         at java.io.UnixFileSystem.createFileExclusively(Native Method)
>         at java.io.File.createNewFile(File.java:1012)
>         at 
> org.apache.hadoop.hdfs.server.datanode.FileIoProvider.createFile(FileIoProvider.java:302)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createFileWithExistsCheck(DatanodeUtil.java:69)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createTmpFile(BlockPoolSlice.java:292)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createTmpFile(FsVolumeImpl.java:532)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createTemporary(FsVolumeImpl.java:1254)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1598)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:212)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1314)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:768)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:173)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:107)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:291)
>         at java.lang.Thread.run(Thread.java:748)
>  
> We found this issue happened for the following two reasons:
> First, the upgrade process added some extra disk usage on each disk 
> volume of the datanode:
> BlockPoolSliceStorage.doUpgrade 
> (https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceStorage.java#L445)
>  is the main upgrade function in the datanode; it adds some extra 
> storage. The extra storage is all the new directories created in 
> /current//current, although all block data files and block metadata 
> files are hard-linked with /current//previous after the upgrade. Since 
> many new directories are created, this uses some disk space on 
> each disk volume.
>  
> Second, there is a potential bug when picking a disk volume to write a new 
> block file (replica). By default, Hadoop uses RoundRobinVolumeChoosingPolicy; 
> the code that selects a disk checks whether the available space on the 
> selected disk is more than the size in bytes of the block file to store 
> (https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/RoundRobinVolumeChoosingPolicy.java#L86)
>  But when creating a new block, two files are created: one is the 
> block file blk_, the other is the block metadata file blk__.meta. 
> This is the code where a block is finalized; both the block file size and metadata 
> file size are updated: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java#L391
>  
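The space-accounting gap described in the report above can be illustrated with a small, self-contained sketch. This is not Hadoop code: the class, method names, and the 1 MiB `META_OVERHEAD` constant are all hypothetical, chosen only to show how a check against the block size alone can accept a volume that cannot also hold the metadata file.

```java
// Illustrative sketch only, NOT the Hadoop implementation. It shows why
// checking available space against the block size alone can under-reserve,
// and how adding a per-block overhead (a hypothetical META_OVERHEAD constant)
// makes the check more conservative.
class VolumeSpaceCheck {
    // Hypothetical margin reserved for the blk_*.meta file; the real meta
    // file size depends on the checksum configuration, so 1 MiB is an
    // assumed safety margin, not a Hadoop constant.
    static final long META_OVERHEAD = 1L << 20;

    // Mirrors the size-only check described above.
    static boolean fitsWithoutOverhead(long available, long blockSize) {
        return available > blockSize;
    }

    // A more conservative check that also reserves room for the meta file.
    static boolean fitsWithOverhead(long available, long blockSize) {
        return available > blockSize + META_OVERHEAD;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;      // one 128 MiB block
        long available = blockSize + 4096;        // barely enough for the block
        // The naive check accepts the volume even though the meta file may
        // not fit, matching the failure mode described in the report.
        System.out.println(fitsWithoutOverhead(available, blockSize)); // true
        System.out.println(fitsWithOverhead(available, blockSize));    // false
    }
}
```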

[jira] [Work logged] (HDFS-16111) Add a configuration to RoundRobinVolumeChoosingPolicy to avoid failed volumes at datanodes.

2021-07-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16111?focusedWorklogId=618459&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-618459
 ]

ASF GitHub Bot logged work on HDFS-16111:
-

Author: ASF GitHub Bot
Created on: 04/Jul/21 22:36
Start Date: 04/Jul/21 22:36
Worklog Time Spent: 10m 
  Work Description: zhihaixu2012 opened a new pull request #3175:
URL: https://github.com/apache/hadoop/pull/3175


   …avoid failed volumes at datanodes.
   
   Change-Id: Iead25812d4073e3980893e3e76f7d2b03b57442a
   
   JIRA: https://issues.apache.org/jira/browse/HDFS-16111
   
   There is a potential bug when picking a disk volume to write a new block 
file (replica). By default, Hadoop uses RoundRobinVolumeChoosingPolicy; the code 
that selects a disk checks whether the available space on the selected disk is 
more than the size in bytes of the block file to store 
(https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/RoundRobinVolumeChoosingPolicy.java#L86)
 But when creating a new block, two files are created: one is the 
block file blk_, the other is the block metadata file blk__.meta. This 
is the code where a block is finalized; both the block file size and metadata file 
size are updated: 
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java#L391
 The current code only considers the size of the block file, and not the size of 
the block metadata file, when choosing a disk in 
RoundRobinVolumeChoosingPolicy. There can be many in-flight blocks being received 
at the same time (the default maximum number of DataXceiver threads is 4096). 
This underestimates the total space needed to write a block, which can 
potentially cause a disk-full error (No space left on device) when writing a 
replica.
   
   Since the size of the block metadata file is not fixed, I suggest adding a 
configuration 
(dfs.datanode.round-robin-volume-choosing-policy.additional-available-space)
 to safeguard disk space when choosing a volume to write new block data 
in RoundRobinVolumeChoosingPolicy.
   
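The proposed behavior can be sketched as follows. This is an illustrative sketch under stated assumptions, not the actual Hadoop patch: the nested `Volume` interface and `RoundRobinSketch` class are hypothetical stand-ins for Hadoop's `FsVolumeSpi` and `RoundRobinVolumeChoosingPolicy`, and the constructor parameter stands in for the proposed `additional-available-space` configuration key.

```java
import java.util.List;

// Sketch of a round-robin volume chooser that skips volumes lacking
// (blockSize + additionalAvailableSpace) free bytes, instead of comparing
// against blockSize alone.
class RoundRobinSketch {
    // Hypothetical stand-in for Hadoop's FsVolumeSpi.
    public interface Volume { long getAvailable(); }

    private int curVolume = 0;
    private final long additionalAvailableSpace;

    // additionalAvailableSpace models the proposed configuration value.
    public RoundRobinSketch(long additionalAvailableSpace) {
        this.additionalAvailableSpace = additionalAvailableSpace;
    }

    public Volume chooseVolume(List<Volume> volumes, long blockSize) {
        if (volumes.isEmpty()) {
            throw new IllegalArgumentException("no volumes");
        }
        int start = curVolume % volumes.size();
        int i = start;
        do {
            Volume v = volumes.get(i);
            i = (i + 1) % volumes.size();
            // Reserve extra headroom for the meta file and other writers.
            if (v.getAvailable() > blockSize + additionalAvailableSpace) {
                curVolume = i; // resume round-robin after the chosen volume
                return v;
            }
        } while (i != start);
        throw new IllegalStateException("out of space on all volumes");
    }
}
```

With a zero margin this degenerates to the size-only check described above; a positive margin makes nearly full volumes ineligible before they can fail a write.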


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 618459)
Remaining Estimate: 0h
Time Spent: 10m

> Add a configuration to RoundRobinVolumeChoosingPolicy to avoid failed volumes 
> at datanodes.
> ---
>
> Key: HDFS-16111
> URL: https://issues.apache.org/jira/browse/HDFS-16111
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Zhihai Xu
>Assignee: Zhihai Xu
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>

[jira] [Work logged] (HDFS-16110) Remove unused method reportChecksumFailure in DFSClient

2021-07-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16110?focusedWorklogId=618458&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-618458
 ]

ASF GitHub Bot logged work on HDFS-16110:
-

Author: ASF GitHub Bot
Created on: 04/Jul/21 21:56
Start Date: 04/Jul/21 21:56
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3174:
URL: https://github.com/apache/hadoop/pull/3174#issuecomment-873669979


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 34s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m 40s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m  1s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 52s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 30s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 58s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 43s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 38s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 29s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  15m 25s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 47s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 52s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 52s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 45s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 45s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 19s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-client.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3174/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-client.txt)
 |  hadoop-hdfs-project/hadoop-hdfs-client: The patch generated 1 new + 41 
unchanged - 5 fixed = 42 total (was 46)  |
   | +1 :green_heart: |  mvnsite  |   0m 47s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 33s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 30s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 27s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  15m 26s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 19s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 34s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  79m 45s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3174/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3174 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 04166e58be95 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 4fc88b18a6d1c1f3c19abd858954068099394452 |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | 

[jira] [Updated] (HDFS-16111) Add a configuration to RoundRobinVolumeChoosingPolicy to avoid failed volumes at datanodes.

2021-07-04 Thread Zhihai Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihai Xu updated HDFS-16111:
-
Summary: Add a configuration to RoundRobinVolumeChoosingPolicy to avoid 
failed volumes at datanodes.  (was: Add a configuration to 
RoundRobinVolumeChoosingPolicy to avoid failed volumes at datanode.)

> Add a configuration to RoundRobinVolumeChoosingPolicy to avoid failed volumes 
> at datanodes.
> ---
>
> Key: HDFS-16111
> URL: https://issues.apache.org/jira/browse/HDFS-16111
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Zhihai Xu
>Assignee: Zhihai Xu
>Priority: Major
>

[jira] [Updated] (HDFS-16111) Add a configuration to RoundRobinVolumeChoosingPolicy to avoid failed volumes at datanode.

2021-07-04 Thread Zhihai Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihai Xu updated HDFS-16111:
-
Summary: Add a configuration to RoundRobinVolumeChoosingPolicy to avoid 
failed volumes at datanode.  (was: Add a configuration to 
RoundRobinVolumeChoosingPolicy to avoid failed volumes.)

> Add a configuration to RoundRobinVolumeChoosingPolicy to avoid failed volumes 
> at datanode.
> --
>
> Key: HDFS-16111
> URL: https://issues.apache.org/jira/browse/HDFS-16111
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Zhihai Xu
>Assignee: Zhihai Xu
>Priority: Major
>

[jira] [Updated] (HDFS-16111) Add a configuration to RoundRobinVolumeChoosingPolicy to avoid failed volumes.

2021-07-04 Thread Zhihai Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihai Xu updated HDFS-16111:
-
Summary: Add a configuration to RoundRobinVolumeChoosingPolicy to avoid 
failed volumes.  (was: Add a configuration to RoundRobinVolumeChoosingPolicy to 
avoid picking an almost full volume to place a replica. )

> Add a configuration to RoundRobinVolumeChoosingPolicy to avoid failed volumes.
> --
>
> Key: HDFS-16111
> URL: https://issues.apache.org/jira/browse/HDFS-16111
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Zhihai Xu
>Assignee: Zhihai Xu
>Priority: Major
>

[jira] [Created] (HDFS-16111) Add a configuration to RoundRobinVolumeChoosingPolicy to avoid picking an almost full volume to place a replica.

2021-07-04 Thread Zhihai Xu (Jira)
Zhihai Xu created HDFS-16111:


 Summary: Add a configuration to RoundRobinVolumeChoosingPolicy to 
avoid picking an almost full volume to place a replica. 
 Key: HDFS-16111
 URL: https://issues.apache.org/jira/browse/HDFS-16111
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Zhihai Xu
Assignee: Zhihai Xu


When we upgraded our Hadoop cluster from Hadoop 2.6.0 to Hadoop 3.2.2, we got 
failed volumes on a lot of datanodes, which caused some missing blocks at that 
time. Although we later recovered all the missing blocks by symlinking the 
path (dfs/dn/current) on the failed volume to a new directory and copying all 
the data to the new directory, we missed our SLA and it delayed our upgrade 
process on our production cluster by several hours.

When this issue happened, we saw many of these exceptions before the 
volume failed on the datanode:

 [DataXceiver for client  at /[XX.XX.XX.XX:XXX|http://10.104.103.159:33986/] 
[Receiving block BP-XX-XX.XX.XX.XX-XX:blk_X_XXX]] 
datanode.DataNode (BlockReceiver.java:(289)) - IOException in 
BlockReceiver constructor :Possible disk error: Failed to create 
/XXX/dfs/dn/current/BP-XX-XX.XX.XX.XX-X/tmp/blk_XX. Cause is
java.io.IOException: No space left on device
        at java.io.UnixFileSystem.createFileExclusively(Native Method)
        at java.io.File.createNewFile(File.java:1012)
        at 
org.apache.hadoop.hdfs.server.datanode.FileIoProvider.createFile(FileIoProvider.java:302)
        at 
org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createFileWithExistsCheck(DatanodeUtil.java:69)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createTmpFile(BlockPoolSlice.java:292)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createTmpFile(FsVolumeImpl.java:532)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createTemporary(FsVolumeImpl.java:1254)
        at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1598)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:212)
        at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1314)
        at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:768)
        at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:173)
        at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:107)
        at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:291)
        at java.lang.Thread.run(Thread.java:748)

 

We found this issue happened for the following two reasons:

First, the upgrade process added some extra disk usage on each disk volume 
of the datanode:

BlockPoolSliceStorage.doUpgrade 
(https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceStorage.java#L445)
 is the main upgrade function in the datanode; it adds some extra storage. 
The extra storage is all the new directories created in 
/current//current, although all block data files and block metadata files 
are hard-linked with /current//previous after the upgrade. Since many new 
directories are created, this uses some disk space on each disk 
volume.

 

Second, there is a potential bug when picking a disk volume to write a new 
block file (replica). By default, Hadoop uses RoundRobinVolumeChoosingPolicy; 
the selection code checks whether the available space on the chosen disk is 
larger than the size of the block file to be stored 
(https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/RoundRobinVolumeChoosingPolicy.java#L86)
 But when a new block is created, two files are written: the block file 
blk_ and the block metadata file blk__.meta. This is the code that finalizes a 
block, where both the block file size and the metadata file size are updated: 
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java#L391
 The current code considers only the size of the block file and ignores the 
size of the block metadata file when choosing a disk in 
RoundRobinVolumeChoosingPolicy. Many blocks can be received at the same time 
(the default maximum number of DataXceiver threads is 4096), so the total 
size needed to write a block is underestimated, which can cause the disk full 
error above (No space left on device).
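The underestimation described above can be sketched as follows. This is a simplified toy model, not the actual HDFS classes: the `Volume` and `RoundRobinChooser` names are hypothetical, and the fix shown is simply to reserve space for the block file plus an estimate of the metadata file rather than the block file alone.

```java
import java.util.List;

// Simplified model (not the real HDFS code) of round-robin volume choosing
// that accounts for both the block file and its metadata file.
class Volume {
    final String name;
    long available;  // bytes free on this volume
    Volume(String name, long available) { this.name = name; this.available = available; }
}

class RoundRobinChooser {
    private int cur = 0;

    // Reserve space for a replica: require room for the block file *plus*
    // an estimate of the metadata file, not the block file alone.
    Volume choose(List<Volume> volumes, long blockSize, long metaSizeEstimate) {
        long needed = blockSize + metaSizeEstimate;  // the key change
        int start = cur;
        while (true) {
            Volume v = volumes.get(cur);
            cur = (cur + 1) % volumes.size();
            if (v.available >= needed) {
                return v;
            }
            if (cur == start) {
                throw new IllegalStateException(
                    "Out of space: no volume has " + needed + " bytes");
            }
        }
    }
}

public class ChooserDemo {
    public static void main(String[] args) {
        List<Volume> vols = List.of(new Volume("disk0", 100), new Volume("disk1", 150));
        RoundRobinChooser chooser = new RoundRobinChooser();
        // A 120-byte block plus a 16-byte meta file: disk0 (100 bytes free) is
        // skipped, even though the buggy check (block file alone) would be
        // close to accepting it under heavier concurrency.
        Volume v = chooser.choose(vols, 120, 16);
        System.out.println(v.name);  // prints "disk1"
    }
}
```

Under the buggy policy, thousands of concurrent DataXceiver threads each under-reserving by the metadata file size is enough to push a nearly full volume over the edge.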

 

Since the size of the block metadata file is not 

[jira] [Commented] (HDFS-16100) HA: Improve performance of Standby node transition to Active

2021-07-04 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17374323#comment-17374323
 ] 

Xiaoqiao He commented on HDFS-16100:


Thanks [~ayushtkn] for your comments.
IMO, it is safe to queue when `storedBlock.getGenerationStamp() <= 
iblk.getGenerationStamp()` rather than `storedBlock.getGenerationStamp() == 
iblk.getGenerationStamp()` here.
{code:java}
+  if (!(reportedState == ReplicaState.RBW &&
+  storedBlock.getGenerationStamp() != iblk.getGenerationStamp())) {
+..
+  }
{code}
The others look good to me. I will give my +1 once that is fixed. Thanks.
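The condition being discussed can be sketched as a standalone predicate. This is a minimal model with hypothetical names, not the actual BlockManager code: queue a postponed report unless the replica is RBW and its generation stamp lags behind the stored block's GS.

```java
// Minimal sketch (hypothetical names, not the actual HDFS BlockManager code)
// of the proposed check: skip queueing a reported replica when it is RBW with
// a generation stamp strictly less than the stored block's GS.
enum ReplicaState { FINALIZED, RBW, RWR }

public class QueueCheck {
    // Returns true if the standby should queue this report for later processing.
    static boolean shouldQueue(ReplicaState reportedState,
                               long storedGenStamp, long reportedGenStamp) {
        // Queue unless the replica is RBW with a stale GS; equivalently,
        // an RBW report is queued only when storedGS <= reportedGS.
        return !(reportedState == ReplicaState.RBW
                 && storedGenStamp > reportedGenStamp);
    }

    public static void main(String[] args) {
        // Stale RBW report: dropped instead of queued.
        System.out.println(shouldQueue(ReplicaState.RBW, 10, 9));       // false
        // RBW report with a future GS: still queued.
        System.out.println(shouldQueue(ReplicaState.RBW, 10, 11));      // true
        // Non-RBW reports are always queued.
        System.out.println(shouldQueue(ReplicaState.FINALIZED, 10, 9)); // true
    }
}
```

The second case is the safety argument: reports with a future GS must still be queued, since the corresponding edit log entry may not have been loaded yet.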

>  HA: Improve performance of Standby node transition to Active
> -
>
> Key: HDFS-16100
> URL: https://issues.apache.org/jira/browse/HDFS-16100
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.1
>Reporter: wudeyu
>Assignee: wudeyu
>Priority: Major
> Attachments: HDFS-16100.patch
>
>
> pendingDNMessages in Standby is used to process postponed block reports. 
> Block reports in pendingDNMessages are processed as follows:
>  # If the GS of a replica is in the future, the Standby Node processes it 
> when the corresponding edit log entry (e.g. add_block) is loaded.
>  # If a replica is corrupted, the Standby Node processes it while it 
> transitions to Active.
>  # If a DataNode is removed, its block reports are removed from 
> pendingDNMessages.
> Obviously, as the number of corrupted replicas grows, the transition takes 
> more time. In our situation, there were 60 million block reports in 
> pendingDNMessages before the transition. Processing the block reports took 
> almost 7 minutes, and the NameNode was killed by zkfc. The replica state of 
> most of these block reports is RBW with a wrong GS (less than that of the 
> stored block in the Standby Node).
> In my opinion, the Standby Node could ignore block reports whose replica 
> state is RBW with a wrong GS, because the Active Node/DataNode will remove 
> them later.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16088) Standby NameNode process getLiveDatanodeStorageReport request to reduce Active load

2021-07-04 Thread tomscut (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tomscut updated HDFS-16088:
---
Description: 
As with HDFS-13183, NameNodeConnector#getLiveDatanodeStorageReport() can also 
send its requests to the SNN to reduce the ANN load.

There are two points that need to be mentioned:
 1. FSNamesystem#getDatanodeStorageReport() is OperationCategory.UNCHECKED, so 
we can access SNN directly.
 2. We can share the same UT(testBalancerRequestSBNWithHA) with 
NameNodeConnector#getBlocks().

  was:
As with HDFS-13183, NameNodeConnector#getLiveDatanodeStorageReport() can also 
request to SNN to reduce the ANN load.

There are two points that need to be mentioned:
 1. FSNamesystem#getLiveDatanodeStorageReport() is OperationCategory.UNCHECKED, 
so we can access SNN directly.
 2. We can share the same UT(testBalancerRequestSBNWithHA) with 
NameNodeConnector#getBlocks().
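The behavior behind point 1 — that an UNCHECKED operation can be served by a standby — can be sketched with a toy model. This is illustrative only, not the real NameNode HAState code (in the real code, READ/WRITE operations on a standby throw StandbyException).

```java
// Toy model (not the real HDFS HAState code) of why an UNCHECKED operation
// such as getDatanodeStorageReport() can be served by a Standby NameNode:
// the standby rejects READ/WRITE operations but lets UNCHECKED ones through.
enum OperationCategory { READ, WRITE, UNCHECKED }

public class StandbyGate {
    static boolean standbyAllows(OperationCategory op) {
        switch (op) {
            case UNCHECKED:
                return true;   // no HA-state check; a standby may serve it
            default:
                return false;  // READ/WRITE raise StandbyException in HDFS
        }
    }

    public static void main(String[] args) {
        System.out.println(standbyAllows(OperationCategory.UNCHECKED)); // true
        System.out.println(standbyAllows(OperationCategory.WRITE));     // false
    }
}
```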


> Standby NameNode process getLiveDatanodeStorageReport request to reduce 
> Active load
> ---
>
> Key: HDFS-16088
> URL: https://issues.apache.org/jira/browse/HDFS-16088
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Attachments: standyby-ipcserver.jpg
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> As with HDFS-13183, NameNodeConnector#getLiveDatanodeStorageReport() can also 
> send its requests to the SNN to reduce the ANN load.
> There are two points that need to be mentioned:
>  1. FSNamesystem#getDatanodeStorageReport() is OperationCategory.UNCHECKED, 
> so we can access SNN directly.
>  2. We can share the same UT(testBalancerRequestSBNWithHA) with 
> NameNodeConnector#getBlocks().






[jira] [Updated] (HDFS-16110) Remove unused method reportChecksumFailure in DFSClient

2021-07-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-16110:
--
Labels: pull-request-available  (was: )

> Remove unused method reportChecksumFailure in DFSClient
> ---
>
> Key: HDFS-16110
> URL: https://issues.apache.org/jira/browse/HDFS-16110
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Remove the unused method reportChecksumFailure and, in passing, fix some 
> code style issues in DFSClient.






[jira] [Work logged] (HDFS-16110) Remove unused method reportChecksumFailure in DFSClient

2021-07-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16110?focusedWorklogId=618433=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-618433
 ]

ASF GitHub Bot logged work on HDFS-16110:
-

Author: ASF GitHub Bot
Created on: 04/Jul/21 13:54
Start Date: 04/Jul/21 13:54
Worklog Time Spent: 10m 
  Work Description: tomscut opened a new pull request #3174:
URL: https://github.com/apache/hadoop/pull/3174


   JIRA: [HDFS-16110](https://issues.apache.org/jira/browse/HDFS-16110)
   
   Remove the unused method reportChecksumFailure and, in passing, fix some 
code style issues in DFSClient.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 618433)
Remaining Estimate: 0h
Time Spent: 10m

> Remove unused method reportChecksumFailure in DFSClient
> ---
>
> Key: HDFS-16110
> URL: https://issues.apache.org/jira/browse/HDFS-16110
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Remove the unused method reportChecksumFailure and, in passing, fix some 
> code style issues in DFSClient.






[jira] [Created] (HDFS-16110) Remove unused method reportChecksumFailure in DFSClient

2021-07-04 Thread tomscut (Jira)
tomscut created HDFS-16110:
--

 Summary: Remove unused method reportChecksumFailure in DFSClient
 Key: HDFS-16110
 URL: https://issues.apache.org/jira/browse/HDFS-16110
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


Remove the unused method reportChecksumFailure and, in passing, fix some code 
style issues in DFSClient.






[jira] [Work logged] (HDFS-16109) Fix some flaky unit tests since they often time out

2021-07-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16109?focusedWorklogId=618418=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-618418
 ]

ASF GitHub Bot logged work on HDFS-16109:
-

Author: ASF GitHub Bot
Created on: 04/Jul/21 09:55
Start Date: 04/Jul/21 09:55
Worklog Time Spent: 10m 
  Work Description: tomscut commented on pull request #3172:
URL: https://github.com/apache/hadoop/pull/3172#issuecomment-873557085


   Thanks @aajisaka and @ayushtkn for your review.




Issue Time Tracking
---

Worklog Id: (was: 618418)
Time Spent: 0.5h  (was: 20m)

> Fix some flaky unit tests since they often time out
> --
>
> Key: HDFS-16109
> URL: https://issues.apache.org/jira/browse/HDFS-16109
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Increase the timeouts for TestBootstrapStandby, TestFsVolumeList and 
> TestDecommissionWithBackoffMonitor since they often time out.
>  
> TestBootstrapStandby:
> {code:java}
> [ERROR] Tests run: 8, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 
> 159.474 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby[ERROR] Tests 
> run: 8, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 159.474 s <<< 
> FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby[ERROR] 
> testRateThrottling(org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby)
>   Time elapsed: 31.262 s  <<< 
> ERROR!org.junit.runners.model.TestTimedOutException: test timed out after 
> 3 milliseconds at java.io.RandomAccessFile.writeBytes(Native Method) at 
> java.io.RandomAccessFile.write(RandomAccessFile.java:512) at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:947)
>  at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:910)
>  at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:699)
>  at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:642)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:387)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:243)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1224)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:795)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:673)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:760) 
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:1014) 
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:989) 
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1763)
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:2261)
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:2231)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby.testRateThrottling(TestBootstrapStandby.java:297)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.lang.Thread.run(Thread.java:748)
> {code}
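The FailOnTimeout and FutureTask frames in the trace above show how JUnit enforces a test timeout: the test body runs on a worker thread and the runner gives up once the budget elapses. A minimal sketch of that pattern using plain java.util.concurrent (illustrative names, not JUnit's API):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Minimal sketch of the fail-on-timeout pattern visible in the stack trace
// (FailOnTimeout$CallableStatement / FutureTask): run the work on a separate
// thread and give up once the time budget elapses.
public class TimeoutRunner {
    static String runWithTimeout(Callable<String> body, long millis) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<String> f = pool.submit(body);
        try {
            return f.get(millis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            f.cancel(true);  // interrupt the worker, as JUnit's runner does
            return "timed out";
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A fast task finishes inside the budget; a slow task is cut off.
        System.out.println(runWithTimeout(() -> "ok", 1000));
        System.out.println(runWithTimeout(() -> { Thread.sleep(5000); return "late"; }, 200));
    }
}
```

Raising a test's timeout, as this patch does, simply enlarges the budget handed to this mechanism; it does not change how the test body itself runs.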
> TestFsVolumeList:
> {code:java}
> [ERROR] Tests run: 12, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 
> 190.294 s <<< FAILURE! - in 
>