[jira] [Commented] (HDFS-15176) Enable GcTimePercentage Metric in NameNode's JvmMetrics.

2020-02-17 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17038845#comment-17038845
 ] 

Hadoop QA commented on HDFS-15176:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
46s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
11s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
23m 50s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
3s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 19m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 19m  
0s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 19s{color} | {color:orange} root: The patch generated 12 new + 568 unchanged 
- 0 fixed = 580 total (was 568) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 50s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
39s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 12m  8s{color} 
| {color:red} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}112m 42s{color} 
| {color:red} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
48s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}257m 30s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.security.TestFixKerberosTicketOrder |
|   | hadoop.ipc.TestRPC |
|   | hadoop.security.TestRaceWhenRelogin |
|   | hadoop.hdfs.TestDeadNodeDetection |
|   | hadoop.tools.TestHdfsConfigFields |
|   | hadoop.hdfs.TestRollingUpgrade |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.6 Server=19.03.6 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15176 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12993731/HDFS-15176.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 1b1487595d04 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Updated] (HDFS-15177) Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too much time.

2020-02-17 Thread zhuqi (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuqi updated HDFS-15177:
-
Summary: Split datanode invalide block deletion, to avoid the FsDatasetImpl 
lock too much time.  (was: Split datanode invalide block deletion, to avoid the 
FsDatasetImpl lock too many time.)

> Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too 
> much time.
> --
>
> Key: HDFS-15177
> URL: https://issues.apache.org/jira/browse/HDFS-15177
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: zhuqi
>Assignee: zhuqi
>Priority: Major
>
> In our cluster, a datanode can receive a delete command covering too many 
> blocks when many blockpools share the same datanode and the datanode has 
> about 30 storage dirs. This causes the FsDatasetImpl lock to be held for too 
> long.
>  
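
One way to picture the proposed split, as a minimal sketch and not the actual patch: instead of holding one lock while the whole delete command is processed, the block list is cut into fixed-size batches and the lock is taken once per batch, so other threads can acquire it between batches. The class `BatchedInvalidator`, the `datasetLock` field, and the batch size are hypothetical stand-ins; the real FsDatasetImpl locking is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in; not the FsDatasetImpl code or the HDFS-15177 patch.
class BatchedInvalidator {
  private final ReentrantLock datasetLock = new ReentrantLock();
  private final List<Long> deleted = new ArrayList<>();
  private int lockAcquisitions = 0;

  /** Splits blockIds into consecutive batches of at most batchSize elements. */
  static List<List<Long>> split(List<Long> blockIds, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<List<Long>> batches = new ArrayList<>();
    for (int i = 0; i < blockIds.size(); i += batchSize) {
      int end = Math.min(i + batchSize, blockIds.size());
      batches.add(new ArrayList<>(blockIds.subList(i, end)));
    }
    return batches;
  }

  /** Processes the deletion batch by batch, releasing the lock after each batch. */
  void invalidate(List<Long> blockIds, int batchSize) {
    for (List<Long> batch : split(blockIds, batchSize)) {
      datasetLock.lock();
      try {
        lockAcquisitions++;
        deleted.addAll(batch); // placeholder for the real per-block bookkeeping
      } finally {
        datasetLock.unlock();
      }
      // The lock is free here, so waiting readers and writers can make progress.
    }
  }

  int lockAcquisitions() {
    return lockAcquisitions;
  }

  List<Long> deleted() {
    return deleted;
  }

  public static void main(String[] args) {
    List<Long> ids = new ArrayList<>();
    for (long i = 0; i < 10; i++) {
      ids.add(i);
    }
    BatchedInvalidator inv = new BatchedInvalidator();
    inv.invalidate(ids, 4);
    System.out.println(inv.lockAcquisitions() + " lock acquisitions for "
        + inv.deleted().size() + " blocks");
  }
}
```

The trade-off in this sketch is more lock acquisitions in exchange for a shorter maximum hold time; a suitable batch size would have to be measured on a real cluster.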



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15177) Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too many time.

2020-02-17 Thread zhuqi (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuqi updated HDFS-15177:
-
Description: 
In our cluster, the datanode receive the delete command with too many blocks 
deletion when we have many blockpools sharing the same datanode and the 
datanode with about 30 storage dirs, it will cause the FsDatasetImpl lock too 
much time.

 

  was:
In our cluster, the datanode receive the delete command with too many blocks 
deletion when we have many blockpools sharing the same datanode, it will cause 
the FsDatasetImpl lock too much time.

 


> Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too 
> many time.
> --
>
> Key: HDFS-15177
> URL: https://issues.apache.org/jira/browse/HDFS-15177
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: zhuqi
>Assignee: zhuqi
>Priority: Major
>
> In our cluster, a datanode can receive a delete command covering too many 
> blocks when many blockpools share the same datanode and the datanode has 
> about 30 storage dirs. This causes the FsDatasetImpl lock to be held for too 
> long.
>  






[jira] [Created] (HDFS-15178) Federation: Add missing FederationClientInterceptor APIs

2020-02-17 Thread D M Murali Krishna Reddy (Jira)
D M Murali Krishna Reddy created HDFS-15178:
---

 Summary: Federation: Add missing FederationClientInterceptor APIs
 Key: HDFS-15178
 URL: https://issues.apache.org/jira/browse/HDFS-15178
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: federation
Reporter: D M Murali Krishna Reddy


In FederationClientInterceptor, many APIs are not implemented:
 * getClusterNodes
 * getQueueInfo
 * getQueueUserAcls
 * moveApplicationAcrossQueues
 * getNewReservation
 * submitReservation
 * listReservations
 * updateReservation
 * deleteReservation
 * getNodeToLabels
 * getLabelsToNodes
 * getClusterNodeLabels
 * getApplicationAttemptReport
 * getApplicationAttempts
 * getContainerReport
 * getContainers
 * getDelegationToken
 * renewDelegationToken
 * cancelDelegationToken
 * failApplicationAttempt
 * updateApplicationPriority
 * signalToContainer
 * updateApplicationTimeouts
 * getResourceProfiles
 * getResourceProfile
 * getResourceTypeInfo
 * getAttributesToNodes
 * getClusterNodeAttributes
 * getNodesToAttributes






[jira] [Updated] (HDFS-15177) Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too many time.

2020-02-17 Thread zhuqi (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuqi updated HDFS-15177:
-
Description: 
In our cluster, the datanode receive the delete command with too many blocks 
deletion when we have many blockpools sharing the same datanode, it will cause 
the FsDatasetImpl lock too much time.

 

  was:
In our cluster , the datanode receive the delete command with too many blocks 
deletion when we have many blockpools sharing the same datanode, it will cause 
the FsDatasetImpl lock too much time.

 


> Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too 
> many time.
> --
>
> Key: HDFS-15177
> URL: https://issues.apache.org/jira/browse/HDFS-15177
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: zhuqi
>Assignee: zhuqi
>Priority: Major
>
> In our cluster, a datanode can receive a delete command covering too many 
> blocks when many blockpools share the same datanode. This causes the 
> FsDatasetImpl lock to be held for too long.
>  






[jira] [Created] (HDFS-15177) Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too many time.

2020-02-17 Thread zhuqi (Jira)
zhuqi created HDFS-15177:


 Summary: Split datanode invalide block deletion, to avoid the 
FsDatasetImpl lock too many time.
 Key: HDFS-15177
 URL: https://issues.apache.org/jira/browse/HDFS-15177
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: zhuqi
Assignee: zhuqi


In our cluster, a datanode can receive a delete command covering too many 
blocks when many blockpools share the same datanode. This causes the 
FsDatasetImpl lock to be held for too long.

 






[jira] [Commented] (HDFS-15120) Refresh BlockPlacementPolicy at runtime.

2020-02-17 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17038793#comment-17038793
 ] 

Hadoop QA commented on HDFS-15120:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
44s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 35s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 42s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 3 new + 348 unchanged - 0 fixed = 351 total (was 348) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 46s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}120m  
7s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
44s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}187m 32s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.6 Server=19.03.6 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15120 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12993730/HDFS-15120.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 8e44025ebd0c 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a562942 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28794/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28794/testReport/ |
| Max. process+thread count | 3048 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28794/console |
| Powered by | Apache Yetus 0.8.0   

[jira] [Updated] (HDFS-15176) Enable GcTimePercentage Metric in NameNode's JvmMetrics.

2020-02-17 Thread Jinglun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinglun updated HDFS-15176:
---
Attachment: HDFS-15176.001.patch
Status: Patch Available  (was: Open)

> Enable GcTimePercentage Metric in NameNode's JvmMetrics.
> 
>
> Key: HDFS-15176
> URL: https://issues.apache.org/jira/browse/HDFS-15176
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Minor
> Attachments: HDFS-15176.001.patch
>
>
> The GcTimePercentage (computed by GcTimeMonitor) could be used as a dimension 
> to analyze the NameNode GC.  We should add a switch config to enable the 
> GcTimePercentage metric in HDFS.
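
To make the metric concrete, here is a minimal sketch of how a GC time percentage can be derived from standard JVM management beans. It is illustrative only and is not Hadoop's GcTimeMonitor; the class name `GcPercentSketch`, the 200 ms window, and the clamping to 0..100 are assumptions made for the example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative only; this is not Hadoop's GcTimeMonitor.
class GcPercentSketch {
  /** Total accumulated GC time in ms across all collectors reported by the JVM. */
  static long totalGcTimeMs() {
    long total = 0;
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      long t = gc.getCollectionTime(); // -1 if the collector does not report time
      if (t > 0) {
        total += t;
      }
    }
    return total;
  }

  /** Percentage (0..100) of windowMs spent in GC, given the GC time accrued in it. */
  static int percentage(long gcTimeMs, long windowMs) {
    if (windowMs <= 0) {
      return 0;
    }
    long pct = gcTimeMs * 100 / windowMs;
    return (int) Math.max(0, Math.min(100, pct));
  }

  public static void main(String[] args) throws InterruptedException {
    long gcBefore = totalGcTimeMs();
    long start = System.nanoTime();
    Thread.sleep(200); // assumed observation window for the example
    long windowMs = (System.nanoTime() - start) / 1_000_000;
    long gcDelta = totalGcTimeMs() - gcBefore;
    System.out.println("GC time percentage over window: "
        + percentage(gcDelta, windowMs) + "%");
  }
}
```

A switch config as proposed in the issue would simply decide whether such sampling runs at all; no config key name is given in this thread, so none is assumed here.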






[jira] [Created] (HDFS-15176) Enable GcTimePercentage Metric in NameNode's JvmMetrics.

2020-02-17 Thread Jinglun (Jira)
Jinglun created HDFS-15176:
--

 Summary: Enable GcTimePercentage Metric in NameNode's JvmMetrics.
 Key: HDFS-15176
 URL: https://issues.apache.org/jira/browse/HDFS-15176
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jinglun
Assignee: Jinglun


The GcTimePercentage (computed by GcTimeMonitor) could be used as a dimension to 
analyze the NameNode GC.  We should add a switch config to enable the 
GcTimePercentage metric in HDFS.






[jira] [Updated] (HDFS-15120) Refresh BlockPlacementPolicy at runtime.

2020-02-17 Thread Jinglun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinglun updated HDFS-15120:
---
Attachment: HDFS-15120.003.patch

> Refresh BlockPlacementPolicy at runtime.
> 
>
> Key: HDFS-15120
> URL: https://issues.apache.org/jira/browse/HDFS-15120
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-15120.001.patch, HDFS-15120.002.patch, 
> HDFS-15120.003.patch
>
>
> Currently, switching BlockPlacementPolicies requires restarting the 
> NameNode. It would be convenient if we could switch at runtime. For example, 
> we could switch between AvailableSpaceBlockPlacementPolicy and 
> BlockPlacementPolicyDefault as needed.
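
As a rough illustration of what a runtime switch can look like, the sketch below keeps the active policy behind an atomic reference so that it can be replaced without a restart. All names here (`PolicyHolderSketch`, `PlacementPolicy`, `FirstNodePolicy`, `LastNodePolicy`) are hypothetical stand-ins and not the NameNode, BlockManager, or BlockPlacementPolicy classes; the attached patches may take a different approach.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-ins; not the actual HDFS block placement classes.
class PolicyHolderSketch {
  interface PlacementPolicy {
    String chooseTarget(String[] candidates);
  }

  /** Toy policy: always picks the first candidate. */
  static class FirstNodePolicy implements PlacementPolicy {
    public String chooseTarget(String[] candidates) {
      return candidates[0];
    }
  }

  /** Toy policy: always picks the last candidate. */
  static class LastNodePolicy implements PlacementPolicy {
    public String chooseTarget(String[] candidates) {
      return candidates[candidates.length - 1];
    }
  }

  private final AtomicReference<PlacementPolicy> current;

  PolicyHolderSketch(PlacementPolicy initial) {
    current = new AtomicReference<>(initial);
  }

  /** Swaps the active policy and returns the previous one. */
  PlacementPolicy refresh(PlacementPolicy next) {
    return current.getAndSet(next);
  }

  /** Each call reads the reference once, so it sees either the old or the new policy. */
  String chooseTarget(String[] candidates) {
    return current.get().chooseTarget(candidates);
  }

  public static void main(String[] args) {
    String[] nodes = {"dn1", "dn2", "dn3"};
    PolicyHolderSketch holder = new PolicyHolderSketch(new FirstNodePolicy());
    System.out.println("before refresh: " + holder.chooseTarget(nodes));
    holder.refresh(new LastNodePolicy());
    System.out.println("after refresh: " + holder.chooseTarget(nodes));
  }
}
```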






[jira] [Commented] (HDFS-15172) Remove unnecessary deadNodeDetectInterval in DeadNodeDetector#checkDeadNodes()

2020-02-17 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17038721#comment-17038721
 ] 

Lisheng Sun commented on HDFS-15172:


Thanks [~elgoiri] for the review.
This jira addresses the excessively frequent check. HDFS-15149 should not 
include this part of the modification; it is supposed to solve the problem that 
DeadNodeDetector suppresses all interrupts and never checks for a termination 
flag. I think these two problems are better handled in two separate jiras. 

> Remove unnecessary  deadNodeDetectInterval in 
> DeadNodeDetector#checkDeadNodes()
> ---
>
> Key: HDFS-15172
> URL: https://issues.apache.org/jira/browse/HDFS-15172
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-15172-001.patch, HDFS-15172-002.patch
>
>
> Every call to checkDeadNodes() will change the state to IDLE forcing the 
> DeadNodeDetector to sleep for IDLE_SLEEP_MS. So we don't need 
> deadNodeDetectInterval between every checkDeadNodes().
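
The argument above can be checked with a small simulation. This is a sketch under stated assumptions, not the DeadNodeDetector source: the two-state loop, the `simulate` helper, and the value given to `IDLE_SLEEP_MS` are all illustrative. Time is accumulated in a counter so that no real sleeping takes place.

```java
// Illustrative model of a detector loop; not the DeadNodeDetector source.
class DetectorLoopSketch {
  enum State { CHECK_DEAD, IDLE }

  static final long IDLE_SLEEP_MS = 10_000; // illustrative value

  /** Returns the simulated time in ms needed to complete the given number of cycles. */
  static long simulate(int cycles, long extraIntervalMs) {
    long elapsed = 0;
    int completed = 0;
    State state = State.CHECK_DEAD;
    while (completed < cycles) {
      switch (state) {
        case CHECK_DEAD:
          elapsed += extraIntervalMs; // the extra wait the issue proposes to remove
          state = State.IDLE;         // every check ends in IDLE
          break;
        case IDLE:
          elapsed += IDLE_SLEEP_MS;   // stands in for sleeping IDLE_SLEEP_MS
          state = State.CHECK_DEAD;
          completed++;
          break;
      }
    }
    return elapsed;
  }

  public static void main(String[] args) {
    System.out.println("3 cycles without extra interval: " + simulate(3, 0) + " ms");
    System.out.println("3 cycles with a 5000 ms interval: " + simulate(3, 5_000) + " ms");
  }
}
```

In this model the idle sleep already spaces out consecutive checks, so the extra interval only lengthens each cycle.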






[jira] [Commented] (HDFS-15149) TestDeadNodeDetection test cases time-out

2020-02-17 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17038673#comment-17038673
 ] 

Ahmed Hussein commented on HDFS-15149:
--

The poll period and waiting time (5000 and 10) in {{waitFoDeadNode}} are 
very large. I assume you had to use large numbers to match the delays of the 
detector threads.
I have a question about {{clearAndGetDetectedDeadNodes()}}: as far as I 
understand, calling the method in a loop means that a "deadnode" can be removed 
from the {{deadNodes}} map. In other words, the count may never reach 3, 
because the map does not account for the nodes removed from the list. Please 
feel free to correct my understanding of the code if I am wrong.

I did not find the implementation of the {{DeadNodeDetector}} easy to 
understand. It is very challenging to avoid timeouts when there are multiple 
threads running in parallel: {{DeadNodeDetector}}, {{Probe}}, and 
{{ProbeSchedulers}}.

IMHO, {{DeadNodeDetector.java}} needs to introduce more aggressive mechanisms 
to coordinate between the threads. Instead of just racing against each other, 
tasks can use condition variables to communicate, such as synchronized queues or 
object monitors. Another benefit of using condition variables is that the 
runtime of the tests will improve because there won't be a need to wait for a 
full cycle.
The {{DefaultSpeculator.java}} has a synchronized queue just for the purpose of 
testing: "{{DefaultSpeculator.scanControl}}".
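
The coordination idea can be sketched with a blocking queue: the background thread publishes a token after each completed scan, and the test blocks on the queue instead of polling with long sleeps. This is a minimal illustration under assumed names (`ScanSignalSketch`, `startDetector`, `awaitScan`); it is neither the DeadNodeDetector code nor the DefaultSpeculator mechanism mentioned above.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical test hook; not part of DeadNodeDetector or DefaultSpeculator.
class ScanSignalSketch {
  // The detector thread offers one token per completed scan.
  private final BlockingQueue<Integer> scanDone = new LinkedBlockingQueue<>();

  /** Starts a stand-in detector thread that performs the given number of scans. */
  Thread startDetector(int scans) {
    Thread t = new Thread(() -> {
      for (int i = 1; i <= scans; i++) {
        // ... a real detector would probe nodes here ...
        scanDone.offer(i); // signal that scan i has finished
      }
    }, "detector-sketch");
    t.setDaemon(true);
    t.start();
    return t;
  }

  /** Waits for the next finished scan; returns its number, or -1 on timeout or interrupt. */
  int awaitScan(long timeoutMs) {
    try {
      Integer n = scanDone.poll(timeoutMs, TimeUnit.MILLISECONDS);
      return n == null ? -1 : n;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
      return -1;
    }
  }

  public static void main(String[] args) {
    ScanSignalSketch sketch = new ScanSignalSketch();
    sketch.startDetector(2);
    System.out.println("first scan finished: " + sketch.awaitScan(5000));
    System.out.println("second scan finished: " + sketch.awaitScan(5000));
  }
}
```

With a hook like this the test returns as soon as the scan completes, and the timeout only bounds the failure case.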

 

> TestDeadNodeDetection test cases time-out
> -
>
> Key: HDFS-15149
> URL: https://issues.apache.org/jira/browse/HDFS-15149
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-15149-001.patch
>
>
> TestDeadNodeDetection JUnit tests time out with the following stack 
> traces:
> * 1- testDeadNodeDetectionInBackground*
> {code:bash}
> [ERROR] Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 264.757 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDeadNodeDetection
> [ERROR] 
> testDeadNodeDetectionInBackground(org.apache.hadoop.hdfs.TestDeadNodeDetection)
>   Time elapsed: 125.806 s  <<< ERROR!
> java.util.concurrent.TimeoutException: 
> Timed out waiting for condition. Thread diagnostics:
> Timestamp: 2020-01-24 08:31:07,023
> "client DomainSocketWatcher" daemon prio=5 tid=117 runnable
> java.lang.Thread.State: RUNNABLE
> at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native 
> Method)
> at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
> at 
> org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:503)
> at java.lang.Thread.run(Thread.java:748)
> "Session-HouseKeeper-48c3205a"  prio=5 tid=350 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at sun.misc.Unsafe.park(Native Method)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> "java.util.concurrent.ThreadPoolExecutor$Worker@3ae54156[State = -1, empty 
> queue]" daemon prio=5 tid=752 in Object.wait()
> java.lang.Thread.State: WAITING (on object monitor)
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> "CacheReplicationMonitor(1960356187)"  prio=5 tid=386 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at sun.misc.Unsafe.park(Native Method)
> at 
> 

[jira] [Commented] (HDFS-15167) Block Report Interval shouldn't be reset apart from first Block Report

2020-02-17 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17038665#comment-17038665
 ] 

Hadoop QA commented on HDFS-15167:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m  
0s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 31m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
46s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  9m 
22s{color} | {color:red} branch has errors when building and testing our client 
artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
27s{color} | {color:red} hadoop-hdfs in trunk failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
7s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
35s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
20m 15s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}119m 40s{color} 
| {color:red} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
36s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}200m 29s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDeadNodeDetection |
|   | hadoop.hdfs.server.blockmanagement.TestReconstructStripedBlocksWithRackAwareness |
|   | hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock |
|   | hadoop.hdfs.server.blockmanagement.TestBlockInfoStriped |
|   | hadoop.hdfs.TestRollingUpgrade |
|   | hadoop.hdfs.server.blockmanagement.TestPendingReconstruction |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.6 Server=19.03.6 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15167 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12993713/HDFS-15167-08.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux dd042c1ae88c 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 439d935 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
| findbugs | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28793/artifact/out/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| mvninstall | 

[jira] [Commented] (HDFS-12459) Fix revert: Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API

2020-02-17 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17038658#comment-17038658
 ] 

Kihwal Lee commented on HDFS-12459:
---

We discovered HDFS-11156 was still in branch-2.10. Unreverted, it caused 
problems in testing. I reverted it from branch-2.10 and cherry-picked this Jira 
to branch-3.1 and branch-2.10.

> Fix revert: Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API
> 
>
> Key: HDFS-12459
> URL: https://issues.apache.org/jira/browse/HDFS-12459
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Fix For: 3.2.0, 3.1.4, 2.10.1
>
> Attachments: HDFS-12459.001.patch, HDFS-12459.002.patch, 
> HDFS-12459.003.patch, HDFS-12459.004.patch, HDFS-12459.005.patch, 
> HDFS-12459.006.patch, HDFS-12459.006.patch, HDFS-12459.007.patch, 
> HDFS-12459.008.patch
>
>
> HDFS-11156 was reverted because the implementation was non-optimal. Based on 
> the suggestion from [~shahrs87], we should avoid creating a dfs client to get 
> block locations because that creates an extra RPC call. Instead we should use 
> {{NamenodeProtocols#getBlockLocations}} and then convert {{LocatedBlocks}} to 
> {{BlockLocation[]}}.






[jira] [Updated] (HDFS-12459) Fix revert: Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API

2020-02-17 Thread Kihwal Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-12459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-12459:
--
Fix Version/s: 2.10.1
   3.1.4

> Fix revert: Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API
> 
>
> Key: HDFS-12459
> URL: https://issues.apache.org/jira/browse/HDFS-12459
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Fix For: 3.2.0, 3.1.4, 2.10.1
>
> Attachments: HDFS-12459.001.patch, HDFS-12459.002.patch, 
> HDFS-12459.003.patch, HDFS-12459.004.patch, HDFS-12459.005.patch, 
> HDFS-12459.006.patch, HDFS-12459.006.patch, HDFS-12459.007.patch, 
> HDFS-12459.008.patch
>
>
> HDFS-11156 was reverted because the implementation was not optimal. Based on 
> the suggestion from [~shahrs87], we should avoid creating a DFS client to get 
> block locations because that creates an extra RPC call. Instead, we should use 
> {{NamenodeProtocols#getBlockLocations}} and then convert {{LocatedBlocks}} to 
> {{BlockLocation[]}}.






[jira] [Commented] (HDFS-11156) Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API

2020-02-17 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17038653#comment-17038653
 ] 

Kihwal Lee commented on HDFS-11156:
---

Reverted from only 2.10 for now. I also cherry-picked HDFS-12459.

> Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API
> 
>
> Key: HDFS-11156
> URL: https://issues.apache.org/jira/browse/HDFS-11156
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 2.7.3
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Fix For: 3.0.0-alpha2
>
> Attachments: BlockLocationProperties_JSON_Schema.jpg, 
> BlockLocations_JSON_Schema.jpg, FileStatuses_JSON_Schema.jpg, 
> HDFS-11156-branch-2.01.patch, HDFS-11156.01.patch, HDFS-11156.02.patch, 
> HDFS-11156.03.patch, HDFS-11156.04.patch, HDFS-11156.05.patch, 
> HDFS-11156.06.patch, HDFS-11156.07.patch, HDFS-11156.08.patch, 
> HDFS-11156.09.patch, HDFS-11156.10.patch, HDFS-11156.11.patch, 
> HDFS-11156.12.patch, HDFS-11156.13.patch, HDFS-11156.14.patch, 
> HDFS-11156.15.patch, HDFS-11156.16.patch, Output_JSON_format_v10.jpg, 
> SampleResponse_JSON.jpg
>
>
> Following webhdfs REST API
> {code}
> http://:/webhdfs/v1/?op=GET_BLOCK_LOCATIONS=0=1
> {code}
> will get a response like
> {code}
> {
>   "LocatedBlocks" : {
>     "fileLength" : 1073741824,
>     "isLastBlockComplete" : true,
>     "isUnderConstruction" : false,
>     "lastLocatedBlock" : { ... },
>     "locatedBlocks" : [ {...} ]
>   }
> }
> {code}
> This represents *o.a.h.h.p.LocatedBlocks*. However, according to the 
> *FileSystem* API, 
> {code}
> public BlockLocation[] getFileBlockLocations(Path p, long start, long len)
> {code}
> clients would expect an array of BlockLocation. This mismatch should be 
> fixed. Marked as Incompatible change as this will change the output of the 
> GET_BLOCK_LOCATIONS API.






[jira] [Commented] (HDFS-11156) Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API

2020-02-17 Thread Kihwal Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17038590#comment-17038590
 ] 

Kihwal Lee commented on HDFS-11156:
---

Contrary to the Fix Version (3.0.0-alpha2) of this jira, the change was also 
committed to branch-2.10 (formerly branch-2) and branch-2.9. Those commits were 
not reverted, and this is causing problems in our 2.10 testing. I will revert 
them and see if HDFS-12459 can be applied.

> Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API
> 
>
> Key: HDFS-11156
> URL: https://issues.apache.org/jira/browse/HDFS-11156
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 2.7.3
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Fix For: 3.0.0-alpha2
>
> Attachments: BlockLocationProperties_JSON_Schema.jpg, 
> BlockLocations_JSON_Schema.jpg, FileStatuses_JSON_Schema.jpg, 
> HDFS-11156-branch-2.01.patch, HDFS-11156.01.patch, HDFS-11156.02.patch, 
> HDFS-11156.03.patch, HDFS-11156.04.patch, HDFS-11156.05.patch, 
> HDFS-11156.06.patch, HDFS-11156.07.patch, HDFS-11156.08.patch, 
> HDFS-11156.09.patch, HDFS-11156.10.patch, HDFS-11156.11.patch, 
> HDFS-11156.12.patch, HDFS-11156.13.patch, HDFS-11156.14.patch, 
> HDFS-11156.15.patch, HDFS-11156.16.patch, Output_JSON_format_v10.jpg, 
> SampleResponse_JSON.jpg
>
>
> Following webhdfs REST API
> {code}
> http://:/webhdfs/v1/?op=GET_BLOCK_LOCATIONS=0=1
> {code}
> will get a response like
> {code}
> {
>   "LocatedBlocks" : {
>     "fileLength" : 1073741824,
>     "isLastBlockComplete" : true,
>     "isUnderConstruction" : false,
>     "lastLocatedBlock" : { ... },
>     "locatedBlocks" : [ {...} ]
>   }
> }
> {code}
> This represents *o.a.h.h.p.LocatedBlocks*. However, according to the 
> *FileSystem* API, 
> {code}
> public BlockLocation[] getFileBlockLocations(Path p, long start, long len)
> {code}
> clients would expect an array of BlockLocation. This mismatch should be 
> fixed. Marked as Incompatible change as this will change the output of the 
> GET_BLOCK_LOCATIONS API.






[jira] [Commented] (HDFS-15167) Block Report Interval shouldn't be reset apart from first Block Report

2020-02-17 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17038573#comment-17038573
 ] 

Ayush Saxena commented on HDFS-15167:
-

Handled in v8

> Block Report Interval shouldn't be reset apart from first Block Report
> --
>
> Key: HDFS-15167
> URL: https://issues.apache.org/jira/browse/HDFS-15167
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15167-01.patch, HDFS-15167-02.patch, 
> HDFS-15167-03.patch, HDFS-15167-04.patch, HDFS-15167-05.patch, 
> HDFS-15167-06.patch, HDFS-15167-07.patch, HDFS-15167-08.patch
>
>
> Presently the BlockReport interval is reset even when the BR is manually 
> triggered or is triggered for a diskError, which isn't required. As per the 
> code comment, the reset is intended for the first BR only:
> {code:java}
>   // If we have sent the first set of block reports, then wait a random
>   // time before we start the periodic block reports.
>   if (resetBlockReportTime) {
> {code}
>  
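
The intended behavior can be sketched as below. This is a hedged illustration, not the patch itself: BlockReportSchedulerSketch, its Trigger enum and its method names are hypothetical and do not mirror the DataNode's real scheduler API. It only shows a schedule that is randomized after the first report and left untouched by manual or disk-error reports.

```java
import java.util.Random;

// Hypothetical sketch; names are illustrative, not the DataNode's real API.
class BlockReportSchedulerSketch {
  enum Trigger { FIRST, PERIODIC, MANUAL, DISK_ERROR }

  private final long intervalMs;
  private final Random random;
  private long nextBlockReportMs;

  BlockReportSchedulerSketch(long intervalMs, long seed) {
    this.intervalMs = intervalMs;
    this.random = new Random(seed);
  }

  long getNextBlockReportMs() {
    return nextBlockReportMs;
  }

  // Records that a block report was sent at nowMs and updates the schedule.
  void onBlockReportSent(long nowMs, Trigger trigger) {
    switch (trigger) {
      case FIRST:
        // After the first report, wait a random time within one interval
        // before the periodic reports start.
        nextBlockReportMs = nowMs + (long) (random.nextDouble() * intervalMs);
        break;
      case PERIODIC:
        // Regular reports advance the schedule by exactly one interval.
        nextBlockReportMs += intervalMs;
        break;
      default:
        // MANUAL and DISK_ERROR reports leave the periodic schedule untouched.
        break;
    }
  }

  public static void main(String[] args) {
    BlockReportSchedulerSketch s = new BlockReportSchedulerSketch(1000L, 42L);
    s.onBlockReportSent(5000L, Trigger.FIRST);
    long afterFirst = s.getNextBlockReportMs();
    s.onBlockReportSent(5200L, Trigger.MANUAL);
    System.out.println(afterFirst == s.getNextBlockReportMs());
  }
}
```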






[jira] [Updated] (HDFS-15167) Block Report Interval shouldn't be reset apart from first Block Report

2020-02-17 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15167:

Attachment: HDFS-15167-08.patch

> Block Report Interval shouldn't be reset apart from first Block Report
> --
>
> Key: HDFS-15167
> URL: https://issues.apache.org/jira/browse/HDFS-15167
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15167-01.patch, HDFS-15167-02.patch, 
> HDFS-15167-03.patch, HDFS-15167-04.patch, HDFS-15167-05.patch, 
> HDFS-15167-06.patch, HDFS-15167-07.patch, HDFS-15167-08.patch
>
>
> Presently the BlockReport interval is reset even when the BR is manually 
> triggered or is triggered for a diskError, which isn't required. As per the 
> code comment, the reset is intended for the first BR only:
> {code:java}
>   // If we have sent the first set of block reports, then wait a random
>   // time before we start the periodic block reports.
>   if (resetBlockReportTime) {
> {code}
>  






[jira] [Commented] (HDFS-15172) Remove unnecessary deadNodeDetectInterval in DeadNodeDetector#checkDeadNodes()

2020-02-17 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17038568#comment-17038568
 ] 

Íñigo Goiri commented on HDFS-15172:


It looks like you are doing part of this in HDFS-15149 too.
What do you want to do with this?

> Remove unnecessary  deadNodeDetectInterval in 
> DeadNodeDetector#checkDeadNodes()
> ---
>
> Key: HDFS-15172
> URL: https://issues.apache.org/jira/browse/HDFS-15172
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-15172-001.patch, HDFS-15172-002.patch
>
>
> Every call to checkDeadNodes() will change the state to IDLE forcing the 
> DeadNodeDetector to sleep for IDLE_SLEEP_MS. So we don't need 
> deadNodeDetectInterval between every checkDeadNodes().
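
The argument above can be illustrated with a small state-machine sketch. It is an assumption-laden simplification: DetectorLoopSketch, its two states and the IDLE_SLEEP_MS value are hypothetical and chosen for illustration, and the sleep is injected as a callback so the example runs without real waiting. It shows that if every check moves the loop to IDLE, each check is already followed by one idle sleep.

```java
import java.util.function.LongConsumer;

// Hypothetical simplification of a detector loop; not the real DeadNodeDetector.
class DetectorLoopSketch {
  enum State { CHECK_DEAD, IDLE }

  // Assumed value, for illustration only.
  static final long IDLE_SLEEP_MS = 10_000L;

  private final LongConsumer sleeper;
  private State state = State.CHECK_DEAD;
  private int checks;

  DetectorLoopSketch(LongConsumer sleeper) {
    this.sleeper = sleeper;
  }

  int getChecks() {
    return checks;
  }

  // Every check ends by moving the loop to IDLE.
  private void checkDeadNodes() {
    checks++;
    state = State.IDLE;
  }

  // IDLE sleeps once, then allows the next check.
  private void idle() {
    sleeper.accept(IDLE_SLEEP_MS);
    state = State.CHECK_DEAD;
  }

  // Runs one iteration of the loop.
  void step() {
    if (state == State.CHECK_DEAD) {
      checkDeadNodes();
    } else {
      idle();
    }
  }

  public static void main(String[] args) {
    long[] slept = {0L};
    DetectorLoopSketch d = new DetectorLoopSketch(ms -> slept[0] += ms);
    for (int i = 0; i < 6; i++) {
      d.step();
    }
    System.out.println(d.getChecks() + " checks, " + slept[0] + " ms idle");
  }
}
```

Because the idle sleep already spaces the checks, an additional per-check interval would only add delay on top of it.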






[jira] [Commented] (HDFS-15149) TestDeadNodeDetection test cases time-out

2020-02-17 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17038565#comment-17038565
 ] 

Íñigo Goiri commented on HDFS-15149:


The part about manually enabling/disabling the thread is not the cleanest, but 
I don't think there is a better way.
I like the rest of the solution.
However, it looks like testDeadNodeDetectionInBackground is not handling this 
well.

> TestDeadNodeDetection test cases time-out
> -
>
> Key: HDFS-15149
> URL: https://issues.apache.org/jira/browse/HDFS-15149
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-15149-001.patch
>
>
> TestDeadNodeDetection JUnit tests time out with the following stack 
> traces:
> *1- testDeadNodeDetectionInBackground*
> {code:bash}
> [ERROR] Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 264.757 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDeadNodeDetection
> [ERROR] 
> testDeadNodeDetectionInBackground(org.apache.hadoop.hdfs.TestDeadNodeDetection)
>   Time elapsed: 125.806 s  <<< ERROR!
> java.util.concurrent.TimeoutException: 
> Timed out waiting for condition. Thread diagnostics:
> Timestamp: 2020-01-24 08:31:07,023
> "client DomainSocketWatcher" daemon prio=5 tid=117 runnable
> java.lang.Thread.State: RUNNABLE
> at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native 
> Method)
> at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
> at 
> org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:503)
> at java.lang.Thread.run(Thread.java:748)
> "Session-HouseKeeper-48c3205a"  prio=5 tid=350 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at sun.misc.Unsafe.park(Native Method)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> "java.util.concurrent.ThreadPoolExecutor$Worker@3ae54156[State = -1, empty 
> queue]" daemon prio=5 tid=752 in Object.wait()
> java.lang.Thread.State: WAITING (on object monitor)
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> "CacheReplicationMonitor(1960356187)"  prio=5 tid=386 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at sun.misc.Unsafe.park(Native Method)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor.run(CacheReplicationMonitor.java:181)
> "Timer for 'NameNode' metrics system" daemon prio=5 tid=339 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at java.lang.Object.wait(Native Method)
> at java.util.TimerThread.mainLoop(Timer.java:552)
> at java.util.TimerThread.run(Timer.java:505)
> "org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber@6b760460"
>  daemon prio=5 tid=385 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.run(FSNamesystem.java:4420)
> at java.lang.Thread.run(Thread.java:748)
> "qtp164757726-349" daemon prio=5 tid=349 runnable
> java.lang.Thread.State: 

[jira] [Commented] (HDFS-15167) Block Report Interval shouldn't be reset apart from first Block Report

2020-02-17 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17038562#comment-17038562
 ] 

Íñigo Goiri commented on HDFS-15167:


This looks good, just a minor comment.
In the javadoc, where we define delay, let's add the unit (milliseconds, 
right?) and mention that 0 or smaller sends it right away.

> Block Report Interval shouldn't be reset apart from first Block Report
> --
>
> Key: HDFS-15167
> URL: https://issues.apache.org/jira/browse/HDFS-15167
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15167-01.patch, HDFS-15167-02.patch, 
> HDFS-15167-03.patch, HDFS-15167-04.patch, HDFS-15167-05.patch, 
> HDFS-15167-06.patch, HDFS-15167-07.patch
>
>
> Presently the BlockReport interval is reset even when the BR is manually 
> triggered or is triggered for a diskError, which isn't required. As per the 
> code comment, the reset is intended for the first BR only:
> {code:java}
>   // If we have sent the first set of block reports, then wait a random
>   // time before we start the periodic block reports.
>   if (resetBlockReportTime) {
> {code}
>  






[jira] [Commented] (HDFS-15149) TestDeadNodeDetection test cases time-out

2020-02-17 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17038552#comment-17038552
 ] 

Hadoop QA commented on HDFS-15149:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 52s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 24s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m  5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 20m 40s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m  5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  3m 57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 33s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 10s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}116m 15s{color} | {color:red} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}212m 37s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDeadNodeDetection |
|   | hadoop.hdfs.TestReconstructStripedFile |
|   | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.6 Server=19.03.6 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15149 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12993694/HDFS-15149-001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux eddba98369de 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 439d935 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
| 

[jira] [Commented] (HDFS-15167) Block Report Interval shouldn't be reset apart from first Block Report

2020-02-17 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17038510#comment-17038510
 ] 

Hadoop QA commented on HDFS-15167:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 51s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m  7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 56s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 39s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m  1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 16s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 42s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}116m 21s{color} | {color:red} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 42s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}182m 29s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDeadNodeDetection |
|   | hadoop.hdfs.qjournal.server.TestJournalNodeSync |
|   | hadoop.hdfs.server.namenode.ha.TestHAAppend |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.6 Server=19.03.6 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15167 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12993683/HDFS-15167-07.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 55146fa4a128 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 439d935 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/28791/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/28791/testReport/ |
| Max. process+thread count | 2982 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | 

[jira] [Commented] (HDFS-15104) If block is not reported by any Datanode, the flag corrupt of BlockLocation should be marked as true.

2020-02-17 Thread Yang Yun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17038421#comment-17038421
 ] 

Yang Yun commented on HDFS-15104:
-

For the block, it's missing. But for the file, it's corrupt, as fsck shows.

I opened this because the behavior differs between 3.3 and 2.6. The 2.6 code is 
shown below; it takes the missing case into account. Did we intentionally 
change this behavior?

final boolean isCorrupt = numCorruptNodes == numNodes;
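
To make the edge case concrete, here is a small sketch. The first method is the expression quoted above; the second is a hypothetical stricter variant written only for contrast, and it is an assumption, not code taken from any Hadoop release. The class and method names are invented for this example.

```java
class CorruptFlagDemo {
  // The quoted expression: true whenever the two counts match, which
  // includes the case where both are zero (no replica reported at all).
  static boolean isCorruptAsQuoted(int numCorruptNodes, int numNodes) {
    return numCorruptNodes == numNodes;
  }

  // Hypothetical stricter variant (an assumption, for contrast only):
  // requires at least one corrupt replica before flagging the block.
  static boolean isCorruptRequiringReplica(int numCorruptNodes, int numNodes) {
    return numCorruptNodes > 0 && numCorruptNodes == numNodes;
  }

  public static void main(String[] args) {
    // A block that no datanode has reported: zero replicas, zero corrupt.
    System.out.println(isCorruptAsQuoted(0, 0));
    System.out.println(isCorruptRequiringReplica(0, 0));
  }
}
```

With zero reported replicas the two variants disagree, which is the kind of behavioral difference the question above is about.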

 

 

> If block is  not reported by any Datanode, the flag corrupt of BlockLocation 
> should be marked as true.
> --
>
> Key: HDFS-15104
> URL: https://issues.apache.org/jira/browse/HDFS-15104
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Yang Yun
>Assignee: Yang Yun
>Priority: Major
> Attachments: HDFS-15104.patch
>
>
> The corrupt flag of the BlockLocation returned from getFileBlockLocations() is 
> not marked true even when the block is not reported by any Datanode (the hosts 
> list is empty).






[jira] [Updated] (HDFS-15149) TestDeadNodeDetection test cases time-out

2020-02-17 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-15149:
---
Attachment: HDFS-15149-001.patch
Status: Patch Available  (was: Open)

> TestDeadNodeDetection test cases time-out
> -
>
> Key: HDFS-15149
> URL: https://issues.apache.org/jira/browse/HDFS-15149
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-15149-001.patch
>
>
> TestDeadNodeDetection JUnit tests time out with the following stack 
> traces:
> *1- testDeadNodeDetectionInBackground*
> {code:bash}
> [ERROR] Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 264.757 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDeadNodeDetection
> [ERROR] 
> testDeadNodeDetectionInBackground(org.apache.hadoop.hdfs.TestDeadNodeDetection)
>   Time elapsed: 125.806 s  <<< ERROR!
> java.util.concurrent.TimeoutException: 
> Timed out waiting for condition. Thread diagnostics:
> Timestamp: 2020-01-24 08:31:07,023
> "client DomainSocketWatcher" daemon prio=5 tid=117 runnable
> java.lang.Thread.State: RUNNABLE
> at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native 
> Method)
> at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
> at 
> org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:503)
> at java.lang.Thread.run(Thread.java:748)
> "Session-HouseKeeper-48c3205a"  prio=5 tid=350 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at sun.misc.Unsafe.park(Native Method)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> "java.util.concurrent.ThreadPoolExecutor$Worker@3ae54156[State = -1, empty queue]" daemon prio=5 tid=752 in Object.wait()
> java.lang.Thread.State: WAITING (on object monitor)
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> "CacheReplicationMonitor(1960356187)"  prio=5 tid=386 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
> at org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor.run(CacheReplicationMonitor.java:181)
> "Timer for 'NameNode' metrics system" daemon prio=5 tid=339 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at java.lang.Object.wait(Native Method)
> at java.util.TimerThread.mainLoop(Timer.java:552)
> at java.util.TimerThread.run(Timer.java:505)
> "org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber@6b760460" daemon prio=5 tid=385 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at java.lang.Thread.sleep(Native Method)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.run(FSNamesystem.java:4420)
> at java.lang.Thread.run(Thread.java:748)
> "qtp164757726-349" daemon prio=5 tid=349 runnable
> java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
> at 

[jira] [Commented] (HDFS-15104) If block is not reported by any Datanode, the flag corrupt of BlockLocation should be marked as true.

2020-02-17 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17038373#comment-17038373
 ] 

Ayush Saxena commented on HDFS-15104:
-

How can you say that the block is corrupt if no DataNode has reported it by 
that time? Even if you send it back as corrupt here, JMX won't show it as 
corrupt and fsck won't show it as corrupt. I don't think it is good to 
conclude that the block is corrupt without actually knowing it.
This seems to be a missing-block scenario, not a corrupt-block one.

> If block is  not reported by any Datanode, the flag corrupt of BlockLocation 
> should be marked as true.
> --
>
> Key: HDFS-15104
> URL: https://issues.apache.org/jira/browse/HDFS-15104
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Yang Yun
>Assignee: Yang Yun
>Priority: Major
> Attachments: HDFS-15104.patch
>
>
> The corrupt flag of the BlockLocation returned from getFileBlockLocations() 
> is not set to true even when the block is not reported by any DataNode (the 
> hosts list is empty).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15167) Block Report Interval shouldn't be reset apart from first Block Report

2020-02-17 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17038360#comment-17038360
 ] 

Ayush Saxena commented on HDFS-15167:
-

Thanx [~elgoiri] for the review.
Added comments as suggested.


> Block Report Interval shouldn't be reset apart from first Block Report
> --
>
> Key: HDFS-15167
> URL: https://issues.apache.org/jira/browse/HDFS-15167
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15167-01.patch, HDFS-15167-02.patch, 
> HDFS-15167-03.patch, HDFS-15167-04.patch, HDFS-15167-05.patch, 
> HDFS-15167-06.patch, HDFS-15167-07.patch
>
>
> Presently the BlockReport interval is reset even when the BR is manually 
> triggered or is triggered for a diskError, which isn't required. As per the 
> comment in the code, the reset is intended for the first BR only:
> {code:java}
>   // If we have sent the first set of block reports, then wait a random
>   // time before we start the periodic block reports.
>   if (resetBlockReportTime) {
> {code}
>  
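
The behaviour asked for above can be pictured with a small, self-contained sketch. This is not the DataNode code or the attached patch: the class `BlockReportSchedulerSketch`, its methods, and the way an "early" (manual or disk-error) report is detected are all assumptions made for illustration. Only the flag name `resetBlockReportTime` is taken from the quoted snippet.

```java
import java.util.Random;

// Hypothetical model of a block report scheduler; not HDFS code.
class BlockReportSchedulerSketch {
    private final long intervalMs;
    private final Random random;
    // True only until the first full block report has been sent.
    private boolean resetBlockReportTime = true;
    private long nextReportMs = 0L;

    BlockReportSchedulerSketch(long intervalMs, Random random) {
        this.intervalMs = intervalMs;
        this.random = random;
    }

    long nextReportMs() {
        return nextReportMs;
    }

    // Called after any full block report has been sent at time nowMs.
    void onReportSent(long nowMs) {
        if (resetBlockReportTime) {
            // First report: wait a random fraction of the interval before
            // starting the periodic reports, then never randomize again.
            nextReportMs = nowMs + (long) (random.nextDouble() * intervalMs);
            resetBlockReportTime = false;
        } else if (nowMs >= nextReportMs) {
            // Regular periodic report: advance by one interval.
            nextReportMs += intervalMs;
        }
        // Otherwise the report was sent early (assumed here to mean a manual
        // or disk-error trigger) and the periodic schedule is left untouched.
    }

    public static void main(String[] args) {
        BlockReportSchedulerSketch s =
            new BlockReportSchedulerSketch(1000L, new Random(42));
        s.onReportSent(0L);
        long first = s.nextReportMs();
        s.onReportSent(first - 1); // early trigger, schedule must not move
        System.out.println(s.nextReportMs() == first);
    }
}
```

The design point is that the random offset is applied exactly once; later reports either follow the established cadence or leave it alone.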






[jira] [Updated] (HDFS-15167) Block Report Interval shouldn't be reset apart from first Block Report

2020-02-17 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15167:

Attachment: HDFS-15167-07.patch

> Block Report Interval shouldn't be reset apart from first Block Report
> --
>
> Key: HDFS-15167
> URL: https://issues.apache.org/jira/browse/HDFS-15167
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-15167-01.patch, HDFS-15167-02.patch, 
> HDFS-15167-03.patch, HDFS-15167-04.patch, HDFS-15167-05.patch, 
> HDFS-15167-06.patch, HDFS-15167-07.patch
>
>
> Presently the BlockReport interval is reset even when the BR is manually 
> triggered or is triggered for a diskError, which isn't required. As per the 
> comment in the code, the reset is intended for the first BR only:
> {code:java}
>   // If we have sent the first set of block reports, then wait a random
>   // time before we start the periodic block reports.
>   if (resetBlockReportTime) {
> {code}
>  






[jira] [Updated] (HDFS-15175) Multiple CloseOp shared block instance causes the standby namenode to crash when rolling editlog

2020-02-17 Thread Yicong Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yicong Cai updated HDFS-15175:
--
Description: 
 
{panel:title=Crash exception}
2020-02-16 09:24:46,426 [507844305] - ERROR [Edit log 
tailer:FSEditLogLoader@245] - Encountered exception on operation CloseOp 
[length=0, inodeId=0, path=..., replication=3, mtime=1581816138774, 
atime=1581814760398, blockSize=536870912, blocks=[blk_5568434562_4495417845], 
permissions=da_music:hdfs:rw-r-, aclEntries=null, clientName=, 
clientMachine=, overwrite=false, storagePolicyId=0, opCode=OP_CLOSE, 
txid=32625024993]
 java.io.IOException: File is not under construction: ..
 at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:442)
 at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:237)
 at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:146)
 at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:891)
 at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:872)
 at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:262)
 at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:395)
 at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:348)
 at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:365)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:360)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1873)
 at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:479)
 at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:361)
{panel}
 
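{panel:title=Editlog}
<RECORD>
 <OPCODE>OP_REASSIGN_LEASE</OPCODE>
 <DATA>
 <TXID>32625021150</TXID>
 <LEASEHOLDER>DFSClient_NONMAPREDUCE_-969060727_197760</LEASEHOLDER>
 <PATH>..</PATH>
 <NEWHOLDER>DFSClient_NONMAPREDUCE_1000868229_201260</NEWHOLDER>
 </DATA>
</RECORD>

..

<RECORD>
 <OPCODE>OP_CLOSE</OPCODE>
 <DATA>
 <TXID>32625023743</TXID>
 <LENGTH>0</LENGTH>
 <INODEID>0</INODEID>
 <PATH>..</PATH>
 <REPLICATION>3</REPLICATION>
 <MTIME>1581816135883</MTIME>
 <ATIME>1581814760398</ATIME>
 <BLOCKSIZE>536870912</BLOCKSIZE>
 <CLIENT_NAME></CLIENT_NAME>
 <CLIENT_MACHINE></CLIENT_MACHINE>
 <OVERWRITE>false</OVERWRITE>
 <BLOCK>
 <BLOCK_ID>5568434562</BLOCK_ID>
 <NUM_BYTES>185818644</NUM_BYTES>
 <GENSTAMP>4495417845</GENSTAMP>
 </BLOCK>
 <PERMISSION_STATUS>
 <USERNAME>da_music</USERNAME>
 <GROUPNAME>hdfs</GROUPNAME>
 <MODE>416</MODE>
 </PERMISSION_STATUS>
 </DATA>
</RECORD>

..

<RECORD>
 <OPCODE>OP_TRUNCATE</OPCODE>
 <DATA>
 <TXID>32625024049</TXID>
 <SRC>..</SRC>
 <CLIENTNAME>DFSClient_NONMAPREDUCE_1000868229_201260</CLIENTNAME>
 <CLIENTMACHINE>..</CLIENTMACHINE>
 <NEWLENGTH>185818644</NEWLENGTH>
 <TIMESTAMP>1581816136336</TIMESTAMP>
 <BLOCK>
 <BLOCK_ID>5568434562</BLOCK_ID>
 <NUM_BYTES>185818648</NUM_BYTES>
 <GENSTAMP>4495417845</GENSTAMP>
 </BLOCK>
 </DATA>
</RECORD>

..

<RECORD>
 <OPCODE>OP_CLOSE</OPCODE>
 <DATA>
 <TXID>32625024993</TXID>
 <LENGTH>0</LENGTH>
 <INODEID>0</INODEID>
 <PATH>..</PATH>
 <REPLICATION>3</REPLICATION>
 <MTIME>1581816138774</MTIME>
 <ATIME>1581814760398</ATIME>
 <BLOCKSIZE>536870912</BLOCKSIZE>
 <CLIENT_NAME></CLIENT_NAME>
 <CLIENT_MACHINE></CLIENT_MACHINE>
 <OVERWRITE>false</OVERWRITE>
 <BLOCK>
 <BLOCK_ID>5568434562</BLOCK_ID>
 <NUM_BYTES>185818644</NUM_BYTES>
 <GENSTAMP>4495417845</GENSTAMP>
 </BLOCK>
 <PERMISSION_STATUS>
 <USERNAME>da_music</USERNAME>
 <GROUPNAME>hdfs</GROUPNAME>
 <MODE>416</MODE>
 </PERMISSION_STATUS>
 </DATA>
</RECORD>
{panel}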
 

 

The block size in the first CloseOp should be 185818648; after the truncate, 
the block size becomes 185818644. The CloseOp/TruncateOp/CloseOp sequence is 
synchronized to the JournalNode in the same batch. Both CloseOps use the same 
block instance, so the first CloseOp is written with the wrong block size. 
When the SNN rolls the edit log, the TruncateOp does not put the file into the 
UnderConstruction state. Then, when the second CloseOp is applied, the file is 
not in the UnderConstruction state, and the SNN crashes.
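
The shared-instance effect described above can be reproduced in isolation. The following is a minimal sketch, not HDFS code: `Block`, `Op`, `flush`, and `run` are hypothetical stand-ins, and the assumption is that an op keeps a reference to the block and is only serialized later, when the whole batch is flushed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of ops that share one mutable block; not HDFS code.
class SharedBlockDemo {
    // Stand-in for a block: only the byte count matters here.
    static final class Block {
        long numBytes;
        Block(long numBytes) { this.numBytes = numBytes; }
        Block copy() { return new Block(numBytes); }
    }

    // Stand-in for an edit log op that holds a block reference until flush.
    static final class Op {
        final String name;
        final Block block;
        Op(String name, Block block) { this.name = name; this.block = block; }
        String serialize() { return name + ":" + block.numBytes; }
    }

    // Serializes the whole batch at once, as a batched sync would.
    static List<String> flush(List<Op> batch) {
        List<String> out = new ArrayList<>();
        for (Op op : batch) {
            out.add(op.serialize());
        }
        return out;
    }

    // Queues close, truncate (a mutation of the live block), close, then flushes.
    static List<String> run(boolean defensiveCopy) {
        Block live = new Block(185818648L);
        List<Op> batch = new ArrayList<>();
        batch.add(new Op("CLOSE", defensiveCopy ? live.copy() : live));
        live.numBytes = 185818644L; // the truncate shrinks the same instance
        batch.add(new Op("CLOSE", defensiveCopy ? live.copy() : live));
        return flush(batch);
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [CLOSE:185818644, CLOSE:185818644]
        System.out.println(run(true));  // [CLOSE:185818648, CLOSE:185818644]
    }
}
```

Without the copy, the first close is serialized with the post-truncate size, which matches the first OP_CLOSE in the edit log above. Copying the block when the op is created is one possible remedy; whether the eventual fix takes that route is not stated here.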

  was:
 
{panel:title=Crash exception}
2020-02-16 09:24:46,426 [507844305] - ERROR [Edit log 
tailer:FSEditLogLoader@245] - Encountered exception on operation CloseOp 
[length=0, inodeId=0, path=..., replication=3, mtime=1581816138774, 
atime=1581814760398, blockSize=536870912, blocks=[blk_5568434562_4495417845], 
permissions=da_music:hdfs:rw-r-, aclEntries=null, clientName=, 
clientMachine=, overwrite=false, storagePolicyId=0, opCode=OP_CLOSE, 
txid=32625024993]
java.io.IOException: File is not under construction: ..
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:442)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:237)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:146)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:891)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:872)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:262)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:395)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:348)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:365)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1873)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:479)
at 

[jira] [Created] (HDFS-15175) Multiple CloseOp shared block instance causes the standby namenode to crash when rolling editlog

2020-02-17 Thread Yicong Cai (Jira)
Yicong Cai created HDFS-15175:
-

 Summary: Multiple CloseOp shared block instance causes the standby 
namenode to crash when rolling editlog
 Key: HDFS-15175
 URL: https://issues.apache.org/jira/browse/HDFS-15175
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.9.2
Reporter: Yicong Cai
Assignee: Yicong Cai


 
{panel:title=Crash exception}
2020-02-16 09:24:46,426 [507844305] - ERROR [Edit log 
tailer:FSEditLogLoader@245] - Encountered exception on operation CloseOp 
[length=0, inodeId=0, path=..., replication=3, mtime=1581816138774, 
atime=1581814760398, blockSize=536870912, blocks=[blk_5568434562_4495417845], 
permissions=da_music:hdfs:rw-r-, aclEntries=null, clientName=, 
clientMachine=, overwrite=false, storagePolicyId=0, opCode=OP_CLOSE, 
txid=32625024993]
java.io.IOException: File is not under construction: ..
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:442)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:237)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:146)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:891)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:872)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:262)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:395)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:348)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:365)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1873)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:479)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:361)
{panel}
 






[jira] [Commented] (HDFS-15173) RBF: Delete repeated configuration 'dfs.federation.router.metrics.enable'

2020-02-17 Thread panlijie (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17038177#comment-17038177
 ] 

panlijie commented on HDFS-15173:
-

Thanks for your review ! [~aajisaka]

> RBF: Delete repeated configuration 'dfs.federation.router.metrics.enable'
> -
>
> Key: HDFS-15173
> URL: https://issues.apache.org/jira/browse/HDFS-15173
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation, rbf
>Affects Versions: 3.1.1, 3.2.1
>Reporter: panlijie
>Assignee: panlijie
>Priority: Minor
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
>
> In the HDFS RBF default config hdfs-rbf-default.xml, the configuration 
> 'dfs.federation.router.metrics.enable' appears twice.
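
A repeated key like the one reported above can be caught mechanically. The sketch below is an illustration only, not part of the patch: the class `DuplicatePropertyCheck` and the inline sample XML (including the key `example.other.key` and all values) are made up, and it assumes the usual Hadoop layout of `<property>` elements with a `<name>` child. It uses only the JDK's DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reports property names that occur more than once.
class DuplicatePropertyCheck {
    static Set<String> findDuplicateNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            Set<String> seen = new HashSet<>();
            Set<String> duplicates = new LinkedHashSet<>();
            for (int i = 0; i < props.getLength(); i++) {
                NodeList names = ((Element) props.item(i)).getElementsByTagName("name");
                if (names.getLength() == 0) {
                    continue; // property without a name, nothing to compare
                }
                String name = names.item(0).getTextContent().trim();
                if (!seen.add(name)) {
                    duplicates.add(name); // second or later occurrence
                }
            }
            return duplicates;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse configuration XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample; the real file contents are not reproduced here.
        String xml = "<configuration>"
            + "<property><name>dfs.federation.router.metrics.enable</name><value>true</value></property>"
            + "<property><name>example.other.key</name><value>1</value></property>"
            + "<property><name>dfs.federation.router.metrics.enable</name><value>true</value></property>"
            + "</configuration>";
        System.out.println(findDuplicateNames(xml)); // [dfs.federation.router.metrics.enable]
    }
}
```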


