[jira] [Commented] (HDFS-16200) Improve NameNode failover

2021-09-15 Thread Aihua Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415781#comment-17415781
 ] 

Aihua Xu commented on HDFS-16200:
-

[~hexiaoqiao] Thanks for checking. Regarding improving topology resolution 
performance: TableMapping works from precomputed topology info, but you need 
to know the list of hosts in advance and precompute the topology. We could 
convert the script into a built-in implementation, but I believe we would 
still hit some slowness there. 
For our particular case, we don't colocate storage with compute, and failover 
improved from over 10 minutes to just seconds once we disabled topology 
resolution. More and more deployments now separate storage from compute. 
Should we have a global configuration to optimize for those cases?
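
To make the question concrete, here is a minimal, self-contained sketch of 
what such a switch could look like. It is not the code from the pull request 
and does not use any Hadoop classes: Replica, ReplicaSorter and the 
resolveClientTopology flag are hypothetical names made up for illustration, 
and the "same rack first" ordering is a simplification of the real distance 
sort.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for a DataNode replica location. */
final class Replica {
  final String host;
  final String rack;

  Replica(String host, String rack) {
    this.host = host;
    this.rack = rack;
  }
}

/** Illustrative only; this is not the NameNode's actual sorting code. */
final class ReplicaSorter {
  // Hypothetical global switch: false means "clients are never colocated".
  private final boolean resolveClientTopology;
  // Stand-in for the (potentially slow) topology script or lookup.
  private final Function<String, String> rackResolver;
  private int resolverCalls = 0;

  ReplicaSorter(boolean resolveClientTopology,
                Function<String, String> rackResolver) {
    this.resolveClientTopology = resolveClientTopology;
    this.rackResolver = rackResolver;
  }

  /** Returns a copy of the replicas, ordered for the given client host. */
  List<Replica> sortForClient(String clientHost, List<Replica> replicas) {
    List<Replica> sorted = new ArrayList<>(replicas);
    if (!resolveClientTopology) {
      // Skip the lookup entirely and keep the original order.
      return sorted;
    }
    resolverCalls++;
    final String clientRack = rackResolver.apply(clientHost);
    // Stable sort: replicas on the client's rack come first.
    sorted.sort(Comparator.comparingInt(
        (Replica r) -> r.rack.equals(clientRack) ? 0 : 1));
    return sorted;
  }

  int getResolverCalls() {
    return resolverCalls;
  }
}
```

With the flag off, sortForClient never calls the resolver, so a cold cache on 
a newly active NameNode costs nothing for client lookups. The trade-off is 
that any client that does happen to share a rack with a DataNode loses its 
locality ordering.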


> Improve NameNode failover
> -
>
> Key: HDFS-16200
> URL: https://issues.apache.org/jira/browse/HDFS-16200
> Project: Hadoop HDFS
> Issue Type: Task
> Components: namanode
> Affects Versions: 2.8.2
> Reporter: Aihua Xu
> Assignee: Aihua Xu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> In a busy cluster, we are noticing that NameNode failover takes a long time 
> (over 10 minutes), and it causes cluster downtime during that period.
> One bottleneck lies in resolving the client hosts' topology when the 
> cluster is not colocated with the computing hosts. The NameNode resolves the 
> client host's topology and uses it to sort the hosts where the blocks are 
> located. The topology is cached, so subsequent accesses are efficient, but 
> if the standby NameNode is newly restarted, then all the client hosts (e.g., 
> YARN hosts) need to be resolved again.
> Possible solutions: 1) expose an API in DFSAdmin to load the topology cache, 
> or 2) add a new configuration in the HDFS cluster to skip resolving 
> topology for non-colocated HDFS clusters. Since client hosts and HDFS hosts 
> are not colocated, it's unnecessary to sort the DataNodes for the clients.
>  
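
The cold-cache behaviour described above, and option 1 (loading the cache 
ahead of time), can be sketched as follows. This is a simplified illustration 
under stated assumptions, not Hadoop's own caching code and not the proposed 
DFSAdmin API: CachingRackResolver, preload and slowCallCount are hypothetical 
names, and the "slow resolver" stands in for a topology script invocation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch of a host-to-rack cache with a bulk preload hook. */
final class CachingRackResolver {
  // Stand-in for the expensive lookup (for example, a topology script).
  private final Function<String, String> slowResolver;
  private final Map<String, String> cache = new ConcurrentHashMap<>();
  private final AtomicInteger slowCalls = new AtomicInteger();

  CachingRackResolver(Function<String, String> slowResolver) {
    this.slowResolver = slowResolver;
  }

  /** Returns the rack for a host; the slow resolver runs only on a miss. */
  String resolve(String host) {
    return cache.computeIfAbsent(host, h -> {
      slowCalls.incrementAndGet();
      return slowResolver.apply(h);
    });
  }

  /** Bulk-loads known host-to-rack entries, e.g. before a failover. */
  void preload(Map<String, String> hostToRack) {
    cache.putAll(hostToRack);
  }

  /** Number of times the slow resolver was actually invoked. */
  int slowCallCount() {
    return slowCalls.get();
  }
}
```

A freshly constructed CachingRackResolver models the restarted standby: every 
distinct client host costs one slow call until it has been seen once. Calling 
preload with the known client hosts first moves that cost out of the failover 
path, at the price of having to know the host list in advance.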



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16200) Improve NameNode failover

2021-09-07 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411600#comment-17411600
 ] 

Hadoop QA commented on HDFS-16200:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 0s{color} |  | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} |  | {color:green} No case conflicting files found. {color} |
| {color:blue}0{color} | {color:blue} codespell {color} | {color:blue} 0m 1s{color} |  | {color:blue} codespell was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} |  | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} |  | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 33m 55s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 23s{color} |  | {color:green} trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 14s{color} |  | {color:green} trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 1s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 23s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} |  | {color:green} trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s{color} |  | {color:green} trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 3m 19s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 19m 8s{color} |  | {color:green} branch has no errors when building and testing our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 14s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 20s{color} |  | {color:green} the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 20s{color} | [/results-compile-javac-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3364/2/artifact/out/results-compile-javac-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt] | {color:red} hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 generated 1 new + 468 unchanged - 0 fixed = 469 total (was 468) {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 9s{color} |  | {color:green} the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 9s{color} | [/results-compile-javac-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3364/2/artifact/out/results-compile-javac-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt] | {color:red} hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 generated 1 new + 452 unchanged - 0 fixed = 453 total (was 452) {color} |
| {color:green}+1{color} | {color:green} blanks {color} | {color:green} 0m 0s{color} |  | {color:green} The patch has no blanks issues. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 55s{color} | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3364/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt] | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 10

[jira] [Commented] (HDFS-16200) Improve NameNode failover

2021-09-05 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17410148#comment-17410148
 ] 

Xiaoqiao He commented on HDFS-16200:


Thanks [~aihuaxu] for your report. I think the best way is to improve the 
performance of topology resolution and rack awareness rather than disable it 
outright. FYI.







[jira] [Commented] (HDFS-16200) Improve NameNode failover

2021-08-31 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407748#comment-17407748
 ] 

Hadoop QA commented on HDFS-16200:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 17m 20s{color} |  | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} |  | {color:green} No case conflicting files found. {color} |
| {color:blue}0{color} | {color:blue} codespell {color} | {color:blue} 0m 1s{color} |  | {color:blue} codespell was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} |  | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} |  | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 33m 58s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 25s{color} |  | {color:green} trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 15s{color} |  | {color:green} trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 1s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 21s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} |  | {color:green} trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 23s{color} |  | {color:green} trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 3m 18s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 18m 57s{color} |  | {color:green} branch has no errors when building and testing our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 1m 10s{color} | [/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3364/1/artifact/out/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt] | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 1m 16s{color} | [/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3364/1/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt] | {color:red} hadoop-hdfs in the patch failed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 16s{color} | [/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3364/1/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt] | {color:red} hadoop-hdfs in the patch failed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 1m 9s{color} | [/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3364/1/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt] | {color:red} hadoop-hdfs in the patch failed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 9s{color} | [/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3364/1/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt] | {color:red} hadoop-hdfs in the patch failed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10. {color} |
|