[jira] [Commented] (HDFS-12200) Optimize CachedDNSToSwitchMapping to avoid 100% cpu utilization
[ https://issues.apache.org/jira/browse/HDFS-12200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17162200#comment-17162200 ] Hadoop QA commented on HDFS-12200:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 45s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 3m 24s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 20m 14s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 58s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 3m 16s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 23s{color} | {color:red} hadoop-common-project/hadoop-common in trunk has 2 extant findbugs warnings. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 3m 13s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in trunk has 4 extant findbugs warnings. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 27s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 17m 47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 14m 0s{color} | {color:red} patch has errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 39s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 96m 55s{color} | {color:red} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 5s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}235m 4s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.tools.TestHdfsConfigFields |
| | hadoop.fs.contract.hdfs.TestHDFSContractMultipartUploader |
| |
[jira] [Commented] (HDFS-12200) Optimize CachedDNSToSwitchMapping to avoid 100% cpu utilization
[ https://issues.apache.org/jira/browse/HDFS-12200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201192#comment-16201192 ] Hadoop QA commented on HDFS-12200:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 10s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 52s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 27s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 13s{color} | {color:orange} root: The patch generated 1 new + 456 unchanged - 0 fixed = 457 total (was 456) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 8m 51s{color} | {color:red} patch has errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 41s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 94m 53s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 45s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}188m 27s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
| | hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:3d04c00 |
| JIRA Issue | HDFS-12200 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12879096/HDFS-12200-003.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux 3aed150cd705 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality |
[jira] [Commented] (HDFS-12200) Optimize CachedDNSToSwitchMapping to avoid 100% cpu utilization
[ https://issues.apache.org/jira/browse/HDFS-12200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200851#comment-16200851 ] Hanisha Koneru commented on HDFS-12200:
---

Hi [~yangjiandan],
If I understand correctly, you want to add an option to disable rack lookup for uncached hosts, right? This patch would optimize rack lookup for an unusual architecture. Generally, HDFS and YARN are deployed on the same machines, so I think the use case might be very limited. Also, can you please update the Jira title to reflect that this is for adding an option to disable rack lookup for uncached hosts?

Currently, when a DN registers, its DNS-to-switch mapping is resolved during the registration process itself. With this change, and with resolve-non-cached-host set to false, rack resolution for a new DN will be skipped during registration. This might cause the new DN's rack to be incorrectly cached as the default rack in the following 2 cases:
- A new DN is added, the rack mapping script is updated, and the DN registers before refreshNodes is called: the rack will be resolved to default during DN registration, and since it has already been resolved, it will not be updated during refreshNodes.
- A new DN is added, refreshNodes is called, and then the rack mapping script is updated: in this case too, the mapping for the new DN will be set to default instead of the correct mapping.

> Optimize CachedDNSToSwitchMapping to avoid 100% cpu utilization
> ---
>
> Key: HDFS-12200
> URL: https://issues.apache.org/jira/browse/HDFS-12200
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Jiandan Yang
> Assignee: Jiandan Yang
> Attachments: HDFS-12200-001.patch, HDFS-12200-002.patch, HDFS-12200-003.patch, cpu_ utilization.png, nn_thread_num.png
>
>
> 1. Background:
> Our Hadoop cluster uses disaggregated storage and compute: HDFS is deployed on 600+ machines, and YARN is deployed on a separate machine pool.
> We found that NameNode CPU utilization sometimes reaches 90% or even 100%. In the worst case, CPU utilization stays at 100% for a long time, which causes writes to the JournalNodes to time out and eventually hangs the NameNode. The cause is that offline tasks running on a few hundred servers access HDFS at the same time, and the NameNode, resolving the rack of each client machine, starts anywhere from several hundred to two thousand subprocesses.
> {code:java}
> "process reaper"#10864 daemon prio=10 os_prio=0 tid=0x7fe270a31800 nid=0x38d93 runnable [0x7fcdc36fc000]
>    java.lang.Thread.State: RUNNABLE
>         at java.lang.UNIXProcess.waitForProcessExit(Native Method)
>         at java.lang.UNIXProcess.lambda$initStreams$4(UNIXProcess.java:301)
>         at java.lang.UNIXProcess$$Lambda$7/1447689627.run(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
>         at java.lang.Thread.run(Thread.java:834)
> {code}
> Our configuration is as follows:
> {code:java}
> net.topology.node.switch.mapping.impl = ScriptBasedMapping
> net.topology.script.file.name = 'a python script'
> {code}
> 2. Optimization:
> To solve these problems, we have optimized CachedDNSToSwitchMapping:
> (1) Add the DataNode IP list to the file configured by dfs.hosts. When the NameNode starts, it preloads the DataNodes' rack information into the cache, resolving a batch of hosts each time the script runs (the batch size is set by net.topology.script.number.args, default 100).
> (2) Step (1) ensures that the cache holds the racks of all DataNodes, so on a cache miss the host must be a client machine, and /default-rack is returned directly.
> (3) Each time new DataNodes are added, add their IP addresses to the file specified by dfs.hosts and then run bin/hdfs dfsadmin -refreshNodes; this puts the new DataNodes' racks into the cache.
> (4) Add a new configuration item, dfs.namenode.topology.resolve-non-cache-host: setting it to false enables the behavior above, and setting it to true disables it. The default is true, to keep compatibility.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
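The caching scheme proposed in the issue description (preload DataNode racks in script-sized batches, then answer a cache miss with `/default-rack` when the new flag is false) can be sketched in plain Java. This is a hypothetical illustration under stated assumptions, not the code in HDFS-12200-003.patch and not Hadoop's `CachedDNSToSwitchMapping` API: the class name `PreloadedRackCache`, the `Function` standing in for the topology script, and the constructor parameters are all invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/**
 * Hypothetical sketch of the caching scheme proposed in HDFS-12200.
 * Not Hadoop's CachedDNSToSwitchMapping; all names here are illustrative.
 */
class PreloadedRackCache {
    static final String DEFAULT_RACK = "/default-rack";

    private final Map<String, String> cache = new HashMap<>();
    // Stand-in for the topology script: maps a batch of hosts to one rack per host (assumed).
    private final Function<List<String>, List<String>> script;
    // Plays the role of net.topology.script.number.args (hosts passed per script run).
    private final int batchSize;
    // Plays the role of the proposed resolve-non-cache-host flag (hypothetical here).
    private final boolean resolveNonCachedHost;
    private int scriptRuns = 0;

    PreloadedRackCache(Function<List<String>, List<String>> script,
                       int batchSize, boolean resolveNonCachedHost) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.script = script;
        this.batchSize = batchSize;
        this.resolveNonCachedHost = resolveNonCachedHost;
    }

    /** Resolves uncached hosts in batches; meant for startup and refreshNodes. */
    synchronized void preload(List<String> hosts) {
        Set<String> missing = new LinkedHashSet<>();
        for (String host : hosts) {
            // Already-cached hosts are skipped, so an existing entry is never re-resolved.
            if (!cache.containsKey(host)) {
                missing.add(host);
            }
        }
        List<String> pending = new ArrayList<>(missing);
        for (int start = 0; start < pending.size(); start += batchSize) {
            List<String> batch =
                pending.subList(start, Math.min(start + batchSize, pending.size()));
            List<String> racks = script.apply(batch); // one script run per batch
            scriptRuns++;
            for (int i = 0; i < batch.size(); i++) {
                cache.put(batch.get(i), racks.get(i));
            }
        }
    }

    /** Returns the cached rack, or handles a miss according to the flag. */
    synchronized String resolve(String host) {
        String rack = cache.get(host);
        if (rack != null) {
            return rack;
        }
        if (!resolveNonCachedHost) {
            // Assumed to be a client: no script run, and the fallback is NOT cached.
            return DEFAULT_RACK;
        }
        preload(Collections.singletonList(host));
        return cache.get(host);
    }

    synchronized int getScriptRuns() {
        return scriptRuns;
    }
}
```

Because `preload` skips hosts that are already cached, a `/default-rack` entry that did get cached for a DataNode would not be corrected by a later refresh, which is the failure mode raised in the comment above. The sketch sidesteps that particular case by never caching the fallback value; whether the actual patch does the same is not stated in this thread.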
[jira] [Commented] (HDFS-12200) Optimize CachedDNSToSwitchMapping to avoid 100% cpu utilization
[ https://issues.apache.org/jira/browse/HDFS-12200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102989#comment-16102989 ] Jiandan Yang commented on HDFS-12200:
--

[~brahmareddy] Please help me review it. Thanks a lot.
[jira] [Commented] (HDFS-12200) Optimize CachedDNSToSwitchMapping to avoid 100% cpu utilization
[ https://issues.apache.org/jira/browse/HDFS-12200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102983#comment-16102983 ] Jiandan Yang commented on HDFS-12200:
--

The unit test and findbugs failures are not introduced by this patch.
[jira] [Commented] (HDFS-12200) Optimize CachedDNSToSwitchMapping to avoid 100% cpu utilization
[ https://issues.apache.org/jira/browse/HDFS-12200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102753#comment-16102753 ] Hadoop QA commented on HDFS-12200:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 53s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 30s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 47s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 35s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 8s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 7s{color} | {color:orange} root: The patch generated 1 new + 449 unchanged - 0 fixed = 450 total (was 449) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 37s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 19s{color} | {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 32s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 39s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}145m 49s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.ha.TestZKFailoverController |
| | hadoop.net.TestDNS |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150 |
| | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
| | hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy |
| | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
| | hadoop.hdfs.server.blockmanagement.TestReconstructStripedBlocksWithRackAwareness |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-12200 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12879096/HDFS-12200-003.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml |
| uname | Linux e12d7c14f56f 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
|
[jira] [Commented] (HDFS-12200) Optimize CachedDNSToSwitchMapping to avoid 100% cpu utilization
[ https://issues.apache.org/jira/browse/HDFS-12200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102642#comment-16102642 ] Jiandan Yang commented on HDFS-12200:
--

Uploaded HDFS-12200-003.patch, which fixes the unit test and checkstyle issues.
[jira] [Commented] (HDFS-12200) Optimize CachedDNSToSwitchMapping to avoid 100% cpu utilization
[ https://issues.apache.org/jira/browse/HDFS-12200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101594#comment-16101594 ] Hadoop QA commented on HDFS-12200:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 23s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 27s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 44s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 37s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 16s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 58s{color} | {color:orange} root: The patch generated 15 new + 449 unchanged - 0 fixed = 464 total (was 449) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 26s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 6 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 37s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 7s{color} | {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 17s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 36s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}133m 10s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.net.TestDNS |
| | hadoop.tools.TestHdfsConfigFields |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150 |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure070 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-12200 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12878967/HDFS-12200-002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 3f4a5266f42b 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a92bf39 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-HDFS-Build/20415/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html |
| checkstyle |