[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578504#comment-16578504 ] Allen Wittenauer commented on HDFS-12711: - bq. Was a follow up jira filed for this work? (and if so, which one was chosen) Nope. Few others seemed to care; patches go in regardless of what Jenkins says and/or how they may impact the build negatively. Yetus 0.7.0 and bumping up the surefire version (at least in trunk) stopped hadoop from crashing ASF Jenkins build nodes. It's still horribly broken, just less obviously so. branch-2 nightlies were turned off months ago since they were failing at such a high level to be pointless. I don't think anyone really pays attention to the trunk nightlies so they should probably be turned off too. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch, fakepatch.branch-2.txt > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578258#comment-16578258 ] Ewan Higgs commented on HDFS-12711: --- {quote}2. hadoop-hdfs-project either needs to get refactored into multiple maven modules or the simultaneous thread counts need to get greatly reduced. e.g., just changing the unit test's DN RPC thread count may work. {quote} Was a follow up jira files for this work? Thanks > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch, fakepatch.branch-2.txt > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259536#comment-16259536 ] Allen Wittenauer commented on HDFS-12711: - FYI HADOOP-13514. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch, fakepatch.branch-2.txt > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257668#comment-16257668 ] Allen Wittenauer commented on HDFS-12711: - Doing some quick math, my estimation is that we received 730 test results out of ~3000. So yes, we lost 75% of the test results in that run. HDFS-12731's https://builds.apache.org/job/PreCommit-HDFS-Build/22132/ run only dropped ~34%. So hey, that's an improvement... > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch, fakepatch.branch-2.txt > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257558#comment-16257558 ] Allen Wittenauer commented on HDFS-12711: - bq. We usually try to rerun the failed tests locally to check if they are related to the patch. I think this may be the key as to why I don't think enough people are in panic mode. Let's take Erik's log as an example. It's from HDFS-12823. Precommit reported ~20 tests that either failed or timed out. It reaped 20 excess surefire jvms after mvn returned. The asflicense check came back with 130 dump log files. Those 130 dump log files in almost every case I looked at were not reported to surefire. That means that we're probably looking at a minimum of 150 tests failed, not 20. Given that those 120 broken JVMs likely had more than 1 test... We're basically dropping a very large percentage (maybe even the majority) of test results on the ground. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch, fakepatch.branch-2.txt > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257363#comment-16257363 ] Arpit Agarwal commented on HDFS-12711: -- I don't think committers are ignoring precommit run results. We usually try to rerun the failed tests locally to check if they are related to the patch. Being a manual process, it is time consuming and error-prone. I also don't think anyone is happy about the current situation. Over time we have come to rely on complex MiniDFSCluster-based tests over real unit tests and these tend to be flaky and fail in hard-to-debug ways. We probably need a community effort to revamp our unit tests. This will also require extensive refactoring of existing classes to make them unit-testable, a risky task in itself. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch, fakepatch.branch-2.txt > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256584#comment-16256584 ] Allen Wittenauer commented on HDFS-12711: - Ignoring the hs_err_pid log files is pretty much just sticking our collective heads in the sand about actual, real problems with the unit tests. The unit tests themselves haven't been rock solid for a very long time, even before all of this start happening. Entries have been put into the ignore pile so often that I wouldn't be surprised if the community is already at the point that most developers are ignoring precommit. (e.g., commits with findbugs reported in the issues, javadoc compilation failures being treated as "environmental", etc, etc.) If I were actually paying more attention to day-to-day Hadoop bits these days, I'd probably be ready to disable unit tests (at least HDFS) to specifically avoid the "cried wolf" condition. The rest of the precommit tests work properly the vast majority of the time and are probably more important given the current state of things. (Never mind the massive speed up. QBT is hitting the 15 hour mark for a full run for branch-2 when it is actually allowed to complete.) No one seems to actually care that the unit tests are a broken mess and I doubt they'd be missed. My goal here was to prevent Hadoop from bringing down the rest of the ASF build infrastructure. It's under enough stress without this project making things that much worse. Achievement unlocked and other Yetus users will pick up those new safety features in the next release. I should probably close this JIRA issue. Unless someone else plans to spend some effort on these bugs? At least at this point in time, I view my work here as complete. Also: {code} /build/ {code} ARGH. That hasn't been valid since Hadoop used ant. A great example of "well, if we ignore it, it doesn't exist, right?" Because anything that is still using /build/ almost certainly isn't safe for parallel tests and likely contributing to a whole host of problems. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch, fakepatch.branch-2.txt > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256368#comment-16256368 ] Erik Krogen commented on HDFS-12711: Thanks Sean. Agreed that it is not really a big issue but it does make it more likely for a developer to miss an actual license violation (a "QA bot cried wolf" situation). It seems maybe it would make more sense for the {{hs_err_pid*.log}} files to appear in an already-excluded area, like within {{/build/}}, to represent their transient nature. I assume their location should be configurable in some way? > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch, fakepatch.branch-2.txt > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256295#comment-16256295 ] Erik Krogen commented on HDFS-12711: Yeah so although we obviously need to fix the unit tests, the license checker also shouldn't be picking up those temp output files in the meantime, right? > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch, fakepatch.branch-2.txt > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256283#comment-16256283 ] Allen Wittenauer commented on HDFS-12711: - It's probably also worth pointing out that those files also represent tests that weren't actually executed. So they aren't recorded in the fail/success output. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch, fakepatch.branch-2.txt > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256275#comment-16256275 ] Allen Wittenauer commented on HDFS-12711: - Those files are the stack dumps from the unit tests that ran out of resources. Fix the unit tests, those files go away. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch, fakepatch.branch-2.txt > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256166#comment-16256166 ] Erik Krogen commented on HDFS-12711: Hey [~aw], in addition to the wild fluctuations in success of HDFS unit tests (not your fault, but unfortunate) I'm seeing lots of false license violations caused by these changes, e.g.: https://builds.apache.org/job/PreCommit-HDFS-Build/22122/artifact/out/patch-asflicense-problems.txt Can we do something to solve that? > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch, fakepatch.branch-2.txt > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251809#comment-16251809 ] Allen Wittenauer commented on HDFS-12711: - With the kill code in place, I'm seeing wild fluctuations in hdfs and mr unit tests. Lots of unreaped processes. Probably a hint that they are paused for some reason. I have a hunch that we're pretty much bottlenecked on IO. Tests happen on a single disk that is shared among all the executors on that jenkins node. Let's say 2xHDFS tests are running, that could easily be thousands of threads doing IO to the same disk. It might be smart to decrease the # of parallel tests, at least in HDFS. This obviously impacts runtime (which is already out of control) but will probably increase accuracy. Or, we could attempt to split up the tests such that compute heavy get done in parallel, IO heavy get done serial. Of course, if no one is paying attention to the tests anyway, we could just disable them altogether I guess. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch, fakepatch.branch-2.txt > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249035#comment-16249035 ] Hadoop QA commented on HDFS-12711: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | | {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 2s{color} | {color:blue} The patch file was not named according to hadoop's naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute for instructions. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} branch-2 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 13s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 48s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 37s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 20s{color} | {color:green} branch-2 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} xml {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 ill-formed XML file(s). {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 27s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 36s{color} | {color:red} hadoop-common in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 33s{color} | {color:green} hadoop-nfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 38s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 65m 49s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Unreaped Processes | hadoop-common:1 | | Timed out junit tests | org.apache.hadoop.log.TestLogLevel | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:17213a0 | | JIRA Issue | HDFS-12711 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12895190/fakepatch.branch-2.txt | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml | | uname | Linux 946a92c520e9 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-2 / c153bed | | maven | version: Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00) | | Default Java | 1.7.0_151 | | xml | https://builds.apache.org/job/PreCommit-HDFS-Build2/12/artifact/out/xml.txt | | Unreaped Processes Log | https://builds.apache.org/job/PreCommit-HDFS-Build2/12/artifact/out/patch-unit-hadoop-common-project_hadoop-common-reaper.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build2/12/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt | | Test
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249032#comment-16249032 ] Hadoop QA commented on HDFS-12711: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | | {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 2s{color} | {color:blue} The patch file was not named according to hadoop's naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute for instructions. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} branch-2 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 10s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 49s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 39s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s{color} | {color:green} branch-2 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} xml {color} | {color:red} 0m 1s{color} | {color:red} The patch has 1 ill-formed XML file(s). {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 25s{color} | {color:red} hadoop-common in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 29s{color} | {color:green} hadoop-nfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 36s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 65m 27s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Unreaped Processes | hadoop-common:1 | | Timed out junit tests | org.apache.hadoop.log.TestLogLevel | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:17213a0 | | JIRA Issue | HDFS-12711 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12895190/fakepatch.branch-2.txt | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml | | uname | Linux 401f661e60f0 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-2 / c153bed | | maven | version: Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00) | | Default Java | 1.7.0_151 | | xml | https://builds.apache.org/job/PreCommit-HDFS-Build2/11/artifact/out/xml.txt | | Unreaped Processes Log | https://builds.apache.org/job/PreCommit-HDFS-Build2/11/artifact/out/patch-unit-hadoop-common-project_hadoop-common-reaper.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build2/11/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt | | Test
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238181#comment-16238181 ] stack commented on HDFS-12711: -- bq. For now, though, I'm sort of tired at looking at this problem and will go work on something else for a while. Thanks for putting Hadoop in a box. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch, fakepatch.branch-2.txt > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238107#comment-16238107 ] Allen Wittenauer commented on HDFS-12711: - Thanks! I'll have to play around with sending a SIGQUIT. The other thing is that some process types may need different types of signals. It might be useful to be able to define the "signal path"... e.g., surefire processes get QUIT -> TERM -> KILL. I know the other thing is for archiver to save off the stack trace logs (hs_err_pidXX.log files) we do get. That's just a settings thing that I've been too busy to setup in Jenkins. For now, though, I'm sort of tired at looking at this problem and will go work on something else for a while. It's at the point that the issues are firmly contained from ASF build infra perspective and rests solely in the hands of the Hadoop community to fix their unit tests (or even base code) to be less broken. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch, fakepatch.branch-2.txt > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238056#comment-16238056 ] stack commented on HDFS-12711: -- This is excellent work. Would a kill -QUIT before you do actual kill of the errant processes be of use? It'd do a dump of stack trace before process goes away (processes might not be connected to stdout/stderr anymore?). Thanks. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch, fakepatch.branch-2.txt > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237685#comment-16237685 ] Allen Wittenauer commented on HDFS-12711: - Finally got a full qbt run on branch-2, thanks to YETUS-561 and YETUS-570: https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/27/console branch-2 is still a broken mess (those test times! argh!), but at least it won't kill nodes anymore. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch, fakepatch.branch-2.txt > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226299#comment-16226299 ] Allen Wittenauer commented on HDFS-12711: - For those playing at home: YETUS-570 (in development) changes how precommit handles tests. It will now seek out and kill processes that match a certain pattern after unit tests are run. It reports the number that it had to kill: | Stuck Test Processes | hadoop-hdfs-project/hadoop-hdfs:21 | and generates a log to show which processes those were: | Stuck Test Processes Log | https://builds.apache.org/job/PreCommit-HDFS-Build2/9/artifact/out/reaper-hadoop-hdfs-project_hadoop-hdfs.log | It's supposed to do this after every individual module, but I've got a (simple) bug to fix first. In any case, this should help give a metric as to just how broken a particular set of tests actually are. Hopefully at some point we'll have the logic to pinpoint it to individual tests, but using the actual unit test log should be pretty helpful. It's also worth pointing out that this change will also help the full qbt to actually complete. But I need to fix that bug first. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226133#comment-16226133 ] Hadoop QA commented on HDFS-12711: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 20m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} branch-2 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 11m 17s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s{color} | {color:green} branch-2 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 92m 1s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 4m 42s{color} | {color:red} The patch generated 322 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}136m 34s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Stuck Test Processes | hadoop-hdfs-project/hadoop-hdfs:21 | | Failed junit tests | hadoop.fs.TestUnbuffer | | Timed out junit tests | org.apache.hadoop.hdfs.TestLeaseRecovery2 | | | org.apache.hadoop.hdfs.TestDatanodeRegistration | | | org.apache.hadoop.hdfs.TestDFSClientFailover | | | org.apache.hadoop.hdfs.web.TestWebHdfsTokens | | | org.apache.hadoop.hdfs.TestDFSInotifyEventInputStream | | | org.apache.hadoop.hdfs.TestDatanodeLayoutUpgrade | | | org.apache.hadoop.hdfs.TestFileAppendRestart | | | org.apache.hadoop.hdfs.web.TestWebHdfsWithRestCsrfPreventionFilter | | | org.apache.hadoop.hdfs.TestDFSMkdirs | | | org.apache.hadoop.hdfs.TestDFSOutputStream | | | org.apache.hadoop.hdfs.TestDatanodeReport | | | org.apache.hadoop.hdfs.web.TestWebHDFS | | | org.apache.hadoop.hdfs.web.TestWebHDFSXAttr | | | org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes | | | org.apache.hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs | | | org.apache.hadoop.hdfs.TestDistributedFileSystem | | | org.apache.hadoop.hdfs.web.TestWebHDFSForHA | | | org.apache.hadoop.hdfs.web.TestHftpFileSystem | | | org.apache.hadoop.hdfs.TestReplaceDatanodeFailureReplication | | | org.apache.hadoop.hdfs.TestDFSShell | | | org.apache.hadoop.hdfs.web.TestWebHDFSAcl | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:17213a0 | | JIRA Issue | HDFS-12711 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893963/HDFS-12711.branch-2.00.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml | | uname | Linux 9ec3abba2a98 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-2 / 76ec5ea | | maven | version: Apache
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225562#comment-16225562 ] Sean Busbey commented on HDFS-12711: interesting. what's the spread on surefire versions for hadoops? If we are triggering things in HBase those builds all use a surefire version that includes that fix, with the exception of hbase branch-1.1. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224389#comment-16224389 ] Allen Wittenauer commented on HDFS-12711: - SUREFIRE-524 ? > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224146#comment-16224146 ] Allen Wittenauer commented on HDFS-12711: - I've got a hypothesis. Somewhere in the HDFS code base is a try/catch block that is effectively ignoring system exceptions. A test times out and surefire sends (probably) a SIGINT. The try/catch grabs the exception and tosses it to the side, all the while eating CPU and IO. This situation makes more tests time out. surefire sends more SIGINTs which also either get ignored or never "make it" to the process due to CPU being scarce. Surefire, thinking that those were received, fires off even more tests ... This pattern continues until eventually there is nothing left for surefire and/or maven to die on its own, leaving lots of unreaped children, doing nothing but destroying the box. One thing has been bothering me. Why are projects like HBase that are using openjdk7 + some form of branch-2 code base not seeing these problems? What if the code path was a less frequently traveled one? A feature that isn't heavily used. For the vast majority of committers testing a release, it's probably not even tested "for reals", never mind in a hostile environment where CPU, IO, whatever is scarce. But the HDFS unit tests (and maybe the MR unit tests) would almost certainly hit that path, probably several times over. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223845#comment-16223845 ] Allen Wittenauer commented on HDFS-12711: - YETUS-570-wip-01 w/minor output fix test run. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223483#comment-16223483 ] Allen Wittenauer commented on HDFS-12711: - https://issues.apache.org/jira/browse/INFRA-15373?focusedCommentId=16223481=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16223481 > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223251#comment-16223251 ] Hadoop QA commented on HDFS-12711: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 14m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} branch-2 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 20s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s{color} | {color:green} branch-2 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 79m 51s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 2m 40s{color} | {color:red} The patch generated 183 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}113m 4s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestSecureEncryptionZoneWithKMS | | | hadoop.hdfs.TestCrcCorruption | | Timed out junit tests | org.apache.hadoop.hdfs.TestLeaseRecovery2 | | | org.apache.hadoop.hdfs.TestMaintenanceState | | | org.apache.hadoop.hdfs.TestSetrepIncreasing | | | org.apache.hadoop.hdfs.TestRollingUpgradeDowngrade | | | org.apache.hadoop.hdfs.TestHDFSFileSystemContract | | | org.apache.hadoop.hdfs.TestInjectionForSimulatedStorage | | | org.apache.hadoop.hdfs.TestFileCreationDelete | | | org.apache.hadoop.hdfs.web.TestWebHdfsWithRestCsrfPreventionFilter | | | org.apache.hadoop.hdfs.TestBlockStoragePolicy | | | org.apache.hadoop.hdfs.TestDFSOutputStream | | | org.apache.hadoop.hdfs.web.TestWebHDFS | | | org.apache.hadoop.hdfs.TestAppendSnapshotTruncate | | | org.apache.hadoop.hdfs.web.TestWebHDFSXAttr | | | org.apache.hadoop.hdfs.TestRollingUpgradeRollback | | | org.apache.hadoop.hdfs.TestMiniDFSCluster | | | org.apache.hadoop.hdfs.web.TestWebHDFSForHA | | | org.apache.hadoop.hdfs.TestDFSShell | | | org.apache.hadoop.hdfs.TestDataTransferProtocol | | | org.apache.hadoop.hdfs.web.TestWebHDFSAcl | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:17213a0 | | JIRA Issue | HDFS-12711 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893963/HDFS-12711.branch-2.00.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml | | uname | Linux f9f5a922a080 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-2 / 9093ad6 | | maven | version: Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00) | | Default Java | 1.7.0_151 | | unit |
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223191#comment-16223191 ] Allen Wittenauer commented on HDFS-12711: - bq. Max. thread count 3114 5000 proc limit is looking to be a realistic number. It gives some wiggle room but shouldn't be too much to break the world unless things really go south or unlucky. Let's try the branch-2 HDFS fake patch and see what happens, now that I've got some metrics gathering. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223185#comment-16223185 ] Hadoop QA commented on HDFS-12711: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 144 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 2s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 21s{color} | {color:red} hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient generated 2 new + 95 unchanged - 48 fixed = 97 total (was 143) {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 51s{color} | {color:orange} hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: The patch generated 137 new + 2573 unchanged - 305 fixed = 2710 total (was 2878) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 3s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 41m 26s{color} | {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 24s{color} | {color:red} The patch generated 4 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 80m 42s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.mapreduce.TestMapCollection | | | hadoop.mapreduce.lib.jobcontrol.TestMapReduceJobControl | | Timed out junit tests | org.apache.hadoop.mapred.pipes.TestPipeApplication | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-12711 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12894259/MAPREDUCE-4980.016.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle | | uname | Linux 5ca15500a7c1 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 8be5707 | | maven | version: Apache Maven 3.3.9 | |
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222841#comment-16222841 ] Allen Wittenauer commented on HDFS-12711: - Adding some better metrics gathering for YETUS-561 so we have some raw numbers to look at. Also working on some automatic child killing drones in YETUS-570. In the process of this, learned that while running trunk HDFS unit tests, thread count spikes up to ~4000 threads on their own. I think some tests or maybe even some core code need some major tuning. It's pretty easy to see that if those JVMs were to stall, the thread count is going to climb to ridiculous heights. In other news, all Yetus-based Hadoop jobs on Jenkins are now running this code with the process limit set to 5k. This is likely too few for mapreduce, given it's proclivity to bringing up multiple clusters in their entirety on a regular basis. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch, MAPREDUCE-4980.016.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221741#comment-16221741 ] Hadoop QA commented on HDFS-12711: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 144 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 58s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 21s{color} | {color:red} hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient generated 2 new + 95 unchanged - 48 fixed = 97 total (was 143) {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 52s{color} | {color:orange} hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: The patch generated 137 new + 2573 unchanged - 305 fixed = 2710 total (was 2878) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 45s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 40m 8s{color} | {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 24s{color} | {color:red} The patch generated 4 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 78m 34s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.mapreduce.TestMapCollection | | | hadoop.mapred.TestReduceFetch | | | hadoop.mapreduce.v2.TestMRJobs | | | hadoop.mapreduce.TestMRJobClient | | Timed out junit tests | org.apache.hadoop.mapred.pipes.TestPipeApplication | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-12711 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12894259/MAPREDUCE-4980.016.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle | | uname | Linux 7cd4aa4da144 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk /
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221701#comment-16221701 ] Allen Wittenauer commented on HDFS-12711: - 1500 is too strict. Trying 3000. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch, MAPREDUCE-4980.016.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221685#comment-16221685 ] Hadoop QA commented on HDFS-12711: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 144 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 3m 48s{color} | {color:red} root in trunk failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 12s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 23s{color} | {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 22s{color} | {color:red} hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient generated 2 new + 95 unchanged - 48 fixed = 97 total (was 143) {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 46s{color} | {color:orange} hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: The patch generated 137 new + 2573 unchanged - 305 fixed = 2710 total (was 2878) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 47s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 32m 52s{color} | {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 18s{color} | {color:red} The patch generated 4 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 62m 15s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.mapreduce.TestLargeSort | | | hadoop.mapreduce.v2.TestRMNMInfo | | | hadoop.mapreduce.v2.TestSpeculativeExecution | | | hadoop.mapreduce.v2.TestMRJobsWithProfiler | | | hadoop.mapreduce.TestMapCollection | | | hadoop.mapreduce.lib.output.TestJobOutputCommitter | | | hadoop.mapreduce.v2.TestUberAM | | | hadoop.mapreduce.v2.TestMRJobsWithHistoryService | | | hadoop.mapreduce.v2.TestMRJobs | | | hadoop.mapreduce.v2.TestNonExistentJob | | | hadoop.mapreduce.v2.TestMRAppWithCombiner | | | hadoop.mapreduce.v2.TestMROldApiJobs | | | hadoop.mapreduce.v2.TestMRAMWithNonNormalizedCapabilities | | Timed out junit tests | org.apache.hadoop.mapreduce.v2.TestMiniMRProxyUser | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue |
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221587#comment-16221587 ] Allen Wittenauer commented on HDFS-12711: - For reference later, two branch-2 patches running on HADOOP that have triggered HDFS unit tests * https://builds.apache.org/job/PreCommit-hadoop-Build/13585/ , H9 * https://builds.apache.org/job/PreCommit-hadoop-Build/13583/ , H1 > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch, MAPREDUCE-4980.016.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221575#comment-16221575 ] Allen Wittenauer commented on HDFS-12711: - This is a patch that I know how well it performs in general. Usually one or two failures, highly threaded, etc, etc. So let's see if 2k proc limit is realistic. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch, MAPREDUCE-4980.016.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221500#comment-16221500 ] Allen Wittenauer commented on HDFS-12711: - Some status, after taking part of the afternoon to do something besides look at this. a) It's clear that surefire and therefore both Jenkins and Yetus, doesn't by default report on tests that didn't launch during low memory conditions. This is a bit problematic: we could have had this problem for a very long time but we never would have known it. It'd be great if there was a way to ask surefire what is expected to run and then do a pre-/post- comparison. Something else to consider: there are likely lots and lots of projects that may not know they have tests failing in this way... b) That said, there are two things that we might be able to add to Yetus that might help: some sort of reporting of JVMs that are still running after maven exits and killing leftover Java processes in between module unit testing. Obviously, this would only be available when running under Docker. c) This patch is working well enough that I've reconfigured Precommit-HADOOP-build to use my private Yetus branch with this patch. This is primarily to give me more general usability data. I've set the proclimit to be 5000 and mem limit to be 20g (that's slightly less than half of the node's full memory size). proclimit is likely too high but should give us a lot of wiggle room to tune downwards. d) As part of deploying this version of Yetus, I also took the opportunity to reconfigure the job to be laid out in a more sane manner as well as setting it to run on all of the Hadoop build nodes. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221229#comment-16221229 ] Hadoop QA commented on HDFS-12711: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} branch-2 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 34s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s{color} | {color:green} branch-2 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 30m 49s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:blue}0{color} | {color:blue} asflicense {color} | {color:blue} 0m 14s{color} | {color:blue} ASF License check generated no output? {color} | | {color:black}{color} | {color:black} {color} | {color:black} 46m 40s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.web.TestHftpFileSystem | | Timed out junit tests | org.apache.hadoop.hdfs.TestBlockStoragePolicy | | | org.apache.hadoop.hdfs.web.TestWebHDFS | | | org.apache.hadoop.hdfs.TestRollingUpgradeRollback | | | org.apache.hadoop.hdfs.web.TestWebHDFSAcl | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:17213a0 | | JIRA Issue | HDFS-12711 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893963/HDFS-12711.branch-2.00.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml | | uname | Linux 9c99ee79afc9 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-2 / 4678315 | | maven | version: Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00) | | Default Java | 1.7.0_151 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build2/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build2/3/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build2/3/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221083#comment-16221083 ] Allen Wittenauer commented on HDFS-12711: - Note to self: Kicking off another run with proclimit set to 1500. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221077#comment-16221077 ] Allen Wittenauer commented on HDFS-12711: - Success! One interesting thing: surefire is a liar. A lot more tests failed than what it reported. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221076#comment-16221076 ] Hadoop QA commented on HDFS-12711: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 14m 29s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} branch-2 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 21s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s{color} | {color:green} branch-2 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 31m 7s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 13s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 61m 59s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestFileStatus | | | hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes | | | hadoop.hdfs.web.TestWebHDFSAcl | | | hadoop.hdfs.TestDFSMkdirs | | | hadoop.hdfs.TestSnapshotCommands | | | hadoop.hdfs.TestRollingUpgradeRollback | | | hadoop.hdfs.TestTrashWithEncryptionZones | | | hadoop.hdfs.web.TestFSMainOperationsWebHdfs | | | hadoop.hdfs.TestBlockStoragePolicy | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:17213a0 | | JIRA Issue | HDFS-12711 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893963/HDFS-12711.branch-2.00.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml | | uname | Linux 1ad1a0b77a83 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-2 / 4678315 | | maven | version: Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00) | | Default Java | 1.7.0_151 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build2/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build2/2/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build2/2/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221069#comment-16221069 ] Allen Wittenauer commented on HDFS-12711: - https://builds.apache.org/job/PreCommit-HDFS-Build2/2 This is better or worse, depending upon your point of view. Many of these: {code} estDeleteEZWithMultipleUsers(org.apache.hadoop.hdfs.TestTrashWithEncryptionZones) Time elapsed: 5.134 sec <<< ERROR! java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:717) at io.netty.util.concurrent.SingleThreadEventExecutor.shutdownGracefully(SingleThreadEventExecutor.java:557) at io.netty.util.concurrent.MultithreadEventExecutorGroup.shutdownGracefully(MultithreadEventExecutorGroup.java:146) at io.netty.util.concurrent.AbstractEventExecutorGroup.shutdownGracefully(AbstractEventExecutorGroup.java:69) at org.apache.hadoop.hdfs.server.datanode.web.DatanodeHttpServer.close(DatanodeHttpServer.java:272) at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1986) at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNode(MiniDFSCluster.java:1868) at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1858) at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1837) at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1811) at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1804) at org.apache.hadoop.hdfs.TestTrashWithEncryptionZones.teardown(TestTrashWithEncryptionZones.java:118) {code} which eventually turn into these: {code} /bin/sh: 1: Cannot fork /bin/sh: 1: Cannot fork /bin/sh: 1: Cannot fork /bin/sh: 1: Cannot fork /bin/sh: 1: Cannot fork /bin/sh: 1: Cannot fork {code} but that's not it's final form! Oh no, it then turns into reams and reams of this: {code} # # There is insufficient memory for the Java Runtime Environment to continue. # Cannot create GC thread. Out of system resources. # An error report file with more information is saved as: # /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/hs_err_pid20306.log # {code} Now we just have to wait and see if the walls put up were enough to stop H4 from crashing. (the 1k default process limit is *probably* a smidge too low, but not by too much.) > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219757#comment-16219757 ] Allen Wittenauer commented on HDFS-12711: - Ok, at least locally, YETUS-561.04 is successfully preventing things from running wild. It's pretty obvious at this point that something is very wrong with Hadoop code + OpenJDK 7. Now whether that is only caused by the unit tests or something actually in the core code base, I have no idea. I'll let someone else deal with that. As soon as Jenkins clears itself up, I'll try testing -04 with the Hadoop code base on the ASF boxes. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219641#comment-16219641 ] Allen Wittenauer commented on HDFS-12711: - lol yeah, that's... too many. OK, I know what I need to do to at least get this under control. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219618#comment-16219618 ] Íñigo Goiri commented on HDFS-12711: Almost a hundred java processes hanging around. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219587#comment-16219587 ] Allen Wittenauer commented on HDFS-12711: - See how many java processes are on that JDK7 node. Also, is that Oracle JDK7 or OpenJDK7? Thanks! > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219573#comment-16219573 ] Íñigo Goiri commented on HDFS-12711: [~aw] just to confirm, I just run the unit tests with OpenJDK 8 and everything worked in around 5 hours. The machine running the tests with OpenJDK 7 is still crawling after 3 days. So I'd say is an issue with Java 7. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219512#comment-16219512 ] Allen Wittenauer commented on HDFS-12711: - For reference: {code} $ free -h totalusedfree shared buff/cache available Mem: 9.7G9.3G157M9.0M265M 39M Swap:0B 0B 0B {code} I have the docker -m parameter set to 10g, just to see what would happen... > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219501#comment-16219501 ] Allen Wittenauer commented on HDFS-12711: - OK, I just hit it here. On my laptop's Linux VM, things start falling apart around 35 Java processes running, almost all of them fired off from surefire. Some things start to fail (I couldn't do a ps!). Eventually I can again and see what is happening. Only *one* process has been killed, even though the shell couldn't fork for ps. Given that my CPU is currently pegged (uptime says the loadavg is over 50), I'm guessing the same thing is happening on a bigger scale on the ASF build boxes. There aren't enough cycles on the CPU to run the OOM killer fast enough for it to kill things. Eventually, other stuff fails to launch due to insufficient memory. This eventually confuses the Jenkins agent and it all falls apart from there. It *eventually* gets over the hump, but by then it's too late. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219475#comment-16219475 ] Allen Wittenauer commented on HDFS-12711: - pono just provided this ps from one of the dead nodes: https://paste.apache.org/p/2D4I Umm, yeah, that's not good. java isn't reaping or there's a fork bomb in one of the unit tests. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219416#comment-16219416 ] Allen Wittenauer commented on HDFS-12711: - FYI: Reducing the number of nodes that HDFS, HADOOP, and Hadoop's QBT jobs may run on to reduce the impact to the rest of the ASF. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219365#comment-16219365 ] Allen Wittenauer commented on HDFS-12711: - Something else worth mentioning that I failed to earlier: that run was done without the patched version of Yetus. The goal was to reproduce with just the hdfs chunk running and that was accomplished. Now we just need to see if patched Yetus will protect everyone else. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219353#comment-16219353 ] Allen Wittenauer commented on HDFS-12711: - Re-launched the agent using the Jenkins UI. But now Jenkins doesn't appear to want to schedule *any* jobs. Just shoot me. ... In others news, a bit of "inside baseball" that I bet a lot of people don't know. When Yetus launches a docker container, Jenkins doesn't know how to kill it. It sends the equiv of ctrl-c to the docker CLI but it doesn't seem to respond to it. So the Docker container *continues to run*. (Thus why "timed out" tasks will still run if they are running their docker armor). In qbt mode, there is no JIRA to write to. So the output is actually handled by Jenkins. In test-patch mode, Yetus has a JIRA to write output to. What we are seeing is that Yetus is continuing to run, finishes, then says "yeah, a bunch of stuff failed. fix your code." Meanwhile, outside the container, it's death and destruction and the loss of the Jenkins agent and probably other stuff. But this does mean at least in this run, that it was *NOT* a kernel panic because otherwise we would never have gotten any feedback at all. That's fantastic news because it means there are likely some controls that can put around it just a matter if they are OS-level/infra or docker-related. It's worth noting that from what I can tell, surefire will report OOM'd and/or otherwise externally killed tests as "timed out". So there was still a lot of death and destruction inside the container as well. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219295#comment-16219295 ] Allen Wittenauer commented on HDFS-12711: - Uh that was unexpected. According to Jenkins, that node is dead! Maybe there is hope after all. Just need to give some cycles to https://builds.apache.org/job/PreCommit-HDFS-Build2/ to test if the OOM protection features in Yetus works. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219285#comment-16219285 ] Hadoop QA commented on HDFS-12711: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} branch-2 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 20s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 11s{color} | {color:green} branch-2 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}184m 46s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 2m 53s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}204m 0s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Timed out junit tests | org.apache.hadoop.hdfs.TestEncryptionZones | | | org.apache.hadoop.hdfs.TestLeaseRecovery2 | | | org.apache.hadoop.hdfs.TestHdfsAdmin | | | org.apache.hadoop.hdfs.TestDatanodeRegistration | | | org.apache.hadoop.hdfs.TestParallelRead | | | org.apache.hadoop.hdfs.TestMaintenanceState | | | org.apache.hadoop.hdfs.TestBlocksScheduledCounter | | | org.apache.hadoop.hdfs.TestMultiThreadedHflush | | | org.apache.hadoop.hdfs.TestReservedRawPaths | | | org.apache.hadoop.hdfs.TestSetrepIncreasing | | | org.apache.hadoop.hdfs.TestReplication | | | org.apache.hadoop.hdfs.TestDataTransferKeepalive | | | org.apache.hadoop.hdfs.TestDatanodeDeath | | | org.apache.hadoop.hdfs.TestFileAppend | | | org.apache.hadoop.hdfs.TestBlockMissingException | | | org.apache.hadoop.hdfs.TestFileAppend4 | | | org.apache.hadoop.hdfs.TestRollingUpgradeDowngrade | | | org.apache.hadoop.hdfs.TestHDFSFileSystemContract | | | org.apache.hadoop.hdfs.TestInjectionForSimulatedStorage | | | org.apache.hadoop.hdfs.TestDFSPermission | | | org.apache.hadoop.hdfs.TestDatanodeLayoutUpgrade | | | org.apache.hadoop.hdfs.client.impl.TestBlockReaderLocal | | | org.apache.hadoop.hdfs.TestCrcCorruption | | | org.apache.hadoop.hdfs.TestFileCreationDelete | | | org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS | | | org.apache.hadoop.hdfs.TestDFSAddressConfig | | | org.apache.hadoop.hdfs.web.TestWebHdfsWithRestCsrfPreventionFilter | | | org.apache.hadoop.hdfs.TestLeaseRecovery | | | org.apache.hadoop.hdfs.TestBlockStoragePolicy | | | org.apache.hadoop.hdfs.TestDatanodeConfig | | | org.apache.hadoop.hdfs.TestSeekBug | | | org.apache.hadoop.hdfs.TestDFSOutputStream | | | org.apache.hadoop.hdfs.TestDFSUpgrade | | | org.apache.hadoop.hdfs.web.TestWebHDFS | | | org.apache.hadoop.hdfs.web.TestWebHDFSXAttr | | | org.apache.hadoop.hdfs.TestRollingUpgradeRollback | | |
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219243#comment-16219243 ] Allen Wittenauer commented on HDFS-12711: - OK, messing with OOM from the docker perspective is NOT protecting from this crash. I'm at the point where unless I can reproduce this locally, I'm not going to be able to diagnose it any farther. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Allen Wittenauer > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219178#comment-16219178 ] stack commented on HDFS-12711: -- Rah rah [~aw]! Thanks for digging in. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Allen Wittenauer > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219103#comment-16219103 ] Allen Wittenauer commented on HDFS-12711: - Total run time was a bit over an hour. Much easier now to reproduce. Gonna set up a new job on jenkins so that I have more control over the situation. It's also interesting to note that a trunk job just failed, but it seemed to be at a different spot in the hdfs-unit tests. Unrelated? Probably. The gear being used in the ASF build inf is probably getting old, s > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Allen Wittenauer > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219099#comment-16219099 ] Allen Wittenauer commented on HDFS-12711: - Hooray it crashed! > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Allen Wittenauer > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219020#comment-16219020 ] Allen Wittenauer commented on HDFS-12711: - The problem that we're seeing is that if we run the full gamut of Yetus tests on branch-2, it invariably hard crashes the node on the ASF build infrastructure. Always. Every time. It also always happen during the hdfs unit tests. I'm attempting to isolate it to see if it is *just* the HDFS unit tests that trigger the crash or if I need to run through common, etc, first too. This will do two things: * drop the reproducible test case down from 5+ hours to 1+ hours * confirm that's it is entirely in the hdfs unit tests and not from something else in the path My ultimate goal is to get Yetus to configure the Docker container to at least prevent the crashes. But I need a 'faster' way to test... FWIW, I'm still unsure if it is a kernel panic or just breaking Jenkins enough that it thinks the node is catatonic. From the *one* time I was able to see logs before they disappeared, there were a ton of OOM errors, a core dump, and more. This leads me to believe that it is likely a kernel panic caused by the OOM killer going nuts, since it's well established how badly the Linux kernel behaves under low mem. (Thus why I can't really test at home either... I'm not using Linux on my "big box") > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Allen Wittenauer > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219009#comment-16219009 ] Anu Engineer commented on HDFS-12711: - [~aw] [~daryn] [~kihwal] Care to enlighten us lesser souls about how this empty line impacts tests? > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Allen Wittenauer > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218997#comment-16218997 ] Íñigo Goiri commented on HDFS-12711: [~aw] the unit tests seemed to be running a week ago with Java 8. I started them 2 days ago with Java 7 and I'm still half way. Haven't double checked but the OpenJDK 7 issue seems to match. I can start running again with Java 8 to see. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Allen Wittenauer > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218914#comment-16218914 ] Allen Wittenauer commented on HDFS-12711: - FWIW, my current hypothesis is that we're triggering a bug in OpenJDK 7. If others could run some tests against it, that'd be super. I don't have the equipment here at home to really test it properly and I'm certainly not going to burn money on AWS/Azure/whatever on it. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Allen Wittenauer > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218895#comment-16218895 ] Allen Wittenauer commented on HDFS-12711: - BTW, there a bunch of tests that still use build/test/data in HDFS. I wouldn't be surprised if that's where some of the flaky tests are. Meanwhile, back to trying to figure out if I can break a node just running them... *sigh* > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Allen Wittenauer > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218892#comment-16218892 ] Kihwal Lee commented on HDFS-12711: --- bq. -1 on adding Skynet logic to the hdfs tests. +2 to make it +1. We should make it deadlier. Jokes aside, I've noticed TestPread hits maven timeout in 2.8. It passes in trunk. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Allen Wittenauer > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218869#comment-16218869 ] Daryn Sharp commented on HDFS-12711: -1 on adding Skynet logic to the hdfs tests. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Allen Wittenauer > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org