Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
Yeah, consistently failing nightly builds are just burning resources. Until someone starts working to fix it, we shouldn't keep submitting the job. Too bad; I thought build times and resource usage were becoming more manageable on branch-2.

If anyone has cycles to work on this, the job is here [1]. -C

[1]: https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/

On Tue, May 15, 2018 at 10:27 AM, Allen Wittenauer wrote:
>
>> On May 15, 2018, at 10:16 AM, Chris Douglas wrote:
>>
>> They've been failing for a long time. It can't install bats, and
>> that's fatal? -C
>
> The bats error is new and causes the build to fail enough that it
> produces the email output. For the past few months, it hasn’t been producing
> email output at all because the builds have been timing out. (The last
> ‘good’ report was Feb 26.) Since no one [*] is paying attention to them
> enough to notice, I figured it was better to free up the cycles for the rest
> of the ASF.
>
> * - I noticed a while back, but for various reasons I’ve mostly moved to only
> working on Hadoop things where I’m getting paid.

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
> On May 15, 2018, at 10:16 AM, Chris Douglas wrote:
>
> They've been failing for a long time. It can't install bats, and
> that's fatal? -C

The bats error is new and causes the build to fail enough that it produces the email output. For the past few months, it hasn’t been producing email output at all because the builds have been timing out. (The last ‘good’ report was Feb 26.) Since no one [*] is paying attention to them enough to notice, I figured it was better to free up the cycles for the rest of the ASF.

* - I noticed a while back, but for various reasons I’ve mostly moved to only working on Hadoop things where I’m getting paid.
Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
They've been failing for a long time. It can't install bats, and that's fatal? -C

On Tue, May 15, 2018 at 9:43 AM, Allen Wittenauer wrote:
>
> FYI:
>
> I’m going to disable the branch-2 nightly jobs.
Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
FYI:

I’m going to disable the branch-2 nightly jobs.
Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
Allen, can we bump up the Maven Surefire heap size to max (if it is not already) for the branch-2 nightly build and see if it helps?

Thanks,
Subru

On Tue, Oct 24, 2017 at 4:22 PM, Allen Wittenauer wrote:
>
>> On Oct 24, 2017, at 4:10 PM, Andrew Wang wrote:
>>
>> FWIW we've been running branch-3.0 unit tests successfully internally,
>> though we have separate jobs for Common, HDFS, YARN, and MR. The failures
>> here are probably a property of running everything in the same JVM, which
>> I've found problematic in the past due to OOMs.
>
> Last time I looked, surefire was configured to launch unit tests
> in different JVMs. But that might only be true in trunk. Or maybe only
> for some of the subprojects.
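For reference, Surefire passes JVM options (including the heap size) to its forked test JVMs via `argLine`. A minimal sketch of what such a bump could look like in a pom.xml; the 4g value and the extra flag are illustrative examples, not the actual branch-2 settings:

```xml
<!-- Illustrative pom.xml fragment, not the project's real configuration. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- argLine is handed to each forked JVM that runs the tests -->
    <argLine>-Xmx4g -XX:+HeapDumpOnOutOfMemoryError</argLine>
  </configuration>
</plugin>
```

If the pom does not hard-code `argLine`, the same can usually be tried from the command line, e.g. `mvn test -DargLine="-Xmx4g"`, without editing the build.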
Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
> On Oct 24, 2017, at 4:10 PM, Andrew Wang wrote:
>
> FWIW we've been running branch-3.0 unit tests successfully internally, though
> we have separate jobs for Common, HDFS, YARN, and MR. The failures here are
> probably a property of running everything in the same JVM, which I've found
> problematic in the past due to OOMs.

Last time I looked, surefire was configured to launch unit tests in different JVMs. But that might only be true in trunk. Or maybe only for some of the subprojects.
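The forking behavior being discussed is controlled by Surefire's `forkCount`/`reuseForks` parameters (which replaced the older `forkMode`). A hedged sketch of a configuration that gives each test class its own fresh JVM; the values are examples only, not what any Hadoop branch actually sets:

```xml
<!-- Illustrative Surefire fork settings; values are examples only. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <forkCount>1</forkCount>        <!-- one forked test JVM at a time -->
    <reuseForks>false</reuseForks>  <!-- fresh JVM for every test class -->
  </configuration>
</plugin>
```

With `reuseForks=false`, a leak in one test class dies with its JVM instead of accumulating across the run, at the cost of JVM startup time per class, which is one way the trunk/branch-2 difference Allen describes could matter.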
Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
FWIW we've been running branch-3.0 unit tests successfully internally, though we have separate jobs for Common, HDFS, YARN, and MR. The failures here are probably a property of running everything in the same JVM, which I've found problematic in the past due to OOMs.

On Tue, Oct 24, 2017 at 4:04 PM, Allen Wittenauer <a...@effectivemachines.com> wrote:
>
> My plan is currently to:
>
> * switch some of Hadoop’s Yetus jobs over to my branch with the YETUS-561
> patch to test it out.
> * if the tests work, work on getting YETUS-561 committed to yetus master
> * switch jobs back to ASF yetus master either post-YETUS-561 or without it
> if it doesn’t work
> * go back to working on something else, regardless of the outcome
>
>> On Oct 24, 2017, at 2:55 PM, Chris Douglas <cdoug...@apache.org> wrote:
>> [snip]
Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
My plan is currently to:

* switch some of Hadoop’s Yetus jobs over to my branch with the YETUS-561 patch to test it out.
* if the tests work, work on getting YETUS-561 committed to yetus master
* switch jobs back to ASF yetus master either post-YETUS-561 or without it if it doesn’t work
* go back to working on something else, regardless of the outcome

> On Oct 24, 2017, at 2:55 PM, Chris Douglas <cdoug...@apache.org> wrote:
>
> Sean/Junping-
>
> Ignoring the epistemology, it's a problem. Let's figure out what's
> causing memory to balloon and then we can work out the appropriate
> remedy.
>
> Is this reproducible outside the CI environment? To Junping's point,
> would YETUS-561 provide more detailed information to aid debugging? -C
>
> On Tue, Oct 24, 2017 at 2:50 PM, Junping Du <j...@hortonworks.com> wrote:
> [snip]
Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
Sean/Junping-

Ignoring the epistemology, it's a problem. Let's figure out what's causing memory to balloon and then we can work out the appropriate remedy.

Is this reproducible outside the CI environment? To Junping's point, would YETUS-561 provide more detailed information to aid debugging? -C

On Tue, Oct 24, 2017 at 2:50 PM, Junping Du <j...@hortonworks.com> wrote:
> In general, the "solid evidence" of memory leak comes from analysis of
> heapdump, jstack, gc log, etc. In many cases, we can locate/conclude which
> piece of code are leaking memory from the analysis.
>
> Unfortunately, I cannot find any conclusion from previous comments and it
> even cannot tell which daemons/components of HDFS consumes unexpected high
> memory. Don't sounds like a solid bug report to me.
>
> Thanks,
>
> Junping
>
> From: Sean Busbey <bus...@cloudera.com>
> Sent: Tuesday, October 24, 2017 2:20 PM
> [snip]
Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
In general, "solid evidence" of a memory leak comes from analysis of a heap dump, jstack output, GC logs, etc. In many cases, we can locate/conclude which piece of code is leaking memory from that analysis.

Unfortunately, I cannot find any conclusion from the previous comments, and they don't even say which daemons/components of HDFS consume unexpectedly high memory. That doesn't sound like a solid bug report to me.

Thanks,

Junping

From: Sean Busbey <bus...@cloudera.com>
Sent: Tuesday, October 24, 2017 2:20 PM
To: Junping Du
Cc: Allen Wittenauer; Hadoop Common; Hdfs-dev; mapreduce-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

Just curious, Junping what would "solid evidence" look like? Is the supposition here that the memory leak is within HDFS test code rather than library runtime code? How would such a distinction be shown?

On Tue, Oct 24, 2017 at 4:06 PM, Junping Du <j...@hortonworks.com> wrote:
[snip]

--
busbey
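On the heap-dump point: besides attaching `jmap -dump:live,format=b,file=dump.hprof <pid>` to a stuck test JVM, a process can capture its own dump through the HotSpot diagnostic MXBean. A minimal, hypothetical sketch (the class and file names are illustrative, not anything from the Hadoop build):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: write a heap dump of the current JVM, the kind of
// artifact that "solid evidence" of a leak would be built from.
public class HeapDumpExample {
    public static void main(String[] args) throws Exception {
        // dumpHeap requires a path that does not exist yet; use .hprof,
        // which recent JDKs enforce as the extension.
        Path out = Files.createTempDirectory("evidence").resolve("test.hprof");
        HotSpotDiagnosticMXBean mx = ManagementFactory
                .getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // true = dump only live (reachable) objects
        mx.dumpHeap(out.toString(), true);
        System.out.println(Files.size(out) > 0 ? "dump written" : "dump empty");
    }
}
```

The resulting .hprof file can then be opened in a heap analyzer (e.g. Eclipse MAT or `jhat`-style tools) to see which objects dominate, which is the kind of analysis Junping is asking for.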
Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
Just curious, Junping what would "solid evidence" look like? Is the supposition here that the memory leak is within HDFS test code rather than library runtime code? How would such a distinction be shown?

On Tue, Oct 24, 2017 at 4:06 PM, Junping Du <j...@hortonworks.com> wrote:
> Allen,
> Do we have any solid evidence to show the HDFS unit tests going through
> the roof are due to serious memory leak by HDFS? Normally, I don't expect
> memory leak are identified in our UTs - mostly, it (test jvm gone) is just
> because of test or deployment issues.
> Unless there is concrete evidence, my concern on seriously memory leak
> for HDFS on 2.8 is relatively low given some companies (Yahoo, Alibaba,
> etc.) have deployed 2.8 on large production environment for months.
> Non-serious memory leak (like forgetting to close stream in non-critical
> path, etc.) and other non-critical bugs always happens here and there that
> we have to live with.
>
> Thanks,
>
> Junping
>
> From: Allen Wittenauer <a...@effectivemachines.com>
> Sent: Tuesday, October 24, 2017 8:27 AM
> [snip]

--
busbey
Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
Allen,

Do we have any solid evidence to show that the HDFS unit tests going through the roof are due to a serious memory leak in HDFS? Normally, I don't expect memory leaks to be identified in our UTs - mostly, it (the test JVM going away) is just because of test or deployment issues.

Unless there is concrete evidence, my concern about a serious memory leak in HDFS 2.8 is relatively low, given that some companies (Yahoo, Alibaba, etc.) have deployed 2.8 in large production environments for months. Non-serious memory leaks (like forgetting to close a stream in a non-critical path, etc.) and other non-critical bugs always happen here and there; we have to live with them.

Thanks,

Junping

From: Allen Wittenauer <a...@effectivemachines.com>
Sent: Tuesday, October 24, 2017 8:27 AM
To: Hadoop Common
Cc: Hdfs-dev; mapreduce-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

[snip]
Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer wrote:
>
> With no other information or access to go on, my current hunch is that one of
> the HDFS unit tests is ballooning in memory size. The easiest way to kill a
> Linux machine is to eat all of the RAM, thanks to overcommit, and that’s what
> this “feels” like.
>
> Someone should verify if 2.8.2 has the same issues before a release goes out …

FWIW, I ran 2.8.2 last night and it has the same problems.

Also: the node didn’t die! Looking through the workspace (so the next run will destroy them), two sets of logs stand out:

https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt

and

https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/

It looks like my hunch is correct: RAM in the HDFS unit tests is going through the roof. It’s also interesting how MANY log files there are. Is surefire not picking up that jobs are dying? Maybe not if memory is getting tight.

Anyway, at this point, branch-2.8 and higher are probably fubar’d. Additionally, I’ve filed YETUS-561 so that Yetus-controlled Docker containers can have their RAM limits set in order to prevent more nodes going catatonic.
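For context on the YETUS-561 idea: outside of Yetus, the underlying Docker mechanism for capping a container's RAM is the `--memory`/`--memory-swap` run flags. A hedged sketch; the flags are standard Docker options, but the image name, command, and 8g limit are placeholders, not the actual CI configuration:

```shell
# Illustrative: cap a build container's memory so a runaway test run is
# OOM-killed inside the container instead of taking the whole node down.
# --memory sets a hard RAM cap; setting --memory-swap to the same value
# forbids swap beyond that cap. Image and command are placeholders.
docker run --rm --memory=8g --memory-swap=8g some-build-image:latest mvn test
```

With a cap like this, the kernel's OOM killer targets processes inside the cgroup rather than the host going catatonic under overcommit, which matches the failure mode described above.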
Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
With no other information or access to go on, my current hunch is that one of the HDFS unit tests is ballooning in memory size. The easiest way to kill a Linux machine is to eat all of the RAM, thanks to overcommit, and that’s what this “feels” like.

Someone should verify if 2.8.2 has the same issues before a release goes out …

> On Oct 23, 2017, at 12:38 PM, Subramaniam V K wrote:
>
> Hi Allen,
>
> I had set up the build (or intended to) in anticipation of the 2.9 release.
> Thanks for fixing the configuration!
>
> We did face HDFS test timeouts in branch-2 when run together but
> individually the tests pass:
> https://issues.apache.org/jira/browse/HDFS-12620
>
> Folks in HDFS, can you please take a look at HDFS tests in branch-2 as we are
> not able to get even a single Yetus run to complete due to multiple test
> failures/timeouts.
>
> Thanks,
> Subru
>
> On Mon, Oct 23, 2017 at 11:26 AM, Vrushali C wrote:
> [snip]
Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
Hi Allen,

I had set up the build (or intended to) in anticipation of the 2.9 release. Thanks for fixing the configuration!

We did face HDFS test timeouts in branch-2 when the tests are run together, but individually they pass: https://issues.apache.org/jira/browse/HDFS-12620

Folks in HDFS, can you please take a look at the HDFS tests in branch-2, as we are not able to get even a single Yetus run to complete due to multiple test failures/timeouts.

Thanks,
Subru

On Mon, Oct 23, 2017 at 11:26 AM, Vrushali C wrote:
> Hi Allen,
>
> I have filed https://issues.apache.org/jira/browse/YARN-7380 for the
> timeline service findbugs warnings.
>
> thanks
> Vrushali
>
> On Mon, Oct 23, 2017 at 11:14 AM, Allen Wittenauer <a...@effectivemachines.com> wrote:
> [snip]
Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
Hi Allen,

I have filed https://issues.apache.org/jira/browse/YARN-7380 for the timeline service findbugs warnings.

thanks
Vrushali

On Mon, Oct 23, 2017 at 11:14 AM, Allen Wittenauer wrote:
>
> I’m really confused why this causes the Yahoo! QA boxes to go catatonic
> (!?!) during the run. As in, never come back online, probably in a kernel
> panic. It’s pretty consistently in hadoop-hdfs, so something is going wrong
> there… is branch-2 hdfs behaving badly? Someone needs to run the
> hadoop-hdfs unit tests to see what is going on.
>
> It’s probably worth noting that findbugs says there is a problem in the
> timeline server hbase code. Someone should probably verify + fix that
> issue.
Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
I’m really confused why this causes the Yahoo! QA boxes to go catatonic (!?!) during the run. As in, they never come back online, probably in a kernel panic. It’s pretty consistently in hadoop-hdfs, so something is going wrong there… is branch-2 hdfs behaving badly? Someone needs to run the hadoop-hdfs unit tests to see what is going on.

It’s probably worth noting that findbugs says there is a problem in the timeline server hbase code. Someone should probably verify + fix that issue.

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
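[Editor's note] If someone does pick up the findbugs warning, it can be checked on just the suspect module rather than the whole tree. A minimal sketch; the module path below is a guess at the timeline-service HBase module and should be verified against the actual branch-2 source layout:

```shell
# Sketch: target findbugs at a single module instead of the full build.
# The module path is an assumed example, not verified against branch-2.
MODULE="hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase"
CMD="mvn -pl ${MODULE} findbugs:findbugs"
# Echo rather than execute; a Hadoop checkout is assumed elsewhere.
echo "${CMD}"
```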
Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
To whoever set this up:

There was a job config problem where the Jenkins branch parameter wasn’t passed to Yetus. Therefore both of these reports have been against trunk. I’ve fixed this job (as well as the other jobs) to honor that parameter, and I’ve kicked off a new run with these changes.

> On Oct 21, 2017, at 9:58 AM, Apache Jenkins Server wrote:
>
> For more details, see
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/
>
> [Oct 20, 2017 9:27:59 PM] (stevel) HADOOP-14942. DistCp#cleanup() should
> check whether jobFS is null.
> [Oct 21, 2017 12:19:29 AM] (subru) YARN-6871. Add additional deSelects params
> in
>
> -1 overall
>
> The following subsystems voted -1:
>    asflicense unit
>
> The following subsystems voted -1 but
> were configured to be filtered/ignored:
>    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace
>
> The following subsystems are considered long running:
> (runtime bigger than 1h 0m 0s)
>    unit
>
> Specific tests:
>
>    Failed junit tests :
>
>       hadoop.hdfs.TestDFSStripedOutputStreamWithFailure100
>       hadoop.hdfs.TestReadStripedFileWithMissingBlocks
>       hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
>       hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency
>       hadoop.yarn.server.nodemanager.scheduler.TestDistributedScheduler
>       hadoop.yarn.server.resourcemanager.TestApplicationMasterService
>       hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA
>       hadoop.yarn.server.resourcemanager.TestRMHAForNodeLabels
>       hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue
>       hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
>       hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler
>       hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher
>       hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
>       hadoop.yarn.server.resourcemanager.TestRMHA
>       hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
>       hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched
>       hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
>       hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl
>       hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation
>       hadoop.yarn.server.resourcemanager.TestRMHATimelineCollectors
>       hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
>       hadoop.yarn.server.resourcemanager.TestReservationSystemWithRMHA
>       hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesSchedulerActivities
>       hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler
>       hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerLazyPreemption
>       hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerDynamicBehavior
>       hadoop.yarn.server.TestDiskFailures
>
>    Timed out junit tests :
>
>       org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.TestZKConfigurationStore
>       org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore
>       org.apache.hadoop.yarn.server.resourcemanager.TestLeaderElectorService
>       org.apache.hadoop.mapred.pipes.TestPipeApplication
>
>    cc:
>       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/diff-compile-cc-root.txt [4.0K]
>
>    javac:
>       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/diff-compile-javac-root.txt [284K]
>
>    checkstyle:
>       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/diff-checkstyle-root.txt [17M]
>
>    pylint:
>       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/diff-patch-pylint.txt [20K]
>
>    shellcheck:
>       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/diff-patch-shellcheck.txt [20K]
>
>    shelldocs:
>       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/diff-patch-shelldocs.txt [12K]
>
>    whitespace:
>       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/whitespace-eol.txt [8.5M]
>       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/whitespace-tabs.txt [292K]
>
>    javadoc:
>       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/diff-javadoc-javadoc-root.txt [760K]
>
>    unit:
>       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [308K]
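[Editor's note] For reference, the branch-parameter fix described at the top of this message boils down to making sure the job's parameter actually reaches the Yetus command line. A minimal sketch under assumptions: the parameter name BRANCH_PARAM and the --branch flag are stand-ins, and the real job script should be consulted:

```shell
# Sketch: propagate a (hypothetical) Jenkins job parameter into the Yetus args.
BRANCH_PARAM="${BRANCH_PARAM:-branch-2}"   # job parameter; defaulted for illustration
YETUS_ARGS="--branch=${BRANCH_PARAM}"      # if omitted, runs land on the default branch
echo "yetus args: ${YETUS_ARGS}"
```

The failure mode described above is exactly what happens when this wiring is missing: the job silently tests the default branch while reporting under the branch-2 subject line.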