Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2018-05-15 Thread Chris Douglas
Yeah, consistently failing nightly builds are just burning resources.
Until someone starts working to fix it, we shouldn't keep submitting
the job. Too bad; I thought build times and resource usage were
becoming more manageable on branch-2.

If anyone has cycles to work on this, the job is here [1]. -C

[1]: https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/

On Tue, May 15, 2018 at 10:27 AM, Allen Wittenauer
 wrote:
>
>> On May 15, 2018, at 10:16 AM, Chris Douglas  wrote:
>>
>> They've been failing for a long time. It can't install bats, and
>> that's fatal? -C
>
>
> The bats error is new and causes the build to fail enough that it 
> produces the email output.  For the past few months, it hasn’t been producing 
> email output at all because the builds have been timing out.  (The last 
> ‘good’ report was Feb 26.)  Since no one [*] is paying attention to them 
> enough to notice, I figured it was better to free up the cycles for the rest 
> of the ASF.
>
> * - I noticed a while back, but for various reasons I’ve mostly moved to only 
> working on Hadoop things where I’m getting paid.

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2018-05-15 Thread Allen Wittenauer

> On May 15, 2018, at 10:16 AM, Chris Douglas  wrote:
> 
> They've been failing for a long time. It can't install bats, and
> that's fatal? -C


The bats error is new and causes the build to fail enough that it 
produces the email output.  For the past few months, it hasn’t been producing 
email output at all because the builds have been timing out.  (The last ‘good’ 
report was Feb 26.)  Since no one [*] is paying attention to them enough to 
notice, I figured it was better to free up the cycles for the rest of the ASF. 

* - I noticed a while back, but for various reasons I’ve mostly moved to only 
working on Hadoop things where I’m getting paid.



Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2018-05-15 Thread Chris Douglas
They've been failing for a long time. It can't install bats, and
that's fatal? -C

On Tue, May 15, 2018 at 9:43 AM, Allen Wittenauer
 wrote:
>
>
> FYI:
>
> I’m going to disable the branch-2 nightly jobs.
>




Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2018-05-15 Thread Allen Wittenauer


FYI:

I’m going to disable the branch-2 nightly jobs.



Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Subramaniam V K
Allen, can we bump up the maven surefire heap size to the max (if it is not
already) for the branch-2 nightly build and see if it helps?

Thanks,
Subru

On Tue, Oct 24, 2017 at 4:22 PM, Allen Wittenauer 
wrote:

>
> > On Oct 24, 2017, at 4:10 PM, Andrew Wang wrote:
> >
> > FWIW we've been running branch-3.0 unit tests successfully internally,
> > though we have separate jobs for Common, HDFS, YARN, and MR. The failures
> > here are probably a property of running everything in the same JVM, which
> > I've found problematic in the past due to OOMs.
>
> Last time I looked, surefire was configured to launch unit tests
> in different JVMs.  But that might only be true in trunk.  Or maybe only
> for some of the subprojects.
>
>


Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Allen Wittenauer

> On Oct 24, 2017, at 4:10 PM, Andrew Wang  wrote:
> 
> FWIW we've been running branch-3.0 unit tests successfully internally, though 
> we have separate jobs for Common, HDFS, YARN, and MR. The failures here are 
> probably a property of running everything in the same JVM, which I've found 
> problematic in the past due to OOMs.

Last time I looked, surefire was configured to launch unit tests in 
different JVMs.  But that might only be true in trunk.  Or maybe only for some 
of the subprojects.  
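
For reference, a minimal sketch of how the surefire settings discussed in
this exchange (separate test JVMs, a larger heap) could be passed on the Maven
command line. forkCount, reuseForks, and argLine are standard Surefire user
properties; the helper name, the heap size, and the dump directory are
illustrative assumptions, not the configuration of the branch-2 job. A build
whose pom sets argLine explicitly may ignore the command-line value.

```python
def surefire_args(max_heap_mb, fork_count=1, reuse_forks=False, heap_dump_dir=None):
    """Build Maven user properties for forked, heap-capped test JVMs.

    Hypothetical helper: the names and defaults are illustrative only.
    """
    jvm_opts = ["-Xmx%dm" % max_heap_mb]
    if heap_dump_dir:
        # Standard HotSpot flags: write a heap dump when a test JVM hits an OOM.
        jvm_opts += [
            "-XX:+HeapDumpOnOutOfMemoryError",
            "-XX:HeapDumpPath=" + heap_dump_dir,
        ]
    return [
        "-DforkCount=%d" % fork_count,                  # number of forked test JVMs
        "-DreuseForks=%s" % str(reuse_forks).lower(),   # false = fresh JVM per test class
        "-DargLine=" + " ".join(jvm_opts),              # JVM options for each fork
    ]


if __name__ == "__main__":
    cmd = ["mvn", "test"] + surefire_args(2048, heap_dump_dir="/tmp/dumps")
    print(" ".join(cmd))
```

With reuseForks set to false each test class gets its own JVM, so a leak in
one class cannot accumulate across the whole module, at the cost of slower
runs.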



Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Andrew Wang
FWIW we've been running branch-3.0 unit tests successfully internally,
though we have separate jobs for Common, HDFS, YARN, and MR. The failures
here are probably a property of running everything in the same JVM, which
I've found problematic in the past due to OOMs.

On Tue, Oct 24, 2017 at 4:04 PM, Allen Wittenauer <a...@effectivemachines.com>
wrote:

>
> My plan is currently to:
>
> *  switch some of Hadoop’s Yetus jobs over to my branch with the YETUS-561
> patch to test it out.
> * if the tests work, work on getting YETUS-561 committed to yetus master
> * switch jobs back to ASF yetus master either post-YETUS-561 or without it
> if it doesn’t work
> * go back to working on something else, regardless of the outcome
>
>
> > On Oct 24, 2017, at 2:55 PM, Chris Douglas <cdoug...@apache.org> wrote:
> >
> > Sean/Junping-
> >
> > Ignoring the epistemology, it's a problem. Let's figure out what's
> > causing memory to balloon and then we can work out the appropriate
> > remedy.
> >
> > Is this reproducible outside the CI environment? To Junping's point,
> > would YETUS-561 provide more detailed information to aid debugging? -C
> >
> > On Tue, Oct 24, 2017 at 2:50 PM, Junping Du <j...@hortonworks.com> wrote:
> >> In general, the "solid evidence" of a memory leak comes from analysis of
> >> heap dumps, jstack output, GC logs, etc. In many cases, we can locate the
> >> piece of code that is leaking memory from that analysis.
> >>
> >> Unfortunately, I cannot find any conclusion in the previous comments, and
> >> they cannot even tell which daemons/components of HDFS consume unexpectedly
> >> high memory. It doesn't sound like a solid bug report to me.
> >>
> >>
> >>
> >> Thanks,
> >>
> >>
> >> Junping
> >>
> >>
> >> ________________
> >> From: Sean Busbey <bus...@cloudera.com>
> >> Sent: Tuesday, October 24, 2017 2:20 PM
> >> To: Junping Du
> >> Cc: Allen Wittenauer; Hadoop Common; Hdfs-dev;
> >> mapreduce-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
> >> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
> >>
> >> Just curious, Junping, what would "solid evidence" look like? Is the
> >> supposition here that the memory leak is within HDFS test code rather than
> >> library runtime code? How would such a distinction be shown?
> >>
> >> On Tue, Oct 24, 2017 at 4:06 PM, Junping Du <j...@hortonworks.com> wrote:
> >> Allen,
> >> Do we have any solid evidence to show that the HDFS unit tests going
> >> through the roof are due to a serious memory leak in HDFS? Normally, I don't
> >> expect memory leaks to be identified in our UTs - mostly, it (test JVM gone)
> >> is just because of test or deployment issues.
> >> Unless there is concrete evidence, my concern about a serious memory
> >> leak in HDFS on 2.8 is relatively low, given that some companies (Yahoo,
> >> Alibaba, etc.) have deployed 2.8 in large production environments for
> >> months. Non-serious memory leaks (like forgetting to close a stream in a
> >> non-critical path, etc.) and other non-critical bugs always happen here
> >> and there, and we have to live with them.
> >>
> >> Thanks,
> >>
> >> Junping
> >>
> >> 
> >> From: Allen Wittenauer <a...@effectivemachines.com>
> >> Sent: Tuesday, October 24, 2017 8:27 AM
> >> To: Hadoop Common
> >> Cc: Hdfs-dev; mapreduce-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
> >> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
> >>
> >>> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer
> >>> <a...@effectivemachines.com> wrote:
> >>>
> >>>
> >>>
> >>> With no other information or access to go on, my current hunch is that
> >>> one of the HDFS unit tests is ballooning in memory size.  The easiest way
> >>> to kill a Linux machine is to eat all of the RAM, thanks to overcommit and
> >>> that's what this "feels" like.
> >>>
> >>> Someone should verify if 2.8.2 has the same issues before a release
> >>> goes out ...
> >>
> >>
> >> FWIW, I ran 2.8.2 last night and it has the same problems.
> >>
> >> Also: the node didn't die!  Looking through the workspace (so
> >> the next run will destroy them), two sets of logs stand out:
> >>

Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Allen Wittenauer

My plan is currently to:

*  switch some of Hadoop’s Yetus jobs over to my branch with the YETUS-561 
patch to test it out. 
* if the tests work, work on getting YETUS-561 committed to yetus master
* switch jobs back to ASF yetus master either post-YETUS-561 or without it if 
it doesn’t work
* go back to working on something else, regardless of the outcome


> On Oct 24, 2017, at 2:55 PM, Chris Douglas <cdoug...@apache.org> wrote:
> 
> Sean/Junping-
> 
> Ignoring the epistemology, it's a problem. Let's figure out what's
> causing memory to balloon and then we can work out the appropriate
> remedy.
> 
> Is this reproducible outside the CI environment? To Junping's point,
> would YETUS-561 provide more detailed information to aid debugging? -C
> 
> On Tue, Oct 24, 2017 at 2:50 PM, Junping Du <j...@hortonworks.com> wrote:
>> In general, the "solid evidence" of a memory leak comes from analysis of
>> heap dumps, jstack output, GC logs, etc. In many cases, we can locate the
>> piece of code that is leaking memory from that analysis.
>>
>> Unfortunately, I cannot find any conclusion in the previous comments, and
>> they cannot even tell which daemons/components of HDFS consume unexpectedly
>> high memory. It doesn't sound like a solid bug report to me.
>>
>>
>>
>> Thanks,
>> 
>> 
>> Junping
>> 
>> 
>> 
>> From: Sean Busbey <bus...@cloudera.com>
>> Sent: Tuesday, October 24, 2017 2:20 PM
>> To: Junping Du
>> Cc: Allen Wittenauer; Hadoop Common; Hdfs-dev; 
>> mapreduce-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
>> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>> 
>> Just curious, Junping, what would "solid evidence" look like? Is the
>> supposition here that the memory leak is within HDFS test code rather than
>> library runtime code? How would such a distinction be shown?
>>
>> On Tue, Oct 24, 2017 at 4:06 PM, Junping Du <j...@hortonworks.com> wrote:
>> Allen,
>> Do we have any solid evidence to show that the HDFS unit tests going through
>> the roof are due to a serious memory leak in HDFS? Normally, I don't expect
>> memory leaks to be identified in our UTs - mostly, it (test JVM gone) is just
>> because of test or deployment issues.
>> Unless there is concrete evidence, my concern about a serious memory leak
>> in HDFS on 2.8 is relatively low, given that some companies (Yahoo, Alibaba,
>> etc.) have deployed 2.8 in large production environments for months.
>> Non-serious memory leaks (like forgetting to close a stream in a non-critical
>> path, etc.) and other non-critical bugs always happen here and there, and
>> we have to live with them.
>> 
>> Thanks,
>> 
>> Junping
>> 
>> ________________________
>> From: Allen Wittenauer <a...@effectivemachines.com>
>> Sent: Tuesday, October 24, 2017 8:27 AM
>> To: Hadoop Common
>> Cc: Hdfs-dev; mapreduce-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
>> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>>
>>> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer
>>> <a...@effectivemachines.com> wrote:
>>> 
>>> 
>>> 
>>> With no other information or access to go on, my current hunch is that one 
>>> of the HDFS unit tests is ballooning in memory size.  The easiest way to 
>>> kill a Linux machine is to eat all of the RAM, thanks to overcommit and 
>>> that's what this "feels" like.
>>> 
>>> Someone should verify if 2.8.2 has the same issues before a release goes 
>>> out ...
>> 
>> 
>> FWIW, I ran 2.8.2 last night and it has the same problems.
>>
>> Also: the node didn't die!  Looking through the workspace (so the
>> next run will destroy them), two sets of logs stand out:
>>
>> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
>>
>> and
>>
>> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/
>>
>> It looks like my hunch is correct:  RAM use in the HDFS unit tests is
>> going through the roof.  It's also interesting how MANY log files t

Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Chris Douglas
Sean/Junping-

Ignoring the epistemology, it's a problem. Let's figure out what's
causing memory to balloon and then we can work out the appropriate
remedy.

Is this reproducible outside the CI environment? To Junping's point,
would YETUS-561 provide more detailed information to aid debugging? -C

On Tue, Oct 24, 2017 at 2:50 PM, Junping Du <j...@hortonworks.com> wrote:
> In general, the "solid evidence" of a memory leak comes from analysis of
> heap dumps, jstack output, GC logs, etc. In many cases, we can locate the
> piece of code that is leaking memory from that analysis.
>
> Unfortunately, I cannot find any conclusion in the previous comments, and
> they cannot even tell which daemons/components of HDFS consume unexpectedly
> high memory. It doesn't sound like a solid bug report to me.
>
>
>
> Thanks,
>
>
> Junping
>
>
> 
> From: Sean Busbey <bus...@cloudera.com>
> Sent: Tuesday, October 24, 2017 2:20 PM
> To: Junping Du
> Cc: Allen Wittenauer; Hadoop Common; Hdfs-dev; 
> mapreduce-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>
> Just curious, Junping, what would "solid evidence" look like? Is the
> supposition here that the memory leak is within HDFS test code rather than
> library runtime code? How would such a distinction be shown?
>
> On Tue, Oct 24, 2017 at 4:06 PM, Junping Du <j...@hortonworks.com> wrote:
> Allen,
>  Do we have any solid evidence to show that the HDFS unit tests going through
> the roof are due to a serious memory leak in HDFS? Normally, I don't expect
> memory leaks to be identified in our UTs - mostly, it (test JVM gone) is just
> because of test or deployment issues.
>  Unless there is concrete evidence, my concern about a serious memory leak
> in HDFS on 2.8 is relatively low, given that some companies (Yahoo, Alibaba,
> etc.) have deployed 2.8 in large production environments for months.
> Non-serious memory leaks (like forgetting to close a stream in a non-critical
> path, etc.) and other non-critical bugs always happen here and there, and
> we have to live with them.
>
> Thanks,
>
> Junping
>
> 
> From: Allen Wittenauer <a...@effectivemachines.com>
> Sent: Tuesday, October 24, 2017 8:27 AM
> To: Hadoop Common
> Cc: Hdfs-dev; mapreduce-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>
>> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer
>> <a...@effectivemachines.com> wrote:
>>
>>
>>
>> With no other information or access to go on, my current hunch is that one 
>> of the HDFS unit tests is ballooning in memory size.  The easiest way to 
>> kill a Linux machine is to eat all of the RAM, thanks to overcommit and 
>> that's what this "feels" like.
>>
>> Someone should verify if 2.8.2 has the same issues before a release goes out 
>> ...
>
>
> FWIW, I ran 2.8.2 last night and it has the same problems.
>
> Also: the node didn't die!  Looking through the workspace (so the 
> next run will destroy them), two sets of logs stand out:
>
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
>
> and
>
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/
>
> It looks like my hunch is correct:  RAM use in the HDFS unit tests is
> going through the roof.  It's also interesting how MANY log files there are.
> Is surefire not picking up that jobs are dying?  Maybe not if memory is
> getting tight.
>
> Anyway, at this point, branch-2.8 and higher are probably fubar'd.
> Additionally, I've filed YETUS-561 so that Yetus-controlled Docker containers
> can have their RAM limits set in order to prevent more nodes going catatonic.
>
>
>
>
>
>
>
> --
> busbey




Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Junping Du
In general, the "solid evidence" of a memory leak comes from analysis of
heap dumps, jstack output, GC logs, etc. In many cases, we can locate the
piece of code that is leaking memory from that analysis.

Unfortunately, I cannot find any conclusion in the previous comments, and they
cannot even tell which daemons/components of HDFS consume unexpectedly high
memory. It doesn't sound like a solid bug report to me.



Thanks,


Junping



From: Sean Busbey <bus...@cloudera.com>
Sent: Tuesday, October 24, 2017 2:20 PM
To: Junping Du
Cc: Allen Wittenauer; Hadoop Common; Hdfs-dev; mapreduce-dev@hadoop.apache.org; 
yarn-...@hadoop.apache.org
Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

Just curious, Junping, what would "solid evidence" look like? Is the supposition
here that the memory leak is within HDFS test code rather than library runtime
code? How would such a distinction be shown?

On Tue, Oct 24, 2017 at 4:06 PM, Junping Du <j...@hortonworks.com> wrote:
Allen,
 Do we have any solid evidence to show that the HDFS unit tests going through
the roof are due to a serious memory leak in HDFS? Normally, I don't expect
memory leaks to be identified in our UTs - mostly, it (test JVM gone) is just
because of test or deployment issues.
 Unless there is concrete evidence, my concern about a serious memory leak in
HDFS on 2.8 is relatively low, given that some companies (Yahoo, Alibaba, etc.)
have deployed 2.8 in large production environments for months. Non-serious
memory leaks (like forgetting to close a stream in a non-critical path, etc.)
and other non-critical bugs always happen here and there, and we have to live
with them.

Thanks,

Junping


From: Allen Wittenauer <a...@effectivemachines.com>
Sent: Tuesday, October 24, 2017 8:27 AM
To: Hadoop Common
Cc: Hdfs-dev; mapreduce-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer
> <a...@effectivemachines.com> wrote:
>
>
>
> With no other information or access to go on, my current hunch is that one of 
> the HDFS unit tests is ballooning in memory size.  The easiest way to kill a 
> Linux machine is to eat all of the RAM, thanks to overcommit and that's what 
> this "feels" like.
>
> Someone should verify if 2.8.2 has the same issues before a release goes out 
> ...


FWIW, I ran 2.8.2 last night and it has the same problems.

Also: the node didn't die!  Looking through the workspace (so the next 
run will destroy them), two sets of logs stand out:

https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt

and

https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/

It looks like my hunch is correct:  RAM use in the HDFS unit tests is
going through the roof.  It's also interesting how MANY log files there are.
Is surefire not picking up that jobs are dying?  Maybe not if memory is getting
tight.

Anyway, at this point, branch-2.8 and higher are probably fubar'd.
Additionally, I've filed YETUS-561 so that Yetus-controlled Docker containers
can have their RAM limits set in order to prevent more nodes going catatonic.







--
busbey


Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Sean Busbey
Just curious, Junping, what would "solid evidence" look like? Is the
supposition here that the memory leak is within HDFS test code rather than
library runtime code? How would such a distinction be shown?

On Tue, Oct 24, 2017 at 4:06 PM, Junping Du <j...@hortonworks.com> wrote:

> Allen,
>  Do we have any solid evidence to show that the HDFS unit tests going
> through the roof are due to a serious memory leak in HDFS? Normally, I don't
> expect memory leaks to be identified in our UTs - mostly, it (test JVM gone)
> is just because of test or deployment issues.
>  Unless there is concrete evidence, my concern about a serious memory
> leak in HDFS on 2.8 is relatively low, given that some companies (Yahoo,
> Alibaba, etc.) have deployed 2.8 in large production environments for
> months. Non-serious memory leaks (like forgetting to close a stream in a
> non-critical path, etc.) and other non-critical bugs always happen here
> and there, and we have to live with them.
>
> Thanks,
>
> Junping
>
> 
> From: Allen Wittenauer <a...@effectivemachines.com>
> Sent: Tuesday, October 24, 2017 8:27 AM
> To: Hadoop Common
> Cc: Hdfs-dev; mapreduce-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>
> > On Oct 23, 2017, at 12:50 PM, Allen Wittenauer <a...@effectivemachines.com>
> > wrote:
> >
> >
> >
> > With no other information or access to go on, my current hunch is that
> > one of the HDFS unit tests is ballooning in memory size.  The easiest way
> > to kill a Linux machine is to eat all of the RAM, thanks to overcommit and
> > that’s what this “feels” like.
> >
> > Someone should verify if 2.8.2 has the same issues before a release goes
> > out …
>
>
> FWIW, I ran 2.8.2 last night and it has the same problems.
>
> Also: the node didn’t die!  Looking through the workspace (so the
> next run will destroy them), two sets of logs stand out:
>
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
>
> and
>
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/
>
> It looks like my hunch is correct:  RAM use in the HDFS unit tests is
> going through the roof.  It’s also interesting how MANY log files there
> are.  Is surefire not picking up that jobs are dying?  Maybe not if memory
> is getting tight.
>
> Anyway, at this point, branch-2.8 and higher are probably fubar’d.
> Additionally, I’ve filed YETUS-561 so that Yetus-controlled Docker
> containers can have their RAM limits set in order to prevent more nodes
> going catatonic.
>
>
>
>
>


-- 
busbey


Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Junping Du
Allen,
 Do we have any solid evidence to show that the HDFS unit tests going through
the roof are due to a serious memory leak in HDFS? Normally, I don't expect
memory leaks to be identified in our UTs - mostly, it (test JVM gone) is just
because of test or deployment issues.
 Unless there is concrete evidence, my concern about a serious memory leak in
HDFS on 2.8 is relatively low, given that some companies (Yahoo, Alibaba, etc.)
have deployed 2.8 in large production environments for months. Non-serious
memory leaks (like forgetting to close a stream in a non-critical path, etc.)
and other non-critical bugs always happen here and there, and we have to live
with them.

Thanks,

Junping


From: Allen Wittenauer <a...@effectivemachines.com>
Sent: Tuesday, October 24, 2017 8:27 AM
To: Hadoop Common
Cc: Hdfs-dev; mapreduce-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer <a...@effectivemachines.com> 
> wrote:
>
>
>
> With no other information or access to go on, my current hunch is that one of 
> the HDFS unit tests is ballooning in memory size.  The easiest way to kill a 
> Linux machine is to eat all of the RAM, thanks to overcommit and that’s what 
> this “feels” like.
>
> Someone should verify if 2.8.2 has the same issues before a release goes out …


FWIW, I ran 2.8.2 last night and it has the same problems.

Also: the node didn’t die!  Looking through the workspace (so the next 
run will destroy them), two sets of logs stand out:

https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt

and

https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/

It looks like my hunch is correct:  RAM use in the HDFS unit tests is
going through the roof.  It’s also interesting how MANY log files there are.
Is surefire not picking up that jobs are dying?  Maybe not if memory is getting
tight.

Anyway, at this point, branch-2.8 and higher are probably fubar’d.
Additionally, I’ve filed YETUS-561 so that Yetus-controlled Docker containers
can have their RAM limits set in order to prevent more nodes going catatonic.






Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Allen Wittenauer

> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer  
> wrote:
> 
> 
> 
> With no other information or access to go on, my current hunch is that one of 
> the HDFS unit tests is ballooning in memory size.  The easiest way to kill a 
> Linux machine is to eat all of the RAM, thanks to overcommit and that’s what 
> this “feels” like.
> 
> Someone should verify if 2.8.2 has the same issues before a release goes out …


FWIW, I ran 2.8.2 last night and it has the same problems.

Also: the node didn’t die!  Looking through the workspace (so the next 
run will destroy them), two sets of logs stand out:

https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt

and

https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/

It looks like my hunch is correct:  RAM use in the HDFS unit tests is
going through the roof.  It’s also interesting how MANY log files there are.
Is surefire not picking up that jobs are dying?  Maybe not if memory is getting
tight.

Anyway, at this point, branch-2.8 and higher are probably fubar’d.
Additionally, I’ve filed YETUS-561 so that Yetus-controlled Docker containers
can have their RAM limits set in order to prevent more nodes going catatonic.
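
To illustrate the idea of a container RAM limit, here is a minimal sketch
that builds a docker run command line with a hard memory cap. This is not how
YETUS-561 is implemented; it only uses the standard --memory and --memory-swap
options of docker run. The function name, image name, and 4g limit are
hypothetical.

```python
def docker_run_cmd(image, command, memory_limit="4g"):
    """Return an argv list that runs `command` in `image` under a RAM cap.

    Hypothetical helper for illustration only. Setting --memory-swap to the
    same value as --memory means the container gets no extra swap, so a
    runaway test hits the container limit instead of exhausting the host.
    """
    return [
        "docker", "run", "--rm",
        "--memory", memory_limit,
        "--memory-swap", memory_limit,
        image,
    ] + list(command)


if __name__ == "__main__":
    # "example/hadoop-build" is a placeholder image name.
    argv = docker_run_cmd("example/hadoop-build", ["mvn", "test"])
    print(" ".join(argv))
```

With a cap in place, the kernel's OOM killer acts inside the container
rather than the whole build node going unresponsive.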






Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-23 Thread Allen Wittenauer


With no other information or access to go on, my current hunch is that one of 
the HDFS unit tests is ballooning in memory size.  The easiest way to kill a 
Linux machine is to eat all of the RAM, thanks to overcommit and that’s what 
this “feels” like.

Someone should verify if 2.8.2 has the same issues before a release goes out …
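
One way to check the memory hunch above on a Linux build node is to sample
/proc/meminfo while the tests run. The sketch below parses meminfo-style text
and reports the fraction of RAM still available. MemTotal and MemAvailable
are real /proc/meminfo fields; the function names and the sample numbers are
made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Name: value" shape
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])  # values are in kB where a unit is given
    return values


def available_fraction(meminfo):
    """Fraction of total RAM the kernel estimates is still available."""
    return meminfo["MemAvailable"] / meminfo["MemTotal"]


# Made-up sample; on a real node, read the text from /proc/meminfo instead.
SAMPLE = """MemTotal:       16384000 kB
MemAvailable:    1638400 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    print(round(available_fraction(info), 2))
```

Logging this value every few seconds during the hadoop-hdfs unit tests would
show whether available memory steadily drops before a node stops responding.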


> On Oct 23, 2017, at 12:38 PM, Subramaniam V K  wrote:
> 
> Hi Allen,
> 
> I had set up the build (or intended to) in anticipation of the 2.9 release.
> Thanks for fixing the configuration!
> 
> We did face HDFS test timeouts in branch-2 when run together, but
> individually the tests pass:
> https://issues.apache.org/jira/browse/HDFS-12620
> 
> Folks in HDFS, can you please take a look at the HDFS tests in branch-2, as we
> are not able to get even a single Yetus run to complete due to multiple test
> failures/timeouts.
> 
> Thanks,
> Subru
> 
> On Mon, Oct 23, 2017 at 11:26 AM, Vrushali C  wrote:
> Hi Allen,
> 
> I have filed https://issues.apache.org/jira/browse/YARN-7380 for the
> timeline service findbugs warnings.
> 
> thanks
> Vrushali
> 
> 
> On Mon, Oct 23, 2017 at 11:14 AM, Allen Wittenauer wrote:
> 
> >
> > I’m really confused why this causes the Yahoo! QA boxes to go catatonic
> > (!?!) during the run.  As in, never come back online, probably in a kernel
> > panic. It’s pretty consistently in hadoop-hdfs, so something is going wrong
> > there… is branch-2 hdfs behaving badly?  Someone needs to run the
> > hadoop-hdfs unit tests to see what is going on.
> >
> > It’s probably worth noting that findbugs says there is a problem in the
> > timeline server hbase code.  Someone should probably verify + fix that
> > issue.
> >
> >
> >
> 





Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-23 Thread Subramaniam V K
Hi Allen,

I had set up the build (or intended to) in anticipation of the 2.9 release.
Thanks for fixing the configuration!

We did face HDFS test timeouts in branch-2 when run together, but
individually the tests pass:
https://issues.apache.org/jira/browse/HDFS-12620

Folks in HDFS, can you please take a look at the HDFS tests in branch-2, as we
are not able to get even a single Yetus run to complete due to multiple
test failures/timeouts.

Thanks,
Subru

On Mon, Oct 23, 2017 at 11:26 AM, Vrushali C 
wrote:

> Hi Allen,
>
> I have filed https://issues.apache.org/jira/browse/YARN-7380 for the
> timeline service findbugs warnings.
>
> thanks
> Vrushali
>
>
> On Mon, Oct 23, 2017 at 11:14 AM, Allen Wittenauer <a...@effectivemachines.com> wrote:
>
> >
> > I’m really confused why this causes the Yahoo! QA boxes to go catatonic
> > (!?!) during the run.  As in, never come back online, probably in a kernel
> > panic. It’s pretty consistently in hadoop-hdfs, so something is going wrong
> > there… is branch-2 hdfs behaving badly?  Someone needs to run the
> > hadoop-hdfs unit tests to see what is going on.
> >
> > It’s probably worth noting that findbugs says there is a problem in the
> > timeline server hbase code.  Someone should probably verify + fix that
> > issue.
> >
> >
> >
> >
> >
>


Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-23 Thread Vrushali C
Hi Allen,

I have filed https://issues.apache.org/jira/browse/YARN-7380 for the
timeline service findbugs warnings.

thanks
Vrushali


On Mon, Oct 23, 2017 at 11:14 AM, Allen Wittenauer  wrote:

>
> I’m really confused why this causes the Yahoo! QA boxes to go catatonic
> (!?!) during the run.  As in, never come back online, probably in a kernel
> panic. It’s pretty consistently in hadoop-hdfs, so something is going wrong
> there… is branch-2 hdfs behaving badly?  Someone needs to run the
> hadoop-hdfs unit tests to see what is going on.
>
> It’s probably worth noting that findbugs says there is a problem in the
> timeline server hbase code.  Someone should probably verify + fix that
> issue.
>
>
>
>
>


Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-23 Thread Allen Wittenauer

I’m really confused why this causes the Yahoo! QA boxes to go catatonic (!?!) 
during the run.  As in, never come back online, probably in a kernel panic. 
It’s pretty consistently in hadoop-hdfs, so something is going wrong there… is 
branch-2 hdfs behaving badly?  Someone needs to run the hadoop-hdfs unit tests 
to see what is going on.

It’s probably worth noting that findbugs says there is a problem in the 
timeline server hbase code.  Someone should probably verify + fix that issue.






Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-21 Thread Allen Wittenauer

To whoever set this up:

There was a job config problem where the Jenkins branch parameter wasn’t passed 
to Yetus.  Therefore both of these reports have been against trunk.  I’ve fixed 
this job (as well as the other jobs) to honor that parameter.  I’ve kicked off 
a new run with these changes.




> On Oct 21, 2017, at 9:58 AM, Apache Jenkins Server wrote:
> 
> For more details, see 
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/
> 
> [Oct 20, 2017 9:27:59 PM] (stevel) HADOOP-14942. DistCp#cleanup() should check whether jobFS is null.
> [Oct 21, 2017 12:19:29 AM] (subru) YARN-6871. Add additional deSelects params in
> 
> 
> 
> 
> -1 overall
> 
> 
> The following subsystems voted -1:
>     asflicense unit
> 
> 
> The following subsystems voted -1 but
> were configured to be filtered/ignored:
>     cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace
> 
> 
> The following subsystems are considered long running:
> (runtime bigger than 1h  0m  0s)
>     unit
> 
> 
> Specific tests:
> 
>    Failed junit tests :
> 
>        hadoop.hdfs.TestDFSStripedOutputStreamWithFailure100
>        hadoop.hdfs.TestReadStripedFileWithMissingBlocks
>        hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
>        hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency
>        hadoop.yarn.server.nodemanager.scheduler.TestDistributedScheduler
>        hadoop.yarn.server.resourcemanager.TestApplicationMasterService
>        hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA
>        hadoop.yarn.server.resourcemanager.TestRMHAForNodeLabels
>        hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue
>        hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
>        hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler
>        hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher
>        hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
>        hadoop.yarn.server.resourcemanager.TestRMHA
>        hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
>        hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched
>        hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
>        hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl
>        hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation
>        hadoop.yarn.server.resourcemanager.TestRMHATimelineCollectors
>        hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
>        hadoop.yarn.server.resourcemanager.TestReservationSystemWithRMHA
>        hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesSchedulerActivities
>        hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler
>        hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerLazyPreemption
>        hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerDynamicBehavior
>        hadoop.yarn.server.TestDiskFailures
> 
>    Timed out junit tests :
> 
>        org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.TestZKConfigurationStore
>        org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore
>        org.apache.hadoop.yarn.server.resourcemanager.TestLeaderElectorService
>        org.apache.hadoop.mapred.pipes.TestPipeApplication
> 
> 
>    cc:
> 
>        https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/diff-compile-cc-root.txt  [4.0K]
> 
>    javac:
> 
>        https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/diff-compile-javac-root.txt  [284K]
> 
>    checkstyle:
> 
>        https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/diff-checkstyle-root.txt  [17M]
> 
>    pylint:
> 
>        https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/diff-patch-pylint.txt  [20K]
> 
>    shellcheck:
> 
>        https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/diff-patch-shellcheck.txt  [20K]
> 
>    shelldocs:
> 
>        https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/diff-patch-shelldocs.txt  [12K]
> 
>    whitespace:
> 
>        https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/whitespace-eol.txt  [8.5M]
>        https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/whitespace-tabs.txt  [292K]
> 
>    javadoc:
> 
>        https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/diff-javadoc-javadoc-root.txt  [760K]
> 
>    unit:
> 
>        https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt  [308K]
>