>> is there an -Xmx value you would recommend to ensure the parent process
can send the NOOP to the surefire process?

There is no generic value to recommend. Each application or test suite must
find out for itself the value that prevents OOM.
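
One way to look for such a value is to record heap usage during the test
run and compare the peak with the configured maximum. The following is only
a minimal sketch: HeapUsageLogger and its methods are hypothetical names and
not part of surefire. It appends to a file rather than printing to stdout,
because, as discussed in the quoted messages below, console output from the
forked JVM may interfere with the communication with the Maven process.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not a surefire API.
final class HeapUsageLogger {
    private HeapUsageLogger() {
    }

    /** Formats used and maximum heap in MiB; the label is free text such as a test name. */
    static String formatLine(String label, long usedBytes, long maxBytes) {
        long mib = 1024L * 1024L;
        return label + " used=" + (usedBytes / mib) + "MiB max=" + (maxBytes / mib) + "MiB";
    }

    /** Appends one line with the current heap usage to the given file (not to stdout). */
    static void log(Path file, String label) throws IOException {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        String line = formatLine(label, used, rt.maxMemory()) + System.lineSeparator();
        Files.write(file, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        // Demo only: in a real test suite this would be called after each test.
        Path file = Files.createTempFile("heap-usage", ".log");
        log(file, "after-test-1");
    }
}
```

Calling log(..) after every test gives a rough picture of how close the
forked JVM gets to its -Xmx limit.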


>> I'm making a custom build of surefire-booter to workaround the initial
problem by commenting out the code to exit.

Very good step forward. Please post your findings, and then we can continue
by making a pull request on GitHub.


>> Is there a logging API which I can use instead?

Yes. It is a very low-level form of logging (sending events to the parent
process), but it will be turned into a high-level abstraction in several
days (branch surefire-1222_2).
Until then, please use this function:

encodeAndWriteToOutput()
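
On the "fixed rate" versus "fixed delay" question raised in the quoted
message of Mar 21 below, the difference can be seen with the plain JDK
scheduler. This is a sketch under stated assumptions, not surefire code:
the class name, the 100 ms period and the 350 ms stall (standing in for a
long GC pause) are invented for illustration. According to the
ScheduledExecutorService Javadoc, a task scheduled with scheduleAtFixedRate
never runs concurrently with itself, but late executions may start late;
with the JDK's ScheduledThreadPoolExecutor the overdue executions then run
back-to-back. With scheduleWithFixedDelay the gap after each run is kept.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of surefire.
final class FixedRateVsFixedDelay {
    static final long PERIOD_MS = 100;
    static final long STALL_MS = 350;

    /** Runs a periodic task whose first execution stalls; returns the start times in ms. */
    static List<Long> startTimes(boolean fixedRate) throws InterruptedException {
        List<Long> starts = Collections.synchronizedList(new ArrayList<Long>());
        ScheduledExecutorService executor = Executors.newScheduledThreadPool(2);
        long t0 = System.nanoTime();
        Runnable task = () -> {
            boolean first = starts.isEmpty();
            starts.add(TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - t0));
            if (first) {
                pause(STALL_MS); // stand-in for a long GC pause or CPU starvation
            }
        };
        if (fixedRate) {
            executor.scheduleAtFixedRate(task, 0, PERIOD_MS, TimeUnit.MILLISECONDS);
        } else {
            executor.scheduleWithFixedDelay(task, 0, PERIOD_MS, TimeUnit.MILLISECONDS);
        }
        Thread.sleep(STALL_MS + 5 * PERIOD_MS);
        executor.shutdownNow();
        executor.awaitTermination(1, TimeUnit.SECONDS);
        return new ArrayList<>(starts);
    }

    /** Smallest difference between two consecutive start times. */
    static long smallestGap(List<Long> starts) {
        long min = Long.MAX_VALUE;
        for (int i = 1; i < starts.size(); i++) {
            min = Math.min(min, starts.get(i) - starts.get(i - 1));
        }
        return min;
    }

    private static void pause(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("fixed rate  starts (ms): " + startTimes(true));
        System.out.println("fixed delay starts (ms): " + startTimes(false));
    }
}
```

If a ping check behaves like this task, two catch-up runs at a fixed rate
can follow each other much sooner than one period apart, so the second run
may see the flag already reset by the first. Whether this is what happens
in ForkedBooter is an assumption that still has to be verified.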




On Fri, Mar 22, 2019 at 4:44 PM Jason Young <jason.yo...@procentive.com>
wrote:

> To clarify, this is the image ours is based on:
> https://hub.docker.com/_/alpine; we are not using a vanilla Maven image as
> we have to add a few other items to this image as well. I don't know who
> maintains that or adds /etc/mavenrc; I'll bring that up in another topic in
> another forum. I was just answering a previous question, and also throwing
> out info for anyone struggling with e.g. OOME in their Maven process (a lot
> of people have that problem when running the Sonar scanner, for example). I
> don't think we need to worry about why that file is added for this topic.
>
> But while we're on the subject, is there an -Xmx value you would recommend
> to ensure the parent process can send the NOOP to the surefire process?
>
> I'm making a custom build of surefire-booter to workaround the initial
> problem by commenting out the code to exit. I am also adding logging via
> stdout so I can see if ForkedBooter is indeed running the ping job in small
> increments of time. Is there a logging API which I can use instead?
>
> On Fri, Mar 22, 2019 at 5:44 AM Tibor Digana <tibordig...@apache.org>
> wrote:
>
> > The base images are developed in https://github.com/carlossg/docker-maven,
> > right?
> > Who creates "/etc/mavenrc"?
> >
> > On Thu, Mar 21, 2019 at 12:05 AM Jason Young <jason.yo...@procentive.com>
> > wrote:
> >
> > > Mikael, sorry I do not appear to have permission to view the link.
> > >
> > > I did some digging in the last couple of days. I see that the parent
> > > process reads from stdin. I could not find anywhere that we are using
> > > stdin. FWIW the failures nearly always happen at least 15m into a ~20m
> > > test run, so perf is a likely culprit.
> > >
> > > I see also that ForkedBooter reads commands from stdin in one thread, and
> > > uses an executor service to check for a past ping in
> > > ForkedBooter.listenToShutdownCommands(..). When it checks, it also sets
> > > pingDone to false. The executor is configured to run up to 2 threads
> > > concurrently to handle the workload, and is set to run at a fixed rate
> > > (not a fixed delay). If the test suite is busy with testing and GC and
> > > has lots of threads running, it's entirely possible that a thread won't
> > > have a chance to run for a long time (e.g. 5s). Maybe instead of a 30s
> > > delay, the VM gets around to checking for a ping every 35s over a long
> > > span of time. Because we're running at a "fixed rate" and not a "fixed
> > > delay", then after a couple of minutes we might be a full 30s behind
> > > schedule. It's possible the executor will create another thread to run
> > > the scheduled task because it's running behind schedule. This new thread
> > > checks for a ping, finds it, and sets pingDone to false. But then the
> > > original thread also runs, say, 2 seconds afterwards, checks pingDone,
> > > and finds it is false.
> > >
> > > So to mitigate the problem, can we a) make the executor run only 1 thread
> > > and b) schedule the task at a fixed delay instead of a fixed rate? For
> > > that matter, is there another scheduled executor we can reuse? I
> > > understand why checking for ping requires a separate executor. Should I
> > > ask in github?
> > >
> > > Regarding a previous question, I found out that Alpine's Maven package
> > > comes with an /etc/mavenrc that sets `MAVEN_OPTS="$MAVEN_OPTS -Xmx512m"`
> > > which cannot be undone by setting `MAVEN_OPTS` at the command line; you
> > > end up with e.g. `-Xmx1g -Xmx512m`. (Note this applies to the Maven
> > > (parent) process, not the surefire/failsafe (child) process.)
> > >
> > > On Wed, Mar 20, 2019 at 3:46 AM Bernd Eckenfels <e...@zusammenkunft.net>
> > > wrote:
> > >
> > > > I guess a timeout caused by FullGC can happen with TCP as well.
> > > > Increasing the timeout might not be nice but does look like it would
> > > > help in both cases. (Problems with stdout are more related to
> > > > unexpected JVM messages I guess)
> > > >
> > > > Regards
> > > > Bernd
> > > > --
> > > > http://bernd.eckenfels.net
> > > >
> > > > ________________________________
> > > > From: Mikael Åsberg <m.asberg.wa...@gmail.com>
> > > > Sent: Wednesday, March 20, 2019 9:40 AM
> > > > To: Maven Users List
> > > > Subject: Re: Failsafe: Killing self fork JVM. PING timeout elapsed.
> > > >
> > > > These issues regarding communication with forked JVMs, won't they be
> > > > resolved once surefire moves to interprocess communication using
> > > > tcp/ip sockets? This happens to be the target feature to be included
> > > > in the next surefire 3.0.0 milestone:
> > > > https://issues.apache.org/jira/projects/SUREFIRE/versions/12344668
> > > >
> > > > There are soooo many issues relating to surefire reading stdout of
> > > > forked processes (which is my understanding that it is currently
> > > > doing). Many of us are really looking forward to the next milestone.
> > > >
> > > > On Tue, Mar 19, 2019 at 8:59 PM Jason Young <jason.yo...@procentive.com>
> > > > wrote:
> > > > >
> > > > > Getting back to my original questions, I know that "ping" means to
> > > > > see if a process is there, and "NOOP" implies it's not a command to
> > > > > do anything. But what do the terms "ping" and "NOOP" mean in this
> > > > > context, i.e. how do the processes communicate? I assume they don't
> > > > > sonar. Do other processes also ping NOOPs? Can I PING Chrome with a
> > > > > NOOP from bash? Is it with TCP?
> > > > >
> > > > > I'm confused about what I should do regarding GC pauses. Previously I
> > > > > had code that would write the amount of remaining heap space (or
> > > > > something like that) to stdout after every test to troubleshoot
> > > > > OOMEs. Can writing to stdout cause the communication failure somehow?
> > > > >
> > > > > On Wed, Mar 13, 2019 at 5:57 PM Jason Young <jason.yo...@procentive.com>
> > > > > wrote:
> > > > >
> > > > > > I upgraded failsafe and surefire to 3.0.0-M3 as advised; we
> > > > > > encountered the same exception. (Still using -Xmx5g, will switch to
> > > > > > OpenJ9 soon in case that helps.)
> > > > > >
> > > > > > BTW I also asked on StackOverflow previously, for anyone interested:
> > > > > > https://stackoverflow.com/questions/54755846/killing-self-fork-jvm-ping-timeout-elapsed
> > > > > >
> > > > > > On Tue, Feb 26, 2019 at 6:40 PM Jason Young <jason.yo...@procentive.com>
> > > > > > wrote:
> > > > > >
> > > > > >> Thanks again for the information.
> > > > > >>
> > > > > >> We had increased the RAM to 3g some time ago to prevent OOMEs. More
> > > > > >> recently, I increased the RAM again to 5g for extra headroom since
> > > > > >> we had more headroom available; the problem hasn't happened since,
> > > > > >> but it hasn't been very long.
> > > > > >>
> > > > > >> We use a more customized image based on Alpine 3.8.2. The JDK and
> > > > > >> Maven are obtained via apk.
> > > > > >>
> > > > > >> I will try upgrading failsafe (and surefire while I'm at it) sooner,
> > > > > >> and probably do some experimentation with JVMs another time (not
> > > > > >> pressing for me ATM).
> > > > > >>
> > > > > >> On Tue, Feb 26, 2019 at 12:20 PM Tibor Digana <tibordig...@apache.org>
> > > > > >> wrote:
> > > > > >>
> > > > > >>> >> I'll try to enable some logging about GC pauses to see what's up
> > > > > >>>
> > > > > >>> Please do not keep such a setting after tuning the GC, because it
> > > > > >>> may sometimes break the interprocess communication between the
> > > > > >>> Maven process and the surefire process.
> > > > > >>> It is worth writing the GC information to a file and not to the
> > > > > >>> console logs. This can be configured, I guess.
> > > > > >>>
> > > > > >>> >> Do you think the value is simply too low?
> > > > > >>>
> > > > > >>> GCing many objects may take some time, and I remember we had a user
> > > > > >>> who had this problem a year or two ago.
> > > > > >>> As a fix, we check every third NOOP (which is 3 x 10 sec) instead
> > > > > >>> of every NOOP. So 30 seconds looked satisfactory.
> > > > > >>> I think you use the old version 2.20 or something like that. The
> > > > > >>> fixes for docker have been made in the meantime, so please use the
> > > > > >>> latest version 3.0.0-M3.
> > > > > >>> See this page
> > > > > >>> https://maven.apache.org/surefire/maven-surefire-plugin/docker.html,
> > > > > >>> we used maven:3.5.3-jdk-8-alpine in this test. Which base image did
> > > > > >>> you use?
> > > > > >>>
> > > > > >>> Cheers
> > > > > >>> Tibor
> > > > > >>>
> > > > > >>> On Tue, Feb 26, 2019 at 5:24 PM Jason Young <jason.yo...@procentive.com>
> > > > > >>> wrote:
> > > > > >>>
> > > > > >>> > Thanks for the information. It's good to see someone understands
> > > > > >>> > a little about this.
> > > > > >>> >
> > > > > >>> > Incidentally, we have been looking at other GCs and VMs for the
> > > > > >>> > application in production environments, so I'll look into how
> > > > > >>> > these affect tests as well. I'll try to enable some logging about
> > > > > >>> > GC pauses to see what's up.
> > > > > >>> >
> > > > > >>> > How would `-Xmx3g` cause long GC cycles? Do you think the value
> > > > > >>> > is simply too low?
> > > > > >>> >
> > > > > >>> > FWIW we're running the Maven build in an Alpine-based Docker
> > > > > >>> > container.
> > > > > >>> >
> > > > > >>> > On Sat, Feb 23, 2019 at 6:36 AM Tibor Digana <tibordig...@apache.org>
> > > > > >>> > wrote:
> > > > > >>> >
> > > > > >>> > > Hi Jason,
> > > > > >>> > >
> > > > > >>> > > We spoke about this issue on our chat in ASF Slack:
> > > > > >>> > > "I think his tests have been paused for long GC periods and
> > > > > >>> > > timed out after 3x the PING period = 30 seconds. After this
> > > > > >>> > > period the forked JVM assumed that the Maven process had been
> > > > > >>> > > killed by JenkinsCI, and therefore all surefire processes are
> > > > > >>> > > killed as well and all the file handles and memory consumption
> > > > > >>> > > are freed."
> > > > > >>> > >
> > > > > >>> > > "But I have to say that `-Xmx3g` may cause long GC cycles, see
> > > > > >>> > > https://maven.apache.org/surefire/maven-surefire-plugin/examples/shutdown.html
> > > > > >>> > > "
> > > > > >>> > >
> > > > > >>> > > You are using java-1.8-openjdk. I guess you should use the
> > > > > >>> > > Shenandoah GC, which is an experimental algorithm in JVM 1.8.
> > > > > >>> > > This would significantly shorten the GC cycles.
> > > > > >>> > >
> > > > > >>> > > We should of course provide a new configuration parameter to
> > > > > >>> > > give you a chance to prolong the PING.
> > > > > >>> > >
> > > > > >>> > > Cheers
> > > > > >>> > > Tibor
> > > > > >>> > >
> > > > > >>> >
> > > > > >>> >
> > > > > >>> > --
> > > > > >>> >
> > > > > >>> > Jason Young
> > > > > >>> >
> > > > > >>>
> > > > > >>
> > > > > >>
> > > > >
> > > > > --
> > > > > Jason Young
> > > > > Software Engineer | PROCENTIVE
> > > >
> > > >
> > > >
> > >
> >
>
