Pls check this commit.
It is the algorithm
https://github.com/apache/maven-surefire/pull/231/commits/112f3e4348f8ed731182064561604ff0fa9ca30c
which avoids killing JVM when GC has paused the JVM for longer than 15
seconds.

On Sat, Apr 20, 2019 at 5:27 PM Tibor Digana <tibordig...@apache.org> wrote:

> Hey Jason,
>
> Here is the PR I promissed
> https://github.com/apache/maven-surefire/pull/231
> I will think about more tolerant algorithm.
> Feel free to help us in this PR on GitHub.
>
> Cheers
> Tibor
>
>
> On Wed, Apr 10, 2019 at 3:18 AM Jason Young <jason.yo...@procentive.com>
> wrote:
>
>> Thanks.
>>
>> I finally took a little time to look at why the `ps` command would not
>> work. For our project, we cannot use a default Maven image, so I made a
>> custom one based on alpine. One piece of software the Maven alpine image
>> has is `pcprops`; adding that fixes the problem:
>>
>> $ docker run -it alpine sh
>> > / # apk update
>> > ...
>> > OK: 8445 distinct packages available
>> > / # apk upgrade
>> > ...
>> > OK: 4 MiB in 13 packages
>> > / # ps -o etime= -p 1
>> > ps: unrecognized option: p
>> > BusyBox v1.26.2 (2018-05-30 13:51:20 GMT) multi-call binary.
>> > Usage: ps [-o COL1,COL2=HEADER]
>> > Show list of processes
>> > -o COL1,COL2=HEADER Select columns for display
>> > / # apk add procps
>> > ...
>> > OK: 12 MiB in 19 packages
>> > / # ps -o etime= -p 1
>> >       05:21
>>
>>
>> Hope that helps anyone else using a custom-built alpine-based Maven image.
>>
>> I'm not sure if I'll have time to experiment much with Surefire changes in
>> the near future; my focus at the moment is figuring out all the reasons
>> our
>> test suite sometimes takes 5 or more times as long as normal (often also
>> causes Surefire to kill the child process, but even if it didn't, we have
>> a
>> problem!).
>>
>> On Sun, Apr 7, 2019 at 2:04 AM Tibor Digana <tibordig...@apache.org>
>> wrote:
>>
>> > Hi Jason,
>> >
>> > I am sorry I haven't reply.
>> > Excellent analysis!
>> >
>> > Can you make an experiment and use pingJob on a "fixed delay" instead
>> of a
>> > "fixed rate"?
>> > If you want to make more, feel free.
>> >
>> > We should mabe think about how we would add more intelligence in these
>> > timers in order to detect the "blackout".
>> >
>> > Cheers
>> > Tibor
>> >
>> >
>> >
>> >
>> > On Fri, Mar 22, 2019 at 9:53 PM Jason Young <jason.yo...@procentive.com
>> >
>> > wrote:
>> >
>> > > Notice:
>> > > * an over 3-minute gap between the first two entries above--neither
>> the
>> > > command reader thread nor the pingJob had a chance to run.
>> > > * the command reader thread and the ping job executor evidently get a
>> > > chance to run from 19:18:02 to 19:19:19. They ran with increased
>> > frequency
>> > > during that time to "catch up": The command reader thread is
>> programmed
>> > to
>> > > read from stdin until there's nothing left to read, and the
>> pingScheduler
>> > > runs the pingJob at a "fixed" rate.
>> > > * Another 3-minute gap and another attempt to catch up at 19:22:28.
>> This
>> > > time though, the command reader thread evidently did not get a turn in
>> > > time.
>> > >
>> > > I did not bother to log what threads the ping job was running under; I
>> > can
>> > > do that later. I also did not add any logging on the Maven process
>> side;
>> > I
>> > > can do that as well.
>> > >
>> > > This is on a VM with unthrottled access to 4 processor cores and 8GB
>> > RAM. I
>> > > think the code under test is CPU- and RAM-intensive. There probably
>> were
>> > GC
>> > > pauses, but I don't have proper logging for that in place.
>> > >
>> > > Note that this is all very normal behavior: Both the command reader
>> and
>> > the
>> > > ping job's thread (run by the pingScheduler) are instructed to run
>> until
>> > > their work is done, not to sleep or yield. For high throughput, an OS
>> > will
>> > > favor continuing to run a thread for a while once it is started over
>> > > switching tasks, so it's normal both that each thread would run for a
>> > while
>> > > after a pause, and (by extension) that each would have long pauses as
>> > other
>> > > threads had to run, too. Because of the long gaps between runs of the
>> > > command reader thread, my previous advice (schedule the pingJob on a
>> > "fixed
>> > > delay" instead of a "fixed rate") would not have saved the build in
>> this
>> > > case (though "fixed delay" is more correct and would save the build in
>> > > other possible cases). Our code base has some unit tests that rely on
>> > > "normal case" timing, but they now have the @Ignore attribute because
>> > they
>> > > would rarely, but too often, fail. For most teams and projects, unit
>> > tests
>> > > have to be more reliable than that.
>> > >
>> > > So I think we need a solution that does not rely on preemptive
>> scheduling
>> > > from the OS or on thread scheduling, but which ensures that commnads
>> have
>> > > been read before running code depending on commands. There are
>> multiple
>> > > ways to do that; maybe I can find time to write a solution later.
>> > >
>> > >
>> > > On Fri, Mar 22, 2019 at 3:46 PM Jason Young <
>> jason.yo...@procentive.com>
>> > > wrote:
>> > >
>> > > > Thanks Tibor. I'll use `encodeAndWriteToOutput` later. For now, I
>> > hacked
>> > > > together some temporary changes to ForkedBooter so that it "logs"
>> (just
>> > > > `System.out.println`) the current time every time:
>> > > >
>> > > >    - the command reader thread reads a NOOP via stdin,
>> > > >    - it starts the ping job,
>> > > >    - it checks for NOOP (`canUseNewPingMechanism(..)` returns
>> `false`),
>> > > >    and
>> > > >    - it would have exited because there is no NOOP (`!pingDone`).
>> > > >
>> > > > I also commented out the code to exit in that case so it keeps
>> running
>> > > and
>> > > > logging.
>> > > >
>> > > > The resulting output starts out showing that it's receiving ping
>> every
>> > > 10s
>> > > > and checking for NOOP every 30s, as expected. But then one large
>> test
>> > > > begins, and:
>> > > >
>> > > > DEBUG 2019/03/22 19:14:40.223 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:18:02.723 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:18:03.139 GMT ForkedBooter: Running ping job
>> (i.e.
>> > > > checking if we read a ping earlier).
>> > > > DEBUG 2019/03/22 19:18:03.172 GMT ForkedBooter: Checking for NOOP
>> from
>> > > > Maven process.
>> > > > DEBUG 2019/03/22 19:18:03.186 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:18:03.187 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:18:03.187 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:18:03.187 GMT ForkedBooter: Running ping job
>> (i.e.
>> > > > checking if we read a ping earlier).
>> > > > DEBUG 2019/03/22 19:18:03.188 GMT ForkedBooter: Checking for NOOP
>> from
>> > > > Maven process.
>> > > > DEBUG 2019/03/22 19:18:03.187 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:18:03.243 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:18:03.307 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:18:03.308 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:18:03.202 GMT ForkedBooter: Running ping job
>> (i.e.
>> > > > checking if we read a ping earlier).
>> > > > DEBUG 2019/03/22 19:18:03.323 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:18:03.337 GMT ForkedBooter: Checking for NOOP
>> from
>> > > > Maven process.
>> > > > DEBUG 2019/03/22 19:18:03.351 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:18:03.351 GMT ForkedBooter: Running ping job
>> (i.e.
>> > > > checking if we read a ping earlier).
>> > > > DEBUG 2019/03/22 19:18:03.351 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:18:03.352 GMT ForkedBooter: Checking for NOOP
>> from
>> > > > Maven process.
>> > > > DEBUG 2019/03/22 19:18:03.366 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:18:03.393 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:18:03.380 GMT ForkedBooter: Running ping job
>> (i.e.
>> > > > checking if we read a ping earlier).
>> > > > DEBUG 2019/03/22 19:18:03.409 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:18:03.424 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:18:03.423 GMT ForkedBooter: Checking for NOOP
>> from
>> > > > Maven process.
>> > > > DEBUG 2019/03/22 19:18:03.424 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:18:03.438 GMT ForkedBooter: Running ping job
>> (i.e.
>> > > > checking if we read a ping earlier).
>> > > > DEBUG 2019/03/22 19:18:03.438 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:18:03.453 GMT ForkedBooter: Checking for NOOP
>> from
>> > > > Maven process.
>> > > > DEBUG 2019/03/22 19:18:03.454 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:18:03.476 GMT ForkedBooter: Running ping job
>> (i.e.
>> > > > checking if we read a ping earlier).
>> > > > DEBUG 2019/03/22 19:18:03.476 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:18:03.505 GMT ForkedBooter: Checking for NOOP
>> from
>> > > > Maven process.
>> > > > DEBUG 2019/03/22 19:18:03.505 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:18:08.872 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:18:18.859 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:18:19.282 GMT ForkedBooter: Running ping job
>> (i.e.
>> > > > checking if we read a ping earlier).
>> > > > DEBUG 2019/03/22 19:18:19.330 GMT ForkedBooter: Checking for NOOP
>> from
>> > > > Maven process.
>> > > > DEBUG 2019/03/22 19:18:28.845 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:18:38.832 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:18:48.838 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:18:49.241 GMT ForkedBooter: Running ping job
>> (i.e.
>> > > > checking if we read a ping earlier).
>> > > > DEBUG 2019/03/22 19:18:49.265 GMT ForkedBooter: Checking for NOOP
>> from
>> > > > Maven process.
>> > > > DEBUG 2019/03/22 19:19:01.162 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:19:10.069 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:19:19.136 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:19:19.199 GMT ForkedBooter: Running ping job
>> (i.e.
>> > > > checking if we read a ping earlier).
>> > > > DEBUG 2019/03/22 19:19:19.199 GMT ForkedBooter: Checking for NOOP
>> from
>> > > > Maven process.
>> > > > DEBUG 2019/03/22 19:22:28.698 GMT ForkedBooter: Running ping job
>> (i.e.
>> > > > checking if we read a ping earlier).
>> > > > DEBUG 2019/03/22 19:22:28.698 GMT ForkedBooter: Checking for NOOP
>> from
>> > > > Maven process.
>> > > > WARN 2019/03/22 19:22:28.698 GMT ForkedBooter: Did not receive NOOP
>> > from
>> > > > Maven process.
>> > > > DEBUG 2019/03/22 19:22:28.698 GMT ForkedBooter: Running ping job
>> (i.e.
>> > > > checking if we read a ping earlier).
>> > > > DEBUG 2019/03/22 19:22:28.698 GMT ForkedBooter: Checking for NOOP
>> from
>> > > > Maven process.
>> > > > WARN 2019/03/22 19:22:28.698 GMT ForkedBooter: Did not receive NOOP
>> > from
>> > > > Maven process.
>> > > > DEBUG 2019/03/22 19:22:28.699 GMT ForkedBooter: Running ping job
>> (i.e.
>> > > > checking if we read a ping earlier).
>> > > > DEBUG 2019/03/22 19:22:28.699 GMT ForkedBooter: Checking for NOOP
>> from
>> > > > Maven process.
>> > > > WARN 2019/03/22 19:22:28.699 GMT ForkedBooter: Did not receive NOOP
>> > from
>> > > > Maven process.
>> > > > DEBUG 2019/03/22 19:22:28.699 GMT ForkedBooter: Running ping job
>> (i.e.
>> > > > checking if we read a ping earlier).
>> > > > DEBUG 2019/03/22 19:22:28.699 GMT ForkedBooter: Checking for NOOP
>> from
>> > > > Maven process.
>> > > > WARN 2019/03/22 19:22:28.699 GMT ForkedBooter: Did not receive NOOP
>> > from
>> > > > Maven process.
>> > > > DEBUG 2019/03/22 19:22:28.699 GMT ForkedBooter: Running ping job
>> (i.e.
>> > > > checking if we read a ping earlier).
>> > > > DEBUG 2019/03/22 19:22:28.699 GMT ForkedBooter: Checking for NOOP
>> from
>> > > > Maven process.
>> > > > WARN 2019/03/22 19:22:28.699 GMT ForkedBooter: Did not receive NOOP
>> > from
>> > > > Maven process.
>> > > > DEBUG 2019/03/22 19:22:28.699 GMT ForkedBooter: Running ping job
>> (i.e.
>> > > > checking if we read a ping earlier).
>> > > > DEBUG 2019/03/22 19:22:28.699 GMT ForkedBooter: Checking for NOOP
>> from
>> > > > Maven process.
>> > > > WARN 2019/03/22 19:22:28.699 GMT ForkedBooter: Did not receive NOOP
>> > from
>> > > > Maven process.
>> > > > DEBUG 2019/03/22 19:22:28.703 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:22:28.703 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:22:28.703 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:22:28.703 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:22:28.703 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:22:28.703 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:22:28.703 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:22:28.703 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:22:28.703 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:22:28.703 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:22:28.703 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:22:28.704 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:22:28.704 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:22:28.704 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:22:28.704 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:22:28.704 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:22:28.704 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:22:28.704 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > DEBUG 2019/03/22 19:22:28.704 GMT ForkedBooter: Received ping from
>> > Maven
>> > > > process.
>> > > > ...
>> > > >
>> > > > With these changes, a project which normally takes 15 to 20 minutes
>> to
>> > > run
>> > > > actually took 80m. Would printing directly to stdout from within a
>> test
>> > > do
>> > > > that somehow? If not, we have a problem in our code.
>> > > >
>> > > > My comments on the above report to follow...
>> > > >
>> > > > On Fri, Mar 22, 2019 at 11:58 AM Tibor Digana <
>> tibordig...@apache.org>
>> > > > wrote:
>> > > >
>> > > >> >> is there an -Xmx value you would recommend to ensure the parent
>> > > process
>> > > >> can send the NOOP to the surefire process?
>> > > >>
>> > > >> no such value is generic value to recommend. Each application or
>> test
>> > > must
>> > > >> find our the value to prevent from OOM.
>> > > >>
>> > > >>
>> > > >> >> I'm making a custom build of surefire-booter to workaround the
>> > > initial
>> > > >> problem by commenting out the code to exit.
>> > > >>
>> > > >> Very good step forward. Then post your findings and we can
>> continue on
>> > > >> making a pull request on GitHub.
>> > > >>
>> > > >>
>> > > >> >> Is there a logging API which I can use instead?
>> > > >>
>> > > >> Yes, it is very low level of logging (sending events to parent
>> > process)
>> > > >> but
>> > > >> this will be turned to high level abstraction in several days
>> (branch
>> > > >> surefire-1222_2).
>> > > >> Yet, pls use this function:
>> > > >>
>> > > >> encodeAndWriteToOutput()
>> > > >>
>> > > >>
>> > > >> >>
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >> On Fri, Mar 22, 2019 at 4:44 PM Jason Young <
>> > jason.yo...@procentive.com
>> > > >
>> > > >> wrote:
>> > > >>
>> > > >> > To clarify, this is the image ours is based on:
>> > > >> > https://hub.docker.com/_/alpine; we are not using a vanilla
>> Maven
>> > > >> image as
>> > > >> > we have to add a few other items to this image as well. I don't
>> know
>> > > who
>> > > >> > maintains that or adds /etc/mavenrc; I'll bring that up in
>> another
>> > > >> topic in
>> > > >> > another forum. I was just answering a previous question, and also
>> > > >> throwing
>> > > >> > out info for anyone struggling with e.g. OOME in their Maven
>> process
>> > > (a
>> > > >> lot
>> > > >> > of people have that problem when running the Sonar scanner, for
>> > > >> example). I
>> > > >> > don't think we need to worry about why that file is added for
>> this
>> > > >> topic.
>> > > >> >
>> > > >> > But while we're on the subject, is there an -Xmx value you would
>> > > >> recommend
>> > > >> > to ensure the parent process can send the NOOP to the surefire
>> > > process?
>> > > >> >
>> > > >> > I'm making a custom build of surefire-booter to workaround the
>> > initial
>> > > >> > problem by commenting out the code to exit. I am also adding
>> logging
>> > > via
>> > > >> > stdout so I can see if ForkedBooter is indeed running the ping
>> job
>> > in
>> > > >> small
>> > > >> > increments of time. Is there a logging API which I can use
>> instead?
>> > > >> >
>> > > >> > On Fri, Mar 22, 2019 at 5:44 AM Tibor Digana <
>> > tibordig...@apache.org>
>> > > >> > wrote:
>> > > >> >
>> > > >> > > The base images are developed in
>> > > >> > https://github.com/carlossg/docker-maven,
>> > > >> > > right?
>> > > >> > > Who creates "/etc/mavenrc"?
>> > > >> > >
>> > > >> > > On Thu, Mar 21, 2019 at 12:05 AM Jason Young <
>> > > >> jason.yo...@procentive.com
>> > > >> > >
>> > > >> > > wrote:
>> > > >> > >
>> > > >> > > > Mikael, sorry I do not appear to have permission to view the
>> > link.
>> > > >> > > >
>> > > >> > > > I did some digging in the last couple of days. I see that the
>> > > parent
>> > > >> > > > process reads from stdin. I could not find anywhere that we
>> are
>> > > >> using
>> > > >> > > > stdin. FWIW the failures nearly always happen at least 15m
>> into
>> > a
>> > > >> ~20m
>> > > >> > > test
>> > > >> > > > run, so perf is a likely culprit.
>> > > >> > > >
>> > > >> > > > I see also that ForkedBooter reads commands from stdin in one
>> > > >> thread,
>> > > >> > and
>> > > >> > > > uses an executor service to check for a past ping in
>> > > >> > > > ForkedBooter.listenToShutdownCommands(..). When it checks, it
>> > also
>> > > >> sets
>> > > >> > > > pingDone to false. The executor is configured to run up to 2
>> > > threads
>> > > >> > > > concurrently to handle the workload, and is set to run at a
>> > fixed
>> > > >> rate
>> > > >> > > (not
>> > > >> > > > a fixed delay). If the test suite is busy with testing and GC
>> > and
>> > > >> has
>> > > >> > > lots
>> > > >> > > > of threads running, it's entirely possible that a thread
>> won't
>> > > have
>> > > >> a
>> > > >> > > > chance to run for a long time (e.g. 5s). Maybe instead of a
>> 30s
>> > > >> delay,
>> > > >> > > the
>> > > >> > > > VM gets around to checking for a ping every 35s over a long
>> span
>> > > of
>> > > >> > time.
>> > > >> > > > Because we're running at a "fixed rate" and not a "fixed
>> delay",
>> > > >> then
>> > > >> > > after
>> > > >> > > > a couple of minutes we might be a full 30s behind schedule.
>> It's
>> > > >> > possible
>> > > >> > > > the executor will create another thread to run the scheduled
>> > task
>> > > >> > because
>> > > >> > > > it's running behind schedule. This new thread checks for a
>> ping,
>> > > >> finds
>> > > >> > > it,
>> > > >> > > > and sets pingDone to false. But then the original thread also
>> > > runs,
>> > > >> > say,
>> > > >> > > 2
>> > > >> > > > seconds afterwards, checks pingDone, and finds it is false.
>> > > >> > > >
>> > > >> > > > So to mitigate the problem, can we a) make the executor run
>> > only 1
>> > > >> > thread
>> > > >> > > > and b) schedule the task at a fixed rate? For that matter, is
>> > > there
>> > > >> > > another
>> > > >> > > > scheduled executor we can reuse? I understand why checking
>> for
>> > > ping
>> > > >> > > > requires a separate executor. Should I ask in github?
>> > > >> > > >
>> > > >> > > > Regarding a previous question, I found out that Alpine's
>> Maven
>> > > >> package
>> > > >> > > > comes with an /etc/mavenrc that sets `MAVEN_OPTS="$MAVEN_OPTS
>> > > >> > -Xmx512m"`
>> > > >> > > > which cannot be undone by setting `MAVEN_OPTS` at the command
>> > > line;
>> > > >> you
>> > > >> > > end
>> > > >> > > > up with e.g. `-Xmx1g -Xmx512m`. (Note this applies to the
>> Maven
>> > > >> > (parent)
>> > > >> > > > process, not the surefire/failsafe (child) process.)
>> > > >> > > >
>> > > >> > > > On Wed, Mar 20, 2019 at 3:46 AM Bernd Eckenfels <
>> > > >> > e...@zusammenkunft.net>
>> > > >> > > > wrote:
>> > > >> > > >
>> > > >> > > > > I guess a timeout caused by FullGC can happen with TCP as
>> > well.
>> > > >> > > > Increasing
>> > > >> > > > > the timeout might not be nice but does look like it would
>> help
>> > > in
>> > > >> > both
>> > > >> > > > > cases. (Problems with stdout are more related to unexpected
>> > JVM
>> > > >> > > messages
>> > > >> > > > I
>> > > >> > > > > guess)
>> > > >> > > > >
>> > > >> > > > > Gruss
>> > > >> > > > > Bernd
>> > > >> > > > > --
>> > > >> > > > > http://bernd.eckenfels.net
>> > > >> > > > >
>> > > >> > > > > ________________________________
>> > > >> > > > > Von: Mikael Åsberg <m.asberg.wa...@gmail.com>
>> > > >> > > > > Gesendet: Mittwoch, März 20, 2019 9:40 AM
>> > > >> > > > > An: Maven Users List
>> > > >> > > > > Betreff: Re: Failsafe: Killing self fork JVM. PING timeout
>> > > >> elapsed.
>> > > >> > > > >
>> > > >> > > > > These issues regarding communication with forked JVMs,
>> won't
>> > > they
>> > > >> be
>> > > >> > > > > resolved once surefire moves to interprocess communication
>> > using
>> > > >> > > > > tcp/ip sockets? This happens to be the target feature to be
>> > > >> included
>> > > >> > > > > in the next surefire 3.0.0 milestone:
>> > > >> > > > >
>> > > >> https://issues.apache.org/jira/projects/SUREFIRE/versions/12344668
>> > > >> > > > >
>> > > >> > > > > There are soooo many issues relating to surefire reading
>> > stdout
>> > > of
>> > > >> > > > > forked processes (which is my understanding that it is
>> > currently
>> > > >> > > > > doing). Many of us are really looking forward to the next
>> > > >> milestone.
>> > > >> > > > >
>> > > >> > > > > On Tue, Mar 19, 2019 at 8:59 PM Jason Young <
>> > > >> > > jason.yo...@procentive.com>
>> > > >> > > > > wrote:
>> > > >> > > > > >
>> > > >> > > > > > Getting back to my original questions, I know that "ping"
>> > > means
>> > > >> to
>> > > >> > > see
>> > > >> > > > > if a
>> > > >> > > > > > process is there, and "NOOP" implies it's not a command
>> to
>> > do
>> > > >> > > anything.
>> > > >> > > > > But
>> > > >> > > > > > what do the terms "ping" and "NOOP" mean in this context,
>> > i.e.
>> > > >> how
>> > > >> > do
>> > > >> > > > the
>> > > >> > > > > > processes communicate? I assume they don't sonar. Do
>> other
>> > > >> > processes
>> > > >> > > > also
>> > > >> > > > > > ping NOOPs? Can I PING Chrome with a NOOP from bash? Is
>> it
>> > > with
>> > > >> > TCP?
>> > > >> > > > > >
>> > > >> > > > > > I'm confused about what I should do regarding GC pauses.
>> > > >> > Previously I
>> > > >> > > > had
>> > > >> > > > > > code that would write the amount of remaining heap space
>> (or
>> > > >> > > something
>> > > >> > > > > like
>> > > >> > > > > > that) to stdout after every test to troubleshoot OOMEs.
>> Can
>> > > >> writing
>> > > >> > > to
>> > > >> > > > > > stdout cause the communication failure somehow?
>> > > >> > > > > >
>> > > >> > > > > > On Wed, Mar 13, 2019 at 5:57 PM Jason Young <
>> > > >> > > > jason.yo...@procentive.com>
>> > > >> > > > > > wrote:
>> > > >> > > > > >
>> > > >> > > > > > > I upgraded failsafe and surefire to 3.0.0-M3 as
>> advised;
>> > we
>> > > >> > > > encountered
>> > > >> > > > > > > the same exception. (Still using -Xmx5g, will switch to
>> > > OpenJ9
>> > > >> > soon
>> > > >> > > > in
>> > > >> > > > > case
>> > > >> > > > > > > that helps.)
>> > > >> > > > > > >
>> > > >> > > > > > > BTW I also asked on StackOverflow previously, for
>> anyone
>> > > >> > > interested:
>> > > >> > > > > > >
>> > > >> > > > >
>> > > >> > > >
>> > > >> > >
>> > > >> >
>> > > >>
>> > >
>> >
>> https://stackoverflow.com/questions/54755846/killing-self-fork-jvm-ping-timeout-elapsed
>> > > >> > > > > > >
>> > > >> > > > > > > On Tue, Feb 26, 2019 at 6:40 PM Jason Young <
>> > > >> > > > > jason.yo...@procentive.com>
>> > > >> > > > > > > wrote:
>> > > >> > > > > > >
>> > > >> > > > > > >> Thanks again for the information.
>> > > >> > > > > > >>
>> > > >> > > > > > >> We had increased the RAM to 3g some time ago to
>> prevent
>> > > >> OOMEs.
>> > > >> > > More
>> > > >> > > > > > >> recently, I increased the RAM again to 5g for extra
>> > > headroom
>> > > >> > since
>> > > >> > > > we
>> > > >> > > > > had
>> > > >> > > > > > >> more headroom available; the problem hasn't happened
>> > since,
>> > > >> but
>> > > >> > it
>> > > >> > > > > hasn't
>> > > >> > > > > > >> been very long.
>> > > >> > > > > > >>
>> > > >> > > > > > >> We use a more customized image based on Alpine 3.8.2.
>> The
>> > > JDK
>> > > >> > and
>> > > >> > > > > Maven
>> > > >> > > > > > >> are obtained via apk.
>> > > >> > > > > > >>
>> > > >> > > > > > >> I will try upgrading failsafe (and surefire while I'm
>> at
>> > > it)
>> > > >> > > sooner,
>> > > >> > > > > and
>> > > >> > > > > > >> probably do some experimentation with JVMs another
>> time
>> > > (not
>> > > >> > > > pressing
>> > > >> > > > > for
>> > > >> > > > > > >> me ATM).
>> > > >> > > > > > >>
>> > > >> > > > > > >> On Tue, Feb 26, 2019 at 12:20 PM Tibor Digana <
>> > > >> > > > tibordig...@apache.org
>> > > >> > > > > >
>> > > >> > > > > > >> wrote:
>> > > >> > > > > > >>
>> > > >> > > > > > >>> >> I'll try to enable some logging about GC pauses to
>> > see
>> > > >> > what's
>> > > >> > > up
>> > > >> > > > > > >>>
>> > > >> > > > > > >>> Pls do not keep such setting after tuning the GC
>> because
>> > > >> this
>> > > >> > may
>> > > >> > > > > > >>> sometime
>> > > >> > > > > > >>> break the interprocess communication between Maven
>> > process
>> > > >> and
>> > > >> > > > > surefire
>> > > >> > > > > > >>> process.
>> > > >> > > > > > >>> It's worth to list GC information in a file and not
>> in
>> > the
>> > > >> > > console
>> > > >> > > > > logs.
>> > > >> > > > > > >>> This can be configured, I guess.
>> > > >> > > > > > >>>
>> > > >> > > > > > >>> >> Do you think the value is simply too low?
>> > > >> > > > > > >>>
>> > > >> > > > > > >>> GCing many objects may take some time and I remember
>> we
>> > > had
>> > > >> a
>> > > >> > > user
>> > > >> > > > > who
>> > > >> > > > > > >>> had
>> > > >> > > > > > >>> this problem a year or two ago.
>> > > >> > > > > > >>> We check every third NOOP (which is 3 x 10 sec) as a
>> fix
>> > > >> > instead
>> > > >> > > of
>> > > >> > > > > every
>> > > >> > > > > > >>> NOP. So 30 seconds looked satisfactory.
>> > > >> > > > > > >>> I think you use old version 2.20 or something like
>> that.
>> > > The
>> > > >> > > fixes
>> > > >> > > > > for
>> > > >> > > > > > >>> docker have been done so far, so please use the
>> latest
>> > > >> version
>> > > >> > > > > 3.0.0-M3.
>> > > >> > > > > > >>> See this page
>> > > >> > > > > > >>>
>> > > >> > > >
>> > > https://maven.apache.org/surefire/maven-surefire-plugin/docker.html
>> > > >> ,
>> > > >> > > > > we
>> > > >> > > > > > >>> used maven:3.5.3-jdk-8-alpine in this test. Which
>> base
>> > > image
>> > > >> > did
>> > > >> > > > you
>> > > >> > > > > use?
>> > > >> > > > > > >>>
>> > > >> > > > > > >>> Cheers
>> > > >> > > > > > >>> Tibor
>> > > >> > > > > > >>>
>> > > >> > > > > > >>> On Tue, Feb 26, 2019 at 5:24 PM Jason Young <
>> > > >> > > > > jason.yo...@procentive.com>
>> > > >> > > > > > >>> wrote:
>> > > >> > > > > > >>>
>> > > >> > > > > > >>> > Thanks for the information. It's good to see
>> someone
>> > > >> > > understands
>> > > >> > > > a
>> > > >> > > > > > >>> little
>> > > >> > > > > > >>> > about this.
>> > > >> > > > > > >>> >
>> > > >> > > > > > >>> > Incidentally, we have been looking at other GCs and
>> > VMs
>> > > >> for
>> > > >> > the
>> > > >> > > > > > >>> application
>> > > >> > > > > > >>> > in production environments, so I'll look into how
>> > these
>> > > >> > affect
>> > > >> > > > > tests as
>> > > >> > > > > > >>> > well. I'll try to enable some logging about GC
>> pauses
>> > to
>> > > >> see
>> > > >> > > > > what's up.
>> > > >> > > > > > >>> >
>> > > >> > > > > > >>> > How would `-Xmx3g` cause long GC cycles? Do you
>> think
>> > > the
>> > > >> > value
>> > > >> > > > is
>> > > >> > > > > > >>> simply
>> > > >> > > > > > >>> > too low?
>> > > >> > > > > > >>> >
>> > > >> > > > > > >>> > FWIW we're running the Maven build in an
>> Alpine-based
>> > > >> Docker
>> > > >> > > > > container.
>> > > >> > > > > > >>> >
>> > > >> > > > > > >>> > On Sat, Feb 23, 2019 at 6:36 AM Tibor Digana <
>> > > >> > > > > tibordig...@apache.org>
>> > > >> > > > > > >>> > wrote:
>> > > >> > > > > > >>> >
>> > > >> > > > > > >>> > > Hi Jason,
>> > > >> > > > > > >>> > >
>> > > >> > > > > > >>> > > We spoke about this issue on our chat in ASF
>> Slack:
>> > > >> > > > > > >>> > > "I think his tests have been paused for a long GC
>> > > >> periods
>> > > >> > and
>> > > >> > > > > timed
>> > > >> > > > > > >>> out
>> > > >> > > > > > >>> > 3x
>> > > >> > > > > > >>> > > PING period = 30 seconds. After this period
>> forked
>> > JVM
>> > > >> > > supposed
>> > > >> > > > > the
>> > > >> > > > > > >>> Maven
>> > > >> > > > > > >>> > > process was killed by JenkinsCI and therefore all
>> > > >> surefire
>> > > >> > > > > processes
>> > > >> > > > > > >>> are
>> > > >> > > > > > >>> > > killed as well and all the file handlers and
>> memory
>> > > >> > > > consumptions
>> > > >> > > > > are
>> > > >> > > > > > >>> > > freed."
>> > > >> > > > > > >>> > >
>> > > >> > > > > > >>> > > "But I have to say that `-Xmx3g` may cause long
>> GC
>> > > >> cycles,
>> > > >> > > see
>> > > >> > > > > > >>> > >
>> > > >> > > > > > >>> > >
>> > > >> > > > > > >>> >
>> > > >> > > > > > >>>
>> > > >> > > > >
>> > > >> > > >
>> > > >> > >
>> > > >> >
>> > > >>
>> > >
>> >
>> https://maven.apache.org/surefire/maven-surefire-plugin/examples/shutdown.html
>> > > >> > > > > > >>> > > "
>> > > >> > > > > > >>> > >
>> > > >> > > > > > >>> > > You are using java-1.8-openjdk. I guess you
>> should
>> > use
>> > > >> > > > > Shenandoah GC
>> > > >> > > > > > >>> > which
>> > > >> > > > > > >>> > > is an experimental algorithm in JVM 1.8. This
>> would
>> > > >> > > > significantly
>> > > >> > > > > > >>> short
>> > > >> > > > > > >>> > > the GC cycles.
>> > > >> > > > > > >>> > >
>> > > >> > > > > > >>> > > We should of cource provide a new configuration
>> > > >> parameter
>> > > >> > to
>> > > >> > > > give
>> > > >> > > > > > >>> you a
>> > > >> > > > > > >>> > > chance to prolong the PING.
>> > > >> > > > > > >>> > >
>> > > >> > > > > > >>> > > Cheers
>> > > >> > > > > > >>> > > Tibor
>> > > >> > > > > > >>> > >
>> > > >> > > > > > >>> >
>> > > >> > > > > > >>> >
>> > > >> > > > > > >>> > --
>> > > >> > > > > > >>> >
>> > > >> > > > > > >>> > Jason Young
>> > > >> > > > > > >>> >
>> > > >> > > > > > >>>
>> > > >> > > > > > >>
>> > > >> > > > > > >>
>> > > >> > > > > >
>> > > >> > > > > > --
>> > > >> > > > > > Jason Young
>> > > >> > > > > > Software Engineer | PROCENTIVE
>> > > >> > > > > > [image: Phone] 715 245 8000 x7609
>> > > >> > > > > > [image: Mobile] 706 870 3540
>> > > >> > > > > > [image: Web] procentive.com
>> > > >> > > > > > Confidentiality Notice: This message is intended for the
>> > sole
>> > > >> use
>> > > >> > of
>> > > >> > > > the
>> > > >> > > > > > individual and entity to which it is addressed, and may
>> > > contain
>> > > >> > > > > information
>> > > >> > > > > > that is privileged, confidential and exempt from
>> disclosure
>> > > >> under
>> > > >> > > > > > applicable law. Any unauthorized review, use, disclosure
>> or
>> > > >> > > > distribution
>> > > >> > > > > of
>> > > >> > > > > > this email message, including any attachment, is
>> prohibited.
>> > > If
>> > > >> you
>> > > >> > > are
>> > > >> > > > > not
>> > > >> > > > > > the intended recipient, please advise the sender by reply
>> > > email
>> > > >> and
>> > > >> > > > > destroy
>> > > >> > > > > > all copies of the original message.
>> > > >> > > > >
>> > > >> > > > >
>> > > >>
>> ---------------------------------------------------------------------
>> > > >> > > > > To unsubscribe, e-mail: users-unsubscr...@maven.apache.org
>> > > >> > > > > For additional commands, e-mail:
>> users-h...@maven.apache.org
>> > > >> > > > >
>> > > >> > > > >
>> > > >> > > >
>> > > >> > >
>> > > >> >
>> > > >>
>> > > >
>> > >
>> >
>>
>

Reply via email to