Pls check this commit. It is the algorithm https://github.com/apache/maven-surefire/pull/231/commits/112f3e4348f8ed731182064561604ff0fa9ca30c which avoids killing JVM when GC has paused the JVM for longer than 15 seconds.
On Sat, Apr 20, 2019 at 5:27 PM Tibor Digana <tibordig...@apache.org> wrote: > Hey Jason, > > Here is the PR I promissed > https://github.com/apache/maven-surefire/pull/231 > I will think about more tolerant algorithm. > Feel free to help us in this PR on GitHub. > > Cheers > Tibor > > > On Wed, Apr 10, 2019 at 3:18 AM Jason Young <jason.yo...@procentive.com> > wrote: > >> Thanks. >> >> I finally took a little time to look at why the `ps` command would not >> work. For our project, we cannot use a default Maven image, so I made a >> custom one based on alpine. One piece of software the Maven alpine image >> has is `pcprops`; adding that fixes the problem: >> >> $ docker run -it alpine sh >> > / # apk update >> > ... >> > OK: 8445 distinct packages available >> > / # apk upgrade >> > ... >> > OK: 4 MiB in 13 packages >> > / # ps -o etime= -p 1 >> > ps: unrecognized option: p >> > BusyBox v1.26.2 (2018-05-30 13:51:20 GMT) multi-call binary. >> > Usage: ps [-o COL1,COL2=HEADER] >> > Show list of processes >> > -o COL1,COL2=HEADER Select columns for display >> > / # apk add procps >> > ... >> > OK: 12 MiB in 19 packages >> > / # ps -o etime= -p 1 >> > 05:21 >> >> >> Hope that helps anyone else using a custom-built alpine-based Maven image. >> >> I'm not sure if I'll have time to experiment much with Surefire changes in >> the near future; my focus at the moment is figuring out all the reasons >> our >> test suite sometimes takes 5 or more times as long as normal (often also >> causes Surefire to kill the child process, but even if it didn't, we have >> a >> problem!). >> >> On Sun, Apr 7, 2019 at 2:04 AM Tibor Digana <tibordig...@apache.org> >> wrote: >> >> > Hi Jason, >> > >> > I am sorry I haven't reply. >> > Excellent analysis! >> > >> > Can you make an experiment and use pingJob on a "fixed delay" instead >> of a >> > "fixed rate"? >> > If you want to make more, feel free. >> > >> > We should mabe think about how we would add more intelligence in these >> > timers in order to detect the "blackout". >> > >> > Cheers >> > Tibor >> > >> > >> > >> > >> > On Fri, Mar 22, 2019 at 9:53 PM Jason Young <jason.yo...@procentive.com >> > >> > wrote: >> > >> > > Notice: >> > > * an over 3-minute gap between the first two entries above--neither >> the >> > > command reader thread nor the pingJob had a chance to run. >> > > * the command reader thread and the ping job executor evidently get a >> > > chance to run from 19:18:02 to 19:19:19. They ran with increased >> > frequency >> > > during that time to "catch up": The command reader thread is >> programmed >> > to >> > > read from stdin until there's nothing left to read, and the >> pingScheduler >> > > runs the pingJob at a "fixed" rate. >> > > * Another 3-minute gap and another attempt to catch up at 19:22:28. >> This >> > > time though, the command reader thread evidently did not get a turn in >> > > time. >> > > >> > > I did not bother to log what threads the ping job was running under; I >> > can >> > > do that later. I also did not add any logging on the Maven process >> side; >> > I >> > > can do that as well. >> > > >> > > This is on a VM with unthrottled access to 4 processor cores and 8GB >> > RAM. I >> > > think the code under test is CPU- and RAM-intensive. There probably >> were >> > GC >> > > pauses, but I don't have proper logging for that in place. >> > > >> > > Note that this is all very normal behavior: Both the command reader >> and >> > the >> > > ping job's thread (run by the pingScheduler) are instructed to run >> until >> > > their work is done, not to sleep or yield. For high throughput, an OS >> > will >> > > favor continuing to run a thread for a while once it is started over >> > > switching tasks, so it's normal both that each thread would run for a >> > while >> > > after a pause, and (by extension) that each would have long pauses as >> > other >> > > threads had to run, too. Because of the long gaps between runs of the >> > > command reader thread, my previous advice (schedule the pingJob on a >> > "fixed >> > > delay" instead of a "fixed rate") would not have saved the build in >> this >> > > case (though "fixed delay" is more correct and would save the build in >> > > other possible cases). Our code base has some unit tests that rely on >> > > "normal case" timing, but they now have the @Ignore attribute because >> > they >> > > would rarely, but too often, fail. For most teams and projects, unit >> > tests >> > > have to be more reliable than that. >> > > >> > > So I think we need a solution that does not rely on preemptive >> scheduling >> > > from the OS or on thread scheduling, but which ensures that commnads >> have >> > > been read before running code depending on commands. There are >> multiple >> > > ways to do that; maybe I can find time to write a solution later. >> > > >> > > >> > > On Fri, Mar 22, 2019 at 3:46 PM Jason Young < >> jason.yo...@procentive.com> >> > > wrote: >> > > >> > > > Thanks Tibor. I'll use `encodeAndWriteToOutput` later. For now, I >> > hacked >> > > > together some temporary changes to ForkedBooter so that it "logs" >> (just >> > > > `System.out.println`) the current time every time: >> > > > >> > > > - the command reader thread reads a NOOP via stdin, >> > > > - it starts the ping job, >> > > > - it checks for NOOP (`canUseNewPingMechanism(..)` returns >> `false`), >> > > > and >> > > > - it would have exited because there is no NOOP (`!pingDone`). >> > > > >> > > > I also commented out the code to exit in that case so it keeps >> running >> > > and >> > > > logging. >> > > > >> > > > The resulting output starts out showing that it's receiving ping >> every >> > > 10s >> > > > and checking for NOOP every 30s, as expected. But then one large >> test >> > > > begins, and: >> > > > >> > > > DEBUG 2019/03/22 19:14:40.223 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:18:02.723 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:18:03.139 GMT ForkedBooter: Running ping job >> (i.e. >> > > > checking if we read a ping earlier). >> > > > DEBUG 2019/03/22 19:18:03.172 GMT ForkedBooter: Checking for NOOP >> from >> > > > Maven process. >> > > > DEBUG 2019/03/22 19:18:03.186 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:18:03.187 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:18:03.187 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:18:03.187 GMT ForkedBooter: Running ping job >> (i.e. >> > > > checking if we read a ping earlier). >> > > > DEBUG 2019/03/22 19:18:03.188 GMT ForkedBooter: Checking for NOOP >> from >> > > > Maven process. >> > > > DEBUG 2019/03/22 19:18:03.187 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:18:03.243 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:18:03.307 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:18:03.308 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:18:03.202 GMT ForkedBooter: Running ping job >> (i.e. >> > > > checking if we read a ping earlier). >> > > > DEBUG 2019/03/22 19:18:03.323 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:18:03.337 GMT ForkedBooter: Checking for NOOP >> from >> > > > Maven process. >> > > > DEBUG 2019/03/22 19:18:03.351 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:18:03.351 GMT ForkedBooter: Running ping job >> (i.e. >> > > > checking if we read a ping earlier). >> > > > DEBUG 2019/03/22 19:18:03.351 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:18:03.352 GMT ForkedBooter: Checking for NOOP >> from >> > > > Maven process. >> > > > DEBUG 2019/03/22 19:18:03.366 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:18:03.393 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:18:03.380 GMT ForkedBooter: Running ping job >> (i.e. >> > > > checking if we read a ping earlier). >> > > > DEBUG 2019/03/22 19:18:03.409 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:18:03.424 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:18:03.423 GMT ForkedBooter: Checking for NOOP >> from >> > > > Maven process. >> > > > DEBUG 2019/03/22 19:18:03.424 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:18:03.438 GMT ForkedBooter: Running ping job >> (i.e. >> > > > checking if we read a ping earlier). >> > > > DEBUG 2019/03/22 19:18:03.438 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:18:03.453 GMT ForkedBooter: Checking for NOOP >> from >> > > > Maven process. >> > > > DEBUG 2019/03/22 19:18:03.454 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:18:03.476 GMT ForkedBooter: Running ping job >> (i.e. >> > > > checking if we read a ping earlier). >> > > > DEBUG 2019/03/22 19:18:03.476 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:18:03.505 GMT ForkedBooter: Checking for NOOP >> from >> > > > Maven process. >> > > > DEBUG 2019/03/22 19:18:03.505 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:18:08.872 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:18:18.859 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:18:19.282 GMT ForkedBooter: Running ping job >> (i.e. >> > > > checking if we read a ping earlier). >> > > > DEBUG 2019/03/22 19:18:19.330 GMT ForkedBooter: Checking for NOOP >> from >> > > > Maven process. >> > > > DEBUG 2019/03/22 19:18:28.845 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:18:38.832 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:18:48.838 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:18:49.241 GMT ForkedBooter: Running ping job >> (i.e. >> > > > checking if we read a ping earlier). >> > > > DEBUG 2019/03/22 19:18:49.265 GMT ForkedBooter: Checking for NOOP >> from >> > > > Maven process. >> > > > DEBUG 2019/03/22 19:19:01.162 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:19:10.069 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:19:19.136 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:19:19.199 GMT ForkedBooter: Running ping job >> (i.e. >> > > > checking if we read a ping earlier). >> > > > DEBUG 2019/03/22 19:19:19.199 GMT ForkedBooter: Checking for NOOP >> from >> > > > Maven process. >> > > > DEBUG 2019/03/22 19:22:28.698 GMT ForkedBooter: Running ping job >> (i.e. >> > > > checking if we read a ping earlier). >> > > > DEBUG 2019/03/22 19:22:28.698 GMT ForkedBooter: Checking for NOOP >> from >> > > > Maven process. >> > > > WARN 2019/03/22 19:22:28.698 GMT ForkedBooter: Did not receive NOOP >> > from >> > > > Maven process. >> > > > DEBUG 2019/03/22 19:22:28.698 GMT ForkedBooter: Running ping job >> (i.e. >> > > > checking if we read a ping earlier). >> > > > DEBUG 2019/03/22 19:22:28.698 GMT ForkedBooter: Checking for NOOP >> from >> > > > Maven process. >> > > > WARN 2019/03/22 19:22:28.698 GMT ForkedBooter: Did not receive NOOP >> > from >> > > > Maven process. >> > > > DEBUG 2019/03/22 19:22:28.699 GMT ForkedBooter: Running ping job >> (i.e. >> > > > checking if we read a ping earlier). >> > > > DEBUG 2019/03/22 19:22:28.699 GMT ForkedBooter: Checking for NOOP >> from >> > > > Maven process. >> > > > WARN 2019/03/22 19:22:28.699 GMT ForkedBooter: Did not receive NOOP >> > from >> > > > Maven process. >> > > > DEBUG 2019/03/22 19:22:28.699 GMT ForkedBooter: Running ping job >> (i.e. >> > > > checking if we read a ping earlier). >> > > > DEBUG 2019/03/22 19:22:28.699 GMT ForkedBooter: Checking for NOOP >> from >> > > > Maven process. >> > > > WARN 2019/03/22 19:22:28.699 GMT ForkedBooter: Did not receive NOOP >> > from >> > > > Maven process. >> > > > DEBUG 2019/03/22 19:22:28.699 GMT ForkedBooter: Running ping job >> (i.e. >> > > > checking if we read a ping earlier). >> > > > DEBUG 2019/03/22 19:22:28.699 GMT ForkedBooter: Checking for NOOP >> from >> > > > Maven process. >> > > > WARN 2019/03/22 19:22:28.699 GMT ForkedBooter: Did not receive NOOP >> > from >> > > > Maven process. >> > > > DEBUG 2019/03/22 19:22:28.699 GMT ForkedBooter: Running ping job >> (i.e. >> > > > checking if we read a ping earlier). >> > > > DEBUG 2019/03/22 19:22:28.699 GMT ForkedBooter: Checking for NOOP >> from >> > > > Maven process. >> > > > WARN 2019/03/22 19:22:28.699 GMT ForkedBooter: Did not receive NOOP >> > from >> > > > Maven process. >> > > > DEBUG 2019/03/22 19:22:28.703 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:22:28.703 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:22:28.703 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:22:28.703 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:22:28.703 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:22:28.703 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:22:28.703 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:22:28.703 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:22:28.703 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:22:28.703 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:22:28.703 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:22:28.704 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:22:28.704 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:22:28.704 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:22:28.704 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:22:28.704 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:22:28.704 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:22:28.704 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > DEBUG 2019/03/22 19:22:28.704 GMT ForkedBooter: Received ping from >> > Maven >> > > > process. >> > > > ... >> > > > >> > > > With these changes, a project which normally takes 15 to 20 minutes >> to >> > > run >> > > > actually took 80m. Would printing directly to stdout from within a >> test >> > > do >> > > > that somehow? If not, we have a problem in our code. >> > > > >> > > > My comments on the above report to follow... >> > > > >> > > > On Fri, Mar 22, 2019 at 11:58 AM Tibor Digana < >> tibordig...@apache.org> >> > > > wrote: >> > > > >> > > >> >> is there an -Xmx value you would recommend to ensure the parent >> > > process >> > > >> can send the NOOP to the surefire process? >> > > >> >> > > >> no such value is generic value to recommend. Each application or >> test >> > > must >> > > >> find our the value to prevent from OOM. >> > > >> >> > > >> >> > > >> >> I'm making a custom build of surefire-booter to workaround the >> > > initial >> > > >> problem by commenting out the code to exit. >> > > >> >> > > >> Very good step forward. Then post your findings and we can >> continue on >> > > >> making a pull request on GitHub. >> > > >> >> > > >> >> > > >> >> Is there a logging API which I can use instead? >> > > >> >> > > >> Yes, it is very low level of logging (sending events to parent >> > process) >> > > >> but >> > > >> this will be turned to high level abstraction in several days >> (branch >> > > >> surefire-1222_2). >> > > >> Yet, pls use this function: >> > > >> >> > > >> encodeAndWriteToOutput() >> > > >> >> > > >> >> > > >> >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> On Fri, Mar 22, 2019 at 4:44 PM Jason Young < >> > jason.yo...@procentive.com >> > > > >> > > >> wrote: >> > > >> >> > > >> > To clarify, this is the image ours is based on: >> > > >> > https://hub.docker.com/_/alpine; we are not using a vanilla >> Maven >> > > >> image as >> > > >> > we have to add a few other items to this image as well. I don't >> know >> > > who >> > > >> > maintains that or adds /etc/mavenrc; I'll bring that up in >> another >> > > >> topic in >> > > >> > another forum. I was just answering a previous question, and also >> > > >> throwing >> > > >> > out info for anyone struggling with e.g. OOME in their Maven >> process >> > > (a >> > > >> lot >> > > >> > of people have that problem when running the Sonar scanner, for >> > > >> example). I >> > > >> > don't think we need to worry about why that file is added for >> this >> > > >> topic. >> > > >> > >> > > >> > But while we're on the subject, is there an -Xmx value you would >> > > >> recommend >> > > >> > to ensure the parent process can send the NOOP to the surefire >> > > process? >> > > >> > >> > > >> > I'm making a custom build of surefire-booter to workaround the >> > initial >> > > >> > problem by commenting out the code to exit. I am also adding >> logging >> > > via >> > > >> > stdout so I can see if ForkedBooter is indeed running the ping >> job >> > in >> > > >> small >> > > >> > increments of time. Is there a logging API which I can use >> instead? >> > > >> > >> > > >> > On Fri, Mar 22, 2019 at 5:44 AM Tibor Digana < >> > tibordig...@apache.org> >> > > >> > wrote: >> > > >> > >> > > >> > > The base images are developed in >> > > >> > https://github.com/carlossg/docker-maven, >> > > >> > > right? >> > > >> > > Who creates "/etc/mavenrc"? >> > > >> > > >> > > >> > > On Thu, Mar 21, 2019 at 12:05 AM Jason Young < >> > > >> jason.yo...@procentive.com >> > > >> > > >> > > >> > > wrote: >> > > >> > > >> > > >> > > > Mikael, sorry I do not appear to have permission to view the >> > link. >> > > >> > > > >> > > >> > > > I did some digging in the last couple of days. I see that the >> > > parent >> > > >> > > > process reads from stdin. I could not find anywhere that we >> are >> > > >> using >> > > >> > > > stdin. FWIW the failures nearly always happen at least 15m >> into >> > a >> > > >> ~20m >> > > >> > > test >> > > >> > > > run, so perf is a likely culprit. >> > > >> > > > >> > > >> > > > I see also that ForkedBooter reads commands from stdin in one >> > > >> thread, >> > > >> > and >> > > >> > > > uses an executor service to check for a past ping in >> > > >> > > > ForkedBooter.listenToShutdownCommands(..). When it checks, it >> > also >> > > >> sets >> > > >> > > > pingDone to false. The executor is configured to run up to 2 >> > > threads >> > > >> > > > concurrently to handle the workload, and is set to run at a >> > fixed >> > > >> rate >> > > >> > > (not >> > > >> > > > a fixed delay). If the test suite is busy with testing and GC >> > and >> > > >> has >> > > >> > > lots >> > > >> > > > of threads running, it's entirely possible that a thread >> won't >> > > have >> > > >> a >> > > >> > > > chance to run for a long time (e.g. 5s). Maybe instead of a >> 30s >> > > >> delay, >> > > >> > > the >> > > >> > > > VM gets around to checking for a ping every 35s over a long >> span >> > > of >> > > >> > time. >> > > >> > > > Because we're running at a "fixed rate" and not a "fixed >> delay", >> > > >> then >> > > >> > > after >> > > >> > > > a couple of minutes we might be a full 30s behind schedule. >> It's >> > > >> > possible >> > > >> > > > the executor will create another thread to run the scheduled >> > task >> > > >> > because >> > > >> > > > it's running behind schedule. This new thread checks for a >> ping, >> > > >> finds >> > > >> > > it, >> > > >> > > > and sets pingDone to false. But then the original thread also >> > > runs, >> > > >> > say, >> > > >> > > 2 >> > > >> > > > seconds afterwards, checks pingDone, and finds it is false. >> > > >> > > > >> > > >> > > > So to mitigate the problem, can we a) make the executor run >> > only 1 >> > > >> > thread >> > > >> > > > and b) schedule the task at a fixed rate? For that matter, is >> > > there >> > > >> > > another >> > > >> > > > scheduled executor we can reuse? I understand why checking >> for >> > > ping >> > > >> > > > requires a separate executor. Should I ask in github? >> > > >> > > > >> > > >> > > > Regarding a previous question, I found out that Alpine's >> Maven >> > > >> package >> > > >> > > > comes with an /etc/mavenrc that sets `MAVEN_OPTS="$MAVEN_OPTS >> > > >> > -Xmx512m"` >> > > >> > > > which cannot be undone by setting `MAVEN_OPTS` at the command >> > > line; >> > > >> you >> > > >> > > end >> > > >> > > > up with e.g. `-Xmx1g -Xmx512m`. (Note this applies to the >> Maven >> > > >> > (parent) >> > > >> > > > process, not the surefire/failsafe (child) process.) >> > > >> > > > >> > > >> > > > On Wed, Mar 20, 2019 at 3:46 AM Bernd Eckenfels < >> > > >> > e...@zusammenkunft.net> >> > > >> > > > wrote: >> > > >> > > > >> > > >> > > > > I guess a timeout caused by FullGC can happen with TCP as >> > well. >> > > >> > > > Increasing >> > > >> > > > > the timeout might not be nice but does look like it would >> help >> > > in >> > > >> > both >> > > >> > > > > cases. (Problems with stdout are more related to unexpected >> > JVM >> > > >> > > messages >> > > >> > > > I >> > > >> > > > > guess) >> > > >> > > > > >> > > >> > > > > Gruss >> > > >> > > > > Bernd >> > > >> > > > > -- >> > > >> > > > > http://bernd.eckenfels.net >> > > >> > > > > >> > > >> > > > > ________________________________ >> > > >> > > > > Von: Mikael Åsberg <m.asberg.wa...@gmail.com> >> > > >> > > > > Gesendet: Mittwoch, März 20, 2019 9:40 AM >> > > >> > > > > An: Maven Users List >> > > >> > > > > Betreff: Re: Failsafe: Killing self fork JVM. PING timeout >> > > >> elapsed. >> > > >> > > > > >> > > >> > > > > These issues regarding communication with forked JVMs, >> won't >> > > they >> > > >> be >> > > >> > > > > resolved once surefire moves to interprocess communication >> > using >> > > >> > > > > tcp/ip sockets? This happens to be the target feature to be >> > > >> included >> > > >> > > > > in the next surefire 3.0.0 milestone: >> > > >> > > > > >> > > >> https://issues.apache.org/jira/projects/SUREFIRE/versions/12344668 >> > > >> > > > > >> > > >> > > > > There are soooo many issues relating to surefire reading >> > stdout >> > > of >> > > >> > > > > forked processes (which is my understanding that it is >> > currently >> > > >> > > > > doing). Many of us are really looking forward to the next >> > > >> milestone. >> > > >> > > > > >> > > >> > > > > On Tue, Mar 19, 2019 at 8:59 PM Jason Young < >> > > >> > > jason.yo...@procentive.com> >> > > >> > > > > wrote: >> > > >> > > > > > >> > > >> > > > > > Getting back to my original questions, I know that "ping" >> > > means >> > > >> to >> > > >> > > see >> > > >> > > > > if a >> > > >> > > > > > process is there, and "NOOP" implies it's not a command >> to >> > do >> > > >> > > anything. >> > > >> > > > > But >> > > >> > > > > > what do the terms "ping" and "NOOP" mean in this context, >> > i.e. >> > > >> how >> > > >> > do >> > > >> > > > the >> > > >> > > > > > processes communicate? I assume they don't sonar. Do >> other >> > > >> > processes >> > > >> > > > also >> > > >> > > > > > ping NOOPs? Can I PING Chrome with a NOOP from bash? Is >> it >> > > with >> > > >> > TCP? >> > > >> > > > > > >> > > >> > > > > > I'm confused about what I should do regarding GC pauses. >> > > >> > Previously I >> > > >> > > > had >> > > >> > > > > > code that would write the amount of remaining heap space >> (or >> > > >> > > something >> > > >> > > > > like >> > > >> > > > > > that) to stdout after every test to troubleshoot OOMEs. >> Can >> > > >> writing >> > > >> > > to >> > > >> > > > > > stdout cause the communication failure somehow? >> > > >> > > > > > >> > > >> > > > > > On Wed, Mar 13, 2019 at 5:57 PM Jason Young < >> > > >> > > > jason.yo...@procentive.com> >> > > >> > > > > > wrote: >> > > >> > > > > > >> > > >> > > > > > > I upgraded failsafe and surefire to 3.0.0-M3 as >> advised; >> > we >> > > >> > > > encountered >> > > >> > > > > > > the same exception. (Still using -Xmx5g, will switch to >> > > OpenJ9 >> > > >> > soon >> > > >> > > > in >> > > >> > > > > case >> > > >> > > > > > > that helps.) >> > > >> > > > > > > >> > > >> > > > > > > BTW I also asked on StackOverflow previously, for >> anyone >> > > >> > > interested: >> > > >> > > > > > > >> > > >> > > > > >> > > >> > > > >> > > >> > > >> > > >> > >> > > >> >> > > >> > >> https://stackoverflow.com/questions/54755846/killing-self-fork-jvm-ping-timeout-elapsed >> > > >> > > > > > > >> > > >> > > > > > > On Tue, Feb 26, 2019 at 6:40 PM Jason Young < >> > > >> > > > > jason.yo...@procentive.com> >> > > >> > > > > > > wrote: >> > > >> > > > > > > >> > > >> > > > > > >> Thanks again for the information. >> > > >> > > > > > >> >> > > >> > > > > > >> We had increased the RAM to 3g some time ago to >> prevent >> > > >> OOMEs. >> > > >> > > More >> > > >> > > > > > >> recently, I increased the RAM again to 5g for extra >> > > headroom >> > > >> > since >> > > >> > > > we >> > > >> > > > > had >> > > >> > > > > > >> more headroom available; the problem hasn't happened >> > since, >> > > >> but >> > > >> > it >> > > >> > > > > hasn't >> > > >> > > > > > >> been very long. >> > > >> > > > > > >> >> > > >> > > > > > >> We use a more customized image based on Alpine 3.8.2. >> The >> > > JDK >> > > >> > and >> > > >> > > > > Maven >> > > >> > > > > > >> are obtained via apk. >> > > >> > > > > > >> >> > > >> > > > > > >> I will try upgrading failsafe (and surefire while I'm >> at >> > > it) >> > > >> > > sooner, >> > > >> > > > > and >> > > >> > > > > > >> probably do some experimentation with JVMs another >> time >> > > (not >> > > >> > > > pressing >> > > >> > > > > for >> > > >> > > > > > >> me ATM). >> > > >> > > > > > >> >> > > >> > > > > > >> On Tue, Feb 26, 2019 at 12:20 PM Tibor Digana < >> > > >> > > > tibordig...@apache.org >> > > >> > > > > > >> > > >> > > > > > >> wrote: >> > > >> > > > > > >> >> > > >> > > > > > >>> >> I'll try to enable some logging about GC pauses to >> > see >> > > >> > what's >> > > >> > > up >> > > >> > > > > > >>> >> > > >> > > > > > >>> Pls do not keep such setting after tuning the GC >> because >> > > >> this >> > > >> > may >> > > >> > > > > > >>> sometime >> > > >> > > > > > >>> break the interprocess communication between Maven >> > process >> > > >> and >> > > >> > > > > surefire >> > > >> > > > > > >>> process. >> > > >> > > > > > >>> It's worth to list GC information in a file and not >> in >> > the >> > > >> > > console >> > > >> > > > > logs. >> > > >> > > > > > >>> This can be configured, I guess. >> > > >> > > > > > >>> >> > > >> > > > > > >>> >> Do you think the value is simply too low? >> > > >> > > > > > >>> >> > > >> > > > > > >>> GCing many objects may take some time and I remember >> we >> > > had >> > > >> a >> > > >> > > user >> > > >> > > > > who >> > > >> > > > > > >>> had >> > > >> > > > > > >>> this problem a year or two ago. >> > > >> > > > > > >>> We check every third NOOP (which is 3 x 10 sec) as a >> fix >> > > >> > instead >> > > >> > > of >> > > >> > > > > every >> > > >> > > > > > >>> NOP. So 30 seconds looked satisfactory. >> > > >> > > > > > >>> I think you use old version 2.20 or something like >> that. >> > > The >> > > >> > > fixes >> > > >> > > > > for >> > > >> > > > > > >>> docker have been done so far, so please use the >> latest >> > > >> version >> > > >> > > > > 3.0.0-M3. >> > > >> > > > > > >>> See this page >> > > >> > > > > > >>> >> > > >> > > > >> > > https://maven.apache.org/surefire/maven-surefire-plugin/docker.html >> > > >> , >> > > >> > > > > we >> > > >> > > > > > >>> used maven:3.5.3-jdk-8-alpine in this test. Which >> base >> > > image >> > > >> > did >> > > >> > > > you >> > > >> > > > > use? >> > > >> > > > > > >>> >> > > >> > > > > > >>> Cheers >> > > >> > > > > > >>> Tibor >> > > >> > > > > > >>> >> > > >> > > > > > >>> On Tue, Feb 26, 2019 at 5:24 PM Jason Young < >> > > >> > > > > jason.yo...@procentive.com> >> > > >> > > > > > >>> wrote: >> > > >> > > > > > >>> >> > > >> > > > > > >>> > Thanks for the information. It's good to see >> someone >> > > >> > > understands >> > > >> > > > a >> > > >> > > > > > >>> little >> > > >> > > > > > >>> > about this. >> > > >> > > > > > >>> > >> > > >> > > > > > >>> > Incidentally, we have been looking at other GCs and >> > VMs >> > > >> for >> > > >> > the >> > > >> > > > > > >>> application >> > > >> > > > > > >>> > in production environments, so I'll look into how >> > these >> > > >> > affect >> > > >> > > > > tests as >> > > >> > > > > > >>> > well. I'll try to enable some logging about GC >> pauses >> > to >> > > >> see >> > > >> > > > > what's up. >> > > >> > > > > > >>> > >> > > >> > > > > > >>> > How would `-Xmx3g` cause long GC cycles? Do you >> think >> > > the >> > > >> > value >> > > >> > > > is >> > > >> > > > > > >>> simply >> > > >> > > > > > >>> > too low? >> > > >> > > > > > >>> > >> > > >> > > > > > >>> > FWIW we're running the Maven build in an >> Alpine-based >> > > >> Docker >> > > >> > > > > container. >> > > >> > > > > > >>> > >> > > >> > > > > > >>> > On Sat, Feb 23, 2019 at 6:36 AM Tibor Digana < >> > > >> > > > > tibordig...@apache.org> >> > > >> > > > > > >>> > wrote: >> > > >> > > > > > >>> > >> > > >> > > > > > >>> > > Hi Jason, >> > > >> > > > > > >>> > > >> > > >> > > > > > >>> > > We spoke about this issue on our chat in ASF >> Slack: >> > > >> > > > > > >>> > > "I think his tests have been paused for a long GC >> > > >> periods >> > > >> > and >> > > >> > > > > timed >> > > >> > > > > > >>> out >> > > >> > > > > > >>> > 3x >> > > >> > > > > > >>> > > PING period = 30 seconds. After this period >> forked >> > JVM >> > > >> > > supposed >> > > >> > > > > the >> > > >> > > > > > >>> Maven >> > > >> > > > > > >>> > > process was killed by JenkinsCI and therefore all >> > > >> surefire >> > > >> > > > > processes >> > > >> > > > > > >>> are >> > > >> > > > > > >>> > > killed as well and all the file handlers and >> memory >> > > >> > > > consumptions >> > > >> > > > > are >> > > >> > > > > > >>> > > freed." >> > > >> > > > > > >>> > > >> > > >> > > > > > >>> > > "But I have to say that `-Xmx3g` may cause long >> GC >> > > >> cycles, >> > > >> > > see >> > > >> > > > > > >>> > > >> > > >> > > > > > >>> > > >> > > >> > > > > > >>> > >> > > >> > > > > > >>> >> > > >> > > > > >> > > >> > > > >> > > >> > > >> > > >> > >> > > >> >> > > >> > >> https://maven.apache.org/surefire/maven-surefire-plugin/examples/shutdown.html >> > > >> > > > > > >>> > > " >> > > >> > > > > > >>> > > >> > > >> > > > > > >>> > > You are using java-1.8-openjdk. I guess you >> should >> > use >> > > >> > > > > Shenandoah GC >> > > >> > > > > > >>> > which >> > > >> > > > > > >>> > > is an experimental algorithm in JVM 1.8. This >> would >> > > >> > > > significantly >> > > >> > > > > > >>> short >> > > >> > > > > > >>> > > the GC cycles. >> > > >> > > > > > >>> > > >> > > >> > > > > > >>> > > We should of cource provide a new configuration >> > > >> parameter >> > > >> > to >> > > >> > > > give >> > > >> > > > > > >>> you a >> > > >> > > > > > >>> > > chance to prolong the PING. >> > > >> > > > > > >>> > > >> > > >> > > > > > >>> > > Cheers >> > > >> > > > > > >>> > > Tibor >> > > >> > > > > > >>> > > >> > > >> > > > > > >>> > >> > > >> > > > > > >>> > >> > > >> > > > > > >>> > -- >> > > >> > > > > > >>> > >> > > >> > > > > > >>> > Jason Young >> > > >> > > > > > >>> > >> > > >> > > > > > >>> >> > > >> > > > > > >> >> > > >> > > > > > >> >> > > >> > > > > > >> > > >> > > > > > -- >> > > >> > > > > > Jason Young >> > > >> > > > > > Software Engineer | PROCENTIVE >> > > >> > > > > > [image: Phone] 715 245 8000 x7609 >> > > >> > > > > > [image: Mobile] 706 870 3540 >> > > >> > > > > > [image: Web] procentive.com >> > > >> > > > > > Confidentiality Notice: This message is intended for the >> > sole >> > > >> use >> > > >> > of >> > > >> > > > the >> > > >> > > > > > individual and entity to which it is addressed, and may >> > > contain >> > > >> > > > > information >> > > >> > > > > > that is privileged, confidential and exempt from >> disclosure >> > > >> under >> > > >> > > > > > applicable law. Any unauthorized review, use, disclosure >> or >> > > >> > > > distribution >> > > >> > > > > of >> > > >> > > > > > this email message, including any attachment, is >> prohibited. >> > > If >> > > >> you >> > > >> > > are >> > > >> > > > > not >> > > >> > > > > > the intended recipient, please advise the sender by reply >> > > email >> > > >> and >> > > >> > > > > destroy >> > > >> > > > > > all copies of the original message. >> > > >> > > > > >> > > >> > > > > >> > > >> >> --------------------------------------------------------------------- >> > > >> > > > > To unsubscribe, e-mail: users-unsubscr...@maven.apache.org >> > > >> > > > > For additional commands, e-mail: >> users-h...@maven.apache.org >> > > >> > > > > >> > > >> > > > > >> > > >> > > > >> > > >> > > >> > > >> > >> > > >> >> > > > >> > > >> > >> >