I upgraded failsafe and surefire to 3.0.0-M3 as advised, but we still encountered the same exception. (Still using -Xmx5g; I will switch to OpenJ9 soon in case that helps.)
BTW I also asked on StackOverflow previously, for anyone interested:
https://stackoverflow.com/questions/54755846/killing-self-fork-jvm-ping-timeout-elapsed

On Tue, Feb 26, 2019 at 6:40 PM Jason Young <jason.yo...@procentive.com> wrote:

> Thanks again for the information.
>
> We had increased the RAM to 3g some time ago to prevent OOMEs. More
> recently, I increased the RAM again to 5g for extra headroom since we had
> more headroom available; the problem hasn't happened since, but it hasn't
> been very long.
>
> We use a more customized image based on Alpine 3.8.2. The JDK and Maven
> are obtained via apk.
>
> I will try upgrading failsafe (and surefire while I'm at it) sooner, and
> probably do some experimentation with JVMs another time (not pressing for
> me ATM).
>
> On Tue, Feb 26, 2019 at 12:20 PM Tibor Digana <tibordig...@apache.org>
> wrote:
>
>> >> I'll try to enable some logging about GC pauses to see what's up
>>
>> Pls do not keep such a setting after tuning the GC, because it may
>> sometimes break the interprocess communication between the Maven
>> process and the surefire process.
>> It's worth writing GC information to a file and not to the console logs.
>> This can be configured, I guess.
>>
>> >> Do you think the value is simply too low?
>>
>> GCing many objects may take some time, and I remember we had a user who
>> had this problem a year or two ago.
>> As a fix, we now check every third NOOP (which is 3 x 10 sec) instead of
>> every NOOP. So 30 seconds looked satisfactory.
>> I think you use an old version, 2.20 or something like that. Fixes for
>> Docker have been made since then, so please use the latest version,
>> 3.0.0-M3.
>> See this page:
>> https://maven.apache.org/surefire/maven-surefire-plugin/docker.html
>> We used maven:3.5.3-jdk-8-alpine in this test. Which base image did you
>> use?
>>
>> Cheers
>> Tibor
>>
>> On Tue, Feb 26, 2019 at 5:24 PM Jason Young <jason.yo...@procentive.com>
>> wrote:
>>
>> > Thanks for the information. It's good to see someone understands a
>> > little about this.
>> >
>> > Incidentally, we have been looking at other GCs and VMs for the
>> > application in production environments, so I'll look into how these
>> > affect tests as well. I'll try to enable some logging about GC pauses
>> > to see what's up.
>> >
>> > How would `-Xmx3g` cause long GC cycles? Do you think the value is
>> > simply too low?
>> >
>> > FWIW we're running the Maven build in an Alpine-based Docker container.
>> >
>> > On Sat, Feb 23, 2019 at 6:36 AM Tibor Digana <tibordig...@apache.org>
>> > wrote:
>> >
>> > > Hi Jason,
>> > >
>> > > We spoke about this issue on our chat in ASF Slack:
>> > > "I think his tests have been paused for long GC periods and timed
>> > > out after 3x the PING period = 30 seconds. After this period the
>> > > forked JVM assumed the Maven process was killed by JenkinsCI, and
>> > > therefore all surefire processes are killed as well and all the file
>> > > handlers and memory consumptions are freed."
>> > >
>> > > "But I have to say that `-Xmx3g` may cause long GC cycles, see
>> > > https://maven.apache.org/surefire/maven-surefire-plugin/examples/shutdown.html"
>> > >
>> > > You are using java-1.8-openjdk. I guess you should use Shenandoah GC,
>> > > which is an experimental algorithm in JVM 1.8. This would
>> > > significantly shorten the GC cycles.
>> > >
>> > > We should of course provide a new configuration parameter to give
>> > > you a chance to prolong the PING.
>> > >
>> > > Cheers
>> > > Tibor
>> >
>> > --
>> > Jason Young
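
For anyone following along, the suggestion in the thread to write GC information to a file rather than to the console might look roughly like the fragment below. This is only a sketch under assumptions that are not confirmed in the thread: a HotSpot-based JDK 8 (these GC logging flags were replaced by `-Xlog:gc*` from JDK 9 onward and may not carry over to other JVMs such as OpenJ9), the failsafe plugin as the one forking the tests, and `gc.log` under the build directory as an arbitrary file name.

```xml
<!-- Sketch only: assumes HotSpot JDK 8; the log file location is an arbitrary choice. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-failsafe-plugin</artifactId>
  <version>3.0.0-M3</version>
  <configuration>
    <!-- argLine is passed to the forked test JVM, so GC output goes to a file
         instead of the console stream shared with the Maven process. -->
    <argLine>-Xmx5g -Xloggc:${project.build.directory}/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps</argLine>
  </configuration>
</plugin>
```

Pause times in the resulting file can then be compared against the 30 second window (3 x 10 sec) mentioned above. As advised in the thread, the setting is probably best removed again once the GC has been tuned.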