> Could there be some automatic calculation of memory resources that makes
> the build fail on some servers and not on others?

Maybe, but

> Our CI servers have 28GiB of memory. Could Docker allocate an amount
> that is particularly suitable for our test suite?

The last memory error failure ( https://builds.apache.org/job/james/job/ApacheJames/job/PR-264 ) ran on H39.
According to https://cwiki.apache.org/confluence/display/INFRA/Jenkins+node+labels this node has 4TB disk and 96GB RAM.
Builds 9->12 & 14 ran on H39.
Build 6, which also failed on memory, ran on H22, for which the wiki doesn't have stats.
Builds 45, 48, 49 and 50 succeeded on H48.
(I'm sticking to failures on purpose.)

I can't really wait for 3-4 hours in front of the jenkins page to see if another job starts on the worker my build is running on, and the runners don't seem to store build history (not even for 24h, see https://builds.apache.org/computer/H39/builds : build #13 failed on OOM and ran between 12 and 15 on 2021-01-13), so it's really hard to correlate.

For this morning's build I happened to see a kafka job start on the runner (H42) while build #15 was running:
Kafka » kafka-trunk-jdk15 <https://builds.apache.org/job/Kafka/job/kafka-trunk-jdk15/>
Unfortunately that build was hit by the copy on write concurrency bug, so no memory error this time \o/

However, if you look at what I said about build #13:

> I get a more classical `java.lang.OutOfMemoryError: Java heap space` a bit
> before (12:14:09.563 vs 12:45:57.988).
> The last non error line before the fatal Direct buffer memory error is
> [INFO] Running
> org.apache.james.webadmin.integration.rabbitmq.RabbitMQReindexingWithEventDeadLettersTest
> The last non error line before the nonfatal heap memory error is
> [INFO] Running
> org.apache.james.jmap.memory.cucumber.MemoryDownloadCucumberTest

According to all the documentation I have ever seen, the Java heap space message means that the JVM could not allocate memory for an object within the heap. In my experience it is not related to a lack of memory outside the JVM, which seems to match
https://stackoverflow.com/questions/46801741/jvm-crashes-with-error-cannot-allocate-memory-errno-12 :
a native memory allocation error while trying to grow the heap will crash the JVM. So the Java heap OOM error is not related to other processes running on the machine.

I also don't see errors related to docker containers used by the tests failing to allocate memory (there are some errors about the docker containers not finding an image once in a while, but that's it). The memory errors are always in the james test code itself.

The OOM Direct buffer memory seems to often be triggered by an attempt to start GuiceJamesServer, which in turn starts multiple netty servers (for the various ports). The default max off-heap memory can be related to Xmx (see
https://blog.alexis-hassler.com/2020/05/15/direct-buffer-memory.html
for a recent French resource on the subject, or
http://www.mastertheboss.com/other/java-stuff/troubleshooting-outofmemoryerror-direct-buffer-memory ).
Also, https://dzone.com/articles/troubleshooting-problems-with-native-off-heap-memo
says that the error seems to be hitting a limit of the JVM rather than malloc being unable to allocate native memory. I couldn't find the corresponding JDK documentation.

So all this seems to confirm Matthieu's intuition of a resource leak in the test suites rather than a native memory starvation issue.
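
To make the heap vs. direct buffer distinction concrete, here is a minimal sketch. It is not taken from the James code base: the class name `DirectBufferProbe` and its methods are hypothetical, and I am assuming a HotSpot JVM where the direct buffer pool is exposed under the name "direct". It keeps direct buffers reachable (the way a leak would) and reports how much the JVM has accounted against its own direct memory limit, independently of what the machine has free.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of James: shows that direct buffers are
// accounted against a JVM-internal limit, separately from the Java heap.
final class DirectBufferProbe {

    /** Bytes currently reserved in the JVM's "direct" buffer pool, or -1 if not exposed. */
    static long directPoolUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) { // pool name assumed (HotSpot)
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    /**
     * Allocates direct buffers of chunkBytes each and keeps them reachable in
     * 'keep' until the JVM refuses or maxChunks is reached.
     * Returns the number of chunks actually obtained.
     */
    static int allocateUntilRefused(int chunkBytes, int maxChunks, List<ByteBuffer> keep) {
        int allocated = 0;
        try {
            while (allocated < maxChunks) {
                keep.add(ByteBuffer.allocateDirect(chunkBytes));
                allocated++;
            }
        } catch (OutOfMemoryError e) {
            // The exact message wording depends on the JDK version.
            System.out.println("refused after " + allocated + " chunks: " + e.getMessage());
        }
        return allocated;
    }

    public static void main(String[] args) {
        System.out.println("max heap (bytes): " + Runtime.getRuntime().maxMemory());
        List<ByteBuffer> keep = new ArrayList<>();
        int chunks = allocateUntilRefused(1024 * 1024, 64, keep);
        System.out.println("direct chunks held: " + chunks
                + ", direct pool used (bytes): " + directPoolUsed());
    }
}
```

Saved as DirectBufferProbe.java and run on JDK 11 or later with something like `java -XX:MaxDirectMemorySize=16m DirectBufferProbe.java`, the allocation should be refused after roughly 16 chunks even on a machine with plenty of free RAM, which would be consistent with the limit being enforced by the JVM rather than by the operating system.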
Why that leak doesn't affect your CI is still a mystery to me.

> I know that I had some pain to reproduce a green build on my local
> computer, while it works pretty smoothly on the CI.
>
> Raphaël.
>
> On 14/01/2021 at 09:37, Jean Helou wrote:
> >> My 2 cents trying to bring a little force in this nice project :)
> >>
> > Thanks Raphaël :)
> >
> >> All our old CI is open source, so you can just check the source, Luke:
> >> https://github.com/linagora/james-jenkins/blob/master/workflow-job#L643
> >
> > thanks for the pointer, I had not looked for that one. After digging
> > through the repo, I couldn't find any memory specific settings either.
> >
> >> and in particular
> >> https://github.com/linagora/james-project/blob/master/dockerfiles/compilation/java-11/compile.sh#L65
> >>
> >> So I'm sorry there is no magic mvn parameter...
> >
> > well I was wondering if maybe there was something passed in
> > MVN_ADDITIONAL_ARG_LINE but as I said I was unable to find anything
> > special in the james-jenkins repo.
> >
> > This leaves me even more confused, don't you encounter these random looking
> > failures on the linagora ci platform ?
> >
> > I mean I did manage to get a few green builds but overall out of 62 builds
> > I had 5 successful ones, that's less than 10% !
> >
> > I haven't kept detailed stats (I didn't think it would be this bad) but out
> > of gut feeling, the primary causes seem to be:
> > - copy on write thread safety (which can arguably be explained by slower
> >   computers), hence my impatience to see JAMES-3477 fixed since this would
> >   likely resolve a lot of unstable tests
> > - out of memory errors which I find much harder to explain by slower
> >   machines
> >
> > For the out of memory errors I ended up increasing the Xmx of surefire
> > (from 1g to 2g) in the following pom files :
> >
> > server/protocols/jmap-draft-integration-testing/cassandra-jmap-draft-integration-testing/pom.xml
> > server/protocols/webadmin-integration-test/distributed-webadmin-integration-test/pom.xml
> > server/protocols/webadmin-integration-test/memory-webadmin-integration-test/pom.xml
> > server/protocols/webadmin-integration-test/pom.xml
> > server/protocols/jmap-rfc-8621-integration-tests/distributed-jmap-rfc-8621-integration-tests/pom.xml
> > server/protocols/webadmin/webadmin-mailbox/pom.xml
> > server/container/guice/cassandra-rabbitmq-guice/pom.xml
> > server/protocols/jmap-draft-integration-testing/rabbitmq-jmap-draft-integration-testing/pom.xml
> > server/protocols/jmap-draft-integration-testing/memory-jmap-draft-integration-testing/pom.xml
> >
> > Obviously I very much look forward to the removal of the jmap draft module,
> > since I read in other exchanges that it was deprecated and would be removed
> >
> > If I still can't get the build to pass, I'll look into matthieu's
> > suggestion :
> >
> >> The first solution is to recycle JVMs less to mitigate leaks effects
> >> (with surefire reusefork option).
> >
> > Cheers,
> > jean
> >
> >
> >> And happy new year to you!
> >>
> >> Cheers,
> >>
> >> Raphaël.
> >>
> >> On 13/01/2021 at 12:50, Jean Helou wrote:
> >>> Happy new year fellow jamers !
> >>>
> >>> In this thrilling new episode you might learn if 2021 will be the year the
> >>> james project gets a public ci rolling again !
> >>>
> >>> CI wars
> >>> Episode 49e^55
> >>> The Memory errors strike back
> >>> The CI Resistance succeeded in configuring jenkins, fixed some tests,
> >>> exposed some bugs and tagged a lot of unstable tests as being unstable.
> >>> After such a striking defeat the empire of bugs reacted in the most vicious
> >>> way ever, it deployed "Direct buffer memory" errors throughout the galaxy
> >>> to find contributors to the CI effort and tear down their hope and
> >>> motivation. They found the apache jenkins and it will need help from all
> >>> the CI resistance members to fight them off !
> >>>
> >>> On a bit more serious note,
> >>> I am at a loss as to how to fix this issue. My last four builds have failed
> >>> because a `java.lang.OutOfMemoryError: Direct buffer memory` caused the
> >>> forked jvm to crash, crashing the surefire plugin and the build with it.
> >>> That has been a build failure cause for a lot of the 63 builds on the
> >>> apache CI. Until now I updated the pom files of the corresponding projects
> >>> to increase heap to 2G, but the last failure occurred in a project where the
> >>> heap was already increased.
> >>>
> >>> Looking at a specific log
> >>> https://builds.apache.org/blue/organizations/jenkins/james%2FApacheJames/detail/PR-264/12/pipeline
> >>> I get a more classical `java.lang.OutOfMemoryError: Java heap space` a bit
> >>> before (12:14:09.563 vs 12:45:57.988).
> >>> The last non error line before the fatal Direct buffer memory error is
> >>> [INFO] Running
> >>> org.apache.james.webadmin.integration.rabbitmq.RabbitMQReindexingWithEventDeadLettersTest
> >>> The last non error line before the nonfatal heap memory error is
> >>> [INFO] Running
> >>> org.apache.james.jmap.memory.cucumber.MemoryDownloadCucumberTest
> >>>
> >>> I will try to increase surefire's heap for the
> >>> memory-jmap-draft-integration-testing project too in case the initial heap
> >>> space OOM triggered the other one.
> >>> stackoverflow is not very helpful either
> >>> https://stackoverflow.com/search?q=java.lang.OutOfMemoryError%3A+Direct+buffer+memory
> >>> or I have not been able to comprehend how the solutions there could help
> >>>
> >>> I have gone through the files in /dockerfiles without finding anything that
> >>> looked related to memory configuration of maven itself. If people who run
> >>> the build locally with success or on their own CI could check the MVN_OPTS
> >>> and let me know if they override maven's Xmx itself I would appreciate it.
> >>> thanks for your help
> >>> jean
> >>>
> >>>
> >>>
> >>> On Tue, Dec 29, 2020 at 9:10 AM Jean Helou <jean.he...@gmail.com> wrote:
> >>>
> >>>> Hi Benoit,
> >>>>
> >>>>> As someone operating another CI, I want to play even unstable test on
> >>>>> every runs. Is there some adaptation needed to do this?
> >>>>>
> >>>> Yes you will have to change your CI,
> >>>>> mvn -B -e -fae test
> >>>> now only runs stable tests, to run unstable tests you need an additional
> >>>> step
> >>>>> mvn -B -e -fae test -Punstable-tests
> >>>> I believe your CI is also based on jenkins (because of the stress test
> >>>> jenkinsfile at the root of the project) in which case you could configure
> >>>> your jenkins to pick up the jenkinsfile and use the same pipeline as we use
> >>>> on apache CI
> >>>>
> >>>> cheers,
> >>>> jean
> >>>>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
> >> For additional commands, e-mail: server-dev-h...@james.apache.org
> >>