Re: Jenkins CI setup

Tellier Benoit Fri, 11 Dec 2020 01:16:31 -0800

Hello Jean,

On 11/12/2020 at 15:47, Jean Helou wrote:
> Hello again jamers!
> 
> It's time for your irregular report on the CI effort on the Apache infra :)


\o/

>> I'm in favor of opening a dedicated ticket and merging a disabled version
>> of this test in order to document the problem.
>>
> 
> It's been busy and I haven't opened the ticket yet, nor have we managed to
> fully fix the issue yet.
> 

I can devote some of my time to support you on this.

> 
>>> Here is what I would like to do at this stage:
>>> - Isolate the unstable tests with an unstable tag (akin to "feature
>>> tags")
>>> - exclude these tests from the default surefire execution profile,
>>> - add a parallel pipeline step for these tests where the step failure
>>> doesn't fail the pipeline [2]
>>> - ensure that the build is green
>>> - merge so the project finally has a working public CI
>>>
>>> I intend to start working on this quickly so we can all enjoy a
>>> functional public CI.
>>
> 
> So I added an `Unstable.Tag` and started tagging the known unstable tests.
> Running the pipeline in parallel seemed to lead to more issues, so I
> reverted to a serial run for now. I also switched from fail-at-first-error
> to fail-at-end to get an idea of the volume of unstable tests.

+1
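For reference, the tag-and-exclude plan can be modeled without any test framework. In the actual build this would be JUnit 5's `@Tag` combined with Surefire's `groups`/`excludedGroups` parameters; the sketch below only mimics that selection, using a hypothetical `@UnstableTag` annotation, invented test classes and reflection, and treats only failures in the stable group as failing the build, as the plan above intends for the separate unstable step.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a JUnit 5 @Tag("unstable") marker.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface UnstableTag {}

// Invented test classes, used only to exercise the selection logic.
class StableImapTest {}
@UnstableTag class FlakyLdapTest {}
class StableSmtpTest {}

class TagSelection {
    /** Default run: every class not tagged unstable (like excludedGroups=unstable). */
    static List<Class<?>> stable(List<Class<?>> all) {
        List<Class<?>> out = new ArrayList<>();
        for (Class<?> c : all) {
            if (!c.isAnnotationPresent(UnstableTag.class)) out.add(c);
        }
        return out;
    }

    /** Separate step: only the unstable classes (like groups=unstable). */
    static List<Class<?>> unstable(List<Class<?>> all) {
        List<Class<?>> out = new ArrayList<>();
        for (Class<?> c : all) {
            if (c.isAnnotationPresent(UnstableTag.class)) out.add(c);
        }
        return out;
    }

    /** Only a failure in the stable group fails the build; unstable failures are just reported. */
    static String buildResult(List<Class<?>> all, Map<Class<?>, Boolean> passed) {
        for (Class<?> c : stable(all)) {
            if (!passed.getOrDefault(c, false)) return "FAILURE";
        }
        return "SUCCESS";
    }

    public static void main(String[] args) {
        List<Class<?>> all = List.of(StableImapTest.class, FlakyLdapTest.class, StableSmtpTest.class);
        Map<Class<?>, Boolean> passed = Map.of(
                StableImapTest.class, true,
                FlakyLdapTest.class, false,
                StableSmtpTest.class, true);
        System.out.println("stable:   " + stable(all));
        System.out.println("unstable: " + unstable(all));
        System.out.println("build:    " + buildResult(all, passed));
    }
}
```

With this split, a failing `FlakyLdapTest` is still visible in the unstable step's report but leaves the build green.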

> 
>> I'd advocate a @Disabled annotation, referencing both a JIRA ticket specific
>> to the needed bugfix and the JIRA ticket of the CI build.
>> Having a list of such issues in the JIRA (CI setup) ticket would be
>> valuable. I'd even advise creating sub-tickets to have a nice checklist.
> 
> 
> Despite Matthieu's remarks, I was open to creating the tickets and adding the
> information to the Unstable.TAG or as a comment next to the tag.
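One way to keep the bugfix ticket and the CI ticket attached to each unstable test, and to derive the checklist suggested above, is an annotation with ticket attributes read back through reflection. JUnit 5's `@Disabled` only takes a free-text reason string, which could hold the same keys; the `KnownUnstable` annotation, its `bugTicket`/`ciTicket` attributes, the test classes and the `EXAMPLE-*` keys below are all hypothetical placeholders, not existing James code or real JIRA issues.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker recording which tickets track an unstable test.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface KnownUnstable {
    String bugTicket(); // ticket for the actual bugfix
    String ciTicket();  // umbrella ticket for the CI setup
}

// Invented test classes carrying placeholder ticket keys.
@KnownUnstable(bugTicket = "EXAMPLE-1", ciTicket = "EXAMPLE-CI")
class LdapInvalidDnTest {}

@KnownUnstable(bugTicket = "EXAMPLE-2", ciTicket = "EXAMPLE-CI")
class CassandraMailboxTest {}

class HealthyTest {}

class UnstableChecklist {
    /** One checklist line per annotated class; unannotated classes are skipped. */
    static List<String> checklist(List<Class<?>> classes) {
        List<String> lines = new ArrayList<>();
        for (Class<?> c : classes) {
            KnownUnstable k = c.getAnnotation(KnownUnstable.class);
            if (k != null) {
                lines.add("[ ] " + k.bugTicket() + " " + c.getSimpleName() + " (CI: " + k.ciTicket() + ")");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        checklist(List.of(LdapInvalidDnTest.class, CassandraMailboxTest.class, HealthyTest.class))
                .forEach(System.out::println);
    }
}
```

The printed lines could be pasted into the CI setup ticket, so the checklist stays in sync with what the code actually marks as unstable.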
> 
> However, the recent CI results make the effort feel overwhelming:
> - https://builds.apache.org/blue/organizations/jenkins/james%2FApacheJames/detail/PR-268/7/tests
>   failures: 4 new, 33 existing, 27 fixed, total 37
> - https://builds.apache.org/blue/organizations/jenkins/james%2FApacheJames/detail/PR-268/6/tests
>   failures: 27 new, 27 existing, 0 fixed, total 54
> - https://builds.apache.org/blue/organizations/jenkins/james%2FApacheJames/detail/PR-268/5/tests
>   failures: 0 new, 8 existing, 6 fixed, total 8
> - https://builds.apache.org/blue/organizations/jenkins/james%2FApacheJames/detail/PR-268/4/tests
>   (issue with the Jenkinsfile; the build did not run)
> - https://builds.apache.org/blue/organizations/jenkins/james%2FApacheJames/detail/PR-268/3/tests
>   4 fixed, 92 existing failures (this run used parallel steps for both the stable and unstable tests)
> - https://builds.apache.org/blue/organizations/jenkins/james%2FApacheJames/detail/PR-268/2/tests
>   92 existing failures (this run used parallel steps for both the stable and unstable tests)
> 
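Once the same commit has been built a few times, comparing the set of failing tests in each run separates reproducible failures (failing every time) from flaky ones (failing only sometimes), which is what decides whether a test should be fixed or tagged unstable. A minimal sketch of that triage, with a hypothetical `FailureTriage` class and invented test names rather than the actual results of the runs listed above:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class FailureTriage {
    /**
     * Splits failing tests into "reproducible" (failed in every run) and
     * "flaky" (failed in at least one run, but not in all of them).
     */
    static Map<String, Set<String>> classify(List<Set<String>> failuresPerRun) {
        Set<String> everFailed = new TreeSet<>();
        for (Set<String> run : failuresPerRun) {
            everFailed.addAll(run);
        }

        Set<String> reproducible = new TreeSet<>();
        Set<String> flaky = new TreeSet<>();
        for (String test : everFailed) {
            boolean failedEveryRun = failuresPerRun.stream().allMatch(failures -> failures.contains(test));
            if (failedEveryRun) {
                reproducible.add(test);
            } else {
                flaky.add(test);
            }
        }

        Map<String, Set<String>> result = new TreeMap<>();
        result.put("reproducible", reproducible);
        result.put("flaky", flaky);
        return result;
    }

    public static void main(String[] args) {
        // Invented failure sets for three runs of the same commit.
        List<Set<String>> runs = List.of(
                Set.of("LdapTest", "CassandraTest"),
                Set.of("LdapTest"),
                Set.of("LdapTest", "ImapTest"));
        System.out.println(classify(runs));
    }
}
```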
> Note that the last 2 runs were triggered after I rebased on master.
> I will trigger the next 3 by modifying random comments in the Jenkinsfile
> to see if the build has a reproducible failure pattern or if all these
> tests need to be tagged as Unstable.
> That's a lot of failures, some of which don't even make sense to me. In run 7,
> the first of the 4 new failures is:
> ```
> java.lang.NoClassDefFoundError: Could not initialize class org.apache.james.user.ldap.ReadOnlyUsersLDAPRepositoryTest
>     at org.apache.james.user.ldap.ReadOnlyUsersLDAPRepositoryInvalidDnTest.setUp(ReadOnlyUsersLDAPRepositoryInvalidDnTest.java:62)
> ```
> and quite a few of the listed errors are similar NoClassDefFoundErrors,
> which I fail to understand... I would very much welcome feedback if
> one of you has encountered this kind of issue before; it feels like I am
> missing something :(

We use a static singleton approach so that Testcontainers Docker
containers are initialised once per Surefire fork rather than once per
test class. Combined with a reuseForks=true setting, this dramatically
reduces testing time!
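A minimal sketch of why this works, assuming a hypothetical `FakeContainer` in place of a real Testcontainers container: a static final field is initialized once per class loader, so every test class running in the same reused fork (JVM) shares a single container start.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive Docker container; counts how often it is started.
class FakeContainer {
    static final AtomicInteger STARTS = new AtomicInteger();

    final String address;

    FakeContainer() {
        STARTS.incrementAndGet();       // simulates the costly container start
        this.address = "localhost:389"; // placeholder address
    }
}

// Static singleton: the JVM runs this initializer at most once per class loader,
// i.e. once per Surefire fork when forks are reused.
class SharedContainer {
    static final FakeContainer INSTANCE = new FakeContainer();

    private SharedContainer() {}
}

// Three "test classes" sharing the singleton.
class FirstTest  { static String run() { return SharedContainer.INSTANCE.address; } }
class SecondTest { static String run() { return SharedContainer.INSTANCE.address; } }
class ThirdTest  { static String run() { return SharedContainer.INSTANCE.address; } }

class SingletonDemo {
    public static void main(String[] args) {
        FirstTest.run();
        SecondTest.run();
        ThirdTest.run();
        System.out.println("container starts: " + FakeContainer.STARTS.get());
    }
}
```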

The only downside is cryptic NoClassDefFoundErrors when the given Docker
container can't start.
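This matches standard JVM behavior when a static initializer throws: the first access fails with an `ExceptionInInitializerError` carrying the real cause (here, the container startup error), the class is then marked as unusable, and every later access in the same JVM fails with `NoClassDefFoundError: Could not initialize class ...`, where, depending on the JDK version, the original cause may not be attached. In a reused fork, the first `ExceptionInInitializerError` in the logs is therefore the one worth reading. A sketch reproducing this with a simulated startup failure (the class names are hypothetical):

```java
// Simulates a container singleton whose start-up fails (e.g. its image cannot be built).
class BrokenContainerSingleton {
    static final String ADDRESS = startContainer();

    private static String startContainer() {
        throw new IllegalStateException("container failed to start");
    }
}

class InitFailureDemo {
    /** Touches the singleton, as a test's setUp would, and returns the error raised (or null). */
    static Throwable touch() {
        try {
            String unused = BrokenContainerSingleton.ADDRESS;
            return null;
        } catch (Throwable t) {
            return t;
        }
    }

    public static void main(String[] args) {
        Throwable first = touch();  // first test class: ExceptionInInitializerError with the real cause
        Throwable second = touch(); // later test classes: NoClassDefFoundError
        System.out.println("first access:  " + first + " caused by " + first.getCause());
        System.out.println("second access: " + second);
    }
}
```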

Note that:
 - some tests reuse existing images;
 - some tests, like the LDAP one, use a Dockerfile to build their own
   image. This is likely the source of some instability.

I have also seen Cassandra containers fail to initialise in memory-constrained
environments (e.g. on my laptop when less than 5GB is available).

Not sure if that helps.
Nor am I sure we can hope for changes in the Docker environment.

A quick action could be to decrease Surefire concurrency in the projects
using expensive-to-start containers (like mailbox/cassandra,
mpt/impl/imap/cassandra, etc.), ideally controlled by an environment
variable if that is possible.
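Maven can read environment variables in the POM through `${env.NAME}` interpolation, so Surefire's `forkCount` could in principle be driven by the CI environment. The fallback logic such a setting needs is sketched below in plain Java; the variable name `SUREFIRE_FORK_COUNT`, the default of 1 and the `ForkCountResolver` class are hypothetical, not existing James configuration. Surefire also accepts values such as `1C` (multiplied by the number of CPU cores), which the sketch supports for integers only.

```java
import java.util.Map;

class ForkCountResolver {
    /**
     * Resolves a fork count from an environment map, falling back to a default
     * when the variable is missing or malformed. Accepts "N" or "NC" (N per CPU core);
     * values below 1 are rejected in this sketch.
     */
    static int resolve(Map<String, String> env, String name, int defaultCount, int cores) {
        String raw = env.get(name);
        if (raw == null || raw.isBlank()) {
            return defaultCount;
        }
        raw = raw.trim();
        boolean perCore = raw.endsWith("C");
        String number = perCore ? raw.substring(0, raw.length() - 1) : raw;
        try {
            int n = Integer.parseInt(number);
            if (n < 1) {
                return defaultCount;
            }
            return perCore ? n * cores : n;
        } catch (NumberFormatException e) {
            return defaultCount;
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Hypothetical variable name, e.g. SUREFIRE_FORK_COUNT=1 on a constrained CI agent.
        int forks = resolve(System.getenv(), "SUREFIRE_FORK_COUNT", 1, cores);
        System.out.println("forkCount=" + forks);
    }
}
```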

To give some insight: at Linagora we run only one build per Jenkins slave,
and each slave has a dedicated physical host. I doubt we have anything
like that in the Apache environment.

Hope it helps.

> 
>> Having a build in the first place, even with the restrictions you
>> describe, sounds like good progress to me.
>>
> 
> As it stands, it looks like it's going to take a bit longer :(
> 

Cheers,

Benoit

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
For additional commands, e-mail: server-dev-h...@james.apache.org