On 04/12/2020 at 03:21, Jean Helou wrote:
> Hello fellow jamers !
>
> The Jenkinsfile in the PR works up until the test suite fails. The test
> failures come from seemingly "unstable" tests that fail because of timing
> issues. Benoit fixed the first one in
> https://github.com/apache/james-project/pull/267 by disabling read repairs
> during consistency checks (I have no idea what it means but it sounds
> awesome :) ). I fixed the second one in
> https://github.com/apache/james-project/pull/269, where the event bus sender
> and receivers were closed out of order on shutdown, sometimes leading to
> events being sent to a closed receiver.
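For readers unfamiliar with this class of bug, the ordering constraint can be
sketched as follows. This is only an illustration under assumptions:
`ShutdownOrderSketch` and `Component` are hypothetical names, not the James
event bus classes, and the actual fix in the PR may look quite different. The
sketch relies on try-with-resources closing resources in reverse declaration
order, so the sender is closed before the receiver.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for an event bus sender and receiver; not the James classes.
class ShutdownOrderSketch {
    // Records the order in which components are closed.
    static final List<String> closed = new ArrayList<>();

    static class Component implements AutoCloseable {
        private final String name;

        Component(String name) {
            this.name = name;
        }

        @Override
        public void close() {
            closed.add(name);
        }
    }

    static List<String> runAndShutdown() {
        closed.clear();
        // try-with-resources closes resources in reverse declaration order,
        // so the sender (declared last) is closed before the receiver.
        try (Component receiver = new Component("receiver");
             Component sender = new Component("sender")) {
            // Events would flow from sender to receiver here.
        }
        return new ArrayList<>(closed);
    }

    public static void main(String[] args) {
        System.out.println(runAndShutdown()); // prints [sender, receiver]
    }
}
```

Closing the sender first means nothing can emit an event towards a receiver
that is already gone.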
>
> After some cleanup, Matthieu recreated a buildable PR which led to yet
> another unstable test in
> https://builds.apache.org/blue/organizations/jenkins/james%2FApacheJames/detail/PR-268/1/tests
We have been encountering this for a while. Thanks for digging in.
>
> I started investigating the issue and ended up roping in Matthieu since the
> symptoms for the issue left me completely puzzled. Matthieu managed to
> pinpoint the root cause to an NPE sometimes thrown from
> within org.apache.james.server.core.MimeMessageCopyOnWriteProxy which in
> turn triggered further NullPointerExceptions in the mailet pipeline error
> handling code.
> We finally confirmed a concurrency issue in the refcounting management of
> the proxy which, if I understand correctly, can lead to unrecoverable data
> loss. We wrote a test to trigger it [1] in an almost deterministic manner.
I'm in favor of opening a dedicated ticket and merging a disabled version
of this test in order to document the problem.
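
To illustrate what a sound reference count could look like, here is a minimal
sketch, assuming hypothetical names (`RefCountedSketch`, `retain`, `release`).
It is not the MimeMessageCopyOnWriteProxy code and not a proposed fix. The
idea is that increments go through compare-and-set, so a caller can never
revive an object whose count has already dropped to zero, and disposal happens
exactly once.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; not the James MimeMessageCopyOnWriteProxy code.
class RefCountedSketch {
    // Starts at 1: the creator holds the first reference.
    private final AtomicInteger refCount = new AtomicInteger(1);
    // Counts how many times the underlying resource was disposed.
    private final AtomicInteger disposals = new AtomicInteger(0);

    // Takes a new reference; returns false if the resource is already disposed.
    boolean retain() {
        while (true) {
            int current = refCount.get();
            if (current == 0) {
                return false;
            }
            if (refCount.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    // Drops a reference; the caller that reaches zero disposes the resource.
    void release() {
        int remaining = refCount.decrementAndGet();
        if (remaining < 0) {
            throw new IllegalStateException("released more often than retained");
        }
        if (remaining == 0) {
            disposals.incrementAndGet(); // stands in for the real cleanup
        }
    }

    int disposals() {
        return disposals.get();
    }

    // Hammers one instance from several threads, then drops the creator's reference.
    static RefCountedSketch stress(int threads, int iterations) {
        RefCountedSketch shared = new RefCountedSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < iterations; j++) {
                    if (shared.retain()) {
                        shared.release();
                    }
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        shared.release();
        return shared;
    }

    public static void main(String[] args) {
        RefCountedSketch result = stress(8, 10_000);
        System.out.println("disposals = " + result.disposals()); // prints disposals = 1
    }
}
```

A plain `int` counter with separate read and write steps would lose updates
under the same load, which is the kind of failure the test in [1] is meant to
expose.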
>
> Once we had a test to reproduce the race condition, we tried to fix the
> issue only to realize that it led to even more concurrency issues. The
> rather depressing conclusion we reached yesterday was that the whole
> implementation is currently unsound with regard to concurrency. I am unable
> to estimate the resolution effort at this point. Matthieu has some ideas
> and will work on it (as will I) when time allows.
>
> Which leads me to my current questions: I feel that fixing such
> long-standing issues in the test suite is not actually part of configuring
> the Apache CI, but I am unsure how to proceed.
+1
>
> Here is what I would like to do at this stage :
> - Isolate the unstable tests with an "unstable" tag (akin to "feature
> tags")
I'd advocate a @Disabled tag, referencing both a JIRA ticket specific to
the bugfix needed, and the JIRA of the CI build.

Having a list of such issues in the JIRA (CI setup) ticket would be
valuable. I'd even advise doing subtickets to have a nice checklist.
> - exclude these tests from the default surefire execution profile,
> - add a parallel pipeline step for these tests where the step failure
> doesn't fail the pipeline [2]
> - ensure that the build is green
> - merge so the project finally has a working public CI
>
> I intend to start working on this quickly so we can all enjoy a functional
> public CI.
+1 I agree on the approach.
>
> Alternatives:
> - Merge the jenkinsfile after the whole pipeline has been tested in the PR
> branch, which may not happen in a short-medium term...
> - Merging as is, which means that many builds on PRs will end up failing
> and the last steps (snapshot publish) might fail even if the test suite
> succeeds, since it never ran.
> - Something I haven't thought of ?
>
> Another issue I want to raise is the availability of the CI builds. As you
> have seen from my experiments, the CI triggers configuration will only
> build commits from :
> - all branches of the main repository
> - all PRs opened from the main repository
> - all PRs opened by someone with write access to the main repository
>
> Which means that PRs from external contributors will not be built at all.
>
> I tried adding the issueCommentTrigger to the Jenkinsfile but neither my
> comments nor those of someone with commit access were able to trigger the
> build.
>
> I think that one of the project members should revise the current settings
> to make it possible to build external contributors' PRs one way or another
> (only project members have access or can have access to the jenkins project
> configuration).
> Here are two options:
> - the easiest and quickest modification is to let the CI build each and
> every PR. There are relatively few PRs on James, so the burden on the CI
> platform shouldn't be too bad.
> - alternatively, it may be possible to configure Jenkins to require a
> comment from someone with write access to trigger a build. Unfortunately I
> am not certain how to set this up; maybe INFRA can help.
Having a build in the first place, even with the restrictions you
describe, sounds like good progress to me.

I agree we need to see what other Apache projects are doing and, if
needed, ask INFRA.
>
> I know this was a long piece, I look forward to reading your opinions !
Thanks for your involvement on this topic.

Benoit
> Jean
>
> [1] see
> https://github.com/jeantil/james-project/tree/james-3225-concurrency-bug-mimemessagecow
> [2] see
> https://stackoverflow.com/questions/44022775/jenkins-ignore-failure-in-pipeline-build-step
>
> On Thu, Nov 26, 2020 at 11:22 AM Jean Helou <jean.he...@gmail.com> wrote:
>
>> The good news is that docker does indeed work, the bad news is that the
>> tests fail with an issue that's too involved for me :/
>>
>> [INFO]
>> [INFO] Results:
>> [INFO]
>> [ERROR] Failures:
>> [ERROR]   
>> CassandraMailboxManagerConsistencyTest$FailuresOnDeletion$DeleteOnce.deleteMailboxByPathShouldBeConsistentWhenMailboxPathDaoFails:433
>>  Multiple Failures (1 failure)
>>      
>> Expecting:
>>   <[]>
>> to contain exactly (and in same order):
>>   <[#private:user:INBOX]>
>> but could not find the following elements:
>>   <[#private:user:INBOX]>
>>
>> at 
>> CassandraMailboxManagerConsistencyTest$FailuresOnDeletion$DeleteOnce.lambda$deleteMailboxByPathShouldBeConsistentWhenMailboxPathDaoFails$8(CassandraMailboxManagerConsistencyTest$FailuresOnDeletion$DeleteOnce.java:440)
>>
>> so unless the build for
>>
>> * 6fab99364a - JAMES-3448 Rewrite links to http://james.apache.org/server/3/ 
>> (Mon Nov 23 15:10:36 2020 +0700) <Benoit Tellier> N
>>
>> is broken which sounds unlikely, I'm going to need help
>>
>> jean
>>
>> On Thu, Nov 26, 2020 at 10:53 AM Jean Helou <jean.he...@gmail.com> wrote:
>>
>>> on a loosely related note : the test suite logs are scary to look at:
>>> piles upon piles of stack traces and error logs but the tests actually pass
>>> ...
>>>
>>> On Thu, Nov 26, 2020 at 10:50 AM Jean Helou <jean.he...@gmail.com> wrote:
>>>
>>>> Thanks benoit,
>>>>
>>>> Matthieu pointed me to numerous apache projects with jenkinsfiles which
>>>> mention docker in
>>>> https://github.com/search?q=org%3Aapache++filename%3AJenkinsfile+docker&type=Code
>>>> so I'm trying out things based on that
>>>>
>>>> the logs seem promising so far :
>>>> ```
>>>>
>>>> [INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
>>>> 0.697 s - in 
>>>> org.apache.james.backends.rabbitmq.RabbitMQConnectionFactoryTest
>>>>         ℹ︎ Checking the system...
>>>>         ✔ Docker version should be at least 1.6.0
>>>>         ✔ Docker environment should have more than 2GB free disk space
>>>> [INFO] Running org.apache.james.backends.rabbitmq.RabbitMQTest
>>>> ```
>>>>
>>>>
>>>> On Thu, Nov 26, 2020 at 10:40 AM Tellier Benoit <btell...@apache.org>
>>>> wrote:
>>>>
>>>>> Done
>>>>>
>>>>> On 26/11/2020 at 16:25, Jean Helou wrote:
>>>>>> hi all,
>>>>>>
>>>>>> As you know, I started a PR to set up Jenkins CI. The latest iteration
>>>>>> sees the compilation of the project complete in 5 minutes (thanks to T1C),
>>>>>> but the tests fail to initialize docker containers, with the disastrous
>>>>>> consequences you can imagine :D
>>>>>>
>>>>>> I opened https://issues.apache.org/jira/browse/INFRA-21144 to ask if it is
>>>>>> possible to have the docker service enabled on some nodes. Since I am not
>>>>>> an official member of the project, I think it may be useful if you chimed
>>>>>> in on the ticket to confirm that this is a legitimate request.
>>>>>>
>>>>>> Best regards,
>>>>>> Jean
>>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
>>>>> For additional commands, e-mail: server-dev-h...@james.apache.org
>>>>>
>>>>>

