[Pharo-dev] [CI] Cause of the random failures in the CI

Cyril Ferlicot D. Tue, 19 Jun 2018 07:56:05 -0700

Hi,

For months now there have been a lot of random failures on the CI, making
it hard to work.


There are different kinds of failures:
- Network problems
- Failing tests
- Incomprehensible problems

Now I don't see many failures due to the network. I suppose the Inria
infrastructure has improved.

Failing tests have been fixed over the past months and we see fewer and
fewer of them.

Now the big problem is the incomprehensible crashes, such as "The
workspace was not found", "FileDoesNotExistException" or "pharo-vm/ is
already present".

We just found the problem :)

During the validation of the Bootstrap, multiple test runs are launched
in parallel on OSX/Windows/Linux. Each task runs on a different Jenkins
slave. However, we discovered that two slaves can share the same disk.
Usually this does not cause any trouble, since a job is run by only one
slave. But in this particular case, two slaves can be used by the same
job and mess with each other's resources.

We highlighted the problem by adding logs to the CI: now, when we launch
tests, we create a file named after the task.

Today we got a crash, and in the log we can see that the same workspace
contains two of those files, proving that both tasks were executed on the
same disk, in the same folder:

[…]
-rw-rw-r-- 1 ci ci    0 Jun 19 16:01 Kernel-tests-unix-32
[…]
-rw-rw-r-- 1 ci ci    0 Jun 19 16:01 Tests-unix-32
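The diagnostic above can be sketched roughly as follows. This is only an illustration in Python, not the actual CI scripts (which are not shown here): each task drops an empty marker file named after itself into the workspace, and finding another task's marker in the same folder means two tasks share it. The function names `write_marker` and `other_tasks_in`, and the idea of passing in the set of known task names, are hypothetical.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def write_marker(workspace: Path, task_name: str) -> Path:
    """Create an empty file named after the task in the workspace."""
    marker = workspace / task_name
    marker.touch()
    return marker


def other_tasks_in(workspace: Path, task_name: str, known_tasks: set[str]) -> list[str]:
    """Return the names of *other* known tasks whose marker files are in this workspace.

    A non-empty result means several tasks are running in the same folder
    (for instance because two slaves share the same disk).
    """
    return sorted(
        p.name
        for p in workspace.iterdir()
        if p.is_file() and p.name in known_tasks and p.name != task_name
    )


if __name__ == "__main__":
    tasks = {"Kernel-tests-unix-32", "Tests-unix-32"}
    with tempfile.TemporaryDirectory() as tmp:
        shared = Path(tmp)  # simulates one folder seen by two slaves
        write_marker(shared, "Kernel-tests-unix-32")
        write_marker(shared, "Tests-unix-32")
        print(other_tasks_in(shared, "Kernel-tests-unix-32", tasks))  # ['Tests-unix-32']
```

Unrelated files in the workspace are ignored because only names from the known task set count as markers.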

As a solution, we will execute the tests inside a subfolder named after
the task, which should greatly reduce the number of problems.
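The fix can be illustrated with a small Python sketch (again, not the real CI code). Each task works in `workspace/<task name>`, so even when two slaves share a disk, their files (such as an extracted `pharo-vm/` directory) no longer collide. `task_dir` and `prepare_vm` are hypothetical names; `prepare_vm` merely stands in for whatever step produced the "pharo-vm/ is already present" error.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def task_dir(workspace: Path, task_name: str) -> Path:
    """Return (creating it if needed) a subfolder of the workspace dedicated to one task."""
    d = workspace / task_name
    d.mkdir(parents=True, exist_ok=True)
    return d


def prepare_vm(run_dir: Path) -> Path:
    """Hypothetical stand-in for a step that must not find an existing pharo-vm/ folder."""
    vm = run_dir / "pharo-vm"
    vm.mkdir()  # raises FileExistsError if a pharo-vm/ folder is already here
    return vm


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        shared = Path(tmp)  # simulates one workspace seen by two slaves
        for task in ("Kernel-tests-unix-32", "Tests-unix-32"):
            prepare_vm(task_dir(shared, task))  # no clash: one pharo-vm/ per task
        print(sorted(p.name for p in shared.iterdir()))
        # ['Kernel-tests-unix-32', 'Tests-unix-32']
```

Without the subfolders, calling `prepare_vm` twice on the shared workspace would fail on the second call, which mirrors the kind of crash described above.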

Have a nice day :)

-- 
Cyril Ferlicot
https://ferlicot.fr

