Hi,

For months now, a lot of random failures on the CI have been making it hard to work.
There are different kinds of failures:
- Network problems
- Failing tests
- Incomprehensible problems

I no longer see many failures due to the network; I suppose the Inria infrastructure improved. Failing tests have been corrected over the past months and we see fewer and fewer of them. The big problem now is the incomprehensible crashes such as "The workspace was not found", "FileDoesNotExistException" or "pharo-vm/ is already present".

We just found the problem :)

During the validation of the Bootstrap, multiple test tasks are launched in parallel on OSX/Windows/Linux. Each task runs on a different Jenkins slave. But we discovered that two slaves can apparently share the same disk. Usually this does not cause any trouble, since a job is only run by one slave. But in this particular case, two slaves can be used by the same job and mess with each other's resources.

We highlighted the problem by adding logs to the CI. Now, when we launch tests, we create a file with the name of the task. Today we got a crash, and in the log we see that the same workspace contains two of those files, proving that the tasks are executed on the same disk, in the same folder:

[…]
-rw-rw-r-- 1 ci ci 0 Jun 19 16:01 Kernel-tests-unix-32
[…]
-rw-rw-r-- 1 ci ci 0 Jun 19 16:01 Tests-unix-32

As a solution, we will execute the tests inside a subfolder named after the task, which should greatly reduce the number of problems.

Have a nice day :)

-- 
Cyril Ferlicot
https://ferlicot.fr
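
For readers who want to see the idea concretely, here is a minimal sketch in Python of the marker-file check and of the per-task subfolder fix. It is only an illustration under assumptions, not the actual CI scripts: the function names `mark_task`, `task_folder` and `markers_in` are hypothetical, and a temporary directory stands in for the disk shared by two slaves.

```python
import tempfile
from pathlib import Path


def mark_task(folder: Path, task_name: str) -> Path:
    """Create an empty marker file named after the task (hypothetical helper)."""
    marker = folder / task_name
    marker.touch()
    return marker


def task_folder(workspace: Path, task_name: str) -> Path:
    """Return a subfolder of the workspace dedicated to one task, creating it if needed."""
    folder = workspace / task_name
    folder.mkdir(parents=True, exist_ok=True)
    return folder


def markers_in(folder: Path) -> list[str]:
    """List the marker files (plain files only) found directly in a folder."""
    return sorted(p.name for p in folder.iterdir() if p.is_file())


if __name__ == "__main__":
    tasks = ["Kernel-tests-unix-32", "Tests-unix-32"]

    # Before the fix: both tasks run directly in the same workspace.
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp)
        for task in tasks:
            mark_task(workspace, task)
        shared = markers_in(workspace)
        if len(shared) > 1:
            print("Workspace shared by several tasks:", shared)

    # After the fix: each task runs inside its own subfolder.
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp)
        for task in tasks:
            folder = task_folder(workspace, task)
            mark_task(folder, task)
            print(task, "->", markers_in(folder))
```

With the subfolders, two slaves sharing a disk still write to the same workspace, but each task only touches its own directory, so they no longer overwrite each other's files.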
