On 6 Mar 2017, at 16:45, Roland Haas <[email protected]> wrote:

> Present: Erik, Roland, Cory Chu, Steve, Bhavesh, 彭兆宏
> 
> We quickly discussed the failing tests, and given that the option list
> used
> (https://bitbucket.org/ianhinder/cactusjenkins/raw/d7021a52bd83448db589b2346c43441682eecabb/build.cfg)
> is not using -ffast-math or -march=native the working assumption is
> that the occurrence of the failures is due to an updated OS and
> newer compiler.
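
A check like the one described in the quoted text can be scripted. The following is a minimal sketch, assuming the option list uses plain "KEY = VALUE" lines with "#" comments; the function name and the list of suspect flags are hypothetical choices made for illustration, not part of any Cactus or simfactory tool.

```python
# Flags that relax floating-point semantics or tie the build to the host CPU.
# This list is an assumption for illustration and is not exhaustive.
SUSPECT_FLAGS = ("-ffast-math", "-march=native", "-Ofast",
                 "-funsafe-math-optimizations")


def find_suspect_flags(cfg_text):
    """Return {option_name: [suspect flags]} for each KEY = VALUE line."""
    hits = {}
    for line in cfg_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        # Split on the first "=" only, so "-march=native" stays intact.
        key, value = (part.strip() for part in line.split("=", 1))
        tokens = value.split()
        found = [flag for flag in SUSPECT_FLAGS if flag in tokens]
        if found:
            hits[key] = found
    return hits


if __name__ == "__main__":
    # Hypothetical option-list fragment, not the real build.cfg.
    sample = "CFLAGS = -g -O2 -march=native\nF90FLAGS = -g -O2  # -ffast-math\n"
    print(find_suspect_flags(sample))
```

An empty result for the real build.cfg would be consistent with the working assumption above, though it would not rule out flags added elsewhere in the build.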

I seem to remember that this problem started when we moved the system from UCD 
to NCSA.  At this time, there was no change to the OS or the compiler.  The 
only thing that should have been visible was a change in CPU.  I might be 
misremembering.

Do we have tickets for these failures?  It is hard to keep track of what has 
been discussed without them.  It would be good if someone familiar with the 
failing codes would take "ownership" of this issue, create some tickets, and 
try to come up with a plan for tracking down the cause of the failures.  Having 
constantly-failing tests in Jenkins desensitises us to problems, and is not in 
general a good situation to be in.

We have failures in three thorns:

        CT_MultiLevel
        SphericalHarmonicReconGen
        GRHydro

> Erik asked if we had a docker container for the Jenkins
> test slave, which we were not sure about (there is a docker image at
> https://bitbucket.org/ianhinder/et-jenkins-slave though it is not clear if 
> this is the one used).

I would first see if it is easy to reproduce the failure on a system which you 
already have set up, since that is the easiest.

The docker container used is:

ianhinder/et-jenkins-slave:ubuntu-16.04

A new build slave can be created, assuming an existing installation of Ubuntu 
16.04, by using the (almost trivial) scripts in the repository at

        https://bitbucket.org/ianhinder/ncsajenkins

The README gives the commands to run.  If you want to just run the container on 
an existing docker system, a simple

        docker run --name etslave ianhinder/et-jenkins-slave:ubuntu-16.04

should be sufficient.

If the problem is CPU-specific, then it will matter what system you run on.  
The build machine reports:

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 60
model name      : Intel Core Processor (Haswell)
stepping        : 1
microcode       : 0x1
cpu MHz         : 2499.996
cache size      : 4096 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm 
constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 
sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand 
hypervisor lahf_lm abm fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs            :
bogomips        : 4999.99
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

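To judge whether a CPU difference is plausible, the flags above can be compared with those of the machine used for reproduction. This is a rough sketch, assuming each input is the text of /proc/cpuinfo with the "flags" entry on a single line (it is wrapped above only by the mail client); the function names are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' entry in cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "flags":
            return set(value.split())
    return set()


def compare_flags(build_text, local_text):
    """List the flags present on only one of the two machines."""
    build, local = cpu_flags(build_text), cpu_flags(local_text)
    return {
        "only_on_build": sorted(build - local),
        "only_on_local": sorted(local - build),
    }


if __name__ == "__main__":
    # Hypothetical, shortened inputs; real use would read /proc/cpuinfo
    # on each machine and pass the two texts in.
    build = "processor\t: 0\nflags\t\t: fpu sse2 avx2 fma\n"
    local = "processor\t: 0\nflags\t\t: fpu sse2 avx2 fma avx512f\n"
    print(compare_flags(build, local))
```

Flags such as fma and avx2 are the ones most likely to matter for floating-point results, but only if the compiler is told to use them.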

I have just tried to reproduce the failure at

https://build-test.barrywardell.net/job/EinsteinToolkit/lastCompletedBuild/testReport/(root)/GRHydro/GRHydro_test_shock_weno_1procs/

on an Ubuntu machine with a Kaby Lake processor, and it gives exactly the same 
"diffs" output, and the same failure, as on the build machine.  This is with 
Ubuntu 16.10 with ubuntu.cfg from simfactory.

--
Ian Hinder
http://members.aei.mpg.de/ianhin

_______________________________________________
Users mailing list
[email protected]
http://lists.einsteintoolkit.org/mailman/listinfo/users