[Pw_forum] Looking for some guidance with failing tests

Barry Moore Mon, 15 May 2017 09:53:34 -0700

Hello All,

I recently compiled QE 6.1 and 5.4 with GCC 4.8.5, OpenMPI 2.0.2, and
MKL 2017, but some tests fail when run in parallel (they pass when run
serially). The problem persists if I remove MKL, switch to GCC 5.4.0, or
use Intel + IntelMPI (2017).


Here is the output from `make run-custom-test-parallel testdir=pw_langevin`:

Using executable: /ihome/crc/build/quantumespresso/qe-gcc-openmpi-6.1/test-suite/..//test-suite/run-pw.sh.
Test id: 110517.
Benchmark: 6.1.

pw_langevin - langevin.in: **FAILED**.
el1
    ERROR: absolute error 5.21e-02 greater than 1.00e-02. (Test: 0.7628. Benchmark: 0.8149.)
el1
    ERROR: absolute error 1.04e-02 greater than 1.00e-02. (Test: 0.7855. Benchmark: 0.7959.)
el1
    ERROR: absolute error 2.95e-02 greater than 1.00e-02. (Test: 0.7629. Benchmark: 0.7924.)
el1
    ERROR: absolute error 8.92e-02 greater than 1.00e-02. (Test: 0.7354. Benchmark: 0.6462.)
el1
    ERROR: absolute error 3.95e-02 greater than 1.00e-02. (Test: 0.7338. Benchmark: 0.6943.)
el1
    ERROR: absolute error 5.16e-02 greater than 1.00e-02. (Test: 0.7795. Benchmark: 0.7279.)
el1
    ERROR: absolute error 3.84e-01 greater than 1.00e-02. (Test: 0.7775. Benchmark: 0.3932.)
el1
    ERROR: absolute error 3.56e-01 greater than 1.00e-02. (Test: 0.7699. Benchmark: 0.4142.)
el1
    ERROR: absolute error 3.93e-02 greater than 1.00e-02. (Test: 0.7495. Benchmark: 0.7888.)
e1
    ERROR: absolute error 7.13e-04 greater than 1.00e-06. (Test: -2.414157.  Benchmark: -2.413444.)
eh1
    ERROR: absolute error 1.72e-01 greater than 1.00e-02. (Test: -10.6793. Benchmark: -10.507.)
eh1
    ERROR: absolute error 3.52e-02 greater than 1.00e-02. (Test: -10.604. Benchmark: -10.5688.)
eh1
    ERROR: absolute error 9.85e-02 greater than 1.00e-02. (Test: -10.6789. Benchmark: -10.5804.)
eh1
    ERROR: absolute error 3.14e-01 greater than 1.00e-02. (Test: -10.7728. Benchmark: -11.0869.)
eh1
    ERROR: absolute error 3.84e-01 greater than 1.00e-02. (Test: -10.7792. Benchmark: -10.3947.)
eh1
    ERROR: absolute error 1.72e-01 greater than 1.00e-02. (Test: -10.6241. Benchmark: -10.7964.)
eh1
    ERROR: absolute error 4.06e-01 greater than 1.00e-02. (Test: -10.6304. Benchmark: -10.2247.)
eh1
    ERROR: absolute error 4.19e-01 greater than 1.00e-02. (Test: -10.6557. Benchmark: -10.2366.)
eh1
    ERROR: absolute error 2.73e-01 greater than 1.00e-02. (Test: -10.7245. Benchmark: -10.4515.)

pw_langevin - langevin_smc.in: **FAILED**.
el1
    ERROR: absolute error 5.21e-02 greater than 1.00e-02. (Test: 0.7628. Benchmark: 0.8149.)
el1
    ERROR: absolute error 2.81e-02 greater than 1.00e-02. (Test: 0.6296. Benchmark: 0.6577.)
el1
    ERROR: absolute error 5.01e-02 greater than 1.00e-02. (Test: 0.8071. Benchmark: 0.757.)
el1
    ERROR: absolute error 1.89e-01 greater than 1.00e-02. (Test: 0.6871. Benchmark: 0.4977.)
el1
    ERROR: absolute error 1.45e-01 greater than 1.00e-02. (Test: 0.7734. Benchmark: 0.6287.)
el1
    ERROR: absolute error 1.96e-02 greater than 1.00e-02. (Test: 0.762. Benchmark: 0.7816.)
el1
    ERROR: absolute error 5.69e-02 greater than 1.00e-02. (Test: 0.7798. Benchmark: 0.7229.)
el1
    ERROR: absolute error 1.45e-01 greater than 1.00e-02. (Test: 0.6516. Benchmark: 0.7962.)
el1
    ERROR: absolute error 3.92e-02 greater than 1.00e-02. (Test: 0.7488. Benchmark: 0.788.)
e1
    ERROR: absolute error 6.76e-04 greater than 1.00e-06. (Test: -2.414139.  Benchmark: -2.414815.)
eh1
    ERROR: absolute error 1.72e-01 greater than 1.00e-02. (Test: -10.6793. Benchmark: -10.507.)
eh1
    ERROR: absolute error 1.58e-02 greater than 1.00e-02. (Test: -10.3584. Benchmark: -10.3742.)
eh1
    ERROR: absolute error 1.66e-01 greater than 1.00e-02. (Test: -10.5316. Benchmark: -10.6974.)
eh1
    ERROR: absolute error 6.57e-01 greater than 1.00e-02. (Test: -10.9399. Benchmark: -10.2825.)
eh1
    ERROR: absolute error 2.87e-01 greater than 1.00e-02. (Test: -10.6448. Benchmark: -10.3576.)
eh1
    ERROR: absolute error 6.69e-02 greater than 1.00e-02. (Test: -10.6823. Benchmark: -10.6154.)
eh1
    ERROR: absolute error 1.93e-01 greater than 1.00e-02. (Test: -10.6229. Benchmark: -10.8154.)
eh1
    ERROR: absolute error 1.98e-01 greater than 1.00e-02. (Test: -10.371. Benchmark: -10.5688.)
eh1
    ERROR: absolute error 1.30e-01 greater than 1.00e-02. (Test: -10.725. Benchmark: -10.595.)

All done. ERROR: only 0 out of 2 tests passed.
Failed tests in:
/ihome/crc/build/quantumespresso/qe-gcc-openmpi-6.1/test-suite/pw_langevin/

And here is the output from `make run-custom-test-serial testdir=pw_langevin`:

Using executable: /ihome/crc/build/quantumespresso/qe-gcc-openmpi-6.1/test-suite/..//test-suite/run-pw.sh.
Test id: 110517-1.
Benchmark: 6.1.

pw_langevin - langevin.in: Passed.

pw_langevin - langevin_smc.in: Passed.

All done. 2 out of 2 tests passed.

Is this expected? Any suggestions? About 30 tests fail in total. In every
case the result is close to the benchmark value, but outside the tolerance
the test allows.
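
To make "close" concrete across many failing tests, the error lines can be
tabulated by how far each one exceeds its tolerance. What follows is only a
minimal sketch in Python and is not part of the QE test-suite: the regular
expression and the function name `summarize` are assumptions for
illustration, written against the message format shown in the output above
(including the e-mail line wrap between "Test:" and "Benchmark:").

```python
import re

# Hypothetical pattern, derived only from the error lines quoted in this post.
# \s+ and \s* tolerate the line wrap that e-mail introduced inside a report.
ERROR_RE = re.compile(
    r"ERROR: absolute error (\S+) greater than (\S+)\.\s+"
    r"\(Test:\s*(-?[\d.]+?)\.\s+Benchmark:\s*(-?[\d.]+?)\.\)"
)


def summarize(log_text):
    """Return one dict per reported error, with the error/tolerance ratio."""
    rows = []
    for match in ERROR_RE.finditer(log_text):
        abs_err, tol, test, bench = (float(g) for g in match.groups())
        rows.append(
            {
                "abs_err": abs_err,
                "tol": tol,
                "test": test,
                "bench": bench,
                "ratio": abs_err / tol,  # how many tolerances off the value is
            }
        )
    return rows


# Two reports copied from the parallel run above, wrapped as in the e-mail.
SAMPLE = """el1
    ERROR: absolute error 5.21e-02 greater than 1.00e-02. (Test: 0.7628.
Benchmark: 0.8149.)
e1
    ERROR: absolute error 7.13e-04 greater than 1.00e-06. (Test:
-2.414157.  Benchmark: -2.413444.)
"""

rows = summarize(SAMPLE)
for row in rows:
    print(
        f"test={row['test']:>11} bench={row['bench']:>11} "
        f"abs_err={row['abs_err']:.2e} ratio={row['ratio']:.1f}"
    )
```

Sorting the rows by `ratio` would separate values that barely miss the
tolerance from values that are hundreds of tolerances away.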

Thanks,

Barry

P.S.

I have never run QE in my academic career. I am installing it at our center
for another user, and I don't want to release a code with failing tests.

-- 
Barry E Moore II, PhD
E-mail: [email protected]

Assistant Research Professor
Center for Simulation and Modeling
University of Pittsburgh
Pittsburgh, PA 15260
