Dear Quantum ESPRESSO Developers and Community,
I am writing to report a persistent runtime error in the GPU-accelerated
version of ph.x (Quantum ESPRESSO v7.5) when calculating electron-phonon
coefficients using the OpenACC port.
While the code successfully calculates the dynamical matrices and frequencies
on the GPU, it consistently crashes during the final electron-phonon
interaction step (routine elphon) with a file I/O error related to the
temporary file a2Fsave.
1. System and Compilation Details:
Version: Quantum ESPRESSO v7.5 (GitLab release)
Compiler: NVIDIA HPC SDK v24.9
Configuration: ./configure --enable-openacc --with-cuda=yes --with-cuda-cc=89
--with-cuda-runtime=12.6
Hardware: NVIDIA RTX 4090 (Ada Lovelace)
MPI: OpenMPI (via NVIDIA HPC SDK)
2. The Issue: When running ph.x with electron_phonon = 'interpolated' (or any
mode that triggers elphon), the execution aborts immediately after
diagonalizing the dynamical matrix for the first q-point. The crash occurs
regardless of the MPI parallelization level (reproduced with both -np 1 and -np
8).
3. Error Log: The crash is a read error raised in elphon.f90 when it tries to
read a file that appears to be empty or not yet flushed to disk.
FIO-F-217/list-directed read/unit=40/attempt to read past end of file.
File name = './out/mgb2.a2Fsave', formatted, sequential access record = 1
In source file /path/to/q-e/PHonon/PH/elphon.f90, at line number 847
File name = './out/mgb2.a2Fsave', formatted, sequential access record = 1
In source file /path/to/q-e/PHonon/PH/elphon.f90, at line number 847
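An end-of-file error at record 1 suggests the file has no readable content at the time of the read. A minimal sketch of how the state of the file could be checked right after the crash is given below; the helper name describe_file is hypothetical, and the path is simply the one printed in the error message.

```python
import os


def describe_file(path):
    """Report whether a formatted (text) file is missing, empty, or has content."""
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    # Count lines without loading the whole file into memory.
    with open(path, "r", errors="replace") as fh:
        n_lines = sum(1 for _ in fh)
    return f"has {n_lines} line(s)"


if __name__ == "__main__":
    # Path copied from the error log; adjust it to the actual outdir.
    print(describe_file("./out/mgb2.a2Fsave"))
```

A result of "missing" or "empty" after the crash would indicate that the file was never written, whereas a non-empty file would be more consistent with a read that happened before the data reached disk.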
4. Reproduction Case (MgB2): I reproduced this using a standard MgB2 test case.
Input snippet (ph.in):
&INPUTPH
tr2_ph = 1.0d-14,
prefix = 'mgb2',
outdir = './out',
fildyn = 'mgb2.dyn',
fildvscf = 'mgb2.dvscf',
electron_phonon = 'interpolated', ! <--- Triggers the crash
trans = .true.,
ldisp = .true.,
nq1=6, nq2=6, nq3=4
/
5. Observations:
Pure phonons work: If I comment out electron_phonon, the GPU run finishes
successfully and writes the .dyn and .dvscf files.
CPU Works: The exact same input runs successfully on the CPU-only binary
(gfortran compilation).
File Incompatibility: I attempted to run the heavy phonon calculation on the
GPU and the final electron-phonon collection on the CPU (using recover=.true.
or trans=.false.), but the CPU binary cannot read the GPU-generated
.dvscf/binary files ("problems reading u" error), likely due to binary
format/padding differences between nvfortran and gfortran.
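One way the format hypothesis could be tested is to produce the same file with both binaries and compare the two copies at the byte level. The sketch below is only an illustration: the function name compare_binary_files is hypothetical, and it makes no assumption about the internal layout of the files.

```python
import os


def compare_binary_files(path_a, path_b, chunk_size=1 << 20):
    """Return (size_a, size_b, first_diff_offset).

    first_diff_offset is None when the two files agree over their common length.
    """
    size_a = os.path.getsize(path_a)
    size_b = os.path.getsize(path_b)
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a or not b:
                # Reached the end of the shorter file without a mismatch.
                return size_a, size_b, None
            n = min(len(a), len(b))
            if a[:n] != b[:n]:
                # Locate the first differing byte inside this chunk.
                i = next(k for k in range(n) if a[k] != b[k])
                return size_a, size_b, offset + i
            offset += n
```

Different sizes would be consistent with a record-length or padding difference, while equal sizes with a late first difference would more likely reflect ordinary numerical differences between the two builds.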
It appears there is a race condition or file handling issue in the OpenACC
implementation of the elphon routine where the a2Fsave file is read before it
is successfully written/closed.
Any advice on a workaround or a patch for elphon.f90 to stabilize the GPU I/O
would be greatly appreciated.
Thank you for your time and for developing this software.
Best regards,
Dholon Kumar Paul
Research Assistant, BRAC University, Bangladesh