Dear Quantum ESPRESSO users,

I am trying to wannierize the electronic structure of ZnMgHf. The structure is large: it contains 174 atoms, and I calculate 1400 spin-unpolarized bands. I run the calculations on a computing cluster.

The problem occurs when I run pw2wannier90.x. The job is terminated, and the joberr file states that the system is out of memory. I even tried 64 nodes with 180 GB of RAM each and still receive this message. Is there a proper way to run pw2wannier90.x on a cluster?

I run the program with the command

    mpirun -np 3072 pw2wannier90.x -in pw2wan_ZnMgHf_m1.in > pw2wan_ZnMgHf_m1.out

where pw2wan_ZnMgHf_m1.in is the input file. Details of the Slurm commands are in the run_qe_wannier.sh file.

I had no problems with the scf and nscf calculations and never needed such a high number of nodes for them. I usually use just 4 nodes for this system, and the amount of memory is sufficient.
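For reference, the per-rank memory budget implied by these numbers can be estimated with a few lines of Python. This is only a rough sketch: it assumes the 3072 ranks are spread evenly over the 64 nodes and that the full 180 GB per node is available to the job, and neither assumption is confirmed by the job script. The function name memory_per_rank_gb is made up for illustration.

```python
# Rough per-rank memory budget for the pw2wannier90.x run described above.
# Assumptions (not confirmed by the job script): ranks are spread evenly
# over the nodes and the full 180 GB per node is available to the job.

def memory_per_rank_gb(nodes: int, mem_per_node_gb: float, total_ranks: int) -> float:
    """Return the memory available to each MPI rank, in GB."""
    ranks_per_node = total_ranks / nodes
    return mem_per_node_gb / ranks_per_node

nodes = 64             # from the post
mem_per_node_gb = 180  # from the post
total_ranks = 3072     # from "mpirun -np 3072"

ranks_per_node = total_ranks // nodes
budget = memory_per_rank_gb(nodes, mem_per_node_gb, total_ranks)
print(f"{ranks_per_node} ranks per node, {budget:.2f} GB per rank")
```

Under these assumptions each node hosts 48 ranks with 3.75 GB apiece, so any array that is held in full by every rank, rather than distributed, would have to fit within that figure.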
I include the .win and pw2wannier90 input files and the run file for the cluster. The output files are also included. I used Quantum ESPRESSO 6.7 and Wannier90 3.1.0.

I would appreciate any help.

Best regards,
Ireneusz Buganski
AGH University of Science and Technology, Krakow, Poland
gcccore/10.3.0 loaded.
zlib/1.2.11-gcccore-10.3.0 loaded.
binutils/2.36.1-gcccore-10.3.0 loaded.
intel-compilers/2021.2.0 loaded.
numactl/2.0.14-gcccore-10.3.0 loaded.
ucx/1.10.0-gcccore-10.3.0 loaded.
impi/2021.2.0-intel-compilers-2021.2.0 loaded.
iimpi/2021a loaded.
imkl/2021.2.0-iimpi-2021a loaded.
intel/2021a loaded.
szip/2.1.1-gcccore-10.3.0 loaded.
hdf5/1.10.7-iimpi-2021a loaded.
elpa/2021.05.001-intel-2021a loaded.
libxc/5.1.5-intel-compilers-2021.2.0 loaded.
quantumespresso/6.7-intel-2021a loaded.
intel/2021a unloaded.
gcccore/10.3.0 unloaded.
gcccore/11.2.0 loaded.
zlib/1.2.11-gcccore-10.3.0 unloaded.
binutils/2.36.1-gcccore-10.3.0 unloaded.
zlib/1.2.11-gcccore-11.2.0 loaded.
binutils/2.37-gcccore-11.2.0 loaded.
intel-compilers/2021.2.0 unloaded.
intel-compilers/2021.4.0 loaded.
impi/2021.2.0-intel-compilers-2021.2.0 unloaded.
ucx/1.10.0-gcccore-10.3.0 unloaded.
numactl/2.0.14-gcccore-10.3.0 unloaded.
numactl/2.0.14-gcccore-11.2.0 loaded.
ucx/1.11.2-gcccore-11.2.0 loaded.
impi/2021.4.0-intel-compilers-2021.4.0 loaded.
imkl/2021.2.0-iimpi-2021a unloaded.
imkl/2021.4.0 loaded.
iimpi/2021a unloaded.
iimpi/2021b loaded.
imkl-fftw/2021.4.0-iimpi-2021b loaded.
intel/2021b loaded.
wannier90/3.1.0-intel-2021b loaded.

The following have been reloaded with a version change:
  1) binutils/2.36.1-gcccore-10.3.0 => binutils/2.37-gcccore-11.2.0
  2) gcccore/10.3.0 => gcccore/11.2.0
  3) iimpi/2021a => iimpi/2021b
  4) imkl/2021.2.0-iimpi-2021a => imkl/2021.4.0
  5) impi/2021.2.0-intel-compilers-2021.2.0 => impi/2021.4.0-intel-compilers-2021.4.0
  6) intel-compilers/2021.2.0 => intel-compilers/2021.4.0
  7) intel/2021a => intel/2021b
  8) numactl/2.0.14-gcccore-10.3.0 => numactl/2.0.14-gcccore-11.2.0
  9) ucx/1.10.0-gcccore-10.3.0 => ucx/1.11.2-gcccore-11.2.0
  10) zlib/1.2.11-gcccore-10.3.0 => zlib/1.2.11-gcccore-11.2.0

slurmstepd: error: Detected 1 oom_kill event in StepId=9760000.9. Some of the step tasks have been OOM Killed.
srun: error: ac0519: task 0: Out Of Memory
slurmstepd: error: Detected 1 oom_kill event in StepId=9760000.6. Some of the step tasks have been OOM Killed.
srun: error: ac0074: task 1: Out Of Memory
[proxy:0:40@ac0517] main (../../../../../src/pm/i_hydra/proxy/proxy.c:1189): assert (proxy_params.immediate.proxy.pid_hash == NULL) failed
srun: error: ac0517: task 10: Exited with exit code 5
[proxy:0:4@ac0043] main (../../../../../src/pm/i_hydra/proxy/proxy.c:1189): assert (proxy_params.immediate.proxy.pid_hash == NULL) failed
srun: error: ac0043: task 1: Exited with exit code 5
srun: error: ac0662: task 14: Broken pipe
[mpiexec@ac0001] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:360): write error (Bad file descriptor)
[mpiexec@ac0001] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:360): write error (Bad file descriptor)
[mpiexec@ac0001] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:360): write error (Bad file descriptor)
[mpiexec@ac0001] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:360): write error (Bad file descriptor)
[mpiexec@ac0001] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:360): write error (Bad file descriptor)
[mpiexec@ac0001] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:360): write error (Bad file descriptor)
[mpiexec@ac0001] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:360): write error (Bad file descriptor)
srun: error: ac0705: task 15: Broken pipe
[mpiexec@ac0001] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:360): write error (Bad file descriptor)
srun: error: ac0620: task 0: Out Of Memory
srun: error: ac0599: task 12: Exited with exit code 5
slurmstepd: error: Detected 1 oom_kill event in StepId=9760000.5. Some of the step tasks have been OOM Killed.
[proxy:0:48@ac0599] main (../../../../../src/pm/i_hydra/proxy/proxy.c:1189): assert (proxy_params.immediate.proxy.pid_hash == NULL) failed
[mpiexec@ac0001] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:360): write error (Bad file descriptor)
srun: error: ac0416: task 5: Out Of Memory
slurmstepd: error: Detected 1 oom_kill event in StepId=9760000.0. Some of the step tasks have been OOM Killed.
[mpiexec@ac0001] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:360): write error (Bad file descriptor)
MPI startup(): PMI server not found. Please set I_MPI_PMI_LIBRARY variable if it is not a singleton case.
Attachments:
pw2wan_ZnMgHf_m1.in
pw2wan_ZnMgHf_m1.out
run_qe_wannier.sh
ZnMgHf_m1.win
ZnMgHf_m1.wout
_______________________________________________
The Quantum ESPRESSO community stands by the Ukrainian people and expresses its concerns about the devastating effects that the Russian military offensive has on their country and on the free and peaceful scientific, cultural, and economic cooperation amongst peoples
_______________________________________________
Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
users mailing list users@lists.quantum-espresso.org
https://lists.quantum-espresso.org/mailman/listinfo/users