Thanks Nick! Good to hear this is a known problem and not something on my end.
For the record, I tried D&C, expert and MRRR solvers, and they all had this
problem.
At the moment, intel 2021 is the only version for which the compilers, MPI and
MKL all worked for siesta on my cluster, so I am stuck with it for now. We will
have a big upgrade including intel 23 in the coming months. I will also build
siesta with the ELPA solver going forward to be safe (or finally move to
compiling with ESL). I'm working on recompiling my current version with ELPA,
but I'm having some problems compiling ELPA itself. For now my not so elegant
solution was to lower the k-points and energy cutoff and increase the number of
cores to reduce memory-related crashes, and resubmit the jobs if they crash.
It's working ok for now until I get the long term solutions working.
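
The resubmit-on-crash workaround could be automated along the following lines.
This is only a minimal sketch: `run_with_retries` and the placeholder command
are hypothetical names introduced for illustration, not part of SIESTA or
Slurm, and the real job command would need to be substituted.

```python
import subprocess
import sys


def run_with_retries(cmd, max_attempts=3):
    """Run cmd until it exits with status 0.

    Returns the number of attempts used, or raises RuntimeError if
    every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        print(
            f"attempt {attempt} failed with exit code {result.returncode}",
            file=sys.stderr,
        )
    raise RuntimeError(f"command failed after {max_attempts} attempts")


if __name__ == "__main__":
    # Placeholder command (assumption): replace with the real job launch line.
    placeholder = [sys.executable, "-c", "print('job would run here')"]
    print("succeeded on attempt", run_with_retries(placeholder))
```

Capping the number of attempts keeps a job that always fails from being
resubmitted indefinitely.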
Danny
From: siesta-l-requ...@uam.es on behalf of Nick Papior
Sent: 27 February 2023 03:07
To: siesta-l@uam.es
Subject: Re: [SIESTA-L] Memory problems with SOC
I think that this is a memory leak in MKL, could you try a later revision?
Please see details here:
https://gitlab.com/siesta-project/siesta/-/issues/29
So I think it can easily be mitigated. :)
On Sat, 25 Feb 2023 at 22:16, Daniel Bennett <db...@cantab.ac.uk> wrote:
Hi all,
I'm running some calculations with SOC for a slab-like system with ~100 atoms,
~1300 orbitals, and am having some issues running out of memory. I ran the same
calculations with no SOC and things went fine.
I'm using the PSML version with intel/mkl 2021.2.0. The calculations are
running on 48 cores with just under 4 GB of memory per CPU. The calculations
run and eventually crash, either in the SCF loop or when computing the bands.
From my submission script ("srun: error: holy7c02611: task 5: Killed") it looks
like one of the cores gets stopped, and then the calculation hangs. I tried
setting ulimit -s unlimited and ulimit -m unlimited, but that didn't help. I
also decreased the mesh cutoff from 600 Ry to 400 Ry and the k-points from
12x12x1 to 8x8x1, but the calculations still run out of memory eventually.
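
As a rough sanity check on the numbers above, the size of one dense matrix can
be estimated. This is a back-of-envelope sketch, not a model of SIESTA's actual
memory usage: it assumes dense complex double-precision storage (16 bytes per
element) and a spinor basis of dimension twice the orbital count with SOC. The
function names are hypothetical, and workspace arrays, sparse matrices and grid
data are not counted.

```python
BYTES_PER_COMPLEX128 = 16  # one double-precision complex number


def dense_matrix_bytes(n_orbitals, spin_orbit):
    """Bytes for one dense complex square matrix.

    Assumption: with spin-orbit coupling the matrix dimension is
    2 * n_orbitals (spinor basis); without it, n_orbitals.
    """
    dim = 2 * n_orbitals if spin_orbit else n_orbitals
    return dim * dim * BYTES_PER_COMPLEX128


def per_process_mib(n_orbitals, spin_orbit, n_procs):
    """Even share of one matrix per process, in MiB (idealised distribution)."""
    return dense_matrix_bytes(n_orbitals, spin_orbit) / n_procs / 2**20


if __name__ == "__main__":
    for soc in (False, True):
        total_mib = dense_matrix_bytes(1300, soc) / 2**20
        share_mib = per_process_mib(1300, soc, 48)
        print(f"SOC={soc}: {total_mib:.1f} MiB total, {share_mib:.2f} MiB per process")
```

Under these assumptions one SOC matrix for ~1300 orbitals is roughly 100 MiB in
total, four times the non-SOC size. Even with several such matrices held at
once, matrix size alone would not obviously exhaust just under 4 GB per core.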
Does anybody have any general advice for getting larger calculations with SOC
to run without running out of memory? I could try increasing the number of
cores, but I wanted to see if anybody had some advice first because the wait
time is a lot longer for a larger number of cores, which makes it take a long
time to troubleshoot. I did try going up to 96 cores (2 nodes), but it still
crashes.
I'm not sure if I can send attachments to the mailing list, but if not I can
send inputs / outputs privately.
Thanks,
Daniel Bennett
--
SIESTA is supported by the Spanish Research Agency (AEI) and by the European
H2020 MaX Centre of Excellence (http://www.max-centre.eu/)
--
Kind regards Nick