Thanks Nick! Good to hear this is a known problem and not something on my end. 
For the record, I tried D&C, expert and MRRR solvers, and they all had this 
problem.

At the moment, intel 2021 is the only version for which the compilers, MPI and 
MKL all worked for siesta on my cluster, so I am stuck with it for now. We will 
have a big upgrade including intel 23 in the coming months. I will also build 
siesta with the ELPA solver going forward to be safe (or finally move to 
compiling with ESL). I'm working on recompiling my current version with ELPA, 
but am having some problems compiling ELPA. For now my not-so-elegant solution 
was to lower the k-points and energy cutoff and increase the number of cores to 
reduce memory-related crashes, and resubmit the jobs if they crash. It's working 
OK for now, until I get the long-term solutions working.
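
A minimal sketch of a resubmit-on-crash loop like the one described above. 
This is an illustration under stated assumptions, not the script actually used 
here: the function name run_with_retries and its parameters are hypothetical, 
and the launch command is whatever the job script normally runs. Since a killed 
task can leave the calculation hanging rather than exiting, the sketch also 
accepts an optional timeout.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, max_attempts=3, wait_seconds=0.0, timeout_seconds=None):
    """Run cmd until it exits with status 0 or max_attempts is reached.

    Returns the number of attempts used on success, or None if every
    attempt failed or timed out. All names here are hypothetical.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # subprocess.run kills the child process if the timeout expires.
            returncode = subprocess.run(cmd, timeout=timeout_seconds).returncode
        except subprocess.TimeoutExpired:
            returncode = None  # treat a hang as a failed attempt
        if returncode == 0:
            return attempt
        print(f"attempt {attempt} failed (return code: {returncode})", file=sys.stderr)
        if attempt < max_attempts:
            time.sleep(wait_seconds)
    return None


# Usage (hypothetical launch line, adapt to the actual job script):
# run_with_retries(["srun", "siesta", "input.fdf"], max_attempts=5, wait_seconds=60)
```

Retrying blindly only hides the problem, so the attempt count is capped and each 
failure is logged to stderr.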

Danny
________________________________
From: siesta-l-requ...@uam.es <siesta-l-requ...@uam.es> on behalf of Nick 
Papior <nickpap...@gmail.com>
Sent: 27 February 2023 03:07
To: siesta-l@uam.es <siesta-l@uam.es>
Subject: Re: [SIESTA-L] Memory problems with SOC

I think that this is a memory leak in MKL, could you try a later revision?
Please see details here: 
https://gitlab.com/siesta-project/siesta/-/issues/29

So I think it can easily be mitigated. :)


On Sat, 25 Feb 2023 at 22:16, Daniel Bennett <db...@cantab.ac.uk> wrote:
Hi all,

I'm running some calculations with SOC for a slab-like system with ~100 atoms, 
~1300 orbitals, and am having some issues running out of memory. I ran the same 
calculations with no SOC and things went fine.

I'm using the PSML version with intel/mkl 2021.2.0. The calculations are 
running on 48 cores with just under 4 GB of memory per CPU. The calculations run 
and eventually crash, either in the SCF loop or when computing the bands. From 
the output of my submission script ("srun: error: holy7c02611: task 5: Killed") 
it looks like one of the tasks gets killed, and then the calculation hangs. I 
tried setting ulimit -s unlimited and ulimit -m unlimited, but that didn't help. 
I also decreased the mesh cutoff from 600 Ry to 400 Ry and the k-point grid from 
12x12x1 to 8x8x1, but the calculations still run out of memory eventually.

Does anybody have any general advice for getting larger calculations with SOC 
to run without running out of memory? I could try increasing the number of 
cores, but I wanted to see if anybody had some advice first because the wait 
time is a lot longer for a larger number of cores, which makes it take a long 
time to troubleshoot. I did try going up to 96 cores (2 nodes), but it still 
crashes.

I'm not sure if I can send attachments to the mailing list, but if not I can 
send inputs / outputs privately.

Thanks,

Daniel Bennett



--
SIESTA is supported by the Spanish Research Agency (AEI) and by the European 
H2020 MaX Centre of Excellence (http://www.max-centre.eu/)


--
Kind regards Nick