Dear Miro,
It is hard to give you a meaningful answer with so little information, but I
will give my best guess because I needed to set this up recently. I assume
that you want to use k-parallel and you don't have MPI.
A serial job automatically runs on a single node. A single node is
a physical computer with a physical CPU, but it typically has 4 memory
channels, so it can run 8 jobs in parallel.
With k-parallel you need to define the nodes on which the k-points are
calculated. With slurm, things may work if you create 8 "localhost"
lines in the .machines file, because the job will still run on the
single node that is assigned automatically. But things probably won't
work if you create lines such as "node001", "node002", etc. (depending on
the names of the nodes in your cluster). And to take advantage of the
cluster you need to use as many nodes as possible.
Now the problem is that k-parallel assumes you can ssh to every
node without a password. This is typically forbidden in a slurm
environment. Prof. Blaha provides workarounds, but to me their
implementation seems complicated (I am not an expert):
http://www.wien2k.at/reg_user/faq/pbs.html
I am using an older cluster where it is possible to allocate nodes, and
with this allocation comes automatically passwordless ssh to these
nodes. Then the slurm workarounds are not needed. Maybe you can ask
your administrator whether this is possible on your cluster, because I
think it is typically blocked.
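Before asking the administrator, you could test from inside an allocation
whether passwordless ssh already works. The following is only a sketch
under my own assumptions: the function names are hypothetical, and it
relies on the standard OpenSSH options BatchMode and ConnectTimeout so
that ssh fails instead of prompting for a password.

```python
import subprocess


def ssh_check_command(host, timeout_s=5):
    """Build an ssh command that runs 'true' on host without ever prompting."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                 # never ask for a password
        "-o", f"ConnectTimeout={timeout_s}",   # give up quickly on unreachable hosts
        host,
        "true",
    ]


def can_ssh(host, timeout_s=5):
    """Return True if passwordless ssh to host succeeds, False otherwise."""
    try:
        result = subprocess.run(
            ssh_check_command(host, timeout_s),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 5,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


# Example use inside a job script (node names are placeholders):
# for node in ["node001", "node002"]:
#     print(node, "ok" if can_ssh(node) else "blocked")
```

If this reports "blocked" for the nodes of your own allocation, the
k-parallel scripts will most likely fail in the same way.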
Best,
Lukasz
On 2023-06-20 10:18, Ilias Miroslav, doc. RNDr., PhD. wrote:
Hello,
I am able to run serial SCF via SLURM
https://github.com/miroi/open-collection/blob/master/theoretical_chemistry/software/wien2k/runs/LvO2_on_small_quartz/wien2k/LvO2onQg/virgo_slurm_wien2kgnupar_fromdstart.01
but when trying parallel
https://github.com/miroi/open-collection/blob/master/theoretical_chemistry/software/wien2k/runs/LvO2_on_small_quartz/wien2k/LvO2onQg/virgo_slurm_wien2kgnupar_fromdstart.02
I get lapw2.error
'LAPW2' - can't open unit: 30
'LAPW2' - filename: LvO2onQg.energy_1
** testerror: Error in Parallel LAPW2
The file "LvO2onQg.energy" is correct in serial mode.
It seems that the LvO2onQg.energy_1 file is not produced in the parallel run?
All files are at
https://github.com/miroi/open-collection/tree/master/theoretical_chemistry/software/wien2k/runs/LvO2_on_small_quartz/wien2k/LvO2onQg
Best,
Miro
_______________________________________________
Wien mailing list
[email protected]
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:
http://www.mail-archive.com/[email protected]/index.html