qtlpara cannot yet use parallel vectors stored in scratch directories on
different nodes; for now it requires that all vector files be directly
accessible.
Both x lapw2 -p -qtl and x qtl -p actually run in single mode; the -p
only directs them to read the .processes file and to use all the
parallel vectors (case.vector_1, case.vector_2, ...).
When a local SCRATCH directory is used, the vectors are stored there and
are ONLY ACCESSIBLE on the corresponding node. Thus it works on a
single node (all parallel vector files are accessible on that node), but
not on 2 or more nodes.
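
One way to see whether a given node can reach every parallel vector file is a small readability check. The sketch below is written in bash (not the csh used by the WIEN2k scripts). missing_vectors is a hypothetical helper, not part of WIEN2k. It assumes the caller already knows how many parallel vector files were written and that they follow the case.vector_1, case.vector_2, ... naming, with an optional up/dn spin suffix.

```shell
#!/usr/bin/env bash
# Hypothetical helper (NOT part of WIEN2k): print every parallel vector file
# that is not readable from the node this runs on.
#   $1 = directory holding the vectors (e.g. the value of $SCRATCH)
#   $2 = case name
#   $3 = number of parallel vector files expected (assumed known by the caller)
#   $4 = optional spin suffix: "up", "dn", or empty
missing_vectors() {
    local dir=$1 case_name=$2 count=$3 updn=${4:-}
    local i file
    for ((i = 1; i <= count; i++)); do
        file="$dir/$case_name.vector${updn}_$i"
        # -r succeeds only if the file exists and is readable on this node
        [ -r "$file" ] || echo "$file"
    done
}
```

Run on the node that executes x qtl -p, a call such as `missing_vectors "$SCRATCH" VO2 2 dn` would list the VO2.vectordn_N files that node cannot read; empty output means all of them are accessible.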
lapw2para can overcome this limitation because it contains a
vec2old_lapw line, which uses scp to copy all vector files from the
different nodes to the local machine:

qtl:
    echo "calculating QTL's from parallel vectors"
    vec2old_lapw -p -local $so -$updn # <-------
    $exe $def.def $maxproc

In qtlpara, this line is missing:

    echo "calculating QTL's from parallel vectors"
    $exe $def.def $maxproc

Please insert the vec2old_lapw line into qtlpara just before the $exe line.
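
Before editing, it may be worth checking whether the installed script already contains the call. A minimal bash sketch follows; calls_vec2old is a hypothetical helper, and the script path used in the usage note ($WIENROOT/qtlpara_lapw) is an assumption taken from the grep commands quoted further down in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper (NOT part of WIEN2k): succeed if the given script
# contains a vec2old_lapw call on a line that is not a comment.
#   $1 = path to the script to inspect
calls_vec2old() {
    # Drop comment-only lines first, then look for the command name.
    grep -v '^[[:space:]]*#' "$1" | grep -q 'vec2old_lapw'
}
```

For example, `calls_vec2old "$WIENROOT/qtlpara_lapw" || echo "vec2old_lapw line is missing"` would report a script that still needs the change. The check only looks for the command name, so it does not verify that the line sits directly before the $exe line.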
On 24.10.2020 at 22:30, Christian Søndergaard Pedersen wrote:
Hello Gavin
Thanks for your reply, and apologies for my tardiness.
[1] All my calculations are run in MPI-parallel on our HPC cluster. I
cannot execute any 'x lapw[0,1,2] -p' command in the terminal (on the
cluster login node); this results in 'pbsssh: command not found'.
However, submitting via the SLURM workload manager works fine. In all my
submit scripts, I specify 'setenv SCRATCH /scratch/$USER', which is the
proper location of scratch storage on our HPC cluster.
[2] Without having tried your example for diamond, I can report that
'run_lapw -p' followed by 'x qtl -p -telnes' works without problems for
a single cell of Vanadium dioxide. However, for other systems I get the
error I specified. The other systems (1) are larger, and (2) use two
CPUs instead of a single CPU (the .machines files are modified accordingly).
Checking the qtl.def file for the calculation that _did_ work, I can see
that the line specifying '/scratch/chrsop/VO2.vectordn' is _also_
present here, so this is not to blame. This leaves me baffled as to what
the error can be - as far as I can tell, I am trying to perform the
exact same calculation for different systems. I thought maybe
insufficient scratch storage could be to blame, but this would most
likely show up in the 'run_lapw' cycles (I believe).
[3] I am posting here the difference between qtlpara and lapw2para:
$ grep "single" $WIENROOT/qtlpara_lapw
testinput .processes single
$ grep "single" $WIENROOT/lapw2para_lapw
testinput .processes single
single:
echo "running in single mode"
... if this is wrong, I kindly request advice on how to fix it, so I can
pass it on to our software maintenance guy. If there's anything else I
can try please let me know.
Best regards
Christian
------------------------------------------------------------------------
*From:* Wien <wien-boun...@zeus.theochem.tuwien.ac.at> on behalf of Gavin
Abo <gs...@crimson.ua.edu>
*Sent:* 21 October 2020 07:02:01
*To:* wien@zeus.theochem.tuwien.ac.at
*Subject:* Re: [Wien] qtl: error reading parallel vectors
I'm not sure about the physics of the following WIEN2k 19.2 parallel
calculation (with all patches at [1] applied), but mechanically the "x
qtl -p -telnes" seems to have run without error.
I typically have SCRATCH in my .bashrc set to "./" but used another
location "/home/username/wiendata/scratch" as seen below. Does a simple
k-point parallel calculation like the one below work on your system? I
haven't tried mpi parallel yet. On the other hand, I have noticed a
possible issue: if one forgets to set up a .machines file and tries
to run a parallel calculation, qtlpara_lapw seems to fail to switch
over to the serial calculation mode, as shown under [2] below. If one
compares, for example, lapw2para_lapw and qtlpara_lapw, as illustrated by
[3] below, qtlpara_lapw may be missing some additional code that
could be needed to get that to work.
username@computername:~/wiendata/diamond$ grep SCRATCH ~/.bashrc
export SCRATCH=/home/username/wiendata/scratch
username@computername:~/wiendata/diamond$ ls
diamond.struct
username@computername:~/wiendata/diamond$ init_lapw -b
...
init_lapw finished ok
username@computername:~/wiendata/diamond$ cat .machines
1:localhost
1:localhost
granularity:1
extrafine:1
username@computername:~/wiendata/diamond$ run_lapw -p
...
in cycle 11 ETEST: .0001457550000000 CTEST: .0033029
hup: Command not found.
STOP LAPW0 END
STOP LAPW1 END
STOP LAPW1 END
STOP LAPW2 - FERMI; weights written
STOP LAPW2 END
STOP LAPW2 END
STOP SUMPARA END
STOP CORE END
STOP MIXER END
ec cc and fc_conv 1 1 1
> stop
username@computername:~/wiendata/diamond$ cp $WIENROOT/SRC_templates/case.innes diamond.innes
username@computername:~/wiendata/diamond$ x qtl -p -telnes
running QTL in parallel mode
calculating QTL's from parallel vectors
STOP QTL END
6.4u 0.1s 0:06.59 100.0% 0+0k 0+8024io 0pf+0w
username@computername:~/wiendata/diamond$ cat diamond.inq
0 2.20000000000000000000
1
1 99 1 0
4 0 1 2 3
username@computername:~/wiendata/diamond$ x telnes3
STOP TELNES3 DONE
3.3u 0.0s 0:03.39 99.7% 0+0k 0+96io 0pf+0w
[1] https://github.com/gsabo/WIEN2k-Patches/tree/master/19.2
[2] Error when qtlpara_lapw tries to switch to single mode during "x qtl
-p -telnes":
username@computername:~/wiendata/diamond$ cat .machine
cat: .machine: No such file or directory
username@computername:~/wiendata/diamond$ run_lapw -p
...
in cycle 11 ETEST: .0001457550000000 CTEST: .0033029
hup: Command not found.
STOP LAPW0 END
STOP LAPW1 END
STOP LAPW2 END
STOP CORE END
STOP MIXER END
ec cc and fc_conv 1 1 1
> stop
username@computername:~/wiendata/diamond$ cp $WIENROOT/SRC_templates/case.innes diamond.innes
username@computername:~/wiendata/diamond$ x qtl -p -telnes
single: label not found.
0.0u 0.0s 0:00.01 0.0% 0+0k 0+0io 0pf+0w
error: command /home/username/WIEN2k/qtlpara qtl.def failed
[3] Grep difference between qtlpara_lapw and lapw2para_lapw:
username@computername:~/wiendata/diamond$ grep "single" $WIENROOT/qtlpara_lapw
testinput .processes single
username@computername:~/wiendata/diamond$ grep "single" $WIENROOT/lapw2para_lapw
testinput .processes single
single:
echo "running in single mode"
On 10/20/2020 12:24 PM, Christian Søndergaard Pedersen wrote:
_______________________________________________
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
--
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
Email: bl...@theochem.tuwien.ac.at WIEN2k: http://www.wien2k.at
WWW: http://www.imc.tuwien.ac.at/tc_blaha
--------------------------------------------------------------------------