Re: [Wien] .machines for several nodes

Peter Blaha Mon, 12 Oct 2020 02:58:47 -0700

Yes, this is OK when you have nodes with 16 cores!

(Only the lapw0 line could use :16 instead of :8 if you have 96 atoms, but most likely the difference is negligible.)

Yes, the QTL calculation in lapw2 is also affected by the parallelization, but it reads from a .processes file, which is created by lapw1.

If you run x lapw2 -p -qtl in a separate job, you should add the following line to create a "correct" .processes file:

x lapw1 -p -d >&/dev/null # Create .processes (necessary for standalone lapw2)


On 10/12/20 11:45 AM, Christian Søndergaard Pedersen wrote:

This went a long way towards clearing up my confusion, thanks again. I will try starting an MPI-parallel calculation on 4 nodes with 16 cores each using the following .machines file:
1:g008:16
1:g021:16
1:g025:16
1:g028:16
lapw0: g008:8 g021:8 g025:8 g028:8
dstart: g008:8 g021:8 g025:8 g028:8
... and see how it performs. If the matrix sizes are small, I understand that I could also have each node work on 2 (or more) k-points at the same time, by specifying:
1:g008:8
1:g008:8
1:g021:8
1:g021:8
1:g025:8
1:g025:8
1:g028:8
1:g028:8
so that, for instance, g008 will work on 2 k-points using 8 cores for each k-point, am I right? And a (hopefully) final question: since qtl, according to the manual, runs in k-point parallel, is it also affected by the parallelization scheme specified for lapw1 and lapw2 (unless I deliberately change it)?
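The two layouts above differ only in how many "1:" lines each node gets and how many cores each line carries. A minimal sketch of a helper that prints either variant follows; make_machines and its argument order are illustrative names, not part of WIEN2k, and the sketch assumes that the cores per node divide evenly by the number of k-point jobs per node.

```shell
#!/usr/bin/env bash
# Sketch: print a .machines file for a list of nodes.
# make_machines and its arguments are hypothetical, not a WIEN2k tool.
#   $1 = cores per node
#   $2 = simultaneous k-point jobs per node
#   $3 = cores per node used for lapw0/dstart
#   remaining arguments = node names
make_machines() {
    local cores_per_node=$1 jobs_per_node=$2 atom_cores=$3
    shift 3
    local cores_per_job=$(( cores_per_node / jobs_per_node ))
    local node job atom_hosts=""

    # One "1:" line per k-point job that should run at the same time.
    for node in "$@"; do
        for (( job = 0; job < jobs_per_node; job++ )); do
            echo "1:${node}:${cores_per_job}"
        done
    done

    # lapw0 and dstart list every node once on a single line.
    for node in "$@"; do
        atom_hosts+=" ${node}:${atom_cores}"
    done
    echo "lapw0:${atom_hosts}"
    echo "dstart:${atom_hosts}"
}

# Two k-points at a time per 16-core node, 8 cores per k-point.
make_machines 16 2 8 g008 g021 g025 g028
```

Calling it with 1 instead of 2 as the second argument gives the first layout (one "1:gxxx:16" line per node). On a SLURM cluster the node names could come from `scontrol show hostnames "$SLURM_JOB_NODELIST"`, assuming that variable is set in the job environment.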
------------------------------------------------------------------------
*From:* Wien <[email protected]> on behalf of Ruh, Thomas <[email protected]>
*Sent:* 12 October 2020 10:59:09
*To:* A Mailing list for WIEN2k users
*Subject:* Re: [Wien] .machines for several nodes

I am afraid there is still some confusion.


First about /lapw1/:
Sorry for my unclear statement. I meant that you need one line per k-parallel job, in the sense that as many k-points run simultaneously as there are lines, i.e. if you specify this part of the .machines file like this:
1:g008:16
1:g021:16
1:g025:16
1:g028:16
your k-point list will be split into 4 parts of 56 k-points each [1], which will be processed step by step. Node g008 will work on its first k-point, while node g021 will do the same for its first k-point, and so on.
You need the ":16" after the name of the node. Otherwise, only *one* core would be used on every node. Whether it is useful to run 16 MPI-parallel jobs per k-point (meaning that the matrices will be distributed over 16 cores, with each core getting only 1/16 of the matrix elements) depends on your matrix sizes (which in turn depend on your rkmax). You should check that by grepping :rkm in your case.scf file. If the matrix size there is small, using OMP_NUM_THREADS 16 might be much faster (since MPI adds overhead to your calculation).
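The grep mentioned above can be wrapped so that only the most recent entry is shown. This is a small sketch; last_rkm is an illustrative helper name, and the search is case-insensitive because the exact capitalization of the label is not assumed here.

```shell
#!/usr/bin/env bash
# Sketch: print the last line containing ":rkm" (any case) in an scf file.
# last_rkm is a hypothetical helper; $1 is the path to the scf file.
last_rkm() {
    grep -i ':rkm' "$1" | tail -n 1
}
```

It would be called as `last_rkm case.scf` in the case directory, with case.scf replaced by the actual file name.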
Regarding /lapw0/ and /dstart/:
The way you set the calculation up could lead to (possibly severe) overloading of your nodes: WIEN2k will start 24 jobs on each node (so 1.5 times the number of cores) at the same time, each doing the calculation for 1 atom.
As one possible alternative, you could specify only 8 cores per node (for example "lapw0: g008:8" and so on), i.e. 8 jobs per node, which would lead to step-by-step calculations for 3 atoms per core.
Which option is faster is hard to tell and depends a lot on your hardware.
So what you could do, in principle, is to test multiple configurations in the first cycles (you can modify your .machines file on the fly during an SCF run), compare the times (in case.dayfile), and use the faster one for the rest of the run.
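The arithmetic behind the two lapw0/dstart options can be checked with a few lines of shell. This is only a sketch: atom_load is an illustrative helper, and it assumes that every job occupies exactly one core.

```shell
#!/usr/bin/env bash
# Sketch: jobs per node, atoms per job and node load for an atom-parallel step.
# atom_load is a hypothetical helper, not a WIEN2k command.
#   $1 = atoms, $2 = nodes, $3 = physical cores per node, $4 = jobs per node
atom_load() {
    local atoms=$1 nodes=$2 cores=$3 jobs_per_node=$4
    local total_jobs=$(( nodes * jobs_per_node ))
    # Round up: with an uneven division some jobs take one atom more.
    local atoms_per_job=$(( (atoms + total_jobs - 1) / total_jobs ))
    # Load as a percentage of the physical cores of one node.
    local load_pct=$(( 100 * jobs_per_node / cores ))
    echo "jobs/node=${jobs_per_node} atoms/job=${atoms_per_job} load=${load_pct}%"
}

atom_load 96 4 16 24   # ":24" per node, one job per atom
atom_load 96 4 16 8    # ":8" per node
```

For the 96-atom case this prints a load of 150% with 1 atom per job for the first call and 50% with 3 atoms per job for the second, matching the numbers given above.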
Regards,
Thomas
[1] Sidenote: This splitting is controlled by the first number. In this case 4 equal sublists will be set up. You could also specify different "weights", for instance if your nodes are of different speeds; the .machines file could then read, for example:
3:g008:16
2:g021:16
2:g025:16
1:g028:16
In this case, the first node would "get" 3/8 of the k-points (84), nodes g021 and g025 would get 2/8 each (56), and the last one (because it is very slow) would get only 28 k-points.
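The weighted split can be reproduced with integer arithmetic. The sketch below only illustrates the proportions: split_kpoints is an illustrative helper, it ignores the granularity and extrafine settings, and it simply truncates when the division is uneven, which is not necessarily what WIEN2k does.

```shell
#!/usr/bin/env bash
# Sketch: split a k-point count in proportion to the leading weights.
# split_kpoints is a hypothetical helper, not a WIEN2k command.
#   $1 = total k-points, remaining arguments = weight:node pairs
split_kpoints() {
    local total=$1
    shift
    local sum=0 entry weight node
    # Sum of all weights (the part before the first colon).
    for entry in "$@"; do
        sum=$(( sum + ${entry%%:*} ))
    done
    # Share of each node = total * weight / sum of weights.
    for entry in "$@"; do
        weight=${entry%%:*}
        node=${entry#*:}
        echo "${node} $(( total * weight / sum ))"
    done
}

split_kpoints 224 3:g008 2:g021 2:g025 1:g028
```

For 224 k-points and weights 3, 2, 2, 1 this prints 84, 56, 56 and 28 for the four nodes.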
------------------------------------------------------------------------
*From:* Wien <[email protected]> on behalf of Christian Søndergaard Pedersen <[email protected]>
*Sent:* Monday, 12 October 2020 10:24
*To:* A Mailing list for WIEN2k users
*Subject:* Re: [Wien] .machines for several nodes
Thanks a lot for your answer. After re-reading the relevant pages in the User Guide, I am still left with some questions. Specifically, I am working with a system containing 96 atoms (as described in the case.struct file) and 224 inequivalent k-points; i.e. 500 k-points distributed as a 7x8x8 grid (448 total) reduced to 224 k-points. Running on 4 nodes each with 16 cores, I want each of the 4 nodes to calculate 56 k-points (224/4 = 56). Meanwhile, each node should handle 24 atoms (96/4 = 24).
Part of my confusion stems from your suggestion that I repeat the line "1:g008:4 [...]" a number of times equal to the number of k-points I want to run in parallel, and that each repetition should refer to a different node. The reason is that the line in question already contains the names of all four nodes that were assigned to the job. However, combining your advice with the example on page 86, the lines should read:
1:g008
1:g021
1:g025
1:g028 # k points distributed over 4 jobs, running on 1 node each
extrafine:1
As for the parallelization over atoms for dstart and lapw0, I understand that the numbers assigned to each individual node should sum up to the number of atoms in the system, like this:
dstart:g008:24 g021:24 g025:24 g028:24
lapw0:g008:24 g021:24 g025:24 g028:24
so the final .machines file would be a combination of the above pieces. Have I understood this correctly, or am I missing the mark? Also, is there any difference between distributing the k-points across four jobs (1 for each node) and across 224 jobs (by repeating each of the 1:gxxx lines 56 times)?
Best regards

Christian

------------------------------------------------------------------------
*From:* Wien <[email protected]> on behalf of Ruh, Thomas <[email protected]>
*Sent:* 12 October 2020 09:29:37
*To:* A Mailing list for WIEN2k users
*Subject:* Re: [Wien] .machines for several nodes

Hi,


Your .machines file is wrong.
The nodes for /lapw1/ are prefaced not with "lapw1:" but only with "1:". /lapw2/ needs no line, as it takes the same nodes as lapw1 before.
So an example for your use case would be:


#
dstart:g008:4 g021:4 g025:4 g028:4
lapw0:g008:4 g021:4 g025:4 g028:4
1:g008:4 g021:4 g025:4 g028:4
granularity:1
extrafine:1
The line starting with "1:" has to be repeated (with different nodes, of course) x times if you want to run x k-points in parallel (you can find more details about this in the User's Guide, pages 84-91).
Regards,

Thomas
PS: As a sidenote: Both /dstart/ and /lapw0/ parallelize over atoms, so 16 nodes might not be the best choice for your example.
------------------------------------------------------------------------
*From:* Wien <[email protected]> on behalf of Christian Søndergaard Pedersen <[email protected]>
*Sent:* Monday, 12 October 2020 09:06
*To:* [email protected]
*Subject:* [Wien] .machines for several nodes

Hello everybody
I am new to WIEN2k, and am struggling with parallelizing calculations on our HPC cluster beyond what can be achieved using OMP. In particular, I want to execute run_lapw and/or runsp_lapw on four identical nodes (16 cores each), parallelizing over k-points (unless there's a more efficient scheme). To achieve this, I try to mimic the example from the User Guide (without the extra Alpha node), but my .machines file does not work the way I intended. This is what I have:
#
dstart:g008:4 g021:4 g025:4 g028:4
lapw0:g008:4 g021:4 g025:4 g028:4
lapw1:g008:4 g021:4 g025:4 g028:4
lapw2:g008:4 g021:4 g025:4 g028:4
granularity:1
extrafine:1
The node names gxxx are read from SLURM_JOB_NODELIST in the submit script, and a couple of regular expressions generate the above lines. Afterwards, my job script does the following:
srun hostname -s > slurm.hosts
run_lapw -p
which results in a job that idles for the entire walltime and finishes with a CPU efficiency of 0.00%. I would appreciate any help in figuring out where I've gone wrong.
Best regards
Christian


_______________________________________________
Wien mailing list
[email protected]
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/[email protected]/index.html


--

                                      P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
Email: [email protected]    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at/TC_Blaha
--------------------------------------------------------------------------