This problem is easily solvable (which again means: you MUST READ the UG section on parallelization, otherwise your runs will perform VERY badly).

For such a small problem (14 atoms, matrix size 2600) it is NOT necessary (in fact probably even quite bad) to use mpi-parallelization.


Instead use k-parallelization (and maybe  export OMP_NUM_THREADS=2).

Simply put, e.g., 24 lines like:

1:localhost

into the .machines file, and you will run 24 parallel lapw1 jobs (each using 2 cores when OMP_NUM_THREADS=2 is set).
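A minimal sketch of such a .machines file for this single node (only an illustration: the 24 lines correspond to the suggestion above, and the granularity/extrafine lines are taken over from the file quoted further below):

-------------------------------------
1:localhost
1:localhost
...   (24 lines of "1:localhost" in total)
1:localhost
granularity:1
extrafine:1
-------------------------------------

and, before starting run_lapw / runsp_lapw, set in the shell:

export OMP_NUM_THREADS=2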

--------------
With respect to your other questions:
I don't know what  lapw1para_mpi -p -band  is supposed to be ??

lapw1 should always be invoked using:

x lapw1 -p      or    x lapw1 -p -band

The difference is just that you are using either case.klist or case.klist_band. Check how many k-points are in these 2 files (250 was just an "input"; it seems to have made a 13x13x1 mesh and then still applied symmetry, so you may have just ~ 30 k-points in case.klist ...).
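A quick way to check this (just a sketch; it assumes the usual convention that each k-point occupies one line and that the lists are terminated by a single END line):

grep -cv END case.klist
grep -cv END case.klist_band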

-----------------
Another question: do you have 48 "physical cores", or only 24 ???
Do you have 2 or 4 Xeons (with 12 cores each) in your computer ??

If you have only 24 "real" cores:
The "virtual cores" which Intel gives you "for free" due to their "hyperthreading" are usually not very effective. You can at most expect an improvement of 10-20% when using 48 instead of 24 cores, but sometimes this can also degrade performance by 30% because the memory bus gets overloaded. So test it ....
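One quick way to check this on Linux (a sketch; the exact field names may differ between lscpu versions):

lscpu | grep -E 'Socket|Core|Thread'

Physical cores = Socket(s) x Core(s) per socket; if Thread(s) per core is 2, half of the 48 "CPUs" reported by the system are hyperthreads.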



On 11/29/18 1:10 PM, Coriolan TIUSAN wrote:
Thanks for the suggestion of dividing the band calculation.

Actually, I would like to make a 'zoom' around the Gamma point (for the X-G-X direction) with a resolution of about 0.001 Bohr^-1 (to get enough accuracy for small Rashba splittings, k_0 < 0.01 Bohr^-1). I guess I could simply make the 'zoom' calculation?
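For scale, a rough estimate (an illustration only, assuming the conventional X point at (1/2,0,0) and the in-plane lattice constant a = 5.726 Bohr from the struct file quoted below): Gamma-X = pi/a = 0.55 Bohr^-1, so the full X-G-X line is about 1.10 Bohr^-1 long; 200 points over that line indeed gives a step of roughly 1.10/200 = 0.0055 Bohr^-1, whereas a zoom restricted to |k| <= 0.01 Bohr^-1 around Gamma needs only about 21 points at 0.001 Bohr^-1 spacing, i.e. a very small k-list.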

The .machines file, given that I have only one node (computer) with 48 available CPUs, is:

-------------------------------------

1:localhost:48
granularity:1
extrafine:1
lapw0:localhost:48
dstart:localhost:48
nlvdw:localhost:48

--------------------------------------

For the supercell attached here, I was trying to make a band structure calculation along the X-G-X direction with at least 200 points... which corresponds to a step of only 0.005 Bohr^-1, not fine enough for Rashba shifts of the same order of magnitude.

For my calculations I get:  MATRIX SIZE 2606  LOs: 138  RKM= 6.99, and the 64 GB of RAM is 100% filled, plus about 100 GB of swap...

Beyond all these aspects, what I would also like to understand is why in the scf calculation I have no memory 'overload' for 250 k-points (13 13 1)... while when running 'lapw1para_mpi -p -band' the memory issue seems much more dramatic?
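As a rough back-of-envelope check (an assumption on my part, not a statement from the thread: it supposes that the dominant arrays in lapw1c are a few complex double-precision matrices of dimension MATRIX SIZE): 2606 x 2606 x 16 bytes = about 0.11 GB per matrix, so a single k-point should need only on the order of a GB; filling 64 GB plus 100 GB of swap therefore points to the number of simultaneously running mpi processes (and/or the mpi memory leak mentioned below) rather than to the matrix size itself.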

If necessary, my struct file is:

------------------

VFeMgO-vid                               s-o calc. M||  1.00 0.00  0.00
P 14
RELA
   5.725872  5.725872 61.131153 90.000000 90.000000 90.000000
ATOM  -1: X=0.50000000 Y=0.50000000 Z=0.01215444
           MULT= 1          ISPLIT= 8
V 1        NPT=  781  R0=.000050000 RMT=   2.18000   Z: 23.00000
LOCAL ROT MATRIX:    1.0000000 0.0000000 0.0000000
                      0.0000000 1.0000000 0.0000000
                      0.0000000 0.0000000 1.0000000
ATOM  -2: X=0.00000000 Y=0.00000000 Z=0.05174176
           MULT= 1          ISPLIT= 8
V 2        NPT=  781  R0=.000050000 RMT=   2.18000   Z: 23.00000
LOCAL ROT MATRIX:    1.0000000 0.0000000 0.0000000
                      0.0000000 1.0000000 0.0000000
                      0.0000000 0.0000000 1.0000000
ATOM  -3: X=0.50000000 Y=0.50000000 Z=0.09885823
           MULT= 1          ISPLIT= 8
V 3        NPT=  781  R0=.000050000 RMT=   2.18000   Z: 23.00000
LOCAL ROT MATRIX:    1.0000000 0.0000000 0.0000000
                      0.0000000 1.0000000 0.0000000
                      0.0000000 0.0000000 1.0000000
ATOM  -4: X=0.00000000 Y=0.00000000 Z=0.13971867
           MULT= 1          ISPLIT= 8
Fe1        NPT=  781  R0=.000050000 RMT=   1.95000   Z: 26.00000
LOCAL ROT MATRIX:    1.0000000 0.0000000 0.0000000
                      0.0000000 1.0000000 0.0000000
                      0.0000000 0.0000000 1.0000000
ATOM  -5: X=0.50000000 Y=0.50000000 Z=0.18164479
           MULT= 1          ISPLIT= 8
Fe2        NPT=  781  R0=.000050000 RMT=   1.95000   Z: 26.00000
LOCAL ROT MATRIX:    1.0000000 0.0000000 0.0000000
                      0.0000000 1.0000000 0.0000000
                      0.0000000 0.0000000 1.0000000
ATOM  -6: X=0.00000000 Y=0.00000000 Z=0.22284885
           MULT= 1          ISPLIT= 8
Fe3        NPT=  781  R0=.000050000 RMT=   1.95000   Z: 26.00000
LOCAL ROT MATRIX:    1.0000000 0.0000000 0.0000000
                      0.0000000 1.0000000 0.0000000
                      0.0000000 0.0000000 1.0000000
ATOM  -7: X=0.50000000 Y=0.50000000 Z=0.26533335
           MULT= 1          ISPLIT= 8
Fe4        NPT=  781  R0=.000050000 RMT=   1.95000   Z: 26.00000
LOCAL ROT MATRIX:    1.0000000 0.0000000 0.0000000
                      0.0000000 1.0000000 0.0000000
                      0.0000000 0.0000000 1.0000000
ATOM  -8: X=0.00000000 Y=0.00000000 Z=0.30245527
           MULT= 1          ISPLIT= 8
Fe5        NPT=  781  R0=.000050000 RMT=   1.95000   Z: 26.00000
LOCAL ROT MATRIX:    1.0000000 0.0000000 0.0000000
                      0.0000000 1.0000000 0.0000000
                      0.0000000 0.0000000 1.0000000
ATOM  -9: X=0.00000000 Y=0.00000000 Z=0.36627712
           MULT= 1          ISPLIT= 8
O 1        NPT=  781  R0=.000100000 RMT=   1.68000   Z: 8.00000
LOCAL ROT MATRIX:    1.0000000 0.0000000 0.0000000
                      0.0000000 1.0000000 0.0000000
                      0.0000000 0.0000000 1.0000000
ATOM -10: X=0.50000000 Y=0.50000000 Z=0.36416415
           MULT= 1          ISPLIT= 8
Mg1        NPT=  781  R0=.000100000 RMT=   1.87000   Z: 12.00000
LOCAL ROT MATRIX:    1.0000000 0.0000000 0.0000000
                      0.0000000 1.0000000 0.0000000
                      0.0000000 0.0000000 1.0000000
ATOM -11: X=0.50000000 Y=0.50000000 Z=0.43034285
           MULT= 1          ISPLIT= 8
O 2        NPT=  781  R0=.000100000 RMT=   1.68000   Z: 8.00000
LOCAL ROT MATRIX:    1.0000000 0.0000000 0.0000000
                      0.0000000 1.0000000 0.0000000
                      0.0000000 0.0000000 1.0000000
ATOM -12: X=0.00000000 Y=0.00000000 Z=0.43127365
           MULT= 1          ISPLIT= 8
Mg2        NPT=  781  R0=.000100000 RMT=   1.87000   Z: 12.00000
LOCAL ROT MATRIX:    1.0000000 0.0000000 0.0000000
                      0.0000000 1.0000000 0.0000000
                      0.0000000 0.0000000 1.0000000
ATOM -13: X=0.00000000 Y=0.00000000 Z=0.49684798
           MULT= 1          ISPLIT= 8
O 3        NPT=  781  R0=.000100000 RMT=   1.68000   Z: 8.00000
LOCAL ROT MATRIX:    1.0000000 0.0000000 0.0000000
                      0.0000000 1.0000000 0.0000000
                      0.0000000 0.0000000 1.0000000
ATOM -14: X=0.50000000 Y=0.50000000 Z=0.49541730
           MULT= 1          ISPLIT= 8
Mg3        NPT=  781  R0=.000100000 RMT=   1.87000   Z: 12.00000
LOCAL ROT MATRIX:    1.0000000 0.0000000 0.0000000
                      0.0000000 1.0000000 0.0000000
                      0.0000000 0.0000000 1.0000000
    4      NUMBER OF SYMMETRY OPERATIONS
-1 0 0 0.00000000
  0 1 0 0.00000000
  0 0 1 0.00000000
        1   A   1 so. oper.  type  orig. index
  1 0 0 0.00000000
  0 1 0 0.00000000
  0 0 1 0.00000000
        2   A   2
-1 0 0 0.00000000
  0-1 0 0.00000000
  0 0 1 0.00000000
        3   B   3
  1 0 0 0.00000000
  0-1 0 0.00000000
  0 0 1 0.00000000
        4   B   4
---------------------------


On 29/11/2018 13:05, Peter Blaha wrote:
You never listed your .machines file, nor do we know how many k-points are in the scf and the band-structure cases, nor what the matrix size (:RKM) / real / complex details are.

The memory leakage of Intel's mpi seems to be very version dependent, but there's nothing we can do against it from the wien2k side.

Besides installing a different mpi version, one could more easily run the band structure in pieces. Simply divide your klist_band file into several pieces and calculate them one after the other.

The resulting case.outputso_1,2,3,... files can simply be concatenated (cat file1 file2 file3 > file).
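A minimal sketch of this piecewise workflow (untested and assumption-laden: it supposes a non-spin-polarized run in which only lapw1/lapwso need to be repeated per piece, that case.klist_band ends with a single END line, GNU head/split, and that the chunk size of 50 as well as the file names klist_band.full / piece_* are purely illustrative; add -c / -up / -dn flags as in your normal run):

# keep the full k-list and split it into chunks of 50 k-points
cp case.klist_band klist_band.full
head -n -1 klist_band.full | split -l 50 - piece_

for p in piece_*
do
    ( cat "$p" ; echo "END" ) > case.klist_band    # one piece + END terminator
    x lapw1 -p -band
    x lapwso -p
    cp case.outputso case.outputso_"$p"            # save this piece's result
done

cat case.outputso_piece_* > case.outputso          # concatenate as described above
cp klist_band.full case.klist_band                 # restore the original k-list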



On 11/28/18 1:41 PM, Coriolan TIUSAN wrote:
Dear wien2k users,

I am running WIEN2k 18.2 on Ubuntu 18.04, installed on an HP workstation: 64 GB RAM, Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz x 48.

The Fortran compiler and math library are ifc and the Intel MKL. For parallel execution I have MPI + ScaLAPACK and FFTW.

For parallel execution (-p option + .machines), I have dimensioned NMATMAX/NUME according to the user guide. Standard SCF calculations therefore run well, without any memory paging issues, with about 90% of the physical RAM being used.

However, in supercells, once the case.vector files are available, when calculating bands (lapw1c -band -p) with a fine k-mesh (e.g. above 150-200 k-points on the line X-G-X), necessary because I am looking for small Rashba shifts at metal-insulator interfaces... all available physical memory plus a huge amount of swap (>100 GB) is filled/used...

Any suggestion/idea for overcoming this issue... without adding additional RAM?

Why does the memory look sufficient in lapw1 -p for self-consistency, while with the -band switch it overloads the memory?

With thanks in advance,

C. Tiusan




--

                                      P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
Email: bl...@theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at/TC_Blaha
--------------------------------------------------------------------------