I forgot to forward my replies to the W90 mailing list.

---------- Forwarded message ---------
From: H. Lee <hjun...@gmail.com>
Date: Mon, Mar 1, 2021 at 10:40 AM
Subject: Re: [Wannier] MPI version large systems
To: Jonathan Backman <jback...@iis.ee.ethz.ch>

Dear Jonathan:

I think that you have an issue reading the W90 inputs when using 27 cores,
even though you have enough memory.

I regenerated all the relevant W90 inputs (*.mmn, *.amn, and *.eig) based on
your *.wout and reproduced your issue.
Of course, the regenerated inputs only have the same file sizes and contain
random numbers; however, they are certainly sufficient for this test.

I performed the test on a single node with two 14-core Intel Broadwell
processors running at 2.6 GHz (28 cores per node), 128 GB of memory, and a
GPFS file system.
In this case, I succeeded in reading the inputs using 10 cores, as you can see
below:


 |                  b_k Vectors (Ang^-1) and Weights (Ang^2)
 |                  ----------------------------------------
 |            No.         b_k(x)      b_k(y)      b_k(z)        w_b
 |            ---        --------------------------------     --------
 |             1         0.095739    0.000000    0.000000    54.549228
 |             2         0.000000    0.095739    0.000000    54.549228
 |             3         0.000000    0.000000    0.095739    54.549228
 |             4         0.000000    0.000000   -0.095739    54.549228
 |             5         0.000000   -0.095739    0.000000    54.549228
 |             6        -0.095739    0.000000    0.000000    54.549228

 |                           b_k Directions (Ang^-1)
 |                           -----------------------
 |            No.           x           y           z
 |            ---        --------------------------------
 |             1         0.095739    0.000000    0.000000
 |             2         0.000000    0.095739    0.000000
 |             3         0.000000    0.000000    0.095739

 Time to get kmesh              0.022 (sec)

 |                              MEMORY ESTIMATE
 |         Maximum RAM allocated during each phase of the calculation

 |                        Disentanglement        34494.77 Mb
 |                            Wannierise:        30250.47 Mb
 |                          plot_wannier:        30250.47 Mb

 Starting a new Wannier90 calculation ...
 Reading overlaps from wannier90.mmn    :  Created on
 Reading projections from wannier90.amn :  Created on
 Time to read overlaps       1009.729 (sec)
 *------------------------------- DISENTANGLE
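
As a quick sanity check, the b-vectors and weights in the table above should satisfy the completeness relation that W90 cites as Eq. (B1) of PRB 56, 12847 (1997), namely sum_b w_b b_a b_b = delta_ab. The NumPy sketch below is only an illustration (not part of W90): it rebuilds the six vectors from the printed values, which are rounded to six decimals, so the agreement is approximate. For a single shell of six cubic neighbours the relation reduces to 2 w_b |b|^2 = 1, i.e. w_b = 1/(2|b|^2).

```python
# Check the completeness relation sum_b w_b * b_a * b_b = delta_ab
# for the six nearest-neighbour b-vectors printed in the W90 output above.
import numpy as np

b = 0.095739      # |b| in Ang^-1, copied from the output (rounded to 6 decimals)
w_b = 54.549228   # weight in Ang^2, identical for all six vectors

bvecs = np.array([
    [b, 0.0, 0.0],
    [0.0, b, 0.0],
    [0.0, 0.0, b],
    [0.0, 0.0, -b],
    [0.0, -b, 0.0],
    [-b, 0.0, 0.0],
])
weights = np.full(len(bvecs), w_b)

# Weighted sum of outer products b b^T over all b-vectors; should be ~identity.
completeness = np.einsum("i,ij,ik->jk", weights, bvecs, bvecs)
print(np.round(completeness, 4))

# For one cubic shell of 6 vectors the weight should be close to 1 / (2 |b|^2).
print(round(1.0 / (2.0 * b**2), 3))
```

The small mismatch between 1/(2|b|^2) and the printed w_b comes only from the six-decimal rounding of |b|.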

But when I used more than 12 cores, I encountered the same issue and
confirmed that the memory footprint increases with the number of cores used.
At this point, I am not sure whether this is due to increased internal MPI
memory usage or some kind of file-system buffer; this is specific to the
system configuration.

The only thing I can tell you is that, with your very large input files,
the memory footprint increases with the number of cores used.
So I would suggest reducing the number of cores to a smaller value, for
example, 8 or 12.
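
For a rough sense of scale, the read times quoted in this thread can be turned into an effective throughput. This is a back-of-the-envelope sketch with stated assumptions: "GB" is taken as 10^9 bytes; the "Time to read overlaps" timer is assumed to cover both wannier90.mmn (42 GB) and wannier90.amn (6 GB), since both "Reading" lines are printed before it; and the 10-core test above and the serial run used different machines, file systems, and (random vs. real) inputs, so the two numbers are not directly comparable. The helper throughput_mb_per_s is hypothetical.

```python
# Rough effective read throughput for the overlap files, using numbers from this thread.
# Assumptions: "GB" means 10**9 bytes, and "Time to read overlaps" covers
# reading both wannier90.mmn and wannier90.amn.

MMN_BYTES = 42e9  # wannier90.mmn size reported in the original question
AMN_BYTES = 6e9   # wannier90.amn size reported in the original question

timings_sec = {
    "10 MPI ranks (regenerated random inputs, GPFS)": 1009.729,
    "serial run (original inputs)": 2195.901,
}


def throughput_mb_per_s(total_bytes: float, seconds: float) -> float:
    """Average throughput in MB/s (10**6 bytes per second)."""
    return total_bytes / seconds / 1e6


total = MMN_BYTES + AMN_BYTES
for label, t in timings_sec.items():
    print(f"{label}: {throughput_mb_per_s(total, t):.1f} MB/s")
```

Under these assumptions, both runs average well under 100 MB/s and spend roughly 17 and 37 minutes, respectively, just reading the overlaps.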


Hyungjun Lee
UT Austin.

On Fri, Feb 26, 2021 at 5:31 PM Jonathan Backman <jback...@iis.ee.ethz.ch> wrote:

> Dear Hyungjun Lee,
> Here is the output of the serial run:
> ----- START OF OUTPUT -----
> Time to get kmesh              0.217 (sec)
> *============================================================================*
>  |                              MEMORY ESTIMATE                               |
>  |         Maximum RAM allocated during each phase of the calculation         |
> *============================================================================*
>  |                        Disentanglement        34494.77 Mb                  |
>  |                            Wannierise:        30250.47 Mb                  |
>  |                          plot_wannier:        30250.47 Mb                  |
> *----------------------------------------------------------------------------*
>  Starting a new Wannier90 calculation ...
>  Reading overlaps from wannier90.mmn    : File generated by VASP: unknown system
>  Reading projections from wannier90.amn : Projections from Vasp, concatenated by Python.
>  Time to read overlaps       2195.901 (sec)
> ----- END OF OUTPUT -----
> Best,
> Jonathan
> On 26/02/2021 21:35, H. Lee wrote:
> Dear Jonathan Backman:
> Thank you for providing your output.
> I think that even though your output stops while printing the information
> on the b-vectors, the issue actually occurs in the subsequent steps.
> In a normal run, the W90 output looks like the following:
> ...
> +----------------------------------------------------------------------------+
>  | The b-vectors are chosen automatically                                     |
>  | The following shells are used:   1,  2                                     |
> +----------------------------------------------------------------------------+
>  |                        Shell   # Nearest-Neighbours                        |
>  |                        -----   --------------------                        |
>  |                          1               2                                 |
>  |                          2               6                                 |
> +----------------------------------------------------------------------------+
>  | Completeness relation is fully satisfied [Eq. (B1), PRB 56, 12847 (1997)]  |
> +----------------------------------------------------------------------------+
>  |                  b_k Vectors (Ang^-1) and Weights (Ang^2)                  |
>  |                  ----------------------------------------                  |
>  |            No.         b_k(x)      b_k(y)      b_k(z)        w_b           |
>  |            ---        --------------------------------     --------        |
>  |             1         0.000000    0.000000    0.079153    71.124740        |
>  |             2         0.000000    0.000000   -0.079153    71.124740        |
>  |             3         0.113136   -0.000000    0.026384    26.042079        |
>  |             4        -0.113136    0.000000   -0.026384    26.042079        |
>  |             5        -0.056568    0.097979    0.026384    26.042079        |
>  |             6         0.056568   -0.097979   -0.026384    26.042079        |
>  |             7        -0.056568   -0.097979    0.026384    26.042079        |
>  |             8         0.056568    0.097979   -0.026384    26.042079        |
> +----------------------------------------------------------------------------+
>  |                           b_k Directions (Ang^-1)                          |
>  |                           -----------------------                          |
>  |            No.           x           y           z                         |
>  |            ---        --------------------------------                     |
>  |             1         0.000000    0.000000    0.079153                     |
>  |             2         0.113136   -0.000000    0.026384                     |
>  |             3        -0.056568    0.097979    0.026384                     |
>  |             4        -0.056568   -0.097979    0.026384                     |
> +----------------------------------------------------------------------------+
>  *Time to get kmesh              ..... (sec)*
> *============================================================================*
>  |                              MEMORY ESTIMATE                               |
>  |         Maximum RAM allocated during each phase of the calculation         |
> *============================================================================*
>  |                        Disentanglement         9404.64 Mb                  |
>  |                            Wannierise:         4942.84 Mb                  |
>  |                          plot_wannier:         4942.84 Mb                  |
> *----------------------------------------------------------------------------*
>  Starting a new Wannier90 calculation ...
>  Reading overlaps from wannier90.mmn    : File generated by ...
>  Reading projections from wannier90.amn : File generated by ...
>  *Time to read overlaps        ..... (sec)*
>  *------------------------------- DISENTANGLE --------------------------------*
> +----------------------------------------------------------------------------+
> ...
> You told me that your serial run proceeded to the disentanglement step.
> Could you tell me (1) the time to get kmesh and (2) the time to read
> overlaps (marked with asterisks in the example output above) from the W90
> output of your serial run?
> Sincerely,
> Hyungjun Lee
> UT Austin
> On Fri, Feb 26, 2021 at 12:23 PM Jonathan Backman <jback...@iis.ee.ethz.ch> wrote:
>> Dear Hyungjun Lee,
>> Thank you for the help.
>> I have attached the W90 output file from my calculation. As you can see,
>> it stops printing output after choosing the shells.
>> When running the calculation with the serial version, the output also
>> stops at this point for a long time. After a few days, it shows that it
>> has completed a few steps of the disentanglement; however, this never
>> happens for the parallel run.
>> Best,
>> Jonathan
>> On 26/02/2021 18:38, H. Lee wrote:
>> Dear Jonathan Backman:
>> Could you show me your Wannier90 (W90) output with high verbosity so
>> that I can identify the step at which W90 gets stuck? In particular, I would
>> like to know whether your run got past reading the relevant input
>> matrices, for instance, Mmn.
>> I assume that you are using disentanglement.
>> In this case, the largest array is the global complex-valued array
>> m_matrix_orig (with the SAVE and PUBLIC attributes), which is allocated only
>> on the ROOT node with a size (in your case) of 16 x 2688 x 2688 x 8 x 27
>> bytes = about 25 GB (I assume nntot is 8 in your case).
>> One issue in the current implementation is that, when gamma_only is false,
>> this matrix is no longer used outside the overlap_read subroutine, yet it is
>> not deallocated immediately after Mmn has been read and scattered in
>> overlap_read. This leads to a very large (unbalanced) memory footprint on
>> the ROOT node.
>> I understand that in your case, memory might not be a problem, but I
>> would like to confirm it by looking at your W90 output.
>> Sincerely,
>> Hyungjun Lee
>> UT Austin
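
The m_matrix_orig size estimate in the message above can be reproduced with a few lines of Python. This sketch only multiplies the dimensions quoted there (num_bands = 2688, nntot = 8 as assumed in the message, num_kpts = 27) by 16 bytes per double-precision complex number; the helper complex_array_bytes is hypothetical.

```python
# Size of a complex(dp) array m_matrix_orig(num_bands, num_bands, nntot, num_kpts),
# using the values quoted in this thread. nntot = 8 is the assumption stated there.

BYTES_PER_COMPLEX_DP = 16  # two 8-byte reals per complex number


def complex_array_bytes(*dims: int) -> int:
    """Bytes needed for a dense double-precision complex array with these dimensions."""
    n = 1
    for d in dims:
        n *= d
    return BYTES_PER_COMPLEX_DP * n


num_bands, nntot, num_kpts = 2688, 8, 27
size = complex_array_bytes(num_bands, num_bands, nntot, num_kpts)
print(f"{size} bytes = {size / 1e9:.2f} GB = {size / 2**30:.2f} GiB")
```

This gives about 24.97 GB (decimal), or about 23.26 GiB, consistent with the "about 25 GB" above. Because the size is linear in nntot, a smaller nntot (for example 6) would shrink it proportionally.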
>> On Fri, Feb 26, 2021 at 4:22 AM Jonathan Backman <jback...@iis.ee.ethz.ch> wrote:
>>> Dear All,
>>> I'm trying to run Wannier90 using MPI for a large system.
>>> 2688 Bloch states, 2048 Wannier functions, and 27 k-points (3x3x3 grid).
>>> AMN file size: 6 GB
>>> MMN file size: 42 GB
>>> My system does not run out of memory during the parallel run (512 GB
>>> available).
>>> With one MPI process, the calculation progresses, but very slowly
>>> due to the size. However, with multiple MPI processes the calculation
>>> runs but does not progress at all; I have tried waiting over 2 weeks.
>>> I tried different numbers of MPI processes, but I would assume 27
>>> would be the best since I have 27 k-points.
>>> Does anyone have experience with the MPI version of the code for large
>>> systems? Are there any specific settings that should be used when running
>>> with MPI?
>>> Best regards,
>>> Jonathan Backman, ETH Zürich
>>> _______________________________________________
>>> Wannier mailing list
>>> Wannier@lists.quantum-espresso.org
>>> https://lists.quantum-espresso.org/mailman/listinfo/wannier
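
On the question of whether 27 processes are best for 27 k-points: the sketch below assumes, for illustration only, that k-points are split over MPI ranks in contiguous, nearly equal blocks (the actual distribution inside W90 may differ). Under that assumption, 27 ranks give exactly one k-point per rank, while the 8 or 12 ranks suggested in the reply above leave some ranks with one more k-point than others. The function split_kpoints is hypothetical.

```python
# Hypothetical illustration: split num_kpts k-points over num_ranks MPI processes
# in contiguous, nearly equal blocks (the first `remainder` ranks get one extra).
# This is an assumed scheme for illustration, not taken from the Wannier90 source.


def split_kpoints(num_kpts: int, num_ranks: int) -> list[int]:
    """Number of k-points assigned to each rank under an even block split."""
    base, remainder = divmod(num_kpts, num_ranks)
    return [base + 1 if rank < remainder else base for rank in range(num_ranks)]


for ranks in (1, 8, 12, 27):
    counts = split_kpoints(27, ranks)
    print(f"{ranks:2d} ranks -> k-points per rank: max {max(counts)}, min {min(counts)}")
```

Fewer ranks mean a larger and slightly uneven per-rank share of k-points, but in the test reported above, runs with more than 12 cores reproduced the hang.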