I forgot to forward my replies to the W90 mailing list.

---------- Forwarded message ---------
From: H. Lee <hjun...@gmail.com>
Date: Mon, Mar 1, 2021 at 10:40 AM
Subject: Re: [Wannier] MPI version large systems
To: Jonathan Backman <jback...@iis.ee.ethz.ch>

Dear Jonathan:

I think the problem with 27 cores lies in reading the W90 inputs, even
though you have enough memory.

Using your *.wout, I regenerated all the relevant W90 inputs (*.mmn, *.amn,
and *.eig) and reproduced your issue. The regenerated files have the same
sizes as yours but contain random numbers, which is sufficient for this test.

I ran the test on a single node with two Intel Broadwell processors at
2.6 GHz, 14 cores each (28 cores per node), 128 GB of memory, and a GPFS
file system. On this node, reading the inputs succeeded with 10 cores, as
you can see below:

(START OF OUTPUT)
+----------------------------------------------------------------------------+
|                  b_k Vectors (Ang^-1) and Weights (Ang^2)                  |
|                  ----------------------------------------                  |
|            No.         b_k(x)      b_k(y)      b_k(z)        w_b           |
|            ---        --------------------------------     --------        |
|             1         0.095739    0.000000    0.000000    54.549228        |
|             2         0.000000    0.095739    0.000000    54.549228        |
|             3         0.000000    0.000000    0.095739    54.549228        |
|             4         0.000000    0.000000   -0.095739    54.549228        |
|             5         0.000000   -0.095739    0.000000    54.549228        |
|             6        -0.095739    0.000000    0.000000    54.549228        |
+----------------------------------------------------------------------------+
|                          b_k Directions (Ang^-1)                           |
|                          -----------------------                           |
|            No.           x           y           z                         |
|            ---        --------------------------------                     |
|             1         0.095739    0.000000    0.000000                     |
|             2         0.000000    0.095739    0.000000                     |
|             3         0.000000    0.000000    0.095739                     |
+----------------------------------------------------------------------------+

Time to get kmesh                    0.022 (sec)

*============================================================================*
|                              MEMORY ESTIMATE                               |
|         Maximum RAM allocated during each phase of the calculation         |
*============================================================================*
|   Disentanglement          34494.77 Mb                                     |
|   Wannierise:              30250.47 Mb                                     |
|   plot_wannier:            30250.47 Mb                                     |
*----------------------------------------------------------------------------*

Starting a new Wannier90 calculation ...

Reading overlaps from wannier90.mmn    : Created on
Reading projections from wannier90.amn : Created on

Time to read overlaps             1009.729 (sec)

*------------------------------- DISENTANGLE --------------------------------*
(END OF OUTPUT)

However, when I used more than 12 cores, I ran into the same issue, and I
confirmed that the memory footprint grows with the number of cores used. At
this point I am not sure whether this comes from increased internal MPI
memory usage or from some kind of file-system buffering; it is specific to
the system configuration. All I can tell you is that, with input files as
large as yours, the memory footprint increases with the number of cores used.

I would therefore suggest reducing the number of cores, for example to 8
or 12.

Sincerely,

Hyungjun Lee
UT Austin

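For readers who want to try a similar read test, the following is a minimal sketch of how random-valued *.amn and *.eig files could be written. It is not the procedure used in the reply above, which is not described in detail. The function names, file names, and value ranges are hypothetical; the layout (a comment line and a "num_bands num_kpts num_wann" line followed by "m n k Re Im" rows for *.amn, and "band kpoint energy" rows for *.eig) follows the Wannier90 text formats as I understand them. The *.mmn file is left out on purpose because its blocks must match the neighbour list of the actual k-mesh.

```python
import random


def write_random_amn(path, num_bands, num_kpts, num_wann, seed=0):
    """Write a random-valued .amn file (test data only, not physical)."""
    rng = random.Random(seed)
    with open(path, "w") as f:
        f.write("Random test projections (not physical)\n")
        f.write(f"{num_bands:12d}{num_kpts:12d}{num_wann:12d}\n")
        # Band index m runs fastest, then projection n, then k-point k.
        for k in range(1, num_kpts + 1):
            for n in range(1, num_wann + 1):
                for m in range(1, num_bands + 1):
                    re = rng.uniform(-1.0, 1.0)
                    im = rng.uniform(-1.0, 1.0)
                    f.write(f"{m:5d}{n:5d}{k:5d}{re:18.12f}{im:18.12f}\n")


def write_random_eig(path, num_bands, num_kpts, seed=0):
    """Write a random-valued .eig file with ascending energies per k-point."""
    rng = random.Random(seed)
    with open(path, "w") as f:
        for k in range(1, num_kpts + 1):
            # Arbitrary energy range in eV, sorted so bands are ordered.
            energies = sorted(rng.uniform(-10.0, 10.0) for _ in range(num_bands))
            for m, energy in enumerate(energies, start=1):
                f.write(f"{m:5d}{k:5d}{energy:18.12f}\n")
```

For example, write_random_amn("test.amn", 2688, 27, 2048) would produce a file with the dimensions discussed in this thread. Rows are written one at a time, so the generator itself needs very little memory.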
On Fri, Feb 26, 2021 at 5:31 PM Jonathan Backman <jback...@iis.ee.ethz.ch> wrote:

> Dear Hyungjun Lee,
>
> Here is the output of the serial run:
>
> ----- START OF OUTPUT -----
>
> Time to get kmesh                    0.217 (sec)
>
> *============================================================================*
> |                              MEMORY ESTIMATE                               |
> |         Maximum RAM allocated during each phase of the calculation         |
> *============================================================================*
> |   Disentanglement          34494.77 Mb                                     |
> |   Wannierise:              30250.47 Mb                                     |
> |   plot_wannier:            30250.47 Mb                                     |
> *----------------------------------------------------------------------------*
>
> Starting a new Wannier90 calculation ...
>
> Reading overlaps from wannier90.mmn    : File generated by VASP: unknown system
> Reading projections from wannier90.amn : Projections from Vasp, concatenated by Python.
>
> Time to read overlaps             2195.901 (sec)
>
> ----- END OF OUTPUT -----
>
> Best,
>
> Jonathan
>
>
> On 26/02/2021 21:35, H. Lee wrote:
>
> Dear Jonathan Backman:
>
> Thank you for providing your output.
>
> I think that even if your run stopped while printing the information on
> the b vectors, you would encounter the issue in the next steps.
> In a normal run, the W90 output looks like the following:
>
> ----- START OF EXAMPLE OUTPUT -----
> ...
> +----------------------------------------------------------------------------+
> |                   The b-vectors are chosen automatically                   |
> |                    The following shells are used: 1, 2                     |
> +----------------------------------------------------------------------------+
> |                        Shell   # Nearest-Neighbours                        |
> |                        -----   --------------------                        |
> |                          1               2                                 |
> |                          2               6                                 |
> +----------------------------------------------------------------------------+
> | Completeness relation is fully satisfied [Eq. (B1), PRB 56, 12847 (1997)]  |
> +----------------------------------------------------------------------------+
> |                  b_k Vectors (Ang^-1) and Weights (Ang^2)                  |
> |                  ----------------------------------------                  |
> |            No.         b_k(x)      b_k(y)      b_k(z)        w_b           |
> |            ---        --------------------------------     --------        |
> |             1         0.000000    0.000000    0.079153    71.124740        |
> |             2         0.000000    0.000000   -0.079153    71.124740        |
> |             3         0.113136   -0.000000    0.026384    26.042079        |
> |             4        -0.113136    0.000000   -0.026384    26.042079        |
> |             5        -0.056568    0.097979    0.026384    26.042079        |
> |             6         0.056568   -0.097979   -0.026384    26.042079        |
> |             7        -0.056568   -0.097979    0.026384    26.042079        |
> |             8         0.056568    0.097979   -0.026384    26.042079        |
> +----------------------------------------------------------------------------+
> |                          b_k Directions (Ang^-1)                           |
> |                          -----------------------                           |
> |            No.           x           y           z                         |
> |            ---        --------------------------------                     |
> |             1         0.000000    0.000000    0.079153                     |
> |             2         0.113136   -0.000000    0.026384                     |
> |             3        -0.056568    0.097979    0.026384                     |
> |             4        -0.056568   -0.097979    0.026384                     |
> +----------------------------------------------------------------------------+
>
> *Time to get kmesh ..... (sec)*
>
> *============================================================================*
> |                              MEMORY ESTIMATE                               |
> |         Maximum RAM allocated during each phase of the calculation         |
> *============================================================================*
> |   Disentanglement           9404.64 Mb                                     |
> |   Wannierise:               4942.84 Mb                                     |
> |   plot_wannier:             4942.84 Mb                                     |
> *----------------------------------------------------------------------------*
>
> Starting a new Wannier90 calculation ...
>
> Reading overlaps from wannier90.mmn    : File generated by ...
> Reading projections from wannier90.amn : File generated by ...
>
> *Time to read overlaps ..... (sec)*
>
> *------------------------------- DISENTANGLE --------------------------------*
> +----------------------------------------------------------------------------+
> ...
> ----- END OF EXAMPLE OUTPUT -----
>
> You told me that your serial run proceeded to the disentanglement step.
> Could you let me know (1) the time to get kmesh and (2) the time to read
> overlaps (highlighted in red in the example output above) from the W90
> output of your serial run?
>
> Sincerely,
>
> Hyungjun Lee
> UT Austin
>
> On Fri, Feb 26, 2021 at 12:23 PM Jonathan Backman <jback...@iis.ee.ethz.ch>
> wrote:
>
>> Dear Hyungjun Lee,
>>
>> Thank you for the help.
>>
>> I have attached the W90 output file from my calculation. As you can see,
>> it stops printing output after picking Shell.
>>
>> When running the calculation using the serial version, the output also
>> stops at this point for a long time. Then, after a few days, it shows that
>> it has done a few steps of the disentanglement; this, however, never
>> happens for the parallel run.
>>
>> Best,
>>
>> Jonathan
>>
>>
>> On 26/02/2021 18:38, H. Lee wrote:
>>
>> Dear Jonathan Backman:
>>
>> Could you show me your Wannier90 (W90) output with high verbosity so
>> that I can identify the step at which W90 got stuck? In particular, I
>> would like to know whether your run got past the reading of the relevant
>> input matrices, for instance, Mmn.
>>
>> I assume that you use disentanglement.
>> In this case, the largest array is the global complex-valued array
>> m_matrix_orig (declared with the SAVE and PUBLIC attributes). It is
>> allocated only on the ROOT node, with a size (in your case) of
>> 16 x 2688 x 2688 x 8 x 27 = about 25 GB (I assume nntot is 8 in your
>> case).
>>
>> One of the issues in the current implementation is that, even though this
>> matrix is no longer used outside the subroutine overlap_read when
>> gamma_only is false, it is not deallocated immediately after Mmn is read
>> and scattered in overlap_read, which leads to a very large (unbalanced)
>> memory footprint on the ROOT node.
>>
>> I understand that in your case memory might not be a problem, but I
>> would like to confirm it by looking at your W90 output.
>>
>> Sincerely,
>>
>> Hyungjun Lee
>> UT Austin
>>
>> On Fri, Feb 26, 2021 at 4:22 AM Jonathan Backman <jback...@iis.ee.ethz.ch>
>> wrote:
>>
>>> Dear All,
>>>
>>> I'm trying to run Wannier90 using MPI for a large system.
>>>
>>> 2688 Bloch states, 2048 Wannier functions, and 27 k-points (3x3x3 grid).
>>>
>>> AMN file size: 6 GB
>>>
>>> MMN file size: 42 GB
>>>
>>> My system does not run out of memory during the parallel run (512 GB
>>> available).
>>>
>>> When using one MPI process, the calculation progresses, but very slowly
>>> due to the size. However, when running with multiple MPI processes,
>>> the calculation runs but does not progress at all; I have tried waiting
>>> over 2 weeks. I tried different numbers of MPI processes, but I would
>>> assume 27 would be the best since I have 27 k-points.
>>>
>>> Does anyone have experience with the MPI version of the code for large
>>> systems? Are there any specific settings that should be used when running
>>> with MPI?
>>>
>>> Best regards,
>>>
>>> Jonathan Backman, ETH Zürich
>>>
>>>
>>>
>>> _______________________________________________
>>> Wannier mailing list
>>> Wannier@lists.quantum-espresso.org
>>> https://lists.quantum-espresso.org/mailman/listinfo/wannier
>>>
>>
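The size estimate for m_matrix_orig quoted in the thread (16 x 2688 x 2688 x 8 x 27) can be checked with a few lines of Python. This is only a worked version of that arithmetic: the function name is hypothetical, 16 bytes is the size of one double-precision complex number, and the dimensions are those stated in the thread. The estimate assumes nntot = 8, while the output reproduced in the latest reply lists six b_k vectors, so both values are evaluated.

```python
BYTES_PER_COMPLEX = 16  # one double-precision complex number


def m_matrix_orig_bytes(num_bands, nntot, num_kpts):
    """Bytes needed for a num_bands x num_bands x nntot x num_kpts complex array."""
    return BYTES_PER_COMPLEX * num_bands * num_bands * nntot * num_kpts


# 2688 Bloch states and 27 k-points, as stated in the original question.
for nntot in (8, 6):
    size = m_matrix_orig_bytes(2688, nntot, 27)
    print(f"nntot = {nntot}: {size / 1e9:.1f} GB ({size / 2**30:.1f} GiB)")
```

With nntot = 8 this gives the "about 25 GB" figure in decimal units; with nntot = 6 the same array would be proportionally smaller.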