You are right, in the sense that now the code just writes
"suboptimal parallelization: some nodes have no k-points"
I'm quite sure I remember the code stopping because it was run with more pools than k-points. Was this changed recently, Paolo?

On 5/20/20 10:34 AM, M.J. Hutcheon wrote:
Dear Lorenzo,

I'm quite sure that the pw code stops if you try to run with more pools than k-points !

This doesn't seem to be the case? I ran a vc-relax and an scf (attached output) with these (terrible) parallelism settings, and they ran just fine.



On 2020-05-20 09:25, Lorenzo Paulatto wrote:

While I am quite sure that such a wasteful parallelization works anyway for the self-consistent code,

I'm quite sure that the pw code stops if you try to run with more pools than k-points !

I am not equally sure it will for the phonon code.

If the ph code does not stop in this case, I'm confident it will not work properly!


It presumably isn't difficult to fix, but I would move to a more sensible parallelization. For 20 k-points and 32 processors, I would try 4 pools of 8 processors (mpirun -np 32 ph.x -nk 4 ...).
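
As a rough illustration of how one might arrive at such a choice, here is a minimal Python sketch that lists the pool counts worth considering. It assumes two constraints: the number of pools divides the number of MPI processes, and there are no more pools than k-points. The function name pool_candidates is hypothetical and not part of QE.

```python
def pool_candidates(nproc, nkpts):
    """List (npool, procs_per_pool, max_kpts_per_pool, balanced) options.

    Assumptions of this sketch (not taken from the QE source code):
    npool must divide nproc, and npool must not exceed nkpts.
    """
    options = []
    for npool in range(1, min(nproc, nkpts) + 1):
        if nproc % npool != 0:
            continue  # pools would not all get the same number of processes
        max_kpts = -(-nkpts // npool)  # ceiling division: busiest pool
        balanced = (nkpts % npool == 0)  # every pool gets the same share
        options.append((npool, nproc // npool, max_kpts, balanced))
    return options


if __name__ == "__main__":
    # The case discussed above: 20 k-points on 32 processes.
    for npool, procs, max_kpts, balanced in pool_candidates(32, 20):
        note = "balanced" if balanced else "uneven"
        print(f"-nk {npool}: {procs} procs/pool, "
              f"up to {max_kpts} k-points/pool ({note})")
```

For 20 k-points on 32 processes, 4 is the largest pool count in this list that still gives every pool the same number of k-points (5 each), which matches the suggestion above.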

On Tue, May 19, 2020 at 2:12 PM M.J. Hutcheon wrote:

    Dear QE users/developers,

    Following from the previous request, I've changed to a newer MPI
    library which gives a little more error information, specifically it
    does now crash with the following message:

    An error occurred in MPI_Allreduce
    reported by process [1564540929,0]
    on communicator MPI COMMUNICATOR 6 SPLIT FROM 3
    MPI_ERR_TRUNCATE: message truncated
    MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
    and potentially your MPI job)

    It appears that this is thrown at the end of a self-consistent DFPT
    calculation (see the attached output file - it appears the final
    iteration has converged). I'm using the development version of QE,
    so I suspect that the error arises from somewhere inside

    I don't really know how to debug/workaround this further, any
    ideas/suggestions would be most welcome.


    Michael Hutcheon

    TCM group, University of Cambridge

    On 2020-05-12 13:29, M.J. Hutcheon wrote:

    Dear QE users/developers,

    I am running an electron-phonon coupling calculation at the gamma
    point for a large-unit-cell calcium hydride (output file
    attached). The calculation appears to get stuck during the DFPT
    stage. It does not crash, or produce any error files/output of any
    sort, or run out of walltime, but the calculation does not
    progress either. I have tried different parameter sets (k-point
    grids + cutoffs), which changes the representation where the
    calculation gets stuck, but it still gets stuck. I don't really
    know what to try next, short of compiling QE in debug mode and
    running under a debugger to see where it gets stuck. Any ideas
    before I head down this laborious route?

    Many thanks,

    Michael Hutcheon

    TCM group, University of Cambridge

    Quantum ESPRESSO is supported by MaX
    users mailing list
-- Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222


Lorenzo Paulatto - Paris