Dear All,

I have found that calculations involving band group parallelism that worked correctly using QE 5.2.0 produce errors in version 5.3.0 (see below for an example input file). In particular, when I run a PBE0 calculation with either nbgrp or ndiag set to 1, everything runs correctly; however, when I run a calculation with both nbgrp and ndiag set to values greater than 1, the calculation immediately fails with the following error messages:

Rank 48 [Mon Jan 25 09:52:04 2016] [c0-0c0s14n2] Fatal error in PMPI_Group_incl: Invalid rank, error stack:
PMPI_Group_incl(173).............: MPI_Group_incl(group=0x88000002, n=36, ranks=0x53a3c80, new_group=0x7fffffff6794) failed
MPIR_Group_check_valid_ranks(259): Duplicate ranks in rank array at index 12, has value 0 which is also the value at index 0
Rank 93 [Mon Jan 25 09:52:04 2016] [c0-0c0s14n3] Fatal error in PMPI_Group_incl: Invalid rank, error stack:
PMPI_Group_incl(173).............: MPI_Group_incl(group=0x88000002, n=36, ranks=0x538fdf0, new_group=0x7fffffff6794) failed
MPIR_Group_check_valid_ranks(259): Duplicate ranks in rank array at index 12, has value 0 which is also the value at index 0
etc...

The error is apparently related to a change in Modules/mp_global.f90 on line 80. Here, the line previously read:

CALL mp_start_diag ( ndiag_, intra_BGRP_comm )

In QE 5.3.0, this has been changed to:

CALL mp_start_diag ( ndiag_, intra_POOL_comm )

The call using intra_BGRP_comm still exists in version 5.3.0 of the code, but is commented out, and the surrounding comments indicate that it should be possible to switch back to the old parallelization by commenting/uncommenting as desired. When I do this, I find that instead of the error messages described above, I get the following error messages:

Error in routine cdiaghg(193):
problems computing cholesky

Am I missing something, or are these errors the result of a bug?

Best Regards,

Dr. Taylor Barnes,
Lawrence Berkeley National Laboratory

=================
Run Command:
=================

srun -n 96 pw.x -nbgrp 4 -in input > input.out

=================
Input File:
=================

&control
prefix = 'water'
calculation = 'scf'
restart_mode = 'from_scratch'
wf_collect = .true.
disk_io = 'none'
tstress = .false.
tprnfor = .false.
outdir = './'
wfcdir = './'
pseudo_dir = '/global/homes/t/tabarnes/espresso/pseudo'
/
&system
ibrav = 1
celldm(1) = 15.249332837
nat = 48
ntyp = 2
ecutwfc = 130
input_dft = 'pbe0'
/
&electrons
diago_thr_init=5.0d-4
mixing_mode = 'plain'
mixing_beta = 0.7
mixing_ndim = 8
diagonalization = 'david'
diago_david_ndim = 4
diago_full_acc = .true.
electron_maxstep=3
scf_must_converge=.false.
/
ATOMIC_SPECIES
O 15.999 O.pbe-mt_fhi.UPF
H 1.008 H.pbe-mt_fhi.UPF
ATOMIC_POSITIONS alat
O 0.405369 0.567356 0.442192
H 0.471865 0.482160 0.381557
H 0.442867 0.572759 0.560178
O 0.584679 0.262476 0.215740
H 0.689058 0.204790 0.249459
H 0.503275 0.179176 0.173433
O 0.613936 0.468084 0.701359
H 0.720162 0.421081 0.658182
H 0.629377 0.503798 0.819016
O 0.692499 0.571474 0.008796
H 0.815865 0.562339 0.016182
H 0.640331 0.489132 0.085318
O 0.138542 0.767947 0.322270
H 0.052664 0.771819 0.411531
H 0.239736 0.710419 0.364788
O 0.127282 0.623278 0.765792
H 0.075781 0.693268 0.677441
H 0.243000 0.662182 0.787094
O 0.572799 0.844477 0.542529
H 0.556579 0.966998 0.533420
H 0.548297 0.791340 0.433292
O -0.007677 0.992860 0.095967
H 0.064148 1.011844 -0.003219
H 0.048026 0.913005 0.172625
O 0.035337 0.547318 0.085085
H 0.072732 0.625835 0.173379
H 0.089917 0.576762 -0.022194
O 0.666008 0.900155 0.183677
H 0.773299 0.937456 0.134145
H 0.609289 0.822407 0.105606
O 0.443447 0.737755 0.836152
H 0.526041 0.665651 0.893906
H 0.483300 0.762549 0.721464
O 0.934493 0.378765 0.627850
H 1.012721 0.449242 0.693201
H 0.955703 0.394823 0.506816
O 0.006386 0.270244 0.269327
H 0.021231 0.364797 0.190612
H 0.021863 0.163251 0.208755
O 0.936337 0.855942 0.611999
H 0.956610 0.972475 0.648965
H 0.815045 0.839173 0.592915
O 0.228881 0.037509 0.849634
H 0.263938 0.065862 0.734213
H 0.282576 -0.068680 0.884220
O 0.346187 0.176679 0.553828
H 0.247521 0.218347 0.491489
H 0.402671 0.271609 0.610010
K_POINTS automatic
1 1 1 1 1 1
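
=================
Note on the MPI error (illustrative sketch):
=================

For anyone trying to interpret the first error: MPI_Group_incl requires that the ranks passed to it be distinct, and the message reports that entry 12 of a 36-entry rank array repeats the value already stored at entry 0. The Python sketch below only illustrates that check and the process counts involved. It is not taken from the QE sources: the helper name first_duplicate_rank and the array hypothetical_ranks are made up for illustration, the array is chosen purely to reproduce the reported index/value pattern, and the band-group size assumes a single image and a single pool (no -nimage or -npool given on the command line).

```python
def first_duplicate_rank(ranks):
    """Return (index, value, first_index) for the first repeated rank, or None.

    Mirrors the requirement that the rank array given to MPI_Group_incl
    contains no repeated entries. Hypothetical helper, not part of QE or MPI.
    """
    seen = {}  # rank value -> index where it first appeared
    for index, value in enumerate(ranks):
        if value in seen:
            return index, value, seen[value]
        seen[value] = index
    return None


# Process counts taken from the run command and the error message.
nproc = 96          # srun -n 96
nbgrp = 4           # pw.x -nbgrp 4
n_requested = 36    # "n=36" in the MPI_Group_incl error stack

# Assumption: one image and one pool, so each band group holds nproc / nbgrp ranks.
ranks_per_band_group = nproc // nbgrp

# Made-up rank array, chosen only so that index 12 repeats the value at index 0,
# as in the reported message. The real array built by QE is not known here.
hypothetical_ranks = [i % 12 for i in range(n_requested)]

duplicate = first_duplicate_rank(hypothetical_ranks)

print("ranks per band group:", ranks_per_band_group)
print("ranks requested for the new group:", n_requested)
if duplicate is not None:
    index, value, first_index = duplicate
    print(f"duplicate at index {index}: value {value} also at index {first_index}")
```

Under these assumptions each band group contains 24 processes, while the failing call asks for a group of 36 ranks. Whether that mismatch is the actual cause of the duplicate entries is not established here; the sketch only makes the numbers in the error message easier to compare.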
_______________________________________________ Pw_forum mailing list [email protected] http://pwscf.org/mailman/listinfo/pw_forum
