Please report the exact conditions under which you are running the 24-processor case: something like mpirun -np 24 pw.x -nk .. -nd .. -whatever_option
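For illustration, a complete report could look like the sketch below. The values 4 and 1 and the input file name MoTe2_bulk_opt_1.in are placeholders, not taken from the original message; -nk sets the number of k-point pools and -nd the number of processes used for parallel diagonalization. The snippet only builds and prints the command line, so that it can be pasted into a reply as is.

```shell
# Placeholder values: replace each with what was actually used in the failing job.
NP=24                        # total number of MPI processes
NK=4                         # number of k-point pools (hypothetical value)
ND=1                         # processes for parallel diagonalization (hypothetical value)
INPUT=MoTe2_bulk_opt_1.in    # hypothetical input file name

# Assemble the full command line exactly as it would be launched.
CMD="mpirun -np $NP pw.x -nk $NK -nd $ND -i $INPUT"

# Print it instead of running it, so it can be copied into the report.
echo "$CMD"
```

If no -nk or -nd option was given at all, saying so explicitly is just as useful as listing the values.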

Paolo

On Wed, Aug 29, 2018 at 11:49 PM, Martina Lessio <[email protected]> wrote:

> Dear all,
>
> I have been successfully using QE 5.4 for a while now but recently decided
> to install the newest version hoping that some issues I have been
> experiencing with 5.4 would be resolved. However, I now have some issues
> when running version 6.3 in parallel. In particular, if I run a sample
> calculation (input file provided below) on more than 16 processors the
> calculation crashes after printing this line "Starting wfcs are random" and
> the following error message is printed in the output file:
> [compute-0-5.local:5241] *** An error occurred in MPI_Bcast
> [compute-0-5.local:5241] *** on communicator MPI COMMUNICATOR 20 SPLIT
> FROM 18
> [compute-0-5.local:5241] *** MPI_ERR_TRUNCATE: message truncated
> [compute-0-5.local:5241] *** MPI_ERRORS_ARE_FATAL: your MPI job will now
> abort
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 16 with PID 5243 on
> node compute-0-5.local exiting improperly. There are two reasons this
> could occur:
>
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>
> This may have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
> [compute-0-5.local:05226] 1 more process has sent help message
> help-mpi-errors.txt / mpi_errors_are_fatal
> [compute-0-5.local:05226] Set MCA parameter "orte_base_help_aggregate" to
> 0 to see all help / error messages
>
>
> Note that I have been running QE 5.4 on 24 cpu on this same computer
> cluster without any issue. I am copying my input file at the end of this
> email.
>
> Any help with this would be greatly appreciated.
> Thank you in advance.
>
> All the best,
> Martina
>
> Martina Lessio
> Department of Chemistry
> Columbia University
>
> *Input file:*
> &control
> calculation = 'relax'
> restart_mode='from_scratch',
> prefix='MoTe2_bulk_opt_1',
> pseudo_dir = '/home/mlessio/espresso-5.4.0/pseudo/',
> outdir='/home/mlessio/espresso-5.4.0/tempdir/'
> /
> &system
> ibrav= 4, A=3.530, B=3.530, C=13.882, cosAB=-0.5, cosAC=0, cosBC=0,
> nat= 6, ntyp= 2,
> ecutwfc =60.
> occupations='smearing', smearing='gaussian', degauss=0.01
> nspin =1
> /
> &electrons
> mixing_mode = 'plain'
> mixing_beta = 0.7
> conv_thr = 1.0d-10
> /
> &ions
> /
> ATOMIC_SPECIES
> Mo 95.96 Mo_ONCV_PBE_FR-1.0.upf
> Te 127.6 Te_ONCV_PBE_FR-1.1.upf
> ATOMIC_POSITIONS {crystal}
> Te 0.333333334 0.666666643 0.625000034
> Te 0.666666641 0.333333282 0.375000000
> Te 0.666666641 0.333333282 0.125000000
> Te 0.333333334 0.666666643 0.874999966
> Mo 0.333333334 0.666666643 0.250000000
> Mo 0.666666641 0.333333282 0.750000000
>
> K_POINTS {automatic}
> 8 8 2 0 0 0
>
>
> _______________________________________________
> users mailing list
> [email protected]
> https://lists.quantum-espresso.org/mailman/listinfo/users
>

--
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222
