It looks like a compiler/MPI bug, since there is nothing special in your input or in your execution, unless you find evidence that the problem is reproducible with other compiler/MPI versions.

Paolo
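For reference, the "Too many communicators (0/16384 free on this process)" message refers to the MPI library's per-process pool of communicator context IDs, not to any operating-system resource. A minimal sketch (the file name, grid shape, and iteration count are made up for illustration and are not taken from the QE sources) that keeps calling MPI_Cart_sub without freeing the result exhausts the same pool:

    /* leak_comms.c - illustrative only: every MPI_Cart_sub call creates a
     * new communicator; never freeing them exhausts the per-process pool
     * of context IDs, producing a "Too many communicators" abort on
     * MPICH-derived MPIs such as Intel MPI.
     * Build and run (hypothetical): mpicc leak_comms.c -o leak_comms
     *                               mpirun -np 2 ./leak_comms
     */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int nprocs;
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* 1 x nprocs Cartesian grid, similar to a rows/columns process grid */
        int dims[2] = {1, nprocs};
        int periods[2] = {0, 0};
        MPI_Comm cart;
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);

        int remain[2] = {0, 1};   /* keep only the second grid direction */
        for (int i = 0; i < 20000; i++) {
            MPI_Comm sub;
            MPI_Cart_sub(cart, remain, &sub);
            /* no MPI_Comm_free(&sub): after roughly 16K iterations the
             * library has no free context IDs left and aborts */
        }

        MPI_Comm_free(&cart);
        MPI_Finalize();
        return 0;
    }

Freeing each sub-communicator with MPI_Comm_free inside the loop lets the same program run to completion, so if the abort always appears after a fixed number of SCF iterations, counting communicator creations (on the application or the library side) would be a natural first check.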
On Sun, May 15, 2016 at 10:11 AM, Chong Wang <[email protected]> wrote:

> Hi,
>
> Thank you for replying.
>
> More details:
>
> 1. input data:
>
> &control
>    calculation='scf'
>    restart_mode='from_scratch',
>    pseudo_dir = '../pot/',
>    outdir='./out/'
>    prefix='BaTiO3'
> /
> &system
>    nbnd = 48
>    ibrav = 0, nat = 5, ntyp = 3
>    ecutwfc = 50
>    occupations='smearing', smearing='gaussian', degauss=0.02
> /
> &electrons
>    conv_thr = 1.0e-8
> /
> ATOMIC_SPECIES
> Ba 137.327 Ba.pbe-mt_fhi.UPF
> Ti 204.380 Ti.pbe-mt_fhi.UPF
> O  15.999 O.pbe-mt_fhi.UPF
> ATOMIC_POSITIONS
> Ba 0.0000000000000000  0.0000000000000000  0.0000000000000000
> Ti 0.5000000000000000  0.5000000000000000  0.4819999933242795
> O  0.5000000000000000  0.5000000000000000  0.0160000007599592
> O  0.5000000000000000 -0.0000000000000000  0.5149999856948849
> O  0.0000000000000000  0.5000000000000000  0.5149999856948849
> K_POINTS (automatic)
> 11 11 11 0 0 0
> CELL_PARAMETERS {angstrom}
> 3.999800000000001 0.000000000000000 0.000000000000000
> 0.000000000000000 3.999800000000001 0.000000000000000
> 0.000000000000000 0.000000000000000 4.018000000000000
>
> 2. number of processors:
> I tested 24 cores and 8 cores, and both yield the same result.
>
> 3. type of parallelization:
> I am not sure what you mean. I execute pw.x with:
> mpirun -np 24 pw.x < BTO.scf.in >> output
>
> 'which mpirun' output:
> /opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin/mpirun
>
> 4. when the error occurs:
> In the middle of the run. The last few lines of the output are:
>
>   total cpu time spent up to now is 32.9 secs
>
>   total energy              = -105.97885119 Ry
>   Harris-Foulkes estimate   = -105.99394457 Ry
>   estimated scf accuracy    <    0.03479229 Ry
>
>   iteration #  7  ecut= 50.00 Ry  beta=0.70
>   Davidson diagonalization with overlap
>   ethr = 1.45E-04, avg # of iterations = 2.7
>
>   total cpu time spent up to now is 37.3 secs
>
>   total energy              = -105.99039982 Ry
>   Harris-Foulkes estimate   = -105.99025175 Ry
>   estimated scf accuracy    <    0.00927902 Ry
>
>   iteration #  8  ecut= 50.00 Ry  beta=0.70
>   Davidson diagonalization with overlap
>
> 5. Error message:
> Something like:
> Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
> PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3,
> remain_dims=0x7ffc03ae5f38, comm_new=0x7ffc03ae5e90) failed
> PMPI_Cart_sub(178)...................:
> MPIR_Comm_split_impl(270)............:
> MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384
> free on this process; ignore_id=0)
> Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
> PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3,
> remain_dims=0x7ffd10080408, comm_new=0x7ffd10080360) failed
> PMPI_Cart_sub(178)...................:
>
> Cheers!
>
> Chong
>
> ------------------------------
> From: [email protected] <[email protected]> on behalf of Paolo Giannozzi <[email protected]>
> Sent: Sunday, May 15, 2016 3:43 PM
> To: PWSCF Forum
> Subject: Re: [Pw_forum] mpi error using pw.x
>
> Please tell us what is wrong and we will fix it.
>
> Seriously: nobody can answer your question unless you specify, as a strict
> minimum, input data, number of processors and type of parallelization that
> trigger the error, and where the error occurs (at startup, later, in the
> middle of the run, ...).
>
> Paolo
>
> On Sun, May 15, 2016 at 7:50 AM, Chong Wang <[email protected]> wrote:
>
>> I compiled Quantum ESPRESSO 5.4 with Intel MPI and MKL 2016 update 3.
>>
>> However, when I ran pw.x the following errors were reported:
>>
>> ...
>> MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384
>> free on this process; ignore_id=0)
>> Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
>> PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3,
>> remain_dims=0x7ffde1391dd8, comm_new=0x7ffde1391d30) failed
>> PMPI_Cart_sub(178)...................:
>> MPIR_Comm_split_impl(270)............:
>> MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384
>> free on this process; ignore_id=0)
>> Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
>> PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3,
>> remain_dims=0x7ffc02ad7eb8, comm_new=0x7ffc02ad7e10) failed
>> PMPI_Cart_sub(178)...................:
>> MPIR_Comm_split_impl(270)............:
>> MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384
>> free on this process; ignore_id=0)
>> Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
>> PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3,
>> remain_dims=0x7fffb24e60f8, comm_new=0x7fffb24e6050) failed
>> PMPI_Cart_sub(178)...................:
>> MPIR_Comm_split_impl(270)............:
>> MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384
>> free on this process; ignore_id=0)
>>
>> I googled and found that this might be caused by hitting the OS limit on
>> the number of open files. However, after I increased the number of open
>> files per process from 1024 to 40960, the error persists.
>>
>> What's wrong here?
>>
>> Chong Wang
>> Ph.D. candidate
>> Institute for Advanced Study, Tsinghua University, Beijing, 100084
>
> --
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222

--
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222
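The open-files limit raised above ('ulimit -n') is a separate resource from the communicator pool named in the error, which is consistent with the increase having no effect. To confirm that a raised limit actually reaches the MPI ranks, a small stand-alone check (the file name is illustrative) could print the limit a process really runs under:

    /* check_nofile.c - illustrative helper: print the open-file limit the
     * calling process actually runs under.  This is the resource changed
     * by "ulimit -n"; it is unrelated to the MPI communicator count in
     * the error stack above.
     */
    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit rl;
        if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
            perror("getrlimit");
            return 1;
        }
        printf("open files: soft limit %llu, hard limit %llu\n",
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_max);
        return 0;
    }

Launched through the same mpirun used for pw.x, this would show whether the 40960 value is in effect on the compute processes; either way, the "Too many communicators" failure points at communicator usage rather than file descriptors.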
_______________________________________________
Pw_forum mailing list
[email protected]
http://pwscf.org/mailman/listinfo/pw_forum
