Hi,

Thank you for replying.


More details:


1. input data:

&control
    calculation='scf'
    restart_mode='from_scratch',
    pseudo_dir = '../pot/',
    outdir='./out/'
    prefix='BaTiO3'
/
&system
    nbnd = 48
    ibrav = 0, nat = 5, ntyp = 3
    ecutwfc = 50
    occupations='smearing', smearing='gaussian', degauss=0.02
/
&electrons
    conv_thr = 1.0e-8
/
ATOMIC_SPECIES
 Ba 137.327 Ba.pbe-mt_fhi.UPF
 Ti 204.380 Ti.pbe-mt_fhi.UPF
 O  15.999  O.pbe-mt_fhi.UPF
ATOMIC_POSITIONS
 Ba 0.0000000000000000   0.0000000000000000   0.0000000000000000
 Ti 0.5000000000000000   0.5000000000000000   0.4819999933242795
 O  0.5000000000000000   0.5000000000000000   0.0160000007599592
 O  0.5000000000000000  -0.0000000000000000   0.5149999856948849
 O  0.0000000000000000   0.5000000000000000   0.5149999856948849
K_POINTS (automatic)
11 11 11 0 0 0
CELL_PARAMETERS {angstrom}
3.999800000000001       0.000000000000000       0.000000000000000
0.000000000000000       3.999800000000001       0.000000000000000
0.000000000000000       0.000000000000000       4.018000000000000


2. number of processors:
I tested with 24 cores and with 8 cores; both give the same result.

3. type of parallelization:
I'm not sure what you mean. I run pw.x with:
mpirun  -np 24 pw.x < BTO.scf.in >> output

'which mpirun' output:
/opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin/mpirun
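
For context, "type of parallelization" most likely refers to how pw.x splits the MPI processes into groups (k-point pools, linear-algebra groups, and so on). The command above passes no such options, so pw.x picks its defaults. A minimal sketch of an explicit choice follows, assuming the usual pw.x options -nk (number of k-point pools), -nd (processes used for subspace diagonalization) and -i (input file). The values 4 and 1 are purely illustrative and are not what was run here. The script only prints the command (dry run) instead of launching it:

```shell
#!/bin/sh
# Sketch only: build a pw.x command line with explicit parallelization
# options and print it, rather than starting the job.
NP=24        # total MPI processes, as in the run above
NPOOL=4      # illustrative: number of k-point pools (-nk); must divide NP
NDIAG=1      # illustrative: processes for subspace diagonalization (-nd)

# pw.x expects the process count to be a multiple of the pool count.
if [ $((NP % NPOOL)) -ne 0 ]; then
    echo "NP=$NP is not a multiple of NPOOL=$NPOOL" >&2
    exit 1
fi

CMD="mpirun -np $NP pw.x -nk $NPOOL -nd $NDIAG -i BTO.scf.in"
echo "$CMD"
```

Setting -nd 1 requests serial diagonalization. Whether that has any bearing on the MPI_Cart_sub failure below is an assumption that would have to be tested.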

4. when the error occurs:
In the middle of the run. The last few lines of the output are:
     total cpu time spent up to now is       32.9 secs

     total energy              =    -105.97885119 Ry
     Harris-Foulkes estimate   =    -105.99394457 Ry
     estimated scf accuracy    <       0.03479229 Ry

     iteration #  7     ecut=    50.00 Ry     beta=0.70
     Davidson diagonalization with overlap
     ethr =  1.45E-04,  avg # of iterations =  2.7

     total cpu time spent up to now is       37.3 secs

     total energy              =    -105.99039982 Ry
     Harris-Foulkes estimate   =    -105.99025175 Ry
     estimated scf accuracy    <       0.00927902 Ry

     iteration #  8     ecut=    50.00 Ry     beta=0.70
     Davidson diagonalization with overlap

5. Error message:
Something like:
Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffc03ae5f38, comm_new=0x7ffc03ae5e90) failed
PMPI_Cart_sub(178)...................:
MPIR_Comm_split_impl(270)............:
MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384 free on this process; ignore_id=0)
Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffd10080408, comm_new=0x7ffd10080360) failed
PMPI_Cart_sub(178)...................:

Cheers!

Chong
________________________________
From: [email protected] <[email protected]> on behalf of 
Paolo Giannozzi <[email protected]>
Sent: Sunday, May 15, 2016 3:43 PM
To: PWSCF Forum
Subject: Re: [Pw_forum] mpi error using pw.x

Please tell us what is wrong and we will fix it.

Seriously: nobody can answer your question unless you specify, as a strict 
minimum, input data, number of processors and type of parallelization that 
trigger the error, and where the error occurs (at startup, later, in the middle 
of the run, ...).

Paolo

On Sun, May 15, 2016 at 7:50 AM, Chong Wang <[email protected]> wrote:

I compiled quantum espresso 5.4 with intel mpi and mkl 2016 update 3.

However, when I ran pw.x the following errors were reported:

...
MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384 free on this process; ignore_id=0)
Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffde1391dd8, comm_new=0x7ffde1391d30) failed
PMPI_Cart_sub(178)...................:
MPIR_Comm_split_impl(270)............:
MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384 free on this process; ignore_id=0)
Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffc02ad7eb8, comm_new=0x7ffc02ad7e10) failed
PMPI_Cart_sub(178)...................:
MPIR_Comm_split_impl(270)............:
MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384 free on this process; ignore_id=0)
Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7fffb24e60f8, comm_new=0x7fffb24e6050) failed
PMPI_Cart_sub(178)...................:
MPIR_Comm_split_impl(270)............:
MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384 free on this process; ignore_id=0)


I googled and found that this might be caused by hitting the OS limit on the
number of open files. However, after I increased the number of open files per
process from 1024 to 40960, the error persists.
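
A quick way to confirm which open-file limit is actually in effect is to query it from the shell that launches the job. The sketch below uses only the standard `ulimit` shell builtin. The commented `mpirun` line is an assumption about the local setup: processes started by the launcher may not inherit the limit of the login shell. Note also that the "0/16384 free" in the error text appears to count communicator context IDs inside the MPI library rather than file descriptors, so the file limit may be unrelated to this failure.

```shell
#!/bin/sh
# Print the per-process open-file limits seen by this shell.
SOFT=$(ulimit -n)      # soft limit (the one actually enforced)
HARD=$(ulimit -Hn)     # hard limit (ceiling for the soft limit)
echo "open files: soft=$SOFT hard=$HARD"

# Assumption: an MPI launcher is on the PATH. To see what the MPI processes
# themselves get, ask them directly, for example:
#   mpirun -np 2 sh -c 'ulimit -n'
```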


What's wrong here?


Chong Wang

Ph. D. candidate

Institute for Advanced Study, Tsinghua University, Beijing, 100084

_______________________________________________
Pw_forum mailing list
[email protected]
http://pwscf.org/mailman/listinfo/pw_forum



--
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222
