On Mon, May 16, 2016 at 4:11 AM, Chong Wang <[email protected]> wrote:
> I have checked that my mpif90 calls gfortran, so there's no mix up.

I am not sure it is possible to use gfortran together with intel mpi. If
you have intel mpi and mkl, presumably you have the intel compiler as well.

> Can you kindly share with me your make.sys?

It doesn't make sense to share a make.sys file unless the software
configuration is the same.

Paolo
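(A quick way to run that check, as a sketch: the -show flag below is
supported by the Intel MPI and MPICH-style wrappers, while Open MPI uses
--showme; mpiifort is Intel MPI's ifort-specific wrapper.)

   mpif90 -show        # prints the underlying compiler command line
   mpif90 --version    # an all-Intel build should report ifort, not GNU Fortran
   mpiifort --version  # Intel MPI wrapper that always invokes ifort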
> Thanks in advance!
>
> Best!
>
> Chong Wang
> ------------------------------
> *From:* [email protected] <[email protected]> on behalf
> of Paolo Giannozzi <[email protected]>
> *Sent:* Monday, May 16, 2016 3:10 AM
> *To:* PWSCF Forum
> *Subject:* Re: [Pw_forum] mpi error using pw.x
>
> Your make.sys shows clear signs of a mixup between ifort and gfortran.
> Please verify that mpif90 calls ifort and not gfortran (or vice versa);
> configure issues a warning if this happens.
>
> I have successfully run your test on a machine with a recent intel
> compiler and intel mpi. The second output (run as mpirun -np 18 pw.x -nk
> 18 ...) is an example of what I mean by "type of parallelization": there
> are many different parallelization levels in QE. This one is on k-points
> (and in this case runs faster on fewer processors than parallelization on
> plane waves).
>
> Paolo
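(A sketch of the two run modes mentioned above, reusing the command that
appears later in the thread; -nk, also written -npools, selects k-point
pools in pw.x:)

   mpirun -np 24 pw.x < BTO.scf.in > output          # plane-wave parallelization
   mpirun -np 18 pw.x -nk 18 < BTO.scf.in > output   # one k-point pool per task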
> On Sun, May 15, 2016 at 6:01 PM, Chong Wang <[email protected]> wrote:
>
>> Hi,
>>
>> I have done more tests:
>>
>> 1. intel mpi 2015 yields a segmentation fault
>>
>> 2. intel mpi 2013 yields the same error as reported here
>>
>> Did I do something wrong with compiling? Here's my make.sys:
>>
>> # make.sys. Generated from make.sys.in by configure.
>>
>> # compilation rules
>>
>> .SUFFIXES :
>> .SUFFIXES : .o .c .f .f90
>>
>> # most fortran compilers can directly preprocess c-like directives: use
>> #     $(MPIF90) $(F90FLAGS) -c $<
>> # if explicit preprocessing by the C preprocessor is needed, use:
>> #     $(CPP) $(CPPFLAGS) $< -o $*.F90
>> #     $(MPIF90) $(F90FLAGS) -c $*.F90 -o $*.o
>> # remember the tabulator in the first column !!!
>>
>> .f90.o:
>>         $(MPIF90) $(F90FLAGS) -c $<
>>
>> # .f.o and .c.o: do not modify
>>
>> .f.o:
>>         $(F77) $(FFLAGS) -c $<
>>
>> .c.o:
>>         $(CC) $(CFLAGS) -c $<
>>
>> # Top QE directory, not used in QE but useful for linking QE libs with plugins
>> # The following syntax should always point to TOPDIR:
>> #   $(dir $(abspath $(filter %make.sys,$(MAKEFILE_LIST))))
>>
>> TOPDIR = /home/wangc/temp/espresso-5.4.0
>>
>> # DFLAGS  = precompilation options (possible arguments to -D and -U)
>> #           used by the C compiler and preprocessor
>> # FDFLAGS = as DFLAGS, for the f90 compiler
>> # See include/defs.h.README for a list of options and their meaning
>> # With the exception of IBM xlf, FDFLAGS = $(DFLAGS)
>> # For IBM xlf, FDFLAGS is the same as DFLAGS with separating commas
>>
>> # MANUAL_DFLAGS = additional precompilation option(s), if desired
>> #                 BEWARE: it does not work for IBM xlf! Manually edit FDFLAGS
>> MANUAL_DFLAGS  =
>> DFLAGS         = -D__GFORTRAN -D__STD_F95 -D__DFTI -D__MPI -D__PARA -D__SCALAPACK
>> FDFLAGS        = $(DFLAGS) $(MANUAL_DFLAGS)
>>
>> # IFLAGS = how to locate directories with *.h or *.f90 file to be included
>> #          typically -I../include -I/some/other/directory/
>> #          the latter contains e.g. files needed by FFT libraries
>>
>> IFLAGS         = -I../include -I/opt/intel/compilers_and_libraries_2016.3.210/linux/mkl/include
>>
>> # MOD_FLAG = flag used by f90 compiler to locate modules
>> # Each Makefile defines the list of needed modules in MODFLAGS
>>
>> MOD_FLAG       = -I
>>
>> # Compilers: fortran-90, fortran-77, C
>> # If a parallel compilation is desired, MPIF90 should be a fortran-90
>> # compiler that produces executables for parallel execution using MPI
>> # (such as for instance mpif90, mpf90, mpxlf90,...);
>> # otherwise, an ordinary fortran-90 compiler (f90, g95, xlf90, ifort,...)
>> # If you have a parallel machine but no suitable candidate for MPIF90,
>> # try to specify the directory containing "mpif.h" in IFLAGS
>> # and to specify the location of MPI libraries in MPI_LIBS
>>
>> MPIF90         = mpif90
>> #F90           = gfortran
>> CC             = cc
>> F77            = gfortran
>>
>> # C preprocessor and preprocessing flags - for explicit preprocessing,
>> # if needed (see the compilation rules above)
>> # preprocessing flags must include DFLAGS and IFLAGS
>>
>> CPP            = cpp
>> CPPFLAGS       = -P -C -traditional $(DFLAGS) $(IFLAGS)
>>
>> # compiler flags: C, F90, F77
>> # C flags must include DFLAGS and IFLAGS
>> # F90 flags must include MODFLAGS, IFLAGS, and FDFLAGS with appropriate syntax
>>
>> CFLAGS         = -O3 $(DFLAGS) $(IFLAGS)
>> F90FLAGS       = $(FFLAGS) -x f95-cpp-input $(FDFLAGS) $(IFLAGS) $(MODFLAGS)
>> FFLAGS         = -O3 -g
>>
>> # compiler flags without optimization for fortran-77
>> # the latter is NEEDED to properly compile dlamch.f, used by lapack
>>
>> FFLAGS_NOOPT   = -O0 -g
>>
>> # compiler flag needed by some compilers when the main program is not fortran
>> # Currently used for Yambo
>>
>> FFLAGS_NOMAIN  =
>>
>> # Linker, linker-specific flags (if any)
>> # Typically LD coincides with F90 or MPIF90, LD_LIBS is empty
>>
>> LD             = mpif90
>> LDFLAGS        = -g -pthread
>> LD_LIBS        =
>>
>> # External Libraries (if any) : blas, lapack, fft, MPI
>>
>> # If you have nothing better, use the local copy :
>> # BLAS_LIBS = /your/path/to/espresso/BLAS/blas.a
>> # BLAS_LIBS_SWITCH = internal
>>
>> BLAS_LIBS      = -lmkl_gf_lp64 -lmkl_sequential -lmkl_core
>> BLAS_LIBS_SWITCH = external
>>
>> # If you have nothing better, use the local copy :
>> # LAPACK_LIBS = /your/path/to/espresso/lapack-3.2/lapack.a
>> # LAPACK_LIBS_SWITCH = internal
>> # For IBM machines with essl (-D__ESSL): load essl BEFORE lapack !
>> # remember that LAPACK_LIBS precedes BLAS_LIBS in loading order
>>
>> LAPACK_LIBS    =
>> LAPACK_LIBS_SWITCH = external
>>
>> ELPA_LIBS_SWITCH = disabled
>> SCALAPACK_LIBS = -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64
>>
>> # nothing needed here if the internal copy of FFTW is compiled
>> # (needs -D__FFTW in DFLAGS)
>>
>> FFT_LIBS       =
>>
>> # For parallel execution, the correct path to MPI libraries must
>> # be specified in MPI_LIBS (except for IBM if you use mpxlf)
>>
>> MPI_LIBS       =
>>
>> # IBM-specific: MASS libraries, if available and if -D__MASS is defined in FDFLAGS
>>
>> MASS_LIBS      =
>>
>> # ar command and flags - for most architectures: AR = ar, ARFLAGS = ruv
>>
>> AR             = ar
>> ARFLAGS        = ruv
>>
>> # ranlib command. If ranlib is not needed (it isn't in most cases) use
>> # RANLIB = echo
>>
>> RANLIB         = ranlib
>>
>> # all internal and external libraries - do not modify
>>
>> FLIB_TARGETS   = all
>>
>> LIBOBJS        = ../clib/clib.a ../iotk/src/libiotk.a
>> LIBS           = $(SCALAPACK_LIBS) $(LAPACK_LIBS) $(FFT_LIBS) $(BLAS_LIBS) $(MPI_LIBS) $(MASS_LIBS) $(LD_LIBS)
>>
>> # wget or curl - useful to download from network
>>
>> WGET = wget -O
>>
>> # Install directory - not currently used
>>
>> PREFIX = /usr/local
>>
>> Cheers!
>>
>> Chong Wang
>> ------------------------------
>> *From:* [email protected] <[email protected]> on
>> behalf of Paolo Giannozzi <[email protected]>
>> *Sent:* Sunday, May 15, 2016 8:28:26 PM
>> *To:* PWSCF Forum
>> *Subject:* Re: [Pw_forum] mpi error using pw.x
>>
>> It looks like a compiler/mpi bug, since there is nothing special in your
>> input or in your execution, unless you find evidence that the problem is
>> reproducible with other compiler/mpi versions.
>>
>> Paolo
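(One detail worth noting in the make.sys above, as an aside: MKL's Fortran
interface layer has to match the compiler, and its BLACS layer has to match
the MPI library. The -lmkl_gf_lp64 in BLAS_LIBS is the gfortran interface,
while -lmkl_blacs_intelmpi_lp64 assumes Intel MPI. A sketch of the
corresponding all-Intel settings, assuming ifort is used throughout:)

   BLAS_LIBS      = -lmkl_intel_lp64 -lmkl_sequential -lmkl_core
   SCALAPACK_LIBS = -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64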
>> On Sun, May 15, 2016 at 10:11 AM, Chong Wang <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> Thank you for replying.
>>>
>>> More details:
>>>
>>> 1. input data:
>>>
>>> &control
>>>    calculation='scf'
>>>    restart_mode='from_scratch',
>>>    pseudo_dir = '../pot/',
>>>    outdir='./out/'
>>>    prefix='BaTiO3'
>>> /
>>> &system
>>>    nbnd = 48
>>>    ibrav = 0, nat = 5, ntyp = 3
>>>    ecutwfc = 50
>>>    occupations='smearing', smearing='gaussian', degauss=0.02
>>> /
>>> &electrons
>>>    conv_thr = 1.0e-8
>>> /
>>> ATOMIC_SPECIES
>>> Ba 137.327 Ba.pbe-mt_fhi.UPF
>>> Ti 204.380 Ti.pbe-mt_fhi.UPF
>>> O   15.999 O.pbe-mt_fhi.UPF
>>> ATOMIC_POSITIONS
>>> Ba 0.0000000000000000  0.0000000000000000 0.0000000000000000
>>> Ti 0.5000000000000000  0.5000000000000000 0.4819999933242795
>>> O  0.5000000000000000  0.5000000000000000 0.0160000007599592
>>> O  0.5000000000000000 -0.0000000000000000 0.5149999856948849
>>> O  0.0000000000000000  0.5000000000000000 0.5149999856948849
>>> K_POINTS (automatic)
>>> 11 11 11 0 0 0
>>> CELL_PARAMETERS {angstrom}
>>> 3.999800000000001 0.000000000000000 0.000000000000000
>>> 0.000000000000000 3.999800000000001 0.000000000000000
>>> 0.000000000000000 0.000000000000000 4.018000000000000
>>>
>>> 2. number of processors:
>>> I tested 24 cores and 8 cores, and both yield the same result.
>>>
>>> 3. type of parallelization:
>>> I don't know what you mean. I execute pw.x by:
>>> mpirun -np 24 pw.x < BTO.scf.in >> output
>>>
>>> 'which mpirun' outputs:
>>> /opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin/mpirun
>>>
>>> 4. when the error occurs:
>>> in the middle of the run. The last few lines of the output are:
>>>
>>>      total cpu time spent up to now is 32.9 secs
>>>
>>>      total energy              = -105.97885119 Ry
>>>      Harris-Foulkes estimate   = -105.99394457 Ry
>>>      estimated scf accuracy    <    0.03479229 Ry
>>>
>>>      iteration #  7   ecut= 50.00 Ry   beta=0.70
>>>      Davidson diagonalization with overlap
>>>      ethr = 1.45E-04, avg # of iterations = 2.7
>>>
>>>      total cpu time spent up to now is 37.3 secs
>>>
>>>      total energy              = -105.99039982 Ry
>>>      Harris-Foulkes estimate   = -105.99025175 Ry
>>>      estimated scf accuracy    <    0.00927902 Ry
>>>
>>>      iteration #  8   ecut= 50.00 Ry   beta=0.70
>>>      Davidson diagonalization with overlap
>>>
>>> 5. Error message:
>>> Something like:
>>>
>>> Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
>>> PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3,
>>> remain_dims=0x7ffc03ae5f38, comm_new=0x7ffc03ae5e90) failed
>>> PMPI_Cart_sub(178)...................:
>>> MPIR_Comm_split_impl(270)............:
>>> MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384
>>> free on this process; ignore_id=0)
>>> Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
>>> PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3,
>>> remain_dims=0x7ffd10080408, comm_new=0x7ffd10080360) failed
>>> PMPI_Cart_sub(178)...................:
>>>
>>> Cheers!
>>>
>>> Chong
>>> ------------------------------
>>> *From:* [email protected] <[email protected]> on
>>> behalf of Paolo Giannozzi <[email protected]>
>>> *Sent:* Sunday, May 15, 2016 3:43 PM
>>> *To:* PWSCF Forum
>>> *Subject:* Re: [Pw_forum] mpi error using pw.x
>>>
>>> Please tell us what is wrong and we will fix it.
>>>
>>> Seriously: nobody can answer your question unless you specify, as a
>>> strict minimum, the input data, the number of processors, and the type
>>> of parallelization that trigger the error, and where the error occurs
>>> (at startup, later, in the middle of the run, ...).
>>>
>>> Paolo
>>>
>>> On Sun, May 15, 2016 at 7:50 AM, Chong Wang <[email protected]> wrote:
>>>
>>>> I compiled quantum espresso 5.4 with intel mpi and mkl 2016 update 3.
>>>>
>>>> However, when I ran pw.x the following errors were reported:
>>>>
>>>> ...
>>>> MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384
>>>> free on this process; ignore_id=0)
>>>> Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
>>>> PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3,
>>>> remain_dims=0x7ffde1391dd8, comm_new=0x7ffde1391d30) failed
>>>> PMPI_Cart_sub(178)...................:
>>>> MPIR_Comm_split_impl(270)............:
>>>> MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384
>>>> free on this process; ignore_id=0)
>>>> Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
>>>> PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3,
>>>> remain_dims=0x7ffc02ad7eb8, comm_new=0x7ffc02ad7e10) failed
>>>> PMPI_Cart_sub(178)...................:
>>>> MPIR_Comm_split_impl(270)............:
>>>> MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384
>>>> free on this process; ignore_id=0)
>>>> Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
>>>> PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3,
>>>> remain_dims=0x7fffb24e60f8, comm_new=0x7fffb24e6050) failed
>>>> PMPI_Cart_sub(178)...................:
>>>> MPIR_Comm_split_impl(270)............:
>>>> MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384
>>>> free on this process; ignore_id=0)
>>>>
>>>> I googled and found that this might be caused by hitting the OS limit
>>>> on the number of open files. However, after I increased the number of
>>>> open files per process from 1024 to 40960, the error persists.
>>>>
>>>> What's wrong here?
>>>>
>>>> Chong Wang
>>>> Ph. D. candidate
>>>> Institute for Advanced Study, Tsinghua University, Beijing, 100084
--
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222
_______________________________________________
Pw_forum mailing list
[email protected]
http://pwscf.org/mailman/listinfo/pw_forum
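For readers hitting the same "Too many communicators" message: MPICH-derived
MPI implementations (Intel MPI among them) draw communicator context IDs from
a fixed pool of 16384, so a communicator that is created repeatedly (for
example once per SCF iteration) and never freed exhausts the pool in the
middle of a run. A minimal standalone sketch of that failure mode, not QE
code; the loop bound and the 2D grid shape are arbitrary choices:

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int nprocs;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Build a 2D process grid, as a ScaLAPACK-style code would. */
    int dims[2] = {0, 0}, periods[2] = {0, 0};
    MPI_Dims_create(nprocs, 2, dims);

    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);

    for (int i = 0; i < 20000; i++) {      /* stand-in for repeated iterations */
        int remain[2] = {1, 0};            /* keep rows, drop columns */
        MPI_Comm row;
        MPI_Cart_sub(cart, remain, &row);  /* each call consumes a context ID */
        MPI_Comm_free(&row);               /* comment this out to reproduce
                                              "Too many communicators" after
                                              roughly 16k iterations */
    }

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}

This reproduces the error stack quoted above (PMPI_Cart_sub down to
MPIR_Get_contextid_sparse_group), and it also shows why raising the
open-files ulimit did not help: the exhausted resource is internal to the
MPI library, not an OS file limit.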
