On Mon, May 16, 2016 at 4:11 AM, Chong Wang <[email protected]> wrote:

> I have checked: my mpif90 calls gfortran, so there's no mix-up.

I am not sure it is possible to use gfortran together with Intel MPI. If you have Intel MPI and MKL, presumably you have the Intel compiler as well.

> Can you kindly share with me your make.sys?

It doesn't make sense to share a make.sys file unless the software configuration is the same.

Paolo

> Thanks in advance!
> Best!
> Chong Wang
------------------------------------------------------------------------
*From:* [email protected]
<mailto:[email protected]> <[email protected]
<mailto:[email protected]>> on behalf of Paolo Giannozzi
<[email protected] <mailto:[email protected]>>
*Sent:* Monday, May 16, 2016 3:10 AM
*To:* PWSCF Forum
*Subject:* Re: [Pw_forum] mpi error using pw.x
Your make.sys shows clear signs of a mixup between ifort and gfortran. Please verify that mpif90 calls ifort and not gfortran (or vice versa). Configure issues a warning if this happens.
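A quick way to check which backend compiler the wrapper actually invokes is sketched below (an assumption here is that mpif90 is Intel MPI's or MPICH's wrapper, which accepts the -show option):

```shell
# Print the command mpif90 would run, without compiling anything
# (-show is supported by the Intel MPI and MPICH wrapper scripts).
mpif90 -show 2>/dev/null || echo "mpif90 not on PATH"
# The backend's own version banner tells you whether it is ifort or gfortran.
mpif90 --version 2>/dev/null | head -n 1 || true
```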
I have successfully run your test on a machine with a recent Intel compiler and Intel MPI. The second output (run as mpirun -np 18 pw.x -nk 18 ...) is an example of what I mean by "type of parallelization": there are many different parallelization levels in QE. This one is over k-points (and in this case runs faster on fewer processors than parallelization over plane waves).
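For reference, the two invocations could look like the sketch below (commands are echoed rather than executed; the input file name is the one used later in this thread, and pw.x is assumed to be on PATH):

```shell
# Illustrative only: the same 18-rank job with the default plane-wave
# parallelization, and with -nk 18, which creates 18 k-point pools of
# one rank each so that k-points are diagonalized in parallel.
PW="mpirun -np 18 pw.x"
echo "$PW -in BTO.scf.in"          # default: plane-wave parallelization
echo "$PW -nk 18 -in BTO.scf.in"   # parallelization over k-points
```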
Paolo
On Sun, May 15, 2016 at 6:01 PM, Chong Wang <[email protected]> wrote:
Hi,
I have done more test:
1. Intel MPI 2015 yields a segmentation fault
2. Intel MPI 2013 yields the same error as reported here
Did I do something wrong with compiling? Here's my make.sys:
# make.sys. Generated from make.sys.in by configure.

# compilation rules

.SUFFIXES :
.SUFFIXES : .o .c .f .f90

# most fortran compilers can directly preprocess c-like directives: use
#	$(MPIF90) $(F90FLAGS) -c $<
# if explicit preprocessing by the C preprocessor is needed, use:
#	$(CPP) $(CPPFLAGS) $< -o $*.F90
#	$(MPIF90) $(F90FLAGS) -c $*.F90 -o $*.o
# remember the tabulator in the first column !!!

.f90.o:
	$(MPIF90) $(F90FLAGS) -c $<

# .f.o and .c.o: do not modify

.f.o:
	$(F77) $(FFLAGS) -c $<

.c.o:
	$(CC) $(CFLAGS) -c $<

# Top QE directory, not used in QE but useful for linking QE libs with plugins
# The following syntax should always point to TOPDIR:
#   $(dir $(abspath $(filter %make.sys,$(MAKEFILE_LIST))))
TOPDIR = /home/wangc/temp/espresso-5.4.0

# DFLAGS  = precompilation options (possible arguments to -D and -U)
#           used by the C compiler and preprocessor
# FDFLAGS = as DFLAGS, for the f90 compiler
# See include/defs.h.README for a list of options and their meaning
# With the exception of IBM xlf, FDFLAGS = $(DFLAGS)
# For IBM xlf, FDFLAGS is the same as DFLAGS with separating commas

# MANUAL_DFLAGS = additional precompilation option(s), if desired
#                 BEWARE: it does not work for IBM xlf! Manually edit FDFLAGS
MANUAL_DFLAGS  =
DFLAGS         = -D__GFORTRAN -D__STD_F95 -D__DFTI -D__MPI -D__PARA -D__SCALAPACK
FDFLAGS        = $(DFLAGS) $(MANUAL_DFLAGS)

# IFLAGS = how to locate directories with *.h or *.f90 file to be included
#          typically -I../include -I/some/other/directory/
#          the latter contains e.g. files needed by FFT libraries
IFLAGS         = -I../include -I/opt/intel/compilers_and_libraries_2016.3.210/linux/mkl/include

# MOD_FLAG = flag used by f90 compiler to locate modules
# Each Makefile defines the list of needed modules in MODFLAGS
MOD_FLAG       = -I

# Compilers: fortran-90, fortran-77, C
# If a parallel compilation is desired, MPIF90 should be a fortran-90
# compiler that produces executables for parallel execution using MPI
# (such as for instance mpif90, mpf90, mpxlf90,...);
# otherwise, an ordinary fortran-90 compiler (f90, g95, xlf90, ifort,...)
# If you have a parallel machine but no suitable candidate for MPIF90,
# try to specify the directory containing "mpif.h" in IFLAGS
# and to specify the location of MPI libraries in MPI_LIBS
MPIF90         = mpif90
#F90           = gfortran
CC             = cc
F77            = gfortran

# C preprocessor and preprocessing flags - for explicit preprocessing,
# if needed (see the compilation rules above)
# preprocessing flags must include DFLAGS and IFLAGS
CPP            = cpp
CPPFLAGS       = -P -C -traditional $(DFLAGS) $(IFLAGS)

# compiler flags: C, F90, F77
# C flags must include DFLAGS and IFLAGS
# F90 flags must include MODFLAGS, IFLAGS, and FDFLAGS with appropriate syntax
CFLAGS         = -O3 $(DFLAGS) $(IFLAGS)
F90FLAGS       = $(FFLAGS) -x f95-cpp-input $(FDFLAGS) $(IFLAGS) $(MODFLAGS)
FFLAGS         = -O3 -g

# compiler flags without optimization for fortran-77
# the latter is NEEDED to properly compile dlamch.f, used by lapack
FFLAGS_NOOPT   = -O0 -g

# compiler flag needed by some compilers when the main program is not fortran
# Currently used for Yambo
FFLAGS_NOMAIN  =

# Linker, linker-specific flags (if any)
# Typically LD coincides with F90 or MPIF90, LD_LIBS is empty
LD             = mpif90
LDFLAGS        = -g -pthread
LD_LIBS        =

# External Libraries (if any) : blas, lapack, fft, MPI
# If you have nothing better, use the local copy :
# BLAS_LIBS = /your/path/to/espresso/BLAS/blas.a
# BLAS_LIBS_SWITCH = internal
BLAS_LIBS      = -lmkl_gf_lp64 -lmkl_sequential -lmkl_core
BLAS_LIBS_SWITCH = external

# If you have nothing better, use the local copy :
# LAPACK_LIBS = /your/path/to/espresso/lapack-3.2/lapack.a
# LAPACK_LIBS_SWITCH = internal
# For IBM machines with essl (-D__ESSL): load essl BEFORE lapack !
# remember that LAPACK_LIBS precedes BLAS_LIBS in loading order
LAPACK_LIBS    =
LAPACK_LIBS_SWITCH = external

ELPA_LIBS_SWITCH = disabled
SCALAPACK_LIBS = -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64

# nothing needed here if the internal copy of FFTW is compiled
# (needs -D__FFTW in DFLAGS)
FFT_LIBS       =

# For parallel execution, the correct path to MPI libraries must
# be specified in MPI_LIBS (except for IBM if you use mpxlf)
MPI_LIBS       =

# IBM-specific: MASS libraries, if available and if -D__MASS is defined in FDFLAGS
MASS_LIBS      =

# ar command and flags - for most architectures: AR = ar, ARFLAGS = ruv
AR             = ar
ARFLAGS        = ruv

# ranlib command. If ranlib is not needed (it isn't in most cases) use
# RANLIB = echo
RANLIB         = ranlib

# all internal and external libraries - do not modify
FLIB_TARGETS   = all
LIBOBJS        = ../clib/clib.a ../iotk/src/libiotk.a
LIBS           = $(SCALAPACK_LIBS) $(LAPACK_LIBS) $(FFT_LIBS) $(BLAS_LIBS) $(MPI_LIBS) $(MASS_LIBS) $(LD_LIBS)

# wget or curl - useful to download from network
WGET = wget -O

# Install directory - not currently used
PREFIX = /usr/local
Cheers!
Chong Wang
------------------------------------------------------------------------
*From:* [email protected]
<mailto:[email protected]>
<[email protected]
<mailto:[email protected]>> on behalf of Paolo
Giannozzi <[email protected] <mailto:[email protected]>>
*Sent:* Sunday, May 15, 2016 8:28:26 PM
*To:* PWSCF Forum
*Subject:* Re: [Pw_forum] mpi error using pw.x
It looks like a compiler/MPI bug, since there is nothing special in your input or in your execution, unless you find evidence that the problem is reproducible with other compiler/MPI versions.
Paolo
On Sun, May 15, 2016 at 10:11 AM, Chong Wang <[email protected]> wrote:
Hi,
Thank you for replying.
More details:
1. input data:
&control
   calculation='scf'
   restart_mode='from_scratch',
   pseudo_dir = '../pot/',
   outdir='./out/'
   prefix='BaTiO3'
/
&system
   nbnd = 48
   ibrav = 0, nat = 5, ntyp = 3
   ecutwfc = 50
   occupations='smearing', smearing='gaussian', degauss=0.02
/
&electrons
   conv_thr = 1.0e-8
/
ATOMIC_SPECIES
Ba 137.327 Ba.pbe-mt_fhi.UPF
Ti 204.380 Ti.pbe-mt_fhi.UPF
O   15.999 O.pbe-mt_fhi.UPF
ATOMIC_POSITIONS
Ba 0.0000000000000000  0.0000000000000000 0.0000000000000000
Ti 0.5000000000000000  0.5000000000000000 0.4819999933242795
O  0.5000000000000000  0.5000000000000000 0.0160000007599592
O  0.5000000000000000 -0.0000000000000000 0.5149999856948849
O  0.0000000000000000  0.5000000000000000 0.5149999856948849
K_POINTS (automatic)
11 11 11 0 0 0
CELL_PARAMETERS {angstrom}
3.999800000000001 0.000000000000000 0.000000000000000
0.000000000000000 3.999800000000001 0.000000000000000
0.000000000000000 0.000000000000000 4.018000000000000
2. number of processors:
I tested 24 cores and 8 cores, and both yield the same result.
3. type of parallelization:
I am not sure what you mean. I execute pw.x with:
mpirun -np 24 pw.x < BTO.scf.in >> output
'which mpirun' outputs:
/opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin/mpirun
4. when the error occurs:
In the middle of the run. The last few lines of the output are:
total cpu time spent up to now is 32.9 secs
total energy = -105.97885119 Ry
Harris-Foulkes estimate = -105.99394457 Ry
estimated scf accuracy < 0.03479229 Ry
iteration # 7 ecut= 50.00 Ry beta=0.70
Davidson diagonalization with overlap
ethr = 1.45E-04, avg # of iterations = 2.7
total cpu time spent up to now is 37.3 secs
total energy = -105.99039982 Ry
Harris-Foulkes estimate = -105.99025175 Ry
estimated scf accuracy < 0.00927902 Ry
iteration # 8 ecut= 50.00 Ry beta=0.70
Davidson diagonalization with overlap
5. Error message:
Something like:
Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
PMPI_Cart_sub(242)...................:
MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffc03ae5f38,
comm_new=0x7ffc03ae5e90) failed
PMPI_Cart_sub(178)...................:
MPIR_Comm_split_impl(270)............:
MPIR_Get_contextid_sparse_group(1330): Too many
communicators (0/16384 free on this process; ignore_id=0)
Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
PMPI_Cart_sub(242)...................:
MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffd10080408,
comm_new=0x7ffd10080360) failed
PMPI_Cart_sub(178)...................:
Cheers!
Chong
------------------------------------------------------------------------
*From:* [email protected]
<mailto:[email protected]>
<[email protected]
<mailto:[email protected]>> on behalf of Paolo
Giannozzi <[email protected]
<mailto:[email protected]>>
*Sent:* Sunday, May 15, 2016 3:43 PM
*To:* PWSCF Forum
*Subject:* Re: [Pw_forum] mpi error using pw.x
Please tell us what is wrong and we will fix it.
Seriously: nobody can answer your question unless you
specify, as a strict minimum, input data, number of
processors and type of parallelization that trigger the
error, and where the error occurs (at startup, later, in
the middle of the run, ...).
Paolo
On Sun, May 15, 2016 at 7:50 AM, Chong Wang
<[email protected] <mailto:[email protected]>> wrote:
I compiled Quantum ESPRESSO 5.4 with Intel MPI and MKL 2016 Update 3.
However, when I ran pw.x the following errors were
reported:
...
MPIR_Get_contextid_sparse_group(1330): Too many
communicators (0/16384 free on this process; ignore_id=0)
Fatal error in PMPI_Cart_sub: Other MPI error, error
stack:
PMPI_Cart_sub(242)...................:
MPI_Cart_sub(comm=0xc400fcf3,
remain_dims=0x7ffde1391dd8, comm_new=0x7ffde1391d30)
failed
PMPI_Cart_sub(178)...................:
MPIR_Comm_split_impl(270)............:
MPIR_Get_contextid_sparse_group(1330): Too many
communicators (0/16384 free on this process; ignore_id=0)
Fatal error in PMPI_Cart_sub: Other MPI error, error
stack:
PMPI_Cart_sub(242)...................:
MPI_Cart_sub(comm=0xc400fcf3,
remain_dims=0x7ffc02ad7eb8, comm_new=0x7ffc02ad7e10)
failed
PMPI_Cart_sub(178)...................:
MPIR_Comm_split_impl(270)............:
MPIR_Get_contextid_sparse_group(1330): Too many
communicators (0/16384 free on this process; ignore_id=0)
Fatal error in PMPI_Cart_sub: Other MPI error, error
stack:
PMPI_Cart_sub(242)...................:
MPI_Cart_sub(comm=0xc400fcf3,
remain_dims=0x7fffb24e60f8, comm_new=0x7fffb24e6050)
failed
PMPI_Cart_sub(178)...................:
MPIR_Comm_split_impl(270)............:
MPIR_Get_contextid_sparse_group(1330): Too many
communicators (0/16384 free on this process; ignore_id=0)
I googled and found that this might be caused by hitting the OS limit on the number of open files. However, after I increased the number of open files per process from 1024 to 40960, the error persists.
What's wrong here?
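For reference, the usual way to inspect and raise that limit for the current shell is sketched below (raising the soft limit above the hard limit needs root or an /etc/security/limits.conf entry):

```shell
ulimit -n                  # show the current soft limit on open files
# Try to raise it for this shell and its child processes (e.g. mpirun).
ulimit -n 40960 2>/dev/null || echo "hard limit too low; raise it as root"
ulimit -n                  # verify the value now in effect
```

Note the limit applies per process and is inherited by children, so it must be raised in the shell (or batch script) that launches mpirun.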
Chong Wang
Ph. D. candidate
Institute for Advanced Study, Tsinghua University,
Beijing, 100084
_______________________________________________
Pw_forum mailing list
[email protected] <mailto:[email protected]>
http://pwscf.org/mailman/listinfo/pw_forum
--
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222