Hi Chong Wang,

Perhaps it would be better to run ./configure with:
./configure CC=icc CXX=icpc F90=ifort F77=ifort MPIF90=mpiifort --with-scalapack=intel

so that QE knows which compilers to use; I have verified this with QE v5.3.0.
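
For example, a minimal sketch (the compilervars.sh path is an assumption based on the MKL path in your make.sys; adjust it to your installation):

    # put icc/ifort, MKL and Intel MPI into the environment
    source /opt/intel/compilers_and_libraries_2016.3.210/linux/bin/compilervars.sh intel64
    # start from a clean tree so no leftover gfortran settings survive
    make veryclean
    ./configure CC=icc CXX=icpc F90=ifort F77=ifort MPIF90=mpiifort --with-scalapack=intel
    # the resulting make.sys should then contain -D__INTEL instead of -D__GFORTRAN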

Rolly

On 05/16/2016 05:52 PM, Paolo Giannozzi wrote:
On Mon, May 16, 2016 at 4:11 AM, Chong Wang <[email protected]> wrote:

    I have checked that my mpif90 calls gfortran, so there's no mix-up.


I am not sure it is possible to use gfortran together with Intel MPI. If you have Intel MPI and MKL, presumably you have the Intel compiler as well.
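
As far as I know, Intel MPI actually ships both kinds of wrappers: the generic mpif90 drives GNU Fortran, while mpiifort drives ifort. A quick way to see which is which:

mpif90 -show     # prints the underlying compiler command line (typically gfortran)
mpiifort -show   # prints the ifort-based command line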

    Can you kindly share with me your make.sys?


It doesn't make sense to share a make.sys file unless the software configuration is the same.

Paolo


    Thanks in advance!


    Best!


    Chong Wang

    ------------------------------------------------------------------------
    *From:* [email protected] <[email protected]> on behalf of Paolo Giannozzi <[email protected]>
    *Sent:* Monday, May 16, 2016 3:10 AM
    *To:* PWSCF Forum
    *Subject:* Re: [Pw_forum] mpi error using pw.x
    Your make.sys shows clear signs of a mix-up between ifort and
    gfortran. Please verify that mpif90 calls ifort and not gfortran
    (or vice versa). Configure issues a warning when this happens.
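
    A minimal check, assuming you run it from the QE top-level directory:

        mpif90 --version           # should report ifort, not GNU Fortran
        grep __GFORTRAN make.sys   # should print nothing for an ifort build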

    I have successfully run your test on a machine with a recent
    Intel compiler and Intel MPI. The second output (run as mpirun -np
    18 pw.x -nk 18....) is an example of what I mean by "type of
    parallelization": there are many different parallelization levels
    in QE. This one is parallelized over k-points (and in this case
    runs faster on fewer processors than parallelization over plane
    waves).
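
    As a sketch of what the two levels look like on the command line
    (the -nk value must divide the number of MPI processes; -in passes
    the input file without relying on shell redirection):

        mpirun -np 18 pw.x -nk 18 -in BTO.scf.in > out.pools18   # one k-point pool per process
        mpirun -np 24 pw.x -nk 4  -in BTO.scf.in > out.pools4    # 4 pools, plane-wave parallelization over 6 processes each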

    Paolo

    On Sun, May 15, 2016 at 6:01 PM, Chong Wang <[email protected]> wrote:

        Hi,


        I have done more test:

        1. Intel MPI 2015 yields a segmentation fault

        2. Intel MPI 2013 yields the same error as here

        Did I do something wrong with compiling? Here's my make.sys:


        # make.sys.  Generated from make.sys.in by configure.


        # compilation rules


        .SUFFIXES :

        .SUFFIXES : .o .c .f .f90


        # most fortran compilers can directly preprocess c-like directives: use

        # $(MPIF90) $(F90FLAGS) -c $<

        # if explicit preprocessing by the C preprocessor is needed, use:

        # $(CPP) $(CPPFLAGS) $< -o $*.F90

        #$(MPIF90) $(F90FLAGS) -c $*.F90 -o $*.o

        # remember the tabulator in the first column !!!


        .f90.o:

        $(MPIF90) $(F90FLAGS) -c $<


        # .f.o and .c.o: do not modify


        .f.o:

        $(F77) $(FFLAGS) -c $<


        .c.o:

        $(CC) $(CFLAGS) -c $<




        # Top QE directory, not used in QE but useful for linking QE libs with plugins

        # The following syntax should always point to TOPDIR:

        #   $(dir $(abspath $(filter %make.sys,$(MAKEFILE_LIST))))


        TOPDIR = /home/wangc/temp/espresso-5.4.0


        # DFLAGS  = precompilation options (possible arguments to -D and -U)

        #           used by the C compiler and preprocessor

        # FDFLAGS = as DFLAGS, for the f90 compiler

        # See include/defs.h.README for a list of options and their meaning

        # With the exception of IBM xlf, FDFLAGS = $(DFLAGS)

        # For IBM xlf, FDFLAGS is the same as DFLAGS with separating commas


        # MANUAL_DFLAGS  = additional precompilation option(s), if desired

        #                  BEWARE: it does not work for IBM xlf! Manually edit FDFLAGS

        MANUAL_DFLAGS  =

        DFLAGS         =  -D__GFORTRAN -D__STD_F95 -D__DFTI -D__MPI -D__PARA -D__SCALAPACK

        FDFLAGS        = $(DFLAGS) $(MANUAL_DFLAGS)


        # IFLAGS = how to locate directories with *.h or *.f90 file to be included

        #          typically -I../include -I/some/other/directory/

        #          the latter contains e.g. files needed by FFT libraries


        IFLAGS         = -I../include -I/opt/intel/compilers_and_libraries_2016.3.210/linux/mkl/include


        # MOD_FLAGS = flag used by f90 compiler to locate modules

        # Each Makefile defines the list of needed modules in MODFLAGS


        MOD_FLAG      = -I


        # Compilers: fortran-90, fortran-77, C

        # If a parallel compilation is desired, MPIF90 should be a fortran-90

        # compiler that produces executables for parallel execution using MPI

        # (such as for instance mpif90, mpf90, mpxlf90,...);

        # otherwise, an ordinary fortran-90 compiler (f90, g95, xlf90, ifort,...)

        # If you have a parallel machine but no suitable candidate for MPIF90,

        # try to specify the directory containing "mpif.h" in IFLAGS

        # and to specify the location of MPI libraries in MPI_LIBS


        MPIF90         = mpif90

        #F90           = gfortran

        CC             = cc

        F77            = gfortran


        # C preprocessor and preprocessing flags - for explicit preprocessing,

        # if needed (see the compilation rules above)

        # preprocessing flags must include DFLAGS and IFLAGS


        CPP            = cpp

        CPPFLAGS       = -P -C -traditional $(DFLAGS) $(IFLAGS)


        # compiler flags: C, F90, F77

        # C flags must include DFLAGS and IFLAGS

        # F90 flags must include MODFLAGS, IFLAGS, and FDFLAGS with appropriate syntax


        CFLAGS         = -O3 $(DFLAGS) $(IFLAGS)

        F90FLAGS       = $(FFLAGS) -x f95-cpp-input $(FDFLAGS) $(IFLAGS) $(MODFLAGS)

        FFLAGS         = -O3 -g


        # compiler flags without optimization for fortran-77

        # the latter is NEEDED to properly compile dlamch.f, used by lapack


        FFLAGS_NOOPT   = -O0 -g


        # compiler flag needed by some compilers when the main program is not fortran

        # Currently used for Yambo


        FFLAGS_NOMAIN   =


        # Linker, linker-specific flags (if any)

        # Typically LD coincides with F90 or MPIF90, LD_LIBS is empty


        LD             = mpif90

        LDFLAGS        =  -g -pthread

        LD_LIBS        =


        # External Libraries (if any) : blas, lapack, fft, MPI


        # If you have nothing better, use the local copy :

        # BLAS_LIBS = /your/path/to/espresso/BLAS/blas.a

        # BLAS_LIBS_SWITCH = internal


        BLAS_LIBS      = -lmkl_gf_lp64  -lmkl_sequential -lmkl_core

        BLAS_LIBS_SWITCH = external


        # If you have nothing better, use the local copy :

        # LAPACK_LIBS = /your/path/to/espresso/lapack-3.2/lapack.a

        # LAPACK_LIBS_SWITCH = internal

        # For IBM machines with essl (-D__ESSL): load essl BEFORE lapack !

        # remember that LAPACK_LIBS precedes BLAS_LIBS in loading order


        LAPACK_LIBS    =

        LAPACK_LIBS_SWITCH = external


        ELPA_LIBS_SWITCH = disabled

        SCALAPACK_LIBS = -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64


        # nothing needed here if the internal copy of FFTW is compiled

        # (needs -D__FFTW in DFLAGS)


        FFT_LIBS       =


        # For parallel execution, the correct path to MPI libraries must

        # be specified in MPI_LIBS (except for IBM if you use mpxlf)


        MPI_LIBS       =


        # IBM-specific: MASS libraries, if available and if -D__MASS is defined in FDFLAGS


        MASS_LIBS      =


        # ar command and flags - for most architectures: AR = ar, ARFLAGS = ruv


        AR             = ar

        ARFLAGS        = ruv


        # ranlib command. If ranlib is not needed (it isn't in most cases) use

        # RANLIB = echo


        RANLIB         = ranlib


        # all internal and external libraries - do not modify


        FLIB_TARGETS   = all


        LIBOBJS        = ../clib/clib.a ../iotk/src/libiotk.a

        LIBS           = $(SCALAPACK_LIBS) $(LAPACK_LIBS) $(FFT_LIBS) $(BLAS_LIBS) $(MPI_LIBS) $(MASS_LIBS) $(LD_LIBS)


        # wget or curl - useful to download from network

        WGET = wget -O


        # Install directory - not currently used

        PREFIX = /usr/local


        Cheers!


        Chong Wang

        ------------------------------------------------------------------------
        *From:* [email protected] <[email protected]> on behalf of Paolo Giannozzi <[email protected]>
        *Sent:* Sunday, May 15, 2016 8:28:26 PM

        *To:* PWSCF Forum
        *Subject:* Re: [Pw_forum] mpi error using pw.x
        It looks like a compiler/MPI bug, since there is nothing
        special in your input or in your execution, unless you find
        evidence that the problem is reproducible with other
        compiler/MPI versions.

        Paolo

        On Sun, May 15, 2016 at 10:11 AM, Chong Wang <[email protected]> wrote:

            Hi,


            Thank you for replying.


            More details:


            1. input data:

            &control
            calculation='scf'
            restart_mode='from_scratch',
                pseudo_dir = '../pot/',
                outdir='./out/'
                prefix='BaTiO3'
            /
            &system
                nbnd = 48
                ibrav = 0, nat = 5, ntyp = 3
                ecutwfc = 50
            occupations='smearing', smearing='gaussian', degauss=0.02
            /
            &electrons
                conv_thr = 1.0e-8
            /
            ATOMIC_SPECIES
             Ba 137.327 Ba.pbe-mt_fhi.UPF
             Ti 204.380 Ti.pbe-mt_fhi.UPF
             O  15.999  O.pbe-mt_fhi.UPF
            ATOMIC_POSITIONS
             Ba 0.0000000000000000 0.0000000000000000 0.0000000000000000
             Ti 0.5000000000000000 0.5000000000000000 0.4819999933242795
             O  0.5000000000000000 0.5000000000000000 0.0160000007599592
             O  0.5000000000000000  -0.0000000000000000 0.5149999856948849
             O  0.0000000000000000 0.5000000000000000 0.5149999856948849
            K_POINTS (automatic)
            11 11 11 0 0 0
            CELL_PARAMETERS {angstrom}
            3.999800000000001     0.000000000000000 0.000000000000000
            0.000000000000000     3.999800000000001 0.000000000000000
            0.000000000000000     0.000000000000000 4.018000000000000

            2. number of processors:
            I tested 24 cores and 8 cores, and both yield the same result.

            3. type of parallelization:
            I'm not sure what you mean. I execute pw.x by:
            mpirun -np 24 pw.x < BTO.scf.in >> output

            'which mpirun' output:
            /opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin/mpirun

            4. when the error occurs:
            in the middle of the run. The last few lines of the
            output are:
                 total cpu time spent up to now is   32.9 secs

                 total energy            =  -105.97885119 Ry
                 Harris-Foulkes estimate   =  -105.99394457 Ry
                 estimated scf accuracy    < 0.03479229 Ry

                 iteration #  7     ecut=    50.00 Ry     beta=0.70
                 Davidson diagonalization with overlap
                 ethr =  1.45E-04,  avg # of iterations =  2.7

                 total cpu time spent up to now is   37.3 secs

                 total energy            =  -105.99039982 Ry
                 Harris-Foulkes estimate   =  -105.99025175 Ry
                 estimated scf accuracy    < 0.00927902 Ry

                 iteration #  8     ecut=    50.00 Ry     beta=0.70
                 Davidson diagonalization with overlap

            5. Error message:
            Something like:
            Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
            PMPI_Cart_sub(242)...................:
            MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffc03ae5f38,
            comm_new=0x7ffc03ae5e90) failed
            PMPI_Cart_sub(178)...................:
            MPIR_Comm_split_impl(270)............:
            MPIR_Get_contextid_sparse_group(1330): Too many
            communicators (0/16384 free on this process; ignore_id=0)
            Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
            PMPI_Cart_sub(242)...................:
            MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffd10080408,
            comm_new=0x7ffd10080360) failed
            PMPI_Cart_sub(178)...................:

            Cheers!

            Chong
            
            ------------------------------------------------------------------------
            *From:* [email protected] <[email protected]> on behalf of Paolo Giannozzi <[email protected]>
            *Sent:* Sunday, May 15, 2016 3:43 PM
            *To:* PWSCF Forum
            *Subject:* Re: [Pw_forum] mpi error using pw.x
            Please tell us what is wrong and we will fix it.

            Seriously: nobody can answer your question unless you
            specify, as a strict minimum, input data, number of
            processors and type of parallelization that trigger the
            error, and where the error occurs (at startup, later, in
            the middle of the run, ...).

            Paolo

            On Sun, May 15, 2016 at 7:50 AM, Chong Wang <[email protected]> wrote:

                I compiled Quantum ESPRESSO 5.4 with Intel MPI and MKL
                2016 update 3.

                However, when I ran pw.x the following errors were
                reported:

                ...
                MPIR_Get_contextid_sparse_group(1330): Too many
                communicators (0/16384 free on this process; ignore_id=0)
                Fatal error in PMPI_Cart_sub: Other MPI error, error
                stack:
                PMPI_Cart_sub(242)...................:
                MPI_Cart_sub(comm=0xc400fcf3,
                remain_dims=0x7ffde1391dd8, comm_new=0x7ffde1391d30)
                failed
                PMPI_Cart_sub(178)...................:
                MPIR_Comm_split_impl(270)............:
                MPIR_Get_contextid_sparse_group(1330): Too many
                communicators (0/16384 free on this process; ignore_id=0)
                Fatal error in PMPI_Cart_sub: Other MPI error, error
                stack:
                PMPI_Cart_sub(242)...................:
                MPI_Cart_sub(comm=0xc400fcf3,
                remain_dims=0x7ffc02ad7eb8, comm_new=0x7ffc02ad7e10)
                failed
                PMPI_Cart_sub(178)...................:
                MPIR_Comm_split_impl(270)............:
                MPIR_Get_contextid_sparse_group(1330): Too many
                communicators (0/16384 free on this process; ignore_id=0)
                Fatal error in PMPI_Cart_sub: Other MPI error, error
                stack:
                PMPI_Cart_sub(242)...................:
                MPI_Cart_sub(comm=0xc400fcf3,
                remain_dims=0x7fffb24e60f8, comm_new=0x7fffb24e6050)
                failed
                PMPI_Cart_sub(178)...................:
                MPIR_Comm_split_impl(270)............:
                MPIR_Get_contextid_sparse_group(1330): Too many
                communicators (0/16384 free on this process; ignore_id=0)

                I googled and found that this might be caused by
                hitting the OS limit on the number of open files.
                However, after I increased the number of open files
                per process from 1024 to 40960, the error persists.
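
                Roughly what I did (ulimit -n only affects the current
                shell, so it has to be raised before launching mpirun):

                    ulimit -n            # soft limit, was 1024
                    ulimit -n 40960      # raise it
                    mpirun -np 24 pw.x < BTO.scf.in >> output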


                What's wrong here?


                Chong Wang

                Ph. D. candidate

                Institute for Advanced Study, Tsinghua University,
                Beijing, 100084






            --
            Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
            Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
            Phone +39-0432-558216, fax +39-0432-558222






        --
        Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
        Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
        Phone +39-0432-558216, fax +39-0432-558222






    --
    Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
    Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
    Phone +39-0432-558216, fax +39-0432-558222






--
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222




--
PhD. Research Fellow,
Dept. of Physics & Materials Science,
City University of Hong Kong
Tel: +852 3442 4000
Fax: +852 3442 0538

_______________________________________________
Pw_forum mailing list
[email protected]
http://pwscf.org/mailman/listinfo/pw_forum
