Hi Elio

As Gilles said, if you compile the application with -i8 (8-byte default integers)
but Open MPI was built with the default-sized integers (4 bytes), the integer
arguments your code passes to MPI calls will not match what the library expects,
and things will get really ugly.
It is better to avoid flags such as -i8, -r8, etc., when compiling MPI programs.
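To make the mismatch concrete, here is a small Python sketch. Python is just a convenient way to poke at raw bytes here, and the helper name is made up for this illustration. It assumes a little-endian machine such as x86-64: it stores integers as 8-byte values, the way -i8 code lays out INTEGER variables, and reads the same bytes back as 4-byte values, which is roughly what an MPI library built with default-sized integers would see.

```python
import struct

def reinterpret_int64_as_int32(values):
    """Hypothetical helper: pack values as 8-byte little-endian integers
    (how -i8 INTEGERs sit in memory on x86-64), then read the same bytes
    back as 4-byte integers (what a default-INTEGER library would read)."""
    raw = struct.pack(f"<{len(values)}q", *values)
    return list(struct.unpack(f"<{2 * len(values)}i", raw))

# Two integers the application believes it is passing: 5 and 7.
print(reinterpret_int64_as_int32([5, 7]))  # [5, 0, 7, 0]
```

A single small scalar happens to survive (its low 4 bytes hold the value), which is one reason such builds can seem to work for a while; integer arrays come out interleaved with zeros, and values the library writes back into 8-byte variables may be wrong as well.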

Have you tried compiling the code with the -traceback flag?
This should at least tell you where the code is failing (source file
and line).
As Ralph said, most likely the program is going beyond array bounds
or accessing unallocated memory.
That can happen even with innocent strings
(say, a file name or an informative message that is too long).

A better approach, as Ralph suggested,
is to open the core file with gdb.
It should be named something like "core.98765", where "98765" is the process ID.
However, many Linux distributions set the maximum core file size to zero by default.
That prevents the core file from being created when the program crashes, but on the
upside it also keeps the disk from filling up with big core files that are forgotten
and hang around forever.
[ulimit -a will tell; look at the "core file size" line.]
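A minimal sketch of both steps, assuming Linux and Python 3.8 or later: the standard `resource` module reads and raises the core-size limit (roughly what `ulimit -c` shows and `ulimit -c unlimited` does in the shell), and a second helper builds a non-interactive gdb command that prints the backtrace stored in a core file. The helper names and the file names in the example are hypothetical.

```python
import resource
import shlex

def enable_core_dumps():
    """Raise the soft core-size limit up to the hard limit for this process
    and its children (like `ulimit -c <hard limit>` in the shell).
    Returns the (soft, hard) limits now in effect."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft != hard:
        resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)

def gdb_backtrace_command(executable, core_file):
    """Build a gdb command that loads the executable together with its
    core file, prints the stack at the moment of the crash, and exits."""
    return ["gdb", "-batch", "-ex", "bt", executable, core_file]

if __name__ == "__main__":
    soft, hard = enable_core_dumps()
    print("core file size limit (soft, hard):", soft, hard)
    # Hypothetical names: substitute your own executable and core file.
    print(shlex.join(gdb_backtrace_command("./kkrscf6.3MPI", "core.98765")))
```

The limit only applies to the process that sets it and to its children, so in practice you would normally run `ulimit -c unlimited` in the shell or job script before mpirun (if the hard limit allows it); processes started on other nodes may get whatever limit is configured there. Where the core file lands and what it is called also depend on the system's /proc/sys/kernel/core_pattern setting.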

I hope this helps,
Gus Correa

On 04/23/2016 07:06 PM, Gilles Gouaillardet wrote:
If you build your application with Intel compilers and -i8, then Open MPI
must also be built with Intel compilers and -i8.



On Sunday, April 24, 2016, Elio Physics <elio-phys...@live.com> wrote:

    Well, I changed the compiler from mpif90 to mpiifort with the
    corresponding flags -i8 -g and recompiled. I am not getting the
    segmentation fault anymore, and the program runs, but later it
    stops with no errors (except that the Fermi energy was not found!)
    and produces some strange empty files with names like:
       fortDgcQe3  fortechvF2  fortMaN6a1  fortnxoYy1  fortvR5F8q
    I still feel something is wrong. Does anybody know what these files are?


    *From:* users <users-boun...@open-mpi.org> on
    behalf of Ralph Castain <r...@open-mpi.org>
    *Sent:* Saturday, April 23, 2016 1:38 PM
    *To:* Open MPI Users
    I don’t see any way this could be compilation related - I suspect
    there is simply some error in the program (e.g., forgetting to
    initialize some memory region).

    On Apr 23, 2016, at 8:03 AM, Elio Physics <elio-phys...@live.com> wrote:

    Hello Andy,

    The program is not mine; I got it from a group upon request.
    It might be program related, because I run other codes such as
    Quantum ESPRESSO and they work perfectly fine, although it was the
    cluster staff who compiled those. Since I compiled the program
    I am having problems with myself, I think it might be
    "compilation" related. This is why I wanted some experts' opinion
    on this.

    *From:* users <users-boun...@open-mpi.org> on
    behalf of Andy Riebs <andy.ri...@hpe.com>
    *Sent:* Saturday, April 23, 2016 12:49 PM
    The challenge for the MPI experts here (of which I am NOT one!) is
    that the problem appears to be in your program; MPI is simply
    reporting that your program failed. If you got the program from
    someone else, you will need to solicit their help. If you wrote
    it, well, it is never a bad time to learn to use gdb!

    Best regards

    On 04/23/2016 10:41 AM, Elio Physics wrote:
    I am not really an expert with gdb. What is the core file, and
    how do I use gdb? I get three output files when I run the
    executable: one is the actual output, which stops, and the
    other two are error files (from which I learned about the
    segmentation fault).


    behalf of Ralph Castain <r...@open-mpi.org>
    *Sent:* Saturday, April 23, 2016 11:39 AM
    *To:*Open MPI Users
    valgrind isn’t going to help here - there are multiple reasons
    why your application could be segfaulting. Take a look at the
    core file with gdb and find out where it is failing.

    On Apr 22, 2016, at 10:20 PM, Elio Physics <elio-phys...@live.com> wrote:

    One more thing I forgot to mention in my previous e-mail: in the
    output file I get the following message:

    2 total processes killed (some possibly by mpirun during cleanup)


    *From:* users <users-boun...@open-mpi.org> on
    behalf of Elio Physics
    *Sent:* Saturday, April 23, 2016 3:07 AM
    *To:* Open MPI Users
    I have used valgrind and this is what I got:

    valgrind mpirun ~/Elie/SPRKKR/bin/kkrscf6.3MPI Fe_SCF.inp >
    ==8135== Memcheck, a memory error detector
    ==8135== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
    ==8135== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
    ==8135== Command: mpirun /home/emoujaes/Elie/SPRKKR/bin/kkrscf6.3MPI Fe_SCF.inp
    mpirun noticed that process rank 0 with PID 8147 on jlborges.fisica.ufmg.br exited on signal 11 (Segmentation fault).
    ==8135== HEAP SUMMARY:
    ==8135==     in use at exit: 485,683 bytes in 1,899 blocks
    ==8135==   total heap usage: 7,723 allocs, 5,824 frees, 12,185,660 bytes allocated
    ==8135== LEAK SUMMARY:
    ==8135==    definitely lost: 34,944 bytes in 34 blocks
    ==8135==    indirectly lost: 26,613 bytes in 58 blocks
    ==8135==      possibly lost: 0 bytes in 0 blocks
    ==8135==    still reachable: 424,126 bytes in 1,807 blocks
    ==8135==         suppressed: 0 bytes in 0 blocks
    ==8135== Rerun with --leak-check=full to see details of leaked
    ==8135== For counts of detected and suppressed errors, rerun with: -v
    ==8135== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 6 from 6)

    What is that supposed to mean?

    *From:* users <users-boun...@open-mpi.org> on
    behalf of Ralph Castain
    *Sent:* Saturday, April 23, 2016 1:36:50 AM
    *To:*Open MPI Users
    All I can say is that your program segfault’d during execution -
    you might want to look at the core file using a debugger like
    gdb to see why it failed.

    On Apr 22, 2016, at 8:32 PM, Elio Physics <elio-phys...@live.com> wrote:

    Dear all,

    I have successfully compiled a code and the executable has
    been produced. However, when I started running the executable with
    mpirun, the code stopped with the following error:

    "mpirun noticed that process rank 0 with PID 570 on node
    compute-1-9.local exited on signal 11 (Segmentation fault)."

    What is causing that error, and how can I solve it?

    Here is the make.inc compilation file:

    #BUILD_TYPE := debug

    VERSION = 6.3

    ifeq ($(BUILD_TYPE), debug)

    BIN =~/Elie/SPRKKR/bin

    LIB =  -L/opt/intel/mkl/lib/intel64/libmkl_blas95_ilp64 \
    -L/opt/intel/mkl/lib/intel64 -lmkl_scalapack_ilp64 \
    -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential \
    -lmkl_blacs_intelmpi_ilp64 -lpthread -lm -ldl

    #Include mpif.h
    INCLUDE = -I/opt/intel/mkl/include/intel64/ilp64

    FFLAGS = -O2

    FC   = mpif90 -c $(FFLAGS) $(INCLUDE)
    LINK = mpif90   $(FFLAGS) $(INCLUDE)


    Thanks in advance

    University of Rondonia, Brazil

    users mailing list
    Link to this


users mailing list
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users