Hi Elio
As Gilles said, if you change the integer size to -i8 in the
application, and MPI was built with the default-sized integers
(4 bytes), things will get really ugly and mismatched.
It is better to avoid flags such as -i8, -r8, etc., when compiling MPI programs.
Have you tried compiling the code with the -traceback flag?
This at least should tell you where the code is failing (source file
and line).
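As a rough sketch of what that could look like: the flags below extend the FFLAGS from the make.inc quoted further down in this thread. It assumes mpif90 wraps the Intel compiler (-traceback is an Intel Fortran option), and file.f is only a placeholder source name.

```shell
# Sketch only: FFLAGS from the quoted make.inc, with debug flags appended.
# Assumption: mpif90 wraps ifort, since -traceback is an Intel Fortran flag.
FFLAGS="-O2 -g -traceback"
FC="mpif90 -c $FFLAGS"
# file.f is a hypothetical source file name, used only for illustration.
echo "compile command: $FC file.f"
```

In make.inc terms this amounts to changing the FFLAGS line to "FFLAGS = -O2 -g -traceback" and rebuilding.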
As Ralph said, most likely the program is trying to
go beyond array boundaries,
or accessing non-allocated memory.
That can happen even with innocent strings
(say a file name or an informative message that is too long).
A better approach, as suggested by Ralph,
is to open the core file with gdb.
It should be named something like "core.98765", where "98765" is the
process ID.
However, many Linux distributions set the core file size limit to zero by
default, which prevents the core file from being created when the program
crashes, but on the upside also keeps the disk from filling up with big core
files that are forgotten and hang around forever.
[ulimit -a will tell.]
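A minimal sketch of checking that limit before rerunning the job. The helper name core_status is made up for illustration, the executable path is the one from the original post, and core.98765 is just the example name used above. Whether the raised limit reaches the MPI processes may depend on how the job is launched.

```shell
# Hypothetical helper: report whether this shell allows core files.
core_status() {
    limit=$(ulimit -c)
    if [ "$limit" = "0" ]; then
        echo "core files disabled (limit 0)"
    else
        echo "core files enabled (limit $limit)"
    fi
}
core_status

# Possible next steps, left commented out because they need the real program:
#   ulimit -c unlimited      # raise the limit for this shell session only
#   mpirun ~/Elie/SPRKKR/bin/kkrscf6.3MPI Fe_SCF.inp
#   gdb ~/Elie/SPRKKR/bin/kkrscf6.3MPI core.98765
#   (gdb) bt                 # print the backtrace at the point of the crash
```

Compiling with -g first makes the backtrace show source files and line numbers instead of bare addresses.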
I hope this helps,
Gus Correa
On 04/23/2016 07:06 PM, Gilles Gouaillardet wrote:
If you build your application with intel compilers and -i8, then openmpi
must also be built with intel compilers and -i8.
Cheers,
Gilles
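One way to see how the installed MPI was built, as a sketch only. It assumes the Open MPI wrapper mpif90 and the ompi_info tool are on the PATH. The helper name check_mpi_build is made up, and the grep pattern is a guess at the label ompi_info uses for the Fortran integer size.

```shell
# Hypothetical helper: show which compiler the wrapper calls and, if
# available, the Fortran integer size the MPI library was built with.
check_mpi_build() {
    if command -v mpif90 >/dev/null 2>&1; then
        # Open MPI wrappers print the underlying compiler with --showme:command
        mpif90 --showme:command || echo "wrapper does not accept --showme"
    else
        echo "mpif90 not found"
    fi
    if command -v ompi_info >/dev/null 2>&1; then
        # Assumed label: a line mentioning the Fortran "integer size"
        ompi_info --all | grep -i "integer size" || echo "no integer size line found"
    else
        echo "ompi_info not found"
    fi
}
check_mpi_build
```

If the reported integer size is 4 while the application is compiled with -i8, that is the mismatch described above.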
On Sunday, April 24, 2016, Elio Physics <elio-phys...@live.com> wrote:
Well, I changed the compiler from mpif90 to mpiifort with the
corresponding flags -i8 -g and recompiled. I am no longer getting the
segmentation fault, and the program runs, but it later
stops with no errors (except that the Fermi energy was not found!)
and produces some strange empty files with names like:
fortDgcQe3 fortechvF2 fortMaN6a1 fortnxoYy1 fortvR5F8q. I
still feel something is wrong. Does anybody know what these files are?
Regards
------------------------------------------------------------------------
*From:* users <users-boun...@open-mpi.org> on
behalf of Ralph Castain <r...@open-mpi.org>
*Sent:* Saturday, April 23, 2016 1:38 PM
*To:* Open MPI Users
*Subject:* Re: [OMPI users] MPIRUN SEGMENTATION FAULT
I don’t see any way this could be compilation related - I suspect
there is simply some error in the program (e.g., forgetting to
initialize some memory region).
On Apr 23, 2016, at 8:03 AM, Elio Physics <elio-phys...@live.com> wrote:
Hello Andy,
The program is not mine. I got it from a group upon request.
It might be program related, because I run other codes such as
Quantum ESPRESSO and they work perfectly fine, although it was the
cluster people who compiled them. Since I compiled the program
I am having problems with myself, I am thinking that it might be
"compilation" related. This is why I wanted some experts' opinion
on this.
------------------------------------------------------------------------
*From:* users <users-boun...@open-mpi.org> on
behalf of Andy Riebs <andy.ri...@hpe.com>
*Sent:* Saturday, April 23, 2016 12:49 PM
*To:* us...@open-mpi.org
*Subject:* Re: [OMPI users] MPIRUN SEGMENTATION FAULT
The challenge for the MPI experts here (of which I am NOT one!) is
that the problem appears to be in your program; MPI is simply
reporting that your program failed. If you got the program from
someone else, you will need to solicit their help. If you wrote
it, well, it is never a bad time to learn to use gdb!
Best regards
Andy
On 04/23/2016 10:41 AM, Elio Physics wrote:
I am not really an expert with gdb. What is the core file, and
how do I use gdb? I get three files as output when the
executable is run. One is the actual output, which stops, and the
other two are error files (from which I learned about the
segmentation fault).
Thanks
------------------------------------------------------------------------
*From:* users <users-boun...@open-mpi.org> on
behalf of Ralph Castain <r...@open-mpi.org>
*Sent:* Saturday, April 23, 2016 11:39 AM
*To:* Open MPI Users
*Subject:* Re: [OMPI users] MPIRUN SEGMENTATION FAULT
valgrind isn’t going to help here - there are multiple reasons
why your application could be segfaulting. Take a look at the
core file with gdb and find out where it is failing.
On Apr 22, 2016, at 10:20 PM, Elio Physics
<elio-phys...@live.com> wrote:
One more thing I forgot to mention in my previous e-mail. In the
output file I get the following message:
2 total processes killed (some possibly by mpirun during cleanup)
Thanks
------------------------------------------------------------------------
*From:* users <users-boun...@open-mpi.org> on
behalf of Elio Physics <elio-phys...@live.com>
*Sent:* Saturday, April 23, 2016 3:07 AM
*To:* Open MPI Users
*Subject:* Re: [OMPI users] MPIRUN SEGMENTATION FAULT
I have used valgrind and this is what I got:
valgrind mpirun ~/Elie/SPRKKR/bin/kkrscf6.3MPI Fe_SCF.inp > scf-51551.jlborges.fisica.ufmg.br.out
==8135== Memcheck, a memory error detector
==8135== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==8135== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==8135== Command: mpirun /home/emoujaes/Elie/SPRKKR/bin/kkrscf6.3MPI Fe_SCF.inp
==8135==
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 8147 on node jlborges.fisica.ufmg.br exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
==8135==
==8135== HEAP SUMMARY:
==8135==     in use at exit: 485,683 bytes in 1,899 blocks
==8135==   total heap usage: 7,723 allocs, 5,824 frees, 12,185,660 bytes allocated
==8135==
==8135== LEAK SUMMARY:
==8135==    definitely lost: 34,944 bytes in 34 blocks
==8135==    indirectly lost: 26,613 bytes in 58 blocks
==8135==      possibly lost: 0 bytes in 0 blocks
==8135==    still reachable: 424,126 bytes in 1,807 blocks
==8135==         suppressed: 0 bytes in 0 blocks
==8135== Rerun with --leak-check=full to see details of leaked memory
==8135==
==8135== For counts of detected and suppressed errors, rerun with: -v
==8135== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 6 from 6)
What is that supposed to mean?
Regards
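A note on reading the report above: its "Command: mpirun" line suggests valgrind was watching the mpirun launcher rather than the application ranks, which would explain a clean error summary next to a segfault. A possible sketch of pointing valgrind at the ranks instead follows. The process count of 2 and the log file name are assumptions; the path and input file are from the command above.

```shell
# Sketch: put valgrind after mpirun so each rank runs under it.
APP="$HOME/Elie/SPRKKR/bin/kkrscf6.3MPI"   # executable path from the post
INPUT="Fe_SCF.inp"                          # input file from the post
NP=2                                        # assumed number of processes
# %p in --log-file is replaced by the process ID, giving one log per rank.
CMD="mpirun -np $NP valgrind --log-file=vg.%p.log $APP $INPUT"
echo "$CMD"
# Uncomment to actually run it on a machine with the program installed:
# $CMD
```

This is much slower than a normal run, so opening the core file with gdb may still be the quicker route.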
------------------------------------------------------------------------
*From:* users <users-boun...@open-mpi.org> on
behalf of Ralph Castain <r...@open-mpi.org>
*Sent:* Saturday, April 23, 2016 1:36:50 AM
*To:* Open MPI Users
*Subject:* Re: [OMPI users] MPIRUN SEGMENTATION FAULT
All I can say is that your program segfault’d during execution -
you might want to look at the core file using a debugger like
gdb to see why it failed.
On Apr 22, 2016, at 8:32 PM, Elio Physics
<elio-phys...@live.com> wrote:
Dear all,
I have successfully compiled a code and the executable has
been produced. However, when I started using the executable with
mpirun, the code stopped with the following error:
"mpirun noticed that process rank 0 with PID 570 on node
compute-1-9.local exited on signal 11 (Segmentation fault)."
What is that error due to, and how can I solve it?
I will post the make.inc compilation file:
BUILD_TYPE ?=
#BUILD_TYPE := debug
VERSION = 6.3
ifeq ($(BUILD_TYPE), debug)
VERSION := $(VERSION)$(BUILD_TYPE)
endif
BIN =~/Elie/SPRKKR/bin
#BIN=~/bin
#BIN=/tmp/$(USER)
LIB = -L/opt/intel/mkl/lib/intel64/libmkl_blas95_ilp64 \
      -L/opt/intel/mkl/lib/intel64/libmkl_lapack95_ilp64 \
      -L/opt/intel/mkl/lib/intel64 -lmkl_scalapack_ilp64 \
      -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential \
      -lmkl_blacs_intelmpi_ilp64 -lpthread -lm -ldl
#Include mpif.h
INCLUDE = -I/opt/intel/mkl/include/intel64/ilp64 \
          -I/opt/intel/mkl/lib/include
#FFLAGS
FFLAGS = -O2
FC = mpif90 -c $(FFLAGS) $(INCLUDE)
LINK = mpif90 $(FFLAGS) $(INCLUDE)
MPI=MPI
Thanks in advance
Elio
University of Rondonia, Brazil
_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2016/04/29000.php