Hi Oscar

This is a long shot, but maybe worth trying.
I am assuming you're using Linux, or some form of Unix, right?

You could try increasing the stack size.
The default on Linux is often too small for large programs,
and this can cause a segmentation fault even if the
program is correct.

You can check what you have with:

ulimit -a        (bash)

or

limit             (csh or tcsh)

Then set it to a larger number or perhaps to unlimited,
e.g.:

ulimit -s unlimited

or

limit stacksize unlimited
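Note that in bash a plain 'ulimit -s unlimited' tries to change both the soft and the hard limit, and for a normal user it fails if the hard limit is lower. Here is a minimal sketch, assuming bash or a similar shell. It shows both limits and raises only the soft limit as far as the hard limit allows, which does not need root:

```shell
#!/usr/bin/env bash
# Show the current soft and hard stack limits (in kB, or "unlimited").
echo "soft stack limit: $(ulimit -S -s)"
echo "hard stack limit: $(ulimit -H -s)"

# Raise only the soft limit, up to the hard limit. A normal user may do
# this; raising the hard limit itself needs root (or limits.conf).
ulimit -S -s "$(ulimit -H -s)"
echo "new soft stack limit: $(ulimit -S -s)"

# Programs started from this shell inherit the new limit, e.g.:
#   mpirun -np 4 ./exe
```

The new limit only applies to this shell and to the programs started from it, so run mpirun from that same shell.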

You didn't say anything about the computer(s) you are using.
Is this a single machine, a cluster, something else?

Anyway, resetting the stack size may depend a bit on what you
have in /etc/security/limits.conf,
and whether it allows you to increase the stack size.
If it is a single computer on which you have root access, you can
do it yourself.
There are other limits worth increasing (number of open files,
max locked memory).
For instance, this could go in limits.conf:

*   -   memlock     -1
*   -   stack       -1
*   -   nofile      4096

See 'man limits.conf' for details.
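After editing limits.conf, the new values only take effect in a new login session. pam_limits applies them at login, assuming it is enabled for your login method, as it usually is for ssh and console logins. A quick check from a fresh shell could look like this. With the example entries above you would hope to see unlimited, unlimited and 4096:

```shell
#!/usr/bin/env bash
# Run in a NEW login session after editing /etc/security/limits.conf;
# a shell that was already open keeps its old limits.
echo "max locked memory (kB): $(ulimit -l)"
echo "stack size (kB):        $(ulimit -s)"
echo "open files:             $(ulimit -n)"
```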

If it is a cluster, this should be set on all nodes,
and you may need to ask your system administrator to do it.

I hope this helps,
Gus Correa

On 04/16/2014 11:24 AM, Gus Correa wrote:
On 04/16/2014 08:30 AM, Oscar Mojica wrote:
What would the command line be to compile with the -g option? Which
debugger can I use?
Thanks


Replace any optimization flags (-O2, or similar) with -g.
Check if your compiler has the -traceback flag or similar
(man compiler-name).
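For instance, assume mpif90 wraps gfortran. With Open MPI, 'mpif90 --showme' prints the underlying compiler command, so you can check this. A debug build of the program from this thread might then look like the sketch below. '-fbacktrace' and '-fcheck=bounds' are gfortran options. With Intel ifort the rough equivalents are '-traceback' and '-check bounds'. The script only compiles and runs when mpif90 and the source file are present. Otherwise it just prints the command.

```shell
#!/usr/bin/env bash
# Debug flags, assuming mpif90 wraps gfortran:
#   -g              debug symbols (file and line numbers for gdb/backtraces)
#   -O0             no optimization, so execution follows the source closely
#   -fbacktrace     print a backtrace when a runtime error occurs
#   -fcheck=bounds  stop with an error on out-of-bounds array indices
FFLAGS="-g -O0 -fbacktrace -fcheck=bounds"
SRC=mpivfsa_version2.f

if command -v mpif90 >/dev/null 2>&1 && [ -f "$SRC" ]; then
  mpif90 --showme            # Open MPI only: print the real compiler command
  # $FFLAGS is left unquoted on purpose so it splits into separate flags.
  mpif90 $FFLAGS -o exe "$SRC"
  mpirun -np 4 ./exe
else
  echo "mpif90 $FFLAGS -o exe $SRC"
fi
```

With bounds checking on, an out-of-range index should stop the program with a message naming the file and line, instead of a segmentation fault somewhere later. This only works for arrays whose bounds the compiler knows. Assumed-size dummy arguments declared as 'a(*)' are not checked.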

The gdb debugger is normally available on Linux (or you can install it
with yum, apt-get, etc.).  An alternative is ddd, which has a GUI (it can
also be installed with yum, etc.).
If you use a commercial compiler you may have a debugger with a GUI.
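Here are two common ways to use gdb with an MPI program, sketched below under the assumption that ./exe was built with -g. The first runs one xterm with gdb per process and needs an X display. The second runs each rank under gdb in batch mode, which prints a backtrace ('bt') for any rank that crashes. The output of the ranks will be interleaved.

```shell
#!/usr/bin/env bash
# Option 1 (needs an X display): one xterm with gdb for each MPI process.
#   mpirun -np 4 xterm -e gdb ./exe
# Option 2 (no GUI): each rank runs inside gdb in batch mode; gdb starts the
# program ("run") and, if it crashes, prints the backtrace ("bt").
GDB_ARGS="-batch -ex run -ex bt"

if command -v gdb >/dev/null 2>&1 && command -v mpirun >/dev/null 2>&1 && [ -x ./exe ]; then
  # $GDB_ARGS is unquoted on purpose so it splits into separate arguments.
  mpirun -np 4 gdb $GDB_ARGS ./exe
else
  echo "needs gdb, mpirun and a debug build of ./exe"
fi
```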

Sent from my iPad

On 15/04/2014, at 18:20, "Gus Correa" <g...@ldeo.columbia.edu>
wrote:

Or just compiling with -g or -traceback (depending on the compiler) will
give you more information about the point of failure
in the error message.

On 04/15/2014 04:25 PM, Ralph Castain wrote:
Have you tried using a debugger to look at the resulting core file? It
will probably point you right at the problem. Most likely it is a case of
overrunning some array when #temps > 5.
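Here is a minimal sketch of the core-file route, assuming a Linux machine and a debug build of ./exe. Core dumps are often disabled by default (core size limit 0). Where they go is set by /proc/sys/kernel/core_pattern, and on Ubuntu a leading '|' there usually means apport intercepts them. So the file name 'core*' in the current directory is an assumption:

```shell
#!/usr/bin/env bash
# Allow core files, as large as the hard limit permits; mpirun and the
# MPI processes it starts from this shell inherit the limit.
ulimit -S -c "$(ulimit -H -c)"
echo "core file size limit: $(ulimit -c)"

# Show where the kernel writes core files ("core" = current directory).
if [ -r /proc/sys/kernel/core_pattern ]; then
  cat /proc/sys/kernel/core_pattern
fi

if command -v gdb >/dev/null 2>&1 && command -v mpirun >/dev/null 2>&1 && [ -x ./exe ]; then
  mpirun -np 4 ./exe
  # Assumption: cores are named core or core.<pid> in this directory.
  core=$(ls -t core* 2>/dev/null | head -n 1)
  if [ -n "$core" ]; then
    # Print the backtrace of the crashing process from the core file.
    gdb -batch -ex bt ./exe "$core"
  fi
fi
```

The trace in the original report dies inside free() ('cfree'). A crash there is often a sign that an earlier out-of-bounds write damaged the heap, which fits the array-overrun guess.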




On Tue, Apr 15, 2014 at 10:46 AM, Oscar Mojica <o_moji...@hotmail.com> wrote:

    Hello everybody

    I implemented a parallel simulated annealing algorithm in Fortran.
    The algorithm is described as follows:

    1. The MPI program initially generates P processes that have ranks
    0,1,...,P-1.
    2. The MPI program generates a starting point, sends it to all
    processes, and sets T=T0.
    3. At the current temperature T, each process begins to execute
    iterative operations.
    4. At the end of the iterations, the process with rank 0 is responsible
    for collecting the solution obtained by each process at the current
    temperature and broadcasting the best of these solutions to all
    participating processes.
    5. Each process cools the temperature and goes back to step 3, until
    the maximum number of temperatures is reached.

    I compiled with: mpif90 -o exe mpivfsa_version2.f
    and ran it with: mpirun -np 4 ./exe on a single machine.

    So I have 4 processes, 1 iteration per temperature and, for example,
    15 temperatures. When I run the program
    with just 5 temperatures it works well, but when the number of
    temperatures is higher than 5 it doesn't write the
    output files and I get the following error message:


    [oscar-Vostro-3550:06740] *** Process received signal ***
    [oscar-Vostro-3550:06741] *** Process received signal ***
    [oscar-Vostro-3550:06741] Signal: Segmentation fault (11)
    [oscar-Vostro-3550:06741] Signal code: Address not mapped (1)
    [oscar-Vostro-3550:06741] Failing at address: 0xad6af
    [oscar-Vostro-3550:06742] *** Process received signal ***
    [oscar-Vostro-3550:06740] Signal: Segmentation fault (11)
    [oscar-Vostro-3550:06740] Signal code: Address not mapped (1)
    [oscar-Vostro-3550:06740] Failing at address: 0xad6af
    [oscar-Vostro-3550:06742] Signal: Segmentation fault (11)
    [oscar-Vostro-3550:06742] Signal code: Address not mapped (1)
    [oscar-Vostro-3550:06742] Failing at address: 0xad6af
    [oscar-Vostro-3550:06740] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7f49ee2224a0]
    [oscar-Vostro-3550:06740] [ 1] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x1c) [0x7f49ee26f54c]
    [oscar-Vostro-3550:06740] [ 2] ./exe() [0x406742]
    [oscar-Vostro-3550:06740] [ 3] ./exe(main+0x34) [0x406ac9]
    [oscar-Vostro-3550:06740] [ 4] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f49ee20d76d]
    [oscar-Vostro-3550:06742] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7f6877fdc4a0]
    [oscar-Vostro-3550:06742] [ 1] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x1c) [0x7f687802954c]
    [oscar-Vostro-3550:06742] [ 2] ./exe() [0x406742]
    [oscar-Vostro-3550:06742] [ 3] ./exe(main+0x34) [0x406ac9]
    [oscar-Vostro-3550:06742] [ 4] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f6877fc776d]
    [oscar-Vostro-3550:06742] [ 5] ./exe() [0x401399]
    [oscar-Vostro-3550:06742] *** End of error message ***
    [oscar-Vostro-3550:06740] [ 5] ./exe() [0x401399]
    [oscar-Vostro-3550:06740] *** End of error message ***
    [oscar-Vostro-3550:06741] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7fa6c4c6e4a0]
    [oscar-Vostro-3550:06741] [ 1] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x1c) [0x7fa6c4cbb54c]
    [oscar-Vostro-3550:06741] [ 2] ./exe() [0x406742]
    [oscar-Vostro-3550:06741] [ 3] ./exe(main+0x34) [0x406ac9]
    [oscar-Vostro-3550:06741] [ 4] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7fa6c4c5976d]
    [oscar-Vostro-3550:06741] [ 5] ./exe() [0x401399]
    [oscar-Vostro-3550:06741] *** End of error message ***

--------------------------------------------------------------------------

    mpirun noticed that process rank 0 with PID 6917 on node
    oscar-Vostro-3550 exited on signal 11 (Segmentation fault).

--------------------------------------------------------------------------

    2 total processes killed (some possibly by mpirun during cleanup)

    If the program had an error causing the segmentation fault, it should
    not work in any case. I checked the program and didn't find the error.
    Why does the program work with five temperatures?
    Could someone please help me find the error and answer my question?

    The program and the files needed to run it are attached.

    Thanks


    _Oscar Fabian Mojica Ladino_
    Geologist M.S. in Geophysics

    _______________________________________________
    users mailing list
    us...@open-mpi.org <mailto:us...@open-mpi.org>
    http://www.open-mpi.org/mailman/listinfo.cgi/users



