Hi all,

I hope this gives enough information to make my problem clear. I
have spent a lot of time debugging after repeatedly getting a
segfault in my program. Then I tried running it on another node,
and it didn't segfault! The program that causes this strange
behaviour can be downloaded with:

$ git clone https://toothbr...@github.com/toothbrush/bsp-cg.git

It depends on bsponmpi, which can be found at
http://bsponmpi.sourceforge.net/ .

The machine on which I get a segfault is:

Linux scarlatti 2.6.38-2-amd64 #1 SMP Thu Apr 7 04:28:07 UTC 2011 x86_64 GNU/Linux
OpenMPI --version: mpirun (Open MPI) 1.4.3

And the error message is:
[scarlatti:22100] *** Process received signal ***
[scarlatti:22100] Signal: Segmentation fault (11)
[scarlatti:22100] Signal code:  (128)
[scarlatti:22100] Failing at address: (nil)
[scarlatti:22100] [ 0] /lib/libpthread.so.0(+0xef60) [0x7f33ca69ef60]
[scarlatti:22100] [ 1] /lib/libc.so.6(+0x74121) [0x7f33ca3a3121]
[scarlatti:22100] [ 2] /lib/libc.so.6(__libc_malloc+0x70) [0x7f33ca3a5930]
[scarlatti:22100] [ 3] src/cg(vecalloci+0x2c) [0x401789]
[scarlatti:22100] [ 4] src/cg(bspmv_init+0x60) [0x40286a]
[scarlatti:22100] [ 5] src/cg(bspcg+0x63b) [0x401f8b]
[scarlatti:22100] [ 6] src/cg(main+0xd3) [0x402517]
[scarlatti:22100] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x7f33ca34dc4d]
[scarlatti:22100] [ 8] src/cg() [0x401609]
[scarlatti:22100] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 22100 on node scarlatti exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

After downloading the source, running make, and cd'ing into the
project's root directory, the program can be invoked like this:

$ mpirun -np 2 src/cg examples/test.mtx-P2 examples/test.mtx-v2 examples/test.mtx-u2

The program seems to fail in vecalloci() (src/bspedupack.c), but
when I printf the pointer returned by malloc(), it looks okay.

The node on which the program DOES run without a segfault is an
OS X laptop:

Darwin purcell 10.7.0 Darwin Kernel Version 10.7.0: Sat Jan 29 15:17:16 PST 2011; root:xnu-1504.9.37~1/RELEASE_I386 i386
OpenMPI --version: mpirun (Open MPI) 1.2.8

Please let me know whether this is a real bug in Open MPI or
whether I'm doing something wrong in my code. Note that I'm not
asking anyone to debug my code for me; the details above are
purely in case people want to try to reproduce the error locally.

If I can provide more information, please let me know.
Unfortunately, I'm not an MPI expert.

Kind regards,

Paul van der Walt