*Usually* when we see segfaults inside calls to malloc, it means that there was
an earlier memory bug, such as an array overflow (i.e., something that stomped
on the allocator's internal bookkeeping, causing a later malloc to fail).

Have you tried running your code through a memory-checking debugger, such as
Valgrind?


On May 16, 2011, at 12:35 PM, Paul van der Walt wrote:

> Hi all,
> 
> I hope to provide enough information to make my problem clear. I
> have been debugging a lot after continually getting a segfault
> in my program, but then I decided to try and run it on another
> node, and it didn't segfault! The program which causes this
> strange behaviour can be downloaded with
> 
> $ git clone https://toothbr...@github.com/toothbrush/bsp-cg.git
> 
> It depends on bsponmpi (can be found at:
> http://bsponmpi.sourceforge.net/ ).
> 
> The machine on which I get a segfault is 
> Linux scarlatti 2.6.38-2-amd64 #1 SMP Thu Apr 7 04:28:07 UTC 2011 x86_64 
> GNU/Linux
> OpenMPI --version: mpirun (Open MPI) 1.4.3
> 
> And the error message is:
> [scarlatti:22100] *** Process received signal ***
> [scarlatti:22100] Signal: Segmentation fault (11)
> [scarlatti:22100] Signal code:  (128)
> [scarlatti:22100] Failing at address: (nil)
> [scarlatti:22100] [ 0] /lib/libpthread.so.0(+0xef60) [0x7f33ca69ef60]
> [scarlatti:22100] [ 1] /lib/libc.so.6(+0x74121) [0x7f33ca3a3121]
> [scarlatti:22100] [ 2] /lib/libc.so.6(__libc_malloc+0x70) [0x7f33ca3a5930]
> [scarlatti:22100] [ 3] src/cg(vecalloci+0x2c) [0x401789]
> [scarlatti:22100] [ 4] src/cg(bspmv_init+0x60) [0x40286a]
> [scarlatti:22100] [ 5] src/cg(bspcg+0x63b) [0x401f8b]
> [scarlatti:22100] [ 6] src/cg(main+0xd3) [0x402517]
> [scarlatti:22100] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x7f33ca34dc4d]
> [scarlatti:22100] [ 8] src/cg() [0x401609]
> [scarlatti:22100] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 0 with PID 22100 on node scarlatti exited on 
> signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> 
> The program can be invoked (after downloading the source,
> running make, and cd'ing into the project's root directory)
> like:
> 
> $ mpirun -np 2 src/cg examples/test.mtx-P2 examples/test.mtx-v2 examples/test.mtx-u2
> 
> The program seems to fail at src/bspedupack.c:vecalloci(), but
> printf'ing the pointer that's returned by malloc() looks okay.
> 
> The node on which the program DOES run without segfault is as
> follows: (OS X laptop)
> 
> Darwin purcell 10.7.0 Darwin Kernel Version 10.7.0: Sat Jan 29 15:17:16 PST 
> 2011; root:xnu-1504.9.37~1/RELEASE_I386 i386
> OpenMPI --version: mpirun (Open MPI) 1.2.8
> 
> Please inform if this is a real bug in OpenMPI, or if I'm coding
> something incorrectly. Note that I'm not asking anyone to debug
> my code for me, it's purely in case people want to try and
> reproduce my error locally. 
> 
> If I can provide more info, please advise. I'm not an MPI
> expert, unfortunately. 
> 
> Kind regards,
> 
> Paul van der Walt
> 
> -- 
> O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

