Re: [OMPI users] Segfault after malloc()?
Okay cool, mine already breaks with P=2, so I'll try this soon. Thanks
for the impatient-idiot's guide :)

On 18 May 2011 14:15, Jeff Squyres wrote:
> If you're only running with a few MPI processes, you might be able to
> get away with:
>
>     mpirun -np 4 valgrind ./my_mpi_application
>
> If you run any more than that, the output gets too jumbled and you
> should send each process' valgrind stdout to a different file with
> the --log-file option (IIRC).
>
> I personally like these valgrind options:
>
>     valgrind --num-callers=50 --db-attach=yes --tool=memcheck --leak-check=yes --show-reachable=yes
>
> [snip]

Paul

--
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Segfault after malloc()?
If you're only running with a few MPI processes, you might be able to
get away with:

    mpirun -np 4 valgrind ./my_mpi_application

If you run any more than that, the output gets too jumbled and you
should send each process' valgrind stdout to a different file with the
--log-file option (IIRC).

I personally like these valgrind options:

    valgrind --num-callers=50 --db-attach=yes --tool=memcheck --leak-check=yes --show-reachable=yes

On May 18, 2011, at 8:49 AM, Paul van der Walt wrote:

> Hi Jeff,
>
> Thanks for the response.
>
> On 18 May 2011 13:30, Jeff Squyres wrote:
>> Have you tried running your code through a memory-checking debugger?
>
> I sort of tried with valgrind, but I'm not really sure how to
> interpret the output (I'm not such a C wizard). I'll have another
> look a little later and report back. I suppose I should RTFM on how
> to properly invoke valgrind so it makes sense with an MPI program?
>
> Paul

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
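Putting the two suggestions above together: a sketch of a combined invocation (the application name is a placeholder), using valgrind's `%p` expansion so each rank writes to its own log file instead of interleaving on stdout. The interactive --db-attach option is left out here, since it does not combine well with many ranks logging to files.

```shell
# Placeholder app name; %p expands to each process's PID, giving one
# valgrind log per MPI rank instead of jumbled shared stdout.
CMD="mpirun -np 4 valgrind --tool=memcheck --leak-check=yes --num-callers=50 --log-file=vg.%p.log ./my_mpi_application"
echo "$CMD"
# Afterwards, find the ranks whose logs report problems, e.g.:
#   grep -l "Invalid write" vg.*.log
```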
Re: [OMPI users] Segfault after malloc()?
Hi Jeff,

Thanks for the response.

On 18 May 2011 13:30, Jeff Squyres wrote:
> *Usually* when we see segv's in calls to alloc, it means that there
> was previously some kind of memory bug, such as an array overflow or
> something like that (i.e., something that stomped on the memory
> allocation tables, causing the next alloc to fail).
>
> Have you tried running your code through a memory-checking debugger?

I sort of tried with valgrind, but I'm not really sure how to
interpret the output (I'm not such a C wizard). I'll have another look
a little later and report back. I suppose I should RTFM on how to
properly invoke valgrind so it makes sense with an MPI program?

Paul

--
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
Re: [OMPI users] Segfault after malloc()?
*Usually* when we see segv's in calls to alloc, it means that there was
previously some kind of memory bug, such as an array overflow or
something like that (i.e., something that stomped on the memory
allocation tables, causing the next alloc to fail).

Have you tried running your code through a memory-checking debugger?

On May 16, 2011, at 12:35 PM, Paul van der Walt wrote:

> Hi all,
>
> I hope to provide enough information to make my problem clear. I
> have been debugging a lot after continually getting a segfault
> in my program, but then I decided to try and run it on another
> node, and it didn't segfault! The program which causes this
> strange behaviour can be downloaded with
>
> $ git clone https://toothbr...@github.com/toothbrush/bsp-cg.git
>
> It depends on bsponmpi (can be found at:
> http://bsponmpi.sourceforge.net/ ).
>
> The machine on which I get a segfault is
>
> Linux scarlatti 2.6.38-2-amd64 #1 SMP Thu Apr 7 04:28:07 UTC 2011 x86_64 GNU/Linux
> OpenMPI --version: mpirun (Open MPI) 1.4.3
>
> And the error message is:
>
> [scarlatti:22100] *** Process received signal ***
> [scarlatti:22100] Signal: Segmentation fault (11)
> [scarlatti:22100] Signal code: (128)
> [scarlatti:22100] Failing at address: (nil)
> [scarlatti:22100] [ 0] /lib/libpthread.so.0(+0xef60) [0x7f33ca69ef60]
> [scarlatti:22100] [ 1] /lib/libc.so.6(+0x74121) [0x7f33ca3a3121]
> [scarlatti:22100] [ 2] /lib/libc.so.6(__libc_malloc+0x70) [0x7f33ca3a5930]
> [scarlatti:22100] [ 3] src/cg(vecalloci+0x2c) [0x401789]
> [scarlatti:22100] [ 4] src/cg(bspmv_init+0x60) [0x40286a]
> [scarlatti:22100] [ 5] src/cg(bspcg+0x63b) [0x401f8b]
> [scarlatti:22100] [ 6] src/cg(main+0xd3) [0x402517]
> [scarlatti:22100] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x7f33ca34dc4d]
> [scarlatti:22100] [ 8] src/cg() [0x401609]
> [scarlatti:22100] *** End of error message ***
> --
> mpirun noticed that process rank 0 with PID 22100 on node scarlatti
> exited on signal 11 (Segmentation fault).
> --
>
> The program can be invoked (after downloading the source, running
> make, and cd'ing into the project's root directory) like:
>
> $ mpirun -np 2 src/cg examples/test.mtx-P2 examples/test.mtx-v2 examples/test.mtx-u2
>
> The program seems to fail at src/bspedupack.c:vecalloci(), but
> printf'ing the pointer that's returned by malloc() looks okay.
>
> The node on which the program DOES run without segfault is as
> follows (OS X laptop):
>
> Darwin purcell 10.7.0 Darwin Kernel Version 10.7.0: Sat Jan 29 15:17:16 PST 2011; root:xnu-1504.9.37~1/RELEASE_I386 i386
> OpenMPI --version: mpirun (Open MPI) 1.2.8
>
> Please inform if this is a real bug in OpenMPI, or if I'm coding
> something incorrectly. Note that I'm not asking anyone to debug
> my code for me; it's purely in case people want to try and
> reproduce my error locally.
>
> If I can provide more info, please advise. I'm not an MPI
> expert, unfortunately.
>
> Kind regards,
>
> Paul van der Walt

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
[OMPI users] Segfault after malloc()?
Hi all,

I hope to provide enough information to make my problem clear. I have
been debugging a lot after continually getting a segfault in my
program, but then I decided to try and run it on another node, and it
didn't segfault! The program which causes this strange behaviour can
be downloaded with

$ git clone https://toothbr...@github.com/toothbrush/bsp-cg.git

It depends on bsponmpi (can be found at:
http://bsponmpi.sourceforge.net/ ).

The machine on which I get a segfault is

Linux scarlatti 2.6.38-2-amd64 #1 SMP Thu Apr 7 04:28:07 UTC 2011 x86_64 GNU/Linux
OpenMPI --version: mpirun (Open MPI) 1.4.3

And the error message is:

[scarlatti:22100] *** Process received signal ***
[scarlatti:22100] Signal: Segmentation fault (11)
[scarlatti:22100] Signal code: (128)
[scarlatti:22100] Failing at address: (nil)
[scarlatti:22100] [ 0] /lib/libpthread.so.0(+0xef60) [0x7f33ca69ef60]
[scarlatti:22100] [ 1] /lib/libc.so.6(+0x74121) [0x7f33ca3a3121]
[scarlatti:22100] [ 2] /lib/libc.so.6(__libc_malloc+0x70) [0x7f33ca3a5930]
[scarlatti:22100] [ 3] src/cg(vecalloci+0x2c) [0x401789]
[scarlatti:22100] [ 4] src/cg(bspmv_init+0x60) [0x40286a]
[scarlatti:22100] [ 5] src/cg(bspcg+0x63b) [0x401f8b]
[scarlatti:22100] [ 6] src/cg(main+0xd3) [0x402517]
[scarlatti:22100] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x7f33ca34dc4d]
[scarlatti:22100] [ 8] src/cg() [0x401609]
[scarlatti:22100] *** End of error message ***
--
mpirun noticed that process rank 0 with PID 22100 on node scarlatti
exited on signal 11 (Segmentation fault).
--

The program can be invoked (after downloading the source, running
make, and cd'ing into the project's root directory) like:

$ mpirun -np 2 src/cg examples/test.mtx-P2 examples/test.mtx-v2 examples/test.mtx-u2

The program seems to fail at src/bspedupack.c:vecalloci(), but
printf'ing the pointer that's returned by malloc() looks okay.
The node on which the program DOES run without segfault is as follows
(OS X laptop):

Darwin purcell 10.7.0 Darwin Kernel Version 10.7.0: Sat Jan 29 15:17:16 PST 2011; root:xnu-1504.9.37~1/RELEASE_I386 i386
OpenMPI --version: mpirun (Open MPI) 1.2.8

Please inform if this is a real bug in OpenMPI, or if I'm coding
something incorrectly. Note that I'm not asking anyone to debug my
code for me; it's purely in case people want to try and reproduce my
error locally.

If I can provide more info, please advise. I'm not an MPI expert,
unfortunately.

Kind regards,

Paul van der Walt

--
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org