Re: [OMPI users] Segfault after malloc()?

2011-05-18 Thread Paul van der Walt
Okay cool, mine already breaks with P=2, so I'll try this soon. Thanks
for the impatient-idiot's-guide :)

On 18 May 2011 14:15, Jeff Squyres  wrote:
> If you're only running with a few MPI processes, you might be able to get 
> away with:
>
> mpirun -np 4 valgrind ./my_mpi_application
>
> If you run any more than that, the output gets too jumbled; you should send
> each process's valgrind output to a separate file with the --log-file option
> (IIRC).
>
> I personally like these valgrind options:
>
> valgrind --num-callers=50 --db-attach=yes --tool=memcheck --leak-check=yes 
> --show-reachable=yes
>
>
>
> On May 18, 2011, at 8:49 AM, Paul van der Walt wrote:
>
>> Hi Jeff,
>>
>> Thanks for the response.
>>
>> On 18 May 2011 13:30, Jeff Squyres  wrote:
>>> *Usually* when we see segv's in calls to alloc, it means that there was 
>>> previously some kind of memory bug, such as an array overflow or something 
>>> like that (i.e., something that stomped on the memory allocation tables, 
>>> causing the next alloc to fail).
>>>
>>> Have you tried running your code through a memory-checking debugger?
>>
>> I sort of tried with valgrind, but I'm not really sure how to
>> interpret the output (I'm not much of a C wizard). I'll have another look
>> a little later and report back. I suppose I should RTFM on how to
>> properly invoke valgrind so it makes sense with an MPI program?
>>
>> Paul
>>
>> --
>> O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/



-- 
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org


Re: [OMPI users] Segfault after malloc()?

2011-05-18 Thread Jeff Squyres
If you're only running with a few MPI processes, you might be able to get away 
with:

mpirun -np 4 valgrind ./my_mpi_application

If you run any more than that, the output gets too jumbled; you should send
each process's valgrind output to a separate file with the --log-file option
(IIRC).

I personally like these valgrind options:

valgrind --num-callers=50 --db-attach=yes --tool=memcheck --leak-check=yes 
--show-reachable=yes
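
Putting those together, something along these lines should give each rank its
own log file (an untested sketch; the %p in --log-file expands to each
process's PID, and the file name is just an example):

mpirun -np 4 valgrind --tool=memcheck --num-callers=50 --leak-check=yes \
    --show-reachable=yes --log-file=vg.%p.log ./my_mpi_application

I left --db-attach=yes out of that one, since it wants to prompt on an
interactive terminal rather than write to a log file.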



On May 18, 2011, at 8:49 AM, Paul van der Walt wrote:

> Hi Jeff,
> 
> Thanks for the response.
> 
> On 18 May 2011 13:30, Jeff Squyres  wrote:
>> *Usually* when we see segv's in calls to alloc, it means that there was 
>> previously some kind of memory bug, such as an array overflow or something 
>> like that (i.e., something that stomped on the memory allocation tables, 
>> causing the next alloc to fail).
>> 
>> Have you tried running your code through a memory-checking debugger?
> 
> I sort of tried with valgrind, but I'm not really sure how to
> interpret the output (I'm not much of a C wizard). I'll have another look
> a little later and report back. I suppose I should RTFM on how to
> properly invoke valgrind so it makes sense with an MPI program?
> 
> Paul
> 
> -- 
> O< ascii ribbon campaign - stop html mail - www.asciiribbon.org


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Segfault after malloc()?

2011-05-18 Thread Paul van der Walt
Hi Jeff,

Thanks for the response.

On 18 May 2011 13:30, Jeff Squyres  wrote:
> *Usually* when we see segv's in calls to alloc, it means that there was 
> previously some kind of memory bug, such as an array overflow or something 
> like that (i.e., something that stomped on the memory allocation tables, 
> causing the next alloc to fail).
>
> Have you tried running your code through a memory-checking debugger?

I sort of tried with valgrind, but I'm not really sure how to
interpret the output (I'm not much of a C wizard). I'll have another look
a little later and report back. I suppose I should RTFM on how to
properly invoke valgrind so it makes sense with an MPI program?

Paul

-- 
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org


Re: [OMPI users] Segfault after malloc()?

2011-05-18 Thread Jeff Squyres
*Usually* when we see segv's in calls to alloc, it means that there was 
previously some kind of memory bug, such as an array overflow or something like 
that (i.e., something that stomped on the memory allocation tables, causing the 
next alloc to fail).
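
For example, something along these lines (a contrived illustration, not
anything from your code) can die inside a later, unrelated malloc() even
though the actual bug is the earlier overflow:

  #include <stdlib.h>
  #include <string.h>

  int main(void) {
      int *a = malloc(4 * sizeof(int));   /* room for 4 ints */
      memset(a, 0, 64 * sizeof(int));     /* overflow: tramples the heap
                                             bookkeeping after the block */
      int *b = malloc(16 * sizeof(int));  /* can crash or abort here, far
                                             away from the real bug */
      free(b);
      free(a);
      return 0;
  }

A memory checker flags the memset as an invalid write at the point where it
happens, which is usually far more useful than the eventual segfault.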

Have you tried running your code through a memory-checking debugger?


On May 16, 2011, at 12:35 PM, Paul van der Walt wrote:

> Hi all,
> 
> I hope to provide enough information to make my problem clear. I
> have been debugging a lot after continually getting a segfault
> in my program, but then I decided to try and run it on another
> node, and it didn't segfault! The program which causes this
> strange behaviour can be downloaded with
> 
> $ git clone https://toothbr...@github.com/toothbrush/bsp-cg.git
> 
> It depends on bsponmpi (can be found at:
> http://bsponmpi.sourceforge.net/ ).
> 
> The machine on which I get a segfault is 
> Linux scarlatti 2.6.38-2-amd64 #1 SMP Thu Apr 7 04:28:07 UTC 2011 x86_64 
> GNU/Linux
> OpenMPI --version: mpirun (Open MPI) 1.4.3
> 
> And the error message is:
> [scarlatti:22100] *** Process received signal ***
> [scarlatti:22100] Signal: Segmentation fault (11)
> [scarlatti:22100] Signal code:  (128)
> [scarlatti:22100] Failing at address: (nil)
> [scarlatti:22100] [ 0] /lib/libpthread.so.0(+0xef60) [0x7f33ca69ef60]
> [scarlatti:22100] [ 1] /lib/libc.so.6(+0x74121) [0x7f33ca3a3121]
> [scarlatti:22100] [ 2] /lib/libc.so.6(__libc_malloc+0x70) [0x7f33ca3a5930]
> [scarlatti:22100] [ 3] src/cg(vecalloci+0x2c) [0x401789]
> [scarlatti:22100] [ 4] src/cg(bspmv_init+0x60) [0x40286a]
> [scarlatti:22100] [ 5] src/cg(bspcg+0x63b) [0x401f8b]
> [scarlatti:22100] [ 6] src/cg(main+0xd3) [0x402517]
> [scarlatti:22100] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x7f33ca34dc4d]
> [scarlatti:22100] [ 8] src/cg() [0x401609]
> [scarlatti:22100] *** End of error message ***
> --
> mpirun noticed that process rank 0 with PID 22100 on node scarlatti exited on 
> signal 11 (Segmentation fault).
> --
> 
> The program can be invoked (after downloading the source,
> running make, and cd'ing into the project's root directory)
> like:
> 
> $ mpirun -np 2 src/cg examples/test.mtx-P2 examples/test.mtx-v2 
> examples/test.mtx-u2
> 
> The program seems to fail at src/bspedupack.c:vecalloci(), but
> printf'ing the pointer that's returned by malloc() looks okay.
> 
> The node on which the program DOES run without a segfault (an OS X laptop)
> is as follows:
> 
> Darwin purcell 10.7.0 Darwin Kernel Version 10.7.0: Sat Jan 29 15:17:16 PST 
> 2011; root:xnu-1504.9.37~1/RELEASE_I386 i386
> OpenMPI --version: mpirun (Open MPI) 1.2.8
> 
> Please let me know whether this is a real bug in Open MPI or whether I'm
> doing something wrong in my code. Note that I'm not asking anyone to debug
> my code for me; the link is there purely in case people want to try to
> reproduce my error locally.
> 
> If I can provide more info, please advise. I'm not an MPI
> expert, unfortunately. 
> 
> Kind regards,
> 
> Paul van der Walt
> 
> -- 
> O< ascii ribbon campaign - stop html mail - www.asciiribbon.org


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




[OMPI users] Segfault after malloc()?

2011-05-16 Thread Paul van der Walt
Hi all,

I hope to provide enough information to make my problem clear. I
have been debugging a lot after continually getting a segfault
in my program, but then I decided to try and run it on another
node, and it didn't segfault! The program which causes this
strange behaviour can be downloaded with

$ git clone https://toothbr...@github.com/toothbrush/bsp-cg.git

It depends on bsponmpi (can be found at:
http://bsponmpi.sourceforge.net/ ).

The machine on which I get a segfault is 
Linux scarlatti 2.6.38-2-amd64 #1 SMP Thu Apr 7 04:28:07 UTC 2011 x86_64 
GNU/Linux
OpenMPI --version: mpirun (Open MPI) 1.4.3

And the error message is:
[scarlatti:22100] *** Process received signal ***
[scarlatti:22100] Signal: Segmentation fault (11)
[scarlatti:22100] Signal code:  (128)
[scarlatti:22100] Failing at address: (nil)
[scarlatti:22100] [ 0] /lib/libpthread.so.0(+0xef60) [0x7f33ca69ef60]
[scarlatti:22100] [ 1] /lib/libc.so.6(+0x74121) [0x7f33ca3a3121]
[scarlatti:22100] [ 2] /lib/libc.so.6(__libc_malloc+0x70) [0x7f33ca3a5930]
[scarlatti:22100] [ 3] src/cg(vecalloci+0x2c) [0x401789]
[scarlatti:22100] [ 4] src/cg(bspmv_init+0x60) [0x40286a]
[scarlatti:22100] [ 5] src/cg(bspcg+0x63b) [0x401f8b]
[scarlatti:22100] [ 6] src/cg(main+0xd3) [0x402517]
[scarlatti:22100] [ 7] /lib/libc.so.6(__libc_start_main+0xfd) [0x7f33ca34dc4d]
[scarlatti:22100] [ 8] src/cg() [0x401609]
[scarlatti:22100] *** End of error message ***
--
mpirun noticed that process rank 0 with PID 22100 on node scarlatti exited on 
signal 11 (Segmentation fault).
--

The program can be invoked (after downloading the source,
running make, and cd'ing into the project's root directory)
like:

$ mpirun -np 2 src/cg examples/test.mtx-P2 examples/test.mtx-v2 
examples/test.mtx-u2

The program seems to fail at src/bspedupack.c:vecalloci(), but
printf'ing the pointer that's returned by malloc() looks okay.
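
For reference, vecalloci() in bspedupack.c is just a thin wrapper around
malloc(); from memory it looks roughly like this (a paraphrase, my actual
copy may differ slightly):

  /* rough sketch of BSPedupack's vecalloci, not a verbatim copy */
  long *vecalloci(long n){
      long *pi;
      if (n == 0)
          pi = NULL;
      else {
          pi = (long *)malloc(n * sizeof(long));
          if (pi == NULL)
              bsp_abort("vecalloci: not enough memory");
      }
      return pi;
  }

That is consistent with the backtrace above: the crash happens inside
__libc_malloc itself, before vecalloci() ever gets a pointer back.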

The node on which the program DOES run without a segfault (an OS X laptop)
is as follows:

Darwin purcell 10.7.0 Darwin Kernel Version 10.7.0: Sat Jan 29 15:17:16 PST 
2011; root:xnu-1504.9.37~1/RELEASE_I386 i386
OpenMPI --version: mpirun (Open MPI) 1.2.8

Please let me know whether this is a real bug in Open MPI or whether I'm
doing something wrong in my code. Note that I'm not asking anyone to debug
my code for me; the link is there purely in case people want to try to
reproduce my error locally.

If I can provide more info, please advise. I'm not an MPI
expert, unfortunately. 

Kind regards,

Paul van der Walt

-- 
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org

