Building Open MPI with the "--without-memory-manager" option fixed my problem.

What exactly does compiling with this option imply?
I guess all malloc calls then use the libc functions instead of Open MPI's own,
but does it affect performance or anything else?

Nicolas

2010/8/8 Nysal Jan <jny...@gmail.com>

> What interconnect are you using? Infiniband? Use
> "--without-memory-manager" option while building ompi in order to disable
> ptmalloc.
>
> Regards
> --Nysal
>
>
> On Sun, Aug 8, 2010 at 7:49 PM, Nicolas Deladerriere <
> nicolas.deladerri...@gmail.com> wrote:
>
>> Yes, I am using a 24 GB machine with a 64-bit Linux OS.
>> If I compile without the wrapper, I do not get any problems.
>>
>> It seems that when I link with Open MPI, my program uses a kind of
>> malloc implemented by Open MPI. Is it possible to switch it off in order to
>> use only the malloc from libc?
>>
>> Nicolas
>>
>> 2010/8/8 Terry Frankcombe <te...@chem.gu.se>
>>
>>> You're trying to do a 6GB allocate.  Can your underlying system handle
>>> that?  If you compile without the wrapper, does it work?
>>>
>>> I see your executable is using the OMPI memory stuff.  IIRC there are
>>> switches to turn that off.
>>>
>>>
>>> On Fri, 2010-08-06 at 15:05 +0200, Nicolas Deladerriere wrote:
>>> > Hello,
>>> >
>>> > I am getting a SIGSEGV error when running a simple program compiled and
>>> > linked with Open MPI.
>>> > I have reproduced the problem using really simple Fortran code. It
>>> > actually does not even use MPI, but just links with the MPI shared
>>> > libraries. (The problem does not appear when I do not link with the MPI
>>> > libraries.)
>>> >    % cat allocate.F90
>>> >    program test
>>> >    implicit none
>>> >        integer, dimension(:), allocatable :: z
>>> >        integer(kind=8) :: l
>>> >
>>> >        write(*,*) "l ?"
>>> >        read(*,*) l
>>> >
>>> >        ALLOCATE(z(l))
>>> >        z(1) = 111
>>> >        z(l) = 222
>>> >        DEALLOCATE(z)
>>> >
>>> >    end program test
>>> >
>>> > I am using Open MPI 1.4.2 and gfortran for my tests. Here is the
>>> > compilation:
>>> >
>>> >    % ./openmpi-1.4.2/build/bin/mpif90 --showme -g -o testallocate
>>> > allocate.F90
>>> >    gfortran -g -o testallocate allocate.F90
>>> > -I/s0/scr1/TOMOT_19311_HAL_/openmpi-1.4.2/build/include -pthread
>>> > -I/s0/scr1/TOMOT_19311_HAL_/openmpi-1.4.2/build/lib
>>> > -L/s0/scr1/TOMOT_19311_HAL_/openmpi-1.4.2/build/lib -lmpi_f90
>>> > -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl
>>> > -lutil -lm -ldl -pthread
>>> >
>>> > When I run that test with different lengths, I sometimes get a
>>> > "Segmentation fault" error. Here are two examples using two specific
>>> > values, but the error happens for many other values of the length (I did
>>> > not manage to find which values of the length give that error).
>>> >
>>> >    %  ./testallocate
>>> >     l ?
>>> >    1600000000
>>> >    Segmentation fault
>>> >    % ./testallocate
>>> >     l ?
>>> >    2000000000
>>> >
>>> > I ran a debugger on a version of Open MPI re-compiled with the debug flag.
>>> > I got the following error in the function sYSMALLOc:
>>> >
>>> >    Program received signal SIGSEGV, Segmentation fault.
>>> >    0x00002aaaab70b3b3 in sYSMALLOc (nb=6400000016, av=0x2aaaab930200)
>>> > at malloc.c:3239
>>> >    3239        set_head(remainder, remainder_size | PREV_INUSE);
>>> >    Current language:  auto; currently c
>>> >    (gdb) bt
>>> >    #0  0x00002aaaab70b3b3 in sYSMALLOc (nb=6400000016,
>>> > av=0x2aaaab930200) at malloc.c:3239
>>> >    #1  0x00002aaaab70d0db in opal_memory_ptmalloc2_int_malloc
>>> > (av=0x2aaaab930200, bytes=6400000000) at malloc.c:4322
>>> >    #2  0x00002aaaab70b773 in opal_memory_ptmalloc2_malloc
>>> > (bytes=6400000000) at malloc.c:3435
>>> >    #3  0x00002aaaab70a665 in opal_memory_ptmalloc2_malloc_hook
>>> > (sz=6400000000, caller=0x2aaaabf8534d) at hooks.c:667
>>> >    #4  0x00002aaaabf8534d in _gfortran_internal_free ()
>>> > from /usr/lib64/libgfortran.so.1
>>> >    #5  0x0000000000400bcc in MAIN__ () at allocate.F90:11
>>> >    #6  0x0000000000400c4e in main ()
>>> >    (gdb) display
>>> >    (gdb) list
>>> >    3234      if ((unsigned long)(size) >= (unsigned long)(nb +
>>> > MINSIZE)) {
>>> >    3235        remainder_size = size - nb;
>>> >    3236        remainder = chunk_at_offset(p, nb);
>>> >    3237        av->top = remainder;
>>> >    3238        set_head(p, nb | PREV_INUSE | (av != &main_arena ?
>>> > NON_MAIN_ARENA : 0));
>>> >    3239        set_head(remainder, remainder_size | PREV_INUSE);
>>> >    3240        check_malloced_chunk(av, p, nb);
>>> >    3241        return chunk2mem(p);
>>> >    3242      }
>>> >    3243
>>> >
>>> >
>>> > I also did the same test in C and I got the same problem.
>>> >
>>> > Does anyone have any idea that could help me understand what's going
>>> > on?
>>> >
>>> > Regards
>>> > Nicolas
>>> >
>>> > _______________________________________________
>>> > users mailing list
>>> > us...@open-mpi.org
>>> > http://www.open-mpi.org/mailman/listinfo.cgi/users