Thomas,

I did double-check this:
- There is no problem with MPI_Isend/MPI_Irecv: datatypes are correctly retained/released, and this part is well "hidden" inside some macros.
- There is no such mechanism in libnbc, hence the bug. Depending on the collective and the algorithm that is chosen (which depends on the communicator and message size), you may or may not hit the bug; a user-level workaround is sketched below.
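
In the meantime, a user-level workaround is to complete the non-blocking collective before freeing the datatype it uses. A minimal sketch (variable names such as bar and bar_mpitype are only illustrative, matching your test case):

/* assuming bar and bar_mpitype were set up as in the test program */
MPI_Request req;
MPI_Ibcast(&bar, 1, bar_mpitype, 0, MPI_COMM_WORLD, &req);
MPI_Wait(&req, MPI_STATUS_IGNORE);   /* complete the collective first ... */
MPI_Type_free(&bar_mpitype);         /* ... then release the datatype     */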

I opened https://github.com/open-mpi/ompi/issues/1304 to track this issue and will start working on a proof of concept now.

Cheers,

Gilles


On 1/13/2016 11:00 PM, Gilles Gouaillardet wrote:
Thomas,

thanks for the report,

At first glance, libnbc (the default module that implements non-blocking collectives) does not retain/release datatypes, which is why you ran into this kind of trouble.
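
Conceptually, the missing piece is reference counting: a non-blocking operation should retain each datatype it uses when it starts and release it when it completes, so that MPI_Type_free only marks the datatype for deletion while a pending operation still references it. A rough illustration (the names datatype_t, request_t, nbc_start, nbc_complete and destroy_datatype are hypothetical; this is not Open MPI's actual internal API):

/* Hypothetical reference-counting sketch -- not Open MPI's internal API */
typedef struct { int refcount; /* ... description of the type ... */ } datatype_t;
typedef struct { datatype_t *dtype; /* ... other request state ... */ } request_t;

void destroy_datatype(datatype_t *dtype);   /* hypothetical helper */

static void nbc_start(request_t *req, datatype_t *dtype)
{
    dtype->refcount++;   /* retain: the pending operation keeps the type alive */
    req->dtype = dtype;
}

static void nbc_complete(request_t *req)
{
    if (--req->dtype->refcount == 0) {
        /* last reference is gone (e.g. the user already called MPI_Type_free),
         * so the datatype can really be destroyed now */
        destroy_datatype(req->dtype);
    }
}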

I quickly checked the code, and it seems this kind of mechanism is also missing for MPI_Isend/MPI_Irecv ...

I will investigate this further

Cheers,

Gilles

On Wednesday, January 13, 2016, Thomas Ponweiser <thomas.ponwei...@risc-software.at> wrote:

    Dear friends of Open MPI,

    I am currently facing a problem in connection with MPI_Ibcast and
    MPI_Type_free. I have been able to isolate the problem in a minimal
    test program, which I have attached.

    Maybe some of you can tell me what I am doing wrong or confirm
    that this might be a bug in Open MPI (I am using version 1.10.1).

    Here is what I am doing:
    1) I have two struct types, foo_type and bar_type, as follows:

    typedef struct
    {
       int v[6];
       long l;
    } foo_type;

    typedef struct
    {
       int v[3];
       foo_type foo;
    } bar_type;

    2) I am creating corresponding MPI types (foo_mpitype and
    bar_mpitype) with MPI_Type_create_struct.

    3) I am freeing foo_mpitype.

    4) I am broadcasting a variable of type bar_type with MPI_Ibcast
    (using count = 1 and datatype = bar_mpitype).

    5) I am freeing bar_mpitype.

    6) I am waiting for the completion of step 4) with MPI_Wait.
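
    A condensed sketch of steps 2) to 6) (illustrative only, not the
    attached program; the block lengths, displacements and base types
    passed to MPI_Type_create_struct are elided):

    MPI_Datatype foo_mpitype, bar_mpitype;
    MPI_Request req;
    bar_type bar;

    /* 2) build and commit the MPI struct types
     *    (foo_blocklens, foo_displs, foo_types etc. set up elsewhere) */
    MPI_Type_create_struct(2, foo_blocklens, foo_displs, foo_types, &foo_mpitype);
    MPI_Type_commit(&foo_mpitype);
    MPI_Type_create_struct(2, bar_blocklens, bar_displs, bar_types, &bar_mpitype);
    MPI_Type_commit(&bar_mpitype);

    MPI_Type_free(&foo_mpitype);                                 /* 3) */
    MPI_Ibcast(&bar, 1, bar_mpitype, 0, MPI_COMM_WORLD, &req);   /* 4) */
    MPI_Type_free(&bar_mpitype);                                 /* 5) */
    MPI_Wait(&req, MPI_STATUS_IGNORE);                           /* 6) */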

    In step 6) I get a segmentation fault within MPI_Wait, but only if
    the MPI job runs with more than 4 processes.

    Testing with MPICH 3.2, the program seems to work just fine.

    I found out that swapping steps 5) and 6) helps. But I think this
    should not make any difference, according to the description of
    MPI_Type_free at
    http://www.mpi-forum.org/docs/mpi-1.1/mpi-11-html/node58.html:
    "Any communication that is currently using this datatype will
    complete normally." And: "Freeing a datatype does not affect any
    other datatype that was built from the freed datatype."

    (In fact, doing the same thing, that is MPI_Ibcast followed by
    MPI_Type_free followed by MPI_Wait, with foo_type and foo_mpitype,
    seems to work fine.)

    Thanks in advance for your help,

    kind regards,
    Thomas


