I know I'm quite late to this thread, but Edgar is correct: the counts in 
collective calls -- including the lengths in sendcounts and recvcounts in 
alltoallv -- must match up: the amount of data a rank sends to a peer must 
exactly equal the amount that peer specifies it will receive.

This is different from point-to-point MPI calls, where a sender can send a 
smaller count than the receiver posted.
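
For example, this is perfectly legal in point-to-point -- the receiver only 
posts an upper bound and can ask afterward how much data actually arrived.  A 
rough, untested sketch against the C bindings (the 100-int buffer, source, and 
tag are just placeholders):

  /* Receiver side: post a buffer larger than anything the sender will send. */
  int buf[100];
  MPI_Status status;
  int actual = 0;

  MPI_Recv(buf, 100, MPI_INT, 0 /* source */, 0 /* tag */,
           MPI_COMM_WORLD, &status);
  MPI_Get_count(&status, MPI_INT, &actual);  /* how many ints really arrived */

Collectives have no equivalent "post a maximum, query later" mechanism.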

Hence, I think Open MPI's current alltoallv behavior is correct.  If your code 
is working in the OMPI 1.5 series, it is only working by chance; your code may 
be invoking nondeterministic behavior.
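
If the receive amounts aren't known ahead of time, the usual pattern is exactly 
what you're already doing: exchange the exact counts first with an MPI_Alltoall, 
then feed those counts (and displacements derived from them) into the 
MPI_Alltoallv.  Something like this untested sketch against the C bindings -- 
the function and variable names are just placeholders, and it assumes MPI_INT 
data:

  #include <mpi.h>
  #include <stdlib.h>

  /* sendbuf/sendcounts/sdispls are assumed to be set up per the usual
   * alltoallv rules for this rank.                                      */
  static void exchange_with_exact_counts(const int *sendbuf,
                                         const int *sendcounts,
                                         const int *sdispls, MPI_Comm comm)
  {
      int i, nprocs, total = 0;
      MPI_Comm_size(comm, &nprocs);

      int *recvcounts = malloc(nprocs * sizeof(int));
      int *rdispls    = malloc(nprocs * sizeof(int));

      /* Step 1: every rank tells every other rank exactly how many ints
       * to expect from it.                                               */
      MPI_Alltoall(sendcounts, 1, MPI_INT, recvcounts, 1, MPI_INT, comm);

      /* Step 2: build receive displacements and the receive buffer from
       * the now-known receive counts.                                    */
      for (i = 0; i < nprocs; ++i) {
          rdispls[i] = total;
          total += recvcounts[i];
      }
      int *recvbuf = malloc((total > 0 ? total : 1) * sizeof(int));

      /* Step 3: the actual exchange; recvcounts now exactly match what the
       * senders specified, as the standard requires for collectives.       */
      MPI_Alltoallv(sendbuf, sendcounts, sdispls, MPI_INT,
                    recvbuf, recvcounts, rdispls, MPI_INT, comm);

      /* ... use recvbuf ... */

      free(recvbuf);
      free(recvcounts);
      free(rdispls);
  }

Yes, that is an extra collective, but it is a small, fixed-size one (one int per 
peer), and it is the price of conforming to the standard.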



> On Apr 8, 2015, at 10:41 AM, Edgar Gabriel <gabr...@cs.uh.edu> wrote:
> 
> I think the following paragraph might be useful. It's in MPI-3, page 142 lines 
> 16-20:
> 
> "The type-matching conditions for the collective operations are more strict 
> than the corresponding conditions between sender and receiver in 
> point-to-point. Namely, for collective operations, the amount of data sent 
> must exactly match the amount of data specified by the receiver. Different 
> type maps (the layout in memory, see Section 4.1) between sender and receiver 
> are still allowed".
> 
> 
> Thanks
> Edgar
> 
> On 4/8/2015 9:30 AM, Ralph Castain wrote:
>> In the interim, perhaps another way of addressing this would be to ask:
>> what happens when you run your reproducer with MPICH? Does that work?
>> 
>> This would at least tell us how another implementation interpreted that
>> function.
>> 
>> 
>>> On Apr 7, 2015, at 10:18 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>> 
>>> I’m afraid we’ll have to get someone from the Forum to interpret
>>> (Howard is a member as well), but here is what I see just below that,
>>> in the description section:
>>> 
>>> /The type signature associated with sendcounts[j], sendtype at
>>> process i must be equal to the type signature associated
>>> with recvcounts[i], recvtype at process j. This implies that the
>>> amount of data sent must be equal to the amount of data received,
>>> pairwise between every pair of processes/
>>> 
>>> 
>>>> On Apr 7, 2015, at 9:56 AM, Hamidreza Anvari <hr.anv...@gmail.com> wrote:
>>>> 
>>>> Hello,
>>>> 
>>>> Thanks for your description.
>>>> I'm currently doing an allToAll() prior to the allToAllV() to communicate
>>>> the lengths of the expected messages.
>>>> BUT, I still strongly believe that the right implementation of this
>>>> method should behave the way I expected earlier!
>>>> If you check the MPI specification here:
>>>> 
>>>> http://www.mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf
>>>> Page 170
>>>> Line 14
>>>> 
>>>> It is mentioned that "... the number of elements that CAN be
>>>> received ...", which implies that the actual received message may be
>>>> shorter.
>>>> 
>>>> While in cases where it is mandatory to have the same value, the modal
>>>> "MUST" is used. For example, at page 171 line 1, it is mentioned that
>>>> "... sendtype at process i MUST be equal to the type signature ...".
>>>> 
>>>> SO, I would expect any conforming implementation of the MPI
>>>> specification to handle this message-length matching by itself, as I
>>>> asked originally.
>>>> 
>>>> Thanks,
>>>> -- HR
>>>> 
>>>> On Tue, Apr 7, 2015 at 6:03 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>>> 
>>>>    Hi HR,
>>>> 
>>>>    Sorry for not noticing the receive side earlier, but as Ralph
>>>>    implied earlier
>>>>    in this thread, the MPI standard has more strict type matching
>>>>    for collectives
>>>>    than for point to point.  Namely, the number of bytes the
>>>>    receiver expects
>>>>    to receive from a given sender in the alltoallv must match the
>>>>    number of bytes
>>>>    sent by the sender.
>>>> 
>>>>    You were just getting lucky with the older open mpi.  The error
>>>>    message
>>>>    isn't so great though.  It's likely that in the newer open mpi you are
>>>>    using a collective algorithm for alltoallv that assumes your app is
>>>>    obeying the standard.
>>>> 
>>>>    You are correct that if the ranks don't know how much data will
>>>>    be sent
>>>>    to them from each rank prior to the alltoallv op, you will need
>>>>    to have some
>>>>    mechanism for exchanging this info prior to the alltoallv op.
>>>> 
>>>>    Howard
>>>> 
>>>> 
>>>>    2015-04-06 23:23 GMT-06:00 Hamidreza Anvari <hr.anv...@gmail.com>:
>>>> 
>>>>        Hello,
>>>> 
>>>>        If I set the size2 values according to your suggestion, which
>>>>        are the same values as on the sending nodes, it works fine.
>>>>        But by definition they should not need to be exactly the same as
>>>>        the length of the sent data; they are just a maximum length of
>>>>        the data expected to be received. If not, it is inevitable to run
>>>>        an allToAll() first to communicate the data sizes and then do
>>>>        the main allToAllV(), which adds expensive, unnecessary
>>>>        communication overhead.
>>>> 
>>>>        I just created a reproducer in C++ which gives the error
>>>>        under OpenMPI 1.8.4 but runs correctly under OpenMPI 1.5.4.
>>>>        (I've not included the Java version of this reproducer, which
>>>>        I think is not needed since the C++ version is enough to
>>>>        reproduce the error. In any case, it is straightforward to
>>>>        convert this code to Java.)
>>>> 
>>>>        Thanks,
>>>>        -- HR
>>>> 
>>>>        On Mon, Apr 6, 2015 at 3:03 PM, Ralph Castain
>>>>        <r...@open-mpi.org> wrote:
>>>> 
>>>>            That would imply that the issue is in the underlying C
>>>>            implementation in OMPI, not the Java bindings. The
>>>>            reproducer would definitely help pin it down.
>>>> 
>>>>            If you change the size2 values to the ones we sent you,
>>>>            does the program by chance work?
>>>> 
>>>> 
>>>>>            On Apr 6, 2015, at 1:44 PM, Hamidreza Anvari
>>>>>            <hr.anv...@gmail.com> wrote:
>>>>> 
>>>>>            I'll try that as well.
>>>>>            Meanwhile, I found that my C++ code is running fine on a
>>>>>            machine running OpenMPI 1.5.4, but I receive the same
>>>>>            error under OpenMPI 1.8.4 for both Java and C++.
>>>>> 
>>>>>            On Mon, Apr 6, 2015 at 2:21 PM, Howard Pritchard
>>>>>            <hpprit...@gmail.com> wrote:
>>>>> 
>>>>>                Hello HR,
>>>>> 
>>>>>                Thanks!  If you have Java 1.7 installed on your
>>>>>                system would you mind trying to test against that
>>>>>                version too?
>>>>> 
>>>>>                Thanks,
>>>>> 
>>>>>                Howard
>>>>> 
>>>>> 
>>>>>                2015-04-06 13:09 GMT-06:00 Hamidreza Anvari
>>>>>                <hr.anv...@gmail.com>:
>>>>> 
>>>>>                    Hello,
>>>>> 
>>>>>                    1. I'm using Java/Javac version 1.8.0_20 under
>>>>>                    OS X 10.10.2.
>>>>> 
>>>>>                    2. I have used the following configuration for
>>>>>                    making OpenMPI:
>>>>>                    ./configure --enable-mpi-java \
>>>>>                      --with-jdk-bindir="/System/Library/Frameworks/JavaVM.framework/Versions/Current/Commands" \
>>>>>                      --with-jdk-headers="/System/Library/Frameworks/JavaVM.framework/Versions/Current/Headers" \
>>>>>                      --prefix="/users/hamidreza/openmpi-1.8.4"
>>>>> 
>>>>>                    make all install
>>>>> 
>>>>>                    3. From a logical point of view, size2 is the
>>>>>                    maximum amount of data expected to be received, and
>>>>>                    the data actually received might be less than this
>>>>>                    maximum.
>>>>> 
>>>>>                    4. I will try to prepare a working reproducer of
>>>>>                    my error and send it to you.
>>>>> 
>>>>>                    Thanks,
>>>>>                    -- HR
>>>>> 
>>>>>                    On Mon, Apr 6, 2015 at 10:46 AM, Ralph Castain
>>>>>                    <r...@open-mpi.org> wrote:
>>>>> 
>>>>>                        I’ve talked to the folks who wrote the Java
>>>>>                        bindings. One possibility we identified is
>>>>>                        that there may be an error in your code when
>>>>>                        you did the translation:
>>>>> 
>>>>>>                        My immediate thought is that each process
>>>>>>                        cannot receive more elements than were
>>>>>>                        sent to it. That's the reason for the
>>>>>>                        truncation error.
>>>>>> 
>>>>>>                        These are the correct values:
>>>>>> 
>>>>>>                        rank 0 - size2: 2,2,1,1
>>>>>>                        rank 1 - size2: 1,1,1,1
>>>>>>                        rank 2 - size2: 0,1,1,2
>>>>>>                        rank 3 - size2: 2,1,2,1
>>>>> 
>>>>>                        Can you check your code to see if perhaps
>>>>>                        the values you are passing didn’t get
>>>>>                        translated correctly from your C++ version
>>>>>                        to the Java version?
>>>>> 
>>>>> 
>>>>> 
>>>>>>                        On Apr 6, 2015, at 5:03 AM, Howard
>>>>>>                        Pritchard <hpprit...@gmail.com> wrote:
>>>>>> 
>>>>>>                        Hello HR,
>>>>>> 
>>>>>>                        It would also be useful to know which java
>>>>>>                        version you are using, as well
>>>>>>                        as the configure options used when building
>>>>>>                        open mpi.
>>>>>> 
>>>>>>                        Thanks,
>>>>>> 
>>>>>>                        Howard
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>                        2015-04-05 19:10 GMT-06:00 Ralph Castain
>>>>>>                        <r...@open-mpi.org>:
>>>>>> 
>>>>>>                            If not too much trouble, can you
>>>>>>                            extract just the alltoallv portion and
>>>>>>                            provide us with a small reproducer?
>>>>>> 
>>>>>> 
>>>>>>>                            On Apr 5, 2015, at 12:11 PM, Hamidreza
>>>>>>>                            Anvari <hr.anv...@gmail.com> wrote:
>>>>>>> 
>>>>>>>                            Hello,
>>>>>>> 
>>>>>>>                            I am converting an existing MPI
>>>>>>>                            program in C++ to Java using OpenMPI
>>>>>>>                            1.8.4.
>>>>>>>                            At some point I have an allToAllv()
>>>>>>>                            call which works fine in C++ but
>>>>>>>                            fails with an error in the Java version:
>>>>>>> 
>>>>>>>                            MPI.COMM_WORLD.allToAllv(data,
>>>>>>>                                subpartition_size,
>>>>>>>                                subpartition_offset, MPI.INT,
>>>>>>>                                data2, subpartition_size2,
>>>>>>>                                subpartition_offset2, MPI.INT);
>>>>>>> 
>>>>>>>                            Error:
>>>>>>>                            *** An error occurred in MPI_Alltoallv
>>>>>>>                            *** reported by process
>>>>>>>                            [3621322753,9223372036854775811]
>>>>>>>                            *** on communicator MPI_COMM_WORLD
>>>>>>>                            *** MPI_ERR_TRUNCATE: message truncated
>>>>>>>                            *** MPI_ERRORS_ARE_FATAL (processes in
>>>>>>>                            this communicator will now abort,
>>>>>>>                            ***    and potentially your MPI job)
>>>>>>>                            3 more processes have sent help
>>>>>>>                            message help-mpi-errors.txt /
>>>>>>>                            mpi_errors_are_fatal
>>>>>>>                            Set MCA parameter
>>>>>>>                            "orte_base_help_aggregate" to 0 to see
>>>>>>>                            all help / error messages
>>>>>>> 
>>>>>>>                            Here are the values for parameters:
>>>>>>> 
>>>>>>>                            data.length = 5
>>>>>>>                            data2.length = 20
>>>>>>> 
>>>>>>>                            ---------- Rank 0 of 4 ----------
>>>>>>>                            subpartition_offset:0,2,3,3,
>>>>>>>                            subpartition_size:2,1,0,2,
>>>>>>>                            subpartition_offset2:0,5,10,15,
>>>>>>>                            subpartition_size2:5,5,5,5,
>>>>>>>                            ----------
>>>>>>>                            ---------- Rank 1 of 4 ----------
>>>>>>>                            subpartition_offset:0,2,3,4,
>>>>>>>                            subpartition_size:2,1,1,1,
>>>>>>>                            subpartition_offset2:0,5,10,15,
>>>>>>>                            subpartition_size2:5,5,5,5,
>>>>>>>                            ----------
>>>>>>>                            ---------- Rank 2 of 4 ----------
>>>>>>>                            subpartition_offset:0,1,2,3,
>>>>>>>                            subpartition_size:1,1,1,2,
>>>>>>>                            subpartition_offset2:0,5,10,15,
>>>>>>>                            subpartition_size2:5,5,5,5,
>>>>>>>                            ----------
>>>>>>>                            ---------- Rank 3 of 4 ----------
>>>>>>>                            subpartition_offset:0,1,2,4,
>>>>>>>                            subpartition_size:1,1,2,1,
>>>>>>>                            subpartition_offset2:0,5,10,15,
>>>>>>>                            subpartition_size2:5,5,5,5,
>>>>>>>                            ----------
>>>>>>> 
>>>>>>>                            Again, this is code which works in the
>>>>>>>                            C++ version.
>>>>>>> 
>>>>>>>                            Any help or advice is greatly appreciated.
>>>>>>> 
>>>>>>>                            Thanks,
>>>>>>>                            -- HR
>>>>>>>                            
>>>>>> 
>>>>>> 
>>>>>>                            
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
>> 
>> 
> 
> -- 
> Edgar Gabriel
> Associate Professor
> Parallel Software Technologies Lab      http://pstl.cs.uh.edu
> Department of Computer Science          University of Houston
> Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
> Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
> --


-- 
Jeff Squyres
jsquy...@cisco.com
