I know I'm quite late to this thread, but Edgar is correct: the arguments in
collective calls -- including the lengths in sendcounts and recvcounts in
alltoallv -- must match at all processes. This is different from
point-to-point MPI calls, where a sender can send a smaller count than the
receiver posted. Hence, I think Open MPI's current alltoallv behavior is
correct. If your code is working in the OMPI 1.5 series, it's only by chance /
your code may be invoking nondeterministic behavior.
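In other words, the receive counts passed to alltoallv have to be exactly what
the peers are sending; if the ranks don't know those lengths up front, the
usual pattern (as discussed further down this thread) is to exchange the
counts with a plain alltoall first and build the alltoallv arguments from the
result. A minimal C++ sketch of that pattern against the MPI C API -- variable
names and the example send counts are purely illustrative, not taken from the
reproducer in this thread:

#include <mpi.h>
#include <vector>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    // Per-destination send counts, however the application computes them.
    // (Illustrative values: here each rank sends (rank + 1) ints to everyone.)
    std::vector<int> sendcounts(nprocs, rank + 1);

    // Step 1: tell every rank how much it will receive from us.
    // After this call, recvcounts[i] on this rank is exactly sendcounts[rank]
    // as computed on rank i -- i.e. the value the standard requires us to
    // pass as the alltoallv receive count for rank i.
    std::vector<int> recvcounts(nprocs);
    MPI_Alltoall(sendcounts.data(), 1, MPI_INT,
                 recvcounts.data(), 1, MPI_INT, MPI_COMM_WORLD);

    // Step 2: build displacements from the (now exact) counts.
    std::vector<int> sdispls(nprocs, 0), rdispls(nprocs, 0);
    for (int i = 1; i < nprocs; ++i) {
        sdispls[i] = sdispls[i - 1] + sendcounts[i - 1];
        rdispls[i] = rdispls[i - 1] + recvcounts[i - 1];
    }

    std::vector<int> sendbuf(sdispls[nprocs - 1] + sendcounts[nprocs - 1], rank);
    std::vector<int> recvbuf(rdispls[nprocs - 1] + recvcounts[nprocs - 1]);

    // Step 3: counts now match pairwise, so no MPI_ERR_TRUNCATE.
    MPI_Alltoallv(sendbuf.data(), sendcounts.data(), sdispls.data(), MPI_INT,
                  recvbuf.data(), recvcounts.data(), rdispls.data(), MPI_INT,
                  MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}

With the counts exchanged this way, each rank's recvcounts is the transpose of
the senders' sendcounts, so the pairwise matching the standard requires holds
by construction.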
> On Apr 8, 2015, at 10:41 AM, Edgar Gabriel <gabr...@cs.uh.edu> wrote:
>
> I think the following paragraph might be useful. It's in MPI-3, page 142,
> lines 16-20:
>
> "The type-matching conditions for the collective operations are more strict
> than the corresponding conditions between sender and receiver in
> point-to-point. Namely, for collective operations, the amount of data sent
> must exactly match the amount of data specified by the receiver. Different
> type maps (the layout in memory, see Section 4.1) between sender and receiver
> are still allowed."
>
> Thanks
> Edgar
>
> On 4/8/2015 9:30 AM, Ralph Castain wrote:
>> In the interim, perhaps another way of addressing this would be to ask:
>> what happens when you run your reproducer with MPICH? Does that work?
>>
>> This would at least tell us how another implementation interpreted that
>> function.
>>
>>> On Apr 7, 2015, at 10:18 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>> I'm afraid we'll have to get someone from the Forum to interpret
>>> (Howard is a member as well), but here is what I see just below that,
>>> in the description section:
>>>
>>> "The type signature associated with sendcounts[j], sendtype at
>>> process i must be equal to the type signature associated
>>> with recvcounts[i], recvtype at process j. This implies that the
>>> amount of data sent must be equal to the amount of data received,
>>> pairwise between every pair of processes."
>>>
>>>> On Apr 7, 2015, at 9:56 AM, Hamidreza Anvari <hr.anv...@gmail.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>> Thanks for your description.
>>>> I'm currently doing allToAll() prior to allToAllV() to communicate the
>>>> lengths of the expected messages.
>>>>
>>>> But I still strongly believe that the right implementation of this
>>>> method is what I expected earlier. If you check the MPI specification
>>>> here:
>>>>
>>>> http://www.mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf
>>>> Page 170
>>>> Line 14
>>>>
>>>> it is mentioned that "... the number of elements that CAN be
>>>> received ...", which implies that the actual received message may have
>>>> a shorter length.
>>>>
>>>> In cases where it is mandatory to have the same value, the modal
>>>> "MUST" is used; for example, at page 171, line 1, it is mentioned that
>>>> "... sendtype at process i MUST be equal to the type signature ...".
>>>>
>>>> So I would expect any consistent implementation of the MPI
>>>> specification to handle this message-length matching by itself, as I
>>>> asked originally.
>>>>
>>>> Thanks,
>>>> -- HR
>>>>
>>>> On Tue, Apr 7, 2015 at 6:03 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>>>
>>>> Hi HR,
>>>>
>>>> Sorry for not noticing the receive side earlier, but as Ralph implied
>>>> earlier in this thread, the MPI standard has stricter type matching
>>>> for collectives than for point-to-point. Namely, the number of bytes
>>>> the receiver expects to receive from a given sender in the alltoallv
>>>> must match the number of bytes sent by the sender.
>>>>
>>>> You were just getting lucky with the older Open MPI. The error
>>>> message isn't so great, though.
>>>> It's likely that in the newer Open MPI you are using a collective
>>>> algorithm for alltoallv that assumes your app is obeying the standard.
>>>>
>>>> You are correct that if the ranks don't know how much data will be
>>>> sent to them from each rank prior to the alltoallv op, you will need
>>>> to have some mechanism for exchanging this info prior to the
>>>> alltoallv op.
>>>>
>>>> Howard
>>>>
>>>> 2015-04-06 23:23 GMT-06:00 Hamidreza Anvari <hr.anv...@gmail.com>:
>>>>
>>>> Hello,
>>>>
>>>> If I set the size2 values according to your suggestion, which is the
>>>> same values as on the sending nodes, it works fine. But by definition
>>>> it does not need to be exactly the same as the length of the sent
>>>> data; it is just a maximum length of expected data to receive. If
>>>> not, it is inevitable to run an allToAll() first to communicate the
>>>> data sizes and then do the main allToAllV(), which is an expensive,
>>>> unnecessary communication overhead.
>>>>
>>>> I just created a reproducer in C++ which gives the error under
>>>> OpenMPI 1.8.4, but runs correctly under OpenMPI 1.5.4. (I've not
>>>> included the Java version of this reproducer, which I think is not
>>>> important as the current version is enough to reproduce the error.
>>>> In any case, it is straightforward to convert this code to Java.)
>>>>
>>>> Thanks,
>>>> -- HR
>>>>
>>>> On Mon, Apr 6, 2015 at 3:03 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>
>>>> That would imply that the issue is in the underlying C implementation
>>>> in OMPI, not the Java bindings. The reproducer would definitely help
>>>> pin it down.
>>>>
>>>> If you change the size2 values to the ones we sent you, does the
>>>> program by chance work?
>>>>
>>>>> On Apr 6, 2015, at 1:44 PM, Hamidreza Anvari <hr.anv...@gmail.com> wrote:
>>>>>
>>>>> I'll try that as well.
>>>>> Meanwhile, I found that my C++ code is running fine on a machine
>>>>> running OpenMPI 1.5.4, but I receive the same error under
>>>>> OpenMPI 1.8.4 for both Java and C++.
>>>>>
>>>>> On Mon, Apr 6, 2015 at 2:21 PM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>>>>
>>>>> Hello HR,
>>>>>
>>>>> Thanks! If you have Java 1.7 installed on your system, would you
>>>>> mind trying to test against that version too?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Howard
>>>>>
>>>>> 2015-04-06 13:09 GMT-06:00 Hamidreza Anvari <hr.anv...@gmail.com>:
>>>>>
>>>>> Hello,
>>>>>
>>>>> 1. I'm using Java/javac version 1.8.0_20 under OS X 10.10.2.
>>>>>
>>>>> 2. I have used the following configuration for building OpenMPI:
>>>>>
>>>>> ./configure --enable-mpi-java \
>>>>>   --with-jdk-bindir="/System/Library/Frameworks/JavaVM.framework/Versions/Current/Commands" \
>>>>>   --with-jdk-headers="/System/Library/Frameworks/JavaVM.framework/Versions/Current/Headers" \
>>>>>   --prefix="/users/hamidreza/openmpi-1.8.4"
>>>>> make all install
>>>>>
>>>>> 3. From a logical point of view, size2 is the maximum amount of data
>>>>> expected to be received, and the data actually received might be
>>>>> less than this maximum.
>>>>>
>>>>> 4. I will try to prepare a working reproducer of my error and send
>>>>> it to you.
>>>>>
>>>>> Thanks,
>>>>> -- HR
>>>>>
>>>>> On Mon, Apr 6, 2015 at 10:46 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>
>>>>> I've talked to the folks who wrote the Java bindings.
>>>>> One possibility we identified is that there may be an error in your
>>>>> code when you did the translation:
>>>>>
>>>>>> My immediate thought is that each process cannot receive more
>>>>>> elements than were sent to it. That's the reason for the
>>>>>> truncation error.
>>>>>>
>>>>>> These are the correct values:
>>>>>>
>>>>>> rank 0 - size2: 2,2,1,1
>>>>>> rank 1 - size2: 1,1,1,1
>>>>>> rank 2 - size2: 0,1,1,2
>>>>>> rank 3 - size2: 2,1,2,1
>>>>>
>>>>> Can you check your code to see if perhaps the values you are passing
>>>>> didn't get translated correctly from your C++ version to the Java
>>>>> version?
>>>>>
>>>>>> On Apr 6, 2015, at 5:03 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>>>>>
>>>>>> Hello HR,
>>>>>>
>>>>>> It would also be useful to know which Java version you are using,
>>>>>> as well as the configure options used when building Open MPI.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Howard
>>>>>>
>>>>>> 2015-04-05 19:10 GMT-06:00 Ralph Castain <r...@open-mpi.org>:
>>>>>>
>>>>>> If it's not too much trouble, can you extract just the alltoallv
>>>>>> portion and provide us with a small reproducer?
>>>>>>
>>>>>>> On Apr 5, 2015, at 12:11 PM, Hamidreza Anvari <hr.anv...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I am converting an existing MPI program in C++ to Java using
>>>>>>> OpenMPI 1.8.4. At some point I have an allToAllv() call which
>>>>>>> works fine in C++ but produces an error in the Java version:
>>>>>>>
>>>>>>> MPI.COMM_WORLD.allToAllv(data, subpartition_size,
>>>>>>>     subpartition_offset, MPI.INT,
>>>>>>>     data2, subpartition_size2, subpartition_offset2, MPI.INT);
>>>>>>>
>>>>>>> Error:
>>>>>>> *** An error occurred in MPI_Alltoallv
>>>>>>> *** reported by process [3621322753,9223372036854775811]
>>>>>>> *** on communicator MPI_COMM_WORLD
>>>>>>> *** MPI_ERR_TRUNCATE: message truncated
>>>>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>>>>> *** and potentially your MPI job)
>>>>>>> 3 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
>>>>>>> Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>>>>>>>
>>>>>>> Here are the values for the parameters:
>>>>>>>
>>>>>>> data.length = 5
>>>>>>> data2.length = 20
>>>>>>>
>>>>>>> ---------- Rank 0 of 4 ----------
>>>>>>> subpartition_offset:  0,2,3,3
>>>>>>> subpartition_size:    2,1,0,2
>>>>>>> subpartition_offset2: 0,5,10,15
>>>>>>> subpartition_size2:   5,5,5,5
>>>>>>> ---------- Rank 1 of 4 ----------
>>>>>>> subpartition_offset:  0,2,3,4
>>>>>>> subpartition_size:    2,1,1,1
>>>>>>> subpartition_offset2: 0,5,10,15
>>>>>>> subpartition_size2:   5,5,5,5
>>>>>>> ---------- Rank 2 of 4 ----------
>>>>>>> subpartition_offset:  0,1,2,3
>>>>>>> subpartition_size:    1,1,1,2
>>>>>>> subpartition_offset2: 0,5,10,15
>>>>>>> subpartition_size2:   5,5,5,5
>>>>>>> ---------- Rank 3 of 4 ----------
>>>>>>> subpartition_offset:  0,1,2,4
>>>>>>> subpartition_size:    1,1,2,1
>>>>>>> subpartition_offset2: 0,5,10,15
>>>>>>> subpartition_size2:   5,5,5,5
>>>>>>> ----------
>>>>>>>
>>>>>>> Again, this is code which works in the C++ version.
>>>>>>>
>>>>>>> Any help or advice is greatly appreciated.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> -- HR
>
> --
> Edgar Gabriel
> Associate Professor
> Parallel Software Technologies Lab      http://pstl.cs.uh.edu
> Department of Computer Science          University of Houston
> Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
> Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/