I’m afraid we’ll have to get someone from the Forum to interpret (Howard is a member as well), but here is what I see just below that, in the description section:
The type signature associated with sendcounts[j], sendtype at process i must be equal to the type signature associated with recvcounts[i], recvtype at process j. This implies that the amount of data sent must be equal to the amount of data received, pairwise between every pair of processes.

> On Apr 7, 2015, at 9:56 AM, Hamidreza Anvari <hr.anv...@gmail.com> wrote:
>
> Hello,
>
> Thanks for your description.
> I'm currently doing an allToAll() prior to the allToAllV(), to communicate the lengths of the expected messages.
>
> BUT, I still strongly believe that the right implementation of this method is what I expected earlier!
> If you check the MPI specification here:
> http://www.mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf
> Page 170, Line 14
> it says "... the number of elements that CAN be received ...", which implies that the actual received message may have a shorter length.
>
> In cases where the same value is mandatory, the modal "MUST" is used; for example, at Page 171, Line 1, it says "... sendtype at process i MUST be equal to the type signature ...".
>
> SO, I would expect any conforming implementation of the MPI specification to handle this message-length matching by itself, as I asked originally.
>
> Thanks,
> -- HR
>
> On Tue, Apr 7, 2015 at 6:03 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>
> Hi HR,
>
> Sorry for not noticing the receive side earlier, but as Ralph implied earlier in this thread, the MPI standard has stricter type matching for collectives than for point-to-point. Namely, the number of bytes the receiver expects to receive from a given sender in the alltoallv must match the number of bytes sent by the sender.
>
> You were just getting lucky with the older Open MPI. The error message isn't so great, though. It's likely that in the newer Open MPI you are using a collective algorithm for alltoallv that assumes your app is obeying the standard.
>
> You are correct that if the ranks don't know how much data will be sent to them from each rank prior to the alltoallv op, you will need some mechanism for exchanging this info prior to the alltoallv op.
>
> Howard
>
> 2015-04-06 23:23 GMT-06:00 Hamidreza Anvari <hr.anv...@gmail.com>:
>
> Hello,
>
> If I set the size2 values according to your suggestion, which is the same values as on the sending nodes, it works fine.
> But by definition it does not need to be exactly the same as the length of the sent data; it is just a maximum length of the data expected to be received. If not, it is inevitable to run an allToAll() first to communicate the data sizes and then do the main allToAllV(), which is expensive, unnecessary communication overhead.
>
> I just created a reproducer in C++ which gives the error under OpenMPI 1.8.4 but runs correctly under OpenMPI 1.5.4.
> (I've not included the Java version of this reproducer, which I think is not important, as the current version is enough to reproduce the error. But if needed, it is straightforward to convert this code to Java.)
>
> Thanks,
> -- HR
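For reference, the count-exchange pattern HR and Howard are describing above (an allToAll of the per-peer send counts, followed by the allToAllV itself) might look roughly like the following in C++ with the MPI C API. This is a minimal sketch with illustrative names, not code from this thread.

#include <mpi.h>
#include <vector>

// Exchange variable-sized int data when receivers do not know the incoming
// sizes in advance: first trade the per-peer send counts, then call
// MPI_Alltoallv with receive counts that match the senders exactly.
void exchange(const std::vector<int>& sendbuf,
              const std::vector<int>& sendcounts,  // one entry per peer
              const std::vector<int>& sdispls,
              std::vector<int>& recvbuf)
{
    int nprocs;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    // Step 1: every rank learns exactly how much each peer will send it.
    std::vector<int> recvcounts(nprocs);
    MPI_Alltoall(sendcounts.data(), 1, MPI_INT,
                 recvcounts.data(), 1, MPI_INT, MPI_COMM_WORLD);

    // Step 2: build receive displacements from the now-known counts.
    std::vector<int> rdispls(nprocs, 0);
    for (int i = 1; i < nprocs; ++i)
        rdispls[i] = rdispls[i - 1] + recvcounts[i - 1];
    recvbuf.resize(rdispls[nprocs - 1] + recvcounts[nprocs - 1]);

    // Step 3: the receive side now matches the send side pairwise, as required.
    MPI_Alltoallv(sendbuf.data(), sendcounts.data(), sdispls.data(), MPI_INT,
                  recvbuf.data(), recvcounts.data(), rdispls.data(), MPI_INT,
                  MPI_COMM_WORLD);
}

The extra allToAll is the small additional collective HR objects to, but it guarantees the pairwise matching that the standard (and the newer Open MPI algorithms) require.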
> On Mon, Apr 6, 2015 at 3:03 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
> That would imply that the issue is in the underlying C implementation in OMPI, not the Java bindings. The reproducer would definitely help pin it down.
>
> If you change the size2 values to the ones we sent you, does the program by chance work?
>
>> On Apr 6, 2015, at 1:44 PM, Hamidreza Anvari <hr.anv...@gmail.com> wrote:
>>
>> I'll try that as well.
>> Meanwhile, I found that my C++ code runs fine on a machine running OpenMPI 1.5.4, but I receive the same error under OpenMPI 1.8.4 for both Java and C++.
>>
>> On Mon, Apr 6, 2015 at 2:21 PM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>
>> Hello HR,
>>
>> Thanks! If you have Java 1.7 installed on your system, would you mind trying to test against that version too?
>>
>> Thanks,
>> Howard
>>
>> 2015-04-06 13:09 GMT-06:00 Hamidreza Anvari <hr.anv...@gmail.com>:
>>
>> Hello,
>>
>> 1. I'm using Java/javac version 1.8.0_20 under OS X 10.10.2.
>>
>> 2. I used the following configuration for building OpenMPI:
>> ./configure --enable-mpi-java \
>>     --with-jdk-bindir="/System/Library/Frameworks/JavaVM.framework/Versions/Current/Commands" \
>>     --with-jdk-headers="/System/Library/Frameworks/JavaVM.framework/Versions/Current/Headers" \
>>     --prefix="/users/hamidreza/openmpi-1.8.4"
>> make all install
>>
>> 3. From a logical point of view, size2 is the maximum amount of data expected to be received; the data actually received might be less than this maximum.
>>
>> 4. I will try to prepare a working reproducer of my error and send it to you.
>>
>> Thanks,
>> -- HR
>>
>> On Mon, Apr 6, 2015 at 10:46 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>> I've talked to the folks who wrote the Java bindings. One possibility we identified is that there may be an error in your code when you did the translation:
>>
>>> My immediate thought is that each process cannot receive more elements than were sent to it. That's the reason for the truncation error.
>>>
>>> These are the correct values:
>>>
>>> rank 0 - size2: 2,2,1,1
>>> rank 1 - size2: 1,1,1,1
>>> rank 2 - size2: 0,1,1,2
>>> rank 3 - size2: 2,1,2,1
>>
>> Can you check your code to see if perhaps the values you are passing didn't get translated correctly from your C++ version to the Java version?
>>
>>> On Apr 6, 2015, at 5:03 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>>
>>> Hello HR,
>>>
>>> It would also be useful to know which Java version you are using, as well as the configure options used when building Open MPI.
>>>
>>> Thanks,
>>> Howard
>>>
>>> 2015-04-05 19:10 GMT-06:00 Ralph Castain <r...@open-mpi.org>:
>>>
>>> If it's not too much trouble, can you extract just the alltoallv portion and provide us with a small reproducer?
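As an aside on the matching rule quoted at the top of this message (recvcounts[i] at process j must agree with sendcounts[j] at process i), a small debugging check along the following lines can turn an opaque MPI_ERR_TRUNCATE into a precise report of which pair of ranks disagrees. This is only an illustrative C++ sketch against the MPI C API, not code from HR's reproducer.

#include <cstdio>
#include <mpi.h>
#include <vector>

// Before calling MPI_Alltoallv, verify that this rank's recvcounts agree with
// what each peer actually intends to send. expected[i] is the count rank i
// reports it will send to this rank.
void check_alltoallv_counts(const int* sendcounts, const int* recvcounts)
{
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    std::vector<int> expected(nprocs);
    MPI_Alltoall(sendcounts, 1, MPI_INT,
                 expected.data(), 1, MPI_INT, MPI_COMM_WORLD);

    for (int i = 0; i < nprocs; ++i)
        if (recvcounts[i] != expected[i])
            std::fprintf(stderr,
                         "rank %d: recvcounts[%d] = %d, but rank %d will send %d elements\n",
                         rank, i, recvcounts[i], i, expected[i]);
}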
>>>> On Apr 5, 2015, at 12:11 PM, Hamidreza Anvari <hr.anv...@gmail.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I am converting an existing MPI program in C++ to Java using OpenMPI 1.8.4. At some point I have an allToAllv() call which works fine in C++ but produces an error in the Java version:
>>>>
>>>> MPI.COMM_WORLD.allToAllv(data, subpartition_size, subpartition_offset, MPI.INT,
>>>>                          data2, subpartition_size2, subpartition_offset2, MPI.INT);
>>>>
>>>> Error:
>>>> *** An error occurred in MPI_Alltoallv
>>>> *** reported by process [3621322753,9223372036854775811]
>>>> *** on communicator MPI_COMM_WORLD
>>>> *** MPI_ERR_TRUNCATE: message truncated
>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>> *** and potentially your MPI job)
>>>> 3 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
>>>> Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>>>>
>>>> Here are the values of the parameters:
>>>>
>>>> data.length = 5
>>>> data2.length = 20
>>>>
>>>> ---------- Rank 0 of 4 ----------
>>>> subpartition_offset:  0,2,3,3
>>>> subpartition_size:    2,1,0,2
>>>> subpartition_offset2: 0,5,10,15
>>>> subpartition_size2:   5,5,5,5
>>>> ---------- Rank 1 of 4 ----------
>>>> subpartition_offset:  0,2,3,4
>>>> subpartition_size:    2,1,1,1
>>>> subpartition_offset2: 0,5,10,15
>>>> subpartition_size2:   5,5,5,5
>>>> ---------- Rank 2 of 4 ----------
>>>> subpartition_offset:  0,1,2,3
>>>> subpartition_size:    1,1,1,2
>>>> subpartition_offset2: 0,5,10,15
>>>> subpartition_size2:   5,5,5,5
>>>> ---------- Rank 3 of 4 ----------
>>>> subpartition_offset:  0,1,2,4
>>>> subpartition_size:    1,1,2,1
>>>> subpartition_offset2: 0,5,10,15
>>>> subpartition_size2:   5,5,5,5
>>>> ----------
>>>>
>>>> Again, this is code which works in the C++ version.
>>>>
>>>> Any help or advice is greatly appreciated.
>>>> Thanks,
>>>> -- HR
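To make the fix concrete, here is a minimal stand-alone C++ sketch (MPI C API, 4 ranks) built from the per-rank values posted above, but with the receive counts set to the corrected size2 values given earlier in the thread (2,2,1,1 / 1,1,1,1 / 0,1,1,2 / 2,1,2,1) instead of 5,5,5,5. It illustrates the pairwise matching the standard requires; it is not HR's actual reproducer.

#include <cstdio>
#include <mpi.h>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    if (nprocs != 4) {
        if (rank == 0) std::fprintf(stderr, "run with exactly 4 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    // Send layout as posted in the thread (one row per rank).
    int sendcounts[4][4] = {{2,1,0,2}, {2,1,1,1}, {1,1,1,2}, {1,1,2,1}};
    int sdispls[4][4]    = {{0,2,3,3}, {0,2,3,4}, {0,1,2,3}, {0,1,2,4}};
    // Corrected receive counts: row j, entry i equals sendcounts[i][j].
    int recvcounts[4][4] = {{2,2,1,1}, {1,1,1,1}, {0,1,1,2}, {2,1,2,1}};
    // Receive displacements, same on every rank (subpartition_offset2 above).
    int rdispls[4]       = {0, 5, 10, 15};

    int data[5], data2[20];
    for (int i = 0; i < 5; ++i)  data[i]  = rank * 10 + i;
    for (int i = 0; i < 20; ++i) data2[i] = -1;

    MPI_Alltoallv(data,  sendcounts[rank], sdispls[rank], MPI_INT,
                  data2, recvcounts[rank], rdispls,       MPI_INT,
                  MPI_COMM_WORLD);

    std::printf("rank %d received:", rank);
    for (int i = 0; i < 20; ++i) std::printf(" %d", data2[i]);
    std::printf("\n");

    MPI_Finalize();
    return 0;
}

With the original 5,5,5,5 receive counts, OpenMPI 1.8.4 reports the MPI_ERR_TRUNCATE shown above; with the counts matched pairwise as here, HR reports the program runs fine.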