Re: [Boost-mpi] multiple irecv tests failure with MPI_ERR_TRUNCATE

2014-02-23 Thread Walter Woods

Unless there is a significant performance penalty in each
boost::mpi::communicator holding onto a shadow version of itself, I think
the headaches this would save would be worth it.  Simply put, the current
behavior is just very annoying to debug, and it is a somewhat common use
case (especially in applications that know they are receiving exactly N
messages, and want to wait on them simultaneously).

Thanks,
Walt


On Sun, Feb 23, 2014 at 8:18 AM, Matthias Troyer tro...@phys.ethz.ch wrote:

 Indeed, that is the problem. If we don't want to reserve certain tags for
 internal use of Boost.MPI then the only secure way of solving this problem
 is to create a copy of the communicator, and send the actual message using
 a unique tag in this shadow communicator. We so far hesitated to implement
 this procedure, thinking it to be very unlikely that a user would send a
 second message with the same tag before the first one is received. If this
 should turn out to be a common use case then we can consider the solution
 I outlined. Does anyone see problems with that solution?

 Matthias



 On 23 Feb 2014, at 04:39, Walter Woods woodswal...@gmail.com wrote:

  seems to indicate that MPI guarantees that sends and recvs are kept ordered
 on a single-threaded process not using MPI_ANY_SOURCE. If that is the case
 then boost::mpi should as well.

 Right, so they are ordered, and that's the problem.

 boost::mpi needs to know exactly the size of data that it's receiving.
  So, if you're sending / receiving a non-native type, boost::mpi
 needs to transmit how big that data is going to be.  Then, it sends the
 data.  So one send becomes two sends to MPI - these are ordered.

 Receiving is the opposite - it uses one receive to get the size, and then,
 *after it has the size*, issues another receive to get the data.  If you issue
 one irecv command before another has gotten its length (and thus issued its
 data irecv command internally), then because of message ordering, the first
 irecv will get the length, as expected, but then the second irecv will get
 the first's data, mistaking it for a length submission.

 Hopefully that makes sense.  It's an interleaving problem - because
 everything is ordered, but irecvs turn into two underlying MPI irecvs, the
 two boost::mpi irecvs interleave, causing the problem.


 On Fri, Feb 21, 2014 at 5:52 PM, Roy Hashimoto roy.hashim...@gmail.com wrote:

 On Fri, Feb 21, 2014 at 11:49 AM, Walter Woods woodswal...@gmail.com wrote:

 In Roy's case, especially the test file, the problem is having multiple
 irecvs happening at once.  Look at the underlying request::handle_serialized_irecv
 implementation in boost/mpi/communicator.hpp - one recv is accomplished
 through several MPI_Irecv requests issued in sequence.  If you have several
 irecvs running at once, then one is likely to get the other's data as its
 length.


 Thanks for your reply and looking at the boost::mpi source - I haven't
 got that far. I understand what you're saying, but the first few paragraphs
 of this page:

  http://www.mpi-forum.org/docs/mpi-1.1/mpi-11-html/node41.html

 seem to indicate that MPI guarantees that sends and recvs are kept
 ordered on a single-threaded process not using MPI_ANY_SOURCE. If that is
 the case then boost::mpi should as well.


 In other words, if you want to receive multiple messages with the same
 tag, be sure to only have one irecv() with that tag running at a time.
  Data may only be transferred serially (not in parallel) over a single tag
 anyhow.


 I did change my development code to do this.

 Hope that helps,

 Walt


 It does, thanks!

 Roy

 ___
 Boost-mpi mailing list
 Boost-mpi@lists.boost.org
 http://lists.boost.org/mailman/listinfo.cgi/boost-mpi



Re: [Boost-mpi] multiple irecv tests failure with MPI_ERR_TRUNCATE

2014-02-21 Thread Walter Woods

Hey,

Wasn't quite sure how to reply to Roy's e-mail since I was unsubscribed
before, but saw it was recent so wanted to chime in.

I have this exact same problem.  What I find interesting is that in my
case, where I call irecv() from a number of other ranks directly, but with
any_tag:

world.irecv(1, mpi::any_tag, data);

I only have issues when there are more processes than those I am listening
to.  I find this to be especially odd.  The issue seems like the second
receive (after the data count) is picking up a separate isend's count.  I'm
still looking into what the issue could be...

Thanks,
Walt