Re: [OMPI users] end-to-end data reliability

Brian Barrett Mon, 16 Jul 2007 10:43:50 -0400

On Jul 15, 2007, at 10:05 PM, Isaac Huang wrote:

Hello, I read from the FAQ that current Open MPI releases don't
support end-to-end data reliability. But I still have some confusion
that can't be resolved by googling or reading the FAQ:


1. I read from "MPI - The Complete Reference" that "MPI provides the
user with reliable message transmission. A message sent is always
received correctly, and the user does not need to check for
transmission errors, timeouts, or other error conditions." But the
standard is sort of vague about what exactly this "reliable message
transmission" is. Does it at least require reliable delivery? Or, does
Open MPI notice and re-transmit lost data?

Yes, the MPI standard guarantees that a message is reliably delivered,
in order. MPI implementations have taken this to mean that if the
transport is "reliable", then the MPI doesn't have to do anything
special. So we assume that TCP delivers data into our headers
properly, and the same for shared memory, Myrinet, and InfiniBand (the
RC protocol, anyway). We also assume that any data sent arrives on the
other side.

We have an experimental point-to-point engine, DR, that provides
reliable transportation even for networks that have corruption and/or
packet loss. The engine isn't available in a stable release, as it
is still in the experimental phase. Checksums and timers are used to
detect message corruption and recover. This allows us to play with
non-reliable network protocols such as UDP or InfiniBand's UD protocol.
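
To make the checksum-and-timer idea concrete, here is a minimal
sketch in Python. It is not the DR engine's code and does not reflect
its wire format; FlakyChannel, frame, unframe, and send_reliably are
hypothetical names invented for illustration, and a dropped frame
stands in for a sender-side timer expiring.

```python
import zlib


def frame(payload: bytes) -> bytes:
    """Append a CRC-32 of the payload as a 4-byte big-endian trailer."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def unframe(data: bytes):
    """Return the payload if the trailer matches it, otherwise None."""
    if len(data) < 4:
        return None
    payload, trailer = data[:-4], data[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != trailer:
        return None
    return payload


class FlakyChannel:
    """Hypothetical unreliable link that damages or drops the first frames."""

    def __init__(self, faults):
        self.faults = list(faults)  # e.g. ["drop", "corrupt"]

    def transfer(self, data: bytes):
        fault = self.faults.pop(0) if self.faults else None
        if fault == "drop":
            return None  # nothing arrives; a real sender's timer would expire
        if fault == "corrupt":
            return bytes([data[0] ^ 0x01]) + data[1:]  # flip one bit
        return data


def send_reliably(channel, payload: bytes, max_attempts: int = 5):
    """Resend until a frame arrives with a valid checksum."""
    for attempt in range(1, max_attempts + 1):
        received = channel.transfer(frame(payload))
        if received is None:
            continue  # treated as a timeout: retransmit
        result = unframe(received)
        if result is not None:
            return result, attempt
        # checksum mismatch: fall through and retransmit
    raise RuntimeError("gave up after %d attempts" % max_attempts)


if __name__ == "__main__":
    channel = FlakyChannel(["drop", "corrupt"])
    data, attempts = send_reliably(channel, b"hello")
    print(data, attempts)
```

With one dropped frame and one corrupted frame queued up, the third
attempt is the first to pass the checksum. A real engine would also
need sequence numbers and acknowledgements, which this sketch omits.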

In truth, however, the reliability guaranteed by the transports
currently in use by Open MPI is more than enough to meet the needs
of almost all users. Most of the supported networks have some type
of error detection or correction that provides protection only
slightly statistically worse than what we could provide within Open
MPI, but at a much lower cost.

2. When a data corruption happens (in message data), is the data in
the message envelope still reliable? Or, does Open MPI or the MPI
standard guarantee data integrity of message envelopes? I'm
particularly interested in MPI_TAG which I use to encode things.

In my opinion, any guarantee that applies to the message applies to
the meta-data (tag, source, length) as well. The DR component will
provide the same level of protection to the headers as it does to the
payload.
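
One way to picture "same protection for headers and payload" is a
checksum computed over both together. The sketch below is only an
illustration in Python: the header layout (tag, source rank, length)
and the function names are assumptions, not Open MPI's actual message
format.

```python
import struct
import zlib

# Hypothetical header layout: tag, source rank, payload length.
HEADER = struct.Struct("!iiI")


def pack_message(tag: int, source: int, payload: bytes) -> bytes:
    """Build header + payload, then append a CRC-32 covering both."""
    body = HEADER.pack(tag, source, len(payload)) + payload
    return body + struct.pack("!I", zlib.crc32(body))


def unpack_message(data: bytes):
    """Verify the checksum before trusting the tag, source, or payload."""
    if len(data) < HEADER.size + 4:
        raise ValueError("truncated message")
    body = data[:-4]
    (crc,) = struct.unpack("!I", data[-4:])
    if zlib.crc32(body) != crc:
        raise ValueError("checksum mismatch")
    tag, source, length = HEADER.unpack_from(body)
    payload = body[HEADER.size:]
    if len(payload) != length:
        raise ValueError("length mismatch")
    return tag, source, payload


if __name__ == "__main__":
    wire = pack_message(42, 3, b"data")
    print(unpack_message(wire))

    # Flip one bit in the low byte of the tag: the checksum catches it.
    damaged = wire[:3] + bytes([wire[3] ^ 0x01]) + wire[4:]
    try:
        unpack_message(damaged)
    except ValueError as err:
        print("rejected:", err)
```

Because the tag sits inside the checksummed region, a bit flip there
is rejected exactly like a bit flip in the payload, so information
encoded in the tag is no less protected than the data.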


Brian


--
  Brian W. Barrett
  Networking Team, CCS-1
  Los Alamos National Laboratory
