On Jul 15, 2007, at 10:05 PM, Isaac Huang wrote:

Hello, I read in the FAQ that current Open MPI releases don't
support end-to-end data reliability. But I still have some confusion
that I couldn't resolve by googling or reading the FAQ:

1. I read from "MPI - The Complete Reference" that "MPI provides the
user with reliable message transmission. A message sent is always
received correctly, and the user does not need to check for
transmission errors, timeouts, or other error conditions." But the
standard is sort of vague about what exactly this "reliable message
transmission" is. Does it at least require reliable delivery? Or, does
Open MPI notice and re-transmit lost data?

Yes, the MPI standard guarantees that messages are delivered reliably and in order. MPI implementations have taken this to mean that if the transport is "reliable", then the MPI layer doesn't have to do anything special. So we assume that TCP delivers data into our headers properly, and the same for shared memory, Myrinet, and InfiniBand (the RC protocol, anyway). We also assume that any data sent arrives on the other side.

We have an experimental point-to-point engine, DR, that provides reliable transport even over networks that suffer corruption and/or packet loss. The engine isn't available in a stable release, as it is still in the experimental phase. It uses checksums and timers to detect message corruption or loss and to recover from it. This allows us to play with non-reliable network protocols such as UDP or InfiniBand's UD protocol.
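
To make the checksum-and-retransmit idea concrete, here is a minimal Python sketch of the general technique. It is not DR's actual code or wire format: the frame layout, the simulated lossy channel, and every name in it (frame, unframe, lossy_channel, send_reliably) are assumptions made up for illustration. A dropped frame stands in for a timer expiring, and a checksum mismatch stands in for a negative acknowledgement.

```python
import random
import zlib


def frame(seq: int, payload: bytes) -> bytes:
    """Prefix the payload with a sequence number and a CRC32 over both."""
    body = seq.to_bytes(4, "big") + payload
    return zlib.crc32(body).to_bytes(4, "big") + body


def unframe(data: bytes):
    """Return (seq, payload), or None if the frame is short or the CRC fails."""
    if len(data) < 8:
        return None
    crc, body = int.from_bytes(data[:4], "big"), data[4:]
    if zlib.crc32(body) != crc:
        return None
    return int.from_bytes(body[:4], "big"), body[4:]


def lossy_channel(data: bytes, rng: random.Random, p_drop=0.2, p_flip=0.2):
    """Simulated unreliable link: may drop the frame or flip a single bit."""
    r = rng.random()
    if r < p_drop:
        return None
    if r < p_drop + p_flip:
        corrupted = bytearray(data)
        corrupted[rng.randrange(len(data))] ^= 1 << rng.randrange(8)
        return bytes(corrupted)
    return data


def send_reliably(seq: int, payload: bytes, rng: random.Random, max_tries=50):
    """Resend until a frame arrives intact; returns (payload, attempts)."""
    for attempt in range(1, max_tries + 1):
        received = lossy_channel(frame(seq, payload), rng)
        if received is None:
            continue  # frame lost: stands in for a retransmit timer firing
        decoded = unframe(received)
        if decoded is not None and decoded[0] == seq:
            return decoded[1], attempt
        # checksum mismatch: discard the frame and retransmit
    raise RuntimeError("gave up after max_tries attempts")
```

Calling send_reliably(7, b"hello", random.Random(0)) returns the original payload together with the number of attempts it took. A real engine would also need acknowledgements, duplicate suppression, and ordering, which this sketch leaves out.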

In truth, however, the reliability guaranteed by the transports currently in use by Open MPI is more than enough to meet the needs of almost all users. Most of the supported networks have some type of error detection or correction that provides protection statistically only slightly worse than what we could provide within Open MPI, but at a much lower cost.

2. When data corruption happens (in message data), is the data in
the message envelope still reliable? Or, does Open MPI or the MPI
standard guarantee data integrity of message envelopes? I'm
particularly interested in MPI_TAG which I use to encode things.

In my opinion, any guarantee that applies to the message applies to the meta-data (tag, source, length) as well. The DR component will provide the same level of protection to the headers as it does to the payload.
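
One way to picture equal protection for headers and payload is a single checksum computed over both. The Python sketch below assumes a hypothetical envelope layout (tag, source rank, payload length); it is not Open MPI's header format, and pack_message and unpack_message are illustrative names only.

```python
import struct
import zlib

# Hypothetical envelope: tag, source rank, payload length (network byte order).
HEADER = struct.Struct("!iiI")


def pack_message(tag: int, source: int, payload: bytes) -> bytes:
    """Prepend a CRC32 that covers the envelope fields and the payload."""
    header = HEADER.pack(tag, source, len(payload))
    crc = zlib.crc32(header + payload)
    return struct.pack("!I", crc) + header + payload


def unpack_message(data: bytes):
    """Return (tag, source, payload); raise ValueError on any corruption."""
    if len(data) < 4 + HEADER.size:
        raise ValueError("message too short")
    (crc,) = struct.unpack_from("!I", data, 0)
    rest = data[4:]
    if zlib.crc32(rest) != crc:
        raise ValueError("checksum mismatch")
    tag, source, length = HEADER.unpack_from(rest, 0)
    payload = rest[HEADER.size:]
    if len(payload) != length:
        raise ValueError("length field does not match payload")
    return tag, source, payload
```

Because the tag sits inside the checksummed region, a flipped bit in the tag is rejected in exactly the same way as a flipped bit in the payload.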

Brian


--
  Brian W. Barrett
  Networking Team, CCS-1
  Los Alamos National Laboratory

