On Jul 15, 2007, at 10:05 PM, Isaac Huang wrote:
Hello, I read from the FAQ that current Open MPI releases don't
support end-to-end data reliability. But I still have some confusion
that can't be resolved by googling or reading the FAQ:
1. I read from "MPI - The Complete Reference" that "MPI provides the
user with reliable message transmission. A message sent is always
received correctly, and the user does not need to check for
transmission errors, timeouts, or other error conditions." But the
standard is sort of vague about what exactly this "reliable message
transmission" is. Does it at least require reliable delivery? Or, does
Open MPI notice and re-transmit lost data?
Yes, the MPI standard guarantees that messages are delivered reliably
and in order. MPI implementations have taken this to mean that if the
transport is "reliable", then the MPI doesn't have to do anything
special. So we assume that TCP delivers data into our headers
properly and same for shared memory, Myrinet, and InfiniBand (the RC
protocol, anyway). We also assume that any data sent arrives on the
other side.
We have an experimental point-to-point engine, DR, that provides
reliable transport even over networks that suffer corruption and/or
packet loss. The engine isn't available in a stable release, as it
is still in the experimental phase. It uses checksums and timers to
detect corrupted or lost messages and recover from them. This allows
us to play with non-reliable network protocols such as UDP or
InfiniBand's UD protocol.
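
To make the checksum-and-timer idea concrete, here is a minimal sketch in
Python. It is not the DR code and does not reflect how DR is actually
implemented; the frame layout, the function names, and the use of CRC32
are all assumptions made for illustration. A channel that returns None
stands in for a timer expiring on a lost packet.

```python
import zlib

def make_frame(seq, payload):
    """Prefix the payload with a sequence number and append a CRC32 of both."""
    body = seq.to_bytes(4, "big") + payload
    return body + zlib.crc32(body).to_bytes(4, "big")

def check_frame(frame):
    """Return (seq, payload) if the checksum matches, otherwise None."""
    if len(frame) < 8:
        return None
    body, crc = frame[:-4], int.from_bytes(frame[-4:], "big")
    if zlib.crc32(body) != crc:
        return None
    return int.from_bytes(body[:4], "big"), body[4:]

def send_reliably(payload, channel, seq=0, max_tries=5):
    """Resend until an intact frame gets through.

    `channel` is a hypothetical callable that returns the bytes that
    arrived, or None when the packet was lost (modelling a timeout).
    Returns (payload, number_of_attempts).
    """
    for attempt in range(1, max_tries + 1):
        delivered = channel(make_frame(seq, payload))
        if delivered is None:          # timer expired: packet lost
            continue
        result = check_frame(delivered)
        if result is not None and result[0] == seq:
            return result[1], attempt  # intact frame accepted
    raise RuntimeError("gave up after %d attempts" % max_tries)

def make_flaky_channel():
    """Drop the first frame, corrupt the second, deliver the rest intact."""
    calls = {"n": 0}
    def channel(frame):
        calls["n"] += 1
        if calls["n"] == 1:
            return None
        if calls["n"] == 2:
            corrupted = bytearray(frame)
            corrupted[5] ^= 0x01       # flip one bit in the payload
            return bytes(corrupted)
        return frame
    return channel

data, tries = send_reliably(b"hello", make_flaky_channel())
```

With this toy channel the first attempt is lost, the second fails the
checksum, and the third is accepted.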
In truth, however, the reliability guaranteed by the transports
currently in use by Open MPI is more than enough to meet the needs
of almost all users. Most of the supported networks have some type
of error detection or correction that provides protection only
slightly worse, statistically, than what we could provide within
Open MPI, but at a much lower cost.
2. When data corruption happens (in message data), is the data in
the message envelope still reliable? Or, does Open MPI or the MPI
standard guarantee data integrity of message envelopes? I'm
particularly interested in MPI_TAG which I use to encode things.
In my opinion, any guarantee that applies to the message applies to
the meta-data (tag, source, length) as well. The DR component will
provide the same level of protection to the headers as it does to the
payload.
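
As an illustration of what "same level of protection" can mean, the
sketch below computes one checksum over the envelope fields and the
payload together, so a flipped bit in the tag is caught exactly like a
flipped bit in the data. The wire layout, the names, and the choice of
CRC32 are assumptions for the example, not Open MPI's actual header
format.

```python
import struct
import zlib

# Hypothetical envelope layout: tag, source rank, payload length.
HEADER = struct.Struct("!iiI")

def pack_message(tag, source, payload):
    """Serialize envelope + payload and append a CRC32 covering both."""
    header = HEADER.pack(tag, source, len(payload))
    crc = zlib.crc32(header + payload)
    return header + payload + struct.pack("!I", crc)

def unpack_message(wire):
    """Verify the checksum, then return (tag, source, payload)."""
    if len(wire) < HEADER.size + 4:
        raise ValueError("message too short")
    body = wire[:-4]
    (crc,) = struct.unpack("!I", wire[-4:])
    if zlib.crc32(body) != crc:
        raise ValueError("checksum mismatch")
    tag, source, length = HEADER.unpack(body[:HEADER.size])
    payload = body[HEADER.size:]
    if length != len(payload):
        raise ValueError("length field does not match payload")
    return tag, source, payload

wire = pack_message(tag=42, source=3, payload=b"data")
tag, source, payload = unpack_message(wire)

# Corrupt one bit inside the tag field; the receiver rejects the message.
damaged = bytearray(wire)
damaged[3] ^= 0x04
try:
    unpack_message(bytes(damaged))
    tag_corruption_detected = False
except ValueError:
    tag_corruption_detected = True
```

Because the checksum covers the header bytes too, the receiver never
acts on a damaged tag.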
Brian
--
Brian W. Barrett
Networking Team, CCS-1
Los Alamos National Laboratory