Re: [OMPI users] Fault Tolerance & Behavior

Galen M. Shipman Tue, 31 Oct 2006 10:43:14 -0500

Galen M. Shipman wrote:

Gleb Natapov wrote:
On Mon, Oct 30, 2006 at 11:45:53AM -0700, Troy Telford wrote:
On Sun, 29 Oct 2006 01:34:06 -0700, Gleb Natapov <gl...@voltaire.com> wrote:
If you use the OB1 PML (the default one) it will never recover from a
link down error, no matter how many other transports you have. The
reason is that OB1 never tracks what happens to buffers submitted to the
BTL. So if the BTL can't, for any reason, transmit a packet passed to it
by OB1, the job will get stuck, because OB1 doesn't have enough
information to try to resend the packet via another transport. For this
kind of resource tracking there is the DR PML. In the case of the IB BTL,
a link down event generates an error for each packet submitted for
sending to the device. The IB BTL simply discards all these packets and
relies on the PML to resend them, so even after a link up event a job
will not recover if the OB1 PML is used with the IB BTL. This may be
different with other transports.
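
To illustrate the point about tracking submitted buffers, here is a toy Python sketch. It is not Open MPI code: the `Transport` class and the `send_untracked` / `send_tracked` helpers are hypothetical names invented for this example, and the failure is modelled as a synchronous return value, whereas a real BTL reports errors asynchronously. It only shows why a layer that remembers what it handed down can retry on another transport, while one that forgets cannot.

```python
class Transport:
    """Hypothetical stand-in for a transport; not an Open MPI type."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.delivered = []

    def send(self, packet):
        # A transport whose link is down discards the packet.
        if not self.up:
            return False
        self.delivered.append(packet)
        return True


def send_untracked(transports, packet):
    # Hand the packet to the first transport and forget about it.
    # If that transport discards it, nobody knows it must be resent.
    transports[0].send(packet)


def send_tracked(transports, packet):
    # Keep hold of the packet until some transport accepts it,
    # falling back to the next transport on failure.
    for transport in transports:
        if transport.send(packet):
            return transport.name
    return None


ib = Transport("ib", up=False)   # link is down
tcp = Transport("tcp")

send_untracked([ib, tcp], "msg-1")       # silently lost
used = send_tracked([ib, tcp], "msg-2")  # retried over tcp

print(tcp.delivered, used)
```

In this sketch the untracked message is lost for good even though a working transport was available, which mirrors the behaviour described above for OB1.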
This makes sense; one thing I'm wondering now is whether the OB1 PML is able (and/or has enough information) to figure out that it can't continue at all, and will abort the job.
In the case of the openib BTL I don't see how a job could recover from a
link down event, so I think aborting the job is the right thing to do.
In the case of other transports, if the transport can continue after a
link up event as if nothing happened, it is worth waiting for link up.
This capability could be added to the openib BTL too; it's just that
nobody cares enough.
Ethernet doesn't fail in this case because the TCP stack handles this gracefully. The same behavior can be observed when disconnecting an Ethernet cable while an ssh session exists: plug it back in and you are probably good to go, after a bit of time (due to exponential backoff on retrans). For GM/MX over Myrinet the timeout is quite high on connection down and the software stack handles this gracefully. For IB the link state transitions from LinkActive to LinkActDefer until LinkDownTimeout expires and the link transitions to the LinkDown state.

From the spec: LinkDownTimeout occurs when the port state machine has continuously been in the LinkActDefer state for 10 ms +3%/-51%. I have no idea what that formula means; perhaps my PDF of the spec is messed up.

Okay, so these are percentages, not modulus; the formula makes some sense now. So the timeout is between 4.9 and 10.3 ms; you had better plug the cable in/out very quickly ;-)
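
As a quick check of that arithmetic, here is a small Python sketch. The function name `tolerance_bounds` and the constant names are made up for this example; the only inputs taken from the thread are the 10 ms nominal value and the +3% / -51% tolerance.

```python
# Nominal LinkDownTimeout and its asymmetric tolerance, as quoted above.
NOMINAL_MS = 10.0
PLUS_PCT = 3.0
MINUS_PCT = 51.0


def tolerance_bounds(nominal, plus_pct, minus_pct):
    """Return (low, high) for a nominal value with +plus_pct% / -minus_pct%."""
    low = nominal * (1 - minus_pct / 100)
    high = nominal * (1 + plus_pct / 100)
    return low, high


low_ms, high_ms = tolerance_bounds(NOMINAL_MS, PLUS_PCT, MINUS_PCT)
print(f"LinkDownTimeout window: {low_ms:.1f} ms to {high_ms:.1f} ms")
```

Reading the tolerance as percentages of the nominal 10 ms gives the 4.9 ms to 10.3 ms window.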

So transitioning to the LinkDown state is dictated by the IB spec. It would seem that we would want to defer the transition based on a user-configurable parameter; this is link layer, so it would probably be necessary to do this when loading the IB driver. Am I interpreting this correctly?
- Galen
--
                        Gleb.
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users