To keep this out of the weeds, I have attached a program called "bug3"
that illustrates this problem on Open MPI 1.2.5 using the openib BTL. In
bug3, the process with rank 0 uses up all available memory buffering
"unexpected" messages from its neighbors.

Bug3 is a test case derived from a real, scalable application (Desmond,
for molecular dynamics) that several experienced MPI developers have
worked on. Note that the MPI_Send calls on the processes with rank > 0
are *blocking*; Open MPI silently completes them in the background, and
the resulting traffic overwhelms process 0 because there is no flow
control.

It may not be hard to change Desmond to work around Open MPI's
small-message semantics, but a programmer should reasonably be able to
assume that a blocking send will block if the receiver cannot yet handle
it.
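
For reference, here is a minimal sketch of the kind of N-to-1 pattern
bug3 exercises (this is NOT the attached bug3.c; the message size,
iteration count, and artificial delay are made-up illustration values):

    /*
     * Every rank > 0 issues blocking MPI_Send calls of small
     * (eager-sized) messages to rank 0 while rank 0 is not yet
     * receiving.  Without sender-side flow control the sends complete
     * immediately and rank 0 must buffer every unexpected message.
     */
    #include <mpi.h>
    #include <string.h>
    #include <unistd.h>

    #define MSG_SIZE 1024     /* small enough to use the eager protocol */
    #define NUM_MSGS 100000   /* enough traffic to pile up on rank 0 */

    int main(int argc, char **argv)
    {
        int rank, size, i;
        char buf[MSG_SIZE];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        memset(buf, 0, sizeof buf);

        if (rank == 0) {
            /* Fall behind on purpose: incoming eager messages queue up
             * as "unexpected" messages and consume memory on this rank. */
            sleep(60);
            for (i = 0; i < NUM_MSGS * (size - 1); i++)
                MPI_Recv(buf, MSG_SIZE, MPI_CHAR, MPI_ANY_SOURCE, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else {
            /* These "blocking" sends return as soon as the data has been
             * handed to the eager protocol, ready receiver or not. */
            for (i = 0; i < NUM_MSGS; i++)
                MPI_Send(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }

One workaround along the lines mentioned above would be to switch the
senders to MPI_Ssend, which cannot complete until the matching receive
has started, at the cost of extra synchronization per message.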

Federico

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Brightwell, Ronald
Sent: Monday, February 04, 2008 3:30 PM
To: Patrick Geoffray
Cc: Open MPI Users
Subject: Re: [OMPI users] openmpi credits for eager messages

> > I'm looking at a network where the number of endpoints is large
> > enough that everybody can't have a credit to start with, and the
> > "offender" isn't any single process, but rather a combination of
> > processes doing N-to-1 where N is sufficiently large.  I can't just
> > tell one process to slow down.  I have to tell them all to slow down
> > and do it quickly...
> 
> When you have N->1 patterns, then the hardware flow-control will
> throttle the senders, or drop packets if there is no hardware
> flow-control. If you don't have HOL blocking but the receiver does not
> consume for any reason (busy, sleeping, dead, whatever), then you can
> still drop packets on the receiver (NIC, driver, thread) as a last
> resort; this is what TCP does. The key is to have exponential backoff
> (or a reasonably large resend timeout) so the senders do not keep
> hammering the receiver.
> 
> It costs nothing in the common case (unlike the credits approach), but
> it does handle corner cases without affecting other nodes too much
> (unlike hardware flow-control).
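
(For illustration only, a minimal sketch of the exponential-backoff
resend idea described above, with a made-up try_send() standing in for a
NIC/driver-level send that can fail when the receiver drops a packet:)

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Stand-in for a low-level send that can fail when the receiver
     * drops the packet; here it fails randomly to exercise retries. */
    static int try_send(const void *buf, int len)
    {
        (void)buf; (void)len;
        return (rand() % 4 == 0) ? 0 : -1;
    }

    static void send_with_backoff(const void *buf, int len)
    {
        unsigned delay_us = 100;                /* initial resend delay */
        const unsigned max_delay_us = 1u << 20; /* cap near one second */

        while (try_send(buf, len) != 0) {
            usleep(delay_us);      /* back off instead of hammering */
            if (delay_us < max_delay_us)
                delay_us *= 2;     /* exponential growth between resends */
        }
    }

    int main(void)
    {
        char msg[64] = "hello";
        send_with_backoff(msg, sizeof msg);
        printf("delivered\n");
        return 0;
    }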

Right.  For a sufficiently large number of endpoints, flow control has
to get pushed out of MPI and down into the network, which is why I don't
necessarily want an MPI that does flow control at the user level.

> 
> But you know all that. You are just being mean to your users because
> you can :-) The sick part is that I think I envy you...

You know it :)

-Ron



Attachment: bug3.c
Description: bug3.c
