Hi All

Joe Landman wrote:
Ralph Castain wrote:

Ummm....not to put gasoline on the fire, but...if the data exchange is blocking, why do you need to call a barrier op first? Just use an appropriate blocking data exchange call (collective or whatever) and it will "barrier" anyway.
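To make the point above concrete, here is a minimal sketch in Python that uses threads in place of real MPI ranks. That substitution is an assumption made purely for illustration, and `blocking_exchange`, `rank_main` and the event log are hypothetical names, not taken from any of the codes discussed. Each "rank" sends its value to all the others and then blocks until it has heard from everyone, roughly what an all-to-all style collective does. The log shows that no rank gets past the exchange before every rank has entered it, although no barrier is ever called. This only holds when every rank needs data from every other one; a one-way operation such as a broadcast need not synchronize the participants.

```python
import queue
import threading


def blocking_exchange(n_ranks):
    """Toy all-to-all exchange between threads standing in for MPI ranks."""
    inboxes = [queue.Queue() for _ in range(n_ranks)]  # one mailbox per rank
    events = []                                        # shared log of (kind, rank)
    log_lock = threading.Lock()
    results = [None] * n_ranks

    def rank_main(rank):
        my_value = rank * 10                           # made-up payload
        with log_lock:
            events.append(("send", rank))              # rank enters the exchange
        for other in range(n_ranks):
            if other != rank:
                inboxes[other].put((rank, my_value))
        received = {rank: my_value}
        while len(received) < n_ranks:
            src, value = inboxes[rank].get()           # blocks until a message arrives
            received[src] = value
        with log_lock:
            events.append(("done", rank))              # rank leaves the exchange
        results[rank] = sum(received.values())

    threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(n_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events, results


if __name__ == "__main__":
    events, results = blocking_exchange(4)
    print(events)
    print(results)
```

Every "send" entry appears in the log before the first "done" entry, which is the behavior one would otherwise ask a barrier to enforce.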

Since I don't run these codes, I would have to defer to those that do.

This said, I am not sure if they are running the coupling as two separate MPI codes or as one code sharing a communications handle or whatnot else.

In coupled climate models, both paradigms are used:
MPMD and SPMD.

MPMD tends to appear in models that have "components" developed by
different organizations (ocean comes from one national lab,
the atmosphere from some other place, etc.).
Short of writing all the code from scratch,
one writes a model "coupler" and modifies the "components"
to talk to each other using the "coupler" as kind of a translator
or middle man.

SPMD codes generally come from a single organization,
and are in many cases the children of a master coding plan
or "framework".
This also includes redesign and adaptation of previous codes,
as nobody writes everything from scratch.
Disguised within those codes there is always the
"coupler" middle man anyway.

So, the choice of MPMD or SPMD
doesn't seem to be made based on software engineering alone,
but maybe on a tad of politics,
a bit of convenience and cost savings
(who wants or can afford to rewrite all that code from scratch?),
etc.

It may be just coincidence or my bias in favor of clusters,
but I found the MPMD codes better structured,
less monolithic, and they certainly produce executables
of smaller size than the SPMD codes, which is a good thing if
you are running on a cluster with limited RAM per core.
They also tend to run faster, and tend to
observe "Jeff's rule" at least in part.

The current global climate model
schemes have ocean, atmosphere, sea ice, land processes,
and a mass/energy/momentum flux coupler.
These may be 5 separate components, or some parts may be
merged with another
(e.g. sea ice being a "module" in the ocean code, or land processes
integrated with the atmosphere).
Newer schemes may include the biosphere,
multi-component atmosphere (with stratospheric processes,
atmospheric chemistry processes, cloud convection, etc.),
solid earth processes (volcanic eruptions,
carbon sequestration, etc.), and the list goes on and on.

The flux coupler is a natural barrier (bottleneck?),
as it coordinates the data exchanges across all model components.
It is present in both MPMD and, in a somewhat disguised way,
in SPMD schemes.

Every single component is actually a domain decomposition code.
This entails the type of natural synchronization during exchanges of
data across the sub-domain boundaries,
as somebody else already mentioned.
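A small sketch of why the sub-domain boundary exchange acts as a synchronization point, under heavy simplifications: a 1-D periodic field, a three-point smoothing stencil, and a plain sequential loop instead of MPI. All function names here (`split_with_halos`, `exchange_halos`, `smooth_step`, and so on) are hypothetical and not from any real model. Each sub-domain carries one ghost ("halo") cell per side, and those cells must be refreshed from the neighbors before every step, so no sub-domain can run ahead of the ones next to it.

```python
def split_with_halos(field, n_parts):
    """Cut a 1-D field into equal pieces, each padded with one halo cell per side.

    Assumes len(field) is divisible by n_parts.
    """
    size = len(field) // n_parts
    return [[0.0] + field[p * size:(p + 1) * size] + [0.0] for p in range(n_parts)]


def exchange_halos(parts):
    """Fill each piece's halo cells from its neighbors' edge cells (periodic)."""
    n = len(parts)
    for p in range(n):
        parts[p][0] = parts[(p - 1) % n][-2]   # left neighbor's last interior cell
        parts[p][-1] = parts[(p + 1) % n][1]   # right neighbor's first interior cell


def smooth_step(part):
    """Three-point average over the interior; halo cells are left for the next exchange."""
    interior = [(part[i - 1] + part[i] + part[i + 1]) / 3.0
                for i in range(1, len(part) - 1)]
    return [part[0]] + interior + [part[-1]]


def run_decomposed(field, n_parts, n_steps):
    parts = split_with_halos(field, n_parts)
    for _ in range(n_steps):
        exchange_halos(parts)                  # the natural synchronization point
        parts = [smooth_step(p) for p in parts]
    return [x for p in parts for x in p[1:-1]]


def run_global(field, n_steps):
    """Reference: the same stencil applied to the undivided periodic field."""
    n = len(field)
    for _ in range(n_steps):
        field = [(field[i - 1] + field[i] + field[(i + 1) % n]) / 3.0
                 for i in range(n)]
    return field


if __name__ == "__main__":
    field = [float((i * i) % 7) for i in range(12)]
    print(run_decomposed(field, 3, 4))
    print(run_global(field, 4))
```

The decomposed run reproduces the global one only because the exchange happens before every step; in a real MPI code that exchange is where neighboring ranks end up waiting for each other.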

However, even within a single component, say the atmosphere,
there are various physical processes with different
time and spatial scales (as noted by Gerry and Joe).
Those processes require some type of
synchronization to exchange information
with other physical processes.
Moreover, these processes tend to be modeled in a sequential,
rather than parallel way.
This is the way we think about them, not only the way we code them.
E.g. first the atmospheric radiation balance "module"
does its thing, then the atmospheric thermodynamics "module"
takes over, then the actual dynamics "module"
calculates the winds, advects moisture, etc.
This is a somewhat sequential view of how natural processes occur
that has plenty of natural barriers/synchronization points built in.

As for Jeff's rule,
my observation is that there has been a significant movement
away from blocking send/recv
toward non-blocking calls,
and toward reducing the number of unnecessary barriers
(although they were not completely weeded out, of course).
However, this is on a somewhat small scale of coding.
This "get rid of barriers" action
tends to be restricted to each "module"
that represents a specific physical process.
This is Jeff's rule put to work.
It is very beneficial,
and improves code efficiency a lot.
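As a rough illustration of what that change looks like inside one "module", here is a sketch in Python where a thread pool stands in for the message transfer. This is an assumption for illustration only; in an MPI code the same pattern would typically be written with MPI_Isend/MPI_Irecv followed by MPI_Wait. The functions `fetch_halo`, `update_blocking` and `update_overlapped` are hypothetical names. The blocking version waits for the neighbor's data before doing anything, while the overlapped version posts the transfer, computes the cells that do not depend on it, and waits only when the edge cell needs the data.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_halo(neighbor_edge, delay=0.01):
    """Stand-in for a message transfer: returns the neighbor's edge value after a delay."""
    time.sleep(delay)
    return neighbor_edge


def update_blocking(interior, neighbor_edge):
    """Blocking style: wait for the halo first, then do all the work."""
    halo = fetch_halo(neighbor_edge)                   # nothing else happens meanwhile
    inner = [2.0 * x for x in interior[1:]]            # work that never needed the halo
    edge = interior[0] + halo                          # work that does need it
    return [edge] + inner


def update_overlapped(interior, neighbor_edge, pool):
    """Non-blocking style: post the transfer, compute, then wait only where needed."""
    pending = pool.submit(fetch_halo, neighbor_edge)   # comparable to posting a receive
    inner = [2.0 * x for x in interior[1:]]            # overlaps with the transfer
    halo = pending.result()                            # comparable to a wait call
    edge = interior[0] + halo
    return [edge] + inner


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as pool:
        print(update_overlapped([1.0, 2.0, 3.0], 5.0, pool))
    print(update_blocking([1.0, 2.0, 3.0], 5.0))
```

Both versions give the same answer; the only difference is where the waiting happens, which is the whole point of the exercise.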

However, as Durga noticed, Jeff's rule is perhaps hard to
apply at the large scale.
As I mentioned, one reason is that the way we think
about interacting physical processes is (intrinsically?)
somewhat sequential, and has (natural or conceptual?) barriers
embedded in it.
Another reason may be the way these codes are developed,
say, whether there should be a code architect wizard who designs
a master plan, or some form of integration and adaptation of
well proven existing code, or something else.

My two cents,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

I do agree that letting the data exchange provide a (natural) barrier makes a great deal of sense, though the codes may not be amenable to this mode of operations. Gerry could likely shed light on this.




