Thank you, Jeff and George, for your enlightening comments and inputs!
I will design my application algorithm appropriately.

With Regards,
S. Biplab Raut

From: George Bosilca <bosi...@icl.utk.edu>
Sent: Thursday, March 26, 2020 8:00 PM
To: Jeff Squyres (jsquyres) <jsquy...@cisco.com>
Cc: Raut, S Biplab <biplab.r...@amd.com>; Open MPI User's List 
<users@lists.open-mpi.org>
Subject: Re: [OMPI users] Regarding eager limit relationship to send message 
size

An application that relies on MPI eager buffers for correctness or performance is 
an incorrect application, if only because MPI implementations without eager 
support are perfectly legitimate. Moreover, such applications also miss the point 
on performance. The overheads are not only the memory allocations MPI makes to 
store the eager data, or the additional memcpy needed to put that data back into 
userland once the corresponding receive request is posted, but also the stress on 
the unexpected-message path in the MPI library, which can create long chains of 
unexpected messages that must be traversed in order to guarantee the FIFO 
matching required by MPI.

Along the same lines as Jeff: if you want a portable and efficient MPI application, 
assume the eager limit is always 0 and prepost all your receives.
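
For illustration only, here is a minimal sketch of that pattern in C; the 
neighbor list, message count, and tag below are hypothetical placeholders, not 
anything mandated by MPI or Open MPI:

#include <mpi.h>
#include <stdlib.h>

/* Sketch: prepost every receive before issuing any send, so that
 * correctness never depends on eager buffering.  The neighbor list
 * 'nbrs', its length 'nnbrs', and 'count' are hypothetical. */
void exchange(double *sendbuf, double *recvbuf, int count,
              const int *nbrs, int nnbrs, MPI_Comm comm)
{
    MPI_Request *reqs = malloc(2 * nnbrs * sizeof(MPI_Request));

    /* Post all receives first... */
    for (int i = 0; i < nnbrs; i++)
        MPI_Irecv(recvbuf + (size_t)i * count, count, MPI_DOUBLE,
                  nbrs[i], 0, comm, &reqs[i]);

    /* ...then issue the matching sends... */
    for (int i = 0; i < nnbrs; i++)
        MPI_Isend(sendbuf + (size_t)i * count, count, MPI_DOUBLE,
                  nbrs[i], 0, comm, &reqs[nnbrs + i]);

    /* ...and wait for everything to complete. */
    MPI_Waitall(2 * nnbrs, reqs, MPI_STATUSES_IGNORE);
    free(reqs);
}

Because every receive is already posted, incoming messages match immediately 
instead of landing in the unexpected-message queue, regardless of the 
transport's eager limit.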

  George.

PS: In OMPI the eager size is determined by the underlying transport (the BTL) 
and can be changed via MCA parameters. 'ompi_info --param btl all -l 4 | grep eager' 
should give you the full list.

On Thu, Mar 26, 2020 at 10:00 AM Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
On Mar 26, 2020, at 5:36 AM, Raut, S Biplab <biplab.r...@amd.com> wrote:
>
> I am doing pairwise send-recv and not all-to-all, since not all of the data is 
> required by all the ranks.
> And I am using blocking send and recv calls, since there are multiple 
> iterations of such message chunks to be sent with synchronization.
>
> I understand your recommendation in the mail below; however, I still see a 
> benefit for my application-level algorithm in doing pairwise send-recv chunks 
> where each chunk is within the eager limit.
> Since the input and output buffers are the same within the process, I can avoid 
> certain buffering at each sender rank by doing successive send calls within the 
> eager limit to the receiver ranks and only then posting the recv calls.

But if the buffers are small enough to fall within the eager limit, there's 
very little benefit to not having an A/B buffering scheme.  Sure, it's 2x the 
memory, but it's 2 times a small number (measured in KB).  Assuming you have GB 
of RAM, it's hard to believe that this would make a meaningful difference.  
Indeed, one way to think of the eager limit is: "it's small enough that the 
cost of a memcpy doesn't matter."
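
As a rough illustration of what such an A/B scheme can look like in C (the 
chunk size, peer rank, and process_chunk() callback here are hypothetical, not 
part of your application):

#include <mpi.h>

#define CHUNK 4096   /* hypothetical chunk size, well under typical eager limits */

/* Sketch of A/B (double) buffering on the receive side: the next chunk is
 * received into one buffer while the chunk that already arrived is processed
 * from the other. */
void recv_chunks(int peer, int nchunks, MPI_Comm comm,
                 void (*process_chunk)(const char *, int))
{
    char bufA[CHUNK], bufB[CHUNK];
    char *cur = bufA, *next = bufB;
    MPI_Request req;

    MPI_Irecv(cur, CHUNK, MPI_CHAR, peer, 0, comm, &req);
    for (int i = 0; i < nchunks; i++) {
        MPI_Wait(&req, MPI_STATUS_IGNORE);        /* chunk i is now in 'cur' */
        if (i + 1 < nchunks)                      /* prepost the receive for chunk i + 1 */
            MPI_Irecv(next, CHUNK, MPI_CHAR, peer, 0, comm, &req);
        process_chunk(cur, CHUNK);                /* overlap work with the transfer */
        char *tmp = cur; cur = next; next = tmp;  /* swap the A/B buffers */
    }
}

The second buffer costs only a few KB, which is exactly the "2 times a small 
number" trade-off described above.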

I'm not sure I understand your comments about preventing copying.  MPI will 
always do the most efficient thing to send the message, regardless of whether 
it is under the eager limit or not.  I also don't quite grok your comments 
about "application buffering" and message buffering required by the eager 
protocol.

The short version of this is: you shouldn't worry about any of this.  Rely on 
the underlying MPI to do the most efficient thing possible, and you should use 
a communication algorithm that makes sense for your application.  In most 
cases, you'll be good.

If you start trying to tune for a specific environment, platform, and MPI 
implementation, the number of variables grows exponentially.  And if you change 
any one parameter in the whole setup, your optimizations may get lost.  Also, 
if you add a bunch of infrastructure in your app to try to exactly match your 
environment+platform+implementation (e.g., manual segmenting to fit your 
overall message into the eager limit), you may just be adding additional 
overhead that effectively nullifies any optimization you might get (especially 
if the optimization is very small).  Indeed, the methods used for shared memory 
are similar to, but different from, the methods used for networks.  And there's a 
wide variety of network capabilities; some can be more efficient than others 
(depending on a zillion factors).

If you're using shared memory, ensure that your Linux kernel has good shared 
memory support (e.g., support for CMA), and let MPI optimize the message 
transfers for you.

--
Jeff Squyres
jsquy...@cisco.com
