On Wed, Mar 25, 2020 at 4:49 AM Raut, S Biplab <biplab.r...@amd.com> wrote:

>
>
>
>
> Dear George,
>
>                         Thank you for the reply. But my question is
> specifically about the message size on the application side.
>
>
>
> Let’s say the application is running with 128 ranks.
>
> Each rank does a send() to each of the other 127 ranks; the message
> length being sent is what is in question.
>
> Now, after all the sends are completed, each rank will recv() from each
> of the other 127 ranks.
>
> Unless the message length on the sending side is within the eager_limit
> (4K), this program will hang.
>

This is definitely not true; one can imagine many communication patterns
that ensure correctness for your all-to-all communication. As an example,
you can place your processes on a virtual ring and, at each step, send to
and receive from process (my_rank + step) % comm_size. This communication
pattern is always correct, independent of the eager size (as long as you
correctly order the send/recv for each pair).
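To make the ring pattern concrete, here is a small Python sketch (no MPI involved; the function names are mine, not an Open MPI API) that computes, for each step, whom each rank sends to and receives from, and checks that after comm_size - 1 steps every rank has exchanged a message with every other rank exactly once:

```python
def ring_partners(rank, step, comm_size):
    """At a given step, rank sends to (rank + step) % comm_size and
    receives from (rank - step) % comm_size (e.g. via MPI_Sendrecv)."""
    dst = (rank + step) % comm_size
    src = (rank - step) % comm_size
    return dst, src


def full_exchange(comm_size):
    """Return, per rank, the set of peers it sends to over all steps."""
    sends = {r: set() for r in range(comm_size)}
    for step in range(1, comm_size):
        for rank in range(comm_size):
            dst, src = ring_partners(rank, step, comm_size)
            # the matching receive exists: dst's source at this step is rank
            assert ring_partners(dst, step, comm_size)[1] == rank
            sends[rank].add(dst)
    return sends


if __name__ == "__main__":
    P = 128
    sends = full_exchange(P)
    # every rank sends to each of the other P - 1 ranks exactly once
    assert all(sends[r] == set(range(P)) - {r} for r in range(P))
```

Because each send at a given step has a matching receive posted at the same step on the destination rank, the pattern never depends on eager buffering to make progress.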

> So, based on the above scenario, my questions are:
>

>    1. Can each of the ranks send a message of up to 4K size successfully,
>    i.e. all 128 ranks sending (128 * 4K) bytes simultaneously?
>
Potentially yes, but there are physical constraints (number of network
links, switch capabilities, ...) and memory limits. If you have enough
memory, this could work; I'm not saying it is correct or that it should
be done.
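As a rough back-of-the-envelope check (a sketch using the numbers from this thread, a 4 KiB eager limit and 128 ranks, and ignoring per-message headers), the eager storage involved is modest:

```python
# hypothetical numbers from the thread: 128 ranks, 4 KiB eager limit
ranks = 128
eager_limit = 4 * 1024           # bytes sent eagerly per message

# worst case per process: one pending eager message from each peer
per_process = (ranks - 1) * eager_limit   # 520192 bytes, ~508 KiB

# aggregate across all processes in the job
total = ranks * per_process               # 66584576 bytes, ~63.5 MiB
```

So the memory side is rarely the problem at this scale; the network resources and the correctness of the communication pattern are.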


>    2. If the application has a bigger message to be sent by each rank, how
>    should the send message size be derived? Is it equal to the eager_limit,
>    with each rank sending multiple chunks of this size?
>
Definitely not! You should never rely on the eager size to fix a complex
communication pattern. The rule of thumb should be: does my application
work correctly if the MPI library forces a zero-byte eager size? As
suggested above, the most suitable approach is to define a communication
scheme that can never deadlock.
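One practical way to apply that rule of thumb with Open MPI is to shrink the eager limit at run time via MCA parameters and re-run the application (this is a sketch; `./my_app` is a placeholder, and very small values may be rounded up to fit the match header):

```shell
# inspect the current eager limit of the shared-memory (vader) BTL
ompi_info --param btl vader --level 9 | grep eager_limit

# re-run with a small eager limit to flush out deadlocks that only
# "work" because of eager buffering
mpirun -np 128 --mca btl_vader_eager_limit 4096 ./my_app
```

If the application hangs under a reduced eager limit, the communication pattern itself needs fixing.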

  George.


> With Regards,
>
> S. Biplab Raut
>
>
>
> *From:* George Bosilca <bosi...@icl.utk.edu>
> *Sent:* Tuesday, March 24, 2020 9:01 PM
> *To:* Open MPI Users <users@lists.open-mpi.org>
> *Cc:* Raut, S Biplab <biplab.r...@amd.com>
> *Subject:* Re: [OMPI users] Regarding eager limit relationship to send
> message size
>
>
>
>
>
> Biplab,
>
>
>
> The eager limit is a constant for each BTL, and it represents the portion
> of the entire message that is sent eagerly, together with the matching
> information. So, if the question is how much memory is needed to store all
> the eager messages, then the answer depends on the communication pattern of
> your application:
>
> - applications using only blocking messages might have at most 1 pending
> communication per peer, so in the worst case any process will need at most
> P * eager_size memory for local storage of the eager data;
>
> - for applications using non-blocking communications, there is basically
> no limit.
>
>
>
> However, the good news is that you can change this limit to adapt to the
> needs of your application(s).
>
>
>
> Hope this answers your question,
>
> George.
>
>
>
>
>
> On Tue, Mar 24, 2020 at 1:46 AM Raut, S Biplab via users <
> users@lists.open-mpi.org> wrote:
>
> Dear Experts,
>
>                         I would like to derive/calculate the maximum MPI
> send message size possible given the known details of
> btl_vader_eager_limit and the number of ranks.
>
> Can anybody explain and confirm on this?
>
>
>
> With Regards,
>
> S. Biplab Raut
>
>
