Re: [OMPI users] Subcommunicator communications do not complete intermittently

2022-09-12 Thread Niranda Perera via users
Hi Joachim and George,

Thank you for your response. I tried MPI_Issend instead of the MPI_Isend
calls (I only have isends). At smaller parallelism it still works without
any deadlocks, but the deadlocks are still there at larger parallelism.
One thing I forgot to mention: if I take the bcast/gather off comm_world
and just pass comm_world to my foo function, it works perfectly fine at
any parallelism (tested up to 720).
Let me try MUST/TotalView and see.

I am working on a distributed-memory dataframe system, Cylon [1]. As part
of an integration, we are using Parsl [2] to schedule Cylon applications.
Parsl provides Cylon a subcommunicator with the requested parallelism, while
it uses comm_world to bcast tasks to and gather outputs from each Cylon
worker. Cylon abstracts out the communication implementations and currently
supports MPI, Gloo, and UCX. I've tested the same scenario on Gloo (Gloo can
spawn a communication network using an MPI communicator) and haven't had any
problems with it (apart from it being slower than MPI).

Best

[1] https://github.com/cylondata/cylon
[2] https://github.com/Parsl/parsl

On Sun, Sep 11, 2022 at 6:39 PM Protze, Joachim via users <
users@lists.open-mpi.org> wrote:

> Hi,
>
> A source of sudden deadlocks at larger scale can be a change of send
> behavior from buffered to synchronous mode. You can check whether your
> application also deadlocks at smaller scale if you replace all sends by
> ssends (e.g., add `#define MPI_Send MPI_Ssend` and `#define MPI_Isend
> MPI_Issend` after the include of the MPI header).
> An application with a correct communication pattern should run with
> synchronous sends without deadlock.
> To check for other deadlock patterns in your application you can use tools
> like MUST [1] or TotalView.
>
> Best
> Joachim
>
>
> [1] https://itc.rwth-aachen.de/must/
> --
> *From:* users on behalf of George Bosilca via users
> *Sent:* Sunday, September 11, 2022 10:40:42 PM
> *To:* Open MPI Users
> *Cc:* George Bosilca
> *Subject:* Re: [OMPI users] Subcommunicator communications do not
> complete intermittently
>
> Assuming a correct implementation, the described communication pattern
> should work seamlessly.
>
> Would it be possible to either share a reproducer or provide the execution
> stack, by attaching a debugger to the deadlocked application, to see the
> state of the different processes? I wonder if all processes eventually join
> the gather on comm_world, or if some of them are stuck in some orthogonal
> collective communication pattern.
>
> George
>
>
>
>
> On Fri, Sep 9, 2022, 21:24 Niranda Perera via users <
> users@lists.open-mpi.org> wrote:
>
> Hi all,
>
> I have the following use case. I have N MPI ranks in the global
> communicator, and I split it into two: the first containing only rank 0,
> and the other containing ranks 1 to N-1.
> Rank 0 acts as a master and ranks [1, N-1] act as workers. I use rank 0 to
> broadcast (blocking) a set of values to ranks [1, N-1] over comm_world.
> Rank 0 then immediately calls a gather (blocking) over comm_world and
> busy-waits for results. Once the broadcast is received by the workers, they
> call a method foo(args, local_comm). Inside foo, the workers communicate
> with each other using the subcommunicator, and each produces N-1 results,
> which are sent to rank 0 as gather responses over comm_world. Inside foo
> there are multiple iterations, collectives, send-receives, etc.
>
> This seems to work okay with smaller parallelism and smaller foo tasks.
> But when the parallelism increases (e.g., 64 ... 512), only a single
> iteration completes inside foo. Subsequent iterations seem to hang.
>
> Is this an anti-pattern in MPI? Should I use igather/ibcast instead of the
> blocking calls?
>
> Any help is greatly appreciated.
>
> --
> Niranda Perera
> https://niranda.dev/
> @n1r44 <https://twitter.com/N1R44>
>
>

-- 
Niranda Perera
https://niranda.dev/
@n1r44 <https://twitter.com/N1R44>


Re: [OMPI users] Subcommunicator communications do not complete intermittently

2022-09-11 Thread Protze, Joachim via users
Hi,

A source of sudden deadlocks at larger scale can be a change of send behavior
from buffered to synchronous mode. You can check whether your application also
deadlocks at smaller scale if you replace all sends by ssends (e.g., add
`#define MPI_Send MPI_Ssend` and `#define MPI_Isend MPI_Issend` after the
include of the MPI header).
An application with a correct communication pattern should run with
synchronous sends without deadlock.
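Concretely, that drop-in swap is just the following (a minimal sketch, assuming
a C code that includes the MPI header directly):

    #include <mpi.h>

    /* Force every standard-mode send (blocking and nonblocking) into
       synchronous mode. A correct communication pattern must still
       complete; a new deadlock at small scale indicates the pattern was
       only working because messages were being buffered. */
    #define MPI_Send  MPI_Ssend
    #define MPI_Isend MPI_Issend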
To check for other deadlock patterns in your application you can use tools like
MUST [1] or TotalView.

Best
Joachim


[1] https://itc.rwth-aachen.de/must/

From: users on behalf of George Bosilca via users
Sent: Sunday, September 11, 2022 10:40:42 PM
To: Open MPI Users
Cc: George Bosilca
Subject: Re: [OMPI users] Subcommunicator communications do not complete 
intermittently

Assuming a correct implementation, the described communication pattern should
work seamlessly.

Would it be possible to either share a reproducer or provide the execution
stack, by attaching a debugger to the deadlocked application, to see the state
of the different processes? I wonder if all processes eventually join the
gather on comm_world, or if some of them are stuck in some orthogonal
collective communication pattern.

George




On Fri, Sep 9, 2022, 21:24 Niranda Perera via users <
users@lists.open-mpi.org> wrote:
Hi all,

I have the following use case. I have N MPI ranks in the global communicator,
and I split it into two: the first containing only rank 0, and the other
containing ranks 1 to N-1.
Rank 0 acts as a master and ranks [1, N-1] act as workers. I use rank 0 to
broadcast (blocking) a set of values to ranks [1, N-1] over comm_world. Rank 0
then immediately calls a gather (blocking) over comm_world and busy-waits for
results. Once the broadcast is received by the workers, they call a method
foo(args, local_comm). Inside foo, the workers communicate with each other
using the subcommunicator, and each produces N-1 results, which are sent to
rank 0 as gather responses over comm_world. Inside foo there are multiple
iterations, collectives, send-receives, etc.

This seems to work okay with smaller parallelism and smaller foo tasks. But
when the parallelism increases (e.g., 64 ... 512), only a single iteration
completes inside foo. Subsequent iterations seem to hang.

Is this an anti-pattern in MPI? Should I use igather/ibcast instead of the
blocking calls?

Any help is greatly appreciated.

--
Niranda Perera
https://niranda.dev/
@n1r44 <https://twitter.com/N1R44>



Re: [OMPI users] Subcommunicator communications do not complete intermittently

2022-09-11 Thread George Bosilca via users
Assuming a correct implementation, the described communication pattern
should work seamlessly.

Would it be possible to either share a reproducer or provide the execution
stack, by attaching a debugger to the deadlocked application, to see the
state of the different processes? I wonder if all processes eventually join
the gather on comm_world, or if some of them are stuck in some orthogonal
collective communication pattern.
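For reference, a common low-tech way to collect those stacks (a minimal
sketch; the helper name and output format are just illustrative, not part of
any particular library) is to have every rank print where it runs right after
MPI_Init, then attach gdb to the hung processes (e.g. `gdb -p <pid>` followed
by `thread apply all bt`):

    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Print host and PID for every rank so that, once the job hangs,
       you can ssh to a node, attach a debugger to a stuck process and
       see which MPI call it is blocked in. */
    static void print_attach_info(void)
    {
        int rank, len;
        char host[MPI_MAX_PROCESSOR_NAME];

        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Get_processor_name(host, &len);
        printf("rank %d on %s has pid %ld\n", rank, host, (long)getpid());
        fflush(stdout);
    }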

George




On Fri, Sep 9, 2022, 21:24 Niranda Perera via users <
users@lists.open-mpi.org> wrote:

> Hi all,
>
> I have the following use case. I have N MPI ranks in the global
> communicator, and I split it into two: the first containing only rank 0,
> and the other containing ranks 1 to N-1.
> Rank 0 acts as a master and ranks [1, N-1] act as workers. I use rank 0 to
> broadcast (blocking) a set of values to ranks [1, N-1] over comm_world.
> Rank 0 then immediately calls a gather (blocking) over comm_world and
> busy-waits for results. Once the broadcast is received by the workers, they
> call a method foo(args, local_comm). Inside foo, the workers communicate
> with each other using the subcommunicator, and each produces N-1 results,
> which are sent to rank 0 as gather responses over comm_world. Inside foo
> there are multiple iterations, collectives, send-receives, etc.
>
> This seems to work okay with smaller parallelism and smaller foo tasks.
> But when the parallelism increases (e.g., 64 ... 512), only a single
> iteration completes inside foo. Subsequent iterations seem to hang.
>
> Is this an anti-pattern in MPI? Should I use igather/ibcast instead of the
> blocking calls?
>
> Any help is greatly appreciated.
>
> --
> Niranda Perera
> https://niranda.dev/
> @n1r44 
>
>


[OMPI users] Subcommunicator communications do not complete intermittently

2022-09-09 Thread Niranda Perera via users
Hi all,

I have the following use case. I have N MPI ranks in the global
communicator, and I split it into two: the first containing only rank 0,
and the other containing ranks 1 to N-1.
Rank 0 acts as a master and ranks [1, N-1] act as workers. I use rank 0 to
broadcast (blocking) a set of values to ranks [1, N-1] over comm_world.
Rank 0 then immediately calls a gather (blocking) over comm_world and
busy-waits for results. Once the broadcast is received by the workers, they
call a method foo(args, local_comm). Inside foo, the workers communicate
with each other using the subcommunicator, and each produces N-1 results,
which are sent to rank 0 as gather responses over comm_world. Inside foo
there are multiple iterations, collectives, send-receives, etc.
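In rough outline, the pattern looks like the sketch below (hedged: foo(), the
payload sizes, and the single-double-per-worker result are placeholders for
the real application, which runs many iterations of collectives and
send/receives inside foo):

    #include <stdlib.h>
    #include <mpi.h>

    /* Placeholder for the worker-side computation; here it just does a
       dummy allreduce on the sub-communicator to stand in for the real
       iterations/collectives inside foo(). */
    static void foo(const double *task, int n, MPI_Comm local_comm,
                    double *result)
    {
        double local = 0.0;
        for (int i = 0; i < n; i++)
            local += task[i];
        MPI_Allreduce(&local, result, 1, MPI_DOUBLE, MPI_SUM, local_comm);
    }

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Comm local_comm;
        enum { N_TASK = 8 };
        double task[N_TASK] = {0}, result = 0.0, *gathered = NULL;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Split comm_world: rank 0 alone (master), all other ranks (workers). */
        MPI_Comm_split(MPI_COMM_WORLD, rank == 0 ? 0 : 1, rank, &local_comm);

        /* Master broadcasts the task to all ranks over comm_world ... */
        MPI_Bcast(task, N_TASK, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        /* ... workers do the actual work on the sub-communicator ... */
        if (rank != 0)
            foo(task, N_TASK, local_comm, &result);

        /* ... and everyone, including the otherwise idle master, joins
           the gather on comm_world. */
        if (rank == 0)
            gathered = malloc((size_t)size * sizeof *gathered);
        MPI_Gather(&result, 1, MPI_DOUBLE, gathered, 1, MPI_DOUBLE, 0,
                   MPI_COMM_WORLD);

        free(gathered);
        MPI_Comm_free(&local_comm);
        MPI_Finalize();
        return 0;
    }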

This seems to work okay with smaller parallelism and smaller foo tasks. But
when the parallelism increases (e.g., 64 ... 512), only a single iteration
completes inside foo. Subsequent iterations seem to hang.

Is this an anti-pattern in MPI? Should I use igather/ibcast instead of the
blocking calls?

Any help is greatly appreciated.

-- 
Niranda Perera
https://niranda.dev/
@n1r44