Re: [OMPI users] Timeout in MPI_Bcast/MPI_Barrier?

2021-01-12 Thread Daniel Torres via users

Hi George and Gilles.

Thanks a lot for taking the time to test the code I sent.

Since Gilles mentioned that all the tests he ran worked perfectly, I decided 
to install a completely new *OMPI 4.1.0* and test again.


Happily, the OOM killer is not shooting any process and all my 
experiments worked perfectly. The processes on hold remain "alert" until 
the other processes finish, always obeying barriers or broadcasts.


It seems to be an issue with my own *OMPI 4.1.0u1a1* installation, not 
an issue with the current OMPI version.


So I'm going to put this issue on the right mailing list. =)

Thanks for your time and patience in answering this issue.

Best regards.

On 12/01/21 at 4:04, George Bosilca via users wrote:
*MPI_ERR_PROC_FAILED is not yet a valid error in MPI. It is coming 
from ULFM, an extension to MPI that is not yet in the OMPI master.*

*Daniel, what version of Open MPI are you using? Are you sure you are 
not mixing multiple versions due to PATH/LD_LIBRARY_PATH?*

*George.*

On Mon, Jan 11, 2021 at 21:31 Gilles Gouaillardet via users 
<users@lists.open-mpi.org> wrote:


Daniel,

the test works in my environment (1 node, 32 GB memory) with all the
mentioned parameters.

Did you check the memory usage on your nodes and make sure the OOM
killer did not shoot any process?

Cheers,

Gilles

On Tue, Jan 12, 2021 at 1:48 AM Daniel Torres via users
<users@lists.open-mpi.org> wrote:
>
> Hi.
>
> Thanks for responding. I have taken the most important parts
from my code and I created a test that reproduces the behavior I
described previously.
>
> I attach to this e-mail the compressed file "test.tar.gz".
Inside it, you can find:
>
> 1.- The .c source code "test.c", which I compiled with "mpicc -g
-O3 test.c -o test -lm". The main work is performed on the
function "work_on_grid", starting at line 162.
> 2.- Four execution examples on two different machines (my own
and a cluster machine), which I executed with "mpiexec -np 16
--machinefile hostfile --map-by node --mca btl tcp,vader,self
--mca btl_base_verbose 100 ./test 4096 4096", varying the last two
arguments with 4096, 8192 and 16384 (a matrix size). The error
appears with bigger numbers (8192 on my machine, 16384 on the cluster).
> 3.- The "ompi_info -a" output from the two machines.
> 4.- The hostfile.
>
> The duration of the delay is just a few seconds, about 3 ~ 4.
>
> Essentially, the first error message I get from a waiting
process is "74: MPI_ERR_PROC_FAILED: Process Failure".
>
> Hope this information can help.
>
> Thanks a lot for your time.
>
> On 08/01/21 at 18:40, George Bosilca via users wrote:
>
> Daniel,
>
> There are no timeouts in OMPI with the exception of the initial
connection over TCP, where we use the socket timeout to prevent
deadlocks. As you already did quite a few communicator
duplications and other collective communications before you see
the timeout, we need more info about this. As Gilles indicated,
having the complete output might help. What is the duration of the
delay for the waiting process? Also, can you post a reproducer of
this issue?
>
>   George.
>
>
> On Fri, Jan 8, 2021 at 9:03 AM Gilles Gouaillardet via users
<users@lists.open-mpi.org> wrote:
>>
>> Daniel,
>>
>> Can you please post the full error message and share a
reproducer for
>> this issue?
>>
>> Cheers,
>>
>> Gilles
>>
>> On Fri, Jan 8, 2021 at 10:25 PM Daniel Torres via users
>> <users@lists.open-mpi.org> wrote:
>> >
>> > Hi all.
>> >
>> > Actually I'm implementing an algorithm that creates a process
grid and divides it into row and column communicators as follows:
>> >
>> >              col_comm0    col_comm1    col_comm2    col_comm3
>> > row_comm0    P0           P1           P2           P3
>> > row_comm1    P4           P5           P6           P7
>> > row_comm2    P8           P9           P10          P11
>> > row_comm3    P12          P13          P14          P15
>> >
>> > Then, every process works on its own column communicator and
broadcasts data on row communicators.
>> > While column operations are being executed, processes not
included in the current column communicator just wait for results.
>> >
>> > At some point, a column communicator could be split to
create a temp communicator and allow only the right processes to
work on it.
>> >
>> > At the end of a step, a call to MPI_Barrier (on a duplicate
of MPI_COMM_WORLD) is executed to sync all processes and avoid bad
results.
>> >
>> > With a small amount of data (a small matrix) the MPI_Barrier
call syncs correctly on the communicator that includes all
processes and processing ends fine.

Re: [OMPI users] Timeout in MPI_Bcast/MPI_Barrier?

2021-01-11 Thread George Bosilca via users
*MPI_ERR_PROC_FAILED is not yet a valid error in MPI. It is coming from
ULFM, an extension to MPI that is not yet in the OMPI master.*

*Daniel, what version of Open MPI are you using? Are you sure you are not
mixing multiple versions due to PATH/LD_LIBRARY_PATH?*

*George.*
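
For reference, a minimal standard-MPI sketch (not part of the original
exchange, and not the attached test.c) of how an error code like the reported
"74: MPI_ERR_PROC_FAILED: Process Failure" can be decoded: install
MPI_ERRORS_RETURN on the communicator and translate the code with
MPI_Error_class / MPI_Error_string. The decoded class and message make the
failure easier to report precisely.

#include <mpi.h>
#include <stdio.h>

/* Sketch: ask MPI to return error codes instead of aborting, then
 * decode whatever a collective hands back. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    int rc = MPI_Barrier(MPI_COMM_WORLD);   /* stand-in for the failing Bcast/Barrier */
    if (rc != MPI_SUCCESS) {
        int err_class, len;
        char msg[MPI_MAX_ERROR_STRING];
        MPI_Error_class(rc, &err_class);    /* map the raw code to its error class */
        MPI_Error_string(rc, msg, &len);    /* human-readable text for the code */
        fprintf(stderr, "collective failed: code=%d class=%d (%s)\n",
                rc, err_class, msg);
    }

    MPI_Finalize();
    return 0;
}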


On Mon, Jan 11, 2021 at 21:31 Gilles Gouaillardet via users <
users@lists.open-mpi.org> wrote:

> Daniel,
>
> the test works in my environment (1 node, 32 GB memory) with all the
> mentioned parameters.
>
> Did you check the memory usage on your nodes and make sure the OOM
> killer did not shoot any process?
>
> Cheers,
>
> Gilles
>
> On Tue, Jan 12, 2021 at 1:48 AM Daniel Torres via users
> <users@lists.open-mpi.org> wrote:
> >
> > Hi.
> >
> > Thanks for responding. I have taken the most important parts from my
> code and I created a test that reproduces the behavior I described
> previously.
> >
> > I attach to this e-mail the compressed file "test.tar.gz". Inside it,
> you can find:
> >
> > 1.- The .c source code "test.c", which I compiled with "mpicc -g -O3
> test.c -o test -lm". The main work is performed on the function
> "work_on_grid", starting at line 162.
> > 2.- Four execution examples on two different machines (my own and a
> cluster machine), which I executed with "mpiexec -np 16 --machinefile
> hostfile --map-by node --mca btl tcp,vader,self --mca btl_base_verbose 100
> ./test 4096 4096", varying the last two arguments with 4096, 8192 and 16384
> (a matrix size). The error appears with bigger numbers (8192 on my machine,
> 16384 on the cluster).
> > 3.- The "ompi_info -a" output from the two machines.
> > 4.- The hostfile.
> >
> > The duration of the delay is just a few seconds, about 3 ~ 4.
> >
> > Essentially, the first error message I get from a waiting process is
> "74: MPI_ERR_PROC_FAILED: Process Failure".
> >
> > Hope this information can help.
> >
> > Thanks a lot for your time.
> >
> > On 08/01/21 at 18:40, George Bosilca via users wrote:
> >
> > Daniel,
> >
> > There are no timeouts in OMPI with the exception of the initial
> connection over TCP, where we use the socket timeout to prevent deadlocks.
> As you already did quite a few communicator duplications and other
> collective communications before you see the timeout, we need more info
> about this. As Gilles indicated, having the complete output might help.
> What is the duration of the delay for the waiting process? Also, can you
> post a reproducer of this issue?
> >
> >   George.
> >
> >
> > On Fri, Jan 8, 2021 at 9:03 AM Gilles Gouaillardet via users <
> users@lists.open-mpi.org> wrote:
> >>
> >> Daniel,
> >>
> >> Can you please post the full error message and share a reproducer for
> >> this issue?
> >>
> >> Cheers,
> >>
> >> Gilles
> >>
> >> On Fri, Jan 8, 2021 at 10:25 PM Daniel Torres via users
> >> <users@lists.open-mpi.org> wrote:
> >> >
> >> > Hi all.
> >> >
> >> > Actually I'm implementing an algorithm that creates a process grid
> and divides it into row and column communicators as follows:
> >> >
> >> >              col_comm0    col_comm1    col_comm2    col_comm3
> >> > row_comm0    P0           P1           P2           P3
> >> > row_comm1    P4           P5           P6           P7
> >> > row_comm2    P8           P9           P10          P11
> >> > row_comm3    P12          P13          P14          P15
> >> >
> >> > Then, every process works on its own column communicator and
> broadcasts data on row communicators.
> >> > While column operations are being executed, processes not included in
> the current column communicator just wait for results.
> >> >
> >> > At some point, a column communicator could be split to create a temp
> communicator and allow only the right processes to work on it.
> >> >
> >> > At the end of a step, a call to MPI_Barrier (on a duplicate of
> MPI_COMM_WORLD) is executed to sync all processes and avoid bad results.
> >> >
> >> > With a small amount of data (a small matrix) the MPI_Barrier call
> syncs correctly on the communicator that includes all processes and
> processing ends fine.
> >> > But when the amount of data (a big matrix) is increased, operations
> on column communicators take more time to finish and hence the waiting time
> also increases for the waiting processes.
> >> >
> >> > After some time, waiting processes return an error when they have
> not received the broadcast (MPI_Bcast) on row communicators or when they
> have finished their work at the sync point (MPI_Barrier). But when the
> operations on the current column communicator end, the still active
> processes try to broadcast on row communicators and they fail because the
> waiting processes have returned an error. So all processes fail at
> different moments in time.
> >> >
> >> > So my problem is that waiting processes "believe" that the current
> operations have failed (but they have not finished yet!) and they fail too.
> >> >
> >> > So I have a question about MPI_Bcast/MPI_Barrier:
> >> >
> >> > Is there a way to increment the timeout a process can wait for a
> broadcast or barrier to be completed?

Re: [OMPI users] Timeout in MPI_Bcast/MPI_Barrier?

2021-01-11 Thread Gilles Gouaillardet via users
Daniel,

the test works in my environment (1 node, 32 GB memory) with all the
mentioned parameters.

Did you check the memory usage on your nodes and make sure the OOM
killer did not shoot any process?

Cheers,

Gilles

On Tue, Jan 12, 2021 at 1:48 AM Daniel Torres via users
<users@lists.open-mpi.org> wrote:
>
> Hi.
>
> Thanks for responding. I have taken the most important parts from my code and 
> I created a test that reproduces the behavior I described previously.
>
> I attach to this e-mail the compressed file "test.tar.gz". Inside it, you
> can find:
>
> 1.- The .c source code "test.c", which I compiled with "mpicc -g -O3 test.c 
> -o test -lm". The main work is performed on the function "work_on_grid", 
> starting at line 162.
> 2.- Four execution examples on two different machines (my own and a cluster
> machine), which I executed with "mpiexec -np 16 --machinefile hostfile
> --map-by node --mca btl tcp,vader,self --mca btl_base_verbose 100 ./test 4096
> 4096", varying the last two arguments with 4096, 8192 and 16384 (a matrix
> size). The error appears with bigger numbers (8192 on my machine, 16384 on
> the cluster).
> 3.- The "ompi_info -a" output from the two machines.
> 4.- The hostfile.
>
> The duration of the delay is just a few seconds, about 3 ~ 4.
>
> Essentially, the first error message I get from a waiting process is "74: 
> MPI_ERR_PROC_FAILED: Process Failure".
>
> Hope this information can help.
>
> Thanks a lot for your time.
>
> On 08/01/21 at 18:40, George Bosilca via users wrote:
>
> Daniel,
>
> There are no timeouts in OMPI with the exception of the initial connection 
> over TCP, where we use the socket timeout to prevent deadlocks. As you 
> already did quite a few communicator duplications and other collective 
> communications before you see the timeout, we need more info about this. As 
> Gilles indicated, having the complete output might help. What is the duration 
> of the delay for the waiting process? Also, can you post a reproducer of
> this issue?
>
>   George.
>
>
> On Fri, Jan 8, 2021 at 9:03 AM Gilles Gouaillardet via users 
>  wrote:
>>
>> Daniel,
>>
>> Can you please post the full error message and share a reproducer for
>> this issue?
>>
>> Cheers,
>>
>> Gilles
>>
>> On Fri, Jan 8, 2021 at 10:25 PM Daniel Torres via users
>> <users@lists.open-mpi.org> wrote:
>> >
>> > Hi all.
>> >
>> > Actually I'm implementing an algorithm that creates a process grid and 
>> > divides it into row and column communicators as follows:
>> >
>> >              col_comm0    col_comm1    col_comm2    col_comm3
>> > row_comm0    P0           P1           P2           P3
>> > row_comm1    P4           P5           P6           P7
>> > row_comm2    P8           P9           P10          P11
>> > row_comm3    P12          P13          P14          P15
>> >
>> > Then, every process works on its own column communicator and broadcasts
>> > data on row communicators.
>> > While column operations are being executed, processes not included in the 
>> > current column communicator just wait for results.
>> >
>> > At some point, a column communicator could be split to create a temp
>> > communicator and allow only the right processes to work on it.
>> >
>> > At the end of a step, a call to MPI_Barrier (on a duplicate of 
>> > MPI_COMM_WORLD) is executed to sync all processes and avoid bad results.
>> >
>> > With a small amount of data (a small matrix) the MPI_Barrier call syncs 
>> > correctly on the communicator that includes all processes and processing 
>> > ends fine.
>> > But when the amount of data (a big matrix) is increased, operations on
>> > column communicators take more time to finish and hence the waiting time also
>> > increases for the waiting processes.
>> >
>> > After some time, waiting processes return an error when they have not
>> > received the broadcast (MPI_Bcast) on row communicators or when they have 
>> > finished their work at the sync point (MPI_Barrier). But when the 
>> > operations on the current column communicator end, the still active 
>> > processes try to broadcast on row communicators and they fail because the 
>> > waiting processes have returned an error. So all processes fail at
>> > different moments in time.
>> >
>> > So my problem is that waiting processes "believe" that the current 
>> > operations have failed (but they have not finished yet!) and they fail too.
>> >
>> > So I have a question about MPI_Bcast/MPI_Barrier:
>> >
>> > Is there a way to increment the timeout a process can wait for a broadcast 
>> > or barrier to be completed?
>> >
>> > Here is my machine and OpenMPI info:
>> > - OpenMPI version: Open MPI 4.1.0u1a1
>> > - OS: Linux Daniel 5.4.0-52-generic #57-Ubuntu SMP Thu Oct 15 10:57:00 UTC 
>> > 2020 x86_64 x86_64 x86_64 GNU/Linux
>> >
>> > Thanks in advance for reading my description/question.
>> >
>> > Best regards.
>> >
>> > --
>> > Daniel Torres
>> > LIPN - Université Sorbonne Paris Nord
>
> --
> Daniel Torres
> LIPN - Université Sorbonne Paris Nord


Re: [OMPI users] Timeout in MPI_Bcast/MPI_Barrier?

2021-01-11 Thread Daniel Torres via users

Hi.

Thanks for responding. I have taken the most important parts from my 
code and I created a test that reproduces the behavior I described 
previously.


I attach to this e-mail the compressed file "*test.tar.gz*". Inside it, 
you can find:


1.- The .c source code "test.c", which I compiled with "*mpicc -g -O3 
test.c -o test -lm*". The main work is performed on the function 
"*work_on_grid*", starting at line 162.
2.- Four execution examples on two different machines (my own and a 
cluster machine), which I executed with "*mpiexec -np 16 --machinefile 
hostfile --map-by node --mca btl tcp,vader,self --mca btl_base_verbose 
100 ./test 4096 4096*", varying the last two arguments with *4096, 8192 
and 16384* (a matrix size). The error appears with bigger numbers (8192 
on my machine, 16384 on the cluster).

3.- The "ompi_info -a" output from the two machines.
4.- The hostfile.

The duration of the delay is just a few seconds, about 3 ~ 4.

Essentially, the first error message I get from a waiting process is 
"*74: MPI_ERR_PROC_FAILED: Process Failure*".


Hope this information can help.

Thanks a lot for your time.

On 08/01/21 at 18:40, George Bosilca via users wrote:

Daniel,

There are no timeouts in OMPI with the exception of the initial 
connection over TCP, where we use the socket timeout to prevent 
deadlocks. As you already did quite a few communicator duplications 
and other collective communications before you see the timeout, we 
need more info about this. As Gilles indicated, having the complete 
output might help. What is the duration of the delay for the waiting 
process? Also, can you post a reproducer of this issue?


  George.


On Fri, Jan 8, 2021 at 9:03 AM Gilles Gouaillardet via users 
<users@lists.open-mpi.org> wrote:


Daniel,

Can you please post the full error message and share a reproducer for
this issue?

Cheers,

Gilles

On Fri, Jan 8, 2021 at 10:25 PM Daniel Torres via users
<users@lists.open-mpi.org> wrote:
>
> Hi all.
>
> Actually I'm implementing an algorithm that creates a process
grid and divides it into row and column communicators as follows:
>
>              col_comm0    col_comm1    col_comm2    col_comm3
> row_comm0    P0           P1           P2           P3
> row_comm1    P4           P5           P6           P7
> row_comm2    P8           P9           P10          P11
> row_comm3    P12          P13          P14          P15
>
> Then, every process works on its own column communicator and
broadcasts data on row communicators.
> While column operations are being executed, processes not
included in the current column communicator just wait for results.
>
> At some point, a column communicator could be split to create a
temp communicator and allow only the right processes to work on it.
>
> At the end of a step, a call to MPI_Barrier (on a duplicate of
MPI_COMM_WORLD) is executed to sync all processes and avoid bad
results.
>
> With a small amount of data (a small matrix) the MPI_Barrier
call syncs correctly on the communicator that includes all
processes and processing ends fine.
> But when the amount of data (a big matrix) is increased,
operations on column communicators take more time to finish and
hence the waiting time also increases for the waiting processes.
>
> After some time, waiting processes return an error when they
have not received the broadcast (MPI_Bcast) on row communicators
or when they have finished their work at the sync point
(MPI_Barrier). But when the operations on the current column
communicator end, the still active processes try to broadcast on
row communicators and they fail because the waiting processes have
returned an error. So all processes fail at different moments in time.
>
> So my problem is that waiting processes "believe" that the
current operations have failed (but they have not finished yet!)
and they fail too.
>
> So I have a question about MPI_Bcast/MPI_Barrier:
>
> Is there a way to increment the timeout a process can wait for a
broadcast or barrier to be completed?
>
> Here is my machine and OpenMPI info:
> - OpenMPI version: Open MPI 4.1.0u1a1
> - OS: Linux Daniel 5.4.0-52-generic #57-Ubuntu SMP Thu Oct 15
10:57:00 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>
> Thanks in advance for reading my description/question.
>
> Best regards.
>
> --
> Daniel Torres
> LIPN - Université Sorbonne Paris Nord


--
Daniel Torres
LIPN - Université Sorbonne Paris Nord



test.tar.gz
Description: application/gzip


Re: [OMPI users] Timeout in MPI_Bcast/MPI_Barrier?

2021-01-08 Thread George Bosilca via users
Daniel,

There are no timeouts in OMPI with the exception of the initial connection
over TCP, where we use the socket timeout to prevent deadlocks. As you
already did quite a few communicator duplications and other collective
communications before you see the timeout, we need more info about this. As
Gilles indicated, having the complete output might help. What is the
duration of the delay for the waiting process? Also, can you post a
reproducer of this issue?

  George.


On Fri, Jan 8, 2021 at 9:03 AM Gilles Gouaillardet via users <
users@lists.open-mpi.org> wrote:

> Daniel,
>
> Can you please post the full error message and share a reproducer for
> this issue?
>
> Cheers,
>
> Gilles
>
> On Fri, Jan 8, 2021 at 10:25 PM Daniel Torres via users
>  wrote:
> >
> > Hi all.
> >
> > Actually I'm implementing an algorithm that creates a process grid and
> divides it into row and column communicators as follows:
> >
> >              col_comm0    col_comm1    col_comm2    col_comm3
> > row_comm0    P0           P1           P2           P3
> > row_comm1    P4           P5           P6           P7
> > row_comm2    P8           P9           P10          P11
> > row_comm3    P12          P13          P14          P15
> >
> > Then, every process works on its own column communicator and broadcasts
> data on row communicators.
> > While column operations are being executed, processes not included in
> the current column communicator just wait for results.
> >
> > At some point, a column communicator could be split to create a temp
> communicator and allow only the right processes to work on it.
> >
> > At the end of a step, a call to MPI_Barrier (on a duplicate of
> MPI_COMM_WORLD) is executed to sync all processes and avoid bad results.
> >
> > With a small amount of data (a small matrix) the MPI_Barrier call syncs
> correctly on the communicator that includes all processes and processing
> ends fine.
> > But when the amount of data (a big matrix) is increased, operations on
> column communicators take more time to finish and hence the waiting time also
> increases for the waiting processes.
> >
> > After some time, waiting processes return an error when they have not
> received the broadcast (MPI_Bcast) on row communicators or when they have
> finished their work at the sync point (MPI_Barrier). But when the
> operations on the current column communicator end, the still active
> processes try to broadcast on row communicators and they fail because the
> waiting processes have returned an error. So all processes fail at
> different moments in time.
> >
> > So my problem is that waiting processes "believe" that the current
> operations have failed (but they have not finished yet!) and they fail too.
> >
> > So I have a question about MPI_Bcast/MPI_Barrier:
> >
> > Is there a way to increment the timeout a process can wait for a
> broadcast or barrier to be completed?
> >
> > Here is my machine and OpenMPI info:
> > - OpenMPI version: Open MPI 4.1.0u1a1
> > - OS: Linux Daniel 5.4.0-52-generic #57-Ubuntu SMP Thu Oct 15 10:57:00
> UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
> >
> > Thanks in advance for reading my description/question.
> >
> > Best regards.
> >
> > --
> > Daniel Torres
> > LIPN - Université Sorbonne Paris Nord
>


Re: [OMPI users] Timeout in MPI_Bcast/MPI_Barrier?

2021-01-08 Thread Gilles Gouaillardet via users
Daniel,

Can you please post the full error message and share a reproducer for
this issue?

Cheers,

Gilles

On Fri, Jan 8, 2021 at 10:25 PM Daniel Torres via users
<users@lists.open-mpi.org> wrote:
>
> Hi all.
>
> Actually I'm implementing an algorithm that creates a process grid and 
> divides it into row and column communicators as follows:
>
>              col_comm0    col_comm1    col_comm2    col_comm3
> row_comm0    P0           P1           P2           P3
> row_comm1    P4           P5           P6           P7
> row_comm2    P8           P9           P10          P11
> row_comm3    P12          P13          P14          P15
>
> Then, every process works on its own column communicator and broadcasts data 
> on row communicators.
> While column operations are being executed, processes not included in the 
> current column communicator just wait for results.
>
> At some point, a column communicator could be split to create a temp 
> communicator and allow only the right processes to work on it.
>
> At the end of a step, a call to MPI_Barrier (on a duplicate of 
> MPI_COMM_WORLD) is executed to sync all processes and avoid bad results.
>
> With a small amount of data (a small matrix) the MPI_Barrier call syncs 
> correctly on the communicator that includes all processes and processing ends 
> fine.
> But when the amount of data (a big matrix) is increased, operations on 
> column communicators take more time to finish and hence the waiting time also 
> increases for the waiting processes.
>
> After some time, waiting processes return an error when they have not 
> received the broadcast (MPI_Bcast) on row communicators or when they have 
> finished their work at the sync point (MPI_Barrier). But when the operations 
> on the current column communicator end, the still active processes try to 
> broadcast on row communicators and they fail because the waiting processes 
> have returned an error. So all processes fail at different moments in time.
>
> So my problem is that waiting processes "believe" that the current operations 
> have failed (but they have not finished yet!) and they fail too.
>
> So I have a question about MPI_Bcast/MPI_Barrier:
>
> Is there a way to increment the timeout a process can wait for a broadcast or 
> barrier to be completed?
>
> Here is my machine and OpenMPI info:
> - OpenMPI version: Open MPI 4.1.0u1a1
> - OS: Linux Daniel 5.4.0-52-generic #57-Ubuntu SMP Thu Oct 15 10:57:00 UTC 
> 2020 x86_64 x86_64 x86_64 GNU/Linux
>
> Thanks in advance for reading my description/question.
>
> Best regards.
>
> --
> Daniel Torres
> LIPN - Université Sorbonne Paris Nord


[OMPI users] Timeout in MPI_Bcast/MPI_Barrier?

2021-01-08 Thread Daniel Torres via users

Hi all.

Actually I'm implementing an algorithm that creates a process grid and 
divides it into row and column communicators as follows:


             col_comm0    col_comm1    col_comm2    col_comm3
row_comm0    P0           P1           P2           P3
row_comm1    P4           P5           P6           P7
row_comm2    P8           P9           P10          P11
row_comm3    P12          P13          P14          P15

Then, every process works on its own column communicator and broadcasts 
data on row communicators.
While column operations are being executed, processes not included in 
the current column communicator just wait for results.


At some point, a column communicator could be split to create a temp 
communicator and allow only the right processes to work on it.


At the end of a step, a call to MPI_Barrier (on a duplicate of 
MPI_COMM_WORLD) is executed to sync all processes and avoid bad results.
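
To make the setup above concrete, here is a minimal sketch (assuming a 4x4
grid of 16 processes; names such as ncols, my_row and my_col are illustrative
and not taken from the attached test.c) of how the row and column
communicators can be built with MPI_Comm_split, with a broadcast along each
row and the end-of-step MPI_Barrier on a duplicate of MPI_COMM_WORLD:

#include <mpi.h>
#include <stdio.h>

/* Illustrative sketch of the 4x4 grid described above; the real work
 * lives in test.c (work_on_grid), not here. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* 16 in the layout above */

    const int ncols = 4;
    int my_row = rank / ncols;              /* P0-P3 -> row 0, P4-P7 -> row 1, ... */
    int my_col = rank % ncols;

    /* Duplicate of MPI_COMM_WORLD used only for the end-of-step sync. */
    MPI_Comm world_dup;
    MPI_Comm_dup(MPI_COMM_WORLD, &world_dup);

    /* Same color -> same communicator; the key orders the ranks inside it. */
    MPI_Comm row_comm, col_comm;
    MPI_Comm_split(MPI_COMM_WORLD, my_row, my_col, &row_comm);
    MPI_Comm_split(MPI_COMM_WORLD, my_col, my_row, &col_comm);

    /* Pretend column 0 did the column work, then broadcast the result
     * along each row (rank 0 in row_comm is that row's column-0 member). */
    double result = (my_col == 0) ? (double)rank : 0.0;
    MPI_Bcast(&result, 1, MPI_DOUBLE, 0, row_comm);

    /* Global sync point at the end of the step. */
    MPI_Barrier(world_dup);

    printf("rank %d/%d (row %d, col %d) result %.1f\n",
           rank, size, my_row, my_col, result);

    MPI_Comm_free(&row_comm);
    MPI_Comm_free(&col_comm);
    MPI_Comm_free(&world_dup);
    MPI_Finalize();
    return 0;
}

In the real code the column communicators do the heavy work (and may
themselves be split into temporary communicators), which is where the long
waits on the other processes come from.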


With a small amount of data (a small matrix) the MPI_Barrier call syncs 
correctly on the communicator that includes all processes and processing 
ends fine.
But when the amount of data (a big matrix) is increased, operations on 
column communicators take more time to finish and hence the waiting time 
also increases for the waiting processes.


After some time, waiting processes return an error when they have not 
received the broadcast (MPI_Bcast) on row communicators or when they 
have finished their work at the sync point (MPI_Barrier). But when the 
operations on the current column communicator end, the still active 
processes try to broadcast on row communicators and they fail because 
the waiting processes have returned an error. So all processes fail in 
different moment in time.


So my problem is that waiting processes "believe" that the current 
operations have failed (but they have not finished yet!) and they fail too.


So I have a question about MPI_Bcast/MPI_Barrier:

Is there a way to increment the timeout a process can wait for a 
broadcast or barrier to be completed?


Here is my machine and OpenMPI info:
- OpenMPI version: Open MPI 4.1.0u1a1
- OS: Linux Daniel 5.4.0-52-generic #57-Ubuntu SMP Thu Oct 15 10:57:00 
UTC 2020 x86_64 x86_64 x86_64 GNU/Linux


Thanks in advance for reading my description/question.

Best regards.

--
Daniel Torres
LIPN - Université Sorbonne Paris Nord