Re: [OMPI users] Issue with shared memory arrays in Fortran

2020-08-25 Thread Patrick McNally via users
Thank you very much for the response.  I have to admit that I'm much more
in the developer camp than the admin camp and am not terribly familiar with
installing and configuring OpenMPI myself.  At least one of the systems
does not appear to use ucx but both are using mxm.  I'm attaching the
output of 'ompi_info --all' for the system on which the code always fails
(not just with Python) in case it is helpful.  I neglected to mention that
the shmemTest.out file included in the original archive was a run from this
system.
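(In case it helps anyone checking their own install: something like

  ompi_info | grep -i -e ucx -e mxm

appears to be the quickest way to see which of these components a build
includes; 'ompi_info --all' is the exhaustive version.)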

I tested your suggested workaround on that same always-failing system and
it did indeed work, so thank you very much for that!  I realize you
probably don't yet know enough about the root cause to gauge the scope of
the issue, but once you do, I'd appreciate knowing whether other
collective calls (gather, reduce, etc.) might also be affected.  This
particular call was relatively easy for me to find because
the bad data caused obvious failures in our code.  It is possible other
areas are also affected but just in more subtle ways.
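(If other collectives do turn out to be affected, I assume the same style
of override would apply, since the tuned component seems to expose matching
algorithm parameters for the other operations, e.g.

  mpirun --mca coll_tuned_use_dynamic_rules 1 \
         --mca coll_tuned_reduce_algorithm 1 \
         --mca coll_tuned_gather_algorithm 1 ...

but I'm only guessing that those parameter names follow the same pattern
as the bcast one.)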

Please let me know if there is any additional testing/exploration I can do
on this end to help.

-Patrick

On Mon, Aug 24, 2020 at 11:19 PM Gilles Gouaillardet via users <
users@lists.open-mpi.org> wrote:

> Patrick,
>
> Thanks for the report and the reproducer.
>
> I was able to confirm the issue with Python and Fortran, but
>  - I can only reproduce it with pml/ucx (i.e., --mca pml ob1 --mca btl
> tcp,self works fine)
>  - I can only reproduce it with bcast algorithms 8 and 9
>
> As a workaround, you can keep using ucx but manually change the bcast
> algorithm:
>
> mpirun --mca coll_tuned_use_dynamic_rules 1 --mca
> coll_tuned_bcast_algorithm 1 ...
>
> /* you can replace the bcast algorithm with any value from 1 to 7,
> inclusive */
>
> Cheers,
>
> Gilles
>
> On Mon, Aug 24, 2020 at 10:58 PM Patrick McNally via users
> <users@lists.open-mpi.org> wrote:
> >
> > I apologize in advance for the size of the example source and probably
> the length of the email, but this has been a pain to track down.
> >
> > Our application uses System V-style shared memory pretty extensively, and
> we have recently found that in certain circumstances, OpenMPI appears to
> provide ranks with stale data.  The attached archive contains sample code
> that demonstrates the issue.  There is a subroutine that uses a shared
> memory array to broadcast from a single rank on one compute node to a
> single rank on all other compute nodes.  The first call sends all 1s, then
> all 2s, and so on.  The receiving rank(s) get all 1s on the first
> execution, but on subsequent executions they receive some 2s and some 1s;
> then some 3s, some 2s, and some 1s.  The code contains a version of this
> routine in both C and Fortran but only the Fortran version appears to
> exhibit the problem.
> >
> > I've tried this with OpenMPI 3.1.5, 4.0.2, and 4.0.4, on two
> different systems with very different configurations, and both show the
> problem.  On one of the machines, it only appears to happen when MPI is
> initialized with mpi4py, so I've included that in the test as well.  Other
> than that, the behavior is very consistent across machines.  When run with
> the same number of ranks and same size array, the two machines even show
> the invalid values at the same indices.
> >
> > Please let me know if you need any additional information.
> >
> > Thanks,
> > Patrick
>


ompiInfoAll.tar.bz2
Description: application/bzip2


Re: [OMPI users] Issue with shared memory arrays in Fortran

2020-08-24 Thread Gilles Gouaillardet via users
Patrick,

Thanks for the report and the reproducer.

I was able to confirm the issue with Python and Fortran, but
 - I can only reproduce it with pml/ucx (i.e., --mca pml ob1 --mca btl
tcp,self works fine)
 - I can only reproduce it with bcast algorithms 8 and 9

As a workaround, you can keep using ucx but manually change the bcast
algorithm:

mpirun --mca coll_tuned_use_dynamic_rules 1 --mca
coll_tuned_bcast_algorithm 1 ...

/* you can replace the bcast algorithm with any value from 1 to 7,
inclusive */
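
(To see which algorithm values the tuned component accepts on a given
install, ompi_info can list its parameters, e.g.

ompi_info --param coll tuned --level 9 | grep bcast_algorithm

though the --level needed to expose them may differ between releases.)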

Cheers,

Gilles

On Mon, Aug 24, 2020 at 10:58 PM Patrick McNally via users
<users@lists.open-mpi.org> wrote:
>
> I apologize in advance for the size of the example source and probably the 
> length of the email, but this has been a pain to track down.
>
> Our application uses System V-style shared memory pretty extensively, and we
> have recently found that in certain circumstances, OpenMPI appears to provide
> ranks with stale data.  The attached archive contains sample code that 
> demonstrates the issue.  There is a subroutine that uses a shared memory 
> array to broadcast from a single rank on one compute node to a single rank on 
> all other compute nodes.  The first call sends all 1s, then all 2s, and so 
> on.  The receiving rank(s) get all 1s on the first execution, but on 
> subsequent executions they receive some 2s and some 1s; then some 3s, some 
> 2s, and some 1s.  The code contains a version of this routine in both C and 
> Fortran but only the Fortran version appears to exhibit the problem.
>
> I've tried this with OpenMPI 3.1.5, 4.0.2, and 4.0.4, on two different
> systems with very different configurations, and both show the problem.  On one
> of the machines, it only appears to happen when MPI is initialized with 
> mpi4py, so I've included that in the test as well.  Other than that, the 
> behavior is very consistent across machines.  When run with the same number 
> of ranks and same size array, the two machines even show the invalid values 
> at the same indices.
>
> Please let me know if you need any additional information.
>
> Thanks,
> Patrick


[OMPI users] Issue with shared memory arrays in Fortran

2020-08-24 Thread Patrick McNally via users
I apologize in advance for the size of the example source and probably the
length of the email, but this has been a pain to track down.

Our application uses System V-style shared memory pretty extensively, and
we have recently found that in certain circumstances, OpenMPI appears to
provide ranks with stale data.  The attached archive contains sample code
that demonstrates the issue.  There is a subroutine that uses a shared
memory array to broadcast from a single rank on one compute node to a
single rank on all other compute nodes.  The first call sends all 1s, then
all 2s, and so on.  The receiving rank(s) get all 1s on the first
execution, but on subsequent executions they receive some 2s and some 1s;
then some 3s, some 2s, and some 1s.  The code contains a version of this
routine in both C and Fortran but only the Fortran version appears to
exhibit the problem.
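
To make the pattern concrete, below is a minimal sketch of the kind of
routine I am describing.  This is NOT the code in the attached archive:
names and sizes are illustrative, error handling and cleanup are omitted,
and it assumes Linux values for the System V constants, a 4-byte default
integer, and that sharing the segment id over a node-local communicator
(rather than an ftok-style key) is acceptable.

program shmem_bcast_sketch
  use mpi
  use iso_c_binding
  implicit none

  ! C interfaces for the System V calls the pattern relies on.
  interface
     function shmget(key, sz, flg) result(id) bind(C, name="shmget")
       import :: c_int, c_size_t
       integer(c_int), value :: key, flg
       integer(c_size_t), value :: sz
       integer(c_int) :: id
     end function shmget
     function shmat(id, addr, flg) result(p) bind(C, name="shmat")
       import :: c_int, c_ptr
       integer(c_int), value :: id, flg
       type(c_ptr), value :: addr
       type(c_ptr) :: p
     end function shmat
  end interface

  integer, parameter :: N = 1024
  integer(c_int), parameter :: IPC_PRIVATE = 0    ! Linux value
  integer(c_int), parameter :: IPC_CREAT = 512    ! 0o1000 on Linux
  integer(c_int), parameter :: PERM0600 = 384     ! 0o600
  integer :: ierr, wrank, nrank, shmid, node_comm, leader_comm, pass
  type(c_ptr) :: raw
  integer, pointer :: buf(:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, wrank, ierr)

  ! Node-local communicator; rank 0 on each node acts as that node's leader.
  call MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, &
                           MPI_INFO_NULL, node_comm, ierr)
  call MPI_Comm_rank(node_comm, nrank, ierr)

  ! Leader creates the segment, shares the id, and everyone attaches.
  if (nrank == 0) shmid = shmget(IPC_PRIVATE, 4_c_size_t * int(N, c_size_t), &
                                 ior(IPC_CREAT, PERM0600))
  call MPI_Bcast(shmid, 1, MPI_INTEGER, 0, node_comm, ierr)
  raw = shmat(int(shmid, c_int), c_null_ptr, 0_c_int)
  call c_f_pointer(raw, buf, [N])

  ! Communicator containing only the node leaders (world rank 0 is its root).
  call MPI_Comm_split(MPI_COMM_WORLD, merge(0, 1, nrank == 0), wrank, &
                      leader_comm, ierr)

  do pass = 1, 3
     if (wrank == 0) buf = pass                   ! all 1s, then 2s, then 3s
     if (nrank == 0) &
          call MPI_Bcast(buf, N, MPI_INTEGER, 0, leader_comm, ierr)
     call MPI_Barrier(node_comm, ierr)            ! publish to local readers
     if (nrank /= 0 .and. any(buf /= pass)) &
          print *, 'rank', wrank, 'saw stale data on pass', pass
     call MPI_Barrier(MPI_COMM_WORLD, ierr)
  end do

  ! Detach/cleanup (shmdt, shmctl IPC_RMID) omitted for brevity.
  call MPI_Finalize(ierr)
end program shmem_bcast_sketch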

I've tried this with OpenMPI 3.1.5, 4.0.2, and 4.0.4, on two different
systems with very different configurations, and both show the problem.  On
one of the machines, it only appears to happen when MPI is initialized with
mpi4py, so I've included that in the test as well.  Other than that, the
behavior is very consistent across machines.  When run with the same number
of ranks and same size array, the two machines even show the invalid values
at the same indices.
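
(For what it's worth, on the mpi4py path initialization happens at import
time: 'from mpi4py import MPI' calls MPI_Init_thread by default rather
than leaving MPI_Init to the Fortran/C code.)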

Please let me know if you need any additional information.

Thanks,
Patrick


shmemTest.tgz
Description: application/compressed-tar