Re: [OMPI users] silent failure for large allgather

2019-09-25 Thread Heinz, Michael William via users
Emmanuel Thomé, Thanks for bringing this to our attention. It turns out this issue affects all OFI providers in Open MPI. We've applied a fix to the 3.0.x and later branches of open-mpi/ompi on GitHub. However, you should be aware that this fix simply adds the appropriate error message; it does

Re: [OMPI users] silent failure for large allgather

2019-09-13 Thread Jeff Squyres (jsquyres) via users
Emmanuel -- Looks like the right people missed this when you posted; sorry about that! We're tracking it now: https://github.com/open-mpi/ompi/issues/6976 On Sep 13, 2019, at 3:04 AM, Emmanuel Thomé via users <users@lists.open-mpi.org> wrote: Hi, Thanks Jeff for your reply, and sorry

Re: [OMPI users] silent failure for large allgather

2019-09-13 Thread Emmanuel Thomé via users
Hi, Thanks Jeff for your reply, and sorry for this late follow-up... On Sun, Aug 11, 2019 at 02:27:53PM -0700, Jeff Hammond wrote: > > openmpi-4.0.1 gives essentially the same results (similar files > > attached), but with various doubts on my part as to whether I've run this > > check correctly.

Re: [OMPI users] silent failure for large allgather

2019-08-11 Thread Jeff Hammond via users
On Tue, Aug 6, 2019 at 9:54 AM Emmanuel Thomé via users <users@lists.open-mpi.org> wrote: > Hi, > > In the attached program, the MPI_Allgather() call fails to communicate > all data (the amount it communicates wraps around at 4G...). I'm running > on an Omni-Path cluster (2018 hardware), openmpi

[OMPI users] silent failure for large allgather

2019-08-06 Thread Emmanuel Thomé via users
Hi, In the attached program, the MPI_Allgather() call fails to communicate all data (the amount it communicates wraps around at 4G...). I'm running on an Omni-Path cluster (2018 hardware), Open MPI 3.1.3 or 4.0.1 (tested both). With the OFI MTL, the failure is silent, with no error message reporte
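
The "wraps around at 4G" symptom described in this message can be illustrated with plain integer arithmetic. The sketch below is not taken from the attached program or from Open MPI; it assumes, for illustration only, that the byte length is stored in a 32-bit field somewhere along the path. The helper names `wrapped_len` and `fits_32bit` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the byte count that survives if a 64-bit length
 * is stored in a 32-bit field (assumed cause, not confirmed by the thread). */
static uint64_t wrapped_len(uint64_t nbytes)
{
    return (uint64_t)(uint32_t)nbytes;   /* keeps nbytes modulo 2^32 */
}

/* Hypothetical helper: returns 1 if count * elem_size bytes can be
 * represented in 32 bits, 0 otherwise. Dividing avoids overflowing
 * the product itself. */
static int fits_32bit(uint64_t count, uint64_t elem_size)
{
    assert(elem_size > 0);
    return count <= (uint64_t)UINT32_MAX / elem_size;
}
```

Under that assumption, a message of 4 GiB plus 4096 bytes would be seen as only 4096 bytes, so a caller could use a check like `fits_32bit` before the collective and split the transfer or report an error instead of letting it truncate silently.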