Bug#995599: libopenmpi3: segfault in mca_btl_vader.so on 32-bit arches

2021-10-13 Thread Jeff Squyres (jsquyres)
Thanks for the investigation and confirmation!

-- 
Jeff Squyres
jsquy...@cisco.com

2021-10-12 Thread Drew Parsons
On 2021-10-13 02:35, Drew Parsons wrote: ... Debugging a bit further (with MPI_IN_PLACE removed), I can identify that the bug is in dolfinx, not openmpi (unless there are two bugs here). Comparing detailed debug output from 2 threads, I find one thread skips the facet loop in

2021-10-12 Thread Drew Parsons
On 2021-10-12 22:24, Drew Parsons wrote:
> On 2021-10-12 17:46, Jeff Squyres (jsquyres) wrote:
>> ...
>> Ok, so this is an MPI_Alltoall issue. Does it use MPI_IN_PLACE?
> ...
> I'll apply PR1738 to the debian dolfinx build and see how it turns out.

Looks like removing MPI_IN_PLACE is not enough.

2021-10-12 Thread Drew Parsons
On 2021-10-12 18:22, Drew Parsons wrote: On 2021-10-12 17:46, Jeff Squyres (jsquyres) wrote: I'm sorry, I just noticed that you replied 6 days ago, but I apparently wasn't notified by the Debian bug tracker. :-( Sorry about that. I'm never quite sure when the bug tracker does or does not add

2021-10-12 Thread Drew Parsons
On 2021-10-12 17:46, Jeff Squyres (jsquyres) wrote: I'm sorry, I just noticed that you replied 6 days ago, but I apparently wasn't notified by the Debian bug tracker. :-( Sorry about that. I'm never quite sure when the bug tracker does or does not add cc:s. This reply has you cc:d in any

2021-10-12 Thread Jeff Squyres (jsquyres)
I'm sorry, I just noticed that you replied 6 days ago, but I apparently wasn't notified by the Debian bug tracker. :-(

Ok, so this is an MPI_Alltoall issue. Does it use MPI_IN_PLACE?

On Wed, 06 Oct 2021 20:15:38 +0200 Drew Parsons wrote:
> Source: openmpi
> Followup-For: Bug #995599
>
>

2021-10-06 Thread Drew Parsons
Source: openmpi
Followup-For: Bug #995599

Not so simple to make a minimal test case I think. all_to_all is defined in cpp/dolfinx/common/MPI.h in dolfinx source, and calls MPI_Alltoall from openmpi. It's designed for use with graph::AdjacencyList from graph/AdjacencyList.h, and is called from
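
For readers unfamiliar with the collective named above, here is a rough single-process sketch of the data movement MPI_Alltoall performs: block j of rank i's send buffer ends up as block i of rank j's receive buffer. It does not call Open MPI and is not the dolfinx all_to_all wrapper; the helper name model_alltoall and the use of int data are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Open MPI or dolfinx): models, in one
// process, where MPI_Alltoall would place each block of data.
// send[i] is the send buffer of rank i and holds nranks blocks of `count` ints.
// The result recv[j] is what rank j's receive buffer would contain afterwards.
std::vector<std::vector<int>>
model_alltoall(const std::vector<std::vector<int>>& send, std::size_t count)
{
  const std::size_t nranks = send.size();
  std::vector<std::vector<int>> recv(nranks, std::vector<int>(nranks * count));
  for (std::size_t i = 0; i < nranks; ++i)
  {
    // Every rank must supply exactly one block per destination rank.
    assert(send[i].size() == nranks * count);
    for (std::size_t j = 0; j < nranks; ++j)
      for (std::size_t k = 0; k < count; ++k)
        recv[j][i * count + k] = send[i][j * count + k]; // block j of i -> block i of j
  }
  return recv;
}
```

With two ranks and count = 2, send buffers {0, 1, 2, 3} and {10, 11, 12, 13} would give rank 0 the values {0, 1, 10, 11} and rank 1 the values {2, 3, 12, 13}. This only models the block layout; it says nothing about where the reported segfault originates.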

2021-10-02 Thread Drew Parsons
On 2021-10-03 00:51, Drew Parsons wrote:
> Manually debugging isolates the point in the dolfinx code at mesh/graphbuild.cpp l.144
>   graph::AdjacencyList recvd_buffer = dolfinx::MPI::all_to_all(comm, send_buffer);

2021-10-02 Thread Drew Parsons
Manually debugging isolates the point in the dolfinx code at mesh/graphbuild.cpp l.144
  graph::AdjacencyList recvd_buffer = dolfinx::MPI::all_to_all(comm, send_buffer);
https://salsa.debian.org/science-team/fenics/fenics-dolfinx/-/blob/experimental/cpp/dolfinx/mesh/graphbuild.cpp#L144
I

2021-10-02 Thread Drew Parsons
Package: libopenmpi3
Version: 4.1.2~rc1-4
Severity: important
Control: affects -1 src:fenics-dolfinx

fenics-dolfinx FTBFS on 32-bit arches (i386, armel, armhf); see https://buildd.debian.org/status/package.php?p=fenics-dolfinx=experimental