Bug#910251: libpsm2 bug #910485 appears solved for mpgrafic builds - supports closing #910251
Hi all,

FTBFS bug #911941 for mpgrafic covered build failures on 17 and 27 October 2018, which were clearly caused only by libpsm2 bug #910485 reporting a timeout to stdout:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=910485

The reproducible builds of mpgrafic since then look OK:
https://tests.reproducible-builds.org/debian/history/mpgrafic.html

Commit 3aa6558 for libpsm2 says that an upstream fix is now in Debian version 11.2.68-4:
https://salsa.debian.org/hpc-team/libpsm2/commit/3aa65581753f79bc09ab776aa89c27e4df7a1877

Bug #910485 has now been closed. The mpgrafic builds provide circumstantial evidence supporting the closing of openmpi bug #910251.

Cheers
Boud
Bug#910251: Bug#910251 Confirmation of Fix
The delays and warning messages for my MPI programs on Sid were resolved by the workaround fix in libpsm2 11.2.68-2. You can close this one as far as I'm concerned.

Thanks,
Ron

--
James Ronald Lovell
Huntsville, AL, USA

A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable. -Leslie Lamport
Bug#910251:
That should be libpsm2-2 11.2.68-1, not .78.

On Mon, Oct 8, 2018 at 1:39 PM Ron Lovell wrote:
> Alastair,
>
> Thanks for the quick update. I see that I upgraded to libpsm2-2 11.2.78-1
> on 01 Oct 2018, so the timing fits.
>
> Thanks,
> Ron
> --
> James Ronald Lovell
> Huntsville, AL, USA
>

--
James Ronald Lovell
Huntsville, AL, USA
Bug#910251:
Alastair,

Thanks for the quick update. I see that I upgraded to libpsm2-2 11.2.78-1 on 01 Oct 2018, so the timing fits.

Thanks,
Ron

--
James Ronald Lovell
Huntsville, AL, USA
Bug#910251: libopenmpi3 3.1.2-5 Introduces 15s Delay and hfi_wait_for_device Messages
This appears to be due to a bug in libpsm2:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=910485

Best regards
Alastair

On 04/10/2018 00:24, Ron Lovell wrote:
> Package: libopenmpi3
> Version: 3.1.2-5
> Severity: normal
>
> Dear Maintainer,
>
> I updated Open MPI and libpmix2 on my Sid X86_64 system this afternoon:
>
> Open MPI 3.1.2-5
> libpmix2 3.0.2-2
>
> My simple MPI tests run to completion and give correct results, but there is an abnormal delay in startup and new warning messages. Example:
>
> INFO:Executing build/mpi_mm_c
> mpiexec -np 2 build/mpi_mm_c < mpi_mm_c.in > mpi_mm_c.tmp 2>mpi_mm_c.err
> ron5sid.10482hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
> ron5sid.10483hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
> mpi_mm has started with 2 tasks.
> ... (rest of results are normal)
>
> I did some research online. Red Hat Bug 1408316 from a couple years ago was to fix libfabric 1.4.1 in RHEL7 to not wait for /dev/hfi* devices if OPA/HFI hardware is not present. Occurred when system had PSM2 installed but no OPA/HFI hardware.
>
> My system does have libfabric1-1.6.1-5, which libopenmpi3 depends on.
> My system does have libpsm-infinipath1 and libpsm2-2, which libopenmpi3 depends on.
>
> I noticed this note in the "Changes" for openmpi 3.1.2-5:
> "* Drop link-libfabric.patch as obsolete"
> Relevant? What did the obsolete patch do?
>
> I see in my test results that my MPI tests passed 12 Sep 2018, when 3.1.2-2 or 3.1.2-3 was current. I don't see any record of testing 3.1.2-4.
>
> I'm filing this as "normal" severity, since my programs do run correctly. It's a slight practical nuisance as my test jobs flag changed output as possible problems. (Helpful in this case.)
>
> -- System Information:
> Debian Release: buster/sid
>   APT prefers unstable
>   APT policy: (500, 'unstable')
> Architecture: amd64 (x86_64)
>
> Kernel: Linux 4.18.0-2-amd64 (SMP w/2 CPU cores)
> Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
> Shell: /bin/sh linked to /bin/dash
> Init: systemd (via /run/systemd/system)
> LSM: AppArmor: enabled
>
> Versions of packages libopenmpi3 depends on:
> ii  libc6                    2.27-6
> ii  libevent-2.1-6           2.1.8-stable-4
> ii  libevent-pthreads-2.1-6  2.1.8-stable-4
> ii  libfabric1               1.6.1-5
> ii  libgcc1                  1:8.2.0-7
> ii  libgfortran5             8.2.0-7
> ii  libhwloc-plugins         1.11.11-2
> ii  libhwloc5                1.11.11-2
> ii  libibverbs1              20.0-1
> ii  libpmix2                 3.0.2-2
> ii  libpsm-infinipath1       3.3+20.604758e7-5
> ii  libpsm2-2                11.2.68-1
> ii  libquadmath0             8.2.0-7
> ii  libstdc++6               8.2.0-7
> ii  zlib1g                   1:1.2.11.dfsg-1
>
> Versions of packages libopenmpi3 recommends:
> ii  openmpi-bin  3.1.2-5
>
> libopenmpi3 suggests no packages.
>
> -- no debconf information

--
Alastair McKinstry, https://diaspora.sceal.ie/u/amckinstry

Commander Vimes didn’t like the phrase “The innocent have nothing to fear,” believing the innocent had everything to fear, mostly from the guilty but in the longer term even more from those who say things like “The innocent have nothing to fear.” - T. Pratchett, Snuff
Bug#910251: libopenmpi3 3.1.2-5 Introduces 15s Delay and hfi_wait_for_device Messages
Package: libopenmpi3
Version: 3.1.2-5
Severity: normal

Dear Maintainer,

I updated Open MPI and libpmix2 on my Sid X86_64 system this afternoon:

Open MPI 3.1.2-5
libpmix2 3.0.2-2

My simple MPI tests run to completion and give correct results, but there is an abnormal delay in startup and new warning messages. Example:

INFO:Executing build/mpi_mm_c
mpiexec -np 2 build/mpi_mm_c < mpi_mm_c.in > mpi_mm_c.tmp 2>mpi_mm_c.err
ron5sid.10482hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
ron5sid.10483hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
mpi_mm has started with 2 tasks.
... (rest of results are normal)

I did some research online. Red Hat Bug 1408316 from a couple years ago was to fix libfabric 1.4.1 in RHEL7 to not wait for /dev/hfi* devices if OPA/HFI hardware is not present. Occurred when system had PSM2 installed but no OPA/HFI hardware.

My system does have libfabric1-1.6.1-5, which libopenmpi3 depends on.
My system does have libpsm-infinipath1 and libpsm2-2, which libopenmpi3 depends on.

I noticed this note in the "Changes" for openmpi 3.1.2-5:
"* Drop link-libfabric.patch as obsolete"
Relevant? What did the obsolete patch do?

I see in my test results that my MPI tests passed 12 Sep 2018, when 3.1.2-2 or 3.1.2-3 was current. I don't see any record of testing 3.1.2-4.

I'm filing this as "normal" severity, since my programs do run correctly. It's a slight practical nuisance as my test jobs flag changed output as possible problems. (Helpful in this case.)

-- System Information:
Debian Release: buster/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.18.0-2-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages libopenmpi3 depends on:
ii  libc6                    2.27-6
ii  libevent-2.1-6           2.1.8-stable-4
ii  libevent-pthreads-2.1-6  2.1.8-stable-4
ii  libfabric1               1.6.1-5
ii  libgcc1                  1:8.2.0-7
ii  libgfortran5             8.2.0-7
ii  libhwloc-plugins         1.11.11-2
ii  libhwloc5                1.11.11-2
ii  libibverbs1              20.0-1
ii  libpmix2                 3.0.2-2
ii  libpsm-infinipath1       3.3+20.604758e7-5
ii  libpsm2-2                11.2.68-1
ii  libquadmath0             8.2.0-7
ii  libstdc++6               8.2.0-7
ii  zlib1g                   1:1.2.11.dfsg-1

Versions of packages libopenmpi3 recommends:
ii  openmpi-bin  3.1.2-5

libopenmpi3 suggests no packages.

-- no debconf information