Bug#910251: libpsm2 bug #910485 appears solved for mpgrafic builds - supports closing #910251

2018-12-23 Thread Boud Roukema

hi all

There was an FTBFS bug #911941 for mpgrafic: the builds failed on 17 and
27 October 2018, clearly only because of libpsm2 bug #910485 reporting a
timeout to stdout:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=910485

The reproducible builds of mpgrafic since then look OK:

https://tests.reproducible-builds.org/debian/history/mpgrafic.html

and commit 3aa6558 for libpsm2 says that an upstream fix is now in Debian
11.2.68-4:

https://salsa.debian.org/hpc-team/libpsm2/commit/3aa65581753f79bc09ab776aa89c27e4df7a1877

Bug #910485 has now been closed.

The mpgrafic builds provide circumstantial evidence supporting the
closing of openmpi bug #910251 .

Cheers
Boud



Bug#910251: Bug#910251 Confirmation of Fix

2018-11-02 Thread Ron Lovell
The delays and warning messages for my MPI programs on Sid were resolved by
the workaround fix in libpsm2 11.2.68-2. You can close this one as far as
I'm concerned.

Thanks,
Ron
-- 
James Ronald Lovell 
Huntsville, AL, USA
A distributed system is one in which the failure of a computer you
didn't even know existed can render your own computer unusable.
-Leslie Lamport


Bug#910251:

2018-10-08 Thread Ron Lovell
That should be libpsm2-2 11.2.68-1, not .78.

On Mon, Oct 8, 2018 at 1:39 PM Ron Lovell wrote:

> Alastair,
>
> Thanks for the quick update. I see that I upgraded to libpsm2-2 11.2.78-1
> on 01 Oct 2018, so the timing fits.
>
> Thanks,
> Ron
> --
> James Ronald Lovell 
> Huntsville, AL, USA
>


-- 
James Ronald Lovell 
Huntsville, AL, USA


Bug#910251:

2018-10-08 Thread Ron Lovell
Alastair,

Thanks for the quick update. I see that I upgraded to libpsm2-2 11.2.78-1
on 01 Oct 2018, so the timing fits.

Thanks,
Ron
-- 
James Ronald Lovell 
Huntsville, AL, USA


Bug#910251: libopenmpi3 3.1.2-5 Introduces 15s Delay and hfi_wait_for_device Messages

2018-10-08 Thread Alastair McKinstry

This appears to be due to a bug in libpsm2:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=910485


Best regards

Alastair


On 04/10/2018 00:24, Ron Lovell wrote:

Package: libopenmpi3
Version: 3.1.2-5
Severity: normal

Dear Maintainer,

I updated Open MPI and libpmix2 on my Sid X86_64 system this afternoon:
Open MPI 3.1.2-5
libpmix2 3.0.2-2

My simple MPI tests run to completion and give correct results, but
there is an abnormal delay in startup and new warning messages. Example:

INFO:Executing build/mpi_mm_c
mpiexec -np 2 build/mpi_mm_c < mpi_mm_c.in > mpi_mm_c.tmp 2>mpi_mm_c.err
ron5sid.10482hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
ron5sid.10483hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
mpi_mm has started with 2 tasks.
...
(rest of results are normal)

I did some research online. Red Hat Bug 1408316 from a couple years ago
was to fix libfabric 1.4.1 in RHEL7 to not wait for /dev/hfi* devices if
OPA/HFI hardware is not present. Occurred when system had PSM2 installed
but no OPA/HFI hardware.

My system does have libfabric1-1.6.1-5, which libopenmpi3 depends on.
My system does have libpsm-infinipath1 and libpsm2-2, which libopenmpi3
depends on.

I noticed this note in the "Changes" for openmpi 3.1.2-5:
"* Drop link-libfabric.patch as obsolete"
Relevant? What did the obsolete patch do?

I see in my test results that my MPI tests passed 12 Sep 2018, when 3.1.2-2
or 3.1.2-3 was current. I don't see any record of testing 3.1.2-4.

I'm filing this as "normal" severity, since my programs do run correctly.
It's a slight practical nuisance as my test jobs flag changed output as
possible problems. (Helpful in this case.)

-- System Information:
Debian Release: buster/sid
   APT prefers unstable
   APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.18.0-2-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages libopenmpi3 depends on:
ii  libc6                    2.27-6
ii  libevent-2.1-6           2.1.8-stable-4
ii  libevent-pthreads-2.1-6  2.1.8-stable-4
ii  libfabric1               1.6.1-5
ii  libgcc1                  1:8.2.0-7
ii  libgfortran5             8.2.0-7
ii  libhwloc-plugins         1.11.11-2
ii  libhwloc5                1.11.11-2
ii  libibverbs1              20.0-1
ii  libpmix2                 3.0.2-2
ii  libpsm-infinipath1       3.3+20.604758e7-5
ii  libpsm2-2                11.2.68-1
ii  libquadmath0             8.2.0-7
ii  libstdc++6               8.2.0-7
ii  zlib1g                   1:1.2.11.dfsg-1

Versions of packages libopenmpi3 recommends:
ii  openmpi-bin  3.1.2-5

libopenmpi3 suggests no packages.

-- no debconf information


--
Alastair McKinstry
https://diaspora.sceal.ie/u/amckinstry
Commander Vimes didn’t like the phrase “The innocent have nothing to fear,”
believing the innocent had everything to fear, mostly from the guilty but in
the longer term even more from those who say things like “The innocent have
nothing to fear.”
 - T. Pratchett, Snuff



Bug#910251: libopenmpi3 3.1.2-5 Introduces 15s Delay and hfi_wait_for_device Messages

2018-10-03 Thread Ron Lovell
Package: libopenmpi3
Version: 3.1.2-5
Severity: normal

Dear Maintainer,

I updated Open MPI and libpmix2 on my Sid X86_64 system this afternoon:
Open MPI 3.1.2-5
libpmix2 3.0.2-2

My simple MPI tests run to completion and give correct results, but
there is an abnormal delay in startup and new warning messages. Example:

INFO:Executing build/mpi_mm_c
mpiexec -np 2 build/mpi_mm_c < mpi_mm_c.in > mpi_mm_c.tmp 2>mpi_mm_c.err
ron5sid.10482hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
ron5sid.10483hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
mpi_mm has started with 2 tasks.
...
(rest of results are normal)

I did some research online. Red Hat Bug 1408316 from a couple years ago
was to fix libfabric 1.4.1 in RHEL7 to not wait for /dev/hfi* devices if
OPA/HFI hardware is not present. Occurred when system had PSM2 installed
but no OPA/HFI hardware.
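
A quick way to confirm that a system falls into the "PSM2 installed but no
OPA/HFI hardware" case is to look for the device nodes named in the warning.
The following is only a minimal sketch: the helper name hfi_devices and the
glob pattern hfi1_* are assumptions based on the /dev/hfi1_0 path in the
message above, not anything taken from libpsm2 or libfabric.

```python
import glob
import os


def hfi_devices(dev_dir="/dev"):
    """Return the sorted paths of hfi1 device nodes found under dev_dir.

    Hypothetical helper: the 'hfi1_*' pattern is inferred from the
    /dev/hfi1_0 path printed in the hfi_wait_for_device warning.
    """
    return sorted(glob.glob(os.path.join(dev_dir, "hfi1_*")))


if __name__ == "__main__":
    found = hfi_devices()
    if found:
        print("HFI device nodes present:", ", ".join(found))
    else:
        print("No /dev/hfi1_* nodes found; no OPA/HFI device is visible here")
```

An empty result on a machine that shows the 15-second delay would be
consistent with the situation described in the Red Hat report.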

My system does have libfabric1-1.6.1-5, which libopenmpi3 depends on.
My system does have libpsm-infinipath1 and libpsm2-2, which libopenmpi3
depends on.

I noticed this note in the "Changes" for openmpi 3.1.2-5:
"* Drop link-libfabric.patch as obsolete"
Relevant? What did the obsolete patch do?

I see in my test results that my MPI tests passed 12 Sep 2018, when 3.1.2-2
or 3.1.2-3 was current. I don't see any record of testing 3.1.2-4.

I'm filing this as "normal" severity, since my programs do run correctly.
It's a slight practical nuisance as my test jobs flag changed output as
possible problems. (Helpful in this case.)
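
For test jobs that compare captured stderr against a reference, one possible
stopgap is to drop the known warning before the comparison. This is a rough
sketch under assumptions: the function name strip_hfi_warnings and the regular
expression are illustrative, and it assumes each warning sits on a single line
of the .err file.

```python
import re

# Pattern inferred from the warning text quoted in this report (assumption).
HFI_WARNING = re.compile(
    r"hfi_wait_for_device: The /dev/hfi\S+ device failed to appear"
)


def strip_hfi_warnings(text):
    """Return text with every line matching the HFI timeout warning removed."""
    kept = [line for line in text.splitlines() if not HFI_WARNING.search(line)]
    return "\n".join(kept)


if __name__ == "__main__":
    sample = (
        "ron5sid.10482hfi_wait_for_device: The /dev/hfi1_0 device failed "
        "to appear after 15.0 seconds: Connection timed out\n"
        "mpi_mm has started with 2 tasks."
    )
    print(strip_hfi_warnings(sample))
```

Filtering hides the symptom only; the startup delay itself remains until the
underlying library is fixed.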

-- System Information:
Debian Release: buster/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.18.0-2-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages libopenmpi3 depends on:
ii  libc6                    2.27-6
ii  libevent-2.1-6           2.1.8-stable-4
ii  libevent-pthreads-2.1-6  2.1.8-stable-4
ii  libfabric1               1.6.1-5
ii  libgcc1                  1:8.2.0-7
ii  libgfortran5             8.2.0-7
ii  libhwloc-plugins         1.11.11-2
ii  libhwloc5                1.11.11-2
ii  libibverbs1              20.0-1
ii  libpmix2                 3.0.2-2
ii  libpsm-infinipath1       3.3+20.604758e7-5
ii  libpsm2-2                11.2.68-1
ii  libquadmath0             8.2.0-7
ii  libstdc++6               8.2.0-7
ii  zlib1g                   1:1.2.11.dfsg-1

Versions of packages libopenmpi3 recommends:
ii  openmpi-bin  3.1.2-5

libopenmpi3 suggests no packages.

-- no debconf information