Bug#994784: mpi4py breaks gyoto autopkgtest on i386: 1 process returned, a non-zero exit code

2021-09-25 Thread Thibaut Paumard
Control: found -1 gyoto/1.4.4-4
Control: found -1 gyoto/1.4.4-3
Control: notfound -1 gyoto/1.4.4-5

Hi Paul,

Le 24/09/2021 à 21:42, Paul Gevers a écrit :
> Is the workaround inside the binary, or only (needed) in the test suite?
> In other words, did openmpi *break* gyoto on i386 in some cases? If yes,
> ideally openmpi is updated with a versioned Breaks on gyoto with the
> right unfixed package. The migration software then will schedule the set
> and the migration will happen if everything's fine.

The workaround is only in the test suite. There remains a bug, either
within openmpi or within gyoto but triggered by the new version of openmpi.

Concerning gyoto, I would only rate it "normal" though, not "serious",
if you can confirm that the workaround actually worked in the testing
testbed. If this is the case, I would decrease the severity of the bug
and keep it open. It would be great if the openmpi maintainers could
have a look, but I guess they will need me to provide a minimal
example, which will not be easy to provide unless they experience the
same symptoms in other situations.

Regards, Thibaut.



Bug#994784: mpi4py breaks gyoto autopkgtest on i386: 1 process returned, a non-zero exit code

2021-09-24 Thread Paul Gevers
Control: reassign -1 src:openmpi,src:gyoto
Control: found -1 openmpi/4.1.1-3
Control: found -1 gyoto/1.4.4-4
Hi

On 24-09-2021 11:42, Thibaut Paumard wrote:
> Control: reassign -1 src:openmpi src:gyoto

This assigned the package to version src:gyoto ;)

> I think I've found a workaround and am getting closer to finding the
> cause. I've just uploaded a package (gyoto 1.4.4-5) with the workaround.
> If you can then check that the test passes fine, I guess we will just
> have to let this gyoto migrate together with openmpi.

Is the workaround inside the binary, or only (needed) in the test suite?
In other words, did openmpi *break* gyoto on i386 in some cases? If yes,
ideally openmpi is updated with a versioned Breaks on gyoto with the
right unfixed package. The migration software then will schedule the set
and the migration will happen if everything's fine.

> The code path is slightly different within gyoto between the two
> approaches so there could be a bug in gyoto, but it is puzzling that it
> only affects one specific input file, only on one architecture, and only
> with this new release of openmpi. And it still depends on the
> environment: I don't get the failure if I let autopkgtest run the test
> in my chroot, but I get it if I run the same commands manually in the
> same chroot.

Ouch.

Paul



Bug#994784: mpi4py breaks gyoto autopkgtest on i386: 1 process returned, a non-zero exit code

2021-09-24 Thread Thibaut Paumard
Control: reassign -1 src:openmpi src:gyoto

Hi Paul,

I think I've found a workaround and am getting closer to finding the
cause. I've just uploaded a package (gyoto 1.4.4-5) with the workaround.
If you can then check that the test passes fine, I guess we will just
have to let this gyoto migrate together with openmpi.

Gyoto supports two models for running within MPI: one where the number
of processes is specified with the -np argument of mpirun, and one where
-np is set to 1 and gyoto itself spawns more processes (singleton approach).

The test used the singleton approach. If I now let mpirun itself spawn
the n processes, the test no longer fails.

The code path is slightly different within gyoto between the two
approaches so there could be a bug in gyoto, but it is puzzling that it
only affects one specific input file, only on one architecture, and only
with this new release of openmpi. And it still depends on the
environment: I don't get the failure if I let autopkgtest run the test
in my chroot, but I get it if I run the same commands manually in the
same chroot.

Best regards, Thibaut.



Bug#994784: mpi4py breaks gyoto autopkgtest on i386: 1 process returned, a non-zero exit code

2021-09-23 Thread Thibaut Paumard
Thanks Paul.

I don't think mpi4py is involved, but openmpi (4.1.1-5) is (based on
where in the test suite the bug happens, and on the fact that the
failure also occurs when only openmpi is pinned to unstable).

The puzzling bit is that the tests pass cleanly in unstable; the
failure only occurs with the unstable openmpi in testing (and only on
i386, as far as I can tell).

Still investigating.

Regards, Thibaut.



Bug#994784: mpi4py breaks gyoto autopkgtest on i386: 1 process returned, a non-zero exit code

2021-09-23 Thread Paul Gevers
Hi Thibaut,

Thanks for investigating.

On 23-09-2021 13:44, Thibaut Paumard wrote:
> Note that this is apparently different from what the debci
> infrastructure does (apparently recompiling instead of taking the binaries).

We don't recompile unless "build-needed" is in the restrictions. We do
run in lxc instead of chroot.

> I then ran the test from this chroot (as sbuild user):
> autopkgtest -B --test-name=gyoto-mpi -- null

You can see from the top of the log that we ran with:
--no-built-binaries
'--setup-commands=echo '"'"'gyoto testing/i386'"'"' > /var/tmp/debci.pkg 2>&1 || true'
'--setup-commands=echo '"'"'Acquire::Retries "10";'"'"' > /etc/apt/apt.conf.d/75retry 2>&1 || true'
--user debci --apt-upgrade
'--add-apt-source=deb http://incoming.debian.org/debian-buildd buildd-unstable main contrib non-free'
--add-apt-release=unstable
--pin-packages=unstable=src:mpi4py,src:openmpi
--output-dir /tmp/tmp.irEl4X24YS/autopkgtest-incoming/testing/i386/g/gyoto/15316145
gyoto -- lxc --sudo --name ci-262-d8ad913b autopkgtest-testing-i386

> I note that the test suite can take a long time to run and that the
> final line in the failure log for this test reads:
> 
> mpirun.openmpi noticed that process rank 1 with PID 0 on node
> ci-262-d8ad913b exited on signal 9 (Killed).
> 
> This happens after this test has been running for 1h and 55min.
> 
> Could it be that the process is killed because of a timeout in the test
> environment?

The autopkgtest timeout is at 2:47, so if anything this is a timeout
inside the test.
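
As a rough sketch of that comparison in Python: the 10000-second value is an
assumption, taken from autopkgtest's documented default test timeout, which
rounds to the 2:47 mentioned above; the 1h55 runtime is the figure from the
quoted message. The names are illustrative only.

```python
from datetime import timedelta

# Assumption: 10000 s is autopkgtest's default test timeout (about 2:47).
AUTOPKGTEST_TEST_TIMEOUT = timedelta(seconds=10000)
# Runtime reported in the quoted message before the process was killed.
OBSERVED_RUNTIME = timedelta(hours=1, minutes=55)


def killed_by_harness_timeout(runtime, timeout):
    """Return True only if the run lasted at least as long as the harness timeout."""
    return runtime >= timeout


# Time that was still left before the harness itself would have stepped in.
margin = AUTOPKGTEST_TEST_TIMEOUT - OBSERVED_RUNTIME

print(killed_by_harness_timeout(OBSERVED_RUNTIME, AUTOPKGTEST_TEST_TIMEOUT))
print(margin)
```

Under these assumptions the kill arrived well before the harness limit, which
is why a timeout (or other kill) from inside the test looks more likely.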

> Anyway, there's not much more I can do, except skip this test.

Can we get more logging? I can run something in the testbed if it helps
debugging the issue.

> I'm reassigning to openmpi because, on the debci infrastructure, the
> same failure occurs with openmpi/unstable also with mpi4py/testing.

Ack, I was informed out-of-band that the likely culprit was there.

Paul



Bug#994784: mpi4py breaks gyoto autopkgtest on i386: 1 process returned, a non-zero exit code

2021-09-23 Thread Thibaut Paumard
Control: reassign -1 src:openmpi
Control: found -1 openmpi/4.1.1-3
Control: tags -1 +unreproducible +help
Control: retitle -1 openmpi breaks gyoto autopkgtest on i386
Control: thanks

Hi,

I cannot reproduce this.

I have set up a testing-i386 chroot (on my amd64 laptop) and installed
openmpi from unstable (I also tried with mpi4py from unstable on top),
as well as their versioned dependencies not in testing:

libopenmpi-dev  4.1.1-5
libopenmpi3     4.1.1-5
openmpi-common  4.1.1-5
python3-mpi4py  3.0.3-10

I tried with gyoto from unstable (1.4.4-4) and from testing (1.4.4-3).

Note that this is apparently different from what the debci
infrastructure does (apparently recompiling instead of taking the binaries).

I then ran the test from this chroot (as sbuild user):
autopkgtest -B --test-name=gyoto-mpi -- null

The test PASSED each time.

I note that the test suite can take a long time to run and that the
final line in the failure log for this test reads:

mpirun.openmpi noticed that process rank 1 with PID 0 on node
ci-262-d8ad913b exited on signal 9 (Killed).

This happens after this test has been running for 1h and 55min.

Could it be that the process is killed because of a timeout in the test
environment?

On the one hand it would feel strange because, on the debci
infrastructure, the error always happens on the same file
(example-startrace), near the beginning of the process (row 1/32). On
the other hand I know this file is one of the longest to process (still
below a couple of minutes on my laptop).
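
Regarding the "exited on signal 9 (Killed)" line: signal 9 is SIGKILL, which
a process cannot catch or ignore, so it usually comes from outside the
program (a timeout wrapper or the kernel's out-of-memory killer, for
instance). A minimal Python sketch of how such a death looks to the parent
process, assuming a POSIX system with sh available; it only illustrates the
mechanism and does not reproduce the gyoto failure:

```python
import signal
import subprocess

# Start a child shell that sends SIGKILL to itself, standing in for a
# process that is killed from outside.
proc = subprocess.run(["sh", "-c", "kill -9 $$"])

# On POSIX, subprocess reports death by signal as a negative return code.
killed_by = signal.Signals(-proc.returncode)
print(killed_by.name)
```

The parent only learns which signal ended the child, not who sent it, so the
mpirun message alone cannot tell a timeout apart from other causes.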

Anyway, there's not much more I can do, except skip this test.

I'm reassigning to openmpi because, on the debci infrastructure, the
same failure also occurs with openmpi/unstable combined with mpi4py/testing.

Advice welcome.

Regards, Thibaut.




Bug#994784: mpi4py breaks gyoto autopkgtest on i386: 1 process returned, a non-zero exit code

2021-09-20 Thread Paul Gevers
Source: mpi4py, gyoto
Control: found -1 mpi4py/3.0.3-10
Control: found -1 gyoto/1.4.4-3
Severity: serious
Tags: sid bookworm
X-Debbugs-CC: debian...@lists.debian.org
User: debian...@lists.debian.org
Usertags: breaks needs-update

Dear maintainer(s),

With a recent upload of mpi4py the autopkgtest of gyoto fails in testing
when that autopkgtest is run with the binary packages of mpi4py from
unstable. It passes when run with only packages from testing. In tabular
form:

                       pass            fail
mpi4py                 from testing    3.0.3-10
gyoto                  from testing    1.4.4-3
versioned deps [0]     from testing    from unstable
all others             from testing    from testing

I copied some of the output at the bottom of this report.

Currently this regression is blocking the migration of mpi4py to testing
[1]. Due to the nature of this issue, I filed this bug report against
both packages. Can you please investigate the situation and reassign the
bug to the right package?

More information about this bug and the reason for filing it can be found on
https://wiki.debian.org/ContinuousIntegration/RegressionEmailInformation

Paul

[0] You can see what packages were added from the second line of the log
file quoted below. The migration software adds source packages from
unstable to the list if they are needed to install packages from
mpi4py/3.0.3-10. I.e. due to versioned dependencies or breaks/conflicts.
[1] https://qa.debian.org/excuses.php?package=mpi4py

https://ci.debian.net/data/autopkgtest/testing/i386/g/gyoto/15316145/log.gz

Reading parameter file:
/tmp/autopkgtest-lxc.8uv_qvhr/downtmp/build.nbg/src/doc/examples/example-startrace.xml
 Copyright (c) 2011-2019 Frédéric Vincent, Thibaut Paumard,
 Odele Straub and Frédéric Lamy.
 GYOTO is distributed under the terms of the GPL v. 3 license.
 We request that use of Gyoto in scientific publications be  properly
 acknowledged. Please cite:
  GYOTO: a new general relativistic ray-tracing code,
  F. H. Vincent, T. Paumard, E. Gourgoulhon & G. Perrin 2011,
  Classical and Quantum Gravity 28, 225011 (2011) [arXiv:1109.4769]

j = 1/32
--
Child job 2 terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--
--
mpirun.openmpi noticed that process rank 1 with PID 0 on node
ci-262-d8ad913b exited on signal 9 (Killed).
--


