Bug#994784: mpi4py breaks gyoto autopkgtest on i386: 1 process returned, a non-zero exit code
Control: found -1 gyoto/1.4.4-4
Control: found -1 gyoto/1.4.4-3
Control: notfound -1 gyoto/1.4.4-5

Hi Paul,

On 24/09/2021 at 21:42, Paul Gevers wrote:
> Is the workaround inside the binary, or only (needed) in the test suite?
> In other words, did openmpi *break* gyoto on i386 in some cases? If yes,
> Ideally openmpi is updated with a versioned Breaks on gyoto with the
> right unfixed package. The migration software then will schedule the set
> and the migration will happen if everything's fine.

The workaround is only in the test suite. There remains a bug, either within openmpi, or within gyoto but triggered by the new version of openmpi. Concerning gyoto, I would only rate it "normal" though, not "serious", if you can confirm that the workaround actually worked in the testing testbed. If this is the case, I would decrease the severity of the bug and keep it open.

It would be great if the openmpi maintainers could have a look, but I guess they will need me to provide a minimal example, which will not be easy to produce, unless they experience the same symptoms in other situations.

Regards, Thibaut.
Bug#994784: mpi4py breaks gyoto autopkgtest on i386: 1 process returned, a non-zero exit code
Control: reassign -1 src:openmpi,src:gyoto
Control: found -1 openmpi/4.1.1-3
Control: found -1 gyoto/1.4.4-4

Hi

On 24-09-2021 11:42, Thibaut Paumard wrote:
> Control: reassign -1 src:openmpi src:gyoto

This assigned the package to version src:gyoto ;)

> I think I've found a workaround and am getting closer to finding the
> cause. I've just uploaded a package (gyoto 1.4.4-5) with the workaround.
> If you can then check that the test passes fine, I guess we will just
> have to let this gyoto migrate together with openmpi.

Is the workaround inside the binary, or only (needed) in the test suite? In other words, did openmpi *break* gyoto on i386 in some cases? If yes, ideally openmpi is updated with a versioned Breaks on gyoto with the right unfixed package. The migration software then will schedule the set and the migration will happen if everything's fine.

> The code path is slightly different within gyoto between the two
> approaches so there could be a bug in gyoto, but it is puzzling that it
> only affect one specific input file, only on one architecture, and only
> with this new release of openmpi. And it still depends on the
> environment: I don't get the failure if I let autopkgtest run the test
> in my chroot, but I get it if I run the same commands manually in the
> same chroot.

Ouch.

Paul
Bug#994784: mpi4py breaks gyoto autopkgtest on i386: 1 process returned, a non-zero exit code
Control: reassign -1 src:openmpi src:gyoto

Hi Paul,

I think I've found a workaround and am getting closer to finding the cause. I've just uploaded a package (gyoto 1.4.4-5) with the workaround. If you can then check that the test passes fine, I guess we will just have to let this gyoto migrate together with openmpi.

Gyoto supports two models for running within MPI: one can either specify how many processes to run with the -np argument of mpirun, or set -np to 1 and let gyoto itself spawn more processes (singleton approach). The test used the singleton approach. If I now let mpirun itself spawn the n processes, the test no longer fails.

The code path is slightly different within gyoto between the two approaches, so there could be a bug in gyoto, but it is puzzling that it only affects one specific input file, only on one architecture, and only with this new release of openmpi. And it still depends on the environment: I don't get the failure if I let autopkgtest run the test in my chroot, but I get it if I run the same commands manually in the same chroot.

Best regards, Thibaut.
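
As an illustration of the two launch models described in this message, here is a minimal Python sketch that only builds the two mpirun command lines; it does not run MPI. The helper name `mpirun_command` and the program arguments are hypothetical. In the singleton model the program also needs its own option telling it how many workers to spawn; the thread does not name that option, so it is left to the caller.

```python
from typing import List, Sequence


def mpirun_command(program: Sequence[str], nprocs: int, singleton: bool) -> List[str]:
    """Build an mpirun argument vector for one of the two launch models.

    singleton=True : mpirun starts a single process (-np 1) and the
                     program is expected to spawn the other workers itself.
    singleton=False: mpirun starts all nprocs processes directly.
    """
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    np_value = 1 if singleton else nprocs
    return ["mpirun", "-np", str(np_value), *program]


# Hypothetical program invocation, for illustration only.
program = ["gyoto", "example-startrace.xml", "out.fits"]

print(mpirun_command(program, 4, singleton=True))   # model used by the failing test
print(mpirun_command(program, 4, singleton=False))  # model used by the workaround
```

The only difference visible to mpirun is the value passed to -np; everything else about the singleton model happens inside the program, which is why the two models exercise different code paths.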
Bug#994784: mpi4py breaks gyoto autopkgtest on i386: 1 process returned, a non-zero exit code
Thanks Paul.

I don't think mpi4py is involved, but openmpi (4.1.1-5) is (based on where in the test suite the bug happens, and on the fact that the failure also occurs when only openmpi is pinned to unstable).

The puzzling bit is that the tests pass cleanly in unstable; the failure only occurs with the unstable openmpi in testing (and only on i386, as far as I can tell).

Still investigating.

Regards, Thibaut.
Bug#994784: mpi4py breaks gyoto autopkgtest on i386: 1 process returned, a non-zero exit code
Hi Thibaut,

Thanks for investigating.

On 23-09-2021 13:44, Thibaut Paumard wrote:
> Note that this is apparently different from what the debci
> infrastructure does (apparently recompiling instead of taking the binaries).

We don't recompile unless "build-needed" is in the restrictions. We do run in lxc instead of chroot.

> I then ran the test from this chroot (as sbuild user):
> autopkgtest -B --test-name=gyoto-mpi -- null

You can see from the top of the log that we ran with:

--no-built-binaries '--setup-commands=echo '"'"'gyoto testing/i386'"'"' > /var/tmp/debci.pkg 2>&1 || true' '--setup-commands=echo '"'"'Acquire::Retries "10";'"'"' > /etc/apt/apt.conf.d/75retry 2>&1 || true' --user debci --apt-upgrade '--add-apt-source=deb http://incoming.debian.org/debian-buildd buildd-unstable main contrib non-free' --add-apt-release=unstable --pin-packages=unstable=src:mpi4py,src:openmpi --output-dir /tmp/tmp.irEl4X24YS/autopkgtest-incoming/testing/i386/g/gyoto/15316145 gyoto -- lxc --sudo --name ci-262-d8ad913b autopkgtest-testing-i386

> I note that the test suite can take a long time to run and that the
> final line in the failure log for this test reads:
>
> mpirun.openmpi noticed that process rank 1 with PID 0 on node
> ci-262-d8ad913b exited on signal 9 (Killed).
>
> This happens after this test has been running for 1h and 55min.
>
> Could it be that the process is killed because of a timeout in the test
> environment?

The autopkgtest timeout is at 2:47, so if anything this is a timeout inside the test.

> Anyway, there's not much more I can do, except skip this test.

Can we get more logging? I can run something in the testbed if it helps debugging the issue.

> I'm reassigning to openmpi because, on the debci infrastructure, the
> same failure occurs with openmpi/unstable also with mpi4py/testing.

Ack, I was informed out-of-band that the likely culprit was there.

Paul
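
The timing argument in this message can be checked with a little arithmetic. The following is a minimal sketch, assuming that "2:47" means 2 hours 47 minutes and that "1h and 55min" is the elapsed time when the process was killed; the variable and function names are illustrative only.

```python
from datetime import timedelta

# Figures quoted in the thread (interpreted as hours and minutes).
elapsed_at_kill = timedelta(hours=1, minutes=55)
autopkgtest_timeout = timedelta(hours=2, minutes=47)


def reached_harness_timeout(elapsed: timedelta, limit: timedelta) -> bool:
    """Return True only if the elapsed time had reached the harness limit."""
    return elapsed >= limit


# Time still available before the harness itself would have stepped in.
margin = autopkgtest_timeout - elapsed_at_kill

print(reached_harness_timeout(elapsed_at_kill, autopkgtest_timeout))
print(margin)
```

With these figures the kill happened well before the harness limit, which is consistent with the conclusion that any timeout involved would have to come from inside the test rather than from autopkgtest.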
Bug#994784: mpi4py breaks gyoto autopkgtest on i386: 1 process returned, a non-zero exit code
Control: reassign -1 src:openmpi
Control: found -1 openmpi/4.1.1-3
Control: tags -1 +unreproducible +help
Control: retitle -1 openmpi breaks gyoto autopkgtest on i386
Control: thanks

Hi,

I cannot reproduce this.

I have set up a testing-i386 chroot (on my amd64 laptop) and installed openmpi from unstable (I also tried with mpi4py from unstable on top), as well as their versioned dependencies not in testing:

libopenmpi-dev 4.1.1-5
libopenmpi3 4.1.1-5
openmpi-common 4.1.1-5
python3-mpi4py 3.0.3-10

I tried with gyoto from unstable (1.4.4-4) and from testing (1.4.4-3). Note that this is apparently different from what the debci infrastructure does (apparently recompiling instead of taking the binaries).

I then ran the test from this chroot (as sbuild user):

autopkgtest -B --test-name=gyoto-mpi -- null

The test PASSED each time.

I note that the test suite can take a long time to run and that the final line in the failure log for this test reads:

mpirun.openmpi noticed that process rank 1 with PID 0 on node ci-262-d8ad913b exited on signal 9 (Killed).

This happens after this test has been running for 1h and 55min.

Could it be that the process is killed because of a timeout in the test environment? On the one hand it would feel strange because, on the debci infrastructure, the error always happens on the same file (example-startrace), near the beginning of the process (row 1/32). On the other hand I know this file is one of the longest to process (still below a couple of minutes on my laptop).

Anyway, there's not much more I can do, except skip this test.

I'm reassigning to openmpi because, on the debci infrastructure, the same failure occurs with openmpi/unstable also with mpi4py/testing.

Advice welcome.

Regards, Thibaut.
Bug#994784: mpi4py breaks gyoto autopkgtest on i386: 1 process returned, a non-zero exit code
Source: mpi4py, gyoto
Control: found -1 mpi4py/3.0.3-10
Control: found -1 gyoto/1.4.4-3
Severity: serious
Tags: sid bookworm
X-Debbugs-CC: debian...@lists.debian.org
User: debian...@lists.debian.org
Usertags: breaks needs-update

Dear maintainer(s),

With a recent upload of mpi4py the autopkgtest of gyoto fails in testing when that autopkgtest is run with the binary packages of mpi4py from unstable. It passes when run with only packages from testing. In tabular form:

                       pass            fail
mpi4py                 from testing    3.0.3-10
gyoto                  from testing    1.4.4-3
versioned deps [0]     from testing    from unstable
all others             from testing    from testing

I copied some of the output at the bottom of this report.

Currently this regression is blocking the migration of mpi4py to testing [1]. Due to the nature of this issue, I filed this bug report against both packages. Can you please investigate the situation and reassign the bug to the right package?

More information about this bug and the reason for filing it can be found on
https://wiki.debian.org/ContinuousIntegration/RegressionEmailInformation

Paul

[0] You can see what packages were added from the second line of the log file quoted below. The migration software adds source package from unstable to the list if they are needed to install packages from mpi4py/3.0.3-10. I.e. due to versioned dependencies or breaks/conflicts.
[1] https://qa.debian.org/excuses.php?package=mpi4py

https://ci.debian.net/data/autopkgtest/testing/i386/g/gyoto/15316145/log.gz

Reading parameter file: /tmp/autopkgtest-lxc.8uv_qvhr/downtmp/build.nbg/src/doc/examples/example-startrace.xml
Copyright (c) 2011-2019 Frédéric Vincent, Thibaut Paumard, Odele Straub and Frédéric Lamy.
GYOTO is distributed under the terms of the GPL v. 3 license.
We request that use of Gyoto in scientific publications be properly acknowledged.
Please cite: GYOTO: a new general relativistic ray-tracing code, F. H. Vincent, T. Paumard, E. Gourgoulhon & G. Perrin 2011, Classical and Quantum Gravity 28, 225011 (2011) [arXiv:1109.4769]
j = 1/32
--
Child job 2 terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.
--
--
mpirun.openmpi noticed that process rank 1 with PID 0 on node ci-262-d8ad913b exited on signal 9 (Killed).
--