Bug#1004107: meson: flaky autopkgtest on armhf: dictionary changed size during iteration -> timeout

2022-05-06 Thread Jussi Pakkanen
On Thu, 5 May 2022 at 22:39, Paul Gevers wrote:

> It just occurred to me that it may be useful to try and reduce the
> number of concurrently running tests to something you would expect on a
> more normal computer (under conditions where the framework is better
> tested). Our armel host has 160 cores; similarly, our amd64 ci-worker13
> host has 56.

No harm in trying I guess:

https://github.com/mesonbuild/meson/pull/10358
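
For illustration, a minimal sketch of what capping the worker count could look like. This is not taken from the pull request above: the helper name pick_worker_count and the default cap of 64 are assumptions chosen only for the example.

```python
import os


def pick_worker_count(cap=64, cpu_count=None):
    """Return a worker count that never exceeds `cap`.

    Hypothetical helper: `cap=64` is an arbitrary example value, and
    `cpu_count` can be passed explicitly to simulate other hosts.
    """
    detected = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    # Clamp to at least one worker and at most `cap` workers.
    return max(1, min(detected, cap))


if __name__ == "__main__":
    # Simulate the hosts mentioned in this thread (160 and 56 cores).
    for cores in (160, 56):
        print(cores, "cores ->", pick_worker_count(cpu_count=cores), "workers")
```

With a cap like this, the 160-core host would run no more workers than a mid-sized machine, while smaller hosts keep using all their cores.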



Bug#1004107: meson: flaky autopkgtest on armhf: dictionary changed size during iteration -> timeout

2022-05-05 Thread Paul Gevers

Hi Jussi,

On 21-01-2022 19:17, Paul Gevers wrote:

> Running tests with 160 workers


It just occurred to me that it may be useful to try and reduce the
number of concurrently running tests to something you would expect on a
more normal computer (under conditions where the framework is better
tested). Our armel host has 160 cores; similarly, our amd64 ci-worker13
host has 56.


Paul

https://sources.debian.org/src/meson/0.62.1-1/run_project_tests.py/#L1542

https://sources.debian.org/src/meson/0.62.1-1/run_project_tests.py/#L1552




Bug#1004107: meson: flaky autopkgtest on armhf: dictionary changed size during iteration -> timeout

2022-01-21 Thread Paul Gevers

Hi Jussi,

On 21-01-2022 00:05, Jussi Pakkanen wrote:
> On Thu, 20 Jan 2022 at 23:33, Paul Gevers wrote:
>
> > I looked at the results of the autopkgtest of your package on armhf
> > because it was showing up as a regression for the upload of
> > python-defaults and setuptools. I noticed that the test regularly fails;
> > what's worse, it also seems to hang, as the test is killed because it
> > hits an autopkgtest timeout.
>
> If we look at the backtrace:
>
> > Running tests with 160 workers
> > Exception in thread Thread-1:
> > Traceback (most recent call last):
> >   File "/usr/lib/python3.9/threading.py", line 973, in _bootstrap_inner
> >     self.run()
> >   File "/usr/lib/python3.9/concurrent/futures/process.py", line 317, in run
> >     result_item, is_broken, cause = self.wait_result_broken_or_wakeup()
> >   File "/usr/lib/python3.9/concurrent/futures/process.py", line 376, in wait_result_broken_or_wakeup
> >     worker_sentinels = [p.sentinel for p in self.processes.values()]
> >   File "/usr/lib/python3.9/concurrent/futures/process.py", line 376, in <listcomp>
> >     worker_sentinels = [p.sentinel for p in self.processes.values()]
> > RuntimeError: dictionary changed size during iteration
>
> This is all Python's internal code. Further, Meson does not do any
> fancy threading stuff itself; it uses Python's thread and process
> executors to queue up a bunch of work and then wait for it to be done.
> According to Python's documentation you don't need to do any locking
> or similar to submit new work, you can call the submit method
> directly. All of this would seem to indicate that the issue might lie
> somewhere in Python's multithreading code. At the very least I have no
> idea how I should go about debugging that issue.


Reading the log, it's not clear to me if that backtrace is even related
to the hang. In my (non-Python-related) experience, when something
goes wrong in a parallel process, you can get logs from parallel pieces
that fail without that really causing problems for the actual processing.


Of course, it's still true that it's difficult to troubleshoot.
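
To make the point about parallel failures concrete, here is a small sketch (an illustration only, not Meson's or the executor's actual code) showing that an uncaught exception in a background thread produces a traceback while the main thread carries on. Whether the hang in this bug is related would depend on whether the main thread was waiting for work that the dead thread was supposed to deliver, which this sketch does not settle. The names record_thread_failure, grow_while_iterating and run_demo are hypothetical.

```python
import threading

captured = []


def record_thread_failure(args):
    # `args` is a threading.ExceptHookArgs; record instead of printing.
    captured.append((args.thread.name, args.exc_type.__name__, str(args.exc_value)))


def grow_while_iterating():
    # Adding a key while iterating makes the next iteration step raise
    # RuntimeError, the same error type as in the log.
    table = {0: "a"}
    for key in table:
        table[key + 1] = "b"


def run_demo():
    previous_hook = threading.excepthook
    threading.excepthook = record_thread_failure
    try:
        worker = threading.Thread(target=grow_while_iterating, name="worker")
        worker.start()
        worker.join()  # returns normally even though the thread died
    finally:
        threading.excepthook = previous_hook
    return "main thread still running"


if __name__ == "__main__":
    print(run_demo())
    print(captured)
```

The default hook would print "Exception in thread ..." followed by the traceback, much like the log; the custom hook here only makes the failure easy to inspect.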

Paul




Bug#1004107: meson: flaky autopkgtest on armhf: dictionary changed size during iteration -> timeout

2022-01-20 Thread Jussi Pakkanen
On Thu, 20 Jan 2022 at 23:33, Paul Gevers wrote:

> I looked at the results of the autopkgtest of your package on armhf
> because it was showing up as a regression for the upload of
> python-defaults and setuptools. I noticed that the test regularly fails;
> what's worse, it also seems to hang, as the test is killed because it
> hits an autopkgtest timeout.

If we look at the backtrace:

> Running tests with 160 workers
> Exception in thread Thread-1:
> Traceback (most recent call last):
>   File "/usr/lib/python3.9/threading.py", line 973, in _bootstrap_inner
>     self.run()
>   File "/usr/lib/python3.9/concurrent/futures/process.py", line 317, in run
>     result_item, is_broken, cause = self.wait_result_broken_or_wakeup()
>   File "/usr/lib/python3.9/concurrent/futures/process.py", line 376, in wait_result_broken_or_wakeup
>     worker_sentinels = [p.sentinel for p in self.processes.values()]
>   File "/usr/lib/python3.9/concurrent/futures/process.py", line 376, in <listcomp>
>     worker_sentinels = [p.sentinel for p in self.processes.values()]
> RuntimeError: dictionary changed size during iteration

This is all Python's internal code. Further, Meson does not do any
fancy threading stuff itself; it uses Python's thread and process
executors to queue up a bunch of work and then wait for it to be done.
According to Python's documentation you don't need to do any locking
or similar to submit new work, you can call the submit method
directly. All of this would seem to indicate that the issue might lie
somewhere in Python's multithreading code. At the very least I have no
idea how I should go about debugging that issue.
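
As a rough sketch of the usage pattern described above (queue work with submit, then wait for the results), assuming nothing about Meson's real test runner: run_one_test and run_all are hypothetical names, and the pass/fail rule is a placeholder. The sketch defaults to ThreadPoolExecutor so it runs anywhere; concurrent.futures.ProcessPoolExecutor can be passed as executor_cls to match the process-based setup seen in the traceback.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_one_test(test_id):
    # Hypothetical stand-in for running one project test.
    # Placeholder rule: every fifth test "fails".
    return test_id, test_id % 5 != 0


def run_all(test_ids, max_workers, executor_cls=ThreadPoolExecutor):
    results = {}
    with executor_cls(max_workers=max_workers) as pool:
        # submit() is called directly, with no explicit locking.
        futures = [pool.submit(run_one_test, test_id) for test_id in test_ids]
        # Collect results as the workers finish them.
        for future in as_completed(futures):
            test_id, passed = future.result()
            results[test_id] = passed
    return results


if __name__ == "__main__":
    outcome = run_all(range(10), max_workers=4)
    print(sorted(test_id for test_id, ok in outcome.items() if not ok))
```

Nothing in this pattern touches the executor's internal bookkeeping, which is consistent with the observation that the traceback lies entirely inside Python's own modules.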



Bug#1004107: meson: flaky autopkgtest on armhf: dictionary changed size during iteration -> timeout

2022-01-20 Thread Paul Gevers

Source: meson
Version: 0.56.2-1
Severity: serious
X-Debbugs-CC: debian...@lists.debian.org
User: debian...@lists.debian.org
Usertags: flaky timeout

Dear maintainer(s),

I looked at the results of the autopkgtest of your package on armhf
because it was showing up as a regression for the upload of
python-defaults and setuptools. I noticed that the test regularly fails;
what's worse, it also seems to hang, as the test is killed because it
hits an autopkgtest timeout.


Because the unstable-to-testing migration software now blocks on
regressions in testing, flaky tests, i.e. tests that flip between
passing and failing without changes to the list of installed packages,
are causing people unrelated to your package to spend time on these
tests. In this case, Release Team members had to investigate if curl was 
OK to go into the next Stable point release.


Don't hesitate to reach out if you need help and some more information
from our infrastructure. Please note that the host we run our armhf
tests on is very powerful. It has 160 cores and 255 GB RAM. This is
sometimes the root cause of tests that fail. It seems that before we
switched to this host, the test was more reliable.


Paul

https://ci.debian.net/packages/m/meson/testing/amd64/

E.g. 
https://ci.debian.net/data/autopkgtest/testing/armhf/m/meson/18519155/log.gz


Ran 462 tests in 569.524s

OK (skipped=66)
Meson build system 0.61.0 Unit Tests
pytest-xdist not found, using unittest instead
Total time: 569.540 seconds
Meson build system 0.61.0 Project Tests
Using python 3.9.9 (main, Jan 12 2022, 16:10:51)

host machine compilers

c       : [gcc]     cc (gcc 11.2.0 "cc (Debian 11.2.0-13) 11.2.0")
cpp     : [gcc]     c++ (gcc 11.2.0 "c++ (Debian 11.2.0-13) 11.2.0")
cs      : [mono]    mcs (mono 6.8.0.105)
cuda    : [not found]
cython  : [not found]
d       : [llvm]    ldc2 (llvm 1.28.0 "LDC - the LLVM D compiler (1.28.0):")
fortran : [gcc]     gfortran (gcc 11.2.0 "GNU Fortran (Debian 11.2.0-13) 11.2.0")
java    : [unknown] javac (unknown 11.0.13)
objc    : [gcc]     cc (gcc 11.2.0)
objcpp  : [gcc]     c++ (gcc 11.2.0)
rust    : [rustc]   rustc -C linker=cc (rustc 1.56.0)
swift   : [not found]
vala    : [valac]   valac (valac 0.54.6)

tools

ninja  : /usr/bin/ninja (1.10.1)
cmake  : /usr/bin/cmake (3.22.1)
hotdoc : not found

Checking that configuring works...
Checking that introspect works...
Checking that building works...
Checking that testing works...
Checking that installing works...

Running tests with 160 workers
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.9/concurrent/futures/process.py", line 317, in run
    result_item, is_broken, cause = self.wait_result_broken_or_wakeup()
  File "/usr/lib/python3.9/concurrent/futures/process.py", line 376, in wait_result_broken_or_wakeup
    worker_sentinels = [p.sentinel for p in self.processes.values()]
  File "/usr/lib/python3.9/concurrent/futures/process.py", line 376, in <listcomp>
    worker_sentinels = [p.sentinel for p in self.processes.values()]
RuntimeError: dictionary changed size during iteration

Running cmake tests.

autopkgtest [16:04:29]: ERROR: timed out on command "su -s /bin/bash 
debci -c set -e; export USER=`id -nu`; . /etc/profile >/dev/null 2>&1 || 
true;  . ~/.profile >/dev/null 2>&1 || true; 
buildtree="/tmp/autopkgtest-lxc.f3gr65px/downtmp/build.IRe/src"; mkdir 
-p -m 1777 -- 
"/tmp/autopkgtest-lxc.f3gr65px/downtmp/exhaustive-artifacts"; export 
AUTOPKGTEST_ARTIFACTS="/tmp/autopkgtest-lxc.f3gr65px/downtmp/exhaustive-artifacts"; 
export ADT_ARTIFACTS="$AUTOPKGTEST_ARTIFACTS"; mkdir -p -m 755 
"/tmp/autopkgtest-lxc.f3gr65px/downtmp/autopkgtest_tmp"; export 
AUTOPKGTEST_TMP="/tmp/autopkgtest-lxc.f3gr65px/downtmp/autopkgtest_tmp"; 
export ADTTMP="$AUTOPKGTEST_TMP"; export DEBIAN_FRONTEND=noninteractive; 
export LANG=C.UTF-8; export DEB_BUILD_OPTIONS=parallel=160; unset 
LANGUAGE LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE   LC_MONETARY 
LC_MESSAGES LC_PAPER LC_NAME LC_ADDRESS   LC_TELEPHONE LC_MEASUREMENT 
LC_IDENTIFICATION LC_ALL;rm -f /tmp/autopkgtest_script_pid; set -C; echo 
$$ > /tmp/autopkgtest_script_pid; set +C; trap "rm -f 
/tmp/autopkgtest_script_pid" EXIT INT QUIT PIPE; cd "$buildtree"; chmod 
+x 
/tmp/autopkgtest-lxc.f3gr65px/downtmp/build.IRe/src/debian/tests/exhaustive; 
touch /tmp/autopkgtest-lxc.f3gr65px/downtmp/exhaustive-stdout 
/tmp/autopkgtest-lxc.f3gr65px/downtmp/exhaustive-stderr; 
/tmp/autopkgtest-lxc.f3gr65px/downtmp/build.IRe/src/debian/tests/exhaustive 
2> >(tee -a /tmp/autopkgtest-lxc.f3gr65px/downtmp/exhaustive-stderr >&2) 
> >(tee -a /tmp/autopkgtest-lxc.f3gr65px/downtmp/exhaustive-stdout);" 
(kind: test)

autopkgtest [16:04:30]: test exhaustive: ---]

