Bug#1059995: pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable

2024-02-25 Thread Paul Gevers

Hi,

On Fri, 26 Jan 2024 11:31:37 +0100 Chris Hofstaedtler wrote:
> Paul Gevers noted that src:pdns's autopkgtests fail every so often
> on a large amd64 debci worker and on s390x workers. Apparently a
> similar problem can be seen in src:pdns-recursor's debci runs.


The issue (or at least some issue) seems to be kernel related. Due to
issues with the backports kernel on arm64, we had to revert to the
bookworm kernel, and now pdns fails on arm64 too. On ppc64el and riscv64
the test has passed for the last two months; both run a newer kernel
(backports or even sid). However, s390x also runs a backports kernel and
the issue still exists there.


Paul
By the way, if you want to use "exit 77" when conditions are not met, 
you also need to set the skippable restriction on those tests, otherwise 
the exit code is used like any other.
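
A minimal sketch of the skip approach described above, assuming the test can inspect the unit's journal output. The helper name `ipc_namespace_failed` and the surrounding flow are hypothetical and not taken from the pdns packaging; the matched message is the one from the logs in this bug. The test's stanza in debian/tests/control would also need `Restrictions: skippable` for exit code 77 to count as a skip.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the journal text on stdin contains the
# systemd namespacing error reported in this bug.
ipc_namespace_failed() {
    grep -q 'Failed to set up IPC namespacing'
}

# Sketch of use inside a test script (left commented out so the helper can be
# sourced on its own):
#
#   if journalctl _SYSTEMD_UNIT=pdns.service -n 10 --no-pager | ipc_namespace_failed; then
#       echo "SKIP: systemd could not set up IPC namespacing on this worker"
#       exit 77
#   fi
```

Checking for the specific message, rather than skipping on any service start failure, keeps genuine pdns regressions visible as failures.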


OpenPGP_signature.asc
Description: OpenPGP digital signature


Bug#1059995: Re: Bug#1059995: pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable

2024-01-29 Thread Chris Hofstaedtler
* Paul Gevers  [240126 22:25]:
> Hi zeha,
> 
> On 26-01-2024 10:21, Chris Hofstaedtler wrote:
> > I see this "works", but now the tests fail after one try on the
> > problematic worker and then are never retried. Can this please be
> > fixed?
> 
> What do you have in mind? I think you need to wait until issue 166 [1] is
> fixed, which I guess isn't going to happen soon.

166 seems like an option, or auto-retry on a different worker, if
that's possible?

Chris



Bug#1059995: pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable

2024-01-26 Thread Paul Gevers

Hi zeha,

On 26-01-2024 10:21, Chris Hofstaedtler wrote:
> I see this "works", but now the tests fail after one try on the
> problematic worker and then are never retried. Can this please be
> fixed?


What do you have in mind? I think you need to wait until issue 166 [1] 
is fixed, which I guess isn't going to happen soon.


Paul

[1] https://salsa.debian.org/ci-team/debci/-/issues/166




Processed: Bug#1059995: Re: Bug#1059995: pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable

2024-01-26 Thread Debian Bug Tracking System
Processing commands for cont...@bugs.debian.org:

> clone 1059995 -1
Bug #1059995 {Done: Chris Hofstaedtler } [src:pdns] pdns: 
flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC 
namespacing: Resource temporarily unavailable
Bug 1059995 cloned as bug 1061554
> reopen -1
Bug #1061554 {Done: Chris Hofstaedtler } [src:pdns] pdns: 
flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC 
namespacing: Resource temporarily unavailable
'reopen' may be inappropriate when a bug has been closed with a version;
all fixed versions will be cleared, and you may need to re-add them.
Bug reopened
No longer marked as fixed in versions pdns/4.8.3-3.
> reassign -1 systemd
Bug #1061554 [src:pdns] pdns: flaky autopkgtest (host dependent): pdns.service: 
Failed to set up IPC namespacing: Resource temporarily unavailable
Bug reassigned from package 'src:pdns' to 'systemd'.
No longer marked as found in versions pdns/4.8.3-2.
Ignoring request to alter fixed versions of bug #1061554 to the same values 
previously set
> found -1 systemd/254.3-1
Bug #1061554 [systemd] pdns: flaky autopkgtest (host dependent): pdns.service: 
Failed to set up IPC namespacing: Resource temporarily unavailable
Marked as found in versions systemd/254.3-1.
> forwarded -1 https://github.com/systemd/systemd/issues/31037
Bug #1061554 [systemd] pdns: flaky autopkgtest (host dependent): pdns.service: 
Failed to set up IPC namespacing: Resource temporarily unavailable
Set Bug forwarded-to-address to 
'https://github.com/systemd/systemd/issues/31037'.
> thanks
Stopping processing here.

Please contact me if you need assistance.
-- 
1059995: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1059995
1061554: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1061554
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems



Bug#1059995: Re: Bug#1059995: pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable

2024-01-26 Thread Chris Hofstaedtler
clone 1059995 -1
reopen -1
reassign -1 systemd
found -1 systemd/254.3-1
forwarded -1 https://github.com/systemd/systemd/issues/31037
thanks

Dear systemd Packagers,

Paul Gevers noted that src:pdns's autopkgtests fail every so often
on a large amd64 debci worker and on s390x workers. Apparently a
similar problem can be seen in src:pdns-recursor's debci runs.

As there is no pdns(-recursor) code running at this point, this
seems to be a problem somewhere in the space of systemd <> lxc <>
apparmor <> kernel.

I've opened a bug with systemd upstream, unfortunately with very
little info as I don't know how to provide additional info from
within a debci run. Help with providing additional info would be
very welcome.

Thanks,
Chris



Bug#1059995: Re: Bug#1059995: pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable

2024-01-26 Thread Chris Hofstaedtler
Hi Paul,

* Paul Gevers  [240104 18:14]:
> Can you figure out decent numbers for these? Below I printed the output of
> lsipc and AFAICT SHMMAX is already pretty big ;) (and the same on all our
> hosts, which is also true for MSGMAX).
> 
> On the other hand, $(ipcs -a) doesn't show anything on the host, not even if
> I let it run in a while-loop (1 second interval) while I schedule the test
> of pdns. So, could this be a bug in systemd (which you claim below should be
> handling this), or is this just not really supported in lxc and do you need
> a full VM? Because it works elsewhere, it feels more like a bug to me, and it
> would not be the first instance where code fails to properly handle 64 cores
> or 256GB of RAM.

Likely, but it is probably in systemd or in lxc or in apparmor or
elsewhere.

> > > > I wouldn't know what to do about this, it's not really under the
> > > > control of src:pdns.
> > > 
> > > Well, maybe check for it and fail gracefully?
> > 
> > But how? systemd sets up the IPC namespace.
> 
> exit with 77 when you detect problems and add the skippable restriction.

I see this "works", but now the tests fail after one try on the
problematic worker and then are never retried. Can this please be
fixed?

Thanks,
Chris



Bug#1059995: pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable

2024-01-21 Thread Chris Hofstaedtler
On Fri, Jan 12, 2024 at 08:02:53PM +0100, Paul Gevers wrote:
> Hi,
> 
> On 12-01-2024 12:36, Chris Hofstaedtler wrote:
> > can you confirm two additional things please:
> > 
> > 1) this happens only on the large host?
> 
> https://ci.debian.net/packages/p/pdns/testing/s390x/41650331/
> 
> Seems it happens on our s390x host too (which has 10 debci workers running
> in parallel).
> 
> > 2) this does not or does happen with other packages also requesting
> > the same settings from systemd, e.g. dnsdist or pdns-recursor?
> 
> https://ci.debian.net/packages/d/dnsdist/ -> Page not found.
> 
> pdns-recursor seems to be flaky as well on amd64 and all passing tests were
> on one of the smaller hosts. pdns-recursor passes on s390x though.

For now I've added the exit 77 hack in the pdns tests, but this is
quite unsatisfying.

I've opened an issue with systemd upstream, maybe someone there has
any insight: https://github.com/systemd/systemd/issues/31037

Chris



Bug#1059995: pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable

2024-01-12 Thread Paul Gevers

Hi,

On 12-01-2024 12:36, Chris Hofstaedtler wrote:
> can you confirm two additional things please:
> 
> 1) this happens only on the large host?

https://ci.debian.net/packages/p/pdns/testing/s390x/41650331/

Seems it happens on our s390x host too (which has 10 debci workers
running in parallel).

> 2) this does not or does happen with other packages also requesting
> the same settings from systemd, e.g. dnsdist or pdns-recursor?

https://ci.debian.net/packages/d/dnsdist/ -> Page not found.

pdns-recursor seems to be flaky as well on amd64 and all passing tests
were on one of the smaller hosts. pdns-recursor passes on s390x though.


Paul




Bug#1059995: pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable

2024-01-12 Thread Chris Hofstaedtler
Hi,

can you confirm two additional things please:

1) this happens only on the large host?

2) this does not or does happen with other packages also requesting
the same settings from systemd, e.g. dnsdist or pdns-recursor?

Chris



Bug#1059995: pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable

2024-01-04 Thread Paul Gevers

Hi,

On 04-01-2024 17:28, Chris Hofstaedtler wrote:
> On Thu, Jan 04, 2024 at 03:37:21PM +0100, Paul Gevers wrote:
> > Hi,
> > 
> > On 04-01-2024 15:08, Chris Hofstaedtler wrote:
> > > It would seem that the host runs out of IPC space?
> > 
> > What is IPC space?
> 
> https://manpages.debian.org/bookworm/manpages/sysvipc.7.en.html
> https://manpages.debian.org/bookworm/manpages/ipc_namespaces.7.en.html
> 
> > And when does a host run out of it? As I said, this is
> > one of our most powerful hosts, so I would expect it to run out of things
> > last.
> > 
> > > Does it run more tests in parallel than other workers, or so?
> > 
> > Yes, this host (like most of our hosts, but a bit more) runs multiple lxc
> > based debci workers.
> 
> My guess: the default limits are static, and if LXC doesn't do
> anything special, the limits are probably shared with the whole
> host.
> kernel.shmmax, kernel.msgmax are I think the limits (but I'm not
> entirely sure).

Can you figure out decent numbers for these? Below I printed the output
of lsipc and AFAICT SHMMAX is already pretty big ;) (and the same on all
our hosts, which is also true for MSGMAX).

On the other hand, $(ipcs -a) doesn't show anything on the host, not
even if I let it run in a while-loop (1 second interval) while I
schedule the test of pdns. So, could this be a bug in systemd (which you
claim below should be handling this), or is this just not really
supported in lxc and do you need a full VM? Because it works elsewhere,
it feels more like a bug to me, and it would not be the first instance
where code fails to properly handle 64 cores or 256GB of RAM.
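
The polling described above could look roughly like the following sketch. The helper name `poll_command` and its arguments are hypothetical; only `ipcs -a` and the 1 second interval come from the message. Per ipc_namespaces(7), System V IPC objects are visible only within their own IPC namespace, so a loop run on the host may not show objects created inside a container that has its own namespace.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly, printing a timestamp before
# each run. $1 = number of iterations, $2 = sleep interval in seconds,
# remaining arguments = the command to run.
poll_command() {
    count=$1
    interval=$2
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        date '+%H:%M:%S'
        "$@"
        i=$((i + 1))
        # Do not sleep after the final iteration.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example (commented out): poll ipcs once per second for a minute.
#   poll_command 60 1 ipcs -a
```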



> > > I wouldn't know what to do about this, it's not really under the
> > > control of src:pdns.
> > 
> > Well, maybe check for it and fail gracefully?
> 
> But how? systemd sets up the IPC namespace.

exit with 77 when you detect problems and add the skippable restriction.

> > Or, since a couple of days, if
> > qemu VMs don't run out of IPC space, we could run them in qemu always.
> 
> I imagine a fully separated VM would not run out of IPC space,
> indeed.


I just ran the test in qemu on ci-worker13 and it PASSed.

Paul

root@ci-worker13:~# lsipc
RESOURCE DESCRIPTION                                              LIMIT USED  USE%
MSGMNI   Number of message queues                                 32000    0 0.00%
MSGMAX   Max size of message (bytes)                                 8K    -     -
MSGMNB   Default max size of queue (bytes)                          16K    -     -
SHMMNI   Shared memory segments                                    4096    0 0.00%
SHMALL   Shared memory pages                       18446744073692774399    0 0.00%
SHMMAX   Max size of shared memory segment (bytes)                  16E    -     -
SHMMIN   Min size of shared memory segment (bytes)                   1B    -     -
SEMMNI   Number of semaphore identifiers                          32000    0 0.00%
SEMMNS   Total number of semaphores                              102400    0 0.00%
SEMMSL   Max semaphores per semaphore set.                        32000    -     -
SEMOPM   Max number of operations per semop(2)                      500    -     -
SEMVMX   Semaphore max value                                      32767    -     -




Bug#1059995: pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable

2024-01-04 Thread Chris Hofstaedtler
On Thu, Jan 04, 2024 at 03:37:21PM +0100, Paul Gevers wrote:
> Hi,
> 
> On 04-01-2024 15:08, Chris Hofstaedtler wrote:
> > It would seem that the host runs out of IPC space?
> 
> What is IPC space?

https://manpages.debian.org/bookworm/manpages/sysvipc.7.en.html
https://manpages.debian.org/bookworm/manpages/ipc_namespaces.7.en.html

> And when does a host run out of it? As I said, this is
> one of our most powerful hosts, so I would expect it to run out of things
> last.
> 
> > Does it run more tests in parallel than other workers, or so?
> 
> Yes, this host (like most of our hosts, but a bit more) runs multiple lxc
> based debci workers.

My guess: the default limits are static, and if LXC doesn't do
anything special, the limits are probably shared with the whole
host.
kernel.shmmax, kernel.msgmax are I think the limits (but I'm not
entirely sure).
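
To compare the limits guessed at here across hosts or containers, the values can be read from procfs. This is a minimal sketch: `print_ipc_limits` is a hypothetical helper, and its optional directory argument exists only so it can be pointed at a copy of the files. On Linux, the sysctls kernel.shmmax and kernel.msgmax are exposed as /proc/sys/kernel/shmmax and /proc/sys/kernel/msgmax.

```shell
#!/bin/sh
# Hypothetical helper: print selected SysV IPC limits as name=value lines.
# $1 is the directory holding the sysctl files (default: /proc/sys/kernel).
print_ipc_limits() {
    dir=${1:-/proc/sys/kernel}
    for name in shmmax shmall shmmni msgmax msgmnb msgmni; do
        # Skip entries that are missing or unreadable in this environment.
        if [ -r "$dir/$name" ]; then
            printf 'kernel.%s=%s\n' "$name" "$(cat "$dir/$name")"
        fi
    done
}
```

Running `print_ipc_limits` once on the host and once inside an lxc container would show whether both see the same values.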

> > I wouldn't know what to do about this, it's not really under the
> > control of src:pdns.
> 
> Well, maybe check for it and fail gracefully?

But how? systemd sets up the IPC namespace.

> Or, since a couple of days, if
> qemu VMs don't run out of IPC space, we could run them in qemu always.

I imagine a fully separated VM would not run out of IPC space,
indeed.

Chris



Bug#1059995: pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable

2024-01-04 Thread Paul Gevers

Hi,

On 04-01-2024 15:08, Chris Hofstaedtler wrote:
> It would seem that the host runs out of IPC space?

What is IPC space? And when does a host run out of it? As I said, this
is one of our most powerful hosts, so I would expect it to run out of
things last.

> Does it run more tests in parallel than other workers, or so?

Yes, this host (like most of our hosts, but a bit more) runs multiple lxc
based debci workers.

> I wouldn't know what to do about this, it's not really under the
> control of src:pdns.

Well, maybe check for it and fail gracefully? Or, since a couple of
days, if qemu VMs don't run out of IPC space, we could run them in qemu
always.


Paul




Bug#1059995: pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable

2024-01-04 Thread Chris Hofstaedtler
On Thu, Jan 04, 2024 at 02:42:59PM +0100, Paul Gevers wrote:
> 269s Dec 25 16:13:20 ci-359-77591125 (s_server)[3766]: pdns.service: Failed
> to set up IPC namespacing: Resource temporarily unavailable
> 269s Dec 25 16:13:20 ci-359-77591125 (s_server)[3766]: pdns.service: Failed
> at step NAMESPACE spawning /usr/sbin/pdns_server: Resource temporarily
> unavailable

It would seem that the host runs out of IPC space?
Does it run more tests in parallel than other workers, or so?

I wouldn't know what to do about this, it's not really under the
control of src:pdns.

Chris



Bug#1059995: pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable

2024-01-04 Thread Paul Gevers

Source: pdns
Version: 4.8.3-2
Severity: serious
User: debian...@lists.debian.org
Usertags: flaky

Dear maintainer(s),

I looked at the results of the autopkgtest of your package. I noticed
that it regularly fails. The failures seem related to the host that runs
the test. ci-worker13 is a beefy machine [1] and tests seem to fail
consistently there, while the other amd64 workers are much more moderate
[2] and tests pass there.


Because the unstable-to-testing migration software now blocks on
regressions in testing, flaky tests, i.e. tests that flip between
passing and failing without changes to the list of installed packages,
are causing people unrelated to your package to spend time on these
tests.

Don't hesitate to reach out if you need help and some more information
from our infrastructure.

Paul

[1] https://metal.equinix.com/product/servers/m3-large/
[2] https://aws.amazon.com/ec2/instance-types/m5/

https://ci.debian.net/packages/p/pdns/testing/amd64/

https://ci.debian.net/data/autopkgtest/testing/amd64/p/pdns/41325109/log.gz

268s + service pdns restart
269s Job for pdns.service failed because the control process exited with error code.
269s See "systemctl status pdns.service" and "journalctl -xeu pdns.service" for details.

269s + journalctl _SYSTEMD_UNIT=pdns.service -n 10 --no-pager
269s Dec 25 16:13:20 ci-359-77591125 (s_server)[3766]: pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
269s Dec 25 16:13:20 ci-359-77591125 (s_server)[3766]: pdns.service: Failed at step NAMESPACE spawning /usr/sbin/pdns_server: Resource temporarily unavailable
269s Dec 25 16:13:21 ci-359-77591125 (s_server)[3852]: pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
269s Dec 25 16:13:21 ci-359-77591125 (s_server)[3852]: pdns.service: Failed at step NAMESPACE spawning /usr/sbin/pdns_server: Resource temporarily unavailable
269s Dec 25 16:13:23 ci-359-77591125 (s_server)[3876]: pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
269s Dec 25 16:13:23 ci-359-77591125 (s_server)[3876]: pdns.service: Failed at step NAMESPACE spawning /usr/sbin/pdns_server: Resource temporarily unavailable
269s Dec 25 16:13:24 ci-359-77591125 (s_server)[3886]: pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
269s Dec 25 16:13:24 ci-359-77591125 (s_server)[3886]: pdns.service: Failed at step NAMESPACE spawning /usr/sbin/pdns_server: Resource temporarily unavailable
269s Dec 25 16:13:25 ci-359-77591125 (s_server)[3915]: pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
269s Dec 25 16:13:25 ci-359-77591125 (s_server)[3915]: pdns.service: Failed at step NAMESPACE spawning /usr/sbin/pdns_server: Resource temporarily unavailable

269s ++ mktemp
269s + TMPFILE=/tmp/tmp.jah1Y5TJIa
269s + trap cleanup EXIT
269s + tee /tmp/tmp.jah1Y5TJIa
269s + sdig 127.0.0.1 53 smoke.pgsql.example.org A
279s Fatal: Timeout waiting for data
279s + grep -c '127\.0\.0\.222' /tmp/tmp.jah1Y5TJIa
279s 0
279s + echo smoke.pgsql.example.org could not be resolved
279s smoke.pgsql.example.org could not be resolved
279s + exit 1
279s + cleanup

