Bug#951722: autopkgtest suite flaky on arm64

2020-06-27 Thread Michael Biebl
On Sat, 7 Mar 2020 16:01:22 +0200 Mpampis Kostas wrote:
> Hello,
> 
> This doesn't seem to be arm64 related since the same occurs on ppc64el.
> 
> I've been reproducing this failure consistently by running the autopkgtest
> suite on a stressed host.
> I think that the failure appears under similar high-load circumstances on
> the debian ci host.
> 
> dovecot-lda communicates with dovecot through the socket at
> /var/run/dovecot/auth-userdb but on a stressed host
> it's possible for dovecot-lda to call connect() before listen() is called
> on this socket by dovecot.
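
For illustration, the window described above can be reproduced in isolation. This is a minimal sketch, assuming Linux AF_UNIX stream sockets; the helper names (make_addr, try_connect, demo_race) and the socket path are hypothetical and have nothing to do with dovecot's code. It binds a socket, attempts one connect() before listen() and one after. The thread does not say which error dovecot-lda actually sees; on Linux the early connect() is expected to fail with ECONNREFUSED.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill in a sockaddr_un for path; returns -1 if the path does not fit. */
static int make_addr(struct sockaddr_un *addr, const char *path)
{
    if (strlen(path) >= sizeof(addr->sun_path))
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    strcpy(addr->sun_path, path);
    return 0;
}

/* Try to connect to the stream socket at path.
   Returns 0 on success, otherwise the errno of the failing call. */
static int try_connect(const char *path)
{
    struct sockaddr_un addr;
    int fd, err = 0;

    if (make_addr(&addr, path) < 0)
        return ENAMETOOLONG;
    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        err = errno;
    close(fd);
    return err;
}

/* Bind a socket at path, then connect once before listen() and once
   after it. The two results are stored in *before and *after.
   Returns 0 if the server side could be set up, -1 otherwise. */
static int demo_race(const char *path, int *before, int *after)
{
    struct sockaddr_un addr;
    int srv;

    if (make_addr(&addr, path) < 0)
        return -1;
    unlink(path);
    srv = socket(AF_UNIX, SOCK_STREAM, 0);
    if (srv < 0)
        return -1;
    if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(srv);
        return -1;
    }
    /* The socket file exists now, but nobody is listening yet. */
    *before = try_connect(path);
    if (listen(srv, 8) < 0) {
        close(srv);
        unlink(path);
        return -1;
    }
    /* With a listening socket the connection is queued in the backlog. */
    *after = try_connect(path);
    close(srv);
    unlink(path);
    return 0;
}
```

Calling demo_race() with a scratch path such as "/tmp/race-demo.sock" and printing strerror() of the two results shows the difference between the two attempts.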

I forgot to say: Thanks a lot, Mpampis!
Your analysis helped a lot in finding the root cause of this issue.
It was immensely helpful.

Regards,
Michael





Bug#951722: autopkgtest suite flaky on arm64

2020-05-24 Thread Nis Martensen
control: tags -1 patch

Michael sent me his draft patch. Slightly modified version with added
commit message attached. Diff is against latest upstream git version.

Thanks Michael!
From 89399122692823bc215cf1097b05da4ee2201e0e Mon Sep 17 00:00:00 2001
From: Nis Martensen 
Date: Sun, 24 May 2020 22:05:42 +0200
Subject: [PATCH 1/2] systemd integration: notify service manager when ready

With Type=simple or Type=forking, systemd does not really know when the
service is ready to accept connections and might start depending
services too early. Use Type=notify to explicitly tell the service
manager when the service is ready.

For a real problem caused by assuming readiness too early, please see
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=951722

For the meaning of the service type and details of the readiness
protocol, see also the following links:
https://www.freedesktop.org/software/systemd/man/systemd.service.html#Type=
https://www.freedesktop.org/software/systemd/man/sd_notify.html

As discussed in the last link, more elaborate state notifications are
possible. This patch only implements the most basic part.

Original patch prepared by Michael Biebl, with slight modification.
---
 dovecot.service.in                       | 3 +--
 src/lib-master/master-service-settings.c | 2 +-
 src/master/main.c                        | 6 ++++++
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/dovecot.service.in b/dovecot.service.in
index 5c45f590b..a1df992c5 100644
--- a/dovecot.service.in
+++ b/dovecot.service.in
@@ -14,9 +14,8 @@ Documentation=http://wiki2.dovecot.org/
 After=local-fs.target network-online.target
 
 [Service]
-Type=simple
+Type=notify
 ExecStart=@sbindir@/dovecot -F
-PIDFile=@rundir@/master.pid
 ExecReload=@bindir@/doveadm reload
 ExecStop=@bindir@/doveadm stop
 PrivateTmp=true
diff --git a/src/lib-master/master-service-settings.c b/src/lib-master/master-service-settings.c
index 657ef66bc..c7b8b369c 100644
--- a/src/lib-master/master-service-settings.c
+++ b/src/lib-master/master-service-settings.c
@@ -62,7 +62,7 @@ static const struct setting_define master_service_setting_defines[] = {
 
 /* <settings checks> */
 #ifdef HAVE_SYSTEMD
-#  define ENV_SYSTEMD " LISTEN_PID LISTEN_FDS"
+#  define ENV_SYSTEMD " LISTEN_PID LISTEN_FDS NOTIFY_SOCKET"
 #else
 #  define ENV_SYSTEMD ""
 #endif
diff --git a/src/master/main.c b/src/master/main.c
index 6e0e68fe7..08bea05ed 100644
--- a/src/master/main.c
+++ b/src/master/main.c
@@ -26,6 +26,9 @@
 #include "service-process.h"
 #include "service-log.h"
 #include "dovecot-version.h"
+#ifdef HAVE_SYSTEMD
+#include "sd-daemon.h"
+#endif
 
 #include 
 #include 
@@ -544,6 +547,9 @@ static void main_init(const struct master_settings *set)
 	master_clients_init();
 
 	services_monitor_start(services);
+#ifdef HAVE_SYSTEMD
+	sd_notify(0, "READY=1");
+#endif
 	startup_finished = TRUE;
 }
 
-- 
2.20.1
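
For readers unfamiliar with the readiness protocol that the patch relies on: according to sd_notify(3), a notification is a datagram sent to the AF_UNIX socket named by $NOTIFY_SOCKET. The following is only a minimal sketch of that idea, not the libsystemd implementation and not part of the patch; the function name notify_state is hypothetical. It also shows why the variable has to survive dovecot's environment cleaning: without it there is nowhere to send the message.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Send a state string such as "READY=1" to the socket named by
   $NOTIFY_SOCKET.
   Returns 1 if the datagram was sent, 0 if NOTIFY_SOCKET is not set
   (for example when not started by a service manager), -1 on error. */
static int notify_state(const char *state)
{
    const char *path = getenv("NOTIFY_SOCKET");
    struct sockaddr_un addr;
    socklen_t addr_len;
    size_t path_len, state_len = strlen(state);
    ssize_t sent;
    int fd;

    if (path == NULL || path[0] == '\0')
        return 0;

    /* Only absolute paths and abstract sockets ('@' prefix) are handled. */
    path_len = strlen(path);
    if ((path[0] != '/' && path[0] != '@') ||
        path_len >= sizeof(addr.sun_path))
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    memcpy(addr.sun_path, path, path_len);
    addr_len = (socklen_t)(offsetof(struct sockaddr_un, sun_path) + path_len);
    if (path[0] == '@')
        addr.sun_path[0] = '\0';   /* abstract namespace: leading NUL byte */
    else
        addr_len += 1;             /* include the terminating NUL of the path */

    fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    sent = sendto(fd, state, state_len, 0,
                  (struct sockaddr *)&addr, addr_len);
    close(fd);
    return sent == (ssize_t)state_len ? 1 : -1;
}
```

In a daemon, a call like notify_state("READY=1") would go where the patch calls sd_notify(0, "READY=1"), i.e. once all listening sockets are set up.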



Bug#951722: autopkgtest suite flaky on arm64

2020-05-23 Thread Michael Biebl
On 23.05.20 at 11:14, Nis Martensen wrote:
> Thanks a lot Noah and Michael for working on this!
> 
> Michael Biebl wrote:
>> The patch to add sd_notify is rather trivial. The problem is that dovecot
>> unhelpfully clears the full environment. In src/master/main.c,
>> sd_notify() should be called around/after main_init().
>> Unfortunately, at this point master_service_env_clean() has been called,
>> clearing the process environment, including NOTIFY_SOCKET, which is
>> passed from systemd to dovecot and is needed to make sd_notify work.
>>
>> I haven't found a way to instruct dovecot not to clear the
>> NOTIFY_SOCKET env var.
> 
> I have no idea if this works, but did you try adding NOTIFY_SOCKET to
> line 65 of src/lib-master/master-service-settings.c?

This does the trick. Thanks, Nis.






Bug#951722: autopkgtest suite flaky on arm64

2020-05-23 Thread Nis Martensen
Thanks a lot Noah and Michael for working on this!

Michael Biebl wrote:
> The patch to add sd_notify is rather trivial. The problem is that dovecot
> unhelpfully clears the full environment. In src/master/main.c,
> sd_notify() should be called around/after main_init().
> Unfortunately, at this point master_service_env_clean() has been called,
> clearing the process environment, including NOTIFY_SOCKET, which is
> passed from systemd to dovecot and is needed to make sd_notify work.
> 
> I haven't found a way to instruct dovecot not to clear the
> NOTIFY_SOCKET env var.

I have no idea if this works, but did you try adding NOTIFY_SOCKET to
line 65 of src/lib-master/master-service-settings.c?
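
The idea behind this suggestion is an allow-list: the environment is cleared, except for variables whose names appear in a space-separated list. A minimal sketch of such a check follows. It is not dovecot's implementation; the function name env_is_preserved is hypothetical, and the list strings are merely modelled on the style of the ENV_SYSTEMD define in master-service-settings.c.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if name appears as a whole token in the space-separated
   list, false otherwise. Prefixes of a listed name do not match. */
static bool env_is_preserved(const char *list, const char *name)
{
    size_t name_len = strlen(name);
    const char *p = list;

    if (name_len == 0)
        return false;
    while (*p != '\0') {
        size_t tok_len;

        p += strspn(p, " ");        /* skip separators */
        tok_len = strcspn(p, " ");  /* length of the next token */
        if (tok_len == name_len && strncmp(p, name, name_len) == 0)
            return true;
        p += tok_len;
    }
    return false;
}
```

With the list " LISTEN_PID LISTEN_FDS" a lookup of "NOTIFY_SOCKET" fails, so the variable would be dropped; with " LISTEN_PID LISTEN_FDS NOTIFY_SOCKET" it succeeds and the variable would be kept for a later sd_notify() call.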



Bug#951722: autopkgtest suite flaky on arm64

2020-05-22 Thread Michael Biebl

I'm convinced that applying such hacks is bad practice and should be
avoided.
I also have to add that my motivation to look into this further has now
basically dropped to zero.






Bug#951722: autopkgtest suite flaky on arm64

2020-05-22 Thread Noah Meyerhans
On Fri, May 22, 2020 at 11:51:07PM +0200, Michael Biebl wrote:
> > I will upload a new upstream version to sid containing the workaround
> > for the test failures.  I will leave this bug open, but will reduce the
> > severity to 'normal'.  In a subsequent upload, I will apply a patch to
> > implement sd_notify and will resolve the bug.  Please feel free to send
> > a patch if you don't want to wait however long it'll take for me to get
> > around to putting one together.
> 
> Please don't apply this hack. If you don't want to fix this properly to
> get a (newer) version into testing, please just disable the test for the
> time being.

If we don't test it, it can't be broken, right?

> It's great that the autopkgtest suite uncovered a real issue.
> Let's not mutilate the test suite.

I think the test suite with the workaround in place has more value than
the suite with this test completely disabled.  If the service never
becomes available, the test with the workaround will still detect the
situation, which is exactly what it's there for.

noah



Bug#951722: autopkgtest suite flaky on arm64

2020-05-22 Thread Michael Biebl
On 22.05.20 at 07:29, Noah Meyerhans wrote:
> On Sun, May 10, 2020 at 11:06:26PM +0200, Michael Biebl wrote:
>>> +echo "Waiting for the service to be available"
>>> +c=0
>>> +while ! nc -z -U /var/run/dovecot/auth-userdb; do
>>> +    c=$(($c+1))
>>> +    sleep 2
>>> +    if [ $c -gt 30 ]; then
>>> +        echo "Timed out waiting for the service to be available" >&2
>>> +        exit 1
>>> +    fi
>>> +done
>>
>> Looping until the service is ready appears to be a workaround/hack at
>> best imho.
> 
> I agree, however...
> 
>> The dovecot service should only signal its readiness when the
>> communication sockets are ready to accept connections. I.e. this
>> autopkgtest appears to point at a real issue that should be fixed properly.
> 
> I do not believe that this is an RC issue.  In order to address the
> stale upstream version and pending security updates in sid, and allow
> the package to again enter bullseye, I propose the following:

That's a policy determined by the release manager/maintainers of debci.
They recommended that I file such issues with RC severity.
If you don't agree with that policy, you should probably contact them
directly.

> I will upload a new upstream version to sid containing the workaround
> for the test failures.  I will leave this bug open, but will reduce the
> severity to 'normal'.  In a subsequent upload, I will apply a patch to
> implement sd_notify and will resolve the bug.  Please feel free to send
> a patch if you don't want to wait however long it'll take for me to get
> around to putting one together.

Please don't apply this hack. If you don't want to fix this properly to
get a (newer) version into testing, please just disable the test for the
time being.
It's great that the autopkgtest suite uncovered a real issue.
Let's not mutilate the test suite.


> Dovecot has been essentially unmaintained in Debian since August 2019,
> and there's quite a backlog of work to do.  I'm going to work on getting
> it back into shape, but it will be a little while before it's where it
> should be.  It won't happen all at once.
> 
> noah
> 

The patch to add sd_notify is rather trivial. The problem is that dovecot
unhelpfully clears the full environment. In src/master/main.c,
sd_notify() should be called around/after main_init().
Unfortunately, at this point master_service_env_clean() has been called,
clearing the process environment, including NOTIFY_SOCKET, which is
passed from systemd to dovecot and is needed to make sd_notify work.

I haven't found a way to instruct dovecot not to clear the
NOTIFY_SOCKET env var.

Regards,
Michael






Bug#951722: autopkgtest suite flaky on arm64

2020-05-21 Thread Noah Meyerhans
On Sun, May 10, 2020 at 11:06:26PM +0200, Michael Biebl wrote:
> > +echo "Waiting for the service to be available"
> > +c=0
> > +while ! nc -z -U /var/run/dovecot/auth-userdb; do
> > +    c=$(($c+1))
> > +    sleep 2
> > +    if [ $c -gt 30 ]; then
> > +        echo "Timed out waiting for the service to be available" >&2
> > +        exit 1
> > +    fi
> > +done
> 
> Looping until the service is ready appears to be a workaround/hack at
> best imho.

I agree, however...

> The dovecot service should only signal its readiness when the
> communication sockets are ready to accept connections. I.e. this
> autopkgtest appears to point at a real issue that should be fixed properly.

I do not believe that this is an RC issue.  In order to address the
stale upstream version and pending security updates in sid, and allow
the package to again enter bullseye, I propose the following:

I will upload a new upstream version to sid containing the workaround
for the test failures.  I will leave this bug open, but will reduce the
severity to 'normal'.  In a subsequent upload, I will apply a patch to
implement sd_notify and will resolve the bug.  Please feel free to send
a patch if you don't want to wait however long it'll take for me to get
around to putting one together.

Dovecot has been essentially unmaintained in Debian since August 2019,
and there's quite a backlog of work to do.  I'm going to work on getting
it back into shape, but it will be a little while before it's where it
should be.  It won't happen all at once.

noah



Bug#951722: autopkgtest suite flaky on arm64

2020-05-10 Thread Michael Biebl
On 10.05.20 at 23:06, Michael Biebl wrote:
> On Sat, 7 Mar 2020 16:01:22 +0200 Mpampis Kostas wrote:
>> diff --git a/debian/tests/control b/debian/tests/control
>> index 7abd238c3..5bf1dc94b 100644
>> --- a/debian/tests/control
>> +++ b/debian/tests/control
>> @@ -6,5 +6,5 @@ Tests: systemd
>>  Depends: dovecot-core, systemd-sysv
>>  
>>  Test-Command: run-parts --report --exit-on-error debian/tests/usage
>> -Depends: dovecot-imapd, dovecot-pop3d, python3
>> +Depends: dovecot-imapd, dovecot-pop3d, python3, netcat-openbsd
>>  Restrictions: needs-root, breaks-testbed, allow-stderr
>> diff --git a/debian/tests/usage/00_setup b/debian/tests/usage/00_setup
>> index 2eeeb2f73..e90ca7e92 100755
>> --- a/debian/tests/usage/00_setup
>> +++ b/debian/tests/usage/00_setup
>> @@ -29,6 +29,17 @@ chown nobody:nogroup /srv/dovecot-dep8
>>  echo "Restarting the service"
>>  systemctl restart dovecot
>>  
>> +echo "Waiting for the service to be available"
>> +c=0
>> +while ! nc -z -U /var/run/dovecot/auth-userdb; do
>> +    c=$(($c+1))
>> +    sleep 2
>> +    if [ $c -gt 30 ]; then
>> +        echo "Timed out waiting for the service to be available" >&2
>> +        exit 1
>> +    fi
>> +done
> 
> Looping until the service is ready appears to be a workaround/hack at
> best imho.
> 
> The dovecot service should only signal its readiness when the
> communication sockets are ready to accept connections. I.e. this
> autopkgtest appears to point at a real issue that should be fixed properly.

Quickly glancing at dovecot.service, I see:

cat ./dovecot.service.in
...
Type=simple

This is problematic. Type=simple means that the service is considered ready
as soon as the process has been forked off. In the case of dovecot, this
does not appear to be the correct choice, as the service is marked ready
before it has had a chance to set up its communication channels.

See also https://www.lucas-nussbaum.net/blog/?p=877

My recommendation would be that dovecot implement the systemd
readiness protocol via sd_notify:
https://www.freedesktop.org/software/systemd/man/sd_notify.html


If there are questions, please don't hesitate to ask.

Michael





Bug#951722: autopkgtest suite flaky on arm64

2020-05-10 Thread Michael Biebl
On Sat, 7 Mar 2020 16:01:22 +0200 Mpampis Kostas wrote:
> diff --git a/debian/tests/control b/debian/tests/control
> index 7abd238c3..5bf1dc94b 100644
> --- a/debian/tests/control
> +++ b/debian/tests/control
> @@ -6,5 +6,5 @@ Tests: systemd
>  Depends: dovecot-core, systemd-sysv
>  
>  Test-Command: run-parts --report --exit-on-error debian/tests/usage
> -Depends: dovecot-imapd, dovecot-pop3d, python3
> +Depends: dovecot-imapd, dovecot-pop3d, python3, netcat-openbsd
>  Restrictions: needs-root, breaks-testbed, allow-stderr
> diff --git a/debian/tests/usage/00_setup b/debian/tests/usage/00_setup
> index 2eeeb2f73..e90ca7e92 100755
> --- a/debian/tests/usage/00_setup
> +++ b/debian/tests/usage/00_setup
> @@ -29,6 +29,17 @@ chown nobody:nogroup /srv/dovecot-dep8
>  echo "Restarting the service"
>  systemctl restart dovecot
>  
> +echo "Waiting for the service to be available"
> +c=0
> +while ! nc -z -U /var/run/dovecot/auth-userdb; do
> +    c=$(($c+1))
> +    sleep 2
> +    if [ $c -gt 30 ]; then
> +        echo "Timed out waiting for the service to be available" >&2
> +        exit 1
> +    fi
> +done

Looping until the service is ready appears to be a workaround/hack at
best imho.

The dovecot service should only signal its readiness when the
communication sockets are ready to accept connections. I.e. this
autopkgtest appears to point at a real issue that should be fixed properly.

Regards,
Michael





Bug#951722: autopkgtest suite flaky on arm64

2020-02-20 Thread Michael Biebl
Source: dovecot
Version: 1:2.3.7.2-1
Severity: serious
User: debian...@lists.debian.org
Usertags: flaky

Hi,

the autopkgtest suite appears to be flaky on arm64. Sometimes it
succeeds, sometimes command1 fails [1].

This is problematic, as this (randomly) blocks other packages from
entering testing; a recent example is systemd, which is currently blocked
because of

  autopkgtest for dovecot/1:2.3.7.2-1: amd64: Pass, arm64: Regression

I asked on #debci, and they recommended filing this as an RC bug.
If you think this is an issue of the debci arm64 infrastructure, please
get in touch with debian...@lists.debian.org.

Regards,
Michael

[1] https://ci.debian.net/data/autopkgtest/unstable/arm64/d/dovecot/4294703/log.gz

-- System Information:
Debian Release: bullseye/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 5.4.0-4-amd64 (SMP w/4 CPU cores)
Kernel taint flags: TAINT_WARN
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8), 
LANGUAGE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled