Thanks for the further investigation, Christian.

So, it doesn't seem to me that /bin/systemd-tty-ask-password-agent is
the culprit here.  Actually, if you look at when it is invoked, you will
notice that it is only executed when the systemctl command is issued
from the tty, which is not our case here: the command that is hanging
("systemctl reload squid") is being invoked indirectly due to the start
of dnsmasq.service.

When we issue a "systemctl start dnsmasq", we can see
/bin/systemd-tty-ask-password-agent in the process tree, but not as a
child of the "systemctl reload squid":

root        9164  0.0  0.0  26164  1040 pts/0    S+   22:23   0:00  |   \_ systemctl start dnsmasq.service
root        9165  0.0  0.0  12512  2084 pts/0    S+   22:23   0:00  |       \_ /bin/bash /bin/systemd-tty-ask-password-agent --watch

This is because we invoked "systemctl start dnsmasq" from the tty.  We
can easily confirm that /bin/systemd-tty-ask-password-agent is not to
blame by running "systemctl --no-ask-password stop dnsmasq" and then
"systemctl --no-ask-password start dnsmasq", and checking that the hang
still happens even though /bin/systemd-tty-ask-password-agent was not
invoked.
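
The check above can be scripted so that a hang shows up as a distinct
exit status instead of a stuck terminal.  This is only a minimal POSIX sh
sketch, assuming GNU coreutils "timeout" is available (it exits with
status 124 when it has to kill the command); the helper name "times_out"
and the 60-second limit are arbitrary choices for illustration.  Note
that killing the systemctl client does not cancel the job it queued, so
"systemctl list-jobs" will still show it afterwards.

```shell
#!/bin/sh
# times_out SECONDS COMMAND [ARGS...]
# Hypothetical helper: run COMMAND with a time limit and succeed (exit 0)
# only if the command had to be killed, i.e. it was still hanging.
times_out() {
    limit=$1   # plain global: POSIX sh has no "local"
    shift
    # GNU timeout exits with status 124 when the time limit is exceeded.
    timeout "$limit" "$@"
    [ $? -eq 124 ]
}

# Example (needs root and the affected units installed):
#   systemctl --no-ask-password stop dnsmasq
#   if times_out 60 systemctl --no-ask-password start dnsmasq; then
#       echo "start still hangs without the password agent"
#   fi
```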

Anyway, continuing the investigation here, this is the output of
"systemctl list-jobs":

$ systemctl list-jobs --all
 JOB UNIT              TYPE   STATE
2512 dnsmasq.service   start  running
2561 squid.service     reload waiting
2560 nss-lookup.target start  waiting

3 jobs listed.

Nothing really new here, except the fact that the squid reload happens
*because* of the nss-lookup.target start, and both jobs are blocked
waiting.  It's interesting to notice that squid's SysV init script
declares "Should-Start: $named", which translates to squid trying to
start nss-lookup.target itself.  I think this is a strong indicator that
we might be seeing a deadlock here.
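
When reproducing this on another machine, the pattern to look for is the
one in the listing above: a squid reload job and an nss-lookup.target
start job both "waiting" while the dnsmasq start is still "running".  A
small sh/awk sketch that prints only the waiting jobs; it assumes the
JOB/UNIT/TYPE/STATE column layout shown above (other systemd versions
may format list-jobs differently), and the helper name "waiting_jobs" is
hypothetical.

```shell
#!/bin/sh
# waiting_jobs: read "systemctl list-jobs" output on stdin and print
# "UNIT TYPE" for every job whose STATE column is "waiting".
# Header and footer lines are skipped: the header's first field is not a
# job number, and the "N jobs listed." footer has no "waiting" column.
waiting_jobs() {
    awk 'NF >= 4 && $1 ~ /^[0-9]+$/ && $4 == "waiting" { print $2, $3 }'
}

# Example on a live system:
#   systemctl list-jobs | waiting_jobs
```

Fed the listing above, it would print the squid.service reload and
nss-lookup.target start jobs, one per line.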

After a bit more investigation, I found
https://github.com/systemd/systemd/issues/10464, which led me to
https://github.com/systemd/systemd/pull/13860.  I tried backporting the
patch (which is very simple) to see whether it had any impact, but
unfortunately it didn't.

I then did a quick test: I hacked /usr/sbin/invoke-rc.d around line 570
and commented out the "if" surrounding
sctl_args="--job-mode=ignore-dependencies" (in other words, I made
systemctl always use this option), and, unsurprisingly, the bug went
away.  However, just like with the "--no-block" hack I mentioned in my
previous comment, I'm not sure this is a good solution for the problem.

As I'm running out of ideas here, I'd like to propose a possible fix for
the problem, based on what Martin Pitt wrote in one of the bug reports I
mentioned (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=777113).
The idea is to extend /etc/resolvconf/update-libc.d/squid so that it
checks whether systemd is in use and, if so, reloads squid through
systemctl with "--no-block", so that the hook does not wait for the
reload to complete.  Something like this:

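 # Take the systemd path only when systemd is PID 1 (see sd_booted(3)).
 if [ -d /run/systemd/system ]; then
   # Queue the reload and return immediately instead of waiting for it,
   # so the hook does not block while dnsmasq.service is still starting.
   systemctl --no-block reload squid || true
 else
   invoke-rc.d squid reload || true
 fi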

Based on local tests here, this works, and it also unblocks
nss-lookup.target so that it can finish, which means that, by the end of the
"systemctl start dnsmasq" process, we will have both successfully
reloaded squid *and* started nss-lookup.target (as well as started
dnsmasq.service, of course).

This is not the perfect solution, of course, but I feel like we're
wasting a lot of time on this old bug already, and this solution is not
entirely bad, IMHO.  We could in theory try to bisect systemd between
xenial and bionic and see if we could determine what change (or changes)
made this scenario work OK on the latter, but that assumes systemd is
indeed the cause (I think it is, but I'm not 100% sure yet).

Anyway, I'll wait for your answer in the morning.  We can discuss this
during standup too, if you'd like.

** Bug watch added: github.com/systemd/systemd/issues #10464
   https://github.com/systemd/systemd/issues/10464

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1761096

Title:
  dnsmasq starts with error on Ubuntu Xenial amd64 when squid installed

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/dnsmasq/+bug/1761096/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
