As already reported dnsmasq hangs on the hooks:
             ├─2089 /bin/sh /etc/init.d/dnsmasq systemd-start-resolvconf
             ├─2097 run-parts --arg=-a --arg=lo.dnsmasq /etc/resolvconf/update.d
             ├─2116 run-parts /etc/resolvconf/update-libc.d
             ├─2129 /bin/sh /etc/resolvconf/update-libc.d/squid
             ├─2130 /bin/sh /usr/sbin/invoke-rc.d squid reload
             └─2148 systemctl reload squid.service

The hierarchy of these is:
run-parts /etc/resolvconf/update-libc.d
  -> /etc/resolvconf/update-libc.d/squid
    -> /usr/sbin/invoke-rc.d squid reload

The actual reload is done really fast:
Sep 02 06:32:19 x2 systemd[1]: Reloading LSB: Squid HTTP Proxy version 3.x.
Sep 02 06:32:19 x2 squid[2505]:  * Reloading Squid HTTP Proxy configuration 
files
Sep 02 06:32:19 x2 squid[2505]:    ...done.
Sep 02 06:32:19 x2 systemd[1]: Reloaded LSB: Squid HTTP Proxy version 3.x.

But in ps hierarchy the systemctl reload hangs still
0     0    2737       1  20   0   4500  1372 wait   Ss   ?          0:00 
/bin/sh /etc/init.d/dnsmasq systemd-start-resolvconf
0     0    2745    2737  20   0   4360   644 wait   S    ?          0:00  \_ 
run-parts --arg=-a --arg=lo.dnsmasq /etc/resolvconf/update.d
0     0    2763    2745  20   0   4360  1184 wait   S    ?          0:00      
\_ run-parts /etc/resolvconf/update-libc.d
0     0    2776    2763  20   0   4500   692 wait   S    ?          0:00        
  \_ /bin/sh /etc/resolvconf/update-libc.d/squid
0     0    2777    2776  20   0   4500  1300 wait   S    ?          0:00        
      \_ /bin/sh /usr/sbin/invoke-rc.d squid reload
0     0    2795    2777  20   0  24884  1068 poll_s S    ?          0:00        
          \_ systemctl reload squid.service


The delay on that is actually 90 seconds.
And it waits until killed:
     0.000000 ppoll([{fd=3, events=POLLIN}], 1, NULL, NULL, 8) = 1 ([{fd=3, 
revents=POLLIN}]) <53.100476>
    53.101445 +++ killed by SIGKILL +++
Due to
  Sep 02 06:37:19 x2 systemd[1]: dnsmasq.service: Start-post operation timed 
out. Stopping.


The old init script reload in xenial was:
    reload|force-reload)
        res=`$DAEMON -k parse -f $CONFIG 2>&1 | grep -o "FATAL: .*"`
        if test -n "$res";
        then
                log_failure_msg "$res"
                exit 3
        else
                log_action_msg "Reloading $DESC configuration files"
                start-stop-daemon --stop --signal 1 \
                        --pidfile $PIDFILE --quiet --exec $DAEMON
                log_action_end_msg 0
        fi
        ;;


But all these actions directly executed are always fast:
- /usr/sbin/invoke-rc.d squid reload
- systemctl reload squid
- start-stop-daemon --stop --signal 1 --pidfile /var/run/squid.pid --quiet 
--exec /usr/sbin/squid

I can make "systemctl reload squid" hang thou - if I invoke it at the right 
time.
To do so I replaced /etc/resolvconf/update-libc.d/squid with a debug hook that 
waits on me:
echo "DEBUG DEBUG DEBUG"
echo "would call 'invoke-rc.d squid reload' now"
echo "instead wait until /tmp/foo was touched"
f="/tmp/foo"
start=$(date -r $f);
new=$(date -r $f);
while [ "$start" = "$new" ]; do
  sleep 2;
  new=$(date -r $f);
done;
echo "DEBUG DEBUG DEBUG"


Once that is active "systemctl reload squid.service" hangs as it is in the 
error case.
This only calls into "/etc/init.d/squid reload" and that is what actually hangs

And what did I find when watching this hang:
0     0    4135    2159  20   0   4500  1368 wait   S+   ?          0:00  \_ 
/bin/sh /etc/init.d/squid reload
0     0    4143    4135  20   0  26588  1128 poll_s S+   ?          0:00      
\_ /bin/systemctl --no-pager reload squid.service
0     0    4144    4143  20   0  15404   840 poll_s S+   ?          0:00        
  \_ /bin/systemd-tty-ask-password-agent --watch

I mean systemd-tty-ask-password-agent what ?!?

I must say that I love to hate the old complexity of the sysV<->systemd change.
- /etc/init.d/squid is generated into a service
- dnsmasq hook calls: systemctl reload squid
- that calls: /etc/init.d/squid reload
- which calls: /bin/systemctl --no-pager reload squid.service

This is back to step #2 one would think :-)

But if at the broken time I directly call "/bin/systemctl --no-pager reload 
squid.service" I get the same behavior with less redirections:
4     0    2159       0  20   0  21608  2800 wait   Ss   ?          0:00 bash
0     0    5312    2159  20   0  26588  2364 poll_s S+   ?          0:00  \_ 
/bin/systemctl --no-pager reload squid.service
0     0    5314    5312  20   0  15404   840 poll_s S+   ?          0:00      
\_ /bin/systemd-tty-ask-password-agent --watch

But when this happens /run/systemd/ask-password is empty.

I think we are looking at something more similar to:
- https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1456789
- https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=774153
- https://bugs.freedesktop.org/show_bug.cgi?id=92430
- https://bugzilla.redhat.com/show_bug.cgi?id=1367981
But for me `systemctl daemon-reexec` didn't serve as a workaround, so maybe not 
the exact same?

Similar is
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1565617 in which
case /bin/systemd-tty-ask-password-agent was a red herring - but the fix
for that has landed and we still see it.

The next questions open to me now are:
- why does /bin/systemd-tty-ask-password-agent open here
- is /bin/systemd-tty-ask-password-agent the process holding this up?

For the latter I set up this debug trick:
lrwxrwxrwx 1 root root 9 Sep  2 07:27 /bin/systemd-tty-ask-password-agent -> 
/bin/true*
But that was probably too rude and hangs it in:
  [systemd-tty-ask] <defunct>

Sergio, maybe you can continue from here tracking these two open
questions further down?


** Bug watch added: Debian Bug tracker #774153
   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=774153

** Bug watch added: freedesktop.org Bugzilla #92430
   https://bugs.freedesktop.org/show_bug.cgi?id=92430

** Bug watch added: Red Hat Bugzilla #1367981
   https://bugzilla.redhat.com/show_bug.cgi?id=1367981

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to dnsmasq in Ubuntu.
https://bugs.launchpad.net/bugs/1761096

Title:
  dnsmasq starts with error on Ubuntu Xenial amd64 when squid installed

Status in dnsmasq package in Ubuntu:
  Fix Released
Status in squid package in Ubuntu:
  Fix Released
Status in dnsmasq source package in Xenial:
  Confirmed
Status in squid source package in Xenial:
  Confirmed

Bug description:
  Setup to reproduce:

  Ubuntu Xenial amd64 net install iso from
  http://archive.ubuntu.com/ubuntu/dists/xenial/main/installer-
  amd64/current/images/netboot/mini.iso

  Install system with mostly defaults + LVM + OpenSSH server

  Note that this bug applies to both DHCP and static IP+DNS network
  configurations

  Once server rebooted and is available, log in and install dnsmasq + squid:
  apt-get update && apt-get install squid dnsmasq

  output of this can be found at https://pastebin.com/9Atuipju
  journalctl -xe output at https://pastebin.com/uLhfM4jN

  Furthermore at this point I can run alternating errors

  root@ubuntu-min:~# date ; service dnsmasq start ; date
  Wed Apr  4 09:18:07 CEST 2018
  Wed Apr  4 09:18:07 CEST 2018
  root@ubuntu-min:~# date ; service dnsmasq stop ; date
  Wed Apr  4 09:18:39 CEST 2018
  Wed Apr  4 09:18:39 CEST 2018
  root@ubuntu-min:~# date ; service dnsmasq start ; date
  Wed Apr  4 09:19:10 CEST 2018
  Job for dnsmasq.service failed because a timeout was exceeded. See "systemctl 
status dnsmasq.service" and "journalctl -xe" for details.
  Wed Apr  4 09:20:40 CEST 2018
  root@ubuntu-min:~# date ; service dnsmasq start ; date
  Wed Apr  4 09:42:57 CEST 2018
  Wed Apr  4 09:42:57 CEST 2018
  root@ubuntu-min:~# date ; service dnsmasq stop ; date
  Wed Apr  4 09:43:14 CEST 2018
  Wed Apr  4 09:43:14 CEST 2018
  root@ubuntu-min:~# date ; service dnsmasq start ; date
  Wed Apr  4 09:43:26 CEST 2018
  Job for dnsmasq.service failed because a timeout was exceeded. See "systemctl 
status dnsmasq.service" and "journalctl -xe" for details.
  Wed Apr  4 09:44:56 CEST 2018

  and so on... Each and every 1 out of 2 stop/start cycle fails in 1m30s
  timeout

  Complete journalctl -xe output attached

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/dnsmasq/+bug/1761096/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to