Public bug reported:

TLDR:

attempting to run ntp in standalone instead of dhcp driven somehow leads
to logind/dbus pam timeouts and throw errors to auth.log and results in
pam logins via console/ssh/su to be delayed by 25s for said timeout but
otherwise work. at first I thought I had a regression on 1591411 but
this is different. I do not understand why this happened but presume it
is because nm has some weird side effect on network-pre.target so
disabling the two ntp units didn't work as I intended.

I have since run into 1823098 which has the same starting issue I had
which is to run ntp in standalone mode with a provided /etc/ntp.conf
that isn't modified by dhcp. I am filing this bug in case somebody else
travels down this road, hits these log messages and wonders wtf broke.

the ntp issues:
- there should be a knob in /etc/default/ntp to specify which form of ntp you 
want (none/standalone/dhcp/netplan/etc)
- vaguely using netplan 'use-ntp' shouldn't be the only way to handle this and 
even that's locked into a bool instead of a selection since it has sync 
semantics
- dhcp shouldn't assume 'ntp-servers' is requested without checking for it. 
there's no /etc/default/dhcp that I can see so you have to modify 
/etc/dhcp/dhclient.conf directly as it is in the request block and not a useful 
hook.

perhaps there's an intended way to do this but I didn't run into it thus
far. open to suggestions. meanwhile my solution was simply to disable
the two units and no longer be a network-pre.target. it seemed to work
for 2days and then I started seeing these on pam auths:

===

DISTRIB_DESCRIPTION="Ubuntu 18.04.5 LTS"

$ uptime #all users are just xterm login shells for me in X
 23:34:24 up 34 days,  2:55, 31 users,  load average: 0.55, 0.44, 0.38

$ dpkg -l|egrep 'systemd |dbus-user-session'|awk '{print $1,$2,$3,$4}'|column -t
ii  dbus-user-session  1.12.2-1ubuntu1.2  amd64
ii  libsystemd0:amd64  237-3ubuntu10.42   amd64
ii  libsystemd0:i386   237-3ubuntu10.42   i386
ii  python3-systemd    234-1build1        amd64
ii  systemd            237-3ubuntu10.42   amd64

$ systemd --version
systemd 237
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP 
+GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 
default-hierarchy=hybrid

$ ps laxwww|grep dbus-[d]
4   106  1021     1  20   0  52740  3548 -      Ss   ?         13:45 
/usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile 
--systemd-activation --syslog-only
0  1000  1839  1712  20   0  51580  2160 ep_pol Ss   ?          1:37 
/usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile 
--systemd-activation --syslog-only
0  1000  2286  2273  20   0  50052   984 ep_pol S    ?          0:16 
/usr/bin/dbus-daemon 
--config-file=/usr/share/defaults/at-spi2/accessibility.conf --nofork 
--print-address 3


===8<: trigger1
$ date;sudo su - build -c /bin/date
Thu Feb 18 22:54:40 PST 2021
[sudo] password for luser: 
Thu Feb 18 22:55:06 PST 2021

===8<: trigger1 auth.log
Feb 18 22:54:10 bahx dbus-daemon[1021]: [system] Failed to activate service 
'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)
Feb 18 22:54:10 bahx sshd[22024]: pam_systemd(sshd:session): Failed to create 
session: Connection timed out
Feb 18 22:54:10 bahx systemd-logind[1030]: Failed to start session scope 
session-941.scope: Failed to activate service 'org.freedesktop.systemd1': timed 
out (service_start_timeout=25000ms)
Feb 18 22:54:10 bahx sshd[22090]: Received disconnect from 127.0.0.1 port 
50456:11: disconnected by user
Feb 18 22:54:10 bahx sshd[22090]: Disconnected from user luser 127.0.0.1 port 
50456
Feb 18 22:54:10 bahx sshd[22024]: pam_unix(sshd:session): session closed for 
user luser
Feb 18 22:54:41 bahx sudo:      luser : TTY=pts/26 ; PWD=/home/luser ; 
USER=root ; COMMAND=/bin/su - build -c /bin/date
Feb 18 22:54:41 bahx sudo: pam_unix(sudo:session): session opened for user root 
by luser(uid=0)
Feb 18 22:54:41 bahx su[22111]: Successful su for build by root
Feb 18 22:54:41 bahx su[22111]: + /dev/pts/26 root:build
Feb 18 22:54:41 bahx su[22111]: pam_unix(su:session): session opened for user 
build by luser(uid=0)
Feb 18 22:55:06 bahx dbus-daemon[1021]: [system] Failed to activate service 
'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)
Feb 18 22:55:06 bahx su[22111]: pam_systemd(su:session): Failed to create 
session: Connection timed out
Feb 18 22:55:06 bahx systemd-logind[1030]: Failed to start user slice 
user-20000.slice, ignoring: Connection timed out 
(org.freedesktop.DBus.Error.Timeout)
Feb 18 22:55:06 bahx su[22111]: pam_unix(su:session): session closed for user 
build
Feb 18 22:55:06 bahx sudo: pam_unix(sudo:session): session closed for user root

===8<: trigger2
$ date;ssh localhost date
Thu Feb 18 22:55:52 PST 2021
generic banner
luser@localhost's password: 
Thu Feb 18 22:56:18 PST 2021

===8<: trigger2 auth.log
Feb 18 22:55:53 bahx sshd[22135]: Accepted password for luser from 127.0.0.1 
port 50608 ssh2
Feb 18 22:55:53 bahx sshd[22135]: pam_unix(sshd:session): session opened for 
user luser by (uid=0)
Feb 18 22:55:56 bahx systemd-logind[1030]: Failed to start session scope 
session-2.scope: Connection timed out
Feb 18 22:55:56 bahx dbus-daemon[1021]: [system] Failed to activate service 
'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)
Feb 18 22:56:12 bahx dbus-daemon[1021]: [system] Failed to activate service 
'org.freedesktop.nm_dispatcher': timed out (service_start_timeout=25000ms)
Feb 18 22:56:18 bahx sshd[22135]: pam_systemd(sshd:session): Failed to create 
session: Connection timed out
Feb 18 22:56:18 bahx sshd[22202]: Received disconnect from 127.0.0.1 port 
50608:11: disconnected by user
Feb 18 22:56:18 bahx sshd[22202]: Disconnected from user luser 127.0.0.1 port 
50608
Feb 18 22:56:18 bahx sshd[22135]: pam_unix(sshd:session): session closed for 
user luser
Feb 18 22:56:21 bahx dbus-daemon[1021]: [system] Failed to activate service 
'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)
Feb 18 22:56:21 bahx systemd-logind[1030]: Failed to stop user service: Failed 
to activate service 'org.freedesktop.systemd1': timed out 
(service_start_timeout=25000ms)

===

all other references to these log messages end up with a generic restart
logind/daemon reexec/reload and ignore the cause so I didn't repeat them
blindly.

instead i reverted and reenabled/started the two units I
disabled/stopped and voila pam worked and no 25s delay on login.

===
$ for i in /var/log/auth.log*;do echo $i|egrep -q gz;(test $? -eq 0&&gzip -dc 
$i||cat $i)|egrep -i systemd1;done|awk 'BEGIN{IFS=":"}{print $1,$2}'|sort|uniq 
-c
      2 Feb 15
    277 Feb 17
    424 Feb 18 #mostly me triaging ssh/sudo su

$ for i in /var/log/auth.log*;do echo $i|egrep -q gz\$;(test $? -eq 0&&gzip -dc 
$i||cat $i)|egrep -i systemd1|awk '{print "'$i'",$0}';done|egrep 'Feb 15'
/var/log/auth.log Feb 15 06:51:05 bahx polkitd(authority=local): Operator of 
unix-session:c2 FAILED to authenticate to gain authorization for action 
org.freedesktop.systemd1.manage-units for system-bus-name::1.12149 [systemctl 
start ntp.service] (owned by unix-user:luser)
/var/log/auth.log Feb 15 07:06:43 bahx polkitd(authority=local): Operator of 
unix-session:c2 FAILED to authenticate to gain authorization for action 
org.freedesktop.systemd1.manage-units for system-bus-name::1.12173 [systemctl 
reset-failed ntp-systemd-netif.service] (owned by unix-user:luser)

===
this was me attempting to shut them down in a xterm without sudo and canceling 
the auth dialog and redoing them with sudo. the units disabled/stopped:

ntp-systemd-netif.service
ntp-systemd-netif.path

because of:

===
$ systemctl list-units --failed --all
  UNIT                      LOAD   ACTIVE SUB    DESCRIPTION
● ntp-systemd-netif.service loaded failed failed ntp-systemd-netif.service

with:

sudo systemctl reset-failed ntp-systemd-netif.service

===

at this point I have disabled/stopped the units again and pam is still
happy. I'll let it run to see if it returns. i did change one thing
though since then. because the failure of the unit was because the
target script hook for ntp was removed I replaced the ntp hook with a
simple exit 0 bash script so it doesn't trigger failure. perhaps that
will be enough to prevent this cascade? time will tell

in any event something this minor in a single daemon shouldn't break pam
across the system I should think. that's rather a violent overreach
methinks.

whether this classifies under nm, dbus, systemd-logind, or ntp is the
real mystery. nm barked first it seems. there's some nuance in ntp's
systemd config not aligning with nm I guess? why it took 2 days to
trigger I assume is like the previous issue requiring enough usage or
SOMAXCONN or something like it before 'it' happens. still tbd. i
triggered another nonsudo auth dialog to try to reproduce but seems fine
still

** Affects: ubuntu
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1916226

Title:
   pam login 25s timeout delay due to tickling ntp systemd units

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+bug/1916226/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Reply via email to