Public bug reported: cloud-init 22.2 introduced a race condition with ssh.service since it added a systemctl call to check if service is stopped/not-started. If ssh.service starts at around the same time of the check, cloud-init believes that the service is stopped and does not need to be restarts, proceeds to write the sshd_config. The writing of sshd_config might happen after the ssh.service has started, especially if there's significant delay in the systemctl status call (the delay would have given ssh.service plenty of time to startup)
I believed this is the commit that introduced the issue: https://github.com/canonical/cloud-init/commit/5054ffe188 I've attached the cloud-init.log and the auth.log showing the time ssh service started. >From cloud-init.log - the call to check ssh service status happened at 22:44:43,630, when cloud-init wrote the file sshd_config at 22:44:51, ssh service already started. There's a strange 8s delay from systemctl that might have to do with systemd or the condition of the VM. Regardless, the race is definitely there. 2022-11-22 22:44:43,630 - subp.py[DEBUG]: Running command ['systemctl', 'status', 'ssh'] with allowed return codes [0] (shell=False, capture=True) 2022-11-22 22:44:51,116 - cc_set_passwords.py[DEBUG]: Writing config 'ssh_pwauth: True'. SSH service 'ssh' will not be restarted because it is stopped. 2022-11-22 22:44:51,116 - util.py[DEBUG]: Reading from /etc/ssh/sshd_config (quiet=False) 2022-11-22 22:44:51,116 - util.py[DEBUG]: Read 3546 bytes from /etc/ssh/sshd_config 2022-11-22 22:44:51,116 - ssh_util.py[DEBUG]: line 55: option PasswordAuthentication updated no -> yes 2022-11-22 22:44:51,117 - util.py[DEBUG]: Writing to /etc/ssh/sshd_config - wb: [600] 3547 bytes Here's what the customer gave us from their VM: ssh.service - OpenBSD Secure Shell server Loaded: loaded (/lib/systemd/system/ssh.service; enabled; vendor preset: enabled) Active: active (running) since Tue 2022-11-22 22:44:43 UTC; 6 days ago sshd_config file changed after only few seconds. -rw------- 1 root root 3547 2022-11-22 22:44:51.113697898 +0000 /etc/ssh/sshd_config ** Affects: cloud-init Importance: Undecided Status: New ** Attachment added: "ssh_race_logs.zip" https://bugs.launchpad.net/bugs/1998526/+attachment/5633811/+files/ssh_race_logs.zip -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1998526 Title: cloud-init not restarting ssh service after writing sshd_config Status in cloud-init: New Bug description: cloud-init 22.2 introduced a race condition with ssh.service since it added a systemctl call to check if service is stopped/not-started. If ssh.service starts at around the same time of the check, cloud-init believes that the service is stopped and does not need to be restarts, proceeds to write the sshd_config. The writing of sshd_config might happen after the ssh.service has started, especially if there's significant delay in the systemctl status call (the delay would have given ssh.service plenty of time to startup) I believed this is the commit that introduced the issue: https://github.com/canonical/cloud-init/commit/5054ffe188 I've attached the cloud-init.log and the auth.log showing the time ssh service started. From cloud-init.log - the call to check ssh service status happened at 22:44:43,630, when cloud-init wrote the file sshd_config at 22:44:51, ssh service already started. There's a strange 8s delay from systemctl that might have to do with systemd or the condition of the VM. Regardless, the race is definitely there. 2022-11-22 22:44:43,630 - subp.py[DEBUG]: Running command ['systemctl', 'status', 'ssh'] with allowed return codes [0] (shell=False, capture=True) 2022-11-22 22:44:51,116 - cc_set_passwords.py[DEBUG]: Writing config 'ssh_pwauth: True'. SSH service 'ssh' will not be restarted because it is stopped. 2022-11-22 22:44:51,116 - util.py[DEBUG]: Reading from /etc/ssh/sshd_config (quiet=False) 2022-11-22 22:44:51,116 - util.py[DEBUG]: Read 3546 bytes from /etc/ssh/sshd_config 2022-11-22 22:44:51,116 - ssh_util.py[DEBUG]: line 55: option PasswordAuthentication updated no -> yes 2022-11-22 22:44:51,117 - util.py[DEBUG]: Writing to /etc/ssh/sshd_config - wb: [600] 3547 bytes Here's what the customer gave us from their VM: ssh.service - OpenBSD Secure Shell server Loaded: loaded (/lib/systemd/system/ssh.service; enabled; vendor preset: enabled) Active: active (running) since Tue 2022-11-22 22:44:43 UTC; 6 days ago sshd_config file changed after only few seconds. -rw------- 1 root root 3547 2022-11-22 22:44:51.113697898 +0000 /etc/ssh/sshd_config To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1998526/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : [email protected] Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp

