Public bug reported:

This happened on CentOS Stream 8.

I created an AWS instance from a snapshot of another instance. After boot I was unable to log in via SSH because sshd failed to start. Investigating the logs, I found that cloud-init deleted the /etc/ssh/ssh_host_* files between `sshd-keygen.target` being reached and OpenSSH starting. I recovered the instance another way, but I dug through the logs.

Here are the log extracts:

messages:
Nov 3 08:30:38 ip-172-21-3-249 systemd[1]: Reached target sshd-keygen.target.

cloud-init.log:
2022-11-03 08:31:02,307 - util.py[DEBUG]: Attempting to remove /etc/ssh/ssh_host_ed25519_key
2022-11-03 08:31:02,308 - util.py[DEBUG]: Attempting to remove /etc/ssh/ssh_host_ed25519_key.pub
2022-11-03 08:31:02,308 - util.py[DEBUG]: Attempting to remove /etc/ssh/ssh_host_ecdsa_key
2022-11-03 08:31:02,308 - util.py[DEBUG]: Attempting to remove /etc/ssh/ssh_host_ecdsa_key.pub
2022-11-03 08:31:02,308 - util.py[DEBUG]: Attempting to remove /etc/ssh/ssh_host_rsa_key
2022-11-03 08:31:02,308 - util.py[DEBUG]: Attempting to remove /etc/ssh/ssh_host_rsa_key.pub

messages:
Nov 3 08:31:02 ip-172-21-3-249 systemd[1]: Starting OpenSSH server daemon...
Nov 3 08:31:03 ip-172-21-3-249 sshd[1337]: Unable to load host key: /etc/ssh/ssh_host_rsa_key
Nov 3 08:31:03 ip-172-21-3-249 sshd[1337]: Unable to load host key: /etc/ssh/ssh_host_ecdsa_key
Nov 3 08:31:03 ip-172-21-3-249 sshd[1337]: Unable to load host key: /etc/ssh/ssh_host_ed25519_key
Nov 3 08:31:03 ip-172-21-3-249 sshd[1337]: sshd: no hostkeys available -- exiting.
Nov 3 08:31:03 ip-172-21-3-249 systemd[1]: sshd.service: Main process exited, code=exited, status=1/FAILURE
Nov 3 08:31:03 ip-172-21-3-249 systemd[1]: sshd.service: Failed with result 'exit-code'.
Nov 3 08:31:03 ip-172-21-3-249 systemd[1]: Failed to start OpenSSH server daemon.
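The timing in these extracts can be checked with a short script. This is only a sketch: the three lines are copied from the extracts above, the syslog lines carry no year so 2022 is assumed from cloud-init.log, and both logs are assumed to use the same timezone. The helper names are made up for illustration.

```python
from datetime import datetime

# Lines copied from the report. Syslog lines have no year, so 2022 is assumed
# from cloud-init.log; both logs are assumed to share a timezone.
KEYGEN_TARGET = "Nov 3 08:30:38 ip-172-21-3-249 systemd[1]: Reached target sshd-keygen.target."
FIRST_REMOVAL = "2022-11-03 08:31:02,307 - util.py[DEBUG]: Attempting to remove /etc/ssh/ssh_host_ed25519_key"
SSHD_EXIT = "Nov 3 08:31:03 ip-172-21-3-249 sshd[1337]: sshd: no hostkeys available -- exiting."


def syslog_time(line, year=2022):
    """Parse the 'Mon D HH:MM:SS' prefix of a syslog line (year is assumed)."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def cloud_init_time(line):
    """Parse the 'YYYY-MM-DD HH:MM:SS,mmm' prefix of a cloud-init.log line."""
    stamp = line.split(" - ", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


keygen_at = syslog_time(KEYGEN_TARGET)
removed_at = cloud_init_time(FIRST_REMOVAL)
sshd_exit_at = syslog_time(SSHD_EXIT)

print("keygen target -> first key removal:", removed_at - keygen_at)
print("first key removal -> sshd exit:    ", sshd_exit_at - removed_at)
```

Syslog timestamps only have one-second resolution, so the script cannot order the key removal against the "Starting OpenSSH server daemon..." line, which falls within the same second. It only shows that the removal came well after sshd-keygen.target was reached and before sshd gave up.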
The cloud-init unit file has the right dependencies:

[root@ip-172-21-3-249 log]# more /usr/lib/systemd/system/cloud-init.service
[Unit]
Description=Initial cloud-init job (metadata service crawler)
DefaultDependencies=no
Wants=cloud-init-local.service
Wants=sshd-keygen.service
Wants=sshd.service
After=cloud-init-local.service
After=systemd-networkd-wait-online.service
After=network.service
After=NetworkManager.service
Before=network-online.target
Before=sshd-keygen.service
Before=sshd.service
Before=systemd-user-sessions.service

[Service]
Type=oneshot
ExecStart=/usr/bin/cloud-init init
RemainAfterExit=yes
TimeoutSec=0

# Output needs to appear in instance console output
StandardOutput=journal+console

[Install]
WantedBy=cloud-init.target

But I wonder whether they still work with systemd template units:

[root@ip-172-21-3-249 log]# systemctl status sshd-keygen.service
Unit sshd-keygen.service could not be found.
[root@ip-172-21-3-249 log]# systemctl status sshd-keygen@.service
Failed to get properties: Unit name sshd-keygen@.service is neither a valid invocation ID nor unit name.
[root@ip-172-21-3-249 log]# systemctl status sshd-keygen@
sshd-keygen@ecdsa.service    sshd-keygen@ed25519.service  sshd-keygen@rsa.service   ««« there are 3 services, one for each key type.

I can see that key generation is skipped here because cloud-init is active:

[root@ip-172-21-3-249 log]# systemctl status sshd-keygen@ed25519.service
● sshd-keygen@ed25519.service - OpenSSH ed25519 Server Key Generation
   Loaded: loaded (/usr/lib/systemd/system/sshd-keygen@.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/sshd-keygen@.service.d
           └─disable-sshd-keygen-if-cloud-init-active.conf
   Active: inactive (dead)
Condition: start condition failed at Thu 2022-11-03 10:18:28 UTC; 3h 4min ago
           └─ ConditionPathExists=!/run/systemd/generator.early/multi-user.target.wants/cloud-init.target was not met

How can we ensure this does not happen in the future?
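The question about template units can be made concrete with a small check. What follows is a sketch under stated assumptions, not a diagnosis. The directives are copied from the cloud-init.service listing above. The set of existing units is an assumption: the per-key-type instance names are built from the key types seen in the logs, sshd-keygen.target comes from the messages extract, and sshd.service and systemd-user-sessions.service are simply presumed present. A real check would query systemd rather than use a hard-coded set, and the function names are hypothetical.

```python
# Ordering and requirement directives copied from cloud-init.service in the report.
UNIT_TEXT = """\
[Unit]
Wants=sshd-keygen.service
Wants=sshd.service
Before=sshd-keygen.service
Before=sshd.service
Before=systemd-user-sessions.service
"""

# ASSUMPTION: units that exist on the host. Instance names are derived from the
# key types in the logs; the other entries are presumed, not queried from systemd.
KEY_TYPES = ("ecdsa", "ed25519", "rsa")
KNOWN_UNITS = {
    "sshd.service",
    "systemd-user-sessions.service",
    "sshd-keygen.target",
} | {f"sshd-keygen@{key_type}.service" for key_type in KEY_TYPES}


def referenced_units(unit_text, keys=("Before", "After", "Wants")):
    """Return {directive: [unit names]} for the selected directives."""
    found = {}
    for line in unit_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() in keys:
            # A directive may list several space-separated unit names.
            found.setdefault(key.strip(), []).extend(value.split())
    return found


def dangling_references(unit_text, known_units):
    """Unit names mentioned in the directives that are not in known_units."""
    refs = referenced_units(unit_text)
    return sorted({name for names in refs.values() for name in names
                   if name not in known_units})


print(dangling_references(UNIT_TEXT, KNOWN_UNITS))
```

Under these assumptions the only name that matches nothing is sshd-keygen.service, which agrees with the "could not be found" output above. An ordering directive that names a unit which is never part of the boot transaction has nothing to order against, so the per-key-type instances and sshd-keygen.target would not be covered by it.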
** Affects: cloud-init
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1995609

Title:
  ssh host keys deleted by cloud-init between sshd-keygen and sshd start

Status in cloud-init:
  New

