Thanks for the investigation work. As it sounds like an rpm-update bug,
and we cannot reproduce it on our side, I am marking the bug as Invalid.

** Changed in: cloud-init
       Status: Incomplete => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1995609

Title:
  ssh host keys deleted by cloud-init between sshd-keygen and sshd start

Status in cloud-init:
  Invalid

Bug description:
  This happened on CentOS Stream 8.

  I created an AWS instance from a snapshot of another instance.
  On startup I was unable to log in via SSH because sshd failed to start.

  Investigating the logs, I found that cloud-init deleted the
  /etc/ssh/ssh_host_* files between `sshd-keygen.target` being reached and
  OpenSSH starting.

  I recovered the instance by other means, but I dug through the logs.
  Here are the relevant log extracts:

  messages:
  Nov  3 08:30:38 ip-172-21-3-249 systemd[1]: Reached target sshd-keygen.target.

  cloud-init.log:
  2022-11-03 08:31:02,307 - util.py[DEBUG]: Attempting to remove /etc/ssh/ssh_host_ed25519_key
  2022-11-03 08:31:02,308 - util.py[DEBUG]: Attempting to remove /etc/ssh/ssh_host_ed25519_key.pub
  2022-11-03 08:31:02,308 - util.py[DEBUG]: Attempting to remove /etc/ssh/ssh_host_ecdsa_key
  2022-11-03 08:31:02,308 - util.py[DEBUG]: Attempting to remove /etc/ssh/ssh_host_ecdsa_key.pub
  2022-11-03 08:31:02,308 - util.py[DEBUG]: Attempting to remove /etc/ssh/ssh_host_rsa_key
  2022-11-03 08:31:02,308 - util.py[DEBUG]: Attempting to remove /etc/ssh/ssh_host_rsa_key.pub

  messages:
  Nov  3 08:31:02 ip-172-21-3-249 systemd[1]: Starting OpenSSH server daemon...
  Nov  3 08:31:03 ip-172-21-3-249 sshd[1337]: Unable to load host key: /etc/ssh/ssh_host_rsa_key
  Nov  3 08:31:03 ip-172-21-3-249 sshd[1337]: Unable to load host key: /etc/ssh/ssh_host_ecdsa_key
  Nov  3 08:31:03 ip-172-21-3-249 sshd[1337]: Unable to load host key: /etc/ssh/ssh_host_ed25519_key
  Nov  3 08:31:03 ip-172-21-3-249 sshd[1337]: sshd: no hostkeys available -- exiting.
  Nov  3 08:31:03 ip-172-21-3-249 systemd[1]: sshd.service: Main process exited, code=exited, status=1/FAILURE
  Nov  3 08:31:03 ip-172-21-3-249 systemd[1]: sshd.service: Failed with result 'exit-code'.
  Nov  3 08:31:03 ip-172-21-3-249 systemd[1]: Failed to start OpenSSH server daemon.
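As a rough cross-check of the ordering described above, the sketch below merges a few of the quoted log lines into one timeline. It assumes both logs use the same clock and timezone and that the year-less syslog lines are from 2022; the event labels are abbreviated by hand, and `timeline` and `gap_seconds` are hypothetical helper names.

```python
from datetime import datetime

# A few events from the log extracts above, with hand-abbreviated labels.
# Assumption: both logs share one clock/timezone, and the year-less
# syslog timestamps belong to 2022 like the cloud-init.log ones.
SYSLOG_EVENTS = [
    ("Nov  3 08:30:38", "systemd: Reached target sshd-keygen.target"),
    ("Nov  3 08:31:02", "systemd: Starting OpenSSH server daemon"),
    ("Nov  3 08:31:03", "sshd: no hostkeys available -- exiting"),
]
CLOUD_INIT_EVENTS = [
    ("2022-11-03 08:31:02,307", "cloud-init: remove ssh_host_ed25519_key"),
    ("2022-11-03 08:31:02,308", "cloud-init: remove ssh_host_rsa_key.pub"),
]


def parse_syslog(ts, year=2022):
    # Syslog timestamps omit the year, so it is supplied explicitly.
    return datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S")


def parse_cloud_init(ts):
    # cloud-init.log timestamps look like "YYYY-MM-DD HH:MM:SS,mmm".
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")


def timeline():
    """Return (timestamp, label) pairs from both logs, sorted by time."""
    events = [(parse_syslog(ts), label) for ts, label in SYSLOG_EVENTS]
    events += [(parse_cloud_init(ts), label) for ts, label in CLOUD_INIT_EVENTS]
    return sorted(events)


def gap_seconds(events, first, second):
    """Seconds between the first events whose labels contain `first` and `second`."""
    t1 = next(t for t, label in events if first in label)
    t2 = next(t for t, label in events if second in label)
    return (t2 - t1).total_seconds()


if __name__ == "__main__":
    events = timeline()
    for when, label in events:
        print(when.isoformat(sep=" ", timespec="milliseconds"), label)
    print(gap_seconds(events, "sshd-keygen.target", "remove ssh_host"))
```

With these inputs the first key removal lands about 24.3 seconds after `sshd-keygen.target` was reached. Because the syslog lines only have one-second resolution, "Starting OpenSSH server daemon" sorts to 08:31:02.000, so these lines alone cannot show whether the removal came just before or just after systemd began starting sshd.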

  The cloud-init unit file appears to have the right dependencies:

  [root@ip-172-21-3-249 log]# more /usr/lib/systemd/system/cloud-init.service
  [Unit]
  Description=Initial cloud-init job (metadata service crawler)
  DefaultDependencies=no
  Wants=cloud-init-local.service
  Wants=sshd-keygen.service
  Wants=sshd.service
  After=cloud-init-local.service
  After=systemd-networkd-wait-online.service
  After=network.service
  After=NetworkManager.service
  Before=network-online.target
  Before=sshd-keygen.service
  Before=sshd.service
  Before=systemd-user-sessions.service

  [Service]
  Type=oneshot
  ExecStart=/usr/bin/cloud-init init
  RemainAfterExit=yes
  TimeoutSec=0

  # Output needs to appear in instance console output
  StandardOutput=journal+console

  [Install]
  WantedBy=cloud-init.target

  
  But I wonder whether they still work for systemd template units:

  [root@ip-172-21-3-249 log]# systemctl status sshd-keygen.service
  Unit sshd-keygen.service could not be found.
  [root@ip-172-21-3-249 log]# systemctl status sshd-keygen@.service
  Failed to get properties: Unit name sshd-keygen@.service is neither a valid invocation ID nor unit name.
  [root@ip-172-21-3-249 log]# systemctl status sshd-keygen@
  sshd-keygen@ecdsa.service    sshd-keygen@ed25519.service  sshd-keygen@rsa.service  ««« there are 3 services, one for each key type.
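One way to reason about the template question: systemd resolves `Wants=` and `Before=` by unit name, with no template expansion, so a dependency on `sshd-keygen.service` says nothing about the instances of the `sshd-keygen@.service` template. The minimal sketch below assumes the instance names `sshd-keygen@ecdsa.service`, `sshd-keygen@ed25519.service` and `sshd-keygen@rsa.service` (one per host-key type in the logs); `parse_unit` and `ordered_before` are hypothetical helpers, and the parser only handles the simple `Key=value` lines used in this unit file.

```python
from collections import defaultdict

# The dependency lines of the [Unit] section above (other keys trimmed).
UNIT_TEXT = """\
[Unit]
Wants=cloud-init-local.service
Wants=sshd-keygen.service
Wants=sshd.service
Before=network-online.target
Before=sshd-keygen.service
Before=sshd.service
Before=systemd-user-sessions.service
"""

# Assumed instance names, one per host-key type seen in the logs.
KEYGEN_INSTANCES = [
    "sshd-keygen@ecdsa.service",
    "sshd-keygen@ed25519.service",
    "sshd-keygen@rsa.service",
]


def parse_unit(text):
    """Map section -> key -> list of values; repeated keys accumulate,
    since Wants=/Before= may appear several times in a unit file."""
    sections = defaultdict(lambda: defaultdict(list))
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            continue
        key, _, value = line.partition("=")
        # One directive may also list several space-separated unit names.
        sections[current][key.strip()].extend(value.split())
    return sections


def ordered_before(unit_text, other):
    """True only if [Unit] names `other` exactly in Before=.
    No template expansion: sshd-keygen.service does not cover
    sshd-keygen@rsa.service."""
    return other in parse_unit(unit_text)["Unit"]["Before"]


if __name__ == "__main__":
    for name in ["sshd.service", "sshd-keygen.service", *KEYGEN_INSTANCES]:
        print(f"{name:30} ordered before: {ordered_before(UNIT_TEXT, name)}")
```

Against the `[Unit]` section quoted above, `sshd.service` and `sshd-keygen.service` report `True`, while each per-key-type instance reports `False`.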

  
  I can see that the keygen is skipped here because cloud-init is active:

  [root@ip-172-21-3-249 log]# systemctl status sshd-keygen@ed25519.service
  sshd-keygen@ed25519.service - OpenSSH ed25519 Server Key Generation
     Loaded: loaded (/usr/lib/systemd/system/sshd-keygen@.service; disabled; vendor preset: disabled)
    Drop-In: /etc/systemd/system/sshd-keygen@.service.d
             └─disable-sshd-keygen-if-cloud-init-active.conf
     Active: inactive (dead)
  Condition: start condition failed at Thu 2022-11-03 10:18:28 UTC; 3h 4min ago
             └─ ConditionPathExists=!/run/systemd/generator.early/multi-user.target.wants/cloud-init.target was not met
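The condition in that output is easy to misread because of the leading `!`. Per the systemd.unit documentation, `ConditionPathExists=` is met when the path exists, and a `!` prefix inverts the test. So "was not met" means the `cloud-init.target` link under `/run/systemd/generator.early` did exist, which cloud-init's generator creates when it enables cloud-init for that boot. A minimal sketch of that evaluation follows; a temporary directory stands in for `/run/systemd/generator.early`, a plain file stands in for the link, and the helper name `condition_path_exists` is hypothetical.

```python
import os
import tempfile


def condition_path_exists(spec):
    """Evaluate a ConditionPathExists= value: true if the path exists,
    or, when the value starts with "!", true if it does not exist."""
    negate = spec.startswith("!")
    path = spec[1:] if negate else spec
    return os.path.exists(path) != negate


if __name__ == "__main__":
    # A temporary directory stands in for /run/systemd/generator.early.
    with tempfile.TemporaryDirectory() as early:
        marker = os.path.join(early, "multi-user.target.wants", "cloud-init.target")
        spec = "!" + marker
        # Marker absent: condition met, so the keygen unit would run.
        print("marker absent ->", condition_path_exists(spec))
        # Marker present (cloud-init enabled): condition not met, keygen skipped.
        os.makedirs(os.path.dirname(marker))
        open(marker, "w").close()
        print("marker present ->", condition_path_exists(spec))
```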

  How can we ensure this does not happen in the future?

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1995609/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
