** Description changed:

  [Impact]
  
  - On Noble, cloud-init changed the default DHCP client to dhcpcd.
- Occasionally, dhcpcd seems to successfully negotiate a lease but then
- fail to parse said lease and crashes. As a result, despite having
- obtained a lease, the VM is left without functioning networking. The
- failure looks something like the following traceback (key error message
- is that the dhcpcd service isn't running: "Stderr: dhcpcd is not
- running").
+ Occasionally, dhcpcd successfully negotiates a lease but then fails to
+ parse said lease and crashes. As a result, despite having obtained a
+ lease, the VM is left without functioning networking. The failure looks
+ something like the following traceback (key error message is that the
+ dhcpcd service isn't running: "Stderr: dhcpcd is not running").
  
  │Oct 13 19:13:08 localhost dhcpcd[826]: dhcpcd-10.0.6 starting
  │Oct 13 19:13:08 localhost dhcpcd[829]: DUID 00:01:00:01:30:71:4c:4a:02:0c:02:61:06:11
  │Oct 13 19:13:08 localhost dhcpcd[829]: ens3: IAID 0c:a0:00:01
  │Oct 13 19:13:10 localhost dhcpcd[829]: ens3: rebinding lease of 172.29.48.68
  │Oct 13 19:13:10 localhost dhcpcd[829]: ens3: NAK: wrong address from 172.29.16.1
  │Oct 13 19:13:10 localhost dhcpcd[829]: ens3: message: wrong address
  │Oct 13 19:13:10 localhost dhcpcd[829]: ens3: soliciting a DHCP lease
  │Oct 13 19:13:10 localhost dhcpcd[829]: ens3: offered 172.29.16.165 from 172.29.16.1
  │Oct 13 19:13:10 localhost dhcpcd[829]: ens3: leased 172.29.16.165 for infinity
  │Oct 13 19:13:10 localhost dhcpcd[829]: ens3: adding route to 172.29.16.0/24
  │Oct 13 19:13:10 localhost dhcpcd[829]: ens3: adding default route via 172.29.16.1
  │Oct 13 19:13:10 localhost dhcpcd[829]: control command: dhcpcd --dumplease --ipv4only ens3
  │Oct 13 19:13:10 localhost cloud-init[823]: 2025-10-13 19:13:10,655 - subp.py[WARNING]: Running invalid command: ['dhcpcd', '--dumplease', '--ipv4only', <cloudinit.distros.ubuntu.Distro object at 0x72f24d0e3830>]
  │Oct 13 19:13:10 localhost cloud-init[823]: 2025-10-13 19:13:10,657 - DataSourceCloudStack.py[WARNING]: Unable to obtain a DHCP lease on ens3
  │Oct 13 19:13:13 localhost dhcpcd[977]: dhcpcd is not running
  │Oct 13 19:13:13 localhost systemd-networkd[974]: ens3: DHCPv4 address 172.29.16.165/24, gateway 172.29.16.1 acquired from 172.29.16.1
  │Oct 13 19:13:13 localhost cloud-init[905]: 2025-10-13 19:13:13,153 - main.py[ERROR]: failed stage init
  │Oct 13 19:13:13 localhost cloud-init[905]: Traceback (most recent call last):
  │Oct 13 19:13:13 localhost cloud-init[905]:   File "/usr/lib/python3/dist-packages/cloudinit/net/dhcp.py", line 863, in get_newest_lease
  │Oct 13 19:13:13 localhost cloud-init[905]:     subp.subp(
  │Oct 13 19:13:13 localhost cloud-init[905]:   File "/usr/lib/python3/dist-packages/cloudinit/subp.py", line 291, in subp
  │Oct 13 19:13:13 localhost cloud-init[905]:     raise ProcessExecutionError(
  │Oct 13 19:13:13 localhost cloud-init[905]: cloudinit.subp.ProcessExecutionError: Unexpected error while running command.
  │Oct 13 19:13:13 localhost cloud-init[905]: Command: ['dhcpcd', '--dumplease', '--ipv4only', 'ens3']
  │Oct 13 19:13:13 localhost cloud-init[905]: Exit code: 1
  │Oct 13 19:13:13 localhost cloud-init[905]: Reason: -
  │Oct 13 19:13:13 localhost cloud-init[905]: Stdout:
  │Oct 13 19:13:13 localhost cloud-init[905]: Stderr: dhcpcd is not running
  │Oct 13 19:13:13 localhost cloud-init[905]: The above exception was the direct cause of the following exception:
  │Oct 13 19:13:13 localhost cloud-init[905]: Traceback (most recent call last):
  │Oct 13 19:13:13 localhost cloud-init[905]:   File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 963, in status_wrapper
  │Oct 13 19:13:13 localhost cloud-init[905]:     ret = functor(name, args)
  │Oct 13 19:13:13 localhost cloud-init[905]:           ^^^^^^^^^^^^^^^^^^^
  │Oct 13 19:13:13 localhost cloud-init[905]:   File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 631, in main_init
  │Oct 13 19:13:13 localhost cloud-init[905]:     _maybe_set_hostname(init, stage="init-net", retry_stage="modules:config")
  │Oct 13 19:13:13 localhost cloud-init[905]:   File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 1046, in _maybe_set_hostname
  │Oct 13 19:13:13 localhost cloud-init[905]:     (hostname, _fqdn, _) = util.get_hostname_fqdn(
  │Oct 13 19:13:13 localhost cloud-init[905]:                            ^^^^^^^^^^^^^^^^^^^^^^^
  │Oct 13 19:13:13 localhost cloud-init[905]:   File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 1221, in get_hostname_fqdn
  │Oct 13 19:13:13 localhost cloud-init[905]:     fqdn = cloud.get_hostname(
  │Oct 13 19:13:13 localhost cloud-init[905]:            ^^^^^^^^^^^^^^^^^^^
  │Oct 13 19:13:13 localhost cloud-init[905]:   File "/usr/lib/python3/dist-packages/cloudinit/cloud.py", line 102, in get_hostname
  │Oct 13 19:13:13 localhost cloud-init[905]:     return self.datasource.get_hostname(
  │Oct 13 19:13:13 localhost cloud-init[905]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  │Oct 13 19:13:13 localhost cloud-init[905]:   File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceCloudStack.py", line 156, in get_hostname
  │Oct 13 19:13:13 localhost cloud-init[905]:     domainname = self._get_domainname()
  │Oct 13 19:13:13 localhost cloud-init[905]:                  ^^^^^^^^^^^^^^^^^^^^^^
  │Oct 13 19:13:13 localhost cloud-init[905]:   File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceCloudStack.py", line 132, in _get_domainname
  │Oct 13 19:13:13 localhost cloud-init[905]:     latest_lease = self.distro.dhcp_client.get_newest_lease(
  │Oct 13 19:13:13 localhost cloud-init[905]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  │Oct 13 19:13:13 localhost cloud-init[905]:   File "/usr/lib/python3/dist-packages/cloudinit/net/dhcp.py", line 881, in get_newest_lease
  │Oct 13 19:13:13 localhost cloud-init[905]:     raise NoDHCPLeaseError from error
  │Oct 13 19:13:13 localhost cloud-init[905]: cloudinit.net.dhcp.NoDHCPLeaseError
  │Oct 13 19:13:13 localhost cloud-init[905]: failed run of stage init
  
  - The issue was discussed upstream in [1] and [2], and fixed in [3] and
  [4].
  
  - The race is that dhcpcd may start reading from stdin before cat or
  another program has finished writing to the pipe.
  
  - This does not reliably reproduce as it is a race, but any Noble
  environment using cloud-init and the new default dhcp client might hit
  this bug.
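
  The shape of this race can be illustrated with a small, generic sketch.
  It is not dhcpcd's code (dhcpcd is written in C); it only assumes a
  reader and a writer sharing a pipe, and the payload "lease-bytes" is a
  placeholder. A reader that takes whatever is available immediately sees
  nothing, while a reader that waits for end-of-file sees the whole
  payload.

```python
import os


def read_available(fd: int) -> bytes:
    """Racy pattern: return only what is in the pipe right now."""
    try:
        return os.read(fd, 4096)
    except BlockingIOError:
        # The writer has not written anything yet.
        return b""


def read_until_eof(fd: int) -> bytes:
    """Serialized pattern: block until the writer closes its end."""
    os.set_blocking(fd, True)
    chunks = []
    while True:
        chunk = os.read(fd, 4096)
        if not chunk:  # empty read means end-of-file
            break
        chunks.append(chunk)
    return b"".join(chunks)


read_end, write_end = os.pipe()
os.set_blocking(read_end, False)

early = read_available(read_end)      # reader runs before the writer
os.write(write_end, b"lease-bytes")   # placeholder payload, not a real lease
os.close(write_end)
late = read_until_eof(read_end)       # reader waits for the writer to finish
os.close(read_end)
```

  Here `early` is empty and `late` holds the full payload, which is the
  difference between the failing and the serialized behaviour.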
  
  [Test Plan]
  
  - We have a customer who can reproduce this every few VM boots (more
  often than 1 in every 8 boots). To validate the patch, we can have them
  confirm that the error does not occur across a large number of VM
  boots, e.g. 40.
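
  To make the check across many boots less manual, the per-boot logs
  could be scanned for the two failure markers quoted in the traceback
  above. The sketch below is only an assumption about how that tally
  might be automated; the function names are hypothetical and collecting
  the log text for each boot is left to the tester.

```python
from typing import Iterable

# Strings taken from the failure log quoted in this report.
FAILURE_MARKERS = ("dhcpcd is not running", "NoDHCPLeaseError")


def boot_hit_bug(log_text: str) -> bool:
    """Return True if one boot's log contains a known failure marker."""
    return any(marker in log_text for marker in FAILURE_MARKERS)


def count_failed_boots(boot_logs: Iterable[str]) -> int:
    """Count how many of the collected boot logs show the failure."""
    return sum(1 for log_text in boot_logs if boot_hit_bug(log_text))
```

  With the patch applied, the count over the collected boots is expected
  to be zero.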
  
  - A second test is to forcibly reproduce the issue with the method
  described in [5], which targets dhcpcd specifically (removing the
  cloud-init context in which the issue was discovered):

  (sleep 0.1; sudo cat /var/lib/dhcpcd/<Interface>.lease) | dhcpcd --dumplease

  This breaks without the patch but should succeed with the proposed
  dhcpcd, though it requires the additional '-' argument to forcibly
  wait on stdin, as described in [6].
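
  For repeated runs, the same delayed-writer shape can be driven from a
  script. The following is a minimal sketch, not part of the upstream
  reproducer: the helper name is hypothetical, and ECHO_CMD is a
  stand-in reader used only to show the mechanics. On a test machine the
  reader command would be the dhcpcd invocation above and the data would
  be the contents of the lease file.

```python
import subprocess
import sys
import time

# Stand-in reader that copies stdin to stdout (an assumption for
# demonstration; replace with the dhcpcd command on a test machine).
ECHO_CMD = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read())"]


def run_with_delayed_stdin(reader_cmd, data: bytes, delay: float = 0.1):
    """Start the reader first, then write to its stdin after a delay.

    This mirrors `(sleep 0.1; cat file) | reader`: the reader is already
    running while its stdin pipe is still empty.
    """
    proc = subprocess.Popen(
        reader_cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    time.sleep(delay)
    # communicate() writes the data, closes stdin, and collects output.
    out, err = proc.communicate(input=data)
    return proc.returncode, out, err
```

  A non-zero return code or an error on stderr from the real reader
  would indicate that it did not wait for the delayed input.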
  
  [What can go wrong]
  
  - Serializing the operations can introduce a slight performance
  degradation, but it is necessary for a correct result (avoiding the
  crash). Moreover, the degradation only applies when users
  intentionally force dhcpcd to wait on stdin.
  
  [Other Info]
  
  This is fixed upstream in 10.0.9, which means Noble is the only affected
  series, as Plucky has 10.1.0.
  
  [1] https://github.com/NetworkConfiguration/dhcpcd/issues/285
  [2] https://github.com/NetworkConfiguration/dhcpcd/issues/286
  [3] https://github.com/NetworkConfiguration/dhcpcd/commit/25806878c9975dd769e2e193eae22f470ef4c71a
  [4] https://github.com/NetworkConfiguration/dhcpcd/commit/4395477920b77dc82d4dc0bfe9fb626132c9d3e1
  [5] https://github.com/NetworkConfiguration/dhcpcd/issues/285#issuecomment-1900583257
  [6] https://github.com/NetworkConfiguration/dhcpcd/pull/289#issue-2091822946

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2131252

Title:
  [SRU] Intermittent Failures when Configuring a DHCP Lease on Noble
  when dhcpcd Client is Connected to stdin Via a Pipe
