[Touch-packages] [Bug 1926139] Re: dhclient: thread concurrency race leads to DHCPOFFER packets not being received

2023-01-22 Thread Chris Patterson
Great work, Mauricio. I think you make several excellent points, and I
appreciate your efforts on a better reproducer and alternative patch.
FWIW, I began testing Matthew's initial build (which disabled threads)
against a large number of VMs, and that appeared to address the issues
we're seeing.  I'm cutting those tests short and updating them now to
use your patch as provided by Matthew, and we'll see how that goes!

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to isc-dhcp in Ubuntu.
https://bugs.launchpad.net/bugs/1926139

Title:
  dhclient: thread concurrency race leads to DHCPOFFER packets not being
  received

Status in bind9-libs package in Ubuntu:
  Fix Released
Status in isc-dhcp package in Ubuntu:
  Invalid
Status in bind9-libs source package in Focal:
  In Progress
Status in bind9-libs source package in Jammy:
  In Progress

Bug description:
  [Impact]

  Occasionally, during instance boot or machine start-up, dhclient will
  attempt to acquire a dhcp lease and fail, leaving the instance with no
  IP address and making it unreachable.

  This happens about once every 100 reboots on bare metal. On Microsoft
  Azure, Chris Patterson in comment #2 describes it as affecting between
  ~0.3% and 2% of deployments. Azure uses dhclient called from
  cloud-init instead of systemd-networkd, and this is causing issues
  with larger deployments.
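
  To make the scale effect concrete, here is a rough sketch of the
  arithmetic. It assumes each boot fails independently with the same
  probability, which the report does not claim; the function names are
  illustrative only.

```python
def prob_at_least_one_failure(p: float, n: int) -> float:
    """Chance that at least one of n boots fails, given per-boot failure
    probability p. ASSUMPTION: boots fail independently of each other."""
    return 1.0 - (1.0 - p) ** n


def expected_failures(p: float, n: int) -> float:
    """Expected number of failed boots among n, under the same assumption."""
    return p * n


if __name__ == "__main__":
    # Rates quoted in the report: ~0.3% to 2% on Azure, ~1% on bare metal.
    for p in (0.003, 0.01, 0.02):
        for n in (100, 1000):
            print(f"p={p:.3f} n={n}: "
                  f"P(>=1 failure)={prob_at_least_one_failure(p, n):.3f}, "
                  f"expected failures={expected_failures(p, n):.1f}")
```

  Even at the low end of the quoted range, a deployment of a few
  hundred instances is more likely than not to contain at least one
  unreachable instance under this model.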

  The logs of an affected dhclient produce the following:

  Listening on LPF/enp1s0/52:54:00:1c:d7:00
  Sending on   LPF/enp1s0/52:54:00:1c:d7:00
  Sending on   Socket/fallback
  DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 3 (xid=0xd222950f)
  DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 5 (xid=0xd222950f)
  ...
  (omitting 20 similar lines)
  ...
  DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 13 (xid=0xd222950f)
  DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 8 (xid=0xd222950f)
  DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 6 (xid=0xd222950f)
  No DHCPOFFERS received.
  No working leases in persistent database - sleeping.

  Full log: https://paste.ubuntu.com/p/8yBfw2KR5h/
  Log of a working run: https://paste.ubuntu.com/p/N3ZgqrxyQD/

  The bizarre thing is that when you tcpdump dhclient, you see all
  DHCPDISCOVER packets being replied to with DHCPOFFER packets, but the
  got_one() callback is never called, dhclient does not read these
  DHCPOFFER packets, and it continues sending DHCPDISCOVER packets. Once
  it reaches 25 DHCPDISCOVER packets sent, it gives up.
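
  A minimal sketch of how a captured dhclient log could be checked for
  this symptom. The DHCPDISCOVER and "No DHCPOFFERS received." strings
  are taken from the log above; the "DHCPOFFER" line prefix for a
  working run and the function name are assumptions.

```python
def summarize_dhclient_log(text: str) -> dict:
    """Count DISCOVER/OFFER lines in dhclient output and flag the symptom
    described above: DISCOVERs sent, no OFFER logged, client gave up."""
    discovers = 0
    offers = 0
    gave_up = False
    for line in text.splitlines():
        s = line.strip()
        if s.startswith("DHCPDISCOVER on "):
            discovers += 1
        elif s.startswith("DHCPOFFER"):  # assumed prefix in a working run
            offers += 1
        elif s.startswith("No DHCPOFFERS received"):
            gave_up = True
    return {
        "discovers": discovers,
        "offers": offers,
        "gave_up": gave_up,
        # Symptom only: confirming the race still needs a packet capture
        # showing that OFFERs actually reached the interface.
        "symptom": gave_up and offers == 0 and discovers > 0,
    }
```

  The flag only says the log looks like the failure. A DHCP server that
  never answered produces the same log, so the tcpdump comparison is
  what separates the two cases.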

  tcpdump: 
https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5641810/+files/test.pcap
  Screenshot of Wireshark: 
https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5641811/+files/Screenshot_2023-01-17-16-14-21_1920x1200%250A1920x1080%250A1920x1080.png

  This behaviour led several bug reporters to believe it was a kernel
  issue, with the kernel not pushing DHCPOFFER packets to dhclient. This
  is not the case. The actual problem is a thread concurrency race
  condition in dhclient: when the race occurs, the read socket is closed
  prematurely, and dhclient does not read any of the DHCPOFFER replies.

  The full explanation is in the "Other Info" section, but the fix for
  this is to change bind9-libs from being built multithreaded, back to
  single threaded as intended by dhclient maintainers.

  In Focal and Jammy, isc-dhcp links against bind9 libraries provided in
  bind9-libs, while in Kinetic onward isc-dhcp uses an in-tree bind9
  library, which is already configured properly with
  --disable-threads.

  Change the Focal and Jammy bind9-libs to --disable-threads and update
  symbol files to reflect the library is single threaded again.
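
  One way to sanity-check a running dhclient after such a rebuild is to
  look at its thread count. The sketch below reads the "Threads:" field
  from /proc/<pid>/status on Linux. That a single-threaded build shows
  exactly one thread is an assumption, not something stated in this
  report, and the helper names are illustrative.

```python
def thread_count_from_status(status_text: str) -> int:
    """Extract the thread count from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("no 'Threads:' field found")


def read_thread_count(pid: int) -> int:
    """Read the thread count of a live process (Linux procfs only)."""
    with open(f"/proc/{pid}/status") as f:
        return thread_count_from_status(f.read())
```

  For example, read_thread_count(pid) could be called with the PID of a
  dhclient started by the test script and compared before and after the
  library change.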

  [Testcase]

  Start a fresh Focal or Jammy instance.

  Download test-parallel.sh, make it executable, and edit some lines:

  1) wget https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5593045/+files/test-parallel.sh
  2) chmod +x test-parallel.sh
  3) vim test-parallel.sh

  Change iface="enp5s0" to your interface, likely iface="enp1s0".
  Comment out the line "#   cp bionic-dhclient $workdir/dhclient".

  4) sudo ./test-parallel.sh

  After five minutes, if the issue reproduces, you will see "TEST
  FAILED".

  You can watch the output with:

  5) cat /tmp/dhclient-* | less

  Next, for instrumented runs, you need to build dhclient from source.

  1) sudo apt install build-essential devscripts
  2) apt source isc-dhcp
  3) sudo apt build-dep isc-dhcp
  4) cd isc-dhcp

  Apply the below patch:

  https://paste.ubuntu.com/p/hGsssrVyG4/

  5) patch -p1 < ~/patch.patch
  6) debuild -b -uc -us
  7) cd ..
  8) sudo dpkg -i isc-dhcp-client-*
  9) sudo ./test-parallel.sh
  10) cat /tmp/dhclient-* | less

  Look for the race, as described in "Other Info", namely:

 

[Touch-packages] [Bug 1989190] Re: Bionic networking failures after NIC reordering

2022-09-09 Thread Chris Patterson
Reproducer script for both variants of systemd.

** Attachment added: "reproducer script"
   
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1989190/+attachment/5614805/+files/lp1989190-reproducer.sh

** Description changed:

- Partially documented in https://bugs.launchpad.net/bugs/1958280 and
+ Documented across https://bugs.launchpad.net/bugs/1958280 and
  https://canonical.force.com/ua/s/case/5004K0E96qlQAB/vf-nic-not-
  getting-renamed-properly-for-ubuntu-2004.
  
- Splitting these reports to focus on Bionic, because it's different than
- 20.04+ and last week's failure
+ Creating this bug to focus on Bionic, because it's different than 20.04+
+ and last week's failure
  https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1988119 helped me
  identify part of the root cause.
  
  When NICs are renamed on boot, networkd tends to fail to configure them.
  
  
  # WITHOUT THE PROPOSED SYSTEMD PATCH
  
  
  cpatterson@test-ubu1804-nicrenamerepro-x1:~$ networkctl list
  IDX LINK TYPE     OPERATIONAL SETUP
-   1 lo   loopback carrier     unmanaged
-   2 eth0 ether    routable    configured
-   3 eth1 ether    n/a         unmanaged
-   4 eth2 ether    routable    configured
-   5 eth3 ether    routable    configured
-   6 eth4 ether    routable    configured
-   7 eth5 ether    off         unmanaged
-   8 eth6 ether    off         unmanaged
-   9 eth7 ether    off         unmanaged
- 
+   1 lo   loopback carrier     unmanaged
+   2 eth0 ether    routable    configured
+   3 eth1 ether    n/a         unmanaged
+   4 eth2 ether    routable    configured
+   5 eth3 ether    routable    configured
+   6 eth4 ether    routable    configured
+   7 eth5 ether    off         unmanaged
+   8 eth6 ether    off         unmanaged
+   9 eth7 ether    off         unmanaged
  
  ### As expected, we can see the properties are missing.
  
  cpatterson@test-ubu1804-nicrenamerepro-x1:~$ sudo udevadm info /sys/class/net/eth7
  P: /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/0022481f-69aa-0022-481f-69aa0022481f/net/eth7
  E: DEVPATH=/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/0022481f-69aa-0022-481f-69aa0022481f/net/rename9
  E: ID_NET_NAME_MAC=enx0022481f69aa
  E: ID_OUI_FROM_DATABASE=Microsoft Corporation
  E: ID_PATH=acpi-VMBUS:01
  E: ID_PATH_TAG=acpi-VMBUS_01
  E: IFINDEX=9
  E: INTERFACE=eth1
  E: SUBSYSTEM=net
  E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/rename9 /sys/subsystem/net/devices/eth1 /sys/subsystem/net/devices/cirename0 /sys/subsystem/net/devices/eth7
  E: TAGS=:systemd:
  E: USEC_INITIALIZED=11203606
  
  ### As expected, restarting networkd does not fix the issue.
  
  cpatterson@test-ubu1804-nicrenamerepro-x1:~$ sudo systemctl restart systemd-networkd
  cpatterson@test-ubu1804-nicrenamerepro-x1:~$ networkctl list
  IDX LINK TYPE     OPERATIONAL SETUP
-   1 lo   loopback carrier     unmanaged
-   2 eth0 ether    routable    configured
-   3 eth1 ether    off         unmanaged
-   4 eth2 ether    routable    configured
-   5 eth3 ether    routable    configured
-   6 eth4 ether    routable    configured
-   7 eth5 ether    off         unmanaged
-   8 eth6 ether    off         unmanaged
-   9 eth7 ether    off         unmanaged
+   1 lo   loopback carrier     unmanaged
+   2 eth0 ether    routable    configured
+   3 eth1 ether    off         unmanaged
+   4 eth2 ether    routable    configured
+   5 eth3 ether    routable    configured
+   6 eth4 ether    routable    configured
+   7 eth5 ether    off         unmanaged
+   8 eth6 ether    off         unmanaged
+   9 eth7 ether    off         unmanaged
  
  9 links listed.
  
  
  # WITH THE PROPOSED SYSTEMD PATCH
  
  
  I built systemd with the proposed patches in
  https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1988119.  With
  these patches, networking still comes up broken, but restarting networkd
  does fix things.
  
  cpatterson@test-ubu1804-nicrenamerepro-systemd55-x2:~$ networkctl list
  IDX LINK TYPE   

[Touch-packages] [Bug 1989190] [NEW] Bionic networking failures after NIC reordering

2022-09-09 Thread Chris Patterson
Public bug reported:

Documented across https://bugs.launchpad.net/bugs/1958280 and
https://canonical.force.com/ua/s/case/5004K0E96qlQAB/vf-nic-not-
getting-renamed-properly-for-ubuntu-2004.

Creating this bug to focus on Bionic, because it's different than 20.04+
and last week's failure
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1988119 helped me
identify part of the root cause.

When NICs are renamed on boot, networkd tends to fail to configure them.


# WITHOUT THE PROPOSED SYSTEMD PATCH


cpatterson@test-ubu1804-nicrenamerepro-x1:~$ networkctl list
IDX LINK TYPE     OPERATIONAL SETUP
  1 lo   loopback carrier     unmanaged
  2 eth0 ether    routable    configured
  3 eth1 ether    n/a         unmanaged
  4 eth2 ether    routable    configured
  5 eth3 ether    routable    configured
  6 eth4 ether    routable    configured
  7 eth5 ether    off         unmanaged
  8 eth6 ether    off         unmanaged
  9 eth7 ether    off         unmanaged

### As expected, we can see the properties are missing.

cpatterson@test-ubu1804-nicrenamerepro-x1:~$ sudo udevadm info /sys/class/net/eth7
P: /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/0022481f-69aa-0022-481f-69aa0022481f/net/eth7
E: DEVPATH=/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/0022481f-69aa-0022-481f-69aa0022481f/net/rename9
E: ID_NET_NAME_MAC=enx0022481f69aa
E: ID_OUI_FROM_DATABASE=Microsoft Corporation
E: ID_PATH=acpi-VMBUS:01
E: ID_PATH_TAG=acpi-VMBUS_01
E: IFINDEX=9
E: INTERFACE=eth1
E: SUBSYSTEM=net
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/rename9 /sys/subsystem/net/devices/eth1 /sys/subsystem/net/devices/cirename0 /sys/subsystem/net/devices/eth7
E: TAGS=:systemd:
E: USEC_INITIALIZED=11203606
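
The missing properties can be checked mechanically. This sketch parses
the "E: KEY=VALUE" lines of `udevadm info` output and reports which of
a few properties are absent. The list of expected properties is an
assumption, inferred by comparing this listing with the patched-systemd
listing later in this report. The parser also assumes each property sits
on a single line, as in real terminal output.

```python
# ASSUMPTION: these keys appear on a correctly processed NIC; inferred from
# the with-patch listing in this report, not from udev documentation.
EXPECTED_PROPERTIES = ("ID_NET_DRIVER", "ID_NET_LINK_FILE", "ID_NET_NAME")


def parse_udev_properties(text: str) -> dict:
    """Collect 'E: KEY=VALUE' lines from `udevadm info` output into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("E: ") and "=" in line:
            key, value = line[3:].split("=", 1)
            props[key] = value
    return props


def missing_properties(props: dict, expected=EXPECTED_PROPERTIES) -> list:
    """Return the expected property names that are absent from props."""
    return [key for key in expected if key not in props]
```

Run against the output above, missing_properties() would return all
three names, since none of them appear in the listing.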

### As expected, restarting networkd does not fix the issue.

cpatterson@test-ubu1804-nicrenamerepro-x1:~$ sudo systemctl restart systemd-networkd
cpatterson@test-ubu1804-nicrenamerepro-x1:~$ networkctl list
IDX LINK TYPE     OPERATIONAL SETUP
  1 lo   loopback carrier     unmanaged
  2 eth0 ether    routable    configured
  3 eth1 ether    off         unmanaged
  4 eth2 ether    routable    configured
  5 eth3 ether    routable    configured
  6 eth4 ether    routable    configured
  7 eth5 ether    off         unmanaged
  8 eth6 ether    off         unmanaged
  9 eth7 ether    off         unmanaged

9 links listed.
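
To compare runs without eyeballing the tables, the listing can be
parsed. This is a small sketch that assumes the five
whitespace-separated columns shown in the header (IDX, LINK, TYPE,
OPERATIONAL, SETUP); the function names are illustrative.

```python
def parse_networkctl_list(text: str) -> list:
    """Parse `networkctl list` rows into dicts; header and footer lines
    are skipped because they do not start with a numeric index."""
    links = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 5 and fields[0].isdigit():
            idx, link, link_type, operational, setup = fields
            links.append({
                "idx": int(idx),
                "link": link,
                "type": link_type,
                "operational": operational,
                "setup": setup,
            })
    return links


def unconfigured_ethernet(links: list) -> list:
    """Names of ether links whose SETUP column is not 'configured'."""
    return [l["link"] for l in links
            if l["type"] == "ether" and l["setup"] != "configured"]
```

For the listing above this would report eth1, eth5, eth6 and eth7,
before and after the restart alike.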


# WITH THE PROPOSED SYSTEMD PATCH


I built systemd with the proposed patches in
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1988119.  With
these patches, networking still comes up broken, but restarting networkd
does fix things.

cpatterson@test-ubu1804-nicrenamerepro-systemd55-x2:~$ networkctl list
IDX LINK TYPE     OPERATIONAL SETUP
  1 lo   loopback carrier     unmanaged
  2 eth0 ether    routable    configured
  3 eth1 ether    n/a         unmanaged
  4 eth2 ether    n/a         unmanaged
  5 eth3 ether    n/a         unmanaged
  6 eth4 ether    routable    configured
  7 eth5 ether    n/a         unmanaged
  8 eth6 ether    n/a         unmanaged
  9 eth7 ether    n/a         unmanaged

9 links listed.

cpatterson@test-ubu1804-nicrenamerepro-systemd55-x2:~$ sudo udevadm info /sys/class/net/eth1
P: /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/0022482b-f769-0022-482b-f7690022482b/net/eth1
E: DEVPATH=/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/0022482b-f769-0022-482b-f7690022482b/net/rename3
E: ID_NET_DRIVER=hv_netvsc
E: ID_NET_LINK_FILE=/run/systemd/network/10-netplan-eth7.link
E: ID_NET_NAME=eth1
E: ID_NET_NAME_MAC=enx0022482bf769
E: ID_OUI_FROM_DATABASE=Microsoft Corporation
E: ID_PATH=acpi-VMBUS:01
E: ID_PATH_TAG=acpi-VMBUS_01
E: IFINDEX=3
E: INTERFACE=eth7
E: NM_UNMANAGED=1
E: SUBSYSTEM=net
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/rename3 /sys/subsystem/net/devices/eth7 /sys/subsystem/net/devices/eth1
E: TAGS=:systemd:
E: USEC_INITIALIZED=10280176

cpatterson@test-ubu1804-nicrenamerepro-systemd55-x2:~$ sudo systemctl restart systemd-networkd
cpatterson@test-ubu1804-nicrenamerepro-systemd55-x2:~$ networkctl list
IDX LINK TYPE     OPERATIONAL SETUP
  1 lo   loopback carrier     unmanaged
  2 

[Touch-packages] [Bug 1926139] Re: dhclient doesn't receive dhcp offer from kernel

2022-05-26 Thread Chris Patterson
We've been investigating a similar issue in Ubuntu 20.04 (and now 22.04)
on Azure where Running PPS re-use fails to perform DHCP for 5 minutes:
dhclient is run by cloud-init, but sees no DHCPOFFER.  The rate varies
for unknown reasons, but it has affected ~0.3-2% of deployments in this
scenario over time.

We instrumented our images to capture network traffic and see what is
happening, and sure enough DHCP offers are coming through to the guest
but dhclient doesn't see them.  We instrumented dhclient and the
"got_one()" callback is never invoked in these failures.

18.04 does not have this issue.

This behavior can be reproduced multiple ways:
- Reproduce a test environment similar to the above scenario using
  cloud-init (switch the hyperv nic to a different vnet while waiting for
  the link status to reset, then perform dhcp).  This test case will
  reproduce in ~1,500 runs, though it varies and requires a more complex
  setup.
- Repeatedly run dhclient in a loop until it fails (see
  test-sequential.sh).  It may take a while, but even this simple test
  will reproduce this behavior in ~50k runs for me in an LXD VM.
- Simply launch instances of dhclient in parallel (see
  test-parallel.sh).  There is an excellent chance at least one of those
  dhclients will fail this way.
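
The "run in a loop until it fails" approach can be sketched as below.
This is not the attached test-sequential.sh, whose contents are not
reproduced here; it is an illustrative harness with hypothetical names.
The command to run is left to the caller, and "success" is assumed to
mean exit status 0 within a timeout.

```python
import subprocess


def command_succeeds(argv: list, timeout_s: float) -> bool:
    """Run one attempt; treat a non-zero exit or a timeout as failure.
    ASSUMPTION: the caller's command exits 0 when a lease is obtained."""
    try:
        result = subprocess.run(argv, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def run_until_failure(run_once, max_runs: int):
    """Call run_once() up to max_runs times. Return the 1-based number of
    the first failing run, or None if every run succeeded."""
    for attempt in range(1, max_runs + 1):
        if not run_once():
            return attempt
    return None
```

A caller would wrap its own dhclient invocation, for example
run_until_failure(lambda: command_succeeds(my_cmd, 330), 50000), where
my_cmd and the 330-second timeout are placeholders chosen by the tester.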

I noticed the uprev of bind9 libs in focal:
focal (net): 1:9.11.16+dfsg-3~build1
focal-updates (net): 1:9.11.16+dfsg-3~ubuntu1
impish (net): 1:9.11.19+dfsg-2.1ubuntu1
jammy (net): 1:9.11.19+dfsg-2.1ubuntu3
kinetic (net): 1:9.11.19+dfsg-2.1ubuntu3

I couldn't find any related issue on the isc-dhcp tracker, etc.  I did
build dhclient from the Debian master branch
(https://salsa.debian.org/debian/isc-dhcp/-/commits/master/debian) which
uses the in-tree bind libs and that seems to have addressed the issue
for all scenarios.  Not that it helps much to bisect this just yet.

** Attachment added: "parallel test"
   
https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5593045/+files/test-parallel.sh

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to isc-dhcp in Ubuntu.
https://bugs.launchpad.net/bugs/1926139

Title:
  dhclient doesn't receive dhcp offer from kernel

Status in isc-dhcp package in Ubuntu:
  New

Bug description:
  Platform: Qemu/libvirt on AMD64
  Ubuntu version: 20.04
  isc-dhcp-client version: 4.4.1-2.1ubuntu5
  Problem: When dhclient is used during boot every few reboots the DHCP
  OFFER packets aren't pushed from the kernel to dhclient. The DISCOVER
  packets can be seen in strace and tcpdump. The OFFER packets can be
  seen in tcpdump, but no read event is triggered.
  Ubuntu 18.04 doesn't have the problem, neither does Debian 10.
  Building these dhclient versions on Ubuntu 20.04 alleviates the
  problem a little, but it still occurs. So this issue might also be
  kernel related.

  Attached diff shows a strace of all threads and a pcap showing the
  tcpdump output.

  Edit:
  - Sometimes the dhclient command does receive the OFFER packet and
    connection is restored.
  - In my testing running dhclient manually from the terminal when the
    OFFERs aren't received will result in a new dhclient session which
    does receive the OFFER packet and connection is restored.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


[Touch-packages] [Bug 1926139] Re: dhclient doesn't receive dhcp offer from kernel

2022-05-26 Thread Chris Patterson
** Attachment added: "sequential test"
   
https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5593046/+files/test-sequential.sh
