[Touch-packages] [Bug 1949723] Re: systemd-resolved segfault in hashmap_iterate_entry
** Changed in: systemd (Ubuntu Focal)
       Status: New => In Progress

** Changed in: systemd (Ubuntu Focal)
   Importance: Low => Medium

** Changed in: systemd (Ubuntu Focal)
     Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Tags added: sts

--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1949723

Title:
  systemd-resolved segfault in hashmap_iterate_entry

Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Focal:
  In Progress

Bug description:
  installed libnss-resolve that put "resolve" in nsswitch.conf.

  $ lsb_release -rd
  Description: Ubuntu 20.04.3 LTS
  Release: 20.04

  $ dpkg -l systemd | grep systemd
  ii systemd 245.4-4ubuntu3.13 amd64 system and service manager

  $ grep ^hosts /etc/nsswitch.conf
  hosts: files libvirt mdns4_minimal resolve [NOTFOUND=return] dns mymachines

  systemd-resolved crashed once with segmentation fault.

  (gdb) bt
  #0 0x7f119c67693a in hashmap_iterate_entry (h=h@entry=0x706f746b73656465, i=i@entry=0x7ffc4ef515d0) at ../src/basic/hashmap.c:705
  #1 0x7f119c6789d6 in internal_hashmap_iterate (h=0x706f746b73656465, i=i@entry=0x7ffc4ef515d0, value=value@entry=0x7ffc4ef515c8, key=key@entry=0x0) at ../src/basic/hashmap.c:714
  #2 0x7f119c678a8b in set_iterate (s=, i=i@entry=0x7ffc4ef515d0, value=value@entry=0x7ffc4ef515c8) at ../src/basic/hashmap.c:735
  #3 0x55ba5e0ea917 in dns_query_candidate_go (c=c@entry=0x55ba5f353180) at ../src/resolve/resolved-dns-query.c:152
  #4 0x55ba5e0e9f0c in dns_query_candidate_notify (c=c@entry=0x55ba5f353180) at ../src/resolve/resolved-dns-query.c:312
  #5 0x55ba5e0ea178 in dns_transaction_complete (t=0x55ba5f37a9d0, state=) at ../src/resolve/resolved-dns-transaction.c:351
  #6 0x55ba5e0e27cd in dns_transaction_process_dnssec (t=t@entry=0x55ba5f37a9d0) at ../src/resolve/resolved-dns-transaction.c:838
  #7 0x55ba5e0e3649 in dns_transaction_process_reply (t=t@entry=0x55ba5f37a9d0, p=p@entry=0x55ba5f39dce0) at ../src/resolve/resolved-dns-transaction.c:1210
  #8 0x55ba5e0e40ab in on_dns_packet (s=, fd=, revents=, userdata=0x55ba5f37a9d0) at ../src/resolve/resolved-dns-transaction.c:1264
  #9 0x7f119c5e6c77 in source_dispatch (s=s@entry=0x55ba5f346780) at ../src/libsystemd/sd-event/sd-event.c:3193
  #10 0x7f119c5e6f11 in sd_event_dispatch (e=e@entry=0x55ba5f320430) at ../src/libsystemd/sd-event/sd-event.c:3634
  #11 0x7f119c5e8948 in sd_event_run (e=e@entry=0x55ba5f320430, timeout=timeout@entry=18446744073709551615) at ../src/libsystemd/sd-event/sd-event.c:3692
  #12 0x7f119c5e8b6f in sd_event_loop (e=0x55ba5f320430) at ../src/libsystemd/sd-event/sd-event.c:3714
  #13 0x55ba5e0c326a in run (argv=, argc=) at ../src/resolve/resolved.c:84
  #14 main (argc=, argv=) at ../src/resolve/resolved.c:91

  This seems to have been reported upstream
  https://github.com/systemd/systemd/issues/16168

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1949723/+subscriptions

--
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
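One detail in the backtrace above may help when triaging similar crashes: the hashmap pointer in frame #0, h=0x706f746b73656465, does not resemble the other heap addresses in the trace (for example c=0x55ba5f353180 in frame #3). The Python sketch below is not part of the original report; it only decodes a pointer-sized value into the bytes an amd64 machine would hold in memory (little-endian). If the result is printable text, that would be consistent with the pointer having been overwritten by string data, but this is an inference and the report itself does not state a root cause. The helper name pointer_as_text is made up for this illustration.

```python
def pointer_as_text(value: int, width: int = 8) -> str:
    """Decode a pointer-sized integer as the bytes amd64 keeps in memory."""
    raw = value.to_bytes(width, byteorder="little")
    # Show non-printable bytes as '.' so the result is always displayable.
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


bogus_h = 0x706F746B73656465    # h= from frame #0 of the backtrace
heap_ptr = 0x55BA5F353180       # c= from frame #3, for comparison

print(pointer_as_text(bogus_h))   # fully printable ASCII
print(pointer_as_text(heap_ptr))  # mostly non-printable, as expected for an address
```

The first value decodes to eight printable ASCII characters, whereas the second does not, which is the kind of contrast worth checking before assuming a wild pointer is merely a stale address.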
[Touch-packages] [Bug 2033892] Re: ls -l triggers mount of autofs shares when --ghost option is present or browse_mode is enabled
Thank you for the help sorting out the autopkgtests, Mauricio.

--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to coreutils in Ubuntu.
https://bugs.launchpad.net/bugs/2033892

Title:
  ls -l triggers mount of autofs shares when --ghost option is present
  or browse_mode is enabled

Status in coreutils package in Ubuntu:
  Fix Released
Status in coreutils source package in Jammy:
  Fix Committed
Status in coreutils package in Fedora:
  Fix Released

Bug description:
  [Impact]

  Issuing an 'ls -l' or a 'stat' on an autofs share when you have set
  --ghost in the auto.master file, or browse_mode=yes in autofs.conf,
  will lead to the shares being mounted, when previously they were not.

  Disks / shares may not be present and the mounts may fail, leading to
  errors in your output.

  This is a behaviour change in coreutils 8.32, which occurred in the
  transition to using statx() instead of the stat()/lstat() calls used
  in previous releases.

  There don't seem to be any workarounds, apart from not running
  'ls -l' in your autofs share directory.

  [Testcase]

  Start two Jammy VMs. One will be an NFS server, the other, the client.

  NFS server:

  Server VM:
  $ sudo hostnamectl set-hostname jammy-nfs-server
  $ sudo apt update && sudo apt upgrade -y
  $ sudo apt install nfs-kernel-server
  $ sudo mkdir /export
  $ sudo mkdir /export/users
  $ sudo vi /etc/exports # add the following lines:
  /export 192.168.122.0/24(rw,fsid=0,no_subtree_check,sync)
  /export/users 192.168.122.0/24(rw,nohide,insecure,no_subtree_check,sync)
  $ sudo systemctl restart nfs-server.service

  AutoFS Client:
  $ sudo apt update
  $ sudo apt install autofs
  $ sudo vim /etc/autofs.conf
  browse_mode = yes
  $ sudo mkdir /mnt2
  $ sudo vim /etc/auto.master
  /mnt2 /etc/auto.indirect
  $ sudo vim /etc/auto.indirect
  export 192.168.122.18:/export
  export-missing 192.168.122.18:/export-missing
  $ sudo reboot
  $ cd /mnt2
  $ ls -l
  ls: cannot access 'export-missing': No such file or directory
  total 4
  drwxr-xr-x 3 root root 4096 Feb 12 21:48 export
  d? ? ?? ?? export-missing
  $ mount -l | grep /mnt2
  /etc/auto.indirect on /mnt2 type autofs (rw,relatime,fd=6,pgrp=634,timeout=300,minproto=5,maxproto=5,indirect,pipe_ino=21561)
  192.168.122.18:/export on /mnt2/export type nfs (rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.122.18,mountvers=3,mountport=35786,mountproto=udp,local_lock=none,addr=192.168.122.18)

  We see that the mount for export occurred, and that a mount of
  export-missing was attempted, but it was either bogus or the disk was
  not present, leading to a "No such file or directory" error.

  There are test packages available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf378489-test

  If you install them, this is what should occur:

  $ ls -l
  total 0
  drwxr-xr-x 2 root root 0 Feb 12 22:01 export
  drwxr-xr-x 2 root root 0 Feb 12 22:01 export-missing
  $ mount -l | grep /mnt2
  /etc/auto.indirect on /mnt2 type autofs (rw,relatime,fd=6,pgrp=636,timeout=300,minproto=5,maxproto=5,indirect,pipe_ino=18346)

  No mounts happen, and no errors either.

  [Where problems could occur]

  We are changing the behaviour of core utilities, ls and stat, such
  that they no longer attempt to mount autofs shares when the --ghost
  option is present or browse_mode is enabled.

  This was the intended behaviour in the first place, and had been this
  way for at least a decade. Upstream returned to this behaviour
  shortly after the coreutils release that introduced statx(), which
  is what caused automounts to occur.

  It is unlikely any system administrators are relying on the behaviour
  found in Jammy in any scripts or day to day operations that would be
  adversely affected by the change. The worst case scenario is that a
  user doing an 'ls -l' on an unmounted disk finds the mount doesn't
  automatically occur, and they have to attach the disk and issue the
  mount themselves.

  If a regression were to occur, it would be limited to the ls and stat
  commands, specifically when listing directories linked to autofs
  mountpoints.

  [Other info]

  The automount behaviour change was introduced upstream in version
  8.32, with the introduction of the statx() call. This means only
  Jammy is affected.

  It was quickly reverted back to how it was originally in version 9.1,
  which is already available in Mantic and onward.

  The commits that solve the issue are:

  commit 85c975df2c25bd799370b04bb294e568e001102f
  From: Rohan Sable
  Date: Mon, 7 Mar 2022 14:14:13 +
  Subject: ls: avoid triggering automounts
  Link: https://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=v9.0-177-g85c975df2c2

  commit 92cb8427c537f37edd43c5cef1909585201372ab
  From: Pádraig Brady
  Date: Mon, 7 Mar 2022 23:29:20 +
  Subject: stat: only automount with
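The test case above judges success by eye: after 'ls -l', the output of 'mount -l | grep /mnt2' should list only the autofs map and no NFS mount. As a rough aid for anyone repeating the check, here is a small Python sketch that makes the same judgement from captured 'mount -l' text. It is not part of the bug report; the function names are invented for this example, and the sample strings are abbreviated copies of the output quoted in the description (most NFS mount options are trimmed).

```python
import re

# One line of `mount -l` output: "<source> on <target> type <fstype> (<options>)".
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) type (\S+) \(")


def mounts_under(mount_output: str, prefix: str):
    """Return (source, target, fstype) for every mount at or below prefix."""
    base = prefix.rstrip("/")
    found = []
    for line in mount_output.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if not match:
            continue
        source, target, fstype = match.groups()
        if target == base or target.startswith(base + "/"):
            found.append((source, target, fstype))
    return found


def triggered_mounts(mount_output: str, prefix: str):
    """Mounts below prefix other than the autofs map itself."""
    return [m for m in mounts_under(mount_output, prefix) if m[2] != "autofs"]


# Abbreviated samples based on the output quoted in the bug description.
BEFORE_FIX = """\
/etc/auto.indirect on /mnt2 type autofs (rw,relatime,fd=6,pgrp=634,timeout=300,indirect)
192.168.122.18:/export on /mnt2/export type nfs (rw,relatime,vers=3,hard,proto=tcp)
"""
AFTER_FIX = """\
/etc/auto.indirect on /mnt2 type autofs (rw,relatime,fd=6,pgrp=636,timeout=300,indirect)
"""

print(triggered_mounts(BEFORE_FIX, "/mnt2"))  # the NFS mount that 'ls -l' triggered
print(triggered_mounts(AFTER_FIX, "/mnt2"))   # empty list: nothing was mounted
```

An empty result after running 'ls -l' in /mnt2 corresponds to the "No mounts happen" outcome described for the test packages.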
[Touch-packages] [Bug 2033892] Re: ls -l triggers mount of autofs shares when --ghost option is present or browse_mode is enabled
Performing verification for Jammy

I set up two Jammy VMs, one an nfs-server and the other an autofs/nfs-
client. The client is using coreutils 8.32-4.1ubuntu1.1 from -updates.

$ apt-cache policy coreutils | grep Installed
  Installed: 8.32-4.1ubuntu1.1

I set up the nfs server and autofs mounts as the Testcase indicates.

$ ls -l
ls: cannot access 'export-missing': No such file or directory
total 4
drwxr-xr-x 3 root root 4096 Mar 20 22:16 export
d? ? ?? ?? export-missing
$ mount -l | grep mnt2
/etc/auto.indirect on /mnt2 type autofs (rw,relatime,fd=6,pgrp=692,timeout=300,minproto=5,maxproto=5,indirect,pipe_ino=21588)
192.168.122.65:/export on /mnt2/export type nfs (rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.122.65,mountvers=3,mountport=47718,mountproto=udp,local_lock=none,addr=192.168.122.65)

The shares were not mounted beforehand, but when I issue 'ls -l', the
mounts occur, which is not wanted, and we error out on the non-existent
export-missing mount.

I then enabled -proposed, and installed coreutils 8.32-4.1ubuntu1.2.

$ apt-cache policy coreutils | grep Installed
  Installed: 8.32-4.1ubuntu1.2

From there, let's try the 'ls -l':

$ ls -l
total 0
drwxr-xr-x 2 root root 0 Mar 20 22:25 export
drwxr-xr-x 2 root root 0 Mar 20 22:25 export-missing
$ mount -l | grep mnt2
/etc/auto.indirect on /mnt2 type autofs (rw,relatime,fd=6,pgrp=648,timeout=300,minproto=5,maxproto=5,indirect,pipe_ino=16856)

This time the mounts do not occur; we just get a listing of the possible
autofs mounts. We can confirm with 'mount -l' that nothing was actually
mounted.

The package in -proposed fixes the issue. Happy to mark verified for
Jammy.

** Tags removed: verification-needed verification-needed-jammy
** Tags added: verification-done-jammy

--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to coreutils in Ubuntu.
[Touch-packages] [Bug 2044420] Re: gtkpod segfaults when attempting to display songs
Attached is a debdiff for Mantic which fixes this issue.

** Patch added: "Debdiff for gtkpod on mantic"
   https://bugs.launchpad.net/ubuntu/+source/gtkpod/+bug/2044420/+attachment/5757356/+files/lp2044420_mantic.debdiff

** Changed in: glib2.0 (Ubuntu Noble)
       Status: Triaged => Fix Released

** No longer affects: gtkpod (Ubuntu Noble)

--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to glib2.0 in Ubuntu.
https://bugs.launchpad.net/bugs/2044420

Title:
  gtkpod segfaults when attempting to display songs

Status in GLib:
  Fix Released
Status in glib2.0 package in Ubuntu:
  Fix Released
Status in gtkpod package in Ubuntu:
  New
Status in glib2.0 source package in Mantic:
  Triaged
Status in gtkpod source package in Mantic:
  New
Status in glib2.0 source package in Noble:
  Fix Released

Bug description:
  Open gtkpod, and select your iPod from the list. If it has more than
  one screenful of songs to display in the list, gtkpod will
  immediately segfault.

  I haven't found a workaround yet.

  Broken on Mantic, works on Lunar.

  Thread 1 "gtkpod" received signal SIGSEGV, Segmentation fault.
  __GI___wcsxfrm_l (dest=0x0, src=0x0, n=0, l=0x76fff5a0 <_nl_global_locale>) at ../string/strxfrm_l.c:685
  685 ../string/strxfrm_l.c: No such file or directory.
  (gdb) bt
  #0 __GI___wcsxfrm_l (dest=0x0, src=0x0, n=0, l=0x76fff5a0 <_nl_global_locale>) at ../string/strxfrm_l.c:685
  #1 0x770c5a5e in g_utf8_collate_key () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #2 0x77f852ec in fuzzy_skip_prefix () at /lib/x86_64-linux-gnu/libgtkpod.so.1
  #3 0x7fffa80980ca in ??? () at /usr/lib/x86_64-linux-gnu/gtkpod/libsorttab_display.so
  #4 0x7fffa80997fd in normal_sort_tab_page_add_track () at /usr/lib/x86_64-linux-gnu/gtkpod/libsorttab_display.so
  #5 0x7fffa8099526 in normal_sort_tab_page_add_track () at /usr/lib/x86_64-linux-gnu/gtkpod/libsorttab_display.so
  #6 0x7fffa809f196 in sorttab_display_select_playlist_cb () at /usr/lib/x86_64-linux-gnu/gtkpod/libsorttab_display.so
  #7 0x7718d130 in g_closure_invoke () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
  #8 0x771ba4ac in ??? () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
  #9 0x771ab9b1 in ??? () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
  #10 0x771abbd6 in g_signal_emit_valist () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
  #11 0x771abc93 in g_signal_emit () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
  #12 0x77f67e4b in gtkpod_set_current_playlist () at /lib/x86_64-linux-gnu/libgtkpod.so.1
  #13 0x7fffa807cce0 in ??? () at /usr/lib/x86_64-linux-gnu/gtkpod/libplaylist_display.so
  #14 0x7708ba11 in ??? () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #15 0x770e746f in ??? () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #16 0x7708c46f in g_main_loop_run () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #17 0x777f61ed in gtk_main () at /lib/x86_64-linux-gnu/libgtk-3.so.0
  #18 0xea1f in main ()

To manage notifications about this bug go to:
https://bugs.launchpad.net/glib/+bug/2044420/+subscriptions
[Touch-packages] [Bug 2044420] Re: gtkpod segfaults when attempting to display songs
gtkpod has been removed from debian, and thus removed from noble, so no
need to fix there.

--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to glib2.0 in Ubuntu.
https://bugs.launchpad.net/bugs/2044420
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
I have been running the test packages on AWS with the reproducer running
for 20 days now, and they are still running great. The change to direct
IO really does fix this issue, and my testing has removed any and all
concerns of causing a regression.

Previously, Focal wouldn't last more than 20 minutes, and Jammy onward,
a week.

I will get these patches sponsored now. Sorry for the delay, Krister.

--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu.
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum
  mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  In Progress
Status in e2fsprogs source package in Trusty:
  Won't Fix
Status in e2fsprogs source package in Xenial:
  Won't Fix
Status in e2fsprogs source package in Bionic:
  Won't Fix
Status in e2fsprogs source package in Focal:
  In Progress
Status in e2fsprogs source package in Jammy:
  In Progress
Status in e2fsprogs source package in Lunar:
  Won't Fix
Status in e2fsprogs source package in Mantic:
  In Progress

Bug description:
  [Impact]

  This is a long running bug plaguing cloud-images, where on a rare
  occasion resize2fs would fail and the image would not resize to fit
  the entire disk.

  Online resizes would fail due to a superblock checksum mismatch, where
  the superblock in memory differs from what is currently on disk due to
  changes made to the image.

  $ resize2fs /dev/nvme1n1p1
  resize2fs 1.47.0 (5-Feb-2023)
  resize2fs: Superblock checksum does not match superblock while trying to open /dev/nvme1n1p1
  Couldn't find valid filesystem superblock.

  Changing the read of the superblock to Direct I/O solves the issue.

  [Testcase]

  Start a c5.large instance on AWS, and attach a 60 GB gp3 volume for
  use as a scratch disk.

  Run the following script, courtesy of Krister Johansen and his team:

  #!/usr/bin/bash
  set -euxo pipefail

  while true
  do
          parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
          sleep .5
          mkfs.ext4 /dev/nvme1n1p1
          mount -t ext4 /dev/nvme1n1p1 /mnt
          stress-ng --temp-path /mnt -D 4 &
          STRESS_PID=$!
          sleep 1
          growpart /dev/nvme1n1 1
          resize2fs /dev/nvme1n1p1
          kill $STRESS_PID
          wait $STRESS_PID
          umount /mnt
          wipefs -a /dev/nvme1n1p1
          wipefs -a /dev/nvme1n1
  done

  Test packages are available in the following ppa:
  https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test

  If you install the test packages, the race no longer occurs.

  [Where problems could occur]

  We are changing how resize2fs reads the superblock from underlying
  disks.

  If a regression were to occur, resize2fs could fail to resize offline
  or online volumes. As all cloud-images are online resized during
  their initial boot, this could have a large impact on public and
  private clouds should a regression occur.

  [Other info]

  Upstream mailing list discussion:
  https://lore.kernel.org/linux-ext4/20230605225221.ga5...@templeofstupid.com/
  https://lore.kernel.org/linux-ext4/20230609042239.ga1436...@mit.edu/

  This was fixed in the below commit upstream:

  commit 43a498e938887956f393b5e45ea6ac79cc5f4b84
  Author: Theodore Ts'o
  Date: Thu, 15 Jun 2023 00:17:01 -0400
  Subject: resize2fs: use Direct I/O when reading the superblock for
   online resizes
  Link: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84

  The commit has not been tagged to any release. All supported Ubuntu
  releases require this fix, and need to be published in standard non-
  ESM archives to be picked up in cloud images.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/2036467/+subscriptions
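To make the "read the superblock with Direct I/O" idea in the description above more concrete, here is a minimal Python sketch. It is not the e2fsprogs change (that fix is in C inside resize2fs); it only illustrates reading the first superblock with or without O_DIRECT so that the read bypasses the page cache. The function names, the 4096-byte read size and the synthetic image are assumptions made for this example. The offsets used (superblock at byte 1024, 16-bit little-endian magic 0xEF53 at offset 0x38 within it) are the standard ext2/3/4 layout. Checksum verification is deliberately left out.

```python
import mmap
import os
import tempfile

SUPERBLOCK_OFFSET = 1024   # primary ext2/3/4 superblock starts 1 KiB into the device
SUPERBLOCK_SIZE = 1024
MAGIC_OFFSET = 0x38        # s_magic field, 16-bit little-endian
EXT_MAGIC = 0xEF53
READ_SIZE = 4096           # assumed to satisfy O_DIRECT alignment on the device


def read_superblock(path: str, direct: bool = False) -> bytes:
    """Read the primary superblock, optionally bypassing the page cache."""
    flags = os.O_RDONLY
    if direct:
        flags |= os.O_DIRECT  # Linux-only; needs aligned buffer, offset and length
    # An anonymous mmap is page-aligned, which O_DIRECT requires of the buffer.
    buf = mmap.mmap(-1, READ_SIZE)
    try:
        fd = os.open(path, flags)
        with os.fdopen(fd, "rb", buffering=0) as dev:
            got = dev.readinto(buf)
        if got < SUPERBLOCK_OFFSET + SUPERBLOCK_SIZE:
            raise ValueError("short read: no complete superblock")
        return bytes(buf[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + SUPERBLOCK_SIZE])
    finally:
        buf.close()


def has_ext_magic(superblock: bytes) -> bool:
    """Check the ext2/3/4 magic number in a raw superblock."""
    magic = int.from_bytes(superblock[MAGIC_OFFSET:MAGIC_OFFSET + 2], "little")
    return magic == EXT_MAGIC


# Demo on a synthetic 4 KiB image (buffered read, so it works on any filesystem).
image = bytearray(READ_SIZE)
pos = SUPERBLOCK_OFFSET + MAGIC_OFFSET
image[pos:pos + 2] = EXT_MAGIC.to_bytes(2, "little")
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(image)
try:
    sb = read_superblock(tmp.name)
    print(has_ext_magic(sb))
finally:
    os.unlink(tmp.name)
```

On a real block device one would call read_superblock("/dev/nvme1n1p1", direct=True) as root; the demo uses a buffered read because some filesystems that hold temporary files reject O_DIRECT.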
[Touch-packages] [Bug 2033892] Re: ls -l triggers mount of autofs shares when --ghost option is present or browse_mode is enabled
** Description changed:

- Release: 22.04.3 LTS
- coreutils 8.32-4.1ubuntu1
+ [Impact]

- ls triggers unwanted mounts of autofs filesystems
+ Issuing an 'ls -l' or a 'stat' on an autofs share when you have set
+ --ghost in the auto.master file, or browse_mode=yes in autofs.conf,
+ will lead to the shares being mounted, when previously they were not.

- cause: coreutils 8.32-4.1ubuntu1 uses statx(), which does not pass the
- AT_NO_AUTOMOUNT flag
+ Disks / shares may not be present and the mounts may fail, leading to
+ errors in your output.

- This bug is also known (and fixed) at Red Hat
- https://bugzilla.redhat.com/show_bug.cgi?id=2044981
+ This is a behaviour change in coreutils 8.32, which occurred in the
+ transition to using statx() instead of the stat()/lstat() calls used
+ in previous releases.

- upstream commits:
- https://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=v9.0-177-g85c975df2c2
- https://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=v9.0-178-g92cb8427c53
+ There don't seem to be any workarounds, apart from not running
+ 'ls -l' in your autofs share directory.

- fedora commit
- https://src.fedoraproject.org/rpms/coreutils/c/d736cafa20f13eeb037a3950bdbb4b63dc39b7e3?branch=f35
+ [Testcase]
+
+ Start two Jammy VMs. One will be an NFS server, the other, the client.
+
+ NFS server:
+
+ Server VM:
+ $ sudo hostnamectl set-hostname jammy-nfs-server
+ $ sudo apt update && sudo apt upgrade -y
+ $ sudo apt install nfs-kernel-server
+ $ sudo mkdir /export
+ $ sudo mkdir /export/users
+ $ sudo vi /etc/exports # add the following lines:
+ /export 192.168.122.0/24(rw,fsid=0,no_subtree_check,sync)
+ /export/users 192.168.122.0/24(rw,nohide,insecure,no_subtree_check,sync)
+ $ sudo systemctl restart nfs-server.service
+
+ AutoFS Client:
+ $ sudo apt update
+ $ sudo apt install autofs
+ $ sudo vim /etc/autofs.conf
+ browse_mode = yes
+ $ sudo mkdir /mnt2
+ $ sudo vim /etc/auto.master
+ /mnt2 /etc/auto.indirect
+ $ sudo vim /etc/auto.indirect
+ export 192.168.122.18:/export
+ export-missing 192.168.122.18:/export-missing
+ $ sudo reboot
+ $ cd /mnt2
+ $ ls -l
+ ls: cannot access 'export-missing': No such file or directory
+ total 4
+ drwxr-xr-x 3 root root 4096 Feb 12 21:48 export
+ d? ? ?? ?? export-missing
+ $ mount -l | grep /mnt2
+ /etc/auto.indirect on /mnt2 type autofs (rw,relatime,fd=6,pgrp=634,timeout=300,minproto=5,maxproto=5,indirect,pipe_ino=21561)
+ 192.168.122.18:/export on /mnt2/export type nfs (rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.122.18,mountvers=3,mountport=35786,mountproto=udp,local_lock=none,addr=192.168.122.18)
+
+ We see that the mount for export occurred, and that a mount of
+ export-missing was attempted, but it was either bogus or the disk was
+ not present, leading to a "No such file or directory" error.
+
+ There are test packages available in the following ppa:
+
+ https://launchpad.net/~mruffell/+archive/ubuntu/sf378489-test
+
+ If you install them, this is what should occur:
+
+ $ ls -l
+ total 0
+ drwxr-xr-x 2 root root 0 Feb 12 22:01 export
+ drwxr-xr-x 2 root root 0 Feb 12 22:01 export-missing
+ $ mount -l | grep /mnt2
+ /etc/auto.indirect on /mnt2 type autofs (rw,relatime,fd=6,pgrp=636,timeout=300,minproto=5,maxproto=5,indirect,pipe_ino=18346)
+
+ No mounts happen, and no errors either.
+
+ [Where problems could occur]
+
+ We are changing the behaviour of core utilities, ls and stat, such
+ that they no longer attempt to mount autofs shares when the --ghost
+ option is present or browse_mode is enabled.
+
+ This was the intended behaviour in the first place, and had been this
+ way for at least a decade. Upstream returned to this behaviour
+ shortly after the coreutils release that introduced statx(), which
+ is what caused automounts to occur.
+
+ It is unlikely any system administrators are relying on the behaviour
+ found in Jammy in any scripts or day to day operations that would be
+ adversely affected by the change. The worst case scenario is that a user
+ doing an 'ls -l' on an unmounted disk finds the mount doesn't
+ automatically occur, and they have to attach the disk and issue the
+ mount themselves.
+
+ If a regression were to occur, it would be limited to the ls and stat
+ commands, specifically when listing directories linked to autofs
+ mountpoints.
+
+ [Other info]
+
+ The automount behaviour change was introduced upstream in version 8.32,
+ with the introduction of the statx() call. This means only Jammy is
+ affected.
+
+ It was quickly reverted back to how it was originally in version 9.1,
+ which is already available in Mantic and onward.
+
+ The commits that solve the issue are:
+
+ commit 85c975df2c25bd799370b04bb294e568e001102f
+ From: Rohan Sable
+ Date: Mon, 7 Mar 2022 14:14:13 +
+ Subject: ls: avoid triggering automounts
+ Link:
[Touch-packages] [Bug 2033892] Re: ls -l triggers mount of autofs shares when --ghost option is present or browse_mode is enabled
Attached is a debdiff that solves this issue on Jammy.

** Patch added: "Debdiff for coreutils on Jammy"
   https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/2033892/+attachment/5745181/+files/lp2033892_jammy.debdiff

--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to coreutils in Ubuntu.
https://bugs.launchpad.net/bugs/2033892

Title:
  ls -l triggers mount of autofs shares when --ghost option is present
  or browse_mode is enabled

Status in coreutils package in Ubuntu:
  Fix Released
Status in coreutils source package in Jammy:
  In Progress
Status in coreutils package in Fedora:
  Fix Released

Bug description:
  Release: 22.04.3 LTS
  coreutils 8.32-4.1ubuntu1

  ls triggers unwanted mounts of autofs filesystems

  cause: coreutils 8.32-4.1ubuntu1 uses statx(), which does not pass the
  AT_NO_AUTOMOUNT flag

  This bug is also known (and fixed) at Red Hat
  https://bugzilla.redhat.com/show_bug.cgi?id=2044981

  upstream commits:
  https://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=v9.0-177-g85c975df2c2
  https://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=v9.0-178-g92cb8427c53

  fedora commit
  https://src.fedoraproject.org/rpms/coreutils/c/d736cafa20f13eeb037a3950bdbb4b63dc39b7e3?branch=f35

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/2033892/+subscriptions
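The cause named in the description above is a statx() call made without the AT_NO_AUTOMOUNT flag. The Python sketch below illustrates how that flag is combined with the other statx() flags, and how the call can be reached through ctypes. It is an illustration only, not the coreutils patch; the function names are invented here. The numeric constants are the Linux header values for AT_FDCWD, AT_SYMLINK_NOFOLLOW, AT_NO_AUTOMOUNT and STATX_BASIC_STATS, and the buffer layout assumed is the 256-byte struct statx with stx_mode at byte offset 28.

```python
import ctypes
import os

# Values from the Linux headers <fcntl.h> and <linux/stat.h>.
AT_FDCWD = -100
AT_SYMLINK_NOFOLLOW = 0x100
AT_NO_AUTOMOUNT = 0x800
STATX_BASIC_STATS = 0x7FF


def statx_flags(follow_symlinks: bool, allow_automount: bool) -> int:
    """Build the flags argument for statx()."""
    flags = 0
    if not follow_symlinks:
        flags |= AT_SYMLINK_NOFOLLOW
    if not allow_automount:
        # With this flag, looking at an autofs mount point does not trigger a mount.
        flags |= AT_NO_AUTOMOUNT
    return flags


def statx_mode(path: str, flags: int):
    """Return stx_mode for path via statx(), or None if libc lacks statx()."""
    libc = ctypes.CDLL(None, use_errno=True)
    if not hasattr(libc, "statx"):
        return None
    buf = ctypes.create_string_buffer(256)  # sizeof(struct statx)
    rc = libc.statx(AT_FDCWD, os.fsencode(path), flags, STATX_BASIC_STATS, buf)
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    return ctypes.c_uint16.from_buffer(buf, 28).value  # stx_mode


# Flags for an lstat-like lookup that must not trigger automounts.
print(hex(statx_flags(follow_symlinks=False, allow_automount=False)))
```

On a Linux system with a glibc that exports statx(), calling statx_mode("/mnt2/export", statx_flags(False, False)) would inspect the autofs placeholder directory without asking the automounter to mount the share.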
[Touch-packages] [Bug 2033892] Re: ls -l triggers mount of autofs shares when --ghost option is present or browse_mode is enabled
** Also affects: coreutils (Ubuntu Jammy)
   Importance: Undecided
       Status: New

** Changed in: coreutils (Ubuntu Jammy)
       Status: New => In Progress

** Changed in: coreutils (Ubuntu Jammy)
   Importance: Undecided => Medium

** Changed in: coreutils (Ubuntu Jammy)
     Assignee: (unassigned) => Matthew Ruffell (mruffell)

--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to coreutils in Ubuntu.
https://bugs.launchpad.net/bugs/2033892
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
Hi Krister,

Fascinating. I'm in New Zealand, so I use ap-southeast-2 in Sydney,
Australia for all my instances, and it never occurred to me that this
could depend on how busy EBS is in the availability zone.

I'll move my instances to us-west-2.

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu.
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum
  mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  In Progress
Status in e2fsprogs source package in Trusty:
  Won't Fix
Status in e2fsprogs source package in Xenial:
  Won't Fix
Status in e2fsprogs source package in Bionic:
  Won't Fix
Status in e2fsprogs source package in Focal:
  In Progress
Status in e2fsprogs source package in Jammy:
  In Progress
Status in e2fsprogs source package in Lunar:
  Won't Fix
Status in e2fsprogs source package in Mantic:
  In Progress

Bug description:
  [Impact]

  This is a long running bug plaguing cloud-images, where on rare
  occasions resize2fs fails and the image is not resized to fit the
  entire disk.

  Online resizes fail due to a superblock checksum mismatch, where the
  superblock in memory differs from what is currently on disk due to
  changes made to the image.

  $ resize2fs /dev/nvme1n1p1
  resize2fs 1.47.0 (5-Feb-2023)
  resize2fs: Superblock checksum does not match superblock while trying to open /dev/nvme1n1p1
  Couldn't find valid filesystem superblock.

  Changing the read of the superblock to Direct I/O solves the issue.

  [Testcase]

  Start a c5.large instance on AWS, and attach a 60gb gp3 volume for
  use as a scratch disk.

  Run the following script, courtesy of Krister Johansen and his team:

  #!/usr/bin/bash
  set -euxo pipefail

  while true
  do
      parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
      sleep .5
      mkfs.ext4 /dev/nvme1n1p1
      mount -t ext4 /dev/nvme1n1p1 /mnt
      stress-ng --temp-path /mnt -D 4 &
      STRESS_PID=$!
      sleep 1
      growpart /dev/nvme1n1 1
      resize2fs /dev/nvme1n1p1
      kill $STRESS_PID
      wait $STRESS_PID
      umount /mnt
      wipefs -a /dev/nvme1n1p1
      wipefs -a /dev/nvme1n1
  done

  Test packages are available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test

  If you install the test packages, the race no longer occurs.

  [Where problems could occur]

  We are changing how resize2fs reads the superblock from underlying
  disks.

  If a regression were to occur, resize2fs could fail to resize offline
  or online volumes. As all cloud-images are online resized during
  their initial boot, this could have a large impact on public and
  private clouds should a regression occur.

  [Other info]

  Upstream mailing list discussion:
  https://lore.kernel.org/linux-ext4/20230605225221.ga5...@templeofstupid.com/
  https://lore.kernel.org/linux-ext4/20230609042239.ga1436...@mit.edu/

  This was fixed in the below commit upstream:

  commit 43a498e938887956f393b5e45ea6ac79cc5f4b84
  Author: Theodore Ts'o
  Date: Thu, 15 Jun 2023 00:17:01 -0400
  Subject: resize2fs: use Direct I/O when reading the superblock for
   online resizes
  Link: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84

  The commit has not been tagged to any release. All supported Ubuntu
  releases require this fix, and it needs to be published in the
  standard non-ESM archives to be picked up in cloud images.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/2036467/+subscriptions
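For anyone who wants to peek at the on-disk superblock without going through the page cache, the Direct I/O read mentioned in the bug description can be approximated from the shell. This is only a rough sketch and not the resize2fs patch: the function name read_sb_magic is made up for illustration, and the offsets (superblock at byte 1024, s_magic at offset 0x38 inside it) are assumptions taken from the ext4 on-disk layout rather than from this bug. It falls back to a buffered read where O_DIRECT is not supported.

```shell
#!/usr/bin/bash
# Hypothetical helper: print the ext4 magic number (expected: ef53) read
# from a block device or filesystem image.
read_sb_magic() {
    local dev=$1 block
    block=$(mktemp)
    # Prefer O_DIRECT (dd iflag=direct) so the read bypasses the page cache;
    # fall back to a buffered read where O_DIRECT is unsupported.
    dd if="$dev" of="$block" bs=4096 count=1 iflag=direct 2>/dev/null ||
        dd if="$dev" of="$block" bs=4096 count=1 2>/dev/null
    # Assumed layout: superblock at byte 1024, s_magic at 0x38 within it,
    # so the two magic bytes live at offset 1080, stored little-endian.
    set -- $(od -An -tx1 -j1080 -N2 "$block")
    rm -f "$block"
    # Swap the two bytes to print the value in the usual big-endian notation.
    echo "$2$1"
}
```

Running `read_sb_magic /dev/nvme1n1p1` as root on the scratch disk from the testcase should print ef53 while the filesystem exists.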
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
Hi Krister,

I have finally seen this occur in real life with my own two eyes!

You are absolutely correct, the 4-retry is sometimes not sufficient.

The reproducer triggers on Focal and earlier releases in about 20
minutes, so it's easy to see the issue there. But Focal and earlier
releases don't retry at all.

On Jammy, Mantic and Noble, it took about a week straight, but I
managed to get it to trigger for each of them.

Start  Tue Jan 16 01:57:20 UTC 2024  Tue Jan 16 02:18:53 UTC 2024
End    Tue Jan 23 20:12:28 UTC 2024  Tue Jan 23 14:32:08 UTC 2024

The 4-retry does help, and helps quite a lot really.

Anyway, I upgraded my test environment to the test packages, and I
will leave them running for a week. If things look good then, I'll get
these patches sponsored for SRU.

Sorry for the delay, but I really wanted to see it fail on Jammy,
Mantic and Noble before we go patching them.

Thanks,
Matthew

-- 
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum
  mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  In Progress
Status in e2fsprogs source package in Trusty:
  Won't Fix
Status in e2fsprogs source package in Xenial:
  Won't Fix
Status in e2fsprogs source package in Bionic:
  Won't Fix
Status in e2fsprogs source package in Focal:
  In Progress
Status in e2fsprogs source package in Jammy:
  In Progress
Status in e2fsprogs source package in Lunar:
  Won't Fix
Status in e2fsprogs source package in Mantic:
  In Progress
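Since time-to-reproduce varies so much between releases, it can help to have the reproducer record how long it ran before failing. The following is a minimal sketch of one way to do that; run_until_fail is a hypothetical helper name, not something from the bug or from e2fsprogs, and it assumes the reproducer has been split so that one iteration is a single command or function.

```shell
#!/usr/bin/bash
# Hypothetical helper: run a command repeatedly until it returns non-zero,
# then report how many iterations succeeded and how long that took.
run_until_fail() {
    local start n=0
    start=$(date +%s)
    # Each successful run of the command counts as one iteration.
    while "$@"; do
        n=$((n + 1))
    done
    echo "failed after $n successful iterations, $(( $(date +%s) - start ))s elapsed"
}
```

For example, `run_until_fail ./one_iteration.sh`, where one_iteration.sh (also a made-up name) holds the body of the loop from the testcase, prints a single summary line once resize2fs fails.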
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
Hi Krister,

I apologise for the delay. The main issue I have been having with
testing is that it reproduces significantly faster on some releases
than others, and on some releases I still haven't managed to reproduce
it even once.

I'll set up some fresh reproducers now, and leave them running.

If you want to help test, there are test packages for all releases in:

https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test

Regardless, I'll try to move this forward.

Thanks,
Matthew

-- 
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum
  mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  In Progress
Status in e2fsprogs source package in Trusty:
  Won't Fix
Status in e2fsprogs source package in Xenial:
  Won't Fix
Status in e2fsprogs source package in Bionic:
  Won't Fix
Status in e2fsprogs source package in Focal:
  In Progress
Status in e2fsprogs source package in Jammy:
  In Progress
Status in e2fsprogs source package in Lunar:
  In Progress
Status in e2fsprogs source package in Mantic:
  In Progress
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
Attached is a patch for noble that solves this issue.

** Patch added: "Debdiff for e2fsprogs on noble"
   https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5738302/+files/lp2036467_noble.debdiff

-- 
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum
  mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  In Progress
Status in e2fsprogs source package in Trusty:
  Won't Fix
Status in e2fsprogs source package in Xenial:
  Won't Fix
Status in e2fsprogs source package in Bionic:
  Won't Fix
Status in e2fsprogs source package in Focal:
  In Progress
Status in e2fsprogs source package in Jammy:
  In Progress
Status in e2fsprogs source package in Lunar:
  In Progress
Status in e2fsprogs source package in Mantic:
  In Progress
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
Attached is a V2 patch for mantic with a different version number, due
to it no longer being the devel release.

** Patch removed: "Debdiff for e2fsprogs on mantic"
   https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707893/+files/lp2036467_mantic.debdiff

** Patch added: "Debdiff for e2fsprogs on mantic V2"
   https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5738301/+files/lp2036467_mantic_v2.debdiff

-- 
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum
  mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  In Progress
Status in e2fsprogs source package in Trusty:
  Won't Fix
Status in e2fsprogs source package in Xenial:
  Won't Fix
Status in e2fsprogs source package in Bionic:
  Won't Fix
Status in e2fsprogs source package in Focal:
  In Progress
Status in e2fsprogs source package in Jammy:
  In Progress
Status in e2fsprogs source package in Lunar:
  In Progress
Status in e2fsprogs source package in Mantic:
  In Progress
[Touch-packages] [Bug 2044420] Re: gtkpod segfaults when attempting to display songs
Upstream bug: https://gitlab.gnome.org/GNOME/glib/-/issues/3185

** Bug watch added: gitlab.gnome.org/GNOME/glib/-/issues #3185
   https://gitlab.gnome.org/GNOME/glib/-/issues/3185

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to glib2.0 in Ubuntu.
https://bugs.launchpad.net/bugs/2044420

Title:
  gtkpod segfaults when attempting to display songs

Status in glib2.0 package in Ubuntu:
  New
Status in gtkpod package in Ubuntu:
  New
Status in glib2.0 source package in Mantic:
  New
Status in gtkpod source package in Mantic:
  New
Status in glib2.0 source package in Noble:
  New
Status in gtkpod source package in Noble:
  New

Bug description:
  Open gtkpod, and select your iPod from the list. If it has more than
  one screenful of songs to display in the list, gtkpod will
  immediately segfault.

  I haven't found a workaround yet.

  Broken on Mantic, works on Lunar.

  Thread 1 "gtkpod" received signal SIGSEGV, Segmentation fault.
  __GI___wcsxfrm_l (dest=0x0, src=0x0, n=0, l=0x76fff5a0 <_nl_global_locale>) at ../string/strxfrm_l.c:685
  685 ../string/strxfrm_l.c: No such file or directory.
  (gdb) bt
  #0 __GI___wcsxfrm_l (dest=0x0, src=0x0, n=0, l=0x76fff5a0 <_nl_global_locale>) at ../string/strxfrm_l.c:685
  #1 0x770c5a5e in g_utf8_collate_key () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #2 0x77f852ec in fuzzy_skip_prefix () at /lib/x86_64-linux-gnu/libgtkpod.so.1
  #3 0x7fffa80980ca in ??? () at /usr/lib/x86_64-linux-gnu/gtkpod/libsorttab_display.so
  #4 0x7fffa80997fd in normal_sort_tab_page_add_track () at /usr/lib/x86_64-linux-gnu/gtkpod/libsorttab_display.so
  #5 0x7fffa8099526 in normal_sort_tab_page_add_track () at /usr/lib/x86_64-linux-gnu/gtkpod/libsorttab_display.so
  #6 0x7fffa809f196 in sorttab_display_select_playlist_cb () at /usr/lib/x86_64-linux-gnu/gtkpod/libsorttab_display.so
  #7 0x7718d130 in g_closure_invoke () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
  #8 0x771ba4ac in ??? () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
  #9 0x771ab9b1 in ??? () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
  #10 0x771abbd6 in g_signal_emit_valist () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
  #11 0x771abc93 in g_signal_emit () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
  #12 0x77f67e4b in gtkpod_set_current_playlist () at /lib/x86_64-linux-gnu/libgtkpod.so.1
  #13 0x7fffa807cce0 in ??? () at /usr/lib/x86_64-linux-gnu/gtkpod/libplaylist_display.so
  #14 0x7708ba11 in ??? () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #15 0x770e746f in ??? () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #16 0x7708c46f in g_main_loop_run () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #17 0x777f61ed in gtk_main () at /lib/x86_64-linux-gnu/libgtk-3.so.0
  #18 0xea1f in main ()

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/glib2.0/+bug/2044420/+subscriptions
[Touch-packages] [Bug 2044420] Re: gtkpod segfaults when attempting to display songs
** Also affects: glib2.0 (Ubuntu)
   Importance: Undecided
       Status: New

-- 
https://bugs.launchpad.net/bugs/2044420

Title:
  gtkpod segfaults when attempting to display songs

Status in glib2.0 package in Ubuntu:
  New
Status in gtkpod package in Ubuntu:
  New
Status in glib2.0 source package in Mantic:
  New
Status in gtkpod source package in Mantic:
  New
Status in glib2.0 source package in Noble:
  New
Status in gtkpod source package in Noble:
  New
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
** Description changed:

  [Impact]

  This is a long running bug plaguing cloud-images, where on a rare
  occasion resize2fs would fail and the image would not resize to fit the
  entire disk.

  Online resizes would fail due to a superblock checksum mismatch, where
  the superblock in memory differs from what is currently on disk due to
  changes made to the image.

+ $ resize2fs /dev/nvme1n1p1
+ resize2fs 1.47.0 (5-Feb-2023)
+ resize2fs: Superblock checksum does not match superblock while trying to open /dev/nvme1n1p1
+ Couldn't find valid filesystem superblock.
+ 
  Changing the read of the superblock to Direct I/O solves the issue.

  [Testcase]

  Start an c5.large instance on AWS, and attach a 60gb gp3 volume for use
  as a scratch disk.

  Run the following script, courtesy of Krister Johansen and his team:

-#!/usr/bin/bash
-set -euxo pipefail
+ #!/usr/bin/bash
+ set -euxo pipefail

-while true
-do
-parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
-sleep .5
-mkfs.ext4 /dev/nvme1n1p1
-mount -t ext4 /dev/nvme1n1p1 /mnt
-stress-ng --temp-path /mnt -D 4 &
-STRESS_PID=$!
-sleep 1
-growpart /dev/nvme1n1 1
-resize2fs /dev/nvme1n1p1
-kill $STRESS_PID
-wait $STRESS_PID
-umount /mnt
-wipefs -a /dev/nvme1n1p1
-wipefs -a /dev/nvme1n1
-done
+ while true
+ do
+ parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
+ sleep .5
+ mkfs.ext4 /dev/nvme1n1p1
+ mount -t ext4 /dev/nvme1n1p1 /mnt
+ stress-ng --temp-path /mnt -D 4 &
+ STRESS_PID=$!
+ sleep 1
+ growpart /dev/nvme1n1 1
+ resize2fs /dev/nvme1n1p1
+ kill $STRESS_PID
+ wait $STRESS_PID
+ umount /mnt
+ wipefs -a /dev/nvme1n1p1
+ wipefs -a /dev/nvme1n1
+ done

  Test packages are available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test

  If you install the test packages, the race no longer occurs.

  [Where problems could occur]

  We are changing how resize2fs reads the superblock from underlying
  disks.

  If a regression were to occur, resize2fs could fail to resize offline or
  online volumes. As all cloud-images are online resized during their
  initial boot, this could have a large impact to public and private
  clouds should a regression occur.

  [Other info]

- Upstream mailing list discussion:
+ Upstream mailing list discussion:
  https://lore.kernel.org/linux-ext4/20230605225221.ga5...@templeofstupid.com/
  https://lore.kernel.org/linux-ext4/20230609042239.ga1436...@mit.edu/

  This was fixed in the below commit upstream:

  commit 43a498e938887956f393b5e45ea6ac79cc5f4b84
  Author: Theodore Ts'o
  Date: Thu, 15 Jun 2023 00:17:01 -0400
  Subject: resize2fs: use Direct I/O when reading the superblock for
- online resizes
+ online resizes
  Link: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84

  The commit has not been tagged to any release. All supported Ubuntu
  releases require this fix, and need to be published in standard non-ESM
  archives to be picked up in cloud images.

** Changed in: e2fsprogs (Ubuntu Bionic)
       Status: In Progress => Won't Fix

-- 
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum
  mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  In Progress
Status in e2fsprogs source package in Trusty:
  Won't Fix
Status in e2fsprogs source package in Xenial:
  Won't Fix
Status in e2fsprogs source package in Bionic:
  Won't Fix
Status in e2fsprogs source package in Focal:
  In Progress
Status in e2fsprogs source package in Jammy:
  In Progress
Status in e2fsprogs source package in Lunar:
  In Progress
Status in e2fsprogs source package in Mantic:
  In Progress
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
@juliank

I'm just doing a little bit more testing for the moment, as I really
want to make sure this isn't going to cause any issues in the cloud
images.

It would be nice to have this bug fixed though, I have seen a few
cases related to it over the years.

I'll ask my SEG colleagues for help with sponsoring in a day or two.

-- 
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum
  mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  In Progress
Status in e2fsprogs source package in Trusty:
  Won't Fix
Status in e2fsprogs source package in Xenial:
  Won't Fix
Status in e2fsprogs source package in Bionic:
  In Progress
Status in e2fsprogs source package in Focal:
  In Progress
Status in e2fsprogs source package in Jammy:
  In Progress
Status in e2fsprogs source package in Lunar:
  In Progress
Status in e2fsprogs source package in Mantic:
  In Progress
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
** Summary changed:

- superblock checksum mismatch in resize2fs
+ Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs

** Description changed:

- Hi,
- We run ext4 on EBS volumes on EC2. During provisioning, cloud-init will occasionally report that resize2fs has failed due to a superblock checksum mismatch. We debugged this internally, and were able to come up with the following reproducer:
+ [Impact]
+
+ This is a long running bug plaguing cloud-images, where on a rare
+ occasion resize2fs would fail and the image would not resize to fit the
+ entire disk.
+
+ Online resizes would fail due to a superblock checksum mismatch, where
+ the superblock in memory differs from what is currently on disk due to
+ changes made to the image.
+
+ Changing the read of the superblock to Direct I/O solves the issue.
+
+ [Testcase]
+
+ Start an c5.large instance on AWS, and attach a 60gb gp3 volume for use
+ as a scratch disk.
+
+ Run the following script, courtesy of Krister Johansen and his team:

  #!/usr/bin/bash
  set -euxo pipefail

  while true
  do
          parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
          sleep .5
          mkfs.ext4 /dev/nvme1n1p1
          mount -t ext4 /dev/nvme1n1p1 /mnt
          stress-ng --temp-path /mnt -D 4 &
          STRESS_PID=$!
          sleep 1
          growpart /dev/nvme1n1 1
          resize2fs /dev/nvme1n1p1
          kill $STRESS_PID
          wait $STRESS_PID
          umount /mnt
          wipefs -a /dev/nvme1n1p1
          wipefs -a /dev/nvme1n1
  done

- (This was on a 60gb gp3 volume attached to a c5.4xlarge)
+ Test packages are available in the following ppa:

- We were able to find a fix that works and get the patch accepted
- upstream. The short explanation is that by switching the superblock
- read to direct io, we no longer see the problem.
+ https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test

- The patch is available here, but hasn't been published in a released
- version of e2fsprogs:
+ If you install the test packages, the race no longer occurs.

- https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84
+ [Where problems could occur]

- A longer thread with the maintainer is available here:
+ We are changing how resize2fs reads the superblock from underlying
+ disks.
+ If a regression were to occur, resize2fs could fail to resize offline or
+ online volumes. As all cloud-images are online resized during their
+ initial boot, this could have a large impact to public and private
+ clouds should a regression occur.
+
+ [Other info]
+
+ Upstream mailing list discussion:
+ https://lore.kernel.org/linux-ext4/20230605225221.ga5...@templeofstupid.com/
  https://lore.kernel.org/linux-ext4/20230609042239.ga1436...@mit.edu/

- This bug report is to request that Ubuntu backport this patch to the
- versions of e2fsprogs that are in releases that are available in images
- on AWS, preferably Focal and Jammy.
+ This was fixed in the below commit upstream:
+
+ commit 43a498e938887956f393b5e45ea6ac79cc5f4b84
+ Author: Theodore Ts'o
+ Date: Thu, 15 Jun 2023 00:17:01 -0400
+ Subject: resize2fs: use Direct I/O when reading the superblock for
+ online resizes
+ Link: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84
+
+ The commit has not been tagged to any release. All supported Ubuntu
+ releases require this fix, and need to be published in standard non-ESM
+ archives to be picked up in cloud images.

** Tags added: sts

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu.
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  In Progress
Status in e2fsprogs source package in Trusty:
  In Progress
Status in e2fsprogs source package in Xenial:
  In Progress
Status in e2fsprogs source package in Bionic:
  In Progress
Status in e2fsprogs source package in Focal:
  In Progress
Status in e2fsprogs source package in Jammy:
  In Progress
Status in e2fsprogs source package in Lunar:
  In Progress
Status in e2fsprogs source package in Mantic:
  In Progress

Bug description:
  [Impact]

  This is a long running bug plaguing cloud-images, where on a rare occasion resize2fs would fail and the image would not resize to fit the entire disk.

  Online resizes would fail due to a superblock checksum mismatch, where the superblock in memory differs from what is currently on disk due to changes made to the image.

  Changing the read of the superblock to Direct I/O solves the issue.

  [Testcase]

  Start an c5.large instance on AWS, and attach a 60gb gp3 volume for use as a
[Touch-packages] [Bug 2036467] Re: superblock checksum mismatch in resize2fs
Attached is a debdiff for e2fsprogs on trusty which fixes this issue.

** Patch added: "Debdiff for e2fsprogs on trusty"
   https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707900/+files/lp2036467_trusty.debdiff

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu.
https://bugs.launchpad.net/bugs/2036467

Title:
  superblock checksum mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  In Progress
Status in e2fsprogs source package in Trusty:
  In Progress
Status in e2fsprogs source package in Xenial:
  In Progress
Status in e2fsprogs source package in Bionic:
  In Progress
Status in e2fsprogs source package in Focal:
  In Progress
Status in e2fsprogs source package in Jammy:
  In Progress
Status in e2fsprogs source package in Lunar:
  In Progress
Status in e2fsprogs source package in Mantic:
  In Progress

Bug description:
  Hi,

  We run ext4 on EBS volumes on EC2. During provisioning, cloud-init will occasionally report that resize2fs has failed due to a superblock checksum mismatch. We debugged this internally, and were able to come up with the following reproducer:

  #!/usr/bin/bash
  set -euxo pipefail

  while true
  do
          parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
          sleep .5
          mkfs.ext4 /dev/nvme1n1p1
          mount -t ext4 /dev/nvme1n1p1 /mnt
          stress-ng --temp-path /mnt -D 4 &
          STRESS_PID=$!
          sleep 1
          growpart /dev/nvme1n1 1
          resize2fs /dev/nvme1n1p1
          kill $STRESS_PID
          wait $STRESS_PID
          umount /mnt
          wipefs -a /dev/nvme1n1p1
          wipefs -a /dev/nvme1n1
  done

  (This was on a 60gb gp3 volume attached to a c5.4xlarge)

  We were able to find a fix that works and get the patch accepted upstream. The short explanation is that by switching the superblock read to direct io, we no longer see the problem.

  The patch is available here, but hasn't been published in a released version of e2fsprogs:

  https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84

  A longer thread with the maintainer is available here:

  https://lore.kernel.org/linux-ext4/20230609042239.ga1436...@mit.edu/

  This bug report is to request that Ubuntu backport this patch to the versions of e2fsprogs that are in releases that are available in images on AWS, preferably Focal and Jammy.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/2036467/+subscriptions

--
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
[Touch-packages] [Bug 2036467] Re: superblock checksum mismatch in resize2fs
Attached is a debdiff for e2fsprogs on xenial which fixes this issue. ** Patch added: "Debdiff for e2fsprogs on xenial" https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707899/+files/lp2036467_xenial.debdiff -- You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu. https://bugs.launchpad.net/bugs/2036467 Title: superblock checksum mismatch in resize2fs Status in cloud-images: New Status in e2fsprogs package in Ubuntu: In Progress Status in e2fsprogs source package in Trusty: In Progress Status in e2fsprogs source package in Xenial: In Progress Status in e2fsprogs source package in Bionic: In Progress Status in e2fsprogs source package in Focal: In Progress Status in e2fsprogs source package in Jammy: In Progress Status in e2fsprogs source package in Lunar: In Progress Status in e2fsprogs source package in Mantic: In Progress
[Touch-packages] [Bug 2036467] Re: superblock checksum mismatch in resize2fs
Attached is a debdiff for e2fsprogs on bionic which fixes this issue. ** Patch added: "Debdiff for e2fsprogs on bionic" https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707898/+files/lp2036467_bionic.debdiff -- You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu. https://bugs.launchpad.net/bugs/2036467 Title: superblock checksum mismatch in resize2fs Status in cloud-images: New Status in e2fsprogs package in Ubuntu: In Progress Status in e2fsprogs source package in Trusty: In Progress Status in e2fsprogs source package in Xenial: In Progress Status in e2fsprogs source package in Bionic: In Progress Status in e2fsprogs source package in Focal: In Progress Status in e2fsprogs source package in Jammy: In Progress Status in e2fsprogs source package in Lunar: In Progress Status in e2fsprogs source package in Mantic: In Progress
[Touch-packages] [Bug 2036467] Re: superblock checksum mismatch in resize2fs
Attached is a debdiff for e2fsprogs on focal which fixes this issue. ** Patch added: "Debdiff for e2fsprogs on focal" https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707896/+files/lp2036467_focal.debdiff -- You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu. https://bugs.launchpad.net/bugs/2036467 Title: superblock checksum mismatch in resize2fs Status in cloud-images: New Status in e2fsprogs package in Ubuntu: In Progress Status in e2fsprogs source package in Trusty: In Progress Status in e2fsprogs source package in Xenial: In Progress Status in e2fsprogs source package in Bionic: In Progress Status in e2fsprogs source package in Focal: In Progress Status in e2fsprogs source package in Jammy: In Progress Status in e2fsprogs source package in Lunar: In Progress Status in e2fsprogs source package in Mantic: In Progress
[Touch-packages] [Bug 2036467] Re: superblock checksum mismatch in resize2fs
Attached is a debdiff for e2fsprogs on jammy which fixes this issue. ** Patch added: "Debdiff for e2fsprogs on jammy" https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707895/+files/lp2036467_jammy.debdiff -- You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu. https://bugs.launchpad.net/bugs/2036467 Title: superblock checksum mismatch in resize2fs Status in cloud-images: New Status in e2fsprogs package in Ubuntu: In Progress Status in e2fsprogs source package in Trusty: In Progress Status in e2fsprogs source package in Xenial: In Progress Status in e2fsprogs source package in Bionic: In Progress Status in e2fsprogs source package in Focal: In Progress Status in e2fsprogs source package in Jammy: In Progress Status in e2fsprogs source package in Lunar: In Progress Status in e2fsprogs source package in Mantic: In Progress
[Touch-packages] [Bug 2036467] Re: superblock checksum mismatch in resize2fs
Attached is a debdiff for e2fsprogs on lunar which fixes this issue. ** Patch added: "Debdiff for e2fsprogs on lunar" https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707894/+files/lp2036467_lunar.debdiff -- You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu. https://bugs.launchpad.net/bugs/2036467 Title: superblock checksum mismatch in resize2fs Status in cloud-images: New Status in e2fsprogs package in Ubuntu: In Progress Status in e2fsprogs source package in Trusty: In Progress Status in e2fsprogs source package in Xenial: In Progress Status in e2fsprogs source package in Bionic: In Progress Status in e2fsprogs source package in Focal: In Progress Status in e2fsprogs source package in Jammy: In Progress Status in e2fsprogs source package in Lunar: In Progress Status in e2fsprogs source package in Mantic: In Progress
[Touch-packages] [Bug 2036467] Re: superblock checksum mismatch in resize2fs
Attached is a debdiff for e2fsprogs on mantic which fixes this issue. ** Patch added: "Debdiff for e2fsprogs on mantic" https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707893/+files/lp2036467_mantic.debdiff -- You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu. https://bugs.launchpad.net/bugs/2036467 Title: superblock checksum mismatch in resize2fs Status in cloud-images: New Status in e2fsprogs package in Ubuntu: In Progress Status in e2fsprogs source package in Trusty: In Progress Status in e2fsprogs source package in Xenial: In Progress Status in e2fsprogs source package in Bionic: In Progress Status in e2fsprogs source package in Focal: In Progress Status in e2fsprogs source package in Jammy: In Progress Status in e2fsprogs source package in Lunar: In Progress Status in e2fsprogs source package in Mantic: In Progress
[Touch-packages] [Bug 2036467] Re: superblock checksum mismatch in resize2fs
** Also affects: e2fsprogs (Ubuntu Lunar)
   Importance: Undecided
   Status: New

** Also affects: e2fsprogs (Ubuntu Trusty)
   Importance: Undecided
   Status: New

** Also affects: e2fsprogs (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Also affects: e2fsprogs (Ubuntu Xenial)
   Importance: Undecided
   Status: New

** Changed in: e2fsprogs (Ubuntu Mantic)
   Status: Confirmed => In Progress

** Changed in: e2fsprogs (Ubuntu Lunar)
   Status: New => In Progress

** Changed in: e2fsprogs (Ubuntu Jammy)
   Status: New => In Progress

** Changed in: e2fsprogs (Ubuntu Focal)
   Status: New => In Progress

** Changed in: e2fsprogs (Ubuntu Bionic)
   Status: New => In Progress

** Changed in: e2fsprogs (Ubuntu Xenial)
   Status: New => In Progress

** Changed in: e2fsprogs (Ubuntu Trusty)
   Status: New => In Progress

** Changed in: e2fsprogs (Ubuntu Mantic)
   Importance: Undecided => Critical

** Changed in: e2fsprogs (Ubuntu Lunar)
   Importance: Undecided => Critical

** Changed in: e2fsprogs (Ubuntu Jammy)
   Importance: Undecided => Critical

** Changed in: e2fsprogs (Ubuntu Focal)
   Importance: Undecided => Critical

** Changed in: e2fsprogs (Ubuntu Bionic)
   Importance: Undecided => Critical

** Changed in: e2fsprogs (Ubuntu Xenial)
   Importance: Undecided => Critical

** Changed in: e2fsprogs (Ubuntu Trusty)
   Importance: Undecided => Critical

** Changed in: e2fsprogs (Ubuntu Mantic)
   Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: e2fsprogs (Ubuntu Lunar)
   Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: e2fsprogs (Ubuntu Jammy)
   Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: e2fsprogs (Ubuntu Focal)
   Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: e2fsprogs (Ubuntu Bionic)
   Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: e2fsprogs (Ubuntu Xenial)
   Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: e2fsprogs (Ubuntu Trusty)
   Assignee: (unassigned) => Matthew Ruffell (mruffell)

--
You received this bug
notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu. https://bugs.launchpad.net/bugs/2036467 Title: superblock checksum mismatch in resize2fs Status in cloud-images: New Status in e2fsprogs package in Ubuntu: In Progress Status in e2fsprogs source package in Trusty: In Progress Status in e2fsprogs source package in Xenial: In Progress Status in e2fsprogs source package in Bionic: In Progress Status in e2fsprogs source package in Focal: In Progress Status in e2fsprogs source package in Jammy: In Progress Status in e2fsprogs source package in Lunar: In Progress Status in e2fsprogs source package in Mantic: In Progress
[Touch-packages] [Bug 1999104] Re: arm64: broken c++ exception handler support leads to std::terminate() being called and program abort
Hi William,

The libunwind SRUs for Bionic and Focal have now been released to -updates. Their versions are 1.2.1-8ubuntu0.1 for Bionic, and 1.2.1-9ubuntu0.1 for Focal.

I just want to apologise for the significant delay in getting libunwind released. It really was an exceptional amount of time, and I'm sorry it took so long.

Since I wrote to you last, I root caused the issue and worked with Paride to resolve the regression that was introduced into autopkgtest itself.

The bug in autopkgtest was quite obscure, and it required all of the following to occur:

1. an all-proposed build (--apt-pocket=proposed with no package pinning)
2. multiple tests defined in d/t/control
3. the tests do not allow reusing the same testbed system

All these conditions were present in the kernel autopkgtests. The result was that the change to allow apt pinning for -proposed caused _create_apt_pinning_for_packages() to be called incorrectly, and it pinned the -release pocket at 990, above -updates and -proposed at 500 each. That meant -release was being favoured over -proposed, which caused all sorts of apt resolver issues.
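To make the pinning effect described above a little more concrete, here is a toy shell model of "the highest pin priority wins". It is only a sketch: the function name highest_pinned_pocket and the pocket=priority argument format are invented for this example, and real apt also compares package versions and per-package pins, which this ignores.

```shell
#!/bin/sh
# Toy model only: print the pocket whose pin priority is highest.
# Arguments are hypothetical "pocket=priority" pairs, one per pocket.
# On a tie, the pocket listed first is kept.
highest_pinned_pocket() {
    printf '%s\n' "$@" | awk -F= '
        NR == 1 || $2 + 0 > best { best = $2 + 0; name = $1 }
        END { print name }'
}

# The broken state described above: -release pinned above the other pockets.
highest_pinned_pocket release=990 updates=500 proposed=500
```

With the priorities from the message, the function prints release, which matches the observed behaviour of packages from -release being preferred over the ones in -proposed that the test was meant to exercise.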
The issue was introduced in:

commit 1c018c78de9d9421c0c358c900a37e545334cc66
From: Paride Legovini
Date: Thu, 15 Dec 2022 21:47:02 +0100
Subject: Pin pockets with Pin-Priority: 500
Link: https://salsa.debian.org/ci-team/autopkgtest/-/commit/1c018c78de9d9421c0c358c900a37e545334cc66

The full explanation of the autopkgtest issues can be found in the below emails:

From myself to Paride:
https://paste.ubuntu.com/p/44yFTBNBHh/

From Paride to myself:
https://paste.ubuntu.com/p/jtt5wh6BB2/

Paride's merge request:
https://salsa.debian.org/ci-team/autopkgtest/-/merge_requests/218

Final fix commit:
https://salsa.debian.org/ci-team/autopkgtest/-/commit/94b9bb8db3051123d7b29a7880420340a76c7b7e

The fix is in place on the Launchpad build infrastructure, and we re-ran all autopkgtests around libunwind and its reverse dependencies. They all passed, leaving us clear to release libunwind to -updates.

Again, I sincerely apologise for keeping you waiting for so long, and I thank you for your patience and understanding while I debugged autopkgtest.

Thanks,
Matthew

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to libunwind in Ubuntu.
https://bugs.launchpad.net/bugs/1999104

Title:
  arm64: broken c++ exception handler support leads to std::terminate() being called and program abort

Status in libunwind package in Ubuntu:
  Fix Released
Status in libunwind source package in Bionic:
  Fix Released
Status in libunwind source package in Focal:
  Fix Released

Bug description:
  [Impact]

  On architectures other than i386 and amd64, the C++ exception support in libunwind appears to be broken, always failing and calling std::terminate(), which leads to the program aborting.

  (gdb) bt
  #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
  #1 0xf7c2daac in __GI_abort () at abort.c:79
  #2 0xf7e21868 in __gnu_cxx::__verbose_terminate_handler() () from /lib/aarch64-linux-gnu/libstdc++.so.6
  #3 0xf7e1f21c in ?? () from /lib/aarch64-linux-gnu/libstdc++.so.6
  #4 0xf7e1f280 in std::terminate() () from /lib/aarch64-linux-gnu/libstdc++.so.6
  #5 0xf7e1f5e0 in __cxa_rethrow () from /lib/aarch64-linux-gnu/libstdc++.so.6
  #6 0xf7e21804 in __gnu_cxx::__verbose_terminate_handler() () from /lib/aarch64-linux-gnu/libstdc++.so.6
  #7 0xf7e1f21c in ?? () from /lib/aarch64-linux-gnu/libstdc++.so.6
  #8 0xf7e1f280 in std::terminate() () from /lib/aarch64-linux-gnu/libstdc++.so.6
  #9 0xf7e1f574 in __cxa_throw () from /lib/aarch64-linux-gnu/libstdc++.so.6
  #10 0xf7fb9f50 in function_throws_int () at lib.cpp:9
  #11 0x0d54 in main (argc=1, argv=0xfab8) at main.cpp:9

  Compiling libunwind with --enable-cxx-exceptions leads to _Unwind_RaiseException being called during __cxa_throw(), which fails to find a handler, and the generic std::terminate() is called instead, aborting the program.

  On i386 and amd64 this doesn't seem to be the case, and the libunwind handlers seem to be present.

  To fix, we enable the configure option --enable-cxx-exceptions only on i386 and amd64, in debian/rules. This lets other architectures fall back to the symbols provided by libgcc_s, whose implementation works correctly.

  [Testcase]

  Ali Sadi has provided a reproducer program.

  Start an arm64 instance, for example, a c6g.medium instance on AWS, with either Bionic or Focal.

  $ wget https://bugs.launchpad.net/ubuntu/+source/libunwind/+bug/1999104/+attachment/5635122/+files/libunwind.tar.gz
  $ sudo apt install -y build-essential libunwind-dev
  $ tar xvf libunwind.tar.gz && cd test
  $ make all

  There are two executables, main and main_unwind. main is not linked to
[Touch-packages] [Bug 1999104] Re: arm64: broken c++ exception handler support leads to std::terminate() being called and program abort
Hi William,

I sincerely apologise for the delay.

Currently libunwind is stuck in -proposed due to benign autopkgtest regressions in the kernel packages. If you go to the page below:

https://people.canonical.com/~ubuntu-archive/pending-sru.html

and search for "libunwind", you will see entries for Bionic and Focal. It is SRU policy not to release a package with current autopkgtest regressions.

I have spent more time than I am willing to admit trying to debug these failures. I have also asked the Kernel Team, several of whom took a look, and some Launchpad admins, and we are still a bit stuck. The problem does not reproduce locally, only on Launchpad builders.

For example, take the 4.15 Bionic kernel:

https://autopkgtest.ubuntu.com/packages/l/linux/bionic/amd64

(it is a reverse dependency of libunwind, which is why it is selected for autopkgtest)

https://autopkgtest.ubuntu.com/results/autopkgtest-bionic/bionic/amd64/l/linux/20230110_115614_09e98@/log.gz

It rebuilds fine, but then runs into apt resolver trouble when running the kernel testsuite. autopkgtest makes a dummy package that lists the dependencies needed to run the testsuite, installs it with dpkg -i, and then runs apt install -f to force dependency resolution. The dummy package is called autopkgtest-satdep.

https://paste.ubuntu.com/p/Cszfkvy47Z/

But it fails in strange ways, like not being able to select build-essential, even though it is already installed in the builder.

I am still trying to debug the root cause behind these autopkgtest regressions, which is why things have been delayed.

There is a provision in SRUs where they can be released as long as I can prove that the upload did not cause the regression:

https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

I may as well invoke this clause, since I don't wish to keep you waiting any longer. I will try to get this package released within the week.
Thanks, Matthew -- You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to libunwind in Ubuntu. https://bugs.launchpad.net/bugs/1999104 Title: arm64: broken c++ exception handler support leads to std::terminate() being called and program abort Status in libunwind package in Ubuntu: Fix Released Status in libunwind source package in Bionic: Fix Committed Status in libunwind source package in Focal: Fix Committed Bug description: [Impact] On architectures other than i386 and amd64, the C++ exception support in libunwind appears to be broken, always failing and calling std::terminate() which leads to the program aborting. (gdb) bt #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0xf7c2daac in __GI_abort () at abort.c:79 #2 0xf7e21868 in __gnu_cxx::__verbose_terminate_handler() () from /lib/aarch64-linux-gnu/libstdc++.so.6 #3 0xf7e1f21c in ?? () from /lib/aarch64-linux-gnu/libstdc++.so.6 #4 0xf7e1f280 in std::terminate() () from /lib/aarch64-linux-gnu/libstdc++.so.6 #5 0xf7e1f5e0 in __cxa_rethrow () from /lib/aarch64-linux-gnu/libstdc++.so.6 #6 0xf7e21804 in __gnu_cxx::__verbose_terminate_handler() () from /lib/aarch64-linux-gnu/libstdc++.so.6 #7 0xf7e1f21c in ?? () from /lib/aarch64-linux-gnu/libstdc++.so.6 #8 0xf7e1f280 in std::terminate() () from /lib/aarch64-linux-gnu/libstdc++.so.6 #9 0xf7e1f574 in __cxa_throw () from /lib/aarch64-linux-gnu/libstdc++.so.6 #10 0xf7fb9f50 in function_throws_int () at lib.cpp:9 #11 0x0d54 in main (argc=1, argv=0xfab8) at main.cpp:9 Compiling libunwind with --enable-cxx-exceptions enabled leads to _Unwind_RaiseException being called during __cxa_throw(), which fails to find a handler, and the generic std::terminate() is called instead, aborting the program. On i386 and amd64 this doesn't seem to be the case, and the libunwind handlers seem to be present. 
To fix, we enable the configure option --enable-cxx-exceptions only on i386 and amd64, in debian/rules. This lets other architectures fall back to the symbols provided by libgcc_s, whose implementation works correctly. [Testcase] Ali Sadi has provided a reproducer program. Start an arm64 instance, for example, a c6g.medium instance on AWS, with either Bionic or Focal. $ wget https://bugs.launchpad.net/ubuntu/+source/libunwind/+bug/1999104/+attachment/5635122/+files/libunwind.tar.gz $ sudo apt install -y build-essential libunwind-dev $ tar xvf libunwind.tar.gz && cd test $ make all There are two executables, main and main_unwind. main is not linked to libunwind, and main_unwind is linked to libunwind. $ ./main int throws lib int caught main $ ./main_unwind terminate called after throwing an instance of 'int' terminate called recursively Aborted
[Touch-packages] [Bug 1926139] Re: dhclient: thread concurrency race leads to DHCPOFFER packets not being received
read + or add any extra thread synchronisation primitives. + + This has been tested with 13k VM deployments on Microsoft Azure, and has + found to work as expected with no failures, meaning risk of additional + race conditions we are not aware of is low. + + The reason why this patch was not forwarded upstream, is that isc-dhcp + is now officially End Of Life, and has effectively been abandoned by + upstream. You can read about it in these notices: + + https://lists.isc.org/pipermail/dhcp-users/2022-October/022786.html + https://www.isc.org/blogs/isc-dhcp-eol/ + + Upstream won't fix any more bugs, make any new releases, or even accept + any new commits. They are putting their efforts into isc-kea now. ** No longer affects: bind9-libs (Ubuntu Focal) ** No longer affects: bind9-libs (Ubuntu Jammy) ** Changed in: bind9-libs (Ubuntu) Status: Fix Released => Won't Fix ** Also affects: isc-dhcp (Ubuntu Focal) Importance: Undecided Status: New ** Also affects: bind9-libs (Ubuntu Focal) Importance: Undecided Status: New ** Also affects: isc-dhcp (Ubuntu Jammy) Importance: Undecided Status: New ** Also affects: bind9-libs (Ubuntu Jammy) Importance: Undecided Status: New ** No longer affects: bind9-libs (Ubuntu Focal) ** No longer affects: bind9-libs (Ubuntu Jammy) ** Changed in: isc-dhcp (Ubuntu Focal) Status: New => In Progress ** Changed in: isc-dhcp (Ubuntu Jammy) Status: New => In Progress ** Changed in: isc-dhcp (Ubuntu Focal) Importance: Undecided => High ** Changed in: isc-dhcp (Ubuntu Jammy) Importance: Undecided => High ** Changed in: isc-dhcp (Ubuntu Focal) Assignee: (unassigned) => Matthew Ruffell (mruffell) ** Changed in: isc-dhcp (Ubuntu Jammy) Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo) -- You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to isc-dhcp in Ubuntu. 
https://bugs.launchpad.net/bugs/1926139 Title: dhclient: thread concurrency race leads to DHCPOFFER packets not being received Status in bind9-libs package in Ubuntu: Won't Fix Status in isc-dhcp package in Ubuntu: Invalid Status in isc-dhcp source package in Focal: In Progress Status in isc-dhcp source package in Jammy: In Progress Bug description: [Impact] Occasionally, during instance boot or machine start-up, dhclient will attempt to acquire a dhcp lease and fail, leaving the instance with no IP address and making it unreachable. This happens about once every 100 reboots on bare metal; Chris Patterson in comment #2 describes it as affecting between ~0.3% and 2% of deployments on Microsoft Azure. Azure uses dhclient called from cloud-init instead of systemd-networkd, and this is causing issues with larger deployments. The logs of an affected dhclient produce the following: Listening on LPF/enp1s0/52:54:00:1c:d7:00 Sending on LPF/enp1s0/52:54:00:1c:d7:00 Sending on Socket/fallback DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 3 (xid=0xd222950f) DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 5 (xid=0xd222950f) ... (omitting 20 similar lines) ... DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 13 (xid=0xd222950f) DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 8 (xid=0xd222950f) DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 6 (xid=0xd222950f) No DHCPOFFERS received. No working leases in persistent database - sleeping. Full log: https://paste.ubuntu.com/p/8yBfw2KR5h/ Log of a working run: https://paste.ubuntu.com/p/N3ZgqrxyQD/ The bizarre thing is that when you tcpdump dhclient, you see every DHCPDISCOVER packet being answered with a DHCPOFFER packet, but the got_one() callback is never called: dhclient does not read these DHCPOFFER packets and continues sending DHCPDISCOVER packets. Once it has sent 25 DHCPDISCOVER packets, it gives up.
tcpdump: Screenshot of Wireshark: This behaviour led several bug reporters to believe it was a kernel issue, with the kernel not pushing DHCPOFFER packets to dhclient. This is not the case, the actual problem is dhclient containing a thread concurrency race condition, and when the race occurs, the read socket is closed prematurely, and dhclient does not read any of the DHCPOFFER replies. The full explanation is in the "Other Info" section, but the fix is to add a mutex that restricts access to the global linked list of open sockets, and ensures that a newly created socket is added to this list, before the socketmanager callback has an opportunity to walk this list when there is data immediately able to be read. Mauricio has provided such a patch, and includes options to disable this behaviour during runtime to minimise regression risk. [Testcase] Reproducer based on GDB and DHCP noise injection. It uses 3 veth pairs (DHCP server/cl
[Touch-packages] [Bug 1926139] Re: dhclient: thread concurrency race leads to DHCPOFFER packets not being received
** Tags added: sts-sponsor -- You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to isc-dhcp in Ubuntu. https://bugs.launchpad.net/bugs/1926139 Title: dhclient: thread concurrency race leads to DHCPOFFER packets not being received Status in bind9-libs package in Ubuntu: Fix Released Status in isc-dhcp package in Ubuntu: Invalid Status in bind9-libs source package in Focal: In Progress Status in bind9-libs source package in Jammy: In Progress Bug description: [Impact] Occasionally, during instance boot or machine start-up, dhclient will attempt to acquire a dhcp lease and fail, leaving the instance with no IP address and making it unreachable. This happens about once every 100 reboots on bare metal, or Chris Patterson in comment #2 describes it as affecting between ~0.3% to 2% of deployments on Microsoft Azure. Azure uses dhclient called from cloud-init instead of systemd-networkd, and this is causing issues with larger deployments. The logs of an affected dhclient produce the following: Listening on LPF/enp1s0/52:54:00:1c:d7:00 Sending on LPF/enp1s0/52:54:00:1c:d7:00 Sending on Socket/fallback DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 3 (xid=0xd222950f) DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 5 (xid=0xd222950f) ... (omitting 20 similar lines) ... DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 13 (xid=0xd222950f) DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 8 (xid=0xd222950f) DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 6 (xid=0xd222950f) No DHCPOFFERS received. No working leases in persistent database - sleeping. 
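The failure signature in the log above can be checked mechanically. What follows is only a minimal sketch, assuming the log lines look exactly like the excerpt quoted in this report; the function name summarise_dhclient_log and the SAMPLE text are made up for illustration and are not part of dhclient or test-parallel.sh.

```python
import re

# Matches DHCPDISCOVER lines in the format quoted in the bug description.
DISCOVER_RE = re.compile(r"DHCPDISCOVER on \S+ to \S+ port \d+ interval \d+")
# The line dhclient prints when it stops waiting for offers (from the excerpt).
GAVE_UP = "No DHCPOFFERS received."


def summarise_dhclient_log(text):
    """Return (number of DHCPDISCOVER lines, whether dhclient gave up)."""
    discovers = 0
    gave_up = False
    for line in text.splitlines():
        if DISCOVER_RE.search(line):
            discovers += 1
        elif GAVE_UP in line:
            gave_up = True
    return discovers, gave_up


# Hypothetical two-attempt excerpt, shortened from the log in the report.
SAMPLE = """\
Listening on LPF/enp1s0/52:54:00:1c:d7:00
Sending on LPF/enp1s0/52:54:00:1c:d7:00
Sending on Socket/fallback
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 3 (xid=0xd222950f)
DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 5 (xid=0xd222950f)
No DHCPOFFERS received.
No working leases in persistent database - sleeping.
"""

print(summarise_dhclient_log(SAMPLE))
```

A run that gave up shows a True flag together with the count of DHCPDISCOVER attempts, which can then be compared with a packet capture of the same run.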
Full log: https://paste.ubuntu.com/p/8yBfw2KR5h/ Log of a working run: https://paste.ubuntu.com/p/N3ZgqrxyQD/ The bizarre thing is that when you tcpdump dhclient, you see every DHCPDISCOVER packet being answered with a DHCPOFFER packet, but the got_one() callback is never called: dhclient does not read these DHCPOFFER packets and continues sending DHCPDISCOVER packets. Once it has sent 25 DHCPDISCOVER packets, it gives up. tcpdump: https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5641810/+files/test.pcap Screenshot of Wireshark: https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5641811/+files/Screenshot_2023-01-17-16-14-21_1920x1200%250A1920x1080%250A1920x1080.png This behaviour led several bug reporters to believe it was a kernel issue, with the kernel not pushing DHCPOFFER packets to dhclient. This is not the case. The actual problem is a thread concurrency race condition in dhclient: when the race occurs, the read socket is closed prematurely, and dhclient does not read any of the DHCPOFFER replies. The full explanation is in the "Other Info" section, but the fix for this is to change bind9-libs from being built multithreaded back to single threaded, as intended by the dhclient maintainers. In Focal and Jammy, isc-dhcp links against bind9 libraries provided in bind9-libs, while in Kinetic onward isc-dhcp uses an in-tree bind9 library, which is already configured properly with --disable-threads. Change the Focal and Jammy bind9-libs to --disable-threads and update symbol files to reflect the library is single threaded again. [Testcase] Start a fresh Focal or Jammy instance. Download and set executable test-parallel.sh, and edit some lines: 1) wget https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5593045/+files/test-parallel.sh 2) chmod +x test-parallel.sh 3) vim test-parallel.sh Change iface="enp5s0" to your interface, likely iface="enp1s0".
Comment out the line "# cp bionic-dhclient $workdir/dhclient". 4) sudo ./test-parallel.sh After five minutes, if the issue reproduces, you will see "TEST FAILED". You can watch the output with: 5) cat /tmp/dhclient-* | less Next, for instrumented runs, you need to build dhclient from source. 1) sudo apt install build-essential devscripts 2) apt source isc-dhcp 3) sudo apt build-dep isc-dhcp 4) cd isc-dhcp Apply the below patch: https://paste.ubuntu.com/p/hGsssrVyG4/ 5) patch -p1 < ~/patch.patch 6) debuild -b -uc -us 7) cd .. 8) sudo dpkg -i isc-dhcp-client-* 9) sudo ./test-parallel.sh 10) cat /tmp/dhclient-* | less Look for the race, as described in "Other Info", namely: mruffell: registering with socket manager mruffell: callback called mruffell: omapi object is NULL mruffell: omapi object is NULL mruffell: Adding obj to linked list mruffell: Obj added to list The issue has reproduced. If you install the test package from the following ppa: https://launchpad.net/~mruffell/+archive/ubuntu/sf337873-test Instructions to install (on a Focal or Jammy system): 1) sudo add-apt-repository
[Touch-packages] [Bug 1926139] Re: dhclient: thread concurrency race leads to DHCPOFFER packets not being received
Screenshot of wireshark. ** Attachment added: "Screenshot of wireshark" https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5641811/+files/Screenshot_2023-01-17-16-14-21_1920x1200%250A1920x1080%250A1920x1080.png ** Description changed: [Impact] Occasionally, during instance boot or machine start-up, dhclient will attempt to acquire a dhcp lease and fail, leaving the instance with no IP address and making it unreachable. This happens about once every 100 reboots on bare metal, or Chris Patterson in comment #2 describes it as affecting between ~0.3% to 2% of deployments on Microsoft Azure. Azure uses dhclient called from cloud- init instead of systemd-networkd, and this is causing issues with larger deployments. The logs of an affected dhclient produce the following: Listening on LPF/enp1s0/52:54:00:1c:d7:00 Sending on LPF/enp1s0/52:54:00:1c:d7:00 Sending on Socket/fallback DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 3 (xid=0xd222950f) DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 5 (xid=0xd222950f) ... (omitting 20 similar lines) ... DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 13 (xid=0xd222950f) DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 8 (xid=0xd222950f) DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 6 (xid=0xd222950f) No DHCPOFFERS received. No working leases in persistent database - sleeping. Full log: https://paste.ubuntu.com/p/8yBfw2KR5h/ Log of a working run: https://paste.ubuntu.com/p/N3ZgqrxyQD/ The bizarre thing is when you tcpdump dhclient, we see all DHCPDISOVER packets being replied to with DHCPOFFER packets, but the got_one() callback is never called, dhclient does not read these DHCPOFFER packets, and continues sending DHCPDISCOVER packets. Once it reaches 25 DHCPDISCOVER packets sent, it gives up. 
- tcpdump: - Screenshot of Wireshark: + tcpdump: https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5641810/+files/test.pcap + Screenshot of Wireshark: https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5641811/+files/Screenshot_2023-01-17-16-14-21_1920x1200%250A1920x1080%250A1920x1080.png This behaviour led several bug reporters to believe it was a kernel issue, with the kernel not pushing DHCPOFFER packets to dhclient. This is not the case, the actual problem is dhclient containing a thread concurrency race condition, and when the race occurs, the read socket is closed prematurely, and dhclient does not read any of the DHCPOFFER replies. The full explanation is in the "Other Info" section, but the fix for this is to change bind9-libs from being built multithreaded, back to single threaded as intended by dhclient maintainers. In Focal and Jammy, isc-dhcp links against bind9 libraries provided in bind9-libs, while in Kinetic onward isc-dhcp has an in-tree bind9 library it uses, which is already configured properly to --disable- threads. Change the Focal and Jammy bind9-libs to --disable-threads and update symbol files to reflect the library is single threaded again. [Testcase] Start a fresh Focal or Jammy instance. Download and set executable test-parallel.sh, and edit some lines: 1) wget https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5593045/+files/test-parallel.sh 2) chmod +x test-parallel.sh 3) vim test-parallel.sh Change iface="enp5s0" to your interface, likely iface="enp1s0". Comment out the line "# cp bionic-dhclient $workdir/dhclient". 4) sudo ./test-parallel.sh After five minutes, if you issue reproduces, you will see "TEST FAILED". You can watch the output with: 5) cat /tmp/dhclient-* | less Next, for instrumented runs, you need to build dhclient from source. 
1) sudo apt install build-essential devscripts 2) apt source isc-dhcp 3) sudo apt build-dep isc-dhcp 4) cd isc-dhcp Apply the below patch: https://paste.ubuntu.com/p/hGsssrVyG4/ 5) patch -p1 < ~/patch.patch 6) debuild -b -uc -us 7) cd .. 8) sudo dpkg -i isc-dhcp-client-* 9) sudo ./test-parallel.sh 10) cat /tmp/dhclient-* | less Look for the race, as described in "Other Info", namely: mruffell: registering with socket manager mruffell: callback called mruffell: omapi object is NULL mruffell: omapi object is NULL mruffell: Adding obj to linked list mruffell: Obj added to list The issue has reproduced. If you install the test package from the following ppa: https://launchpad.net/~mruffell/+archive/ubuntu/sf337873-test Instructions to install (on a Focal or Jammy system): 1) sudo add-apt-repository ppa:mruffell/sf337873-test 2) sudo apt update 3) sudo apt install libdns-export1109 libisc-export1105 4) sudo apt-cache policy libisc-export1105 | grep Installed Installed:
[Touch-packages] [Bug 1926139] Re: dhclient: thread concurrency race leads to DHCPOFFER packets not being received
packet capture from a reproduction run ** Description changed: - Platform: Qemu/libvirt on AMD64 - Ubuntu version: 20.04 - isc-dhcp-client version: 4.4.1-2.1ubuntu5 - Problem: When dhclient is used during boot every few reboots the DHCP OFFER packets aren't pushed from the kernel to dhclient. The DISCOVER packets can be seen in strace and tcpdump. The OFFER packets can be seen in tcpdump, but no read event is triggered. - Ubuntu 18.04 doesn't have the problem, neither does Debian 10. Building these dhclient versions on Ubuntu 20.04 alleviates the problem a little, but it still occurs. So this issue might also be kernel related. - - Attached diff shows a strace of all threads and a pcap showing the - tcpdump output. - - Edit: - - Sometimes the dhclient command does receive the OFFER packet and connection is restored. - - In my testing running dhclient manually from the terminal when the OFFERs aren't received will result in a new dhclient session which does receive the OFFER packet and connection is restored. + [Impact] + + Occasionally, during instance boot or machine start-up, dhclient will + attempt to acquire a dhcp lease and fail, leaving the instance with no + IP address and making it unreachable. + + This happens about once every 100 reboots on bare metal, or Chris + Patterson in comment #2 describes it as affecting between ~0.3% to 2% of + deployments on Microsoft Azure. Azure uses dhclient called from cloud- + init instead of systemd-networkd, and this is causing issues with larger + deployments. + + The logs of an affected dhclient produce the following: + + Listening on LPF/enp1s0/52:54:00:1c:d7:00 + Sending on LPF/enp1s0/52:54:00:1c:d7:00 + Sending on Socket/fallback + DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 3 (xid=0xd222950f) + DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 5 (xid=0xd222950f) + ... + (omitting 20 similar lines) + ... 
+ DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 13 (xid=0xd222950f) + DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 8 (xid=0xd222950f) + DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 6 (xid=0xd222950f) + No DHCPOFFERS received. + No working leases in persistent database - sleeping. + + Full log: https://paste.ubuntu.com/p/8yBfw2KR5h/ + Log of a working run: https://paste.ubuntu.com/p/N3ZgqrxyQD/ + + The bizarre thing is when you tcpdump dhclient, we see all DHCPDISOVER + packets being replied to with DHCPOFFER packets, but the got_one() + callback is never called, dhclient does not read these DHCPOFFER + packets, and continues sending DHCPDISCOVER packets. Once it reaches 25 + DHCPDISCOVER packets sent, it gives up. + + tcpdump: + Screenshot of Wireshark: + + This behaviour led several bug reporters to believe it was a kernel + issue, with the kernel not pushing DHCPOFFER packets to dhclient. This + is not the case, the actual problem is dhclient containing a thread + concurrency race condition, and when the race occurs, the read socket is + closed prematurely, and dhclient does not read any of the DHCPOFFER + replies. + + The full explanation is in the "Other Info" section, but the fix for + this is to change bind9-libs from being built multithreaded, back to + single threaded as intended by dhclient maintainers. + + In Focal and Jammy, isc-dhcp links against bind9 libraries provided in + bind9-libs, while in Kinetic onward isc-dhcp has an in-tree bind9 + library it uses, which is already configured properly to --disable- + threads. + + Change the Focal and Jammy bind9-libs to --disable-threads and update + symbol files to reflect the library is single threaded again. + + [Testcase] + + Start a fresh Focal or Jammy instance. 
+ + Download and set executable test-parallel.sh, and edit some lines: + + 1) wget https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5593045/+files/test-parallel.sh + 2) chmod +x test-parallel.sh + 3) vim test-parallel.sh + + Change iface="enp5s0" to your interface, likely iface="enp1s0". + Comment out the line "# cp bionic-dhclient $workdir/dhclient". + + 4) sudo ./test-parallel.sh + + After five minutes, if you issue reproduces, you will see "TEST FAILED". + + You can watch the output with: + + 5) cat /tmp/dhclient-* | less + + Next, for instrumented runs, you need to build dhclient from source. + + 1) sudo apt install build-essential devscripts + 2) apt source isc-dhcp + 3) sudo apt build-dep isc-dhcp + 4) cd isc-dhcp + + Apply the below patch: + + https://paste.ubuntu.com/p/hGsssrVyG4/ + + 5) patch -p1 < ~/patch.patch + 6) debuild -b -uc -us + 7) cd .. + 8) sudo dpkg -i isc-dhcp-client-* + 9) sudo ./test-parallel.sh + 10) cat /tmp/dhclient-* | less + + Look for the race, as described in "Other Info", namely: + + mruffell: registering with socket manager + mruffell: callback called + mruffell: omapi object is NULL + mruffell: omapi object is NULL + mruffell: Adding obj to linked list + mruffell:
[Touch-packages] [Bug 1926139] Re: dhclient: thread concurrency race leads to DHCPOFFER packets not being received
** Summary changed: - dhclient doesn't receive dhcp offer from kernel + dhclient: thread concurrency race leads to DHCPOFFER packets not being received -- You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to isc-dhcp in Ubuntu. https://bugs.launchpad.net/bugs/1926139 Title: dhclient: thread concurrency race leads to DHCPOFFER packets not being received Status in bind9-libs package in Ubuntu: Fix Released Status in isc-dhcp package in Ubuntu: Invalid Status in bind9-libs source package in Focal: In Progress Status in bind9-libs source package in Jammy: In Progress Bug description: Platform: Qemu/libvirt on AMD64 Ubuntu version: 20.04 isc-dhcp-client version: 4.4.1-2.1ubuntu5 Problem: When dhclient is used during boot every few reboots the DHCP OFFER packets aren't pushed from the kernel to dhclient. The DISCOVER packets can be seen in strace and tcpdump. The OFFER packets can be seen in tcpdump, but no read event is triggered. Ubuntu 18.04 doesn't have the problem, neither does Debian 10. Building these dhclient versions on Ubuntu 20.04 alleviates the problem a little, but it still occurs. So this issue might also be kernel related. Attached diff shows a strace of all threads and a pcap showing the tcpdump output. Edit: - Sometimes the dhclient command does receive the OFFER packet and connection is restored. - In my testing running dhclient manually from the terminal when the OFFERs aren't received will result in a new dhclient session which does receive the OFFER packet and connection is restored. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/bind9-libs/+bug/1926139/+subscriptions -- Mailing list: https://launchpad.net/~touch-packages Post to : touch-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~touch-packages More help : https://help.launchpad.net/ListHelp
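The race named in the new summary is an ordering problem: a socket is handed to the socket manager before it has been added to the global list of open sockets, so a callback that fires immediately walks the list and finds nothing. The toy model below is a sketch of that ordering only, not the isc-dhcp or bind9 code; every name in it (open_sockets, list_lock, on_readable, register_socket) is hypothetical, and the unlocked path forces the unlucky interleaving with a join so that the result is repeatable.

```python
import threading

open_sockets = []             # stand-in for the global list of open sockets
list_lock = threading.Lock()  # hypothetical mutex guarding that list


def on_readable(fd, seen, use_lock):
    """Socket manager callback: walk the list looking for fd."""
    if use_lock:
        with list_lock:       # waits until registration has finished
            seen.append(fd in open_sockets)
    else:
        seen.append(fd in open_sockets)


def register_socket(fd, use_lock):
    """Register fd and report whether the callback found it in the list."""
    seen = []

    def register_then_append():
        # Data is already waiting, so the callback fires on another thread
        # as soon as the socket is handed to the socket manager.
        worker = threading.Thread(target=on_readable, args=(fd, seen, use_lock))
        worker.start()
        if not use_lock:
            worker.join()     # force the unlucky ordering: callback runs first
        open_sockets.append(fd)
        return worker

    if use_lock:
        with list_lock:       # callback cannot walk the list until we are done
            worker = register_then_append()
    else:
        worker = register_then_append()
    worker.join()
    return seen[0]


print(register_socket(3, use_lock=False))
print(register_socket(4, use_lock=True))
```

Without the lock the callback misses the socket and the first call reports False; with the lock held across both steps the callback has to wait and the second call reports True. Building the library single threaded removes the second thread altogether, which is the other way of closing the same window.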
[Touch-packages] [Bug 1926139] Re: dhclient doesn't receive dhcp offer from kernel
Attached is a debdiff for Jammy which fixes this bug. ** Patch added: "Debdiff for bind9-libs for Jammy" https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5641516/+files/lp1926139_jammy.debdiff ** Tags added: focal jammy sts ** Also affects: bind9-libs (Ubuntu) Importance: Undecided Status: New ** Also affects: isc-dhcp (Ubuntu Jammy) Importance: Undecided Status: New ** Also affects: bind9-libs (Ubuntu Jammy) Importance: Undecided Status: New ** Also affects: isc-dhcp (Ubuntu Focal) Importance: Undecided Status: New ** Also affects: bind9-libs (Ubuntu Focal) Importance: Undecided Status: New ** No longer affects: isc-dhcp (Ubuntu Focal) ** No longer affects: isc-dhcp (Ubuntu Jammy) ** Changed in: isc-dhcp (Ubuntu) Status: New => Invalid ** Changed in: bind9-libs (Ubuntu Focal) Status: New => In Progress ** Changed in: bind9-libs (Ubuntu Jammy) Status: New => In Progress ** Changed in: bind9-libs (Ubuntu) Status: New => Fix Released ** Changed in: bind9-libs (Ubuntu Focal) Importance: Undecided => High ** Changed in: bind9-libs (Ubuntu Jammy) Importance: Undecided => High ** Changed in: bind9-libs (Ubuntu Focal) Assignee: (unassigned) => Matthew Ruffell (mruffell) ** Changed in: bind9-libs (Ubuntu Jammy) Assignee: (unassigned) => Matthew Ruffell (mruffell) -- You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to isc-dhcp in Ubuntu. https://bugs.launchpad.net/bugs/1926139 Title: dhclient doesn't receive dhcp offer from kernel Status in bind9-libs package in Ubuntu: Fix Released Status in isc-dhcp package in Ubuntu: Invalid Status in bind9-libs source package in Focal: In Progress Status in bind9-libs source package in Jammy: In Progress Bug description: Platform: Qemu/libvirt on AMD64 Ubuntu version: 20.04 isc-dhcp-client version: 4.4.1-2.1ubuntu5 Problem: When dhclient is used during boot every few reboots the DHCP OFFER packets aren't pushed from the kernel to dhclient. 
The DISCOVER packets can be seen in strace and tcpdump. The OFFER packets can be seen in tcpdump, but no read event is triggered. Ubuntu 18.04 doesn't have the problem, neither does Debian 10. Building these dhclient versions on Ubuntu 20.04 alleviates the problem a little, but it still occurs. So this issue might also be kernel related. Attached diff shows a strace of all threads and a pcap showing the tcpdump output. Edit: - Sometimes the dhclient command does receive the OFFER packet and connection is restored. - In my testing running dhclient manually from the terminal when the OFFERs aren't received will result in a new dhclient session which does receive the OFFER packet and connection is restored. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/bind9-libs/+bug/1926139/+subscriptions -- Mailing list: https://launchpad.net/~touch-packages Post to : touch-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~touch-packages More help : https://help.launchpad.net/ListHelp
[Touch-packages] [Bug 1926139] Re: dhclient doesn't receive dhcp offer from kernel
Attached is a debdiff for Focal which fixes this bug. ** Patch added: "Debdiff for bind9-libs for Focal" https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5641515/+files/lp1926139_focal.debdiff -- You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to isc-dhcp in Ubuntu. https://bugs.launchpad.net/bugs/1926139 Title: dhclient doesn't receive dhcp offer from kernel Status in bind9-libs package in Ubuntu: Fix Released Status in isc-dhcp package in Ubuntu: Invalid Status in bind9-libs source package in Focal: In Progress Status in bind9-libs source package in Jammy: In Progress Bug description: Platform: Qemu/libvirt on AMD64 Ubuntu version: 20.04 isc-dhcp-client version: 4.4.1-2.1ubuntu5 Problem: When dhclient is used during boot every few reboots the DHCP OFFER packets aren't pushed from the kernel to dhclient. The DISCOVER packets can be seen in strace and tcpdump. The OFFER packets can be seen in tcpdump, but no read event is triggered. Ubuntu 18.04 doesn't have the problem, neither does Debian 10. Building these dhclient versions on Ubuntu 20.04 alleviates the problem a little, but it still occurs. So this issue might also be kernel related. Attached diff shows a strace of all threads and a pcap showing the tcpdump output. Edit: - Sometimes the dhclient command does receive the OFFER packet and connection is restored. - In my testing running dhclient manually from the terminal when the OFFERs aren't received will result in a new dhclient session which does receive the OFFER packet and connection is restored. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/bind9-libs/+bug/1926139/+subscriptions -- Mailing list: https://launchpad.net/~touch-packages Post to : touch-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~touch-packages More help : https://help.launchpad.net/ListHelp
[Touch-packages] [Bug 1999104] Re: arm64: broken c++ exception handler support leads to std::terminate() being called and program abort
Performing verification for Bionic.

I started two instances on AWS, one c6g.medium (arm64) and a t2.micro (amd64).

I went through the reproducer listed in the testcase with libunwind-dev 1.2.1-8 from -release.

$ wget https://bugs.launchpad.net/ubuntu/+source/libunwind/+bug/1999104/+attachment/5635122/+files/libunwind.tar.gz
$ sudo apt install -y build-essential libunwind-dev
$ tar xvf libunwind.tar.gz && cd test
test/
test/lib.hpp
test/main.cpp
test/lib.cpp
test/Makefile
~/test$ make all
g++ -g -shared-libgcc -shared -fPIC -std=c++11 -o libtest.so lib.cpp
g++ -g -shared-libgcc -o main -L. -Wl,-rpath,. main.cpp -ltest
g++ -g -shared-libgcc -o main_unwind -L. -Wl,-rpath,. main.cpp -ltest -lunwind

On arm64:

~/test$ ./main
int throws lib
int caught main
~/test$ ./main_unwind
terminate called after throwing an instance of 'int'
terminate called recursively
Aborted (core dumped)

On amd64:

~/test$ ./main
int throws lib
int caught main
~/test$ ./main_unwind
int throws lib
int caught main

As expected, we see arm64 abort the execution of the reproducer.

I then installed 1.2.1-8ubuntu0.1 from -proposed and rebuilt the reproducers:

$ make clean
$ make all
g++ -g -shared-libgcc -shared -fPIC -std=c++11 -o libtest.so lib.cpp
g++ -g -shared-libgcc -o main -L. -Wl,-rpath,. main.cpp -ltest
g++ -g -shared-libgcc -o main_unwind -L. -Wl,-rpath,. main.cpp -ltest -lunwind

On arm64:

$ ./main
int throws lib
int caught main
$ ./main_unwind
int throws lib
int caught main

On amd64:

~/test$ ./main
int throws lib
int caught main
~/test$ ./main_unwind
int throws lib
int caught main

We see that 1.2.1-8ubuntu0.1 from -proposed does not abort, and instead runs as expected. There is no change in behaviour on amd64.

The package in -proposed fixes the problem, happy to mark as verified for Bionic.

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to libunwind in Ubuntu.
https://bugs.launchpad.net/bugs/1999104

Title: arm64: broken c++ exception handler support leads to std::terminate() being called and program abort

Status in libunwind package in Ubuntu: Fix Released
Status in libunwind source package in Bionic: Fix Committed
Status in libunwind source package in Focal: Fix Committed

Bug description:

[Impact]

On architectures other than i386 and amd64, the C++ exception support in libunwind appears to be broken, always failing and calling std::terminate(), which leads to the program aborting.

(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0xf7c2daac in __GI_abort () at abort.c:79
#2 0xf7e21868 in __gnu_cxx::__verbose_terminate_handler() () from /lib/aarch64-linux-gnu/libstdc++.so.6
#3 0xf7e1f21c in ?? () from /lib/aarch64-linux-gnu/libstdc++.so.6
#4 0xf7e1f280 in std::terminate() () from /lib/aarch64-linux-gnu/libstdc++.so.6
#5 0xf7e1f5e0 in __cxa_rethrow () from /lib/aarch64-linux-gnu/libstdc++.so.6
#6 0xf7e21804 in __gnu_cxx::__verbose_terminate_handler() () from /lib/aarch64-linux-gnu/libstdc++.so.6
#7 0xf7e1f21c in ?? () from /lib/aarch64-linux-gnu/libstdc++.so.6
#8 0xf7e1f280 in std::terminate() () from /lib/aarch64-linux-gnu/libstdc++.so.6
#9 0xf7e1f574 in __cxa_throw () from /lib/aarch64-linux-gnu/libstdc++.so.6
#10 0xf7fb9f50 in function_throws_int () at lib.cpp:9
#11 0x0d54 in main (argc=1, argv=0xfab8) at main.cpp:9

Compiling libunwind with --enable-cxx-exceptions leads to _Unwind_RaiseException being called during __cxa_throw(), which fails to find a handler, and the generic std::terminate() is called instead, aborting the program.

On i386 and amd64 this doesn't seem to be the case, and the libunwind handlers seem to be present.

To fix, we enable the configure option --enable-cxx-exceptions only on i386 and amd64, in debian/rules. This lets other architectures fall back to the symbols provided by libgcc_s, whose implementation works correctly.
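The architecture restriction described in the fix can be sketched as a small shell helper. This is only an illustration and not the actual debian/rules change from the debdiff: the function name cxx_exceptions_flag is hypothetical, and the sketch assumes that simply leaving the option out is enough for other architectures to end up using the libgcc_s symbols.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the debdiff): print the libunwind
# configure option for a given Debian architecture name.
cxx_exceptions_flag() {
    case "$1" in
        # Only i386 and amd64 keep the libunwind C++ exception handlers.
        i386|amd64) printf '%s\n' "--enable-cxx-exceptions" ;;
        # Assumption: on every other architecture the option is simply
        # not passed, so nothing is printed.
        *) : ;;
    esac
}
```

In a build script the result could be passed to configure, for example with the architecture reported by dpkg-architecture -qDEB_HOST_ARCH as the argument.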
[Testcase]

Ali Sadi has provided a reproducer program.

Start an arm64 instance, for example, a c6g.medium instance on AWS, with either Bionic or Focal.

$ wget https://bugs.launchpad.net/ubuntu/+source/libunwind/+bug/1999104/+attachment/5635122/+files/libunwind.tar.gz
$ sudo apt install -y build-essential libunwind-dev
$ tar xvf libunwind.tar.gz && cd test
$ make all

There are two executables, main and main_unwind. main is not linked to libunwind, and main_unwind is linked to libunwind.

$ ./main
int throws lib
int caught main
$ ./main_unwind
terminate called after throwing an instance of 'int'
terminate called recursively
Aborted (core dumped)

If you install the test package available in the following ppa:

https://launchpad.net/~mruffell/+archive/ubuntu/sf350246-test

$ make clean
$ sudo apt install -y
[Touch-packages] [Bug 1999104] Re: arm64: broken c++ exception handler support leads to std::terminate() being called and program abort
Performing verification for Focal.

I started two instances on AWS, one c6g.medium (arm64) and a t2.micro (amd64).

I went through the reproducer listed in the testcase with libunwind-dev 1.2.1-9build1 from -release.

$ wget https://bugs.launchpad.net/ubuntu/+source/libunwind/+bug/1999104/+attachment/5635122/+files/libunwind.tar.gz
$ sudo apt install -y build-essential libunwind-dev
$ tar xvf libunwind.tar.gz && cd test
test/
test/lib.hpp
test/main.cpp
test/lib.cpp
test/Makefile
~/test$ make all
g++ -g -shared-libgcc -shared -fPIC -std=c++11 -o libtest.so lib.cpp
g++ -g -shared-libgcc -o main -L. -Wl,-rpath,. main.cpp -ltest
g++ -g -shared-libgcc -o main_unwind -L. -Wl,-rpath,. main.cpp -ltest -lunwind

On arm64:

~/test$ ./main
int throws lib
int caught main
~/test$ ./main_unwind
terminate called after throwing an instance of 'int'
terminate called recursively
Aborted (core dumped)

On amd64:

~/test$ ./main
int throws lib
int caught main
~/test$ ./main_unwind
int throws lib
int caught main

As expected, we see arm64 abort the execution of the reproducer.

I then installed 1.2.1-9ubuntu0.1 from -proposed and rebuilt the reproducers:

$ make clean
$ make all
g++ -g -shared-libgcc -shared -fPIC -std=c++11 -o libtest.so lib.cpp
g++ -g -shared-libgcc -o main -L. -Wl,-rpath,. main.cpp -ltest
g++ -g -shared-libgcc -o main_unwind -L. -Wl,-rpath,. main.cpp -ltest -lunwind

On arm64:

$ ./main
int throws lib
int caught main
$ ./main_unwind
int throws lib
int caught main

On amd64:

~/test$ ./main
int throws lib
int caught main
~/test$ ./main_unwind
int throws lib
int caught main

We see that 1.2.1-9ubuntu0.1 from -proposed does not abort, and instead runs as expected. There is no change in behaviour on amd64.

The package in -proposed fixes the problem, happy to mark as verified for Focal.

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to libunwind in Ubuntu.
[Touch-packages] [Bug 1999104] Re: arm64: broken c++ exception handler support leads to std::terminate() being called and program abort
** Tags added: sts-sponsor -- You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to libunwind in Ubuntu. https://bugs.launchpad.net/bugs/1999104 Title: arm64: broken c++ exception handler support leads to std::terminate() being called and program abort Status in libunwind package in Ubuntu: Fix Released Status in libunwind source package in Bionic: In Progress Status in libunwind source package in Focal: In Progress Bug description: [Impact] On architectures other than i386 and amd64, the C++ exception support in libunwind appears to be broken, always failing and calling std::terminate() which leads to the program aborting. (gdb) bt #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0xf7c2daac in __GI_abort () at abort.c:79 #2 0xf7e21868 in __gnu_cxx::__verbose_terminate_handler() () from /lib/aarch64-linux-gnu/libstdc++.so.6 #3 0xf7e1f21c in ?? () from /lib/aarch64-linux-gnu/libstdc++.so.6 #4 0xf7e1f280 in std::terminate() () from /lib/aarch64-linux-gnu/libstdc++.so.6 #5 0xf7e1f5e0 in __cxa_rethrow () from /lib/aarch64-linux-gnu/libstdc++.so.6 #6 0xf7e21804 in __gnu_cxx::__verbose_terminate_handler() () from /lib/aarch64-linux-gnu/libstdc++.so.6 #7 0xf7e1f21c in ?? () from /lib/aarch64-linux-gnu/libstdc++.so.6 #8 0xf7e1f280 in std::terminate() () from /lib/aarch64-linux-gnu/libstdc++.so.6 #9 0xf7e1f574 in __cxa_throw () from /lib/aarch64-linux-gnu/libstdc++.so.6 #10 0xf7fb9f50 in function_throws_int () at lib.cpp:9 #11 0x0d54 in main (argc=1, argv=0xfab8) at main.cpp:9 Compiling libunwind with --enable-cxx-exceptions enabled leads to _Unwind_RaiseException being called during __cxa_throw(), which fails to find a handler, and the generic std::terminate() is called instead, aborting the program. On i386 and amd64 this doesn't seem to be the case, and the libunwind handlers seem to be present. 
To fix, we enable the configure option --enable-cxx-exceptions only on i386 and amd64, in debian/rules. This lets other architectures fall back to the symbols provided by libgcc_s, whose implementation works correctly.

[Testcase]

Ali Sadi has provided a reproducer program.

Start an arm64 instance, for example, a c6g.medium instance on AWS, with either Bionic or Focal.

$ wget https://bugs.launchpad.net/ubuntu/+source/libunwind/+bug/1999104/+attachment/5635122/+files/libunwind.tar.gz
$ sudo apt install -y build-essential libunwind-dev
$ tar xvf libunwind.tar.gz && cd test
$ make all

There are two executables, main and main_unwind. main is not linked to libunwind, and main_unwind is linked to libunwind.

$ ./main
int throws lib
int caught main
$ ./main_unwind
terminate called after throwing an instance of 'int'
terminate called recursively
Aborted (core dumped)

If you install the test package available in the following ppa:

https://launchpad.net/~mruffell/+archive/ubuntu/sf350246-test

$ make clean
$ sudo apt install -y libunwind-dev
$ make all
$ ./main
int throws lib
int caught main
$ ./main_unwind
int throws lib
int caught main

The exception is caught as expected and the program does not abort.

[Where problems could occur]

For architectures other than i386 and amd64, we are changing from the libunwind-provided exception handlers for __cxa_throw() to those provided by libgcc_s.

There are a few reverse dependencies for libunwind-dev and libunwind8, which need to be considered:

$ apt rdepends libunwind-dev
libunwind-dev
Reverse Depends:
  Depends: libunwind-setjmp0-dev (= 1.2.1-9build1)
  Depends: libefl-all-dev

$ apt rdepends libunwind8
libunwind8
Reverse Depends:
  Depends: libunwind-dev (= 1.2.1-9build1)
  Depends: xvfb
  Depends: xnest
  Depends: xdmx
  Depends: xwayland
  Depends: xserver-xorg-core
  Depends: xserver-xephyr
  Depends: linux-tools-5.4.0-*
  Depends: linux-raspi-tools-*
  Depends: linux-raspi2-tools-5.4.0-*
  Depends: linux-raspi2-5.4-tools-5.4.0-*
  Depends: linux-oracle-5.15-tools-5.15.0-*
  Depends: linux-lowlatency-hwe-5.15-tools-5.15.0-*
  Depends: linux-hwe-5.8-tools-5.8.0-*
  Depends: linux-hwe-5.15-tools-5.15.0-*
  Depends: linux-gke-tools-5.4.0-*
  Depends: linux-gke-5.15-tools-5.15.0-*
  Depends: linux-gcp-tools-5.4.0-*
  Depends: linux-gcp-5.15-tools-5.15.0-*
  Depends: linux-azure-tools-5.4.0-*
  Depends: linux-azure-5.15-tools-5.15.0-*
  Depends: linux-aws-tools-5.4.0-*
  Depends: linux-aws-5.8-tools-5.8.0-*
  Depends: linux-aws-5.15-tools-5.15.0-*
  Depends: xvfb
  Depends: xnest
  Depends: xdmx
  Depends: trafficserver
  Depends: tilix
  Depends: tigervnc-standalone-server
  Depends: tarantool
  Depends:
[Touch-packages] [Bug 1999104] Re: arm64: broken c++ exception handler support leads to std::terminate() being called and program abort
** Summary changed: - libunwind causes crashes on arm64 + arm64: broken c++ exception handler support leads to std::terminate() being called and program abort ** Description changed: - There is a bug in libunwind in both 18.04 and 20.04 on arm64 where when - linked with libunwind instead of catching an exception, the program - crashes. This was first seen on mcrouter, but attached is a small - reproducer where `main_unwind` will crash. The libunwind shipping with - 22.04 doesn't appear to have this problem, nor do unmodified upstream - versions (including the 1.2.1 which is the 18.04 and 20.04 version). + [Impact] - Attached is a small reproducer that demonstrates the problem. + On architectures other than i386 and amd64, the C++ exception support in + libunwind appears to be broken, always failing and calling + std::terminate() which leads to the program aborting. - Ubuntu 22.04: - ``` - $ ./main - int throws lib - int caught main - $ ./main_unwind - int throws lib - int caught main - ``` + (gdb) bt + #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 + #1 0xf7c2daac in __GI_abort () at abort.c:79 + #2 0xf7e21868 in __gnu_cxx::__verbose_terminate_handler() () +from /lib/aarch64-linux-gnu/libstdc++.so.6 + #3 0xf7e1f21c in ?? () from /lib/aarch64-linux-gnu/libstdc++.so.6 + #4 0xf7e1f280 in std::terminate() () +from /lib/aarch64-linux-gnu/libstdc++.so.6 + #5 0xf7e1f5e0 in __cxa_rethrow () +from /lib/aarch64-linux-gnu/libstdc++.so.6 + #6 0xf7e21804 in __gnu_cxx::__verbose_terminate_handler() () +from /lib/aarch64-linux-gnu/libstdc++.so.6 + #7 0xf7e1f21c in ?? 
() from /lib/aarch64-linux-gnu/libstdc++.so.6 + #8 0xf7e1f280 in std::terminate() () +from /lib/aarch64-linux-gnu/libstdc++.so.6 + #9 0xf7e1f574 in __cxa_throw () +from /lib/aarch64-linux-gnu/libstdc++.so.6 + #10 0xf7fb9f50 in function_throws_int () at lib.cpp:9 + #11 0x0d54 in main (argc=1, argv=0xfab8) at main.cpp:9 - Ubuntu 20.04: - ``` + Compiling libunwind with --enable-cxx-exceptions enabled leads to + _Unwind_RaiseException being called during __cxa_throw(), which fails to + find a handler, and the generic std::terminate() is called instead, + aborting the program. + + On i386 and amd64 this doesn't seem to be the case, and the libunwind + handlers seem to be present. + + To fix, we only enable the configure option --enable-cxx-exceptions on + i386 and amd64 only, in debian/rules. This lets other architectures fall + back to the symbols provided by libgcc_s, which implementation works + correctly. + + [Testcase] + + Ali Sadi has provided a reproducer program. + + Start an arm64 instance, for example, a c6g.medium instance on AWS, with + either Bionic or Focal. + + $ wget https://bugs.launchpad.net/ubuntu/+source/libunwind/+bug/1999104/+attachment/5635122/+files/libunwind.tar.gz + $ sudo apt install -y build-essential libunwind-dev + $ tar xvf libunwind.tar.gz && cd test + $ make all + + There are two executable, main and main_unwind. main is not linked to + libunwind, and main_unwind is linked to libunwind. + $ ./main int throws lib int caught main $ ./main_unwind terminate called after throwing an instance of 'int' terminate called recursively Aborted (core dumped) - ``` + + If you install the test package available in the following ppa: + + https://launchpad.net/~mruffell/+archive/ubuntu/sf350246-test + + $ make clean + $ sudo apt install -y libunwind-dev + $ make all + $ ./main + int throws lib + int caught main + $ ./main_unwind + int throws lib + int caught main + + The exception is caught as expected the program does not abort. 
+ + [Where problems could occur] + + For architectures other than i386 and amd64, we are changing from + libunwind provided exception handlers for __cxa_throw(), and using those + provided by libgcc_s instead. + + There are a few reverse dependencies for libunwind-dev and libunwind8, + which need to be considered: + + $ apt rdepends libunwind-dev + libunwind-dev + Reverse Depends: + Depends: libunwind-setjmp0-dev (= 1.2.1-9build1) + Depends: libefl-all-dev + + t$ apt rdepends libunwind-dev8 + libunwind8 + Reverse Depends: + Depends: libunwind-dev (= 1.2.1-9build1) + Depends: xvfb + Depends: xnest + Depends: xdmx + Depends: xwayland + Depends: xserver-xorg-core + Depends: xserver-xephyr + Depends: linux-tools-5.4.0-* + Depends: linux-raspi-tools-* + Depends: linux-raspi2-tools-5.4.0-* + Depends: linux-raspi2-5.4-tools-5.4.0-* + Depends: linux-oracle-5.15-tools-5.15.0-* + Depends: linux-lowlatency-hwe-5.15-tools-5.15.0-* + Depends: linux-hwe-5.8-tools-5.8.0-* + Depends: linux-hwe-5.15-tools-5.15.0-* + Depends: linux-gke-tools-5.4.0-* + Depends: linux-gke-5.15-tools-5.15.0-* + Depends: linux-gcp-tools-5.4.0-* + Depends: linux-gcp-5.15-tools-5.15.0-* + Depends:
[Touch-packages] [Bug 1999104] Re: libunwind causes crashes on arm64
Attached is a debdiff which fixes this problem on Bionic ** Also affects: libunwind (Ubuntu Focal) Importance: Undecided Status: New ** Also affects: libunwind (Ubuntu Bionic) Importance: Undecided Status: New ** Changed in: libunwind (Ubuntu) Status: New => Fix Released ** Changed in: libunwind (Ubuntu Bionic) Status: New => In Progress ** Changed in: libunwind (Ubuntu Focal) Status: New => In Progress ** Changed in: libunwind (Ubuntu Bionic) Importance: Undecided => Medium ** Changed in: libunwind (Ubuntu Focal) Importance: Undecided => Medium ** Changed in: libunwind (Ubuntu Bionic) Assignee: (unassigned) => Matthew Ruffell (mruffell) ** Changed in: libunwind (Ubuntu Focal) Assignee: (unassigned) => Matthew Ruffell (mruffell) ** Patch added: "Debdiff for libunwind on Bionic" https://bugs.launchpad.net/ubuntu/+source/libunwind/+bug/1999104/+attachment/5635450/+files/lp1999104_bionic.debdiff -- You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to libunwind in Ubuntu. https://bugs.launchpad.net/bugs/1999104 Title: libunwind causes crashes on arm64 Status in libunwind package in Ubuntu: Fix Released Status in libunwind source package in Bionic: In Progress Status in libunwind source package in Focal: In Progress Bug description: There is a bug in libunwind in both 18.04 and 20.04 on arm64 where when linked with libunwind instead of catching an exception, the program crashes. This was first seen on mcrouter, but attached is a small reproducer where `main_unwind` will crash. The libunwind shipping with 22.04 doesn't appear to have this problem, nor do unmodified upstream versions (including the 1.2.1 which is the 18.04 and 20.04 version). Attached is a small reproducer that demonstrates the problem. 
Ubuntu 22.04: ``` $ ./main int throws lib int caught main $ ./main_unwind int throws lib int caught main ``` Ubuntu 20.04: ``` $ ./main int throws lib int caught main $ ./main_unwind terminate called after throwing an instance of 'int' terminate called recursively Aborted (core dumped) ``` To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/libunwind/+bug/1999104/+subscriptions -- Mailing list: https://launchpad.net/~touch-packages Post to : touch-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~touch-packages More help : https://help.launchpad.net/ListHelp
[Touch-packages] [Bug 1999104] Re: libunwind causes crashes on arm64
Attached is a debdiff which fixes this problem on Focal. ** Patch added: "Debdiff for libunwind on Focal" https://bugs.launchpad.net/ubuntu/+source/libunwind/+bug/1999104/+attachment/5635451/+files/lp1999104_focal.debdiff -- You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to libunwind in Ubuntu. https://bugs.launchpad.net/bugs/1999104 Title: libunwind causes crashes on arm64 Status in libunwind package in Ubuntu: Fix Released Status in libunwind source package in Bionic: In Progress Status in libunwind source package in Focal: In Progress
[Touch-packages] [Bug 1988119] Re: systemd-udevd: Run net_setup_link on 'change' uevents to prevent DNS outages on Azure
Attached is an improvement on the previous patch revision. Output is now forwarded to logger, we use shell expansion to enumerate network devices, we omit loopback, and we added a udevadm settle to wait for any thunderstorms to resolve before we continue installing the new udev package.

** Patch added: "Debdiff for systemd on Bionic part two V2"
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1988119/+attachment/5614287/+files/lp1988119_bionic_part_two_V2.debdiff

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1988119

Title: systemd-udevd: Run net_setup_link on 'change' uevents to prevent DNS outages on Azure

Status in systemd package in Ubuntu: Fix Released
Status in systemd source package in Bionic: Fix Committed

Bug description:

[Impact]

A widespread outage was caused on Azure instances earlier today, when systemd 237-3ubuntu10.54 was published to the bionic-security pocket. Instances could no longer resolve DNS queries, breaking networking.

For affected users, the following workarounds are available. Use whatever is most convenient.

- Reboot your instances
- or -
- Issue "udevadm trigger -cadd -yeth0 && systemctl restart systemd-networkd" as root

The trigger was found to be open-vm-tools issuing "udevadm trigger". Azure has a specific netplan setup that uses the `driver` match to set up networking. If a udevadm trigger is executed, the KV pair that contains this info is lost. Next time netplan is executed, the server loses its DNS information.

This is the same as bug 1902960 experienced on Focal two years ago.

The root cause was found to be a bug in systemd, where if we receive a "Remove" action from a change uevent, we need to run net_setup_link() while skipping the device rename and keeping the old name.

[Testcase]

Start an instance up on Azure, any type. Simply issue udevadm trigger and reload systemd-networkd:

$ ping google.com
PING google.com (172.253.62.102) 56(84) bytes of data.
64 bytes from bc-in-f102.1e100.net (172.253.62.102): icmp_seq=1 ttl=56 time=1.85 ms
$ sudo udevadm trigger && sudo systemctl restart systemd-networkd
$ ping google.com
ping: google.com: Temporary failure in name resolution

To fix a broken instance, you can run:

$ sudo udevadm trigger -cadd -yeth0 && sudo systemctl restart systemd-networkd

and then install the test packages below:

Test packages are available in the following ppa:

https://launchpad.net/~mruffell/+archive/ubuntu/sf343528-test

If you install them, the issue should no longer occur.

[Where problems could occur]

If a regression were to occur, it would affect systemd-udevd processing 'change' events from network devices, which could lead to network outages. Since this would happen when systemd-networkd is restarted on postinstall, a regression would cause widespread outages due to this SRU being targeted to the security pocket, where unattended-upgrades will automatically install from.

Side effects could include incorrect udevd device properties.

It is very important that this SRU is well tested before release.

[Other info]

This was fixed in systemd 247 with the following commit:

commit e0e789c1e97e2cdf1cafe0c6b7d7e43fa054f151
Author: Yu Watanabe
Date: Mon, 14 Sep 2020 15:21:04 +0900
Subject: udev: re-assign ID_NET_DRIVER=, ID_NET_LINK_FILE=, ID_NET_NAME= properties on non-'add' uevent
Link: https://github.com/systemd/systemd/commit/e0e789c1e97e2cdf1cafe0c6b7d7e43fa054f151

This was backported to Focal's systemd 245.4-4ubuntu3.4 in bug 1902960 two years ago. Focal required a heavy backport, which was performed by Dan Streetman. Focal's backport can be found in d/p/lp1902960-udev-re-assign-ID_NET_DRIVER-ID_NET_LINK_FILE-ID_NET.patch, or in the pastebin below:

https://paste.ubuntu.com/p/K5k7bGt3Wx/

The changes between the Focal backport and the Bionic backport are:

- We use udev_device_get_action() instead of device_get_action()
- device_action_from_string() is used to get to enum DeviceAction
- We return 0 from the "if (a == DEVICE_ACTION_MOVE) " hunk instead of "goto no_rename"
- log_device_* has been changed to log_*.

See attached debdiff for Bionic backport.
[Touch-packages] [Bug 1988119] Re: systemd-udevd: Run net_setup_link on 'change' uevents to prevent DNS outages on Azure
Attached is the second patch required to fully fix this bug. It adds a check on preinstall to see if ID_NET_DRIVER is present on the network interface, and if it is missing, call udevadm trigger -c add on the interface to add it. ** Patch added: "Debdiff for systemd on Bionic part two" https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1988119/+attachment/5613890/+files/lp1988119_bionic_part_two.debdiff -- You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to systemd in Ubuntu. https://bugs.launchpad.net/bugs/1988119 Title: systemd-udevd: Run net_setup_link on 'change' uevents to prevent DNS outages on Azure Status in systemd package in Ubuntu: Fix Released Status in systemd source package in Bionic: Fix Committed
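The preinstall check described in the two "part two" patch messages (test for ID_NET_DRIVER, re-trigger an 'add' uevent when it is missing, forward output to logger, enumerate devices by shell expansion, omit loopback, then settle) could look roughly like the sketch below. It is not the code from the attached debdiffs: the function name restore_id_net_driver, the SYS_NET variable and the udev-preinst logger tag are all hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of the preinstall check; NOT the code from the debdiff.
# SYS_NET is an illustrative override so the loop can be pointed elsewhere.
SYS_NET="${SYS_NET:-/sys/class/net}"

restore_id_net_driver() {
    # Shell expansion enumerates the network devices.
    for path in "$SYS_NET"/*; do
        [ -e "$path" ] || continue
        iface="${path##*/}"
        # Loopback is omitted.
        [ "$iface" = "lo" ] && continue
        # Re-trigger an 'add' uevent only if ID_NET_DRIVER is missing.
        if ! udevadm info "$path" | grep -q '^E: ID_NET_DRIVER='; then
            # Output is forwarded to logger ("udev-preinst" is a made-up tag).
            udevadm trigger -c add -y "$iface" 2>&1 | logger -t udev-preinst
        fi
    done
    # Wait for the queued uevents to be processed before continuing.
    udevadm settle
}
```

Calling restore_id_net_driver as root before the new udev package is unpacked would correspond to the step the patch messages describe. Interfaces that still carry ID_NET_DRIVER are left alone.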
[Touch-packages] [Bug 1988119] Re: systemd-udevd: Run net_setup_link on 'change' uevents to prevent DNS outages on Azure
The failure mode still exists if "udevadm trigger" has been issued before the package upgrade to systemd 237-3ubuntu10.55.

That is, if unattended-upgrades or the user had installed open-vm-tools, and has not rebooted yet, they will lose network connection on upgrade to 237-3ubuntu10.55.

We need to implement a way to add ID_NET_DRIVER back to the device before the systemd upgrade takes place, otherwise an outage will occur.

Release admins - DO NOT RELEASE systemd 237-3ubuntu10.55 yet. Tagging block-proposed.

$ ping google.com
PING google.com (142.251.45.110) 56(84) bytes of data.
64 bytes from iad23s04-in-f14.1e100.net (142.251.45.110): icmp_seq=1 ttl=56 time=1.51 ms
64 bytes from iad23s04-in-f14.1e100.net (142.251.45.110): icmp_seq=2 ttl=56 time=1.35 ms
64 bytes from iad23s04-in-f14.1e100.net (142.251.45.110): icmp_seq=3 ttl=56 time=1.17 ms
^C
--- google.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 1.172/1.349/1.516/0.140 ms
azureuser@mruffell-test:~$ sudo apt-cache policy systemd | grep Installed
  Installed: 237-3ubuntu10.53
azureuser@mruffell-test:~$ udevadm info /sys/class/net/eth0 | grep ID_NET_DRIVER
E: ID_NET_DRIVER=hv_netvsc
azureuser@mruffell-test:~$ sudo udevadm trigger
azureuser@mruffell-test:~$ ping google.com
PING google.com (142.251.45.110) 56(84) bytes of data.
64 bytes from iad23s04-in-f14.1e100.net (142.251.45.110): icmp_seq=1 ttl=56 time=2.15 ms
64 bytes from iad23s04-in-f14.1e100.net (142.251.45.110): icmp_seq=2 ttl=56 time=1.21 ms
^C
--- google.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 1.212/1.682/2.152/0.470 ms
azureuser@mruffell-test:~$ udevadm info /sys/class/net/eth0 | grep ID_NET_DRIVER
azureuser@mruffell-test:~$ sudo apt install libnss-systemd libpam-systemd libsystemd0 libudev1 systemd systemd-sysv udev
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following package was automatically installed and is no longer required:
  linux-headers-4.15.0-191
Use 'sudo apt autoremove' to remove it.
Suggested packages:
  systemd-container
The following packages will be upgraded:
  libnss-systemd libpam-systemd libsystemd0 libudev1 systemd systemd-sysv udev
7 upgraded, 0 newly installed, 0 to remove and 8 not upgraded.
Need to get 4497 kB of archives.
After this operation, 8192 B of additional disk space will be used.
Get:1 http://ppa.launchpad.net/ubuntu-security-proposed/ppa/ubuntu bionic/main amd64 libsystemd0 amd64 237-3ubuntu10.55 [205 kB]
Get:2 http://ppa.launchpad.net/ubuntu-security-proposed/ppa/ubuntu bionic/main amd64 libnss-systemd amd64 237-3ubuntu10.55 [105 kB]
Get:3 http://ppa.launchpad.net/ubuntu-security-proposed/ppa/ubuntu bionic/main amd64 libpam-systemd amd64 237-3ubuntu10.55 [107 kB]
Get:4 http://ppa.launchpad.net/ubuntu-security-proposed/ppa/ubuntu bionic/main amd64 systemd amd64 237-3ubuntu10.55 [2915 kB]
Get:5 http://ppa.launchpad.net/ubuntu-security-proposed/ppa/ubuntu bionic/main amd64 udev amd64 237-3ubuntu10.55 [1099 kB]
Get:6 http://ppa.launchpad.net/ubuntu-security-proposed/ppa/ubuntu bionic/main amd64 libudev1 amd64 237-3ubuntu10.55 [54.2 kB]
Get:7 http://ppa.launchpad.net/ubuntu-security-proposed/ppa/ubuntu bionic/main amd64 systemd-sysv amd64 237-3ubuntu10.55 [12.0 kB]
Fetched 4497 kB in 3s (1461 kB/s)
(Reading database ... 77176 files and directories currently installed.)
Preparing to unpack .../libsystemd0_237-3ubuntu10.55_amd64.deb ...
Unpacking libsystemd0:amd64 (237-3ubuntu10.55) over (237-3ubuntu10.53) ...
Setting up libsystemd0:amd64 (237-3ubuntu10.55) ...
(Reading database ... 77176 files and directories currently installed.)
Preparing to unpack .../libnss-systemd_237-3ubuntu10.55_amd64.deb ...
Unpacking libnss-systemd:amd64 (237-3ubuntu10.55) over (237-3ubuntu10.53) ...
Preparing to unpack .../libpam-systemd_237-3ubuntu10.55_amd64.deb ...
Unpacking libpam-systemd:amd64 (237-3ubuntu10.55) over (237-3ubuntu10.53) ...
Preparing to unpack .../systemd_237-3ubuntu10.55_amd64.deb ...
Unpacking systemd (237-3ubuntu10.55) over (237-3ubuntu10.53) ...
Preparing to unpack .../udev_237-3ubuntu10.55_amd64.deb ...
Unpacking udev (237-3ubuntu10.55) over (237-3ubuntu10.53) ...
Preparing to unpack .../libudev1_237-3ubuntu10.55_amd64.deb ...
Unpacking libudev1:amd64 (237-3ubuntu10.55) over (237-3ubuntu10.53) ...
Setting up libudev1:amd64 (237-3ubuntu10.55) ...
Setting up systemd (237-3ubuntu10.55) ...
(Reading database ... 77176 files and directories currently installed.)
Preparing to unpack .../systemd-sysv_237-3ubuntu10.55_amd64.deb ...
Unpacking systemd-sysv (237-3ubuntu10.55) over (237-3ubuntu10.53) ...
Setting up libnss-systemd:amd64 (237-3ubuntu10.55) ...
Setting up systemd-sysv (237-3ubuntu10.55) ...
Setting up udev (237-3ubuntu10.55) ...
update-initramfs: deferring update (trigger activated)
Setting up libpam-systemd:amd64 (237-3ubuntu10.55) ...
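The session above detects the problem state by grepping "udevadm info" output for ID_NET_DRIVER: the property is present before "udevadm trigger" and gone afterwards. A minimal Python sketch of the same check is below. It only parses text that has already been captured; the helper names udev_properties and has_net_driver are hypothetical, and the INTERFACE line in the usage example is illustrative sample data rather than output copied from the report.

```python
def udev_properties(info_output):
    """Collect the 'E: KEY=VALUE' property lines from `udevadm info` text."""
    props = {}
    for line in info_output.splitlines():
        line = line.strip()
        # Property lines are prefixed with "E: "; other prefixes are ignored.
        if line.startswith("E: ") and "=" in line:
            key, _, value = line[3:].partition("=")
            props[key] = value
    return props


def has_net_driver(info_output):
    """True if the device still carries the ID_NET_DRIVER property."""
    return "ID_NET_DRIVER" in udev_properties(info_output)


if __name__ == "__main__":
    # Assumed sample text: the first line matches the report, the second is made up.
    before_trigger = "E: ID_NET_DRIVER=hv_netvsc\nE: INTERFACE=eth0\n"
    after_trigger = "E: INTERFACE=eth0\n"
    print(has_net_driver(before_trigger))
    print(has_net_driver(after_trigger))
```

A check like this could gate an upgrade: if the property is missing, the instance is in the state where upgrading to 237-3ubuntu10.55 would lose networking, per the message above.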
[Touch-packages] [Bug 1988119] Re: systemd-udevd: Run net_setup_link on 'change' uevents to prevent DNS outages on Azure
** Changed in: systemd (Ubuntu Bionic)
       Status: Fix Released => Fix Committed

-- 
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1988119

Title:
  systemd-udevd: Run net_setup_link on 'change' uevents to prevent DNS outages on Azure

Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Bionic:
  Fix Committed

Bug description:
  [Impact]

  A widespread outage was caused on Azure instances earlier today, when systemd 237-3ubuntu10.54 was published to the bionic-security pocket. Instances could no longer resolve DNS queries, breaking networking.

  For affected users, the following workarounds are available. Use whatever is most convenient.
  - Reboot your instances
  - or -
  - Issue "udevadm trigger -cadd -yeth0 && systemctl restart systemd-networkd" as root

  The trigger was found to be open-vm-tools issuing "udevadm trigger". Azure has a specific netplan setup that uses the `driver` match to set up networking. If a udevadm trigger is executed, the KV pair that contains this info is lost. Next time netplan is executed, the server loses its DNS information.

  This is the same as bug 1902960 experienced on Focal two years ago.

  The root cause was found to be a bug in systemd, where if we receive a "Remove" action from a change uevent, we need to run net_setup_link(), and we need to skip device rename and keep the old name.

  [Testcase]

  Start an instance up on Azure, any type. Simply issue udevadm trigger and restart systemd-networkd:

  $ ping google.com
  PING google.com (172.253.62.102) 56(84) bytes of data.
  64 bytes from bc-in-f102.1e100.net (172.253.62.102): icmp_seq=1 ttl=56 time=1.85 ms
  $ sudo udevadm trigger && sudo systemctl restart systemd-networkd
  $ ping google.com
  ping: google.com: Temporary failure in name resolution

  To fix a broken instance, you can run:

  $ sudo udevadm trigger -cadd -yeth0 && sudo systemctl restart systemd-networkd

  and then install the test packages below:

  Test packages are available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf343528-test

  If you install them, the issue should no longer occur.

  [Where problems could occur]

  If a regression were to occur, it would affect systemd-udevd processing 'change' events from network devices, which could lead to network outages. Since this would happen when systemd-networkd is restarted on postinstall, a regression would cause widespread outages due to this SRU being targeted to the security pocket, where unattended-upgrades will automatically install from.

  Side effects could include incorrect udevd device properties.

  It is very important that this SRU is well tested before release.

  [Other info]

  This was fixed in systemd 247 with the following commit:

  commit e0e789c1e97e2cdf1cafe0c6b7d7e43fa054f151
  Author: Yu Watanabe
  Date: Mon, 14 Sep 2020 15:21:04 +0900
  Subject: udev: re-assign ID_NET_DRIVER=, ID_NET_LINK_FILE=, ID_NET_NAME= properties on non-'add' uevent
  Link: https://github.com/systemd/systemd/commit/e0e789c1e97e2cdf1cafe0c6b7d7e43fa054f151

  This was backported to Focal's systemd 245.4-4ubuntu3.4 in bug 1902960 two years ago. Focal required a heavy backport, which was performed by Dan Streetman. Focal's backport can be found in d/p/lp1902960-udev-re-assign-ID_NET_DRIVER-ID_NET_LINK_FILE-ID_NET.patch, or the below pastebin:

  https://paste.ubuntu.com/p/K5k7bGt3Wx/

  The changes between the Focal backport and the Bionic backport are:

  - We use udev_device_get_action() instead of device_get_action()
  - device_action_from_string() is used to get to enum DeviceAction
  - We return 0 from the "if (a == DEVICE_ACTION_MOVE) " hunk instead of "goto no_rename"
  - log_device_* has been changed to log_*.

  See attached debdiff for Bionic backport.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1988119/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
[Touch-packages] [Bug 1988119] Re: systemd-udevd: Run net_setup_link on 'change' uevents to prevent DNS outages on Azure
Hello everyone,

I know there are quite a few people watching this bug, so I will provide a status update.

The test package has been looking good throughout our internal testing, and we have proceeded to build the next systemd update, version 237-3ubuntu10.55. It is currently in the ubuntu-security-proposed PPA for Bionic.

If you would like to help test, that would be greatly appreciated. Please use a fresh VM on Azure, and please don't put the package into production just yet.

Instructions to install (on a Bionic system):

1) sudo add-apt-repository ppa:ubuntu-security-proposed/ppa
2) sudo apt update
3) sudo apt install libnss-systemd libpam-systemd libsystemd0 libudev1 systemd systemd-sysv udev
4) sudo apt-cache policy systemd | grep Installed
   Installed: 237-3ubuntu10.55
5) sudo rm /etc/apt/sources.list.d/ubuntu-security-proposed-ubuntu-ppa-bionic.list
6) sudo apt update

From there you can run the reproducer:

$ sudo udevadm trigger && sudo systemctl restart systemd-networkd
$ ping google.com
PING google.com (172.253.122.138) 56(84) bytes of data.
64 bytes from bh-in-f138.1e100.net (172.253.122.138): icmp_seq=1 ttl=103 time=1.67 ms

If you do test, please comment here on how it went.

Again, please don't put the package into production until it has had a little more testing, and we will get this released to the world as quickly and safely as we can.

Thanks,
Matthew

-- 
https://bugs.launchpad.net/bugs/1988119

Title:
  systemd-udevd: Run net_setup_link on 'change' uevents to prevent DNS outages on Azure

Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Bionic:
  In Progress

Bug description:
  [Impact]

  A widespread outage was caused on Azure instances earlier today, when systemd 237-3ubuntu10.54 was published to the bionic-security pocket. Instances could no longer resolve DNS queries, breaking networking.
[Touch-packages] [Bug 1988119] Re: Update to systemd 237-3ubuntu10.54 broke dns
Attached is a debdiff for systemd on Bionic which fixes this bug.

** Description changed:

- Two servers today that updated systemd to "systemd 237-3ubuntu10.54"
- https://ubuntu.com/security/notices/USN-5583-1
+ [Impact]
- could not resolve dns anymore.
- no dns servers, normally set through dhcp.
+ A widespread outage was caused on Azure instances earlier today, when
+ systemd 237-3ubuntu10.54 was published to the bionic-security pocket.
+ Instances could no longer resolve DNS queries, breaking networking.
- Ubuntu 18.04
+ For affected users, the following workarounds are available. Use whatever is most convenient.
+ - Reboot your instances
+ - or -
+ - Issue "udevadm trigger -cadd -yeth0 && systemctl restart systemd-networkd" as root
- Temp fix.
- 1. Edit /etc/systemd/resolved.conf
- 1. Add/Uncomment # FallbackDNS=168.63.129.16
- 1. Restart systemd-resolved sudo systemctl restart systemd-resolved.service
- 1. Confirm dns working with systemd-resolve google.com
+ The trigger was found to be open-vm-tools issuing "udevadm trigger".
+ Azure has a specific netplan setup that uses the `driver` match to set
+ up networking. If a udevadm trigger is executed, the KV pair that
+ contains this info is lost. Next time netplan is executed, the server
+ loses it's DNS information.
+
+ This is the same as bug 1902960 experienced on Focal two years ago.
+
+ The root cause was found to be a bug in systemd, where if we receive a
+ "Remove" action from a change uevent, we need to run net_setup_link(),
+ we need to skip device rename and keep the old name.
+
+ [Testcase]
+
+ Start an instance up on Azure, any type. Simply issue udevadm trigger
+ and reload systemd-networkd:
+
+ $ ping google.com
+ PING google.com (172.253.62.102) 56(84) bytes of data.
+ 64 bytes from bc-in-f102.1e100.net (172.253.62.102): icmp_seq=1 ttl=56 time=1.85 ms
+ $ sudo udevadm trigger && sudo systemctl restart systemd-networkd
+ $ ping google.com
+ ping: google.com: Temporary failure in name resolution
+
+ To fix a broken instance, you can run:
+
+ $ sudo udevadm trigger -cadd -yeth0 && sudo systemctl restart systemd-
+ networkd
+
+ and then install the test packages below:
+
+ Test packages are available in the following ppa:
+
+ https://launchpad.net/~mruffell/+archive/ubuntu/sf343528-test
+
+ If you install them, the issue should no longer occur.
+
+ [Where problems could occur]
+
+ If a regression were to occur, it would affect systemd-udevd processing
+ 'change' events from network devices, which could lead to network
+ outages. Since this would happen when systemd-networkd is restarted on
+ postinstall, a regression would cause widespread outages due to this SRU
+ being targeted to the security pocket, where unattended-upgrades will
+ automatically install from.
+
+ Side effects could include incorrect udevd device properties.
+
+ It is very important that this SRU is well tested before release.
+
+ [Other info]
+
+ This was fixed in Systemd 247 with the following commit:
+
+ commit e0e789c1e97e2cdf1cafe0c6b7d7e43fa054f151
+ Author: Yu Watanabe
+ Date: Mon, 14 Sep 2020 15:21:04 +0900
+ Subject: udev: re-assign ID_NET_DRIVER=, ID_NET_LINK_FILE=, ID_NET_NAME= properties on non-'add' uevent
+ Link: https://github.com/systemd/systemd/commit/e0e789c1e97e2cdf1cafe0c6b7d7e43fa054f151
+
+ This was backported to Focal's systemd 245.4-4ubuntu3.4 in bug 1902960
+ two years ago. Focal required a heavy backport, which was performed by
+ Dan Streetman. Focals backport can be found in d/p/lp1902960-udev-re-
+ assign-ID_NET_DRIVER-ID_NET_LINK_FILE-ID_NET.patch, or the below
+ pastebin:
+
+ https://paste.ubuntu.com/p/K5k7bGt3Wx/
+
+ The changes between the Focal backport and the Bionic backport are:
+
+ - We use udev_device_get_action() instead of device_get_action()
+ - device_action_from_string() is used to get to enum DeviceAction
+ - We return 0 from the "if (a == DEVICE_ACTION_MOVE) " hunk instead of "goto no_rename"
+ - log_device_* has been changed to log_*.
+
+ See attached debdiff for Bionic backport.

** Summary changed:

- Update to systemd 237-3ubuntu10.54 broke dns
+ systemd-udevd: Run net_setup_link on 'change' uevents to prevent DNS outages on Azure

** Patch added: "Debdiff for systemd on Bionic"
   https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1988119/+attachment/5612617/+files/lp1988119_bionic.debdiff

-- 
https://bugs.launchpad.net/bugs/1988119

Title:
  systemd-udevd: Run net_setup_link on 'change' uevents to prevent DNS outages on Azure

Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Bionic:
  In Progress

Bug description:
  [Impact]

  A widespread outage was caused on Azure instances earlier today, when systemd 237-3ubuntu10.54 was published to the bionic-security pocket. Instances could no longer
[Touch-packages] [Bug 1988119] Re: Update to systemd 237-3ubuntu10.54 broke dns
** Changed in: systemd (Ubuntu Bionic)
     Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Tags added: bionic sts

-- 
https://bugs.launchpad.net/bugs/1988119

Title:
  Update to systemd 237-3ubuntu10.54 broke dns

Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Bionic:
  In Progress

Bug description:
  Two servers today that updated systemd to "systemd 237-3ubuntu10.54"
  https://ubuntu.com/security/notices/USN-5583-1

  could not resolve dns anymore.
  no dns servers, normally set through dhcp.

  Ubuntu 18.04

  Temp fix.
  1. Edit /etc/systemd/resolved.conf
  2. Add/Uncomment # FallbackDNS=168.63.129.16
  3. Restart systemd-resolved: sudo systemctl restart systemd-resolved.service
  4. Confirm dns working with systemd-resolve google.com
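The temporary fix quoted above boils down to making sure /etc/systemd/resolved.conf carries an active FallbackDNS= line. As a rough sketch only, the text transformation could look like the Python below. The function name set_fallback_dns is hypothetical, the sample config in the usage example is an assumed minimal file, and the code deliberately works on a string: writing the file back and restarting systemd-resolved remain the manual steps listed in the report.

```python
import re

# Matches an existing FallbackDNS line, commented out or not.
# [ \t]* is used instead of \s* so the match never spans a newline.
_FALLBACK_RE = re.compile(r"^[ \t]*#?[ \t]*FallbackDNS[ \t]*=.*$", re.MULTILINE)


def set_fallback_dns(conf_text, server):
    """Return conf_text with FallbackDNS set to `server` in [Resolve]."""
    new_line = f"FallbackDNS={server}"
    if _FALLBACK_RE.search(conf_text):
        # Replace (and thereby uncomment) the first existing entry.
        return _FALLBACK_RE.sub(new_line, conf_text, count=1)
    lines = conf_text.splitlines()
    for index, line in enumerate(lines):
        if line.strip() == "[Resolve]":
            lines.insert(index + 1, new_line)
            break
    else:
        # No [Resolve] section at all: append one.
        lines += ["[Resolve]", new_line]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "[Resolve]\n#DNS=\n#FallbackDNS=\n#Domains=\n"  # assumed sample file
    print(set_fallback_dns(sample, "168.63.129.16"))
```

Note that this only mirrors the reporter's stopgap; the later messages in this bug treat the missing ID_NET_DRIVER property as the actual cause.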
[Touch-packages] [Bug 1988119] Re: Update to systemd 237-3ubuntu10.54 broke dns
** Also affects: systemd (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Changed in: systemd (Ubuntu Bionic)
       Status: New => In Progress

** Changed in: systemd (Ubuntu)
       Status: Confirmed => Fix Released

** Changed in: systemd (Ubuntu Bionic)
   Importance: Undecided => Critical

-- 
https://bugs.launchpad.net/bugs/1988119

Title:
  Update to systemd 237-3ubuntu10.54 broke dns

Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Bionic:
  In Progress
[Touch-packages] [Bug 1964880] Re: software-properties-gtk crashed with AttributeError in packages_for_modalias(): 'Cache' object has no attribute 'packages'
I installed software-properties 0.99.20 from -proposed, opened software-properties-gtk, and clicked the "Additional Drivers" tab.

The tab loaded correctly and did not crash.

The package in -proposed fixes the issue, happy to mark verified.

-- 
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to software-properties in Ubuntu.
https://bugs.launchpad.net/bugs/1964880

Title:
  software-properties-gtk crashed with AttributeError in packages_for_modalias(): 'Cache' object has no attribute 'packages'

Status in software-properties package in Ubuntu:
  Fix Committed
Status in software-properties source package in Jammy:
  Fix Committed

Bug description:
  Opened up "Software & Updates" and clicked the "Additional Drivers" tab, and the tab crashed.

  ProblemType: Crash
  DistroRelease: Ubuntu 22.04
  Package: software-properties-gtk 0.99.19
  ProcVersionSignature: Ubuntu 5.15.0-23.23-generic 5.15.27
  Uname: Linux 5.15.0-23-generic x86_64
  ApportVersion: 2.20.11-0ubuntu79
  Architecture: amd64
  CasperMD5CheckResult: pass
  CurrentDesktop: ubuntu:GNOME
  Date: Tue Mar 15 19:02:39 2022
  ExecutablePath: /usr/bin/software-properties-gtk
  InstallationDate: Installed on 2022-01-02 (72 days ago)
  InstallationMedia: Ubuntu 22.04 LTS "Jammy Jellyfish" - Alpha amd64 (20220101)
  InterpreterPath: /usr/bin/python3.10
  PackageArchitecture: all
  ProcCmdline: /usr/bin/python3 /usr/bin/software-properties-gtk
  Python3Details: /usr/bin/python3.10, Python 3.10.2+, python3-minimal, 3.10.1-0ubuntu2
  PythonArgs: ['/usr/bin/software-properties-gtk']
  PythonDetails: N/A
  SourcePackage: software-properties
  Title: software-properties-gtk crashed with AttributeError in packages_for_modalias(): 'Cache' object has no attribute 'packages'
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: adm cdrom dip libvirt lpadmin lxd plugdev sambashare sudo

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/software-properties/+bug/1964880/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
[Touch-packages] [Bug 1964880] [NEW] software-properties-gtk crashed with AttributeError in packages_for_modalias(): 'Cache' object has no attribute 'packages'
Public bug reported:

Opened up "Software & Updates" and clicked the "Additional Drivers" tab, and the tab crashed.

ProblemType: Crash
DistroRelease: Ubuntu 22.04
Package: software-properties-gtk 0.99.19
ProcVersionSignature: Ubuntu 5.15.0-23.23-generic 5.15.27
Uname: Linux 5.15.0-23-generic x86_64
ApportVersion: 2.20.11-0ubuntu79
Architecture: amd64
CasperMD5CheckResult: pass
CurrentDesktop: ubuntu:GNOME
Date: Tue Mar 15 19:02:39 2022
ExecutablePath: /usr/bin/software-properties-gtk
InstallationDate: Installed on 2022-01-02 (72 days ago)
InstallationMedia: Ubuntu 22.04 LTS "Jammy Jellyfish" - Alpha amd64 (20220101)
InterpreterPath: /usr/bin/python3.10
PackageArchitecture: all
ProcCmdline: /usr/bin/python3 /usr/bin/software-properties-gtk
Python3Details: /usr/bin/python3.10, Python 3.10.2+, python3-minimal, 3.10.1-0ubuntu2
PythonArgs: ['/usr/bin/software-properties-gtk']
PythonDetails: N/A
SourcePackage: software-properties
Title: software-properties-gtk crashed with AttributeError in packages_for_modalias(): 'Cache' object has no attribute 'packages'
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip libvirt lpadmin lxd plugdev sambashare sudo

** Affects: software-properties (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: software-properties (Ubuntu Jammy)
     Importance: Undecided
         Status: New

** Tags: amd64 apport-crash jammy need-duplicate-check third-party-packages wayland-session

** Also affects: software-properties (Ubuntu Jammy)
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to software-properties in Ubuntu.
[Touch-packages] [Bug 1960863] Re: armv8 paca: poly1305 users see segfaults when pointer authentication in use on AWS Graviton 3 instances
Performing verification for openssl on Focal.

An affected user performed the verification, because c7g instance types were in "Preview" state on AWS and not generally accessible.

The user started a c7g instance, and checked they had openssl 1.1.1f-1ubuntu2.10 from -updates. They attempted to use the poly1305 MAC by downloading the file from the testcase:

$ curl https://services.gradle.org/distributions/gradle-7.2-bin.zip --output gradle-7.2.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0Segmentation fault (core dumped)

The issue reproduced.

They then enabled -proposed from the ports.ubuntu.com mirror, and installed openssl 1.1.1f-1ubuntu2.11.

They again tried downloading the file:

$ curl https://services.gradle.org/distributions/gradle-7.2-bin.zip --output gradle-7.2.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0

(Note the file doesn't actually download, because curl does not automatically follow 301 redirects.)

$ curl https://services.gradle.org/distributions/gradle-7.2-bin.zip --output gradle-7.2.bin --verbose
...
* SSL connection using TLSv1.2 / ECDHE-ECDSA-CHACHA20-POLY1305
...
< HTTP/1.1 301 Moved Permanently
< Location: https://downloads.gradle-dn.com/distributions/gradle-7.2-bin.zip
...

curl does not segfault, and exits successfully.

The package in -proposed fixes the issue. Happy to mark as verified.

** Tags removed: sts-sponsor verification-needed verification-needed-focal
** Tags added: verification-done verification-done-focal

-- 
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to openssl in Ubuntu.
https://bugs.launchpad.net/bugs/1960863

Title:
  armv8 paca: poly1305 users see segfaults when pointer authentication in use on AWS Graviton 3 instances

Status in openssl package in Ubuntu:
  Fix Released
Status in openssl source package in Focal:
  Fix Committed

Bug description:
  [Impact]

  Support for hardware pointer authentication for armv8 systems was merged in openssl 1.1.1f, but it contains a bug in the implementation of the poly1305 message authentication code routines. The bug causes the calling program to fail pointer authentication and crash with a segmentation fault.

  You can easily test it by accessing any website that uses poly1305.

  There is no workaround except to use a different MAC.

  [Testcase]

  This bug applies to armv8 systems which support pointer authentication.

  Start an armv8 instance, such as a c7g Graviton 3 instance on AWS, and make sure the paca flag is present in /proc/cpuinfo:

  $ grep paca /proc/cpuinfo
  Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs paca pacg dcpodp svei8mm svebf16 i8mm bf16 dgh rng

  Next, attempt to connect to any website that uses the poly1305 MAC.

  $ curl https://services.gradle.org/distributions/gradle-7.2-bin.zip --output gradle-7.2.bin
    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                   Dload  Upload   Total   Spent    Left  Speed
    0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0Segmentation fault (core dumped)

  There is a test package available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf327917-test

  Install it, and poly1305 operations will no longer segfault.

  [Where problems could occur]

  The patch changes the order of operations for loading the SP and checking the AUTIASP against it. Previously the code checked the AUTIASP against nothing and then loaded the correct SP to check with; it now correctly loads the SP first and then checks the AUTIASP against the SP.

  This only changes one code path for armv8 systems, and other architectures are not affected. This is also limited to the poly1305 MAC.

  If a regression were to occur, it would only affect users of the poly1305 MAC on armv8 with paca support.

  [Other info]

  The fix landed upstream in openssl 1.1.1i with the following commit:

  commit 5795acffd8706e1cb584284ee5bb3a30986d0e75
  Author: Ard Biesheuvel
  Date: Tue Oct 27 18:02:40 2020 +0100
  Subject: crypto/poly1305/asm: fix armv8 pointer authentication
  Link: https://github.com/openssl/openssl/commit/5795acffd8706e1cb584284ee5bb3a30986d0e75

  This commit is already present in Impish onward. Only Focal needs the fix.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1960863/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
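The testcase above decides whether a machine is affected by grepping /proc/cpuinfo for the paca flag. A small Python sketch of that check follows; it parses cpuinfo text passed in as a string, so it can be tried on any machine. The function name cpu_features is hypothetical, and the code assumes the arm64 layout shown in the report, where flags are listed on a "Features" line.

```python
def cpu_features(cpuinfo_text):
    """Return the set of flags listed on 'Features' lines of cpuinfo text."""
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # arm64 kernels list CPU flags under the "Features" key.
        if sep and key.strip() == "Features":
            features.update(value.split())
    return features


if __name__ == "__main__":
    # Shortened copy of the Features line quoted in the bug description.
    sample = "Features : fp asimd evtstrm aes ssbs paca pacg dcpodp rng"
    print("paca" in cpu_features(sample))
```

On a real system the input would come from reading /proc/cpuinfo; only instances where "paca" is reported are in scope for this bug, per the description.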
[Touch-packages] [Bug 1961542] Re: libsmartcols: Revert back to previous behaviour of non-shell parsable column output (lsblk -P)
Attached is a debdiff for Jammy util-linux.

** Patch added: "debdiff for util-linux on Jammy"
   https://bugs.launchpad.net/ubuntu/+source/util-linux/+bug/1961542/+attachment/5562373/+files/lp1961542_jammy.debdiff

-- 
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to util-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1961542

Title:
  libsmartcols: Revert back to previous behaviour of non-shell parsable column output (lsblk -P)

Status in util-linux package in Ubuntu:
  In Progress
Status in util-linux source package in Jammy:
  In Progress

Bug description:
  [Impact]

  util-linux 2.37 in Jammy has introduced some new behaviour for lsblk and similar tools which depend on libsmartcols. This switched the -P / --pairs parameter from printing column names as normal, to changing the names to shell compatible names instead.

  e.g. lsblk -P now outputs LOG_SEC instead of LOG-SEC.

  The change broke some core tooling which relies on the output of lsblk -P, most notably curtin and MAAS, but I am sure there will be more applications affected.

  Affected MAAS users will see the following traceback when attempting to deploy 22.04:

  Traceback (most recent call last):
    File "/curtin/curtin/block/__init__.py", line 785, in get_blockdev_sector_size
      logical = info[parent]['LOG-SEC']
  KeyError: 'LOG-SEC'
  'LOG-SEC'
  curtin: Installation failed with exception: Unexpected error while running

  This is documented in MAAS bug 1956613.

  MAAS decided to fix it by changing from -P to -J, in the following commit:

  https://git.launchpad.net/maas/commit/?id=e2c01963430e6837198a54bc1eadf3efc9fdd9a2

  Curtin now checks for MAJ_MIN, and changes it back to MAJ:MIN in:

  https://github.com/canonical/curtin/commit/ce811db127fe1ce46498b83615f8faed8c7dfeb6

  The issue is that these commits are not tagged to any MAAS release, and users would be forced to upgrade MAAS to the latest stable release when available if they want to deploy 22.04. There are many users out there that don't want to upgrade MAAS, so returning to the previous column output is the most desirable solution.

  [Testcase]

  On a Jammy install, simply run lsblk with either -P or --pairs:

  $ sudo lsblk -P
  ...
  NAME="sda" MAJ_MIN="8:0" RM="0" SIZE="465.8G" RO="0" TYPE="disk" MOUNTPOINTS=""
  ...

  Affected installs will see MAJ_MIN.

  There is a test package available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf326660-test

  If you install it, you will see MAJ:MIN, just like it is on Impish and earlier.

  [Where problems could occur]

  We are changing the column output for -P and --pairs for the following applications:

  * lsblk
  * findmnt
  * lsipc

  If any application has been modified to depend on the new column output, it will break. I don't have any examples of something that will break, because MAAS and curtin were modified such that they would be compatible with both column name formats.

  It should be noted that the manpage documents that lsblk output can change at any time:

  > The default output, as well as the default output from options like --fs and --topology, is subject to change.
  > So whenever possible, you should avoid using default outputs in your scripts.
  > Always explicitly define expected columns by using --output columns-list and --list in environments where a stable output is required.

  If a regression should occur, we will need to fix up these affected packages also.
[Other info] The change came about when a user asked upstream to make -P / --pairs shell parsable, in Issue 1201 upstream: https://github.com/util-linux/util-linux/issues/1201 Karel Zak obliged, and it was implemented in the following commit: commit 58b510e5805d8350c31bfb81a47bcd38ea9fdd7e From: Karel Zak Date: Thu, 3 Dec 2020 12:14:10 +0100 Subject: libsmartcols: sanitize variable names on export output Link: https://github.com/util-linux/util-linux/commit/58b510e5805d8350c31bfb81a47bcd38ea9fdd7e I wrote to Karel Zak with the regressions introduced by changing the format, and asked to revert back, and instead implement the shell parsable logic as a new parameter. This happened in upstream issue 1594: https://github.com/util-linux/util-linux/issues/1594 Karel Zak was happy to oblige again, and we now have the following commits: 338ad4a93 findmnt: commit missing flag 0f843ab64 lsblk: update --help output for -y eba05f308 lsipc: add -y,--shell 152c17aa4 findmnt: add -y,--shell 9c7e81ff1 lslogins: add -y,--shell 25fb0638a lsblk: add -y/--shell 39679ea0c lsfd: use new libsmartcols functions 6fd0e3590 column: use new libsmartcols functions 0b3c2e80d include/carefulputc: remove unused function 3b5db50f7 libsmartcols: change "export" behavior, add "shellvar" flag While we got the intended behaviour, these commits won't land until
[Touch-packages] [Bug 1961542] [NEW] libsmartcols: Revert back to previous behaviour of non-shell parsable column output (lsblk -P)
Public bug reported:

[Impact]

util-linux 2.37 in Jammy introduced new behaviour for lsblk and similar tools which depend on libsmartcols. The -P / --pairs parameter no longer prints column names as-is; it rewrites them into shell compatible names instead. For example, lsblk -P now outputs LOG_SEC instead of LOG-SEC.

The change broke some core tooling which relies on the output of lsblk -P, most notably curtin and MAAS, but I am sure there will be more applications affected.

Affected MAAS users will see the following traceback when attempting to deploy 22.04:

  Traceback (most recent call last):
    File "/curtin/curtin/block/__init__.py", line 785, in get_blockdev_sector_size
      logical = info[parent]['LOG-SEC']
  KeyError: 'LOG-SEC'
  'LOG-SEC'
  curtin: Installation failed with exception: Unexpected error while running

This is documented in MAAS bug 1956613.

MAAS decided to fix it by changing from -P to -J, in the following commit:

https://git.launchpad.net/maas/commit/?id=e2c01963430e6837198a54bc1eadf3efc9fdd9a2

Curtin now checks for MAJ_MIN, and changes it back to MAJ:MIN in:

https://github.com/canonical/curtin/commit/ce811db127fe1ce46498b83615f8faed8c7dfeb6

The issue is that these commits are not tagged to any MAAS release, and users would be forced to upgrade MAAS to the latest stable release, when available, if they want to deploy 22.04.

There are many users out there that don't want to upgrade MAAS, so returning to the previous column output is the most desirable solution.

[Testcase]

On a Jammy install, simply run lsblk with either -P or --pairs:

$ sudo lsblk -P
...
NAME="sda" MAJ_MIN="8:0" RM="0" SIZE="465.8G" RO="0" TYPE="disk" MOUNTPOINTS=""
...

Affected installs will see MAJ_MIN.

There is a test package available in the following ppa:

https://launchpad.net/~mruffell/+archive/ubuntu/sf326660-test

If you install it, you will see MAJ:MIN, just as on Impish and earlier releases.

[Where problems could occur]

We are changing the column output for -P and --pairs for the following applications:

* lsblk
* findmnt
* lsipc

If any application has been modified to depend on the new column output, it will break. I don't have any examples of something that will break, because MAAS and curtin were modified such that they would be compatible with both column name formats.

It should be noted that the manpage documents that lsblk output can change at any time:

> The default output, as well as the default output from options like --fs and
> --topology, is subject to change.
> So whenever possible, you should avoid using default outputs in your scripts.
> Always explicitly define expected columns by using --output columns-list and
> --list in environments where a stable output is required.

If a regression should occur, we will need to fix up these affected packages also.

[Other info]

The change came about when a user asked upstream to make -P / --pairs shell parsable, in upstream issue 1201:

https://github.com/util-linux/util-linux/issues/1201

Karel Zak obliged, and it was implemented in the following commit:

commit 58b510e5805d8350c31bfb81a47bcd38ea9fdd7e
From: Karel Zak
Date: Thu, 3 Dec 2020 12:14:10 +0100
Subject: libsmartcols: sanitize variable names on export output
Link: https://github.com/util-linux/util-linux/commit/58b510e5805d8350c31bfb81a47bcd38ea9fdd7e

I wrote to Karel Zak about the regressions introduced by changing the format, and asked him to revert the change and instead implement the shell parsable logic as a new parameter. This happened in upstream issue 1594:

https://github.com/util-linux/util-linux/issues/1594

Karel Zak was happy to oblige again, and we now have the following commits:

338ad4a93 findmnt: commit missing flag
0f843ab64 lsblk: update --help output for -y
eba05f308 lsipc: add -y,--shell
152c17aa4 findmnt: add -y,--shell
9c7e81ff1 lslogins: add -y,--shell
25fb0638a lsblk: add -y/--shell
39679ea0c lsfd: use new libsmartcols functions
6fd0e3590 column: use new libsmartcols functions
0b3c2e80d include/carefulputc: remove unused function
3b5db50f7 libsmartcols: change "export" behavior, add "shellvar" flag

While we got the intended behaviour, these commits won't land until util-linux 2.38, which will be after Jammy releases. The other issue is that this changes a significant amount of code, nearly 1k lines, spread over 10+ commits.

I wrote to ubuntu-devel asking for advice on either 1) not changing anything, 2) backporting the 10+ new commits, or 3) simply reverting the commit which changed the behaviour.

https://lists.ubuntu.com/archives/ubuntu-devel/2022-February/041870.html

ubuntu-devel had strong support for option (3). Hence, we will revert the below commit to ensure Jammy can be deployed on all existing MAAS releases.

58b510e580 libsmartcols: sanitize variable names on export output

** Affects: util-linu
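As a rough illustration of the "compatible with both column name formats" approach mentioned in this report, the sketch below parses one line of lsblk -P style output and looks a column up under either spelling. It is not the actual curtin or MAAS code: the function names are hypothetical, and the normalisation rule (every non-alphanumeric character maps to an underscore) is an assumption made for the example, not a statement of what libsmartcols does for every column.

```python
import shlex


def parse_pairs_line(line):
    """Parse one KEY="value" line (the `lsblk -P` layout) into a dict."""
    fields = {}
    # shlex.split strips the double quotes: NAME="sda" becomes NAME=sda.
    for token in shlex.split(line):
        key, _, value = token.partition("=")
        fields[key] = value
    return fields


def normalise_key(key):
    # Assumption for this sketch: any non-alphanumeric character is treated
    # as "_", so MAJ:MIN and MAJ_MIN (or LOG-SEC and LOG_SEC) compare equal.
    return "".join(c if c.isalnum() else "_" for c in key)


def get_column(fields, name):
    """Return the value of a column regardless of which spelling was printed."""
    wanted = normalise_key(name)
    for key, value in fields.items():
        if normalise_key(key) == wanted:
            return value
    raise KeyError(name)
```

A caller could then ask for `get_column(fields, "LOG-SEC")` and get a value from either the old or the new output, instead of indexing the dict directly as in the traceback above.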
[Touch-packages] [Bug 1960863] Re: armv8 paca: poly1305 users see segfaults when pointer authentication in use on AWS Graviton 3 instances
** Tags added: sts-sponsor

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to openssl in Ubuntu.
https://bugs.launchpad.net/bugs/1960863

Title:
  armv8 paca: poly1305 users see segfaults when pointer authentication in use on AWS Graviton 3 instances

Status in openssl package in Ubuntu:
  Fix Released
Status in openssl source package in Focal:
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1960863/+subscriptions

--
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
[Touch-packages] [Bug 1960863] Re: armv8 paca: poly1305 users see segfaults when pointer authentication in use on AWS Graviton 3 instances
Attached is a debdiff for openssl on Focal

** Patch added: "debdiff for openssl on Focal"
   https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1960863/+attachment/5560898/+files/lp1960863_focal.debdiff

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to openssl in Ubuntu.
https://bugs.launchpad.net/bugs/1960863

Title:
  armv8 paca: poly1305 users see segfaults when pointer authentication in use on AWS Graviton 3 instances

Status in openssl package in Ubuntu:
  Fix Released
Status in openssl source package in Focal:
  In Progress
[Touch-packages] [Bug 1960863] [NEW] armv8 paca: poly1305 users see segfaults when pointer authentication in use on AWS Graviton 3 instances
Public bug reported:

[Impact]

Support for hardware pointer authentication on armv8 systems was merged in openssl 1.1.1f, but the implementation for the poly1305 message authentication code routines contains a bug. The bug causes the calling program to fail pointer authentication and crash with a segmentation fault.

You can easily test it by accessing any website that uses poly1305.

There is no workaround except to use a different MAC.

[Testcase]

This bug applies to armv8 systems which support pointer authentication. Start an armv8 instance, such as a c7g Graviton 3 instance on AWS, and make sure the paca flag is present in /proc/cpuinfo:

$ grep paca /proc/cpuinfo
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs paca pacg dcpodp svei8mm svebf16 i8mm bf16 dgh rng

Next, attempt to connect to any website that uses the poly1305 MAC.

$ curl https://services.gradle.org/distributions/gradle-7.2-bin.zip --output gradle-7.2.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
Segmentation fault (core dumped)

There is a test package available in the following ppa:

https://launchpad.net/~mruffell/+archive/ubuntu/sf327917-test

Install it, and poly1305 operations will no longer segfault.

[Where problems could occur]

The patch changes the order of operations for loading the SP and checking the AUTIASP against it. Previously the code checked the AUTIASP against nothing and then loaded the correct SP to check with; now it correctly loads the SP first and then checks the AUTIASP against it.

This only changes one code path for armv8 systems, and other architectures are not affected. It is also limited to the poly1305 MAC.

If a regression were to occur, it would only affect users of the poly1305 MAC on armv8 with paca support.

[Other info]

The fix landed upstream in openssl 1.1.1i with the following commit:

commit 5795acffd8706e1cb584284ee5bb3a30986d0e75
Author: Ard Biesheuvel
Date: Tue Oct 27 18:02:40 2020 +0100
Subject: crypto/poly1305/asm: fix armv8 pointer authentication
Link: https://github.com/openssl/openssl/commit/5795acffd8706e1cb584284ee5bb3a30986d0e75

This commit is already present in Impish onward. Only Focal needs the fix.

** Affects: openssl (Ubuntu)
     Importance: Undecided
         Status: Fix Released

** Affects: openssl (Ubuntu Focal)
     Importance: High
     Assignee: Matthew Ruffell (mruffell)
         Status: In Progress

** Tags: focal sts

** Also affects: openssl (Ubuntu Focal)
   Importance: Undecided
       Status: New

** Changed in: openssl (Ubuntu)
       Status: New => Fix Released

** Changed in: openssl (Ubuntu Focal)
       Status: New => In Progress

** Changed in: openssl (Ubuntu Focal)
   Importance: Undecided => High

** Changed in: openssl (Ubuntu Focal)
     Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Tags added: focal sts

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to openssl in Ubuntu.
https://bugs.launchpad.net/bugs/1960863

Title:
  armv8 paca: poly1305 users see segfaults when pointer authentication in use on AWS Graviton 3 instances

Status in openssl package in Ubuntu:
  Fix Released
Status in openssl source package in Focal:
  In Progress
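The paca check from the test case can also be scripted. The following is a minimal sketch, assuming the arm64 /proc/cpuinfo layout quoted in this report (a "Features" line listing space-separated flags); the function names are hypothetical and exist only for this example.

```python
def cpu_features(cpuinfo_text):
    """Return the flags from the first 'Features' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "Features":
            return set(value.split())
    # No 'Features' line found (for example, on a non-arm64 machine).
    return set()


def has_pointer_auth(cpuinfo_text):
    """True when the 'paca' flag is advertised, as the test case requires."""
    return "paca" in cpu_features(cpuinfo_text)
```

On a real system the text would come from reading /proc/cpuinfo, for example `has_pointer_auth(open("/proc/cpuinfo").read())`. Matching whole flags rather than substrings avoids a false positive from any other flag that merely contains the letters "paca".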
[Touch-packages] [Bug 1954724] Re: Removing unattended-upgrades removes ubuntu-server-minimal
This has been fixed as of ubuntu-meta 1.474

https://launchpad.net/ubuntu/+source/ubuntu-meta/1.474

$ sudo apt rdepends unattended-upgrades
unattended-upgrades
Reverse Depends:
  Recommends: python3-software-properties
  Recommends: ubuntu-mate-desktop
  Recommends: ubuntu-mate-core
  Depends: freedombox
  Recommends: fbx-all
  Recommends: ubuntu-server-minimal
  Recommends: ubuntu-server

** Changed in: ubuntu-meta (Ubuntu Jammy)
       Status: Confirmed => Fix Released

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to ubuntu-meta in Ubuntu.
https://bugs.launchpad.net/bugs/1954724

Title:
  Removing unattended-upgrades removes ubuntu-server-minimal

Status in ubuntu-meta package in Ubuntu:
  Fix Released
Status in ubuntu-meta source package in Impish:
  Confirmed
Status in ubuntu-meta source package in Jammy:
  Fix Released

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ubuntu-meta/+bug/1954724/+subscriptions
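To make the Depends versus Recommends distinction in the rdepends output easier to check, here is a small sketch that groups reverse dependencies by relation type. It assumes the plain "Kind: package" layout quoted in this thread; the function names are hypothetical, and version constraints are not handled.

```python
def parse_rdepends(output):
    """Group `apt rdepends` style output into {relation: [packages]}."""
    relations = {}
    for line in output.splitlines():
        kind, sep, package = line.strip().partition(": ")
        if not sep or not package:
            # Skips the package name line and the "Reverse Depends:" header.
            continue
        # Assumption: a leading "|" marks an alternative; treat it the same.
        kind = kind.lstrip("|")
        relations.setdefault(kind, []).append(package.strip())
    return relations


def hard_dependents(output):
    """Packages that would be removed along with the queried package."""
    return parse_rdepends(output).get("Depends", [])
```

Run against the output from before the fix, `hard_dependents` would include ubuntu-server-minimal; against the output shown for ubuntu-meta 1.474 it would not, which is the behaviour change this bug asked for.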
[Touch-packages] [Bug 1954724] Re: Removing unattended-upgrades removes ubuntu-server-minimal
** Description changed:

  On Impish and later, removing unattended-upgrades also removes ubuntu-
  server-minimal due to ubuntu-server-minimal depending on unattended-
  upgrades

  $ sudo apt remove unattended-upgrades
  ...
  The following packages will be REMOVED:
-   ubuntu-server-minimal unattended-upgrades
+   ubuntu-server-minimal unattended-upgrades
  0 upgraded, 0 newly installed, 2 to remove and 0 not upgraded.

  This behaviour has changed since ubuntu-meta 1.471 [1] when the ubuntu-
  server-minimal metapackage was introduced, declaring unattended-upgrades
  as Depends.

  [1] https://launchpadlibrarian.net/550345392/ubuntu-
  meta_1.470_1.471.diff.gz

  On Focal, there was no such behaviour on a fresh ubuntu-server install:

  $ sudo apt remove unattended-upgrades
  ...
  The following packages will be REMOVED:
-   unattended-upgrades
+   unattended-upgrades
  0 upgraded, 0 newly installed, 1 to remove and 9 not upgraded.

  Removing unattended-upgrades is quite popular amongst our users, and
  they should be allowed to remove the package without removing the
  ubuntu-server-minimal metapackage.

  Looking at the source package for ubuntu-meta, unattended-upgrades is
  only Depends for ubuntu-server-minimal, maybe we should simply remove
- it?
+ it, or instead, change to recommends?

  $ grep -Rin "unattended-upgrades" .
  ./server-minimal-armhf:23:unattended-upgrades
  ./server-minimal-riscv64:23:unattended-upgrades
  ./server-minimal-arm64:23:unattended-upgrades
  ./server-minimal-ppc64el:23:unattended-upgrades
  ./server-minimal-s390x:24:unattended-upgrades
  ./server-minimal-amd64:23:unattended-upgrades

  $ sudo apt rdepends unattended-upgrades
  unattended-upgrades
  Reverse Depends:
-   Recommends: python3-software-properties
-   Recommends: ubuntu-mate-desktop
-   Recommends: ubuntu-mate-core
-   Depends: freedombox
-   Recommends: fbx-all
-   Depends: ubuntu-server-minimal
+   Recommends: python3-software-properties
+   Recommends: ubuntu-mate-desktop
+   Recommends: ubuntu-mate-core
+   Depends: freedombox
+   Recommends: fbx-all
+   Depends: ubuntu-server-minimal

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to ubuntu-meta in Ubuntu.
https://bugs.launchpad.net/bugs/1954724

Title:
  Removing unattended-upgrades removes ubuntu-server-minimal

Status in ubuntu-meta package in Ubuntu:
  New
Status in ubuntu-meta source package in Impish:
  New
Status in ubuntu-meta source package in Jammy:
  New
[Touch-packages] [Bug 1954724] [NEW] Removing unattended-upgrades removes ubuntu-server-minimal
Public bug reported:

On Impish and later, removing unattended-upgrades also removes ubuntu-server-minimal, because ubuntu-server-minimal depends on unattended-upgrades.

$ sudo apt remove unattended-upgrades
...
The following packages will be REMOVED:
  ubuntu-server-minimal unattended-upgrades
0 upgraded, 0 newly installed, 2 to remove and 0 not upgraded.

This behaviour has changed since ubuntu-meta 1.471 [1], when the ubuntu-server-minimal metapackage was introduced, declaring unattended-upgrades as Depends.

[1] https://launchpadlibrarian.net/550345392/ubuntu-meta_1.470_1.471.diff.gz

On Focal, there was no such behaviour on a fresh ubuntu-server install:

$ sudo apt remove unattended-upgrades
...
The following packages will be REMOVED:
  unattended-upgrades
0 upgraded, 0 newly installed, 1 to remove and 9 not upgraded.

Removing unattended-upgrades is quite popular amongst our users, and they should be allowed to remove the package without removing the ubuntu-server-minimal metapackage.

Looking at the source package for ubuntu-meta, unattended-upgrades is a Depends only for ubuntu-server-minimal. Maybe we should simply remove it?

$ grep -Rin "unattended-upgrades" .
./server-minimal-armhf:23:unattended-upgrades
./server-minimal-riscv64:23:unattended-upgrades
./server-minimal-arm64:23:unattended-upgrades
./server-minimal-ppc64el:23:unattended-upgrades
./server-minimal-s390x:24:unattended-upgrades
./server-minimal-amd64:23:unattended-upgrades

$ sudo apt rdepends unattended-upgrades
unattended-upgrades
Reverse Depends:
  Recommends: python3-software-properties
  Recommends: ubuntu-mate-desktop
  Recommends: ubuntu-mate-core
  Depends: freedombox
  Recommends: fbx-all
  Depends: ubuntu-server-minimal

** Affects: ubuntu-meta (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: ubuntu-meta (Ubuntu Impish)
     Importance: Undecided
         Status: New

** Affects: ubuntu-meta (Ubuntu Jammy)
     Importance: Undecided
         Status: New

** Also affects: ubuntu-meta (Ubuntu Jammy)
   Importance: Undecided
       Status: New

** Also affects: ubuntu-meta (Ubuntu Impish)
   Importance: Undecided
       Status: New

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to ubuntu-meta in Ubuntu.
https://bugs.launchpad.net/bugs/1954724

Title:
  Removing unattended-upgrades removes ubuntu-server-minimal

Status in ubuntu-meta package in Ubuntu:
  New
Status in ubuntu-meta source package in Impish:
  New
Status in ubuntu-meta source package in Jammy:
  New
[Touch-packages] [Bug 1930359] Re: glib2.0: Uninitialised memory is written to gschema.compiled, failure to parse this file leads to gdm, gnome-shell failing to start
Performing verification for Focal. I will first reproduce the problem with glib2.0 2.64.6-1~ubuntu20.04.3 from -security with the libglib2.0-0 libglib2.0-bin libglib2.0-data packages. I deleted all existing schemas from /usr/share/glib-2.0/schemas and replaced them with a set of schemas which reproduce the problem easily from my customer. $ cd /usr/share/glib-2.0/schemas/ $ sudo rm * $ sudo cp ~/schemas/* . The gsettings.compiled from the customer has been corrupted, and when I reboot, gdm fails to start and I get a blank screen with a blinking insertion pointer. The sha256 of the customers corrupted gsettings.compiled is: $ sudo openssl sha256 /usr/share/glib-2.0/schemas/gschemas.compiled SHA256(/usr/share/glib-2.0/schemas/gschemas.compiled)= 2c98dc9a7fdbac858a8d5ca7e4dd813f16058a46dba2c54b5239cd8cdba5bb3e When I ssh back in, and recompile the file: $ sudo glib-compile-schemas /usr/share/glib-2.0/schemas Error parsing key “logout” in schema “org.gnome.settings-daemon.plugins.media-keys” as specified in override file “/usr/share/glib-2.0/schemas/50_vmware_viewagent.gschema.override”: 0-22:can not parse as value of type 'as'. Ignoring override for this key. $ sudo openssl sha256 /usr/share/glib-2.0/schemas/gschemas.compiled SHA256(/usr/share/glib-2.0/schemas/gschemas.compiled)= 78163b5fefbd6320ce0d355c9531bf657a4f4dc15f057d95ef144323cd56 The sha256 has changed. Doing a bindiff, I see: $ sudo cmp -l /home/ubuntu/schemas/gschemas.compiled /usr/share/glib-2.0/schemas/gschemas.compiled | gawk '{printf "%08X %02X %02X\n", $1, strtonum(0$2), strtonum(0$3)}' 376F E3 4F 3771 A4 C2 We see two bytes different. These bytes are the uninitialised memory this bug is about. When I reboot, gdm starts fine, but that is because this time I got lucky and the parser for the gschema.compiled file thinks 4F and C2 are okay. But there are combinations which aren't okay, and will end up with a corrupted gschema.compiled file. 
Re-compiling the file again:

$ sudo glib-compile-schemas /usr/share/glib-2.0/schemas
Error parsing key “logout” in schema “org.gnome.settings-daemon.plugins.media-keys” as specified in override file “/usr/share/glib-2.0/schemas/50_vmware_viewagent.gschema.override”: 0-22:can not parse as value of type 'as'. Ignoring override for this key.
$ sudo openssl sha256 /usr/share/glib-2.0/schemas/gschemas.compiled
SHA256(/usr/share/glib-2.0/schemas/gschemas.compiled)= 460c70faca7afc26fa88a0e5918d312478e15f20ad84f4afaa5d17627a823e01

The sha256 changed again, and if we bindiff, the bytes have changed:

$ sudo cmp -l /home/ubuntu/schemas/gschemas.compiled /usr/share/glib-2.0/schemas/gschemas.compiled | gawk '{printf "%08X %02X %02X\n", $1, strtonum(0$2), strtonum(0$3)}'
376F E3 A6
3771 A4 A1

If we run glib-compile-schemas through valgrind, it reports that we are writing uninitialised memory:

https://paste.ubuntu.com/p/sxrQtbswpw/

I then enabled -proposed and installed libglib2.0-0 libglib2.0-bin libglib2.0-data version 2.64.6-1~ubuntu20.04.4.

Now, when I re-compile the gschemas.compiled file, the sha256 matches every time, meaning no more non-deterministic behaviour caused by writing uninitialised memory to disk:

$ sudo glib-compile-schemas /usr/share/glib-2.0/schemas
Error parsing key “logout” in schema “org.gnome.settings-daemon.plugins.media-keys” as specified in override file “/usr/share/glib-2.0/schemas/50_vmware_viewagent.gschema.override”: 0-22:can not parse as value of type 'as'. Ignoring override for this key.
$ sudo openssl sha256 /usr/share/glib-2.0/schemas/gschemas.compiled
SHA256(/usr/share/glib-2.0/schemas/gschemas.compiled)= cd9132d18b596a304251cd1eb50b64aa6fd7511a312906f9a49e1975a319fbf1
$ sudo glib-compile-schemas /usr/share/glib-2.0/schemas
Error parsing key “logout” in schema “org.gnome.settings-daemon.plugins.media-keys” as specified in override file “/usr/share/glib-2.0/schemas/50_vmware_viewagent.gschema.override”: 0-22:can not parse as value of type 'as'. Ignoring override for this key.
$ sudo openssl sha256 /usr/share/glib-2.0/schemas/gschemas.compiled
SHA256(/usr/share/glib-2.0/schemas/gschemas.compiled)= cd9132d18b596a304251cd1eb50b64aa6fd7511a312906f9a49e1975a319fbf1

Doing a bindiff, I see the changed bytes from before are now all zeros, which is what the patch initialises the buffer to:

$ sudo cmp -l /home/ubuntu/schemas/gschemas.compiled /usr/share/glib-2.0/schemas/gschemas.compiled | gawk '{printf "%08X %02X %02X\n", $1, strtonum(0$2), strtonum(0$3)}'
376F E3 00
3771 A4 00
3772 55 00

Doing a run through valgrind, we no longer get a report about writing uninitialised memory:

https://paste.ubuntu.com/p/z52DGZcdz3/

Rebooting, the VM comes up and GDM starts properly, so glib can parse the gschemas.compiled file without any issues.

Wonderful. The problem is fixed by the package in -proposed, happy to mark as verified.

** Tags removed: sts-sponsor verification-needed verification-needed-focal
** Tags added:
[Touch-packages] [Bug 1930359] Re: glib2.0: Uninitialised memory is written to gschema.compiled, failure to parse this file leads to gdm, gnome-shell failing to start
Attached is a debdiff for glib2.0 on Focal which fixes this problem.

** Patch added: "Debdiff for glib2.0 for Focal"
   https://bugs.launchpad.net/ubuntu/+source/glib2.0/+bug/1930359/+attachment/5510466/+files/lp1930359_focal.debdiff

** Tags removed: regression-update
** Tags added: sts-sponsor

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to glib2.0 in Ubuntu.
https://bugs.launchpad.net/bugs/1930359

Title:
  glib2.0: Uninitialised memory is written to gschema.compiled, failure to parse this file leads to gdm, gnome-shell failing to start

Status in glib2.0 package in Ubuntu:
  Fix Released
Status in glib2.0 source package in Focal:
  In Progress

Bug description:

[Impact]

A recent SRU of mutter 3.36.9-0ubuntu0.20.04.1 caused an outage for a user with 300 VDIs running Focal, where GNOME applications would fail to start, and if you reboot, gdm and gnome-shell both fail to start, and you are left with a black screen and a blinking cursor.

After much investigation, mutter was not at fault. Instead, mutter-common calls the libglib2.0-0 hook on upgrade:

Processing triggers for libglib2.0-0:amd64 (2.64.6-1~ubuntu20.04.3) ...

This in turn calls glib-compile-schemas to recompile the gsettings gschema cache from the files in /usr/share/glib-2.0/schemas/. The result is a binary gschemas.compiled file, which is loaded by libglib2.0 on every invocation of a GNOME application, or gdm or gnome-shell, to fetch application default settings.
Now, glib2.0 2.64.6-1~ubuntu20.04.3 in Focal has some non-deterministic behaviour when calling glib-compile-schemas, causing generated gschemas.compiled files to have differing contents on each run:

# glib-compile-schemas /usr/share/glib-2.0/schemas
# cmp -l /home/ubuntu/schemas/gschemas.compiled /usr/share/glib-2.0/schemas/gschemas.compiled | gawk '{printf "%08X %02X %02X\n", $1, strtonum(0$2), strtonum(0$3)}'
376F E3 D0
3771 A4 DB

# glib-compile-schemas /usr/share/glib-2.0/schemas
# cmp -l /home/ubuntu/schemas/gschemas.compiled /usr/share/glib-2.0/schemas/gschemas.compiled | gawk '{printf "%08X %02X %02X\n", $1, strtonum(0$2), strtonum(0$3)}'
376F E3 C3
3771 A4 98

# glib-compile-schemas /usr/share/glib-2.0/schemas
# cmp -l /home/ubuntu/schemas/gschemas.compiled /usr/share/glib-2.0/schemas/gschemas.compiled | gawk '{printf "%08X %02X %02X\n", $1, strtonum(0$2), strtonum(0$3)}'
376F E3 68
3771 A4 30
3772 55 56

The bytes on the left are from a corrupted gschemas.compiled provided by an affected user. The changing bytes on the right are non-deterministic.

I ran valgrind over glib-compile-schemas, and found that we are writing uninitialised memory to the file.

https://paste.ubuntu.com/p/hvZccwdzxz/

What is happening is that a submodule of glib, gvdb, contains the logic for serialising the gschema data structures, and when it allocates a buffer to store the eventual gschemas.compiled file, it does not initialise it.

When we populate the fields in the buffer, some bytes are never overwritten, and these junk bytes find themselves written to gschemas.compiled.

On boot, when gdm and gnome-shell attempt to parse and load this corrupted gschemas.compiled file, glib can't parse the junk bytes and raises an error, which propagates up to a breakpoint in glib logging. No debugger is present, so the kernel traps the breakpoint and terminates the library and the calling application, e.g. gdm.

The result is that the user is left staring at a black screen with a blinking pointer.
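The failure mode described above can be illustrated with a toy model. This is not gvdb's code: Python has no uninitialised memory, so the sketch fakes it by pre-filling the buffer with random bytes. The serialise function, the FIELDS layout and the 24-byte size are all invented for the illustration.

```python
import hashlib
import os


def serialise(fields, size, zero_fill):
    """Write (offset, payload) fields into a buffer of `size` bytes.

    zero_fill=False stands in for an uninitialised allocation: the buffer
    starts out holding random junk. zero_fill=True models a fix that
    zeroes the buffer before any field is written.
    """
    buf = bytearray(size) if zero_fill else bytearray(os.urandom(size))
    for offset, payload in fields:
        buf[offset:offset + len(payload)] = payload
    return bytes(buf)


# Hypothetical layout: bytes 8-11 and 18-23 are padding no field ever writes.
FIELDS = [(0, b"GVariant"), (12, b"logout")]


def digest(data):
    """sha256 of the serialised buffer, as a hex string."""
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    for zero_fill in (False, True):
        runs = {digest(serialise(FIELDS, 24, zero_fill)) for _ in range(3)}
        print("zero_fill =", zero_fill, "distinct digests:", len(runs))
```

In this model the written fields are identical on every run; only the padding bytes vary, which is the same pattern as a handful of differing offsets in an otherwise identical gschemas.compiled.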
[Testcase]

On a Focal system, simply run valgrind over glib-compile-schemas:

# valgrind glib-compile-schemas /usr/share/glib-2.0/schemas

You will get output like this, with the warning "Syscall param write(buf) points to uninitialised byte(s)":

https://paste.ubuntu.com/p/hvZccwdzxz/

If you happen to have a large number of gschema overrides present on your system, like my affected user does, you can save a copy of a generated gschemas.compiled to your home directory and bindiff it against recompiles:

# glib-compile-schemas /usr/share/glib-2.0/schemas
# cp /usr/share/glib-2.0/schemas/gschemas.compiled /home/ubuntu/schemas/gschemas.compiled
# glib-compile-schemas /usr/share/glib-2.0/schemas
# cmp -l /home/ubuntu/schemas/gschemas.compiled /usr/share/glib-2.0/schemas/gschemas.compiled | gawk '{printf "%08X %02X %02X\n", $1, strtonum(0$2), strtonum(0$3)}'
376F E3 C3
3771 A4 98

If you install the test package from the following ppa:

https://launchpad.net/~mruffell/+archive/ubuntu/sf311791-test

then when you run valgrind, it will report a clean run with no writes of uninitialised buffers, and all invocations of glib-compile-schemas will be deterministic, generating the same file with the same sha256 hash every time. The unwritten bytes if you do a
[Touch-packages] [Bug 1930359] Re: gdm fails to start in a VMware Horizon VDI environment with latest mutter 3.36.9-0ubuntu0.20.04.1 in focal-updates
** No longer affects: mutter (Ubuntu)

** No longer affects: mutter (Ubuntu Focal)

** Changed in: glib2.0 (Ubuntu)
       Status: New => Fix Released

** Changed in: glib2.0 (Ubuntu Focal)
       Status: New => In Progress

** Changed in: glib2.0 (Ubuntu Focal)
   Importance: Undecided => High

** Summary changed:

- gdm fails to start in a VMware Horizon VDI environment with latest mutter 3.36.9-0ubuntu0.20.04.1 in focal-updates
+ glib2.0: Uninitialised memory is written to gschema.compiled, failure to parse this file leads to gdm, gnome-shell failing to start

** Description changed:

  [Impact]

- gdm fails to start in a VMware Horizon VDI environment, with Nvidia GRID
- gpus passed into the VDIs.
+ A recent SRU of mutter 3.36.9-0ubuntu0.20.04.1 caused an outage for a
+ user with 300 VDIs running Focal, where GNOME applications would fail to
+ start, and if you reboot, gdm and gnome-shell both fail to start, and
+ you are left with a black screen and a blinking cursor.

- Downgrading mutter from 3.36.9-0ubuntu0.20.04.1 to 3.36.1-3ubuntu3 in
- -release fixes the issue, and the issue does not occur with
- 3.36.7+git20201123-0.20.04.1.
+ After much investigation, mutter was not at fault. Instead,
+ mutter-common calls the libglib2.0-0 hook on upgrade:

- Currently looking into what landed in bug 1919143 and bug 1905825.
+ Processing triggers for libglib2.0-0:amd64 (2.64.6-1~ubuntu20.04.3) ...
+
+ This in turn calls glib-compile-schemas to recompile the gsettings
+ gschema cache, from the files in /usr/share/glib-2.0/schemas/. The
+ result is a binary gschemas.compiled file, which is loaded by libglib2.0
+ on every invocation of a GNOME application, or gdm or gnome-shell to
+ fetch application default settings.
+
+ Now, glib2.0 2.64.6-1~ubuntu20.04.3 in Focal has some non-deterministic
+ behaviour when calling glib-compile-schemas, causing generated
+ gschemas.compiled files to have differing contents on each run:
+
+ # glib-compile-schemas /usr/share/glib-2.0/schemas
+ # cmp -l /home/ubuntu/schemas/gschemas.compiled /usr/share/glib-2.0/schemas/gschemas.compiled | gawk '{printf "%08X %02X %02X\n", $1, strtonum(0$2), strtonum(0$3)}'
+ 376F E3 D0
+ 3771 A4 DB
+
+ # glib-compile-schemas /usr/share/glib-2.0/schemas
+ # cmp -l /home/ubuntu/schemas/gschemas.compiled /usr/share/glib-2.0/schemas/gschemas.compiled | gawk '{printf "%08X %02X %02X\n", $1, strtonum(0$2), strtonum(0$3)}'
+ 376F E3 C3
+ 3771 A4 98
+
+ # glib-compile-schemas /usr/share/glib-2.0/schemas
+ # cmp -l /home/ubuntu/schemas/gschemas.compiled /usr/share/glib-2.0/schemas/gschemas.compiled | gawk '{printf "%08X %02X %02X\n", $1, strtonum(0$2), strtonum(0$3)}'
+ 376F E3 68
+ 3771 A4 30
+ 3772 55 56
+
+ The bytes on the left are from a corrupted gschemas.compiled provided by
+ an affected user. The changing bytes on the right are non-deterministic.
+
+ I ran valgrind over glib-compile-schemas, and found that we are writing
+ to uninitialised memory.
+
+ https://paste.ubuntu.com/p/hvZccwdzxz/
+
+ What is happening is that a submodule of glib, gvdb, contains the logic
+ for serialising the gschema data structures, and when it allocates a
+ buffer to store the eventual gschemas.compiled file, it does not
+ initialise it.
+
+ When we populate the fields in the buffer, some bytes are never
+ overwritten, and these junk bytes find themselves written to
+ gschemas.compiled.
+
+ On boot, when gdm and gnome-shell attempt to parse and load this
+ corrupted gschemas.compiled file, it can't parse the junk bytes, and
+ raises an error, which propagates up to a breakpoint in glib logging,
+ but no debugger is present, so the kernel traps the breakpoint, and
+ terminates the library, and the calling application, e.g. gdm.
+
+ The result is that the user is left staring at a black screen with a
+ blinking pointer.

  [Testcase]

+ On a Focal system, simply run valgrind over glib-compile-schemas:
+
+ # valgrind glib-compile-schemas /usr/share/glib-2.0/schemas
+
+ You will get output like this, with the warning "Syscall param
+ write(buf) points to uninitialised byte(s)":
+
+ https://paste.ubuntu.com/p/hvZccwdzxz/
+
+ If you happen to have a large amount of gschema overrides present on
+ your system, like my affected user does, you can save a copy of a
+ generated gschemas.compiled to your home directory and bindiff it against
+ recompiles:
+
+ # glib-compile-schemas /usr/share/glib-2.0/schemas
+ # cp /usr/share/glib-2.0/schemas/gschemas.compiled /home/ubuntu/schemas/gschemas.compiled
+ # glib-compile-schemas /usr/share/glib-2.0/schemas
+ # cmp -l /home/ubuntu/schemas/gschemas.compiled /usr/share/glib-2.0/schemas/gschemas.compiled | gawk '{printf "%08X %02X %02X\n", $1, strtonum(0$2), strtonum(0$3)}'
+ 376F E3 C3
+ 3771 A4 98
+
+ If you install the test package from the following ppa:
+
+ https://launchpad.net/~mruffell/+archive/ubuntu/sf311791-test
+
+ When you run valgrind, it will
[Touch-packages] [Bug 1927796] Re: [SRU]pam_tally2 can cause accounts to be locked by correct password. pam_faillock use is the recommended fix
Performing verification for Groovy

I enabled -proposed and installed libpam-modules libpam-modules-bin libpam-runtime libpam0g version 1.3.1-5ubuntu6.20.10.1

From there, I set the pam_faillock configuration in:

/etc/security/faillock.conf:

deny = 3
unlock_time = 120

and also:

/etc/pam.d/common-auth:

# here are the per-package modules (the "Primary" block)
auth    requisite                       pam_faillock.so preauth
auth    [success=1 default=ignore]      pam_unix.so nullok_secure
auth    [default=die]                   pam_faillock.so authfail
auth    sufficient                      pam_faillock.so authsucc
# here's the fallback if no module succeeds
auth    requisite                       pam_deny.so
# prime the stack with a positive return value if there isn't one already;
# this avoids us returning an error just because nothing sets a success code
# since the modules above will each just jump around
auth    required                        pam_permit.so
# and here are more per-package modules (the "Additional" block)
auth    optional                        pam_cap.so
# end of pam-auth-update config

From there, I created a new user "dave", and rebooted the system.

I connected via ssh with the "dave" user and used the wrong password 5 times. I then tried with the correct password and found the account to be locked. I waited 2 minutes, and tried again with the correct password, and I was logged in.

When the account was locked, I logged in as the "ubuntu" user and ran:

$ sudo faillock --user dave
dave:
When                Type  Source         Valid
2021-05-19 02:08:53 RHOST 192.168.122.1  V
2021-05-19 02:08:58 RHOST 192.168.122.1  V
2021-05-19 02:09:02 RHOST 192.168.122.1  V

And I could see the times that "dave" was locked.

I also tested resetting via:

$ sudo faillock --user dave --reset

and "dave" was allowed to log in again.

My tests agree with what Richard sees. Marking as verified for Groovy.

** Tags removed: verification-needed verification-needed-groovy
** Tags added: verification-done verification-done-groovy

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to pam in Ubuntu.
https://bugs.launchpad.net/bugs/1927796

Title:
  [SRU]pam_tally2 can cause accounts to be locked by correct password. pam_faillock use is the recommended fix

Status in pam package in Ubuntu:
  Fix Committed
Status in pam source package in Bionic:
  Fix Committed
Status in pam source package in Focal:
  Fix Committed
Status in pam source package in Groovy:
  Fix Committed
Status in pam source package in Hirsute:
  Fix Committed
Status in pam source package in Impish:
  Fix Committed

Bug description:

[IMPACT]

There is a known issue in pam_tally2 which may cause an account to be locked even when the correct password is used, in a busy node environment where simultaneous logins take place (https://github.com/linux-pam/linux-pam/issues/71).

There are already two customer cases from Canonical clients complaining about this behavior (00297697 and 00303806).

Also, potentially, this will cause further problems in the future, since both STIG benchmarks and CIS benchmarks rely on pam_tally2 to lock accounts when wrong passwords are used. And both benchmarks, but especially STIG, require the use of a lot of audit rules, which can lead to the busy node environment.

The issue impacts all pam_tally2 versions distributed in all currently supported Ubuntu versions and also the next unreleased one.

Note that, according to https://github.com/linux-pam/linux-pam/issues/71, there is no plan to fix this issue!

[FIX]

This fix proposes to add the pam_faillock module to the PAM package, so users of pam_tally2 having issues can migrate to pam_faillock. We also plan to modify the current STIG benchmarks to rely on pam_faillock instead of pam_tally2, but in order to do so, we need the pam_faillock module to be available.

Note that we don't propose to remove pam_tally2, since not every user of this module is affected.

[TEST]

Tested on a VM installed with the Focal server iso and on another with the Bionic server iso. Enabled the pam_faillock module as recommended by its man page. Then tried to log in over ssh with an incorrect password, until the account got locked. Waited for the configured grace time to unlock and logged in using the correct password.

Note that, since the pam_tally2 issue is caused by a race condition in a hard-to-recreate environment (we could not even reproduce it with pam_tally2), we could not reproduce the conditions to test pam_faillock with.

[REGRESSION POTENTIAL]

The regression potential for this is small, since we're not removing the old pam_tally2 module, just adding another one. So anyone still using pam_tally2 will be able to do so.
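As a rough illustration of the lockout policy exercised in the verification (deny = 3, unlock_time = 120), here is a toy model in Python. It is not pam_faillock's implementation: the FailLock class and its methods are invented for this sketch, and it assumes that recorded failures are cleared only by a successful login or an explicit reset, and that a locked account unlocks unlock_time seconds after the last recorded failure. The real module has further options that are not modelled here.

```python
class FailLock:
    """Toy deny/unlock_time lockout model (hypothetical, for illustration)."""

    def __init__(self, deny=3, unlock_time=120):
        self.deny = deny
        self.unlock_time = unlock_time
        self.failures = []  # timestamps, in seconds, of recorded failures

    def is_locked(self, now):
        """Locked once `deny` failures exist and the last one is recent."""
        if len(self.failures) < self.deny:
            return False
        return now - self.failures[-1] < self.unlock_time

    def attempt(self, password_ok, now):
        """Return True if the login is allowed at time `now`."""
        if self.is_locked(now):
            return False  # refused up front; nothing new is recorded
        if password_ok:
            self.failures.clear()  # success wipes the failure records
            return True
        self.failures.append(now)  # record the failed attempt
        return False

    def reset(self):
        """Model of an administrator clearing the records for a user."""
        self.failures.clear()
```

Under these assumptions, five wrong passwords in a row leave three recorded failures (the last two attempts are refused while locked), a correct password is rejected during the lock, and it is accepted again once 120 seconds have passed since the last recorded failure or after reset() is called.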
[Touch-packages] [Bug 1927796] Re: [SRU]pam_tally2 can cause accounts to be locked by correct password. pam_faillock use is the recommended fix
Performing verification for Hirsute

I enabled -proposed and installed libpam-modules libpam-modules-bin libpam-runtime libpam0g version 1.3.1-5ubuntu6.21.04.1

From there, I set the pam_faillock configuration in:

/etc/security/faillock.conf:

deny = 3
unlock_time = 120

and also:

/etc/pam.d/common-auth:

# here are the per-package modules (the "Primary" block)
auth    requisite                       pam_faillock.so preauth
auth    [success=1 default=ignore]      pam_unix.so nullok_secure
auth    [default=die]                   pam_faillock.so authfail
auth    sufficient                      pam_faillock.so authsucc
# here's the fallback if no module succeeds
auth    requisite                       pam_deny.so
# prime the stack with a positive return value if there isn't one already;
# this avoids us returning an error just because nothing sets a success code
# since the modules above will each just jump around
auth    required                        pam_permit.so
# and here are more per-package modules (the "Additional" block)
auth    optional                        pam_cap.so
# end of pam-auth-update config

From there, I created a new user "dave", and rebooted the system.

I connected via ssh with the "dave" user and used the wrong password 5 times. I then tried with the correct password and found the account to be locked. I waited 2 minutes, and tried again with the correct password, and I was logged in.

When the account was locked, I logged in as the "ubuntu" user and ran:

$ sudo faillock --user dave
dave:
When                Type  Source         Valid
2021-05-19 01:50:25 RHOST 192.168.122.1  V
2021-05-19 01:50:28 RHOST 192.168.122.1  V
2021-05-19 01:50:31 RHOST 192.168.122.1  V

And I could see the times that "dave" was locked.

I also tested resetting via:

$ sudo faillock --user dave --reset

and "dave" was allowed to log in again.

My tests agree with what Richard sees. Marking as verified for Hirsute.

** Tags removed: verification-needed-hirsute
** Tags added: verification-done-hirsute

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to pam in Ubuntu.
[Touch-packages] [Bug 1927796] Re: [SRU]pam_tally2 can cause accounts to be locked by correct password. pam_faillock use is the recommended fix
Performing verification for Focal

I enabled -proposed and installed libpam-modules libpam-modules-bin libpam-runtime libpam0g version 1.3.1-5ubuntu4.2

From there, I set the pam_faillock configuration in:

/etc/security/faillock.conf:

deny = 3
unlock_time = 120

and also:

/etc/pam.d/common-auth:

# here are the per-package modules (the "Primary" block)
auth    requisite                       pam_faillock.so preauth
auth    [success=1 default=ignore]      pam_unix.so nullok_secure
auth    [default=die]                   pam_faillock.so authfail
auth    sufficient                      pam_faillock.so authsucc
# here's the fallback if no module succeeds
auth    requisite                       pam_deny.so
# prime the stack with a positive return value if there isn't one already;
# this avoids us returning an error just because nothing sets a success code
# since the modules above will each just jump around
auth    required                        pam_permit.so
# and here are more per-package modules (the "Additional" block)
auth    optional                        pam_cap.so
# end of pam-auth-update config

From there, I created a new user "dave", and rebooted the system.

I connected via ssh with the "dave" user and used the wrong password 5 times. I then tried with the correct password and found the account to be locked. I waited 2 minutes, and tried again with the correct password, and I was logged in.

When the account was locked, I logged in as the "ubuntu" user and ran:

$ sudo faillock --user dave
dave:
When                Type  Source         Valid
2021-05-19 00:31:08 RHOST 192.168.122.1  V
2021-05-19 00:31:13 RHOST 192.168.122.1  V
2021-05-19 00:31:17 RHOST 192.168.122.1  V

And I could see the times that "dave" was locked.

I also tested resetting via:

$ sudo faillock --user dave --reset

and "dave" was allowed to log in again.

My tests agree with what Richard sees. Marking as verified for Focal.

** Tags removed: verification-needed-focal
** Tags added: verification-done-focal

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to pam in Ubuntu.
[Touch-packages] [Bug 1927796] Re: [SRU]pam_tally2 can cause accounts to be locked by correct password. pam_faillock use is the recommended fix
Performing verification for Bionic

I enabled -proposed and installed libpam-modules libpam-modules-bin libpam-runtime libpam0g version 1.1.8-3.6ubuntu2.18.04.3

From there, I set the pam_faillock configuration in:

/etc/security/faillock.conf:

deny = 3
unlock_time = 120

and also:

/etc/pam.d/common-auth:

# here are the per-package modules (the "Primary" block)
auth    requisite                       pam_faillock.so preauth
auth    [success=1 default=ignore]      pam_unix.so nullok_secure
auth    [default=die]                   pam_faillock.so authfail
auth    sufficient                      pam_faillock.so authsucc
# here's the fallback if no module succeeds
auth    requisite                       pam_deny.so
# prime the stack with a positive return value if there isn't one already;
# this avoids us returning an error just because nothing sets a success code
# since the modules above will each just jump around
auth    required                        pam_permit.so
# and here are more per-package modules (the "Additional" block)
auth    optional                        pam_cap.so
# end of pam-auth-update config

From there, I created a new user "dave", and rebooted the system.

I connected via ssh with the "dave" user and used the wrong password 5 times. I then tried with the correct password and found the account to be locked. I waited 2 minutes, and tried again with the correct password, and I was logged in.

When the account was locked, I logged in as the "ubuntu" user and ran:

$ sudo faillock --user dave
dave:
When                Type  Source         Valid
2021-05-19 00:57:10 RHOST 192.168.122.1  V
2021-05-19 00:57:12 RHOST 192.168.122.1  V
2021-05-19 00:57:16 RHOST 192.168.122.1  V

And I could see the times that "dave" was locked.

I also tested resetting via:

$ sudo faillock --user dave --reset

and "dave" was allowed to log in again.

My tests agree with what Richard sees. Marking as verified for Bionic.

** Tags removed: verification-needed-bionic
** Tags added: sts verification-done-bionic

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to pam in Ubuntu.
[Touch-packages] [Bug 1926254] Re: x509 Certificate verification fails when basicConstraints=CA:FALSE, pathlen:0 on self-signed leaf certs
Performing verification for Groovy.

I went and generated the ssl certificates and attempted to verify them with the openssl version 1.1.1f-1ubuntu4.3 from -updates.

ubuntu@deep-mako:~$ sudo apt-cache policy openssl | grep Installed
  Installed: 1.1.1f-1ubuntu4.3
ubuntu@deep-mako:~$ mkdir reproducer
ubuntu@deep-mako:~$ cd reproducer
ubuntu@deep-mako:~/reproducer$ mkdir CA
ubuntu@deep-mako:~/reproducer$ cat << EOF >> rootCA.cnf
> [ req ]
> prompt = no
> distinguished_name = req_distinguished_name
> x509_extensions = usr_cert
>
> [ req_distinguished_name ]
> C = DE
> O = Test Org
> CN = Test RSA PSS Root-CA
>
> [ usr_cert ]
> basicConstraints = critical,CA:TRUE
> keyUsage = critical,keyCertSign,cRLSign
> subjectKeyIdentifier = hash
> authorityKeyIdentifier = keyid:always
> EOF
ubuntu@deep-mako:~/reproducer$ cat << EOF >> subCA.cnf
> [ req ]
> prompt = no
> distinguished_name = req_distinguished_name
> x509_extensions = usr_cert
>
> [ req_distinguished_name ]
> C = DE
> O = Test Org
> CN = Test RSA PSS Sub-CA
>
> [ usr_cert ]
> basicConstraints = critical,CA:TRUE,pathlen:0
> keyUsage = critical,keyCertSign,cRLSign
> subjectKeyIdentifier = hash
> authorityKeyIdentifier = keyid:always
> EOF
ubuntu@deep-mako:~/reproducer$ cat << EOF >> user.cnf
> [ req ]
> prompt = no
> distinguished_name = req_distinguished_name
> x509_extensions = usr_cert
>
> [ req_distinguished_name ]
> C = DE
> O = Test Org
> CN = Test User
>
> [ usr_cert ]
> basicConstraints = critical,CA:FALSE,pathlen:0
> keyUsage = critical,digitalSignature,keyAgreement
> extendedKeyUsage = clientAuth,serverAuth
> subjectKeyIdentifier = hash
> authorityKeyIdentifier = keyid:always
> EOF
ubuntu@deep-mako:~/reproducer$ openssl genpkey -algorithm RSA-PSS -out rootCA_key.pem -pkeyopt rsa_keygen_bits:2048
+ +
ubuntu@deep-mako:~/reproducer$ openssl req -config rootCA.cnf -set_serial 01 -new -batch -sha256 -nodes -x509 -days 9125 -out CA/rootCA_cert.pem -key rootCA_key.pem -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
ubuntu@deep-mako:~/reproducer$ openssl genpkey -algorithm RSA-PSS -out subCA_key.pem -pkeyopt rsa_keygen_bits:2048
..+ .+
ubuntu@deep-mako:~/reproducer$ openssl req -config subCA.cnf -new -out subCA_req.pem -key subCA_key.pem -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
ubuntu@deep-mako:~/reproducer$ openssl x509 -req -sha256 -in subCA_req.pem -CA CA/rootCA_cert.pem -CAkey rootCA_key.pem -out CA/subCA_cert.pem -CAserial rootCA_serial.txt -CAcreateserial -extfile subCA.cnf -extensions usr_cert -days 4380 -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
Signature ok
subject=C = DE, O = Test Org, CN = Test RSA PSS Sub-CA
Getting CA Private Key
ubuntu@deep-mako:~/reproducer$ c_rehash CA
Doing CA
ubuntu@deep-mako:~/reproducer$ openssl genpkey -algorithm RSA-PSS -out user1_key.pem -pkeyopt rsa_keygen_bits:2048
...+ .+
ubuntu@deep-mako:~/reproducer$ openssl req -config user.cnf -new -out user1_req.pem -key user1_key.pem -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
ubuntu@deep-mako:~/reproducer$ openssl x509 -req -sha256 -in user1_req.pem -CA CA/subCA_cert.pem -CAkey subCA_key.pem -out user1_cert.pem -CAserial subCA_serial.txt -CAcreateserial -extfile user.cnf -extensions usr_cert -days 1825 -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
Signature ok
subject=C = DE, O = Test Org, CN = Test User
Getting CA Private Key

Now going and verifying the certificates:

ubuntu@deep-mako:~/reproducer$ openssl verify -CAfile CA/rootCA_cert.pem -untrusted CA/subCA_cert.pem user1_cert.pem
C = DE, O = Test Org, CN = Test User
error 20 at 0 depth lookup: unable to get local issuer certificate
error user1_cert.pem: verification failed

We see verification failed, again on CA:FALSE,pathlen:0 basicConstraints.

Now if we enable -proposed and install openssl 1.1.1f-1ubuntu4.4:

$ openssl verify -CAfile CA/rootCA_cert.pem -untrusted CA/subCA_cert.pem user1_cert.pem
user1_cert.pem: OK

The certificate verifies properly. The problem is fixed.
Additionally, if we examine the new unit tests added to openssl's testsuite in the buildlog for Groovy:

https://launchpadlibrarian.net/537503607/buildlog_ubuntu-groovy-amd64.openssl_1.1.1f-1ubuntu4.4_BUILDING.txt.gz

We see:

../../util/shlib_wrap.sh ../../apps/openssl verify -auth_level 1 -purpose sslserver -trusted ../../../test/certs/root-cert.pem -untrusted ../../../test/certs/ca-cert.pem ../../../test/certs/ee-pathlen.pem => 0
ok 84 - accept non-ca with pathlen:0 by default
CN = server.example
error 41 at 0 depth lookup: invalid or inconsistent certificate extension
error ../../../test/certs/ee-pathlen.pem: verification failed
../../util/shlib_wrap.sh ../../apps/openssl verify -auth_level 1 -purpose sslserver -x509_strict -trusted
[Touch-packages] [Bug 1926254] Re: x509 Certificate verification fails when basicConstraints=CA:FALSE, pathlen:0 on self-signed leaf certs
Performing verification for Focal.

Generating the ssl certificates, and reproducing the problem with version 1.1.1f-1ubuntu2.3 from -updates.

ubuntu@select-lobster:~$ sudo apt-cache policy openssl | grep Installed
  Installed: 1.1.1f-1ubuntu2.3
ubuntu@select-lobster:~$ mkdir reproducer
ubuntu@select-lobster:~$ cd reproducer
ubuntu@select-lobster:~/reproducer$ mkdir CA
ubuntu@select-lobster:~/reproducer$ cat << EOF >> rootCA.cnf
> [ req ]
> prompt = no
> distinguished_name = req_distinguished_name
> x509_extensions = usr_cert
>
> [ req_distinguished_name ]
> C = DE
> O = Test Org
> CN = Test RSA PSS Root-CA
>
> [ usr_cert ]
> basicConstraints = critical,CA:TRUE
> keyUsage = critical,keyCertSign,cRLSign
> subjectKeyIdentifier = hash
> authorityKeyIdentifier = keyid:always
> EOF
ubuntu@select-lobster:~/reproducer$ cat << EOF >> subCA.cnf
> [ req ]
> prompt = no
> distinguished_name = req_distinguished_name
> x509_extensions = usr_cert
>
> [ req_distinguished_name ]
> C = DE
> O = Test Org
> CN = Test RSA PSS Sub-CA
>
> [ usr_cert ]
> basicConstraints = critical,CA:TRUE,pathlen:0
> keyUsage = critical,keyCertSign,cRLSign
> subjectKeyIdentifier = hash
> authorityKeyIdentifier = keyid:always
> EOF
ubuntu@select-lobster:~/reproducer$ cat << EOF >> user.cnf
> [ req ]
> prompt = no
> distinguished_name = req_distinguished_name
> x509_extensions = usr_cert
>
> [ req_distinguished_name ]
> C = DE
> O = Test Org
> CN = Test User
>
> [ usr_cert ]
> basicConstraints = critical,CA:FALSE,pathlen:0
> keyUsage = critical,digitalSignature,keyAgreement
> extendedKeyUsage = clientAuth,serverAuth
> subjectKeyIdentifier = hash
> authorityKeyIdentifier = keyid:always
> EOF
ubuntu@select-lobster:~/reproducer$ openssl genpkey -algorithm RSA-PSS -out rootCA_key.pem -pkeyopt rsa_keygen_bits:2048
..+ +
ubuntu@select-lobster:~/reproducer$ openssl req -config rootCA.cnf -set_serial 01 -new -batch -sha256 -nodes -x509 -days 9125 -out CA/rootCA_cert.pem -key rootCA_key.pem -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
ubuntu@select-lobster:~/reproducer$ openssl genpkey -algorithm RSA-PSS -out subCA_key.pem -pkeyopt rsa_keygen_bits:2048
+ +
ubuntu@select-lobster:~/reproducer$ openssl req -config subCA.cnf -new -out subCA_req.pem -key subCA_key.pem -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
ubuntu@select-lobster:~/reproducer$ openssl x509 -req -sha256 -in subCA_req.pem -CA CA/rootCA_cert.pem -CAkey rootCA_key.pem -out CA/subCA_cert.pem -CAserial rootCA_serial.txt -CAcreateserial -extfile subCA.cnf -extensions usr_cert -days 4380 -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
Signature ok
subject=C = DE, O = Test Org, CN = Test RSA PSS Sub-CA
Getting CA Private Key
ubuntu@select-lobster:~/reproducer$ c_rehash CA
Doing CA
ubuntu@select-lobster:~/reproducer$ openssl genpkey -algorithm RSA-PSS -out user1_key.pem -pkeyopt rsa_keygen_bits:2048
...+ .+
ubuntu@select-lobster:~/reproducer$ openssl req -config user.cnf -new -out user1_req.pem -key user1_key.pem -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
ubuntu@select-lobster:~/reproducer$ openssl x509 -req -sha256 -in user1_req.pem -CA CA/subCA_cert.pem -CAkey subCA_key.pem -out user1_cert.pem -CAserial subCA_serial.txt -CAcreateserial -extfile user.cnf -extensions usr_cert -days 1825 -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
Signature ok
subject=C = DE, O = Test Org, CN = Test User
Getting CA Private Key

Now, we verify the certificates:

ubuntu@select-lobster:~/reproducer$ openssl verify -CAfile CA/rootCA_cert.pem -untrusted CA/subCA_cert.pem user1_cert.pem
C = DE, O = Test Org, CN = Test User
error 20 at 0 depth lookup: unable to get local issuer certificate
error user1_cert.pem: verification failed

We see verification fail, due to CA:FALSE,pathlen:0 basicConstraints.
I then enabled -proposed, and installed openssl and libssl1.1 version 1.1.1f-1ubuntu2.4.

If we then repeat the certificate validation:

ubuntu@select-lobster:~/reproducer$ openssl verify -CAfile CA/rootCA_cert.pem -untrusted CA/subCA_cert.pem user1_cert.pem
user1_cert.pem: OK

The certificates validate properly.

Additionally, if we examine the new unit tests added to openssl's testsuite in the buildlog for focal:

https://launchpadlibrarian.net/537505620/buildlog_ubuntu-focal-amd64.openssl_1.1.1f-1ubuntu2.4_BUILDING.txt.gz

we see:

../../../test/certs/ee-pathlen.pem: OK
../../util/shlib_wrap.sh ../../apps/openssl verify -auth_level 1 -purpose sslserver -trusted ../../../test/certs/root-cert.pem -untrusted ../../../test/certs/ca-cert.pem ../../../test/certs/ee-pathlen.pem => 0
ok 84 - accept non-ca with pathlen:0 by default
CN = server.example
error 41 at 0 depth lookup: invalid or inconsistent certificate extension
error ../../../test/certs/ee-pathlen.pem: verification failed
../../util/shlib_wrap.sh ../../apps/openssl verify -auth_level
[Touch-packages] [Bug 1927161] Re: dpkg-source: error: diff 'openssl/debian/patches/pr12272.patch' patches files multiple times; split the diff in multiple files or merge the hunks into a single one
** Description changed:

  [impact]

  openssl doesn't build source properly because of a badly-constructed patch

  [test case]

  $ pull-lp-source openssl groovy
  ...
  $ cd openssl-1.1.1f/
  $ quilt pop -a
  ...
  $ dpkg-buildpackage -d -S
  dpkg-buildpackage: info: source package openssl
  dpkg-buildpackage: info: source version 1.1.1f-1ubuntu4.3
  dpkg-buildpackage: info: source distribution groovy-security
  dpkg-buildpackage: info: source changed by Marc Deslauriers
   dpkg-source --before-build .
  dpkg-source: warning: can't parse dependency perl:native
  dpkg-source: error: diff 'openssl-1.1.1f/debian/patches/pr12272.patch' patches files multiple times; split the diff in multiple files or merge the hunks into a single one
  dpkg-buildpackage: error: dpkg-source --before-build . subprocess returned exit status 25

+ Test builds are available in the following ppa:
+
+ https://launchpad.net/~mruffell/+archive/ubuntu/lp1927161-test
+
  [regression potential]

  any regression would likely cause a failed build or would affect the functionality that patch pr12272 was added for, which is adding support for Intel CET

  [scope]

  this is needed only for g and later

  this is caused by the bad patch 'pr12272.patch' which is only included in g/h/i, so this does not apply to f or earlier

  [other info]

  note that if the patches are applied, this bug is bypassed; i.e. if 'quilt pop -a' is removed from the test case above, the bug doesn't reproduce. this is only a problem when the patches aren't already applied and dpkg-buildpackage needs to call dpkg-source to apply the patches.

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to openssl in Ubuntu.
https://bugs.launchpad.net/bugs/1927161

Title:
  dpkg-source: error: diff 'openssl/debian/patches/pr12272.patch' patches files multiple times; split the diff in multiple files or merge the hunks into a single one

Status in openssl package in Ubuntu:
  In Progress
Status in openssl source package in Groovy:
  In Progress
Status in openssl source package in Hirsute:
  In Progress
Status in openssl source package in Impish:
  In Progress

Bug description:
  [impact]

  openssl doesn't build source properly because of a badly-constructed patch

  [test case]

  $ pull-lp-source openssl groovy
  ...
  $ cd openssl-1.1.1f/
  $ quilt pop -a
  ...
  $ dpkg-buildpackage -d -S
  dpkg-buildpackage: info: source package openssl
  dpkg-buildpackage: info: source version 1.1.1f-1ubuntu4.3
  dpkg-buildpackage: info: source distribution groovy-security
  dpkg-buildpackage: info: source changed by Marc Deslauriers
   dpkg-source --before-build .
  dpkg-source: warning: can't parse dependency perl:native
  dpkg-source: error: diff 'openssl-1.1.1f/debian/patches/pr12272.patch' patches files multiple times; split the diff in multiple files or merge the hunks into a single one
  dpkg-buildpackage: error: dpkg-source --before-build . subprocess returned exit status 25

  Test builds are available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/lp1927161-test

  [regression potential]

  any regression would likely cause a failed build or would affect the functionality that patch pr12272 was added for, which is adding support for Intel CET

  [scope]

  this is needed only for g and later

  this is caused by the bad patch 'pr12272.patch' which is only included in g/h/i, so this does not apply to f or earlier

  [other info]

  note that if the patches are applied, this bug is bypassed; i.e. if 'quilt pop -a' is removed from the test case above, the bug doesn't reproduce. this is only a problem when the patches aren't already applied and dpkg-buildpackage needs to call dpkg-source to apply the patches.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1927161/+subscriptions

--
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
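The dpkg-source error in the test case can be anticipated before running a build. The following is a minimal sketch, not dpkg-source's own check: it assumes the complaint is triggered when the same target path appears under more than one `+++ ` header within a single diff, and the function name `dup_targets` is made up for illustration.

```shell
#!/bin/sh
# dup_targets PATCHFILE
# Print each target path that appears under more than one '+++ ' header
# in PATCHFILE. Hypothetical helper; it only approximates what
# dpkg-source objects to, and an added line that itself starts with
# '++ ' would be miscounted as a header.
dup_targets() {
    grep '^+++ ' "$1" | awk '{print $2}' | sort | uniq -d
}
```

Run against debian/patches/pr12272.patch, any path printed would be a file the patch touches in more than one place. Empty output suggests the diff is already one section per file.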
[Touch-packages] [Bug 1927161] Re: dpkg-source: error: diff 'openssl/debian/patches/pr12272.patch' patches files multiple times; split the diff in multiple files or merge the hunks into a single one
Attached is a V2 for hirsute which correctly has d/p/ in the debian/changelog.

** Patch added: "debdiff for openssl on hirsute"
   https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1927161/+attachment/5494814/+files/lp1927161_hirsute_v2.debdiff
[Touch-packages] [Bug 1927161] Re: dpkg-source: error: diff 'openssl/debian/patches/pr12272.patch' patches files multiple times; split the diff in multiple files or merge the hunks into a single one
Attached is a V2 for impish which correctly has d/p/ in the debian/changelog.

** Patch added: "debdiff for openssl on impish"
   https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1927161/+attachment/5494813/+files/lp1927161_impish_v2.debdiff

** Patch removed: "debdiff for openssl on hirsute"
   https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1927161/+attachment/5494811/+files/lp1927161_hirsute.debdiff
[Touch-packages] [Bug 1927161] Re: dpkg-source: error: diff 'openssl/debian/patches/pr12272.patch' patches files multiple times; split the diff in multiple files or merge the hunks into a single one
Attached is a debdiff for openssl on groovy, which fixes this issue, and also bug 1926254.

** Patch added: "debdiff for openssl on groovy"
   https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1927161/+attachment/5494812/+files/lp1926254_lp1927161_groovy.debdiff

** Patch removed: "debdiff for openssl on impish"
   https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1927161/+attachment/5494810/+files/lp1927161_impish.debdiff
[Touch-packages] [Bug 1927161] Re: dpkg-source: error: diff 'openssl/debian/patches/pr12272.patch' patches files multiple times; split the diff in multiple files or merge the hunks into a single one
Attached is a debdiff for openssl on hirsute which fixes this problem.

** Patch added: "debdiff for openssl on hirsute"
   https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1927161/+attachment/5494811/+files/lp1927161_hirsute.debdiff
[Touch-packages] [Bug 1927161] Re: dpkg-source: error: diff 'openssl/debian/patches/pr12272.patch' patches files multiple times; split the diff in multiple files or merge the hunks into a single one
Attached is a debdiff for impish which fixes this problem.

** Patch added: "debdiff for openssl on impish"
   https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1927161/+attachment/5494810/+files/lp1927161_impish.debdiff
[Touch-packages] [Bug 1927161] Re: dpkg-source: error: diff 'openssl/debian/patches/pr12272.patch' patches files multiple times; split the diff in multiple files or merge the hunks into a single one
I split 'pr12272.patch' into one file per commit, and I did a diff to ensure that there are no changes to the code:

https://paste.ubuntu.com/p/zDqqXmsM8c/

When using these split up patches "dpkg-buildpackage -d -S" completes successfully.
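One way to double-check that splitting a patch leaves the code unchanged is to apply both variants to copies of the same pristine tree and compare the results. The following is a minimal sketch of that idea, not the method behind the paste above: the helper names `apply_series` and `same_result` are hypothetical, and it assumes GNU patch, `-p1` style paths, and patches that apply cleanly.

```shell
#!/bin/sh
# apply_series PRISTINE DEST PATCH...
# Copy PRISTINE to DEST and apply each PATCH in order with -p1.
apply_series() {
    src=$1; dest=$2; shift 2
    cp -a "$src" "$dest"
    for p in "$@"; do
        patch -d "$dest" -p1 -s < "$p" || return 1
    done
}

# same_result PRISTINE COMBINED SPLIT...
# Succeed only if applying COMBINED alone and applying the SPLIT
# series in order produce byte-identical trees.
same_result() {
    pristine=$1; combined=$2; shift 2
    work=$(mktemp -d)
    rc=0
    {
        apply_series "$pristine" "$work/a" "$combined" &&
        apply_series "$pristine" "$work/b" "$@" &&
        diff -ruN "$work/a" "$work/b"
    } || rc=$?
    rm -rf "$work"
    return $rc
}
```

A call such as `same_result openssl-1.1.1f.orig pr12272.patch pr12272-1.patch pr12272-2.patch` (all names illustrative) would print any differing hunks and return non-zero if the split series diverges from the original patch.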
[Touch-packages] [Bug 1927161] Re: dpkg-source: error: diff 'openssl/debian/patches/pr12272.patch' patches files multiple times; split the diff in multiple files or merge the hunks into a single one
** Changed in: openssl (Ubuntu Groovy)
       Status: New => In Progress

** Changed in: openssl (Ubuntu Hirsute)
       Status: New => In Progress

** Changed in: openssl (Ubuntu Impish)
       Status: New => In Progress

** Changed in: openssl (Ubuntu Groovy)
     Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: openssl (Ubuntu Hirsute)
     Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: openssl (Ubuntu Impish)
     Assignee: (unassigned) => Matthew Ruffell (mruffell)
[Touch-packages] [Bug 1926254] Re: x509 Certificate verification fails when basicConstraints=CA:FALSE, pathlen:0 on self-signed leaf certs
Hi Seth,

Thanks for the review. I read the commit you found:

commit 1e41dadfa7b9f792ed0f4714a3d3d36f070cf30e
Author: Dr. David von Oheimb
Date: Sat Jun 27 16:16:12 2020 +0200
Subject: Extend X509 cert checks and error reporting in v3_{purp,crld}.c and x509_{set,vfy}.c
Link: https://github.com/openssl/openssl/commit/1e41dadfa7b9f792ed0f4714a3d3d36f070cf30e

Firstly, yes, you are right: this commit does refactor the code I am suggesting we SRU to focal and groovy. Upon further inspection, however, this commit was not backported to the 1.1.1 stable series, as it is missing from the OpenSSL_1_1_1-stable branch.

As you mentioned, it is a fairly invasive change that modifies a lot of different x509 components, so it isn't suitable to be backported to 1.1.1 stable anyway, much less acceptable for an SRU to focal or groovy.

I think we should stick to the small targeted commits I suggested for this SRU, since they are a part of 1.1.1 stable, and are already in hirsute onward.

To test that the logic from the suggested commits to SRU matches this new refactor commit from version 3.0alpha, I went and built the master branch of openssl, which had commit d1a770414acd34c774248ce8efbe202fd7a44041 at HEAD.

$ env LD_LIBRARY_PATH="/home/ubuntu/openssl/" ../openssl/apps/openssl version
OpenSSL 3.0.0-alpha16-dev (Library: OpenSSL 3.0.0-alpha16-dev )

$ env LD_LIBRARY_PATH="/home/ubuntu/openssl/" ../openssl/apps/openssl verify -CAfile CA/rootCA_cert.pem -untrusted CA/subCA_cert.pem user1_cert.pem
user1_cert.pem: OK

The logic matches and the reproducer certificates verify OK. This confirms we aren't backporting a short-lived change, and that this behaviour is the desired and accepted outcome.

@ddstreet Please go ahead and sponsor the SRU to -updates, thanks.
https://bugs.launchpad.net/bugs/1926254

Title: x509 Certificate verification fails when basicConstraints=CA:FALSE,pathlen:0 on self-signed leaf certs

Status in openssl package in Ubuntu: Fix Released
Status in openssl source package in Focal: In Progress
Status in openssl source package in Groovy: In Progress
Status in openssl source package in Hirsute: Fix Released

Bug description:

[Impact]

In openssl 1.1.1f, the below commit was merged:

commit ba4356ae4002a04e28642da60c551877eea804f7
Author: Bernd Edlinger
Date: Sat Jan 4 15:54:53 2020 +0100
Subject: Fix error handling in x509v3_cache_extensions and related functions
Link: https://github.com/openssl/openssl/commit/ba4356ae4002a04e28642da60c551877eea804f7

This introduced a regression which caused certificate validation to fail when certificates violate RFC 5280 [1], namely, when a certificate has "basicConstraints=CA:FALSE,pathlen:0". This combination is commonly seen in self-signed leaf certificates with an intermediate CA before the root CA.

Because of this, openssl 1.1.1f rejects these certificates and they cannot be used in the system certificate store, and SSL connections fail when you try to use them to connect to an SSL endpoint.

The error you see when you try to verify is:

$ openssl verify -CAfile CA/rootCA_cert.pem -untrusted CA/subCA_cert.pem user1_cert.pem
error 20 at 0 depth lookup: unable to get local issuer certificate
error user1_cert.pem: verification failed

The exact same certificates work fine on Xenial, Bionic and Hirsute.

[1] https://tools.ietf.org/html/rfc5280.html

[Testcase]

We will create our own root CA, intermediate CA and leaf server certificate.
Create necessary directories: $ mkdir reproducer $ cd reproducer $ mkdir CA Write openssl configuration files to disk for each CA and cert: $ cat << EOF >> rootCA.cnf [ req ] prompt = no distinguished_name = req_distinguished_name x509_extensions = usr_cert [ req_distinguished_name ] C = DE O = Test Org CN = Test RSA PSS Root-CA [ usr_cert ] basicConstraints= critical,CA:TRUE keyUsage= critical,keyCertSign,cRLSign subjectKeyIdentifier= hash authorityKeyIdentifier = keyid:always EOF $ cat << EOF >> subCA.cnf [ req ] prompt = no distinguished_name = req_distinguished_name x509_extensions = usr_cert [ req_distinguished_name ] C = DE O = Test Org CN = Test RSA PSS Sub-CA [ usr_cert ] basicConstraints= critical,CA:TRUE,pathlen:0 keyUsage= critical,keyCertSign,cRLSign subjectKeyIdentifier= hash authorityKeyIdentifier = keyid:always EOF $ cat << EOF >> user.cnf [ req ] prompt = no distinguished_name = req_distinguished_name x509_extensions = usr_cert [ req_distinguished_name ] C = DE O = Test Org CN = Test User [ usr_cert ] basicConstraints= critical,CA:FALSE,pathlen:0 keyUsage=
[Touch-packages] [Bug 1926254] Re: x509 Certificate verification fails when basicConstraints=CA:FALSE, pathlen:0 on self-signed leaf certs
** Tags added: sts-sponsor
[Touch-packages] [Bug 1926254] Re: x509 Certificate verification fails when basicConstraints=CA:FALSE, pathlen:0 on self-signed leaf certs
Attached is a debdiff for openssl on Groovy which fixes this bug.

** Patch added: "Debdiff for openssl on Groovy" https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1926254/+attachment/5493443/+files/lp1926254_groovy.debdiff
[Touch-packages] [Bug 1926254] Re: x509 Certificate verification fails when basicConstraints=CA:FALSE, pathlen:0 on self-signed leaf certs
Attached is a debdiff for openssl on Focal which fixes this bug.

** Patch added: "Debdiff for openssl on focal" https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1926254/+attachment/5493442/+files/lp1926254_focal.debdiff
[Touch-packages] [Bug 1926254] Re: x509 Certificate verification fails when basicConstraints=CA:FALSE, pathlen:0 on self-signed leaf certs
** Description changed: [Impact] In openssl 1.1.1f, the below commit was merged: commit ba4356ae4002a04e28642da60c551877eea804f7 Author: Bernd Edlinger Date: Sat Jan 4 15:54:53 2020 +0100 Subject: Fix error handling in x509v3_cache_extensions and related functions Link: https://github.com/openssl/openssl/commit/ba4356ae4002a04e28642da60c551877eea804f7 This introduced a regression which caused certificate validation to fail when certificates violate RFC 5280 [1], namely, when a certificate has "basicConstraints=CA:FALSE,pathlen:0". This combination is commonly seen by self-signed leaf certificates with an intermediate CA before the root CA. Because of this, openssl 1.1.1f rejects these certificates and they cannot be used in the system certificate store, and ssl connections fail when you try to use them to connect to a ssl endpoint. The error you see when you try verify is: $ openssl verify -CAfile CA/rootCA_cert.pem -untrusted CA/subCA_cert.pem user1_cert.pem error 20 at 0 depth lookup: unable to get local issuer certificate error user1_cert.pem: verification failed The exact same certificates work fine on Xenial, Bionic and Hirsute. [1] https://tools.ietf.org/html/rfc5280.html [Testcase] We will create our own root CA, intermediate CA and leaf server certificate. 
Create necessary directories: $ mkdir reproducer $ cd reproducer $ mkdir CA Write openssl configuration files to disk for each CA and cert: $ cat << EOF >> rootCA.cnf [ req ] prompt = no distinguished_name = req_distinguished_name x509_extensions = usr_cert [ req_distinguished_name ] C = DE O = Test Org CN = Test RSA PSS Root-CA [ usr_cert ] basicConstraints= critical,CA:TRUE keyUsage= critical,keyCertSign,cRLSign subjectKeyIdentifier= hash authorityKeyIdentifier = keyid:always EOF $ cat << EOF >> subCA.cnf [ req ] prompt = no distinguished_name = req_distinguished_name x509_extensions = usr_cert [ req_distinguished_name ] C = DE O = Test Org CN = Test RSA PSS Sub-CA [ usr_cert ] basicConstraints= critical,CA:TRUE,pathlen:0 keyUsage= critical,keyCertSign,cRLSign subjectKeyIdentifier= hash authorityKeyIdentifier = keyid:always EOF $ cat << EOF >> user.cnf [ req ] prompt = no distinguished_name = req_distinguished_name x509_extensions = usr_cert [ req_distinguished_name ] C = DE O = Test Org CN = Test User [ usr_cert ] basicConstraints= critical,CA:FALSE,pathlen:0 keyUsage= critical,digitalSignature,keyAgreement extendedKeyUsage= clientAuth,serverAuth subjectKeyIdentifier= hash authorityKeyIdentifier = keyid:always EOF Then generate the necessary RSA keys and form certificates: $ openssl genpkey -algorithm RSA-PSS -out rootCA_key.pem -pkeyopt rsa_keygen_bits:2048 $ openssl req -config rootCA.cnf -set_serial 01 -new -batch -sha256 -nodes -x509 -days 9125 -out CA/rootCA_cert.pem -key rootCA_key.pem -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1 $ openssl genpkey -algorithm RSA-PSS -out subCA_key.pem -pkeyopt rsa_keygen_bits:2048 $ openssl req -config subCA.cnf -new -out subCA_req.pem -key subCA_key.pem -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1 $ openssl x509 -req -sha256 -in subCA_req.pem -CA CA/rootCA_cert.pem -CAkey rootCA_key.pem -out CA/subCA_cert.pem -CAserial rootCA_serial.txt -CAcreateserial -extfile subCA.cnf -extensions usr_cert -days 
4380 -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
$ c_rehash CA
$ openssl genpkey -algorithm RSA-PSS -out user1_key.pem -pkeyopt rsa_keygen_bits:2048
$ openssl req -config user.cnf -new -out user1_req.pem -key user1_key.pem -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
$ openssl x509 -req -sha256 -in user1_req.pem -CA CA/subCA_cert.pem -CAkey subCA_key.pem -out user1_cert.pem -CAserial subCA_serial.txt -CAcreateserial -extfile user.cnf -extensions usr_cert -days 1825 -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1

Now, let's try to verify the generated certificates:

$ openssl version
OpenSSL 1.1.1f 31 Mar 2020
$ openssl verify -CAfile CA/rootCA_cert.pem -untrusted CA/subCA_cert.pem user1_cert.pem
error 20 at 0 depth lookup: unable to get local issuer certificate
error user1_cert.pem: verification failed

There are test packages available in the following ppa:

https://launchpad.net/~mruffell/+archive/ubuntu/sf308725-test

If you install these test packages, and attempt to verify, things work as planned.

+ $ openssl verify -CAfile CA/rootCA_cert.pem -untrusted CA/subCA_cert.pem user1_cert.pem
+ user1_cert.pem: OK
+
[Where problems could occur]

If a regression were to occur, it would occur around x509
[Touch-packages] [Bug 1926254] [NEW] x509 Certificate verification fails when basicConstraints=CA:FALSE, pathlen:0 on self-signed leaf certs
view by the security team. One of the commits which fixes the issue adds two testcases to the openssl testsuite, which tests the "CA:FALSE, pathlen:0" certificates with and without -x509_strict, and tests to see if it passes without, and fails with. [Other info] This was reported in the upstream issue #11456 [2]: [2] https://github.com/openssl/openssl/issues/11456 I believe these three commits fix the issue: commit 00a0da2f021e6a0bc9519a6a9e5be66d45e6fc91 Author: Tomas Mraz Date: Thu Apr 2 15:56:12 2020 +0200 Subject: Allow certificates with Basic Constraints CA:false, pathlen:0 Link: https://github.com/openssl/openssl/commit/00a0da2f021e6a0bc9519a6a9e5be66d45e6fc91 commit 29e94f285f7f05b1aec6fa275e320bc5fa37ab1e Author: Tomas Mraz Date: Thu Apr 2 17:31:21 2020 +0200 Subject: Set X509_V_ERR_INVALID_EXTENSION error for invalid basic constraints Link: https://github.com/openssl/openssl/commit/29e94f285f7f05b1aec6fa275e320bc5fa37ab1e commit e78f2a8f269a4dcf820ca994e2b89b77972d79e1 Author: Tomas Mraz Date: Fri Apr 3 10:24:40 2020 +0200 Subject: Add test cases for the non CA certificate with pathlen:0 Link: https://github.com/openssl/openssl/commit/e78f2a8f269a4dcf820ca994e2b89b77972d79e1 These landed in openssl 1.1.1g, and hirsute already has these fixes. 
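
To make it easier to spot certificates that fall into this case, here is a minimal shell sketch. The helper name `has_nonca_pathlen` is hypothetical and is not part of openssl or of this SRU. It assumes the text dump from `openssl x509 -noout -text` renders the extension as "CA:FALSE, pathlen:0", as written in the bug title.

```shell
#!/bin/sh
# Hypothetical helper, not part of openssl or the SRU.
# Reads a certificate text dump on stdin and exits 0 when the
# Basic Constraints combine CA:FALSE with a pathlen value.
has_nonca_pathlen() {
    grep -Eq 'CA:FALSE, *pathlen:[0-9]+'
}

# Usage (assumes user1_cert.pem from the testcase exists):
#   openssl x509 -in user1_cert.pem -noout -text | has_nonca_pathlen \
#       && echo "leaf cert has CA:FALSE together with pathlen"
```

Matching on the text dump keeps the check independent of the installed openssl version, at the cost of relying on that output format staying the same.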
** Affects: openssl (Ubuntu) Importance: Undecided Status: Fix Released ** Affects: openssl (Ubuntu Focal) Importance: Medium Assignee: Matthew Ruffell (mruffell) Status: In Progress ** Affects: openssl (Ubuntu Groovy) Importance: Medium Assignee: Matthew Ruffell (mruffell) Status: In Progress ** Affects: openssl (Ubuntu Hirsute) Importance: Undecided Status: Fix Released ** Tags: focal groovy sts ** Also affects: openssl (Ubuntu Focal) Importance: Undecided Status: New ** Also affects: openssl (Ubuntu Hirsute) Importance: Undecided Status: New ** Also affects: openssl (Ubuntu Groovy) Importance: Undecided Status: New ** Changed in: openssl (Ubuntu) Status: New => Fix Released ** Changed in: openssl (Ubuntu Hirsute) Status: New => Fix Released ** Changed in: openssl (Ubuntu Focal) Status: New => In Progress ** Changed in: openssl (Ubuntu Groovy) Status: New => In Progress ** Changed in: openssl (Ubuntu Focal) Importance: Undecided => Medium ** Changed in: openssl (Ubuntu Groovy) Importance: Undecided => Medium ** Changed in: openssl (Ubuntu Focal) Assignee: (unassigned) => Matthew Ruffell (mruffell) ** Changed in: openssl (Ubuntu Groovy) Assignee: (unassigned) => Matthew Ruffell (mruffell) ** Tags added: focal groovy sts ** Description changed: [Impact] In openssl 1.1.1f, the below commit was merged: commit ba4356ae4002a04e28642da60c551877eea804f7 Author: Bernd Edlinger Date: Sat Jan 4 15:54:53 2020 +0100 Subject: Fix error handling in x509v3_cache_extensions and related functions Link: https://github.com/openssl/openssl/commit/ba4356ae4002a04e28642da60c551877eea804f7 This introduced a regression which caused certificate validation to fail when certificates violate RFC 5280 [1], namely, when a certificate has "basicConstraints=CA:FALSE,pathlen:0". This combination is commonly seen by self-signed leaf certificates with an intermediate CA before the root CA. 
Because of this, openssl 1.1.1f rejects these certificates and they cannot be used in the system certificate store, and ssl connections fail when you try to use them to connect to a ssl endpoint. The error you see when you try verify is: $ openssl verify -CAfile CA/rootCA_cert.pem -untrusted CA/subCA_cert.pem user1_cert.pem error 20 at 0 depth lookup: unable to get local issuer certificate error user1_cert.pem: verification failed The exact same certificates work fine on Xenial, Bionic and Hirsute. [1] https://tools.ietf.org/html/rfc5280.html [Testcase] We will create our own root CA, intermediate CA and leaf server certificate. Create necessary directories: $ mkdir reproducer $ cd reproducer $ mkdir CA Write openssl configuration files to disk for each CA and cert: $ cat << EOF >> rootCA.cnf [ req ] prompt = no distinguished_name = req_distinguished_name x509_extensions = usr_cert [ req_distinguished_name ] C = DE O = Test Org CN = Test RSA PSS Root-CA [ usr_cert ] basicConstraints= critical,CA:TRUE keyUsage= critical,keyCertSign,cRLSign subjectKeyIdentifier= hash authorityKeyIdentifier = keyid:always EOF $ cat << EOF >> subCA.cnf [ req ] prompt = no distinguished_name = req_distinguished_name x509_extensions = usr_cert [ req_distinguished_name ] C = DE O = Test Org CN = Test RSA PSS Sub-CA [ usr_cert ] basicConstraints
[Touch-packages] [Bug 1908473] Re: rsyslog-relp: imrelp module leaves sockets in CLOSE_WAIT state which leads to file descriptor leak
Performing verification for Focal.

I installed rsyslog-relp 8.2001.0-1ubuntu1.1 and librelp0 1.5.0-1ubuntu2 from -updates.

From there I set up the configuration file, launched a new rsyslog instance, and used netcat to send 100 packets to the relp port.

https://paste.ubuntu.com/p/HfSDvNJzpX/

As we can see, there are 100 sockets still open in the CLOSE_WAIT state.

From there I enabled -proposed and installed librelp 1.5.0-1ubuntu2.20.04.2. I started a new instance of rsyslog, and used netcat to send another 100 packets to the relp port. This time, all sockets were closed and not left in CLOSE_WAIT.

https://paste.ubuntu.com/p/tjXHhQ2293/

I also ran the testcase from the upstream testsuite, imrelp-sessionbreak-vg.sh. I did this by:

1) pull-lp-source rsyslog focal
2) edit debian/rules, add --enable-valgrind, remove --without-valgrind-tests,
3) wget https://github.com/rsyslog/rsyslog/commit/baee0bd5420649329793746f0daf87c4f59fe6a6.patch
4) quilt import baee0bd5420649329793746f0daf87c4f59fe6a6.patch
5) quilt push
6) chmod +x tests/imrelp-sessionbreak-vg.sh
7) debuild -uc -us -b

It will eventually build tests, and imrelp-sessionbreak-vg.sh passes:

make[5]: Entering directory '/home/ubuntu/rsyslog-8.2001.0/tests'
...
PASS: imrelp-sessionbreak-vg.sh
...

We pass both the upstream testsuite and the testcase from the bug report. The file descriptor leak has been fixed, happy to mark as verified for Focal.

** Tags removed: verification-needed verification-needed-focal
** Tags added: verification-done-focal

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to rsyslog in Ubuntu.
https://bugs.launchpad.net/bugs/1908473 Title: rsyslog-relp: imrelp module leaves sockets in CLOSE_WAIT state which leads to file descriptor leak Status in librelp package in Ubuntu: Fix Released Status in rsyslog package in Ubuntu: Fix Released Status in librelp source package in Focal: Fix Committed Status in rsyslog source package in Focal: Won't Fix Status in librelp source package in Groovy: Fix Committed Status in rsyslog source package in Groovy: Fix Released Status in librelp source package in Hirsute: Fix Released Status in rsyslog source package in Hirsute: Fix Released Bug description: [Impact] In recent versions of rsyslog and librelp, the imrelp module leaks file descriptors due to a bug where it does not correctly close sockets, and instead, leaves them in the CLOSE_WAIT state. This causes rsyslogd on busy servers to eventually hit the limit of maximum open files allowed, which locks rsyslogd up until it is restarted. A workaround is to restart rsyslogd every month or so to manually close all of the open sockets. Only users of the imrelp module are affected, and not rsyslog users in general. [Testcase] Install the rsyslog-relp module like so: $ sudo apt install rsyslog rsyslog-relp Next, generate a working directory, and make a config file that loads the relp module. $ sudo mkdir /workdir $ cat << EOF >> ./spool.conf \$LocalHostName spool \$AbortOnUncleanConfig on \$PreserveFQDN on global( workDirectory="/workdir" maxMessageSize="256k" ) main_queue(queue.type="Direct") module(load="imrelp") input( type="imrelp" name="imrelp" port="601" ruleset="spool" MaxDataSize="256k" ) ruleset(name="spool" queue.type="direct") { } # Just so rsyslog doesn't whine that we do not have outputs ruleset(name="noop" queue.type="direct") { action( type="omfile" name="omfile" file="/workdir/spool.log" ) } EOF Verify that the config is valid, then start a rsyslog server. 
$ sudo rsyslogd -f ./spool.conf -N9 $ sudo rsyslogd -f ./spool.conf -i /workdir/rsyslogd.pid Fetch the rsyslogd PID and check for open files. $ RLOGPID=$(cat /workdir/rsyslogd.pid) $ sudo ls -l /proc/$RLOGPID/fd total 0 lr-x-- 1 root root 64 Dec 17 01:22 0 -> /dev/urandom lrwx-- 1 root root 64 Dec 17 01:22 1 -> 'socket:[41228]' lrwx-- 1 root root 64 Dec 17 01:22 3 -> 'socket:[41222]' lrwx-- 1 root root 64 Dec 17 01:22 4 -> 'socket:[41223]' lrwx-- 1 root root 64 Dec 17 01:22 7 -> 'anon_inode:[eventpoll]' We have 3 sockets open by default. Next, use netcat to open 100 connections: $ for i in {1..100} ; do nc -z 127.0.0.1 601 ; done Now check for open file descriptors, and there will be an extra 100 sockets in the list: $ sudo ls -l /proc/$RLOGPID/fd https://paste.ubuntu.com/p/f6NQVNbZcR/ We can check the state of these sockets with: $ ss -t https://paste.ubuntu.com/p/7Ts2FbxJrg/ The listening sockets will be in CLOSE-WAIT, and the netcat sockets will be in FIN-WAIT-2. $ ss -t | grep CLOSE-WAIT | wc -l 100 If you install the test package available in the following ppa: https://launchpad.net/~mruffell/+archive/ubuntu/sf299578-test
[Touch-packages] [Bug 1908473] Re: rsyslog-relp: imrelp module leaves sockets in CLOSE_WAIT state which leads to file descriptor leak
Hi Mauricio, I filed bug 1912969 to fix the FTBFS for librelp on focal. Adjusting the packets down from 50,000 to 10,000 makes the build succeed on riscv64. I attached two debdiffs, one an incremental patch from 1.5.0-1ubuntu2.20.04.1, and the other a full patch from 1.5.0-1ubuntu2. Please review and sponsor. Thanks, Matthew -- You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to rsyslog in Ubuntu. https://bugs.launchpad.net/bugs/1908473 Title: rsyslog-relp: imrelp module leaves sockets in CLOSE_WAIT state which leads to file descriptor leak Status in librelp package in Ubuntu: Fix Released Status in rsyslog package in Ubuntu: Fix Released Status in librelp source package in Focal: Fix Committed Status in rsyslog source package in Focal: Won't Fix Status in librelp source package in Groovy: Fix Committed Status in rsyslog source package in Groovy: Fix Released Status in librelp source package in Hirsute: Fix Released Status in rsyslog source package in Hirsute: Fix Released Bug description: [Impact] In recent versions of rsyslog and librelp, the imrelp module leaks file descriptors due to a bug where it does not correctly close sockets, and instead, leaves them in the CLOSE_WAIT state. This causes rsyslogd on busy servers to eventually hit the limit of maximum open files allowed, which locks rsyslogd up until it is restarted. A workaround is to restart rsyslogd every month or so to manually close all of the open sockets. Only users of the imrelp module are affected, and not rsyslog users in general. [Testcase] Install the rsyslog-relp module like so: $ sudo apt install rsyslog rsyslog-relp Next, generate a working directory, and make a config file that loads the relp module. 
$ sudo mkdir /workdir
$ cat << EOF >> ./spool.conf
\$LocalHostName spool
\$AbortOnUncleanConfig on
\$PreserveFQDN on

global(
    workDirectory="/workdir"
    maxMessageSize="256k"
)

main_queue(queue.type="Direct")
module(load="imrelp")
input(
    type="imrelp"
    name="imrelp"
    port="601"
    ruleset="spool"
    MaxDataSize="256k"
)

ruleset(name="spool" queue.type="direct") {
}

# Just so rsyslog doesn't whine that we do not have outputs
ruleset(name="noop" queue.type="direct") {
    action(
        type="omfile"
        name="omfile"
        file="/workdir/spool.log"
    )
}
EOF

Verify that the config is valid, then start a rsyslog server.

$ sudo rsyslogd -f ./spool.conf -N9
$ sudo rsyslogd -f ./spool.conf -i /workdir/rsyslogd.pid

Fetch the rsyslogd PID and check for open files.

$ RLOGPID=$(cat /workdir/rsyslogd.pid)
$ sudo ls -l /proc/$RLOGPID/fd
total 0
lr-x------ 1 root root 64 Dec 17 01:22 0 -> /dev/urandom
lrwx------ 1 root root 64 Dec 17 01:22 1 -> 'socket:[41228]'
lrwx------ 1 root root 64 Dec 17 01:22 3 -> 'socket:[41222]'
lrwx------ 1 root root 64 Dec 17 01:22 4 -> 'socket:[41223]'
lrwx------ 1 root root 64 Dec 17 01:22 7 -> 'anon_inode:[eventpoll]'

We have 3 sockets open by default. Next, use netcat to open 100 connections:

$ for i in {1..100} ; do nc -z 127.0.0.1 601 ; done

Now check for open file descriptors, and there will be an extra 100 sockets in the list:

$ sudo ls -l /proc/$RLOGPID/fd

https://paste.ubuntu.com/p/f6NQVNbZcR/

We can check the state of these sockets with:

$ ss -t

https://paste.ubuntu.com/p/7Ts2FbxJrg/

The sockets on the rsyslogd side will be in CLOSE-WAIT, and the netcat sockets will be in FIN-WAIT-2.

$ ss -t | grep CLOSE-WAIT | wc -l
100

If you install the test package available in the following ppa:

https://launchpad.net/~mruffell/+archive/ubuntu/sf299578-test

When you open connections with netcat, these will be closed properly, and the file descriptor leak will be fixed.
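The manual check in the testcase (list /proc/$RLOGPID/fd and count the socket entries) can also be scripted. The following is only a rough sketch, not part of the bug report: the function name count_socket_fds is made up for illustration, and it assumes readlink is available and that socket descriptors appear as symlinks whose target starts with "socket:", as in the listing above.

```shell
#!/bin/sh
# count_socket_fds DIR: print how many symlinks in DIR point at a socket.
# DIR is expected to look like /proc/<pid>/fd, where a socket shows up
# as a symlink with a target of the form "socket:[inode]".
# (Hypothetical helper, named here for illustration only.)
count_socket_fds() {
    dir=$1
    count=0
    for fd in "$dir"/*; do
        # Skip anything that is not a symlink, including an unmatched glob.
        [ -L "$fd" ] || continue
        case "$(readlink "$fd")" in
            socket:*) count=$((count + 1)) ;;
        esac
    done
    echo "$count"
}
```

Run as root against /proc/$RLOGPID/fd before and after the netcat loop. With the leak present the count should grow by roughly the number of connections opened; with the fixed librelp it should return to the starting value once rsyslogd has closed the sessions.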
[Where problems could occur]

If a regression were to occur, it would be limited to users of the imrelp module, which is part of the rsyslog-relp package and depends on librelp.

rsyslog-relp is not part of a default installation of rsyslog, and is opt-in by changing a configuration file to enable imrelp.

The changes to rsyslog implement a testcase which exercises the problematic code to ensure things are working as expected; this can be enabled manually on build, and has been verified to pass (#7).

[Other]

Upstream bug list:

https://github.com/rsyslog/rsyslog/issues/4350
https://github.com/rsyslog/rsyslog/issues/4005
https://github.com/rsyslog/librelp/issues/188
https://github.com/rsyslog/librelp/pull/193

The following commits fix the problem:

rsyslogd

commit baee0bd5420649329793746f0daf87c4f59fe6a6
Author: Andre lorbach
Date: Thu Apr 9 13:00:35 2020 +0200
Subject: testbench: Add test for imrelp to check broken session handling.
Link:
[Touch-packages] [Bug 1908473] Re: rsyslog-relp: imrelp module leaves sockets in CLOSE_WAIT state which leads to file descriptor leak
Hi Mauricio,

It seems riscv64 passes on Groovy because tests are skipped on the riscv64 architecture.

From Groovy's build log:

https://paste.ubuntu.com/p/NCJPDVSbSW/

If you look at the man page for dh_auto_test, it mentions:

If the DEB_BUILD_OPTIONS environment variable contains nocheck, no tests will be performed.

nocheck was added to riscv64 by default for all packages in Groovy as part of the change to dpkg in bug 1891686.

The test cases basic-realistic.sh and tls-basic-realistic.sh fail on Focal because they attempt to send 100,000 packets between the server and the client. We get to various stages, like 00029000 msgs sent, and now 00047000 msgs sent with some changes William made to the builders, before it times out, assumes the channel is dead, and the test fails.

https://paste.ubuntu.com/p/hwYXSbKPPV/

We aren't going to hit the 100,000 packets on riscv anytime soon.

I think I will open a new bug to adjust the packet counts from 100,000 down to 10,000 for basic-realistic.sh and tls-basic-realistic.sh, which resembles what has been done for receiver-abort.sh and tls-receiver-abort.sh.
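To see whether a given build environment would skip the tests for the reason described above, the nocheck test can be sketched in shell. This is an illustration only: has_nocheck is a made-up helper name, and it assumes DEB_BUILD_OPTIONS is a whitespace-separated list of options.

```shell
# has_nocheck OPTIONS: succeed if the whitespace-separated OPTIONS string
# contains the exact word "nocheck". (Hypothetical helper for illustration.)
has_nocheck() {
    for opt in $1; do
        [ "$opt" = "nocheck" ] && return 0
    done
    return 1
}

# Report what the current environment would do.
if has_nocheck "${DEB_BUILD_OPTIONS:-}"; then
    echo "nocheck is set: tests would be skipped"
else
    echo "nocheck is not set: tests would run"
fi
```

Matching whole words rather than substrings avoids treating an unrelated option that merely contains "nocheck" as a match.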
[Touch-packages] [Bug 1908473] Re: rsyslog-relp: imrelp module leaves sockets in CLOSE_WAIT state which leads to file descriptor leak
Performing verification for Focal

I installed rsyslog-relp 8.2001.0-1ubuntu1.1 and librelp0 1.5.0-1ubuntu2 from -updates.

From there I set up the configuration file, launched a new rsyslog instance, and used netcat to send 100 packets to the relp port.

https://paste.ubuntu.com/p/jCs9Dy6FYF/

As we can see, there are 100 sockets still open in the CLOSE_WAIT state.

From there I enabled -proposed and installed librelp 1.5.0-1ubuntu2.20.04.1. I started a new instance of rsyslog, and used netcat to send another 100 packets to the relp port. This time, all sockets were closed and not left in CLOSE_WAIT.

https://paste.ubuntu.com/p/vdzsVTctmf/

I also ran the testcase from the upstream testsuite, imrelp-sessionbreak-vg.sh. I did this by:

1) pull-lp-source rsyslog focal
2) edit debian/rules, add --enable-valgrind, remove --without-valgrind-tests,
3) wget https://github.com/rsyslog/rsyslog/commit/baee0bd5420649329793746f0daf87c4f59fe6a6.patch
4) quilt import baee0bd5420649329793746f0daf87c4f59fe6a6.patch
5) quilt push
6) chmod +x tests/imrelp-sessionbreak-vg.sh
7) debuild -uc -us -b

It will eventually build tests, and imrelp-sessionbreak-vg.sh passes:

make[5]: Entering directory '/home/ubuntu/rsyslog-8.2001.0/tests'
...
PASS: imrelp-sessionbreak-vg.sh
...

We pass both the upstream testsuite and the testcase from the bug report. The file descriptor leak has been fixed, happy to mark as verified for Focal.
[Touch-packages] [Bug 1908473] Re: rsyslog-relp: imrelp module leaves sockets in CLOSE_WAIT state which leads to file descriptor leak
Performing verification for librelp in Groovy.

I installed rsyslog-relp 8.2006.0-2ubuntu1 and librelp 1.5.0-1ubuntu2 from -updates to reproduce:

https://paste.ubuntu.com/p/gtn4rcXc72/

From there I set up the configuration script, ran a new instance of rsyslog, and used netcat to open 100 connections to the relp port. When I checked the list of file descriptors, there were 100 sockets open, in the CLOSE_WAIT state.

From there, I enabled -proposed and installed librelp 1.5.0-1ubuntu2.20.10.1:

https://paste.ubuntu.com/p/nt342PJkQ5/

I started a new rsyslog instance, and used netcat to open 100 connections to the relp port. All sockets were closed when rsyslog was done with them, and there were no sockets in CLOSE_WAIT.

I also ran the provided testcase in rsyslog, imrelp-sessionbreak-vg.sh. I did this by:

1) pull-lp-source rsyslog groovy
2) edit debian/rules, add --enable-valgrind, remove --without-valgrind-tests,
3) debuild -uc -us -b

It will eventually build tests, and imrelp-sessionbreak-vg.sh passes:

make[5]: Entering directory '/home/ubuntu/rsyslog-8.2006.0/tests'
...
PASS: imrelp-sessionbreak-vg.sh
...

We pass both the upstream testsuite and the testcase from the bug report. The file descriptor leak has been fixed, happy to mark as verified for Groovy.

** Tags removed: verification-needed-groovy
** Tags added: verification-done-groovy
[Touch-packages] [Bug 1908473] Re: rsyslog-relp: imrelp module leaves sockets in CLOSE_WAIT state which leads to file descriptor leak
Same one failure on today's rebuild. Strange, since this is the exact same code as Groovy.

** Attachment added: "buildlog_ubuntu-focal-riscv64.librelp_1.5.0-1ubuntu2.20.04.1_BUILDING.txt.gz.5"
   https://bugs.launchpad.net/ubuntu/+source/rsyslog/+bug/1908473/+attachment/5454976/+files/buildlog_ubuntu-focal-riscv64.librelp_1.5.0-1ubuntu2.20.04.1_BUILDING.txt.gz.5
[Touch-packages] [Bug 1912122] Re: /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT restrictions
Hi Robie,

I agree this probably isn't worth an SRU to Groovy. I just made the packages available on the off chance that they might be considered.

I will mark Groovy as won't fix. Hirsute is what really matters in the end.

** Changed in: rsyslog (Ubuntu Groovy)
       Status: In Progress => Won't Fix

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to rsyslog in Ubuntu.
https://bugs.launchpad.net/bugs/1912122

Title:
  /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT restrictions

Status in rsyslog package in Ubuntu:
  In Progress
Status in rsyslog source package in Groovy:
  Won't Fix
Status in rsyslog source package in Hirsute:
  In Progress

Bug description:

[Impact]

In bug 1886112, CONFIG_SECURITY_DMESG_RESTRICT was enabled on the Ubuntu kernel starting with Groovy and onward, in an effort to restrict access to the kernel log buffer from unprivileged users.

It seems we have overlooked /var/log/dmesg, as it is still mode 0644, while /var/log/kern.log and /var/log/syslog are both 0640:

$ ll /var/log
-rw-r--r-- 1 root   adm   81768 Jan 18 09:09 dmesg
-rw-r----- 1 syslog adm   24538 Jan 18 13:05 kern.log
-rw-r----- 1 syslog adm  213911 Jan 18 13:22 syslog

Change /var/log/dmesg to 0640 to close the information leak.
[Testcase]

$ sudo adduser dave
$ su dave
$ groups
dave
$ cat /var/log/kern.log
cat: /var/log/kern.log: Permission denied
$ cat /var/log/syslog
cat: /var/log/syslog: Permission denied
$ cat /var/log/dmesg
[    0.000000] kernel: Linux version 5.8.0-36-generic (buildd@lgw01-amd64-011) (gcc (Ubuntu 10.2.1-2ubuntu3) 10.2.1 20201221, GNU ld (GNU Binutils for Ubuntu) 2.35.50.20210106) #40+21.04.1-Ubuntu SMP Thu Jan 7 11:35:09 UTC 2021 (Ubuntu 5.8.0-36.40+21.04.1-generic 5.8.18)
[    0.000000] kernel: Command line: BOOT_IMAGE=/casper/vmlinuz file=/cdrom/preseed/ubuntu.seed maybe-ubiquity quiet splash ---

If you install the package in the following ppa:

https://launchpad.net/~mruffell/+archive/ubuntu/lp1912122-test

$ sudo systemctl daemon-reload
$ sudo systemctl start dmesg.service
$ sudo adduser dave
$ su dave
$ groups
dave
$ cat /var/log/kern.log
cat: /var/log/kern.log: Permission denied
$ cat /var/log/syslog
cat: /var/log/syslog: Permission denied
$ cat /var/log/dmesg
cat: /var/log/dmesg: Permission denied

[Where problems could occur]

Some users or log scraper programs might need to view the kernel log buffers, and in this case, their underlying service accounts should be added to the 'adm' group.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/rsyslog/+bug/1912122/+subscriptions

--
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
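Besides reading the files as an unprivileged user, the permission bits themselves can be checked directly. The snippet below is a minimal sketch, not taken from the bug report: mode_is is a hypothetical helper, and it assumes GNU stat (as shipped in Ubuntu's coreutils), whose -c '%a' format prints the octal mode without a leading zero.

```shell
# mode_is FILE MODE: succeed if FILE's permission bits equal MODE, given in
# octal without a leading zero (e.g. 640). Relies on GNU stat's -c '%a'.
# (Hypothetical helper, named here for illustration only.)
mode_is() {
    [ "$(stat -c '%a' "$1")" = "$2" ]
}

# Flag any of the three log files from the bug that exist but are not 0640.
for f in /var/log/dmesg /var/log/kern.log /var/log/syslog; do
    if [ -e "$f" ] && ! mode_is "$f" 640; then
        echo "$f is not 0640"
    fi
done
```

On a system without the fix, this would be expected to report /var/log/dmesg; on a system with the fix, it should print nothing once dmesg.service has run and recreated the file.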
[Touch-packages] [Bug 1912122] Re: /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT restrictions
Attached is a patch which changes /var/log/dmesg to 0640 on groovy. It also contains Steve's recommendation to set the logrotate files to 0640.

** Patch added: "Debdiff for syslog on groovy"
   https://bugs.launchpad.net/ubuntu/+source/rsyslog/+bug/1912122/+attachment/5454311/+files/lp1912122_groovy_v2.debdiff

https://bugs.launchpad.net/bugs/1912122

Title:
  /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT restrictions

Status in rsyslog package in Ubuntu:
  In Progress
Status in rsyslog source package in Groovy:
  In Progress
Status in rsyslog source package in Hirsute:
  In Progress
[Touch-packages] [Bug 1912122] Re: /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT restrictions
Attached is a patch which changes /var/log/dmesg to 0640 on hirsute. It also contains Steve's recommendation to set the logrotate files to 0640.

** Patch removed: "Debdiff for rsyslog on hirsute"
   https://bugs.launchpad.net/ubuntu/+source/rsyslog/+bug/1912122/+attachment/5454004/+files/lp1912122_hirsute.debdiff

** Patch removed: "Debdiff for syslog on groovy"
   https://bugs.launchpad.net/ubuntu/+source/rsyslog/+bug/1912122/+attachment/5454005/+files/lp1912122_groovy.debdiff

** Patch added: "Debdiff for rsyslog on hirsute"
   https://bugs.launchpad.net/ubuntu/+source/rsyslog/+bug/1912122/+attachment/5454310/+files/lp1912122_hirsute_v2.debdiff

https://bugs.launchpad.net/bugs/1912122

Title:
  /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT restrictions

Status in rsyslog package in Ubuntu:
  In Progress
Status in rsyslog source package in Groovy:
  In Progress
Status in rsyslog source package in Hirsute:
  In Progress
[Touch-packages] [Bug 1912122] Re: /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT restrictions
** Tags added: sts-sponsor

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to rsyslog in Ubuntu.
https://bugs.launchpad.net/bugs/1912122

Title:
  /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT restrictions

Status in rsyslog package in Ubuntu: In Progress
Status in rsyslog source package in Groovy: In Progress
Status in rsyslog source package in Hirsute: In Progress

Bug description:

[Impact]

In bug 1886112, CONFIG_SECURITY_DMESG_RESTRICT was enabled on the Ubuntu kernel starting with Groovy and onward, in an effort to restrict access to the kernel log buffer from unprivileged users.

It seems we have overlooked /var/log/dmesg, as it is still mode 0644, while /var/log/kern.log and /var/log/syslog are both 0640:

$ ll /var/log
-rw-r--r-- 1 root adm 81768 Jan 18 09:09 dmesg
-rw-r----- 1 syslog adm 24538 Jan 18 13:05 kern.log
-rw-r----- 1 syslog adm 213911 Jan 18 13:22 syslog

Change /var/log/dmesg to 0640 to close the information leak.

[Testcase]

$ sudo adduser dave
$ su dave
$ groups dave
$ cat /var/log/kern.log
cat: /var/log/kern.log: Permission denied
$ cat /var/log/syslog
cat: /var/log/syslog: Permission denied
$ cat /var/log/dmesg
[    0.000000] kernel: Linux version 5.8.0-36-generic (buildd@lgw01-amd64-011) (gcc (Ubuntu 10.2.1-2ubuntu3) 10.2.1 20201221, GNU ld (GNU Binutils for Ubuntu) 2.35.50.20210106) #40+21.04.1-Ubuntu SMP Thu Jan 7 11:35:09 UTC 2021 (Ubuntu 5.8.0-36.40+21.04.1-generic 5.8.18)
[    0.000000] kernel: Command line: BOOT_IMAGE=/casper/vmlinuz file=/cdrom/preseed/ubuntu.seed maybe-ubiquity quiet splash ---

If you install the package in the following ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/lp1912122-test

$ sudo systemctl daemon-reload
$ sudo systemctl start dmesg.service

$ sudo adduser dave
$ su dave
$ groups dave
$ cat /var/log/kern.log
cat: /var/log/kern.log: Permission denied
$ cat /var/log/syslog
cat: /var/log/syslog: Permission denied
$ cat /var/log/dmesg
cat: /var/log/dmesg: Permission denied

[Where problems could occur]

Some users or log scraper programs might need to view the kernel log buffers, and in this case, their underlying service accounts should be added to the 'adm' group.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/rsyslog/+bug/1912122/+subscriptions

--
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help : https://help.launchpad.net/ListHelp
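The manual check in the testcase can also be scripted. The following is only a minimal sketch, assuming GNU coreutils `stat`; the helper name `check_mode` is hypothetical and is not part of rsyslog or of the proposed debdiff. It compares a file's octal mode with an expected value.

```shell
#!/bin/sh
# Hypothetical helper (not shipped by rsyslog): succeed only when FILE has
# exactly the EXPECTED octal mode, as printed by GNU stat's %a format.
check_mode() {
    file=$1
    expected=$2
    # stat prints e.g. "640" for mode 0640; return 2 if the file cannot be read.
    actual=$(stat -c '%a' "$file") || return 2
    [ "$actual" = "$expected" ]
}
```

For example, `check_mode /var/log/dmesg 640 && echo "restricted"` would print "restricted" only once the file has mode 0640.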
[Touch-packages] [Bug 1912122] Re: /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT restrictions
Attached is a debdiff for Groovy to change /var/log/dmesg to 0640.

** Patch added: "Debdiff for syslog on groovy"
https://bugs.launchpad.net/ubuntu/+source/rsyslog/+bug/1912122/+attachment/5454005/+files/lp1912122_groovy.debdiff

--
https://bugs.launchpad.net/bugs/1912122

Title:
  /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT restrictions

Status in rsyslog package in Ubuntu: In Progress
Status in rsyslog source package in Groovy: In Progress
Status in rsyslog source package in Hirsute: In Progress
[Touch-packages] [Bug 1912122] Re: /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT restrictions
Attached is a debdiff for hirsute to set /var/log/dmesg to 0640.

** Patch added: "Debdiff for rsyslog on hirsute"
https://bugs.launchpad.net/ubuntu/+source/rsyslog/+bug/1912122/+attachment/5454004/+files/lp1912122_hirsute.debdiff

--
https://bugs.launchpad.net/bugs/1912122

Title:
  /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT restrictions

Status in rsyslog package in Ubuntu: In Progress
Status in rsyslog source package in Groovy: In Progress
Status in rsyslog source package in Hirsute: In Progress
[Touch-packages] [Bug 1912122] Re: /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT restrictions
** Tags added: sts

--
https://bugs.launchpad.net/bugs/1912122

Title:
  /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT restrictions

Status in rsyslog package in Ubuntu: In Progress
Status in rsyslog source package in Groovy: In Progress
Status in rsyslog source package in Hirsute: In Progress
[Touch-packages] [Bug 1912122] Re: /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT restrictions
** Changed in: rsyslog (Ubuntu Hirsute)
Status: New => In Progress

** Changed in: rsyslog (Ubuntu Hirsute)
Importance: Undecided => Medium

** Changed in: rsyslog (Ubuntu Hirsute)
Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Description changed:

  If you install the package in the following ppa:
+ https://launchpad.net/~mruffell/+archive/ubuntu/lp1912122-test
+
+ $ sudo systemctl daemon-reload
+ $ sudo systemctl start dmesg.service
+
  $ sudo adduser dave

** Changed in: rsyslog (Ubuntu Groovy)
Status: New => In Progress

** Changed in: rsyslog (Ubuntu Groovy)
Importance: Undecided => Medium

** Changed in: rsyslog (Ubuntu Groovy)
Assignee: (unassigned) => Matthew Ruffell (mruffell)

--
https://bugs.launchpad.net/bugs/1912122

Title:
  /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT restrictions

Status in rsyslog package in Ubuntu: In Progress
Status in rsyslog source package in Groovy: In Progress
Status in rsyslog source package in Hirsute: In Progress
[Touch-packages] [Bug 1912122] [NEW] /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT restrictions
Public bug reported:

[Impact]

In bug 1886112, CONFIG_SECURITY_DMESG_RESTRICT was enabled on the Ubuntu kernel starting with Groovy and onward, in an effort to restrict access to the kernel log buffer from unprivileged users.

It seems we have overlooked /var/log/dmesg, as it is still mode 0644, while /var/log/kern.log and /var/log/syslog are both 0640:

$ ll /var/log
-rw-r--r-- 1 root adm 81768 Jan 18 09:09 dmesg
-rw-r----- 1 syslog adm 24538 Jan 18 13:05 kern.log
-rw-r----- 1 syslog adm 213911 Jan 18 13:22 syslog

Change /var/log/dmesg to 0640 to close the information leak.

[Testcase]

$ sudo adduser dave
$ su dave
$ groups dave
$ cat /var/log/kern.log
cat: /var/log/kern.log: Permission denied
$ cat /var/log/syslog
cat: /var/log/syslog: Permission denied
$ cat /var/log/dmesg
[    0.000000] kernel: Linux version 5.8.0-36-generic (buildd@lgw01-amd64-011) (gcc (Ubuntu 10.2.1-2ubuntu3) 10.2.1 20201221, GNU ld (GNU Binutils for Ubuntu) 2.35.50.20210106) #40+21.04.1-Ubuntu SMP Thu Jan 7 11:35:09 UTC 2021 (Ubuntu 5.8.0-36.40+21.04.1-generic 5.8.18)
[    0.000000] kernel: Command line: BOOT_IMAGE=/casper/vmlinuz file=/cdrom/preseed/ubuntu.seed maybe-ubiquity quiet splash ---

If you install the package in the following ppa:

$ sudo adduser dave
$ su dave
$ groups dave
$ cat /var/log/kern.log
cat: /var/log/kern.log: Permission denied
$ cat /var/log/syslog
cat: /var/log/syslog: Permission denied
$ cat /var/log/dmesg
cat: /var/log/dmesg: Permission denied

[Where problems could occur]

Some users or log scraper programs might need to view the kernel log buffers, and in this case, their underlying service accounts should be added to the 'adm' group.
** Affects: rsyslog (Ubuntu)
Importance: Medium
Assignee: Matthew Ruffell (mruffell)
Status: In Progress

** Affects: rsyslog (Ubuntu Groovy)
Importance: Undecided
Status: New

** Affects: rsyslog (Ubuntu Hirsute)
Importance: Medium
Assignee: Matthew Ruffell (mruffell)
Status: In Progress

** Also affects: rsyslog (Ubuntu Groovy)
Importance: Undecided
Status: New

** Also affects: rsyslog (Ubuntu Hirsute)
Importance: Undecided
Status: New

--
https://bugs.launchpad.net/bugs/1912122

Title:
  /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT restrictions

Status in rsyslog package in Ubuntu: In Progress
Status in rsyslog source package in Groovy: New
Status in rsyslog source package in Hirsute: In Progress
[Touch-packages] [Bug 1908473] Re: rsyslog-relp: imrelp module leaves sockets in CLOSE_WAIT state which leads to file descriptor leak
** Tags added: sts-sponsor

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to rsyslog in Ubuntu.
https://bugs.launchpad.net/bugs/1908473

Title:
  rsyslog-relp: imrelp module leaves sockets in CLOSE_WAIT state which leads to file descriptor leak

Status in librelp package in Ubuntu: In Progress
Status in rsyslog package in Ubuntu: Fix Released
Status in librelp source package in Focal: In Progress
Status in rsyslog source package in Focal: Won't Fix
Status in librelp source package in Groovy: In Progress
Status in rsyslog source package in Groovy: Fix Released
Status in librelp source package in Hirsute: In Progress
Status in rsyslog source package in Hirsute: Fix Released

Bug description:

[Impact]

In recent versions of rsyslog and librelp, the imrelp module leaks file descriptors due to a bug where it does not correctly close sockets, and instead, leaves them in the CLOSE_WAIT state.

This causes rsyslogd on busy servers to eventually hit the limit of maximum open files allowed, which locks rsyslogd up until it is restarted.

A workaround is to restart rsyslogd every month or so to manually close all of the open sockets.

Only users of the imrelp module are affected, and not rsyslog users in general.

[Testcase]

Install the rsyslog-relp module like so:

$ sudo apt install rsyslog rsyslog-relp

Next, generate a working directory, and make a config file that loads the relp module.
$ sudo mkdir /workdir
$ cat << EOF >> ./spool.conf
\$LocalHostName spool
\$AbortOnUncleanConfig on
\$PreserveFQDN on

global(
    workDirectory="/workdir"
    maxMessageSize="256k"
)

main_queue(queue.type="Direct")
module(load="imrelp")
input(
    type="imrelp"
    name="imrelp"
    port="601"
    ruleset="spool"
    MaxDataSize="256k"
)

ruleset(name="spool" queue.type="direct") {
}

# Just so rsyslog doesn't whine that we do not have outputs
ruleset(name="noop" queue.type="direct") {
    action(
        type="omfile"
        name="omfile"
        file="/workdir/spool.log"
    )
}
EOF

Verify that the config is valid, then start a rsyslog server.

$ sudo rsyslogd -f ./spool.conf -N9
$ sudo rsyslogd -f ./spool.conf -i /workdir/rsyslogd.pid

Fetch the rsyslogd PID and check for open files.

$ RLOGPID=$(cat /workdir/rsyslogd.pid)
$ sudo ls -l /proc/$RLOGPID/fd
total 0
lr-x------ 1 root root 64 Dec 17 01:22 0 -> /dev/urandom
lrwx------ 1 root root 64 Dec 17 01:22 1 -> 'socket:[41228]'
lrwx------ 1 root root 64 Dec 17 01:22 3 -> 'socket:[41222]'
lrwx------ 1 root root 64 Dec 17 01:22 4 -> 'socket:[41223]'
lrwx------ 1 root root 64 Dec 17 01:22 7 -> 'anon_inode:[eventpoll]'

We have 3 sockets open by default. Next, use netcat to open 100 connections:

$ for i in {1..100} ; do nc -z 127.0.0.1 601 ; done

Now check for open file descriptors, and there will be an extra 100 sockets in the list:

$ sudo ls -l /proc/$RLOGPID/fd
https://paste.ubuntu.com/p/f6NQVNbZcR/

We can check the state of these sockets with:

$ ss -t
https://paste.ubuntu.com/p/7Ts2FbxJrg/

The listening sockets will be in CLOSE-WAIT, and the netcat sockets will be in FIN-WAIT-2.

$ ss -t | grep CLOSE-WAIT | wc -l
100

If you install the test package available in the following ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/sf299578-test

When you open connections with netcat, these will be closed properly, and the file descriptor leak will be fixed.
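The CLOSE-WAIT count used in the testcase can be wrapped in a small helper for before/after comparisons. This is a sketch only, assuming the state is the first column of the output, as iproute2's `ss -t` prints it; the function name `count_state` is hypothetical. Unlike a plain `grep`, it matches the state column exactly rather than any occurrence of the string in the line.

```shell
#!/bin/sh
# Hypothetical helper: read `ss -t`-style output on stdin and print how many
# sockets are in the TCP state given as the first argument.
count_state() {
    # n + 0 forces a numeric "0" when no line matches.
    awk -v state="$1" '$1 == state { n++ } END { print n + 0 }'
}
```

A possible use is `ss -t | count_state CLOSE-WAIT`, run once after the netcat loop on the affected package and once with the test package installed.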
[Where problems could occur]

If a regression were to occur, it would be limited to users of the imrelp module, which is a part of the rsyslog-relp package, and depends on librelp.

rsyslog-relp is not part of a default installation of rsyslog, and is opt-in by changing a configuration file to enable imrelp.

The changes to rsyslog implement a testcase which exercises the problematic code to ensure things are working as expected, and should run during autopkgtest time.

[Other]

Upstream bug list:

https://github.com/rsyslog/rsyslog/issues/4350
https://github.com/rsyslog/rsyslog/issues/4005
https://github.com/rsyslog/librelp/issues/188
https://github.com/rsyslog/librelp/pull/193

The following commits fix the problem:

rsyslogd
========

commit baee0bd5420649329793746f0daf87c4f59fe6a6
Author: Andre lorbach
Date: Thu Apr 9 13:00:35 2020 +0200
Subject: testbench: Add test for imrelp to check broken session handling.
Link: https://github.com/rsyslog/rsyslog/commit/baee0bd5420649329793746f0daf87c4f59fe6a6

librelp
=======

commit 7907c9c57f6ed94c8ce5a4e63c3c4e019f71cff0
Author: Andre lorbach
Date: Mon May 11 14:59:55 2020 +0200
Subject: fix memory leak on session break.
Link: https://github.com/rsyslog/librelp/commit/7907c9c57f6ed94c8ce5a4e63c3c4e019f71cff0

commit 4a6ad8637c244fd3a1caeb9a93950826f58e956a