[Touch-packages] [Bug 1949723] Re: systemd-resolved segfault in hashmap_iterate_entry

2024-03-26 Thread Matthew Ruffell
** Changed in: systemd (Ubuntu Focal)
   Status: New => In Progress

** Changed in: systemd (Ubuntu Focal)
   Importance: Low => Medium

** Changed in: systemd (Ubuntu Focal)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Tags added: sts

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1949723

Title:
  systemd-resolved segfault in hashmap_iterate_entry

Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Focal:
  In Progress

Bug description:
  installed libnss-resolve that put "resolve" in nsswitch.conf.

  $ lsb_release -rd
  Description:  Ubuntu 20.04.3 LTS
  Release:  20.04
  $ dpkg -l systemd | grep systemd
  ii  systemd  245.4-4ubuntu3.13  amd64  system and service manager
  $ grep ^hosts /etc/nsswitch.conf 
  hosts:  files libvirt mdns4_minimal resolve [NOTFOUND=return] dns mymachines

  systemd-resolved crashed once with segmentation fault.


  (gdb) bt
  #0  0x7f119c67693a in hashmap_iterate_entry (h=h@entry=0x706f746b73656465, i=i@entry=0x7ffc4ef515d0) at ../src/basic/hashmap.c:705
  #1  0x7f119c6789d6 in internal_hashmap_iterate (h=0x706f746b73656465, i=i@entry=0x7ffc4ef515d0, value=value@entry=0x7ffc4ef515c8, key=key@entry=0x0) at ../src/basic/hashmap.c:714
  #2  0x7f119c678a8b in set_iterate (s=<optimized out>, i=i@entry=0x7ffc4ef515d0, value=value@entry=0x7ffc4ef515c8) at ../src/basic/hashmap.c:735
  #3  0x55ba5e0ea917 in dns_query_candidate_go (c=c@entry=0x55ba5f353180) at ../src/resolve/resolved-dns-query.c:152
  #4  0x55ba5e0e9f0c in dns_query_candidate_notify (c=c@entry=0x55ba5f353180) at ../src/resolve/resolved-dns-query.c:312
  #5  0x55ba5e0ea178 in dns_transaction_complete (t=0x55ba5f37a9d0, state=<optimized out>) at ../src/resolve/resolved-dns-transaction.c:351
  #6  0x55ba5e0e27cd in dns_transaction_process_dnssec (t=t@entry=0x55ba5f37a9d0) at ../src/resolve/resolved-dns-transaction.c:838
  #7  0x55ba5e0e3649 in dns_transaction_process_reply (t=t@entry=0x55ba5f37a9d0, p=p@entry=0x55ba5f39dce0) at ../src/resolve/resolved-dns-transaction.c:1210
  #8  0x55ba5e0e40ab in on_dns_packet (s=<optimized out>, fd=<optimized out>, revents=<optimized out>, userdata=0x55ba5f37a9d0) at ../src/resolve/resolved-dns-transaction.c:1264
  #9  0x7f119c5e6c77 in source_dispatch (s=s@entry=0x55ba5f346780) at ../src/libsystemd/sd-event/sd-event.c:3193
  #10 0x7f119c5e6f11 in sd_event_dispatch (e=e@entry=0x55ba5f320430) at ../src/libsystemd/sd-event/sd-event.c:3634
  #11 0x7f119c5e8948 in sd_event_run (e=e@entry=0x55ba5f320430, timeout=timeout@entry=18446744073709551615) at ../src/libsystemd/sd-event/sd-event.c:3692
  #12 0x7f119c5e8b6f in sd_event_loop (e=0x55ba5f320430) at ../src/libsystemd/sd-event/sd-event.c:3714
  #13 0x55ba5e0c326a in run (argv=<optimized out>, argc=<optimized out>) at ../src/resolve/resolved.c:84
  #14 main (argc=<optimized out>, argv=<optimized out>) at ../src/resolve/resolved.c:91
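
  A side note on the backtrace above: the hashmap pointer
  h=0x706f746b73656465 in frames #0 and #1 consists entirely of
  printable ASCII bytes, which may suggest that the memory it was read
  from had been overwritten with string data. The short Python sketch
  below (the helper name is made up for illustration) decodes a pointer
  value as it would sit in memory on little-endian amd64:

```python
def pointer_as_ascii(value, width=8):
    """Decode a pointer value as the bytes it occupies in little-endian memory.

    Returns the text if every byte is printable ASCII, otherwise None.
    """
    raw = value.to_bytes(width, "little")
    if all(0x20 <= b < 0x7F for b in raw):
        return raw.decode("ascii")
    return None


if __name__ == "__main__":
    # h from frames #0 and #1 of the backtrace
    print(pointer_as_ascii(0x706F746B73656465))
    # an ordinary heap pointer (c in frame #3) does not decode as text
    print(pointer_as_ascii(0x55BA5F353180))
```

  For the value in this report the helper returns "edesktop", so the
  pointer slot appears to have held a fragment of a string rather than
  an address.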

  This seems to have been reported upstream
  https://github.com/systemd/systemd/issues/16168

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1949723/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


[Touch-packages] [Bug 2033892] Re: ls -l triggers mount of autofs shares when --ghost option is present or browse_mode is enabled

2024-03-24 Thread Matthew Ruffell
Thank you for the help sorting out the autopkgtests, Mauricio.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to coreutils in Ubuntu.
https://bugs.launchpad.net/bugs/2033892

Title:
  ls -l triggers mount of autofs shares when --ghost option is present
  or browse_mode is enabled

Status in coreutils package in Ubuntu:
  Fix Released
Status in coreutils source package in Jammy:
  Fix Committed
Status in coreutils package in Fedora:
  Fix Released

Bug description:
  [Impact]

  Issuing an 'ls -l' or a 'stat' on an autofs share when --ghost is set
  in the auto.master file, or browse_mode=yes is set in autofs.conf,
  causes the shares to be mounted, which did not happen previously.

  Disks / shares may not be present and the mounts may fail, leading to
  errors in your output.

  This is a behaviour change in coreutils 8.32, which occurred in the
  transition to using statx() instead of the stat()/lstat() calls used
  in previous releases.

  There don't seem to be any workarounds, apart from not running
  'ls -l' in your autofs share directory.

  [Testcase]

  Start two Jammy VMs. One will be a NFS server, the other, the client.

  NFS server VM:
  $ sudo hostnamectl set-hostname jammy-nfs-server
  $ sudo apt update && sudo apt upgrade -y
  $ sudo apt install nfs-kernel-server
  $ sudo mkdir /export
  $ sudo mkdir /export/users
  $ sudo vi /etc/exports # add the following lines:
  /export 192.168.122.0/24(rw,fsid=0,no_subtree_check,sync)
  /export/users 192.168.122.0/24(rw,nohide,insecure,no_subtree_check,sync)
  $ sudo systemctl restart nfs-server.service

  AutoFS Client:
  $ sudo apt update
  $ sudo apt install autofs
  $ sudo vim /etc/autofs.conf
  browse_mode = yes
  $ sudo mkdir /mnt2
  $ sudo vim /etc/auto.master
  /mnt2 /etc/auto.indirect
  $ sudo vim /etc/auto.indirect
  export 192.168.122.18:/export
  export-missing 192.168.122.18:/export-missing
  $ sudo reboot
  $ cd /mnt2
  $ ls -l
  ls: cannot access 'export-missing': No such file or directory
  total 4
  drwxr-xr-x 3 root root 4096 Feb 12 21:48 export
  d? ? ??   ?? export-missing
  $ mount -l | grep /mnt2
  /etc/auto.indirect on /mnt2 type autofs (rw,relatime,fd=6,pgrp=634,timeout=300,minproto=5,maxproto=5,indirect,pipe_ino=21561)
  192.168.122.18:/export on /mnt2/export type nfs (rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.122.18,mountvers=3,mountport=35786,mountproto=udp,local_lock=none,addr=192.168.122.18)

  We see the mount for export occurred, and export-missing was
  attempted, but it was either bogus or the disk was not present,
  leading to a "No such file or directory" error.

  There are test packages available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf378489-test

  If you install them, this is what should occur:

  $ ls -l
  total 0
  drwxr-xr-x 2 root root 0 Feb 12 22:01 export
  drwxr-xr-x 2 root root 0 Feb 12 22:01 export-missing
  $ mount -l | grep /mnt2
  /etc/auto.indirect on /mnt2 type autofs (rw,relatime,fd=6,pgrp=636,timeout=300,minproto=5,maxproto=5,indirect,pipe_ino=18346)

  No mounts happen, and no errors either.
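
  The manual check above (look at 'mount -l' and see whether anything
  other than the autofs map itself is mounted under /mnt2) can be
  scripted. The following is a minimal Python sketch, not part of the
  original testcase; the function name is hypothetical and it assumes
  the "SOURCE on MOUNTPOINT type FSTYPE (OPTIONS)" line shape shown in
  the output above, with no spaces in the paths:

```python
def triggered_mounts(mount_output, base):
    """Return (source, mountpoint, fstype) for each non-autofs mount below base."""
    prefix = base.rstrip("/") + "/"
    found = []
    for line in mount_output.splitlines():
        parts = line.split()
        # Expected shape: SOURCE on MOUNTPOINT type FSTYPE (OPTIONS)
        if len(parts) < 5 or parts[1] != "on" or parts[3] != "type":
            continue
        source, mountpoint, fstype = parts[0], parts[2], parts[4]
        # The autofs map itself is mounted on base; real shares appear below it.
        if fstype != "autofs" and mountpoint.startswith(prefix):
            found.append((source, mountpoint, fstype))
    return found


if __name__ == "__main__":
    # Abbreviated copy of the 'mount -l' output from the affected system.
    sample = (
        "/etc/auto.indirect on /mnt2 type autofs (rw,relatime,fd=6)\n"
        "192.168.122.18:/export on /mnt2/export type nfs (rw,relatime,vers=3)\n"
    )
    for entry in triggered_mounts(sample, "/mnt2"):
        print(entry)
```

  An empty result after running 'ls -l' in /mnt2 corresponds to the
  fixed behaviour; any entry means a share was mounted as a side effect.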

  [Where problems could occur]

  We are changing the behaviour of core utilities, ls and stat, such
  that they no longer attempt to mount autofs shares when --ghost option
  is present or browse_mode is enabled.

  This was the intended behaviour in the first place, and had been the
  behaviour for at least a decade. Upstream restored it shortly after
  the coreutils release that introduced statx(), which is what caused
  the automounts to occur.

  It is unlikely any system administrators are relying on the behaviour
  found in jammy in any scripts or day to day operations that would be
  adversely affected by the change. The worst case scenario is that a
  user doing an 'ls -l' on an unmounted disk finds the mount doesn't
  automatically occur, and they have to attach the disk and issue the
  mount themselves.

  If a regression were to occur, it would be limited to the ls and stat
  commands, specifically when listing directories linked to autofs
  mountpoints.

  [Other info]

  The automount behaviour change was introduced upstream in version
  8.32, with the introduction of the statx() call. This means only Jammy
  is affected.

  It was quickly reverted back to how it was originally in version 9.1,
  which is already available in Mantic and onward.

  The commits that solve the issue are:

  commit 85c975df2c25bd799370b04bb294e568e001102f
  From: Rohan Sable 
  Date: Mon, 7 Mar 2022 14:14:13 +
  Subject: ls: avoid triggering automounts
  Link: https://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=v9.0-177-g85c975df2c2

  commit 92cb8427c537f37edd43c5cef1909585201372ab 
  From: Pádraig Brady 
  Date: Mon, 7 Mar 2022 23:29:20 +
  Subject: stat: only automount with 

[Touch-packages] [Bug 2033892] Re: ls -l triggers mount of autofs shares when --ghost option is present or browse_mode is enabled

2024-03-20 Thread Matthew Ruffell
Performing verification for Jammy

I set up two Jammy VMs, one as an NFS server and the other as an
autofs/NFS client.

The client is using coreutils 8.32-4.1ubuntu1.1 from -updates.

$ apt-cache policy coreutils | grep Installed
  Installed: 8.32-4.1ubuntu1.1
  
I set up the nfs server and autofs mounts as the Testcase indicates.

$ ls -l
ls: cannot access 'export-missing': No such file or directory
total 4
drwxr-xr-x 3 root root 4096 Mar 20 22:16 export
d? ? ??   ?? export-missing

$ mount -l | grep mnt2
/etc/auto.indirect on /mnt2 type autofs (rw,relatime,fd=6,pgrp=692,timeout=300,minproto=5,maxproto=5,indirect,pipe_ino=21588)
192.168.122.65:/export on /mnt2/export type nfs (rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.122.65,mountvers=3,mountport=47718,mountproto=udp,local_lock=none,addr=192.168.122.65)

The shares were previously unmounted, but when I issue 'ls -l', the
mounts occur, which is not wanted, and we get an error for the
nonexistent export-missing share.

I then enabled -proposed, and installed coreutils 8.32-4.1ubuntu1.2.

$ apt-cache policy coreutils | grep Installed
  Installed: 8.32-4.1ubuntu1.2

From there, let's try the 'ls -l':

$ ls -l
total 0
drwxr-xr-x 2 root root 0 Mar 20 22:25 export
drwxr-xr-x 2 root root 0 Mar 20 22:25 export-missing

$ mount -l | grep mnt2
/etc/auto.indirect on /mnt2 type autofs (rw,relatime,fd=6,pgrp=648,timeout=300,minproto=5,maxproto=5,indirect,pipe_ino=16856)

This time the mounts do not occur; we just get a listing of the possible
autofs mounts. We can confirm with 'mount -l' that nothing was actually
mounted.

The package in -proposed fixes the issue. Happy to mark verified for
Jammy.

** Tags removed: verification-needed verification-needed-jammy
** Tags added: verification-done-jammy

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to coreutils in Ubuntu.
https://bugs.launchpad.net/bugs/2033892

Title:
  ls -l triggers mount of autofs shares when --ghost option is present
  or browse_mode is enabled

Status in coreutils package in Ubuntu:
  Fix Released
Status in coreutils source package in Jammy:
  Fix Committed
Status in coreutils package in Fedora:
  Fix Released


[Touch-packages] [Bug 2044420] Re: gtkpod segfaults when attempting to display songs

2024-03-19 Thread Matthew Ruffell
Attached is a debdiff for mantic which fixes this issue.

** Patch added: "Debdiff for gtkpod on mantic"
   https://bugs.launchpad.net/ubuntu/+source/gtkpod/+bug/2044420/+attachment/5757356/+files/lp2044420_mantic.debdiff

** Changed in: glib2.0 (Ubuntu Noble)
   Status: Triaged => Fix Released

** No longer affects: gtkpod (Ubuntu Noble)

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to glib2.0 in Ubuntu.
https://bugs.launchpad.net/bugs/2044420

Title:
  gtkpod segfaults when attempting to display songs

Status in GLib:
  Fix Released
Status in glib2.0 package in Ubuntu:
  Fix Released
Status in gtkpod package in Ubuntu:
  New
Status in glib2.0 source package in Mantic:
  Triaged
Status in gtkpod source package in Mantic:
  New
Status in glib2.0 source package in Noble:
  Fix Released

Bug description:
  Open gtkpod, and select your iPod from the list. If it has more than
  one screenful of songs to display in the list, gtkpod will
  immediately segfault.

  I haven't found a workaround yet.

  Broken on Mantic, works on Lunar.

  Thread 1 "gtkpod" received signal SIGSEGV, Segmentation fault.
  __GI___wcsxfrm_l (dest=0x0, src=0x0, n=0, l=0x76fff5a0 <_nl_global_locale>) at ../string/strxfrm_l.c:685
  685   ../string/strxfrm_l.c: No such file or directory.
  (gdb) bt
  #0  __GI___wcsxfrm_l (dest=0x0, src=0x0, n=0, l=0x76fff5a0 <_nl_global_locale>) at ../string/strxfrm_l.c:685
  #1  0x770c5a5e in g_utf8_collate_key () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #2  0x77f852ec in fuzzy_skip_prefix () at /lib/x86_64-linux-gnu/libgtkpod.so.1
  #3  0x7fffa80980ca in ??? () at /usr/lib/x86_64-linux-gnu/gtkpod/libsorttab_display.so
  #4  0x7fffa80997fd in normal_sort_tab_page_add_track () at /usr/lib/x86_64-linux-gnu/gtkpod/libsorttab_display.so
  #5  0x7fffa8099526 in normal_sort_tab_page_add_track () at /usr/lib/x86_64-linux-gnu/gtkpod/libsorttab_display.so
  #6  0x7fffa809f196 in sorttab_display_select_playlist_cb () at /usr/lib/x86_64-linux-gnu/gtkpod/libsorttab_display.so
  #7  0x7718d130 in g_closure_invoke () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
  #8  0x771ba4ac in ??? () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
  #9  0x771ab9b1 in ??? () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
  #10 0x771abbd6 in g_signal_emit_valist () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
  #11 0x771abc93 in g_signal_emit () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
  #12 0x77f67e4b in gtkpod_set_current_playlist () at /lib/x86_64-linux-gnu/libgtkpod.so.1
  #13 0x7fffa807cce0 in ??? () at /usr/lib/x86_64-linux-gnu/gtkpod/libplaylist_display.so
  #14 0x7708ba11 in ??? () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #15 0x770e746f in ??? () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #16 0x7708c46f in g_main_loop_run () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #17 0x777f61ed in gtk_main () at /lib/x86_64-linux-gnu/libgtk-3.so.0
  #18 0xea1f in main ()

To manage notifications about this bug go to:
https://bugs.launchpad.net/glib/+bug/2044420/+subscriptions




[Touch-packages] [Bug 2044420] Re: gtkpod segfaults when attempting to display songs

2024-03-19 Thread Matthew Ruffell
gtkpod has been removed from Debian, and thus from Noble, so there is
no need to fix it there.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to glib2.0 in Ubuntu.
https://bugs.launchpad.net/bugs/2044420

Title:
  gtkpod segfaults when attempting to display songs

Status in GLib:
  Fix Released
Status in glib2.0 package in Ubuntu:
  Fix Released
Status in gtkpod package in Ubuntu:
  New
Status in glib2.0 source package in Mantic:
  Triaged
Status in gtkpod source package in Mantic:
  New
Status in glib2.0 source package in Noble:
  Fix Released



[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs

2024-02-21 Thread Matthew Ruffell
I have been running the test packages on AWS with the reproducer for
20 days now, and they are still running without failure. The change to
Direct I/O really does fix this issue, and my testing has removed any
concerns I had about causing a regression.

Previously, Focal wouldn't last more than 20 minutes before hitting the
failure, and Jammy onward would last about a week.

I will get these patches sponsored now. Sorry for the delay, Krister.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu.
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum
  mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  In Progress
Status in e2fsprogs source package in Trusty:
  Won't Fix
Status in e2fsprogs source package in Xenial:
  Won't Fix
Status in e2fsprogs source package in Bionic:
  Won't Fix
Status in e2fsprogs source package in Focal:
  In Progress
Status in e2fsprogs source package in Jammy:
  In Progress
Status in e2fsprogs source package in Lunar:
  Won't Fix
Status in e2fsprogs source package in Mantic:
  In Progress

Bug description:
  [Impact]

  This is a long running bug plaguing cloud-images, where on a rare
  occasion resize2fs would fail and the image would not resize to fit
  the entire disk.

  Online resizes would fail due to a superblock checksum mismatch, where
  the superblock in memory differs from what is currently on disk due to
  changes made to the image.

  $ resize2fs /dev/nvme1n1p1
  resize2fs 1.47.0 (5-Feb-2023)
  resize2fs: Superblock checksum does not match superblock while trying to open /dev/nvme1n1p1
  Couldn't find valid filesystem superblock.

  Changing the read of the superblock to Direct I/O solves the issue.
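
  To illustrate the two ideas above, here is a minimal Python sketch; it
  is not the resize2fs change itself, which lives in the C code of
  e2fsprogs. It assumes the usual ext4 on-disk layout (primary
  superblock at byte offset 1024, s_checksum in its last four bytes,
  computed as a CRC32C seeded with ~0 over the preceding bytes, and only
  meaningful when the metadata_csum feature is enabled). All function
  names are hypothetical. read_superblock_direct() bypasses the page
  cache with O_DIRECT, and superblock_checksum_ok() shows why a reader
  that observes a half-updated superblock reports a mismatch:

```python
import mmap
import os
import struct

SUPERBLOCK_OFFSET = 1024  # assumed: primary superblock starts 1 KiB into the device
SUPERBLOCK_SIZE = 1024
CHECKSUM_OFFSET = 0x3FC   # assumed: s_checksum is the last 4 bytes of the superblock


def _make_crc32c_table():
    """Build the lookup table for the reflected Castagnoli polynomial."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_crc32c_table()


def crc32c_raw(data, seed=0xFFFFFFFF):
    """CRC32C without the final bit inversion."""
    crc = seed
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc


def superblock_checksum_ok(sb):
    """Compare the stored checksum with one computed over the preceding bytes."""
    (stored,) = struct.unpack_from("<I", sb, CHECKSUM_OFFSET)
    return stored == crc32c_raw(sb[:CHECKSUM_OFFSET])


def read_superblock_direct(device):
    """Read the superblock with O_DIRECT so the data comes from the disk."""
    fd = os.open(device, os.O_RDONLY | os.O_DIRECT)
    try:
        # O_DIRECT needs an aligned buffer; anonymous mmap memory is page aligned.
        buf = mmap.mmap(-1, 4096)
        try:
            if os.preadv(fd, [buf], 0) < SUPERBLOCK_OFFSET + SUPERBLOCK_SIZE:
                raise OSError("short read from " + device)
            return bytes(buf[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + SUPERBLOCK_SIZE])
        finally:
            buf.close()
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Synthetic superblock: consistent first, then a field changes before
    # the checksum is rewritten, which is what a racing reader can observe.
    sb = bytearray(SUPERBLOCK_SIZE)
    struct.pack_into("<I", sb, CHECKSUM_OFFSET, crc32c_raw(sb[:CHECKSUM_OFFSET]))
    print(superblock_checksum_ok(sb))
    struct.pack_into("<I", sb, 0x04, 123456)
    print(superblock_checksum_ok(sb))
```

  The demo at the bottom only uses an in-memory buffer; calling
  read_superblock_direct() needs root and a real block device such as
  the /dev/nvme1n1p1 scratch partition used in this report.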

  [Testcase]

  Start a c5.large instance on AWS, and attach a 60 GB gp3 volume for
  use as a scratch disk.

  Run the following script, courtesy of Krister Johansen and his team:

     #!/usr/bin/bash
     set -euxo pipefail

     while true
     do
         parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
         sleep .5
         mkfs.ext4 /dev/nvme1n1p1
         mount -t ext4 /dev/nvme1n1p1 /mnt
         stress-ng --temp-path /mnt -D 4 &
         STRESS_PID=$!
         sleep 1
         growpart /dev/nvme1n1 1
         resize2fs /dev/nvme1n1p1
         kill $STRESS_PID
         wait $STRESS_PID
         umount /mnt
         wipefs -a /dev/nvme1n1p1
         wipefs -a /dev/nvme1n1
     done

  Test packages are available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test

  If you install the test packages, the race no longer occurs.

  [Where problems could occur]

  We are changing how resize2fs reads the superblock from underlying
  disks.

  If a regression were to occur, resize2fs could fail to resize offline
  or online volumes. As all cloud-images are online resized during their
  initial boot, this could have a large impact to public and private
  clouds should a regression occur.

  [Other info]

  Upstream mailing list discussion:
  https://lore.kernel.org/linux-ext4/20230605225221.ga5...@templeofstupid.com/
  https://lore.kernel.org/linux-ext4/20230609042239.ga1436...@mit.edu/

  This was fixed in the below commit upstream:

  commit 43a498e938887956f393b5e45ea6ac79cc5f4b84
  Author: Theodore Ts'o 
  Date: Thu, 15 Jun 2023 00:17:01 -0400
  Subject: resize2fs: use Direct I/O when reading the superblock for
   online resizes
  Link: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84

  The commit is not yet part of any tagged release. All supported
  Ubuntu releases require this fix, and it needs to be published in the
  standard non-ESM archives so that cloud images pick it up.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/2036467/+subscriptions




[Touch-packages] [Bug 2033892] Re: ls -l triggers mount of autofs shares when --ghost option is present or browse_mode is enabled

2024-02-12 Thread Matthew Ruffell
** Description changed:

- Release: 22.04.3 LTS
- coreutils 8.32-4.1ubuntu1
+ [Impact]
  
- ls triggers unwanted mounts of autofs filesystems
+ Issuing a 'ls -l' or a 'stat' on an autofs share when you have set
+ --ghost in the auto.master file, or browse_mode=yes in autofs.conf will
+ lead to the shares being mounted, when they didn't previously.
  
- cause: coreutils 8.32.4.1ubuntu1 uses statx which not pass the
- AT_NO_AUTOMOUNT flag
+ Disks / shares may not be present and the mounts may fail, leading to
+ errors in your output.
  
- This bug is also known (and fixed) at Redhat
- https://bugzilla.redhat.com/show_bug.cgi?id=2044981
+ This is a behaviour change in autofs 8.32, which occurred in the
+ transition to using statx() instead of stat()/lstat() in previous
+ releases.
  
- upstream commits:
- https://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=v9.0-177-g85c975df2c2
- https://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=v9.0-178-g92cb8427c53
+ There doesn't seem to be any workarounds, apart from not running a 'ls
+ -l' in your autofs share directory.
  
- fedora commit
- https://src.fedoraproject.org/rpms/coreutils/c/d736cafa20f13eeb037a3950bdbb4b63dc39b7e3?branch=f35
+ [Testcase]
+ 
+ Start two Jammy VMs. One will be a NFS server, the other, the client.
+ 
+ NFS server:
+ 
+ Server VM:
+ $ sudo hostnamectl set-hostname jammy-nfs-server
+ $ sudo apt update && sudo apt upgrade -y
+ $ sudo apt install nfs-kernel-server
+ $ sudo mkdir /export
+ $ sudo mkdir /export/users
+ $ sudo vi /etc/exports # add the following lines:
+ /export 192.168.122.0/24(rw,fsid=0,no_subtree_check,sync)
+ /export/users 192.168.122.0/24(rw,nohide,insecure,no_subtree_check,sync)
+ $ sudo systemctl restart nfs-server.service
+ 
+ AutoFS Client:
+ $ sudo apt update
+ $ sudo apt install autofs
+ $ sudo vim /etc/autofs.conf
+ browse_mode = yes
+ $ sudo mkdir /mnt2
+ $ sudo vim /etc/auto.master
+ /mnt2 /etc/auto.indirect
+ $ sudo vim /etc/auto.indirect
+ export 192.168.122.18:/export
+ export-missing 192.168.122.18:/export-missing
+ $ sudo reboot
+ $ cd /mnt2
+ $ ls -l
+ ls: cannot access 'export-missing': No such file or directory
+ total 4
+ drwxr-xr-x 3 root root 4096 Feb 12 21:48 export
+ d? ? ??   ?? export-missing
+ $ mount -l | grep /mnt2
+ /etc/auto.indirect on /mnt2 type autofs (rw,relatime,fd=6,pgrp=634,timeout=300,minproto=5,maxproto=5,indirect,pipe_ino=21561)
+ 192.168.122.18:/export on /mnt2/export type nfs (rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.122.18,mountvers=3,mountport=35786,mountproto=udp,local_lock=none,addr=192.168.122.18)
+ 
+ We see the mount for export occurred, and export-missing was attempted,
+ but it was either bogus or the disk was not present, leading to a "No
+ such file or directory" error.
+ 
+ There are test packages available in the following ppa:
+ 
+ https://launchpad.net/~mruffell/+archive/ubuntu/sf378489-test
+ 
+ If you install them, this is what should occur:
+ 
+ $ ls -l
+ total 0
+ drwxr-xr-x 2 root root 0 Feb 12 22:01 export
+ drwxr-xr-x 2 root root 0 Feb 12 22:01 export-missing
+ $ mount -l | grep /mnt2
+ /etc/auto.indirect on /mnt2 type autofs (rw,relatime,fd=6,pgrp=636,timeout=300,minproto=5,maxproto=5,indirect,pipe_ino=18346)
+ 
+ No mounts happen, and no errors either.
+ 
+ [Where problems could occur]
+ 
+ We are changing the behaviour of core utilities, ls and stat, such that
+ they no longer attempt to mount autofs shares when --ghost option is
+ present or browse_mode is enabled.
+ 
+ This is the intended behaviour in the first place, and has been this way
+ for at least a decade prior, and was changed to return to this behaviour
+ shortly after the release of coreutils that introduced statx() that
+ caused automounts to occur.
+ 
+ It is unlikely any system administrators are relying on the behaviour
+ found in jammy in any scripts or day to day operations that would be
+ adversely affected by the change. The worst case scenario is that a user
+ doing an 'ls -l' on an unmounted disk finds the mount doesn't
+ automatically occur, and they have to attach the disk and issue the
+ mount themselves.
+ 
+ If a regression were to occur, it would be limited to the ls and stat
+ commands, specifically when listing directories linked to autofs
+ mountpoints.
+ 
+ [Other info]
+ 
+ The automount behaviour change was introduced upstream in version 8.32,
+ with the introduction of the statx() call. This means only Jammy is
+ affected.
+ 
+ It was quickly reverted back to how it was originally in version 9.1,
+ which is already available in Mantic and onward.
+ 
+ The commits that solve the issue are:
+ 
+ commit 85c975df2c25bd799370b04bb294e568e001102f
+ From: Rohan Sable 
+ Date: Mon, 7 Mar 2022 14:14:13 +
+ Subject: ls: avoid triggering automounts
+ Link: 

[Touch-packages] [Bug 2033892] Re: ls -l triggers mount of autofs shares when --ghost option is present or browse_mode is enabled

2024-02-07 Thread Matthew Ruffell
Attached is a debdiff that solves this issue on Jammy.

** Patch added: "Debdiff for coreutils on Jammy"
   https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/2033892/+attachment/5745181/+files/lp2033892_jammy.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to coreutils in Ubuntu.
https://bugs.launchpad.net/bugs/2033892

Title:
  ls -l triggers mount of autofs shares when --ghost option is present
  or browse_mode is enabled

Status in coreutils package in Ubuntu:
  Fix Released
Status in coreutils source package in Jammy:
  In Progress
Status in coreutils package in Fedora:
  Fix Released

Bug description:
  Release: 22.04.3 LTS
  coreutils 8.32-4.1ubuntu1

  ls triggers unwanted mounts of autofs filesystems

  cause: coreutils 8.32-4.1ubuntu1 uses statx(), which does not pass the
  AT_NO_AUTOMOUNT flag

  This bug is also known (and fixed) at Red Hat
  https://bugzilla.redhat.com/show_bug.cgi?id=2044981
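
  To make the cause noted above concrete, here is a minimal Python
  sketch that calls statx() through ctypes with and without
  AT_NO_AUTOMOUNT. It is an illustration only, not the coreutils patch,
  which is C. It assumes Linux with a glibc that exports statx(), and
  the constants and struct offsets are copied from the Linux headers as
  I understand them. The helper name statx_mode is hypothetical:

```python
import ctypes
import os
import stat
import struct

# Assumed values from the Linux UAPI headers (<linux/fcntl.h>, <linux/stat.h>).
AT_FDCWD = -100
AT_SYMLINK_NOFOLLOW = 0x100
AT_NO_AUTOMOUNT = 0x800
STATX_BASIC_STATS = 0x7FF
STATX_STRUCT_SIZE = 256  # sizeof(struct statx)
STX_MODE_OFFSET = 28     # offsetof(struct statx, stx_mode), a 16-bit field


def statx_mode(path, automount=False):
    """Return st_mode for path via statx(), like lstat().

    With automount=False the AT_NO_AUTOMOUNT flag is passed, so looking at
    an autofs mountpoint should not trigger the mount.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    libc.statx.argtypes = [
        ctypes.c_int, ctypes.c_char_p, ctypes.c_int,
        ctypes.c_uint, ctypes.c_void_p,
    ]
    libc.statx.restype = ctypes.c_int

    flags = AT_SYMLINK_NOFOLLOW
    if not automount:
        flags |= AT_NO_AUTOMOUNT

    buf = ctypes.create_string_buffer(STATX_STRUCT_SIZE)
    if libc.statx(AT_FDCWD, os.fsencode(path), flags, STATX_BASIC_STATS, buf) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    return struct.unpack_from("=H", buf, STX_MODE_OFFSET)[0]


if __name__ == "__main__":
    mode = statx_mode("/")
    print(stat.filemode(mode))
```

  On a client configured as in this bug, calling the helper on
  /mnt2/export with automount=True would be expected to mount the share,
  while the default call should leave it unmounted. That expectation
  follows from the statx(2) flag description and has not been run as
  part of this report.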

  upstream commits:
  https://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=v9.0-177-g85c975df2c2
  https://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=v9.0-178-g92cb8427c53

  fedora commit
  https://src.fedoraproject.org/rpms/coreutils/c/d736cafa20f13eeb037a3950bdbb4b63dc39b7e3?branch=f35

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/2033892/+subscriptions




[Touch-packages] [Bug 2033892] Re: ls -l triggers mount of autofs shares when --ghost option is present or browse_mode is enabled

2024-02-07 Thread Matthew Ruffell
** Also affects: coreutils (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Changed in: coreutils (Ubuntu Jammy)
   Status: New => In Progress

** Changed in: coreutils (Ubuntu Jammy)
   Importance: Undecided => Medium

** Changed in: coreutils (Ubuntu Jammy)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to coreutils in Ubuntu.
https://bugs.launchpad.net/bugs/2033892

Title:
  ls -l triggers mount of autofs shares when --ghost option is present
  or browse_mode is enabled

Status in coreutils package in Ubuntu:
  Fix Released
Status in coreutils source package in Jammy:
  In Progress
Status in coreutils package in Fedora:
  Fix Released

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/2033892/+subscriptions




[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs

2024-02-04 Thread Matthew Ruffell
Hi Krister,

Fascinating. I'm in New Zealand, so I use ap-southeast-2 in Sydney,
Australia for all my instances, and it never occurred to me that
this could depend on how busy EBS is in the availability zone.

I'll move my instances to us-west-2.

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu.
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum
  mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  In Progress
Status in e2fsprogs source package in Trusty:
  Won't Fix
Status in e2fsprogs source package in Xenial:
  Won't Fix
Status in e2fsprogs source package in Bionic:
  Won't Fix
Status in e2fsprogs source package in Focal:
  In Progress
Status in e2fsprogs source package in Jammy:
  In Progress
Status in e2fsprogs source package in Lunar:
  Won't Fix
Status in e2fsprogs source package in Mantic:
  In Progress

Bug description:
  [Impact]

  This is a long-running bug plaguing cloud-images, where on rare
  occasions resize2fs fails and the image is not resized to fit
  the entire disk.

  Online resizes would fail due to a superblock checksum mismatch, where
  the superblock in memory differs from what is currently on disk due to
  changes made to the image.

  $ resize2fs /dev/nvme1n1p1
  resize2fs 1.47.0 (5-Feb-2023)
  resize2fs: Superblock checksum does not match superblock while trying to open /dev/nvme1n1p1
  Couldn't find valid filesystem superblock.

  Changing the read of the superblock to Direct I/O solves the issue.
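
  To illustrate what a Direct I/O read of the superblock looks like from
  the shell, here is a minimal sketch using dd with iflag=direct. It is
  not what resize2fs does internally and it does not verify the
  checksum; it only reads the ext4 magic number (0xEF53, stored
  little-endian at byte 1080: superblock offset 1024 plus s_magic at
  0x38) while bypassing the page cache. sb_magic is a hypothetical
  helper name.

```shell
#!/usr/bin/env bash
# sb_magic DEVICE [DD_FLAG]
# Print the two ext4 magic bytes found at byte offset 1080 of DEVICE.
# A valid ext4 filesystem prints "53 ef" (0xEF53, little-endian).
# DD_FLAG is optional, e.g. iflag=direct to bypass the page cache.
sb_magic() {
    local dev=$1 flag=${2:-}
    # Read one aligned 4 KiB block (O_DIRECT needs aligned I/O), then
    # let od skip to offset 1080 and dump two bytes as hex.
    dd if="$dev" bs=4096 count=1 ${flag:+"$flag"} 2>/dev/null |
        od -An -tx1 -j1080 -N2 | tr -s ' ' | sed 's/^ //'
}

# Usage (needs read access to the device):
#   sb_magic /dev/nvme1n1p1 iflag=direct
```

  Without the second argument the read goes through the page cache as
  usual, which makes it possible to compare the two paths on a busy
  filesystem.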

  [Testcase]

  Start a c5.large instance on AWS, and attach a 60 GB gp3 volume for
  use as a scratch disk.

  Run the following script, courtesy of Krister Johansen and his team:

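     #!/usr/bin/bash
     set -euxo pipefail

     # Repeatedly create, grow and online-resize an ext4 filesystem under
     # load. When resize2fs fails, set -e stops the loop.
     while true
     do
         parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
         sleep .5
         mkfs.ext4 /dev/nvme1n1p1
         mount -t ext4 /dev/nvme1n1p1 /mnt
         # Keep the filesystem busy in the background while it is resized.
         stress-ng --temp-path /mnt -D 4 &
         STRESS_PID=$!
         sleep 1
         growpart /dev/nvme1n1 1
         resize2fs /dev/nvme1n1p1
         kill "$STRESS_PID"
         wait "$STRESS_PID"
         umount /mnt
         wipefs -a /dev/nvme1n1p1
         wipefs -a /dev/nvme1n1
     done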

  Test packages are available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test

  If you install the test packages, the race no longer occurs.

  [Where problems could occur]

  We are changing how resize2fs reads the superblock from underlying
  disks.

  If a regression were to occur, resize2fs could fail to resize offline
  or online volumes. As all cloud-images are resized online during their
  initial boot, a regression could have a large impact on public and
  private clouds.

  [Other info]

  Upstream mailing list discussion:
  https://lore.kernel.org/linux-ext4/20230605225221.ga5...@templeofstupid.com/
  https://lore.kernel.org/linux-ext4/20230609042239.ga1436...@mit.edu/

  This was fixed in the below commit upstream:

  commit 43a498e938887956f393b5e45ea6ac79cc5f4b84
  Author: Theodore Ts'o 
  Date: Thu, 15 Jun 2023 00:17:01 -0400
  Subject: resize2fs: use Direct I/O when reading the superblock for online resizes
  Link: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84

  The commit has not been tagged to any release. All supported Ubuntu
  releases require this fix, and it needs to be published in the standard
  non-ESM archives to be picked up in cloud images.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/2036467/+subscriptions




[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs

2024-02-01 Thread Matthew Ruffell
Hi Krister,

I have finally seen this occur in real life with my own two eyes!

You are absolutely correct, the 4-retry doesn't seem to be sufficient
sometimes.

The reproducer works on Focal and earlier releases in about 20 minutes,
so it's easy to see the issue trigger there. But Focal and earlier
releases don't retry at all.

On Jammy, Mantic and Noble, it took about a week straight, but I managed
to get it to trigger for each of them.

Start

Tue Jan 16 01:57:20 UTC 2024
Tue Jan 16 02:18:53 UTC 2024

End

Tue Jan 23 20:12:28 UTC 2024
Tue Jan 23 14:32:08 UTC 2024
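
For reference, the time to failure can be worked out from such
timestamps with GNU date. This is a small sketch that assumes GNU
date's -d option is available and that the first start line pairs with
the first end line, which is not stated above; elapsed is a
hypothetical helper name.

```shell
#!/usr/bin/env bash
# elapsed START END
# Print the time between two timestamps in `date` output format as
# "<days>d <hours>h <minutes>m <seconds>s". Requires GNU date (-d).
elapsed() {
    local start end diff
    start=$(date -u -d "$1" +%s)
    end=$(date -u -d "$2" +%s)
    diff=$((end - start))
    printf '%dd %dh %dm %ds\n' \
        $((diff / 86400)) $((diff % 86400 / 3600)) \
        $((diff % 3600 / 60)) $((diff % 60))
}

elapsed "Tue Jan 16 01:57:20 UTC 2024" "Tue Jan 23 20:12:28 UTC 2024"
```

Under that pairing assumption this prints 7d 18h 15m 8s, which matches
the "about a week straight" figure.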

The 4-retry does help, and helps quite a lot really.
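
On systems that do not have the fixed packages yet, the same idea can
be approximated from the outside by retrying the whole command. The
sketch below is only a stopgap illustration, not how the retry inside
e2fsprogs is implemented; retry is a hypothetical helper, and the
attempt count and delay are arbitrary.

```shell
#!/usr/bin/env bash
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# Returns 0 on the first success, otherwise the last failing exit status.
retry() {
    local max=$1 delay=$2 attempt=1 status=0
    shift 2
    while true; do
        if "$@"; then
            return 0
        else
            status=$?
        fi
        # Give up once the configured number of attempts is used up.
        if [ "$attempt" -ge "$max" ]; then
            return "$status"
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Usage (values are illustrative only):
#   retry 8 1 resize2fs /dev/nvme1n1p1
```

This only papers over the race; reading the superblock with Direct I/O
is what removes it.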

Anyway, I upgraded my test environment to the test packages, and I will
leave them running for a week.

If things look good then, I'll get these patches sponsored for SRU.

Sorry for the delay, but I really wanted to see it fail on Jammy, Mantic
and Noble before we go patching them.

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu.
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum
  mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  In Progress
Status in e2fsprogs source package in Trusty:
  Won't Fix
Status in e2fsprogs source package in Xenial:
  Won't Fix
Status in e2fsprogs source package in Bionic:
  Won't Fix
Status in e2fsprogs source package in Focal:
  In Progress
Status in e2fsprogs source package in Jammy:
  In Progress
Status in e2fsprogs source package in Lunar:
  Won't Fix
Status in e2fsprogs source package in Mantic:
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/2036467/+subscriptions




[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs

2024-01-10 Thread Matthew Ruffell
Hi Krister,

I apologise for the delay. The main issue I have been having with
testing is that it reproduces significantly faster on some releases than
others, and I still haven't managed to reproduce once on some releases.
I'll set up some fresh reproducers now, and leave them running.

If you want to help test, there are test packages for all releases in:
https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test

Regardless, I'll try to move this forward.

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu.
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum
  mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  In Progress
Status in e2fsprogs source package in Trusty:
  Won't Fix
Status in e2fsprogs source package in Xenial:
  Won't Fix
Status in e2fsprogs source package in Bionic:
  Won't Fix
Status in e2fsprogs source package in Focal:
  In Progress
Status in e2fsprogs source package in Jammy:
  In Progress
Status in e2fsprogs source package in Lunar:
  In Progress
Status in e2fsprogs source package in Mantic:
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/2036467/+subscriptions




[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs

2024-01-10 Thread Matthew Ruffell
Attached is a patch for noble that solves this issue.

** Patch added: "Debdiff for e2fsprogs on noble"
   https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5738302/+files/lp2036467_noble.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu.
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum
  mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  In Progress
Status in e2fsprogs source package in Trusty:
  Won't Fix
Status in e2fsprogs source package in Xenial:
  Won't Fix
Status in e2fsprogs source package in Bionic:
  Won't Fix
Status in e2fsprogs source package in Focal:
  In Progress
Status in e2fsprogs source package in Jammy:
  In Progress
Status in e2fsprogs source package in Lunar:
  In Progress
Status in e2fsprogs source package in Mantic:
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/2036467/+subscriptions




[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs

2024-01-10 Thread Matthew Ruffell
Attached is a V2 patch for mantic with a different version number, due
to it no longer being the devel release.

** Patch removed: "Debdiff for e2fsprogs on mantic"
   https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707893/+files/lp2036467_mantic.debdiff

** Patch added: "Debdiff for e2fsprogs on mantic V2"
   https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5738301/+files/lp2036467_mantic_v2.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu.
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum
  mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  In Progress
Status in e2fsprogs source package in Trusty:
  Won't Fix
Status in e2fsprogs source package in Xenial:
  Won't Fix
Status in e2fsprogs source package in Bionic:
  Won't Fix
Status in e2fsprogs source package in Focal:
  In Progress
Status in e2fsprogs source package in Jammy:
  In Progress
Status in e2fsprogs source package in Lunar:
  In Progress
Status in e2fsprogs source package in Mantic:
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/2036467/+subscriptions




[Touch-packages] [Bug 2044420] Re: gtkpod segfaults when attempting to display songs

2023-11-23 Thread Matthew Ruffell
Upstream bug: https://gitlab.gnome.org/GNOME/glib/-/issues/3185

** Bug watch added: gitlab.gnome.org/GNOME/glib/-/issues #3185
   https://gitlab.gnome.org/GNOME/glib/-/issues/3185

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to glib2.0 in Ubuntu.
https://bugs.launchpad.net/bugs/2044420

Title:
  gtkpod segfaults when attempting to display songs

Status in glib2.0 package in Ubuntu:
  New
Status in gtkpod package in Ubuntu:
  New
Status in glib2.0 source package in Mantic:
  New
Status in gtkpod source package in Mantic:
  New
Status in glib2.0 source package in Noble:
  New
Status in gtkpod source package in Noble:
  New

Bug description:
  Open gtkpod, and select your iPod from the list. If it has more than
  one screenful of songs to display in the list, gtkpod will
  immediately segfault.

  I haven't found a workaround yet.

  Broken on Mantic, works on Lunar.

  Thread 1 "gtkpod" received signal SIGSEGV, Segmentation fault.
  __GI___wcsxfrm_l (dest=0x0, src=0x0, n=0, l=0x76fff5a0 <_nl_global_locale>) at ../string/strxfrm_l.c:685
  685   ../string/strxfrm_l.c: No such file or directory.
  (gdb) bt
  #0  __GI___wcsxfrm_l (dest=0x0, src=0x0, n=0, l=0x76fff5a0 <_nl_global_locale>) at ../string/strxfrm_l.c:685
  #1  0x770c5a5e in g_utf8_collate_key () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #2  0x77f852ec in fuzzy_skip_prefix () at /lib/x86_64-linux-gnu/libgtkpod.so.1
  #3  0x7fffa80980ca in ??? () at /usr/lib/x86_64-linux-gnu/gtkpod/libsorttab_display.so
  #4  0x7fffa80997fd in normal_sort_tab_page_add_track () at /usr/lib/x86_64-linux-gnu/gtkpod/libsorttab_display.so
  #5  0x7fffa8099526 in normal_sort_tab_page_add_track () at /usr/lib/x86_64-linux-gnu/gtkpod/libsorttab_display.so
  #6  0x7fffa809f196 in sorttab_display_select_playlist_cb () at /usr/lib/x86_64-linux-gnu/gtkpod/libsorttab_display.so
  #7  0x7718d130 in g_closure_invoke () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
  #8  0x771ba4ac in ??? () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
  #9  0x771ab9b1 in ??? () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
  #10 0x771abbd6 in g_signal_emit_valist () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
  #11 0x771abc93 in g_signal_emit () at /lib/x86_64-linux-gnu/libgobject-2.0.so.0
  #12 0x77f67e4b in gtkpod_set_current_playlist () at /lib/x86_64-linux-gnu/libgtkpod.so.1
  #13 0x7fffa807cce0 in ??? () at /usr/lib/x86_64-linux-gnu/gtkpod/libplaylist_display.so
  #14 0x7708ba11 in ??? () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #15 0x770e746f in ??? () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #16 0x7708c46f in g_main_loop_run () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #17 0x777f61ed in gtk_main () at /lib/x86_64-linux-gnu/libgtk-3.so.0
  #18 0xea1f in main ()

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/glib2.0/+bug/2044420/+subscriptions




[Touch-packages] [Bug 2044420] Re: gtkpod segfaults when attempting to display songs

2023-11-23 Thread Matthew Ruffell
** Also affects: glib2.0 (Ubuntu)
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to glib2.0 in Ubuntu.
https://bugs.launchpad.net/bugs/2044420

Title:
  gtkpod segfaults when attempting to display songs

Status in glib2.0 package in Ubuntu:
  New
Status in gtkpod package in Ubuntu:
  New
Status in glib2.0 source package in Mantic:
  New
Status in gtkpod source package in Mantic:
  New
Status in glib2.0 source package in Noble:
  New
Status in gtkpod source package in Noble:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/glib2.0/+bug/2044420/+subscriptions




[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs

2023-10-11 Thread Matthew Ruffell
** Description changed:

  [Impact]
  
  This is a long running bug plaguing cloud-images, where on a rare
  occasion resize2fs would fail and the image would not resize to fit the
  entire disk.
  
  Online resizes would fail due to a superblock checksum mismatch, where
  the superblock in memory differs from what is currently on disk due to
  changes made to the image.
  
+ $ resize2fs /dev/nvme1n1p1
+ resize2fs 1.47.0 (5-Feb-2023)
+ resize2fs: Superblock checksum does not match superblock while trying to open /dev/nvme1n1p1
+ Couldn't find valid filesystem superblock.
+ 
  Changing the read of the superblock to Direct I/O solves the issue.
  
  [Testcase]
  
  Start an c5.large instance on AWS, and attach a 60gb gp3 volume for use
  as a scratch disk.
  
  Run the following script, courtesy of Krister Johansen and his team:
  
-#!/usr/bin/bash
-set -euxo pipefail
+    #!/usr/bin/bash
+    set -euxo pipefail
  
-while true
-do
-parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
-sleep .5
-mkfs.ext4 /dev/nvme1n1p1
-mount -t ext4 /dev/nvme1n1p1 /mnt
-stress-ng --temp-path /mnt -D 4 &
-STRESS_PID=$!
-sleep 1
-growpart /dev/nvme1n1 1
-resize2fs /dev/nvme1n1p1
-kill $STRESS_PID
-wait $STRESS_PID
-umount /mnt
-wipefs -a /dev/nvme1n1p1
-wipefs -a /dev/nvme1n1
-done
+    while true
+    do
+    parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
+    sleep .5
+    mkfs.ext4 /dev/nvme1n1p1
+    mount -t ext4 /dev/nvme1n1p1 /mnt
+    stress-ng --temp-path /mnt -D 4 &
+    STRESS_PID=$!
+    sleep 1
+    growpart /dev/nvme1n1 1
+    resize2fs /dev/nvme1n1p1
+    kill $STRESS_PID
+    wait $STRESS_PID
+    umount /mnt
+    wipefs -a /dev/nvme1n1p1
+    wipefs -a /dev/nvme1n1
+    done
  
  Test packages are available in the following ppa:
  
  https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test
  
  If you install the test packages, the race no longer occurs.
  
  [Where problems could occur]
  
  We are changing how resize2fs reads the superblock from underlying
  disks.
  
  If a regression were to occur, resize2fs could fail to resize offline or
  online volumes. As all cloud-images are online resized during their
  initial boot, this could have a large impact to public and private
  clouds should a regression occur.
  
  [Other info]
  
- Upstream mailing list discussion: 
+ Upstream mailing list discussion:
  https://lore.kernel.org/linux-ext4/20230605225221.ga5...@templeofstupid.com/
  https://lore.kernel.org/linux-ext4/20230609042239.ga1436...@mit.edu/
  
  This was fixed in the below commit upstream:
  
  commit 43a498e938887956f393b5e45ea6ac79cc5f4b84
  Author: Theodore Ts'o 
  Date: Thu, 15 Jun 2023 00:17:01 -0400
  Subject: resize2fs: use Direct I/O when reading the superblock for
-  online resizes
+  online resizes
  Link: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84
  
  The commit has not been tagged to any release. All supported Ubuntu
  releases require this fix, and need to be published in standard non-ESM
  archives to be picked up in cloud images.

** Changed in: e2fsprogs (Ubuntu Bionic)
   Status: In Progress => Won't Fix

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu.
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum
  mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  In Progress
Status in e2fsprogs source package in Trusty:
  Won't Fix
Status in e2fsprogs source package in Xenial:
  Won't Fix
Status in e2fsprogs source package in Bionic:
  Won't Fix
Status in e2fsprogs source package in Focal:
  In Progress
Status in e2fsprogs source package in Jammy:
  In Progress
Status in e2fsprogs source package in Lunar:
  In Progress
Status in e2fsprogs source package in Mantic:
  In Progress

Bug description:
  [Impact]

  This is a long-running bug plaguing cloud-images: on rare occasions
  resize2fs fails and the image is not resized to fit the entire disk.

  Online resizes would fail due to a superblock checksum mismatch, where
  the superblock in memory differs from what is currently on disk due to
  changes made to the image.

  $ resize2fs /dev/nvme1n1p1
  resize2fs 1.47.0 (5-Feb-2023)
  resize2fs: Superblock checksum does not match superblock while trying to open /dev/nvme1n1p1
  Couldn't find valid filesystem superblock.
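
  When collecting logs from provisioning runs, it can help to tell this
  specific failure apart from other resize2fs errors. The following is
  a minimal sketch, assuming the error text is exactly as quoted above;
  is_sb_checksum_mismatch is a hypothetical helper name, not part of
  e2fsprogs or cloud-init.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when the given text contains the
# superblock checksum mismatch message quoted in this bug report.
is_sb_checksum_mismatch() {
    grep -qF 'Superblock checksum does not match superblock' <<<"$1"
}

# Usage sketch (needs root and a real device, so it is left commented out):
# out=$(resize2fs /dev/nvme1n1p1 2>&1) || {
#     is_sb_checksum_mismatch "$out" && echo "hit the superblock checksum race"
# }
```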

  Changing the read of the superblock to Direct I/O solves the issue.

  [Testcase]

  Start a c5.large instance on AWS, and attach a 60gb gp3 volume for
  use as a 

[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs

2023-10-09 Thread Matthew Ruffell
@juliank I'm just doing a little bit more testing for the moment, as I
really want to make sure this isn't going to cause any issues in the
cloud images. It would be nice to have this bug fixed, though; I have
seen a few cases related to it over the years.

I'll ask my SEG colleagues for help with sponsoring in a day or two.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu.
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum
  mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  In Progress
Status in e2fsprogs source package in Trusty:
  Won't Fix
Status in e2fsprogs source package in Xenial:
  Won't Fix
Status in e2fsprogs source package in Bionic:
  In Progress
Status in e2fsprogs source package in Focal:
  In Progress
Status in e2fsprogs source package in Jammy:
  In Progress
Status in e2fsprogs source package in Lunar:
  In Progress
Status in e2fsprogs source package in Mantic:
  In Progress

Bug description:
  [Impact]

  This is a long-running bug plaguing cloud-images: on rare occasions
  resize2fs fails and the image is not resized to fit the entire disk.

  Online resizes would fail due to a superblock checksum mismatch, where
  the superblock in memory differs from what is currently on disk due to
  changes made to the image.

  Changing the read of the superblock to Direct I/O solves the issue.
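
  To illustrate the difference between a buffered read and a Direct I/O
  read of the superblock, here is a small sketch using GNU dd. It is
  not how the upstream commit works (that change is inside resize2fs
  itself); it only shows the two kinds of read side by side.
  read_sb_magic is a hypothetical helper name. The offsets are the
  standard ext4 ones: the primary superblock starts at byte 1024, and
  the two-byte magic 0xEF53 is stored little-endian at offset 56
  within it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the two ext4 magic bytes (expected "53ef")
# from a block device or image file.
#   $1 = device or image path
#   $2 = optional GNU dd input flag, e.g. "direct" to open with O_DIRECT
read_sb_magic() {
    local dev=$1 flag=${2:-}
    local -a opts=()
    [ -n "$flag" ] && opts+=("iflag=$flag")
    # Read the first 4096 bytes in a single request so that the offset and
    # length stay aligned for O_DIRECT, then slice out the magic at
    # byte 1024 + 56 = 1080.
    dd if="$dev" bs=4096 count=1 "${opts[@]}" 2>/dev/null |
        od -A n -t x1 -j 1080 -N 2 | tr -d ' \n'
}
```

  On a real ext4 partition, run as root, "read_sb_magic /dev/nvme1n1p1"
  performs a buffered read through the page cache, while
  "read_sb_magic /dev/nvme1n1p1 direct" bypasses it.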

  [Testcase]

  Start a c5.large instance on AWS, and attach a 60gb gp3 volume for
  use as a scratch disk.

  Run the following script, courtesy of Krister Johansen and his team:

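      #!/usr/bin/bash
      set -euxo pipefail

      while true
      do
          # Create a fresh GPT label with a single partition of about 1 GiB.
          parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
          sleep .5
          mkfs.ext4 /dev/nvme1n1p1
          mount -t ext4 /dev/nvme1n1p1 /mnt
          # Generate filesystem activity in the background during the resize.
          stress-ng --temp-path /mnt -D 4 &
          STRESS_PID=$!
          sleep 1
          # Grow the partition, then resize the mounted filesystem online.
          growpart /dev/nvme1n1 1
          resize2fs /dev/nvme1n1p1
          kill "$STRESS_PID"
          wait "$STRESS_PID"
          umount /mnt
          # Wipe signatures so the next iteration starts from a clean disk.
          wipefs -a /dev/nvme1n1p1
          wipefs -a /dev/nvme1n1
      done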

  Test packages are available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test

  If you install the test packages, the race no longer occurs.
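
  One way to check that claim is to bound the reproducer and count
  failures, rather than looping until the first error. The following is
  a minimal sketch; count_failures and one_iteration.sh are
  hypothetical names, with one_iteration.sh assumed to contain a
  single pass of the loop body from the script above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command N times and print how many runs failed.
#   $1   = number of iterations
#   $2.. = command and its arguments
count_failures() {
    local n=$1 failures=0 i
    shift
    for ((i = 0; i < n; i++)); do
        # Discard output; only the exit status of each run is counted.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
    done
    echo "$failures"
}

# Usage sketch (needs root and the scratch disk, so it is left commented out):
# count_failures 100 ./one_iteration.sh
```

  Comparing the count before and after installing the test packages
  gives a rough failure rate for each build.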

  [Where problems could occur]

  We are changing how resize2fs reads the superblock from underlying
  disks.

  If a regression were to occur, resize2fs could fail to resize offline
  or online volumes. As all cloud-images are resized online during their
  initial boot, such a regression could have a large impact on public
  and private clouds.

  [Other info]

  Upstream mailing list discussion: 
  https://lore.kernel.org/linux-ext4/20230605225221.ga5...@templeofstupid.com/
  https://lore.kernel.org/linux-ext4/20230609042239.ga1436...@mit.edu/

  This was fixed in the below commit upstream:

  commit 43a498e938887956f393b5e45ea6ac79cc5f4b84
  Author: Theodore Ts'o 
  Date: Thu, 15 Jun 2023 00:17:01 -0400
  Subject: resize2fs: use Direct I/O when reading the superblock for
   online resizes
  Link: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84

  The commit has not been tagged to any release. All supported Ubuntu
  releases require this fix, which needs to be published in the standard
  non-ESM archives to be picked up in cloud images.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/2036467/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs

2023-10-08 Thread Matthew Ruffell
** Summary changed:

- superblock checksum mismatch in resize2fs
+ Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs

** Description changed:

- Hi,
- We run ext4 on EBS volumes on EC2.  During provisioning, cloud-init will occasionally report that resize2fs has failed due to a superblock checksum mismatch.  We debugged this internally, and were able to come up with the following reproducer:
+ [Impact]
+ 
+ This is a long running bug plaguing cloud-images, where on a rare
+ occasion resize2fs would fail and the image would not resize to fit the
+ entire disk.
+ 
+ Online resizes would fail due to a superblock checksum mismatch, where
+ the superblock in memory differs from what is currently on disk due to
+ changes made to the image.
+ 
+ Changing the read of the superblock to Direct I/O solves the issue.
+ 
+ [Testcase]
+ 
+ Start an c5.large instance on AWS, and attach a 60gb gp3 volume for use
+ as a scratch disk.
+ 
+ Run the following script, courtesy of Krister Johansen and his team:
  
 #!/usr/bin/bash
 set -euxo pipefail
  
 while true
 do
 parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
 sleep .5
 mkfs.ext4 /dev/nvme1n1p1
 mount -t ext4 /dev/nvme1n1p1 /mnt
 stress-ng --temp-path /mnt -D 4 &
 STRESS_PID=$!
 sleep 1
 growpart /dev/nvme1n1 1
 resize2fs /dev/nvme1n1p1
 kill $STRESS_PID
 wait $STRESS_PID
 umount /mnt
 wipefs -a /dev/nvme1n1p1
 wipefs -a /dev/nvme1n1
 done
  
- (This was on a 60gb gp3 volume attached to a c5.4xlarge)
+ Test packages are available in the following ppa:
  
- We were able to find a fix that works and get the patch accepted
- upstream.  The short explanation is that by switching the superblock
- read to direct io, we no longer see the problem.
+ https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test
  
- The patch is available here, but hasn't been published in a released
- version of e2fsprogs:
+ If you install the test packages, the race no longer occurs.
  
- https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84
+ [Where problems could occur]
  
- A longer thread with the maintainer is available here:
+ We are changing how resize2fs reads the superblock from underlying
+ disks.
  
+ If a regression were to occur, resize2fs could fail to resize offline or
+ online volumes. As all cloud-images are online resized during their
+ initial boot, this could have a large impact to public and private
+ clouds should a regression occur.
+ 
+ [Other info]
+ 
+ Upstream mailing list discussion: 
+ https://lore.kernel.org/linux-ext4/20230605225221.ga5...@templeofstupid.com/
  https://lore.kernel.org/linux-ext4/20230609042239.ga1436...@mit.edu/
  
- This bug report is to request that Ubuntu backport this patch to the
- versions of e2fsprogs that are in releases that are available in images
- on AWS, preferably Focal and Jammy.
+ This was fixed in the below commit upstream:
+ 
+ commit 43a498e938887956f393b5e45ea6ac79cc5f4b84
+ Author: Theodore Ts'o 
+ Date: Thu, 15 Jun 2023 00:17:01 -0400
+ Subject: resize2fs: use Direct I/O when reading the superblock for
+  online resizes
+ Link: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84
+ 
+ The commit has not been tagged to any release. All supported Ubuntu
+ releases require this fix, and need to be published in standard non-ESM
+ archives to be picked up in cloud images.

** Tags added: sts

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu.
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum
  mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  In Progress
Status in e2fsprogs source package in Trusty:
  In Progress
Status in e2fsprogs source package in Xenial:
  In Progress
Status in e2fsprogs source package in Bionic:
  In Progress
Status in e2fsprogs source package in Focal:
  In Progress
Status in e2fsprogs source package in Jammy:
  In Progress
Status in e2fsprogs source package in Lunar:
  In Progress
Status in e2fsprogs source package in Mantic:
  In Progress

Bug description:
  [Impact]

  This is a long-running bug plaguing cloud-images: on rare occasions
  resize2fs fails and the image is not resized to fit the entire disk.

  Online resizes would fail due to a superblock checksum mismatch, where
  the superblock in memory differs from what is currently on disk due to
  changes made to the image.

  Changing the read of the superblock to Direct I/O solves the issue.

  [Testcase]

  Start a c5.large instance on AWS, and attach a 60gb gp3 volume for
  use as a 

[Touch-packages] [Bug 2036467] Re: superblock checksum mismatch in resize2fs

2023-10-08 Thread Matthew Ruffell
Attached is a debdiff for e2fsprogs on trusty which fixes this issue.

** Patch added: "Debdiff for e2fsprogs on trusty"
   https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707900/+files/lp2036467_trusty.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu.
https://bugs.launchpad.net/bugs/2036467

Title:
  superblock checksum mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  In Progress
Status in e2fsprogs source package in Trusty:
  In Progress
Status in e2fsprogs source package in Xenial:
  In Progress
Status in e2fsprogs source package in Bionic:
  In Progress
Status in e2fsprogs source package in Focal:
  In Progress
Status in e2fsprogs source package in Jammy:
  In Progress
Status in e2fsprogs source package in Lunar:
  In Progress
Status in e2fsprogs source package in Mantic:
  In Progress

Bug description:
  Hi,
  We run ext4 on EBS volumes on EC2.  During provisioning, cloud-init will
  occasionally report that resize2fs has failed due to a superblock checksum
  mismatch.  We debugged this internally, and were able to come up with the
  following reproducer:

 #!/usr/bin/bash
 set -euxo pipefail

 while true
 do
 parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
 sleep .5
 mkfs.ext4 /dev/nvme1n1p1
 mount -t ext4 /dev/nvme1n1p1 /mnt
 stress-ng --temp-path /mnt -D 4 &
 STRESS_PID=$!
 sleep 1
 growpart /dev/nvme1n1 1
 resize2fs /dev/nvme1n1p1
 kill $STRESS_PID
 wait $STRESS_PID
 umount /mnt
 wipefs -a /dev/nvme1n1p1
 wipefs -a /dev/nvme1n1
 done

  (This was on a 60gb gp3 volume attached to a c5.4xlarge)

  We were able to find a fix that works and get the patch accepted
  upstream.  The short explanation is that by switching the superblock
  read to direct io, we no longer see the problem.

  The patch is available here, but hasn't been published in a released
  version of e2fsprogs:

  https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84

  A longer thread with the maintainer is available here:

  https://lore.kernel.org/linux-ext4/20230609042239.ga1436...@mit.edu/

  This bug report is to request that Ubuntu backport this patch to the
  versions of e2fsprogs that are in releases that are available in
  images on AWS, preferably Focal and Jammy.



[Touch-packages] [Bug 2036467] Re: superblock checksum mismatch in resize2fs

2023-10-08 Thread Matthew Ruffell
Attached is a debdiff for e2fsprogs on xenial which fixes this issue.

** Patch added: "Debdiff for e2fsprogs on xenial"
   https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707899/+files/lp2036467_xenial.debdiff



[Touch-packages] [Bug 2036467] Re: superblock checksum mismatch in resize2fs

2023-10-08 Thread Matthew Ruffell
Attached is a debdiff for e2fsprogs on bionic which fixes this issue.

** Patch added: "Debdiff for e2fsprogs on bionic"
   https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707898/+files/lp2036467_bionic.debdiff



[Touch-packages] [Bug 2036467] Re: superblock checksum mismatch in resize2fs

2023-10-08 Thread Matthew Ruffell
Attached is a debdiff for e2fsprogs on focal which fixes this issue.

** Patch added: "Debdiff for e2fsprogs on focal"
   https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707896/+files/lp2036467_focal.debdiff



[Touch-packages] [Bug 2036467] Re: superblock checksum mismatch in resize2fs

2023-10-08 Thread Matthew Ruffell
Attached is a debdiff for e2fsprogs on jammy which fixes this issue.

** Patch added: "Debdiff for e2fsprogs on jammy"
   https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707895/+files/lp2036467_jammy.debdiff



[Touch-packages] [Bug 2036467] Re: superblock checksum mismatch in resize2fs

2023-10-08 Thread Matthew Ruffell
Attached is a debdiff for e2fsprogs on lunar which fixes this issue.

** Patch added: "Debdiff for e2fsprogs on lunar"
   https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707894/+files/lp2036467_lunar.debdiff



[Touch-packages] [Bug 2036467] Re: superblock checksum mismatch in resize2fs

2023-10-08 Thread Matthew Ruffell
Attached is a debdiff for e2fsprogs on mantic which fixes this issue.

** Patch added: "Debdiff for e2fsprogs on mantic"
   https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707893/+files/lp2036467_mantic.debdiff



[Touch-packages] [Bug 2036467] Re: superblock checksum mismatch in resize2fs

2023-10-08 Thread Matthew Ruffell
** Also affects: e2fsprogs (Ubuntu Lunar)
   Importance: Undecided
   Status: New

** Also affects: e2fsprogs (Ubuntu Trusty)
   Importance: Undecided
   Status: New

** Also affects: e2fsprogs (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Also affects: e2fsprogs (Ubuntu Xenial)
   Importance: Undecided
   Status: New

** Changed in: e2fsprogs (Ubuntu Mantic)
   Status: Confirmed => In Progress

** Changed in: e2fsprogs (Ubuntu Lunar)
   Status: New => In Progress

** Changed in: e2fsprogs (Ubuntu Jammy)
   Status: New => In Progress

** Changed in: e2fsprogs (Ubuntu Focal)
   Status: New => In Progress

** Changed in: e2fsprogs (Ubuntu Bionic)
   Status: New => In Progress

** Changed in: e2fsprogs (Ubuntu Xenial)
   Status: New => In Progress

** Changed in: e2fsprogs (Ubuntu Trusty)
   Status: New => In Progress

** Changed in: e2fsprogs (Ubuntu Mantic)
   Importance: Undecided => Critical

** Changed in: e2fsprogs (Ubuntu Lunar)
   Importance: Undecided => Critical

** Changed in: e2fsprogs (Ubuntu Jammy)
   Importance: Undecided => Critical

** Changed in: e2fsprogs (Ubuntu Focal)
   Importance: Undecided => Critical

** Changed in: e2fsprogs (Ubuntu Bionic)
   Importance: Undecided => Critical

** Changed in: e2fsprogs (Ubuntu Xenial)
   Importance: Undecided => Critical

** Changed in: e2fsprogs (Ubuntu Trusty)
   Importance: Undecided => Critical

** Changed in: e2fsprogs (Ubuntu Mantic)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: e2fsprogs (Ubuntu Lunar)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: e2fsprogs (Ubuntu Jammy)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: e2fsprogs (Ubuntu Focal)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: e2fsprogs (Ubuntu Bionic)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: e2fsprogs (Ubuntu Xenial)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: e2fsprogs (Ubuntu Trusty)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu.
https://bugs.launchpad.net/bugs/2036467

Title:
  superblock checksum mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  In Progress
Status in e2fsprogs source package in Trusty:
  In Progress
Status in e2fsprogs source package in Xenial:
  In Progress
Status in e2fsprogs source package in Bionic:
  In Progress
Status in e2fsprogs source package in Focal:
  In Progress
Status in e2fsprogs source package in Jammy:
  In Progress
Status in e2fsprogs source package in Lunar:
  In Progress
Status in e2fsprogs source package in Mantic:
  In Progress

Bug description:
  Hi,
  We run ext4 on EBS volumes on EC2.  During provisioning, cloud-init
  will occasionally report that resize2fs has failed due to a superblock
  checksum mismatch.  We debugged this internally, and were able to come
  up with the following reproducer:

 #!/usr/bin/bash
 set -euxo pipefail

 while true
 do
     parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
     sleep .5
     mkfs.ext4 /dev/nvme1n1p1
     mount -t ext4 /dev/nvme1n1p1 /mnt
     stress-ng --temp-path /mnt -D 4 &
     STRESS_PID=$!
     sleep 1
     growpart /dev/nvme1n1 1
     resize2fs /dev/nvme1n1p1
     kill $STRESS_PID
     wait $STRESS_PID
     umount /mnt
     wipefs -a /dev/nvme1n1p1
     wipefs -a /dev/nvme1n1
 done

  (This was on a 60gb gp3 volume attached to a c5.4xlarge)

  We were able to find a fix that works and get the patch accepted
  upstream.  The short explanation is that by switching the superblock
  read to direct io, we no longer see the problem.

  The patch is available here, but hasn't been published in a released
  version of e2fsprogs:

  https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84

  A longer thread with the maintainer is available here:

  https://lore.kernel.org/linux-ext4/20230609042239.ga1436...@mit.edu/

  This bug report is to request that Ubuntu backport this patch to the
  versions of e2fsprogs that are in releases that are available in
  images on AWS, preferably Focal and Jammy.
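
  As a rough illustration of the direct I/O superblock read described
  above, the following Python sketch reads the primary superblock of a
  device or image file with O_DIRECT, so the bytes come from the device
  rather than from the page cache. It is not the e2fsprogs patch: the
  function names, the 4096-byte alignment and the buffered fallback are
  assumptions made for this example.

```python
import mmap
import os

SUPERBLOCK_OFFSET = 1024  # the ext2/3/4 primary superblock starts 1024 bytes in
SUPERBLOCK_SIZE = 1024
ALIGN = 4096              # assumed alignment; real code should query the device


def _read_head(path, flags):
    """Read the first ALIGN bytes of path into a page-aligned buffer."""
    fd = os.open(path, flags)
    try:
        buf = mmap.mmap(-1, ALIGN)  # anonymous mappings are page-aligned
        try:
            n = os.preadv(fd, [buf], 0)
            return bytes(buf[:n])
        finally:
            buf.close()
    finally:
        os.close(fd)


def read_superblock(path):
    """Return the raw superblock bytes, bypassing the page cache if possible."""
    direct = getattr(os, "O_DIRECT", 0)  # not defined on every platform
    try:
        head = _read_head(path, os.O_RDONLY | direct)
    except OSError:
        # Some filesystems reject O_DIRECT; fall back to a buffered read.
        head = _read_head(path, os.O_RDONLY)
    return head[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + SUPERBLOCK_SIZE]
```

  Calling read_superblock("/dev/nvme1n1p1") would need root. The same
  function works on a plain filesystem image for experimentation.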

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/2036467/+subscriptions




[Touch-packages] [Bug 1999104] Re: arm64: broken c++ exception handler support leads to std::terminate() being called and program abort

2023-03-21 Thread Matthew Ruffell
Hi William.

The libunwind SRUs for Bionic and Focal have now been released to
-updates. The versions are 1.2.1-8ubuntu0.1 for Bionic, and
1.2.1-9ubuntu0.1 for Focal.

I just want to apologise for the significant delay in getting libunwind
released. It really was an exceptional amount of time, and I'm sorry it
took so long.

Since I wrote to you last, I root caused the issue and worked with
Paride to resolve the regression that was introduced into autopkgtest
itself.

The bug in autopkgtest was quite obscure, and it required the following
to occur:

1. an all-proposed build (--apt-pocket=proposed with no package pinning)
2. multiple tests defined in d/t/control
3. the tests do not allow reusing the same testbed system

All these conditions were present in the kernel autopkgtests. As a
result, the change to allow apt pinning for -proposed caused
_create_apt_pinning_for_packages() to be called incorrectly, and it set
a pin priority of 990 for the -release pocket, above -updates and
-proposed at 500 each. That meant -release was being favoured over
-proposed, which caused all sorts of apt resolver issues.
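
To make the effect of those priorities concrete, here is a small Python
model of how apt picks a candidate: the highest pin priority wins, and
the newest version only breaks ties. The pocket labels and the 990/500
values come from the description above. The version tuples and the
"all pockets at 500" comparison case are hypothetical, and this is a
simplification rather than apt's real resolver.

```python
# Pin priorities as described in the report.
BUGGY_PINS = {"release": 990, "updates": 500, "proposed": 500}
# Hypothetical comparison case: every pocket pinned equally.
EQUAL_PINS = {"release": 500, "updates": 500, "proposed": 500}


def pick_candidate(available, pins):
    """Return the pocket whose version apt would prefer.

    available maps pocket name -> version tuple. The highest pin priority
    wins; among equal priorities the newest version wins.
    """
    return max(available, key=lambda pocket: (pins[pocket], available[pocket]))


# Hypothetical versions: -proposed carries the newer upload.
available = {"release": (1, 0), "proposed": (1, 1)}

print(pick_candidate(available, BUGGY_PINS))  # release
print(pick_candidate(available, EQUAL_PINS))  # proposed
```

With -release pinned at 990, the older version always wins, so test
dependencies that need the -proposed version cannot be satisfied.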

The issue was introduced in:

commit 1c018c78de9d9421c0c358c900a37e545334cc66
From: Paride Legovini 
Date: Thu, 15 Dec 2022 21:47:02 +0100
Subject: Pin pockets with Pin-Priority: 500
Link: 
https://salsa.debian.org/ci-team/autopkgtest/-/commit/1c018c78de9d9421c0c358c900a37e545334cc66

The full explanation of the autopkgtest issues can be found in the below
emails:

From myself to Paride:
https://paste.ubuntu.com/p/44yFTBNBHh/

From Paride to myself:
https://paste.ubuntu.com/p/jtt5wh6BB2/

Paride's merge request:
https://salsa.debian.org/ci-team/autopkgtest/-/merge_requests/218

Final fix commit:
https://salsa.debian.org/ci-team/autopkgtest/-/commit/94b9bb8db3051123d7b29a7880420340a76c7b7e

The fix is in place on the Launchpad build infrastructure, and we re-ran
all autopkgtests around libunwind and its reverse dependencies. They all
passed, leaving us clear to release libunwind to -updates.

Again, I sincerely apologise for keeping you waiting for so long, and I
thank you for your patience and understanding while I debugged
autopkgtest.

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to libunwind in Ubuntu.
https://bugs.launchpad.net/bugs/1999104

Title:
  arm64: broken c++ exception handler support leads to std::terminate()
  being called and program abort

Status in libunwind package in Ubuntu:
  Fix Released
Status in libunwind source package in Bionic:
  Fix Released
Status in libunwind source package in Focal:
  Fix Released

Bug description:
  [Impact]

  On architectures other than i386 and amd64, the C++ exception support
  in libunwind appears to be broken, always failing and calling
  std::terminate() which leads to the program aborting.

  (gdb) bt
  #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
  #1  0xf7c2daac in __GI_abort () at abort.c:79
  #2  0xf7e21868 in __gnu_cxx::__verbose_terminate_handler() ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #3  0xf7e1f21c in ?? () from /lib/aarch64-linux-gnu/libstdc++.so.6
  #4  0xf7e1f280 in std::terminate() ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #5  0xf7e1f5e0 in __cxa_rethrow ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #6  0xf7e21804 in __gnu_cxx::__verbose_terminate_handler() ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #7  0xf7e1f21c in ?? () from /lib/aarch64-linux-gnu/libstdc++.so.6
  #8  0xf7e1f280 in std::terminate() ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #9  0xf7e1f574 in __cxa_throw ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #10 0xf7fb9f50 in function_throws_int () at lib.cpp:9
  #11 0x0d54 in main (argc=1, argv=0xfab8) at main.cpp:9

  Compiling libunwind with --enable-cxx-exceptions enabled leads to
  _Unwind_RaiseException being called during __cxa_throw(), which fails
  to find a handler, and the generic std::terminate() is called instead,
  aborting the program.

  On i386 and amd64 this doesn't seem to be the case, and the libunwind
  handlers seem to be present.

  To fix, we enable the configure option --enable-cxx-exceptions only on
  i386 and amd64, in debian/rules. This lets other architectures fall
  back to the symbols provided by libgcc_s, whose implementation works
  correctly.
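
  The decision made in debian/rules can be modelled in a few lines. The
  real change is a makefile conditional; this Python sketch only mirrors
  the logic, and the function and constant names are made up for
  illustration.

```python
# Architectures that keep libunwind's C++ exception support, per the report.
CXX_EXCEPTION_ARCHES = {"i386", "amd64"}


def configure_flags(arch):
    """Return the extra configure flags for a Debian architecture name."""
    flags = []
    if arch in CXX_EXCEPTION_ARCHES:
        flags.append("--enable-cxx-exceptions")
    # Every other architecture gets no flag, so the unwinder symbols
    # come from libgcc_s instead.
    return flags


print(configure_flags("amd64"))  # ['--enable-cxx-exceptions']
print(configure_flags("arm64"))  # []
```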

  [Testcase]

  Ali Sadi has provided a reproducer program.

  Start an arm64 instance, for example, a c6g.medium instance on AWS,
  with either Bionic or Focal.

  $ wget https://bugs.launchpad.net/ubuntu/+source/libunwind/+bug/1999104/+attachment/5635122/+files/libunwind.tar.gz
  $ sudo apt install -y build-essential libunwind-dev
  $ tar xvf libunwind.tar.gz && cd test
  $ make all

  There are two executables, main and main_unwind. main is not linked to
  

[Touch-packages] [Bug 1999104] Re: arm64: broken c++ exception handler support leads to std::terminate() being called and program abort

2023-02-17 Thread Matthew Ruffell
Hi William,

I sincerely apologise for the delay.

Currently libunwind is stuck in -proposed due to benign autopkgtest
regressions in the kernel packages.

If you go to the below page:

https://people.canonical.com/~ubuntu-archive/pending-sru.html

And search for "libunwind" you will see entries for Bionic and Focal.

It is SRU policy to not release a package with current autopkgtest
regressions.

Now, I have spent more time than I am willing to admit on trying to
debug these failures, and I have also asked the Kernel Team, several of
whom took a look, and some Launchpad admins, and we are still a bit
stuck. The problem does not reproduce locally, only on Launchpad
builders.

For example, take the 4.15 Bionic Kernel:

https://autopkgtest.ubuntu.com/packages/l/linux/bionic/amd64

(it is a reverse dependency of libunwind, which is why it is selected
for autopkgtest)

https://autopkgtest.ubuntu.com/results/autopkgtest-bionic/bionic/amd64/l/linux/20230110_115614_09e98@/log.gz

It rebuilds fine, but then runs into apt resolver trouble when running
the kernel testsuite.

autopkgtest makes a dummy package, that contains the list of necessary
dependencies to run the testsuite, dpkg -i to install the package, and
then does an apt install -f to force dependency resolution. The dummy
package is called autopkgtest-satdep.

https://paste.ubuntu.com/p/Cszfkvy47Z/

But it fails in strange ways, like not being able to select
build-essential, even though it is already installed in the builder.

I am still trying to debug the root cause behind these autopkgtest
regressions, which is why things have been delayed.

There is a provision in SRUs where they can be released as long as I can
prove that the upload did not cause the regression:

https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

In which case, I may as well invoke this clause, since I don't wish to
keep you waiting any longer.

I will try and get this package released within the week.

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to libunwind in Ubuntu.
https://bugs.launchpad.net/bugs/1999104

Title:
  arm64: broken c++ exception handler support leads to std::terminate()
  being called and program abort

Status in libunwind package in Ubuntu:
  Fix Released
Status in libunwind source package in Bionic:
  Fix Committed
Status in libunwind source package in Focal:
  Fix Committed

Bug description:
  [Impact]

  On architectures other than i386 and amd64, the C++ exception support
  in libunwind appears to be broken, always failing and calling
  std::terminate() which leads to the program aborting.

  (gdb) bt
  #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
  #1  0xf7c2daac in __GI_abort () at abort.c:79
  #2  0xf7e21868 in __gnu_cxx::__verbose_terminate_handler() ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #3  0xf7e1f21c in ?? () from /lib/aarch64-linux-gnu/libstdc++.so.6
  #4  0xf7e1f280 in std::terminate() ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #5  0xf7e1f5e0 in __cxa_rethrow ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #6  0xf7e21804 in __gnu_cxx::__verbose_terminate_handler() ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #7  0xf7e1f21c in ?? () from /lib/aarch64-linux-gnu/libstdc++.so.6
  #8  0xf7e1f280 in std::terminate() ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #9  0xf7e1f574 in __cxa_throw ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #10 0xf7fb9f50 in function_throws_int () at lib.cpp:9
  #11 0x0d54 in main (argc=1, argv=0xfab8) at main.cpp:9

  Compiling libunwind with --enable-cxx-exceptions enabled leads to
  _Unwind_RaiseException being called during __cxa_throw(), which fails
  to find a handler, and the generic std::terminate() is called instead,
  aborting the program.

  On i386 and amd64 this doesn't seem to be the case, and the libunwind
  handlers seem to be present.

  To fix, we enable the configure option --enable-cxx-exceptions only on
  i386 and amd64, in debian/rules. This lets other architectures fall
  back to the symbols provided by libgcc_s, whose implementation works
  correctly.

  [Testcase]

  Ali Sadi has provided a reproducer program.

  Start an arm64 instance, for example, a c6g.medium instance on AWS,
  with either Bionic or Focal.

  $ wget https://bugs.launchpad.net/ubuntu/+source/libunwind/+bug/1999104/+attachment/5635122/+files/libunwind.tar.gz
  $ sudo apt install -y build-essential libunwind-dev
  $ tar xvf libunwind.tar.gz && cd test
  $ make all

  There are two executables, main and main_unwind. main is not linked to
  libunwind, and main_unwind is linked to libunwind.

  $ ./main
  int throws lib
  int caught main
  $ ./main_unwind
  terminate called after throwing an instance of 'int'
  terminate called recursively
  Aborted 

[Touch-packages] [Bug 1926139] Re: dhclient: thread concurrency race leads to DHCPOFFER packets not being received

2023-01-24 Thread Matthew Ruffell
read
+ or add any extra thread synchronisation primitives.
+ 
+ This has been tested with 13k VM deployments on Microsoft Azure, and has
+ been found to work as expected with no failures, meaning the risk of
+ additional race conditions we are not aware of is low.
+ 
+ The reason why this patch was not forwarded upstream, is that isc-dhcp
+ is now officially End Of Life, and has effectively been abandoned by
+ upstream. You can read about it in these notices:
+ 
+ https://lists.isc.org/pipermail/dhcp-users/2022-October/022786.html
+ https://www.isc.org/blogs/isc-dhcp-eol/
+ 
+ Upstream won't fix any more bugs, make any new releases, or even accept
+ any new commits. They are putting their efforts into isc-kea now.

** No longer affects: bind9-libs (Ubuntu Focal)

** No longer affects: bind9-libs (Ubuntu Jammy)

** Changed in: bind9-libs (Ubuntu)
   Status: Fix Released => Won't Fix

** Also affects: isc-dhcp (Ubuntu Focal)
   Importance: Undecided
   Status: New

** Also affects: bind9-libs (Ubuntu Focal)
   Importance: Undecided
   Status: New

** Also affects: isc-dhcp (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Also affects: bind9-libs (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** No longer affects: bind9-libs (Ubuntu Focal)

** No longer affects: bind9-libs (Ubuntu Jammy)

** Changed in: isc-dhcp (Ubuntu Focal)
   Status: New => In Progress

** Changed in: isc-dhcp (Ubuntu Jammy)
   Status: New => In Progress

** Changed in: isc-dhcp (Ubuntu Focal)
   Importance: Undecided => High

** Changed in: isc-dhcp (Ubuntu Jammy)
   Importance: Undecided => High

** Changed in: isc-dhcp (Ubuntu Focal)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: isc-dhcp (Ubuntu Jammy)
 Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo)

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to isc-dhcp in Ubuntu.
https://bugs.launchpad.net/bugs/1926139

Title:
  dhclient: thread concurrency race leads to DHCPOFFER packets not being
  received

Status in bind9-libs package in Ubuntu:
  Won't Fix
Status in isc-dhcp package in Ubuntu:
  Invalid
Status in isc-dhcp source package in Focal:
  In Progress
Status in isc-dhcp source package in Jammy:
  In Progress

Bug description:
  [Impact]

  Occasionally, during instance boot or machine start-up, dhclient will
  attempt to acquire a dhcp lease and fail, leaving the instance with no
  IP address and making it unreachable.

  This happens about once every 100 reboots on bare metal, or Chris
  Patterson in comment #2 describes it as affecting between ~0.3% to 2%
  of deployments on Microsoft Azure. Azure uses dhclient called from
  cloud-init instead of systemd-networkd, and this is causing issues
  with larger deployments.

  The logs of an affected dhclient produce the following:

  Listening on LPF/enp1s0/52:54:00:1c:d7:00
  Sending on   LPF/enp1s0/52:54:00:1c:d7:00
  Sending on   Socket/fallback
  DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 3 (xid=0xd222950f)
  DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 5 (xid=0xd222950f)
  ...
  (omitting 20 similar lines)
  ...
  DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 13 (xid=0xd222950f)
  DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 8 (xid=0xd222950f)
  DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 6 (xid=0xd222950f)
  No DHCPOFFERS received.
  No working leases in persistent database - sleeping.

  Full log: https://paste.ubuntu.com/p/8yBfw2KR5h/
  Log of a working run: https://paste.ubuntu.com/p/N3ZgqrxyQD/

  The bizarre thing is that when you tcpdump dhclient, you see all
  DHCPDISCOVER packets being replied to with DHCPOFFER packets, but the
  got_one() callback is never called, dhclient does not read these
  DHCPOFFER packets, and it continues sending DHCPDISCOVER packets. Once
  it reaches 25 DHCPDISCOVER packets sent, it gives up.

  tcpdump:
  Screenshot of Wireshark:

  This behaviour led several bug reporters to believe it was a kernel
  issue, with the kernel not pushing DHCPOFFER packets to dhclient. This
  is not the case, the actual problem is dhclient containing a thread
  concurrency race condition, and when the race occurs, the read socket
  is closed prematurely, and dhclient does not read any of the DHCPOFFER
  replies.

  The full explanation is in the "Other Info" section, but the fix is to
  add a mutex that restricts access to the global linked list of open
  sockets, and ensures that a newly created socket is added to this
  list, before the socketmanager callback has an opportunity to walk
  this list when there is data immediately able to be read.

  Mauricio has provided such a patch, and includes options to disable
  this behaviour during runtime to minimise regression risk.
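
  The locking idea can be sketched in Python as follows. This is a toy
  model of the approach described above, not Mauricio's patch:
  SocketRegistry, register() and on_readable() are hypothetical names,
  and a thread stands in for the socket manager firing its callback as
  soon as data is readable.

```python
import threading


class SocketRegistry:
    """Toy stand-in for the global linked list of open sockets."""

    def __init__(self):
        self._lock = threading.Lock()
        self._open = []

    def register(self, sock, start_watching):
        # Hold the lock across both steps, so a callback cannot walk the
        # list between "the manager knows the socket" and "the list has it".
        with self._lock:
            start_watching(sock)
            self._open.append(sock)

    def on_readable(self, sock):
        # Runs on the socket manager thread when data is ready to read.
        with self._lock:
            return sock in self._open


def demo():
    registry = SocketRegistry()
    seen = []
    workers = []

    def start_watching(sock):
        # Simulate data being readable immediately: fire the callback now.
        t = threading.Thread(
            target=lambda: seen.append(registry.on_readable(sock)))
        t.start()
        workers.append(t)

    registry.register("fd-7", start_watching)
    for t in workers:
        t.join()
    return seen
```

  Because the callback has to wait for the lock, demo() always finds the
  socket in the list. Without the lock, the callback could run before
  the append and miss the socket, which is the failure mode described in
  this report.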

  [Testcase]

  Reproducer based on GDB and DHCP noise injection.

  It uses 3 veth pairs (DHCP server/cl

[Touch-packages] [Bug 1926139] Re: dhclient: thread concurrency race leads to DHCPOFFER packets not being received

2023-01-16 Thread Matthew Ruffell
** Tags added: sts-sponsor

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to isc-dhcp in Ubuntu.
https://bugs.launchpad.net/bugs/1926139

Title:
  dhclient: thread concurrency race leads to DHCPOFFER packets not being
  received

Status in bind9-libs package in Ubuntu:
  Fix Released
Status in isc-dhcp package in Ubuntu:
  Invalid
Status in bind9-libs source package in Focal:
  In Progress
Status in bind9-libs source package in Jammy:
  In Progress

Bug description:
  [Impact]

  Occasionally, during instance boot or machine start-up, dhclient will
  attempt to acquire a dhcp lease and fail, leaving the instance with no
  IP address and making it unreachable.

  This happens about once every 100 reboots on bare metal, or Chris
  Patterson in comment #2 describes it as affecting between ~0.3% to 2%
  of deployments on Microsoft Azure. Azure uses dhclient called from
  cloud-init instead of systemd-networkd, and this is causing issues
  with larger deployments.

  The logs of an affected dhclient produce the following:

  Listening on LPF/enp1s0/52:54:00:1c:d7:00
  Sending on   LPF/enp1s0/52:54:00:1c:d7:00
  Sending on   Socket/fallback
  DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 3 (xid=0xd222950f)
  DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 5 (xid=0xd222950f)
  ...
  (omitting 20 similar lines)
  ...
  DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 13 (xid=0xd222950f)
  DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 8 (xid=0xd222950f)
  DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 6 (xid=0xd222950f)
  No DHCPOFFERS received.
  No working leases in persistent database - sleeping.

  Full log: https://paste.ubuntu.com/p/8yBfw2KR5h/
  Log of a working run: https://paste.ubuntu.com/p/N3ZgqrxyQD/

  The bizarre thing is that when you tcpdump dhclient, you see all
  DHCPDISCOVER packets being replied to with DHCPOFFER packets, but the
  got_one() callback is never called, dhclient does not read these
  DHCPOFFER packets, and it continues sending DHCPDISCOVER packets. Once
  it reaches 25 DHCPDISCOVER packets sent, it gives up.

  tcpdump: 
https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5641810/+files/test.pcap
  Screenshot of Wireshark: 
https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5641811/+files/Screenshot_2023-01-17-16-14-21_1920x1200%250A1920x1080%250A1920x1080.png

  This behaviour led several bug reporters to believe it was a kernel
  issue, with the kernel not pushing DHCPOFFER packets to dhclient. This
  is not the case, the actual problem is dhclient containing a thread
  concurrency race condition, and when the race occurs, the read socket
  is closed prematurely, and dhclient does not read any of the DHCPOFFER
  replies.

  The full explanation is in the "Other Info" section, but the fix for
  this is to change bind9-libs from being built multithreaded, back to
  single threaded as intended by dhclient maintainers.

  In Focal and Jammy, isc-dhcp links against bind9 libraries provided in
  bind9-libs, while in Kinetic onward isc-dhcp has an in-tree bind9
  library it uses, which is already configured properly to --disable-
  threads.

  Change the Focal and Jammy bind9-libs to --disable-threads and update
  symbol files to reflect the library is single threaded again.

  [Testcase]

  Start a fresh Focal or Jammy instance.

  Download and set executable test-parallel.sh, and edit some lines:

  1) wget https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5593045/+files/test-parallel.sh
  2) chmod +x test-parallel.sh
  3) vim test-parallel.sh

  Change iface="enp5s0" to your interface, likely iface="enp1s0".
  Comment out the line "#   cp bionic-dhclient $workdir/dhclient".

  4) sudo ./test-parallel.sh

  After five minutes, if the issue reproduces, you will see
  "TEST FAILED".

  You can watch the output with:

  5) cat /tmp/dhclient-* | less

  Next, for instrumented runs, you need to build dhclient from source.

  1) sudo apt install build-essential devscripts
  2) apt source isc-dhcp
  3) sudo apt build-dep isc-dhcp
  4) cd isc-dhcp

  Apply the below patch:

  https://paste.ubuntu.com/p/hGsssrVyG4/

  5) patch -p1 < ~/patch.patch
  6) debuild -b -uc -us
  7) cd ..
  8) sudo dpkg -i isc-dhcp-client-*
  9) sudo ./test-parallel.sh
  10) cat /tmp/dhclient-* | less

  Look for the race, as described in "Other Info", namely:

  mruffell: registering with socket manager
  mruffell: callback called
  mruffell: omapi object is NULL
  mruffell: omapi object is NULL
  mruffell: Adding obj to linked list
  mruffell: Obj added to list

  The issue has reproduced.
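
  Scanning many instrumented logs by eye is tedious, so the check above
  can be automated with a short Python sketch. The marker strings are
  the ones printed by the debug patch as shown above. The function name
  is hypothetical, as is the assumption that a healthy run prints
  "Obj added to list" before any "callback called".

```python
REGISTER = "registering with socket manager"
CALLBACK = "callback called"
ADDED = "Obj added to list"


def race_reproduced(lines):
    """Return True if a callback fired between registration and list insertion."""
    pending = False  # registered with the manager, not yet in the list
    for line in lines:
        if REGISTER in line:
            pending = True
        elif ADDED in line:
            pending = False
        elif CALLBACK in line and pending:
            return True
    return False


sample = [
    "mruffell: registering with socket manager",
    "mruffell: callback called",
    "mruffell: omapi object is NULL",
    "mruffell: omapi object is NULL",
    "mruffell: Adding obj to linked list",
    "mruffell: Obj added to list",
]
print(race_reproduced(sample))  # True
```

  To run it over the captured output, pass the lines of each
  /tmp/dhclient-* file to race_reproduced().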

  If you install the test package from the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf337873-test

  Instructions to install (on a Focal or Jammy system):
  1) sudo add-apt-repository 

[Touch-packages] [Bug 1926139] Re: dhclient: thread concurrency race leads to DHCPOFFER packets not being received

2023-01-16 Thread Matthew Ruffell
Screenshot of wireshark.

** Attachment added: "Screenshot of wireshark"
   
https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5641811/+files/Screenshot_2023-01-17-16-14-21_1920x1200%250A1920x1080%250A1920x1080.png

** Description changed:

  [Impact]
  
  Occasionally, during instance boot or machine start-up, dhclient will
  attempt to acquire a dhcp lease and fail, leaving the instance with no
  IP address and making it unreachable.
  
  This happens about once every 100 reboots on bare metal, or Chris
  Patterson in comment #2 describes it as affecting between ~0.3% to 2% of
  deployments on Microsoft Azure. Azure uses dhclient called from cloud-
  init instead of systemd-networkd, and this is causing issues with larger
  deployments.
  
  The logs of an affected dhclient produce the following:
  
  Listening on LPF/enp1s0/52:54:00:1c:d7:00
  Sending on   LPF/enp1s0/52:54:00:1c:d7:00
  Sending on   Socket/fallback
  DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 3 (xid=0xd222950f)
  DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 5 (xid=0xd222950f)
  ...
  (omitting 20 similar lines)
  ...
  DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 13 (xid=0xd222950f)
  DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 8 (xid=0xd222950f)
  DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 6 (xid=0xd222950f)
  No DHCPOFFERS received.
  No working leases in persistent database - sleeping.
  
  Full log: https://paste.ubuntu.com/p/8yBfw2KR5h/
  Log of a working run: https://paste.ubuntu.com/p/N3ZgqrxyQD/
  
  The bizarre thing is that when you tcpdump dhclient, you see all
  DHCPDISCOVER packets being replied to with DHCPOFFER packets, but the
  got_one() callback is never called, dhclient does not read these
  DHCPOFFER packets, and it continues sending DHCPDISCOVER packets. Once
  it reaches 25 DHCPDISCOVER packets sent, it gives up.
  
- tcpdump:
- Screenshot of Wireshark:
+ tcpdump: 
https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5641810/+files/test.pcap
+ Screenshot of Wireshark: 
https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5641811/+files/Screenshot_2023-01-17-16-14-21_1920x1200%250A1920x1080%250A1920x1080.png
  
  This behaviour led several bug reporters to believe it was a kernel
  issue, with the kernel not pushing DHCPOFFER packets to dhclient. This
  is not the case, the actual problem is dhclient containing a thread
  concurrency race condition, and when the race occurs, the read socket is
  closed prematurely, and dhclient does not read any of the DHCPOFFER
  replies.
  
  The full explanation is in the "Other Info" section, but the fix for
  this is to change bind9-libs from being built multithreaded, back to
  single threaded as intended by dhclient maintainers.
  
  In Focal and Jammy, isc-dhcp links against bind9 libraries provided in
  bind9-libs, while in Kinetic onward isc-dhcp has an in-tree bind9
  library it uses, which is already configured properly to --disable-
  threads.
  
  Change the Focal and Jammy bind9-libs to --disable-threads and update
  symbol files to reflect the library is single threaded again.
  
  [Testcase]
  
  Start a fresh Focal or Jammy instance.
  
  Download and set executable test-parallel.sh, and edit some lines:
  
  1) wget 
https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5593045/+files/test-parallel.sh
  2) chmod +x test-parallel.sh
  3) vim test-parallel.sh
  
  Change iface="enp5s0" to your interface, likely iface="enp1s0".
  Comment out the line "#   cp bionic-dhclient $workdir/dhclient".
  
  4) sudo ./test-parallel.sh
  
  After five minutes, if the issue reproduces, you will see "TEST FAILED".
  
  You can watch the output with:
  
  5) cat /tmp/dhclient-* | less
  
  Next, for instrumented runs, you need to build dhclient from source.
  
  1) sudo apt install build-essential devscripts
  2) apt source isc-dhcp
  3) sudo apt build-dep isc-dhcp
  4) cd isc-dhcp
  
  Apply the below patch:
  
  https://paste.ubuntu.com/p/hGsssrVyG4/
  
  5) patch -p1 < ~/patch.patch
  6) debuild -b -uc -us
  7) cd ..
  8) sudo dpkg -i isc-dhcp-client-*
  9) sudo ./test-parallel.sh
  10) cat /tmp/dhclient-* | less
  
  Look for the race, as described in "Other Info", namely:
  
  mruffell: registering with socket manager
  mruffell: callback called
  mruffell: omapi object is NULL
  mruffell: omapi object is NULL
  mruffell: Adding obj to linked list
  mruffell: Obj added to list
  
  The issue has reproduced.
  
  If you install the test package from the following ppa:
  
  https://launchpad.net/~mruffell/+archive/ubuntu/sf337873-test
  
  Instructions to install (on a Focal or Jammy system):
  1) sudo add-apt-repository ppa:mruffell/sf337873-test
  2) sudo apt update
  3) sudo apt install libdns-export1109 libisc-export1105
  4) sudo apt-cache policy libisc-export1105 | grep Installed
  Installed: 

[Touch-packages] [Bug 1926139] Re: dhclient: thread concurrency race leads to DHCPOFFER packets not being received

2023-01-16 Thread Matthew Ruffell
packet capture from a reproduction run

** Description changed:

- Platform: Qemu/libvirt on AMD64
- Ubuntu version: 20.04
- isc-dhcp-client version: 4.4.1-2.1ubuntu5
- Problem: When dhclient is used during boot every few reboots the DHCP OFFER packets aren't pushed from the kernel to dhclient. The DISCOVER packets can be seen in strace and tcpdump. The OFFER packets can be seen in tcpdump, but no read event is triggered.
- Ubuntu 18.04 doesn't have the problem, neither does Debian 10. Building these dhclient versions on Ubuntu 20.04 alleviates the problem a little, but it still occurs. So this issue might also be kernel related.
- 
- Attached diff shows a strace of all threads and a pcap showing the
- tcpdump output.
- 
- Edit:
- - Sometimes the dhclient command does receive the OFFER packet and connection is restored.
- - In my testing running dhclient manually from the terminal when the OFFERs aren't received will result in a new dhclient session which does receive the OFFER packet and connection is restored.
+ [Impact]
+ 
+ Occasionally, during instance boot or machine start-up, dhclient will
+ attempt to acquire a dhcp lease and fail, leaving the instance with no
+ IP address and making it unreachable.
+ 
+ This happens about once every 100 reboots on bare metal, or Chris
+ Patterson in comment #2 describes it as affecting between ~0.3% to 2% of
+ deployments on Microsoft Azure. Azure uses dhclient called from cloud-
+ init instead of systemd-networkd, and this is causing issues with larger
+ deployments.
+ 
+ The logs of an affected dhclient produce the following:
+ 
+ Listening on LPF/enp1s0/52:54:00:1c:d7:00
+ Sending on   LPF/enp1s0/52:54:00:1c:d7:00
+ Sending on   Socket/fallback
+ DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 3 (xid=0xd222950f)
+ DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 5 (xid=0xd222950f)
+ ...
+ (omitting 20 similar lines)
+ ...
+ DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 13 (xid=0xd222950f)
+ DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 8 (xid=0xd222950f)
+ DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 6 (xid=0xd222950f)
+ No DHCPOFFERS received.
+ No working leases in persistent database - sleeping.
+ 
+ Full log: https://paste.ubuntu.com/p/8yBfw2KR5h/
+ Log of a working run: https://paste.ubuntu.com/p/N3ZgqrxyQD/
+ 
+ The bizarre thing is that when you tcpdump dhclient, you see all
+ DHCPDISCOVER packets being replied to with DHCPOFFER packets, but the
+ got_one() callback is never called, dhclient does not read these
+ DHCPOFFER packets, and it continues sending DHCPDISCOVER packets. Once
+ it reaches 25 DHCPDISCOVER packets sent, it gives up.
+ 
+ tcpdump:
+ Screenshot of Wireshark:
+ 
+ This behaviour led several bug reporters to believe it was a kernel
+ issue, with the kernel not pushing DHCPOFFER packets to dhclient. This
+ is not the case. The actual problem is a thread concurrency race
+ condition in dhclient: when the race occurs, the read socket is closed
+ prematurely, and dhclient does not read any of the DHCPOFFER replies.
+ 
+ The full explanation is in the "Other Info" section, but the fix for
+ this is to change bind9-libs from being built multithreaded, back to
+ single threaded as intended by dhclient maintainers.
+ 
+ In Focal and Jammy, isc-dhcp links against bind9 libraries provided in
+ bind9-libs, while in Kinetic onward isc-dhcp has an in-tree bind9
+ library it uses, which is already configured properly with
+ --disable-threads.
+ 
+ Change the Focal and Jammy bind9-libs to --disable-threads and update
+ symbol files to reflect the library is single threaded again.
+ 
+ [Testcase]
+ 
+ Start a fresh Focal or Jammy instance.
+ 
+ Download and set executable test-parallel.sh, and edit some lines:
+ 
+ 1) wget https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5593045/+files/test-parallel.sh
+ 2) chmod +x test-parallel.sh
+ 3) vim test-parallel.sh
+ 
+ Change iface="enp5s0" to your interface, likely iface="enp1s0".
+ Comment out the line "#   cp bionic-dhclient $workdir/dhclient".
+ 
+ 4) sudo ./test-parallel.sh
+ 
+ After five minutes, if the issue reproduces, you will see "TEST FAILED".
+ 
+ You can watch the output with:
+ 
+ 5) cat /tmp/dhclient-* | less
+ 
+ Next, for instrumented runs, you need to build dhclient from source.
+ 
+ 1) sudo apt install build-essential devscripts
+ 2) apt source isc-dhcp
+ 3) sudo apt build-dep isc-dhcp
+ 4) cd isc-dhcp
+ 
+ Apply the below patch:
+ 
+ https://paste.ubuntu.com/p/hGsssrVyG4/
+ 
+ 5) patch -p1 < ~/patch.patch
+ 6) debuild -b -uc -us
+ 7) cd ..
+ 8) sudo dpkg -i isc-dhcp-client-*
+ 9) sudo ./test-parallel.sh
+ 10) cat /tmp/dhclient-* | less
+ 
+ Look for the race, as described in "Other Info", namely:
+ 
+ mruffell: registering with socket manager
+ mruffell: callback called
+ mruffell: omapi object is NULL
+ mruffell: omapi object is NULL
+ mruffell: Adding obj to linked list
+ mruffell: 

[Touch-packages] [Bug 1926139] Re: dhclient: thread concurrency race leads to DHCPOFFER packets not being received

2023-01-15 Thread Matthew Ruffell
** Summary changed:

- dhclient doesn't receive dhcp offer from kernel
+ dhclient: thread concurrency race leads to DHCPOFFER packets not being 
received

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to isc-dhcp in Ubuntu.
https://bugs.launchpad.net/bugs/1926139

Title:
  dhclient: thread concurrency race leads to DHCPOFFER packets not being
  received

Status in bind9-libs package in Ubuntu:
  Fix Released
Status in isc-dhcp package in Ubuntu:
  Invalid
Status in bind9-libs source package in Focal:
  In Progress
Status in bind9-libs source package in Jammy:
  In Progress

Bug description:
  Platform: Qemu/libvirt on AMD64
  Ubuntu version: 20.04
  isc-dhcp-client version: 4.4.1-2.1ubuntu5
  Problem: When dhclient is used during boot every few reboots the DHCP OFFER 
packets aren't pushed from the kernel to dhclient. The DISCOVER packets can be 
seen in strace and tcpdump. The OFFER packets can be seen in tcpdump, but no 
read event is triggered.
  Ubuntu 18.04 doesn't have the problem, neither does Debian 10. Building these 
dhclient versions on Ubuntu 20.04 alleviates the problem a little, but it still 
occurs. So this issue might also be kernel related.

  Attached diff shows a strace of all threads and a pcap showing the
  tcpdump output.

  Edit:
  - Sometimes the dhclient command does receive the OFFER packet and connection 
is restored.
  - In my testing running dhclient manually from the terminal when the OFFERs 
aren't received will result in a new dhclient session which does receive the 
OFFER packet and connection is restored.
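
In a failing run, the dhclient log quoted earlier in this thread shows repeated DHCPDISCOVER lines followed by "No DHCPOFFERS received.". A minimal sketch for spotting that pattern in saved logs could look like the following. classify_dhclient_log is a hypothetical helper name, and the match strings are taken from the log excerpt in this thread rather than from any documented dhclient interface:

```shell
#!/bin/sh
# Sketch only: classify a saved dhclient log as failed or OK.
# classify_dhclient_log is a hypothetical helper, not part of isc-dhcp.
classify_dhclient_log() {
    log=$1
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
    discovers=$(grep -c 'DHCPDISCOVER on ' "$log" || true)
    if grep -q 'No DHCPOFFERS received' "$log"; then
        echo "FAILED after $discovers DHCPDISCOVER packets"
        return 1
    fi
    echo "OK after $discovers DHCPDISCOVER packets"
}

# Possible usage against the logs written by test-parallel.sh:
#   for f in /tmp/dhclient-*; do classify_dhclient_log "$f"; done
```

This only inspects the log text; it cannot tell whether OFFER packets were visible on the wire, which still needs tcpdump.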

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/bind9-libs/+bug/1926139/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


[Touch-packages] [Bug 1926139] Re: dhclient doesn't receive dhcp offer from kernel

2023-01-15 Thread Matthew Ruffell
Attached is a debdiff for Jammy which fixes this bug.

** Patch added: "Debdiff for bind9-libs for Jammy"
   
https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5641516/+files/lp1926139_jammy.debdiff

** Tags added: focal jammy sts

** Also affects: bind9-libs (Ubuntu)
   Importance: Undecided
   Status: New

** Also affects: isc-dhcp (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Also affects: bind9-libs (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Also affects: isc-dhcp (Ubuntu Focal)
   Importance: Undecided
   Status: New

** Also affects: bind9-libs (Ubuntu Focal)
   Importance: Undecided
   Status: New

** No longer affects: isc-dhcp (Ubuntu Focal)

** No longer affects: isc-dhcp (Ubuntu Jammy)

** Changed in: isc-dhcp (Ubuntu)
   Status: New => Invalid

** Changed in: bind9-libs (Ubuntu Focal)
   Status: New => In Progress

** Changed in: bind9-libs (Ubuntu Jammy)
   Status: New => In Progress

** Changed in: bind9-libs (Ubuntu)
   Status: New => Fix Released

** Changed in: bind9-libs (Ubuntu Focal)
   Importance: Undecided => High

** Changed in: bind9-libs (Ubuntu Jammy)
   Importance: Undecided => High

** Changed in: bind9-libs (Ubuntu Focal)
     Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: bind9-libs (Ubuntu Jammy)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to isc-dhcp in Ubuntu.
https://bugs.launchpad.net/bugs/1926139

Title:
  dhclient doesn't receive dhcp offer from kernel

Status in bind9-libs package in Ubuntu:
  Fix Released
Status in isc-dhcp package in Ubuntu:
  Invalid
Status in bind9-libs source package in Focal:
  In Progress
Status in bind9-libs source package in Jammy:
  In Progress

Bug description:
  Platform: Qemu/libvirt on AMD64
  Ubuntu version: 20.04
  isc-dhcp-client version: 4.4.1-2.1ubuntu5
  Problem: When dhclient is used during boot every few reboots the DHCP OFFER 
packets aren't pushed from the kernel to dhclient. The DISCOVER packets can be 
seen in strace and tcpdump. The OFFER packets can be seen in tcpdump, but no 
read event is triggered.
  Ubuntu 18.04 doesn't have the problem, neither does Debian 10. Building these 
dhclient versions on Ubuntu 20.04 alleviates the problem a little, but it still 
occurs. So this issue might also be kernel related.

  Attached diff shows a strace of all threads and a pcap showing the
  tcpdump output.

  Edit:
  - Sometimes the dhclient command does receive the OFFER packet and connection 
is restored.
  - In my testing running dhclient manually from the terminal when the OFFERs 
aren't received will result in a new dhclient session which does receive the 
OFFER packet and connection is restored.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/bind9-libs/+bug/1926139/+subscriptions




[Touch-packages] [Bug 1926139] Re: dhclient doesn't receive dhcp offer from kernel

2023-01-15 Thread Matthew Ruffell
Attached is a debdiff for Focal which fixes this bug.

** Patch added: "Debdiff for bind9-libs for Focal"
   
https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1926139/+attachment/5641515/+files/lp1926139_focal.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to isc-dhcp in Ubuntu.
https://bugs.launchpad.net/bugs/1926139

Title:
  dhclient doesn't receive dhcp offer from kernel

Status in bind9-libs package in Ubuntu:
  Fix Released
Status in isc-dhcp package in Ubuntu:
  Invalid
Status in bind9-libs source package in Focal:
  In Progress
Status in bind9-libs source package in Jammy:
  In Progress

Bug description:
  Platform: Qemu/libvirt on AMD64
  Ubuntu version: 20.04
  isc-dhcp-client version: 4.4.1-2.1ubuntu5
  Problem: When dhclient is used during boot every few reboots the DHCP OFFER 
packets aren't pushed from the kernel to dhclient. The DISCOVER packets can be 
seen in strace and tcpdump. The OFFER packets can be seen in tcpdump, but no 
read event is triggered.
  Ubuntu 18.04 doesn't have the problem, neither does Debian 10. Building these 
dhclient versions on Ubuntu 20.04 alleviates the problem a little, but it still 
occurs. So this issue might also be kernel related.

  Attached diff shows a strace of all threads and a pcap showing the
  tcpdump output.

  Edit:
  - Sometimes the dhclient command does receive the OFFER packet and connection 
is restored.
  - In my testing running dhclient manually from the terminal when the OFFERs 
aren't received will result in a new dhclient session which does receive the 
OFFER packet and connection is restored.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/bind9-libs/+bug/1926139/+subscriptions




[Touch-packages] [Bug 1999104] Re: arm64: broken c++ exception handler support leads to std::terminate() being called and program abort

2023-01-04 Thread Matthew Ruffell
Performing verification for Bionic.

I started two instances on AWS, one c6g.medium (arm64) and a t2.micro
(amd64).

I went through the reproducer listed in the testcase with libunwind-dev
1.2.1-8 from -release.

$ wget https://bugs.launchpad.net/ubuntu/+source/libunwind/+bug/1999104/+attachment/5635122/+files/libunwind.tar.gz
$ sudo apt install -y build-essential libunwind-dev
$ tar xvf libunwind.tar.gz && cd test
test/
test/lib.hpp
test/main.cpp
test/lib.cpp
test/Makefile
~/test$ make all
g++ -g -shared-libgcc -shared -fPIC -std=c++11 -o libtest.so lib.cpp
g++ -g -shared-libgcc -o main -L. -Wl,-rpath,. main.cpp -ltest
g++ -g -shared-libgcc -o main_unwind -L. -Wl,-rpath,.  main.cpp -ltest -lunwind

On arm64:

~/test$ ./main
int throws lib
int caught main
~/test$ ./main_unwind
terminate called after throwing an instance of 'int'
terminate called recursively
Aborted (core dumped)

On amd64:

~/test$ ./main
int throws lib
int caught main
~/test$ ./main_unwind
int throws lib
int caught main

As expected, we see arm64 abort the execution of the reproducer.

I then installed 1.2.1-8ubuntu0.1 from -proposed and rebuilt the
reproducers:

$ make clean
$ make all
g++ -g -shared-libgcc -shared -fPIC -std=c++11 -o libtest.so lib.cpp
g++ -g -shared-libgcc -o main -L. -Wl,-rpath,. main.cpp -ltest
g++ -g -shared-libgcc -o main_unwind -L. -Wl,-rpath,.  main.cpp -ltest -lunwind

On arm64:

$ ./main
int throws lib
int caught main
$ ./main_unwind
int throws lib
int caught main

On amd64:

~/test$ ./main
int throws lib
int caught main
~/test$ ./main_unwind
int throws lib
int caught main

We see that 1.2.1-8ubuntu0.1 from -proposed does not abort, and instead
runs as expected. There is no change in behaviour on amd64. The package
in -proposed fixes the problem, happy to mark as verified for Bionic.
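
The manual comparison above could also be scripted. The following is a minimal sketch, not part of the attached reproducer; compare_runs is a hypothetical helper name. It runs two binaries and reports whether their output and exit status agree:

```shell
#!/bin/sh
# Sketch only: check that a second binary behaves like a reference binary.
# compare_runs is a hypothetical helper, not shipped with the reproducer.
compare_runs() {
    # Capture combined stdout/stderr and the exit status of each program.
    ref_out=$("$1" 2>&1) && ref_rc=0 || ref_rc=$?
    new_out=$("$2" 2>&1) && new_rc=0 || new_rc=$?
    if [ "$ref_rc" -eq "$new_rc" ] && [ "$ref_out" = "$new_out" ]; then
        echo "PASS: $2 behaves like $1"
        return 0
    fi
    echo "FAIL: $2 exited $new_rc, $1 exited $ref_rc"
    return 1
}

# Possible usage from ~/test after "make all":
#   compare_runs ./main ./main_unwind
```

With the package from -release on arm64 this would be expected to report a failure, since main_unwind aborts while main does not; with the package from -proposed both runs should agree.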

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to libunwind in Ubuntu.
https://bugs.launchpad.net/bugs/1999104

Title:
  arm64: broken c++ exception handler support leads to std::terminate()
  being called and program abort

Status in libunwind package in Ubuntu:
  Fix Released
Status in libunwind source package in Bionic:
  Fix Committed
Status in libunwind source package in Focal:
  Fix Committed

Bug description:
  [Impact]

  On architectures other than i386 and amd64, the C++ exception support
  in libunwind appears to be broken, always failing and calling
  std::terminate() which leads to the program aborting.

  (gdb) bt
  #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
  #1  0xf7c2daac in __GI_abort () at abort.c:79
  #2  0xf7e21868 in __gnu_cxx::__verbose_terminate_handler() ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #3  0xf7e1f21c in ?? () from /lib/aarch64-linux-gnu/libstdc++.so.6
  #4  0xf7e1f280 in std::terminate() ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #5  0xf7e1f5e0 in __cxa_rethrow ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #6  0xf7e21804 in __gnu_cxx::__verbose_terminate_handler() ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #7  0xf7e1f21c in ?? () from /lib/aarch64-linux-gnu/libstdc++.so.6
  #8  0xf7e1f280 in std::terminate() ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #9  0xf7e1f574 in __cxa_throw ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #10 0xf7fb9f50 in function_throws_int () at lib.cpp:9
  #11 0x0d54 in main (argc=1, argv=0xfab8) at main.cpp:9

  Compiling libunwind with --enable-cxx-exceptions enabled leads to
  _Unwind_RaiseException being called during __cxa_throw(), which fails
  to find a handler, and the generic std::terminate() is called instead,
  aborting the program.

  On i386 and amd64 this doesn't seem to be the case, and the libunwind
  handlers seem to be present.

  To fix, we enable the configure option --enable-cxx-exceptions only on
  i386 and amd64, in debian/rules. This lets other architectures fall
  back to the symbols provided by libgcc_s, whose implementation works
  correctly.
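
The per-architecture selection described above can be sketched in shell. This is only an illustration of the idea, not the actual debian/rules change (which is in the attached debdiffs and is not reproduced in this thread); cxx_exceptions_flag is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch only: pick the configure flag based on the Debian architecture name.
# cxx_exceptions_flag is a hypothetical helper; the real fix lives in debian/rules.
cxx_exceptions_flag() {
    case "$1" in
        # Architectures where the libunwind C++ exception handlers work.
        i386|amd64) echo "--enable-cxx-exceptions" ;;
        # Everywhere else, pass nothing so libgcc_s provides the symbols.
        *) echo "" ;;
    esac
}

# Possible usage in a build script:
#   ./configure $(cxx_exceptions_flag "$(dpkg-architecture -qDEB_HOST_ARCH)")
```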

  [Testcase]

  Ali Sadi has provided a reproducer program.

  Start an arm64 instance, for example, a c6g.medium instance on AWS,
  with either Bionic or Focal.

  $ wget https://bugs.launchpad.net/ubuntu/+source/libunwind/+bug/1999104/+attachment/5635122/+files/libunwind.tar.gz
  $ sudo apt install -y build-essential libunwind-dev
  $ tar xvf libunwind.tar.gz && cd test
  $ make all

  There are two executables, main and main_unwind. main is not linked to
  libunwind, and main_unwind is linked to libunwind.
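
To confirm which of the two binaries is linked against libunwind, the ldd output can be inspected. A minimal sketch, assuming the shared object name contains "libunwind.so"; has_libunwind is a hypothetical helper that reads ldd output on standard input:

```shell
#!/bin/sh
# Sketch only: succeed if the ldd output on stdin mentions libunwind.
# has_libunwind is a hypothetical helper name.
has_libunwind() {
    grep -q 'libunwind\.so'
}

# Possible usage from ~/test after "make all":
#   ldd ./main        | has_libunwind && echo "main: linked"        || echo "main: not linked"
#   ldd ./main_unwind | has_libunwind && echo "main_unwind: linked" || echo "main_unwind: not linked"
```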

  $ ./main
  int throws lib
  int caught main
  $ ./main_unwind
  terminate called after throwing an instance of 'int'
  terminate called recursively
  Aborted (core dumped)

  If you install the test package available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf350246-test

  $ make clean
  $ sudo apt install -y 

[Touch-packages] [Bug 1999104] Re: arm64: broken c++ exception handler support leads to std::terminate() being called and program abort

2023-01-04 Thread Matthew Ruffell
Performing verification for Focal.

I started two instances on AWS, one c6g.medium (arm64) and a t2.micro
(amd64).

I went through the reproducer listed in the testcase with libunwind-dev
1.2.1-9build1 from -release.

$ wget https://bugs.launchpad.net/ubuntu/+source/libunwind/+bug/1999104/+attachment/5635122/+files/libunwind.tar.gz
$ sudo apt install -y build-essential libunwind-dev
$ tar xvf libunwind.tar.gz && cd test
test/
test/lib.hpp
test/main.cpp
test/lib.cpp
test/Makefile
~/test$ make all
g++ -g -shared-libgcc -shared -fPIC -std=c++11 -o libtest.so lib.cpp
g++ -g -shared-libgcc -o main -L. -Wl,-rpath,. main.cpp -ltest
g++ -g -shared-libgcc -o main_unwind -L. -Wl,-rpath,.  main.cpp -ltest -lunwind

On arm64:

~/test$ ./main
int throws lib
int caught main
~/test$ ./main_unwind
terminate called after throwing an instance of 'int'
terminate called recursively
Aborted (core dumped)

On amd64:

~/test$ ./main
int throws lib
int caught main
~/test$ ./main_unwind
int throws lib
int caught main

As expected, we see arm64 abort the execution of the reproducer.

I then installed 1.2.1-9ubuntu0.1 from -proposed and rebuilt the
reproducers:

$ make clean
$ make all
g++ -g -shared-libgcc -shared -fPIC -std=c++11 -o libtest.so lib.cpp
g++ -g -shared-libgcc -o main -L. -Wl,-rpath,. main.cpp -ltest
g++ -g -shared-libgcc -o main_unwind -L. -Wl,-rpath,.  main.cpp -ltest -lunwind

On arm64:

$ ./main
int throws lib
int caught main
$ ./main_unwind
int throws lib
int caught main

On amd64:

~/test$ ./main
int throws lib
int caught main
~/test$ ./main_unwind
int throws lib
int caught main

We see that 1.2.1-9ubuntu0.1 from -proposed does not abort, and instead
runs as expected. There is no change in behaviour on amd64. The package
in -proposed fixes the problem, happy to mark as verified for Focal.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to libunwind in Ubuntu.
https://bugs.launchpad.net/bugs/1999104

Title:
  arm64: broken c++ exception handler support leads to std::terminate()
  being called and program abort

Status in libunwind package in Ubuntu:
  Fix Released
Status in libunwind source package in Bionic:
  Fix Committed
Status in libunwind source package in Focal:
  Fix Committed

Bug description:
  [Impact]

  On architectures other than i386 and amd64, the C++ exception support
  in libunwind appears to be broken, always failing and calling
  std::terminate() which leads to the program aborting.

  (gdb) bt
  #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
  #1  0xf7c2daac in __GI_abort () at abort.c:79
  #2  0xf7e21868 in __gnu_cxx::__verbose_terminate_handler() ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #3  0xf7e1f21c in ?? () from /lib/aarch64-linux-gnu/libstdc++.so.6
  #4  0xf7e1f280 in std::terminate() ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #5  0xf7e1f5e0 in __cxa_rethrow ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #6  0xf7e21804 in __gnu_cxx::__verbose_terminate_handler() ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #7  0xf7e1f21c in ?? () from /lib/aarch64-linux-gnu/libstdc++.so.6
  #8  0xf7e1f280 in std::terminate() ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #9  0xf7e1f574 in __cxa_throw ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #10 0xf7fb9f50 in function_throws_int () at lib.cpp:9
  #11 0x0d54 in main (argc=1, argv=0xfab8) at main.cpp:9

  Compiling libunwind with --enable-cxx-exceptions enabled leads to
  _Unwind_RaiseException being called during __cxa_throw(), which fails
  to find a handler, and the generic std::terminate() is called instead,
  aborting the program.

  On i386 and amd64 this doesn't seem to be the case, and the libunwind
  handlers seem to be present.

  To fix, we enable the configure option --enable-cxx-exceptions only on
  i386 and amd64, in debian/rules. This lets other architectures fall
  back to the symbols provided by libgcc_s, whose implementation works
  correctly.

  [Testcase]

  Ali Sadi has provided a reproducer program.

  Start an arm64 instance, for example, a c6g.medium instance on AWS,
  with either Bionic or Focal.

  $ wget https://bugs.launchpad.net/ubuntu/+source/libunwind/+bug/1999104/+attachment/5635122/+files/libunwind.tar.gz
  $ sudo apt install -y build-essential libunwind-dev
  $ tar xvf libunwind.tar.gz && cd test
  $ make all

  There are two executables, main and main_unwind. main is not linked to
  libunwind, and main_unwind is linked to libunwind.

  $ ./main
  int throws lib
  int caught main
  $ ./main_unwind
  terminate called after throwing an instance of 'int'
  terminate called recursively
  Aborted (core dumped)

  If you install the test package available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf350246-test

  $ make clean
  $ sudo apt 

[Touch-packages] [Bug 1999104] Re: arm64: broken c++ exception handler support leads to std::terminate() being called and program abort

2022-12-13 Thread Matthew Ruffell
** Tags added: sts-sponsor

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to libunwind in Ubuntu.
https://bugs.launchpad.net/bugs/1999104

Title:
  arm64: broken c++ exception handler support leads to std::terminate()
  being called and program abort

Status in libunwind package in Ubuntu:
  Fix Released
Status in libunwind source package in Bionic:
  In Progress
Status in libunwind source package in Focal:
  In Progress

Bug description:
  [Impact]

  On architectures other than i386 and amd64, the C++ exception support
  in libunwind appears to be broken, always failing and calling
  std::terminate() which leads to the program aborting.

  (gdb) bt
  #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
  #1  0xf7c2daac in __GI_abort () at abort.c:79
  #2  0xf7e21868 in __gnu_cxx::__verbose_terminate_handler() ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #3  0xf7e1f21c in ?? () from /lib/aarch64-linux-gnu/libstdc++.so.6
  #4  0xf7e1f280 in std::terminate() ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #5  0xf7e1f5e0 in __cxa_rethrow ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #6  0xf7e21804 in __gnu_cxx::__verbose_terminate_handler() ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #7  0xf7e1f21c in ?? () from /lib/aarch64-linux-gnu/libstdc++.so.6
  #8  0xf7e1f280 in std::terminate() ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #9  0xf7e1f574 in __cxa_throw ()
 from /lib/aarch64-linux-gnu/libstdc++.so.6
  #10 0xf7fb9f50 in function_throws_int () at lib.cpp:9
  #11 0x0d54 in main (argc=1, argv=0xfab8) at main.cpp:9

  Compiling libunwind with --enable-cxx-exceptions enabled leads to
  _Unwind_RaiseException being called during __cxa_throw(), which fails
  to find a handler, and the generic std::terminate() is called instead,
  aborting the program.

  On i386 and amd64 this doesn't seem to be the case, and the libunwind
  handlers seem to be present.

  To fix, we enable the configure option --enable-cxx-exceptions only on
  i386 and amd64, in debian/rules. This lets other architectures fall
  back to the symbols provided by libgcc_s, whose implementation works
  correctly.

  [Testcase]

  Ali Sadi has provided a reproducer program.

  Start an arm64 instance, for example, a c6g.medium instance on AWS,
  with either Bionic or Focal.

  $ wget https://bugs.launchpad.net/ubuntu/+source/libunwind/+bug/1999104/+attachment/5635122/+files/libunwind.tar.gz
  $ sudo apt install -y build-essential libunwind-dev
  $ tar xvf libunwind.tar.gz && cd test
  $ make all

  There are two executables, main and main_unwind. main is not linked to
  libunwind, and main_unwind is linked to libunwind.

  $ ./main
  int throws lib
  int caught main
  $ ./main_unwind
  terminate called after throwing an instance of 'int'
  terminate called recursively
  Aborted (core dumped)

  If you install the test package available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf350246-test

  $ make clean
  $ sudo apt install -y libunwind-dev
  $ make all
  $ ./main
  int throws lib
  int caught main
  $ ./main_unwind
  int throws lib
  int caught main

  The exception is caught as expected and the program does not abort.

  [Where problems could occur]

  For architectures other than i386 and amd64, we are changing from
  libunwind provided exception handlers for __cxa_throw(), and using
  those provided by libgcc_s instead.

  There are a few reverse dependencies for libunwind-dev and libunwind8,
  which need to be considered:

  $ apt rdepends libunwind-dev
  libunwind-dev
  Reverse Depends:
Depends: libunwind-setjmp0-dev (= 1.2.1-9build1)
Depends: libefl-all-dev

  $ apt rdepends libunwind8
  libunwind8
  Reverse Depends:
Depends: libunwind-dev (= 1.2.1-9build1)
Depends: xvfb
Depends: xnest
Depends: xdmx
Depends: xwayland
Depends: xserver-xorg-core
Depends: xserver-xephyr
Depends: linux-tools-5.4.0-*
Depends: linux-raspi-tools-*
Depends: linux-raspi2-tools-5.4.0-*
Depends: linux-raspi2-5.4-tools-5.4.0-*
Depends: linux-oracle-5.15-tools-5.15.0-*
Depends: linux-lowlatency-hwe-5.15-tools-5.15.0-*
Depends: linux-hwe-5.8-tools-5.8.0-*
Depends: linux-hwe-5.15-tools-5.15.0-*
Depends: linux-gke-tools-5.4.0-*
Depends: linux-gke-5.15-tools-5.15.0-*
Depends: linux-gcp-tools-5.4.0-*
Depends: linux-gcp-5.15-tools-5.15.0-*
Depends: linux-azure-tools-5.4.0-*
Depends: linux-azure-5.15-tools-5.15.0-*
Depends: linux-aws-tools-5.4.0-*
Depends: linux-aws-5.8-tools-5.8.0-*
Depends: linux-aws-5.15-tools-5.15.0-*
Depends: xvfb
Depends: xnest
Depends: xdmx
Depends: trafficserver
Depends: tilix
Depends: tigervnc-standalone-server
Depends: tarantool
Depends: 

[Touch-packages] [Bug 1999104] Re: arm64: broken c++ exception handler support leads to std::terminate() being called and program abort

2022-12-12 Thread Matthew Ruffell
** Summary changed:

- libunwind causes crashes on arm64
+ arm64: broken c++ exception handler support leads to std::terminate() being 
called and program abort

** Description changed:

- There is a bug in libunwind in both 18.04 and 20.04 on arm64 where when
- linked with libunwind instead of catching an exception, the program
- crashes. This was first seen on mcrouter, but attached is a small
- reproducer where `main_unwind` will crash. The libunwind shipping with
- 22.04 doesn't appear to have this problem, nor do unmodified upstream
- versions (including the 1.2.1 which is the 18.04 and 20.04 version).
+ [Impact]
  
- Attached is a small reproducer that demonstrates the problem.
+ On architectures other than i386 and amd64, the C++ exception support in
+ libunwind appears to be broken, always failing and calling
+ std::terminate() which leads to the program aborting.
  
- Ubuntu 22.04:
- ```
- $ ./main
- int throws lib
- int caught main
- $ ./main_unwind
- int throws lib
- int caught main
- ```
+ (gdb) bt
+ #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
+ #1  0xf7c2daac in __GI_abort () at abort.c:79
+ #2  0xf7e21868 in __gnu_cxx::__verbose_terminate_handler() ()
+from /lib/aarch64-linux-gnu/libstdc++.so.6
+ #3  0xf7e1f21c in ?? () from /lib/aarch64-linux-gnu/libstdc++.so.6
+ #4  0xf7e1f280 in std::terminate() ()
+from /lib/aarch64-linux-gnu/libstdc++.so.6
+ #5  0xf7e1f5e0 in __cxa_rethrow ()
+from /lib/aarch64-linux-gnu/libstdc++.so.6
+ #6  0xf7e21804 in __gnu_cxx::__verbose_terminate_handler() ()
+from /lib/aarch64-linux-gnu/libstdc++.so.6
+ #7  0xf7e1f21c in ?? () from /lib/aarch64-linux-gnu/libstdc++.so.6
+ #8  0xf7e1f280 in std::terminate() ()
+from /lib/aarch64-linux-gnu/libstdc++.so.6
+ #9  0xf7e1f574 in __cxa_throw ()
+from /lib/aarch64-linux-gnu/libstdc++.so.6
+ #10 0xf7fb9f50 in function_throws_int () at lib.cpp:9
+ #11 0x0d54 in main (argc=1, argv=0xfab8) at main.cpp:9
  
- Ubuntu 20.04:
- ```
+ Compiling libunwind with --enable-cxx-exceptions enabled leads to
+ _Unwind_RaiseException being called during __cxa_throw(), which fails to
+ find a handler, and the generic std::terminate() is called instead,
+ aborting the program.
+ 
+ On i386 and amd64 this doesn't seem to be the case, and the libunwind
+ handlers seem to be present.
+ 
+ To fix, we enable the configure option --enable-cxx-exceptions only on
+ i386 and amd64, in debian/rules. This lets other architectures fall
+ back to the symbols provided by libgcc_s, whose implementation works
+ correctly.
+ 
+ [Testcase]
+ 
+ Ali Sadi has provided a reproducer program.
+ 
+ Start an arm64 instance, for example, a c6g.medium instance on AWS, with
+ either Bionic or Focal.
+ 
+ $ wget https://bugs.launchpad.net/ubuntu/+source/libunwind/+bug/1999104/+attachment/5635122/+files/libunwind.tar.gz
+ $ sudo apt install -y build-essential libunwind-dev
+ $ tar xvf libunwind.tar.gz && cd test
+ $ make all
+ 
+ There are two executables, main and main_unwind. main is not linked to
+ libunwind, and main_unwind is linked to libunwind.
+ 
  $ ./main
  int throws lib
  int caught main
  $ ./main_unwind
  terminate called after throwing an instance of 'int'
  terminate called recursively
  Aborted (core dumped)
- ```
+ 
+ If you install the test package available in the following ppa:
+ 
+ https://launchpad.net/~mruffell/+archive/ubuntu/sf350246-test
+ 
+ $ make clean
+ $ sudo apt install -y libunwind-dev
+ $ make all
+ $ ./main
+ int throws lib
+ int caught main
+ $ ./main_unwind
+ int throws lib
+ int caught main
+ 
+ The exception is caught as expected and the program does not abort.
+ 
+ [Where problems could occur]
+ 
+ For architectures other than i386 and amd64, we are changing from
+ libunwind provided exception handlers for __cxa_throw(), and using those
+ provided by libgcc_s instead.
+ 
+ There are a few reverse dependencies for libunwind-dev and libunwind8,
+ which need to be considered:
+ 
+ $ apt rdepends libunwind-dev
+ libunwind-dev
+ Reverse Depends:
+   Depends: libunwind-setjmp0-dev (= 1.2.1-9build1)
+   Depends: libefl-all-dev
+ 
+ $ apt rdepends libunwind8
+ libunwind8
+ Reverse Depends:
+   Depends: libunwind-dev (= 1.2.1-9build1)
+   Depends: xvfb
+   Depends: xnest
+   Depends: xdmx
+   Depends: xwayland
+   Depends: xserver-xorg-core
+   Depends: xserver-xephyr
+   Depends: linux-tools-5.4.0-*
+   Depends: linux-raspi-tools-*
+   Depends: linux-raspi2-tools-5.4.0-*
+   Depends: linux-raspi2-5.4-tools-5.4.0-*
+   Depends: linux-oracle-5.15-tools-5.15.0-*
+   Depends: linux-lowlatency-hwe-5.15-tools-5.15.0-*
+   Depends: linux-hwe-5.8-tools-5.8.0-*
+   Depends: linux-hwe-5.15-tools-5.15.0-*
+   Depends: linux-gke-tools-5.4.0-*
+   Depends: linux-gke-5.15-tools-5.15.0-*
+   Depends: linux-gcp-tools-5.4.0-*
+   Depends: linux-gcp-5.15-tools-5.15.0-*
+   Depends: 

[Touch-packages] [Bug 1999104] Re: libunwind causes crashes on arm64

2022-12-12 Thread Matthew Ruffell
Attached is a debdiff which fixes this problem on Bionic.

** Also affects: libunwind (Ubuntu Focal)
   Importance: Undecided
   Status: New

** Also affects: libunwind (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Changed in: libunwind (Ubuntu)
   Status: New => Fix Released

** Changed in: libunwind (Ubuntu Bionic)
   Status: New => In Progress

** Changed in: libunwind (Ubuntu Focal)
   Status: New => In Progress

** Changed in: libunwind (Ubuntu Bionic)
   Importance: Undecided => Medium

** Changed in: libunwind (Ubuntu Focal)
   Importance: Undecided => Medium

** Changed in: libunwind (Ubuntu Bionic)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: libunwind (Ubuntu Focal)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Patch added: "Debdiff for libunwind on Bionic"
   
https://bugs.launchpad.net/ubuntu/+source/libunwind/+bug/1999104/+attachment/5635450/+files/lp1999104_bionic.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to libunwind in Ubuntu.
https://bugs.launchpad.net/bugs/1999104

Title:
  libunwind causes crashes on arm64

Status in libunwind package in Ubuntu:
  Fix Released
Status in libunwind source package in Bionic:
  In Progress
Status in libunwind source package in Focal:
  In Progress

Bug description:
  There is a bug in libunwind in both 18.04 and 20.04 on arm64 where
  when linked with libunwind instead of catching an exception, the
  program crashes. This was first seen on mcrouter, but attached is a
  small reproducer where `main_unwind` will crash. The libunwind
  shipping with 22.04 doesn't appear to have this problem, nor do
  unmodified upstream versions (including the 1.2.1 which is the 18.04
  and 20.04 version).

  Attached is a small reproducer that demonstrates the problem.

  Ubuntu 22.04:
  ```
  $ ./main
  int throws lib
  int caught main
  $ ./main_unwind
  int throws lib
  int caught main
  ```

  Ubuntu 20.04:
  ```
  $ ./main
  int throws lib
  int caught main
  $ ./main_unwind
  terminate called after throwing an instance of 'int'
  terminate called recursively
  Aborted (core dumped)
  ```

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/libunwind/+bug/1999104/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


[Touch-packages] [Bug 1999104] Re: libunwind causes crashes on arm64

2022-12-12 Thread Matthew Ruffell
Attached is a debdiff which fixes this problem on Focal.

** Patch added: "Debdiff for libunwind on Focal"
   
https://bugs.launchpad.net/ubuntu/+source/libunwind/+bug/1999104/+attachment/5635451/+files/lp1999104_focal.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to libunwind in Ubuntu.
https://bugs.launchpad.net/bugs/1999104

Title:
  libunwind causes crashes on arm64

Status in libunwind package in Ubuntu:
  Fix Released
Status in libunwind source package in Bionic:
  In Progress
Status in libunwind source package in Focal:
  In Progress



[Touch-packages] [Bug 1988119] Re: systemd-udevd: Run net_setup_link on 'change' uevents to prevent DNS outages on Azure

2022-09-07 Thread Matthew Ruffell
Attached is an improvement on the previous patch revision. Output is now
forwarded to logger, we use shell expansion to enumerate network
devices, we omit loopback, and we added a udevadm settle to wait for any
thunderstorms to resolve before we continue installing the new udev
package.
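
As a rough illustration of the approach described above (this is not the
maintainer script from the attached debdiff), a preinstall step along
these lines could enumerate interfaces with a shell glob, skip loopback,
re-trigger an 'add' uevent where ID_NET_DRIVER is missing, forward output
to logger, and then wait for udev to settle. The function names, the
SYSFS_NET override and the logger tag are assumptions made for this
sketch.

```shell
#!/bin/sh
# Hypothetical sketch; not the script shipped in the debdiff.
# SYSFS_NET can be overridden for testing; it defaults to the real sysfs path.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Print the names of all network interfaces except loopback.
list_net_devices() {
    for path in "$SYSFS_NET"/*; do
        [ -e "$path" ] || continue      # the glob matched nothing
        name="${path##*/}"
        [ "$name" = "lo" ] && continue  # omit loopback
        printf '%s\n' "$name"
    done
}

# Re-add ID_NET_DRIVER on interfaces that have lost it, then wait for udev.
restore_net_driver() {
    for dev in $(list_net_devices); do
        if ! udevadm info "$SYSFS_NET/$dev" | grep -q '^E: ID_NET_DRIVER='; then
            # Same trigger as the documented workaround, logged via syslog.
            udevadm trigger -c add -y "$dev" 2>&1 | logger -t udev-preinst
        fi
    done
    # Block until the udev event queue is empty before continuing.
    udevadm settle
}
```

Calling restore_net_driver as root before the new udev package is unpacked
would be the intended use in this sketch; list_net_devices on its own has
no side effects.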

** Patch added: "Debdiff for systemd on Bionic part two V2"
   
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1988119/+attachment/5614287/+files/lp1988119_bionic_part_two_V2.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1988119

Title:
  systemd-udevd: Run net_setup_link on 'change' uevents to prevent DNS
  outages on Azure

Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Bionic:
  Fix Committed

Bug description:
  [Impact]

  A widespread outage was caused on Azure instances earlier today, when
  systemd 237-3ubuntu10.54 was published to the bionic-security pocket.
  Instances could no longer resolve DNS queries, breaking networking.

  For affected users, the following workarounds are available. Use whatever is
  most convenient.
  - Reboot your instances
  - or -
  - Issue "udevadm trigger -cadd -yeth0 && systemctl restart systemd-networkd"
    as root

  The trigger was found to be open-vm-tools issuing "udevadm trigger".
  Azure has a specific netplan setup that uses the `driver` match to set
  up networking. If a udevadm trigger is executed, the KV pair that
  contains this info is lost. Next time netplan is executed, the server
  loses its DNS information.

  This is the same as bug 1902960 experienced on Focal two years ago.

  The root cause was found to be a bug in systemd, where if we receive a
  "Remove" action from a change uevent, we need to run net_setup_link(),
  but we need to skip device rename and keep the old name.

  [Testcase]

  Start up an instance on Azure, any type. Simply issue udevadm trigger
  and restart systemd-networkd:

  $ ping google.com
  PING google.com (172.253.62.102) 56(84) bytes of data.
  64 bytes from bc-in-f102.1e100.net (172.253.62.102): icmp_seq=1 ttl=56 time=1.85 ms
  $ sudo udevadm trigger && sudo systemctl restart systemd-networkd
  $ ping google.com
  ping: google.com: Temporary failure in name resolution

  To fix a broken instance, you can run:

  $ sudo udevadm trigger -cadd -yeth0 && sudo systemctl restart systemd-networkd

  and then install the test packages below:

  Test packages are available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf343528-test

  If you install them, the issue should no longer occur.

  [Where problems could occur]

  If a regression were to occur, it would affect systemd-udevd
  processing 'change' events from network devices, which could lead to
  network outages. Since this would happen when systemd-networkd is
  restarted on postinstall, a regression would cause widespread outages
  because this SRU is targeted at the security pocket, from which
  unattended-upgrades installs automatically.

  Side effects could include incorrect udevd device properties.

  It is very important that this SRU is well tested before release.

  [Other info]

  This was fixed in Systemd 247 with the following commit:

  commit e0e789c1e97e2cdf1cafe0c6b7d7e43fa054f151
  Author: Yu Watanabe 
  Date: Mon, 14 Sep 2020 15:21:04 +0900
  Subject: udev: re-assign ID_NET_DRIVER=, ID_NET_LINK_FILE=, ID_NET_NAME= properties on non-'add' uevent
  Link: https://github.com/systemd/systemd/commit/e0e789c1e97e2cdf1cafe0c6b7d7e43fa054f151

  This was backported to Focal's systemd 245.4-4ubuntu3.4 in bug 1902960
  two years ago. Focal required a heavy backport, which was performed by
  Dan Streetman. Focal's backport can be found in
  d/p/lp1902960-udev-re-assign-ID_NET_DRIVER-ID_NET_LINK_FILE-ID_NET.patch,
  or the below pastebin:

  https://paste.ubuntu.com/p/K5k7bGt3Wx/

  The changes between the Focal backport and the Bionic backport are:

  - We use udev_device_get_action() instead of device_get_action()
  - device_action_from_string() is used to get to enum DeviceAction
  - We return 0 from the "if (a == DEVICE_ACTION_MOVE)" hunk instead of
    "goto no_rename"
  - log_device_* has been changed to log_*.

  See attached debdiff for Bionic backport.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1988119/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


[Touch-packages] [Bug 1988119] Re: systemd-udevd: Run net_setup_link on 'change' uevents to prevent DNS outages on Azure

2022-09-05 Thread Matthew Ruffell
Attached is the second patch required to fully fix this bug. It adds a
check on preinstall to see if ID_NET_DRIVER is present on the network
interface, and if it is missing, call udevadm trigger -c add on the
interface to add it.
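
For readers who want a feel for the check described above, here is a
minimal sketch of the logic, assuming a single interface name is passed
in. It is not the code from the debdiff; the function names
has_net_driver and ensure_net_driver are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch; the real check lives in the attached debdiff.

# Succeeds if the udevadm property listing on stdin contains ID_NET_DRIVER.
has_net_driver() {
    grep -q '^E: ID_NET_DRIVER='
}

# Re-trigger an "add" uevent for interface $1 if ID_NET_DRIVER is missing.
ensure_net_driver() {
    iface="$1"
    if udevadm info "/sys/class/net/$iface" | has_net_driver; then
        return 0    # property is present, nothing to do
    fi
    udevadm trigger -c add -y "$iface"
}
```

Running "ensure_net_driver eth0" as root would mirror the manual
workaround from the bug description, but only when the property has
actually been lost.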

** Patch added: "Debdiff for systemd on Bionic part two"
   
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1988119/+attachment/5613890/+files/lp1988119_bionic_part_two.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1988119

Title:
  systemd-udevd: Run net_setup_link on 'change' uevents to prevent DNS
  outages on Azure

Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Bionic:
  Fix Committed



[Touch-packages] [Bug 1988119] Re: systemd-udevd: Run net_setup_link on 'change' uevents to prevent DNS outages on Azure

2022-08-31 Thread Matthew Ruffell
The failure mode still exists if "udevadm trigger" has been issued
before the package upgrade to systemd 237-3ubuntu10.55.

That is, if unattended-upgrades or the user has installed open-vm-tools
and the instance has not been rebooted since, it will lose its network
connection on upgrade to 237-3ubuntu10.55.

We need to implement a way to add ID_NET_DRIVER back to the device
before the systemd upgrade takes place, otherwise an outage will occur.

Release admins - DO NOT RELEASE systemd 237-3ubuntu10.55 yet.

Tagging block-proposed.

$ ping google.com
PING google.com (142.251.45.110) 56(84) bytes of data.
64 bytes from iad23s04-in-f14.1e100.net (142.251.45.110): icmp_seq=1 ttl=56 time=1.51 ms
64 bytes from iad23s04-in-f14.1e100.net (142.251.45.110): icmp_seq=2 ttl=56 time=1.35 ms
64 bytes from iad23s04-in-f14.1e100.net (142.251.45.110): icmp_seq=3 ttl=56 time=1.17 ms
^C
--- google.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 1.172/1.349/1.516/0.140 ms
azureuser@mruffell-test:~$ sudo apt-cache policy systemd | grep Installed
  Installed: 237-3ubuntu10.53
azureuser@mruffell-test:~$ udevadm info /sys/class/net/eth0 | grep ID_NET_DRIVER
E: ID_NET_DRIVER=hv_netvsc
azureuser@mruffell-test:~$ sudo udevadm trigger
azureuser@mruffell-test:~$ ping google.com
PING google.com (142.251.45.110) 56(84) bytes of data.
64 bytes from iad23s04-in-f14.1e100.net (142.251.45.110): icmp_seq=1 ttl=56 time=2.15 ms
64 bytes from iad23s04-in-f14.1e100.net (142.251.45.110): icmp_seq=2 ttl=56 time=1.21 ms
^C
--- google.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 1.212/1.682/2.152/0.470 ms
azureuser@mruffell-test:~$ udevadm info /sys/class/net/eth0 | grep ID_NET_DRIVER
azureuser@mruffell-test:~$ sudo apt install libnss-systemd libpam-systemd libsystemd0 libudev1 systemd systemd-sysv udev
Reading package lists... Done
Building dependency tree   
Reading state information... Done
The following package was automatically installed and is no longer required:
  linux-headers-4.15.0-191
Use 'sudo apt autoremove' to remove it.
Suggested packages:
  systemd-container
The following packages will be upgraded:
  libnss-systemd libpam-systemd libsystemd0 libudev1 systemd systemd-sysv udev
7 upgraded, 0 newly installed, 0 to remove and 8 not upgraded.
Need to get 4497 kB of archives.
After this operation, 8192 B of additional disk space will be used.
Get:1 http://ppa.launchpad.net/ubuntu-security-proposed/ppa/ubuntu bionic/main amd64 libsystemd0 amd64 237-3ubuntu10.55 [205 kB]
Get:2 http://ppa.launchpad.net/ubuntu-security-proposed/ppa/ubuntu bionic/main amd64 libnss-systemd amd64 237-3ubuntu10.55 [105 kB]
Get:3 http://ppa.launchpad.net/ubuntu-security-proposed/ppa/ubuntu bionic/main amd64 libpam-systemd amd64 237-3ubuntu10.55 [107 kB]
Get:4 http://ppa.launchpad.net/ubuntu-security-proposed/ppa/ubuntu bionic/main amd64 systemd amd64 237-3ubuntu10.55 [2915 kB]
Get:5 http://ppa.launchpad.net/ubuntu-security-proposed/ppa/ubuntu bionic/main amd64 udev amd64 237-3ubuntu10.55 [1099 kB]
Get:6 http://ppa.launchpad.net/ubuntu-security-proposed/ppa/ubuntu bionic/main amd64 libudev1 amd64 237-3ubuntu10.55 [54.2 kB]
Get:7 http://ppa.launchpad.net/ubuntu-security-proposed/ppa/ubuntu bionic/main amd64 systemd-sysv amd64 237-3ubuntu10.55 [12.0 kB]
Fetched 4497 kB in 3s (1461 kB/s)
(Reading database ... 77176 files and directories currently installed.)
Preparing to unpack .../libsystemd0_237-3ubuntu10.55_amd64.deb ...
Unpacking libsystemd0:amd64 (237-3ubuntu10.55) over (237-3ubuntu10.53) ...
Setting up libsystemd0:amd64 (237-3ubuntu10.55) ...
(Reading database ... 77176 files and directories currently installed.)
Preparing to unpack .../libnss-systemd_237-3ubuntu10.55_amd64.deb ...
Unpacking libnss-systemd:amd64 (237-3ubuntu10.55) over (237-3ubuntu10.53) ...
Preparing to unpack .../libpam-systemd_237-3ubuntu10.55_amd64.deb ...
Unpacking libpam-systemd:amd64 (237-3ubuntu10.55) over (237-3ubuntu10.53) ...
Preparing to unpack .../systemd_237-3ubuntu10.55_amd64.deb ...
Unpacking systemd (237-3ubuntu10.55) over (237-3ubuntu10.53) ...
Preparing to unpack .../udev_237-3ubuntu10.55_amd64.deb ...
Unpacking udev (237-3ubuntu10.55) over (237-3ubuntu10.53) ...
Preparing to unpack .../libudev1_237-3ubuntu10.55_amd64.deb ...
Unpacking libudev1:amd64 (237-3ubuntu10.55) over (237-3ubuntu10.53) ...
Setting up libudev1:amd64 (237-3ubuntu10.55) ...
Setting up systemd (237-3ubuntu10.55) ...
(Reading database ... 77176 files and directories currently installed.)
Preparing to unpack .../systemd-sysv_237-3ubuntu10.55_amd64.deb ...
Unpacking systemd-sysv (237-3ubuntu10.55) over (237-3ubuntu10.53) ...
Setting up libnss-systemd:amd64 (237-3ubuntu10.55) ...
Setting up systemd-sysv (237-3ubuntu10.55) ...
Setting up udev (237-3ubuntu10.55) ...
update-initramfs: deferring update (trigger activated)
Setting up libpam-systemd:amd64 (237-3ubuntu10.55) ...

[Touch-packages] [Bug 1988119] Re: systemd-udevd: Run net_setup_link on 'change' uevents to prevent DNS outages on Azure

2022-08-31 Thread Matthew Ruffell
** Changed in: systemd (Ubuntu Bionic)
   Status: Fix Released => Fix Committed

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1988119

Title:
  systemd-udevd: Run net_setup_link on 'change' uevents to prevent DNS
  outages on Azure

Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Bionic:
  Fix Committed



[Touch-packages] [Bug 1988119] Re: systemd-udevd: Run net_setup_link on 'change' uevents to prevent DNS outages on Azure

2022-08-31 Thread Matthew Ruffell
Hello everyone,

I know there are quite a few people watching this bug, so I will provide
a status update.

The test package has been looking good throughout our internal testing,
and we have proceeded to build the next systemd update, version
237-3ubuntu10.55, which is currently in the ubuntu-security-proposed
PPA.

If you would like to help test, that would be greatly appreciated.
Please use a fresh VM on Azure, and please don't put the package into
production just yet.

Instructions to install (on a Bionic system):
1) sudo add-apt-repository ppa:ubuntu-security-proposed/ppa
2) sudo apt update
3) sudo apt install libnss-systemd libpam-systemd libsystemd0 libudev1 systemd systemd-sysv udev
4) sudo apt-cache policy systemd | grep Installed
   Installed: 237-3ubuntu10.55
5) sudo rm /etc/apt/sources.list.d/ubuntu-security-proposed-ubuntu-ppa-bionic.list
6) sudo apt update

From there you can run the reproducer:

$ sudo udevadm trigger && sudo systemctl restart systemd-networkd
$ ping google.com
PING google.com (172.253.122.138) 56(84) bytes of data.
64 bytes from bh-in-f138.1e100.net (172.253.122.138): icmp_seq=1 ttl=103 time=1.67 ms

If you do test, please comment here on how it went. Again, please don't
put the package into production until it has had a little more testing,
and we will get this released to the world as quickly and safely as we can.

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1988119

Title:
  systemd-udevd: Run net_setup_link on 'change' uevents to prevent DNS
  outages on Azure

Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Bionic:
  In Progress


[Touch-packages] [Bug 1988119] Re: Update to systemd 237-3ubuntu10.54 broke dns

2022-08-31 Thread Matthew Ruffell
Attached is a debdiff for systemd on Bionic which fixes this bug.

** Description changed:

- Two servers today that updated systemd to "systemd 237-3ubuntu10.54" 
- https://ubuntu.com/security/notices/USN-5583-1
+ [Impact]
  
- could not resolve dns anymore.
- no dns servers, normally set through dhcp.
+ A widespread outage was caused on Azure instances earlier today, when
+ systemd 237-3ubuntu10.54 was published to the bionic-security pocket.
+ Instances could no longer resolve DNS queries, breaking networking.
  
- Ubuntu 18.04
+ For affected users, the following workarounds are available. Use whatever is most convenient.
+ - Reboot your instances
+ - or -
+ - Issue "udevadm trigger -cadd -yeth0 && systemctl restart systemd-networkd" as root
  
- Temp fix.
-  1. Edit /etc/systemd/resolved.conf
-  1. Add/Uncomment # FallbackDNS=168.63.129.16
-  1. Restart systemd-resolved sudo systemctl restart systemd-resolved.service
-  1. Confirm dns working with systemd-resolve google.com
+ The trigger was found to be open-vm-tools issuing "udevadm trigger".
+ Azure has a specific netplan setup that uses the `driver` match to set
+ up networking. If a udevadm trigger is executed, the KV pair that
+ contains this info is lost. Next time netplan is executed, the server
+ loses its DNS information.
+ 
+ This is the same as bug 1902960 experienced on Focal two years ago.
+ 
+ The root cause was found to be a bug in systemd, where if we receive a
+ "Remove" action from a change uevent, we need to run net_setup_link(),
+ but we need to skip device rename and keep the old name.
+ 
+ [Testcase]
+ 
+ Start up an instance on Azure, any type. Simply issue udevadm trigger
+ and restart systemd-networkd:
+ 
+ $ ping google.com
+ PING google.com (172.253.62.102) 56(84) bytes of data.
+ 64 bytes from bc-in-f102.1e100.net (172.253.62.102): icmp_seq=1 ttl=56 time=1.85 ms
+ $ sudo udevadm trigger && sudo systemctl restart systemd-networkd
+ $ ping google.com
+ ping: google.com: Temporary failure in name resolution
+ 
+ To fix a broken instance, you can run:
+ 
+ $ sudo udevadm trigger -cadd -yeth0 && sudo systemctl restart systemd-networkd
+ 
+ and then install the test packages below:
+ 
+ Test packages are available in the following ppa:
+ 
+ https://launchpad.net/~mruffell/+archive/ubuntu/sf343528-test
+ 
+ If you install them, the issue should no longer occur.
+ 
+ [Where problems could occur]
+ 
+ If a regression were to occur, it would affect systemd-udevd processing
+ 'change' events from network devices, which could lead to network
+ outages. Since this would happen when systemd-networkd is restarted on
+ postinstall, a regression would cause widespread outages due to this SRU
+ being targeted to the security pocket, where unattended-upgrades will
+ automatically install from.
+ 
+ Side effects could include incorrect udevd device properties.
+ 
+ It is very important that this SRU is well tested before release.
+ 
+ [Other info]
+ 
+ This was fixed in Systemd 247 with the following commit:
+ 
+ commit e0e789c1e97e2cdf1cafe0c6b7d7e43fa054f151
+ Author: Yu Watanabe 
+ Date: Mon, 14 Sep 2020 15:21:04 +0900
+ Subject: udev: re-assign ID_NET_DRIVER=, ID_NET_LINK_FILE=, ID_NET_NAME= properties on non-'add' uevent
+ Link: https://github.com/systemd/systemd/commit/e0e789c1e97e2cdf1cafe0c6b7d7e43fa054f151
+ 
+ This was backported to Focal's systemd 245.4-4ubuntu3.4 in bug 1902960
+ two years ago. Focal required a heavy backport, which was performed by
+ Dan Streetman. Focal's backport can be found in
+ d/p/lp1902960-udev-re-assign-ID_NET_DRIVER-ID_NET_LINK_FILE-ID_NET.patch,
+ or the below pastebin:
+ 
+ https://paste.ubuntu.com/p/K5k7bGt3Wx/
+ 
+ The changes between the Focal backport and the Bionic backport are:
+ 
+ - We use udev_device_get_action() instead of device_get_action()
+ - device_action_from_string() is used to get to enum DeviceAction
+ - We return 0 from the "if (a == DEVICE_ACTION_MOVE)" hunk instead of "goto no_rename"
+ - log_device_* has been changed to log_*.
+ 
+ See attached debdiff for Bionic backport.

** Summary changed:

- Update to systemd 237-3ubuntu10.54 broke dns
+ systemd-udevd: Run net_setup_link on 'change' uevents to prevent DNS outages on Azure

** Patch added: "Debdiff for systemd on Bionic"
   
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1988119/+attachment/5612617/+files/lp1988119_bionic.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1988119

Title:
  systemd-udevd: Run net_setup_link on 'change' uevents to prevent DNS
  outages on Azure

Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Bionic:
  In Progress


[Touch-packages] [Bug 1988119] Re: Update to systemd 237-3ubuntu10.54 broke dns

2022-08-30 Thread Matthew Ruffell
** Changed in: systemd (Ubuntu Bionic)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Tags added: bionic sts

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1988119

Title:
  Update to systemd 237-3ubuntu10.54 broke dns

Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Bionic:
  In Progress

Bug description:
  Two servers today that updated systemd to "systemd 237-3ubuntu10.54" 
  https://ubuntu.com/security/notices/USN-5583-1

  could not resolve dns anymore.
  no dns servers, normally set through dhcp.

  Ubuntu 18.04

  Temp fix.
   1. Edit /etc/systemd/resolved.conf
   2. Add or uncomment the line FallbackDNS=168.63.129.16
   3. Restart systemd-resolved: sudo systemctl restart systemd-resolved.service
   4. Confirm dns working with systemd-resolve google.com
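
The edit to /etc/systemd/resolved.conf in the temporary fix above could be scripted roughly as follows. This is only a sketch: the helper name set_fallback_dns is hypothetical, it works on the file's text rather than the file itself, and it assumes the file uses the usual [Resolve] section header.

```python
def set_fallback_dns(conf_text: str, server: str) -> str:
    """Return resolved.conf text with FallbackDNS set to `server`.

    An existing FallbackDNS line (commented out or not) is replaced.
    Otherwise the setting is added directly below the [Resolve] header,
    which is created if it is missing.
    """
    new_line = f"FallbackDNS={server}"
    out = []
    replaced = False
    for line in conf_text.splitlines():
        # Treat "#FallbackDNS=" and "# FallbackDNS=..." as the same setting.
        stripped = line.lstrip("#").strip()
        if stripped.startswith("FallbackDNS=") and not replaced:
            out.append(new_line)
            replaced = True
        else:
            out.append(line)
    if not replaced:
        for idx, line in enumerate(out):
            if line.strip() == "[Resolve]":
                out.insert(idx + 1, new_line)
                break
        else:
            out.extend(["[Resolve]", new_line])
    return "\n".join(out) + "\n"
```

The result still has to be written back to the file, and systemd-resolved restarted, as in steps 3 and 4.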

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1988119/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


[Touch-packages] [Bug 1988119] Re: Update to systemd 237-3ubuntu10.54 broke dns

2022-08-30 Thread Matthew Ruffell
** Also affects: systemd (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Changed in: systemd (Ubuntu Bionic)
   Status: New => In Progress

** Changed in: systemd (Ubuntu)
   Status: Confirmed => Fix Released

** Changed in: systemd (Ubuntu Bionic)
   Importance: Undecided => Critical

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1988119

Title:
  Update to systemd 237-3ubuntu10.54 broke dns

Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Bionic:
  In Progress



[Touch-packages] [Bug 1964880] Re: software-properties-gtk crashed with AttributeError in packages_for_modalias(): 'Cache' object has no attribute 'packages'

2022-03-22 Thread Matthew Ruffell
I installed software-properties 0.99.20 from -proposed, and opened
software-properties-gtk, and clicked the "Additional Drivers" tab. The
tab loaded correctly and did not crash.

The package in -proposed fixes the issue, happy to mark verified.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to software-properties in
Ubuntu.
https://bugs.launchpad.net/bugs/1964880

Title:
  software-properties-gtk crashed with AttributeError in
  packages_for_modalias(): 'Cache' object has no attribute 'packages'

Status in software-properties package in Ubuntu:
  Fix Committed
Status in software-properties source package in Jammy:
  Fix Committed

Bug description:
  Opened up "Software & Updates" and clicked the "Additional Drivers"
  tab, and the tab crashed.

  ProblemType: Crash
  DistroRelease: Ubuntu 22.04
  Package: software-properties-gtk 0.99.19
  ProcVersionSignature: Ubuntu 5.15.0-23.23-generic 5.15.27
  Uname: Linux 5.15.0-23-generic x86_64
  ApportVersion: 2.20.11-0ubuntu79
  Architecture: amd64
  CasperMD5CheckResult: pass
  CurrentDesktop: ubuntu:GNOME
  Date: Tue Mar 15 19:02:39 2022
  ExecutablePath: /usr/bin/software-properties-gtk
  InstallationDate: Installed on 2022-01-02 (72 days ago)
  InstallationMedia: Ubuntu 22.04 LTS "Jammy Jellyfish" - Alpha amd64 (20220101)
  InterpreterPath: /usr/bin/python3.10
  PackageArchitecture: all
  ProcCmdline: /usr/bin/python3 /usr/bin/software-properties-gtk
  Python3Details: /usr/bin/python3.10, Python 3.10.2+, python3-minimal, 3.10.1-0ubuntu2
  PythonArgs: ['/usr/bin/software-properties-gtk']
  PythonDetails: N/A
  SourcePackage: software-properties
  Title: software-properties-gtk crashed with AttributeError in packages_for_modalias(): 'Cache' object has no attribute 'packages'
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: adm cdrom dip libvirt lpadmin lxd plugdev sambashare sudo

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/software-properties/+bug/1964880/+subscriptions




[Touch-packages] [Bug 1964880] [NEW] software-properties-gtk crashed with AttributeError in packages_for_modalias(): 'Cache' object has no attribute 'packages'

2022-03-15 Thread Matthew Ruffell
Public bug reported:

Opened up "Software & Updates" and clicked the "Additional Drivers" tab,
and the tab crashed.

ProblemType: Crash
DistroRelease: Ubuntu 22.04
Package: software-properties-gtk 0.99.19
ProcVersionSignature: Ubuntu 5.15.0-23.23-generic 5.15.27
Uname: Linux 5.15.0-23-generic x86_64
ApportVersion: 2.20.11-0ubuntu79
Architecture: amd64
CasperMD5CheckResult: pass
CurrentDesktop: ubuntu:GNOME
Date: Tue Mar 15 19:02:39 2022
ExecutablePath: /usr/bin/software-properties-gtk
InstallationDate: Installed on 2022-01-02 (72 days ago)
InstallationMedia: Ubuntu 22.04 LTS "Jammy Jellyfish" - Alpha amd64 (20220101)
InterpreterPath: /usr/bin/python3.10
PackageArchitecture: all
ProcCmdline: /usr/bin/python3 /usr/bin/software-properties-gtk
Python3Details: /usr/bin/python3.10, Python 3.10.2+, python3-minimal, 3.10.1-0ubuntu2
PythonArgs: ['/usr/bin/software-properties-gtk']
PythonDetails: N/A
SourcePackage: software-properties
Title: software-properties-gtk crashed with AttributeError in packages_for_modalias(): 'Cache' object has no attribute 'packages'
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip libvirt lpadmin lxd plugdev sambashare sudo

** Affects: software-properties (Ubuntu)
 Importance: Undecided
 Status: New

** Affects: software-properties (Ubuntu Jammy)
 Importance: Undecided
 Status: New


** Tags: amd64 apport-crash jammy need-duplicate-check third-party-packages wayland-session

** Also affects: software-properties (Ubuntu Jammy)
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to software-properties in
Ubuntu.
https://bugs.launchpad.net/bugs/1964880

Title:
  software-properties-gtk crashed with AttributeError in
  packages_for_modalias(): 'Cache' object has no attribute 'packages'

Status in software-properties package in Ubuntu:
  New
Status in software-properties source package in Jammy:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/software-properties/+bug/1964880/+subscriptions




[Touch-packages] [Bug 1960863] Re: armv8 paca: poly1305 users see segfaults when pointer authentication in use on AWS Graviton 3 instances

2022-02-23 Thread Matthew Ruffell
Performing verification for openssl on Focal.

An affected user performed the verification, due to c7g instance types
being in "Preview" state on Amazon AWS, and not generally accessible.

The user started a c7g instance, and checked they had openssl
1.1.1f-1ubuntu2.10 from -updates.

They attempted to use the poly1305 MAC by downloading the file from the
testcase:

$ curl https://services.gradle.org/distributions/gradle-7.2-bin.zip --output gradle-7.2.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0Segmentation fault (core dumped)

They could reproduce the issue.

They then enabled -proposed from the ports.ubuntu.com mirror, and
installed openssl 1.1.1f-1ubuntu2.11.

They again tried downloading the file:

$ curl https://services.gradle.org/distributions/gradle-7.2-bin.zip --output gradle-7.2.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0

Note that the file is not actually downloaded, because curl does not
follow 301 redirects unless -L is given:

$ curl https://services.gradle.org/distributions/gradle-7.2-bin.zip --output gradle-7.2.bin --verbose
...
* SSL connection using TLSv1.2 / ECDHE-ECDSA-CHACHA20-POLY1305
...
< HTTP/1.1 301 Moved Permanently
< Location: https://downloads.gradle-dn.com/distributions/gradle-7.2-bin.zip
...

curl does not segfault, and exits successfully. The package in -proposed
fixes the issue. Happy to mark as verified.

** Tags removed: sts-sponsor verification-needed verification-needed-focal
** Tags added: verification-done verification-done-focal

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to openssl in Ubuntu.
https://bugs.launchpad.net/bugs/1960863

Title:
  armv8 paca: poly1305 users see segfaults when pointer authentication
  in use on AWS Graviton 3 instances

Status in openssl package in Ubuntu:
  Fix Released
Status in openssl source package in Focal:
  Fix Committed

Bug description:
  [Impact]

  Support for hardware pointer authentication for armv8 systems was
  merged in openssl 1.1.1f, but it contains a bug in the implementation
  of the poly1305 message authentication code (MAC) routines. The bug
  causes the calling program to fail pointer authentication and crash
  with a segmentation fault.

  You can easily test it by accessing any website that uses poly1305.
  There is no workaround except to use a different MAC.

  [Testcase]

  This bug applies to armv8 systems which support pointer
  authentication. Start an armv8 instance, such as a c7g graviton 3
  instance on AWS, and make sure the paca flag is present in
  /proc/cpuinfo:

  $ grep paca /proc/cpuinfo
  Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp 
cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm 
dit uscat ilrcpc flagm ssbs paca pacg dcpodp svei8mm svebf16 i8mm bf16 dgh rng
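
The paca check above can also be done programmatically. The sketch below parses the text of /proc/cpuinfo and looks for the flag as a whole word rather than a substring. The function names are hypothetical and are not part of any package mentioned in this bug.

```python
def cpu_features(cpuinfo_text: str) -> set:
    """Collect the flags from every 'Features' line of /proc/cpuinfo text."""
    features = set()
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "Features":
            features.update(value.split())
    return features


def supports_pointer_auth(cpuinfo_text: str) -> bool:
    """True when the 'paca' flag is advertised by at least one CPU."""
    return "paca" in cpu_features(cpuinfo_text)
```

On an affected instance, supports_pointer_auth(open("/proc/cpuinfo").read()) would be expected to return True.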

  Next, attempt to connect to any website that uses poly1305 MAC.

  $ curl https://services.gradle.org/distributions/gradle-7.2-bin.zip --output gradle-7.2.bin
    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                   Dload  Upload   Total   Spent    Left  Speed
    0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0Segmentation fault (core dumped)

  There is a test package available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf327917-test

  Install it, and poly1305 operations will no longer segfault.

  [Where problems could occur]

  The patch corrects the order of two operations. Previously, AUTIASP
  was executed before the correct SP had been loaded, so the check was
  not made against the correct SP. With the patch, the SP is loaded
  first and AUTIASP is then checked against it.
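
To illustrate why the ordering matters, here is a toy model of SP-bound return address signing. It is not the hardware algorithm: the HMAC, the key, and the function names are all stand-ins chosen for illustration. The only property it shares with the real instructions is that a return address signed with one SP value fails authentication when checked with a different SP value.

```python
import hashlib
import hmac

KEY = b"toy-key"  # hypothetical; real hardware keeps its keys in system registers


def _tag(lr: int, sp: int) -> bytes:
    """Derive a tag from the return address and the stack pointer."""
    msg = lr.to_bytes(8, "little") + sp.to_bytes(8, "little")
    return hmac.new(KEY, msg, hashlib.sha256).digest()


def sign(lr: int, sp: int) -> tuple:
    """Toy stand-in for PACIASP: bind the return address to the current SP."""
    return (lr, _tag(lr, sp))


def authenticate(signed: tuple, sp: int) -> int:
    """Toy stand-in for AUTIASP: succeed only if SP matches the signing SP."""
    lr, tag = signed
    if not hmac.compare_digest(tag, _tag(lr, sp)):
        raise ValueError("pointer authentication failed")
    return lr
```

In this model, authenticating while SP still holds a different value from the one used at signing time raises an error, which corresponds to the crash described in this bug. Authenticating after SP has been restored succeeds.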

  This only changes one code path for armv8 systems, and other
  architectures are not affected. This is also only limited to poly1305
  MAC.

  If a regression were to occur, it would only affect users of poly1305
  MAC on armv8 with paca support.

  [Other info]

  The fix landed upstream in openssl 1.1.1i with the following commit:

  commit 5795acffd8706e1cb584284ee5bb3a30986d0e75
  Author: Ard Biesheuvel 
  Date:   Tue Oct 27 18:02:40 2020 +0100
  Subject: crypto/poly1305/asm: fix armv8 pointer authentication
  Link: https://github.com/openssl/openssl/commit/5795acffd8706e1cb584284ee5bb3a30986d0e75

  This commit is already present in Impish onward. Only Focal needs the
  fix.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1960863/+subscriptions



[Touch-packages] [Bug 1961542] Re: libsmartcols: Revert back to previous behaviour of non-shell parsable column output (lsblk -P)

2022-02-20 Thread Matthew Ruffell
Attached is a debdiff for Jammy util-linux.

** Patch added: "debdiff for util-linux on Jammy"
   
https://bugs.launchpad.net/ubuntu/+source/util-linux/+bug/1961542/+attachment/5562373/+files/lp1961542_jammy.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to util-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1961542

Title:
  libsmartcols: Revert back to previous behaviour of non-shell parsable
  column output (lsblk -P)

Status in util-linux package in Ubuntu:
  In Progress
Status in util-linux source package in Jammy:
  In Progress

Bug description:
  [Impact]

  util-linux 2.37 in Jammy has introduced some new behaviour for lsblk
  and similar tools which depend on libsmartcols. The -P / --pairs
  parameter no longer prints column names unchanged; it now converts
  them to shell-compatible names.

  e.g. lsblk -P now outputs LOG_SEC instead of LOG-SEC.

  The change broke some core tooling which relies on the output of
  lsblk -P, most notably curtin and MAAS, but I am sure there will be
  more applications affected.

  Affected MAAS users will see the following traceback when attempting
  to deploy 22.04:

  Traceback (most recent call last):
    File "/curtin/curtin/block/__init__.py", line 785, in get_blockdev_sector_size
      logical = info[parent]['LOG-SEC']
  KeyError: 'LOG-SEC'
  'LOG-SEC'
  curtin: Installation failed with exception: Unexpected error while running 

  This is documented in MAAS bug 1956613.

  MAAS decided to fix it by changing from -P to -J, in the following commit:
  https://git.launchpad.net/maas/commit/?id=e2c01963430e6837198a54bc1eadf3efc9fdd9a2

  Curtin now checks for MAJ_MIN, and changes it back to MAJ:MIN in:
  https://github.com/canonical/curtin/commit/ce811db127fe1ce46498b83615f8faed8c7dfeb6
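
A consumer of lsblk -P output can be made tolerant of both column name formats. The sketch below shows one way to do that. It is not the code from the curtin commit above; parse_pairs and get_column are hypothetical helpers, and the sample lines in the usage note are illustrative.

```python
import re
import shlex


def parse_pairs(line: str) -> dict:
    """Split one line of `lsblk -P` output into a {column: value} dict."""
    # shlex handles the KEY="value" quoting, including empty values.
    return dict(token.split("=", 1) for token in shlex.split(line))


def get_column(info: dict, name: str) -> str:
    """Look up a column by its classic name, e.g. 'MAJ:MIN' or 'LOG-SEC'.

    Falls back to the sanitized spelling, in which every character that
    is not a letter or digit is replaced by an underscore.
    """
    if name in info:
        return info[name]
    return info[re.sub(r"[^A-Za-z0-9]", "_", name)]
```

With this, get_column(parse_pairs('NAME="sda" MAJ_MIN="8:0"'), "MAJ:MIN") and get_column(parse_pairs('NAME="sda" MAJ:MIN="8:0"'), "MAJ:MIN") both give "8:0", so the caller keeps using the classic names regardless of which util-linux produced the output.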

  The issue is that these commits are not tagged to any MAAS release,
  and users would be forced to upgrade MAAS to the latest stable release
  when available if they want to deploy 22.04.

  There are many users out there that don't want to upgrade MAAS, so
  returning to the previous column output is the most desirable
  solution.

  [Testcase]

  On a Jammy install, simply run lsblk with either -P or --pairs:

  $ sudo lsblk -P
  ...
  NAME="sda" MAJ_MIN="8:0" RM="0" SIZE="465.8G" RO="0" TYPE="disk" MOUNTPOINTS=""
  ...

  Affected installs will see MAJ_MIN.

  There is a test package available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf326660-test

  If you install it, you will see MAJ:MIN, as on Impish and earlier
  releases.

  [Where problems could occur]

  We are changing the column output for -P and --pairs for the following
  applications:

  * lsblk
  * findmnt
  * lsipc

  If any application has been modified to depend on the new column
  output, it will break. I don't have any examples of something that
  will break, because MAAS and curtin were modified such that they would
  be compatible with both column name formats.

  It should be noted that the manpage documents that lsblk output can
  change at any time:

  > The default output, as well as the default output from options like
  > --fs and --topology, is subject to change.
  > So whenever possible, you should avoid using default outputs in your
  > scripts.
  > Always explicitly define expected columns by using --output
  > columns-list and --list in environments where a stable output is
  > required.

  If a regression should occur, we will need to fix up these affected
  packages also.

  [Other info]

  The change came about when a user asked upstream to make -P / --pairs
  shell parsable, in Issue 1201 upstream:

  https://github.com/util-linux/util-linux/issues/1201

  Karel Zak obliged, and it was implemented in the following commit:

  commit 58b510e5805d8350c31bfb81a47bcd38ea9fdd7e
  From: Karel Zak 
  Date: Thu, 3 Dec 2020 12:14:10 +0100
  Subject: libsmartcols: sanitize variable names on export output
  Link: https://github.com/util-linux/util-linux/commit/58b510e5805d8350c31bfb81a47bcd38ea9fdd7e

  I wrote to Karel Zak about the regressions introduced by the format
  change, and asked him to revert it and instead implement the
  shell-parsable logic as a new parameter.

  This happened in upstream issue 1594:

  https://github.com/util-linux/util-linux/issues/1594

  Karel Zak was happy to oblige again, and we now have the following
  commits:

  338ad4a93 findmnt: commit missing flag
  0f843ab64 lsblk: update --help output for -y
  eba05f308 lsipc: add -y,--shell
  152c17aa4 findmnt: add -y,--shell
  9c7e81ff1 lslogins: add -y,--shell
  25fb0638a lsblk: add -y/--shell
  39679ea0c lsfd: use new libsmartcols functions
  6fd0e3590 column: use new libsmartcols functions
  0b3c2e80d include/carefulputc: remove unused function
  3b5db50f7 libsmartcols: change "export" behavior, add "shellvar" flag

  While we got the intended behaviour, these commits won't land until
  

[Touch-packages] [Bug 1961542] [NEW] libsmartcols: Revert back to previous behaviour of non-shell parsable column output (lsblk -P)

2022-02-20 Thread Matthew Ruffell
Public bug reported:

[Impact]

util-linux 2.37 in Jammy has introduced some new behaviour for lsblk and
similar tools which depend on libsmartcols. The -P / --pairs parameter
no longer prints column names unchanged; it now converts them to
shell-compatible names.

e.g. lsblk -P now outputs LOG_SEC instead of LOG-SEC.

The change broke some core tooling which relies on the output of lsblk
-P, most notably curtin and MAAS, but I am sure there will be more
applications affected.

Affected MAAS users will see the following traceback when attempting to
deploy 22.04:

Traceback (most recent call last):
  File "/curtin/curtin/block/__init__.py", line 785, in get_blockdev_sector_size
    logical = info[parent]['LOG-SEC']
KeyError: 'LOG-SEC'
'LOG-SEC'
curtin: Installation failed with exception: Unexpected error while running 

This is documented in MAAS bug 1956613.

MAAS decided to fix it by changing from -P to -J, in the following commit:
https://git.launchpad.net/maas/commit/?id=e2c01963430e6837198a54bc1eadf3efc9fdd9a2
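
As a rough illustration of the -J approach, the sketch below reads logical sector sizes from lsblk JSON output. It is not the MAAS code. The sample document is hand-written, and the key names ("blockdevices", "name", "log-sec") are assumptions that should be checked against the output of the lsblk version in use.

```python
import json

# Hand-written sample in the general shape of `lsblk -J -o NAME,LOG-SEC`.
SAMPLE = '{"blockdevices": [{"name": "sda", "log-sec": 512}]}'


def sector_sizes(lsblk_json: str) -> dict:
    """Map each top-level device name to its logical sector size."""
    devices = json.loads(lsblk_json)["blockdevices"]
    # int() accepts both numeric and string values for the size.
    return {dev["name"]: int(dev["log-sec"]) for dev in devices}
```

Here sector_sizes(SAMPLE) gives {"sda": 512}. Nested child devices, if present, are not visited by this sketch.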

Curtin now checks for MAJ_MIN, and changes it back to MAJ:MIN in:
https://github.com/canonical/curtin/commit/ce811db127fe1ce46498b83615f8faed8c7dfeb6

The issue is that these commits are not tagged to any MAAS release, and
users would be forced to upgrade MAAS to the latest stable release when
available if they want to deploy 22.04.

There are many users out there that don't want to upgrade MAAS, so
returning to the previous column output is the most desirable solution.

[Testcase]

On a Jammy install, simply run lsblk with either -P or --pairs:

$ sudo lsblk -P
...
NAME="sda" MAJ_MIN="8:0" RM="0" SIZE="465.8G" RO="0" TYPE="disk" MOUNTPOINTS=""
...

Affected installs will see MAJ_MIN.

There is a test package available in the following ppa:

https://launchpad.net/~mruffell/+archive/ubuntu/sf326660-test

If you install it, you will see MAJ:MIN, as on Impish and earlier
releases.

[Where problems could occur]

We are changing the column output for -P and --pairs for the following
applications:

* lsblk
* findmnt
* lsipc

If any application has been modified to depend on the new column output,
it will break. I don't have any examples of something that will break,
because MAAS and curtin were modified such that they would be compatible
with both column name formats.

It should be noted that the manpage documents that lsblk output can
change at any time:

> The default output, as well as the default output from options like --fs and 
> --topology, is subject to change. 
> So whenever possible, you should avoid using default outputs in your scripts. 
> Always explicitly define expected columns by using --output columns-list and 
> --list in environments where a stable output is required.

If a regression should occur, we will need to fix up these affected
packages also.

[Other info]

The change came about when a user asked upstream to make -P / --pairs
shell parsable, in Issue 1201 upstream:

https://github.com/util-linux/util-linux/issues/1201

Karel Zak obliged, and it was implemented in the following commit:

commit 58b510e5805d8350c31bfb81a47bcd38ea9fdd7e
From: Karel Zak 
Date: Thu, 3 Dec 2020 12:14:10 +0100
Subject: libsmartcols: sanitize variable names on export output
Link: https://github.com/util-linux/util-linux/commit/58b510e5805d8350c31bfb81a47bcd38ea9fdd7e

I wrote to Karel Zak about the regressions introduced by the format
change, and asked him to revert it and instead implement the
shell-parsable logic as a new parameter.

This happened in upstream issue 1594:

https://github.com/util-linux/util-linux/issues/1594

Karel Zak was happy to oblige again, and we now have the following
commits:

338ad4a93 findmnt: commit missing flag
0f843ab64 lsblk: update --help output for -y
eba05f308 lsipc: add -y,--shell
152c17aa4 findmnt: add -y,--shell
9c7e81ff1 lslogins: add -y,--shell
25fb0638a lsblk: add -y/--shell
39679ea0c lsfd: use new libsmartcols functions
6fd0e3590 column: use new libsmartcols functions
0b3c2e80d include/carefulputc: remove unused function
3b5db50f7 libsmartcols: change "export" behavior, add "shellvar" flag

While we got the intended behaviour, these commits won't land until
util-linux 2.38, which will be released after Jammy. The other issue is
that they change a significant amount of code, nearly 1k lines, spread
over 10+ commits.

I wrote to ubuntu-devel asking for advice, on either 1) not changing
anything 2) backporting the 10+ new commits, or 3) simply reverting the
commit which changed the behaviour.

https://lists.ubuntu.com/archives/ubuntu-devel/2022-February/041870.html

ubuntu-devel had strong support for option (3).

Hence, we will revert the below commit to ensure Jammy can be deployed on
all existing MAAS releases:

58b510e580 libsmartcols: sanitize variable names on export output

** Affects: util-linu

[Touch-packages] [Bug 1960863] Re: armv8 paca: poly1305 users see segfaults when pointer authentication in use on AWS Graviton 3 instances

2022-02-14 Thread Matthew Ruffell
** Tags added: sts-sponsor

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to openssl in Ubuntu.
https://bugs.launchpad.net/bugs/1960863

Title:
  armv8 paca: poly1305 users see segfaults when pointer authentication
  in use on AWS Graviton 3 instances

Status in openssl package in Ubuntu:
  Fix Released
Status in openssl source package in Focal:
  In Progress



[Touch-packages] [Bug 1960863] Re: armv8 paca: poly1305 users see segfaults when pointer authentication in use on AWS Graviton 3 instances

2022-02-14 Thread Matthew Ruffell
Attached is a debdiff for openssl on Focal

** Patch added: "debdiff for openssl on Focal"
   
https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1960863/+attachment/5560898/+files/lp1960863_focal.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to openssl in Ubuntu.
https://bugs.launchpad.net/bugs/1960863

Title:
  armv8 paca: poly1305 users see segfaults when pointer authentication
  in use on AWS Graviton 3 instances

Status in openssl package in Ubuntu:
  Fix Released
Status in openssl source package in Focal:
  In Progress



[Touch-packages] [Bug 1960863] [NEW] armv8 paca: poly1305 users see segfaults when pointer authentication in use on AWS Graviton 3 instances

2022-02-14 Thread Matthew Ruffell
Public bug reported:

[Impact]

Support for hardware pointer authentication for armv8 systems was merged
in openssl 1.1.1f, but it contains a bug in the implementation of the
poly1305 message authentication code (MAC) routines. The bug causes the
calling program to fail pointer authentication and crash with a
segmentation fault.

You can easily test it by accessing any website that uses poly1305.
There is no workaround except to use a different MAC.

[Testcase]

This bug applies to armv8 systems which support pointer authentication.
Start an armv8 instance, such as a c7g graviton 3 instance on AWS, and
make sure the paca flag is present in /proc/cpuinfo:

$ grep paca /proc/cpuinfo
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp 
cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm 
dit uscat ilrcpc flagm ssbs paca pacg dcpodp svei8mm svebf16 i8mm bf16 dgh rng
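
The same check can be scripted. What follows is only a minimal Python sketch, assuming the arm64 /proc/cpuinfo layout shown above, where each core has a "Features" line listing flags separated by spaces. The function name has_cpu_feature is made up for this illustration and is not part of any tool mentioned in this bug.

```python
def has_cpu_feature(cpuinfo_text, flag):
    """Return True if any "Features" line lists `flag` as a whole word."""
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        # Match whole words only, so "paca" is not confused with "pacg".
        if sep and name.strip() == "Features" and flag in value.split():
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not a Linux system, or /proc is unavailable
    print("paca present:", has_cpu_feature(text, "paca"))
```

Splitting the line into words, rather than searching for a substring, avoids false matches against longer flag names.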

Next, attempt to connect to any website that uses poly1305 MAC.

$ curl https://services.gradle.org/distributions/gradle-7.2-bin.zip --output gradle-7.2.bin
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0Segmentation fault (core dumped)

There is a test package available in the following ppa:

https://launchpad.net/~mruffell/+archive/ubuntu/sf327917-test

Install it, and poly1305 operations will no longer segfault.

[Where problems could occur]

The patch reorders two operations: loading the SP and running AUTIASP.
Previously, AUTIASP ran before the correct SP had been loaded, so it had
nothing valid to check against. With the patch, the correct SP is loaded
first and AUTIASP then checks against it.

This only changes one code path for armv8 systems, and other
architectures are not affected. The change is also limited to the
poly1305 MAC.

If a regression were to occur, it would only affect users of the poly1305
MAC on armv8 with paca support.

[Other info]

The fix landed upstream in openssl 1.1.1i with the following commit:

commit 5795acffd8706e1cb584284ee5bb3a30986d0e75
Author: Ard Biesheuvel 
Date:   Tue Oct 27 18:02:40 2020 +0100
Subject: crypto/poly1305/asm: fix armv8 pointer authentication
Link: https://github.com/openssl/openssl/commit/5795acffd8706e1cb584284ee5bb3a30986d0e75

This commit is already present in Impish onward. Only Focal needs the
fix.

** Affects: openssl (Ubuntu)
 Importance: Undecided
 Status: Fix Released

** Affects: openssl (Ubuntu Focal)
 Importance: High
 Assignee: Matthew Ruffell (mruffell)
 Status: In Progress


** Tags: focal sts

** Also affects: openssl (Ubuntu Focal)
   Importance: Undecided
   Status: New

** Changed in: openssl (Ubuntu)
   Status: New => Fix Released

** Changed in: openssl (Ubuntu Focal)
   Status: New => In Progress

** Changed in: openssl (Ubuntu Focal)
   Importance: Undecided => High

** Changed in: openssl (Ubuntu Focal)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Tags added: focal sts

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to openssl in Ubuntu.
https://bugs.launchpad.net/bugs/1960863

Title:
  armv8 paca: poly1305 users see segfaults when pointer authentication
  in use on AWS Graviton 3 instances

Status in openssl package in Ubuntu:
  Fix Released
Status in openssl source package in Focal:
  In Progress

Bug description:
  [Impact]

  Support for hardware pointer authentication for armv8 systems was
  merged in openssl 1.1.1f, but it contains a bug in the implementation
  of the poly1305 message authentication code routines. The bug causes
  the calling program to fail pointer authentication and crash with a
  segmentation fault.

  You can easily test it by accessing any website that uses poly1305.
  There is no workaround except to use a different MAC.

  [Testcase]

  This bug applies to armv8 systems which support pointer
  authentication. Start an armv8 instance, such as a c7g Graviton 3
  instance on AWS, and make sure the paca flag is present in
  /proc/cpuinfo:

  $ grep paca /proc/cpuinfo
  Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs paca pacg dcpodp svei8mm svebf16 i8mm bf16 dgh rng

  Next, attempt to connect to any website that uses poly1305 MAC.

  $ curl https://services.gradle.org/distributions/gradle-7.2-bin.zip --output gradle-7.2.bin
  % Total % Received % Xferd Average Speed Time Time Time Current
  Dload Upload Total Spent Left Speed
  0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0Segmentation fault (core dumped)

  There is a test package available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf327917-test

  Install it, and poly1305 operations will no longer segfault.

  [Where problems could occur]

[Touch-packages] [Bug 1954724] Re: Removing unattended-upgrades removes ubuntu-server-minimal

2022-01-30 Thread Matthew Ruffell
This has been fixed as of ubuntu-meta 1.474
https://launchpad.net/ubuntu/+source/ubuntu-meta/1.474

$ sudo apt rdepends unattended-upgrades
unattended-upgrades
Reverse Depends:
  Recommends: python3-software-properties
  Recommends: ubuntu-mate-desktop
  Recommends: ubuntu-mate-core
  Depends: freedombox
  Recommends: fbx-all
  Recommends: ubuntu-server-minimal
  Recommends: ubuntu-server

** Changed in: ubuntu-meta (Ubuntu Jammy)
   Status: Confirmed => Fix Released

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to ubuntu-meta in Ubuntu.
https://bugs.launchpad.net/bugs/1954724

Title:
  Removing unattended-upgrades removes ubuntu-server-minimal

Status in ubuntu-meta package in Ubuntu:
  Fix Released
Status in ubuntu-meta source package in Impish:
  Confirmed
Status in ubuntu-meta source package in Jammy:
  Fix Released

Bug description:
  On Impish and later, removing unattended-upgrades also removes
  ubuntu-server-minimal, because ubuntu-server-minimal depends on
  unattended-upgrades.

  $ sudo apt remove unattended-upgrades
  ...
  The following packages will be REMOVED:
    ubuntu-server-minimal unattended-upgrades
  0 upgraded, 0 newly installed, 2 to remove and 0 not upgraded.

  This behaviour has changed since ubuntu-meta 1.471 [1] when the
  ubuntu-server-minimal metapackage was introduced, declaring
  unattended-upgrades as Depends.

  [1] https://launchpadlibrarian.net/550345392/ubuntu-meta_1.470_1.471.diff.gz

  On Focal, there was no such behaviour on a fresh ubuntu-server
  install:

  $ sudo apt remove unattended-upgrades
  ...
  The following packages will be REMOVED:
    unattended-upgrades
  0 upgraded, 0 newly installed, 1 to remove and 9 not upgraded.

  Removing unattended-upgrades is quite popular amongst our users, and
  they should be allowed to remove the package without removing the
  ubuntu-server-minimal metapackage.

  Looking at the source package for ubuntu-meta, unattended-upgrades is
  a Depends only for ubuntu-server-minimal. Maybe we should simply remove
  it or, instead, change it to Recommends?

  $ grep -Rin "unattended-upgrades" .
  ./server-minimal-armhf:23:unattended-upgrades
  ./server-minimal-riscv64:23:unattended-upgrades
  ./server-minimal-arm64:23:unattended-upgrades
  ./server-minimal-ppc64el:23:unattended-upgrades
  ./server-minimal-s390x:24:unattended-upgrades
  ./server-minimal-amd64:23:unattended-upgrades

  $ sudo apt rdepends unattended-upgrades
  unattended-upgrades
  Reverse Depends:
    Recommends: python3-software-properties
    Recommends: ubuntu-mate-desktop
    Recommends: ubuntu-mate-core
    Depends: freedombox
    Recommends: fbx-all
    Depends: ubuntu-server-minimal
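
To see at a glance which reverse dependencies are hard Depends (and so get removed together with unattended-upgrades) and which are only Recommends, the listing above can be grouped by relationship type. This is a rough Python sketch that assumes the output has exactly the shape quoted above; parse_rdepends and SAMPLE are names invented for this example.

```python
def parse_rdepends(output):
    """Group reverse dependencies by relationship type.

    Expects text shaped like the `apt rdepends` output quoted above:
    a package name, a "Reverse Depends:" header, then one
    "<Kind>: <package>" entry per line.
    """
    kinds = {}
    in_list = False
    for raw in output.splitlines():
        # Some entries may carry a leading "|" (alternatives); drop it.
        line = raw.strip().lstrip("|")
        if line == "Reverse Depends:":
            in_list = True
            continue
        if not in_list or ":" not in line:
            continue
        kind, _, package = line.partition(":")
        kinds.setdefault(kind.strip(), []).append(package.strip())
    return kinds


SAMPLE = """\
unattended-upgrades
Reverse Depends:
  Recommends: python3-software-properties
  Recommends: ubuntu-mate-desktop
  Recommends: ubuntu-mate-core
  Depends: freedombox
  Recommends: fbx-all
  Depends: ubuntu-server-minimal
"""

if __name__ == "__main__":
    hard = parse_rdepends(SAMPLE).get("Depends", [])
    print("Hard reverse dependencies:", ", ".join(hard))
```

Only the entries under "Depends" force removal; packages that merely recommend unattended-upgrades stay installed.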

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ubuntu-meta/+bug/1954724/+subscriptions




[Touch-packages] [Bug 1954724] Re: Removing unattended-upgrades removes ubuntu-server-minimal

2021-12-13 Thread Matthew Ruffell
** Description changed:

  On Impish and later, removing unattended-upgrades also removes ubuntu-
  server-minimal due to ubuntu-server-minimal depending on unattended-
  upgrades
  
  $ sudo apt remove unattended-upgrades
  ...
  The following packages will be REMOVED:
-   ubuntu-server-minimal unattended-upgrades
+   ubuntu-server-minimal unattended-upgrades
  0 upgraded, 0 newly installed, 2 to remove and 0 not upgraded.
  
  This behaviour has changed since ubuntu-meta 1.471 [1] when the ubuntu-
  server-minimal metapackage was introduced, declaring unattended-upgrades
  as Depends.
  
  [1] https://launchpadlibrarian.net/550345392/ubuntu-
  meta_1.470_1.471.diff.gz
  
  On Focal, there was no such behaviour on a fresh ubuntu-server install:
  
  $ sudo apt remove unattended-upgrades
  ...
  The following packages will be REMOVED:
-   unattended-upgrades
+   unattended-upgrades
  0 upgraded, 0 newly installed, 1 to remove and 9 not upgraded.
  
  Removing unattended-upgrades is quite popular amongst our users, and
  they should be allowed to remove the package without removing the
  ubuntu-server-minimal metapackage.
  
  Looking at the source package for ubuntu-meta, unattended-upgrades is
  only Depends for ubuntu-server-minimal, maybe we should simply remove
- it?
+ it, or instead, change to recommends?
  
  $ grep -Rin "unattended-upgrades" .
  ./server-minimal-armhf:23:unattended-upgrades
  ./server-minimal-riscv64:23:unattended-upgrades
  ./server-minimal-arm64:23:unattended-upgrades
  ./server-minimal-ppc64el:23:unattended-upgrades
  ./server-minimal-s390x:24:unattended-upgrades
  ./server-minimal-amd64:23:unattended-upgrades
  
  $ sudo apt rdepends unattended-upgrades
  unattended-upgrades
  Reverse Depends:
-   Recommends: python3-software-properties
-   Recommends: ubuntu-mate-desktop
-   Recommends: ubuntu-mate-core
-   Depends: freedombox
-   Recommends: fbx-all
-   Depends: ubuntu-server-minimal
+   Recommends: python3-software-properties
+   Recommends: ubuntu-mate-desktop
+   Recommends: ubuntu-mate-core
+   Depends: freedombox
+   Recommends: fbx-all
+   Depends: ubuntu-server-minimal

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to ubuntu-meta in Ubuntu.
https://bugs.launchpad.net/bugs/1954724

Title:
  Removing unattended-upgrades removes ubuntu-server-minimal

Status in ubuntu-meta package in Ubuntu:
  New
Status in ubuntu-meta source package in Impish:
  New
Status in ubuntu-meta source package in Jammy:
  New

Bug description:
  On Impish and later, removing unattended-upgrades also removes
  ubuntu-server-minimal, because ubuntu-server-minimal depends on
  unattended-upgrades.

  $ sudo apt remove unattended-upgrades
  ...
  The following packages will be REMOVED:
    ubuntu-server-minimal unattended-upgrades
  0 upgraded, 0 newly installed, 2 to remove and 0 not upgraded.

  This behaviour has changed since ubuntu-meta 1.471 [1] when the
  ubuntu-server-minimal metapackage was introduced, declaring
  unattended-upgrades as Depends.

  [1] https://launchpadlibrarian.net/550345392/ubuntu-meta_1.470_1.471.diff.gz

  On Focal, there was no such behaviour on a fresh ubuntu-server
  install:

  $ sudo apt remove unattended-upgrades
  ...
  The following packages will be REMOVED:
    unattended-upgrades
  0 upgraded, 0 newly installed, 1 to remove and 9 not upgraded.

  Removing unattended-upgrades is quite popular amongst our users, and
  they should be allowed to remove the package without removing the
  ubuntu-server-minimal metapackage.

  Looking at the source package for ubuntu-meta, unattended-upgrades is
  a Depends only for ubuntu-server-minimal. Maybe we should simply remove
  it or, instead, change it to Recommends?

  $ grep -Rin "unattended-upgrades" .
  ./server-minimal-armhf:23:unattended-upgrades
  ./server-minimal-riscv64:23:unattended-upgrades
  ./server-minimal-arm64:23:unattended-upgrades
  ./server-minimal-ppc64el:23:unattended-upgrades
  ./server-minimal-s390x:24:unattended-upgrades
  ./server-minimal-amd64:23:unattended-upgrades

  $ sudo apt rdepends unattended-upgrades
  unattended-upgrades
  Reverse Depends:
    Recommends: python3-software-properties
    Recommends: ubuntu-mate-desktop
    Recommends: ubuntu-mate-core
    Depends: freedombox
    Recommends: fbx-all
    Depends: ubuntu-server-minimal

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ubuntu-meta/+bug/1954724/+subscriptions




[Touch-packages] [Bug 1954724] [NEW] Removing unattended-upgrades removes ubuntu-server-minimal

2021-12-13 Thread Matthew Ruffell
Public bug reported:

On Impish and later, removing unattended-upgrades also removes
ubuntu-server-minimal, because ubuntu-server-minimal depends on
unattended-upgrades.

$ sudo apt remove unattended-upgrades
...
The following packages will be REMOVED:
  ubuntu-server-minimal unattended-upgrades
0 upgraded, 0 newly installed, 2 to remove and 0 not upgraded.

This behaviour has changed since ubuntu-meta 1.471 [1] when the ubuntu-
server-minimal metapackage was introduced, declaring unattended-upgrades
as Depends.

[1] https://launchpadlibrarian.net/550345392/ubuntu-meta_1.470_1.471.diff.gz

On Focal, there was no such behaviour on a fresh ubuntu-server install:

$ sudo apt remove unattended-upgrades
...
The following packages will be REMOVED:
  unattended-upgrades
0 upgraded, 0 newly installed, 1 to remove and 9 not upgraded.

Removing unattended-upgrades is quite popular amongst our users, and
they should be allowed to remove the package without removing the
ubuntu-server-minimal metapackage.

Looking at the source package for ubuntu-meta, unattended-upgrades is
a Depends only for ubuntu-server-minimal. Maybe we should simply remove
it?

$ grep -Rin "unattended-upgrades" .
./server-minimal-armhf:23:unattended-upgrades
./server-minimal-riscv64:23:unattended-upgrades
./server-minimal-arm64:23:unattended-upgrades
./server-minimal-ppc64el:23:unattended-upgrades
./server-minimal-s390x:24:unattended-upgrades
./server-minimal-amd64:23:unattended-upgrades

$ sudo apt rdepends unattended-upgrades
unattended-upgrades
Reverse Depends:
  Recommends: python3-software-properties
  Recommends: ubuntu-mate-desktop
  Recommends: ubuntu-mate-core
  Depends: freedombox
  Recommends: fbx-all
  Depends: ubuntu-server-minimal

** Affects: ubuntu-meta (Ubuntu)
 Importance: Undecided
 Status: New

** Affects: ubuntu-meta (Ubuntu Impish)
 Importance: Undecided
 Status: New

** Affects: ubuntu-meta (Ubuntu Jammy)
 Importance: Undecided
 Status: New

** Also affects: ubuntu-meta (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Also affects: ubuntu-meta (Ubuntu Impish)
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to ubuntu-meta in Ubuntu.
https://bugs.launchpad.net/bugs/1954724

Title:
  Removing unattended-upgrades removes ubuntu-server-minimal

Status in ubuntu-meta package in Ubuntu:
  New
Status in ubuntu-meta source package in Impish:
  New
Status in ubuntu-meta source package in Jammy:
  New

Bug description:
  On Impish and later, removing unattended-upgrades also removes
  ubuntu-server-minimal, because ubuntu-server-minimal depends on
  unattended-upgrades.

  $ sudo apt remove unattended-upgrades
  ...
  The following packages will be REMOVED:
    ubuntu-server-minimal unattended-upgrades
  0 upgraded, 0 newly installed, 2 to remove and 0 not upgraded.

  This behaviour has changed since ubuntu-meta 1.471 [1] when the
  ubuntu-server-minimal metapackage was introduced, declaring
  unattended-upgrades as Depends.

  [1] https://launchpadlibrarian.net/550345392/ubuntu-meta_1.470_1.471.diff.gz

  On Focal, there was no such behaviour on a fresh ubuntu-server
  install:

  $ sudo apt remove unattended-upgrades
  ...
  The following packages will be REMOVED:
    unattended-upgrades
  0 upgraded, 0 newly installed, 1 to remove and 9 not upgraded.

  Removing unattended-upgrades is quite popular amongst our users, and
  they should be allowed to remove the package without removing the
  ubuntu-server-minimal metapackage.

  Looking at the source package for ubuntu-meta, unattended-upgrades is
  a Depends only for ubuntu-server-minimal. Maybe we should simply remove
  it?

  $ grep -Rin "unattended-upgrades" .
  ./server-minimal-armhf:23:unattended-upgrades
  ./server-minimal-riscv64:23:unattended-upgrades
  ./server-minimal-arm64:23:unattended-upgrades
  ./server-minimal-ppc64el:23:unattended-upgrades
  ./server-minimal-s390x:24:unattended-upgrades
  ./server-minimal-amd64:23:unattended-upgrades

  $ sudo apt rdepends unattended-upgrades
  unattended-upgrades
  Reverse Depends:
    Recommends: python3-software-properties
    Recommends: ubuntu-mate-desktop
    Recommends: ubuntu-mate-core
    Depends: freedombox
    Recommends: fbx-all
    Depends: ubuntu-server-minimal

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ubuntu-meta/+bug/1954724/+subscriptions




[Touch-packages] [Bug 1930359] Re: glib2.0: Uninitialised memory is written to gschema.compiled, failure to parse this file leads to gdm, gnome-shell failing to start

2021-07-13 Thread Matthew Ruffell
Performing verification for Focal.

I will first reproduce the problem with glib2.0 2.64.6-1~ubuntu20.04.3
from -security with the libglib2.0-0 libglib2.0-bin libglib2.0-data
packages.

I deleted all existing schemas from /usr/share/glib-2.0/schemas and
replaced them with a set of schemas from my customer which reproduce
the problem easily.

$ cd /usr/share/glib-2.0/schemas/
$ sudo rm *
$ sudo cp ~/schemas/* .

The gschemas.compiled from the customer has been corrupted, and when I
reboot, gdm fails to start and I get a blank screen with a blinking
cursor.

The sha256 of the customer's corrupted gschemas.compiled is:

$ sudo openssl sha256 /usr/share/glib-2.0/schemas/gschemas.compiled
SHA256(/usr/share/glib-2.0/schemas/gschemas.compiled)= 2c98dc9a7fdbac858a8d5ca7e4dd813f16058a46dba2c54b5239cd8cdba5bb3e

When I ssh back in, and recompile the file:

$ sudo glib-compile-schemas /usr/share/glib-2.0/schemas
Error parsing key “logout” in schema “org.gnome.settings-daemon.plugins.media-keys” as specified in override file “/usr/share/glib-2.0/schemas/50_vmware_viewagent.gschema.override”: 0-22:can not parse as value of type 'as'. Ignoring override for this key.
$ sudo openssl sha256 /usr/share/glib-2.0/schemas/gschemas.compiled
SHA256(/usr/share/glib-2.0/schemas/gschemas.compiled)= 78163b5fefbd6320ce0d355c9531bf657a4f4dc15f057d95ef144323cd56

The sha256 has changed. Doing a bindiff, I see:

$ sudo cmp -l /home/ubuntu/schemas/gschemas.compiled /usr/share/glib-2.0/schemas/gschemas.compiled | gawk '{printf "%08X %02X %02X\n", $1, strtonum(0$2), strtonum(0$3)}'
376F E3 4F
3771 A4 C2

Two bytes differ. These bytes are the uninitialised memory
this bug is about. When I reboot, gdm starts fine, but only because
this time I got lucky and the parser for the gschemas.compiled file
accepts 4F and C2. Other byte combinations are not accepted, and
result in a corrupted gschemas.compiled file.
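
For readers without gawk, the cmp pipeline used above can be approximated in a few lines of Python. This is a hedged sketch rather than a drop-in replacement: byte_diff and format_diff are illustrative names, the files are read fully into memory, and only the common prefix of the two files is compared.

```python
from pathlib import Path


def byte_diff(path_a, path_b):
    """Return (offset, byte_a, byte_b) for each differing byte.

    Offsets are 1-based, matching `cmp -l`. Only the common prefix
    of the two files is compared.
    """
    a = Path(path_a).read_bytes()
    b = Path(path_b).read_bytes()
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def format_diff(diffs):
    """Render each difference as hex columns: offset, first byte, second byte."""
    return ["%08X %02X %02X" % d for d in diffs]


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 3:
        print("\n".join(format_diff(byte_diff(sys.argv[1], sys.argv[2]))))
```

Saved under a name of your choosing and given the saved copy and the freshly compiled gschemas.compiled as its two arguments, it prints one line per differing byte in the same column order as the gawk format string.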

Re-compiling the file again:

$ sudo glib-compile-schemas /usr/share/glib-2.0/schemas
Error parsing key “logout” in schema “org.gnome.settings-daemon.plugins.media-keys” as specified in override file “/usr/share/glib-2.0/schemas/50_vmware_viewagent.gschema.override”: 0-22:can not parse as value of type 'as'. Ignoring override for this key.
$ sudo openssl sha256 /usr/share/glib-2.0/schemas/gschemas.compiled
SHA256(/usr/share/glib-2.0/schemas/gschemas.compiled)= 460c70faca7afc26fa88a0e5918d312478e15f20ad84f4afaa5d17627a823e01

The sha256 changed, and if we bindiff, the bytes have changed:

$ sudo cmp -l /home/ubuntu/schemas/gschemas.compiled /usr/share/glib-2.0/schemas/gschemas.compiled | gawk '{printf "%08X %02X %02X\n", $1, strtonum(0$2), strtonum(0$3)}'
376F E3 A6
3771 A4 A1

If we run glib-compile-schemas through valgrind, it reports that we are
writing uninitialised memory to the file:

https://paste.ubuntu.com/p/sxrQtbswpw/

I then enabled -proposed and installed libglib2.0-0 libglib2.0-bin
libglib2.0-data version 2.64.6-1~ubuntu20.04.4.

Now, when I re-compile the gschemas.compiled file, the sha256 matches
every time, meaning no more non-deterministic behaviour caused by
writing uninitialised memory to disk:

$ sudo glib-compile-schemas /usr/share/glib-2.0/schemas
Error parsing key “logout” in schema “org.gnome.settings-daemon.plugins.media-keys” as specified in override file “/usr/share/glib-2.0/schemas/50_vmware_viewagent.gschema.override”: 0-22:can not parse as value of type 'as'. Ignoring override for this key.
$ sudo openssl sha256 /usr/share/glib-2.0/schemas/gschemas.compiled
SHA256(/usr/share/glib-2.0/schemas/gschemas.compiled)= cd9132d18b596a304251cd1eb50b64aa6fd7511a312906f9a49e1975a319fbf1

$ sudo glib-compile-schemas /usr/share/glib-2.0/schemas
Error parsing key “logout” in schema “org.gnome.settings-daemon.plugins.media-keys” as specified in override file “/usr/share/glib-2.0/schemas/50_vmware_viewagent.gschema.override”: 0-22:can not parse as value of type 'as'. Ignoring override for this key.
$ sudo openssl sha256 /usr/share/glib-2.0/schemas/gschemas.compiled
SHA256(/usr/share/glib-2.0/schemas/gschemas.compiled)= cd9132d18b596a304251cd1eb50b64aa6fd7511a312906f9a49e1975a319fbf1
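
The compile-then-hash check above can be repeated automatically. The Python sketch below is one possible way to do it, not part of the SRU verification itself: sha256_of and distinct_digests are hypothetical helper names, and the script assumes it is run with enough privileges to rewrite gschemas.compiled in the given schema directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 digest of the file at `path`."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def distinct_digests(build, output_path, runs=3):
    """Call `build` several times and return the set of output digests seen."""
    digests = set()
    for _ in range(runs):
        build()
        digests.add(sha256_of(output_path))
    return digests


if __name__ == "__main__":
    import subprocess
    import sys
    if len(sys.argv) == 2:
        schema_dir = sys.argv[1]

        def build():
            # Recompile the schema cache in place, as in the transcript above.
            subprocess.run(["glib-compile-schemas", schema_dir], check=True)

        seen = distinct_digests(build, Path(schema_dir) / "gschemas.compiled")
        verdict = "deterministic" if len(seen) == 1 else "non-deterministic"
        print(verdict, sorted(seen))
```

A single digest across all runs corresponds to the fixed behaviour; more than one digest corresponds to the behaviour seen with 2.64.6-1~ubuntu20.04.3.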

Doing a bindiff, I see the changed bytes from before are now all zeros,
which is what the patch initialises the buffer to:

$ sudo cmp -l /home/ubuntu/schemas/gschemas.compiled /usr/share/glib-2.0/schemas/gschemas.compiled | gawk '{printf "%08X %02X %02X\n", $1, strtonum(0$2), strtonum(0$3)}'
376F E3 00
3771 A4 00
3772 55 00

Doing a run through valgrind, we no longer get a report about writing to
uninitialised memory:

https://paste.ubuntu.com/p/z52DGZcdz3/

Rebooting, the VM comes up and GDM starts properly, so glib can parse
the gschemas.compiled file without any issues.

Wonderful. The problem is fixed by the package in -proposed, happy to
mark as verified.

** Tags removed: sts-sponsor verification-needed verification-needed-focal
** Tags added: 

[Touch-packages] [Bug 1930359] Re: glib2.0: Uninitialised memory is written to gschema.compiled, failure to parse this file leads to gdm, gnome-shell failing to start

2021-07-11 Thread Matthew Ruffell
Attached is a debdiff for glib2.0 on Focal which fixes this problem.

** Patch added: "Debdiff for glib2.0 for Focal"
   https://bugs.launchpad.net/ubuntu/+source/glib2.0/+bug/1930359/+attachment/5510466/+files/lp1930359_focal.debdiff

** Tags removed: regression-update
** Tags added: sts-sponsor

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to glib2.0 in Ubuntu.
https://bugs.launchpad.net/bugs/1930359

Title:
  glib2.0: Uninitialised memory is written to gschema.compiled, failure
  to parse this file leads to gdm, gnome-shell failing to start

Status in glib2.0 package in Ubuntu:
  Fix Released
Status in glib2.0 source package in Focal:
  In Progress

Bug description:
  [Impact]

  A recent SRU of mutter 3.36.9-0ubuntu0.20.04.1 caused an outage for a
  user with 300 VDIs running Focal, where GNOME applications would fail
  to start, and if you reboot, gdm and gnome-shell both fail to start,
  and you are left with a black screen and a blinking cursor.

  After much investigation, mutter was not at fault. Instead, mutter-
  common calls the libglib2.0-0 hook on upgrade:

  Processing triggers for libglib2.0-0:amd64 (2.64.6-1~ubuntu20.04.3) ...

  This in turn calls glib-compile-schemas to recompile the gsettings
  gschema cache, from the files in /usr/share/glib-2.0/schemas/. The
  result is a binary gschemas.compiled file, which is loaded by
  libglib2.0 on every invocation of a GNOME application, or gdm or
  gnome-shell to fetch application default settings.

  Now, glib2.0 2.64.6-1~ubuntu20.04.3 in Focal has some
  non-deterministic behaviour when calling glib-compile-schemas, causing
  generated gschemas.compiled files to have differing contents on each
  run:

  # glib-compile-schemas /usr/share/glib-2.0/schemas
  # cmp -l /home/ubuntu/schemas/gschemas.compiled /usr/share/glib-2.0/schemas/gschemas.compiled | gawk '{printf "%08X %02X %02X\n", $1, strtonum(0$2), strtonum(0$3)}'
  376F E3 D0
  3771 A4 DB

  # glib-compile-schemas /usr/share/glib-2.0/schemas
  # cmp -l /home/ubuntu/schemas/gschemas.compiled /usr/share/glib-2.0/schemas/gschemas.compiled | gawk '{printf "%08X %02X %02X\n", $1, strtonum(0$2), strtonum(0$3)}'
  376F E3 C3
  3771 A4 98

  # glib-compile-schemas /usr/share/glib-2.0/schemas
  # cmp -l /home/ubuntu/schemas/gschemas.compiled /usr/share/glib-2.0/schemas/gschemas.compiled | gawk '{printf "%08X %02X %02X\n", $1, strtonum(0$2), strtonum(0$3)}'
  376F E3 68
  3771 A4 30
  3772 55 56

  The bytes on the left are from a corrupted gschemas.compiled provided
  by an affected user. The changing bytes on the right are
  non-deterministic.

  I ran valgrind over glib-compile-schemas, and found that we are
  writing uninitialised memory to the file.

  https://paste.ubuntu.com/p/hvZccwdzxz/

  What is happening is that a submodule of glib, gvdb, contains the
  logic for serialising the gschema data structures, and when it
  allocates a buffer to store the eventual gschemas.compiled file, it
  does not initialise it.

  When we populate the fields in the buffer, some bytes are never
  overwritten, and these junk bytes find themselves written to
  gschemas.compiled.
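
The effect described above can be modelled in a few lines. The Python sketch below is purely illustrative and is not gvdb's code: Python cannot expose truly uninitialised memory, so the "junk" is simulated with a fill byte, and the 8-byte alignment and the names serialise, ALIGN and junk_byte are assumptions made for the example.

```python
ALIGN = 8  # hypothetical alignment, chosen only for illustration


def serialise(items, junk_byte=None):
    """Pack byte strings into one buffer, each starting on an ALIGN boundary.

    junk_byte=None models a zero-initialised allocation; any other value
    models whatever happened to be in memory when the buffer was allocated.
    """
    # Round each item's length up to the next multiple of ALIGN.
    padded = [-(-len(item) // ALIGN) * ALIGN for item in items]
    fill = 0 if junk_byte is None else junk_byte
    buf = bytearray([fill]) * sum(padded)
    offset = 0
    for item, size in zip(items, padded):
        # Only the item itself is written; the padding after it never is.
        buf[offset:offset + len(item)] = item
        offset += size
    return bytes(buf)


if __name__ == "__main__":
    items = [b"logout", b"media-keys"]
    print(serialise(items, junk_byte=0xE3).hex())  # padding carries junk
    print(serialise(items, junk_byte=0xA4).hex())  # same input, different output
    print(serialise(items).hex())                  # zeroed buffer: stable output
```

With a zero-filled buffer the padding is always zero, so the same input always serialises to the same bytes; with a junk-filled buffer the padding leaks into the output and changes from run to run.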

  On boot, when gdm and gnome-shell attempt to parse and load this
  corrupted gschemas.compiled file, glib can't parse the junk bytes and
  raises an error, which propagates up to a breakpoint in glib logging.
  No debugger is present, so the kernel traps the breakpoint and
  terminates the calling application, e.g. gdm.

  The result is that the user is left staring at a black screen with a
  blinking cursor.

  [Testcase]

  On a Focal system, simply run valgrind over glib-compile-schemas:

  # valgrind glib-compile-schemas /usr/share/glib-2.0/schemas

  You will get output like this, with the warning "Syscall param
  write(buf) points to uninitialised byte(s)":

  https://paste.ubuntu.com/p/hvZccwdzxz/

  If you happen to have a large number of gschema overrides present on
  your system, like my affected user does, you can save a copy of a
  generated gschemas.compiled to your home directory and bindiff it
  against recompiles:

  # glib-compile-schemas /usr/share/glib-2.0/schemas
  # cp /usr/share/glib-2.0/schemas/gschemas.compiled /home/ubuntu/schemas/gschemas.compiled
  # glib-compile-schemas /usr/share/glib-2.0/schemas
  # cmp -l /home/ubuntu/schemas/gschemas.compiled /usr/share/glib-2.0/schemas/gschemas.compiled | gawk '{printf "%08X %02X %02X\n", $1, strtonum(0$2), strtonum(0$3)}'
  376F E3 C3
  3771 A4 98

  If you install the test package from the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf311791-test

  When you run valgrind, it will report a clean run with no writes of
  uninitialised memory, and all invocations of glib-compile-schemas
  will be deterministic, and generate the same file with the same sha256
  hash every time. The unwritten bytes if you do a 

[Touch-packages] [Bug 1930359] Re: gdm fails to start in a VMware Horizon VDI environment with latest mutter 3.36.9-0ubuntu0.20.04.1 in focal-updates

2021-07-11 Thread Matthew Ruffell
** No longer affects: mutter (Ubuntu)

** No longer affects: mutter (Ubuntu Focal)

** Changed in: glib2.0 (Ubuntu)
   Status: New => Fix Released

** Changed in: glib2.0 (Ubuntu Focal)
   Status: New => In Progress

** Changed in: glib2.0 (Ubuntu Focal)
   Importance: Undecided => High

** Summary changed:

- gdm fails to start in a VMware Horizon VDI environment with latest mutter 3.36.9-0ubuntu0.20.04.1 in focal-updates
+ glib2.0: Uninitialised memory is written to gschema.compiled, failure to parse this file leads to gdm, gnome-shell failing to start

** Description changed:

  [Impact]
  
- gdm fails to start in a VMware Horizon VDI environment, with Nvidia GRID
- gpus passed into the VDIs.
+ A recent SRU of mutter 3.36.9-0ubuntu0.20.04.1 caused an outage for a
+ user with 300 VDIs running Focal, where GNOME applications would fail to
+ start, and if you reboot, gdm and gnome-shell both fail to start, and
+ you are left with a black screen and a blinking cursor.
  
- Downgrading mutter from 3.36.9-0ubuntu0.20.04.1 to 3.36.1-3ubuntu3 in
- -release fixes the issue, and the issue does not occur with
- 3.36.7+git20201123-0.20.04.1.
+ After much investigation, mutter was not at fault. Instead,
+ mutter-common calls the libglib2.0-0 hook on upgrade:
  
- Currently looking into what landed in bug 1919143 and bug 1905825.
+ Processing triggers for libglib2.0-0:amd64 (2.64.6-1~ubuntu20.04.3) ...
+ 
+ This in turn calls glib-compile-schemas to recompile the gsettings
+ gschema cache, from the files in /usr/share/glib-2.0/schemas/. The
+ result is a binary gschemas.compiled file, which is loaded by libglib2.0
+ on every invocation of a GNOME application, or gdm or gnome-shell to
+ fetch application default settings.
+ 
+ Now, glib2.0 2.64.6-1~ubuntu20.04.3 in Focal has some non-deterministic
+ behaviour when calling glib-compile-schemas, causing generated
+ gschemas.compiled files to have differing contents on each run:
+ 
+ # glib-compile-schemas /usr/share/glib-2.0/schemas
+ # cmp -l /home/ubuntu/schemas/gschemas.compiled /usr/share/glib-2.0/schemas/gschemas.compiled | gawk '{printf "%08X %02X %02X\n", $1, strtonum(0$2), strtonum(0$3)}'
+ 376F E3 D0
+ 3771 A4 DB
+ 
+ # glib-compile-schemas /usr/share/glib-2.0/schemas
+ # cmp -l /home/ubuntu/schemas/gschemas.compiled /usr/share/glib-2.0/schemas/gschemas.compiled | gawk '{printf "%08X %02X %02X\n", $1, strtonum(0$2), strtonum(0$3)}'
+ 376F E3 C3
+ 3771 A4 98
+ 
+ # glib-compile-schemas /usr/share/glib-2.0/schemas
+ # cmp -l /home/ubuntu/schemas/gschemas.compiled /usr/share/glib-2.0/schemas/gschemas.compiled | gawk '{printf "%08X %02X %02X\n", $1, strtonum(0$2), strtonum(0$3)}'
+ 376F E3 68
+ 3771 A4 30
+ 3772 55 56
+ 
+ The bytes on the left are from a corrupted gschemas.compiled provided by
+ an affected user. The changing bytes on the right are non-deterministic.
+ 
+ I ran valgrind over glib-compile-schemas, and found that we are writing
+ to uninitialised memory.
+ 
+ https://paste.ubuntu.com/p/hvZccwdzxz/
+ 
+ What is happening is that a submodule of glib, gvdb, contains the logic
+ for serialising the gschema data structures, and when it allocates a
+ buffer to store the eventual gschemas.compiled file, it does not
+ initialise it.
+ 
+ When we populate the fields in the buffer, some bytes are never
+ overwritten, and these junk bytes find themselves written to
+ gschemas.compiled.
+ 
+ On boot, when gdm and gnome-shell attempt to parse and load this
+ corrupted gschemas.compiled file, glib can't parse the junk bytes and
+ raises an error, which propagates up to a breakpoint in glib logging.
+ No debugger is present, so the kernel traps the breakpoint and
+ terminates the calling application, e.g. gdm.
+ 
+ The result is that the user is left staring at a black screen with a
+ blinking cursor.
  
  [Testcase]
  
+ On a Focal system, simply run valgrind over glib-compile-schemas:
+ 
+ # valgrind glib-compile-schemas /usr/share/glib-2.0/schemas
+ 
+ You will get output like this, with the warning "Syscall param
+ write(buf) points to uninitialised byte(s)":
+ 
+ https://paste.ubuntu.com/p/hvZccwdzxz/
+ 
+ If you happen to have a large number of gschema overrides present on
+ your system, like my affected user does, you can save a copy of a
+ generated gschemas.compiled to your home directory and bindiff it
+ against recompiles:
+ 
+ # glib-compile-schemas /usr/share/glib-2.0/schemas
+ # cp /usr/share/glib-2.0/schemas/gschemas.compiled /home/ubuntu/schemas/gschemas.compiled
+ # glib-compile-schemas /usr/share/glib-2.0/schemas
+ # cmp -l /home/ubuntu/schemas/gschemas.compiled /usr/share/glib-2.0/schemas/gschemas.compiled | gawk '{printf "%08X %02X %02X\n", $1, strtonum(0$2), strtonum(0$3)}'
+ 376F E3 C3
+ 3771 A4 98
+ 
+ If you install the test package from the following ppa:
+ 
+ https://launchpad.net/~mruffell/+archive/ubuntu/sf311791-test
+ 
+ When you run valgrind, it will 

[Touch-packages] [Bug 1927796] Re: [SRU]pam_tally2 can cause accounts to be locked by correct password. pam_faillock use is the recommended fix

2021-05-18 Thread Matthew Ruffell
Performing verification for Groovy

I enabled -proposed and installed libpam-modules libpam-modules-bin
libpam-runtime libpam0g version 1.3.1-5ubuntu6.20.10.1

From there, I set the pam_faillock configuration in:

/etc/security/faillock.conf:
deny = 3
unlock_time = 120

and also:

/etc/pam.d/common-auth:

# here are the per-package modules (the "Primary" block)
auth    requisite                       pam_faillock.so preauth
auth    [success=1 default=ignore]      pam_unix.so nullok_secure
auth    [default=die]                   pam_faillock.so authfail
auth    sufficient                      pam_faillock.so authsucc
# here's the fallback if no module succeeds
auth    requisite                       pam_deny.so
# prime the stack with a positive return value if there isn't one already;
# this avoids us returning an error just because nothing sets a success code
# since the modules above will each just jump around
auth    required                        pam_permit.so
# and here are more per-package modules (the "Additional" block)
auth    optional                        pam_cap.so
# end of pam-auth-update config

From there, I created a new user "dave", and rebooted the system.

I connected via ssh with the "dave" user and used the wrong password 5 times.
I then tried with the correct password and found the account to be locked.

I waited 2 minutes, and tried again with the correct password, and I was logged
in.
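
As a rough illustration of the lockout behaviour checked above, here is a
toy Python model of a deny / unlock_time policy. It is only a sketch under
stated assumptions, not pam_faillock's implementation: the class and method
names are made up for this example, it ignores fail_interval and any other
options, and it assumes that attempts made while the account is locked are
rejected without being recorded.

```python
from dataclasses import dataclass, field


@dataclass
class FailLockModel:
    """Toy model of a deny/unlock_time policy (hypothetical, not pam_faillock code)."""
    deny: int = 3            # failures needed to lock the account
    unlock_time: int = 120   # seconds after the last failure until unlock
    failures: list = field(default_factory=list)  # timestamps of recorded failures

    def is_locked(self, now: float) -> bool:
        if len(self.failures) < self.deny:
            return False
        return now - self.failures[-1] < self.unlock_time

    def attempt(self, now: float, password_ok: bool) -> bool:
        """Return True if the login is allowed at time `now` (in seconds)."""
        if self.is_locked(now):
            return False              # rejected before the password is checked
        if len(self.failures) >= self.deny:
            self.failures.clear()     # the lock has expired
        if password_ok:
            self.failures.clear()     # a successful login resets the record
            return True
        self.failures.append(now)
        return False


m = FailLockModel(deny=3, unlock_time=120)
results = [m.attempt(t, False) for t in (0, 5, 9, 12, 15)]  # five wrong passwords
recorded = len(m.failures)            # only the first three are recorded
locked_try = m.attempt(20, True)      # correct password while locked
later_try = m.attempt(9 + 121, True)  # correct password after unlock_time
print(results, recorded, locked_try, later_try)
```

In this model only three failures are recorded even though five wrong
passwords were tried, which matches the three rows faillock reports.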

When the account was locked, I logged in as the "ubuntu" user and ran:

$ sudo faillock --user dave
dave:
When                 Type   Source          Valid
2021-05-19 02:08:53  RHOST  192.168.122.1   V
2021-05-19 02:08:58  RHOST  192.168.122.1   V
2021-05-19 02:09:02  RHOST  192.168.122.1   V

And I could see the times that "dave" was locked.

I also tested resetting via:

$ sudo faillock --user dave --reset

and "dave" was allowed to log in again.

My tests agree with what Richard sees. Marking as verified for Groovy.

** Tags removed: verification-needed verification-needed-groovy
** Tags added: verification-done verification-done-groovy

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to pam in Ubuntu.
https://bugs.launchpad.net/bugs/1927796

Title:
  [SRU]pam_tally2 can cause accounts to be locked by correct password.
  pam_faillock use is the recommended fix

Status in pam package in Ubuntu:
  Fix Committed
Status in pam source package in Bionic:
  Fix Committed
Status in pam source package in Focal:
  Fix Committed
Status in pam source package in Groovy:
  Fix Committed
Status in pam source package in Hirsute:
  Fix Committed
Status in pam source package in Impish:
  Fix Committed

Bug description:
  [IMPACT]
  There is a known issue in pam_tally2 which may cause an account to be
  locked down even with the correct password, in a busy node environment
  where simultaneous logins take place
  (https://github.com/linux-pam/linux-pam/issues/71).

  There are already two customer cases from Canonical clients
  complaining about this behavior (00297697 and 00303806).

  Also, potentially, this will cause further problems in the future,
  since both STIG benchmarks and CIS benchmarks rely on pam_tally2 to
  lock accounts when wrong passwords are used. Both benchmarks, but
  especially STIG, require the use of a lot of audit rules, which can
  lead to a busy node environment.

  The issue impacts all pam_tally2 versions distributed in all currently
  supported Ubuntu versions and also the next unreleased one. Note that,
  according to https://github.com/linux-pam/linux-pam/issues/71, there
  is no plan to fix this issue!

  [FIX]
  This fix proposes to add the pam_faillock module to the PAM package, so
  users of pam_tally2 having issues can migrate to pam_faillock. We also
  plan to modify the current STIG benchmarks to rely on pam_faillock
  instead of pam_tally2, but in order to do so, we need the pam_faillock
  module to be available.

  Note that we don't propose to remove pam_tally2, since not every user
  of this module is affected.

  [TEST]
  Tested on a VM installed with the Focal server iso and on another with
  the Bionic server iso. Enabled the pam_faillock module as recommended by
  its man page. Then tried to log in over ssh with an incorrect password
  until the account got locked. Waited for the configured grace time to
  unlock and logged in using the correct password.

  Note that, since the pam_tally2 issue is caused by a race condition
  in a hard-to-recreate environment (we could not even reproduce it
  with pam_tally2), we could not reproduce the conditions to test
  pam_faillock with.
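
  To make the race easier to picture, here is a small deterministic Python
  simulation. It is a sketch under an assumption, not pam_tally2's code: it
  assumes the tally is incremented before the password is checked and only
  reset after a login succeeds, as the pam_tally2 man page describes, and
  that access is denied once the tally exceeds deny. The function name and
  the schedule format are invented for this example, and the upstream issue
  may involve further details.

```python
def run_logins(schedule, deny=3):
    """Replay ('auth', id) / ('reset', id) steps against one shared tally.

    Every login in the schedule uses the CORRECT password.
    Returns the sorted ids of logins that were denied anyway.
    """
    tally = 0
    denied = set()
    for step, login in schedule:
        if step == "auth":
            tally += 1                # bumped before the password is checked
            if tally > deny:
                denied.add(login)     # looks like too many failed attempts
        elif step == "reset" and login not in denied:
            tally = 0                 # a successful login clears the tally
    return sorted(denied)


# Five logins, one after another: each one resets the tally before the next.
sequential = [(step, i) for i in range(5) for step in ("auth", "reset")]

# Five logins at once: all increments land before any reset.
overlapping = [("auth", i) for i in range(5)] + [("reset", i) for i in range(5)]

print(run_logins(sequential))   # no login is denied
print(run_logins(overlapping))  # the last two logins are denied
```

  With the overlapping schedule the tally reaches 4 and 5 before any reset
  runs, so two logins with correct passwords are refused.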

  [REGRESSION POTENTIAL]
  The regression potential for this is small, since we're not removing the
  old pam_tally2 module, just adding another one. So anyone still using
  pam_tally2 will be able to do so.

To manage 

[Touch-packages] [Bug 1927796] Re: [SRU]pam_tally2 can cause accounts to be locked by correct password. pam_faillock use is the recommended fix

2021-05-18 Thread Matthew Ruffell
Performing verification for Hirsute

I enabled -proposed and installed libpam-modules libpam-modules-bin
libpam-runtime libpam0g version 1.3.1-5ubuntu6.21.04.1

From there, I set the pam_faillock configuration in:

/etc/security/faillock.conf:
deny = 3
unlock_time = 120

and also:

/etc/pam.d/common-auth:

# here are the per-package modules (the "Primary" block)
auth    requisite                       pam_faillock.so preauth
auth    [success=1 default=ignore]      pam_unix.so nullok_secure
auth    [default=die]                   pam_faillock.so authfail
auth    sufficient                      pam_faillock.so authsucc
# here's the fallback if no module succeeds
auth    requisite                       pam_deny.so
# prime the stack with a positive return value if there isn't one already;
# this avoids us returning an error just because nothing sets a success code
# since the modules above will each just jump around
auth    required                        pam_permit.so
# and here are more per-package modules (the "Additional" block)
auth    optional                        pam_cap.so
# end of pam-auth-update config

From there, I created a new user "dave", and rebooted the system.

I connected via ssh with the "dave" user and used the wrong password 5 times.
I then tried with the correct password and found the account to be locked.

I waited 2 minutes, and tried again with the correct password, and I was logged
in.

When the account was locked, I logged in as the "ubuntu" user and ran:

$ sudo faillock --user dave
dave:
When                 Type   Source          Valid
2021-05-19 01:50:25  RHOST  192.168.122.1   V
2021-05-19 01:50:28  RHOST  192.168.122.1   V
2021-05-19 01:50:31  RHOST  192.168.122.1   V

And I could see the times that "dave" was locked.

I also tested resetting via:

$ sudo faillock --user dave --reset

and "dave" was allowed to log in again.

My tests agree with what Richard sees. Marking as verified for Hirsute.

** Tags removed: verification-needed-hirsute
** Tags added: verification-done-hirsute

[Touch-packages] [Bug 1927796] Re: [SRU]pam_tally2 can cause accounts to be locked by correct password. pam_faillock use is the recommended fix

2021-05-18 Thread Matthew Ruffell
Performing verification for Focal

I enabled -proposed and installed libpam-modules libpam-modules-bin
libpam-runtime libpam0g version 1.3.1-5ubuntu4.2

From there, I set the pam_faillock configuration in:

/etc/security/faillock.conf:
deny = 3
unlock_time = 120

and also:

/etc/pam.d/common-auth:

# here are the per-package modules (the "Primary" block)
auth    requisite                       pam_faillock.so preauth
auth    [success=1 default=ignore]      pam_unix.so nullok_secure
auth    [default=die]                   pam_faillock.so authfail
auth    sufficient                      pam_faillock.so authsucc
# here's the fallback if no module succeeds
auth    requisite                       pam_deny.so
# prime the stack with a positive return value if there isn't one already;
# this avoids us returning an error just because nothing sets a success code
# since the modules above will each just jump around
auth    required                        pam_permit.so
# and here are more per-package modules (the "Additional" block)
auth    optional                        pam_cap.so
# end of pam-auth-update config

From there, I created a new user "dave", and rebooted the system.

I connected via ssh with the "dave" user and used the wrong password 5 times.
I then tried with the correct password and found the account to be locked.

I waited 2 minutes, and tried again with the correct password, and I was logged
in.

When the account was locked, I logged in as the "ubuntu" user and ran:

$ sudo faillock --user dave
dave:
When                 Type   Source          Valid
2021-05-19 00:31:08  RHOST  192.168.122.1   V
2021-05-19 00:31:13  RHOST  192.168.122.1   V
2021-05-19 00:31:17  RHOST  192.168.122.1   V

And I could see the times that "dave" was locked.

I also tested resetting via:

$ sudo faillock --user dave --reset

and "dave" was allowed to log in again.

My tests agree with what Richard sees. Marking as verified for Focal.

** Tags removed: verification-needed-focal
** Tags added: verification-done-focal

[Touch-packages] [Bug 1927796] Re: [SRU]pam_tally2 can cause accounts to be locked by correct password. pam_faillock use is the recommended fix

2021-05-18 Thread Matthew Ruffell
Performing verification for Bionic

I enabled -proposed and installed libpam-modules libpam-modules-bin
libpam-runtime libpam0g version 1.1.8-3.6ubuntu2.18.04.3

From there, I set the pam_faillock configuration in:

/etc/security/faillock.conf:
deny = 3
unlock_time = 120

and also:

/etc/pam.d/common-auth:

# here are the per-package modules (the "Primary" block)
auth    requisite                       pam_faillock.so preauth
auth    [success=1 default=ignore]      pam_unix.so nullok_secure
auth    [default=die]                   pam_faillock.so authfail
auth    sufficient                      pam_faillock.so authsucc
# here's the fallback if no module succeeds
auth    requisite                       pam_deny.so
# prime the stack with a positive return value if there isn't one already;
# this avoids us returning an error just because nothing sets a success code
# since the modules above will each just jump around
auth    required                        pam_permit.so
# and here are more per-package modules (the "Additional" block)
auth    optional                        pam_cap.so
# end of pam-auth-update config

From there, I created a new user "dave", and rebooted the system.

I connected via ssh with the "dave" user and used the wrong password 5 times.
I then tried with the correct password and found the account to be locked.

I waited 2 minutes, and tried again with the correct password, and I was logged
in.

When the account was locked, I logged in as the "ubuntu" user and ran:

$ sudo faillock --user dave
dave:
When                 Type   Source          Valid
2021-05-19 00:57:10  RHOST  192.168.122.1   V
2021-05-19 00:57:12  RHOST  192.168.122.1   V
2021-05-19 00:57:16  RHOST  192.168.122.1   V

And I could see the times that "dave" was locked.

I also tested resetting via:

$ sudo faillock --user dave --reset

and "dave" was allowed to log in again.

My tests agree with what Richard sees. Marking as verified for Bionic.

** Tags removed: verification-needed-bionic
** Tags added: sts verification-done-bionic

[Touch-packages] [Bug 1926254] Re: x509 Certificate verification fails when basicConstraints=CA:FALSE, pathlen:0 on self-signed leaf certs

2021-05-11 Thread Matthew Ruffell
Performing verification for Groovy.

I went and generated the ssl certificates and attempted to verify them with
the openssl version 1.1.1f-1ubuntu4.3 from -updates.

ubuntu@deep-mako:~$ sudo apt-cache policy openssl | grep Installed
  Installed: 1.1.1f-1ubuntu4.3
ubuntu@deep-mako:~$ mkdir reproducer
ubuntu@deep-mako:~$ cd reproducer
ubuntu@deep-mako:~/reproducer$ mkdir CA
ubuntu@deep-mako:~/reproducer$ cat << EOF >> rootCA.cnf
> [ req ]
> prompt = no
> distinguished_name = req_distinguished_name
> x509_extensions = usr_cert
> 
> [ req_distinguished_name ]
> C = DE
> O = Test Org
> CN = Test RSA PSS Root-CA
> 
> [ usr_cert ]
> basicConstraints = critical,CA:TRUE
> keyUsage = critical,keyCertSign,cRLSign
> subjectKeyIdentifier = hash
> authorityKeyIdentifier = keyid:always
> EOF
ubuntu@deep-mako:~/reproducer$ cat << EOF >> subCA.cnf
> [ req ]
> prompt = no
> distinguished_name = req_distinguished_name
> x509_extensions = usr_cert
> 
> [ req_distinguished_name ]
> C = DE
> O = Test Org
> CN = Test RSA PSS Sub-CA
> 
> [ usr_cert ]
> basicConstraints = critical,CA:TRUE,pathlen:0
> keyUsage = critical,keyCertSign,cRLSign
> subjectKeyIdentifier = hash
> authorityKeyIdentifier = keyid:always
> EOF
ubuntu@deep-mako:~/reproducer$ cat << EOF >> user.cnf
> [ req ]
> prompt = no
> distinguished_name = req_distinguished_name
> x509_extensions = usr_cert
> 
> [ req_distinguished_name ]
> C = DE
> O = Test Org
> CN = Test User
> 
> [ usr_cert ]
> basicConstraints = critical,CA:FALSE,pathlen:0
> keyUsage = critical,digitalSignature,keyAgreement
> extendedKeyUsage = clientAuth,serverAuth
> subjectKeyIdentifier = hash
> authorityKeyIdentifier = keyid:always
> EOF
ubuntu@deep-mako:~/reproducer$ openssl genpkey -algorithm RSA-PSS -out rootCA_key.pem -pkeyopt rsa_keygen_bits:2048
+
+
ubuntu@deep-mako:~/reproducer$ openssl req -config rootCA.cnf -set_serial 01 -new -batch -sha256 -nodes -x509 -days 9125 -out CA/rootCA_cert.pem -key rootCA_key.pem -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
ubuntu@deep-mako:~/reproducer$ openssl genpkey -algorithm RSA-PSS -out subCA_key.pem -pkeyopt rsa_keygen_bits:2048
..+
.+
ubuntu@deep-mako:~/reproducer$ openssl req -config subCA.cnf -new -out subCA_req.pem -key subCA_key.pem -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
ubuntu@deep-mako:~/reproducer$ openssl x509 -req -sha256 -in subCA_req.pem -CA CA/rootCA_cert.pem -CAkey rootCA_key.pem -out CA/subCA_cert.pem -CAserial rootCA_serial.txt -CAcreateserial -extfile subCA.cnf -extensions usr_cert -days 4380 -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
Signature ok
subject=C = DE, O = Test Org, CN = Test RSA PSS Sub-CA
Getting CA Private Key
ubuntu@deep-mako:~/reproducer$ c_rehash CA
Doing CA
ubuntu@deep-mako:~/reproducer$ openssl genpkey -algorithm RSA-PSS -out user1_key.pem -pkeyopt rsa_keygen_bits:2048
...+
.+
ubuntu@deep-mako:~/reproducer$ openssl req -config user.cnf -new -out user1_req.pem -key user1_key.pem -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
ubuntu@deep-mako:~/reproducer$ openssl x509 -req -sha256 -in user1_req.pem -CA CA/subCA_cert.pem -CAkey subCA_key.pem -out user1_cert.pem -CAserial subCA_serial.txt -CAcreateserial -extfile user.cnf -extensions usr_cert -days 1825 -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
Signature ok
subject=C = DE, O = Test Org, CN = Test User
Getting CA Private Key

Now going and verifying the certificates:

ubuntu@deep-mako:~/reproducer$ openssl verify -CAfile CA/rootCA_cert.pem -untrusted CA/subCA_cert.pem user1_cert.pem
C = DE, O = Test Org, CN = Test User
error 20 at 0 depth lookup: unable to get local issuer certificate
error user1_cert.pem: verification failed 

We see verification fail, again due to the CA:FALSE,pathlen:0
basicConstraints.

I then enabled -proposed and installed openssl 1.1.1f-1ubuntu4.4.

$ openssl verify -CAfile CA/rootCA_cert.pem -untrusted CA/subCA_cert.pem user1_cert.pem
user1_cert.pem: OK

The certificate verifies properly. The problem is fixed.
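
To spell out the behaviour being verified, here is a minimal Python sketch
of the decision as I understand it from the test names in the build log: a
non-CA certificate carrying pathlen:0 is accepted by default and appears
to be rejected when -x509_strict is given. The function name and its
parameters are hypothetical and only model the rule; this is not
OpenSSL's code.

```python
def basic_constraints_ok(is_ca, pathlen, strict=False):
    """Model of the basicConstraints check (hypothetical helper).

    is_ca   -- value of the CA flag
    pathlen -- path length constraint, or None when the field is absent
    strict  -- True to mimic the stricter checks requested with -x509_strict
    """
    if pathlen is None:
        return True          # nothing to check
    if pathlen < 0:
        return False         # a negative path length is never valid
    if not is_ca:
        # pathlen is meaningless on a non-CA certificate: tolerated by
        # default, treated as an inconsistent extension in strict mode
        return not strict
    return True


print(basic_constraints_ok(is_ca=False, pathlen=0))               # leaf cert, default
print(basic_constraints_ok(is_ca=False, pathlen=0, strict=True))  # leaf cert, strict
print(basic_constraints_ok(is_ca=True, pathlen=0, strict=True))   # sub-CA, strict
```

The three calls correspond to user.cnf verified by default, user.cnf under
strict checking, and subCA.cnf, in that order.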

Additionally, if we examine the new unit tests added to openssl's testsuite in
the buildlog for Groovy:

https://launchpadlibrarian.net/537503607/buildlog_ubuntu-groovy-amd64.openssl_1.1.1f-1ubuntu4.4_BUILDING.txt.gz

We see:

../../util/shlib_wrap.sh ../../apps/openssl verify -auth_level 1 -purpose sslserver -trusted ../../../test/certs/root-cert.pem -untrusted ../../../test/certs/ca-cert.pem ../../../test/certs/ee-pathlen.pem => 0
ok 84 - accept non-ca with pathlen:0 by default
CN = server.example
error 41 at 0 depth lookup: invalid or inconsistent certificate extension
error ../../../test/certs/ee-pathlen.pem: verification failed
../../util/shlib_wrap.sh ../../apps/openssl verify -auth_level 1 -purpose 
sslserver -x509_strict -trusted 

[Touch-packages] [Bug 1926254] Re: x509 Certificate verification fails when basicConstraints=CA:FALSE, pathlen:0 on self-signed leaf certs

2021-05-11 Thread Matthew Ruffell
Performing verification for Focal

Generating the ssl certificates, and reproducing the problem with version
1.1.1f-1ubuntu2.3 from -updates.

ubuntu@select-lobster:~$ sudo apt-cache policy openssl | grep Installed
  Installed: 1.1.1f-1ubuntu2.3
ubuntu@select-lobster:~$ mkdir reproducer
ubuntu@select-lobster:~$ cd reproducer
ubuntu@select-lobster:~/reproducer$ mkdir CA
ubuntu@select-lobster:~/reproducer$ cat << EOF >> rootCA.cnf
> [ req ]
> prompt = no
> distinguished_name = req_distinguished_name
> x509_extensions = usr_cert
> 
> [ req_distinguished_name ]
> C = DE
> O = Test Org
> CN = Test RSA PSS Root-CA
> 
> [ usr_cert ]
> basicConstraints = critical,CA:TRUE
> keyUsage = critical,keyCertSign,cRLSign
> subjectKeyIdentifier = hash
> authorityKeyIdentifier = keyid:always
> EOF
ubuntu@select-lobster:~/reproducer$ cat << EOF >> subCA.cnf
> [ req ]
> prompt = no
> distinguished_name = req_distinguished_name
> x509_extensions = usr_cert
> 
> [ req_distinguished_name ]
> C = DE
> O = Test Org
> CN = Test RSA PSS Sub-CA
> 
> [ usr_cert ]
> basicConstraints = critical,CA:TRUE,pathlen:0
> keyUsage = critical,keyCertSign,cRLSign
> subjectKeyIdentifier = hash
> authorityKeyIdentifier = keyid:always
> EOF
ubuntu@select-lobster:~/reproducer$ cat << EOF >> user.cnf
> [ req ]
> prompt = no
> distinguished_name = req_distinguished_name
> x509_extensions = usr_cert
> 
> [ req_distinguished_name ]
> C = DE
> O = Test Org
> CN = Test User
> 
> [ usr_cert ]
> basicConstraints = critical,CA:FALSE,pathlen:0
> keyUsage = critical,digitalSignature,keyAgreement
> extendedKeyUsage = clientAuth,serverAuth
> subjectKeyIdentifier = hash
> authorityKeyIdentifier = keyid:always
> EOF
ubuntu@select-lobster:~/reproducer$ openssl genpkey -algorithm RSA-PSS -out rootCA_key.pem -pkeyopt rsa_keygen_bits:2048
..+
+
ubuntu@select-lobster:~/reproducer$ openssl req -config rootCA.cnf -set_serial 01 -new -batch -sha256 -nodes -x509 -days 9125 -out CA/rootCA_cert.pem -key rootCA_key.pem -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
ubuntu@select-lobster:~/reproducer$ openssl genpkey -algorithm RSA-PSS -out subCA_key.pem -pkeyopt rsa_keygen_bits:2048
+
+
ubuntu@select-lobster:~/reproducer$ openssl req -config subCA.cnf -new -out subCA_req.pem -key subCA_key.pem -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
ubuntu@select-lobster:~/reproducer$ openssl x509 -req -sha256 -in subCA_req.pem -CA CA/rootCA_cert.pem -CAkey rootCA_key.pem -out CA/subCA_cert.pem -CAserial rootCA_serial.txt -CAcreateserial -extfile subCA.cnf -extensions usr_cert -days 4380 -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
Signature ok
subject=C = DE, O = Test Org, CN = Test RSA PSS Sub-CA
Getting CA Private Key
ubuntu@select-lobster:~/reproducer$ c_rehash CA
Doing CA
ubuntu@select-lobster:~/reproducer$ openssl genpkey -algorithm RSA-PSS -out user1_key.pem -pkeyopt rsa_keygen_bits:2048
...+
.+
ubuntu@select-lobster:~/reproducer$ openssl req -config user.cnf -new -out user1_req.pem -key user1_key.pem -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
ubuntu@select-lobster:~/reproducer$ openssl x509 -req -sha256 -in user1_req.pem -CA CA/subCA_cert.pem -CAkey subCA_key.pem -out user1_cert.pem -CAserial subCA_serial.txt -CAcreateserial -extfile user.cnf -extensions usr_cert -days 1825 -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
Signature ok
subject=C = DE, O = Test Org, CN = Test User
Getting CA Private Key

Now, we verify the certificates:

ubuntu@select-lobster:~/reproducer$ openssl verify -CAfile CA/rootCA_cert.pem -untrusted CA/subCA_cert.pem user1_cert.pem
C = DE, O = Test Org, CN = Test User
error 20 at 0 depth lookup: unable to get local issuer certificate
error user1_cert.pem: verification failed

We see verification fail, due to CA:FALSE,pathlen:0 basicConstraints.

I then enabled -proposed, and installed openssl and libssl1.1 version
1.1.1f-1ubuntu2.4

If we then repeat the certificate validation:

ubuntu@select-lobster:~/reproducer$ openssl verify -CAfile CA/rootCA_cert.pem -untrusted CA/subCA_cert.pem user1_cert.pem
user1_cert.pem: OK

The certificates validate properly.

Additionally, if we examine the new unit tests added to openssl's testsuite in
the buildlog for focal:

https://launchpadlibrarian.net/537505620/buildlog_ubuntu-focal-amd64.openssl_1.1.1f-1ubuntu2.4_BUILDING.txt.gz

we see:

../../../test/certs/ee-pathlen.pem: OK
../../util/shlib_wrap.sh ../../apps/openssl verify -auth_level 1 -purpose sslserver -trusted ../../../test/certs/root-cert.pem -untrusted ../../../test/certs/ca-cert.pem ../../../test/certs/ee-pathlen.pem => 0
ok 84 - accept non-ca with pathlen:0 by default
CN = server.example
error 41 at 0 depth lookup: invalid or inconsistent certificate extension
error ../../../test/certs/ee-pathlen.pem: verification failed
../../util/shlib_wrap.sh ../../apps/openssl verify -auth_level 

[Touch-packages] [Bug 1927161] Re: dpkg-source: error: diff 'openssl/debian/patches/pr12272.patch' patches files multiple times; split the diff in multiple files or merge the hunks into a single one

2021-05-04 Thread Matthew Ruffell
** Description changed:

  [impact]
  
  openssl doesn't build source properly because of a badly-constructed
  patch
  
  [test case]
  
  $ pull-lp-source openssl groovy
  ...
  $ cd openssl-1.1.1f/
  $ quilt pop -a
  ...
  $ dpkg-buildpackage -d -S
  dpkg-buildpackage: info: source package openssl
  dpkg-buildpackage: info: source version 1.1.1f-1ubuntu4.3
  dpkg-buildpackage: info: source distribution groovy-security
  dpkg-buildpackage: info: source changed by Marc Deslauriers 

   dpkg-source --before-build .
  dpkg-source: warning: can't parse dependency perl:native
  dpkg-source: error: diff 'openssl-1.1.1f/debian/patches/pr12272.patch' patches files multiple times; split the diff in multiple files or merge the hunks into a single one
  dpkg-buildpackage: error: dpkg-source --before-build . subprocess returned exit status 25
  
+ Test builds are available in the following ppa:
+ 
+ https://launchpad.net/~mruffell/+archive/ubuntu/lp1927161-test
+ 
  [regression potential]
  
  any regression would likely cause a failed build or would affect the
  functionality that patch pr12272 was added for, which is adding support
  for Intel CET
  
  [scope]
  
  this is needed only for g and later
  
  this is caused by the bad patch 'pr12272.patch' which is only included
  in g/h/i, so this does not apply to f or earlier
  
  [other info]
  
  note that if the patches are applied, this bug is bypassed; i.e. if
  'quilt pop -a' is removed from the test case above, the bug doesn't
  reproduce. this is only a problem when the patches aren't already
  applied and dpkg-buildpackage needs to call dpkg-source to apply the
  patches.
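
  For anyone who wants to check a patch for this condition before
  building, here is a small Python sketch that lists files appearing in
  more than one file header of a unified diff. It only approximates the
  condition dpkg-source complains about, it is not dpkg-source's own
  logic, and the file names in the sample are made up. A removed line
  beginning with "-- " followed directly by an added line beginning with
  "++ " would be miscounted as a file header.

```python
from collections import Counter


def files_patched_multiple_times(patch_text):
    """Return the paths that have more than one '---'/'+++' header pair."""
    counts = Counter()
    prev = ""
    for line in patch_text.splitlines():
        # A file header is a '+++ ' line directly preceded by a '--- ' line.
        if line.startswith("+++ ") and prev.startswith("--- "):
            path = line[4:].split("\t", 1)[0].strip()  # drop any timestamp
            counts[path] += 1
        prev = line
    return sorted(path for path, n in counts.items() if n > 1)


# Hypothetical patch in which foo.pl is touched by two separate file sections.
sample = """\
--- a/crypto/foo.pl
+++ b/crypto/foo.pl
@@ -1 +1 @@
-old
+new
--- a/crypto/bar.pl
+++ b/crypto/bar.pl
@@ -1 +1 @@
-old
+new
--- a/crypto/foo.pl
+++ b/crypto/foo.pl
@@ -9 +9 @@
-old2
+new2
"""

duplicates = files_patched_multiple_times(sample)
print(duplicates)
```

  Merging the two foo.pl sections into one, or splitting the patch into
  separate files, makes the returned list empty, which is what the error
  message itself suggests doing.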

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to openssl in Ubuntu.
https://bugs.launchpad.net/bugs/1927161

Title:
  dpkg-source: error: diff 'openssl/debian/patches/pr12272.patch'
  patches files multiple times; split the diff in multiple files or
  merge the hunks into a single one

Status in openssl package in Ubuntu:
  In Progress
Status in openssl source package in Groovy:
  In Progress
Status in openssl source package in Hirsute:
  In Progress
Status in openssl source package in Impish:
  In Progress

Bug description:
  [impact]

  openssl doesn't build source properly because of a badly-constructed
  patch

  [test case]

  $ pull-lp-source openssl groovy
  ...
  $ cd openssl-1.1.1f/
  $ quilt pop -a
  ...
  $ dpkg-buildpackage -d -S
  dpkg-buildpackage: info: source package openssl
  dpkg-buildpackage: info: source version 1.1.1f-1ubuntu4.3
  dpkg-buildpackage: info: source distribution groovy-security
  dpkg-buildpackage: info: source changed by Marc Deslauriers 

   dpkg-source --before-build .
  dpkg-source: warning: can't parse dependency perl:native
  dpkg-source: error: diff 'openssl-1.1.1f/debian/patches/pr12272.patch'
  patches files multiple times; split the diff in multiple files or merge
  the hunks into a single one
  dpkg-buildpackage: error: dpkg-source --before-build . subprocess returned
  exit status 25

  Test builds are available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/lp1927161-test

  [regression potential]

  any regression would likely cause a failed build or would affect the
  functionality that patch pr12272 was added for, which is adding
  support for Intel CET

  [scope]

  this is needed only for g and later

  this is caused by the bad patch 'pr12272.patch' which is only included
  in g/h/i, so this does not apply to f or earlier

  [other info]

  note that if the patches are applied, this bug is bypassed; i.e. if
  'quilt pop -a' is removed from the test case above, the bug doesn't
  reproduce. this is only a problem when the patches aren't already
  applied and dpkg-buildpackage needs to call dpkg-source to apply the
  patches.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1927161/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


[Touch-packages] [Bug 1927161] Re: dpkg-source: error: diff 'openssl/debian/patches/pr12272.patch' patches files multiple times; split the diff in multiple files or merge the hunks into a single one

2021-05-04 Thread Matthew Ruffell
Attached is a V2 for hirsute which correctly has d/p/ in the
debian/changelog.

** Patch added: "debdiff for openssl on hirsute"
   https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1927161/+attachment/5494814/+files/lp1927161_hirsute_v2.debdiff



[Touch-packages] [Bug 1927161] Re: dpkg-source: error: diff 'openssl/debian/patches/pr12272.patch' patches files multiple times; split the diff in multiple files or merge the hunks into a single one

2021-05-04 Thread Matthew Ruffell
Attached is a V2 for impish which correctly has d/p/ in the
debian/changelog.

** Patch added: "debdiff for openssl on impish"
   https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1927161/+attachment/5494813/+files/lp1927161_impish_v2.debdiff

** Patch removed: "debdiff for openssl on hirsute"
   https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1927161/+attachment/5494811/+files/lp1927161_hirsute.debdiff



[Touch-packages] [Bug 1927161] Re: dpkg-source: error: diff 'openssl/debian/patches/pr12272.patch' patches files multiple times; split the diff in multiple files or merge the hunks into a single one

2021-05-04 Thread Matthew Ruffell
Attached is a debdiff for openssl on groovy, which fixes this issue, and
also bug 1926254

** Patch added: "debdiff for openssl on groovy"
   https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1927161/+attachment/5494812/+files/lp1926254_lp1927161_groovy.debdiff

** Patch removed: "debdiff for openssl on impish"
   https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1927161/+attachment/5494810/+files/lp1927161_impish.debdiff



[Touch-packages] [Bug 1927161] Re: dpkg-source: error: diff 'openssl/debian/patches/pr12272.patch' patches files multiple times; split the diff in multiple files or merge the hunks into a single one

2021-05-04 Thread Matthew Ruffell
Attached is a debdiff for openssl on hirsute which fixes this problem.

** Patch added: "debdiff for openssl on hirsute"
   https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1927161/+attachment/5494811/+files/lp1927161_hirsute.debdiff



[Touch-packages] [Bug 1927161] Re: dpkg-source: error: diff 'openssl/debian/patches/pr12272.patch' patches files multiple times; split the diff in multiple files or merge the hunks into a single one

2021-05-04 Thread Matthew Ruffell
Attached is a debdiff for impish which fixes this problem.

** Patch added: "debdiff for openssl on impish"
   https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1927161/+attachment/5494810/+files/lp1927161_impish.debdiff



[Touch-packages] [Bug 1927161] Re: dpkg-source: error: diff 'openssl/debian/patches/pr12272.patch' patches files multiple times; split the diff in multiple files or merge the hunks into a single one

2021-05-04 Thread Matthew Ruffell
I split 'pr12272.patch' into one file per commit, and I did a diff to
ensure that there are no changes to the code:

https://paste.ubuntu.com/p/zDqqXmsM8c/

When using these split-up patches, "dpkg-buildpackage -d -S" completes
successfully.
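
One way to run this kind of check locally is sketched below: apply the
combined patch to one copy of a tree, apply the split patches in order to a
second copy, and compare the two results with diff. This is a minimal
sketch and not necessarily the procedure used for the paste above; the
function name same_result and the tiny demo tree and patches are
hypothetical, and it assumes GNU patch and diff are installed.

```shell
#!/bin/sh
# same_result BASE_DIR COMBINED_PATCH SPLIT_PATCH...
# Returns 0 when applying COMBINED_PATCH gives the same tree as applying
# the SPLIT_PATCH files in order, 1 when the trees differ, 2 on error.
same_result() {
    base=$1
    combined=$2
    shift 2
    work=$(mktemp -d) || return 2
    cp -R "$base" "$work/combined"
    cp -R "$base" "$work/split"
    patch -s -p1 -d "$work/combined" < "$combined" \
        || { rm -rf "$work"; return 2; }
    for p in "$@"; do
        patch -s -p1 -d "$work/split" < "$p" \
            || { rm -rf "$work"; return 2; }
    done
    # Any difference between the two patched trees is printed here.
    diff -ru "$work/combined" "$work/split"
    rc=$?
    rm -rf "$work"
    return "$rc"
}

# Hypothetical demo: two small patches against one file, plus their
# concatenation standing in for the original combined patch.
demo=$(mktemp -d)
mkdir "$demo/base"
printf 'one\ntwo\nthree\n' > "$demo/base/f.txt"
printf '%s\n' '--- a/f.txt' '+++ b/f.txt' '@@ -1,3 +1,3 @@' \
    '-one' '+ONE' ' two' ' three' > "$demo/0001.patch"
printf '%s\n' '--- a/f.txt' '+++ b/f.txt' '@@ -1,3 +1,3 @@' \
    ' ONE' ' two' '-three' '+THREE' > "$demo/0002.patch"
cat "$demo/0001.patch" "$demo/0002.patch" > "$demo/combined.patch"

if same_result "$demo/base" "$demo/combined.patch" \
        "$demo/0001.patch" "$demo/0002.patch"; then
    echo "split patches match the combined patch"
fi
rm -rf "$demo"
```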



[Touch-packages] [Bug 1927161] Re: dpkg-source: error: diff 'openssl/debian/patches/pr12272.patch' patches files multiple times; split the diff in multiple files or merge the hunks into a single one

2021-05-04 Thread Matthew Ruffell
** Changed in: openssl (Ubuntu Groovy)
   Status: New => In Progress

** Changed in: openssl (Ubuntu Hirsute)
   Status: New => In Progress

** Changed in: openssl (Ubuntu Impish)
   Status: New => In Progress

** Changed in: openssl (Ubuntu Groovy)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: openssl (Ubuntu Hirsute)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: openssl (Ubuntu Impish)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)



[Touch-packages] [Bug 1926254] Re: x509 Certificate verification fails when basicConstraints=CA:FALSE, pathlen:0 on self-signed leaf certs

2021-05-02 Thread Matthew Ruffell
Hi Seth,

Thanks for the review.

I read the commit you found:

commit 1e41dadfa7b9f792ed0f4714a3d3d36f070cf30e
Author: Dr. David von Oheimb 
Date:   Sat Jun 27 16:16:12 2020 +0200
Subject: Extend X509 cert checks and error reporting in v3_{purp,crld}.c and x509_{set,vfy}.c
Link: https://github.com/openssl/openssl/commit/1e41dadfa7b9f792ed0f4714a3d3d36f070cf30e

Firstly, yes, you are right: this commit does refactor the code I am
suggesting we SRU to focal and groovy. Upon further inspection, though,
it was not backported to the 1.1.1 stable series; it is missing from the
OpenSSL_1_1_1-stable branch. As you mentioned, it is a fairly invasive
change that modifies a lot of different x509 components, so it isn't
suitable for backporting to 1.1.1 stable anyway, much less acceptable
for SRU to focal or groovy.

I think we should stick to the small targeted commits I suggested for
this SRU, since they are a part of 1.1.1 stable, and are already in
hirsute onward.

To check that the logic in the commits proposed for SRU matches the
refactored code in 3.0 alpha, I built the master branch of openssl,
which had commit d1a770414acd34c774248ce8efbe202fd7a44041 at HEAD.

$ env LD_LIBRARY_PATH="/home/ubuntu/openssl/" ../openssl/apps/openssl version
OpenSSL 3.0.0-alpha16-dev  (Library: OpenSSL 3.0.0-alpha16-dev )

$ env LD_LIBRARY_PATH="/home/ubuntu/openssl/" ../openssl/apps/openssl verify \
    -CAfile CA/rootCA_cert.pem -untrusted CA/subCA_cert.pem user1_cert.pem
user1_cert.pem: OK

The logic matches and the reproducer certificates verify OK. This
confirms we aren't backporting a short-lived change, and that this
behaviour is the desired and accepted outcome.

@ddstreet Please go ahead and sponsor the SRU to -updates, thanks.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to openssl in Ubuntu.
https://bugs.launchpad.net/bugs/1926254

Title:
  x509 Certificate verification fails when
  basicConstraints=CA:FALSE,pathlen:0 on self-signed leaf certs

Status in openssl package in Ubuntu:
  Fix Released
Status in openssl source package in Focal:
  In Progress
Status in openssl source package in Groovy:
  In Progress
Status in openssl source package in Hirsute:
  Fix Released

Bug description:
  [Impact]

  In openssl 1.1.1f, the below commit was merged:

  commit ba4356ae4002a04e28642da60c551877eea804f7
  Author: Bernd Edlinger 
  Date:   Sat Jan 4 15:54:53 2020 +0100
  Subject: Fix error handling in x509v3_cache_extensions and related functions
  Link: https://github.com/openssl/openssl/commit/ba4356ae4002a04e28642da60c551877eea804f7

  This introduced a regression which caused certificate validation to
  fail when certificates violate RFC 5280 [1], namely, when a
  certificate has "basicConstraints=CA:FALSE,pathlen:0". This
  combination is commonly seen in self-signed leaf certificates with an
  intermediate CA before the root CA.
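
  To see whether a given certificate carries this combination, a quick
  check along the following lines may help. It is a sketch only: the
  function name leaf_has_pathlen is invented here, and it assumes that
  "openssl x509 -noout -text" prints the extension as "CA:FALSE,
  pathlen:N" on the line after the "X509v3 Basic Constraints" heading and
  that grep supports the -A option.

```shell
#!/bin/sh
# Reads the output of `openssl x509 -noout -text` on stdin and succeeds
# when Basic Constraints has CA:FALSE together with a pathlen value.
leaf_has_pathlen() {
    grep -A1 'X509v3 Basic Constraints' \
        | grep -q 'CA:FALSE, pathlen:'
}

# Example use against the reproducer certificate, if it exists in the
# current directory.
if [ -f user1_cert.pem ]; then
    if openssl x509 -in user1_cert.pem -noout -text | leaf_has_pathlen; then
        echo "user1_cert.pem has CA:FALSE with a pathlen constraint"
    fi
fi
```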

  Because of this, openssl 1.1.1f rejects these certificates and they
  cannot be used in the system certificate store, and SSL connections
  fail when you try to use them to connect to an SSL endpoint.

  The error you see when you try to verify is:

  $ openssl verify -CAfile CA/rootCA_cert.pem -untrusted CA/subCA_cert.pem \
      user1_cert.pem
  error 20 at 0 depth lookup: unable to get local issuer certificate
  error user1_cert.pem: verification failed

  The exact same certificates work fine on Xenial, Bionic and Hirsute.

  [1] https://tools.ietf.org/html/rfc5280.html

  [Testcase]

  We will create our own root CA, intermediate CA and leaf server
  certificate.

  Create necessary directories:

  $ mkdir reproducer
  $ cd reproducer
  $ mkdir CA

  Write openssl configuration files to disk for each CA and cert:

  $ cat << EOF >> rootCA.cnf
  [ req ]
  prompt  = no
  distinguished_name  = req_distinguished_name
  x509_extensions = usr_cert

  [ req_distinguished_name ]
  C  = DE
  O  = Test Org
  CN = Test RSA PSS Root-CA

  [ usr_cert ]
  basicConstraints= critical,CA:TRUE
  keyUsage= critical,keyCertSign,cRLSign
  subjectKeyIdentifier= hash
  authorityKeyIdentifier  = keyid:always
  EOF

  $ cat << EOF >> subCA.cnf
  [ req ]
  prompt  = no
  distinguished_name  = req_distinguished_name
  x509_extensions = usr_cert

  [ req_distinguished_name ]
  C  = DE
  O  = Test Org
  CN = Test RSA PSS Sub-CA

  [ usr_cert ]
  basicConstraints= critical,CA:TRUE,pathlen:0
  keyUsage= critical,keyCertSign,cRLSign
  subjectKeyIdentifier= hash
  authorityKeyIdentifier  = keyid:always
  EOF

  $ cat << EOF >> user.cnf
  [ req ]
  prompt  = no
  distinguished_name  = req_distinguished_name
  x509_extensions = usr_cert

  [ req_distinguished_name ]
  C  = DE
  O  = Test Org
  CN = Test User

  [ usr_cert ]
  basicConstraints= critical,CA:FALSE,pathlen:0
  keyUsage= 

[Touch-packages] [Bug 1926254] Re: x509 Certificate verification fails when basicConstraints=CA:FALSE, pathlen:0 on self-signed leaf certs

2021-04-29 Thread Matthew Ruffell
** Tags added: sts-sponsor

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to openssl in Ubuntu.
https://bugs.launchpad.net/bugs/1926254

Title:
  x509 Certificate verification fails when
  basicConstraints=CA:FALSE,pathlen:0 on self-signed leaf certs

Status in openssl package in Ubuntu:
  Fix Released
Status in openssl source package in Focal:
  In Progress
Status in openssl source package in Groovy:
  In Progress
Status in openssl source package in Hirsute:
  Fix Released

Bug description:
  [Impact]

  In openssl 1.1.1f, the below commit was merged:

  commit ba4356ae4002a04e28642da60c551877eea804f7
  Author: Bernd Edlinger 
  Date:   Sat Jan 4 15:54:53 2020 +0100
  Subject: Fix error handling in x509v3_cache_extensions and related functions
  Link: https://github.com/openssl/openssl/commit/ba4356ae4002a04e28642da60c551877eea804f7

  This introduced a regression which caused certificate validation to
  fail when certificates violate RFC 5280 [1], namely, when a
  certificate has "basicConstraints=CA:FALSE,pathlen:0". This
  combination is commonly seen in self-signed leaf certificates with an
  intermediate CA before the root CA.

  Because of this, openssl 1.1.1f rejects these certificates and they
  cannot be used in the system certificate store, and SSL connections
  fail when you try to use them to connect to an SSL endpoint.

  The error you see when you try to verify is:

  $ openssl verify -CAfile CA/rootCA_cert.pem -untrusted CA/subCA_cert.pem \
      user1_cert.pem
  error 20 at 0 depth lookup: unable to get local issuer certificate
  error user1_cert.pem: verification failed

  The exact same certificates work fine on Xenial, Bionic and Hirsute.

  [1] https://tools.ietf.org/html/rfc5280.html

  [Testcase]

  We will create our own root CA, intermediate CA and leaf server
  certificate.

  Create necessary directories:

  $ mkdir reproducer
  $ cd reproducer
  $ mkdir CA

  Write openssl configuration files to disk for each CA and cert:

  $ cat << EOF >> rootCA.cnf
  [ req ]
  prompt  = no
  distinguished_name  = req_distinguished_name
  x509_extensions = usr_cert

  [ req_distinguished_name ]
  C  = DE
  O  = Test Org
  CN = Test RSA PSS Root-CA

  [ usr_cert ]
  basicConstraints= critical,CA:TRUE
  keyUsage= critical,keyCertSign,cRLSign
  subjectKeyIdentifier= hash
  authorityKeyIdentifier  = keyid:always
  EOF

  $ cat << EOF >> subCA.cnf
  [ req ]
  prompt  = no
  distinguished_name  = req_distinguished_name
  x509_extensions = usr_cert

  [ req_distinguished_name ]
  C  = DE
  O  = Test Org
  CN = Test RSA PSS Sub-CA

  [ usr_cert ]
  basicConstraints= critical,CA:TRUE,pathlen:0
  keyUsage= critical,keyCertSign,cRLSign
  subjectKeyIdentifier= hash
  authorityKeyIdentifier  = keyid:always
  EOF

  $ cat << EOF >> user.cnf
  [ req ]
  prompt  = no
  distinguished_name  = req_distinguished_name
  x509_extensions = usr_cert

  [ req_distinguished_name ]
  C  = DE
  O  = Test Org
  CN = Test User

  [ usr_cert ]
  basicConstraints= critical,CA:FALSE,pathlen:0
  keyUsage= critical,digitalSignature,keyAgreement
  extendedKeyUsage= clientAuth,serverAuth
  subjectKeyIdentifier= hash
  authorityKeyIdentifier  = keyid:always
  EOF

  Then generate the necessary RSA keys and form certificates:

  $ openssl genpkey -algorithm RSA-PSS -out rootCA_key.pem \
      -pkeyopt rsa_keygen_bits:2048
  $ openssl req -config rootCA.cnf -set_serial 01 -new -batch -sha256 -nodes \
      -x509 -days 9125 -out CA/rootCA_cert.pem -key rootCA_key.pem \
      -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1

  $ openssl genpkey -algorithm RSA-PSS -out subCA_key.pem \
      -pkeyopt rsa_keygen_bits:2048
  $ openssl req -config subCA.cnf -new -out subCA_req.pem -key subCA_key.pem \
      -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
  $ openssl x509 -req -sha256 -in subCA_req.pem -CA CA/rootCA_cert.pem \
      -CAkey rootCA_key.pem -out CA/subCA_cert.pem -CAserial rootCA_serial.txt \
      -CAcreateserial -extfile subCA.cnf -extensions usr_cert -days 4380 \
      -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
  $ c_rehash CA

  $ openssl genpkey -algorithm RSA-PSS -out user1_key.pem \
      -pkeyopt rsa_keygen_bits:2048
  $ openssl req -config user.cnf -new -out user1_req.pem -key user1_key.pem \
      -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
  $ openssl x509 -req -sha256 -in user1_req.pem -CA CA/subCA_cert.pem \
      -CAkey subCA_key.pem -out user1_cert.pem -CAserial subCA_serial.txt \
      -CAcreateserial -extfile user.cnf -extensions usr_cert -days 1825 \
      -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1

  Now, let's try to verify the generated certificates:

  $ openssl version
  OpenSSL 1.1.1f  31 Mar 2020
  $ openssl verify -CAfile CA/rootCA_cert.pem -untrusted CA/subCA_cert.pem \
      user1_cert.pem
  error 20 at 0 depth lookup: unable to get local issuer 

[Touch-packages] [Bug 1926254] Re: x509 Certificate verification fails when basicConstraints=CA:FALSE, pathlen:0 on self-signed leaf certs

2021-04-29 Thread Matthew Ruffell
Attached is a debdiff for openssl on Groovy which fixes this bug.

** Patch added: "Debdiff for openssl on Groovy"
   
https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1926254/+attachment/5493443/+files/lp1926254_groovy.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to openssl in Ubuntu.
https://bugs.launchpad.net/bugs/1926254

Title:
  x509 Certificate verification fails when
  basicConstraints=CA:FALSE,pathlen:0 on self-signed leaf certs

Status in openssl package in Ubuntu:
  Fix Released
Status in openssl source package in Focal:
  In Progress
Status in openssl source package in Groovy:
  In Progress
Status in openssl source package in Hirsute:
  Fix Released

Bug description:
  [Impact]

  In openssl 1.1.1f, the below commit was merged:

  commit ba4356ae4002a04e28642da60c551877eea804f7
  Author: Bernd Edlinger 
  Date:   Sat Jan 4 15:54:53 2020 +0100
  Subject: Fix error handling in x509v3_cache_extensions and related functions
  Link: https://github.com/openssl/openssl/commit/ba4356ae4002a04e28642da60c551877eea804f7

  This introduced a regression which caused certificate validation to
  fail when certificates violate RFC 5280 [1], namely, when a
  certificate has "basicConstraints=CA:FALSE,pathlen:0". This
  combination is commonly seen in self-signed leaf certificates with an
  intermediate CA before the root CA.

  Because of this, openssl 1.1.1f rejects these certificates, so they
  cannot be used in the system certificate store, and SSL connections
  fail when you try to use them to connect to an SSL endpoint.

  The error you see when you try to verify is:

  $ openssl verify -CAfile CA/rootCA_cert.pem -untrusted CA/subCA_cert.pem \
      user1_cert.pem
  error 20 at 0 depth lookup: unable to get local issuer certificate
  error user1_cert.pem: verification failed

  The exact same certificates work fine on Xenial, Bionic and Hirsute.

  [1] https://tools.ietf.org/html/rfc5280.html

  [Testcase]

  We will create our own root CA, intermediate CA and leaf server
  certificate.

  Create necessary directories:

  $ mkdir reproducer
  $ cd reproducer
  $ mkdir CA

  Write openssl configuration files to disk for each CA and cert:

  $ cat << EOF >> rootCA.cnf
  [ req ]
  prompt  = no
  distinguished_name  = req_distinguished_name
  x509_extensions = usr_cert

  [ req_distinguished_name ]
  C  = DE
  O  = Test Org
  CN = Test RSA PSS Root-CA

  [ usr_cert ]
  basicConstraints= critical,CA:TRUE
  keyUsage= critical,keyCertSign,cRLSign
  subjectKeyIdentifier= hash
  authorityKeyIdentifier  = keyid:always
  EOF

  $ cat << EOF >> subCA.cnf
  [ req ]
  prompt  = no
  distinguished_name  = req_distinguished_name
  x509_extensions = usr_cert

  [ req_distinguished_name ]
  C  = DE
  O  = Test Org
  CN = Test RSA PSS Sub-CA

  [ usr_cert ]
  basicConstraints= critical,CA:TRUE,pathlen:0
  keyUsage= critical,keyCertSign,cRLSign
  subjectKeyIdentifier= hash
  authorityKeyIdentifier  = keyid:always
  EOF

  $ cat << EOF >> user.cnf
  [ req ]
  prompt  = no
  distinguished_name  = req_distinguished_name
  x509_extensions = usr_cert

  [ req_distinguished_name ]
  C  = DE
  O  = Test Org
  CN = Test User

  [ usr_cert ]
  basicConstraints= critical,CA:FALSE,pathlen:0
  keyUsage= critical,digitalSignature,keyAgreement
  extendedKeyUsage= clientAuth,serverAuth
  subjectKeyIdentifier= hash
  authorityKeyIdentifier  = keyid:always
  EOF

  Then generate the necessary RSA keys and form certificates:

  $ openssl genpkey -algorithm RSA-PSS -out rootCA_key.pem \
      -pkeyopt rsa_keygen_bits:2048
  $ openssl req -config rootCA.cnf -set_serial 01 -new -batch -sha256 -nodes \
      -x509 -days 9125 -out CA/rootCA_cert.pem -key rootCA_key.pem \
      -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1

  $ openssl genpkey -algorithm RSA-PSS -out subCA_key.pem \
      -pkeyopt rsa_keygen_bits:2048
  $ openssl req -config subCA.cnf -new -out subCA_req.pem -key subCA_key.pem \
      -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
  $ openssl x509 -req -sha256 -in subCA_req.pem -CA CA/rootCA_cert.pem \
      -CAkey rootCA_key.pem -out CA/subCA_cert.pem -CAserial rootCA_serial.txt \
      -CAcreateserial -extfile subCA.cnf -extensions usr_cert -days 4380 \
      -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
  $ c_rehash CA

  $ openssl genpkey -algorithm RSA-PSS -out user1_key.pem \
      -pkeyopt rsa_keygen_bits:2048
  $ openssl req -config user.cnf -new -out user1_req.pem -key user1_key.pem \
      -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
  $ openssl x509 -req -sha256 -in user1_req.pem -CA CA/subCA_cert.pem \
      -CAkey subCA_key.pem -out user1_cert.pem -CAserial subCA_serial.txt \
      -CAcreateserial -extfile user.cnf -extensions usr_cert -days 1825 \
      -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1

  Now, let's try to verify the generated 

[Touch-packages] [Bug 1926254] Re: x509 Certificate verification fails when basicConstraints=CA:FALSE, pathlen:0 on self-signed leaf certs

2021-04-29 Thread Matthew Ruffell
Attached is a debdiff for openssl on Focal which fixes this bug.

** Patch added: "Debdiff for openssl on focal"
   
https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1926254/+attachment/5493442/+files/lp1926254_focal.debdiff


[Touch-packages] [Bug 1926254] Re: x509 Certificate verification fails when basicConstraints=CA:FALSE, pathlen:0 on self-signed leaf certs

2021-04-27 Thread Matthew Ruffell
** Description changed:

  [Impact]
  
  In openssl 1.1.1f, the below commit was merged:
  
  commit ba4356ae4002a04e28642da60c551877eea804f7
  Author: Bernd Edlinger 
  Date:   Sat Jan 4 15:54:53 2020 +0100
  Subject: Fix error handling in x509v3_cache_extensions and related functions
  Link: https://github.com/openssl/openssl/commit/ba4356ae4002a04e28642da60c551877eea804f7
  
  This introduced a regression which caused certificate validation to fail
  when certificates violate RFC 5280 [1], namely, when a certificate has
  "basicConstraints=CA:FALSE,pathlen:0". This combination is commonly seen
  in self-signed leaf certificates with an intermediate CA before the root
  CA.
  
  Because of this, openssl 1.1.1f rejects these certificates, so they
  cannot be used in the system certificate store, and SSL connections fail
  when you try to use them to connect to an SSL endpoint.
  
  The error you see when you try to verify is:
  
  $ openssl verify -CAfile CA/rootCA_cert.pem -untrusted CA/subCA_cert.pem \
      user1_cert.pem
  error 20 at 0 depth lookup: unable to get local issuer certificate
  error user1_cert.pem: verification failed
  
  The exact same certificates work fine on Xenial, Bionic and Hirsute.
  
  [1] https://tools.ietf.org/html/rfc5280.html
  
  [Testcase]
  
  We will create our own root CA, intermediate CA and leaf server
  certificate.
  
  Create necessary directories:
  
  $ mkdir reproducer
  $ cd reproducer
  $ mkdir CA
  
  Write openssl configuration files to disk for each CA and cert:
  
  $ cat << EOF >> rootCA.cnf
  [ req ]
  prompt  = no
  distinguished_name  = req_distinguished_name
  x509_extensions = usr_cert
  
  [ req_distinguished_name ]
  C  = DE
  O  = Test Org
  CN = Test RSA PSS Root-CA
  
  [ usr_cert ]
  basicConstraints= critical,CA:TRUE
  keyUsage= critical,keyCertSign,cRLSign
  subjectKeyIdentifier= hash
  authorityKeyIdentifier  = keyid:always
  EOF
  
  $ cat << EOF >> subCA.cnf
  [ req ]
  prompt  = no
  distinguished_name  = req_distinguished_name
  x509_extensions = usr_cert
  
  [ req_distinguished_name ]
  C  = DE
  O  = Test Org
  CN = Test RSA PSS Sub-CA
  
  [ usr_cert ]
  basicConstraints= critical,CA:TRUE,pathlen:0
  keyUsage= critical,keyCertSign,cRLSign
  subjectKeyIdentifier= hash
  authorityKeyIdentifier  = keyid:always
  EOF
  
  $ cat << EOF >> user.cnf
  [ req ]
  prompt  = no
  distinguished_name  = req_distinguished_name
  x509_extensions = usr_cert
  
  [ req_distinguished_name ]
  C  = DE
  O  = Test Org
  CN = Test User
  
  [ usr_cert ]
  basicConstraints= critical,CA:FALSE,pathlen:0
  keyUsage= critical,digitalSignature,keyAgreement
  extendedKeyUsage= clientAuth,serverAuth
  subjectKeyIdentifier= hash
  authorityKeyIdentifier  = keyid:always
  EOF
  
  Then generate the necessary RSA keys and form certificates:
  
  $ openssl genpkey -algorithm RSA-PSS -out rootCA_key.pem \
      -pkeyopt rsa_keygen_bits:2048
  $ openssl req -config rootCA.cnf -set_serial 01 -new -batch -sha256 -nodes \
      -x509 -days 9125 -out CA/rootCA_cert.pem -key rootCA_key.pem \
      -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
  
  $ openssl genpkey -algorithm RSA-PSS -out subCA_key.pem \
      -pkeyopt rsa_keygen_bits:2048
  $ openssl req -config subCA.cnf -new -out subCA_req.pem -key subCA_key.pem \
      -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
  $ openssl x509 -req -sha256 -in subCA_req.pem -CA CA/rootCA_cert.pem \
      -CAkey rootCA_key.pem -out CA/subCA_cert.pem -CAserial rootCA_serial.txt \
      -CAcreateserial -extfile subCA.cnf -extensions usr_cert -days 4380 \
      -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
  $ c_rehash CA
  
  $ openssl genpkey -algorithm RSA-PSS -out user1_key.pem \
      -pkeyopt rsa_keygen_bits:2048
  $ openssl req -config user.cnf -new -out user1_req.pem -key user1_key.pem \
      -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
  $ openssl x509 -req -sha256 -in user1_req.pem -CA CA/subCA_cert.pem \
      -CAkey subCA_key.pem -out user1_cert.pem -CAserial subCA_serial.txt \
      -CAcreateserial -extfile user.cnf -extensions usr_cert -days 1825 \
      -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:-1
  
  Now, let's try to verify the generated certificates:
  
  $ openssl version
  OpenSSL 1.1.1f  31 Mar 2020
  $ openssl verify -CAfile CA/rootCA_cert.pem -untrusted CA/subCA_cert.pem \
      user1_cert.pem
  error 20 at 0 depth lookup: unable to get local issuer certificate
  error user1_cert.pem: verification failed
  
  There are test packages available in the following ppa:
  
  https://launchpad.net/~mruffell/+archive/ubuntu/sf308725-test
  
  If you install these test packages, and attempt to verify, things work
  as planned.
  
+ $ openssl verify -CAfile CA/rootCA_cert.pem -untrusted CA/subCA_cert.pem \
+     user1_cert.pem
+ user1_cert.pem: OK
+ 
  [Where problems could occur]
  
  If a regression were to occur, it would occur around x509 

[Touch-packages] [Bug 1926254] [NEW] x509 Certificate verification fails when basicConstraints=CA:FALSE, pathlen:0 on self-signed leaf certs

2021-04-27 Thread Matthew Ruffell
view by the
security team.

One of the commits that fixes the issue adds two test cases to the
openssl test suite. They verify a "CA:FALSE, pathlen:0" certificate
with and without -x509_strict, and check that verification passes
without the flag and fails with it.
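
For illustration only, the behaviour those test cases check can be
reproduced by hand with a small wrapper. This is a sketch, not the
upstream test: the function name check_strict is made up here, and it
assumes an openssl where "openssl verify" exits non-zero on a failed
verification (as the 1.1.x series does).

```shell
#!/bin/sh
# Hypothetical helper (not part of openssl): verify one leaf certificate
# with and without -x509_strict and report both outcomes.
# Usage: check_strict ROOT_CERT INTERMEDIATE_CERT LEAF_CERT
# Assumption: `openssl verify` returns a non-zero exit status on failure.
check_strict() {
    root=$1
    sub=$2
    leaf=$3

    # Default (lenient) verification.
    if openssl verify -CAfile "$root" -untrusted "$sub" "$leaf" >/dev/null 2>&1; then
        echo "default: OK"
    else
        echo "default: FAILED"
    fi

    # Strict verification, which disables workarounds for broken certificates.
    if openssl verify -x509_strict -CAfile "$root" -untrusted "$sub" "$leaf" >/dev/null 2>&1; then
        echo "strict: OK"
    else
        echo "strict: FAILED"
    fi
}
```

With the certificates from the [Testcase] section it would be called as:

  $ check_strict CA/rootCA_cert.pem CA/subCA_cert.pem user1_cert.pem

If the fix behaves as described above, the default run should succeed
and the strict run should fail.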

[Other info]

This was reported in the upstream issue #11456 [2]:

[2] https://github.com/openssl/openssl/issues/11456

I believe these three commits fix the issue:

commit 00a0da2f021e6a0bc9519a6a9e5be66d45e6fc91
Author: Tomas Mraz 
Date:   Thu Apr 2 15:56:12 2020 +0200
Subject: Allow certificates with Basic Constraints CA:false, pathlen:0
Link: https://github.com/openssl/openssl/commit/00a0da2f021e6a0bc9519a6a9e5be66d45e6fc91

commit 29e94f285f7f05b1aec6fa275e320bc5fa37ab1e
Author: Tomas Mraz 
Date:   Thu Apr 2 17:31:21 2020 +0200
Subject: Set X509_V_ERR_INVALID_EXTENSION error for invalid basic constraints
Link: https://github.com/openssl/openssl/commit/29e94f285f7f05b1aec6fa275e320bc5fa37ab1e

commit e78f2a8f269a4dcf820ca994e2b89b77972d79e1
Author: Tomas Mraz 
Date:   Fri Apr 3 10:24:40 2020 +0200
Subject: Add test cases for the non CA certificate with pathlen:0
Link: https://github.com/openssl/openssl/commit/e78f2a8f269a4dcf820ca994e2b89b77972d79e1

These landed in openssl 1.1.1g, and Hirsute already has these fixes.
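
To tell whether a particular certificate carries the combination this
bug is about, its text dump can be scanned for CA:FALSE together with a
pathlen value. The helper below is only a sketch: has_nonca_pathlen is a
made-up name, and it assumes the text dump prints the extension as
"CA:FALSE, pathlen:N" on a single line, as in the bug title.

```shell
#!/bin/sh
# Hypothetical helper: read the text dump of a certificate on stdin and
# report whether Basic Constraints combines CA:FALSE with a pathlen value.
# Assumption: the extension is printed as "CA:FALSE, pathlen:N" on one line.
has_nonca_pathlen() {
    if grep -Eq 'CA:FALSE, *pathlen:[0-9]+'; then
        echo "affected: CA:FALSE with pathlen"
    else
        echo "not affected"
    fi
}
```

It would be fed from openssl like this:

  $ openssl x509 -in user1_cert.pem -noout -text | has_nonca_pathlen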

** Affects: openssl (Ubuntu)
 Importance: Undecided
 Status: Fix Released

** Affects: openssl (Ubuntu Focal)
 Importance: Medium
 Assignee: Matthew Ruffell (mruffell)
 Status: In Progress

** Affects: openssl (Ubuntu Groovy)
 Importance: Medium
 Assignee: Matthew Ruffell (mruffell)
 Status: In Progress

** Affects: openssl (Ubuntu Hirsute)
 Importance: Undecided
 Status: Fix Released


** Tags: focal groovy sts

** Also affects: openssl (Ubuntu Focal)
   Importance: Undecided
   Status: New

** Also affects: openssl (Ubuntu Hirsute)
   Importance: Undecided
   Status: New

** Also affects: openssl (Ubuntu Groovy)
   Importance: Undecided
   Status: New

** Changed in: openssl (Ubuntu)
   Status: New => Fix Released

** Changed in: openssl (Ubuntu Hirsute)
   Status: New => Fix Released

** Changed in: openssl (Ubuntu Focal)
   Status: New => In Progress

** Changed in: openssl (Ubuntu Groovy)
   Status: New => In Progress

** Changed in: openssl (Ubuntu Focal)
   Importance: Undecided => Medium

** Changed in: openssl (Ubuntu Groovy)
   Importance: Undecided => Medium

** Changed in: openssl (Ubuntu Focal)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: openssl (Ubuntu Groovy)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Tags added: focal groovy sts

** Description changed:

  [Impact]
  
  In openssl 1.1.1f, the below commit was merged:
  
  commit ba4356ae4002a04e28642da60c551877eea804f7
  Author: Bernd Edlinger 
  Date:   Sat Jan 4 15:54:53 2020 +0100
  Subject: Fix error handling in x509v3_cache_extensions and related functions
  Link: https://github.com/openssl/openssl/commit/ba4356ae4002a04e28642da60c551877eea804f7
  
  This introduced a regression which caused certificate validation to fail
  when certificates violate RFC 5280 [1], namely, when a certificate has
  "basicConstraints=CA:FALSE,pathlen:0". This combination is commonly seen
  in self-signed leaf certificates with an intermediate CA before the root
  CA.
  
  Because of this, openssl 1.1.1f rejects these certificates, so they
  cannot be used in the system certificate store, and SSL connections fail
  when you try to use them to connect to an SSL endpoint.
  
  The error you see when you try to verify is:
  
  $ openssl verify -CAfile CA/rootCA_cert.pem -untrusted CA/subCA_cert.pem \
      user1_cert.pem
  error 20 at 0 depth lookup: unable to get local issuer certificate
  error user1_cert.pem: verification failed
  
  The exact same certificates work fine on Xenial, Bionic and Hirsute.
  
  [1] https://tools.ietf.org/html/rfc5280.html
  
  [Testcase]
  
  We will create our own root CA, intermediate CA and leaf server
  certificate.
  
  Create necessary directories:
  
  $ mkdir reproducer
  $ cd reproducer
  $ mkdir CA
  
  Write openssl configuration files to disk for each CA and cert:
  
  $ cat << EOF >> rootCA.cnf
  [ req ]
  prompt  = no
  distinguished_name  = req_distinguished_name
  x509_extensions = usr_cert
  
  [ req_distinguished_name ]
  C  = DE
  O  = Test Org
  CN = Test RSA PSS Root-CA
  
  [ usr_cert ]
  basicConstraints= critical,CA:TRUE
  keyUsage= critical,keyCertSign,cRLSign
  subjectKeyIdentifier= hash
  authorityKeyIdentifier  = keyid:always
  EOF
  
  $ cat << EOF >> subCA.cnf
  [ req ]
  prompt  = no
  distinguished_name  = req_distinguished_name
  x509_extensions = usr_cert
  
  [ req_distinguished_name ]
  C  = DE
  O  = Test Org
  CN = Test RSA PSS Sub-CA
  
  [ usr_cert ]
  basicConstraints  

[Touch-packages] [Bug 1908473] Re: rsyslog-relp: imrelp module leaves sockets in CLOSE_WAIT state which leads to file descriptor leak

2021-01-25 Thread Matthew Ruffell
Performing verification for Focal.

I installed rsyslog-relp 8.2001.0-1ubuntu1.1 and librelp0 1.5.0-1ubuntu2
from -updates.

From there I set up the configuration file, launched a new rsyslog
instance, and used netcat to send 100 packets to the relp port.

https://paste.ubuntu.com/p/HfSDvNJzpX/

As we can see, there are 100 sockets still open in the CLOSE_WAIT state.

From there I enabled -proposed and installed librelp
1.5.0-1ubuntu2.20.04.2.

I started a new instance of rsyslog, and used netcat to send another 100
packets to the relp port. This time, all sockets were closed and not
left in CLOSE_WAIT.

https://paste.ubuntu.com/p/tjXHhQ2293/
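
As a rough sketch of how the before and after socket counts can be
compared, the helper below counts lines of ss output by TCP state. The
name count_state is made up for this example, and it assumes the state
is the first column, as it is in "ss -t" and "ss -tn" output.

```shell
#!/bin/sh
# Hypothetical helper: count lines of `ss -tn` output (read from stdin)
# whose first column equals the given TCP state, e.g. CLOSE-WAIT.
# Prints 0 when no line matches.
count_state() {
    awk -v state="$1" '$1 == state { n++ } END { print n + 0 }'
}
```

For example:

  $ ss -tn | count_state CLOSE-WAIT

Matching on the first column avoids counting lines that merely contain
the state name somewhere else, which a plain grep would also pick up.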

I also ran the testcase from the upstream testsuite,
imrelp-sessionbreak-vg.sh.

I did this by:

1) pull-lp-source rsyslog focal
2) edit debian/rules, add --enable-valgrind, remove --without-valgrind-tests
3) wget https://github.com/rsyslog/rsyslog/commit/baee0bd5420649329793746f0daf87c4f59fe6a6.patch
4) quilt import baee0bd5420649329793746f0daf87c4f59fe6a6.patch
5) quilt push
6) chmod +x tests/imrelp-sessionbreak-vg.sh
7) debuild -uc -us -b

It will eventually build tests, and imrelp-sessionbreak-vg.sh passes:

make[5]: Entering directory '/home/ubuntu/rsyslog-8.2001.0/tests'
...
PASS: imrelp-sessionbreak-vg.sh
...
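
Since the build log is long, a small filter can pull out the result for
a single test. This is only a sketch: test_result is a made-up name, and
it assumes the harness prints one "RESULT: name" line per test, as in
the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: read a build log on stdin and print the result line
# (PASS, FAIL, SKIP, ...) that the test harness emitted for one test.
# Assumption: results appear as "RESULT: test-name" on their own line.
test_result() {
    awk -v t="$1" '
        $1 ~ /^[A-Z]+:$/ && $2 == t { print; found = 1 }
        END { if (!found) print "no result for " t }
    '
}
```

With the build output saved to a file (build.log is a placeholder name):

  $ test_result imrelp-sessionbreak-vg.sh < build.log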

We pass both the upstream testsuite and the testcase from the bug
report.

The file descriptor leak has been fixed, happy to mark as verified for
Focal.

** Tags removed: verification-needed verification-needed-focal
** Tags added: verification-done-focal

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to rsyslog in Ubuntu.
https://bugs.launchpad.net/bugs/1908473

Title:
  rsyslog-relp: imrelp module leaves sockets in CLOSE_WAIT state which
  leads to file descriptor leak

Status in librelp package in Ubuntu:
  Fix Released
Status in rsyslog package in Ubuntu:
  Fix Released
Status in librelp source package in Focal:
  Fix Committed
Status in rsyslog source package in Focal:
  Won't Fix
Status in librelp source package in Groovy:
  Fix Committed
Status in rsyslog source package in Groovy:
  Fix Released
Status in librelp source package in Hirsute:
  Fix Released
Status in rsyslog source package in Hirsute:
  Fix Released

Bug description:
  [Impact]

  In recent versions of rsyslog and librelp, the imrelp module leaks
  file descriptors due to a bug where it does not correctly close
  sockets, and instead, leaves them in the CLOSE_WAIT state.

  This causes rsyslogd on busy servers to eventually hit the limit of
  maximum open files allowed, which locks rsyslogd up until it is
  restarted.

  A workaround is to restart rsyslogd every month or so to manually
  close all of the open sockets.

  Only users of the imrelp module are affected, and not rsyslog users in
  general.

  [Testcase]

  Install the rsyslog-relp module like so:

  $ sudo apt install rsyslog rsyslog-relp

  Next, generate a working directory, and make a config file that loads
  the relp module.

  $ sudo mkdir /workdir
  $ cat << EOF >> ./spool.conf
  \$LocalHostName spool
  \$AbortOnUncleanConfig on
  \$PreserveFQDN on

  global(
  workDirectory="/workdir"
  maxMessageSize="256k"
  )

  main_queue(queue.type="Direct")
  module(load="imrelp")
  input(
  type="imrelp"
  name="imrelp"
  port="601"
  ruleset="spool"
  MaxDataSize="256k"
  )

  ruleset(name="spool" queue.type="direct") {
  }

  # Just so rsyslog doesn't whine that we do not have outputs
  ruleset(name="noop" queue.type="direct") {
  action(
  type="omfile"
  name="omfile"
  file="/workdir/spool.log"
  )
  }
  EOF

  Verify that the config is valid, then start a rsyslog server.

  $ sudo rsyslogd -f ./spool.conf -N9
  $ sudo rsyslogd -f ./spool.conf -i /workdir/rsyslogd.pid

  Fetch the rsyslogd PID and check for open files.

  $ RLOGPID=$(cat /workdir/rsyslogd.pid)
  $ sudo ls -l /proc/$RLOGPID/fd
  total 0
  lr-x------ 1 root root 64 Dec 17 01:22 0 -> /dev/urandom
  lrwx------ 1 root root 64 Dec 17 01:22 1 -> 'socket:[41228]'
  lrwx------ 1 root root 64 Dec 17 01:22 3 -> 'socket:[41222]'
  lrwx------ 1 root root 64 Dec 17 01:22 4 -> 'socket:[41223]'
  lrwx------ 1 root root 64 Dec 17 01:22 7 -> 'anon_inode:[eventpoll]'

  We have 3 sockets open by default. Next, use netcat to open 100
  connections:

  $ for i in {1..100} ; do nc -z 127.0.0.1 601 ; done

  Now check for open file descriptors, and there will be an extra 100 sockets
  in the list:

  $ sudo ls -l /proc/$RLOGPID/fd

  https://paste.ubuntu.com/p/f6NQVNbZcR/

  We can check the state of these sockets with:

  $ ss -t

  https://paste.ubuntu.com/p/7Ts2FbxJrg/

  The listening sockets will be in CLOSE-WAIT, and the netcat sockets
  will be in FIN-WAIT-2.

  $ ss -t | grep CLOSE-WAIT | wc -l
  100

  If you install the test package available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf299578-test

  

[Touch-packages] [Bug 1908473] Re: rsyslog-relp: imrelp module leaves sockets in CLOSE_WAIT state which leads to file descriptor leak

2021-01-24 Thread Matthew Ruffell
Hi Mauricio,

I filed bug 1912969 to fix the FTBFS for librelp on focal. Adjusting the
packets down from 50,000 to 10,000 makes the build succeed on riscv64.

I attached two debdiffs, one an incremental patch from
1.5.0-1ubuntu2.20.04.1, and the other a full patch from 1.5.0-1ubuntu2.

Please review and sponsor.

Thanks,
Matthew

Title:
  rsyslog-relp: imrelp module leaves sockets in CLOSE_WAIT state which
  leads to file descriptor leak

Status in librelp package in Ubuntu:
  Fix Released
Status in rsyslog package in Ubuntu:
  Fix Released
Status in librelp source package in Focal:
  Fix Committed
Status in rsyslog source package in Focal:
  Won't Fix
Status in librelp source package in Groovy:
  Fix Committed
Status in rsyslog source package in Groovy:
  Fix Released
Status in librelp source package in Hirsute:
  Fix Released
Status in rsyslog source package in Hirsute:
  Fix Released

Bug description:
  [Impact]

  In recent versions of rsyslog and librelp, the imrelp module leaks
  file descriptors due to a bug where it does not correctly close
  sockets, and instead, leaves them in the CLOSE_WAIT state.

  This causes rsyslogd on busy servers to eventually hit the limit of
  maximum open files allowed, which locks rsyslogd up until it is
  restarted.

  A workaround is to restart rsyslogd every month or so to manually
  close all of the open sockets.

  Only users of the imrelp module are affected, and not rsyslog users in
  general.

  [Testcase]

  Install the rsyslog-relp module like so:

  $ sudo apt install rsyslog rsyslog-relp

  Next, generate a working directory, and make a config file that loads
  the relp module.

  $ sudo mkdir /workdir
  $ cat << EOF >> ./spool.conf
  \$LocalHostName spool
  \$AbortOnUncleanConfig on
  \$PreserveFQDN on

  global(
  workDirectory="/workdir"
  maxMessageSize="256k"
  )

  main_queue(queue.type="Direct")
  module(load="imrelp")
  input(
  type="imrelp"
  name="imrelp"
  port="601"
  ruleset="spool"
  MaxDataSize="256k"
  )

  ruleset(name="spool" queue.type="direct") {
  }

  # Just so rsyslog doesn't whine that we do not have outputs
  ruleset(name="noop" queue.type="direct") {
  action(
  type="omfile"
  name="omfile"
  file="/workdir/spool.log"
  )
  }
  EOF

  Verify that the config is valid, then start a rsyslog server.

  $ sudo rsyslogd -f ./spool.conf -N9
  $ sudo rsyslogd -f ./spool.conf -i /workdir/rsyslogd.pid

  Fetch the rsyslogd PID and check for open files.

  $ RLOGPID=$(cat /workdir/rsyslogd.pid)
  $ sudo ls -l /proc/$RLOGPID/fd
  total 0
  lr-x------ 1 root root 64 Dec 17 01:22 0 -> /dev/urandom
  lrwx------ 1 root root 64 Dec 17 01:22 1 -> 'socket:[41228]'
  lrwx------ 1 root root 64 Dec 17 01:22 3 -> 'socket:[41222]'
  lrwx------ 1 root root 64 Dec 17 01:22 4 -> 'socket:[41223]'
  lrwx------ 1 root root 64 Dec 17 01:22 7 -> 'anon_inode:[eventpoll]'

  We have 3 sockets open by default. Next, use netcat to open 100
  connections:

  $ for i in {1..100} ; do nc -z 127.0.0.1 601 ; done

  Now check for open file descriptors, and there will be an extra 100 sockets
  in the list:

  $ sudo ls -l /proc/$RLOGPID/fd

  https://paste.ubuntu.com/p/f6NQVNbZcR/

  We can check the state of these sockets with:

  $ ss -t

  https://paste.ubuntu.com/p/7Ts2FbxJrg/

  The listening sockets will be in CLOSE-WAIT, and the netcat sockets
  will be in FIN-WAIT-2.

  $ ss -t | grep CLOSE-WAIT | wc -l
  100

  If you install the test package available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf299578-test

  When you open connections with netcat, these will be closed properly,
  and the file descriptor leak will be fixed.

  [Where problems could occur]

  If a regression were to occur, it would be limited to users of the
  imrelp module, which is a part of the rsyslog-relp package, and
  depends on librelp.

  rsyslog-relp is not part of a default installation of rsyslog, and is
  opt-in: imrelp has to be enabled by changing a configuration file.

  The changes to rsyslog implement a testcase which exercises the
  problematic code to ensure things are working as expected; this
  can be enabled manually on build, and has been verified to pass (#7).

  [Other]

  Upstream bug list:

  https://github.com/rsyslog/rsyslog/issues/4350
  https://github.com/rsyslog/rsyslog/issues/4005
  https://github.com/rsyslog/librelp/issues/188
  https://github.com/rsyslog/librelp/pull/193

  The following commits fix the problem:

  rsyslogd
  

  commit baee0bd5420649329793746f0daf87c4f59fe6a6
  Author: Andre Lorbach 
  Date:   Thu Apr 9 13:00:35 2020 +0200
  Subject: testbench: Add test for imrelp to check broken session handling.
  Link: 

[Touch-packages] [Bug 1908473] Re: rsyslog-relp: imrelp module leaves sockets in CLOSE_WAIT state which leads to file descriptor leak

2021-01-21 Thread Matthew Ruffell
Hi Mauricio,

It seems riscv64 passes on Groovy due to tests being skipped on the
riscv64 architecture.

From Groovy's build log:

https://paste.ubuntu.com/p/NCJPDVSbSW/

If you look at the man page for dh_auto_test it mentions:

If the DEB_BUILD_OPTIONS environment variable contains nocheck, no tests
will be performed.

nocheck was added to riscv64 by default for all packages in Groovy as
part of a change to dpkg in bug 1891686.
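
The effect of that option can be sketched as a simple check on the
environment variable. This is a simplification for illustration, not the
actual debhelper code, and tests_skipped is a made-up name; it assumes
DEB_BUILD_OPTIONS is a space-separated list.

```shell
#!/bin/sh
# Sketch: decide whether tests would be skipped, based on whether
# DEB_BUILD_OPTIONS contains the word "nocheck".
# Assumption: options are separated by spaces.
tests_skipped() {
    case " ${DEB_BUILD_OPTIONS:-} " in
        *" nocheck "*) echo "tests skipped (nocheck set)" ;;
        *)             echo "tests will run" ;;
    esac
}
```

Padding the value with spaces on both sides means "nocheck" is only
matched as a whole word, not as part of some other option name.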

The test cases basic-realistic.sh and tls-basic-realistic.sh fail on
Focal because they attempt to send 100,000 packets between the server
and the client. We only get as far as 00029000 msgs sent, or 00047000
msgs sent with some changes William made to the builders, before the
test times out, assumes the channel is dead, and fails.

https://paste.ubuntu.com/p/hwYXSbKPPV/

We aren't going to hit the 100,000 packets on riscv anytime soon. I
think I will open a new bug to adjust the packet counts from 100,000
down to 10,000 for basic-realistic.sh and tls-basic-realistic.sh, which
resembles what has been done for receiver-abort.sh and
tls-receiver-abort.sh.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to rsyslog in Ubuntu.
https://bugs.launchpad.net/bugs/1908473

Title:
  rsyslog-relp: imrelp module leaves sockets in CLOSE_WAIT state which
  leads to file descriptor leak

Status in librelp package in Ubuntu:
  Fix Released
Status in rsyslog package in Ubuntu:
  Fix Released
Status in librelp source package in Focal:
  Fix Committed
Status in rsyslog source package in Focal:
  Won't Fix
Status in librelp source package in Groovy:
  Fix Committed
Status in rsyslog source package in Groovy:
  Fix Released
Status in librelp source package in Hirsute:
  Fix Released
Status in rsyslog source package in Hirsute:
  Fix Released

Bug description:
  [Impact]

  In recent versions of rsyslog and librelp, the imrelp module leaks
  file descriptors due to a bug where it does not correctly close
  sockets, and instead, leaves them in the CLOSE_WAIT state.

  This causes rsyslogd on busy servers to eventually hit the limit of
  maximum open files allowed, which locks rsyslogd up until it is
  restarted.

  A workaround is to restart rsyslogd every month or so to manually
  close all of the open sockets.

  Only users of the imrelp module are affected, and not rsyslog users in
  general.

  [Testcase]

  Install the rsyslog-relp module like so:

  $ sudo apt install rsyslog rsyslog-relp

  Next, generate a working directory, and make a config file that loads
  the relp module.

  $ sudo mkdir /workdir
  $ cat << EOF >> ./spool.conf
  \$LocalHostName spool
  \$AbortOnUncleanConfig on
  \$PreserveFQDN on

  global(
  workDirectory="/workdir"
  maxMessageSize="256k"
  )

  main_queue(queue.type="Direct")
  module(load="imrelp")
  input(
  type="imrelp"
  name="imrelp"
  port="601"
  ruleset="spool"
  MaxDataSize="256k"
  )

  ruleset(name="spool" queue.type="direct") {
  }

  # Just so rsyslog doesn't whine that we do not have outputs
  ruleset(name="noop" queue.type="direct") {
  action(
  type="omfile"
  name="omfile"
  file="/workdir/spool.log"
  )
  }
  EOF

  Verify that the config is valid, then start a rsyslog server.

  $ sudo rsyslogd -f ./spool.conf -N9
  $ sudo rsyslogd -f ./spool.conf -i /workdir/rsyslogd.pid

  Fetch the rsyslogd PID and check for open files.

  $ RLOGPID=$(cat /workdir/rsyslogd.pid)
  $ sudo ls -l /proc/$RLOGPID/fd
  total 0
  lr-x------ 1 root root 64 Dec 17 01:22 0 -> /dev/urandom
  lrwx------ 1 root root 64 Dec 17 01:22 1 -> 'socket:[41228]'
  lrwx------ 1 root root 64 Dec 17 01:22 3 -> 'socket:[41222]'
  lrwx------ 1 root root 64 Dec 17 01:22 4 -> 'socket:[41223]'
  lrwx------ 1 root root 64 Dec 17 01:22 7 -> 'anon_inode:[eventpoll]'

  We have 3 sockets open by default. Next, use netcat to open 100
  connections:

  $ for i in {1..100} ; do nc -z 127.0.0.1 601 ; done

  Now check for open file descriptors, and there will be an extra 100 sockets
  in the list:

  $ sudo ls -l /proc/$RLOGPID/fd

  https://paste.ubuntu.com/p/f6NQVNbZcR/

  We can check the state of these sockets with:

  $ ss -t

  https://paste.ubuntu.com/p/7Ts2FbxJrg/

  The server-side sockets held by rsyslogd will be in CLOSE-WAIT, and the
  netcat sockets will be in FIN-WAIT-2.

  $ ss -t | grep CLOSE-WAIT | wc -l
  100
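
  The file descriptor check can also be scripted. The following is a
  minimal sketch that counts the socket file descriptors held by a process
  by reading /proc/PID/fd. The function name count_socket_fds is
  hypothetical, and inspecting another user's process such as rsyslogd
  requires root.

```shell
#!/bin/sh
# Hypothetical helper: print how many fds of process $1 point at sockets.
count_socket_fds() {
    n=0
    for fd in /proc/"$1"/fd/*; do
        # Each entry is a symlink; sockets resolve to "socket:[inode]".
        case "$(readlink "$fd" 2>/dev/null)" in
            socket:*) n=$((n + 1)) ;;
        esac
    done
    echo "$n"
}

# Example: count the sockets held by the current shell.
count_socket_fds "$$"
```

  Run as root with $RLOGPID in place of $$, once before and once after the
  netcat loop; on an affected system the second count would be expected to
  be roughly 100 higher.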

  Install the test package available in the following PPA:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf299578-test

  Now when you open connections with netcat, they will be closed properly,
  and file descriptors will no longer leak.

  [Where problems could occur]

  If a regression were to occur, it would be limited to users of the
  imrelp module, which is a part of the rsyslog-relp package, and
  depends on librelp.

  rsyslog-relp is not part of a default installation of rsyslog, and is
  opt in by changing a 

[Touch-packages] [Bug 1908473] Re: rsyslog-relp: imrelp module leaves sockets in CLOSE_WAIT state which leads to file descriptor leak

2021-01-20 Thread Matthew Ruffell
Performing verification for Focal

I installed rsyslog-relp 8.2001.0-1ubuntu1.1 and librelp0 1.5.0-1ubuntu2
from -updates.

From there I set up the configuration file, launched a new rsyslog
instance, and used netcat to send 100 packets to the relp port.

https://paste.ubuntu.com/p/jCs9Dy6FYF/

As we can see, there are 100 sockets still open in the CLOSE_WAIT state.

From there I enabled -proposed and installed librelp
1.5.0-1ubuntu2.20.04.1.

I started a new instance of rsyslog, and used netcat to send another 100 packets
to the relp port. This time, all sockets were closed and not left in CLOSE_WAIT.

https://paste.ubuntu.com/p/vdzsVTctmf/

I also ran the testcase from the upstream testsuite,
imrelp-sessionbreak-vg.sh.

I did this by:

1) pull-lp-source rsyslog focal
2) edit debian/rules, add --enable-valgrind, remove --without-valgrind-tests
3) wget https://github.com/rsyslog/rsyslog/commit/baee0bd5420649329793746f0daf87c4f59fe6a6.patch
4) quilt import baee0bd5420649329793746f0daf87c4f59fe6a6.patch
5) quilt push
6) chmod +x tests/imrelp-sessionbreak-vg.sh
7) debuild -uc -us -b

It will eventually build tests, and imrelp-sessionbreak-vg.sh passes:

make[5]: Entering directory '/home/ubuntu/rsyslog-8.2001.0/tests'
...
PASS: imrelp-sessionbreak-vg.sh
...

We pass both the upstream testsuite and the testcase from the bug
report.

The file descriptor leak has been fixed, so I am happy to mark this as
verified for Focal.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to rsyslog in Ubuntu.
https://bugs.launchpad.net/bugs/1908473

Title:
  rsyslog-relp: imrelp module leaves sockets in CLOSE_WAIT state which
  leads to file descriptor leak

Status in librelp package in Ubuntu:
  Fix Released
Status in rsyslog package in Ubuntu:
  Fix Released
Status in librelp source package in Focal:
  Fix Committed
Status in rsyslog source package in Focal:
  Won't Fix
Status in librelp source package in Groovy:
  Fix Committed
Status in rsyslog source package in Groovy:
  Fix Released
Status in librelp source package in Hirsute:
  Fix Released
Status in rsyslog source package in Hirsute:
  Fix Released

[Touch-packages] [Bug 1908473] Re: rsyslog-relp: imrelp module leaves sockets in CLOSE_WAIT state which leads to file descriptor leak

2021-01-20 Thread Matthew Ruffell
Performing verification for librelp in Groovy.

I installed rsyslog-relp 8.2006.0-2ubuntu1 and librelp 1.5.0-1ubuntu2
from -updates to reproduce:

https://paste.ubuntu.com/p/gtn4rcXc72/

From there I set up the configuration script, ran a new instance of
rsyslog, and used netcat to open 100 connections to the relp port.

When I checked the list of file descriptors, there were 100 sockets
open, in the CLOSE_WAIT state.

From there, I enabled -proposed and installed librelp
1.5.0-1ubuntu2.20.10.1:

https://paste.ubuntu.com/p/nt342PJkQ5/

I started a new rsyslog instance, and used netcat to open 100
connections to the relp port.

All sockets were closed when rsyslog was done with them, and there were
no sockets in CLOSE_WAIT.

I also ran the provided testcase in rsyslog, imrelp-sessionbreak-vg.sh.

I did this by:

1) pull-lp-source rsyslog groovy
2) edit debian/rules, add --enable-valgrind, remove --without-valgrind-tests
3) debuild -uc -us -b

It will eventually build tests, and imrelp-sessionbreak-vg.sh passes:

make[5]: Entering directory '/home/ubuntu/rsyslog-8.2006.0/tests'
...
PASS: imrelp-sessionbreak-vg.sh
...

We pass both the upstream testsuite and the testcase from the bug
report.

The file descriptor leak has been fixed, so I am happy to mark this as
verified for Groovy.

** Tags removed: verification-needed-groovy
** Tags added: verification-done-groovy

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to rsyslog in Ubuntu.
https://bugs.launchpad.net/bugs/1908473

Title:
  rsyslog-relp: imrelp module leaves sockets in CLOSE_WAIT state which
  leads to file descriptor leak

Status in librelp package in Ubuntu:
  Fix Released
Status in rsyslog package in Ubuntu:
  Fix Released
Status in librelp source package in Focal:
  Fix Committed
Status in rsyslog source package in Focal:
  Won't Fix
Status in librelp source package in Groovy:
  Fix Committed
Status in rsyslog source package in Groovy:
  Fix Released
Status in librelp source package in Hirsute:
  Fix Released
Status in rsyslog source package in Hirsute:
  Fix Released

[Touch-packages] [Bug 1908473] Re: rsyslog-relp: imrelp module leaves sockets in CLOSE_WAIT state which leads to file descriptor leak

2021-01-20 Thread Matthew Ruffell
Same one failure on today's rebuild. Strange, since this is the exact
same code as Groovy.


** Attachment added: "buildlog_ubuntu-focal-riscv64.librelp_1.5.0-1ubuntu2.20.04.1_BUILDING.txt.gz.5"
   https://bugs.launchpad.net/ubuntu/+source/rsyslog/+bug/1908473/+attachment/5454976/+files/buildlog_ubuntu-focal-riscv64.librelp_1.5.0-1ubuntu2.20.04.1_BUILDING.txt.gz.5

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to rsyslog in Ubuntu.
https://bugs.launchpad.net/bugs/1908473

Title:
  rsyslog-relp: imrelp module leaves sockets in CLOSE_WAIT state which
  leads to file descriptor leak

Status in librelp package in Ubuntu:
  Fix Released
Status in rsyslog package in Ubuntu:
  Fix Released
Status in librelp source package in Focal:
  Fix Committed
Status in rsyslog source package in Focal:
  Won't Fix
Status in librelp source package in Groovy:
  Fix Committed
Status in rsyslog source package in Groovy:
  Fix Released
Status in librelp source package in Hirsute:
  Fix Released
Status in rsyslog source package in Hirsute:
  Fix Released

[Touch-packages] [Bug 1912122] Re: /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT restrictions

2021-01-20 Thread Matthew Ruffell
Hi Robie, I agree this probably isn't worth an SRU to Groovy. I just made
the packages available on the off chance that they might be considered.
I will mark Groovy as won't fix.

Hirsute is what really matters in the end.

** Changed in: rsyslog (Ubuntu Groovy)
   Status: In Progress => Won't Fix

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to rsyslog in Ubuntu.
https://bugs.launchpad.net/bugs/1912122

Title:
  /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT
  restrictions

Status in rsyslog package in Ubuntu:
  In Progress
Status in rsyslog source package in Groovy:
  Won't Fix
Status in rsyslog source package in Hirsute:
  In Progress

Bug description:
  [Impact]

  In bug 1886112, CONFIG_SECURITY_DMESG_RESTRICT was enabled on the
  Ubuntu kernel starting with Groovy and onward, in an effort to
  restrict access to the kernel log buffer from unprivileged users.

  It seems we have overlooked /var/log/dmesg, as it is still mode 0644,
  while /var/log/kern.log and /var/log/syslog are both 0640:

  $ ll /var/log
  -rw-r--r--   1 root   adm  81768 Jan 18 09:09 dmesg
  -rw-r-----   1 syslog adm  24538 Jan 18 13:05 kern.log
  -rw-r-----   1 syslog adm 213911 Jan 18 13:22 syslog

  Change /var/log/dmesg to 0640 to close the information leak.
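
  A quick way to check for this on a given system is sketched below. The
  helper name world_readable is hypothetical, and the sketch assumes GNU
  stat (as shipped in coreutils on Ubuntu), whose -c '%a' format prints the
  octal mode.

```shell
#!/bin/sh
# Hypothetical helper: succeed when "other" has read permission on file $1.
# Relies on GNU stat's -c '%a', which prints the octal mode (e.g. 644).
world_readable() {
    mode=$(stat -c '%a' "$1" 2>/dev/null) || return 2
    case "$mode" in
        # The last octal digit holds the "other" bits; 4-7 include read.
        *[4567]) return 0 ;;
        *) return 1 ;;
    esac
}

if world_readable /var/log/dmesg; then
    echo "/var/log/dmesg is readable by all users"
else
    echo "/var/log/dmesg is not world-readable (or does not exist)"
fi
```

  With mode 0644 the helper succeeds, and with 0640 it fails, which matches
  the intent of the change.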

  [Testcase]

  $ sudo adduser dave
  $ su dave
  $ groups
  dave
  $ cat /var/log/kern.log
  cat: /var/log/kern.log: Permission denied
  $ cat /var/log/syslog
  cat: /var/log/syslog: Permission denied
  $ cat /var/log/dmesg
  [0.00] kernel: Linux version 5.8.0-36-generic (buildd@lgw01-amd64-011) (gcc (Ubuntu 10.2.1-2ubuntu3) 10.2.1 20201221, GNU ld (GNU Binutils for Ubuntu) 2.35.50.20210106) #40+21.04.1-Ubuntu SMP Thu Jan 7 11:35:09 UTC 2021 (Ubuntu 5.8.0-36.40+21.04.1-generic 5.8.18)
  [0.00] kernel: Command line: BOOT_IMAGE=/casper/vmlinuz file=/cdrom/preseed/ubuntu.seed maybe-ubiquity quiet splash ---

  Install the package from the following PPA, start dmesg.service, and
  repeat the test:

  https://launchpad.net/~mruffell/+archive/ubuntu/lp1912122-test

  $ sudo systemctl daemon-reload
  $ sudo systemctl start dmesg.service

  $ sudo adduser dave
  $ su dave
  $ groups
  dave
  $ cat /var/log/kern.log
  cat: /var/log/kern.log: Permission denied
  $ cat /var/log/syslog
  cat: /var/log/syslog: Permission denied
  $ cat /var/log/dmesg
  cat: /var/log/dmesg: Permission denied

  [Where problems could occur]

  Some users or log scraper programs might need to view the kernel log
  buffers, and in this case, their underlying service accounts should be
  added to the 'adm' group.
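
  As a sketch of that remedy, the snippet below checks whether an account
  is already in the 'adm' group and prints the command to add it if not.
  The helper in_group and the account name logscraper are placeholders for
  illustration, not names taken from this bug.

```shell
#!/bin/sh
# Hypothetical helper: succeed when user $1 is a member of group $2.
in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF -- "$2"
}

# "logscraper" is a placeholder service account name.
if in_group logscraper adm; then
    echo "logscraper can already read the 0640 log files"
else
    echo "add it with: sudo usermod -aG adm logscraper"
fi
```

  A running service normally has to be restarted before it picks up a new
  group membership.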

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/rsyslog/+bug/1912122/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


[Touch-packages] [Bug 1912122] Re: /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT restrictions

2021-01-18 Thread Matthew Ruffell
Attached is a patch which changes /var/log/dmesg to 0640 on groovy. It
also contains Steve's recommendation to set the logrotate files to 0640.

** Patch added: "Debdiff for syslog on groovy"
   https://bugs.launchpad.net/ubuntu/+source/rsyslog/+bug/1912122/+attachment/5454311/+files/lp1912122_groovy_v2.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to rsyslog in Ubuntu.
https://bugs.launchpad.net/bugs/1912122

Title:
  /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT
  restrictions

Status in rsyslog package in Ubuntu:
  In Progress
Status in rsyslog source package in Groovy:
  In Progress
Status in rsyslog source package in Hirsute:
  In Progress


[Touch-packages] [Bug 1912122] Re: /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT restrictions

2021-01-18 Thread Matthew Ruffell
Attached is a patch which changes /var/log/dmesg to 0640 on hirsute. It
also contains Steve's recommendation to set the logrotate files to 0640.

** Patch removed: "Debdiff for rsyslog on hirsute"
   https://bugs.launchpad.net/ubuntu/+source/rsyslog/+bug/1912122/+attachment/5454004/+files/lp1912122_hirsute.debdiff

** Patch removed: "Debdiff for syslog on groovy"
   https://bugs.launchpad.net/ubuntu/+source/rsyslog/+bug/1912122/+attachment/5454005/+files/lp1912122_groovy.debdiff

** Patch added: "Debdiff for rsyslog on hirsute"
   https://bugs.launchpad.net/ubuntu/+source/rsyslog/+bug/1912122/+attachment/5454310/+files/lp1912122_hirsute_v2.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to rsyslog in Ubuntu.
https://bugs.launchpad.net/bugs/1912122

Title:
  /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT
  restrictions

Status in rsyslog package in Ubuntu:
  In Progress
Status in rsyslog source package in Groovy:
  In Progress
Status in rsyslog source package in Hirsute:
  In Progress


[Touch-packages] [Bug 1912122] Re: /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT restrictions

2021-01-17 Thread Matthew Ruffell
** Tags added: sts-sponsor

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to rsyslog in Ubuntu.
https://bugs.launchpad.net/bugs/1912122

Title:
  /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT
  restrictions

Status in rsyslog package in Ubuntu:
  In Progress
Status in rsyslog source package in Groovy:
  In Progress
Status in rsyslog source package in Hirsute:
  In Progress


[Touch-packages] [Bug 1912122] Re: /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT restrictions

2021-01-17 Thread Matthew Ruffell
Attached is a debdiff for Groovy to change /var/log/dmesg to 0640.

** Patch added: "Debdiff for syslog on groovy"
   
https://bugs.launchpad.net/ubuntu/+source/rsyslog/+bug/1912122/+attachment/5454005/+files/lp1912122_groovy.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to rsyslog in Ubuntu.
https://bugs.launchpad.net/bugs/1912122

Title:
  /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT
  restrictions

Status in rsyslog package in Ubuntu:
  In Progress
Status in rsyslog source package in Groovy:
  In Progress
Status in rsyslog source package in Hirsute:
  In Progress

Bug description:
  [Impact]

  In bug 1886112, CONFIG_SECURITY_DMESG_RESTRICT was enabled on the
  Ubuntu kernel starting with Groovy and onward, in an effort to
  restrict access to the kernel log buffer from unprivileged users.

  It seems we have overlooked /var/log/dmesg, as it is still mode 0644,
  while /var/log/kern.log, /var/log/syslog are all 0640:

  $ ll /var/log
  -rw-r--r--   1 root   adm   81768 Jan 18 09:09 dmesg
  -rw-r-----   1 syslog adm   24538 Jan 18 13:05 kern.log
  -rw-r-----   1 syslog adm  213911 Jan 18 13:22 syslog

  Change /var/log/dmesg to 0640 to close the information leak.

  [Testcase]

  $ sudo adduser dave
  $ su dave
  $ groups
  dave
  $ cat /var/log/kern.log
  cat: /var/log/kern.log: Permission denied
  $ cat /var/log/syslog
  cat: /var/log/syslog: Permission denied
  $ cat /var/log/dmesg
  [    0.000000] kernel: Linux version 5.8.0-36-generic (buildd@lgw01-amd64-011) (gcc (Ubuntu 10.2.1-2ubuntu3) 10.2.1 20201221, GNU ld (GNU Binutils for Ubuntu) 2.35.50.20210106) #40+21.04.1-Ubuntu SMP Thu Jan 7 11:35:09 UTC 2021 (Ubuntu 5.8.0-36.40+21.04.1-generic 5.8.18)
  [    0.000000] kernel: Command line: BOOT_IMAGE=/casper/vmlinuz file=/cdrom/preseed/ubuntu.seed maybe-ubiquity quiet splash ---

  If you install the package in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/lp1912122-test

  $ sudo systemctl daemon-reload
  $ sudo systemctl start dmesg.service

  $ sudo adduser dave
  $ su dave
  $ groups
  dave
  $ cat /var/log/kern.log
  cat: /var/log/kern.log: Permission denied
  $ cat /var/log/syslog
  cat: /var/log/syslog: Permission denied
  $ cat /var/log/dmesg
  cat: /var/log/dmesg: Permission denied

  [Where problems could occur]

  Some users or log scraper programs might need to view the kernel log
  buffers, and in this case, their underlying service accounts should be
  added to the 'adm' group.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/rsyslog/+bug/1912122/+subscriptions



[Touch-packages] [Bug 1912122] Re: /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT restrictions

2021-01-17 Thread Matthew Ruffell
Attached is a debdiff for hirsute to set /var/log/dmesg to 0640.

** Patch added: "Debdiff for rsyslog on hirsute"
   
https://bugs.launchpad.net/ubuntu/+source/rsyslog/+bug/1912122/+attachment/5454004/+files/lp1912122_hirsute.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to rsyslog in Ubuntu.
https://bugs.launchpad.net/bugs/1912122

Title:
  /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT
  restrictions

Status in rsyslog package in Ubuntu:
  In Progress
Status in rsyslog source package in Groovy:
  In Progress
Status in rsyslog source package in Hirsute:
  In Progress

Bug description:
  [Impact]

  In bug 1886112, CONFIG_SECURITY_DMESG_RESTRICT was enabled on the
  Ubuntu kernel starting with Groovy and onward, in an effort to
  restrict access to the kernel log buffer from unprivileged users.

  It seems we have overlooked /var/log/dmesg, as it is still mode 0644,
  while /var/log/kern.log, /var/log/syslog are all 0640:

  $ ll /var/log
  -rw-r--r--   1 root   adm   81768 Jan 18 09:09 dmesg
  -rw-r-----   1 syslog adm   24538 Jan 18 13:05 kern.log
  -rw-r-----   1 syslog adm  213911 Jan 18 13:22 syslog

  Change /var/log/dmesg to 0640 to close the information leak.

  [Testcase]

  $ sudo adduser dave
  $ su dave
  $ groups
  dave
  $ cat /var/log/kern.log
  cat: /var/log/kern.log: Permission denied
  $ cat /var/log/syslog
  cat: /var/log/syslog: Permission denied
  $ cat /var/log/dmesg
  [    0.000000] kernel: Linux version 5.8.0-36-generic (buildd@lgw01-amd64-011) (gcc (Ubuntu 10.2.1-2ubuntu3) 10.2.1 20201221, GNU ld (GNU Binutils for Ubuntu) 2.35.50.20210106) #40+21.04.1-Ubuntu SMP Thu Jan 7 11:35:09 UTC 2021 (Ubuntu 5.8.0-36.40+21.04.1-generic 5.8.18)
  [    0.000000] kernel: Command line: BOOT_IMAGE=/casper/vmlinuz file=/cdrom/preseed/ubuntu.seed maybe-ubiquity quiet splash ---

  If you install the package in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/lp1912122-test

  $ sudo systemctl daemon-reload
  $ sudo systemctl start dmesg.service

  $ sudo adduser dave
  $ su dave
  $ groups
  dave
  $ cat /var/log/kern.log
  cat: /var/log/kern.log: Permission denied
  $ cat /var/log/syslog
  cat: /var/log/syslog: Permission denied
  $ cat /var/log/dmesg
  cat: /var/log/dmesg: Permission denied

  [Where problems could occur]

  Some users or log scraper programs might need to view the kernel log
  buffers, and in this case, their underlying service accounts should be
  added to the 'adm' group.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/rsyslog/+bug/1912122/+subscriptions



[Touch-packages] [Bug 1912122] Re: /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT restrictions

2021-01-17 Thread Matthew Ruffell
** Tags added: sts

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to rsyslog in Ubuntu.
https://bugs.launchpad.net/bugs/1912122

Title:
  /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT
  restrictions

Status in rsyslog package in Ubuntu:
  In Progress
Status in rsyslog source package in Groovy:
  In Progress
Status in rsyslog source package in Hirsute:
  In Progress

Bug description:
  [Impact]

  In bug 1886112, CONFIG_SECURITY_DMESG_RESTRICT was enabled on the
  Ubuntu kernel starting with Groovy and onward, in an effort to
  restrict access to the kernel log buffer from unprivileged users.

  It seems we have overlooked /var/log/dmesg, as it is still mode 0644,
  while /var/log/kern.log, /var/log/syslog are all 0640:

  $ ll /var/log
  -rw-r--r--   1 root   adm   81768 Jan 18 09:09 dmesg
  -rw-r-----   1 syslog adm   24538 Jan 18 13:05 kern.log
  -rw-r-----   1 syslog adm  213911 Jan 18 13:22 syslog

  Change /var/log/dmesg to 0640 to close the information leak.

  [Testcase]

  $ sudo adduser dave
  $ su dave
  $ groups
  dave
  $ cat /var/log/kern.log
  cat: /var/log/kern.log: Permission denied
  $ cat /var/log/syslog
  cat: /var/log/syslog: Permission denied
  $ cat /var/log/dmesg
  [    0.000000] kernel: Linux version 5.8.0-36-generic (buildd@lgw01-amd64-011) (gcc (Ubuntu 10.2.1-2ubuntu3) 10.2.1 20201221, GNU ld (GNU Binutils for Ubuntu) 2.35.50.20210106) #40+21.04.1-Ubuntu SMP Thu Jan 7 11:35:09 UTC 2021 (Ubuntu 5.8.0-36.40+21.04.1-generic 5.8.18)
  [    0.000000] kernel: Command line: BOOT_IMAGE=/casper/vmlinuz file=/cdrom/preseed/ubuntu.seed maybe-ubiquity quiet splash ---

  If you install the package in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/lp1912122-test

  $ sudo systemctl daemon-reload
  $ sudo systemctl start dmesg.service

  $ sudo adduser dave
  $ su dave
  $ groups
  dave
  $ cat /var/log/kern.log
  cat: /var/log/kern.log: Permission denied
  $ cat /var/log/syslog
  cat: /var/log/syslog: Permission denied
  $ cat /var/log/dmesg
  cat: /var/log/dmesg: Permission denied

  [Where problems could occur]

  Some users or log scraper programs might need to view the kernel log
  buffers, and in this case, their underlying service accounts should be
  added to the 'adm' group.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/rsyslog/+bug/1912122/+subscriptions



[Touch-packages] [Bug 1912122] Re: /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT restrictions

2021-01-17 Thread Matthew Ruffell
** Changed in: rsyslog (Ubuntu Hirsute)
   Status: New => In Progress

** Changed in: rsyslog (Ubuntu Hirsute)
   Importance: Undecided => Medium

** Changed in: rsyslog (Ubuntu Hirsute)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Description changed:

  [Impact]
  
  In bug 1886112, CONFIG_SECURITY_DMESG_RESTRICT was enabled on the Ubuntu
  kernel starting with Groovy and onward, in an effort to restrict access
  to the kernel log buffer from unprivileged users.
  
  It seems we have overlooked /var/log/dmesg, as it is still mode 0644,
  while /var/log/kern.log, /var/log/syslog are all 0640:
  
  $ ll /var/log
  -rw-r--r--   1 root   adm   81768 Jan 18 09:09 dmesg
  -rw-r-----   1 syslog adm   24538 Jan 18 13:05 kern.log
  -rw-r-----   1 syslog adm  213911 Jan 18 13:22 syslog
  
  Change /var/log/dmesg to 0640 to close the information leak.
  
  [Testcase]
  
  $ sudo adduser dave
  $ su dave
  $ groups
  dave
  $ cat /var/log/kern.log
  cat: /var/log/kern.log: Permission denied
  $ cat /var/log/syslog
  cat: /var/log/syslog: Permission denied
  $ cat /var/log/dmesg
  [    0.000000] kernel: Linux version 5.8.0-36-generic (buildd@lgw01-amd64-011) (gcc (Ubuntu 10.2.1-2ubuntu3) 10.2.1 20201221, GNU ld (GNU Binutils for Ubuntu) 2.35.50.20210106) #40+21.04.1-Ubuntu SMP Thu Jan 7 11:35:09 UTC 2021 (Ubuntu 5.8.0-36.40+21.04.1-generic 5.8.18)
  [    0.000000] kernel: Command line: BOOT_IMAGE=/casper/vmlinuz file=/cdrom/preseed/ubuntu.seed maybe-ubiquity quiet splash ---
  
  If you install the package in the following ppa:
  
+ https://launchpad.net/~mruffell/+archive/ubuntu/lp1912122-test
+ 
+ $ sudo systemctl daemon-reload
+ $ sudo systemctl start dmesg.service
+ 
  $ sudo adduser dave
  $ su dave
  $ groups
  dave
  $ cat /var/log/kern.log
  cat: /var/log/kern.log: Permission denied
  $ cat /var/log/syslog
  cat: /var/log/syslog: Permission denied
  $ cat /var/log/dmesg
  cat: /var/log/dmesg: Permission denied
  
  [Where problems could occur]
  
  Some users or log scraper programs might need to view the kernel log
  buffers, and in this case, their underlying service accounts should be
  added to the 'adm' group.

** Changed in: rsyslog (Ubuntu Groovy)
   Status: New => In Progress

** Changed in: rsyslog (Ubuntu Groovy)
   Importance: Undecided => Medium

** Changed in: rsyslog (Ubuntu Groovy)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to rsyslog in Ubuntu.
https://bugs.launchpad.net/bugs/1912122

Title:
  /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT
  restrictions

Status in rsyslog package in Ubuntu:
  In Progress
Status in rsyslog source package in Groovy:
  In Progress
Status in rsyslog source package in Hirsute:
  In Progress

Bug description:
  [Impact]

  In bug 1886112, CONFIG_SECURITY_DMESG_RESTRICT was enabled on the
  Ubuntu kernel starting with Groovy and onward, in an effort to
  restrict access to the kernel log buffer from unprivileged users.

  It seems we have overlooked /var/log/dmesg, as it is still mode 0644,
  while /var/log/kern.log, /var/log/syslog are all 0640:

  $ ll /var/log
  -rw-r--r--   1 root   adm   81768 Jan 18 09:09 dmesg
  -rw-r-----   1 syslog adm   24538 Jan 18 13:05 kern.log
  -rw-r-----   1 syslog adm  213911 Jan 18 13:22 syslog

  Change /var/log/dmesg to 0640 to close the information leak.

  [Testcase]

  $ sudo adduser dave
  $ su dave
  $ groups
  dave
  $ cat /var/log/kern.log
  cat: /var/log/kern.log: Permission denied
  $ cat /var/log/syslog
  cat: /var/log/syslog: Permission denied
  $ cat /var/log/dmesg
  [    0.000000] kernel: Linux version 5.8.0-36-generic (buildd@lgw01-amd64-011) (gcc (Ubuntu 10.2.1-2ubuntu3) 10.2.1 20201221, GNU ld (GNU Binutils for Ubuntu) 2.35.50.20210106) #40+21.04.1-Ubuntu SMP Thu Jan 7 11:35:09 UTC 2021 (Ubuntu 5.8.0-36.40+21.04.1-generic 5.8.18)
  [    0.000000] kernel: Command line: BOOT_IMAGE=/casper/vmlinuz file=/cdrom/preseed/ubuntu.seed maybe-ubiquity quiet splash ---

  If you install the package in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/lp1912122-test

  $ sudo systemctl daemon-reload
  $ sudo systemctl start dmesg.service

  $ sudo adduser dave
  $ su dave
  $ groups
  dave
  $ cat /var/log/kern.log
  cat: /var/log/kern.log: Permission denied
  $ cat /var/log/syslog
  cat: /var/log/syslog: Permission denied
  $ cat /var/log/dmesg
  cat: /var/log/dmesg: Permission denied

  [Where problems could occur]

  Some users or log scraper programs might need to view the kernel log
  buffers, and in this case, their underlying service accounts should be
  added to the 'adm' group.

To manage notifications about this bug go to:
https://bugs.launchpad.

[Touch-packages] [Bug 1912122] [NEW] /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT restrictions

2021-01-17 Thread Matthew Ruffell
Public bug reported:

[Impact]

In bug 1886112, CONFIG_SECURITY_DMESG_RESTRICT was enabled on the Ubuntu
kernel starting with Groovy and onward, in an effort to restrict access
to the kernel log buffer from unprivileged users.

It seems we have overlooked /var/log/dmesg, as it is still mode 0644,
while /var/log/kern.log, /var/log/syslog are all 0640:

$ ll /var/log
-rw-r--r--   1 root   adm   81768 Jan 18 09:09 dmesg
-rw-r-----   1 syslog adm   24538 Jan 18 13:05 kern.log
-rw-r-----   1 syslog adm  213911 Jan 18 13:22 syslog

Change /var/log/dmesg to 0640 to close the information leak.
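
To find any other log files with the same problem, the directory can be scanned for world-readable files. This is only a sketch: the helper name world_readable_logs is made up for illustration, and it assumes GNU find.

```shell
# world_readable_logs DIR: list regular files directly under DIR whose
# "other" read bit is set (e.g. mode 0644), which any local user can read.
# Hypothetical helper name; relies on GNU find's -maxdepth and -perm.
world_readable_logs() {
    find "$1" -maxdepth 1 -type f -perm -o=r -print
}
```

Running `world_readable_logs /var/log` on an affected system should include /var/log/dmesg in its output, although other files may legitimately be listed as well.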

[Testcase]

$ sudo adduser dave
$ su dave
$ groups
dave
$ cat /var/log/kern.log
cat: /var/log/kern.log: Permission denied
$ cat /var/log/syslog
cat: /var/log/syslog: Permission denied
$ cat /var/log/dmesg
[    0.000000] kernel: Linux version 5.8.0-36-generic (buildd@lgw01-amd64-011) (gcc (Ubuntu 10.2.1-2ubuntu3) 10.2.1 20201221, GNU ld (GNU Binutils for Ubuntu) 2.35.50.20210106) #40+21.04.1-Ubuntu SMP Thu Jan 7 11:35:09 UTC 2021 (Ubuntu 5.8.0-36.40+21.04.1-generic 5.8.18)
[    0.000000] kernel: Command line: BOOT_IMAGE=/casper/vmlinuz file=/cdrom/preseed/ubuntu.seed maybe-ubiquity quiet splash ---

If you install the package in the following ppa:

$ sudo adduser dave
$ su dave
$ groups
dave
$ cat /var/log/kern.log
cat: /var/log/kern.log: Permission denied
$ cat /var/log/syslog
cat: /var/log/syslog: Permission denied
$ cat /var/log/dmesg
cat: /var/log/dmesg: Permission denied
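
The same check can be done without creating a test user by looking at the mode bits directly. A minimal sketch, assuming GNU stat; the function names file_mode and has_mode are hypothetical:

```shell
# file_mode PATH: print the octal permission bits of PATH (GNU stat).
file_mode() {
    stat -c '%a' -- "$1"
}

# has_mode PATH MODE: succeed only if PATH has exactly MODE, e.g. 640.
has_mode() {
    [ "$(file_mode "$1")" = "$2" ]
}
```

With the fixed package installed, `has_mode /var/log/dmesg 640 && echo ok` would be expected to print "ok".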

[Where problems could occur]

Some users or log scraper programs might need to view the kernel log
buffers, and in this case, their underlying service accounts should be
added to the 'adm' group.
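
To see whether a given service account is already covered, its group membership can be checked first. This is a sketch only; the helper name in_group and the account name used below are hypothetical:

```shell
# in_group USER GROUP: succeed if USER is a member of GROUP.
# Hypothetical helper; uses id -nG to list the user's group names.
in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -Fxq -- "$2"
}
```

For example, for an account called "logscraper" (a made-up name), `in_group logscraper adm || sudo usermod -aG adm logscraper` would add it to 'adm' only when needed.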

** Affects: rsyslog (Ubuntu)
 Importance: Medium
 Assignee: Matthew Ruffell (mruffell)
 Status: In Progress

** Affects: rsyslog (Ubuntu Groovy)
 Importance: Undecided
 Status: New

** Affects: rsyslog (Ubuntu Hirsute)
 Importance: Medium
 Assignee: Matthew Ruffell (mruffell)
 Status: In Progress

** Also affects: rsyslog (Ubuntu Groovy)
   Importance: Undecided
   Status: New

** Also affects: rsyslog (Ubuntu Hirsute)
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to rsyslog in Ubuntu.
https://bugs.launchpad.net/bugs/1912122

Title:
  /var/log/dmesg is 0644, should be 0640 to match new DMESG_RESTRICT
  restrictions

Status in rsyslog package in Ubuntu:
  In Progress
Status in rsyslog source package in Groovy:
  New
Status in rsyslog source package in Hirsute:
  In Progress

Bug description:
  [Impact]

  In bug 1886112, CONFIG_SECURITY_DMESG_RESTRICT was enabled on the
  Ubuntu kernel starting with Groovy and onward, in an effort to
  restrict access to the kernel log buffer from unprivileged users.

  It seems we have overlooked /var/log/dmesg, as it is still mode 0644,
  while /var/log/kern.log, /var/log/syslog are all 0640:

  $ ll /var/log
  -rw-r--r--   1 root   adm   81768 Jan 18 09:09 dmesg
  -rw-r-----   1 syslog adm   24538 Jan 18 13:05 kern.log
  -rw-r-----   1 syslog adm  213911 Jan 18 13:22 syslog

  Change /var/log/dmesg to 0640 to close the information leak.

  [Testcase]

  $ sudo adduser dave
  $ su dave
  $ groups
  dave
  $ cat /var/log/kern.log
  cat: /var/log/kern.log: Permission denied
  $ cat /var/log/syslog
  cat: /var/log/syslog: Permission denied
  $ cat /var/log/dmesg
  [    0.000000] kernel: Linux version 5.8.0-36-generic (buildd@lgw01-amd64-011) (gcc (Ubuntu 10.2.1-2ubuntu3) 10.2.1 20201221, GNU ld (GNU Binutils for Ubuntu) 2.35.50.20210106) #40+21.04.1-Ubuntu SMP Thu Jan 7 11:35:09 UTC 2021 (Ubuntu 5.8.0-36.40+21.04.1-generic 5.8.18)
  [    0.000000] kernel: Command line: BOOT_IMAGE=/casper/vmlinuz file=/cdrom/preseed/ubuntu.seed maybe-ubiquity quiet splash ---

  If you install the package in the following ppa:

  $ sudo adduser dave
  $ su dave
  $ groups
  dave
  $ cat /var/log/kern.log
  cat: /var/log/kern.log: Permission denied
  $ cat /var/log/syslog
  cat: /var/log/syslog: Permission denied
  $ cat /var/log/dmesg
  cat: /var/log/dmesg: Permission denied

  [Where problems could occur]

  Some users or log scraper programs might need to view the kernel log
  buffers, and in this case, their underlying service accounts should be
  added to the 'adm' group.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/rsyslog/+bug/1912122/+subscriptions



[Touch-packages] [Bug 1908473] Re: rsyslog-relp: imrelp module leaves sockets in CLOSE_WAIT state which leads to file descriptor leak

2021-01-05 Thread Matthew Ruffell
** Tags added: sts-sponsor

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to rsyslog in Ubuntu.
https://bugs.launchpad.net/bugs/1908473

Title:
  rsyslog-relp: imrelp module leaves sockets in CLOSE_WAIT state which
  leads to file descriptor leak

Status in librelp package in Ubuntu:
  In Progress
Status in rsyslog package in Ubuntu:
  Fix Released
Status in librelp source package in Focal:
  In Progress
Status in rsyslog source package in Focal:
  Won't Fix
Status in librelp source package in Groovy:
  In Progress
Status in rsyslog source package in Groovy:
  Fix Released
Status in librelp source package in Hirsute:
  In Progress
Status in rsyslog source package in Hirsute:
  Fix Released

Bug description:
  [Impact]

  In recent versions of rsyslog and librelp, the imrelp module leaks
  file descriptors due to a bug where it does not correctly close
  sockets, and instead, leaves them in the CLOSE_WAIT state.

  This causes rsyslogd on busy servers to eventually hit the limit of
  maximum open files allowed, which locks rsyslogd up until it is
  restarted.

  A workaround is to restart rsyslogd every month or so to manually
  close all of the open sockets.

  Only users of the imrelp module are affected, and not rsyslog users in
  general.
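
To judge how close a running rsyslogd is to the open files limit, the number of open descriptors can be compared with the soft limit from /proc. A rough sketch for Linux; the helper name fd_usage is hypothetical, and reading another user's process needs root:

```shell
# fd_usage PID: print "<open fds>/<soft limit>" for the given process.
# Hypothetical helper; reads /proc/PID/fd and /proc/PID/limits.
fd_usage() {
    pid=$1
    used=$(ls -1 "/proc/$pid/fd" 2>/dev/null | wc -l)
    # Fields on this line: "Max open files <soft> <hard> files"
    limit=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    echo "$used/$limit"
}
```

A first number that grows steadily toward the second while traffic stays constant would be consistent with the leak described above.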

  [Testcase]

  Install the rsyslog-relp module like so:

  $ sudo apt install rsyslog rsyslog-relp

  Next, generate a working directory, and make a config file that loads
  the relp module.

  $ sudo mkdir /workdir
  $ cat << EOF >> ./spool.conf
  \$LocalHostName spool
  \$AbortOnUncleanConfig on
  \$PreserveFQDN on

  global(
      workDirectory="/workdir"
      maxMessageSize="256k"
  )

  main_queue(queue.type="Direct")
  module(load="imrelp")
  input(
      type="imrelp"
      name="imrelp"
      port="601"
      ruleset="spool"
      MaxDataSize="256k"
  )

  ruleset(name="spool" queue.type="direct") {
  }

  # Just so rsyslog doesn't whine that we do not have outputs
  ruleset(name="noop" queue.type="direct") {
      action(
          type="omfile"
          name="omfile"
          file="/workdir/spool.log"
      )
  }
  EOF

  Verify that the config is valid, then start a rsyslog server.

  $ sudo rsyslogd -f ./spool.conf -N9
  $ sudo rsyslogd -f ./spool.conf -i /workdir/rsyslogd.pid

  Fetch the rsyslogd PID and check for open files.

  $ RLOGPID=$(cat /workdir/rsyslogd.pid)
  $ sudo ls -l /proc/$RLOGPID/fd
  total 0
  lr-x------ 1 root root 64 Dec 17 01:22 0 -> /dev/urandom
  lrwx------ 1 root root 64 Dec 17 01:22 1 -> 'socket:[41228]'
  lrwx------ 1 root root 64 Dec 17 01:22 3 -> 'socket:[41222]'
  lrwx------ 1 root root 64 Dec 17 01:22 4 -> 'socket:[41223]'
  lrwx------ 1 root root 64 Dec 17 01:22 7 -> 'anon_inode:[eventpoll]'

  We have 3 sockets open by default. Next, use netcat to open 100
  connections:

  $ for i in {1..100} ; do nc -z 127.0.0.1 601 ; done

  Now check for open file descriptors, and there will be an extra 100 sockets
  in the list:

  $ sudo ls -l /proc/$RLOGPID/fd

  https://paste.ubuntu.com/p/f6NQVNbZcR/
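
Rather than counting the entries by eye, the socket descriptors can be counted with a small helper. This is a sketch for Linux only; the name count_socket_fds is made up, and it needs root to inspect rsyslogd:

```shell
# count_socket_fds PID: print how many of the process's open file
# descriptors are sockets, by resolving the symlinks in /proc/PID/fd.
# Hypothetical helper; prints 0 if the process cannot be inspected.
count_socket_fds() {
    pid=$1
    n=0
    for fd in /proc/"$pid"/fd/*; do
        case $(readlink "$fd" 2>/dev/null) in
            socket:*) n=$((n + 1)) ;;
        esac
    done
    echo "$n"
}
```

Calling it with $RLOGPID before and after the netcat loop should show the count rising from 3 to about 103 on an affected system.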

  We can check the state of these sockets with:

  $ ss -t

  https://paste.ubuntu.com/p/7Ts2FbxJrg/

  The listening sockets will be in CLOSE-WAIT, and the netcat sockets
  will be in FIN-WAIT-2.

  $ ss -t | grep CLOSE-WAIT | wc -l
  100
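
To see both sides of the connections at once, the ss output can be summarised per TCP state. A minimal sketch, assuming the state is the first column of `ss -tn` output; the helper name tcp_state_counts is hypothetical:

```shell
# tcp_state_counts: read `ss -tn` style output on stdin, skip the header
# line, and print one "<state> <count>" line per TCP state, sorted.
tcp_state_counts() {
    awk 'NR > 1 { count[$1]++ } END { for (s in count) print s, count[s] }' | sort
}
```

Used as `ss -tn | tcp_state_counts`, this would be expected to report roughly equal numbers of CLOSE-WAIT and FIN-WAIT-2 sockets after the netcat loop on an affected system.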

  If you install the test package available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf299578-test

  When you open connections with netcat, these will be closed properly,
  and the file descriptor leak will be fixed.

  [Where problems could occur]

  If a regression were to occur, it would be limited to users of the
  imrelp module, which is part of the rsyslog-relp package and
  depends on librelp.

  rsyslog-relp is not part of a default installation of rsyslog, and
  imrelp is opt-in: it must be enabled by changing a configuration file.

  The changes to rsyslog implement a testcase which exercises the
  problematic code to ensure things are working as expected, and should
  run during autopkgtest time.

  [Other]

  Upstream bug list:

  https://github.com/rsyslog/rsyslog/issues/4350
  https://github.com/rsyslog/rsyslog/issues/4005
  https://github.com/rsyslog/librelp/issues/188
  https://github.com/rsyslog/librelp/pull/193

  The following commits fix the problem:

  rsyslogd
  ========

  commit baee0bd5420649329793746f0daf87c4f59fe6a6
  Author: Andre lorbach 
  Date:   Thu Apr 9 13:00:35 2020 +0200
  Subject: testbench: Add test for imrelp to check broken session handling.
  Link: https://github.com/rsyslog/rsyslog/commit/baee0bd5420649329793746f0daf87c4f59fe6a6

  librelp
  =======

  commit 7907c9c57f6ed94c8ce5a4e63c3c4e019f71cff0
  Author: Andre lorbach 
  Date:   Mon May 11 14:59:55 2020 +0200
  Subject: fix memory leak on session break.
  Link: https://github.com/rsyslog/librelp/commit/7907c9c57f6ed94c8ce5a4e63c3c4e019f71cff0

  commit 4a6ad8637c244fd3a1caeb9a93950826f58e956a
  
