[Bug 2081129] Re: libpam-sss: require_cert_auth is not absolute, will fall back to password auth on smartcard removal

2024-09-18 Thread Matthew Ruffell
Attached is a debdiff that fixes this issue on jammy.

** Patch added: "Debdiff for sssd on jammy"
   
https://bugs.launchpad.net/ubuntu/+source/sssd/+bug/2081129/+attachment/5819098/+files/lp2081129_jammy.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2081129

Title:
  libpam-sss: require_cert_auth is not absolute, will fall back to
  password auth on smartcard removal

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/sssd/+bug/2081129/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 2081129] [NEW] libpam-sss: require_cert_auth is not absolute, will fall back to password auth on smartcard removal

2024-09-18 Thread Matthew Ruffell
ther places, like sudo.

The changes to require_cert_auth could potentially lock users out of
their systems, if they were relying on the broken behaviour of password
fallbacks. Users of these systems should hopefully have their smartcard
present and working in order to log in. If they don't, they will be
locked out of their systems.

It is very difficult to estimate if there is anyone relying on faulty
behaviour.

Since this changes how smartcards authenticate, if a regression were to
occur, it could affect anyone that uses smartcards with sssd / libpam-
sss.

[Other info]

Upstream bug:
https://github.com/SSSD/sssd/issues/6022
https://github.com/SSSD/sssd/issues/6023

The commits that fix the issue landed in 2.7.0, which is present in Kinetic and later.

commit 731b3e668c6a659922466aee7fa8093412707325
Author: Sumit Bose 
Date:  Tue Apr 13 17:12:24 2021 +0200
Subject: pam: add more checks for require_cert_auth
Link: https://github.com/SSSD/sssd/commit/731b3e668c6a659922466aee7fa8093412707325

commit 4d2277f8c3065771a8c3bbc7938309a4905640f0
Author: Sumit Bose 
Date:  Mon Feb 21 18:02:47 2022 +0100
Subject: pam: better SC fallback message
Link: https://github.com/SSSD/sssd/commit/4d2277f8c3065771a8c3bbc7938309a4905640f0

** Affects: sssd (Ubuntu)
   Importance: Undecided
   Status: Fix Released

** Affects: sssd (Ubuntu Jammy)
   Importance: Medium
   Assignee: Matthew Ruffell (mruffell)
   Status: In Progress


** Tags: jammy sts

** Tags added: jammy sts

** Also affects: sssd (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Changed in: sssd (Ubuntu)
   Status: New => Fix Released

** Changed in: sssd (Ubuntu Jammy)
   Status: New => In Progress

** Changed in: sssd (Ubuntu Jammy)
   Importance: Undecided => Medium

** Changed in: sssd (Ubuntu Jammy)
   Assignee: (unassigned) => Matthew Ruffell (mruffell)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2081129

Title:
  libpam-sss: require_cert_auth is not absolute, will fall back to
  password auth on smartcard removal

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/sssd/+bug/2081129/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Kernel-packages] [Bug 2077657] Re: Kernel Oops - BUG: kernel NULL pointer dereference, RIP: 0010:tcp_rearm_rto+0xe4/0x160

2024-09-17 Thread Matthew Ruffell
Upstream threads:

V1:
https://lore.kernel.org/netdev/a76ac35a-9be2-4849-985c-2f3b2a922...@akamai.com/T/

V2:
https://www.spinics.net/lists/netdev/msg1027412.html

V3:
https://lore.kernel.org/netdev/CADVnQy=xv_qy77nzk2wvjxdkjsiba+k5b4lhgf4msr-v1r2...@mail.gmail.com/T/

Josh, if you need any help building test Ubuntu kernels, let us know.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2077657

Title:
  Kernel Oops - BUG: kernel NULL pointer dereference, RIP:
  0010:tcp_rearm_rto+0xe4/0x160

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  We've recently started seeing the following crash on a number of
  machines we have running a ceph cluster. They are all running Ubuntu
  20.04.6 LTS.

  Jul 26 15:05:02 rx [11061395.780353] BUG: kernel NULL pointer dereference, address: 0020
  Jul 26 15:05:02 rx [11061395.787572] #PF: supervisor read access in kernel mode
  Jul 26 15:05:02 rx [11061395.792971] #PF: error_code(0x) - not-present page
  Jul 26 15:05:02 rx [11061395.798362] PGD 0 P4D 0
  Jul 26 15:05:02 rx [11061395.801164] Oops:  [#1] SMP NOPTI
  Jul 26 15:05:02 rx [11061395.805091] CPU: 0 PID: 9180 Comm: msgr-worker-1 Tainted: GW 5.4.0-174-generic #193-Ubuntu
  Jul 26 15:05:02 rx [11061395.814996] Hardware name: Supermicro SMC 2x26 os-gen8 64C NVME-Y 256G/H12SSW-NTR, BIOS 2.5.V1.2U.NVMe.UEFI 05/09/2023
  Jul 26 15:05:02 rx [11061395.825952] RIP: 0010:tcp_rearm_rto+0xe4/0x160
  Jul 26 15:05:02 rx [11061395.830656] Code: 87 ca 04 00 00 00 5b 41 5c 41 5d 5d c3 c3 49 8b bc 24 40 06 00 00 eb 8d 48 bb cf f7 53 e3 a5 9b c4 20 4c 89 ef e8 0c fe 0e 00 <48> 8b 78 20 48 c1 ef 03 48 89 f8 41 8b bc 24 80 04 00 00 48 f7 e3
  Jul 26 15:05:02 rx [11061395.849665] RSP: 0018:b75d40003e08 EFLAGS: 00010246
  Jul 26 15:05:02 rx [11061395.855149] RAX:  RBX: 20c49ba5e353f7cf RCX:
  Jul 26 15:05:02 rx [11061395.862542] RDX: 62177c30 RSI: 231c RDI: 9874ad283a60
  Jul 26 15:05:02 rx [11061395.869933] RBP: b75d40003e20 R08:  R09: 987605e20aa8
  Jul 26 15:05:02 rx [11061395.877318] R10: b75d40003f00 R11: b75d4460f740 R12: 9874ad283900
  Jul 26 15:05:02 rx [11061395.884710] R13: 9874ad283a60 R14: 9874ad283980 R15: 9874ad283d30
  Jul 26 15:05:02 rx [11061395.892095] FS:  7f1ef4a2e700() GS:987605e0() knlGS:
  Jul 26 15:05:02 rx [11061395.900438] CS:  0010 DS:  ES:  CR0: 80050033
  Jul 26 15:05:02 rx [11061395.906435] CR2: 0020 CR3: 003e450ba003 CR4: 00760ef0
  Jul 26 15:05:02 rx [11061395.913822] PKRU: 5554
  Jul 26 15:05:02 rx [11061395.916786] Call Trace:
  Jul 26 15:05:02 rx [11061395.919488]  
  Jul 26 15:05:02 rx [11061395.921765]  ? show_regs.cold+0x1a/0x1f
  Jul 26 15:05:02 rx [11061395.925859]  ? __die+0x90/0xd9
  Jul 26 15:05:02 rx [11061395.929169]  ? no_context+0x196/0x380
  Jul 26 15:05:02 rx [11061395.933088]  ? ip6_protocol_deliver_rcu+0x4e0/0x4e0
  Jul 26 15:05:02 rx [11061395.938216]  ? ip6_sublist_rcv_finish+0x3d/0x50
  Jul 26 15:05:02 rx [11061395.943000]  ? __bad_area_nosemaphore+0x50/0x1a0
  Jul 26 15:05:02 rx [11061395.947873]  ? bad_area_nosemaphore+0x16/0x20
  Jul 26 15:05:02 rx [11061395.952486]  ? do_user_addr_fault+0x267/0x450
  Jul 26 15:05:02 rx [11061395.957104]  ? ipv6_list_rcv+0x112/0x140
  Jul 26 15:05:02 rx [11061395.961279]  ? __do_page_fault+0x58/0x90
  Jul 26 15:05:02 rx [11061395.965458]  ? do_page_fault+0x2c/0xe0
  Jul 26 15:05:02 rx [11061395.969465]  ? page_fault+0x34/0x40
  Jul 26 15:05:02 rx [11061395.973217]  ? tcp_rearm_rto+0xe4/0x160
  Jul 26 15:05:02 rx [11061395.977313]  ? tcp_rearm_rto+0xe4/0x160
  Jul 26 15:05:02 rx [11061395.981408]  tcp_send_loss_probe+0x10b/0x220
  Jul 26 15:05:02 rx [11061395.985937]  tcp_write_timer_handler+0x1b4/0x240
  Jul 26 15:05:02 rx [11061395.990809]  tcp_write_timer+0x9e/0xe0
  Jul 26 15:05:02 rx [11061395.994814]  ? tcp_write_timer_handler+0x240/0x240
  Jul 26 15:05:02 rx [11061395.999866]  call_timer_fn+0x32/0x130
  Jul 26 15:05:02 rx [11061396.003782]  __run_timers.part.0+0x180/0x280
  Jul 26 15:05:02 rx [11061396.008309]  ? recalibrate_cpu_khz+0x10/0x10
  Jul 26 15:05:02 rx [11061396.012841]  ? native_x2apic_icr_write+0x30/0x30
  Jul 26 15:05:02 rx [11061396.017718]  ? lapic_next_event+0x21/0x30
  Jul 26 15:05:02 rx [11061396.021984]  ? clockevents_program_event+0x8f/0xe0
  Jul 26 15:05:02 rx [11061396.027035]  run_timer_softirq+0x2a/0x50
  Jul 26 15:05:02 rx [11061396.031212]  __do_softirq+0xd1/0x2c1
  Jul 26 15:05:02 rx [11061396.035044]  do_softirq_own_stack+0x2a/0x40
  Jul 26 15:05:02 rx [11061396.039480]  
  Jul 26 15:05:02 rx [11061396.041840]  do_softirq.part.0+0x46/0x50
  Jul 26 15:05:02 rx [11061396.046022]  __local_bh_enable_ip+0x50/0x60
  Jul 26 15:05:02 rx [11061396.050460]  _raw_spin_

[Kernel-packages] [Bug 2080866] Re: lockd: refusing to freeze on S3 suspend; prevents suspend

2024-09-16 Thread Matthew Ruffell
*** This bug is a duplicate of bug 2078704 ***
https://bugs.launchpad.net/bugs/2078704

Hi Andrew,

I believe this is the same as bug 2078704. We narrowed the fix down to
the commit "sunrpc: exclude from freezer when waiting for requests:",
which should be present in 5.15.0-121-generic or later.
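
To check a machine against that version, here is a minimal Python sketch. The helper names abi_tuple and has_fix are made up for illustration, and it assumes a release string shaped like the output of `uname -r` on a stock Ubuntu kernel. It only compares ABI numbers within the same kernel series, since a comparison across series would not say anything about this backport.

```python
import re


def abi_tuple(release):
    """Turn a release string such as '5.15.0-121-generic' into (5, 15, 0, 121)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(g) for g in m.groups())


def has_fix(release, first_fixed="5.15.0-121-generic"):
    """Hypothetical helper: True if `release` is at or past `first_fixed`."""
    cur, fixed = abi_tuple(release), abi_tuple(first_fixed)
    if cur[:3] != fixed[:3]:
        # Different upstream series: the ABI number alone is not comparable.
        raise ValueError("different kernel series; comparison not meaningful")
    return cur[3] >= fixed[3]


print(has_fix("5.15.0-119-generic"))  # False
print(has_fix("5.15.0-122-generic"))  # True
```

On a live system the string would come from `platform.release()` or `uname -r`.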

Thanks,
Matthew

** This bug has been marked a duplicate of bug 2078704
   [REGRESSION] Unable to suspend-to-ram with NFS mounted on 5.15.0-119-generic

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2080866

Title:
  lockd: refusing to freeze on S3 suspend; prevents suspend

Status in linux package in Ubuntu:
  New

Bug description:
  When I have one of my NFSv3 mounts active, the system will fail to go
  to sleep. Kernel trace:

  kernel: Freezing user space processes ... (elapsed 0.003 seconds) done.
  kernel: OOM killer disabled.
  kernel: Freezing remaining freezable tasks ...
  kernel: Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
  kernel: task:lockd   state:I stack:0 pid:174656 ppid: 2 flags:0x4000
  kernel: Call Trace:
  kernel:  
  kernel:  __schedule+0x24e/0x590
  kernel:  schedule+0x69/0x110
  kernel:  schedule_timeout+0x105/0x140
  kernel:  svc_get_next_xprt+0xf1/0x190 [sunrpc]
  kernel:  svc_recv+0x1a9/0x330 [sunrpc]
  kernel:  lockd+0xa9/0x1c0 [lockd]
  kernel:  ? set_grace_period+0xa0/0xa0 [lockd]
  kernel:  kthread+0x127/0x150
  kernel:  ? set_kthread_struct+0x50/0x50
  kernel:  ret_from_fork+0x1f/0x30
  kernel:  

  First appeared in linux-image-5.15.0-118-generic, present in linux-
  image-5.15.0-119-generic, not present in linux-
  image-5.15.0-117-generic.

  Steps to reproduce:
  * Boot linux-image-5.15.0-118-generic or -119
  * Mount an nfs share.  Ensure that lockd kernel module is loaded
  * Put computer into suspend; it will fail to enter suspend as the lockd task
    will fail to freeze.

  When lockd module is not loaded, system is able to go into S3 without
  issue.

  ProblemType: Bug
  DistroRelease: Ubuntu 22.04
  Package: linux-image-5.15.0-118-generic (not installed)
  ProcVersionSignature: Ubuntu 5.15.0-117.127-generic 5.15.158
  Uname: Linux 5.15.0-117-generic x86_64
  NonfreeKernelModules: nvidia_modeset nvidia
  ApportVersion: 2.20.11-0ubuntu82.6
  Architecture: amd64
  AudioDevicesInUse:
                        USER   PID  ACCESS COMMAND
   /dev/snd/controlC1:  stieg  2541 F      pipewire
                        stieg  2542 F      wireplumber
   /dev/snd/controlC0:  stieg  2542 F      wireplumber
   /dev/snd/seq:        stieg  2541 F      pipewire
  CasperMD5CheckResult: unknown
  CurrentDesktop: ubuntu:GNOME
  Date: Mon Sep 16 09:01:52 2024
  InstallationDate: Installed on 2021-07-13 (1160 days ago)
  InstallationMedia: Ubuntu 20.04.2.0 LTS "Focal Fossa" - Release amd64 (20210209.1)
  MachineType: LENOVO 20Y4S0CQ00
  ProcFB: 0 i915drmfb
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.15.0-117-generic root=/dev/mapper/vgubuntu-root ro nvidia_drm.modeset=1 quiet splash vt.handoff=7
  PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
  RelatedPackageVersions:
   linux-restricted-modules-5.15.0-117-generic N/A
   linux-backports-modules-5.15.0-117-generic  N/A
   linux-firmware  20220329.git681281e4-0ubuntu3.31
  SourcePackage: linux
  UpgradeStatus: Upgraded to jammy on 2022-06-17 (821 days ago)
  dmi.bios.date: 07/31/2024
  dmi.bios.release: 1.29
  dmi.bios.vendor: LENOVO
  dmi.bios.version: N40ET47W (1.29 )
  dmi.board.asset.tag: Not Available
  dmi.board.name: 20Y4S0CQ00
  dmi.board.vendor: LENOVO
  dmi.board.version: SDK0J40697 WIN
  dmi.chassis.asset.tag: No Asset Information
  dmi.chassis.type: 10
  dmi.chassis.vendor: LENOVO
  dmi.chassis.version: None
  dmi.ec.firmware.release: 1.18
  dmi.modalias: dmi:bvnLENOVO:bvrN40ET47W(1.29):bd07/31/2024:br1.29:efr1.18:svnLENOVO:pn20Y4S0CQ00:pvrThinkPadP1Gen4i:rvnLENOVO:rn20Y4S0CQ00:rvrSDK0J40697WIN:cvnLENOVO:ct10:cvrNone:skuLENOVO_MT_20Y4_BU_Think_FM_ThinkPadP1Gen4i:
  dmi.product.family: ThinkPad P1 Gen 4i
  dmi.product.name: 20Y4S0CQ00
  dmi.product.sku: LENOVO_MT_20Y4_BU_Think_FM_ThinkPad P1 Gen 4i
  dmi.product.version: ThinkPad P1 Gen 4i
  dmi.sys.vendor: LENOVO

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2080866/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2078704] Re: [REGRESSION] Unable to suspend-to-ram with NFS mounted on 5.15.0-119-generic

2024-09-16 Thread Matthew Ruffell
** Changed in: linux (Ubuntu Jammy)
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2078704

Title:
  [REGRESSION] Unable to suspend-to-ram with NFS mounted on
  5.15.0-119-generic

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Jammy:
  Fix Released

Bug description:
  Hello,

  Since Linux 5.15.0-119-generic, I am unable to suspend-to-ram my
  laptop when the NFS share from our OpenWRT router is mounted. I get
  the following errors in dmesg:

  [ 5205.898693] PM: suspend entry (deep)
  [ 5205.918450] Filesystems sync: 0.019 seconds
  [ 5205.918680] Freezing user space processes ... (elapsed 0.002 seconds) done.
  [ 5205.921666] OOM killer disabled.
  [ 5205.921668] Freezing remaining freezable tasks ... 
  [ 5225.933279] Freezing of tasks failed after 20.012 seconds (1 tasks refusing to freeze, wq_busy=0):
  [ 5225.933839] task:NFSv4 callback  state:I stack:0 pid:222125 ppid: 2 flags:0x4000
  [ 5225.933867] Call Trace:
  [ 5225.933876]  
  [ 5225.933891]  __schedule+0x2cd/0x890
  [ 5225.933930]  schedule+0x69/0x110
  [ 5225.933961]  nfs41_callback_svc+0x179/0x180 [nfsv4]
  [ 5225.934140]  ? wait_woken+0x60/0x60
  [ 5225.934160]  ? nfs_map_gid_to_group+0x120/0x120 [nfsv4]
  [ 5225.934318]  kthread+0x127/0x150
  [ 5225.934338]  ? set_kthread_struct+0x50/0x50
  [ 5225.934359]  ret_from_fork+0x1f/0x30
  [ 5225.934394]  
  [ 5225.934412] Restarting kernel threads ... done.
  [ 5225.935170] OOM killer enabled.
  [ 5225.935178] Restarting tasks ... done.
  [ 5225.950919] PM: suspend exit

  After I unmount the share, the suspend works.
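
A small Python sketch for pulling the offending task out of such a log, in case it helps when triaging similar reports. The function name tasks_refusing_to_freeze and the regular expression are my own and assume the `task:... state:... pid:...` line layout shown in the dmesg output above; other kernel versions may print the line differently.

```python
import re

# Matches the per-task line the kernel prints after a failed freeze.
# \bpid: avoids matching the "ppid:" field by mistake.
TASK_RE = re.compile(
    r"task:(?P<name>.+?)\s+state:(?P<state>\S+).*?\bpid:\s*(?P<pid>\d+)"
)


def tasks_refusing_to_freeze(log_text):
    """Return (task name, pid) pairs listed after 'Freezing of tasks failed'."""
    results = []
    in_report = False
    for line in log_text.splitlines():
        if "Freezing of tasks failed" in line:
            in_report = True
        elif "Restarting kernel threads" in line:
            in_report = False
        elif in_report:
            m = TASK_RE.search(line)
            if m:
                results.append((m["name"], int(m["pid"])))
    return results


SAMPLE = """\
[ 5225.933279] Freezing of tasks failed after 20.012 seconds (1 tasks refusing to freeze, wq_busy=0):
[ 5225.933839] task:NFSv4 callback  state:I stack:0 pid:222125 ppid: 2 flags:0x4000
[ 5225.933867] Call Trace:
[ 5225.934412] Restarting kernel threads ... done.
"""

print(tasks_refusing_to_freeze(SAMPLE))  # [('NFSv4 callback', 222125)]
```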

  To me it seems that it is caused by the backport of the patch "nfsd:
  don't allow nfsd threads to be signalled." (which got into 5.15.0-118)
  and the lack of the following commit (which is present in upstream
  5.15):

  commit 3feac2b5529335dff4f91d3e97b006a7096d63ec
  Author: NeilBrown 
  Date:   Fri Jun 7 09:10:48 2024 -0400

  sunrpc: exclude from freezer when waiting for requests:

  Prior to v6.1, the freezer will only wake a kernel thread from an
  uninterruptible sleep.  Since we changed svc_get_next_xprt() to use and
  IDLE sleep the freezer cannot wake it.  We need to tell the freezer to
  ignore it instead.

  To make this work with only upstream commits, 5.15.y would need
  commit f5d39b020809 ("freezer,sched: Rewrite core freezer logic")
  which allows non-interruptible sleeps to be woken by the freezer.

  Fixes: 9b8a8e5e8129 ("nfsd: don't allow nfsd threads to be signalled.")
  Tested-by: Jon Hunter 
  Signed-off-by: NeilBrown 
  Signed-off-by: Greg Kroah-Hartman 

  This was discussed earlier on LKML in this thread:

  https://lore.kernel.org/all/171693973585.27191.10038342787850677...@noble.neil.brown.name/

  
  My hardware: HP 17-by0001nw laptop

  The NFS share is mounted over WiFi.

  /etc/exports on the router:

  /mnt/pendrive/nfs -fsid=0,rw,sync,no_subtree_check,all_squash,anonuid=1000,anongid=1000 192.168.1.20 192.168.1.22 192.168.1.24

  mount parameters on my laptop (from /etc/fstab):

  192.168.1.3:/ /media/netdrive nfs users,noauto,exec,sync,noac,lookupcache=none,nfsvers=4.2,soft,timeo=10,retrans=2 0 0

  ProblemType: Bug
  DistroRelease: Ubuntu 20.04
  Package: linux-image-5.15.0-119-generic 5.15.0-119.129~20.04.1
  ProcVersionSignature: Ubuntu 5.15.0-119.129~20.04.1-generic 5.15.160
  Uname: Linux 5.15.0-119-generic x86_64
  ApportVersion: 2.20.11-0ubuntu27.27
  Architecture: amd64
  CasperMD5CheckResult: skip
  CurrentDesktop: GNOME-Classic:GNOME
  Date: Mon Sep  2 10:33:11 2024
  InstallationDate: Installed on 2020-09-12 (1450 days ago)
  InstallationMedia: Ubuntu 20.04 LTS "Focal Fossa" - Release amd64 (20200423)
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=pl_PL.UTF-8
   SHELL=/bin/bash
  SourcePackage: linux-signed-hwe-5.15
  UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2078704/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2080635] Re: Wrong Battery Reading after Wake from S3 Sleep

2024-09-14 Thread Matthew Ruffell
Hi Paul,

Let's have a look.

ubuntu-noble$ git log --grep "thermal: core: Change PM notifier priority to the minimum" origin/master-next
commit 80123fe6fcc3f72ee6fbe7bc6c64043b9e4c91e1
Author: Rafael J. Wysocki 
Date:   Fri Jun 14 17:26:00 2024 +0200

thermal: core: Change PM notifier priority to the minimum

BugLink: https://bugs.launchpad.net/bugs/2075154

commit 494c7d055081da066424706b28faa9a4c719d852 upstream.

It is reported that commit 5a5efdaffda5 ("thermal: core: Resume thermal
zones asynchronously") causes battery data in sysfs on Thinkpad P1 Gen2
to become invalid after a resume from S3 (and it is necessary to reboot
the machine to restore correct battery data).  Some investigation into
the problem indicated that it happened because, after the commit in
question, the ACPI battery PM notifier ran in parallel with
thermal_zone_device_resume() for one of the thermal zones which
apparently confused the platform firmware on the affected system.

While the exact reason for the firmware confusion remains unclear, it
is arguably not particularly relevant, and the expected behavior of the
affected system can be restored by making the thermal PM notifier run
at the lowest priority which avoids interference between work items
spawned by it and the other PM notifiers (that will run before those
work items now).

Fixes: 5a5efdaffda5 ("thermal: core: Resume thermal zones asynchronously")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218881
Reported-by: fhort...@yahoo.de
Tested-by: fhort...@yahoo.de
Cc: 6.8+  # 6.8+
Signed-off-by: Rafael J. Wysocki 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Manuel Diewald 
Signed-off-by: Stefan Bader 
Signed-off-by: Roxana Nicolescu 

matthew@ThinkPad-X1:~/Work/kernel/ubuntu-noble$ git describe --contains 80123fe6fcc3f72ee6fbe7bc6c64043b9e4c91e1
Ubuntu-6.8.0-44.44~1

You should be good with 6.8.0-44-generic which came out this week for
24.04.

Looking at https://kernel.ubuntu.com/reports/kernel-stable-board/,
6.8.0-44-generic is still in -proposed for 22.04; maybe the Kernel Team
will release it early next week.

Thanks,
Matthew

** Bug watch added: Linux Kernel Bug Tracker #218881
   https://bugzilla.kernel.org/show_bug.cgi?id=218881

** Changed in: linux-hwe-6.8 (Ubuntu)
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-hwe-6.8 in Ubuntu.
https://bugs.launchpad.net/bugs/2080635

Title:
  Wrong Battery Reading after Wake from S3 Sleep

Status in linux-hwe-6.8 package in Ubuntu:
  Fix Released

Bug description:
  The bug was reported in
  https://bugzilla.kernel.org/show_bug.cgi?id=218881 and fixed from
  6.9.7 stable

  Ubuntu 22.04.4 LTS is on kernel 6.8.0

  Can the fix be ported to 6.8.x ?

  My system:
  Ubuntu 6.8.0-40.40~22.04.3-generic 6.8.12

  Thanks in advance!

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-hwe-6.8/+bug/2080635/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2076957] Re: isolcpus are ignored when using cgroups V2, causing processes to have wrong affinity

2024-09-10 Thread Matthew Ruffell
Performing verification for Jammy.

I started an n2-highcpu-32 instance on GCP, because bare metal systems
are unavailable during the certification lab move.

I edited /etc/default/grub.d/50-cloudimg-settings.cfg and set:

GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0,115200 isolcpus=4-7,16-20
rcu_nocb_poll rcu_nocbs=4-7,16-20 systemd.unified_cgroup_hierarchy=1"

ran sudo update-grub and rebooted.

Because 5.15.0-121-generic is still in -proposed (2024.08.05 is
releasing slightly later than expected), I enabled -proposed and
installed 5.15.0-121-generic to get a baseline.

I rebooted again.

I then set up htop, s-tui and the while loop to check for processes on
4-7,16-20.

I started s-tui, and there were processes placed on the other cores
within 3 minutes. By 10 minutes, all cores had stress running on them,
and isolation was completely ignored.

I then enabled -proposed2 and installed 5.15.0-122-generic:

$ uname -rv
5.15.0-122-generic #132-Ubuntu SMP Thu Aug 29 13:45:52 UTC 2024

I re-ran s-tui and started stress.

After 1 hour and 20 minutes, the isolated cpus were still completely
isolated, with no processes running on them. Stress was confined to the
regular cpus.

The kernel in -proposed fixes the issue. Happy to mark verified for
jammy.

** Tags removed: verification-needed-jammy-linux
** Tags added: verification-done-jammy-linux

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2076957

Title:
  isolcpus are ignored when using cgroups V2, causing processes to have
  wrong affinity

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Jammy:
  Fix Committed

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/2076957

  [Impact]

  In latency sensitive environments, it is very common to use isolcpus
  to reserve a set of cpus that no other processes are to be placed on,
  and run just dpdk in poll mode.

  There is a bug in the jammy kernel, where if cgroups V2 are enabled,
  after several minutes the kernel will place other processes onto these
  reserved isolcpus at random. This disturbs dpdk and introduces
  latency.

  The issue does not occur with cgroups V1, so a workaround is to use
  cgroups V1 instead of V2 for the moment.

  [Fix]

  A full git bisect led me to this commit, which fixes the issue. It
  landed in 6.2-rc1:

  commit 7fd4da9c1584be97ffbc40e600a19cb469fd4e78
  Author: Waiman Long 
  Date:   Sat Nov 12 17:19:39 2022 -0500
  Subject: cgroup/cpuset: Optimize cpuset_attach() on v2
  Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7fd4da9c1584be97ffbc40e600a19cb469fd4e78

  Only the 5.15 Jammy kernel needs this fix. Focal works correctly as
  is.

  The commit skips calls to cpuset_attach() if the underlying cpusets or
  memory have not changed in a cgroup, and it seems to fix the issue.
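
As a rough illustration of that idea only, here is a toy Python model, not kernel code. The Cpuset class, its field names and attach_needs_update are hypothetical, and the sketch assumes the check amounts to comparing the old and new cpusets' CPUs and memory nodes when running in v2 mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpuset:
    """Toy stand-in for a cpuset: the CPUs and memory nodes it allows."""
    effective_cpus: frozenset
    effective_mems: frozenset


def attach_needs_update(old, new, v2_mode=True):
    """Return False when the per-task update could be skipped."""
    cpus_updated = old.effective_cpus != new.effective_cpus
    mems_updated = old.effective_mems != new.effective_mems
    if v2_mode and not cpus_updated and not mems_updated:
        # Nothing changed, so task affinities are left alone.
        return False
    return True


parent = Cpuset(frozenset({0, 1, 2, 3}), frozenset({0}))
child_same = Cpuset(frozenset({0, 1, 2, 3}), frozenset({0}))
child_diff = Cpuset(frozenset({0, 1}), frozenset({0}))

print(attach_needs_update(parent, child_same))  # False
print(attach_needs_update(parent, child_diff))  # True
```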

  [Testcase]

  Deploy a bare metal server, ideally with a large number of cores; 56
  should be plenty.
  Use Jammy, with the 5.15 GA kernel.

  1) Edit /etc/default/grub and set GRUB_CMDLINE_LINUX_DEFAULT to have
  "isolcpus=4-7,32-35 rcu_nocb_poll rcu_nocbs=4-7,32-35 systemd.unified_cgroup_hierarchy=1"
  2) sudo update-grub && sudo reboot
  3) sudo cat /sys/devices/system/cpu/isolated
  4-7,32-35
  4) sudo apt install s-tui stress
  5) sudo s-tui
  6) htop
  7) $ while true; do sudo ps -eLF | head -n 1; for cpu in 4 5 6 7 32 33 34 35; do sudo ps -eLF | grep stress | awk -v a="$cpu" '$9 == a {print;}'; done; sleep 5; done

  Set up isolcpus to separate off 4-7 and 32-35, so each NUMA node has a
  set of isolated CPUs.

  s-tui is a great frontend for stress, and it starts stress processes.
  All stress processes should initially be on non-isolated CPUs. Confirm
  this with htop: 4-7 and 32-35 should be at 0% while every other cpu is
  at 100%.

  After about 3 minutes, though sometimes it takes up to 10, a stress
  process or the s-tui process will be incorrectly placed onto an
  isolated cpu, causing that cpu's usage to increase in htop. The while
  loop, which checks the PSR column of ps for each isolated cpu, will
  also likely print the incorrectly placed process.
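
The check in step 7 can also be sketched without hard-coding each cpu number. This is an illustrative rewrite, not the script used in the report: the helper names expand_cpulist and threads_on_cpus are hypothetical, seq and awk are assumed to be available, and unlike the original grep it matches the command name exactly, so it would not catch the s-tui process itself.

```shell
# Expand a kernel CPU list such as "4-7,32-35" into one CPU number per line.
expand_cpulist() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
        [ -n "$lo" ] || continue
        seq "$lo" "${hi:-$lo}"
    done
}

# Filter "PSR PID LWP COMM" records on stdin, printing those whose command
# name is exactly $1 and whose PSR (CPU last run on) is in the CPU list $2.
threads_on_cpus() {
    awk -v name="$1" -v cpus="$(expand_cpulist "$2" | tr '\n' ' ')" '
        BEGIN { n = split(cpus, a, " "); for (i = 1; i <= n; i++) want[a[i]] = 1 }
        $4 == name && ($1 in want) { print }
    '
}
```

On a test machine this could be driven from the kernel's own list of isolated cpus:

    ps -eLo psr=,pid=,lwp=,comm= | threads_on_cpus stress "$(cat /sys/devices/system/cpu/isolated)"

Any output means a stress thread last ran on an isolated cpu.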

  A test kernel is available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf391137-test

  If you install it, the processes will not be placed onto the isolated
  cpus.

  [Where problems could occur]

  The patch changes how cgroups determines when cpuset_attach() should
  be called. cpuset_attach() is currently called very frequently in the
  5.15 Jammy kernel, but mos

[Kernel-packages] [Bug 2080039] Re: Kernel BUG: Bad page state in process kswapd0

2024-09-10 Thread Matthew Ruffell
6.8.0-44-generic for noble was released to -updates this morning.

** Changed in: linux (Ubuntu Noble)
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2080039

Title:
  Kernel BUG: Bad page state in process kswapd0

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Noble:
  Fix Released

Bug description:
  Since installing 24.04 two months ago, I've experienced a few random
  full-system freezes that required a hard-reset to recover. Up until
  now, I was not able to find the cause - plugging in a monitor to the
  system would just display nothing, and the journal logs would just
  stop abruptly.

  My first instinct was bad memory, so after it happened last week I ran
  memtest for several hours, but it did not find any memory errors.

  However I now believe I have found the actual cause, because it just
  happened again and luckily this time the journal saved the start of a
  kernel BUG message:

  BUG: Bad page state in process kswapd0  pfn:3f053e
  page:0f35bcf8 refcount:0 mapcount:0 mapping:0e24c844 index:0x2bcbd pfn:0x3f053e
  aops:btree_aops [btrfs] ino:1
  flags: 0x17c008(uptodate|node=0|zone=2|lastcpupid=0x1f)
  page_type: 0x()

  After some digging, I found this kernel bug report:
  
https://lore.kernel.org/lkml/CABXGCsPktcHQOvKTbPaTwegMExije=Gpgci5NW=hqoro-s7...@mail.gmail.com/

  that appears to describe the exact same bug (I am also using btrfs as
  the root partition, and my swap file is also on that btrfs
  filesystem).

  Then I also found this kernel patch:
  
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f3a5367c679d31473d3fbb391675055b4792c309

  that appears to be a fix for the above bug.

  To try to check if this fix is present in my kernel (no idea if this
  is valid), I installed the linux-source package, extracted the archive
  in /usr/src/linux-source-6.8.0, and checked the file modified by the
  patch mentioned above - and the changes do not appear to have been
  made.

  So if the patch has not been applied, could this please be done? If it
  has actually been applied, then this is some other bug and I need to
  do more investigation...

  For the time being I have disabled swap to hopefully try and avoid the
  crash.

  # uname -a
  Linux server 6.8.0-41-generic #41-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug  2 20:41:06 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

  # cat /proc/version_signature
  Ubuntu 6.8.0-41.41-generic 6.8.12

  # lsb_release -rd
  No LSB modules are available.
  Description:    Ubuntu 24.04.1 LTS
  Release:        24.04

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2080039/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2080039] Re: Kernel BUG: Bad page state in process kswapd0

2024-09-09 Thread Matthew Ruffell
Hi Andre,

Great research! Let's have a look at that patch you found.

ubuntu-noble$ git log --grep "btrfs: protect folio::private when attaching extent buffer folios" origin/master-next
commit 78f0e5fd1fce33785a3454f5712a6f6160201bd5
Author: Qu Wenruo 
Date:   Thu Jun 6 11:01:51 2024 +0930
Subject: btrfs: protect folio::private when attaching extent buffer folios

$ git describe --contains 78f0e5fd1fce33785a3454f5712a6f6160201bd5
Ubuntu-6.8.0-44.44~547

This is a part of 6.8.0-44-generic, which is currently in -proposed. It should
be released this week as part of SRU cycle 2024.08.05.

If you need it right now, you can enable -proposed and install 6.8.0-44-generic,
but I think it should be out in a few days.
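
The check above can be scripted once the `git describe --contains` output is known. The sketch below is illustrative only: the function names are hypothetical, it assumes tags of the form Ubuntu-<version>-<abi>.<upload> as shown above, and it relies on GNU sort -V for version ordering. Comparing ABI numbers like this is only meaningful within a single kernel series.

```shell
# Reduce `git describe --contains` output (e.g. "Ubuntu-6.8.0-44.44~547")
# to the kernel ABI version it names (e.g. "6.8.0-44").
abi_from_describe() {
    printf '%s\n' "$1" | sed -E 's/[~^].*$//; s/^Ubuntu-([0-9.]+-[0-9]+)\..*$/\1/'
}

# Reduce `uname -r` output (e.g. "6.8.0-41-generic") to its ABI version.
abi_from_uname() {
    printf '%s\n' "$1" | sed -E 's/^([0-9.]+-[0-9]+).*$/\1/'
}

# Succeed if the running ABI ($1) is the same as or newer than the ABI that
# first contains the fix ($2). Relies on GNU sort's version ordering.
has_fix() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `has_fix "$(abi_from_uname "$(uname -r)")" "$(abi_from_describe 'Ubuntu-6.8.0-44.44~547')"` fails on 6.8.0-41-generic and succeeds once 6.8.0-44-generic or later is booted.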

Thanks,
Matthew

** Also affects: linux (Ubuntu)
   Importance: Undecided
   Status: New

** No longer affects: linux-meta (Ubuntu)

** Also affects: linux (Ubuntu Noble)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu)
   Status: New => Fix Released

** Changed in: linux (Ubuntu Noble)
   Status: New => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2080039

Title:
  Kernel BUG: Bad page state in process kswapd0

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Noble:
  Fix Committed

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2080039/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2078704] Re: [REGRESSION] Unable to suspend-to-ram with NFS mounted on 5.15.0-119-generic

2024-09-02 Thread Matthew Ruffell
Hi Mateusz,

ubuntu-jammy$ git log --grep "sunrpc: exclude from freezer when waiting for requests:" origin/master-next
commit 4bf93c02ba241ccde7e00572138c1175ac09b242
Author: NeilBrown 
Date:   Fri Jun 7 09:10:48 2024 -0400
Subject: sunrpc: exclude from freezer when waiting for requests:
$ git describe --contains 4bf93c02ba241ccde7e00572138c1175ac09b242
Ubuntu-5.15.0-120.130~552

This seems to be in 5.15.0-120-generic or later. Going by
https://kernel.ubuntu.com/ we seem to be in the 2024.08.05 SRU cycle, and
https://kernel.ubuntu.com/reports/kernel-stable-board/ shows that
5.15.0-121-generic is in -proposed.

Looking at 5.15.0-121-generic:

8965f08b593a (tag: Ubuntu-5.15.0-121.131, origin/master-prep) UBUNTU: Ubuntu-5.15.0-121.131
b958511b9839 UBUNTU: link-to-tracker: update tracking bug
c8faaf97ef7a UBUNTU: SAUCE: Revert "bpf: Allow reads from uninit stack"
4e3730273e61 UBUNTU: Start new release
76b2d2efec68 (tag: Ubuntu-5.15.0-120.130) UBUNTU: Ubuntu-5.15.0-120.130

5.15.0-121-generic is a respin of 5.15.0-120-generic with a single
revert.

So this will be fixed in 5.15.0-121-generic which should be released
this week.

Let me know if it isn't fixed once you have 5.15.0-121-generic installed.

Thanks,
Matthew

** Also affects: linux (Ubuntu)
   Importance: Undecided
   Status: New

** No longer affects: linux-signed-hwe-5.15 (Ubuntu)

** Also affects: linux (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu Jammy)
   Status: New => Fix Committed

** Changed in: linux (Ubuntu)
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2078704

Title:
  [REGRESSION] Unable to suspend-to-ram with NFS mounted on
  5.15.0-119-generic

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Jammy:
  Fix Committed

Bug description:
  Hello,

  Since Linux 5.15.0-119-generic, I am unable to suspend-to-ram my
  laptop when the NFS share from our OpenWRT router is mounted. I get
  the following errors in dmesg:

  [ 5205.898693] PM: suspend entry (deep)
  [ 5205.918450] Filesystems sync: 0.019 seconds
  [ 5205.918680] Freezing user space processes ... (elapsed 0.002 seconds) done.
  [ 5205.921666] OOM killer disabled.
  [ 5205.921668] Freezing remaining freezable tasks ... 
  [ 5225.933279] Freezing of tasks failed after 20.012 seconds (1 tasks refusing to freeze, wq_busy=0):
  [ 5225.933839] task:NFSv4 callback  state:I stack:0 pid:222125 ppid: 2 flags:0x4000
  [ 5225.933867] Call Trace:
  [ 5225.933876]  
  [ 5225.933891]  __schedule+0x2cd/0x890
  [ 5225.933930]  schedule+0x69/0x110
  [ 5225.933961]  nfs41_callback_svc+0x179/0x180 [nfsv4]
  [ 5225.934140]  ? wait_woken+0x60/0x60
  [ 5225.934160]  ? nfs_map_gid_to_group+0x120/0x120 [nfsv4]
  [ 5225.934318]  kthread+0x127/0x150
  [ 5225.934338]  ? set_kthread_struct+0x50/0x50
  [ 5225.934359]  ret_from_fork+0x1f/0x30
  [ 5225.934394]  
  [ 5225.934412] Restarting kernel threads ... done.
  [ 5225.935170] OOM killer enabled.
  [ 5225.935178] Restarting tasks ... done.
  [ 5225.950919] PM: suspend exit

  After I unmount the share, the suspend works.
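
To pick the offending task out of a longer kernel log, the "task:" lines that follow a failed freeze can be extracted. This is a minimal sketch, not something from the original report: the function name is hypothetical, and it assumes the log format shown above, where each stuck task is printed as "task:<name> state:..." between the "Freezing of tasks failed" and "Restarting kernel threads" lines.

```shell
# Read kernel log text on stdin and print the name of each task listed
# after a "Freezing of tasks failed" line, one per line. Collection stops
# at the following "Restarting kernel threads" line.
tasks_refusing_to_freeze() {
    awk '
        /Freezing of tasks failed/  { armed = 1; next }
        /Restarting kernel threads/ { armed = 0 }
        armed && /task:.*state:/ {
            s = $0
            sub(/.*task:/, "", s)     # drop timestamp and "task:" prefix
            sub(/ *state:.*/, "", s)  # drop everything from "state:" on
            print s
        }
    '
}
```

Run as `sudo dmesg | tasks_refusing_to_freeze`; for the log above it would print "NFSv4 callback".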

  To me it seems that it is caused by the backport of the patch "nfsd:
  don't allow nfsd threads to be signalled." (which got into 5.15.0-118)
  and the lack of the following commit (which is present in upstream
  5.15):

  commit 3feac2b5529335dff4f91d3e97b006a7096d63ec
  Author: NeilBrown 
  Date:   Fri Jun 7 09:10:48 2024 -0400

  sunrpc: exclude from freezer when waiting for requests:

  Prior to v6.1, the freezer will only wake a kernel thread from an
  uninterruptible sleep.  Since we changed svc_get_next_xprt() to use and
  IDLE sleep the freezer cannot wake it.  We need to tell the freezer to
  ignore it instead.

  To make this work with only upstream commits, 5.15.y would need
  commit f5d39b020809 ("freezer,sched: Rewrite core freezer logic")
  which allows non-interruptible sleeps to be woken by the freezer.

  Fixes: 9b8a8e5e8129 ("nfsd: don't allow nfsd threads to be signalled.")
  Tested-by: Jon Hunter 
  Signed-off-by: NeilBrown 
  Signed-off-by: Greg Kroah-Hartman 

  This was discussed earlier on LKML in this thread:

  
https://lore.kernel.org/all/171693973585.27191.10038342787850677...@noble.neil.brown.name/

  
  My hardware: HP 17-by0001nw laptop

  The NFS share is mounted over WiFi.

  /etc/exports on the router:
   
  /mnt/pendrive/nfs -fsid=0,rw,sync,no_subtree_check,all_squash,anonuid=1000,anongid=1000 192.168.1.20 192.168.1.22 192.168.1.24

  mount parameters on my laptop (from /etc/fstab):

  192.168.1.3:/
  /media/netdrive nfs
  
users,noauto,exec,sync,noac,lookupcache=none,nfsve

[Ubuntu-x-swat] [Bug 1861609] Re: Xorg crashed with assertion failure (usually in a VM) at [privates.h:121/122: dixGetPrivateAddr: Assertion `key->initialized' failed]

2024-09-02 Thread Matthew Ruffell
Hi Marc,

You can use the test packages I made in comment #31 as a workaround for
the time being.

We are currently waiting on upstream to review the merge request here:

https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/1608

There hasn't been a lot of movement, but we can't move forward fixing
Ubuntu until it gets merged upstream.

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Ubuntu-X,
which is subscribed to xorg-server in Ubuntu.
https://bugs.launchpad.net/bugs/1861609

Title:
  Xorg crashed with assertion failure (usually in a VM) at
  [privates.h:121/122: dixGetPrivateAddr: Assertion `key->initialized'
  failed]

To manage notifications about this bug go to:
https://bugs.launchpad.net/xorg-server/+bug/1861609/+subscriptions


___
Mailing list: https://launchpad.net/~ubuntu-x-swat
Post to : ubuntu-x-swat@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-x-swat
More help   : https://help.launchpad.net/ListHelp


[Desktop-packages] [Bug 1861609] Re: Xorg crashed with assertion failure (usually in a VM) at [privates.h:121/122: dixGetPrivateAddr: Assertion `key->initialized' failed]

2024-09-02 Thread Matthew Ruffell
Hi Marc,

You can use the test packages I made in comment #31 as a workaround for
the time being.

We are currently waiting on upstream to review the merge request here:

https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/1608

There hasn't been a lot of movement, but we can't move forward fixing
Ubuntu until it gets merged upstream.

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Desktop
Packages, which is subscribed to xorg-server in Ubuntu.
https://bugs.launchpad.net/bugs/1861609

Title:
  Xorg crashed with assertion failure (usually in a VM) at
  [privates.h:121/122: dixGetPrivateAddr: Assertion `key->initialized'
  failed]

Status in X.Org X server:
  New
Status in xorg-server package in Ubuntu:
  Confirmed
Status in xorg-server source package in Focal:
  Confirmed
Status in xorg-server source package in Jammy:
  Confirmed
Status in xorg-server source package in Noble:
  Confirmed
Status in xorg-server source package in Oracular:
  Confirmed

Bug description:
  Xorg crashed with assertion failure (usually in a VM):

  privates.h:121: dixGetPrivateAddr: Assertion `key->initialized'
  failed.

  WORKAROUND

  Select 'Ubuntu on Wayland' on the login screen.

To manage notifications about this bug go to:
https://bugs.launchpad.net/xorg-server/+bug/1861609/+subscriptions


-- 
Mailing list: https://launchpad.net/~desktop-packages
Post to : desktop-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~desktop-packages
More help   : https://help.launchpad.net/ListHelp


[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs

2024-08-29 Thread Matthew Ruffell
Hi Krister,

Yes, we are still planning to get this released for focal. To be able to
release, it needs to pass all of its autopkgtests.

If you have a look here:

https://ubuntu-archive-team.ubuntu.com/proposed-migration/focal/update_excuses.html#e2fsprogs

It's failing on a few of them for some architectures. I probably need to
retry them a few times. I don't have permissions to trigger the tests
though, so I will ask around.

I'll try to get the update released soon. I do understand this has been in
the works for a very long time, sorry about that.

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu.
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum
  mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  Fix Released
Status in e2fsprogs source package in Trusty:
  Won't Fix
Status in e2fsprogs source package in Xenial:
  Won't Fix
Status in e2fsprogs source package in Bionic:
  Won't Fix
Status in e2fsprogs source package in Focal:
  Fix Committed
Status in e2fsprogs source package in Jammy:
  Fix Released
Status in e2fsprogs source package in Lunar:
  Won't Fix
Status in e2fsprogs source package in Mantic:
  Won't Fix
Status in e2fsprogs source package in Noble:
  Fix Released
Status in e2fsprogs source package in Oracular:
  Fix Released

Bug description:
  [Impact]

  This is a long running bug plaguing cloud-images, where on a rare
  occasion resize2fs would fail and the image would not resize to fit
  the entire disk.

  Online resizes would fail due to a superblock checksum mismatch, where
  the superblock in memory differs from what is currently on disk due to
  changes made to the image.

  $ resize2fs /dev/nvme1n1p1
  resize2fs 1.47.0 (5-Feb-2023)
  resize2fs: Superblock checksum does not match superblock while trying to open /dev/nvme1n1p1
  Couldn't find valid filesystem superblock.

  Changing the read of the superblock to Direct I/O solves the issue.
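
To see the difference between a buffered and a direct read from the shell, the superblock can be read both ways with dd and compared. This is only an illustration of the idea and not how resize2fs is implemented: the function names are hypothetical, and it assumes GNU dd, od and sha256sum. It uses the standard ext2/3/4 layout, where the primary superblock is 1024 bytes long at byte offset 1024 and the 0xEF53 magic is stored little-endian at offset 0x38 within it.

```shell
# Print a SHA-256 digest of the 1024-byte primary ext2/3/4 superblock, which
# starts at byte offset 1024 of the device or image given as $1.
# If $2 is "direct", the read bypasses the page cache (O_DIRECT); a 4096-byte
# block is read so the request stays aligned for direct I/O.
superblock_digest() {
    if [ "${2:-}" = "direct" ]; then
        dd if="$1" bs=4096 count=1 iflag=direct 2>/dev/null
    else
        dd if="$1" bs=4096 count=1 2>/dev/null
    fi | tail -c +1025 | head -c 1024 | sha256sum | cut -d' ' -f1
}

# Print the two magic bytes at offset 1024 + 0x38; ext2/3/4 stores 0xEF53
# little-endian, so a valid filesystem prints "53 ef".
superblock_magic() {
    od -An -tx1 -j 1080 -N 2 "$1" | tr -s ' ' | sed 's/^ //'
}
```

On a busy mounted filesystem, comparing `superblock_digest /dev/nvme1n1p1` with `superblock_digest /dev/nvme1n1p1 direct` may occasionally show different digests, which is the kind of discrepancy between cached and on-disk data that the fix avoids.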

  [Testcase]

  Start a c5.large instance on AWS, and attach a 60GB gp3 volume for
  use as a scratch disk.

  Run the following script, courtesy of Krister Johansen and his team:

     #!/usr/bin/bash
     set -euxo pipefail

     while true
     do
         parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
         sleep .5
         mkfs.ext4 /dev/nvme1n1p1
         mount -t ext4 /dev/nvme1n1p1 /mnt
         stress-ng --temp-path /mnt -D 4 &
         STRESS_PID=$!
         sleep 1
         growpart /dev/nvme1n1 1
         resize2fs /dev/nvme1n1p1
         kill $STRESS_PID
         wait $STRESS_PID
         umount /mnt
         wipefs -a /dev/nvme1n1p1
         wipefs -a /dev/nvme1n1
     done

  Test packages are available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test

  If you install the test packages, the race no longer occurs.

  [Where problems could occur]

  We are changing how resize2fs reads the superblock from underlying
  disks.

  If a regression were to occur, resize2fs could fail to resize offline
  or online volumes. As all cloud-images are online resized during their
  initial boot, this could have a large impact to public and private
  clouds should a regression occur.

  [Other info]

  Upstream mailing list discussion:
  https://lore.kernel.org/linux-ext4/20230605225221.ga5...@templeofstupid.com/
  https://lore.kernel.org/linux-ext4/20230609042239.ga1436...@mit.edu/

  This was fixed in the below commit upstream:

  commit 43a498e938887956f393b5e45ea6ac79cc5f4b84
  Author: Theodore Ts'o 
  Date: Thu, 15 Jun 2023 00:17:01 -0400
  Subject: resize2fs: use Direct I/O when reading the superblock for
   online resizes
  Link: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84

  The commit has not been tagged to any release. All supported Ubuntu
  releases require this fix, and it needs to be published in the standard
  non-ESM archives to be picked up in cloud images.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/2036467/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2069961] Re: turbostat fails with too many open files on large systems

2024-08-26 Thread Matthew Ruffell
Patch on Kernel Team mailing list:

Cover Letter:
https://lists.ubuntu.com/archives/kernel-team/2024-August/153135.html
Patch:
https://lists.ubuntu.com/archives/kernel-team/2024-August/153136.html

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2069961

Title:
  turbostat fails with too many open files on large systems

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Jammy:
  In Progress
Status in linux source package in Mantic:
  Won't Fix
Status in linux source package in Noble:
  Fix Released

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/2069961

  [Impact]

  On large systems, e.g. with 512 cpus or more, turbostat fails to run
  due to exceeding the rlimit for number of files. 512 cpus requires
  1028 file descriptors, but the current limit is 999.

  $ lscpu
  ...
  CPU(s):  512
    On-line CPU(s) list:   0-511
  ...

  $ sudo turbostat
  ...
  turbostat: /sys/devices/system/cpu/cpu477/cpuidle/state0/usage: open failed: Too many open files

  There is no workaround, apart from maybe using powerstat instead.

  [Fix]

  The fix raises the rlimit on the number of file descriptors that
  turbostat can open to 2^15, which should be plenty for some time to
  come.

  commit 3ac1d14d0583a2de75d49a5234d767e2590384dd
  Author: Wyes Karny 
  Date:   Tue Oct 3 05:07:51 2023 +
  Subject: tools/power turbostat: Increase the limit for fd opened
  Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3ac1d14d0583a2de75d49a5234d767e2590384dd

  This landed in 6.9-rc4, and requires a backport for minor context
  adjustment in the first hunk for jammy. Noble got fixed already
  through upstream stable.
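
The arithmetic behind the failure can be checked from the shell. The sketch below is illustrative and uses only the figures quoted above (1028 descriptors needed for 512 cpus, a limit of 999 before the fix, 2^15 = 32768 after it); the function names are hypothetical, and the soft limit is parsed from text in the format of /proc/<pid>/limits.

```shell
# Read the contents of a /proc/<pid>/limits file on stdin and print the
# soft "Max open files" limit (the fourth whitespace-separated field).
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }'
}

# Succeed if a tool needing $1 descriptors fits under the soft limit $2.
fits_nofile_limit() {
    [ "$1" -le "$2" ]
}
```

With the numbers from this bug, `fits_nofile_limit 1028 999` fails while `fits_nofile_limit 1028 32768` succeeds. The limit of a running process could be read with `soft_nofile_limit < /proc/<pid>/limits`.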

  [Testcase]

  Deploy a bare metal system with 512 or more cpus.

  Install linux-tools:

  $ sudo apt install linux-tools-$(uname -r)

  Run turbostat:

  $ sudo turbostat
  ...
  turbostat: /sys/devices/system/cpu/cpu477/cpuidle/state0/usage: open failed: Too many open files

  There are test kernels available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf388491-test

  If you install them, you should be able to see normal turbostat output
  for all cpus installed in the system.

  [Where problems can occur]

  We are simply increasing the rlimit for file descriptors that
  turbostat can open. This should have no impact on any existing
  systems.

  If a regression should occur, then turbostat functionality might not
  work. Users could use powerstat instead as a workaround while things
  are fixed.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2069961/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2069961] Re: turbostat fails with too many open files on large systems

2024-08-26 Thread Matthew Ruffell
** Description changed:

  BugLink: https://bugs.launchpad.net/bugs/2069961
  
  [Impact]
  
  On large systems, e.g. with 512 cpus or more, turbostat fails to run due
  to exceeding the rlimit for number of files. 512 cpus requires 1028 file
  descriptors, but the current limit is 999.
  
  $ lscpu
  ...
  CPU(s):  512
    On-line CPU(s) list:   0-511
  ...
  
  $ sudo turbostat
  ...
  turbostat: /sys/devices/system/cpu/cpu477/cpuidle/state0/usage: open failed: Too many open files
  
  There is no workaround, apart from maybe using powerstat instead.
  
  [Fix]
  
  The fix is to increase the rlimit to increase the amount of file
  descriptors that turbostat can open to 2^15, which should be plenty for
  some time to come.
  
  commit 3ac1d14d0583a2de75d49a5234d767e2590384dd
  Author: Wyes Karny 
  Date:   Tue Oct 3 05:07:51 2023 +
  Subject: tools/power turbostat: Increase the limit for fd opened
  Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3ac1d14d0583a2de75d49a5234d767e2590384dd
  
- This landed in 6.9-rc4, and is a clean cherry pick to jammy. Noble got
- fixed already through upstream stable.
+ This landed in 6.9-rc4, and requires a backport for minor context
+ adjustment in the first hunk for jammy. Noble got fixed already through
+ upstream stable.
  
  [Testcase]
  
  Deploy a bare metal system with 512 or more cpus.
  
  Install linux-tools:
  
  $ sudo apt install linux-tools-$(uname -r)
  
  Run turbostat:
  
  $ sudo turbostat
  ...
  turbostat: /sys/devices/system/cpu/cpu477/cpuidle/state0/usage: open failed: Too many open files
  
  There are test kernels available in the following ppa:
  
  https://launchpad.net/~mruffell/+archive/ubuntu/sf388491-test
  
  If you install them, you should be able to see normal turbostat output
  for all cpus installed in the system.
  
  [Where problems can occur]
  
  We are simply increasing the rlimit for file descriptors that turbostat
  can open. This should have no impact on any existing systems.
  
  If a regression should occur, then turbostat functionality might not
  work. Users could use powerstat instead as a workaround while things are
  fixed.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2069961

Title:
  turbostat fails with too many open files on large systems

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Jammy:
  In Progress
Status in linux source package in Mantic:
  Won't Fix
Status in linux source package in Noble:
  Fix Released

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/2069961

  [Impact]

  On large systems, e.g. with 512 cpus or more, turbostat fails to run
  due to exceeding the rlimit for number of files. 512 cpus requires
  1028 file descriptors, but the current limit is 999.

  $ lscpu
  ...
  CPU(s):  512
    On-line CPU(s) list:   0-511
  ...

  $ sudo turbostat
  ...
  turbostat: /sys/devices/system/cpu/cpu477/cpuidle/state0/usage: open failed: Too many open files

  There is no workaround, apart from maybe using powerstat instead.

  [Fix]

  The fix is to increase the rlimit to increase the amount of file
  descriptors that turbostat can open to 2^15, which should be plenty
  for some time to come.
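
As a rough illustration of the kind of change described above, the following Python sketch raises a process's soft RLIMIT_NOFILE towards 2^15. It is not the turbostat patch itself (turbostat is written in C). The helper name `raise_fd_limit` and the choice to clamp at the hard limit are assumptions made for this example.

```python
import resource


def raise_fd_limit(target: int = 2 ** 15) -> int:
    """Raise the soft RLIMIT_NOFILE towards `target`; return the resulting soft limit.

    Hypothetical helper for illustration only, not code from turbostat.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        # Already unlimited, nothing to raise
        return soft
    # An unprivileged process may raise its soft limit only up to the hard limit
    wanted = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if wanted > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
        except (ValueError, OSError):
            # The platform refused the new value; keep the existing limit
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("soft RLIMIT_NOFILE is now", raise_fd_limit())
```

The sketch never lowers an existing limit, so it is safe to call on systems where the soft limit is already above 2^15.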

  commit 3ac1d14d0583a2de75d49a5234d767e2590384dd
  Author: Wyes Karny 
  Date:   Tue Oct 3 05:07:51 2023 +
  Subject: tools/power turbostat: Increase the limit for fd opened
  Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3ac1d14d0583a2de75d49a5234d767e2590384dd

  This landed in 6.9-rc4, and requires a backport for minor context
  adjustment in the first hunk for jammy. Noble got fixed already
  through upstream stable.

  [Testcase]

  Deploy a bare metal system with 512 or more cpus.

  Install linux-tools:

  $ sudo apt install linux-tools-$(uname -r)

  Run turbostat:

  $ sudo turbostat
  ...
  turbostat: /sys/devices/system/cpu/cpu477/cpuidle/state0/usage: open failed: Too many open files

  There are test kernels available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf388491-test

  If you install them, you should be able to see normal turbostat output
  for all cpus installed in the system.
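
Before deploying a test kernel, it can help to gauge whether a machine is large enough to hit the limit. The Python sketch below counts the per-CPU cpuidle `usage` files (the path pattern is taken from the error message above) and compares the count against the current soft file descriptor limit. It is an assumption of this sketch that this count approximates what turbostat keeps open; the bug report only states that 512 cpus need 1028 descriptors. The function name is hypothetical.

```python
import glob
import os
import resource


def count_cpuidle_usage_files(sysfs_cpu_root: str = "/sys/devices/system/cpu") -> int:
    """Count cpuN/cpuidle/stateM/usage files below the given sysfs CPU directory."""
    pattern = os.path.join(
        sysfs_cpu_root, "cpu[0-9]*", "cpuidle", "state[0-9]*", "usage"
    )
    return len(glob.glob(pattern))


if __name__ == "__main__":
    files = count_cpuidle_usage_files()
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"cpuidle usage files: {files}, soft RLIMIT_NOFILE: {soft}")
```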

  [Where problems can occur]

  We are simply increasing the rlimit for file descriptors that
  turbostat can open. This should have no impact on any existing
  systems.

  If a regression should occur, then turbostat functionality might not
  work. Users could use powerstat instead as a workaround while things
  are fixed.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2069961/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   

[Kernel-packages] [Bug 2069961] Re: turbostat fails with too many open files on large systems

2024-08-26 Thread Matthew Ruffell
Noble (6.8) got fixed already in 6.8.0-40-generic as a part of bug
2070349.

** Changed in: linux (Ubuntu Noble)
   Status: In Progress => Fix Released

** Description changed:

  BugLink: https://bugs.launchpad.net/bugs/2069961
  
  [Impact]
  
  On large systems, e.g. with 512 cpus or more, turbostat fails to run due
  to exceeding the rlimit for number of files. 512 cpus requires 1028 file
  descriptors, but the current limit is 999.
  
  $ lscpu
  ...
  CPU(s):  512
    On-line CPU(s) list:   0-511
  ...
  
  $ sudo turbostat
  ...
  turbostat: /sys/devices/system/cpu/cpu477/cpuidle/state0/usage: open failed: Too many open files
  
  There is no workaround, apart from maybe using powerstat instead.
  
  [Fix]
  
  The fix is to increase the rlimit to increase the amount of file
  descriptors that turbostat can open to 2^15, which should be plenty for
  some time to come.
  
  commit 3ac1d14d0583a2de75d49a5234d767e2590384dd
  Author: Wyes Karny 
  Date:   Tue Oct 3 05:07:51 2023 +
  Subject: tools/power turbostat: Increase the limit for fd opened
  Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3ac1d14d0583a2de75d49a5234d767e2590384dd
  
- This landed in 6.9-rc4, and is a clean cherry pick to jammy and noble.
+ This landed in 6.9-rc4, and is a clean cherry pick to jammy. Noble got
+ fixed already through upstream stable.
  
  [Testcase]
  
  Deploy a bare metal system with 512 or more cpus.
  
  Install linux-tools:
  
  $ sudo apt install linux-tools-$(uname -r)
  
  Run turbostat:
  
  $ sudo turbostat
  ...
  turbostat: /sys/devices/system/cpu/cpu477/cpuidle/state0/usage: open failed: Too many open files
  
  There are test kernels available in the following ppa:
  
  https://launchpad.net/~mruffell/+archive/ubuntu/sf388491-test
  
  If you install them, you should be able to see normal turbostat output
  for all cpus installed in the system.
  
  [Where problems can occur]
  
  We are simply increasing the rlimit for file descriptors that turbostat
  can open. This should have no impact on any existing systems.
  
  If a regression should occur, then turbostat functionality might not
  work. Users could use powerstat instead as a workaround while things are
  fixed.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2069961

Title:
  turbostat fails with too many open files on large systems

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Jammy:
  In Progress
Status in linux source package in Mantic:
  Won't Fix
Status in linux source package in Noble:
  Fix Released

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/2069961

  [Impact]

  On large systems, e.g. with 512 cpus or more, turbostat fails to run
  due to exceeding the rlimit for number of files. 512 cpus requires
  1028 file descriptors, but the current limit is 999.

  $ lscpu
  ...
  CPU(s):  512
    On-line CPU(s) list:   0-511
  ...

  $ sudo turbostat
  ...
  turbostat: /sys/devices/system/cpu/cpu477/cpuidle/state0/usage: open failed: Too many open files

  There is no workaround, apart from maybe using powerstat instead.

  [Fix]

  The fix is to increase the rlimit to increase the amount of file
  descriptors that turbostat can open to 2^15, which should be plenty
  for some time to come.

  commit 3ac1d14d0583a2de75d49a5234d767e2590384dd
  Author: Wyes Karny 
  Date:   Tue Oct 3 05:07:51 2023 +
  Subject: tools/power turbostat: Increase the limit for fd opened
  Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3ac1d14d0583a2de75d49a5234d767e2590384dd

  This landed in 6.9-rc4, and is a clean cherry pick to jammy. Noble got
  fixed already through upstream stable.

  [Testcase]

  Deploy a bare metal system with 512 or more cpus.

  Install linux-tools:

  $ sudo apt install linux-tools-$(uname -r)

  Run turbostat:

  $ sudo turbostat
  ...
  turbostat: /sys/devices/system/cpu/cpu477/cpuidle/state0/usage: open failed: Too many open files

  There are test kernels available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf388491-test

  If you install them, you should be able to see normal turbostat output
  for all cpus installed in the system.

  [Where problems can occur]

  We are simply increasing the rlimit for file descriptors that
  turbostat can open. This should have no impact on any existing
  systems.

  If a regression should occur, then turbostat functionality might not
  work. Users could use powerstat instead as a workaround while things
  are fixed.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2069961/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel

[Kernel-packages] [Bug 2069961] Re: turbostat and powerstat not working on 22.04

2024-08-26 Thread Matthew Ruffell
** Description changed:

- We're unable to run Turbostat on Turin 2P Volcano systems with core
- count greater than 400 on Ubuntu 22.04.  On Ubuntu 24.04 turbostat works
- fine.
+ BugLink: https://bugs.launchpad.net/bugs/2069961
  
- The following commit fixes the turbostat problem :
+ [Impact]
  
- https://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux.git/commit/?h=turbostat&id=3ac1d14d0583a2de75d49a5234d767e2590384dd
+ On large systems, e.g. with 512 cpus or more, turbostat fails to run due
+ to exceeding the rlimit for number of files. 512 cpus requires 1028 file
+ descriptors, but the current limit is 999.
+ 
+ $ lscpu
+ ...
+ CPU(s):  512
+   On-line CPU(s) list:   0-511
+ ...
+ 
+ $ sudo turbostat
+ ...
+ turbostat: /sys/devices/system/cpu/cpu477/cpuidle/state0/usage: open failed: Too many open files
+ 
+ There is no workaround, apart from maybe using powerstat instead.
+ 
+ [Fix]
+ 
+ The fix is to increase the rlimit to increase the amount of file
+ descriptors that turbostat can open to 2^15, which should be plenty for
+ some time to come.
+ 
+ commit 3ac1d14d0583a2de75d49a5234d767e2590384dd
+ Author: Wyes Karny 
+ Date:   Tue Oct 3 05:07:51 2023 +
+ Subject: tools/power turbostat: Increase the limit for fd opened
+ Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3ac1d14d0583a2de75d49a5234d767e2590384dd
+ 
+ This landed in 6.9-rc2, and is a clean cherry pick to jammy and noble.
+ 
+ [Testcase]
+ 
+ Deploy a bare metal system with 512 or more cpus.
+ 
+ Install linux-tools:
+ 
+ $ sudo apt install linux-tools-$(uname -r)
+ 
+ Run turbostat:
+ 
+ $ sudo turbostat
+ ...
+ turbostat: /sys/devices/system/cpu/cpu477/cpuidle/state0/usage: open failed: Too many open files
+ 
+ There are test kernels available in the following ppa:
+ 
+ https://launchpad.net/~mruffell/+archive/ubuntu/sf388491-test
+ 
+ If you install them, you should be able to see normal turbostat output
+ for all cpus installed in the system.
+ 
+ [Where problems can occur]
+ 
+ We are simply increasing the rlimit for file descriptors that turbostat
+ can open. This should have no impact on any existing systems.
+ 
+ If a regression should occur, then turbostat functionality might not
+ work. Users could use powerstat instead as a workaround while things are
+ fixed.

** Summary changed:

- turbostat and powerstat not working on 22.04
+ turbostat fails with too many open files on large systems

** Description changed:

  BugLink: https://bugs.launchpad.net/bugs/2069961
  
  [Impact]
  
  On large systems, e.g. with 512 cpus or more, turbostat fails to run due
  to exceeding the rlimit for number of files. 512 cpus requires 1028 file
  descriptors, but the current limit is 999.
  
  $ lscpu
  ...
  CPU(s):  512
-   On-line CPU(s) list:   0-511
+   On-line CPU(s) list:   0-511
  ...
  
  $ sudo turbostat
  ...
  turbostat: /sys/devices/system/cpu/cpu477/cpuidle/state0/usage: open failed: Too many open files
  
  There is no workaround, apart from maybe using powerstat instead.
  
  [Fix]
  
  The fix is to increase the rlimit to increase the amount of file
  descriptors that turbostat can open to 2^15, which should be plenty for
  some time to come.
  
  commit 3ac1d14d0583a2de75d49a5234d767e2590384dd
  Author: Wyes Karny 
  Date:   Tue Oct 3 05:07:51 2023 +
  Subject: tools/power turbostat: Increase the limit for fd opened
  Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3ac1d14d0583a2de75d49a5234d767e2590384dd
  
- This landed in 6.9-rc2, and is a clean cherry pick to jammy and noble.
+ This landed in 6.9-rc4, and is a clean cherry pick to jammy and noble.
  
  [Testcase]
  
  Deploy a bare metal system with 512 or more cpus.
  
  Install linux-tools:
  
  $ sudo apt install linux-tools-$(uname -r)
  
  Run turbostat:
  
  $ sudo turbostat
  ...
  turbostat: /sys/devices/system/cpu/cpu477/cpuidle/state0/usage: open failed: Too many open files
  
  There are test kernels available in the following ppa:
  
  https://launchpad.net/~mruffell/+archive/ubuntu/sf388491-test
  
  If you install them, you should be able to see normal turbostat output
  for all cpus installed in the system.
  
  [Where problems can occur]
  
  We are simply increasing the rlimit for file descriptors that turbostat
  can open. This should have no impact on any existing systems.
  
  If a regression should occur, then turbostat functionality might not
  work. Users could use powerstat instead as a workaround while things are
  fixed.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2069961

Title:
  turbostat fails with too many open files on large systems

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Jammy:
  In Progress
Status in linux source package in Mantic:
  Won't Fix
Status in linux source p

[Bug 2069961] Re: turbostat and powerstat not working on 22.04

2024-08-26 Thread Matthew Ruffell
** Changed in: linux (Ubuntu)
   Status: New => Fix Released

** Changed in: linux (Ubuntu Jammy)
   Status: New => In Progress

** Changed in: linux (Ubuntu Noble)
   Status: New => In Progress

** Changed in: linux (Ubuntu Jammy)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Noble)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Jammy)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: linux (Ubuntu Noble)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Tags added: sts

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2069961

Title:
  turbostat and powerstat not working on 22.04

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2069961/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Kernel-packages] [Bug 2069961] Re: turbostat and powerstat not working on 22.04

2024-08-26 Thread Matthew Ruffell
** Changed in: linux (Ubuntu)
   Status: New => Fix Released

** Changed in: linux (Ubuntu Jammy)
   Status: New => In Progress

** Changed in: linux (Ubuntu Noble)
   Status: New => In Progress

** Changed in: linux (Ubuntu Jammy)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Noble)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Jammy)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: linux (Ubuntu Noble)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Tags added: sts

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2069961

Title:
  turbostat and powerstat not working on 22.04

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Jammy:
  In Progress
Status in linux source package in Mantic:
  Won't Fix
Status in linux source package in Noble:
  In Progress

Bug description:
  We're unable to run Turbostat on Turin 2P Volcano systems with core
  count greater than 400 on Ubuntu 22.04.  On Ubuntu 24.04 turbostat
  works fine.

  The following commit fixes the turbostat problem :

  
https://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux.git/commit/?h=turbostat&id=3ac1d14d0583a2de75d49a5234d767e2590384dd

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2069961/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2077145] Re: GDS force mitigation re-enabled in 6.10 causing crashes

2024-08-21 Thread Matthew Ruffell
Hi Tormod,

Could you check 6.11.0-4-generic for Oracular in this particular ppa?

https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/unstable

The 6.10 kernel in -proposed will be replaced by this one, or a newer
build eventually.

Just waiting for the kernel team to make the source available in the
normal oracular git repo.

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2077145

Title:
  GDS force mitigation re-enabled in 6.10 causing crashes

Status in linux package in Ubuntu:
  New

Bug description:
  The (presumably unintended) re-enabling of GDS force mitigation in the
  Ubuntu 6.10 kernels causes AVX instructions to be disabled on older
  CPUs which have no available microcode update. This causes various
  programs to crash due to the unconditional use of AVX in libgnutls.so,
  libxul.so, etc.

  Typically "traps" of "invalid opcode" will be seen in dmesg output
  along with the initial notice:

  [0.121833] GDS: Microcode update needed! Disabling AVX as mitigation.
  [0.121835] GDS: Mitigation: AVX disabled, no microcode
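
To check a captured kernel log for this condition, a small scan along the following lines may help. This is a minimal Python sketch: the function name `gds_disabled_avx` is made up for this example, and the sample lines are the two dmesg lines quoted above.

```python
import re

# Matches the mitigation notice quoted in the bug description
GDS_AVX_DISABLED = re.compile(r"GDS: Mitigation: AVX disabled")


def gds_disabled_avx(log_lines):
    """Return True if any log line reports that GDS mitigation disabled AVX."""
    return any(GDS_AVX_DISABLED.search(line) for line in log_lines)


sample = [
    "[0.121833] GDS: Microcode update needed! Disabling AVX as mitigation.",
    "[0.121835] GDS: Mitigation: AVX disabled, no microcode",
]

if __name__ == "__main__":
    print(gds_disabled_avx(sample))  # prints True
```

In practice the lines would come from saved `dmesg` output rather than a hard-coded list.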

  When GDS force mitigation appeared in the kernel, with default "y", it
  created a lot of issues like these, and Ubuntu quickly patched all
  their kernels. This is from the 6.2.0-28.29_6.2.0-31.31 diff:

  ==

  ```
  diff -u linux-6.2.0/debian.master/changelog linux-6.2.0/debian.master/changelog
  --- linux-6.2.0/debian.master/changelog
  +++ linux-6.2.0/debian.master/changelog
  @@ -1,3 +1,13 @@
  +linux (6.2.0-31.31) lunar; urgency=medium
  +
  +  * lunar/linux: 6.2.0-31.31 -proposed tracker (LP: #2031146)
  +
  +  * libgnutls report "trap invalid opcode" when trying to install packages over
  +https (LP: #2031093)
  +- [Config]: disable CONFIG_GDS_FORCE_MITIGATION
  +
  + -- Thadeu Lima de Souza Cascardo   Mon, 14 Aug 2023 08:29:52 -0300
  +
   linux (6.2.0-28.29) lunar; urgency=medium

     * lunar/linux: 6.2.0-28.29 -proposed tracker (LP: #2030547)
  diff -u linux-6.2.0/debian.master/config/annotations linux-6.2.0/debian.master/config/annotations
  --- linux-6.2.0/debian.master/config/annotations
  +++ linux-6.2.0/debian.master/config/annotations
  @@ -4992,7 +4992,7 @@
   CONFIG_GCC_VERSION  policy<{'amd64': '120200', 'arm64': '120200', 'armhf': '120200', 'ppc64el': '120200', 'riscv64': '120200', 's390x': '120200'}>
   CONFIG_GCOV_KERNEL  policy<{'amd64': 'n', 'arm64': 'n', 'armhf': 'n', 'ppc64el': 'n', 'riscv64': 'n', 's390x': 'n'}>
   CONFIG_GDB_SCRIPTS  policy<{'amd64': 'y', 'arm64': 'y', 'armhf': 'y', 'ppc64el': 'y', 'riscv64': 'y', 's390x': 'y'}>
  -CONFIG_GDS_FORCE_MITIGATION policy<{'amd64': 'y'}>
  +CONFIG_GDS_FORCE_MITIGATION policy<{'amd64': 'n'}>
   CONFIG_GEMINI_ETHERNET  policy<{'arm64': 'm', 'armhf': 'm', 'ppc64el': 'm', 'riscv64': 'm'}>
   CONFIG_GENERIC_ADC_BATTERY  policy<{'amd64': 'm', 'arm64': 'm', 'armhf': 'm', 'ppc64el': 'm', 'riscv64': 'm'}>
   CONFIG_GENERIC_ADC_THERMAL  policy<{'amd64': 'm', 'arm64': 'm', 'armhf': 'm', 'ppc64el': 'm', 'riscv64': 'm'}>
  ```

  ==

  In upstream 6.9 the option was renamed from
  CONFIG_GDS_FORCE_MITIGATION to CONFIG_MITIGATION_GDS_FORCE, but when
  Ubuntu jumped from 6.8 to 6.10, this customization was lost, as seen
  in the 6.8.0-31.31_6.10.0-15.15 diff:

  ==

   ```
   CONFIG_GDB_SCRIPTS  policy<{'amd64': 'y', 'arm64': 'y', 'armhf': 'y', 'ppc64el': 'y', 'riscv64': 'y', 's390x': 'y'}>
  -CONFIG_GDS_FORCE_MITIGATION policy<{'amd64': 'n'}>
   CONFIG_GEMINI_ETHERNET  policy<{'arm64': 'm', 'armhf': 'm', 'ppc64el': 'm', 'riscv64': 'm'}>
  ...
   CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY  policy<{'arm64': 'y'}>
  +CONFIG_MITIGATION_CALL_DEPTH_TRACKING   policy<{'amd64': 'y'}>
  +CONFIG_MITIGATION_GDS_FORCE policy<{'amd64': 'y'}>
  +CONFIG_MITIGATION_IBPB_ENTRY policy<{'amd64': 'y'}>
  +CONFIG_MITIGATION_IBRS_ENTRY policy<{'amd64': 'y'}>
  ```

  ==

  I am sure this was an oversight, and that the old option was simply
  dropped because it didn't exist any longer, without thinking of it
  being renamed (among a lot of other renames).
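
  A quick way to see which of the two option names a given kernel
  carries, and how it is set, is to grep its config file. This is only
  a sketch: the helper name gds_force_setting is made up for
  illustration, and it assumes the config is installed as
  /boot/config-$(uname -r), as on stock Ubuntu kernels.

  ```shell
  # Hypothetical helper: print the GDS force-mitigation line from a kernel
  # config file, matching both the pre-6.9 and the post-6.9 option name.
  gds_force_setting() {
      grep -E '^(# )?CONFIG_(GDS_FORCE_MITIGATION|MITIGATION_GDS_FORCE)[= ]' "$1"
  }

  # Assumed location of the running kernel's config.
  config="/boot/config-$(uname -r)"
  if [ -r "$config" ]; then
      gds_force_setting "$config" || echo "no GDS force option in $config"
  fi

  # Runtime mitigation state as reported by the kernel, if exposed.
  gds_state=/sys/devices/system/cpu/vulnerabilities/gather_data_sampling
  if [ -r "$gds_state" ]; then
      cat "$gds_state"
  fi
  ```

  A line ending in "=y" under either name means the forced mitigation
  is built in.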

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2077145/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Bug 2077145] Re: GDS force mitigation re-enabled in 6.10 causing crashes

2024-08-21 Thread Matthew Ruffell
Hi Tormod,

Could you check 6.11.0-4-generic for Oracular in this particular ppa?

https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/unstable

The 6.10 kernel in -proposed will be replaced by this one, or a newer
build eventually.

Just waiting for the kernel team to make the source available in the
normal oracular git repo.

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2077145

Title:
  GDS force mitigation re-enabled in 6.10 causing crashes

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2077145/+subscriptions



[Kernel-packages] [Bug 2077044] Re: zap_pid_ns_processes() gets stuck in a busy loop when zombie processes are in namespace

2024-08-14 Thread Matthew Ruffell
This should land in 5.15.0-121-generic and 6.8.0-44-generic.
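
To check whether a machine already runs a kernel at or past those
versions, the running release can be compared using GNU sort -V. A
minimal sketch: version_at_least is a hypothetical helper, and the
comparison assumes GNU coreutils version ordering.

```shell
# Hypothetical helper: succeed if version $1 is at or above version $2,
# using GNU "sort -V" ordering.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

running=$(uname -r)
# Pick the fixed version for the series the running kernel belongs to.
case "$running" in
    5.15.0-*) fixed=5.15.0-121 ;;
    6.8.0-*)  fixed=6.8.0-44 ;;
    *)        fixed= ;;
esac

if [ -z "$fixed" ]; then
    echo "$running: not a 5.15 or 6.8 series kernel, check separately"
elif version_at_least "$running" "$fixed"; then
    echo "$running: at or past $fixed"
else
    echo "$running: older than $fixed"
fi
```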

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2077044

Title:
  zap_pid_ns_processes() gets stuck in a busy loop when zombie processes
  are in namespace

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Jammy:
  Fix Committed
Status in linux source package in Noble:
  Fix Committed

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/2077044

  [Impact]

  A deadlock can occur in zap_pid_ns_processes() which can hang the
  system due to RCU getting stuck.

  zap_pid_ns_processes() has a busy loop that calls kernel_wait4() on a
  child process of the namespace init task, waiting for it to exit. The
  problem is that it clears TIF_SIGPENDING but not TIF_NOTIFY_SIGNAL, so
  signal_pending() stays true and kernel_wait4() returns immediately
  instead of sleeping. The loop then spins forever: the child is
  sleeping in synchronize_rcu() and is never woken up, because the
  parent is stuck in the busy loop and never calls schedule() or
  rcu_note_context_switch().

  An example oops:

  Watchdog: BUG: soft lockup - CPU#3 stuck for 276s! [rcudeadlock:1836]
  CPU: 3 PID: 1836 Comm: rcudeadlock Tainted: G L 5.15.0-117-generic #127-Ubuntu
  RIP: 0010:_raw_read_lock+0xe/0x30
  Code: f0 0f b1 17 74 08 31 c0 5d c3 cc cc cc cc b8 01 00 00 00 5d c3 cc cc cc cc 0f 1f 00 0f 1f 44 00 00 b8 00 02 00 00 f0 0f c1 07  ff 01 00 00 75 05 c3 cc cc cc cc 55 48 89 e5 e8 4d 79 36 ff 5d
  CR2: 00c0002b
  Call Trace:
   <IRQ>
   ? show_trace_log_lvl+0x1d6/0x2ea
   ? show_trace_log_lvl+0x1d6/0x2ea
   ? kernel_wait4+0xaf/0x150
   ? show_regs.part.0+0x23/0x29
   ? show_regs.cold+0x8/0xd
   ? watchdog_timer_fn+0x1be/0x220
   ? lockup_detector_update_enable+0x60/0x60
   ? __hrtimer_run_queues+0x107/0x230
   ? read_hv_clock_tsc_cs+0x9/0x30
   ? hrtimer_interrupt+0x101/0x220
   ? hv_stimer0_isr+0x20/0x30
   ? __sysvec_hyperv_stimer0+0x32/0x70
   ? sysvec_hyperv_stimer0+0x7b/0x90
   </IRQ>
   <TASK>
   ? asm_sysvec_hyperv_stimer0+0x1b/0x20
   ? _raw_read_lock+0xe/0x30
   ? do_wait+0xa0/0x310
   kernel_wait4+0xaf/0x150
   ? thread_group_exited+0x50/0x50
   zap_pid_ns_processes+0x111/0x1a0
   forget_original_parent+0x348/0x360
   exit_notify+0x4a/0x210
   do_exit+0x24f/0x3c0
   do_group_exit+0x3b/0xb0
   get_signal+0x150/0x900
   arch_do_signal_or_restart+0xde/0x100
   ? __x64_sys_futex+0x78/0x1e0
   exit_to_user_mode_loop+0xc4/0x160
   exit_to_user_mode_prepare+0xa3/0xb0
   syscall_exit_to_user_mode+0x27/0x50
   ? x64_sys_call+0x1022/0x1fa0
   do_syscall_64+0x63/0xb0
   ? __io_uring_add_tctx_node+0x111/0x1a0
   ? fput+0x13/0x20
   ? __do_sys_io_uring_enter+0x10d/0x540
   ? __smp_call_single_queue+0x59/0x90
   ? exit_to_user_mode_prepare+0x37/0xb0
   ? syscall_exit_to_user_mode+0x2c/0x50
   ? x64_sys_call+0x1819/0x1fa0
   ? do_syscall_64+0x63/0xb0
   ? try_to_wake_up+0x200/0x5a0
   ? wake_up_q+0x50/0x90
   ? futex_wake+0x159/0x190
   ? do_futex+0x162/0x1f0
   ? __x64_sys_futex+0x78/0x1e0
   ? switch_fpu_return+0x4e/0xc0
   ? exit_to_user_mode_prepare+0x37/0xb0
   ? syscall_exit_to_user_mode+0x2c/0x50
   ? x64_sys_call+0x1022/0x1fa0
   ? do_syscall_64+0x63/0xb0
   ? do_user_addr_fault+0x1e7/0x670
   ? exit_to_user_mode_prepare+0x37/0xb0
   ? irqentry_exit_to_user_mode+0xe/0x20
   ? irqentry_exit+0x1d/0x30
   ? exc_page_fault+0x89/0x170
   entry_SYSCALL_64_after_hwframe+0x6c/0xd6
   </TASK>

  There is no known workaround.

  [Fix]

  This was fixed in the below commit in 6.10-rc5:

  commit 7fea700e04bd3f424c2d836e98425782f97b494e
  Author: Oleg Nesterov 
  Date:   Sat Jun 8 14:06:16 2024 +0200
  Subject: zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING
  Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7fea700e04bd3f424c2d836e98425782f97b494e

  This patch has made its way to upstream stable, and is already applied
  to Ubuntu kernels.

  [Testcase]

  There are two possible testcases to reproduce this issue. Both are
  courtesy of Rachel Menge and live in her GitHub repo:

  https://github.com/rlmenge/rcu-soft-lock-issue-repro

  Start a Jammy or Noble VM on Azure; a D8sV3 instance is plenty.

  $ git clone https://github.com/rlmenge/rcu-soft-lock-issue-repro.git

  npm repro:

  Install Docker.

  $ sudo docker run telescope.azurecr.io/issue-repro/zombie:v1.1.11
  $ ./rcu-npm-repro.sh

  go repro:

  $ go mod init rcudeadlock.go
  $ go mod tidy
  $ CGO_ENABLED=0 go build -o ./rcudeadlock ./
  $ sudo ./rcudeadlock

  Look at dmesg. After some minutes, you should see the soft lockup
  from the [Impact] section.

  [Where problems can occur]

  We now also clear TIF_NOTIFY_SIGNAL, alongside TIF_SIGPENDING, on the
  task running zap_pid_ns_processes(), so that signal_pending() returns
  false and kernel_wait4() can sleep instead of busy-waiting. This
  change should work as intended.

  If a regression were to occur, it could potentially affect all
  processes in namespaces.

  [Other 

[Bug 2077044] Re: zap_pid_ns_processes() gets stuck in a busy loop when zombie processes are in namespace

2024-08-14 Thread Matthew Ruffell
This should land in 5.15.0-121-generic and 6.8.0-44-generic.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2077044

Title:
  zap_pid_ns_processes() gets stuck in a busy loop when zombie processes
  are in namespace

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2077044/+subscriptions



[Kernel-packages] [Bug 2077044] [NEW] zap_pid_ns_processes() gets stuck in a busy loop when zombie processes are in namespace

2024-08-14 Thread Matthew Ruffell
Public bug reported:

BugLink: https://bugs.launchpad.net/bugs/2077044

[Impact]

A deadlock can occur in zap_pid_ns_processes() which can hang the system
due to RCU getting stuck.

zap_pid_ns_processes() has a busy loop that calls kernel_wait4() on a
child process of the namespace init task, waiting for it to exit. The
problem is that it clears TIF_SIGPENDING but not TIF_NOTIFY_SIGNAL, so
signal_pending() stays true and kernel_wait4() returns immediately
instead of sleeping. The loop then spins forever: the child is sleeping
in synchronize_rcu() and is never woken up, because the parent is stuck
in the busy loop and never calls schedule() or
rcu_note_context_switch().

An example oops:

Watchdog: BUG: soft lockup - CPU#3 stuck for 276s! [rcudeadlock:1836]
CPU: 3 PID: 1836 Comm: rcudeadlock Tainted: G L 5.15.0-117-generic #127-Ubuntu
RIP: 0010:_raw_read_lock+0xe/0x30
Code: f0 0f b1 17 74 08 31 c0 5d c3 cc cc cc cc b8 01 00 00 00 5d c3 cc cc cc cc 0f 1f 00 0f 1f 44 00 00 b8 00 02 00 00 f0 0f c1 07  ff 01 00 00 75 05 c3 cc cc cc cc 55 48 89 e5 e8 4d 79 36 ff 5d
CR2: 00c0002b
Call Trace:
 <IRQ>
 ? show_trace_log_lvl+0x1d6/0x2ea
 ? show_trace_log_lvl+0x1d6/0x2ea
 ? kernel_wait4+0xaf/0x150
 ? show_regs.part.0+0x23/0x29
 ? show_regs.cold+0x8/0xd
 ? watchdog_timer_fn+0x1be/0x220
 ? lockup_detector_update_enable+0x60/0x60
 ? __hrtimer_run_queues+0x107/0x230
 ? read_hv_clock_tsc_cs+0x9/0x30
 ? hrtimer_interrupt+0x101/0x220
 ? hv_stimer0_isr+0x20/0x30
 ? __sysvec_hyperv_stimer0+0x32/0x70
 ? sysvec_hyperv_stimer0+0x7b/0x90
 </IRQ>
 <TASK>
 ? asm_sysvec_hyperv_stimer0+0x1b/0x20
 ? _raw_read_lock+0xe/0x30
 ? do_wait+0xa0/0x310
 kernel_wait4+0xaf/0x150
 ? thread_group_exited+0x50/0x50
 zap_pid_ns_processes+0x111/0x1a0
 forget_original_parent+0x348/0x360
 exit_notify+0x4a/0x210
 do_exit+0x24f/0x3c0
 do_group_exit+0x3b/0xb0
 get_signal+0x150/0x900
 arch_do_signal_or_restart+0xde/0x100
 ? __x64_sys_futex+0x78/0x1e0
 exit_to_user_mode_loop+0xc4/0x160
 exit_to_user_mode_prepare+0xa3/0xb0
 syscall_exit_to_user_mode+0x27/0x50
 ? x64_sys_call+0x1022/0x1fa0
 do_syscall_64+0x63/0xb0
 ? __io_uring_add_tctx_node+0x111/0x1a0
 ? fput+0x13/0x20
 ? __do_sys_io_uring_enter+0x10d/0x540
 ? __smp_call_single_queue+0x59/0x90
 ? exit_to_user_mode_prepare+0x37/0xb0
 ? syscall_exit_to_user_mode+0x2c/0x50
 ? x64_sys_call+0x1819/0x1fa0
 ? do_syscall_64+0x63/0xb0
 ? try_to_wake_up+0x200/0x5a0
 ? wake_up_q+0x50/0x90
 ? futex_wake+0x159/0x190
 ? do_futex+0x162/0x1f0
 ? __x64_sys_futex+0x78/0x1e0
 ? switch_fpu_return+0x4e/0xc0
 ? exit_to_user_mode_prepare+0x37/0xb0
 ? syscall_exit_to_user_mode+0x2c/0x50
 ? x64_sys_call+0x1022/0x1fa0
 ? do_syscall_64+0x63/0xb0
 ? do_user_addr_fault+0x1e7/0x670
 ? exit_to_user_mode_prepare+0x37/0xb0
 ? irqentry_exit_to_user_mode+0xe/0x20
 ? irqentry_exit+0x1d/0x30
 ? exc_page_fault+0x89/0x170
 entry_SYSCALL_64_after_hwframe+0x6c/0xd6
 </TASK>

There is no known workaround.

[Fix]

This was fixed in the below commit in 6.10-rc5:

commit 7fea700e04bd3f424c2d836e98425782f97b494e
Author: Oleg Nesterov 
Date:   Sat Jun 8 14:06:16 2024 +0200
Subject: zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7fea700e04bd3f424c2d836e98425782f97b494e

This patch has made its way to upstream stable, and is already applied to Ubuntu
kernels.

[Testcase]

There are two possible testcases to reproduce this issue. Both are
courtesy of Rachel Menge and live in her GitHub repo:

https://github.com/rlmenge/rcu-soft-lock-issue-repro

Start a Jammy or Noble VM on Azure; a D8sV3 instance is plenty.

$ git clone https://github.com/rlmenge/rcu-soft-lock-issue-repro.git

npm repro:

Install Docker.

$ sudo docker run telescope.azurecr.io/issue-repro/zombie:v1.1.11
$ ./rcu-npm-repro.sh

go repro:

$ go mod init rcudeadlock.go
$ go mod tidy
$ CGO_ENABLED=0 go build -o ./rcudeadlock ./
$ sudo ./rcudeadlock

Look at dmesg. After some minutes, you should see the soft lockup from
the [Impact] section.
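
Rather than reading dmesg by eye, the check can be scripted by matching
the watchdog message shown in [Impact]. A small sketch: soft_lockups is
a hypothetical helper name, and reading dmesg may need root depending on
kernel.dmesg_restrict.

```shell
# Hypothetical helper: filter kernel log text on stdin down to
# soft lockup reports such as the one in the [Impact] section.
soft_lockups() {
    grep -E 'BUG: soft lockup - CPU#[0-9]+ stuck for [0-9]+s!'
}

# One-off check of the current kernel log.
dmesg 2>/dev/null | soft_lockups || echo "no soft lockup logged (yet)"
```

Running the last line every 30 seconds or so while a reproducer is
active avoids having to watch the log manually.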

[Where problems can occur]

We now also clear TIF_NOTIFY_SIGNAL, alongside TIF_SIGPENDING, on the
task running zap_pid_ns_processes(), so that signal_pending() returns
false and kernel_wait4() can sleep instead of busy-waiting. This change
should work as intended.

If a regression were to occur, it could potentially affect all processes
in namespaces.

[Other Info]

Upstream mailing list discussion:
https://lore.kernel.org/linux-kernel/1386cd49-36d0-4a5c-85e9-bc42056a5...@linux.microsoft.com/T/

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: Fix Released

** Affects: linux (Ubuntu Jammy)
 Importance: Medium
 Assignee: Matthew Ruffell (mruffell)
 Status: Fix Committed

** Affects: linux (Ubuntu Noble)
 Importance: Medium
 Assignee: Matthew Ruffell (mruffell)
 Status: Fix Committed


** Tags: sts

** Also affects: linux (Ubuntu Noble)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu

[Bug 2076957] Re: isolcpus are ignored when using cgroups V2, causing processes to have wrong affinity

2024-08-13 Thread Matthew Ruffell
Patches are on kernel team mailing list.

https://lists.ubuntu.com/archives/kernel-team/2024-August/152811.html
https://lists.ubuntu.com/archives/kernel-team/2024-August/152812.html

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2076957

Title:
  isolcpus are ignored when using cgroups V2, causing processes to have
  wrong affinity

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2076957/+subscriptions



[Kernel-packages] [Bug 2076957] Re: isolcpus are ignored when using cgroups V2, causing processes to have wrong affinity

2024-08-13 Thread Matthew Ruffell
Patches are on kernel team mailing list.

https://lists.ubuntu.com/archives/kernel-team/2024-August/152811.html
https://lists.ubuntu.com/archives/kernel-team/2024-August/152812.html

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2076957

Title:
  isolcpus are ignored when using cgroups V2, causing processes to have
  wrong affinity

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Jammy:
  In Progress

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/2076957

  [Impact]

  In latency-sensitive environments, it is very common to use isolcpus
  to reserve a set of CPUs that no other processes may be placed on,
  and to run only DPDK in poll mode on them.

  There is a bug in the jammy kernel, where if cgroups V2 are enabled,
  after several minutes the kernel will place other processes onto these
  reserved isolcpus at random. This disturbs dpdk and introduces
  latency.

  The issue does not occur with cgroups V1, so a workaround is to use
  cgroups V1 instead of V2 for the moment.

  [Fix]

  A full git bisect led me to the following commit, which fixes the
  issue. It landed in 6.2-rc1:

  commit 7fd4da9c1584be97ffbc40e600a19cb469fd4e78
  Author: Waiman Long 
  Date:   Sat Nov 12 17:19:39 2022 -0500
  Subject: cgroup/cpuset: Optimize cpuset_attach() on v2
  Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7fd4da9c1584be97ffbc40e600a19cb469fd4e78

  Only the 5.15 Jammy kernel needs this fix. Focal works correctly as
  is.

  The commit skips calls to cpuset_attach() if the underlying cpusets or
  memory have not changed in a cgroup, and it seems to fix the issue.

  [Testcase]

  Deploy a bare metal server, ideally with many cores; 56 should be
  plenty. Use Jammy, with the 5.15 GA kernel.

  1) Edit /etc/default/grub and set GRUB_CMDLINE_LINUX_DEFAULT to have
  "isolcpus=4-7,32-35 rcu_nocb_poll rcu_nocbs=4-7,32-35 
systemd.unified_cgroup_hierarchy=1"
  2) sudo reboot
  3) sudo cat /sys/devices/system/cpu/isolated
  4-7,32-35
  4) sudo apt install s-tui stress
  5) sudo s-tui
  6) htop
  7) $ while true; do
         sudo ps -eLF | head -n 1
         for cpu in 4 5 6 7 32 33 34 35; do
             sudo ps -eLF | grep stress | awk -v a="$cpu" '$9 == a {print;}'
         done
         sleep 5
     done

  Set up isolcpus to separate off CPUs 4-7 and 32-35, so each NUMA node
  has a set of isolated CPUs.

  s-tui is a great frontend for stress, and it starts the stress
  processes. All stress processes should initially be on non-isolated
  CPUs; confirm with htop that CPUs 4-7 and 32-35 are at 0% while every
  other CPU is at 100%.

  After about 3 minutes, though sometimes it takes up to 10, a stress
  process or the s-tui process will be incorrectly placed onto an
  isolated CPU, and that CPU's usage rises in htop. The while loop from
  step 7, which filters ps output by the CPU each thread last ran on,
  will also likely print the incorrectly placed process.

  A test kernel is available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf391137-test

  If you install it, the processes will not be placed onto the isolated
  cpus.

  [Where problems could occur]

  The patch changes how cgroups determines when cpuset_attach() should
  be called. cpuset_attach() is currently called very frequently in the
  5.15 Jammy kernel, but most operations should be NOP due to no changes
  occurring in cpusets or memory in the cgroup the process is attached
  to. We are changing it to instead skip calling cpuset_attach() if
  there are no changes, which should offer a small performance increase,
  as well as fixing this isolcpus bug.

  If a regression were to occur, it would affect cgroups V2 only, and it
  could cause resource limits to be applied incorrectly in the worst
  case.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2076957/+subscriptions




[Bug 2076957] [NEW] isolcpus are ignored when using cgroups V2, causing processes to have wrong affinity

2024-08-13 Thread Matthew Ruffell
Public bug reported:

BugLink: https://bugs.launchpad.net/bugs/2076957

[Impact]

In latency-sensitive environments, it is very common to use isolcpus to
reserve a set of CPUs that no other processes may be placed on, and to
run only DPDK in poll mode on them.

There is a bug in the jammy kernel, where if cgroups V2 are enabled,
after several minutes the kernel will place other processes onto these
reserved isolcpus at random. This disturbs dpdk and introduces latency.

The issue does not occur with cgroups V1, so a workaround is to use
cgroups V1 instead of V2 for the moment.
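
Before applying the workaround it helps to confirm which hierarchy is
actually mounted. One common check is the filesystem type of
/sys/fs/cgroup: cgroup2fs means the unified (V2) hierarchy, while tmpfs
indicates V1 or hybrid. A sketch, with cgroup_mode as a hypothetical
helper name; on systemd-based releases the switch back to V1 is normally
the kernel command line parameter systemd.unified_cgroup_hierarchy=0.

```shell
# Hypothetical helper: report the cgroup hierarchy mounted at the given
# path (default /sys/fs/cgroup), based on its filesystem type.
cgroup_mode() {
    case "$(stat -fc %T "${1:-/sys/fs/cgroup}" 2>/dev/null)" in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1-or-hybrid" ;;
        *)         echo "unknown" ;;
    esac
}

cgroup_mode
```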

[Fix]

A full git bisect led me to the following commit, which fixes the issue.
It landed in 6.2-rc1:

commit 7fd4da9c1584be97ffbc40e600a19cb469fd4e78
Author: Waiman Long 
Date:   Sat Nov 12 17:19:39 2022 -0500
Subject: cgroup/cpuset: Optimize cpuset_attach() on v2
Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7fd4da9c1584be97ffbc40e600a19cb469fd4e78

Only the 5.15 Jammy kernel needs this fix. Focal works correctly as is.

The commit skips calls to cpuset_attach() if the underlying cpusets or
memory have not changed in a cgroup, and it seems to fix the issue.

[Testcase]

Deploy a bare metal server, ideally with many cores; 56 should be plenty.
Use Jammy, with the 5.15 GA kernel.

1) Edit /etc/default/grub and set GRUB_CMDLINE_LINUX_DEFAULT to have
"isolcpus=4-7,32-35 rcu_nocb_poll rcu_nocbs=4-7,32-35 
systemd.unified_cgroup_hierarchy=1"
2) sudo reboot
3) sudo cat /sys/devices/system/cpu/isolated
4-7,32-35
4) sudo apt install s-tui stress
5) sudo s-tui
6) htop
7) $ while true; do
       sudo ps -eLF | head -n 1
       for cpu in 4 5 6 7 32 33 34 35; do
           sudo ps -eLF | grep stress | awk -v a="$cpu" '$9 == a {print;}'
       done
       sleep 5
   done

Set up isolcpus to separate off CPUs 4-7 and 32-35, so each NUMA node has
a set of isolated CPUs.

s-tui is a great frontend for stress, and it starts the stress processes.
All stress processes should initially be on non-isolated CPUs; confirm
with htop that CPUs 4-7 and 32-35 are at 0% while every other CPU is at
100%.

After about 3 minutes, though sometimes it takes up to 10, a stress
process or the s-tui process will be incorrectly placed onto an isolated
CPU, and that CPU's usage rises in htop. The while loop from step 7,
which filters ps output by the CPU each thread last ran on, will also
likely print the incorrectly placed process.
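
The hard-coded CPU numbers in step 7 can instead be derived from
/sys/devices/system/cpu/isolated. The following is a sketch only:
expand_cpu_list is a hypothetical helper that handles plain lists such
as "4-7,32-35", and matching threads by the command name "stress" is an
assumption carried over from step 7.

```shell
# Hypothetical helper: expand a kernel CPU list such as "4-7,32-35"
# into one CPU number per line.
expand_cpu_list() {
    local IFS=,
    local part
    for part in $1; do
        # "4-7" yields seq 4 7; a single CPU "5" yields seq 5 5.
        seq "${part%-*}" "${part#*-}"
    done
}

isolated=/sys/devices/system/cpu/isolated
if [ -r "$isolated" ]; then
    for cpu in $(expand_cpu_list "$(cat "$isolated")"); do
        # PSR is the CPU a thread last ran on; print stress threads
        # that are sitting on an isolated CPU.
        ps -eL -o psr= -o pid= -o lwp= -o comm= |
            awk -v a="$cpu" '$1 == a && $4 ~ /stress/'
    done
fi
```

Any output from the loop is a thread that landed on an isolated CPU;
no output means the isolation is holding at that moment.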

A test kernel is available in the following ppa:

https://launchpad.net/~mruffell/+archive/ubuntu/sf391137-test

If you install it, the processes will not be placed onto the isolated
cpus.

[Where problems could occur]

The patch changes how cgroups determines when cpuset_attach() should be
called. cpuset_attach() is currently called very frequently in the 5.15
Jammy kernel, but most operations should be NOP due to no changes
occurring in cpusets or memory in the cgroup the process is attached to.
We are changing it to instead skip calling cpuset_attach() if there are
no changes, which should offer a small performance increase, as well as
fixing this isolcpus bug.

If a regression were to occur, it would affect cgroups V2 only, and it
could cause resource limits to be applied incorrectly in the worst case.

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: Fix Released

** Affects: linux (Ubuntu Jammy)
 Importance: Medium
 Assignee: Matthew Ruffell (mruffell)
 Status: In Progress


** Tags: jammy sts

** Also affects: linux (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu)
   Status: New => Fix Released

** Changed in: linux (Ubuntu Jammy)
   Status: New => In Progress

** Changed in: linux (Ubuntu Jammy)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Jammy)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Description changed:

- BugLink: https://bugs.launchpad.net/bugs/
+ BugLink: https://bugs.launchpad.net/bugs/2076957
  
  [Impact]
  
  In latency sensitive environments, it is very common to use isolcpus to
  reserve a set of cpus that no other processes are to be placed on, and
  run just dpdk in poll mode.
  
  There is a bug in the jammy kernel, where if cgroups V2 are enabled,
  after several minutes the kernel will place other processes onto these
  reserved isolcpus at random. This disturbs dpdk and introduces latency.
  
  The issue does not occur with cgroups V1, so a workar

[Desktop-packages] [Bug 1861609] Re: Xorg crashed with assertion failure (usually in a VM) at [privates.h:121/122: dixGetPrivateAddr: Assertion `key->initialized' failed]

2024-08-12 Thread Matthew Ruffell
Hi Doug,

You have done some awesome work! Thank you very much for debugging and
opening a merge request upstream.

I can reproduce the issue, and yes, your patch with the help of previous
authors does fix the issue.

Hopefully we can try and get the attention of the maintainers, and see
if they are interested in pulling the patch in.

In the meantime, I built some test packages to share if anyone wants to
try the patch out.

Please note this package is NOT SUPPORTED by Canonical, and is for TESTING
PURPOSES ONLY. ONLY Install in a dedicated test environment.

Instructions to Install (On a focal, jammy, noble or oracular system):
1) sudo add-apt-repository ppa:mruffell/sf392117-test
2) sudo apt update
3) sudo apt install xserver-common xserver-xephyr xserver-xorg-core xserver-xorg-legacy
4) sudo apt-cache policy xserver-common | grep Installed
Oracular:
2:21.1.12-1ubuntu1+sf392117v20240813b1
Noble:
2:21.1.12-1ubuntu1+sf392117v20240813b0
Jammy:
2:21.1.4-2ubuntu1.7~22.04.11+sf392117v20240813b1 
Focal:
2:1.20.13-1ubuntu1~20.04.17+sf392117v20240813b1 

You probably want to run it in a VM. Probably best to reboot after installing
before trying to reproduce.
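
If you want to script the check from step 4, something like the following
might do. It is only a sketch: is_test_build is a name made up for this
example, and it assumes the test packages can be recognised by the
"+sf392117v" suffix seen in the version strings above.

```shell
# Succeed if the given package version carries the test PPA suffix.
is_test_build() {
    case "$1" in
        *+sf392117v*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report whether the installed xserver-common comes from the test PPA.
report_xserver_build() {
    # dpkg-query prints only the version of the installed package.
    installed=$(dpkg-query -W -f='${Version}' xserver-common 2>/dev/null || true)
    if is_test_build "$installed"; then
        echo "test build installed: $installed"
    else
        echo "test build NOT installed (found: ${installed:-nothing})"
    fi
}
```

Call report_xserver_build after step 3 to see which build is in place.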

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Desktop
Packages, which is subscribed to xorg-server in Ubuntu.
https://bugs.launchpad.net/bugs/1861609

Title:
  Xorg crashed with assertion failure (usually in a VM) at
  [privates.h:121/122: dixGetPrivateAddr: Assertion `key->initialized'
  failed]

Status in X.Org X server:
  New
Status in xorg-server package in Ubuntu:
  Confirmed
Status in xorg-server source package in Focal:
  New
Status in xorg-server source package in Jammy:
  New
Status in xorg-server source package in Noble:
  New
Status in xorg-server source package in Oracular:
  Confirmed

Bug description:
  Xorg crashed with assertion failure (usually in a VM):

  privates.h:121: dixGetPrivateAddr: Assertion `key->initialized'
  failed.

  WORKAROUND

  Select 'Ubuntu on Wayland' on the login screen.

To manage notifications about this bug go to:
https://bugs.launchpad.net/xorg-server/+bug/1861609/+subscriptions


-- 
Mailing list: https://launchpad.net/~desktop-packages
Post to : desktop-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~desktop-packages
More help   : https://help.launchpad.net/ListHelp


[Ubuntu-x-swat] [Bug 1861609] Re: Xorg crashed with assertion failure (usually in a VM) at [privates.h:121/122: dixGetPrivateAddr: Assertion `key->initialized' failed]

2024-08-12 Thread Matthew Ruffell
** Also affects: xorg-server (Ubuntu Focal)
   Importance: Undecided
   Status: New

** Also affects: xorg-server (Ubuntu Noble)
   Importance: Undecided
   Status: New

** Also affects: xorg-server (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Also affects: xorg-server (Ubuntu Oracular)
   Importance: High
   Status: Confirmed

** Tags added: sts

-- 
You received this bug notification because you are a member of Ubuntu-X,
which is subscribed to xorg-server in Ubuntu.
https://bugs.launchpad.net/bugs/1861609

Title:
  Xorg crashed with assertion failure (usually in a VM) at
  [privates.h:121/122: dixGetPrivateAddr: Assertion `key->initialized'
  failed]

To manage notifications about this bug go to:
https://bugs.launchpad.net/xorg-server/+bug/1861609/+subscriptions


___
Mailing list: https://launchpad.net/~ubuntu-x-swat
Post to : ubuntu-x-swat@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-x-swat
More help   : https://help.launchpad.net/ListHelp


[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs

2024-08-12 Thread Matthew Ruffell
Performing verification for Jammy.

So, I had forgotten all about these instances, so let's check them now.

The instance with -updates:

It failed with:
+ resize2fs /dev/nvme1n1p1
resize2fs 1.46.5 (30-Dec-2021)
resize2fs: Superblock checksum does not match superblock while trying to open /dev/nvme1n1p1
Couldn't find valid filesystem superblock.

the timestamp is Thu Aug  1 06:59:14 UTC 2024

I started it on... Thu Aug 1 06:02:48 UTC 2024

About 56 minutes! I was expecting a couple of days for Jammy -updates to be honest.
I really should have checked earlier.
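
For anyone wanting to double-check how long an instance survived, the two
timestamps can be subtracted in the shell. This is a small sketch that
assumes GNU date (for the -d option); the helper name elapsed is made up
for this example.

```shell
# Print the time between two timestamps as "<days>d <hours>h <minutes>m <seconds>s".
# Relies on GNU date's -d option to parse the timestamps.
elapsed() {
    start=$(date -u -d "$1" +%s)
    end=$(date -u -d "$2" +%s)
    diff=$((end - start))
    printf '%dd %dh %dm %ds\n' \
        $((diff / 86400)) $((diff % 86400 / 3600)) $((diff % 3600 / 60)) $((diff % 60))
}

# Start and failure times of the Jammy -updates instance.
elapsed "Thu Aug  1 06:02:48 UTC 2024" "Thu Aug  1 06:59:14 UTC 2024"
```

The same helper works for the other instances by swapping in their
timestamps.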

So we can reproduce the issue.

I then logged into the -proposed instance, with resize2fs
1.46.5-2ubuntu1.2:

It is still running, as of Mon Aug 12 10:54:06 UTC 2024.

This is fantastic. It survived 11 days and 5 hours in the high EBS traffic
us-west-2.

The package in -proposed fixes the issue. Happy to mark verified for
Jammy.

** Tags removed: verification-needed verification-needed-jammy
** Tags added: verification-done-jammy

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu.
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum
  mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  Fix Released
Status in e2fsprogs source package in Trusty:
  Won't Fix
Status in e2fsprogs source package in Xenial:
  Won't Fix
Status in e2fsprogs source package in Bionic:
  Won't Fix
Status in e2fsprogs source package in Focal:
  Fix Committed
Status in e2fsprogs source package in Jammy:
  Fix Committed
Status in e2fsprogs source package in Lunar:
  Won't Fix
Status in e2fsprogs source package in Mantic:
  Won't Fix
Status in e2fsprogs source package in Noble:
  Fix Committed
Status in e2fsprogs source package in Oracular:
  Fix Released

Bug description:
  [Impact]

  This is a long-running bug plaguing cloud-images, where on rare
  occasions resize2fs would fail and the image would not resize to fit
  the entire disk.

  Online resizes would fail due to a superblock checksum mismatch, where
  the superblock in memory differs from what is currently on disk due to
  changes made to the image.

  $ resize2fs /dev/nvme1n1p1
  resize2fs 1.47.0 (5-Feb-2023)
  resize2fs: Superblock checksum does not match superblock while trying to open /dev/nvme1n1p1
  Couldn't find valid filesystem superblock.

  Changing the read of the superblock to Direct I/O solves the issue.
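
  To illustrate the difference between a buffered and a Direct I/O read
  from the shell, here is a rough sketch. It is not the upstream patch,
  which changes resize2fs itself. read_sb_magic is a name made up for this
  example; it assumes GNU dd (for iflag=direct) and the usual ext4 layout,
  with the superblock at byte 1024 and its magic field 0x38 bytes into it.

```shell
# Print the two s_magic bytes of the primary ext2/3/4 superblock as hex.
# Usage: read_sb_magic PATH [direct]
read_sb_magic() {
    path=$1
    flag=""
    # iflag=direct makes GNU dd open the input with O_DIRECT, bypassing the
    # page cache; it needs a device or filesystem that supports O_DIRECT.
    if [ "${2:-}" = "direct" ]; then
        flag="iflag=direct"
    fi
    # Read one aligned 4 KiB block. The superblock starts at byte 1024 and
    # s_magic is at offset 0x38 inside it, so the magic is at byte 1080.
    # $flag is intentionally unquoted so that an empty value adds no argument.
    dd if="$path" bs=4096 count=1 $flag 2>/dev/null \
        | od -An -tx1 -j1080 -N2 | tr -d ' \n'
    echo
}
```

  On an ext4 volume, "sudo sh -c '. ./sb.sh; read_sb_magic /dev/nvme1n1p1
  direct'" (with the function saved in a hypothetical sb.sh) should print
  53ef, the little-endian form of the 0xEF53 magic, read straight from disk
  rather than from the page cache.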

  [Testcase]

  Start a c5.large instance on AWS, and attach a 60 GB gp3 volume for
  use as a scratch disk.

  Run the following script, courtesy of Krister Johansen and his team:

     #!/usr/bin/bash
     set -euxo pipefail

     while true
     do
     parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
     sleep .5
     mkfs.ext4 /dev/nvme1n1p1
     mount -t ext4 /dev/nvme1n1p1 /mnt
     stress-ng --temp-path /mnt -D 4 &
     STRESS_PID=$!
     sleep 1
     growpart /dev/nvme1n1 1
     resize2fs /dev/nvme1n1p1
     kill $STRESS_PID
     wait $STRESS_PID
     umount /mnt
     wipefs -a /dev/nvme1n1p1
     wipefs -a /dev/nvme1n1
     done

  Test packages are available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test

  If you install the test packages, the race no longer occurs.

  [Where problems could occur]

  We are changing how resize2fs reads the superblock from underlying
  disks.

  If a regression were to occur, resize2fs could fail to resize offline
  or online volumes. As all cloud-images are online resized during their
  initial boot, this could have a large impact on public and private
  clouds should a regression occur.

  [Other info]

  Upstream mailing list discussion:
  https://lore.kernel.org/linux-ext4/20230605225221.ga5...@templeofstupid.com/
  https://lore.kernel.org/linux-ext4/20230609042239.ga1436...@mit.edu/

  This was fixed in the below commit upstream:

  commit 43a498e938887956f393b5e45ea6ac79cc5f4b84
  Author: Theodore Ts'o 
  Date: Thu, 15 Jun 2023 00:17:01 -0400
  Subject: resize2fs: use Direct I/O when reading the superblock for
   online resizes
  Link: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84

  The commit has not been tagged to any release. All supported Ubuntu
  releases require this fix, and it needs to be published in the standard
  non-ESM archives to be picked up in cloud images.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/2036467/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs

2024-08-12 Thread Matthew Ruffell
Performing verification for Noble.

So, I had forgotten all about these instances, so let's check them now.

The instance with -updates:

It failed with:
+ resize2fs /dev/nvme1n1p1
resize2fs 1.47.0 (5-Feb-2023)
resize2fs: Superblock checksum does not match superblock while trying to open /dev/nvme1n1p1
Couldn't find valid filesystem superblock.

the timestamp is Wed Aug  7 06:42:15 UTC 2024

I started it on... Thu Aug 1 05:46:34 UTC 2024

Okay, this is better than Focal, we lasted 6 days 1 hour.

So we can reproduce the issue.

I then logged into the -proposed instance, with resize2fs
1.47.0-2.4~exp1ubuntu4.1:

It is still running, as of Mon Aug 12 10:49:00 UTC 2024.

This is fantastic. It survived 11 days and 5 hours in the high EBS traffic
us-west-2.

I have terminated the instances now.

The package in -proposed fixes the issue. Happy to mark verified for
Noble.

** Tags removed: verification-needed-noble
** Tags added: verification-done-noble

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu.
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum
  mismatch in resize2fs

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/2036467/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs

2024-08-12 Thread Matthew Ruffell
Performing verification for Focal.

So, I had forgotten all about these instances, so let's check them now.

The instance with -updates:

It failed with:
+ resize2fs /dev/nvme1n1p1
resize2fs 1.45.5 (07-Jan-2020)
resize2fs: Superblock checksum does not match superblock while trying to open /dev/nvme1n1p1
Couldn't find valid filesystem superblock.

the timestamp is Thu Aug  1 05:57:54 UTC 2024

I started it on... Thu Aug 1 05:46:34 UTC 2024

What! It only lasted 11 minutes! I should have checked on it earlier...

So we can reproduce the issue.

I then logged into the -proposed instance, with resize2fs
1.45.5-2ubuntu1.2:

It is still running, as of Mon Aug 12 10:41:35 UTC 2024.

This is fantastic. It survived 11 days and 5 hours in the high EBS traffic
us-west-2. 

I have terminated these instances now.

The package in -proposed fixes the issue. Happy to mark verified for
Focal.

** Tags removed: verification-needed-focal
** Tags added: verification-done-focal

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu.
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum
  mismatch in resize2fs

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/2036467/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2072755] Re: i915: Fixup regressions introduced with enabling single CCS engine

2024-08-08 Thread Matthew Ruffell
Thanks for testing TheDreadPirate. I marked the bug as verified.

** Tags removed: verification-needed-noble-linux
** Tags added: verification-done-noble-linux

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2072755

Title:
  i915: Fixup regressions introduced with enabling single CCS engine

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Noble:
  Fix Committed

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/2072755

  [Impact]

  Recently, the Intel i915 subsystem underwent a change that limited
  the number of CCS engines that were initialised by default, and
  exposed to the user. Different chipsets have differing numbers of CCS
  engines, but most available in the market have 4 CCS engines. The new
  change starts only a single engine, and allocates all CCS slices
  to this single engine. This single engine is then exposed to
  userspace. This effort is to work around a hardware bug.

  This all happened in:

  commit 6db31251bb265813994bfb104eb4b4d0f44d64fb
  Author: Andi Shyti 
  Date:   Thu Mar 28 08:34:05 2024 +0100
  Subject: drm/i915/gt: Enable only one CCS for compute workload
  Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6db31251bb265813994bfb104eb4b4d0f44d64fb

  which landed in:

  $ git describe --contains 67f164e8510b16bda18642464863dba87a33d8cb
  Ubuntu-6.8.0-38.38~525

  There have been some side effects as a result of these changes,
  leading to failure of userspace applications, namely in video
  transcoding with ffmpeg, resulting in fence expiration errors in dmesg
  like:

  [ 81.026591] Fence expiration time out i915-:01:00.0:ffmpeg[521]:2!

  There has also been a performance impact introduced by this change,
  which dropped performance of the GPU to 1/4 of what it was previously.
  This is likely due to most ARC GPUs usually having 4 CCS engines, and
  going down to 1 only without actually allocating the other three.

  There are no workarounds. Users are suggested to downgrade to
  6.8.0-36-generic while the fix is coming.

  [Fix]

  The regression was fixed by these two commits:

  commit aee54e282002a127612b71255bbe879ec0103afd
  Author: Andi Shyti 
  Date: Fri Apr 26 02:07:23 2024 +0200
  Subject: drm/i915/gt: Automate CCS Mode setting during engine resets
  Link: 
https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/noble/commit/?id=aee54e282002a127612b71255bbe879ec0103afd

  commit ee01b6a386eaf9984b58a2476e8f531149679da9
  Author: Andi Shyti 
  Date: Fri May 17 11:06:16 2024 +0200
  Subject: drm/i915/gt: Fix CCS id's calculation for CCS mode setting
  Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ee01b6a386eaf9984b58a2476e8f531149679da9

  "drm/i915/gt: Automate CCS Mode setting during engine resets" is
  already applied to noble/master-next through upstream stable v6.8.10.

  We just need "drm/i915/gt: Fix CCS id's calculation for CCS mode
  setting". It is queued up for v6.9.4, but that could still be another
  SRU cycle or two away. So send it now.

  "drm/i915/gt: Fix CCS id's calculation for CCS mode setting" restores
  another 1/4 performance, but some performance issues still remain, and
  will hopefully be addressed in a future patch.

  [Testcase]

  This affects video transcoding with ffmpeg, on machines equipped with
  Intel ARC GPUs.

  An example ffmpeg command might be:

  /usr/lib/jellyfin-ffmpeg/ffmpeg -analyzeduration 200M -probesize 1G
  -ss 00:00:03.000 -noaccurate_seek -init_hw_device
  vaapi=va:,kernel_driver=i915,driver=iHD -init_hw_device qsv=qs@va
  -filter_hw_device qs -hwaccel vaapi -hwaccel_output_format vaapi
  -noautorotate -i file:"/path/to/1080_video.mkv" -noautoscale
  -map_metadata -1 -map_chapters -1 -threads 0 -map 0:0 -map 0:1 -map
  -0:s -codec:v:0 av1_qsv -preset veryfast -b:v 3616000 -maxrate 3616000
  -bufsize 7232000 -g:v:0 72 -keyint_min:v:0 72 -vf
  
"setparams=color_primaries=bt709:color_trc=bt709:colorspace=bt709,scale_vaapi=w=1280:h=720:format=nv12:extra_hw_frames=24,hwmap=derive_device=qsv,format=qsv"
  -codec:a:0 libfdk_aac -ac 2 -vbr:a 5 -copyts -avoid_negative_ts
  disabled -max_muxing_queue_size 2048 -f hls -max_delay 500
  -hls_time 3 -hls_segment_type fmp4 -hls_fmp4_init_filename
  "c30716eb121448346fcc00a2440071a3-1.mp4" -start_number 1
  -hls_segment_filename
  "/var/lib/jellyfin/transcodes/c30716eb121448346fcc00a2440071a3%d.mp4"
  -hls_playlist_type vod -hls_list_size 0 -y
  "/var/lib/jellyfin/transcodes/c30716eb121448346fcc00a2440071a3.m3u8"

  Another user on bug 2072933 came up with this minimalist reproducer:

  #include 
  #include 

  int main() {
// auto selector = sycl::cpu_selector_v; // Works fine
auto selector = sycl::gpu_selector_v;

auto queue = sycl::queue(selector);

printf("Hello\n");
queue.submit([&](sycl::handler &cgh) 

[Bug 2072755] Re: i915: Fixup regressions introduced with enabling single CCS engine

2024-08-08 Thread Matthew Ruffell
Thanks for testing, TheDreadPirate. I marked the bug as verified.

** Tags removed: verification-needed-noble-linux
** Tags added: verification-done-noble-linux

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2072755

Title:
  i915: Fixup regressions introduced with enabling single CCS engine

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2072755/+subscriptions



[Kernel-packages] [Bug 2075110] Re: md: nvme over tcp with a striped underlying md raid device leads to data corruption

2024-08-07 Thread Matthew Ruffell
Performing verification for Noble.

I started a n2-standard-2 instance on Google cloud, running Noble.

I installed 6.8.0-39-generic from -updates, rebooted, and followed the
instructions in the testcase.

$ sudo ./check md/001
md/001 (Raid with bitmap on tcp nvmet with opt-io-size over bitmap size)

Having a look at dmesg:

unknown: run blktests md/001 at 2024-08-08 04:26:39
root[1982]: run blktests md/001
kernel: brd: module loaded
(udev-worker)[1987]: dm-0: Process '/usr/bin/unshare -m /usr/bin/snap 
auto-import --mount=/dev/dm-0' failed with exit code 1.
kernel: Key type psk registered
kernel: nvmet: adding nsid 1 to subsystem blktests-subsystem-1
kernel: nvmet_tcp: enabling port 0 (127.0.0.1:4420)
kernel: nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for 
NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
kernel: nvme nvme1: creating 2 I/O queues.
kernel: nvme nvme1: mapped 2/0/0 default/read/poll queues.
kernel: nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420, 
hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
(udev-worker)[2018]: nvme1n1: Process '/usr/bin/unshare -m /usr/bin/snap 
auto-import --mount=/dev/nvme1n1' failed with exit code 1.
(udev-worker)[2018]: md127: Process '/usr/bin/unshare -m /usr/bin/snap 
auto-import --mount=/dev/md127' failed with exit code 1.
kernel: md/raid1:md127: active with 1 out of 2 mirrors
kernel: [ cut here ]
kernel: WARNING: CPU: 0 PID: 50 at net/core/skbuff.c:6995 
skb_splice_from_iter+0x139/0x370
kernel: Modules linked in: nvme_tcp nvmet_tcp nvmet nvme_keyring brd raid1 
cfg80211 8021q garp mrp stp llc binfmt_misc nls_iso8859_1 intel_rapl_msr 
intel_rapl_common intel_uncore_frequency_common isst_if_common nfit 
crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic 
ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd rapl 
pvpanic_mmio pvpanic nvme psmouse i2c_piix4 input_leds mac_hid serio_raw 
dm_multipath nvme_fabrics nvme_core nvme_auth efi_pstore nfnetlink dmi_sysfs 
virtio_rng ip_tables x_tables autofs4
kernel: CPU: 0 PID: 50 Comm: kworker/0:1H Not tainted 6.8.0-39-generic 
#39-Ubuntu
kernel: Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
Google 06/27/2024
kernel: Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp]
kernel: RIP: 0010:skb_splice_from_iter+0x139/0x370
kernel: Code: 39 e1 48 8b 53 08 49 0f 47 cc 49 89 cd f6 c2 01 0f 85 c0 01 00 00 
66 90 48 89 da 48 8b 12 80 e6 08 0f 84 8e 00 00 00 4d 89 fe <0f> 0b 49 c7 c0 fb 
ff ff ff 48 8b 85 68 ff ff ff 41 01 46 70 41 01
kernel: RSP: 0018:bd92001b3a30 EFLAGS: 00010246
kernel: RAX:  RBX: f5f1c48d9b40 RCX: 1000
kernel: RDX:  RSI:  RDI: 
kernel: RBP: bd92001b3ad8 R08:  R09: 
kernel: R10:  R11:  R12: 20e8
kernel: R13: 1000 R14: 96834b496400 R15: 96834b496400
kernel: FS:  () GS:968477c0() 
knlGS:
kernel: CS:  0010 DS:  ES:  CR0: 80050033
kernel: CR2: 7507bcfe5f84 CR3: 00010b49c002 CR4: 003706f0
kernel: DR0:  DR1:  DR2: 
kernel: DR3:  DR6: fffe0ff0 DR7: 0400
kernel: Call Trace:
kernel:  
kernel:  ? show_regs+0x6d/0x80
kernel:  ? __warn+0x89/0x160
kernel:  ? skb_splice_from_iter+0x139/0x370
kernel:  ? report_bug+0x17e/0x1b0
kernel:  ? handle_bug+0x51/0xa0
kernel:  ? exc_invalid_op+0x18/0x80
kernel:  ? asm_exc_invalid_op+0x1b/0x20
kernel:  ? skb_splice_from_iter+0x139/0x370
kernel:  tcp_sendmsg_locked+0x352/0xd70
kernel:  ? tcp_push+0x159/0x190
kernel:  ? tcp_sendmsg_locked+0x9c4/0xd70
kernel:  tcp_sendmsg+0x2c/0x50
kernel:  inet_sendmsg+0x42/0x80
kernel:  sock_sendmsg+0x118/0x150
kernel:  nvme_tcp_try_send_data+0x18b/0x4c0 [nvme_tcp]
kernel:  nvme_tcp_try_send+0x23c/0x300 [nvme_tcp]
kernel:  nvme_tcp_io_work+0x40/0xe0 [nvme_tcp]
kernel:  process_one_work+0x16c/0x350
kernel:  worker_thread+0x306/0x440
kernel:  ? _raw_spin_unlock_irqrestore+0x11/0x60
kernel:  ? __pfx_worker_thread+0x10/0x10
kernel:  kthread+0xef/0x120
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork+0x44/0x70
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork_asm+0x1b/0x30
kernel:  
kernel: ---[ end trace  ]---
kernel: nvme nvme1: failed to send request -5
kernel: nvme nvme1: I/O tag 111 (106f) type 4 opcode 0x0 (I/O Cmd) QID 1 timeout
kernel: nvme nvme1: starting error recovery
kernel: block nvme1n1: no usable path - requeuing I/O
kernel: nvme nvme1: Reconnecting in 10 seconds...

blktests md/001 hangs the system, in this particular scenario.

I then restarted the instance, enabled -proposed2, and installed
6.8.0-41-generic:

6.8.0-41-generic #41-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug  2 20:41:06 UTC
2024

I c

[Kernel-packages] [Bug 2069534] Re: Linux 6.8 fails to boot on ARM64 if any param is more than 146 chars

2024-08-07 Thread Matthew Ruffell
Performing verification for Noble.

I again started two T2A instances on Google Cloud, both running Noble.

One instance has:
6.8.0-39-generic #39-Ubuntu SMP PREEMPT_DYNAMIC Sat Jul  6 02:50:39 UTC 2024
The other, 6.8.0-41-generic from -proposed2:
6.8.0-41-generic #41-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug  2 23:26:06 UTC 2024

I edited /etc/default/grub.d/50-cloudimg-settings.cfg and set:

GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0,115200"

to

GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0,115200
testparam=f081c381e7b54edcba27e5f790d47911a4cc3e726d8d256878d3df9175c020e0f081c381e7b54edcba27e5f790d47911a4cc3e726d8d256878d3df9175c020e0f081c381e7b5732f126a62b4232"

ran:

$ sudo update-grub

and rebooted.

Again, the 6.8.0-39-generic instance never came back up.

The 6.8.0-41-generic instance came up just fine:

$ cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-6.8.0-41-generic 
root=PARTUUID=e1ce6327-4835-4b2e-b73e-e7d6231d4869 ro console=ttyS0,115200 
testparam=f081c381e7b54edcba27e5f790d47911a4cc3e726d8d256878d3df9175c020e0f081c381e7b54edcba27e5f790d47911a4cc3e726d8d256878d3df9175c020e0f081c381e7b5732f126a62b4232

The 6.8.0-41-generic in -proposed2 fixes the issue. Happy to mark
verified for Noble.

** Tags removed: verification-needed-noble-linux
** Tags added: verification-done-noble-linux

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-hwe-6.8 in Ubuntu.
https://bugs.launchpad.net/bugs/2069534

Title:
  Linux 6.8 fails to boot on ARM64 if any param is more than 146 chars

Status in linux package in Ubuntu:
  Fix Released
Status in linux-hwe-6.8 package in Ubuntu:
  Invalid
Status in linux source package in Jammy:
  Invalid
Status in linux-hwe-6.8 source package in Jammy:
  Fix Committed
Status in linux source package in Noble:
  Fix Committed
Status in linux-hwe-6.8 source package in Noble:
  Invalid

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/2069534

  [Impact]

  Linux 6.8 kernel fails to boot on ARM64 when any Linux command line
  param is more than 146 characters.

  This most notably affects MAAS deployments, as MAAS generates very
  long command line parameters for ARM64, e.g.:

  nomodeset
  
root=squash:http://10.254.131.130:5248/images/3b08252fa962c37a47d890fb5fe182b631a0c0478d758bf4573efa859cc2c548/ubuntu/arm64/ga-24.04/noble/stable/squashfs
  ip=sjc01-2b16-u07-mgx01b:BOOTIF ip6=off cc:\{'datasource_list':
  ['MAAS']\}end_cc cloud-config-url=http://10-254-131-128--25.maas-
  internal:5248/MAAS/metadata/latest/by-id/de6dn3/?op=get_preseed ro
  overlayroot=tmpfs overlayroot_cfgdisk=disabled log_host=10.254.131.130
  log_port=5247 --- BOOTIF=01-${net_default_mac}

  This was introduced in 6.8-rc1 by:

  commit dc3f5aae06381b43bc9d0d416bd15ee1682940e9
  Author: Ard Biesheuvel 
  Date: Wed Nov 29 12:16:12 2023 +0100
  Subject: arm64: idreg-override: Avoid parameq() and parameqn()
  Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dc3f5aae06381b43bc9d0d416bd15ee1682940e9

  There is no workaround, other than using command line parameters less
  than 146 characters. This is not tenable for MAAS users.
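
  To see whether a given command line would trip the limit, a small
  check along these lines may help. This is only a sketch: the helper
  name is hypothetical, it splits on whitespace (the kernel also honours
  double-quoted values, which this ignores), and it assumes the limit
  applies to each whole "key=value" token as the bug title suggests.

```python
# Limit quoted in this bug report; parameters longer than this fail to boot.
MAX_PARAM_LEN = 146

def overlong_params(cmdline: str, limit: int = MAX_PARAM_LEN) -> list[str]:
    """Return the whitespace-separated parameters longer than `limit`.

    Hypothetical helper for illustration; quoting rules are not handled.
    """
    return [param for param in cmdline.split() if len(param) > limit]

# Made-up example: one short parameter and one deliberately long one.
sample = "console=ttyS0,115200 testparam=" + "a" * 150
for param in overlong_params(sample):
    print(f"{len(param)} chars: {param[:40]}...")
```

  Feeding it the contents of /proc/cmdline from a working kernel, or the
  parameters MAAS generates, would list the entries that need attention.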

  [Fix]

  The fix arrived in a major refactor of early ARM64 init, where they
  moved from assembly to the pi mini c library. The specific commit that
  fixed the issue is:

  commit e223a449125571daa62debd8249fa4fc2da0a961
  Author: Ard Biesheuvel 
  Date: Wed Feb 14 13:28:50 2024 +0100
  Subject: arm64: idreg-override: Move to early mini C runtime
  Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e223a449125571daa62debd8249fa4fc2da0a961

  However, this needs a lot of dependencies, mostly all the "mini c
  runtime" commits in the below merge commit:

  commit 6d75c6f40a03c97e1ecd683ae54e249abb9d922b
  Merge: fe46a7dd189e 1ef21fcd6a50
  Author: Linus Torvalds 
  Date: Thu Mar 14 15:35:42 2024 -0700
  Subject: Merge tag 'arm64-upstream' of 
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
  Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d75c6f40a03c97e1ecd683ae54e249abb9d922b

  The amount of code is generally unacceptable for an SRU due to
  regression risk. I also don't think that reverting "arm64: idreg-
  override: Avoid parameq() and parameqn()" is the right solution
  either.

  Thankfully, Tj did some debugging of the root cause in comment #20
  [1], and found the issue occurs because of memcmp() in
  include/linux/fortify-string.h detecting an attempted out-of-bounds
  read when comparing buf and aliases[i].alias.

  That triggers the fortified memcmp()'s:

  if (p_size < size || q_size < size)
  fortify_panic(__func__);

  where q_size == 146, size == 147, and it crashes the kernel.
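
  The failure mode can be modelled outside the kernel. The following is
  a rough Python sketch of the bounds check only, not the kernel
  implementation: FortifyPanic stands in for fortify_panic(), buffer
  lengths stand in for the compiler-known object sizes, and the 256-byte
  size used for buf is an assumption made for illustration.

```python
class FortifyPanic(Exception):
    """Stands in for the kernel's fortify_panic(); illustration only."""

def fortified_memcmp(p: bytes, q: bytes, size: int) -> int:
    # In the kernel, p_size and q_size are the object sizes known to the
    # compiler; here the length of each buffer plays that role.
    p_size, q_size = len(p), len(q)
    if p_size < size or q_size < size:
        raise FortifyPanic("memcmp")
    a, b = p[:size], q[:size]
    return (a > b) - (a < b)

buf = bytes(256)    # assumed size for the command line buffer
alias = bytes(146)  # q_size == 146, as reported in comment #20

print(fortified_memcmp(buf, alias, 146))  # within bounds, compares equal

try:
    fortified_memcmp(buf, alias, 147)     # size == 147 exceeds q_size
except FortifyPanic as exc:
    print("fortify panic in", exc)
```

  A comparison of exactly 146 bytes passes, while asking for 147 bytes
  trips the check, which matches the reported values.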

  [1]
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2069534/comments/20

  I know SAUCE patches are to be avoided if possible, but Tj's solution
  is minimal and fixes the root cause without the regression risk of
  backporting

[Bug 2069534] Re: Linux 6.8 fails to boot on ARM64 if any param is more than 146 chars

2024-08-07 Thread Matthew Ruffell
Performing verification for Noble.

I again started two T2A instances on Google Cloud, both running Noble.

One instance has:
6.8.0-39-generic #39-Ubuntu SMP PREEMPT_DYNAMIC Sat Jul  6 02:50:39 UTC 2024
The other, 6.8.0-41-generic from -proposed2:
6.8.0-41-generic #41-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug  2 23:26:06 UTC 2024

I edited /etc/default/grub.d/50-cloudimg-settings.cfg and set:

GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0,115200"

to

GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0,115200
testparam=f081c381e7b54edcba27e5f790d47911a4cc3e726d8d256878d3df9175c020e0f081c381e7b54edcba27e5f790d47911a4cc3e726d8d256878d3df9175c020e0f081c381e7b5732f126a62b4232"

ran:

$ sudo update-grub

and rebooted.

Again, I never saw the 6.8.0-39-generic again.

The 6.8.0-41-generic instance came up just fine:

$ cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-6.8.0-41-generic 
root=PARTUUID=e1ce6327-4835-4b2e-b73e-e7d6231d4869 ro console=ttyS0,115200 
testparam=f081c381e7b54edcba27e5f790d47911a4cc3e726d8d256878d3df9175c020e0f081c381e7b54edcba27e5f790d47911a4cc3e726d8d256878d3df9175c020e0f081c381e7b5732f126a62b4232

The 6.8.0-41-generic in -proposed2 fixes the issue. Happy to mark
verified for Noble.

** Tags removed: verification-needed-noble-linux
** Tags added: verification-done-noble-linux

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2069534

Title:
  Linux 6.8 fails to boot on ARM64 if any param is more than 146 chars

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2069534/+subscriptions



[Bug 2076201] Re: Virtualbox fails when starting a VM with kernel 5.15.0-116/117

2024-08-07 Thread Matthew Ruffell
*** This bug is a duplicate of bug 2073267 ***
https://bugs.launchpad.net/bugs/2073267

** This bug has been marked a duplicate of bug 2073267
   Virtualbox Guru meditation on VM start caused by kernel commit in v6.9-rc4

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2076201

Title:
  Virtualbox fails when starting a VM with kernel 5.15.0-116/117

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2076201/+subscriptions



[Kernel-packages] [Bug 2076201] Re: Virtualbox fails when starting a VM with kernel 5.15.0-116/117

2024-08-07 Thread Matthew Ruffell
*** This bug is a duplicate of bug 2073267 ***
https://bugs.launchpad.net/bugs/2073267

** This bug has been marked a duplicate of bug 2073267
   Virtualbox Guru meditation on VM start caused by kernel commit in v6.9-rc4

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2076201

Title:
  Virtualbox fails when starting a VM with kernel 5.15.0-116/117

Status in linux package in Ubuntu:
  New

Bug description:
  With kernel 5.15.0-107 it works.
  I tested with 116 and 117. On both kernels, all VMs fail to start with:

  ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005)
  aIID={4680b2de-8690-11e9-b83d-5719e53cf1de} aComponent={DisplayWrap}
  aText={The console is not powered up (setVideoModeHint)},
  preserve=false aResultDetail=0

  I have attached the VB log of one VM.
  However, the error also occurs with Linux VMs.

  My host OS: Linux Mint 21.3 mate
  My HW: AMD Ryzen 5 3400G with Radeon Vega Graphics

  If you need more details, please ask ;)
  My english is terrible. Please write easily understandable.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2076201/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2073267] Re: Virtualbox Guru meditation on VM start caused by kernel commit in v6.9-rc4

2024-08-07 Thread Matthew Ruffell
Hi Arrigo,

Kernel SRU cycles are a little tricky... the Kernel Team make 150 kernels twice
a month, so not everything goes in the distro -proposed pocket.

This one is a respin, and hasn't made its way to the main -proposed pocket yet,
and because it's a part of the security SRU cycle, it gets placed in the Kernel
Team's -proposed2 pocket:

https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/proposed2

The Kernel Team also have a -proposed pocket, which is just like the distro
-proposed, but only has the primary kernel SRU cycle packages in it:

https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/proposed

Anyway, -119 is in -proposed2 if you want to try it.

The virtualbox package in -proposed is the same as the one in -updates, and
fixes a DKMS build issue with the 6.8 HWE kernel.

Hope that clears some things up.

Matthew

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-hwe-6.8 in Ubuntu.
https://bugs.launchpad.net/bugs/2073267

Title:
  Virtualbox Guru meditation on VM start caused by kernel commit in
  v6.9-rc4

Status in linux package in Ubuntu:
  Invalid
Status in linux-hwe-6.8 package in Ubuntu:
  Invalid
Status in linux-signed-hwe-5.15 package in Ubuntu:
  Confirmed
Status in virtualbox package in Ubuntu:
  Confirmed
Status in linux source package in Jammy:
  Fix Committed
Status in linux-hwe-6.8 source package in Jammy:
  Fix Committed
Status in linux-signed-hwe-5.15 source package in Jammy:
  Invalid
Status in virtualbox source package in Jammy:
  Confirmed

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/2073267

  SRU Justification:

  [Impact]

  Commit "randomize_kstack: Improve entropy diffusion" changed the
  kernel stack for entropy to 1KiB, limiting the thread kernel stack to
  15KiB. This impacts virtualbox 6.1.50 on jammy, which is no longer
  maintained upstream. The issue does not persist on version 7.0.20 due to a
  code refactoring that probably resulted in less stack usage. Fixing it on
  the jammy virtualbox package side is not straightforward, because the fix is
  not easy to backport to 6.x, and upgrading the jammy package to 7.x breaks
  existing users' VMs, including but not limited to those running Windows.
  Users would need to uninstall the Guest Additions drivers, migrate the
  virtualbox package to 7.x, boot each VM, and install the Guest Additions
  drivers in each VM.

  This impacts:
  1. jammy:linux
  2. jammy:linux-hwe-6.8
  3. focal:linux-hwe-5.15

  [Fix]

  Revert commit "randomize_kstack: Improve entropy diffusion"

  [Test Plan]

  Without this fix, a VM would crash, showing "VCPU0: Guru
  Meditation -2708 (VERR_VMM_SET_JMP_ABORTED_RESUME)".
  After the kernel upgrade, all VMs should run with no problem.

  [Where problems could occur]
  This may have an impact on security. The reverted commit is a fix that
  improves stack entropy.

  Original description:

  It worked yesterday, but today I get a Guru Meditation trying to start
  some of my virtual machines. This shows up in VBox.log as "VCPU0: Guru
  Meditation -2708 (VERR_VMM_SET_JMP_ABORTED_RESUME)". I suspect this
  may have started due to a Linux kernel upgrade I installed this
  morning.

  A fresh VM with no disk shows the issue. Sometimes turning off the I/O
  APIC makes the issue go away, sometimes not. Turning off nested paging
  sometimes lets VirtualBox make a little bit of progress w.r.t. booting
  VMs, but that usually still crashes before the VM finishes starting.

  This may be related to this bug reported on the VirtualBox forums:
  
https://forums.virtualbox.org/viewtopic.php?t=111889&sid=5cd33c0872a03b689e7e9f84d850f538

  https://forums.virtualbox.org/viewtopic.php?t=111918

  Ubuntu is 22.04.4 LTS, kernel is 5.15.0-116-generic, VirtualBox is
  6.1.50-dfsg-1~ubuntu1.22.04.1.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2073267/+subscriptions




[Kernel-packages] [Bug 2067862] Re: Removing legacy virtio-pci devices causes kernel panic

2024-08-07 Thread Matthew Ruffell
Hi Dong,

There is nothing more to do. Don't worry about all these derivative
kernels spamming the comments, we just needed to verify the main
-generic kernel only. There will be more bot spam in the future, you can
ignore it.

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2067862

Title:
  Removing legacy virtio-pci devices causes kernel panic

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Noble:
  Fix Committed

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/2067862

  [Impact]

  If you detach a legacy virtio-pci device from a current Noble system,
  it will cause a null pointer dereference, and panic the system. This
  is an issue if you force noble to use legacy virtio-pci devices, or
  run noble on very old hypervisors that only support legacy virtio-pci
  devices, e.g. trusty and older.

  BUG: kernel NULL pointer dereference, address: 
  ...
  CPU: 2 PID: 358 Comm: kworker/u8:3 Kdump: loaded Not tainted 6.8.0-31-generic 
#31-Ubuntu
  Workqueue: kacpi_hotplug acpi_hotplug_work_fn
  RIP: 0010:0x0
  ...
  Call Trace:
  
   ? show_regs+0x6d/0x80
   ? __die+0x24/0x80
   ? page_fault_oops+0x99/0x1b0
   ? do_user_addr_fault+0x2ee/0x6b0
   ? exc_page_fault+0x83/0x1b0
   ? asm_exc_page_fault+0x27/0x30
   vp_del_vqs+0x6e/0x2a0
   remove_vq_common+0x166/0x1a0
   virtnet_remove+0x61/0x80
   virtio_dev_remove+0x3f/0xc0
   device_remove+0x40/0x80
   device_release_driver_internal+0x20b/0x270
   device_release_driver+0x12/0x20
   bus_remove_device+0xcb/0x140
   device_del+0x161/0x3e0
   ? pci_bus_generic_read_dev_vendor_id+0x2c/0x1a0
   device_unregister+0x17/0x60
   unregister_virtio_device+0x16/0x40
   virtio_pci_remove+0x43/0xa0
   pci_device_remove+0x36/0xb0
   device_remove+0x40/0x80
   device_release_driver_internal+0x20b/0x270
   device_release_driver+0x12/0x20
   pci_stop_bus_device+0x7a/0xb0
   pci_stop_and_remove_bus_device+0x12/0x30
   disable_slot+0x4f/0xa0
   acpiphp_disable_and_eject_slot+0x1c/0xa0
   hotplug_event+0x11b/0x280
   ? __pfx_acpiphp_hotplug_notify+0x10/0x10
   acpiphp_hotplug_notify+0x27/0x70
   acpi_device_hotplug+0xb6/0x300
   acpi_hotplug_work_fn+0x1e/0x40
   process_one_work+0x16c/0x350
   worker_thread+0x306/0x440
   ? _raw_spin_lock_irqsave+0xe/0x20
   ? __pfx_worker_thread+0x10/0x10
   kthread+0xef/0x120
   ? __pfx_kthread+0x10/0x10
   ret_from_fork+0x44/0x70
   ? __pfx_kthread+0x10/0x10
   ret_from_fork_asm+0x1b/0x30
  

  The issue was introduced in:

  commit fd27ef6b44bec26915c5b2b22c13856d9f0ba17a
  Author: Feng Liu 
  Date:   Tue Dec 19 11:32:40 2023 +0200
  Subject: virtio-pci: Introduce admin virtqueue
  Link: 
https://github.com/torvalds/linux/commit/fd27ef6b44bec26915c5b2b22c13856d9f0ba17a

  Modern virtio-pci devices are not affected. If the device is a legacy
  virtio device, the is_avq function pointer is not assigned in the
  virtio_pci_device structure of the legacy virtio device, resulting in
  a NULL pointer dereference when the code calls if
  (vp_dev->is_avq(vdev, vq->index)).

  There is no workaround. If you are affected, then not detaching
  devices for the time being is the only solution.

  [Fix]

  This was fixed in 6.9-rc1 by:

  commit c8fae27d141a32a1624d0d0d5419d94252824498
  From: Li Zhang 
  Date: Sat, 16 Mar 2024 13:25:54 +0800
  Subject: virtio-pci: Check if is_avq is NULL
  Link: 
https://github.com/torvalds/linux/commit/c8fae27d141a32a1624d0d0d5419d94252824498

  This is a clean cherry pick to noble. The commit just adds a basic
  NULL pointer check before it dereferences the pointer.

  [Testcase]

  Start a fresh Noble VM.

  Edit the grub kernel command line:

  1) sudo vim /etc/default/grub
  GRUB_CMDLINE_LINUX_DEFAULT="virtio_pci.force_legacy=1" 
  2) sudo update-grub
  3) sudo reboot
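
  After the reboot, it can help to confirm that the parameter actually
  reached the kernel before attaching any devices. A minimal sketch,
  assuming bash; the function name legacy_virtio_forced is hypothetical,
  and it only inspects the command line string it is given:

```shell
#!/usr/bin/env bash
# Return 0 if the given kernel command line contains the exact token
# virtio_pci.force_legacy=1, and 1 otherwise.
# Hypothetical helper for illustration; not part of the original testcase.
legacy_virtio_forced() {
    local cmdline="$1" param
    local -a params
    # Split on whitespace without glob expansion.
    read -r -a params <<< "$cmdline"
    for param in "${params[@]}"; do
        if [ "$param" = "virtio_pci.force_legacy=1" ]; then
            return 0
        fi
    done
    return 1
}

# Usage inside the VM:
# legacy_virtio_forced "$(cat /proc/cmdline)" && echo "legacy virtio-pci forced"
```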

  Outside the VM, on the host:

  $ qemu-img create -f qcow2 /root/share-device.qcow2 2G
  $ cat >> share-device.xml << EOF
  <disk type='file' device='disk'>
  
  
  
  </disk>
  EOF
  $ sudo -s
  # virsh attach-device noble-test share-device.xml --config --live
  # virsh detach-device noble-test share-device.xml --config --live

  A kernel panic should occur.

  There is a test kernel available in:

  https://launchpad.net/~mruffell/+archive/ubuntu/lp2067862-test

  If you install it, the panic should no longer occur.

  [Where problems could occur]

  We are adding a basic null pointer check right before the pointer is
  about to be used, which is quite low risk.

  If a regression were to occur, it would only affect VMs using legacy
  virtio-pci devices, which is not the default. It would potentially
  have large impacts on fleets of very old hypervisors running trusty,
  precise or lucid, but that is very unlikely in this day and age.

  [Other Info]

  Upstream mailing list discussion and author testcase:
  
https://lore.kernel.org/kvm/CACGkMEs1t-ipP7TasHkKNK

[Bug 2067862] Re: Removing legacy virtio-pci devices causes kernel panic

2024-08-07 Thread Matthew Ruffell
Hi Dong,

There is nothing more to do. Don't worry about all these derivative
kernels spamming the comments, we just needed to verify the main
-generic kernel only. There will be more bot spam in the future, you can
ignore it.

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/2067862

Title:
  Removing legacy virtio-pci devices causes kernel panic

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2067862/+subscriptions



[Bug 2073267] Re: Virtualbox Guru meditation on VM start caused by kernel commit in v6.9-rc4

2024-08-07 Thread Matthew Ruffell
Hi Arrigo,

Kernel SRU cycles are a little tricky... the Kernel Team make 150 kernels twice
a month, so not everything goes in the distro -proposed pocket.

This one is a respin and hasn't made its way to the main -proposed pocket yet.
Because it's part of the security SRU cycle, it gets placed in the Kernel
Team's -proposed2 pocket:

https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/proposed2

The Kernel Team also have a -proposed pocket, which is just like the distro
-proposed, but only has the primary kernel SRU cycle packages in it:

https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/proposed

Anyway, -119 is in -proposed2 if you want to try it.

The virtualbox package in -proposed is the same as the one in -updates, and
fixes a DKMS build issue with the 6.8 HWE kernel.

Hope that clears some things up.

Matthew

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2073267

Title:
  Virtualbox Guru meditation on VM start caused by kernel commit in
  v6.9-rc4

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2073267/+subscriptions



[Bug 2073267] Re: Virtualbox Guru meditation on VM start caused by kernel commit in v6.9-rc4

2024-08-06 Thread Matthew Ruffell
Performing verification for Jammy.

I deployed a fresh baremetal server running Jammy in the Server Lab, installed
ubuntu-desktop and rebooted.

The kernel is 5.15.0-117-generic from updates:
5.15.0-117-generic #127-Ubuntu SMP Fri Jul 5 20:13:28 UTC 2024

$ sudo apt-cache policy virtualbox
virtualbox:
  Installed: 6.1.50-dfsg-1~ubuntu1.22.04.1
  
I downloaded an Ubuntu Server ISO from cdimage and created a new VirtualBox VM.

On starting it, I get a "Guru Meditation", and in the logs I see:

00:00:02.191995 emR3Debug: rc=VERR_VMM_SET_JMP_ABORTED_RESUME

I then enabled proposed2 and installed 5.15.0-119-generic:

5.15.0-119-generic #129-Ubuntu SMP Fri Aug 2 19:25:20 UTC 2024

The virtualbox VM started up normally. Happy to mark verified for jammy.
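
When repeating this verification, it is easy to boot the old kernel by
accident. A hedged sketch for checking the running kernel first; the helper
name kernel_at_least is hypothetical, and it assumes bash plus GNU grep and
GNU sort (for -V):

```shell
#!/usr/bin/env bash
# Return 0 if kernel release $1 (e.g. "5.15.0-119-generic") is at least the
# version-ABI string $2 (e.g. "5.15.0-119"); return 1 if it is older, and 2
# if $1 does not look like an Ubuntu kernel release.
# Hypothetical helper for illustration only.
kernel_at_least() {
    local have want lowest
    # Keep only the leading "X.Y.Z-ABI" part, dropping flavours like "-generic".
    have=$(printf '%s\n' "$1" | grep -oE '^[0-9]+\.[0-9]+\.[0-9]+-[0-9]+') || return 2
    want="$2"
    # With version sort, the wanted version must come first (or be equal).
    lowest=$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}

# Usage:
# kernel_at_least "$(uname -r)" 5.15.0-119 && echo "fixed kernel is running"
```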

** Tags removed: verification-needed-jammy-linux
** Tags added: verification-done-jammy-linux

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2073267

Title:
  Virtualbox Guru meditation on VM start caused by kernel commit in
  v6.9-rc4

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2073267/+subscriptions



[Kernel-packages] [Bug 2073267] Re: Virtualbox Guru meditation on VM start caused by kernel commit in v6.9-rc4

2024-08-06 Thread Matthew Ruffell
Performing verification for Jammy.

I deployed a fresh baremetal server running Jammy in the Server Lab, installed
ubuntu-desktop and rebooted.

The kernel is 5.15.0-117-generic from updates:
5.15.0-117-generic #127-Ubuntu SMP Fri Jul 5 20:13:28 UTC 2024

$ sudo apt-cache policy virtualbox
virtualbox:
  Installed: 6.1.50-dfsg-1~ubuntu1.22.04.1
  
I downloaded an Ubuntu Server ISO from cdimage and created a new VirtualBox VM.

On starting it, I get a "Guru Meditation", and in the logs I see:

00:00:02.191995 emR3Debug: rc=VERR_VMM_SET_JMP_ABORTED_RESUME

I then enabled proposed2 and installed 5.15.0-119-generic:

5.15.0-119-generic #129-Ubuntu SMP Fri Aug 2 19:25:20 UTC 2024

The virtualbox VM started up normally. Happy to mark verified for jammy.

** Tags removed: verification-needed-jammy-linux
** Tags added: verification-done-jammy-linux

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-hwe-6.8 in Ubuntu.
https://bugs.launchpad.net/bugs/2073267

Title:
  Virtualbox Guru meditation on VM start caused by kernel commit in
  v6.9-rc4

Status in linux package in Ubuntu:
  Invalid
Status in linux-hwe-6.8 package in Ubuntu:
  Invalid
Status in linux-signed-hwe-5.15 package in Ubuntu:
  Confirmed
Status in virtualbox package in Ubuntu:
  Confirmed
Status in linux source package in Jammy:
  Fix Committed
Status in linux-hwe-6.8 source package in Jammy:
  Fix Committed
Status in linux-signed-hwe-5.15 source package in Jammy:
  Invalid
Status in virtualbox source package in Jammy:
  New

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/2073267

  SRU Justification:

  [Impact]

  Commit "randomize_kstack: Improve entropy diffusion" changed the
  kernel stack space used for entropy to 1KiB, limiting the thread kernel
  stack to 15KiB. This impacts virtualbox 6.1.50 on jammy, which is no
  longer maintained upstream. The issue does not occur on version 7.0.20,
  due to a code refactoring that probably resulted in less stack usage.
  Fixing it on the jammy virtualbox package side is not straightforward:
  the fix is not easy to backport to 6.x, and upgrading the jammy package
  to 7.x breaks existing users' VMs, including (but not only) those that
  run Windows.
  To upgrade, users would need to uninstall the Guest Additions drivers,
  migrate the virtualbox package to 7.x, then boot each VM and reinstall
  the Guest Additions drivers in it.

  This impacts:
  1. jammy:linux
  2. jammy:linux-hwe-6.8
  3. focal:linux-hwe-5.15

  [Fix]

  Revert commit "randomize_kstack: Improve entropy diffusion"

  [Test Plan]

  Without this fix, a VM crashes with "VCPU0: Guru
  Meditation -2708 (VERR_VMM_SET_JMP_ABORTED_RESUME)".
  After the kernel upgrade, all VMs should run with no problem.

  [Where problems could occur]
  This may have an impact on security. The commit is a fix to improve the
  stack entropy.

  Original description:

  It worked yesterday, but today I get a Guru Meditation trying to start
  some of my virtual machines. This shows up in VBox.log as "VCPU0: Guru
  Meditation -2708 (VERR_VMM_SET_JMP_ABORTED_RESUME)". I suspect this
  may have started due to a Linux kernel upgrade I installed this
  morning.

  A fresh VM with no disk shows the issue. Sometimes turning off the I/O
  APIC makes the issue go away, sometimes not. Turning off nested paging
  sometimes lets VirtualBox make a little bit of progress w.r.t. booting
  VMs, but that usually still crashes before the VM finishes starting.

  This may be related to this bug reported on the VirtualBox forums:
  
https://forums.virtualbox.org/viewtopic.php?t=111889&sid=5cd33c0872a03b689e7e9f84d850f538

  https://forums.virtualbox.org/viewtopic.php?t=111918

  Ubuntu is 22.04.4 LTS, kernel is 5.15.0-116-generic, VirtualBox is
  6.1.50-dfsg-1~ubuntu1.22.04.1.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2073267/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2076048] Re: My system freezes after waking from suspend

2024-08-06 Thread Matthew Ruffell
*** This bug is a duplicate of bug 2064595 ***
https://bugs.launchpad.net/bugs/2064595

Great to hear it! Everyone else, hold tight, 6.8.0-40-generic will be
released soon.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2076048

Title:
  My system freezes after waking from suspend

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  I have installed Ubuntu 24.04 on my Lenovo ThinkPad T14 Gen 3 (AMD
  Ryzen™ 7 PRO 6850U with Radeon™ Graphics × 16). Type - 21CF-003QAU

  Since installation, the following problem has occurred:

  - Whenever I suspend the system (e.g. power off menu > suspend), the system
    successfully goes into a suspended state.
  - When I wake the computer, it wakes and presents me with the login screen.
  - I can log into the system, and the mouse and all previously open windows
    appear to be working, sound is working, etc.
  - I CAN'T seem to open any new applications or run any new commands at this
    time (no error messages, just nothing starting).
  - Approximately 30 seconds to 1 minute later, the system freezes completely.
    No error messages, and I can see everything on the screen. But mouse/keyboard
    input stops, sound stops, and I have to physically hard restart the laptop.
  - For example, the Caps Lock key light on the keyboard stops responding.
    Interestingly, the Esc/FnLock key light still works.

  I have updated all the firmware / BIOS to the latest versions, and the
  issue still persists.

  I have run Ubuntu from a Live USB on the same machine, and when
  suspending, all appears to work OK.

  I have done a completely fresh install with default installation
  parameters, and system freeze as explained above occurs every time.

  Also reported to Lenovo here ->
  https://forums.lenovo.com/t5/Ubuntu/Lenovo-T14-Gen3-AMD-with-Ubuntu-24-04-System-freezes-30-seconds-after-waking-from-sleep/m-p/5324170?page=1#6404090

  ProblemType: Bug
  DistroRelease: Ubuntu 24.04
  Package: linux-image-6.8.0-39-generic 6.8.0-39.39
  ProcVersionSignature: Ubuntu 6.8.0-39.39-generic 6.8.8
  Uname: Linux 6.8.0-39-generic x86_64
  ApportVersion: 2.28.1-0ubuntu2
  Architecture: amd64
  CRDA: N/A
  CasperMD5CheckResult: pass
  CurrentDesktop: ubuntu:GNOME
  Date: Mon Aug  5 09:21:52 2024
  InstallationDate: Installed on 2024-08-01 (4 days ago)
  InstallationMedia: Ubuntu 24.04 LTS "Noble Numbat" - Release amd64 (20240424)
  MachineType: LENOVO 21CF003QAU
  ProcEnviron:
   LANG=en_US.UTF-8
   PATH=(custom, no user)
   SHELL=/bin/bash
   TERM=xterm-256color
   XDG_RUNTIME_DIR=
  ProcFB: 0 amdgpudrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-6.8.0-39-generic 
root=UUID=a12931b8-49e7-4226-b38f-3febeac4f41d ro quiet splash vt.handoff=7
  RelatedPackageVersions:
   linux-restricted-modules-6.8.0-39-generic N/A
   linux-backports-modules-6.8.0-39-generic  N/A
   linux-firmware                            20240318.git3b128b60-0ubuntu2.1
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 05/29/2024
  dmi.bios.release: 1.53
  dmi.bios.vendor: LENOVO
  dmi.bios.version: R23ET77W (1.53 )
  dmi.board.asset.tag: Not Available
  dmi.board.name: 21CF003QAU
  dmi.board.vendor: LENOVO
  dmi.board.version: SDK0T76538 WIN
  dmi.chassis.asset.tag: No Asset Information
  dmi.chassis.type: 10
  dmi.chassis.vendor: LENOVO
  dmi.chassis.version: None
  dmi.ec.firmware.release: 1.32
  dmi.modalias: 
dmi:bvnLENOVO:bvrR23ET77W(1.53):bd05/29/2024:br1.53:efr1.32:svnLENOVO:pn21CF003QAU:pvrThinkPadT14Gen3:rvnLENOVO:rn21CF003QAU:rvrSDK0T76538WIN:cvnLENOVO:ct10:cvrNone:skuLENOVO_MT_21CF_BU_Think_FM_ThinkPadT14Gen3:
  dmi.product.family: ThinkPad T14 Gen 3
  dmi.product.name: 21CF003QAU
  dmi.product.sku: LENOVO_MT_21CF_BU_Think_FM_ThinkPad T14 Gen 3
  dmi.product.version: ThinkPad T14 Gen 3
  dmi.sys.vendor: LENOVO

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2076048/+subscriptions




[Bug 2076048] Re: My system freezes after waking from suspend

2024-08-06 Thread Matthew Ruffell
*** This bug is a duplicate of bug 2064595 ***
https://bugs.launchpad.net/bugs/2064595

Great to hear it! Everyone else, hold tight, 6.8.0-40-generic will be
released soon.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2076048

Title:
  My system freezes after waking from suspend

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2076048/+subscriptions



[Kernel-packages] [Bug 2076048] Re: My system freezes after waking from suspend

2024-08-06 Thread Matthew Ruffell
*** This bug is a duplicate of bug 2064595 ***
https://bugs.launchpad.net/bugs/2064595

I think this is going to be fixed in 6.8.0-40-generic, currently in
-proposed. You could try installing it now to see if it fixes your issue.
Let me know if it doesn't.

** This bug has been marked a duplicate of bug 2064595
   AMD Rembrandt & AMD Rembrandt-R: Suspend hangs system

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2076048

Title:
  My system freezes after waking from suspend

Status in linux package in Ubuntu:
  Confirmed


To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2076048/+subscriptions




[Bug 2076048] Re: My system freezes after waking from suspend

2024-08-06 Thread Matthew Ruffell
*** This bug is a duplicate of bug 2064595 ***
https://bugs.launchpad.net/bugs/2064595

I think this is going to be fixed in 6.8.0-40-generic, currently in
-proposed. You could try installing it now to see if it fixes your issue.
Let me know if it doesn't.

** This bug has been marked a duplicate of bug 2064595
   AMD Rembrandt & AMD Rembrandt-R: Suspend hangs system

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2076048

Title:
  My system freezes after waking from suspend

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2076048/+subscriptions



[Kernel-packages] [Bug 2076003] Re: Shows Unknow display

2024-08-05 Thread Matthew Ruffell
*** This bug is a duplicate of bug 2076004 ***
https://bugs.launchpad.net/bugs/2076004

** This bug has been marked a duplicate of bug 2076004
   Shows Unknown display with 6.8.0-39

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2076003

Title:
  Shows Unknow display

Status in linux package in Ubuntu:
  New

Bug description:
  I upgraded to this kernel so I could update to the newest stuff, and
  then found that an unknown display shows up and can't be removed. I
  rolled back to the old kernel I was on and that fixed it. In case it
  helps, I have an RTX 2070 (Gigabyte). I don't know if it's something
  with the kernel and the drivers communicating, but it has been an issue
  with some of the games I play on here.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2076003/+subscriptions




[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs

2024-08-05 Thread Matthew Ruffell
Beginning verification for noble

I started two c5.large instances in us-west-2 on AWS, with the same parameters
that we used in previous tests. Each has a 60GB gp3 volume attached to it.
Each instance is running the GA kernel, 6.8.0-1012-aws.

One is running e2fsprogs from -updates, the other e2fsprogs from -proposed:

-updates:
$ apt-cache policy e2fsprogs | grep Installed
  Installed: 1.47.0-2.4~exp1ubuntu4
-proposed:
$ apt-cache policy e2fsprogs | grep Installed
  Installed: 1.47.0-2.4~exp1ubuntu4.1
  
Each is running the same script from the testcase.

I will leave these instances running for the next 7-14 days. We will
consider this bug verified if the -updates instance is broken and the
-proposed instance is still functioning correctly at the end of this time.

The timestamp of starting both tests is: Thu Aug  1 06:10:40 UTC 2024

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu.
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum
  mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  Fix Released
Status in e2fsprogs source package in Trusty:
  Won't Fix
Status in e2fsprogs source package in Xenial:
  Won't Fix
Status in e2fsprogs source package in Bionic:
  Won't Fix
Status in e2fsprogs source package in Focal:
  Fix Committed
Status in e2fsprogs source package in Jammy:
  Fix Committed
Status in e2fsprogs source package in Lunar:
  Won't Fix
Status in e2fsprogs source package in Mantic:
  Won't Fix
Status in e2fsprogs source package in Noble:
  Fix Committed
Status in e2fsprogs source package in Oracular:
  Fix Released

Bug description:
  [Impact]

  This is a long running bug plaguing cloud-images, where on a rare
  occasion resize2fs would fail and the image would not resize to fit
  the entire disk.

  Online resizes would fail due to a superblock checksum mismatch, where
  the superblock in memory differs from what is currently on disk due to
  changes made to the image.

  $ resize2fs /dev/nvme1n1p1
  resize2fs 1.47.0 (5-Feb-2023)
  resize2fs: Superblock checksum does not match superblock while trying to open /dev/nvme1n1p1
  Couldn't find valid filesystem superblock.

  Changing the read of the superblock to Direct I/O solves the issue.

  [Testcase]

  Start a c5.large instance on AWS, and attach a 60GB gp3 volume for
  use as a scratch disk.

  Run the following script, courtesy of Krister Johansen and his team:

     #!/usr/bin/bash
     set -euxo pipefail

     while true
     do
     parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
     sleep .5
     mkfs.ext4 /dev/nvme1n1p1
     mount -t ext4 /dev/nvme1n1p1 /mnt
     stress-ng --temp-path /mnt -D 4 &
     STRESS_PID=$!
     sleep 1
     growpart /dev/nvme1n1 1
     resize2fs /dev/nvme1n1p1
     kill $STRESS_PID
     wait $STRESS_PID
     umount /mnt
     wipefs -a /dev/nvme1n1p1
     wipefs -a /dev/nvme1n1
     done

  Test packages are available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test

  If you install the test packages, the race no longer occurs.

  [Where problems could occur]

  We are changing how resize2fs reads the superblock from underlying
  disks.

  If a regression were to occur, resize2fs could fail to resize offline
  or online volumes. As all cloud-images are online resized during their
  initial boot, this could have a large impact to public and private
  clouds should a regression occur.

  [Other info]

  Upstream mailing list discussion:
  https://lore.kernel.org/linux-ext4/20230605225221.ga5...@templeofstupid.com/
  https://lore.kernel.org/linux-ext4/20230609042239.ga1436...@mit.edu/

  This was fixed in the below commit upstream:

  commit 43a498e938887956f393b5e45ea6ac79cc5f4b84
  Author: Theodore Ts'o 
  Date: Thu, 15 Jun 2023 00:17:01 -0400
  Subject: resize2fs: use Direct I/O when reading the superblock for
   online resizes
  Link: 
https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84

  The commit has not been tagged in any release. All supported Ubuntu
  releases require this fix, and it needs to be published in the standard
  non-ESM archives to be picked up in cloud images.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/2036467/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2073267] Re: Virtualbox Guru meditation on VM start caused by kernel commit in v6.9-rc4

2024-08-05 Thread Matthew Ruffell
Hi everyone,

I see that Greg KH just assigned CVE-2024-35918 to "randomize_kstack:
Improve entropy diffusion".

I suppose that means that we cannot revert it now.

https://lore.kernel.org/linux-cve-announce/2024073029-clerk-trophy-b84c@gregkh/T/

This is going to take some time.

Thanks,
Matthew

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2024-35918

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-hwe-5.15 in Ubuntu.
https://bugs.launchpad.net/bugs/2073267

Title:
  Virtualbox Guru meditation on VM start caused by kernel commit in
  v6.9-rc4

Status in linux package in Ubuntu:
  Triaged
Status in linux-signed-hwe-5.15 package in Ubuntu:
  Confirmed
Status in virtualbox package in Ubuntu:
  Confirmed

Bug description:
  It worked yesterday, but today I get a Guru Meditation trying to start
  some of my virtual machines. This shows up in VBox.log as "VCPU0: Guru
  Meditation -2708 (VERR_VMM_SET_JMP_ABORTED_RESUME)". I suspect this
  may have started due to a Linux kernel upgrade I installed this
  morning.

  A fresh VM with no disk shows the issue. Sometimes turning off the I/O
  APIC makes the issue go away, sometimes not. Turning off nested paging
  sometimes lets VirtualBox make a little bit of progress w.r.t. booting
  VMs, but that usually still crashes before the VM finishes starting.

  This may be related to this bug reported on the VirtualBox forums:
  
https://forums.virtualbox.org/viewtopic.php?t=111889&sid=5cd33c0872a03b689e7e9f84d850f538

  https://forums.virtualbox.org/viewtopic.php?t=111918

  Ubuntu is 22.04.4 LTS, kernel is 5.15.0-116-generic, VirtualBox is
  6.1.50-dfsg-1~ubuntu1.22.04.1.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2073267/+subscriptions




[Kernel-packages] [Bug 2069534] Re: Linux 6.8 fails to boot on ARM64 if any param is more than 146 chars

2024-08-05 Thread Matthew Ruffell
Hi Chris,

Yes, jammy-hwe-6.8 got fixed because Stefan Bader had to respin the kernel
for another regression anyway, so he opportunistically pulled it in.

For Noble, I think it will be part of the s2024.07.08 SRU cycle, as per
https://kernel.ubuntu.com/, as Manuel Diewald mentioned when I spoke to him.

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-hwe-6.8 in Ubuntu.
https://bugs.launchpad.net/bugs/2069534

Title:
  Linux 6.8 fails to boot on ARM64 if any param is more than 146 chars

Status in linux package in Ubuntu:
  Fix Released
Status in linux-hwe-6.8 package in Ubuntu:
  Invalid
Status in linux source package in Jammy:
  Invalid
Status in linux-hwe-6.8 source package in Jammy:
  Fix Committed
Status in linux source package in Noble:
  Fix Committed
Status in linux-hwe-6.8 source package in Noble:
  Invalid

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/2069534

  [Impact]

  Linux 6.8 kernel fails to boot on ARM64 when any Linux command line
  param is more than 146 characters.

  This most notably affects MAAS deployments, as MAAS generates very
  long command line parameters for ARM64, e.g.:

  nomodeset
  
root=squash:http://10.254.131.130:5248/images/3b08252fa962c37a47d890fb5fe182b631a0c0478d758bf4573efa859cc2c548/ubuntu/arm64/ga-24.04/noble/stable/squashfs
  ip=sjc01-2b16-u07-mgx01b:BOOTIF ip6=off cc:\{'datasource_list':
  ['MAAS']\}end_cc cloud-config-url=http://10-254-131-128--25.maas-
  internal:5248/MAAS/metadata/latest/by-id/de6dn3/?op=get_preseed ro
  overlayroot=tmpfs overlayroot_cfgdisk=disabled log_host=10.254.131.130
  log_port=5247 --- BOOTIF=01-${net_default_mac}

  This was introduced in 6.8-rc1 by:

  commit dc3f5aae06381b43bc9d0d416bd15ee1682940e9
  Author: Ard Biesheuvel 
  Date: Wed Nov 29 12:16:12 2023 +0100
  Subject: arm64: idreg-override: Avoid parameq() and parameqn()
  Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dc3f5aae06381b43bc9d0d416bd15ee1682940e9

  There is no workaround other than keeping every command line parameter
  shorter than 146 characters, which is not tenable for MAAS users.
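
  To see whether a given machine is at risk, the command line can be
  scanned for over-long parameters. This is only a sketch: the function
  name long_cmdline_params is hypothetical, it measures each whole
  whitespace-separated token, and the default limit of 146 is taken from
  the bug title (the SAUCE patch subject says ">= 146", so a lower limit
  may be the safer choice):

```shell
#!/usr/bin/env bash
# Print "<length> <parameter>" for every parameter longer than LIMIT
# characters (default 146).
# Assumption: parameters are split on whitespace only, so quoted values
# that contain spaces are not handled.
long_cmdline_params() {
    local cmdline="$1" limit="${2:-146}" param
    local -a params
    # Split on whitespace without glob expansion ("?" appears in MAAS URLs).
    read -r -a params <<< "$cmdline"
    for param in "${params[@]}"; do
        if [ "${#param}" -gt "$limit" ]; then
            printf '%s %s\n' "${#param}" "$param"
        fi
    done
}

# Usage on a running system:
# long_cmdline_params "$(cat /proc/cmdline)"
```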

  [Fix]

  The fix arrived in a major refactor of early ARM64 init, where early
  init code moved from assembly to the pi mini C library. The specific
  commit that fixed the issue is:

  commit e223a449125571daa62debd8249fa4fc2da0a961
  Author: Ard Biesheuvel 
  Date: Wed Feb 14 13:28:50 2024 +0100
  Subject: arm64: idreg-override: Move to early mini C runtime
  Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e223a449125571daa62debd8249fa4fc2da0a961

  However, this needs a lot of dependencies, essentially all of the "mini C
  runtime" commits in the below merge commit:

  commit 6d75c6f40a03c97e1ecd683ae54e249abb9d922b
  Merge: fe46a7dd189e 1ef21fcd6a50
  Author: Linus Torvalds 
  Date: Thu Mar 14 15:35:42 2024 -0700
  Subject: Merge tag 'arm64-upstream' of 
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
  Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d75c6f40a03c97e1ecd683ae54e249abb9d922b

  That amount of code is generally unacceptable for an SRU due to
  regression risk. I also don't think that reverting "arm64:
  idreg-override: Avoid parameq() and parameqn()" is the right solution.

  Thankfully, Tj did some debugging of the root cause in comment #20
  [1], and found the issue occurs because of memcmp() in
  include/linux/fortify-string.h detecting an attempted out-of-bounds
  read when comparing buf and aliases[i].alias.

  That triggers the fortified memcmp()'s:

      if (p_size < size || q_size < size)
              fortify_panic(__func__);

  where q_size == 146, size == 147, and it crashes the kernel.

  [1]
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2069534/comments/20
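
  To make the arithmetic of that check concrete, here is a minimal shell
  sketch that models the comparison only. It is not kernel code: the
  function name fortify_memcmp_check is hypothetical, and p_size == 4096 is
  an assumed value, since the report only gives q_size and size.

```shell
# Hypothetical model of the bounds check; this is not the kernel's code.
# $1 = p_size, $2 = q_size (sizes of the two buffers), $3 = size (bytes to compare)
fortify_memcmp_check() {
    if [ "$1" -lt "$3" ] || [ "$2" -lt "$3" ]; then
        echo "fortify_panic"
        return 1
    fi
    echo "ok"
}

# q_size == 146 and size == 147 come from the report; p_size == 4096 is an assumption.
fortify_memcmp_check 4096 146 147 || true   # prints "fortify_panic"
fortify_memcmp_check 4096 146 146           # prints "ok"
```

  A compare length just one byte larger than the smaller buffer is enough
  to trip the check, which matches the q_size == 146, size == 147 values
  above.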

  I know SAUCE patches are to be avoided if possible, but Tj's solution
  is minimal and fixes the root cause without the regression risk of
  backporting the entire mini C runtime, so I suggest we go with Tj's
  patch.

  commit a4c616d2156c9c4cf7c91e6983c8bf0d51985df1
  Author: Tj 
  Date:   Fri Jul 26 13:48:44 2024 +
  Subject: UBUNTU: SAUCE: arm64: v6.8: cmdline param >= 146 chars kills kernel
  Link: https://lore.kernel.org/stable/JsQ4W_o2R1NfPFTCCJjjksPED-8TuWGr796GMNeUMAdCh-2NSB_16x6TXcEecXwIfgzVxHzeB_-PMQnvQuDo0gmYE_lye0rC5KkbkDgkUqM=@proton.me/T/#u

  [Testcase]

  1) Deploy an ARM64 VM or use a bare metal ARM64 board with Noble, running 6.8.
  2) Edit /boot/grub/grub.cfg and add the following param to any boot entry with
  Linux 6.8:

  testparam=f081c381e7b54edcba27e5f790d47911a4cc3e726d8d256878d3df9175c020e0f081c381e7b54edcba27e5f790d47911a4cc3e726d8d256878d3df9175c020e0f081c381e7b5732f126a62b4232

  3) Reboot the machine and select the boot entry in grub with the testparam as
  above.
  4) Observe kernel never boots.
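
  If a different long parameter is wanted for testing, one could be
  generated rather than typed. This is a sketch only: make_testparam is a
  hypothetical helper, and the value length of 150 is an arbitrary
  assumption chosen to exceed the 146 character limit.

```shell
# Hypothetical helper: print testparam=<value>, where <value> is $1 characters long.
make_testparam() {
    value=$(head -c "$1" /dev/zero | tr '\0' 'f')
    printf 'testparam=%s\n' "$value"
}

param=$(make_testparam 150)
echo "${#param}"   # 10 characters of "testparam=" plus 150 characters of value
```

  The resulting string can be pasted onto the kernel command line in the
  same way as the testparam in step 2.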

  [Where problems could occur]

  We are changing command line p

[Kernel-packages] [Bug 2073267] Re: Virtualbox Guru meditation on VM start caused by kernel commit in v6.9-rc4

2024-08-05 Thread Matthew Ruffell
Hi David,

Yes, it does indeed seem that the CVE has now been rejected.

https://lore.kernel.org/linux-cve-announce/2024073029-clerk-trophy-b84c@gregkh/

https://nvd.nist.gov/vuln/detail/CVE-2024-35918
https://www.cve.org/CVERecord/?id=CVE-2024-35918

Maybe we can revert it after all!

I will have a talk with Aaron and the Kernel Team about how we should
move forward.

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-hwe-5.15 in Ubuntu.
https://bugs.launchpad.net/bugs/2073267

Title:
  Virtualbox Guru meditation on VM start caused by kernel commit in
  v6.9-rc4

Status in linux package in Ubuntu:
  Triaged
Status in linux-signed-hwe-5.15 package in Ubuntu:
  Confirmed
Status in virtualbox package in Ubuntu:
  Confirmed

Bug description:
  It worked yesterday, but today I get a Guru Meditation trying to start
  some of my virtual machines. This shows up in VBox.log as "VCPU0: Guru
  Meditation -2708 (VERR_VMM_SET_JMP_ABORTED_RESUME)". I suspect this
  may have started due to a Linux kernel upgrade I installed this
  morning.

  A fresh VM with no disk shows the issue. Sometimes turning off the I/O
  APIC makes the issue go away, sometimes not. Turning off nested paging
  sometimes lets VirtualBox make a little bit of progress w.r.t. booting
  VMs, but that usually still crashes before the VM finishes starting.

  This may be related to this bug reported on the VirtualBox forums:
  https://forums.virtualbox.org/viewtopic.php?t=111889&sid=5cd33c0872a03b689e7e9f84d850f538

  https://forums.virtualbox.org/viewtopic.php?t=111918

  Ubuntu is 22.04.4 LTS, kernel is 5.15.0-116-generic, VirtualBox is
  6.1.50-dfsg-1~ubuntu1.22.04.1.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2073267/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Desktop-packages] [Bug 1972914] Re: frequent 15-sec guest freeze with ubuntu 22.04 host and guest

2024-08-05 Thread Matthew Ruffell
Hi everyone,

$ cd Work/kernel/ubuntu-jammy/
~/Work/kernel/ubuntu-jammy$ git log --grep 'Revert "drm/qxl: simplify qxl_fence_wait"'
commit 1b146e3dc802253fd9a6e29e2d3b06d003fe9182
Author: Alex Constantino
Date:   Thu Apr 4 19:14:48 2024 +0100

    Revert "drm/qxl: simplify qxl_fence_wait"

...
~/Work/kernel/ubuntu-jammy$ git describe --contains 1b146e3dc802253fd9a6e29e2d3b06d003fe9182
Ubuntu-5.15.0-115.125~199
~/Work/kernel/ubuntu-jammy$ cd ..
~/Work/kernel$ cd ubuntu-noble/
~/Work/kernel/ubuntu-noble$ git log --grep 'Revert "drm/qxl: simplify qxl_fence_wait"' origin/master-next
commit ee451375fd8b767eb91721fa389b022f1582cb0f
Author: Alex Constantino
Date:   Thu Apr 4 19:14:48 2024 +0100

    Revert "drm/qxl: simplify qxl_fence_wait"

...
~/Work/kernel/ubuntu-noble$ git describe --contains ee451375fd8b767eb91721fa389b022f1582cb0f
Ubuntu-6.8.0-38.38~331

This has been fixed in 5.15.0-115-generic or later, and 6.8.0-38-generic
or later.
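
To check a machine against those versions, something like the following
sketch could be used. The helper name version_ge is hypothetical, it
assumes GNU sort with -V (version sort), and it only makes sense for the
-generic flavour, since other flavours use different ABI numbering.

```shell
# Succeeds if version $1 is greater than or equal to version $2.
# Relies on GNU sort -V (version sort) for the ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example values; substitute "$(uname -r)" for the first argument on a real system.
if version_ge "5.15.0-116-generic" "5.15.0-115-generic"; then
    echo "fix present"
else
    echo "fix missing"
fi
```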

Let me know if you need any more help.

Thanks,
Matthew

** Also affects: linux (Ubuntu)
   Importance: Undecided
   Status: New

** Changed in: xserver-xorg-video-qxl (Ubuntu)
   Status: Confirmed => Invalid

** Also affects: linux (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Also affects: xserver-xorg-video-qxl (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Noble)
   Importance: Undecided
   Status: New

** Also affects: xserver-xorg-video-qxl (Ubuntu Noble)
   Importance: Undecided
   Status: New

** No longer affects: xserver-xorg-video-qxl (Ubuntu Jammy)

** No longer affects: xserver-xorg-video-qxl (Ubuntu Noble)

** Changed in: linux (Ubuntu)
   Status: New => Fix Released

** Changed in: linux (Ubuntu Jammy)
   Status: New => Fix Released

** Changed in: linux (Ubuntu Noble)
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Desktop
Packages, which is subscribed to xserver-xorg-video-qxl in Ubuntu.
https://bugs.launchpad.net/bugs/1972914

Title:
  frequent 15-sec guest freeze with ubuntu 22.04 host and guest

Status in linux package in Ubuntu:
  Fix Released
Status in xserver-xorg-video-qxl package in Ubuntu:
  Invalid
Status in linux source package in Jammy:
  Fix Released
Status in linux source package in Noble:
  Fix Released

Bug description:
  I'm running a new installation of Ubuntu 22.04 Desktop on a Thinkpad
  T450s (core i5-5200, 2 cores / 4 vCPUs, 12 GB memory). Using the
  virt-manager GUI, I performed what I believe is a simple, plain-vanilla
  installation of an Ubuntu 22.04 guest running under qemu/kvm with 2
  vCPUs and 4 GB memory.

  I'm seeing very frequent, 15-second freezes of the guest. When it
  happens, the guest is completely unresponsive. After about 15 seconds,
  it works normally again, until the next freeze. The duration of the
  freeze appears to be the same every time. The guest isn't doing much -
  just open the calculator app and click number buttons. The freeze
  doesn't happen if I don't interact with the guest (just leave the
  system monitor running in the guest, so I can see that it's not
  frozen).

  I observe the freeze when I have 2 of the 4 vCPUs dedicated to the
  guest. I do NOT observe it when I have only one vCPU dedicated to the
  guest.

  Both the host and guest are using only a small fraction of the memory
  available to them. When the problem happens, the host indicates that
  CPU usage is very low across all vCPUs. The host appears to be
  operating normally when the guest is frozen.

  I see the following pair of lines in the guest syslog every time the
  freeze occurs (and only when the freeze occurs):

  May 10 13:48:40 qemu-jammy kernel: [  144.259799] qxl :00:01.0: object_init failed for (8298496, 0x0001)

  May 10 13:48:40 qemu-jammy kernel: [  144.259819] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO

  I don't see anything in the host syslog that correlates with the
  freeze.

  If I choose "Virtio" in the Video drop-down in the virt-manager GUI,
  with "3D acceleration" UNchecked, the guest works fine, and the freeze
  never happens. Unfortunately, that loses fractional scaling, which is
  important to me. If I check "3D acceleration," the guest won't boot.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1972914/+subscriptions


-- 
Mailing list: https://launchpad.net/~desktop-packages
Post to : desktop-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~desktop-packages
More help   : https://help.launchpad.net/ListHelp


[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs

2024-08-05 Thread Matthew Ruffell
Beginning verification for jammy

I started two c5.large instances in us-west-2 on AWS, with the same parameters
that we used in previous tests. Each has a 60 GB gp3 volume attached to it.
I downgraded the HWE kernels to GA kernels, and each is running
5.15.0-1066-aws.

One is running e2fsprogs from -updates, the other from -proposed:

-updates:
$ apt-cache policy e2fsprogs | grep Installed
  Installed: 1.46.5-2ubuntu1.1
-proposed:
$ apt-cache policy e2fsprogs | grep Installed
  Installed: 1.46.5-2ubuntu1.2
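
If the installed version needs to be captured in a script rather than read
by eye, a small filter over the apt-cache policy output is enough. This is
a sketch; installed_version is a hypothetical helper name, and the sample
input line is copied from the output above.

```shell
# Hypothetical helper: read `apt-cache policy <pkg>` output on stdin and
# print the version from the "Installed:" line.
installed_version() {
    awk '/^ *Installed:/ {print $2; exit}'
}

# Sample input taken from the -proposed instance; on a real system use:
#   apt-cache policy e2fsprogs | installed_version
printf '  Installed: 1.46.5-2ubuntu1.2\n' | installed_version
```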
  
Each is running the same script from the testcase.

I will leave these instances running for the next 7-14 days. We will
consider this bug verified if the -updates instance is broken and the
-proposed instance is still functioning correctly at the end of this time.

The timestamp of starting both tests is: Thu Aug  1 06:02:48 UTC 2024

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu.
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum
  mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  Fix Released
Status in e2fsprogs source package in Trusty:
  Won't Fix
Status in e2fsprogs source package in Xenial:
  Won't Fix
Status in e2fsprogs source package in Bionic:
  Won't Fix
Status in e2fsprogs source package in Focal:
  Fix Committed
Status in e2fsprogs source package in Jammy:
  Fix Committed
Status in e2fsprogs source package in Lunar:
  Won't Fix
Status in e2fsprogs source package in Mantic:
  Won't Fix
Status in e2fsprogs source package in Noble:
  Fix Committed
Status in e2fsprogs source package in Oracular:
  Fix Released

Bug description:
  [Impact]

  This is a long running bug plaguing cloud-images, where on a rare
  occasion resize2fs would fail and the image would not resize to fit
  the entire disk.

  Online resizes would fail due to a superblock checksum mismatch, where
  the superblock in memory differs from what is currently on disk due to
  changes made to the image.

  $ resize2fs /dev/nvme1n1p1
  resize2fs 1.47.0 (5-Feb-2023)
  resize2fs: Superblock checksum does not match superblock while trying to open /dev/nvme1n1p1
  Couldn't find valid filesystem superblock.

  Changing the read of the superblock to Direct I/O solves the issue.
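
  As an illustration of what a Direct I/O read of the superblock looks
  like from userspace, the sketch below reads the first 4096 bytes of a
  device with dd and prints the ext4 magic bytes. It is not the resize2fs
  change itself: read_sb_magic is a hypothetical helper, and it relies on
  GNU dd's iflag=direct and on the ext4 layout (superblock at byte 1024,
  magic 0xEF53 stored little-endian at offset 56 within it).

```shell
# Hypothetical helper: print the two ext4 magic bytes of the filesystem on $1.
# Pass any non-empty second argument to read with O_DIRECT (dd iflag=direct),
# which bypasses the page cache.
read_sb_magic() {
    dd if="$1" bs=4096 count=1 ${2:+iflag=direct} 2>/dev/null |
        od -An -tx1 -j1080 -N2 | tr -d ' \n'
    echo
}

# Example (needs root and a real device, so it is left commented out):
#   read_sb_magic /dev/nvme1n1p1 direct
```

  On a valid ext4 filesystem the helper should print 53ef.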

  [Testcase]

  Start a c5.large instance on AWS, and attach a 60 GB gp3 volume for
  use as a scratch disk.

  Run the following script, courtesy of Krister Johansen and his team:

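     #!/usr/bin/bash
     set -euxo pipefail

     while true
     do
         # Create a fresh, roughly 1 GiB partition on the scratch disk
         parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
         sleep .5
         mkfs.ext4 /dev/nvme1n1p1
         mount -t ext4 /dev/nvme1n1p1 /mnt
         # Generate filesystem load in the background while resizing
         stress-ng --temp-path /mnt -D 4 &
         STRESS_PID=$!
         sleep 1
         # Grow the partition, then resize the mounted filesystem online
         growpart /dev/nvme1n1 1
         resize2fs /dev/nvme1n1p1
         kill "$STRESS_PID"
         wait "$STRESS_PID"
         umount /mnt
         # Wipe signatures so the next iteration starts clean
         wipefs -a /dev/nvme1n1p1
         wipefs -a /dev/nvme1n1
     done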

  Test packages are available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test

  If you install the test packages, the race no longer occurs.

  [Where problems could occur]

  We are changing how resize2fs reads the superblock from underlying
  disks.

  If a regression were to occur, resize2fs could fail to resize offline
  or online volumes. As all cloud-images are resized online during their
  initial boot, a regression could have a large impact on public and
  private clouds.

  [Other info]

  Upstream mailing list discussion:
  https://lore.kernel.org/linux-ext4/20230605225221.ga5...@templeofstupid.com/
  https://lore.kernel.org/linux-ext4/20230609042239.ga1436...@mit.edu/

  This was fixed in the below commit upstream:

  commit 43a498e938887956f393b5e45ea6ac79cc5f4b84
  Author: Theodore Ts'o 
  Date: Thu, 15 Jun 2023 00:17:01 -0400
  Subject: resize2fs: use Direct I/O when reading the superblock for
   online resizes
  Link: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84

  The commit has not been tagged to any release. All supported Ubuntu
  releases require this fix, and it needs to be published in the standard
  non-ESM archives to be picked up in cloud images.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/2036467/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs

2024-08-05 Thread Matthew Ruffell
Beginning verification for focal

I started two c5.large instances in us-west-2 on AWS, with the same parameters
that we used in previous tests. Each has a 60 GB gp3 volume attached to it.
I downgraded the HWE kernels to GA kernels, and each is running
5.4.0-1129-aws.

One is running e2fsprogs from -updates, the other from -proposed:

-updates:
$ apt-cache policy e2fsprogs | grep Installed
  Installed: 1.45.5-2ubuntu1.1
-proposed:
$ apt-cache policy e2fsprogs | grep Installed
  Installed: 1.45.5-2ubuntu1.2
  
Each is running the same script from the testcase.

I will leave these instances running for the next 7-14 days. We will
consider this bug verified if the -updates instance is broken and the
-proposed instance is still functioning correctly at the end of this time.

The timestamp of starting both tests is: Thu Aug  1 05:46:34 UTC 2024

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu.
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum
  mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  Fix Released
Status in e2fsprogs source package in Trusty:
  Won't Fix
Status in e2fsprogs source package in Xenial:
  Won't Fix
Status in e2fsprogs source package in Bionic:
  Won't Fix
Status in e2fsprogs source package in Focal:
  Fix Committed
Status in e2fsprogs source package in Jammy:
  Fix Committed
Status in e2fsprogs source package in Lunar:
  Won't Fix
Status in e2fsprogs source package in Mantic:
  Won't Fix
Status in e2fsprogs source package in Noble:
  Fix Committed
Status in e2fsprogs source package in Oracular:
  Fix Released

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/2036467/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2069534] Re: Linux 6.8 fails to boot on ARM64 if any param is more than 146 chars

2024-08-05 Thread Matthew Ruffell
Performing verification for jammy-hwe-6.8

I started two T2A instances on google cloud, which are arm64, with
jammy.

One instance has:
6.8.0-39-generic #39~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Jul 10 16:59:11 UTC 
2
The other, 6.8.0-40-generic from -proposed:
6.8.0-40-generic #40~22.04.3-Ubuntu SMP PREEMPT_DYNAMIC Tue Jul 30 17:53:10 UTC 
2

I edited /etc/default/grub.d/50-cloudimg-settings.cfg and set:

GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0,115200"

to

GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0,115200
testparam=f081c381e7b54edcba27e5f790d47911a4cc3e726d8d256878d3df9175c020e0f081c381e7b54edcba27e5f790d47911a4cc3e726d8d256878d3df9175c020e0f081c381e7b5732f126a62b4232"

ran:

$ sudo update-grub

and rebooted.

Unfortunately, the 6.8.0-39-generic instance never came back up.

The 6.8.0-40-generic instance came up just fine:

$ cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-6.8.0-40-generic 
root=PARTUUID=17337627-dfbd-4ce7-9f99-4dd1da2542eb ro console=ttyS0,115200 
testparam=f081c381e7b54edcba27e5f790d47911a4cc3e726d8d256878d3df9175c020e0f081c381e7b54edcba27e5f790d47911a4cc3e726d8d256878d3df9175c020e0f081c381e7b5732f126a62b4232

The 6.8.0-40-generic in -proposed fixes the issue. Happy to mark
verified for jammy-hwe-6.8.

** Tags removed: verification-needed-jammy-linux-hwe-6.8
** Tags added: verification-done-jammy-linux-hwe-6.8

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-hwe-6.8 in Ubuntu.
https://bugs.launchpad.net/bugs/2069534

Title:
  Linux 6.8 fails to boot on ARM64 if any param is more than 146 chars

Status in linux package in Ubuntu:
  Fix Released
Status in linux-hwe-6.8 package in Ubuntu:
  Invalid
Status in linux source package in Jammy:
  Invalid
Status in linux-hwe-6.8 source package in Jammy:
  Fix Committed
Status in linux source package in Noble:
  Fix Committed
Status in linux-hwe-6.8 source package in Noble:
  Invalid

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/2069534

  [Impact]

  Linux 6.8 kernel fails to boot on ARM64 when any Linux command line
  param is more than 146 characters.

  This most notably affects MAAS deployments, as MAAS generates very
  long command line parameters for ARM64, e.g.:

  nomodeset
  root=squash:http://10.254.131.130:5248/images/3b08252fa962c37a47d890fb5fe182b631a0c0478d758bf4573efa859cc2c548/ubuntu/arm64/ga-24.04/noble/stable/squashfs
  ip=sjc01-2b16-u07-mgx01b:BOOTIF ip6=off cc:\{'datasource_list':
  ['MAAS']\}end_cc cloud-config-url=http://10-254-131-128--25.maas-
  internal:5248/MAAS/metadata/latest/by-id/de6dn3/?op=get_preseed ro
  overlayroot=tmpfs overlayroot_cfgdisk=disabled log_host=10.254.131.130
  log_port=5247 --- BOOTIF=01-${net_default_mac}

  This was introduced in 6.8-rc1 by:

  commit dc3f5aae06381b43bc9d0d416bd15ee1682940e9
  Author: Ard Biesheuvel 
  Date: Wed Nov 29 12:16:12 2023 +0100
  Subject: arm64: idreg-override: Avoid parameq() and parameqn()
  Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dc3f5aae06381b43bc9d0d416bd15ee1682940e9

  There is no workaround other than keeping every command line parameter
  shorter than 146 characters, which is not tenable for MAAS users.

  [Fix]

  The fix arrived in a major refactor of early ARM64 init, where they
  moved from assembly to the pi mini c library. The specific commit that
  fixed the issue is:

  commit e223a449125571daa62debd8249fa4fc2da0a961
  Author: Ard Biesheuvel 
  Date: Wed Feb 14 13:28:50 2024 +0100
  Subject: arm64: idreg-override: Move to early mini C runtime
  Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e223a449125571daa62debd8249fa4fc2da0a961

  However, this needs a lot of dependencies, mostly all the "mini c
  runtime" commits in the below merge commit:

  commit 6d75c6f40a03c97e1ecd683ae54e249abb9d922b
  Merge: fe46a7dd189e 1ef21fcd6a50
  Author: Linus Torvalds 
  Date: Thu Mar 14 15:35:42 2024 -0700
  Subject: Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
  Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d75c6f40a03c97e1ecd683ae54e249abb9d922b

  The amount of code is generally unacceptable for an SRU due to
  regression risk. I also don't think that reverting "arm64:
  idreg-override: Avoid parameq() and parameqn()" is the right solution
  either.

  Thankfully, Tj did some debugging of the root cause in comment #20
  [1], and found the issue occurs because of memcmp() in
  include/linux/fortify-string.h detecting an attempted out-of-bounds
  read when comparing buf and aliases[i].alias.

  That triggers the fortified memcmp()'s:

      if (p_size < size || q_size < size)
              fortify_panic(__func__);

  where q_size == 146, size == 147, and it crashes the kernel.

  [1]
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2069534/comments/20

  I know SAUCE patches are to be avoided if possible, but Tj's solution
  is minimal and fixes th

[Kernel-packages] [Bug 2073267] Re: Virtualbox Guru meditation on VM start caused by kernel commit in v6.9-rc4

2024-08-05 Thread Matthew Ruffell
Hi everyone,

Keith, thanks for testing! It is fantastic news that the test kernel
fixes the issue, that our theory is correct, and that we are on the
right track.

I also tested both test kernels myself, and can confirm that reverting
"randomize_kstack: Improve entropy diffusion" fixes the issue, and that
"randomize_kstack: Remove non-functional per-arch entropy filtering"
makes no improvement at all: the issue still occurs.

So, I think the way forward is to do the unpopular thing, which is to
revert "randomize_kstack: Improve entropy diffusion".

We should never have changed something so fundamental as the kernel
thread stack size on a stable kernel. We can do that on the development
release sure, but not on LTS kernels.

I will have a talk with the Kernel team about it, but I will begin
preparing the patches and writing a SRU template.

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-hwe-5.15 in Ubuntu.
https://bugs.launchpad.net/bugs/2073267

Title:
  Virtualbox Guru meditation on VM start caused by kernel commit in
  v6.9-rc4

Status in linux package in Ubuntu:
  Triaged
Status in linux-signed-hwe-5.15 package in Ubuntu:
  Confirmed
Status in virtualbox package in Ubuntu:
  Confirmed

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2073267/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2070020] Re: Lenovo dock no longer working after upgrade from 6.5.0-35 to 6.5.0-41

2024-08-05 Thread Matthew Ruffell
Hi everyone,

The Jammy HWE kernel will be rolling to the 6.8 kernel from Noble in a
few days to a week or so. The 6.5 Mantic kernel is now closed to new
commits, so it is best to move to the 6.8 kernel, where this is already
fixed.

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2070020

Title:
  Lenovo dock no longer working after upgrade from 6.5.0-35 to 6.5.0-41

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Mantic:
  Won't Fix

Bug description:
  1) The release of Ubuntu you are using, via 'lsb_release -rd' or System -> About Ubuntu
  Ubuntu 22.04.4 LTS

  2) The version of the package you are using, via 'apt-cache policy pkgname' or by checking in Software Center
  6.5.0-41.41~22.04.2

  3) What you expected to happen
  I have a Lenovo laptop (P15v Gen 3).
  I have a Lenovo dock attached to that.
  I have a monitor attached to that via HDMI.
  I have another monitor attached directly to the laptop's own HDMI port.

  When I boot my machine with the Lenovo dock attached, it usually
  works.

  4) What happened instead

  Yesterday, that display didn't work.
  It just stayed blank and the journal showed the following lines:
  ---
  Jun 21 08:52:13 mymachine kernel: [drm] perform_link_training_with_retries: Link(3) bandwidth too low after fallback req_bw(5796000) > link_bw(4190400)
  Jun 21 08:52:13 mymachine kernel: [drm] perform_link_training_with_retries: Link(3) bandwidth too low after fallback req_bw(5796000) > link_bw(4190400)
  Jun 21 08:52:13 mymachine kernel: [drm] perform_link_training_with_retries: Link(3) bandwidth too low after fallback req_bw(5796000) > link_bw(2095200)
  Jun 21 08:52:13 mymachine kernel: [drm] perform_link_training_with_retries: Link(3) bandwidth too low after fallback req_bw(5796000) > link_bw(5028480)
  Jun 21 08:52:13 mymachine kernel: [drm] perform_link_training_with_retries: Link(3) bandwidth too low after fallback req_bw(5796000) > link_bw(2514240)
  Jun 21 08:52:13 mymachine kernel: [drm] perform_link_training_with_retries: Link(3) bandwidth too low after fallback req_bw(5796000) > link_bw(1257120)
  Jun 21 08:52:13 mymachine kernel: [drm] DMUB HPD IRQ callback: link_index=3
  Jun 21 08:52:13 mymachine kernel: [drm] perform_link_training_with_retries: Link(3) bandwidth too low after fallback req_bw(5796000) > link_bw(4190400)
  Jun 21 08:52:13 mymachine kernel: [drm] perform_link_training_with_retries: Link(3) bandwidth too low after fallback req_bw(5796000) > link_bw(4190400)
  Jun 21 08:52:13 mymachine kernel: [drm] perform_link_training_with_retries: Link(3) bandwidth too low after fallback req_bw(5796000) > link_bw(2095200)
  Jun 21 08:52:13 mymachine kernel: [drm] perform_link_training_with_retries: Link(3) bandwidth too low after fallback req_bw(5796000) > link_bw(5028480)
  Jun 21 08:52:13 mymachine kernel: [drm] perform_link_training_with_retries: Link(3) bandwidth too low after fallback req_bw(5796000) > link_bw(2514240)
  Jun 21 08:52:13 mymachine kernel: [drm] perform_link_training_with_retries: Link(3) bandwidth too low after fallback req_bw(5796000) > link_bw(1257120)
  Jun 21 08:52:13 mymachine kernel: [drm] DMUB HPD IRQ callback: link_index=3
  Jun 21 08:52:13 mymachine kernel: [drm] DMUB HPD IRQ callback: link_index=3
  Jun 21 08:52:13 mymachine kernel: [drm] perform_link_training_with_retries: Link(3) bandwidth too low after fallback req_bw(5796000) > link_bw(4190400)
  Jun 21 08:52:13 mymachine kernel: [drm] DMUB HPD IRQ callback: link_index=3
  Jun 21 08:52:13 mymachine kernel: [drm] DMUB HPD IRQ callback: link_index=3
  Jun 21 08:52:13 mymachine kernel: [drm] perform_link_training_with_retries: Link(3) bandwidth too low after fallback req_bw(5796000) > link_bw(4190400)
  Jun 21 08:52:13 mymachine kernel: [drm] DMUB HPD IRQ callback: link_index=3
  Jun 21 08:52:13 mymachine kernel: [drm] perform_link_training_with_retries: Link(3) bandwidth too low after fallback req_bw(5796000) > link_bw(2095200)
  Jun 21 08:52:13 mymachine kernel: [drm] DMUB HPD IRQ callback: link_index=3
  Jun 21 08:52:13 mymachine kernel: [drm] perform_link_training_with_retries: Link(3) bandwidth too low after fallback req_bw(5796000) > link_bw(5028480)
  Jun 21 08:52:13 mymachine kernel: [drm] DMUB HPD IRQ callback: link_index=3
  Jun 21 08:52:13 mymachine kernel: [drm] perform_link_training_with_retries: Link(3) bandwidth too low after fallback req_bw(5796000) > link_bw(2514240)
  Jun 21 08:52:13 mymachine kernel: [drm] DMUB HPD IRQ callback: link_index=3
  Jun 21 08:52:13 mymachine kernel: [drm] perform_link_training_with_retries: Link(3) bandwidth too low after fallback req_bw(5796000) > link_bw(1257120)
  Jun 21 08:52:13 mymachine kernel: [drm] DMUB HPD IRQ callback: link_index=3
  Jun 21 08:52:13 mymachine kernel: [drm] DMUB HPD IRQ callback: link_index=3
  Jun 21 08:52:13 mymachi

[Kernel-packages] [Bug 1972914] Re: frequent 15-sec guest freeze with ubuntu 22.04 host and guest

2024-08-05 Thread Matthew Ruffell
Hi everyone,

$ cd Work/kernel/ubuntu-jammy/
~/Work/kernel/ubuntu-jammy$ git log --grep 'Revert "drm/qxl: simplify qxl_fence_wait"'
commit 1b146e3dc802253fd9a6e29e2d3b06d003fe9182
Author: Alex Constantino
Date:   Thu Apr 4 19:14:48 2024 +0100

    Revert "drm/qxl: simplify qxl_fence_wait"

...
~/Work/kernel/ubuntu-jammy$ git describe --contains 1b146e3dc802253fd9a6e29e2d3b06d003fe9182
Ubuntu-5.15.0-115.125~199
~/Work/kernel/ubuntu-jammy$ cd ..
~/Work/kernel$ cd ubuntu-noble/
~/Work/kernel/ubuntu-noble$ git log --grep 'Revert "drm/qxl: simplify qxl_fence_wait"' origin/master-next
commit ee451375fd8b767eb91721fa389b022f1582cb0f
Author: Alex Constantino
Date:   Thu Apr 4 19:14:48 2024 +0100

    Revert "drm/qxl: simplify qxl_fence_wait"

...
~/Work/kernel/ubuntu-noble$ git describe --contains ee451375fd8b767eb91721fa389b022f1582cb0f
Ubuntu-6.8.0-38.38~331

This has been fixed in 5.15.0-115-generic or later, and 6.8.0-38-generic
or later.
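
If it helps, here is a rough Python sketch for checking whether a given kernel release string is at or past the fixed versions named above. The function names and the table are my own for illustration. It only understands the -generic flavour, since other flavours (for example -aws) use different ABI numbering, and it returns None for anything it cannot judge.

```python
import re

# Minimum fixed ABI number per kernel series, taken from the versions quoted
# in this message. The table and function names are illustrative only.
FIXED_ABI = {(5, 15, 0): 115, (6, 8, 0): 38}

RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)-generic$")


def has_qxl_fix(release):
    """Return True/False for a known -generic series, None if it cannot be judged."""
    match = RELEASE_RE.match(release)
    if match is None:
        return None  # not a -generic release string, ABI numbers not comparable
    major, minor, patch, abi = (int(group) for group in match.groups())
    minimum = FIXED_ABI.get((major, minor, patch))
    if minimum is None:
        return None  # a series this message does not mention
    return abi >= minimum


if __name__ == "__main__":
    for sample in ("5.15.0-114-generic", "5.15.0-115-generic", "6.8.0-38-generic"):
        print(sample, has_qxl_fix(sample))
```

Feed it the output of `uname -r` from the guest to check a running system.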

Let me know if you need any more help.

Thanks,
Matthew

** Also affects: linux (Ubuntu)
   Importance: Undecided
   Status: New

** Changed in: xserver-xorg-video-qxl (Ubuntu)
   Status: Confirmed => Invalid

** Also affects: linux (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Also affects: xserver-xorg-video-qxl (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Noble)
   Importance: Undecided
   Status: New

** Also affects: xserver-xorg-video-qxl (Ubuntu Noble)
   Importance: Undecided
   Status: New

** No longer affects: xserver-xorg-video-qxl (Ubuntu Jammy)

** No longer affects: xserver-xorg-video-qxl (Ubuntu Noble)

** Changed in: linux (Ubuntu)
   Status: New => Fix Released

** Changed in: linux (Ubuntu Jammy)
   Status: New => Fix Released

** Changed in: linux (Ubuntu Noble)
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1972914

Title:
  frequent 15-sec guest freeze with ubuntu 22.04 host and guest

Status in linux package in Ubuntu:
  Fix Released
Status in xserver-xorg-video-qxl package in Ubuntu:
  Invalid
Status in linux source package in Jammy:
  Fix Released
Status in linux source package in Noble:
  Fix Released

Bug description:
  I'm running a new installation of Ubuntu 22.04 Desktop on a Thinkpad
  T450s (core i5-5200, 2 cores / 4 vCPUs, 12 GB memory). Using the virt-
  manager GUI, I performed what I believe is a simple, plain-vanilla
  installation of an Ubuntu 22.04 guest running under qemu/kvm with 2
  vCPUs and 4 GB memory.

  I'm seeing very frequent, 15-second freezes of the guest. When it
  happens, the guest is completely unresponsive. After about 15 seconds,
  it works normally again, until the next freeze. The duration of the
  freeze appears to be the same every time. The guest isn't doing much -
  just open the calculator app and click number buttons. The freeze
  doesn't happen if I don't interact with the guest (just leave the
  system monitor running in the guest, so I can see that it's not
  frozen).

  I observe the freeze when I have 2 of the 4 vCPUs dedicated to the
  guest. I do NOT observe it when I have only one vCPU dedicated to the
  guest.

  Both the host and guest are using only a small fraction of the memory
  available to them. When the problem happens, the host indicates that
  CPU usage is very low across all vCPUs. The host appears to be
  operating normally when the guest is frozen.

  I see the following pair of lines in the guest syslog every time the
  freeze occurs (and only when the freeze occurs):

  May 10 13:48:40 qemu-jammy kernel: [  144.259799] qxl 0000:00:01.0: object_init failed for (8298496, 0x00000001)

  May 10 13:48:40 qemu-jammy kernel: [  144.259819] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO

  I don't see anything in the host syslog that correlates with the
  freeze.

  If I choose "Virtio" in the Video drop-down in the virt-manager GUI,
  with "3D acceleration" UNchecked, the guest works fine, and the freeze
  never happens. Unfortunately, that loses fractional scaling, which is
  important to me. If I check "3D acceleration," the guest won't boot.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1972914/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2075314] Re: Fence expiration time out i915-0000:03:00.0:python3[10055]:4

2024-08-05 Thread Matthew Ruffell
*** This bug is a duplicate of bug 2072755 ***
https://bugs.launchpad.net/bugs/2072755

** This bug has been marked a duplicate of bug 2072755
   i915: Fixup regressions introduced with enabling single CCS engine

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2075314

Title:
  Fence expiration time out i915-0000:03:00.0:python3[10055]:4

Status in linux package in Ubuntu:
  New

Bug description:
  Hi Ubuntu Kernel Team,

  I noticed this today:

  # sudo dmesg | grep Fence
  [   48.545459] Fence expiration time out i915-0000:03:00.0:python3[7848]:2!
  [   48.545490] Fence expiration time out i915-0000:03:00.0:python3[7848]:4!
  [   92.203489] Fence expiration time out i915-0000:03:00.0:python3[10055]:4!
  [   92.203620] Fence expiration time out i915-0000:03:00.0:python3[10055]:2!

  # uname -a
  Linux 6.8.0-39-generic #39-Ubuntu SMP PREEMPT_DYNAMIC Fri Jul  5 21:49:14 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

  # sudo dmesg | head -1
  [    0.000000] Linux version 6.8.0-39-generic (buildd@lcy02-amd64-112) (x86_64-linux-gnu-gcc-13 (Ubuntu 13.2.0-23ubuntu4) 13.2.0, GNU ld (GNU Binutils for Ubuntu) 2.42) #39-Ubuntu SMP PREEMPT_DYNAMIC Fri Jul  5 21:49:14 UTC 2024 (Ubuntu 6.8.0-39.39-generic 6.8.8)

  # sudo dmesg | grep i915
  [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.8.0-39-generic root=UUID=a21e25f8-7188-4f2a-8761-0304ad49ad52 ro audit=0 i915.enable_guc=2 quiet splash vt.handoff=7
  [    0.074853] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.8.0-39-generic root=UUID=a21e25f8-7188-4f2a-8761-0304ad49ad52 ro audit=0 i915.enable_guc=2 quiet splash vt.handoff=7
  [    3.080173] i915 0000:00:02.0: enabling device (0006 -> 0007)
  [    3.080973] i915 0000:00:02.0: [drm] VT-d active for gfx access
  [    3.081006] i915 0000:00:02.0: [drm] Using Transparent Hugepages
  [    3.082054] mei_hdcp 0000:00:16.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04: bound 0000:00:02.0 (ops i915_hdcp_ops [i915])
  [    3.087910] i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/adls_dmc_ver2_01.bin (v2.1)
  [    3.092625] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/tgl_guc_70.bin version 70.20.0
  [    3.092628] i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
  [    3.106447] i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads
  [    3.106454] i915 0000:00:02.0: [drm] GT0: GUC: submission disabled
  [    3.106457] i915 0000:00:02.0: [drm] GT0: GUC: SLPC disabled
  [    3.107179] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: bound 0000:00:02.0 (ops i915_pxp_tee_component_ops [i915])
  [    3.107268] i915 0000:00:02.0: [drm] Protected Xe Path (PXP) protected content support initialized
  [    3.111240] [drm] Initialized i915 1.6.0 20230929 for 0000:00:02.0 on minor 1
  [    3.111392] i915 display info: display version: 12
  [    3.111393] i915 display info: cursor_needs_physical: no
  [    3.111394] i915 display info: has_cdclk_crawl: no
  [    3.111395] i915 display info: has_cdclk_squash: no
  [    3.111395] i915 display info: has_ddi: yes
  [    3.111396] i915 display info: has_dp_mst: yes
  [    3.111396] i915 display info: has_dsb: yes
  [    3.111397] i915 display info: has_fpga_dbg: yes
  [    3.111397] i915 display info: has_gmch: no
  [    3.111398] i915 display info: has_hotplug: yes
  [    3.111398] i915 display info: has_hti: yes
  [    3.111399] i915 display info: has_ipc: yes
  [    3.111399] i915 display info: has_overlay: no
  [    3.111400] i915 display info: has_psr: yes
  [    3.111400] i915 display info: has_psr_hw_tracking: no
  [    3.111400] i915 display info: overlay_needs_physical: no
  [    3.111401] i915 display info: supports_tv: no
  [    3.111401] i915 display info: has_hdcp: yes
  [    3.111402] i915 display info: has_dmc: yes
  [    3.111402] i915 display info: has_dsc: yes
  [    3.111492] i915 0000:00:02.0: [drm] Cannot find any crtc or sizes
  [    3.111591] i915 0000:00:02.0: [drm] Cannot find any crtc or sizes
  [    3.112034] i915 0000:03:00.0: [drm] VT-d active for gfx access
  [    3.116312] snd_hda_intel 0000:00:1f.3: bound 0000:00:02.0 (ops i915_audio_component_bind_ops [i915])
  [    3.131646] i915 0000:03:00.0: vgaarb: deactivate vga console
  [    3.131669] i915 0000:03:00.0: [drm] Local memory IO size: 0x00000003fa000000
  [    3.131670] i915 0000:03:00.0: [drm] Local memory available: 0x00000003fa000000
  [    3.147614] i915 0000:03:00.0: [drm] Finished loading DMC firmware i915/dg2_dmc_ver2_08.bin (v2.8)
  [    3.170742] i915 0000:03:00.0: [drm] GT0: GuC firmware i915/dg2_guc_70.bin version 70.20.0
  [    3.170745] i915 0000:03:00.0: [drm] GT0: HuC firmware i915/dg2_huc_gsc.bin version 7.10.3
  [    3.249811] i915 0000:03:00.0: [drm] GT0: GUC: submission disabled
  [    3.249821] i915 0000:03:00.0: [drm] GT0: GUC: SLPC disabled
  [    3.275149] [drm] Initialized i915 1.6.0 202309

[Bug 2070020] Re: Lenovo dock no longer working after upgrade from 6.5.0-35 to 6.5.0-41

2024-08-03 Thread Matthew Ruffell
Hi everyone,

The Jammy HWE kernel will be rolling to the 6.8 kernel from Noble within
a few days to a week or so. The 6.5 mantic kernel is now closed to new
commits, so it is best to move to the 6.8 kernel now, where this is
already fixed.

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2070020

Title:
  Lenovo dock no longer working after upgrade from 6.5.0-35 to 6.5.0-41

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2070020/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 2076003] Re: Shows Unknow display

2024-08-03 Thread Matthew Ruffell
*** This bug is a duplicate of bug 2076004 ***
https://bugs.launchpad.net/bugs/2076004

** This bug has been marked a duplicate of bug 2076004
   Shows Unknown display with 6.8.0-39

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2076003

Title:
  Shows Unknow display

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2076003/+subscriptions



[Bug 2069534] Re: Linux 6.8 fails to boot on ARM64 if any param is more than 146 chars

2024-08-03 Thread Matthew Ruffell
Hi Chris,

Yes, jammy-hwe-6.8 got fixed because Stefan Bader had to respin the kernel
for another regression anyway, so he opportunistically pulled it in.

For Noble, I think it will be part of the s2024.07.08 SRU cycle, as per
https://kernel.ubuntu.com/, as Manuel Diewald mentioned when I spoke to him.

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2069534

Title:
  Linux 6.8 fails to boot on ARM64 if any param is more than 146 chars

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2069534/+subscriptions



[Bug 2069534] Re: Linux 6.8 fails to boot on ARM64 if any param is more than 146 chars

2024-08-01 Thread Matthew Ruffell
Performing verification for jammy-hwe-6.8

I started two T2A instances on google cloud, which are arm64, with
jammy.

One instance has:
6.8.0-39-generic #39~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Jul 10 16:59:11 UTC 2
The other, 6.8.0-40-generic from -proposed:
6.8.0-40-generic #40~22.04.3-Ubuntu SMP PREEMPT_DYNAMIC Tue Jul 30 17:53:10 UTC 2

I edited /etc/default/grub.d/50-cloudimg-settings.cfg and set:

GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0,115200"

to

GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0,115200 testparam=f081c381e7b54edcba27e5f790d47911a4cc3e726d8d256878d3df9175c020e0f081c381e7b54edcba27e5f790d47911a4cc3e726d8d256878d3df9175c020e0f081c381e7b5732f126a62b4232"

ran:

$ sudo update-grub

and rebooted.

Unfortunately, the 6.8.0-39-generic instance never came back up after the
reboot.

The 6.8.0-40-generic instance came up just fine:

$ cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-6.8.0-40-generic root=PARTUUID=17337627-dfbd-4ce7-9f99-4dd1da2542eb ro console=ttyS0,115200 testparam=f081c381e7b54edcba27e5f790d47911a4cc3e726d8d256878d3df9175c020e0f081c381e7b54edcba27e5f790d47911a4cc3e726d8d256878d3df9175c020e0f081c381e7b5732f126a62b4232

The 6.8.0-40-generic in -proposed fixes the issue. Happy to mark
verified for jammy-hwe-6.8.
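
For anyone who wants to check their own systems before rebooting, here is a small Python sketch that lists kernel command line parameters longer than the 146 characters mentioned in the bug title. The function name is hypothetical, and I am assuming the limit applies to the whole `name=value` token; splitting on whitespace also ignores quoted values that contain spaces.

```python
# Threshold quoted in the bug title. Treating it as the length of the whole
# "name=value" token is an assumption made for this sketch.
LIMIT = 146


def long_params(cmdline, limit=LIMIT):
    """Return the whitespace-separated parameters longer than `limit` characters."""
    return [param for param in cmdline.split() if len(param) > limit]


if __name__ == "__main__":
    # Synthetic example: a short parameter and one padded well past the limit.
    sample = "ro console=ttyS0,115200 testparam=" + "a" * 200
    for param in long_params(sample):
        print(len(param), param[:40] + "...")
```

On a running system the string to check would come from /proc/cmdline, or from the GRUB_CMDLINE_LINUX_DEFAULT value before running update-grub.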

** Tags removed: verification-needed-jammy-linux-hwe-6.8
** Tags added: verification-done-jammy-linux-hwe-6.8

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2069534

Title:
  Linux 6.8 fails to boot on ARM64 if any param is more than 146 chars

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2069534/+subscriptions



[Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs

2024-07-31 Thread Matthew Ruffell
Beginning verification for noble

I started two c5.large instances in us-west-2 on AWS, with the same parameters
that we used in previous tests. Each has a 60 GB gp3 volume attached to it.
Each instance is running the GA kernel, 6.8.0-1012-aws.

One is running e2fsprogs from -updates, the other e2fsprogs from -proposed:

-updates:
$ apt-cache policy e2fsprogs | grep Installed
  Installed: 1.47.0-2.4~exp1ubuntu4
-proposed:
$ apt-cache policy e2fsprogs | grep Installed
  Installed: 1.47.0-2.4~exp1ubuntu4.1
  
Each is running the same script from the testcase.

I will leave these instances running for the next 7-14 days. We will
consider this bug verified if, at the end of this time, the -updates
instance is broken and the -proposed instance is still functioning
correctly.

Both tests were started at: Thu Aug  1 06:10:40 UTC 2024

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum
  mismatch in resize2fs

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/2036467/+subscriptions



[Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs

2024-07-31 Thread Matthew Ruffell
Beginning verification for jammy

I started two c5.large instances in us-west-2 on AWS, with the same parameters
that we used in previous tests. Each has a 60 GB gp3 volume attached to it.
I downgraded the HWE kernels to GA kernels, and each is running
5.15.0-1066-aws.

One is running e2fsprogs from -updates, the other e2fsprogs from -proposed:

-updates:
$ apt-cache policy e2fsprogs | grep Installed
  Installed: 1.46.5-2ubuntu1.1
-proposed:
$ apt-cache policy e2fsprogs | grep Installed
  Installed: 1.46.5-2ubuntu1.2
  
Each is running the same script from the testcase.

I will leave these instances running for the next 7-14 days. We will
consider this bug verified if, at the end of this time, the -updates
instance is broken and the -proposed instance is still functioning
correctly.

Both tests were started at: Thu Aug  1 06:02:48 UTC 2024

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum
  mismatch in resize2fs

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/2036467/+subscriptions



[Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs

2024-07-31 Thread Matthew Ruffell
Beginning verification for focal

I started two c5.large instances in us-west-2 on AWS, with the same parameters
that we used in previous tests. Each has a 60 GB gp3 volume attached to it.
I downgraded the HWE kernels to GA kernels, and each is running
5.4.0-1129-aws.

One is running e2fsprogs from -updates, the other e2fsprogs from -proposed:

-updates:
$ apt-cache policy e2fsprogs | grep Installed
  Installed: 1.45.5-2ubuntu1.1
-proposed:
$ apt-cache policy e2fsprogs | grep Installed
  Installed: 1.45.5-2ubuntu1.2
  
Each is running the same script from the testcase.

I will leave these instances running for the next 7-14 days. We will
consider this bug verified if, at the end of this time, the -updates
instance is broken and the -proposed instance is still functioning
correctly.

Both tests were started at: Thu Aug  1 05:46:34 UTC 2024

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum
  mismatch in resize2fs

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/2036467/+subscriptions



[Bug 2073267] Re: Virtualbox Guru meditation on VM start caused by kernel commit in v6.9-rc4

2024-07-31 Thread Matthew Ruffell
Hi David,

Yes, it does indeed seem that the CVE has now been rejected.

https://lore.kernel.org/linux-cve-announce/2024073029-clerk-trophy-b84c@gregkh/

https://nvd.nist.gov/vuln/detail/CVE-2024-35918
https://www.cve.org/CVERecord/?id=CVE-2024-35918

Maybe we can revert it after all!

I will have a talk with Aaron and the Kernel Team about how we should
move forward.

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2073267

Title:
  Virtualbox Guru meditation on VM start caused by kernel commit in
  v6.9-rc4

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2073267/+subscriptions



[Bug 2073267] Re: Virtualbox Guru meditation on VM start caused by kernel commit in v6.9-rc4

2024-07-31 Thread Matthew Ruffell
Hi everyone,

I see that Greg KH just assigned CVE-2024-35918 to "randomize_kstack:
Improve entropy diffusion".

I suppose that means that we cannot revert it now.

https://lore.kernel.org/linux-cve-announce/2024073029-clerk-trophy-b84c@gregkh/T/

This is going to take some time.

Thanks,
Matthew

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2024-35918

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2073267

Title:
  Virtualbox Guru meditation on VM start caused by kernel commit in
  v6.9-rc4

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2073267/+subscriptions



[Bug 2073267] Re: Virtualbox Guru meditation on VM start caused by kernel commit in v6.9-rc4

2024-07-31 Thread Matthew Ruffell
Hi everyone,

Keith, thanks for testing! It is fantastic news that the test kernel
fixes the issue, and our theory is correct, and we are on the right
track.

I also tested both test kernels myself, and can confirm that reverting
"randomize_kstack: Improve entropy diffusion" fixes the issue, and that
"randomize_kstack: Remove non-functional per-arch entropy filtering"
brings no improvement at all: the issue still occurs.

So, I think the way forward is to do the unpopular thing, which is to
revert "randomize_kstack: Improve entropy diffusion".

We should never have changed something as fundamental as the kernel
thread stack size on a stable kernel. We can do that on the development
release, sure, but not on LTS kernels.

I will have a talk with the Kernel team about it, but I will begin
preparing the patches and writing a SRU template.

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2073267

Title:
  Virtualbox Guru meditation on VM start caused by kernel commit in
  v6.9-rc4

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2073267/+subscriptions



[Kernel-packages] [Bug 2073267] Re: Virtualbox Guru meditation on VM start caused by kernel commit in v6.9-rc4

2024-07-31 Thread Matthew Ruffell
Hi Gianfranco,

Great! Let's work this out together.

I have strong doubts that:

commit ef40d28f17bd384d7e0b630c7d83f108a526351b
Author: Kees Cook 
Date:   Wed Jun 19 14:47:15 2024 -0700
Subject: randomize_kstack: Remove non-functional per-arch entropy filtering
Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ef40d28f17bd384d7e0b630c7d83f108a526351b

will fix this issue. If you read the commit log and the code, it clearly
removes all per-arch entropy values, and just sticks with a universal
1KiB of space for all architectures.

The root cause of the issue is that "randomize_kstack: Improve entropy
diffusion" changed the amd64 kernel stack consumption for randomisation,
from 0.25KiB to a full 1KiB of space. The per thread kernel stacks are
only 16KiB in size, so we went from VirtualBox having 15.75KiB of stack
space down to 15KiB. VirtualBox must have been really pushing the limit
and needing that extra 0.75KiB of space, since without it, we panic.

They probably made some architectural changes in 7.0.x that reduce the
total kernel thread stack consumption, and now fall under the 15KiB
limit that "randomize_kstack: Improve entropy diffusion" imposes.
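
To make the arithmetic above concrete, here is a tiny Python sketch. It uses only the figures quoted in this message (16KiB per-thread stacks, 0.25KiB versus 1KiB consumed by randomisation); the function name is my own, and the model deliberately ignores everything else that lives on the stack.

```python
STACK_KIB = 16.0  # per-thread kernel stack size quoted above


def remaining_stack_kib(random_offset_kib, stack_kib=STACK_KIB):
    """Stack space left after the worst-case randomisation offset is taken."""
    return stack_kib - random_offset_kib


before = remaining_stack_kib(0.25)  # before "Improve entropy diffusion"
after = remaining_stack_kib(1.0)    # after "Improve entropy diffusion"

print(before, after, before - after)  # 15.75 15.0 0.75
```

The 0.75KiB difference is the headroom that VirtualBox 6.1.x appears to have been relying on.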

Anyway, I still made you a test kernel. It is based on
5.15.0-117-generic + "randomize_kstack: Remove non-functional per-arch
entropy filtering".

This test kernel is for Gianfranco Costamagna ONLY! Other users, please,
try my other test kernel in the above comment instead.

Gianfranco, the kernel packages will be ready about 3 hours from this
message; they are still building.

Please note this package is NOT SUPPORTED by Canonical, and is for TESTING
PURPOSES ONLY. ONLY install in a dedicated test environment.

Instructions to Install (On a focal or jammy system):
1) sudo add-apt-repository ppa:mruffell/lp2073267-test-2
2) sudo apt update
3) sudo apt install linux-image-unsigned-5.15.0-117-generic linux-modules-5.15.0-117-generic linux-modules-extra-5.15.0-117-generic linux-headers-5.15.0-117-generic
4) sudo reboot
5) uname -rv
Look for +TEST2073267v20240731b2

Anyway, I think we either need to figure out how to get the virtualbox
kernel module stack consumption down, or we revert "randomize_kstack:
Improve entropy diffusion" for focal, focal HWE, jammy, jammy HWE (but
not noble).

 virtualbox | 6.1.6-dfsg-1                  | focal/multiverse          | source, amd64
 virtualbox | 6.1.32-dfsg-1build1           | jammy/multiverse          | source, amd64
 virtualbox | 6.1.50-dfsg-1~ubuntu1.20.04.1 | focal-security/multiverse | source, amd64
 virtualbox | 6.1.50-dfsg-1~ubuntu1.20.04.1 | focal-updates/multiverse  | source, amd64
 virtualbox | 6.1.50-dfsg-1~ubuntu1.22.04.1 | jammy-updates/multiverse  | source, amd64
 virtualbox | 6.1.50-dfsg-1~ubuntu1.22.04.2 | jammy-proposed/multiverse | source, amd64
 virtualbox | 7.0.16-dfsg-2                 | noble/multiverse          | source, amd64
 virtualbox | 7.0.16-dfsg-2ubuntu1          | noble-updates/multiverse  | source, amd64
 virtualbox | 7.0.20-dfsg-1                 | oracular/multiverse       | source, amd64
 
Thanks,
Matthew

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-hwe-5.15 in Ubuntu.
https://bugs.launchpad.net/bugs/2073267

Title:
  Virtualbox Guru meditation on VM start caused by kernel commit in
  v6.9-rc4

Status in linux package in Ubuntu:
  Triaged
Status in linux-signed-hwe-5.15 package in Ubuntu:
  Confirmed
Status in virtualbox package in Ubuntu:
  Confirmed

Bug description:
  It worked yesterday, but today I get a Guru Meditation trying to start
  some of my virtual machines. This shows up in VBox.log as "VCPU0: Guru
  Meditation -2708 (VERR_VMM_SET_JMP_ABORTED_RESUME)". I suspect this
  may have started due to a Linux kernel upgrade I installed this
  morning.

  A fresh VM with no disk shows the issue. Sometimes turning off the I/O
  APIC makes the issue go away, sometimes not. Turning off nested paging
  sometimes lets VirtualBox make a little bit of progress w.r.t. booting
  VMs, but that usually still crashes before the VM finishes starting.

  This may be related to this bug reported on the VirtualBox forums:
  
  https://forums.virtualbox.org/viewtopic.php?t=111889&sid=5cd33c0872a03b689e7e9f84d850f538

  https://forums.virtualbox.org/viewtopic.php?t=111918

  Ubuntu is 22.04.4 LTS, kernel is 5.15.0-116-generic, VirtualBox is
  6.1.50-dfsg-1~ubuntu1.22.04.1.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2073267/+subscriptions



