[Kernel-packages] [Bug 2062380] Re: Using a 6.8 kernel 'modprobe nvidia' hangs on Quanta Grace Hopper
** Summary changed:

- Using a 6.8 kernel modprobe nvidia hangs on Grace Hopper
+ Using a 6.8 kernel 'modprobe nvidia' hangs on Quanta Grace Hopper

** Also affects: nvidia-graphics-drivers-535-server (Ubuntu)
   Importance: Undecided
       Status: New

** Changed in: nvidia-graphics-drivers-535-server (Ubuntu)
       Status: New => Confirmed

** Changed in: nvidia-graphics-drivers-550-server (Ubuntu)
       Status: New => Confirmed

** Description changed:

  Using both -generic and -nvidia 6.8 kernels I'm seeing a hang when I
  load the nvidia driver.

  $ sudo dmidecode -t 0
  # dmidecode 3.5
  Getting SMBIOS data from sysfs.
  SMBIOS 3.6.0 present.
  # SMBIOS implementations newer than version 3.5.0 are not
  # fully supported by this version of dmidecode.

  Handle 0x0001, DMI type 0, 26 bytes
  BIOS Information
      Vendor: NVIDIA
      Version: 01.02.01
      Release Date: 20240207
      ROM Size: 64 MB
      Characteristics:
          PCI is supported
          PNP is supported
          BIOS is upgradeable
          BIOS shadowing is allowed
          Boot from CD is supported
          Selectable boot is supported
          Serial services are supported (int 14h)
          ACPI is supported
          Targeted content distribution is supported
          UEFI is supported
      Firmware Revision: 0.0

  [  382.938326] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
  [  382.946075] rcu: 53-...0: (4 ticks this GP) idle=1c2c/1/0x4000 softirq=4866/4868 fqs=14124
  [  382.955683] rcu:          hardirqs   softirqs   csw/system
  [  382.961378] rcu:  number:        0          0          0
  [  382.967071] rcu: cputime:        0          0          0   ==> 30026(ms)
  [  382.974189] rcu: (detected by 52, t=60034 jiffies, g=24469, q=1199 ncpus=72)
  [  392.982095] rcu: rcu_preempt kthread starved for 9994 jiffies! g24469 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=31
  [  392.992769] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior

  After seeing this, I enabled kdump and set kernel.panic_on_rcu_stall = 1

  KDUMP INFO
  WARNING: cpu 54: cannot find NT_PRSTATUS note
        KERNEL: /usr/lib/debug/boot/vmlinux-6.8.0-1004-nvidia-64k [TAINTED]
      DUMPFILE: /var/crash/202404172139/dump.202404172139 [PARTIAL DUMP]
          CPUS: 72
          DATE: Wed Apr 17 21:39:13 UTC 2024
        UPTIME: 00:06:10
  LOAD AVERAGE: 0.68, 0.63, 0.28
         TASKS: 854
      NODENAME: hinyari
       RELEASE: 6.8.0-1005-nvidia-64k
       VERSION: #5-Ubuntu SMP PREEMPT_DYNAMIC Wed Apr 17 11:26:46 UTC 2024
       MACHINE: aarch64 (unknown Mhz)
        MEMORY: 479.7 GB
         PANIC: "Kernel panic - not syncing: RCU Stall"
           PID: 0
       COMMAND: "swapper/21"
          TASK: 82026880 (1 of 72) [THREAD_INFO: 82026880]
           CPU: 21
         STATE: TASK_RUNNING (PANIC)

  [  300.313144] nvidia: loading out-of-tree module taints kernel.
  [  300.313153] nvidia: module verification failed: signature and/or required key missing - tainting kernel
  [  300.316694] nvidia-nvlink: Nvlink Core is being initialized, major device number 506
  [  300.316699]
  [  360.323454] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
  [  360.331206] rcu: 54-...0: (24 ticks this GP) idle=742c/1/0x4000 softirq=4931/4933 fqs=13148
  [  360.340903] rcu:          hardirqs   softirqs   csw/system
  [  360.346597] rcu:  number:        0          0          0
  [  360.352291] rcu: cputime:        0          0          0   ==> 30031(ms)
  [  360.359408] rcu: (detected by 21, t=60038 jiffies, g=25009, q=1123 ncpus=72)
  [  360.366704] Sending NMI from CPU 21 to CPUs 54:
  [  370.367310] rcu: rcu_preempt kthread starved for 9993 jiffies! g25009 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=31
  [  370.377983] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
  [  370.387322] rcu: RCU grace-period kthread stack dump:
  [  370.392482] task:rcu_preempt state:I stack:0 pid:17 tgid:17 ppid:2 flags:0x0008
  [  370.392488] Call trace:
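For reference, a sketch of the crash-capture configuration described above (package name, sysctl path, and commands are the usual Ubuntu ones and are assumptions; adjust for your environment):

```shell
# Assumed Ubuntu setup for capturing a vmcore when an RCU stall is hit.
sudo apt-get install linux-crashdump            # kdump/makedumpfile tooling
echo 'kernel.panic_on_rcu_stall = 1' | sudo tee /etc/sysctl.d/99-rcu-stall.conf
sudo sysctl -w kernel.panic_on_rcu_stall=1      # apply now, without a reboot
kdump-config show                               # confirm the crash kernel is loaded
```

With this in place an RCU stall panics the machine and kdump writes the (partial) dump analysed above.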
[Kernel-packages] [Bug 2062380] [NEW] Using a 6.8 kernel modprobe nvidia hangs on Grace Hopper
Public bug reported:

Using both -generic and -nvidia 6.8 kernels I'm seeing a hang when I
load the nvidia driver.

[  382.938326] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  382.946075] rcu: 53-...0: (4 ticks this GP) idle=1c2c/1/0x4000 softirq=4866/4868 fqs=14124
[  382.955683] rcu:          hardirqs   softirqs   csw/system
[  382.961378] rcu:  number:        0          0          0
[  382.967071] rcu: cputime:        0          0          0   ==> 30026(ms)
[  382.974189] rcu: (detected by 52, t=60034 jiffies, g=24469, q=1199 ncpus=72)
[  392.982095] rcu: rcu_preempt kthread starved for 9994 jiffies! g24469 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=31
[  392.992769] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior

After seeing this, I enabled kdump and set kernel.panic_on_rcu_stall = 1

KDUMP INFO
WARNING: cpu 54: cannot find NT_PRSTATUS note
      KERNEL: /usr/lib/debug/boot/vmlinux-6.8.0-1004-nvidia-64k [TAINTED]
    DUMPFILE: /var/crash/202404172139/dump.202404172139 [PARTIAL DUMP]
        CPUS: 72
        DATE: Wed Apr 17 21:39:13 UTC 2024
      UPTIME: 00:06:10
LOAD AVERAGE: 0.68, 0.63, 0.28
       TASKS: 854
    NODENAME: hinyari
     RELEASE: 6.8.0-1005-nvidia-64k
     VERSION: #5-Ubuntu SMP PREEMPT_DYNAMIC Wed Apr 17 11:26:46 UTC 2024
     MACHINE: aarch64 (unknown Mhz)
      MEMORY: 479.7 GB
       PANIC: "Kernel panic - not syncing: RCU Stall"
         PID: 0
     COMMAND: "swapper/21"
        TASK: 82026880 (1 of 72) [THREAD_INFO: 82026880]
         CPU: 21
       STATE: TASK_RUNNING (PANIC)

[  300.313144] nvidia: loading out-of-tree module taints kernel.
[  300.313153] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[  300.316694] nvidia-nvlink: Nvlink Core is being initialized, major device number 506
[  300.316699]
[  360.323454] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  360.331206] rcu: 54-...0: (24 ticks this GP) idle=742c/1/0x4000 softirq=4931/4933 fqs=13148
[  360.340903] rcu:          hardirqs   softirqs   csw/system
[  360.346597] rcu:  number:        0          0          0
[  360.352291] rcu: cputime:        0          0          0   ==> 30031(ms)
[  360.359408] rcu: (detected by 21, t=60038 jiffies, g=25009, q=1123 ncpus=72)
[  360.366704] Sending NMI from CPU 21 to CPUs 54:
[  370.367310] rcu: rcu_preempt kthread starved for 9993 jiffies! g25009 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=31
[  370.377983] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[  370.387322] rcu: RCU grace-period kthread stack dump:
[  370.392482] task:rcu_preempt state:I stack:0 pid:17 tgid:17 ppid:2 flags:0x0008
[  370.392488] Call trace:
[  370.392489] __switch_to+0xd0/0x118
[  370.392499] __schedule+0x2a8/0x7b0
[  370.392501] schedule+0x40/0x168
[  370.392502] schedule_timeout+0xac/0x1e0
[  370.392505] rcu_gp_fqs_loop+0x128/0x508
[  370.392512] rcu_gp_kthread+0x150/0x188
[  370.392514] kthread+0xf8/0x110
[  370.392519] ret_from_fork+0x10/0x20
[  370.392524] rcu: Stack dump where RCU GP kthread last ran:
[  370.398128] Sending NMI from CPU 21 to CPUs 31:
[  370.398131] NMI backtrace for cpu 31
[  370.398136] CPU: 31 PID: 0 Comm: swapper/31 Kdump: loaded Tainted: G OE 6.8.0-1005-nvidia-64k #5-Ubuntu
[  370.398139] Hardware name: /P3880, BIOS 01.02.01 20240207
[  370.398140] pstate: 6349 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[  370.398142] pc : cpuidle_enter_state+0xd8/0x790
[  370.398150] lr : cpuidle_enter_state+0xcc/0x790
[  370.398153] sp : 800081eefd70
[  370.398154] x29: 800081eefd70  x28:  x27:
[  370.398157] x26:  x25: 00563d67e4e0  x24:
[  370.398160] x23: a0a1445699f8  x22:  x21: 00563d72ece0
[  370.398162] x20: a0a144569a10  x19: 8fa4a800  x18: 800081f00030
[  370.398165] x17:  x16:  x15: ac8c73b08db0
[  370.398168] x14:  x13:  x12:
[  370.398170] x11:  x10: 2da0fbe3d5e8c649  x9 : a0a1424fd244
[  370.398173] x8 : 820559b8  x7 :  x6 :
[  370.398175] x5 :  x4 :  x3 :
[  370.398178] x2 :  x1 :  x0 :
[  370.398181] Call trace:
[  370.398183] cpuidle_enter_state+0xd8/0x790
[  370.398185] cpuidle_enter+0x44/0x78
[  370.398195] cpuidle_idle_call+0x15c/0x210
[  370.398202] do_idle+0xb0/0x130
[  370.398204] cpu_startup_entry+0x40/0x50
[  370.398206] secondary_start_kernel+0xec/0x130
[  370.398211] __secondary_switched+0xc0/0xc8
[  370.399132] Kernel panic - not syncing: RCU Stall
[  370.403938] CPU: 21 PID: 0 Comm:
[Kernel-packages] [Bug 2055712] Re: Pull-request to address bug in mm/page_alloc.c
** Tags added: verification-done-jammy

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-6.5 in Ubuntu.
https://bugs.launchpad.net/bugs/2055712

Title:
  Pull-request to address bug in mm/page_alloc.c

Status in linux-nvidia-6.5 package in Ubuntu:
  Fix Released

Bug description:
  The current calculation of min_free_kbytes only uses ZONE_DMA and
  ZONE_NORMAL pages, but the ZONE_MOVABLE zone->_watermark[WMARK_MIN]
  will also take a share of min_free_kbytes. This makes the min
  watermark of ZONE_NORMAL too small in the presence of ZONE_MOVABLE.

  __GFP_HIGH and PF_MEMALLOC allocations usually don't need movable
  zone pages, so, just like ZONE_HIGHMEM, cap pages_min to a small
  value in __setup_per_zone_wmarks().

  On my test machine with 16GB of memory (transparent hugepage is
  turned off by default, and movablecore=12G is configured), here is
  comparative watermark_min data:

                     no patch   with patch
  ZONE_DMA                  1            8
  ZONE_DMA32              151          709
  ZONE_NORMAL             233         1113
  ZONE_MOVABLE           1434          128
  min_free_kbytes        7288         7326

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.5/+bug/2055712/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
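The arithmetic behind the fix can be sketched with illustrative numbers (this is not kernel code; the zone sizes below are assumptions standing in for a 16GB machine with movablecore=12G):

```shell
# Hedged sketch: why ZONE_NORMAL's min watermark grows once ZONE_MOVABLE
# stops taking a proportional share of min_free_kbytes, mirroring the
# ZONE_HIGHMEM capping done in __setup_per_zone_wmarks().
min_free_kbytes=7288
normal_kb=$((4 * 1024 * 1024))     # ~4 GB of non-movable memory (assumed)
movable_kb=$((12 * 1024 * 1024))   # movablecore=12G
total_kb=$((normal_kb + movable_kb))

# Unpatched: every zone, movable included, takes min * zone/total:
normal_before=$((min_free_kbytes * normal_kb / total_kb))
movable_before=$((min_free_kbytes * movable_kb / total_kb))

# Patched: movable is capped to a small fixed value (like highmem) and the
# proportional split runs over the non-movable zones only:
movable_after=128
normal_after=$((min_free_kbytes * normal_kb / normal_kb))

echo "ZONE_NORMAL  min watermark: ${normal_before} -> ${normal_after}"
echo "ZONE_MOVABLE min watermark: ${movable_before} -> ${movable_after}"
```

With these assumed sizes the movable zone absorbs three quarters of min_free_kbytes before the patch, which matches the direction of the table above (ZONE_NORMAL rising, ZONE_MOVABLE collapsing to a small cap).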
[Kernel-packages] [Bug 2055712] Re: Pull-request to address bug in mm/page_alloc.c
** Changed in: linux-nvidia-6.5 (Ubuntu)
       Status: New => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-6.5 in Ubuntu.
https://bugs.launchpad.net/bugs/2055712

Title:
  Pull-request to address bug in mm/page_alloc.c

Status in linux-nvidia-6.5 package in Ubuntu:
  Fix Released
[Kernel-packages] [Bug 2059150] Re: jammy/linux-nvidia-6.5: 6.5.0-1014.14 - Boot failure on Quanta Grace/Hopper
Upgrading the BIOS firmware resolves the failure.

$ sudo dmidecode -t 0
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 3.6.0 present.
# SMBIOS implementations newer than version 3.5.0 are not
# fully supported by this version of dmidecode.

Handle 0x0001, DMI type 0, 26 bytes
BIOS Information
    Vendor: NVIDIA
    Version: 01.02.01
    Release Date: 20240207
    ROM Size: 64 MB
    Characteristics:
        PCI is supported
        PNP is supported
        BIOS is upgradeable
        BIOS shadowing is allowed
        Boot from CD is supported
        Selectable boot is supported
        Serial services are supported (int 14h)
        ACPI is supported
        Targeted content distribution is supported
        UEFI is supported
    Firmware Revision: 0.0

** Changed in: linux-nvidia-6.5 (Ubuntu)
       Status: New => Invalid

https://bugs.launchpad.net/bugs/2059150

Title:
  jammy/linux-nvidia-6.5: 6.5.0-1014.14 - Boot failure on Quanta
  Grace/Hopper

Status in linux-nvidia-6.5 package in Ubuntu:
  Invalid

Bug description:
  Output from BMC SOL console:

  Unhandled Exception from EL2
  x0 = 0x11f210305619   x1 = 0x   x2 = 0x   x3 = 0x
  x4 = 0x5f972493   x5 = 0x   x6 = 0x   x7 = 0x
  x8 = 0x   x9 = 0xa0e0a03e7d6c   x10= 0x   x11= 0x
  x12= 0x   x13= 0x   x14= 0x   x15= 0x
  x16= 0x   x17= 0x   x18= 0x   x19= 0xf0f18080
  x20= 0x80009e86f6a0   x21= 0x80009e86f720   x22= 0x07a5a0e0a03e7d6c
  x23= 0x   x24= 0xa0e0a3348aa0   x25= 0xa0e0a2990008
  x26= 0xa0e0a2990008   x27= 0xa0e04b4f5748   x28= 0x80009e86f710
  x29= 0x80008000fe00   x30= 0xa0e0a03e7d6c
  scr_el3   = 0x0407073d   sctlr_el3 = 0x30cd183f   cptr_el3 = 0x00100100
  tcr_el3   = 0x80853510   daif = 0x02c0   mair_el3 = 0x004404ff
  spsr_el3  = 0x034000c9   elr_el3 = 0xa0e04b4f58b4
  ttbr0_el3 = 0x0078734a5001   esr_el3 = 0x622c5c1f   far_el3 = 0x9446dd42099e8148
  spsr_el1 = 0x   elr_el1 = 0x   spsr_abt = 0x   spsr_und = 0x
  spsr_irq = 0x   spsr_fiq = 0x   sctlr_el1 = 0x30d00980   actlr_el1 = 0x
  cpacr_el1 = 0x0030   csselr_el1 = 0x0002   sp_el1 = 0x   esr_el1 = 0x
  ttbr0_el1 = 0x   ttbr1_el1 = 0x   mair_el1 = 0x   amair_el1 = 0x
  tcr_el1 = 0x   tpidr_el1 = 0x   tpidr_el0 = 0x8000   tpidrro_el0 = 0x
  par_el1 = 0x0800   mpidr_el1 = 0x8102   afsr0_el1 = 0x   afsr1_el1 = 0x
  contextidr_el1 = 0x   vbar_el1 = 0x   cntp_ctl_el0 = 0x
  cntp_cval_el0 = 0x0012ec91c420   cntv_ctl_el0 = 0x   cntv_cval_el0 = 0x
  cntkctl_el1 = 0x   sp_el0 = 0x0078732cf4f0   isr_el1 = 0x0040
  cpuectlr_el1 = 0x4000340340003000

  gicd_ispendr regs (Offsets 0x200 - 0x278)
  Offset: value
  0200: 0x

  Unhandled Exception in EL3.
  x30= 0x0078732c4384   x0 = 0x   x1 = 0x0078732cb7d8   x2 = 0x0018
  x3 = 0x0078732b1720   x4 = 0x   x5 = 0x003c   x6 = 0x0078732c9109
  x7 = 0x22000204   x8 = 0x4000340340003000   x9 = 0x   x10= 0x
  x11= 0x0012ec91c420   x12= 0x   x13= 0x   x14= 0x
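As a quick check before retesting the kernel, the running SBIOS version can be read directly from the SMBIOS strings (same data as the `dmidecode -t 0` output above; expect 01.02.01 / 20240207 on the fixed firmware):

```shell
# Query just the BIOS version and release date (requires root).
sudo dmidecode -s bios-version
sudo dmidecode -s bios-release-date
```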
[Kernel-packages] [Bug 2059150] [NEW] jammy/linux-nvidia-6.5: 6.5.0-1014.14 - Boot failure on Quanta Grace/Hopper
Public bug reported:

Output from BMC SOL console:

Unhandled Exception from EL2
x0 = 0x11f210305619   x1 = 0x   x2 = 0x   x3 = 0x
x4 = 0x5f972493   x5 = 0x   x6 = 0x   x7 = 0x
x8 = 0x   x9 = 0xa0e0a03e7d6c   x10= 0x   x11= 0x
x12= 0x   x13= 0x   x14= 0x   x15= 0x
x16= 0x   x17= 0x   x18= 0x   x19= 0xf0f18080
x20= 0x80009e86f6a0   x21= 0x80009e86f720   x22= 0x07a5a0e0a03e7d6c
x23= 0x   x24= 0xa0e0a3348aa0   x25= 0xa0e0a2990008
x26= 0xa0e0a2990008   x27= 0xa0e04b4f5748   x28= 0x80009e86f710
x29= 0x80008000fe00   x30= 0xa0e0a03e7d6c
scr_el3   = 0x0407073d   sctlr_el3 = 0x30cd183f   cptr_el3 = 0x00100100
tcr_el3   = 0x80853510   daif = 0x02c0   mair_el3 = 0x004404ff
spsr_el3  = 0x034000c9   elr_el3 = 0xa0e04b4f58b4
ttbr0_el3 = 0x0078734a5001   esr_el3 = 0x622c5c1f   far_el3 = 0x9446dd42099e8148
spsr_el1 = 0x   elr_el1 = 0x   spsr_abt = 0x   spsr_und = 0x
spsr_irq = 0x   spsr_fiq = 0x   sctlr_el1 = 0x30d00980   actlr_el1 = 0x
cpacr_el1 = 0x0030   csselr_el1 = 0x0002   sp_el1 = 0x   esr_el1 = 0x
ttbr0_el1 = 0x   ttbr1_el1 = 0x   mair_el1 = 0x   amair_el1 = 0x
tcr_el1 = 0x   tpidr_el1 = 0x   tpidr_el0 = 0x8000   tpidrro_el0 = 0x
par_el1 = 0x0800   mpidr_el1 = 0x8102   afsr0_el1 = 0x   afsr1_el1 = 0x
contextidr_el1 = 0x   vbar_el1 = 0x   cntp_ctl_el0 = 0x
cntp_cval_el0 = 0x0012ec91c420   cntv_ctl_el0 = 0x   cntv_cval_el0 = 0x
cntkctl_el1 = 0x   sp_el0 = 0x0078732cf4f0   isr_el1 = 0x0040
cpuectlr_el1 = 0x4000340340003000

gicd_ispendr regs (Offsets 0x200 - 0x278)
Offset: value
0200: 0x

Unhandled Exception in EL3.
x30= 0x0078732c4384   x0 = 0x   x1 = 0x0078732cb7d8   x2 = 0x0018
x3 = 0x0078732b1720   x4 = 0x   x5 = 0x003c   x6 = 0x0078732c9109
x7 = 0x22000204   x8 = 0x4000340340003000   x9 = 0x   x10= 0x
x11= 0x0012ec91c420   x12= 0x   x13= 0x   x14= 0x
x15= 0x0078732cf4f0   x16= 0x2200   x17= 0x0018   x18= 0x0407073d
x19= 0x007873386440   x20= 0x80009e86f6a0   x21= 0x80009e86f720
x22= 0x07a5a0e0a03e7d6c   x23= 0x   x24= 0xa0e0a3348aa0
x25= 0xa0e0a2990008   x26= 0xa0e0a2990008   x27= 0xa0e04b4f5748
x28= 0x80009e86f710   x29= 0x80008000fe00
scr_el3   = 0x0407073d   sctlr_el3 = 0x30cd183f   cptr_el3 = 0x00100100
tcr_el3   = 0x80853510   daif = 0x03c0   mair_el3 = 0x004404ff
spsr_el3  = 0x834002cd   elr_el3 = 0x0078732b0af4
ttbr0_el3 = 0x0078734a5001   esr_el3 = 0xbe11   far_el3 = 0x9446dd42099e8148
spsr_el1 = 0x   elr_el1 = 0x   spsr_abt = 0x   spsr_und = 0x
spsr_irq = 0x   spsr_fiq = 0x   sctlr_el1 = 0x30d00980   actlr_el1 = 0x
cpacr_el1 = 0x0030   csselr_el1 = 0x0002   sp_el1 = 0x   esr_el1 = 0x
ttbr0_el1 = 0x   ttbr1_el1 = 0x   mair_el1 =
[Kernel-packages] [Bug 2056448] Re: hfs: weird file system free block state after creating files and removing them with a mix of i/o operations
** Summary changed:

- weird file system free block state after creating files and removing them with a mix of i/o operations
+ hfs: weird file system free block state after creating files and removing them with a mix of i/o operations

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2056448

Title:
  hfs: weird file system free block state after creating files and
  removing them with a mix of i/o operations

Status in Linux:
  Unknown
Status in linux package in Ubuntu:
  New

Bug description:
  Summary: create an hfs file system, loop-back mount it, run the iomix
  stressor with stress-ng to exercise a mix of file I/O operations, and
  remove the files at the end. The file system is empty, but a lot of
  blocks are used and can't seem to be recovered.

  Kernel: 6.8.0-11-generic

  Test case:

  sudo apt-get install hfsprogs
  dd if=/dev/zero of=fs.img bs=1M count=2048
  mkfs.hfs fs.img
  sudo mount fs.img /mnt
  sudo mkdir /mnt/x
  df /mnt
  Filesystem     1K-blocks  Used Available Use% Mounted on
  /dev/loop6       2097128  2015   2095113   1% /mnt

  sudo stress-ng --temp-path /mnt/x --iomix 1 -t 2
  df /mnt
  Filesystem     1K-blocks   Used Available Use% Mounted on
  /dev/loop6       2097128 674635   1422493  33% /mnt

  ls -alR /mnt/
  /mnt/:
  total 4
  drwxr-xr-x  1 root root    3 Mar  7 12:37 .
  drwxr-xr-x 23 root root 4096 Feb 28 14:13 ..
  drwxr-xr-x  1 root root    2 Mar  7 12:37 x

  /mnt/x:
  total 0
  drwxr-xr-x 1 root root 2 Mar  7 12:37 .
  drwxr-xr-x 1 root root 3 Mar  7 12:37 ..

  ..so the file system is 33% full, but no files are on it. Something
  looks wrong here.

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/2056448/+subscriptions
[Kernel-packages] [Bug 2056448] Re: weird file system free block state after creating files and removing them with a mix of i/o operations
** Bug watch added: Linux Kernel Bug Tracker #218571
   https://bugzilla.kernel.org/show_bug.cgi?id=218571

** Also affects: linux via
   https://bugzilla.kernel.org/show_bug.cgi?id=218571
   Importance: Unknown
       Status: Unknown

https://bugs.launchpad.net/bugs/2056448

Title:
  weird file system free block state after creating files and removing
  them with a mix of i/o operations

Status in Linux:
  Unknown
Status in linux package in Ubuntu:
  New
[Kernel-packages] [Bug 2056451] [NEW] hfs: concurrent create/unlink can trip -EEXIST on non-existent files
Public bug reported:

Summary: create an hfs file system, loop-back mount it, run the
filename stressor with stress-ng to exercise filename
create/stat/unlink, and we get unexpected -EEXIST errors. This can be
worked around by adding a sync() call after the unlink() to ensure
metadata is synced.

Kernel: 6.8.0-11-generic

Test case:

sudo apt-get install hfsprogs
dd if=/dev/zero of=fs.img bs=1M count=2048
mkfs.hfs fs.img
sudo mount fs.img /mnt
sudo mkdir /mnt/x
sudo stress-ng --temp-path /mnt/x --filename 8 --filename-opts posix -t 20
stress-ng: info:  [132412] setting to a 20 secs run per stressor
stress-ng: info:  [132412] dispatching hogs: 8 filename
stress-ng: fail:  [132424] filename: open failed on file of length 1 bytes, errno=17 (File exists)
stress-ng: fail:  [132428] filename: open failed on file of length 20 bytes, errno=17 (File exists)
stress-ng: fail:  [132423] filename: open failed on file of length 30 bytes, errno=17 (File exists)
stress-ng: fail:  [132421] filename: open failed on file of length 30 bytes, errno=17 (File exists)
stress-ng: fail:  [132428] filename: open failed on file of length 30 bytes, errno=17 (File exists)
stress-ng: fail:  [132426] filename: open failed on file of length 23 bytes, errno=17 (File exists)
stress-ng: fail:  [132425] filename: open failed on file of length 30 bytes, errno=17 (File exists)
stress-ng: fail:  [132428] filename: open failed on file of length 1 bytes, errno=17 (File exists)
stress-ng: fail:  [132423] filename: open failed on file of length 7 bytes, errno=17 (File exists)
stress-ng: fail:  [132423] filename: open failed on file of length 11 bytes, errno=17 (File exists)
stress-ng: fail:  [132426] filename: open failed on file of length 24 bytes, errno=17 (File exists)

Adding a sync() call in the stress-ng stressor fixes the issue:

git diff
diff --git a/stress-filename.c b/stress-filename.c
index a64898fb1..b8266f91e 100644
--- a/stress-filename.c
+++ b/stress-filename.c
@@ -308,6 +308,7 @@ static void stress_filename_test(
 		VOID_RET(int, shim_stat(filename, ));
 		(void)shim_unlink(filename);
+		(void)sync();
 	}

 	/* exercise dcache lookup of non-existent filename */

sudo stress-ng --temp-path /mnt/x --filename 8 --filename-opts posix -t 20
stress-ng: info:  [132461] setting to a 20 secs run per stressor
stress-ng: info:  [132461] dispatching hogs: 8 filename
stress-ng: info:  [132461] skipped: 0
stress-ng: info:  [132461] passed: 8: filename (8)
stress-ng: info:  [132461] failed: 0
stress-ng: info:  [132461] metrics untrustworthy: 0
stress-ng: info:  [132461] successful run completed in 20.05 secs

The sync should not be required, by the way; I just added it to
illustrate that there is a racy metadata sync issue in hfs.

** Affects: linux
   Importance: Unknown
       Status: Unknown

** Affects: linux (Ubuntu)
   Importance: Undecided
       Status: New

** Summary changed:

- hfs: concurrent create/unlink can trip -EEXIST on files
+ hfs: concurrent create/unlink can trip -EEXIST on non-existent files

** Bug watch added: Linux Kernel Bug Tracker #218570
   https://bugzilla.kernel.org/show_bug.cgi?id=218570

** Also affects: linux via
   https://bugzilla.kernel.org/show_bug.cgi?id=218570
   Importance: Unknown
       Status: Unknown

https://bugs.launchpad.net/bugs/2056451

Title:
  hfs: concurrent create/unlink can trip -EEXIST on non-existent files

Status in Linux:
  Unknown
Status in linux package in Ubuntu:
  New
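A stripped-down reproduction in the same spirit as the filename stressor can be run without stress-ng. This is a hypothetical sketch, not the reporter's test: it assumes /mnt/x sits on the affected hfs mount, and runs several create/unlink/re-create loops in parallel so the racy metadata sync has a chance to bite.

```shell
# Each worker repeatedly creates, unlinks, and immediately re-creates a
# file; on an affected kernel some re-creates may fail with EEXIST even
# though the name was just unlinked.
repro() {
    for i in $(seq 1 500); do
        f=/mnt/x/f$$-$i
        : > "$f"   || echo "create $f failed: $?"
        rm -f "$f"
        : > "$f"   || echo "re-create $f failed (stale metadata / EEXIST?)"
        rm -f "$f"
    done
}
for j in 1 2 3 4; do repro & done
wait
```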
[Kernel-packages] [Bug 2056448] [NEW] weird file system free block state after creating files and removing them with a mix of i/o operations
Public bug reported:

Summary: create an hfs file system, loop-back mount it, run the iomix
stressor with stress-ng to exercise a mix of file I/O operations, and
remove the files at the end. The file system is empty, but a lot of
blocks are used and can't seem to be recovered.

Kernel: 6.8.0-11-generic

Test case:

sudo apt-get install hfsprogs
dd if=/dev/zero of=fs.img bs=1M count=2048
mkfs.hfs fs.img
sudo mount fs.img /mnt
sudo mkdir /mnt/x
df /mnt
Filesystem     1K-blocks  Used Available Use% Mounted on
/dev/loop6       2097128  2015   2095113   1% /mnt

sudo stress-ng --temp-path /mnt/x --iomix 1 -t 2
df /mnt
Filesystem     1K-blocks   Used Available Use% Mounted on
/dev/loop6       2097128 674635   1422493  33% /mnt

ls -alR /mnt/
/mnt/:
total 4
drwxr-xr-x  1 root root    3 Mar  7 12:37 .
drwxr-xr-x 23 root root 4096 Feb 28 14:13 ..
drwxr-xr-x  1 root root    2 Mar  7 12:37 x

/mnt/x:
total 0
drwxr-xr-x 1 root root 2 Mar  7 12:37 .
drwxr-xr-x 1 root root 3 Mar  7 12:37 ..

..so the file system is 33% full, but no files are on it. Something
looks wrong here.

** Affects: linux (Ubuntu)
   Importance: High
       Status: New

** Changed in: linux (Ubuntu)
   Importance: Undecided => High

https://bugs.launchpad.net/bugs/2056448

Title:
  weird file system free block state after creating files and removing
  them with a mix of i/o operations

Status in linux package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2056448/+subscriptions
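The test case above can be wrapped into one script with cleanup (a sketch; it assumes hfsprogs and stress-ng are installed, /mnt is unused, and it needs root for the mount):

```shell
# Reproduce the leaked-blocks state, then unmount and fsck the image to
# see whether the lost blocks are real or just accounting.
set -e
dd if=/dev/zero of=fs.img bs=1M count=2048
mkfs.hfs fs.img
sudo mount fs.img /mnt
sudo mkdir /mnt/x
df /mnt                                       # baseline, ~1% used
sudo stress-ng --temp-path /mnt/x --iomix 1 -t 2
sudo find /mnt/x -mindepth 1 -delete          # remove anything left behind
df /mnt                                       # leaked blocks show up here (~33%)
sudo umount /mnt
sudo fsck.hfs -f fs.img                       # does fsck see/recover the blocks?
```

Comparing the second `df` with the `fsck.hfs` result helps separate an on-disk allocation-bitmap leak from an in-memory accounting bug.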
[Kernel-packages] [Bug 2055310] Re: dmesg spammed by virtio-fs and 9pnet-virtio messages
** Attachment added: "screen shot of my noble VM on a noble server"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2055310/+attachment/5751725/+files/Screenshot%20from%202024-03-02%2022-57-52.png

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2055310

Title:
  dmesg spammed by virtio-fs and 9pnet-virtio messages

Status in linux package in Ubuntu:
  New

Bug description:
  Ubuntu noble, as of 28 Feb 2024, on amd64, s390x, ppc64: seeing
  kernel messages after boot (running instances in a VM using
  virt-manager).

  uname -a
  Linux noble-amd64-efi 6.6.0-14-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 30 10:27:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

  [   30.638354] virtio-fs: tag not found
  [   30.642316] 9pnet_virtio: no channels available for device config
  [   35.897615] virtio-fs: tag not found
  [   35.901568] 9pnet_virtio: no channels available for device config
  [   41.141860] virtio-fs: tag not found
  [   41.145513] 9pnet_virtio: no channels available for device config
  [   46.382040] virtio-fs: tag not found
  [   46.386141] 9pnet_virtio: no channels available for device config
  [   51.632229] virtio-fs: tag not found
  [   51.635727] 9pnet_virtio: no channels available for device config

  These are annoying when logging in via the console.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2055310/+subscriptions
[Kernel-packages] [Bug 2055310] Re: dmesg spammed by virtio-fs and 9pnet-virtio messages
Does not occur on pre-noble VMs, e.g. fine with mantic through to trusty
on all my VMs on the same host.

https://bugs.launchpad.net/bugs/2055310

Title:
  dmesg spammed by virtio-fs and 9pnet-virtio messages

Status in linux package in Ubuntu:
  New
[Kernel-packages] [Bug 2055310] Re: dmesg spammed by virtui-fs and 9pnet-virtio messages
Good idea, I've installed this on my host and it's still occurring on various VM architectures (x86-64, ppc64el, s390x, etc.). My host is noble and up to date with updates.
[Kernel-packages] [Bug 2055310] Re: dmesg spammed by virtui-fs and 9pnet-virtio messages
cking@noble-amd64:~$ uname -a
Linux noble-amd64 6.8.0-11-generic #11-Ubuntu SMP PREEMPT_DYNAMIC Wed Feb 14 00:29:05 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
[    9.551968] virtio-fs: tag not found
[    9.555352] 9pnet_virtio: no channels available for device config
[   14.850014] virtio-fs: tag not found
[   14.854959] 9pnet_virtio: no channels available for device config
[   18.302513] systemd-journald[451]: /var/log/journal/c8f7ec498f904c46b99f25f051792ec0/user-1000.journal: Journal file uses a different sequence number ID, rotating.
[   20.173092] virtio-fs: tag not found
[   20.176498] 9pnet_virtio: no channels available for device config
[   25.470406] virtio-fs: tag not found
[   25.475163] 9pnet_virtio: no channels available for device config
[   30.690179] virtio-fs: tag not found
[   30.695584] 9pnet_virtio: no channels available for device config
[   35.947869] virtio-fs: tag not found
[   35.951397] 9pnet_virtio: no channels available for device config
[   41.190386] virtio-fs: tag not found
[   41.195323] 9pnet_virtio: no channels available for device config
[   46.437815] virtio-fs: tag not found
[   46.441903] 9pnet_virtio: no channels available for device config
[   51.700719] virtio-fs: tag not found
[   51.704995] 9pnet_virtio: no channels available for device config
...but it now stops ~52 seconds after boot. Anyhow, these are new messages that did not appear with the previous mantic kernel.
[Kernel-packages] [Bug 2055310] [NEW] dmesg spammed by virtui-fs and 9pnet-virtio messages
Public bug reported: Ubuntu noble, as of 28 Feb 2024, on amd64, s390x, ppc64, seeing kernel messages after boot (running instances in a VM using virt-manager) uname -a Linux noble-amd64-efi 6.6.0-14-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 30 10:27:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux [ 30.638354] virtio-fs: tag not found [ 30.642316] 9pnet_virtio: no channels available for device config [ 35.897615] virtio-fs: tag not found [ 35.901568] 9pnet_virtio: no channels available for device config [ 41.141860] virtio-fs: tag not found [ 41.145513] 9pnet_virtio: no channels available for device config [ 46.382040] virtio-fs: tag not found [ 46.386141] 9pnet_virtio: no channels available for device config [ 51.632229] virtio-fs: tag not found [ 51.635727] 9pnet_virtio: no channels available for device config These are annoying when logging in via the console. ** Affects: linux (Ubuntu) Importance: Low Status: New ** Changed in: linux (Ubuntu) Importance: Undecided => Low ** Changed in: linux (Ubuntu) Milestone: None => ubuntu-24.04-beta ** Description changed: Ubuntu noble, as of 28 Feb 2024, on amd64, s390x, ppc64, seeing kernel - messages after boot: + messages after boot (running instances in a VM using virt-manager) uname -a Linux noble-amd64-efi 6.6.0-14-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 30 10:27:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux - [ 30.638354] virtio-fs: tag not found [ 30.642316] 9pnet_virtio: no channels available for device config [ 35.897615] virtio-fs: tag not found [ 35.901568] 9pnet_virtio: no channels available for device config [ 41.141860] virtio-fs: tag not found [ 41.145513] 9pnet_virtio: no channels available for device config [ 46.382040] virtio-fs: tag not found [ 46.386141] 9pnet_virtio: no channels available for device config [ 51.632229] virtio-fs: tag not found [ 51.635727] 9pnet_virtio: no channels available for device config These are annoying when logging in via the console. 
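A quick way to confirm the cadence of the spam (a sketch; it assumes the messages are still in the kernel ring buffer) is to count the offending lines:

```shell
# Count the offending message pairs; per the timestamps in the report they
# recur roughly every 5 seconds after boot.
dmesg | grep -cE 'virtio-fs: tag not found|9pnet_virtio: no channels available'
```

Each five-second interval contributes one "virtio-fs" and one "9pnet_virtio" line, so the count grows by two per probe cycle until the retries stop.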
[Kernel-packages] [Bug 1899249] Re: OpenZFS writing stalls, under load
** Changed in: zfs-linux (Ubuntu) Assignee: Colin Ian King (colin-king) => (unassigned) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1899249 Title: OpenZFS writing stalls, under load Status in Native ZFS for Linux: New Status in zfs-linux package in Ubuntu: Fix Released Bug description: Using a QNAP 4-drive USB enclosure, with a set of SSDs, on a Raspberry Pi 8GB. ZFS deduplication, and LZJB compression is enabled. This issue seems to occur, intermittently, after some time (happens with both SMB access, via Samba, and when interacting with the system, via SSH), and never previously occurred, until a few months ago, and I sometimes have to force a reboot of the system (at the cost of some data loss), in order to use it again. The "dmesg" log reports: [25375.911590] z_wr_iss_h D0 2161 2 0x0028 [25375.911606] Call trace: [25375.911627] __switch_to+0x104/0x170 [25375.911639] __schedule+0x30c/0x7c0 [25375.911647] schedule+0x3c/0xb8 [25375.911655] io_schedule+0x20/0x58 [25375.911668] rq_qos_wait+0x100/0x178 [25375.911677] wbt_wait+0xb4/0xf0 [25375.911687] __rq_qos_throttle+0x38/0x50 [25375.911700] blk_mq_make_request+0x128/0x610 [25375.911712] generic_make_request+0xb4/0x2d8 [25375.911722] submit_bio+0x48/0x218 [25375.911960] vdev_disk_io_start+0x670/0x9f8 [zfs] [25375.912181] zio_vdev_io_start+0xdc/0x2b8 [zfs] [25375.912400] zio_nowait+0xd4/0x170 [zfs] [25375.912617] vdev_mirror_io_start+0xa8/0x1b0 [zfs] [25375.912839] zio_vdev_io_start+0x248/0x2b8 [zfs] [25375.913057] zio_execute+0xac/0x110 [zfs] [25375.913096] taskq_thread+0x2f8/0x570 [spl] [25375.913108] kthread+0xfc/0x128 [25375.913119] ret_from_fork+0x10/0x1c [25375.913149] INFO: task txg_sync:2333 blocked for more than 120 seconds. [25375.919916] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu [25375.926848] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[25375.934835] txg_syncD0 2333 2 0x0028 [25375.934850] Call trace: [25375.934869] __switch_to+0x104/0x170 [25375.934879] __schedule+0x30c/0x7c0 [25375.934887] schedule+0x3c/0xb8 [25375.934899] schedule_timeout+0x9c/0x190 [25375.934908] io_schedule_timeout+0x28/0x48 [25375.934946] __cv_timedwait_common+0x1a8/0x1f8 [spl] [25375.934982] __cv_timedwait_io+0x3c/0x50 [spl] [25375.935205] zio_wait+0x130/0x2a0 [zfs] [25375.935423] dsl_pool_sync+0x3fc/0x498 [zfs] [25375.935650] spa_sync+0x538/0xe68 [zfs] [25375.935867] txg_sync_thread+0x2c0/0x468 [zfs] [25375.935911] thread_generic_wrapper+0x74/0xa0 [spl] [25375.935924] kthread+0xfc/0x128 [25375.935935] ret_from_fork+0x10/0x1c [25375.936017] INFO: task zbackup:75339 blocked for more than 120 seconds. [25375.942780] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu [25375.949710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [25375.957702] zbackup D0 75339 5499 0x [25375.957716] Call trace: [25375.957732] __switch_to+0x104/0x170 [25375.957742] __schedule+0x30c/0x7c0 [25375.957750] schedule+0x3c/0xb8 [25375.957789] cv_wait_common+0x188/0x1b0 [spl] [25375.957823] __cv_wait+0x30/0x40 [spl] [25375.958045] zil_commit_impl+0x234/0xd30 [zfs] [25375.958263] zil_commit+0x48/0x70 [zfs] [25375.958481] zfs_create+0x544/0x7d0 [zfs] [25375.958698] zpl_create+0xb8/0x178 [zfs] [25375.958711] lookup_open+0x4ec/0x6a8 [25375.958721] do_last+0x260/0x8c0 [25375.958730] path_openat+0x84/0x258 [25375.958739] do_filp_open+0x84/0x108 [25375.958752] do_sys_open+0x180/0x2b0 [25375.958763] __arm64_sys_openat+0x2c/0x38 [25375.958773] el0_svc_common.constprop.0+0x80/0x218 [25375.958781] el0_svc_handler+0x34/0xa0 [25375.958791] el0_svc+0x10/0x2cc [25375.958801] INFO: task zbackup:95187 blocked for more than 120 seconds. [25375.965564] Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu [25375.972492] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[25375.980479] zbackup D0 95187 5499 0x [25375.980493] Call trace: [25375.980514] __switch_to+0x104/0x170 [25375.980525] __schedule+0x30c/0x7c0 [25375.980536] schedule+0x3c/0xb8 [25375.980578] cv_wait_common+0x188/0x1b0 [spl] [25375.980612] __cv_wait+0x30/0x40 [spl] [25375.980834] zil_commit_impl+0x234/0xd30 [zfs] [25375.981052] zil_commit+0x48/0x70 [zfs] [25375.981280] zfs_write+0xa3c/0xb90 [zfs] [25375.981498] zpl_write_common_iovec+0xac/0x120 [zfs] [25375.981726] zpl_iter_write+0xe4/0x150 [zfs] [25375.981766] new_sync_write+0x100/0x1a8 [25375.981776] __vfs_write+0x74/0x90 [25375.981784] vfs_write+0xe4/0x1c8 [
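For debugging stalls like the ones above, the hung-task detector can be tuned at run time (a sketch; the values are illustrative, and the panic knob should only be set with kdump configured so a vmcore is captured):

```shell
# Illustrative sysctl tuning for the "blocked for more than 120 seconds"
# reports above. Run as root; values are examples, not recommendations.
sysctl -w kernel.hung_task_timeout_secs=60   # report blocked tasks sooner
sysctl -w kernel.hung_task_panic=1           # panic on a hung task so kdump can capture a vmcore
# To silence the reports instead (as the kernel message itself suggests):
# echo 0 > /proc/sys/kernel/hung_task_timeout_secs
```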
[Kernel-packages] [Bug 2051342] Re: Enable lowlatency settings in the generic kernel
Looks like Michael Larabel has done some analysis for you already :-) https://www.phoronix.com/news/Ubuntu-Generic-LL-Kernel -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/2051342 Title: Enable lowlatency settings in the generic kernel Status in linux package in Ubuntu: New Status in linux source package in Noble: New Bug description: [Impact] Ubuntu provides the "lowlatency" kernel: a kernel optimized for applications that have special "low latency" requirements. Currently, this kernel does not include any specific UBUNTU SAUCE patches to improve the extra "low latency" requirements; the only difference is a small subset of .config options. Almost all of these options are now configurable either at boot time or even at run time, with the only exception of CONFIG_HZ (250 in the generic kernel vs 1000 in the lowlatency kernel). Maintaining a separate kernel for a single config option seems a bit overkill, and it is a significant cost in engineering hours, build time, regression testing time and resources, not to mention the risk of the lowlatency kernel falling behind and not being perfectly in sync with the latest generic kernel. Enabling the lowlatency settings in the generic kernel has been evaluated before, but it has never been finalized due to the potential risk of performance regressions in CPU-intensive applications (increasing HZ from 250 to 1000 may introduce more kernel jitter in number-crunching workloads). The outcome of the original proposal was a re-classification of the lowlatency kernel as a desktop-oriented kernel, enabling additional low latency features (LP: #2023007). As we approach the release of Ubuntu 24.04, we may want to reconsider merging the lowlatency settings into the generic kernel.
Following a detailed analysis of the specific low-latency features:
- CONFIG_NO_HZ_FULL=y: enable access to "full tickless mode" (shut down the clock tick when possible on all enabled CPUs that are either idle or running a single task, reducing the kernel jitter that the periodic clock tick imposes on running tasks; must be enabled at boot time by passing `nohz_full=`); this can actually help CPU-intensive workloads and could provide much more benefit than the CONFIG_HZ difference (since it can potentially shut down all kernel jitter on specific CPUs), so this one should really be enabled anyway, considering that it is configurable at boot time
- CONFIG_RCU_NOCB_CPU=y: move RCU callbacks from softirq context to kthread context (reducing time spent in softirqs with preemption disabled, which improves overall system responsiveness, at the cost of a potential performance penalty, because RCU callbacks are now processed by kernel threads); this should be enabled as well, since it is configurable at boot time (via the rcu_nocbs= parameter)
- CONFIG_RCU_LAZY=y: batch RCU callbacks and flush them after a timed delay instead of executing them immediately (can provide 5~10% power savings for idle or lightly-loaded systems, which is extremely useful for laptops / portable devices - https://lore.kernel.org/lkml/20221016162305.2489629-3-j...@joelfernandes.org/); this has the potential to introduce significant performance regressions, but the Noble kernel already carries a SAUCE patch that allows this option to be enabled/disabled at boot time (see LP: #2045492), and by default it will be disabled (CONFIG_RCU_LAZY_DEFAULT_OFF=y)
- CONFIG_HZ=1000: last but not least, the only option that is *only* tunable at compile time. As already mentioned, there is a potential risk of regressions for CPU-intensive applications, but they can be mitigated (and maybe even outperformed) with NO_HZ_FULL.
On the other hand, HZ=1000 can improve system responsiveness, which means most desktop and server applications will benefit from it (the largest part of server workloads is I/O-bound rather than CPU-bound, so they benefit from a kernel that can switch tasks faster), not to mention the benefit for typical end-user applications (gaming, live conferencing, multimedia, etc.). With all of that in place we can provide a kernel that has the flexibility to be more responsive, more performant and more power-efficient (and therefore more "generic"), simply by tuning run-time and boot-time options. Moreover, once these changes are applied we will be able to deprecate the lowlatency kernel, saving engineering time and also reducing the power consumption required to build the kernel and do all the testing. Optionally, we can also provide optimal "lowlatency" settings as a user-space package that would set the proper options on the kernel boot command line (GRUB, or similar). [Test case]
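As a sketch of what such a user-space "lowlatency settings" package might do with the boot-time knobs discussed above (the CPU list "2-7" is a placeholder, and the rcutree.enable_rcu_lazy=1 parameter assumes the Noble SAUCE patch referenced as LP: #2045492):

```shell
# Assemble the boot-time low-latency parameters into a GRUB drop-in line.
# CPU list is an example placeholder; pick the CPUs to isolate on the machine.
CPUS="2-7"
# nohz_full= enables full tickless mode on the listed CPUs (CONFIG_NO_HZ_FULL=y);
# rcu_nocbs= offloads RCU callbacks from those CPUs (CONFIG_RCU_NOCB_CPU=y);
# rcutree.enable_rcu_lazy=1 turns on lazy RCU at boot (per the SAUCE patch).
PARAMS="nohz_full=${CPUS} rcu_nocbs=${CPUS} rcutree.enable_rcu_lazy=1"
# A package could write this line into /etc/default/grub.d/ and run update-grub:
printf 'GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT %s"\n' "$PARAMS"
```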
[Kernel-packages] [Bug 2051342] Re: Enable lowlatency settings in the generic kernel
@Andrea, that's a good start, but it may be worth running some of the Phoronix tests too as they are a good spread of use cases.
[Kernel-packages] [Bug 2051342] Re: Enable lowlatency settings in the generic kernel
It may be worth trying a wider range of synthetic benchmarks to see how it affects scheduling, I/O, RCU and power consumption.
[Kernel-packages] [Bug 2049537] Re: Pull request for: peer-memory, ACPI thermal issues and coresight etm4x issues
** Changed in: linux-nvidia-6.5 (Ubuntu) Status: New => Fix Committed -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-nvidia-6.5 in Ubuntu. https://bugs.launchpad.net/bugs/2049537 Title: Pull request for: peer-memory, ACPI thermal issues and coresight etm4x issues Status in linux-nvidia-6.5 package in Ubuntu: Fix Committed Bug description: * Add support of "Thermal fast Sampling Period (_TFP)" for passive cooling. * Finer grained CPU throttling * The peer_memory_client scheme allows a driver to register with the ib_umem system that it has the ability to understand user virtual address ranges that are not compatible with get_user_pages(). For instance VMAs created with io_remap_pfn_range(), or other driver special VMA. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.5/+bug/2049537/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 2048815] Re: Pull request to address TPM SPI devices
** Changed in: linux-nvidia-6.5 (Ubuntu) Status: New => Fix Committed -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-nvidia-6.5 in Ubuntu. https://bugs.launchpad.net/bugs/2048815 Title: Pull request to address TPM SPI devices Status in linux-nvidia-6.5 package in Ubuntu: Fix Committed Bug description: TPM devices may insert wait state on last clock cycle of ADDR phase. For SPI controllers that support full-duplex transfers, this can be detected using software by reading the MISO line. For SPI controllers that only support half-duplex transfers, such as the Tegra QSPI, it is not possible to detect the wait signal from software. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.5/+bug/2048815/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 2048966] Re: Fix soft lockup triggered by arm_smmu_mm_invalidate_range
** Changed in: linux-nvidia-6.5 (Ubuntu) Status: New => Fix Committed -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-nvidia-6.5 in Ubuntu. https://bugs.launchpad.net/bugs/2048966 Title: Fix soft lockup triggered by arm_smmu_mm_invalidate_range Status in linux-nvidia-6.5 package in Ubuntu: Fix Committed Bug description: [Problem] When running an SVA case, the following soft lockup is triggered: watchdog: BUG: soft lockup - CPU#244 stuck for 26s! pstate: 8349 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) pc : arm_smmu_cmdq_issue_cmdlist+0x178/0xa50 lr : arm_smmu_cmdq_issue_cmdlist+0x150/0xa50 sp : 8000d83ef290 x29: 8000d83ef290 x28: 3b9aca00 x27: x26: 8000d83ef3c0 x25: da86c0812194a0e8 x24: x23: 0040 x22: 8000d83ef340 x21: c63980c0 x20: 0001 x19: c6398080 x18: x17: x16: x15: 3000b4a3bbb0 x14: 3000b4a30888 x13: 3000b4a3cf60 x12: x11: x10: x9 : c08120e4d6bc x8 : x7 : x6 : 00048cfa x5 : x4 : 0001 x3 : 000a x2 : 8000 x1 : x0 : 0001 Call trace: arm_smmu_cmdq_issue_cmdlist+0x178/0xa50 __arm_smmu_tlb_inv_range+0x118/0x254 arm_smmu_tlb_inv_range_asid+0x6c/0x130 arm_smmu_mm_invalidate_range+0xa0/0xa4 __mmu_notifier_invalidate_range_end+0x88/0x120 unmap_vmas+0x194/0x1e0 unmap_region+0xb4/0x144 do_mas_align_munmap+0x290/0x490 do_mas_munmap+0xbc/0x124 __vm_munmap+0xa8/0x19c __arm64_sys_munmap+0x28/0x50 invoke_syscall+0x78/0x11c el0_svc_common.constprop.0+0x58/0x1c0 do_el0_svc+0x34/0x60 el0_svc+0x2c/0xd4 el0t_64_sync_handler+0x114/0x140 el0t_64_sync+0x1a4/0x1a8 [Fix] backport the following upstream stable patch d5afb4b47e13161b3f33904d45110f9e6463bad6 Link: https://lore.kernel.org/r/20230920052257.8615-1-nicol...@nvidia.com To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.5/+bug/2048966/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More 
help : https://help.launchpad.net/ListHelp
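The backport step can be sketched as follows (the commit id is the one quoted in the bug description; the remote name and URL of the stable tree are illustrative assumptions):

```shell
# Fetch the upstream stable tree and cherry-pick the fix referenced above.
# 'linux-stable' is an assumed remote name for illustration.
git remote add linux-stable \
    https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
git fetch linux-stable
git cherry-pick d5afb4b47e13161b3f33904d45110f9e6463bad6
```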
[Kernel-packages] [Bug 2048966] Re: Fix soft lockup triggered by arm_smmu_mm_invalidate_range
** Description changed: [Problem] and [Fix] headings were added and the
quoted soft-lockup trace was re-indented; the description text is
otherwise identical to the version quoted in full in the previous
message.
[Kernel-packages] [Bug 2048966] [NEW] Fix soft lockup triggered by arm_smmu_mm_invalidate_range
Public bug reported:

When running an SVA case, a soft lockup is triggered in
arm_smmu_cmdq_issue_cmdlist() during munmap teardown (the full watchdog
trace is quoted in the "Fix Committed" message for this bug above).

** Affects: linux-nvidia-6.5 (Ubuntu)
     Importance: Undecided
         Status: New
[Kernel-packages] [Bug 2042697] Re: Pull request to address thermal core issues
** Changed in: linux-nvidia-6.2 (Ubuntu)
       Status: New => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-6.2 in Ubuntu.
https://bugs.launchpad.net/bugs/2042697

Title:
  Pull request to address thermal core issues

Status in linux-nvidia-6.2 package in Ubuntu:
  Fix Released

Bug description:
  The Grace development team has been testing a kernel newer than the
  6.2 Ubuntu kernel; when they run their thermal tests on a 6.2 kernel
  they run into failures. Investigation has turned up several missing
  kernel patches. These patches are clean cherry-picks and have been
  tested and confirmed to fix the thermal issues we are seeing.
[Kernel-packages] [Bug 1995606] Re: Upgrade thermald to 2.5.1 for Jammy (22.04)
Thanks team Canonical for this \o/

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to thermald in Ubuntu.
https://bugs.launchpad.net/bugs/1995606

Title:
  Upgrade thermald to 2.5.1 for Jammy (22.04)

Status in OEM Priority Project:
  Fix Released
Status in thermald package in Ubuntu:
  Fix Released
Status in thermald source package in Jammy:
  Fix Released

Bug description:
  [Justification]
  The purpose of this bug is to prevent regressions in the future.
  Automatic test scripts would be better for future SRUs and are still
  being planned.

  [Test case]
  For these CPU series, RPL/ADL/TGL/CML/CFL/KBL, the following tests
  will be run on machines in the CI lab:

  1. Run stress-ng, and observe the temperature/frequency/power with
     s-tui.
     - Temperatures should stay just below trip values.
     - Power/performance profiles should stay roughly the same between
       old thermald and new thermald (unless a change is specifically
       expected, e.g. to fix premature/insufficient throttling).
  2. Check that thermald can read rules from /dev/acpi_thermal_rel and
     generate the XML file in /etc/thermald/ correctly.
     - This depends on whether acpi_thermal_rel exists.
     - If the machine supports acpi_thermal_rel, "thermal-conf.xml.auto"
       should land in /etc/thermald/.
     - If not, a user-defined XML file can be created; then jump to (3).
     - Run thermald with --loglevel=debug and compare the log with the
       xml.auto file; check that the configuration is parsed correctly.
  3. Check that thermal-conf.xml and thermal-cpu-cdev-order.xml can be
     loaded correctly.
     - Run thermald with --loglevel=debug and compare the log with the
       XML files.
     - If parsed correctly, the configurations from the XML files appear
       in the log.
  4. Run the unit tests (the scripts are under the test folder), using
     emul_temp to simulate a high temperature, and check that thermald
     throttles the CPU through the related cooling device:
     - rapl.sh
     - intel_pstate.sh
     - powerclamp.sh
     - processor.sh
  5. Check that power/frequency is throttled once the temperature
     reaches the trip points of a thermal zone.
  6. Check whether the system is throttled even while the temperature is
     under the trip points (it should not be).

  [Where problems could occur]
  Since PL1 min/max is introduced, there may be cases that do not check
  the minimum of PL1 and therefore drive PL1 smaller and smaller,
  throttling the CPU. This could make machines behave like the old code
  that did not have PL1 min/max.
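Steps 5 and 6 of the test plan above compare live temperatures against trip points exposed in sysfs. A minimal readout sketch follows; the sysfs root is a parameter so the same function can be exercised against a fake tree, with /sys/class/thermal being the standard location:

```shell
# Sketch: dump each thermal zone's type, current temperature, and trip
# points. Values are in millidegrees Celsius, as the kernel reports them.
# $1 = thermal sysfs root (defaults to /sys/class/thermal).
dump_trips() {
    root=${1:-/sys/class/thermal}
    for tz in "$root"/thermal_zone*; do
        [ -d "$tz" ] || continue
        zt=$(cat "$tz/type" 2>/dev/null)
        temp=$(cat "$tz/temp" 2>/dev/null)
        echo "zone=$(basename "$tz") type=$zt temp=${temp}mC"
        for trip in "$tz"/trip_point_*_temp; do
            [ -f "$trip" ] || continue
            echo "  $(basename "$trip")=$(cat "$trip")mC"
        done
    done
}

dump_trips /sys/class/thermal
```

On a throttling machine, step 5 amounts to watching `temp` approach a `trip_point_*_temp` value while the frequency drops.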
[Kernel-packages] [Bug 2040526] Re: Backport RDMA DMABUF
** Changed in: linux (Ubuntu)
       Status: Incomplete => Won't Fix

** No longer affects: linux (Ubuntu Jammy)

** No longer affects: linux-nvidia (Ubuntu Jammy)

** Changed in: linux-nvidia (Ubuntu)
       Status: New => Fix Committed

** Changed in: linux-nvidia (Ubuntu)
     Assignee: (unassigned) => Ian May (ian-may)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2040526

Title:
  Backport RDMA DMABUF

Status in linux package in Ubuntu:
  Won't Fix
Status in linux-nvidia package in Ubuntu:
  Fix Committed

Bug description:
  SRU Justification:

  [Impact]
  * From Nvidia: "We are working on a high performance networking
    solution with real customers. That solution is being developed using
    the Ubuntu 22.04 LTS distro release and the distro kernel
    (lowlatency flavour). This “dma_buf” patchset consists of upstreamed
    patches that allow buffers to be shared between drivers thus
    enhancing performance while reducing copying of data. Our team is
    currently engaged in the development of a high-performance
    networking solution tailored to meet the demands of real-world
    customers. This cutting-edge solution is being crafted on the
    foundation of Ubuntu 22.04 LTS, utilizing the distribution's kernel,
    specifically the lowlatency flavor. At the heart of our innovation
    lies the transformative "dma_buf" patchset, comprising a series of
    patches that have been integrated into the upstream kernel in 5.16
    and 5.17. These patches introduce a groundbreaking capability:
    enabling the seamless sharing of buffers among various drivers. This
    not only bolsters the solution's performance but also minimizes the
    need for data copying, effectively enhancing efficiency across the
    board. The new functionality is isolated such that existing user
    will not execute these new code paths."

  Upstream References:
  https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/
  https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/

  The patch "[PATCH 1/4] net/mlx5: Add IFC bits for mkey ATS" is already
  in Jammy and was included in:
  "2a1e6097e9b9 UBUNTU: SAUCE: RDMA/core: Updated ib_peer_memory"

  [Test Plan]
  * Testing instructions are outlined in the SF case and have been
    exercised on in-house hardware and externally by Nvidia.

  [Where problems could occur?]
  * This introduces new code paths, so regression potential should be
    low.

  [Other Info]
  * SF#00370664
[Kernel-packages] [Bug 2040526] Re: Backport RDMA DMABUF
** Package changed: linux-nvidia (Ubuntu) => linux (Ubuntu)

** Also affects: linux-nvidia (Ubuntu)
   Importance: Undecided
       Status: New

(The full SRU justification for this bug is quoted in the previous
"Backport RDMA DMABUF" message.)
[Kernel-packages] [Bug 2040526] Re: Backport RDMA DMABUF
** Package changed: linux (Ubuntu) => linux-nvidia (Ubuntu)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia in Ubuntu.
https://bugs.launchpad.net/bugs/2040526

Title:
  Backport RDMA DMABUF

Status in linux-nvidia package in Ubuntu:
  Incomplete
Status in linux-nvidia source package in Jammy:
  Incomplete

(The full SRU justification for this bug is quoted in the earlier
"Backport RDMA DMABUF" message.)
[Kernel-packages] [Bug 2043059] Re: Installation errors out when installing in a chroot
I don't appear to have access to the image file used in the reproducer:

  http://bright-dev.nvidia.com/base-distributions/x86_64/dgx-os/dgx-os-6.1-trd4/DGXOS-6.1.0-DGX-H100.tar.gz

So instead I'm using the following image for reproducing:

  https://cloud-images.ubuntu.com/jammy/20231027/jammy-server-cloudimg-amd64.tar.gz

The error indicates to me that it can't find the root device. If I don't
bind mount /dev into my image, I'm able to recreate the error with both
linux-generic and linux-nvidia. With the host /dev mounted into the
chroot, both kernels are able to call mkinitramfs successfully.

Can you confirm that 'cm-chroot-sw-img' is mounting /dev?

  mount | grep /cm/images/dgx-h100-image/dev

If we are lucky and it happens to not be mounted, could you try the
following:

  sudo mount --bind /dev /cm/images/dgx-h100-image/dev
  sudo chroot /cm/images/dgx-h100-image /etc/kernel/postinst.d/kdump-tools 5.15.0-1040-nvidia

If /dev is correctly mounted and the problem persists, I'll probably
need a way to get that image tar to investigate further.

Thanks,
Ian

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia in Ubuntu.
https://bugs.launchpad.net/bugs/2043059

Title:
  Installation errors out when installing in a chroot

Status in linux-nvidia package in Ubuntu:
  New

Bug description:
  Processing triggers for linux-image-5.15.0-1040-nvidia (5.15.0-1040.40) ...
  /etc/kernel/postinst.d/dkms:
   * dkms: running auto installation service for kernel 5.15.0-1040-nvidia
     ...done.
  /etc/kernel/postinst.d/initramfs-tools:
  update-initramfs: Generating /boot/initrd.img-5.15.0-1040-nvidia
  cryptsetup: WARNING: Couldn't determine root device
  W: Couldn't identify type of root file system for fsck hook
  cp: cannot stat '/etc/iscsi/initiatorname.iscsi': No such file or directory
  /etc/kernel/postinst.d/kdump-tools:
  kdump-tools: Generating /var/lib/kdump/initrd.img-5.15.0-1040-nvidia
  mkinitramfs: failed to determine device for /
  mkinitramfs: workaround is MODULES=most, check:
  grep -r MODULES /var/lib/kdump/initramfs-tools
  Error please report bug on initramfs-tools
  Include the output of 'mount' and 'cat /proc/mounts'
  update-initramfs: failed for /var/lib/kdump/initrd.img-5.15.0-1040-nvidia with 1.
  run-parts: /etc/kernel/postinst.d/kdump-tools exited with return code 1
  dpkg: error processing package linux-image-5.15.0-1040-nvidia (--configure):
   installed linux-image-5.15.0-1040-nvidia package post-installation script subprocess returned error exit status 1
  Errors were encountered while processing:
   linux-image-5.15.0-1040-nvidia
  E: Sub-process /usr/bin/dpkg returned an error code (1)
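The /dev-mount check suggested in the reply above can be scripted. This is a sketch only: the image path from the report is used purely as an example, and the mounts table is a parameter so the logic can be verified against a fake table instead of the live /proc/self/mounts:

```shell
# Sketch: check whether <image-root>/dev appears as a mount point, the
# precondition for mkinitramfs finding the root device inside a chroot.
# $1 = image root, $2 = mounts table (defaults to /proc/self/mounts).
check_dev_mounted() {
    img=$1
    mtab=${2:-/proc/self/mounts}
    # Field 2 of each mounts line is the mount point.
    if awk -v mp="$img/dev" '$2 == mp { found = 1 } END { exit !found }' "$mtab"
    then
        echo "$img/dev is mounted"
    else
        echo "$img/dev is NOT mounted; try: sudo mount --bind /dev $img/dev"
        return 1
    fi
}

check_dev_mounted /cm/images/dgx-h100-image || true
```

Matching on the exact mount-point field avoids the false positives a plain `grep` can give when one path is a prefix of another.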
[Kernel-packages] [Bug 1738534] Re: Processor turbo disabled/throttled after suspend
** Summary changed:

- Processor turbo dsiabled/throttled after suspend
+ Processor turbo disabled/throttled after suspend

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to thermald in Ubuntu.
https://bugs.launchpad.net/bugs/1738534

Title:
  Processor turbo disabled/throttled after suspend

Status in linux package in Ubuntu:
  Confirmed
Status in thermald package in Ubuntu:
  New

Bug description:
  After suspending/resuming my laptop on battery power, I noticed choppy
  video playback. I've narrowed it down to the CPU being locked to lower
  frequencies after suspend/resume (only on battery). Plugging the
  laptop back in does not restore the normal performance, nor does
  suspend/resume after plugging it back in. The performance doesn't drop
  until after the suspend/resume. I'm not sure if it is _supposed_ to
  throttle when on battery, but either way the behaviour is wrong. Doing
  a full shutdown and restart restores the performance to normal.

  Prior to a suspend/resume cycle, cpupower reports:

  $ sudo cpupower frequency-info
  analyzing CPU 0:
    driver: intel_pstate
    CPUs which run at the same hardware frequency: 0
    CPUs which need to have their frequency coordinated by software: 0
    maximum transition latency: Cannot determine or is not supported.
    hardware limits: 800 MHz - 3.00 GHz
    available cpufreq governors: performance powersave
    current policy: frequency should be within 800 MHz and 3.00 GHz.
                    The governor "powersave" may decide which speed to
                    use within this range.
    current CPU frequency: Unable to call hardware
    current CPU frequency: 1.26 GHz (asserted by call to kernel)
    boost state support:
      Supported: yes
      Active: yes
      2800 MHz max turbo 4 active cores
      2800 MHz max turbo 3 active cores
      2800 MHz max turbo 2 active cores
      3000 MHz max turbo 1 active cores

  Afterwards, the frequency is clamped (cpufreq-set -r --max=3.0GHz has
  no effect) and turbo is disabled:

  $ sudo cpupower frequency-info
  analyzing CPU 0:
    driver: intel_pstate
    CPUs which run at the same hardware frequency: 0
    CPUs which need to have their frequency coordinated by software: 0
    maximum transition latency: Cannot determine or is not supported.
    hardware limits: 800 MHz - 3.00 GHz
    available cpufreq governors: performance powersave
    current policy: frequency should be within 800 MHz and 1.80 GHz.
                    The governor "powersave" may decide which speed to
                    use within this range.
    current CPU frequency: Unable to call hardware
    current CPU frequency: 950 MHz (asserted by call to kernel)
    boost state support:
      Supported: no
      Active: no
      2800 MHz max turbo 4 active cores
      2800 MHz max turbo 3 active cores
      2800 MHz max turbo 2 active cores
      3000 MHz max turbo 1 active cores

  Trying to re-enable turbo mode by setting the no_turbo intel_pstate
  /sys/ entry back to 0 is rejected:

  $ echo 0 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
  0
  tee: /sys/devices/system/cpu/intel_pstate/no_turbo: Operation not permitted

  However, these two commands *do* work around the problem, forcing
  turbo mode back on and then restoring the normal frequency range:

  sudo x86_energy_perf_policy --turbo-enable 1
  sudo cpufreq-set -r --min=0.8GHz --max=3.0GHz

  I also see this error in dmesg after some resumes (but the above
  problem sometimes happens without this error message):

  Dec 16 11:36:25 shauns-laptop kernel: intel_pstate: Turbo disabled by BIOS or unavailable on processor

  ProblemType: Bug
  DistroRelease: Ubuntu 17.10
  Package: linux-image-4.13.0-19-generic 4.13.0-19.22
  ProcVersionSignature: Ubuntu 4.13.0-19.22-generic 4.13.13
  Uname: Linux 4.13.0-19-generic x86_64
  ApportVersion: 2.20.7-0ubuntu3.6
  Architecture: amd64
  AudioDevicesInUse:
   USER PID ACCESS COMMAND
   /dev/snd/controlC0: shaun 1194 F pulseaudio
  CurrentDesktop: ubuntu:GNOME
  Date: Sat Dec 16 11:18:12 2017
  InstallationDate: Installed on 2017-12-14 (1 days ago)
  InstallationMedia: Ubuntu 17.10 "Artful Aardvark" - Release amd64 (20171018)
  Lsusb:
   Bus 001 Device 004: ID 2232:1024 Silicon Motion
   Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
   Bus 003 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
   Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  MachineType: SAMSUNG ELECTRONICS CO., LTD. 900X3C/900X3D/900X4C/900X4D
  ProcFB: 0 inteldrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/@/boot/vmlinuz-4.13.0-19-generic root=UUID=7352de8c-0017-44e1-81fb-0145ad9c1185 ro rootflags=subvol=@ quiet splash vt.handoff=7
  RelatedPackageVersions:
   linux-restricted-modules-4.13.0-19-generic N/A
   linux-backports-modules-4.13.0-19-generic N/A
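The inspection half of the workaround above (reading `no_turbo` and the cpufreq limits before deciding whether to force them back) can be reduced to a small read-only helper. A sketch follows, assuming the standard intel_pstate and cpufreq sysfs nodes; the paths are parameters so the function can be pointed at a test tree, and the restore commands from the report still need root and real hardware:

```shell
# Sketch: report the intel_pstate turbo flag and cpu0's cpufreq limits.
# Read-only; restoring state is left to the commands quoted in the report
# (x86_energy_perf_policy --turbo-enable 1 / cpufreq-set -r ...).
# $1 = intel_pstate sysfs dir, $2 = cpu0 cpufreq sysfs dir.
report_freq_state() {
    P=${1:-/sys/devices/system/cpu/intel_pstate}
    C=${2:-/sys/devices/system/cpu/cpu0/cpufreq}
    if [ -r "$P/no_turbo" ]; then
        echo "no_turbo=$(cat "$P/no_turbo")"
    else
        echo "no_turbo=unavailable"
    fi
    for f in scaling_min_freq scaling_max_freq; do
        if [ -r "$C/$f" ]; then
            echo "$f=$(cat "$C/$f")"     # value is in kHz
        else
            echo "$f=unavailable"
        fi
    done
}

report_freq_state
```

After a bad resume the report above would show `no_turbo=1` and a clamped `scaling_max_freq`, which is the signature worth checking before applying the workaround.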
[Kernel-packages] [Bug 1901266] Re: system sluggish, thermal keep frequency at 400MHz
This bug report has not seen any further follow-up for 2+ years. Closing
it. If it is still not fixed, please re-open this issue.

** Changed in: thermald (Ubuntu)
       Status: Incomplete => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to thermald in Ubuntu.
https://bugs.launchpad.net/bugs/1901266

Title:
  system sluggish, thermal keep frequency at 400MHz

Status in thermald package in Ubuntu:
  Fix Released

Bug description:
  This morning I upgraded to 20.10 from 20.04. The system was quite slow
  although I have a fast machine. My virtual Windows 10 on VirtualBox
  became unusable. When I tried to have the virtual machine open, I
  could not participate properly in a Zoom call (I could still hear the
  people but they said that my voice was very choppy). On 20.04 I was
  super-happy with the speed and I could have as many apps running as I
  wanted.

  Based on Google I started looking at

  % journalctl --follow

  and this showed quite a few errors, but none repeating often enough to
  explain it. Then I googled some more and found that /boot/efi was
  writing and reading. Then I googled some more and thought I had
  trouble with GNOME. So I reset it to default

  % dconf reset -f /org/gnome/

  and disabled the extensions. This made things slightly better but by
  far not acceptable.

  After lots of searching I checked the frequency of the CPUs and it was
  at the minimum, 400 MHz (as shown by i7z and also other tools). I
  tried setting the governor with cpufreqctl and similar methods but
  this did not change anything. I then found an old bug
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1769236 and tried

  % sudo systemctl stop thermald

  This seems to work. After a few seconds the frequency shown in i7z
  goes to ~4500 MHz and the virtual machine seems to work fine.

  ProblemType: Bug
  DistroRelease: Ubuntu 20.10
  Package: thermald 2.3-4
  ProcVersionSignature: Ubuntu 5.8.0-25.26-generic 5.8.14
  Uname: Linux 5.8.0-25-generic x86_64
  NonfreeKernelModules: wl
  ApportVersion: 2.20.11-0ubuntu50
  Architecture: amd64
  CasperMD5CheckResult: skip
  CurrentDesktop: ubuntu:GNOME
  Date: Sat Oct 24 01:39:40 2020
  DistributionChannelDescriptor:
   # This is the distribution channel descriptor for the OEM CDs
   # For more information see http://wiki.ubuntu.com/DistributionChannelDescriptor
   canonical-oem-somerville-bionic-amd64-20180608-47+merion+X66
  InstallationDate: Installed on 2019-09-27 (392 days ago)
  InstallationMedia: Ubuntu 18.04 "Bionic" - Build amd64 LIVE Binary 20180608-09:38
  SourcePackage: thermald
  UpgradeStatus: Upgraded to groovy on 2020-10-23 (0 days ago)
  mtime.conffile..etc.thermald.thermal-conf.xml: 2020-10-24T01:35:59.781865
[Kernel-packages] [Bug 2041670] [NEW] tmpfs: O_DIRECT | O_CREATE open reports open failure but actually creates a file.
Public bug reported:

Creating a file on tmpfs with open(filename, O_RDWR | O_DIRECT |
O_CREAT, 0666) reports an open failure, error EINVAL, but still creates
the file. The file should not be created if we hit such an error.

Tested and fails on:
  mantic amd64: 6.5.0-10-generic
  lunar amd64: 6.2.0-35-generic
  jammy amd64: 5.15.0-generic
  focal: 5.4.0-165-generic
  bionic: 4.15.0-213-generic
  trusty: 4.4.0-148-generic

Reproduce with:

  sudo mkdir /mnt/tmpfs
  sudo mount -t tmpfs -o size=1G,nr_inodes=10k,mode=777 tmpfs /mnt/tmpfs
  sudo chmod 666 /mnt/tmpfs
  gcc reproducer.c -o reproducer
  sudo ./reproducer

Run the attached program. It reports an open failure (errno 22, EINVAL)
but still manages to create the file. Note this was originally
discovered by running stress-ng on tmpfs with the open stressor:

  stress-ng --open 1

** Affects: linux
   Importance: Unknown
       Status: Unknown

** Affects: linux (Ubuntu)
   Importance: Undecided
       Status: New

** Attachment added: "C source to reproduce the issue"
   https://bugs.launchpad.net/bugs/2041670/+attachment/5713768/+files/reproducer.c

** Bug watch added: Linux Kernel Bug Tracker #218049
   https://bugzilla.kernel.org/show_bug.cgi?id=218049

** Also affects: linux via
   https://bugzilla.kernel.org/show_bug.cgi?id=218049
   Importance: Unknown
   Status: Unknown

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2041670

Title:
  tmpfs: O_DIRECT | O_CREATE open reports open failure but actually
  creates a file.

Status in Linux:
  Unknown
Status in linux package in Ubuntu:
  New
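The reproducer's flag combination can also be approximated from the shell, since GNU dd with oflag=direct opens its output file with O_DIRECT, and writing to a new file adds O_CREAT, the same combination as the attached C reproducer. A sketch follows; on an affected kernel and a tmpfs directory it should report the open failing while the file is nevertheless created:

```shell
# Sketch: probe whether an O_DIRECT|O_CREAT-style open on a directory
# fails yet still leaves the file behind. $1 = directory to test
# (/dev/shm is tmpfs on most Linux systems).
probe_odirect_creat() {
    f=$1/odirect_probe.$$
    rm -f "$f"
    if dd if=/dev/zero of="$f" bs=512 count=1 oflag=direct 2>/dev/null; then
        echo "open-ok"
    else
        echo "open-failed"
    fi
    # The bug: "open-failed" paired with "file-created".
    [ -e "$f" ] && echo "file-created" || echo "no-file"
    rm -f "$f"
}

[ -d /dev/shm ] && probe_odirect_creat /dev/shm || probe_odirect_creat /tmp
```

The outcome is kernel-dependent: a fixed kernel either rejects the open without creating the file or accepts the flag combination outright, so the probe reports rather than asserts.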
[Kernel-packages] [Bug 2040526] Re: Backport RDMA DMABUF
** Changed in: linux (Ubuntu) Status: Incomplete => New ** Description changed: SRU Justification: [Impact] - From Nvidia: + *From Nvidia: "We are working on a high performance networking solution with real customers. That solution is being developed using the Ubuntu 22.04 LTS distro release and the distro kernel (lowlatency flavour). This “dma_buf” patchset consists of upstreamed patches that allow buffers to be shared between drivers thus enhancing performance while reducing copying of data. Our team is currently engaged in the development of a high-performance networking solution tailored to meet the demands of real-world customers. This cutting-edge solution is being crafted on the foundation of Ubuntu 22.04 LTS, utilizing the distribution's kernel, specifically the lowlatency flavor. At the heart of our innovation lies the transformative "dma_buf" patchset, comprising a series of patches that have been integrated into the upstream kernel in 5.16 and 5.17. These patches introduce a groundbreaking capability: enabling the seamless sharing of buffers among various drivers. This not only bolsters the solution's performance but also minimizes the need for data copying, effectively enhancing efficiency across the board. The new functionality is isolated such that existing user will not execute these new code paths." Upstream Reference https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/ https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/ The patch "[PATCH 1/4] net/mlx5: Add IFC bits for mkey ATS" is already in Jammy and was included in: "2a1e6097e9b9 UBUNTU: SAUCE: RDMA/core: Updated ib_peer_memory" [Test Plan] - Testing instructions are outlined in the SF case and has been tested on + *Testing instructions are outlined in the SF case and has been tested on in house hardware and externally by Nvidia. [Where problems could occur?] - This introduces new code paths so regression potential should be low. 
+ *This introduces new code paths so regression potential should be low. [Other Info] - SF#00370664 + + *SF#00370664 ** Description changed: SRU Justification: [Impact] - *From Nvidia: + * From Nvidia: "We are working on a high performance networking solution with real customers. That solution is being developed using the Ubuntu 22.04 LTS distro release and the distro kernel (lowlatency flavour). This “dma_buf” patchset consists of upstreamed patches that allow buffers to be shared between drivers thus enhancing performance while reducing copying of data. Our team is currently engaged in the development of a high-performance networking solution tailored to meet the demands of real-world customers. This cutting-edge solution is being crafted on the foundation of Ubuntu 22.04 LTS, utilizing the distribution's kernel, specifically the lowlatency flavor. At the heart of our innovation lies the transformative "dma_buf" patchset, comprising a series of patches that have been integrated into the upstream kernel in 5.16 and 5.17. These patches introduce a groundbreaking capability: enabling the seamless sharing of buffers among various drivers. This not only bolsters the solution's performance but also minimizes the need for data copying, effectively enhancing efficiency across the board. The new functionality is isolated such that existing user will not execute these new code paths." Upstream Reference https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/ https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/ The patch "[PATCH 1/4] net/mlx5: Add IFC bits for mkey ATS" is already in Jammy and was included in: "2a1e6097e9b9 UBUNTU: SAUCE: RDMA/core: Updated ib_peer_memory" [Test Plan] - *Testing instructions are outlined in the SF case and has been tested on - in house hardware and externally by Nvidia. + * Testing instructions are outlined in the SF case and has been tested + on in house hardware and externally by Nvidia. 
[Where problems could occur?] - *This introduces new code paths so regression potential should be low. + * This introduces new code paths so regression potential should be low. [Other Info] - *SF#00370664 + * SF#00370664 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/2040526 Title: Backport RDMA DMABUF Status in linux package in Ubuntu: New Status in linux source package in Jammy: New Bug description: SRU Justification: [Impact] * From Nvidia: "We are working on a high performance networking solution with real customers. That solution is being developed using the Ubuntu 22.04 LTS distro release and the distro kernel (lowlatency flavour). This “dma_buf” patchset consists
[Kernel-packages] [Bug 2040526] Re: Backport RDMA DMABUF
** Description changed: SRU Justification: [Impact] From Nvidia: "We are working on a high performance networking solution with real customers. That solution is being developed using the Ubuntu 22.04 LTS distro release and the distro kernel (lowlatency flavour). This “dma_buf” patchset consists of upstreamed patches that allow buffers to be shared between drivers thus enhancing performance while reducing copying of data. Our team is currently engaged in the development of a high-performance networking solution tailored to meet the demands of real-world customers. This cutting-edge solution is being crafted on the foundation of Ubuntu 22.04 LTS, utilizing the distribution's kernel, specifically the lowlatency flavor. At the heart of our innovation lies the transformative "dma_buf" patchset, comprising a series of patches that have been integrated into the upstream kernel in 5.16 and 5.17. These patches introduce a groundbreaking capability: enabling the seamless sharing of buffers among various drivers. This not only bolsters the solution's performance but also minimizes the need for data copying, effectively enhancing efficiency across the board. The new functionality is isolated such that existing user will not execute these new code paths." Upstream Reference https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/ https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/ - The patch "[PATCH 1/4] net/mlx5: Add IFC bits for mkey ATS" is already in Jammy and was included in: + The patch "[PATCH 1/4] net/mlx5: Add IFC bits for mkey ATS" is already in Jammy and was included in: "2a1e6097e9b9 UBUNTU: SAUCE: RDMA/core: Updated ib_peer_memory" [Test Plan] Testing instructions are outlined in the SF case and has been tested on - local hardware and also by Nvidia. + in house hardware and externally by Nvidia. [Where problems could occur?] This introduces new code paths so regression potential should be low. 
[Other Info] SF#00370664 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/2040526 Title: Backport RDMA DMABUF Status in linux package in Ubuntu: Incomplete Status in linux source package in Jammy: New Bug description: SRU Justification: [Impact] From Nvidia: "We are working on a high performance networking solution with real customers. That solution is being developed using the Ubuntu 22.04 LTS distro release and the distro kernel (lowlatency flavour). This “dma_buf” patchset consists of upstreamed patches that allow buffers to be shared between drivers thus enhancing performance while reducing copying of data. Our team is currently engaged in the development of a high-performance networking solution tailored to meet the demands of real-world customers. This cutting-edge solution is being crafted on the foundation of Ubuntu 22.04 LTS, utilizing the distribution's kernel, specifically the lowlatency flavor. At the heart of our innovation lies the transformative "dma_buf" patchset, comprising a series of patches that have been integrated into the upstream kernel in 5.16 and 5.17. These patches introduce a groundbreaking capability: enabling the seamless sharing of buffers among various drivers. This not only bolsters the solution's performance but also minimizes the need for data copying, effectively enhancing efficiency across the board. The new functionality is isolated such that existing user will not execute these new code paths." 
Upstream Reference https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/ https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/ The patch "[PATCH 1/4] net/mlx5: Add IFC bits for mkey ATS" is already in Jammy and was included in: "2a1e6097e9b9 UBUNTU: SAUCE: RDMA/core: Updated ib_peer_memory" [Test Plan] Testing instructions are outlined in the SF case and has been tested on in house hardware and externally by Nvidia. [Where problems could occur?] This introduces new code paths so regression potential should be low. [Other Info] SF#00370664 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2040526/+subscriptions
[Kernel-packages] [Bug 2040526] Re: Backport RDMA DMABUF
** Description changed: SRU Justification: [Impact] From Nvidia: "We are working on a high performance networking solution with real customers. That solution is being developed using the Ubuntu 22.04 LTS distro release and the distro kernel (lowlatency flavour). This “dma_buf” patchset consists of upstreamed patches that allow buffers to be shared between drivers thus enhancing performance while reducing copying of data. Our team is currently engaged in the development of a high-performance networking solution tailored to meet the demands of real-world customers. This cutting-edge solution is being crafted on the foundation of Ubuntu 22.04 LTS, utilizing the distribution's kernel, specifically the lowlatency flavor. At the heart of our innovation lies the transformative "dma_buf" patchset, comprising a series of patches that have been integrated into the upstream kernel in 5.16 and 5.17. These patches introduce a groundbreaking capability: enabling the seamless sharing of buffers among various drivers. This not only bolsters the solution's performance but also minimizes the need for data copying, effectively enhancing efficiency across the board. The new functionality is isolated such that existing user will not execute these new code paths." Upstream Reference + https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/ https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/ + The patch "[PATCH 1/4] net/mlx5: Add IFC bits for mkey ATS" is already in Jammy and was included in: + "2a1e6097e9b9 UBUNTU: SAUCE: RDMA/core: Updated ib_peer_memory" [Test Plan] Testing instructions are outlined in the SF case and has been tested on local hardware and also by Nvidia. [Where problems could occur?] This introduces new code paths so regression potential should be low. [Other Info] SF#00370664 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. 
https://bugs.launchpad.net/bugs/2040526 Title: Backport RDMA DMABUF Status in linux package in Ubuntu: Incomplete Status in linux source package in Jammy: New Bug description: SRU Justification: [Impact] From Nvidia: "We are working on a high performance networking solution with real customers. That solution is being developed using the Ubuntu 22.04 LTS distro release and the distro kernel (lowlatency flavour). This “dma_buf” patchset consists of upstreamed patches that allow buffers to be shared between drivers thus enhancing performance while reducing copying of data. Our team is currently engaged in the development of a high-performance networking solution tailored to meet the demands of real-world customers. This cutting-edge solution is being crafted on the foundation of Ubuntu 22.04 LTS, utilizing the distribution's kernel, specifically the lowlatency flavor. At the heart of our innovation lies the transformative "dma_buf" patchset, comprising a series of patches that have been integrated into the upstream kernel in 5.16 and 5.17. These patches introduce a groundbreaking capability: enabling the seamless sharing of buffers among various drivers. This not only bolsters the solution's performance but also minimizes the need for data copying, effectively enhancing efficiency across the board. The new functionality is isolated such that existing user will not execute these new code paths." Upstream Reference https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/ https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/ The patch "[PATCH 1/4] net/mlx5: Add IFC bits for mkey ATS" is already in Jammy and was included in: "2a1e6097e9b9 UBUNTU: SAUCE: RDMA/core: Updated ib_peer_memory" [Test Plan] Testing instructions are outlined in the SF case and has been tested on local hardware and also by Nvidia. [Where problems could occur?] This introduces new code paths so regression potential should be low. 
[Other Info] SF#00370664 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2040526/+subscriptions
[Kernel-packages] [Bug 2040526] Re: Backport DMABUF functionality
** Description changed: SRU Justification: [Impact] - Backport RDMA DMABUF functionality + Backport RDMA DMABUF - Nvidia is working on a high performance networking solution with real + From Nvidia: + + "We are working on a high performance networking solution with real customers. That solution is being developed using the Ubuntu 22.04 LTS distro release and the distro kernel (lowlatency flavour). This “dma_buf” patchset consists of upstreamed patches that allow buffers to be shared between drivers thus enhancing performance while reducing copying of data. Our team is currently engaged in the development of a high-performance networking solution tailored to meet the demands of real-world customers. This cutting-edge solution is being crafted on the foundation of Ubuntu 22.04 LTS, utilizing the distribution's kernel, specifically the lowlatency flavor. At the heart of our innovation lies the transformative "dma_buf" patchset, comprising a series of patches that have been integrated into the upstream kernel in 5.16 and 5.17. These patches introduce a groundbreaking capability: enabling the seamless sharing of buffers among various drivers. This not only bolsters the solution's performance but also minimizes the need for data copying, effectively enhancing efficiency across the board. The new functionality is isolated such that existing user will not - execute these new code paths. + execute these new code paths." - * First 3 patches adds a new api to the RDMA subsystem that allows drivers to get a pinned dmabuf memory - region without requiring an implementation of the move_notify callback. - + Upstream Reference https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/ - - * The remaining patches add support for DMABUF when creating a devx umem. devx umems - are quite similar to MR's execpt they cannot be revoked, so this uses the - dmabuf pinned memory flow. Several mlx5dv flows require umem and cannot - work with MR. 
- - https://lore.kernel.org/all/0-v1-bd147097458e+ede- - umem_dmabuf_...@nvidia.com/ + https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/ [Test Plan] - SW Configuration: - • Download CUDA 12.2 run file (https://developer.nvidia.com/cuda-downloads?target_os=Linux_arch=x86_64=Ubuntu_version=20.04_type=runfile_local) - • Install using kernel-open i.e. #sh ./cuda_12.2.2_535.104.05_linux.run -m=kernel-open - • Clone perftest from https://github.com/linux-rdma/perftest. - • cd perftest - • export LD_LIBRARY_PATH=/usr/local/cuda-12.2/lib64:$LD_LIBRARY_PATH - • export LIBRARY_PATH=/usr/local/cuda-12.2/lib64:$LIBRARY_PATH - • run: ./autogen.sh ; ./configure CUDA_H_PATH=/usr/local/cuda/include/cuda.h; make - - # Start Server - $ ./ib_write_bw -d mlx5_2 -F --use_cuda=0 --use_cuda_dmabuf - - #Start Client - $ ./ib_write_bw -d mlx5_3 -F --use_cuda=1 --use_cuda_dmabuf localhost + Testing instructions are outlined in the SF case and has been tested on + local hardware and also by Nvidia. [Where problems could occur?] + + This introduces new code paths so regression potential should be low. + + [Other Info] + SF#00370664 ** Description changed: SRU Justification: [Impact] - - Backport RDMA DMABUF From Nvidia: "We are working on a high performance networking solution with real customers. That solution is being developed using the Ubuntu 22.04 LTS distro release and the distro kernel (lowlatency flavour). This “dma_buf” patchset consists of upstreamed patches that allow buffers to be shared between drivers thus enhancing performance while reducing copying of data. Our team is currently engaged in the development of a high-performance networking solution tailored to meet the demands of real-world customers. This cutting-edge solution is being crafted on the foundation of Ubuntu 22.04 LTS, utilizing the distribution's kernel, specifically the lowlatency flavor. 
At the heart of our innovation lies the transformative "dma_buf" patchset, comprising a series of patches that have been integrated into the upstream kernel in 5.16 and 5.17. These patches introduce a groundbreaking capability: enabling the seamless sharing of buffers among various drivers. This not only bolsters the solution's performance but also minimizes the need for data copying, effectively enhancing efficiency across the board. The new functionality is isolated such that existing user will not execute these new code paths." Upstream Reference https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/ https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/ [Test Plan] Testing instructions are outlined in the SF case and has been tested on local hardware and also by Nvidia. [Where problems could occur?] This introduces new code paths so regression potential should be low. [Other
[Kernel-packages] [Bug 2040526] [NEW] Backport DMABUF functionality
Public bug reported:

SRU Justification:

[Impact]

Backport RDMA DMABUF functionality

Nvidia is working on a high performance networking solution with real customers. That solution is being developed using the Ubuntu 22.04 LTS distro release and the distro kernel (lowlatency flavour). This “dma_buf” patchset consists of upstreamed patches that allow buffers to be shared between drivers, thus enhancing performance while reducing copying of data.

Our team is currently engaged in the development of a high-performance networking solution tailored to meet the demands of real-world customers. This cutting-edge solution is being crafted on the foundation of Ubuntu 22.04 LTS, utilizing the distribution's kernel, specifically the lowlatency flavor. At the heart of our innovation lies the transformative "dma_buf" patchset, comprising a series of patches that have been integrated into the upstream kernel in 5.16 and 5.17. These patches introduce a groundbreaking capability: enabling the seamless sharing of buffers among various drivers. This not only bolsters the solution's performance but also minimizes the need for data copying, effectively enhancing efficiency across the board.

The new functionality is isolated such that existing users will not execute these new code paths.

* The first 3 patches add a new API to the RDMA subsystem that allows drivers to get a pinned dmabuf memory region without requiring an implementation of the move_notify callback.
  https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/

* The remaining patches add support for DMABUF when creating a devx umem. devx umems are quite similar to MRs except they cannot be revoked, so this uses the dmabuf pinned memory flow. Several mlx5dv flows require umem and cannot work with MR.
https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/

[Test Plan]

SW Configuration:
• Download CUDA 12.2 run file (https://developer.nvidia.com/cuda-downloads?target_os=Linux_arch=x86_64=Ubuntu_version=20.04_type=runfile_local)
• Install using kernel-open, i.e. #sh ./cuda_12.2.2_535.104.05_linux.run -m=kernel-open
• Clone perftest from https://github.com/linux-rdma/perftest.
• cd perftest
• export LD_LIBRARY_PATH=/usr/local/cuda-12.2/lib64:$LD_LIBRARY_PATH
• export LIBRARY_PATH=/usr/local/cuda-12.2/lib64:$LIBRARY_PATH
• run: ./autogen.sh ; ./configure CUDA_H_PATH=/usr/local/cuda/include/cuda.h; make

# Start Server
$ ./ib_write_bw -d mlx5_2 -F --use_cuda=0 --use_cuda_dmabuf

# Start Client
$ ./ib_write_bw -d mlx5_3 -F --use_cuda=1 --use_cuda_dmabuf localhost

[Where problems could occur?]

** Affects: linux (Ubuntu)
   Importance: Undecided
   Status: Incomplete

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2040526

Title:
  Backport DMABUF functionality

Status in linux package in Ubuntu: Incomplete

Bug description:
  SRU Justification:

  [Impact]

  Backport RDMA DMABUF functionality

  Nvidia is working on a high performance networking solution with real customers. That solution is being developed using the Ubuntu 22.04 LTS distro release and the distro kernel (lowlatency flavour). This “dma_buf” patchset consists of upstreamed patches that allow buffers to be shared between drivers thus enhancing performance while reducing copying of data.

  Our team is currently engaged in the development of a high-performance networking solution tailored to meet the demands of real-world customers. This cutting-edge solution is being crafted on the foundation of Ubuntu 22.04 LTS, utilizing the distribution's kernel, specifically the lowlatency flavor.
  At the heart of our innovation lies the transformative "dma_buf" patchset, comprising a series of patches that have been integrated into the upstream kernel in 5.16 and 5.17. These patches introduce a groundbreaking capability: enabling the seamless sharing of buffers among various drivers. This not only bolsters the solution's performance but also minimizes the need for data copying, effectively enhancing efficiency across the board.

  The new functionality is isolated such that existing users will not execute these new code paths.

  * The first 3 patches add a new API to the RDMA subsystem that allows drivers to get a pinned dmabuf memory region without requiring an implementation of the move_notify callback.
    https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/

  * The remaining patches add support for DMABUF when creating a devx umem. devx umems are quite similar to MRs except they cannot be revoked, so this uses the dmabuf pinned memory flow. Several mlx5dv flows require umem and cannot work with MR.
    https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/

  [Test Plan]

  SW Configuration:
  • Download CUDA 12.2 run file
[Kernel-packages] [Bug 2038099] Re: Enable building and signing of the nvidia-fs out-of-tree kernel module.
** Also affects: linux-nvidia-6.2 (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Changed in: linux-nvidia-6.2 (Ubuntu)
   Status: New => Fix Committed

** Changed in: linux-nvidia-6.2 (Ubuntu Jammy)
   Status: New => Fix Committed

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-nvidia-6.2 in Ubuntu.
https://bugs.launchpad.net/bugs/2038099

Title:
  Enable building and signing of the nvidia-fs out-of-tree kernel module.

Status in linux-nvidia-6.2 package in Ubuntu: Fix Committed
Status in linux-nvidia-6.2 source package in Jammy: Fix Committed

Bug description:
  [Issue]

  The nvidia-fs kernel module is a must-have for Nvidia optimized kernels. There is now a version that is compatible with the Grace processor. Integrate the changes necessary to build and sign this out-of-tree kernel module.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.2/+bug/2038099/+subscriptions
[Kernel-packages] [Bug 2033685] Re: Pull-request to address ARM CoreSight PMU issues
** Changed in: linux-nvidia-6.2 (Ubuntu Jammy)
   Status: New => Fix Committed

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-nvidia-6.2 in Ubuntu.
https://bugs.launchpad.net/bugs/2033685

Title:
  Pull-request to address ARM CoreSight PMU issues

Status in linux-nvidia-6.2 package in Ubuntu: Fix Committed
Status in linux-nvidia-6.2 source package in Jammy: Fix Committed

Bug description:
  [issue]

  This patch set addresses several CoreSight PMU issues. These are all upstream patches.

  Commit Summary
  2940a5e perf: arm_cspmu: Fix variable dereference warning
  06f6951 perf: arm_cspmu: Set irq affinitiy only if overflow interrupt is used
  292771d perf/arm_cspmu: Fix event attribute type
  6992931 ACPI/APMT: Don't register invalid resource
  48f4b92 perf/arm_cspmu: Clean up ACPI dependency
  7da1852 perf/arm_cspmu: Decouple APMT dependency
  d3d56a4 perf: arm_cspmu: Add missing MODULE_DEVICE_TABLE

  File Changes (4 files)
  M drivers/acpi/arm64/apmt.c (10)
  M drivers/perf/arm_cspmu/Kconfig (3)
  M drivers/perf/arm_cspmu/arm_cspmu.c (95)
  M drivers/perf/arm_cspmu/arm_cspmu.h (5)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.2/+bug/2033685/+subscriptions
[Kernel-packages] [Bug 2037688] Re: Pull-request to address TPM bypass issue
** Changed in: linux-nvidia-6.2 (Ubuntu Jammy)
   Status: New => Fix Committed

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-nvidia-6.2 in Ubuntu.
https://bugs.launchpad.net/bugs/2037688

Title:
  Pull-request to address TPM bypass issue

Status in linux-nvidia-6.2 package in Ubuntu: Fix Committed
Status in linux-nvidia-6.2 source package in Jammy: Fix Committed

Bug description:
  NVIDIA: [Config]: Ensure the TPM is available before IMA initializes

  Set the following configs:
  CONFIG_SPI_TEGRA210_QUAD=y
  CONFIG_TCG_TIS_SPI=y

  On Grace systems, the IMA driver emits the following log:

  ima: No TPM chip found, activating TPM-bypass!

  This occurs because the IMA driver initializes before we are able to detect the TPM. This will always be the case when the drivers required to communicate with the TPM, spi_tegra210_quad and tpm_tis_spi, are built as modules. Having these drivers as built-ins ensures that the TPM is available before the IMA driver initializes.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.2/+bug/2037688/+subscriptions
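Read as a kernel config fragment, the fix pins the whole TPM transport chain into the kernel image rather than leaving it in modules (the comments below are my gloss on the rationale, not part of the original report):

```
# Build these into the image (=y) rather than as modules (=m), so the
# TPM is reachable before the built-in IMA driver initializes:
CONFIG_SPI_TEGRA210_QUAD=y   # Tegra QSPI controller the TPM sits behind
CONFIG_TCG_TIS_SPI=y         # TPM TIS interface over SPI
```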
[Kernel-packages] [Bug 2033685] Re: Pull-request to address ARM CoreSight PMU issues
** Also affects: linux-nvidia-6.2 (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Changed in: linux-nvidia-6.2 (Ubuntu)
   Status: New => Fix Committed

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-nvidia-6.2 in Ubuntu.
https://bugs.launchpad.net/bugs/2033685

Title:
  Pull-request to address ARM CoreSight PMU issues

Status in linux-nvidia-6.2 package in Ubuntu: Fix Committed
Status in linux-nvidia-6.2 source package in Jammy: New

Bug description:
  [issue]

  This patch set addresses several CoreSight PMU issues. These are all upstream patches.

  Commit Summary
  2940a5e perf: arm_cspmu: Fix variable dereference warning
  06f6951 perf: arm_cspmu: Set irq affinitiy only if overflow interrupt is used
  292771d perf/arm_cspmu: Fix event attribute type
  6992931 ACPI/APMT: Don't register invalid resource
  48f4b92 perf/arm_cspmu: Clean up ACPI dependency
  7da1852 perf/arm_cspmu: Decouple APMT dependency
  d3d56a4 perf: arm_cspmu: Add missing MODULE_DEVICE_TABLE

  File Changes (4 files)
  M drivers/acpi/arm64/apmt.c (10)
  M drivers/perf/arm_cspmu/Kconfig (3)
  M drivers/perf/arm_cspmu/arm_cspmu.c (95)
  M drivers/perf/arm_cspmu/arm_cspmu.h (5)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.2/+bug/2033685/+subscriptions
[Kernel-packages] [Bug 2037688] Re: Pull-request to address TPM bypass issue
** Also affects: linux-nvidia-6.2 (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Changed in: linux-nvidia-6.2 (Ubuntu)
   Status: New => Fix Committed

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-nvidia-6.2 in Ubuntu.
https://bugs.launchpad.net/bugs/2037688

Title:
  Pull-request to address TPM bypass issue

Status in linux-nvidia-6.2 package in Ubuntu: Fix Committed
Status in linux-nvidia-6.2 source package in Jammy: New

Bug description:
  NVIDIA: [Config]: Ensure the TPM is available before IMA initializes

  Set the following configs:
  CONFIG_SPI_TEGRA210_QUAD=y
  CONFIG_TCG_TIS_SPI=y

  On Grace systems, the IMA driver emits the following log:

  ima: No TPM chip found, activating TPM-bypass!

  This occurs because the IMA driver initializes before we are able to detect the TPM. This will always be the case when the drivers required to communicate with the TPM, spi_tegra210_quad and tpm_tis_spi, are built as modules. Having these drivers as built-ins ensures that the TPM is available before the IMA driver initializes.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.2/+bug/2037688/+subscriptions
[Kernel-packages] [Bug 2038768] Re: arm64: linux: stress-ng filename stressor crashes kernel
Unable to collect data via apport-collect due to VPN restrictions.

** Changed in: linux (Ubuntu Mantic)
   Status: Incomplete => Confirmed

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2038768

Title:
  arm64: linux: stress-ng filename stressor crashes kernel

Status in linux package in Ubuntu: Confirmed
Status in linux source package in Mantic: Confirmed

Bug description:
  Running latest Ubuntu mantic (ext4 file system) with kernel:
  Linux mantic-arm64 6.5.0-7-generic #7-Ubuntu SMP PREEMPT_DYNAMIC Thu Sep 28 19:12:05 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

  How to reproduce:

  Fire up a 24-vCPU ARM64 QEMU instance with Ubuntu Mantic Server. Install the latest stress-ng from the git repo:

  sudo apt-get update
  sudo apt-get build-dep stress-ng
  git clone git://github.com/ColinIanKing/stress-ng
  cd stress-ng
  make clean
  make -j 24
  make verify-test-all

  When we reach the filename stressor the kernel crashes as follows:

  [ 902.594715] kernel BUG at fs/dcache.c:2050!
  [ 902.598205] Internal error: Oops - BUG: f2000800 [#1] SMP
  [ 902.603127] Modules linked in: dccp_ipv4 dccp atm vfio_iommu_type1 vfio iommufd cmac algif_rng twofish_generic twofish_common serpent_generic fcrypt cast6_generic cast5_generic cast_common camellia_generic blowfish_generic blowfish_common aes_arm64 algif_skcipher algif_hash aria_generic sm4_generic sm4_neon ccm aes_ce_ccm des_generic libdes authenc aegis128 algif_aead af_alg cfg80211 binfmt_misc nls_iso8859_1 dm_multipath drm efi_pstore dmi_sysfs qemu_fw_cfg ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_ce polyval_ce polyval_generic ghash_ce sm4 sha2_ce sha256_arm64 sha1_ce arm_smccc_trng xhci_pci virtio_rng xhci_pci_renesas aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
  [ 902.689941] CPU: 1 PID: 91317 Comm: stress-ng-filen Not tainted 6.5.0-7-generic #7-Ubuntu
  [ 902.699281] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
  [ 902.706902] pstate: 4045 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  [ 902.715488] pc : d_instantiate_new+0xa8/0xc8
  [ 902.720889] lr : ext4_add_nondir+0x10c/0x160
  [ 902.725702] sp : 80008b6d3930
  [ 902.729390] x29: 80008b6d3930 x28: x27: bd164e51a980
  [ 902.738705] x26: 6789f3b68f20 x25: 8180 x24: 678a541f7968
  [ 902.747003] x23: 6789f3b68f00 x22: 80008b6d39b0 x21: 678a6a25bcb0
  [ 902.755776] x20: 678a36f8f028 x19: x18: 80008af45068
  [ 902.764647] x17: x16: x15: ecececececececec
  [ 902.773135] x14: ecececececececec x13: ecececececececec x12: ecececececececec
  [ 902.781386] x11: ecececececececec x10: ecececececececec x9 : bd164d5990bc
  [ 902.789346] x8 : x7 : x6 :
  [ 902.798564] x5 : x4 : x3 :
  [ 902.806851] x2 : bd16504e4ce0 x1 : 678a36f8f028 x0 : 6789f3b68f00
  [ 902.815544] Call trace:
  [ 902.818870] d_instantiate_new+0xa8/0xc8
  [ 902.823523] ext4_create+0x120/0x238
  [ 902.827716] lookup_open.isra.0+0x480/0x4d0
  [ 902.832480] open_last_lookups+0x160/0x3b0
  [ 902.837060] path_openat+0xa0/0x2a0
  [ 902.840975] do_filp_open+0xa8/0x180
  [ 902.845582] do_sys_openat2+0xe8/0x128
  [ 902.850426] __arm64_sys_openat+0x70/0xe0
  [ 902.854952] invoke_syscall+0x7c/0x128
  [ 902.859155] el0_svc_common.constprop.0+0x5c/0x168
  [ 902.864979] do_el0_svc+0x38/0x68
  [ 902.869364] el0_svc+0x30/0xe0
  [ 902.873401] el0t_64_sync_handler+0x148/0x158
  [ 902.878336] el0t_64_sync+0x1b0/0x1b8
  [ 902.882513] Code: d282 d2800010 d2800011 d65f03c0 (d421)
  [ 902.890632] ---[ end trace ]---

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2038768/+subscriptions
[Kernel-packages] [Bug 2038768] Re: arm64: linux: stress-ng filename stressor crashes kernel
** Changed in: linux (Ubuntu Mantic)
   Status: Incomplete => New
[Kernel-packages] [Bug 2038768] Re: arm64: linux: stress-ng filename stressor crashes kernel
Did an hour of soak testing with arm64 kernel
https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.5.6/arm64/linux-image-unsigned-6.5.6-060506-generic_6.5.6-060506.202310061235_arm64.deb
and cannot reproduce this issue.
[Kernel-packages] [Bug 2038768] Re: arm64: linux: stress-ng filename stressor crashes kernel
And can reproduce on real H/W: the "SC2A11", a multi-core chip with 24 ARM Cortex-A53 cores, running Linux 6.5.0-7-generic #7-Ubuntu SMP PREEMPT_DYNAMIC Thu Sep 28 19:12:05 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

[  201.075720] EXT4-fs (loop13): mounted filesystem 52e32882-8b3a-47ce-8bf6-ce095960b1e7 r/w with ordered data mode. Quota mode: none.
[  516.665218] ------------[ cut here ]------------
[  516.665249] kernel BUG at fs/dcache.c:2050!
[  516.665279] Internal error: Oops - BUG: f2000800 [#1] SMP
[  516.665301] Modules linked in: tls vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc overlay cfg80211 binfmt_misc zfs(PO) nls_iso8859_1 spl(O) snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore uio_pdrv_genirq uio dm_multipath efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear nouveau crct10dif_ce drm_ttm_helper polyval_ce polyval_generic ttm ghash_ce i2c_algo_bit drm_display_helper cec rc_core sm4 drm_kms_helper sha2_ce sha256_arm64 xhci_pci drm r8169 sha1_ce ahci xhci_pci_renesas realtek sdhci_f_sdh30 sdhci_pltfm sdhci gpio_keys
[  516.665743]  netsec gpio_mb86s7x i2c_synquacer aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
[  516.665900] CPU: 2 PID: 17292 Comm: stress-ng-filen Tainted: P O 6.5.0-7-generic #7-Ubuntu
[  516.665927] Hardware name: Socionext SynQuacer E-series DeveloperBox, BIOS build #85 Nov 6 2020
[  516.665948] pstate: 4045 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  516.665974] pc : d_instantiate_new+0xa8/0xc8
[  516.666006] lr : ext4_add_nondir+0x10c/0x160
[  516.666029] sp : 8000857838d0
[  516.666043] x29: 8000857838d0 x28: x27: 8000816ea980
[  516.666076] x26: 0008119915e0 x25: 8180 x24: 000856c61ce8
[  516.666108] x23: 0008119915c0 x22: 800085783950 x21: 00080359e1c0
[  516.666140] x20: 0008561b1ce8 x19: x18: 800085a6d068
[  516.666172] x17: x16: x15: 878b4681cc52c99d
[  516.666204] x14: d59de2a9feb89dca x13: 85e2878b4681cc52 x12: c99dd59de2a9feb8
[  516.666236] x11: e3b9eedbdf1c7d27 x10: 732db84fa4ef339b x9 : 8000807690bc
[  516.666268] x8 : x7 : x6 :
[  516.666299] x5 : x4 : x3 :
[  516.666330] x2 : 8000836b52e8 x1 : 0008561b1ce8 x0 : 0008119915c0
[  516.666362] Call trace:
[  516.666377]  d_instantiate_new+0xa8/0xc8
[  516.666401]  ext4_create+0x120/0x238
[  516.666422]  lookup_open.isra.0+0x480/0x4d0
[  516.666447]  open_last_lookups+0x160/0x3b0
[  516.666466]  path_openat+0xa0/0x2a0
[  516.666484]  do_filp_open+0xa8/0x180
[  516.666502]  do_sys_openat2+0xe8/0x128
[  516.666524]  __arm64_sys_openat+0x70/0xe0
[  516.666545]  invoke_syscall+0x7c/0x128
[  516.666566]  el0_svc_common.constprop.0+0x5c/0x168
[  516.666586]  do_el0_svc+0x38/0x68
[  516.04]  el0_svc+0x30/0xe0
[  516.26]  el0t_64_sync_handler+0x148/0x158
[  516.47]  el0t_64_sync+0x1b0/0x1b8
[  516.74] Code: d282 d2800010 d2800011 d65f03c0 (d421)
[  516.96] ---[ end trace ]---
[Kernel-packages] [Bug 2038768] Re: arm64: linux: stress-ng filename stressor crashes kernel
Reproduced this with mainline arm64 kernel
https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.5.5/arm64/linux-image-unsigned-6.5.5-060505-generic_6.5.5-060505.202309230703_arm64.deb

[  219.219042] Internal error: Oops - BUG: f2000800 [#1] SMP
[  219.262013] Modules linked in: cfg80211 binfmt_misc nls_iso8859_1 dm_multipath drm efi_pstore dmi_sysfs qemu_fw_cfg ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_ce polyval_ce polyval_generic ghash_ce sm4 sha2_ce sha256_arm64 virtio_net sha1_ce arm_smccc_trng virtio_rng net_failover xhci_pci failover xhci_pci_renesas aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
[  219.322456] CPU: 13 PID: 1182 Comm: stress-ng-filen Not tainted 6.5.5-060505-generic #202309230703
[  219.332405] Hardware name: QEMU KVM Virtual Machine, BIOS 2023.05-2 09/23/2023
[  219.340433] pstate: 4045 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  219.348163] pc : d_instantiate_new+0xa8/0xc8
[  219.352942] lr : ext4_add_nondir+0x10c/0x160
[  219.357822] sp : 8000826ab9d0
[  219.361517] x29: 8000826ab9d0 x28: x27: a9b65720a940
[  219.369535] x26: 1ea33582d2e0 x25: 8180 x24: 1ea3c2bb3d48
[  219.377494] x23: 1ea33582d2c0 x22: 8000826abab0 x21: 1ea3c3344930
[  219.385428] x20: 1ea324bda188 x19: x18: 800080b4d068
[  219.393336] x17: x16: x15: 9afaefe7af176647
[  219.401279] x14: f302afa80109b8f3 x13: a3469afaefe7af17 x12: 6647f302afa80109
[  219.409258] x11: b4e7e46bc44fb52e x10: 4e81094291a860ce x9 : a9b6562b1b74
[  219.417639] x8 : x7 : x6 :
[  219.426015] x5 : x4 : x3 :
[  219.434462] x2 : a9b6591b27e8 x1 : 1ea324bda188 x0 : 1ea33582d2c0
[  219.442708] Call trace:
[  219.445901]  d_instantiate_new+0xa8/0xc8
[  219.450786]  ext4_create+0x120/0x238
[  219.454800]  lookup_open.isra.0+0x478/0x4c8
[  219.459476]  open_last_lookups+0x160/0x3b0
[  219.464060]  path_openat+0x9c/0x290
[  219.468062]  do_filp_open+0xac/0x188
[  219.472175]  do_sys_openat2+0xe4/0x120
[  219.476412]  __arm64_sys_openat+0x6c/0xd8
[  219.481300]  invoke_syscall+0x7c/0x128
[  219.485876]  el0_svc_common.constprop.0+0x5c/0x168
[  219.491561]  do_el0_svc+0x38/0x68
[  219.495523]  el0_svc+0x30/0xe0
[  219.499161]  el0t_64_sync_handler+0x148/0x158
[  219.504139]  el0t_64_sync+0x1b0/0x1b8
[  219.508320] Code: d282 d2800010 d2800011 d65f03c0 (d421)
[  219.515430] ---[ end trace ]---
[Kernel-packages] [Bug 2038768] Re: arm64: linux: stress-ng filename stressor crashes kernel
Reproduced this with mainline arm64 kernel
https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.5/arm64/linux-image-unsigned-6.5.0-060500-generic_6.5.0-060500.202308271831_arm64.deb

[  184.853731] pstate: 4045 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  184.862627] pc : d_instantiate_new+0xa8/0xc8
[  184.867973] lr : ext4_add_nondir+0xf0/0x148
[  184.872959] sp : 8000828ab950
[  184.877059] x29: 8000828ab950 x28: x27: d975b8b9a6c0
[  184.885032] x26: 7b0094e32c20 x25: 8180 x24: 7b01432e9848
[  184.893573] x23: 8000828aba30 x22: 7b0094e32c00 x21: 7b0172d574d0
[  184.902071] x20: 7b0089fbc688 x19: x18: 800082295068
[  184.910550] x17: x16: x15: 5e9ca062546ae354
[  184.919056] x14: 998c9ec3ecc3a882 x13: 24d23ffaf8b470b6 x12: 022485883b51bee2
[  184.927692] x11: 5c7ac5c18df459ab x10: 6e24d23ffaf8b470 x9 : d975b7c3d730
[  184.936212] x8 : x7 : x6 :
[  184.944811] x5 : x4 : x3 :
[  184.953651] x2 : d975bab42cf0 x1 : 7b0089fbc688 x0 : 7b0094e32c00
[  184.962508] Call trace:
[  184.965316]  d_instantiate_new+0xa8/0xc8
[  184.969803]  ext4_create+0x120/0x238
[  184.973910]  lookup_open.isra.0+0x478/0x4c8
[  184.978689]  open_last_lookups+0x160/0x3b0
[  184.983374]  path_openat+0x9c/0x290
[  184.987372]  do_filp_open+0xac/0x188
[  184.991444]  do_sys_openat2+0xe4/0x120
[  184.995701]  __arm64_sys_openat+0x6c/0xd8
[  185.000271]  invoke_syscall+0x7c/0x128
[  185.004520]  el0_svc_common.constprop.0+0x5c/0x168
[  185.009977]  do_el0_svc+0x38/0x68
[  185.013775]  el0_svc+0x30/0xe0
[  185.017265]  el0t_64_sync_handler+0x148/0x158
[  185.022183]  el0t_64_sync+0x1b0/0x1b8
[  185.026332] Code: d282 d2800010 d2800011 d65f03c0 (d421)
[  185.033606] ---[ end trace ]---

Took a while to trigger.
[Kernel-packages] [Bug 2038768] Re: arm64: linux: stress-ng filename stressor crashes kernel
Can't reproduce this with mainline arm64 kernel
https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.5.6/arm64/linux-image-unsigned-6.5.6-060506-generic_6.5.6-060506.202310061235_arm64.deb
[Kernel-packages] [Bug 2038768] Re: arm64: linux: stress-ng filename stressor crashes kernel
Can't seem to trip the issue on a 24-core x86 instance; maybe this is ARM64 specific.
[Kernel-packages] [Bug 2038768] Re: arm64: linux: stress-ng filename stressor crashes kernel
** Also affects: linux (Ubuntu Mantic)
   Importance: High
       Status: Incomplete

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2038768

Title:
  arm64: linux: stress-ng filename stressor crashes kernel

Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Mantic:
  Incomplete

Bug description:
  Running latest Ubuntu mantic (ext4 file system) with kernel:

  Linux mantic-arm64 6.5.0-7-generic #7-Ubuntu SMP PREEMPT_DYNAMIC Thu Sep 28 19:12:05 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

  How to reproduce:

  Fire up a 24-CPU ARM64 QEMU instance running Ubuntu Mantic Server and
  install the latest stress-ng from the git repo:

  sudo apt-get update
  sudo apt-get build-dep stress-ng
  git clone git://github.com/ColinIanKing/stress-ng
  cd stress-ng
  make clean
  make -j 24
  make verify-test-all

  When we reach the filename stressor the kernel crashes as follows:

  [  902.594715] kernel BUG at fs/dcache.c:2050!
  [  902.598205] Internal error: Oops - BUG: f2000800 [#1] SMP
  [  902.603127] Modules linked in: dccp_ipv4 dccp atm vfio_iommu_type1 vfio iommufd cmac algif_rng twofish_generic twofish_common serpent_generic fcrypt cast6_generic cast5_generic cast_common camellia_generic blowfish_generic blowfish_common aes_arm64 algif_skcipher algif_hash aria_generic sm4_generic sm4_neon ccm aes_ce_ccm des_generic libdes authenc aegis128 algif_aead af_alg cfg80211 binfmt_misc nls_iso8859_1 dm_multipath drm efi_pstore dmi_sysfs qemu_fw_cfg ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_ce polyval_ce polyval_generic ghash_ce sm4 sha2_ce sha256_arm64 sha1_ce arm_smccc_trng xhci_pci virtio_rng xhci_pci_renesas aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
  [  902.689941] CPU: 1 PID: 91317 Comm: stress-ng-filen Not tainted 6.5.0-7-generic #7-Ubuntu
  [  902.699281] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
  [  902.706902] pstate: 4045 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  [  902.715488] pc : d_instantiate_new+0xa8/0xc8
  [  902.720889] lr : ext4_add_nondir+0x10c/0x160
  [  902.725702] sp : 80008b6d3930
  [  902.729390] x29: 80008b6d3930 x28: x27: bd164e51a980
  [  902.738705] x26: 6789f3b68f20 x25: 8180 x24: 678a541f7968
  [  902.747003] x23: 6789f3b68f00 x22: 80008b6d39b0 x21: 678a6a25bcb0
  [  902.755776] x20: 678a36f8f028 x19: x18: 80008af45068
  [  902.764647] x17: x16: x15: ecececececececec
  [  902.773135] x14: ecececececececec x13: ecececececececec x12: ecececececececec
  [  902.781386] x11: ecececececececec x10: ecececececececec x9 : bd164d5990bc
  [  902.789346] x8 : x7 : x6 :
  [  902.798564] x5 : x4 : x3 :
  [  902.806851] x2 : bd16504e4ce0 x1 : 678a36f8f028 x0 : 6789f3b68f00
  [  902.815544] Call trace:
  [  902.818870]  d_instantiate_new+0xa8/0xc8
  [  902.823523]  ext4_create+0x120/0x238
  [  902.827716]  lookup_open.isra.0+0x480/0x4d0
  [  902.832480]  open_last_lookups+0x160/0x3b0
  [  902.837060]  path_openat+0xa0/0x2a0
  [  902.840975]  do_filp_open+0xa8/0x180
  [  902.845582]  do_sys_openat2+0xe8/0x128
  [  902.850426]  __arm64_sys_openat+0x70/0xe0
  [  902.854952]  invoke_syscall+0x7c/0x128
  [  902.859155]  el0_svc_common.constprop.0+0x5c/0x168
  [  902.864979]  do_el0_svc+0x38/0x68
  [  902.869364]  el0_svc+0x30/0xe0
  [  902.873401]  el0t_64_sync_handler+0x148/0x158
  [  902.878336]  el0t_64_sync+0x1b0/0x1b8
  [  902.882513] Code: d282 d2800010 d2800011 d65f03c0 (d421)
  [  902.890632] ---[ end trace ]---

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2038768/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
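When triaging reports like this one, it can help to pull the call trace out of a pasted Oops mechanically (for diffing against other reports, say). A small illustrative helper, not part of the bug report; the regular expressions are an assumption about the usual dmesg layout:

```python
import re

def call_trace(dmesg: str) -> list[str]:
    """Extract frame names from the 'Call trace:' section of an arm64 Oops.

    Frames look like '[  902.818870]  d_instantiate_new+0xa8/0xc8'.
    """
    frames = []
    in_trace = False
    for line in dmesg.splitlines():
        # Strip the '[  seconds.micros ]' dmesg timestamp prefix, if present.
        text = re.sub(r"^\[\s*\d+\.\d+\]\s*", "", line.strip())
        if text == "Call trace:":
            in_trace = True
            continue
        if in_trace:
            # symbol+offset/size, e.g. lookup_open.isra.0+0x480/0x4d0
            m = re.match(r"([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+", text)
            if not m:
                break  # end of the backtrace block
            frames.append(m.group(1))
    return frames

oops = """\
[  902.815544] Call trace:
[  902.818870]  d_instantiate_new+0xa8/0xc8
[  902.823523]  ext4_create+0x120/0x238
[  902.882513] Code: d2800010 d65f03c0
"""
print(call_trace(oops))  # ['d_instantiate_new', 'ext4_create']
```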
[Kernel-packages] [Bug 2038768] Re: arm64: linux: stress-ng filename stressor crashes kernel
I created a 1 GB file, made a fresh ext4 file system on it, loop-back
mounted it on /mnt, created the test directory /mnt/test and ran:

./stress-ng --filename 0 --temp-path /mnt/test --klog-check

Managed to trip the kernel crash again. So it appears to occur on a
fresh ext4 file system too :-(
[Kernel-packages] [Bug 2038768] [NEW] arm64: linux: stress-ng filename stressor crashes kernel
Public bug reported:

Running latest Ubuntu mantic (ext4 file system) with kernel:

Linux mantic-arm64 6.5.0-7-generic #7-Ubuntu SMP PREEMPT_DYNAMIC Thu Sep 28 19:12:05 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

How to reproduce:

Fire up a 24-CPU ARM64 QEMU instance running Ubuntu Mantic Server and
install the latest stress-ng from the git repo:

sudo apt-get update
sudo apt-get build-dep stress-ng
git clone git://github.com/ColinIanKing/stress-ng
cd stress-ng
make clean
make -j 24
make verify-test-all

When we reach the filename stressor the kernel crashes with the Oops
quoted in full earlier in this thread: kernel BUG at fs/dcache.c:2050,
hit in d_instantiate_new() via ext4_create() during an openat() call.

** Affects: linux (Ubuntu)
   Importance: High
       Status: Incomplete

** Changed in: linux (Ubuntu)
   Importance: Undecided => High

** Description changed: "(ext4 file system)" added to the kernel line of
   the description.
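A side note on reading the Oops header: on arm64, `Internal error: Oops - BUG: f2000800` is the ESR of a `BRK` exception, and `BUG()`/`BUG_ON()` traps use `brk #0x800`, whose immediate lands in the low bits of the ESR. The immediate sits in bits 20:5 of the instruction word; the `Code:` line's trailing word is truncated in the paste, so the concrete `0xD4210000` below is an assumption consistent with that encoding, not taken from the log:

```python
def brk_imm16(insn: int) -> int:
    """Immediate of an AArch64 BRK instruction: bits 20:5 of the word."""
    # BRK #imm16 encodes as 1101 0100 001 imm16 00000.
    assert (insn & 0xFFE0001F) == 0xD4200000, "not a BRK encoding"
    return (insn >> 5) & 0xFFFF

# A BUG()/BUG_ON() trap is brk #0x800; the ESR carries the same immediate
# in its ISS field, which is why the header reports 'BUG: f2000800'
# (0xf2000000 | 0x800).
print(hex(brk_imm16(0xD4210000)))  # 0x800
```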
[Kernel-packages] [Bug 2038768] Re: arm64: linux: stress-ng filename stressor crashes kernel
Note that just running stress-ng with --filename 0 will reproduce the
issue. I'm testing this now on a cleanly formatted ext4 file system.
Re: [Kernel-packages] [Bug 2031352] Re: Nouveau driver crash - Ubuntu 22.04.3 LTS stuck on power-off/reboot screen
This also fixed the problem on my end.

My Dell XPS 15 9510, NVIDIA GA107M RTX 3050 Ti
Linux version 6.2.0-31-generic (buildd@lcy02-amd64-032) (x86_64-linux-gnu-gcc-11 (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38)

On Tue, Sep 5, 2023 at 12:50 AM Juan Manuel Vicente
<2031...@bugs.launchpad.net> wrote:

> After following @juergh updates, I realized the problem was the nouveau
> drivers; I also saw my fresh installation (Ubuntu 22.04.3) was not
> detecting my RTX 3070. So I fixed both problems by installing the
> drivers:
>
> sudo apt install nvidia-driver-535
>
> Now I can shut down and/or restart my machine without issues. The only
> problem is this driver is the proprietary one; however, in my case that
> is not a problem.
>
> Regards
> Juan
>
> -- 
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/2031352
>
> Title:
>   Nouveau driver crash - Ubuntu 22.04.3 LTS stuck on power-off/reboot
>   screen
>
> Status in linux package in Ubuntu:
>   Confirmed
> Status in linux-hwe-6.2 package in Ubuntu:
>   Confirmed
> Status in systemd package in Ubuntu:
>   Invalid
> Status in linux-hwe-6.2 source package in Jammy:
>   Confirmed
> Status in linux source package in Lunar:
>   Confirmed
>
> Bug description:
>   [Impact]
>
>   After updating to Kernel 6.2 a few days ago, I have been experiencing
>   issues with my system's shutdown and reboot functions. During these
>   processes, the system becomes unresponsive and hangs on a black
>   screen, which displays both the Dell and Ubuntu logos. This issue is
>   inconsistent; it happens sporadically. Currently, the only workaround
>   I've found to successfully shut down the system is to forcibly power
>   off the machine by holding down the power button for 5 seconds.
>
>   I've also tested a fresh installation of Ubuntu 22.04.3.
>
>   [Fix]
>
>   Updated patch from linux-next:
>   https://patchwork.freedesktop.org/patch/538562/
>
>   [Test Case]
>
>   Suspend, resume, shutdown, reboot should all work correctly. No
>   nouveau stack trace in the kernel log.
>
>   [Where Problems Could Occur]
>
>   Limited to the nouveau driver that wants to load nonexistent ACR
>   firmware. Only NVIDIA GPUs are affected.
>
>   [Additional information]
>
>   ProblemType: Bug
>   DistroRelease: Ubuntu 22.04
>   Package: systemd 249.11-0ubuntu3.9
>   ProcVersionSignature: Ubuntu 6.2.0-26.26~22.04.1-generic 6.2.13
>   Uname: Linux 6.2.0-26-generic x86_64
>   NonfreeKernelModules: nvidia_modeset nvidia
>   ApportVersion: 2.20.11-0ubuntu82.5
>   Architecture: amd64
>   CasperMD5CheckResult: pass
>   CurrentDesktop: ubuntu:GNOME
>   Date: Mon Aug 14 22:41:14 2023
>   InstallationDate: Installed on 2023-08-14 (1 days ago)
>   InstallationMedia: Ubuntu 22.04.3 2023.08.13 LTS (20230813)
>   MachineType: Dell Inc. XPS 8930
>   ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-6.2.0-26-generic root=UUID=14d1ee7a-565f-4ba4-b6dd-7bc16e487451 ro quiet splash vt.handoff=7
>   SourcePackage: systemd
>   UpgradeStatus: No upgrade log present (probably fresh install)
>   dmi.bios.date: 03/14/2023
>   dmi.bios.release: 1.1
>   dmi.bios.vendor: Dell Inc.
>   dmi.bios.version: 1.1.30
>   dmi.board.name: 0T88YD
>   dmi.board.vendor: Dell Inc.
>   dmi.board.version: A00
>   dmi.chassis.type: 3
>   dmi.chassis.vendor: Dell Inc.
>   dmi.chassis.version: Not Specified
>   dmi.modalias: dmi:bvnDellInc.:bvr1.1.30:bd03/14/2023:br1.1:svnDellInc.:pnXPS8930:pvr1.1.30:rvnDellInc.:rn0T88YD:rvrA00:cvnDellInc.:ct3:cvrNotSpecified:sku0859:
>   dmi.product.family: XPS
>   dmi.product.name: XPS 8930
>   dmi.product.sku: 0859
>   dmi.product.version: 1.1.30
>   dmi.sys.vendor: Dell Inc.
>   modified.conffile..etc.default.apport:
>     # set this to 0 to disable apport, or to 1 to enable it
>     # you can temporarily override this with
>     # sudo service apport start force_start=1
>     enabled=0
>   mtime.conffile..etc.default.apport: 2023-08-13T20:57:27
>   mtime.conffile..etc.systemd.system.conf: 2023-08-13T20:57:27
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2031352/+subscriptions
[Kernel-packages] [Bug 2031352] Re: Nouveau driver crash - Ubuntu 22.04.3 LTS stuck on power-off/reboot screen
@juanma-v82 this also fixed my issue.

> sudo apt install nvidia-driver-535

Linux version 6.2.0-31-generic (buildd@lcy02-amd64-032) (x86_64-linux-gnu-gcc-11 (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38)

Dell XPS 15 9510
NVIDIA Corporation GA107M [GeForce RTX 3050 Ti Mobile]

Regards,
Ian
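The `dmi.modalias` line in the apport data for this bug packs the DMI identity fields into one colon-separated string (bvn = BIOS vendor, bvr = BIOS version, bd = BIOS date, svn/pn/pvr = system vendor/product/version, rvn/rn/rvr = board, cvn/ct/cvr = chassis, sku = product SKU). A small illustrative parser; the tag table follows the kernel's dmi-id convention but is written from memory, so treat it as an assumption:

```python
def parse_dmi_modalias(modalias: str) -> dict[str, str]:
    """Split a 'dmi:bvn...:bvr...:' modalias into tag -> value pairs."""
    tags = ("bvn", "bvr", "bd", "br", "svn", "pn", "pvr",
            "rvn", "rn", "rvr", "cvn", "ct", "cvr", "sku")
    out = {}
    for field in modalias.removeprefix("dmi:").split(":"):
        # Try longer tags first so e.g. 'pvr' wins over 'pn'.
        for tag in sorted(tags, key=len, reverse=True):
            if field.startswith(tag):
                out[tag] = field[len(tag):]
                break
    return out

alias = ("dmi:bvnDellInc.:bvr1.1.30:bd03/14/2023:br1.1:svnDellInc.:pnXPS8930"
         ":pvr1.1.30:rvnDellInc.:rn0T88YD:rvrA00:cvnDellInc.:ct3"
         ":cvrNotSpecified:sku0859:")
info = parse_dmi_modalias(alias)
print(info["pn"], info["bvr"])  # XPS8930 1.1.30
```

This is the same string udev rules match against when binding quirks to a specific machine model.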
[Kernel-packages] [Bug 2026776] Re: arm64+ast2600: No Output from BMC's VGA port
** Changed in: linux (Ubuntu Jammy)
       Status: New => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-hwe-6.2 in Ubuntu.
https://bugs.launchpad.net/bugs/2026776

Title:
  arm64+ast2600: No Output from BMC's VGA port

Status in linux package in Ubuntu:
  Triaged
Status in linux-hwe-5.19 package in Ubuntu:
  Won't Fix
Status in linux-hwe-6.2 package in Ubuntu:
  Fix Committed
Status in linux source package in Jammy:
  Fix Committed
Status in linux-hwe-5.19 source package in Jammy:
  Won't Fix
Status in linux-hwe-6.2 source package in Jammy:
  Fix Committed
Status in linux source package in Lunar:
  Fix Committed

Bug description:
  SRU Justification:

  [ Impact ]

  On systems that have the following combination of hardware:
  1) arm64 CPU
  2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/
  there is no output when connecting a display to the BMC's VGA port.

  [ Fix ]

  For AST2500+, MMIO should be enabled by default.

  [ Test Plan ]

  Test on the targeted hardware to make sure the BMC is displaying
  output.

  [ Where problems could occur ]

  Not aware of any potential problems, but any should be confined to
  ASPEED AST2500+ hardware.

  [ Other Info ]

  The patch is already in jammy/nvidia-5.19 and jammy/nvidia-6.2 and has
  been tested with the affected BMC.

  [Issue]

  On systems that have the combination of hardware above, we see no
  output when connecting a display to the BMC's VGA port.

  Upon further investigation, we see that applying the following patch
  fixes this issue:
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/ast?h=v6.4-rc6&id=4327a6137ed43a091d900b1ac833345d60f32228

  [Action]

  Please apply the following two backports to the appropriate Ubuntu HWE
  kernels:
  https://github.com/NVIDIA-BaseOS-6/linux-nvidia-5.19/commit/055c9ec3739d7df1179db3ba054b00f3dd684560
  https://github.com/NVIDIA-BaseOS-6/linux-nvidia-6.2/commit/8ab3253c6a59eee3424fe0c60b1fc6dc9f2d73b7

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2026776/+subscriptions
[Kernel-packages] [Bug 2026776] Re: arm64+ast2600: No Output from BMC's VGA port
** Also affects: linux (Ubuntu Jammy)
   Importance: Undecided
       Status: New

** Also affects: linux-hwe-5.19 (Ubuntu Jammy)
   Importance: Undecided
       Status: New

** Also affects: linux-hwe-6.2 (Ubuntu Jammy)
   Importance: Undecided
       Status: New

** Changed in: linux-hwe-5.19 (Ubuntu Jammy)
       Status: New => Fix Committed

** Changed in: linux-hwe-6.2 (Ubuntu Jammy)
       Status: New => Fix Committed

** Changed in: linux-hwe-5.19 (Ubuntu Jammy)
       Status: Fix Committed => Won't Fix
[Kernel-packages] [Bug 2026776] Re: arm64+ast2600: No Output from BMC's VGA port
** Changed in: linux-hwe-6.2 (Ubuntu)
       Status: New => Incomplete

** Changed in: linux-hwe-6.2 (Ubuntu)
       Status: Incomplete => Fix Committed
[Kernel-packages] [Bug 2031352] Re: Ubuntu 22.04.3 LTS stuck on power-off/reboot screen
** Changed in: systemd (Ubuntu)
       Status: Invalid => Confirmed
[Kernel-packages] [Bug 1982519] Re: GDS: Add NFS patches to optimized kernel
** Changed in: linux-nvidia-5.19 (Ubuntu Jammy) Status: New => Fix Released -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-nvidia in Ubuntu. https://bugs.launchpad.net/bugs/1982519 Title: GDS: Add NFS patches to optimized kernel Status in linux-nvidia package in Ubuntu: New Status in linux-nvidia-5.19 package in Ubuntu: New Status in linux-nvidia-6.2 package in Ubuntu: New Status in linux-nvidia source package in Jammy: Fix Released Status in linux-nvidia-5.19 source package in Jammy: Fix Released Status in linux-nvidia-6.2 source package in Jammy: Fix Released Bug description: [Impact] Adding these changes will enable GDS functionality in NFS drivers. [Fix] This is not a fix but a new feature being added to the NFS driver. [Test] Tested the NFS driver on an HPE system as I did not have a setup with BASEOS6. 1) Installed the 5.15.39 kernel on the system (this is the kernel that the optimized kernel is currently based on). 2) Downloaded the optimized kernel. 3) Applied the patches to the optimized kernel. 4) Replaced the NFS modules on the system with the ones built on the optimized kernel. 5) Ran GDS and compat mode tests on an NFS mount with the patched NFS driver. All tests went fine. Attaching the results. Compat mode tests == ** API Tests, : 72 / 72 tests passed ** Testsuite : 211 / 211 tests passed done tests:Thu Jul 21 08:27:58 PM UTC 2022 GDS mode tests == ** NVFS IOCTL negative Tests, : 23 / 23 tests passed ** Testsuite : 249 / 249 tests passed End: nvidia-fs: GDS Version: 1.4.0.31 NVFS statistics(ver: 4.0) To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia/+bug/1982519/+subscriptions
[Kernel-packages] [Bug 1982519] Re: GDS: Add NFS patches to optimized kernel
** Changed in: linux-nvidia-6.2 (Ubuntu Jammy) Status: New => Fix Released -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-nvidia in Ubuntu. https://bugs.launchpad.net/bugs/1982519 Title: GDS: Add NFS patches to optimized kernel Status in linux-nvidia package in Ubuntu: New Status in linux-nvidia-5.19 package in Ubuntu: New Status in linux-nvidia-6.2 package in Ubuntu: New Status in linux-nvidia source package in Jammy: Fix Released Status in linux-nvidia-5.19 source package in Jammy: New Status in linux-nvidia-6.2 source package in Jammy: Fix Released Bug description: [Impact] Adding these changes will enable GDS functionality in NFS drivers. [Fix] This is not a fix but a new feature being added to the NFS driver. [Test] Tested the NFS driver on an HPE system as I did not have a setup with BASEOS6. 1) Installed the 5.15.39 kernel on the system (this is the kernel that the optimized kernel is currently based on). 2) Downloaded the optimized kernel. 3) Applied the patches to the optimized kernel. 4) Replaced the NFS modules on the system with the ones built on the optimized kernel. 5) Ran GDS and compat mode tests on an NFS mount with the patched NFS driver. All tests went fine. Attaching the results. Compat mode tests == ** API Tests, : 72 / 72 tests passed ** Testsuite : 211 / 211 tests passed done tests:Thu Jul 21 08:27:58 PM UTC 2022 GDS mode tests == ** NVFS IOCTL negative Tests, : 23 / 23 tests passed ** Testsuite : 249 / 249 tests passed End: nvidia-fs: GDS Version: 1.4.0.31 NVFS statistics(ver: 4.0) To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia/+bug/1982519/+subscriptions
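A quick way to sanity-check result lines like "** API Tests, : 72 / 72 tests passed" from the log above is to compare the two counts. The `all_passed` helper below is a small illustration of mine, not part of the GDS test suite:

```shell
#!/bin/sh
# all_passed LINE - succeed when an "X / Y tests passed" line reports X == Y.
all_passed() {
  passed=$(echo "$1" | sed -n 's/.* \([0-9]*\) \/ [0-9]* tests passed.*/\1/p')
  total=$(echo "$1" | sed -n 's/.* [0-9]* \/ \([0-9]*\) tests passed.*/\1/p')
  [ -n "$passed" ] && [ "$passed" = "$total" ]
}

# Lines taken from the results quoted in the bug report:
all_passed '** API Tests, : 72 / 72 tests passed' && echo OK        # prints OK
all_passed '** Testsuite : 249 / 249 tests passed' && echo OK       # prints OK
```

This is only a convenience for eyeballing pasted logs; the authoritative result is the exit status of the GDS/compat test runs themselves.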
[Kernel-packages] [Bug 2026776] Re: arm64+ast2600: No Output from BMC's VGA port
** Also affects: linux (Ubuntu Lunar) Importance: Undecided Status: New ** Also affects: linux-hwe-5.19 (Ubuntu Lunar) Importance: Undecided Status: New ** Also affects: linux-hwe-6.2 (Ubuntu Lunar) Importance: Undecided Status: New ** No longer affects: linux-hwe-5.19 (Ubuntu Lunar) ** No longer affects: linux-hwe-6.2 (Ubuntu Lunar) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/2026776 Title: arm64+ast2600: No Output from BMC's VGA port Status in linux package in Ubuntu: New Status in linux-hwe-5.19 package in Ubuntu: Won't Fix Status in linux-hwe-6.2 package in Ubuntu: New Status in linux source package in Lunar: New Bug description: SRU Justification: [ Impact ] On systems that have the following combination of hardware 1) arm64 CPU 2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/ No output when connecting a display to the BMC's VGA port. [ Fix ] For AST2500+ MMIO should be enabled by default. [ Test Plan ] Test on targeted hardware to make sure BMC is displaying output. [ Where problems could occur ] Not aware of any potential problems, but any should be confined to ASPEED AST2500+ hardware. [ Other Info ] Patch is already in jammy/nvidia-5.19 and jammy/nvidia-6.2 and has been tested with affected BMC. [Issue] On systems that have the following combination of hardware...: 1) arm64 CPU 2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/ .. we see no output when connecting a display to the BMC's VGA port. 
Upon further investigation, we see that applying the following patch fixes this issue: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/ast?h=v6.4-rc6&id=4327a6137ed43a091d900b1ac833345d60f32228 [Action] Please apply the following two backports to the appropriate Ubuntu HWE kernels: https://github.com/NVIDIA-BaseOS-6/linux-nvidia-5.19/commit/055c9ec3739d7df1179db3ba054b00f3dd684560 https://github.com/NVIDIA-BaseOS-6/linux-nvidia-6.2/commit/8ab3253c6a59eee3424fe0c60b1fc6dc9f2d73b7 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2026776/+subscriptions
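For reference, the two backports listed under [Action] would typically be applied to the target tree with a fetch plus cherry-pick. The sketch below only prints the commands it derives from the quoted commit URLs; the `sha_of` helper is mine, not part of the bug report, and the emitted lines are meant to be run inside a checkout of the target HWE kernel:

```shell
#!/bin/sh
# Derive fetch + cherry-pick commands from the GitHub commit URLs in [Action].
sha_of() { echo "${1##*/}"; }   # last path component of a commit URL is the SHA

for url in \
  https://github.com/NVIDIA-BaseOS-6/linux-nvidia-5.19/commit/055c9ec3739d7df1179db3ba054b00f3dd684560 \
  https://github.com/NVIDIA-BaseOS-6/linux-nvidia-6.2/commit/8ab3253c6a59eee3424fe0c60b1fc6dc9f2d73b7
do
  repo=${url%/commit/*}         # strip "/commit/<sha>" to get the repo URL
  echo "git fetch $repo && git cherry-pick -x $(sha_of "$url")"
done
```

`cherry-pick -x` records the original commit hash in the backport's changelog, which matches how SAUCE backports are usually tracked.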
[Kernel-packages] [Bug 2026776] Re: arm64+ast2600: No Output from BMC's VGA port
With Kinetic going EOL, there will be no further SRU updates for linux- hwe-5.19 ** Description changed: + SRU Justification: + + [ Impact ] + + On systems that have the following combination of hardware + + 1) arm64 CPU + 2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/ + + No output when connecting a display to the BMC's VGA port. + + [ Fix ] + + For AST2500+ MMIO should be enabled by default. + + [ Test Plan ] + + Test on targeted hardware to make sure BMC is displaying output. + + [ Where problems could occur ] + + Not aware of any potential problems, but any should be confined to + ASPEED AST2500+ hardware. + + [ Other Info ] + + Patch is already in jammy/nvidia-5.19 and jammy/nvidia-6.2 and has been + tested with affected BMC. + + + [Issue] On systems that have the following combination of hardware...: 1) arm64 CPU 2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/ .. we see no output when connecting a display to the BMC's VGA port. Upon further investigation, we see that applying the following patch fixes this issue: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/ast?h=v6.4-rc6=4327a6137ed43a091d900b1ac833345d60f32228 [Action] Please apply the following two backports to the appropriate Ubuntu HWE kernels: https://github.com/NVIDIA-BaseOS-6/linux-nvidia-5.19/commit/055c9ec3739d7df1179db3ba054b00f3dd684560 https://github.com/NVIDIA-BaseOS-6/linux-nvidia-6.2/commit/8ab3253c6a59eee3424fe0c60b1fc6dc9f2d73b7 ** Changed in: linux-hwe-5.19 (Ubuntu) Status: New => Won't Fix -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. 
https://bugs.launchpad.net/bugs/2026776 Title: arm64+ast2600: No Output from BMC's VGA port Status in linux package in Ubuntu: New Status in linux-hwe-5.19 package in Ubuntu: Won't Fix Status in linux-hwe-6.2 package in Ubuntu: New Bug description: SRU Justification: [ Impact ] On systems that have the following combination of hardware 1) arm64 CPU 2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/ No output when connecting a display to the BMC's VGA port. [ Fix ] For AST2500+ MMIO should be enabled by default. [ Test Plan ] Test on targeted hardware to make sure BMC is displaying output. [ Where problems could occur ] Not aware of any potential problems, but any should be confined to ASPEED AST2500+ hardware. [ Other Info ] Patch is already in jammy/nvidia-5.19 and jammy/nvidia-6.2 and has been tested with affected BMC. [Issue] On systems that have the following combination of hardware...: 1) arm64 CPU 2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/ .. we see no output when connecting a display to the BMC's VGA port. Upon further investigation, we see that applying the following patch fixes this issue: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/ast?h=v6.4-rc6&id=4327a6137ed43a091d900b1ac833345d60f32228 [Action] Please apply the following two backports to the appropriate Ubuntu HWE kernels: https://github.com/NVIDIA-BaseOS-6/linux-nvidia-5.19/commit/055c9ec3739d7df1179db3ba054b00f3dd684560 https://github.com/NVIDIA-BaseOS-6/linux-nvidia-6.2/commit/8ab3253c6a59eee3424fe0c60b1fc6dc9f2d73b7 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2026776/+subscriptions
[Kernel-packages] [Bug 2026883] Re: vector floating point registers get clobbered when running stress-ng --vecfp with more instances than CPUs
It may be worth trying this on real H/W to factor out the QEMU component. ** Description changed: When running the stress-ng vector floating point stressor in QEMU PPC64 virtual machines I get floating point verification errors when running more stressor instances than the number of virtual CPUs. How to reproduce: Create a PPC64 VM in QEMU on an x86 host with 8 virtual CPUs. Login, and then do: get latest stress-ng: sudo apt-get build-dep stress-ng git clone https://github.com/ColinIanKing/stress-ng cd stress-ng make clean; make -j $(nproc) ./stress-ng --vecfp 32 --verify -t 10 One should get failures such as: stress-ng: info: [1487] setting to a 10 second run per stressor stress-ng: info: [1487] dispatching hogs: 32 vecfp stress-ng: fail: [1489] vecfp: floatv64div float vector operation result mismatch, got 1078998925312.00, expected 180812.062500 stress-ng: fail: [1489] vecfp: floatv64div float vector operation result mismatch, got 46779686912.00, expected 13278722.00 stress-ng: fail: [1489] vecfp: floatv64div float vector operation result mismatch, got 24992688128.00, expected 26213772.00 stress-ng: fail: [1489] vecfp: floatv64div float vector operation result mismatch, got 17185787904.00, expected 39415832.00 stress-ng: fail: [1488] vecfp: floatv16div float vector operation result mismatch, got 157250576.00, expected 33576.261719 stress-ng: fail: [1488] vecfp: floatv16div float vector operation result mismatch, got 170314032.00, expected 13129044.00 stress-ng: fail: [1488] vecfp: floatv16div float vector operation result mismatch, got 183516080.00, expected 26348392.00 stress-ng: fail: [1488] vecfp: floatv16div float vector operation result mismatch, got 196647552.00, expected 39365508.00 etc..
However, running fewer instances than the number of CPUs, this runs fine without any errors: ./stress-ng --vecfp 1 --verify -t 10 stress-ng: info: [1521] setting to a 10 second run per stressor stress-ng: info: [1521] dispatching hogs: 1 vecfp stress-ng: info: [1521] passed: 1: vecfp (1) stress-ng: info: [1521] failed: 0 stress-ng: info: [1521] skipped: 0 stress-ng: info: [1521] metrics untrustworthy: 0 stress-ng: info: [1521] successful run completed in 19.00s It appears this only fails when the number of instances of the vecfp stressor is more than the number of virtual CPUs. This seems to indicate that vector floating point registers are being clobbered between processes, which could be a security-exploitable issue. Reproduced with Ubuntu Lunar PPC64 VM (6.2.0-20-generic) and x86 host - (6.2.0-21-generic + qemu-kvm 1:5.0-5ubuntu6) + (6.2.0-21-generic + qemu-kvm 1:5.0-5ubuntu6). + + List of PPC64el kernels reproducers: + + Lunar: 6.2.0-20-generic + Mantic: 6.3.0-7-generic + Not sure if this is a kernel or KVM issue, or both. ** Information type changed from Public to Private Security ** Changed in: linux (Ubuntu Lunar) Importance: Undecided => High ** Changed in: linux (Ubuntu) Importance: Undecided => High ** Also affects: linux (Ubuntu Mantic) Importance: High Status: New ** Description changed: When running the stress-ng vector floating point stressor in QEMU PPC64 virtual machines I get floating point verification errors when running more stressor instances than the number of virtual CPUs. How to reproduce: Create a PPC64 VM in QEMU on an x86 host with 8 virtual CPUs.
Login, and then do: get latest stress-ng: sudo apt-get build-dep stress-ng git clone https://github.com/ColinIanKing/stress-ng cd stress-ng make clean; make -j $(nproc) ./stress-ng --vecfp 32 --verify -t 10 One should get failures such as: stress-ng: info: [1487] setting to a 10 second run per stressor stress-ng: info: [1487] dispatching hogs: 32 vecfp stress-ng: fail: [1489] vecfp: floatv64div float vector operation result mismatch, got 1078998925312.00, expected 180812.062500 stress-ng: fail: [1489] vecfp: floatv64div float vector operation result mismatch, got 46779686912.00, expected 13278722.00 stress-ng: fail: [1489] vecfp: floatv64div float vector operation result mismatch, got 24992688128.00, expected 26213772.00 stress-ng: fail: [1489] vecfp: floatv64div float vector operation result mismatch, got 17185787904.00, expected 39415832.00 stress-ng: fail: [1488] vecfp: floatv16div float vector operation result mismatch, got 157250576.00, expected 33576.261719 stress-ng: fail: [1488] vecfp: floatv16div float vector operation result mismatch, got 170314032.00, expected 13129044.00 stress-ng: fail: [1488] vecfp: floatv16div float vector operation result mismatch, got 183516080.00, expected 26348392.00 stress-ng: fail: [1488] vecfp: floatv16div float vector operation result mismatch, got
[Kernel-packages] [Bug 2026883] [NEW] vector floating point registers get clobbered when running stress-ng --vecfp with more instances than CPUs
*** This bug is a security vulnerability *** Private security bug reported: When running the stress-ng vector floating point stressor in QEMU PPC64 virtual machines I get floating point verification errors when running more stressor instances than the number of virtual CPUs. How to reproduce: Create a PPC64 VM in QEMU on a x86 host with 8 virtual CPUs. Login, and then do: get latest stress-ng: sudo apt-get build-dep stress-ng git clone https://github.com/ColinIanKing/stress-ng cd stress-ng make clean; make -j $(nproc) ./stress-ng --vecfp 32 --verify -t 10 One should get failures such as: stress-ng: info: [1487] setting to a 10 second run per stressor stress-ng: info: [1487] dispatching hogs: 32 vecfp stress-ng: fail: [1489] vecfp: floatv64div float vector operation result mismatch, got 1078998925312.00, expected 180812.062500 stress-ng: fail: [1489] vecfp: floatv64div float vector operation result mismatch, got 46779686912.00, expected 13278722.00 stress-ng: fail: [1489] vecfp: floatv64div float vector operation result mismatch, got 24992688128.00, expected 26213772.00 stress-ng: fail: [1489] vecfp: floatv64div float vector operation result mismatch, got 17185787904.00, expected 39415832.00 stress-ng: fail: [1488] vecfp: floatv16div float vector operation result mismatch, got 157250576.00, expected 33576.261719 stress-ng: fail: [1488] vecfp: floatv16div float vector operation result mismatch, got 170314032.00, expected 13129044.00 stress-ng: fail: [1488] vecfp: floatv16div float vector operation result mismatch, got 183516080.00, expected 26348392.00 stress-ng: fail: [1488] vecfp: floatv16div float vector operation result mismatch, got 196647552.00, expected 39365508.00 etc.. 
However, running fewer instances than the number of CPUs, this runs fine without any errors: ./stress-ng --vecfp 1 --verify -t 10 stress-ng: info: [1521] setting to a 10 second run per stressor stress-ng: info: [1521] dispatching hogs: 1 vecfp stress-ng: info: [1521] passed: 1: vecfp (1) stress-ng: info: [1521] failed: 0 stress-ng: info: [1521] skipped: 0 stress-ng: info: [1521] metrics untrustworthy: 0 stress-ng: info: [1521] successful run completed in 19.00s It appears this only fails when the number of instances of the vecfp stressor is more than the number of virtual CPUs. This seems to indicate that vector floating point registers are being clobbered between processes, which could be a security-exploitable issue. Reproduced with Ubuntu Lunar PPC64 VM (6.2.0-20-generic) and x86 host (6.2.0-21-generic + qemu-kvm 1:5.0-5ubuntu6). List of PPC64el kernels reproducers: Focal: 5.4.0-148-generic Jammy: 5.15.0-58-generic Lunar: 6.2.0-20-generic Mantic: 6.3.0-7-generic Not sure if this is a kernel or KVM issue, or both. ** Affects: linux (Ubuntu) Importance: High Status: New ** Affects: linux (Ubuntu Focal) Importance: High Status: New ** Affects: linux (Ubuntu Lunar) Importance: High Status: New ** Affects: linux (Ubuntu Mantic) Importance: High Status: New ** Also affects: linux (Ubuntu Lunar) Importance: Undecided Status: New -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2026883 Title: vector floating point registers get clobbered when running stress-ng --vecfp with more instances than CPUs Status in linux package in Ubuntu: New Status in linux source package in Focal: New Status in linux source package in Lunar: New Status in linux source package in Mantic: New Bug description: When running the stress-ng vector floating point stressor in QEMU PPC64 virtual machines I get floating point verification errors when running more stressor instances than the number of virtual CPUs. How to reproduce: Create a PPC64 VM in QEMU on a x86 host with 8 virtual CPUs. Login, and then do: get latest stress-ng: sudo apt-get build-dep stress-ng git clone https://github.com/ColinIanKing/stress-ng cd stress-ng make clean; make -j $(nproc) ./stress-ng --vecfp 32 --verify -t 10 One should get failures such as: stress-ng: info: [1487] setting to a 10 second run per stressor stress-ng: info: [1487] dispatching hogs: 32 vecfp stress-ng: fail: [1489] vecfp: floatv64div float vector operation result mismatch, got 1078998925312.00, expected 180812.062500 stress-ng: fail: [1489] vecfp: floatv64div float vector operation result mismatch, got 46779686912.00, expected 13278722.00 stress-ng: fail: [1489] vecfp: floatv64div float vector operation result mismatch, got 24992688128.00, expected 26213772.00 stress-ng: fail: [1489] vecfp: floatv64div float vector operation result mismatch, got 17185787904.00, expected 39415832.00 stress-ng: fail: [1488] vecfp: floatv16div float vector
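When triaging runs like the one quoted above, it helps to count stress-ng "fail:" verification lines in a saved log. The `count_fails` helper below is a small illustration of mine (not a stress-ng feature); the sample lines are modeled on the log in the bug report, and the actual reproduction remains the quoted `./stress-ng --vecfp 32 --verify -t 10` inside the PPC64 VM:

```shell
#!/bin/sh
# Count verification failures in stress-ng output read from stdin.
count_fails() { grep -c 'fail:'; }

log='stress-ng: info: [1487] dispatching hogs: 32 vecfp
stress-ng: fail: [1489] vecfp: floatv64div float vector operation result mismatch, got 1078998925312.00, expected 180812.062500
stress-ng: fail: [1488] vecfp: floatv16div float vector operation result mismatch, got 157250576.00, expected 33576.261719'

printf '%s\n' "$log" | count_fails   # prints 2 for the sample above
```

A count of zero with `--verify` enabled is the expected healthy result, as the single-instance run in the report shows.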
[Kernel-packages] [Bug 2017903] Re: LSM stacking and AppArmor for 6.2: additional fixes
Note that this could be triggered with stress-ng --apparmor 0; see https://bugs.launchpad.net/ubuntu/mantic/+source/linux/+bug/2024599 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/2017903 Title: LSM stacking and AppArmor for 6.2: additional fixes Status in linux package in Ubuntu: Fix Released Status in linux source package in Lunar: Fix Released Bug description: [Impact] We maintain custom LSM stacking and AppArmor SAUCE patches in our kernel to provide additional features that are not available in the upstream AppArmor. We have experienced occasional bugs in the lunar kernel (specifically with the environ.sh test) that can lead to system crashes / failures (such as potential NULL pointer dereference). [Test case] Run AppArmor autopkgtest / qa-regression-testing. [Fix] Apply the following additional fixes provided by AppArmor upstream maintainer: UBUNTU: SAUCE: apparmor: fix policy_compat perms remap for file dfa UBUNTU: SAUCE: apparmor: fix profile verification and enable it UBUNTU: SAUCE: apparmor: fix: add missing failure check in compute_xmatch_perms UBUNTU: SAUCE: apparmor: fix: kzalloc perms tables for shared dfas [Regression potential] Additional fixes are touching only AppArmor specific code, so we may experience regressions (bugs / behavior change) only in apparmor by applying them. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2017903/+subscriptions
[Kernel-packages] [Bug 2024599] Re: linux-image-5.15.0-1032-realtime locks up under scheduler test load
Thanks JJ, much appreciated :-) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/2024599 Title: linux-image-5.15.0-1032-realtime locks up under scheduler test load Status in apparmor package in Ubuntu: New Status in linux package in Ubuntu: Incomplete Status in apparmor source package in Jammy: New Status in linux source package in Jammy: Incomplete Status in apparmor source package in Kinetic: New Status in linux source package in Kinetic: New Status in apparmor source package in Lunar: New Status in linux source package in Lunar: New Status in apparmor source package in Mantic: New Status in linux source package in Mantic: Incomplete Bug description: lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 22.04.2 LTS Release: 22.04 Codename: jammy uname -a Linux jammie-amd64-efi 5.15.0-1032-realtime #35-Ubuntu SMP PREEMPT_RT Tue Jan 24 11:45:03 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux free: total used free shared buff/cache available Mem: 4013888 200984 3439012 1204 373892 3744628 Swap: 4014076 0 4014076 Running in a kvm-qemu, 8 cpus, cpu Intel Core Processor (Skylake, IBRS): how to reproduce issue: git clone https://github.com/ColinIanKing/stress-ng sudo apt-get update sudo apt-get build-dep stress-ng sudo apt-get install libeigen3-dev libmpfr-dev libkmod-dev libxxhash-dev libglvnd-dev libgbm-dev cd stress-ng make clean make -j 8 sudo ./stress-ng --class scheduler --all 1 -v --vmstat 1 -t 30m ..wait for all the stressors to get invoked, system becomes unresponsive, can't ^C stress-ng, can't swap consoles on the VM, appears to be hard locked.
To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/2024599/+subscriptions
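The build-and-run recipe from the description above can be condensed into a script. The `run` wrapper below only echoes each command, so the sequence can be shown (and checked) without actually building stress-ng or locking up a machine; drop the wrapper to execute it for real:

```shell
#!/bin/sh
# Echo the reproduction steps from the bug description (dry run).
run() { printf '+ %s\n' "$*"; }

run git clone https://github.com/ColinIanKing/stress-ng
run sudo apt-get update
run sudo apt-get build-dep stress-ng
run sudo apt-get install libeigen3-dev libmpfr-dev libkmod-dev libxxhash-dev libglvnd-dev libgbm-dev
run cd stress-ng
run make clean
run make -j 8
run sudo ./stress-ng --class scheduler --all 1 -v --vmstat 1 -t 30m
```

Running the final command on an affected kernel is expected to hard-lock the VM once all stressors are dispatched, so run it in a disposable guest with a console capture if possible.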
[Kernel-packages] [Bug 2024599] Re: linux-image-5.15.0-1032-realtime locks up under scheduler test load
And also occurs in Ubuntu Mantic with 6.3.0-7-generic -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/2024599 Title: linux-image-5.15.0-1032-realtime locks up under scheduler test load Status in apparmor package in Ubuntu: New Status in linux package in Ubuntu: Incomplete Status in apparmor source package in Jammy: New Status in linux source package in Jammy: Incomplete Status in apparmor source package in Kinetic: New Status in linux source package in Kinetic: New Status in apparmor source package in Lunar: New Status in linux source package in Lunar: New Status in apparmor source package in Mantic: New Status in linux source package in Mantic: Incomplete Bug description: lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 22.04.2 LTS Release: 22.04 Codename: jammy uname -a Linux jammie-amd64-efi 5.15.0-1032-realtime #35-Ubuntu SMP PREEMPT_RT Tue Jan 24 11:45:03 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux free: total used free shared buff/cache available Mem: 4013888 200984 3439012 1204 373892 3744628 Swap: 4014076 0 4014076 Running in a kvm-qemu, 8 cpus, cpu Intel Core Processor (Skylake, IBRS): how to reproduce issue: git clone https://github.com/ColinIanKing/stress-ng sudo apt-get update sudo apt-get build-dep stress-ng sudo apt-get install libeigen3-dev libmpfr-dev libkmod-dev libxxhash-dev libglvnd-dev libgbm-dev cd stress-ng make clean make -j 8 sudo ./stress-ng --class scheduler --all 1 -v --vmstat 1 -t 30m ..wait for all the stressors to get invoked, system becomes unresponsive, can't ^C stress-ng, can't swap consoles on the VM, appears to be hard locked.
To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/2024599/+subscriptions
[Kernel-packages] [Bug 2024599] Re: linux-image-5.15.0-1032-realtime locks up under scheduler test load
5.15.0.75 works fine, no problem, 5.19.0-45 kernel crashes, so issue introduced between 5.15 and 5.19 ** Also affects: apparmor (Ubuntu Lunar) Importance: Undecided Status: New ** Also affects: linux (Ubuntu Lunar) Importance: Undecided Status: New ** Also affects: apparmor (Ubuntu Kinetic) Importance: Undecided Status: New ** Also affects: linux (Ubuntu Kinetic) Importance: Undecided Status: New ** Also affects: apparmor (Ubuntu Mantic) Importance: Undecided Status: New ** Also affects: linux (Ubuntu Mantic) Importance: Low Status: Incomplete -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/2024599 Title: linux-image-5.15.0-1032-realtime locks up under scheduler test load Status in apparmor package in Ubuntu: New Status in linux package in Ubuntu: Incomplete Status in apparmor source package in Jammy: New Status in linux source package in Jammy: Incomplete Status in apparmor source package in Kinetic: New Status in linux source package in Kinetic: New Status in apparmor source package in Lunar: New Status in linux source package in Lunar: New Status in apparmor source package in Mantic: New Status in linux source package in Mantic: Incomplete Bug description: lsb_release -a No LSB modules are available. 
Distributor ID: Ubuntu Description: Ubuntu 22.04.2 LTS Release: 22.04 Codename: jammy uname -a Linux jammie-amd64-efi 5.15.0-1032-realtime #35-Ubuntu SMP PREEMPT_RT Tue Jan 24 11:45:03 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux free: total used free shared buff/cache available Mem: 4013888 200984 3439012 1204 373892 3744628 Swap: 4014076 0 4014076 Running in a kvm-qemu, 8 cpus, cpu Intel Core Processor (Skylake, IBRS): how to reproduce issue: git clone https://github.com/ColinIanKing/stress-ng sudo apt-get update sudo apt-get build-dep stress-ng sudo apt-get install libeigen3-dev libmpfr-dev libkmod-dev libxxhash-dev libglvnd-dev libgbm-dev cd stress-ng make clean make -j 8 sudo ./stress-ng --class scheduler --all 1 -v --vmstat 1 -t 30m ..wait for all the stressors to get invoked, system becomes unresponsive, can't ^C stress-ng, can't swap consoles on the VM, appears to be hard locked. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/2024599/+subscriptions
[Kernel-packages] [Bug 2024599] Re: linux-image-5.15.0-1032-realtime locks up under scheduler test load
And with 5.19.0-45-generic:

sudo ./stress-ng --apparmor 1 --klog-check
[sudo] password for cking:
stress-ng: info: [1179] defaulting to a 86400 second (1 day, 0.00 secs) run per stressor
stress-ng: info: [1179] dispatching hogs: 1 apparmor
stress-ng: info: [1180] klog-check: kernel cmdline: 'BOOT_IMAGE=/vmlinuz-5.19.0-45-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro'
stress-ng: error: [1180] klog-check: error: [93.527396] 'AppArmor DFA next/check upper bounds error'
stress-ng: error: [1180] klog-check: error: [93.827976] 'AppArmor DFA state with invalid match flags'
stress-ng: error: [1180] klog-check: error: [93.991395] 'AppArmor DFA next/check upper bounds error'
stress-ng: error: [1180] klog-check: error: [93.992189] 'AppArmor DFA next/check upper bounds error'
stress-ng: error: [1180] klog-check: error: [94.007400] 'AppArmor DFA state with invalid match flags'
stress-ng: error: [1180] klog-check: error: [94.059345] 'AppArmor DFA state with invalid match flags'
stress-ng: error: [1180] klog-check: error: [94.104414] 'AppArmor DFA next/check upper bounds error'
stress-ng: error: [1180] klog-check: alert: [94.128617] 'BUG: kernel NULL pointer dereference, address: 0130'
stress-ng: error: [1180] klog-check: alert: [94.128644] '#PF: supervisor read access in kernel mode'
stress-ng: error: [1180] klog-check: alert: [94.128659] '#PF: error_code(0x) - not-present page'
stress-ng: info: [1180] klog-check: warning: [94.128685] 'Oops: [#1] PREEMPT SMP PTI'
stress-ng: info: [1180] klog-check: warning: [94.128698] 'CPU: 7 PID: 1185 Comm: stress-ng-appar Not tainted 5.19.0-45-generic #46-Ubuntu'
stress-ng: info: [1180] klog-check: warning: [94.128722] 'Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-5 04/01/2014'
stress-ng: info: [1180] klog-check: warning: [94.128745] 'RIP: 0010:aa_unpack+0x11f/0x530'
stress-ng: info: [1180] klog-check: warning: [94.128762] 'Code: 00 48 85 c0 0f 84 15 04 00 00 48 8d 75 a8 48 8d 7d b0 4c 8b 7d c0 e8 80 ec ff ff 48 89 c3 48 3d 00 f0 ff ff 0f 87 00 02 00 00 <4c> 8b b0 30 01 00 00 4d 85 f6 0f 84 38 01 00 00 49 8b 86 c8 00 00'
stress-ng: info: [1180] klog-check: warning: [94.128807] 'RSP: 0018:b1fdc0f57ce0 EFLAGS: 00010207'
stress-ng: info: [1180] klog-check: warning: [94.129378] 'RAX: RBX: RCX: '
stress-ng: info: [1180] klog-check: warning: [94.129928] 'RDX: RSI: RDI: '
stress-ng: info: [1180] klog-check: warning: [94.130443] 'RBP: b1fdc0f57d40 R08: R09: '
stress-ng: info: [1180] klog-check: warning: [94.131056] 'R10: R11: R12: b1fdc0f57da8'
stress-ng: info: [1180] klog-check: warning: [94.131572] 'R13: b1fdc0f57da0 R14: 9da384835962 R15: 9da384820010'
stress-ng: info: [1180] klog-check: warning: [94.132090] 'FS: 7fa65a059740() GS:9da3fbdc() knlGS:'
stress-ng: info: [1180] klog-check: warning: [94.132652] 'CS: 0010 DS: ES: CR0: 80050033'
stress-ng: info: [1180] klog-check: warning: [94.133206] 'CR2: 0130 CR3: 00010d432006 CR4: 00370ee0'
stress-ng: info: [1180] klog-check: warning: [94.133739] 'DR0: DR1: DR2: '
stress-ng: info: [1180] klog-check: warning: [94.134282] 'DR3: DR6: fffe0ff0 DR7: 0400'
stress-ng: info: [1180] klog-check: warning: [94.134868] 'Call Trace:'
stress-ng: info: [1180] klog-check: warning: [94.135388] ' '
stress-ng: info: [1180] klog-check: warning: [94.135933] ' aa_replace_profiles+0xa1/0x10b0'
stress-ng: info: [1180] klog-check: warning: [94.136471] ' ? check_heap_object+0x29/0x1e0'
stress-ng: info: [1180] klog-check: warning: [94.137018] ' ? __check_object_size.part.0+0x4c/0xf0'
stress-ng: info: [1180] klog-check: warning: [94.137528] ' policy_update+0xd0/0x170'
stress-ng: info: [1180] klog-check: warning: [94.138061] ' profile_replace+0xb9/0x150'
stress-ng: info: [1180] klog-check: warning: [94.138612] ' vfs_write+0xb7/0x290'
stress-ng: info: [1180] klog-check: warning: [94.139124] ' ksys_write+0x73/0x100'
stress-ng: info: [1180] klog-check: warning: [94.139616] ' __x64_sys_write+0x19/0x30'
stress-ng: info: [1180] klog-check: warning: [94.140104] ' do_syscall_64+0x58/0x90'
stress-ng: info: [1180] klog-check: warning: [94.140651] ' ? syscall_exit_to_user_mode+0x29/0x50'
stress-ng: info: [1180] klog-check: warning: [94.141130] ' ? do_syscall_64+0x67/0x90'
stress-ng: info: [1180] klog-check: warning: [94.141630] ' ? do_syscall_64+0x67/0x90'
stress-ng: info: [1180] klog-check: warning: [94.142117] ' ? do_syscall_64+0x67/0x90'
stress-ng: info: [1180] klog-check: warning: [94.142627] ' entry_SYSCALL_64_after_hwframe+0x63/0xcd'
stress-ng: info: [1180] klog-check: warning:
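The AppArmor DFA errors and the oops marker that --klog-check flagged above can also be pulled out of a saved kernel log by hand; a minimal sketch, assuming the ring buffer was first saved with something like 'dmesg > kern.log' (the file name is my own, not from the report):

```shell
# Count the AppArmor DFA errors and NULL-dereference oopses in a saved
# kernel log -- the same patterns stress-ng's klog-check reports above.
grep -cE 'AppArmor DFA|BUG: kernel NULL pointer' kern.log
```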
[Kernel-packages] [Bug 2024599] Re: linux-image-5.15.0-1032-realtime locks up under scheduler test load
On 6.2.0-21-generic I also get:

sudo ./stress-ng --apparmor 1 --klog-check
stress-ng: error: [1083] klog-check: alert: [66.442338] 'BUG: kernel NULL pointer dereference, address: 0030'
stress-ng: error: [1083] klog-check: alert: [66.442538] '#PF: supervisor read access in kernel mode'
stress-ng: error: [1083] klog-check: alert: [66.442718] '#PF: error_code(0x) - not-present page'
stress-ng: info: [1083] klog-check: warning: [66.443080] 'Oops: [#1] PREEMPT SMP PTI'
stress-ng: info: [1083] klog-check: warning: [66.443256] 'CPU: 3 PID: 1088 Comm: stress-ng-appar Not tainted 6.2.0-21-generic #21-Ubuntu'
stress-ng: info: [1083] klog-check: warning: [66.443438] 'Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-5 04/01/2014'
stress-ng: info: [1083] klog-check: warning: [66.443628] 'RIP: 0010:aafs_create.constprop.0+0x7f/0x130'
stress-ng: info: [1083] klog-check: warning: [66.443819] 'Code: 4c 63 e0 48 83 c4 18 4c 89 e0 5b 41 5c 41 5d 41 5e 41 5f 5d 31 d2 31 c9 31 f6 31 ff 45 31 c0 45 31 c9 45 31 d2 c3 cc cc cc cc <4d> 8b 55 30 4d 8d ba a0 00 00 00 4c 89 55 c0 4c 89 ff e8 8a 59 a1'
stress-ng: info: [1083] klog-check: warning: [66.444227] 'RSP: 0018:beb940907bd8 EFLAGS: 00010246'
stress-ng: info: [1083] klog-check: warning: [66.33] 'RAX: RBX: 41ed RCX: '
stress-ng: info: [1083] klog-check: warning: [66.444646] 'RDX: RSI: RDI: '
stress-ng: info: [1083] klog-check: warning: [66.444862] 'RBP: beb940907c18 R08: R09: '
stress-ng: info: [1083] klog-check: warning: [66.445074] 'R10: R11: R12: 93db8b18'
stress-ng: info: [1083] klog-check: warning: [66.445291] 'R13: R14: R15: '
stress-ng: info: [1083] klog-check: warning: [66.445503] 'FS: 7f60f5c07740() GS:9578bbcc() knlGS:'
stress-ng: info: [1083] klog-check: warning: [66.445721] 'CS: 0010 DS: ES: CR0: 80050033'
stress-ng: info: [1083] klog-check: warning: [66.445939] 'CR2: 0030 CR3: 000124ffa004 CR4: 00370ee0'
stress-ng: info: [1083] klog-check: warning: [66.446163] 'DR0: DR1: DR2: '
stress-ng: info: [1083] klog-check: warning: [66.446387] 'DR3: DR6: fffe0ff0 DR7: 0400'
stress-ng: info: [1083] klog-check: warning: [66.446608] 'Call Trace:'
stress-ng: info: [1083] klog-check: warning: [66.446829] ' '
stress-ng: info: [1083] klog-check: warning: [66.447059] ' __aafs_profile_mkdir+0x3d6/0x480'
stress-ng: info: [1083] klog-check: warning: [66.447290] ' aa_replace_profiles+0x862/0x1270'
stress-ng: info: [1083] klog-check: warning: [66.447518] ' policy_update+0xe0/0x180'
stress-ng: info: [1083] klog-check: warning: [66.447750] ' profile_replace+0xb9/0x150'
stress-ng: info: [1083] klog-check: warning: [66.447981] ' vfs_write+0xc8/0x410'
stress-ng: info: [1083] klog-check: warning: [66.448213] ' ? kmem_cache_free+0x1e/0x3b0'
stress-ng: info: [1083] klog-check: warning: [66.448442] ' ksys_write+0x73/0x100'
stress-ng: info: [1083] klog-check: warning: [66.448670] ' __x64_sys_write+0x19/0x30'
stress-ng: info: [1083] klog-check: warning: [66.448892] ' do_syscall_64+0x58/0x90'
stress-ng: info: [1083] klog-check: warning: [66.449115] ' ? do_syscall_64+0x67/0x90'
stress-ng: info: [1083] klog-check: warning: [66.449337] ' ? do_syscall_64+0x67/0x90'
stress-ng: info: [1083] klog-check: warning: [66.449551] ' ? exit_to_user_mode_loop+0xe0/0x130'
stress-ng: info: [1083] klog-check: warning: [66.449775] ' ? exit_to_user_mode_prepare+0x30/0xb0'
stress-ng: info: [1083] klog-check: warning: [66.449996] ' ? syscall_exit_to_user_mode+0x29/0x50'
stress-ng: info: [1083] klog-check: warning: [66.450220] ' ? do_syscall_64+0x67/0x90'
stress-ng: info: [1083] klog-check: warning: [66.450449] ' ? exit_to_user_mode_prepare+0x30/0xb0'
stress-ng: info: [1083] klog-check: warning: [66.450681] ' ? syscall_exit_to_user_mode+0x29/0x50'
stress-ng: info: [1083] klog-check: warning: [66.450915] ' ? do_syscall_64+0x67/0x90'
stress-ng: info: [1083] klog-check: warning: [66.451151] ' ? do_syscall_64+0x67/0x90'
stress-ng: info: [1083] klog-check: warning: [66.451384] ' entry_SYSCALL_64_after_hwframe+0x72/0xdc'
stress-ng: info: [1083] klog-check: warning: [66.451614] 'RIP: 0033:0x7f60f5b0b9e4'
stress-ng: info: [1083] klog-check: warning: [66.451848] 'Code: 15 39 a4 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 80 3d fd 2b 0f 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 48 89 54 24 18 48'
stress-ng: info: [1083] klog-check: warning: [66.452341] 'RSP: 002b:7ffdaa28bfb8 EFLAGS: 0202 ORIG_RAX: 0001'
stress-ng: info:
[Kernel-packages] [Bug 2024599] Re: linux-image-5.15.0-1032-realtime locks up under scheduler test load
I've managed to capture where it hangs; it looks like an RCU issue, see the attached screenshot.

** Attachment added: "Screenshot from 2023-06-23 12-28-42.png"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2024599/+attachment/5681654/+files/Screenshot%20from%202023-06-23%2012-28-42.png

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2024599

Title:
  linux-image-5.15.0-1032-realtime locks up under scheduler test load

Status in linux package in Ubuntu: Incomplete
Status in linux source package in Jammy: Incomplete

Bug description:
  lsb_release -a
  No LSB modules are available.
  Distributor ID: Ubuntu
  Description:    Ubuntu 22.04.2 LTS
  Release:        22.04
  Codename:       jammy

  uname -a
  Linux jammie-amd64-efi 5.15.0-1032-realtime #35-Ubuntu SMP PREEMPT_RT Tue Jan 24 11:45:03 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

  free
                total        used        free      shared  buff/cache   available
  Mem:        4013888      200984     3439012        1204      373892     3744628
  Swap:       4014076           0     4014076

  Running in kvm-qemu, 8 cpus, cpu Intel Core Processor (Skylake, IBRS).

  How to reproduce the issue:
  git clone https://github.com/ColinIanKing/stress-ng
  sudo apt-get update
  sudo apt-get build-dep stress-ng
  sudo apt-get install libeigen3-dev libmpfr-dev libkmod-dev libxxhash-dev libglvnd-dev libgbm-dev
  cd stress-ng
  make clean
  make -j 8
  sudo ./stress-ng --class scheduler --all 1 -v --vmstat 1 -t 30m

  ..wait for all the stressors to get invoked; the system becomes unresponsive, can't ^C stress-ng, can't switch consoles on the VM, appears to be hard locked.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2024599/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 2024599] Re: linux-image-5.15.0-1032-realtime locks up under scheduler test load
I'm working through the stressors to see which ones might be causing issues. I did notice that the apparmor stressor eats up memory until the system runs out of memory. This stressor loads illegal AppArmor profiles and then removes them; perhaps there is a memory leak in the loading of profiles that don't pass the verification phase. To show this issue, run the following; one can see that memory gets low over time before the user gets kicked off due to low memory:

sudo ./stress-ng --apparmor 1 --vmstat 5
stress-ng: info: [1339] defaulting to a 86400 second (1 day, 0.00 secs) run per stressor
stress-ng: info: [1339] dispatching hogs: 1 apparmor
stress-ng: info: [1340] vmstat:  r  b  swpd   free   buff  cache  si  so     bi    bo    in    cs us sy id wa st
stress-ng: info: [1340] vmstat:  2  1     0 313824  32776 364352   0   0     16    18  4858  9752  4 25 70  0  0
stress-ng: info: [1340] vmstat:  5  0     0 257848  32776 366528   0   0      0  1091  4573  8435  4 23 72  0  0
stress-ng: info: [1340] vmstat:  5  0     0 198916  32784 368288   0   0      0    20  4642  8681  4 23 71  1  0
stress-ng: info: [1340] vmstat:  2  0     0 139496  32792 370600   0   0      0    16  4612  8500  4 23 71  1  0
stress-ng: info: [1340] vmstat:  2  0     0  85032  32740 363916   0   0      0  1751  4774  8710  4 23 71  1  0
stress-ng: info: [1340] vmstat:  5  0     0  92224  32748 310548   0   0      0  2020  5919 10123  4 24 70  1  0
stress-ng: info: [1340] vmstat:  2  0     0  93380  30068 268484   0   0      0    14  5590 10275  4 26 69  1  0
stress-ng: info: [1340] vmstat:  2  0     0 102152  23648 207872   0   0      0  3346  5277  9303  4 24 70  1  0
stress-ng: info: [1340] vmstat:  5  0     0  99184  18488 169084   0   0     48  2180  5614  9901  4 25 71  0  0
stress-ng: info: [1340] vmstat:  2  0     0  88068   7080 140392   0   0    359  2090  6146 11013  4 27 68  0  0
stress-ng: info: [1340] vmstat:  2  0     0  92368    564  82108   0   0   3568  2534  5899 10308  4 26 67  1  0
stress-ng: info: [1340] vmstat:  7  0     0  83784    100  47356   0   0  99834  4212  8540 14574  4 28 65  2  0
stress-ng: info: [1340] vmstat:  2  0     0  76784    188  44916   0   0 363427  7621 16647 28448  4 37 45 12  0
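The falling "free" column in the vmstat output above can also be watched straight from /proc/meminfo while the stressor runs. A minimal sketch (the one-second interval and the MemAvailable field are my choice, not from the report):

```shell
# Sample MemAvailable once a second; run alongside the apparmor
# stressor to watch the leak without stress-ng's --vmstat option.
for i in 1 2 3; do
    awk '/^MemAvailable:/ {print $2, $3}' /proc/meminfo
    sleep 1
done
```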
[Kernel-packages] [Bug 2024599] [NEW] linux-image-5.15.0-1032-realtime locks up under scheduler test load
Public bug reported:

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.2 LTS
Release:        22.04
Codename:       jammy

uname -a
Linux jammie-amd64-efi 5.15.0-1032-realtime #35-Ubuntu SMP PREEMPT_RT Tue Jan 24 11:45:03 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

free
              total        used        free      shared  buff/cache   available
Mem:        4013888      200984     3439012        1204      373892     3744628
Swap:       4014076           0     4014076

Running in kvm-qemu, 8 cpus, cpu Intel Core Processor (Skylake, IBRS).

How to reproduce the issue:
git clone https://github.com/ColinIanKing/stress-ng
sudo apt-get update
sudo apt-get build-dep stress-ng
sudo apt-get install libeigen3-dev libmpfr-dev libkmod-dev libxxhash-dev libglvnd-dev libgbm-dev
cd stress-ng
make clean
make -j 8
sudo ./stress-ng --class scheduler --all 1 -v --vmstat 1 -t 30m

..wait for all the stressors to get invoked; the system becomes unresponsive, can't ^C stress-ng, can't switch consoles on the VM, appears to be hard locked.

** Affects: linux (Ubuntu)
   Importance: Low
   Status: New

** Affects: linux (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu)
   Importance: Undecided => Low

** Also affects: linux (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Summary changed:
- linux-image-5.15.0-1032-realtime locksup under scheduler test load
+ linux-image-5.15.0-1032-realtime locks up under scheduler test load

** Description changed:
  lsb_release -a
  No LSB modules are available.
  Distributor ID: Ubuntu
  Description: Ubuntu 22.04.2 LTS
  Release: 22.04
  Codename: jammy

  uname -a
- Linux jammie-amd64-efi 5.15.0-1032-realtime #35-Ubuntu SMP PREEMPT_RT Tue Jan 24 11:45:03 UTC 2023 x86_64
+ Linux jammie-amd64-efi 5.15.0-1032-realtime #35-Ubuntu SMP PREEMPT_RT Tue Jan 24 11:45:03 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

  free
-totalusedfree shared buff/cache available
+ totalusedfree shared buff/cache available
  Mem: 4013888 200984 3439012 1204 373892 3744628
  Swap: 4014076 0 4014076

  Running in a kvm-qemu, 8 cpus, cpu Intel Core Processor (Skylake, IBRS):

  how to reproduce issue:
  git clone https://github.com/ColinIanKing/stress-ng
  sudo apt-get update
  sudo apt-get build-dep stress-ng
  sudo apt-get install libeigen3-dev libmpfr-dev libkmod-dev libxxhash-dev libglvnd-dev libgbm-dev
  cd stress-ng
  make clean
  make -j 8
  sudo ./stress-ng --class scheduler --all 1 -v --vmstat 1 -t 30m

  ..wait for all the stressors to get invoked, system becomes unresponsive, can't ^C stress-ng, can't swap consoles on the VM, appears to be hard locked.
-
- cd stress-ng

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2024599

Title:
  linux-image-5.15.0-1032-realtime locks up under scheduler test load

Status in linux package in Ubuntu: New
Status in linux source package in Jammy: New
[Kernel-packages] [Bug 2018687] Re: rm -r dir on USB disk locks up hdparm on different disk
Unfortunately, an unacceptable side-effect of "sysctl -w vm.dirty_ratio=0" is that disk operations that move a lot of data take much too long. An rsync that normally takes less than an hour of real time was still running over 12 hours later (and hasn't finished yet). I'm reverting vm.dirty_ratio back up to 20 to see if that clears out all the unfinished disk I/O.

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-meta-hwe-5.15 in Ubuntu.
https://bugs.launchpad.net/bugs/2018687

Title:
  rm -r dir on USB disk locks up hdparm on different disk

Status in linux-meta-hwe-5.15 package in Ubuntu: New

Bug description:
  Description: Ubuntu 20.04.6 LTS
  Linux 5.15.0-72-generic #79~20.04.1-Ubuntu SMP Thu Apr 20 22:12:07 UTC 2023 x86_64 GNU/Linux
  Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz

  Running "rm -r dir" on a directory with millions of files that resides on a disk in an external USB-3 hard drive dock locks up an unrelated hdparm process running on an internal disk, such that the kernel says:

  May  7 04:24:02 kernel: [163606.041862] INFO: task hdparm:1391162 blocked for more than 120 seconds.
  [...]
  May  7 04:26:03 kernel: [163726.876357] INFO: task hdparm:1391162 blocked for more than 241 seconds.
  [...]
  May  7 04:28:04 kernel: [163847.702980] INFO: task hdparm:1391162 blocked for more than 362 seconds.

  First, a normal run of "hdparm -t /dev/sda" with the offending "rm -r" SIGSTOPped so that it doesn't affect anything:

  # \time hdparm -t /dev/sda
  /dev/sda:
   Timing buffered disk reads: 1128 MB in 3.00 seconds = 375.50 MB/sec
  0.01user 0.67system 0:06.21elapsed 11%CPU (0avgtext+0avgdata 4584maxresident)k
  2312704inputs+8outputs (0major+664minor)pagefaults 0swaps

  Elapsed time is about six seconds, as expected. /dev/sda is an internal SSD drive.

  I now run this loop to show the timings and process states below:

  # while sleep 1 ; do date ; ps laxww | grep '[ ]D' | grep -v refrig ; done

  (I have some processes stopped in a freezer cgroup ("refrig") that I don't want to see in the grep output.)

  I SIGCONT the offending "rm -r" running on the drive in the USB3 drive dock, and you see the rm appear in uninterruptible sleep along with a couple of kernel processes:

  Sun May  7 05:01:07 EDT 2023
  Sun May  7 05:01:08 EDT 2023
  Sun May  7 05:01:09 EDT 2023
  Sun May  7 05:01:10 EDT 2023
  Sun May  7 05:01:11 EDT 2023
  1 0     447     2 20 0      0    0 usb_sg D   ?      3:17 [usb-storage]
  4 0 1423283 11939 20 0  10648  580 wait_o D+  pts/28 0:00 rm -rf 15tb3
  Sun May  7 05:01:12 EDT 2023
  1 0     447     2 20 0      0    0 usb_sg D   ?      3:17 [usb-storage]
  1 0    4016     1 20 0 161136 1900 usbhid Dsl ?      1:39 /sbin/apcupsd
  4 0 1423283 11939 20 0  10648  580 wait_o D+  pts/28 0:00 rm -rf 15tb3
  Sun May  7 05:01:13 EDT 2023

  The above lines showing those processes in uninterruptible sleep repeat over and over each second as the "rm -r" continues.

  I then start up "hdparm -t /dev/sda" on the internal SSD disk, and it also appears in uninterruptible sleep and doesn't finish even after minutes of waiting:

  Sun May  7 05:01:25 EDT 2023
  1 0     368     2 20 0     0    0 md_sup D   ?      2:57 [md0_raid5]
  1 0 1366783     2 20 0     0    0 blk_mq D   ?      0:02 [kworker/u16:2+flush-8:144]
  4 0 1423283 11939 20 0 11260 2544 wait_o D+  pts/28 0:00 rm -rf 15tb3
  4 0 1423501  9975 20 0  4680 4584 wb_wai DL+ pts/4  0:00 hdparm -t /dev/sda
  Sun May  7 05:01:26 EDT 2023
  1 0     447     2 20 0     0    0 usb_sg D   ?      3:17 [usb-storage]
  1 0 1366783     2 20 0     0    0 blk_mq D   ?      0:02 [kworker/u16:2+flush-8:144]
  4 0 1423283 11939 20 0 11260 2544 wait_o D+  pts/28 0:00 rm -rf 15tb3
  4 0 1423501  9975 20 0  4680 4584 wb_wai DL+ pts/4  0:00 hdparm -t /dev/sda
  Sun May  7 05:01:27 EDT 2023
  [...]
  4 0 1423501  9975 20 0  4680 4584 wb_wai DL+ pts/4  0:00 hdparm -t /dev/sda
  Sun May  7 05:01:35 EDT 2023
  1 0     447     2 20 0     0    0 usb_sg D   ?      3:17 [usb-storage]
  4 0 1423283 11939 20 0 11260 2544 blk_mq D+  pts/28 0:00 rm -rf 15tb3
  4 0 1423501  9975 20 0  4680 4584 wb_wai DL+ pts/4  0:00 hdparm -t /dev/sda
  Sun May  7 05:01:36 EDT 2023
  1 0     447     2 20 0     0    0 usb_sg D   ?      3:17 [usb-storage]
  1 0    4985     2 20 0     0    0 rq_qos D   ?      0:24 [jbd2/sdj1-8]
  1 0 1366783     2 20 0     0    0 blk_mq D   ?      0:02
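The ps-and-grep loop used in the report can also be done straight from /proc, which avoids spawning ps every second. A rough sketch, not from the report (field 3 of /proc/PID/stat is the state letter, and this simple version assumes the comm field contains no spaces):

```shell
# List PID and comm for every process currently in uninterruptible
# sleep (state D), reading /proc/PID/stat directly instead of ps.
for f in /proc/[0-9]*/stat; do
    awk '$3 == "D" {print $1, $2}' "$f" 2>/dev/null
done
```

Typically this prints nothing on an idle machine; during the stuck rm/hdparm runs above it would show the same D-state tasks the ps loop did.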
[Kernel-packages] [Bug 2018687] Re: rm -r dir on USB disk locks up hdparm on different disk
If I do this:

# sysctl -w vm.dirty_ratio=0

the hdparm no longer hangs. It has to be zero; anything non-zero, even 1, causes large delays in disk-related commands such as hdparm, sync, smartctl, etc. I got this idea from here:

https://serverfault.com/questions/405210/can-high-load-cause-server-hang-and-error-blocked-for-more-than-120-seconds

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-meta-hwe-5.15 in Ubuntu.
https://bugs.launchpad.net/bugs/2018687

Title:
  rm -r dir on USB disk locks up hdparm on different disk

Status in linux-meta-hwe-5.15 package in Ubuntu: New
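For reference, the writeback thresholds being tuned here are ordinary sysctls: vm.dirty_ratio (percent of memory) and vm.dirty_bytes (absolute size) are mutually exclusive counterparts, and setting one zeroes the other. A sketch of inspecting them; the 256 MB figure in the comment is only an illustration, not something tested in this bug:

```shell
# Show the current dirty-writeback limits. Only one of dirty_ratio /
# dirty_bytes is in effect at a time; a modest absolute limit, e.g.
#   sysctl -w vm.dirty_bytes=268435456    # 256 MB (illustrative)
# is a middle ground between 0 and the default ratio of 20.
for f in dirty_ratio dirty_bytes dirty_background_ratio; do
    printf '%s = %s\n' "$f" "$(cat /proc/sys/vm/$f)"
done
```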
[Kernel-packages] [Bug 2018687] Re: rm -r dir on USB disk locks up hdparm on different disk
I now have a Perl script running that is removing duplicate files by doing thousands of hard links on a different external USB3 disk, and it is locking up or timing out many disk-related things on all my other disks. Both this USB3 external drive and the one above are plugged directly into the motherboard (HP Z440 Workstation).

F UID     PID    PPID PRI NI   VSZ   RSS WCHAN  STAT TTY    TIME COMMAND
4   0 2210112 2210111  20  0 53680 42588 blk_mq D+   pts/24 3:00 /usr/bin/perl
[...]

May 14 13:37:49 kernel: [259424.745462] INFO: task smartd:2719 blocked for more than 120 seconds.
May 15 00:08:09 kernel: [297244.761855] INFO: task smartd:2719 blocked for more than 120 seconds.
May 15 00:10:10 kernel: [297365.592485] INFO: task smartd:2719 blocked for more than 241 seconds.
May 15 01:08:34 kernel: [300869.682961] INFO: task smartd:2719 blocked for more than 120 seconds.
May 15 01:28:43 kernel: [302077.989582] INFO: task hdparm:2052842 blocked for more than 120 seconds.
May 15 01:30:43 kernel: [302198.820278] INFO: task hdparm:2052842 blocked for more than 241 seconds.
May 15 01:32:44 kernel: [302319.654907] INFO: task hdparm:2052842 blocked for more than 362 seconds.
May 15 01:34:45 kernel: [302440.481601] INFO: task hdparm:2052842 blocked for more than 483 seconds.
May 15 01:36:46 kernel: [302561.316237] INFO: task hdparm:2052842 blocked for more than 604 seconds.
May 15 02:06:58 kernel: [304373.770194] INFO: task smartd:2719 blocked for more than 120 seconds.

From one of the logged events:

May 15 02:06:58 kernel: [304373.770194] INFO: task smartd:2719 blocked for more than 120 seconds.
May 15 02:06:58 kernel: [304373.770209] Tainted: G O 5.15.0-72-generic #79~20.04.1-Ubuntu
May 15 02:06:58 kernel: [304373.770215] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 15 02:06:58 kernel: [304373.770218] task:smartd state:D stack: 0 pid: 2719 ppid: 1 flags:0x
May 15 02:06:58 kernel: [304373.770226] Call Trace:
May 15 02:06:58 kernel: [304373.770230]
May 15 02:06:58 kernel: [304373.770236] __schedule+0x2cd/0x890
May 15 02:06:58 kernel: [304373.770251] schedule+0x69/0x110
May 15 02:06:58 kernel: [304373.770260] schedule_preempt_disabled+0xe/0x20
May 15 02:06:58 kernel: [304373.770269] __mutex_lock.isra.0+0x20c/0x470
May 15 02:06:58 kernel: [304373.770276] ? iput.part.0+0x61/0x1e0
May 15 02:06:58 kernel: [304373.770287] __mutex_lock_slowpath+0x13/0x20
May 15 02:06:58 kernel: [304373.770294] mutex_lock+0x36/0x40
May 15 02:06:58 kernel: [304373.770299] blkdev_get_by_dev+0x11d/0x2d0
May 15 02:06:58 kernel: [304373.770309] ? blkdev_close+0x30/0x30
May 15 02:06:58 kernel: [304373.770318] blkdev_open+0x50/0x90
May 15 02:06:58 kernel: [304373.770325] do_dentry_open+0x169/0x3e0
May 15 02:06:58 kernel: [304373.770336] vfs_open+0x2d/0x40
May 15 02:06:58 kernel: [304373.770342] do_open.isra.0+0x20d/0x480
May 15 02:06:58 kernel: [304373.770351] path_openat+0x18e/0xe50
May 15 02:06:58 kernel: [304373.770361] ? put_device+0x13/0x20
May 15 02:06:58 kernel: [304373.770371] ? scsi_device_put+0x31/0x40
May 15 02:06:58 kernel: [304373.770380] ? sd_release+0x3b/0xb0
May 15 02:06:58 kernel: [304373.770388] do_filp_open+0xb2/0x120
May 15 02:06:58 kernel: [304373.770398] ? __check_object_size+0x14f/0x160
May 15 02:06:58 kernel: [304373.770408] do_sys_openat2+0x249/0x330
May 15 02:06:58 kernel: [304373.770418] do_sys_open+0x46/0x80
May 15 02:06:58 kernel: [304373.770424] __x64_sys_openat+0x20/0x30
May 15 02:06:58 kernel: [304373.770430] do_syscall_64+0x5c/0xc0
May 15 02:06:58 kernel: [304373.770440] ? do_syscall_64+0x69/0xc0
May 15 02:06:58 kernel: [304373.770448] entry_SYSCALL_64_after_hwframe+0x61/0xcb
May 15 02:06:58 kernel: [304373.770458] RIP: 0033:0x7f9b0d188d3b
May 15 02:06:58 kernel: [304373.770465] RSP: 002b:7ffd72a3caf0 EFLAGS: 0246 ORIG_RAX: 0101
May 15 02:06:58 kernel: [304373.770473] RAX: ffda RBX: 55f1346783c0 RCX: 7f9b0d188d3b
May 15 02:06:58 kernel: [304373.770479] RDX: 0800 RSI: 55f1346783f8 RDI: ff9c
May 15 02:06:58 kernel: [304373.770484] RBP: 55f1346783f8 R08: 0001 R09:
May 15 02:06:58 kernel: [304373.770488] R10: R11: 0246 R12: 0800
May 15 02:06:58 kernel: [304373.770493] R13: R14: 55f1334c26a4 R15: 7f9b0cd17250
May 15 02:06:58 kernel: [304373.770500]

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-meta-hwe-5.15 in Ubuntu.
https://bugs.launchpad.net/bugs/2018687

Title:
  rm -r dir on USB disk locks up hdparm on different disk
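The 120-second threshold in these messages is the hung-task watchdog's timeout, which the trace itself names as a sysctl. A quick way to inspect it; guarded, since the file only exists on kernels built with CONFIG_DETECT_HUNG_TASK:

```shell
# Report the hung-task warning threshold. Writing 0 to this file
# disables the warnings entirely, as the kernel message suggests.
f=/proc/sys/kernel/hung_task_timeout_secs
if [ -r "$f" ]; then
    echo "hung_task_timeout_secs = $(cat "$f")"
else
    echo "hung task detection not available"
fi
```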
[Kernel-packages] [Bug 2019240] Re: Pull-request to address a number of enablement issues for Orin platforms
Changing Package to linux-nvidia-tegra.

** Package changed: linux-nvidia (Ubuntu) => linux-nvidia-tegra (Ubuntu)

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-tegra in Ubuntu.
https://bugs.launchpad.net/bugs/2019240

Title:
  Pull-request to address a number of enablement issues for Orin platforms

Status in linux-nvidia-tegra package in Ubuntu: New

Bug description:
  [impact]
  This patch set addresses a wide variety of bugs and missing features for NVIDIA Orin platforms.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-tegra/+bug/2019240/+subscriptions
[Kernel-packages] [Bug 2018687] Re: rm -r dir on USB disk locks up hdparm on different disk
** Description changed:

  Description: Ubuntu 20.04.6 LTS
  Linux 5.15.0-72-generic #79~20.04.1-Ubuntu SMP Thu Apr 20 22:12:07 UTC 2023 x86_64 GNU/Linux
  Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz

  Running a "rm -r dir" on a directory with millions of files that resides on a disk in an external USB-3 hard drive dock locks up an unrelated hdparm processes running on an internal disk such that the kernel says:

- May 7 04:24:02 kernel: [163606.041862] INFO: task hdparm:1391162 blocked for more than 120 seconds.
+ May 7 04:24:02 kernel: [163606.041862] INFO: task hdparm:1391162 blocked for more than 120 seconds.
+ [...]
+ May 7 04:26:03 kernel: [163726.876357] INFO: task hdparm:1391162 blocked for more than 241 seconds.
+ [...]
+ May 7 04:28:04 kernel: [163847.702980] INFO: task hdparm:1391162 blocked for more than 362 seconds.
  [...]
- May 7 04:26:03 kernel: [163726.876357] INFO: task hdparm:1391162 blocked for more than 241 seconds.
- [...]
- May 7 04:28:04 kernel: [163847.702980] INFO: task hdparm:1391162 blocked for more than 362 seconds.
+ May 7 04:30:05 kernel: [163968.537842] INFO: task hdparm:1391162 blocked for more than 483 seconds.

  First a normal run of "hdparm -t /dev/sda" with the offending "rm -r" SIGSTOPped so that it doesn't affect anything:

- # \time hdparm -t /dev/sda
- /dev/sda:
- Timing buffered disk reads: 1128 MB in 3.00 seconds = 375.50 MB/sec
- 0.01user 0.67system 0:06.21elapsed 11%CPU (0avgtext+0avgdata 4584maxresident)k
- 2312704inputs+8outputs (0major+664minor)pagefaults 0swaps
+ # \time hdparm -t /dev/sda
+ /dev/sda:
+ Timing buffered disk reads: 1128 MB in 3.00 seconds = 375.50 MB/sec
+ 0.01user 0.67system 0:06.21elapsed 11%CPU (0avgtext+0avgdata 4584maxresident)k
+ 2312704inputs+8outputs (0major+664minor)pagefaults 0swaps

  Elapsed time is about six seconds, as expected. /dev/sda is an internal SSD drive.

  I now run this loop to show the timings and process states below:

- # while sleep 1 ; do date ; ps laxww | grep '[ ]D' | grep -v refrig
+ # while sleep 1 ; do date ; ps laxww | grep '[ ]D' | grep -v refrig ; done

  (I have some processes stopped in a freezer cgroup ("refrig") that I don't want to see in the grep output.)

  I SIGCONT the offending "rm -r" running on the drive in the USB3 drive dock and you see the rm appear in uninterruptible sleep along with a couple of kernel processes:

- Sun May 7 05:01:07 EDT 2023
- Sun May 7 05:01:08 EDT 2023
- Sun May 7 05:01:09 EDT 2023
- Sun May 7 05:01:10 EDT 2023
- Sun May 7 05:01:11 EDT 2023
- 1 0 447 2 20 0 0 0 usb_sg D ? 3:17 [usb-storage]
- 4 0 1423283 11939 20 0 10648 580 wait_o D+ pts/28 0:00 rm -rf 15tb3
- Sun May 7 05:01:12 EDT 2023
- 1 0 447 2 20 0 0 0 usb_sg D ? 3:17 [usb-storage]
- 1 0 4016 1 20 0 161136 1900 usbhid Dsl ? 1:39 /sbin/apcupsd
- 4 0 1423283 11939 20 0 10648 580 wait_o D+ pts/28 0:00 rm -rf 15tb3
- Sun May 7 05:01:13 EDT 2023
+ Sun May 7 05:01:07 EDT 2023
+ Sun May 7 05:01:08 EDT 2023
+ Sun May 7 05:01:09 EDT 2023
+ Sun May 7 05:01:10 EDT 2023
+ Sun May 7 05:01:11 EDT 2023
+ 1 0 447 2 20 0 0 0 usb_sg D ? 3:17 [usb-storage]
+ 4 0 1423283 11939 20 0 10648 580 wait_o D+ pts/28 0:00 rm -rf 15tb3
+ Sun May 7 05:01:12 EDT 2023
+ 1 0 447 2 20 0 0 0 usb_sg D ? 3:17 [usb-storage]
+ 1 0 4016 1 20 0 161136 1900 usbhid Dsl ? 1:39 /sbin/apcupsd
+ 4 0 1423283 11939 20 0 10648 580 wait_o D+ pts/28 0:00 rm -rf 15tb3
+ Sun May 7 05:01:13 EDT 2023

  The above lines showing those processes in uninterruptible sleep repeat over and over each second as the "rm -r" continues.

  I then start up "hdparm -t /dev/sda" on the internal SSD disk, and it also appears in uninterruptible sleep and doesn't finish even after minutes of waiting:

- Sun May 7 05:01:25 EDT 2023
- 1 0 368 2 20 0 0 0 md_sup D ? 2:57 [md0_raid5]
- 1 0 1366783 2 20 0 0 0 blk_mq D ? 0:02 [kworker/u16:2+flush-8:144]
- 4 0 1423283 11939 20 0 11260 2544 wait_o D+ pts/28 0:00 rm -rf 15tb3
- 4 0 1423501 9975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda
- Sun May 7 05:01:26 EDT 2023
- 1 0 447 2 20 0 0 0 usb_sg D ? 3:17 [usb-storage]
- 1 0 1366783 2 20 0 0 0 blk_mq D ? 0:02 [kworker/u16:2+flush-8:144]
- 4 0 1423283 11939 20 0 11260 2544 wait_o D+ pts/28 0:00 rm -rf
[Kernel-packages] [Bug 2018687] Re: rm -r dir on USB disk locks up hdparm on different disk
** Description changed: Description: Ubuntu 20.04.6 LTS Linux 5.15.0-72-generic #79~20.04.1-Ubuntu SMP Thu Apr 20 22:12:07 UTC 2023 x86_64 GNU/Linux Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz Running a "rm -r dir" on a directory with millions of files that resides on a disk in an external USB-3 hard drive dock locks up an unrelated hdparm process running on an internal disk such that the kernel says: May 7 04:24:02 kernel: [163606.041862] INFO: task hdparm:1391162 blocked for more than 120 seconds. [...] May 7 04:26:03 kernel: [163726.876357] INFO: task hdparm:1391162 blocked for more than 241 seconds. [...] May 7 04:28:04 kernel: [163847.702980] INFO: task hdparm:1391162 blocked for more than 362 seconds. - [...] - May 7 04:30:05 kernel: [163968.537842] INFO: task hdparm:1391162 blocked for more than 483 seconds. First a normal run of "hdparm -t /dev/sda" with the offending "rm -r" SIGSTOPped so that it doesn't affect anything: # \time hdparm -t /dev/sda /dev/sda: Timing buffered disk reads: 1128 MB in 3.00 seconds = 375.50 MB/sec 0.01user 0.67system 0:06.21elapsed 11%CPU (0avgtext+0avgdata 4584maxresident)k 2312704inputs+8outputs (0major+664minor)pagefaults 0swaps Elapsed time is about six seconds, as expected. /dev/sda is an internal SSD drive. I now run this loop to show the timings and process states below: # while sleep 1 ; do date ; ps laxww | grep '[ ]D' | grep -v refrig ; done (I have some processes stopped in a freezer cgroup ("refrig") that I don't want to see in the grep output.) I SIGCONT the offending "rm -r" running on the drive in the USB3 drive dock and you see the rm appear in uninterruptible sleep along with a couple of kernel processes: Sun May 7 05:01:07 EDT 2023 Sun May 7 05:01:08 EDT 2023 Sun May 7 05:01:09 EDT 2023 Sun May 7 05:01:10 EDT 2023 Sun May 7 05:01:11 EDT 2023 1 0 447 2 20 0 0 0 usb_sg D? 3:17 [usb-storage] 4 0 1423283 11939 20 0 10648 580 wait_o D+ pts/28 0:00 rm -rf 15tb3 Sun May 7 05:01:12 EDT 2023 1 0 447 2 20 0 0 0 usb_sg D? 
3:17 [usb-storage] 1 0 4016 1 20 0 161136 1900 usbhid Dsl ? 1:39 /sbin/apcupsd 4 0 1423283 11939 20 0 10648 580 wait_o D+ pts/28 0:00 rm -rf 15tb3 Sun May 7 05:01:13 EDT 2023 The above lines showing those processes in uninterruptible sleep repeat over and over each second as the "rm -r" continues. I then start up "hdparm -t /dev/sda" on the internal SSD disk, and it also appears in uninterruptible sleep and doesn't finish even after minutes of waiting: Sun May 7 05:01:25 EDT 2023 1 0 368 2 20 0 0 0 md_sup D? 2:57 [md0_raid5] 1 0 1366783 2 20 0 0 0 blk_mq D? 0:02 [kworker/u16:2+flush-8:144] 4 0 1423283 11939 20 0 11260 2544 wait_o D+ pts/28 0:00 rm -rf 15tb3 4 0 14235019975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda Sun May 7 05:01:26 EDT 2023 1 0 447 2 20 0 0 0 usb_sg D? 3:17 [usb-storage] 1 0 1366783 2 20 0 0 0 blk_mq D? 0:02 [kworker/u16:2+flush-8:144] 4 0 1423283 11939 20 0 11260 2544 wait_o D+ pts/28 0:00 rm -rf 15tb3 4 0 14235019975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda Sun May 7 05:01:27 EDT 2023 [...] 4 0 14235019975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda Sun May 7 05:01:35 EDT 2023 1 0 447 2 20 0 0 0 usb_sg D? 3:17 [usb-storage] 4 0 1423283 11939 20 0 11260 2544 blk_mq D+ pts/28 0:00 rm -rf 15tb3 4 0 14235019975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda Sun May 7 05:01:36 EDT 2023 1 0 447 2 20 0 0 0 usb_sg D? 3:17 [usb-storage] 1 0 4985 2 20 0 0 0 rq_qos D? 0:24 [jbd2/sdj1-8] 1 0 1366783 2 20 0 0 0 blk_mq D? 0:02 [kworker/u16:2+flush-8:144] 4 0 1423283 11939 20 0 11260 2544 wait_o D+ pts/28 0:00 rm -rf 15tb3 4 0 14235019975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda Sun May 7 05:01:37 EDT 2023 I keep waiting. The above lines repeat over and over and the hdparm is blocked and doesn't finish. Sun May 7 05:03:32 EDT 2023 1 0 447 2 20 0 0 0 usb_sg D? 3:18 [usb-storage] 1 0 1366783 2 20
[Kernel-packages] [Bug 2018687] [NEW] rm -r dir on USB disk locks up hdparm on different disk
Public bug reported: Description: Ubuntu 20.04.6 LTS Linux 5.15.0-72-generic #79~20.04.1-Ubuntu SMP Thu Apr 20 22:12:07 UTC 2023 x86_64 GNU/Linux Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz Running a "rm -r dir" on a directory with millions of files that resides on a disk in an external USB-3 hard drive dock locks up an unrelated hdparm process running on an internal disk such that the kernel says: May 7 04:24:02 kernel: [163606.041862] INFO: task hdparm:1391162 blocked for more than 120 seconds. [...] May 7 04:26:03 kernel: [163726.876357] INFO: task hdparm:1391162 blocked for more than 241 seconds. [...] May 7 04:28:04 kernel: [163847.702980] INFO: task hdparm:1391162 blocked for more than 362 seconds. First a normal run of "hdparm -t /dev/sda" with the offending "rm -r" SIGSTOPped so that it doesn't affect anything: # \time hdparm -t /dev/sda /dev/sda: Timing buffered disk reads: 1128 MB in 3.00 seconds = 375.50 MB/sec 0.01user 0.67system 0:06.21elapsed 11%CPU (0avgtext+0avgdata 4584maxresident)k 2312704inputs+8outputs (0major+664minor)pagefaults 0swaps Elapsed time is about six seconds, as expected. /dev/sda is an internal SSD drive. I now run this loop to show the timings and process states below: # while sleep 1 ; do date ; ps laxww | grep '[ ]D' | grep -v refrig ; done (I have some processes stopped in a freezer cgroup ("refrig") that I don't want to see in the grep output.) I SIGCONT the offending "rm -r" running on the drive in the USB3 drive dock and you see the rm appear in uninterruptible sleep along with a couple of kernel processes: Sun May 7 05:01:07 EDT 2023 Sun May 7 05:01:08 EDT 2023 Sun May 7 05:01:09 EDT 2023 Sun May 7 05:01:10 EDT 2023 Sun May 7 05:01:11 EDT 2023 1 0 447 2 20 0 0 0 usb_sg D? 3:17 [usb-storage] 4 0 1423283 11939 20 0 10648 580 wait_o D+ pts/28 0:00 rm -rf 15tb3 Sun May 7 05:01:12 EDT 2023 1 0 447 2 20 0 0 0 usb_sg D? 3:17 [usb-storage] 1 0 4016 1 20 0 161136 1900 usbhid Dsl ? 
1:39 /sbin/apcupsd 4 0 1423283 11939 20 0 10648 580 wait_o D+ pts/28 0:00 rm -rf 15tb3 Sun May 7 05:01:13 EDT 2023 The above lines showing those processes in uninterruptible sleep repeat over and over each second as the "rm -r" continues. I then start up "hdparm -t /dev/sda" on the internal SSD disk, and it also appears in uninterruptible sleep and doesn't finish even after minutes of waiting: Sun May 7 05:01:25 EDT 2023 1 0 368 2 20 0 0 0 md_sup D? 2:57 [md0_raid5] 1 0 1366783 2 20 0 0 0 blk_mq D? 0:02 [kworker/u16:2+flush-8:144] 4 0 1423283 11939 20 0 11260 2544 wait_o D+ pts/28 0:00 rm -rf 15tb3 4 0 14235019975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda Sun May 7 05:01:26 EDT 2023 1 0 447 2 20 0 0 0 usb_sg D? 3:17 [usb-storage] 1 0 1366783 2 20 0 0 0 blk_mq D? 0:02 [kworker/u16:2+flush-8:144] 4 0 1423283 11939 20 0 11260 2544 wait_o D+ pts/28 0:00 rm -rf 15tb3 4 0 14235019975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda Sun May 7 05:01:27 EDT 2023 [...] 4 0 14235019975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda Sun May 7 05:01:35 EDT 2023 1 0 447 2 20 0 0 0 usb_sg D? 3:17 [usb-storage] 4 0 1423283 11939 20 0 11260 2544 blk_mq D+ pts/28 0:00 rm -rf 15tb3 4 0 14235019975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda Sun May 7 05:01:36 EDT 2023 1 0 447 2 20 0 0 0 usb_sg D? 3:17 [usb-storage] 1 0 4985 2 20 0 0 0 rq_qos D? 0:24 [jbd2/sdj1-8] 1 0 1366783 2 20 0 0 0 blk_mq D? 0:02 [kworker/u16:2+flush-8:144] 4 0 1423283 11939 20 0 11260 2544 wait_o D+ pts/28 0:00 rm -rf 15tb3 4 0 14235019975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda Sun May 7 05:01:37 EDT 2023 I keep waiting. The above lines repeat over and over and the hdparm is blocked and doesn't finish. Sun May 7 05:03:32 EDT 2023 1 0 447 2 20 0 0 0 usb_sg D? 3:18 [usb-storage] 1 0 1366783 2 20 0 0 0 blk_mq D? 
0:02 [kworker/u16:2+flush-8:144] 4 0 1423283 11939 20 0 11260 2544 wait_o D+ pts/28 0:03 rm -rf 15tb3 4 0 14235019975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda Sun May 7 05:03:34 EDT
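The `ps laxww | grep '[ ]D'` loop used in the report can be mirrored with a short script when post-processing saved `ps` output. This is a minimal sketch; `d_state_tasks` is a hypothetical helper and the sample lines are illustrative, modelled on (not copied from) the affected machine:

```python
# Filter saved `ps laxww` output down to tasks in uninterruptible sleep
# (STAT field beginning with 'D'), as the grep pipeline in the report does.

def d_state_tasks(ps_lines):
    """Return the ps lines whose STAT field (10th column) begins with 'D'."""
    hits = []
    for line in ps_lines:
        fields = line.split()
        # ps laxww columns: F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME CMD
        if len(fields) >= 10 and fields[9].startswith("D"):
            hits.append(line)
    return hits

sample = [
    "4 0 1423283 11939 20 0 10648 580 wait_o D+ pts/28 0:00 rm -rf 15tb3",
    "0 1000 2345 2300 20 0 9000 1200 - S+ pts/1 0:00 bash",
]
print(d_state_tasks(sample))
```

Matching on the first character of STAT also catches multi-flag states such as `DL+` and `Dsl` seen above.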
[Kernel-packages] [Bug 1959215] Re: unshare test in ubuntu_stress_smoke_tests triggers "BUG: unable to handle page fault for address: 0000000000001cc8" on Impish with node vought
** Changed in: stress-ng Status: New => Won't Fix ** Changed in: stress-ng Status: Won't Fix => Invalid -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1959215 Title: unshare test in ubuntu_stress_smoke_tests triggers "BUG: unable to handle page fault for address: 0000000000001cc8" on Impish with node vought Status in Stress-ng: Invalid Status in ubuntu-kernel-tests: New Status in linux package in Ubuntu: Confirmed Status in linux source package in Impish: Won't Fix Bug description: Issue found on Intel node "vought" with: * 5.13.0-28.31 * 5.13.0-27 * And possibly the 5.13.0-23 from the last cycle (this test didn't finish properly and was marked "Incomplete" back then, just like this cycle). This system was not tested with this test on earlier Impish kernels. The test will hang at the unshare test in ubuntu_stress_smoke_tests: 12:39:39 DEBUG| [stdout] udp RETURNED 0 12:39:39 DEBUG| [stdout] udp PASSED 12:39:39 DEBUG| [stdout] udp-flood STARTING 12:39:41 DEBUG| [stdout] udp-flood RETURNED 0 12:39:41 DEBUG| [stdout] udp-flood PASSED 12:39:41 DEBUG| [stdout] unshare STARTING (Test hangs here) And eventually the test will be killed because of the timeout setting. 
stress-ng Test suite HEAD SHA1: b81116c Error can be found in dmesg: [ 2371.109961] BUG: unable to handle page fault for address: 1cc8 [ 2371.110074] #PF: supervisor read access in kernel mode [ 2371.114323] #PF: error_code(0x) - not-present page [ 2371.119931] PGD 0 P4D 0 [ 2371.125257] Oops: [#1] SMP NOPTI [ 2371.129247] CPU: 51 PID: 207256 Comm: stress-ng Tainted: P O 5.13.0-27-generic #29-Ubuntu [ 2371.133203] Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.0D.01.0395.022720191340 02/27/2019 [ 2371.135887] RIP: 0010:__next_zones_zonelist+0x6/0x50 [ 2371.138525] Code: d0 0f 4e d0 3d ff 03 00 00 7f 0d 48 63 d2 5d 48 8b 04 d5 60 e5 35 af c3 31 c0 5d c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 <8b> 4f 08 48 89 f8 48 89 e5 48 85 d2 75 10 eb 1d 48 63 49 50 48 0f [ 2371.143813] RSP: 0018:a9c8b399fac0 EFLAGS: 00010282 [ 2371.146078] RAX: RBX: RCX: [ 2371.148293] RDX: 9c98e894ea98 RSI: 0002 RDI: 1cc0 [ 2371.150477] RBP: a9c8b399fb28 R08: R09: [ 2371.152650] R10: 0002 R11: d9bfbfcc5600 R12: 00052cc0 [ 2371.154778] R13: 0002 R14: 0001 R15: 00152cc0 [ 2371.156876] FS: 7fcbd141d740() GS:9cc14ccc() knlGS: [ 2371.158936] CS: 0010 DS: ES: CR0: 80050033 [ 2371.160958] CR2: 1cc8 CR3: 00059f292001 CR4: 007706e0 [ 2371.162950] DR0: DR1: DR2: [ 2371.164888] DR3: DR6: fffe0ff0 DR7: 0400 [ 2371.166811] PKRU: 5554 [ 2371.168694] Call Trace: [ 2371.170544] ? __alloc_pages+0x2f1/0x330 [ 2371.172386] kmalloc_large_node+0x45/0xb0 [ 2371.174222] __kmalloc_node+0x276/0x300 [ 2371.176036] ? queue_delayed_work_on+0x39/0x60 [ 2371.177853] kvmalloc_node+0x5a/0x90 [ 2371.179622] expand_one_shrinker_info+0x82/0x190 [ 2371.181382] prealloc_shrinker+0x175/0x1d0 [ 2371.183091] alloc_super+0x2bf/0x330 [ 2371.184764] ? __fput_sync+0x30/0x30 [ 2371.186384] sget_fc+0x74/0x2e0 [ 2371.187951] ? set_anon_super+0x50/0x50 [ 2371.189473] ? 
mqueue_create+0x20/0x20 [ 2371.190944] get_tree_keyed+0x34/0xd0 [ 2371.192363] mqueue_get_tree+0x1c/0x20 [ 2371.193734] vfs_get_tree+0x2a/0xc0 [ 2371.195105] fc_mount+0x13/0x50 [ 2371.196409] mq_init_ns+0x10a/0x1b0 [ 2371.197667] copy_ipcs+0x130/0x220 [ 2371.198899] create_new_namespaces+0xa6/0x2e0 [ 2371.200113] unshare_nsproxy_namespaces+0x5a/0xb0 [ 2371.201303] ksys_unshare+0x1db/0x3c0 [ 2371.202480] __x64_sys_unshare+0x12/0x20 [ 2371.203649] do_syscall_64+0x61/0xb0 [ 2371.204804] ? exit_to_user_mode_loop+0xec/0x160 [ 2371.205966] ? exit_to_user_mode_prepare+0x37/0xb0 [ 2371.207102] ? syscall_exit_to_user_mode+0x27/0x50 [ 2371.208222] ? __x64_sys_close+0x11/0x40 [ 2371.209336] ? do_syscall_64+0x6e/0xb0 [ 2371.210438] ? asm_exc_page_fault+0x8/0x30 [ 2371.211545] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 2371.212641] RIP: 0033:0x7fcbd1562c4b [ 2371.213698] Code: 73 01 c3 48 8b 0d e5 e1 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b5 e1 0e 00 f7 d8 64 89 01 48 [ 2371.215851] RSP: 002b:7ffc5d8eb878 EFLAGS: 0246 ORIG_RAX: 0110 [ 2371.216846] RAX: ffda RBX: 7ffc5d8eba20 RCX:
[Kernel-packages] [Bug 1961076] Re: linux-hwe-5.4 ADT test failure (ubuntu_stress_smoke_test) with linux-hwe-5.4/5.4.0-100.113~18.04.1
** Changed in: stress-ng Status: New => Fix Released ** Changed in: stress-ng Assignee: (unassigned) => Colin Ian King (colin-king) ** Changed in: stress-ng Importance: Undecided => Medium -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-hwe-5.4 in Ubuntu. https://bugs.launchpad.net/bugs/1961076 Title: linux-hwe-5.4 ADT test failure (ubuntu_stress_smoke_test) with linux- hwe-5.4/5.4.0-100.113~18.04.1 Status in Stress-ng: Fix Released Status in ubuntu-kernel-tests: New Status in linux-hwe-5.4 package in Ubuntu: New Status in linux-hwe-5.4 source package in Bionic: New Bug description: The 'dev-shm' stress-ng test is failing with bionic/linux-hwe-5.4 5.4.0-100.113~18.04.1 on ADT, only on ppc64el. Testing failed on: ppc64el: https://autopkgtest.ubuntu.com/results/autopkgtest-bionic/bionic/ppc64el/l/linux-hwe-5.4/20220216_115416_c1d6c@/log.gz 11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] stress-ng 0.13.11 g48be8ff4ffc4 11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] system: Linux autopkgtest 5.4.0-100-generic #113~18.04.1-Ubuntu SMP Mon Feb 7 15:02:55 UTC 2022 ppc64le 11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] RAM total: 7.9G, RAM free: 3.3G, swap free: 1023.9M 11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] 4 processors online, 4 processors configured 11:35:08 DEBUG| [stdout] stress-ng: info: [26897] setting to a 5 second run per stressor 11:35:08 DEBUG| [stdout] stress-ng: info: [26897] dispatching hogs: 4 dev-shm 11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] cache allocate: using cache maximum level L1 11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] cache allocate: shared cache buffer size: 32K 11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] starting stressors 11:35:08 DEBUG| [stdout] stress-ng: debug: [26899] stress-ng-dev-shm: started [26899] (instance 0) 11:35:08 DEBUG| [stdout] stress-ng: debug: [26900] stress-ng-dev-shm: started [26900] (instance 1) 11:35:08 DEBUG| 
[stdout] stress-ng: debug: [26901] stress-ng-dev-shm: started [26901] (instance 2) 11:35:08 DEBUG| [stdout] stress-ng: debug: [26902] stress-ng-dev-shm: started [26902] (instance 3) 11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] 4 stressors started 11:35:08 DEBUG| [stdout] stress-ng: debug: [26899] stress-ng-dev-shm: assuming killed by OOM killer, restarting again (instance 0) 11:35:08 DEBUG| [stdout] stress-ng: debug: [26902] stress-ng-dev-shm: assuming killed by OOM killer, restarting again (instance 3) 11:35:08 DEBUG| [stdout] stress-ng: debug: [26901] stress-ng-dev-shm: assuming killed by OOM killer, restarting again (instance 2) 11:35:08 DEBUG| [stdout] stress-ng: debug: [26900] stress-ng-dev-shm: assuming killed by OOM killer, restarting again (instance 1) 11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26899] (stress-ng-dev-shm) terminated on signal: 9 (Killed) 11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26899] (stress-ng-dev-shm) was killed by the OOM killer 11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26899] terminated 11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26900] (stress-ng-dev-shm) terminated on signal: 9 (Killed) 11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26900] (stress-ng-dev-shm) was possibly killed by the OOM killer 11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26900] terminated 11:35:08 DEBUG| [stdout] stress-ng: debug: [26901] stress-ng-dev-shm: exited [26901] (instance 2) 11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26901] terminated 11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26902] (stress-ng-dev-shm) terminated on signal: 9 (Killed) 11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26902] (stress-ng-dev-shm) was killed by the OOM killer 11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26902] terminated 11:35:08 DEBUG| [stdout] stress-ng: info: [26897] successful run completed in 5.06s 11:35:08 
DEBUG| [stdout] stress-ng: fail: [26897] dev_shm instance 0 corrupted bogo-ops counter, 14 vs 0 11:35:08 DEBUG| [stdout] stress-ng: fail: [26897] dev_shm instance 0 hash error in bogo-ops counter and run flag, 2146579844 vs 0 11:35:08 DEBUG| [stdout] stress-ng: fail: [26897] dev_shm instance 1 corrupted bogo-ops counter, 13 vs 0 11:35:08 DEBUG| [stdout] stress-ng: fail: [26897] dev_shm instance 1 hash error in bogo-ops counter and run flag, 1093487894 vs 0 11:35:08 DEBUG| [stdout] stress-ng: fail: [26897] dev_shm instance 3 corrupted bogo-ops counter, 13 vs 0 11:35:08 DEBUG| [stdout] info: 5 failures reached, aborting stress process 11:35:08 DEBUG| [stdout] stress-ng: fail: [26897] dev_shm instance 3 hash error in bogo-ops counter and run flag, 1093487894 vs 0 11:35:08 DEBUG| [stdout] stress-ng: fail: [26897] metr
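The "corrupted bogo-ops counter" and "hash error in bogo-ops counter and run flag" failures above come from stress-ng cross-checking each shared counter against a stored checksum, so a clobbered record (here, after the dev-shm instances were OOM-killed mid-run) is detected rather than silently miscounted. A generic sketch of that idea, not stress-ng's actual data layout:

```python
import zlib

# Sketch of a corruption-detectable counter: store the value together with a
# checksum over (counter, run_flag); a mismatch on read means the record was
# clobbered after it was sealed.

def seal(counter, run_flag):
    """Package a counter and run flag with a CRC over both."""
    return (counter, run_flag, zlib.crc32(f"{counter}:{run_flag}".encode()))

def verify(record):
    """Return True if the stored CRC still matches the counter and flag."""
    counter, run_flag, crc = record
    return crc == zlib.crc32(f"{counter}:{run_flag}".encode())

rec = seal(14, True)
assert verify(rec)
clobbered = (0, rec[1], rec[2])   # counter zeroed, like the "14 vs 0" report
assert not verify(clobbered)
```

The failure messages then simply print both sides of the mismatch, e.g. the sealed value versus the value found in shared memory.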
[Kernel-packages] [Bug 1999731] Re: disk stress test failing with code 7
** Changed in: stress-ng Status: In Progress => Fix Released ** Changed in: stress-ng Assignee: (unassigned) => Colin Ian King (colin-king) ** Changed in: stress-ng Importance: Undecided => Medium -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1999731 Title: disk stress test failing with code 7 Status in Stress-ng: Fix Released Status in linux package in Ubuntu: Invalid Status in stress-ng package in Ubuntu: Fix Released Bug description: Since mid-November we have seen lots of disk stress tests failing with multiple Ubuntu kernels, e.g. bionic-hwe, focal, focal-hwe. Most of them are with the lockofd stressor, and the system is still alive after the stress test. 05 Nov 08:51: Running stress-ng lockofd stressor for 240 seconds... ** stress-ng exited with code 7 To manage notifications about this bug go to: https://bugs.launchpad.net/stress-ng/+bug/1999731/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1968361] Re: rawsock test BUG: soft lockup
** Changed in: stress-ng Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1968361 Title: rawsock test BUG: soft lockup Status in Linux: Fix Released Status in Stress-ng: Fix Released Status in linux package in Ubuntu: Invalid Bug description: When running the rawsock stressor on large systems with 32 CPUs and above, I always hit a soft lockup in the kernel, and sometimes it will lock up the system when running for a long time. This issue occurs on all major OSes that I tested: Ubuntu 20.04, RHEL 7/8, SUSE 15 my system: stress-ng V0.13.03-5-g9093bce7 #lscpu | grep CPU CPU(s): 64 On-line CPU(s) list: 0-63 NUMA node0 CPU(s): 0-63 # ./stress-ng --rawsock 20 -t 5 stress-ng: info: [49748] setting to a 5 second run per stressor stress-ng: info: [49748] dispatching hogs: 20 rawsock Message from syslogd@rain65 at Apr 8 12:18:26 ... kernel:watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [stress-ng:49781] If I run with --timeout 60 secs, it will lock up the system. The issue is lock starvation in the kernel: - when the stressor creates an instance, forking a new child/client and parent/server process and recreating sockets for these processes, the kernel acquires the Write lock to add them to the raw sock hash table. - the client process immediately starts sending data in a do while {} loop. The kernel acquires the Read Lock to access the raw sock hash table and clone the data packets for all raw socket processes. - The main stress-ng process may still continue to create the rest of the instances. The kernel may hit lock starvation (as the error above shows) - similarly, when the timeout expires, the parents try to close their sockets, for which the kernel also tries to acquire the Write Lock before sending SIGKILL to their child processes. We may hit lock starvation, since clients have not closed their sockets and continue sending data. 
I'm not sure this is intended, but to avoid the kernel lock starvation in raw socket, I propose the simple patch attached. I have tested it on a large system with 128 CPUs without hitting any BUG: soft lockup. Thanks, Thinh Tran To manage notifications about this bug go to: https://bugs.launchpad.net/linux/+bug/1968361/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
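The mitigation direction that was eventually adopted on the stress-ng side (check for exhausted send buffers and back off instead of hammering the raw-socket path) can be sketched generically. `send_with_backoff` below is a hypothetical helper for illustration, not the submitted kernel patch or stress-ng's actual code:

```python
import errno
import time

# Sketch: retry a send when the kernel reports ENOBUFS (send buffers
# exhausted), sleeping with an increasing delay between attempts so the
# sender stops contending for the socket path under pressure.
# `sock` is anything exposing a send() method.

def send_with_backoff(sock, payload, max_tries=6, first_delay=0.001):
    delay = first_delay
    for _ in range(max_tries):
        try:
            return sock.send(payload)
        except OSError as e:
            if e.errno != errno.ENOBUFS:
                raise                    # only ENOBUFS is worth retrying
            time.sleep(delay)
            delay = min(delay * 2, 0.05) # exponential backoff, capped
    return None                          # gave up; caller may drop the packet
```

A stub transport with a `send()` method can exercise the retry path without raw-socket privileges, which is also the easiest way to unit-test this pattern.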
[Kernel-packages] [Bug 2017529] Re: livecd-rootfs: kernel bump
It's in a server room and it's hard to get logs out - I also have pressure to make it usable. IMHO, just bump the kernel on the livecd. ** Changed in: linux (Ubuntu) Status: Incomplete => Confirmed -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/2017529 Title: livecd-rootfs: kernel bump Status in linux package in Ubuntu: Confirmed Bug description: Booting on a larger HP EPYC server results in no PCIe devices available. The kernel thinks that memory regions overlap and eventually reports that no address space is available for other devices To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2017529/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 2012260] Re: Add support for Adler Lake N
** Changed in: thermald (Ubuntu) Importance: Undecided => Medium -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to thermald in Ubuntu. https://bugs.launchpad.net/bugs/2012260 Title: Add support for Adler Lake N Status in thermald package in Ubuntu: In Progress Bug description: [Impact] * Support thermald on Adler Lake N CPU. [Test Plan] * Use a machine with a Adler Lake N cpu. * systemctl status thermald * Status of thermald should be `running` [Where problems could occur] * This change is to add support for Adler Lake N in thermald, which won't impact other hardware. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/thermald/+bug/2012260/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 2009676] Re: Add support for Raptor Lake S CPUs
@koba, can you test the focal version too? -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to thermald in Ubuntu. https://bugs.launchpad.net/bugs/2009676 Title: Add support for Raptor Lake S CPUs Status in OEM Priority Project: New Status in thermald package in Ubuntu: Fix Released Status in thermald source package in Jammy: Fix Committed Bug description: [Impact] * Support thermald on Raptor Lake S CPU. [Test Plan] * Use a machine with a Raptor Lake S cpu. * systemctl status thermald * Status of thermald should be `running` [Where problems could occur] * This change is to add support for Raptor Lake S in thermald, which won't impact other hardware. [Other Info] https://github.com/intel/thermal_daemon/commit/e03493dc1e972374c1686492655250f8f48a15ba To manage notifications about this bug go to: https://bugs.launchpad.net/oem-priority/+bug/2009676/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 2009676] Re: Add support for Raptor Lake S CPUs
I believe it's useful to have RPL focal support in thermald if users use newer HWE kernels. Plus the change basically adds some CPU IDs, so it's a small delta for potentially useful support. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to thermald in Ubuntu. https://bugs.launchpad.net/bugs/2009676 Title: Add support for Raptor Lake S CPUs Status in OEM Priority Project: New Status in thermald package in Ubuntu: Fix Released Status in thermald source package in Jammy: Fix Committed Bug description: [Impact] * Support thermald on Raptor Lake S CPU. [Test Plan] * Use a machine with a Raptor Lake S cpu. * systemctl status thermald * Status of thermald should be `running` [Where problems could occur] * This change is to add support for Raptor Lake S in thermald, which won't impact other hardware. [Other Info] https://github.com/intel/thermal_daemon/commit/e03493dc1e972374c1686492655250f8f48a15ba To manage notifications about this bug go to: https://bugs.launchpad.net/oem-priority/+bug/2009676/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 2009676] Re: Add support for Raptor Lake S CPUs
I've applied the patch for Jammy and uploaded a new version ready for SRU. thermald (2.4.9-1ubuntu0.2) jammy; urgency=medium * Add support for Raptor Lake S CPUs. (LP: #2009676) Date: Wed, 8 Mar 2023 11:28:31 + Changed-By: Colin Ian King Maintainer: Ubuntu Developers https://launchpad.net/ubuntu/+source/thermald/2.4.9-1ubuntu0.2 == OK: thermald_2.4.9.orig.tar.xz OK: thermald_2.4.9-1ubuntu0.2.debian.tar.xz OK: thermald_2.4.9-1ubuntu0.2.dsc -> Component: main Section: misc Upload Warnings: Redirecting ubuntu jammy to ubuntu jammy-proposed. This upload awaits approval by a distro manager Announcing to jammy-chan...@lists.ubuntu.com Thank you for your contribution to Ubuntu. ** Changed in: thermald (Ubuntu) Milestone: None => jammy-updates -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to thermald in Ubuntu. https://bugs.launchpad.net/bugs/2009676 Title: Add support for Raptor Lake S CPUs Status in OEM Priority Project: New Status in thermald package in Ubuntu: Fix Released Status in thermald source package in Jammy: New Bug description: [Impact] * Support thermald on Raptor Lake S CPU. [Test Plan] * Use a machine with a Raptor Lake S cpu. * systemctl status thermald * Status of thermald should be `running` [Where problems could occur] * This change is to add support for Raptor Lake S in thermald, which won't impact other hardware. [Other Info] https://github.com/intel/thermal_daemon/commit/e03493dc1e972374c1686492655250f8f48a15ba To manage notifications about this bug go to: https://bugs.launchpad.net/oem-priority/+bug/2009676/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1968361] Re: rawsock test BUG: soft lockup
Also there is a re-forking delay added to allow instances to fire up and back off if resources get low. These changes have been tested with 256, 1024, 4096 and 8192 instances on a 24-thread system with 32GB of memory. ** Changed in: linux (Ubuntu) Status: New => Invalid ** Changed in: linux (Ubuntu) Importance: High => Low -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1968361 Title: rawsock test BUG: soft lockup Status in Linux: Fix Released Status in Stress-ng: Fix Committed Status in linux package in Ubuntu: Invalid Bug description: When running the rawsock stressor on large systems with 32 CPUs and above, I always hit a soft lockup in the kernel, and sometimes it will lock up the system when running for a long time. This issue occurs on all major OSes that I tested: Ubuntu 20.04, RHEL 7/8, SUSE 15 my system: stress-ng V0.13.03-5-g9093bce7 #lscpu | grep CPU CPU(s): 64 On-line CPU(s) list: 0-63 NUMA node0 CPU(s): 0-63 # ./stress-ng --rawsock 20 -t 5 stress-ng: info: [49748] setting to a 5 second run per stressor stress-ng: info: [49748] dispatching hogs: 20 rawsock Message from syslogd@rain65 at Apr 8 12:18:26 ... kernel:watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [stress-ng:49781] If I run with --timeout 60 secs, it will lock up the system. The issue is lock starvation in the kernel: - when the stressor creates an instance, forking a new child/client and parent/server process and recreating sockets for these processes, the kernel acquires the Write lock to add them to the raw sock hash table. - the client process immediately starts sending data in a do while {} loop. The kernel acquires the Read Lock to access the raw sock hash table and clone the data packets for all raw socket processes. - The main stress-ng process may still continue to create the rest of the instances. 
The kernel may hit lock starvation (as the error above shows) - similarly, when the timeout expires, the parents try to close their sockets, for which the kernel also tries to acquire the Write Lock before sending SIGKILL to their child processes. We may hit lock starvation, since clients have not closed their sockets and continue sending data. I'm not sure this is intended, but to avoid the kernel lock starvation in raw socket, I propose the simple patch attached. I have tested it on a large system with 128 CPUs without hitting any BUG: soft lockup. Thanks, Thinh Tran To manage notifications about this bug go to: https://bugs.launchpad.net/linux/+bug/1968361/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1968361] Re: rawsock test BUG: soft lockup
This fix will land in the next release of stress-ng at the end of March 2023 ** Changed in: stress-ng Status: New => Fix Committed -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1968361 Title: rawsock test BUG: soft lockup Status in Linux: Fix Released Status in Stress-ng: Fix Committed Status in linux package in Ubuntu: Invalid Bug description: When running the rawsock stressor on large systems with 32 CPUs and above, I always hit a soft lockup in the kernel, and sometimes it will lock up the system when running for a long time. This issue occurs on all major OSes that I tested: Ubuntu 20.04, RHEL 7/8, SUSE 15 my system: stress-ng V0.13.03-5-g9093bce7 #lscpu | grep CPU CPU(s): 64 On-line CPU(s) list: 0-63 NUMA node0 CPU(s): 0-63 # ./stress-ng --rawsock 20 -t 5 stress-ng: info: [49748] setting to a 5 second run per stressor stress-ng: info: [49748] dispatching hogs: 20 rawsock Message from syslogd@rain65 at Apr 8 12:18:26 ... kernel:watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [stress-ng:49781] If I run with --timeout 60 secs, it will lock up the system. The issue is lock starvation in the kernel: - when the stressor creates an instance, forking a new child/client and parent/server process and recreating sockets for these processes, the kernel acquires the Write lock to add them to the raw sock hash table. - the client process immediately starts sending data in a do while {} loop. The kernel acquires the Read Lock to access the raw sock hash table and clone the data packets for all raw socket processes. - The main stress-ng process may still continue to create the rest of the instances. The kernel may hit lock starvation (as the error above shows) - similarly, when the timeout expires, the parents try to close their sockets, for which the kernel also tries to acquire the Write Lock before sending SIGKILL to their child processes. 
We may hit the lock starvation, since clients have not closed their sockets and continue sending data. I'm not sure this is intended, but to avoid the kernel lock starvation in raw socket, I propose the simple patch attached. I has tested it a large system with 128 CPUs without hitting any BUG: soft lock up. Thanks, Thinh Tran To manage notifications about this bug go to: https://bugs.launchpad.net/linux/+bug/1968361/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
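The starvation pattern described in the report (a steady stream of readers on the raw socket hash table holding off writers that want to add or remove sockets) can be sketched with a toy reader-preference read/write lock. This is a minimal Python illustration of the contention shape only, not the kernel's actual locking code; the class and method names are invented for the example.

```python
import threading

class ReaderPrefRWLock:
    """Toy reader-preference RW lock: as long as readers (the clients
    cloning packets) keep arriving, a writer (a new instance being
    registered, or a socket being closed) may wait indefinitely --
    the starvation reported in this bug."""

    def __init__(self):
        self._readers = 0
        self._lock = threading.Lock()         # protects _readers
        self._no_readers = threading.Event()  # set when _readers == 0
        self._no_readers.set()

    def acquire_read(self):
        with self._lock:
            self._readers += 1
            self._no_readers.clear()

    def release_read(self):
        with self._lock:
            self._readers -= 1
            if self._readers == 0:
                self._no_readers.set()

    def acquire_write(self, timeout=None):
        # A writer must wait until no readers hold the lock; returns
        # False if the wait times out while readers are still active.
        return self._no_readers.wait(timeout)
```

With two readers held, a writer's timed wait fails; once the readers release, it succeeds immediately, which is the asymmetry that lets a continuous reader stream starve writers.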
[Kernel-packages] [Bug 1968361] Re: rawsock test BUG: soft lockup
Added an ENOBUFS check on the sender, with priority dropping on ENOBUFS
errors and a timed backoff delay. Added OOM-kill respawning, which can be
overridden using the --oomable option, allowing overcommitted systems
either to respawn OOM'd rawsock instances (the default) or not respawn
them (--oomable).

Fix committed upstream:
https://github.com/ColinIanKing/stress-ng/commit/e4d3b90267243d7505399e7059950097d9bd50ae

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1968361

Title:
  rawsock test BUG: soft lockup

Status in Linux: Fix Released
Status in Stress-ng: Fix Committed
Status in linux package in Ubuntu: Invalid
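The idea behind the committed fix (back off the sender when the kernel reports ENOBUFS instead of hammering the raw socket path) can be sketched as follows. This is a simplified Python illustration under assumed names (`send_fn`, the retry and delay parameters are invented), not stress-ng's actual C implementation; see the upstream commit for the real code.

```python
import errno
import time

def send_with_backoff(send_fn, data, max_tries=8, base_delay=0.001):
    """Call send_fn(data), retrying with an exponentially growing
    delay whenever it fails with ENOBUFS, so the kernel gets time
    to drain its buffers instead of being driven into starvation."""
    delay = base_delay
    for _ in range(max_tries):
        try:
            return send_fn(data)
        except OSError as e:
            if e.errno != errno.ENOBUFS:
                raise              # only ENOBUFS triggers the backoff
            time.sleep(delay)      # let the kernel free buffer space
            delay *= 2             # exponential backoff
    raise OSError(errno.ENOBUFS, "send kept failing with ENOBUFS")
```

A sender that fails twice with ENOBUFS and then succeeds completes after three attempts, with the loop sleeping briefly between them rather than spinning.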
[Kernel-packages] [Bug 1999731] Re: disk stress test failing with code 7
** Changed in: stress-ng (Ubuntu)
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1999731

Title:
  disk stress test failing with code 7

Status in Stress-ng: In Progress
Status in linux package in Ubuntu: In Progress
Status in stress-ng package in Ubuntu: Fix Released

Bug description:
  Since mid-November we have seen lots of disk stress test failures with
  multiple Ubuntu kernels, e.g. bionic-hwe, focal, focal-hwe. Most of
  them are with the lockofd stressor, and the systems are still alive
  after the stress test.

  05 Nov 08:51: Running stress-ng lockofd stressor for 240 seconds...
  ** stress-ng exited with code 7

To manage notifications about this bug go to:
https://bugs.launchpad.net/stress-ng/+bug/1999731/+subscriptions
[Kernel-packages] [Bug 2007579] Re: Raptor Lake Thermald ITMT version 2 support
@koba, if you have access to a Dell XPS 9320 then that would be super
useful to verify too - thanks!

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to thermald in Ubuntu.
https://bugs.launchpad.net/bugs/2007579

Title:
  Raptor Lake Thermald ITMT version 2 support

Status in thermald package in Ubuntu: Fix Released
Status in thermald source package in Bionic: Won't Fix
Status in thermald source package in Focal: Won't Fix
Status in thermald source package in Jammy: In Progress
Status in thermald source package in Kinetic: Fix Committed
Status in thermald source package in Lunar: Fix Released

Bug description:
  == SRU Justification Kinetic ==

  Raptor Lake systems use ITMT v2 instead of v1 for thermal
  configuration via GDDV. This was observed on a Dell XPS 9320 system.
  Because thermald can't parse the v2 table, it does not get the correct
  thermal threshold temperatures and power limits.

  == The Fix ==

  This is fixed in upstream thermald by the patch:
  https://github.com/intel/thermal_daemon/commit/90d56bc06cdcf78e7398ea7da389401516591774

  This fix is part of the thermald 2.5.2 release. It applies cleanly and
  is already in Ubuntu Lunar in thermald 2.5.2. The fix checks for
  illegal ITMT versions and handles version 2 as a specific exceptional
  case.

  == Regression Risks ==

  For systems that do not use ITMT, no change in behaviour will occur.
  Systems with versions > 2 (currently not valid) will no longer have
  ITMT parsed; this avoids misinterpreting unsupported ITMT data.
  Finally, version 2 of ITMT will now be parsed differently: the
  additional fields will be parsed and then ignored, as intended.

  == Test Plan ==

  Test against a Dell XPS 9320 system and see if it handles the ITMT
  correctly. The thermald log should indicate version 2 is being used
  with the message "ignore dummy_str: ds d1 d2 d3", where ds is a string
  and d1..d3 are uint64 values that are parsed and ignored.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/thermald/+bug/2007579/+subscriptions
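The shape of the version check the fix adds can be sketched as follows: version 1 is parsed as before, version 2 parses the extra uint64 fields and deliberately ignores them, and anything newer is rejected rather than misinterpreted. This is a hypothetical Python illustration; the field names and record layout are invented for the example, and the real parser is the C++ code in the upstream thermald commit.

```python
import struct

def parse_itmt(blob):
    """Parse a toy ITMT record: a uint64 version followed by fields.
    Version layout here is illustrative, not thermald's real format."""
    (version,) = struct.unpack_from("<Q", blob, 0)
    offset = 8
    if version > 2:
        # Reject unknown versions instead of misreading their data.
        raise ValueError("unsupported ITMT version %d" % version)
    if version == 2:
        # v2 carries three additional uint64 values that are parsed
        # and deliberately ignored ("ignore dummy_str: ds d1 d2 d3").
        d1, d2, d3 = struct.unpack_from("<QQQ", blob, offset)
        offset += 24
    (target_temp,) = struct.unpack_from("<Q", blob, offset)
    return {"version": version, "target_temp": target_temp}
```

The key design point mirrored from the fix is that v2's extra fields shift the offsets of everything after them, so a v1-only parser reads garbage from a v2 table unless the version is checked first.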
[Kernel-packages] [Bug 1968361] Re: rawsock test BUG: soft lockup
Looks like the kernel is running out of resources and it is doing
out-of-memory killing of various processes. I think I have ways of
reducing the chance of this occurring.

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1968361

Title:
  rawsock test BUG: soft lockup

Status in Linux: Fix Released
Status in Stress-ng: New
Status in linux package in Ubuntu: New
[Kernel-packages] [Bug 2007579] Re: Raptor Lake Thermald ITMT version 2 support
I've been exercising the existing code paths of thermald for several
days now with no observable regression in behaviour. I cannot test the
new code path change for this fix as I don't have the exact same system
as that reported in the bug. From what I can see, there is no regression
from this single change for Kinetic.

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to thermald in Ubuntu.
https://bugs.launchpad.net/bugs/2007579

Title:
  Raptor Lake Thermald ITMT version 2 support

Status in thermald package in Ubuntu: Fix Released
Status in thermald source package in Bionic: Won't Fix
Status in thermald source package in Focal: Won't Fix
Status in thermald source package in Jammy: In Progress
Status in thermald source package in Kinetic: Fix Committed
Status in thermald source package in Lunar: Fix Released
[Kernel-packages] [Bug 2007579] Re: Raptor Lake Thermald ITMT version 2 support
** Changed in: thermald (Ubuntu Jammy)
       Status: Won't Fix => In Progress

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to thermald in Ubuntu.
https://bugs.launchpad.net/bugs/2007579

Title:
  Raptor Lake Thermald ITMT version 2 support

Status in thermald package in Ubuntu: Fix Released
Status in thermald source package in Bionic: Won't Fix
Status in thermald source package in Focal: Won't Fix
Status in thermald source package in Jammy: In Progress
Status in thermald source package in Kinetic: In Progress
Status in thermald source package in Lunar: Fix Released