[Kernel-packages] [Bug 2059316] Re: backport arm64 THP improvements from 6.9
** Also affects: linux-nvidia (Ubuntu)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Noble)
   Importance: Undecided
       Status: New

** Also affects: linux-nvidia (Ubuntu Noble)
   Importance: Undecided
       Status: New

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2059316

Title:
  backport arm64 THP improvements from 6.9

Status in linux package in Ubuntu:
  In Progress
Status in linux-nvidia package in Ubuntu:
  New
Status in linux source package in Noble:
  New
Status in linux-nvidia source package in Noble:
  New

Bug description:
  Initial support for multi-size THP landed upstream in v6.8. In the 6.9
  merge window, 2 other series have landed that show significant
  performance improvements on arm64:

  mm/memory: optimize fork() with PTE-mapped THP
  https://lkml.iu.edu/hypermail/linux/kernel/2401.3/02766.html

  Transparent Contiguous PTEs for User Mappings:
  https://lwn.net/Articles/962330/

  On an Ampere AltraMax system w/ 4K page size, kernel builds in a tmpfs
  are reduced from 6m30s to 5m17s, a ~19% improvement.

  It has been reported that this can have a *10x* improvement for
  certain GPU workloads on ARM:
  https://lwn.net/Articles/954094/

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2059316/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
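As a quick sanity check on the kernel-build figure quoted in the bug 2059316 description above, the relative reduction can be recomputed from the two wall-clock times. This is only an arithmetic sketch; the helper name `parse_mmss` is made up for illustration and is not part of any tool mentioned in the report.

```python
def parse_mmss(text: str) -> int:
    """Convert a duration written like '6m30s' into seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + int(seconds)


before = parse_mmss("6m30s")  # build time without the backported series
after = parse_mmss("5m17s")   # build time with the backported series

# Fraction of the original build time that was saved.
reduction = (before - after) / before
print(f"{before}s -> {after}s, reduction {reduction:.1%}")
```

The result rounds to the "~19%" given in the description.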
[Kernel-packages] [Bug 2062380] Re: Using a 6.8 kernel 'modprobe nvidia' hangs on Quanta Grace Hopper
** Summary changed:

- Using a 6.8 kernel modprobe nvidia hangs on Grace Hopper
+ Using a 6.8 kernel 'modprobe nvidia' hangs on Quanta Grace Hopper

** Also affects: nvidia-graphics-drivers-535-server (Ubuntu)
   Importance: Undecided
       Status: New

** Changed in: nvidia-graphics-drivers-535-server (Ubuntu)
       Status: New => Confirmed

** Changed in: nvidia-graphics-drivers-550-server (Ubuntu)
       Status: New => Confirmed

** Description changed:

  Using both -generic and -nvidia 6.8 kernels I'm seeing a hang when I
  load the nvidia driver.
+
+ $ sudo dmidecode -t 0
+ # dmidecode 3.5
+ Getting SMBIOS data from sysfs.
+ SMBIOS 3.6.0 present.
+ # SMBIOS implementations newer than version 3.5.0 are not
+ # fully supported by this version of dmidecode.
+
+ Handle 0x0001, DMI type 0, 26 bytes
+ BIOS Information
+ Vendor: NVIDIA
+ Version: 01.02.01
+ Release Date: 20240207
+ ROM Size: 64 MB
+ Characteristics:
+ PCI is supported
+ PNP is supported
+ BIOS is upgradeable
+ BIOS shadowing is allowed
+ Boot from CD is supported
+ Selectable boot is supported
+ Serial services are supported (int 14h)
+ ACPI is supported
+ Targeted content distribution is supported
+ UEFI is supported
+ Firmware Revision: 0.0

  [ 382.938326] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
  [ 382.946075] rcu: 53-...0: (4 ticks this GP) idle=1c2c/1/0x4000 softirq=4866/4868 fqs=14124
  [ 382.955683] rcu: hardirqs softirqs csw/system
  [ 382.961378] rcu: number:0 00
  [ 382.967071] rcu: cputime:0 00 ==> 30026(ms)
  [ 382.974189] rcu: (detected by 52, t=60034 jiffies, g=24469, q=1199 ncpus=72)
  [ 392.982095] rcu: rcu_preempt kthread starved for 9994 jiffies! g24469 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=31
  [ 392.992769] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior
- After seeing this, I Enabled kdump and set kernel.panic_on_rcu_stall = 1
  KDUMP INFO
  WARNING: cpu 54: cannot find NT_PRSTATUS note
- KERNEL: /usr/lib/debug/boot/vmlinux-6.8.0-1004-nvidia-64k [TAINTED]
- DUMPFILE: /var/crash/202404172139/dump.202404172139 [PARTIAL DUMP]
- CPUS: 72
- DATE: Wed Apr 17 21:39:13 UTC 2024
- UPTIME: 00:06:10
+ KERNEL: /usr/lib/debug/boot/vmlinux-6.8.0-1004-nvidia-64k [TAINTED]
+ DUMPFILE: /var/crash/202404172139/dump.202404172139 [PARTIAL DUMP]
+ CPUS: 72
+ DATE: Wed Apr 17 21:39:13 UTC 2024
+ UPTIME: 00:06:10
  LOAD AVERAGE: 0.68, 0.63, 0.28
- TASKS: 854
- NODENAME: hinyari
- RELEASE: 6.8.0-1005-nvidia-64k
- VERSION: #5-Ubuntu SMP PREEMPT_DYNAMIC Wed Apr 17 11:26:46 UTC 2024
- MACHINE: aarch64 (unknown Mhz)
- MEMORY: 479.7 GB
- PANIC: "Kernel panic - not syncing: RCU Stall"
- PID: 0
- COMMAND: "swapper/21"
- TASK: 82026880 (1 of 72) [THREAD_INFO: 82026880]
- CPU: 21
- STATE: TASK_RUNNING (PANIC)
+ TASKS: 854
+ NODENAME: hinyari
+ RELEASE: 6.8.0-1005-nvidia-64k
+ VERSION: #5-Ubuntu SMP PREEMPT_DYNAMIC Wed Apr 17 11:26:46 UTC 2024
+ MACHINE: aarch64 (unknown Mhz)
+ MEMORY: 479.7 GB
+ PANIC: "Kernel panic - not syncing: RCU Stall"
+ PID: 0
+ COMMAND: "swapper/21"
+ TASK: 82026880 (1 of 72) [THREAD_INFO: 82026880]
+ CPU: 21
+ STATE: TASK_RUNNING (PANIC)

  [ 300.313144] nvidia: loading out-of-tree module taints kernel.
  [ 300.313153] nvidia: module verification failed: signature and/or required key missing - tainting kernel
  [ 300.316694] nvidia-nvlink: Nvlink Core is being initialized, major device number 506
- [ 300.316699]
+ [ 300.316699]
  [ 360.323454] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
  [ 360.331206] rcu: 54-...0: (24 ticks this GP) idle=742c/1/0x4000 softirq=4931/4933 fqs=13148
  [ 360.340903] rcu: hardirqs softirqs csw/system
  [ 360.346597] rcu: number:0 00
  [ 360.352291] rcu: cputime:0 00 ==> 30031(ms)
  [ 360.359408] rcu: (detected by 21, t=60038 jiffies, g=25009, q=1123 ncpus=72)
  [ 360.366704] Sending NMI from CPU 21 to CPUs 54:
  [ 370.367310] rcu: rcu_preempt kthread starved for 9993 jiffies! g25009 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=31
  [ 370.377983] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
  [ 370.387322] rcu: RCU grace-period kthread stack dump:
  [ 370.392482] task:rcu_preempt state:I stack:0 pid:17 tgid:17 ppid:2 flags:0x0008
  [ 370.392488] Call trace:
  [
[Kernel-packages] [Bug 2062380] [NEW] Using a 6.8 kernel modprobe nvidia hangs on Grace Hopper
Public bug reported:

Using both -generic and -nvidia 6.8 kernels I'm seeing a hang when I
load the nvidia driver.

[ 382.938326] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 382.946075] rcu: 53-...0: (4 ticks this GP) idle=1c2c/1/0x4000 softirq=4866/4868 fqs=14124
[ 382.955683] rcu: hardirqs softirqs csw/system
[ 382.961378] rcu: number:0 00
[ 382.967071] rcu: cputime:0 00 ==> 30026(ms)
[ 382.974189] rcu: (detected by 52, t=60034 jiffies, g=24469, q=1199 ncpus=72)
[ 392.982095] rcu: rcu_preempt kthread starved for 9994 jiffies! g24469 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=31
[ 392.992769] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior

After seeing this, I Enabled kdump and set kernel.panic_on_rcu_stall = 1

KDUMP INFO
WARNING: cpu 54: cannot find NT_PRSTATUS note
      KERNEL: /usr/lib/debug/boot/vmlinux-6.8.0-1004-nvidia-64k [TAINTED]
    DUMPFILE: /var/crash/202404172139/dump.202404172139 [PARTIAL DUMP]
        CPUS: 72
        DATE: Wed Apr 17 21:39:13 UTC 2024
      UPTIME: 00:06:10
LOAD AVERAGE: 0.68, 0.63, 0.28
       TASKS: 854
    NODENAME: hinyari
     RELEASE: 6.8.0-1005-nvidia-64k
     VERSION: #5-Ubuntu SMP PREEMPT_DYNAMIC Wed Apr 17 11:26:46 UTC 2024
     MACHINE: aarch64 (unknown Mhz)
      MEMORY: 479.7 GB
       PANIC: "Kernel panic - not syncing: RCU Stall"
         PID: 0
     COMMAND: "swapper/21"
        TASK: 82026880 (1 of 72) [THREAD_INFO: 82026880]
         CPU: 21
       STATE: TASK_RUNNING (PANIC)

[ 300.313144] nvidia: loading out-of-tree module taints kernel.
[ 300.313153] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 300.316694] nvidia-nvlink: Nvlink Core is being initialized, major device number 506
[ 300.316699]
[ 360.323454] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 360.331206] rcu: 54-...0: (24 ticks this GP) idle=742c/1/0x4000 softirq=4931/4933 fqs=13148
[ 360.340903] rcu: hardirqs softirqs csw/system
[ 360.346597] rcu: number:0 00
[ 360.352291] rcu: cputime:0 00 ==> 30031(ms)
[ 360.359408] rcu: (detected by 21, t=60038 jiffies, g=25009, q=1123 ncpus=72)
[ 360.366704] Sending NMI from CPU 21 to CPUs 54:
[ 370.367310] rcu: rcu_preempt kthread starved for 9993 jiffies! g25009 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=31
[ 370.377983] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[ 370.387322] rcu: RCU grace-period kthread stack dump:
[ 370.392482] task:rcu_preempt state:I stack:0 pid:17 tgid:17 ppid:2 flags:0x0008
[ 370.392488] Call trace:
[ 370.392489]  __switch_to+0xd0/0x118
[ 370.392499]  __schedule+0x2a8/0x7b0
[ 370.392501]  schedule+0x40/0x168
[ 370.392502]  schedule_timeout+0xac/0x1e0
[ 370.392505]  rcu_gp_fqs_loop+0x128/0x508
[ 370.392512]  rcu_gp_kthread+0x150/0x188
[ 370.392514]  kthread+0xf8/0x110
[ 370.392519]  ret_from_fork+0x10/0x20
[ 370.392524] rcu: Stack dump where RCU GP kthread last ran:
[ 370.398128] Sending NMI from CPU 21 to CPUs 31:
[ 370.398131] NMI backtrace for cpu 31
[ 370.398136] CPU: 31 PID: 0 Comm: swapper/31 Kdump: loaded Tainted: G OE 6.8.0-1005-nvidia-64k #5-Ubuntu
[ 370.398139] Hardware name: /P3880, BIOS 01.02.01 20240207
[ 370.398140] pstate: 6349 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[ 370.398142] pc : cpuidle_enter_state+0xd8/0x790
[ 370.398150] lr : cpuidle_enter_state+0xcc/0x790
[ 370.398153] sp : 800081eefd70
[ 370.398154] x29: 800081eefd70 x28: x27:
[ 370.398157] x26: x25: 00563d67e4e0 x24:
[ 370.398160] x23: a0a1445699f8 x22: x21: 00563d72ece0
[ 370.398162] x20: a0a144569a10 x19: 8fa4a800 x18: 800081f00030
[ 370.398165] x17: x16: x15: ac8c73b08db0
[ 370.398168] x14: x13: x12:
[ 370.398170] x11: x10: 2da0fbe3d5e8c649 x9 : a0a1424fd244
[ 370.398173] x8 : 820559b8 x7 : x6 :
[ 370.398175] x5 : x4 : x3 :
[ 370.398178] x2 : x1 : x0 :
[ 370.398181] Call trace:
[ 370.398183]  cpuidle_enter_state+0xd8/0x790
[ 370.398185]  cpuidle_enter+0x44/0x78
[ 370.398195]  cpuidle_idle_call+0x15c/0x210
[ 370.398202]  do_idle+0xb0/0x130
[ 370.398204]  cpu_startup_entry+0x40/0x50
[ 370.398206]  secondary_start_kernel+0xec/0x130
[ 370.398211]  __secondary_switched+0xc0/0xc8
[ 370.399132] Kernel panic - not syncing: RCU Stall
[ 370.403938] CPU: 21 PID: 0 Comm:
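When triaging a stall report like the one in bug 2062380 above, it can help to pull the stalled CPU, the detecting CPU and the stall duration out of the log mechanically. The following is a minimal sketch; the regular expressions are written against the exact lines quoted in this report and are an assumption, not a stable kernel log format, and the function name `summarize` is hypothetical.

```python
import re

# Per-CPU stall line, e.g. "rcu: 54-...0: (24 ticks this GP) ..."
STALLED_RE = re.compile(r"rcu: (?P<cpu>\d+)-\S+: \((?P<ticks>\d+) ticks this GP\)")

# Summary line, e.g. "rcu: (detected by 21, t=60038 jiffies, g=25009, q=1123 ncpus=72)"
DETECTED_RE = re.compile(
    r"rcu: \(detected by (?P<cpu>\d+), t=(?P<jiffies>\d+) jiffies, "
    r"g=(?P<gp>-?\d+), q=(?P<q>\d+) ncpus=(?P<ncpus>\d+)\)"
)

# Three lines copied from the report above.
LOG = """\
[ 360.323454] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 360.331206] rcu: 54-...0: (24 ticks this GP) idle=742c/1/0x4000 softirq=4931/4933 fqs=13148
[ 360.359408] rcu: (detected by 21, t=60038 jiffies, g=25009, q=1123 ncpus=72)
"""


def summarize(log: str) -> dict:
    """Extract stalled CPUs and the detecting CPU from an RCU stall report."""
    stalled = [int(m["cpu"]) for m in STALLED_RE.finditer(log)]
    detected = DETECTED_RE.search(log)
    return {
        "stalled_cpus": stalled,
        "detected_by": int(detected["cpu"]) if detected else None,
        "jiffies": int(detected["jiffies"]) if detected else None,
    }


summary = summarize(LOG)
print(summary)
```

For the excerpt used here the sketch reports CPU 54 as stalled and CPU 21 as the detector, which matches the "Sending NMI from CPU 21 to CPUs 54" line in the report.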
[Kernel-packages] [Bug 2055712] Re: Pull-request to address bug in mm/page_alloc.c
** Tags added: verification-done-jammy

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-6.5 in Ubuntu.
https://bugs.launchpad.net/bugs/2055712

Title:
  Pull-request to address bug in mm/page_alloc.c

Status in linux-nvidia-6.5 package in Ubuntu:
  Fix Released

Bug description:
  The current calculation of min_free_kbytes only uses ZONE_DMA and
  ZONE_NORMAL pages, but the ZONE_MOVABLE zone->_watermark[WMARK_MIN]
  also takes a share of min_free_kbytes. This causes the min watermark
  of ZONE_NORMAL to be too small in the presence of ZONE_MOVABLE.

  __GFP_HIGH and PF_MEMALLOC allocations usually don't need movable
  zone pages, so just like ZONE_HIGHMEM, cap pages_min to a small value
  in __setup_per_zone_wmarks().

  On my testing machine with 16GB of memory (transparent hugepage is
  turned off by default, and movablecore=12G is configured), the
  following is comparative test data for watermark_min:

                    no patch   add patch
  ZONE_DMA                 1           8
  ZONE_DMA32             151         709
  ZONE_NORMAL            233        1113
  ZONE_MOVABLE          1434         128
  min_free_kbytes       7288        7326

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.5/+bug/2055712/+subscriptions
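The effect described in bug 2055712 above can be illustrated with a small model of how the min watermark is shared out between zones. This is a simplified sketch, not the kernel code: the zone sizes and the `pages_min` value are made-up round numbers, the function name `min_watermarks` is hypothetical, and the cap (managed pages / 1024, clamped to the range 32..128) is assumed from the ZONE_HIGHMEM handling that the description refers to.

```python
SWAP_CLUSTER_MAX = 32   # assumed lower bound of the cap
CAP_UPPER = 128         # assumed upper bound of the cap


def min_watermarks(zones: dict, pages_min: int, cap_movable: bool) -> dict:
    """Return WMARK_MIN (in pages) per zone.

    zones maps a zone name to its managed page count. With
    cap_movable=False, ZONE_MOVABLE takes a proportional share of
    pages_min (the behaviour the report complains about); with
    cap_movable=True it is capped like ZONE_HIGHMEM.
    """
    def is_capped(name: str) -> bool:
        return name == "ZONE_HIGHMEM" or (cap_movable and name == "ZONE_MOVABLE")

    # Only zones that are not capped share pages_min proportionally.
    lowmem_pages = sum(p for n, p in zones.items() if not is_capped(n))

    marks = {}
    for name, managed in zones.items():
        if is_capped(name):
            marks[name] = max(SWAP_CLUSTER_MAX, min(managed // 1024, CAP_UPPER))
        else:
            marks[name] = pages_min * managed // lowmem_pages
    return marks


# Hypothetical zone sizes in 4K pages (roughly 16GB with 12GB movable).
zones = {
    "ZONE_DMA": 4_000,
    "ZONE_DMA32": 400_000,
    "ZONE_NORMAL": 600_000,
    "ZONE_MOVABLE": 3_145_728,
}

before = min_watermarks(zones, pages_min=1800, cap_movable=False)
after = min_watermarks(zones, pages_min=1800, cap_movable=True)
print("no patch :", before)
print("add patch:", after)
```

With these assumed inputs the model shows the same pattern as the table in the report: without the cap most of `pages_min` lands in ZONE_MOVABLE, and with the cap ZONE_MOVABLE drops to 128 pages while the other zones' watermarks grow several-fold.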
[Kernel-packages] [Bug 2055712] Re: Pull-request to address bug in mm/page_alloc.c
** Changed in: linux-nvidia-6.5 (Ubuntu)
       Status: New => Fix Released

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-6.5 in Ubuntu.
https://bugs.launchpad.net/bugs/2055712

Title:
  Pull-request to address bug in mm/page_alloc.c

Status in linux-nvidia-6.5 package in Ubuntu:
  Fix Released
[Kernel-packages] [Bug 2059150] Re: jammy/linux-nvidia-6.5: 6.5.0-1014.14 - Boot failure on Quanta Grace/Hopper
Upgrading bios firmware resolves failure

$ sudo dmidecode -t 0
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 3.6.0 present.
# SMBIOS implementations newer than version 3.5.0 are not
# fully supported by this version of dmidecode.

Handle 0x0001, DMI type 0, 26 bytes
BIOS Information
        Vendor: NVIDIA
        Version: 01.02.01
        Release Date: 20240207
        ROM Size: 64 MB
        Characteristics:
                PCI is supported
                PNP is supported
                BIOS is upgradeable
                BIOS shadowing is allowed
                Boot from CD is supported
                Selectable boot is supported
                Serial services are supported (int 14h)
                ACPI is supported
                Targeted content distribution is supported
                UEFI is supported
        Firmware Revision: 0.0

** Changed in: linux-nvidia-6.5 (Ubuntu)
       Status: New => Invalid

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-6.5 in Ubuntu.
https://bugs.launchpad.net/bugs/2059150

Title:
  jammy/linux-nvidia-6.5: 6.5.0-1014.14 - Boot failure on Quanta
  Grace/Hopper

Status in linux-nvidia-6.5 package in Ubuntu:
  Invalid
[Kernel-packages] [Bug 2059150] [NEW] jammy/linux-nvidia-6.5: 6.5.0-1014.14 - Boot failure on Quanta Grace/Hopper
Public bug reported:

Output from BMC SOL console:

Unhandled Exception from EL2
x0 = 0x11f210305619
x1 = 0x
x2 = 0x
x3 = 0x
x4 = 0x5f972493
x5 = 0x
x6 = 0x
x7 = 0x
x8 = 0x
x9 = 0xa0e0a03e7d6c
x10= 0x
x11= 0x
x12= 0x
x13= 0x
x14= 0x
x15= 0x
x16= 0x
x17= 0x
x18= 0x
x19= 0xf0f18080
x20= 0x80009e86f6a0
x21= 0x80009e86f720
x22= 0x07a5a0e0a03e7d6c
x23= 0x
x24= 0xa0e0a3348aa0
x25= 0xa0e0a2990008
x26= 0xa0e0a2990008
x27= 0xa0e04b4f5748
x28= 0x80009e86f710
x29= 0x80008000fe00
x30= 0xa0e0a03e7d6c
scr_el3= 0x0407073d
sctlr_el3 = 0x30cd183f
cptr_el3 = 0x00100100
tcr_el3= 0x80853510
daif = 0x02c0
mair_el3 = 0x004404ff
spsr_el3 = 0x034000c9
elr_el3= 0xa0e04b4f58b4
ttbr0_el3 = 0x0078734a5001
esr_el3= 0x622c5c1f
far_el3= 0x9446dd42099e8148
spsr_el1 = 0x
elr_el1= 0x
spsr_abt = 0x
spsr_und = 0x
spsr_irq = 0x
spsr_fiq = 0x
sctlr_el1 = 0x30d00980
actlr_el1 = 0x
cpacr_el1 = 0x0030
csselr_el1 = 0x0002
sp_el1 = 0x
esr_el1= 0x
ttbr0_el1 = 0x
ttbr1_el1 = 0x
mair_el1 = 0x
amair_el1 = 0x
tcr_el1= 0x
tpidr_el1 = 0x
tpidr_el0 = 0x8000
tpidrro_el0= 0x
par_el1= 0x0800
mpidr_el1 = 0x8102
afsr0_el1 = 0x
afsr1_el1 = 0x
contextidr_el1 = 0x
vbar_el1 = 0x
cntp_ctl_el0 = 0x
cntp_cval_el0 = 0x0012ec91c420
cntv_ctl_el0 = 0x
cntv_cval_el0 = 0x
cntkctl_el1= 0x
sp_el0 = 0x0078732cf4f0
isr_el1= 0x0040
cpuectlr_el1 = 0x4000340340003000
gicd_ispendr regs (Offsets 0x200 - 0x278)
Offset:value
0200: 0x
Unhandled Exception in EL3.
x30= 0x0078732c4384
x0 = 0x
x1 = 0x0078732cb7d8
x2 = 0x0018
x3 = 0x0078732b1720
x4 = 0x
x5 = 0x003c
x6 = 0x0078732c9109
x7 = 0x22000204
x8 = 0x4000340340003000
x9 = 0x
x10= 0x
x11= 0x0012ec91c420
x12= 0x
x13= 0x
x14= 0x
x15= 0x0078732cf4f0
x16= 0x2200
x17= 0x0018
x18= 0x0407073d
x19= 0x007873386440
x20= 0x80009e86f6a0
x21= 0x80009e86f720
x22= 0x07a5a0e0a03e7d6c
x23= 0x
x24= 0xa0e0a3348aa0
x25= 0xa0e0a2990008
x26= 0xa0e0a2990008
x27= 0xa0e04b4f5748
x28= 0x80009e86f710
x29= 0x80008000fe00
scr_el3= 0x0407073d
sctlr_el3 = 0x30cd183f
cptr_el3 = 0x00100100
tcr_el3= 0x80853510
daif = 0x03c0
mair_el3 = 0x004404ff
spsr_el3 = 0x834002cd
elr_el3= 0x0078732b0af4
ttbr0_el3 = 0x0078734a5001
esr_el3= 0xbe11
far_el3= 0x9446dd42099e8148
spsr_el1 = 0x
elr_el1= 0x
spsr_abt = 0x
spsr_und = 0x
spsr_irq = 0x
spsr_fiq = 0x
sctlr_el1 = 0x30d00980
actlr_el1 = 0x
cpacr_el1 = 0x0030
csselr_el1 = 0x0002
sp_el1 = 0x
esr_el1= 0x
ttbr0_el1 = 0x
ttbr1_el1 = 0x
mair_el1 =
[Kernel-packages] [Bug 2049537] Re: Pull request for: peer-memory, ACPI thermal issues and coresight etm4x issues
** Changed in: linux-nvidia-6.5 (Ubuntu)
       Status: New => Fix Committed

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-6.5 in Ubuntu.
https://bugs.launchpad.net/bugs/2049537

Title:
  Pull request for: peer-memory, ACPI thermal issues and coresight etm4x
  issues

Status in linux-nvidia-6.5 package in Ubuntu:
  Fix Committed

Bug description:
  * Add support of "Thermal fast Sampling Period (_TFP)" for passive
    cooling.
  * Finer grained CPU throttling
  * The peer_memory_client scheme allows a driver to register with the
    ib_umem system that it has the ability to understand user virtual
    address ranges that are not compatible with get_user_pages(). For
    instance VMAs created with io_remap_pfn_range(), or other driver
    special VMA.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.5/+bug/2049537/+subscriptions
[Kernel-packages] [Bug 2048815] Re: Pull request to address TPM SPI devices
** Changed in: linux-nvidia-6.5 (Ubuntu)
       Status: New => Fix Committed

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-6.5 in Ubuntu.
https://bugs.launchpad.net/bugs/2048815

Title:
  Pull request to address TPM SPI devices

Status in linux-nvidia-6.5 package in Ubuntu:
  Fix Committed

Bug description:
  TPM devices may insert wait state on last clock cycle of ADDR phase.
  For SPI controllers that support full-duplex transfers, this can be
  detected using software by reading the MISO line. For SPI controllers
  that only support half-duplex transfers, such as the Tegra QSPI, it is
  not possible to detect the wait signal from software.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.5/+bug/2048815/+subscriptions
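The software-side wait-state detection that bug 2048815 above describes for full-duplex controllers can be sketched as follows. This is an illustrative model only, operating on a captured MISO byte sequence rather than real hardware. It assumes the common TPM-over-SPI convention that the device signals "ready" by driving bit 0 of a MISO byte high, that the header (command plus address) is 4 bytes, and that the host gives up after a fixed number of polls; the function name and the retry limit are hypothetical.

```python
def count_wait_bytes(miso: bytes, header_len: int = 4, max_retries: int = 50) -> int:
    """Return how many extra bytes had to be clocked before the TPM was ready.

    miso holds the bytes sampled on MISO, starting with those clocked in
    while the host was sending the header. Bit 0 of the last header byte
    set means "no wait state" (assumed convention).
    """
    if len(miso) < header_len:
        raise ValueError("MISO capture is shorter than the header")

    # Ready flag already set during the last byte of the address phase.
    if miso[header_len - 1] & 0x01:
        return 0

    # Otherwise keep clocking single bytes until the ready bit shows up.
    polled = miso[header_len:header_len + max_retries]
    for extra, byte in enumerate(polled, start=1):
        if byte & 0x01:
            return extra

    raise TimeoutError("TPM did not signal ready within the retry limit")


print(count_wait_bytes(bytes([0x00, 0x00, 0x00, 0x01])))                    # no wait state
print(count_wait_bytes(bytes([0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01])))  # three polls
```

A half-duplex controller cannot sample MISO while it is still shifting out the header, so the first check in this sketch has no equivalent there, which is the limitation the description points out for the Tegra QSPI.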
[Kernel-packages] [Bug 2048966] Re: Fix soft lockup triggered by arm_smmu_mm_invalidate_range
** Changed in: linux-nvidia-6.5 (Ubuntu)
       Status: New => Fix Committed

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-6.5 in Ubuntu.
https://bugs.launchpad.net/bugs/2048966

Title:
  Fix soft lockup triggered by arm_smmu_mm_invalidate_range

Status in linux-nvidia-6.5 package in Ubuntu:
  Fix Committed

Bug description:
  [Problem]

  When running an SVA case, the following soft lockup is triggered:

  watchdog: BUG: soft lockup - CPU#244 stuck for 26s!
  pstate: 8349 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
  pc : arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
  lr : arm_smmu_cmdq_issue_cmdlist+0x150/0xa50
  sp : 8000d83ef290
  x29: 8000d83ef290 x28: 3b9aca00 x27:
  x26: 8000d83ef3c0 x25: da86c0812194a0e8 x24:
  x23: 0040 x22: 8000d83ef340 x21: c63980c0
  x20: 0001 x19: c6398080 x18:
  x17: x16: x15: 3000b4a3bbb0
  x14: 3000b4a30888 x13: 3000b4a3cf60 x12:
  x11: x10: x9 : c08120e4d6bc
  x8 : x7 : x6 : 00048cfa
  x5 : x4 : 0001 x3 : 000a
  x2 : 8000 x1 : x0 : 0001
  Call trace:
   arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
   __arm_smmu_tlb_inv_range+0x118/0x254
   arm_smmu_tlb_inv_range_asid+0x6c/0x130
   arm_smmu_mm_invalidate_range+0xa0/0xa4
   __mmu_notifier_invalidate_range_end+0x88/0x120
   unmap_vmas+0x194/0x1e0
   unmap_region+0xb4/0x144
   do_mas_align_munmap+0x290/0x490
   do_mas_munmap+0xbc/0x124
   __vm_munmap+0xa8/0x19c
   __arm64_sys_munmap+0x28/0x50
   invoke_syscall+0x78/0x11c
   el0_svc_common.constprop.0+0x58/0x1c0
   do_el0_svc+0x34/0x60
   el0_svc+0x2c/0xd4
   el0t_64_sync_handler+0x114/0x140
   el0t_64_sync+0x1a4/0x1a8

  [Fix]

  backport the following upstream stable patch
  d5afb4b47e13161b3f33904d45110f9e6463bad6

  Link:
  https://lore.kernel.org/r/20230920052257.8615-1-nicol...@nvidia.com

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.5/+bug/2048966/+subscriptions
[Kernel-packages] [Bug 2048966] Re: Fix soft lockup triggered by arm_smmu_mm_invalidate_range
** Description changed:

+ [Problem]
+
  When running an SVA case, the following soft lockup is triggered:
-
- watchdog: BUG: soft lockup - CPU#244 stuck for 26s!
- pstate: 8349 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
- pc : arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
- lr : arm_smmu_cmdq_issue_cmdlist+0x150/0xa50
- sp : 8000d83ef290
- x29: 8000d83ef290 x28: 3b9aca00 x27:
- x26: 8000d83ef3c0 x25: da86c0812194a0e8 x24:
- x23: 0040 x22: 8000d83ef340 x21: c63980c0
- x20: 0001 x19: c6398080 x18:
- x17: x16: x15: 3000b4a3bbb0
- x14: 3000b4a30888 x13: 3000b4a3cf60 x12:
- x11: x10: x9 : c08120e4d6bc
- x8 : x7 : x6 : 00048cfa
- x5 : x4 : 0001 x3 : 000a
- x2 : 8000 x1 : x0 : 0001
- Call trace:
- arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
- __arm_smmu_tlb_inv_range+0x118/0x254
- arm_smmu_tlb_inv_range_asid+0x6c/0x130
- arm_smmu_mm_invalidate_range+0xa0/0xa4
- __mmu_notifier_invalidate_range_end+0x88/0x120
- unmap_vmas+0x194/0x1e0
- unmap_region+0xb4/0x144
- do_mas_align_munmap+0x290/0x490
- do_mas_munmap+0xbc/0x124
- __vm_munmap+0xa8/0x19c
- __arm64_sys_munmap+0x28/0x50
- invoke_syscall+0x78/0x11c
- el0_svc_common.constprop.0+0x58/0x1c0
- do_el0_svc+0x34/0x60
- el0_svc+0x2c/0xd4
- el0t_64_sync_handler+0x114/0x140
- el0t_64_sync+0x1a4/0x1a8
-
+
+ watchdog: BUG: soft lockup - CPU#244 stuck for 26s!
+ pstate: 8349 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
+ pc : arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
+ lr : arm_smmu_cmdq_issue_cmdlist+0x150/0xa50
+ sp : 8000d83ef290
+ x29: 8000d83ef290 x28: 3b9aca00 x27:
+ x26: 8000d83ef3c0 x25: da86c0812194a0e8 x24:
+ x23: 0040 x22: 8000d83ef340 x21: c63980c0
+ x20: 0001 x19: c6398080 x18:
+ x17: x16: x15: 3000b4a3bbb0
+ x14: 3000b4a30888 x13: 3000b4a3cf60 x12:
+ x11: x10: x9 : c08120e4d6bc
+ x8 : x7 : x6 : 00048cfa
+ x5 : x4 : 0001 x3 : 000a
+ x2 : 8000 x1 : x0 : 0001
+ Call trace:
+ arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
+ __arm_smmu_tlb_inv_range+0x118/0x254
+ arm_smmu_tlb_inv_range_asid+0x6c/0x130
+ arm_smmu_mm_invalidate_range+0xa0/0xa4
+ __mmu_notifier_invalidate_range_end+0x88/0x120
+ unmap_vmas+0x194/0x1e0
+ unmap_region+0xb4/0x144
+ do_mas_align_munmap+0x290/0x490
+ do_mas_munmap+0xbc/0x124
+ __vm_munmap+0xa8/0x19c
+ __arm64_sys_munmap+0x28/0x50
+ invoke_syscall+0x78/0x11c
+ el0_svc_common.constprop.0+0x58/0x1c0
+ do_el0_svc+0x34/0x60
+ el0_svc+0x2c/0xd4
+ el0t_64_sync_handler+0x114/0x140
+ el0t_64_sync+0x1a4/0x1a8
+
+
+
+ [Fix]
+
+ backport the following upstream stable patch
+ d5afb4b47e13161b3f33904d45110f9e6463bad6
+
+ Link:
+ https://lore.kernel.org/r/20230920052257.8615-1-nicol...@nvidia.com

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-6.5 in Ubuntu.
https://bugs.launchpad.net/bugs/2048966

Title:
  Fix soft lockup triggered by arm_smmu_mm_invalidate_range

Status in linux-nvidia-6.5 package in Ubuntu:
  New
[Kernel-packages] [Bug 2048966] [NEW] Fix soft lockup triggered by arm_smmu_mm_invalidate_range
Public bug reported:

When running an SVA case, the following soft lockup is triggered:

watchdog: BUG: soft lockup - CPU#244 stuck for 26s!
pstate: 8349 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
lr : arm_smmu_cmdq_issue_cmdlist+0x150/0xa50
sp : 8000d83ef290
x29: 8000d83ef290 x28: 3b9aca00 x27:
x26: 8000d83ef3c0 x25: da86c0812194a0e8 x24:
x23: 0040 x22: 8000d83ef340 x21: c63980c0
x20: 0001 x19: c6398080 x18:
x17: x16: x15: 3000b4a3bbb0
x14: 3000b4a30888 x13: 3000b4a3cf60 x12:
x11: x10: x9 : c08120e4d6bc
x8 : x7 : x6 : 00048cfa
x5 : x4 : 0001 x3 : 000a
x2 : 8000 x1 : x0 : 0001
Call trace:
 arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
 __arm_smmu_tlb_inv_range+0x118/0x254
 arm_smmu_tlb_inv_range_asid+0x6c/0x130
 arm_smmu_mm_invalidate_range+0xa0/0xa4
 __mmu_notifier_invalidate_range_end+0x88/0x120
 unmap_vmas+0x194/0x1e0
 unmap_region+0xb4/0x144
 do_mas_align_munmap+0x290/0x490
 do_mas_munmap+0xbc/0x124
 __vm_munmap+0xa8/0x19c
 __arm64_sys_munmap+0x28/0x50
 invoke_syscall+0x78/0x11c
 el0_svc_common.constprop.0+0x58/0x1c0
 do_el0_svc+0x34/0x60
 el0_svc+0x2c/0xd4
 el0t_64_sync_handler+0x114/0x140
 el0t_64_sync+0x1a4/0x1a8

** Affects: linux-nvidia-6.5 (Ubuntu)
   Importance: Undecided
       Status: New

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-6.5 in Ubuntu.
https://bugs.launchpad.net/bugs/2048966

Title:
  Fix soft lockup triggered by arm_smmu_mm_invalidate_range

Status in linux-nvidia-6.5 package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.5/+bug/2048966/+subscriptions
[Kernel-packages] [Bug 2042697] Re: Pull request to address thermal core issues
** Changed in: linux-nvidia-6.2 (Ubuntu)
       Status: New => Fix Released

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-6.2 in Ubuntu.
https://bugs.launchpad.net/bugs/2042697

Title:
  Pull request to address thermal core issues

Status in linux-nvidia-6.2 package in Ubuntu:
  Fix Released

Bug description:
  The Grace development team has not been testing the 6.2 Ubuntu kernel
  but instead a newer kernel. When they run their thermal tests on a 6.2
  kernel they are running into failures. Investigations have turned up
  several missing kernel patches. These patches are clean cherry-picks
  and have been tested and confirmed to fix the thermal issues we are
  seeing.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.2/+bug/2042697/+subscriptions
[Kernel-packages] [Bug 2040526] Re: Backport RDMA DMABUF
** Changed in: linux (Ubuntu) Status: Incomplete => Won't Fix ** No longer affects: linux (Ubuntu Jammy) ** No longer affects: linux-nvidia (Ubuntu Jammy) ** Changed in: linux-nvidia (Ubuntu) Status: New => Fix Committed ** Changed in: linux-nvidia (Ubuntu) Assignee: (unassigned) => Ian May (ian-may) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/2040526 Title: Backport RDMA DMABUF Status in linux package in Ubuntu: Won't Fix Status in linux-nvidia package in Ubuntu: Fix Committed Bug description: SRU Justification: [Impact] * From Nvidia: "We are working on a high performance networking solution with real customers. That solution is being developed using the Ubuntu 22.04 LTS distro release and the distro kernel (lowlatency flavour). This “dma_buf” patchset consists of upstreamed patches that allow buffers to be shared between drivers thus enhancing performance while reducing copying of data. Our team is currently engaged in the development of a high-performance networking solution tailored to meet the demands of real-world customers. This cutting-edge solution is being crafted on the foundation of Ubuntu 22.04 LTS, utilizing the distribution's kernel, specifically the lowlatency flavor. At the heart of our innovation lies the transformative "dma_buf" patchset, comprising a series of patches that have been integrated into the upstream kernel in 5.16 and 5.17. These patches introduce a groundbreaking capability: enabling the seamless sharing of buffers among various drivers. This not only bolsters the solution's performance but also minimizes the need for data copying, effectively enhancing efficiency across the board. The new functionality is isolated such that existing user will not execute these new code paths." 
Upstream Reference https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/ https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/ The patch "[PATCH 1/4] net/mlx5: Add IFC bits for mkey ATS" is already in Jammy and was included in: "2a1e6097e9b9 UBUNTU: SAUCE: RDMA/core: Updated ib_peer_memory" [Test Plan] * Testing instructions are outlined in the SF case and has been tested on in house hardware and externally by Nvidia. [Where problems could occur?] * This introduces new code paths so regression potential should be low. [Other Info] * SF#00370664
[Kernel-packages] [Bug 2040526] Re: Backport RDMA DMABUF
** Package changed: linux-nvidia (Ubuntu) => linux (Ubuntu) ** Also affects: linux-nvidia (Ubuntu) Importance: Undecided Status: New -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/2040526 Title: Backport RDMA DMABUF Status in linux package in Ubuntu: Won't Fix Status in linux-nvidia package in Ubuntu: Fix Committed
[Kernel-packages] [Bug 2040526] Re: Backport RDMA DMABUF
** Package changed: linux (Ubuntu) => linux-nvidia (Ubuntu) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-nvidia in Ubuntu. https://bugs.launchpad.net/bugs/2040526 Title: Backport RDMA DMABUF Status in linux-nvidia package in Ubuntu: Incomplete Status in linux-nvidia source package in Jammy: Incomplete
[Kernel-packages] [Bug 2043059] Re: Installation errors out when installing in a chroot
I don't appear to have access to the image file used in the reproducer.

http://bright-dev.nvidia.com/base-distributions/x86_64/dgx-os/dgx-os-6.1-trd4/DGXOS-6.1.0-DGX-H100.tar.gz

So instead I'm using the following image for reproducing.

https://cloud-images.ubuntu.com/jammy/20231027/jammy-server-cloudimg-amd64.tar.gz

The error indicates to me that it can't find the root device. If I don't bind mount /dev into my image, I'm able to recreate the error with both linux-generic and linux-nvidia. With the host /dev mounted into the chroot both kernels are able to call mkinitramfs successfully.

Can you confirm that 'cm-chroot-sw-img' is mounting /dev?

mount | grep /cm/images/dgx-h100-image/dev

If we are lucky and it happens to not be mounted could you try the following:

sudo mount --bind /dev /cm/images/dgx-h100-image/dev
sudo chroot /cm/images/dgx-h100-image /etc/kernel/postinst.d/kdump-tools 5.15.0-1040-nvidia

If /dev is correctly mounted and problem persists, I'll probably need a way to get that image tar to further investigate.

Thanks,
Ian

-- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-nvidia in Ubuntu. https://bugs.launchpad.net/bugs/2043059 Title: Installation errors out when installing in a chroot Status in linux-nvidia package in Ubuntu: New Bug description:

Processing triggers for linux-image-5.15.0-1040-nvidia (5.15.0-1040.40) ...
/etc/kernel/postinst.d/dkms:
* dkms: running auto installation service for kernel 5.15.0-1040-nvidia ...done.
/etc/kernel/postinst.d/initramfs-tools:
update-initramfs: Generating /boot/initrd.img-5.15.0-1040-nvidia
cryptsetup: WARNING: Couldn't determine root device
W: Couldn't identify type of root file system for fsck hook
cp: cannot stat '/etc/iscsi/initiatorname.iscsi': No such file or directory
/etc/kernel/postinst.d/kdump-tools:
kdump-tools: Generating /var/lib/kdump/initrd.img-5.15.0-1040-nvidia
mkinitramfs: failed to determine device for /
mkinitramfs: workaround is MODULES=most, check:
grep -r MODULES /var/lib/kdump/initramfs-tools
Error please report bug on initramfs-tools
Include the output of 'mount' and 'cat /proc/mounts'
update-initramfs: failed for /var/lib/kdump/initrd.img-5.15.0-1040-nvidia with 1.
run-parts: /etc/kernel/postinst.d/kdump-tools exited with return code 1
dpkg: error processing package linux-image-5.15.0-1040-nvidia (--configure):
installed linux-image-5.15.0-1040-nvidia package post-installation script subprocess returned error exit status 1
Errors were encountered while processing:
linux-image-5.15.0-1040-nvidia
E: Sub-process /usr/bin/dpkg returned an error code (1)
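The /dev check requested in the comment on this bug can be scripted so that it is easy to repeat across images. What follows is a minimal sketch and not part of the original report: the function name `is_mount_point` and its optional second argument are assumptions introduced here for illustration, and the image path is the one quoted in the comment.

```shell
# Return success (0) if TARGET is listed as a mount point in MOUNTS_FILE.
# MOUNTS_FILE defaults to /proc/mounts; passing another file is an
# assumption of this sketch and is only meant to make testing possible.
is_mount_point() {
    target=$1
    mounts_file=${2:-/proc/mounts}
    # Field 2 of each /proc/mounts line is the mount point.
    awk -v t="$target" '$2 == t { found = 1 } END { exit !found }' "$mounts_file"
}

# Hypothetical usage, run on the host before entering the chroot:
# if ! is_mount_point /cm/images/dgx-h100-image/dev; then
#     echo "/dev is not mounted inside the image" >&2
# fi
```

On systems with util-linux installed, `mountpoint -q /cm/images/dgx-h100-image/dev` should give the same answer without a helper function.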
[Kernel-packages] [Bug 2040526] Re: Backport RDMA DMABUF
** Changed in: linux (Ubuntu) Status: Incomplete => New ** Description changed: SRU Justification: [Impact] - From Nvidia: + *From Nvidia: "We are working on a high performance networking solution with real customers. That solution is being developed using the Ubuntu 22.04 LTS distro release and the distro kernel (lowlatency flavour). This “dma_buf” patchset consists of upstreamed patches that allow buffers to be shared between drivers thus enhancing performance while reducing copying of data. Our team is currently engaged in the development of a high-performance networking solution tailored to meet the demands of real-world customers. This cutting-edge solution is being crafted on the foundation of Ubuntu 22.04 LTS, utilizing the distribution's kernel, specifically the lowlatency flavor. At the heart of our innovation lies the transformative "dma_buf" patchset, comprising a series of patches that have been integrated into the upstream kernel in 5.16 and 5.17. These patches introduce a groundbreaking capability: enabling the seamless sharing of buffers among various drivers. This not only bolsters the solution's performance but also minimizes the need for data copying, effectively enhancing efficiency across the board. The new functionality is isolated such that existing user will not execute these new code paths." Upstream Reference https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/ https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/ The patch "[PATCH 1/4] net/mlx5: Add IFC bits for mkey ATS" is already in Jammy and was included in: "2a1e6097e9b9 UBUNTU: SAUCE: RDMA/core: Updated ib_peer_memory" [Test Plan] - Testing instructions are outlined in the SF case and has been tested on + *Testing instructions are outlined in the SF case and has been tested on in house hardware and externally by Nvidia. [Where problems could occur?] - This introduces new code paths so regression potential should be low. 
+ *This introduces new code paths so regression potential should be low. [Other Info] - SF#00370664 + + *SF#00370664 ** Description changed: SRU Justification: [Impact] - *From Nvidia: + * From Nvidia: "We are working on a high performance networking solution with real customers. That solution is being developed using the Ubuntu 22.04 LTS distro release and the distro kernel (lowlatency flavour). This “dma_buf” patchset consists of upstreamed patches that allow buffers to be shared between drivers thus enhancing performance while reducing copying of data. Our team is currently engaged in the development of a high-performance networking solution tailored to meet the demands of real-world customers. This cutting-edge solution is being crafted on the foundation of Ubuntu 22.04 LTS, utilizing the distribution's kernel, specifically the lowlatency flavor. At the heart of our innovation lies the transformative "dma_buf" patchset, comprising a series of patches that have been integrated into the upstream kernel in 5.16 and 5.17. These patches introduce a groundbreaking capability: enabling the seamless sharing of buffers among various drivers. This not only bolsters the solution's performance but also minimizes the need for data copying, effectively enhancing efficiency across the board. The new functionality is isolated such that existing user will not execute these new code paths." Upstream Reference https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/ https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/ The patch "[PATCH 1/4] net/mlx5: Add IFC bits for mkey ATS" is already in Jammy and was included in: "2a1e6097e9b9 UBUNTU: SAUCE: RDMA/core: Updated ib_peer_memory" [Test Plan] - *Testing instructions are outlined in the SF case and has been tested on - in house hardware and externally by Nvidia. + * Testing instructions are outlined in the SF case and has been tested + on in house hardware and externally by Nvidia. 
[Where problems could occur?] - *This introduces new code paths so regression potential should be low. + * This introduces new code paths so regression potential should be low. [Other Info] - *SF#00370664 + * SF#00370664 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/2040526 Title: Backport RDMA DMABUF Status in linux package in Ubuntu: New Status in linux source package in Jammy: New Bug description: SRU Justification: [Impact] * From Nvidia: "We are working on a high performance networking solution with real customers. That solution is being developed using the Ubuntu 22.04 LTS distro release and the distro kernel (lowlatency flavour). This “dma_buf” patchset consists
[Kernel-packages] [Bug 2040526] Re: Backport RDMA DMABUF
** Description changed: SRU Justification: [Impact] From Nvidia: "We are working on a high performance networking solution with real customers. That solution is being developed using the Ubuntu 22.04 LTS distro release and the distro kernel (lowlatency flavour). This “dma_buf” patchset consists of upstreamed patches that allow buffers to be shared between drivers thus enhancing performance while reducing copying of data. Our team is currently engaged in the development of a high-performance networking solution tailored to meet the demands of real-world customers. This cutting-edge solution is being crafted on the foundation of Ubuntu 22.04 LTS, utilizing the distribution's kernel, specifically the lowlatency flavor. At the heart of our innovation lies the transformative "dma_buf" patchset, comprising a series of patches that have been integrated into the upstream kernel in 5.16 and 5.17. These patches introduce a groundbreaking capability: enabling the seamless sharing of buffers among various drivers. This not only bolsters the solution's performance but also minimizes the need for data copying, effectively enhancing efficiency across the board. The new functionality is isolated such that existing user will not execute these new code paths." Upstream Reference https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/ https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/ - The patch "[PATCH 1/4] net/mlx5: Add IFC bits for mkey ATS" is already in Jammy and was included in: + The patch "[PATCH 1/4] net/mlx5: Add IFC bits for mkey ATS" is already in Jammy and was included in: "2a1e6097e9b9 UBUNTU: SAUCE: RDMA/core: Updated ib_peer_memory" [Test Plan] Testing instructions are outlined in the SF case and has been tested on - local hardware and also by Nvidia. + in house hardware and externally by Nvidia. [Where problems could occur?] This introduces new code paths so regression potential should be low. 
[Other Info] SF#00370664 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/2040526 Title: Backport RDMA DMABUF Status in linux package in Ubuntu: Incomplete Status in linux source package in Jammy: New
[Kernel-packages] [Bug 2040526] Re: Backport RDMA DMABUF
** Description changed: SRU Justification: [Impact] From Nvidia: "We are working on a high performance networking solution with real customers. That solution is being developed using the Ubuntu 22.04 LTS distro release and the distro kernel (lowlatency flavour). This “dma_buf” patchset consists of upstreamed patches that allow buffers to be shared between drivers thus enhancing performance while reducing copying of data. Our team is currently engaged in the development of a high-performance networking solution tailored to meet the demands of real-world customers. This cutting-edge solution is being crafted on the foundation of Ubuntu 22.04 LTS, utilizing the distribution's kernel, specifically the lowlatency flavor. At the heart of our innovation lies the transformative "dma_buf" patchset, comprising a series of patches that have been integrated into the upstream kernel in 5.16 and 5.17. These patches introduce a groundbreaking capability: enabling the seamless sharing of buffers among various drivers. This not only bolsters the solution's performance but also minimizes the need for data copying, effectively enhancing efficiency across the board. The new functionality is isolated such that existing user will not execute these new code paths." Upstream Reference + https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/ https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/ + The patch "[PATCH 1/4] net/mlx5: Add IFC bits for mkey ATS" is already in Jammy and was included in: + "2a1e6097e9b9 UBUNTU: SAUCE: RDMA/core: Updated ib_peer_memory" [Test Plan] Testing instructions are outlined in the SF case and has been tested on local hardware and also by Nvidia. [Where problems could occur?] This introduces new code paths so regression potential should be low. [Other Info] SF#00370664 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. 
https://bugs.launchpad.net/bugs/2040526 Title: Backport RDMA DMABUF Status in linux package in Ubuntu: Incomplete Status in linux source package in Jammy: New
[Kernel-packages] [Bug 2040526] Re: Backport DMABUF functionality
** Description changed: SRU Justification: [Impact] - Backport RDMA DMABUF functionality + Backport RDMA DMABUF - Nvidia is working on a high performance networking solution with real + From Nvidia: + + "We are working on a high performance networking solution with real customers. That solution is being developed using the Ubuntu 22.04 LTS distro release and the distro kernel (lowlatency flavour). This “dma_buf” patchset consists of upstreamed patches that allow buffers to be shared between drivers thus enhancing performance while reducing copying of data. Our team is currently engaged in the development of a high-performance networking solution tailored to meet the demands of real-world customers. This cutting-edge solution is being crafted on the foundation of Ubuntu 22.04 LTS, utilizing the distribution's kernel, specifically the lowlatency flavor. At the heart of our innovation lies the transformative "dma_buf" patchset, comprising a series of patches that have been integrated into the upstream kernel in 5.16 and 5.17. These patches introduce a groundbreaking capability: enabling the seamless sharing of buffers among various drivers. This not only bolsters the solution's performance but also minimizes the need for data copying, effectively enhancing efficiency across the board. The new functionality is isolated such that existing user will not - execute these new code paths. + execute these new code paths." - * First 3 patches adds a new api to the RDMA subsystem that allows drivers to get a pinned dmabuf memory - region without requiring an implementation of the move_notify callback. - + Upstream Reference https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/ - - * The remaining patches add support for DMABUF when creating a devx umem. devx umems - are quite similar to MR's execpt they cannot be revoked, so this uses the - dmabuf pinned memory flow. Several mlx5dv flows require umem and cannot - work with MR. 
- - https://lore.kernel.org/all/0-v1-bd147097458e+ede- - umem_dmabuf_...@nvidia.com/ + https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/ [Test Plan] - SW Configuration: - • Download CUDA 12.2 run file (https://developer.nvidia.com/cuda-downloads?target_os=Linux_arch=x86_64=Ubuntu_version=20.04_type=runfile_local) - • Install using kernel-open i.e. #sh ./cuda_12.2.2_535.104.05_linux.run -m=kernel-open - • Clone perftest from https://github.com/linux-rdma/perftest. - • cd perftest - • export LD_LIBRARY_PATH=/usr/local/cuda-12.2/lib64:$LD_LIBRARY_PATH - • export LIBRARY_PATH=/usr/local/cuda-12.2/lib64:$LIBRARY_PATH - • run: ./autogen.sh ; ./configure CUDA_H_PATH=/usr/local/cuda/include/cuda.h; make - - # Start Server - $ ./ib_write_bw -d mlx5_2 -F --use_cuda=0 --use_cuda_dmabuf - - #Start Client - $ ./ib_write_bw -d mlx5_3 -F --use_cuda=1 --use_cuda_dmabuf localhost + Testing instructions are outlined in the SF case and has been tested on + local hardware and also by Nvidia. [Where problems could occur?] + + This introduces new code paths so regression potential should be low. + + [Other Info] + SF#00370664 ** Description changed: SRU Justification: [Impact] - - Backport RDMA DMABUF From Nvidia: "We are working on a high performance networking solution with real customers. That solution is being developed using the Ubuntu 22.04 LTS distro release and the distro kernel (lowlatency flavour). This “dma_buf” patchset consists of upstreamed patches that allow buffers to be shared between drivers thus enhancing performance while reducing copying of data. Our team is currently engaged in the development of a high-performance networking solution tailored to meet the demands of real-world customers. This cutting-edge solution is being crafted on the foundation of Ubuntu 22.04 LTS, utilizing the distribution's kernel, specifically the lowlatency flavor. 
At the heart of our innovation lies the transformative "dma_buf" patchset, comprising a series of patches that have been integrated into the upstream kernel in 5.16 and 5.17. These patches introduce a groundbreaking capability: enabling the seamless sharing of buffers among various drivers. This not only bolsters the solution's performance but also minimizes the need for data copying, effectively enhancing efficiency across the board. The new functionality is isolated such that existing user will not execute these new code paths." Upstream Reference https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/ https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/ [Test Plan] Testing instructions are outlined in the SF case and has been tested on local hardware and also by Nvidia. [Where problems could occur?] This introduces new code paths so regression potential should be low. [Other
[Kernel-packages] [Bug 2040526] [NEW] Backport DMABUF functionality
Public bug reported: SRU Justification: [Impact] Backport RDMA DMABUF functionality Nvidia is working on a high performance networking solution with real customers. That solution is being developed using the Ubuntu 22.04 LTS distro release and the distro kernel (lowlatency flavour). This “dma_buf” patchset consists of upstreamed patches that allow buffers to be shared between drivers thus enhancing performance while reducing copying of data. Our team is currently engaged in the development of a high-performance networking solution tailored to meet the demands of real-world customers. This cutting-edge solution is being crafted on the foundation of Ubuntu 22.04 LTS, utilizing the distribution's kernel, specifically the lowlatency flavor. At the heart of our innovation lies the transformative "dma_buf" patchset, comprising a series of patches that have been integrated into the upstream kernel in 5.16 and 5.17. These patches introduce a groundbreaking capability: enabling the seamless sharing of buffers among various drivers. This not only bolsters the solution's performance but also minimizes the need for data copying, effectively enhancing efficiency across the board. The new functionality is isolated such that existing user will not execute these new code paths. * The first 3 patches add a new API to the RDMA subsystem that allows drivers to get a pinned dmabuf memory region without requiring an implementation of the move_notify callback. https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/ * The remaining patches add support for DMABUF when creating a devx umem. devx umems are quite similar to MRs except they cannot be revoked, so this uses the dmabuf pinned memory flow. Several mlx5dv flows require umem and cannot work with MR.
https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/

[Test Plan]

SW Configuration:
• Download CUDA 12.2 run file (https://developer.nvidia.com/cuda-downloads?target_os=Linux_arch=x86_64=Ubuntu_version=20.04_type=runfile_local)
• Install using kernel-open i.e. #sh ./cuda_12.2.2_535.104.05_linux.run -m=kernel-open
• Clone perftest from https://github.com/linux-rdma/perftest.
• cd perftest
• export LD_LIBRARY_PATH=/usr/local/cuda-12.2/lib64:$LD_LIBRARY_PATH
• export LIBRARY_PATH=/usr/local/cuda-12.2/lib64:$LIBRARY_PATH
• run: ./autogen.sh ; ./configure CUDA_H_PATH=/usr/local/cuda/include/cuda.h; make

# Start Server
$ ./ib_write_bw -d mlx5_2 -F --use_cuda=0 --use_cuda_dmabuf

#Start Client
$ ./ib_write_bw -d mlx5_3 -F --use_cuda=1 --use_cuda_dmabuf localhost

[Where problems could occur?]

** Affects: linux (Ubuntu) Importance: Undecided Status: Incomplete -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/2040526 Title: Backport DMABUF functionality Status in linux package in Ubuntu: Incomplete
[Kernel-packages] [Bug 2038099] Re: Enable building and signing of the nvidia-fs out-of-tree kernel module.
** Also affects: linux-nvidia-6.2 (Ubuntu Jammy) Importance: Undecided Status: New ** Changed in: linux-nvidia-6.2 (Ubuntu) Status: New => Fix Committed ** Changed in: linux-nvidia-6.2 (Ubuntu Jammy) Status: New => Fix Committed -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-nvidia-6.2 in Ubuntu. https://bugs.launchpad.net/bugs/2038099 Title: Enable building and signing of the nvidia-fs out-of-tree kernel module. Status in linux-nvidia-6.2 package in Ubuntu: Fix Committed Status in linux-nvidia-6.2 source package in Jammy: Fix Committed Bug description: [Issue] The nvidia-fs kernel module is a must-have for Nvidia optimized kernels. There is now a version that is compatible with the Grace processor. Integrate the changes necessary to build and sign this out-of-tree kernel module. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.2/+bug/2038099/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 2033685] Re: Pull-request to address ARM CoreSoght PMU issues
** Changed in: linux-nvidia-6.2 (Ubuntu Jammy) Status: New => Fix Committed -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-nvidia-6.2 in Ubuntu. https://bugs.launchpad.net/bugs/2033685 Title: Pull-request to address ARM CoreSoght PMU issues Status in linux-nvidia-6.2 package in Ubuntu: Fix Committed Status in linux-nvidia-6.2 source package in Jammy: Fix Committed Bug description: [issue] This patch set addresses several CoreSight PMU issues. These are all upstream patches. Commit Summary 2940a5e perf: arm_cspmu: Fix variable dereference warning 06f6951 perf: arm_cspmu: Set irq affinitiy only if overflow interrupt is used 292771d perf/arm_cspmu: Fix event attribute type 6992931 ACPI/APMT: Don't register invalid resource 48f4b92 perf/arm_cspmu: Clean up ACPI dependency 7da1852 perf/arm_cspmu: Decouple APMT dependency d3d56a4 perf: arm_cspmu: Add missing MODULE_DEVICE_TABLE File Changes (4 files) M drivers/acpi/arm64/apmt.c (10) M drivers/perf/arm_cspmu/Kconfig (3) M drivers/perf/arm_cspmu/arm_cspmu.c (95) M drivers/perf/arm_cspmu/arm_cspmu.h (5) To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.2/+bug/2033685/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 2037688] Re: Pull-request to address TPM bypass issue
** Changed in: linux-nvidia-6.2 (Ubuntu Jammy) Status: New => Fix Committed -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-nvidia-6.2 in Ubuntu. https://bugs.launchpad.net/bugs/2037688 Title: Pull-request to address TPM bypass issue Status in linux-nvidia-6.2 package in Ubuntu: Fix Committed Status in linux-nvidia-6.2 source package in Jammy: Fix Committed Bug description: NVIDIA: [Config]: Ensure the TPM is available before IMA initializes Set the following configs: CONFIG_SPI_TEGRA210_QUAD=y CONFIG_TCG_TIS_SPI=y On Grace systems, the IMA driver emits the following log: ima: No TPM chip found, activating TPM-bypass! This occurs because the IMA driver initializes before we are able to detect the TPM. This will always be the case when the drivers required to communicate with the TPM, spi_tegra210_quad and tpm_tis_spi, are built as modules. Having these drivers as built-ins ensures that the TPM is available before the IMA driver initializes. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.2/+bug/2037688/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
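One rough way to confirm the config change described in the message above is to check that both options are built in (`=y`) rather than modular (`=m`) in the kernel config of the build under test. The sketch below is only illustrative: the helper name `check_builtin` is hypothetical, and the config path in the commented example is an assumption (Ubuntu kernels normally install their config under /boot).

```shell
#!/bin/sh
# Hypothetical helper (not part of the bug report): succeed only if every
# named option appears as "<OPTION>=y" in the given kernel config file.
check_builtin() {
    config_file=$1
    shift
    for opt in "$@"; do
        # -x matches the whole line, so "=m" or "# ... is not set" fails.
        if ! grep -qx "${opt}=y" "$config_file"; then
            echo "${opt} is not built in" >&2
            return 1
        fi
    done
    return 0
}

# Example invocation; the path is an assumption, adjust for the kernel under test:
# check_builtin "/boot/config-$(uname -r)" CONFIG_SPI_TEGRA210_QUAD CONFIG_TCG_TIS_SPI
```

If the check passes and the "activating TPM-bypass" line no longer appears in the boot log on a Grace system, that is consistent with the TPM being available before IMA initializes.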
[Kernel-packages] [Bug 2033685] Re: Pull-request to address ARM CoreSoght PMU issues
** Also affects: linux-nvidia-6.2 (Ubuntu Jammy) Importance: Undecided Status: New ** Changed in: linux-nvidia-6.2 (Ubuntu) Status: New => Fix Committed -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-nvidia-6.2 in Ubuntu. https://bugs.launchpad.net/bugs/2033685 Title: Pull-request to address ARM CoreSoght PMU issues Status in linux-nvidia-6.2 package in Ubuntu: Fix Committed Status in linux-nvidia-6.2 source package in Jammy: New Bug description: [issue] This patch set addresses several CoreSight PMU issues. These are all upstream patches. Commit Summary 2940a5e perf: arm_cspmu: Fix variable dereference warning 06f6951 perf: arm_cspmu: Set irq affinitiy only if overflow interrupt is used 292771d perf/arm_cspmu: Fix event attribute type 6992931 ACPI/APMT: Don't register invalid resource 48f4b92 perf/arm_cspmu: Clean up ACPI dependency 7da1852 perf/arm_cspmu: Decouple APMT dependency d3d56a4 perf: arm_cspmu: Add missing MODULE_DEVICE_TABLE File Changes (4 files) M drivers/acpi/arm64/apmt.c (10) M drivers/perf/arm_cspmu/Kconfig (3) M drivers/perf/arm_cspmu/arm_cspmu.c (95) M drivers/perf/arm_cspmu/arm_cspmu.h (5) To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.2/+bug/2033685/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 2037688] Re: Pull-request to address TPM bypass issue
** Also affects: linux-nvidia-6.2 (Ubuntu Jammy) Importance: Undecided Status: New ** Changed in: linux-nvidia-6.2 (Ubuntu) Status: New => Fix Committed -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-nvidia-6.2 in Ubuntu. https://bugs.launchpad.net/bugs/2037688 Title: Pull-request to address TPM bypass issue Status in linux-nvidia-6.2 package in Ubuntu: Fix Committed Status in linux-nvidia-6.2 source package in Jammy: New Bug description: NVIDIA: [Config]: Ensure the TPM is available before IMA initializes Set the following configs: CONFIG_SPI_TEGRA210_QUAD=y CONFIG_TCG_TIS_SPI=y On Grace systems, the IMA driver emits the following log: ima: No TPM chip found, activating TPM-bypass! This occurs because the IMA driver initializes before we are able to detect the TPM. This will always be the case when the drivers required to communicate with the TPM, spi_tegra210_quad and tpm_tis_spi, are built as modules. Having these drivers as built-ins ensures that the TPM is available before the IMA driver initializes. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.2/+bug/2037688/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 2026776] Re: arm64+ast2600: No Output from BMC's VGA port
** Changed in: linux (Ubuntu Jammy) Status: New => Fix Committed -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-hwe-6.2 in Ubuntu. https://bugs.launchpad.net/bugs/2026776 Title: arm64+ast2600: No Output from BMC's VGA port Status in linux package in Ubuntu: Triaged Status in linux-hwe-5.19 package in Ubuntu: Won't Fix Status in linux-hwe-6.2 package in Ubuntu: Fix Committed Status in linux source package in Jammy: Fix Committed Status in linux-hwe-5.19 source package in Jammy: Won't Fix Status in linux-hwe-6.2 source package in Jammy: Fix Committed Status in linux source package in Lunar: Fix Committed Bug description: SRU Justification: [ Impact ] On systems that have the following combination of hardware 1) arm64 CPU 2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/ No output when connecting a display to the BMC's VGA port. [ Fix ] For AST2500+ MMIO should be enabled by default. [ Test Plan ] Test on targeted hardware to make sure BMC is displaying output. [ Where problems could occur ] Not aware of any potential problems, but any should be confined to ASPEED AST2500+ hardware. [ Other Info ] Patch is already in jammy/nvidia-5.19 and jammy/nvidia-6.2 and has been tested with affected BMC. [Issue] On systems that have the following combination of hardware...: 1) arm64 CPU 2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/ .. we see no output when connecting a display to the BMC's VGA port. 
Upon further investigation, we see that applying the following patch fixes this issue: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/ast?h=v6.4-rc6=4327a6137ed43a091d900b1ac833345d60f32228 [Action] Please apply the following two backports to the appropriate Ubuntu HWE kernels: https://github.com/NVIDIA-BaseOS-6/linux-nvidia-5.19/commit/055c9ec3739d7df1179db3ba054b00f3dd684560 https://github.com/NVIDIA-BaseOS-6/linux-nvidia-6.2/commit/8ab3253c6a59eee3424fe0c60b1fc6dc9f2d73b7 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2026776/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 2026776] Re: arm64+ast2600: No Output from BMC's VGA port
** Also affects: linux (Ubuntu Jammy) Importance: Undecided Status: New ** Also affects: linux-hwe-5.19 (Ubuntu Jammy) Importance: Undecided Status: New ** Also affects: linux-hwe-6.2 (Ubuntu Jammy) Importance: Undecided Status: New ** Changed in: linux-hwe-5.19 (Ubuntu Jammy) Status: New => Fix Committed ** Changed in: linux-hwe-6.2 (Ubuntu Jammy) Status: New => Fix Committed ** Changed in: linux-hwe-5.19 (Ubuntu Jammy) Status: Fix Committed => Won't Fix -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-hwe-6.2 in Ubuntu. https://bugs.launchpad.net/bugs/2026776 Title: arm64+ast2600: No Output from BMC's VGA port Status in linux package in Ubuntu: Triaged Status in linux-hwe-5.19 package in Ubuntu: Won't Fix Status in linux-hwe-6.2 package in Ubuntu: Fix Committed Status in linux source package in Jammy: New Status in linux-hwe-5.19 source package in Jammy: Won't Fix Status in linux-hwe-6.2 source package in Jammy: Fix Committed Status in linux source package in Lunar: Fix Committed Bug description: SRU Justification: [ Impact ] On systems that have the following combination of hardware 1) arm64 CPU 2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/ No output when connecting a display to the BMC's VGA port. [ Fix ] For AST2500+ MMIO should be enabled by default. [ Test Plan ] Test on targeted hardware to make sure BMC is displaying output. [ Where problems could occur ] Not aware of any potential problems, but any should be confined to ASPEED AST2500+ hardware. [ Other Info ] Patch is already in jammy/nvidia-5.19 and jammy/nvidia-6.2 and has been tested with affected BMC. [Issue] On systems that have the following combination of hardware...: 1) arm64 CPU 2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/ .. we see no output when connecting a display to the BMC's VGA port. 
Upon further investigation, we see that applying the following patch fixes this issue: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/ast?h=v6.4-rc6=4327a6137ed43a091d900b1ac833345d60f32228 [Action] Please apply the following two backports to the appropriate Ubuntu HWE kernels: https://github.com/NVIDIA-BaseOS-6/linux-nvidia-5.19/commit/055c9ec3739d7df1179db3ba054b00f3dd684560 https://github.com/NVIDIA-BaseOS-6/linux-nvidia-6.2/commit/8ab3253c6a59eee3424fe0c60b1fc6dc9f2d73b7 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2026776/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 2026776] Re: arm64+ast2600: No Output from BMC's VGA port
** Changed in: linux-hwe-6.2 (Ubuntu) Status: New => Incomplete ** Changed in: linux-hwe-6.2 (Ubuntu) Status: Incomplete => Fix Committed -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-hwe-6.2 in Ubuntu. https://bugs.launchpad.net/bugs/2026776 Title: arm64+ast2600: No Output from BMC's VGA port Status in linux package in Ubuntu: Triaged Status in linux-hwe-5.19 package in Ubuntu: Won't Fix Status in linux-hwe-6.2 package in Ubuntu: Fix Committed Status in linux source package in Lunar: Fix Committed Bug description: SRU Justification: [ Impact ] On systems that have the following combination of hardware 1) arm64 CPU 2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/ No output when connecting a display to the BMC's VGA port. [ Fix ] For AST2500+ MMIO should be enabled by default. [ Test Plan ] Test on targeted hardware to make sure BMC is displaying output. [ Where problems could occur ] Not aware of any potential problems, but any should be confined to ASPEED AST2500+ hardware. [ Other Info ] Patch is already in jammy/nvidia-5.19 and jammy/nvidia-6.2 and has been tested with affected BMC. [Issue] On systems that have the following combination of hardware...: 1) arm64 CPU 2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/ .. we see no output when connecting a display to the BMC's VGA port. 
Upon further investigation, we see that applying the following patch fixes this issue: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/ast?h=v6.4-rc6=4327a6137ed43a091d900b1ac833345d60f32228 [Action] Please apply the following two backports to the appropriate Ubuntu HWE kernels: https://github.com/NVIDIA-BaseOS-6/linux-nvidia-5.19/commit/055c9ec3739d7df1179db3ba054b00f3dd684560 https://github.com/NVIDIA-BaseOS-6/linux-nvidia-6.2/commit/8ab3253c6a59eee3424fe0c60b1fc6dc9f2d73b7 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2026776/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1982519] Re: GDS: Add NFS patches to optimized kernel
** Changed in: linux-nvidia-5.19 (Ubuntu Jammy) Status: New => Fix Released -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-nvidia in Ubuntu. https://bugs.launchpad.net/bugs/1982519 Title: GDS: Add NFS patches to optimized kernel Status in linux-nvidia package in Ubuntu: New Status in linux-nvidia-5.19 package in Ubuntu: New Status in linux-nvidia-6.2 package in Ubuntu: New Status in linux-nvidia source package in Jammy: Fix Released Status in linux-nvidia-5.19 source package in Jammy: Fix Released Status in linux-nvidia-6.2 source package in Jammy: Fix Released Bug description: [Impact] Adding these changes will enable GDS functionality in the NFS drivers. [Fix] This is not a fix but a new feature being added to the NFS driver. [Test] Tested the NFS driver on an HPE system as I did not have a setup with BASEOS6. 1) Installed the 5.15.39 kernel on the system (this is the kernel version the optimized kernel is currently based on). 2) Downloaded the optimized kernel. 3) Applied the patches to the optimized kernel. 4) Replaced the NFS modules on the system with the ones built on the optimized kernel. 5) Ran GDS and compat mode tests on an NFS mount with the patched NFS driver. All tests went fine. Attaching the results Compat mode tests == ** API Tests, : 72 / 72 tests passed ** Testsuite : 211 / 211 tests passed done tests:Thu Jul 21 08:27:58 PM UTC 2022 GDS mode tests == ** NVFS IOCTL negative Tests, : 23 / 23 tests passed ** Testsuite : 249 / 249 tests passed End: nvidia-fs: GDS Version: 1.4.0.31 NVFS statistics(ver: 4.0) To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia/+bug/1982519/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1982519] Re: GDS: Add NFS patches to optimized kernel
** Changed in: linux-nvidia-6.2 (Ubuntu Jammy) Status: New => Fix Released -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-nvidia in Ubuntu. https://bugs.launchpad.net/bugs/1982519 Title: GDS: Add NFS patches to optimized kernel Status in linux-nvidia package in Ubuntu: New Status in linux-nvidia-5.19 package in Ubuntu: New Status in linux-nvidia-6.2 package in Ubuntu: New Status in linux-nvidia source package in Jammy: Fix Released Status in linux-nvidia-5.19 source package in Jammy: New Status in linux-nvidia-6.2 source package in Jammy: Fix Released Bug description: [Impact] Adding these changes will enable GDS functionality in the NFS drivers. [Fix] This is not a fix but a new feature being added to the NFS driver. [Test] Tested the NFS driver on an HPE system as I did not have a setup with BASEOS6. 1) Installed the 5.15.39 kernel on the system (this is the kernel version the optimized kernel is currently based on). 2) Downloaded the optimized kernel. 3) Applied the patches to the optimized kernel. 4) Replaced the NFS modules on the system with the ones built on the optimized kernel. 5) Ran GDS and compat mode tests on an NFS mount with the patched NFS driver. All tests went fine. Attaching the results Compat mode tests == ** API Tests, : 72 / 72 tests passed ** Testsuite : 211 / 211 tests passed done tests:Thu Jul 21 08:27:58 PM UTC 2022 GDS mode tests == ** NVFS IOCTL negative Tests, : 23 / 23 tests passed ** Testsuite : 249 / 249 tests passed End: nvidia-fs: GDS Version: 1.4.0.31 NVFS statistics(ver: 4.0) To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia/+bug/1982519/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 2026776] Re: arm64+ast2600: No Output from BMC's VGA port
** Also affects: linux (Ubuntu Lunar) Importance: Undecided Status: New ** Also affects: linux-hwe-5.19 (Ubuntu Lunar) Importance: Undecided Status: New ** Also affects: linux-hwe-6.2 (Ubuntu Lunar) Importance: Undecided Status: New ** No longer affects: linux-hwe-5.19 (Ubuntu Lunar) ** No longer affects: linux-hwe-6.2 (Ubuntu Lunar) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/2026776 Title: arm64+ast2600: No Output from BMC's VGA port Status in linux package in Ubuntu: New Status in linux-hwe-5.19 package in Ubuntu: Won't Fix Status in linux-hwe-6.2 package in Ubuntu: New Status in linux source package in Lunar: New Bug description: SRU Justification: [ Impact ] On systems that have the following combination of hardware 1) arm64 CPU 2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/ No output when connecting a display to the BMC's VGA port. [ Fix ] For AST2500+ MMIO should be enabled by default. [ Test Plan ] Test on targeted hardware to make sure BMC is displaying output. [ Where problems could occur ] Not aware of any potential problems, but any should be confined to ASPEED AST2500+ hardware. [ Other Info ] Patch is already in jammy/nvidia-5.19 and jammy/nvidia-6.2 and has been tested with affected BMC. [Issue] On systems that have the following combination of hardware...: 1) arm64 CPU 2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/ .. we see no output when connecting a display to the BMC's VGA port. 
Upon further investigation, we see that applying the following patch fixes this issue: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/ast?h=v6.4-rc6=4327a6137ed43a091d900b1ac833345d60f32228 [Action] Please apply the following two backports to the appropriate Ubuntu HWE kernels: https://github.com/NVIDIA-BaseOS-6/linux-nvidia-5.19/commit/055c9ec3739d7df1179db3ba054b00f3dd684560 https://github.com/NVIDIA-BaseOS-6/linux-nvidia-6.2/commit/8ab3253c6a59eee3424fe0c60b1fc6dc9f2d73b7 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2026776/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 2026776] Re: arm64+ast2600: No Output from BMC's VGA port
With Kinetic going EOL, there will be no further SRU updates for linux-hwe-5.19 ** Description changed: + SRU Justification: + + [ Impact ] + + On systems that have the following combination of hardware + + 1) arm64 CPU + 2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/ + + No output when connecting a display to the BMC's VGA port. + + [ Fix ] + + For AST2500+ MMIO should be enabled by default. + + [ Test Plan ] + + Test on targeted hardware to make sure BMC is displaying output. + + [ Where problems could occur ] + + Not aware of any potential problems, but any should be confined to + ASPEED AST2500+ hardware. + + [ Other Info ] + + Patch is already in jammy/nvidia-5.19 and jammy/nvidia-6.2 and has been + tested with affected BMC. + + + [Issue] On systems that have the following combination of hardware...: 1) arm64 CPU 2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/ .. we see no output when connecting a display to the BMC's VGA port. Upon further investigation, we see that applying the following patch fixes this issue: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/ast?h=v6.4-rc6=4327a6137ed43a091d900b1ac833345d60f32228 [Action] Please apply the following two backports to the appropriate Ubuntu HWE kernels: https://github.com/NVIDIA-BaseOS-6/linux-nvidia-5.19/commit/055c9ec3739d7df1179db3ba054b00f3dd684560 https://github.com/NVIDIA-BaseOS-6/linux-nvidia-6.2/commit/8ab3253c6a59eee3424fe0c60b1fc6dc9f2d73b7 ** Changed in: linux-hwe-5.19 (Ubuntu) Status: New => Won't Fix -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. 
https://bugs.launchpad.net/bugs/2026776 Title: arm64+ast2600: No Output from BMC's VGA port Status in linux package in Ubuntu: New Status in linux-hwe-5.19 package in Ubuntu: Won't Fix Status in linux-hwe-6.2 package in Ubuntu: New Bug description: SRU Justification: [ Impact ] On systems that have the following combination of hardware 1) arm64 CPU 2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/ No output when connecting a display to the BMC's VGA port. [ Fix ] For AST2500+ MMIO should be enabled by default. [ Test Plan ] Test on targeted hardware to make sure BMC is displaying output. [ Where problems could occur ] Not aware of any potential problems, but any should be confined to ASPEED AST2500+ hardware. [ Other Info ] Patch is already in jammy/nvidia-5.19 and jammy/nvidia-6.2 and has been tested with affected BMC. [Issue] On systems that have the following combination of hardware...: 1) arm64 CPU 2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/ .. we see no output when connecting a display to the BMC's VGA port. Upon further investigation, we see that applying the following patch fixes this issue: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/ast?h=v6.4-rc6=4327a6137ed43a091d900b1ac833345d60f32228 [Action] Please apply the following two backports to the appropriate Ubuntu HWE kernels: https://github.com/NVIDIA-BaseOS-6/linux-nvidia-5.19/commit/055c9ec3739d7df1179db3ba054b00f3dd684560 https://github.com/NVIDIA-BaseOS-6/linux-nvidia-6.2/commit/8ab3253c6a59eee3424fe0c60b1fc6dc9f2d73b7 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2026776/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 2019240] Re: Pull-request to address a number of enablement issues for Orin platforms
Changing Package to linux-nvidia-tegra ** Package changed: linux-nvidia (Ubuntu) => linux-nvidia-tegra (Ubuntu) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-nvidia-tegra in Ubuntu. https://bugs.launchpad.net/bugs/2019240 Title: Pull-request to address a number of enablement issues for Orin platforms Status in linux-nvidia-tegra package in Ubuntu: New Bug description: [impact] This patch set addresses a wide variety of bugs and missing features for NVIDIA Orin platforms. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-tegra/+bug/2019240/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1999082] [NEW] linux-modules-nvidia-510-server fails to install
Public bug reported: $ sudo apt install linux-modules-nvidia-510-server-$(uname -r) Reading package lists... Done Building dependency tree Reading state information... Done Some packages could not be installed. This may mean that you have requested an impossible situation or if you are using the unstable distribution that some required packages have not yet been created or been moved out of Incoming. The following information may help to resolve the situation: The following packages have unmet dependencies: linux-modules-nvidia-510-server-5.4.0-135-generic : Depends: linux-signatures-nvidia-5.4.0-135-generic (= 5.4.0-135.152) but 5.4.0-135.152+1 is to be installed Depends: nvidia-kernel-common-510-server (<= 510.85.02-1) but it is not going to be installed Depends: nvidia-kernel-common-510-server (>= 510.85.02) but it is not going to be installed E: Unable to correct problems, you have held broken packages. $ lsb_release -rd Description:Ubuntu 20.04.5 LTS Release:20.04 ** Affects: nvidia-graphics-drivers-510-server (Ubuntu) Importance: Undecided Status: New -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to nvidia-graphics-drivers-510-server in Ubuntu. https://bugs.launchpad.net/bugs/1999082 Title: linux-modules-nvidia-510-server fails to install Status in nvidia-graphics-drivers-510-server package in Ubuntu: New Bug description: $ sudo apt install linux-modules-nvidia-510-server-$(uname -r) Reading package lists... Done Building dependency tree Reading state information... Done Some packages could not be installed. This may mean that you have requested an impossible situation or if you are using the unstable distribution that some required packages have not yet been created or been moved out of Incoming. 
The following information may help to resolve the situation: The following packages have unmet dependencies: linux-modules-nvidia-510-server-5.4.0-135-generic : Depends: linux-signatures-nvidia-5.4.0-135-generic (= 5.4.0-135.152) but 5.4.0-135.152+1 is to be installed Depends: nvidia-kernel-common-510-server (<= 510.85.02-1) but it is not going to be installed Depends: nvidia-kernel-common-510-server (>= 510.85.02) but it is not going to be installed E: Unable to correct problems, you have held broken packages. $ lsb_release -rd Description: Ubuntu 20.04.5 LTS Release: 20.04 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-510-server/+bug/1999082/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
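The first unmet dependency in the report above is an exact-version pin: the modules package requires linux-signatures-nvidia at `= 5.4.0-135.152`, while apt's candidate is `5.4.0-135.152+1`. As a minimal sketch of why that pin cannot be met, the snippet below asks dpkg to compare the two version strings taken from the apt output. The helper name `satisfies` is hypothetical, and the snippet assumes a Debian-based system where `dpkg` is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of the bug report): check whether a candidate
# version satisfies a dependency relation, using dpkg's version comparison.
satisfies() {
    candidate=$1   # version apt wants to install
    op=$2          # dpkg relation: lt, le, eq, ne, ge or gt
    required=$3    # version named in the Depends field
    dpkg --compare-versions "$candidate" "$op" "$required"
}

# Version strings copied from the apt output in the report.
if satisfies "5.4.0-135.152+1" eq "5.4.0-135.152"; then
    echo "dependency satisfied"
else
    echo "dependency NOT satisfied: exact-version pin does not match"
fi
```

Because the relation is `=`, a candidate with the `+1` suffix does not match even though it sorts later than the pinned version, which fits the "held broken packages" error: the modules and signatures packages have to be published in lockstep.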
[Kernel-packages] [Bug 1975509] Re: Update to the 510.73.08 ERD NVIDIA driver series in Bionic, Focal, Impish, Jammy, and Kinetic
** Changed in: fabric-manager-510 (Ubuntu Bionic) Status: New => Fix Committed ** Changed in: fabric-manager-510 (Ubuntu Focal) Status: New => Fix Committed ** Changed in: fabric-manager-510 (Ubuntu Impish) Status: New => Fix Committed ** Changed in: fabric-manager-510 (Ubuntu Jammy) Status: New => Fix Committed ** Changed in: fabric-manager-510 (Ubuntu Kinetic) Status: New => Fix Committed ** Changed in: libnvidia-nscq-510 (Ubuntu Bionic) Status: New => Fix Committed ** Changed in: libnvidia-nscq-510 (Ubuntu Focal) Status: New => Fix Committed ** Changed in: libnvidia-nscq-510 (Ubuntu Impish) Status: New => Fix Committed ** Changed in: libnvidia-nscq-510 (Ubuntu Jammy) Status: New => Fix Committed ** Changed in: libnvidia-nscq-510 (Ubuntu Kinetic) Status: New => Fix Committed -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-restricted-modules in Ubuntu. https://bugs.launchpad.net/bugs/1975509 Title: Update to the 510.73.08 ERD NVIDIA driver series in Bionic, Focal, Impish, Jammy, and Kinetic Status in fabric-manager-510 package in Ubuntu: Fix Committed Status in libnvidia-nscq-510 package in Ubuntu: Fix Committed Status in linux-restricted-modules package in Ubuntu: Confirmed Status in nvidia-graphics-drivers-510-server package in Ubuntu: Fix Committed Status in fabric-manager-510 source package in Bionic: Fix Committed Status in libnvidia-nscq-510 source package in Bionic: Fix Committed Status in linux-restricted-modules source package in Bionic: Confirmed Status in nvidia-graphics-drivers-510-server source package in Bionic: Fix Released Status in fabric-manager-510 source package in Focal: Fix Committed Status in libnvidia-nscq-510 source package in Focal: Fix Committed Status in linux-restricted-modules source package in Focal: Confirmed Status in nvidia-graphics-drivers-510-server source package in Focal: Fix Released Status in fabric-manager-510 source package in Impish: Fix Committed Status in 
libnvidia-nscq-510 source package in Impish: Fix Committed Status in linux-restricted-modules source package in Impish: Confirmed Status in nvidia-graphics-drivers-510-server source package in Impish: Fix Released Status in fabric-manager-510 source package in Jammy: Fix Committed Status in libnvidia-nscq-510 source package in Jammy: Fix Committed Status in linux-restricted-modules source package in Jammy: Confirmed Status in nvidia-graphics-drivers-510-server source package in Jammy: Fix Released Status in fabric-manager-510 source package in Kinetic: Fix Committed Status in libnvidia-nscq-510 source package in Kinetic: Fix Committed Status in linux-restricted-modules source package in Kinetic: Confirmed Status in nvidia-graphics-drivers-510-server source package in Kinetic: Fix Committed Bug description: [Impact] These releases provide both bug fixes and new features, and we would like to make sure all of our users have access to these improvements. See the changelog entry below for a full list of changes and bugs. [Test Case] The following development and SRU process was followed: https://wiki.ubuntu.com/NVidiaUpdates Certification test suite must pass on a range of hardware: https://git.launchpad.net/plainbox-provider-sru/tree/units/sru.pxu The QA team that executed the tests will be in charge of attaching the artifacts and console output of the appropriate run to the bug. nVidia maintainers team members will not mark ‘verification-done’ until this has happened. [Regression Potential] In order to mitigate the regression potential, the results of the aforementioned system level tests are attached to this bug. [Discussion] [Changelog] === 510 kinetic/jammy/impish/focal/bionic === * New upstream release (LP: #1975509): - When calculating the address of grid barrier allocated for a CUDA stream, there was an off-by-one error. The address calculation is corrected in this release. 
- An issue that caused an AC cycle test to fail with "AssertionError: NVLink links with inappropriate status found" is resolved. - An issue that caused NX 11 to become nonresponsive during a graphics operation is resolved. - Linking issues were observed when using libnvfm.so. Now and other depend tools use dynamic linking with libstdc++ and libgcc. - An intermittent error CUDA_ERROR_NVLINK_UNCORRECTABLE caused by some non-fatal nvlink interrupts is resolved. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/fabric-manager-510/+bug/1975509/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1976425] Re: Release of nvidia-graphics-drivers UDA/ERD 515.48.07 for Bionic, Focal, Impish, Jammy, Kinetic
** Summary changed: - Release of nvidia-graphics-drivers 515.48.07 for Bionic, Focal, Impish, Jammy, Kinetic + Release of nvidia-graphics-drivers UDA/ERD 515.48.07 for Bionic, Focal, Impish, Jammy, Kinetic -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-restricted-modules in Ubuntu. https://bugs.launchpad.net/bugs/1976425 Title: Release of nvidia-graphics-drivers UDA/ERD 515.48.07 for Bionic, Focal, Impish, Jammy, Kinetic Status in linux-restricted-modules package in Ubuntu: New Status in linux-restricted-modules source package in Bionic: New Status in linux-restricted-modules source package in Focal: New Status in linux-restricted-modules source package in Impish: New Status in linux-restricted-modules source package in Jammy: New Status in linux-restricted-modules source package in Kinetic: New Bug description: [Impact] These releases provide both bug fixes and new features, and we would like to make sure all of our users have access to these improvements. See the changelog entry below for a full list of changes and bugs. [Test Case] The following development and SRU process was followed: https://wiki.ubuntu.com/NVidiaUpdates Certification test suite must pass on a range of hardware: https://git.launchpad.net/plainbox-provider-sru/tree/units/sru.pxu The QA team that executed the tests will be in charge of attaching the artifacts and console output of the appropriate run to the bug. Nvidia maintainers team members will not mark ‘verification-done’ until this has happened. [Regression Potential] In order to mitigate the regression potential, the results of the aforementioned system level tests are attached to this bug. [Changelog] RELEASE HIGHLIGHTS Published the source code to a variant of the NVIDIA Linux kernel modules dual-licensed as MIT/GPLv2. The source is available here: https://github.com/NVIDIA/open-gpu-kernel-modules and will be updated each driver release. 
Please see the "Open Linux Kernel Modules" chapter in the README for details. Added support for the VK_EXT_external_memory_dma_buf and VK_EXT_image_drm_format_modifier Vulkan extensions. To use this functionality, the nvidia-drm kernel module must be loaded with DRM KMS mode setting enabled. See the DRM KMS section of the README for guidance on enabling mode setting. Changed nvidia-suspend.service, nvidia-resume.service, and nvidia-hibernate.service to use WantedBy= rather than RequiredBy= dependencies for systemd-suspend.service and systemd-hibernate.service. This avoids a problem where suspend or hibernate fails if the NVIDIA driver is uninstalled without disabling these services first. See https://github.com/systemd/systemd/issues/21991 If these services were manually enabled, it may be necessary to update their dependencies by running sudo systemctl reenable nvidia-suspend.service nvidia-resume.service nvidia-hibernate.service Interlaced modes are now disabled when active stereo is enabled. NVIDIA X Server Settings will now display the quit confirmation dialog automatically only if there are pending changes that need to be manually saved. The corresponding configuration option to control the appearance of the quit dialog was thus also removed. Removed the warning message about mismatches between the compiler used to build the Linux kernel and the compiler used to build the NVIDIA kernel modules from nvidia-installer. Modern compilers are less likely to cause problems when this type of mismatch occurs, and it has become common in many distributions to build the Linux kernel with a different compiler than the default system compiler. Updated nvidia-installer to skip test-loading the kernel modules on systems where no supported NVIDIA GPUs are detected. Updated nvidia-installer to avoid a race condition which could cause the kernel module test load to fail due to udev automatically loading kernel modules left over from an existing NVIDIA driver installation. 
This failure resulted in an installation error message "Kernel module load error: File exists". Updated the RTD3 Video Memory Utilization Threshold (NVreg_DynamicPowerManagementVideoMemoryThreshold) maximum value from 200 MB to 1024 MB. Improved performance of GLX and Vulkan applications running in gamescope. Added a "kernelopen" feature tag to the supported-gpus.json file, to indicate which GPUs are compatible with open-gpu-kernel-modules. Improved Vulkan swapchain creation failure reporting. Applications can use the VK_EXT_debug_utils extension to receive additional information when an error was encountered in vkCreateSwapchainKHR(). Added a new configuration option for NVIDIA NGX to allow disabling the DSO signature check. See the "NGX" chapter of the README for more information. Fixed an issue where HDMI audio output was not working in some cases, especially
[Kernel-packages] [Bug 1976425] [NEW] Release of nvidia-graphics-drivers 515.48.07 for Bionic, Focal, Impish, Jammy, Kinetic
Public bug reported: [Impact] These releases provide both bug fixes and new features, and we would like to make sure all of our users have access to these improvements. See the changelog entry below for a full list of changes and bugs. [Test Case] The following development and SRU process was followed: https://wiki.ubuntu.com/NVidiaUpdates Certification test suite must pass on a range of hardware: https://git.launchpad.net/plainbox-provider-sru/tree/units/sru.pxu The QA team that executed the tests will be in charge of attaching the artifacts and console output of the appropriate run to the bug. Nvidia maintainers team members will not mark ‘verification-done’ until this has happened. [Regression Potential] In order to mitigate the regression potential, the results of the aforementioned system level tests are attached to this bug. [Changelog] RELEASE HIGHLIGHTS Published the source code to a variant of the NVIDIA Linux kernel modules dual-licensed as MIT/GPLv2. The source is available here: https://github.com/NVIDIA/open-gpu-kernel-modules and will be updated each driver release. Please see the "Open Linux Kernel Modules" chapter in the README for details. Added support for the VK_EXT_external_memory_dma_buf and VK_EXT_image_drm_format_modifier Vulkan extensions. To use this functionality, the nvidia-drm kernel module must be loaded with DRM KMS mode setting enabled. See the DRM KMS section of the README for guidance on enabling mode setting. Changed nvidia-suspend.service, nvidia-resume.service, and nvidia-hibernate.service to use WantedBy= rather than RequiredBy= dependencies for systemd-suspend.service and systemd-hibernate.service. This avoids a problem where suspend or hibernate fails if the NVIDIA driver is uninstalled without disabling these services first. 
See https://github.com/systemd/systemd/issues/21991 If these services were manually enabled, it may be necessary to update their dependencies by running sudo systemctl reenable nvidia-suspend.service nvidia-resume.service nvidia-hibernate.service Interlaced modes are now disabled when active stereo is enabled. NVIDIA X Server Settings will now display the quit confirmation dialog automatically only if there are pending changes that need to be manually saved. The corresponding configuration option to control the appearance of the quit dialog was thus also removed. Removed the warning message about mismatches between the compiler used to build the Linux kernel and the compiler used to build the NVIDIA kernel modules from nvidia-installer. Modern compilers are less likely to cause problems when this type of mismatch occurs, and it has become common in many distributions to build the Linux kernel with a different compiler than the default system compiler. Updated nvidia-installer to skip test-loading the kernel modules on systems where no supported NVIDIA GPUs are detected. Updated nvidia-installer to avoid a race condition which could cause the kernel module test load to fail due to udev automatically loading kernel modules left over from an existing NVIDIA driver installation. This failure resulted in an installation error message "Kernel module load error: File exists". Updated the RTD3 Video Memory Utilization Threshold (NVreg_DynamicPowerManagementVideoMemoryThreshold) maximum value from 200 MB to 1024 MB. Improved performance of GLX and Vulkan applications running in gamescope. Added a "kernelopen" feature tag to the supported-gpus.json file, to indicate which GPUs are compatible with open-gpu-kernel-modules. Improved Vulkan swapchain creation failure reporting. Applications can use the VK_EXT_debug_utils extension to receive additional information when an error was encountered in vkCreateSwapchainKHR(). 
Added a new configuration option for NVIDIA NGX to allow disabling the DSO signature check. See the "NGX" chapter of the README for more information. Fixed an issue where HDMI audio output was not working in some cases, especially with high display refresh rates (120Hz, 100Hz, etc.) using Fixed Rate Link (FRL) transmission mode. ** Affects: linux-restricted-modules (Ubuntu) Importance: Undecided Status: New ** Affects: linux-restricted-modules (Ubuntu Bionic) Importance: Undecided Status: New ** Affects: linux-restricted-modules (Ubuntu Focal) Importance: Undecided Status: New ** Affects: linux-restricted-modules (Ubuntu Impish) Importance: Undecided Status: New ** Affects: linux-restricted-modules (Ubuntu Jammy) Importance: Undecided Status: New ** Affects: linux-restricted-modules (Ubuntu Kinetic) Importance: Undecided Status: New ** Description changed: + RELEASE HIGHLIGHTS + Published the source code to a variant of the NVIDIA Linux kernel modules dual-licensed as MIT/GPLv2. The source is available here: https://github.com/NVIDIA/open-gpu-kernel-modules and will be updated each driver release. Please
[Kernel-packages] [Bug 1975509] Re: Update to the 510.73.08 ERD NVIDIA driver series in Bionic, Focal, Impish, Jammy, and Kinetic
** Also affects: fabric-manager-510 (Ubuntu) Importance: Undecided Status: New ** Also affects: libnvidia-nscq-510 (Ubuntu) Importance: Undecided Status: New -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-restricted-modules in Ubuntu. https://bugs.launchpad.net/bugs/1975509 Title: Update to the 510.73.08 ERD NVIDIA driver series in Bionic, Focal, Impish, Jammy, and Kinetic Status in fabric-manager-510 package in Ubuntu: New Status in libnvidia-nscq-510 package in Ubuntu: New Status in linux-restricted-modules package in Ubuntu: Confirmed Status in nvidia-graphics-drivers-510-server package in Ubuntu: Fix Committed Status in fabric-manager-510 source package in Bionic: New Status in libnvidia-nscq-510 source package in Bionic: New Status in linux-restricted-modules source package in Bionic: Confirmed Status in nvidia-graphics-drivers-510-server source package in Bionic: Fix Committed Status in fabric-manager-510 source package in Focal: New Status in libnvidia-nscq-510 source package in Focal: New Status in linux-restricted-modules source package in Focal: Confirmed Status in nvidia-graphics-drivers-510-server source package in Focal: Fix Committed Status in fabric-manager-510 source package in Impish: New Status in libnvidia-nscq-510 source package in Impish: New Status in linux-restricted-modules source package in Impish: Confirmed Status in nvidia-graphics-drivers-510-server source package in Impish: Fix Committed Status in fabric-manager-510 source package in Jammy: New Status in libnvidia-nscq-510 source package in Jammy: New Status in linux-restricted-modules source package in Jammy: Confirmed Status in nvidia-graphics-drivers-510-server source package in Jammy: Fix Committed Status in fabric-manager-510 source package in Kinetic: New Status in libnvidia-nscq-510 source package in Kinetic: New Status in linux-restricted-modules source package in Kinetic: Confirmed Status in 
nvidia-graphics-drivers-510-server source package in Kinetic: Fix Committed Bug description: [Impact] These releases provide both bug fixes and new features, and we would like to make sure all of our users have access to these improvements. See the changelog entry below for a full list of changes and bugs. [Test Case] The following development and SRU process was followed: https://wiki.ubuntu.com/NVidiaUpdates Certification test suite must pass on a range of hardware: https://git.launchpad.net/plainbox-provider-sru/tree/units/sru.pxu The QA team that executed the tests will be in charge of attaching the artifacts and console output of the appropriate run to the bug. nVidia maintainers team members will not mark ‘verification-done’ until this has happened. [Regression Potential] In order to mitigate the regression potential, the results of the aforementioned system level tests are attached to this bug. [Discussion] [Changelog] === 510 kinetic/jammy/impish/focal/bionic === * New upstream release (LP: #1975509): - When calculating the address of grid barrier allocated for a CUDA stream, there was an off-by-one error. The address calculation is corrected in this release. - An issue that caused an AC cycle test to fail with "AssertionError: NVLink links with inappropriate status found" is resolved. - An issue that caused NX 11 to become nonresponsive during a graphics operation is resolved. - Linking issues were observed when using libnvfm.so. Now and other depend tools use dynamic linking with libstdc++ and libgcc. - An intermittent error CUDA_ERROR_NVLINK_UNCORRECTABLE caused by some non-fatal nvlink interrupts is resolved. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/fabric-manager-510/+bug/1975509/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1975509] Re: Update to the 510.73.08 ERD NVIDIA driver series in Bionic, Focal, Impish, Jammy, and Kinetic
** Changed in: nvidia-graphics-drivers-510-server (Ubuntu Bionic) Status: Confirmed => In Progress ** Changed in: nvidia-graphics-drivers-510-server (Ubuntu Focal) Status: Confirmed => In Progress ** Changed in: nvidia-graphics-drivers-510-server (Ubuntu Impish) Status: Confirmed => In Progress -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-restricted-modules in Ubuntu. https://bugs.launchpad.net/bugs/1975509 Title: Update to the 510.73.08 ERD NVIDIA driver series in Bionic, Focal, Impish, Jammy, and Kinetic Status in linux-restricted-modules package in Ubuntu: Confirmed Status in nvidia-graphics-drivers-510-server package in Ubuntu: Confirmed Status in linux-restricted-modules source package in Bionic: Confirmed Status in nvidia-graphics-drivers-510-server source package in Bionic: In Progress Status in linux-restricted-modules source package in Focal: Confirmed Status in nvidia-graphics-drivers-510-server source package in Focal: In Progress Status in linux-restricted-modules source package in Impish: Confirmed Status in nvidia-graphics-drivers-510-server source package in Impish: In Progress Status in linux-restricted-modules source package in Jammy: Confirmed Status in nvidia-graphics-drivers-510-server source package in Jammy: Confirmed Status in linux-restricted-modules source package in Kinetic: Confirmed Status in nvidia-graphics-drivers-510-server source package in Kinetic: Confirmed Bug description: [Impact] These releases provide both bug fixes and new features, and we would like to make sure all of our users have access to these improvements. See the changelog entry below for a full list of changes and bugs. 
[Test Case] The following development and SRU process was followed: https://wiki.ubuntu.com/NVidiaUpdates Certification test suite must pass on a range of hardware: https://git.launchpad.net/plainbox-provider-sru/tree/units/sru.pxu The QA team that executed the tests will be in charge of attaching the artifacts and console output of the appropriate run to the bug. nVidia maintainers team members will not mark ‘verification-done’ until this has happened. [Regression Potential] In order to mitigate the regression potential, the results of the aforementioned system level tests are attached to this bug. [Discussion] [Changelog] === 510 kinetic/jammy/impish/focal/bionic === * New upstream release (LP: #1975509): - When calculating the address of grid barrier allocated for a CUDA stream, there was an off-by-one error. The address calculation is corrected in this release. - An issue that caused an AC cycle test to fail with "AssertionError: NVLink links with inappropriate status found" is resolved. - An issue that caused NX 11 to become nonresponsive during a graphics operation is resolved. - Linking issues were observed when using libnvfm.so. Now and other depend tools use dynamic linking with libstdc++ and libgcc. - An intermittent error CUDA_ERROR_NVLINK_UNCORRECTABLE caused by some non-fatal nvlink interrupts is resolved. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-restricted-modules/+bug/1975509/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1975509] Re: Update to the 510.73.08 ERD NVIDIA driver series in Bionic, Focal, Impish, Jammy, and Kinetic
** Description changed: [Impact] These releases provide both bug fixes and new features, and we would like to make sure all of our users have access to these improvements. See the changelog entry below for a full list of changes and bugs. [Test Case] The following development and SRU process was followed: https://wiki.ubuntu.com/NVidiaUpdates Certification test suite must pass on a range of hardware: https://git.launchpad.net/plainbox-provider-sru/tree/units/sru.pxu The QA team that executed the tests will be in charge of attaching the artifacts and console output of the appropriate run to the bug. nVidia maintainers team members will not mark ‘verification-done’ until this has happened. [Regression Potential] In order to mitigate the regression potential, the results of the aforementioned system level tests are attached to this bug. [Discussion] [Changelog] + === 510 kinetic/jammy/impish/focal/bionic === - When calculating the address of grid barrier allocated for a CUDA - stream, there was an off-by-one error. The address calculation is - corrected in this release. - - An issue that caused an AC cycle test to fail with "AssertionError: NVLink links with inappropriate status found" is resolved. - An issue that caused NX 11 to become nonresponsive during a graphics operation is resolved. - - Linking issues were observed when using libnvfm.so. Now and other depend tools use dynamic linking with libstdc++ and libgcc. - An intermittent error CUDA_ERROR_NVLINK_UNCORRECTABLE caused by some non-fatal nvlink interrupts is resolved. + * New upstream release (LP: #1975509): + - When calculating the address of grid barrier allocated for a CUDA + stream, there was an off-by-one error. The address calculation is + corrected in thisrelease. + - An issue that caused an AC cycle test to fail with "AssertionError: + NVLink links with inappropriate status found" is resolved. + - An issue that caused NX 11 to become nonresponsive during a graphics + operation is resolved. 
+ - Linking issues were observed when using libnvfm.so. Now and other + depend tools use dynamic linking with libstdc++ and libgcc. + - An intermittent error CUDA_ERROR_NVLINK_UNCORRECTABLE caused by some + non-fatal nvlink interrupts is resolved. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-restricted-modules in Ubuntu. https://bugs.launchpad.net/bugs/1975509 Title: Update to the 510.73.08 ERD NVIDIA driver series in Bionic, Focal, Impish, Jammy, and Kinetic Status in linux-restricted-modules package in Ubuntu: Confirmed Status in nvidia-graphics-drivers-510-server package in Ubuntu: Confirmed Status in linux-restricted-modules source package in Bionic: Confirmed Status in nvidia-graphics-drivers-510-server source package in Bionic: Confirmed Status in linux-restricted-modules source package in Focal: Confirmed Status in nvidia-graphics-drivers-510-server source package in Focal: Confirmed Status in linux-restricted-modules source package in Impish: Confirmed Status in nvidia-graphics-drivers-510-server source package in Impish: Confirmed Status in linux-restricted-modules source package in Jammy: Confirmed Status in nvidia-graphics-drivers-510-server source package in Jammy: Confirmed Status in linux-restricted-modules source package in Kinetic: Confirmed Status in nvidia-graphics-drivers-510-server source package in Kinetic: Confirmed Bug description: [Impact] These releases provide both bug fixes and new features, and we would like to make sure all of our users have access to these improvements. See the changelog entry below for a full list of changes and bugs. 
[Test Case] The following development and SRU process was followed: https://wiki.ubuntu.com/NVidiaUpdates Certification test suite must pass on a range of hardware: https://git.launchpad.net/plainbox-provider-sru/tree/units/sru.pxu The QA team that executed the tests will be in charge of attaching the artifacts and console output of the appropriate run to the bug. nVidia maintainers team members will not mark ‘verification-done’ until this has happened. [Regression Potential] In order to mitigate the regression potential, the results of the aforementioned system level tests are attached to this bug. [Discussion] [Changelog] === 510 kinetic/jammy/impish/focal/bionic === * New upstream release (LP: #1975509): - When calculating the address of grid barrier allocated for a CUDA stream, there was an off-by-one error. The address calculation is corrected in this release. - An issue that caused an AC cycle test to fail with "AssertionError: NVLink links with inappropriate status found" is resolved. - An issue that caused NX 11 to become nonresponsive during a graphics
[Kernel-packages] [Bug 1975509] Re: Update to the 510.73.08 ERD NVIDIA driver series in Bionic, Focal, Impish, Jammy, and Kinetic
** Changed in: nvidia-graphics-drivers-510-server (Ubuntu Bionic) Assignee: (unassigned) => Ian May (ian-may) ** Changed in: nvidia-graphics-drivers-510-server (Ubuntu Focal) Assignee: (unassigned) => Ian May (ian-may) ** Changed in: nvidia-graphics-drivers-510-server (Ubuntu Impish) Assignee: (unassigned) => Ian May (ian-may) ** Changed in: nvidia-graphics-drivers-510-server (Ubuntu Jammy) Assignee: (unassigned) => Ian May (ian-may) ** Changed in: nvidia-graphics-drivers-510-server (Ubuntu Kinetic) Assignee: (unassigned) => Ian May (ian-may) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-restricted-modules in Ubuntu. https://bugs.launchpad.net/bugs/1975509 Title: Update to the 510.73.08 ERD NVIDIA driver series in Bionic, Focal, Impish, Jammy, and Kinetic Status in linux-restricted-modules package in Ubuntu: Confirmed Status in nvidia-graphics-drivers-510-server package in Ubuntu: Confirmed Status in linux-restricted-modules source package in Bionic: Confirmed Status in nvidia-graphics-drivers-510-server source package in Bionic: Confirmed Status in linux-restricted-modules source package in Focal: Confirmed Status in nvidia-graphics-drivers-510-server source package in Focal: Confirmed Status in linux-restricted-modules source package in Impish: Confirmed Status in nvidia-graphics-drivers-510-server source package in Impish: Confirmed Status in linux-restricted-modules source package in Jammy: Confirmed Status in nvidia-graphics-drivers-510-server source package in Jammy: Confirmed Status in linux-restricted-modules source package in Kinetic: Confirmed Status in nvidia-graphics-drivers-510-server source package in Kinetic: Confirmed Bug description: [Impact] These releases provide both bug fixes and new features, and we would like to make sure all of our users have access to these improvements. See the changelog entry below for a full list of changes and bugs. 
[Test Case] The following development and SRU process was followed: https://wiki.ubuntu.com/NVidiaUpdates Certification test suite must pass on a range of hardware: https://git.launchpad.net/plainbox-provider-sru/tree/units/sru.pxu The QA team that executed the tests will be in charge of attaching the artifacts and console output of the appropriate run to the bug. nVidia maintainers team members will not mark ‘verification-done’ until this has happened. [Regression Potential] In order to mitigate the regression potential, the results of the aforementioned system level tests are attached to this bug. [Discussion] [Changelog] When calculating the address of grid barrier allocated for a CUDA stream, there was an off-by-one error. The address calculation is corrected in this release. An issue that caused an AC cycle test to fail with "AssertionError: NVLink links with inappropriate status found" is resolved. An issue that caused NX 11 to become nonresponsive during a graphics operation is resolved. Linking issues were observed when using libnvfm.so. Now and other depend tools use dynamic linking with libstdc++ and libgcc. An intermittent error CUDA_ERROR_NVLINK_UNCORRECTABLE caused by some non-fatal nvlink interrupts is resolved. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-restricted-modules/+bug/1975509/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1975509] [NEW] Update to the 510.73.08 ERD NVIDIA driver series in Bionic, Focal, Impish, Jammy, and Kinetic
Public bug reported: [Impact] These releases provide both bug fixes and new features, and we would like to make sure all of our users have access to these improvements. See the changelog entry below for a full list of changes and bugs. [Test Case] The following development and SRU process was followed: https://wiki.ubuntu.com/NVidiaUpdates Certification test suite must pass on a range of hardware: https://git.launchpad.net/plainbox-provider-sru/tree/units/sru.pxu The QA team that executed the tests will be in charge of attaching the artifacts and console output of the appropriate run to the bug. nVidia maintainers team members will not mark ‘verification-done’ until this has happened. [Regression Potential] In order to mitigate the regression potential, the results of the aforementioned system level tests are attached to this bug. [Discussion] [Changelog] When calculating the address of grid barrier allocated for a CUDA stream, there was an off-by-one error. The address calculation is corrected in this release. An issue that caused an AC cycle test to fail with "AssertionError: NVLink links with inappropriate status found" is resolved. An issue that caused NX 11 to become nonresponsive during a graphics operation is resolved. Linking issues were observed when using libnvfm.so. Now and other depend tools use dynamic linking with libstdc++ and libgcc. An intermittent error CUDA_ERROR_NVLINK_UNCORRECTABLE caused by some non-fatal nvlink interrupts is resolved. 
** Affects: linux-restricted-modules (Ubuntu) Importance: Undecided Status: Confirmed ** Affects: nvidia-graphics-drivers-510-server (Ubuntu) Importance: Undecided Status: Confirmed ** Affects: linux-restricted-modules (Ubuntu Bionic) Importance: Undecided Status: Confirmed ** Affects: nvidia-graphics-drivers-510-server (Ubuntu Bionic) Importance: Undecided Status: Confirmed ** Affects: linux-restricted-modules (Ubuntu Focal) Importance: Undecided Status: Confirmed ** Affects: nvidia-graphics-drivers-510-server (Ubuntu Focal) Importance: Undecided Status: Confirmed ** Affects: linux-restricted-modules (Ubuntu Impish) Importance: Undecided Status: Confirmed ** Affects: nvidia-graphics-drivers-510-server (Ubuntu Impish) Importance: Undecided Status: Confirmed ** Affects: linux-restricted-modules (Ubuntu Jammy) Importance: Undecided Status: Confirmed ** Affects: nvidia-graphics-drivers-510-server (Ubuntu Jammy) Importance: Undecided Status: Confirmed ** Affects: linux-restricted-modules (Ubuntu Kinetic) Importance: Undecided Status: Confirmed ** Affects: nvidia-graphics-drivers-510-server (Ubuntu Kinetic) Importance: Undecided Status: Confirmed ** Also affects: nvidia-graphics-drivers-510-server (Ubuntu Focal) Importance: Undecided Status: New ** Also affects: nvidia-graphics-drivers-510-server (Ubuntu Jammy) Importance: Undecided Status: New ** Also affects: nvidia-graphics-drivers-510-server (Ubuntu Impish) Importance: Undecided Status: New ** Also affects: nvidia-graphics-drivers-510-server (Ubuntu Kinetic) Importance: Undecided Status: New ** Also affects: nvidia-graphics-drivers-510-server (Ubuntu Bionic) Importance: Undecided Status: New ** Also affects: linux-restricted-modules (Ubuntu) Importance: Undecided Status: New ** Changed in: linux-restricted-modules (Ubuntu Bionic) Status: New => Confirmed ** Changed in: linux-restricted-modules (Ubuntu Focal) Status: New => Confirmed ** Changed in: linux-restricted-modules (Ubuntu Impish) Status: New => Confirmed ** 
Changed in: linux-restricted-modules (Ubuntu Jammy) Status: New => Confirmed ** Changed in: linux-restricted-modules (Ubuntu Kinetic) Status: New => Confirmed ** Changed in: nvidia-graphics-drivers-510-server (Ubuntu Bionic) Status: New => Confirmed ** Changed in: nvidia-graphics-drivers-510-server (Ubuntu Focal) Status: New => Confirmed ** Changed in: nvidia-graphics-drivers-510-server (Ubuntu Impish) Status: New => Confirmed ** Changed in: nvidia-graphics-drivers-510-server (Ubuntu Jammy) Status: New => Confirmed ** Changed in: nvidia-graphics-drivers-510-server (Ubuntu Kinetic) Status: New => Confirmed ** No longer affects: linux-restricted-modules (Ubuntu Kinetic) ** No longer affects: nvidia-graphics-drivers-510-server (Ubuntu Kinetic) ** Also affects: linux-restricted-modules (Ubuntu Kinetic) Importance: Undecided Status: Confirmed ** Also affects: nvidia-graphics-drivers-510-server (Ubuntu Kinetic) Importance: Undecided Status: Confirmed -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to nvidia-graphics-drivers-510-server in Ubuntu.
[Kernel-packages] [Bug 1970798] Re: 32 GT/s PCI link speeds reporting "Unknown speed" in sysfs
** Description changed: SRU Justification [Impact] NVIDIA Collective Communication Library software uses sysfs to report performance statistics. Users have reported entries showing "Unknown speed" when they should be reporting "32 GT/s". Example: "" PCIe 5.0 supports 32 GT/s and is available in the 5.4 kernel, but the patches for properly reporting speeds in sysfs are missing. The - following upstream patches add the reporting capability. + following upstream v5.7 patches add the reporting capability. - https://lore.kernel.org/linux- - pci/20200229030706.17835-1-helg...@kernel.org/ + PCI ML submission + https://lore.kernel.org/linux-pci/20200229030706.17835-1-helg...@kernel.org/ + + Upstream Patches + https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9cb3985af63555810bb07de50acdf4170771451d + + https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e56faff57f0b39661093c00e0262d4ab9088830e + + https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6348a34dcb98d8e285685a205f2a601817fa2d38 + + https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=757bfaa2c3515803dde9a6728bbf8c8a3c5f098a + [Test Plan] Testing these speeds requires special hardware. A Test kernel with these patches applied was provided to the customer and they confirmed the proper numbers are reported. [Where problems could occur] Changes are for reporting info so chance of problems should be low. If a problem did occur it would be with sysfs or pcie driver misreporting speeds. [Other] SF-00333784 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. 
https://bugs.launchpad.net/bugs/1970798 Title: 32 GT/s PCI link speeds reporting "Unknown speed" in sysfs Status in linux package in Ubuntu: In Progress Status in linux source package in Focal: In Progress Bug description: SRU Justification [Impact] NVIDIA Collective Communication Library software uses sysfs to report performance statistics. Users have reported entries showing "Unknown speed" when they should be reporting "32 GT/s". Example: "" PCIe 5.0 supports 32 GT/s and is available in the 5.4 kernel, but the patches for properly reporting speeds in sysfs are missing. The following upstream v5.7 patches add the reporting capability. PCI ML submission https://lore.kernel.org/linux-pci/20200229030706.17835-1-helg...@kernel.org/ Upstream Patches https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9cb3985af63555810bb07de50acdf4170771451d https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e56faff57f0b39661093c00e0262d4ab9088830e https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6348a34dcb98d8e285685a205f2a601817fa2d38 https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=757bfaa2c3515803dde9a6728bbf8c8a3c5f098a [Test Plan] Testing these speeds requires special hardware. A Test kernel with these patches applied was provided to the customer and they confirmed the proper numbers are reported. [Where problems could occur] Changes are for reporting info so chance of problems should be low. If a problem did occur it would be with sysfs or pcie driver misreporting speeds. [Other] SF-00333784 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1970798/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
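The symptom described under [Impact] can be checked without NCCL by reading the per-device link speed attribute in sysfs directly. The following is a minimal sketch, not part of the SRU: the function name `find_unknown_link_speeds` is hypothetical, and it assumes the standard `current_link_speed` attribute under `/sys/bus/pci/devices` and that an unpatched kernel reports a string beginning with "Unknown", as quoted in the report.

```python
from pathlib import Path


def find_unknown_link_speeds(sysfs_root="/sys/bus/pci/devices"):
    """Return {device name: speed string} for PCI devices whose
    current_link_speed attribute starts with "Unknown".

    sysfs_root is a parameter so the function can be pointed at a
    test directory instead of the live sysfs tree.
    """
    unknown = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return unknown
    for dev in sorted(root.iterdir()):
        attr = dev / "current_link_speed"
        try:
            speed = attr.read_text().strip()
        except OSError:
            # Attribute missing or unreadable (for example, a device
            # without a PCIe link); skip it.
            continue
        if speed.lower().startswith("unknown"):
            unknown[dev.name] = speed
    return unknown


if __name__ == "__main__":
    for name, speed in find_unknown_link_speeds().items():
        print(f"{name}: {speed}")
```

On a kernel carrying the reporting patches, a 32 GT/s link would not be expected to appear in the returned dictionary; confirming that still requires the special hardware mentioned in the test plan.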
[Kernel-packages] [Bug 1970798] Re: 32 GT/s PCI link speeds reporting "Unknown speed" in sysfs
** Description changed: SRU Justification [Impact] NVIDIA Collective Communication Library software uses sysfs to report performance statistics. Users have reported entries showing "Unknown speed" when they should be reporting "32 GT/s". Example: "" PCIe 5.0 supports 32 GT/s and is available in the 5.4 kernel, but the patches for properly reporting speeds in sysfs are missing. The following upstream patches add the reporting capability. https://lore.kernel.org/linux-pci/20200229030706.17835-1-helg...@kernel.org/ [Test Plan] Testing these speeds requires special hardware. A Test kernel with these patches applied was provided to the customer and they confirmed the proper numbers are reported. [Where problems could occur] Changes are for reporting info so chance of problems should be low. If - a problem did occur it would be with sysfs or pcie driver. + a problem did occur it would be with sysfs or pcie driver misreporting + speeds. [Other] SF-00333784 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1970798 Title: 32 GT/s PCI link speeds reporting "Unknown speed" in sysfs Status in linux package in Ubuntu: In Progress Status in linux source package in Focal: In Progress Bug description: SRU Justification [Impact] NVIDIA Collective Communication Library software uses sysfs to report performance statistics. Users have reported entries showing "Unknown speed" when they should be reporting "32 GT/s". Example: "" PCIe 5.0 supports 32 GT/s and is available in the 5.4 kernel, but the patches for properly reporting speeds in sysfs are missing. The following upstream patches add the reporting capability. https://lore.kernel.org/linux-pci/20200229030706.17835-1-helg...@kernel.org/ [Test Plan] Testing these speeds requires special hardware. A Test kernel with these patches applied was provided to the customer and they confirmed the proper numbers are reported.
[Where problems could occur] Changes are for reporting info so chance of problems should be low. If a problem did occur it would be with sysfs or pcie driver misreporting speeds. [Other] SF-00333784
[Kernel-packages] [Bug 1970798] Re: 32 GT/s PCI link speeds reporting "Unknown speed" in sysfs
** Description changed: SRU Justification [Impact] NVIDIA Collective Communication Library software uses sysfs to report performance statistics. Users have reported entries showing "Unknown speed" when they should be reporting "32 GT/s". Example: "" PCIe 5.0 supports 32 GT/s and is available in the 5.4 kernel, but the patches for properly reporting speeds in sysfs are missing. The following upstream patches add the reporting capability. https://lore.kernel.org/linux-pci/20200229030706.17835-1-helg...@kernel.org/ [Test Plan] Testing these speeds requires special hardware. A Test kernel with these patches applied was provided to the customer and they confirmed the proper numbers are reported. [Where problems could occur] Changes are for reporting info so chance of problems should be low. If a problem did occur it would be with sysfs or pcie driver. + + [Other] + SF00333784 ** Description changed: SRU Justification [Impact] NVIDIA Collective Communication Library software uses sysfs to report performance statistics. Users have reported entries showing "Unknown speed" when they should be reporting "32 GT/s". Example: "" PCIe 5.0 supports 32 GT/s and is available in the 5.4 kernel, but the patches for properly reporting speeds in sysfs are missing. The following upstream patches add the reporting capability. https://lore.kernel.org/linux-pci/20200229030706.17835-1-helg...@kernel.org/ [Test Plan] Testing these speeds requires special hardware. A Test kernel with these patches applied was provided to the customer and they confirmed the proper numbers are reported. [Where problems could occur] Changes are for reporting info so chance of problems should be low. If a problem did occur it would be with sysfs or pcie driver. [Other] + SF00333784 ** Description changed: SRU Justification [Impact] NVIDIA Collective Communication Library software uses sysfs to report performance statistics.
Users have reported entries showing "Unknown speed" when they should be reporting "32 GT/s". Example: "" PCIe 5.0 supports 32 GT/s and is available in the 5.4 kernel, but the patches for properly reporting speeds in sysfs are missing. The following upstream patches add the reporting capability. https://lore.kernel.org/linux-pci/20200229030706.17835-1-helg...@kernel.org/ [Test Plan] Testing these speeds requires special hardware. A Test kernel with these patches applied was provided to the customer and they confirmed the proper numbers are reported. [Where problems could occur] Changes are for reporting info so chance of problems should be low. If a problem did occur it would be with sysfs or pcie driver. [Other] - SF00333784 + SF-00333784 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1970798 Title: 32 GT/s PCI link speeds reporting "Unknown speed" in sysfs Status in linux package in Ubuntu: In Progress Status in linux source package in Focal: In Progress Bug description: SRU Justification [Impact] NVIDIA Collective Communication Library software uses sysfs to report performance statistics. Users have reported entries showing "Unknown speed" when they should be reporting "32 GT/s". Example: "" PCIe 5.0 supports 32 GT/s and is available in the 5.4 kernel, but the patches for properly reporting speeds in sysfs are missing. The following upstream patches add the reporting capability. https://lore.kernel.org/linux-pci/20200229030706.17835-1-helg...@kernel.org/ [Test Plan] Testing these speeds requires special hardware. A Test kernel with these patches applied was provided to the customer and they confirmed the proper numbers are reported. [Where problems could occur] Changes are for reporting info so chance of problems should be low. If a problem did occur it would be with sysfs or pcie driver.
[Other] SF-00333784
[Kernel-packages] [Bug 1970798] Re: 32 GT/s PCI link speeds reporting "Unknown speed" in sysfs
** Description changed: - Our NCCL software uses the sysfs to populate the attached topo.xml file. - Several of the entries should report "32 GT/s", but they're saying - "Unknown speed" instead. For instance: + [Impact] - + NVIDIA Collective Communication Library software uses sysfs to report + performance statistics. Users have reported entries showing "Unknown + speed" when they should be reporting "32 GT/s". - The 5.4 kernel is missing the following commit: - https://lore.kernel.org/all/1581937984-40353-2-git-send-email- - yangyic...@hisilicon.com/ + Example: + "" + + PCIe 5.0 which supports 32 GT/s is available in the 5.4 kernel, but the + patches for properly reporting speeds in sysfs are missing. The + following upstream patches add the reporting capability. + + https://lore.kernel.org/linux- + pci/20200229030706.17835-1-helg...@kernel.org/ + + + [Test Plan] + + Testing these speeds requires special hardware. A Test kernel with these + patches applied was provided to the customer and they confirmed the + proper numbers are reported. + + + [Where problems could occur] + + Changes are for reporting info so chance of problems should be low. If + a problem did occur it would be with sysfs or pcie driver. ** Changed in: linux (Ubuntu Focal) Status: Incomplete => In Progress ** Changed in: linux (Ubuntu) Status: Incomplete => In Progress ** Changed in: linux (Ubuntu) Importance: Undecided => High ** Changed in: linux (Ubuntu Focal) Importance: Undecided => High ** Description changed: + SRU Justification + [Impact] NVIDIA Collective Communication Library software uses sysfs to report performance statistics. Users have reported entries showing "Unknown speed" when they should be reporting "32 GT/s". Example: "" PCIe 5.0 which supports 32 GT/s is available in the 5.4 kernel, but the patches for properly reporting speeds in sysfs are missing. The following upstream patches add the reporting capability. 
https://lore.kernel.org/linux-pci/20200229030706.17835-1-helg...@kernel.org/ - [Test Plan] Testing these speeds requires special hardware. A Test kernel with these patches applied was provided to the customer and they confirmed the proper numbers are reported. - [Where problems could occur] Changes are for reporting info so chance of problems should be low. If a problem did occur it would be with sysfs or pcie driver. ** Description changed: SRU Justification [Impact] NVIDIA Collective Communication Library software uses sysfs to report performance statistics. Users have reported entries showing "Unknown speed" when they should be reporting "32 GT/s". Example: "" - PCIe 5.0 which supports 32 GT/s is available in the 5.4 kernel, but the + PCIe 5.0 supports 32 GT/s and is available in the 5.4 kernel, but the patches for properly reporting speeds in sysfs are missing. The following upstream patches add the reporting capability. https://lore.kernel.org/linux-pci/20200229030706.17835-1-helg...@kernel.org/ [Test Plan] Testing these speeds requires special hardware. A Test kernel with these patches applied was provided to the customer and they confirmed the proper numbers are reported. [Where problems could occur] Changes are for reporting info so chance of problems should be low. If a problem did occur it would be with sysfs or pcie driver. ** Changed in: linux (Ubuntu) Importance: High => Medium ** Changed in: linux (Ubuntu Focal) Importance: High => Medium -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1970798 Title: 32 GT/s PCI link speeds reporting "Unknown speed" in sysfs Status in linux package in Ubuntu: In Progress Status in linux source package in Focal: In Progress Bug description: SRU Justification [Impact] NVIDIA Collective Communication Library software uses sysfs to report performance statistics.
Users have reported entries showing "Unknown speed" when they should be reporting "32 GT/s". Example: "" PCIe 5.0 supports 32 GT/s and is available in the 5.4 kernel, but the patches for properly reporting speeds in sysfs are missing. The following upstream patches add the reporting capability. https://lore.kernel.org/linux-pci/20200229030706.17835-1-helg...@kernel.org/ [Test Plan] Testing these speeds requires special hardware. A Test kernel with these patches applied was provided to the customer and they confirmed the proper numbers are reported. [Where problems could occur] Changes are for reporting info so chance of problems should be low. If a problem did occur it would be with sysfs or pcie driver. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1970798/+subscriptions -- Mailing list:
[Kernel-packages] [Bug 1970798] [NEW] 32 GT/s PCI link speeds reporting "Unknown speed" in sysfs
Public bug reported: Our NCCL software uses the sysfs to populate the attached topo.xml file. Several of the entries should report "32 GT/s", but they're saying "Unknown speed" instead. For instance: The 5.4 kernel is missing the following commit: https://lore.kernel.org/all/1581937984-40353-2-git-send-email-yangyic...@hisilicon.com/ ** Affects: linux (Ubuntu) Importance: Undecided Assignee: Ian May (ian-may) Status: New ** Affects: linux (Ubuntu Focal) Importance: Undecided Assignee: Ian May (ian-may) Status: New ** Also affects: linux (Ubuntu Focal) Importance: Undecided Status: New ** Changed in: linux (Ubuntu) Assignee: (unassigned) => Ian May (ian-may) ** Changed in: linux (Ubuntu Focal) Assignee: (unassigned) => Ian May (ian-may) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1970798 Title: 32 GT/s PCI link speeds reporting "Unknown speed" in sysfs Status in linux package in Ubuntu: New Status in linux source package in Focal: New Bug description: Our NCCL software uses the sysfs to populate the attached topo.xml file. Several of the entries should report "32 GT/s", but they're saying "Unknown speed" instead. For instance: The 5.4 kernel is missing the following commit: https://lore.kernel.org/all/1581937984-40353-2-git-send-email-yangyic...@hisilicon.com/ To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1970798/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1970451] Re: Update to the 510.68.02 UDA NVIDIA driver series in Bionic, Focal, Impish, and Jammy
** Changed in: nvidia-graphics-drivers-510 (Ubuntu Bionic) Status: Confirmed => Fix Committed ** Changed in: nvidia-graphics-drivers-510 (Ubuntu Focal) Status: Confirmed => Fix Committed ** Changed in: nvidia-graphics-drivers-510 (Ubuntu Impish) Status: Confirmed => Fix Committed -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-restricted-modules in Ubuntu. https://bugs.launchpad.net/bugs/1970451 Title: Update to the 510.68.02 UDA NVIDIA driver series in Bionic, Focal, Impish, and Jammy Status in linux-restricted-modules package in Ubuntu: Confirmed Status in nvidia-graphics-drivers-510 package in Ubuntu: Confirmed Status in linux-restricted-modules source package in Bionic: Confirmed Status in nvidia-graphics-drivers-510 source package in Bionic: Fix Committed Status in linux-restricted-modules source package in Focal: Confirmed Status in nvidia-graphics-drivers-510 source package in Focal: Fix Committed Status in linux-restricted-modules source package in Impish: Confirmed Status in nvidia-graphics-drivers-510 source package in Impish: Fix Committed Status in linux-restricted-modules source package in Jammy: Confirmed Status in nvidia-graphics-drivers-510 source package in Jammy: Confirmed Bug description: [Impact] These releases provide both bug fixes and new features, and we would like to make sure all of our users have access to these improvements. See the changelog entry below for a full list of changes and bugs. [Test Case] The following development and SRU process was followed: https://wiki.ubuntu.com/NVidiaUpdates Certification test suite must pass on a range of hardware: https://git.launchpad.net/plainbox-provider-sru/tree/units/sru.pxu The QA team that executed the tests will be in charge of attaching the artifacts and console output of the appropriate run to the bug. nVidia maintainers team members will not mark ‘verification-done’ until this has happened. 
[Regression Potential] In order to mitigate the regression potential, the results of the aforementioned system level tests are attached to this bug. [Discussion] [Changelog] === 510 jammy/impish/focal/bionic === * New upstream release (LP: #1970451): - Fixed an issue where NvFBC was requesting Vulkan 1.0 while using Vulkan 1.1 core features. This caused NvFBC to fail to initialize with Vulkan loader versions 1.3.204 or newer. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-restricted-modules/+bug/1970451/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1970451] Re: Update to the 510.68.02 UDA NVIDIA driver series in Bionic, Focal, Impish, and Jammy
** Description changed: [Impact] These releases provide both bug fixes and new features, and we would like to make sure all of our users have access to these improvements. See the changelog entry below for a full list of changes and bugs. [Test Case] The following development and SRU process was followed: https://wiki.ubuntu.com/NVidiaUpdates Certification test suite must pass on a range of hardware: https://git.launchpad.net/plainbox-provider-sru/tree/units/sru.pxu The QA team that executed the tests will be in charge of attaching the artifacts and console output of the appropriate run to the bug. nVidia maintainers team members will not mark ‘verification-done’ until this has happened. [Regression Potential] In order to mitigate the regression potential, the results of the aforementioned system level tests are attached to this bug. [Discussion] + + [Changelog] + + === 510 jammy/impish/focal/bionic === + + * New upstream release (LP: #1970451): + - Fixed an issue where NvFBC was requesting Vulkan 1.0 while using + Vulkan 1.1 core features. This caused NvFBC to fail to initialize + with Vulkan loader versions 1.3.204 or newer. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-restricted-modules in Ubuntu. 
https://bugs.launchpad.net/bugs/1970451 Title: Update to the 510.68.02 UDA NVIDIA driver series in Bionic, Focal, Impish, and Jammy Status in linux-restricted-modules package in Ubuntu: New Status in nvidia-graphics-drivers-510 package in Ubuntu: New Status in linux-restricted-modules source package in Bionic: New Status in nvidia-graphics-drivers-510 source package in Bionic: New Status in linux-restricted-modules source package in Focal: New Status in nvidia-graphics-drivers-510 source package in Focal: New Status in linux-restricted-modules source package in Impish: New Status in nvidia-graphics-drivers-510 source package in Impish: New Status in linux-restricted-modules source package in Jammy: New Status in nvidia-graphics-drivers-510 source package in Jammy: New Bug description: [Impact] These releases provide both bug fixes and new features, and we would like to make sure all of our users have access to these improvements. See the changelog entry below for a full list of changes and bugs. [Test Case] The following development and SRU process was followed: https://wiki.ubuntu.com/NVidiaUpdates Certification test suite must pass on a range of hardware: https://git.launchpad.net/plainbox-provider-sru/tree/units/sru.pxu The QA team that executed the tests will be in charge of attaching the artifacts and console output of the appropriate run to the bug. nVidia maintainers team members will not mark ‘verification-done’ until this has happened. [Regression Potential] In order to mitigate the regression potential, the results of the aforementioned system level tests are attached to this bug. [Discussion] [Changelog] === 510 jammy/impish/focal/bionic === * New upstream release (LP: #1970451): - Fixed an issue where NvFBC was requesting Vulkan 1.0 while using Vulkan 1.1 core features. This caused NvFBC to fail to initialize with Vulkan loader versions 1.3.204 or newer. 
[Kernel-packages] [Bug 1970451] Re: Update to the 510.68.02 UDA NVIDIA driver series in Bionic, Focal, Impish, and Jammy
** Also affects: linux-restricted-modules (Ubuntu Impish) Importance: Undecided Status: New ** Also affects: nvidia-graphics-drivers-510 (Ubuntu Impish) Importance: Undecided Status: New ** Also affects: linux-restricted-modules (Ubuntu Focal) Importance: Undecided Status: New ** Also affects: nvidia-graphics-drivers-510 (Ubuntu Focal) Importance: Undecided Status: New ** Also affects: linux-restricted-modules (Ubuntu Jammy) Importance: Undecided Status: New ** Also affects: nvidia-graphics-drivers-510 (Ubuntu Jammy) Importance: Undecided Status: New ** Also affects: linux-restricted-modules (Ubuntu Bionic) Importance: Undecided Status: New ** Also affects: nvidia-graphics-drivers-510 (Ubuntu Bionic) Importance: Undecided Status: New -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-restricted-modules in Ubuntu. https://bugs.launchpad.net/bugs/1970451 Title: Update to the 510.68.02 UDA NVIDIA driver series in Bionic, Focal, Impish, and Jammy Status in linux-restricted-modules package in Ubuntu: New Status in nvidia-graphics-drivers-510 package in Ubuntu: New Status in linux-restricted-modules source package in Bionic: New Status in nvidia-graphics-drivers-510 source package in Bionic: New Status in linux-restricted-modules source package in Focal: New Status in nvidia-graphics-drivers-510 source package in Focal: New Status in linux-restricted-modules source package in Impish: New Status in nvidia-graphics-drivers-510 source package in Impish: New Status in linux-restricted-modules source package in Jammy: New Status in nvidia-graphics-drivers-510 source package in Jammy: New Bug description: [Impact] These releases provide both bug fixes and new features, and we would like to make sure all of our users have access to these improvements. See the changelog entry below for a full list of changes and bugs. 
[Test Case] The following development and SRU process was followed: https://wiki.ubuntu.com/NVidiaUpdates Certification test suite must pass on a range of hardware: https://git.launchpad.net/plainbox-provider-sru/tree/units/sru.pxu The QA team that executed the tests will be in charge of attaching the artifacts and console output of the appropriate run to the bug. nVidia maintainers team members will not mark ‘verification-done’ until this has happened. [Regression Potential] In order to mitigate the regression potential, the results of the aforementioned system level tests are attached to this bug. [Discussion]
[Kernel-packages] [Bug 1970451] [NEW] Update to the 510.68.02 UDA NVIDIA driver series in Bionic, Focal, Impish, and Jammy
Public bug reported: [Impact] These releases provide both bug fixes and new features, and we would like to make sure all of our users have access to these improvements. See the changelog entry below for a full list of changes and bugs. [Test Case] The following development and SRU process was followed: https://wiki.ubuntu.com/NVidiaUpdates Certification test suite must pass on a range of hardware: https://git.launchpad.net/plainbox-provider-sru/tree/units/sru.pxu The QA team that executed the tests will be in charge of attaching the artifacts and console output of the appropriate run to the bug. nVidia maintainers team members will not mark ‘verification-done’ until this has happened. [Regression Potential] In order to mitigate the regression potential, the results of the aforementioned system level tests are attached to this bug. [Discussion] ** Affects: linux-restricted-modules (Ubuntu) Importance: Undecided Assignee: Ian May (ian-may) Status: New ** Affects: nvidia-graphics-drivers-510 (Ubuntu) Importance: Undecided Assignee: Ian May (ian-may) Status: New ** Package changed: ubuntu => linux-restricted-modules (Ubuntu) ** Changed in: linux-restricted-modules (Ubuntu) Assignee: (unassigned) => Ian May (ian-may) ** Also affects: nvidia-graphics-drivers-510 (Ubuntu) Importance: Undecided Status: New ** Changed in: nvidia-graphics-drivers-510 (Ubuntu) Assignee: (unassigned) => Ian May (ian-may) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-restricted-modules in Ubuntu. https://bugs.launchpad.net/bugs/1970451 Title: Update to the 510.68.02 UDA NVIDIA driver series in Bionic, Focal, Impish, and Jammy Status in linux-restricted-modules package in Ubuntu: New Status in nvidia-graphics-drivers-510 package in Ubuntu: New Bug description: [Impact] These releases provide both bug fixes and new features, and we would like to make sure all of our users have access to these improvements. 
See the changelog entry below for a full list of changes and bugs. [Test Case] The following development and SRU process was followed: https://wiki.ubuntu.com/NVidiaUpdates Certification test suite must pass on a range of hardware: https://git.launchpad.net/plainbox-provider-sru/tree/units/sru.pxu The QA team that executed the tests will be in charge of attaching the artifacts and console output of the appropriate run to the bug. nVidia maintainers team members will not mark ‘verification-done’ until this has happened. [Regression Potential] In order to mitigate the regression potential, the results of the aforementioned system level tests are attached to this bug. [Discussion]
[Kernel-packages] [Bug 1959216] Re: linux-azure: CONFIG_FB_EFI=y
wget https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+files/linux-buildinfo-5.13.0-1014-azure_5.13.0-1014.16_amd64.deb dpkg -x linux-buildinfo-5.13.0-1014-azure_5.13.0-1014.16_amd64.deb . grep CONFIG_FB_EFI ./usr/lib/linux/5.13.0-1014-azure/config CONFIG_FB_EFI=y ** Tags removed: verification-needed-impish ** Tags added: verification-done-impish -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-azure-5.13 in Ubuntu. https://bugs.launchpad.net/bugs/1959216 Title: linux-azure: CONFIG_FB_EFI=y Status in linux-azure package in Ubuntu: Fix Committed Status in linux-azure-4.15 package in Ubuntu: Invalid Status in linux-azure-5.11 package in Ubuntu: Invalid Status in linux-azure-5.13 package in Ubuntu: Invalid Status in linux-azure source package in Bionic: Invalid Status in linux-azure-4.15 source package in Bionic: Fix Committed Status in linux-azure-5.11 source package in Bionic: Invalid Status in linux-azure-5.13 source package in Bionic: Invalid Status in linux-azure source package in Focal: Fix Committed Status in linux-azure-4.15 source package in Focal: Invalid Status in linux-azure-5.11 source package in Focal: Fix Committed Status in linux-azure-5.13 source package in Focal: Fix Committed Status in linux-azure source package in Impish: Fix Committed Status in linux-azure-4.15 source package in Impish: Invalid Status in linux-azure-5.11 source package in Impish: Invalid Status in linux-azure-5.13 source package in Impish: Invalid Status in linux-azure source package in Jammy: Fix Committed Status in linux-azure-4.15 source package in Jammy: Invalid Status in linux-azure-5.11 source package in Jammy: Invalid Status in linux-azure-5.13 source package in Jammy: Invalid Bug description: SRU Justification [Impact] Secure boot instances of linux-azure require an EFI framebuffer in some cases in order for the VM to boot. 
The issue was noticed in Ubuntu 18.04 linux-azure kernel, but actually exists in the latest mainline kernel. The issue happens when the below conditions are met: hyperv_pci is built into the kernel and hyperv_fb is not, i.e., this means hyperv_pci loads before hyperv_fb loads. CONFIG_FB_EFI is not defined, i.e., the efifb driver is not used. Here is how the bug happens: Linux VM starts, and vmbus_reserve_fb() reserves the VRAM [base=0xf800, length=8MB]. hyperv-pci loads, gets MMIO [base=0xf880, length=8KB] as the bridge config window, and may get some other 64-bit MMIO ranges, and some 32-bit MMIO ranges (if needed). hyperv-fb loads, and gets MMIO [base=0xf800, length=8MB or a different length], and sets screen_info.lfb_base = 0. VM panics. The kdump kernel starts to run, and vmbus_reserve_fb() is not reserving [base=0xf800, length=8MB] due to the lfb_base==0. hyperv-pci loads and gets [base=0xf800, length=8KB] and the host PCI VSP driver rejects this address as the bridge config window. The crux of the problem is that Linux vmbus driver itself is unable to detect the VRAM base/length (it looks like a video BIOS call is needed to get this info and such a BIOS call is inappropriate or impossible in hv_vmbus) and has to rely on screen_info.lfb_base (which is set by grub or the kdump/kexec tool and can be reset to zero by hyperv_fb/drm). Solution: Enable CONFIG_FB_EFI=y [Test Case] Microsoft tested. This config is also enabled on the master branch. [Where things could go wrong] VMs on certain instance types could fail to boot. [Other Info] SF: #00327005 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1959216/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
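The verification at the top of this message (extract the linux-buildinfo package, then grep its config for CONFIG_FB_EFI) can also be scripted. The following is a rough sketch only; the helper name `config_value` is hypothetical, and it assumes the usual kernel config file syntax of `OPTION=value` lines and `# OPTION is not set` comments.

```python
def config_value(config_text, option):
    """Return the value assigned to a kernel config option ("y", "m", ...),
    or None if the option is absent or explicitly not set."""
    for line in config_text.splitlines():
        line = line.strip()
        # Match "OPTION=" exactly so CONFIG_FB_EFI does not match
        # a longer name that merely shares the prefix.
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
        if line == f"# {option} is not set":
            return None
    return None


if __name__ == "__main__":
    # Path taken from the dpkg -x output shown in the message above.
    path = "./usr/lib/linux/5.13.0-1014-azure/config"
    with open(path) as f:
        print("CONFIG_FB_EFI =", config_value(f.read(), "CONFIG_FB_EFI"))
```

Matching on the full `OPTION=` prefix is slightly stricter than the plain grep, which would also print any option whose name merely contains CONFIG_FB_EFI.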
[Kernel-packages] [Bug 1960871] Re: linux-modules-extra-* fails to install due to dependency on unsigned package
Fix sent to the ML and has been applied:
https://lists.ubuntu.com/archives/kernel-team/2022-February/128100.html

** Changed in: linux-aws (Ubuntu)
Status: New => Fix Committed

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-aws in Ubuntu.
https://bugs.launchpad.net/bugs/1960871

Title: linux-modules-extra-* fails to install due to dependency on unsigned package

Status in linux-aws package in Ubuntu: Fix Committed

Bug description:
Several SRU tests are failing the test setup due to failure to install the modules-extra package:

* Command: yes "" | DEBIAN_FRONTEND=noninteractive apt-get install --yes --force-yes automake bison build-essential byacc flex git keyutils libacl1-dev libaio-dev libcap-dev libmm-dev libnuma-dev libsctp-dev libselinux1-dev libssl-dev libtirpc-dev pkg-config quota xfslibs-dev xfsprogs gcc linux-modules-extra-4.15.0-1120-aws
Exit status: 100
Duration: 0.908210039139

stdout:
Reading package lists...
Building dependency tree...
Reading state information...
xfsprogs is already the newest version (4.9.0+nmu1ubuntu2).
xfsprogs set to manually installed.
git is already the newest version (1:2.17.1-1ubuntu0.9).
git set to manually installed.
Some packages could not be installed. This may mean that you have requested an impossible situation or if you are using the unstable distribution that some required packages have not yet been created or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
linux-modules-extra-4.15.0-1120-aws : Depends: linux-image-unsigned-4.15.0-1120-aws but it is not going to be installed

stderr:
W: --force-yes is deprecated, use one of the options starting with --allow instead.
E: Unable to correct problems, you have held broken packages.
To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-aws/+bug/1960871/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1871015] Re: test_vxlan_under_vrf.sh in net from ubuntu_kernel_selftests failed with H (Check VM connectivity through VXLAN (underlay in the default VRF) [FAIL])
Found also on 2022.01.03/impish/linux-aws: 5.13.0-1012.13 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1871015 Title: test_vxlan_under_vrf.sh in net from ubuntu_kernel_selftests failed with H (Check VM connectivity through VXLAN (underlay in the default VRF) [FAIL]) Status in ubuntu-kernel-tests: New Status in linux package in Ubuntu: Incomplete Status in linux source package in Hirsute: Confirmed Bug description: Issue found with GCP 5.3.0-1017.18~18.04.1 # selftests: net: test_vxlan_under_vrf.sh # Checking HV connectivity [ OK ] # Check VM connectivity through VXLAN (underlay in the default VRF) [FAIL] not ok 25 selftests: net: test_vxlan_under_vrf.sh # exit=1 The failure is different from bug 1837348 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1871015/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1923104] Re: Include Infiniband Peer Memory interface
Tested on Focal 5.4.0-97.110, confirmed inbox peer memory interface is working. ** Tags removed: verification-needed-focal ** Tags added: verification-done-focal -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1923104 Title: Include Infiniband Peer Memory interface Status in linux package in Ubuntu: Fix Released Status in linux source package in Focal: Fix Committed Bug description: The peer_memory_client scheme allows a driver to register with the ib_umem system that it has the ability to understand user virtual address ranges that are not compatible with get_user_pages(). For instance VMAs created with io_remap_pfn_range(), or other driver special VMA. For ranges the interface understands it can provide a DMA mapped sg_table for use by the ib_umem, allowing user virtual ranges that cannot be supported by get_user_pages() to be used as umems for RDMA. This is designed to preserve the kABI, no functions or structures are changed, only new symbols are added: ib_register_peer_memory_client ib_unregister_peer_memory_client ib_umem_activate_invalidation_notifier ib_umem_get_peer And a bitfield in struct ib_umem uses more bits. This interface is compatible with the two out of tree GPU drivers: https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/blob/master/drivers/gpu/drm/amd/amdkfd/kfd_peerdirect.c https://github.com/Mellanox/nv_peer_memory/blob/master/nv_peer_mem.c To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1923104/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1958534] Re: building of linux-signed package failing on arm64
Patches have been applied and bionic/linux-signed-aws now builds successfully -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-signed-aws in Ubuntu. https://bugs.launchpad.net/bugs/1958534 Title: building of linux-signed package failing on arm64 Status in linux-signed-aws package in Ubuntu: Fix Committed Status in linux-signed-aws source package in Bionic: Fix Committed Bug description: dpkg-buildpackage - dpkg-buildpackage: info: source package linux-signed-aws dpkg-buildpackage: info: source version 4.15.0-1119.126 dpkg-buildpackage: info: source distribution bionic dpkg-source --before-build linux-signed-aws-4.15.0 dpkg-buildpackage: info: host architecture arm64 dpkg-source: info: using options from linux-signed-aws-4.15.0/debian/source/options: --diff-ignore --tar-ignore fakeroot debian/rules clean sed debian/control \ -e "s/@ABI@/4.15.0-1119/g" \ -e "s/@UNSIGNED_SRC_PACKAGE@/linux-aws/g" \ -e "s/@UNSIGNED_SRC_VERSION@/4.15.0-1119.126/g" \ -e 's/@SRCPKGNAME@/linux-signed-aws/g' \ -e 's/@HEADERS_COMMON@/linux-aws-headers-4.15.0-1119/g' \ -e 's/@HEADERS_ARCH@/linux-headers-4.15.0-1119-aws/g' rm -rf ./4.15.0-1119.126 UNSIGNED SIGNED rm -f debian/linux-image-*.install\ debian/linux-image-*.preinst\ debian/linux-image-*.prerm \ debian/linux-image-*.postinst \ debian/linux-image-*.postrm rm -f debian/kernel-signed-image-*.install dh clean dh_clean debian/rules build-arch dh build-arch dh_update_autotools_config -a debian/rules override_dh_auto_build make[1]: Entering directory '/<>' ./download-signed "linux-headers-4.15.0-1119-aws" "4.15.0-1119.126" "linux-aws" Downloading http://ppa.launchpad.net/canonical-kernel-team/ppa/ubuntu/dists/bionic/main/signed/linux-aws-arm64/4.15.0-1119.126/SHA256SUMS ... found Downloading http://ppa.launchpad.net/canonical-kernel-team/ppa/ubuntu/dists/bionic/main/signed/linux-aws-arm64/4.15.0-1119.126/signed.tar.gz ... found Extracting 4.15.0-1119.126 ... 
Extracting 4.15.0-1119.126/control ... Extracting 4.15.0-1119.126/control/options ... mkdir SIGNED ( \ cd "4.15.0-1119.126" || exit 1; \ for s in *.efi.signed; do \ [ ! -f "$s" ] && continue; \ base=$(echo "$s" | sed -e 's/.efi.signed//'); \ ( \ vars="${base}.efi.vars";\ [ -f "$vars" ] && . "./$vars"; \ if [ "$GZIP" = "1" ]; then \ gzip -9 "$s"; \ mv "${s}.gz" "$s"; \ fi; \ ); \ chmod 600 "$s"; \ ln "$s" "../SIGNED/$base"; \ done; \ for s in *.opal.sig; do \ [ ! -f "$s" ] && continue; \ chmod 600 "$s"; \ base=$(echo "$s" | sed -e 's/.opal.sig//'); \ cat "$base.opal" "$s" >"../SIGNED/$base"; \ done; \ for s in *.sipl.sig; do \ [ ! -f "$s" ] && continue; \ base=$(echo "$s" | sed -e 's/.sipl.sig//'); \ cat "$base.sipl" "$s" >"../SIGNED/$base"; \ chmod 600 "../SIGNED/$base";\ done\ ) make[1]: Leaving directory '/<>' fakeroot debian/rules binary-arch dh binary-arch dh_testroot -a dh_prep -a debian/rules override_dh_auto_install make[1]: Entering directory '/<>' for signed in "SIGNED"/*; do \ flavour=$(echo "$signed" | sed -e "s@.*-4.15.0-1119-@@"); \ instfile=$(echo "$signed" | sed -e "s@[^/]*/@@" -e "s@-4.15.0-1119-.*@@");
[Kernel-packages] [Bug 1958534] Re: building of linux-signed package failing on arm64
Patches sent to the ML https://lists.ubuntu.com/archives/kernel-team/2022-January/127253.html https://lists.ubuntu.com/archives/kernel-team/2022-January/127251.html ** Changed in: linux-signed-aws (Ubuntu) Assignee: (unassigned) => Ian May (ian-may) ** Changed in: linux-signed-aws (Ubuntu Bionic) Assignee: (unassigned) => Ian May (ian-may) ** Changed in: linux-signed-aws (Ubuntu) Status: New => Fix Committed ** Changed in: linux-signed-aws (Ubuntu Bionic) Status: New => Fix Committed -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-signed-aws in Ubuntu. https://bugs.launchpad.net/bugs/1958534 Title: building of linux-signed package failing on arm64 Status in linux-signed-aws package in Ubuntu: Fix Committed Status in linux-signed-aws source package in Bionic: Fix Committed Bug description: dpkg-buildpackage - dpkg-buildpackage: info: source package linux-signed-aws dpkg-buildpackage: info: source version 4.15.0-1119.126 dpkg-buildpackage: info: source distribution bionic dpkg-source --before-build linux-signed-aws-4.15.0 dpkg-buildpackage: info: host architecture arm64 dpkg-source: info: using options from linux-signed-aws-4.15.0/debian/source/options: --diff-ignore --tar-ignore fakeroot debian/rules clean sed debian/control \ -e "s/@ABI@/4.15.0-1119/g" \ -e "s/@UNSIGNED_SRC_PACKAGE@/linux-aws/g" \ -e "s/@UNSIGNED_SRC_VERSION@/4.15.0-1119.126/g" \ -e 's/@SRCPKGNAME@/linux-signed-aws/g' \ -e 's/@HEADERS_COMMON@/linux-aws-headers-4.15.0-1119/g' \ -e 's/@HEADERS_ARCH@/linux-headers-4.15.0-1119-aws/g' rm -rf ./4.15.0-1119.126 UNSIGNED SIGNED rm -f debian/linux-image-*.install\ debian/linux-image-*.preinst\ debian/linux-image-*.prerm \ debian/linux-image-*.postinst \ debian/linux-image-*.postrm rm -f debian/kernel-signed-image-*.install dh clean dh_clean debian/rules build-arch dh build-arch dh_update_autotools_config -a debian/rules override_dh_auto_build make[1]: Entering directory '/<>' ./download-signed 
"linux-headers-4.15.0-1119-aws" "4.15.0-1119.126" "linux-aws" Downloading http://ppa.launchpad.net/canonical-kernel-team/ppa/ubuntu/dists/bionic/main/signed/linux-aws-arm64/4.15.0-1119.126/SHA256SUMS ... found Downloading http://ppa.launchpad.net/canonical-kernel-team/ppa/ubuntu/dists/bionic/main/signed/linux-aws-arm64/4.15.0-1119.126/signed.tar.gz ... found Extracting 4.15.0-1119.126 ... Extracting 4.15.0-1119.126/control ... Extracting 4.15.0-1119.126/control/options ... mkdir SIGNED ( \ cd "4.15.0-1119.126" || exit 1; \ for s in *.efi.signed; do \ [ ! -f "$s" ] && continue; \ base=$(echo "$s" | sed -e 's/.efi.signed//'); \ ( \ vars="${base}.efi.vars";\ [ -f "$vars" ] && . "./$vars"; \ if [ "$GZIP" = "1" ]; then \ gzip -9 "$s"; \ mv "${s}.gz" "$s"; \ fi; \ ); \ chmod 600 "$s"; \ ln "$s" "../SIGNED/$base"; \ done; \ for s in *.opal.sig; do \ [ ! -f "$s" ] && continue; \ chmod 600 "$s"; \ base=$(echo "$s" | sed -e 's/.opal.sig//'); \ cat "$base.opal" "$s" >"../SIGNED/$base"; \ done; \ for s in *.sipl.sig; do \ [ ! -f "$s" ] && continue;
[Kernel-packages] [Bug 1958534] Re: building of linux-signed package failing on arm64
This can be resolved by applying the following patches that were added for arm64 signed support in Disco UBUNTU: [Packaging] remove handoff check for uefi signing UBUNTU: [Packaging] decompress gzipped efi images in signing tarball -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-signed-aws in Ubuntu. https://bugs.launchpad.net/bugs/1958534 Title: building of linux-signed package failing on arm64 Status in linux-signed-aws package in Ubuntu: Fix Committed Status in linux-signed-aws source package in Bionic: Fix Committed Bug description: dpkg-buildpackage - dpkg-buildpackage: info: source package linux-signed-aws dpkg-buildpackage: info: source version 4.15.0-1119.126 dpkg-buildpackage: info: source distribution bionic dpkg-source --before-build linux-signed-aws-4.15.0 dpkg-buildpackage: info: host architecture arm64 dpkg-source: info: using options from linux-signed-aws-4.15.0/debian/source/options: --diff-ignore --tar-ignore fakeroot debian/rules clean sed debian/control \ -e "s/@ABI@/4.15.0-1119/g" \ -e "s/@UNSIGNED_SRC_PACKAGE@/linux-aws/g" \ -e "s/@UNSIGNED_SRC_VERSION@/4.15.0-1119.126/g" \ -e 's/@SRCPKGNAME@/linux-signed-aws/g' \ -e 's/@HEADERS_COMMON@/linux-aws-headers-4.15.0-1119/g' \ -e 's/@HEADERS_ARCH@/linux-headers-4.15.0-1119-aws/g' rm -rf ./4.15.0-1119.126 UNSIGNED SIGNED rm -f debian/linux-image-*.install\ debian/linux-image-*.preinst\ debian/linux-image-*.prerm \ debian/linux-image-*.postinst \ debian/linux-image-*.postrm rm -f debian/kernel-signed-image-*.install dh clean dh_clean debian/rules build-arch dh build-arch dh_update_autotools_config -a debian/rules override_dh_auto_build make[1]: Entering directory '/<>' ./download-signed "linux-headers-4.15.0-1119-aws" "4.15.0-1119.126" "linux-aws" Downloading http://ppa.launchpad.net/canonical-kernel-team/ppa/ubuntu/dists/bionic/main/signed/linux-aws-arm64/4.15.0-1119.126/SHA256SUMS ... 
found Downloading http://ppa.launchpad.net/canonical-kernel-team/ppa/ubuntu/dists/bionic/main/signed/linux-aws-arm64/4.15.0-1119.126/signed.tar.gz ... found Extracting 4.15.0-1119.126 ... Extracting 4.15.0-1119.126/control ... Extracting 4.15.0-1119.126/control/options ... mkdir SIGNED ( \ cd "4.15.0-1119.126" || exit 1; \ for s in *.efi.signed; do \ [ ! -f "$s" ] && continue; \ base=$(echo "$s" | sed -e 's/.efi.signed//'); \ ( \ vars="${base}.efi.vars";\ [ -f "$vars" ] && . "./$vars"; \ if [ "$GZIP" = "1" ]; then \ gzip -9 "$s"; \ mv "${s}.gz" "$s"; \ fi; \ ); \ chmod 600 "$s"; \ ln "$s" "../SIGNED/$base"; \ done; \ for s in *.opal.sig; do \ [ ! -f "$s" ] && continue; \ chmod 600 "$s"; \ base=$(echo "$s" | sed -e 's/.opal.sig//'); \ cat "$base.opal" "$s" >"../SIGNED/$base"; \ done; \ for s in *.sipl.sig; do \ [ ! -f "$s" ] && continue; \ base=$(echo "$s" | sed -e 's/.sipl.sig//'); \ cat "$base.sipl" "$s" >"../SIGNED/$base"; \ chmod 600 "../SIGNED/$base";\ done\ ) make[1]: Leaving directory '/<>' fakeroot debian/rules binary-arch dh binary-arch dh_testroot -a dh_prep -a debian/rules override_dh_auto_install make[1]: Entering directory '/<>' for signed in "SIGNED"/*; do \ flavour=$(echo
[Kernel-packages] [Bug 1958534] [NEW] building of linux-signed package failing on arm64
ot;; \ \ package="kernel-signed-image-$verflav-di"; \ echo "$package: adding $signed";\ echo "$signed boot" >>"debian/$package.install";\ \ package="linux-image-$verflav"; \ echo "$package: adding $signed";\ echo "$signed boot" >>"debian/$package.install";\ \ ./generate-depends linux-image-unsigned-$verflav 4.15.0-1119.126 \ linux-image-$verflav\ >>"debian/linux-image-$verflav.substvars"; \ \ for which in postinst postrm preinst prerm; do \ template="debian/templates/image.$which.in";\ script="debian/$package.$which";\ sed -e "s/@abiname@/4.15.0-1119/g" \ -e "s/@localversion@/-$flavour/g" \ -e "s/@image-stem@/$instfile/g" \ <"$template" >"$script";\ done; \ echo "interest linux-update-4.15.0-1119-$flavour" \ >"debian/$package.triggers";\ done kernel-signed-image-4.15.0-1119-SIGNED/*-di: adding SIGNED/* /bin/sh: 8: cannot create debian/kernel-signed-image-4.15.0-1119-SIGNED/*-di.install: Directory nonexistent linux-image-4.15.0-1119-SIGNED/*: adding SIGNED/* /bin/sh: 12: cannot create debian/linux-image-4.15.0-1119-SIGNED/*.install: Directory nonexistent /bin/sh: 14: cannot create debian/linux-image-4.15.0-1119-SIGNED/*.substvars: Directory nonexistent /bin/sh: 21: cannot create debian/linux-image-4.15.0-1119-SIGNED/*.postinst: Directory nonexistent /bin/sh: 21: cannot create debian/linux-image-4.15.0-1119-SIGNED/*.postrm: Directory nonexistent /bin/sh: 21: cannot create debian/linux-image-4.15.0-1119-SIGNED/*.preinst: Directory nonexistent /bin/sh: 21: cannot create debian/linux-image-4.15.0-1119-SIGNED/*.prerm: Directory nonexistent /bin/sh: 26: cannot create debian/linux-image-4.15.0-1119-SIGNED/*.triggers: Directory nonexistent debian/rules:81: recipe for target 'override_dh_auto_install' failed make[1]: *** [override_dh_auto_install] Error 2 make[1]: Leaving directory '/<>' debian/rules:45: recipe for target 'binary-arch' failed make: *** [binary-arch] Error 2 dpkg-buildpackage: error: fakeroot debian/rules binary-arch subprocess returned exit status 2 ** 
Affects: linux-signed-aws (Ubuntu) Importance: Undecided Assignee: Ian May (ian-may) Status: Fix Committed ** Affects: linux-signed-aws (Ubuntu Bionic) Importance: Undecided Assignee: Ian May (ian-may) Status: Fix Committed ** Also affects: linux-signed-aws (Ubuntu Bionic) Importance: Undecided Status: New -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-signed-aws in Ubuntu. https://bugs.launchpad.net/bugs/1958534 Title: building of linux-signed package failing on arm64 Status in linux-signed-aws package in Ubuntu: Fix Committed Status in linux-signed-aws source package in Bionic: Fix Committed Bug description: dpkg-buildpackage - dpkg-buildpackage: info: source package linux-signed-aws dpkg-buildpackage: info: source version 4.15.0-1119.126 dpkg-buildpackage: info: source distribution bionic dpkg-source --before-build linux-signed-aws-4.15.0 dpkg-buildpackage: info: host architecture arm64 dpkg-source: info: using options from linux-signed-aws-4.15.0/debian/source/options: --diff-ignore --tar-ignore fakeroot debian/rules clean sed debian/control \ -e "s/@ABI@/4.15.0-1119/g" \ -e "s/@UNSIGNED_SRC_PACKAGE@/linux-aws/g" \ -e "s/@UNSIGNED_SRC_VERSION@/4.15.0-1119.126/g" \ -e 's/@SRCPKGNAME@/linux-signed-aws/g' \ -e 's/@HEADERS_COMMON@/linux-aws-headers-4.15.0-1119/g' \ -e 's/@HEADERS_ARCH@/linux-headers-4.15.0-1119-aws/g' rm -rf ./4.15.0-1119.126 UNSIGNED SIGNED rm -f debian/linux-image-*.ins
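The error output in this report names files such as debian/linux-image-4.15.0-1119-SIGNED/*.install, which is consistent with the loop over "SIGNED"/* having matched nothing: in POSIX sh an unmatched glob is passed through as the literal pattern. The sketch below only illustrates that shell behaviour and one common guard against it. It is not the fix applied in this thread (that was the pair of [Packaging] patches), and list_signed is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical illustration: print each regular file found in DIR, one per line.
list_signed() {
    dir="$1"
    for signed in "$dir"/*; do
        # When nothing matches, the shell leaves the literal "$dir/*" in $signed;
        # the -f test skips that placeholder instead of treating it as a file.
        [ -f "$signed" ] || continue
        echo "$signed"
    done
}
```

With a guard like this an empty directory produces no iterations, rather than one iteration over a file literally named "*".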
[Kernel-packages] [Bug 1949532] Re: ubuntu_ltp_controllers tests failing on Impish
** Tags added: aws azures sru-20211108 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1949532 Title: ubuntu_ltp_controllers tests failing on Impish Status in ubuntu-kernel-tests: New Status in linux package in Ubuntu: Incomplete Bug description: Almost half of the ubuntu_ltp_controllers tests are failing due to the general pattern 'cgroup_name already mounted or mount point busy' causing the tests to fail. e.g. mount: /dev/cgroup: ltp_cgroup already mounted or mount point busy. cgroup_fj_function2_memory 1 TBROK: mount -t cgroup -o memory ltp_cgroup /dev/cgroup failed From investigation it seems there could be an issue with the transition to cgroup-v2. There have been rumors on the ltp mailing list that one of these days the tests could break due to the transition. Switching to cgroup-v2, likely due to a systemd update, could cause these tests to break due to different mount and cgroup hierarchy semantics. I could only reproduce a subset of the new failures we are seeing, but after setting systemd.unified_cgroup_hierarchy=0 on the kernel command line which sets cgroup back to v1, a lot of the failures I could produce went away. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1949532/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1949532] Re: ubuntu_ltp_controllers tests failing on Impish
Found on impish/linux-azure: 5.13.0-1008.9 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1949532 Title: ubuntu_ltp_controllers tests failing on Impish Status in ubuntu-kernel-tests: New Status in linux package in Ubuntu: Incomplete Bug description: Almost half of the ubuntu_ltp_controllers tests are failing due to the general pattern 'cgroup_name already mounted or mount point busy' causing the tests to fail. e.g. mount: /dev/cgroup: ltp_cgroup already mounted or mount point busy. cgroup_fj_function2_memory 1 TBROK: mount -t cgroup -o memory ltp_cgroup /dev/cgroup failed From investigation it seems there could be an issue with the transition to cgroup-v2. There have been rumors on the ltp mailing list that one of these days the tests could break due to the transition. Switching to cgroup-v2, likely due to a systemd update, could cause these tests to break due to different mount and cgroup hierarchy semantics. I could only reproduce a subset of the new failures we are seeing, but after setting systemd.unified_cgroup_hierarchy=0 on the kernel command line which sets cgroup back to v1, a lot of the failures I could produce went away. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1949532/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1949532] Re: ubuntu_ltp_controllers tests failing on Impish
Found on impish/linux-aws: 5.13.0-1007.8 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1949532 Title: ubuntu_ltp_controllers tests failing on Impish Status in ubuntu-kernel-tests: New Status in linux package in Ubuntu: Incomplete Bug description: Almost half of the ubuntu_ltp_controllers tests are failing due to the general pattern 'cgroup_name already mounted or mount point busy' causing the tests to fail. e.g. mount: /dev/cgroup: ltp_cgroup already mounted or mount point busy. cgroup_fj_function2_memory 1 TBROK: mount -t cgroup -o memory ltp_cgroup /dev/cgroup failed From investigation it seems there could be an issue with the transition to cgroup-v2. There have been rumors on the ltp mailing list that one of these days the tests could break due to the transition. Switching to cgroup-v2, likely due to a systemd update, could cause these tests to break due to different mount and cgroup hierarchy semantics. I could only reproduce a subset of the new failures we are seeing, but after setting systemd.unified_cgroup_hierarchy=0 on the kernel command line which sets cgroup back to v1, a lot of the failures I could produce went away. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1949532/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
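Since the bug description suspects the cgroup-v2 transition, it can help to record which hierarchy a test machine is actually running. A minimal sketch, assuming the usual convention that /sys/fs/cgroup is of type cgroup2fs on the unified (v2) hierarchy and tmpfs on the legacy or hybrid layout; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map the filesystem type of /sys/fs/cgroup to a cgroup mode.
# Assumption: "cgroup2fs" means the unified v2 hierarchy and "tmpfs" means
# the legacy v1 (or hybrid) layout.
cgroup_mode_from_fstype() {
    case "$1" in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1" ;;
        *)         echo "unknown" ;;
    esac
}

# Example on a Linux host (GNU stat):
#   cgroup_mode_from_fstype "$(stat -fc %T /sys/fs/cgroup)"
```

On a machine booted with systemd.unified_cgroup_hierarchy=0, as described above, this would be expected to report v1.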
[Kernel-packages] [Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal
As I was bisecting the commits, I was attempting to take advantage of parallelism. While my test kernel was building I would deploy a clean AWS r5.metal instance. I started seeing test kernels boot that I wouldn't expect to boot. So I decided that, as a sanity test, I would deploy an r5.metal instance, let it sit idle for 20 minutes and then install the known problematic 4.15.0-1113-aws kernel. Sure enough, it booted fine. I tried the same thing again, letting it sit idle for 20 minutes, and it worked again. So this does appear to be a race condition. I think this also explains some of the erratic test results I've seen while looking at this bug. Fortunately the console output gave us some definitive proof as to where the problem was occurring.

With that being said, it appears I have found the offending commits.

PCI/MSI: Enforce that MSI-X table entry is masked for update
PCI/MSI: Enforce MSI[X] entry updates to be visible

https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-aws/+git/bionic/commit/?id=27571f5ea1dd074924b41a455c50dc2278e8c2b7
https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-aws/+git/bionic/commit/?id=2478f358c2b35fea04e005447ce99ad8dc53fd5d

More specifically, the hang is introduced by 'PCI/MSI: Enforce that MSI-X table entry is masked for update', but it isn't a clean revert without reverting the other commit. So for a quick test confirmation I reverted both. I have not had a chance to determine why these commits are causing the problem, but with these reverted in a test build on top of 4.15.0-1113-aws, I can migrate from 5.4 to 4.15 as soon as the instance is available. I've made at least 6 attempts now and all have passed, while the same steps without the reverts have all hung (unless I wait 20 minutes).

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-aws in Ubuntu.
https://bugs.launchpad.net/bugs/1946149

Title: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

Status in linux-aws package in Ubuntu: New

Bug description:
When creating an r5.metal instance on AWS, the default kernel is bionic/linux-aws-5.4 (5.4.0-1056-aws). When changing to bionic/linux-aws (4.15.0-1113-aws), the machine fails to boot the 4.15 kernel.

If I remove these patches, the instance correctly boots the 4.15 kernel:
https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html

With that being said, after successfully updating to the 4.15 kernel without those patches applied, I can then upgrade to a 4.15 kernel with the above patches included, and the instance will boot properly.

This problem only appears on metal instances, which use NVMe instead of XVDA devices.

AWS instances also use the 'discard' mount option with ext4, so I thought there could be a race condition between ext4 discard and journal flush. I removed 'discard' from the mount options and rebooted the 5.4 kernel prior to installing the 4.15 kernel, but the instance still wouldn't boot after installing the 4.15 kernel.

I have been unable to capture a stack trace using 'aws get-console-output'. After enabling kdump I was unable to replicate the failure. So there must be some sort of race with ext4 and/or nvme.

To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-aws/+bug/1946149/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal
Hi Mauricio,

Thanks for getting this info. This is very helpful! I see a few potential patches between 4.15.0-159.167 and 4.15.0-160.168 that could be related to the hang. This will help greatly with the bisect.

Ian

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-aws in Ubuntu.
https://bugs.launchpad.net/bugs/1946149

Title: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

Status in linux-aws package in Ubuntu: New

Bug description:
When creating an r5.metal instance on AWS, the default kernel is bionic/linux-aws-5.4 (5.4.0-1056-aws). When changing to bionic/linux-aws (4.15.0-1113-aws), the machine fails to boot the 4.15 kernel.

If I remove these patches, the instance correctly boots the 4.15 kernel:
https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html

With that being said, after successfully updating to the 4.15 kernel without those patches applied, I can then upgrade to a 4.15 kernel with the above patches included, and the instance will boot properly.

This problem only appears on metal instances, which use NVMe instead of XVDA devices.

AWS instances also use the 'discard' mount option with ext4, so I thought there could be a race condition between ext4 discard and journal flush. I removed 'discard' from the mount options and rebooted the 5.4 kernel prior to installing the 4.15 kernel, but the instance still wouldn't boot after installing the 4.15 kernel.

I have been unable to capture a stack trace using 'aws get-console-output'. After enabling kdump I was unable to replicate the failure. So there must be some sort of race with ext4 and/or nvme.

To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-aws/+bug/1946149/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1830585] Re: cpuset_memory_spread from controllers test suite in LTP failed (hog the memory on the unexpected node)
Found on bionic/linux-oracle-5.4: 5.4.0-1056.60~18.04.1 - BM.Standard2.52 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-azure in Ubuntu. https://bugs.launchpad.net/bugs/1830585 Title: cpuset_memory_spread from controllers test suite in LTP failed (hog the memory on the unexpected node) Status in ubuntu-kernel-tests: Triaged Status in linux package in Ubuntu: Confirmed Status in linux-azure package in Ubuntu: New Status in linux source package in Bionic: New Status in linux-azure source package in Bionic: New Status in linux source package in Disco: Won't Fix Status in linux-azure source package in Disco: Won't Fix Status in linux source package in Focal: New Status in linux-azure source package in Focal: New Status in linux source package in Hirsute: New Status in linux-azure source package in Hirsute: New Bug description: Test failed with: cpuset_memory_spread 7 TFAIL: hog the memory on the unexpected node(FilePages_For_Nodes(KB): _0: 2276 _1: 102428, Expect Nodes: 1). <<<test_start>>> tag=cpuset_memory_spread stime=1558937747 cmdline=" cpuset_memory_spread_testset.sh" contacts="" analysis=exit <<<test_output>>> 100+0 records in 100+0 records out 104857600 bytes (105 MB, 100 MiB) copied, 0.0993112 s, 1.1 GB/s cpuset_memory_spread 1 TPASS: Cpuset memory spread page test succeeded. cpuset_memory_spread 3 TPASS: Cpuset memory spread page test succeeded. cpuset_memory_spread 5 TPASS: Cpuset memory spread page test succeeded. cpuset_memory_spread 7 TFAIL: hog the memory on the unexpected node(FilePages_For_Nodes(KB): _0: 2276 _1: 102428, Expect Nodes: 1). cpuset_memory_spread 9 TPASS: Cpuset memory spread page test succeeded. cpuset_memory_spread 11 TPASS: Cpuset memory spread page test succeeded. cpuset_memory_spread 13 TPASS: Cpuset memory spread page test succeeded. 
<<<execution_status>>> initiation_status="ok" duration=10 termination_type=exited termination_id=1 corefile=no cutime=364 cstime=383 <<<test_end>>> ProblemType: Bug DistroRelease: Ubuntu 19.04 Package: linux-image-5.0.0-15-generic 5.0.0-15.16 ProcVersionSignature: User Name 5.0.0-15.16-generic 5.0.6 Uname: Linux 5.0.0-15-generic x86_64 AlsaDevices: total 0 crw-rw---- 1 root audio 116, 1 May 27 05:39 seq crw-rw---- 1 root audio 116, 33 May 27 05:39 timer AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay' ApportVersion: 2.20.10-0ubuntu27 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord' AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Date: Mon May 27 06:16:49 2019 IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig' MachineType: HP ProLiant DL360 Gen9 PciMultimedia: ProcFB: 0 mgadrmfb ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.0.0-15-generic root=UUID=6422cfdd-2a69-4c0b-9784-6809a77ab980 ro RelatedPackageVersions: linux-restricted-modules-5.0.0-15-generic N/A linux-backports-modules-5.0.0-15-generic N/A linux-firmware 1.178.1 RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill' SourcePackage: linux UpgradeStatus: No upgrade log present (probably fresh install) dmi.bios.date: 04/25/2017 dmi.bios.vendor: HP dmi.bios.version: P89 dmi.board.name: ProLiant DL360 Gen9 dmi.board.vendor: HP dmi.chassis.type: 23 dmi.chassis.vendor: HP dmi.modalias: dmi:bvnHP:bvrP89:bd04/25/2017:svnHP:pnProLiantDL360Gen9:pvr:rvnHP:rnProLiantDL360Gen9:rvr:cvnHP:ct23:cvr: dmi.product.family: ProLiant dmi.product.name: ProLiant DL360 Gen9 dmi.product.sku: 780020-S01 dmi.sys.vendor: HP To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1830585/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages 
More help : https://help.launchpad.net/ListHelp
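The TFAIL line in the report above packs the per-node file-page counts and the expected node into one string. The sketch below is a rough illustration, assuming the line always has the shape shown in this report; it pulls those numbers out so they can be compared at a glance. The helper name parse_node_pages is hypothetical, and this is not the check that LTP itself performs.

```python
import re


def parse_node_pages(line):
    """Extract per-node FilePages (KB) and the expected node from a TFAIL line.

    Assumes the shape seen in this report:
    ...(FilePages_For_Nodes(KB): _0: 2276 _1: 102428, Expect Nodes: 1).
    Returns ({node: kb, ...}, expected_nodes_string) or None if no match.
    """
    m = re.search(
        r"FilePages_For_Nodes\(KB\):\s*(.*?),\s*Expect Nodes:\s*([\d,\-]+)",
        line,
    )
    if m is None:
        return None
    # Each entry looks like "_<node>: <kilobytes>".
    pages = {int(n): int(kb) for n, kb in re.findall(r"_(\d+):\s*(\d+)", m.group(1))}
    return pages, m.group(2)


tfail = (
    "cpuset_memory_spread 7 TFAIL: hog the memory on the unexpected node"
    "(FilePages_For_Nodes(KB): _0: 2276 _1: 102428, Expect Nodes: 1)."
)
pages, expected = parse_node_pages(tfail)
for node, kb in sorted(pages.items()):
    print(f"node {node}: {kb} KB (expected nodes: {expected})")
```

For the line quoted in this report the parsed counts show almost all file pages on node 1, the expected node, with a small remainder on node 0.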
[Kernel-packages] [Bug 1876687] Re: function traceon/off triggers in ftace from ubuntu_kernel_selftests failed on B/F
Found on bionic/linux-gcp-fips: 4.15.0-2020.22 - n1-highcpu-4 ** Tags added: gcp sru-20210927 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1876687 Title: function traceon/off triggers in ftace from ubuntu_kernel_selftests failed on B/F Status in ubuntu-kernel-tests: New Status in linux package in Ubuntu: Incomplete Bug description: Issue found on Focal 5.4.0-29.33 with node amaura (passed on rizzo, rizzo failed with other failures) # [27] ftrace - test for function traceon/off triggers [FAIL] Need to retest on amaura to check if this is just a glitch. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1876687/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal
Mauricio,

Interesting update. I agree that we need more info as to what the state is when the instance won't boot after switching to the new 4.15 kernel. I'll check with my team in the morning and see if we can get additional info from AWS.

I was trying a few more scenarios this evening, the first being the most interesting.

Scenario 1
  start with 5.4.0-1056-aws
  install 5.4.0-1058-aws
  reboot
  confirm 5.4.0-1058-aws booted
  reboot AGAIN
  install 4.15.0-1113-aws
  reboot
  machine booted 4.15.0-1113-aws successfully

Scenario 2
  start with 5.4.0-1056-aws
  install 4.15.0-1112-aws
  reboot
  install 4.15.0-1113-aws
  reboot
  confirmed 4.15.0-1113-aws booted
  then booted back into 5.4.0-1056-aws
  removed 4.15.0-1112-aws and 4.15.0-1113-aws
  rebooted again for good measure
  confirmed still running 5.4.0-1056-aws
  installed 4.15.0-1113-aws
  rebooted
  4.15.0-1113-aws successfully loaded
[Kernel-packages] [Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal
Just want to add an update. I haven't been able to replicate successfully booting 4.15.0-1113-aws from 5.4.0-1058-aws, so I'm questioning whether I made a mistake the time I thought it was successful.
[Kernel-packages] [Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal
Thanks for the in-depth update Mauricio! Is there any investigation you'd like me to specifically target?
[Kernel-packages] [Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal
** Description changed:

  When creating an r5.metal instance on AWS, the default kernel is
  bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-
  aws(4.15.0-1113-aws) the machine fails to boot the 4.15 kernel.

  If I remove these patches the instance correctly boots the 4.15 kernel

  https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html

  With that being said, after successfully updating to the 4.15 without
  those patches applied, I can then upgrade to a 4.15 kernel with the
  above patches included, and the instance will boot properly.

  This problem only appears on metal instances, which uses NVME instead of
  XVDA devices.

  AWS instances also use the 'discard' mount option with ext4, thought
  maybe there could be a race condition between ext4 discard and journal
- flush. Removed 'discard' mount and rebooted 5.4 kernel prior to 4.15
- kernel installation, but still wouldn't boot after installing the 4.15
- kernel.
+ flush. Removed 'discard' from mount options and rebooted 5.4 kernel
+ prior to 4.15 kernel installation, but still wouldn't boot after
+ installing the 4.15 kernel.

  I have been unable to capture a stack trace using 'aws get-console-
  output'. After enabling kdump I was unable to replicate the failure. So
  there must be some sort of race with either ext4 and/or nvme.
[Kernel-packages] [Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal
Confirmed it does work to first upgrade bionic/linux-5.4 from 5.4.0-1056-aws to 5.4.0-1058-aws and then update to 4.15.0-1113-aws -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-aws in Ubuntu. https://bugs.launchpad.net/bugs/1946149 Title: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal Status in linux-aws package in Ubuntu: New Bug description: When creating an r5.metal instance on AWS, the default kernel is bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux- aws(4.15.0-1113-aws) the machine fails to boot the 4.15 kernel. If I remove these patches the instance correctly boots the 4.15 kernel https://lists.ubuntu.com/archives/kernel- team/2021-September/123963.html With that being said, after successfully updating to the 4.15 without those patches applied, I can then upgrade to a 4.15 kernel with the above patches included, and the instance will boot properly. This problem only appears on metal instances, which uses NVME instead of XVDA devices. AWS instances also use the 'discard' mount option with ext4, thought maybe there could be a race condition between ext4 discard and journal flush. Removed 'discard' mount and rebooted 5.4 kernel prior to 4.15 kernel installation, but still wouldn't boot after installing the 4.15 kernel. I have been unable to capture a stack trace using 'aws get-console- output'. After enabling kdump I was unable to replicate the failure. So there must be some sort of race with either ext4 and/or nvme. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-aws/+bug/1946149/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal
** Description changed:

  When creating an r5.metal instance on AWS, the default kernel is
  bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-
- aws(4.15.0-1113-aws) the machine fails to boot 4.15 kernel.
+ aws(4.15.0-1113-aws) the machine fails to boot the 4.15 kernel.

  If I remove these patches the instance correctly boots the 4.15 kernel

  https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html

- But after successfully updating to the 4.15 without those patches
- applied, I can then upgrade to a 4.15 kernel with the above patches
- included, and the instance will boot properly.
+ With that being said, after successfully updating to the 4.15 without
+ those patches applied, I can then upgrade to a 4.15 kernel with the
+ above patches included, and the instance will boot properly.

  This problem only appears on metal instances, which uses NVME instead of
  XVDA devices.

  AWS instances also use the 'discard' mount option with ext4, thought
  maybe there could be a race condition between ext4 discard and journal
  flush. Removed 'discard' mount and rebooted 5.4 kernel prior to 4.15
  kernel installation, but still wouldn't boot after installing the 4.15
  kernel.

  I have been unable to capture a stack trace using 'aws get-console-
  output'. After enabling kdump I was unable to replicate the failure. So
  there must be some sort of race with either ext4 and/or nvme.
[Kernel-packages] [Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal
** Description changed:

  When creating an r5.metal instance on AWS, the default kernel is
  bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-
  aws(4.15.0-1113-aws) the machine fails to boot 4.15 kernel.

  If I remove these patches the instance correctly boots the 4.15 kernel

  https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html

  But after successfully updating to the 4.15 without those patches
  applied, I can then upgrade to a test kernel with the above patches
  included, and the instance will boot properly.

  This problem only appears on metal instances, which uses NVME instead of
  XVDA devices.

  AWS instances also use the 'discard' mount option with ext4, thought
  maybe there could be a race condition between ext4 discard and journal
  flush. Removed 'discard' mount and rebooted 5.4 kernel prior to 4.15
  kernel installation, but still wouldn't boot.
+
+ I have been unable to capture a stack trace using 'aws get-console-
+ output'. I enabled kdump and was unable to replicate the failure. So
+ there must be some sort of race with either ext4 and/or nvme.

** Description changed:

  When creating an r5.metal instance on AWS, the default kernel is
  bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-
  aws(4.15.0-1113-aws) the machine fails to boot 4.15 kernel.

  If I remove these patches the instance correctly boots the 4.15 kernel

  https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html

  But after successfully updating to the 4.15 without those patches
- applied, I can then upgrade to a test kernel with the above patches
+ applied, I can then upgrade to a 4.15 kernel with the above patches
  included, and the instance will boot properly.

  This problem only appears on metal instances, which uses NVME instead of
  XVDA devices.

  AWS instances also use the 'discard' mount option with ext4, thought
  maybe there could be a race condition between ext4 discard and journal
  flush. Removed 'discard' mount and rebooted 5.4 kernel prior to 4.15
  kernel installation, but still wouldn't boot.

  I have been unable to capture a stack trace using 'aws get-console-
  output'. I enabled kdump and was unable to replicate the failure. So
  there must be some sort of race with either ext4 and/or nvme.

** Description changed:

  When creating an r5.metal instance on AWS, the default kernel is
  bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-
  aws(4.15.0-1113-aws) the machine fails to boot 4.15 kernel.

  If I remove these patches the instance correctly boots the 4.15 kernel

  https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html

  But after successfully updating to the 4.15 without those patches
  applied, I can then upgrade to a 4.15 kernel with the above patches
  included, and the instance will boot properly.

  This problem only appears on metal instances, which uses NVME instead of
  XVDA devices.

  AWS instances also use the 'discard' mount option with ext4, thought
  maybe there could be a race condition between ext4 discard and journal
  flush. Removed 'discard' mount and rebooted 5.4 kernel prior to 4.15
- kernel installation, but still wouldn't boot.
+ kernel installation, but still wouldn't boot after installing the 4.15
+ kernel.

  I have been unable to capture a stack trace using 'aws get-console-
- output'. I enabled kdump and was unable to replicate the failure. So
+ output'. After enabling kdump I was unable to replicate the failure. So
  there must be some sort of race with either ext4 and/or nvme.
[Kernel-packages] [Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal
Have been unable to capture a stack trace using 'aws get-console- output'. Enabled kdump and was unable to replicate the failed boot, which makes this feel like a race condition with NVME. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-aws in Ubuntu. https://bugs.launchpad.net/bugs/1946149 Title: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal Status in linux-aws package in Ubuntu: New Bug description: When creating an r5.metal instance on AWS, the default kernel is bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux- aws(4.15.0-1113-aws) the machine fails to boot 4.15 kernel. If I remove these patches the instance correctly boots the 4.15 kernel https://lists.ubuntu.com/archives/kernel- team/2021-September/123963.html But after successfully updating to the 4.15 without those patches applied, I can then upgrade to a test kernel with the above patches included, and the instance will boot properly. This problem only appears on metal instances, which uses NVME instead of XVDA devices. AWS instances also use the 'discard' mount option with ext4, thought maybe there could be a race condition between ext4 discard and journal flush. Removed 'discard' mount and rebooted 5.4 kernel prior to 4.15 kernel installation, but still wouldn't boot. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-aws/+bug/1946149/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal
** Description changed:

  When creating an r5.metal instance on AWS, the default kernel is
  bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-
  aws(4.15.0-1113-aws) the machine fails to boot 4.15 kernel.
+
+ If I remove these patches the instance correctly boots the 4.15 kernel
+
+ https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html
+
+ But after successfully updating to the 4.15 without those patches
+ applied, I can then upgrade to a test kernel with the above patches
+ included, and the instance will boot properly.
+
+ This problem only appears on metal instances, which uses NVME instead of
+ XVDA devices.
+
+ AWS instances also use the 'discard' mount option with ext4, thought
+ maybe there could be a race condition between ext4 discard and journal
+ flush. Removed 'discard' mount and rebooted 5.4 kernel prior to 4.15
+ kernel installation, but still wouldn't boot.
[Kernel-packages] [Bug 1946149] [NEW] Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal
Public bug reported: When creating an r5.metal instance on AWS, the default kernel is bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux- aws(4.15.0-1113-aws) the machine fails to boot 4.15 kernel. ** Affects: linux-aws (Ubuntu) Importance: Undecided Status: New ** Package changed: ubuntu => linux-aws (Ubuntu) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-aws in Ubuntu. https://bugs.launchpad.net/bugs/1946149 Title: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal Status in linux-aws package in Ubuntu: New Bug description: When creating an r5.metal instance on AWS, the default kernel is bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux- aws(4.15.0-1113-aws) the machine fails to boot 4.15 kernel. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-aws/+bug/1946149/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1931325] Re: cfs_bandwidth01 in sched from ubuntu_ltp_stable failed on B-4.15
Found on: bionic/linux-aws: 4.15.0-.118 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1931325 Title: cfs_bandwidth01 in sched from ubuntu_ltp_stable failed on B-4.15 Status in ubuntu-kernel-tests: Confirmed Status in linux package in Ubuntu: Fix Released Status in linux source package in Bionic: Confirmed Bug description: [Impact] Test case cfs_bandwidth01 in LTP sched test suite is a reproducer of a CFS unthrottle_cfs_rq() issue (fe61468b2cbc2b sched/fair: Fix enqueue_task_fair warning). This test triggers a warning on our 4.15 kernel: LTP: starting cfs_bandwidth01 (cfs_bandwidth01 -i 5) ------------[ cut here ]------------ rq->tmp_alone_branch != &rq->leaf_cfs_rq_list WARNING: CPU: 0 PID: 0 at /build/linux-fYK9kF/linux-4.15.0/kernel/sched/fair.c:393 unthrottle_cfs_rq+0x16f/0x200 Modules linked in: input_leds joydev serio_raw mac_hid qemu_fw_cfg kvm_intel kvm irqbypass sch_fq_codel binfmt_misc ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm psmouse virtio_blk pata_acpi floppy virtio_net i2c_piix4 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-144-generic #148-Ubuntu Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 RIP: 0010:unthrottle_cfs_rq+0x16f/0x200 RSP: 0018:989ebfc03e80 EFLAGS: 00010082 RAX: RBX: 989eb4c6ac00 RCX: RDX: 0005 RSI: acb63c4d RDI: 0046 RBP: 989ebfc03ea8 R08: 00af39e61b33 R09: acb63c20 R10: R11: 0001 R12: 989eb57fe400 R13: 989ebfc21900 R14: 0001 R15: 0001 FS: () GS:989ebfc0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 55593258d618 CR3: 7a044000 CR4: 06f0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 
Call Trace:
 distribute_cfs_runtime+0xc3/0x110
 sched_cfs_period_timer+0xff/0x220
 ? sched_cfs_slack_timer+0xd0/0xd0
 __hrtimer_run_queues+0xdf/0x230
 hrtimer_interrupt+0xa0/0x1d0
 smp_apic_timer_interrupt+0x6f/0x140
 apic_timer_interrupt+0x90/0xa0
RIP: 0010:native_safe_halt+0x12/0x20
RSP: 0018:ac603e28 EFLAGS: 0246 ORIG_RAX: ff11
RAX: abbc9280 RBX: RCX:
RDX: RSI: RDI:
RBP: ac603e28 R08: 00af39850067 R09: 989e73749d00
R10: R11: 7fff R12:
R13: R14: R15:
 ? __sched_text_end+0x1/0x1
 default_idle+0x20/0x100
 arch_cpu_idle+0x15/0x20
 default_idle_call+0x23/0x30
 do_idle+0x172/0x1f0
 cpu_startup_entry+0x73/0x80
 rest_init+0xae/0xb0
 start_kernel+0x4dc/0x500
 x86_64_start_reservations+0x24/0x26
 x86_64_start_kernel+0x74/0x77
 secondary_startup_64+0xa5/0xb0
Code: 50 09 00 00 49 39 85 60 09 00 00 74 68 80 3d 3a 6e 54 01 00 75 5f 31 db 48 c7 c7 c0 3d 2d ac c6 05 28 6e 54 01 01 e8 11 36 fc ff <0f> 0b 48 85 db 74 43 49 8b 85 78 09 00 00 49 39 85 70 09 00 00
---[ end trace b6b9a70bc2945c0c ]---

[Fix]
Based on the test case description, we will need these fixes:
  * fe61468b2cbc2b sched/fair: Fix enqueue_task_fair warning
  * b34cb07dde7c23 sched/fair: Fix enqueue_task_fair() warning some more
  * 39f23ce07b9355 sched/fair: Fix unthrottle_cfs_rq() for leaf_cfs_rq list
  * 6d4d22468dae3d sched/fair: Reorder enqueue/dequeue_task_fair path
  * 5ab297bab98431 sched/fair: Fix reordering of enqueue/dequeue_task_fair()

A backport is needed for Bionic, since we're missing some new variables and coding style changes introduced in the following commits (and their corresponding fixes):
  * 97fb7a0a8944bd sched: Clean up and harmonize the coding style of the scheduler code base
  * 9f68395333ad7f sched/pelt: Add a new runnable average signal
  * 6212437f0f6043 sched/fair: Fix runnable_avg for throttled cfs
  * 43e9f7f231e40e sched/fair: Start tracking SCHED_IDLE tasks count in cfs_rq

I have also searched the upstream tree for any other commits that claim to fix these, but didn't find any.

[Test] Test kernel can be found here: https://people.canonical.com/~phlin/kernel/lp-1931325-cfs_bandwidth01/ With these patches applied, the test can pass without triggering this warning.
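The pass criterion in [Test] is that the warning no longer shows up in the kernel log. A minimal sketch of that check follows, assuming the log has already been captured as text. The function names and the shortened sample logs are hypothetical and are not part of LTP or the Ubuntu test harness. The pattern is deliberately loose, so it also matches copies of the message where the "&rq->" prefix was lost.

```python
import re

# Loose pattern for the scheduler warning quoted in the bug description.
# It keys on the two list names only, so it tolerates a mangled "&rq->" prefix.
CFS_WARNING = re.compile(r"tmp_alone_branch\s*!=\s*\S*leaf_cfs_rq_list")


def find_cfs_warnings(log_text):
    """Return every log line that carries the leaf_cfs_rq_list warning."""
    return [line for line in log_text.splitlines() if CFS_WARNING.search(line)]


def log_is_clean(log_text):
    """True when the log shows no sign of the warning (assumed pass criterion)."""
    return not find_cfs_warnings(log_text)


# Hypothetical sample logs, shortened from the trace in the bug description.
BAD_LOG = """LTP: starting cfs_bandwidth01 (cfs_bandwidth01 -i 5)
rq->tmp_alone_branch != &rq->leaf_cfs_rq_list
WARNING: CPU: 0 PID: 0 at kernel/sched/fair.c:393 unthrottle_cfs_rq+0x16f/0x200
"""
GOOD_LOG = "LTP: starting cfs_bandwidth01 (cfs_bandwidth01 -i 5)\n"

print(log_is_clean(BAD_LOG), log_is_clean(GOOD_LOG))
```

On a real system the text would come from dmesg or the serial console log captured during the cfs_bandwidth01 run.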
[Kernel-packages] [Bug 1940261] Re: ubuntu_seccomp 11-basic-basic_errors failure on X/oracle
Found on: bionic/linux-aws: 4.15.0-.118

** Tags added: aws

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-oracle in Ubuntu.
https://bugs.launchpad.net/bugs/1940261

Title:
  ubuntu_seccomp 11-basic-basic_errors failure on X/oracle

Status in ubuntu-kernel-tests: New
Status in linux package in Ubuntu: Incomplete
Status in linux-oracle package in Ubuntu: New
Status in linux source package in Xenial: Incomplete
Status in linux-oracle source package in Xenial: New
Status in linux source package in Bionic: Incomplete
Status in linux-oracle source package in Bionic: New

Bug description:
  Xenial/Oracle 4.15.0-1079.87~16.04.1 fails the 11-basic-basic_errors test from ubuntu_seccomp on all Oracle cloud instances:

  batch name: 11-basic-basic_errors
  test mode: c
  test type: basic
  Test 11-basic-basic_errors%%001-1 result: FAILURE 11-basic-basic_errors rc=255

  Base kernel bionic/linux-oracle/4.15.0-1079.87 is OK.
  Previous cycle (xenial/linux-oracle/4.15.0-1077.85~16.04.1) is OK, so this looks like a regression.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1940261/+subscriptions
[Kernel-packages] [Bug 1932065] Re: Upstream v5.9 introduced 'module' patches that removed exported symbols
** Description changed:

  SRU Justification:

  [Impact]

  * The following patches removed exported symbols, which could cause disruption and breakage for customers:

  modules: inherit TAINT_PROPRIETARY_MODULE
  modules: return licensing information from find_symbol
  modules: rename the licence field in struct symsearch to license
  modules: unexport __module_address
  modules: unexport __module_text_address
  modules: mark each_symbol_section static
  modules: mark find_symbol static
  modules: mark ref_module static

  [Fix]

  * Temporarily revert as SAUCE patches to allow customers time to make necessary changes to support eventual patch changes.

  [Test Plan]

- * none
+ * Check symbols on running kernel
+ sudo grep -e ' ref_module' -e ' find_symbol' -e ' each_symbol_section$' -e ' __module_address' -e ' __module_text_address' /proc/kallsyms
+
+ * Check symbols on all installed kernels
+ sudo grep -e ' ref_module' -e ' find_symbol' -e ' each_symbol_section$' -e ' __module_address' -e ' __module_text_address' /boot/System.map-*

  [Where problems could occur]

  * The new functionality provided by the patches will be removed; since we aren't removing existing functionality, the risk should be low.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1932065

Title:
  Upstream v5.9 introduced 'module' patches that removed exported symbols

Status in linux package in Ubuntu: New
Status in linux source package in Bionic: Fix Committed
Status in linux source package in Focal: Fix Released
Status in linux source package in Groovy: Fix Committed

Bug description:
  SRU Justification:

  [Impact]

  * The following patches removed exported symbols, which could cause disruption and breakage for customers:

  modules: inherit TAINT_PROPRIETARY_MODULE
  modules: return licensing information from find_symbol
  modules: rename the licence field in struct symsearch to license
  modules: unexport __module_address
  modules: unexport __module_text_address
  modules: mark each_symbol_section static
  modules: mark find_symbol static
  modules: mark ref_module static

  [Fix]

  * Temporarily revert as SAUCE patches to allow customers time to make necessary changes to support eventual patch changes.

  [Test Plan]

  * Check symbols on running kernel
  sudo grep -e ' ref_module' -e ' find_symbol' -e ' each_symbol_section$' -e ' __module_address' -e ' __module_text_address' /proc/kallsyms

  * Check symbols on all installed kernels
  sudo grep -e ' ref_module' -e ' find_symbol' -e ' each_symbol_section$' -e ' __module_address' -e ' __module_text_address' /boot/System.map-*

  [Where problems could occur]

  * The new functionality provided by the patches will be removed; since we aren't removing existing functionality, the risk should be low.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1932065/+subscriptions
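The [Test Plan] greps can also be expressed as a small script that reports which of the five symbols are absent from a symbol table. This is only a sketch: the function names and the sample table are hypothetical, and it assumes the usual "address type name [module]" line layout of /proc/kallsyms and System.map. Unlike the grep patterns, most of which also match longer names sharing the same prefix, it matches symbol names exactly.

```python
# Symbols named in the test plan of the bug description.
SYMBOLS = (
    "ref_module",
    "find_symbol",
    "each_symbol_section",
    "__module_address",
    "__module_text_address",
)


def present_symbols(symtab_text, wanted=SYMBOLS):
    """Return the set of wanted symbols that appear in the symbol table text."""
    found = set()
    for line in symtab_text.splitlines():
        fields = line.split()
        # Assumed layout: <address> <type> <name> [module]
        if len(fields) >= 3 and fields[2] in wanted:
            found.add(fields[2])
    return found


def missing_symbols(symtab_text, wanted=SYMBOLS):
    """Return the wanted symbols that are absent, sorted by name."""
    return sorted(set(wanted) - present_symbols(symtab_text, wanted))


# Hypothetical excerpt in kallsyms format; the addresses are made up.
SAMPLE = """ffffffff81130000 T find_symbol
ffffffff81130100 T ref_module
ffffffff81130200 t find_symbol_in_section
"""

print(missing_symbols(SAMPLE))
```

To check a live system, pass in the contents of /proc/kallsyms or a /boot/System.map-* file instead of the sample. An empty result means all five symbols are still present, which is the outcome the temporary revert is meant to produce.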