[Kernel-packages] [Bug 2062380] Re: Using a 6.8 kernel 'modprobe nvidia' hangs on Quanta Grace Hopper

2024-04-18 Thread Ian May
** Summary changed:

- Using a 6.8 kernel modprobe nvidia hangs on Grace Hopper
+ Using a 6.8 kernel 'modprobe nvidia' hangs on Quanta Grace Hopper

** Also affects: nvidia-graphics-drivers-535-server (Ubuntu)
   Importance: Undecided
   Status: New

** Changed in: nvidia-graphics-drivers-535-server (Ubuntu)
   Status: New => Confirmed

** Changed in: nvidia-graphics-drivers-550-server (Ubuntu)
   Status: New => Confirmed

** Description changed:

  Using both -generic and -nvidia 6.8 kernels I'm seeing a hang when I
  load the nvidia driver.
+ 
+ $ sudo dmidecode -t 0
+ # dmidecode 3.5
+ Getting SMBIOS data from sysfs.
+ SMBIOS 3.6.0 present.
+ # SMBIOS implementations newer than version 3.5.0 are not
+ # fully supported by this version of dmidecode.
+ 
+ Handle 0x0001, DMI type 0, 26 bytes
+ BIOS Information
+   Vendor: NVIDIA
+   Version: 01.02.01
+   Release Date: 20240207
+   ROM Size: 64 MB
+   Characteristics:
+   PCI is supported
+   PNP is supported
+   BIOS is upgradeable
+   BIOS shadowing is allowed
+   Boot from CD is supported
+   Selectable boot is supported
+   Serial services are supported (int 14h)
+   ACPI is supported
+   Targeted content distribution is supported
+   UEFI is supported
+   Firmware Revision: 0.0
  
  [  382.938326] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
  [  382.946075] rcu: 53-...0: (4 ticks this GP) 
idle=1c2c/1/0x4000 softirq=4866/4868 fqs=14124
  [  382.955683] rcu:  hardirqs   softirqs   csw/system
  [  382.961378] rcu:  number:0  00
  [  382.967071] rcu: cputime:0  00   ==> 
30026(ms)
  [  382.974189] rcu: (detected by 52, t=60034 jiffies, g=24469, q=1199 
ncpus=72)
  [  392.982095] rcu: rcu_preempt kthread starved for 9994 jiffies! g24469 f0x0 
RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=31
  [  392.992769] rcu: Unless rcu_preempt kthread gets sufficient CPU time, 
OOM is now expected behavior
  
- 
  After seeing this, I enabled kdump and set kernel.panic_on_rcu_stall = 1
  
  KDUMP INFO
  WARNING: cpu 54: cannot find NT_PRSTATUS note
-   KERNEL: /usr/lib/debug/boot/vmlinux-6.8.0-1004-nvidia-64k  [TAINTED]
- DUMPFILE: /var/crash/202404172139/dump.202404172139  [PARTIAL DUMP]
- CPUS: 72
- DATE: Wed Apr 17 21:39:13 UTC 2024
-   UPTIME: 00:06:10
+   KERNEL: /usr/lib/debug/boot/vmlinux-6.8.0-1004-nvidia-64k  [TAINTED]
+ DUMPFILE: /var/crash/202404172139/dump.202404172139  [PARTIAL DUMP]
+ CPUS: 72
+ DATE: Wed Apr 17 21:39:13 UTC 2024
+   UPTIME: 00:06:10
  LOAD AVERAGE: 0.68, 0.63, 0.28
-TASKS: 854
- NODENAME: hinyari
-  RELEASE: 6.8.0-1005-nvidia-64k
-  VERSION: #5-Ubuntu SMP PREEMPT_DYNAMIC Wed Apr 17 11:26:46 UTC 2024
-  MACHINE: aarch64  (unknown Mhz)
-   MEMORY: 479.7 GB
-PANIC: "Kernel panic - not syncing: RCU Stall"
-  PID: 0
-  COMMAND: "swapper/21"
- TASK: 82026880  (1 of 72)  [THREAD_INFO: 82026880]
-  CPU: 21
-STATE: TASK_RUNNING (PANIC)
+    TASKS: 854
+ NODENAME: hinyari
+  RELEASE: 6.8.0-1005-nvidia-64k
+  VERSION: #5-Ubuntu SMP PREEMPT_DYNAMIC Wed Apr 17 11:26:46 UTC 2024
+  MACHINE: aarch64  (unknown Mhz)
+   MEMORY: 479.7 GB
+    PANIC: "Kernel panic - not syncing: RCU Stall"
+  PID: 0
+  COMMAND: "swapper/21"
+ TASK: 82026880  (1 of 72)  [THREAD_INFO: 82026880]
+  CPU: 21
+    STATE: TASK_RUNNING (PANIC)
  
  [  300.313144] nvidia: loading out-of-tree module taints kernel.
  [  300.313153] nvidia: module verification failed: signature and/or required 
key missing - tainting kernel
  [  300.316694] nvidia-nvlink: Nvlink Core is being initialized, major device 
number 506
- [  300.316699] 
+ [  300.316699]
  [  360.323454] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
  [  360.331206] rcu: 54-...0: (24 ticks this GP) 
idle=742c/1/0x4000 softirq=4931/4933 fqs=13148
  [  360.340903] rcu:  hardirqs   softirqs   csw/system
  [  360.346597] rcu:  number:0  00
  [  360.352291] rcu: cputime:0  00   ==> 
30031(ms)
  [  360.359408] rcu: (detected by 21, t=60038 jiffies, g=25009, q=1123 
ncpus=72)
  [  360.366704] Sending NMI from CPU 21 to CPUs 54:
  [  370.367310] rcu: rcu_preempt kthread starved for 9993 jiffies! g25009 f0x0 
RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=31
  [  370.377983] rcu: Unless rcu_preempt kthread gets sufficient CPU time, 
OOM is now expected behavior.
  [  370.387322] rcu: RCU grace-period kthread stack dump:
  [  370.392482] task:rcu_preempt state:I stack:0 pid:17 tgid:17
ppid:2  flags:0x0008
  [  370.392488] Call trace:
  [ 

[Kernel-packages] [Bug 2062380] [NEW] Using a 6.8 kernel modprobe nvidia hangs on Grace Hopper

2024-04-18 Thread Ian May
Public bug reported:

Using both -generic and -nvidia 6.8 kernels I'm seeing a hang when I
load the nvidia driver.

[  382.938326] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  382.946075] rcu: 53-...0: (4 ticks this GP) 
idle=1c2c/1/0x4000 softirq=4866/4868 fqs=14124
[  382.955683] rcu:  hardirqs   softirqs   csw/system
[  382.961378] rcu:  number:0  00
[  382.967071] rcu: cputime:0  00   ==> 
30026(ms)
[  382.974189] rcu: (detected by 52, t=60034 jiffies, g=24469, q=1199 
ncpus=72)
[  392.982095] rcu: rcu_preempt kthread starved for 9994 jiffies! g24469 f0x0 
RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=31
[  392.992769] rcu: Unless rcu_preempt kthread gets sufficient CPU time, 
OOM is now expected behavior


After seeing this, I enabled kdump and set kernel.panic_on_rcu_stall = 1
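For reference, the kdump + RCU-stall setup described above amounts to roughly the following on Ubuntu (a sketch; the exact packages used on this machine are an assumption):

```shell
# Install crash-dump tooling (on Ubuntu this pulls in kdump-tools and
# configures a crashkernel memory reservation; a reboot is needed after)
sudo apt-get install -y linux-crashdump

# Panic when an RCU stall is detected, so kdump captures a vmcore
sudo sysctl -w kernel.panic_on_rcu_stall=1

# Persist the setting across reboots
echo 'kernel.panic_on_rcu_stall = 1' | \
    sudo tee /etc/sysctl.d/99-panic-on-rcu-stall.conf
```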

KDUMP INFO
WARNING: cpu 54: cannot find NT_PRSTATUS note
  KERNEL: /usr/lib/debug/boot/vmlinux-6.8.0-1004-nvidia-64k  [TAINTED]
DUMPFILE: /var/crash/202404172139/dump.202404172139  [PARTIAL DUMP]
CPUS: 72
DATE: Wed Apr 17 21:39:13 UTC 2024
  UPTIME: 00:06:10
LOAD AVERAGE: 0.68, 0.63, 0.28
   TASKS: 854
NODENAME: hinyari
 RELEASE: 6.8.0-1005-nvidia-64k
 VERSION: #5-Ubuntu SMP PREEMPT_DYNAMIC Wed Apr 17 11:26:46 UTC 2024
 MACHINE: aarch64  (unknown Mhz)
  MEMORY: 479.7 GB
   PANIC: "Kernel panic - not syncing: RCU Stall"
 PID: 0
 COMMAND: "swapper/21"
TASK: 82026880  (1 of 72)  [THREAD_INFO: 82026880]
 CPU: 21
   STATE: TASK_RUNNING (PANIC)

[  300.313144] nvidia: loading out-of-tree module taints kernel.
[  300.313153] nvidia: module verification failed: signature and/or required 
key missing - tainting kernel
[  300.316694] nvidia-nvlink: Nvlink Core is being initialized, major device 
number 506
[  300.316699] 
[  360.323454] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  360.331206] rcu: 54-...0: (24 ticks this GP) 
idle=742c/1/0x4000 softirq=4931/4933 fqs=13148
[  360.340903] rcu:  hardirqs   softirqs   csw/system
[  360.346597] rcu:  number:0  00
[  360.352291] rcu: cputime:0  00   ==> 
30031(ms)
[  360.359408] rcu: (detected by 21, t=60038 jiffies, g=25009, q=1123 
ncpus=72)
[  360.366704] Sending NMI from CPU 21 to CPUs 54:
[  370.367310] rcu: rcu_preempt kthread starved for 9993 jiffies! g25009 f0x0 
RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=31
[  370.377983] rcu: Unless rcu_preempt kthread gets sufficient CPU time, 
OOM is now expected behavior.
[  370.387322] rcu: RCU grace-period kthread stack dump:
[  370.392482] task:rcu_preempt state:I stack:0 pid:17 tgid:17
ppid:2  flags:0x0008
[  370.392488] Call trace:
[  370.392489]  __switch_to+0xd0/0x118
[  370.392499]  __schedule+0x2a8/0x7b0
[  370.392501]  schedule+0x40/0x168
[  370.392502]  schedule_timeout+0xac/0x1e0
[  370.392505]  rcu_gp_fqs_loop+0x128/0x508
[  370.392512]  rcu_gp_kthread+0x150/0x188
[  370.392514]  kthread+0xf8/0x110
[  370.392519]  ret_from_fork+0x10/0x20
[  370.392524] rcu: Stack dump where RCU GP kthread last ran:
[  370.398128] Sending NMI from CPU 21 to CPUs 31:
[  370.398131] NMI backtrace for cpu 31
[  370.398136] CPU: 31 PID: 0 Comm: swapper/31 Kdump: loaded Tainted: G 
  OE  6.8.0-1005-nvidia-64k #5-Ubuntu
[  370.398139] Hardware name:  /P3880, BIOS 01.02.01 20240207
[  370.398140] pstate: 6349 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[  370.398142] pc : cpuidle_enter_state+0xd8/0x790
[  370.398150] lr : cpuidle_enter_state+0xcc/0x790
[  370.398153] sp : 800081eefd70
[  370.398154] x29: 800081eefd70 x28:  x27: 
[  370.398157] x26:  x25: 00563d67e4e0 x24: 
[  370.398160] x23: a0a1445699f8 x22:  x21: 00563d72ece0
[  370.398162] x20: a0a144569a10 x19: 8fa4a800 x18: 800081f00030
[  370.398165] x17:  x16:  x15: ac8c73b08db0
[  370.398168] x14:  x13:  x12: 
[  370.398170] x11:  x10: 2da0fbe3d5e8c649 x9 : a0a1424fd244
[  370.398173] x8 : 820559b8 x7 :  x6 : 
[  370.398175] x5 :  x4 :  x3 : 
[  370.398178] x2 :  x1 :  x0 : 
[  370.398181] Call trace:
[  370.398183]  cpuidle_enter_state+0xd8/0x790
[  370.398185]  cpuidle_enter+0x44/0x78
[  370.398195]  cpuidle_idle_call+0x15c/0x210
[  370.398202]  do_idle+0xb0/0x130
[  370.398204]  cpu_startup_entry+0x40/0x50
[  370.398206]  secondary_start_kernel+0xec/0x130
[  370.398211]  __secondary_switched+0xc0/0xc8
[  370.399132] Kernel panic - not syncing: RCU Stall
[  370.403938] CPU: 21 PID: 0 Comm: 

[Kernel-packages] [Bug 2055712] Re: Pull-request to address bug in mm/page_alloc.c

2024-04-02 Thread Ian May
** Tags added: verification-done-jammy

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-6.5 in Ubuntu.
https://bugs.launchpad.net/bugs/2055712

Title:
  Pull-request to address bug in mm/page_alloc.c

Status in linux-nvidia-6.5 package in Ubuntu:
  Fix Released

Bug description:
  
  The current calculation of min_free_kbytes only uses ZONE_DMA and
  ZONE_NORMAL pages, but the ZONE_MOVABLE zone->_watermark[WMARK_MIN] will
  also take a share of min_free_kbytes. This causes the min watermark of
  ZONE_NORMAL to be too small in the presence of ZONE_MOVABLE.

  __GFP_HIGH and PF_MEMALLOC allocations usually don't need movable zone
  pages, so just like ZONE_HIGHMEM, cap pages_min to a small value in
  __setup_per_zone_wmarks().

  On my testing machine with 16GB of memory (transparent hugepage turned
  off by default, movablecore=12G configured), the following is a
  comparison of watermark_min with and without the patch:

                   no patch   add patch
  ZONE_DMA         1          8
  ZONE_DMA32       151        709
  ZONE_NORMAL      233        1113
  ZONE_MOVABLE     1434       128
  min_free_kbytes  7288       7326
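The figures in the table can be read back on any live system; a quick way (standard procfs paths, nothing specific to this patch):

```shell
# Global reserve that gets split across the zones
cat /proc/sys/vm/min_free_kbytes

# Per-zone min watermarks (in pages) that the split produces
awk '/^Node/ {zone = $NF} $1 == "min" {print zone ": " $2}' /proc/zoneinfo
```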

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.5/+bug/2055712/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2055712] Re: Pull-request to address bug in mm/page_alloc.c

2024-04-02 Thread Ian May
** Changed in: linux-nvidia-6.5 (Ubuntu)
   Status: New => Fix Released



[Kernel-packages] [Bug 2059150] Re: jammy/linux-nvidia-6.5: 6.5.0-1014.14 - Boot failure on Quanta Grace/Hopper

2024-03-26 Thread Ian May
Upgrading the BIOS firmware resolves the failure.

$ sudo dmidecode -t 0
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 3.6.0 present.
# SMBIOS implementations newer than version 3.5.0 are not
# fully supported by this version of dmidecode.

Handle 0x0001, DMI type 0, 26 bytes
BIOS Information
Vendor: NVIDIA
Version: 01.02.01
Release Date: 20240207
ROM Size: 64 MB
Characteristics:
PCI is supported
PNP is supported
BIOS is upgradeable
BIOS shadowing is allowed
Boot from CD is supported
Selectable boot is supported
Serial services are supported (int 14h)
ACPI is supported
Targeted content distribution is supported
UEFI is supported
Firmware Revision: 0.0


** Changed in: linux-nvidia-6.5 (Ubuntu)
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-6.5 in Ubuntu.
https://bugs.launchpad.net/bugs/2059150


[Kernel-packages] [Bug 2059150] [NEW] jammy/linux-nvidia-6.5: 6.5.0-1014.14 - Boot failure on Quanta Grace/Hopper

2024-03-26 Thread Ian May
Public bug reported:

Output from BMC SOL console:

Unhandled Exception from EL2
x0 = 0x11f210305619
x1 = 0x
x2 = 0x
x3 = 0x
x4 = 0x5f972493
x5 = 0x
x6 = 0x
x7 = 0x
x8 = 0x
x9 = 0xa0e0a03e7d6c
x10= 0x
x11= 0x
x12= 0x
x13= 0x
x14= 0x
x15= 0x
x16= 0x
x17= 0x
x18= 0x
x19= 0xf0f18080
x20= 0x80009e86f6a0
x21= 0x80009e86f720
x22= 0x07a5a0e0a03e7d6c
x23= 0x
x24= 0xa0e0a3348aa0
x25= 0xa0e0a2990008
x26= 0xa0e0a2990008
x27= 0xa0e04b4f5748
x28= 0x80009e86f710
x29= 0x80008000fe00
x30= 0xa0e0a03e7d6c
scr_el3= 0x0407073d
sctlr_el3  = 0x30cd183f
cptr_el3   = 0x00100100
tcr_el3= 0x80853510
daif   = 0x02c0
mair_el3   = 0x004404ff
spsr_el3   = 0x034000c9
elr_el3= 0xa0e04b4f58b4
ttbr0_el3  = 0x0078734a5001
esr_el3= 0x622c5c1f
far_el3= 0x9446dd42099e8148
spsr_el1   = 0x
elr_el1= 0x
spsr_abt   = 0x
spsr_und   = 0x
spsr_irq   = 0x
spsr_fiq   = 0x
sctlr_el1  = 0x30d00980
actlr_el1  = 0x
cpacr_el1  = 0x0030
csselr_el1 = 0x0002
sp_el1 = 0x
esr_el1= 0x
ttbr0_el1  = 0x
ttbr1_el1  = 0x
mair_el1   = 0x
amair_el1  = 0x
tcr_el1= 0x
tpidr_el1  = 0x
tpidr_el0  = 0x8000
tpidrro_el0= 0x
par_el1= 0x0800
mpidr_el1  = 0x8102
afsr0_el1  = 0x
afsr1_el1  = 0x
contextidr_el1 = 0x
vbar_el1   = 0x
cntp_ctl_el0   = 0x
cntp_cval_el0  = 0x0012ec91c420
cntv_ctl_el0   = 0x
cntv_cval_el0  = 0x
cntkctl_el1= 0x
sp_el0 = 0x0078732cf4f0
isr_el1= 0x0040
cpuectlr_el1   = 0x4000340340003000
gicd_ispendr regs (Offsets 0x200 - 0x278)
 Offset: value
0200:   0x

Unhandled Exception in EL3.
x30= 0x0078732c4384
x0 = 0x
x1 = 0x0078732cb7d8
x2 = 0x0018
x3 = 0x0078732b1720
x4 = 0x
x5 = 0x003c
x6 = 0x0078732c9109
x7 = 0x22000204
x8 = 0x4000340340003000
x9 = 0x
x10= 0x
x11= 0x0012ec91c420
x12= 0x
x13= 0x
x14= 0x
x15= 0x0078732cf4f0
x16= 0x2200
x17= 0x0018
x18= 0x0407073d
x19= 0x007873386440
x20= 0x80009e86f6a0
x21= 0x80009e86f720
x22= 0x07a5a0e0a03e7d6c
x23= 0x
x24= 0xa0e0a3348aa0
x25= 0xa0e0a2990008
x26= 0xa0e0a2990008
x27= 0xa0e04b4f5748
x28= 0x80009e86f710
x29= 0x80008000fe00
scr_el3= 0x0407073d
sctlr_el3  = 0x30cd183f
cptr_el3   = 0x00100100
tcr_el3= 0x80853510
daif   = 0x03c0
mair_el3   = 0x004404ff
spsr_el3   = 0x834002cd
elr_el3= 0x0078732b0af4
ttbr0_el3  = 0x0078734a5001
esr_el3= 0xbe11
far_el3= 0x9446dd42099e8148
spsr_el1   = 0x
elr_el1= 0x
spsr_abt   = 0x
spsr_und   = 0x
spsr_irq   = 0x
spsr_fiq   = 0x
sctlr_el1  = 0x30d00980
actlr_el1  = 0x
cpacr_el1  = 0x0030
csselr_el1 = 0x0002
sp_el1 = 0x
esr_el1= 0x
ttbr0_el1  = 0x
ttbr1_el1  = 0x
mair_el1   = 

[Kernel-packages] [Bug 2056448] Re: hfs: weird file system free block state after creating files and removing them with a mix of i/o operations

2024-03-07 Thread Colin Ian King
** Summary changed:

- weird file system free block state after creating files and removing them 
with a mix of i/o operations
+ hfs: weird file system free block state after creating files and removing 
them with a mix of i/o operations

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2056448

Title:
  hfs: weird file system free block state after creating files and
  removing them with a mix of i/o operations

Status in Linux:
  Unknown
Status in linux package in Ubuntu:
  New

Bug description:
  Summary:

  Create an hfs file system, loop-back mount it, run the iomix stressor with
  stress-ng to exercise it with a mix of file I/O operations, and remove the
  files at the end. The file system is then empty, but a lot of blocks are
  used and can't seem to be recovered.

  Kernel: 6.8.0-11-generic

  test case:
  sudo apt-get install hfsprogs

  dd if=/dev/zero of=fs.img bs=1M count=2048
  mkfs.hfs fs.img 
  sudo mount fs.img /mnt
  sudo mkdir /mnt/x
  df /mnt
  Filesystem 1K-blocks  Used Available Use% Mounted on
  /dev/loop6   2097128  2015   2095113   1% /mnt

  
  sudo stress-ng --temp-path /mnt/x --iomix 1 -t 2

  df /mnt
  Filesystem 1K-blocks   Used Available Use% Mounted on
  /dev/loop6   2097128 674635   1422493  33% /mnt

  
  ls -alR /mnt/
  /mnt/:
  total 4
  drwxr-xr-x  1 root root3 Mar  7 12:37 .
  drwxr-xr-x 23 root root 4096 Feb 28 14:13 ..
  drwxr-xr-x  1 root root2 Mar  7 12:37 x

  /mnt/x:
  total 0
  drwxr-xr-x 1 root root 2 Mar  7 12:37 .
  drwxr-xr-x 1 root root 3 Mar  7 12:37 ..

  
  ...so the file system is 33% full, but there are no files on it. Something
  looks wrong here.
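To put a number on the leak, the difference between the two df runs quoted above is the space that went missing (figures taken straight from that output):

```shell
# Used 1K-blocks reported by df before and after the stress-ng run
before=2015
after=674635
echo "leaked: $(( after - before )) KiB with no files on the filesystem"
```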

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/2056448/+subscriptions




[Kernel-packages] [Bug 2056448] Re: weird file system free block state after creating files and removing them with a mix of i/o operations

2024-03-07 Thread Colin Ian King
** Bug watch added: Linux Kernel Bug Tracker #218571
   https://bugzilla.kernel.org/show_bug.cgi?id=218571

** Also affects: linux via
   https://bugzilla.kernel.org/show_bug.cgi?id=218571
   Importance: Unknown
   Status: Unknown



[Kernel-packages] [Bug 2056451] [NEW] hfs: concurrent create/unlink can trip -EEXIST on non-existent files

2024-03-07 Thread Colin Ian King
Public bug reported:

Summary:

Create an hfs file system, loop-back mount it, and run the filename stressor
with stress-ng to exercise filename create/stat/unlink; we get unexpected
-EEXIST errors. This can be worked around by adding a sync() call after the
unlink() to ensure the metadata is synced.

Kernel: 6.8.0-11-generic

test case:
sudo apt-get install hfsprogs

dd if=/dev/zero of=fs.img bs=1M count=2048
mkfs.hfs fs.img 
sudo mount fs.img /mnt
sudo mkdir /mnt/x
sudo stress-ng --temp-path /mnt/x --filename 8 --filename-opts posix -t 20
stress-ng: info:  [132412] setting to a 20 secs run per stressor
stress-ng: info:  [132412] dispatching hogs: 8 filename
stress-ng: fail:  [132424] filename: open failed on file of length 1 bytes, 
errno=17 (File exists)
stress-ng: fail:  [132428] filename: open failed on file of length 20 bytes, 
errno=17 (File exists)
stress-ng: fail:  [132423] filename: open failed on file of length 30 bytes, 
errno=17 (File exists)
stress-ng: fail:  [132421] filename: open failed on file of length 30 bytes, 
errno=17 (File exists)
stress-ng: fail:  [132428] filename: open failed on file of length 30 bytes, 
errno=17 (File exists)
stress-ng: fail:  [132426] filename: open failed on file of length 23 bytes, 
errno=17 (File exists)
stress-ng: fail:  [132425] filename: open failed on file of length 30 bytes, 
errno=17 (File exists)
stress-ng: fail:  [132428] filename: open failed on file of length 1 bytes, 
errno=17 (File exists)
stress-ng: fail:  [132423] filename: open failed on file of length 7 bytes, 
errno=17 (File exists)
stress-ng: fail:  [132423] filename: open failed on file of length 11 bytes, 
errno=17 (File exists)
stress-ng: fail:  [132426] filename: open failed on file of length 24 bytes, 
errno=17 (File exists)

adding a sync() call in the stress-ng stressor fixes the issue:

git diff
diff --git a/stress-filename.c b/stress-filename.c
index a64898fb1..b8266f91e 100644
--- a/stress-filename.c
+++ b/stress-filename.c
@@ -308,6 +308,7 @@ static void stress_filename_test(
VOID_RET(int, shim_stat(filename, ));
 
(void)shim_unlink(filename);
+   (void)sync();
}
 
/* exercise dcache lookup of non-existent filename */


sudo stress-ng --temp-path /mnt/x --filename 8 --filename-opts posix -t 20
stress-ng: info:  [132461] setting to a 20 secs run per stressor
stress-ng: info:  [132461] dispatching hogs: 8 filename
stress-ng: info:  [132461] skipped: 0
stress-ng: info:  [132461] passed: 8: filename (8)
stress-ng: info:  [132461] failed: 0
stress-ng: info:  [132461] metrics untrustworthy: 0
stress-ng: info:  [132461] successful run completed in 20.05 secs

The sync should not be required, by the way; I added it just to
illustrate that there is a racy metadata sync issue in hfs.
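A stripped-down version of the same create/stat/unlink pattern, outside stress-ng, would look like this (the mount-point default and file name are placeholders; pass the hfs mount point as the first argument, and note that stress-ng runs eight of these loops concurrently, which is what actually triggers the race):

```shell
# Repeatedly create, stat and unlink the same name; on the affected hfs
# mount the create can fail with EEXIST right after the unlink
dir=${1:-$(mktemp -d)}   # e.g. /mnt/x; falls back to a temp dir
f="$dir/racefile"
for i in $(seq 1 1000); do
    : > "$f" || { echo "create failed on iteration $i"; break; }
    stat "$f" > /dev/null
    rm -f "$f"
done
[ ! -e "$f" ] && echo "loop finished, no stale file left"
```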

** Affects: linux
 Importance: Unknown
 Status: Unknown

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: New

** Summary changed:

- hfs: concurrent create/unlink can trip -EEXIST on files
+ hfs: concurrent create/unlink can trip -EEXIST on non-existent files

** Bug watch added: Linux Kernel Bug Tracker #218570
   https://bugzilla.kernel.org/show_bug.cgi?id=218570

** Also affects: linux via
   https://bugzilla.kernel.org/show_bug.cgi?id=218570
   Importance: Unknown
   Status: Unknown

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2056451

Title:
  hfs: concurrent create/unlink can trip -EEXIST on non-existent files

Status in Linux:
  Unknown
Status in linux package in Ubuntu:
  New


[Kernel-packages] [Bug 2056448] [NEW] weird file system free block state after creating files and removing them with a mix of i/o operations

2024-03-07 Thread Colin Ian King
Public bug reported:

Summary:

Create an hfs file system, loop-back mount it, run the iomix stressor with
stress-ng to exercise it with a mix of file I/O operations, and remove the
files at the end. The file system is then empty, but a lot of blocks are
used and can't seem to be recovered.

Kernel: 6.8.0-11-generic

test case:
sudo apt-get install hfsprogs

dd if=/dev/zero of=fs.img bs=1M count=2048
mkfs.hfs fs.img 
sudo mount fs.img /mnt
sudo mkdir /mnt/x
df /mnt
Filesystem 1K-blocks  Used Available Use% Mounted on
/dev/loop6   2097128  2015   2095113   1% /mnt


sudo stress-ng --temp-path /mnt/x --iomix 1 -t 2

df /mnt
Filesystem 1K-blocks   Used Available Use% Mounted on
/dev/loop6   2097128 674635   1422493  33% /mnt


ls -alR /mnt/
/mnt/:
total 4
drwxr-xr-x  1 root root3 Mar  7 12:37 .
drwxr-xr-x 23 root root 4096 Feb 28 14:13 ..
drwxr-xr-x  1 root root2 Mar  7 12:37 x

/mnt/x:
total 0
drwxr-xr-x 1 root root 2 Mar  7 12:37 .
drwxr-xr-x 1 root root 3 Mar  7 12:37 ..


...so the file system is 33% full, but there are no files on it. Something
looks wrong here.

** Affects: linux (Ubuntu)
 Importance: High
 Status: New

** Changed in: linux (Ubuntu)
   Importance: Undecided => High



[Kernel-packages] [Bug 2055310] Re: dmesg spammed by virtio-fs and 9pnet-virtio messages

2024-03-02 Thread Colin Ian King
** Attachment added: "screen shot of my noble VM on a noble server"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2055310/+attachment/5751725/+files/Screenshot%20from%202024-03-02%2022-57-52.png

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2055310

Title:
  dmesg spammed by virtio-fs and 9pnet-virtio messages

Status in linux package in Ubuntu:
  New

Bug description:
  On Ubuntu noble, as of 28 Feb 2024, on amd64, s390x, and ppc64, I'm seeing
  kernel messages after boot (running instances in a VM using virt-manager):

  uname -a
  Linux noble-amd64-efi 6.6.0-14-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 
30 10:27:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

  [   30.638354] virtio-fs: tag  not found
  [   30.642316] 9pnet_virtio: no channels available for device config
  [   35.897615] virtio-fs: tag  not found
  [   35.901568] 9pnet_virtio: no channels available for device config
  [   41.141860] virtio-fs: tag  not found
  [   41.145513] 9pnet_virtio: no channels available for device config
  [   46.382040] virtio-fs: tag  not found
  [   46.386141] 9pnet_virtio: no channels available for device config
  [   51.632229] virtio-fs: tag  not found
  [   51.635727] 9pnet_virtio: no channels available for device config

  These are annoying when logging in via the console.
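Until the probing itself is fixed, the console spam can be reduced by lowering the console log level (a workaround sketch, not a fix for the underlying bug; the messages stay available via dmesg and the journal):

```shell
# Keep kernel messages below 'warning' severity off the console
sudo dmesg --console-level warn    # equivalent to: sudo dmesg -n 4
```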

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2055310/+subscriptions




[Kernel-packages] [Bug 2055310] Re: dmesg spammed by virtio-fs and 9pnet-virtio messages

2024-03-02 Thread Colin Ian King
Does not occur on pre-noble VMs, e.g. fine with mantic through to trusty
on all my VMs on the same host.



[Kernel-packages] [Bug 2055310] Re: dmesg spammed by virtio-fs and 9pnet-virtio messages

2024-03-02 Thread Colin Ian King
Good idea,

I've installed this on my host and it's still occurring on various VM
architectures (x86-64, ppc64el, s390x, etc.). My host is noble and up
to date with updates.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2055310

Title:
  dmesg spammed by virtui-fs and 9pnet-virtio messages

Status in linux package in Ubuntu:
  New

Bug description:
  Ubuntu noble, as of 28 Feb 2024, on amd64, s390x, ppc64, seeing kernel
  messages after boot (running instances in a VM using virt-manager)

  uname -a
  Linux noble-amd64-efi 6.6.0-14-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 
30 10:27:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

  [   30.638354] virtio-fs: tag  not found
  [   30.642316] 9pnet_virtio: no channels available for device config
  [   35.897615] virtio-fs: tag  not found
  [   35.901568] 9pnet_virtio: no channels available for device config
  [   41.141860] virtio-fs: tag  not found
  [   41.145513] 9pnet_virtio: no channels available for device config
  [   46.382040] virtio-fs: tag  not found
  [   46.386141] 9pnet_virtio: no channels available for device config
  [   51.632229] virtio-fs: tag  not found
  [   51.635727] 9pnet_virtio: no channels available for device config

  These are annoying when logging in via the console.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2055310/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2055310] Re: dmesg spammed by virtio-fs and 9pnet-virtio messages

2024-02-29 Thread Colin Ian King
cking@noble-amd64:~$ uname -a
Linux noble-amd64 6.8.0-11-generic #11-Ubuntu SMP PREEMPT_DYNAMIC Wed Feb 14 
00:29:05 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

[9.551968] virtio-fs: tag  not found
[9.555352] 9pnet_virtio: no channels available for device config
[   14.850014] virtio-fs: tag  not found
[   14.854959] 9pnet_virtio: no channels available for device config
[   18.302513] systemd-journald[451]: 
/var/log/journal/c8f7ec498f904c46b99f25f051792ec0/user-1000.journal: Journal 
file uses a different sequence number ID, rotating.
[   20.173092] virtio-fs: tag  not found
[   20.176498] 9pnet_virtio: no channels available for device config
[   25.470406] virtio-fs: tag  not found
[   25.475163] 9pnet_virtio: no channels available for device config
[   30.690179] virtio-fs: tag  not found
[   30.695584] 9pnet_virtio: no channels available for device config
[   35.947869] virtio-fs: tag  not found
[   35.951397] 9pnet_virtio: no channels available for device config
[   41.190386] virtio-fs: tag  not found
[   41.195323] 9pnet_virtio: no channels available for device config
[   46.437815] virtio-fs: tag  not found
[   46.441903] 9pnet_virtio: no channels available for device config
[   51.700719] virtio-fs: tag  not found
[   51.704995] 9pnet_virtio: no channels available for device config

...but it now stops about 52 seconds after boot. Anyhow, these messages
are new; they did not appear with the previous mantic kernel.



[Kernel-packages] [Bug 2055310] [NEW] dmesg spammed by virtio-fs and 9pnet-virtio messages

2024-02-28 Thread Colin Ian King
Public bug reported:

Ubuntu noble, as of 28 Feb 2024, on amd64, s390x, ppc64, seeing kernel
messages after boot (running instances in a VM using virt-manager)

uname -a
Linux noble-amd64-efi 6.6.0-14-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 
30 10:27:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

[   30.638354] virtio-fs: tag  not found
[   30.642316] 9pnet_virtio: no channels available for device config
[   35.897615] virtio-fs: tag  not found
[   35.901568] 9pnet_virtio: no channels available for device config
[   41.141860] virtio-fs: tag  not found
[   41.145513] 9pnet_virtio: no channels available for device config
[   46.382040] virtio-fs: tag  not found
[   46.386141] 9pnet_virtio: no channels available for device config
[   51.632229] virtio-fs: tag  not found
[   51.635727] 9pnet_virtio: no channels available for device config

These are annoying when logging in via the console.
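As an editorial aside: the quoted timestamps suggest the messages arrive on an almost fixed ~5.25 s cadence, which would be consistent with some component periodically retrying (a mount retry is one guess; the log itself does not confirm what is retrying). This can be checked from the timestamps above:

```shell
# Intervals between successive "virtio-fs: tag  not found" timestamps
# (seconds, taken verbatim from the dmesg excerpt above).
printf '%s\n' 30.638354 35.897615 41.141860 46.382040 51.632229 |
awk 'NR > 1 { printf "%.2f\n", $1 - prev } { prev = $1 }'
# Prints 5.26, 5.24, 5.24, 5.25 -- a steady ~5.25 s period.
```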

** Affects: linux (Ubuntu)
 Importance: Low
 Status: New

** Changed in: linux (Ubuntu)
   Importance: Undecided => Low

** Changed in: linux (Ubuntu)
Milestone: None => ubuntu-24.04-beta

** Description changed:

  Ubuntu noble, as of 28 Feb 2024, on amd64, s390x, ppc64, seeing kernel
- messages after boot:
+ messages after boot (running instances in a VM using virt-manager)
  
  uname -a
  Linux noble-amd64-efi 6.6.0-14-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 
30 10:27:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
- 
  
  [   30.638354] virtio-fs: tag  not found
  [   30.642316] 9pnet_virtio: no channels available for device config
  [   35.897615] virtio-fs: tag  not found
  [   35.901568] 9pnet_virtio: no channels available for device config
  [   41.141860] virtio-fs: tag  not found
  [   41.145513] 9pnet_virtio: no channels available for device config
  [   46.382040] virtio-fs: tag  not found
  [   46.386141] 9pnet_virtio: no channels available for device config
  [   51.632229] virtio-fs: tag  not found
  [   51.635727] 9pnet_virtio: no channels available for device config
  
  These are annoying when logging in via the console.



[Kernel-packages] [Bug 1899249] Re: OpenZFS writing stalls, under load

2024-02-28 Thread Colin Ian King
** Changed in: zfs-linux (Ubuntu)
 Assignee: Colin Ian King (colin-king) => (unassigned)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

Status in Native ZFS for Linux:
  New
Status in zfs-linux package in Ubuntu:
  Fix Released

Bug description:
  Using a QNAP 4-drive USB enclosure, with a set of SSDs, on a Raspberry
  Pi 8GB. ZFS deduplication, and LZJB compression is enabled.

  This issue seems to occur, intermittently, after some time (happens
  with both SMB access, via Samba, and when interacting with the system,
  via SSH), and never previously occurred, until a few months ago, and I
  sometimes have to force a reboot of the system (at the cost of some
  data loss), in order to use it again.

  The "dmesg" log reports:

  [25375.911590] z_wr_iss_h  D0  2161  2 0x0028
  [25375.911606] Call trace:
  [25375.911627]  __switch_to+0x104/0x170
  [25375.911639]  __schedule+0x30c/0x7c0
  [25375.911647]  schedule+0x3c/0xb8
  [25375.911655]  io_schedule+0x20/0x58
  [25375.911668]  rq_qos_wait+0x100/0x178
  [25375.911677]  wbt_wait+0xb4/0xf0
  [25375.911687]  __rq_qos_throttle+0x38/0x50
  [25375.911700]  blk_mq_make_request+0x128/0x610
  [25375.911712]  generic_make_request+0xb4/0x2d8
  [25375.911722]  submit_bio+0x48/0x218
  [25375.911960]  vdev_disk_io_start+0x670/0x9f8 [zfs]
  [25375.912181]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
  [25375.912400]  zio_nowait+0xd4/0x170 [zfs]
  [25375.912617]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
  [25375.912839]  zio_vdev_io_start+0x248/0x2b8 [zfs]
  [25375.913057]  zio_execute+0xac/0x110 [zfs]
  [25375.913096]  taskq_thread+0x2f8/0x570 [spl]
  [25375.913108]  kthread+0xfc/0x128
  [25375.913119]  ret_from_fork+0x10/0x1c
  [25375.913149] INFO: task txg_sync:2333 blocked for more than 120 seconds.
  [25375.919916]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.926848] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.934835] txg_syncD0  2333  2 0x0028
  [25375.934850] Call trace:
  [25375.934869]  __switch_to+0x104/0x170
  [25375.934879]  __schedule+0x30c/0x7c0
  [25375.934887]  schedule+0x3c/0xb8
  [25375.934899]  schedule_timeout+0x9c/0x190
  [25375.934908]  io_schedule_timeout+0x28/0x48
  [25375.934946]  __cv_timedwait_common+0x1a8/0x1f8 [spl]
  [25375.934982]  __cv_timedwait_io+0x3c/0x50 [spl]
  [25375.935205]  zio_wait+0x130/0x2a0 [zfs]
  [25375.935423]  dsl_pool_sync+0x3fc/0x498 [zfs]
  [25375.935650]  spa_sync+0x538/0xe68 [zfs]
  [25375.935867]  txg_sync_thread+0x2c0/0x468 [zfs]
  [25375.935911]  thread_generic_wrapper+0x74/0xa0 [spl]
  [25375.935924]  kthread+0xfc/0x128
  [25375.935935]  ret_from_fork+0x10/0x1c
  [25375.936017] INFO: task zbackup:75339 blocked for more than 120 seconds.
  [25375.942780]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.949710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.957702] zbackup D0 75339   5499 0x
  [25375.957716] Call trace:
  [25375.957732]  __switch_to+0x104/0x170
  [25375.957742]  __schedule+0x30c/0x7c0
  [25375.957750]  schedule+0x3c/0xb8
  [25375.957789]  cv_wait_common+0x188/0x1b0 [spl]
  [25375.957823]  __cv_wait+0x30/0x40 [spl]
  [25375.958045]  zil_commit_impl+0x234/0xd30 [zfs]
  [25375.958263]  zil_commit+0x48/0x70 [zfs]
  [25375.958481]  zfs_create+0x544/0x7d0 [zfs]
  [25375.958698]  zpl_create+0xb8/0x178 [zfs]
  [25375.958711]  lookup_open+0x4ec/0x6a8
  [25375.958721]  do_last+0x260/0x8c0
  [25375.958730]  path_openat+0x84/0x258
  [25375.958739]  do_filp_open+0x84/0x108
  [25375.958752]  do_sys_open+0x180/0x2b0
  [25375.958763]  __arm64_sys_openat+0x2c/0x38
  [25375.958773]  el0_svc_common.constprop.0+0x80/0x218
  [25375.958781]  el0_svc_handler+0x34/0xa0
  [25375.958791]  el0_svc+0x10/0x2cc
  [25375.958801] INFO: task zbackup:95187 blocked for more than 120 seconds.
  [25375.965564]   Tainted: P C OE 5.4.0-1018-raspi #20-Ubuntu
  [25375.972492] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [25375.980479] zbackup D0 95187   5499 0x
  [25375.980493] Call trace:
  [25375.980514]  __switch_to+0x104/0x170
  [25375.980525]  __schedule+0x30c/0x7c0
  [25375.980536]  schedule+0x3c/0xb8
  [25375.980578]  cv_wait_common+0x188/0x1b0 [spl]
  [25375.980612]  __cv_wait+0x30/0x40 [spl]
  [25375.980834]  zil_commit_impl+0x234/0xd30 [zfs]
  [25375.981052]  zil_commit+0x48/0x70 [zfs]
  [25375.981280]  zfs_write+0xa3c/0xb90 [zfs]
  [25375.981498]  zpl_write_common_iovec+0xac/0x120 [zfs]
  [25375.981726]  zpl_iter_write+0xe4/0x150 [zfs]
  [25375.981766]  new_sync_write+0x100/0x1a8
  [25375.981776]  __vfs_write+0x74/0x90
  [25375.981784]  vfs_write+0xe4/0x1c8
  [

[Kernel-packages] [Bug 2051342] Re: Enable lowlatency settings in the generic kernel

2024-01-30 Thread Colin Ian King
Looks like Michael Larabel has done some analysis for you already :-)
https://www.phoronix.com/news/Ubuntu-Generic-LL-Kernel

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2051342

Title:
  Enable lowlatency settings in the generic kernel

Status in linux package in Ubuntu:
  New
Status in linux source package in Noble:
  New

Bug description:
  [Impact]

  Ubuntu provides the "lowlatency" kernel: a kernel optimized for
  applications that have special "low latency" requirements.

  Currently, this kernel does not include any specific UBUNTU SAUCE
  patches to address the extra "low latency" requirements; the only
  difference is a small subset of .config options.

  Almost all these options are now configurable either at boot-time or
  even at run-time, with the only exception of CONFIG_HZ (250 in the
  generic kernel vs 1000 in the lowlatency kernel).

  Maintaining a separate kernel for a single config option seems a bit
  excessive, and it carries a significant cost in engineering hours,
  build time, regression-testing time and resources, not to mention the
  risk of the low-latency kernel falling behind and not being perfectly
  in sync with the latest generic kernel.

  Enabling the low-latency settings in the generic kernel has been
  evaluated before, but it has never been finalized due to the
  potential risk of performance regressions in CPU-intensive
  applications (increasing HZ from 250 to 1000 may introduce more
  kernel jitter in number-crunching workloads). The outcome of the
  original proposal was a re-classification of the lowlatency kernel as
  a desktop-oriented kernel, enabling additional low-latency features
  (LP: #2023007).

  As we are approaching the release of the new Ubuntu 24.04 we may want
  to re-consider merging the low-latency settings in the generic kernel
  again.

  Following a detailed analysis of the specific low-latency features:

  - CONFIG_NO_HZ_FULL=y: enables access to "full tickless mode" (shut
  down the clock tick, when possible, on all enabled CPUs that are
  either idle or running a single task, reducing the kernel jitter that
  the periodic clock tick imposes on running tasks; must be enabled at
  boot time by passing `nohz_full=`); this can genuinely help
  CPU-intensive workloads and could provide much more benefit than the
  CONFIG_HZ difference (since it can potentially eliminate any kernel
  jitter on specific CPUs), so it should be enabled anyway, considering
  that it is configurable at boot time

   - CONFIG_RCU_NOCB_CPU=y: move RCU callbacks from softirq context to
  kthread context (reduces the time spent in softirqs with preemption
  disabled, improving overall system responsiveness, at the cost of a
  potential performance penalty, because RCU callbacks are then
  processed by kernel threads); this should be enabled as well, since
  it is configurable at boot time (via the rcu_nocbs= parameter)

   - CONFIG_RCU_LAZY=y: batch RCU callbacks and flush them after a
  timed delay instead of executing them immediately (can provide 5-10%
  power savings for idle or lightly-loaded systems, which is extremely
  useful for laptops / portable devices -
  https://lore.kernel.org/lkml/20221016162305.2489629-3-j...@joelfernandes.org/);
  this has the potential to introduce significant performance
  regressions, but in the Noble kernel we already have a SAUCE patch
  that allows enabling/disabling this option at boot time (see LP:
  #2045492), and by default it will be disabled
  (CONFIG_RCU_LAZY_DEFAULT_OFF=y)

   - CONFIG_HZ=1000: last but not least, the only option that is
  tunable *only* at compile time. As already mentioned, there is a
  potential risk of regressions for CPU-intensive applications, but
  these can be mitigated (and perhaps even outweighed) with NO_HZ_FULL.
  On the other hand, HZ=1000 can improve system responsiveness, which
  means most desktop and server applications will benefit from it (the
  largest part of server workloads is I/O-bound rather than CPU-bound,
  so they benefit from a kernel that can switch tasks faster), not to
  mention the benefit for typical end-user applications (gaming, live
  conferencing, multimedia, etc.).

  With all of that in place we can provide a kernel that has the
  flexibility to be more responsive, more performant and more power
  efficient (therefore more "generic"), simply by tuning run-time and
  boot-time options.

  Moreover, once these changes are applied we will be able to deprecate
  the lowlatency kernel, saving engineering time and also reducing power
  consumption (required to build the kernel and do all the testing).

  Optionally, we can also provide optimal "lowlatency" settings as a
  user-space package that would set the proper options in the kernel
  boot command line (GRUB, or similar).
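  A minimal sketch of what such an optional user-space package might
  install, assuming a hypothetical drop-in file name and purely
  illustrative CPU lists (a real package would need to choose these
  per-system):

```shell
# /etc/default/grub.d/lowlatency.cfg -- hypothetical drop-in; CPU lists are examples.
# nohz_full= enables full tickless mode on the listed CPUs (needs CONFIG_NO_HZ_FULL=y);
# rcu_nocbs= offloads their RCU callbacks to kthreads (needs CONFIG_RCU_NOCB_CPU=y).
GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT nohz_full=2-7 rcu_nocbs=2-7"
```

  After dropping such a file in place, running `sudo update-grub`
  regenerates the boot configuration so the parameters take effect on
  the next boot.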

  [Test case]

  

[Kernel-packages] [Bug 2051342] Re: Enable lowlatency settings in the generic kernel

2024-01-29 Thread Colin Ian King
@Andrea, that's a good start, but it may be worth running some of the
Phoronix Tests too as they are a good spread of use cases.


[Kernel-packages] [Bug 2051342] Re: Enable lowlatency settings in the generic kernel

2024-01-29 Thread Colin Ian King
It may be worth trying a wider range of synthetic benchmarks to see how
it affects scheduling, I/O, RCU and power consumption.


[Kernel-packages] [Bug 2049537] Re: Pull request for: peer-memory, ACPI thermal issues and coresight etm4x issues

2024-01-17 Thread Ian May
** Changed in: linux-nvidia-6.5 (Ubuntu)
   Status: New => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-6.5 in Ubuntu.
https://bugs.launchpad.net/bugs/2049537

Title:
  Pull request for: peer-memory, ACPI thermal issues and coresight
  etm4x issues

Status in linux-nvidia-6.5 package in Ubuntu:
  Fix Committed

Bug description:
  * Add support for "Thermal Fast Sampling Period" (_TFP) for passive cooling.
  * Finer-grained CPU throttling.
  * The peer_memory_client scheme allows a driver to register with the ib_umem
  system to indicate that it can understand user virtual address ranges that
  are not compatible with get_user_pages(), for instance VMAs created with
  io_remap_pfn_range() or other driver-special VMAs.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.5/+bug/2049537/+subscriptions




[Kernel-packages] [Bug 2048815] Re: Pull request to address TPM SPI devices

2024-01-11 Thread Ian May
** Changed in: linux-nvidia-6.5 (Ubuntu)
   Status: New => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-6.5 in Ubuntu.
https://bugs.launchpad.net/bugs/2048815

Title:
  Pull request to address TPM SPI devices

Status in linux-nvidia-6.5 package in Ubuntu:
  Fix Committed

Bug description:
  TPM devices may insert a wait state on the last clock cycle of the ADDR
  phase. For SPI controllers that support full-duplex transfers, this can
  be detected in software by reading the MISO line. For SPI controllers
  that only support half-duplex transfers, such as the Tegra QSPI, it is
  not possible to detect the wait signal from software.
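The full-duplex detection mentioned above can be modeled in miniature: the device holds MISO low while inserting wait states and raises a ready bit when the transfer may continue. The sketch below is a hypothetical illustration of that polling logic, not the actual TPM or Tegra driver code:

```python
def wait_state_cleared(miso_samples, max_polls=50):
    """Model of full-duplex wait-state detection: the device keeps MISO low
    while inserting wait states and drives bit 0 high when it is ready.

    miso_samples: iterable of MISO bytes sampled after the ADDR phase.
    Returns the number of polls taken, or None on timeout.
    """
    for polls, sample in enumerate(miso_samples, start=1):
        if sample & 0x01:   # ready bit observed on MISO
            return polls
        if polls >= max_polls:
            break
    return None             # timed out: device never signalled ready

# Example: device inserts two wait states, then signals ready.
print(wait_state_cleared([0x00, 0x00, 0x01]))  # -> 3
```

On a half-duplex controller this loop is impossible, because the controller cannot sample MISO while it is still driving the transmit phase, which is exactly the limitation described above.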

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.5/+bug/2048815/+subscriptions




[Kernel-packages] [Bug 2048966] Re: Fix soft lockup triggered by arm_smmu_mm_invalidate_range

2024-01-11 Thread Ian May
** Changed in: linux-nvidia-6.5 (Ubuntu)
   Status: New => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-6.5 in Ubuntu.
https://bugs.launchpad.net/bugs/2048966

Title:
  Fix soft lockup triggered by arm_smmu_mm_invalidate_range

Status in linux-nvidia-6.5 package in Ubuntu:
  Fix Committed

Bug description:
  [Problem]

  When running an SVA case, the following soft lockup is triggered:
  
  watchdog: BUG: soft lockup - CPU#244 stuck for 26s!
  pstate: 8349 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
  pc : arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
  lr : arm_smmu_cmdq_issue_cmdlist+0x150/0xa50
  sp : 8000d83ef290
  x29: 8000d83ef290 x28: 3b9aca00 x27: 
  x26: 8000d83ef3c0 x25: da86c0812194a0e8 x24: 
  x23: 0040 x22: 8000d83ef340 x21: c63980c0
  x20: 0001 x19: c6398080 x18: 
  x17:  x16:  x15: 3000b4a3bbb0
  x14: 3000b4a30888 x13: 3000b4a3cf60 x12: 
  x11:  x10:  x9 : c08120e4d6bc
  x8 :  x7 :  x6 : 00048cfa
  x5 :  x4 : 0001 x3 : 000a
  x2 : 8000 x1 :  x0 : 0001
  Call trace:
   arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
   __arm_smmu_tlb_inv_range+0x118/0x254
   arm_smmu_tlb_inv_range_asid+0x6c/0x130
   arm_smmu_mm_invalidate_range+0xa0/0xa4
   __mmu_notifier_invalidate_range_end+0x88/0x120
   unmap_vmas+0x194/0x1e0
   unmap_region+0xb4/0x144
   do_mas_align_munmap+0x290/0x490
   do_mas_munmap+0xbc/0x124
   __vm_munmap+0xa8/0x19c
   __arm64_sys_munmap+0x28/0x50
   invoke_syscall+0x78/0x11c
   el0_svc_common.constprop.0+0x58/0x1c0
   do_el0_svc+0x34/0x60
   el0_svc+0x2c/0xd4
   el0t_64_sync_handler+0x114/0x140
   el0t_64_sync+0x1a4/0x1a8
  

  
  [Fix]

  Backport the following upstream stable patch:
  d5afb4b47e13161b3f33904d45110f9e6463bad6

  Link:
  https://lore.kernel.org/r/20230920052257.8615-1-nicol...@nvidia.com

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.5/+bug/2048966/+subscriptions




[Kernel-packages] [Bug 2048966] Re: Fix soft lockup triggered by arm_smmu_mm_invalidate_range

2024-01-11 Thread Ian May
** Description changed:

+ [Problem]
+ 
  When running an SVA case, the following soft lockup is triggered:
- 
- watchdog: BUG: soft lockup - CPU#244 stuck for 26s!
- pstate: 8349 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
- pc : arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
- lr : arm_smmu_cmdq_issue_cmdlist+0x150/0xa50
- sp : 8000d83ef290
- x29: 8000d83ef290 x28: 3b9aca00 x27: 
- x26: 8000d83ef3c0 x25: da86c0812194a0e8 x24: 
- x23: 0040 x22: 8000d83ef340 x21: c63980c0
- x20: 0001 x19: c6398080 x18: 
- x17:  x16:  x15: 3000b4a3bbb0
- x14: 3000b4a30888 x13: 3000b4a3cf60 x12: 
- x11:  x10:  x9 : c08120e4d6bc
- x8 :  x7 :  x6 : 00048cfa
- x5 :  x4 : 0001 x3 : 000a
- x2 : 8000 x1 :  x0 : 0001
- Call trace:
-  arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
-  __arm_smmu_tlb_inv_range+0x118/0x254
-  arm_smmu_tlb_inv_range_asid+0x6c/0x130
-  arm_smmu_mm_invalidate_range+0xa0/0xa4
-  __mmu_notifier_invalidate_range_end+0x88/0x120
-  unmap_vmas+0x194/0x1e0
-  unmap_region+0xb4/0x144
-  do_mas_align_munmap+0x290/0x490
-  do_mas_munmap+0xbc/0x124
-  __vm_munmap+0xa8/0x19c
-  __arm64_sys_munmap+0x28/0x50
-  invoke_syscall+0x78/0x11c
-  el0_svc_common.constprop.0+0x58/0x1c0
-  do_el0_svc+0x34/0x60
-  el0_svc+0x2c/0xd4
-  el0t_64_sync_handler+0x114/0x140
-  el0t_64_sync+0x1a4/0x1a8
- 
+ 
+ watchdog: BUG: soft lockup - CPU#244 stuck for 26s!
+ pstate: 8349 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
+ pc : arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
+ lr : arm_smmu_cmdq_issue_cmdlist+0x150/0xa50
+ sp : 8000d83ef290
+ x29: 8000d83ef290 x28: 3b9aca00 x27: 
+ x26: 8000d83ef3c0 x25: da86c0812194a0e8 x24: 
+ x23: 0040 x22: 8000d83ef340 x21: c63980c0
+ x20: 0001 x19: c6398080 x18: 
+ x17:  x16:  x15: 3000b4a3bbb0
+ x14: 3000b4a30888 x13: 3000b4a3cf60 x12: 
+ x11:  x10:  x9 : c08120e4d6bc
+ x8 :  x7 :  x6 : 00048cfa
+ x5 :  x4 : 0001 x3 : 000a
+ x2 : 8000 x1 :  x0 : 0001
+ Call trace:
+  arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
+  __arm_smmu_tlb_inv_range+0x118/0x254
+  arm_smmu_tlb_inv_range_asid+0x6c/0x130
+  arm_smmu_mm_invalidate_range+0xa0/0xa4
+  __mmu_notifier_invalidate_range_end+0x88/0x120
+  unmap_vmas+0x194/0x1e0
+  unmap_region+0xb4/0x144
+  do_mas_align_munmap+0x290/0x490
+  do_mas_munmap+0xbc/0x124
+  __vm_munmap+0xa8/0x19c
+  __arm64_sys_munmap+0x28/0x50
+  invoke_syscall+0x78/0x11c
+  el0_svc_common.constprop.0+0x58/0x1c0
+  do_el0_svc+0x34/0x60
+  el0_svc+0x2c/0xd4
+  el0t_64_sync_handler+0x114/0x140
+  el0t_64_sync+0x1a4/0x1a8
+ 
+ 
+ 
+ [Fix]
+ 
+ backport the following upstream stable patch
+ d5afb4b47e13161b3f33904d45110f9e6463bad6
+ 
+ Link:
+ https://lore.kernel.org/r/20230920052257.8615-1-nicol...@nvidia.com

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-6.5 in Ubuntu.
https://bugs.launchpad.net/bugs/2048966

Title:
  Fix soft lockup triggered by arm_smmu_mm_invalidate_range

Status in linux-nvidia-6.5 package in Ubuntu:
  New

Bug description:
  [Problem]

  When running an SVA case, the following soft lockup is triggered:
  
  watchdog: BUG: soft lockup - CPU#244 stuck for 26s!
  pstate: 8349 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
  pc : arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
  lr : arm_smmu_cmdq_issue_cmdlist+0x150/0xa50
  sp : 8000d83ef290
  x29: 8000d83ef290 x28: 3b9aca00 x27: 
  x26: 8000d83ef3c0 x25: da86c0812194a0e8 x24: 
  x23: 0040 x22: 8000d83ef340 x21: c63980c0
  x20: 0001 x19: c6398080 x18: 
  x17:  x16:  x15: 3000b4a3bbb0
  x14: 3000b4a30888 x13: 

[Kernel-packages] [Bug 2048966] [NEW] Fix soft lockup triggered by arm_smmu_mm_invalidate_range

2024-01-10 Thread Ian May
Public bug reported:

When running an SVA case, the following soft lockup is triggered:

watchdog: BUG: soft lockup - CPU#244 stuck for 26s!
pstate: 8349 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
lr : arm_smmu_cmdq_issue_cmdlist+0x150/0xa50
sp : 8000d83ef290
x29: 8000d83ef290 x28: 3b9aca00 x27: 
x26: 8000d83ef3c0 x25: da86c0812194a0e8 x24: 
x23: 0040 x22: 8000d83ef340 x21: c63980c0
x20: 0001 x19: c6398080 x18: 
x17:  x16:  x15: 3000b4a3bbb0
x14: 3000b4a30888 x13: 3000b4a3cf60 x12: 
x11:  x10:  x9 : c08120e4d6bc
x8 :  x7 :  x6 : 00048cfa
x5 :  x4 : 0001 x3 : 000a
x2 : 8000 x1 :  x0 : 0001
Call trace:
 arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
 __arm_smmu_tlb_inv_range+0x118/0x254
 arm_smmu_tlb_inv_range_asid+0x6c/0x130
 arm_smmu_mm_invalidate_range+0xa0/0xa4
 __mmu_notifier_invalidate_range_end+0x88/0x120
 unmap_vmas+0x194/0x1e0
 unmap_region+0xb4/0x144
 do_mas_align_munmap+0x290/0x490
 do_mas_munmap+0xbc/0x124
 __vm_munmap+0xa8/0x19c
 __arm64_sys_munmap+0x28/0x50
 invoke_syscall+0x78/0x11c
 el0_svc_common.constprop.0+0x58/0x1c0
 do_el0_svc+0x34/0x60
 el0_svc+0x2c/0xd4
 el0t_64_sync_handler+0x114/0x140
 el0t_64_sync+0x1a4/0x1a8


** Affects: linux-nvidia-6.5 (Ubuntu)
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-6.5 in Ubuntu.
https://bugs.launchpad.net/bugs/2048966

Title:
  Fix soft lockup triggered by arm_smmu_mm_invalidate_range

Status in linux-nvidia-6.5 package in Ubuntu:
  New

Bug description:
  When running an SVA case, the following soft lockup is triggered:
  
  watchdog: BUG: soft lockup - CPU#244 stuck for 26s!
  pstate: 8349 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
  pc : arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
  lr : arm_smmu_cmdq_issue_cmdlist+0x150/0xa50
  sp : 8000d83ef290
  x29: 8000d83ef290 x28: 3b9aca00 x27: 
  x26: 8000d83ef3c0 x25: da86c0812194a0e8 x24: 
  x23: 0040 x22: 8000d83ef340 x21: c63980c0
  x20: 0001 x19: c6398080 x18: 
  x17:  x16:  x15: 3000b4a3bbb0
  x14: 3000b4a30888 x13: 3000b4a3cf60 x12: 
  x11:  x10:  x9 : c08120e4d6bc
  x8 :  x7 :  x6 : 00048cfa
  x5 :  x4 : 0001 x3 : 000a
  x2 : 8000 x1 :  x0 : 0001
  Call trace:
   arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
   __arm_smmu_tlb_inv_range+0x118/0x254
   arm_smmu_tlb_inv_range_asid+0x6c/0x130
   arm_smmu_mm_invalidate_range+0xa0/0xa4
   __mmu_notifier_invalidate_range_end+0x88/0x120
   unmap_vmas+0x194/0x1e0
   unmap_region+0xb4/0x144
   do_mas_align_munmap+0x290/0x490
   do_mas_munmap+0xbc/0x124
   __vm_munmap+0xa8/0x19c
   __arm64_sys_munmap+0x28/0x50
   invoke_syscall+0x78/0x11c
   el0_svc_common.constprop.0+0x58/0x1c0
   do_el0_svc+0x34/0x60
   el0_svc+0x2c/0xd4
   el0t_64_sync_handler+0x114/0x140
   el0t_64_sync+0x1a4/0x1a8
  

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.5/+bug/2048966/+subscriptions




[Kernel-packages] [Bug 2042697] Re: Pull request to address thermal core issues

2023-12-04 Thread Ian May
** Changed in: linux-nvidia-6.2 (Ubuntu)
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-6.2 in Ubuntu.
https://bugs.launchpad.net/bugs/2042697

Title:
  Pull request to address thermal core issues

Status in linux-nvidia-6.2 package in Ubuntu:
  Fix Released

Bug description:
  The Grace development team has not been testing the 6.2 Ubuntu kernel
  but instead a newer kernel. When they run their thermal tests on a 6.2
  kernel they are running into failures. Investigations have turned up
  several missing kernel patches. These patches are clean cherry-picks
  and have been tested and confirmed to fix the thermal issues we are
  seeing.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.2/+bug/2042697/+subscriptions




[Kernel-packages] [Bug 1995606] Re: Upgrade thermald to 2.5.1 for Jammy (22.04)

2023-11-23 Thread Colin Ian King
Thanks team Canonical for this \o/

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to thermald in Ubuntu.
https://bugs.launchpad.net/bugs/1995606

Title:
  Upgrade thermald to 2.5.1 for Jammy (22.04)

Status in OEM Priority Project:
  Fix Released
Status in thermald package in Ubuntu:
  Fix Released
Status in thermald source package in Jammy:
  Fix Released

Bug description:
  [Justification]
  The purpose of this bug is that prevent the regression in the future.
  The automatic test scripts are better for the future SRU and is still on the 
planning.

  [Test case]
  For these CPU series, RPL/ADL/TGL/CML/CFL/KBL, the following tests will be 
run on machines in the CI lab:

  1. Run stress-ng, and observe the temperature/frequency/power with s-tui.
    - Temperatures should stay just below trip values.
    - Power/performance profiles should stay roughly the same between old
  thermald and new thermald (unless a change is specifically expected,
  e.g. to fix premature/insufficient throttling).
  2. Check that thermald can read rules from /dev/acpi_thermal_rel and
  generate the XML file in /etc/thermald/ correctly.
    - This depends on whether acpi_thermal_rel exists.
    - If the machine supports acpi_thermal_rel, "thermal-conf.xml.auto"
  should land in /etc/thermald/.
    - If not, a user-defined XML file can be created; then jump to (3).
    - Run thermald with --loglevel=debug and compare the log with the
  xml.auto file; check that the configuration is parsed correctly.
  3. Check that thermal-conf.xml and thermal-cpu-cdev-order.xml can be
  loaded correctly.
    - Run thermald with --loglevel=debug and compare the log with the XML
  files.
    - If parsing succeeds, the configurations from the XML files will
  appear in the log.
  4. Run the unit tests (the scripts are under the test folder), using
  emul_temp to simulate high temperature, and check that thermald
  throttles the CPU through the related cooling device:
    - rapl.sh
    - intel_pstate.sh
    - powerclamp.sh
    - processor.sh
  5. Check that power/frequency is throttled once the temperature reaches
  the trip points of a thermal zone.
  6. Check whether the system is throttled even when the temperature is
  under the trip points.
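The XML checks in steps 2 and 3 can be rehearsed offline by parsing a configuration fragment. The snippet below is a made-up minimal example in the general shape of a thermald thermal-conf.xml, not a real machine's configuration:

```python
import xml.etree.ElementTree as ET

# Made-up minimal fragment in the general shape of thermald's thermal-conf.xml.
SAMPLE = """
<ThermalConfiguration>
  <Platform>
    <Name>Example</Name>
    <ThermalZones>
      <ThermalZone>
        <Type>cpu</Type>
        <TripPoints>
          <TripPoint>
            <Temperature>95000</Temperature>
            <type>passive</type>
          </TripPoint>
        </TripPoints>
      </ThermalZone>
    </ThermalZones>
  </Platform>
</ThermalConfiguration>
"""

def trip_temperatures(xml_text):
    """Return trip-point temperatures (millidegrees C) keyed by zone type."""
    root = ET.fromstring(xml_text)
    result = {}
    for zone in root.iter("ThermalZone"):
        ztype = zone.findtext("Type")
        result[ztype] = [int(t.text) for t in zone.iter("Temperature")]
    return result

print(trip_temperatures(SAMPLE))  # -> {'cpu': [95000]}
```

Comparing output like this against thermald's --loglevel=debug log is essentially what steps 2 and 3 do by hand.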

  [ Where problems could occur ]
  Since PL1 min/max is introduced, there may be cases that do not check
  the minimum of PL1, driving PL1 smaller and smaller and throttling the
  CPU.
  This could make machines behave like the old behavior without PL1
  min/max.

To manage notifications about this bug go to:
https://bugs.launchpad.net/oem-priority/+bug/1995606/+subscriptions




[Kernel-packages] [Bug 2040526] Re: Backport RDMA DMABUF

2023-11-14 Thread Ian May
** Changed in: linux (Ubuntu)
   Status: Incomplete => Won't Fix

** No longer affects: linux (Ubuntu Jammy)

** No longer affects: linux-nvidia (Ubuntu Jammy)

** Changed in: linux-nvidia (Ubuntu)
   Status: New => Fix Committed

** Changed in: linux-nvidia (Ubuntu)
 Assignee: (unassigned) => Ian May (ian-may)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2040526

Title:
  Backport RDMA DMABUF

Status in linux package in Ubuntu:
  Won't Fix
Status in linux-nvidia package in Ubuntu:
  Fix Committed

Bug description:
  SRU Justification:

  [Impact]

  * From Nvidia:

  "We are working on a high performance networking solution with real
  customers. That solution is being developed using the Ubuntu 22.04 LTS
  distro release and the distro kernel (lowlatency flavour). This
  “dma_buf” patchset consists of upstreamed patches that allow buffers
  to be shared between drivers thus enhancing performance while reducing
  copying of data.

  Our team is currently engaged in the development of a high-performance
  networking solution tailored to meet the demands of real-world
  customers. This cutting-edge solution is being crafted on the
  foundation of Ubuntu 22.04 LTS, utilizing the distribution's kernel,
  specifically the lowlatency flavor.

  At the heart of our innovation lies the transformative "dma_buf"
  patchset, comprising a series of patches that have been integrated
  into the upstream kernel in 5.16 and 5.17. These patches introduce a
  groundbreaking capability: enabling the seamless sharing of buffers
  among various drivers. This not only bolsters the solution's
  performance but also minimizes the need for data copying, effectively
  enhancing efficiency across the board.

  The new functionality is isolated such that existing user will not
  execute these new code paths."

  Upstream Reference

  https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/
  https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/
  The patch "[PATCH 1/4] net/mlx5: Add IFC bits for mkey ATS" is already in 
Jammy and was included in:
  "2a1e6097e9b9 UBUNTU: SAUCE: RDMA/core: Updated ib_peer_memory"

  [Test Plan]

  * Testing instructions are outlined in the SF case and has been tested
  on in house hardware and externally by Nvidia.

  [Where problems could occur?]

  * This introduces new code paths so regression potential should be
  low.

  [Other Info]

  * SF#00370664

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2040526/+subscriptions




[Kernel-packages] [Bug 2040526] Re: Backport RDMA DMABUF

2023-11-14 Thread Ian May
** Package changed: linux-nvidia (Ubuntu) => linux (Ubuntu)

** Also affects: linux-nvidia (Ubuntu)
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2040526

Title:
  Backport RDMA DMABUF

Status in linux package in Ubuntu:
  Won't Fix
Status in linux-nvidia package in Ubuntu:
  Fix Committed

Bug description:
  SRU Justification:

  [Impact]

  * From Nvidia:

  "We are working on a high performance networking solution with real
  customers. That solution is being developed using the Ubuntu 22.04 LTS
  distro release and the distro kernel (lowlatency flavour). This
  “dma_buf” patchset consists of upstreamed patches that allow buffers
  to be shared between drivers thus enhancing performance while reducing
  copying of data.

  Our team is currently engaged in the development of a high-performance
  networking solution tailored to meet the demands of real-world
  customers. This cutting-edge solution is being crafted on the
  foundation of Ubuntu 22.04 LTS, utilizing the distribution's kernel,
  specifically the lowlatency flavor.

  At the heart of our innovation lies the transformative "dma_buf"
  patchset, comprising a series of patches that have been integrated
  into the upstream kernel in 5.16 and 5.17. These patches introduce a
  groundbreaking capability: enabling the seamless sharing of buffers
  among various drivers. This not only bolsters the solution's
  performance but also minimizes the need for data copying, effectively
  enhancing efficiency across the board.

  The new functionality is isolated such that existing user will not
  execute these new code paths."

  Upstream Reference

  https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/
  https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/
  The patch "[PATCH 1/4] net/mlx5: Add IFC bits for mkey ATS" is already in 
Jammy and was included in:
  "2a1e6097e9b9 UBUNTU: SAUCE: RDMA/core: Updated ib_peer_memory"

  [Test Plan]

  * Testing instructions are outlined in the SF case and has been tested
  on in house hardware and externally by Nvidia.

  [Where problems could occur?]

  * This introduces new code paths so regression potential should be
  low.

  [Other Info]

  * SF#00370664

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2040526/+subscriptions




[Kernel-packages] [Bug 2040526] Re: Backport RDMA DMABUF

2023-11-14 Thread Ian May
** Package changed: linux (Ubuntu) => linux-nvidia (Ubuntu)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia in Ubuntu.
https://bugs.launchpad.net/bugs/2040526

Title:
  Backport RDMA DMABUF

Status in linux-nvidia package in Ubuntu:
  Incomplete
Status in linux-nvidia source package in Jammy:
  Incomplete

Bug description:
  SRU Justification:

  [Impact]

  * From Nvidia:

  "We are working on a high performance networking solution with real
  customers. That solution is being developed using the Ubuntu 22.04 LTS
  distro release and the distro kernel (lowlatency flavour). This
  “dma_buf” patchset consists of upstreamed patches that allow buffers
  to be shared between drivers thus enhancing performance while reducing
  copying of data.

  Our team is currently engaged in the development of a high-performance
  networking solution tailored to meet the demands of real-world
  customers. This cutting-edge solution is being crafted on the
  foundation of Ubuntu 22.04 LTS, utilizing the distribution's kernel,
  specifically the lowlatency flavor.

  At the heart of our innovation lies the transformative "dma_buf"
  patchset, comprising a series of patches that have been integrated
  into the upstream kernel in 5.16 and 5.17. These patches introduce a
  groundbreaking capability: enabling the seamless sharing of buffers
  among various drivers. This not only bolsters the solution's
  performance but also minimizes the need for data copying, effectively
  enhancing efficiency across the board.

  The new functionality is isolated such that existing user will not
  execute these new code paths."

  Upstream Reference

  https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/
  https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/
  The patch "[PATCH 1/4] net/mlx5: Add IFC bits for mkey ATS" is already in 
Jammy and was included in:
  "2a1e6097e9b9 UBUNTU: SAUCE: RDMA/core: Updated ib_peer_memory"

  [Test Plan]

  * Testing instructions are outlined in the SF case and has been tested
  on in house hardware and externally by Nvidia.

  [Where problems could occur?]

  * This introduces new code paths so regression potential should be
  low.

  [Other Info]

  * SF#00370664

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia/+bug/2040526/+subscriptions




[Kernel-packages] [Bug 2043059] Re: Installation errors out when installing in a chroot

2023-11-10 Thread Ian May
I don't appear to have access to the image file used in the reproducer.
http://bright-dev.nvidia.com/base-distributions/x86_64/dgx-os/dgx-os-6.1-trd4/DGXOS-6.1.0-DGX-H100.tar.gz

So instead I'm using the following image for reproducing.
https://cloud-images.ubuntu.com/jammy/20231027/jammy-server-cloudimg-amd64.tar.gz

The error indicates to me that it can't find the root device.  If I
don't bind mount /dev into my image, I'm able to recreate the error with
both linux-generic and linux-nvidia.  With the host /dev mounted into
the chroot both kernels are able to call mkinitramfs successfully.

Can you confirm that 'cm-chroot-sw-img' is mounting /dev?
mount | grep /cm/images/dgx-h100-image/dev

If we are lucky and it happens to not be mounted, could you try the following:
sudo mount --bind /dev /cm/images/dgx-h100-image/dev
sudo chroot /cm/images/dgx-h100-image
/etc/kernel/postinst.d/kdump-tools 5.15.0-1040-nvidia

If /dev is correctly mounted and the problem persists, I'll probably need
a way to get that image tar to further investigate.
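The /dev check above can also be scripted. A minimal heuristic (paths here are placeholders) is that a bind-mounted /dev should expose /dev/null as a character device inside the chroot:

```python
import os
import stat

def dev_looks_mounted(chroot_path):
    """Heuristic: if /dev is bind-mounted into the chroot, /dev/null
    should exist there as a character device."""
    null_path = os.path.join(chroot_path, "dev", "null")
    try:
        st = os.stat(null_path)
    except OSError:
        return False            # missing entirely: /dev is not populated
    return stat.S_ISCHR(st.st_mode)

# On the host itself, "/" trivially has a populated /dev.
print(dev_looks_mounted("/"))   # expected True on a normal Linux host
```

Without that device node, mkinitramfs cannot resolve the root device, which matches the "failed to determine device for /" error in the report.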

Thanks,
Ian

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia in Ubuntu.
https://bugs.launchpad.net/bugs/2043059

Title:
  Installation errors out when installing in a chroot

Status in linux-nvidia package in Ubuntu:
  New

Bug description:
  Processing triggers for linux-image-5.15.0-1040-nvidia (5.15.0-1040.40) ...
  /etc/kernel/postinst.d/dkms:
   * dkms: running auto installation service for kernel 5.15.0-1040-nvidia
 ...done.
  /etc/kernel/postinst.d/initramfs-tools:
  update-initramfs: Generating /boot/initrd.img-5.15.0-1040-nvidia
  cryptsetup: WARNING: Couldn't determine root device
  W: Couldn't identify type of root file system for fsck hook
  cp: cannot stat '/etc/iscsi/initiatorname.iscsi': No such file or directory
  /etc/kernel/postinst.d/kdump-tools:
  kdump-tools: Generating /var/lib/kdump/initrd.img-5.15.0-1040-nvidia
  mkinitramfs: failed to determine device for /
  mkinitramfs: workaround is MODULES=most, check:
  grep -r MODULES /var/lib/kdump/initramfs-tools

  Error please report bug on initramfs-tools
  Include the output of 'mount' and 'cat /proc/mounts'
  update-initramfs: failed for /var/lib/kdump/initrd.img-5.15.0-1040-nvidia 
with 1.
  run-parts: /etc/kernel/postinst.d/kdump-tools exited with return code 1
  dpkg: error processing package linux-image-5.15.0-1040-nvidia (--configure):
   installed linux-image-5.15.0-1040-nvidia package post-installation script 
subprocess returned error exit status 1
  Errors were encountered while processing:
   linux-image-5.15.0-1040-nvidia
  E: Sub-process /usr/bin/dpkg returned an error code (1)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia/+bug/2043059/+subscriptions




[Kernel-packages] [Bug 1738534] Re: Processor turbo disabled/throttled after suspend

2023-11-09 Thread Colin Ian King
** Summary changed:

- Processor turbo dsiabled/throttled after suspend
+ Processor turbo disabled/throttled after suspend

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to thermald in Ubuntu.
https://bugs.launchpad.net/bugs/1738534

Title:
  Processor turbo disabled/throttled after suspend

Status in linux package in Ubuntu:
  Confirmed
Status in thermald package in Ubuntu:
  New

Bug description:
  After suspending/resuming my laptop on battery power, I noticed choppy
  video playback.  I've narrowed it down to the CPU being locked to
  lower frequencies after suspend/resume (only on battery).  Plugging
  the laptop back in does not restore the normal performance, nor does
  suspend/resume after plugging it back in.  The performance doesn't
  drop until after the suspend/resume, I'm not sure if it is _supposed_
  to throttle when on battery, but either way the behaviour is wrong.

  Doing a full shutdown and restart restores the performance to normal.

  Prior to a suspend/resume cycle, cpupower reports:

  $ sudo cpupower frequency-info
  analyzing CPU 0:
driver: intel_pstate
CPUs which run at the same hardware frequency: 0
CPUs which need to have their frequency coordinated by software: 0
maximum transition latency:  Cannot determine or is not supported.
hardware limits: 800 MHz - 3.00 GHz
available cpufreq governors: performance powersave
current policy: frequency should be within 800 MHz and 3.00 GHz.
The governor "powersave" may decide which speed to use
within this range.
current CPU frequency: Unable to call hardware
current CPU frequency: 1.26 GHz (asserted by call to kernel)
boost state support:
  Supported: yes
  Active: yes
  2800 MHz max turbo 4 active cores
  2800 MHz max turbo 3 active cores
  2800 MHz max turbo 2 active cores
  3000 MHz max turbo 1 active cores

  Afterwards, the frequency is clamped (cpufreq-set -r --max=3.0GHz has
  no effect) and turbo is disabled:

  $ sudo cpupower frequency-info
  analyzing CPU 0:
driver: intel_pstate
CPUs which run at the same hardware frequency: 0
CPUs which need to have their frequency coordinated by software: 0
maximum transition latency:  Cannot determine or is not supported.
hardware limits: 800 MHz - 3.00 GHz
available cpufreq governors: performance powersave
current policy: frequency should be within 800 MHz and 1.80 GHz.
The governor "powersave" may decide which speed to use
within this range.
current CPU frequency: Unable to call hardware
current CPU frequency: 950 MHz (asserted by call to kernel)
boost state support:
  Supported: no
  Active: no
  2800 MHz max turbo 4 active cores
  2800 MHz max turbo 3 active cores
  2800 MHz max turbo 2 active cores
  3000 MHz max turbo 1 active cores

  Trying to re-enable turbo mode by setting the no_turbo intel_pstate
  /sys/ entry back to 0 is rejected:

  $ echo 0 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
  0
  tee: /sys/devices/system/cpu/intel_pstate/no_turbo: Operation not permitted

  However, these two commands *do* work around the problem, forcing
  turbo mode back on and then restoring the normal frequency range:

  sudo x86_energy_perf_policy --turbo-enable 1
  sudo cpufreq-set -r --min=0.8GHz --max=3.0GHz

  I also see this error in dmesg after some resumes (but the above
  problem sometimes happens without this error message):

  Dec 16 11:36:25 shauns-laptop kernel: intel_pstate: Turbo disabled by
  BIOS or unavailable on processor

  ProblemType: Bug
  DistroRelease: Ubuntu 17.10
  Package: linux-image-4.13.0-19-generic 4.13.0-19.22
  ProcVersionSignature: Ubuntu 4.13.0-19.22-generic 4.13.13
  Uname: Linux 4.13.0-19-generic x86_64
  ApportVersion: 2.20.7-0ubuntu3.6
  Architecture: amd64
  AudioDevicesInUse:
   USERPID ACCESS COMMAND
   /dev/snd/controlC0:  shaun  1194 F pulseaudio
  CurrentDesktop: ubuntu:GNOME
  Date: Sat Dec 16 11:18:12 2017
  InstallationDate: Installed on 2017-12-14 (1 days ago)
  InstallationMedia: Ubuntu 17.10 "Artful Aardvark" - Release amd64 (20171018)
  Lsusb:
   Bus 001 Device 004: ID 2232:1024 Silicon Motion 
   Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
   Bus 003 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
   Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  MachineType: SAMSUNG ELECTRONICS CO., LTD. 900X3C/900X3D/900X4C/900X4D
  ProcFB: 0 inteldrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/@/boot/vmlinuz-4.13.0-19-generic 
root=UUID=7352de8c-0017-44e1-81fb-0145ad9c1185 ro rootflags=subvol=@ quiet 
splash vt.handoff=7
  RelatedPackageVersions:
   linux-restricted-modules-4.13.0-19-generic N/A
   linux-backports-modules-4.13.0-19-generic  N/A
   

[Kernel-packages] [Bug 1901266] Re: system sluggish, thermal keep frequency at 400MHz

2023-11-09 Thread Colin Ian King
This bug report has not seen any further follow-up for 2+ years. Closing
it. If it is still not fixed please re-open this issue.

** Changed in: thermald (Ubuntu)
   Status: Incomplete => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to thermald in Ubuntu.
https://bugs.launchpad.net/bugs/1901266

Title:
  system sluggish, thermal keep frequency at 400MHz

Status in thermald package in Ubuntu:
  Fix Released

Bug description:
  This morning I upgraded to 20.10 from 20.04

  The system was quite slow although I have a fast machine. My virtual Windows 
10 on VirtualBox became unusable. When I tried to have the virtual machine 
open, I could not participate properly in a Zoom call (I could still hear the 
people, but they said that my voice was very choppy).
  On 20.04 I was super-happy with the speed and I could have as many apps 
running as I wanted.

  Based on google I started looking at 
  % journalctl --follow
  and this shows quite a few errors but not repeating often enough to explain 
it.

  Then I googled some more and found that /boot/efi was writing and
  reading.

  Then I googled some more and thought I had trouble with gnome. So I reset it 
to default
  % dconf reset -f /org/gnome/
  and disabled the extensions. This made things slightly better but by far not 
acceptable.

  After lots of searching I checked the frequency of the CPUs and it was at the 
minimum 400 MHz (as shown by i7z and also other tools). I tried setting the 
governor with cpufreqctl and similar methods but this did not change anything.
  I then found an old bug 
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1769236 and tried 
  % sudo systemctl stop thermald
  this seems to work. After a few seconds the frequency shown in i7z goes to 
~4500 MHz and the virtual machine seems to work fine.

  ProblemType: Bug
  DistroRelease: Ubuntu 20.10
  Package: thermald 2.3-4
  ProcVersionSignature: Ubuntu 5.8.0-25.26-generic 5.8.14
  Uname: Linux 5.8.0-25-generic x86_64
  NonfreeKernelModules: wl
  ApportVersion: 2.20.11-0ubuntu50
  Architecture: amd64
  CasperMD5CheckResult: skip
  CurrentDesktop: ubuntu:GNOME
  Date: Sat Oct 24 01:39:40 2020
  DistributionChannelDescriptor:
   # This is the distribution channel descriptor for the OEM CDs
   # For more information see 
http://wiki.ubuntu.com/DistributionChannelDescriptor
   canonical-oem-somerville-bionic-amd64-20180608-47+merion+X66
  InstallationDate: Installed on 2019-09-27 (392 days ago)
  InstallationMedia: Ubuntu 18.04 "Bionic" - Build amd64 LIVE Binary 
20180608-09:38
  SourcePackage: thermald
  UpgradeStatus: Upgraded to groovy on 2020-10-23 (0 days ago)
  mtime.conffile..etc.thermald.thermal-conf.xml: 2020-10-24T01:35:59.781865

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/thermald/+bug/1901266/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2041670] [NEW] tmpfs: O_DIRECT | O_CREAT open reports open failure but actually creates a file.

2023-10-27 Thread Colin Ian King
Public bug reported:

Creating a file on tmpfs with open(filename, O_RDWR | O_DIRECT |
O_CREAT, 0666) reports an open failure with errno EINVAL, but still creates
the file. The file should not be created when the open fails with such an error.

Tested and fails on:
mantic amd64: 6.5.0-10-generic
lunar amd64: 6.2.0-35-generic
jammy amd64: 5.15.0-generic
focal: 5.4.0-165-generic
bionic: 4.15.0-213-generic
trusty: 4.4.0-148-generic


sudo mkdir /mnt/tmpfs
sudo mount -t tmpfs -o size=1G,nr_inodes=10k,mode=777 tmpfs /mnt/tmpfs
sudo chmod 666 /mnt/tmpfs
gcc reproducer.c -o reproducer
sudo ./reproducer

Run the attached program. It reports an open failure (errno 22, EINVAL)
but still manages to create the file.

Note this was originally discovered by running stress-ng on tmpfs with the
open stressor: stress-ng --open 1

** Affects: linux
 Importance: Unknown
 Status: Unknown

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: New

** Attachment added: "C source to reproduce the issue"
   
https://bugs.launchpad.net/bugs/2041670/+attachment/5713768/+files/reproducer.c

** Bug watch added: Linux Kernel Bug Tracker #218049
   https://bugzilla.kernel.org/show_bug.cgi?id=218049

** Also affects: linux via
   https://bugzilla.kernel.org/show_bug.cgi?id=218049
   Importance: Unknown
   Status: Unknown

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2041670

Title:
   tmpfs: O_DIRECT | O_CREAT open reports open failure but actually
  creates a file.

Status in Linux:
  Unknown
Status in linux package in Ubuntu:
  New

Bug description:
  Creating a file on tmpfs with open(filename, O_RDWR | O_DIRECT |
  O_CREAT, 0666) reports an open failure with errno EINVAL, but still creates
  the file. The file should not be created when the open fails with such an error.

  Tested and fails on:
  mantic amd64: 6.5.0-10-generic
  lunar amd64: 6.2.0-35-generic
  jammy amd64: 5.15.0-generic
  focal: 5.4.0-165-generic
  bionic: 4.15.0-213-generic
  trusty: 4.4.0-148-generic

  
  sudo mkdir /mnt/tmpfs
  sudo mount -t tmpfs -o size=1G,nr_inodes=10k,mode=777 tmpfs /mnt/tmpfs
  sudo chmod 666 /mnt/tmpfs
  gcc reproducer.c -o reproducer
  sudo ./reproducer

  Run the attached program. It reports an open failure (errno 22,
  EINVAL) but still manages to create the file.

  Note this was originally discovered by running stress-ng on tmpfs with
  the open stressor: stress-ng --open 1

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/2041670/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2040526] Re: Backport RDMA DMABUF

2023-10-25 Thread Ian May
** Changed in: linux (Ubuntu)
   Status: Incomplete => New

** Description changed:

  SRU Justification:
  
  [Impact]
  
- From Nvidia:
+ *From Nvidia:
  
  "We are working on a high performance networking solution with real
  customers. That solution is being developed using the Ubuntu 22.04 LTS
  distro release and the distro kernel (lowlatency flavour). This
  “dma_buf” patchset consists of upstreamed patches that allow buffers to
  be shared between drivers thus enhancing performance while reducing
  copying of data.
  
  Our team is currently engaged in the development of a high-performance
  networking solution tailored to meet the demands of real-world
  customers. This cutting-edge solution is being crafted on the foundation
  of Ubuntu 22.04 LTS, utilizing the distribution's kernel, specifically
  the lowlatency flavor.
  
  At the heart of our innovation lies the transformative "dma_buf"
  patchset, comprising a series of patches that have been integrated into
  the upstream kernel in 5.16 and 5.17. These patches introduce a
  groundbreaking capability: enabling the seamless sharing of buffers
  among various drivers. This not only bolsters the solution's performance
  but also minimizes the need for data copying, effectively enhancing
  efficiency across the board.
  
  The new functionality is isolated such that existing user will not
  execute these new code paths."
  
  Upstream Reference
  
  https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/
  https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/
  The patch "[PATCH 1/4] net/mlx5: Add IFC bits for mkey ATS" is already in 
Jammy and was included in:
  "2a1e6097e9b9 UBUNTU: SAUCE: RDMA/core: Updated ib_peer_memory"
  
  [Test Plan]
  
- Testing instructions are outlined in the SF case and has been tested on
+ *Testing instructions are outlined in the SF case and has been tested on
  in house hardware and externally by Nvidia.
  
  [Where problems could occur?]
  
- This introduces new code paths so regression potential should be low.
+ *This introduces new code paths so regression potential should be low.
  
  [Other Info]
- SF#00370664
+ 
+ *SF#00370664

** Description changed:

  SRU Justification:
  
  [Impact]
  
- *From Nvidia:
+ * From Nvidia:
  
  "We are working on a high performance networking solution with real
  customers. That solution is being developed using the Ubuntu 22.04 LTS
  distro release and the distro kernel (lowlatency flavour). This
  “dma_buf” patchset consists of upstreamed patches that allow buffers to
  be shared between drivers thus enhancing performance while reducing
  copying of data.
  
  Our team is currently engaged in the development of a high-performance
  networking solution tailored to meet the demands of real-world
  customers. This cutting-edge solution is being crafted on the foundation
  of Ubuntu 22.04 LTS, utilizing the distribution's kernel, specifically
  the lowlatency flavor.
  
  At the heart of our innovation lies the transformative "dma_buf"
  patchset, comprising a series of patches that have been integrated into
  the upstream kernel in 5.16 and 5.17. These patches introduce a
  groundbreaking capability: enabling the seamless sharing of buffers
  among various drivers. This not only bolsters the solution's performance
  but also minimizes the need for data copying, effectively enhancing
  efficiency across the board.
  
  The new functionality is isolated such that existing user will not
  execute these new code paths."
  
  Upstream Reference
  
  https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/
  https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/
  The patch "[PATCH 1/4] net/mlx5: Add IFC bits for mkey ATS" is already in 
Jammy and was included in:
  "2a1e6097e9b9 UBUNTU: SAUCE: RDMA/core: Updated ib_peer_memory"
  
  [Test Plan]
  
- *Testing instructions are outlined in the SF case and has been tested on
- in house hardware and externally by Nvidia.
+ * Testing instructions are outlined in the SF case and has been tested
+ on in house hardware and externally by Nvidia.
  
  [Where problems could occur?]
  
- *This introduces new code paths so regression potential should be low.
+ * This introduces new code paths so regression potential should be low.
  
  [Other Info]
  
- *SF#00370664
+ * SF#00370664

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2040526

Title:
  Backport RDMA DMABUF

Status in linux package in Ubuntu:
  New
Status in linux source package in Jammy:
  New

Bug description:
  SRU Justification:

  [Impact]

  * From Nvidia:

  "We are working on a high performance networking solution with real
  customers. That solution is being developed using the Ubuntu 22.04 LTS
  distro release and the distro kernel (lowlatency flavour). This
  “dma_buf” patchset consists 

[Kernel-packages] [Bug 2040526] Re: Backport RDMA DMABUF

2023-10-25 Thread Ian May
** Description changed:

  SRU Justification:
  
  [Impact]
  
  From Nvidia:
  
  "We are working on a high performance networking solution with real
  customers. That solution is being developed using the Ubuntu 22.04 LTS
  distro release and the distro kernel (lowlatency flavour). This
  “dma_buf” patchset consists of upstreamed patches that allow buffers to
  be shared between drivers thus enhancing performance while reducing
  copying of data.
  
  Our team is currently engaged in the development of a high-performance
  networking solution tailored to meet the demands of real-world
  customers. This cutting-edge solution is being crafted on the foundation
  of Ubuntu 22.04 LTS, utilizing the distribution's kernel, specifically
  the lowlatency flavor.
  
  At the heart of our innovation lies the transformative "dma_buf"
  patchset, comprising a series of patches that have been integrated into
  the upstream kernel in 5.16 and 5.17. These patches introduce a
  groundbreaking capability: enabling the seamless sharing of buffers
  among various drivers. This not only bolsters the solution's performance
  but also minimizes the need for data copying, effectively enhancing
  efficiency across the board.
  
  The new functionality is isolated such that existing user will not
  execute these new code paths."
  
  Upstream Reference
  
  https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/
  https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/
- The patch "[PATCH 1/4] net/mlx5: Add IFC bits for mkey ATS" is already in 
Jammy and was included in: 
+ The patch "[PATCH 1/4] net/mlx5: Add IFC bits for mkey ATS" is already in 
Jammy and was included in:
  "2a1e6097e9b9 UBUNTU: SAUCE: RDMA/core: Updated ib_peer_memory"
  
  [Test Plan]
  
  Testing instructions are outlined in the SF case and has been tested on
- local hardware and also by Nvidia.
+ in house hardware and externally by Nvidia.
  
  [Where problems could occur?]
  
  This introduces new code paths so regression potential should be low.
  
  [Other Info]
  SF#00370664

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2040526

Title:
  Backport RDMA DMABUF

Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Jammy:
  New

Bug description:
  SRU Justification:

  [Impact]

  From Nvidia:

  "We are working on a high performance networking solution with real
  customers. That solution is being developed using the Ubuntu 22.04 LTS
  distro release and the distro kernel (lowlatency flavour). This
  “dma_buf” patchset consists of upstreamed patches that allow buffers
  to be shared between drivers thus enhancing performance while reducing
  copying of data.

  Our team is currently engaged in the development of a high-performance
  networking solution tailored to meet the demands of real-world
  customers. This cutting-edge solution is being crafted on the
  foundation of Ubuntu 22.04 LTS, utilizing the distribution's kernel,
  specifically the lowlatency flavor.

  At the heart of our innovation lies the transformative "dma_buf"
  patchset, comprising a series of patches that have been integrated
  into the upstream kernel in 5.16 and 5.17. These patches introduce a
  groundbreaking capability: enabling the seamless sharing of buffers
  among various drivers. This not only bolsters the solution's
  performance but also minimizes the need for data copying, effectively
  enhancing efficiency across the board.

  The new functionality is isolated such that existing user will not
  execute these new code paths."

  Upstream Reference

  https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/
  https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/
  The patch "[PATCH 1/4] net/mlx5: Add IFC bits for mkey ATS" is already in 
Jammy and was included in:
  "2a1e6097e9b9 UBUNTU: SAUCE: RDMA/core: Updated ib_peer_memory"

  [Test Plan]

  Testing instructions are outlined in the SF case and have been tested
  on in-house hardware and externally by Nvidia.

  [Where problems could occur?]

  This introduces new code paths so regression potential should be low.

  [Other Info]
  SF#00370664

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2040526/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2040526] Re: Backport RDMA DMABUF

2023-10-25 Thread Ian May
** Description changed:

  SRU Justification:
  
  [Impact]
  
  From Nvidia:
  
  "We are working on a high performance networking solution with real
  customers. That solution is being developed using the Ubuntu 22.04 LTS
  distro release and the distro kernel (lowlatency flavour). This
  “dma_buf” patchset consists of upstreamed patches that allow buffers to
  be shared between drivers thus enhancing performance while reducing
  copying of data.
  
  Our team is currently engaged in the development of a high-performance
  networking solution tailored to meet the demands of real-world
  customers. This cutting-edge solution is being crafted on the foundation
  of Ubuntu 22.04 LTS, utilizing the distribution's kernel, specifically
  the lowlatency flavor.
  
  At the heart of our innovation lies the transformative "dma_buf"
  patchset, comprising a series of patches that have been integrated into
  the upstream kernel in 5.16 and 5.17. These patches introduce a
  groundbreaking capability: enabling the seamless sharing of buffers
  among various drivers. This not only bolsters the solution's performance
  but also minimizes the need for data copying, effectively enhancing
  efficiency across the board.
  
  The new functionality is isolated such that existing user will not
  execute these new code paths."
  
  Upstream Reference
+ 
  https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/
  https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/
+ The patch "[PATCH 1/4] net/mlx5: Add IFC bits for mkey ATS" is already in 
Jammy and was included in: 
+ "2a1e6097e9b9 UBUNTU: SAUCE: RDMA/core: Updated ib_peer_memory"
  
  [Test Plan]
  
  Testing instructions are outlined in the SF case and has been tested on
  local hardware and also by Nvidia.
  
  [Where problems could occur?]
  
  This introduces new code paths so regression potential should be low.
  
  [Other Info]
  SF#00370664

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2040526

Title:
  Backport RDMA DMABUF

Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Jammy:
  New

Bug description:
  SRU Justification:

  [Impact]

  From Nvidia:

  "We are working on a high performance networking solution with real
  customers. That solution is being developed using the Ubuntu 22.04 LTS
  distro release and the distro kernel (lowlatency flavour). This
  “dma_buf” patchset consists of upstreamed patches that allow buffers
  to be shared between drivers thus enhancing performance while reducing
  copying of data.

  Our team is currently engaged in the development of a high-performance
  networking solution tailored to meet the demands of real-world
  customers. This cutting-edge solution is being crafted on the
  foundation of Ubuntu 22.04 LTS, utilizing the distribution's kernel,
  specifically the lowlatency flavor.

  At the heart of our innovation lies the transformative "dma_buf"
  patchset, comprising a series of patches that have been integrated
  into the upstream kernel in 5.16 and 5.17. These patches introduce a
  groundbreaking capability: enabling the seamless sharing of buffers
  among various drivers. This not only bolsters the solution's
  performance but also minimizes the need for data copying, effectively
  enhancing efficiency across the board.

  The new functionality is isolated such that existing user will not
  execute these new code paths."

  Upstream Reference

  https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/
  https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/
  The patch "[PATCH 1/4] net/mlx5: Add IFC bits for mkey ATS" is already in 
Jammy and was included in: 
  "2a1e6097e9b9 UBUNTU: SAUCE: RDMA/core: Updated ib_peer_memory"

  [Test Plan]

  Testing instructions are outlined in the SF case and have been tested
  on local hardware and also by Nvidia.

  [Where problems could occur?]

  This introduces new code paths so regression potential should be low.

  [Other Info]
  SF#00370664

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2040526/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2040526] Re: Backport DMABUF functionality

2023-10-25 Thread Ian May
** Description changed:

  SRU Justification:
  
  [Impact]
  
- Backport RDMA DMABUF functionality
+ Backport RDMA DMABUF
  
- Nvidia is working on a high performance networking solution with real
+ From Nvidia:
+ 
+ "We are working on a high performance networking solution with real
  customers. That solution is being developed using the Ubuntu 22.04 LTS
  distro release and the distro kernel (lowlatency flavour). This
  “dma_buf” patchset consists of upstreamed patches that allow buffers to
  be shared between drivers thus enhancing performance while reducing
  copying of data.
  
  Our team is currently engaged in the development of a high-performance
  networking solution tailored to meet the demands of real-world
  customers. This cutting-edge solution is being crafted on the foundation
  of Ubuntu 22.04 LTS, utilizing the distribution's kernel, specifically
  the lowlatency flavor.
  
  At the heart of our innovation lies the transformative "dma_buf"
  patchset, comprising a series of patches that have been integrated into
  the upstream kernel in 5.16 and 5.17. These patches introduce a
  groundbreaking capability: enabling the seamless sharing of buffers
  among various drivers. This not only bolsters the solution's performance
  but also minimizes the need for data copying, effectively enhancing
  efficiency across the board.
  
  The new functionality is isolated such that existing user will not
- execute these new code paths.
+ execute these new code paths."
  
- * First 3 patches adds a new api to the RDMA subsystem that allows drivers to 
get a pinned dmabuf memory
- region without requiring an implementation of the move_notify callback.
- 
+ Upstream Reference
  https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/
- 
- * The remaining patches add support for DMABUF when creating a devx umem. 
devx umems
- are quite similar to MR's execpt they cannot be revoked, so this uses the 
- dmabuf pinned memory flow. Several mlx5dv flows require umem and cannot
- work with MR. 
- 
- https://lore.kernel.org/all/0-v1-bd147097458e+ede-
- umem_dmabuf_...@nvidia.com/
+ https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/
  
  [Test Plan]
  
- SW Configuration:
- • Download CUDA 12.2 run file 
(https://developer.nvidia.com/cuda-downloads?target_os=Linux_arch=x86_64=Ubuntu_version=20.04_type=runfile_local)
- • Install using kernel-open i.e. #sh ./cuda_12.2.2_535.104.05_linux.run 
-m=kernel-open
- • Clone perftest from https://github.com/linux-rdma/perftest.
- • cd perftest
- • export LD_LIBRARY_PATH=/usr/local/cuda-12.2/lib64:$LD_LIBRARY_PATH
- • export LIBRARY_PATH=/usr/local/cuda-12.2/lib64:$LIBRARY_PATH
- • run: ./autogen.sh ; ./configure CUDA_H_PATH=/usr/local/cuda/include/cuda.h; 
make
- 
- # Start Server
- $ ./ib_write_bw -d mlx5_2 -F --use_cuda=0 --use_cuda_dmabuf
- 
- #Start Client
- $ ./ib_write_bw -d mlx5_3 -F --use_cuda=1 --use_cuda_dmabuf localhost
+ Testing instructions are outlined in the SF case and has been tested on
+ local hardware and also by Nvidia.
  
  [Where problems could occur?]
+ 
+ This introduces new code paths so regression potential should be low.
+ 
+ [Other Info]
+ SF#00370664

** Description changed:

  SRU Justification:
  
  [Impact]
- 
- Backport RDMA DMABUF
  
  From Nvidia:
  
  "We are working on a high performance networking solution with real
  customers. That solution is being developed using the Ubuntu 22.04 LTS
  distro release and the distro kernel (lowlatency flavour). This
  “dma_buf” patchset consists of upstreamed patches that allow buffers to
  be shared between drivers thus enhancing performance while reducing
  copying of data.
  
  Our team is currently engaged in the development of a high-performance
  networking solution tailored to meet the demands of real-world
  customers. This cutting-edge solution is being crafted on the foundation
  of Ubuntu 22.04 LTS, utilizing the distribution's kernel, specifically
  the lowlatency flavor.
  
  At the heart of our innovation lies the transformative "dma_buf"
  patchset, comprising a series of patches that have been integrated into
  the upstream kernel in 5.16 and 5.17. These patches introduce a
  groundbreaking capability: enabling the seamless sharing of buffers
  among various drivers. This not only bolsters the solution's performance
  but also minimizes the need for data copying, effectively enhancing
  efficiency across the board.
  
  The new functionality is isolated such that existing user will not
  execute these new code paths."
  
  Upstream Reference
  https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/
  https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/
  
  [Test Plan]
  
  Testing instructions are outlined in the SF case and has been tested on
  local hardware and also by Nvidia.
  
  [Where problems could occur?]
  
  This introduces new code paths so regression potential should be low.
  
  [Other 

[Kernel-packages] [Bug 2040526] [NEW] Backport DMABUF functionality

2023-10-25 Thread Ian May
Public bug reported:

SRU Justification:

[Impact]

Backport RDMA DMABUF functionality

Nvidia is working on a high performance networking solution with real
customers. That solution is being developed using the Ubuntu 22.04 LTS
distro release and the distro kernel (lowlatency flavour). This
“dma_buf” patchset consists of upstreamed patches that allow buffers to
be shared between drivers thus enhancing performance while reducing
copying of data.

Our team is currently engaged in the development of a high-performance
networking solution tailored to meet the demands of real-world
customers. This cutting-edge solution is being crafted on the foundation
of Ubuntu 22.04 LTS, utilizing the distribution's kernel, specifically
the lowlatency flavor.

At the heart of our innovation lies the transformative "dma_buf"
patchset, comprising a series of patches that have been integrated into
the upstream kernel in 5.16 and 5.17. These patches introduce a
groundbreaking capability: enabling the seamless sharing of buffers
among various drivers. This not only bolsters the solution's performance
but also minimizes the need for data copying, effectively enhancing
efficiency across the board.

The new functionality is isolated such that existing user will not
execute these new code paths.

* The first 3 patches add a new API to the RDMA subsystem that allows drivers
to get a pinned dmabuf memory region without requiring an implementation of
the move_notify callback.

https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/

* The remaining patches add support for DMABUF when creating a devx umem.
devx umems are quite similar to MRs except they cannot be revoked, so this
uses the dmabuf pinned memory flow. Several mlx5dv flows require umem and
cannot work with MR.

https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/

[Test Plan]

SW Configuration:
• Download CUDA 12.2 run file (https://developer.nvidia.com/cuda-downloads?target_os=Linux_arch=x86_64=Ubuntu_version=20.04_type=runfile_local)
• Install using kernel-open i.e. #sh ./cuda_12.2.2_535.104.05_linux.run -m=kernel-open
• Clone perftest from https://github.com/linux-rdma/perftest.
• cd perftest
• export LD_LIBRARY_PATH=/usr/local/cuda-12.2/lib64:$LD_LIBRARY_PATH
• export LIBRARY_PATH=/usr/local/cuda-12.2/lib64:$LIBRARY_PATH
• run: ./autogen.sh ; ./configure CUDA_H_PATH=/usr/local/cuda/include/cuda.h; make

# Start Server
$ ./ib_write_bw -d mlx5_2 -F --use_cuda=0 --use_cuda_dmabuf

#Start Client
$ ./ib_write_bw -d mlx5_3 -F --use_cuda=1 --use_cuda_dmabuf localhost

[Where problems could occur?]

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: Incomplete

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2040526

Title:
  Backport DMABUF functionality

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  SRU Justification:

  [Impact]

  Backport RDMA DMABUF functionality

  Nvidia is working on a high performance networking solution with real
  customers. That solution is being developed using the Ubuntu 22.04 LTS
  distro release and the distro kernel (lowlatency flavour). This
  “dma_buf” patchset consists of upstreamed patches that allow buffers
  to be shared between drivers thus enhancing performance while reducing
  copying of data.

  Our team is currently engaged in the development of a high-performance
  networking solution tailored to meet the demands of real-world
  customers. This cutting-edge solution is being crafted on the
  foundation of Ubuntu 22.04 LTS, utilizing the distribution's kernel,
  specifically the lowlatency flavor.

  At the heart of our innovation lies the transformative "dma_buf"
  patchset, comprising a series of patches that have been integrated
  into the upstream kernel in 5.16 and 5.17. These patches introduce a
  groundbreaking capability: enabling the seamless sharing of buffers
  among various drivers. This not only bolsters the solution's
  performance but also minimizes the need for data copying, effectively
  enhancing efficiency across the board.

  The new functionality is isolated such that existing user will not
  execute these new code paths.

  * The first 3 patches add a new API to the RDMA subsystem that allows
  drivers to get a pinned dmabuf memory region without requiring an
  implementation of the move_notify callback.

  https://lore.kernel.org/all/20211012120903.96933-1-galpr...@amazon.com/

  * The remaining patches add support for DMABUF when creating a devx umem.
  devx umems are quite similar to MRs except they cannot be revoked, so this
  uses the dmabuf pinned memory flow. Several mlx5dv flows require umem and
  cannot work with MR.

  https://lore.kernel.org/all/0-v1-bd147097458e+ede-umem_dmabuf_...@nvidia.com/

  [Test Plan]

  SW Configuration:
  • Download CUDA 12.2 run file 

[Kernel-packages] [Bug 2038099] Re: Enable building and signing of the nvidia-fs out-of-tree kernel module.

2023-10-10 Thread Ian May
** Also affects: linux-nvidia-6.2 (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Changed in: linux-nvidia-6.2 (Ubuntu)
   Status: New => Fix Committed

** Changed in: linux-nvidia-6.2 (Ubuntu Jammy)
   Status: New => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-6.2 in Ubuntu.
https://bugs.launchpad.net/bugs/2038099

Title:
  Enable building and signing of the nvidia-fs out-of-tree kernel
  module.

Status in linux-nvidia-6.2 package in Ubuntu:
  Fix Committed
Status in linux-nvidia-6.2 source package in Jammy:
  Fix Committed

Bug description:
  [Issue]

  The nvidia-fs kernel module is a must-have for Nvidia-optimized
  kernels. There is now a version that is compatible with the Grace
  processor. Integrate the changes necessary to build and sign this
  out-of-tree kernel module.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.2/+bug/2038099/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2033685] Re: Pull-request to address ARM CoreSight PMU issues

2023-10-10 Thread Ian May
** Changed in: linux-nvidia-6.2 (Ubuntu Jammy)
   Status: New => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-6.2 in Ubuntu.
https://bugs.launchpad.net/bugs/2033685

Title:
  Pull-request to address ARM CoreSight PMU issues

Status in linux-nvidia-6.2 package in Ubuntu:
  Fix Committed
Status in linux-nvidia-6.2 source package in Jammy:
  Fix Committed

Bug description:
  [issue]

  This patch set addresses several CoreSight PMU issues. These are all
  upstream patches.

  
  Commit Summary

  2940a5e perf: arm_cspmu: Fix variable dereference warning
  06f6951 perf: arm_cspmu: Set irq affinitiy only if overflow interrupt is used
  292771d perf/arm_cspmu: Fix event attribute type
  6992931 ACPI/APMT: Don't register invalid resource
  48f4b92 perf/arm_cspmu: Clean up ACPI dependency
  7da1852 perf/arm_cspmu: Decouple APMT dependency
  d3d56a4 perf: arm_cspmu: Add missing MODULE_DEVICE_TABLE

  File Changes (4 files)

  M drivers/acpi/arm64/apmt.c (10)
  M drivers/perf/arm_cspmu/Kconfig (3)
  M drivers/perf/arm_cspmu/arm_cspmu.c (95)
  M drivers/perf/arm_cspmu/arm_cspmu.h (5)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.2/+bug/2033685/+subscriptions




[Kernel-packages] [Bug 2037688] Re: Pull-request to address TPM bypass issue

2023-10-10 Thread Ian May
** Changed in: linux-nvidia-6.2 (Ubuntu Jammy)
   Status: New => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-6.2 in Ubuntu.
https://bugs.launchpad.net/bugs/2037688

Title:
  Pull-request to address TPM bypass issue

Status in linux-nvidia-6.2 package in Ubuntu:
  Fix Committed
Status in linux-nvidia-6.2 source package in Jammy:
  Fix Committed

Bug description:
  NVIDIA: [Config]: Ensure the TPM is available before IMA
  initializes

  Set the following configs:

CONFIG_SPI_TEGRA210_QUAD=y
CONFIG_TCG_TIS_SPI=y

  On Grace systems, the IMA driver emits the following log:

ima: No TPM chip found, activating TPM-bypass!

  This occurs because the IMA driver initializes before we are able to detect
  the TPM. This will always be the case when the drivers required to
  communicate with the TPM, spi_tegra210_quad and tpm_tis_spi, are built as
  modules.

  Having these drivers as built-ins ensures that the TPM is available before
  the IMA driver initializes.
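  The fix can be sanity-checked from userspace: if either option is =m (or
  absent) in the running kernel's config, IMA will still race the TPM probe.
  A minimal sketch; you would normally point it at /boot/config-$(uname -r),
  but a canned snippet is embedded here so the script is self-contained:

```shell
#!/bin/sh
# Check that the TPM-path drivers are built in (=y) rather than modular.
# The canned config snippet below stands in for /boot/config-$(uname -r).
CONFIG=$(mktemp)
cat > "$CONFIG" <<'EOF'
CONFIG_SPI_TEGRA210_QUAD=y
CONFIG_TCG_TIS_SPI=y
EOF

status=ok
for opt in CONFIG_SPI_TEGRA210_QUAD CONFIG_TCG_TIS_SPI; do
    # "=y" means built in; "=m" would load too late for the IMA driver.
    if ! grep -q "^${opt}=y" "$CONFIG"; then
        echo "$opt is not built in; expect 'ima: No TPM chip found'"
        status=bad
    fi
done
echo "tpm-config: $status"
rm -f "$CONFIG"
```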

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.2/+bug/2037688/+subscriptions




[Kernel-packages] [Bug 2033685] Re: Pull-request to address ARM CoreSight PMU issues

2023-10-10 Thread Ian May
** Also affects: linux-nvidia-6.2 (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Changed in: linux-nvidia-6.2 (Ubuntu)
   Status: New => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-6.2 in Ubuntu.
https://bugs.launchpad.net/bugs/2033685

Title:
  Pull-request to address ARM CoreSight PMU issues

Status in linux-nvidia-6.2 package in Ubuntu:
  Fix Committed
Status in linux-nvidia-6.2 source package in Jammy:
  New

Bug description:
  [issue]

  This patch set addresses several CoreSight PMU issues. These are all
  upstream patches.

  
  Commit Summary

  2940a5e perf: arm_cspmu: Fix variable dereference warning
  06f6951 perf: arm_cspmu: Set irq affinitiy only if overflow interrupt is used
  292771d perf/arm_cspmu: Fix event attribute type
  6992931 ACPI/APMT: Don't register invalid resource
  48f4b92 perf/arm_cspmu: Clean up ACPI dependency
  7da1852 perf/arm_cspmu: Decouple APMT dependency
  d3d56a4 perf: arm_cspmu: Add missing MODULE_DEVICE_TABLE

  File Changes (4 files)

  M drivers/acpi/arm64/apmt.c (10)
  M drivers/perf/arm_cspmu/Kconfig (3)
  M drivers/perf/arm_cspmu/arm_cspmu.c (95)
  M drivers/perf/arm_cspmu/arm_cspmu.h (5)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.2/+bug/2033685/+subscriptions




[Kernel-packages] [Bug 2037688] Re: Pull-request to address TPM bypass issue

2023-10-10 Thread Ian May
** Also affects: linux-nvidia-6.2 (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Changed in: linux-nvidia-6.2 (Ubuntu)
   Status: New => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-6.2 in Ubuntu.
https://bugs.launchpad.net/bugs/2037688

Title:
  Pull-request to address TPM bypass issue

Status in linux-nvidia-6.2 package in Ubuntu:
  Fix Committed
Status in linux-nvidia-6.2 source package in Jammy:
  New

Bug description:
  NVIDIA: [Config]: Ensure the TPM is available before IMA
  initializes

  Set the following configs:

CONFIG_SPI_TEGRA210_QUAD=y
CONFIG_TCG_TIS_SPI=y

  On Grace systems, the IMA driver emits the following log:

ima: No TPM chip found, activating TPM-bypass!

  This occurs because the IMA driver initializes before we are able to detect
  the TPM. This will always be the case when the drivers required to
  communicate with the TPM, spi_tegra210_quad and tpm_tis_spi, are built as
  modules.

  Having these drivers as built-ins ensures that the TPM is available before
  the IMA driver initializes.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.2/+bug/2037688/+subscriptions




[Kernel-packages] [Bug 2038768] Re: arm64: linux: stress-ng filename stressor crashes kernel

2023-10-10 Thread Colin Ian King
Unable to collect data via apport-collect due to VPN restrictions.

** Changed in: linux (Ubuntu Mantic)
   Status: Incomplete => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2038768

Title:
  arm64: linux: stress-ng filename stressor crashes kernel

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Mantic:
  Confirmed

Bug description:
  Running latest Ubuntu mantic (ext4 file system) with kernel: Linux
  mantic-arm64 6.5.0-7-generic #7-Ubuntu SMP PREEMPT_DYNAMIC Thu Sep 28
  19:12:05 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

  How to reproduce:

  Fire up a 24 instance ARM64 QEMU instance with Ubuntu Mantic Server.
  Install latest stress-ng from git repo:

  sudo apt-get update
  sudo apt-get build-dep stress-ng
  git clone git://github.com/ColinIanKing/stress-ng
  cd stress-ng
  make clean
  make -j 24
  make verify-test-all
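
  When chasing a crash like this it can help to run stressors one at a
  time rather than the full verify-test-all sweep, so the offender is
  obvious. A minimal sketch; the run_stressor stub is hypothetical and on
  real hardware would invoke something like
  ./stress-ng --<name> 4 -t 60 --verify and check dmesg afterwards:

```shell
#!/bin/sh
# Run candidate stressors one at a time and record which ones fail.
# run_stressor is a stub here; replace its body with a real
# ./stress-ng --<name> 4 -t 60 --verify invocation on the test machine.
run_stressor() {
    case "$1" in
        filename) return 1 ;;  # simulate the crashing stressor
        *)        return 0 ;;
    esac
}

failed=""
for s in chdir dentry fallocate filename flock; do
    if ! run_stressor "$s"; then
        failed="$failed $s"
    fi
done
echo "suspect stressors:$failed"
```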

  When we reach the filename stressor the kernel crashes as follows:

  [  902.594715] kernel BUG at fs/dcache.c:2050!
  [  902.598205] Internal error: Oops - BUG: f2000800 [#1] SMP
  [  902.603127] Modules linked in: dccp_ipv4 dccp atm vfio_iommu_type1 vfio iommufd cmac algif_rng twofish_generic twofish_common serpent_generic fcrypt cast6_generic cast5_generic cast_common camellia_generic blowfish_generic blowfish_common aes_arm64 algif_skcipher algif_hash aria_generic sm4_generic sm4_neon ccm aes_ce_ccm des_generic libdes authenc aegis128 algif_aead af_alg cfg80211 binfmt_misc nls_iso8859_1 dm_multipath drm efi_pstore dmi_sysfs qemu_fw_cfg ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_ce polyval_ce polyval_generic ghash_ce sm4 sha2_ce sha256_arm64 sha1_ce arm_smccc_trng xhci_pci virtio_rng xhci_pci_renesas aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
  [  902.689941] CPU: 1 PID: 91317 Comm: stress-ng-filen Not tainted 6.5.0-7-generic #7-Ubuntu
  [  902.699281] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
  [  902.706902] pstate: 4045 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  [  902.715488] pc : d_instantiate_new+0xa8/0xc8
  [  902.720889] lr : ext4_add_nondir+0x10c/0x160
  [  902.725702] sp : 80008b6d3930
  [  902.729390] x29: 80008b6d3930 x28:  x27: bd164e51a980
  [  902.738705] x26: 6789f3b68f20 x25: 8180 x24: 678a541f7968
  [  902.747003] x23: 6789f3b68f00 x22: 80008b6d39b0 x21: 678a6a25bcb0
  [  902.755776] x20: 678a36f8f028 x19:  x18: 80008af45068
  [  902.764647] x17:  x16:  x15: ecececececececec
  [  902.773135] x14: ecececececececec x13: ecececececececec x12: ecececececececec
  [  902.781386] x11: ecececececececec x10: ecececececececec x9 : bd164d5990bc
  [  902.789346] x8 :  x7 :  x6 : 
  [  902.798564] x5 :  x4 :  x3 : 
  [  902.806851] x2 : bd16504e4ce0 x1 : 678a36f8f028 x0 : 6789f3b68f00
  [  902.815544] Call trace:
  [  902.818870]  d_instantiate_new+0xa8/0xc8
  [  902.823523]  ext4_create+0x120/0x238
  [  902.827716]  lookup_open.isra.0+0x480/0x4d0
  [  902.832480]  open_last_lookups+0x160/0x3b0
  [  902.837060]  path_openat+0xa0/0x2a0
  [  902.840975]  do_filp_open+0xa8/0x180
  [  902.845582]  do_sys_openat2+0xe8/0x128
  [  902.850426]  __arm64_sys_openat+0x70/0xe0
  [  902.854952]  invoke_syscall+0x7c/0x128
  [  902.859155]  el0_svc_common.constprop.0+0x5c/0x168
  [  902.864979]  do_el0_svc+0x38/0x68
  [  902.869364]  el0_svc+0x30/0xe0
  [  902.873401]  el0t_64_sync_handler+0x148/0x158
  [  902.878336]  el0t_64_sync+0x1b0/0x1b8
  [  902.882513] Code: d282 d2800010 d2800011 d65f03c0 (d421)
  [  902.890632] ---[ end trace  ]---

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2038768/+subscriptions




[Kernel-packages] [Bug 2038768] Re: arm64: linux: stress-ng filename stressor crashes kernel

2023-10-10 Thread Colin Ian King
** Changed in: linux (Ubuntu Mantic)
   Status: Incomplete => New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2038768

Title:
  arm64: linux: stress-ng filename stressor crashes kernel

Status in linux package in Ubuntu:
  New
Status in linux source package in Mantic:
  New

Bug description:
  Running latest Ubuntu mantic (ext4 file system) with kernel: Linux
  mantic-arm64 6.5.0-7-generic #7-Ubuntu SMP PREEMPT_DYNAMIC Thu Sep 28
  19:12:05 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

  How to reproduce:

  Fire up a 24 instance ARM64 QEMU instance with Ubuntu Mantic Server.
  Install latest stress-ng from git repo:

  sudo apt-get update
  sudo apt-get build-dep stress-ng
  git clone git://github.com/ColinIanKing/stress-ng
  cd stress-ng
  make clean
  make -j 24
  make verify-test-all

  When we reach the filename stressor the kernel crashes as follows:

  [  902.594715] kernel BUG at fs/dcache.c:2050!
  [  902.598205] Internal error: Oops - BUG: f2000800 [#1] SMP
  [  902.603127] Modules linked in: dccp_ipv4 dccp atm vfio_iommu_type1 vfio iommufd cmac algif_rng twofish_generic twofish_common serpent_generic fcrypt cast6_generic cast5_generic cast_common camellia_generic blowfish_generic blowfish_common aes_arm64 algif_skcipher algif_hash aria_generic sm4_generic sm4_neon ccm aes_ce_ccm des_generic libdes authenc aegis128 algif_aead af_alg cfg80211 binfmt_misc nls_iso8859_1 dm_multipath drm efi_pstore dmi_sysfs qemu_fw_cfg ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_ce polyval_ce polyval_generic ghash_ce sm4 sha2_ce sha256_arm64 sha1_ce arm_smccc_trng xhci_pci virtio_rng xhci_pci_renesas aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
  [  902.689941] CPU: 1 PID: 91317 Comm: stress-ng-filen Not tainted 6.5.0-7-generic #7-Ubuntu
  [  902.699281] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
  [  902.706902] pstate: 4045 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  [  902.715488] pc : d_instantiate_new+0xa8/0xc8
  [  902.720889] lr : ext4_add_nondir+0x10c/0x160
  [  902.725702] sp : 80008b6d3930
  [  902.729390] x29: 80008b6d3930 x28:  x27: bd164e51a980
  [  902.738705] x26: 6789f3b68f20 x25: 8180 x24: 678a541f7968
  [  902.747003] x23: 6789f3b68f00 x22: 80008b6d39b0 x21: 678a6a25bcb0
  [  902.755776] x20: 678a36f8f028 x19:  x18: 80008af45068
  [  902.764647] x17:  x16:  x15: ecececececececec
  [  902.773135] x14: ecececececececec x13: ecececececececec x12: ecececececececec
  [  902.781386] x11: ecececececececec x10: ecececececececec x9 : bd164d5990bc
  [  902.789346] x8 :  x7 :  x6 : 
  [  902.798564] x5 :  x4 :  x3 : 
  [  902.806851] x2 : bd16504e4ce0 x1 : 678a36f8f028 x0 : 6789f3b68f00
  [  902.815544] Call trace:
  [  902.818870]  d_instantiate_new+0xa8/0xc8
  [  902.823523]  ext4_create+0x120/0x238
  [  902.827716]  lookup_open.isra.0+0x480/0x4d0
  [  902.832480]  open_last_lookups+0x160/0x3b0
  [  902.837060]  path_openat+0xa0/0x2a0
  [  902.840975]  do_filp_open+0xa8/0x180
  [  902.845582]  do_sys_openat2+0xe8/0x128
  [  902.850426]  __arm64_sys_openat+0x70/0xe0
  [  902.854952]  invoke_syscall+0x7c/0x128
  [  902.859155]  el0_svc_common.constprop.0+0x5c/0x168
  [  902.864979]  do_el0_svc+0x38/0x68
  [  902.869364]  el0_svc+0x30/0xe0
  [  902.873401]  el0t_64_sync_handler+0x148/0x158
  [  902.878336]  el0t_64_sync+0x1b0/0x1b8
  [  902.882513] Code: d282 d2800010 d2800011 d65f03c0 (d421)
  [  902.890632] ---[ end trace  ]---

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2038768/+subscriptions




[Kernel-packages] [Bug 2038768] Re: arm64: linux: stress-ng filename stressor crashes kernel

2023-10-09 Thread Colin Ian King
Did an hour of soak testing with the arm64 kernel
https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.5.6/arm64/linux-image-
unsigned-6.5.6-060506-generic_6.5.6-060506.202310061235_arm64.deb and
cannot reproduce this issue.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2038768

Title:
  arm64: linux: stress-ng filename stressor crashes kernel

Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Mantic:
  Incomplete

Bug description:
  Running latest Ubuntu mantic (ext4 file system) with kernel: Linux
  mantic-arm64 6.5.0-7-generic #7-Ubuntu SMP PREEMPT_DYNAMIC Thu Sep 28
  19:12:05 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

  How to reproduce:

  Fire up a 24 instance ARM64 QEMU instance with Ubuntu Mantic Server.
  Install latest stress-ng from git repo:

  sudo apt-get update
  sudo apt-get build-dep stress-ng
  git clone git://github.com/ColinIanKing/stress-ng
  cd stress-ng
  make clean
  make -j 24
  make verify-test-all

  When we reach the filename stressor the kernel crashes as follows:

  [  902.594715] kernel BUG at fs/dcache.c:2050!
  [  902.598205] Internal error: Oops - BUG: f2000800 [#1] SMP
  [  902.603127] Modules linked in: dccp_ipv4 dccp atm vfio_iommu_type1 vfio iommufd cmac algif_rng twofish_generic twofish_common serpent_generic fcrypt cast6_generic cast5_generic cast_common camellia_generic blowfish_generic blowfish_common aes_arm64 algif_skcipher algif_hash aria_generic sm4_generic sm4_neon ccm aes_ce_ccm des_generic libdes authenc aegis128 algif_aead af_alg cfg80211 binfmt_misc nls_iso8859_1 dm_multipath drm efi_pstore dmi_sysfs qemu_fw_cfg ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_ce polyval_ce polyval_generic ghash_ce sm4 sha2_ce sha256_arm64 sha1_ce arm_smccc_trng xhci_pci virtio_rng xhci_pci_renesas aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
  [  902.689941] CPU: 1 PID: 91317 Comm: stress-ng-filen Not tainted 6.5.0-7-generic #7-Ubuntu
  [  902.699281] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
  [  902.706902] pstate: 4045 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  [  902.715488] pc : d_instantiate_new+0xa8/0xc8
  [  902.720889] lr : ext4_add_nondir+0x10c/0x160
  [  902.725702] sp : 80008b6d3930
  [  902.729390] x29: 80008b6d3930 x28:  x27: bd164e51a980
  [  902.738705] x26: 6789f3b68f20 x25: 8180 x24: 678a541f7968
  [  902.747003] x23: 6789f3b68f00 x22: 80008b6d39b0 x21: 678a6a25bcb0
  [  902.755776] x20: 678a36f8f028 x19:  x18: 80008af45068
  [  902.764647] x17:  x16:  x15: ecececececececec
  [  902.773135] x14: ecececececececec x13: ecececececececec x12: ecececececececec
  [  902.781386] x11: ecececececececec x10: ecececececececec x9 : bd164d5990bc
  [  902.789346] x8 :  x7 :  x6 : 
  [  902.798564] x5 :  x4 :  x3 : 
  [  902.806851] x2 : bd16504e4ce0 x1 : 678a36f8f028 x0 : 6789f3b68f00
  [  902.815544] Call trace:
  [  902.818870]  d_instantiate_new+0xa8/0xc8
  [  902.823523]  ext4_create+0x120/0x238
  [  902.827716]  lookup_open.isra.0+0x480/0x4d0
  [  902.832480]  open_last_lookups+0x160/0x3b0
  [  902.837060]  path_openat+0xa0/0x2a0
  [  902.840975]  do_filp_open+0xa8/0x180
  [  902.845582]  do_sys_openat2+0xe8/0x128
  [  902.850426]  __arm64_sys_openat+0x70/0xe0
  [  902.854952]  invoke_syscall+0x7c/0x128
  [  902.859155]  el0_svc_common.constprop.0+0x5c/0x168
  [  902.864979]  do_el0_svc+0x38/0x68
  [  902.869364]  el0_svc+0x30/0xe0
  [  902.873401]  el0t_64_sync_handler+0x148/0x158
  [  902.878336]  el0t_64_sync+0x1b0/0x1b8
  [  902.882513] Code: d282 d2800010 d2800011 d65f03c0 (d421)
  [  902.890632] ---[ end trace  ]---

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2038768/+subscriptions




[Kernel-packages] [Bug 2038768] Re: arm64: linux: stress-ng filename stressor crashes kernel

2023-10-09 Thread Colin Ian King
And can reproduce on real H/W: the "SC2A11" is a multi-core chip with
24 ARM Cortex-A53 cores, running Linux 6.5.0-7-generic #7-Ubuntu SMP
PREEMPT_DYNAMIC Thu Sep 28 19:12:05 UTC 2023 aarch64 aarch64 aarch64
GNU/Linux


[  201.075720] EXT4-fs (loop13): mounted filesystem 52e32882-8b3a-47ce-8bf6-ce095960b1e7 r/w with ordered data mode. Quota mode: none.
[  516.665218] [ cut here ]
[  516.665249] kernel BUG at fs/dcache.c:2050!
[  516.665279] Internal error: Oops - BUG: f2000800 [#1] SMP
[  516.665301] Modules linked in: tls vhost_vsock 
vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock nf_conntrack_netlink 
xfrm_user xfrm_algo xt_addrtype br_netfilter xt_CHECKSUM xt_MASQUERADE 
xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat 
nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge 
stp llc overlay cfg80211 binfmt_misc zfs(PO) nls_iso8859_1 spl(O) 
snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core 
snd_hwdep snd_pcm snd_timer snd soundcore uio_pdrv_genirq uio dm_multipath 
efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic raid10 
raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon 
raid6_pq libcrc32c raid1 raid0 multipath linear nouveau crct10dif_ce 
drm_ttm_helper polyval_ce polyval_generic ttm ghash_ce i2c_algo_bit 
drm_display_helper cec rc_core sm4 drm_kms_helper sha2_ce sha256_arm64 xhci_pci 
drm r8169 sha1_ce ahci xhci_pci_renesas realtek sdhci_f_sdh30 sdhci_pltfm sdhci 
gpio_keys
[  516.665743]  netsec gpio_mb86s7x i2c_synquacer aes_neon_bs aes_neon_blk 
aes_ce_blk aes_ce_cipher
[  516.665900] CPU: 2 PID: 17292 Comm: stress-ng-filen Tainted: P   O   
6.5.0-7-generic #7-Ubuntu
[  516.665927] Hardware name: Socionext SynQuacer E-series DeveloperBox, BIOS 
build #85 Nov  6 2020
[  516.665948] pstate: 4045 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  516.665974] pc : d_instantiate_new+0xa8/0xc8
[  516.666006] lr : ext4_add_nondir+0x10c/0x160
[  516.666029] sp : 8000857838d0
[  516.666043] x29: 8000857838d0 x28:  x27: 8000816ea980
[  516.666076] x26: 0008119915e0 x25: 8180 x24: 000856c61ce8
[  516.666108] x23: 0008119915c0 x22: 800085783950 x21: 00080359e1c0
[  516.666140] x20: 0008561b1ce8 x19:  x18: 800085a6d068
[  516.666172] x17:  x16:  x15: 878b4681cc52c99d
[  516.666204] x14: d59de2a9feb89dca x13: 85e2878b4681cc52 x12: c99dd59de2a9feb8
[  516.666236] x11: e3b9eedbdf1c7d27 x10: 732db84fa4ef339b x9 : 8000807690bc
[  516.666268] x8 :  x7 :  x6 : 
[  516.666299] x5 :  x4 :  x3 : 
[  516.666330] x2 : 8000836b52e8 x1 : 0008561b1ce8 x0 : 0008119915c0
[  516.666362] Call trace:
[  516.666377]  d_instantiate_new+0xa8/0xc8
[  516.666401]  ext4_create+0x120/0x238
[  516.666422]  lookup_open.isra.0+0x480/0x4d0
[  516.666447]  open_last_lookups+0x160/0x3b0
[  516.666466]  path_openat+0xa0/0x2a0
[  516.666484]  do_filp_open+0xa8/0x180
[  516.666502]  do_sys_openat2+0xe8/0x128
[  516.666524]  __arm64_sys_openat+0x70/0xe0
[  516.666545]  invoke_syscall+0x7c/0x128
[  516.666566]  el0_svc_common.constprop.0+0x5c/0x168
[  516.666586]  do_el0_svc+0x38/0x68
[  516.04]  el0_svc+0x30/0xe0
[  516.26]  el0t_64_sync_handler+0x148/0x158
[  516.47]  el0t_64_sync+0x1b0/0x1b8
[  516.74] Code: d282 d2800010 d2800011 d65f03c0 (d421) 
[  516.96] ---[ end trace  ]---

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2038768

Title:
  arm64: linux: stress-ng filename stressor crashes kernel

Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Mantic:
  Incomplete

Bug description:
  Running latest Ubuntu mantic (ext4 file system) with kernel: Linux
  mantic-arm64 6.5.0-7-generic #7-Ubuntu SMP PREEMPT_DYNAMIC Thu Sep 28
  19:12:05 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

  How to reproduce:

  Fire up a 24 instance ARM64 QEMU instance with Ubuntu Mantic Server.
  Install latest stress-ng from git repo:

  sudo apt-get update
  sudo apt-get build-dep stress-ng
  git clone git://github.com/ColinIanKing/stress-ng
  cd stress-ng
  make clean
  make -j 24
  make verify-test-all

  When we reach the filename stressor the kernel crashes as follows:

  [  902.594715] kernel BUG at fs/dcache.c:2050!
  [  902.598205] Internal error: Oops - BUG: f2000800 [#1] SMP
  [  902.603127] Modules linked in: dccp_ipv4 dccp atm vfio_iommu_type1 vfio 
iommu
  fd cmac algif_rng twofish_generic twofish_common serpent_generic fcrypt 
cast6_ge
  neric cast5_generic cast_common camellia_generic blowfish_generic 
blowfish_commo
  n aes_arm64 algif_skcipher 

[Kernel-packages] [Bug 2038768] Re: arm64: linux: stress-ng filename stressor crashes kernel

2023-10-09 Thread Colin Ian King
Reproduced this with mainline arm64 kernel
https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.5.5/arm64/linux-image-
unsigned-6.5.5-060505-generic_6.5.5-060505.202309230703_arm64.deb

[  219.219042] Internal error: Oops - BUG: f2000800 [#1] SMP
[  219.262013] Modules linked in: cfg80211 binfmt_misc nls_iso8859_1 dm_multipath drm efi_pstore dmi_sysfs qemu_fw_cfg ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_ce polyval_ce polyval_generic ghash_ce sm4 sha2_ce sha256_arm64 virtio_net sha1_ce arm_smccc_trng virtio_rng net_failover xhci_pci failover xhci_pci_renesas aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
[  219.322456] CPU: 13 PID: 1182 Comm: stress-ng-filen Not tainted 6.5.5-060505-generic #202309230703
[  219.332405] Hardware name: QEMU KVM Virtual Machine, BIOS 2023.05-2 09/23/2023
[  219.340433] pstate: 4045 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  219.348163] pc : d_instantiate_new+0xa8/0xc8
[  219.352942] lr : ext4_add_nondir+0x10c/0x160
[  219.357822] sp : 8000826ab9d0
[  219.361517] x29: 8000826ab9d0 x28:  x27: a9b65720a940
[  219.369535] x26: 1ea33582d2e0 x25: 8180 x24: 1ea3c2bb3d48
[  219.377494] x23: 1ea33582d2c0 x22: 8000826abab0 x21: 1ea3c3344930
[  219.385428] x20: 1ea324bda188 x19:  x18: 800080b4d068
[  219.393336] x17:  x16:  x15: 9afaefe7af176647
[  219.401279] x14: f302afa80109b8f3 x13: a3469afaefe7af17 x12: 6647f302afa80109
[  219.409258] x11: b4e7e46bc44fb52e x10: 4e81094291a860ce x9 : a9b6562b1b74
[  219.417639] x8 :  x7 :  x6 : 
[  219.426015] x5 :  x4 :  x3 : 
[  219.434462] x2 : a9b6591b27e8 x1 : 1ea324bda188 x0 : 1ea33582d2c0
[  219.442708] Call trace:
[  219.445901]  d_instantiate_new+0xa8/0xc8
[  219.450786]  ext4_create+0x120/0x238
[  219.454800]  lookup_open.isra.0+0x478/0x4c8
[  219.459476]  open_last_lookups+0x160/0x3b0
[  219.464060]  path_openat+0x9c/0x290
[  219.468062]  do_filp_open+0xac/0x188
[  219.472175]  do_sys_openat2+0xe4/0x120
[  219.476412]  __arm64_sys_openat+0x6c/0xd8
[  219.481300]  invoke_syscall+0x7c/0x128
[  219.485876]  el0_svc_common.constprop.0+0x5c/0x168
[  219.491561]  do_el0_svc+0x38/0x68
[  219.495523]  el0_svc+0x30/0xe0
[  219.499161]  el0t_64_sync_handler+0x148/0x158
[  219.504139]  el0t_64_sync+0x1b0/0x1b8
[  219.508320] Code: d282 d2800010 d2800011 d65f03c0 (d421) 
[  219.515430] ---[ end trace  ]---

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2038768

Title:
  arm64: linux: stress-ng filename stressor crashes kernel

Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Mantic:
  Incomplete

Bug description:
  Running latest Ubuntu mantic (ext4 file system) with kernel: Linux
  mantic-arm64 6.5.0-7-generic #7-Ubuntu SMP PREEMPT_DYNAMIC Thu Sep 28
  19:12:05 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

  How to reproduce:

  Fire up a 24 instance ARM64 QEMU instance with Ubuntu Mantic Server.
  Install latest stress-ng from git repo:

  sudo apt-get update
  sudo apt-get build-dep stress-ng
  git clone git://github.com/ColinIanKing/stress-ng
  cd stress-ng
  make clean
  make -j 24
  make verify-test-all

  When we reach the filename stressor the kernel crashes as follows:

  [  902.594715] kernel BUG at fs/dcache.c:2050!
  [  902.598205] Internal error: Oops - BUG: f2000800 [#1] SMP
  [  902.603127] Modules linked in: dccp_ipv4 dccp atm vfio_iommu_type1 vfio iommufd cmac algif_rng twofish_generic twofish_common serpent_generic fcrypt cast6_generic cast5_generic cast_common camellia_generic blowfish_generic blowfish_common aes_arm64 algif_skcipher algif_hash aria_generic sm4_generic sm4_neon ccm aes_ce_ccm des_generic libdes authenc aegis128 algif_aead af_alg cfg80211 binfmt_misc nls_iso8859_1 dm_multipath drm efi_pstore dmi_sysfs qemu_fw_cfg ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_ce polyval_ce polyval_generic ghash_ce sm4 sha2_ce sha256_arm64 sha1_ce arm_smccc_trng xhci_pci virtio_rng xhci_pci_renesas aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
  [  902.689941] CPU: 1 PID: 91317 Comm: stress-ng-filen Not tainted 6.5.0-7-generic #7-Ubuntu
  [  902.699281] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
  [  902.706902] pstate: 4045 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  [  902.715488] pc : d_instantiate_new+0xa8/0xc8
  [  902.720889] lr : 

[Kernel-packages] [Bug 2038768] Re: arm64: linux: stress-ng filename stressor crashes kernel

2023-10-09 Thread Colin Ian King
Reproduced this with mainline arm64 kernel
https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.5/arm64/linux-image-
unsigned-6.5.0-060500-generic_6.5.0-060500.202308271831_arm64.deb

[  184.853731] pstate: 4045 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  184.862627] pc : d_instantiate_new+0xa8/0xc8
[  184.867973] lr : ext4_add_nondir+0xf0/0x148
[  184.872959] sp : 8000828ab950
[  184.877059] x29: 8000828ab950 x28:  x27: d975b8b9a6c0
[  184.885032] x26: 7b0094e32c20 x25: 8180 x24: 7b01432e9848
[  184.893573] x23: 8000828aba30 x22: 7b0094e32c00 x21: 7b0172d574d0
[  184.902071] x20: 7b0089fbc688 x19:  x18: 800082295068
[  184.910550] x17:  x16:  x15: 5e9ca062546ae354
[  184.919056] x14: 998c9ec3ecc3a882 x13: 24d23ffaf8b470b6 x12: 022485883b51bee2
[  184.927692] x11: 5c7ac5c18df459ab x10: 6e24d23ffaf8b470 x9 : d975b7c3d730
[  184.936212] x8 :  x7 :  x6 : 
[  184.944811] x5 :  x4 :  x3 : 
[  184.953651] x2 : d975bab42cf0 x1 : 7b0089fbc688 x0 : 7b0094e32c00
[  184.962508] Call trace:
[  184.965316]  d_instantiate_new+0xa8/0xc8
[  184.969803]  ext4_create+0x120/0x238
[  184.973910]  lookup_open.isra.0+0x478/0x4c8
[  184.978689]  open_last_lookups+0x160/0x3b0
[  184.983374]  path_openat+0x9c/0x290
[  184.987372]  do_filp_open+0xac/0x188
[  184.991444]  do_sys_openat2+0xe4/0x120
[  184.995701]  __arm64_sys_openat+0x6c/0xd8
[  185.000271]  invoke_syscall+0x7c/0x128
[  185.004520]  el0_svc_common.constprop.0+0x5c/0x168
[  185.009977]  do_el0_svc+0x38/0x68
[  185.013775]  el0_svc+0x30/0xe0
[  185.017265]  el0t_64_sync_handler+0x148/0x158
[  185.022183]  el0t_64_sync+0x1b0/0x1b8
[  185.026332] Code: d282 d2800010 d2800011 d65f03c0 (d421) 
[  185.033606] ---[ end trace  ]---

Took a while to trigger.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2038768

Title:
  arm64: linux: stress-ng filename stressor crashes kernel

Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Mantic:
  Incomplete

Bug description:
  Running latest Ubuntu mantic (ext4 file system) with kernel: Linux
  mantic-arm64 6.5.0-7-generic #7-Ubuntu SMP PREEMPT_DYNAMIC Thu Sep 28
  19:12:05 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

  How to reproduce:

  Fire up a 24 instance ARM64 QEMU instance with Ubuntu Mantic Server.
  Install latest stress-ng from git repo:

  sudo apt-get update
  sudo apt-get build-dep stress-ng
  git clone git://github.com/ColinIanKing/stress-ng
  cd stress-ng
  make clean
  make -j 24
  make verify-test-all

  When we reach the filename stressor the kernel crashes as follows:

  [  902.594715] kernel BUG at fs/dcache.c:2050!
  [  902.598205] Internal error: Oops - BUG: f2000800 [#1] SMP
  [  902.603127] Modules linked in: dccp_ipv4 dccp atm vfio_iommu_type1 vfio iommufd cmac algif_rng twofish_generic twofish_common serpent_generic fcrypt cast6_generic cast5_generic cast_common camellia_generic blowfish_generic blowfish_common aes_arm64 algif_skcipher algif_hash aria_generic sm4_generic sm4_neon ccm aes_ce_ccm des_generic libdes authenc aegis128 algif_aead af_alg cfg80211 binfmt_misc nls_iso8859_1 dm_multipath drm efi_pstore dmi_sysfs qemu_fw_cfg ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_ce polyval_ce polyval_generic ghash_ce sm4 sha2_ce sha256_arm64 sha1_ce arm_smccc_trng xhci_pci virtio_rng xhci_pci_renesas aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
  [  902.689941] CPU: 1 PID: 91317 Comm: stress-ng-filen Not tainted 6.5.0-7-generic #7-Ubuntu
  [  902.699281] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
  [  902.706902] pstate: 4045 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  [  902.715488] pc : d_instantiate_new+0xa8/0xc8
  [  902.720889] lr : ext4_add_nondir+0x10c/0x160
  [  902.725702] sp : 80008b6d3930
  [  902.729390] x29: 80008b6d3930 x28:  x27: bd164e51a980
  [  902.738705] x26: 6789f3b68f20 x25: 8180 x24: 678a541f7968
  [  902.747003] x23: 6789f3b68f00 x22: 80008b6d39b0 x21: 678a6a25bcb0
  [  902.755776] x20: 678a36f8f028 x19:  x18: 80008af45068
  [  902.764647] x17:  x16:  x15: ecececececececec
  [  902.773135] x14: ecececececececec x13: ecececececececec x12: ecececececececec
  [  902.781386] x11: ecececececececec x10: ecececececececec x9 : bd164d5990bc
  [  902.789346] x8 :  x7 :  x6 : 
  [  902.798564] x5 : 

[Kernel-packages] [Bug 2038768] Re: arm64: linux: stress-ng filename stressor crashes kernel

2023-10-09 Thread Colin Ian King
Can't reproduce this with the mainline arm64 kernel:
https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.5.6/arm64/linux-image-
unsigned-6.5.6-060506-generic_6.5.6-060506.202310061235_arm64.deb

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2038768

Title:
  arm64: linux: stress-ng filename stressor crashes kernel

Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Mantic:
  Incomplete

Bug description:
  Running latest Ubuntu mantic (ext4 file system) with kernel: Linux
  mantic-arm64 6.5.0-7-generic #7-Ubuntu SMP PREEMPT_DYNAMIC Thu Sep 28
  19:12:05 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

  How to reproduce:

  Fire up a 24-CPU ARM64 QEMU instance running Ubuntu Mantic Server.
  Install latest stress-ng from git repo:

  sudo apt-get update
  sudo apt-get build-dep stress-ng
  git clone git://github.com/ColinIanKing/stress-ng
  cd stress-ng
  make clean
  make -j 24
  make verify-test-all

  When we reach the filename stressor the kernel crashes as follows:

  [  902.594715] kernel BUG at fs/dcache.c:2050!
  [  902.598205] Internal error: Oops - BUG: f2000800 [#1] SMP
  [  902.603127] Modules linked in: dccp_ipv4 dccp atm vfio_iommu_type1 vfio iommufd cmac algif_rng twofish_generic twofish_common serpent_generic fcrypt cast6_generic cast5_generic cast_common camellia_generic blowfish_generic blowfish_common aes_arm64 algif_skcipher algif_hash aria_generic sm4_generic sm4_neon ccm aes_ce_ccm des_generic libdes authenc aegis128 algif_aead af_alg cfg80211 binfmt_misc nls_iso8859_1 dm_multipath drm efi_pstore dmi_sysfs qemu_fw_cfg ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_ce polyval_ce polyval_generic ghash_ce sm4 sha2_ce sha256_arm64 sha1_ce arm_smccc_trng xhci_pci virtio_rng xhci_pci_renesas aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
  [  902.689941] CPU: 1 PID: 91317 Comm: stress-ng-filen Not tainted 6.5.0-7-generic #7-Ubuntu
  [  902.699281] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
  [  902.706902] pstate: 4045 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  [  902.715488] pc : d_instantiate_new+0xa8/0xc8
  [  902.720889] lr : ext4_add_nondir+0x10c/0x160
  [  902.725702] sp : 80008b6d3930
  [  902.729390] x29: 80008b6d3930 x28:  x27: bd164e51a980
  [  902.738705] x26: 6789f3b68f20 x25: 8180 x24: 678a541f7968
  [  902.747003] x23: 6789f3b68f00 x22: 80008b6d39b0 x21: 678a6a25bcb0
  [  902.755776] x20: 678a36f8f028 x19:  x18: 80008af45068
  [  902.764647] x17:  x16:  x15: ecececececececec
  [  902.773135] x14: ecececececececec x13: ecececececececec x12: ecececececececec
  [  902.781386] x11: ecececececececec x10: ecececececececec x9 : bd164d5990bc
  [  902.789346] x8 :  x7 :  x6 : 
  [  902.798564] x5 :  x4 :  x3 : 
  [  902.806851] x2 : bd16504e4ce0 x1 : 678a36f8f028 x0 : 6789f3b68f00
  [  902.815544] Call trace:
  [  902.818870]  d_instantiate_new+0xa8/0xc8
  [  902.823523]  ext4_create+0x120/0x238
  [  902.827716]  lookup_open.isra.0+0x480/0x4d0
  [  902.832480]  open_last_lookups+0x160/0x3b0
  [  902.837060]  path_openat+0xa0/0x2a0
  [  902.840975]  do_filp_open+0xa8/0x180
  [  902.845582]  do_sys_openat2+0xe8/0x128
  [  902.850426]  __arm64_sys_openat+0x70/0xe0
  [  902.854952]  invoke_syscall+0x7c/0x128
  [  902.859155]  el0_svc_common.constprop.0+0x5c/0x168
  [  902.864979]  do_el0_svc+0x38/0x68
  [  902.869364]  el0_svc+0x30/0xe0
  [  902.873401]  el0t_64_sync_handler+0x148/0x158
  [  902.878336]  el0t_64_sync+0x1b0/0x1b8
  [  902.882513] Code: d282 d2800010 d2800011 d65f03c0 (d421)
  [  902.890632] ---[ end trace  ]---

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2038768/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2038768] Re: arm64: linux: stress-ng filename stressor crashes kernel

2023-10-08 Thread Colin Ian King
Can't seem to trip the issue on a 24-core x86 instance; maybe this is
ARM64-specific.



[Kernel-packages] [Bug 2038768] Re: arm64: linux: stress-ng filename stressor crashes kernel

2023-10-08 Thread Colin Ian King
** Also affects: linux (Ubuntu Mantic)
   Importance: High
   Status: Incomplete



[Kernel-packages] [Bug 2038768] Re: arm64: linux: stress-ng filename stressor crashes kernel

2023-10-08 Thread Colin Ian King
I created a 1 GB file, made a fresh ext4 file system on it, and loopback-
mounted it on /mnt. I then created the test directory /mnt/test and ran:

./stress-ng --filename 0 --temp-path /mnt/test --klog-check

Managed to trip the kernel crash again, so it appears to occur on a fresh
ext4 file system too :-(
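The loopback setup described above can be sketched as a short script (a
minimal sketch; the image file name and sizes are illustrative, and the
mount steps need root):

```shell
# Create a 1 GB backing file and put a fresh ext4 file system on it.
dd if=/dev/zero of=ext4.img bs=1M count=1024
mkfs.ext4 -q ext4.img

# Loopback-mount it and create a world-writable test directory (needs root).
sudo mount -o loop ext4.img /mnt
sudo mkdir -p /mnt/test
sudo chmod 1777 /mnt/test

# Run only the filename stressor against the fresh file system,
# checking the kernel log for errors as it runs.
./stress-ng --filename 0 --temp-path /mnt/test --klog-check
```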



[Kernel-packages] [Bug 2038768] [NEW] arm64: linux: stress-ng filename stressor crashes kernel

2023-10-08 Thread Colin Ian King
Public bug reported:

Running latest Ubuntu mantic (ext4 file system) with kernel: Linux
mantic-arm64 6.5.0-7-generic #7-Ubuntu SMP PREEMPT_DYNAMIC Thu Sep 28
19:12:05 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

How to reproduce:

Fire up a 24-CPU ARM64 QEMU instance running Ubuntu Mantic Server.
Install latest stress-ng from git repo:

sudo apt-get update
sudo apt-get build-dep stress-ng
git clone git://github.com/ColinIanKing/stress-ng
cd stress-ng
make clean
make -j 24
make verify-test-all

When we reach the filename stressor the kernel crashes as follows:

[  902.594715] kernel BUG at fs/dcache.c:2050!
[  902.598205] Internal error: Oops - BUG: f2000800 [#1] SMP
[  902.603127] Modules linked in: dccp_ipv4 dccp atm vfio_iommu_type1 vfio iommufd cmac algif_rng twofish_generic twofish_common serpent_generic fcrypt cast6_generic cast5_generic cast_common camellia_generic blowfish_generic blowfish_common aes_arm64 algif_skcipher algif_hash aria_generic sm4_generic sm4_neon ccm aes_ce_ccm des_generic libdes authenc aegis128 algif_aead af_alg cfg80211 binfmt_misc nls_iso8859_1 dm_multipath drm efi_pstore dmi_sysfs qemu_fw_cfg ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_ce polyval_ce polyval_generic ghash_ce sm4 sha2_ce sha256_arm64 sha1_ce arm_smccc_trng xhci_pci virtio_rng xhci_pci_renesas aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
[  902.689941] CPU: 1 PID: 91317 Comm: stress-ng-filen Not tainted 6.5.0-7-generic #7-Ubuntu
[  902.699281] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[  902.706902] pstate: 4045 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  902.715488] pc : d_instantiate_new+0xa8/0xc8
[  902.720889] lr : ext4_add_nondir+0x10c/0x160
[  902.725702] sp : 80008b6d3930
[  902.729390] x29: 80008b6d3930 x28:  x27: bd164e51a980
[  902.738705] x26: 6789f3b68f20 x25: 8180 x24: 678a541f7968
[  902.747003] x23: 6789f3b68f00 x22: 80008b6d39b0 x21: 678a6a25bcb0
[  902.755776] x20: 678a36f8f028 x19:  x18: 80008af45068
[  902.764647] x17:  x16:  x15: ecececececececec
[  902.773135] x14: ecececececececec x13: ecececececececec x12: ecececececececec
[  902.781386] x11: ecececececececec x10: ecececececececec x9 : bd164d5990bc
[  902.789346] x8 :  x7 :  x6 : 
[  902.798564] x5 :  x4 :  x3 : 
[  902.806851] x2 : bd16504e4ce0 x1 : 678a36f8f028 x0 : 6789f3b68f00
[  902.815544] Call trace:
[  902.818870]  d_instantiate_new+0xa8/0xc8
[  902.823523]  ext4_create+0x120/0x238
[  902.827716]  lookup_open.isra.0+0x480/0x4d0
[  902.832480]  open_last_lookups+0x160/0x3b0
[  902.837060]  path_openat+0xa0/0x2a0
[  902.840975]  do_filp_open+0xa8/0x180
[  902.845582]  do_sys_openat2+0xe8/0x128
[  902.850426]  __arm64_sys_openat+0x70/0xe0
[  902.854952]  invoke_syscall+0x7c/0x128
[  902.859155]  el0_svc_common.constprop.0+0x5c/0x168
[  902.864979]  do_el0_svc+0x38/0x68
[  902.869364]  el0_svc+0x30/0xe0
[  902.873401]  el0t_64_sync_handler+0x148/0x158
[  902.878336]  el0t_64_sync+0x1b0/0x1b8
[  902.882513] Code: d282 d2800010 d2800011 d65f03c0 (d421)
[  902.890632] ---[ end trace  ]---
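For orientation, the BUG at fs/dcache.c:2050 lands inside
d_instantiate_new(), which ext4_add_nondir() calls when wiring a freshly
created inode to its dentry. In v6.5 the function looks roughly like this
(paraphrased from the kernel source, so line-for-line accuracy is not
guaranteed; the trap that fires is most likely one of the BUG_ON sanity
checks at the top):

```c
/* fs/dcache.c (approximate, Linux v6.5) -- excerpt for orientation only */
void d_instantiate_new(struct dentry *entry, struct inode *inode)
{
	/* The new dentry must not already alias an inode, and the inode
	 * must exist; either condition being violated trips the BUG. */
	BUG_ON(!hlist_unhashed(&entry->d_u.d_alias));
	BUG_ON(!inode);
	lockdep_annotate_inode_mutex_key(inode);
	spin_lock(&inode->i_lock);
	__d_instantiate(entry, inode);
	WRITE_ONCE(inode->i_state, inode->i_state & ~(I_NEW | I_CREATING));
	smp_mb();
	wake_up_bit(&inode->i_state, __I_NEW);
	spin_unlock(&inode->i_lock);
}
```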

** Affects: linux (Ubuntu)
 Importance: High
 Status: Incomplete

** Changed in: linux (Ubuntu)
   Importance: Undecided => High


[Kernel-packages] [Bug 2038768] Re: arm64: linux: stress-ng filename stressor crashes kernel

2023-10-08 Thread Colin Ian King
Note that just running stress-ng with --filename 0 will reproduce the
issue. I'm testing this now on a cleanly formatted ext4 file system.



Re: [Kernel-packages] [Bug 2031352] Re: Nouveau driver crash - Ubuntu 22.04.3 LTS stuck on power-off/reboot screen

2023-09-05 Thread Ian Russel Adem
This also fixed the problem on my end: Dell XPS 15 9510, NVIDIA GA107M
[GeForce RTX 3050 Ti Mobile], Linux version 6.2.0-31-generic
(buildd@lcy02-amd64-032) (x86_64-linux-gnu-gcc-11 (Ubuntu
11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38)


On Tue, Sep 5, 2023 at 12:50 AM Juan Manuel Vicente <
2031...@bugs.launchpad.net> wrote:

> After following @juergh updates. I realized the problem was the nouveau
> drivers, also I saw my fresh installation (Ubuntu 22.04.3) was not
> detecting my RTX 3070. So I fixed both problems installing the drivers.
>
> > sudo apt install nvidia-driver-535
>
> Now, I can shut down and/or restart my machine without issues. The only
> problem is that this driver is the proprietary one. However, in my case
> that is not a problem.
>
> Regards
> Juan
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/2031352
>
> Title:
>   Nouveau driver crash - Ubuntu 22.04.3 LTS stuck on power-off/reboot
>   screen
>
> Status in linux package in Ubuntu:
>   Confirmed
> Status in linux-hwe-6.2 package in Ubuntu:
>   Confirmed
> Status in systemd package in Ubuntu:
>   Invalid
> Status in linux-hwe-6.2 source package in Jammy:
>   Confirmed
> Status in linux source package in Lunar:
>   Confirmed
>
> Bug description:
>   [Impact]
>
>   After updating to Kernel 6.2 a few days ago, I have been experiencing
>   issues with my system's shutdown and reboot functions. During these
>   processes, the system becomes unresponsive and hangs on a black
>   screen, which displays both the Dell and Ubuntu logos. This issue is
>   inconsistent; it happens sporadically. Currently, the only workaround
>   I've found to successfully shut down the system is to forcibly power
>   off the machine by holding down the power button for 5 seconds.
>
>   I've also tested a fresh installation of Ubuntu 22.04.3.
>
>   [Fix]
>
>   Updated patch from linux-next:
>   https://patchwork.freedesktop.org/patch/538562/
>
>   [Test Case]
>
>   Suspend,resume,shutdown,reboot should all work correctly. No nouveau
>   stack trace in the kernel log.
>
>   [Where Problems Could Occur]
>
>   Limited to nouveau driver that wants to load nonexistent ACR firmware.
>   Only nvidia GPUs are affected.
>
>   [Additional information]
>
>   
>
>   ProblemType: Bug
>   DistroRelease: Ubuntu 22.04
>   Package: systemd 249.11-0ubuntu3.9
>   ProcVersionSignature: Ubuntu 6.2.0-26.26~22.04.1-generic 6.2.13
>   Uname: Linux 6.2.0-26-generic x86_64
>   NonfreeKernelModules: nvidia_modeset nvidia
>   ApportVersion: 2.20.11-0ubuntu82.5
>   Architecture: amd64
>   CasperMD5CheckResult: pass
>   CurrentDesktop: ubuntu:GNOME
>   Date: Mon Aug 14 22:41:14 2023
>   InstallationDate: Installed on 2023-08-14 (1 days ago)
>   InstallationMedia: Ubuntu 22.04.3 2023.08.13 LTS (20230813)
>   MachineType: Dell Inc. XPS 8930
>   ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-6.2.0-26-generic
> root=UUID=14d1ee7a-565f-4ba4-b6dd-7bc16e487451 ro quiet splash vt.handoff=7
>   SourcePackage: systemd
>   UpgradeStatus: No upgrade log present (probably fresh install)
>   dmi.bios.date: 03/14/2023
>   dmi.bios.release: 1.1
>   dmi.bios.vendor: Dell Inc.
>   dmi.bios.version: 1.1.30
>   dmi.board.name: 0T88YD
>   dmi.board.vendor: Dell Inc.
>   dmi.board.version: A00
>   dmi.chassis.type: 3
>   dmi.chassis.vendor: Dell Inc.
>   dmi.chassis.version: Not Specified
>   dmi.modalias:
> dmi:bvnDellInc.:bvr1.1.30:bd03/14/2023:br1.1:svnDellInc.:pnXPS8930:pvr1.1.30:rvnDellInc.:rn0T88YD:rvrA00:cvnDellInc.:ct3:cvrNotSpecified:sku0859:
>   dmi.product.family: XPS
>   dmi.product.name: XPS 8930
>   dmi.product.sku: 0859
>   dmi.product.version: 1.1.30
>   dmi.sys.vendor: Dell Inc.
>   modified.conffile..etc.default.apport:
># set this to 0 to disable apport, or to 1 to enable it
># you can temporarily override this with
># sudo service apport start force_start=1
>enabled=0
>   mtime.conffile..etc.default.apport: 2023-08-13T20:57:27
>   mtime.conffile..etc.systemd.system.conf: 2023-08-13T20:57:27
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2031352/+subscriptions
>
>


[Kernel-packages] [Bug 2031352] Re: Nouveau driver crash - Ubuntu 22.04.3 LTS stuck on power-off/reboot screen

2023-09-05 Thread Ian Russel Adem
@juanma-v82 this also fixed my issue.

> sudo apt install nvidia-driver-535

Linux version 6.2.0-31-generic (buildd@lcy02-amd64-032) (x86_64-linux-
gnu-gcc-11 (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils
for Ubuntu) 2.38)

Dell xps 15 9510 NVIDIA Corporation GA107M [GeForce RTX 3050 Ti Mobile]

Regards,

Ian

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-hwe-6.2 in Ubuntu.
https://bugs.launchpad.net/bugs/2031352

Title:
  Nouveau driver crash - Ubuntu 22.04.3 LTS stuck on power-off/reboot
  screen

Status in linux package in Ubuntu:
  Confirmed
Status in linux-hwe-6.2 package in Ubuntu:
  Confirmed
Status in linux-hwe-6.2 source package in Jammy:
  Confirmed
Status in linux source package in Lunar:
  Confirmed

Bug description:
  [Impact]

  After updating to Kernel 6.2 a few days ago, I have been experiencing
  issues with my system's shutdown and reboot functions. During these
  processes, the system becomes unresponsive and hangs on a black
  screen, which displays both the Dell and Ubuntu logos. This issue is
  inconsistent; it happens sporadically. Currently, the only workaround
  I've found to successfully shut down the system is to forcibly power
  off the machine by holding down the power button for 5 seconds.

  I've also tested a fresh installation of Ubuntu 22.04.3.

  [Fix]

  Updated patch from linux-next:
  https://patchwork.freedesktop.org/patch/538562/

  [Test Case]

  Suspend, resume, shutdown and reboot should all work correctly. No nouveau
  stack trace in the kernel log.
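
  The log check in the test case can be automated with a small grep over the
  kernel log. This is only a sketch: the match patterns are assumptions about
  typical nouveau/ACR message wording, not strings taken verbatim from this bug.

```shell
#!/bin/sh
# Sketch: count nouveau stack traces / ACR firmware errors in a kernel log.
# The match patterns below are illustrative assumptions, not exact messages.
scan_log() {
    # Reads the file(s) given as arguments, or stdin when called bare,
    # so it can be fed either /var/log/kern.log or `dmesg` output.
    grep -E -c 'nouveau.*(Call Trace|failed to load|acr)' "$@"
}

# Self-contained demo on illustrative sample lines; in practice pipe in
# `dmesg` output instead.
sample='kernel: nouveau 0000:01:00.0: acr: failed to load firmware
kernel: usb 1-2: new high-speed USB device
kernel: nouveau 0000:01:00.0: Call Trace:'
printf '%s\n' "$sample" | scan_log
# prints 2
```

  A count of zero after a suspend/resume/shutdown cycle would match the
  expected result of the test case.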

  [Where Problems Could Occur]

  Limited to the nouveau driver, which tries to load nonexistent ACR
  firmware. Only NVIDIA GPUs are affected.

  [Additional information]

  

  ProblemType: Bug
  DistroRelease: Ubuntu 22.04
  Package: systemd 249.11-0ubuntu3.9
  ProcVersionSignature: Ubuntu 6.2.0-26.26~22.04.1-generic 6.2.13
  Uname: Linux 6.2.0-26-generic x86_64
  NonfreeKernelModules: nvidia_modeset nvidia
  ApportVersion: 2.20.11-0ubuntu82.5
  Architecture: amd64
  CasperMD5CheckResult: pass
  CurrentDesktop: ubuntu:GNOME
  Date: Mon Aug 14 22:41:14 2023
  InstallationDate: Installed on 2023-08-14 (1 days ago)
  InstallationMedia: Ubuntu 22.04.3 2023.08.13 LTS (20230813)
  MachineType: Dell Inc. XPS 8930
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-6.2.0-26-generic 
root=UUID=14d1ee7a-565f-4ba4-b6dd-7bc16e487451 ro quiet splash vt.handoff=7
  SourcePackage: systemd
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 03/14/2023
  dmi.bios.release: 1.1
  dmi.bios.vendor: Dell Inc.
  dmi.bios.version: 1.1.30
  dmi.board.name: 0T88YD
  dmi.board.vendor: Dell Inc.
  dmi.board.version: A00
  dmi.chassis.type: 3
  dmi.chassis.vendor: Dell Inc.
  dmi.chassis.version: Not Specified
  dmi.modalias: 
dmi:bvnDellInc.:bvr1.1.30:bd03/14/2023:br1.1:svnDellInc.:pnXPS8930:pvr1.1.30:rvnDellInc.:rn0T88YD:rvrA00:cvnDellInc.:ct3:cvrNotSpecified:sku0859:
  dmi.product.family: XPS
  dmi.product.name: XPS 8930
  dmi.product.sku: 0859
  dmi.product.version: 1.1.30
  dmi.sys.vendor: Dell Inc.
  modified.conffile..etc.default.apport:
   # set this to 0 to disable apport, or to 1 to enable it
   # you can temporarily override this with
   # sudo service apport start force_start=1
   enabled=0
  mtime.conffile..etc.default.apport: 2023-08-13T20:57:27
  mtime.conffile..etc.systemd.system.conf: 2023-08-13T20:57:27

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2031352/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2026776] Re: arm64+ast2600: No Output from BMC's VGA port

2023-08-30 Thread Ian May
** Changed in: linux (Ubuntu Jammy)
   Status: New => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-hwe-6.2 in Ubuntu.
https://bugs.launchpad.net/bugs/2026776

Title:
  arm64+ast2600: No Output from BMC's VGA port

Status in linux package in Ubuntu:
  Triaged
Status in linux-hwe-5.19 package in Ubuntu:
  Won't Fix
Status in linux-hwe-6.2 package in Ubuntu:
  Fix Committed
Status in linux source package in Jammy:
  Fix Committed
Status in linux-hwe-5.19 source package in Jammy:
  Won't Fix
Status in linux-hwe-6.2 source package in Jammy:
  Fix Committed
Status in linux source package in Lunar:
  Fix Committed

Bug description:
  SRU Justification:

  [ Impact ]

  On systems that have the following combination of hardware

  1) arm64 CPU 
  2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/

  No output when connecting a display to the BMC's VGA port.

  [ Fix ]

  For AST2500+ MMIO should be enabled by default.

  [ Test Plan ]

  Test on targeted hardware to make sure BMC is displaying output.

  [ Where problems could occur ]

  Not aware of any potential problems, but any should be confined to
  ASPEED AST2500+ hardware.

  [ Other Info ]

  Patch is already in jammy/nvidia-5.19 and jammy/nvidia-6.2 and has
  been tested with affected BMC.


  
  [Issue]

  On systems that have the following combination of hardware...:

  1) arm64 CPU
  2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/

  .. we see no output when connecting a display to the BMC's VGA port.

  Upon further investigation, we see that applying the following patch
  fixes this issue:
  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/ast?h=v6.4-rc6&id=4327a6137ed43a091d900b1ac833345d60f32228

  [Action]

  Please apply the following two backports to the appropriate Ubuntu HWE
  kernels:

  
https://github.com/NVIDIA-BaseOS-6/linux-nvidia-5.19/commit/055c9ec3739d7df1179db3ba054b00f3dd684560
  
https://github.com/NVIDIA-BaseOS-6/linux-nvidia-6.2/commit/8ab3253c6a59eee3424fe0c60b1fc6dc9f2d73b7

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2026776/+subscriptions




[Kernel-packages] [Bug 2026776] Re: arm64+ast2600: No Output from BMC's VGA port

2023-08-29 Thread Ian May
** Also affects: linux (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Also affects: linux-hwe-5.19 (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Also affects: linux-hwe-6.2 (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Changed in: linux-hwe-5.19 (Ubuntu Jammy)
   Status: New => Fix Committed

** Changed in: linux-hwe-6.2 (Ubuntu Jammy)
   Status: New => Fix Committed

** Changed in: linux-hwe-5.19 (Ubuntu Jammy)
   Status: Fix Committed => Won't Fix

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-hwe-6.2 in Ubuntu.
https://bugs.launchpad.net/bugs/2026776

Title:
  arm64+ast2600: No Output from BMC's VGA port

Status in linux package in Ubuntu:
  Triaged
Status in linux-hwe-5.19 package in Ubuntu:
  Won't Fix
Status in linux-hwe-6.2 package in Ubuntu:
  Fix Committed
Status in linux source package in Jammy:
  New
Status in linux-hwe-5.19 source package in Jammy:
  Won't Fix
Status in linux-hwe-6.2 source package in Jammy:
  Fix Committed
Status in linux source package in Lunar:
  Fix Committed

Bug description:
  SRU Justification:

  [ Impact ]

  On systems that have the following combination of hardware

  1) arm64 CPU 
  2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/

  No output when connecting a display to the BMC's VGA port.

  [ Fix ]

  For AST2500+ MMIO should be enabled by default.

  [ Test Plan ]

  Test on targeted hardware to make sure BMC is displaying output.

  [ Where problems could occur ]

  Not aware of any potential problems, but any should be confined to
  ASPEED AST2500+ hardware.

  [ Other Info ]

  Patch is already in jammy/nvidia-5.19 and jammy/nvidia-6.2 and has
  been tested with affected BMC.


  
  [Issue]

  On systems that have the following combination of hardware...:

  1) arm64 CPU
  2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/

  .. we see no output when connecting a display to the BMC's VGA port.

  Upon further investigation, we see that applying the following patch
  fixes this issue:
  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/ast?h=v6.4-rc6&id=4327a6137ed43a091d900b1ac833345d60f32228

  [Action]

  Please apply the following two backports to the appropriate Ubuntu HWE
  kernels:

  
https://github.com/NVIDIA-BaseOS-6/linux-nvidia-5.19/commit/055c9ec3739d7df1179db3ba054b00f3dd684560
  
https://github.com/NVIDIA-BaseOS-6/linux-nvidia-6.2/commit/8ab3253c6a59eee3424fe0c60b1fc6dc9f2d73b7

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2026776/+subscriptions




[Kernel-packages] [Bug 2026776] Re: arm64+ast2600: No Output from BMC's VGA port

2023-08-29 Thread Ian May
** Changed in: linux-hwe-6.2 (Ubuntu)
   Status: New => Incomplete

** Changed in: linux-hwe-6.2 (Ubuntu)
   Status: Incomplete => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-hwe-6.2 in Ubuntu.
https://bugs.launchpad.net/bugs/2026776

Title:
  arm64+ast2600: No Output from BMC's VGA port

Status in linux package in Ubuntu:
  Triaged
Status in linux-hwe-5.19 package in Ubuntu:
  Won't Fix
Status in linux-hwe-6.2 package in Ubuntu:
  Fix Committed
Status in linux source package in Lunar:
  Fix Committed

Bug description:
  SRU Justification:

  [ Impact ]

  On systems that have the following combination of hardware

  1) arm64 CPU 
  2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/

  No output when connecting a display to the BMC's VGA port.

  [ Fix ]

  For AST2500+ MMIO should be enabled by default.

  [ Test Plan ]

  Test on targeted hardware to make sure BMC is displaying output.

  [ Where problems could occur ]

  Not aware of any potential problems, but any should be confined to
  ASPEED AST2500+ hardware.

  [ Other Info ]

  Patch is already in jammy/nvidia-5.19 and jammy/nvidia-6.2 and has
  been tested with affected BMC.


  
  [Issue]

  On systems that have the following combination of hardware...:

  1) arm64 CPU
  2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/

  .. we see no output when connecting a display to the BMC's VGA port.

  Upon further investigation, we see that applying the following patch
  fixes this issue:
  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/ast?h=v6.4-rc6&id=4327a6137ed43a091d900b1ac833345d60f32228

  [Action]

  Please apply the following two backports to the appropriate Ubuntu HWE
  kernels:

  
https://github.com/NVIDIA-BaseOS-6/linux-nvidia-5.19/commit/055c9ec3739d7df1179db3ba054b00f3dd684560
  
https://github.com/NVIDIA-BaseOS-6/linux-nvidia-6.2/commit/8ab3253c6a59eee3424fe0c60b1fc6dc9f2d73b7

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2026776/+subscriptions




[Kernel-packages] [Bug 2031352] Re: Ubuntu 22.04.3 LTS stuck on power-off/reboot screen

2023-08-28 Thread Ian Russel Adem
** Changed in: systemd (Ubuntu)
   Status: Invalid => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-hwe-6.2 in Ubuntu.
https://bugs.launchpad.net/bugs/2031352

Title:
  Ubuntu 22.04.3 LTS stuck on power-off/reboot screen

Status in linux-hwe-6.2 package in Ubuntu:
  Confirmed
Status in systemd package in Ubuntu:
  Confirmed
Status in linux-hwe-6.2 source package in Jammy:
  Confirmed

Bug description:
  After updating to Kernel 6.2 a few days ago, I have been experiencing
  issues with my system's shutdown and reboot functions. During these
  processes, the system becomes unresponsive and hangs on a black
  screen, which displays both the Dell and Ubuntu logos. This issue is
  inconsistent; it happens sporadically. Currently, the only workaround
  I've found to successfully shut down the system is to forcibly power
  off the machine by holding down the power button for 5 seconds.

  I've also tested a fresh installation of Ubuntu 22.04.3.

  

  ProblemType: Bug
  DistroRelease: Ubuntu 22.04
  Package: systemd 249.11-0ubuntu3.9
  ProcVersionSignature: Ubuntu 6.2.0-26.26~22.04.1-generic 6.2.13
  Uname: Linux 6.2.0-26-generic x86_64
  NonfreeKernelModules: nvidia_modeset nvidia
  ApportVersion: 2.20.11-0ubuntu82.5
  Architecture: amd64
  CasperMD5CheckResult: pass
  CurrentDesktop: ubuntu:GNOME
  Date: Mon Aug 14 22:41:14 2023
  InstallationDate: Installed on 2023-08-14 (1 days ago)
  InstallationMedia: Ubuntu 22.04.3 2023.08.13 LTS (20230813)
  MachineType: Dell Inc. XPS 8930
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-6.2.0-26-generic 
root=UUID=14d1ee7a-565f-4ba4-b6dd-7bc16e487451 ro quiet splash vt.handoff=7
  SourcePackage: systemd
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 03/14/2023
  dmi.bios.release: 1.1
  dmi.bios.vendor: Dell Inc.
  dmi.bios.version: 1.1.30
  dmi.board.name: 0T88YD
  dmi.board.vendor: Dell Inc.
  dmi.board.version: A00
  dmi.chassis.type: 3
  dmi.chassis.vendor: Dell Inc.
  dmi.chassis.version: Not Specified
  dmi.modalias: 
dmi:bvnDellInc.:bvr1.1.30:bd03/14/2023:br1.1:svnDellInc.:pnXPS8930:pvr1.1.30:rvnDellInc.:rn0T88YD:rvrA00:cvnDellInc.:ct3:cvrNotSpecified:sku0859:
  dmi.product.family: XPS
  dmi.product.name: XPS 8930
  dmi.product.sku: 0859
  dmi.product.version: 1.1.30
  dmi.sys.vendor: Dell Inc.
  modified.conffile..etc.default.apport:
   # set this to 0 to disable apport, or to 1 to enable it
   # you can temporarily override this with
   # sudo service apport start force_start=1
   enabled=0
  mtime.conffile..etc.default.apport: 2023-08-13T20:57:27
  mtime.conffile..etc.systemd.system.conf: 2023-08-13T20:57:27

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-hwe-6.2/+bug/2031352/+subscriptions




[Kernel-packages] [Bug 1982519] Re: GDS: Add NFS patches to optimized kernel

2023-08-28 Thread Ian May
** Changed in: linux-nvidia-5.19 (Ubuntu Jammy)
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia in Ubuntu.
https://bugs.launchpad.net/bugs/1982519

Title:
  GDS: Add NFS patches to optimized kernel

Status in linux-nvidia package in Ubuntu:
  New
Status in linux-nvidia-5.19 package in Ubuntu:
  New
Status in linux-nvidia-6.2 package in Ubuntu:
  New
Status in linux-nvidia source package in Jammy:
  Fix Released
Status in linux-nvidia-5.19 source package in Jammy:
  Fix Released
Status in linux-nvidia-6.2 source package in Jammy:
  Fix Released

Bug description:
   [Impact]
  Adding these changes will enable GDS functionality in the NFS drivers.

   [Fix]
  This is not a fix but a new feature being added to the NFS driver.

   [Test]
  
  Tested the NFS driver on an HPE system, as I did not have a setup with BASEOS6.
   1) Installed the 5.15.39 kernel on the system (this is the kernel the
optimized kernel is currently based on).
   2) Downloaded the optimized kernel.
   3) Applied the patches to the optimized kernel.
   4) Replaced the NFS modules on the system with the ones built on the
optimized kernel.
   5) Ran GDS and compat mode tests on an NFS mount with the patched NFS
driver. All tests passed.
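
   Steps 3-4 amount to rebuilding the patched in-tree modules and copying
them over the stock ones under /lib/modules. A minimal sketch of the path
involved (the kernel version and module path here are illustrative
assumptions, not the exact ones used in this test):

```shell
#!/bin/sh
# Sketch: where a rebuilt in-tree module installs for a given kernel.
# Version and relative path below are illustrative assumptions only.
module_dest() {
    kver=$1; rel=$2
    printf '/lib/modules/%s/kernel/%s\n' "$kver" "$rel"
}

# e.g. a patched fs/nfs/nfs.ko built against an assumed 5.15.39 kernel:
module_dest 5.15.39-generic fs/nfs/nfs.ko
# prints /lib/modules/5.15.39-generic/kernel/fs/nfs/nfs.ko
# After copying the module there: run depmod, then reload it with
# `modprobe -r nfs && modprobe nfs` (step 4).
```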
   
  Attaching the results

  Compat mode tests
  ==
  **
  API Tests, : 72 /  72 tests passed
  **
  Testsuite : 211 / 211 tests passed
  done tests:Thu Jul 21 08:27:58 PM UTC 2022

  GDS mode tests
  ==
  **
  NVFS IOCTL negative Tests, : 23 /  23 tests passed
  **
  Testsuite : 249 / 249 tests passed
  End: nvidia-fs:
  GDS Version: 1.4.0.31
  NVFS statistics(ver: 4.0)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia/+bug/1982519/+subscriptions




[Kernel-packages] [Bug 1982519] Re: GDS: Add NFS patches to optimized kernel

2023-08-28 Thread Ian May
** Changed in: linux-nvidia-6.2 (Ubuntu Jammy)
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia in Ubuntu.
https://bugs.launchpad.net/bugs/1982519

Title:
  GDS: Add NFS patches to optimized kernel

Status in linux-nvidia package in Ubuntu:
  New
Status in linux-nvidia-5.19 package in Ubuntu:
  New
Status in linux-nvidia-6.2 package in Ubuntu:
  New
Status in linux-nvidia source package in Jammy:
  Fix Released
Status in linux-nvidia-5.19 source package in Jammy:
  New
Status in linux-nvidia-6.2 source package in Jammy:
  Fix Released

Bug description:
   [Impact]
  Adding these changes will enable GDS functionality in the NFS drivers.

   [Fix]
  This is not a fix but a new feature being added to the NFS driver.

   [Test]
  
  Tested the NFS driver on an HPE system, as I did not have a setup with BASEOS6.
   1) Installed the 5.15.39 kernel on the system (this is the kernel the
optimized kernel is currently based on).
   2) Downloaded the optimized kernel.
   3) Applied the patches to the optimized kernel.
   4) Replaced the NFS modules on the system with the ones built on the
optimized kernel.
   5) Ran GDS and compat mode tests on an NFS mount with the patched NFS
driver. All tests passed.
   
  Attaching the results

  Compat mode tests
  ==
  **
  API Tests, : 72 /  72 tests passed
  **
  Testsuite : 211 / 211 tests passed
  done tests:Thu Jul 21 08:27:58 PM UTC 2022

  GDS mode tests
  ==
  **
  NVFS IOCTL negative Tests, : 23 /  23 tests passed
  **
  Testsuite : 249 / 249 tests passed
  End: nvidia-fs:
  GDS Version: 1.4.0.31
  NVFS statistics(ver: 4.0)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia/+bug/1982519/+subscriptions




[Kernel-packages] [Bug 2026776] Re: arm64+ast2600: No Output from BMC's VGA port

2023-07-20 Thread Ian May
** Also affects: linux (Ubuntu Lunar)
   Importance: Undecided
   Status: New

** Also affects: linux-hwe-5.19 (Ubuntu Lunar)
   Importance: Undecided
   Status: New

** Also affects: linux-hwe-6.2 (Ubuntu Lunar)
   Importance: Undecided
   Status: New

** No longer affects: linux-hwe-5.19 (Ubuntu Lunar)

** No longer affects: linux-hwe-6.2 (Ubuntu Lunar)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2026776

Title:
  arm64+ast2600: No Output from BMC's VGA port

Status in linux package in Ubuntu:
  New
Status in linux-hwe-5.19 package in Ubuntu:
  Won't Fix
Status in linux-hwe-6.2 package in Ubuntu:
  New
Status in linux source package in Lunar:
  New

Bug description:
  SRU Justification:

  [ Impact ]

  On systems that have the following combination of hardware

  1) arm64 CPU 
  2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/

  No output when connecting a display to the BMC's VGA port.

  [ Fix ]

  For AST2500+ MMIO should be enabled by default.

  [ Test Plan ]

  Test on targeted hardware to make sure BMC is displaying output.

  [ Where problems could occur ]

  Not aware of any potential problems, but any should be confined to
  ASPEED AST2500+ hardware.

  [ Other Info ]

  Patch is already in jammy/nvidia-5.19 and jammy/nvidia-6.2 and has
  been tested with affected BMC.


  
  [Issue]

  On systems that have the following combination of hardware...:

  1) arm64 CPU
  2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/

  .. we see no output when connecting a display to the BMC's VGA port.

  Upon further investigation, we see that applying the following patch
  fixes this issue:
  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/ast?h=v6.4-rc6&id=4327a6137ed43a091d900b1ac833345d60f32228

  [Action]

  Please apply the following two backports to the appropriate Ubuntu HWE
  kernels:

  
https://github.com/NVIDIA-BaseOS-6/linux-nvidia-5.19/commit/055c9ec3739d7df1179db3ba054b00f3dd684560
  
https://github.com/NVIDIA-BaseOS-6/linux-nvidia-6.2/commit/8ab3253c6a59eee3424fe0c60b1fc6dc9f2d73b7

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2026776/+subscriptions




[Kernel-packages] [Bug 2026776] Re: arm64+ast2600: No Output from BMC's VGA port

2023-07-19 Thread Ian May
With Kinetic going EOL, there will be no further SRU updates for linux-
hwe-5.19

** Description changed:

+ SRU Justification:
+ 
+ [ Impact ]
+ 
+ On systems that have the following combination of hardware
+ 
+ 1) arm64 CPU 
+ 2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/
+ 
+ No output when connecting a display to the BMC's VGA port.
+ 
+ [ Fix ]
+ 
+ For AST2500+ MMIO should be enabled by default.
+ 
+ [ Test Plan ]
+ 
+ Test on targeted hardware to make sure BMC is displaying output.
+ 
+ [ Where problems could occur ]
+ 
+ Not aware of any potential problems, but any should be confined to
+ ASPEED AST2500+ hardware.
+ 
+ [ Other Info ]
+ 
+ Patch is already in jammy/nvidia-5.19 and jammy/nvidia-6.2 and has been
+ tested with affected BMC.
+ 
+ 
+ 
  [Issue]
  
  On systems that have the following combination of hardware...:
  
  1) arm64 CPU
  2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/
  
  .. we see no output when connecting a display to the BMC's VGA port.
  
  Upon further investigation, we see that applying the following patch
  fixes this issue:
  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/ast?h=v6.4-rc6&id=4327a6137ed43a091d900b1ac833345d60f32228
  
  [Action]
  
  Please apply the following two backports to the appropriate Ubuntu HWE
  kernels:
  
  
https://github.com/NVIDIA-BaseOS-6/linux-nvidia-5.19/commit/055c9ec3739d7df1179db3ba054b00f3dd684560
  
https://github.com/NVIDIA-BaseOS-6/linux-nvidia-6.2/commit/8ab3253c6a59eee3424fe0c60b1fc6dc9f2d73b7

** Changed in: linux-hwe-5.19 (Ubuntu)
   Status: New => Won't Fix

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2026776

Title:
  arm64+ast2600: No Output from BMC's VGA port

Status in linux package in Ubuntu:
  New
Status in linux-hwe-5.19 package in Ubuntu:
  Won't Fix
Status in linux-hwe-6.2 package in Ubuntu:
  New

Bug description:
  SRU Justification:

  [ Impact ]

  On systems that have the following combination of hardware

  1) arm64 CPU 
  2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/

  No output when connecting a display to the BMC's VGA port.

  [ Fix ]

  For AST2500+ MMIO should be enabled by default.

  [ Test Plan ]

  Test on targeted hardware to make sure BMC is displaying output.

  [ Where problems could occur ]

  Not aware of any potential problems, but any should be confined to
  ASPEED AST2500+ hardware.

  [ Other Info ]

  Patch is already in jammy/nvidia-5.19 and jammy/nvidia-6.2 and has
  been tested with affected BMC.


  
  [Issue]

  On systems that have the following combination of hardware...:

  1) arm64 CPU
  2) ASPEED AST2600 BMC: https://www.aspeedtech.com/server_ast2600/

  .. we see no output when connecting a display to the BMC's VGA port.

  Upon further investigation, we see that applying the following patch
  fixes this issue:
  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/ast?h=v6.4-rc6&id=4327a6137ed43a091d900b1ac833345d60f32228

  [Action]

  Please apply the following two backports to the appropriate Ubuntu HWE
  kernels:

  
https://github.com/NVIDIA-BaseOS-6/linux-nvidia-5.19/commit/055c9ec3739d7df1179db3ba054b00f3dd684560
  
https://github.com/NVIDIA-BaseOS-6/linux-nvidia-6.2/commit/8ab3253c6a59eee3424fe0c60b1fc6dc9f2d73b7

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2026776/+subscriptions




[Kernel-packages] [Bug 2026883] Re: vector floating point registers get clobbered when running stress-ng --vecfp with more instances than CPUs

2023-07-11 Thread Colin Ian King
It may be worth trying this on real H/W to factor out the QEMU
component.

** Description changed:

  When running the stress-ng vector floating point stressor in QEMU PPC64
  virtual machines I get floating point verification errors when running
  more stressor instances than the number of virtual CPUs.
  
  How to reproduce:
  
  Create a PPC64 VM in QEMU on an x86 host with 8 virtual CPUs. Login, and
  then do:
  
  get latest stress-ng:
  
  sudo apt-get build-dep stress-ng
  git clone https://github.com/ColinIanKing/stress-ng
  cd stress-ng
  make clean; make -j $(nproc)
  ./stress-ng --vecfp 32 --verify -t 10
  
  One should get failures such as:
  stress-ng: info:  [1487] setting to a 10 second run per stressor
  stress-ng: info:  [1487] dispatching hogs: 32 vecfp
  stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result 
mismatch, got 1078998925312.00, expected 180812.062500
  stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result 
mismatch, got 46779686912.00, expected 13278722.00
  stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result 
mismatch, got 24992688128.00, expected 26213772.00
  stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result 
mismatch, got 17185787904.00, expected 39415832.00
  stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result 
mismatch, got 157250576.00, expected 33576.261719
  stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result 
mismatch, got 170314032.00, expected 13129044.00
  stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result 
mismatch, got 183516080.00, expected 26348392.00
  stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result 
mismatch, got 196647552.00, expected 39365508.00
  etc..
  
  However, running fewer instances than the number of CPUs, this runs fine
without any errors:
  ./stress-ng --vecfp 1 --verify -t 10
  stress-ng: info:  [1521] setting to a 10 second run per stressor
  stress-ng: info:  [1521] dispatching hogs: 1 vecfp
  stress-ng: info:  [1521] passed: 1: vecfp (1)
  stress-ng: info:  [1521] failed: 0
  stress-ng: info:  [1521] skipped: 0
  stress-ng: info:  [1521] metrics untrustworthy: 0
  stress-ng: info:  [1521] successful run completed in 19.00s
  
  It appears this only fails when the number of instances of the vecfp
  stressor is more than the number of virtual CPUs.  This seems to
  indicate that vector floating point registers are being clobbered
  between processes, which could be a security exploitable issue.
  
  Reproduced with Ubuntu Lunar PPC64 VM (6.2.0-20-generic) and x86 host
- (6.2.0-21-generic + qemu-kvm  1:5.0-5ubuntu6)
+ (6.2.0-21-generic + qemu-kvm  1:5.0-5ubuntu6).
+ 
+ List of PPC64el kernels reproducers:
+ 
+ Lunar: 6.2.0-20-generic
+ Mantic: 6.3.0-7-generic
+ 
  
  Not sure if this is a kernel or KVM issue, or both.
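
  The verify run above can be wrapped so failures are counted mechanically.
  Since stress-ng may not be available everywhere, this sketch separates the
  parsing step and demonstrates it on sample lines mirroring those in the
  report; the real invocation (an assumption about a sensible oversubscription
  factor) is left commented out.

```shell
#!/bin/sh
# Count stress-ng verification failures ('fail:' lines) in a run's output.
count_fails() {
    grep -c 'fail:'
}

# Real reproduction (requires stress-ng; oversubscribe the virtual CPUs):
# ./stress-ng --vecfp $(( $(nproc) * 4 )) --verify -t 10 2>&1 | count_fails

# Self-contained demo on sample lines mirroring the report:
sample='stress-ng: info:  [1487] dispatching hogs: 32 vecfp
stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result mismatch
stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result mismatch'
printf '%s\n' "$sample" | count_fails
# prints 2
```

  A nonzero count with more stressor instances than CPUs, and zero with
  fewer, would reproduce the behavior described above.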

** Information type changed from Public to Private Security

** Changed in: linux (Ubuntu Lunar)
   Importance: Undecided => High

** Changed in: linux (Ubuntu)
   Importance: Undecided => High

** Also affects: linux (Ubuntu Mantic)
   Importance: High
   Status: New

** Description changed:

  When running the stress-ng vector floating point stressor in QEMU PPC64
  virtual machines I get floating point verification errors when running
  more stressor instances than the number of virtual CPUs.
  
  How to reproduce:
  
  Create a PPC64 VM in QEMU on an x86 host with 8 virtual CPUs. Login, and
  then do:
  
  get latest stress-ng:
  
  sudo apt-get build-dep stress-ng
  git clone https://github.com/ColinIanKing/stress-ng
  cd stress-ng
  make clean; make -j $(nproc)
  ./stress-ng --vecfp 32 --verify -t 10
  
  One should get failures such as:
  stress-ng: info:  [1487] setting to a 10 second run per stressor
  stress-ng: info:  [1487] dispatching hogs: 32 vecfp
  stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result 
mismatch, got 1078998925312.00, expected 180812.062500
  stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result 
mismatch, got 46779686912.00, expected 13278722.00
  stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result 
mismatch, got 24992688128.00, expected 26213772.00
  stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result 
mismatch, got 17185787904.00, expected 39415832.00
  stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result 
mismatch, got 157250576.00, expected 33576.261719
  stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result 
mismatch, got 170314032.00, expected 13129044.00
  stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result 
mismatch, got 183516080.00, expected 26348392.00
  stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result 
mismatch, got 

[Kernel-packages] [Bug 2026883] [NEW] vector floating point registers get clobbered when running stress-ng --vecfp with more instances than CPUs

2023-07-11 Thread Colin Ian King
*** This bug is a security vulnerability ***

Private security bug reported:

When running the stress-ng vector floating point stressor in QEMU PPC64
virtual machines I get floating point verification errors when running
more stressor instances than the number of virtual CPUs.

How to reproduce:

Create a PPC64 VM in QEMU on an x86 host with 8 virtual CPUs. Login, and
then do:

get latest stress-ng:

sudo apt-get build-dep stress-ng
git clone https://github.com/ColinIanKing/stress-ng
cd stress-ng
make clean; make -j $(nproc)
./stress-ng --vecfp 32 --verify -t 10

One should get failures such as:
stress-ng: info:  [1487] setting to a 10 second run per stressor
stress-ng: info:  [1487] dispatching hogs: 32 vecfp
stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result 
mismatch, got 1078998925312.00, expected 180812.062500
stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result 
mismatch, got 46779686912.00, expected 13278722.00
stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result 
mismatch, got 24992688128.00, expected 26213772.00
stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result 
mismatch, got 17185787904.00, expected 39415832.00
stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result 
mismatch, got 157250576.00, expected 33576.261719
stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result 
mismatch, got 170314032.00, expected 13129044.00
stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result 
mismatch, got 183516080.00, expected 26348392.00
stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result 
mismatch, got 196647552.00, expected 39365508.00
etc.

However, running fewer instances than the number of CPUs, it runs fine
without any errors:
./stress-ng --vecfp 1 --verify -t 10
stress-ng: info:  [1521] setting to a 10 second run per stressor
stress-ng: info:  [1521] dispatching hogs: 1 vecfp
stress-ng: info:  [1521] passed: 1: vecfp (1)
stress-ng: info:  [1521] failed: 0
stress-ng: info:  [1521] skipped: 0
stress-ng: info:  [1521] metrics untrustworthy: 0
stress-ng: info:  [1521] successful run completed in 19.00s

It appears this only fails when the number of vecfp stressor instances
exceeds the number of virtual CPUs. This seems to indicate that vector
floating point registers are being clobbered between processes, which
could be a security-exploitable issue.

Reproduced with Ubuntu Lunar PPC64 VM (6.2.0-20-generic) and x86 host
(6.2.0-21-generic + qemu-kvm  1:5.0-5ubuntu6).

List of PPC64el kernels reproducers:

Focal: 5.4.0-148-generic
Jammy: 5.15.0-58-generic
Lunar: 6.2.0-20-generic
Mantic: 6.3.0-7-generic

Not sure if this is a kernel or KVM issue, or both.

** Affects: linux (Ubuntu)
 Importance: High
 Status: New

** Affects: linux (Ubuntu Focal)
 Importance: High
 Status: New

** Affects: linux (Ubuntu Lunar)
 Importance: High
 Status: New

** Affects: linux (Ubuntu Mantic)
 Importance: High
 Status: New

** Also affects: linux (Ubuntu Lunar)
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2026883

Title:
  vector floating point registers get clobbered when running stress-ng
  --vecfp with more instances than CPUs

Status in linux package in Ubuntu:
  New
Status in linux source package in Focal:
  New
Status in linux source package in Lunar:
  New
Status in linux source package in Mantic:
  New


[Kernel-packages] [Bug 2017903] Re: LSM stacking and AppArmor for 6.2: additional fixes

2023-07-11 Thread Colin Ian King
Note that this could be triggered with stress-ng --apparmor 0;  see
https://bugs.launchpad.net/ubuntu/mantic/+source/linux/+bug/2024599

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2017903

Title:
  LSM stacking and AppArmor for 6.2: additional fixes

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Lunar:
  Fix Released

Bug description:
  [Impact]

  We maintain custom LSM stacking and AppArmor SAUCE patches in our
  kernel to provide additional features that are not available in the
  upstream AppArmor.

  We have experienced occasional bugs in the lunar kernel (specifically
  with the environ.sh test) that can lead to system crashes / failures
  (such as potential NULL pointer dereference).

  [Test case]

  Run AppArmor autopkgtest / qa-regression-testing.

  [Fix]

  Apply the following additional fixes provided by AppArmor upstream
  maintainer:

UBUNTU: SAUCE: apparmor: fix policy_compat perms remap for file dfa
UBUNTU: SAUCE: apparmor: fix profile verification and enable it
UBUNTU: SAUCE: apparmor: fix: add missing failure check in 
compute_xmatch_perms
UBUNTU: SAUCE: apparmor: fix: kzalloc perms tables for shared dfas

  [Regression potential]

  Additional fixes are touching only AppArmor specific code, so we may
  experience regressions (bugs / behavior change) only in apparmor by
  applying them.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2017903/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2024599] Re: linux-image-5.15.0-1032-realtime locks up under scheduler test load

2023-07-11 Thread Colin Ian King
Thanks JJ, much appreciated :-)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2024599

Title:
  linux-image-5.15.0-1032-realtime locks up under scheduler test load

Status in apparmor package in Ubuntu:
  New
Status in linux package in Ubuntu:
  Incomplete
Status in apparmor source package in Jammy:
  New
Status in linux source package in Jammy:
  Incomplete
Status in apparmor source package in Kinetic:
  New
Status in linux source package in Kinetic:
  New
Status in apparmor source package in Lunar:
  New
Status in linux source package in Lunar:
  New
Status in apparmor source package in Mantic:
  New
Status in linux source package in Mantic:
  Incomplete

Bug description:
  lsb_release -a
  No LSB modules are available.
  Distributor ID:   Ubuntu
  Description:  Ubuntu 22.04.2 LTS
  Release:  22.04
  Codename: jammy

  uname -a
  Linux jammie-amd64-efi 5.15.0-1032-realtime #35-Ubuntu SMP PREEMPT_RT Tue Jan 
24 11:45:03 UTC 2023 x86_64
  x86_64 x86_64 GNU/Linux

  free
     totalusedfree  shared  buff/cache   
available
  Mem: 4013888  200984 34390121204  373892 
3744628
  Swap:4014076   0 4014076

  Running in a kvm-qemu, 8 cpus, cpu Intel Core Processor (Skylake,
  IBRS):

  how to reproduce issue:

  git clone https://github.com/ColinIanKing/stress-ng
  sudo apt-get update
  sudo apt-get build-dep stress-ng
  sudo apt-get install libeigen3-dev libmpfr-dev libkmod-dev libxxhash-dev 
libglvnd-dev libgbm-dev
  cd stress-ng
  make clean
  make -j 8
  sudo ./stress-ng --class scheduler --all 1 -v --vmstat 1 -t 30m

  ..wait for all the stressors to get invoked, system becomes
  unresponsive, can't ^C stress-ng, can't swap consoles on the VM,
  appears to be hard locked.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/2024599/+subscriptions




[Kernel-packages] [Bug 2024599] Re: linux-image-5.15.0-1032-realtime locks up under scheduler test load

2023-06-27 Thread Colin Ian King
And also occurs in Ubuntu Mantic with 6.3.0-7-generic

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2024599

Title:
  linux-image-5.15.0-1032-realtime locks up under scheduler test load

Status in apparmor package in Ubuntu:
  New
Status in linux package in Ubuntu:
  Incomplete
Status in apparmor source package in Jammy:
  New
Status in linux source package in Jammy:
  Incomplete
Status in apparmor source package in Kinetic:
  New
Status in linux source package in Kinetic:
  New
Status in apparmor source package in Lunar:
  New
Status in linux source package in Lunar:
  New
Status in apparmor source package in Mantic:
  New
Status in linux source package in Mantic:
  Incomplete

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/2024599/+subscriptions




[Kernel-packages] [Bug 2024599] Re: linux-image-5.15.0-1032-realtime locks up under scheduler test load

2023-06-27 Thread Colin Ian King
5.15.0-75 works fine with no problems; the 5.19.0-45 kernel crashes, so
the issue was introduced between 5.15 and 5.19.

** Also affects: apparmor (Ubuntu Lunar)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Lunar)
   Importance: Undecided
   Status: New

** Also affects: apparmor (Ubuntu Kinetic)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Kinetic)
   Importance: Undecided
   Status: New

** Also affects: apparmor (Ubuntu Mantic)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Mantic)
   Importance: Low
   Status: Incomplete

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2024599

Title:
  linux-image-5.15.0-1032-realtime locks up under scheduler test load

Status in apparmor package in Ubuntu:
  New
Status in linux package in Ubuntu:
  Incomplete
Status in apparmor source package in Jammy:
  New
Status in linux source package in Jammy:
  Incomplete
Status in apparmor source package in Kinetic:
  New
Status in linux source package in Kinetic:
  New
Status in apparmor source package in Lunar:
  New
Status in linux source package in Lunar:
  New
Status in apparmor source package in Mantic:
  New
Status in linux source package in Mantic:
  Incomplete

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/2024599/+subscriptions




[Kernel-packages] [Bug 2024599] Re: linux-image-5.15.0-1032-realtime locks up under scheduler test load

2023-06-27 Thread Colin Ian King
And with 5.19.0-45-generic:

sudo ./stress-ng --apparmor 1 --klog-check
[sudo] password for cking: 
stress-ng: info:  [1179] defaulting to a 86400 second (1 day, 0.00 secs) run 
per stressor
stress-ng: info:  [1179] dispatching hogs: 1 apparmor
stress-ng: info:  [1180] klog-check: kernel cmdline: 
'BOOT_IMAGE=/vmlinuz-5.19.0-45-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv 
ro'
stress-ng: error: [1180] klog-check: error: [93.527396] 'AppArmor DFA 
next/check upper bounds error'
stress-ng: error: [1180] klog-check: error: [93.827976] 'AppArmor DFA state 
with invalid match flags'
stress-ng: error: [1180] klog-check: error: [93.991395] 'AppArmor DFA 
next/check upper bounds error'
stress-ng: error: [1180] klog-check: error: [93.992189] 'AppArmor DFA 
next/check upper bounds error'
stress-ng: error: [1180] klog-check: error: [94.007400] 'AppArmor DFA state 
with invalid match flags'
stress-ng: error: [1180] klog-check: error: [94.059345] 'AppArmor DFA state 
with invalid match flags'
stress-ng: error: [1180] klog-check: error: [94.104414] 'AppArmor DFA 
next/check upper bounds error'
stress-ng: error: [1180] klog-check: alert: [94.128617] 'BUG: kernel NULL 
pointer dereference, address: 0130'
stress-ng: error: [1180] klog-check: alert: [94.128644] '#PF: supervisor read 
access in kernel mode'
stress-ng: error: [1180] klog-check: alert: [94.128659] '#PF: 
error_code(0x) - not-present page'
stress-ng: info:  [1180] klog-check: warning: [94.128685] 'Oops:  [#1] 
PREEMPT SMP PTI'
stress-ng: info:  [1180] klog-check: warning: [94.128698] 'CPU: 7 PID: 1185 
Comm: stress-ng-appar Not tainted 5.19.0-45-generic #46-Ubuntu'
stress-ng: info:  [1180] klog-check: warning: [94.128722] 'Hardware name: QEMU 
Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-5 04/01/2014'
stress-ng: info:  [1180] klog-check: warning: [94.128745] 'RIP: 
0010:aa_unpack+0x11f/0x530'
stress-ng: info:  [1180] klog-check: warning: [94.128762] 'Code: 00 48 85 c0 0f 
84 15 04 00 00 48 8d 75 a8 48 8d 7d b0 4c 8b 7d c0 e8 80 ec ff ff 48 89 c3 48 
3d 00 f0 ff ff 0f 87 00 02 00 00 <4c> 8b b0 30 01 00 00 4d 85 f6 0f 84 38 01 00 
00 49 8b 86 c8 00 00'
stress-ng: info:  [1180] klog-check: warning: [94.128807] 'RSP: 
0018:b1fdc0f57ce0 EFLAGS: 00010207'
stress-ng: info:  [1180] klog-check: warning: [94.129378] 'RAX: 
 RBX:  RCX: '
stress-ng: info:  [1180] klog-check: warning: [94.129928] 'RDX: 
 RSI:  RDI: '
stress-ng: info:  [1180] klog-check: warning: [94.130443] 'RBP: 
b1fdc0f57d40 R08:  R09: '
stress-ng: info:  [1180] klog-check: warning: [94.131056] 'R10: 
 R11:  R12: b1fdc0f57da8'
stress-ng: info:  [1180] klog-check: warning: [94.131572] 'R13: 
b1fdc0f57da0 R14: 9da384835962 R15: 9da384820010'
stress-ng: info:  [1180] klog-check: warning: [94.132090] 'FS:  
7fa65a059740() GS:9da3fbdc() knlGS:'
stress-ng: info:  [1180] klog-check: warning: [94.132652] 'CS:  0010 DS:  
ES:  CR0: 80050033'
stress-ng: info:  [1180] klog-check: warning: [94.133206] 'CR2: 
0130 CR3: 00010d432006 CR4: 00370ee0'
stress-ng: info:  [1180] klog-check: warning: [94.133739] 'DR0: 
 DR1:  DR2: '
stress-ng: info:  [1180] klog-check: warning: [94.134282] 'DR3: 
 DR6: fffe0ff0 DR7: 0400'
stress-ng: info:  [1180] klog-check: warning: [94.134868] 'Call Trace:'
stress-ng: info:  [1180] klog-check: warning: [94.135388] ' '
stress-ng: info:  [1180] klog-check: warning: [94.135933] ' 
aa_replace_profiles+0xa1/0x10b0'
stress-ng: info:  [1180] klog-check: warning: [94.136471] ' ? 
check_heap_object+0x29/0x1e0'
stress-ng: info:  [1180] klog-check: warning: [94.137018] ' ? 
__check_object_size.part.0+0x4c/0xf0'
stress-ng: info:  [1180] klog-check: warning: [94.137528] ' 
policy_update+0xd0/0x170'
stress-ng: info:  [1180] klog-check: warning: [94.138061] ' 
profile_replace+0xb9/0x150'
stress-ng: info:  [1180] klog-check: warning: [94.138612] ' 
vfs_write+0xb7/0x290'
stress-ng: info:  [1180] klog-check: warning: [94.139124] ' 
ksys_write+0x73/0x100'
stress-ng: info:  [1180] klog-check: warning: [94.139616] ' 
__x64_sys_write+0x19/0x30'
stress-ng: info:  [1180] klog-check: warning: [94.140104] ' 
do_syscall_64+0x58/0x90'
stress-ng: info:  [1180] klog-check: warning: [94.140651] ' ? 
syscall_exit_to_user_mode+0x29/0x50'
stress-ng: info:  [1180] klog-check: warning: [94.141130] ' ? 
do_syscall_64+0x67/0x90'
stress-ng: info:  [1180] klog-check: warning: [94.141630] ' ? 
do_syscall_64+0x67/0x90'
stress-ng: info:  [1180] klog-check: warning: [94.142117] ' ? 
do_syscall_64+0x67/0x90'
stress-ng: info:  [1180] klog-check: warning: [94.142627] ' 
entry_SYSCALL_64_after_hwframe+0x63/0xcd'
stress-ng: info:  [1180] klog-check: warning: 

[Kernel-packages] [Bug 2024599] Re: linux-image-5.15.0-1032-realtime locks up under scheduler test load

2023-06-27 Thread Colin Ian King
On 6.2.0-21-generic I also get:

sudo ./stress-ng --apparmor 1 --klog-check

stress-ng: error: [1083] klog-check: alert: [66.442338] 'BUG: kernel NULL 
pointer dereference, address: 0030'
stress-ng: error: [1083] klog-check: alert: [66.442538] '#PF: supervisor read 
access in kernel mode'
stress-ng: error: [1083] klog-check: alert: [66.442718] '#PF: 
error_code(0x) - not-present page'
stress-ng: info:  [1083] klog-check: warning: [66.443080] 'Oops:  [#1] 
PREEMPT SMP PTI'
stress-ng: info:  [1083] klog-check: warning: [66.443256] 'CPU: 3 PID: 1088 
Comm: stress-ng-appar Not tainted 6.2.0-21-generic #21-Ubuntu'
stress-ng: info:  [1083] klog-check: warning: [66.443438] 'Hardware name: QEMU 
Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-5 04/01/2014'
stress-ng: info:  [1083] klog-check: warning: [66.443628] 'RIP: 
0010:aafs_create.constprop.0+0x7f/0x130'
stress-ng: info:  [1083] klog-check: warning: [66.443819] 'Code: 4c 63 e0 48 83 
c4 18 4c 89 e0 5b 41 5c 41 5d 41 5e 41 5f 5d 31 d2 31 c9 31 f6 31 ff 45 31 c0 
45 31 c9 45 31 d2 c3 cc cc cc cc <4d> 8b 55 30 4d 8d ba a0 00 00 00 4c 89 55 c0 
4c 89 ff e8 8a 59 a1'
stress-ng: info:  [1083] klog-check: warning: [66.444227] 'RSP: 
0018:beb940907bd8 EFLAGS: 00010246'
stress-ng: info:  [1083] klog-check: warning: [66.33] 'RAX: 
 RBX: 41ed RCX: '
stress-ng: info:  [1083] klog-check: warning: [66.444646] 'RDX: 
 RSI:  RDI: '
stress-ng: info:  [1083] klog-check: warning: [66.444862] 'RBP: 
beb940907c18 R08:  R09: '
stress-ng: info:  [1083] klog-check: warning: [66.445074] 'R10: 
 R11:  R12: 93db8b18'
stress-ng: info:  [1083] klog-check: warning: [66.445291] 'R13: 
 R14:  R15: '
stress-ng: info:  [1083] klog-check: warning: [66.445503] 'FS:  
7f60f5c07740() GS:9578bbcc() knlGS:'
stress-ng: info:  [1083] klog-check: warning: [66.445721] 'CS:  0010 DS:  
ES:  CR0: 80050033'
stress-ng: info:  [1083] klog-check: warning: [66.445939] 'CR2: 
0030 CR3: 000124ffa004 CR4: 00370ee0'
stress-ng: info:  [1083] klog-check: warning: [66.446163] 'DR0: 
 DR1:  DR2: '
stress-ng: info:  [1083] klog-check: warning: [66.446387] 'DR3: 
 DR6: fffe0ff0 DR7: 0400'
stress-ng: info:  [1083] klog-check: warning: [66.446608] 'Call Trace:'
stress-ng: info:  [1083] klog-check: warning: [66.446829] ' '
stress-ng: info:  [1083] klog-check: warning: [66.447059] ' 
__aafs_profile_mkdir+0x3d6/0x480'
stress-ng: info:  [1083] klog-check: warning: [66.447290] ' 
aa_replace_profiles+0x862/0x1270'
stress-ng: info:  [1083] klog-check: warning: [66.447518] ' 
policy_update+0xe0/0x180'
stress-ng: info:  [1083] klog-check: warning: [66.447750] ' 
profile_replace+0xb9/0x150'
stress-ng: info:  [1083] klog-check: warning: [66.447981] ' 
vfs_write+0xc8/0x410'
stress-ng: info:  [1083] klog-check: warning: [66.448213] ' ? 
kmem_cache_free+0x1e/0x3b0'
stress-ng: info:  [1083] klog-check: warning: [66.448442] ' 
ksys_write+0x73/0x100'
stress-ng: info:  [1083] klog-check: warning: [66.448670] ' 
__x64_sys_write+0x19/0x30'
stress-ng: info:  [1083] klog-check: warning: [66.448892] ' 
do_syscall_64+0x58/0x90'
stress-ng: info:  [1083] klog-check: warning: [66.449115] ' ? 
do_syscall_64+0x67/0x90'
stress-ng: info:  [1083] klog-check: warning: [66.449337] ' ? 
do_syscall_64+0x67/0x90'
stress-ng: info:  [1083] klog-check: warning: [66.449551] ' ? 
exit_to_user_mode_loop+0xe0/0x130'
stress-ng: info:  [1083] klog-check: warning: [66.449775] ' ? 
exit_to_user_mode_prepare+0x30/0xb0'
stress-ng: info:  [1083] klog-check: warning: [66.449996] ' ? 
syscall_exit_to_user_mode+0x29/0x50'
stress-ng: info:  [1083] klog-check: warning: [66.450220] ' ? 
do_syscall_64+0x67/0x90'
stress-ng: info:  [1083] klog-check: warning: [66.450449] ' ? 
exit_to_user_mode_prepare+0x30/0xb0'
stress-ng: info:  [1083] klog-check: warning: [66.450681] ' ? 
syscall_exit_to_user_mode+0x29/0x50'
stress-ng: info:  [1083] klog-check: warning: [66.450915] ' ? 
do_syscall_64+0x67/0x90'
stress-ng: info:  [1083] klog-check: warning: [66.451151] ' ? 
do_syscall_64+0x67/0x90'
stress-ng: info:  [1083] klog-check: warning: [66.451384] ' 
entry_SYSCALL_64_after_hwframe+0x72/0xdc'
stress-ng: info:  [1083] klog-check: warning: [66.451614] 'RIP: 
0033:0x7f60f5b0b9e4'
stress-ng: info:  [1083] klog-check: warning: [66.451848] 'Code: 15 39 a4 0e 00 
f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 80 3d fd 2b 0f 
00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 
28 48 89 54 24 18 48'
stress-ng: info:  [1083] klog-check: warning: [66.452341] 'RSP: 
002b:7ffdaa28bfb8 EFLAGS: 0202 ORIG_RAX: 0001'
stress-ng: info: 

[Kernel-packages] [Bug 2024599] Re: linux-image-5.15.0-1032-realtime locks up under scheduler test load

2023-06-23 Thread Colin Ian King
I've managed to capture where it hangs; it looks like an RCU issue, see
the attached screenshot.

** Attachment added: "Screenshot from 2023-06-23 12-28-42.png"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2024599/+attachment/5681654/+files/Screenshot%20from%202023-06-23%2012-28-42.png

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2024599

Title:
  linux-image-5.15.0-1032-realtime locks up under scheduler test load

Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Jammy:
  Incomplete

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2024599/+subscriptions




[Kernel-packages] [Bug 2024599] Re: linux-image-5.15.0-1032-realtime locks up under scheduler test load

2023-06-22 Thread Colin Ian King
I'm working through the stressors to see which ones may be causing
issues. I noticed that the apparmor stressor eats up memory until the
system runs out of it. This stressor loads invalid apparmor profiles and
then removes them, so perhaps there is a memory leak when loading
profiles that fail the verification phase.

To show this issue, run the following; one can see memory running low
over time until the user gets kicked off due to low memory:

sudo ./stress-ng --apparmor 1 --vmstat 5
stress-ng: info:  [1339] defaulting to a 86400 second (1 day, 0.00 secs) run 
per stressor
stress-ng: info:  [1339] dispatching hogs: 1 apparmor
stress-ng: info:  [1340] vmstat:   r   b  swpd  free  buff 
cache   si   so bi bo   in   cs us sy id wa st
stress-ng: info:  [1340] vmstat:   2   1 0313824 32776
36435200 16 18 4858 9752  4 25 70  0  0
stress-ng: info:  [1340] vmstat:   5   0 0257848 32776
36652800  0   1091 4573 8435  4 23 72  0  0
stress-ng: info:  [1340] vmstat:   5   0 0198916 32784
36828800  0 20 4642 8681  4 23 71  1  0
stress-ng: info:  [1340] vmstat:   2   0 0139496 32792
37060000  0 16 4612 8500  4 23 71  1  0
stress-ng: info:  [1340] vmstat:   2   0 0 85032 32740
36391600  0   1751 4774 8710  4 23 71  1  0
stress-ng: info:  [1340] vmstat:   5   0 0 92224 32748
31054800  0   2020 5919 10123  4 24 70  1  0
stress-ng: info:  [1340] vmstat:   2   0 0 93380 30068
26848400  0 14 5590 10275  4 26 69  1  0
stress-ng: info:  [1340] vmstat:   2   0 0102152 23648
20787200  0   3346 5277 9303  4 24 70  1  0
stress-ng: info:  [1340] vmstat:   5   0 0 99184 18488
16908400 48   2180 5614 9901  4 25 71  0  0
stress-ng: info:  [1340] vmstat:   2   0 0 88068  7080
14039200359   2090 6146 11013  4 27 68  0  0
stress-ng: info:  [1340] vmstat:   2   0 0 92368   564 
8210800   3568   2534 5899 10308  4 26 67  1  0
stress-ng: info:  [1340] vmstat:   7   0 0 83784   100 
4735600  99834   4212 8540 14574  4 28 65  2  0
stress-ng: info:  [1340] vmstat:   2   0 0 76784   188 
4491600 363427   7621 16647 28448  4 37 45 12  0

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2024599

Title:
  linux-image-5.15.0-1032-realtime locks up under scheduler test load

Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Jammy:
  Incomplete

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2024599/+subscriptions




[Kernel-packages] [Bug 2024599] [NEW] linux-image-5.15.0-1032-realtime locks up under scheduler test load

2023-06-21 Thread Colin Ian King
Public bug reported:

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:Ubuntu 22.04.2 LTS
Release:22.04
Codename:   jammy

uname -a
Linux jammie-amd64-efi 5.15.0-1032-realtime #35-Ubuntu SMP PREEMPT_RT Tue Jan 
24 11:45:03 UTC 2023 x86_64
x86_64 x86_64 GNU/Linux

free
   totalusedfree  shared  buff/cache   available
Mem: 4013888  200984 34390121204  373892 3744628
Swap:4014076   0 4014076

Running in a kvm-qemu, 8 cpus, cpu Intel Core Processor (Skylake, IBRS):

how to reproduce issue:

git clone https://github.com/ColinIanKing/stress-ng
sudo apt-get update
sudo apt-get build-dep stress-ng
sudo apt-get install libeigen3-dev libmpfr-dev libkmod-dev libxxhash-dev 
libglvnd-dev libgbm-dev
cd stress-ng
make clean
make -j 8
sudo ./stress-ng --class scheduler --all 1 -v --vmstat 1 -t 30m

..wait for all the stressors to get invoked, system becomes
unresponsive, can't ^C stress-ng, can't swap consoles on the VM, appears
to be hard locked.

** Affects: linux (Ubuntu)
 Importance: Low
 Status: New

** Affects: linux (Ubuntu Jammy)
 Importance: Undecided
 Status: New

** Changed in: linux (Ubuntu)
   Importance: Undecided => Low

** Also affects: linux (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Summary changed:

- linux-image-5.15.0-1032-realtime locksup under scheduler test load
+ linux-image-5.15.0-1032-realtime locks up under scheduler test load

** Description changed:

  lsb_release -a
  No LSB modules are available.
  Distributor ID:   Ubuntu
  Description:  Ubuntu 22.04.2 LTS
  Release:  22.04
  Codename: jammy
  
  uname -a
- Linux jammie-amd64-efi 5.15.0-1032-realtime #35-Ubuntu SMP PREEMPT_RT Tue Jan 
24 11:45:03 UTC 2023 x86_64 
+ Linux jammie-amd64-efi 5.15.0-1032-realtime #35-Ubuntu SMP PREEMPT_RT Tue Jan 
24 11:45:03 UTC 2023 x86_64
  x86_64 x86_64 GNU/Linux
  
  free
-totalusedfree  shared  buff/cache   
available
+    totalusedfree  shared  buff/cache   
available
  Mem: 4013888  200984 34390121204  373892 
3744628
  Swap:4014076   0 4014076
  
  Running in a kvm-qemu, 8 cpus, cpu Intel Core Processor (Skylake, IBRS):
  
  how to reproduce issue:
  
  git clone https://github.com/ColinIanKing/stress-ng
  sudo apt-get update
  sudo apt-get build-dep stress-ng
  sudo apt-get install libeigen3-dev libmpfr-dev libkmod-dev libxxhash-dev 
libglvnd-dev libgbm-dev
  cd stress-ng
  make clean
  make -j 8
  sudo ./stress-ng --class scheduler --all 1 -v --vmstat 1 -t 30m
  
  ..wait for all the stressors to get invoked, system becomes
  unresponsive, can't ^C stress-ng, can't swap consoles on the VM, appears
  to be hard locked.
- 
- cd stress-ng

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2024599

Title:
  linux-image-5.15.0-1032-realtime locks up under scheduler test load

Status in linux package in Ubuntu:
  New
Status in linux source package in Jammy:
  New

Bug description:
  lsb_release -a
  No LSB modules are available.
  Distributor ID:   Ubuntu
  Description:  Ubuntu 22.04.2 LTS
  Release:  22.04
  Codename: jammy

  uname -a
  Linux jammie-amd64-efi 5.15.0-1032-realtime #35-Ubuntu SMP PREEMPT_RT Tue Jan 24 11:45:03 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

  free
                 total        used        free      shared  buff/cache   available
  Mem:         4013888      200984     3439012        1204      373892     3744628
  Swap:        4014076           0     4014076

  Running in a kvm-qemu, 8 cpus, cpu Intel Core Processor (Skylake,
  IBRS):

  How to reproduce the issue:

  git clone https://github.com/ColinIanKing/stress-ng
  sudo apt-get update
  sudo apt-get build-dep stress-ng
  sudo apt-get install libeigen3-dev libmpfr-dev libkmod-dev libxxhash-dev 
libglvnd-dev libgbm-dev
  cd stress-ng
  make clean
  make -j 8
  sudo ./stress-ng --class scheduler --all 1 -v --vmstat 1 -t 30m

  ..wait for all the stressors to get invoked, system becomes
  unresponsive, can't ^C stress-ng, can't swap consoles on the VM,
  appears to be hard locked.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2024599/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2018687] Re: rm -r dir on USB disk locks up hdparm on different disk

2023-05-16 Thread Ian! D. Allen
Unfortunately, an unacceptable side-effect of "sysctl -w
vm.dirty_ratio=0" is that disk operations that move a lot of data are
taking much too long. An rsync that normally takes less than an hour of
real time was still running over 12 hours later (and hasn't finished
yet).  I'm reverting the vm.dirty_ratio back up to 20 to see if that
clears out all the unfinished disk I/O.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-meta-hwe-5.15 in Ubuntu.
https://bugs.launchpad.net/bugs/2018687

Title:
  rm -r dir on USB disk locks up hdparm on different disk

Status in linux-meta-hwe-5.15 package in Ubuntu:
  New

Bug description:
  Description:Ubuntu 20.04.6 LTS
  Linux 5.15.0-72-generic #79~20.04.1-Ubuntu SMP Thu Apr 20 22:12:07 UTC 2023 
x86_64 GNU/Linux
  Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz

  Running a "rm -r dir" on a directory with millions of files that resides
  on a disk in an external USB-3 hard drive dock locks up an unrelated
  hdparm process running on an internal disk such that the kernel says:

  May  7 04:24:02 kernel: [163606.041862] INFO: task hdparm:1391162 blocked 
for more than 120 seconds.
  [...]
  May  7 04:26:03 kernel: [163726.876357] INFO: task hdparm:1391162 blocked 
for more than 241 seconds.
  [...]
  May  7 04:28:04 kernel: [163847.702980] INFO: task hdparm:1391162 blocked 
for more than 362 seconds.

  First a normal run of "hdparm -t /dev/sda" with the offending "rm -r" 
SIGSTOPped so that
  it doesn't affect anything:

  # \time hdparm -t /dev/sda
  /dev/sda:
   Timing buffered disk reads: 1128 MB in  3.00 seconds = 375.50 MB/sec
  0.01user 0.67system 0:06.21elapsed 11%CPU (0avgtext+0avgdata 
4584maxresident)k
  2312704inputs+8outputs (0major+664minor)pagefaults 0swaps

  Elapsed time is about six seconds, as expected.  /dev/sda is an
  internal SSD drive.

  I now run this loop to show the timings and process states below:

  # while sleep 1 ; do  date ; ps laxww | grep '[ ]D' | grep -v
  refrig ; done

  (I have some processes stopped in a freezer cgroup ("refrig") that I
  don't want to see in the grep output.)  I SIGCONT the offending "rm -r"
  running on the drive in the USB3 drive dock and you see the rm appear
  in uninterruptible sleep along with a couple of kernel processes:

  Sun May  7 05:01:07 EDT 2023
  Sun May  7 05:01:08 EDT 2023
  Sun May  7 05:01:09 EDT 2023
  Sun May  7 05:01:10 EDT 2023
  Sun May  7 05:01:11 EDT 2023
  1 0 447   2  20   0  0 0 usb_sg D?  3:17 
[usb-storage]
  4 0 1423283   11939  20   0  10648   580 wait_o D+   pts/28 0:00 
rm -rf 15tb3
  Sun May  7 05:01:12 EDT 2023
  1 0 447   2  20   0  0 0 usb_sg D?  3:17 
[usb-storage]
  1 04016   1  20   0 161136  1900 usbhid Dsl  ?  1:39 
/sbin/apcupsd
  4 0 1423283   11939  20   0  10648   580 wait_o D+   pts/28 0:00 
rm -rf 15tb3
  Sun May  7 05:01:13 EDT 2023

  The above lines showing those processes in uninterruptible sleep repeat
  over and over each second as the "rm -r" continues.  I then start up
  "hdparm -t /dev/sda" on the internal SSD disk, and it also appears in
  uninterruptible sleep and doesn't finish even after minutes of waiting:

  Sun May  7 05:01:25 EDT 2023
  1 0 368   2  20   0  0 0 md_sup D?  2:57 
[md0_raid5]
  1 0 1366783   2  20   0  0 0 blk_mq D?  0:02 
[kworker/u16:2+flush-8:144]
  4 0 1423283   11939  20   0  11260  2544 wait_o D+   pts/28 0:00 
rm -rf 15tb3
  4 0 14235019975  20   0   4680  4584 wb_wai DL+  pts/4  0:00 
hdparm -t /dev/sda
  Sun May  7 05:01:26 EDT 2023
  1 0 447   2  20   0  0 0 usb_sg D?  3:17 
[usb-storage]
  1 0 1366783   2  20   0  0 0 blk_mq D?  0:02 
[kworker/u16:2+flush-8:144]
  4 0 1423283   11939  20   0  11260  2544 wait_o D+   pts/28 0:00 
rm -rf 15tb3
  4 0 14235019975  20   0   4680  4584 wb_wai DL+  pts/4  0:00 
hdparm -t /dev/sda
  Sun May  7 05:01:27 EDT 2023
  [...]
  4 0 14235019975  20   0   4680  4584 wb_wai DL+  pts/4  0:00 
hdparm -t /dev/sda
  Sun May  7 05:01:35 EDT 2023
  1 0 447   2  20   0  0 0 usb_sg D?  3:17 
[usb-storage]
  4 0 1423283   11939  20   0  11260  2544 blk_mq D+   pts/28 0:00 
rm -rf 15tb3
  4 0 14235019975  20   0   4680  4584 wb_wai DL+  pts/4  0:00 
hdparm -t /dev/sda
  Sun May  7 05:01:36 EDT 2023
  1 0 447   2  20   0  0 0 usb_sg D?  3:17 
[usb-storage]
  1 04985   2  20   0  0 0 rq_qos D?  0:24 
[jbd2/sdj1-8]
  1 0 1366783   2  20   0  0 0 blk_mq D?  0:02 

[Kernel-packages] [Bug 2018687] Re: rm -r dir on USB disk locks up hdparm on different disk

2023-05-15 Thread Ian! D. Allen
If I do this:

# sysctl -w vm.dirty_ratio=0

the hdparm no longer hangs.  It has to be zero; anything non-zero, even
1, causes large delays in disk-related commands such as hdparm, sync,
smartctl, etc.

I got this idea from here:

https://serverfault.com/questions/405210/can-high-load-cause-server-hang-and-error-blocked-for-more-than-120-seconds
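One generic way to observe the writeback backlog that vm.dirty_ratio governs, while a workload like the "rm -r" runs, is to watch the Dirty and Writeback counters in /proc/meminfo. This is an editorial diagnostic sketch, not part of the original report:

```shell
# Show how much page-cache data is dirty and how much is actively
# under writeback. A large Dirty value that drains slowly while
# Writeback stays pinned is consistent with the blocked-task
# messages quoted above.
grep -E '^(Dirty|Writeback):' /proc/meminfo
```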

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-meta-hwe-5.15 in Ubuntu.
https://bugs.launchpad.net/bugs/2018687


[Kernel-packages] [Bug 2018687] Re: rm -r dir on USB disk locks up hdparm on different disk

2023-05-15 Thread Ian! D. Allen
I now have a Perl script running that is removing duplicate files by
doing thousands of hard links on a different external USB3 disk and it
is locking up or timing out many disk-related things on all my other
disks.  Both this USB3 external drive and the one above are plugged
directly into the motherboard (HP Z440 Workstation).

F   UID PIDPPID PRI  NIVSZ   RSS WCHAN  STAT TTYTIME COMMAND
4 0 2210112 2210111  20   0  53680 42588 blk_mq D+   pts/24 3:00 
/usr/bin/perl [...]

May 14 13:37:49 kernel: [259424.745462] INFO: task smartd:2719 blocked for more 
than 120 seconds.
May 15 00:08:09 kernel: [297244.761855] INFO: task smartd:2719 blocked for more 
than 120 seconds.
May 15 00:10:10 kernel: [297365.592485] INFO: task smartd:2719 blocked for more 
than 241 seconds.
May 15 01:08:34 kernel: [300869.682961] INFO: task smartd:2719 blocked for more 
than 120 seconds.
May 15 01:28:43 kernel: [302077.989582] INFO: task hdparm:2052842 blocked for 
more than 120 seconds.
May 15 01:30:43 kernel: [302198.820278] INFO: task hdparm:2052842 blocked for 
more than 241 seconds.
May 15 01:32:44 kernel: [302319.654907] INFO: task hdparm:2052842 blocked for 
more than 362 seconds.
May 15 01:34:45 kernel: [302440.481601] INFO: task hdparm:2052842 blocked for 
more than 483 seconds.
May 15 01:36:46 kernel: [302561.316237] INFO: task hdparm:2052842 blocked for 
more than 604 seconds.
May 15 02:06:58 kernel: [304373.770194] INFO: task smartd:2719 blocked for more 
than 120 seconds.

From one of the logged events:

May 15 02:06:58 kernel: [304373.770194] INFO: task smartd:2719 blocked for more 
than 120 seconds.
May 15 02:06:58 kernel: [304373.770209]   Tainted: G   O  
5.15.0-72-generic #79~20.04.1-Ubuntu
May 15 02:06:58 kernel: [304373.770215] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 15 02:06:58 kernel: [304373.770218] task:smartd  state:D stack:
0 pid: 2719 ppid: 1 flags:0x
May 15 02:06:58 kernel: [304373.770226] Call Trace: 
May 15 02:06:58 kernel: [304373.770230]  
May 15 02:06:58 kernel: [304373.770236]  __schedule+0x2cd/0x890
May 15 02:06:58 kernel: [304373.770251]  schedule+0x69/0x110
May 15 02:06:58 kernel: [304373.770260]  schedule_preempt_disabled+0xe/0x20
May 15 02:06:58 kernel: [304373.770269]  __mutex_lock.isra.0+0x20c/0x470
May 15 02:06:58 kernel: [304373.770276]  ? iput.part.0+0x61/0x1e0
May 15 02:06:58 kernel: [304373.770287]  __mutex_lock_slowpath+0x13/0x20
May 15 02:06:58 kernel: [304373.770294]  mutex_lock+0x36/0x40
May 15 02:06:58 kernel: [304373.770299]  blkdev_get_by_dev+0x11d/0x2d0
May 15 02:06:58 kernel: [304373.770309]  ? blkdev_close+0x30/0x30
May 15 02:06:58 kernel: [304373.770318]  blkdev_open+0x50/0x90
May 15 02:06:58 kernel: [304373.770325]  do_dentry_open+0x169/0x3e0
May 15 02:06:58 kernel: [304373.770336]  vfs_open+0x2d/0x40
May 15 02:06:58 kernel: [304373.770342]  do_open.isra.0+0x20d/0x480
May 15 02:06:58 kernel: [304373.770351]  path_openat+0x18e/0xe50
May 15 02:06:58 kernel: [304373.770361]  ? put_device+0x13/0x20
May 15 02:06:58 kernel: [304373.770371]  ? scsi_device_put+0x31/0x40
May 15 02:06:58 kernel: [304373.770380]  ? sd_release+0x3b/0xb0
May 15 02:06:58 kernel: [304373.770388]  do_filp_open+0xb2/0x120
May 15 02:06:58 kernel: [304373.770398]  ? __check_object_size+0x14f/0x160
May 15 02:06:58 kernel: [304373.770408]  do_sys_openat2+0x249/0x330
May 15 02:06:58 kernel: [304373.770418]  do_sys_open+0x46/0x80
May 15 02:06:58 kernel: [304373.770424]  __x64_sys_openat+0x20/0x30
May 15 02:06:58 kernel: [304373.770430]  do_syscall_64+0x5c/0xc0
May 15 02:06:58 kernel: [304373.770440]  ? do_syscall_64+0x69/0xc0
May 15 02:06:58 kernel: [304373.770448]  
entry_SYSCALL_64_after_hwframe+0x61/0xcb
May 15 02:06:58 kernel: [304373.770458] RIP: 0033:0x7f9b0d188d3b
May 15 02:06:58 kernel: [304373.770465] RSP: 002b:7ffd72a3caf0 EFLAGS: 
0246 ORIG_RAX: 0101
May 15 02:06:58 kernel: [304373.770473] RAX: ffda RBX: 
55f1346783c0 RCX: 7f9b0d188d3b
May 15 02:06:58 kernel: [304373.770479] RDX: 0800 RSI: 
55f1346783f8 RDI: ff9c
May 15 02:06:58 kernel: [304373.770484] RBP: 55f1346783f8 R08: 
0001 R09: 
May 15 02:06:58 kernel: [304373.770488] R10:  R11: 
0246 R12: 0800
May 15 02:06:58 kernel: [304373.770493] R13:  R14: 
55f1334c26a4 R15: 7f9b0cd17250
May 15 02:06:58 kernel: [304373.770500]  

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-meta-hwe-5.15 in Ubuntu.
https://bugs.launchpad.net/bugs/2018687


[Kernel-packages] [Bug 2019240] Re: Pull-request to address a number of enablement issues for Orin platforms

2023-05-11 Thread Ian May
Changing Package to linux-nvidia-tegra

** Package changed: linux-nvidia (Ubuntu) => linux-nvidia-tegra (Ubuntu)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-tegra in Ubuntu.
https://bugs.launchpad.net/bugs/2019240

Title:
  Pull-request to address a number of enablement issues for Orin
  platforms

Status in linux-nvidia-tegra package in Ubuntu:
  New

Bug description:
  [impact]
  This patch set addresses a wide variety of bugs and missing features for 
NVIDIA Orin platforms.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-tegra/+bug/2019240/+subscriptions




[Kernel-packages] [Bug 2018687] Re: rm -r dir on USB disk locks up hdparm on different disk

2023-05-08 Thread Ian! D. Allen
** Description changed:

  Description:Ubuntu 20.04.6 LTS
  Linux 5.15.0-72-generic #79~20.04.1-Ubuntu SMP Thu Apr 20 22:12:07 UTC 2023 
x86_64 GNU/Linux
  Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz
  
  Running a "rm -r dir" on a directory with millions of files that resides
  on a disk in an external USB-3 hard drive dock locks up an unrelated
  hdparm processes running on an internal disk such that the kernel says:
  
- May  7 04:24:02 kernel: [163606.041862] INFO: task hdparm:1391162 blocked 
for more than 120 seconds.
+ May  7 04:24:02 kernel: [163606.041862] INFO: task hdparm:1391162 blocked 
for more than 120 seconds.
+ [...]
+ May  7 04:26:03 kernel: [163726.876357] INFO: task hdparm:1391162 blocked 
for more than 241 seconds.
+ [...]
+ May  7 04:28:04 kernel: [163847.702980] INFO: task hdparm:1391162 blocked 
for more than 362 seconds.
  [...]
- May  7 04:26:03 kernel: [163726.876357] INFO: task hdparm:1391162 blocked 
for more than 241 seconds.
- [...]
- May  7 04:28:04 kernel: [163847.702980] INFO: task hdparm:1391162 blocked 
for more than 362 seconds.
+ May  7 04:30:05 kernel: [163968.537842] INFO: task hdparm:1391162 blocked 
for more than 483 seconds.
  
  First a normal run of "hdparm -t /dev/sda" with the offending "rm -r" 
SIGSTOPped so that
  it doesn't affect anything:
  
- # \time hdparm -t /dev/sda
- /dev/sda:
-  Timing buffered disk reads: 1128 MB in  3.00 seconds = 375.50 MB/sec
- 0.01user 0.67system 0:06.21elapsed 11%CPU (0avgtext+0avgdata 
4584maxresident)k
- 2312704inputs+8outputs (0major+664minor)pagefaults 0swaps
+ # \time hdparm -t /dev/sda
+ /dev/sda:
+  Timing buffered disk reads: 1128 MB in  3.00 seconds = 375.50 MB/sec
+ 0.01user 0.67system 0:06.21elapsed 11%CPU (0avgtext+0avgdata 
4584maxresident)k
+ 2312704inputs+8outputs (0major+664minor)pagefaults 0swaps
  
  Elapsed time is about six seconds, as expected.  /dev/sda is an internal
  SSD drive.
  
  I now run this loop to show the timings and process states below:
  
- # while sleep 1 ; do  date ; ps laxww | grep '[ ]D' | grep -v refrig
+ # while sleep 1 ; do  date ; ps laxww | grep '[ ]D' | grep -v refrig
  ; done
  
  (I have some processes stopped in a freezer cgroup ("refrig") that I
  don't want to see in the grep output.)  I SIGCONT the offending "rm -r"
  running on the drive in the USB3 drive dock and you see the rm appear
  in uninterruptible sleep along with a couple of kernel processes:
  
- Sun May  7 05:01:07 EDT 2023
- Sun May  7 05:01:08 EDT 2023
- Sun May  7 05:01:09 EDT 2023
- Sun May  7 05:01:10 EDT 2023
- Sun May  7 05:01:11 EDT 2023
- 1 0 447   2  20   0  0 0 usb_sg D?  3:17 
[usb-storage]
- 4 0 1423283   11939  20   0  10648   580 wait_o D+   pts/28 0:00 
rm -rf 15tb3
- Sun May  7 05:01:12 EDT 2023
- 1 0 447   2  20   0  0 0 usb_sg D?  3:17 
[usb-storage]
- 1 04016   1  20   0 161136  1900 usbhid Dsl  ?  1:39 
/sbin/apcupsd
- 4 0 1423283   11939  20   0  10648   580 wait_o D+   pts/28 0:00 
rm -rf 15tb3
- Sun May  7 05:01:13 EDT 2023
+ Sun May  7 05:01:07 EDT 2023
+ Sun May  7 05:01:08 EDT 2023
+ Sun May  7 05:01:09 EDT 2023
+ Sun May  7 05:01:10 EDT 2023
+ Sun May  7 05:01:11 EDT 2023
+ 1 0 447   2  20   0  0 0 usb_sg D?  3:17 
[usb-storage]
+ 4 0 1423283   11939  20   0  10648   580 wait_o D+   pts/28 0:00 
rm -rf 15tb3
+ Sun May  7 05:01:12 EDT 2023
+ 1 0 447   2  20   0  0 0 usb_sg D?  3:17 
[usb-storage]
+ 1 04016   1  20   0 161136  1900 usbhid Dsl  ?  1:39 
/sbin/apcupsd
+ 4 0 1423283   11939  20   0  10648   580 wait_o D+   pts/28 0:00 
rm -rf 15tb3
+ Sun May  7 05:01:13 EDT 2023
  
  The above lines showing those processes in uninterruptible sleep repeat
  over and over each second as the "rm -r" continues.  I then start up
  "hdparm -t /dev/sda" on the internal SSD disk, and it also appears in
  uninterruptible sleep and doesn't finish even after minutes of waiting:
  
- Sun May  7 05:01:25 EDT 2023
- 1 0 368   2  20   0  0 0 md_sup D?  2:57 
[md0_raid5]
- 1 0 1366783   2  20   0  0 0 blk_mq D?  0:02 
[kworker/u16:2+flush-8:144]
- 4 0 1423283   11939  20   0  11260  2544 wait_o D+   pts/28 0:00 
rm -rf 15tb3
- 4 0 14235019975  20   0   4680  4584 wb_wai DL+  pts/4  0:00 
hdparm -t /dev/sda
- Sun May  7 05:01:26 EDT 2023
- 1 0 447   2  20   0  0 0 usb_sg D?  3:17 
[usb-storage]
- 1 0 1366783   2  20   0  0 0 blk_mq D?  0:02 
[kworker/u16:2+flush-8:144]
- 4 0 1423283   11939  20   0  11260  2544 wait_o D+   pts/28 0:00 
rm -rf 

[Kernel-packages] [Bug 2018687] Re: rm -r dir on USB disk locks up hdparm on different disk

2023-05-08 Thread Ian! D. Allen
** Description changed:

  Description:Ubuntu 20.04.6 LTS
  Linux 5.15.0-72-generic #79~20.04.1-Ubuntu SMP Thu Apr 20 22:12:07 UTC 2023 
x86_64 GNU/Linux
  Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz
  
  Running a "rm -r dir" on a directory with millions of files that resides
  on a disk in an external USB-3 hard drive dock locks up an unrelated
  hdparm processes running on an internal disk such that the kernel says:
  
  May  7 04:24:02 kernel: [163606.041862] INFO: task hdparm:1391162 blocked 
for more than 120 seconds.
  [...]
  May  7 04:26:03 kernel: [163726.876357] INFO: task hdparm:1391162 blocked 
for more than 241 seconds.
  [...]
  May  7 04:28:04 kernel: [163847.702980] INFO: task hdparm:1391162 blocked 
for more than 362 seconds.
- [...]
- May  7 04:30:05 kernel: [163968.537842] INFO: task hdparm:1391162 blocked 
for more than 483 seconds.
  
  First a normal run of "hdparm -t /dev/sda" with the offending "rm -r" 
SIGSTOPped so that
  it doesn't affect anything:
  
  # \time hdparm -t /dev/sda
  /dev/sda:
   Timing buffered disk reads: 1128 MB in  3.00 seconds = 375.50 MB/sec
  0.01user 0.67system 0:06.21elapsed 11%CPU (0avgtext+0avgdata 
4584maxresident)k
  2312704inputs+8outputs (0major+664minor)pagefaults 0swaps
  
  Elapsed time is about six seconds, as expected.  /dev/sda is an internal
  SSD drive.
  
  I now run this loop to show the timings and process states below:
  
  # while sleep 1 ; do  date ; ps laxww | grep '[ ]D' | grep -v refrig
  ; done
  
  (I have some processes stopped in a freezer cgroup ("refrig") that I
  don't want to see in the grep output.)  I SIGCONT the offending "rm -r"
  running on the drive in the USB3 drive dock and you see the rm appear
  in uninterruptible sleep along with a couple of kernel processes:
  
  Sun May  7 05:01:07 EDT 2023
  Sun May  7 05:01:08 EDT 2023
  Sun May  7 05:01:09 EDT 2023
  Sun May  7 05:01:10 EDT 2023
  Sun May  7 05:01:11 EDT 2023
  1 0 447   2  20   0  0 0 usb_sg D?  3:17 
[usb-storage]
  4 0 1423283   11939  20   0  10648   580 wait_o D+   pts/28 0:00 
rm -rf 15tb3
  Sun May  7 05:01:12 EDT 2023
  1 0 447   2  20   0  0 0 usb_sg D?  3:17 
[usb-storage]
  1 04016   1  20   0 161136  1900 usbhid Dsl  ?  1:39 
/sbin/apcupsd
  4 0 1423283   11939  20   0  10648   580 wait_o D+   pts/28 0:00 
rm -rf 15tb3
  Sun May  7 05:01:13 EDT 2023
  
  The above lines showing those processes in uninterruptible sleep repeat
  over and over each second as the "rm -r" continues.  I then start up
  "hdparm -t /dev/sda" on the internal SSD disk, and it also appears in
  uninterruptible sleep and doesn't finish even after minutes of waiting:
  
  Sun May  7 05:01:25 EDT 2023
  1 0 368   2  20   0  0 0 md_sup D?  2:57 
[md0_raid5]
  1 0 1366783   2  20   0  0 0 blk_mq D?  0:02 
[kworker/u16:2+flush-8:144]
  4 0 1423283   11939  20   0  11260  2544 wait_o D+   pts/28 0:00 
rm -rf 15tb3
  4 0 14235019975  20   0   4680  4584 wb_wai DL+  pts/4  0:00 
hdparm -t /dev/sda
  Sun May  7 05:01:26 EDT 2023
  1 0 447   2  20   0  0 0 usb_sg D?  3:17 
[usb-storage]
  1 0 1366783   2  20   0  0 0 blk_mq D?  0:02 
[kworker/u16:2+flush-8:144]
  4 0 1423283   11939  20   0  11260  2544 wait_o D+   pts/28 0:00 
rm -rf 15tb3
  4 0 14235019975  20   0   4680  4584 wb_wai DL+  pts/4  0:00 
hdparm -t /dev/sda
  Sun May  7 05:01:27 EDT 2023
  [...]
  4 0 14235019975  20   0   4680  4584 wb_wai DL+  pts/4  0:00 
hdparm -t /dev/sda
  Sun May  7 05:01:35 EDT 2023
  1 0 447   2  20   0  0 0 usb_sg D?  3:17 
[usb-storage]
  4 0 1423283   11939  20   0  11260  2544 blk_mq D+   pts/28 0:00 
rm -rf 15tb3
  4 0 14235019975  20   0   4680  4584 wb_wai DL+  pts/4  0:00 
hdparm -t /dev/sda
  Sun May  7 05:01:36 EDT 2023
  1 0 447   2  20   0  0 0 usb_sg D?  3:17 
[usb-storage]
  1 04985   2  20   0  0 0 rq_qos D?  0:24 
[jbd2/sdj1-8]
  1 0 1366783   2  20   0  0 0 blk_mq D?  0:02 
[kworker/u16:2+flush-8:144]
  4 0 1423283   11939  20   0  11260  2544 wait_o D+   pts/28 0:00 
rm -rf 15tb3
  4 0 14235019975  20   0   4680  4584 wb_wai DL+  pts/4  0:00 
hdparm -t /dev/sda
  Sun May  7 05:01:37 EDT 2023
  
  I keep waiting.  The above lines repeat over and over and the hdparm is
  blocked and doesn't finish.
  
  Sun May  7 05:03:32 EDT 2023
  1 0 447   2  20   0  0 0 usb_sg D?  3:18 
[usb-storage]
  1 0 1366783   2  20 

[Kernel-packages] [Bug 2018687] [NEW] rm -r dir on USB disk locks up hdparm on different disk

2023-05-07 Thread Ian! D. Allen
Public bug reported:

Description:Ubuntu 20.04.6 LTS
Linux 5.15.0-72-generic #79~20.04.1-Ubuntu SMP Thu Apr 20 22:12:07 UTC 2023 
x86_64 GNU/Linux
Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz

Running a "rm -r dir" on a directory with millions of files that resides
on a disk in an external USB-3 hard drive dock locks up an unrelated
hdparm process running on an internal disk such that the kernel says:

May  7 04:24:02 kernel: [163606.041862] INFO: task hdparm:1391162 blocked 
for more than 120 seconds.
[...]
May  7 04:26:03 kernel: [163726.876357] INFO: task hdparm:1391162 blocked 
for more than 241 seconds.
[...]
May  7 04:28:04 kernel: [163847.702980] INFO: task hdparm:1391162 blocked 
for more than 362 seconds.

First a normal run of "hdparm -t /dev/sda" with the offending "rm -r" 
SIGSTOPped so that
it doesn't affect anything:

# \time hdparm -t /dev/sda
/dev/sda:
 Timing buffered disk reads: 1128 MB in  3.00 seconds = 375.50 MB/sec
0.01user 0.67system 0:06.21elapsed 11%CPU (0avgtext+0avgdata 
4584maxresident)k
2312704inputs+8outputs (0major+664minor)pagefaults 0swaps

Elapsed time is about six seconds, as expected.  /dev/sda is an internal
SSD drive.

I now run this loop to show the timings and process states below:

# while sleep 1 ; do  date ; ps laxww | grep '[ ]D' | grep -v refrig
; done

(I have some processes stopped in a freezer cgroup ("refrig") that I
don't want to see in the grep output.)  I SIGCONT the offending "rm -r"
running on the drive in the USB3 drive dock and you see the rm appear
in uninterruptible sleep along with a couple of kernel processes:

Sun May  7 05:01:07 EDT 2023
Sun May  7 05:01:08 EDT 2023
Sun May  7 05:01:09 EDT 2023
Sun May  7 05:01:10 EDT 2023
Sun May  7 05:01:11 EDT 2023
1 0 447   2  20   0  0 0 usb_sg D?  3:17 
[usb-storage]
4 0 1423283   11939  20   0  10648   580 wait_o D+   pts/28 0:00 rm 
-rf 15tb3
Sun May  7 05:01:12 EDT 2023
1 0 447   2  20   0  0 0 usb_sg D?  3:17 
[usb-storage]
1 04016   1  20   0 161136  1900 usbhid Dsl  ?  1:39 
/sbin/apcupsd
4 0 1423283   11939  20   0  10648   580 wait_o D+   pts/28 0:00 rm 
-rf 15tb3
Sun May  7 05:01:13 EDT 2023

The above lines showing those processes in uninterruptible sleep repeat
over and over each second as the "rm -r" continues.  I then start up
"hdparm -t /dev/sda" on the internal SSD disk, and it also appears in
uninterruptible sleep and doesn't finish even after minutes of waiting:

Sun May  7 05:01:25 EDT 2023
1 0 368   2  20   0  0 0 md_sup D?  2:57 
[md0_raid5]
1 0 1366783   2  20   0  0 0 blk_mq D?  0:02 
[kworker/u16:2+flush-8:144]
4 0 1423283   11939  20   0  11260  2544 wait_o D+   pts/28 0:00 rm 
-rf 15tb3
4 0 14235019975  20   0   4680  4584 wb_wai DL+  pts/4  0:00 
hdparm -t /dev/sda
Sun May  7 05:01:26 EDT 2023
1 0 447   2  20   0  0 0 usb_sg D?  3:17 
[usb-storage]
1 0 1366783   2  20   0  0 0 blk_mq D?  0:02 
[kworker/u16:2+flush-8:144]
4 0 1423283   11939  20   0  11260  2544 wait_o D+   pts/28 0:00 rm 
-rf 15tb3
4 0 14235019975  20   0   4680  4584 wb_wai DL+  pts/4  0:00 
hdparm -t /dev/sda
Sun May  7 05:01:27 EDT 2023
[...]
4 0 14235019975  20   0   4680  4584 wb_wai DL+  pts/4  0:00 
hdparm -t /dev/sda
Sun May  7 05:01:35 EDT 2023
1 0 447   2  20   0  0 0 usb_sg D?  3:17 
[usb-storage]
4 0 1423283   11939  20   0  11260  2544 blk_mq D+   pts/28 0:00 rm 
-rf 15tb3
4 0 14235019975  20   0   4680  4584 wb_wai DL+  pts/4  0:00 
hdparm -t /dev/sda
Sun May  7 05:01:36 EDT 2023
1 0 447   2  20   0  0 0 usb_sg D?  3:17 
[usb-storage]
1 04985   2  20   0  0 0 rq_qos D?  0:24 
[jbd2/sdj1-8]
1 0 1366783   2  20   0  0 0 blk_mq D?  0:02 
[kworker/u16:2+flush-8:144]
4 0 1423283   11939  20   0  11260  2544 wait_o D+   pts/28 0:00 rm 
-rf 15tb3
4 0 14235019975  20   0   4680  4584 wb_wai DL+  pts/4  0:00 
hdparm -t /dev/sda
Sun May  7 05:01:37 EDT 2023

I keep waiting.  The above lines repeat over and over and the hdparm is
blocked and doesn't finish.

Sun May  7 05:03:32 EDT 2023
1 0 447   2  20   0  0 0 usb_sg D?  3:18 
[usb-storage]
1 0 1366783   2  20   0  0 0 blk_mq D?  0:02 
[kworker/u16:2+flush-8:144]
4 0 1423283   11939  20   0  11260  2544 wait_o D+   pts/28 0:03 rm 
-rf 15tb3
4 0 14235019975  20   0   4680  4584 wb_wai DL+  pts/4  0:00 
hdparm -t /dev/sda
Sun May  7 05:03:34 EDT 

[Kernel-packages] [Bug 1959215] Re: unshare test in ubuntu_stress_smoke_tests triggers "BUG: unable to handle page fault for address: 0000000000001cc8" on Impish with node vought

2023-05-03 Thread Colin Ian King
** Changed in: stress-ng
   Status: New => Won't Fix

** Changed in: stress-ng
   Status: Won't Fix => Invalid

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1959215

Title:
  unshare test in ubuntu_stress_smoke_tests triggers "BUG: unable to
  handle page fault for address: 1cc8" on Impish with node
  vought

Status in Stress-ng:
  Invalid
Status in ubuntu-kernel-tests:
  New
Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Impish:
  Won't Fix

Bug description:
  Issue found on Intel node "vought" with:
    * 5.13.0-28.31
    * 5.13.0-27
    * And possibly the 5.13.0-23 from the last cycle (this test didn't finish 
properly and marked as "Incomplete" back then, just like this cycle). For more 
earlier Impish kernels, this system was not tested with this test on them.

  The test will hang with unshare test in ubuntu_stress_smoke_tests:
  12:39:39 DEBUG| [stdout] udp RETURNED 0
  12:39:39 DEBUG| [stdout] udp PASSED
  12:39:39 DEBUG| [stdout] udp-flood STARTING
  12:39:41 DEBUG| [stdout] udp-flood RETURNED 0
  12:39:41 DEBUG| [stdout] udp-flood PASSED
  12:39:41 DEBUG| [stdout] unshare STARTING
  (Test hangs here)

  And eventually the test will be killed because of the timeout setting.

  stress-ng Test suite HEAD SHA1: b81116c

  Error can be found in dmesg:
  [ 2371.109961] BUG: unable to handle page fault for address: 1cc8
  [ 2371.110074] #PF: supervisor read access in kernel mode
  [ 2371.114323] #PF: error_code(0x) - not-present page
  [ 2371.119931] PGD 0 P4D 0
  [ 2371.125257] Oops:  [#1] SMP NOPTI
  [ 2371.129247] CPU: 51 PID: 207256 Comm: stress-ng Tainted: P   O 
 5.13.0-27-generic #29-Ubuntu
  [ 2371.133203] Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS 
SE5C620.86B.0D.01.0395.022720191340 02/27/2019
  [ 2371.135887] RIP: 0010:__next_zones_zonelist+0x6/0x50
  [ 2371.138525] Code: d0 0f 4e d0 3d ff 03 00 00 7f 0d 48 63 d2 5d 48 8b 04 d5 60 e5 35 af c3 31 c0 5d c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 <8b> 4f 08 48 89 f8 48 89 e5 48 85 d2 75 10 eb 1d 48 63 49 50 48 0f
  [ 2371.143813] RSP: 0018:a9c8b399fac0 EFLAGS: 00010282
  [ 2371.146078] RAX:  RBX:  RCX: 
  [ 2371.148293] RDX: 9c98e894ea98 RSI: 0002 RDI: 1cc0
  [ 2371.150477] RBP: a9c8b399fb28 R08:  R09: 
  [ 2371.152650] R10: 0002 R11: d9bfbfcc5600 R12: 00052cc0
  [ 2371.154778] R13: 0002 R14: 0001 R15: 00152cc0
  [ 2371.156876] FS:  7fcbd141d740() GS:9cc14ccc() knlGS:
  [ 2371.158936] CS:  0010 DS:  ES:  CR0: 80050033
  [ 2371.160958] CR2: 1cc8 CR3: 00059f292001 CR4: 007706e0
  [ 2371.162950] DR0:  DR1:  DR2: 
  [ 2371.164888] DR3:  DR6: fffe0ff0 DR7: 0400
  [ 2371.166811] PKRU: 5554
  [ 2371.168694] Call Trace:
  [ 2371.170544]  ? __alloc_pages+0x2f1/0x330
  [ 2371.172386]  kmalloc_large_node+0x45/0xb0
  [ 2371.174222]  __kmalloc_node+0x276/0x300
  [ 2371.176036]  ? queue_delayed_work_on+0x39/0x60
  [ 2371.177853]  kvmalloc_node+0x5a/0x90
  [ 2371.179622]  expand_one_shrinker_info+0x82/0x190
  [ 2371.181382]  prealloc_shrinker+0x175/0x1d0
  [ 2371.183091]  alloc_super+0x2bf/0x330
  [ 2371.184764]  ? __fput_sync+0x30/0x30
  [ 2371.186384]  sget_fc+0x74/0x2e0
  [ 2371.187951]  ? set_anon_super+0x50/0x50
  [ 2371.189473]  ? mqueue_create+0x20/0x20
  [ 2371.190944]  get_tree_keyed+0x34/0xd0
  [ 2371.192363]  mqueue_get_tree+0x1c/0x20
  [ 2371.193734]  vfs_get_tree+0x2a/0xc0
  [ 2371.195105]  fc_mount+0x13/0x50
  [ 2371.196409]  mq_init_ns+0x10a/0x1b0
  [ 2371.197667]  copy_ipcs+0x130/0x220
  [ 2371.198899]  create_new_namespaces+0xa6/0x2e0
  [ 2371.200113]  unshare_nsproxy_namespaces+0x5a/0xb0
  [ 2371.201303]  ksys_unshare+0x1db/0x3c0
  [ 2371.202480]  __x64_sys_unshare+0x12/0x20
  [ 2371.203649]  do_syscall_64+0x61/0xb0
  [ 2371.204804]  ? exit_to_user_mode_loop+0xec/0x160
  [ 2371.205966]  ? exit_to_user_mode_prepare+0x37/0xb0
  [ 2371.207102]  ? syscall_exit_to_user_mode+0x27/0x50
  [ 2371.208222]  ? __x64_sys_close+0x11/0x40
  [ 2371.209336]  ? do_syscall_64+0x6e/0xb0
  [ 2371.210438]  ? asm_exc_page_fault+0x8/0x30
  [ 2371.211545]  entry_SYSCALL_64_after_hwframe+0x44/0xae
  [ 2371.212641] RIP: 0033:0x7fcbd1562c4b
  [ 2371.213698] Code: 73 01 c3 48 8b 0d e5 e1 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b5 e1 0e 00 f7 d8 64 89 01 48
  [ 2371.215851] RSP: 002b:7ffc5d8eb878 EFLAGS: 0246 ORIG_RAX: 0110
  [ 2371.216846] RAX: ffda RBX: 7ffc5d8eba20 RCX: 
[Kernel-packages] [Bug 1961076] Re: linux-hwe-5.4 ADT test failure (ubuntu_stress_smoke_test) with linux-hwe-5.4/5.4.0-100.113~18.04.1

2023-05-03 Thread Colin Ian King
** Changed in: stress-ng
   Status: New => Fix Released

** Changed in: stress-ng
 Assignee: (unassigned) => Colin Ian King (colin-king)

** Changed in: stress-ng
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-hwe-5.4 in Ubuntu.
https://bugs.launchpad.net/bugs/1961076

Title:
  linux-hwe-5.4 ADT test failure (ubuntu_stress_smoke_test) with linux-
  hwe-5.4/5.4.0-100.113~18.04.1

Status in Stress-ng:
  Fix Released
Status in ubuntu-kernel-tests:
  New
Status in linux-hwe-5.4 package in Ubuntu:
  New
Status in linux-hwe-5.4 source package in Bionic:
  New

Bug description:
  The 'dev-shm' stress-ng test is failing with bionic/linux-hwe-5.4
  5.4.0-100.113~18.04.1 on ADT, only on ppc64el.

  Testing failed on:
  ppc64el: https://autopkgtest.ubuntu.com/results/autopkgtest-bionic/bionic/ppc64el/l/linux-hwe-5.4/20220216_115416_c1d6c@/log.gz

  11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] stress-ng 0.13.11 g48be8ff4ffc4
  11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] system: Linux autopkgtest 5.4.0-100-generic #113~18.04.1-Ubuntu SMP Mon Feb 7 15:02:55 UTC 2022 ppc64le
  11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] RAM total: 7.9G, RAM free: 3.3G, swap free: 1023.9M
  11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] 4 processors online, 4 processors configured
  11:35:08 DEBUG| [stdout] stress-ng: info:  [26897] setting to a 5 second run per stressor
  11:35:08 DEBUG| [stdout] stress-ng: info:  [26897] dispatching hogs: 4 dev-shm
  11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] cache allocate: using cache maximum level L1
  11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] cache allocate: shared cache buffer size: 32K
  11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] starting stressors
  11:35:08 DEBUG| [stdout] stress-ng: debug: [26899] stress-ng-dev-shm: started [26899] (instance 0)
  11:35:08 DEBUG| [stdout] stress-ng: debug: [26900] stress-ng-dev-shm: started [26900] (instance 1)
  11:35:08 DEBUG| [stdout] stress-ng: debug: [26901] stress-ng-dev-shm: started [26901] (instance 2)
  11:35:08 DEBUG| [stdout] stress-ng: debug: [26902] stress-ng-dev-shm: started [26902] (instance 3)
  11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] 4 stressors started
  11:35:08 DEBUG| [stdout] stress-ng: debug: [26899] stress-ng-dev-shm: assuming killed by OOM killer, restarting again (instance 0)
  11:35:08 DEBUG| [stdout] stress-ng: debug: [26902] stress-ng-dev-shm: assuming killed by OOM killer, restarting again (instance 3)
  11:35:08 DEBUG| [stdout] stress-ng: debug: [26901] stress-ng-dev-shm: assuming killed by OOM killer, restarting again (instance 2)
  11:35:08 DEBUG| [stdout] stress-ng: debug: [26900] stress-ng-dev-shm: assuming killed by OOM killer, restarting again (instance 1)
  11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26899] (stress-ng-dev-shm) terminated on signal: 9 (Killed)
  11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26899] (stress-ng-dev-shm) was killed by the OOM killer
  11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26899] terminated
  11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26900] (stress-ng-dev-shm) terminated on signal: 9 (Killed)
  11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26900] (stress-ng-dev-shm) was possibly killed by the OOM killer
  11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26900] terminated
  11:35:08 DEBUG| [stdout] stress-ng: debug: [26901] stress-ng-dev-shm: exited [26901] (instance 2)
  11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26901] terminated
  11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26902] (stress-ng-dev-shm) terminated on signal: 9 (Killed)
  11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26902] (stress-ng-dev-shm) was killed by the OOM killer
  11:35:08 DEBUG| [stdout] stress-ng: debug: [26897] process [26902] terminated
  11:35:08 DEBUG| [stdout] stress-ng: info:  [26897] successful run completed in 5.06s
  11:35:08 DEBUG| [stdout] stress-ng: fail:  [26897] dev_shm instance 0 corrupted bogo-ops counter, 14 vs 0
  11:35:08 DEBUG| [stdout] stress-ng: fail:  [26897] dev_shm instance 0 hash error in bogo-ops counter and run flag, 2146579844 vs 0
  11:35:08 DEBUG| [stdout] stress-ng: fail:  [26897] dev_shm instance 1 corrupted bogo-ops counter, 13 vs 0
  11:35:08 DEBUG| [stdout] stress-ng: fail:  [26897] dev_shm instance 1 hash error in bogo-ops counter and run flag, 1093487894 vs 0
  11:35:08 DEBUG| [stdout] stress-ng: fail:  [26897] dev_shm instance 3 corrupted bogo-ops counter, 13 vs 0
  11:35:08 DEBUG| [stdout] info: 5 failures reached, aborting stress process
  11:35:08 DEBUG| [stdout] stress-ng: fail:  [26897] dev_shm instance 3 hash error in bogo-ops counter and run flag, 1093487894 vs 0
  11:35:08 DEBUG| [stdout] stress-ng: fail:  [26897] metr

[Kernel-packages] [Bug 1999731] Re: disk stress test failing with code 7

2023-05-03 Thread Colin Ian King
** Changed in: stress-ng
   Status: In Progress => Fix Released

** Changed in: stress-ng
 Assignee: (unassigned) => Colin Ian King (colin-king)

** Changed in: stress-ng
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1999731

Title:
  disk stress test failing with code 7

Status in Stress-ng:
  Fix Released
Status in linux package in Ubuntu:
  Invalid
Status in stress-ng package in Ubuntu:
  Fix Released

Bug description:
  Since mid-November we have seen many disk stress test failures with
  multiple Ubuntu kernels, e.g. bionic-hwe, focal and focal-hwe. Most of
  them are with the lockofd stressor, and the system is still alive after
  the stress test.

  05 Nov 08:51: Running stress-ng lockofd stressor for 240 seconds...
  ** stress-ng exited with code 7

To manage notifications about this bug go to:
https://bugs.launchpad.net/stress-ng/+bug/1999731/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1968361] Re: rawsock test BUG: soft lockup

2023-05-03 Thread Colin Ian King
** Changed in: stress-ng
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1968361

Title:
  rawsock test BUG: soft lockup

Status in Linux:
  Fix Released
Status in Stress-ng:
  Fix Released
Status in linux package in Ubuntu:
  Invalid

Bug description:
  When running the rawsock stressor on a large system with 32 CPUs or
  more, I always hit a soft lockup in the kernel, and sometimes it will
  lock up the system when run for a long time. This issue occurs on all
  major OSes that I tested: Ubuntu 20.04, RHEL 7, RHEL 8, SUSE 15.

  
  my system:
  stress-ng V0.13.03-5-g9093bce7

  #lscpu | grep CPU
  CPU(s):  64
  On-line CPU(s) list: 0-63
  NUMA node0 CPU(s):   0-63

  # ./stress-ng --rawsock 20 -t 5
  stress-ng: info:  [49748] setting to a 5 second run per stressor
  stress-ng: info:  [49748] dispatching hogs: 20 rawsock

  Message from syslogd@rain65 at Apr  8 12:18:26 ...
   kernel:watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [stress-ng:49781]
  

  If I run with --timeout 60 secs, it will lock up the systems.

  The issue is lock starvation in the kernel:
  - When the stressor creates an instance, it forks new child/client and
  parent/server processes and recreates their sockets. The kernel acquires
  the write lock to add them to the raw socket hash table.
  - The client process immediately starts sending data in a do/while loop.
  The kernel acquires the read lock to access the raw socket hash table and
  to clone the data packets for all raw socket processes.
  - The main stress-ng process may still be creating the remaining
  instances, so the kernel may hit write-lock starvation (the error shown
  above).
  - Similarly, when the timeout expires, the parents try to close their
  sockets, for which the kernel again needs the write lock, before sending
  SIGKILL to their child processes. We may hit the same starvation, since
  the clients have not closed their sockets and keep sending data.

  I'm not sure this is intended, but to avoid the kernel lock starvation
  in raw sockets, I propose the simple patch attached. I have tested it on
  a large system with 128 CPUs without hitting any soft lockup BUG.

  Thanks,
  Thinh Tran

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1968361/+subscriptions




[Kernel-packages] [Bug 2017529] Re: livecd-rootfs: kernel bump

2023-04-25 Thread Ian Kumlien
It's in a server room and it's hard to get logs out - I also have
pressure to make it usable.

IMHO, just bump the kernel on the livecd.

** Changed in: linux (Ubuntu)
   Status: Incomplete => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2017529

Title:
  livecd-rootfs: kernel bump

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Booting on a larger HP EPYC server results in no PCIe devices
  available.

  The kernel thinks that memory regions overlap and eventually ends up
  stating that no address space is available for other things.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2017529/+subscriptions




[Kernel-packages] [Bug 2012260] Re: Add support for Alder Lake N

2023-03-20 Thread Colin Ian King
** Changed in: thermald (Ubuntu)
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to thermald in Ubuntu.
https://bugs.launchpad.net/bugs/2012260

Title:
  Add support for Alder Lake N

Status in thermald package in Ubuntu:
  In Progress

Bug description:
  [Impact]

   * Support thermald on Alder Lake N CPUs.

  [Test Plan]

   * Use a machine with an Alder Lake N CPU.

   * systemctl status thermald

   * Status of thermald should be `running`

  [Where problems could occur]

   * This change is to add support for Alder Lake N in thermald, which
  won't impact other hardware.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/thermald/+bug/2012260/+subscriptions




[Kernel-packages] [Bug 2009676] Re: Add support for Raptor Lake S CPUs

2023-03-08 Thread Colin Ian King
@koba, can you test the focal version too?

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to thermald in Ubuntu.
https://bugs.launchpad.net/bugs/2009676

Title:
  Add support for Raptor Lake S CPUs

Status in OEM Priority Project:
  New
Status in thermald package in Ubuntu:
  Fix Released
Status in thermald source package in Jammy:
  Fix Committed

Bug description:
  [Impact]

   * Support thermald on Raptor Lake S CPU.

  [Test Plan]

   * Use a machine with a Raptor Lake S CPU.

   * systemctl status thermald

   * Status of thermald should be `running`

  [Where problems could occur]

   * This change is to add support for Raptor Lake S in thermald, which
  won't impact other hardware.

  [Other Info]

  
https://github.com/intel/thermal_daemon/commit/e03493dc1e972374c1686492655250f8f48a15ba

To manage notifications about this bug go to:
https://bugs.launchpad.net/oem-priority/+bug/2009676/+subscriptions




[Kernel-packages] [Bug 2009676] Re: Add support for Raptor Lake S CPUs

2023-03-08 Thread Colin Ian King
I believe it's useful to have RPL focal support in thermald for users on
newer HWE kernels. Plus the change is basically adding some CPU IDs, so
it's a small delta for useful potential support.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to thermald in Ubuntu.
https://bugs.launchpad.net/bugs/2009676

Title:
  Add support for Raptor Lake S CPUs

Status in OEM Priority Project:
  New
Status in thermald package in Ubuntu:
  Fix Released
Status in thermald source package in Jammy:
  Fix Committed

Bug description:
  [Impact]

   * Support thermald on Raptor Lake S CPU.

  [Test Plan]

   * Use a machine with a Raptor Lake S CPU.

   * systemctl status thermald

   * Status of thermald should be `running`

  [Where problems could occur]

   * This change is to add support for Raptor Lake S in thermald, which
  won't impact other hardware.

  [Other Info]

  
https://github.com/intel/thermal_daemon/commit/e03493dc1e972374c1686492655250f8f48a15ba

To manage notifications about this bug go to:
https://bugs.launchpad.net/oem-priority/+bug/2009676/+subscriptions




[Kernel-packages] [Bug 2009676] Re: Add support for Raptor Lake S CPUs

2023-03-08 Thread Colin Ian King
I've applied the patch for Jammy and uploaded a new version ready for
SRU.

thermald (2.4.9-1ubuntu0.2) jammy; urgency=medium

  * Add support for Raptor Lake S CPUs. (LP: #2009676)

Date: Wed, 8 Mar 2023 11:28:31 +
Changed-By: Colin Ian King 
Maintainer: Ubuntu Developers 
https://launchpad.net/ubuntu/+source/thermald/2.4.9-1ubuntu0.2

==

 OK: thermald_2.4.9.orig.tar.xz
 OK: thermald_2.4.9-1ubuntu0.2.debian.tar.xz
 OK: thermald_2.4.9-1ubuntu0.2.dsc
 -> Component: main Section: misc

Upload Warnings:
Redirecting ubuntu jammy to ubuntu jammy-proposed.
This upload awaits approval by a distro manager


Announcing to jammy-chan...@lists.ubuntu.com

Thank you for your contribution to Ubuntu.

** Changed in: thermald (Ubuntu)
Milestone: None => jammy-updates

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to thermald in Ubuntu.
https://bugs.launchpad.net/bugs/2009676

Title:
  Add support for Raptor Lake S CPUs

Status in OEM Priority Project:
  New
Status in thermald package in Ubuntu:
  Fix Released
Status in thermald source package in Jammy:
  New

Bug description:
  [Impact]

   * Support thermald on Raptor Lake S CPU.

  [Test Plan]

   * Use a machine with a Raptor Lake S CPU.

   * systemctl status thermald

   * Status of thermald should be `running`

  [Where problems could occur]

   * This change is to add support for Raptor Lake S in thermald, which
  won't impact other hardware.

  [Other Info]

  
https://github.com/intel/thermal_daemon/commit/e03493dc1e972374c1686492655250f8f48a15ba

To manage notifications about this bug go to:
https://bugs.launchpad.net/oem-priority/+bug/2009676/+subscriptions




[Kernel-packages] [Bug 1968361] Re: rawsock test BUG: soft lockup

2023-03-01 Thread Colin Ian King
Also, a re-forking delay has been added to allow instances to fire up and
back off if resources get low. These changes have been tested with 256,
1024, 4096 and 8192 instances on a 24-thread system with 32 GB of memory.

** Changed in: linux (Ubuntu)
   Status: New => Invalid

** Changed in: linux (Ubuntu)
   Importance: High => Low

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1968361

Title:
  rawsock test BUG: soft lockup

Status in Linux:
  Fix Released
Status in Stress-ng:
  Fix Committed
Status in linux package in Ubuntu:
  Invalid

Bug description:
  When running the rawsock stressor on a large system with 32 CPUs or
  more, I always hit a soft lockup in the kernel, and sometimes it will
  lock up the system when run for a long time. This issue occurs on all
  major OSes that I tested: Ubuntu 20.04, RHEL 7, RHEL 8, SUSE 15.

  
  my system:
  stress-ng V0.13.03-5-g9093bce7

  #lscpu | grep CPU
  CPU(s):  64
  On-line CPU(s) list: 0-63
  NUMA node0 CPU(s):   0-63

  # ./stress-ng --rawsock 20 -t 5
  stress-ng: info:  [49748] setting to a 5 second run per stressor
  stress-ng: info:  [49748] dispatching hogs: 20 rawsock

  Message from syslogd@rain65 at Apr  8 12:18:26 ...
   kernel:watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [stress-ng:49781]
  

  If I run with --timeout 60 secs, it will lock up the systems.

  The issue is lock starvation in the kernel:
  - When the stressor creates an instance, it forks new child/client and
  parent/server processes and recreates their sockets. The kernel acquires
  the write lock to add them to the raw socket hash table.
  - The client process immediately starts sending data in a do/while loop.
  The kernel acquires the read lock to access the raw socket hash table and
  to clone the data packets for all raw socket processes.
  - The main stress-ng process may still be creating the remaining
  instances, so the kernel may hit write-lock starvation (the error shown
  above).
  - Similarly, when the timeout expires, the parents try to close their
  sockets, for which the kernel again needs the write lock, before sending
  SIGKILL to their child processes. We may hit the same starvation, since
  the clients have not closed their sockets and keep sending data.

  I'm not sure this is intended, but to avoid the kernel lock starvation
  in raw sockets, I propose the simple patch attached. I have tested it on
  a large system with 128 CPUs without hitting any soft lockup BUG.

  Thanks,
  Thinh Tran

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1968361/+subscriptions




[Kernel-packages] [Bug 1968361] Re: rawsock test BUG: soft lockup

2023-03-01 Thread Colin Ian King
This fix will land in the next release of stress-ng at the end of March
2023

** Changed in: stress-ng
   Status: New => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1968361

Title:
  rawsock test BUG: soft lockup

Status in Linux:
  Fix Released
Status in Stress-ng:
  Fix Committed
Status in linux package in Ubuntu:
  Invalid

Bug description:
  When running the rawsock stressor on a large system with 32 CPUs or
  more, I always hit a soft lockup in the kernel, and sometimes it will
  lock up the system when run for a long time. This issue occurs on all
  major OSes that I tested: Ubuntu 20.04, RHEL 7, RHEL 8, SUSE 15.

  
  my system:
  stress-ng V0.13.03-5-g9093bce7

  #lscpu | grep CPU
  CPU(s):  64
  On-line CPU(s) list: 0-63
  NUMA node0 CPU(s):   0-63

  # ./stress-ng --rawsock 20 -t 5
  stress-ng: info:  [49748] setting to a 5 second run per stressor
  stress-ng: info:  [49748] dispatching hogs: 20 rawsock

  Message from syslogd@rain65 at Apr  8 12:18:26 ...
   kernel:watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [stress-ng:49781]
  

  If I run with --timeout 60 secs, it will lock up the systems.

  The issue is lock starvation in the kernel:
  - When the stressor creates an instance, it forks new child/client and
  parent/server processes and recreates their sockets. The kernel acquires
  the write lock to add them to the raw socket hash table.
  - The client process immediately starts sending data in a do/while loop.
  The kernel acquires the read lock to access the raw socket hash table and
  to clone the data packets for all raw socket processes.
  - The main stress-ng process may still be creating the remaining
  instances, so the kernel may hit write-lock starvation (the error shown
  above).
  - Similarly, when the timeout expires, the parents try to close their
  sockets, for which the kernel again needs the write lock, before sending
  SIGKILL to their child processes. We may hit the same starvation, since
  the clients have not closed their sockets and keep sending data.

  I'm not sure this is intended, but to avoid the kernel lock starvation
  in raw sockets, I propose the simple patch attached. I have tested it on
  a large system with 128 CPUs without hitting any soft lockup BUG.

  Thanks,
  Thinh Tran

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1968361/+subscriptions




[Kernel-packages] [Bug 1968361] Re: rawsock test BUG: soft lockup

2023-03-01 Thread Colin Ian King
Added an ENOBUFS check on the sender, with priority dropping on ENOBUFS
errors and a timer backoff delay. Also added OOM-killer respawning,
controlled by the --oomable option, so that overcommitted systems either
respawn OOM'd rawsock instances (the default) or do not respawn them
(--oomable).

Fix committed upstream:
https://github.com/ColinIanKing/stress-ng/commit/e4d3b90267243d7505399e7059950097d9bd50ae

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1968361

Title:
  rawsock test BUG: soft lockup

Status in Linux:
  Fix Released
Status in Stress-ng:
  Fix Committed
Status in linux package in Ubuntu:
  Invalid

Bug description:
  When running the rawsock stressor on a large system with 32 CPUs or
  more, I always hit a soft lockup in the kernel, and sometimes it will
  lock up the system when run for a long time. This issue occurs on all
  major OSes that I tested: Ubuntu 20.04, RHEL 7, RHEL 8, SUSE 15.

  
  my system:
  stress-ng V0.13.03-5-g9093bce7

  #lscpu | grep CPU
  CPU(s):  64
  On-line CPU(s) list: 0-63
  NUMA node0 CPU(s):   0-63

  # ./stress-ng --rawsock 20 -t 5
  stress-ng: info:  [49748] setting to a 5 second run per stressor
  stress-ng: info:  [49748] dispatching hogs: 20 rawsock

  Message from syslogd@rain65 at Apr  8 12:18:26 ...
   kernel:watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [stress-ng:49781]
  

  If I run with --timeout 60 secs, it will lock up the systems.

  The issue is lock starvation in the kernel:
  - When the stressor creates an instance, it forks new child/client and
  parent/server processes and recreates their sockets. The kernel acquires
  the write lock to add them to the raw socket hash table.
  - The client process immediately starts sending data in a do/while loop.
  The kernel acquires the read lock to access the raw socket hash table and
  to clone the data packets for all raw socket processes.
  - The main stress-ng process may still be creating the remaining
  instances, so the kernel may hit write-lock starvation (the error shown
  above).
  - Similarly, when the timeout expires, the parents try to close their
  sockets, for which the kernel again needs the write lock, before sending
  SIGKILL to their child processes. We may hit the same starvation, since
  the clients have not closed their sockets and keep sending data.

  I'm not sure this is intended, but to avoid the kernel lock starvation
  in raw sockets, I propose the simple patch attached. I have tested it on
  a large system with 128 CPUs without hitting any soft lockup BUG.

  Thanks,
  Thinh Tran

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1968361/+subscriptions




[Kernel-packages] [Bug 1999731] Re: disk stress test failing with code 7

2023-03-01 Thread Colin Ian King
** Changed in: stress-ng (Ubuntu)
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1999731

Title:
  disk stress test failing with code 7

Status in Stress-ng:
  In Progress
Status in linux package in Ubuntu:
  In Progress
Status in stress-ng package in Ubuntu:
  Fix Released

Bug description:
  Since mid-November we have seen many disk stress test failures with
  multiple Ubuntu kernels, e.g. bionic-hwe, focal and focal-hwe. Most of
  them are with the lockofd stressor, and the system is still alive after
  the stress test.

  05 Nov 08:51: Running stress-ng lockofd stressor for 240 seconds...
  ** stress-ng exited with code 7

To manage notifications about this bug go to:
https://bugs.launchpad.net/stress-ng/+bug/1999731/+subscriptions




[Kernel-packages] [Bug 2007579] Re: Raptor Lake Thermald ITMT version 2 support

2023-03-01 Thread Colin Ian King
@koba, if you have access to a Dell XPS 9320 then that would be super
useful to verify too - thanks!

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to thermald in Ubuntu.
https://bugs.launchpad.net/bugs/2007579

Title:
  Raptor Lake Thermald ITMT version 2 support

Status in thermald package in Ubuntu:
  Fix Released
Status in thermald source package in Bionic:
  Won't Fix
Status in thermald source package in Focal:
  Won't Fix
Status in thermald source package in Jammy:
  In Progress
Status in thermald source package in Kinetic:
  Fix Committed
Status in thermald source package in Lunar:
  Fix Released

Bug description:
  == SRU Justification Kinetic ==

  Raptor Lake systems use ITMT v2 instead of v1 for thermal
  configuration via GDDV.

  This was observed on a Dell XPS 9320 system.
  Because thermald can't parse the v2 table, it does not get the correct
  thermal threshold temperature and power limits.

  == The Fix ==

  This is fixed in upstream thermald by the patch:
  
https://github.com/intel/thermal_daemon/commit/90d56bc06cdcf78e7398ea7da389401516591774
  This fix is part of Thermald 2.5.2 release.

  The fix applies cleanly and this is already in Ubuntu Lunar in
  thermald 2.5.2.  The fix checks for illegal ITMT version and handles
  version 2 as a specific exceptional case.

  == Regression Risks ==

  For systems that do not use ITMT, no change in behaviour will occur.
  Systems with versions > 2 (currently not valid) will no longer have
  ITMT parsed; this avoids misinterpreting unsupported ITMT data.
  Finally, version 2 of ITMT will now be parsed differently: the
  additional fields will be parsed and then ignored as intended.

  == Test Plan ==

  Test against a Dell XPS 9320 system. See if it handles the ITMT correctly. 
The thermald log should indicate version 2 is being used with the message: 
  "ignore dummy_str: ds d1 d2 d3 " where ds = a string, d1 .. d3 are uint64 
values that are parsed and ignored.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/thermald/+bug/2007579/+subscriptions




[Kernel-packages] [Bug 1968361] Re: rawsock test BUG: soft lockup

2023-03-01 Thread Colin Ian King
Looks like the kernel is running out of resources and is doing
out-of-memory killing of various processes. I think I have ways of
making this less likely to occur.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1968361

Title:
  rawsock test BUG: soft lockup

Status in Linux:
  Fix Released
Status in Stress-ng:
  New
Status in linux package in Ubuntu:
  New

Bug description:
  When running the rawsock stressor on a large system with 32 CPUs or
  more, I always hit a soft lockup in the kernel, and sometimes it will
  lock up the system when run for a long time. This issue occurs on all
  major OSes that I tested: Ubuntu 20.04, RHEL 7, RHEL 8, SUSE 15.

  
  my system:
  stress-ng V0.13.03-5-g9093bce7

  #lscpu | grep CPU
  CPU(s):  64
  On-line CPU(s) list: 0-63
  NUMA node0 CPU(s):   0-63

  # ./stress-ng --rawsock 20 -t 5
  stress-ng: info:  [49748] setting to a 5 second run per stressor
  stress-ng: info:  [49748] dispatching hogs: 20 rawsock

  Message from syslogd@rain65 at Apr  8 12:18:26 ...
   kernel:watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [stress-ng:49781]
  

  If I run with --timeout 60 secs, it will lock up the systems.

  The issue is lock starvation in the kernel:
  - When the stressor creates an instance, it forks new child/client and
  parent/server processes, which each create raw sockets. The kernel
  acquires the write lock to add them to the raw socket hash table.
  - The client process immediately starts sending data in a do {} while
  loop. The kernel acquires the read lock to access the raw socket hash
  table and to clone the data packets for all raw socket processes.
  - The main stress-ng process may still be creating the remaining
  instances, so the kernel may hit lock starvation (as the error above
  shows).
  - Similarly, when the timeout expires, the parents try to close their
  sockets, for which the kernel again acquires the write lock, before
  sending SIGKILL to their child processes. We may hit lock starvation
  here too, since the clients have not yet closed their sockets and
  continue sending data.
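  The per-instance pattern above can be sketched as follows. This is an
illustrative Python reconstruction, not stress-ng's actual C code, and it
substitutes UDP datagram sockets for raw sockets (SOCK_RAW requires
CAP_NET_RAW); the comments map each step to the lock acquisitions
described in the list.

```python
# Illustrative sketch of one rawsock-style stressor instance (hypothetical
# names; UDP stands in for SOCK_RAW, which would require CAP_NET_RAW).
import os
import socket

def run_instance(n_packets=100):
    # Parent/server socket: registering it in the kernel's socket table is
    # where the real rawsock stressor takes the hash-table write lock.
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    addr = server.getsockname()

    pid = os.fork()
    if pid == 0:
        # Child/client: send in a tight loop; with SOCK_RAW every packet
        # delivery walks the raw socket hash table under the read lock.
        client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        for _ in range(n_packets):
            client.sendto(b"x", addr)
        client.close()
        os._exit(0)

    # Parent/server: drain the packets the client sent.
    received = 0
    while received < n_packets:
        server.recv(16)
        received += 1
    os.waitpid(pid, 0)
    server.close()  # closing would take the write lock again
    return received

if __name__ == "__main__":
    run_instance(10)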

  I'm not sure whether this is intended, but to avoid kernel lock
  starvation in raw sockets, I propose the simple patch attached. I have
  tested it on a large system with 128 CPUs without hitting any "BUG:
  soft lockup".

  Thanks,
  Thinh Tran

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1968361/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2007579] Re: Raptor Lake Thermald ITMT version 2 support

2023-02-28 Thread Colin Ian King
I've been exercising the existing code paths of thermald for several
days now with no observable regression in behaviour. I cannot test the
new code path change for this fix as I don't have the exact same system
as that reported in the bug.

For what I can see, there is no regression on this single change for
Kinetic.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to thermald in Ubuntu.
https://bugs.launchpad.net/bugs/2007579

Title:
  Raptor Lake Thermald ITMT version 2 support

Status in thermald package in Ubuntu:
  Fix Released
Status in thermald source package in Bionic:
  Won't Fix
Status in thermald source package in Focal:
  Won't Fix
Status in thermald source package in Jammy:
  In Progress
Status in thermald source package in Kinetic:
  Fix Committed
Status in thermald source package in Lunar:
  Fix Released

Bug description:
  == SRU Justification Kinetic ==

  Raptor Lake systems use ITMT v2 instead of v1 for thermal
  configuration via GDDV.

  This was observed on a Dell XPS 9320 system.
  Because thermald can't parse the v2 table, it does not get the correct
  thermal threshold temperatures and power limits.

  == The Fix ==

  This is fixed in upstream thermald by the patch:
  
https://github.com/intel/thermal_daemon/commit/90d56bc06cdcf78e7398ea7da389401516591774
  This fix is part of the Thermald 2.5.2 release.

  The fix applies cleanly and is already in Ubuntu Lunar in
  thermald 2.5.2.  The fix checks for illegal ITMT versions and handles
  version 2 as a specific exceptional case.

  == Regression Risks ==

  For systems that do not use ITMT, no change in behaviour will occur.
  Systems with versions > 2 (currently not valid) will no longer have
  ITMT parsed; this avoids misinterpreting unsupported ITMT data.
  Finally, version 2 of ITMT will now be parsed differently: the
  additional fields will be parsed and then ignored, as intended.

  == Test Plan ==

  Test against a Dell XPS 9320 system and verify that ITMT is handled
  correctly. The thermald log should indicate that version 2 is being used
  with the message "ignore dummy_str: ds d1 d2 d3", where ds is a string
  and d1..d3 are uint64 values that are parsed and ignored.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/thermald/+bug/2007579/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2007579] Re: Raptor Lake Thermald ITMT version 2 support

2023-02-16 Thread Colin Ian King
** Changed in: thermald (Ubuntu Jammy)
   Status: Won't Fix => In Progress

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to thermald in Ubuntu.
https://bugs.launchpad.net/bugs/2007579

Title:
  Raptor Lake Thermald ITMT version 2 support

Status in thermald package in Ubuntu:
  Fix Released
Status in thermald source package in Bionic:
  Won't Fix
Status in thermald source package in Focal:
  Won't Fix
Status in thermald source package in Jammy:
  In Progress
Status in thermald source package in Kinetic:
  In Progress
Status in thermald source package in Lunar:
  Fix Released

Bug description:
  == SRU Justification Kinetic ==

  Raptor Lake systems use ITMT v2 instead of v1 for thermal
  configuration via GDDV.

  This was observed on a Dell XPS 9320 system.
  Because thermald can't parse the v2 table, it does not get the correct
  thermal threshold temperatures and power limits.

  == The Fix ==

  This is fixed in upstream thermald by the patch:
  
https://github.com/intel/thermal_daemon/commit/90d56bc06cdcf78e7398ea7da389401516591774
  This fix is part of the Thermald 2.5.2 release.

  The fix applies cleanly and is already in Ubuntu Lunar in
  thermald 2.5.2.  The fix checks for illegal ITMT versions and handles
  version 2 as a specific exceptional case.

  == Regression Risks ==

  For systems that do not use ITMT, no change in behaviour will occur.
  Systems with versions > 2 (currently not valid) will no longer have
  ITMT parsed; this avoids misinterpreting unsupported ITMT data.
  Finally, version 2 of ITMT will now be parsed differently: the
  additional fields will be parsed and then ignored, as intended.

  == Test Plan ==

  Test against a Dell XPS 9320 system and verify that ITMT is handled
  correctly. The thermald log should indicate that version 2 is being used
  with the message "ignore dummy_str: ds d1 d2 d3", where ds is a string
  and d1..d3 are uint64 values that are parsed and ignored.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/thermald/+bug/2007579/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

