Bug#1057005: linux-image-6.1.0-13-amd64: Kernel Oops in nfs4_do_reclaim, which exits with irqs disabled

2023-12-04 Thread Salvatore Bonaccorso
Control: tags -1 + moreinfo

Hi James,

On Mon, Nov 27, 2023 at 08:00:52PM +, James Chapman wrote:
> Package: src:linux
> Version: 6.1.55-1
> Severity: important
> X-Debbugs-Cc: jamescope...@gmail.com
> 
> Dear Maintainer,
> 
> Hi, I have experienced an issue where my client lost access to an
> NFS server for a period of around 15 minutes, then immediately
> following server recovery, I experienced a kernel oops on the client
> (details below). What made this more severe was the fact that
> nfs4_do_reclaim exited with irqs disabled, which is possibly what
> resulted in a number of "rcu_preempt self-detected stall on CPU"
> errors and a very unstable system, leaving me no choice but to hit
> the reset button.

Are you able to reproduce the issue? Otherwise it is quite hard to
make any assessment of this bug.

Regards,
Salvatore



2023-11-27 Thread James Chapman
Package: src:linux
Version: 6.1.55-1
Severity: important
X-Debbugs-Cc: jamescope...@gmail.com

Dear Maintainer,

Hi, I have experienced an issue where my client lost access to an NFS server 
for a period of around 15 minutes, then immediately following server recovery, 
I experienced a kernel oops on the client (details below). What made this more 
severe was the fact that nfs4_do_reclaim exited with irqs disabled, which is 
possibly what resulted in a number of "rcu_preempt self-detected stall on CPU" 
errors and a very unstable system, leaving me no choice but to hit the reset 
button.

BUG: unable to handle page fault for address: fff8
#PF: supervisor read access in kernel mode
#PF: error_code(0x) - not-present page
PGD 37de15067 P4D 37de15067 PUD 37de17067 PMD 0 
Oops:  [#1] PREEMPT SMP NOPTI
CPU: 9 PID: 4154600 Comm: 192.168.253.7-m Not tainted 6.1.0-13-amd64 #1  Debian 6.1.55-1
Hardware name: System manufacturer System Product Name/PRIME X570-PRO, BIOS 3405 02/01/2021
RIP: 0010:complete+0x38/0x80
Code: 89 fb 4c 89 e7 e8 c8 d8 93 00 48 89 c5 8b 03 83 f8 ff 74 05 83 c0 01 89
03 48 8b 53 10 48 8d 43 10 48 39 c2 74 2e 48 8b 5b 10 <48> 8b 7b f8 e8 df cc fd
ff 48 89 df e8 17 bd 43 00 84 c0 74 0e 48
RSP: 0018:b78d1e55bdc0 EFLAGS: 00010013
RAX: 9df24a45f2f8 RBX:  RCX: 
RDX:  RSI:  RDI: 0001
RBP: 0247 R08: 9defc1f84040 R09: 9dee419a5910
R10: 0001 R11: 0001 R12: 9df24a45f2f0
R13: c12613a0 R14: 9dee5532d820 R15: 9defc1f84000
FS:  () GS:9df54ec4() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: fff8 CR3: 0007eaf16000 CR4: 00350ee0
Call Trace:
 
 ? __die_body.cold+0x1a/0x1f
 ? page_fault_oops+0xd2/0x2b0
 ? exc_page_fault+0xca/0x170
 ? asm_exc_page_fault+0x22/0x30
 ? complete+0x38/0x80
 nfs4_do_reclaim+0x5b6/0x810 [nfsv4]
 nfs4_run_state_manager+0x882/0xab0 [nfsv4]
 ? __schedule+0x359/0xa20
 ? preempt_count_add+0x6a/0xa0
 ? nfs4_do_reclaim+0x810/0x810 [nfsv4]
 kthread+0xe9/0x110
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x22/0x30
 
Modules linked in: sd_mod uas usb_storage tcp_diag udp_diag inet_diag veth
vhost_net vhost vhost_iotlb tun macvtap macvlan tap xt_CHECKSUM ipt_REJECT
nf_reject_ipv4 cpufreq_powersave cpufreq_userspace cpufreq_ondemand
cpufreq_conservative rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd
grace fscache netfs nft_masq sunrpc bridge nft_chain_nat xt_MASQUERADE xt_nat
nf_nat xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
xt_tcpudp nft_compat nf_tables libcrc32c nfnetlink binfmt_misc intel_rapl_msr
intel_rapl_common amd64_edac edac_mce_amd kvm_amd nouveau kvm irqbypass
ghash_clmulni_intel sha512_ssse3 video sha512_generic drm_display_helper
asus_ec_sensors cec evdev aesni_intel rc_core crypto_simd pl2303
drm_ttm_helper cryptd ttm usbserial rapl drm_kms_helper sp5100_tco ccp pcspkr
wmi_bmof mxm_wmi watchdog k10temp button acpi_cpufreq sg nct6775 nct6775_core
hwmon_vid 8021q garp stp mrp llc drm fuse loop dm_mod efi_pstore configfs
efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic
mlx4_ib ib_uverbs ib_core mlx4_en hid_generic usbhid hid sr_mod cdrom ahci
libahci xhci_pci xhci_hcd nvme libata mlx4_core crc32_pclmul nvme_core
crc32c_intel usbcore igb scsi_mod t10_pi i2c_piix4 crc64_rocksoft crc64
i2c_algo_bit crc_t10dif scsi_common usb_common crct10dif_generic dca
crct10dif_pclmul crct10dif_common wmi
CR2: fff8
---[ end trace  ]---
RIP: 0010:complete+0x38/0x80
Code: 89 fb 4c 89 e7 e8 c8 d8 93 00 48 89 c5 8b 03 83 f8 ff 74 05 83 c0 01 89 
03 48 8b 53 10 48 8d 43 10 48 39 c2 74 2e 48 8b 5b 10 <48> 8b 7b f8 e8 df cc fd 
ff 48 89 df e8 17 bd 43 00 84 c0 74 0e 48
RSP: 0018:b78d1e55bdc0 EFLAGS: 00010013
RAX: 9df24a45f2f8 RBX:  RCX: 
RDX:  RSI:  RDI: 0001
RBP: 0247 R08: 9defc1f84040 R09: 9dee419a5910
R10: 0001 R11: 0001 R12: 9df24a45f2f0
R13: c12613a0 R14: 9dee5532d820 R15: 9defc1f84000
FS:  () GS:9df54ec4() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: fff8 CR3: 0007eaf16000 CR4: 00350ee0
note: 192.168.253.7-m[4154600] exited with irqs disabled
note: 192.168.253.7-m[4154600] exited with preempt_count 2
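
For triage, the headline fields of the dump above (faulting RIP, the call-trace frames not marked with "?", and the "exited with" notes) can be pulled out mechanically. What follows is only a minimal Python sketch, assuming the oops text is available as a string; summarize_oops and SAMPLE are hypothetical names used for illustration and are not part of any kernel or Debian tooling. The sample is an excerpt copied from the trace above.

```python
import re

# Excerpt of the oops above, used only as sample input.
SAMPLE = """\
RIP: 0010:complete+0x38/0x80
Call Trace:
 ? complete+0x38/0x80
 nfs4_do_reclaim+0x5b6/0x810 [nfsv4]
 nfs4_run_state_manager+0x882/0xab0 [nfsv4]
 ? nfs4_do_reclaim+0x810/0x810 [nfsv4]
 kthread+0xe9/0x110
 ret_from_fork+0x22/0x30
note: 192.168.253.7-m[4154600] exited with irqs disabled
note: 192.168.253.7-m[4154600] exited with preempt_count 2
"""


def summarize_oops(text):
    """Extract a few headline fields from a kernel oops dump (hypothetical helper)."""
    # First "RIP: <segment>:<symbol+offset/size>" line, if any.
    rip = re.search(r"^RIP: [0-9a-f]+:(\S+)", text, re.MULTILINE)

    # Call-trace lines are indented; entries prefixed with "?" are skipped,
    # so only the unmarked frames are kept.
    frames = re.findall(
        r"^[ \t]+(?!\?)([\w.]+\+0x[0-9a-f]+/0x[0-9a-f]+)",
        text,
        re.MULTILINE,
    )

    # "note: <comm>[<pid>] exited with <state>" lines printed at task exit.
    notes = [
        (comm, int(pid), state.strip())
        for comm, pid, state in re.findall(
            r"^note: (.+?)\[(\d+)\] exited with (.+)$", text, re.MULTILINE
        )
    ]

    return {
        "rip": rip.group(1) if rip else None,
        "frames": frames,
        "notes": notes,
    }
```

Calling summarize_oops(SAMPLE) returns a dict with the keys "rip", "frames" and "notes", which may be easier to compare across reports than the raw dump.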


-- Package-specific info:
** Version:
Linux version 6.1.0-13-amd64 (debian-ker...@lists.debian.org) (gcc-12 (Debian 
12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP 
PREEMPT_DYNAMIC Debian 6.1.55-1 (2023-09-29)

** Command line:
BOOT_IMAGE=/vmlinuz-6.1.0-13-amd64 
root=UUID=dc88b740-868e-4b8f-9e70-2a5d47104b70 ro ipv6.disable=1 nomodeset 
clocksource=hpet retbleed=off