Public bug reported:
== Comment: #0 - Tasmiya Nalatwad <[email protected]> - 2024-05-28
04:35:50 ==
--- Description ---
When the sosreport command is executed, the kernel crashes with an Oops and the
LPAR reboots. Because kdump was enabled, a dump was captured.
Note: This bug looks similar to Bug 206504, which was seen on z LPARs.
--- Lpar Details ---
1. PowerVM
2. FW: FW1060.00 (NH1060_026)
3. OS: Ubuntu 24.04
4. Kernel: 6.8.0-31-generic
5. Mem (free -mh): 47Gi
6. cpus: 40
--- Steps to reproduce ---
1. Run the sosreport command on the LPAR. The crash is seen when sosreport
starts capturing data.
--- Traces ---
root@ubuntulp2host:~# sosreport
Please note the 'sosreport' command has been deprecated in favor of the new
'sos' command, E.G. 'sos report'.
Redirecting to 'sos report '
sosreport (version 4.5.6)
This command will collect system configuration and diagnostic
information from this Ubuntu system.
For more information on Canonical visit:
Community Website : https://www.ubuntu.com/
Commercial Support : https://www.canonical.com
The generated archive may contain data considered sensitive and its
content should be reviewed by the originating organization before being
passed to any third party.
No changes will be made to system configuration.
Press ENTER to continue, or CTRL-C to quit.
Optionally, please enter the case id that you are generating this report
for []:
Setting up archive ...
Setting up plugins ...
[plugin:lxd] skipped command 'lxc image list': required kmods missing:
ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter,
iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat,
ip6_tables, ip6table_filter.
[plugin:lxd] skipped command 'lxc list': required kmods missing: ip6table_nat,
ip6table_raw, bpfilter, iptable_mangle, iptable_filter, iptable_raw,
ebtable_filter, ip6table_mangle, ebtables, iptable_nat, ip6_tables,
ip6table_filter.
[plugin:lxd] skipped command 'lxc network list': required kmods missing:
ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter,
iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat,
ip6_tables, ip6table_filter.
[plugin:lxd] skipped command 'lxc profile list': required kmods missing:
ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter,
iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat,
ip6_tables, ip6table_filter.
[plugin:lxd] skipped command 'lxc storage list': required kmods missing:
ip6table_nat, ip6table_raw, bpfilter, iptable_mangle, iptable_filter,
iptable_raw, ebtable_filter, ip6table_mangle, ebtables, iptable_nat,
ip6_tables, ip6table_filter.
[plugin:networking] skipped command 'ip -s macsec show': required kmods
missing: macsec. Use '--allow-system-changes' to enable collection.
[plugin:networking] skipped command 'ss -peaonmi': required kmods missing:
af_packet_diag, unix_diag, netlink_diag, udp_diag, inet_diag, tcp_diag,
xsk_diag. Use '--allow-system-changes' to enable collection.
Not all environment variables set. Source the environment file for the user
intended to connect to the OpenStack environment.
[plugin:ufw] skipped command 'ufw status numbered': required kmods missing:
bpfilter, iptable_filter.
[plugin:ufw] skipped command 'ufw app list': required kmods missing: bpfilter,
iptable_filter.
Running plugins. Please wait ...
Starting 21/75 firewall_tables [Running: cloud_init ebpf filesys
firewall_tables] [ 1057.076626] Kernel attempted to read user page (0) -
exploit attempt? (uid: 0)
[ 1057.076645] BUG: Kernel NULL pointer dereference on read at 0x00000000
[ 1057.076650] Faulting instruction address: 0xc0000000016ff114
[ 1057.076655] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1057.076659] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[ 1057.076665] Modules linked in: rpcsec_gss_krb5 xt_CHECKSUM xt_MASQUERADE
xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat
nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc rdma_ucm
ib_uverbs qrtr rdma_cm iw_cm ib_cm ib_core cfg80211 binfmt_misc kvm_hv kvm
vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace nf_tables nvme_fabrics
dm_multipath nvme_core nvme_auth sunrpc nfnetlink ip_tables x_tables autofs4
btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 nx_compress_pseries
nx_compress ibmvscsi 842_decompress ibmveth pseries_rng poly1305_p10_crypto
chacha_p10_crypto libchacha crct10dif_vpmsum crc32c_vpmsum aes_gcm_p10_crypto
[ 1057.076731] CPU: 25 PID: 6109 Comm: sosreport Kdump: loaded Not tainted
6.8.0-31-generic #31-Ubuntu
[ 1057.076737] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006
of:IBM,FW1060.00 (NH1060_026) hv:phyp pSeries
[ 1057.076743] NIP: c0000000016ff114 LR: c0000000016ff108 CTR: c0000000016ff0e0
[ 1057.076747] REGS: c000000067e63630 TRAP: 0300 Not tainted
(6.8.0-31-generic)
[ 1057.076752] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24044400
XER: 2004008c
[ 1057.076761] CFAR: c0000000016fb6c8 DAR: 0000000000000000 DSISR: 40000000
IRQMASK: 0
[ 1057.076761] GPR00: 0000000000000000 c000000067e638d0 c000000002254800
0000000000000000
[ 1057.076761] GPR04: 0000000000000000 0000000000000000 0000000000000000
0000000000000000
[ 1057.076761] GPR08: 0000000000000000 0000000000000000 c000000057a07980
c008000005d39538
[ 1057.076761] GPR12: c0000000016ff0e0 c000000c1bc8ff00 0000000000000000
0000000000000000
[ 1057.076761] GPR16: 0000000000000000 0000000000000000 0000000000000000
0000000000000000
[ 1057.076761] GPR20: c00000006751a628 0000000000000000 0000000000000000
0000000000000000
[ 1057.076761] GPR24: 0000000000000000 c00000006751a618 0000000000000000
c000000067e63a70
[ 1057.076761] GPR28: c000000067e63a98 0000000000000000 c00000006b4d9188
0000000000000000
[ 1057.076809] NIP [c0000000016ff114] mutex_lock+0x34/0x98
[ 1057.076816] LR [c0000000016ff108] mutex_lock+0x28/0x98
[ 1057.076821] Call Trace:
[ 1057.076823] [c000000067e638d0] [c0000000016ff108] mutex_lock+0x28/0x98
(unreliable)
[ 1057.076829] [c000000067e63900] [c008000005d2e480]
svc_pool_stats_start+0x48/0xf8 [sunrpc]
[ 1057.076866] [c000000067e63970] [c0000000007196a0] seq_read_iter+0x16c/0x6a4
[ 1057.076871] [c000000067e63a40] [c000000000719d00] seq_read+0x128/0x1a8
[ 1057.076875] [c000000067e63ae0] [c0000000006c8254] vfs_read+0xe4/0x3e0
[ 1057.076881] [c000000067e63b90] [c0000000006c94a0] ksys_read+0x90/0x168
[ 1057.076886] [c000000067e63be0] [c000000000033248]
system_call_exception+0xf8/0x290
[ 1057.076892] [c000000067e63e50] [c00000000000d05c]
system_call_vectored_common+0x15c/0x2ec
[ 1057.076899] --- interrupt: 3000 at 0x689080b5b504
[ 1057.076903] NIP: 0000689080b5b504 LR: 0000689080b5b504 CTR: 0000000000000000
[ 1057.076907] REGS: c000000067e63e80 TRAP: 3000 Not tainted
(6.8.0-31-generic)
[ 1057.076911] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR:
42044402 XER: 00000000
[ 1057.076922] IRQMASK: 0
[ 1057.076922] GPR00: 0000000000000003 000068907600da50 0000689080c96d00
0000000000000008
[ 1057.076922] GPR04: 000068905c014660 0000000000010000 000068907ca613c8
00006890760168e0
[ 1057.076922] GPR08: 000068907600f228 0000000000000000 0000000000000000
0000000000000000
[ 1057.076922] GPR12: 0000000000000000 00006890760168e0 0000000000000001
0000000000000000
[ 1057.076922] GPR16: 000068907c89bb50 000068907ff10968 000068907ff10978
0000689080372f7a
[ 1057.076922] GPR20: 0000689080372f78 000068907ff10938 000068907ff108f0
0000000010493180
[ 1057.076922] GPR24: 000068905c014660 0000000000000008 0000000000000000
0000000000000000
[ 1057.076922] GPR28: 000068905c014660 0000000000010000 0000000000000008
000068907600da50
[ 1057.076965] NIP [0000689080b5b504] 0x689080b5b504
[ 1057.076969] LR [0000689080b5b504] 0x689080b5b504
[ 1057.076972] --- interrupt: 3000
[ 1057.076975] Code: 38425720 7c0802a6 60000000 7c0802a6 fbe1fff8 7c7f1b78
f8010010 f821ffd1 4bffc575 60000000 39200000 e94d0908 <7d00f8a8> 7c284800
40c20010 7d40f9ad
[ 1057.076990] ---[ end trace 0000000000000000 ]---
== Comment: #1 - Tasmiya Nalatwad <[email protected]> - 2024-05-28
04:39:47 ==
Placed the dump file and dmesg file on the junebug server:
ssh [email protected]
Location of the dump file: /home/dump/dumps/206751
== Comment: #5 - Sourabh Jain <[email protected]> - 2024-05-29 09:23:29 ==
Hello Team,
Here is my observation on this issue:
The kernel crash occurs when sos reads the following file from the nfsd
filesystem:
/proc/fs/nfsd/pool_stats
This issue is also reproducible with the current upstream kernel, 6.10-rc1.
So there is nothing wrong with the sos tool; it is a kernel bug.
Here is the first bad kernel commit, which introduced this issue:
7b207ccd9833 svc: don't hold reference for poolstats, only mutex.
Here are the steps to reproduce this issue without the sos tool:
Requirements:
1. The kernel must include commit "7b207ccd9833 svc: don't hold reference for
poolstats, only mutex."
2. CONFIG_NFSD=m must be enabled
3. Mount nfsd, if not already mounted, using the
"$ mount -t nfsd nfsd /proc/fs/nfsd" command
Run the command below to reproduce the issue:
$ cat /proc/fs/nfsd/pool_stats
NOTE: the above command will crash the kernel.
Thanks,
Sourabh Jain
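The requirements listed in the comment above can be checked without reading pool_stats, which matters because reading that file is what crashes an affected kernel. The following is only a minimal sketch: the function names are hypothetical, the /boot/config-$(uname -r) default assumes the usual Ubuntu layout, and it does not check whether commit 7b207ccd9833 is present in the running kernel.

```shell
#!/bin/sh
# Sketch: check the reproducer's preconditions WITHOUT reading
# /proc/fs/nfsd/pool_stats (reading it crashes affected kernels).
# Function names are hypothetical, chosen for illustration only.

# nfsd_fs_mounted [MOUNTS_FILE]
# Succeeds if MOUNTS_FILE (default: /proc/mounts) lists a filesystem of
# type "nfsd" mounted on /proc/fs/nfsd.
nfsd_fs_mounted() {
    awk '$2 == "/proc/fs/nfsd" && $3 == "nfsd" { found = 1 }
         END { exit !found }' "${1:-/proc/mounts}"
}

# nfsd_built_as_module [CONFIG_FILE]
# Succeeds if CONFIG_FILE contains the line CONFIG_NFSD=m.
# The default path is an assumption based on the usual Ubuntu layout.
nfsd_built_as_module() {
    grep -qx 'CONFIG_NFSD=m' "${1:-/boot/config-$(uname -r)}"
}
```

For example, `nfsd_built_as_module && nfsd_fs_mounted && echo "preconditions met"` reports whether requirements 2 and 3 hold on the current system; it says nothing about requirement 1.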
== Comment: #9 - Sourabh Jain <[email protected]> - 2024-06-17 08:57:19 ==
Hello Team,
The NFSD maintainer has provided a fix.
https://lore.kernel.org/all/[email protected]/
Feel free to try the above fix.
Note: the fix is for the Linux kernel, not for the sosreport tool.
Thanks,
Sourabh Jain
== Comment: #10 - Sourabh Jain <[email protected]> - 2024-06-17 22:07:11 ==
Hello Team,
The fix has been applied to the nfsd-next kernel. It is likely to reach the
mainline kernel in the next rc.
https://lore.kernel.org/all/[email protected]/
Thanks,
Sourabh Jain
== Comment: #14 - Tasmiya Nalatwad <[email protected]> - 2024-06-25
03:38:16 ==
Team, I have tested the fix on the custom kernel "6.9.0-rc7nfsd-fix+" and the
issue is no longer reproducible.
---- uname ----
Linux ubuntulp2host 6.9.0-rc7nfsd-fix+ #2 SMP Tue Jun 25 06:49:48 UTC 2024
ppc64le ppc64le ppc64le GNU/Linux
1. sosreport is generated as expected
------------------- logs ---------------------------
Please note the 'sosreport' command has been deprecated in favor of the new
'sos' command, E.G. 'sos report'.
Redirecting to 'sos report '
sosreport (version 4.5.6)
This command will collect system configuration and diagnostic
information from this Ubuntu system.
For more information on Canonical visit:
Community Website : https://www.ubuntu.com/
Commercial Support : https://www.canonical.com
The generated archive may contain data considered sensitive and its
content should be reviewed by the originating organization before being
passed to any third party.
No changes will be made to system configuration.
Press ENTER to continue, or CTRL-C to quit.
Optionally, please enter the case id that you are generating this report for
[]:
Setting up archive ...
Setting up plugins ...
[plugin:lxd] skipped command 'lxc image list': required kmods missing:
ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter,
ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter,
iptable_raw.
[plugin:lxd] skipped command 'lxc list': required kmods missing: ip6table_raw,
iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter, ip6table_nat,
iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter, iptable_raw.
[plugin:lxd] skipped command 'lxc network list': required kmods missing:
ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter,
ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter,
iptable_raw.
[plugin:lxd] skipped command 'lxc profile list': required kmods missing:
ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter,
ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter,
iptable_raw.
[plugin:lxd] skipped command 'lxc storage list': required kmods missing:
ip6table_raw, iptable_filter, ebtables, bpfilter, iptable_nat, ebtable_filter,
ip6table_nat, iptable_mangle, ip6table_mangle, ip6_tables, ip6table_filter,
iptable_raw.
[plugin:networking] skipped command 'ip -s macsec show': required kmods
missing: macsec. Use '--allow-system-changes' to enable collection.
[plugin:networking] skipped command 'ss -peaonmi': required kmods missing:
unix_diag, xsk_diag, af_packet_diag, tcp_diag, udp_diag, netlink_diag,
inet_diag. Use '--allow-system-changes' to enable collection.
Not all environment variables set. Source the environment file for the user
intended to connect to the OpenStack environment.
[plugin:ufw] skipped command 'ufw status numbered': required kmods missing:
bpfilter, iptable_filter.
[plugin:ufw] skipped command 'ufw app list': required kmods missing: bpfilter,
iptable_filter.
Running plugins. Please wait ...
Finishing plugins [Running: logs]
Finished running plugins
Creating compressed archive...
Your sosreport has been generated and saved in:
/tmp/sosreport-ubuntulp2host-2024-06-25-cussrcx.tar.xz
Size 5.99MiB
Owner root
sha256 192c04e45142382038adb223d6dc4aa95edc8edf5d37a576cdd2912e71cdd98b
Please send this file to your support representative.
2. As mentioned by Sourabh in the comments above, the command below no longer
causes a crash/Oops.
cat /proc/fs/nfsd/pool_stats
# pool packets-arrived sockets-enqueued threads-woken threads-timedout
0 0 2 0 0
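The pool_stats check above could be scripted along these lines on a kernel that already carries the fix. This is a hedged sketch, not part of the original verification: the function name is hypothetical, and the expected layout (a "# pool ..." header followed by rows of five numeric fields) is assumed from the output shown in this report and may differ on other kernel versions.

```shell
#!/bin/sh
# Sketch: validate pool_stats output that was saved to a file.
# Only capture the file on a kernel that has the fix; reading
# /proc/fs/nfsd/pool_stats on an affected kernel crashes it.

# pool_stats_ok FILE  (hypothetical helper name)
# Succeeds if FILE starts with a "# pool ..." header and is followed by
# at least one row of exactly five non-negative integers.
pool_stats_ok() {
    awk '
        NR == 1 {
            if ($1 != "#" || $2 != "pool") { bad = 1; exit }
            next
        }
        {
            if (NF != 5) { bad = 1; exit }
            for (i = 1; i <= NF; i++)
                if ($i !~ /^[0-9]+$/) { bad = 1; exit }
            rows++
        }
        END { exit (bad || rows == 0) }
    ' "$1"
}
```

A possible use on a fixed kernel: `cat /proc/fs/nfsd/pool_stats > /tmp/pool_stats.txt && pool_stats_ok /tmp/pool_stats.txt && echo "pool_stats readable"`.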
** Affects: linux (Ubuntu)
Importance: Undecided
Status: New
** Affects: sosreport (Ubuntu)
Importance: Undecided
Assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
Status: New
** Tags: architecture-ppc64le bugnameltc-206751 severity-high
targetmilestone-inin---
** Tags added: architecture-ppc64le bugnameltc-206751 severity-high
targetmilestone-inin---
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2070358
Title:
[Ubuntu 24.04] FW1060.00 (NH1060_026) sosreport is running to Kernel
OOPS crash