Thanks for testing!
I will SRU this to our Disco kernel.
https://lists.ubuntu.com/archives/kernel-team/2020-January/106822.html
** Package changed: linux-hwe (Ubuntu) => linux (Ubuntu)
** Also affects: linux (Ubuntu Disco)
Importance: Undecided
Status: New
** Changed in: linux (Ubuntu)
Status: Incomplete => Fix Released
** Changed in: linux (Ubuntu Disco)
Status: New => In Progress
** Changed in: linux (Ubuntu Disco)
Assignee: (unassigned) => Po-Hsu Lin (cypressyew)
** Tags added: disco
** Description changed:
+ == SRU Justification ==
+ The xdr_shrink_pagelen() call added in commit 5f1bc39 (SUNRPC: Fix buffer
+ handling of GSS MIC without slack), which was applied to the Disco tree
+ via the stable update process, sometimes triggers the following kernel
+ BUG when the number of bytes to remove from buf->pages is larger than
+ buf->page_len:
+
+ [ 49.420081] ------------[ cut here ]------------
+ [ 49.420084] kernel BUG at /build/linux-hwe-FLYqTt/linux-hwe-5.0.0/net/sunrpc/xdr.c:434!
+ [ 49.420092] invalid opcode: 0000 [#1] SMP NOPTI
+ [ 49.420095] CPU: 16 PID: 469 Comm: kworker/u64:13 Tainted: P OE 5.0.0-37-generic #40~18.04.1-Ubuntu
+ [ 49.420096] Hardware name: System manufacturer System Product Name/ROG CROSSHAIR VII HERO (WI-FI), BIOS 3004 12/16/2019
+ [ 49.420109] Workqueue: rpciod rpc_async_schedule [sunrpc]
+ [ 49.420123] RIP: 0010:xdr_shrink_pagelen+0x9e/0xa0 [sunrpc]
+ [ 49.420124] Code: 29 ea e8 85 f4 ff ff 44 8b 63 34 8b 43 3c 45 29 ec 44 29 e8 3b 43 40 44 89 63 34 89 43 3c 73 03 89 43 40 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 1f 44 00 00 4c 8d 54 24 08 48 83 e4 f0 b9 04 00 00 00 41
+ [ 49.420126] RSP: 0018:ffffb93787be7b38 EFLAGS: 00010287
+ [ 49.420128] RAX: 000000000000000c RBX: 000000000000006c RCX: 000000000000001c
+ [ 49.420129] RDX: 000000000000005c RSI: 0000000000000010 RDI: ffff8e1a87c56e50
+ [ 49.420130] RBP: ffffb93787be7b50 R08: ffff8e1b06999700 R09: 0000000000000000
+ [ 49.420131] R10: 00000000ffffffff R11: ffff8e1b0ecd1cd0 R12: ffff8e1a87c56e50
+ [ 49.420132] R13: ffffb93787be7c00 R14: 0000000000000058 R15: ffffffffc228e8c0
+ [ 49.420134] FS: 0000000000000000(0000) GS:ffff8e1b1ea00000(0000) knlGS:0000000000000000
+ [ 49.420135] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
+ [ 49.420136] CR2: 00007ffa1faeb000 CR3: 0000000f19abe000 CR4: 0000000000340ee0
+ [ 49.420137] Call Trace:
+ [ 49.420150] xdr_buf_read_netobj+0x122/0x180 [sunrpc]
+ [ 49.420154] ? kzfree+0x2d/0x40
+ [ 49.420158] ? crypto_destroy_tfm+0x73/0xb0
+ [ 49.420162] gss_unwrap_resp_integ.isra.11+0x9c/0x100 [auth_rpcgss]
+ [ 49.420164] ? gss_unwrap_resp_integ.isra.11+0x9c/0x100 [auth_rpcgss]
+ [ 49.420167] gss_unwrap_resp+0x13c/0x280 [auth_rpcgss]
+ [ 49.420170] ? gss_unwrap_resp+0x13c/0x280 [auth_rpcgss]
+ [ 49.420172] ? gss_validate+0x242/0x300 [auth_rpcgss]
+ [ 49.420184] ? nfs4_xdr_dec_readdir+0x100/0x100 [nfsv4]
+ [ 49.420194] rpcauth_unwrap_resp+0x67/0xe0 [sunrpc]
+ [ 49.420204] ? nfs4_xdr_dec_readdir+0x100/0x100 [nfsv4]
+ [ 49.420213] call_decode+0x1c4/0x880 [sunrpc]
+ [ 49.420216] ? __switch_to_asm+0x35/0x70
+ [ 49.420224] ? rpc_check_timeout+0x130/0x130 [sunrpc]
+ [ 49.420233] __rpc_execute+0x7a/0x3f0 [sunrpc]
+ [ 49.420242] rpc_async_schedule+0x12/0x20 [sunrpc]
+ [ 49.420245] process_one_work+0x1fd/0x400
+ [ 49.420247] worker_thread+0x34/0x410
+ [ 49.420249] kthread+0x121/0x140
+ [ 49.420250] ? process_one_work+0x400/0x400
+ [ 49.420252] ? kthread_park+0xb0/0xb0
+ [ 49.420254] ret_from_fork+0x22/0x40
+
+ == Fixes ==
+ * e8d70b32 (SUNRPC: Fix another issue with MIC buffer space)
+ Instead of calling BUG_ON, this patch will just cap the number of bytes
+ that xdr_shrink_pagelen() will move.
+
+ Only the Disco kernel needs this patch: Bionic and earlier do not have
+ 5f1bc39, and this fix has already been applied to Eoan and onward.
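
To illustrate the difference between asserting on the length and capping it, here is a minimal user-space sketch in C. It is not the kernel code: `struct toy_xdr_buf`, `toy_shrink_strict()` and `toy_shrink_capped()` are hypothetical names invented for this example, the buffer is reduced to two byte counters, and the BUG_ON() is modelled as a -1 return value instead of a crash.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for the kernel's struct xdr_buf:
 * only the two byte counts needed for this illustration are kept. */
struct toy_xdr_buf {
    size_t page_len; /* bytes currently held in the page data */
    size_t tail_len; /* bytes currently held in the tail */
};

/* Modelled pre-fix behaviour: a request larger than page_len is treated as
 * a violated precondition. The kernel hit BUG_ON() at this point; this toy
 * version reports it by returning -1 and leaving the buffer untouched. */
int toy_shrink_strict(struct toy_xdr_buf *buf, size_t len)
{
    if (len > buf->page_len)
        return -1;
    buf->page_len -= len;
    buf->tail_len += len;
    return 0;
}

/* Modelled post-fix behaviour: cap the request at page_len, move that many
 * bytes from the pages to the tail, and return the number actually moved. */
size_t toy_shrink_capped(struct toy_xdr_buf *buf, size_t len)
{
    if (len > buf->page_len)
        len = buf->page_len;
    buf->page_len -= len;
    buf->tail_len += len;
    return len;
}
```

With page_len set to 12 and a request of 16 bytes, the strict variant refuses the request, while the capped variant moves the 12 available bytes and returns 12. The numbers are made up for the example and are not taken from the trace above.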
+
+ == Test ==
+ Test kernel can be found here:
+ https://people.canonical.com/~phlin/kernel/lp-1858832-sunrpc-bufferhandling/
+
+ It has been stress-tested by the bug reporter, Michael, and the issue
+ can no longer be reproduced.
+
+ == Regression Potential ==
+ Low. The patch only caps the number of bytes to shrink; the change is
+ limited to a single driver and has a positive test result.
+
+
+ == Original Bug Report ==
RELEASE=19.3
CODENAME=tricia
EDITION="Cinnamon"
DESCRIPTION="Linux Mint 19.3 Tricia"
DESKTOP=Gnome
TOOLKIT=GTK
NEW_FEATURES_URL=https://www.linuxmint.com/rel_tricia_cinnamon_whatsnew.php
RELEASE_NOTES_URL=https://www.linuxmint.com/rel_tricia_cinnamon.php
USER_GUIDE_URL=https://www.linuxmint.com/documentation.php
GRUB_TITLE=Linux Mint 19.3 Cinnamon
My home dir is mounted over NFS from a local server via nfs4 and krb5i.
When stressing the mounted directory or its sub-directories (sometimes
starting firefox, sometimes starting thunderbird, nearly guaranteed when
compiling, sometimes the login itself), it will eventually lead to the
following stack trace. The corresponding process is then stuck, and
accessing the mounted directory (like calling ls) easily yields further,
similar stack traces and causes that process to get stuck as well.
Currently I am running an AMD 3950x on an ASUS Crosshair VII Hero Wifi
(chipset x470), but I had the same issues with an Intel 6700K on an ASUS
Crosshair VIII Hero in fall of 2019. I couldn't be bothered back then to
report the bug, so I just kept running a working kernel (~5.0.0-15 I
think) without updating it. After Christmas I replaced said Intel machine
with the AMD machine, re-installed Linux Mint, installed all updates and
therefore ran into this issue again.
[ 49.420081] ------------[ cut here ]------------
[ 49.420084] kernel BUG at
/build/linux-hwe-FLYqTt/linux-hwe-5.0.0/net/sunrpc/xdr.c:434!
[ 49.420092] invalid opcode: 0000 [#1] SMP NOPTI
[ 49.420095] CPU: 16 PID: 469 Comm: kworker/u64:13 Tainted: P OE
5.0.0-37-generic #40~18.04.1-Ubuntu
[ 49.420096] Hardware name: System manufacturer System Product Name/ROG
CROSSHAIR VII HERO (WI-FI), BIOS 3004 12/16/2019
[ 49.420109] Workqueue: rpciod rpc_async_schedule [sunrpc]
[ 49.420123] RIP: 0010:xdr_shrink_pagelen+0x9e/0xa0 [sunrpc]
[ 49.420124] Code: 29 ea e8 85 f4 ff ff 44 8b 63 34 8b 43 3c 45 29 ec 44 29
e8 3b 43 40 44 89 63 34 89 43 3c 73 03 89 43 40 5b 41 5c 41 5d 5d c3 <0f> 0b 0f
1f 44 00 00 4c 8d 54 24 08 48 83 e4 f0 b9 04 00 00 00 41
[ 49.420126] RSP: 0018:ffffb93787be7b38 EFLAGS: 00010287
[ 49.420128] RAX: 000000000000000c RBX: 000000000000006c RCX:
000000000000001c
[ 49.420129] RDX: 000000000000005c RSI: 0000000000000010 RDI:
ffff8e1a87c56e50
[ 49.420130] RBP: ffffb93787be7b50 R08: ffff8e1b06999700 R09:
0000000000000000
[ 49.420131] R10: 00000000ffffffff R11: ffff8e1b0ecd1cd0 R12:
ffff8e1a87c56e50
[ 49.420132] R13: ffffb93787be7c00 R14: 0000000000000058 R15:
ffffffffc228e8c0
[ 49.420134] FS: 0000000000000000(0000) GS:ffff8e1b1ea00000(0000)
knlGS:0000000000000000
[ 49.420135] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 49.420136] CR2: 00007ffa1faeb000 CR3: 0000000f19abe000 CR4:
0000000000340ee0
[ 49.420137] Call Trace:
[ 49.420150] xdr_buf_read_netobj+0x122/0x180 [sunrpc]
[ 49.420154] ? kzfree+0x2d/0x40
[ 49.420158] ? crypto_destroy_tfm+0x73/0xb0
[ 49.420162] gss_unwrap_resp_integ.isra.11+0x9c/0x100 [auth_rpcgss]
[ 49.420164] ? gss_unwrap_resp_integ.isra.11+0x9c/0x100 [auth_rpcgss]
[ 49.420167] gss_unwrap_resp+0x13c/0x280 [auth_rpcgss]
[ 49.420170] ? gss_unwrap_resp+0x13c/0x280 [auth_rpcgss]
[ 49.420172] ? gss_validate+0x242/0x300 [auth_rpcgss]
[ 49.420184] ? nfs4_xdr_dec_readdir+0x100/0x100 [nfsv4]
[ 49.420194] rpcauth_unwrap_resp+0x67/0xe0 [sunrpc]
[ 49.420204] ? nfs4_xdr_dec_readdir+0x100/0x100 [nfsv4]
[ 49.420213] call_decode+0x1c4/0x880 [sunrpc]
[ 49.420216] ? __switch_to_asm+0x35/0x70
[ 49.420224] ? rpc_check_timeout+0x130/0x130 [sunrpc]
[ 49.420233] __rpc_execute+0x7a/0x3f0 [sunrpc]
[ 49.420242] rpc_async_schedule+0x12/0x20 [sunrpc]
[ 49.420245] process_one_work+0x1fd/0x400
[ 49.420247] worker_thread+0x34/0x410
[ 49.420249] kthread+0x121/0x140
[ 49.420250] ? process_one_work+0x400/0x400
[ 49.420252] ? kthread_park+0xb0/0xb0
[ 49.420254] ret_from_fork+0x22/0x40
[ 49.420255] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd
grace fscache edac_mce_amd snd_hda_codec_hdmi joydev kvm hid_roccat_koneplus
hid_roccat irqbypass hid_roccat_common nvidia_uvm(OE) nvidia_drm(POE)
nvidia_modeset(POE) snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio
snd_hda_codec_ca0132 snd_hda_intel snd_usb_audio snd_hda_codec snd_usbmidi_lib
snd_hda_core crct10dif_pclmul snd_hwdep crc32_pclmul snd_seq_midi snd_pcm
nvidia(POE) ghash_clmulni_intel snd_seq_midi_event eeepc_wmi aesni_intel
snd_rawmidi asus_wmi sparse_keymap aes_x86_64 crypto_simd cryptd video
glue_helper snd_seq drm_kms_helper snd_seq_device mxm_wmi wmi_bmof input_leds
drm snd_timer ipmi_devintf snd serio_raw ccp ipmi_msghandler fb_sys_fops
syscopyarea sysfillrect sysimgblt soundcore k10temp mac_hid sch_fq_codel
asus_wmi_sensors(OE) parport_pc sunrpc ppdev lp parport ip_tables x_tables
autofs4 btrfs xor zstd_compress raid6_pq libcrc32c dm_mirror dm_region_hash
dm_log hid_plantronics
[ 49.420282] hid_generic usbhid hid igb i2c_piix4 nvme dca ahci
i2c_algo_bit nvme_core libahci gpio_amdpt wmi gpio_generic
[ 49.420293] ---[ end trace 75bda976d7f1c02d ]---
[ 49.420305] RIP: 0010:xdr_shrink_pagelen+0x9e/0xa0 [sunrpc]
[ 49.420306] Code: 29 ea e8 85 f4 ff ff 44 8b 63 34 8b 43 3c 45 29 ec 44 29
e8 3b 43 40 44 89 63 34 89 43 3c 73 03 89 43 40 5b 41 5c 41 5d 5d c3 <0f> 0b 0f
1f 44 00 00 4c 8d 54 24 08 48 83 e4 f0 b9 04 00 00 00 41
[ 49.420307] RSP: 0018:ffffb93787be7b38 EFLAGS: 00010287
[ 49.420309] RAX: 000000000000000c RBX: 000000000000006c RCX:
000000000000001c
[ 49.420310] RDX: 000000000000005c RSI: 0000000000000010 RDI:
ffff8e1a87c56e50
[ 49.420311] RBP: ffffb93787be7b50 R08: ffff8e1b06999700 R09:
0000000000000000
[ 49.420312] R10: 00000000ffffffff R11: ffff8e1b0ecd1cd0 R12:
ffff8e1a87c56e50
[ 49.420312] R13: ffffb93787be7c00 R14: 0000000000000058 R15:
ffffffffc228e8c0
[ 49.420314] FS: 0000000000000000(0000) GS:ffff8e1b1ea00000(0000)
knlGS:0000000000000000
[ 49.420315] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 49.420316] CR2: 00007ffa1faeb000 CR3: 0000000f19abe000 CR4:
0000000000340ee0
.
[Jan 1 03:45] ------------[ cut here ]------------
[ +0,000002] kernel BUG at
/build/linux-hwe-W9CF8Q/linux-hwe-5.0.0/net/sunrpc/xdr.c:434!
[ +0,000006] invalid opcode: 0000 [#1] SMP NOPTI
[ +0,000002] CPU: 4 PID: 28219 Comm: kworker/u64:2 Tainted: P OE
5.0.0-35-generic #38~18.04.1-Ubuntu
[ +0,000001] Hardware name: System manufacturer System Product Name/ROG
CROSSHAIR VII HERO (WI-FI), BIOS 3004 12/16/2019
[ +0,000011] Workqueue: rpciod rpc_async_schedule [sunrpc]
[ +0,000010] RIP: 0010:xdr_shrink_pagelen+0x9e/0xa0 [sunrpc]
[ +0,000001] Code: 29 ea e8 85 f4 ff ff 44 8b 63 34 8b 43 3c 45 29 ec 44 29
e8 3b 43 40 44 89 63 34 89 43 3c 73 03 89 43 40 5b 41 5c 41 5d 5d c3 <0f> 0b 0f
1f 44 00 00 4c 8d 54 24 08 48 83 e4 f0 b9 04 00 00 00 41
[ +0,000001] RSP: 0018:ffffa2dd18117b28 EFLAGS: 00010297
[ +0,000001] RAX: 0000000000000010 RBX: 0000000000000070 RCX:
000000000000001c
[ +0,000001] RDX: 000000000000005c RSI: 0000000000000014 RDI:
ffff8b96c0856650
[ +0,000001] RBP: ffffa2dd18117b40 R08: ffff8b97d1f82e00 R09:
0000000000000000
[ +0,000000] R10: 1d1cc51b00000000 R11: ffff8b97cf00e520 R12:
ffff8b96c0856650
[ +0,000001] R13: ffffa2dd18117bf0 R14: 0000000000000058 R15:
ffffffffc0eb8920
[ +0,000001] FS: 0000000000000000(0000) GS:ffff8b97de700000(0000)
knlGS:0000000000000000
[ +0,000001] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0,000001] CR2: 0000191e985bac88 CR3: 0000000fd656c000 CR4:
0000000000340ee0
[ +0,000001] Call Trace:
[ +0,000009] xdr_buf_read_netobj+0x122/0x180 [sunrpc]
[ +0,000003] ? kzfree+0x2d/0x40
[ +0,000002] ? crypto_destroy_tfm+0x73/0xb0
[ +0,000003] gss_unwrap_resp_integ.isra.11+0x9c/0x100 [auth_rpcgss]
[ +0,000002] ? gss_unwrap_resp_integ.isra.11+0x9c/0x100 [auth_rpcgss]
[ +0,000002] gss_unwrap_resp+0x13c/0x280 [auth_rpcgss]
[ +0,000002] ? kmem_cache_alloc_trace+0x42/0x1c0
[ +0,000002] ? gss_unwrap_resp+0x13c/0x280 [auth_rpcgss]
[ +0,000002] ? gss_validate+0x242/0x300 [auth_rpcgss]
[ +0,000008] ? nfs4_xdr_dec_readdir+0x100/0x100 [nfsv4]
[ +0,000008] rpcauth_unwrap_resp+0x67/0xe0 [sunrpc]
[ +0,000007] ? nfs4_xdr_dec_readdir+0x100/0x100 [nfsv4]
[ +0,000007] call_decode+0x166/0x8b0 [sunrpc]
[ +0,000002] ? __switch_to_asm+0x41/0x70
[ +0,000006] ? call_refreshresult+0x130/0x130 [sunrpc]
[ +0,000006] __rpc_execute+0x7a/0x3f0 [sunrpc]
[ +0,000007] rpc_async_schedule+0x12/0x20 [sunrpc]
[ +0,000002] process_one_work+0x1fd/0x400
[ +0,000002] worker_thread+0x34/0x410
[ +0,000001] kthread+0x121/0x140
[ +0,000001] ? process_one_work+0x400/0x400
[ +0,000002] ? kthread_park+0xb0/0xb0
[ +0,000001] ret_from_fork+0x22/0x40
[ +0,000001] Modules linked in: nls_utf8 udf crc_itu_t rpcsec_gss_krb5
auth_rpcgss nfsv4 nfs lockd grace fscache edac_mce_amd snd_hda_codec_hdmi kvm
irqbypass joydev crct10dif_pclmul nvidia_uvm(OE) crc32_pclmul
hid_roccat_koneplus nvidia_drm(POE) hid_roccat ghash_clmulni_intel
hid_roccat_common nvidia_modeset(POE) nvidia(POE) snd_usb_audio
snd_hda_codec_realtek
snd_usbmidi_lib snd_hda_codec_generic ledtrig_audio snd_hda_codec_ca0132
aesni_intel input_leds snd_hda_intel eeepc_wmi snd_hda_codec asus_wmi
aes_x86_64 drm_kms_helper crypto_simd snd_hda_core snd_seq_midi cryptd
sparse_keymap snd_hwdep snd_seq_midi_event video glue_helper wmi_bmof mxm_wmi
serio_raw drm snd_rawmidi snd_pcm ipmi_devintf ipmi_msghandler snd_seq
fb_sys_fops syscopyarea sysfillrect snd_seq_device sysimgblt snd_timer
k10temp ccp snd soundcore mac_hid sch_fq_codel asus_wmi_sensors(OE) parport_pc
ppdev sunrpc lp parport ip_tables x_tables autofs4 btrfs xor zstd_compress
raid6_pq libcrc32c dm_mirror dm_region_hash dm_log
[ +0,000019] hid_plantronics hid_generic usbhid hid igb i2c_piix4 dca
i2c_algo_bit ahci nvme libahci nvme_core wmi gpio_amdpt gpio_generic
[ +0,000008] ---[ end trace 4314523bc923f697 ]---
[ +0,000007] RIP: 0010:xdr_shrink_pagelen+0x9e/0xa0 [sunrpc]
[ +0,000001] Code: 29 ea e8 85 f4 ff ff 44 8b 63 34 8b 43 3c 45 29 ec 44 29
e8 3b 43 40 44 89 63 34 89 43 3c 73 03 89 43 40 5b 41 5c 41 5d 5d c3 <0f> 0b 0f
1f 44 00 00 4c 8d 54 24 08 48 83 e4 f0 b9 04 00 00 00 41
[ +0,000001] RSP: 0018:ffffa2dd18117b28 EFLAGS: 00010297
[ +0,000001] RAX: 0000000000000010 RBX: 0000000000000070 RCX:
000000000000001c
[ +0,000001] RDX: 000000000000005c RSI: 0000000000000014 RDI:
ffff8b96c0856650
[ +0,000000] RBP: ffffa2dd18117b40 R08: ffff8b97d1f82e00 R09:
0000000000000000
[ +0,000001] R10: 1d1cc51b00000000 R11: ffff8b97cf00e520 R12:
ffff8b96c0856650
[ +0,000001] R13: ffffa2dd18117bf0 R14: 0000000000000058 R15:
ffffffffc0eb8920
[ +0,000001] FS: 0000000000000000(0000) GS:ffff8b97de700000(0000)
knlGS:0000000000000000
[ +0,000001] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0,000000] CR2: 0000191e985bac88 CR3: 0000000fd656c000 CR4:
0000000000340ee0
.
With a little compile-stress-test, I have tested the following kernels which
seem to run fine:
* 4.15.0-69
* 4.15.0-70
* 4.15.0-72
* 5.0.0-32 (current daily driver, runs without a hassle, max test length 2d
4h 33m - I am writing this bug report on it)
But the following kernels are not stable:
* 5.0.0-35 (second stack trace from above)
* 5.0.0-37 (first stack trace from above; as you can see, it already throws
the error 49s after boot)
* 5.3.0-24
$ lspci | grep -i ether
06:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network
Connection (rev 03)
$ mount | grep filer
filer:/ on /share type nfs4
(rw,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=krb5i,clientaddr=192.168.3.55,local_lock=none,addr=192.168.2.33)
filer:/home/michael on /share/home/michael type nfs4
(rw,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=krb5i,clientaddr=192.168.3.55,local_lock=none,addr=192.168.2.33)
$ cat /etc/fstab | grep -i filer
filer:/ /share/ nfs4
nfsvers=4,sec=krb5i,rw,x-systemd.automount,soft,intr,tcp,noatime 0 0
--
https://bugs.launchpad.net/bugs/1858832
Title:
invalid opcode xdr_buf_read_netobj on nfs4+krb5i directory