Bug#1014793: linux-image-5.10.0-16-amd64: Kernel crashes while serving NFS
Hi Arne,

On Fri, Jul 22, 2022 at 10:18:56AM +0200, Arne Nordmark wrote:
> On 2022-07-15 at 21:58, Salvatore Bonaccorso wrote:
> > I would be interested to either pinpoint the regressing commit
> > upstream between 5.10.120 and 5.10.127 or conversely the fixing commit
> > between 5.10.127 upstream and 5.10.130 where you are not able anymore
> > to reproduce the error. What I can say, I have already imported
> > 5.10.130 for future upload (cf.
> > https://salsa.debian.org/kernel-team/linux/-/merge_requests/506).
>
> Bisection for the regression proved too hard.
>
> Bisection for the fix went better, I can get a crash with 5.10.128-00010 but
> not yet with 5.10.128-00011. This indicates that the fixing commit was
> probably:
>
> commit 6a0b9512a6aa7b7835d8138f5ffdcb4789c093d4
> Author: Chuck Lever
> Date: Thu Jun 30 16:48:18 2022 -0400
>
>     SUNRPC: Fix READ_PLUS crasher
>
> which indeed seems to touch code involved in NFS service.
>
> Consequently, the breaking commit was probably:
>
> 6c254bf3b637 ("SUNRPC: Fix the calculation of xdr->end in
> xdr_get_next_encode_buffer()")

Thank you and apologies for the delay! The next upload for
bullseye(-security) will contain that fix, and I will close the bug
with that upload.

Thanks a lot for your investigative work and bisection to the fix!

Regards,
Salvatore
Bug#1014793: linux-image-5.10.0-16-amd64: Kernel crashes while serving NFS
On 2022-07-15 at 21:58, Salvatore Bonaccorso wrote:
> I would be interested to either pinpoint the regressing commit
> upstream between 5.10.120 and 5.10.127 or conversely the fixing commit
> between 5.10.127 upstream and 5.10.130 where you are not able anymore
> to reproduce the error. What I can say, I have already imported
> 5.10.130 for future upload (cf.
> https://salsa.debian.org/kernel-team/linux/-/merge_requests/506).

Bisection for the regression proved too hard.

Bisection for the fix went better: I can get a crash with 5.10.128-00010 but
not yet with 5.10.128-00011. This indicates that the fixing commit was
probably:

commit 6a0b9512a6aa7b7835d8138f5ffdcb4789c093d4
Author: Chuck Lever
Date: Thu Jun 30 16:48:18 2022 -0400

    SUNRPC: Fix READ_PLUS crasher

which indeed seems to touch code involved in NFS service.

Consequently, the breaking commit was probably:

6c254bf3b637 ("SUNRPC: Fix the calculation of xdr->end in
xdr_get_next_encode_buffer()")

> > Bisection would be a new experience for me, even compiling the kernel
> > seems like ages ago ... (using Debian since 0.93R6).
>
> Would the following help?
>
> https://wiki.debian.org/DebianKernel/GitBisect
>
> Do you need any more specific help to get it rolling?

That was indeed helpful.

> Regards,
> Salvatore

Thanks
Arne
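The search for the fixing commit described above can be sketched as a binary search over an ordered list of commits. This is only an illustration of the idea behind bisecting for a fix, not the procedure actually used in this thread: the function name `first_fixed`, the `crashes` callback, and the commit labels are all hypothetical, and the sketch assumes that every commit before the fix crashes and every commit from the fix onward does not.

```python
from typing import Callable, Sequence


def first_fixed(commits: Sequence[str], crashes: Callable[[str], bool]) -> str:
    """Return the first commit on which the crash no longer reproduces.

    Assumptions (hypothetical, for illustration only):
    - commits are ordered oldest to newest,
    - commits[0] is known to crash and commits[-1] is known not to,
    - once the fix is applied, no later commit crashes.
    """
    lo, hi = 0, len(commits) - 1  # lo: known crashing, hi: known fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes(commits[mid]):
            lo = mid  # still broken: the fix is later
        else:
            hi = mid  # already fixed: the fix is here or earlier
    return commits[hi]


# Hypothetical labels in the style of the kernel versions quoted above.
commits = [f"5.10.128-{n:05d}" for n in range(13)]


def crashes(label: str) -> bool:
    # Stand-in for "boot this kernel and try to reproduce the NFS crash".
    return int(label.rsplit("-", 1)[1]) <= 10


result = first_fixed(commits, crashes)
print(result)
```

With real git history, `git bisect start --term-old=broken --term-new=fixed` lets the same search be driven with terms that match a hunt for a fix rather than for a regression.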
Bug#1014793: linux-image-5.10.0-16-amd64: Kernel crashes while serving NFS
Hi Arne,

Thanks a lot for the time you put into debugging the issue.

On Fri, Jul 15, 2022 at 10:28:17AM +0200, Arne Nordmark wrote:
> Sorry for the late reply.
>
> On 2022-07-13 at 12:07, Salvatore Bonaccorso wrote:
> > Control: tags -1 + moreinfo
> >
> > Hello Arne,
> >
> > ...
> >
> > As you seem to reliably reproduce the issue, do you have the
> > possibility (on the non-production instance) to try to bisect down the
> > problem? In addition to the bisect, on a test instance where the issue
> > is reproducible, can you run a self-compiled 5.10.130 upstream to see
> > if the problem is still present?
>
> I have now set up a test environment, and been able to reproduce NFS crashes
> with the Debian linux-image-5.10.0-16-amd64 and self-compiled upstream
> v5.10.127 kernels.

That's great. I have not yet reached the point of replicating it
myself. But it's good that you now have a base test environment where
it's safe to experiment.

> I have not been able to get a self-compiled upstream v5.10.130 to crash.

That is good news.

> As for bisection, I am not entirely clear what is expected from me. Do you
> mean bisect the upstream kernels? Between which points? v5.10.120 to
> v5.10.127?

I would be interested to either pinpoint the regressing commit
upstream between 5.10.120 and 5.10.127 or conversely the fixing commit
between 5.10.127 upstream and 5.10.130 where you are not able anymore
to reproduce the error. What I can say, I have already imported
5.10.130 for future upload (cf.
https://salsa.debian.org/kernel-team/linux/-/merge_requests/506).

> Bisection would be a new experience for me, even compiling the kernel seems
> like ages ago ... (using Debian since 0.93R6).

Would the following help?

https://wiki.debian.org/DebianKernel/GitBisect

Do you need any more specific help to get it rolling?

Regards,
Salvatore
Bug#1014793: linux-image-5.10.0-16-amd64: Kernel crashes while serving NFS
Sorry for the late reply.

On 2022-07-13 at 12:07, Salvatore Bonaccorso wrote:
> Control: tags -1 + moreinfo
>
> Hello Arne,
>
> ...
>
> As you seem to reliably reproduce the issue, do you have the
> possibility (on the non-production instance) to try to bisect down the
> problem? In addition to the bisect, on a test instance where the issue
> is reproducible, can you run a self-compiled 5.10.130 upstream to see
> if the problem is still present?

I have now set up a test environment, and been able to reproduce NFS crashes
with the Debian linux-image-5.10.0-16-amd64 and self-compiled upstream
v5.10.127 kernels.

I have not been able to get a self-compiled upstream v5.10.130 to crash.

As for bisection, I am not entirely clear what is expected from me. Do you
mean bisect the upstream kernels? Between which points? v5.10.120 to
v5.10.127?

Bisection would be a new experience for me, even compiling the kernel seems
like ages ago ... (using Debian since 0.93R6).

> Regards,
> Salvatore

Thanks again,
Arne
Bug#1014793: linux-image-5.10.0-16-amd64: Kernel crashes while serving NFS
Control: tags -1 + moreinfo

Hello Arne,

On Tue, Jul 12, 2022 at 08:14:22AM +0200, Arne Nordmark wrote:
> Package: src:linux
> Version: 5.10.127-1
> Severity: normal
>
> Dear Maintainer,
>
> The new kernel in Debian 11.4 seems unstable and crashes when serving NFS.
> On two different computers, these lockups happen within minutes, typically
> when a client runs firefox on an NFS-mounted home directory. Typically the
> servers lock up without any printout, but on one occasion, the following was
> logged:
>
> jul 10 08:35:13 ano4 kernel: general protection fault, probably for non-canonical address 0x2f48514544455145: [#1] SMP PTI
> jul 10 08:35:13 ano4 kernel: CPU: 2 PID: 1244 Comm: nfsd Not tainted 5.10.0-16-amd64 #1 Debian 5.10.127-1
> jul 10 08:35:13 ano4 kernel: Hardware name: System manufacturer System Product Name/P5Q DELUXE, BIOS 220105/21/2009
> jul 10 08:35:13 ano4 kernel: RIP: 0010:fsnotify+0x2d9/0x570
> jul 10 08:35:13 ano4 kernel: Code: 78 08 44 0b 30 44 0b 68 40 48 83 c1 01 48 83 f9 04 75 d9 66 66 66 66 90 44 8b 4c 24 1c 44 89 e8 f7 d0 45 21 f1 41 85 c1 74 4f <49> 8b 3f 48 8b 07 48 85 c0 0f 84 0a 01 00 00 48 8d 7c 24 38 44 89
> jul 10 08:35:13 ano4 kernel: RSP: 0018:abe901fa3bc8 EFLAGS: 00010202
> jul 10 08:35:13 ano4 kernel: RAX: bab6aebe RBX: 0001 RCX: 0004
> jul 10 08:35:13 ano4 kernel: RDX: 00035a00 RSI: 0001 RDI: 2f48514544455145
> jul 10 08:35:13 ano4 kernel: RBP: abe901fa3c20 R08: 0001 R09: 0002
> jul 10 08:35:13 ano4 kernel: R10: 0002 R11: 0002 R12: 0002
> jul 10 08:35:13 ano4 kernel: R13: 45495141 R14: 424d6757 R15: 2f48514544455145
> jul 10 08:35:13 ano4 kernel: FS: () GS:939527d0() knlGS:
> jul 10 08:35:13 ano4 kernel: CS: 0010 DS: ES: CR0: 80050033
> jul 10 08:35:13 ano4 kernel: CR2: 560b8cee4000 CR3: 0001034da000 CR4: 000406e0
> jul 10 08:35:13 ano4 kernel: Call Trace:
> jul 10 08:35:13 ano4 kernel: __fsnotify_parent+0xe7/0x2d0
> jul 10 08:35:13 ano4 kernel: ? ext4_buffered_write_iter+0xce/0x160 [ext4]
> jul 10 08:35:13 ano4 kernel: ? do_iter_readv_writev+0x152/0x1b0
> jul 10 08:35:13 ano4 kernel: do_iter_write+0xc8/0x1b0
> jul 10 08:35:13 ano4 kernel: nfsd_vfs_write+0x175/0x510 [nfsd]
> jul 10 08:35:13 ano4 kernel: nfsd4_write+0x135/0x1b0 [nfsd]
> jul 10 08:35:13 ano4 kernel: nfsd4_proc_compound+0x40d/0x680 [nfsd]
> jul 10 08:35:13 ano4 kernel: nfsd_dispatch+0xd3/0x180 [nfsd]
> jul 10 08:35:13 ano4 kernel: svc_process_common+0x3d4/0x6d0 [sunrpc]
> jul 10 08:35:13 ano4 kernel: ? nfsd_svc+0x320/0x320 [nfsd]
> jul 10 08:35:13 ano4 kernel: svc_process+0xb7/0xf0 [sunrpc]
> jul 10 08:35:13 ano4 kernel: nfsd+0xe8/0x140 [nfsd]
> jul 10 08:35:13 ano4 kernel: ? nfsd_destroy+0x60/0x60 [nfsd]
> jul 10 08:35:13 ano4 kernel: kthread+0x11b/0x140
> jul 10 08:35:13 ano4 kernel: ? __kthread_bind_mask+0x60/0x60
> jul 10 08:35:13 ano4 kernel: ret_from_fork+0x22/0x30
> jul 10 08:35:13 ano4 kernel: Modules linked in: dm_snapshot dm_bufio tun cpufreq_ondemand cpufreq_powersave cpufreq_conservative cpufreq_userspace aes_generic libaes crypto_simd cryptd glue_helper cbc cts rpcsec_gss_krb5 sit tunnel4 ip_tunnel nft_nat sch_fq_codel rc_pinnacle_pctv_hd em28xx_rc rc_core si2157 si2168 i2c_mux em28xx_dvb dvb_core snd_hda_codec_analog snd_hda_codec_generic ledtrig_audio ivtv_alsa tuner_simple tuner_types snd_hda_codec_hdmi wm8775 snd_hda_intel tda9887 tda8290 snd_intel_dspcfg tea5767 soundwire_intel tuner soundwire_generic_allocation snd_soc_core snd_compress soundwire_cadence cx25840 snd_hda_codec ivtv snd_hda_core snd_hwdep soundwire_bus em28xx kvm_intel radeon tveeprom snd_pcm cx2341x kvm ttm videodev snd_timer snd irqbypass soundcore drm_kms_helper mc serio_raw evdev cec i2c_algo_bit iTCO_wdt intel_pmc_bxt iTCO_vendor_support pcspkr watchdog sg acpi_cpufreq asus_atk0110 button nft_chain_nat nf_nat nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_counter nft_ct
> jul 10 08:35:13 ano4 kernel: nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 coretemp firewire_sbp2 nf_tables nfnetlink loop nfsd parport_pc ppdev nfs_acl lockd lp auth_rpcgss parport grace drm fuse sunrpc configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 multipath linear dm_mod raid1 md_mod sd_mod hid_generic t10_pi ata_generic crc_t10dif crct10dif_generic st crct10dif_common usbhid pata_marvell hid ahci libahci mpt3sas firewire_ohci firewire_core aic7xxx crc_itu_t libata skge ehci_pci uhci_hcd scsi_transport_spi lpc_ich i2c_i801 sky2 ehci_hcd psmouse i2c_smbus raid_class scsi_transport_sas usbcore scsi_mod usb_common floppy
> jul 10 08:35:13 ano4 kernel: ---[ end trace
Bug#1014793: linux-image-5.10.0-16-amd64: Kernel crashes while serving NFS
Package: src:linux
Version: 5.10.127-1
Severity: normal

Dear Maintainer,

The new kernel in Debian 11.4 seems unstable and crashes when serving NFS.
On two different computers, these lockups happen within minutes, typically
when a client runs firefox on an NFS-mounted home directory. Typically the
servers lock up without any printout, but on one occasion, the following was
logged:

jul 10 08:35:13 ano4 kernel: general protection fault, probably for non-canonical address 0x2f48514544455145: [#1] SMP PTI
jul 10 08:35:13 ano4 kernel: CPU: 2 PID: 1244 Comm: nfsd Not tainted 5.10.0-16-amd64 #1 Debian 5.10.127-1
jul 10 08:35:13 ano4 kernel: Hardware name: System manufacturer System Product Name/P5Q DELUXE, BIOS 220105/21/2009
jul 10 08:35:13 ano4 kernel: RIP: 0010:fsnotify+0x2d9/0x570
jul 10 08:35:13 ano4 kernel: Code: 78 08 44 0b 30 44 0b 68 40 48 83 c1 01 48 83 f9 04 75 d9 66 66 66 66 90 44 8b 4c 24 1c 44 89 e8 f7 d0 45 21 f1 41 85 c1 74 4f <49> 8b 3f 48 8b 07 48 85 c0 0f 84 0a 01 00 00 48 8d 7c 24 38 44 89
jul 10 08:35:13 ano4 kernel: RSP: 0018:abe901fa3bc8 EFLAGS: 00010202
jul 10 08:35:13 ano4 kernel: RAX: bab6aebe RBX: 0001 RCX: 0004
jul 10 08:35:13 ano4 kernel: RDX: 00035a00 RSI: 0001 RDI: 2f48514544455145
jul 10 08:35:13 ano4 kernel: RBP: abe901fa3c20 R08: 0001 R09: 0002
jul 10 08:35:13 ano4 kernel: R10: 0002 R11: 0002 R12: 0002
jul 10 08:35:13 ano4 kernel: R13: 45495141 R14: 424d6757 R15: 2f48514544455145
jul 10 08:35:13 ano4 kernel: FS: () GS:939527d0() knlGS:
jul 10 08:35:13 ano4 kernel: CS: 0010 DS: ES: CR0: 80050033
jul 10 08:35:13 ano4 kernel: CR2: 560b8cee4000 CR3: 0001034da000 CR4: 000406e0
jul 10 08:35:13 ano4 kernel: Call Trace:
jul 10 08:35:13 ano4 kernel: __fsnotify_parent+0xe7/0x2d0
jul 10 08:35:13 ano4 kernel: ? ext4_buffered_write_iter+0xce/0x160 [ext4]
jul 10 08:35:13 ano4 kernel: ? do_iter_readv_writev+0x152/0x1b0
jul 10 08:35:13 ano4 kernel: do_iter_write+0xc8/0x1b0
jul 10 08:35:13 ano4 kernel: nfsd_vfs_write+0x175/0x510 [nfsd]
jul 10 08:35:13 ano4 kernel: nfsd4_write+0x135/0x1b0 [nfsd]
jul 10 08:35:13 ano4 kernel: nfsd4_proc_compound+0x40d/0x680 [nfsd]
jul 10 08:35:13 ano4 kernel: nfsd_dispatch+0xd3/0x180 [nfsd]
jul 10 08:35:13 ano4 kernel: svc_process_common+0x3d4/0x6d0 [sunrpc]
jul 10 08:35:13 ano4 kernel: ? nfsd_svc+0x320/0x320 [nfsd]
jul 10 08:35:13 ano4 kernel: svc_process+0xb7/0xf0 [sunrpc]
jul 10 08:35:13 ano4 kernel: nfsd+0xe8/0x140 [nfsd]
jul 10 08:35:13 ano4 kernel: ? nfsd_destroy+0x60/0x60 [nfsd]
jul 10 08:35:13 ano4 kernel: kthread+0x11b/0x140
jul 10 08:35:13 ano4 kernel: ? __kthread_bind_mask+0x60/0x60
jul 10 08:35:13 ano4 kernel: ret_from_fork+0x22/0x30
jul 10 08:35:13 ano4 kernel: Modules linked in: dm_snapshot dm_bufio tun cpufreq_ondemand cpufreq_powersave cpufreq_conservative cpufreq_userspace aes_generic libaes crypto_simd cryptd glue_helper cbc cts rpcsec_gss_krb5 sit tunnel4 ip_tunnel nft_nat sch_fq_codel rc_pinnacle_pctv_hd em28xx_rc rc_core si2157 si2168 i2c_mux em28xx_dvb dvb_core snd_hda_codec_analog snd_hda_codec_generic ledtrig_audio ivtv_alsa tuner_simple tuner_types snd_hda_codec_hdmi wm8775 snd_hda_intel tda9887 tda8290 snd_intel_dspcfg tea5767 soundwire_intel tuner soundwire_generic_allocation snd_soc_core snd_compress soundwire_cadence cx25840 snd_hda_codec ivtv snd_hda_core snd_hwdep soundwire_bus em28xx kvm_intel radeon tveeprom snd_pcm cx2341x kvm ttm videodev snd_timer snd irqbypass soundcore drm_kms_helper mc serio_raw evdev cec i2c_algo_bit iTCO_wdt intel_pmc_bxt iTCO_vendor_support pcspkr watchdog sg acpi_cpufreq asus_atk0110 button nft_chain_nat nf_nat nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_counter nft_ct
jul 10 08:35:13 ano4 kernel: nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 coretemp firewire_sbp2 nf_tables nfnetlink loop nfsd parport_pc ppdev nfs_acl lockd lp auth_rpcgss parport grace drm fuse sunrpc configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 multipath linear dm_mod raid1 md_mod sd_mod hid_generic t10_pi ata_generic crc_t10dif crct10dif_generic st crct10dif_common usbhid pata_marvell hid ahci libahci mpt3sas firewire_ohci firewire_core aic7xxx crc_itu_t libata skge ehci_pci uhci_hcd scsi_transport_spi lpc_ich i2c_i801 sky2 ehci_hcd psmouse i2c_smbus raid_class scsi_transport_sas usbcore scsi_mod usb_common floppy
jul 10 08:35:13 ano4 kernel: ---[ end trace 159cb95f57d30ea4 ]---
jul 10 08:35:13 ano4 kernel: RIP: 0010:fsnotify+0x2d9/0x570
jul 10 08:35:13 ano4 kernel: Code: 78 08 44 0b 30 44 0b 68 40 48 83 c1 01 48 83 f9 04 75 d9 66 66 66 66 90 44 8b 4c 24 1c 44 89 e8 f7 d0 45 21 f1 41 85 c1 74 4f