Bug#1014793: linux-image-5.10.0-16-amd64: Kernel crashes while serving NFS

2022-08-13 Thread Salvatore Bonaccorso
Hi Arne,

On Fri, Jul 22, 2022 at 10:18:56AM +0200, Arne Nordmark wrote:
> On 2022-07-15 at 21:58, Salvatore Bonaccorso wrote:
> > I would be interested to either pinpoint the regressing commit
> > upstream between 5.10.120 and 5.10.127 or conversely the fixing commit
> > between 5.10.127 upstream and 5.10.130 where you are no longer able
> > to reproduce the error. What I can say is that I have already imported
> > 5.10.130 for a future upload (cf.
> > https://salsa.debian.org/kernel-team/linux/-/merge_requests/506).
> 
> Bisection for the regression proved too hard.
> 
> Bisection for the fix went better: I can get a crash with 5.10.128-00010 but
> not yet with 5.10.128-00011. This indicates that the fixing commit was
> probably:
> 
> commit 6a0b9512a6aa7b7835d8138f5ffdcb4789c093d4
> Author: Chuck Lever 
> Date:   Thu Jun 30 16:48:18 2022 -0400
> 
> SUNRPC: Fix READ_PLUS crasher
> 
> which indeed seems to touch code involved in NFS service.
> 
> Consequently, the breaking commit was probably:
> 
> 6c254bf3b637 ("SUNRPC: Fix the calculation of xdr->end in
> xdr_get_next_encode_buffer()")

Thank you, and apologies for the delay! The next upload for
bullseye(-security) will contain that fix, and the bug will be closed
with that upload.

Thanks a lot for your investigative work and bisection to the fix!

Regards,
Salvatore



Bug#1014793: linux-image-5.10.0-16-amd64: Kernel crashes while serving NFS

2022-07-22 Thread Arne Nordmark

On 2022-07-15 at 21:58, Salvatore Bonaccorso wrote:
> I would be interested to either pinpoint the regressing commit
> upstream between 5.10.120 and 5.10.127 or conversely the fixing commit
> between 5.10.127 upstream and 5.10.130 where you are no longer able
> to reproduce the error. What I can say is that I have already imported
> 5.10.130 for a future upload (cf.
> https://salsa.debian.org/kernel-team/linux/-/merge_requests/506).


Bisection for the regression proved too hard.

Bisection for the fix went better: I can get a crash with 5.10.128-00010
but not yet with 5.10.128-00011. This indicates that the fixing commit
was probably:


commit 6a0b9512a6aa7b7835d8138f5ffdcb4789c093d4
Author: Chuck Lever 
Date:   Thu Jun 30 16:48:18 2022 -0400

SUNRPC: Fix READ_PLUS crasher

which indeed seems to touch code involved in NFS service.

Consequently, the breaking commit was probably:

6c254bf3b637 ("SUNRPC: Fix the calculation of xdr->end in 
xdr_get_next_encode_buffer()")
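
For readers unfamiliar with bisecting for a fix rather than for a regression, the idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `first_fixed`, the `crashes` callback, and the intermediate build labels are hypothetical, and in practice each test means building and booting a kernel and running the NFS workload. It also assumes a linear history in which the fix, once applied, stays applied.

```python
from typing import Callable, Sequence


def first_fixed(commits: Sequence[str], crashes: Callable[[str], bool]) -> str:
    """Return the first commit for which `crashes` reports False.

    Assumes commits[0] crashes, commits[-1] does not, and the history
    between them is linear with a single crash -> no-crash transition.
    """
    lo, hi = 0, len(commits) - 1  # lo: known to crash, hi: known to be fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes(commits[mid]):
            lo = mid  # still broken, the fix must come later
        else:
            hi = mid  # already fixed, the fix is here or earlier
    return commits[hi]


# Hypothetical build labels modelled on the two named in this mail.
commits = [f"5.10.128-{n:05d}" for n in range(20)]


def crashes(label: str) -> bool:
    # Stand-in for "boot this kernel and try to trigger the NFS crash".
    return int(label.split("-")[1]) <= 10


print(first_fixed(commits, crashes))  # 5.10.128-00011
```

With 20 candidate builds this needs only four test boots, which is why bisecting is practical even when each test is slow.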





> > Bisection would be a new experience for me, even compiling the kernel seems
> > like ages ago ... (using Debian since 0.93R6).
>
> Would the following help?
> https://wiki.debian.org/DebianKernel/GitBisect
> Do you need any more specific help to get it rolling?


That was indeed helpful.



> Regards,
> Salvatore


Thanks
Arne



Bug#1014793: linux-image-5.10.0-16-amd64: Kernel crashes while serving NFS

2022-07-15 Thread Salvatore Bonaccorso
Hi Arne,

Thanks a lot for the time you put into debugging the issue.

On Fri, Jul 15, 2022 at 10:28:17AM +0200, Arne Nordmark wrote:
> Sorry for the late reply.
> 
> On 2022-07-13 at 12:07, Salvatore Bonaccorso wrote:
> > Control: tags -1 + moreinfo
> > 
> > Hello Arne,
> > 
> 
> ...
> 
> > 
> > As you seem to reliably reproduce the issue, do you have the
> > possibility (on the non-production instance) to try to bisect down the
> > problem? In addition to the bisect, on a test instance where the issue
> > is reproducible, can you run a self-compiled 5.10.130 upstream to see
> > if the problem is still present?
> 
> I have now set up a test environment and have been able to reproduce NFS
> crashes with the Debian linux-image-5.10.0-16-amd64 and self-compiled
> upstream v5.10.127 kernels.

That's great. I have not yet reached the point of replicating it myself.
But it's good that you now have a base test environment where it's safe to
experiment.

> I have not been able to get a self-compiled upstream v5.10.130 to crash.

That is good news.

> As for bisection, I am not entirely clear what is expected from me. Do you
> mean bisect the upstream kernels? Between which points? v5.10.120 to
> v5.10.127?

I would be interested to either pinpoint the regressing commit
upstream between 5.10.120 and 5.10.127 or conversely the fixing commit
between 5.10.127 upstream and 5.10.130 where you are no longer able
to reproduce the error. What I can say is that I have already imported
5.10.130 for a future upload (cf.
https://salsa.debian.org/kernel-team/linux/-/merge_requests/506).

> Bisection would be a new experience for me, even compiling the kernel seems
> like ages ago ... (using Debian since 0.93R6).

Would the following help?
https://wiki.debian.org/DebianKernel/GitBisect
Do you need any more specific help to get it rolling?

Regards,
Salvatore



Bug#1014793: linux-image-5.10.0-16-amd64: Kernel crashes while serving NFS

2022-07-15 Thread Arne Nordmark

Sorry for the late reply.

On 2022-07-13 at 12:07, Salvatore Bonaccorso wrote:
> Control: tags -1 + moreinfo
>
> Hello Arne,
>

...

> As you seem to reliably reproduce the issue, do you have the
> possibility (on the non-production instance) to try to bisect down the
> problem? In addition to the bisect, on a test instance where the issue
> is reproducible, can you run a self-compiled 5.10.130 upstream to see
> if the problem is still present?


I have now set up a test environment and have been able to reproduce NFS
crashes with the Debian linux-image-5.10.0-16-amd64 and self-compiled
upstream v5.10.127 kernels.


I have not been able to get a self-compiled upstream v5.10.130 to crash.

As for bisection, I am not entirely clear what is expected from me. Do 
you mean bisect the upstream kernels? Between which points? v5.10.120 to 
v5.10.127?


Bisection would be a new experience for me, even compiling the kernel
seems like ages ago ... (using Debian since 0.93R6).




> Regards,
> Salvatore


Thanks again,
Arne



Bug#1014793: linux-image-5.10.0-16-amd64: Kernel crashes while serving NFS

2022-07-13 Thread Salvatore Bonaccorso
Control: tags -1 + moreinfo

Hello Arne,

On Tue, Jul 12, 2022 at 08:14:22AM +0200, Arne Nordmark wrote:
> 
> Package: src:linux
> Version: 5.10.127-1
> Severity: normal
> 
> Dear Maintainer,
> 
> The new kernel in Debian 11.4 seems unstable and crashes when serving NFS.
> On two different computers, these lockups happen within minutes, typically
> when a client runs firefox on an NFS-mounted home directory. Typically the
> servers lock up without any printout, but on one occasion, the following was
> logged:
> 
> jul 10 08:35:13 ano4 kernel: general protection fault, probably for
> non-canonical address 0x2f48514544455145:  [#1] SMP PTI
> jul 10 08:35:13 ano4 kernel: CPU: 2 PID: 1244 Comm: nfsd Not tainted
> 5.10.0-16-amd64 #1 Debian 5.10.127-1
> jul 10 08:35:13 ano4 kernel: Hardware name: System manufacturer System
> Product Name/P5Q DELUXE, BIOS 220105/21/2009
> jul 10 08:35:13 ano4 kernel: RIP: 0010:fsnotify+0x2d9/0x570
> jul 10 08:35:13 ano4 kernel: Code: 78 08 44 0b 30 44 0b 68 40 48 83 c1 01 48
> 83 f9 04 75 d9 66 66 66 66 90 44 8b 4c 24 1c 44 89 e8 f7 d0 45 21 f1 41 85
> c1 74 4f <49> 8b 3f 48 8b 07 48 85 c0 0f 84 0a 01 00 00 48 8d 7c 24 38 44 89
> jul 10 08:35:13 ano4 kernel: RSP: 0018:abe901fa3bc8 EFLAGS: 00010202
> jul 10 08:35:13 ano4 kernel: RAX: bab6aebe RBX: 0001
> RCX: 0004
> jul 10 08:35:13 ano4 kernel: RDX: 00035a00 RSI: 0001
> RDI: 2f48514544455145
> jul 10 08:35:13 ano4 kernel: RBP: abe901fa3c20 R08: 0001
> R09: 0002
> jul 10 08:35:13 ano4 kernel: R10: 0002 R11: 0002
> R12: 0002
> jul 10 08:35:13 ano4 kernel: R13: 45495141 R14: 424d6757
> R15: 2f48514544455145
> jul 10 08:35:13 ano4 kernel: FS:  ()
> GS:939527d0() knlGS:
> jul 10 08:35:13 ano4 kernel: CS:  0010 DS:  ES:  CR0:
> 80050033
> jul 10 08:35:13 ano4 kernel: CR2: 560b8cee4000 CR3: 0001034da000
> CR4: 000406e0
> jul 10 08:35:13 ano4 kernel: Call Trace:
> jul 10 08:35:13 ano4 kernel:  __fsnotify_parent+0xe7/0x2d0
> jul 10 08:35:13 ano4 kernel:  ? ext4_buffered_write_iter+0xce/0x160 [ext4]
> jul 10 08:35:13 ano4 kernel:  ? do_iter_readv_writev+0x152/0x1b0
> jul 10 08:35:13 ano4 kernel:  do_iter_write+0xc8/0x1b0
> jul 10 08:35:13 ano4 kernel:  nfsd_vfs_write+0x175/0x510 [nfsd]
> jul 10 08:35:13 ano4 kernel:  nfsd4_write+0x135/0x1b0 [nfsd]
> jul 10 08:35:13 ano4 kernel:  nfsd4_proc_compound+0x40d/0x680 [nfsd]
> jul 10 08:35:13 ano4 kernel:  nfsd_dispatch+0xd3/0x180 [nfsd]
> jul 10 08:35:13 ano4 kernel:  svc_process_common+0x3d4/0x6d0 [sunrpc]
> jul 10 08:35:13 ano4 kernel:  ? nfsd_svc+0x320/0x320 [nfsd]
> jul 10 08:35:13 ano4 kernel:  svc_process+0xb7/0xf0 [sunrpc]
> jul 10 08:35:13 ano4 kernel:  nfsd+0xe8/0x140 [nfsd]
> jul 10 08:35:13 ano4 kernel:  ? nfsd_destroy+0x60/0x60 [nfsd]
> jul 10 08:35:13 ano4 kernel:  kthread+0x11b/0x140
> jul 10 08:35:13 ano4 kernel:  ? __kthread_bind_mask+0x60/0x60
> jul 10 08:35:13 ano4 kernel:  ret_from_fork+0x22/0x30
> jul 10 08:35:13 ano4 kernel: Modules linked in: dm_snapshot dm_bufio tun
> cpufreq_ondemand cpufreq_powersave cpufreq_conservative cpufreq_userspace
> aes_generic libaes crypto_simd cryptd glue_helper cbc cts rpcsec_gss_krb5
> sit tunnel4 ip_tunnel nft_nat sch_fq_codel rc_pinnacl
> e_pctv_hd em28xx_rc rc_core si2157 si2168 i2c_mux em28xx_dvb dvb_core
> snd_hda_codec_analog snd_hda_codec_generic ledtrig_audio ivtv_alsa
> tuner_simple tuner_types snd_hda_codec_hdmi wm8775 snd_hda_intel tda9887
> tda8290 snd_intel_dspcfg tea5767 soundwire_intel tuner
> soundwire_generic_allocation snd_soc_core snd
> _compress soundwire_cadence cx25840 snd_hda_codec ivtv snd_hda_core
> snd_hwdep soundwire_bus em28xx kvm_intel radeon tveeprom snd_pcm cx2341x kvm
> ttm videodev snd_timer snd irqbypass soundcore drm_kms_helper mc serio_raw
> evdev cec i2c_algo_bit iTCO_wdt intel_pmc_bxt iTCO_vendor_support pcspkr
> watchdog sg acpi_
> cpufreq asus_atk0110 button nft_chain_nat nf_nat nft_reject_inet
> nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_counter nft_ct
> jul 10 08:35:13 ano4 kernel:  nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
> coretemp firewire_sbp2 nf_tables nfnetlink loop nfsd parport_pc ppdev
> nfs_acl lockd lp auth_rpcgss parport grace drm fuse sunrpc configfs
> ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 raid10 raid4
> 56 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq
> libcrc32c crc32c_generic raid0 multipath linear dm_mod raid1 md_mod sd_mod
> hid_generic t10_pi ata_generic crc_t10dif crct10dif_generic st
> crct10dif_common usbhid pata_marvell hid ahci libahci mpt3sas firewire_ohci
> firewire_core aic7xxx
>  crc_itu_t libata skge ehci_pci uhci_hcd scsi_transport_spi lpc_ich i2c_i801
> sky2 ehci_hcd psmouse i2c_smbus raid_class scsi_transport_sas usbcore
> scsi_mod usb_common floppy
> jul 10 08:35:13 ano4 kernel: ---[ end trace 

Bug#1014793: linux-image-5.10.0-16-amd64: Kernel crashes while serving NFS

2022-07-12 Thread Arne Nordmark



Package: src:linux
Version: 5.10.127-1
Severity: normal

Dear Maintainer,

The new kernel in Debian 11.4 seems unstable and crashes when serving
NFS. On two different computers, these lockups happen within minutes,
typically when a client runs firefox on an NFS-mounted home directory.
Typically the servers lock up without any printout, but on one occasion,
the following was logged:


jul 10 08:35:13 ano4 kernel: general protection fault, probably for 
non-canonical address 0x2f48514544455145:  [#1] SMP PTI
jul 10 08:35:13 ano4 kernel: CPU: 2 PID: 1244 Comm: nfsd Not tainted 
5.10.0-16-amd64 #1 Debian 5.10.127-1
jul 10 08:35:13 ano4 kernel: Hardware name: System manufacturer System 
Product Name/P5Q DELUXE, BIOS 220105/21/2009

jul 10 08:35:13 ano4 kernel: RIP: 0010:fsnotify+0x2d9/0x570
jul 10 08:35:13 ano4 kernel: Code: 78 08 44 0b 30 44 0b 68 40 48 83 c1 
01 48 83 f9 04 75 d9 66 66 66 66 90 44 8b 4c 24 1c 44 89 e8 f7 d0 45 21 
f1 41 85 c1 74 4f <49> 8b 3f 48 8b 07 48 85 c0 0f 84 0a 01 00 00 48 8d 
7c 24 38 44 89

jul 10 08:35:13 ano4 kernel: RSP: 0018:abe901fa3bc8 EFLAGS: 00010202
jul 10 08:35:13 ano4 kernel: RAX: bab6aebe RBX: 0001 
RCX: 0004
jul 10 08:35:13 ano4 kernel: RDX: 00035a00 RSI: 0001 
RDI: 2f48514544455145
jul 10 08:35:13 ano4 kernel: RBP: abe901fa3c20 R08: 0001 
R09: 0002
jul 10 08:35:13 ano4 kernel: R10: 0002 R11: 0002 
R12: 0002
jul 10 08:35:13 ano4 kernel: R13: 45495141 R14: 424d6757 
R15: 2f48514544455145
jul 10 08:35:13 ano4 kernel: FS:  () 
GS:939527d0() knlGS:
jul 10 08:35:13 ano4 kernel: CS:  0010 DS:  ES:  CR0: 
80050033
jul 10 08:35:13 ano4 kernel: CR2: 560b8cee4000 CR3: 0001034da000 
CR4: 000406e0

jul 10 08:35:13 ano4 kernel: Call Trace:
jul 10 08:35:13 ano4 kernel:  __fsnotify_parent+0xe7/0x2d0
jul 10 08:35:13 ano4 kernel:  ? ext4_buffered_write_iter+0xce/0x160 [ext4]
jul 10 08:35:13 ano4 kernel:  ? do_iter_readv_writev+0x152/0x1b0
jul 10 08:35:13 ano4 kernel:  do_iter_write+0xc8/0x1b0
jul 10 08:35:13 ano4 kernel:  nfsd_vfs_write+0x175/0x510 [nfsd]
jul 10 08:35:13 ano4 kernel:  nfsd4_write+0x135/0x1b0 [nfsd]
jul 10 08:35:13 ano4 kernel:  nfsd4_proc_compound+0x40d/0x680 [nfsd]
jul 10 08:35:13 ano4 kernel:  nfsd_dispatch+0xd3/0x180 [nfsd]
jul 10 08:35:13 ano4 kernel:  svc_process_common+0x3d4/0x6d0 [sunrpc]
jul 10 08:35:13 ano4 kernel:  ? nfsd_svc+0x320/0x320 [nfsd]
jul 10 08:35:13 ano4 kernel:  svc_process+0xb7/0xf0 [sunrpc]
jul 10 08:35:13 ano4 kernel:  nfsd+0xe8/0x140 [nfsd]
jul 10 08:35:13 ano4 kernel:  ? nfsd_destroy+0x60/0x60 [nfsd]
jul 10 08:35:13 ano4 kernel:  kthread+0x11b/0x140
jul 10 08:35:13 ano4 kernel:  ? __kthread_bind_mask+0x60/0x60
jul 10 08:35:13 ano4 kernel:  ret_from_fork+0x22/0x30
jul 10 08:35:13 ano4 kernel: Modules linked in: dm_snapshot dm_bufio tun 
cpufreq_ondemand cpufreq_powersave cpufreq_conservative 
cpufreq_userspace aes_generic libaes crypto_simd cryptd glue_helper cbc 
cts rpcsec_gss_krb5 sit tunnel4 ip_tunnel nft_nat sch_fq_codel rc_pinnacl
e_pctv_hd em28xx_rc rc_core si2157 si2168 i2c_mux em28xx_dvb dvb_core 
snd_hda_codec_analog snd_hda_codec_generic ledtrig_audio ivtv_alsa 
tuner_simple tuner_types snd_hda_codec_hdmi wm8775 snd_hda_intel tda9887 
tda8290 snd_intel_dspcfg tea5767 soundwire_intel tuner 
soundwire_generic_allocation snd_soc_core snd
_compress soundwire_cadence cx25840 snd_hda_codec ivtv snd_hda_core 
snd_hwdep soundwire_bus em28xx kvm_intel radeon tveeprom snd_pcm cx2341x 
kvm ttm videodev snd_timer snd irqbypass soundcore drm_kms_helper mc 
serio_raw evdev cec i2c_algo_bit iTCO_wdt intel_pmc_bxt 
iTCO_vendor_support pcspkr watchdog sg acpi_
cpufreq asus_atk0110 button nft_chain_nat nf_nat nft_reject_inet 
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_counter nft_ct
jul 10 08:35:13 ano4 kernel:  nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 
coretemp firewire_sbp2 nf_tables nfnetlink loop nfsd parport_pc ppdev 
nfs_acl lockd lp auth_rpcgss parport grace drm fuse sunrpc configfs 
ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 raid10 raid4
56 async_raid6_recov async_memcpy async_pq async_xor async_tx xor 
raid6_pq libcrc32c crc32c_generic raid0 multipath linear dm_mod raid1 
md_mod sd_mod hid_generic t10_pi ata_generic crc_t10dif 
crct10dif_generic st crct10dif_common usbhid pata_marvell hid ahci 
libahci mpt3sas firewire_ohci firewire_core aic7xxx
 crc_itu_t libata skge ehci_pci uhci_hcd scsi_transport_spi lpc_ich 
i2c_i801 sky2 ehci_hcd psmouse i2c_smbus raid_class scsi_transport_sas 
usbcore scsi_mod usb_common floppy

jul 10 08:35:13 ano4 kernel: ---[ end trace 159cb95f57d30ea4 ]---
jul 10 08:35:13 ano4 kernel: RIP: 0010:fsnotify+0x2d9/0x570
jul 10 08:35:13 ano4 kernel: Code: 78 08 44 0b 30 44 0b 68 40 48 83 c1 
01 48 83 f9 04 75 d9 66 66 66 66 90 44 8b 4c 24 1c 44 89 e8 f7 d0 45 21 
f1 41 85 c1 74 4f
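
As a side note on the "non-canonical address" wording in the first line of the log above: on x86-64 with 48-bit virtual addresses, an address is canonical only if bits 63 down to 47 are all equal. The check can be sketched as below. The helper name `is_canonical` and the `va_bits` parameter are made up for this illustration, and 48-bit addressing is an assumption (the log does not state the paging mode).

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if bits 63..(va_bits - 1) of a 64-bit address are all equal.

    va_bits=48 is an assumption (4-level paging); it is not taken from the log.
    """
    shift = va_bits - 1               # lowest bit that must be replicated upward
    top = addr >> shift               # bits 63..shift of the address
    all_ones = (1 << (64 - shift)) - 1
    return top == 0 or top == all_ones


# Value reported in the general protection fault line (also seen in RDI and R15).
faulting = 0x2F48514544455145
print(is_canonical(faulting))
```

The value from the log fails this check, which is consistent with the kernel reporting a general protection fault rather than a page fault.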