Bug#989705: Suspend to RAM hangs computer with nouveau driver and kernel 5.10.0-7-amd64 / 5.10.0-8-amd64
Hello,

An upstream patch has been released [1].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.0-rc5&id=6b04ce966a738ecdd9294c9593e48513c0dc90aa
Bug#1019544: Additional Information
As there were several changes in 5.10.140 to the kernel I/O code which could be the cause of my issue, I downloaded the vanilla source code for the 5.10.139 kernel and built it using my Debian kernel config from /boot. I installed the resulting kernel and kernel headers and DKMS built the required ZFS modules.

Upon rebooting, all my ZFS pools are working as expected. In particular, the main pool that consistently showed six missing drives (A1-A6) under 5.10.140 is now showing all drives as online, just as it does with 5.10.136-1.

In total, this system has 48 3.5" 7200 RPM SATA drives, two 1.92TB Samsung enterprise SATA SSDs, and three NVMe SSDs. The impacted drives are 16TB 3.5" drives, which are in two 4U DS4246 JBOD enclosures, and attached to a Dell R730xd server via an LSI 9207-8e HBA running P20 firmware in IT mode. I'm running ZFS 2.1.5 from bullseye-backports.

Note that SMART data for the impacted drives is normal with no bad sectors. The only change I made was booting into a different kernel. Otherwise, it's running all the updates from the 11.5 point release.

I will try to bisect 5.10.140 tomorrow to determine more precisely which commit(s) are causing my issue.

    NAME                                             STATE   READ WRITE CKSUM
    data                                             ONLINE     0     0     0
      raidz2-0                                       ONLINE     0     0     0
        A1                                           ONLINE     0     0     0
        A2                                           ONLINE     0     0     0
        A3                                           ONLINE     0     0     0
        A4                                           ONLINE     0     0     0
        A5                                           ONLINE     0     0     0
        A6                                           ONLINE     0     0     0
        A7                                           ONLINE     0     0     0
        A8                                           ONLINE     0     0     0
        A9                                           ONLINE     0     0     0
        A10                                          ONLINE     0     0     0
        A11                                          ONLINE     0     0     0
        A12                                          ONLINE     0     0     0
    special
      mirror-1                                       ONLINE     0     0     0
        nvme-HP_SSD_EX920_1TB_HBSE48481800144-part1  ONLINE     0     0     0
        nvme-HP_SSD_EX920_1TB_HBSE48481800847-part1  ONLINE     0     0     0
    logs
      mirror-2                                       ONLINE     0     0     0
        nvme-HP_SSD_EX920_1TB_HBSE48481800144-part2  ONLINE     0     0     0
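When comparing pool health across kernels during a bisect, it may help to check the status listing mechanically rather than by eye. The following is only a minimal sketch: it assumes the text comes from `zpool status`, the function name `non_online_devices` is hypothetical, and the sample input is made up for illustration (it is not output from the system in this report).

```python
from typing import List, Tuple

# Device states that `zpool status` can report in its STATE column.
KNOWN_STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}


def non_online_devices(status_text: str) -> List[Tuple[str, str]]:
    """Return (name, state) for every pool, vdev or device that is not ONLINE."""
    problems = []
    for line in status_text.splitlines():
        fields = line.split()
        # Skip summary lines such as "state: DEGRADED" and group labels
        # such as "special" or "logs", which carry no STATE column.
        if len(fields) < 2 or fields[0].endswith(":"):
            continue
        if fields[1] in KNOWN_STATES and fields[1] != "ONLINE":
            problems.append((fields[0], fields[1]))
    return problems


# Hypothetical sample, NOT taken from the reporter's system.
SAMPLE = """\
NAME        STATE     READ WRITE CKSUM
data        DEGRADED     0     0     0
  raidz2-0  DEGRADED     0     0     0
    A1      UNAVAIL      0     0     0
    A7      ONLINE       0     0     0
"""

if __name__ == "__main__":
    for name, state in non_online_devices(SAMPLE):
        print(f"{name}: {state}")
```

An empty result after booting a candidate kernel would correspond to the "all drives online" state described above; a non-empty one would mark that kernel as bad for the bisect.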
Bug#1017720: nfs-common: No such file or directory
I downgraded the nfs-common package which required the downgrade of the libevent packages and am using the 4.19.X kernel.
I see the issue running the initial test, but then the issue is gone when running the test a subsequent time.

libevent-2.1-6:amd64           2.1.8-stable-4       amd64  Asynchronous event notification library
libevent-core-2.1-6:amd64      2.1.8-stable-4       amd64  Asynchronous event notification library (core)
libevent-pthreads-2.1-6:amd64  2.1.8-stable-4       amd64  Asynchronous event notification library (pthreads)
linux-image-4.19.0-21-amd64    4.19.249-2           amd64  Linux 4.19 for 64-bit PCs (signed)
nfs-common                     1:1.3.4-2.5+deb10u1  amd64  NFS support files common to client and server

What other packages do I need to downgrade in order to get Debian 11.4 to behave like Debian 10.8?
What additional questions can I answer so that we can move forward?

> -Original Message-
> From: Jason Breitman
> Sent: Tuesday, September 6, 2022 5:18 PM
> To: Ben Hutchings ; 1017...@bugs.debian.org
> Subject: RE: Bug#1017720: nfs-common: No such file or directory
>
> I also see the failure with the kernels below, but the 4.19.X kernel resolves the issue without dropping caches.
> linux-image-4.19.0-14-amd64  4.19.171-2  amd64  Linux 4.19 for 64-bit PCs (signed)
> linux-image-4.19.0-21-amd64  4.19.249-2  amd64  Linux 4.19 for 64-bit PCs (signed)
>
> I see the issue running the initial test, but then the issue is gone when running the test a subsequent time.
> I ran several tests to verify the behavior differences between the 4.19.X and 5.X kernels.
>
> -- Test
> ls -l /mnt/dir/someOtherDir/* | grep '?'
>
> -- Error message - the error message is showing files that have been erased via rsync --delete
> ls: cannot access 'filename': No such file or directory
> -? ? ???? filename
>
> > -Original Message-
> > From: Jason Breitman
> > Sent: Friday, September 2, 2022 5:17 PM
> > To: Ben Hutchings ; 1017...@bugs.debian.org
> > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> >
> > I have tested with the following kernels and see this issue in each case.
> >
> > linux-image-5.10.0-16-amd64         5.10.127-1         amd64  Linux 5.10 for 64-bit PCs (signed)
> > linux-image-5.15.0-0.bpo.3-amd64    5.15.15-2~bpo11+1  amd64  Linux 5.15 for 64-bit PCs (signed)
> > linux-image-5.18.0-0.deb11.3-amd64  5.18.14-1~bpo11+1  amd64  Linux 5.18 for 64-bit PCs (signed)
> >
> > An interesting note is that when using the 5.18 kernel, I had to run echo 3 > /proc/sys/vm/drop_caches to resolve the issue.
> > echo 2 > /proc/sys/vm/drop_caches did not work as it did on the 5.10 and 5.15 kernels.
> >
> > > -Original Message-
> > > From: Jason Breitman
> > > Sent: Friday, August 26, 2022 3:36 PM
> > > To: 'Ben Hutchings' ; '1017...@bugs.debian.org' <1017...@bugs.debian.org>
> > > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> > >
> > > I was able to identify another workaround today which may help you to identify the issue.
> > > The workaround is to touch the directory where the troubled files live on the file server.
> > > I believe this tells us that updating the modify time attribute is used by the cache.
> > > It should be noted that access time updates are disabled on the file server.
> > >
> > > I also wanted to restate that we use rsync to push out these application updates and also use rsync to sync data files.
> > > Our rsync options preserve timestamps, so it is possible that the new files have an older timestamp than "now".
> > > It is not the case that the new files have an older timestamp than the prior version that is stuck in the cache.
> > >
> > > The rsync process that I describe has not changed and has been in use for many years.
> > >
> > > > -Original Message-
> > > > From: Jason Breitman
> > > > Sent: Thursday, August 25, 2022 11:54 AM
> > > > To: Ben Hutchings ; 1017...@bugs.debian.org
> > > > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> > > >
> > > > I have the same issue after adding actimeo=30 to /etc/fstab, rebooting and testing.
> > > > I also confirmed that those settings applied via /proc/mounts which shows the below snippet for each mountpoint.
> > > > nfs4 rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,acregmin=30,acregmax=30,acdirmax=30,hard,noresvport,proto=tcp,timeo=600,retrans=2,sec=krb5,clientaddr=X.X.X.X,lookupcache=pos,local_lock=none,addr=Y.Y.Y.Y 0
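The test quoted in this thread (`ls -l /mnt/dir/someOtherDir/* | grep '?'`) looks for names that the directory listing still returns but that can no longer be stat'ed. A rough Python equivalent, offered only as a sketch, is shown below. The function name `stale_entries` and its `stat` parameter are hypothetical, and the code assumes that a stale cached entry shows up as a name from `os.listdir` whose `os.lstat` call raises `FileNotFoundError`.

```python
import os
from typing import Callable, List


def stale_entries(directory: str,
                  stat: Callable[[str], object] = os.lstat) -> List[str]:
    """Return names that are listed in `directory` but cannot be stat'ed.

    The `stat` argument exists only so the check can be exercised without
    an NFS mount; by default the real os.lstat is used.
    """
    stale = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        try:
            stat(path)
        except FileNotFoundError:
            # Listed by the directory read, gone when looked up:
            # this is the "No such file or directory" case from `ls -l`.
            stale.append(name)
    return stale


if __name__ == "__main__":
    import sys

    # Usage: python3 stale.py /mnt/dir/someOtherDir
    for entry in stale_entries(sys.argv[1]):
        print(entry)
```

Running it once right after an `rsync --delete` on the server and again a little later would mirror the "initial test" versus "subsequent test" comparison described above.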
Bug#1019700: Fwd: Bug#1019700: mmc0: Timeout waiting for hardware cmd interrupt.
-- Forwarded message -
From: Hank Barta
Date: Tue, Sep 13, 2022 at 12:54 PM
Subject: Re: Bug#1019700: mmc0: Timeout waiting for hardware cmd interrupt.
To: Bjørn Mork

Hi Bjørn,

Many thanks for the prompt reply. In the meantime I have done the following:

* Reimaged my SD card with `20220808_raspi_4_bookworm.img.xz` from Debian Tested images. (5.18.14-1 kernel)
* Booted and noted *no* SD card timeouts. Rebooted and power cycled 3 times each with the same result.
* Performed `apt update && apt upgrade -y` and rebooted. (5.19.6-1 kernel)
* First boot - repeated SD timeouts and unable to log in. Power cycled to force reboot.
* Second reboot - no SD card timeouts. Added `dtparam=sd_poll_once=on` to `/boot/firmware/config.txt`.
* Third boot - repeated SD card timeouts. Eventually I was able to log in to the console. Network is not fully up. The repeated SD timeouts seem to be slowing normal boot.

Actually I may not have been logged in but in the console that presents when there is a problem booting. I exited and now I see a login prompt. And Ethernet finally came up, 737 seconds post boot according to console messages. (It was some time later before I could ssh in.) The SD timeout messages stopped. I have a login prompt at the console but it takes about 30s to log in. The system is now responsive, but WiFi modules did not load. I count 52 timeout messages in dmesg output.

There is no response to at the console. Tried to shut down using `shutdown -r now` and the system hangs. The system is most certainly not operating normally.

Does Debian use the device tree? This is a Debian system, not R-Pi OS.

If I reboot enough times I will get a clean boot followed by normal operation. I have tried different SD cards, USB SSDs and Pi 4Bs all with the same result so I do not believe this is a H/W problem.

I do recall the previous SD timeout issue and I worked around that by inserting an SD card post boot but that no longer works. This seems to be a new problem.
best,
hank

On Tue, Sep 13, 2022 at 11:32 AM Bjørn Mork wrote:

> Hank Barta writes:
>
> > ** Kernel log:
> > [ 723.735217] mmc0: sdhci: Timeout: 0x | Int stat: 0x00018000
> > [ 723.741743] mmc0: sdhci: Int enab: 0x00ff1003 | Sig enab: 0x00ff1003
> > [ 723.748270] mmc0: sdhci: ACmd stat: 0x | Slot int: 0x0001
> > [ 723.754797] mmc0: sdhci: Caps: 0x45ee6432 | Caps_1: 0xa525
> > [ 723.761324] mmc0: sdhci: Cmd: 0x0502 | Max curr: 0x00080008
> > [ 723.767851] mmc0: sdhci: Resp[0]: 0x01aa | Resp[1]: 0x
> > [ 723.774379] mmc0: sdhci: Resp[2]: 0x | Resp[3]: 0x
> > [ 723.780905] mmc0: sdhci: Host ctl2: 0x
> > [ 723.785404] mmc0: sdhci: ADMA Err: 0x | ADMA Ptr: 0x
> > [ 723.791930] mmc0: sdhci:
> > [ 733.923993] mmc0: Timeout waiting for hardware cmd interrupt.
>
> These repeated messages are normal on the RPi4 if you boot it without an
> SD card. E.g. from USB or network. If that's what you intend to do,
> then you can avoid the repeated messages by adding
>
> dtparam=sd_poll_once=on
>
> to the config.txt file in your firmware partition. Often mounted as
> /boot/firmware/.
>
> The effect depends on which device-tree you are using. I believe it
> will only work with the ones coming with the Raspberry Pi firmware. See
>
> https://github.com/raspberrypi/firmware/blob/master/boot/overlays/README
>
> for docs.
>
>
> Bjørn

--
Beautiful Sunny Winfield
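Counting the timeout messages and checking how far apart they are can be done from saved `dmesg` text. The sketch below is an illustration only: the helper names `timeout_timestamps` and `intervals` are hypothetical, and it assumes the log keeps the bracketed seconds-since-boot timestamps seen in the kernel log quoted in this thread.

```python
import re
from typing import List

# Matches e.g. "[  733.923993] mmc0: Timeout waiting for hardware cmd interrupt."
TIMEOUT_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+mmc0: Timeout waiting for hardware cmd interrupt\."
)


def timeout_timestamps(log_text: str) -> List[float]:
    """Return the timestamp (seconds since boot) of every mmc0 timeout line."""
    return [float(m.group(1)) for m in TIMEOUT_RE.finditer(log_text)]


def intervals(timestamps: List[float]) -> List[float]:
    """Return the gaps, in seconds, between consecutive timestamps."""
    return [round(b - a, 3) for a, b in zip(timestamps, timestamps[1:])]


if __name__ == "__main__":
    import sys

    # Usage: dmesg > boot.log; python3 count_timeouts.py boot.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        stamps = timeout_timestamps(fh.read())
    print(f"{len(stamps)} timeout messages")
    print("intervals (s):", intervals(stamps))
```

Comparing the count and spacing between a clean boot and a bad one gives a compact number to attach to the bug alongside the full log.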
Bug#1019700: mmc0: Timeout waiting for hardware cmd interrupt.
Hank Barta writes:

> ** Kernel log:
> [ 723.735217] mmc0: sdhci: Timeout: 0x | Int stat: 0x00018000
> [ 723.741743] mmc0: sdhci: Int enab: 0x00ff1003 | Sig enab: 0x00ff1003
> [ 723.748270] mmc0: sdhci: ACmd stat: 0x | Slot int: 0x0001
> [ 723.754797] mmc0: sdhci: Caps: 0x45ee6432 | Caps_1: 0xa525
> [ 723.761324] mmc0: sdhci: Cmd: 0x0502 | Max curr: 0x00080008
> [ 723.767851] mmc0: sdhci: Resp[0]: 0x01aa | Resp[1]: 0x
> [ 723.774379] mmc0: sdhci: Resp[2]: 0x | Resp[3]: 0x
> [ 723.780905] mmc0: sdhci: Host ctl2: 0x
> [ 723.785404] mmc0: sdhci: ADMA Err: 0x | ADMA Ptr: 0x
> [ 723.791930] mmc0: sdhci:
> [ 733.923993] mmc0: Timeout waiting for hardware cmd interrupt.

These repeated messages are normal on the RPi4 if you boot it without an
SD card. E.g. from USB or network. If that's what you intend to do,
then you can avoid the repeated messages by adding

  dtparam=sd_poll_once=on

to the config.txt file in your firmware partition. Often mounted as
/boot/firmware/.

The effect depends on which device-tree you are using. I believe it
will only work with the ones coming with the Raspberry Pi firmware. See

https://github.com/raspberrypi/firmware/blob/master/boot/overlays/README

for docs.


Bjørn
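Adding the `dtparam=sd_poll_once=on` line by hand is simple, but if it is scripted it is worth making the change idempotent so repeated runs do not duplicate the line. A minimal sketch follows; the function name `ensure_config_line` is hypothetical, the sample content in the usage note is illustrative, and the code works on the file's text only, leaving reading and writing of `config.txt` to the caller.

```python
def ensure_config_line(config_text: str,
                       line: str = "dtparam=sd_poll_once=on") -> str:
    """Return config_text with `line` appended unless it is already present."""
    existing = [entry.strip() for entry in config_text.splitlines()]
    if line in existing:
        # Already configured: return the text unchanged.
        return config_text
    if config_text and not config_text.endswith("\n"):
        # Make sure the new setting starts on its own line.
        config_text += "\n"
    return config_text + line + "\n"


if __name__ == "__main__":
    # Illustrative content only, not a real config.txt from this report.
    print(ensure_config_line("arm_64bit=1\n"), end="")
```

As noted above, whether the parameter has any effect depends on the device tree in use, so this only covers the file edit, not the outcome.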
Bug#1019700: mmc0: Timeout waiting for hardware cmd interrupt.
Behavior not seen before - panic during boot.

https://photos.app.goo.gl/txcEsUCJuqMGmK1K6

--
Beautiful Sunny Winfield
Bug#1019700: mmc0: Timeout waiting for hardware cmd interrupt.
Package: src:linux
Version: 5.19.6-1
Severity: important
X-Debbugs-Cc: hba...@gmail.com

Dear Maintainer,

*** Reporter, please consider answering these questions, where appropriate ***

   * What led up to the situation?

Apparent inability to initialize/connect to the SD card H/W. This leads to the message below that is repeated about every 10s. It can manifest five ways.

1. Failure to boot - continuous retries to read SD card.
2. If a USB SSD is connected, it can skip the SD card and boot from the SATA SSD. (That is the condition as I prepare this report.)
3. Completes boot, message repeats and there are no /dev/mmc* entries and WiFi H/W is not recognized.
4. Completes boot, messages are repeated but /dev/mmc entries are present and can mount/read an SD card. And WiFi appears to be working.
5. Completes boot, no SD card timeout messages are reported and system operates normally.

   * What exactly did you do (or not do) that was effective (or ineffective)?
   * What was the outcome of this action?
   * What outcome did you expect instead?

I built kernel 5.19.8 and found the same problem behavior. I booted a different SSD with Bullseye installed and on 5.10.0 kernel and do not see this issue. (Likely unrelated - The 5.10 and 5.19.0 kernels had a lot of vc4 related errors that seem to be fixed in 5.19.8)

Additional information:

The 5.19.8 kernel was built with the options found at https://github.com/HankB/Debian-Arm64-kernel-for-Pi-4B-on-X86_64

I have saved dmesg output from a normal boot and a boot that exhibited the timeout (but was otherwise able to complete booting) on paste.debian.net

Normal - https://paste.debian.net/1253718/
Timeout - https://paste.debian.net/1253719/

Since the kernel log below doesn't include the information at the beginning of `dmesg` I will capture again. Or I won't. It already overflowed the dmesg buffer. If needed for this kernel I can duplicate the situation and capture before it overflows.
-- Package-specific info:
** Version:
Linux version 5.19.0-1-arm64 (debian-kernel@lists.debian.org) (gcc-11 (Debian 11.3.0-5) 11.3.0, GNU ld (GNU Binutils for Debian) 2.38.90.20220713) #1 SMP Debian 5.19.6-1 (2022-09-01)

** Command line:
video=HDMI-A-1:1600x1200M@60 dma.dmachans=0x37f5 bcm2709.boardrev=0xc03111 bcm2709.serial=0x44557cae bcm2709.uart_clock=4800 bcm2709.disk_led_gpio=42 bcm2709.disk_led_active_low=0 smsc95xx.macaddr=DC:A6:32:09:C6:71 vc_mem.mem_base=0x3ec0 vc_mem.mem_size=0x4000 console=tty0 console=ttyS1,115200 root=LABEL=RASPIROOT rw fsck.repair=yes net.ifnames=0 rootwait

** Tainted: WC (1536)
 * kernel issued warning
 * staging driver was loaded

** Kernel log:
[ 723.735217] mmc0: sdhci: Timeout: 0x | Int stat: 0x00018000
[ 723.741743] mmc0: sdhci: Int enab: 0x00ff1003 | Sig enab: 0x00ff1003
[ 723.748270] mmc0: sdhci: ACmd stat: 0x | Slot int: 0x0001
[ 723.754797] mmc0: sdhci: Caps: 0x45ee6432 | Caps_1: 0xa525
[ 723.761324] mmc0: sdhci: Cmd: 0x0502 | Max curr: 0x00080008
[ 723.767851] mmc0: sdhci: Resp[0]: 0x01aa | Resp[1]: 0x
[ 723.774379] mmc0: sdhci: Resp[2]: 0x | Resp[3]: 0x
[ 723.780905] mmc0: sdhci: Host ctl2: 0x
[ 723.785404] mmc0: sdhci: ADMA Err: 0x | ADMA Ptr: 0x
[ 723.791930] mmc0: sdhci:
[ 733.923993] mmc0: Timeout waiting for hardware cmd interrupt.
[ 733.929837] mmc0: sdhci: SDHCI REGISTER DUMP ===
[ 733.936364] mmc0: sdhci: Sys addr: 0x | Version: 0x1002
[ 733.942892] mmc0: sdhci: Blk size: 0x | Blk cnt: 0x
[ 733.949420] mmc0: sdhci: Argument: 0x | Trn mode: 0x
[ 733.955946] mmc0: sdhci: Present: 0x1fff | Host ctl: 0x0001
[ 733.962473] mmc0: sdhci: Power: 0x000f | Blk gap: 0x0080
[ 733.969001] mmc0: sdhci: Wake-up: 0x | Clock: 0xfa07
[ 733.975528] mmc0: sdhci: Timeout: 0x | Int stat: 0x00018000
[ 733.982055] mmc0: sdhci: Int enab: 0x00ff1003 | Sig enab: 0x00ff1003
[ 733.988582] mmc0: sdhci: ACmd stat: 0x | Slot int: 0x0001
[ 733.995109] mmc0: sdhci: Caps: 0x45ee6432 | Caps_1: 0xa525
[ 734.001636] mmc0: sdhci: Cmd: 0x0502 | Max curr: 0x00080008
[ 734.008163] mmc0: sdhci: Resp[0]: 0x01aa | Resp[1]: 0x
[ 734.014689] mmc0: sdhci: Resp[2]: 0x | Resp[3]: 0x
[ 734.021216] mmc0: sdhci: Host ctl2: 0x
[ 734.025716] mmc0: sdhci: ADMA Err: 0x | ADMA Ptr: 0x
[ 734.032242] mmc0: sdhci:
[ 744.164283] mmc0: Timeout waiting for hardware cmd interrupt.
[ 744.170128] mmc0: sdhci: SDHCI REGISTER DUMP ===
[ 744.176655] mmc0: sdhci: Sys addr: 0x | Version: 0x1002
[ 744.183183] mmc0: sdhci: Blk size: 0x | Blk cnt: 0x
[ 744.189711] mmc0: s
Bug#1019660: console-setup: grep: warning: stray \ before #
Package: console-setup
Version: 1.210
Severity: minor
Control: affects -1 + initramfs-tools

With grep (>= 3.8), I'm getting this warning:

   # setupcon
   grep: warning: stray \ before #

-- System Information:
Architecture: i386

Versions of packages console-setup depends on:
ii  debconf                 1.5.79
ii  console-setup-linux     1.210
ii  xkb-data                2.35.1-1
ii  keyboard-configuration  1.210

--
Jakub Wilk