Bug#989705: Suspend to RAM hangs computer with nouveau driver and kernel 5.10.0-7-amd64 / 5.10.0-8-amd64

2022-09-13 Thread Computer Enthusiastic
Hello,

An upstream patch has been released [1]

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.0-rc5&id=6b04ce966a738ecdd9294c9593e48513c0dc90aa

Bug#1019544: Additional Information

2022-09-13 Thread Jason Wittlin-Cohen
As there were several changes in 5.10.140 to the kernel I/O code which
could be the cause of my issue, I downloaded the vanilla source code for
the 5.10.139 kernel and built it using my Debian kernel config from /boot.
I installed the resulting kernel and kernel headers and DKMS built the
required ZFS modules. Upon rebooting, all my ZFS pools are working as
expected.

In particular, the main pool that consistently showed six missing drives
(A1-A6) under 5.10.140 is now showing all drives as online, just as it does
with 5.10.136-1.  In total, this system has 48 3.5" 7200 RPM SATA drives,
two 1.92TB Samsung enterprise SATA SSDs, and three NVMe SSDs.  The impacted
drives are 16TB 3.5" drives, which are in two 4U DS4246 JBOD enclosures,
and attached to a Dell R730xd server via an LSI 9207-8e HBA running P20
firmware in IT mode.  I'm running ZFS 2.1.5 from bullseye-backports.  Note
that SMART data for the impacted drives is normal with no bad sectors.  The
only change I made was booting into a different kernel.  Otherwise, it's
running all the updates from the 11.5 point release.

I will try to bisect 5.10.140 tomorrow to determine more precisely which
commit(s) are causing my issue.

NAME                                             STATE     READ WRITE CKSUM
data                                             ONLINE       0     0     0
  raidz2-0                                       ONLINE       0     0     0
    A1                                           ONLINE       0     0     0
    A2                                           ONLINE       0     0     0
    A3                                           ONLINE       0     0     0
    A4                                           ONLINE       0     0     0
    A5                                           ONLINE       0     0     0
    A6                                           ONLINE       0     0     0
    A7                                           ONLINE       0     0     0
    A8                                           ONLINE       0     0     0
    A9                                           ONLINE       0     0     0
    A10                                          ONLINE       0     0     0
    A11                                          ONLINE       0     0     0
    A12                                          ONLINE       0     0     0
special
  mirror-1                                       ONLINE       0     0     0
    nvme-HP_SSD_EX920_1TB_HBSE48481800144-part1  ONLINE       0     0     0
    nvme-HP_SSD_EX920_1TB_HBSE48481800847-part1  ONLINE       0     0     0
logs
  mirror-2                                       ONLINE       0     0     0
    nvme-HP_SSD_EX920_1TB_HBSE48481800144-part2  ONLINE       0     0     0
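With 48 drives in the pool, a quick way to see which vdevs dropped out (such as A1-A6 under 5.10.140) is to filter the config table of `zpool status`. The following is a minimal Python sketch, assuming the `NAME STATE READ WRITE CKSUM` layout shown above; `unhealthy_devices` is a hypothetical helper that only parses text handed to it and does not run `zpool` itself.

```python
from typing import List, Tuple


def unhealthy_devices(status_text: str) -> List[Tuple[str, str]]:
    """Return (name, state) for every config-table row whose STATE is not ONLINE.

    Assumes the `NAME STATE READ WRITE CKSUM` table printed by `zpool status`.
    Lines outside that table (pool:, state:, scan:, errors:) are ignored, and
    bare group labels such as `special` or `logs` (no STATE column) are skipped.
    """
    problems = []
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True  # the table starts after its header row
            continue
        if not in_config:
            continue
        if not fields or fields[0] == "errors:":
            in_config = False  # a blank line or errors: line ends the table
            continue
        if len(fields) < 2:
            continue  # group label without a state
        name, state = fields[0], fields[1]
        if state != "ONLINE":
            problems.append((name, state))
    return problems
```

It could be fed the output of `zpool status data` captured under each kernel. Spare devices (state `AVAIL`) would also be reported, since only `ONLINE` counts as healthy in this sketch.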


Bug#1017720: nfs-common: No such file or directory

2022-09-13 Thread Jason Breitman
I downgraded the nfs-common package, which required downgrading the
libevent packages, and am using the 4.19.X kernel.
I see the issue running the initial test, but then the issue is gone when 
running the test a subsequent time.

libevent-2.1-6:amd64           2.1.8-stable-4       amd64  Asynchronous event notification library
libevent-core-2.1-6:amd64      2.1.8-stable-4       amd64  Asynchronous event notification library (core)
libevent-pthreads-2.1-6:amd64  2.1.8-stable-4       amd64  Asynchronous event notification library (pthreads)
linux-image-4.19.0-21-amd64    4.19.249-2           amd64  Linux 4.19 for 64-bit PCs (signed)
nfs-common                     1:1.3.4-2.5+deb10u1  amd64  NFS support files common to client and server

What other packages do I need to downgrade in order to get Debian 11.4 to 
behave like Debian 10.8?
What additional questions can I answer so that we can move forward?
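The `ls -l ... | grep '?'` test described in this thread can also be scripted, for example to log from cron when the client first returns a name it cannot stat. This is a minimal Python sketch, assuming Python 3 on the NFS client; `vanished_entries` is a hypothetical helper that only detects the symptom (a listed name whose `lstat()` fails with ENOENT) and does not touch or explain the cache.

```python
import os
from typing import Callable, Iterable, List


def vanished_entries(
    directory: str,
    list_names: Callable[[str], Iterable[str]] = os.listdir,
    stat_path: Callable[[str], os.stat_result] = os.lstat,
) -> List[str]:
    """Return names that the directory listing reports but lstat() cannot find.

    This is the condition `ls -l` shows with '?' fields: the listing still
    returned the name, but the follow-up stat failed with ENOENT.
    """
    vanished = []
    for name in list_names(directory):
        try:
            stat_path(os.path.join(directory, name))
        except FileNotFoundError:  # raised for ENOENT
            vanished.append(name)
    return sorted(vanished)
```

Calling `vanished_entries("/mnt/dir/someOtherDir")` would return the names that `ls -l` prints with `?` fields. The listing and stat functions are parameters only so that the check can be exercised without a misbehaving NFS mount.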

> -Original Message-
> From: Jason Breitman
> Sent: Tuesday, September 6, 2022 5:18 PM
> To: Ben Hutchings ; 1017...@bugs.debian.org
> Subject: RE: Bug#1017720: nfs-common: No such file or directory
> 
> I also see the failure with the kernels below, but the 4.19.X kernel resolves
> the issue without dropping caches.
> linux-image-4.19.0-14-amd64   4.19.171-2   amd64   Linux 4.19 for 64-bit PCs (signed)
> linux-image-4.19.0-21-amd64   4.19.249-2   amd64   Linux 4.19 for 64-bit PCs (signed)
> 
> I see the issue running the initial test, but then the issue is gone when
> running the test a subsequent time.
> I ran several tests to verify the behavior differences between the 4.19.X and
> 5.X kernels.
> 
> -- Test
> ls -l /mnt/dir/someOtherDir/* | grep '?'
> 
> -- Error message - the error message is showing files that have been erased
> via rsync --delete
> ls: cannot access 'filename': No such file or directory
> -? ? ???? filename
> 
> > -Original Message-
> > From: Jason Breitman
> > Sent: Friday, September 2, 2022 5:17 PM
> > To: Ben Hutchings ; 1017...@bugs.debian.org
> > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> >
> > I have tested with the following kernels and see this issue in each case.
> >
> > linux-image-5.10.0-16-amd64         5.10.127-1          amd64  Linux 5.10 for 64-bit PCs (signed)
> > linux-image-5.15.0-0.bpo.3-amd64    5.15.15-2~bpo11+1   amd64  Linux 5.15 for 64-bit PCs (signed)
> > linux-image-5.18.0-0.deb11.3-amd64  5.18.14-1~bpo11+1   amd64  Linux 5.18 for 64-bit PCs (signed)
> >
> > An interesting note is that when using the 5.18 kernel, I had to run
> > echo 3 > /proc/sys/vm/drop_caches to resolve the issue.
> > echo 2 > /proc/sys/vm/drop_caches did not work as it did on the 5.10 and
> > 5.15 kernels.
> >
> > > -Original Message-
> > > From: Jason Breitman
> > > Sent: Friday, August 26, 2022 3:36 PM
> > > To: 'Ben Hutchings' ; '1017...@bugs.debian.org'
> > > <1017...@bugs.debian.org>
> > > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> > >
> > > I was able to identify another workaround today which may help you to
> > > identify the issue.
> > > The workaround is to touch the directory where the troubled files live on
> > > the file server.
> > > I believe this tells us that the cache relies on the directory's modify
> > > time attribute.
> > > It should be noted that access time updates are disabled on the file server.
> > >
> > > I also wanted to restate that we use rsync to push out these application
> > > updates and also use rsync to sync data files.
> > > Our rsync options preserve timestamps, so it is possible that the new files
> > > have an older timestamp than "now".
> > > It is not the case that the new files have an older timestamp than the prior
> > > version that is stuck in the cache.
> > >
> > > The rsync process that I describe has not changed and has been in use for
> > > many years.
> > >
> > > > -Original Message-
> > > > From: Jason Breitman
> > > > Sent: Thursday, August 25, 2022 11:54 AM
> > > > To: Ben Hutchings ; 1017...@bugs.debian.org
> > > > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> > > >
> > > > I have the same issue after adding actimeo=30 to /etc/fstab, rebooting and
> > > > testing.
> > > > I also confirmed via /proc/mounts that those settings were applied; it shows
> > > > the snippet below for each mountpoint.
> > > > nfs4 rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,acregmin=30,acregmax=30,acdirmax=30,hard,noresvport,proto=tcp,timeo=600,retrans=2,sec=krb5,clientaddr=X.X.X.X,lookupcache=pos,local_lock=none,addr=Y.Y.Y.Y 0
> > > 
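To double-check which attribute-cache timeouts are in effect, the options field of /proc/mounts can be parsed. According to nfs(5), `actimeo=30` sets acregmin, acregmax, acdirmin and acdirmax to 30 seconds. The snippet above shows no acdirmin; this minimal Python sketch assumes the kernel omits an option that equals its default (30 seconds for acdirmin), which would account for that. The function names are hypothetical.

```python
from typing import Dict, Optional

# Linux NFS client attribute-cache defaults in seconds, as listed in nfs(5).
AC_DEFAULTS = {"acregmin": 3, "acregmax": 60, "acdirmin": 30, "acdirmax": 60}


def parse_mount_options(options: str) -> Dict[str, Optional[str]]:
    """Split the comma-separated options field of a /proc/mounts line.

    Valueless flags such as 'rw' or 'hard' map to None.
    """
    parsed: Dict[str, Optional[str]] = {}
    for item in options.split(","):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else None
    return parsed


def attribute_cache_timeouts(options: str) -> Dict[str, int]:
    """Return the effective ac* timeouts in seconds.

    Assumption: an option absent from /proc/mounts is at its nfs(5) default.
    """
    parsed = parse_mount_options(options)
    timeouts = {}
    for key, default in AC_DEFAULTS.items():
        value = parsed.get(key)
        timeouts[key] = int(value) if value is not None else default
    return timeouts
```

For a line read from /proc/mounts, the options are the fourth whitespace-separated field, so `attribute_cache_timeouts(line.split()[3])` would apply. The same dict also exposes `lookupcache=pos`, which only affects caching of positive lookups.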

Bug#1019700: Fwd: Bug#1019700: mmc0: Timeout waiting for hardware cmd interrupt.

2022-09-13 Thread Hank Barta
---------- Forwarded message ---------
From: Hank Barta 
Date: Tue, Sep 13, 2022 at 12:54 PM
Subject: Re: Bug#1019700: mmc0: Timeout waiting for hardware cmd interrupt.
To: Bjørn Mork 


Hi Bjørn,

Many thanks for the prompt reply. In the meantime I have done the
following:

* Reimaged my SD card with `20220808_raspi_4_bookworm.img.xz` from Debian
Tested images. (5.18.14-1 kernel)
* Booted and noted *no* SD card timeouts. Rebooted and power cycled 3 times
each with the same result.
* Performed `apt update && apt upgrade -y` and rebooted. (5.19.6-1 kernel)
* First boot - repeated SD timeouts and unable to log in. Power cycled to
force a reboot.
* Second reboot - no SD card timeouts. Added `dtparam=sd_poll_once=on` to
`/boot/firmware/config.txt`
* Third boot - repeated SD card timeouts.

Eventually I was able to log in to the console. The network is not fully up;
the repeated SD timeouts seem to be slowing the normal boot. Actually, I may not
have been logged in, but rather at the console that is presented when there is a
problem booting. I exited and now I see a login prompt. Ethernet finally came up
737 seconds after boot, according to the console messages. (It was some time
later before I could ssh in.)

The SD timeout messages stopped. I have a login prompt at the console, but
it takes about 30s to log in. The system is now responsive, but the WiFi modules
did not load. I count 52 timeout messages in the dmesg output. There is no
response to  at the console. I tried to restart using `shutdown -r now` and
the system hangs.

The system is most certainly not operating normally.

Does Debian use the device tree? This is a Debian system, not R-Pi OS.

If I reboot enough times, I get a clean boot followed by normal operation. I
have tried different SD cards, USB SSDs and Pi 4Bs, all with the same result,
so I do not believe this is a H/W problem. I do recall the previous SD timeout
issue; I worked around it by inserting an SD card after boot, but that no
longer works. This seems to be a new problem.

best,
hank

On Tue, Sep 13, 2022 at 11:32 AM Bjørn Mork  wrote:

> Hank Barta  writes:
>
> > ** Kernel log:
> > [  723.735217] mmc0: sdhci: Timeout:   0x | Int stat: 0x00018000
> > [  723.741743] mmc0: sdhci: Int enab:  0x00ff1003 | Sig enab: 0x00ff1003
> > [  723.748270] mmc0: sdhci: ACmd stat: 0x | Slot int: 0x0001
> > [  723.754797] mmc0: sdhci: Caps:  0x45ee6432 | Caps_1:   0xa525
> > [  723.761324] mmc0: sdhci: Cmd:   0x0502 | Max curr: 0x00080008
> > [  723.767851] mmc0: sdhci: Resp[0]:   0x01aa | Resp[1]:  0x
> > [  723.774379] mmc0: sdhci: Resp[2]:   0x | Resp[3]:  0x
> > [  723.780905] mmc0: sdhci: Host ctl2: 0x
> > [  723.785404] mmc0: sdhci: ADMA Err:  0x | ADMA Ptr: 0x
> > [  723.791930] mmc0: sdhci: 
> > [  733.923993] mmc0: Timeout waiting for hardware cmd interrupt.
>
> These repeated messages are normal on the RPi4 if you boot it without an
> SD card.  E.g. from USB or network.  If that's what you intend to do,
> then you can avoid the repeated messages by adding
>
>  dtparam=sd_poll_once=on
>
> to the config.txt file in your firmware partition.  Often mounted as
> /boot/firmware/.
>
> The effect depends on which device-tree you are using.  I believe it
> will only work with the ones coming with the Raspberry Pi firmware.  See
>
> https://github.com/raspberrypi/firmware/blob/master/boot/overlays/README
>
> for docs.
>
>
> Bjørn
>


-- 
Beautiful Sunny Winfield




Bug#1019700: mmc0: Timeout waiting for hardware cmd interrupt.

2022-09-13 Thread Bjørn Mork
Hank Barta  writes:

> ** Kernel log:
> [  723.735217] mmc0: sdhci: Timeout:   0x | Int stat: 0x00018000
> [  723.741743] mmc0: sdhci: Int enab:  0x00ff1003 | Sig enab: 0x00ff1003
> [  723.748270] mmc0: sdhci: ACmd stat: 0x | Slot int: 0x0001
> [  723.754797] mmc0: sdhci: Caps:  0x45ee6432 | Caps_1:   0xa525
> [  723.761324] mmc0: sdhci: Cmd:   0x0502 | Max curr: 0x00080008
> [  723.767851] mmc0: sdhci: Resp[0]:   0x01aa | Resp[1]:  0x
> [  723.774379] mmc0: sdhci: Resp[2]:   0x | Resp[3]:  0x
> [  723.780905] mmc0: sdhci: Host ctl2: 0x
> [  723.785404] mmc0: sdhci: ADMA Err:  0x | ADMA Ptr: 0x
> [  723.791930] mmc0: sdhci: 
> [  733.923993] mmc0: Timeout waiting for hardware cmd interrupt.

These repeated messages are normal on the RPi4 if you boot it without an
SD card.  E.g. from USB or network.  If that's what you intend to do,
then you can avoid the repeated messages by adding

 dtparam=sd_poll_once=on

to the config.txt file in your firmware partition.  Often mounted as
/boot/firmware/.

The effect depends on which device-tree you are using.  I believe it
will only work with the ones coming with the Raspberry Pi firmware.  See

https://github.com/raspberrypi/firmware/blob/master/boot/overlays/README

for docs.


Bjørn
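Making that edit idempotently can be sketched as follows, assuming the firmware partition is mounted at /boot/firmware and the script runs as root; `ensure_config_line` is a hypothetical helper. As Bjørn notes, whether the parameter has any effect depends on which device tree is in use.

```python
from pathlib import Path


def ensure_config_line(path: Path, line: str = "dtparam=sd_poll_once=on") -> bool:
    """Append `line` to a config.txt-style file unless it is already present.

    Returns True if the file was changed. Existing lines are compared with
    surrounding whitespace stripped; nothing else in the file is modified.
    """
    text = path.read_text() if path.exists() else ""
    if any(existing.strip() == line for existing in text.splitlines()):
        return False
    if text and not text.endswith("\n"):
        text += "\n"  # keep the previous last line intact
    path.write_text(text + line + "\n")
    return True
```

Usage would be `ensure_config_line(Path("/boot/firmware/config.txt"))`. Because config.txt supports conditional sections such as `[pi4]` or `[all]`, an appended line applies to whichever section is open at the end of the file, so that is worth checking first.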



Bug#1019700: mmc0: Timeout waiting for hardware cmd interrupt.

2022-09-13 Thread Hank Barta
Behavior not seen before - panic during boot.

https://photos.app.goo.gl/txcEsUCJuqMGmK1K6


-- 
Beautiful Sunny Winfield


Bug#1019700: mmc0: Timeout waiting for hardware cmd interrupt.

2022-09-13 Thread Hank Barta
Package: src:linux
Version: 5.19.6-1
Severity: important
X-Debbugs-Cc: hba...@gmail.com

Dear Maintainer,

*** Reporter, please consider answering these questions, where appropriate ***

   * What led up to the situation?

Apparent inability to initialize/connect to the SD card H/W. This leads to the
message below, which is repeated about every 10s. It can manifest in five ways:

1. Failure to boot - continuous retries to read the SD card.
2. If a USB SSD is connected, it can skip the SD card and boot from the SATA
   SSD. (That is the condition as I prepare this report.)
3. Completes boot, the message repeats, there are no /dev/mmc* entries and the
   WiFi H/W is not recognized.
4. Completes boot, the messages are repeated, but /dev/mmc entries are present
   and an SD card can be mounted and read. WiFi also appears to be working.
5. Completes boot, no SD card timeout messages are reported and the system
   operates normally.


   * What exactly did you do (or not do) that was effective (or
 ineffective)?
   * What was the outcome of this action?
   * What outcome did you expect instead?

I built kernel 5.19.8 and found the same problem behavior. I booted a different
SSD with Bullseye installed, running the 5.10.0 kernel, and do not see this
issue. (Likely unrelated: the 5.10 and 5.19.0 kernels had a lot of vc4-related
errors that seem to be fixed in 5.19.8.)

Additional information:

The 5.19.8 kernel was built with the options found at 
https://github.com/HankB/Debian-Arm64-kernel-for-Pi-4B-on-X86_64 

I have saved dmesg output from a normal boot and from a boot that exhibited the
timeout (but was otherwise able to complete booting) on paste.debian.net:

Normal -  https://paste.debian.net/1253718/
Timeout - https://paste.debian.net/1253719/

Since the kernel log below doesn't include the information at the beginning of
the `dmesg` output, I meant to capture it again, but the dmesg buffer has
already overflowed. If needed for this kernel, I can duplicate the situation
and capture the log before it overflows.


-- Package-specific info:
** Version:
Linux version 5.19.0-1-arm64 (debian-kernel@lists.debian.org) (gcc-11 (Debian 
11.3.0-5) 11.3.0, GNU ld (GNU Binutils for Debian) 2.38.90.20220713) #1 SMP 
Debian 5.19.6-1 (2022-09-01)

** Command line:
video=HDMI-A-1:1600x1200M@60 dma.dmachans=0x37f5 bcm2709.boardrev=0xc03111 
bcm2709.serial=0x44557cae bcm2709.uart_clock=4800 bcm2709.disk_led_gpio=42 
bcm2709.disk_led_active_low=0 smsc95xx.macaddr=DC:A6:32:09:C6:71 
vc_mem.mem_base=0x3ec0 vc_mem.mem_size=0x4000  console=tty0 
console=ttyS1,115200 root=LABEL=RASPIROOT rw fsck.repair=yes net.ifnames=0  
rootwait

** Tainted: WC (1536)
 * kernel issued warning
 * staging driver was loaded

** Kernel log:
[  723.735217] mmc0: sdhci: Timeout:   0x | Int stat: 0x00018000
[  723.741743] mmc0: sdhci: Int enab:  0x00ff1003 | Sig enab: 0x00ff1003
[  723.748270] mmc0: sdhci: ACmd stat: 0x | Slot int: 0x0001
[  723.754797] mmc0: sdhci: Caps:  0x45ee6432 | Caps_1:   0xa525
[  723.761324] mmc0: sdhci: Cmd:   0x0502 | Max curr: 0x00080008
[  723.767851] mmc0: sdhci: Resp[0]:   0x01aa | Resp[1]:  0x
[  723.774379] mmc0: sdhci: Resp[2]:   0x | Resp[3]:  0x
[  723.780905] mmc0: sdhci: Host ctl2: 0x
[  723.785404] mmc0: sdhci: ADMA Err:  0x | ADMA Ptr: 0x
[  723.791930] mmc0: sdhci: 
[  733.923993] mmc0: Timeout waiting for hardware cmd interrupt.
[  733.929837] mmc0: sdhci:  SDHCI REGISTER DUMP ===
[  733.936364] mmc0: sdhci: Sys addr:  0x | Version:  0x1002
[  733.942892] mmc0: sdhci: Blk size:  0x | Blk cnt:  0x
[  733.949420] mmc0: sdhci: Argument:  0x | Trn mode: 0x
[  733.955946] mmc0: sdhci: Present:   0x1fff | Host ctl: 0x0001
[  733.962473] mmc0: sdhci: Power: 0x000f | Blk gap:  0x0080
[  733.969001] mmc0: sdhci: Wake-up:   0x | Clock:0xfa07
[  733.975528] mmc0: sdhci: Timeout:   0x | Int stat: 0x00018000
[  733.982055] mmc0: sdhci: Int enab:  0x00ff1003 | Sig enab: 0x00ff1003
[  733.988582] mmc0: sdhci: ACmd stat: 0x | Slot int: 0x0001
[  733.995109] mmc0: sdhci: Caps:  0x45ee6432 | Caps_1:   0xa525
[  734.001636] mmc0: sdhci: Cmd:   0x0502 | Max curr: 0x00080008
[  734.008163] mmc0: sdhci: Resp[0]:   0x01aa | Resp[1]:  0x
[  734.014689] mmc0: sdhci: Resp[2]:   0x | Resp[3]:  0x
[  734.021216] mmc0: sdhci: Host ctl2: 0x
[  734.025716] mmc0: sdhci: ADMA Err:  0x | ADMA Ptr: 0x
[  734.032242] mmc0: sdhci: 
[  744.164283] mmc0: Timeout waiting for hardware cmd interrupt.
[  744.170128] mmc0: sdhci:  SDHCI REGISTER DUMP ===
[  744.176655] mmc0: sdhci: Sys addr:  0x | Version:  0x1002
[  744.183183] mmc0: sdhci: Blk size:  0x | Blk cnt:  0x
[  744.189711] mmc0: 

Bug#1019660: console-setup: grep: warning: stray \ before #

2022-09-13 Thread Jakub Wilk

Package: console-setup
Version: 1.210
Severity: minor
Control: affects -1 + initramfs-tools

With grep (>= 3.8), I'm getting this warning:

  # setupcon
  grep: warning: stray \ before #


-- System Information:
Architecture: i386

Versions of packages console-setup depends on:
ii  debconf                 1.5.79
ii  console-setup-linux     1.210
ii  xkb-data                2.35.1-1
ii  keyboard-configuration  1.210

--
Jakub Wilk