Bug#1069077: rockpro64: multiple kernel oops and frequent boot failures

2024-05-17 Thread Forest
Control: fixed -1 6.8.9-1

On Fri, 17 May 2024 12:15:55 +0200, Diederik de Haas wrote:

>Kernel 6.8.9 has recently been uploaded to Unstable which has that commit.
>Can you verify that it indeed fixes this bug?

Indeed, it seems to be fixed there. The bug usually shows up within one or
two boots, but I didn't see it in five reboots with kernel 6.8.9. This
matches what I found while bisecting for the past week.

Note that I have not examined the es8316 driver code or its relationship to
maple_tree. I don't know if the bug was in maple_tree and now fixed, or
still lurks within the driver but is now hidden as a result of the
maple_tree changes. In any case, I'm happy to report that I no longer see it
breaking the OS in this newer kernel.



Bug#1069077: rockpro64: multiple kernel oops and frequent boot failures

2024-05-17 Thread Diederik de Haas
On Friday, 17 May 2024 03:36:35 CEST Forest wrote:
> A git bisect reveals it to be fixed by this commit:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f7a59018953910032231c0a019208c4b0a4a8bc3
> > maple_tree: make mas_erase() more robust
> > 
> > mas_erase() may not deal correctly with all maple states.  Make the
> > function more robust by ensuring the state is in one of the two acceptable
> > states.

Kernel 6.8.9 has recently been uploaded to Unstable which has that commit.
Can you verify that it indeed fixes this bug?



Bug#1069077:

2024-05-16 Thread Forest
A git bisect reveals it to be fixed by this commit:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f7a59018953910032231c0a019208c4b0a4a8bc3

> maple_tree: make mas_erase() more robust
> 
> mas_erase() may not deal correctly with all maple states.  Make the
> function more robust by ensuring the state is in one of the two acceptable
> states.



Bug#1069077:

2024-05-04 Thread Forest
Control: found -1 6.7.12-1



Bug#1069077: Re: Bug#1069077: es8316 driver causes kernel oops / panic on rockpro64

2024-04-16 Thread Forest
Control: found -1 6.6.15-2


On Tue, 16 Apr 2024 10:34:45 +0200, Diederik de Haas wrote:

>Can you try the Debian Testing kernel, which is at version 6.6.15?

6.6.15-2 also has the bug.



Bug#1069077: es8316 driver causes kernel oops / panic on rockpro64

2024-04-16 Thread Diederik de Haas
Control: retitle -1 es8316 driver causes kernel oops / panic on rockpro64

On Tuesday, 16 April 2024 01:37:57 CEST Forest wrote:
> The current debian unstable kernel causes a variety of failures that are not
> present in the bookworm kernel, on the RockPro64 single board computer.
> (This is an arm64 machine built upon the Rockchip rk3399 SoC.)

On Tuesday, 16 April 2024 05:21:13 CEST Forest wrote:
> Blacklisting the snd_soc_es8316 module in /etc/modprobe.d seems to restore
> kernel stability, as far as I have seen from half a dozen reboots.

Can you try the Debian Testing kernel, which is at version 6.6.15?

If the 6.6.15 kernel does work properly, then the 3 commits for
the es8316 driver in kernel 6.7 are the most likely suspects:
869f30782cda ASoC: es8316: Enable support for MCLK div by 2
a43c0dc1004c ASoC: es8316: Replace NR_SUPPORTED_MCLK_LRCK_RATIOS with ARRAY_SIZE()
2f06f231f0bf ASoC: es8316: Enable support for S32 LE format
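
For readers following along, the clock selection that the first of these commits introduced (and that patch 1/3 removes) can be sketched as a standalone C fragment. This is only an illustration under stated assumptions: the ratio table is made up for the example and is not copied from the driver, and the helper names clk_matches_rate() and pick_clk() are hypothetical. It tries the halved sysclk first and falls back to the full sysclk, as the driver comment in the patch describes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative MCLK/LRCK ratio table (assumed for this sketch,
 * not copied from sound/soc/codecs/es8316.c). */
static const unsigned int ratios[] = { 256, 384, 400, 512, 768, 1024 };

/* True if clk divided by one of the ratios gives exactly the sample rate. */
static bool clk_matches_rate(unsigned int clk, unsigned int rate)
{
	for (size_t i = 0; i < sizeof(ratios) / sizeof(ratios[0]); i++) {
		if (clk % ratios[i] != 0)
			continue;
		if (clk / ratios[i] == rate)
			return true;
	}
	return false;
}

/* Prefer sysclk / 2, then the full sysclk.
 * Returns the chosen clock, or 0 where the driver would return -EINVAL. */
static unsigned int pick_clk(unsigned int sysclk, unsigned int rate)
{
	unsigned int half = sysclk / 2;

	if (clk_matches_rate(half, rate))
		return half;
	if (clk_matches_rate(sysclk, rate))
		return sysclk;
	return 0;
}
```

With this table, pick_clk(24576000, 48000) selects the halved clock (12288000), while pick_clk(19200000, 48000) falls back to the full clock through ratio 400. The revert restores the older behaviour, which corresponds to calling clk_matches_rate(sysclk, rate) alone.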

Attached you'll find 3 patches which revert those commits.
https://kernel-team.pages.debian.net/kernel-handbook/ch-common-tasks.html#id-1.6.6.4
describes a procedure with which you can apply patches to the
(6.7) kernel. If that makes the 6.7 kernel work properly again, we
likely have found the culprit for the kernel oops/panic.

Can you first try the Testing (6.6.15) kernel and if that works try
applying the attached patches to the 6.7 kernel?

From 407672343a738ede6f5e955e3afa57d16b37f4e6 Mon Sep 17 00:00:00 2001
From: Diederik de Haas 
Date: Tue, 16 Apr 2024 10:24:39 +0200
Subject: [PATCH 1/3] Revert "ASoC: es8316: Enable support for MCLK div by 2"

This reverts commit 869f30782cdad0a86598a700a864e4a2bf44f8cc.
---
 sound/soc/codecs/es8316.c | 45 ++-
 sound/soc/codecs/es8316.h |  3 ---
 2 files changed, 11 insertions(+), 37 deletions(-)

diff --git a/sound/soc/codecs/es8316.c b/sound/soc/codecs/es8316.c
index e53b2856d625..a1c3e10c3cf1 100644
--- a/sound/soc/codecs/es8316.c
+++ b/sound/soc/codecs/es8316.c
@@ -469,42 +469,19 @@ static int es8316_pcm_hw_params(struct snd_pcm_substream *substream,
 	u8 bclk_divider;
 	u16 lrck_divider;
 	int i;
-	unsigned int clk = es8316->sysclk / 2;
-	bool clk_valid = false;
-
-	/* We will start with halved sysclk and see if we can use it
-	 * for proper clocking. This is to minimise the risk of running
-	 * the CODEC with a too high frequency. We have an SKU where
-	 * the sysclk frequency is 48Mhz and this causes the sound to be
-	 * sped up. If we can run with a halved sysclk, we will use it,
-	 * if we can't use it, then full sysclk will be used.
-	 */
-	do {
-		/* Validate supported sample rates that are autodetected from MCLK */
-		for (i = 0; i < ARRAY_SIZE(supported_mclk_lrck_ratios); i++) {
-			const unsigned int ratio = supported_mclk_lrck_ratios[i];
-
-			if (clk % ratio != 0)
-				continue;
-			if (clk / ratio == params_rate(params))
-				break;
-		}
-		if (i == ARRAY_SIZE(supported_mclk_lrck_ratios)) {
-			if (clk == es8316->sysclk)
-				return -EINVAL;
-			clk = es8316->sysclk;
-		} else {
-			clk_valid = true;
-		}
-	} while (!clk_valid);
 
-	if (clk != es8316->sysclk) {
-		snd_soc_component_update_bits(component, ES8316_CLKMGR_CLKSW,
-					      ES8316_CLKMGR_CLKSW_MCLK_DIV,
-					      ES8316_CLKMGR_CLKSW_MCLK_DIV);
-	}
+	/* Validate supported sample rates that are autodetected from MCLK */
+	for (i = 0; i < ARRAY_SIZE(supported_mclk_lrck_ratios); i++) {
+		const unsigned int ratio = supported_mclk_lrck_ratios[i];
 
-	lrck_divider = clk / params_rate(params);
+		if (es8316->sysclk % ratio != 0)
+			continue;
+		if (es8316->sysclk / ratio == params_rate(params))
+			break;
+	}
+	if (i == ARRAY_SIZE(supported_mclk_lrck_ratios))
+		return -EINVAL;
+	lrck_divider = es8316->sysclk / params_rate(params);
 	bclk_divider = lrck_divider / 4;
 	switch (params_format(params)) {
 	case SNDRV_PCM_FORMAT_S16_LE:
diff --git a/sound/soc/codecs/es8316.h b/sound/soc/codecs/es8316.h
index 0ff16f948690..c335138e2837 100644
--- a/sound/soc/codecs/es8316.h
+++ b/sound/soc/codecs/es8316.h
@@ -129,7 +129,4 @@
 #define ES8316_GPIO_FLAG_GM_NOT_SHORTED		0x02
 #define ES8316_GPIO_FLAG_HP_NOT_INSERTED	0x04
 
-/* ES8316_CLKMGR_CLKSW */
-#define ES8316_CLKMGR_CLKSW_MCLK_DIV	0x80
-
 #endif
-- 
2.43.0

From c309d8cf7e3c192683beacb3781458a2f8bfef81 Mon Sep 17 00:00:00 2001
From: Diederik de Haas 
Date: Tue, 16 Apr 2024 10:24:55 +0200
Subject: [PATCH 2/3] Revert "ASoC: es8316: Replace
 NR_SUPPORTED_MCLK_LRCK_RATIOS with ARRAY_SIZE()"

This reverts commit a43c0dc1004cbe2edbae9b6e6793db71f6896449.
---
 sound/soc/codecs/es8316.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/sound/soc/codecs/es8316.c b/sound/soc/codecs/es8316.c
index a1c3e10c3cf1..09fc0b25f600 100644
--- a/sound/soc/codecs/es8316.c
+++ b/sound/soc/codecs/es8316.c
@@ -27,6 +27,7 @@
  * MCLK/LRCK ratios, but we also add ratio 400, which is commonly used on
  * Intel Cherry Trail platforms (19.2MHz MCLK, 48kHz LRCK).
  */
+#define NR_S

Bug#1069077:

2024-04-15 Thread Forest
Control: retitle es8316 driver causes kernel oops / panic on rockpro64

Blacklisting the snd_soc_es8316 module in /etc/modprobe.d seems to restore
kernel stability, as far as I have seen from half a dozen reboots.



Bug#1069077: rockpro64: multiple kernel oops and frequent boot failures

2024-04-15 Thread Forest
Package: src:linux
Version: 6.7.9-2
Severity: important
X-Debbugs-Cc: fores...@sonic.net

Dear Maintainer,

The current debian unstable kernel causes a variety of failures that are not
present in the bookworm kernel, on the RockPro64 single board computer. (This
is an arm64 machine built upon the Rockchip rk3399 SoC.)

The system is sometimes able to reach a state where sshd login works, allowing
me to run reportbug, but not always. Regardless of whether it gets that far,
dmesg often contains one or more stack traces, along with messages like these:

  kernel BUG at mm/slub.c:448!
  Internal error: Oops - BUG: f2000800 [#1] SMP

  WARNING: CPU: 2 PID: 0 at kernel/context_tracking.c:128 ct_kernel_exit.isra.0+0xa0/0xa8

  Unable to handle kernel paging request at virtual address 4daee1bbcd3980fb

I have noticed es8316 driver error messages preceding some of these stack
traces, though I'm not sure if that is always the case.

Sometimes the stack traces appear only once, during boot, and the system
appears to run normally after that. Other times, they appear every few minutes,
and various things like network services and the ability to cleanly shut down,
or even log in at the serial console, fail. In one case, I noticed a message
mentioning a kernel panic in the serial console output when I was trying to
shut down.

Since the worst examples of failure prevent me from logging in, I am unable to
run reportbug to capture information about those cases.

Reverting to linux-image-6.1.0-20-arm64 solves the problem.


-- Package-specific info:
** Version:
Linux version 6.7.9-arm64 (debian-ker...@lists.debian.org) (aarch64-linux-gnu-gcc-13 (Debian 13.2.0-18) 13.2.0, GNU ld (GNU Binutils for Debian) 2.42) #1 SMP Debian 6.7.9-2 (2024-03-13)

** Command line:
root=/dev/mapper/ console=ttyS2,150n8 net.ifnames=0

** Tainted: DWC (1664)
 * kernel died recently, i.e. there was an OOPS or BUG
 * kernel issued warning
 * staging driver was loaded
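
The number in parentheses is the kernel's taint bitmask, and the letters are the flags for the bits that are set. As a rough sketch of how the two relate, the fragment below maps bit positions to flag letters. The letter order is taken from the kernel's tainted-kernels documentation for bits 0 through 17 only, and decode_taint() is a hypothetical helper written for this example, not a kernel or reportbug function.

```c
#include <stddef.h>

/* Taint flag letters indexed by bit position (bits 0-17), following
 * Documentation/admin-guide/tainted-kernels.rst. Later bits are omitted. */
static const char taint_letters[] = "PFSRMBUDAWCIOELKXT";

/* Write the letters of all set bits into out as a NUL-terminated string.
 * Letters that do not fit in outlen bytes are silently dropped. */
static void decode_taint(unsigned long mask, char *out, size_t outlen)
{
	size_t n = 0;

	for (size_t bit = 0; bit < sizeof(taint_letters) - 1; bit++) {
		if ((mask & (1UL << bit)) && n + 1 < outlen)
			out[n++] = taint_letters[bit];
	}
	if (outlen > 0)
		out[n] = '\0';
}
```

For the value reported here, 1664 = 128 + 512 + 1024, so bits 7, 9 and 10 are set and decode_taint(1664, buf, sizeof(buf)) leaves "DWC" in buf, matching the three bullet points above.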

** Kernel log:
[   56.250803]  driver_attach+0x2c/0x40
[   56.250809]  bus_add_driver+0x11c/0x238
[   56.250814]  driver_register+0x64/0x138
[   56.250821]  __platform_driver_register+0x30/0x48
[   56.252550]  graph_card_init+0x28/0xff8 [snd_soc_audio_graph_card]
[   56.252565]  do_one_initcall+0x60/0x298
[   56.252574]  do_init_module+0x60/0x218
[   56.252581]  load_module+0x22b4/0x23b8
[   56.252588]  __do_sys_init_module+0x230/0x290
[   56.252593]  __arm64_sys_init_module+0x24/0x38
[   56.252599]  invoke_syscall+0x78/0x100
[   56.252609]  el0_svc_common.constprop.0+0xc8/0xf0
[   56.252617]  do_el0_svc+0x24/0x38
[   56.252624]  el0_svc+0x3c/0x108
[   56.252633]  el0t_64_sync_handler+0x120/0x130
[   56.252639]  el0t_64_sync+0x190/0x198
[   56.256943] Code: 52800024 97fff9b4 a94563f7 17d0 (d421) 
[   56.256952] ---[ end trace  ]---
[   56.256957] note: (udev-worker)[554] exited with irqs disabled
[   56.257262] [ cut here ]
[   56.258816] WARNING: CPU: 2 PID: 0 at kernel/context_tracking.c:128 ct_kernel_exit.isra.0+0xa0/0xa8
[   56.259633] Modules linked in: snd_soc_audio_graph_card(+) 
snd_soc_simple_card snd_soc_rockchip_i2s evdev snd_soc_spdif_tx 
snd_soc_simple_card_utils snd_soc_es8316 snd_soc_hdmi_codec v4l2_vp9 
rockchip_rga v4l2_h264 videobuf2_dma_contig snd_soc_core v4l2_mem2mem 
sha512_arm64 videobuf2_dma_sg governor_simpleondemand snd_compress 
snd_pcm_dmaengine snd_pcm videobuf2_memops panfrost dw_wdt videobuf2_v4l2 
snd_timer ofpart gpu_sched snd drbg(+) leds_gpio pwm_fan drm_shmem_helper 
spi_nor videodev des_generic ansi_cprng dw_hdmi_i2s_audio dw_hdmi_cec rk_crypto 
ecdh_generic(+) rockchip_saradc gpio_ir_recv rfkill videobuf2_common mc 
crypto_engine ecc nvmem_rockchip_efuse soundcore libdes mtd rockchip_thermal 
coresight_cpu_debug industrialio_triggered_buffer sg kfifo_buf coresight_etm4x 
rockchip_dfi industrialio coresight cpufreq_dt loop efi_pstore configfs 
ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic dm_crypt 
dm_mod dax sd_mod t10_pi xhci_plat_hcd xhci_hcd crc64_rocksoft_generic 
crc64_rocksoft crc_t10dif
[   56.259856]  crct10dif_generic crc64 realtek ahci libahci libata 
rk808_regulator dwc3 scsi_mod udc_core scsi_common fusb302 tcpm ulpi typec 
crct10dif_ce crct10dif_common polyval_ce rockchipdrm polyval_generic dw_hdmi 
dwmac_rk fan53555 cec ghash_ce stmmac_platform rc_core stmmac gf128mul 
dw_mipi_dsi analogix_dp sha2_ce pcs_xpcs pwm_regulator sha256_arm64 
drm_display_helper phylink ohci_platform sha1_ce dwc3_of_simple of_mdio 
gpio_rockchip gpio_keys ohci_hcd ehci_platform drm_dma_helper fixed_phy 
sdhci_of_arasan ehci_hcd sdhci_pltfm drm_kms_helper cqhci dw_mmc_rockchip 
fwnode_mdio phy_rockchip_inno_usb2 phy_rockchip_emmc phy_rockchip_pcie 
phy_rockchip_typec usbcore io_domain pl330 pwm_rockchip spi_rockchip drm 
dw_mmc_pltfm sdhci libphy dw_mmc i2c_rk3x usb_common fixed aes_neon_bs 
aes_neon_blk aes_ce_blk aes_ce_cipher
[   56.274047] CPU: 2 PID: 0 Comm: swap