Bug#1069077: rockpro64: multiple kernel oops and frequent boot failures
Control: fixed -1 6.8.9-1

On Fri, 17 May 2024 12:15:55 +0200, Diederik de Haas wrote:
>Kernel 6.8.9 has recently been uploaded to Unstable which has that commit.
>Can you verify that it indeed fixes this bug?

Indeed, it seems to be fixed there. The bug usually shows up within one or two boots, but I didn't see it in five reboots with kernel 6.8.9. This matches what I found while bisecting for the past week.

Note that I have not examined the es8316 driver code or its relationship to maple_tree. I don't know whether the bug was in maple_tree and is now fixed, or whether it still lurks within the driver and is merely hidden as a result of the maple_tree changes. In any case, I'm happy to report that I no longer see it breaking the OS in this newer kernel.
Bug#1069077: rockpro64: multiple kernel oops and frequent boot failures
On Friday, 17 May 2024 03:36:35 CEST Forest wrote:
> A git bisect reveals it to be fixed by this commit:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f7a59018953910032231c0a019208c4b0a4a8bc3
>
> > maple_tree: make mas_erase() more robust
> >
> > mas_erase() may not deal correctly with all maple states. Make the
> > function more robust by ensuring the state is in one of the two acceptable
> > states.

Kernel 6.8.9 has recently been uploaded to Unstable which has that commit.
Can you verify that it indeed fixes this bug?
Bug#1069077:
A git bisect reveals it to be fixed by this commit:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f7a59018953910032231c0a019208c4b0a4a8bc3

> maple_tree: make mas_erase() more robust
>
> mas_erase() may not deal correctly with all maple states. Make the
> function more robust by ensuring the state is in one of the two acceptable
> states.
Bug#1069077:
Control: found -1 6.7.12-1
Bug#1069077: Re: Bug#1069077: es8316 driver causes kernel oops / panic on rockpro64
Control: found -1 6.6.15-2

On Tue, 16 Apr 2024 10:34:45 +0200, Diederik de Haas wrote:
>Can you try the Debian Testing kernel, which is at version 6.6.15?

6.6.15-2 also has the bug.
Bug#1069077: es8316 driver causes kernel oops / panic on rockpro64
Control: retitle -1 es8316 driver causes kernel oops / panic on rockpro64

On Tuesday, 16 April 2024 01:37:57 CEST Forest wrote:
> The current debian unstable kernel causes a variety of failures that are not
> present in the bookworm kernel, on the RockPro64 single board computer.
> (This is an arm64 machine built upon the Rockchip rk3399 SoC.)

On Tuesday, 16 April 2024 05:21:13 CEST Forest wrote:
> Blacklisting the snd_soc_es8316 module in /etc/modprobe.d seems to restore
> kernel stability, as far as I have seen from half a dozen reboots.

Can you try the Debian Testing kernel, which is at version 6.6.15?

If the 6.6.15 kernel does work properly, then the 3 commits for the es8316 driver in kernel 6.7 are the most likely suspects:
869f30782cda ASoC: es8316: Enable support for MCLK div by 2
a43c0dc1004c ASoC: es8316: Replace NR_SUPPORTED_MCLK_LRCK_RATIOS with ARRAY_SIZE()
2f06f231f0bf ASoC: es8316: Enable support for S32 LE format

Attached you'll find 3 patches which revert those commits.
https://kernel-team.pages.debian.net/kernel-handbook/ch-common-tasks.html#id-1.6.6.4 describes a procedure with which you can apply patches to the (6.7) kernel.
If that makes the 6.7 kernel work properly again, we have likely found the culprit for the kernel oops/panic.

Can you first try the Testing (6.6.15) kernel and, if that works, try applying the attached patches to the 6.7 kernel?

From 407672343a738ede6f5e955e3afa57d16b37f4e6 Mon Sep 17 00:00:00 2001
From: Diederik de Haas
Date: Tue, 16 Apr 2024 10:24:39 +0200
Subject: [PATCH 1/3] Revert "ASoC: es8316: Enable support for MCLK div by 2"

This reverts commit 869f30782cdad0a86598a700a864e4a2bf44f8cc.
---
 sound/soc/codecs/es8316.c | 45 ++-
 sound/soc/codecs/es8316.h |  3 ---
 2 files changed, 11 insertions(+), 37 deletions(-)

diff --git a/sound/soc/codecs/es8316.c b/sound/soc/codecs/es8316.c
index e53b2856d625..a1c3e10c3cf1 100644
--- a/sound/soc/codecs/es8316.c
+++ b/sound/soc/codecs/es8316.c
@@ -469,42 +469,19 @@ static int es8316_pcm_hw_params(struct snd_pcm_substream *substream,
 	u8 bclk_divider;
 	u16 lrck_divider;
 	int i;
-	unsigned int clk = es8316->sysclk / 2;
-	bool clk_valid = false;
-
-	/* We will start with halved sysclk and see if we can use it
-	 * for proper clocking. This is to minimise the risk of running
-	 * the CODEC with a too high frequency. We have an SKU where
-	 * the sysclk frequency is 48Mhz and this causes the sound to be
-	 * sped up. If we can run with a halved sysclk, we will use it,
-	 * if we can't use it, then full sysclk will be used.
-	 */
-	do {
-		/* Validate supported sample rates that are autodetected from MCLK */
-		for (i = 0; i < ARRAY_SIZE(supported_mclk_lrck_ratios); i++) {
-			const unsigned int ratio = supported_mclk_lrck_ratios[i];
-
-			if (clk % ratio != 0)
-				continue;
-			if (clk / ratio == params_rate(params))
-				break;
-		}
-		if (i == ARRAY_SIZE(supported_mclk_lrck_ratios)) {
-			if (clk == es8316->sysclk)
-				return -EINVAL;
-			clk = es8316->sysclk;
-		} else {
-			clk_valid = true;
-		}
-	} while (!clk_valid);
 
-	if (clk != es8316->sysclk) {
-		snd_soc_component_update_bits(component, ES8316_CLKMGR_CLKSW,
-					      ES8316_CLKMGR_CLKSW_MCLK_DIV,
-					      ES8316_CLKMGR_CLKSW_MCLK_DIV);
-	}
+	/* Validate supported sample rates that are autodetected from MCLK */
+	for (i = 0; i < ARRAY_SIZE(supported_mclk_lrck_ratios); i++) {
+		const unsigned int ratio = supported_mclk_lrck_ratios[i];
 
-	lrck_divider = clk / params_rate(params);
+		if (es8316->sysclk % ratio != 0)
+			continue;
+		if (es8316->sysclk / ratio == params_rate(params))
+			break;
+	}
+	if (i == ARRAY_SIZE(supported_mclk_lrck_ratios))
+		return -EINVAL;
+	lrck_divider = es8316->sysclk / params_rate(params);
 	bclk_divider = lrck_divider / 4;
 	switch (params_format(params)) {
 	case SNDRV_PCM_FORMAT_S16_LE:
diff --git a/sound/soc/codecs/es8316.h b/sound/soc/codecs/es8316.h
index 0ff16f948690..c335138e2837 100644
--- a/sound/soc/codecs/es8316.h
+++ b/sound/soc/codecs/es8316.h
@@ -129,7 +129,4 @@
 #define ES8316_GPIO_FLAG_GM_NOT_SHORTED	0x02
 #define ES8316_GPIO_FLAG_HP_NOT_INSERTED	0x04
 
-/* ES8316_CLKMGR_CLKSW */
-#define ES8316_CLKMGR_CLKSW_MCLK_DIV	0x80
-
 #endif
-- 
2.43.0

From c309d8cf7e3c192683beacb3781458a2f8bfef81 Mon Sep 17 00:00:00 2001
From: Diederik de Haas
Date: Tue, 16 Apr 2024 10:24:55 +0200
Subject: [PATCH 2/3] Revert "ASoC: es8316: Replace NR_SUPPORTED_MCLK_LRCK_RATIOS with ARRAY_SIZE()"

This reverts commit a43c0dc1004cbe2edbae9b6e6793db71f6896449.
---
 sound/soc/codecs/es8316.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/sound/soc/codecs/es8316.c b/sound/soc/codecs/es8316.c
index a1c3e10c3cf1..09fc0b25f600 100644
--- a/sound/soc/codecs/es8316.c
+++ b/sound/soc/codecs/es8316.c
@@ -27,6 +27,7 @@
  * MCLK/LRCK ratios, but we also add ratio 400, which is commonly used on
  * Intel Cherry Trail platforms (19.2MHz MCLK, 48kHz LRCK).
  */
+#define NR_S
Bug#1069077:
Control: retitle es8316 driver causes kernel oops / panic on rockpro64

Blacklisting the snd_soc_es8316 module in /etc/modprobe.d seems to restore kernel stability, as far as I have seen from half a dozen reboots.
Bug#1069077: rockpro64: multiple kernel oops and frequent boot failures
Package: src:linux
Version: 6.7.9-2
Severity: important
X-Debbugs-Cc: fores...@sonic.net

Dear Maintainer,

The current debian unstable kernel causes a variety of failures that are not present in the bookworm kernel, on the RockPro64 single board computer. (This is an arm64 machine built upon the Rockchip rk3399 SoC.)

The system is sometimes able to reach a state where sshd login works, allowing me to run reportbug, but not always. Regardless of whether it gets that far, dmesg often contains one or more stack traces, along with messages like these:

kernel BUG at mm/slub.c:448!
Internal error: Oops - BUG: f2000800 [#1] SMP
WARNING: CPU: 2 PID: 0 at kernel/context_tracking.c:128 ct_kernel_exit.isra.0+0xa0/0xa8
Unable to handle kernel paging request at virtual address 4daee1bbcd3980fb

I have noticed es8316 driver error messages preceding some of these stack traces, though I'm not sure if that is always the case.

Sometimes the stack traces appear only once, during boot, and the system appears to run normally after that. Other times, they appear every few minutes, and various things fail: network services, the ability to cleanly shut down, or even logging in at the serial console. In one case, I noticed a message mentioning a kernel panic in the serial console output when I was trying to shut down.

Since the worst examples of failure prevent me from logging in, I am unable to run reportbug to capture information about those cases.

Reverting to linux-image-6.1.0-20-arm64 solves the problem.

-- Package-specific info:
** Version:
Linux version 6.7.9-arm64 (debian-ker...@lists.debian.org) (aarch64-linux-gnu-gcc-13 (Debian 13.2.0-18) 13.2.0, GNU ld (GNU Binutils for Debian) 2.42) #1 SMP Debian 6.7.9-2 (2024-03-13)

** Command line:
root=/dev/mapper/ console=ttyS2,150n8 net.ifnames=0

** Tainted: DWC (1664)
 * kernel died recently, i.e. there was an OOPS or BUG
 * kernel issued warning
 * staging driver was loaded

** Kernel log:
[ 56.250803] driver_attach+0x2c/0x40
[ 56.250809] bus_add_driver+0x11c/0x238
[ 56.250814] driver_register+0x64/0x138
[ 56.250821] __platform_driver_register+0x30/0x48
[ 56.252550] graph_card_init+0x28/0xff8 [snd_soc_audio_graph_card]
[ 56.252565] do_one_initcall+0x60/0x298
[ 56.252574] do_init_module+0x60/0x218
[ 56.252581] load_module+0x22b4/0x23b8
[ 56.252588] __do_sys_init_module+0x230/0x290
[ 56.252593] __arm64_sys_init_module+0x24/0x38
[ 56.252599] invoke_syscall+0x78/0x100
[ 56.252609] el0_svc_common.constprop.0+0xc8/0xf0
[ 56.252617] do_el0_svc+0x24/0x38
[ 56.252624] el0_svc+0x3c/0x108
[ 56.252633] el0t_64_sync_handler+0x120/0x130
[ 56.252639] el0t_64_sync+0x190/0x198
[ 56.256943] Code: 52800024 97fff9b4 a94563f7 17d0 (d421)
[ 56.256952] ---[ end trace ]---
[ 56.256957] note: (udev-worker)[554] exited with irqs disabled
[ 56.257262] [ cut here ]
[ 56.258816] WARNING: CPU: 2 PID: 0 at kernel/context_tracking.c:128 ct_kernel_exit.isra.0+0xa0/0xa8
[ 56.259633] Modules linked in: snd_soc_audio_graph_card(+) snd_soc_simple_card snd_soc_rockchip_i2s evdev snd_soc_spdif_tx snd_soc_simple_card_utils snd_soc_es8316 snd_soc_hdmi_codec v4l2_vp9 rockchip_rga v4l2_h264 videobuf2_dma_contig snd_soc_core v4l2_mem2mem sha512_arm64 videobuf2_dma_sg governor_simpleondemand snd_compress snd_pcm_dmaengine snd_pcm videobuf2_memops panfrost dw_wdt videobuf2_v4l2 snd_timer ofpart gpu_sched snd drbg(+) leds_gpio pwm_fan drm_shmem_helper spi_nor videodev des_generic ansi_cprng dw_hdmi_i2s_audio dw_hdmi_cec rk_crypto ecdh_generic(+) rockchip_saradc gpio_ir_recv rfkill videobuf2_common mc crypto_engine ecc nvmem_rockchip_efuse soundcore libdes mtd rockchip_thermal coresight_cpu_debug industrialio_triggered_buffer sg kfifo_buf coresight_etm4x rockchip_dfi industrialio coresight cpufreq_dt loop efi_pstore configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic dm_crypt dm_mod dax sd_mod t10_pi xhci_plat_hcd xhci_hcd crc64_rocksoft_generic crc64_rocksoft crc_t10dif
[ 56.259856] crct10dif_generic crc64 realtek ahci libahci libata rk808_regulator dwc3 scsi_mod udc_core scsi_common fusb302 tcpm ulpi typec crct10dif_ce crct10dif_common polyval_ce rockchipdrm polyval_generic dw_hdmi dwmac_rk fan53555 cec ghash_ce stmmac_platform rc_core stmmac gf128mul dw_mipi_dsi analogix_dp sha2_ce pcs_xpcs pwm_regulator sha256_arm64 drm_display_helper phylink ohci_platform sha1_ce dwc3_of_simple of_mdio gpio_rockchip gpio_keys ohci_hcd ehci_platform drm_dma_helper fixed_phy sdhci_of_arasan ehci_hcd sdhci_pltfm drm_kms_helper cqhci dw_mmc_rockchip fwnode_mdio phy_rockchip_inno_usb2 phy_rockchip_emmc phy_rockchip_pcie phy_rockchip_typec usbcore io_domain pl330 pwm_rockchip spi_rockchip drm dw_mmc_pltfm sdhci libphy dw_mmc i2c_rk3x usb_common fixed aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
[ 56.274047] CPU: 2 PID: 0 Comm: swap