Bug#1057282: linux-image-6.5.0-0.deb12.1-arm64: arm64 kernel upgrade makes systems unresponsive
Hi Ben (and all the rest),

On 15-05-2024 9:56 p.m., Ben Hutchings wrote:
> Apologies for leaving this bug for so long.

NP, part of life I guess.

> Is this bug still occurring?

I don't know. The problem was severe enough for us to abandon the idea
of running the backport kernels on our arm64 hosts, so we went back to
the stable kernel there.

> I had a look for possibly related fixes, and found:
>
> commit 22e111ed6c83dcde3037fc81176012721bc34c0b
> [...]
> The fix went into 6.8-rc1 and was backported to 6.6.15, so Debian
> versions 6.6.15-1 onward should have it.
>
> commit a8b0026847b8c43445c921ad2c85521c92eb175f
> [...]
> which went into 6.8 but was *not* backported.

If you think it is worth knowing whether either is the case, I can
install the backports kernel again on the arm64 hosts, but obviously
that will be annoying for us. Please let me know if I should pursue
this (I would be expecting a bit quicker turnaround on this bug if you
say yes now ;) ).

> If the bug is still occurring, can you say what type of filesystem
> rsync is being run on?

I'm not sure if this is the answer you're looking for, but we use ext4.

Paul

OpenPGP_signature.asc
Description: OpenPGP digital signature
Bug#1057282: linux-image-6.5.0-0.deb12.1-arm64: arm64 kernel upgrade makes systems unresponsive
Control: tag -1 moreinfo

On Wed, 6 Dec 2023 21:36:53 +0100 Paul Gevers wrote:
> Control: tags -1 - moreinfo
>
> Hi Ben and the rest,
>
> On 04-12-2023 15:10, Ben Hutchings wrote:
> >> CPU: 6 PID: 15039 Comm: lxc-start Tainted: G D W L
> >> 6.5.0-0.deb12.1-arm64 #1 Debian 6.5.3-1~bpo12+1
> >
> > The D and W flags mean there were prior BUG and WARN errors logged.
> > Please send those as well.
>
> Please find attached the content of the journal since the reboot. I
> filtered out "debci".

Apologies for leaving this bug for so long.

Is this bug still occurring?

I had a look for possibly related fixes, and found:

commit 22e111ed6c83dcde3037fc81176012721bc34c0b
Author: Al Viro
Date:   Sun Nov 19 20:25:58 2023 -0500

    rename(): fix the locking of subdirectories
    [...]
    Cc: sta...@vger.kernel.org
    Fixes: 28eceeda130f "fs: Lock moved directories"

This claims to fix a bug introduced in 6.5. The fix went into 6.8-rc1
and was backported to 6.6.15, so Debian versions 6.6.15-1 onward should
have it.

The lockup might alternatively be fixed by:

commit a8b0026847b8c43445c921ad2c85521c92eb175f
Author: Al Viro
Date:   Mon Nov 20 20:02:11 2023 -0500

    rename(): avoid a deadlock in the case of parents having no common ancestor

which went into 6.8 but was *not* backported.

If the bug is still occurring, can you say what type of filesystem
rsync is being run on?

Ben.

-- 
Ben Hutchings
Anthony's Law of Force: Don't force it, get a larger hammer.

signature.asc
Description: This is a digitally signed message part
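The version reasoning above can be sketched as a small check. This is only an illustration under stated assumptions: the function names are hypothetical, only the upstream part of the Debian version string is inspected (the Debian revision and any backport suffix are ignored), and it covers only commit 22e111ed6c83, since the second commit was not backported. The thread does not say whether the 6.7 stable series received the fix, so that case is reported as unknown. For real package comparisons, `dpkg --compare-versions` is the appropriate tool.

```python
import re


def upstream_tuple(debian_version):
    """Extract (major, minor, patch) from a Debian kernel version string.

    Hypothetical helper: it ignores the Debian revision and suffixes
    such as "~bpo12+1", and treats a missing patch level as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", debian_version)
    if match is None:
        raise ValueError(f"unrecognised version: {debian_version!r}")
    return tuple(int(part) if part else 0 for part in match.groups())


def has_rename_locking_fix(debian_version):
    """Return True, False, or None (unknown) for commit 22e111ed6c83.

    Based only on what the thread states: the fix went into 6.8-rc1 and
    was backported to 6.6.15. Kernels older than 6.5 return False here
    even though, per the thread, the bug was only introduced in 6.5.
    """
    upstream = upstream_tuple(debian_version)
    if upstream[:2] == (6, 6):
        # 6.6 stable series: fixed from 6.6.15 onward.
        return upstream[2] >= 15
    if upstream[:2] == (6, 7):
        # The thread does not mention a 6.7.y backport.
        return None
    return upstream >= (6, 8, 0)


if __name__ == "__main__":
    for version in ("6.5.3-1~bpo12+1", "6.6.13-1~bpo12+1", "6.6.15-1"):
        print(version, has_rename_locking_fix(version))
```

The three-valued result is deliberate: it avoids claiming a fix status for a series the thread says nothing about.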
Bug#1057282: linux-image-6.5.0-0.deb12.1-arm64: arm64 kernel upgrade makes systems unresponsive
Hi,

On 2023-12-04 03:10, Ben Hutchings wrote:
> The D and W flags mean there were prior BUG and WARN errors logged.
> Please send those as well.

Here is the very first warning:

Nov 30 17:16:45 ci-worker-armhf-03 kernel: WARNING: CPU: 10 PID: 1592 at fs/dcache.c:365 dentry_free+0x98/0xd0
[...]
Nov 30 17:16:45 ci-worker-armhf-03 kernel: pc : dentry_free+0x98/0xd0
Nov 30 17:16:45 ci-worker-armhf-03 kernel: lr : __dentry_kill+0x180/0x200
[...]
Nov 30 17:16:45 ci-worker-armhf-03 kernel: Call trace:
Nov 30 17:16:45 ci-worker-armhf-03 kernel:  dentry_free+0x98/0xd0
Nov 30 17:16:45 ci-worker-armhf-03 kernel:  __dentry_kill+0x180/0x200
Nov 30 17:16:45 ci-worker-armhf-03 kernel:  dput+0x32c/0x440
Nov 30 17:16:45 ci-worker-armhf-03 kernel:  do_renameat2+0x310/0x500
Nov 30 17:16:45 ci-worker-armhf-03 kernel:  __arm64_sys_renameat+0x60/0x80
Nov 30 17:16:45 ci-worker-armhf-03 kernel:  invoke_syscall+0x78/0x108
Nov 30 17:16:45 ci-worker-armhf-03 kernel:  el0_svc_common.constprop.0+0x4c/0xf8
Nov 30 17:16:45 ci-worker-armhf-03 kernel:  do_el0_svc+0x40/0xa8
Nov 30 17:16:45 ci-worker-armhf-03 kernel:  el0_svc+0x34/0xd8
Nov 30 17:16:45 ci-worker-armhf-03 kernel:  el0t_64_sync_handler+0x100/0x130
Nov 30 17:16:45 ci-worker-armhf-03 kernel:  el0t_64_sync+0x190/0x198

https://lore.kernel.org/lkml/zv-q8e-gawio0...@fvff77s0q05n.cambridge.arm.com/T/
looks similar?
Bug#1057282: linux-image-6.5.0-0.deb12.1-arm64: arm64 kernel upgrade makes systems unresponsive
Control: tags -1 - moreinfo

Hi Ben and the rest,

On 04-12-2023 15:10, Ben Hutchings wrote:
>> CPU: 6 PID: 15039 Comm: lxc-start Tainted: G D WL
>> 6.5.0-0.deb12.1-arm64 #1 Debian 6.5.3-1~bpo12+1
>
> The D and W flags mean there were prior BUG and WARN errors logged.
> Please send those as well.

Please find attached the content of the journal since the reboot. I
filtered out "debci".

Paul

kernel-bug-part0.log.xz
Description: application/xz
OpenPGP_signature.asc
Description: OpenPGP digital signature
Bug#1057282: linux-image-6.5.0-0.deb12.1-arm64: arm64 kernel upgrade makes systems unresponsive
Control: reassign -1 src:linux 6.5.3-1~bpo12+1
Control: tag -1 moreinfo

On Sat, 2023-12-02 at 18:11 +0100, Paul Gevers wrote:
> Package: linux-image-6.5.0-0.deb12.1-arm64
> Version: 6.5.3-1~bpo12+1
> Severity: serious
> Justification: system stops responding
>
> Dear kernel maintainers,
>
> Thursday 30 November I upgraded the ci.debian.net workers. We're running
> the backports kernel there due to issues we discussed earlier, but after
> upgrading, we lost access to our arm64 hosts one after the other. We're
> running the 6.4 kernel again now, and I extracted some of the logs.
> Please let me know if you need more info.

The first error logged in this file has:

> CPU: 6 PID: 15039 Comm: lxc-start Tainted: G D WL
> 6.5.0-0.deb12.1-arm64 #1 Debian 6.5.3-1~bpo12+1

The D and W flags mean there were prior BUG and WARN errors logged.
Please send those as well.

Ben.

-- 
Ben Hutchings
Power corrupts. Absolute power is kind of neat. - John Lehman

signature.asc
Description: This is a digitally signed message part
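To make the taint-flag reading above concrete, here is a minimal sketch that pulls the flag letters out of a "Tainted:" header line. The function name and the flag table are illustrative assumptions. The table covers only the letters seen in this report plus P, with meanings as given in the kernel's tainted-kernels documentation, so it is not a complete decoder.

```python
# Subset of kernel taint flags; only the letters relevant to this
# report are listed (assumption: anything else is reported as unknown).
TAINT_FLAGS = {
    "G": "no proprietary modules loaded",
    "P": "proprietary module loaded",
    "D": "kernel died recently (an OOPS or BUG was logged)",
    "W": "a kernel warning (WARN) was issued earlier",
    "L": "a soft lockup occurred earlier",
}


def decode_taint(line):
    """Return (flag, meaning) pairs from an oops header line.

    Hypothetical helper: it reads the tokens after "Tainted:" and stops
    at the first token that is not made of upper-case letters, which in
    the line quoted above is the kernel release string.
    """
    marker = "Tainted:"
    if marker not in line:
        return []
    flags = []
    for token in line.split(marker, 1)[1].split():
        if not (token.isalpha() and token.isupper()):
            break
        flags.extend(token)
    return [(f, TAINT_FLAGS.get(f, "not covered by this sketch")) for f in flags]


if __name__ == "__main__":
    header = ("CPU: 6 PID: 15039 Comm: lxc-start Tainted: G D WL "
              "6.5.0-0.deb12.1-arm64 #1 Debian 6.5.3-1~bpo12+1")
    for flag, meaning in decode_taint(header):
        print(flag, "-", meaning)
```

For the header quoted in this report, the D and W entries are the ones that indicate earlier BUG and WARN messages in the log.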