Bug#1057282: linux-image-6.5.0-0.deb12.1-arm64: arm64 kernel upgrade makes systems unresponsive

2024-05-16 Thread Paul Gevers

Hi Ben (and all the rest),

On 15-05-2024 9:56 p.m., Ben Hutchings wrote:

> Apologies for leaving this bug for so long.


NP, part of life I guess.


> Is this bug still occurring?


I don't know. The problem was severe enough for us to abandon the idea 
of running the backport kernels on our arm64 hosts, so we went back to 
the stable kernel there.



> I had a look for possibly related fixes,
> and found:
>
> commit 22e111ed6c83dcde3037fc81176012721bc34c0b
> [...]
>
> The fix went into 6.8-rc1 and was backported to 6.6.15, so Debian
> versions 6.6.15-1 onward should have it.
>
> commit a8b0026847b8c43445c921ad2c85521c92eb175f
> [...]
>
> which went into 6.8 but was *not* backported.


If you think it is worth knowing whether either is the case, I can
install the backports kernel again on the arm64 hosts, but obviously
that will be annoying for us. Please let me know if I should pursue this
(I would expect a somewhat quicker turnaround on this bug if you say
yes now ;) ).



> If the bug is still occurring, can you say what type of filesystem
> rsync is being run on?


I'm not sure if this is the answer you're looking for, but we use ext4.

Paul




Bug#1057282: linux-image-6.5.0-0.deb12.1-arm64: arm64 kernel upgrade makes systems unresponsive

2024-05-15 Thread Ben Hutchings
Control: tag -1 moreinfo

On Wed, 6 Dec 2023 21:36:53 +0100 Paul Gevers wrote:
> Control: tags -1 - moreinfo
> 
> Hi Ben and the rest,
> 
> On 04-12-2023 15:10, Ben Hutchings wrote:
> >> CPU: 6 PID: 15039 Comm: lxc-start Tainted: G  D W    L 
> >> 6.5.0-0.deb12.1-arm64 #1  Debian 6.5.3-1~bpo12+1
> > 
> > The D and W flags mean there were prior BUG and WARN errors logged.
> > Please send those as well.
> 
> Please find attached the content of the journal since the reboot. I 
> filtered out "debci".

Apologies for leaving this bug for so long.

Is this bug still occurring?  I had a look for possibly related fixes,
and found:

commit 22e111ed6c83dcde3037fc81176012721bc34c0b
Author: Al Viro 
Date:   Sun Nov 19 20:25:58 2023 -0500
 
rename(): fix the locking of subdirectories
[...]
Cc: sta...@vger.kernel.org
Fixes: 28eceeda130f "fs: Lock moved directories"

This claims to fix a bug introduced in 6.5.  The fix went into
6.8-rc1 and was backported to 6.6.15, so Debian versions 6.6.15-1
onward should have it.
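
As a rough illustration of the version reasoning above, here is a minimal
Python sketch that checks a Debian kernel package version against the two
facts stated here (fix in 6.8-rc1, backport in 6.6.15). The function names
are hypothetical, and the 6.7.y series is reported as unknown because this
thread does not say whether it received the backport.

```python
import re


def upstream_version(debian_version):
    """Return the upstream (major, minor, patch) tuple of a Debian kernel version."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", debian_version)
    if m is None:
        raise ValueError(f"unrecognised version: {debian_version!r}")
    major, minor, patch = m.groups()
    return (int(major), int(minor), int(patch or 0))


def has_rename_locking_fix(debian_version):
    """True/False if the thread lets us decide, None if it does not.

    Assumption: only the facts from this thread are encoded, namely that
    commit 22e111ed6c83 went into 6.8-rc1 and was backported to 6.6.15.
    """
    v = upstream_version(debian_version)
    if v >= (6, 8, 0):
        return True
    if (6, 6, 15) <= v < (6, 7, 0):
        return True
    if (6, 7, 0) <= v < (6, 8, 0):
        return None  # not covered by the information in this thread
    return False


for version in ("6.5.3-1~bpo12+1", "6.6.13-1", "6.6.15-1", "6.7.12-1"):
    print(version, has_rename_locking_fix(version))
```

Note that a False result for versions before 6.5 only means the fix is absent;
according to the commit message, the bug itself was introduced in 6.5.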

The lockup might alternatively be fixed by:

commit a8b0026847b8c43445c921ad2c85521c92eb175f
Author: Al Viro 
Date:   Mon Nov 20 20:02:11 2023 -0500
 
rename(): avoid a deadlock in the case of parents having no common ancestor

which went into 6.8 but was *not* backported.

If the bug is still occurring, can you say what type of filesystem
rsync is being run on?

Ben.

-- 
Ben Hutchings
Anthony's Law of Force: Don't force it, get a larger hammer.





Bug#1057282: linux-image-6.5.0-0.deb12.1-arm64: arm64 kernel upgrade makes systems unresponsive

2023-12-07 Thread Emanuele Rocca
Hi,

On 2023-12-04 03:10, Ben Hutchings wrote:
> The D and W flags mean there were prior BUG and WARN errors logged. 
> Please send those as well.

Here is the very first warning:

   Nov 30 17:16:45 ci-worker-armhf-03 kernel: WARNING: CPU: 10 PID: 1592 at fs/dcache.c:365 dentry_free+0x98/0xd0
   [...]
   Nov 30 17:16:45 ci-worker-armhf-03 kernel: pc : dentry_free+0x98/0xd0
   Nov 30 17:16:45 ci-worker-armhf-03 kernel: lr : __dentry_kill+0x180/0x200
   [...]
   Nov 30 17:16:45 ci-worker-armhf-03 kernel: Call trace:
   Nov 30 17:16:45 ci-worker-armhf-03 kernel:  dentry_free+0x98/0xd0
   Nov 30 17:16:45 ci-worker-armhf-03 kernel:  __dentry_kill+0x180/0x200
   Nov 30 17:16:45 ci-worker-armhf-03 kernel:  dput+0x32c/0x440
   Nov 30 17:16:45 ci-worker-armhf-03 kernel:  do_renameat2+0x310/0x500
   Nov 30 17:16:45 ci-worker-armhf-03 kernel:  __arm64_sys_renameat+0x60/0x80
   Nov 30 17:16:45 ci-worker-armhf-03 kernel:  invoke_syscall+0x78/0x108
   Nov 30 17:16:45 ci-worker-armhf-03 kernel:  el0_svc_common.constprop.0+0x4c/0xf8
   Nov 30 17:16:45 ci-worker-armhf-03 kernel:  do_el0_svc+0x40/0xa8
   Nov 30 17:16:45 ci-worker-armhf-03 kernel:  el0_svc+0x34/0xd8
   Nov 30 17:16:45 ci-worker-armhf-03 kernel:  el0t_64_sync_handler+0x100/0x130
   Nov 30 17:16:45 ci-worker-armhf-03 kernel:  el0t_64_sync+0x190/0x198

https://lore.kernel.org/lkml/zv-q8e-gawio0...@fvff77s0q05n.cambridge.arm.com/T/
looks similar?



Bug#1057282: linux-image-6.5.0-0.deb12.1-arm64: arm64 kernel upgrade makes systems unresponsive

2023-12-06 Thread Paul Gevers

Control: tags -1 - moreinfo

Hi Ben and the rest,

On 04-12-2023 15:10, Ben Hutchings wrote:

>> CPU: 6 PID: 15039 Comm: lxc-start Tainted: G  D W    L 
>> 6.5.0-0.deb12.1-arm64 #1  Debian 6.5.3-1~bpo12+1
>
> The D and W flags mean there were prior BUG and WARN errors logged.
> Please send those as well.


Please find attached the content of the journal since the reboot. I 
filtered out "debci".


Paul


kernel-bug-part0.log.xz
Description: application/xz




Bug#1057282: linux-image-6.5.0-0.deb12.1-arm64: arm64 kernel upgrade makes systems unresponsive

2023-12-04 Thread Ben Hutchings
Control: reassign -1 src:linux 6.5.3-1~bpo12+1
Control: tag -1 moreinfo

On Sat, 2023-12-02 at 18:11 +0100, Paul Gevers wrote:
> Package: linux-image-6.5.0-0.deb12.1-arm64
> Version: 6.5.3-1~bpo12+1
> Severity: serious
> Justification: system stops responding
> 
> Dear kernel maintainers,
> 
> Thursday 30 November I upgraded the ci.debian.net workers. We're running 
> the backports kernel there due to issues we discussed earlier, but after 
> upgrading, we lost access to our arm64 hosts one after the other. We're 
> running the 6.4 kernel again now, and I extracted some of the logs. 
> Please let me know if you need more info.

The first error logged in this file has:

> CPU: 6 PID: 15039 Comm: lxc-start Tainted: G  D WL 
> 6.5.0-0.deb12.1-arm64 #1  Debian 6.5.3-1~bpo12+1

The D and W flags mean there were prior BUG and WARN errors logged. 
Please send those as well.
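
For readers unfamiliar with the taint field, here is a minimal Python sketch
of how the letters in such a log line can be decoded. The table is only a
subset of the letters described in the kernel's tainted-kernels documentation,
and the function name and the token-based parsing are assumptions made for
illustration, not how the kernel or any Debian tool does it.

```python
from itertools import takewhile

# Subset of the kernel taint letters (assumption: only the ones relevant here).
TAINT_FLAGS = {
    "P": "proprietary module was loaded",
    "G": "no proprietary module was loaded",
    "D": "kernel died recently (an OOPS or BUG was logged)",
    "W": "kernel issued a warning (WARN)",
    "L": "soft lockup occurred",
}


def decode_taint(line):
    """Return (letter, meaning) pairs for the taint letters in a log line."""
    _, found, rest = line.partition("Tainted:")
    if not found:
        return []
    # Taint letters are upper-case tokens; stop at the kernel version string.
    tokens = takewhile(lambda t: t.isalpha() and t.isupper(), rest.split())
    letters = "".join(tokens)
    return [(c, TAINT_FLAGS.get(c, "not in this subset")) for c in letters]


line = ("CPU: 6 PID: 15039 Comm: lxc-start Tainted: G  D W    L "
        "6.5.0-0.deb12.1-arm64 #1  Debian 6.5.3-1~bpo12+1")
for letter, meaning in decode_taint(line):
    print(letter, "-", meaning)
```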

Ben.

-- 
Ben Hutchings
Power corrupts.  Absolute power is kind of neat. - John Lehman


