Bug#1033398: linux-image-amd64: reproducible kernel freeze on 5.19+

2023-06-05 Thread Florian Lehner

Hi all,

the fix was merged upstream with 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/maccess.c?id=d319f344561de23e810515d109c7278919bff7b0


- florian

On 3/25/23 16:58, Diederik de Haas wrote:

Control: found -1 5.19~rc4-1~exp1
Control: forwarded -1 
https://lore.kernel.org/bpf/20230118051443.78988-1-alexei.starovoi...@gmail.com/

On Saturday, 25 March 2023 16:00:47 CET Florian Lehner wrote:

Via https://snapshot.debian.org/binary/linux-image-amd64/ you can easily
test various kernel versions. Could you try whether 5.19~rc4-1~exp1
indeed produces the problem?


Yes - I can reproduce the total system freeze with 5.19~rc4-1~exp1


Thanks. Then it was most likely introduced during the 5.19 merge window and
is thus also present in 5.19-rc1, but there isn't a prebuilt kernel to
verify that.


Since the running program is rather complex, it is not easily possible
to carve out a small reproducer. We can provide gdb backtraces from
freezes inside qemu.


Someone else would have to chime in for the backtraces; that's beyond my
skill set.


I just learned about
https://lore.kernel.org/bpf/20230118051443.78988-1-alexei.starovoitov@gmail.com/.
With the provided patch applied I no longer manage to freeze the system.


I see you already responded to that thread, excellent :-)
Hopefully they'll read this whole bug report, but mentioning that your actual
problem was NOT triggered up to and including 5.18, but did trigger from
5.19-rc4 onwards, could be useful. I may not fully understand what upstream
talked about, but I only saw a reference to a 6.0.0 kernel.

Thanks for testing and reporting back :-)




Bug#1033398: linux-image-amd64: reproducible kernel freeze on 5.19+

2023-03-25 Thread Florian Lehner



On Fri, 24 Mar 2023 13:50:15 +0100 Diederik de Haas wrote:

On Friday, 24 March 2023 12:44:33 CET Tim Rühsen wrote:
> Package: linux-image-amd64
> Version: 6.1.20-1
> 
> We run a privileged eBPF-based tool that communicates between kernel and
> user space. It runs without issues on kernels 4.15 to 5.18.
> On kernels 5.19+, the whole system freezes after a few minutes.

Via https://snapshot.debian.org/binary/linux-image-amd64/ you can easily test 
various kernel versions. Could you try whether 5.19~rc4-1~exp1 indeed produces 
the problem?


Yes - I can reproduce the total system freeze with 5.19~rc4-1~exp1 
(2022-07-01) from 
https://snapshot.debian.org/package/linux-signed-amd64/5.19~rc4%2B1~exp1/.




> Since the running program is rather complex, it is not easily possible to
> carve out a small reproducer. We can provide gdb backtraces from freezes
> inside qemu.

Someone else would have to chime in for the backtraces; that's beyond my skill 
set.


I just learned about 
https://lore.kernel.org/bpf/20230118051443.78988-1-alexei.starovoi...@gmail.com/. 
With the provided patch applied I no longer manage to freeze the system.


- florian



Bug#1033398: linux-image-amd64: reproducible kernel freeze on 5.19+

2023-03-24 Thread Florian Lehner

Hi,

here is some additional information that may help.

The eBPF program is of type BPF_PROG_TYPE_PERF_EVENT and is attached to all
CPUs via the perf subsystem, using PERF_COUNT_SW_CPU_CLOCK. It is executed
at a constant sampling frequency (usually 20 Hz).
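
For readers unfamiliar with this kind of setup, here is a minimal sketch (not
our actual tool) of how a per-CPU sampling event like the one described above
could be set up. It is written in Python with ctypes purely for illustration.
PerfEventAttr, cpu_clock_attr and open_on_all_cpus are hypothetical names, the
struct only models the first 64 bytes of struct perf_event_attr, and the
syscall number is an assumption that holds on x86_64 only.

```python
import ctypes
import os

# Constants from the <linux/perf_event.h> UAPI header.
PERF_TYPE_SOFTWARE = 1
PERF_COUNT_SW_CPU_CLOCK = 0
FLAG_FREQ = 1 << 10  # 'freq' bit: treat the sample field as a frequency in Hz

# Assumption: x86_64 (amd64) syscall number for perf_event_open.
NR_PERF_EVENT_OPEN_X86_64 = 298


class PerfEventAttr(ctypes.Structure):
    """First 64 bytes (PERF_ATTR_SIZE_VER0) of struct perf_event_attr."""
    _fields_ = [
        ("type", ctypes.c_uint32),
        ("size", ctypes.c_uint32),
        ("config", ctypes.c_uint64),
        ("sample_period_or_freq", ctypes.c_uint64),
        ("sample_type", ctypes.c_uint64),
        ("read_format", ctypes.c_uint64),
        ("flags", ctypes.c_uint64),        # bitfield: disabled, inherit, ..., freq
        ("_rest", ctypes.c_uint8 * 16),    # remaining VER0 fields, left zeroed
    ]


def cpu_clock_attr(freq_hz=20):
    """Describe a software CPU-clock event sampled at freq_hz."""
    attr = PerfEventAttr()
    attr.type = PERF_TYPE_SOFTWARE
    attr.size = ctypes.sizeof(PerfEventAttr)
    attr.config = PERF_COUNT_SW_CPU_CLOCK
    attr.sample_period_or_freq = freq_hz
    attr.flags = FLAG_FREQ
    return attr


def open_on_all_cpus(attr):
    """Open one event per CPU (pid=-1 means all processes on that CPU).

    Simplification: assumes CPUs are numbered 0..n-1 and all online.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    fds = []
    for cpu in range(os.cpu_count() or 1):
        fd = libc.syscall(
            ctypes.c_long(NR_PERF_EVENT_OPEN_X86_64),
            ctypes.byref(attr),
            ctypes.c_int(-1),    # pid: every process
            ctypes.c_int(cpu),   # cpu: this CPU only
            ctypes.c_int(-1),    # group_fd: no event group
            ctypes.c_ulong(0),   # flags
        )
        if fd < 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
        fds.append(fd)
    return fds
```

Opening the events system-wide like this normally requires elevated
privileges. The BPF program would still have to be attached to each returned
file descriptor, which is not shown here.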


We also have QEMU guest memory dumps available, if that would help with
investigating the issue.


- florian

On Fri, 24 Mar 2023 12:44:33 +0100 Tim Rühsen wrote:

Package: linux-image-amd64
Version: 6.1.20-1
Severity: important
X-Debbugs-Cc: tim.rueh...@gmx.de

Dear Maintainer,

   * What led up to the situation?

We run a privileged eBPF-based tool that communicates between kernel and
user space.
It runs without issues on kernels 4.15 to 5.18.
On kernels 5.19+, the whole system freezes after a few minutes.
The freeze seems to happen earlier with more system activity (load, forks).
The underlying hardware seems to play no role: we could reproduce this on
different bare-metal systems as well as within a QEMU-based VM.

Since the running program is rather complex, it is not easily possible to
carve out a small reproducer.
We can provide gdb backtraces from freezes inside qemu.


-- System Information:
Debian Release: 12.0
  APT prefers testing-security
  APT policy: (500, 'testing-security'), (500, 'testing-debug'), (500, 'unstable'), (500, 'testing'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 6.1.0-7-amd64 (SMP w/20 CPU threads; PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages linux-image-amd64 depends on:
ii  linux-image-6.1.0-7-amd64  6.1.20-1

linux-image-amd64 recommends no packages.

linux-image-amd64 suggests no packages.

-- debconf information:
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = "en_US:en",
LC_ALL = (unset),
LC_TIME = "en_DE.UTF-8",
LC_MONETARY = "en_DE.UTF-8",
LC_COLLATE = "en_DE.UTF-8",
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
locale: Cannot set LC_ALL to default locale: No such file or directory