On Mon, 3 Aug 2020 09:55:28 +0200 Cédric Dufour <cedric.duf...@ced-network.net> wrote:
> Package: linux-source-4.19
> Version: 4.19.132-1
> Severity: important
>
> Hello,
>
> Since linux-image-4.19.0-10-amd64, I have been hitting regular kernel panics -
> "RIP: 0010:__cgroup_bpf_run_filter_skb+0x26d/0x3d0" - resulting in a full
> (file) *server freeze*.
>
> The issue is pretty well described and summarized in
> https://forum.proxmox.com/threads/kernel-5-4-44-causes-system-freeze-on-hp-microserver-gen8.72050/page-2#post-323498
>
> The "culprit" commit - "netprio_cgroup: Fix unlimited memory leak of v2
> cgroups" - is indeed included in the Debian kernel (4.19) since changelog entry
> 4.19.131-1.
>
> It *seems* there is already a patch proposed upstream (although here for
> kernel 4.9): https://lkml.org/lkml/2020/7/20/883
>
> Best regards,
>
> Cédric
>
> --
> Cédric Dufour
>
>
FWIW, I am seeing a very similar issue. Some Debian 10 AWS instances that I
use to run Guacamole via Docker recently started freezing at random. I
enabled kernel dumps and finally caught one of the machines misbehaving.
Looking at the kdump, I see this:
KERNEL: /usr/lib/debug/vmlinux-4.19.0-10-cloud-amd64
DUMPFILE: dump.202008101612 [PARTIAL DUMP]
CPUS: 2
DATE: Mon Aug 10 16:11:47 2020
UPTIME: 00:05:44
LOAD AVERAGE: 0.21, 0.11, 0.04
TASKS: 261
NODENAME: guac.env0.staging.cool.cyber.dhs.gov
RELEASE: 4.19.0-10-cloud-amd64
VERSION: #1 SMP Debian 4.19.132-1 (2020-07-24)
MACHINE: x86_64 (2499 Mhz)
MEMORY: 4 GB
PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0010"
PID: 1453
COMMAND: "sshd"
TASK: 8a3f695115c0 [THREAD_INFO: 8a3f695115c0]
CPU: 0
STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 1453 TASK: 8a3f695115c0 CPU: 0 COMMAND: "sshd"
#0 [b37740c77800] machine_kexec at 97a4b297
#1 [b37740c77858] __crash_kexec at 97b0e7dd
#2 [b37740c77920] crash_kexec at 97b0f62d
#3 [b37740c77938] oops_end at 97a2907d
#4 [b37740c77958] no_context at 97a5858e
#5 [b37740c779b0] __do_page_fault at 97a58c42
#6 [b37740c77a20] async_page_fault at 982010be
[exception RIP: __cgroup_bpf_run_filter_skb+189]
RIP: 97b94ffd RSP: b37740c77ad0 RFLAGS: 00010286
RAX: RBX: 8a3ff55e5ee8 RCX:
RDX: 0001 RSI: 8a3ff3d49800 RDI: 8a3ff52fd500
RBP: 8a3ff52fd500 R8: 8a3ff55e5ee8 R9: 0001
R10: 0001 R11: 8a3ef6dd7500 R12:
R13: R14: 8a3ff52fd840 R15: 8a3ff55e5ee8
ORIG_RAX: CS: 0010 SS: 0018
#7 [b37740c77b30] ip_finish_output at 97f65988
#8 [b37740c77b68] ip_output at 97f6640c
#9 [b37740c77bc0] __ip_queue_xmit at 97f65e6d
#10 [b37740c77c18] __tcp_transmit_skb at 97f80557
#11 [b37740c77c88] tcp_write_xmit at 97f81e34
#12 [b37740c77cf0] __tcp_push_pending_frames at 97f82ae1
#13 [b37740c77d00] tcp_sendmsg_locked at 97f733ac
#14 [b37740c77da8] tcp_sendmsg at 97f73507
#15 [b37740c77dc8] sock_sendmsg at 97ee8aa6
#16 [b37740c77de0] sock_write_iter at 97ee8b47
#17 [b37740c77e50] new_sync_write at 97c49bfb
#18 [b37740c77ed0] vfs_write at 97c4c7d5
#19 [b37740c77f00] ksys_write at 97c4ca77
#20 [b37740c77f38] do_syscall_64 at 97a04140
#21 [b37740c77f50] entry_SYSCALL_64_after_hwframe at 98200088
RIP: 7fd74beba504 RSP: 7ffc1d456638 RFLAGS: 0246
RAX: ffda RBX: 0084 RCX: 7fd74beba504
RDX: 0084 RSI: 55785f33bb90 RDI: 0003
RBP: 55785f31d630 R8: R9: 1000
R10: 0008 R11: 0246 R12: 01dd
R13: 55785ddc9b00 R14: 0003 R15: 7ffc1d4566e0
ORIG_RAX: 0001 CS: 0033 SS: 002b
crash> sym 97b94ffd
97b94ffd (T) __cgroup_bpf_run_filter_skb+189
./debian/build/build_amd64_none_cloud-amd64/./kernel/bpf/cgroup.c: 539
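
As an aside for anyone comparing reports: the original report quotes the location in oops style (`__cgroup_bpf_run_filter_skb+0x26d/0x3d0`, hexadecimal offset and function size), while crash prints `__cgroup_bpf_run_filter_skb+189`. The offsets need not match across different kernel builds, but the function name should. A minimal Python sketch for splitting both forms follows. The names `OopsLocation` and `parse_location` are hypothetical, and treating a bare offset as decimal is an assumption about crash's default output radix.

```python
import re
from typing import NamedTuple, Optional


class OopsLocation(NamedTuple):
    symbol: str            # function name, e.g. __cgroup_bpf_run_filter_skb
    offset: int            # byte offset into the function
    size: Optional[int]    # function size, if the line carried one


# symbol + offset, optionally followed by /size (oops style).
_LOCATION_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)"
    r"\+(?P<offset>0x[0-9a-fA-F]+|\d+)"
    r"(?:/(?P<size>0x[0-9a-fA-F]+))?"
)


def _to_int(text: str) -> int:
    # Assumption: a 0x prefix means hexadecimal, anything else is decimal.
    if text.lower().startswith("0x"):
        return int(text, 16)
    return int(text, 10)


def parse_location(line: str) -> OopsLocation:
    """Extract symbol, offset, and optional size from an oops or crash line."""
    match = _LOCATION_RE.search(line)
    if match is None:
        raise ValueError(f"no symbol+offset found in: {line!r}")
    size = match.group("size")
    return OopsLocation(
        symbol=match.group("symbol"),
        offset=_to_int(match.group("offset")),
        size=_to_int(size) if size else None,
    )
```

Comparing `parse_location(...).symbol` for the two reports is enough to confirm that both faults land in the same function, even though the offsets differ.
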
crash> log
[0.00] Linux version 4.19.0-10-cloud-amd64 (debian-kernel@lists.debian.org) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP Debian 4.19.132-1 (2020-07-24)
[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-4.19.0-10-cloud-amd64 root=UUID=9ac8f5bd-5b64-48cd-9efd-2b2d35a30500 ro console=tty0 console=ttyS0,115200 earlyprintk=ttyS0,115200 nmi_watchdog=1 elevator=noop scsi_mod.use_blk_mq=Y crashkernel=384M-:128M
[ 478.686368] BUG: unable to handle kernel NULL pointer dereference at 0010
[ 478.693551] PGD 0 P4D 0
[ 478.696291] Oops: [#1] SMP PTI
[ 478.699431] CPU: 0 PID: 1453 Comm: sshd Kdump: loaded Not tainted