Bug#966846: Kernel panic (4.19.0-10): RIP __cgroup_bpf_run_filter_skb

2020-09-23 Thread Ben Hutchings
On Tue, 2020-09-22 at 23:52 +0200, Michel Le Bihan wrote:
> Hello,
> 
> I'm a bit late but I also have this issue and it occurs every several
> hours on my server. Here is the trace if anybody is still interested: 
> https://lebihan.pl/files/trace.txt
> 
> When can I expect the new package to be uploaded into stable?

In the point release, at the weekend.

Ben.

-- 
Ben Hutchings
The generation of random numbers is too important to be left to chance.
   - Robert Coveyou






Bug#966846: Kernel panic (4.19.0-10): RIP __cgroup_bpf_run_filter_skb

2020-09-22 Thread Michel Le Bihan
Hello,

I'm a bit late but I also have this issue and it occurs every several
hours on my server. Here is the trace if anybody is still interested: 
https://lebihan.pl/files/trace.txt

When can I expect the new package to be uploaded into stable?

Michel Le Bihan



Bug#966846: Kernel panic (4.19.0-10): RIP __cgroup_bpf_run_filter_skb

2020-09-11 Thread Salvatore Bonaccorso
Hi Sébastien,

On Fri, Sep 11, 2020 at 11:41:16AM +0200, Sébastien NOBILI wrote:
> Hi,
> 
> More than a week later, no problems with this build; it's working fine
> 24/7.

Thanks for confirming that. We are planning to rebase the version for
the next point release, so it will contain the fix.

Regards,
Salvatore



Bug#966846: Kernel panic (4.19.0-10): RIP __cgroup_bpf_run_filter_skb

2020-09-11 Thread Sébastien NOBILI

Hi,

More than a week later, no problems with this build; it's working fine 24/7.

Sébastien



Bug#966846: Kernel panic (4.19.0-10): RIP __cgroup_bpf_run_filter_skb

2020-09-05 Thread Sébastien NOBILI

Hi Salvatore,

No crash for two days with this build. I'll send an update in a few 
days.


Sébastien



Bug#966846: Kernel panic (4.19.0-10): RIP __cgroup_bpf_run_filter_skb

2020-09-03 Thread Sébastien NOBILI

Hi Salvatore,

On 2020-08-30 17:08, Salvatore Bonaccorso wrote:
> On Sun, Aug 30, 2020 at 04:49:51PM +0200, isch...@der-ball-ist-rund.net
> wrote:
>> But I'm still running two lxc-containers and two virtual machines
>> kvm/qemu.
>
> Yes, the issue would not be exclusively with docker.

I'm facing this bug as well on a server with LXC containers (no
Docker/KVM/Qemu at all).

> The issue should be pending fixed, but if you can I would appreciate
> if you can test (warning: temporary and unofficial build!) packages
> rebased to 4.19.142 upstream:
>
> https://people.debian.org/~carnil/tmp/linux/4.19.142-1/

I've installed it (amd64 version) and will let you know how things go.


Sébastien



Bug#966846: Kernel panic (4.19.0-10): RIP __cgroup_bpf_run_filter_skb

2020-08-30 Thread Salvatore Bonaccorso
Hi Immanuel,

On Sun, Aug 30, 2020 at 04:49:51PM +0200, isch...@der-ball-ist-rund.net wrote:
> I can confirm this bug. For me the panic is not exclusively related
> to docker.  I stopped my docker daemon - and I'm still suffering
> from random freezes of the kernel 4.19.0-10. The kernel is unusable.
> 
> But I'm still running two lxc-containers and two virtual machines kvm/qemu.
> 
> Enclosed you'll find the "screenshot"

Yes, the issue would not be exclusively with docker.

The issue should be pending fixed, but if you can I would appreciate
if you can test (warning: temporary and unofficial build!) packages
rebased to 4.19.142 upstream:

https://people.debian.org/~carnil/tmp/linux/4.19.142-1/

Regards,
Salvatore



Bug#966846: Kernel panic (4.19.0-10): RIP __cgroup_bpf_run_filter_skb

2020-08-10 Thread Shane Frasier
On Mon, 3 Aug 2020 09:55:28 +0200 Cédric Dufour <cedric.duf...@ced-network.net> wrote:
> Package: linux-source-4.19
> Version: 4.19.132-1
> Severity: important
>
> Hello,
>
> Since linux-image-4.19.0-10-amd64, I have been facing regular kernel panics -
> "RIP: 0010:__cgroup_bpf_run_filter_skb+0x26d/0x3d0" - resulting in full
> (file) *server freeze*.
>
> The issue is pretty well described and summarized in
> https://forum.proxmox.com/threads/kernel-5-4-44-causes-system-freeze-on-hp-microserver-gen8.72050/page-2#post-323498
>
> The "culprit" commit - "netprio_cgroup: Fix unlimited memory leak of v2
> cgroups" - has indeed been included in the Debian kernel (4.19) since
> changelog entry 4.19.131-1.
>
> It *seems* there is already a patch proposed upstream (although here for
> kernel 4.9): https://lkml.org/lkml/2020/7/20/883
>
> Best regards,
>
> Cédric
>
> --
> Cédric Dufour
>
>

FWIW, I am seeing a very similar issue.  Some Debian 10 AWS instances used
to run Guacamole via Docker recently started randomly freezing up on me.  I
enabled kernel dumps and finally caught one of the machines misbehaving.
Looking at the kdump I see this:
      KERNEL: /usr/lib/debug/vmlinux-4.19.0-10-cloud-amd64
    DUMPFILE: dump.202008101612  [PARTIAL DUMP]
        CPUS: 2
        DATE: Mon Aug 10 16:11:47 2020
      UPTIME: 00:05:44
LOAD AVERAGE: 0.21, 0.11, 0.04
       TASKS: 261
    NODENAME: guac.env0.staging.cool.cyber.dhs.gov
     RELEASE: 4.19.0-10-cloud-amd64
     VERSION: #1 SMP Debian 4.19.132-1 (2020-07-24)
     MACHINE: x86_64  (2499 Mhz)
      MEMORY: 4 GB
       PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0010"
         PID: 1453
     COMMAND: "sshd"
        TASK: 8a3f695115c0  [THREAD_INFO: 8a3f695115c0]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)
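
The PANIC line reports a NULL pointer dereference at a very small address (shown as 0010 here). On x86-64 that pattern usually means code read a structure member at offset 0x10 through a NULL pointer. A minimal Python ctypes sketch of that arithmetic follows; HypotheticalObj and its fields are invented for illustration and are not the layout of any kernel structure involved in this bug.

```python
import ctypes


class HypotheticalObj(ctypes.Structure):
    """Invented layout for illustration; NOT a real kernel structure."""
    _fields_ = [
        ("first", ctypes.c_uint64),   # offset 0x00
        ("second", ctypes.c_uint64),  # offset 0x08
        ("third", ctypes.c_uint64),   # offset 0x10
    ]


def faulting_address(base: int, field: str) -> int:
    """Address touched when `field` is read through a pointer equal to `base`."""
    return base + getattr(HypotheticalObj, field).offset


# With a NULL (0) base pointer the faulting address is just the member offset.
print(hex(faulting_address(0, "third")))  # 0x10
```

Finding which real member sits at that offset would need the struct definitions for the exact kernel build (e.g. from its debug info), which this sketch does not attempt.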

crash> bt
PID: 1453   TASK: 8a3f695115c0  CPU: 0   COMMAND: "sshd"
 #0 [b37740c77800] machine_kexec at 97a4b297
 #1 [b37740c77858] __crash_kexec at 97b0e7dd
 #2 [b37740c77920] crash_kexec at 97b0f62d
 #3 [b37740c77938] oops_end at 97a2907d
 #4 [b37740c77958] no_context at 97a5858e
 #5 [b37740c779b0] __do_page_fault at 97a58c42
 #6 [b37740c77a20] async_page_fault at 982010be
    [exception RIP: __cgroup_bpf_run_filter_skb+189]
    RIP: 97b94ffd  RSP: b37740c77ad0  RFLAGS: 00010286
    RAX:   RBX: 8a3ff55e5ee8  RCX: 
    RDX: 0001  RSI: 8a3ff3d49800  RDI: 8a3ff52fd500
    RBP: 8a3ff52fd500   R8: 8a3ff55e5ee8   R9: 0001
    R10: 0001  R11: 8a3ef6dd7500  R12: 
    R13:   R14: 8a3ff52fd840  R15: 8a3ff55e5ee8
    ORIG_RAX:   CS: 0010  SS: 0018
 #7 [b37740c77b30] ip_finish_output at 97f65988
 #8 [b37740c77b68] ip_output at 97f6640c
 #9 [b37740c77bc0] __ip_queue_xmit at 97f65e6d
#10 [b37740c77c18] __tcp_transmit_skb at 97f80557
#11 [b37740c77c88] tcp_write_xmit at 97f81e34
#12 [b37740c77cf0] __tcp_push_pending_frames at 97f82ae1
#13 [b37740c77d00] tcp_sendmsg_locked at 97f733ac
#14 [b37740c77da8] tcp_sendmsg at 97f73507
#15 [b37740c77dc8] sock_sendmsg at 97ee8aa6
#16 [b37740c77de0] sock_write_iter at 97ee8b47
#17 [b37740c77e50] new_sync_write at 97c49bfb
#18 [b37740c77ed0] vfs_write at 97c4c7d5
#19 [b37740c77f00] ksys_write at 97c4ca77
#20 [b37740c77f38] do_syscall_64 at 97a04140
#21 [b37740c77f50] entry_SYSCALL_64_after_hwframe at 98200088
    RIP: 7fd74beba504  RSP: 7ffc1d456638  RFLAGS: 0246
    RAX: ffda  RBX: 0084  RCX: 7fd74beba504
    RDX: 0084  RSI: 55785f33bb90  RDI: 0003
    RBP: 55785f31d630   R8:    R9: 1000
    R10: 0008  R11: 0246  R12: 01dd
    R13: 55785ddc9b00  R14: 0003  R15: 7ffc1d4566e0
    ORIG_RAX: 0001  CS: 0033  SS: 002b

crash> sym 97b94ffd
97b94ffd (T) __cgroup_bpf_run_filter_skb+189
./debian/build/build_amd64_none_cloud-amd64/./kernel/bpf/cgroup.c: 539
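
crash's sym command maps an address to the nearest symbol at or below it, plus a decimal offset. A simplified Python sketch of that lookup, assuming a toy table sorted by start address: the __cgroup_bpf_run_filter_skb start address is derived from the output above (0x97b94ffd - 189 = 0x97b94f40, keeping the shortened address form used in this dump), while hypothetical_prev_symbol is made up. Unlike crash, it ignores symbol sizes.

```python
import bisect

# Toy symbol table of (start_address, name), sorted by address.
# First entry is hypothetical; second start is derived from the dump above.
SYMBOLS = [
    (0x97B94E00, "hypothetical_prev_symbol"),
    (0x97B94F40, "__cgroup_bpf_run_filter_skb"),
]
_STARTS = [start for start, _ in SYMBOLS]


def resolve(addr: int) -> str:
    """Return 'name+offset' (decimal offset) for the closest symbol at or below addr."""
    i = bisect.bisect_right(_STARTS, addr) - 1
    if i < 0:
        raise ValueError(f"no symbol at or below {addr:#x}")
    start, name = SYMBOLS[i]
    return f"{name}+{addr - start}"


print(resolve(0x97B94FFD))  # __cgroup_bpf_run_filter_skb+189
```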

crash> log
[0.00] Linux version 4.19.0-10-cloud-amd64 (debian-kernel@lists.debian.org) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP Debian 4.19.132-1 (2020-07-24)
[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-4.19.0-10-cloud-amd64 root=UUID=9ac8f5bd-5b64-48cd-9efd-2b2d35a30500 ro console=tty0 console=ttyS0,115200 earlyprintk=ttyS0,115200 nmi_watchdog=1 elevator=noop scsi_mod.use_blk_mq=Y crashkernel=384M-:128M

[  478.686368] BUG: unable to handle kernel NULL pointer dereference at 0010
[  478.693551] PGD 0 P4D 0
[  478.696291] Oops:  [#1] SMP PTI
[  478.699431] CPU: 0 PID: 1453 Comm: sshd Kdump: loaded Not tainted
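
The boot command line above reserves memory for the kdump capture kernel with crashkernel=384M-:128M. In the kernel's range syntax (<start>-[<end>]:<size> entries separated by commas, end exclusive, an empty end meaning no upper bound), this reserves 128M whenever system RAM is at least 384M, so a 4 GB instance like this one gets 128M. A minimal sketch, assuming decimal sizes with K/M/G/T suffixes; it ignores @offset and other variants such as ,high/,low, and parse_size is a hypothetical helper loosely modelled on the kernel's memparse():

```python
# Binary size suffixes (a subset of what the kernel accepts).
_SUFFIX = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}


def parse_size(text: str) -> int:
    """Parse a decimal size such as '128M' into bytes."""
    text = text.strip()
    if text and text[-1].upper() in _SUFFIX:
        return int(text[:-1]) * _SUFFIX[text[-1].upper()]
    return int(text)


def crashkernel_reservation(spec: str, system_ram: int) -> int:
    """Bytes reserved for the capture kernel (0 if no range matches)."""
    spec = spec.split("@", 1)[0]  # this sketch ignores an optional @offset
    if ":" not in spec:
        return parse_size(spec)  # simple form: crashkernel=<size>
    for entry in spec.split(","):
        range_part, size = entry.split(":", 1)
        start_text, end_text = range_part.split("-", 1)
        start = parse_size(start_text)
        end = parse_size(end_text) if end_text else None  # empty end: no upper bound
        if system_ram >= start and (end is None or system_ram < end):
            return parse_size(size)
    return 0


print(crashkernel_reservation("384M-:128M", 4 << 30) >> 20)  # 128
```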

Bug#966846: Kernel panic (4.19.0-10): RIP __cgroup_bpf_run_filter_skb

2020-08-03 Thread Cédric Dufour
Package: linux-source-4.19
Version: 4.19.132-1
Severity: important

Hello,

Since linux-image-4.19.0-10-amd64, I have been facing regular kernel panics -
"RIP: 0010:__cgroup_bpf_run_filter_skb+0x26d/0x3d0" - resulting in full
(file) *server freeze*.
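
The oops line follows the x86-64 format RIP: <cs>:<symbol>+<offset>/<size>. Here 0010 is the kernel code-segment selector, and the fault happened 0x26d bytes into a function that is 0x3d0 bytes long. A small Python parser for that notation, written for illustration; the regular expression is an assumption that covers only this shape of line:

```python
import re

# "RIP: <cs>:<symbol>+0x<offset>/0x<size>" as printed in x86-64 oops reports.
_RIP_RE = re.compile(
    r"RIP:\s*(?P<cs>[0-9a-f]{4}):(?P<sym>[\w.]+)"
    r"\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)",
    re.IGNORECASE,
)


def parse_oops_rip(line: str) -> dict:
    m = _RIP_RE.search(line)
    if m is None:
        raise ValueError("no 'RIP: cs:symbol+off/size' found")
    return {
        "cs": int(m["cs"], 16),       # code segment selector (0x10 = kernel code)
        "symbol": m["sym"],
        "offset": int(m["off"], 16),  # bytes into the function
        "size": int(m["size"], 16),   # total function length in bytes
    }


info = parse_oops_rip("RIP: 0010:__cgroup_bpf_run_filter_skb+0x26d/0x3d0")
print(info["symbol"], info["offset"], info["size"])  # __cgroup_bpf_run_filter_skb 621 976
```

crash prints offsets in decimal instead (+189, i.e. 0xbd, in the cloud-amd64 kdump elsewhere in this thread). That report comes from a different build, so the two offsets are not directly comparable.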

The issue is pretty well described and summarized in 
https://forum.proxmox.com/threads/kernel-5-4-44-causes-system-freeze-on-hp-microserver-gen8.72050/page-2#post-323498

The "culprit" commit - "netprio_cgroup: Fix unlimited memory leak of v2
cgroups" - has indeed been included in the Debian kernel (4.19) since
changelog entry 4.19.131-1.
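
To check whether a given kernel package falls in the affected window, a rough Python sketch follows. It assumes, based on this report, that the regression arrived in 4.19.131-1, and it treats the 4.19.142-1 rebase as the first fixed version. That upper bound is an assumption (the fix may have landed in an earlier upstream release), and the comparison handles only all-numeric versions, not dpkg's full ordering rules (~, +deb10u1 and so on).

```python
def parse_simple_version(version: str) -> tuple:
    """'4.19.132-1' -> ((4, 19, 132), (1,)); all-numeric versions only."""
    upstream, _, revision = version.partition("-")
    upstream_parts = tuple(int(p) for p in upstream.split("."))
    revision_parts = tuple(int(p) for p in revision.split(".")) if revision else ()
    return (upstream_parts, revision_parts)


# Assumed bounds: introduced per this report; first fixed version is an assumption.
INTRODUCED = parse_simple_version("4.19.131-1")
ASSUMED_FIXED = parse_simple_version("4.19.142-1")


def possibly_affected(version: str) -> bool:
    """True if version lies in [INTRODUCED, ASSUMED_FIXED)."""
    return INTRODUCED <= parse_simple_version(version) < ASSUMED_FIXED


print(possibly_affected("4.19.132-1"))  # True
```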

It *seems* there is already a patch proposed upstream (although here for kernel 
4.9): https://lkml.org/lkml/2020/7/20/883

Best regards,

Cédric

-- 
Cédric Dufour