Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1
On 08/17/2018 11:13 PM, Peter Robinson wrote:
> On Fri, Aug 17, 2018 at 7:30 PM, Daniel Borkmann wrote:
>> On 08/17/2018 06:17 PM, Russell King - ARM Linux wrote:
>>> On Fri, Aug 17, 2018 at 02:40:19PM +0200, Daniel Borkmann wrote:
>>>> I'd have one potential bug suspicion, for the 4.18 one you were trying,
>>>> could you run with the below patch to see whether it would help?
>>>
>>> I think this is almost certainly the problem - looking at the history,
>>> it seems that the "-4" was assumed to be part of the scratch stuff in
>>> commit 38ca93060163 ("bpf, arm32: save 4 bytes of unneeded stack space")
>>> but it isn't - it's because "off" of zero refers to the top word in the
>>> stack (iow at STACK_SIZE-4).
>>
>> Yeah agree, my thinking as well (albeit bit late, sigh, sorry about that).
>> Waiting for Peter to get back with results for definite confirmation. Your
>> rework in 1c35ba122d4a ("ARM: net: bpf: use negative numbers for stacked
>> registers") and 96cced4e774a ("ARM: net: bpf: access eBPF scratch space using
>> ARM FP register") fixes this in mainline, so unless I'm missing something this
>> would only need a stand-alone fix for 4.18/stable which I can cook up and
>> submit then.
>
> I can confirm that fixes the problems I was seeing on Fedora 29.
>
> Feel free to add a tested by from me:
>
> Tested-by: Peter Robinson

Great, thanks everyone! Will get it out asap.
Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1
Hi Stefan,

>> On 08/17/2018 06:17 PM, Russell King - ARM Linux wrote:
>> > On Fri, Aug 17, 2018 at 02:40:19PM +0200, Daniel Borkmann wrote:
>> >> I'd have one potential bug suspicion, for the 4.18 one you were trying,
>> >> could you run with the below patch to see whether it would help?
>> >
>> > I think this is almost certainly the problem - looking at the history,
>> > it seems that the "-4" was assumed to be part of the scratch stuff in
>> > commit 38ca93060163 ("bpf, arm32: save 4 bytes of unneeded stack space")
>> > but it isn't - it's because "off" of zero refers to the top word in the
>> > stack (iow at STACK_SIZE-4).
>>
>> Yeah agree, my thinking as well (albeit bit late, sigh, sorry about that).
>> Waiting for Peter to get back with results for definite confirmation. Your
>> rework in 1c35ba122d4a ("ARM: net: bpf: use negative numbers for stacked
>> registers") and 96cced4e774a ("ARM: net: bpf: access eBPF scratch space using
>> ARM FP register") fixes this in mainline, so unless I'm missing something this
>> would only need a stand-alone fix for 4.18/stable which I can cook up and
>> submit then.
>
> I was able to reproduce this issue on RPi 3 with Linux 4.18.1 +
> multi_v7_defconfig and the following config changes:
>
> --- a/arch/arm/configs/multi_v7_defconfig
> +++ b/arch/arm/configs/multi_v7_defconfig
> @@ -2,7 +2,10 @@ CONFIG_SYSVIPC=y
>  CONFIG_NO_HZ=y
>  CONFIG_HIGH_RES_TIMERS=y
>  CONFIG_CGROUPS=y
> +CONFIG_CGROUP_BPF=y
>  CONFIG_BLK_DEV_INITRD=y
> +CONFIG_BPF_SYSCALL=y
> +CONFIG_BPF_JIT_ALWAYS_ON=y
>  CONFIG_EMBEDDED=y
>  CONFIG_PERF_EVENTS=y
>  CONFIG_MODULES=y
> @@ -153,6 +156,8 @@ CONFIG_IPV6_MIP6=m
>  CONFIG_IPV6_TUNNEL=m
>  CONFIG_IPV6_MULTIPLE_TABLES=y
>  CONFIG_NET_DSA=m
> +CONFIG_BPF_JIT=y
> +CONFIG_BPF_STREAM_PARSER=y
>  CONFIG_CAN=y
>  CONFIG_CAN_AT91=m
>  CONFIG_CAN_FLEXCAN=m
>
> After applying the "-4" patch the oopses don't appear during boot anymore.

Would be fab to get that into the kernel so this is widely tested moving forward.

Peter
Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1
On Fri, Aug 17, 2018 at 7:30 PM, Daniel Borkmann wrote:
> On 08/17/2018 06:17 PM, Russell King - ARM Linux wrote:
>> On Fri, Aug 17, 2018 at 02:40:19PM +0200, Daniel Borkmann wrote:
>>> I'd have one potential bug suspicion, for the 4.18 one you were trying,
>>> could you run with the below patch to see whether it would help?
>>
>> I think this is almost certainly the problem - looking at the history,
>> it seems that the "-4" was assumed to be part of the scratch stuff in
>> commit 38ca93060163 ("bpf, arm32: save 4 bytes of unneeded stack space")
>> but it isn't - it's because "off" of zero refers to the top word in the
>> stack (iow at STACK_SIZE-4).
>
> Yeah agree, my thinking as well (albeit bit late, sigh, sorry about that).
> Waiting for Peter to get back with results for definite confirmation. Your
> rework in 1c35ba122d4a ("ARM: net: bpf: use negative numbers for stacked
> registers") and 96cced4e774a ("ARM: net: bpf: access eBPF scratch space using
> ARM FP register") fixes this in mainline, so unless I'm missing something this
> would only need a stand-alone fix for 4.18/stable which I can cook up and
> submit then.

I can confirm that fixes the problems I was seeing on Fedora 29.

Feel free to add a tested by from me:

Tested-by: Peter Robinson
Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1
On Fri, Aug 17, 2018 at 5:17 PM, Russell King - ARM Linux wrote:
> On Fri, Aug 17, 2018 at 02:40:19PM +0200, Daniel Borkmann wrote:
>> I'd have one potential bug suspicion, for the 4.18 one you were trying,
>> could you run with the below patch to see whether it would help?
>
> I think this is almost certainly the problem - looking at the history,
> it seems that the "-4" was assumed to be part of the scratch stuff in
> commit 38ca93060163 ("bpf, arm32: save 4 bytes of unneeded stack space")
> but it isn't - it's because "off" of zero refers to the top word in the
> stack (iow at STACK_SIZE-4).

I can confirm that patch fixes the problem I was seeing.

Peter
Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1
Hi Daniel,

> Daniel Borkmann wrote on 17 August 2018 at 20:30:
>
> On 08/17/2018 06:17 PM, Russell King - ARM Linux wrote:
> > On Fri, Aug 17, 2018 at 02:40:19PM +0200, Daniel Borkmann wrote:
> >> I'd have one potential bug suspicion, for the 4.18 one you were trying,
> >> could you run with the below patch to see whether it would help?
> >
> > I think this is almost certainly the problem - looking at the history,
> > it seems that the "-4" was assumed to be part of the scratch stuff in
> > commit 38ca93060163 ("bpf, arm32: save 4 bytes of unneeded stack space")
> > but it isn't - it's because "off" of zero refers to the top word in the
> > stack (iow at STACK_SIZE-4).
>
> Yeah agree, my thinking as well (albeit bit late, sigh, sorry about that).
> Waiting for Peter to get back with results for definite confirmation. Your
> rework in 1c35ba122d4a ("ARM: net: bpf: use negative numbers for stacked
> registers") and 96cced4e774a ("ARM: net: bpf: access eBPF scratch space using
> ARM FP register") fixes this in mainline, so unless I'm missing something this
> would only need a stand-alone fix for 4.18/stable which I can cook up and
> submit then.

I was able to reproduce this issue on RPi 3 with Linux 4.18.1 +
multi_v7_defconfig and the following config changes:

--- a/arch/arm/configs/multi_v7_defconfig
+++ b/arch/arm/configs/multi_v7_defconfig
@@ -2,7 +2,10 @@ CONFIG_SYSVIPC=y
 CONFIG_NO_HZ=y
 CONFIG_HIGH_RES_TIMERS=y
 CONFIG_CGROUPS=y
+CONFIG_CGROUP_BPF=y
 CONFIG_BLK_DEV_INITRD=y
+CONFIG_BPF_SYSCALL=y
+CONFIG_BPF_JIT_ALWAYS_ON=y
 CONFIG_EMBEDDED=y
 CONFIG_PERF_EVENTS=y
 CONFIG_MODULES=y
@@ -153,6 +156,8 @@ CONFIG_IPV6_MIP6=m
 CONFIG_IPV6_TUNNEL=m
 CONFIG_IPV6_MULTIPLE_TABLES=y
 CONFIG_NET_DSA=m
+CONFIG_BPF_JIT=y
+CONFIG_BPF_STREAM_PARSER=y
 CONFIG_CAN=y
 CONFIG_CAN_AT91=m
 CONFIG_CAN_FLEXCAN=m

After applying the "-4" patch the oopses don't appear during boot anymore.
Stefan

> Thanks,
> Daniel
Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1
On 08/17/2018 06:17 PM, Russell King - ARM Linux wrote:
> On Fri, Aug 17, 2018 at 02:40:19PM +0200, Daniel Borkmann wrote:
>> I'd have one potential bug suspicion, for the 4.18 one you were trying,
>> could you run with the below patch to see whether it would help?
>
> I think this is almost certainly the problem - looking at the history,
> it seems that the "-4" was assumed to be part of the scratch stuff in
> commit 38ca93060163 ("bpf, arm32: save 4 bytes of unneeded stack space")
> but it isn't - it's because "off" of zero refers to the top word in the
> stack (iow at STACK_SIZE-4).

Yeah agree, my thinking as well (albeit bit late, sigh, sorry about that).
Waiting for Peter to get back with results for definite confirmation. Your
rework in 1c35ba122d4a ("ARM: net: bpf: use negative numbers for stacked
registers") and 96cced4e774a ("ARM: net: bpf: access eBPF scratch space using
ARM FP register") fixes this in mainline, so unless I'm missing something this
would only need a stand-alone fix for 4.18/stable which I can cook up and
submit then.

Thanks,
Daniel
Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1
On Fri, Aug 17, 2018 at 02:40:19PM +0200, Daniel Borkmann wrote:
> I'd have one potential bug suspicion, for the 4.18 one you were trying,
> could you run with the below patch to see whether it would help?

I think this is almost certainly the problem - looking at the history,
it seems that the "-4" was assumed to be part of the scratch stuff in
commit 38ca93060163 ("bpf, arm32: save 4 bytes of unneeded stack space")
but it isn't - it's because "off" of zero refers to the top word in the
stack (iow at STACK_SIZE-4).

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up
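The offset arithmetic Russell describes can be sketched in plain C. This is a minimal illustration, not the JIT code itself: DEMO_STACK_SIZE is a hypothetical, already-aligned frame size (the real STACK_SIZE in arch/arm/net/bpf_jit_32.c is computed per program), and "off" is assumed to be a byte offset in multiples of 4, as the description above implies. The sketch shows that the pre-fix STACK_VAR(0) lands one word past the frame, while the "- 4" variant maps off == 0 to the top word at STACK_SIZE - 4.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, already-aligned frame size in bytes (illustration only). */
#define DEMO_STACK_SIZE 88

/* Pre-fix STACK_VAR(): off == 0 maps to DEMO_STACK_SIZE, one word past the frame. */
static int stack_var_old(int off)
{
	return DEMO_STACK_SIZE - off;
}

/* Fixed STACK_VAR(): off == 0 maps to DEMO_STACK_SIZE - 4, the frame's top word. */
static int stack_var_fixed(int off)
{
	return DEMO_STACK_SIZE - off - 4;
}

/* True if a 4-byte slot at sp + byte_off lies inside [sp, sp + DEMO_STACK_SIZE). */
static bool word_in_frame(int byte_off)
{
	return byte_off >= 0 && byte_off + 4 <= DEMO_STACK_SIZE;
}

/* Print where a given eBPF register slot offset lands with each macro. */
void show(int off)
{
	int o = stack_var_old(off);
	int f = stack_var_fixed(off);

	printf("off=%d old=%d (%s) fixed=%d (%s)\n", off,
	       o, word_in_frame(o) ? "inside" : "OUTSIDE",
	       f, word_in_frame(f) ? "inside" : "OUTSIDE");
}
```

With this hypothetical size, show(0) prints "off=0 old=88 (OUTSIDE) fixed=84 (inside)". Note also that stack_var_old(4) equals stack_var_fixed(0): the fix shifts every slot down by one word, so the highest slot occupies the top word of the frame instead of the word just above it.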
Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1
On Fri, Aug 17, 2018 at 1:40 PM, Daniel Borkmann wrote:
> On 08/17/2018 02:25 PM, Peter Robinson wrote:
>> On Thu, Aug 16, 2018 at 11:58 PM, Russell King - ARM Linux
>> wrote:
>>> On Thu, Aug 16, 2018 at 10:35:16PM +0200, Marc Haber wrote:
>>>> On Mon, Jun 25, 2018 at 05:41:27PM +0100, Peter Robinson wrote:
>>>>> So with that and the other fix there was no improvement, with those
>>>>> and the BPF JIT disabled it works, I'm not sure if the two patches
>>>>> have any effect with the JIT disabled though.
>>>>
>>>> I can confirm the crash with the released 4.18.1 on Banana Pi, and I can
>>>> also confirm that disabling BPF JIT makes the Banana Pi work again.
>>>
>>> I'm afraid that the information in the crash dumps is insufficient
>>> to be able to work very much out about these crashes.
>>>
>>> We need a recipe (kernel configuration and what userspace is doing)
>>> so that it's possible to recreate the crash, or we need responses
>>> to requests for information - I requested the disassembly of
>>> sk_filter_trim_cap and the BPF code dump via setting a sysctl back
>>> in early July. Without this, as I say, I don't see how this problem
>>> can be progressed.
>>
>> I can provide a kernel config [1] but I've not had enough time to sit
>> down and get the rest of the stuff and debug it due to a combination
>> of travel and other priorities.
>
> Did you get a chance to try latest kernel from Linus' tree [1] from last
> few days to see whether the issue is still persistent? There have been
> a number of improvements, bit strange why e.g. Russell didn't run into
> it while others have, hmm. Perhaps due to EABI vs non EABI.

I haven't had a chance to try anything from the 4.19 merge window as yet, I'm traveling this week so it was on the list for next week to try.

> [1] git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>
>>> If the problem is at boot, one way to set the sysctl would be to
>>> hack the kernel and explicitly initialise the sysctl to '2', or
>>> boot with init=/bin/sh, then manually mount /proc, set the sysctl,
>>> and then "exec /sbin/init" from that shell. (Remember there's no
>>> job control in that shell, so ^z, ^c, etc do not work.)
>>
>> It starts to happen in the early kernel boot long before we get to any
>> userspace across a number of ARMv7 devices (RPi2/3, BeagleBone and
>> AllWinner H3 based devices at least).
>>
>> [1] https://pbrobinson.fedorapeople.org/kernel-armv7hl.config
>
> I'd have one potential bug suspicion, for the 4.18 one you were trying,
> could you run with the below patch to see whether it would help?

I will try and get someone to test that today, thanks

> Thanks,
> Daniel
>
> diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
> index f6a62ae..c864f6b 100644
> --- a/arch/arm/net/bpf_jit_32.c
> +++ b/arch/arm/net/bpf_jit_32.c
> @@ -238,7 +238,7 @@ static void jit_fill_hole(void *area, unsigned int size)
>  #define STACK_SIZE ALIGN(_STACK_SIZE, STACK_ALIGNMENT)
>
>  /* Get the offset of eBPF REGISTERs stored on scratch space. */
> -#define STACK_VAR(off) (STACK_SIZE - off)
> +#define STACK_VAR(off) (STACK_SIZE - off - 4)
>
>  #if __LINUX_ARM_ARCH__ < 7
>
Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1
On 08/17/2018 02:25 PM, Peter Robinson wrote:
> On Thu, Aug 16, 2018 at 11:58 PM, Russell King - ARM Linux
> wrote:
>> On Thu, Aug 16, 2018 at 10:35:16PM +0200, Marc Haber wrote:
>>> On Mon, Jun 25, 2018 at 05:41:27PM +0100, Peter Robinson wrote:
>>>> So with that and the other fix there was no improvement, with those
>>>> and the BPF JIT disabled it works, I'm not sure if the two patches
>>>> have any effect with the JIT disabled though.
>>>
>>> I can confirm the crash with the released 4.18.1 on Banana Pi, and I can
>>> also confirm that disabling BPF JIT makes the Banana Pi work again.
>>
>> I'm afraid that the information in the crash dumps is insufficient
>> to be able to work very much out about these crashes.
>>
>> We need a recipe (kernel configuration and what userspace is doing)
>> so that it's possible to recreate the crash, or we need responses
>> to requests for information - I requested the disassembly of
>> sk_filter_trim_cap and the BPF code dump via setting a sysctl back
>> in early July. Without this, as I say, I don't see how this problem
>> can be progressed.
>
> I can provide a kernel config [1] but I've not had enough time to sit
> down and get the rest of the stuff and debug it due to a combination
> of travel and other priorities.

Did you get a chance to try latest kernel from Linus' tree [1] from last
few days to see whether the issue is still persistent? There have been
a number of improvements, bit strange why e.g. Russell didn't run into
it while others have, hmm. Perhaps due to EABI vs non EABI.

  [1] git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

>> If the problem is at boot, one way to set the sysctl would be to
>> hack the kernel and explicitly initialise the sysctl to '2', or
>> boot with init=/bin/sh, then manually mount /proc, set the sysctl,
>> and then "exec /sbin/init" from that shell. (Remember there's no
>> job control in that shell, so ^z, ^c, etc do not work.)
>
> It starts to happen in the early kernel boot long before we get to any
> userspace across a number of ARMv7 devices (RPi2/3, BeagleBone and
> AllWinner H3 based devices at least).
>
> [1] https://pbrobinson.fedorapeople.org/kernel-armv7hl.config

I'd have one potential bug suspicion, for the 4.18 one you were trying,
could you run with the below patch to see whether it would help?

Thanks,
Daniel

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index f6a62ae..c864f6b 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -238,7 +238,7 @@ static void jit_fill_hole(void *area, unsigned int size)
 #define STACK_SIZE ALIGN(_STACK_SIZE, STACK_ALIGNMENT)
 
 /* Get the offset of eBPF REGISTERs stored on scratch space. */
-#define STACK_VAR(off) (STACK_SIZE - off)
+#define STACK_VAR(off) (STACK_SIZE - off - 4)
 
 #if __LINUX_ARM_ARCH__ < 7
 
Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1
On Thu, Aug 16, 2018 at 11:58 PM, Russell King - ARM Linux wrote:
> On Thu, Aug 16, 2018 at 10:35:16PM +0200, Marc Haber wrote:
>> On Mon, Jun 25, 2018 at 05:41:27PM +0100, Peter Robinson wrote:
>> > So with that and the other fix there was no improvement, with those
>> > and the BPF JIT disabled it works, I'm not sure if the two patches
>> > have any effect with the JIT disabled though.
>>
>> I can confirm the crash with the released 4.18.1 on Banana Pi, and I can
>> also confirm that disabling BPF JIT makes the Banana Pi work again.
>
> Hi,
>
> I'm afraid that the information in the crash dumps is insufficient
> to be able to work very much out about these crashes.
>
> We need a recipe (kernel configuration and what userspace is doing)
> so that it's possible to recreate the crash, or we need responses
> to requests for information - I requested the disassembly of
> sk_filter_trim_cap and the BPF code dump via setting a sysctl back
> in early July. Without this, as I say, I don't see how this problem
> can be progressed.

I can provide a kernel config [1] but I've not had enough time to sit
down and get the rest of the stuff and debug it due to a combination
of travel and other priorities.

> If the problem is at boot, one way to set the sysctl would be to
> hack the kernel and explicitly initialise the sysctl to '2', or
> boot with init=/bin/sh, then manually mount /proc, set the sysctl,
> and then "exec /sbin/init" from that shell. (Remember there's no
> job control in that shell, so ^z, ^c, etc do not work.)

It starts to happen in the early kernel boot long before we get to any
userspace across a number of ARMv7 devices (RPi2/3, BeagleBone and
AllWinner H3 based devices at least).

[1] https://pbrobinson.fedorapeople.org/kernel-armv7hl.config
Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1
On Thu, Aug 16, 2018 at 10:35:16PM +0200, Marc Haber wrote:
> On Mon, Jun 25, 2018 at 05:41:27PM +0100, Peter Robinson wrote:
> > So with that and the other fix there was no improvement, with those
> > and the BPF JIT disabled it works, I'm not sure if the two patches
> > have any effect with the JIT disabled though.
>
> I can confirm the crash with the released 4.18.1 on Banana Pi, and I can
> also confirm that disabling BPF JIT makes the Banana Pi work again.

Hi,

I'm afraid that the information in the crash dumps is insufficient
to be able to work very much out about these crashes.

We need a recipe (kernel configuration and what userspace is doing)
so that it's possible to recreate the crash, or we need responses
to requests for information - I requested the disassembly of
sk_filter_trim_cap and the BPF code dump via setting a sysctl back
in early July. Without this, as I say, I don't see how this problem
can be progressed.

If the problem is at boot, one way to set the sysctl would be to
hack the kernel and explicitly initialise the sysctl to '2', or
boot with init=/bin/sh, then manually mount /proc, set the sysctl,
and then "exec /sbin/init" from that shell. (Remember there's no
job control in that shell, so ^z, ^c, etc do not work.)
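The init=/bin/sh recipe above boils down to three steps: mount /proc, write the sysctl, exec the real init. A minimal C sketch of the same steps follows. write_sysctl() and boot_with_jit_dump() are hypothetical helpers, not kernel or systemd APIs; the idea is that such a program would be installed on the root filesystem and passed via init= in place of /bin/sh. One caveat, which I believe is why hacking the kernel source is also mentioned in this thread: with CONFIG_BPF_JIT_ALWAYS_ON=y (as in the configs above) the kernel restricts the accepted range of bpf_jit_enable, so writing "2" may be rejected with EINVAL.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a value (e.g. "2") to a sysctl file such as
 * /proc/sys/net/core/bpf_jit_enable. Returns 0 on success, -errno on failure.
 */
static int write_sysctl(const char *path, const char *value)
{
	int fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	size_t len = strlen(value);
	ssize_t n = write(fd, value, len);
	/* Capture errno before close() can overwrite it. */
	int err = (n < 0) ? -errno : ((size_t)n == len ? 0 : -EIO);

	if (close(fd) < 0 && err == 0)
		err = -errno;
	return err;
}

/*
 * Hypothetical init= replacement: mount /proc, request JIT dumps, then hand
 * over to the real init. Only returns if execv() fails.
 */
int boot_with_jit_dump(void)
{
	if (mount("proc", "/proc", "proc", 0, NULL) < 0)
		perror("mount /proc");

	int err = write_sysctl("/proc/sys/net/core/bpf_jit_enable", "2");
	if (err)
		fprintf(stderr, "bpf_jit_enable=2 failed: %s\n", strerror(-err));

	char *const argv[] = { "/sbin/init", NULL };
	execv(argv[0], argv);
	perror("execv /sbin/init");
	return -1;
}
```

Reporting the sysctl failure instead of aborting is a deliberate choice in this sketch: the machine still boots into /sbin/init, and the error message makes it obvious that the JIT dump was not enabled.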
Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1
On Mon, Jun 25, 2018 at 05:41:27PM +0100, Peter Robinson wrote:
> So with that and the other fix there was no improvement, with those
> and the BPF JIT disabled it works, I'm not sure if the two patches
> have any effect with the JIT disabled though.

I can confirm the crash with the released 4.18.1 on Banana Pi, and I can
also confirm that disabling BPF JIT makes the Banana Pi work again.

Greetings
Marc

[0.004930] /cpus/cpu@0 missing clock-frequency property
[0.004965] /cpus/cpu@1 missing clock-frequency property
[4.959858] zswap: default zpool zbud not available
[4.964820] zswap: pool creation failed
WARNING: Failed to connect to lvmetad. Falling back to device scanning.
WARNING: Failed to connect to lvmetad. Falling back to device scanning.
[ 10.721077] Unable to handle kernel NULL pointer dereference at virtual address 000c
[ 10.722949] Unable to handle kernel NULL pointer dereference at virtual address 000c
[ 10.729288] pgd = (ptrval)
[ 10.729299] [000c] *pgd=6dc65003, *pmd=
[ 10.737464] pgd = (ptrval)
[ 10.740176] Internal error: Oops: a06 [#1] SMP ARM
[ 10.745056] [000c] *pgd=6e72a003
[ 10.747742] Modules linked in: ip_tables x_tables autofs4 btrfs
[ 10.752561] , *pmd=
[ 10.756113] libcrc32c crc32c_generic xor zstd_decompress zstd_compress xxhash
[ 10.764833] zlib_deflate raid6_pq dm_mod dax axp20x_regulator realtek ahci_sunxi dwmac_sunxi stmmac_platform libahci_platform stmmac i2c_mv64xxx libahci libata scsi_mod ohci_platform ohci_hcd ehci_platform ehci_hcd phy_sun4i_usb sunxi_mmc
[ 10.793306] CPU: 1 PID: 238 Comm: systemd-udevd Not tainted 4.18.1-zgbpi-armmp-lpae #3
[ 10.801212] Hardware name: Allwinner sun7i (A20) Family
[ 10.806448] PC is at sk_filter_trim_cap+0xa0/0x1d4
[ 10.811238] LR is at (null)
[ 10.814205] pc : []lr : [<>]psr: 600f0013
[ 10.820466] sp : edc7dcf8 ip : fp : edc7dd34
[ 10.825686] r10: r9 : r8 :
[ 10.830907] r7 : 0001 r6 : f0e96000 r5 : c0e04cc8 r4 :
[ 10.837428] r3 : 0007 r2 : fb5e2d70 r1 : r0 :
[ 10.843952] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 10.851081] Control: 30c5387d Table: 6e6c7580 DAC: 2c983336
[ 10.856822] Process systemd-udevd (pid: 238, stack limit = 0x(ptrval))
[ 10.863344] Stack: (0xedc7dcf8 to 0xedc7e000)
[ 10.867700] dce0: edc7dd1c edc7dd08
[ 10.875873] dd00: c06a41dc c06a4048 ee7d39c0 fb5e2d70 ee479800 ee6c2400 edc33840 c0e6aac0
[ 10.884046] dd20: 0001 edc7dd8c edc7dd38 c0705884 c06de2f4 edc7de24 0001
[ 10.892219] dd40: c0ec649c ee479864 ee7d39c0 0002
[ 10.900391] dd60: edc7df44 c0e04cc8 ee7d39c0 ee6c2400 008c 0002
[ 10.908565] dd80: edc7ddf4 edc7dd90 c0705ee0 c0705610 006000c0 fb5e2d70
[ 10.916737] dda0: 0008 ef357c80 00ee
[ 10.924910] ddc0: fb5e2d70 008c edc7df44 eef08700 0040 eef08700
[ 10.933083] dde0: edc7dedc edc7de0c edc7ddf8 c069b948 c0705b78 edc7df44 c0e04cc8
[ 10.941256] de00: edc7df2c edc7de10 c069c2f8 c069b910 c0e04cc8 edc7dec0 be8dcfac
[ 10.949428] de20: 0028 0186a660 0064 bf387954 edc7df48 be8dcf80
[ 10.957602] de40: be8dcf80 b6f19ce8 0128 4028 b6e01346 000e 0010
[ 10.965774] de60: 0002 be8dcf80
[ 10.973948] de80: b6f19ce8 fb5e2d70 edc7deb4 e000 c0e04cc8
[ 10.982120] dea0: 0128 c0201204 0080 edc7df6c edc7dec0 c02f5e2c c02f5c18
[ 10.990293] dec0: fb5e2d70 edc7def4 a0010013 c9f1e000 c03f986c edc7df50
[ 10.998466] dee0: 000e 4000 edc7df3c fb5e2d70 c0409c98 c0409d34 edc7df14 fb5e2d70
[ 11.006639] df00: c0409d34 c0e04cc8 be8dcf80 eef08700 c0201204 edc7c000 0128
[ 11.014812] df20: edc7df94 edc7df30 c069d818 c069c0a0 c0e04cc8
[ 11.022984] df40: fff7 edc7de5c 000c 0001 edc7de2c
[ 11.031156] df60: edc7df7c 0040 fb5e2d70 be8dcf80 b6f19ce8
[ 11.039329] df80: 01878670 0128 edc7dfa4 edc7df98 c069d870 c069d7c4 edc7dfa8
[ 11.047502] dfa0: c02011cc c069d860 be8dcf80 b6f19ce8 000e be8dcf80
[ 11.055675] dfc0: be8dcf80 b6f19ce8 01878670 0128 0064 01878e80
[ 11.063848] dfe0: 0128 be8dcf50 b6e003e3 b6e01346 200f0030 000e
[ 11.072038] [] (sk_filter_trim_cap) from [] (netlink_broadcast_filtered+0x280/0x460)
[ 11.081517] [] (netlink_broadcast_filtered) from [] (netlink_sendmsg+0x374/0x3b0)
[ 11.090734] [] (netlink_sendmsg) from
Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1
On 07/05/2018 09:31 AM, Russell King - ARM Linux wrote:
> On Thu, Jul 05, 2018 at 12:41:54AM +0100, Russell King - ARM Linux wrote:
>> Subject says offlist, but this isn't...
>>
>> On Wed, Jul 04, 2018 at 08:33:20AM +0100, Peter Robinson wrote:
>>> Sorry for the delay on this from my end. I noticed there was some bpf
>>> bits land in the last net fixes pull request landed Monday so I built
>>> a kernel with the JIT reenabled. It seems it's improved in that the
>>> completely dead no output boot has gone but the original problem that
>>> arrived in the merge window still persists:
>>>
>>> [ 17.564142] note: systemd-udevd[194] exited with preempt_count 1
>>> [ 17.592739] Unable to handle kernel NULL pointer dereference at virtual address 000c
>>> [ 17.601002] pgd = (ptrval)
>>> [ 17.603819] [000c] *pgd=
>>> [ 17.607487] Internal error: Oops: 805 [#10] SMP ARM
>>> [ 17.612396] Modules linked in:
>>> [ 17.615484] CPU: 0 PID: 195 Comm: systemd-udevd Tainted: G D 4.18.0-0.rc3.git1.1.bpf1.fc29.armv7hl #1
>>> [ 17.626056] Hardware name: Generic AM33XX (Flattened Device Tree)
>>> [ 17.632198] PC is at sk_filter_trim_cap+0x218/0x2fc
>>> [ 17.637102] LR is at (null)
>>> [ 17.640086] pc : []lr : [<>]psr: 6013
>>> [ 17.646384] sp : cfe1dd48 ip : fp :
>>> [ 17.651635] r10: d837e000 r9 : d833be00 r8 :
>>> [ 17.656887] r7 : 0001 r6 : e003d000 r5 : r4 :
>>> [ 17.663447] r3 : 0007 r2 : r1 : r0 :
>>> [ 17.670009] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
>>> [ 17.677180] Control: 10c5387d Table: 8fe20019 DAC: 0051
>>> [ 17.682956] Process systemd-udevd (pid: 195, stack limit = 0x(ptrval))
>>> [ 17.689518] Stack: (0xcfe1dd48 to 0xcfe1e000)
>>
>> Can you provide a full disassembly of sk_filter_trim_cap from vmlinux
>> (iow, annotated with its linked address) for the above dump please -
>> alternatively a new dump with matching disassembly. Thanks.
>
> Also probably a good idea to have bpf_jit_enable set to 2 to get a
> dump of the bpf program being run, which I think for your problem,
> you'll have to hack the kernel source to do that.

Agree, that would be good as well. You could use something like the below
to bail out to interpreter after JIT did the dump. Dump will then land in
kernel log which you could paste here.

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index f6a62ae..d6a7dfd 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -1844,6 +1844,13 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	/* there are 2 passes here */
 	bpf_jit_dump(prog->len, image_size, 2, ctx.target);
 
+	/* Defer to interpreter after dump. */
+	if (1) {
+		bpf_jit_binary_free(header);
+		prog = orig_prog;
+		goto out_imms;
+	}
+
 	bpf_jit_binary_lock_ro(header);
 	prog->bpf_func = (void *)ctx.target;
 	prog->jited = 1;
Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1
On Thu, Jul 05, 2018 at 12:41:54AM +0100, Russell King - ARM Linux wrote:
> Subject says offlist, but this isn't...
>
> On Wed, Jul 04, 2018 at 08:33:20AM +0100, Peter Robinson wrote:
> > Sorry for the delay on this from my end. I noticed there was some bpf
> > bits land in the last net fixes pull request landed Monday so I built
> > a kernel with the JIT reenabled. It seems it's improved in that the
> > completely dead no output boot has gone but the original problem that
> > arrived in the merge window still persists:
> >
> > [ 17.564142] note: systemd-udevd[194] exited with preempt_count 1
> > [ 17.592739] Unable to handle kernel NULL pointer dereference at virtual address 000c
> > [ 17.601002] pgd = (ptrval)
> > [ 17.603819] [000c] *pgd=
> > [ 17.607487] Internal error: Oops: 805 [#10] SMP ARM
> > [ 17.612396] Modules linked in:
> > [ 17.615484] CPU: 0 PID: 195 Comm: systemd-udevd Tainted: G D 4.18.0-0.rc3.git1.1.bpf1.fc29.armv7hl #1
> > [ 17.626056] Hardware name: Generic AM33XX (Flattened Device Tree)
> > [ 17.632198] PC is at sk_filter_trim_cap+0x218/0x2fc
> > [ 17.637102] LR is at (null)
> > [ 17.640086] pc : []lr : [<>]psr: 6013
> > [ 17.646384] sp : cfe1dd48 ip : fp :
> > [ 17.651635] r10: d837e000 r9 : d833be00 r8 :
> > [ 17.656887] r7 : 0001 r6 : e003d000 r5 : r4 :
> > [ 17.663447] r3 : 0007 r2 : r1 : r0 :
> > [ 17.670009] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
> > [ 17.677180] Control: 10c5387d Table: 8fe20019 DAC: 0051
> > [ 17.682956] Process systemd-udevd (pid: 195, stack limit = 0x(ptrval))
> > [ 17.689518] Stack: (0xcfe1dd48 to 0xcfe1e000)
>
> Can you provide a full disassembly of sk_filter_trim_cap from vmlinux
> (iow, annotated with its linked address) for the above dump please -
> alternatively a new dump with matching disassembly. Thanks.

Also probably a good idea to have bpf_jit_enable set to 2 to get a
dump of the bpf program being run, which I think for your problem,
you'll have to hack the kernel source to do that.
Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1
Subject says offlist, but this isn't...

On Wed, Jul 04, 2018 at 08:33:20AM +0100, Peter Robinson wrote:
> Sorry for the delay on this from my end. I noticed there was some bpf
> bits land in the last net fixes pull request landed Monday so I built
> a kernel with the JIT reenabled. It seems it's improved in that the
> completely dead no output boot has gone but the original problem that
> arrived in the merge window still persists:
>
> [ 17.564142] note: systemd-udevd[194] exited with preempt_count 1
> [ 17.592739] Unable to handle kernel NULL pointer dereference at virtual address 000c
> [ 17.601002] pgd = (ptrval)
> [ 17.603819] [000c] *pgd=
> [ 17.607487] Internal error: Oops: 805 [#10] SMP ARM
> [ 17.612396] Modules linked in:
> [ 17.615484] CPU: 0 PID: 195 Comm: systemd-udevd Tainted: G D 4.18.0-0.rc3.git1.1.bpf1.fc29.armv7hl #1
> [ 17.626056] Hardware name: Generic AM33XX (Flattened Device Tree)
> [ 17.632198] PC is at sk_filter_trim_cap+0x218/0x2fc
> [ 17.637102] LR is at (null)
> [ 17.640086] pc : []lr : [<>]psr: 6013
> [ 17.646384] sp : cfe1dd48 ip : fp :
> [ 17.651635] r10: d837e000 r9 : d833be00 r8 :
> [ 17.656887] r7 : 0001 r6 : e003d000 r5 : r4 :
> [ 17.663447] r3 : 0007 r2 : r1 : r0 :
> [ 17.670009] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
> [ 17.677180] Control: 10c5387d Table: 8fe20019 DAC: 0051
> [ 17.682956] Process systemd-udevd (pid: 195, stack limit = 0x(ptrval))
> [ 17.689518] Stack: (0xcfe1dd48 to 0xcfe1e000)

Can you provide a full disassembly of sk_filter_trim_cap from vmlinux
(iow, annotated with its linked address) for the above dump please -
alternatively a new dump with matching disassembly. Thanks.
Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1
On 07/04/2018 09:33 AM, Peter Robinson wrote: > On Tue, Jun 26, 2018 at 1:52 PM, Daniel Borkmann wrote: >> On 06/26/2018 02:23 PM, Peter Robinson wrote: >> On 06/24/2018 11:24 AM, Peter Robinson wrote: > I'm seeing this netlink/sk_filter_trim_cap crash on ARMv7 across quite > a few ARMv7 platforms on Fedora with 4.18rc1. I've tested RPi2/RPi3 > (doesn't happen on aarch64), AllWinner H3, BeagleBone and a few > others, both LPAE/normal kernels. >> >> So this is arm32 right? > > Correct. > > I'm a bit out of my depth in this part of the kernel but I'm wondering > if it's known, I couldn't find anything that looked obvious on a few > mailing lists. > > Peter Hi Peter Could you provide symbolic information ? >>> >>> I passed in through scripts/decode_stacktrace.sh is that what you were >>> after: >>> >>> [8.673880] Internal error: Oops: a06 [#10] SMP ARM >>> [8.673949] ---[ end trace 049df4786ea3140a ]--- >>> [8.678754] Modules linked in: >>> [8.678766] CPU: 1 PID: 206 Comm: systemd-udevd Tainted: G D >>> 4.18.0-0.rc1.git0.1.fc29.armv7hl+lpae #1 >>> [8.678769] Hardware name: Allwinner sun8i Family >>> [8.678781] PC is at sk_filter_trim_cap () >>> [8.678790] LR is at (null) >>> [8.709463] pc : lr : psr: 6013 () >>> [8.715722] sp : c996bd60 ip : fp : >>> [8.720939] r10: ee79dc00 r9 : c12c9f80 r8 : >>> [8.726157] r7 : r6 : 0001 r5 : f1648000 r4 : >>> >>> [8.732674] r3 : 0007 r2 : r1 : r0 : >>> >>> [8.739193] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM >>> Segment user >>> [8.746318] Control: 30c5387d Table: 6e7bc880 DAC: ffe75ece >>> [8.752055] Process systemd-udevd (pid: 206, stack limit = >>> 0x(ptrval)) >>> [8.758574] Stack: (0xc996bd60 to 0xc996c000) >> >> Do you have BPF JIT enabled or disabled? Does it happen with disabled? 
>
> Enabled, I can test with it disabled. BPF config bits are:
> CONFIG_BPF_EVENTS=y
> # CONFIG_BPFILTER is not set
> CONFIG_BPF_JIT_ALWAYS_ON=y
> CONFIG_BPF_JIT=y
> CONFIG_BPF_STREAM_PARSER=y
> CONFIG_BPF_SYSCALL=y
> CONFIG_BPF=y
> CONFIG_CGROUP_BPF=y
> CONFIG_HAVE_EBPF_JIT=y
> CONFIG_IPV6_SEG6_BPF=y
> CONFIG_LWTUNNEL_BPF=y
> # CONFIG_NBPFAXI_DMA is not set
> CONFIG_NET_ACT_BPF=m
> CONFIG_NET_CLS_BPF=m
> CONFIG_NETFILTER_XT_MATCH_BPF=m
> # CONFIG_TEST_BPF is not set
>
>> I can see one bug, but your stack trace seems unrelated.
>>
>> Anyway, could you try with this?
>
> Build in progress.
>
>> diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
>> index 6e8b716..f6a62ae 100644
>> --- a/arch/arm/net/bpf_jit_32.c
>> +++ b/arch/arm/net/bpf_jit_32.c
>> @@ -1844,7 +1844,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
>>  	/* there are 2 passes here */
>>  	bpf_jit_dump(prog->len, image_size, 2, ctx.target);
>>
>> -	set_memory_ro((unsigned long)header, header->pages);
>> +	bpf_jit_binary_lock_ro(header);
>>  	prog->bpf_func = (void *)ctx.target;
>>  	prog->jited = 1;
>>  	prog->jited_len = image_size;
So with that and the other fix there was no improvement; with those
and the BPF JIT disabled, it works. I'm not sure if the two patches
have any effect with the JIT disabled, though.

Will look at the other patches shortly; there's been some other issue
introduced between rc1 and rc2 which I have to work out before I can
test those, though.
>>>
>>> Quick update: with Linus's head as of yesterday, basically rc2 plus
>>> davem's network fixes, it works if the JIT is disabled, i.e.:
>>> # CONFIG_BPF_JIT_ALWAYS_ON is not set
>>> # CONFIG_BPF_JIT is not set
>>>
>>> If I enable it, the boot breaks even worse than the errors above in
>>> that I get no console output at all, even with earlycon, so we've gone
>>> backwards since rc1 somehow.
>>>
>>> I'll try with the above two reverted unless you have any other suggestions.
>>
>> Ok, thanks, let's do that!
>>
>> I'm still working on fixes meanwhile, should have something by end of day.
>
> Sorry for the delay on this from my end. I noticed there were some bpf
> bits that landed in the last net fixes pull request on Monday, so I built
> a kernel with the JIT re-enabled. It seems it's improved in that the
> completely dead, no-output boot has gone, but the original problem that
> arrived in the merge window still persists:

Okay, thanks a lot! And on top of that tree, could you try with the below
applied to check whether it fixes the issue?

diff --git
Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1
On Tue, Jun 26, 2018 at 1:52 PM, Daniel Borkmann wrote:
> On 06/26/2018 02:23 PM, Peter Robinson wrote:
> On 06/24/2018 11:24 AM, Peter Robinson wrote:
I'm seeing this netlink/sk_filter_trim_cap crash on ARMv7 across quite
a few ARMv7 platforms on Fedora with 4.18rc1. I've tested RPi2/RPi3
(doesn't happen on aarch64), AllWinner H3, BeagleBone and a few
others, both LPAE/normal kernels.
>
> So this is arm32, right?

Correct.

I'm a bit out of my depth in this part of the kernel, but I'm wondering
if it's known; I couldn't find anything that looked obvious on a few
mailing lists.

Peter
>>>
>>> Hi Peter
>>>
>>> Could you provide symbolic information?
>>
>> I passed it through scripts/decode_stacktrace.sh; is that what you were
>> after?
>>
>> [8.673880] Internal error: Oops: a06 [#10] SMP ARM
>> [8.673949] ---[ end trace 049df4786ea3140a ]---
>> [8.678754] Modules linked in:
>> [8.678766] CPU: 1 PID: 206 Comm: systemd-udevd Tainted: G D 4.18.0-0.rc1.git0.1.fc29.armv7hl+lpae #1
>> [8.678769] Hardware name: Allwinner sun8i Family
>> [8.678781] PC is at sk_filter_trim_cap ()
>> [8.678790] LR is at (null)
>> [8.709463] pc : lr : psr: 6013 ()
>> [8.715722] sp : c996bd60 ip : fp :
>> [8.720939] r10: ee79dc00 r9 : c12c9f80 r8 :
>> [8.726157] r7 : r6 : 0001 r5 : f1648000 r4 :
>> [8.732674] r3 : 0007 r2 : r1 : r0 :
>> [8.739193] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
>> [8.746318] Control: 30c5387d Table: 6e7bc880 DAC: ffe75ece
>> [8.752055] Process systemd-udevd (pid: 206, stack limit = 0x(ptrval))
>> [8.758574] Stack: (0xc996bd60 to 0xc996c000)
>
> Do you have BPF JIT enabled or disabled? Does it happen with disabled?
Enabled, I can test with it disabled. BPF config bits are:
CONFIG_BPF_EVENTS=y
# CONFIG_BPFILTER is not set
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_BPF_JIT=y
CONFIG_BPF_STREAM_PARSER=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF=y
CONFIG_CGROUP_BPF=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_IPV6_SEG6_BPF=y
CONFIG_LWTUNNEL_BPF=y
# CONFIG_NBPFAXI_DMA is not set
CONFIG_NET_ACT_BPF=m
CONFIG_NET_CLS_BPF=m
CONFIG_NETFILTER_XT_MATCH_BPF=m
# CONFIG_TEST_BPF is not set

> I can see one bug, but your stack trace seems unrelated.
>
> Anyway, could you try with this?

Build in progress.

> diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
> index 6e8b716..f6a62ae 100644
> --- a/arch/arm/net/bpf_jit_32.c
> +++ b/arch/arm/net/bpf_jit_32.c
> @@ -1844,7 +1844,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
>  	/* there are 2 passes here */
>  	bpf_jit_dump(prog->len, image_size, 2, ctx.target);
>
> -	set_memory_ro((unsigned long)header, header->pages);
> +	bpf_jit_binary_lock_ro(header);
>  	prog->bpf_func = (void *)ctx.target;
>  	prog->jited = 1;
>  	prog->jited_len = image_size;
>>>
>>> So with that and the other fix there was no improvement; with those
>>> and the BPF JIT disabled, it works. I'm not sure if the two patches
>>> have any effect with the JIT disabled, though.
>>>
>>> Will look at the other patches shortly; there's been some other issue
>>> introduced between rc1 and rc2 which I have to work out before I can
>>> test those, though.
>>
>> Quick update: with Linus's head as of yesterday, basically rc2 plus
>> davem's network fixes, it works if the JIT is disabled, i.e.:
>> # CONFIG_BPF_JIT_ALWAYS_ON is not set
>> # CONFIG_BPF_JIT is not set
>>
>> If I enable it, the boot breaks even worse than the errors above in
>> that I get no console output at all, even with earlycon, so we've gone
>> backwards since rc1 somehow.
>>
>> I'll try with the above two reverted unless you have any other suggestions.
>
> Ok, thanks, let's do that!
>
> I'm still working on fixes meanwhile, should have something by end of day.

Sorry for the delay on this from my end. I noticed there were some bpf
bits that landed in the last net fixes pull request on Monday, so I built
a kernel with the JIT re-enabled. It seems it's improved in that the
completely dead, no-output boot has gone, but the original problem that
arrived in the merge window still persists:

[ 17.564142] note: systemd-udevd[194] exited with preempt_count 1
[ 17.592739] Unable to handle kernel NULL pointer dereference at virtual address 000c
[ 17.601002] pgd = (ptrval)
[ 17.603819] [000c] *pgd=
[ 17.607487] Internal error: Oops: 805 [#10] SMP ARM
[ 17.612396] Modules linked in:
[ 17.615484] CPU: 0
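A NULL pointer dereference reported at a small virtual address such as 000c usually means that a member at offset 0xc was accessed through a NULL base pointer, since the faulting address is simply base + offset. The C sketch below makes that arithmetic concrete with a hypothetical struct `example_ctx` (four 32-bit fields, so the fourth sits at offset 0xc); it does not describe any structure actually involved in this crash, and `access_address` is an illustrative helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration; not a structure from this crash. */
struct example_ctx {
    uint32_t len;   /* offset 0x0 */
    uint32_t flags; /* offset 0x4 */
    uint32_t proto; /* offset 0x8 */
    uint32_t mark;  /* offset 0xc */
};

/*
 * Address touched by an access to the member at member_off through the
 * pointer value base. Plain integer arithmetic: nothing is dereferenced.
 */
static uintptr_t access_address(uintptr_t base, size_t member_off)
{
    return base + (uintptr_t)member_off;
}
```

With base 0, the access to `mark` lands at 0xc, which is the shape of address seen in the oops above; matching it against real code still requires the disassembly requested earlier in the thread.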
Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1
On 06/26/2018 02:23 PM, Peter Robinson wrote:
On 06/24/2018 11:24 AM, Peter Robinson wrote:
>>> I'm seeing this netlink/sk_filter_trim_cap crash on ARMv7 across quite
>>> a few ARMv7 platforms on Fedora with 4.18rc1. I've tested RPi2/RPi3
>>> (doesn't happen on aarch64), AllWinner H3, BeagleBone and a few
>>> others, both LPAE/normal kernels.
So this is arm32, right?
>>>
>>> Correct.
>>>
>>> I'm a bit out of my depth in this part of the kernel, but I'm wondering
>>> if it's known; I couldn't find anything that looked obvious on a few
>>> mailing lists.
>>>
>>> Peter
>>
>> Hi Peter
>>
>> Could you provide symbolic information?
>
> I passed it through scripts/decode_stacktrace.sh; is that what you were
> after?
>
> [8.673880] Internal error: Oops: a06 [#10] SMP ARM
> [8.673949] ---[ end trace 049df4786ea3140a ]---
> [8.678754] Modules linked in:
> [8.678766] CPU: 1 PID: 206 Comm: systemd-udevd Tainted: G D 4.18.0-0.rc1.git0.1.fc29.armv7hl+lpae #1
> [8.678769] Hardware name: Allwinner sun8i Family
> [8.678781] PC is at sk_filter_trim_cap ()
> [8.678790] LR is at (null)
> [8.709463] pc : lr : psr: 6013 ()
> [8.715722] sp : c996bd60 ip : fp :
> [8.720939] r10: ee79dc00 r9 : c12c9f80 r8 :
> [8.726157] r7 : r6 : 0001 r5 : f1648000 r4 :
> [8.732674] r3 : 0007 r2 : r1 : r0 :
> [8.739193] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
> [8.746318] Control: 30c5387d Table: 6e7bc880 DAC: ffe75ece
> [8.752055] Process systemd-udevd (pid: 206, stack limit = 0x(ptrval))
> [8.758574] Stack: (0xc996bd60 to 0xc996c000)

Do you have BPF JIT enabled or disabled? Does it happen with disabled?
>>>
>>> Enabled, I can test with it disabled. BPF config bits are:
>>> CONFIG_BPF_EVENTS=y
>>> # CONFIG_BPFILTER is not set
>>> CONFIG_BPF_JIT_ALWAYS_ON=y
>>> CONFIG_BPF_JIT=y
>>> CONFIG_BPF_STREAM_PARSER=y
>>> CONFIG_BPF_SYSCALL=y
>>> CONFIG_BPF=y
>>> CONFIG_CGROUP_BPF=y
>>> CONFIG_HAVE_EBPF_JIT=y
>>> CONFIG_IPV6_SEG6_BPF=y
>>> CONFIG_LWTUNNEL_BPF=y
>>> # CONFIG_NBPFAXI_DMA is not set
>>> CONFIG_NET_ACT_BPF=m
>>> CONFIG_NET_CLS_BPF=m
>>> CONFIG_NETFILTER_XT_MATCH_BPF=m
>>> # CONFIG_TEST_BPF is not set
I can see one bug, but your stack trace seems unrelated.

Anyway, could you try with this?
>>>
>>> Build in progress.
>>>
diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index 6e8b716..f6a62ae 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -1844,7 +1844,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	/* there are 2 passes here */
 	bpf_jit_dump(prog->len, image_size, 2, ctx.target);

-	set_memory_ro((unsigned long)header, header->pages);
+	bpf_jit_binary_lock_ro(header);
 	prog->bpf_func = (void *)ctx.target;
 	prog->jited = 1;
 	prog->jited_len = image_size;
>>
>> So with that and the other fix there was no improvement; with those
>> and the BPF JIT disabled, it works. I'm not sure if the two patches
>> have any effect with the JIT disabled, though.
>>
>> Will look at the other patches shortly; there's been some other issue
>> introduced between rc1 and rc2 which I have to work out before I can
>> test those, though.
>
> Quick update: with Linus's head as of yesterday, basically rc2 plus
> davem's network fixes, it works if the JIT is disabled, i.e.:
> # CONFIG_BPF_JIT_ALWAYS_ON is not set
> # CONFIG_BPF_JIT is not set
>
> If I enable it, the boot breaks even worse than the errors above in
> that I get no console output at all, even with earlycon, so we've gone
> backwards since rc1 somehow.
>
> I'll try with the above two reverted unless you have any other suggestions.

Ok, thanks, let's do that!
I'm still working on fixes meanwhile, should have something by end of day.

Thanks,
Daniel
Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1
Hi Daniel,

>>> On 06/24/2018 11:24 AM, Peter Robinson wrote:
>> I'm seeing this netlink/sk_filter_trim_cap crash on ARMv7 across quite
>> a few ARMv7 platforms on Fedora with 4.18rc1. I've tested RPi2/RPi3
>> (doesn't happen on aarch64), AllWinner H3, BeagleBone and a few
>> others, both LPAE/normal kernels.
>>>
>>> So this is arm32, right?
>>
>> Correct.
>>
>> I'm a bit out of my depth in this part of the kernel, but I'm wondering
>> if it's known; I couldn't find anything that looked obvious on a few
>> mailing lists.
>>
>> Peter
>
> Hi Peter
>
> Could you provide symbolic information?

I passed it through scripts/decode_stacktrace.sh; is that what you were
after?

[8.673880] Internal error: Oops: a06 [#10] SMP ARM
[8.673949] ---[ end trace 049df4786ea3140a ]---
[8.678754] Modules linked in:
[8.678766] CPU: 1 PID: 206 Comm: systemd-udevd Tainted: G D 4.18.0-0.rc1.git0.1.fc29.armv7hl+lpae #1
[8.678769] Hardware name: Allwinner sun8i Family
[8.678781] PC is at sk_filter_trim_cap ()
[8.678790] LR is at (null)
[8.709463] pc : lr : psr: 6013 ()
[8.715722] sp : c996bd60 ip : fp :
[8.720939] r10: ee79dc00 r9 : c12c9f80 r8 :
[8.726157] r7 : r6 : 0001 r5 : f1648000 r4 :
[8.732674] r3 : 0007 r2 : r1 : r0 :
[8.739193] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[8.746318] Control: 30c5387d Table: 6e7bc880 DAC: ffe75ece
[8.752055] Process systemd-udevd (pid: 206, stack limit = 0x(ptrval))
[8.758574] Stack: (0xc996bd60 to 0xc996c000)
>>>
>>> Do you have BPF JIT enabled or disabled? Does it happen with disabled?
>>
>> Enabled, I can test with it disabled. BPF config bits are:
>> CONFIG_BPF_EVENTS=y
>> # CONFIG_BPFILTER is not set
>> CONFIG_BPF_JIT_ALWAYS_ON=y
>> CONFIG_BPF_JIT=y
>> CONFIG_BPF_STREAM_PARSER=y
>> CONFIG_BPF_SYSCALL=y
>> CONFIG_BPF=y
>> CONFIG_CGROUP_BPF=y
>> CONFIG_HAVE_EBPF_JIT=y
>> CONFIG_IPV6_SEG6_BPF=y
>> CONFIG_LWTUNNEL_BPF=y
>> # CONFIG_NBPFAXI_DMA is not set
>> CONFIG_NET_ACT_BPF=m
>> CONFIG_NET_CLS_BPF=m
>> CONFIG_NETFILTER_XT_MATCH_BPF=m
>> # CONFIG_TEST_BPF is not set
>>
>>> I can see one bug, but your stack trace seems unrelated.
>>>
>>> Anyway, could you try with this?
>>
>> Build in progress.
>>
>>> diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
>>> index 6e8b716..f6a62ae 100644
>>> --- a/arch/arm/net/bpf_jit_32.c
>>> +++ b/arch/arm/net/bpf_jit_32.c
>>> @@ -1844,7 +1844,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
>>>  	/* there are 2 passes here */
>>>  	bpf_jit_dump(prog->len, image_size, 2, ctx.target);
>>>
>>> -	set_memory_ro((unsigned long)header, header->pages);
>>> +	bpf_jit_binary_lock_ro(header);
>>>  	prog->bpf_func = (void *)ctx.target;
>>>  	prog->jited = 1;
>>>  	prog->jited_len = image_size;
>
> So with that and the other fix there was no improvement; with those
> and the BPF JIT disabled, it works. I'm not sure if the two patches
> have any effect with the JIT disabled, though.
>
> Will look at the other patches shortly; there's been some other issue
> introduced between rc1 and rc2 which I have to work out before I can
> test those, though.

Quick update: with Linus's head as of yesterday, basically rc2 plus
davem's network fixes, it works if the JIT is disabled, i.e.:
# CONFIG_BPF_JIT_ALWAYS_ON is not set
# CONFIG_BPF_JIT is not set

If I enable it, the boot breaks even worse than the errors above in
that I get no console output at all, even with earlycon, so we've gone
backwards since rc1 somehow.

I'll try with the above two reverted unless you have any other suggestions.

Peter
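Besides reading the .config, the JIT state of a running kernel can be checked through the net.core.bpf_jit_enable sysctl: 0 selects the interpreter, 1 enables the JIT, and 2 enables it while also dumping the generated image to the kernel log. As far as I know, the file only exists when CONFIG_BPF_JIT is set (so it should be absent with the working config above), and with CONFIG_BPF_JIT_ALWAYS_ON it is pinned to 1. A minimal C sketch that reads and validates it; `parse_jit_enable` and `read_jit_enable` are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Path of the sysctl; assumed present only when CONFIG_BPF_JIT is set. */
#define JIT_SYSCTL "/proc/sys/net/core/bpf_jit_enable"

/*
 * Validate the sysctl text: 0 = interpreter, 1 = JIT,
 * 2 = JIT plus image dumps to the kernel log.
 * Returns the value, or -1 for anything else.
 */
static int parse_jit_enable(const char *text)
{
    char *end;
    long v = strtol(text, &end, 10);

    if (end == text)
        return -1;              /* no digits at all */
    while (*end == '\n' || *end == ' ')
        end++;                  /* tolerate the trailing newline */
    if (*end != '\0' || v < 0 || v > 2)
        return -1;
    return (int)v;
}

/* Returns 0, 1 or 2, or -1 if the sysctl is missing or unreadable. */
static int read_jit_enable(void)
{
    char buf[16];
    FILE *f = fopen(JIT_SYSCTL, "r");
    int ret = -1;

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) != NULL)
        ret = parse_jit_enable(buf);
    fclose(f);
    return ret;
}
```

Returning -1 for a missing file keeps "JIT not built in" distinct from "JIT built in but switched off" (0).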
Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1
On Mon, Jun 25, 2018 at 2:39 PM, Peter Robinson wrote:
> Hi Daniel,
>
>> On 06/24/2018 11:24 AM, Peter Robinson wrote:
> I'm seeing this netlink/sk_filter_trim_cap crash on ARMv7 across quite
> a few ARMv7 platforms on Fedora with 4.18rc1. I've tested RPi2/RPi3
> (doesn't happen on aarch64), AllWinner H3, BeagleBone and a few
> others, both LPAE/normal kernels.
>>
>> So this is arm32, right?
>
> Correct.
>
> I'm a bit out of my depth in this part of the kernel, but I'm wondering
> if it's known; I couldn't find anything that looked obvious on a few
> mailing lists.
>
> Peter
Hi Peter

Could you provide symbolic information?
>>>
>>> I passed it through scripts/decode_stacktrace.sh; is that what you were
>>> after?
>>>
>>> [8.673880] Internal error: Oops: a06 [#10] SMP ARM
>>> [8.673949] ---[ end trace 049df4786ea3140a ]---
>>> [8.678754] Modules linked in:
>>> [8.678766] CPU: 1 PID: 206 Comm: systemd-udevd Tainted: G D 4.18.0-0.rc1.git0.1.fc29.armv7hl+lpae #1
>>> [8.678769] Hardware name: Allwinner sun8i Family
>>> [8.678781] PC is at sk_filter_trim_cap ()
>>> [8.678790] LR is at (null)
>>> [8.709463] pc : lr : psr: 6013 ()
>>> [8.715722] sp : c996bd60 ip : fp :
>>> [8.720939] r10: ee79dc00 r9 : c12c9f80 r8 :
>>> [8.726157] r7 : r6 : 0001 r5 : f1648000 r4 :
>>> [8.732674] r3 : 0007 r2 : r1 : r0 :
>>> [8.739193] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
>>> [8.746318] Control: 30c5387d Table: 6e7bc880 DAC: ffe75ece
>>> [8.752055] Process systemd-udevd (pid: 206, stack limit = 0x(ptrval))
>>> [8.758574] Stack: (0xc996bd60 to 0xc996c000)
>>
>> Do you have BPF JIT enabled or disabled? Does it happen with disabled?
>
> Enabled, I can test with it disabled. BPF config bits are:
> CONFIG_BPF_EVENTS=y
> # CONFIG_BPFILTER is not set
> CONFIG_BPF_JIT_ALWAYS_ON=y
> CONFIG_BPF_JIT=y
> CONFIG_BPF_STREAM_PARSER=y
> CONFIG_BPF_SYSCALL=y
> CONFIG_BPF=y
> CONFIG_CGROUP_BPF=y
> CONFIG_HAVE_EBPF_JIT=y
> CONFIG_IPV6_SEG6_BPF=y
> CONFIG_LWTUNNEL_BPF=y
> # CONFIG_NBPFAXI_DMA is not set
> CONFIG_NET_ACT_BPF=m
> CONFIG_NET_CLS_BPF=m
> CONFIG_NETFILTER_XT_MATCH_BPF=m
> # CONFIG_TEST_BPF is not set
>
>> I can see one bug, but your stack trace seems unrelated.
>>
>> Anyway, could you try with this?
>
> Build in progress.
>
>> diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
>> index 6e8b716..f6a62ae 100644
>> --- a/arch/arm/net/bpf_jit_32.c
>> +++ b/arch/arm/net/bpf_jit_32.c
>> @@ -1844,7 +1844,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
>>  	/* there are 2 passes here */
>>  	bpf_jit_dump(prog->len, image_size, 2, ctx.target);
>>
>> -	set_memory_ro((unsigned long)header, header->pages);
>> +	bpf_jit_binary_lock_ro(header);
>>  	prog->bpf_func = (void *)ctx.target;
>>  	prog->jited = 1;
>>  	prog->jited_len = image_size;

So with that and the other fix there was no improvement; with those
and the BPF JIT disabled, it works. I'm not sure if the two patches
have any effect with the JIT disabled, though.

Will look at the other patches shortly; there's been some other issue
introduced between rc1 and rc2 which I have to work out before I can
test those, though.

Peter
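The patch under test replaces a direct `set_memory_ro()` call on the JIT image with the `bpf_jit_binary_lock_ro()` helper. The general pattern is: write the generated code into a writable, page-granular buffer, then lock the whole buffer read-only before publishing `prog->bpf_func`. The sketch below is only a userspace analogue of that pattern using `mmap()` and `mprotect()`; `emit_and_lock_ro` is a hypothetical name, and the sketch does not reproduce what the kernel helper does internally (for instance, it never makes the buffer executable).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Userspace analogue of "emit, then lock read-only" (illustrative only):
 * copy len bytes of generated code into a fresh page-aligned mapping,
 * then drop write permission on every page of it. Returns the locked
 * buffer and stores its mapped size in *mapped_len, or returns NULL.
 */
static void *emit_and_lock_ro(const void *code, size_t len, size_t *mapped_len)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t total;
    void *image;

    if (page <= 0 || len == 0)
        return NULL;
    /* Round up to whole pages; protections apply per page. */
    total = (len + (size_t)page - 1) / (size_t)page * (size_t)page;
    image = mmap(NULL, total, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (image == MAP_FAILED)
        return NULL;
    memcpy(image, code, len);                 /* "emit" the image */
    if (mprotect(image, total, PROT_READ) != 0) {
        munmap(image, total);                 /* never hand out a writable image */
        return NULL;
    }
    *mapped_len = total;
    return image;
}
```

A store into the returned buffer after the `mprotect()` call would raise SIGSEGV, which is the userspace counterpart of the kernel faulting on a write to a locked JIT image.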