Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-08-17 Thread Daniel Borkmann
On 08/17/2018 11:13 PM, Peter Robinson wrote:
> On Fri, Aug 17, 2018 at 7:30 PM, Daniel Borkmann  wrote:
>> On 08/17/2018 06:17 PM, Russell King - ARM Linux wrote:
>>> On Fri, Aug 17, 2018 at 02:40:19PM +0200, Daniel Borkmann wrote:
>>>> I'd have one potential bug suspicion, for the 4.18 one you were trying,
>>>> could you run with the below patch to see whether it would help?
>>>
>>> I think this is almost certainly the problem - looking at the history,
>>> it seems that the "-4" was assumed to be part of the scratch stuff in
>>> commit 38ca93060163 ("bpf, arm32: save 4 bytes of unneeded stack space")
>>> but it isn't - it's because "off" of zero refers to the top word in the
>>> stack (iow at STACK_SIZE-4).
>>
>> Yeah agree, my thinking as well (albeit bit late, sigh, sorry about that).
>> Waiting for Peter to get back with results for definite confirmation. Your
>> rework in 1c35ba122d4a ("ARM: net: bpf: use negative numbers for stacked
>> registers") and 96cced4e774a ("ARM: net: bpf: access eBPF scratch space using
>> ARM FP register") fixes this in mainline, so unless I'm missing something this
>> would only need a stand-alone fix for 4.18/stable which I can cook up and
>> submit then.
> 
> I can confirm that fixes the problems I was seeing on Fedora 29.
> 
> Feel free to add a tested by from me:
> 
> Tested-by: Peter Robinson 

Great, thanks everyone! Will get it out asap.


Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-08-17 Thread Peter Robinson
Hi Stefan,

>> On 08/17/2018 06:17 PM, Russell King - ARM Linux wrote:
>> > On Fri, Aug 17, 2018 at 02:40:19PM +0200, Daniel Borkmann wrote:
>> >> I'd have one potential bug suspicion, for the 4.18 one you were trying,
>> >> could you run with the below patch to see whether it would help?
>> >
>> > I think this is almost certainly the problem - looking at the history,
>> > it seems that the "-4" was assumed to be part of the scratch stuff in
>> > commit 38ca93060163 ("bpf, arm32: save 4 bytes of unneeded stack space")
>> > but it isn't - it's because "off" of zero refers to the top word in the
>> > stack (iow at STACK_SIZE-4).
>>
>> Yeah agree, my thinking as well (albeit bit late, sigh, sorry about that).
>> Waiting for Peter to get back with results for definite confirmation. Your
>> rework in 1c35ba122d4a ("ARM: net: bpf: use negative numbers for stacked
>> registers") and 96cced4e774a ("ARM: net: bpf: access eBPF scratch space using
>> ARM FP register") fixes this in mainline, so unless I'm missing something this
>> would only need a stand-alone fix for 4.18/stable which I can cook up and
>> submit then.
>
> I was able to reproduce this issue on RPi 3 with Linux 4.18.1 +
> multi_v7_defconfig and the following config changes:
>
>  --- a/arch/arm/configs/multi_v7_defconfig
> +++ b/arch/arm/configs/multi_v7_defconfig
> @@ -2,7 +2,10 @@ CONFIG_SYSVIPC=y
>  CONFIG_NO_HZ=y
>  CONFIG_HIGH_RES_TIMERS=y
>  CONFIG_CGROUPS=y
> +CONFIG_CGROUP_BPF=y
>  CONFIG_BLK_DEV_INITRD=y
> +CONFIG_BPF_SYSCALL=y
> +CONFIG_BPF_JIT_ALWAYS_ON=y
>  CONFIG_EMBEDDED=y
>  CONFIG_PERF_EVENTS=y
>  CONFIG_MODULES=y
> @@ -153,6 +156,8 @@ CONFIG_IPV6_MIP6=m
>  CONFIG_IPV6_TUNNEL=m
>  CONFIG_IPV6_MULTIPLE_TABLES=y
>  CONFIG_NET_DSA=m
> +CONFIG_BPF_JIT=y
> +CONFIG_BPF_STREAM_PARSER=y
>  CONFIG_CAN=y
>  CONFIG_CAN_AT91=m
>  CONFIG_CAN_FLEXCAN=m
>
> After applying the "-4" patch the oopses don't appear during boot anymore.

Would be fab to get that into the kernel so this is widely tested
moving forward.

Peter


Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-08-17 Thread Peter Robinson
On Fri, Aug 17, 2018 at 7:30 PM, Daniel Borkmann  wrote:
> On 08/17/2018 06:17 PM, Russell King - ARM Linux wrote:
>> On Fri, Aug 17, 2018 at 02:40:19PM +0200, Daniel Borkmann wrote:
>>> I'd have one potential bug suspicion, for the 4.18 one you were trying,
>>> could you run with the below patch to see whether it would help?
>>
>> I think this is almost certainly the problem - looking at the history,
>> it seems that the "-4" was assumed to be part of the scratch stuff in
>> commit 38ca93060163 ("bpf, arm32: save 4 bytes of unneeded stack space")
>> but it isn't - it's because "off" of zero refers to the top word in the
>> stack (iow at STACK_SIZE-4).
>
> Yeah agree, my thinking as well (albeit bit late, sigh, sorry about that).
> Waiting for Peter to get back with results for definite confirmation. Your
> rework in 1c35ba122d4a ("ARM: net: bpf: use negative numbers for stacked
> registers") and 96cced4e774a ("ARM: net: bpf: access eBPF scratch space using
> ARM FP register") fixes this in mainline, so unless I'm missing something this
> would only need a stand-alone fix for 4.18/stable which I can cook up and
> submit then.

I can confirm that fixes the problems I was seeing on Fedora 29.

Feel free to add a tested by from me:

Tested-by: Peter Robinson 


Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-08-17 Thread Peter Robinson
On Fri, Aug 17, 2018 at 5:17 PM, Russell King - ARM Linux
 wrote:
> On Fri, Aug 17, 2018 at 02:40:19PM +0200, Daniel Borkmann wrote:
>> I'd have one potential bug suspicion, for the 4.18 one you were trying,
>> could you run with the below patch to see whether it would help?
>
> I think this is almost certainly the problem - looking at the history,
> it seems that the "-4" was assumed to be part of the scratch stuff in
> commit 38ca93060163 ("bpf, arm32: save 4 bytes of unneeded stack space")
> but it isn't - it's because "off" of zero refers to the top word in the
> stack (iow at STACK_SIZE-4).

I can confirm that patch fixes the problem I was seeing.

Peter


Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-08-17 Thread Stefan Wahren
Hi Daniel,

> Daniel Borkmann wrote on 17 August 2018 at 20:30:
> 
> 
> On 08/17/2018 06:17 PM, Russell King - ARM Linux wrote:
> > On Fri, Aug 17, 2018 at 02:40:19PM +0200, Daniel Borkmann wrote:
> >> I'd have one potential bug suspicion, for the 4.18 one you were trying,
> >> could you run with the below patch to see whether it would help?
> > 
> > I think this is almost certainly the problem - looking at the history,
> > it seems that the "-4" was assumed to be part of the scratch stuff in
> > commit 38ca93060163 ("bpf, arm32: save 4 bytes of unneeded stack space")
> > but it isn't - it's because "off" of zero refers to the top word in the
> > stack (iow at STACK_SIZE-4).
> 
> Yeah agree, my thinking as well (albeit bit late, sigh, sorry about that).
> Waiting for Peter to get back with results for definite confirmation. Your
> rework in 1c35ba122d4a ("ARM: net: bpf: use negative numbers for stacked
> registers") and 96cced4e774a ("ARM: net: bpf: access eBPF scratch space using
> ARM FP register") fixes this in mainline, so unless I'm missing something this
> would only need a stand-alone fix for 4.18/stable which I can cook up and
> submit then.

I was able to reproduce this issue on RPi 3 with Linux 4.18.1 +
multi_v7_defconfig and the following config changes:

 --- a/arch/arm/configs/multi_v7_defconfig
+++ b/arch/arm/configs/multi_v7_defconfig
@@ -2,7 +2,10 @@ CONFIG_SYSVIPC=y
 CONFIG_NO_HZ=y
 CONFIG_HIGH_RES_TIMERS=y
 CONFIG_CGROUPS=y
+CONFIG_CGROUP_BPF=y
 CONFIG_BLK_DEV_INITRD=y
+CONFIG_BPF_SYSCALL=y
+CONFIG_BPF_JIT_ALWAYS_ON=y
 CONFIG_EMBEDDED=y
 CONFIG_PERF_EVENTS=y
 CONFIG_MODULES=y
@@ -153,6 +156,8 @@ CONFIG_IPV6_MIP6=m
 CONFIG_IPV6_TUNNEL=m
 CONFIG_IPV6_MULTIPLE_TABLES=y
 CONFIG_NET_DSA=m
+CONFIG_BPF_JIT=y
+CONFIG_BPF_STREAM_PARSER=y
 CONFIG_CAN=y
 CONFIG_CAN_AT91=m
 CONFIG_CAN_FLEXCAN=m

After applying the "-4" patch the oopses don't appear during boot anymore.

Stefan

> 
> Thanks,
> Daniel
> 


Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-08-17 Thread Daniel Borkmann
On 08/17/2018 06:17 PM, Russell King - ARM Linux wrote:
> On Fri, Aug 17, 2018 at 02:40:19PM +0200, Daniel Borkmann wrote:
>> I'd have one potential bug suspicion, for the 4.18 one you were trying,
>> could you run with the below patch to see whether it would help?
> 
> I think this is almost certainly the problem - looking at the history,
> it seems that the "-4" was assumed to be part of the scratch stuff in
> commit 38ca93060163 ("bpf, arm32: save 4 bytes of unneeded stack space")
> but it isn't - it's because "off" of zero refers to the top word in the
> stack (iow at STACK_SIZE-4).

Yeah agree, my thinking as well (albeit bit late, sigh, sorry about that).
Waiting for Peter to get back with results for definite confirmation. Your
rework in 1c35ba122d4a ("ARM: net: bpf: use negative numbers for stacked
registers") and 96cced4e774a ("ARM: net: bpf: access eBPF scratch space using
ARM FP register") fixes this in mainline, so unless I'm missing something this
would only need a stand-alone fix for 4.18/stable which I can cook up and
submit then.

Thanks,
Daniel


Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-08-17 Thread Russell King - ARM Linux
On Fri, Aug 17, 2018 at 02:40:19PM +0200, Daniel Borkmann wrote:
> I'd have one potential bug suspicion, for the 4.18 one you were trying,
> could you run with the below patch to see whether it would help?

I think this is almost certainly the problem - looking at the history,
it seems that the "-4" was assumed to be part of the scratch stuff in
commit 38ca93060163 ("bpf, arm32: save 4 bytes of unneeded stack space")
but it isn't - it's because "off" of zero refers to the top word in the
stack (iow at STACK_SIZE-4).
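
To make the arithmetic above concrete, here is a minimal sketch of the two macro variants. It is not taken from the kernel source: DEMO_STACK_SIZE, STACK_VAR_BROKEN, STACK_VAR_FIXED and slot_in_bounds are names made up for illustration, and the 64-byte size is an arbitrary assumption (the real STACK_SIZE is derived from _STACK_SIZE and STACK_ALIGNMENT in arch/arm/net/bpf_jit_32.c).

```c
#include <assert.h>

/* Assumed scratch size, for illustration only. */
#define DEMO_STACK_SIZE 64

/* Shape of the 4.18 macro without the "-4": off == 0 gives
 * DEMO_STACK_SIZE, one word past the top of the scratch area. */
#define STACK_VAR_BROKEN(off) (DEMO_STACK_SIZE - (off))

/* Shape of the macro with the "-4" restored: off == 0 gives
 * DEMO_STACK_SIZE - 4, the top word inside the scratch area. */
#define STACK_VAR_FIXED(off) (DEMO_STACK_SIZE - (off) - 4)

/* Returns 1 if a 4-byte slot starting at byte offset `slot` lies
 * entirely inside [0, DEMO_STACK_SIZE), 0 otherwise. */
static int slot_in_bounds(int slot)
{
	return slot >= 0 && slot + 4 <= DEMO_STACK_SIZE;
}

/* Self-check: only the variant with "-4" keeps off == 0 in bounds. */
static void check_stack_var(void)
{
	assert(!slot_in_bounds(STACK_VAR_BROKEN(0)));
	assert(slot_in_bounds(STACK_VAR_FIXED(0)));
}
```

Under these assumptions, a register stored at "off" 0 with the shorter macro would land outside the reserved area, which would be consistent with the corruption seen in the oopses.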

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up


Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-08-17 Thread Peter Robinson
On Fri, Aug 17, 2018 at 1:40 PM, Daniel Borkmann  wrote:
> On 08/17/2018 02:25 PM, Peter Robinson wrote:
>> On Thu, Aug 16, 2018 at 11:58 PM, Russell King - ARM Linux
>>  wrote:
>>> On Thu, Aug 16, 2018 at 10:35:16PM +0200, Marc Haber wrote:
>>>> On Mon, Jun 25, 2018 at 05:41:27PM +0100, Peter Robinson wrote:
>>>>> So with that and the other fix there was no improvement, with those
>>>>> and the BPF JIT disabled it works, I'm not sure if the two patches
>>>>> have any effect with the JIT disabled though.
>>>>
>>>> I can confirm the crash with the released 4.18.1 on Banana Pi, and I can
>>>> also confirm that disabling BPF JIT makes the Banana Pi work again.
>>>
>>> I'm afraid that the information in the crash dumps is insufficient
>>> to be able to work very much out about these crashes.
>>>
>>> We need a recipe (kernel configuration and what userspace is doing)
>>> so that it's possible to recreate the crash, or we need responses
>>> to requests for information - I requested the disassembly of
>>> sk_filter_trim_cap and the BPF code dump via setting a sysctl back
>>> in early July.  Without this, as I say, I don't see how this problem
>>> can be progressed.
>>
>> I can provide a kernel config [1] but I've not had enough time to sit
>> down and get the rest of the stuff and debug it due to a combination
>> of travel and other priorities.
>
> Did you get a chance to try latest kernel from Linus' tree [1] from last
> few days to see whether the issue is still persistent? There have been
> a number of improvements, bit strange why e.g. Russell didn't run into
> it while others have, hmm. Perhaps due to EABI vs non EABI.

I haven't had a chance to try anything from the 4.19 merge window as
yet, I'm traveling this week so it was on the list for next week to
try.

> [1] git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>
>>> If the problem is at boot, one way to set the sysctl would be to
>>> hack the kernel and explicitly initialise the sysctl to '2', or
>>> boot with init=/bin/sh, then manually mount /proc, set the sysctl,
>>> and then "exec /sbin/init" from that shell.  (Remember there's no
>>> job control in that shell, so ^z, ^c, etc do not work.)
>>
>> It starts to happen in the early kernel boot long before we get to any
>> userspace across a number of ARMv7 devices (RPi2/3, BeagleBone and
>> AllWinner H3 based devices at least).
>>
>> [1] https://pbrobinson.fedorapeople.org/kernel-armv7hl.config
>
> I'd have one potential bug suspicion, for the 4.18 one you were trying,
> could you run with the below patch to see whether it would help?

I will try and get someone to test that today, thanks

> Thanks,
> Daniel
>
> diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
> index f6a62ae..c864f6b 100644
> --- a/arch/arm/net/bpf_jit_32.c
> +++ b/arch/arm/net/bpf_jit_32.c
> @@ -238,7 +238,7 @@ static void jit_fill_hole(void *area, unsigned int size)
>  #define STACK_SIZE ALIGN(_STACK_SIZE, STACK_ALIGNMENT)
>
>  /* Get the offset of eBPF REGISTERs stored on scratch space. */
> -#define STACK_VAR(off) (STACK_SIZE - off)
> +#define STACK_VAR(off) (STACK_SIZE - off - 4)
>
>  #if __LINUX_ARM_ARCH__ < 7
>


Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-08-17 Thread Daniel Borkmann
On 08/17/2018 02:25 PM, Peter Robinson wrote:
> On Thu, Aug 16, 2018 at 11:58 PM, Russell King - ARM Linux
>  wrote:
>> On Thu, Aug 16, 2018 at 10:35:16PM +0200, Marc Haber wrote:
>>> On Mon, Jun 25, 2018 at 05:41:27PM +0100, Peter Robinson wrote:
>>>> So with that and the other fix there was no improvement, with those
>>>> and the BPF JIT disabled it works, I'm not sure if the two patches
>>>> have any effect with the JIT disabled though.
>>>
>>> I can confirm the crash with the released 4.18.1 on Banana Pi, and I can
>>> also confirm that disabling BPF JIT makes the Banana Pi work again.
>>
>> I'm afraid that the information in the crash dumps is insufficient
>> to be able to work very much out about these crashes.
>>
>> We need a recipe (kernel configuration and what userspace is doing)
>> so that it's possible to recreate the crash, or we need responses
>> to requests for information - I requested the disassembly of
>> sk_filter_trim_cap and the BPF code dump via setting a sysctl back
>> in early July.  Without this, as I say, I don't see how this problem
>> can be progressed.
> 
> I can provide a kernel config [1] but I've not had enough time to sit
> down and get the rest of the stuff and debug it due to a combination
> of travel and other priorities.

Did you get a chance to try latest kernel from Linus' tree [1] from last
few days to see whether the issue is still persistent? There have been
a number of improvements, bit strange why e.g. Russell didn't run into
it while others have, hmm. Perhaps due to EABI vs non EABI.

[1] git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

>> If the problem is at boot, one way to set the sysctl would be to
>> hack the kernel and explicitly initialise the sysctl to '2', or
>> boot with init=/bin/sh, then manually mount /proc, set the sysctl,
>> and then "exec /sbin/init" from that shell.  (Remember there's no
>> job control in that shell, so ^z, ^c, etc do not work.)
> 
> It starts to happen in the early kernel boot long before we get to any
> userspace across a number of ARMv7 devices (RPi2/3, BeagleBone and
> AllWinner H3 based devices at least).
> 
> [1] https://pbrobinson.fedorapeople.org/kernel-armv7hl.config

I'd have one potential bug suspicion, for the 4.18 one you were trying,
could you run with the below patch to see whether it would help?

Thanks,
Daniel

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index f6a62ae..c864f6b 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -238,7 +238,7 @@ static void jit_fill_hole(void *area, unsigned int size)
 #define STACK_SIZE ALIGN(_STACK_SIZE, STACK_ALIGNMENT)

 /* Get the offset of eBPF REGISTERs stored on scratch space. */
-#define STACK_VAR(off) (STACK_SIZE - off)
+#define STACK_VAR(off) (STACK_SIZE - off - 4)

 #if __LINUX_ARM_ARCH__ < 7



Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-08-17 Thread Peter Robinson
On Thu, Aug 16, 2018 at 11:58 PM, Russell King - ARM Linux
 wrote:
> On Thu, Aug 16, 2018 at 10:35:16PM +0200, Marc Haber wrote:
>> On Mon, Jun 25, 2018 at 05:41:27PM +0100, Peter Robinson wrote:
>> > So with that and the other fix there was no improvement, with those
>> > and the BPF JIT disabled it works, I'm not sure if the two patches
>> > have any effect with the JIT disabled though.
>>
>> I can confirm the crash with the released 4.18.1 on Banana Pi, and I can
>> also confirm that disabling BPF JIT makes the Banana Pi work again.
>
> Hi,
>
> I'm afraid that the information in the crash dumps is insufficient
> to be able to work very much out about these crashes.
>
> We need a recipe (kernel configuration and what userspace is doing)
> so that it's possible to recreate the crash, or we need responses
> to requests for information - I requested the disassembly of
> sk_filter_trim_cap and the BPF code dump via setting a sysctl back
> in early July.  Without this, as I say, I don't see how this problem
> can be progressed.

I can provide a kernel config [1] but I've not had enough time to sit
down and get the rest of the stuff and debug it due to a combination
of travel and other priorities.

> If the problem is at boot, one way to set the sysctl would be to
> hack the kernel and explicitly initialise the sysctl to '2', or
> boot with init=/bin/sh, then manually mount /proc, set the sysctl,
> and then "exec /sbin/init" from that shell.  (Remember there's no
> job control in that shell, so ^z, ^c, etc do not work.)

It starts to happen in the early kernel boot long before we get to any
userspace across a number of ARMv7 devices (RPi2/3, BeagleBone and
AllWinner H3 based devices at least).

[1] https://pbrobinson.fedorapeople.org/kernel-armv7hl.config


Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-08-16 Thread Russell King - ARM Linux
On Thu, Aug 16, 2018 at 10:35:16PM +0200, Marc Haber wrote:
> On Mon, Jun 25, 2018 at 05:41:27PM +0100, Peter Robinson wrote:
> > So with that and the other fix there was no improvement, with those
> > and the BPF JIT disabled it works, I'm not sure if the two patches
> > have any effect with the JIT disabled though.
> 
> I can confirm the crash with the released 4.18.1 on Banana Pi, and I can
> also confirm that disabling BPF JIT makes the Banana Pi work again.

Hi,

I'm afraid that the information in the crash dumps is insufficient
to be able to work very much out about these crashes.

We need a recipe (kernel configuration and what userspace is doing)
so that it's possible to recreate the crash, or we need responses
to requests for information - I requested the disassembly of
sk_filter_trim_cap and the BPF code dump via setting a sysctl back
in early July.  Without this, as I say, I don't see how this problem
can be progressed.

If the problem is at boot, one way to set the sysctl would be to
hack the kernel and explicitly initialise the sysctl to '2', or
boot with init=/bin/sh, then manually mount /proc, set the sysctl,
and then "exec /sbin/init" from that shell.  (Remember there's no
job control in that shell, so ^z, ^c, etc do not work.)
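
As a rough sketch of the "set the sysctl" step once /proc is mounted, the helper below writes a value to a sysctl file. The path is the usual location of net.core.bpf_jit_enable; write_sysctl and enable_bpf_jit_dump are hypothetical names used only for this illustration. Note that on kernels built with CONFIG_BPF_JIT_ALWAYS_ON=y the write of "2" is, as far as I know, rejected, which may be why hacking the kernel is suggested as the alternative.

```c
#include <stdio.h>

/* Usual procfs location of the net.core.bpf_jit_enable sysctl. */
#define BPF_JIT_ENABLE_PATH "/proc/sys/net/core/bpf_jit_enable"

/* Write `value` plus a newline to `path`.
 * Returns 0 on success, -1 if the file cannot be opened or written. */
static int write_sysctl(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(value, f) != EOF && fputc('\n', f) != EOF;
	/* fclose() flushes the buffer; a failed flush means the write
	 * did not go through, e.g. because the value was rejected. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Hypothetical convenience wrapper: ask for the JIT image dump (2). */
static int enable_bpf_jit_dump(void)
{
	return write_sysctl(BPF_JIT_ENABLE_PATH, "2");
}
```

Calling enable_bpf_jit_dump() as root before the filter is attached should make the JIT output appear in the kernel log, assuming the running kernel accepts the value.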



Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-08-16 Thread Marc Haber
On Mon, Jun 25, 2018 at 05:41:27PM +0100, Peter Robinson wrote:
> So with that and the other fix there was no improvement, with those
> and the BPF JIT disabled it works, I'm not sure if the two patches
> have any effect with the JIT disabled though.

I can confirm the crash with the released 4.18.1 on Banana Pi, and I can
also confirm that disabling BPF JIT makes the Banana Pi work again.

Greetings
Marc

[0.004930] /cpus/cpu@0 missing clock-frequency property
[0.004965] /cpus/cpu@1 missing clock-frequency property
[4.959858] zswap: default zpool zbud not available
[4.964820] zswap: pool creation failed
  WARNING: Failed to connect to lvmetad. Falling back to device scanning.
  WARNING: Failed to connect to lvmetad. Falling back to device scanning.
[   10.721077] Unable to handle kernel NULL pointer dereference at virtual address 000c
[   10.722949] Unable to handle kernel NULL pointer dereference at virtual address 000c
[   10.729288] pgd = (ptrval)
[   10.729299] [000c] *pgd=6dc65003, *pmd=
[   10.737464] pgd = (ptrval)
[   10.740176] Internal error: Oops: a06 [#1] SMP ARM
[   10.745056] [000c] *pgd=6e72a003
[   10.747742] Modules linked in: ip_tables x_tables autofs4 btrfs
[   10.752561] , *pmd=
[   10.756113]  libcrc32c crc32c_generic xor zstd_decompress zstd_compress xxhash
[   10.764833]  zlib_deflate raid6_pq dm_mod dax axp20x_regulator realtek ahci_sunxi dwmac_sunxi stmmac_platform libahci_platform stmmac i2c_mv64xxx libahci libata scsi_mod ohci_platform ohci_hcd ehci_platform ehci_hcd phy_sun4i_usb sunxi_mmc
[   10.793306] CPU: 1 PID: 238 Comm: systemd-udevd Not tainted 4.18.1-zgbpi-armmp-lpae #3
[   10.801212] Hardware name: Allwinner sun7i (A20) Family
[   10.806448] PC is at sk_filter_trim_cap+0xa0/0x1d4
[   10.811238] LR is at   (null)
[   10.814205] pc : []lr : [<>]psr: 600f0013
[   10.820466] sp : edc7dcf8  ip :   fp : edc7dd34
[   10.825686] r10:   r9 :   r8 : 
[   10.830907] r7 : 0001  r6 : f0e96000  r5 : c0e04cc8  r4 : 
[   10.837428] r3 : 0007  r2 : fb5e2d70  r1 :   r0 : 
[   10.843952] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[   10.851081] Control: 30c5387d  Table: 6e6c7580  DAC: 2c983336
[   10.856822] Process systemd-udevd (pid: 238, stack limit = 0x(ptrval))
[   10.863344] Stack: (0xedc7dcf8 to 0xedc7e000)
[   10.867700] dce0:   
edc7dd1c edc7dd08
[   10.875873] dd00: c06a41dc c06a4048 ee7d39c0 fb5e2d70 ee479800 ee6c2400 
edc33840 c0e6aac0
[   10.884046] dd20:  0001 edc7dd8c edc7dd38 c0705884 c06de2f4 
edc7de24 0001
[   10.892219] dd40: c0ec649c ee479864   ee7d39c0  
 0002
[   10.900391] dd60:  edc7df44 c0e04cc8 ee7d39c0 ee6c2400  
008c 0002
[   10.908565] dd80: edc7ddf4 edc7dd90 c0705ee0 c0705610 006000c0  
 fb5e2d70
[   10.916737] dda0: 0008   ef357c80  00ee 
 
[   10.924910] ddc0:  fb5e2d70 008c edc7df44 eef08700 0040 
 eef08700
[   10.933083] dde0:  edc7dedc edc7de0c edc7ddf8 c069b948 c0705b78 
edc7df44 c0e04cc8
[   10.941256] de00: edc7df2c edc7de10 c069c2f8 c069b910 c0e04cc8 edc7dec0 
 be8dcfac
[   10.949428] de20: 0028 0186a660 0064 bf387954 edc7df48 be8dcf80 
 
[   10.957602] de40: be8dcf80 b6f19ce8 0128 4028 b6e01346  
000e 0010
[   10.965774] de60:  0002     
be8dcf80 
[   10.973948] de80: b6f19ce8   fb5e2d70 edc7deb4 e000 
 c0e04cc8
[   10.982120] dea0: 0128 c0201204  0080 edc7df6c edc7dec0 
c02f5e2c c02f5c18
[   10.990293] dec0:  fb5e2d70 edc7def4 a0010013 c9f1e000 c03f986c 
edc7df50 
[   10.998466] dee0: 000e 4000 edc7df3c fb5e2d70 c0409c98 c0409d34 
edc7df14 fb5e2d70
[   11.006639] df00: c0409d34 c0e04cc8 be8dcf80  eef08700 c0201204 
edc7c000 0128
[   11.014812] df20: edc7df94 edc7df30 c069d818 c069c0a0   
c0e04cc8 
[   11.022984] df40: fff7 edc7de5c 000c 0001   
edc7de2c 
[   11.031156] df60: edc7df7c   0040  fb5e2d70 
be8dcf80 b6f19ce8
[   11.039329] df80: 01878670 0128 edc7dfa4 edc7df98 c069d870 c069d7c4 
 edc7dfa8
[   11.047502] dfa0: c02011cc c069d860 be8dcf80 b6f19ce8 000e be8dcf80 
 
[   11.055675] dfc0: be8dcf80 b6f19ce8 01878670 0128  0064 
01878e80 
[   11.063848] dfe0: 0128 be8dcf50 b6e003e3 b6e01346 200f0030 000e 
 
[   11.072038] [] (sk_filter_trim_cap) from [] (netlink_broadcast_filtered+0x280/0x460)
[   11.081517] [] (netlink_broadcast_filtered) from [] (netlink_sendmsg+0x374/0x3b0)
[   11.090734] [] (netlink_sendmsg) from 

Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-07-05 Thread Daniel Borkmann
On 07/05/2018 09:31 AM, Russell King - ARM Linux wrote:
> On Thu, Jul 05, 2018 at 12:41:54AM +0100, Russell King - ARM Linux wrote:
>> Subject says offlist, but this isn't...
>>
>> On Wed, Jul 04, 2018 at 08:33:20AM +0100, Peter Robinson wrote:
>>> Sorry for the delay on this from my end. I noticed there was some bpf
>>> bits land in the last net fixes pull request landed Monday so I built
>>> a kernel with the JIT reenabled. It seems it's improved in that the
>>> completely dead no output boot has gone but the original problem that
>>> arrived in the merge window still persists:
>>>
>>> [   17.564142] note: systemd-udevd[194] exited with preempt_count 1
>>> [   17.592739] Unable to handle kernel NULL pointer dereference at virtual address 000c
>>> [   17.601002] pgd = (ptrval)
>>> [   17.603819] [000c] *pgd=
>>> [   17.607487] Internal error: Oops: 805 [#10] SMP ARM
>>> [   17.612396] Modules linked in:
>>> [   17.615484] CPU: 0 PID: 195 Comm: systemd-udevd Tainted: G  D 4.18.0-0.rc3.git1.1.bpf1.fc29.armv7hl #1
>>> [   17.626056] Hardware name: Generic AM33XX (Flattened Device Tree)
>>> [   17.632198] PC is at sk_filter_trim_cap+0x218/0x2fc
>>> [   17.637102] LR is at   (null)
>>> [   17.640086] pc : []lr : [<>]psr: 6013
>>> [   17.646384] sp : cfe1dd48  ip :   fp : 
>>> [   17.651635] r10: d837e000  r9 : d833be00  r8 : 
>>> [   17.656887] r7 : 0001  r6 : e003d000  r5 :   r4 : 
>>> [   17.663447] r3 : 0007  r2 :   r1 :   r0 : 
>>> [   17.670009] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
>>> [   17.677180] Control: 10c5387d  Table: 8fe20019  DAC: 0051
>>> [   17.682956] Process systemd-udevd (pid: 195, stack limit = 0x(ptrval))
>>> [   17.689518] Stack: (0xcfe1dd48 to 0xcfe1e000)
>>
>> Can you provide a full disassembly of sk_filter_trim_cap from vmlinux
>> (iow, annotated with its linked address) for the above dump please -
>> alternatively a new dump with matching disassembly.  Thanks.
> 
> Also probably a good idea to have bpf_jit_enable set to 2 to get a
> dump of the bpf program being run, which I think for your problem,
> you'll have to hack the kernel source to do that.

Agree, that would be good as well. You could use something like the below
to bail out to interpreter after JIT did the dump.

Dump will then land in kernel log which you could paste here.

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index f6a62ae..d6a7dfd 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -1844,6 +1844,13 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 		/* there are 2 passes here */
 		bpf_jit_dump(prog->len, image_size, 2, ctx.target);

+	/* Defer to interpreter after dump. */
+	if (1) {
+		bpf_jit_binary_free(header);
+		prog = orig_prog;
+		goto out_imms;
+	}
+
 	bpf_jit_binary_lock_ro(header);
 	prog->bpf_func = (void *)ctx.target;
 	prog->jited = 1;


Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-07-05 Thread Russell King - ARM Linux
On Thu, Jul 05, 2018 at 12:41:54AM +0100, Russell King - ARM Linux wrote:
> Subject says offlist, but this isn't...
> 
> On Wed, Jul 04, 2018 at 08:33:20AM +0100, Peter Robinson wrote:
> > Sorry for the delay on this from my end. I noticed there was some bpf
> > bits land in the last net fixes pull request landed Monday so I built
> > a kernel with the JIT reenabled. It seems it's improved in that the
> > completely dead no output boot has gone but the original problem that
> > arrived in the merge window still persists:
> > 
> > [   17.564142] note: systemd-udevd[194] exited with preempt_count 1
> > [   17.592739] Unable to handle kernel NULL pointer dereference at virtual address 000c
> > [   17.601002] pgd = (ptrval)
> > [   17.603819] [000c] *pgd=
> > [   17.607487] Internal error: Oops: 805 [#10] SMP ARM
> > [   17.612396] Modules linked in:
> > [   17.615484] CPU: 0 PID: 195 Comm: systemd-udevd Tainted: G  D 4.18.0-0.rc3.git1.1.bpf1.fc29.armv7hl #1
> > [   17.626056] Hardware name: Generic AM33XX (Flattened Device Tree)
> > [   17.632198] PC is at sk_filter_trim_cap+0x218/0x2fc
> > [   17.637102] LR is at   (null)
> > [   17.640086] pc : []lr : [<>]psr: 6013
> > [   17.646384] sp : cfe1dd48  ip :   fp : 
> > [   17.651635] r10: d837e000  r9 : d833be00  r8 : 
> > [   17.656887] r7 : 0001  r6 : e003d000  r5 :   r4 : 
> > [   17.663447] r3 : 0007  r2 :   r1 :   r0 : 
> > [   17.670009] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
> > [   17.677180] Control: 10c5387d  Table: 8fe20019  DAC: 0051
> > [   17.682956] Process systemd-udevd (pid: 195, stack limit = 0x(ptrval))
> > [   17.689518] Stack: (0xcfe1dd48 to 0xcfe1e000)
> 
> Can you provide a full disassembly of sk_filter_trim_cap from vmlinux
> (iow, annotated with its linked address) for the above dump please -
> alternatively a new dump with matching disassembly.  Thanks.

Also probably a good idea to have bpf_jit_enable set to 2 to get a
dump of the bpf program being run, which I think for your problem,
you'll have to hack the kernel source to do that.



Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-07-04 Thread Russell King - ARM Linux
Subject says offlist, but this isn't...

On Wed, Jul 04, 2018 at 08:33:20AM +0100, Peter Robinson wrote:
> Sorry for the delay on this from my end. I noticed there was some bpf
> bits land in the last net fixes pull request landed Monday so I built
> a kernel with the JIT reenabled. It seems it's improved in that the
> completely dead no output boot has gone but the original problem that
> arrived in the merge window still persists:
> 
> [   17.564142] note: systemd-udevd[194] exited with preempt_count 1
> [   17.592739] Unable to handle kernel NULL pointer dereference at virtual address 000c
> [   17.601002] pgd = (ptrval)
> [   17.603819] [000c] *pgd=
> [   17.607487] Internal error: Oops: 805 [#10] SMP ARM
> [   17.612396] Modules linked in:
> [   17.615484] CPU: 0 PID: 195 Comm: systemd-udevd Tainted: G  D 4.18.0-0.rc3.git1.1.bpf1.fc29.armv7hl #1
> [   17.626056] Hardware name: Generic AM33XX (Flattened Device Tree)
> [   17.632198] PC is at sk_filter_trim_cap+0x218/0x2fc
> [   17.637102] LR is at   (null)
> [   17.640086] pc : []lr : [<>]psr: 6013
> [   17.646384] sp : cfe1dd48  ip :   fp : 
> [   17.651635] r10: d837e000  r9 : d833be00  r8 : 
> [   17.656887] r7 : 0001  r6 : e003d000  r5 :   r4 : 
> [   17.663447] r3 : 0007  r2 :   r1 :   r0 : 
> [   17.670009] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
> [   17.677180] Control: 10c5387d  Table: 8fe20019  DAC: 0051
> [   17.682956] Process systemd-udevd (pid: 195, stack limit = 0x(ptrval))
> [   17.689518] Stack: (0xcfe1dd48 to 0xcfe1e000)

Can you provide a full disassembly of sk_filter_trim_cap from vmlinux
(iow, annotated with its linked address) for the above dump please -
alternatively a new dump with matching disassembly.  Thanks.



Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-07-04 Thread Daniel Borkmann
On 07/04/2018 09:33 AM, Peter Robinson wrote:
> On Tue, Jun 26, 2018 at 1:52 PM, Daniel Borkmann  wrote:
>> On 06/26/2018 02:23 PM, Peter Robinson wrote:
>> On 06/24/2018 11:24 AM, Peter Robinson wrote:
> I'm seeing this netlink/sk_filter_trim_cap crash on ARMv7 across quite
> a few ARMv7 platforms on Fedora with 4.18rc1. I've tested RPi2/RPi3
> (doesn't happen on aarch64), AllWinner H3, BeagleBone and a few
> others, both LPAE/normal kernels.
>>
>> So this is arm32 right?
>
> Correct.
>
> I'm a bit out of my depth in this part of the kernel but I'm wondering
> if it's known, I couldn't find anything that looked obvious on a few
> mailing lists.
>
> Peter

 Hi Peter

 Could you provide symbolic information ?
>>>
>>> I passed in through scripts/decode_stacktrace.sh is that what you were 
>>> after:
>>>
>>> [8.673880] Internal error: Oops: a06 [#10] SMP ARM
>>> [8.673949] ---[ end trace 049df4786ea3140a ]---
>>> [8.678754] Modules linked in:
>>> [8.678766] CPU: 1 PID: 206 Comm: systemd-udevd Tainted: G  D
>>> 4.18.0-0.rc1.git0.1.fc29.armv7hl+lpae #1
>>> [8.678769] Hardware name: Allwinner sun8i Family
>>> [8.678781] PC is at sk_filter_trim_cap ()
>>> [8.678790] LR is at   (null)
>>> [8.709463] pc : lr : psr: 6013 ()
>>> [8.715722] sp : c996bd60  ip :   fp : 
>>> [8.720939] r10: ee79dc00  r9 : c12c9f80  r8 : 
>>> [8.726157] r7 :   r6 : 0001  r5 : f1648000  r4 : 
>>> 
>>> [8.732674] r3 : 0007  r2 :   r1 :   r0 : 
>>> 
>>> [8.739193] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  
>>> Segment user
>>> [8.746318] Control: 30c5387d  Table: 6e7bc880  DAC: ffe75ece
>>> [8.752055] Process systemd-udevd (pid: 206, stack limit = 
>>> 0x(ptrval))
>>> [8.758574] Stack: (0xc996bd60 to 0xc996c000)
>>
>> Do you have BPF JIT enabled or disabled? Does it happen with disabled?
>
> Enabled, I can test with it disabled, BPF configs bits are:
> CONFIG_BPF_EVENTS=y
> # CONFIG_BPFILTER is not set
> CONFIG_BPF_JIT_ALWAYS_ON=y
> CONFIG_BPF_JIT=y
> CONFIG_BPF_STREAM_PARSER=y
> CONFIG_BPF_SYSCALL=y
> CONFIG_BPF=y
> CONFIG_CGROUP_BPF=y
> CONFIG_HAVE_EBPF_JIT=y
> CONFIG_IPV6_SEG6_BPF=y
> CONFIG_LWTUNNEL_BPF=y
> # CONFIG_NBPFAXI_DMA is not set
> CONFIG_NET_ACT_BPF=m
> CONFIG_NET_CLS_BPF=m
> CONFIG_NETFILTER_XT_MATCH_BPF=m
> # CONFIG_TEST_BPF is not set
>
>> I can see one bug, but your stack trace seems unrelated.
>>
>> Anyway, could you try with this?
>
> Build in process.
>
>> diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
>> index 6e8b716..f6a62ae 100644
>> --- a/arch/arm/net/bpf_jit_32.c
>> +++ b/arch/arm/net/bpf_jit_32.c
>> @@ -1844,7 +1844,7 @@ struct bpf_prog *bpf_int_jit_compile(struct 
>> bpf_prog *prog)
>> /* there are 2 passes here */
>> bpf_jit_dump(prog->len, image_size, 2, ctx.target);
>>
>> -   set_memory_ro((unsigned long)header, header->pages);
>> +   bpf_jit_binary_lock_ro(header);
>> prog->bpf_func = (void *)ctx.target;
>> prog->jited = 1;
>> prog->jited_len = image_size;

 So with that and the other fix there was no improvement, with those
 and the BPF JIT disabled it works, I'm not sure if the two patches
 have any effect with the JIT disabled though.

 Will look at the other patches shortly, there's been some other issue
 introduced between rc1 and rc2 which I have to work out before I can
 test those though.
>>>
>>> Quick update, with linus's head as of yesterday, basically rc2 plus
>>> davem's network fixes it works if the JIT is disabled IE:
>>> # CONFIG_BPF_JIT_ALWAYS_ON is not set
>>> # CONFIG_BPF_JIT is not set
>>>
>>> If I enable it the boot breaks even worse than the errors above in
>>> that I get no console output at all, even with earlycon, so we've gone
>>> backwards since rc1 somehow.
>>>
>>> I'll try the above two reverted unless you have any other suggestions.
>>
>> Ok, thanks, lets do that!
>>
>> I'm still working on fixes meanwhile, should have something by end of day.
> 
> Sorry for the delay on this from my end. I noticed there was some bpf
> bits land in the last net fixes pull request landed Monday so I built
> a kernel with the JIT reenabled. It seems it's improved in that the
> completely dead no output boot has gone but the original problem that
> arrived in the merge window still persists:

Okay, thanks a lot! And on top of that tree could you try with the below
applied to check whether it fixes the issue?

diff --git 

Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-07-04 Thread Peter Robinson
On Tue, Jun 26, 2018 at 1:52 PM, Daniel Borkmann  wrote:
> On 06/26/2018 02:23 PM, Peter Robinson wrote:
> On 06/24/2018 11:24 AM, Peter Robinson wrote:
 I'm seeing this netlink/sk_filter_trim_cap crash on ARMv7 across quite
 a few ARMv7 platforms on Fedora with 4.18rc1. I've tested RPi2/RPi3
 (doesn't happen on aarch64), AllWinner H3, BeagleBone and a few
 others, both LPAE/normal kernels.
>
> So this is arm32 right?

 Correct.

 I'm a bit out of my depth in this part of the kernel but I'm wondering
 if it's known, I couldn't find anything that looked obvious on a few
 mailing lists.

 Peter
>>>
>>> Hi Peter
>>>
>>> Could you provide symbolic information ?
>>
>> I passed in through scripts/decode_stacktrace.sh is that what you were 
>> after:
>>
>> [8.673880] Internal error: Oops: a06 [#10] SMP ARM
>> [8.673949] ---[ end trace 049df4786ea3140a ]---
>> [8.678754] Modules linked in:
>> [8.678766] CPU: 1 PID: 206 Comm: systemd-udevd Tainted: G  D
>> 4.18.0-0.rc1.git0.1.fc29.armv7hl+lpae #1
>> [8.678769] Hardware name: Allwinner sun8i Family
>> [8.678781] PC is at sk_filter_trim_cap ()
>> [8.678790] LR is at   (null)
>> [8.709463] pc : lr : psr: 6013 ()
>> [8.715722] sp : c996bd60  ip :   fp : 
>> [8.720939] r10: ee79dc00  r9 : c12c9f80  r8 : 
>> [8.726157] r7 :   r6 : 0001  r5 : f1648000  r4 : 
>> [8.732674] r3 : 0007  r2 :   r1 :   r0 : 
>> [8.739193] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  
>> Segment user
>> [8.746318] Control: 30c5387d  Table: 6e7bc880  DAC: ffe75ece
>> [8.752055] Process systemd-udevd (pid: 206, stack limit = 0x(ptrval))
>> [8.758574] Stack: (0xc996bd60 to 0xc996c000)
>
> Do you have BPF JIT enabled or disabled? Does it happen with disabled?

 Enabled, I can test with it disabled, BPF configs bits are:
 CONFIG_BPF_EVENTS=y
 # CONFIG_BPFILTER is not set
 CONFIG_BPF_JIT_ALWAYS_ON=y
 CONFIG_BPF_JIT=y
 CONFIG_BPF_STREAM_PARSER=y
 CONFIG_BPF_SYSCALL=y
 CONFIG_BPF=y
 CONFIG_CGROUP_BPF=y
 CONFIG_HAVE_EBPF_JIT=y
 CONFIG_IPV6_SEG6_BPF=y
 CONFIG_LWTUNNEL_BPF=y
 # CONFIG_NBPFAXI_DMA is not set
 CONFIG_NET_ACT_BPF=m
 CONFIG_NET_CLS_BPF=m
 CONFIG_NETFILTER_XT_MATCH_BPF=m
 # CONFIG_TEST_BPF is not set

> I can see one bug, but your stack trace seems unrelated.
>
> Anyway, could you try with this?

 Build in process.

> diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
> index 6e8b716..f6a62ae 100644
> --- a/arch/arm/net/bpf_jit_32.c
> +++ b/arch/arm/net/bpf_jit_32.c
> @@ -1844,7 +1844,7 @@ struct bpf_prog *bpf_int_jit_compile(struct 
> bpf_prog *prog)
> /* there are 2 passes here */
> bpf_jit_dump(prog->len, image_size, 2, ctx.target);
>
> -   set_memory_ro((unsigned long)header, header->pages);
> +   bpf_jit_binary_lock_ro(header);
> prog->bpf_func = (void *)ctx.target;
> prog->jited = 1;
> prog->jited_len = image_size;
>>>
>>> So with that and the other fix there was no improvement, with those
>>> and the BPF JIT disabled it works, I'm not sure if the two patches
>>> have any effect with the JIT disabled though.
>>>
>>> Will look at the other patches shortly, there's been some other issue
>>> introduced between rc1 and rc2 which I have to work out before I can
>>> test those though.
>>
>> Quick update, with linus's head as of yesterday, basically rc2 plus
>> davem's network fixes it works if the JIT is disabled IE:
>> # CONFIG_BPF_JIT_ALWAYS_ON is not set
>> # CONFIG_BPF_JIT is not set
>>
>> If I enable it the boot breaks even worse than the errors above in
>> that I get no console output at all, even with earlycon, so we've gone
>> backwards since rc1 somehow.
>>
>> I'll try the above two reverted unless you have any other suggestions.
>
> Ok, thanks, lets do that!
>
> I'm still working on fixes meanwhile, should have something by end of day.

Sorry for the delay on this from my end. I noticed some bpf bits landed
in the last net fixes pull request on Monday, so I built a kernel with
the JIT re-enabled. It seems to have improved in that the completely
dead, no-output boot has gone, but the original problem that arrived in
the merge window still persists:

[   17.564142] note: systemd-udevd[194] exited with preempt_count 1
[   17.592739] Unable to handle kernel NULL pointer dereference at
virtual address 000c
[   17.601002] pgd = (ptrval)
[   17.603819] [000c] *pgd=
[   17.607487] Internal error: Oops: 805 [#10] SMP ARM
[   17.612396] Modules linked in:
[   17.615484] CPU: 0 

Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-06-26 Thread Daniel Borkmann
On 06/26/2018 02:23 PM, Peter Robinson wrote:
 On 06/24/2018 11:24 AM, Peter Robinson wrote:
>>> I'm seeing this netlink/sk_filter_trim_cap crash on ARMv7 across quite
>>> a few ARMv7 platforms on Fedora with 4.18rc1. I've tested RPi2/RPi3
>>> (doesn't happen on aarch64), AllWinner H3, BeagleBone and a few
>>> others, both LPAE/normal kernels.

 So this is arm32 right?
>>>
>>> Correct.
>>>
>>> I'm a bit out of my depth in this part of the kernel but I'm wondering
>>> if it's known, I couldn't find anything that looked obvious on a few
>>> mailing lists.
>>>
>>> Peter
>>
>> Hi Peter
>>
>> Could you provide symbolic information ?
>
> I passed in through scripts/decode_stacktrace.sh is that what you were 
> after:
>
> [8.673880] Internal error: Oops: a06 [#10] SMP ARM
> [8.673949] ---[ end trace 049df4786ea3140a ]---
> [8.678754] Modules linked in:
> [8.678766] CPU: 1 PID: 206 Comm: systemd-udevd Tainted: G  D
> 4.18.0-0.rc1.git0.1.fc29.armv7hl+lpae #1
> [8.678769] Hardware name: Allwinner sun8i Family
> [8.678781] PC is at sk_filter_trim_cap ()
> [8.678790] LR is at   (null)
> [8.709463] pc : lr : psr: 6013 ()
> [8.715722] sp : c996bd60  ip :   fp : 
> [8.720939] r10: ee79dc00  r9 : c12c9f80  r8 : 
> [8.726157] r7 :   r6 : 0001  r5 : f1648000  r4 : 
> [8.732674] r3 : 0007  r2 :   r1 :   r0 : 
> [8.739193] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  
> Segment user
> [8.746318] Control: 30c5387d  Table: 6e7bc880  DAC: ffe75ece
> [8.752055] Process systemd-udevd (pid: 206, stack limit = 0x(ptrval))
> [8.758574] Stack: (0xc996bd60 to 0xc996c000)

 Do you have BPF JIT enabled or disabled? Does it happen with disabled?
>>>
>>> Enabled, I can test with it disabled, BPF configs bits are:
>>> CONFIG_BPF_EVENTS=y
>>> # CONFIG_BPFILTER is not set
>>> CONFIG_BPF_JIT_ALWAYS_ON=y
>>> CONFIG_BPF_JIT=y
>>> CONFIG_BPF_STREAM_PARSER=y
>>> CONFIG_BPF_SYSCALL=y
>>> CONFIG_BPF=y
>>> CONFIG_CGROUP_BPF=y
>>> CONFIG_HAVE_EBPF_JIT=y
>>> CONFIG_IPV6_SEG6_BPF=y
>>> CONFIG_LWTUNNEL_BPF=y
>>> # CONFIG_NBPFAXI_DMA is not set
>>> CONFIG_NET_ACT_BPF=m
>>> CONFIG_NET_CLS_BPF=m
>>> CONFIG_NETFILTER_XT_MATCH_BPF=m
>>> # CONFIG_TEST_BPF is not set
>>>
 I can see one bug, but your stack trace seems unrelated.

 Anyway, could you try with this?
>>>
>>> Build in process.
>>>
 diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
 index 6e8b716..f6a62ae 100644
 --- a/arch/arm/net/bpf_jit_32.c
 +++ b/arch/arm/net/bpf_jit_32.c
 @@ -1844,7 +1844,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog 
 *prog)
 /* there are 2 passes here */
 bpf_jit_dump(prog->len, image_size, 2, ctx.target);

 -   set_memory_ro((unsigned long)header, header->pages);
 +   bpf_jit_binary_lock_ro(header);
 prog->bpf_func = (void *)ctx.target;
 prog->jited = 1;
 prog->jited_len = image_size;
>>
>> So with that and the other fix there was no improvement, with those
>> and the BPF JIT disabled it works, I'm not sure if the two patches
>> have any effect with the JIT disabled though.
>>
>> Will look at the other patches shortly, there's been some other issue
>> introduced between rc1 and rc2 which I have to work out before I can
>> test those though.
> 
> Quick update, with linus's head as of yesterday, basically rc2 plus
> davem's network fixes it works if the JIT is disabled IE:
> # CONFIG_BPF_JIT_ALWAYS_ON is not set
> # CONFIG_BPF_JIT is not set
> 
> If I enable it the boot breaks even worse than the errors above in
> that I get no console output at all, even with earlycon, so we've gone
> backwards since rc1 somehow.
> 
> I'll try the above two reverted unless you have any other suggestions.

Ok, thanks, let's do that!

I'm still working on fixes meanwhile, should have something by end of day.

Thanks,
Daniel


Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-06-26 Thread Peter Robinson
Hi Daniel,

>>> On 06/24/2018 11:24 AM, Peter Robinson wrote:
>> I'm seeing this netlink/sk_filter_trim_cap crash on ARMv7 across quite
>> a few ARMv7 platforms on Fedora with 4.18rc1. I've tested RPi2/RPi3
>> (doesn't happen on aarch64), AllWinner H3, BeagleBone and a few
>> others, both LPAE/normal kernels.
>>>
>>> So this is arm32 right?
>>
>> Correct.
>>
>> I'm a bit out of my depth in this part of the kernel but I'm wondering
>> if it's known, I couldn't find anything that looked obvious on a few
>> mailing lists.
>>
>> Peter
>
> Hi Peter
>
> Could you provide symbolic information ?

 I passed in through scripts/decode_stacktrace.sh is that what you were 
 after:

 [8.673880] Internal error: Oops: a06 [#10] SMP ARM
 [8.673949] ---[ end trace 049df4786ea3140a ]---
 [8.678754] Modules linked in:
 [8.678766] CPU: 1 PID: 206 Comm: systemd-udevd Tainted: G  D
 4.18.0-0.rc1.git0.1.fc29.armv7hl+lpae #1
 [8.678769] Hardware name: Allwinner sun8i Family
 [8.678781] PC is at sk_filter_trim_cap ()
 [8.678790] LR is at   (null)
 [8.709463] pc : lr : psr: 6013 ()
 [8.715722] sp : c996bd60  ip :   fp : 
 [8.720939] r10: ee79dc00  r9 : c12c9f80  r8 : 
 [8.726157] r7 :   r6 : 0001  r5 : f1648000  r4 : 
 [8.732674] r3 : 0007  r2 :   r1 :   r0 : 
 [8.739193] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  
 Segment user
 [8.746318] Control: 30c5387d  Table: 6e7bc880  DAC: ffe75ece
 [8.752055] Process systemd-udevd (pid: 206, stack limit = 0x(ptrval))
 [8.758574] Stack: (0xc996bd60 to 0xc996c000)
>>>
>>> Do you have BPF JIT enabled or disabled? Does it happen with disabled?
>>
>> Enabled, I can test with it disabled, BPF configs bits are:
>> CONFIG_BPF_EVENTS=y
>> # CONFIG_BPFILTER is not set
>> CONFIG_BPF_JIT_ALWAYS_ON=y
>> CONFIG_BPF_JIT=y
>> CONFIG_BPF_STREAM_PARSER=y
>> CONFIG_BPF_SYSCALL=y
>> CONFIG_BPF=y
>> CONFIG_CGROUP_BPF=y
>> CONFIG_HAVE_EBPF_JIT=y
>> CONFIG_IPV6_SEG6_BPF=y
>> CONFIG_LWTUNNEL_BPF=y
>> # CONFIG_NBPFAXI_DMA is not set
>> CONFIG_NET_ACT_BPF=m
>> CONFIG_NET_CLS_BPF=m
>> CONFIG_NETFILTER_XT_MATCH_BPF=m
>> # CONFIG_TEST_BPF is not set
>>
>>> I can see one bug, but your stack trace seems unrelated.
>>>
>>> Anyway, could you try with this?
>>
>> Build in process.
>>
>>> diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
>>> index 6e8b716..f6a62ae 100644
>>> --- a/arch/arm/net/bpf_jit_32.c
>>> +++ b/arch/arm/net/bpf_jit_32.c
>>> @@ -1844,7 +1844,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog 
>>> *prog)
>>> /* there are 2 passes here */
>>> bpf_jit_dump(prog->len, image_size, 2, ctx.target);
>>>
>>> -   set_memory_ro((unsigned long)header, header->pages);
>>> +   bpf_jit_binary_lock_ro(header);
>>> prog->bpf_func = (void *)ctx.target;
>>> prog->jited = 1;
>>> prog->jited_len = image_size;
>
> So with that and the other fix there was no improvement, with those
> and the BPF JIT disabled it works, I'm not sure if the two patches
> have any effect with the JIT disabled though.
>
> Will look at the other patches shortly, there's been some other issue
> introduced between rc1 and rc2 which I have to work out before I can
> test those though.

Quick update: with Linus's head as of yesterday (basically rc2 plus
davem's network fixes), it works if the JIT is disabled, i.e.:
# CONFIG_BPF_JIT_ALWAYS_ON is not set
# CONFIG_BPF_JIT is not set
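
To make the comparison between the failing and working configurations explicit, here is a rough sketch that classifies a .config fragment by its BPF JIT settings. The helper names parse_kconfig and jit_state are hypothetical, and the three state labels are only my own shorthand.

```python
def parse_kconfig(text):
    """Map option names to values; '# CONFIG_X is not set' becomes None."""
    suffix = " is not set"
    opts = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            opts[line[2:-len(suffix)]] = None
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            opts[name] = value
    return opts


def jit_state(opts):
    """Classify the BPF JIT settings (labels are illustrative shorthand)."""
    if opts.get("CONFIG_BPF_JIT") != "y":
        return "disabled"
    if opts.get("CONFIG_BPF_JIT_ALWAYS_ON") == "y":
        # JIT is forced on; it cannot be switched off at runtime.
        return "always-on"
    # JIT is built in and controlled by the net.core.bpf_jit_enable sysctl.
    return "enabled"


failing = parse_kconfig("CONFIG_BPF_JIT_ALWAYS_ON=y\nCONFIG_BPF_JIT=y\n")
working = parse_kconfig(
    "# CONFIG_BPF_JIT_ALWAYS_ON is not set\n# CONFIG_BPF_JIT is not set\n"
)
print(jit_state(failing), jit_state(working))
```

The "always-on" case matters here because with CONFIG_BPF_JIT_ALWAYS_ON=y the JIT cannot be turned off through the sysctl, so a rebuild is the only way to test with it disabled.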

If I enable it, the boot breaks even worse than the errors above in
that I get no console output at all, even with earlycon, so we've gone
backwards since rc1 somehow.

I'll try with the above two reverted unless you have any other suggestions.

Peter


Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-06-25 Thread Peter Robinson
On Mon, Jun 25, 2018 at 2:39 PM, Peter Robinson  wrote:
> Hi Daniel,
>
>> On 06/24/2018 11:24 AM, Peter Robinson wrote:
> I'm seeing this netlink/sk_filter_trim_cap crash on ARMv7 across quite
> a few ARMv7 platforms on Fedora with 4.18rc1. I've tested RPi2/RPi3
> (doesn't happen on aarch64), AllWinner H3, BeagleBone and a few
> others, both LPAE/normal kernels.
>>
>> So this is arm32 right?
>
> Correct.
>
> I'm a bit out of my depth in this part of the kernel but I'm wondering
> if it's known, I couldn't find anything that looked obvious on a few
> mailing lists.
>
> Peter

 Hi Peter

 Could you provide symbolic information ?
>>>
>>> I passed in through scripts/decode_stacktrace.sh is that what you were 
>>> after:
>>>
>>> [8.673880] Internal error: Oops: a06 [#10] SMP ARM
>>> [8.673949] ---[ end trace 049df4786ea3140a ]---
>>> [8.678754] Modules linked in:
>>> [8.678766] CPU: 1 PID: 206 Comm: systemd-udevd Tainted: G  D
>>> 4.18.0-0.rc1.git0.1.fc29.armv7hl+lpae #1
>>> [8.678769] Hardware name: Allwinner sun8i Family
>>> [8.678781] PC is at sk_filter_trim_cap ()
>>> [8.678790] LR is at   (null)
>>> [8.709463] pc : lr : psr: 6013 ()
>>> [8.715722] sp : c996bd60  ip :   fp : 
>>> [8.720939] r10: ee79dc00  r9 : c12c9f80  r8 : 
>>> [8.726157] r7 :   r6 : 0001  r5 : f1648000  r4 : 
>>> [8.732674] r3 : 0007  r2 :   r1 :   r0 : 
>>> [8.739193] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment 
>>> user
>>> [8.746318] Control: 30c5387d  Table: 6e7bc880  DAC: ffe75ece
>>> [8.752055] Process systemd-udevd (pid: 206, stack limit = 0x(ptrval))
>>> [8.758574] Stack: (0xc996bd60 to 0xc996c000)
>>
>> Do you have BPF JIT enabled or disabled? Does it happen with disabled?
>
> Enabled, I can test with it disabled, BPF configs bits are:
> CONFIG_BPF_EVENTS=y
> # CONFIG_BPFILTER is not set
> CONFIG_BPF_JIT_ALWAYS_ON=y
> CONFIG_BPF_JIT=y
> CONFIG_BPF_STREAM_PARSER=y
> CONFIG_BPF_SYSCALL=y
> CONFIG_BPF=y
> CONFIG_CGROUP_BPF=y
> CONFIG_HAVE_EBPF_JIT=y
> CONFIG_IPV6_SEG6_BPF=y
> CONFIG_LWTUNNEL_BPF=y
> # CONFIG_NBPFAXI_DMA is not set
> CONFIG_NET_ACT_BPF=m
> CONFIG_NET_CLS_BPF=m
> CONFIG_NETFILTER_XT_MATCH_BPF=m
> # CONFIG_TEST_BPF is not set
>
>> I can see one bug, but your stack trace seems unrelated.
>>
>> Anyway, could you try with this?
>
> Build in process.
>
>> diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
>> index 6e8b716..f6a62ae 100644
>> --- a/arch/arm/net/bpf_jit_32.c
>> +++ b/arch/arm/net/bpf_jit_32.c
>> @@ -1844,7 +1844,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog 
>> *prog)
>> /* there are 2 passes here */
>> bpf_jit_dump(prog->len, image_size, 2, ctx.target);
>>
>> -   set_memory_ro((unsigned long)header, header->pages);
>> +   bpf_jit_binary_lock_ro(header);
>> prog->bpf_func = (void *)ctx.target;
>> prog->jited = 1;
>> prog->jited_len = image_size;

So with that and the other fix there was no improvement. With those
and the BPF JIT disabled it works, though I'm not sure whether the two
patches have any effect with the JIT disabled.

I'll look at the other patches shortly; some other issue was introduced
between rc1 and rc2 which I have to work out before I can test those.

Peter