Re: INFO: rcu detected stall in xfrm_confirm_neigh

2018-02-20 Thread Steffen Klassert
On Mon, Feb 19, 2018 at 11:05:38AM +0100, Dmitry Vyukov wrote:
> On Mon, Feb 19, 2018 at 8:22 AM, Steffen Klassert
>  wrote:
> >> >  wrote:
> >> >> Hello,
> >> >>
> >> >> syzbot hit the following crash on net-next commit
> >> >> 9515a2e082f91457db0ecff4b65371d0fb5d9aad (Thu Jan 25 03:37:38 2018 
> >> >> +)
> >> >> net/ipv4: Allow send to local broadcast from a socket bound to a VRF
> >> >>
> >> >> So far this crash happened 6 times on net-next.
> >> >> Unfortunately, I don't have any reproducer for this crash yet.
> >> >> Raw console output is attached.
> >> >> compiler: gcc (GCC) 7.1.1 20170620
> >> >> .config is attached.
> >> >
> >> >
> >> > +xfrm maintainers
> >>
> >> Here is a C repro:
> >> https://gist.githubusercontent.com/dvyukov/92c67ba9afaaa960bcfbdc6ef549ac10/raw/786f9221c1d707c7f4a15effcb1d5997dd4f8638/gistfile1.txt
> >
> > Seems like syzbot does not know about this reproducer.
> >
> > I've send a patch to test and got this as the reply:
> >
> > This crash does not have a reproducer. I cannot test it.
> 
> Yes, it does not know about the reproducer. I've extracted it
> manually, these hangs are sometimes hard to reproduce. For syzbot this
> bug does not have a reproducer.
> Have you tried to run the reproducer? For me it reproduced the bug
> quite reliably.

I've tried the reproducer, it does not trigger with my kernel
config and with your config my machine does not boot. So it
was just easy to ask syzbot for a test.

Anyway, Florian Westphal did a test and confirmed that
the patch fixes the issue. I've just applied the fix
below to the ipsec tree.


Subject: [PATCH] xfrm: Fix infinite loop in xfrm_get_dst_nexthop with transport 
mode.

On transport mode we forget to fetch the child dst_entry
before we continue the while loop, this leads to an infinite
loop. Fix this by fetching the child dst_entry before we
continue the while loop.

Fixes: 0f6c480f23f4 ("xfrm: Move dst->path into struct xfrm_dst")
Reported-by: syzbot+7d03c810e50aaedef...@syzkaller.appspotmail.com
Tested-by: Florian Westphal 
Signed-off-by: Steffen Klassert 
---
 net/xfrm/xfrm_policy.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 150d46633ce6..625b3fca5704 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2732,14 +2732,14 @@ static const void *xfrm_get_dst_nexthop(const struct 
dst_entry *dst,
while (dst->xfrm) {
const struct xfrm_state *xfrm = dst->xfrm;
 
+   dst = xfrm_dst_child(dst);
+
if (xfrm->props.mode == XFRM_MODE_TRANSPORT)
continue;
if (xfrm->type->flags & XFRM_TYPE_REMOTE_COADDR)
daddr = xfrm->coaddr;
else if (!(xfrm->type->flags & XFRM_TYPE_LOCAL_COADDR))
daddr = >id.daddr;
-
-   dst = xfrm_dst_child(dst);
}
return daddr;
 }
-- 
2.14.1



Re: INFO: rcu detected stall in xfrm_confirm_neigh

2018-02-20 Thread Steffen Klassert
On Mon, Feb 19, 2018 at 11:05:38AM +0100, Dmitry Vyukov wrote:
> On Mon, Feb 19, 2018 at 8:22 AM, Steffen Klassert
>  wrote:
> >> >  wrote:
> >> >> Hello,
> >> >>
> >> >> syzbot hit the following crash on net-next commit
> >> >> 9515a2e082f91457db0ecff4b65371d0fb5d9aad (Thu Jan 25 03:37:38 2018 
> >> >> +)
> >> >> net/ipv4: Allow send to local broadcast from a socket bound to a VRF
> >> >>
> >> >> So far this crash happened 6 times on net-next.
> >> >> Unfortunately, I don't have any reproducer for this crash yet.
> >> >> Raw console output is attached.
> >> >> compiler: gcc (GCC) 7.1.1 20170620
> >> >> .config is attached.
> >> >
> >> >
> >> > +xfrm maintainers
> >>
> >> Here is a C repro:
> >> https://gist.githubusercontent.com/dvyukov/92c67ba9afaaa960bcfbdc6ef549ac10/raw/786f9221c1d707c7f4a15effcb1d5997dd4f8638/gistfile1.txt
> >
> > Seems like syzbot does not know about this reproducer.
> >
> > I've send a patch to test and got this as the reply:
> >
> > This crash does not have a reproducer. I cannot test it.
> 
> Yes, it does not know about the reproducer. I've extracted it
> manually, these hangs are sometimes hard to reproduce. For syzbot this
> bug does not have a reproducer.
> Have you tried to run the reproducer? For me it reproduced the bug
> quite reliably.

I've tried the reproducer, it does not trigger with my kernel
config and with your config my machine does not boot. So it
was just easy to ask syzbot for a test.

Anyway, Florian Westphal did a test and confirmed that
the patch fixes the issue. I've just applied the fix
below to the ipsec tree.


Subject: [PATCH] xfrm: Fix infinite loop in xfrm_get_dst_nexthop with transport 
mode.

On transport mode we forget to fetch the child dst_entry
before we continue the while loop, this leads to an infinite
loop. Fix this by fetching the child dst_entry before we
continue the while loop.

Fixes: 0f6c480f23f4 ("xfrm: Move dst->path into struct xfrm_dst")
Reported-by: syzbot+7d03c810e50aaedef...@syzkaller.appspotmail.com
Tested-by: Florian Westphal 
Signed-off-by: Steffen Klassert 
---
 net/xfrm/xfrm_policy.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 150d46633ce6..625b3fca5704 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2732,14 +2732,14 @@ static const void *xfrm_get_dst_nexthop(const struct 
dst_entry *dst,
while (dst->xfrm) {
const struct xfrm_state *xfrm = dst->xfrm;
 
+   dst = xfrm_dst_child(dst);
+
if (xfrm->props.mode == XFRM_MODE_TRANSPORT)
continue;
if (xfrm->type->flags & XFRM_TYPE_REMOTE_COADDR)
daddr = xfrm->coaddr;
else if (!(xfrm->type->flags & XFRM_TYPE_LOCAL_COADDR))
daddr = >id.daddr;
-
-   dst = xfrm_dst_child(dst);
}
return daddr;
 }
-- 
2.14.1



Re: INFO: rcu detected stall in xfrm_confirm_neigh

2018-02-19 Thread Dmitry Vyukov
On Mon, Feb 19, 2018 at 8:22 AM, Steffen Klassert
 wrote:
>> >  wrote:
>> >> Hello,
>> >>
>> >> syzbot hit the following crash on net-next commit
>> >> 9515a2e082f91457db0ecff4b65371d0fb5d9aad (Thu Jan 25 03:37:38 2018 +)
>> >> net/ipv4: Allow send to local broadcast from a socket bound to a VRF
>> >>
>> >> So far this crash happened 6 times on net-next.
>> >> Unfortunately, I don't have any reproducer for this crash yet.
>> >> Raw console output is attached.
>> >> compiler: gcc (GCC) 7.1.1 20170620
>> >> .config is attached.
>> >
>> >
>> > +xfrm maintainers
>>
>> Here is a C repro:
>> https://gist.githubusercontent.com/dvyukov/92c67ba9afaaa960bcfbdc6ef549ac10/raw/786f9221c1d707c7f4a15effcb1d5997dd4f8638/gistfile1.txt
>
> Seems like syzbot does not know about this reproducer.
>
> I've send a patch to test and got this as the reply:
>
> This crash does not have a reproducer. I cannot test it.

Yes, it does not know about the reproducer. I've extracted it
manually, these hangs are sometimes hard to reproduce. For syzbot this
bug does not have a reproducer.
Have you tried to run the reproducer? For me it reproduced the bug
quite reliably.


Re: INFO: rcu detected stall in xfrm_confirm_neigh

2018-02-19 Thread Dmitry Vyukov
On Mon, Feb 19, 2018 at 8:22 AM, Steffen Klassert
 wrote:
>> >  wrote:
>> >> Hello,
>> >>
>> >> syzbot hit the following crash on net-next commit
>> >> 9515a2e082f91457db0ecff4b65371d0fb5d9aad (Thu Jan 25 03:37:38 2018 +)
>> >> net/ipv4: Allow send to local broadcast from a socket bound to a VRF
>> >>
>> >> So far this crash happened 6 times on net-next.
>> >> Unfortunately, I don't have any reproducer for this crash yet.
>> >> Raw console output is attached.
>> >> compiler: gcc (GCC) 7.1.1 20170620
>> >> .config is attached.
>> >
>> >
>> > +xfrm maintainers
>>
>> Here is a C repro:
>> https://gist.githubusercontent.com/dvyukov/92c67ba9afaaa960bcfbdc6ef549ac10/raw/786f9221c1d707c7f4a15effcb1d5997dd4f8638/gistfile1.txt
>
> Seems like syzbot does not know about this reproducer.
>
> I've send a patch to test and got this as the reply:
>
> This crash does not have a reproducer. I cannot test it.

Yes, it does not know about the reproducer. I've extracted it
manually, these hangs are sometimes hard to reproduce. For syzbot this
bug does not have a reproducer.
Have you tried to run the reproducer? For me it reproduced the bug
quite reliably.


Re: INFO: rcu detected stall in xfrm_confirm_neigh

2018-02-18 Thread Steffen Klassert
On Tue, Feb 13, 2018 at 10:19:17AM +0100, Dmitry Vyukov wrote:
> On Mon, Feb 12, 2018 at 4:26 PM, Dmitry Vyukov  wrote:
> > On Mon, Feb 12, 2018 at 4:23 PM, syzbot
> >  wrote:
> >> Hello,
> >>
> >> syzbot hit the following crash on net-next commit
> >> 9515a2e082f91457db0ecff4b65371d0fb5d9aad (Thu Jan 25 03:37:38 2018 +)
> >> net/ipv4: Allow send to local broadcast from a socket bound to a VRF
> >>
> >> So far this crash happened 6 times on net-next.
> >> Unfortunately, I don't have any reproducer for this crash yet.
> >> Raw console output is attached.
> >> compiler: gcc (GCC) 7.1.1 20170620
> >> .config is attached.
> >
> >
> > +xfrm maintainers
> 
> Here is a C repro:
> https://gist.githubusercontent.com/dvyukov/92c67ba9afaaa960bcfbdc6ef549ac10/raw/786f9221c1d707c7f4a15effcb1d5997dd4f8638/gistfile1.txt

Seems like syzbot does not know about this reproducer.

I've send a patch to test and got this as the reply:

This crash does not have a reproducer. I cannot test it.


Re: INFO: rcu detected stall in xfrm_confirm_neigh

2018-02-18 Thread Steffen Klassert
On Tue, Feb 13, 2018 at 10:19:17AM +0100, Dmitry Vyukov wrote:
> On Mon, Feb 12, 2018 at 4:26 PM, Dmitry Vyukov  wrote:
> > On Mon, Feb 12, 2018 at 4:23 PM, syzbot
> >  wrote:
> >> Hello,
> >>
> >> syzbot hit the following crash on net-next commit
> >> 9515a2e082f91457db0ecff4b65371d0fb5d9aad (Thu Jan 25 03:37:38 2018 +)
> >> net/ipv4: Allow send to local broadcast from a socket bound to a VRF
> >>
> >> So far this crash happened 6 times on net-next.
> >> Unfortunately, I don't have any reproducer for this crash yet.
> >> Raw console output is attached.
> >> compiler: gcc (GCC) 7.1.1 20170620
> >> .config is attached.
> >
> >
> > +xfrm maintainers
> 
> Here is a C repro:
> https://gist.githubusercontent.com/dvyukov/92c67ba9afaaa960bcfbdc6ef549ac10/raw/786f9221c1d707c7f4a15effcb1d5997dd4f8638/gistfile1.txt

Seems like syzbot does not know about this reproducer.

I've send a patch to test and got this as the reply:

This crash does not have a reproducer. I cannot test it.


Re: INFO: rcu detected stall in xfrm_confirm_neigh

2018-02-13 Thread Dmitry Vyukov
On Mon, Feb 12, 2018 at 4:26 PM, Dmitry Vyukov  wrote:
> On Mon, Feb 12, 2018 at 4:23 PM, syzbot
>  wrote:
>> Hello,
>>
>> syzbot hit the following crash on net-next commit
>> 9515a2e082f91457db0ecff4b65371d0fb5d9aad (Thu Jan 25 03:37:38 2018 +)
>> net/ipv4: Allow send to local broadcast from a socket bound to a VRF
>>
>> So far this crash happened 6 times on net-next.
>> Unfortunately, I don't have any reproducer for this crash yet.
>> Raw console output is attached.
>> compiler: gcc (GCC) 7.1.1 20170620
>> .config is attached.
>
>
> +xfrm maintainers

Here is a C repro:
https://gist.githubusercontent.com/dvyukov/92c67ba9afaaa960bcfbdc6ef549ac10/raw/786f9221c1d707c7f4a15effcb1d5997dd4f8638/gistfile1.txt
Somewhat messy, but it gives me:

[64360.028053] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [a.out:7891]
[64360.030043] Modules linked in:
[64360.030355] CPU: 0 PID: 7891 Comm: a.out Not tainted 4.15.0+ #96
[64360.031334] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS Bochs 01/01/2011
[64360.033632] RIP: 0010:__sanitizer_cov_trace_pc+0x20/0x50
[64360.035114] RSP: 0018:88006b96f6e0 EFLAGS: 0246 ORIG_RAX:
ff11
[64360.037238] RAX: 8800636ce040 RBX: 880067fadb00 RCX: 84d147a6
[64360.039238] RDX:  RSI: 84d147b0 RDI: 0001
[64360.041357] RBP: 88006b96f6e0 R08: 8800636ce040 R09: ed000d1008c9
[64360.043364] R10:  R11:  R12: ed000cff5b7b
[64360.045366] R13: 880067fadbdc R14: 88006586ae00 R15: 
[64360.047367] FS:  7f2a5bab5700() GS:88006ca0()
knlGS:
[64360.049486] CS:  0010 DS:  ES:  CR0: 80050033
[64360.051095] CR2: 20cf9000 CR3: 626a7001 CR4: 001606f0
[64360.053065] Call Trace:
[64360.053822]  xfrm_confirm_neigh+0xd0/0x2c0
[64360.056166]  raw_sendmsg+0x1f1e/0x2640
[64360.068396]  inet_sendmsg+0x145/0x490
[64360.074198]  sock_sendmsg+0xd2/0x120
[64360.075094]  SYSC_sendto+0x3de/0x640
[64360.085269]  SyS_sendto+0x40/0x50



>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+7d03c810e50aaedef...@syzkaller.appspotmail.com
>> It will help syzbot understand when the bug is fixed. See footer for
>> details.
>> If you forward the report, please keep this part and the footer.
>>
>> INFO: rcu_sched self-detected stall on CPU
>> 1-...!: (124998 ticks this GP) idle=376/141/0
>> softirq=506054/506054 fqs=19
>>  (t=125000 jiffies g=289415 c=289414 q=312)
>> rcu_sched kthread starved for 124920 jiffies! g289415 c289414 f0x0
>> RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=0
>> rcu_sched   R  running task23456 8  2 0x8000
>> Call Trace:
>>  context_switch kernel/sched/core.c:2799 [inline]
>>  __schedule+0x8eb/0x2060 kernel/sched/core.c:3375
>>  schedule+0xf5/0x430 kernel/sched/core.c:3434
>>  schedule_timeout+0x118/0x230 kernel/time/timer.c:1793
>>  rcu_gp_kthread+0x9e5/0x1930 kernel/rcu/tree.c:2314
>>  kthread+0x33c/0x400 kernel/kthread.c:238
>>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:541
>> NMI backtrace for cpu 1
>> CPU: 1 PID: 15893 Comm: syz-executor0 Not tainted 4.15.0-rc9+ #210
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> Google 01/01/2011
>> Call Trace:
>>  
>>  __dump_stack lib/dump_stack.c:17 [inline]
>>  dump_stack+0x194/0x257 lib/dump_stack.c:53
>>  nmi_cpu_backtrace+0x1d2/0x210 lib/nmi_backtrace.c:103
>>  nmi_trigger_cpumask_backtrace+0x122/0x180 lib/nmi_backtrace.c:62
>>  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>>  trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
>>  rcu_dump_cpu_stacks+0x186/0x1d9 kernel/rcu/tree.c:1459
>>  print_cpu_stall kernel/rcu/tree.c:1608 [inline]
>>  check_cpu_stall.isra.61+0xbb8/0x15b0 kernel/rcu/tree.c:1676
>>  __rcu_pending kernel/rcu/tree.c:3440 [inline]
>>  rcu_pending kernel/rcu/tree.c:3502 [inline]
>>  rcu_check_callbacks+0x256/0xd00 kernel/rcu/tree.c:2842
>>  update_process_times+0x30/0x60 kernel/time/timer.c:1628
>>  tick_sched_handle+0x85/0x160 kernel/time/tick-sched.c:162
>>  tick_sched_timer+0x42/0x120 kernel/time/tick-sched.c:1194
>>  __run_hrtimer kernel/time/hrtimer.c:1211 [inline]
>>  __hrtimer_run_queues+0x358/0xe20 kernel/time/hrtimer.c:1275
>>  hrtimer_interrupt+0x1c2/0x5e0 kernel/time/hrtimer.c:1309
>>  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
>>  smp_apic_timer_interrupt+0x14a/0x700 arch/x86/kernel/apic/apic.c:1050
>>  apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:937
>>  
>> RIP: 0010:__read_once_size include/linux/compiler.h:183 [inline]
>> RIP: 0010:__sanitizer_cov_trace_pc+0x3b/0x50 kernel/kcov.c:106
>> RSP: 0018:8801a6867820 EFLAGS: 0246 ORIG_RAX: ff11
>> RAX: 0001 RBX: 8801caf7d200 RCX: 

Re: INFO: rcu detected stall in xfrm_confirm_neigh

2018-02-13 Thread Dmitry Vyukov
On Mon, Feb 12, 2018 at 4:26 PM, Dmitry Vyukov  wrote:
> On Mon, Feb 12, 2018 at 4:23 PM, syzbot
>  wrote:
>> Hello,
>>
>> syzbot hit the following crash on net-next commit
>> 9515a2e082f91457db0ecff4b65371d0fb5d9aad (Thu Jan 25 03:37:38 2018 +)
>> net/ipv4: Allow send to local broadcast from a socket bound to a VRF
>>
>> So far this crash happened 6 times on net-next.
>> Unfortunately, I don't have any reproducer for this crash yet.
>> Raw console output is attached.
>> compiler: gcc (GCC) 7.1.1 20170620
>> .config is attached.
>
>
> +xfrm maintainers

Here is a C repro:
https://gist.githubusercontent.com/dvyukov/92c67ba9afaaa960bcfbdc6ef549ac10/raw/786f9221c1d707c7f4a15effcb1d5997dd4f8638/gistfile1.txt
Somewhat messy, but it gives me:

[64360.028053] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [a.out:7891]
[64360.030043] Modules linked in:
[64360.030355] CPU: 0 PID: 7891 Comm: a.out Not tainted 4.15.0+ #96
[64360.031334] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS Bochs 01/01/2011
[64360.033632] RIP: 0010:__sanitizer_cov_trace_pc+0x20/0x50
[64360.035114] RSP: 0018:88006b96f6e0 EFLAGS: 0246 ORIG_RAX:
ff11
[64360.037238] RAX: 8800636ce040 RBX: 880067fadb00 RCX: 84d147a6
[64360.039238] RDX:  RSI: 84d147b0 RDI: 0001
[64360.041357] RBP: 88006b96f6e0 R08: 8800636ce040 R09: ed000d1008c9
[64360.043364] R10:  R11:  R12: ed000cff5b7b
[64360.045366] R13: 880067fadbdc R14: 88006586ae00 R15: 
[64360.047367] FS:  7f2a5bab5700() GS:88006ca0()
knlGS:
[64360.049486] CS:  0010 DS:  ES:  CR0: 80050033
[64360.051095] CR2: 20cf9000 CR3: 626a7001 CR4: 001606f0
[64360.053065] Call Trace:
[64360.053822]  xfrm_confirm_neigh+0xd0/0x2c0
[64360.056166]  raw_sendmsg+0x1f1e/0x2640
[64360.068396]  inet_sendmsg+0x145/0x490
[64360.074198]  sock_sendmsg+0xd2/0x120
[64360.075094]  SYSC_sendto+0x3de/0x640
[64360.085269]  SyS_sendto+0x40/0x50



>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+7d03c810e50aaedef...@syzkaller.appspotmail.com
>> It will help syzbot understand when the bug is fixed. See footer for
>> details.
>> If you forward the report, please keep this part and the footer.
>>
>> INFO: rcu_sched self-detected stall on CPU
>> 1-...!: (124998 ticks this GP) idle=376/141/0
>> softirq=506054/506054 fqs=19
>>  (t=125000 jiffies g=289415 c=289414 q=312)
>> rcu_sched kthread starved for 124920 jiffies! g289415 c289414 f0x0
>> RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=0
>> rcu_sched   R  running task23456 8  2 0x8000
>> Call Trace:
>>  context_switch kernel/sched/core.c:2799 [inline]
>>  __schedule+0x8eb/0x2060 kernel/sched/core.c:3375
>>  schedule+0xf5/0x430 kernel/sched/core.c:3434
>>  schedule_timeout+0x118/0x230 kernel/time/timer.c:1793
>>  rcu_gp_kthread+0x9e5/0x1930 kernel/rcu/tree.c:2314
>>  kthread+0x33c/0x400 kernel/kthread.c:238
>>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:541
>> NMI backtrace for cpu 1
>> CPU: 1 PID: 15893 Comm: syz-executor0 Not tainted 4.15.0-rc9+ #210
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> Google 01/01/2011
>> Call Trace:
>>  
>>  __dump_stack lib/dump_stack.c:17 [inline]
>>  dump_stack+0x194/0x257 lib/dump_stack.c:53
>>  nmi_cpu_backtrace+0x1d2/0x210 lib/nmi_backtrace.c:103
>>  nmi_trigger_cpumask_backtrace+0x122/0x180 lib/nmi_backtrace.c:62
>>  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>>  trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
>>  rcu_dump_cpu_stacks+0x186/0x1d9 kernel/rcu/tree.c:1459
>>  print_cpu_stall kernel/rcu/tree.c:1608 [inline]
>>  check_cpu_stall.isra.61+0xbb8/0x15b0 kernel/rcu/tree.c:1676
>>  __rcu_pending kernel/rcu/tree.c:3440 [inline]
>>  rcu_pending kernel/rcu/tree.c:3502 [inline]
>>  rcu_check_callbacks+0x256/0xd00 kernel/rcu/tree.c:2842
>>  update_process_times+0x30/0x60 kernel/time/timer.c:1628
>>  tick_sched_handle+0x85/0x160 kernel/time/tick-sched.c:162
>>  tick_sched_timer+0x42/0x120 kernel/time/tick-sched.c:1194
>>  __run_hrtimer kernel/time/hrtimer.c:1211 [inline]
>>  __hrtimer_run_queues+0x358/0xe20 kernel/time/hrtimer.c:1275
>>  hrtimer_interrupt+0x1c2/0x5e0 kernel/time/hrtimer.c:1309
>>  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
>>  smp_apic_timer_interrupt+0x14a/0x700 arch/x86/kernel/apic/apic.c:1050
>>  apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:937
>>  
>> RIP: 0010:__read_once_size include/linux/compiler.h:183 [inline]
>> RIP: 0010:__sanitizer_cov_trace_pc+0x3b/0x50 kernel/kcov.c:106
>> RSP: 0018:8801a6867820 EFLAGS: 0246 ORIG_RAX: ff11
>> RAX: 0001 RBX: 8801caf7d200 RCX: 84acc87d
>> RDX:  RSI: c90002aa6000 RDI: 8801c54c4f50
>> 

Re: INFO: rcu detected stall in xfrm_confirm_neigh

2018-02-12 Thread Dmitry Vyukov
On Mon, Feb 12, 2018 at 4:23 PM, syzbot
 wrote:
> Hello,
>
> syzbot hit the following crash on net-next commit
> 9515a2e082f91457db0ecff4b65371d0fb5d9aad (Thu Jan 25 03:37:38 2018 +)
> net/ipv4: Allow send to local broadcast from a socket bound to a VRF
>
> So far this crash happened 6 times on net-next.
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output is attached.
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached.


+xfrm maintainers


> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+7d03c810e50aaedef...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
>
> INFO: rcu_sched self-detected stall on CPU
> 1-...!: (124998 ticks this GP) idle=376/141/0
> softirq=506054/506054 fqs=19
>  (t=125000 jiffies g=289415 c=289414 q=312)
> rcu_sched kthread starved for 124920 jiffies! g289415 c289414 f0x0
> RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=0
> rcu_sched   R  running task23456 8  2 0x8000
> Call Trace:
>  context_switch kernel/sched/core.c:2799 [inline]
>  __schedule+0x8eb/0x2060 kernel/sched/core.c:3375
>  schedule+0xf5/0x430 kernel/sched/core.c:3434
>  schedule_timeout+0x118/0x230 kernel/time/timer.c:1793
>  rcu_gp_kthread+0x9e5/0x1930 kernel/rcu/tree.c:2314
>  kthread+0x33c/0x400 kernel/kthread.c:238
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:541
> NMI backtrace for cpu 1
> CPU: 1 PID: 15893 Comm: syz-executor0 Not tainted 4.15.0-rc9+ #210
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:53
>  nmi_cpu_backtrace+0x1d2/0x210 lib/nmi_backtrace.c:103
>  nmi_trigger_cpumask_backtrace+0x122/0x180 lib/nmi_backtrace.c:62
>  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>  trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
>  rcu_dump_cpu_stacks+0x186/0x1d9 kernel/rcu/tree.c:1459
>  print_cpu_stall kernel/rcu/tree.c:1608 [inline]
>  check_cpu_stall.isra.61+0xbb8/0x15b0 kernel/rcu/tree.c:1676
>  __rcu_pending kernel/rcu/tree.c:3440 [inline]
>  rcu_pending kernel/rcu/tree.c:3502 [inline]
>  rcu_check_callbacks+0x256/0xd00 kernel/rcu/tree.c:2842
>  update_process_times+0x30/0x60 kernel/time/timer.c:1628
>  tick_sched_handle+0x85/0x160 kernel/time/tick-sched.c:162
>  tick_sched_timer+0x42/0x120 kernel/time/tick-sched.c:1194
>  __run_hrtimer kernel/time/hrtimer.c:1211 [inline]
>  __hrtimer_run_queues+0x358/0xe20 kernel/time/hrtimer.c:1275
>  hrtimer_interrupt+0x1c2/0x5e0 kernel/time/hrtimer.c:1309
>  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
>  smp_apic_timer_interrupt+0x14a/0x700 arch/x86/kernel/apic/apic.c:1050
>  apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:937
>  
> RIP: 0010:__read_once_size include/linux/compiler.h:183 [inline]
> RIP: 0010:__sanitizer_cov_trace_pc+0x3b/0x50 kernel/kcov.c:106
> RSP: 0018:8801a6867820 EFLAGS: 0246 ORIG_RAX: ff11
> RAX: 0001 RBX: 8801caf7d200 RCX: 84acc87d
> RDX:  RSI: c90002aa6000 RDI: 8801c54c4f50
> RBP: 8801a6867820 R08: 0001 R09: 
> R10:  R11:  R12: 8801c54c4e00
> R13: dc00 R14: ed00395efa5f R15: 8801caf7d2fc
>  xfrm_get_dst_nexthop net/xfrm/xfrm_policy.c:2732 [inline]
>  xfrm_confirm_neigh+0xad/0x270 net/xfrm/xfrm_policy.c:2759
>  dst_confirm_neigh include/net/dst.h:419 [inline]
>  raw_sendmsg+0xece/0x23b0 net/ipv4/raw.c:702
>  inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:764
>  sock_sendmsg_nosec net/socket.c:630 [inline]
>  sock_sendmsg+0xca/0x110 net/socket.c:640
>  SYSC_sendto+0x361/0x5c0 net/socket.c:1747
>  SyS_sendto+0x40/0x50 net/socket.c:1715
>  entry_SYSCALL_64_fastpath+0x29/0xa0
> RIP: 0033:0x452f19
> RSP: 002b:7f00a389ec58 EFLAGS: 0212 ORIG_RAX: 002c
> RAX: ffda RBX: 0071bf58 RCX: 00452f19
> RDX: 001e RSI: 20098000 RDI: 0013
> RBP: 0510 R08: 20cf9000 R09: 0010
> R10: fffe R11: 0212 R12: 006f6a20
> R13:  R14: 7f00a389f6d4 R15: 0001
>
>
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkal...@googlegroups.com.
>
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is
> merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: 

Re: INFO: rcu detected stall in xfrm_confirm_neigh

2018-02-12 Thread Dmitry Vyukov
On Mon, Feb 12, 2018 at 4:23 PM, syzbot
 wrote:
> Hello,
>
> syzbot hit the following crash on net-next commit
> 9515a2e082f91457db0ecff4b65371d0fb5d9aad (Thu Jan 25 03:37:38 2018 +)
> net/ipv4: Allow send to local broadcast from a socket bound to a VRF
>
> So far this crash happened 6 times on net-next.
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output is attached.
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached.


+xfrm maintainers


> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+7d03c810e50aaedef...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
>
> INFO: rcu_sched self-detected stall on CPU
> 1-...!: (124998 ticks this GP) idle=376/141/0
> softirq=506054/506054 fqs=19
>  (t=125000 jiffies g=289415 c=289414 q=312)
> rcu_sched kthread starved for 124920 jiffies! g289415 c289414 f0x0
> RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=0
> rcu_sched   R  running task23456 8  2 0x8000
> Call Trace:
>  context_switch kernel/sched/core.c:2799 [inline]
>  __schedule+0x8eb/0x2060 kernel/sched/core.c:3375
>  schedule+0xf5/0x430 kernel/sched/core.c:3434
>  schedule_timeout+0x118/0x230 kernel/time/timer.c:1793
>  rcu_gp_kthread+0x9e5/0x1930 kernel/rcu/tree.c:2314
>  kthread+0x33c/0x400 kernel/kthread.c:238
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:541
> NMI backtrace for cpu 1
> CPU: 1 PID: 15893 Comm: syz-executor0 Not tainted 4.15.0-rc9+ #210
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:53
>  nmi_cpu_backtrace+0x1d2/0x210 lib/nmi_backtrace.c:103
>  nmi_trigger_cpumask_backtrace+0x122/0x180 lib/nmi_backtrace.c:62
>  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>  trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
>  rcu_dump_cpu_stacks+0x186/0x1d9 kernel/rcu/tree.c:1459
>  print_cpu_stall kernel/rcu/tree.c:1608 [inline]
>  check_cpu_stall.isra.61+0xbb8/0x15b0 kernel/rcu/tree.c:1676
>  __rcu_pending kernel/rcu/tree.c:3440 [inline]
>  rcu_pending kernel/rcu/tree.c:3502 [inline]
>  rcu_check_callbacks+0x256/0xd00 kernel/rcu/tree.c:2842
>  update_process_times+0x30/0x60 kernel/time/timer.c:1628
>  tick_sched_handle+0x85/0x160 kernel/time/tick-sched.c:162
>  tick_sched_timer+0x42/0x120 kernel/time/tick-sched.c:1194
>  __run_hrtimer kernel/time/hrtimer.c:1211 [inline]
>  __hrtimer_run_queues+0x358/0xe20 kernel/time/hrtimer.c:1275
>  hrtimer_interrupt+0x1c2/0x5e0 kernel/time/hrtimer.c:1309
>  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
>  smp_apic_timer_interrupt+0x14a/0x700 arch/x86/kernel/apic/apic.c:1050
>  apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:937
>  
> RIP: 0010:__read_once_size include/linux/compiler.h:183 [inline]
> RIP: 0010:__sanitizer_cov_trace_pc+0x3b/0x50 kernel/kcov.c:106
> RSP: 0018:8801a6867820 EFLAGS: 0246 ORIG_RAX: ff11
> RAX: 0001 RBX: 8801caf7d200 RCX: 84acc87d
> RDX:  RSI: c90002aa6000 RDI: 8801c54c4f50
> RBP: 8801a6867820 R08: 0001 R09: 
> R10:  R11:  R12: 8801c54c4e00
> R13: dc00 R14: ed00395efa5f R15: 8801caf7d2fc
>  xfrm_get_dst_nexthop net/xfrm/xfrm_policy.c:2732 [inline]
>  xfrm_confirm_neigh+0xad/0x270 net/xfrm/xfrm_policy.c:2759
>  dst_confirm_neigh include/net/dst.h:419 [inline]
>  raw_sendmsg+0xece/0x23b0 net/ipv4/raw.c:702
>  inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:764
>  sock_sendmsg_nosec net/socket.c:630 [inline]
>  sock_sendmsg+0xca/0x110 net/socket.c:640
>  SYSC_sendto+0x361/0x5c0 net/socket.c:1747
>  SyS_sendto+0x40/0x50 net/socket.c:1715
>  entry_SYSCALL_64_fastpath+0x29/0xa0
> RIP: 0033:0x452f19
> RSP: 002b:7f00a389ec58 EFLAGS: 0212 ORIG_RAX: 002c
> RAX: ffda RBX: 0071bf58 RCX: 00452f19
> RDX: 001e RSI: 20098000 RDI: 0013
> RBP: 0510 R08: 20cf9000 R09: 0010
> R10: fffe R11: 0212 R12: 006f6a20
> R13:  R14: 7f00a389f6d4 R15: 0001
>
>
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkal...@googlegroups.com.
>
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is
> merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug