subject:"INFO\: rcu detected stall in tc_modify

Re: 回复: INFO: rcu detected stall in tc_modify_qdisc

2020-07-30 Thread Vinicius Costa Gomes

Dmitry Vyukov  writes:

>> >
>> > Also are we talking about CAP_NET_ADMIN in a user ns as well
>> > (effectively nobody)?
>>
>> Just checked, we are talking about CAP_NET_ADMIN in user namespace as
>> well.
>
> OK, so this is not root/admin, this is just any user.

Yeah, will fix this.


Thanks,
-- 
Vinicius

Re: 回复: INFO: rcu detected stall in tc_modify_qdisc

2020-07-30 Thread Vinicius Costa Gomes

Hi Eric,

Eric Dumazet  writes:

>> I admit that I am on the fence on that argument: do not let even root
>> crash the system (the point that my code is crashing the system gives
>> weight to this side) vs. root has great powers, they need to know what
>> they are doing.
>> 
>> The argument that I used to convince myself was: root can easily create
>> a bunch of processes and give them the highest priority and do
>> effectively the same thing as this issue, so I went with a the "they
>> need to know what they are doing side".
>> 
>> A bit more on the specifics here:
>> 
>>   - Using a small interval size, is only a limitation of the taprio
>>   software mode, when using hardware offloads (which I think most users
>>   do), any interval size (supported by the hardware) can be used;
>> 
>>   - Choosing a good lower limit for this seems kind of hard: something
>>   below 1us would never work well, I think, but things 1us < x < 100us
>>   will depend on the hardware/kernel config/system load, and this is the
>>   range includes "useful" values for many systems.
>> 
>> Perhaps a middle ground would be to impose a limit based on the link
>> speed, the interval can never be smaller than the time it takes to send
>> the minimum ethernet frame (for 1G links this would be ~480ns, should be
>> enough to catch most programming mistakes). I am going to add this and
>> see how it looks like.
>> 
>> Sorry for the brain dump :-)
>
>
> I do not know taprio details, but do you really need a periodic timer
> ?

As we can control the transmission time of packets, you are right, I
don't.

Just a bit more detail about the current implementation taprio,
basically it has a sequence of { Traffic Classes that are open; Interval
} that repeats cyclicly, it uses an hrtimer to advance the pointer for
the current element, so during dequeue I can check if a traffic class is
"open" or "closed".

But again, if I calculate the 'skb->tstamp' of each packet during
enqueue, I don't need the hrtimer. What we have in the txtime-assisted
mode is half way there.

I think this is what you had in mind.


Cheers,
-- 
Vinicius

Re: 回复: INFO: rcu detected stall in tc_modify_qdisc

2020-07-30 Thread Dmitry Vyukov

On Thu, Jul 30, 2020 at 7:44 PM Vinicius Costa Gomes
 wrote:
>
> Hi,
>
> Dmitry Vyukov  writes:
>
> > On Wed, Jul 29, 2020 at 9:13 PM Vinicius Costa Gomes
> >  wrote:
> >>
> >> Hi,
> >>
> >> "Zhang, Qiang"  writes:
> >>
> >> > 
> >> > 发件人: linux-kernel-ow...@vger.kernel.org 
> >> >  代表 syzbot 
> >> > 
> >> > 发送时间: 2020年7月29日 13:53
> >> > 收件人: da...@davemloft.net; fweis...@gmail.com; j...@mojatatu.com; 
> >> > j...@resnulli.us; linux-kernel@vger.kernel.org; mi...@kernel.org; 
> >> > net...@vger.kernel.org; syzkaller-b...@googlegroups.com; 
> >> > t...@linutronix.de; vinicius.go...@intel.com; xiyou.wangc...@gmail.com
> >> > 主题: INFO: rcu detected stall in tc_modify_qdisc
> >> >
> >> > Hello,
> >> >
> >> > syzbot found the following issue on:
> >> >
> >> > HEAD commit:181964e6 fix a braino in 
> >> > cmsghdr_from_user_compat_to_kern()
> >> > git tree:   net
> >> > console output: https://syzkaller.appspot.com/x/log.txt?x=12925e3890
> >> > kernel config:  
> >> > https://syzkaller.appspot.com/x/.config?x=f87a5e4232fdb267
> >> > dashboard link: 
> >> > https://syzkaller.appspot.com/bug?extid=9f78d5c664a8c33f4cce
> >> > compiler:   gcc (GCC) 10.1.0-syz 20200507
> >> > syz repro:
> >> > https://syzkaller.appspot.com/x/repro.syz?x=16587f8c90
> >>
> >> It seems that syzkaller is generating an schedule with too small
> >> intervals (3ns in this case) which causes a hrtimer busy-loop which
> >> starves other kernel threads.
> >>
> >> We could put some limits on the interval when running in software mode,
> >> but I don't like this too much, because we are talking about users with
> >> CAP_NET_ADMIN and they have easier ways to do bad things to the system.
> >
> > Hi Vinicius,
> >
> > Could you explain why you don't like the argument if it's for CAP_NET_ADMIN?
> > Good code should check arguments regardless I think and it's useful to
> > protect root from, say, programming bugs rather than kill the machine
> > on any bug and misconfiguration. What am I missing?
>
> I admit that I am on the fence on that argument: do not let even root
> crash the system (the point that my code is crashing the system gives
> weight to this side) vs. root has great powers, they need to know what
> they are doing.
>
> The argument that I used to convince myself was: root can easily create
> a bunch of processes and give them the highest priority and do
> effectively the same thing as this issue, so I went with a the "they
> need to know what they are doing side".
>
> A bit more on the specifics here:
>
>   - Using a small interval size, is only a limitation of the taprio
>   software mode, when using hardware offloads (which I think most users
>   do), any interval size (supported by the hardware) can be used;
>
>   - Choosing a good lower limit for this seems kind of hard: something
>   below 1us would never work well, I think, but things 1us < x < 100us
>   will depend on the hardware/kernel config/system load, and this is the
>   range includes "useful" values for many systems.
>
> Perhaps a middle ground would be to impose a limit based on the link
> speed, the interval can never be smaller than the time it takes to send
> the minimum ethernet frame (for 1G links this would be ~480ns, should be
> enough to catch most programming mistakes). I am going to add this and
> see how it looks like.
>
> Sorry for the brain dump :-)
>
> >
> > Also are we talking about CAP_NET_ADMIN in a user ns as well
> > (effectively nobody)?
>
> Just checked, we are talking about CAP_NET_ADMIN in user namespace as
> well.

OK, so this is not root/admin, this is just any user.

Re: 回复: INFO: rcu detected stall in tc_modify_qdisc

2020-07-30 Thread Eric Dumazet




On 7/30/20 10:44 AM, Vinicius Costa Gomes wrote:
> Hi,
> 
> Dmitry Vyukov  writes:
> 
>> On Wed, Jul 29, 2020 at 9:13 PM Vinicius Costa Gomes
>>  wrote:
>>>
>>> Hi,
>>>
>>> "Zhang, Qiang"  writes:
>>>
>>>> 
>>>> 发件人: linux-kernel-ow...@vger.kernel.org 
>>>>  代表 syzbot 
>>>> 
>>>> 发送时间: 2020年7月29日 13:53
>>>> 收件人: da...@davemloft.net; fweis...@gmail.com; j...@mojatatu.com; 
>>>> j...@resnulli.us; linux-kernel@vger.kernel.org; mi...@kernel.org; 
>>>> net...@vger.kernel.org; syzkaller-b...@googlegroups.com; 
>>>> t...@linutronix.de; vinicius.go...@intel.com; xiyou.wangc...@gmail.com
>>>> 主题: INFO: rcu detected stall in tc_modify_qdisc
>>>>
>>>> Hello,
>>>>
>>>> syzbot found the following issue on:
>>>>
>>>> HEAD commit:181964e6 fix a braino in cmsghdr_from_user_compat_to_kern()
>>>> git tree:   net
>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=12925e3890
>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=f87a5e4232fdb267
>>>> dashboard link: 
>>>> https://syzkaller.appspot.com/bug?extid=9f78d5c664a8c33f4cce
>>>> compiler:   gcc (GCC) 10.1.0-syz 20200507
>>>> syz repro:
>>>> https://syzkaller.appspot.com/x/repro.syz?x=16587f8c90
>>>
>>> It seems that syzkaller is generating an schedule with too small
>>> intervals (3ns in this case) which causes a hrtimer busy-loop which
>>> starves other kernel threads.
>>>
>>> We could put some limits on the interval when running in software mode,
>>> but I don't like this too much, because we are talking about users with
>>> CAP_NET_ADMIN and they have easier ways to do bad things to the system.
>>
>> Hi Vinicius,
>>
>> Could you explain why you don't like the argument if it's for CAP_NET_ADMIN?
>> Good code should check arguments regardless I think and it's useful to
>> protect root from, say, programming bugs rather than kill the machine
>> on any bug and misconfiguration. What am I missing?
> 
> I admit that I am on the fence on that argument: do not let even root
> crash the system (the point that my code is crashing the system gives
> weight to this side) vs. root has great powers, they need to know what
> they are doing.
> 
> The argument that I used to convince myself was: root can easily create
> a bunch of processes and give them the highest priority and do
> effectively the same thing as this issue, so I went with a the "they
> need to know what they are doing side".
> 
> A bit more on the specifics here:
> 
>   - Using a small interval size, is only a limitation of the taprio
>   software mode, when using hardware offloads (which I think most users
>   do), any interval size (supported by the hardware) can be used;
> 
>   - Choosing a good lower limit for this seems kind of hard: something
>   below 1us would never work well, I think, but things 1us < x < 100us
>   will depend on the hardware/kernel config/system load, and this is the
>   range includes "useful" values for many systems.
> 
> Perhaps a middle ground would be to impose a limit based on the link
> speed, the interval can never be smaller than the time it takes to send
> the minimum ethernet frame (for 1G links this would be ~480ns, should be
> enough to catch most programming mistakes). I am going to add this and
> see how it looks like.
> 
> Sorry for the brain dump :-)


I do not know taprio details, but do you really need a periodic timer ?

Presumably there is no need to fire a timer before next packet departure time ?

Re: 回复: INFO: rcu detected stall in tc_modify_qdisc

2020-07-30 Thread Vinicius Costa Gomes

Hi,

Dmitry Vyukov  writes:

> On Wed, Jul 29, 2020 at 9:13 PM Vinicius Costa Gomes
>  wrote:
>>
>> Hi,
>>
>> "Zhang, Qiang"  writes:
>>
>> > 
>> > 发件人: linux-kernel-ow...@vger.kernel.org 
>> >  代表 syzbot 
>> > 
>> > 发送时间: 2020年7月29日 13:53
>> > 收件人: da...@davemloft.net; fweis...@gmail.com; j...@mojatatu.com; 
>> > j...@resnulli.us; linux-kernel@vger.kernel.org; mi...@kernel.org; 
>> > net...@vger.kernel.org; syzkaller-b...@googlegroups.com; 
>> > t...@linutronix.de; vinicius.go...@intel.com; xiyou.wangc...@gmail.com
>> > 主题: INFO: rcu detected stall in tc_modify_qdisc
>> >
>> > Hello,
>> >
>> > syzbot found the following issue on:
>> >
>> > HEAD commit:181964e6 fix a braino in cmsghdr_from_user_compat_to_kern()
>> > git tree:   net
>> > console output: https://syzkaller.appspot.com/x/log.txt?x=12925e3890
>> > kernel config:  https://syzkaller.appspot.com/x/.config?x=f87a5e4232fdb267
>> > dashboard link: 
>> > https://syzkaller.appspot.com/bug?extid=9f78d5c664a8c33f4cce
>> > compiler:   gcc (GCC) 10.1.0-syz 20200507
>> > syz repro:
>> > https://syzkaller.appspot.com/x/repro.syz?x=16587f8c90
>>
>> It seems that syzkaller is generating an schedule with too small
>> intervals (3ns in this case) which causes a hrtimer busy-loop which
>> starves other kernel threads.
>>
>> We could put some limits on the interval when running in software mode,
>> but I don't like this too much, because we are talking about users with
>> CAP_NET_ADMIN and they have easier ways to do bad things to the system.
>
> Hi Vinicius,
>
> Could you explain why you don't like the argument if it's for CAP_NET_ADMIN?
> Good code should check arguments regardless I think and it's useful to
> protect root from, say, programming bugs rather than kill the machine
> on any bug and misconfiguration. What am I missing?

I admit that I am on the fence on that argument: do not let even root
crash the system (the point that my code is crashing the system gives
weight to this side) vs. root has great powers, they need to know what
they are doing.

The argument that I used to convince myself was: root can easily create
a bunch of processes and give them the highest priority and do
effectively the same thing as this issue, so I went with a the "they
need to know what they are doing side".

A bit more on the specifics here:

  - Using a small interval size, is only a limitation of the taprio
  software mode, when using hardware offloads (which I think most users
  do), any interval size (supported by the hardware) can be used;

  - Choosing a good lower limit for this seems kind of hard: something
  below 1us would never work well, I think, but things 1us < x < 100us
  will depend on the hardware/kernel config/system load, and this is the
  range includes "useful" values for many systems.

Perhaps a middle ground would be to impose a limit based on the link
speed, the interval can never be smaller than the time it takes to send
the minimum ethernet frame (for 1G links this would be ~480ns, should be
enough to catch most programming mistakes). I am going to add this and
see how it looks like.

Sorry for the brain dump :-)

>
> Also are we talking about CAP_NET_ADMIN in a user ns as well
> (effectively nobody)?

Just checked, we are talking about CAP_NET_ADMIN in user namespace as
well.


Cheers,
-- 
Vinicius

Re: 回复: INFO: rcu detected stall in tc_modify_qdisc

2020-07-29 Thread Dmitry Vyukov

On Wed, Jul 29, 2020 at 9:13 PM Vinicius Costa Gomes
 wrote:
>
> Hi,
>
> "Zhang, Qiang"  writes:
>
> > 
> > 发件人: linux-kernel-ow...@vger.kernel.org 
> >  代表 syzbot 
> > 
> > 发送时间: 2020年7月29日 13:53
> > 收件人: da...@davemloft.net; fweis...@gmail.com; j...@mojatatu.com; 
> > j...@resnulli.us; linux-kernel@vger.kernel.org; mi...@kernel.org; 
> > net...@vger.kernel.org; syzkaller-b...@googlegroups.com; 
> > t...@linutronix.de; vinicius.go...@intel.com; xiyou.wangc...@gmail.com
> > 主题: INFO: rcu detected stall in tc_modify_qdisc
> >
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:181964e6 fix a braino in cmsghdr_from_user_compat_to_kern()
> > git tree:   net
> > console output: https://syzkaller.appspot.com/x/log.txt?x=12925e3890
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=f87a5e4232fdb267
> > dashboard link: https://syzkaller.appspot.com/bug?extid=9f78d5c664a8c33f4cce
> > compiler:   gcc (GCC) 10.1.0-syz 20200507
> > syz repro:
> > https://syzkaller.appspot.com/x/repro.syz?x=16587f8c90
>
> It seems that syzkaller is generating an schedule with too small
> intervals (3ns in this case) which causes a hrtimer busy-loop which
> starves other kernel threads.
>
> We could put some limits on the interval when running in software mode,
> but I don't like this too much, because we are talking about users with
> CAP_NET_ADMIN and they have easier ways to do bad things to the system.

Hi Vinicius,

Could you explain why you don't like the argument if it's for CAP_NET_ADMIN?
Good code should check arguments regardless I think and it's useful to
protect root from, say, programming bugs rather than kill the machine
on any bug and misconfiguration. What am I missing?

Also are we talking about CAP_NET_ADMIN in a user ns as well
(effectively nobody)?

Re: 回复: INFO: rcu detected stall in tc_modify_qdisc

2020-07-29 Thread Vinicius Costa Gomes

Hi,

"Zhang, Qiang"  writes:

> 
> 发件人: linux-kernel-ow...@vger.kernel.org  
> 代表 syzbot 
> 发送时间: 2020年7月29日 13:53
> 收件人: da...@davemloft.net; fweis...@gmail.com; j...@mojatatu.com; 
> j...@resnulli.us; linux-kernel@vger.kernel.org; mi...@kernel.org; 
> net...@vger.kernel.org; syzkaller-b...@googlegroups.com; t...@linutronix.de; 
> vinicius.go...@intel.com; xiyou.wangc...@gmail.com
> 主题: INFO: rcu detected stall in tc_modify_qdisc
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:181964e6 fix a braino in cmsghdr_from_user_compat_to_kern()
> git tree:   net
> console output: https://syzkaller.appspot.com/x/log.txt?x=12925e3890
> kernel config:  https://syzkaller.appspot.com/x/.config?x=f87a5e4232fdb267
> dashboard link: https://syzkaller.appspot.com/bug?extid=9f78d5c664a8c33f4cce
> compiler:   gcc (GCC) 10.1.0-syz 20200507
> syz repro:
> https://syzkaller.appspot.com/x/repro.syz?x=16587f8c90

It seems that syzkaller is generating an schedule with too small
intervals (3ns in this case) which causes a hrtimer busy-loop which
starves other kernel threads.

We could put some limits on the interval when running in software mode,
but I don't like this too much, because we are talking about users with
CAP_NET_ADMIN and they have easier ways to do bad things to the system.


Cheers,
-- 
Vinicius

回复: INFO: rcu detected stall in tc_modify_qdisc

2020-07-29 Thread Zhang, Qiang




发件人: linux-kernel-ow...@vger.kernel.org  代表 
syzbot 
发送时间: 2020年7月29日 13:53
收件人: da...@davemloft.net; fweis...@gmail.com; j...@mojatatu.com; 
j...@resnulli.us; linux-kernel@vger.kernel.org; mi...@kernel.org; 
net...@vger.kernel.org; syzkaller-b...@googlegroups.com; t...@linutronix.de; 
vinicius.go...@intel.com; xiyou.wangc...@gmail.com
主题: INFO: rcu detected stall in tc_modify_qdisc

Hello,

syzbot found the following issue on:

HEAD commit:181964e6 fix a braino in cmsghdr_from_user_compat_to_kern()
git tree:   net
console output: https://syzkaller.appspot.com/x/log.txt?x=12925e3890
kernel config:  https://syzkaller.appspot.com/x/.config?x=f87a5e4232fdb267
dashboard link: https://syzkaller.appspot.com/bug?extid=9f78d5c664a8c33f4cce
compiler:   gcc (GCC) 10.1.0-syz 20200507
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=16587f8c90
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15b2d79090

The issue was bisected to:

commit 5a781ccbd19e4664babcbe4b4ead7aa2b9283d22
Author: Vinicius Costa Gomes 
Date:   Sat Sep 29 00:59:43 2018 +

tc: Add support for configuring the taprio scheduler

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=160e1bac90
console output: https://syzkaller.appspot.com/x/log.txt?x=110e1bac90

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+9f78d5c664a8c33f4...@syzkaller.appspotmail.com
Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu:1-...!: (1 GPs behind) idle=6f6/1/0x4000 
softirq=10195/10196 fqs=1
(t=27930 jiffies g=9233 q=413)
rcu: rcu_preempt kthread starved for 27901 jiffies! g9233 f0x0 
RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu:Unless rcu_preempt kthread gets sufficient CPU time, OOM is now 
expected behavior.
rcu: RCU grace-period kthread stack dump:
rcu_preempt R  running task2911210  2 0x4000
Call Trace:
 context_switch kernel/sched/core.c:3458 [inline]
 __schedule+0x8ea/0x2210 kernel/sched/core.c:4219
 schedule+0xd0/0x2a0 kernel/sched/core.c:4294
 schedule_timeout+0x148/0x250 kernel/time/timer.c:1908
 rcu_gp_fqs_loop kernel/rcu/tree.c:1874 [inline]
 rcu_gp_kthread+0xae5/0x1b50 kernel/rcu/tree.c:2044
 kthread+0x3b5/0x4a0 kernel/kthread.c:291
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293
NMI backtrace for cpu 1
CPU: 1 PID: 6799 Comm: syz-executor494 Not tainted 5.8.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Call Trace:
 
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x18f/0x20d lib/dump_stack.c:118
 nmi_cpu_backtrace.cold+0x70/0xb1 lib/nmi_backtrace.c:101
 nmi_trigger_cpumask_backtrace+0x1b3/0x223 lib/nmi_backtrace.c:62
 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
 rcu_dump_cpu_stacks+0x194/0x1cf kernel/rcu/tree_stall.h:320
 print_cpu_stall kernel/rcu/tree_stall.h:553 [inline]
 check_cpu_stall kernel/rcu/tree_stall.h:627 [inline]
 rcu_pending kernel/rcu/tree.c:3489 [inline]
 rcu_sched_clock_irq.cold+0x5b3/0xccc kernel/rcu/tree.c:2504
 update_process_times+0x25/0x60 kernel/time/timer.c:1737
 tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:176
 tick_sched_timer+0x108/0x290 kernel/time/tick-sched.c:1320
 __run_hrtimer kernel/time/hrtimer.c:1520 [inline]
 __hrtimer_run_queues+0x1d5/0xfc0 kernel/time/hrtimer.c:1584
 hrtimer_interrupt+0x32a/0x930 kernel/time/hrtimer.c:1646
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1080 [inline]
 __sysvec_apic_timer_interrupt+0x142/0x5e0 arch/x86/kernel/apic/apic.c:1097
 asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:711
 
 __run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline]
 run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline]
 sysvec_apic_timer_interrupt+0xe0/0x120 arch/x86/kernel/apic/apic.c:1091
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:585
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:770 [inline]
RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 
[inline]
RIP: 0010:_raw_spin_unlock_irqrestore+0x8c/0xe0 kernel/locking/spinlock.c:191
Code: 48 c7 c0 88 e0 b4 89 48 ba 00 00 00 00 00 fc ff df 48 c1 e8 03 80 3c 10 
00 75 37 48 83 3d e3 52 cc 01 00 74 22 48 89 df 57 9d <0f> 1f 44 00 00 bf 01 00 
00 00 e8 35 e5 66 f9 65 8b 05 fe 70 19 78
RSP: 0018:c900016672c0 EFLAGS: 0282
RAX: 11369c11 RBX: 0282 RCX: 0002
RDX: dc00 RSI:  RDI: 0282
RBP: 888093a052e8 R08:  R09: 
R10: 0001 R11:  R12: 0282
R13: 0078100c35c3 R14: 888093a05000 R15: 
 spin_unlock_irqrestore include/linux/spinlock.h:408 [inline]
 taprio_change+0x1fdc/0x2960 net/sched/sch_tap

INFO: rcu detected stall in tc_modify_qdisc

2020-07-28 Thread syzbot

Hello,

syzbot found the following issue on:

HEAD commit:181964e6 fix a braino in cmsghdr_from_user_compat_to_kern()
git tree:   net
console output: https://syzkaller.appspot.com/x/log.txt?x=12925e3890
kernel config:  https://syzkaller.appspot.com/x/.config?x=f87a5e4232fdb267
dashboard link: https://syzkaller.appspot.com/bug?extid=9f78d5c664a8c33f4cce
compiler:   gcc (GCC) 10.1.0-syz 20200507
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=16587f8c90
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15b2d79090

The issue was bisected to:

commit 5a781ccbd19e4664babcbe4b4ead7aa2b9283d22
Author: Vinicius Costa Gomes 
Date:   Sat Sep 29 00:59:43 2018 +

tc: Add support for configuring the taprio scheduler

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=160e1bac90
console output: https://syzkaller.appspot.com/x/log.txt?x=110e1bac90

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+9f78d5c664a8c33f4...@syzkaller.appspotmail.com
Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu:1-...!: (1 GPs behind) idle=6f6/1/0x4000 
softirq=10195/10196 fqs=1 
(t=27930 jiffies g=9233 q=413)
rcu: rcu_preempt kthread starved for 27901 jiffies! g9233 f0x0 
RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu:Unless rcu_preempt kthread gets sufficient CPU time, OOM is now 
expected behavior.
rcu: RCU grace-period kthread stack dump:
rcu_preempt R  running task2911210  2 0x4000
Call Trace:
 context_switch kernel/sched/core.c:3458 [inline]
 __schedule+0x8ea/0x2210 kernel/sched/core.c:4219
 schedule+0xd0/0x2a0 kernel/sched/core.c:4294
 schedule_timeout+0x148/0x250 kernel/time/timer.c:1908
 rcu_gp_fqs_loop kernel/rcu/tree.c:1874 [inline]
 rcu_gp_kthread+0xae5/0x1b50 kernel/rcu/tree.c:2044
 kthread+0x3b5/0x4a0 kernel/kthread.c:291
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293
NMI backtrace for cpu 1
CPU: 1 PID: 6799 Comm: syz-executor494 Not tainted 5.8.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Call Trace:
 
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x18f/0x20d lib/dump_stack.c:118
 nmi_cpu_backtrace.cold+0x70/0xb1 lib/nmi_backtrace.c:101
 nmi_trigger_cpumask_backtrace+0x1b3/0x223 lib/nmi_backtrace.c:62
 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
 rcu_dump_cpu_stacks+0x194/0x1cf kernel/rcu/tree_stall.h:320
 print_cpu_stall kernel/rcu/tree_stall.h:553 [inline]
 check_cpu_stall kernel/rcu/tree_stall.h:627 [inline]
 rcu_pending kernel/rcu/tree.c:3489 [inline]
 rcu_sched_clock_irq.cold+0x5b3/0xccc kernel/rcu/tree.c:2504
 update_process_times+0x25/0x60 kernel/time/timer.c:1737
 tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:176
 tick_sched_timer+0x108/0x290 kernel/time/tick-sched.c:1320
 __run_hrtimer kernel/time/hrtimer.c:1520 [inline]
 __hrtimer_run_queues+0x1d5/0xfc0 kernel/time/hrtimer.c:1584
 hrtimer_interrupt+0x32a/0x930 kernel/time/hrtimer.c:1646
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1080 [inline]
 __sysvec_apic_timer_interrupt+0x142/0x5e0 arch/x86/kernel/apic/apic.c:1097
 asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:711
 
 __run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline]
 run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline]
 sysvec_apic_timer_interrupt+0xe0/0x120 arch/x86/kernel/apic/apic.c:1091
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:585
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:770 [inline]
RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 
[inline]
RIP: 0010:_raw_spin_unlock_irqrestore+0x8c/0xe0 kernel/locking/spinlock.c:191
Code: 48 c7 c0 88 e0 b4 89 48 ba 00 00 00 00 00 fc ff df 48 c1 e8 03 80 3c 10 
00 75 37 48 83 3d e3 52 cc 01 00 74 22 48 89 df 57 9d <0f> 1f 44 00 00 bf 01 00 
00 00 e8 35 e5 66 f9 65 8b 05 fe 70 19 78
RSP: 0018:c900016672c0 EFLAGS: 0282
RAX: 11369c11 RBX: 0282 RCX: 0002
RDX: dc00 RSI:  RDI: 0282
RBP: 888093a052e8 R08:  R09: 
R10: 0001 R11:  R12: 0282
R13: 0078100c35c3 R14: 888093a05000 R15: 
 spin_unlock_irqrestore include/linux/spinlock.h:408 [inline]
 taprio_change+0x1fdc/0x2960 net/sched/sch_taprio.c:1557
 taprio_init+0x52e/0x670 net/sched/sch_taprio.c:1670
 qdisc_create+0x4b6/0x12e0 net/sched/sch_api.c:1246
 tc_modify_qdisc+0x4c8/0x1990 net/sched/sch_api.c:1662
 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5461
 netlink_rcv_skb+0x15a/0x430 net/netlink/af_netlink.c:2469
 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
 netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1329
 netlink_sendmsg+0x856/0xd90

Re: 回复: INFO: rcu detected stall in tc_modify_qdisc

Re: 回复: INFO: rcu detected stall in tc_modify_qdisc

Re: 回复: INFO: rcu detected stall in tc_modify_qdisc

Re: 回复: INFO: rcu detected stall in tc_modify_qdisc

Re: 回复: INFO: rcu detected stall in tc_modify_qdisc

Re: 回复: INFO: rcu detected stall in tc_modify_qdisc

Re: 回复: INFO: rcu detected stall in tc_modify_qdisc

回复: INFO: rcu detected stall in tc_modify_qdisc

INFO: rcu detected stall in tc_modify_qdisc

9 matches

Site Navigation

Mail list logo

Footer information