Re: INFO: rcu detected stall in corrupted (3)

2019-04-01 Thread Juri Lelli
Hi,

On 30/03/19 12:09, Borislav Petkov wrote:
> On Sat, Mar 30, 2019 at 07:57:50PM +0900, Tetsuo Handa wrote:
> > Yes. But what should such a threshold be? 0.1 second? 1 second? 10 seconds?
> > Can we find a threshold that everyone can agree on?
> 
> This is what we do all day on lkml: discussing changes so that (almost)
> everyone is happy with them.

This looks like the same problem we discussed a while ago [1], when we
couldn't reach an agreement on what's best to do about it.

I'll need to go back and refresh my memory.

Thanks,

- Juri

[1] https://lore.kernel.org/lkml/a4ee200578172...@google.com/


Re: INFO: rcu detected stall in corrupted (3)

2019-03-30 Thread Borislav Petkov
On Sat, Mar 30, 2019 at 11:07:40PM +0900, Tetsuo Handa wrote:
> I think that syzbot should for now refrain from testing syscalls that change
> scheduling related attributes,

And how would we know about problems there, otherwise?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: INFO: rcu detected stall in corrupted (3)

2019-03-30 Thread Tetsuo Handa
On 2019/03/30 20:09, Borislav Petkov wrote:
> On Sat, Mar 30, 2019 at 07:57:50PM +0900, Tetsuo Handa wrote:
>> Yes. But what should such a threshold be? 0.1 second? 1 second? 10 seconds?
>> Can we find a threshold that everyone can agree on?
> 
> This is what we do all day on lkml: discussing changes so that (almost)
> everyone is happy with them.
> 
> :-)
> 

I think that syzbot should for now refrain from testing syscalls that change
scheduling related attributes, because it is annoying to have stall reports
caused by a change of scheduling related attributes mixed with unrelated stall
reports caused by e.g. an (almost) infinite loop due to a race condition.


Re: INFO: rcu detected stall in corrupted (3)

2019-03-30 Thread Borislav Petkov
On Sat, Mar 30, 2019 at 07:57:50PM +0900, Tetsuo Handa wrote:
> Yes. But what should such a threshold be? 0.1 second? 1 second? 10 seconds?
> Can we find a threshold that everyone can agree on?

This is what we do all day on lkml: discussing changes so that (almost)
everyone is happy with them.

:-)

-- 
Regards/Gruss,
Boris.


Re: INFO: rcu detected stall in corrupted (3)

2019-03-30 Thread Tetsuo Handa
On 2019/03/30 19:45, Borislav Petkov wrote:
> On Sat, Mar 30, 2019 at 07:40:11PM +0900, Tetsuo Handa wrote:
>> But how can the scheduler be aware of various watchdogs' thresholds?
> 
> I think what tglx means is sched_setattr() should be fixed to fail due
> to the bogus value.
> 

Yes. But what should such a threshold be? 0.1 second? 1 second? 10 seconds?
Can we find a threshold that everyone can agree on?


Re: INFO: rcu detected stall in corrupted (3)

2019-03-30 Thread Borislav Petkov
On Sat, Mar 30, 2019 at 07:40:11PM +0900, Tetsuo Handa wrote:
> But how can the scheduler be aware of various watchdogs' thresholds?

I think what tglx means is sched_setattr() should be fixed to fail due
to the bogus value.

-- 
Regards/Gruss,
Boris.


Re: INFO: rcu detected stall in corrupted (3)

2019-03-30 Thread Tetsuo Handa
On 2019/03/30 16:46, Thomas Gleixner wrote:
> On Sat, 30 Mar 2019, Tetsuo Handa wrote:
>> This reproducer calls sched_setattr(SCHED_DEADLINE) with a bogus value, as
>> did a reproducer for "INFO: rcu detected stall in sys_sendfile64".
>>
>> sched_setattr(0, {size=0, sched_policy=0x6 /* SCHED_DEADLINE */,
>> sched_flags=0, sched_nice=0, sched_priority=0, sched_runtime=65535,
>> sched_deadline=4611686018427453437, sched_period=0}, 0) = 0
>>
>> #syz invalid
> 
> Marking this invalid is not really the right thing to do. Bogus deadline
> parameters should not cause RCU stalls. They either need to be rejected or
> handled gracefully.

But how can the scheduler be aware of various watchdogs' thresholds?

Should the scheduler behave differently based on a watchdog's remaining grace
period? That sounds quite strange. If an administrator tunes watchdog
thresholds in a way the scheduler can't survive (or vice versa), it must be the
administrator's fault.

Since this stall might occur with any combination, not closing this kind of
report will result in a flood of stall reports...


Re: INFO: rcu detected stall in corrupted (3)

2019-03-30 Thread Thomas Gleixner
On Sat, 30 Mar 2019, Tetsuo Handa wrote:

> On 2019/03/30 7:34, syzbot wrote:
> > Hello,
> > 
> > syzbot found the following crash on:
> > 
> > HEAD commit:    8c2ffd91 Linux 5.1-rc2
> > git tree:   upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=15099d2b20
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=8dcdce25ea72bedf
> > dashboard link: https://syzkaller.appspot.com/bug?extid=65cecdd27b726c261799
> > compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> > syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=17d3c67d20
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11d4f31720
> > 
> > Bisection is inconclusive: the bug happens on the oldest tested release.
> 
> This reproducer calls sched_setattr(SCHED_DEADLINE) with a bogus value, as
> did a reproducer for "INFO: rcu detected stall in sys_sendfile64".
> 
> sched_setattr(0, {size=0, sched_policy=0x6 /* SCHED_DEADLINE */,
> sched_flags=0, sched_nice=0, sched_priority=0, sched_runtime=65535,
> sched_deadline=4611686018427453437, sched_period=0}, 0) = 0
>
> #syz invalid

Marking this invalid is not really the right thing to do. Bogus deadline
parameters should not cause RCU stalls. They either need to be rejected or
handled gracefully.

Thanks,

tglx

Re: INFO: rcu detected stall in corrupted (3)

2019-03-29 Thread Tetsuo Handa
On 2019/03/30 7:34, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    8c2ffd91 Linux 5.1-rc2
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=15099d2b20
> kernel config:  https://syzkaller.appspot.com/x/.config?x=8dcdce25ea72bedf
> dashboard link: https://syzkaller.appspot.com/bug?extid=65cecdd27b726c261799
> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=17d3c67d20
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11d4f31720
> 
> Bisection is inconclusive: the bug happens on the oldest tested release.

This reproducer calls sched_setattr(SCHED_DEADLINE) with a bogus value, as
did a reproducer for "INFO: rcu detected stall in sys_sendfile64".

  sched_setattr(0, {size=0, sched_policy=0x6 /* SCHED_DEADLINE */, 
sched_flags=0, sched_nice=0, sched_priority=0, sched_runtime=65535, 
sched_deadline=4611686018427453437, sched_period=0}, 0) = 0

#syz invalid