Re: INFO: task hung in pipe_write (2)

2019-10-14 Thread Darrick J. Wong
On Mon, Oct 14, 2019 at 10:40:44PM +0200, Andreas Gruenbacher wrote:
> Hi Darrick,
> 
> On Thu, Sep 19, 2019 at 11:10 PM Darrick J. Wong
>  wrote:
> > On Thu, Sep 19, 2019 at 10:55:44PM +0200, Rasmus Villemoes wrote:
> > > On 19/09/2019 19.19, syzbot wrote:
> > > > Hello,
> > > >
> > > > syzbot found the following crash on:
> > > >
> > > > HEAD commit:288b9117 Add linux-next specific files for 20190918
> > > > git tree:   linux-next
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=17e8664560
> > > > kernel config:  
> > > > https://syzkaller.appspot.com/x/.config?x=f6126e51304ef1c3
> > > > dashboard link:
> > > > https://syzkaller.appspot.com/bug?extid=3c01db6025f26530cf8d
> > > > compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> > > > syz repro:  
> > > > https://syzkaller.appspot.com/x/repro.syz?x=1185576960
> > > > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=143580a160
> > > >
> > > > The bug was bisected to:
> > > >
> > > > commit cfb864757d8690631aadf1c4b80022c18ae865b3
> > > > Author: Darrick J. Wong 
> > > > Date:   Tue Sep 17 16:05:22 2019 +
> > > >
> > > > splice: only read in as much information as there is pipe buffer 
> > > > space
> > >
> > > The middle hunk (the one before splice_pipe_to_pipe()) accesses
> > > opipe->{buffers, nrbufs}, but opipe is not locked at that point. So
> > > maybe we end up passing len==0, which seems (once there's room in opipe)
> > > it would put a zero-length pipe_buffer in opipe - and that probably
> > > violates an invariant somewhere.
> > >
> > > But does the splice_pipe_to_pipe() case even need that extra logic?
> > > Doesn't it handle short writes correctly already?
> >
> > Yep.  I missed the part where splice_pipe_to_pipe is already perfectly
> > capable of detecting insufficient space in opipe and kicking opipe's
> > readers to clear out the buffer.  So that hunk isn't needed, and now I'm
> > wondering how in the other clause we return 0 from wait_for_space yet
> > still don't have buffer space...
> >
> > Oh well, back to the drawing board.  Good catch, though now it's become
> > painfully clear that xfstests lacks rigorous testing of splice()...
> 
> have you had any luck figuring out how to fix this? We're still
> suffering from the regression I've reported a while ago (*).

Nope, that's slipped along with everything else because I'm burning out
on all the buggy sh*t that has gone in the kernel for 5.4 that has made
it difficult to get regression tests to run reliably to find out if
there's anything wrong with XFS.

Oh, sure, if I turn off kmemleak and whatever the hell "haltpoll" is
then it tidies up enough to run fstests but now "sleep 0.5" runs in
anywhere between a jiffie and 10s.  WTH.

> If not, I wonder if reverting commit 8f67b5adc030 would make sense for now.

Ugh, no, splice shouldn't be asking the filesystem for a 75k buffered
read and then *oopsie* running out of pages after ~64k or so.  Even more
frighteningly the syzbot reproducer asks for a 20GB read into a pipe
which gets sent right into the fs without any size clamping.

Ok I'll at least cough up a v3 patch which maybe will work.

--D

> 
> * 
> https://lore.kernel.org/linux-fsdevel/CAHpGcM+WQYFHOOC8SzKq+=duhvz4fw4rhltmudn-o6gx3yt...@mail.gmail.com/T/#u
> 
> Thanks,
> Andreas


Re: INFO: task hung in pipe_write (2)

2019-10-14 Thread Andreas Gruenbacher
Hi Darrick,

On Thu, Sep 19, 2019 at 11:10 PM Darrick J. Wong
 wrote:
> On Thu, Sep 19, 2019 at 10:55:44PM +0200, Rasmus Villemoes wrote:
> > On 19/09/2019 19.19, syzbot wrote:
> > > Hello,
> > >
> > > syzbot found the following crash on:
> > >
> > > HEAD commit:288b9117 Add linux-next specific files for 20190918
> > > git tree:   linux-next
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=17e8664560
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=f6126e51304ef1c3
> > > dashboard link:
> > > https://syzkaller.appspot.com/bug?extid=3c01db6025f26530cf8d
> > > compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> > > syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=1185576960
> > > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=143580a160
> > >
> > > The bug was bisected to:
> > >
> > > commit cfb864757d8690631aadf1c4b80022c18ae865b3
> > > Author: Darrick J. Wong 
> > > Date:   Tue Sep 17 16:05:22 2019 +
> > >
> > > splice: only read in as much information as there is pipe buffer space
> >
> > The middle hunk (the one before splice_pipe_to_pipe()) accesses
> > opipe->{buffers, nrbufs}, but opipe is not locked at that point. So
> > maybe we end up passing len==0, which seems (once there's room in opipe)
> > it would put a zero-length pipe_buffer in opipe - and that probably
> > violates an invariant somewhere.
> >
> > But does the splice_pipe_to_pipe() case even need that extra logic?
> > Doesn't it handle short writes correctly already?
>
> Yep.  I missed the part where splice_pipe_to_pipe is already perfectly
> capable of detecting insufficient space in opipe and kicking opipe's
> readers to clear out the buffer.  So that hunk isn't needed, and now I'm
> wondering how in the other clause we return 0 from wait_for_space yet
> still don't have buffer space...
>
> Oh well, back to the drawing board.  Good catch, though now it's become
> painfully clear that xfstests lacks rigorous testing of splice()...

have you had any luck figuring out how to fix this? We're still
suffering from the regression I've reported a while ago (*).

If not, I wonder if reverting commit 8f67b5adc030 would make sense for now.

* 
https://lore.kernel.org/linux-fsdevel/CAHpGcM+WQYFHOOC8SzKq+=duhvz4fw4rhltmudn-o6gx3yt...@mail.gmail.com/T/#u

Thanks,
Andreas


Re: INFO: task hung in pipe_write (2)

2019-09-19 Thread Darrick J. Wong
On Thu, Sep 19, 2019 at 10:55:44PM +0200, Rasmus Villemoes wrote:
> On 19/09/2019 19.19, syzbot wrote:
> > Hello,
> > 
> > syzbot found the following crash on:
> > 
> > HEAD commit:    288b9117 Add linux-next specific files for 20190918
> > git tree:   linux-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=17e8664560
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=f6126e51304ef1c3
> > dashboard link:
> > https://syzkaller.appspot.com/bug?extid=3c01db6025f26530cf8d
> > compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> > syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=1185576960
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=143580a160
> > 
> > The bug was bisected to:
> > 
> > commit cfb864757d8690631aadf1c4b80022c18ae865b3
> > Author: Darrick J. Wong 
> > Date:   Tue Sep 17 16:05:22 2019 +
> > 
> >     splice: only read in as much information as there is pipe buffer space
> 
> The middle hunk (the one before splice_pipe_to_pipe()) accesses
> opipe->{buffers, nrbufs}, but opipe is not locked at that point. So
> maybe we end up passing len==0, which seems (once there's room in opipe)
> it would put a zero-length pipe_buffer in opipe - and that probably
> violates an invariant somewhere.
>
> But does the splice_pipe_to_pipe() case even need that extra logic?
> Doesn't it handle short writes correctly already?

Yep.  I missed the part where splice_pipe_to_pipe is already perfectly
capable of detecting insufficient space in opipe and kicking opipe's
readers to clear out the buffer.  So that hunk isn't needed, and now I'm
wondering how in the other clause we return 0 from wait_for_space yet
still don't have buffer space...

Oh well, back to the drawing board.  Good catch, though now it's become
painfully clear that xfstests lacks rigorous testing of splice()...

--D

> Rasmus


Re: INFO: task hung in pipe_write (2)

2019-09-19 Thread Rasmus Villemoes
On 19/09/2019 19.19, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    288b9117 Add linux-next specific files for 20190918
> git tree:   linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=17e8664560
> kernel config:  https://syzkaller.appspot.com/x/.config?x=f6126e51304ef1c3
> dashboard link:
> https://syzkaller.appspot.com/bug?extid=3c01db6025f26530cf8d
> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=1185576960
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=143580a160
> 
> The bug was bisected to:
> 
> commit cfb864757d8690631aadf1c4b80022c18ae865b3
> Author: Darrick J. Wong 
> Date:   Tue Sep 17 16:05:22 2019 +
> 
>     splice: only read in as much information as there is pipe buffer space

The middle hunk (the one before splice_pipe_to_pipe()) accesses
opipe->{buffers, nrbufs}, but opipe is not locked at that point. So
maybe we end up passing len==0, which seems (once there's room in opipe)
it would put a zero-length pipe_buffer in opipe - and that probably
violates an invariant somewhere.

But does the splice_pipe_to_pipe() case even need that extra logic?
Doesn't it handle short writes correctly already?

Rasmus


INFO: task hung in pipe_write (2)

2019-09-19 Thread syzbot

Hello,

syzbot found the following crash on:

HEAD commit:288b9117 Add linux-next specific files for 20190918
git tree:   linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=17e8664560
kernel config:  https://syzkaller.appspot.com/x/.config?x=f6126e51304ef1c3
dashboard link: https://syzkaller.appspot.com/bug?extid=3c01db6025f26530cf8d
compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=1185576960
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=143580a160

The bug was bisected to:

commit cfb864757d8690631aadf1c4b80022c18ae865b3
Author: Darrick J. Wong 
Date:   Tue Sep 17 16:05:22 2019 +

splice: only read in as much information as there is pipe buffer space

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=150a745560
final crash:https://syzkaller.appspot.com/x/report.txt?x=170a745560
console output: https://syzkaller.appspot.com/x/log.txt?x=130a745560

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+3c01db6025f26530c...@syzkaller.appspotmail.com
Fixes: cfb864757d86 ("splice: only read in as much information as there is  
pipe buffer space")


INFO: task syz-executor623:8754 can't die for more than 143 seconds.
syz-executor623 D26888  8754   8751 0x4004
Call Trace:
 context_switch kernel/sched/core.c:3384 [inline]
 __schedule+0x828/0x1c20 kernel/sched/core.c:4065
 schedule+0xd9/0x260 kernel/sched/core.c:4132
 schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:4191
 __mutex_lock_common kernel/locking/mutex.c:1033 [inline]
 __mutex_lock+0x7b0/0x13c0 kernel/locking/mutex.c:1103
 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1118
 pipe_lock_nested fs/pipe.c:63 [inline]
 pipe_lock fs/pipe.c:71 [inline]
 pipe_wait+0x1ce/0x1f0 fs/pipe.c:119
 pipe_write+0x5fa/0xf40 fs/pipe.c:497
 call_write_iter include/linux/fs.h:1902 [inline]
 new_sync_write+0x4d3/0x770 fs/read_write.c:483
 __vfs_write+0xe1/0x110 fs/read_write.c:496
 vfs_write+0x268/0x5d0 fs/read_write.c:558
 ksys_write+0x14f/0x290 fs/read_write.c:611
 __do_sys_write fs/read_write.c:623 [inline]
 __se_sys_write fs/read_write.c:620 [inline]
 __x64_sys_write+0x73/0xb0 fs/read_write.c:620
 do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x445909
Code: ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 7a  
fd ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 <7a> fd ff 11 7a  
fd ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 7a fd

RSP: 002b:7f9271abddb8 EFLAGS: 0246 ORIG_RAX: 0001
RAX: ffda RBX: 006dac68 RCX: 00445909
RDX: fc8f RSI: 2140 RDI: 0007
RBP: 006dac60 R08:  R09: 
R10:  R11: 0246 R12: 006dac6c
R13: 7ffc2974a61f R14: 7f9271abe9c0 R15: 20c49ba5e353f7cf
INFO: task syz-executor623:8754 blocked for more than 143 seconds.
  Not tainted 5.3.0-next-20190918 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor623 D26888  8754   8751 0x4004
Call Trace:
 context_switch kernel/sched/core.c:3384 [inline]
 __schedule+0x828/0x1c20 kernel/sched/core.c:4065
 schedule+0xd9/0x260 kernel/sched/core.c:4132
 schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:4191
 __mutex_lock_common kernel/locking/mutex.c:1033 [inline]
 __mutex_lock+0x7b0/0x13c0 kernel/locking/mutex.c:1103
 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1118
 pipe_lock_nested fs/pipe.c:63 [inline]
 pipe_lock fs/pipe.c:71 [inline]
 pipe_wait+0x1ce/0x1f0 fs/pipe.c:119
 pipe_write+0x5fa/0xf40 fs/pipe.c:497
 call_write_iter include/linux/fs.h:1902 [inline]
 new_sync_write+0x4d3/0x770 fs/read_write.c:483
 __vfs_write+0xe1/0x110 fs/read_write.c:496
 vfs_write+0x268/0x5d0 fs/read_write.c:558
 ksys_write+0x14f/0x290 fs/read_write.c:611
 __do_sys_write fs/read_write.c:623 [inline]
 __se_sys_write fs/read_write.c:620 [inline]
 __x64_sys_write+0x73/0xb0 fs/read_write.c:620
 do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x445909
Code: ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 7a  
fd ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 <7a> fd ff 11 7a  
fd ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 7a fd

RSP: 002b:7f9271abddb8 EFLAGS: 0246 ORIG_RAX: 0001
RAX: ffda RBX: 006dac68 RCX: 00445909
RDX: fc8f RSI: 2140 RDI: 0007
RBP: 006dac60 R08:  R09: 
R10:  R11: 0246 R12: 006dac6c
R13: 7ffc2974a61f R14: 7f9271abe9c0 R15: 20c49ba5e353f7cf
INFO: task syz-executor623:8755 can't die for more than 143 seconds.
syz-executor623 R  running task24984  8755   8751 0x4006
Call Trace: