Re: INFO: task hung in pipe_write (2)
On Mon, Oct 14, 2019 at 10:40:44PM +0200, Andreas Gruenbacher wrote: > Hi Darrick, > > On Thu, Sep 19, 2019 at 11:10 PM Darrick J. Wong > wrote: > > On Thu, Sep 19, 2019 at 10:55:44PM +0200, Rasmus Villemoes wrote: > > > On 19/09/2019 19.19, syzbot wrote: > > > > Hello, > > > > > > > > syzbot found the following crash on: > > > > > > > > HEAD commit:288b9117 Add linux-next specific files for 20190918 > > > > git tree: linux-next > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=17e8664560 > > > > kernel config: > > > > https://syzkaller.appspot.com/x/.config?x=f6126e51304ef1c3 > > > > dashboard link: > > > > https://syzkaller.appspot.com/bug?extid=3c01db6025f26530cf8d > > > > compiler: gcc (GCC) 9.0.0 20181231 (experimental) > > > > syz repro: > > > > https://syzkaller.appspot.com/x/repro.syz?x=1185576960 > > > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=143580a160 > > > > > > > > The bug was bisected to: > > > > > > > > commit cfb864757d8690631aadf1c4b80022c18ae865b3 > > > > Author: Darrick J. Wong > > > > Date: Tue Sep 17 16:05:22 2019 + > > > > > > > > splice: only read in as much information as there is pipe buffer > > > > space > > > > > > The middle hunk (the one before splice_pipe_to_pipe()) accesses > > > opipe->{buffers, nrbufs}, but opipe is not locked at that point. So > > > maybe we end up passing len==0, which seems (once there's room in opipe) > > > it would put a zero-length pipe_buffer in opipe - and that probably > > > violates an invariant somewhere. > > > > > > But does the splice_pipe_to_pipe() case even need that extra logic? > > > Doesn't it handle short writes correctly already? > > > > Yep. I missed the part where splice_pipe_to_pipe is already perfectly > > capable of detecting insufficient space in opipe and kicking opipe's > > readers to clear out the buffer. So that hunk isn't needed, and now I'm > > wondering how in the other clause we return 0 from wait_for_space yet > > still don't have buffer space... > > > > Oh well, back to the drawing board. Good catch, though now it's become > > painfully clear that xfstests lacks rigorous testing of splice()... > > have you had any luck figuring out how to fix this? We're still > suffering from the regression I've reported a while ago (*). Nope, that's slipped along with everything else because I'm burning out on all the buggy sh*t that has gone in the kernel for 5.4 that has made it difficult to get regression tests to run reliably to find out if there's anything wrong with XFS. Oh, sure, if I turn off kmemleak and whatever the hell "haltpoll" is then it tidies up enough to run fstests but now "sleep 0.5" runs in anywhere between a jiffie and 10s. WTH. > If not, I wonder if reverting commit 8f67b5adc030 would make sense for now. Ugh, no, splice shouldn't be asking the filesystem for a 75k buffered read and then *oopsie* running out of pages after ~64k or so. Even more frighteningly the syzbot reproducer asks for a 20GB read into a pipe which gets sent right into the fs without any size clamping. Ok I'll at least cough up a v3 patch which maybe will work. --D > > * > https://lore.kernel.org/linux-fsdevel/CAHpGcM+WQYFHOOC8SzKq+=duhvz4fw4rhltmudn-o6gx3yt...@mail.gmail.com/T/#u > > Thanks, > Andreas
Re: INFO: task hung in pipe_write (2)
Hi Darrick, On Thu, Sep 19, 2019 at 11:10 PM Darrick J. Wong wrote: > On Thu, Sep 19, 2019 at 10:55:44PM +0200, Rasmus Villemoes wrote: > > On 19/09/2019 19.19, syzbot wrote: > > > Hello, > > > > > > syzbot found the following crash on: > > > > > > HEAD commit:288b9117 Add linux-next specific files for 20190918 > > > git tree: linux-next > > > console output: https://syzkaller.appspot.com/x/log.txt?x=17e8664560 > > > kernel config: https://syzkaller.appspot.com/x/.config?x=f6126e51304ef1c3 > > > dashboard link: > > > https://syzkaller.appspot.com/bug?extid=3c01db6025f26530cf8d > > > compiler: gcc (GCC) 9.0.0 20181231 (experimental) > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1185576960 > > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=143580a160 > > > > > > The bug was bisected to: > > > > > > commit cfb864757d8690631aadf1c4b80022c18ae865b3 > > > Author: Darrick J. Wong > > > Date: Tue Sep 17 16:05:22 2019 + > > > > > > splice: only read in as much information as there is pipe buffer space > > > > The middle hunk (the one before splice_pipe_to_pipe()) accesses > > opipe->{buffers, nrbufs}, but opipe is not locked at that point. So > > maybe we end up passing len==0, which seems (once there's room in opipe) > > it would put a zero-length pipe_buffer in opipe - and that probably > > violates an invariant somewhere. > > > > But does the splice_pipe_to_pipe() case even need that extra logic? > > Doesn't it handle short writes correctly already? > > Yep. I missed the part where splice_pipe_to_pipe is already perfectly > capable of detecting insufficient space in opipe and kicking opipe's > readers to clear out the buffer. So that hunk isn't needed, and now I'm > wondering how in the other clause we return 0 from wait_for_space yet > still don't have buffer space... > > Oh well, back to the drawing board. Good catch, though now it's become > painfully clear that xfstests lacks rigorous testing of splice()... have you had any luck figuring out how to fix this? We're still suffering from the regression I've reported a while ago (*). If not, I wonder if reverting commit 8f67b5adc030 would make sense for now. * https://lore.kernel.org/linux-fsdevel/CAHpGcM+WQYFHOOC8SzKq+=duhvz4fw4rhltmudn-o6gx3yt...@mail.gmail.com/T/#u Thanks, Andreas
Re: INFO: task hung in pipe_write (2)
On Thu, Sep 19, 2019 at 10:55:44PM +0200, Rasmus Villemoes wrote: > On 19/09/2019 19.19, syzbot wrote: > > Hello, > > > > syzbot found the following crash on: > > > > HEAD commit: 288b9117 Add linux-next specific files for 20190918 > > git tree: linux-next > > console output: https://syzkaller.appspot.com/x/log.txt?x=17e8664560 > > kernel config: https://syzkaller.appspot.com/x/.config?x=f6126e51304ef1c3 > > dashboard link: > > https://syzkaller.appspot.com/bug?extid=3c01db6025f26530cf8d > > compiler: gcc (GCC) 9.0.0 20181231 (experimental) > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1185576960 > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=143580a160 > > > > The bug was bisected to: > > > > commit cfb864757d8690631aadf1c4b80022c18ae865b3 > > Author: Darrick J. Wong > > Date: Tue Sep 17 16:05:22 2019 + > > > > splice: only read in as much information as there is pipe buffer space > > The middle hunk (the one before splice_pipe_to_pipe()) accesses > opipe->{buffers, nrbufs}, but opipe is not locked at that point. So > maybe we end up passing len==0, which seems (once there's room in opipe) > it would put a zero-length pipe_buffer in opipe - and that probably > violates an invariant somewhere. > > But does the splice_pipe_to_pipe() case even need that extra logic? > Doesn't it handle short writes correctly already? Yep. I missed the part where splice_pipe_to_pipe is already perfectly capable of detecting insufficient space in opipe and kicking opipe's readers to clear out the buffer. So that hunk isn't needed, and now I'm wondering how in the other clause we return 0 from wait_for_space yet still don't have buffer space... Oh well, back to the drawing board. Good catch, though now it's become painfully clear that xfstests lacks rigorous testing of splice()... --D > Rasmus
Re: INFO: task hung in pipe_write (2)
On 19/09/2019 19.19, syzbot wrote: > Hello, > > syzbot found the following crash on: > > HEAD commit: 288b9117 Add linux-next specific files for 20190918 > git tree: linux-next > console output: https://syzkaller.appspot.com/x/log.txt?x=17e8664560 > kernel config: https://syzkaller.appspot.com/x/.config?x=f6126e51304ef1c3 > dashboard link: > https://syzkaller.appspot.com/bug?extid=3c01db6025f26530cf8d > compiler: gcc (GCC) 9.0.0 20181231 (experimental) > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1185576960 > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=143580a160 > > The bug was bisected to: > > commit cfb864757d8690631aadf1c4b80022c18ae865b3 > Author: Darrick J. Wong > Date: Tue Sep 17 16:05:22 2019 + > > splice: only read in as much information as there is pipe buffer space The middle hunk (the one before splice_pipe_to_pipe()) accesses opipe->{buffers, nrbufs}, but opipe is not locked at that point. So maybe we end up passing len==0, which seems (once there's room in opipe) it would put a zero-length pipe_buffer in opipe - and that probably violates an invariant somewhere. But does the splice_pipe_to_pipe() case even need that extra logic? Doesn't it handle short writes correctly already? Rasmus
INFO: task hung in pipe_write (2)
Hello, syzbot found the following crash on: HEAD commit:288b9117 Add linux-next specific files for 20190918 git tree: linux-next console output: https://syzkaller.appspot.com/x/log.txt?x=17e8664560 kernel config: https://syzkaller.appspot.com/x/.config?x=f6126e51304ef1c3 dashboard link: https://syzkaller.appspot.com/bug?extid=3c01db6025f26530cf8d compiler: gcc (GCC) 9.0.0 20181231 (experimental) syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1185576960 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=143580a160 The bug was bisected to: commit cfb864757d8690631aadf1c4b80022c18ae865b3 Author: Darrick J. Wong Date: Tue Sep 17 16:05:22 2019 + splice: only read in as much information as there is pipe buffer space bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=150a745560 final crash:https://syzkaller.appspot.com/x/report.txt?x=170a745560 console output: https://syzkaller.appspot.com/x/log.txt?x=130a745560 IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+3c01db6025f26530c...@syzkaller.appspotmail.com Fixes: cfb864757d86 ("splice: only read in as much information as there is pipe buffer space") INFO: task syz-executor623:8754 can't die for more than 143 seconds. syz-executor623 D26888 8754 8751 0x4004 Call Trace: context_switch kernel/sched/core.c:3384 [inline] __schedule+0x828/0x1c20 kernel/sched/core.c:4065 schedule+0xd9/0x260 kernel/sched/core.c:4132 schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:4191 __mutex_lock_common kernel/locking/mutex.c:1033 [inline] __mutex_lock+0x7b0/0x13c0 kernel/locking/mutex.c:1103 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1118 pipe_lock_nested fs/pipe.c:63 [inline] pipe_lock fs/pipe.c:71 [inline] pipe_wait+0x1ce/0x1f0 fs/pipe.c:119 pipe_write+0x5fa/0xf40 fs/pipe.c:497 call_write_iter include/linux/fs.h:1902 [inline] new_sync_write+0x4d3/0x770 fs/read_write.c:483 __vfs_write+0xe1/0x110 fs/read_write.c:496 vfs_write+0x268/0x5d0 fs/read_write.c:558 ksys_write+0x14f/0x290 fs/read_write.c:611 __do_sys_write fs/read_write.c:623 [inline] __se_sys_write fs/read_write.c:620 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:620 do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x445909 Code: ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 <7a> fd ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 7a fd RSP: 002b:7f9271abddb8 EFLAGS: 0246 ORIG_RAX: 0001 RAX: ffda RBX: 006dac68 RCX: 00445909 RDX: fc8f RSI: 2140 RDI: 0007 RBP: 006dac60 R08: R09: R10: R11: 0246 R12: 006dac6c R13: 7ffc2974a61f R14: 7f9271abe9c0 R15: 20c49ba5e353f7cf INFO: task syz-executor623:8754 blocked for more than 143 seconds. Not tainted 5.3.0-next-20190918 #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. syz-executor623 D26888 8754 8751 0x4004 Call Trace: context_switch kernel/sched/core.c:3384 [inline] __schedule+0x828/0x1c20 kernel/sched/core.c:4065 schedule+0xd9/0x260 kernel/sched/core.c:4132 schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:4191 __mutex_lock_common kernel/locking/mutex.c:1033 [inline] __mutex_lock+0x7b0/0x13c0 kernel/locking/mutex.c:1103 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1118 pipe_lock_nested fs/pipe.c:63 [inline] pipe_lock fs/pipe.c:71 [inline] pipe_wait+0x1ce/0x1f0 fs/pipe.c:119 pipe_write+0x5fa/0xf40 fs/pipe.c:497 call_write_iter include/linux/fs.h:1902 [inline] new_sync_write+0x4d3/0x770 fs/read_write.c:483 __vfs_write+0xe1/0x110 fs/read_write.c:496 vfs_write+0x268/0x5d0 fs/read_write.c:558 ksys_write+0x14f/0x290 fs/read_write.c:611 __do_sys_write fs/read_write.c:623 [inline] __se_sys_write fs/read_write.c:620 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:620 do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x445909 Code: ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 <7a> fd ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 7a fd ff 11 7a fd RSP: 002b:7f9271abddb8 EFLAGS: 0246 ORIG_RAX: 0001 RAX: ffda RBX: 006dac68 RCX: 00445909 RDX: fc8f RSI: 2140 RDI: 0007 RBP: 006dac60 R08: R09: R10: R11: 0246 R12: 006dac6c R13: 7ffc2974a61f R14: 7f9271abe9c0 R15: 20c49ba5e353f7cf INFO: task syz-executor623:8755 can't die for more than 143 seconds. syz-executor623 R running task24984 8755 8751 0x4006 Call Trace: