Re: splice: infinite busy loop lockup bug

2020-09-01 Thread Qian Cai
On Tue, Sep 01, 2020 at 09:10:05AM +0800, Ming Lei wrote:
> On Tue, Sep 1, 2020 at 8:53 AM Qian Cai  wrote:
> >
> > On Fri, Aug 07, 2020 at 09:34:08PM +0900, Tetsuo Handa wrote:
> > > On 2020/08/07 21:27, Al Viro wrote:
> > > > On Fri, Aug 07, 2020 at 07:35:08PM +0900, Tetsuo Handa wrote:
> > > >> syzbot is reporting hung task at pipe_release() [1], for 
> > > >> for_each_bvec() from
> > > >> iterate_bvec() from iterate_all_kinds() from iov_iter_alignment() from
> > > >> ext4_unaligned_io() from ext4_dio_write_iter() from 
> > > >> ext4_file_write_iter() from
> > > >> call_write_iter() from do_iter_readv_writev() from do_iter_write() from
> > > >> vfs_iter_write() from iter_file_splice_write() falls into infinite 
> > > >> busy loop
> > > >> with pipe->mutex held.
> > > >>
> > > >> The reason of falling into infinite busy loop is that 
> > > >> iter_file_splice_write()
> > > >> for some reason generates "struct bio_vec" entry with .bv_len=0 and 
> > > >> .bv_offset=0
> > > >> while for_each_bvec() cannot handle .bv_len == 0.
> > > >
> > > > broken in 1bdc76aea115 "iov_iter: use bvec iterator to implement 
> > > > iterate_bvec()",
> > > > unless I'm misreading it...
> >
> > I have been chasing something similar for a while as in,
> >
> > https://lore.kernel.org/linux-fsdevel/89f418a9-eb20-48cb-9ae0-52c700e6b...@lca.pw/
> >
> > In my case, it seems the endless loop happens in iterate_iovec() instead 
> > where
> > I put a debug patch here,
> >
> > --- a/lib/iov_iter.c
> > +++ b/lib/iov_iter.c
> > @@ -33,6 +33,7 @@
> > if (unlikely(!__v.iov_len)) \
> > continue;   \
> > __v.iov_base = __p->iov_base;   \
> > +   printk_ratelimited("ITER_IOVEC left = %zu, n = %zu\n", 
> > left, n); \
> > left = (STEP);  \
> > __v.iov_len -= left;\
> > skip = __v.iov_len; \
> >
> > and end up seeing overflows ("n" supposes to be less than PAGE_SIZE) before 
> > the
> > soft-lockups and a dead system,
> >
> > [ 4300.249180][T470195] ITER_IOVEC left = 0, n = 48566423
> >
> > Thoughts?
> 
> Does the following patch make a difference for you?
> 
> https://lore.kernel.org/linux-block/20200817100055.2495905-1-ming@redhat.com/

Yes, it does. I can no longer reproduce the soft lockups with the patch
applied.


Re: splice: infinite busy loop lockup bug

2020-08-31 Thread Al Viro
On Mon, Aug 31, 2020 at 11:32:28PM -0400, Qian Cai wrote:

> I used a new debug patch but not sure how to capture without
> printk_ratelimited() because the call sites are large,

if (!strcmp(current->comm, "bugger"))
        printk(KERN_ERR
and call the binary you are running ./bugger...  And I'd slap such
printk into the beginning of iterate_iovec() as well, if not into
the entry of iov_iter_copy_from_user_atomic().  That BS value of
n must've come from somewhere; it should expand to 'bytes'.
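
Something along these lines (only a sketch; "bugger" is whatever you rename
the test binary to, and left/n are the locals from the debug patch quoted
above):

        /* unthrottled printk, but only for the task we actually care about */
        if (!strcmp(current->comm, "bugger"))
                printk(KERN_ERR "ITER_IOVEC left = %zu, n = %zu\n", left, n);
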
What we have in the beginning is

	const struct iovec *iov;
	struct iovec v;
	size_t skip = i->iov_offset;
	size_t left;
	size_t wanted = bytes;

	iov = i->iov;

	__v.iov_len = min(bytes, iov->iov_len - skip);
	if (likely(__v.iov_len)) {
		__v.iov_base = iov->iov_base + skip;
		left = copyin((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len);
		__v.iov_len -= left;
		skip += __v.iov_len;
		bytes -= __v.iov_len;
	} else {
		left = 0;
	}

and something leaves you with bytes bumped to 22476968.  What was in that first
iovec?  Incidentally, what's in 'wanted'?  And...  judging by the timestamps, that
crap doesn't look like it has come from generic_perform_write() - it's about 4 seconds later.

While we are at it, there are other users of iterate_all_kinds(), and some of them
can very well get large sizes; they are not copying anything (iov_iter_advance(),
for starters).  There, that kind of value would be just fine; are you sure those
printks came from iov_iter_copy_from_user_atomic()?


Re: splice: infinite busy loop lockup bug

2020-08-31 Thread Qian Cai
On Tue, Sep 01, 2020 at 02:09:28AM +0100, Al Viro wrote:
> On Mon, Aug 31, 2020 at 08:51:32PM -0400, Qian Cai wrote:
> > On Fri, Aug 07, 2020 at 09:34:08PM +0900, Tetsuo Handa wrote:
> > > On 2020/08/07 21:27, Al Viro wrote:
> > > > On Fri, Aug 07, 2020 at 07:35:08PM +0900, Tetsuo Handa wrote:
> > > >> syzbot is reporting hung task at pipe_release() [1], for 
> > > >> for_each_bvec() from
> > > >> iterate_bvec() from iterate_all_kinds() from iov_iter_alignment() from
> > > >> ext4_unaligned_io() from ext4_dio_write_iter() from 
> > > >> ext4_file_write_iter() from
> > > >> call_write_iter() from do_iter_readv_writev() from do_iter_write() from
> > > >> vfs_iter_write() from iter_file_splice_write() falls into infinite 
> > > >> busy loop
> > > >> with pipe->mutex held.
> > > >>
> > > >> The reason of falling into infinite busy loop is that 
> > > >> iter_file_splice_write()
> > > >> for some reason generates "struct bio_vec" entry with .bv_len=0 and 
> > > >> .bv_offset=0
> > > >> while for_each_bvec() cannot handle .bv_len == 0.
> > > > 
> > > > broken in 1bdc76aea115 "iov_iter: use bvec iterator to implement 
> > > > iterate_bvec()",
> > > > unless I'm misreading it...
> > 
> > I have been chasing something similar for a while as in,
> > 
> > https://lore.kernel.org/linux-fsdevel/89f418a9-eb20-48cb-9ae0-52c700e6b...@lca.pw/
> > 
> > In my case, it seems the endless loop happens in iterate_iovec() instead 
> > where
> > I put a debug patch here,
> > 
> > --- a/lib/iov_iter.c
> > +++ b/lib/iov_iter.c
> > @@ -33,6 +33,7 @@
> > if (unlikely(!__v.iov_len)) \
> > continue;   \
> > __v.iov_base = __p->iov_base;   \
> > +   printk_ratelimited("ITER_IOVEC left = %zu, n = %zu\n", 
> > left, n); \
> > left = (STEP);  \
> > __v.iov_len -= left;\
> > skip = __v.iov_len; \
> > 
> > and end up seeing overflows ("n" supposes to be less than PAGE_SIZE) before 
> > the
> > soft-lockups and a dead system,
> > 
> > [ 4300.249180][T470195] ITER_IOVEC left = 0, n = 48566423
> > 
> > Thoughts?
> 
> Er...  Where does that size come from?  If that's generic_perform_write(),
> I'd like to see pos, offset and bytes at the time of call...  ->iov_offset 
> would
> also be interesting to see (along with the entire iovec array, really).

I used a new debug patch, but I am not sure how to capture this without
printk_ratelimited() because those call sites fire so frequently,

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index d812eef23a32..214b93c3d8a8 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -33,6 +33,7 @@
if (unlikely(!__v.iov_len)) \
continue;   \
__v.iov_base = __p->iov_base;   \
+   printk_ratelimited("ITER_IOVEC left = %zu, n = %zu, iov_offset 
= %zu, iov_base = %px, iov_len = %lu\n", left, n, i->iov_offset, __p->iov_base, 
__p->iov_len); \
left = (STEP);  \
__v.iov_len -= left;\
skip = __v.iov_len; \
diff --git a/mm/filemap.c b/mm/filemap.c
index 1aaea26556cc..202b0ab28adf 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3408,6 +3408,7 @@ ssize_t generic_perform_write(struct file *file,
if (mapping_writably_mapped(mapping))
flush_dcache_page(page);
 
+   printk_ratelimited("pos = %lld, offset = %ld, bytes = %ld\n", 
pos, offset, bytes);
copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes);
flush_dcache_page(page);

Al, does it help? If not, can you cook a debug patch instead?

[  587.765400][T21348] pos = 468152746240, offset = 256, bytes = 3840
[  587.765424][T21348] pos = 468152750080, offset = 0, bytes = 4096
[  587.765439][T21348] pos = 468152754176, offset = 0, bytes = 4096
[  587.765455][T21348] pos = 468152758272, offset = 0, bytes = 4096
[  591.235409][T22038] ITER_IOVEC left = 0, n = 22476968, iov_offset = 0, iov_base = 7ff12570c000, iov_len = 20
[  591.313456][T22038] ITER_IOVEC left = 0, n = 22476948, iov_offset = 0, iov_base = 7ff12325, iov_len = 20
[  591.363679][T22038] ITER_IOVEC left = 0, n = 22476928, iov_offset = 0, iov_base = 7ff12365, iov_len = 6370
[  591.413217][T22038] ITER_IOVEC left = 0, n = 22470558, iov_offset = 0, iov_base = 7ff12385, iov_len = 20
[  591.459754][T22038] ITER_IOVEC left = 0, n = 22470538, iov_offset = 0, iov_base = 7ff12570c000, iov_len = 376
[  591.507748][T22038] ITER_IOVEC left = 0, n = 22470162, iov_offset = 0, iov_base = 7ff12570c000, iov_len = 3473
[  591.557395][T22038] ITER_IOVEC left = 0, n = 22466689, iov_offset = 0, iov_base = 7ff12325, iov_len = 20
[  591.605295][T22038] ITER_IOVEC left = 0, n = 2249, 

Re: splice: infinite busy loop lockup bug

2020-08-31 Thread Qian Cai
On Tue, Sep 01, 2020 at 02:09:28AM +0100, Al Viro wrote:
> On Mon, Aug 31, 2020 at 08:51:32PM -0400, Qian Cai wrote:
> > On Fri, Aug 07, 2020 at 09:34:08PM +0900, Tetsuo Handa wrote:
> > > On 2020/08/07 21:27, Al Viro wrote:
> > > > On Fri, Aug 07, 2020 at 07:35:08PM +0900, Tetsuo Handa wrote:
> > > >> syzbot is reporting hung task at pipe_release() [1], for 
> > > >> for_each_bvec() from
> > > >> iterate_bvec() from iterate_all_kinds() from iov_iter_alignment() from
> > > >> ext4_unaligned_io() from ext4_dio_write_iter() from 
> > > >> ext4_file_write_iter() from
> > > >> call_write_iter() from do_iter_readv_writev() from do_iter_write() from
> > > >> vfs_iter_write() from iter_file_splice_write() falls into infinite 
> > > >> busy loop
> > > >> with pipe->mutex held.
> > > >>
> > > >> The reason of falling into infinite busy loop is that 
> > > >> iter_file_splice_write()
> > > >> for some reason generates "struct bio_vec" entry with .bv_len=0 and 
> > > >> .bv_offset=0
> > > >> while for_each_bvec() cannot handle .bv_len == 0.
> > > > 
> > > > broken in 1bdc76aea115 "iov_iter: use bvec iterator to implement 
> > > > iterate_bvec()",
> > > > unless I'm misreading it...
> > 
> > I have been chasing something similar for a while as in,
> > 
> > https://lore.kernel.org/linux-fsdevel/89f418a9-eb20-48cb-9ae0-52c700e6b...@lca.pw/
> > 
> > In my case, it seems the endless loop happens in iterate_iovec() instead 
> > where
> > I put a debug patch here,
> > 
> > --- a/lib/iov_iter.c
> > +++ b/lib/iov_iter.c
> > @@ -33,6 +33,7 @@
> > if (unlikely(!__v.iov_len)) \
> > continue;   \
> > __v.iov_base = __p->iov_base;   \
> > +   printk_ratelimited("ITER_IOVEC left = %zu, n = %zu\n", 
> > left, n); \
> > left = (STEP);  \
> > __v.iov_len -= left;\
> > skip = __v.iov_len; \
> > 
> > and end up seeing overflows ("n" supposes to be less than PAGE_SIZE) before 
> > the
> > soft-lockups and a dead system,
> > 
> > [ 4300.249180][T470195] ITER_IOVEC left = 0, n = 48566423
> > 
> > Thoughts?
> 
> Er...  Where does that size come from?  If that's generic_perform_write(),
> I'd like to see pos, offset and bytes at the time of call...  ->iov_offset 
> would
> also be interesting to see (along with the entire iovec array, really).

Yes, generic_perform_write(). I'll see if I can capture more information.

[ 2867.463013][T217919] ITER_IOVEC left = 0, n = 2209 
[ 2867.466154][T217971] ITER_IOVEC left = 0, n = 2093279 
[ 2867.903689][T217971] ITER_IOVEC left = 0, n = 2093259 
[ 2867.928514][T217971] ITER_IOVEC left = 0, n = 2093239 
[ 2867.952450][T217971] ITER_IOVEC left = 0, n = 2090980 
[ 2867.976585][T217971] ITER_IOVEC left = 0, n = 2090960 
[ 2869.219459][T217774] futex_wake_op: trinity-c61 tries to shift op by -1; fix this program 
[ 2870.005178][T218110] futex_wake_op: trinity-c9 tries to shift op by -1; fix this program 
[ 2870.297607][T218213] futex_wake_op: trinity-c4 tries to shift op by -836; fix this program 
[ 2870.338118][T218213] futex_wake_op: trinity-c4 tries to shift op by -836; fix this program 
[-- MARK -- Mon Aug 31 18:50:00 2020] 
[ 2893.870387][   C62] watchdog: BUG: soft lockup - CPU#62 stuck for 23s! [trinity-c5:218155] 
[ 2893.912341][   C62] Modules linked in: nls_ascii nls_cp437 vfat fat kvm_intel kvm irqbypass efivars ip_tables x_tables sd_mod ahci bnx2x libahci mdio libata firmware_class dm_mirror dm_region_hash dm_log dm_mod efivarfs 
[ 2894.003787][   C62] irq event stamp: 55378 
[ 2894.022398][   C62] hardirqs last  enabled at (55377): [] asm_sysvec_apic_timer_interrupt+0x12/0x20 
[ 2894.070770][   C62] hardirqs last disabled at (55378): [] irqentry_enter+0x1d/0x50 
[ 2894.112794][   C62] softirqs last  enabled at (50602): [] __do_softirq+0x70f/0xa9f 
[ 2894.154580][   C62] softirqs last disabled at (49393): [] asm_call_on_stack+0x12/0x20 
[ 2894.197654][   C62] CPU: 62 PID: 218155 Comm: trinity-c5 Not tainted 5.9.0-rc2-next-20200828+ #4 
[ 2894.239807][   C62] Hardware name: HP Synergy 480 Gen9/Synergy 480 Gen9 Compute Module, BIOS I37 10/21/2019 
[ 2894.284894][   C62] RIP: 0010:iov_iter_copy_from_user_atomic+0x598/0xab0 
[ 2894.316037][   C62] Code: 42 0f b6 14 08 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 7c 03 00 00 48 8d 7e 08 8b 46 0c 48 89 fa 48 c1 ea 03 01 e8 <42> 0f b6 14 0a 84 d2 74 09 80 fa 03 0f 8e 0e 03 00 00 44 8b 6e 08 
[ 2894.408763][   C62] RSP: 0018:c9000fb7f848 EFLAGS: 0246 
[ 2894.440034][   C62] RAX:  RBX:  RCX: 888fce295100 
[ 2894.480312][   C62] RDX: 1111f9c52a21 RSI: 888fce295100 RDI: 888fce295108 
[ 2894.517551][   C62] RBP:  R08: f52001f6ffa1 R09: dc00 
[ 2894.556460][   C62] R10: 889055c1f000 R11: 0400 R12: c9000fb7fcf0 

Re: splice: infinite busy loop lockup bug

2020-08-31 Thread Ming Lei
On Tue, Sep 1, 2020 at 8:53 AM Qian Cai  wrote:
>
> On Fri, Aug 07, 2020 at 09:34:08PM +0900, Tetsuo Handa wrote:
> > On 2020/08/07 21:27, Al Viro wrote:
> > > On Fri, Aug 07, 2020 at 07:35:08PM +0900, Tetsuo Handa wrote:
> > >> syzbot is reporting hung task at pipe_release() [1], for for_each_bvec() 
> > >> from
> > >> iterate_bvec() from iterate_all_kinds() from iov_iter_alignment() from
> > >> ext4_unaligned_io() from ext4_dio_write_iter() from 
> > >> ext4_file_write_iter() from
> > >> call_write_iter() from do_iter_readv_writev() from do_iter_write() from
> > >> vfs_iter_write() from iter_file_splice_write() falls into infinite busy 
> > >> loop
> > >> with pipe->mutex held.
> > >>
> > >> The reason of falling into infinite busy loop is that 
> > >> iter_file_splice_write()
> > >> for some reason generates "struct bio_vec" entry with .bv_len=0 and 
> > >> .bv_offset=0
> > >> while for_each_bvec() cannot handle .bv_len == 0.
> > >
> > > broken in 1bdc76aea115 "iov_iter: use bvec iterator to implement 
> > > iterate_bvec()",
> > > unless I'm misreading it...
>
> I have been chasing something similar for a while as in,
>
> https://lore.kernel.org/linux-fsdevel/89f418a9-eb20-48cb-9ae0-52c700e6b...@lca.pw/
>
> In my case, it seems the endless loop happens in iterate_iovec() instead where
> I put a debug patch here,
>
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -33,6 +33,7 @@
> if (unlikely(!__v.iov_len)) \
> continue;   \
> __v.iov_base = __p->iov_base;   \
> +   printk_ratelimited("ITER_IOVEC left = %zu, n = %zu\n", left, 
> n); \
> left = (STEP);  \
> __v.iov_len -= left;\
> skip = __v.iov_len; \
>
> and end up seeing overflows ("n" supposes to be less than PAGE_SIZE) before 
> the
> soft-lockups and a dead system,
>
> [ 4300.249180][T470195] ITER_IOVEC left = 0, n = 48566423
>
> Thoughts?

Does the following patch make a difference for you?

https://lore.kernel.org/linux-block/20200817100055.2495905-1-ming@redhat.com/

thanks,
Ming Lei


Re: splice: infinite busy loop lockup bug

2020-08-31 Thread Al Viro
On Mon, Aug 31, 2020 at 08:51:32PM -0400, Qian Cai wrote:
> On Fri, Aug 07, 2020 at 09:34:08PM +0900, Tetsuo Handa wrote:
> > On 2020/08/07 21:27, Al Viro wrote:
> > > On Fri, Aug 07, 2020 at 07:35:08PM +0900, Tetsuo Handa wrote:
> > >> syzbot is reporting hung task at pipe_release() [1], for for_each_bvec() 
> > >> from
> > >> iterate_bvec() from iterate_all_kinds() from iov_iter_alignment() from
> > >> ext4_unaligned_io() from ext4_dio_write_iter() from 
> > >> ext4_file_write_iter() from
> > >> call_write_iter() from do_iter_readv_writev() from do_iter_write() from
> > >> vfs_iter_write() from iter_file_splice_write() falls into infinite busy 
> > >> loop
> > >> with pipe->mutex held.
> > >>
> > >> The reason of falling into infinite busy loop is that 
> > >> iter_file_splice_write()
> > >> for some reason generates "struct bio_vec" entry with .bv_len=0 and 
> > >> .bv_offset=0
> > >> while for_each_bvec() cannot handle .bv_len == 0.
> > > 
> > > broken in 1bdc76aea115 "iov_iter: use bvec iterator to implement 
> > > iterate_bvec()",
> > > unless I'm misreading it...
> 
> I have been chasing something similar for a while as in,
> 
> https://lore.kernel.org/linux-fsdevel/89f418a9-eb20-48cb-9ae0-52c700e6b...@lca.pw/
> 
> In my case, it seems the endless loop happens in iterate_iovec() instead where
> I put a debug patch here,
> 
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -33,6 +33,7 @@
> if (unlikely(!__v.iov_len)) \
> continue;   \
> __v.iov_base = __p->iov_base;   \
> +   printk_ratelimited("ITER_IOVEC left = %zu, n = %zu\n", left, 
> n); \
> left = (STEP);  \
> __v.iov_len -= left;\
> skip = __v.iov_len; \
> 
> and end up seeing overflows ("n" supposes to be less than PAGE_SIZE) before 
> the
> soft-lockups and a dead system,
> 
> [ 4300.249180][T470195] ITER_IOVEC left = 0, n = 48566423
> 
> Thoughts?

Er...  Where does that size come from?  If that's generic_perform_write(),
I'd like to see pos, offset and bytes at the time of call...  ->iov_offset would
also be interesting to see (along with the entire iovec array, really).


Re: splice: infinite busy loop lockup bug

2020-08-31 Thread Qian Cai
On Fri, Aug 07, 2020 at 09:34:08PM +0900, Tetsuo Handa wrote:
> On 2020/08/07 21:27, Al Viro wrote:
> > On Fri, Aug 07, 2020 at 07:35:08PM +0900, Tetsuo Handa wrote:
> >> syzbot is reporting hung task at pipe_release() [1], for for_each_bvec() 
> >> from
> >> iterate_bvec() from iterate_all_kinds() from iov_iter_alignment() from
> >> ext4_unaligned_io() from ext4_dio_write_iter() from ext4_file_write_iter() 
> >> from
> >> call_write_iter() from do_iter_readv_writev() from do_iter_write() from
> >> vfs_iter_write() from iter_file_splice_write() falls into infinite busy 
> >> loop
> >> with pipe->mutex held.
> >>
> >> The reason of falling into infinite busy loop is that 
> >> iter_file_splice_write()
> >> for some reason generates "struct bio_vec" entry with .bv_len=0 and 
> >> .bv_offset=0
> >> while for_each_bvec() cannot handle .bv_len == 0.
> > 
> > broken in 1bdc76aea115 "iov_iter: use bvec iterator to implement 
> > iterate_bvec()",
> > unless I'm misreading it...

I have been chasing something similar for a while as in,

https://lore.kernel.org/linux-fsdevel/89f418a9-eb20-48cb-9ae0-52c700e6b...@lca.pw/

In my case, it seems the endless loop happens in iterate_iovec() instead.
I put a debug patch here,

--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -33,6 +33,7 @@
if (unlikely(!__v.iov_len)) \
continue;   \
__v.iov_base = __p->iov_base;   \
+   printk_ratelimited("ITER_IOVEC left = %zu, n = %zu\n", left, 
n); \
left = (STEP);  \
__v.iov_len -= left;\
skip = __v.iov_len; \

and end up seeing overflows ("n" is supposed to be less than PAGE_SIZE) before the
soft lockups and a dead system,

[ 4300.249180][T470195] ITER_IOVEC left = 0, n = 48566423

Thoughts?

> > 
> > Zero-length segments are not disallowed; it's not all that hard to filter 
> > them
> > out in iter_file_splice_write(), but the intent had always been to have
> > iterate_all_kinds() et.al. able to cope with those.
> > 
> > How are these pipe_buffers with ->len == 0 generated in that reproducer, 
> > BTW?
> > There might be something else fishy going on...
> > 
> 
> OK. Indeed writing to empty pipe which returns -EFAULT allows an empty
> page to be linked to pipe's array.
> 
> Now, I've just found a simple reproducer, and confirmed that this bug is
> a local lockup DoS by unprivileged user. Please fix.
> 
> --
> #define _GNU_SOURCE
> #include 
> #include 
> #include 
> #include 
> #include 
> 
> int main(int argc, char *argv[])
> {
> static char buffer[4096];
> const int fd = open("/tmp/testfile", O_WRONLY | O_CREAT, 0600);
> int pipe_fd[2] = { EOF, EOF };
> pipe(pipe_fd);
> write(pipe_fd[1], NULL, 4096);
> write(pipe_fd[1], buffer, 4096);
> splice(pipe_fd[0], NULL, fd, NULL, 65536, 0);
> return 0;
> }
> --
> 
> [  125.598898][C0] rcu: INFO: rcu_sched self-detected stall on CPU
> [  125.601072][C0] rcu:   0-: (20171 ticks this GP) 
> idle=526/1/0x4000 softirq=7918/7918 fqs=5136 
> [  125.604874][C0](t=21006 jiffies g=9341 q=30)
> [  125.606512][C0] NMI backtrace for cpu 0
> [  125.607931][C0] CPU: 0 PID: 2792 Comm: a.out Not tainted 5.8.0+ #793
> [  125.610948][C0] Hardware name: VMware, Inc. VMware Virtual 
> Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
> [  125.614938][C0] Call Trace:
> [  125.616049][C0]  
> [  125.617010][C0]  dump_stack+0x5e/0x7a
> [  125.618370][C0]  nmi_cpu_backtrace.cold.7+0x14/0x52
> [  125.620148][C0]  ? lapic_can_unplug_cpu.cold.39+0x3a/0x3a
> [  125.622074][C0]  nmi_trigger_cpumask_backtrace+0x92/0x9f
> [  125.624154][C0]  arch_trigger_cpumask_backtrace+0x14/0x20
> [  125.626102][C0]  rcu_dump_cpu_stacks+0xa0/0xd0
> [  125.627919][C0]  rcu_sched_clock_irq.cold.95+0x121/0x39c
> [  125.629833][C0]  ? acct_account_cputime+0x17/0x20
> [  125.631534][C0]  ? account_system_index_time+0x8a/0xa0
> [  125.633422][C0]  update_process_times+0x23/0x60
> [  125.635070][C0]  tick_sched_handle.isra.22+0x20/0x60
> [  125.636870][C0]  tick_sched_timer+0x68/0x80
> [  125.638403][C0]  ? tick_sched_handle.isra.22+0x60/0x60
> [  125.640588][C0]  __hrtimer_run_queues+0xf9/0x1a0
> [  125.642591][C0]  hrtimer_interrupt+0xfc/0x210
> [  125.645033][C0]  __sysvec_apic_timer_interrupt+0x4c/0x60
> [  125.647292][C0]  asm_call_on_stack+0xf/0x20
> [  125.649192][C0]  
> [  125.650501][C0]  sysvec_apic_timer_interrupt+0x75/0x80
> [  125.652900][C0]  asm_sysvec_apic_timer_interrupt+0x12/0x20
> [  125.655487][C0] RIP: 0010:iov_iter_copy_from_user_atomic+0x19b/0x350
> [  125.658124][C0] Code: 89 45 d0 48 c1 e6 06 48 03 37 4d 8d 3c 09 4c 89 
> cf e8 d9 e5 ff ff 48 8b 45 d0 45 39 eb 0f 87 35 

Re: splice: infinite busy loop lockup bug

2020-08-08 Thread Ming Lei
On Sun, Aug 09, 2020 at 10:31:23AM +0800, Ming Lei wrote:
> On Fri, Aug 07, 2020 at 03:11:48PM +0100, Matthew Wilcox wrote:
> > On Fri, Aug 07, 2020 at 09:41:14PM +0800, Ming Lei wrote:
> > > On Fri, Aug 07, 2020 at 01:38:54PM +0100, Al Viro wrote:
> > > > FWIW, my preference would be to have for_each_bvec() advance past 
> > > > zero-length
> > > > segments; I'll need to go through its uses elsewhere in the tree first, 
> > > > though
> > > > (after I grab some sleep),
> > > 
> > > Usually block layer doesn't allow/support zero bvec, however we can make
> > > for_each_bvec() to support it only.
> > > 
> > > Tetsuo, can you try the following patch?
> > > 
> > > diff --git a/include/linux/bvec.h b/include/linux/bvec.h
> > > index ac0c7299d5b8..b03c793dd28d 100644
> > > --- a/include/linux/bvec.h
> > > +++ b/include/linux/bvec.h
> > > @@ -117,11 +117,19 @@ static inline bool bvec_iter_advance(const struct 
> > > bio_vec *bv,
> > >   return true;
> > >  }
> > >  
> > > +static inline void bvec_iter_skip_zero_vec(const struct bio_vec *bv,
> > > + struct bvec_iter *iter)
> > > +{
> > > + iter->bi_idx++;
> > > + iter->bi_bvec_done = 0;
> > > +}
> > > +
> > >  #define for_each_bvec(bvl, bio_vec, iter, start) \
> > >   for (iter = (start);\
> > >(iter).bi_size &&  \
> > > - ((bvl = bvec_iter_bvec((bio_vec), (iter))), 1); \
> > > -  bvec_iter_advance((bio_vec), &(iter), (bvl).bv_len))
> > > + ((bvl = bvec_iter_bvec((bio_vec), (iter))), 1); \
> > > +   (bvl).bv_len ? bvec_iter_advance((bio_vec), &(iter), (bvl).bv_len) : \
> > > + bvec_iter_skip_zero_vec((bio_vec), &(iter)))
> > 
> > Uhm, bvec_iter_advance() already skips over zero length bio_vecs.
> > 
> > while (bytes && bytes >= bv[idx].bv_len) {
> > bytes -= bv[idx].bv_len;
> > idx++;
> > }
> 
> The issue is that zero (bvl).bv_len passed to bvec_iter_advance(), so
> the iterator can't move on.
> 
> And I tried to avoid change to bvec_iter_advance() since this exact
> issue only exists on for_each_bvec, and block layer won't support/allow
> zero-length bvec.
> 
> > 
> > The problem is when the _first_ bio_vec is zero length.
> 
> It can be any zero-length bvec during the iterating. 
> 
> > Maybe something more
> > like this (which doesn't even compile, but hopefully makes my point):
> > 
> > @@ -86,12 +86,24 @@ struct bvec_iter_all {
> > (mp_bvec_iter_page((bvec), (iter)) +\
> >  mp_bvec_iter_page_idx((bvec), (iter)))
> >  
> > -#define bvec_iter_bvec(bvec, iter) \
> > -((struct bio_vec) {\
> > -   .bv_page= bvec_iter_page((bvec), (iter)),   \
> > -   .bv_len = bvec_iter_len((bvec), (iter)),\
> > -   .bv_offset  = bvec_iter_offset((bvec), (iter)), \
> > -})
> > +static inline bool bvec_iter_bvec(struct bio_vec *bv, struct bio_vec *bvec,
> > +   struct bvec_iter *iter)
> > +{
> > +   unsigned int idx = iter->bi_idx;
> > +
> > +   if (!iter->bi_size)
> > +   return false;
> > +
> > +   while (!bv[idx].bv_len)
> > +   idx++;
> > +   iter->bi_idx = idx;
> > +
> > +   bv->bv_page = bvec_iter_page(bvec, *iter);
> > +   bv->bv_len = bvec_iter_len(bvec, *iter);
> > +   bv->bv_offset = bvec_iter_offset(bvec, *iter);
> > +
> > +   return true;
> > +}
> >  
> >  static inline bool bvec_iter_advance(const struct bio_vec *bv,
> > struct bvec_iter *iter, unsigned bytes)
> > @@ -119,8 +131,7 @@ static inline bool bvec_iter_advance(const struct 
> > bio_vec *bv,
> >  
> >  #define for_each_bvec(bvl, bio_vec, iter, start)   \
> > for (iter = (start);\
> > -(iter).bi_size &&  \
> > -   ((bvl = bvec_iter_bvec((bio_vec), (iter))), 1); \
> > +bvec_iter_bvec(&(bvl), (bio_vec), &(iter));\
> >  bvec_iter_advance((bio_vec), &(iter), (bvl).bv_len))
> >  
> >  /* for iterating one bio from start to end */
> > 
> > (I find the whole bvec handling a mess of confusing macros and would
> > welcome more of it being inline functions, in general).
> 
> The above change may bring more code duplication. Meantime, it can't
> work because (bvl).bv_len isn't taken into account into bvec_iter_bvec(),
> then how can the iterator advance?

Oops, it looks like you changed bvec_iter_bvec() only to skip zero-length bvecs,
and that way might work (still, ->bi_bvec_done isn't reset). However, the
change is ugly, because the iterator is not supposed to be updated in
bvec_iter_bvec(). Also, the block layer code doesn't require such a change.

BTW, I agree on switching to inline functions if performance isn't affected.



Re: splice: infinite busy loop lockup bug

2020-08-08 Thread Ming Lei
On Fri, Aug 07, 2020 at 03:11:48PM +0100, Matthew Wilcox wrote:
> On Fri, Aug 07, 2020 at 09:41:14PM +0800, Ming Lei wrote:
> > On Fri, Aug 07, 2020 at 01:38:54PM +0100, Al Viro wrote:
> > > FWIW, my preference would be to have for_each_bvec() advance past 
> > > zero-length
> > > segments; I'll need to go through its uses elsewhere in the tree first, 
> > > though
> > > (after I grab some sleep),
> > 
> > Usually block layer doesn't allow/support zero bvec, however we can make
> > for_each_bvec() to support it only.
> > 
> > Tetsuo, can you try the following patch?
> > 
> > diff --git a/include/linux/bvec.h b/include/linux/bvec.h
> > index ac0c7299d5b8..b03c793dd28d 100644
> > --- a/include/linux/bvec.h
> > +++ b/include/linux/bvec.h
> > @@ -117,11 +117,19 @@ static inline bool bvec_iter_advance(const struct 
> > bio_vec *bv,
> > return true;
> >  }
> >  
> > +static inline void bvec_iter_skip_zero_vec(const struct bio_vec *bv,
> > +   struct bvec_iter *iter)
> > +{
> > +   iter->bi_idx++;
> > +   iter->bi_bvec_done = 0;
> > +}
> > +
> >  #define for_each_bvec(bvl, bio_vec, iter, start)   \
> > for (iter = (start);\
> >  (iter).bi_size &&  \
> > -   ((bvl = bvec_iter_bvec((bio_vec), (iter))), 1); \
> > -bvec_iter_advance((bio_vec), &(iter), (bvl).bv_len))
> > +   ((bvl = bvec_iter_bvec((bio_vec), (iter))), 1); \
> > + (bvl).bv_len ? bvec_iter_advance((bio_vec), &(iter), (bvl).bv_len) : \
> > +   bvec_iter_skip_zero_vec((bio_vec), &(iter)))
> 
> Uhm, bvec_iter_advance() already skips over zero length bio_vecs.
> 
> while (bytes && bytes >= bv[idx].bv_len) {
> bytes -= bv[idx].bv_len;
> idx++;
> }

The issue is that a zero (bvl).bv_len is passed to bvec_iter_advance(), so
the iterator can't move on.

And I tried to avoid changing bvec_iter_advance(), since this exact
issue only exists in for_each_bvec(), and the block layer won't support/allow
zero-length bvecs.

> 
> The problem is when the _first_ bio_vec is zero length.

It can be any zero-length bvec during the iteration.

> Maybe something more
> like this (which doesn't even compile, but hopefully makes my point):
> 
> @@ -86,12 +86,24 @@ struct bvec_iter_all {
> (mp_bvec_iter_page((bvec), (iter)) +\
>  mp_bvec_iter_page_idx((bvec), (iter)))
>  
> -#define bvec_iter_bvec(bvec, iter) \
> -((struct bio_vec) {\
> -   .bv_page= bvec_iter_page((bvec), (iter)),   \
> -   .bv_len = bvec_iter_len((bvec), (iter)),\
> -   .bv_offset  = bvec_iter_offset((bvec), (iter)), \
> -})
> +static inline bool bvec_iter_bvec(struct bio_vec *bv, struct bio_vec *bvec,
> +   struct bvec_iter *iter)
> +{
> +   unsigned int idx = iter->bi_idx;
> +
> +   if (!iter->bi_size)
> +   return false;
> +
> +   while (!bv[idx].bv_len)
> +   idx++;
> +   iter->bi_idx = idx;
> +
> +   bv->bv_page = bvec_iter_page(bvec, *iter);
> +   bv->bv_len = bvec_iter_len(bvec, *iter);
> +   bv->bv_offset = bvec_iter_offset(bvec, *iter);
> +
> +   return true;
> +}
>  
>  static inline bool bvec_iter_advance(const struct bio_vec *bv,
> struct bvec_iter *iter, unsigned bytes)
> @@ -119,8 +131,7 @@ static inline bool bvec_iter_advance(const struct bio_vec 
> *bv,
>  
>  #define for_each_bvec(bvl, bio_vec, iter, start)   \
> for (iter = (start);\
> -(iter).bi_size &&  \
> -   ((bvl = bvec_iter_bvec((bio_vec), (iter))), 1); \
> +bvec_iter_bvec(&(bvl), (bio_vec), &(iter));\
>  bvec_iter_advance((bio_vec), &(iter), (bvl).bv_len))
>  
>  /* for iterating one bio from start to end */
> 
> (I find the whole bvec handling a mess of confusing macros and would
> welcome more of it being inline functions, in general).

The above change may bring more code duplication. Meanwhile, it can't
work because (bvl).bv_len isn't taken into account by bvec_iter_bvec(),
so how can the iterator advance?


Thanks,
Ming



Re: splice: infinite busy loop lockup bug

2020-08-07 Thread Tetsuo Handa
On 2020/08/07 23:11, Matthew Wilcox wrote:
> (I find the whole bvec handling a mess of confusing macros and would
> welcome more of it being inline functions, in general).

Indeed. Inlined functions will be more useful than macros when syzbot
calculates the source code location from an address for reporting.
I spent a lot of time figuring out where

  RIP: 0010:iov_iter_alignment+0x39e/0x850 lib/iov_iter.c:1236

falls within the complicated macros. If inlined line numbers were available,
I could have narrowed down the location of the infinite loop faster...


Re: splice: infinite busy loop lockup bug

2020-08-07 Thread Tetsuo Handa
On 2020/08/07 22:41, Ming Lei wrote:
>> FWIW, my preference would be to have for_each_bvec() advance past zero-length
>> segments; I'll need to go through its uses elsewhere in the tree first, 
>> though
>> (after I grab some sleep),
> 
> Usually block layer doesn't allow/support zero bvec, however we can make
> for_each_bvec() to support it only.
> 
> Tetsuo, can you try the following patch?

Yes, this patch solves the lockup. Thank you.


Re: splice: infinite busy loop lockup bug

2020-08-07 Thread Matthew Wilcox
On Fri, Aug 07, 2020 at 09:41:14PM +0800, Ming Lei wrote:
> On Fri, Aug 07, 2020 at 01:38:54PM +0100, Al Viro wrote:
> > FWIW, my preference would be to have for_each_bvec() advance past 
> > zero-length
> > segments; I'll need to go through its uses elsewhere in the tree first, 
> > though
> > (after I grab some sleep),
> 
> Usually block layer doesn't allow/support zero bvec, however we can make
> for_each_bvec() to support it only.
> 
> Tetsuo, can you try the following patch?
> 
> diff --git a/include/linux/bvec.h b/include/linux/bvec.h
> index ac0c7299d5b8..b03c793dd28d 100644
> --- a/include/linux/bvec.h
> +++ b/include/linux/bvec.h
> @@ -117,11 +117,19 @@ static inline bool bvec_iter_advance(const struct 
> bio_vec *bv,
>   return true;
>  }
>  
> +static inline void bvec_iter_skip_zero_vec(const struct bio_vec *bv,
> + struct bvec_iter *iter)
> +{
> + iter->bi_idx++;
> + iter->bi_bvec_done = 0;
> +}
> +
>  #define for_each_bvec(bvl, bio_vec, iter, start) \
>   for (iter = (start);\
>(iter).bi_size &&  \
> - ((bvl = bvec_iter_bvec((bio_vec), (iter))), 1); \
> -  bvec_iter_advance((bio_vec), &(iter), (bvl).bv_len))
> + ((bvl = bvec_iter_bvec((bio_vec), (iter))), 1); \
> +   (bvl).bv_len ? bvec_iter_advance((bio_vec), &(iter), (bvl).bv_len) : \
> + bvec_iter_skip_zero_vec((bio_vec), &(iter)))

Uhm, bvec_iter_advance() already skips over zero length bio_vecs.

	while (bytes && bytes >= bv[idx].bv_len) {
		bytes -= bv[idx].bv_len;
		idx++;
	}

The problem is when the _first_ bio_vec is zero length.  Maybe something more
like this (which doesn't even compile, but hopefully makes my point):

@@ -86,12 +86,24 @@ struct bvec_iter_all {
(mp_bvec_iter_page((bvec), (iter)) +\
 mp_bvec_iter_page_idx((bvec), (iter)))
 
-#define bvec_iter_bvec(bvec, iter) \
-((struct bio_vec) {\
-   .bv_page= bvec_iter_page((bvec), (iter)),   \
-   .bv_len = bvec_iter_len((bvec), (iter)),\
-   .bv_offset  = bvec_iter_offset((bvec), (iter)), \
-})
+static inline bool bvec_iter_bvec(struct bio_vec *bv, struct bio_vec *bvec,
+   struct bvec_iter *iter)
+{
+   unsigned int idx = iter->bi_idx;
+
+   if (!iter->bi_size)
+   return false;
+
+   while (!bv[idx].bv_len)
+   idx++;
+   iter->bi_idx = idx;
+
+   bv->bv_page = bvec_iter_page(bvec, *iter);
+   bv->bv_len = bvec_iter_len(bvec, *iter);
+   bv->bv_offset = bvec_iter_offset(bvec, *iter);
+
+   return true;
+}
 
 static inline bool bvec_iter_advance(const struct bio_vec *bv,
struct bvec_iter *iter, unsigned bytes)
@@ -119,8 +131,7 @@ static inline bool bvec_iter_advance(const struct bio_vec 
*bv,
 
 #define for_each_bvec(bvl, bio_vec, iter, start)   \
for (iter = (start);\
-(iter).bi_size &&  \
-   ((bvl = bvec_iter_bvec((bio_vec), (iter))), 1); \
+bvec_iter_bvec(&(bvl), (bio_vec), &(iter));\
 bvec_iter_advance((bio_vec), &(iter), (bvl).bv_len))
 
 /* for iterating one bio from start to end */

(I find the whole bvec handling a mess of confusing macros and would
welcome more of it being inline functions, in general).


Re: splice: infinite busy loop lockup bug

2020-08-07 Thread Ming Lei
On Fri, Aug 07, 2020 at 01:38:54PM +0100, Al Viro wrote:
> On Fri, Aug 07, 2020 at 01:27:27PM +0100, Al Viro wrote:
> > On Fri, Aug 07, 2020 at 07:35:08PM +0900, Tetsuo Handa wrote:
> > > syzbot is reporting hung task at pipe_release() [1], for for_each_bvec() 
> > > from
> > > iterate_bvec() from iterate_all_kinds() from iov_iter_alignment() from
> > > ext4_unaligned_io() from ext4_dio_write_iter() from 
> > > ext4_file_write_iter() from
> > > call_write_iter() from do_iter_readv_writev() from do_iter_write() from
> > > vfs_iter_write() from iter_file_splice_write() falls into infinite busy 
> > > loop
> > > with pipe->mutex held.
> > > 
> > > The reason of falling into infinite busy loop is that 
> > > iter_file_splice_write()
> > > for some reason generates "struct bio_vec" entry with .bv_len=0 and 
> > > .bv_offset=0
> > > while for_each_bvec() cannot handle .bv_len == 0.
> > 
> > broken in 1bdc76aea115 "iov_iter: use bvec iterator to implement 
> > iterate_bvec()",
> > unless I'm misreading it...
> > 
> > Zero-length segments are not disallowed; it's not all that hard to filter 
> > them
> > out in iter_file_splice_write(), but the intent had always been to have
> > iterate_all_kinds() et.al. able to cope with those.
> > 
> > How are these pipe_buffers with ->len == 0 generated in that reproducer, 
> > BTW?
> > There might be something else fishy going on...
> 
> FWIW, my preference would be to have for_each_bvec() advance past zero-length
> segments; I'll need to go through its uses elsewhere in the tree first, though
> (after I grab some sleep),

Usually the block layer doesn't allow/support zero-length bvecs; however,
we can make for_each_bvec() alone support them.

Tetsuo, can you try the following patch?

diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index ac0c7299d5b8..b03c793dd28d 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -117,11 +117,19 @@ static inline bool bvec_iter_advance(const struct bio_vec *bv,
return true;
 }
 
+static inline void bvec_iter_skip_zero_vec(const struct bio_vec *bv,
+   struct bvec_iter *iter)
+{
+   iter->bi_idx++;
+   iter->bi_bvec_done = 0;
+}
+
 #define for_each_bvec(bvl, bio_vec, iter, start)   \
for (iter = (start);\
 (iter).bi_size &&  \
-   ((bvl = bvec_iter_bvec((bio_vec), (iter))), 1); \
-bvec_iter_advance((bio_vec), &(iter), (bvl).bv_len))
+   ((bvl = bvec_iter_bvec((bio_vec), (iter))), 1); \
+ (bvl).bv_len ? bvec_iter_advance((bio_vec), &(iter), (bvl).bv_len) : \
+   bvec_iter_skip_zero_vec((bio_vec), &(iter)))
 
 /* for iterating one bio from start to end */
 #define BVEC_ITER_ALL_INIT (struct bvec_iter)  \

Thanks,
Ming



Re: splice: infinite busy loop lockup bug

2020-08-07 Thread Al Viro
On Fri, Aug 07, 2020 at 01:27:27PM +0100, Al Viro wrote:
> On Fri, Aug 07, 2020 at 07:35:08PM +0900, Tetsuo Handa wrote:
> > syzbot is reporting hung task at pipe_release() [1], for for_each_bvec() 
> > from
> > iterate_bvec() from iterate_all_kinds() from iov_iter_alignment() from
> > ext4_unaligned_io() from ext4_dio_write_iter() from ext4_file_write_iter() 
> > from
> > call_write_iter() from do_iter_readv_writev() from do_iter_write() from
> > vfs_iter_write() from iter_file_splice_write() falls into infinite busy loop
> > with pipe->mutex held.
> > 
> > The reason of falling into infinite busy loop is that 
> > iter_file_splice_write()
> > for some reason generates "struct bio_vec" entry with .bv_len=0 and 
> > .bv_offset=0
> > while for_each_bvec() cannot handle .bv_len == 0.
> 
> broken in 1bdc76aea115 "iov_iter: use bvec iterator to implement 
> iterate_bvec()",
> unless I'm misreading it...
> 
> Zero-length segments are not disallowed; it's not all that hard to filter them
> out in iter_file_splice_write(), but the intent had always been to have
> iterate_all_kinds() et.al. able to cope with those.
> 
> How are these pipe_buffers with ->len == 0 generated in that reproducer, BTW?
> There might be something else fishy going on...

FWIW, my preference would be to have for_each_bvec() advance past zero-length
segments; I'll need to go through its uses elsewhere in the tree first, though
(after I grab some sleep),


Re: splice: infinite busy loop lockup bug

2020-08-07 Thread Tetsuo Handa
On 2020/08/07 21:27, Al Viro wrote:
> On Fri, Aug 07, 2020 at 07:35:08PM +0900, Tetsuo Handa wrote:
>> syzbot is reporting hung task at pipe_release() [1], for for_each_bvec() from
>> iterate_bvec() from iterate_all_kinds() from iov_iter_alignment() from
>> ext4_unaligned_io() from ext4_dio_write_iter() from ext4_file_write_iter() 
>> from
>> call_write_iter() from do_iter_readv_writev() from do_iter_write() from
>> vfs_iter_write() from iter_file_splice_write() falls into infinite busy loop
>> with pipe->mutex held.
>>
>> The reason of falling into infinite busy loop is that 
>> iter_file_splice_write()
>> for some reason generates "struct bio_vec" entry with .bv_len=0 and 
>> .bv_offset=0
>> while for_each_bvec() cannot handle .bv_len == 0.
> 
> broken in 1bdc76aea115 "iov_iter: use bvec iterator to implement 
> iterate_bvec()",
> unless I'm misreading it...
> 
> Zero-length segments are not disallowed; it's not all that hard to filter them
> out in iter_file_splice_write(), but the intent had always been to have
> iterate_all_kinds() et.al. able to cope with those.
> 
> How are these pipe_buffers with ->len == 0 generated in that reproducer, BTW?
> There might be something else fishy going on...
> 

OK. Indeed, a write to an empty pipe that returns -EFAULT allows an empty
page to be linked into the pipe's array.

Now, I've just found a simple reproducer, and confirmed that this bug is
a local lockup DoS by an unprivileged user. Please fix.

--
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
        static char buffer[4096];
        const int fd = open("/tmp/testfile", O_WRONLY | O_CREAT, 0600);
        int pipe_fd[2] = { EOF, EOF };

        pipe(pipe_fd);
        write(pipe_fd[1], NULL, 4096);  /* fails with -EFAULT, but links an empty page into the pipe */
        write(pipe_fd[1], buffer, 4096);
        splice(pipe_fd[0], NULL, fd, NULL, 65536, 0);   /* spins with pipe->mutex held */
        return 0;
}
--

[  125.598898][C0] rcu: INFO: rcu_sched self-detected stall on CPU
[  125.601072][C0] rcu: 0-: (20171 ticks this GP) idle=526/1/0x4000 softirq=7918/7918 fqs=5136 
[  125.604874][C0]  (t=21006 jiffies g=9341 q=30)
[  125.606512][C0] NMI backtrace for cpu 0
[  125.607931][C0] CPU: 0 PID: 2792 Comm: a.out Not tainted 5.8.0+ #793
[  125.610948][C0] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
[  125.614938][C0] Call Trace:
[  125.616049][C0]  <IRQ>
[  125.617010][C0]  dump_stack+0x5e/0x7a
[  125.618370][C0]  nmi_cpu_backtrace.cold.7+0x14/0x52
[  125.620148][C0]  ? lapic_can_unplug_cpu.cold.39+0x3a/0x3a
[  125.622074][C0]  nmi_trigger_cpumask_backtrace+0x92/0x9f
[  125.624154][C0]  arch_trigger_cpumask_backtrace+0x14/0x20
[  125.626102][C0]  rcu_dump_cpu_stacks+0xa0/0xd0
[  125.627919][C0]  rcu_sched_clock_irq.cold.95+0x121/0x39c
[  125.629833][C0]  ? acct_account_cputime+0x17/0x20
[  125.631534][C0]  ? account_system_index_time+0x8a/0xa0
[  125.633422][C0]  update_process_times+0x23/0x60
[  125.635070][C0]  tick_sched_handle.isra.22+0x20/0x60
[  125.636870][C0]  tick_sched_timer+0x68/0x80
[  125.638403][C0]  ? tick_sched_handle.isra.22+0x60/0x60
[  125.640588][C0]  __hrtimer_run_queues+0xf9/0x1a0
[  125.642591][C0]  hrtimer_interrupt+0xfc/0x210
[  125.645033][C0]  __sysvec_apic_timer_interrupt+0x4c/0x60
[  125.647292][C0]  asm_call_on_stack+0xf/0x20
[  125.649192][C0]  
[  125.650501][C0]  sysvec_apic_timer_interrupt+0x75/0x80
[  125.652900][C0]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[  125.655487][C0] RIP: 0010:iov_iter_copy_from_user_atomic+0x19b/0x350
[  125.658124][C0] Code: 89 45 d0 48 c1 e6 06 48 03 37 4d 8d 3c 09 4c 89 cf e8 d9 e5 ff ff 48 8b 45 d0 45 39 eb 0f 87 35 01 00 00 49 8b 4a 18 4d 89 f9 <45> 29 dd 45 01 d8 75 12 eb 19 41 83 c4 01 41 29 c0 74 10 44 89 e0
[  125.666132][C0] RSP: 0018:a6cdc1237aa8 EFLAGS: 0246
[  125.668557][C0] RAX:  RBX: 1000 RCX: 945035a25100
[  125.671576][C0] RDX:  RSI:  RDI: 945035a25100
[  125.674851][C0] RBP: a6cdc1237ad8 R08:  R09: 945028a8
[  125.680990][C0] R10: a6cdc1237de0 R11:  R12: 
[  125.684006][C0] R13: 1000 R14: 1000 R15: 1000
[  125.684006][C0]  iomap_write_actor+0xbe/0x190
[  125.685982][C0]  ? iomap_write_begin+0x460/0x460
[  125.688031][C0]  iomap_apply+0xf4/0x1a0
[  125.689810][C0]  ? iomap_write_begin+0x460/0x460
[  125.691826][C0]  iomap_file_buffered_write+0x69/0x90
[  125.698598][C0]  ? iomap_write_begin+0x460/0x460
[  125.705341][C0]  xfs_file_buffered_aio_write+0xc2/0x2c0
[  125.707780][C0]  xfs_file_write_iter+0xa3/0xc0
[  125.709802][C0]  do_iter_readv_writev+0x15b/0x1c0
[  125.712496][C0]  do_iter_write+0x81/0x190
[  125.715245][C0] 

Re: splice: infinite busy loop lockup bug

2020-08-07 Thread Al Viro
On Fri, Aug 07, 2020 at 07:35:08PM +0900, Tetsuo Handa wrote:
> syzbot is reporting hung task at pipe_release() [1], for for_each_bvec() from
> iterate_bvec() from iterate_all_kinds() from iov_iter_alignment() from
> ext4_unaligned_io() from ext4_dio_write_iter() from ext4_file_write_iter() 
> from
> call_write_iter() from do_iter_readv_writev() from do_iter_write() from
> vfs_iter_write() from iter_file_splice_write() falls into infinite busy loop
> with pipe->mutex held.
> 
> The reason of falling into infinite busy loop is that iter_file_splice_write()
> for some reason generates "struct bio_vec" entry with .bv_len=0 and 
> .bv_offset=0
> while for_each_bvec() cannot handle .bv_len == 0.

broken in 1bdc76aea115 "iov_iter: use bvec iterator to implement 
iterate_bvec()",
unless I'm misreading it...

Zero-length segments are not disallowed; it's not all that hard to filter them
out in iter_file_splice_write(), but the intent had always been to have
iterate_all_kinds() et.al. able to cope with those.

How are these pipe_buffers with ->len == 0 generated in that reproducer, BTW?
There might be something else fishy going on...


splice: infinite busy loop lockup bug

2020-08-07 Thread Tetsuo Handa
syzbot is reporting hung task at pipe_release() [1], for for_each_bvec() from
iterate_bvec() from iterate_all_kinds() from iov_iter_alignment() from
ext4_unaligned_io() from ext4_dio_write_iter() from ext4_file_write_iter() from
call_write_iter() from do_iter_readv_writev() from do_iter_write() from
vfs_iter_write() from iter_file_splice_write() falls into infinite busy loop
with pipe->mutex held.

The reason for falling into the infinite busy loop is that iter_file_splice_write()
for some reason generates a "struct bio_vec" entry with .bv_len=0 and .bv_offset=0,
while for_each_bvec() cannot handle .bv_len == 0.
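
To see the stall in isolation, here is a minimal userspace model of the
iteration (the struct names and the advance() helper below are simplified
stand-ins for the kernel's struct bvec_iter and bvec_iter_advance(), not the
real definitions): advancing by a bv_len of zero never moves the iterator, so
the same empty segment is fetched again and again.

#include <stdbool.h>
#include <stdio.h>

struct bvec { unsigned int bv_len; };
struct bvec_iter { unsigned int bi_size, bi_idx, bi_bvec_done; };

/* same shape as the kernel's bvec_iter_advance(): consume 'bytes' */
static bool advance(const struct bvec *bv, struct bvec_iter *iter, unsigned int bytes)
{
    unsigned int idx = iter->bi_idx;

    iter->bi_size -= bytes;
    bytes += iter->bi_bvec_done;
    while (bytes && bytes >= bv[idx].bv_len) {
        bytes -= bv[idx].bv_len;
        idx++;
    }
    iter->bi_idx = idx;
    iter->bi_bvec_done = bytes;
    return true;
}

int main(void)
{
    /* array[0] models the empty page linked in by the failed write() */
    struct bvec array[2] = { { .bv_len = 0 }, { .bv_len = 4096 } };
    struct bvec_iter iter = { .bi_size = 4096, .bi_idx = 0, .bi_bvec_done = 0 };
    int step;

    for (step = 0; step < 4 && iter.bi_size; step++) {
        /* what for_each_bvec() would hand to the loop body */
        unsigned int len = array[iter.bi_idx].bv_len - iter.bi_bvec_done;

        printf("step %d: bi_idx=%u len=%u bi_size=%u\n",
               step, iter.bi_idx, len, iter.bi_size);
        advance(array, &iter, len);  /* len == 0: no progress */
    }
    return 0;
}

The step cap is only there so the model terminates; for_each_bvec() itself has
no such cap and keeps handing back the same zero-length segment.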

--- a/fs/splice.c
+++ b/fs/splice.c
@@ -747,6 +747,14 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
                }
 
                iov_iter_bvec(&from, WRITE, array, n, sd.total_len - left);
+               if (!strncmp(current->comm, "syz-executor", 12)) {
+                       int i;
+                       printk("Starting vfs_write_iter from.type=%d from.iov_offset=%zu from.count=%zu n=%u sd.total_len=%zu left=%zu\n",
+                              from.type, from.iov_offset, from.count, n, sd.total_len, left);
+                       for (i = 0; i < n; i++)
+                               printk("  array[%u]: bv_page=%px bv_len=%u bv_offset=%u\n",
+                                      i, array[i].bv_page, array[i].bv_len, array[i].bv_offset);
+               }
                ret = vfs_iter_write(out, &from, &sd.pos, 0);
                if (ret <= 0)
                        break;

When splice() from pipe to file works.

[   31.704915][ T6552] Starting vfs_write_iter from.type=17 from.iov_offset=0 from.count=4096 n=1 sd.total_len=65504 left=61408
[   31.709098][ T6552]   array[0]: bv_page=ea000870a7c0 bv_len=4096 bv_offset=0

When splice() from pipe to file falls into infinite busy loop.

[   31.717178][ T6553] Starting vfs_write_iter from.type=17 from.iov_offset=0 from.count=4096 n=2 sd.total_len=65504 left=61408
[   31.720983][ T6553]   array[0]: bv_page=ea0008706680 bv_len=0 bv_offset=0
[   31.723565][ T6553]   array[1]: bv_page=ea00086f4e80 bv_len=4096 bv_offset=0

Is it normal behavior that an empty page is linked to the pipe's array?
If yes, don't we need to skip empty pages when iter_file_splice_write() fills
in "struct bio_vec *array"?

[1] https://syzkaller.appspot.com/bug?id=2ccac875e85dc852911a0b5b788ada82dc0a081e