Re: a few questions about pagevc_lookup_entries

2019-06-24 Thread Liu Bo
On Mon, Jun 24, 2019 at 3:41 AM Jan Kara  wrote:
>
> On Mon 24-06-19 09:25:00, Miklos Szeredi wrote:
> > [cc: vivek, stefan, dgilbert]
> >
> > On Fri, Jun 21, 2019 at 12:04 AM Liu Bo  wrote:
> > >
> > > On Thu, Jun 20, 2019 at 1:36 AM Jan Kara  wrote:
> > > >
> > > > [added some relevant lists to CC - this can safe some people debugging 
> > > > by
> > > > being able to google this discussion]
> > > >
> > > > On Wed 19-06-19 15:57:38, Liu Bo wrote:
> > > > > I found a weird dead loop within invalidate_inode_pages2_range, the
> > > > > reason being that  pagevec_lookup_entries(index=1) returns an indices
> > > > > array which has only one entry storing value 0, and this has led
> > > > > invalidate_inode_pages2_range() to a dead loop, something like,
> > > > >
> > > > > invalidate_inode_pages2_range()
> > > > >   -> while (pagevec_lookup_entries(index=1, indices))
> > > > > ->  for (i = 0; i < pagevec_count(); i++) {
> > > > >   -> index = indices[0]; // index is set to 0
> > > > >   -> if (radix_tree_exceptional_entry(page)) {
> > > > >   -> if (!invalidate_exceptional_entry2()) //
> > > > >   ->__dax_invalidate_mapping_entry // return 0
> > > > >  -> // entry marked as PAGECACHE_TAG_DIRTY/TOWRITE
> > > > >  ret = -EBUSY;
> > > > >   ->continue;
> > > > >   } // end of if (radix_tree_exceptional_entry(page))
> > > > > -> index++; // index is set to 1
> > > > >
> > > > > The following debug[1] proved the above analysis,  I was wondering if
> > > > > this was a corner case that  pagevec_lookup_entries() allows or a
> > > > > known bug that has been fixed upstream?
> > > > >
> > > > > ps: the kernel in use is 4.19.30 (LTS).
> > > >
> > > > Hum, the above trace suggests you are using DAX. Are you really? 
> > > > Because the
> > > > stacktrace below shows we are working on fuse inode so that shouldn't
> > > > really be DAX inode...
> > > >
> > >
> > > So I was running tests against virtiofs[1] which adds dax support to
> > > fuse, with dax, fuse provides posix stuff while dax provides data
> > > channel.
> > >
> > > [1]: https://virtio-fs.gitlab.io/
> > > https://gitlab.com/virtio-fs/linux
>
> OK, thanks for the explanation and the pointer. So if I should guess, I'd
> say that there's some problem with multiorder entries (for PMD pages) in
> the radix tree. In particular if you lookup index 1 and there's
> multiorder entry for indices 0-511, radix_tree_next_chunk() is updating
> iter->index like:
>
> iter->index = (index &~ node_maxindex(node)) | (offset << node->shift);
>
> and offset is computed by radix_tree_descend() as:
>
> offset = (index >> parent->shift) & RADIX_TREE_MAP_MASK;
>
> So this all results in iter->index being set to 0 and thus confusing the
> iteration in invalidate_inode_pages2_range(). Current kernel has xarray
> code from Matthew which maintains originally passed index in xas.xa_index
> and thus the problem isn't there.
>
> So to sum up: Seems like a DAX-specific bug with PMD entries in older
> kernels fixed by xarray rewrite.
>

Thank you so much for the information, Jan.

I'll double check if that's the root cause and report back, if yes,
guess then we have to fix 4.19's radix tree in place to do the right
thing instead of porting back xarray rewrite..

thanks,
liubo

> Honza
> --
> Jan Kara 
> SUSE Labs, CR
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: a few questions about pagevc_lookup_entries

2019-06-24 Thread Jan Kara
On Mon 24-06-19 09:25:00, Miklos Szeredi wrote:
> [cc: vivek, stefan, dgilbert]
> 
> On Fri, Jun 21, 2019 at 12:04 AM Liu Bo  wrote:
> >
> > On Thu, Jun 20, 2019 at 1:36 AM Jan Kara  wrote:
> > >
> > > [added some relevant lists to CC - this can safe some people debugging by
> > > being able to google this discussion]
> > >
> > > On Wed 19-06-19 15:57:38, Liu Bo wrote:
> > > > I found a weird dead loop within invalidate_inode_pages2_range, the
> > > > reason being that  pagevec_lookup_entries(index=1) returns an indices
> > > > array which has only one entry storing value 0, and this has led
> > > > invalidate_inode_pages2_range() to a dead loop, something like,
> > > >
> > > > invalidate_inode_pages2_range()
> > > >   -> while (pagevec_lookup_entries(index=1, indices))
> > > > ->  for (i = 0; i < pagevec_count(); i++) {
> > > >   -> index = indices[0]; // index is set to 0
> > > >   -> if (radix_tree_exceptional_entry(page)) {
> > > >   -> if (!invalidate_exceptional_entry2()) //
> > > >   ->__dax_invalidate_mapping_entry // return 0
> > > >  -> // entry marked as PAGECACHE_TAG_DIRTY/TOWRITE
> > > >  ret = -EBUSY;
> > > >   ->continue;
> > > >   } // end of if (radix_tree_exceptional_entry(page))
> > > > -> index++; // index is set to 1
> > > >
> > > > The following debug[1] proved the above analysis,  I was wondering if
> > > > this was a corner case that  pagevec_lookup_entries() allows or a
> > > > known bug that has been fixed upstream?
> > > >
> > > > ps: the kernel in use is 4.19.30 (LTS).
> > >
> > > Hum, the above trace suggests you are using DAX. Are you really? Because 
> > > the
> > > stacktrace below shows we are working on fuse inode so that shouldn't
> > > really be DAX inode...
> > >
> >
> > So I was running tests against virtiofs[1] which adds dax support to
> > fuse, with dax, fuse provides posix stuff while dax provides data
> > channel.
> >
> > [1]: https://virtio-fs.gitlab.io/
> > https://gitlab.com/virtio-fs/linux

OK, thanks for the explanation and the pointer. So if I should guess, I'd
say that there's some problem with multiorder entries (for PMD pages) in
the radix tree. In particular if you lookup index 1 and there's
multiorder entry for indices 0-511, radix_tree_next_chunk() is updating
iter->index like:

iter->index = (index &~ node_maxindex(node)) | (offset << node->shift);

and offset is computed by radix_tree_descend() as:

offset = (index >> parent->shift) & RADIX_TREE_MAP_MASK;

So this all results in iter->index being set to 0 and thus confusing the
iteration in invalidate_inode_pages2_range(). Current kernel has xarray
code from Matthew which maintains originally passed index in xas.xa_index
and thus the problem isn't there.

So to sum up: Seems like a DAX-specific bug with PMD entries in older
kernels fixed by xarray rewrite.

Honza
-- 
Jan Kara 
SUSE Labs, CR
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: a few questions about pagevc_lookup_entries

2019-06-20 Thread Jan Kara
[added some relevant lists to CC - this can safe some people debugging by
being able to google this discussion]

On Wed 19-06-19 15:57:38, Liu Bo wrote:
> I found a weird dead loop within invalidate_inode_pages2_range, the
> reason being that  pagevec_lookup_entries(index=1) returns an indices
> array which has only one entry storing value 0, and this has led
> invalidate_inode_pages2_range() to a dead loop, something like,
> 
> invalidate_inode_pages2_range()
>   -> while (pagevec_lookup_entries(index=1, indices))
> ->  for (i = 0; i < pagevec_count(); i++) {
>   -> index = indices[0]; // index is set to 0
>   -> if (radix_tree_exceptional_entry(page)) {
>   -> if (!invalidate_exceptional_entry2()) //
>   ->__dax_invalidate_mapping_entry // return 0
>  -> // entry marked as PAGECACHE_TAG_DIRTY/TOWRITE
>  ret = -EBUSY;
>   ->continue;
>   } // end of if (radix_tree_exceptional_entry(page))
> -> index++; // index is set to 1
> 
> The following debug[1] proved the above analysis,  I was wondering if
> this was a corner case that  pagevec_lookup_entries() allows or a
> known bug that has been fixed upstream?
> 
> ps: the kernel in use is 4.19.30 (LTS).

Hum, the above trace suggests you are using DAX. Are you really? Because the
stacktrace below shows we are working on fuse inode so that shouldn't
really be DAX inode...

Honza

> [1]:
> $git diff
> diff --git a/mm/truncate.c b/mm/truncate.c
> index 71b65aab8077..82bfeeb53135 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -692,6 +692,7 @@ int invalidate_inode_pages2_range(struct
> address_space *mapping,
> struct page *page = pvec.pages[i];
> 
> /* We rely upon deletion not changing page->index */
> +   WARN_ONCE(index > indices[i], "index = %d
> indices[%d]=%d\n", index, i, indices[i]);
> index = indices[i];
> if (index > end)
> break;
> 
> [  129.095383] [ cut here ]
> [  129.096164] index = 1 indices[0]=0
> [  129.096786] WARNING: CPU: 0 PID: 3022 at mm/truncate.c:695
> invalidate_inode_pages2_range+0x471/0x500
> [  129.098234] Modules linked in:
> [  129.098717] CPU: 0 PID: 3022 Comm: doio Not tainted 4.19.30+ #4
> ...
> [  129.101288] RIP: 0010:invalidate_inode_pages2_range+0x471/0x500
> ...
> [  129.114162] Call Trace:
> [  129.114623]  ? __schedule+0x2ad/0x860
> [  129.115214]  ? prepare_to_wait_event+0x80/0x140
> [  129.115903]  ? finish_wait+0x3f/0x80
> [  129.116452]  ? request_wait_answer+0x13d/0x210
> [  129.117128]  ? remove_wait_queue+0x60/0x60
> [  129.117757]  ? make_kgid+0x13/0x20
> [  129.118277]  ? fuse_change_attributes_common+0x7d/0x130
> [  129.119057]  ? fuse_change_attributes+0x8d/0x120
> [  129.119754]  fuse_dentry_revalidate+0x2c5/0x300
> [  129.120456]  lookup_fast+0x237/0x2b0
> [  129.121018]  path_openat+0x15f/0x1380
> [  129.121614]  ? generic_update_time+0x6b/0xd0
> [  129.122316]  do_filp_open+0x91/0x100
> [  129.122876]  do_sys_open+0x126/0x210
> [  129.123453]  do_syscall_64+0x55/0x180
> [  129.124036]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  129.124820] RIP: 0033:0x7fbe0cd75e80
> ...
> [  129.134574] ---[ end trace c0fc0bbc5aebf0dc ]---
-- 
Jan Kara 
SUSE Labs, CR
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm