Re: [PATCH v5 04/13] mm: Add readahead address space operation

2020-02-14 Thread Matthew Wilcox
On Thu, Feb 13, 2020 at 09:36:25PM -0800, John Hubbard wrote:
> > +static inline struct page *readahead_page(struct readahead_control *rac)
> > +{
> > +   struct page *page;
> > +
> > +   if (!rac->nr_pages)
> > +   return NULL;
> > +
> > +   page = xa_load(&rac->mapping->i_pages, rac->start);
> 
> 
> Is it worth asserting that the page was found:
> 
>   VM_BUG_ON_PAGE(!page || xa_is_value(page), page);
> 
> ? Or is that overkill here?

It shouldn't be possible since they were just added in a locked state.
If it did happen, it'll be caught by the assert below -- dereferencing
a NULL pointer or a shadow entry is not going to go well.

> > +   VM_BUG_ON_PAGE(!PageLocked(page), page);
> > +   rac->batch_count = hpage_nr_pages(page);
> > +   rac->start += rac->batch_count;
> 
> The above was surprising, until I saw the other thread with Dave and you.
> I was reviewing this patchset in order to have a chance at understanding the 
> follow-on patchset ("Large pages in the page cache"), and it seems like that
> feature has a solid head start here. :)  

Right, I'll document that.


Re: [PATCH v5 04/13] mm: Add readahead address space operation

2020-02-13 Thread John Hubbard
On 2/10/20 5:03 PM, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" 
> 
> This replaces ->readpages with a saner interface:
>  - Return void instead of an ignored error code.
>  - Pages are already in the page cache when ->readahead is called.
>  - Implementation looks up the pages in the page cache instead of
>having them passed in a linked list.
> 
> Signed-off-by: Matthew Wilcox (Oracle) 
> ---
>  Documentation/filesystems/locking.rst |  6 ++-
>  Documentation/filesystems/vfs.rst     | 13 +++
>  include/linux/fs.h                    |  2 +
>  include/linux/pagemap.h               | 54 +++
>  mm/readahead.c                        | 48 ++--
>  5 files changed, 102 insertions(+), 21 deletions(-)
> 

A minor question below, but either way you can add:

Reviewed-by: John Hubbard 



> diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
> index 5057e4d9dcd1..0ebc4491025a 100644
> --- a/Documentation/filesystems/locking.rst
> +++ b/Documentation/filesystems/locking.rst
> @@ -239,6 +239,7 @@ prototypes::
>   int (*readpage)(struct file *, struct page *);
>   int (*writepages)(struct address_space *, struct writeback_control *);
>   int (*set_page_dirty)(struct page *page);
> + void (*readahead)(struct readahead_control *);
>   int (*readpages)(struct file *filp, struct address_space *mapping,
>   struct list_head *pages, unsigned nr_pages);
>   int (*write_begin)(struct file *, struct address_space *mapping,
> @@ -271,7 +272,8 @@ writepage:      yes, unlocks (see below)
>  readpage:       yes, unlocks
>  writepages:
>  set_page_dirty   no
> -readpages:
> +readahead:   yes, unlocks
> +readpages:   no
>  write_begin: locks the page   exclusive
>  write_end:   yes, unlocks exclusive
>  bmap:
> @@ -295,6 +297,8 @@ the request handler (/dev/loop).
>  ->readpage() unlocks the page, either synchronously or via I/O
>  completion.
>  
> +->readahead() unlocks the pages like ->readpage().
> +
>  ->readpages() populates the pagecache with the passed pages and starts
>  I/O against them.  They come unlocked upon I/O completion.
>  
> diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
> index 7d4d09dd5e6d..cabee16b7406 100644
> --- a/Documentation/filesystems/vfs.rst
> +++ b/Documentation/filesystems/vfs.rst
> @@ -706,6 +706,7 @@ cache in your filesystem.  The following members are defined:
>   int (*readpage)(struct file *, struct page *);
>   int (*writepages)(struct address_space *, struct writeback_control *);
>   int (*set_page_dirty)(struct page *page);
> + void (*readahead)(struct readahead_control *);
>   int (*readpages)(struct file *filp, struct address_space *mapping,
>struct list_head *pages, unsigned nr_pages);
>   int (*write_begin)(struct file *, struct address_space *mapping,
> @@ -781,12 +782,24 @@ cache in your filesystem.  The following members are defined:
>   If defined, it should set the PageDirty flag, and the
>   PAGECACHE_TAG_DIRTY tag in the radix tree.
>  
> +``readahead``
> + Called by the VM to read pages associated with the address_space
> + object.  The pages are consecutive in the page cache and are
> + locked.  The implementation should decrement the page refcount
> + after starting I/O on each page.  Usually the page will be
> + unlocked by the I/O completion handler.  If the function does
> + not attempt I/O on some pages, the caller will decrement the page
> + refcount and unlock the pages for you.  Set PageUptodate if the
> + I/O completes successfully.  Setting PageError on any page will
> + be ignored; simply unlock the page if an I/O error occurs.
> +
>  ``readpages``
>   called by the VM to read pages associated with the address_space
>   object.  This is essentially just a vector version of readpage.
>   Instead of just one page, several pages are requested.
>   readpages is only used for read-ahead, so read errors are
>   ignored.  If anything goes wrong, feel free to give up.
> +     This interface is deprecated; implement readahead instead.
>  
>  ``write_begin``
>   Called by the generic buffered write code to ask the filesystem
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 3cd4fe6b845e..d4e2d2964346 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -292,6 +292,7 @@ enum positive_aop_returns {
>  struct page;
>  struct address_space;
>  struct writeback_control;
> +struct readahead_control;
>  
>  /*
>   * Write life time hint values.
> @@ -375,6 +376,7 @@ struct address_space_operations {
>*/
>   int (*readpages)(struct file *filp, struct address_space *mapping,
>   

Re: [PATCH v5 04/13] mm: Add readahead address space operation

2020-02-12 Thread Christoph Hellwig
On Mon, Feb 10, 2020 at 05:03:39PM -0800, Matthew Wilcox wrote:
> +struct readahead_control {
> + struct file *file;
> + struct address_space *mapping;
> +/* private: use the readahead_* accessors instead */
> + pgoff_t start;
> + unsigned int nr_pages;
> + unsigned int batch_count;

We often use __ prefixes for the private fields to make that a little
more clear.
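A sketch of what that suggestion would look like, compiled as a user-space fragment. The double-underscore field names and the two accessors are illustrative, not taken from this patch:

```c
#include <assert.h>

typedef unsigned long pgoff_t;	/* stand-in for the kernel typedef */
struct file;			/* opaque here, as in the kernel headers */
struct address_space;

struct readahead_control {
	struct file *file;
	struct address_space *mapping;
/* private: use the readahead_* accessors instead */
	pgoff_t __start;
	unsigned int __nr_pages;
	unsigned int __batch_count;
};

/* Accessors keep callers away from the double-underscore fields. */
static inline pgoff_t readahead_index(const struct readahead_control *rac)
{
	return rac->__start;
}

static inline unsigned int readahead_count(const struct readahead_control *rac)
{
	return rac->__nr_pages;
}
```

The prefix makes any direct `rac->__start` access in a filesystem stand out in review, which is the "little more clear" being asked for.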


Re: [PATCH v5 04/13] mm: Add readahead address space operation

2020-02-11 Thread Dave Chinner
On Tue, Feb 11, 2020 at 04:54:13AM -0800, Matthew Wilcox wrote:
> On Tue, Feb 11, 2020 at 03:52:30PM +1100, Dave Chinner wrote:
> > > +struct readahead_control {
> > > + struct file *file;
> > > + struct address_space *mapping;
> > > +/* private: use the readahead_* accessors instead */
> > > + pgoff_t start;
> > > + unsigned int nr_pages;
> > > + unsigned int batch_count;
> > > +};
> > > +
> > > +static inline struct page *readahead_page(struct readahead_control *rac)
> > > +{
> > > + struct page *page;
> > > +
> > > + if (!rac->nr_pages)
> > > + return NULL;
> > > +
> > > + page = xa_load(&rac->mapping->i_pages, rac->start);

> > > + VM_BUG_ON_PAGE(!PageLocked(page), page);
> > > + rac->batch_count = hpage_nr_pages(page);
> > > + rac->start += rac->batch_count;
> > 
> > There's no mention of large page support in the patch description
> > and I don't recall this sort of large page batching in previous
> > iterations.
> > 
> > This seems like new functionality to me, not directly related to
> > the initial ->readahead API change? What have I missed?
> 
> I had a crisis of confidence when I was working on this -- the loop
> originally looked like this:
> 
> #define readahead_for_each(rac, page)   \
> for (; (page = readahead_page(rac)); rac->nr_pages--)
> 
> and then I started thinking about what I'd need to do to support large
> pages, and that turned into
> 
> #define readahead_for_each(rac, page)   \
> for (; (page = readahead_page(rac));  \
>   rac->nr_pages -= hpage_nr_pages(page))
> 
> but I realised that was potentially a use-after-free because 'page' has
> certainly had put_page() called on it by then.  I had a brief period
> where I looked at moving put_page() away from being the filesystem's
> responsibility and into the iterator, but that would introduce more
> changes into the patchset, as well as causing problems for filesystems
> that want to break out of the loop.
> 
> By this point, I was also looking at the readahead_for_each_batch()
> iterator that btrfs uses, and so we have the batch count anyway, and we
> might as well use it to store the number of subpages of the large page.
> And so it became easier to just put the whole ball of wax into the initial
> patch set, rather than introduce the iterator now and then fix it up in
> the patch set that I'm basing on this.
> 
> So yes, there's a certain amount of excess functionality in this patch
> set ... I can remove it for the next release.

I'd say "Just document it" as that was the main reason I noticed it.
Or perhaps add the batching function as a stand-alone patch so it's
clear that the batch interface solves two problems at once - large
pages and the btrfs page batching implementation...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH v5 04/13] mm: Add readahead address space operation

2020-02-11 Thread Matthew Wilcox
On Tue, Feb 11, 2020 at 03:52:30PM +1100, Dave Chinner wrote:
> > +struct readahead_control {
> > +   struct file *file;
> > +   struct address_space *mapping;
> > +/* private: use the readahead_* accessors instead */
> > +   pgoff_t start;
> > +   unsigned int nr_pages;
> > +   unsigned int batch_count;
> > +};
> > +
> > +static inline struct page *readahead_page(struct readahead_control *rac)
> > +{
> > +   struct page *page;
> > +
> > +   if (!rac->nr_pages)
> > +   return NULL;
> > +
> > +   page = xa_load(&rac->mapping->i_pages, rac->start);
> > +   VM_BUG_ON_PAGE(!PageLocked(page), page);
> > +   rac->batch_count = hpage_nr_pages(page);
> > +   rac->start += rac->batch_count;
> 
> There's no mention of large page support in the patch description
> and I don't recall this sort of large page batching in previous
> iterations.
> 
> This seems like new functionality to me, not directly related to
> the initial ->readahead API change? What have I missed?

I had a crisis of confidence when I was working on this -- the loop
originally looked like this:

#define readahead_for_each(rac, page)   \
for (; (page = readahead_page(rac)); rac->nr_pages--)

and then I started thinking about what I'd need to do to support large
pages, and that turned into

#define readahead_for_each(rac, page)   \
for (; (page = readahead_page(rac));\
rac->nr_pages -= hpage_nr_pages(page))

but I realised that was potentially a use-after-free because 'page' has
certainly had put_page() called on it by then.  I had a brief period
where I looked at moving put_page() away from being the filesystem's
responsibility and into the iterator, but that would introduce more
changes into the patchset, as well as causing problems for filesystems
that want to break out of the loop.

By this point, I was also looking at the readahead_for_each_batch()
iterator that btrfs uses, and so we have the batch count anyway, and we
might as well use it to store the number of subpages of the large page.
And so it became easier to just put the whole ball of wax into the initial
patch set, rather than introduce the iterator now and then fix it up in
the patch set that I'm basing on this.

So yes, there's a certain amount of excess functionality in this patch
set ... I can remove it for the next release.


Re: [PATCH v5 04/13] mm: Add readahead address space operation

2020-02-10 Thread Dave Chinner
On Mon, Feb 10, 2020 at 05:03:39PM -0800, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" 
> 
> This replaces ->readpages with a saner interface:
>  - Return void instead of an ignored error code.
>  - Pages are already in the page cache when ->readahead is called.
>  - Implementation looks up the pages in the page cache instead of
>having them passed in a linked list.
> 
> Signed-off-by: Matthew Wilcox (Oracle) 



>  
> +/*
> + * Readahead is of a block of consecutive pages.
> + */
> +struct readahead_control {
> + struct file *file;
> + struct address_space *mapping;
> +/* private: use the readahead_* accessors instead */
> + pgoff_t start;
> + unsigned int nr_pages;
> + unsigned int batch_count;
> +};
> +
> +static inline struct page *readahead_page(struct readahead_control *rac)
> +{
> + struct page *page;
> +
> + if (!rac->nr_pages)
> + return NULL;
> +
> + page = xa_load(&rac->mapping->i_pages, rac->start);
> + VM_BUG_ON_PAGE(!PageLocked(page), page);
> + rac->batch_count = hpage_nr_pages(page);
> + rac->start += rac->batch_count;

There's no mention of large page support in the patch description
and I don't recall this sort of large page batching in previous
iterations.

This seems like new functionality to me, not directly related to
the initial ->readahead API change? What have I missed?

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com