Re: [PATCH v5 01/13] mm: Fix the return type of __do_page_cache_readahead

2020-02-11 Thread Johannes Thumshirn
On 11/02/2020 02:05, Matthew Wilcox wrote:
> even though I'm pretty sure we're not going to readahead more than 2^32
> pages ever.

And 640K is more memory than anyone will ever need on a computer *scnr*



Re: [PATCH v5 01/13] mm: Fix the return type of __do_page_cache_readahead

2020-02-11 Thread Matthew Wilcox
On Tue, Feb 11, 2020 at 08:19:14AM +, Johannes Thumshirn wrote:
> On 11/02/2020 02:05, Matthew Wilcox wrote:
> > even though I'm pretty sure we're not going to readahead more than 2^32
> > pages ever.
> 
> And 640K is more memory than anyone will ever need on a computer *scnr*

Sure, but bandwidth just isn't increasing quickly enough to have
this make sense.  2^32 pages even on our smallest page size machines
is 16GB.  Right now, we cap readahead at just 256kB.  If we did try to
readahead 16GB, we'd be occupying a PCIe gen4 x4 drive for two seconds,
just satisfying this one readahead.  PCIe has historically doubled in
bandwidth every three years or so, so to get this down to something
reasonable like a hundredth of a second, we're looking at PCIe gen12 in
twenty years or so.  And I bet we still won't do it (also, I doubt PCIe
will continue doubling bandwidth every three years).

And Linus has forbidden individual IOs over 2GB anyway, so not happening
until he's forced to see the error of his ways ;-)


Re: [PATCH v5 04/13] mm: Add readahead address space operation

2020-02-11 Thread Matthew Wilcox
On Tue, Feb 11, 2020 at 03:52:30PM +1100, Dave Chinner wrote:
> > +struct readahead_control {
> > +   struct file *file;
> > +   struct address_space *mapping;
> > +/* private: use the readahead_* accessors instead */
> > +   pgoff_t start;
> > +   unsigned int nr_pages;
> > +   unsigned int batch_count;
> > +};
> > +
> > +static inline struct page *readahead_page(struct readahead_control *rac)
> > +{
> > +   struct page *page;
> > +
> > +   if (!rac->nr_pages)
> > +   return NULL;
> > +
> > +   page = xa_load(>mapping->i_pages, rac->start);
> > +   VM_BUG_ON_PAGE(!PageLocked(page), page);
> > +   rac->batch_count = hpage_nr_pages(page);
> > +   rac->start += rac->batch_count;
> 
> There's no mention of large page support in the patch description
> and I don't recall this sort of large page batching in previous
> iterations.
> 
> This seems like new functionality to me, not directly related to
> the initial ->readahead API change? What have I missed?

I had a crisis of confidence when I was working on this -- the loop
originally looked like this:

#define readahead_for_each(rac, page)   \
for (; (page = readahead_page(rac)); rac->nr_pages--)

and then I started thinking about what I'd need to do to support large
pages, and that turned into

#define readahead_for_each(rac, page)   \
for (; (page = readahead_page(rac));\
rac->nr_pages -= hpage_nr_pages(page))

but I realised that was potentially a use-after-free because 'page' has
certainly had put_page() called on it by then.  I had a brief period
where I looked at moving put_page() away from being the filesystem's
responsibility and into the iterator, but that would introduce more
changes into the patchset, as well as causing problems for filesystems
that want to break out of the loop.

By this point, I was also looking at the readahead_for_each_batch()
iterator that btrfs uses, and so we have the batch count anyway, and we
might as well use it to store the number of subpages of the large page.
And so it became easier to just put the whole ball of wax into the initial
patch set, rather than introduce the iterator now and then fix it up in
the patch set that I'm basing on this.

So yes, there's a certain amount of excess functionality in this patch
set ... I can remove it for the next release.


Re: [PATCH v5 04/13] mm: Add readahead address space operation

2020-02-11 Thread Dave Chinner
On Tue, Feb 11, 2020 at 04:54:13AM -0800, Matthew Wilcox wrote:
> On Tue, Feb 11, 2020 at 03:52:30PM +1100, Dave Chinner wrote:
> > > +struct readahead_control {
> > > + struct file *file;
> > > + struct address_space *mapping;
> > > +/* private: use the readahead_* accessors instead */
> > > + pgoff_t start;
> > > + unsigned int nr_pages;
> > > + unsigned int batch_count;
> > > +};
> > > +
> > > +static inline struct page *readahead_page(struct readahead_control *rac)
> > > +{
> > > + struct page *page;
> > > +
> > > + if (!rac->nr_pages)
> > > + return NULL;
> > > +
> > > + page = xa_load(>mapping->i_pages, rac->start);
> > > + VM_BUG_ON_PAGE(!PageLocked(page), page);
> > > + rac->batch_count = hpage_nr_pages(page);
> > > + rac->start += rac->batch_count;
> > 
> > There's no mention of large page support in the patch description
> > and I don't recall this sort of large page batching in previous
> > iterations.
> > 
> > This seems like new functionality to me, not directly related to
> > the initial ->readahead API change? What have I missed?
> 
> I had a crisis of confidence when I was working on this -- the loop
> originally looked like this:
> 
> #define readahead_for_each(rac, page)   \
> for (; (page = readahead_page(rac)); rac->nr_pages--)
> 
> and then I started thinking about what I'd need to do to support large
> pages, and that turned into
> 
> #define readahead_for_each(rac, page)   \
> for (; (page = readahead_page(rac));  \
>   rac->nr_pages -= hpage_nr_pages(page))
> 
> but I realised that was potentially a use-after-free because 'page' has
> certainly had put_page() called on it by then.  I had a brief period
> where I looked at moving put_page() away from being the filesystem's
> responsibility and into the iterator, but that would introduce more
> changes into the patchset, as well as causing problems for filesystems
> that want to break out of the loop.
> 
> By this point, I was also looking at the readahead_for_each_batch()
> iterator that btrfs uses, and so we have the batch count anyway, and we
> might as well use it to store the number of subpages of the large page.
> And so it became easier to just put the whole ball of wax into the initial
> patch set, rather than introduce the iterator now and then fix it up in
> the patch set that I'm basing on this.
> 
> So yes, there's a certain amount of excess functionality in this patch
> set ... I can remove it for the next release.

I'd say "Just document it" as that was the main reason I noticed it.
Or perhaps add the batching function as a stand-alone patch so it's
clear that the batch interface solves two problems at once - large
pages and the btrfs page batching implementation...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com