Re: [PATCH v4 00/36] Large pages in the page cache

2020-05-28 Thread William Kucharski
Except for [PATCH v4 36/36], which I can't approve for obvious reasons:

Reviewed-by: William Kucharski 


Re: [PATCH v4 00/36] Large pages in the page cache

2020-05-25 Thread Matthew Wilcox
On Tue, May 26, 2020 at 09:07:51AM +1000, Dave Chinner wrote:
> On Thu, May 21, 2020 at 08:05:53PM -0700, Matthew Wilcox wrote:
> > On Fri, May 22, 2020 at 12:57:51PM +1000, Dave Chinner wrote:
> > > Again, why is this dependent on THP? We can allocate compound pages
> > > without using THP, so why only allow the page cache to use larger
> > > pages when THP is configured?
> > 
> > We have too many CONFIG options.  My brain can't cope with adding
> > CONFIG_LARGE_PAGES because then we might have neither THP nor LP, LP and
> > not THP, THP and not LP or both THP and LP.  And of course HUGETLBFS,
> > which has its own special set of issues that one has to think about when
> > dealing with the page cache.
> 
> That sounds like something that should be fixed. :/

If I have to fix hugetlbfs before doing large pages in the page cache,
we'll be five years away and at least two mental breakdowns.  Honestly,
I'd rather work on almost anything else.  Some of the work I'm doing
will help make hugetlbfs more similar to everything else, eventually,
but ... no, not going to put all this on hold to fix hugetlbfs.  Sorry.

> Really, I don't care about the historical mechanisms that people can
> configure large pages with. If the mm subsystem does not have a
> unified abstraction and API for working with large pages, then that
> is the first problem that needs to be addressed before other
> subsystems start trying to use large pages. 

I think you're reading too quickly.  Let me try again.

Historically, Transparent Huge Pages have been PMD sized.  They have also
had a complicated interface to use.  I am changing both those things;
THPs may now be arbitrary order, and I'm adding interfaces to make THPs
easier to work with.

Now, if you want to contend that THPs are inextricably linked with
PMD sizes and I need to choose a different name, I've been thinking
about other options a bit.  One would be 'lpage' for 'large page'.
Another would be 'mop' for 'multi-order page'.

We should not be seeing 'compound_order' in any filesystem code.
Compound pages are an mm concept.  They happen to be how THPs are
implemented, but it'd be a layering violation to use them directly.


Re: [PATCH v4 00/36] Large pages in the page cache

2020-05-25 Thread Dave Chinner
On Thu, May 21, 2020 at 08:05:53PM -0700, Matthew Wilcox wrote:
> On Fri, May 22, 2020 at 12:57:51PM +1000, Dave Chinner wrote:
> > On Thu, May 21, 2020 at 05:04:11PM -0700, Matthew Wilcox wrote:
> > > On Fri, May 22, 2020 at 08:49:06AM +1000, Dave Chinner wrote:
> > > > Ok, so the main issue I have with the filesystem/iomap side of
> > > > things is that it appears to be adding "transparent huge page"
> > > > awareness to the filesysetm code, not "large page support".
> > > > 
> > > > For people that aren't aware of the difference between the
> > > > transparent huge and and a normal compound page (e.g. I have no idea
> > > > what the difference is), this is likely to cause problems,
> > > > especially as you haven't explained at all in this description why
> > > > transparent huge pages are being used rather than bog standard
> > > > compound pages.
> > > 
> > > The primary reason to use a different name from compound_*
> > > is so that it can be compiled out for systems that don't enable
> > > CONFIG_TRANSPARENT_HUGEPAGE.  So THPs are compound pages, as they always
> > > have been, but for a filesystem, using thp_size() will compile to either
> > > page_size() or PAGE_SIZE depending on CONFIG_TRANSPARENT_HUGEPAGE.
> > 
> > Again, why is this dependent on THP? We can allocate compound pages
> > without using THP, so why only allow the page cache to use larger
> > pages when THP is configured?
> 
> We have too many CONFIG options.  My brain can't cope with adding
> CONFIG_LARGE_PAGES because then we might have neither THP nor LP, LP and
> not THP, THP and not LP or both THP and LP.  And of course HUGETLBFS,
> which has its own special set of issues that one has to think about when
> dealing with the page cache.

That sounds like something that should be fixed. :/

Really, I don't care about the historical mechanisms that people can
configure large pages with. If the mm subsystem does not have a
unified abstraction and API for working with large pages, then that
is the first problem that needs to be addressed before other
subsystems start trying to use large pages. 

i.e. a filesystem developer doesn't care how the mm subsystem is
allocating/managing large pages, we just want to be able to treat
large pages exactly the same way as we treat single pages. There
should be exactly zero difference between them at the API level.

> So, either large pages becomes part of the base kernel and you
> always get them, or there's a CONFIG option to enable them and it's
> CONFIG_TRANSPARENT_HUGEPAGE.  I chose the latter.

Please make the API part of the base kernel. Then you can hide all
these whacky mm level config options behind it so that code that
interacts with large pages just doesn't have to care about what type
of large page infrastructure the user has configured.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH v4 00/36] Large pages in the page cache

2020-05-21 Thread Matthew Wilcox
On Fri, May 22, 2020 at 12:57:51PM +1000, Dave Chinner wrote:
> On Thu, May 21, 2020 at 05:04:11PM -0700, Matthew Wilcox wrote:
> > On Fri, May 22, 2020 at 08:49:06AM +1000, Dave Chinner wrote:
> > > Ok, so the main issue I have with the filesystem/iomap side of
> > > things is that it appears to be adding "transparent huge page"
> > > awareness to the filesysetm code, not "large page support".
> > > 
> > > For people that aren't aware of the difference between the
> > > transparent huge and and a normal compound page (e.g. I have no idea
> > > what the difference is), this is likely to cause problems,
> > > especially as you haven't explained at all in this description why
> > > transparent huge pages are being used rather than bog standard
> > > compound pages.
> > 
> > The primary reason to use a different name from compound_*
> > is so that it can be compiled out for systems that don't enable
> > CONFIG_TRANSPARENT_HUGEPAGE.  So THPs are compound pages, as they always
> > have been, but for a filesystem, using thp_size() will compile to either
> > page_size() or PAGE_SIZE depending on CONFIG_TRANSPARENT_HUGEPAGE.
> 
> Again, why is this dependent on THP? We can allocate compound pages
> without using THP, so why only allow the page cache to use larger
> pages when THP is configured?

We have too many CONFIG options.  My brain can't cope with adding
CONFIG_LARGE_PAGES because then we might have neither THP nor LP, LP and
not THP, THP and not LP or both THP and LP.  And of course HUGETLBFS,
which has its own special set of issues that one has to think about when
dealing with the page cache.

So, either large pages becomes part of the base kernel and you
always get them, or there's a CONFIG option to enable them and it's
CONFIG_TRANSPARENT_HUGEPAGE.  I chose the latter.

I suppose what I'm saying is that a transparent hugepage can now be any
size [1], not just PMD size.

[1] power of two that isn't 1 because we use the third page for
something-or-other.


Re: [PATCH v4 00/36] Large pages in the page cache

2020-05-21 Thread Dave Chinner
On Thu, May 21, 2020 at 05:04:11PM -0700, Matthew Wilcox wrote:
> On Fri, May 22, 2020 at 08:49:06AM +1000, Dave Chinner wrote:
> > Ok, so the main issue I have with the filesystem/iomap side of
> > things is that it appears to be adding "transparent huge page"
> > awareness to the filesysetm code, not "large page support".
> > 
> > For people that aren't aware of the difference between the
> > transparent huge and and a normal compound page (e.g. I have no idea
> > what the difference is), this is likely to cause problems,
> > especially as you haven't explained at all in this description why
> > transparent huge pages are being used rather than bog standard
> > compound pages.
> 
> The primary reason to use a different name from compound_*
> is so that it can be compiled out for systems that don't enable
> CONFIG_TRANSPARENT_HUGEPAGE.  So THPs are compound pages, as they always
> have been, but for a filesystem, using thp_size() will compile to either
> page_size() or PAGE_SIZE depending on CONFIG_TRANSPARENT_HUGEPAGE.

Again, why is this dependent on THP? We can allocate compound pages
without using THP, so why only allow the page cache to use larger
pages when THP is configured?

i.e. I don't know why this is dependent on THP because you haven't
explained why this only works for THP and not just plain old
compound pages

> Now, maybe thp_size() is the wrong name, but then you need to suggest
> a better name ;-)

First you need to explain why THP is requirement for large pages in
the page cache when most of the code changes I see only care if the
page is a compound page or not

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH v4 00/36] Large pages in the page cache

2020-05-21 Thread Matthew Wilcox
On Fri, May 22, 2020 at 08:49:06AM +1000, Dave Chinner wrote:
> Ok, so the main issue I have with the filesystem/iomap side of
> things is that it appears to be adding "transparent huge page"
> awareness to the filesysetm code, not "large page support".
> 
> For people that aren't aware of the difference between the
> transparent huge and and a normal compound page (e.g. I have no idea
> what the difference is), this is likely to cause problems,
> especially as you haven't explained at all in this description why
> transparent huge pages are being used rather than bog standard
> compound pages.

The primary reason to use a different name from compound_*
is so that it can be compiled out for systems that don't enable
CONFIG_TRANSPARENT_HUGEPAGE.  So THPs are compound pages, as they always
have been, but for a filesystem, using thp_size() will compile to either
page_size() or PAGE_SIZE depending on CONFIG_TRANSPARENT_HUGEPAGE.

Now, maybe thp_size() is the wrong name, but then you need to suggest
a better name ;-)

> And, really, why should iomap or the filesystems care if the large
> page is a THP or just a high order compound page? The interface
> for operating on these things at the page cache level should be the
> same. We already have page_size() and friends for operating on
> high order compound pages, yet the iomap stuff has this new
> thp_size() function instead of just using page_size(). THis is going
> to lead to confusion and future bugs when people who don't know the
> difference use the wrong page size function in their filesystem
> code.

There is no wrong function to use -- just one that expands to more code
in the case where CONFIG_TRANSPARENT_HUGEPAGE is disabled.


Re: [PATCH v4 00/36] Large pages in the page cache

2020-05-21 Thread Dave Chinner
On Fri, May 15, 2020 at 06:16:20AM -0700, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" 
> 
> This patch set does not pass xfstests.  Test at your own risk.  It is
> based on the readahead rewrite which is in Andrew's tree.  I've fixed a
> lot of issues in the last two weeks, but generic/013 will still crash it.
> 
> The primary idea here is that a large part of the overhead in dealing
> with individual pages is that there's just so darned many of them.
> We would be better off dealing with fewer, larger pages, even if they
> don't get to be the size necessary for the CPU to use a larger TLB entry.

Ok, so the main issue I have with the filesystem/iomap side of
things is that it appears to be adding "transparent huge page"
awareness to the filesysetm code, not "large page support".

For people that aren't aware of the difference between the
transparent huge and and a normal compound page (e.g. I have no idea
what the difference is), this is likely to cause problems,
especially as you haven't explained at all in this description why
transparent huge pages are being used rather than bog standard
compound pages.

And, really, why should iomap or the filesystems care if the large
page is a THP or just a high order compound page? The interface
for operating on these things at the page cache level should be the
same. We already have page_size() and friends for operating on
high order compound pages, yet the iomap stuff has this new
thp_size() function instead of just using page_size(). THis is going
to lead to confusion and future bugs when people who don't know the
difference use the wrong page size function in their filesystem
code.

So, really, the "large page" API presented to the filesystems via
the page cache needs to be unified. Having to use compound_*() in
some places, thp_* in others, then page_* and Page*, not to mention
hpage_* just so that we can correctly support "large pages" is a
total non-starter.

Hence I'd suggest that this patch set needs to start by "hiding" all
the differences between different types of pages behind a unified,
consistent API, then it can introduce large page support into code
outside the mm/ infrastructure via that unified API. I don't care
what that API looks like so long as it is clear, consistenti, well
documented and means filesystem developers don't need to know
anything about how the page (large or not) is managed by the mm
subsystem.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


[PATCH v4 00/36] Large pages in the page cache

2020-05-15 Thread Matthew Wilcox
From: "Matthew Wilcox (Oracle)" 

This patch set does not pass xfstests.  Test at your own risk.  It is
based on the readahead rewrite which is in Andrew's tree.  I've fixed a
lot of issues in the last two weeks, but generic/013 will still crash it.

The primary idea here is that a large part of the overhead in dealing
with individual pages is that there's just so darned many of them.
We would be better off dealing with fewer, larger pages, even if they
don't get to be the size necessary for the CPU to use a larger TLB entry.

v4:
 - Fix thp_size typo
 - Fix the iomap page_mkwrite() path to operate on the head page, even
   though the vm_fault has a pointer to the tail page
 - Fix iomap_finish_ioend() to use bio_for_each_thp_segment_all()
 - Rework PageDoubleMap (see first two patches for details)
 - Fix page_cache_delete() to handle shadow entries being stored to a THP
 - Fix the assertion in pagecache_get_page() to handle tail pages
 - Change PageReadahead from NO_COMPOUND to ONLY_HEAD
 - Handle PageReadahead being set on head pages
 - Handle total_mapcount correctly (Kirill)
 - Pull the FS_LARGE_PAGES check out into mapping_large_pages()
 - Fix page size assumption in truncate_cleanup_page()
 - Avoid splitting large pages unnecessarily on truncate
 - Disable the page cache truncation introduced as part of the read-only
   THP patch set
 - Call compound_head() in iomap buffered write paths -- we retrieve a
   (potentially) tail page from the page cache and need to use that for
   flush_dcache_page(), but we expect to operate on a head page in most
   of the iomap code

Kirill A. Shutemov (1):
  mm: Fix total_mapcount assumption of page size

Matthew Wilcox (Oracle) (34):
  mm: Move PageDoubleMap bit
  mm: Simplify PageDoubleMap with PF_SECOND policy
  mm: Allow hpages to be arbitrary order
  mm: Introduce thp_size
  mm: Introduce thp_order
  mm: Introduce offset_in_thp
  fs: Add a filesystem flag for large pages
  fs: Do not update nr_thps for large page mappings
  fs: Introduce i_blocks_per_page
  fs: Make page_mkwrite_check_truncate thp-aware
  fs: Support THPs in zero_user_segments
  bio: Add bio_for_each_thp_segment_all
  iomap: Support arbitrarily many blocks per page
  iomap: Support large pages in iomap_adjust_read_range
  iomap: Support large pages in read paths
  iomap: Support large pages in write paths
  iomap: Inline data shouldn't see large pages
  iomap: Handle tail pages in iomap_page_mkwrite
  xfs: Support large pages
  mm: Make prep_transhuge_page return its argument
  mm: Add __page_cache_alloc_order
  mm: Allow large pages to be added to the page cache
  mm: Allow large pages to be removed from the page cache
  mm: Remove page fault assumption of compound page size
  mm: Avoid splitting large pages
  mm: Fix truncation for pages of arbitrary size
  mm: Support storing shadow entries for large pages
  mm: Support retrieving tail pages from the page cache
  mm: Support tail pages in wait_for_stable_page
  mm: Add DEFINE_READAHEAD
  mm: Make page_cache_readahead_unbounded take a readahead_control
  mm: Make __do_page_cache_readahead take a readahead_control
  mm: Allow PageReadahead to be set on head pages
  mm: Add large page readahead

William Kucharski (1):
  mm: Align THP mappings for non-DAX

 drivers/nvdimm/btt.c   |   4 +-
 drivers/nvdimm/pmem.c  |   6 +-
 fs/ext4/verity.c   |   4 +-
 fs/f2fs/verity.c   |   4 +-
 fs/iomap/buffered-io.c | 121 +--
 fs/jfs/jfs_metapage.c  |   2 +-
 fs/xfs/xfs_aops.c  |   4 +-
 fs/xfs/xfs_super.c |   2 +-
 include/linux/bio.h|  13 
 include/linux/bvec.h   |  23 ++
 include/linux/fs.h |  28 +--
 include/linux/highmem.h|  15 +++-
 include/linux/huge_mm.h|  25 +--
 include/linux/mm.h |  97 +
 include/linux/page-flags.h |  46 
 include/linux/pagemap.h|  96 +---
 mm/filemap.c   |  87 ++
 mm/highmem.c   |  62 +++-
 mm/huge_memory.c   |  58 +++
 mm/internal.h  |  13 ++--
 mm/memory.c|   7 +-
 mm/page-writeback.c|   1 +
 mm/page_io.c   |   2 +-
 mm/page_vma_mapped.c   |   4 +-
 mm/readahead.c | 145 -
 mm/truncate.c  |   6 +-
 mm/vmscan.c|   5 +-
 27 files changed, 565 insertions(+), 315 deletions(-)

-- 
2.26.2