Re: [PATCH v4 00/36] Large pages in the page cache
Except for [PATCH v4 36/36], which I can't approve for obvious reasons: Reviewed-by: William Kucharski
Re: [PATCH v4 00/36] Large pages in the page cache
On Tue, May 26, 2020 at 09:07:51AM +1000, Dave Chinner wrote: > On Thu, May 21, 2020 at 08:05:53PM -0700, Matthew Wilcox wrote: > > On Fri, May 22, 2020 at 12:57:51PM +1000, Dave Chinner wrote: > > > Again, why is this dependent on THP? We can allocate compound pages > > > without using THP, so why only allow the page cache to use larger > > > pages when THP is configured? > > > > We have too many CONFIG options. My brain can't cope with adding > > CONFIG_LARGE_PAGES because then we might have neither THP nor LP, LP and > > not THP, THP and not LP or both THP and LP. And of course HUGETLBFS, > > which has its own special set of issues that one has to think about when > > dealing with the page cache. > > That sounds like something that should be fixed. :/ If I have to fix hugetlbfs before doing large pages in the page cache, we'll be five years away and at least two mental breakdowns. Honestly, I'd rather work on almost anything else. Some of the work I'm doing will help make hugetlbfs more similar to everything else, eventually, but ... no, not going to put all this on hold to fix hugetlbfs. Sorry. > Really, I don't care about the historical mechanisms that people can > configure large pages with. If the mm subsystem does not have a > unified abstraction and API for working with large pages, then that > is the first problem that needs to be addressed before other > subsystems start trying to use large pages. I think you're reading too quickly. Let me try again. Historically, Transparent Huge Pages have been PMD sized. They have also had a complicated interface to use. I am changing both those things; THPs may now be arbitrary order, and I'm adding interfaces to make THPs easier to work with. Now, if you want to contend that THPs are inextricably linked with PMD sizes and I need to choose a different name, I've been thinking about other options a bit. One would be 'lpage' for 'large page'. Another would be 'mop' for 'multi-order page'. We should not be seeing 'compound_order' in any filesystem code. Compound pages are an mm concept. They happen to be how THPs are implemented, but it'd be a layering violation to use them directly.
Re: [PATCH v4 00/36] Large pages in the page cache
On Thu, May 21, 2020 at 08:05:53PM -0700, Matthew Wilcox wrote: > On Fri, May 22, 2020 at 12:57:51PM +1000, Dave Chinner wrote: > > On Thu, May 21, 2020 at 05:04:11PM -0700, Matthew Wilcox wrote: > > > On Fri, May 22, 2020 at 08:49:06AM +1000, Dave Chinner wrote: > > > > Ok, so the main issue I have with the filesystem/iomap side of > > > > things is that it appears to be adding "transparent huge page" > > > > awareness to the filesysetm code, not "large page support". > > > > > > > > For people that aren't aware of the difference between the > > > > transparent huge and and a normal compound page (e.g. I have no idea > > > > what the difference is), this is likely to cause problems, > > > > especially as you haven't explained at all in this description why > > > > transparent huge pages are being used rather than bog standard > > > > compound pages. > > > > > > The primary reason to use a different name from compound_* > > > is so that it can be compiled out for systems that don't enable > > > CONFIG_TRANSPARENT_HUGEPAGE. So THPs are compound pages, as they always > > > have been, but for a filesystem, using thp_size() will compile to either > > > page_size() or PAGE_SIZE depending on CONFIG_TRANSPARENT_HUGEPAGE. > > > > Again, why is this dependent on THP? We can allocate compound pages > > without using THP, so why only allow the page cache to use larger > > pages when THP is configured? > > We have too many CONFIG options. My brain can't cope with adding > CONFIG_LARGE_PAGES because then we might have neither THP nor LP, LP and > not THP, THP and not LP or both THP and LP. And of course HUGETLBFS, > which has its own special set of issues that one has to think about when > dealing with the page cache. That sounds like something that should be fixed. :/ Really, I don't care about the historical mechanisms that people can configure large pages with. If the mm subsystem does not have a unified abstraction and API for working with large pages, then that is the first problem that needs to be addressed before other subsystems start trying to use large pages. i.e. a filesystem developer doesn't care how the mm subsystem is allocating/managing large pages, we just want to be able to treat large pages exactly the same way as we treat single pages. There should be exactly zero difference between them at the API level. > So, either large pages becomes part of the base kernel and you > always get them, or there's a CONFIG option to enable them and it's > CONFIG_TRANSPARENT_HUGEPAGE. I chose the latter. Please make the API part of the base kernel. Then you can hide all these whacky mm level config options behind it so that code that interacts with large pages just doesn't have to care about what type of large page infrastructure the user has configured. Cheers, Dave. -- Dave Chinner da...@fromorbit.com
Re: [PATCH v4 00/36] Large pages in the page cache
On Fri, May 22, 2020 at 12:57:51PM +1000, Dave Chinner wrote: > On Thu, May 21, 2020 at 05:04:11PM -0700, Matthew Wilcox wrote: > > On Fri, May 22, 2020 at 08:49:06AM +1000, Dave Chinner wrote: > > > Ok, so the main issue I have with the filesystem/iomap side of > > > things is that it appears to be adding "transparent huge page" > > > awareness to the filesysetm code, not "large page support". > > > > > > For people that aren't aware of the difference between the > > > transparent huge and and a normal compound page (e.g. I have no idea > > > what the difference is), this is likely to cause problems, > > > especially as you haven't explained at all in this description why > > > transparent huge pages are being used rather than bog standard > > > compound pages. > > > > The primary reason to use a different name from compound_* > > is so that it can be compiled out for systems that don't enable > > CONFIG_TRANSPARENT_HUGEPAGE. So THPs are compound pages, as they always > > have been, but for a filesystem, using thp_size() will compile to either > > page_size() or PAGE_SIZE depending on CONFIG_TRANSPARENT_HUGEPAGE. > > Again, why is this dependent on THP? We can allocate compound pages > without using THP, so why only allow the page cache to use larger > pages when THP is configured? We have too many CONFIG options. My brain can't cope with adding CONFIG_LARGE_PAGES because then we might have neither THP nor LP, LP and not THP, THP and not LP or both THP and LP. And of course HUGETLBFS, which has its own special set of issues that one has to think about when dealing with the page cache. So, either large pages becomes part of the base kernel and you always get them, or there's a CONFIG option to enable them and it's CONFIG_TRANSPARENT_HUGEPAGE. I chose the latter. I suppose what I'm saying is that a transparent hugepage can now be any size [1], not just PMD size. [1] power of two that isn't 1 because we use the third page for something-or-other.
Re: [PATCH v4 00/36] Large pages in the page cache
On Thu, May 21, 2020 at 05:04:11PM -0700, Matthew Wilcox wrote: > On Fri, May 22, 2020 at 08:49:06AM +1000, Dave Chinner wrote: > > Ok, so the main issue I have with the filesystem/iomap side of > > things is that it appears to be adding "transparent huge page" > > awareness to the filesysetm code, not "large page support". > > > > For people that aren't aware of the difference between the > > transparent huge and and a normal compound page (e.g. I have no idea > > what the difference is), this is likely to cause problems, > > especially as you haven't explained at all in this description why > > transparent huge pages are being used rather than bog standard > > compound pages. > > The primary reason to use a different name from compound_* > is so that it can be compiled out for systems that don't enable > CONFIG_TRANSPARENT_HUGEPAGE. So THPs are compound pages, as they always > have been, but for a filesystem, using thp_size() will compile to either > page_size() or PAGE_SIZE depending on CONFIG_TRANSPARENT_HUGEPAGE. Again, why is this dependent on THP? We can allocate compound pages without using THP, so why only allow the page cache to use larger pages when THP is configured? i.e. I don't know why this is dependent on THP because you haven't explained why this only works for THP and not just plain old compound pages > Now, maybe thp_size() is the wrong name, but then you need to suggest > a better name ;-) First you need to explain why THP is requirement for large pages in the page cache when most of the code changes I see only care if the page is a compound page or not Cheers, Dave. -- Dave Chinner da...@fromorbit.com
Re: [PATCH v4 00/36] Large pages in the page cache
On Fri, May 22, 2020 at 08:49:06AM +1000, Dave Chinner wrote: > Ok, so the main issue I have with the filesystem/iomap side of > things is that it appears to be adding "transparent huge page" > awareness to the filesysetm code, not "large page support". > > For people that aren't aware of the difference between the > transparent huge and and a normal compound page (e.g. I have no idea > what the difference is), this is likely to cause problems, > especially as you haven't explained at all in this description why > transparent huge pages are being used rather than bog standard > compound pages. The primary reason to use a different name from compound_* is so that it can be compiled out for systems that don't enable CONFIG_TRANSPARENT_HUGEPAGE. So THPs are compound pages, as they always have been, but for a filesystem, using thp_size() will compile to either page_size() or PAGE_SIZE depending on CONFIG_TRANSPARENT_HUGEPAGE. Now, maybe thp_size() is the wrong name, but then you need to suggest a better name ;-) > And, really, why should iomap or the filesystems care if the large > page is a THP or just a high order compound page? The interface > for operating on these things at the page cache level should be the > same. We already have page_size() and friends for operating on > high order compound pages, yet the iomap stuff has this new > thp_size() function instead of just using page_size(). THis is going > to lead to confusion and future bugs when people who don't know the > difference use the wrong page size function in their filesystem > code. There is no wrong function to use -- just one that expands to more code in the case where CONFIG_TRANSPARENT_HUGEPAGE is disabled.
Re: [PATCH v4 00/36] Large pages in the page cache
On Fri, May 15, 2020 at 06:16:20AM -0700, Matthew Wilcox wrote: > From: "Matthew Wilcox (Oracle)" > > This patch set does not pass xfstests. Test at your own risk. It is > based on the readahead rewrite which is in Andrew's tree. I've fixed a > lot of issues in the last two weeks, but generic/013 will still crash it. > > The primary idea here is that a large part of the overhead in dealing > with individual pages is that there's just so darned many of them. > We would be better off dealing with fewer, larger pages, even if they > don't get to be the size necessary for the CPU to use a larger TLB entry. Ok, so the main issue I have with the filesystem/iomap side of things is that it appears to be adding "transparent huge page" awareness to the filesysetm code, not "large page support". For people that aren't aware of the difference between the transparent huge and and a normal compound page (e.g. I have no idea what the difference is), this is likely to cause problems, especially as you haven't explained at all in this description why transparent huge pages are being used rather than bog standard compound pages. And, really, why should iomap or the filesystems care if the large page is a THP or just a high order compound page? The interface for operating on these things at the page cache level should be the same. We already have page_size() and friends for operating on high order compound pages, yet the iomap stuff has this new thp_size() function instead of just using page_size(). THis is going to lead to confusion and future bugs when people who don't know the difference use the wrong page size function in their filesystem code. So, really, the "large page" API presented to the filesystems via the page cache needs to be unified. Having to use compound_*() in some places, thp_* in others, then page_* and Page*, not to mention hpage_* just so that we can correctly support "large pages" is a total non-starter. Hence I'd suggest that this patch set needs to start by "hiding" all the differences between different types of pages behind a unified, consistent API, then it can introduce large page support into code outside the mm/ infrastructure via that unified API. I don't care what that API looks like so long as it is clear, consistenti, well documented and means filesystem developers don't need to know anything about how the page (large or not) is managed by the mm subsystem. Cheers, Dave. -- Dave Chinner da...@fromorbit.com
[PATCH v4 00/36] Large pages in the page cache
From: "Matthew Wilcox (Oracle)" This patch set does not pass xfstests. Test at your own risk. It is based on the readahead rewrite which is in Andrew's tree. I've fixed a lot of issues in the last two weeks, but generic/013 will still crash it. The primary idea here is that a large part of the overhead in dealing with individual pages is that there's just so darned many of them. We would be better off dealing with fewer, larger pages, even if they don't get to be the size necessary for the CPU to use a larger TLB entry. v4: - Fix thp_size typo - Fix the iomap page_mkwrite() path to operate on the head page, even though the vm_fault has a pointer to the tail page - Fix iomap_finish_ioend() to use bio_for_each_thp_segment_all() - Rework PageDoubleMap (see first two patches for details) - Fix page_cache_delete() to handle shadow entries being stored to a THP - Fix the assertion in pagecache_get_page() to handle tail pages - Change PageReadahead from NO_COMPOUND to ONLY_HEAD - Handle PageReadahead being set on head pages - Handle total_mapcount correctly (Kirill) - Pull the FS_LARGE_PAGES check out into mapping_large_pages() - Fix page size assumption in truncate_cleanup_page() - Avoid splitting large pages unnecessarily on truncate - Disable the page cache truncation introduced as part of the read-only THP patch set - Call compound_head() in iomap buffered write paths -- we retrieve a (potentially) tail page from the page cache and need to use that for flush_dcache_page(), but we expect to operate on a head page in most of the iomap code Kirill A. Shutemov (1): mm: Fix total_mapcount assumption of page size Matthew Wilcox (Oracle) (34): mm: Move PageDoubleMap bit mm: Simplify PageDoubleMap with PF_SECOND policy mm: Allow hpages to be arbitrary order mm: Introduce thp_size mm: Introduce thp_order mm: Introduce offset_in_thp fs: Add a filesystem flag for large pages fs: Do not update nr_thps for large page mappings fs: Introduce i_blocks_per_page fs: Make page_mkwrite_check_truncate thp-aware fs: Support THPs in zero_user_segments bio: Add bio_for_each_thp_segment_all iomap: Support arbitrarily many blocks per page iomap: Support large pages in iomap_adjust_read_range iomap: Support large pages in read paths iomap: Support large pages in write paths iomap: Inline data shouldn't see large pages iomap: Handle tail pages in iomap_page_mkwrite xfs: Support large pages mm: Make prep_transhuge_page return its argument mm: Add __page_cache_alloc_order mm: Allow large pages to be added to the page cache mm: Allow large pages to be removed from the page cache mm: Remove page fault assumption of compound page size mm: Avoid splitting large pages mm: Fix truncation for pages of arbitrary size mm: Support storing shadow entries for large pages mm: Support retrieving tail pages from the page cache mm: Support tail pages in wait_for_stable_page mm: Add DEFINE_READAHEAD mm: Make page_cache_readahead_unbounded take a readahead_control mm: Make __do_page_cache_readahead take a readahead_control mm: Allow PageReadahead to be set on head pages mm: Add large page readahead William Kucharski (1): mm: Align THP mappings for non-DAX drivers/nvdimm/btt.c | 4 +- drivers/nvdimm/pmem.c | 6 +- fs/ext4/verity.c | 4 +- fs/f2fs/verity.c | 4 +- fs/iomap/buffered-io.c | 121 +-- fs/jfs/jfs_metapage.c | 2 +- fs/xfs/xfs_aops.c | 4 +- fs/xfs/xfs_super.c | 2 +- include/linux/bio.h| 13 include/linux/bvec.h | 23 ++ include/linux/fs.h | 28 +-- include/linux/highmem.h| 15 +++- include/linux/huge_mm.h| 25 +-- include/linux/mm.h | 97 + include/linux/page-flags.h | 46 include/linux/pagemap.h| 96 +--- mm/filemap.c | 87 ++ mm/highmem.c | 62 +++- mm/huge_memory.c | 58 +++ mm/internal.h | 13 ++-- mm/memory.c| 7 +- mm/page-writeback.c| 1 + mm/page_io.c | 2 +- mm/page_vma_mapped.c | 4 +- mm/readahead.c | 145 - mm/truncate.c | 6 +- mm/vmscan.c| 5 +- 27 files changed, 565 insertions(+), 315 deletions(-) -- 2.26.2