[PATCHv1, RFC 18/33] HACK: block: bump BIO_MAX_PAGES

2016-07-25 Thread Kirill A. Shutemov
We are going to do IO a huge page at a time. For x86-64, it's 512 pages, so we need to double the current BIO_MAX_PAGES. To be portable to other architectures we need a more generic solution. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- include/linux/bio.h | 2 +-
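As a rough check of the arithmetic in this description, the sketch below computes how many base pages one huge page spans. The 4 KiB base page and 2 MiB huge page sizes are assumptions for x86-64, and `pages_per_huge_page()` and the `MODEL_*` macros are hypothetical names for illustration, not kernel symbols.

```c
#include <assert.h>

/* Assumed x86-64 sizes; other architectures differ. */
#define MODEL_PAGE_SIZE      (4UL * 1024)         /* 4 KiB base page */
#define MODEL_HPAGE_PMD_SIZE (2UL * 1024 * 1024)  /* 2 MiB huge page */

/* Hypothetical helper: number of base pages covered by one huge page. */
static unsigned long pages_per_huge_page(unsigned long hpage_size,
                                         unsigned long page_size)
{
    /* The huge page must be a whole multiple of the base page. */
    assert(page_size != 0 && hpage_size % page_size == 0);
    return hpage_size / page_size;
}
```

With the assumed sizes the helper returns 512, which is the page count a single bio would have to hold to carry one huge page.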

[PATCHv1, RFC 12/33] truncate: make sure invalidate_mapping_pages() can discard huge pages

2016-07-25 Thread Kirill A. Shutemov
-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- mm/truncate.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/mm/truncate.c b/mm/truncate.c index a01cce450a26..ce904e4b1708 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -504,10 +504,21 @@ unsigne

[PATCHv1, RFC 06/33] radix-tree: Handle multiorder entries being deleted by replace_clear_tags

2016-07-25 Thread Kirill A. Shutemov
From: Matthew Wilcox <wi...@infradead.org> radix_tree_replace_clear_tags() can be called with NULL as the replacement value; in this case we need to delete sibling entries which point to the slot. Signed-off-by: Matthew Wilcox <wi...@infradead.org> Signed-off-by: Kirill A. Shutemov &

[PATCHv1, RFC 08/33] Revert "radix-tree: implement radix_tree_maybe_preload_order()"

2016-07-25 Thread Kirill A. Shutemov
This reverts commit 356e1c23292a4f63cfdf1daf0e0ddada51f32de8. After conversion of huge tmpfs to multi-order entries, we don't need this anymore. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- include/linux/radix-tree.h | 1 - lib/radix-tree.c

[PATCHv1, RFC 28/33] ext4: handle huge pages in __ext4_block_zero_page_range()

2016-07-25 Thread Kirill A. Shutemov
As the function handles zeroing a range only within one block, the required changes are trivial: just remove the assumption on page size. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/ext4/inode.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --gi

[PATCHv1, RFC 20/33] thp: introduce hpage_size() and hpage_mask()

2016-07-25 Thread Kirill A. Shutemov
Introduce new helpers which return size/mask of the page: HPAGE_PMD_SIZE/HPAGE_PMD_MASK if the page is PageTransHuge() and PAGE_SIZE/PAGE_MASK otherwise. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- include/linux/huge_mm.h | 16 1 file chang
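The behaviour described for these helpers can be modelled in user space as below. This is only a sketch: `struct model_page` and its `trans_huge` flag stand in for the kernel's `struct page` and `PageTransHuge()`, and the `MODEL_*` constants assume 4 KiB base pages and 2 MiB PMD-sized huge pages.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed x86-64 values, named MODEL_* to avoid implying kernel definitions. */
#define MODEL_PAGE_SIZE      (1UL << 12)
#define MODEL_PAGE_MASK      (~(MODEL_PAGE_SIZE - 1))
#define MODEL_HPAGE_PMD_SIZE (1UL << 21)
#define MODEL_HPAGE_PMD_MASK (~(MODEL_HPAGE_PMD_SIZE - 1))

/* Hypothetical stand-in for struct page; the flag models PageTransHuge(). */
struct model_page {
    bool trans_huge;
};

/* Size of the page: huge size for a transparent huge page, base size otherwise. */
static unsigned long model_hpage_size(const struct model_page *page)
{
    return page->trans_huge ? MODEL_HPAGE_PMD_SIZE : MODEL_PAGE_SIZE;
}

/* Matching mask: clears the offset bits within the page. */
static unsigned long model_hpage_mask(const struct model_page *page)
{
    return page->trans_huge ? MODEL_HPAGE_PMD_MASK : MODEL_PAGE_MASK;
}
```

Callers can then write `addr & model_hpage_mask(page)` once instead of branching on the page type at every use.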

[PATCHv1, RFC 33/33] ext4, vfs: add huge= mount option

2016-07-25 Thread Kirill A. Shutemov
The same four values as in tmpfs case. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/ext4/ext4.h | 5 + fs/ext4/inode.c | 26 +- fs/ext4/super.c | 19 +++ 3 files changed, 45 insertions(+), 5 deletions(-) diff --gi

[PATCHv1, RFC 16/33] filemap: handle huge pages in filemap_fdatawait_range()

2016-07-25 Thread Kirill A. Shutemov
We write back a whole huge page at a time. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- mm/filemap.c | 5 + 1 file changed, 5 insertions(+) diff --git a/mm/filemap.c b/mm/filemap.c index ad73b99c5ba7..3d46db277e73 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@

[PATCHv1, RFC 21/33] fs: make block_read_full_page() be able to read huge page

2016-07-25 Thread Kirill A. Shutemov
The approach is straightforward: for compound pages we read out the whole huge page. For a huge page we cannot keep the array of buffer head pointers on the stack -- it's 4096 pointers on x86-64 -- so 'arr' is allocated with kmalloc() for huge pages. Signed-off-by: Kirill A. Shutemov <kirill.sh
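The 4096 figure can be reproduced with a small calculation. The sketch assumes a 2 MiB huge page and a 512-byte minimum block size (one buffer head per block); `max_buffers_per_page()` and the `MODEL_*` macros are hypothetical names, not kernel symbols.

```c
#include <assert.h>

/* Assumptions: 2 MiB huge page, 4 KiB base page, 512-byte minimum block. */
#define MODEL_HPAGE_PMD_SIZE (2UL * 1024 * 1024)
#define MODEL_PAGE_SIZE      (4UL * 1024)
#define MODEL_MIN_BLOCK_SIZE 512UL

/* Hypothetical helper: worst-case number of buffer heads for one page. */
static unsigned long max_buffers_per_page(unsigned long page_size,
                                          unsigned long block_size)
{
    /* One buffer head per block, so the page must divide evenly. */
    assert(block_size != 0 && page_size % block_size == 0);
    return page_size / block_size;
}
```

Under these assumptions a base page needs at most 8 pointers while a huge page needs 4096, which at 8 bytes per pointer is 32 KiB and explains the move from a stack array to kmalloc().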

[PATCHv1, RFC 26/33] ext4: make ext4_writepage() work on huge pages

2016-07-25 Thread Kirill A. Shutemov
Change ext4_writepage() and the underlying ext4_bio_write_page(). This basically removes the assumption on page size and infers it from struct page instead. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/ext4/inode.c | 10 +- fs/ext4/page-io.c | 11 +-- 2

[PATCHv1, RFC 22/33] fs: make block_write_{begin,end}() be able to handle huge pages

2016-07-25 Thread Kirill A. Shutemov
It's more or less straightforward. Most changes are around getting offset/len within the page right and zeroing out the desired part of the page. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/buffer.c | 53 +++-- 1 file c

[PATCHv1, RFC 11/33] thp: allow splitting non-shmem file-backed THPs

2016-07-25 Thread Kirill A. Shutemov
split_huge_page() is ready to handle file-backed huge pages; we only need to remove one guarding VM_BUG_ON_PAGE(). Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- mm/huge_memory.c | 1 - 1 file changed, 1 deletion(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c

[PATCHv1, RFC 29/33] ext4: handle huge pages in ext4_da_write_end()

2016-07-25 Thread Kirill A. Shutemov
Call ext4_da_should_update_i_disksize() for head page with offset relative to head page. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/ext4/inode.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c

[PATCHv1, RFC 24/33] truncate: make truncate_inode_pages_range() aware about huge pages

2016-07-25 Thread Kirill A. Shutemov
. With memory-mapped IO we would lose holes in some cases when we have THP in page cache, since we cannot track access on the 4k level in this case. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/buffer.c | 2 +- mm/truncate.

[PATCHv1, RFC 00/33] ext4: support of huge pages

2016-07-25 Thread Kirill A. Shutemov
?); - check if memory reclaim process is adequate for huge pages with backing storage (unnecessary split_huge_page() ?); - handle shadow entries properly; - encryption, 1k blocks, bigalloc, ... Kirill A. Shutemov (27): mm, shmem: swich huge tmpfs to multi-order radix-tree entries Revert

[PATCHv1, RFC 31/33] WIP: ext4: handle writeback with huge pages

2016-07-25 Thread Kirill A. Shutemov
Modify mpage_map_and_submit_buffers() to do writeback with huge pages. This is somewhat unstable. I have a hard time seeing the full picture yet. More work is required. Not-yet-signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/ext4/inode.

[PATCHv1, RFC 32/33] mm, fs, ext4: expand use of page_mapping() and page_to_pgoff()

2016-07-25 Thread Kirill A. Shutemov
With huge pages in page cache we see tail pages in more code paths. This patch replaces direct access to struct page fields with macros which can handle tail pages properly. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/buffer.c | 2 +- fs/ext4/i

[PATCHv1, RFC 04/33] radix-tree: Add radix_tree_split

2016-07-25 Thread Kirill A. Shutemov
n call radix_tree_for_each_slot() and radix_tree_replace_slot() in order to turn these retry entries into the intended new entries. Tags are replicated from the original multiorder entry into each new entry. Signed-off-by: Matthew Wilcox <wi...@linux.intel.com> Signed-off-by: Kirill

[PATCHv1, RFC 19/33] mm: make write_cache_pages() work on huge pages

2016-07-25 Thread Kirill A. Shutemov
We write back a whole huge page at a time. Let's adjust the iteration accordingly. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- include/linux/mm.h | 1 + include/linux/pagemap.h | 1 + mm/page-writeback.c | 17 - 3 files changed, 14 insertions

Re: [PATCHv2, 00/41] ext4: support of huge pages

2016-08-12 Thread Kirill A. Shutemov
On Fri, Aug 12, 2016 at 04:34:40PM -0400, Theodore Ts'o wrote: > On Fri, Aug 12, 2016 at 09:37:43PM +0300, Kirill A. Shutemov wrote: > > Here's stabilized version of my patchset which intended to bring huge pages > > to ext4. > > So this patch is more about mm level cha

[PATCHv2 22/41] thp: do not threat slab pages as huge in hpage_{nr_pages,size,mask}

2016-08-12 Thread Kirill A. Shutemov
-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- include/linux/huge_mm.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index de2789b4402c..5c5466ba37df 100644 --- a/include/linux/huge_mm.h +++ b/i

[PATCHv2 34/41] ext4: handle huge pages in ext4_da_write_end()

2016-08-12 Thread Kirill A. Shutemov
Call ext4_da_should_update_i_disksize() for head page with offset relative to head page. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/ext4/inode.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c

[PATCHv2 25/41] fs: make block_page_mkwrite() aware about huge pages

2016-08-12 Thread Kirill A. Shutemov
Adjust the check on whether part of the page is beyond file size and apply compound_head() and page_mapping() where appropriate. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/buffer.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/bu

[PATCHv2 28/41] mm, hugetlb: switch hugetlbfs to multi-order radix-tree entries

2016-08-12 Thread Kirill A. Shutemov
lated code have to be updated. Note that hugetlb_fault_mutex_hash() and reservation region handling are still working with hugepage offset. Signed-off-by: Naoya Horiguchi <n-horigu...@ah.jp.nec.com> [kirill.shute...@linux.intel.com: reject fixed] Signed-off-by: Kirill A. Shutemov <kirill.

[PATCHv2 37/41] ext4: make EXT4_IOC_MOVE_EXT work with huge pages

2016-08-12 Thread Kirill A. Shutemov
Adjust how we find relevant block within page and how we clear the required part of the page. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/ext4/move_extent.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/fs/ext4/move_extent.

[PATCHv2 17/41] filemap: handle huge pages in filemap_fdatawait_range()

2016-08-12 Thread Kirill A. Shutemov
We write back a whole huge page at a time. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- mm/filemap.c | 5 + 1 file changed, 5 insertions(+) diff --git a/mm/filemap.c b/mm/filemap.c index 93fa97f143ab..429f9a0962b3 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@

[PATCHv2 32/41] ext4: handle huge pages in __ext4_block_zero_page_range()

2016-08-12 Thread Kirill A. Shutemov
As the function handles zeroing a range only within one block, the required changes are trivial: just remove the assumption on page size. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/ext4/inode.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --gi

[PATCHv2 07/41] mm, shmem: swich huge tmpfs to multi-order radix-tree entries

2016-08-12 Thread Kirill A. Shutemov
-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- mm/filemap.c | 320 +-- mm/huge_memory.c | 47 +--- mm/khugepaged.c | 26 ++--- mm/shmem.c | 36 ++- 4 files changed, 247 insertions(+), 182 deletions(-) diff --gi

[PATCHv2 20/41] mm: make write_cache_pages() work on huge pages

2016-08-12 Thread Kirill A. Shutemov
We write back a whole huge page at a time. Let's adjust the iteration accordingly. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- include/linux/mm.h | 1 + include/linux/pagemap.h | 1 + mm/page-writeback.c | 17 - 3 files changed, 14 insertions

[PATCHv2 33/41] ext4: make ext4_block_write_begin() aware about huge pages

2016-08-12 Thread Kirill A. Shutemov
It simply matches changes to __block_write_begin_int(). Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/ext4/inode.c | 24 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index bee21f

[PATCHv2 30/41] ext4: make ext4_writepage() work on huge pages

2016-08-12 Thread Kirill A. Shutemov
Change ext4_writepage() and the underlying ext4_bio_write_page(). This basically removes the assumption on page size and infers it from struct page instead. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/ext4/inode.c | 10 +- fs/ext4/page-io.c | 11 +-- 2

[PATCHv2 03/41] radix-tree: Add radix_tree_join

2016-08-12 Thread Kirill A. Shutemov
e walk, but they will never see NULL for an index which was populated before the join. Signed-off-by: Matthew Wilcox <wi...@linux.intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- include/linux/radix-tree.h| 2 + lib/radix-tree.c

[PATCHv2 11/41] thp: try to free page's buffers before attempt split

2016-08-12 Thread Kirill A. Shutemov
them, before attempt split. And remove one guarding VM_BUG_ON_PAGE(). Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- include/linux/buffer_head.h | 1 + mm/huge_memory.c| 19 ++- 2 files changed, 19 insertions(+), 1 deletion(-) diff

[PATCHv2 08/41] Revert "radix-tree: implement radix_tree_maybe_preload_order()"

2016-08-12 Thread Kirill A. Shutemov
This reverts commit 356e1c23292a4f63cfdf1daf0e0ddada51f32de8. After conversion of huge tmpfs to multi-order entries, we don't need this anymore. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- include/linux/radix-tree.h | 1 - lib/radix-tree.c

[PATCHv2 09/41] page-flags: relax page flag policy for few flags

2016-08-12 Thread Kirill A. Shutemov
These flags are in use for filesystems with backing storage: PG_error, PG_writeback and PG_readahead. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- include/linux/page-flags.h | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/include

[PATCHv2 10/41] mm, rmap: account file thp pages

2016-08-12 Thread Kirill A. Shutemov
Let's add FileHugePages and FilePmdMapped fields into meminfo and smaps. It indicates how many times we allocate and map file THP. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- drivers/base/node.c| 6 ++ fs/proc/meminfo.c | 4 fs/proc/task

[PATCHv2 02/41] radix tree test suite: Allow GFP_ATOMIC allocations to fail

2016-08-12 Thread Kirill A. Shutemov
kernel include files. We also need the real definition of gfpflags_allow_blocking() to persuade the radix tree to actually use its preallocated nodes. Signed-off-by: Matthew Wilcox <wi...@linux.intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- tools/

[PATCHv2 13/41] truncate: make sure invalidate_mapping_pages() can discard huge pages

2016-08-12 Thread Kirill A. Shutemov
-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- mm/truncate.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/mm/truncate.c b/mm/truncate.c index a01cce450a26..ce904e4b1708 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -504,10 +504,21 @@ unsigne

[PATCHv2 15/41] filemap: handle huge pages in do_generic_file_read()

2016-08-12 Thread Kirill A. Shutemov
Most of the work happens on the head page. Only when we need to copy data to userspace do we find the relevant subpage. We are still limited by PAGE_SIZE per iteration. Lifting this limitation would require some more work. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- mm/fil

[PATCHv2 04/41] radix-tree: Add radix_tree_split

2016-08-12 Thread Kirill A. Shutemov
n call radix_tree_for_each_slot() and radix_tree_replace_slot() in order to turn these retry entries into the intended new entries. Tags are replicated from the original multiorder entry into each new entry. Signed-off-by: Matthew Wilcox <wi...@linux.intel.com> Signed-off-by: Kirill

[PATCHv2 01/41] tools: Add WARN_ON_ONCE

2016-08-12 Thread Kirill A. Shutemov
From: Matthew Wilcox <wi...@linux.intel.com> The radix tree uses its own buggy WARN_ON_ONCE. Replace it with the definition from asm-generic/bug.h Signed-off-by: Matthew Wilcox <wi...@linux.intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> ---

[PATCHv6 20/37] truncate: make truncate_inode_pages_range() aware about huge pages

2017-01-26 Thread Kirill A. Shutemov
. With memory-mapped IO we would lose holes in some cases when we have THP in page cache, since we cannot track access on the 4k level in this case. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/buffer.c| 2 +- include/linux/mm.h | 9 +- mm/truncate.c

[PATCHv6 08/37] filemap: handle huge pages in do_generic_file_read()

2017-01-26 Thread Kirill A. Shutemov
Most of the work happens on the head page. Only when we need to copy data to userspace do we find the relevant subpage. We are still limited by PAGE_SIZE per iteration. Lifting this limitation would require some more work. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- mm/fil

[PATCHv6 17/37] fs: make block_read_full_page() be able to read huge page

2017-01-26 Thread Kirill A. Shutemov
The approach is straightforward: for compound pages we read out the whole huge page. For a huge page we cannot keep the array of buffer head pointers on the stack -- it's 4096 pointers on x86-64 -- so 'arr' is allocated with kmalloc() for huge pages. Signed-off-by: Kirill A. Shutemov <kirill.sh

[PATCHv6 11/37] HACK: readahead: alloc huge pages, if allowed

2017-01-26 Thread Kirill A. Shutemov
HACK. That said, I don't think it should prevent huge page support from being applied. The future will show whether lacking readahead is a big deal with huge pages in page cache. Any suggestions are welcome. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- mm/readahead.

Re: [PATCHv6 06/37] thp: handle write-protection faults for file THP

2017-01-26 Thread Kirill A. Shutemov
On Thu, Jan 26, 2017 at 07:44:39AM -0800, Matthew Wilcox wrote: > On Thu, Jan 26, 2017 at 02:57:48PM +0300, Kirill A. Shutemov wrote: > > For filesystems that wants to be write-notified (has mkwrite), we will > > encount write-protection faults for huge PMDs in

[PATCHv6 36/37] mm, fs, ext4: expand use of page_mapping() and page_to_pgoff()

2017-01-26 Thread Kirill A. Shutemov
With huge pages in page cache we see tail pages in more code paths. This patch replaces direct access to struct page fields with macros which can handle tail pages properly. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/buffer.c | 2 +- fs/ext4/i

[PATCHv6 33/37] ext4: fix SEEK_DATA/SEEK_HOLE for huge pages

2017-01-26 Thread Kirill A. Shutemov
ext4_find_unwritten_pgoff() needs a few tweaks to work with huge pages. Mostly trivial page_mapping()/page_to_pgoff() and an adjustment to how we find the relevant block. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/ext4/file.c | 18 ++ 1 file chang

[PATCHv6 18/37] fs: make block_write_{begin,end}() be able to handle huge pages

2017-01-26 Thread Kirill A. Shutemov
It's more or less straightforward. Most changes are around getting offset/len within the page right and zeroing out the desired part of the page. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/buffer.c | 70 +++--

[PATCHv6 00/37] ext4: support of huge pages

2017-01-26 Thread Kirill A. Shutemov
: "Kirill A. Shutemov" <kirill.shute...@linux.intel.com> Date: Fri, 12 Aug 2016 19:44:30 +0300 Subject: [PATCH] Add few more configurations to test ext4 with huge pages Four new configurations: huge_4k, huge_1k, huge_bigalloc, huge_encrypt. Signed-off-by: Kirill A. Shutemov <kir

[PATCHv6 34/37] ext4: make fallocate() operations work with huge pages

2017-01-26 Thread Kirill A. Shutemov
__ext4_block_zero_page_range() is adjusted to calculate the starting iblock correctly for huge pages. ext4_{collapse,insert}_range() require page cache invalidation. We need the invalidation to be aligned to the huge page border if huge pages are possible in page cache. Signed-off-by: Kirill A. Shutemov

[PATCHv6 06/37] thp: handle write-protection faults for file THP

2017-01-26 Thread Kirill A. Shutemov
For filesystems that want to be write-notified (have mkwrite), we will encounter write-protection faults for huge PMDs in shared mappings. The easiest way to handle them is to clear the PMD and let it refault as writable. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> Re

[PATCHv6 29/37] ext4: handle huge pages in ext4_da_write_end()

2017-01-26 Thread Kirill A. Shutemov
Call ext4_da_should_update_i_disksize() for head page with offset relative to head page. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/ext4/inode.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c

[PATCHv6 35/37] ext4: reserve larger jounral transaction for huge pages

2017-01-26 Thread Kirill A. Shutemov
filesystem, but hopefully this change would be enough to address the concern. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/ext4/ext4_jbd2.h | 16 +--- fs/ext4/inode.c | 34 +++--- 2 files changed, 40 insertions(+), 10 del

[PATCHv6 23/37] mm: account huge pages to dirty, writaback, reclaimable, etc.

2017-01-26 Thread Kirill A. Shutemov
We need to account huge pages according to their size for background writeback to work properly. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/fs-writeback.c | 10 +++--- include/linux/backing-dev.h | 10 ++ include/linux/memcontrol.h

[PATCHv6 37/37] ext4, vfs: add huge= mount option

2017-01-26 Thread Kirill A. Shutemov
The same four values as in the tmpfs case. Encryption code is not yet ready to handle huge pages, so we disable huge page support if the inode has EXT4_INODE_ENCRYPT. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/ext4/ext4.h | 5 + fs/ext4/inode.

[PATCHv6 13/37] mm: make write_cache_pages() work on huge pages

2017-01-26 Thread Kirill A. Shutemov
We write back a whole huge page at a time. Let's adjust the iteration accordingly. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- include/linux/mm.h | 1 + include/linux/pagemap.h | 1 + mm/page-writeback.c | 17 - 3 files changed, 14 insertions

[PATCHv6 19/37] fs: make block_page_mkwrite() aware about huge pages

2017-01-26 Thread Kirill A. Shutemov
Adjust the check on whether part of the page is beyond file size and apply compound_head() and page_mapping() where appropriate. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/buffer.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/bu

[PATCHv6 32/37] ext4: make EXT4_IOC_MOVE_EXT work with huge pages

2017-01-26 Thread Kirill A. Shutemov
Adjust how we find relevant block within page and how we clear the required part of the page. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/ext4/move_extent.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/fs/ext4/move_extent.

[PATCHv6 30/37] ext4: make ext4_da_page_release_reservation() aware about huge pages

2017-01-26 Thread Kirill A. Shutemov
For huge pages 'stop' must be within HPAGE_PMD_SIZE. Let's use hpage_size() in the BUG_ON(). We also need to change how we calculate lblk for cluster deallocation. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/ext4/inode.c | 5 +++-- 1 file changed, 3 inse

[PATCHv6 22/37] mm, hugetlb: switch hugetlbfs to multi-order radix-tree entries

2017-01-26 Thread Kirill A. Shutemov
lated code have to be updated. Note that hugetlb_fault_mutex_hash() and reservation region handling are still working with hugepage offset. Signed-off-by: Naoya Horiguchi <n-horigu...@ah.jp.nec.com> [kirill.shute...@linux.intel.com: reject fixed] Signed-off-by: Kirill A. Shutemov <kirill.

[PATCHv6 04/37] mm, rmap: account file thp pages

2017-01-26 Thread Kirill A. Shutemov
Let's add FileHugePages and FilePmdMapped fields into meminfo and smaps. It indicates how many times we allocate and map file THP. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- drivers/base/node.c| 6 ++ fs/proc/meminfo.c | 4 fs/proc/task

[PATCHv6 09/37] filemap: allocate huge page in pagecache_get_page(), if allowed

2017-01-26 Thread Kirill A. Shutemov
The write path allocates pages using pagecache_get_page(). We should be able to allocate huge pages there, if it's allowed. As usual, fall back to small pages if that fails. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- mm/filemap.c | 17 +++-- 1 file chang

[PATCHv6 25/37] ext4: make ext4_writepage() work on huge pages

2017-01-26 Thread Kirill A. Shutemov
Change ext4_writepage() and the underlying ext4_bio_write_page(). This basically removes the assumption on page size and infers it from struct page instead. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/ext4/inode.c | 10 +- fs/ext4/page-io.c | 11 +-- 2

[PATCHv6 07/37] filemap: allocate huge page in page_cache_read(), if allowed

2017-01-26 Thread Kirill A. Shutemov
to accumulate information from shadow entries to return to the caller (average eviction time?). Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- include/linux/fs.h | 5 ++ include/linux/pagemap.h | 21 ++- mm/filemap.c

[PATCHv6 10/37] filemap: handle huge pages in filemap_fdatawait_range()

2017-01-26 Thread Kirill A. Shutemov
We write back a whole huge page at a time. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- mm/filemap.c | 5 + 1 file changed, 5 insertions(+) diff --git a/mm/filemap.c b/mm/filemap.c index 4e398d5e4134..f5cd654b3662 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@

[PATCHv6 15/37] thp: do not threat slab pages as huge in hpage_{nr_pages,size,mask}

2017-01-26 Thread Kirill A. Shutemov
-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- include/linux/huge_mm.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index e5c9c26d2439..5e6c408f5b47 100644 --- a/include/linux/huge_mm.h +++ b/i

[PATCHv6 26/37] ext4: handle huge pages in ext4_page_mkwrite()

2017-01-26 Thread Kirill A. Shutemov
Trivial: remove assumption on page size. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/ext4/inode.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 8d1b5e63cb15..a25be1cf4506 100644 --

[PATCHv6 16/37] thp: make thp_get_unmapped_area() respect S_HUGE_MODE

2017-01-26 Thread Kirill A. Shutemov
We want mmap(NULL) to return a PMD-aligned address if the inode can have huge pages in page cache. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- mm/huge_memory.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/mm/huge_memory.c b/mm/huge_me
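What "PMD-aligned" means for an address can be illustrated as follows. This sketch only shows the alignment arithmetic, assuming a 2 MiB PMD size; it does not reproduce how thp_get_unmapped_area() actually picks an address, and `pmd_align_up()` and `is_pmd_aligned()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed PMD-sized huge page on x86-64: 2 MiB. */
#define MODEL_HPAGE_PMD_SIZE (UINT64_C(1) << 21)

/* Hypothetical helper: true if addr sits on a 2 MiB boundary. */
static int is_pmd_aligned(uint64_t addr)
{
    return (addr & (MODEL_HPAGE_PMD_SIZE - 1)) == 0;
}

/* Hypothetical helper: round addr up to the next 2 MiB boundary. */
static uint64_t pmd_align_up(uint64_t addr)
{
    return (addr + MODEL_HPAGE_PMD_SIZE - 1) & ~(MODEL_HPAGE_PMD_SIZE - 1);
}
```

An address that passes `is_pmd_aligned()` lets a 2 MiB file extent be mapped with a single PMD entry; an unaligned mapping would have to fall back to base-page PTEs.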

[PATCHv6 28/37] ext4: make ext4_block_write_begin() aware about huge pages

2017-01-26 Thread Kirill A. Shutemov
It simply matches changes to __block_write_begin_int(). Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/ext4/inode.c | 35 +-- 1 file changed, 21 insertions(+), 14 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c

[PATCHv6 01/37] mm, shmem: swich huge tmpfs to multi-order radix-tree entries

2017-01-26 Thread Kirill A. Shutemov
to HPAGE_PMD_NR); This would provide balanced exposure of multi-order entries to the rest of the kernel. [1] find_get_pages(), pagecache_get_page(), pagevec_lookup(), etc. [2] find_get_entry(), find_get_entries(), pagevec_lookup_entries(), etc. Signed-off-by: Kirill A. Shutemov <kirill.sh

[PATCHv6 14/37] thp: introduce hpage_size() and hpage_mask()

2017-01-26 Thread Kirill A. Shutemov
Introduce new helpers which return size/mask of the page: HPAGE_PMD_SIZE/HPAGE_PMD_MASK if the page is PageTransHuge() and PAGE_SIZE/PAGE_MASK otherwise. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- include/linux/huge_mm.h | 16 1 file chang

Re: [PATCHv6 03/37] page-flags: relax page flag policy for few flags

2017-02-13 Thread Kirill A. Shutemov
On Wed, Feb 08, 2017 at 08:01:13PM -0800, Matthew Wilcox wrote: > On Thu, Jan 26, 2017 at 02:57:45PM +0300, Kirill A. Shutemov wrote: > > These flags are in use for filesystems with backing storage: PG_error, > > PG_writeback and PG_readahead. > > Oh ;-) Then I amend

Re: [PATCHv6 07/37] filemap: allocate huge page in page_cache_read(), if allowed

2017-02-13 Thread Kirill A. Shutemov
On Thu, Feb 09, 2017 at 01:18:35PM -0800, Matthew Wilcox wrote: > On Thu, Jan 26, 2017 at 02:57:49PM +0300, Kirill A. Shutemov wrote: > > Later we can add logic to accumulate information from shadow entires to > > return to caller (average eviction time?). > > I would sa

Re: [PATCHv6 08/37] filemap: handle huge pages in do_generic_file_read()

2017-02-13 Thread Kirill A. Shutemov
On Thu, Feb 09, 2017 at 01:55:05PM -0800, Matthew Wilcox wrote: > On Thu, Jan 26, 2017 at 02:57:50PM +0300, Kirill A. Shutemov wrote: > > +++ b/mm/filemap.c > > @@ -1886,6 +1886,7 @@ static ssize_t do_generic_file_read(struct file > > *filp, loff_t *ppos, > >

Re: [PATCHv6 01/37] mm, shmem: swich huge tmpfs to multi-order radix-tree entries

2017-02-09 Thread Kirill A. Shutemov
On Wed, Feb 08, 2017 at 07:57:27PM -0800, Matthew Wilcox wrote: > On Thu, Jan 26, 2017 at 02:57:43PM +0300, Kirill A. Shutemov wrote: > > +++ b/include/linux/pagemap.h > > @@ -332,6 +332,15 @@ static inline struct page > > *grab_cache_page_nowait(struct a

Re: [PATCHv6 01/37] mm, shmem: swich huge tmpfs to multi-order radix-tree entries

2017-02-13 Thread Kirill A. Shutemov
On Thu, Feb 09, 2017 at 07:58:20PM +0300, Kirill A. Shutemov wrote: > I'll look into it. I ended up with this (I'll test it more later): void filemap_map_pages(struct vm_fault *vmf, pgoff_t start_pgoff, pgoff_t end_pgoff) { struct radix_tree_iter iter; void **s

[PATCHv3 08/41] Revert "radix-tree: implement radix_tree_maybe_preload_order()"

2016-09-15 Thread Kirill A. Shutemov
This reverts commit 356e1c23292a4f63cfdf1daf0e0ddada51f32de8. After conversion of huge tmpfs to multi-order entries, we don't need this anymore. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- include/linux/radix-tree.h | 1 - lib/radix-tree.c

[PATCHv3 06/41] radix-tree: Handle multiorder entries being deleted by replace_clear_tags

2016-09-15 Thread Kirill A. Shutemov
From: Matthew Wilcox <wi...@infradead.org> radix_tree_replace_clear_tags() can be called with NULL as the replacement value; in this case we need to delete sibling entries which point to the slot. Signed-off-by: Matthew Wilcox <wi...@infradead.org> Signed-off-by: Kirill A. Shutemov &

[PATCHv3 27/41] truncate: make invalidate_inode_pages2_range() aware about huge pages

2016-09-15 Thread Kirill A. Shutemov
For huge pages we need to unmap the whole range covered by the huge page. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- mm/truncate.c | 27 +++ 1 file changed, 19 insertions(+), 8 deletions(-) diff --git a/mm/truncate.c b/mm/truncate.c
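The range covered by a huge page can be derived from any page index inside it. The sketch below assumes 512 base pages per huge page (the x86-64 value mentioned elsewhere in this series); `huge_range_start()` and `huge_range_end()` are hypothetical helpers, not functions from mm/truncate.c.

```c
#include <assert.h>

/* Assumed number of base pages per PMD-sized huge page on x86-64. */
#define MODEL_HPAGE_PMD_NR 512UL

/* Hypothetical helper: first page-cache index covered by the huge page. */
static unsigned long huge_range_start(unsigned long index)
{
    /* HPAGE_PMD_NR is a power of two, so masking rounds down. */
    return index & ~(MODEL_HPAGE_PMD_NR - 1);
}

/* Hypothetical helper: one past the last index covered (exclusive end). */
static unsigned long huge_range_end(unsigned long index)
{
    return huge_range_start(index) + MODEL_HPAGE_PMD_NR;
}
```

For example, an invalidation that hits index 1000 would need to unmap indices 512 through 1023 if that index belongs to a huge page.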

[PATCHv3 36/41] ext4: handle writeback with huge pages

2016-09-15 Thread Kirill A. Shutemov
Modify mpage_map_and_submit_buffers() and mpage_release_unused_pages() to deal with huge pages. Mostly the result of trial and error. A critical review would be appreciated. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/ext4/inode.

[PATCHv3 29/41] ext4: make ext4_mpage_readpages() hugepage-aware

2016-09-15 Thread Kirill A. Shutemov
This patch modifies ext4_mpage_readpages() to deal with huge pages. We read out 2M at once, so we have to alloc (HPAGE_PMD_NR * blocks_per_page) sector_t for that. I'm not entirely happy with kmalloc in this codepath, but don't see any other option. Signed-off-by: Kirill A. Shutemov
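To give a sense of the allocation size mentioned here, the sketch below evaluates HPAGE_PMD_NR * blocks_per_page for a given block size. It assumes 4 KiB base pages, 512 base pages per huge page, and an 8-byte sector_t; `model_sector_t` and the two helper functions are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions: 4 KiB base page, 512 base pages per huge page, 64-bit sector_t. */
#define MODEL_PAGE_SIZE    4096UL
#define MODEL_HPAGE_PMD_NR 512UL
typedef uint64_t model_sector_t;

/* Hypothetical helper: number of block-map entries for one huge page. */
static size_t huge_block_map_entries(unsigned long block_size)
{
    assert(block_size != 0 && MODEL_PAGE_SIZE % block_size == 0);
    unsigned long blocks_per_page = MODEL_PAGE_SIZE / block_size;
    return MODEL_HPAGE_PMD_NR * blocks_per_page;
}

/* Hypothetical helper: bytes needed to hold those entries. */
static size_t huge_block_map_bytes(unsigned long block_size)
{
    return huge_block_map_entries(block_size) * sizeof(model_sector_t);
}
```

Under these assumptions a filesystem with 4 KiB blocks needs 512 entries (4 KiB), and one with 1 KiB blocks needs 2048 entries (16 KiB), which is why the array is allocated dynamically.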

[PATCHv3 11/41] thp: try to free page's buffers before attempt split

2016-09-15 Thread Kirill A. Shutemov
them, before attempt split. And remove one guarding VM_BUG_ON_PAGE(). Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- include/linux/buffer_head.h | 1 + mm/huge_memory.c| 19 ++- 2 files changed, 19 insertions(+), 1 deletion(-) diff

[PATCHv3 24/41] fs: make block_write_{begin,end}() be able to handle huge pages

2016-09-15 Thread Kirill A. Shutemov
It's more or less straightforward. Most changes are around getting offset/len within the page right and zeroing out the desired part of the page. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/buffer.c | 53 +++-- 1 file c

[PATCHv3 33/41] ext4: make ext4_block_write_begin() aware about huge pages

2016-09-15 Thread Kirill A. Shutemov
It simply matches changes to __block_write_begin_int(). Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/ext4/inode.c | 24 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index a07c05

[PATCHv3 32/41] ext4: handle huge pages in __ext4_block_zero_page_range()

2016-09-15 Thread Kirill A. Shutemov
As the function handles zeroing a range only within one block, the required changes are trivial: just remove the assumption on page size. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/ext4/inode.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --gi

[PATCHv3 15/41] filemap: handle huge pages in do_generic_file_read()

2016-09-15 Thread Kirill A. Shutemov
Most of the work happens on the head page. Only when we need to copy data to userspace do we find the relevant subpage. We are still limited by PAGE_SIZE per iteration. Lifting this limitation would require some more work. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- mm/fil

[PATCHv3 35/41] ext4: make ext4_da_page_release_reservation() aware about huge pages

2016-09-15 Thread Kirill A. Shutemov
For huge pages 'stop' must be within HPAGE_PMD_SIZE. Let's use hpage_size() in the BUG_ON(). We also need to change how we calculate lblk for cluster deallocation. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/ext4/inode.c | 5 +++-- 1 file changed, 3 inse

[PATCHv3 26/41] truncate: make truncate_inode_pages_range() aware about huge pages

2016-09-15 Thread Kirill A. Shutemov
. With memory-mapped IO we would lose holes in some cases when we have THP in page cache, since we cannot track access on 4k level in this case. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/buffer.c | 2 +- mm/truncate.

[PATCHv3 17/41] filemap: handle huge pages in filemap_fdatawait_range()

2016-09-15 Thread Kirill A. Shutemov
We write back a whole huge page at a time. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- mm/filemap.c | 5 + 1 file changed, 5 insertions(+) diff --git a/mm/filemap.c b/mm/filemap.c index 05b42d3e5ed8..53da93156e60 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@

[PATCHv3 09/41] page-flags: relax page flag policy for few flags

2016-09-15 Thread Kirill A. Shutemov
These flags are in use for filesystems with backing storage: PG_error, PG_writeback and PG_readahead. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- include/linux/page-flags.h | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/include

[PATCHv3 10/41] mm, rmap: account file thp pages

2016-09-15 Thread Kirill A. Shutemov
Let's add FileHugePages and FilePmdMapped fields into meminfo and smaps. It indicates how many times we allocate and map file THP. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- drivers/base/node.c| 6 ++ fs/proc/meminfo.c | 4 fs/proc/task

[PATCHv3 05/41] radix-tree: Add radix_tree_split_preload()

2016-09-15 Thread Kirill A. Shutemov
r too few (checked by comparing nr_allocated before and after the call to radix_tree_split()). Signed-off-by: Matthew Wilcox <wi...@linux.intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- include/linux/radix-tree.h|

[PATCHv3 03/41] radix-tree: Add radix_tree_join

2016-09-15 Thread Kirill A. Shutemov
e walk, but they will never see NULL for an index which was populated before the join. Signed-off-by: Matthew Wilcox <wi...@linux.intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- include/linux/radix-tree.h| 2 + lib/radix-tree.c

[PATCHv3 04/41] radix-tree: Add radix_tree_split

2016-09-15 Thread Kirill A. Shutemov
n call radix_tree_for_each_slot() and radix_tree_replace_slot() in order to turn these retry entries into the intended new entries. Tags are replicated from the original multiorder entry into each new entry. Signed-off-by: Matthew Wilcox <wi...@linux.intel.com> Signed-off-by: Kirill

[PATCHv3 07/41] mm, shmem: switch huge tmpfs to multi-order radix-tree entries

2016-09-15 Thread Kirill A. Shutemov
-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- mm/filemap.c | 320 +-- mm/huge_memory.c | 47 +--- mm/khugepaged.c | 26 ++--- mm/shmem.c | 36 ++- 4 files changed, 247 insertions(+), 182 deletions(-) diff --gi

[PATCHv3 16/41] filemap: allocate huge page in pagecache_get_page(), if allowed

2016-09-15 Thread Kirill A. Shutemov
The write path allocates pages using pagecache_get_page(). We should be able to allocate huge pages there, if it's allowed. As usual, fall back to small pages if that fails. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- mm/filemap.c | 18 -- 1 file chang

[PATCHv3 22/41] thp: do not treat slab pages as huge in hpage_{nr_pages,size,mask}

2016-09-15 Thread Kirill A. Shutemov
-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- include/linux/huge_mm.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index de2789b4402c..5c5466ba37df 100644 --- a/include/linux/huge_mm.h +++ b/i

[PATCHv3 25/41] fs: make block_page_mkwrite() aware about huge pages

2016-09-15 Thread Kirill A. Shutemov
Adjust check on whether part of the page beyond file size and apply compound_head() and page_mapping() where appropriate. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> --- fs/buffer.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/bu

[PATCHv3 12/41] thp: handle write-protection faults for file THP

2016-09-15 Thread Kirill A. Shutemov
For filesystems that want to be write-notified (have mkwrite), we will encounter write-protection faults for huge PMDs in shared mappings. The easiest way to handle them is to clear the PMD and let it refault as writable. Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> -
