[PATCH RFC 0/1] hugetlbfs: fix truncate/fault races

2018-10-07 Thread Mike Kravetz
following patch describes the current race in detail and adds the mutex to prevent truncate/fault races. Mike Kravetz (1): hugetlbfs: introduce truncation/fault mutex to avoid races fs/hugetlbfs/inode.c| 24 include/linux/hugetlb.h | 1 + mm/hugetlb.c| 25

[PATCH RFC 1/1] hugetlbfs: introduce truncation/fault mutex to avoid races

2018-10-07 Thread Mike Kravetz
ation takes in write mode. Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c| 24 include/linux/hugetlb.h | 1 + mm/hugetlb.c| 25 +++-- mm/userfaultfd.c| 8 +++- 4 files changed, 47 insertions(+), 11 deletions(-) diff

[PATCH] hugetlbfs: fix kernel BUG at fs/hugetlbfs/inode.c:444!

2018-11-05 Thread Mike Kravetz
iated page. This is how we end up with an elevated map count. To solve, check the dst_pte entry for huge_pte_none. If !none, this implies PMD sharing so do not copy. Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 23 +++ 1 file changed, 19 insertions(+), 4 deletions(-) diff

Re: [PATCH] hugetlbfs: fix kernel BUG at fs/hugetlbfs/inode.c:444!

2018-11-05 Thread Mike Kravetz
On 11/5/18 1:30 PM, Andrew Morton wrote: > On Mon, 5 Nov 2018 13:23:15 -0800 Mike Kravetz > wrote: > >> This bug has been experienced several times by Oracle DB team. >> The BUG is in the routine remove_inode_hugepages() as follows: >> /* >> * If

Re: [PATCH] hugetlbfs: dirty pages as they are added to pagecache

2018-10-23 Thread Mike Kravetz
On 10/23/18 12:43 AM, Michal Hocko wrote: > On Wed 17-10-18 21:10:22, Mike Kravetz wrote: >> Some test systems were experiencing negative huge page reserve >> counts and incorrect file block counts. This was traced to >> /proc/sys/vm/drop_caches removing clean pages f

[PATCH RFC v2 1/1] hugetlbfs: use i_mmap_rwsem for pmd sharing and truncate/fault sync

2018-10-23 Thread Mike Kravetz
d in read mode after huge_pte_alloc, until the caller is finished with the returned ptep. Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 21 ++ mm/hugetlb.c | 65 +--- mm/rmap.c| 10 +++ mm/userfaultfd.c |

[PATCH RFC v2 0/1] hugetlbfs: Use i_mmap_rwsem for pmd share and fault/trunc

2018-10-23 Thread Mike Kravetz
worse. This leads to bad things such as incorrect page map/reference counts or invalid memory references. Fix this all by modifying the usage of i_mmap_rwsem to cover fault/truncate races as well as handling of shared pmds. Mike Kravetz (1): hugetlbfs: use i_mmap_rwsem for pmd sharing and tru

Re: [PATCH] hugetlbfs: dirty pages as they are added to pagecache

2018-10-18 Thread Mike Kravetz
On 10/18/18 4:08 PM, Andrew Morton wrote: > On Wed, 17 Oct 2018 21:10:22 -0700 Mike Kravetz > wrote: > >> Some test systems were experiencing negative huge page reserve >> counts and incorrect file block counts. This was traced to >> /proc/sys/vm/drop_caches removing

Re: [PATCH] hugetlbfs: dirty pages as they are added to pagecache

2018-10-18 Thread Mike Kravetz
On 10/18/18 6:47 PM, Andrew Morton wrote: > On Thu, 18 Oct 2018 20:46:21 -0400 Andrea Arcangeli > wrote: > >> On Thu, Oct 18, 2018 at 04:16:40PM -0700, Mike Kravetz wrote: >>> I was not sure about this, and expected someone could come up with >>> something

Re: [PATCH v2 1/2] mm: fix race on soft-offlining free huge pages

2018-07-17 Thread Mike Kravetz
workflow above. With the suggested changes, I think this is OK for huge pages. However, it seems that setting HWPoison on an in-use non-huge page could cause issues? While looking at the code, I noticed this comment in __get_any_page() /* * When the target page is a free hugepage, just remove it * from free hugepage list. */ Did that apply to some code that was removed? It does not seem to make any sense in that routine. -- Mike Kravetz

Re: [PATCH v2 1/2] mm: fix race on soft-offlining free huge pages

2018-07-17 Thread Mike Kravetz
On 07/17/2018 06:28 PM, Naoya Horiguchi wrote: > On Tue, Jul 17, 2018 at 01:10:39PM -0700, Mike Kravetz wrote: >> It seems that soft_offline_free_page can be called for in use pages. >> Certainly, that is the case in the first workflow above. With the >> suggested changes, I

Re: [External] Re: [PATCH v20 6/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page

2021-04-20 Thread Mike Kravetz
On 4/20/21 1:46 AM, Muchun Song wrote: > On Tue, Apr 20, 2021 at 7:20 AM Mike Kravetz wrote: >> >> On 4/15/21 1:40 AM, Muchun Song wrote: >>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h >>> index 0abed7e766b8..6e970a7d3480 100644 >>&g

Re: [External] Re: [PATCH v15 4/8] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page

2021-02-16 Thread Mike Kravetz
s may not be too bad in the case of freeing a single page, but would become more complex when doing bulk freeing. After a little thought, the workqueue approach may even end up simpler. However, I would suggest a very simple workqueue implementation with non-blocking allocations. If we can not quickly get vmemmap pages, put the page back on the hugetlb free list and treat as a surplus page. -- Mike Kravetz

Re: [RFC PATCH] mm, oom: introduce vm.sacrifice_hugepage_on_oom

2021-02-16 Thread Mike Kravetz
caused a DOS scenario as Michal suggested. However, this is an 'opt in' feature. So, I would not expect anyone who carefully plans the size of their hugetlb pool to enable such a feature. If there is a use case where hugetlb pages are used in a non-essential application, this might be of use. -- Mike Kravetz

Re: [PATCH 3/4] mm/hugeltb: fix potential wrong gbl_reserve value for hugetlb_acct_memory()

2021-04-07 Thread Mike Kravetz
On 4/7/21 12:24 AM, Miaohe Lin wrote: > Hi: > On 2021/4/7 10:49, Mike Kravetz wrote: >> On 4/2/21 2:32 AM, Miaohe Lin wrote: >>> The resv_map could be NULL since this routine can be called in the evict >>> inode path for all hugetlbfs inodes. So we could have chg = 0

Re: [PATCH 2/4] mm/hugeltb: simplify the return code of __vma_reservation_common()

2021-04-07 Thread Mike Kravetz
On 4/6/21 8:09 PM, Miaohe Lin wrote: > On 2021/4/7 10:37, Mike Kravetz wrote: >> On 4/6/21 7:05 PM, Miaohe Lin wrote: >>> Hi: >>> On 2021/4/7 8:53, Mike Kravetz wrote: >>>> On 4/2/21 2:32 AM, Miaohe Lin wrote: >>>>> It's guarant

Re: [PATCH v4 0/8] make hugetlb put_page safe for all calling contexts

2021-04-07 Thread Mike Kravetz
ing you suggest. Please do not start until we get an Ack from Oscar as he will need to participate. Remove patches for this series in your tree from Mike Kravetz: - hugetlb: add lockdep_assert_held() calls for hugetlb_lock - hugetlb: fix irq locking omissions - hugetlb: make free_huge_page irq safe -

Re: [PATCH 2/4] mm/hugeltb: simplify the return code of __vma_reservation_common()

2021-04-08 Thread Mike Kravetz
On 4/7/21 7:44 PM, Miaohe Lin wrote: > On 2021/4/8 5:23, Mike Kravetz wrote: >> On 4/6/21 8:09 PM, Miaohe Lin wrote: >>> On 2021/4/7 10:37, Mike Kravetz wrote: >>>> On 4/6/21 7:05 PM, Miaohe Lin wrote: >>>>> Hi: >>>>> On 2021/4/7 8:53, Mi

Re: [PATCH 3/4] mm/hugeltb: fix potential wrong gbl_reserve value for hugetlb_acct_memory()

2021-04-08 Thread Mike Kravetz
On 4/7/21 8:26 PM, Miaohe Lin wrote: > On 2021/4/8 11:24, Miaohe Lin wrote: >> On 2021/4/8 4:53, Mike Kravetz wrote: >>> On 4/7/21 12:24 AM, Miaohe Lin wrote: >>>> Hi: >>>> On 2021/4/7 10:49, Mike Kravetz wrote: >>>>> On 4/2/21 2:32 AM,

Re: [PATCH 4/4] mm/hugeltb: handle the error case in hugetlb_fix_reserve_counts()

2021-04-08 Thread Mike Kravetz
e if (!rsv_adjust) { > + reserved = true; > } > + > + if (!reserved) > + pr_warn("hugetlb: fix reserve count failed\n"); We should expand this warning message a bit to indicate what this may mean to the user. Add something like "Huge Page Reserved count may go negative". -- Mike Kravetz

Re: [PATCH 3/4] mm/hugeltb: fix potential wrong gbl_reserve value for hugetlb_acct_memory()

2021-04-08 Thread Mike Kravetz
On 4/8/21 8:01 PM, Miaohe Lin wrote: > On 2021/4/9 6:53, Mike Kravetz wrote: >> >> Yes, add a comment to hugetlb_unreserve_pages saying that !resv_map >> implies freed == 0. >> > > Sounds good! > >> It would also be helpful to check for (

RT scheduling: wakeup bug?

2007-10-01 Thread Mike Kravetz
I've been trying to track down some unexpected realtime latencies and believe one source is a bug in the wakeup code. Specifically, this is within the try_to_wake_up() routine. Within this routine there is the following code segment: /* * If a newly woken up RT task cannot preem

Re: [PATCH V2 4/4] hugetlbfs: document min_size mount option

2015-03-20 Thread Mike Kravetz
On 03/18/2015 07:23 PM, Andrew Morton wrote: On Wed, 18 Mar 2015 18:51:22 -0700 Mike Kravetz wrote: Nowhere here is the reader told the units of "size". We should at least describe that, and maybe even rename the thing to min_bytes. Ok, I will add that the size is in unit of

[PATCH V3 3/4] hugetlbfs: accept subpool min_size mount option and setup accordingly

2015-03-20 Thread Mike Kravetz
s specified, then at mount time an attempt is made to reserve min_size pages. If the reservation fails, the mount fails. At umount time, the reserved pages are released. Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c| 90 ++--- include/linux/h
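The mount-time behaviour described in this snippet (reserve min_size pages at mount, fail the mount if the reservation fails, release the pages at umount) can be pictured with a small userspace model. This is only a sketch: `toy_pool`, `toy_mount()` and `toy_umount()` are hypothetical names invented for illustration, not kernel symbols, and the `-ENOMEM` return value is an assumption rather than something the snippet states.

```c
#include <errno.h>

/* Toy userspace model of a huge page pool; not a kernel structure. */
struct toy_pool {
    long free_pages;      /* pages still available in the global pool */
    long mount_reserved;  /* pages set aside for the mounted filesystem */
};

/* Try to reserve min_pages at "mount" time; the mount fails if it cannot. */
int toy_mount(struct toy_pool *pool, long min_pages)
{
    if (min_pages > pool->free_pages)
        return -ENOMEM;   /* illustrative error code, an assumption */
    pool->free_pages -= min_pages;
    pool->mount_reserved = min_pages;
    return 0;
}

/* Give the reserved pages back at "umount" time. */
void toy_umount(struct toy_pool *pool)
{
    pool->free_pages += pool->mount_reserved;
    pool->mount_reserved = 0;
}
```

The model keeps a single reservation per pool for brevity; the real subpool accounting in the series is spread across the other patches.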

[PATCH V3 0/4] hugetlbfs: add min_size filesystem mount option

2015-03-20 Thread Mike Kravetz
: Added ability to specify minimum size. Suggested by David Rientjes V1: Comments from RFC addressed/incorporated Mike Kravetz (4): hugetlbfs: add minimum size tracking fields to subpool structure hugetlbfs: add minimum size accounting to subpools hugetlbfs: accept subpool min_size mo

[PATCH V3 2/4] hugetlbfs: add minimum size accounting to subpools

2015-03-20 Thread Mike Kravetz
routines now return this global reserve count adjustment. This global reserve count adjustment is then passed to the global accounting routine hugetlb_acct_memory(). Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 123 --- 1 file changed, 100

[PATCH V3 1/4] hugetlbfs: add minimum size tracking fields to subpool structure

2015-03-20 Thread Mike Kravetz
minimum. An additional field (rsv_hpages) is used to track the number of pages reserved to meet this minimum size. The hstate pointer in the subpool is convenient to have when reserving and unreserving the pages. Signed-off-by: Mike Kravetz --- include/linux/hugetlb.h | 8 +++- mm/hugetlb.c

[PATCH V3 4/4] hugetlbfs: document min_size mount option and cleanup

2015-03-20 Thread Mike Kravetz
Add min_size mount option to the hugetlbfs documentation. Also, add the missing pagesize option and mention that size can be specified as bytes or a percentage of huge page pool. Signed-off-by: Mike Kravetz --- Documentation/vm/hugetlbpage.txt | 31 ++- 1 file

[RFC v2 PATCH 0/5] hugetlbfs: add fallocate support

2015-04-23 Thread Mike Kravetz
noticed by Hillf Danton New region_del() routine for region tracking/resv_map of ranges Fixed several issues found during more extensive testing Error handling in region_del() when kmalloc() fails still needs to be addressed madvise remove support remains Mike Kravetz (5

[RFC v2 PATCH 4/5] hugetlbfs: add hugetlbfs_fallocate()

2015-04-23 Thread Mike Kravetz
it is currently implemented using fallocate(). MADV_REMOVE lets madvise() remove pages from the middle of a hugetlbfs file, which wasn't possible before. hugetlbfs fallocate only operates on whole huge pages. Based-on code-by: Dave Hansen Signed-off-by: Mike Kravetz --- fs/hugetlbfs/in
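Because hugetlbfs fallocate only operates on whole huge pages, a byte range passed to a hole punch has to be mapped onto huge page boundaries. The sketch below assumes the range is shrunk inward (start rounded up, end rounded down), so that only pages fully covered by the request are affected; that policy is an assumption here, not something the snippet spells out. `hp_range` and `hp_hole_range()` are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: a half-open byte range [start, end). */
struct hp_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Shrink [offset, offset + len) inward to whole huge pages.
 * hpage_size must be a power of two (for example 2 MiB).
 */
struct hp_range hp_hole_range(uint64_t offset, uint64_t len,
                              uint64_t hpage_size)
{
    struct hp_range r;

    assert(hpage_size != 0 && (hpage_size & (hpage_size - 1)) == 0);

    /* Round the start up and the end down to huge page boundaries. */
    r.start = (offset + hpage_size - 1) & ~(hpage_size - 1);
    r.end = (offset + len) & ~(hpage_size - 1);

    /* A request that covers no complete huge page yields an empty range. */
    if (r.end < r.start)
        r.end = r.start;
    return r;
}
```

With a 2 MiB huge page size, a request starting at 1 MiB with a length of 4 MiB covers only one complete huge page, so the resulting range is [2 MiB, 4 MiB).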

[RFC v2 PATCH 5/5] mm: madvise allow remove operation for hugetlbfs

2015-04-23 Thread Mike Kravetz
Now that we have hole punching support for hugetlbfs, we can also support the MADV_REMOVE interface to it. Signed-off-by: Dave Hansen Signed-off-by: Mike Kravetz --- mm/madvise.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/madvise.c b/mm/madvise.c index d551475

[RFC v2 PATCH 2/5] hugetlbfs: remove region_truncte() as region_del() can be used

2015-04-23 Thread Mike Kravetz
Now that region_del() exists, the region_truncate() routine can be removed. Callers of region_truncate are changed to call region_del instead with an ending value of -1. Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 37 + 1 file changed, 1 insertion(+), 36

[RFC v2 PATCH 3/5] hugetlbfs: New huge_add_to_page_cache helper routine

2015-04-23 Thread Mike Kravetz
Currently, there is only a single place where hugetlbfs pages are added to the page cache. The new fallocate code will be adding a second one, so break the functionality out into its own helper. Signed-off-by: Dave Hansen Signed-off-by: Mike Kravetz --- include/linux/hugetlb.h | 2 ++ mm

[RFC v2 PATCH 1/5] hugetlbfs: truncate_hugepages() takes a range of pages

2015-04-23 Thread Mike Kravetz
. Based-on code-by: Dave Hansen Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c| 31 +++- include/linux/hugetlb.h | 3 +- mm/hugetlb.c| 76 +++-- 3 files changed, 100 insertions(+), 10 deletions(-) diff --git a/fs

hugetlbfs alignment requirements conflicting with documentation

2015-04-15 Thread Mike Kravetz
ze aligned value). cc'ing some people from the recent hugetlb munmap alignment thread as I'm sure they will have an opinion here. -- Mike Kravetz

[RFC PATCH 1/4] hugetlbfs: truncate_hugepages() takes a range of pages

2015-04-16 Thread Mike Kravetz
Modify truncate_hugepages() to take a range of pages (start, end) instead of simply start. If the value of end is -1, this indicates the end of the range is the end of the file. This functionality will be used for fallocate hole punching. Signed-off-by: Dave Hansen Signed-off-by: Mike Kravetz
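The range convention described here, where an end value of -1 stands for "end of file", can be illustrated with a small userspace helper. This is a hedged sketch only: `range_page_count()` is a hypothetical name, the range is counted in page indexes rather than bytes, and treating `end` as exclusive is an assumption made for the example.

```c
#include <stdint.h>

/*
 * Count the pages in [start, end) of a file that holds file_pages pages.
 * An end of -1 means "through the end of the file", which makes a plain
 * truncate a special case of the ranged operation.
 */
int64_t range_page_count(int64_t start, int64_t end, int64_t file_pages)
{
    int64_t last = (end == -1) ? file_pages : end;

    /* Clamp a range that runs past the end of the file. */
    if (last > file_pages)
        last = file_pages;
    if (start >= last)
        return 0;
    return last - start;
}
```

In this model a truncate at page 2 of a 10-page file is `range_page_count(2, -1, 10)`, while a hole punch passes an explicit end.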

[RFC PATCH 0/4] hugetlbfs: add fallocate support

2015-04-16 Thread Mike Kravetz
ideally would like to release them back to the subpool or global pools for other uses. The fallocate() system call provides an interface for preallocation and hole punching within files. This patch set adds fallocate functionality to hugetlbfs. Mike Kravetz (4): hugetlbfs: truncate_hugepages() takes

[RFC PATCH 2/4] hugetlbfs: New huge_add_to_page_cache helper routine

2015-04-16 Thread Mike Kravetz
Currently, there is only a single place where hugetlbfs pages are added to the page cache. The new fallocate code will be adding a second one, so break the functionality out into its own helper. Signed-off-by: Dave Hansen Signed-off-by: Mike Kravetz --- include/linux/hugetlb.h | 2 ++ mm

[RFC PATCH 3/4] hugetlbfs: add hugetlbfs_fallocate()

2015-04-16 Thread Mike Kravetz
it is currently implemented using fallocate(). MADV_REMOVE lets us remove data from the middle of a hugetlbfs file, which wasn't possible before. hugetlbfs fallocate only operates on whole huge pages. Based-on code-by: Dave Hansen Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c

[RFC PATCH 4/4] mm: madvise allow remove operation for hugetlbfs

2015-04-16 Thread Mike Kravetz
Now that we have hole punching support for hugetlbfs, we can also support the MADV_REMOVE interface to it. Signed-off-by: Dave Hansen Signed-off-by: Mike Kravetz --- mm/madvise.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/madvise.c b/mm/madvise.c index d551475

Re: HugePages_Rsvd leak

2015-04-08 Thread Mike Kravetz
y fault/allocate any huge pages. The result was the reservation (HugePages_Rsvd) of sufficient huge pages to cover the mapping. When the program exited, the reservations remained. If I remove (unlink) the file the reservations will be removed. -- Mike Kravetz

Re: [RFC PATCH 4/4] mm: madvise allow remove operation for hugetlbfs

2015-04-17 Thread Mike Kravetz
On 04/16/2015 11:44 PM, Christoph Hellwig wrote: On Thu, Apr 16, 2015 at 04:02:58PM -0700, Mike Kravetz wrote: Now that we have hole punching support for hugetlbfs, we can also support the MADV_REMOVE interface to it. Meh. Just use fallocate for any new code.. I don't have the com

Re: [RFC PATCH 4/4] mm: madvise allow remove operation for hugetlbfs

2015-04-17 Thread Mike Kravetz
On 04/17/2015 12:10 AM, Hillf Danton wrote: Now that we have hole punching support for hugetlbfs, we can also support the MADV_REMOVE interface to it. Signed-off-by: Dave Hansen Signed-off-by: Mike Kravetz --- mm/madvise.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git

Re: [RFC PATCH 3/4] hugetlbfs: add hugetlbfs_fallocate()

2015-04-17 Thread Mike Kravetz
() +* unlock_page because locked by add_to_page_cache() +*/ + put_page(page); Still needed if EEXIST? Nope. Good catch. I'll fix this in the next version. -- Mike Kravetz

Re: [RFC PATCH 4/4] mm: madvise allow remove operation for hugetlbfs

2015-04-18 Thread Mike Kravetz
On 04/17/2015 10:11 AM, Mike Kravetz wrote: On 04/17/2015 12:10 AM, Hillf Danton wrote: Now that we have hole punching support for hugetlbfs, we can also support the MADV_REMOVE interface to it. Signed-off-by: Dave Hansen Signed-off-by: Mike Kravetz --- mm/madvise.c | 2 +- 1 file

Re: [PATCH] mm/hugetlb: document the reserve map/region tracking routines

2015-05-26 Thread Mike Kravetz
On 05/26/2015 04:09 PM, Andrew Morton wrote: On Tue, 26 May 2015 14:27:10 -0700 Mike Kravetz wrote: This is a documentation only patch and does not modify any code. Descriptions of the routines used for reserve map/region tracking are added. Confused. This adds comments which are similar

[PATCH v3 2/3] mm/hugetlb: compute/return the number of regions added by region_add()

2015-05-27 Thread Mike Kravetz
region_add(). In the normal case, we want vma_commit_reservation to return the same value as the preceding call to vma_needs_reservation. Create a common __vma_reservation_common routine to help keep the special case return values in sync. Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 72

[PATCH v3 1/3] mm/hugetlb: document the reserve map/region tracking routines

2015-05-27 Thread Mike Kravetz
This is a documentation only patch and does not modify any code. Descriptions of the routines used for reserve map/region tracking are added. Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 52 ++-- 1 file changed, 50 insertions(+), 2 deletions

[PATCH v3 0/3] alloc_huge_page/hugetlb_reserve_pages race

2015-05-27 Thread Mike Kravetz
off parameter commit for easier reading v2: Added documentation for the region/reserve map routines Created common routine for vma_needs_reservation and vma_commit_reservation to help prevent them from drifting apart in the future. Mike Kravetz (3): mm/hugetlb: document the reserve

[PATCH v3 3/3] mm/hugetlb: handle races in alloc_huge_page and hugetlb_reserve_pages

2015-05-27 Thread Mike Kravetz
. Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 34 ++ 1 file changed, 30 insertions(+), 4 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index b3d3d59..038c84e 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1544,7 +1544,7 @@ static struct page *alloc_huge_page

Re: [PATCH v3 3/3] mm/hugetlb: handle races in alloc_huge_page and hugetlb_reserve_pages

2015-05-28 Thread Mike Kravetz
On 05/28/2015 07:01 AM, Davidlohr Bueso wrote: On Wed, 2015-05-27 at 10:56 -0700, Mike Kravetz wrote: alloc_huge_page and hugetlb_reserve_pages use region_chg to calculate the number of pages which will be added to the reserve map. Subpool and global reserve counts are adjusted based on the

[RFC v4 PATCH 1/9] mm/hugetlb: add region_del() to delete a specific range of entries

2015-06-11 Thread Mike Kravetz
and do not need to deal with error handling. Future callers of region_del() (such as fallocate hole punch) will need to handle this error. Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 88 ++-- 1 file changed, 62 insertions(+), 26 deletions

[RFC v4 PATCH 0/9] hugetlbfs: add fallocate support

2015-06-11 Thread Mike Kravetz
t and error handling issues noticed by Hillf Danton New region_del() routine for region tracking/resv_map of ranges Fixed several issues found during more extensive testing Error handling in region_del() when kmalloc() fails still needs to be addressed madvise remove support remains

[RFC v4 PATCH 6/9] mm/hugetlb: alloc_huge_page handle areas hole punched by fallocate

2015-06-11 Thread Mike Kravetz
Areas hole punched by fallocate will not have entries in the region/reserve map. However, shared mappings with min_size subpool reservations may still have reserved pages. alloc_huge_page needs to handle this special case and do the proper accounting. Signed-off-by: Mike Kravetz --- mm

[RFC v4 PATCH 2/9] mm/hugetlb: expose hugetlb fault mutex for use by fallocate

2015-06-11 Thread Mike Kravetz
-by: Mike Kravetz --- include/linux/hugetlb.h | 10 ++ mm/hugetlb.c| 20 2 files changed, 26 insertions(+), 4 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 2050261..bbd072e 100644 --- a/include/linux/hugetlb.h +++ b

[RFC v4 PATCH 9/9] mm: madvise allow remove operation for hugetlbfs

2015-06-11 Thread Mike Kravetz
Now that we have hole punching support for hugetlbfs, we can also support the MADV_REMOVE interface to it. Signed-off-by: Dave Hansen Signed-off-by: Mike Kravetz --- mm/madvise.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/madvise.c b/mm/madvise.c index d215ea9

[RFC v4 PATCH 8/9] hugetlbfs: add hugetlbfs_fallocate()

2015-06-11 Thread Mike Kravetz
only operates on whole huge pages. Based-on code-by: Dave Hansen Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c| 156 +++- include/linux/hugetlb.h | 3 + mm/hugetlb.c| 8 +-- 3 files changed, 162 insertions(+), 5 dele

[RFC v4 PATCH 7/9] hugetlbfs: New huge_add_to_page_cache helper routine

2015-06-11 Thread Mike Kravetz
Currently, there is only a single place where hugetlbfs pages are added to the page cache. The new fallocate code will be adding a second one, so break the functionality out into its own helper. Signed-off-by: Dave Hansen Signed-off-by: Mike Kravetz --- include/linux/hugetlb.h | 2 ++ mm

[RFC v4 PATCH 5/9] mm/hugetlb: vma_has_reserves() needs to handle fallocate hole punch

2015-06-11 Thread Mike Kravetz
). vma_has_reserves is passed "chg" which indicates whether or not a region/reserve map is present. Use this to determine if reserves are actually present or were removed via hole punch. Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 16 +--- 1 file changed, 13 insertions(+), 3 deletion

[RFC v4 PATCH 3/9] hugetlbfs: hugetlb_vmtruncate_list() needs to take a range to delete

2015-06-11 Thread Mike Kravetz
callers to add 0 as end of range. Since the routine will be used in hole punch as well as truncate operations, it is more appropriately renamed to hugetlb_vmdelete_list(). Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 25 ++--- 1 file changed, 18 insertions(+), 7

[RFC v4 PATCH 4/9] hugetlbfs: truncate_hugepages() takes a range of pages

2015-06-11 Thread Mike Kravetz
() is also modified to take a range of pages. hugetlb_unreserve_pages is modified to detect an error from region_del and pass it back to the caller. Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c| 93 +++-- include/linux/hugetlb.h | 4 ++- mm

Re: [RFC v4 PATCH 2/9] mm/hugetlb: expose hugetlb fault mutex for use by fallocate

2015-06-11 Thread Mike Kravetz
On 06/11/2015 03:46 PM, Davidlohr Bueso wrote: On Thu, 2015-06-11 at 14:01 -0700, Mike Kravetz wrote: /* Forward declaration */ static int hugetlb_acct_memory(struct hstate *h, long delta); @@ -3324,7 +3324,8 @@ static u32 fault_mutex_hash(struct hstate *h, struct mm_struct *mm

Re: [RFC v4 PATCH 6/9] mm/hugetlb: alloc_huge_page handle areas hole punched by fallocate

2015-06-15 Thread Mike Kravetz
On 06/14/2015 11:34 PM, Naoya Horiguchi wrote: > On Thu, Jun 11, 2015 at 02:01:37PM -0700, Mike Kravetz wrote: >> Areas hole punched by fallocate will not have entries in the >> region/reserve map. However, shared mappings with min_size subpool >> reservations may stil

Re: [RFC v4 PATCH 2/9] mm/hugetlb: expose hugetlb fault mutex for use by fallocate

2015-06-17 Thread Mike Kravetz
On 06/11/2015 03:46 PM, Davidlohr Bueso wrote: On Thu, 2015-06-11 at 14:01 -0700, Mike Kravetz wrote: /* Forward declaration */ static int hugetlb_acct_memory(struct hstate *h, long delta); @@ -3324,7 +3324,8 @@ static u32 fault_mutex_hash(struct hstate *h, struct mm_struct *mm

[RFC v5 PATCH 2/9] mm/hugetlb: expose hugetlb fault mutex for use by fallocate

2015-06-22 Thread Mike Kravetz
changes to be more consistent with other global hugetlb symbols. Signed-off-by: Mike Kravetz --- include/linux/hugetlb.h | 5 + mm/hugetlb.c| 20 ++-- 2 files changed, 15 insertions(+), 10 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h

[RFC v5 PATCH 8/9] hugetlbfs: add hugetlbfs_fallocate()

2015-06-22 Thread Mike Kravetz
only operates on whole huge pages. Based-on code-by: Dave Hansen Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c| 158 +++- include/linux/hugetlb.h | 3 + mm/hugetlb.c| 2 +- 3 files changed, 161 insertions(+), 2 deletions(-)

[RFC v5 PATCH 4/9] hugetlbfs: truncate_hugepages() takes a range of pages

2015-06-22 Thread Mike Kravetz
() is also modified to take a range of pages. hugetlb_unreserve_pages is modified to detect an error from region_del and pass it back to the caller. Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c| 98 - include/linux/hugetlb.h | 4 +- mm

[RFC v5 PATCH 5/9] mm/hugetlb: vma_has_reserves() needs to handle fallocate hole punch

2015-06-22 Thread Mike Kravetz
). vma_has_reserves is passed "chg" which indicates whether or not a region/reserve map is present. Use this to determine if reserves are actually present or were removed via hole punch. Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 16 +--- 1 file changed, 13 insertions(+), 3

[RFC v5 PATCH 0/9] hugetlbfs: add fallocate support

2015-06-22 Thread Mike Kravetz
found during more extensive testing Error handling in region_del() when kmalloc() fails still needs to be addressed madvise remove support remains Mike Kravetz (9): mm/hugetlb: add region_del() to delete a specific range of entries mm/hugetlb: expose hugetlb fault mutex for use by fall

[RFC v5 PATCH 1/9] mm/hugetlb: add region_del() to delete a specific range of entries

2015-06-22 Thread Mike Kravetz
and do not need to deal with error handling. Future callers of region_del() (such as fallocate hole punch) will need to handle this error. Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 88 ++-- 1 file changed, 62 insertions(+), 26 deletions

[RFC v5 PATCH 3/9] hugetlbfs: hugetlb_vmtruncate_list() needs to take a range to delete

2015-06-22 Thread Mike Kravetz
callers to add 0 as end of range. Since the routine will be used in hole punch as well as truncate operations, it is more appropriately renamed to hugetlb_vmdelete_list(). Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 25 ++--- 1 file changed, 18 insertions(+), 7

[RFC v5 PATCH 9/9] mm: madvise allow remove operation for hugetlbfs

2015-06-22 Thread Mike Kravetz
Now that we have hole punching support for hugetlbfs, we can also support the MADV_REMOVE interface to it. Signed-off-by: Dave Hansen Signed-off-by: Mike Kravetz --- mm/madvise.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/madvise.c b/mm/madvise.c index d215ea9

[RFC v5 PATCH 7/9] hugetlbfs: New huge_add_to_page_cache helper routine

2015-06-22 Thread Mike Kravetz
Currently, there is only a single place where hugetlbfs pages are added to the page cache. The new fallocate code will be adding a second one, so break the functionality out into its own helper. Signed-off-by: Dave Hansen Signed-off-by: Mike Kravetz --- include/linux/hugetlb.h | 2 ++ mm

[RFC v5 PATCH 6/9] mm/hugetlb: alloc_huge_page handle areas hole punched by fallocate

2015-06-22 Thread Mike Kravetz
Areas hole punched by fallocate will not have entries in the region/reserve map. However, shared mappings with min_size subpool reservations may still have reserved pages. alloc_huge_page needs to handle this special case and do the proper accounting. Signed-off-by: Mike Kravetz --- mm

[PATCH v4 0/3] alloc_huge_page/hugetlb_reserve_pages race

2015-06-02 Thread Mike Kravetz
routine for vma_needs_reservation and vma_commit_reservation to help prevent them from drifting apart in the future. Mike Kravetz (3): mm/hugetlb: document the reserve map/region tracking routines mm/hugetlb: compute/return the number of regions added by region_add() mm/hugetlb

[PATCH v4 1/3] mm/hugetlb: document the reserve map/region tracking routines

2015-06-02 Thread Mike Kravetz
This is a documentation only patch and does not modify any code. Descriptions of the routines used for reserve map/region tracking are added. Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 52 ++-- 1 file changed, 50 insertions(+), 2 deletions

[PATCH v4 2/3] mm/hugetlb: compute/return the number of regions added by region_add()

2015-06-02 Thread Mike Kravetz
region_add(). In the normal case, we want vma_commit_reservation to return the same value as the preceding call to vma_needs_reservation. Create a common __vma_reservation_common routine to help keep the special case return values in sync. Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 72

[PATCH v4 3/3] mm/hugetlb: handle races in alloc_huge_page and hugetlb_reserve_pages

2015-06-02 Thread Mike Kravetz
. Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 39 +++ 1 file changed, 35 insertions(+), 4 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index cd3fc41..75c0eef 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1542,7 +1542,7 @@ static struct page

Re: [PATCH v3 3/3] mm/hugetlb: handle races in alloc_huge_page and hugetlb_reserve_pages

2015-06-01 Thread Mike Kravetz
On 05/27/2015 10:56 AM, Mike Kravetz wrote: alloc_huge_page and hugetlb_reserve_pages use region_chg to calculate the number of pages which will be added to the reserve map. Subpool and global reserve counts are adjusted based on the output of region_chg. Before the pages are actually added to

Re: [PATCH v2 0/7] mm: pages for hugetlb's overcommit may be able to charge to memcg

2018-05-22 Thread Mike Kravetz
ages are not accounted for when they are allocated as 'reserves'. It is not until these reserves are actually used that accounting limits are checked. This 'seems' to align with general allocation of huge pages within the pool. No accounting is done until they are actually allocated to a mapping/file. -- Mike Kravetz

Re: [PATCH v2 3/4] mm: add find_alloc_contig_pages() interface

2018-05-22 Thread Mike Kravetz
On 05/22/2018 09:41 AM, Reinette Chatre wrote: > On 5/21/2018 4:48 PM, Mike Kravetz wrote: >> On 05/21/2018 01:54 AM, Vlastimil Babka wrote: >>> On 05/04/2018 01:29 AM, Mike Kravetz wrote: >>>> +/** >>>> + * find_alloc_contig_pages() --

Re: [PATCH 1/2] selftests/memfd/memfd_test.c: fix implicit declaration

2018-04-04 Thread Mike Kravetz
On 04/04/2018 04:36 AM, Anders Roxell wrote: > On 14 March 2018 at 02:09, Mike Kravetz wrote: >> On 03/13/2018 04:42 AM, Anders Roxell wrote: >>> gcc warns about implicit declaration. >>> >>> gcc -D_FILE_OFFSET_BITS=64 -I../../../../include/uapi/ >>>

Re: [PATCH 4.15 000/105] 4.15.14-stable review

2018-03-28 Thread Mike Kravetz
4. > > There is a regression on arm32 in libhugetlbfs/truncate_above_4GB-2M-32 > that also exists in 4.14 and mainline. We'll investigate the root cause > and report upstream in mainline. I suspect the cause is "hugetlbfs: > check for pgoff value overflow", but have no

Re: [PATCH 4.15 000/105] 4.15.14-stable review

2018-03-28 Thread Mike Kravetz
On 03/28/2018 12:06 PM, Mike Kravetz wrote: > On 03/28/2018 11:44 AM, Dan Rue wrote: >> On Tue, Mar 27, 2018 at 06:26:40PM +0200, Greg Kroah-Hartman wrote: >>> This is the start of the stable review cycle for the 4.15.14 release. >>> There are 105 patches in this seri

[PATCH 0/1] fix regression in hugetlbfs overflow checking

2018-03-28 Thread Mike Kravetz
han 4GB on 32 bit kernels. The above is in the commit message. 63489f8e8211 has been sent upstream and to stable, so cc'ing stable here as well. I would appreciate some more eyes on this code. There have been several fixes and we keep running into issues. Mike Kravetz (1): hugetlbfs: f

[PATCH 1/1] hugetlbfs: fix bug in pgoff overflow checking

2018-03-28 Thread Mike Kravetz
y: Dan Rue Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 22 +- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index b9a254dcc0e7..8450a1d75dfa 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c

Re: [PATCH] memcg, hugetlb: pages allocated for hugetlb's overcommit will be charged to memcg

2018-05-01 Thread Mike Kravetz
pages that are not charged to a memcg. memcg charges in other code paths seem to happen at huge page allocation time. -- Mike Kravetz > > The page charged to memcg will finally be uncharged at free_huge_page. > > Modification of memcontrol.c is for updating of statistical information

Re: [PATCH 2/3] mm: add find_alloc_contig_pages() interface

2018-05-02 Thread Mike Kravetz
On 04/21/2018 09:16 AM, Vlastimil Babka wrote: > On 04/17/2018 04:09 AM, Mike Kravetz wrote: >> find_alloc_contig_pages() is a new interface that attempts to locate >> and allocate a contiguous range of pages. It is provided as a more >> convenient interface than alloc_co

Re: [PATCH] memcg, hugetlb: pages allocated for hugetlb's overcommit will be charged to memcg

2018-05-02 Thread Mike Kravetz
On 05/01/2018 11:54 PM, TSUKADA Koutaro wrote: > On 2018/05/02 13:41, Mike Kravetz wrote: >> What is the reason for not charging pages at allocation/reserve time? I am >> not an expert in memcg accounting, but I would think the pages should be >> charged at allocation tim

Re: [patch] mm, hugetlb_cgroup: suppress SIGBUS when hugetlb_cgroup charge fails

2018-05-29 Thread Mike Kravetz
ssh_to_dbg # sudo ./test_mmap 4 mapping 4 huge pages address 7f62bba0 read (-) address 7f62bbc0 read (-) Connection to dbg closed by remote host. Connection to dbf closed. OOM did kick in (lots of console/log output) and killed the shell as well. -- Mike Kravetz

Re: [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map

2018-05-29 Thread Mike Kravetz
libhugetlbfs tests for an unrelated issue/change and will do some analysis to see exactly what is happening. Also, I will take it upon myself to run the libhugetlbfs test suite on a regular (at least weekly) basis. -- Mike Kravetz

Re: [PATCH -mm] mm, hugetlb: Pass fault address to no page handler

2018-05-14 Thread Mike Kravetz
to ~25.6%, the > IPC (instruction per cycle) increased from 0.3 to 0.37, and the time > spent in user space is reduced ~19.3% Since this patch only addresses hugetlbfs huge pages, I would suggest making that more explicit in the commit message. Other than that, the changes look fine to me.

Re: [PATCH -mm] mm, hugetlb: Pass fault address to no page handler

2018-05-16 Thread Mike Kravetz
that another area to consider? That gets back to Michal's question of a specific use case or generic optimization. Unless code is simple (as in this patch), seems like we should hold off on considering additional optimizations unless there is a specific use case. I'm still OK with this change. -- Mike Kravetz

Re: [PATCH v4 4/4] hugetlb: allow to free gigantic pages regardless of the configuration

2019-03-01 Thread Mike Kravetz
On 3/1/19 5:21 AM, Alexandre Ghiti wrote: > On 03/01/2019 07:25 AM, Alex Ghiti wrote: >> On 2/28/19 5:26 PM, Mike Kravetz wrote: >>> On 2/28/19 12:23 PM, Dave Hansen wrote: >>>> On 2/28/19 11:50 AM, Mike Kravetz wrote: >>>>> On 2/28/19

Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users

2019-03-12 Thread Mike Kravetz
in controls who can have access to hugetlbfs, so I think adding code to the open routine as in patch 2 of this series would seem to work. However, I can imagine more special cases being added for other users. And, once you have more than one special case then you may want to combine them. For example, kvm and hugetlbfs together. -- Mike Kravetz

Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users

2019-03-13 Thread Mike Kravetz
On 3/12/19 11:00 PM, Peter Xu wrote: > On Tue, Mar 12, 2019 at 12:59:34PM -0700, Mike Kravetz wrote: >> On 3/11/19 2:36 AM, Peter Xu wrote: >>> >>> The "kvm" entry is a bit special here only to make sure that existing >>> users like QEMU/KVM won'

Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users

2019-03-13 Thread Mike Kravetz
tup process enable uffd for all users. Correct? This may be too simple, and I don't really like group access, but how about just defining a uffd group? If you are in the group you can make uffd system calls. -- Mike Kravetz

Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users

2019-03-13 Thread Mike Kravetz
On 3/13/19 4:55 PM, Andrea Arcangeli wrote: > On Wed, Mar 13, 2019 at 01:01:40PM -0700, Mike Kravetz wrote: >> On 3/13/19 11:52 AM, Andrea Arcangeli wrote: >>> Unless somebody suggests a consistent way to make hugetlbfs "just >>> work" (like we could achi

Re: [PATCH -next] hugetlbfs: a terminator for hugetlb_param_specs[]

2019-02-04 Thread Mike Kravetz
91.658122] do_mount+0x11f0/0x1640 > [ 91.658125] ksys_mount+0xc0/0xd0 > [ 91.658129] __arm64_sys_mount+0xcc/0xe4 > [ 91.658137] el0_svc_handler+0x28c/0x338 > [ 91.681740] el0_svc+0x8/0xc > > Fixes: 2284cf59cbce ("hugetlbfs: Convert to fs_context") > Signed-off-by:

[PATCH] hugetlbfs: fix memory leak for resv_map

2019-04-01 Thread Mike Kravetz
tructures are only needed for inodes which can have associated page allocations. To fix the leak, only allocate resv_map for those inodes which could possibly be associated with page allocations. Reported-by: Yufen Yu Suggested-by: Yufen Yu Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 2
