Re: [PATCH v3] hugetlb: simplify hugetlb handling in follow_page_mask
On 9/19/2022 10:13 AM, Mike Kravetz wrote:
> During discussions of this series [1], it was suggested that hugetlb handling code in follow_page_mask could be simplified. At the beginning of follow_page_mask, there currently is a call to follow_huge_addr which 'may' handle hugetlb pages. ia64 is the only architecture which provides a follow_huge_addr routine that does not return error. Instead, at each level of the page table a check is made for a hugetlb entry. If a hugetlb entry is found, a call to a routine associated with that entry is made.
>
> Currently, there are two checks for hugetlb entries at each page table level. The first check is of the form:
> 	if (p?d_huge())
> 		page = follow_huge_p?d();
> the second check is of the form:
> 	if (is_hugepd())
> 		page = follow_huge_pd().
>
> We can replace these checks, as well as the special handling routines such as follow_huge_p?d() and follow_huge_pd() with a single routine to handle hugetlb vmas.
>
> A new routine hugetlb_follow_page_mask is called for hugetlb vmas at the beginning of follow_page_mask. hugetlb_follow_page_mask will use the existing routine huge_pte_offset to walk page tables looking for hugetlb entries. huge_pte_offset can be overwritten by architectures, and already handles special cases such as hugepd entries.
>
> [1] https://lore.kernel.org/linux-mm/cover.1661240170.git.baolin.w...@linux.alibaba.com/
>
> Suggested-by: David Hildenbrand
> Signed-off-by: Mike Kravetz

LGTM, and works well on my machine. So feel free to add:
Reviewed-by: Baolin Wang
Tested-by: Baolin Wang
Re: [PATCH 1/4] hugetlb: skip to end of PT page mapping when pte not present
On 6/18/2022 1:17 AM, Mike Kravetz wrote:
> On 06/17/22 10:15, Peter Xu wrote:
>> Hi, Mike,
>>
>> On Thu, Jun 16, 2022 at 02:05:15PM -0700, Mike Kravetz wrote:
>>> @@ -6877,6 +6896,39 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
>>>  	return (pte_t *)pmd;
>>>  }
>>> +/*
>>> + * Return a mask that can be used to update an address to the last huge
>>> + * page in a page table page mapping size. Used to skip non-present
>>> + * page table entries when linearly scanning address ranges. Architectures
>>> + * with unique huge page to page table relationships can define their own
>>> + * version of this routine.
>>> + */
>>> +unsigned long hugetlb_mask_last_page(struct hstate *h)
>>> +{
>>> +	unsigned long hp_size = huge_page_size(h);
>>> +
>>> +	switch (hp_size) {
>>> +	case P4D_SIZE:
>>> +		return PGDIR_SIZE - P4D_SIZE;
>>> +	case PUD_SIZE:
>>> +		return P4D_SIZE - PUD_SIZE;
>>> +	case PMD_SIZE:
>>> +		return PUD_SIZE - PMD_SIZE;
>>> +	default:
>>
>> Should we add a WARN_ON_ONCE() if it should never trigger?
>
> Sure. I will add this.
>
>>> +		break; /* Should never happen */
>>> +	}
>>> +
>>> +	return ~(0UL);
>>> +}
>>> +
>>> +#else
>>> +
>>> +/* See description above. Architectures can provide their own version. */
>>> +__weak unsigned long hugetlb_mask_last_page(struct hstate *h)
>>> +{
>>> +	return ~(0UL);
>>
>> I'm wondering whether it's better to return 0 rather than ~0 by default.
>> Could an arch with !CONFIG_ARCH_WANT_GENERAL_HUGETLB wrongly skip some
>> valid address ranges with ~0, or perhaps I misread?
>
> Thank you, thank you, thank you Peter!
>
> Yes, the 'default' return for hugetlb_mask_last_page() should be 0. If
> there is no 'optimization', we do not want to modify the address so we
> want to OR with 0 not ~0. My bad, I must have been thinking AND instead
> of OR.
>
> I will change here as well as in Baolin's patch.

Ah, I also overlooked this. Thanks Peter, and thanks Mike for updating.
[PATCH v4 0/3] Fix CONT-PTE/PMD size hugetlb issue when unmapping or migrating
Hi,

Currently, when migrating a hugetlb page or unmapping a poisoned hugetlb page, we use ptep_clear_flush() and set_pte_at() to nuke the page table entry and remap it. This is incorrect for CONT-PTE or CONT-PMD size hugetlb pages and can cause potential data consistency issues. This patch set switches to the hugetlb-specific APIs to fix this issue; please find the details in each patch. Thanks.

Note: Mike pointed out that huge_ptep_get() only returns the value of one specific entry, and does not take into account the dirty or young bits of the other CONT-PTE/PMD entries the way huge_ptep_get_and_clear() does [1]. This inconsistency is not introduced by this patch set and will be addressed in another thread [2]. Meanwhile, the uffd for hugetlb case [3] pointed out by Gerald also needs another patch to address.

[1] https://lore.kernel.org/linux-mm/85bd80b4-b4fd-0d3f-a2e5-149559f2f...@oracle.com/
[2] https://lore.kernel.org/all/cover.1651998586.git.baolin.w...@linux.alibaba.com/
[3] https://lore.kernel.org/linux-mm/20220503120343.6264e126@thinkpad/

Changes from v3:
 - Fix build errors for !CONFIG_MMU.

Changes from v2:
 - Collect reviewed tags from Muchun and Mike.
 - Drop the unnecessary casting in hugetlb.c.
 - Fix build errors by adding dummy functions for !CONFIG_HUGETLB_PAGE.

Changes from v1:
 - Add acked tag from Mike.
 - Update some commit messages.
 - Add VM_BUG_ON in try_to_unmap() for hugetlb case.
 - Add an explicit void cast for huge_ptep_clear_flush() in hugetlb.c.
Baolin Wang (3):
  mm: change huge_ptep_clear_flush() to return the original pte
  mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration
  mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping

 arch/arm64/include/asm/hugetlb.h   |  4 +--
 arch/arm64/mm/hugetlbpage.c        | 12 +++-
 arch/ia64/include/asm/hugetlb.h    |  5 +--
 arch/mips/include/asm/hugetlb.h    |  9 --
 arch/parisc/include/asm/hugetlb.h  |  5 +--
 arch/powerpc/include/asm/hugetlb.h |  9 --
 arch/s390/include/asm/hugetlb.h    |  6 ++--
 arch/sh/include/asm/hugetlb.h      |  5 +--
 arch/sparc/include/asm/hugetlb.h   |  5 +--
 include/asm-generic/hugetlb.h      |  4 +--
 include/linux/hugetlb.h            | 11 +++
 mm/rmap.c                          | 63 --
 12 files changed, 87 insertions(+), 51 deletions(-)

--
1.8.3.1
[PATCH v4 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration
On some architectures (like ARM64), CONT-PTE/PMD size hugetlb is supported, which means that besides PMD/PUD size hugetlb (2M and 1G), CONT-PTE/PMD sizes (64K and 32M) are also supported when a 4K base page size is used.

When migrating a hugetlb page, we get the relevant page table entry by huge_pte_offset() only once, nuke it, and remap it with a migration pte entry. This is correct for PMD or PUD size hugetlb, since they always contain only one pmd entry or pud entry in the page table.

However, this is incorrect for CONT-PTE and CONT-PMD size hugetlb, since they can contain several contiguous pte or pmd entries with the same page table attributes. So we nuke or remap only one pte or pmd entry for a CONT-PTE/PMD size hugetlb page, which is not what hugetlb migration expects. The problem is that the data in the other subpages of the hugetlb page can still be modified while it is being migrated, which can cause a serious data consistency issue, since we did not nuke the page table entries and set migration ptes for all the subpages of the hugetlb page.

To fix this issue, use huge_ptep_clear_flush() to nuke a hugetlb page table entry, and remap it with set_huge_pte_at() and set_huge_swap_pte_at() when migrating a hugetlb page; these helpers already take CONT-PTE and CONT-PMD size hugetlb into account.
Signed-off-by: Baolin Wang Reviewed-by: Muchun Song Reviewed-by: Mike Kravetz --- include/linux/hugetlb.h | 11 +++ mm/rmap.c | 24 ++-- 2 files changed, 29 insertions(+), 6 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 306d6ef..abde66e 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -1093,6 +1093,17 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr pte_t *ptep, pte_t pte, unsigned long sz) { } + +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) +{ + return *ptep; +} + +static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte) +{ +} #endif /* CONFIG_HUGETLB_PAGE */ static inline spinlock_t *huge_pte_lock(struct hstate *h, diff --git a/mm/rmap.c b/mm/rmap.c index 94d6b24..4e96daf 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1926,13 +1926,15 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, break; } } + + /* Nuke the hugetlb page table entry */ + pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); } else { flush_cache_page(vma, address, pte_pfn(*pvmw.pte)); + /* Nuke the page table entry. */ + pteval = ptep_clear_flush(vma, address, pvmw.pte); } - /* Nuke the page table entry. */ - pteval = ptep_clear_flush(vma, address, pvmw.pte); - /* Set the dirty flag on the folio now the pte is gone. 
*/ if (pte_dirty(pteval)) folio_mark_dirty(folio); @@ -2017,7 +2019,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, pte_t swp_pte; if (arch_unmap_one(mm, vma, address, pteval) < 0) { - set_pte_at(mm, address, pvmw.pte, pteval); + if (folio_test_hugetlb(folio)) + set_huge_pte_at(mm, address, pvmw.pte, pteval); + else + set_pte_at(mm, address, pvmw.pte, pteval); ret = false; page_vma_mapped_walk_done(); break; @@ -2026,7 +2031,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, !anon_exclusive, subpage); if (anon_exclusive && page_try_share_anon_rmap(subpage)) { - set_pte_at(mm, address, pvmw.pte, pteval); + if (folio_test_hugetlb(folio)) + set_huge_pte_at(mm, address, pvmw.pte, pteval); + else + set_pte_at(mm, address, pvmw.pte, pteval); ret = false; page_vma_mapped_walk_done(); break; @@ -2052,7 +2060,11 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, swp_pte = pte_swp_mksoft_di
[PATCH v4 1/3] mm: change huge_ptep_clear_flush() to return the original pte
It is incorrect to use ptep_clear_flush() to nuke a hugetlb page table when unmapping or migrating a hugetlb page, and will change to use huge_ptep_clear_flush() instead in the following patches. So this is a preparation patch, which changes the huge_ptep_clear_flush() to return the original pte to help to nuke a hugetlb page table. Signed-off-by: Baolin Wang Acked-by: Mike Kravetz Reviewed-by: Muchun Song --- arch/arm64/include/asm/hugetlb.h | 4 ++-- arch/arm64/mm/hugetlbpage.c| 12 +--- arch/ia64/include/asm/hugetlb.h| 5 +++-- arch/mips/include/asm/hugetlb.h| 9 ++--- arch/parisc/include/asm/hugetlb.h | 5 +++-- arch/powerpc/include/asm/hugetlb.h | 9 ++--- arch/s390/include/asm/hugetlb.h| 6 +++--- arch/sh/include/asm/hugetlb.h | 5 +++-- arch/sparc/include/asm/hugetlb.h | 5 +++-- include/asm-generic/hugetlb.h | 4 ++-- 10 files changed, 36 insertions(+), 28 deletions(-) diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h index 1242f71..616b2ca 100644 --- a/arch/arm64/include/asm/hugetlb.h +++ b/arch/arm64/include/asm/hugetlb.h @@ -39,8 +39,8 @@ extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm, extern void huge_ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep); #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH -extern void huge_ptep_clear_flush(struct vm_area_struct *vma, - unsigned long addr, pte_t *ptep); +extern pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep); #define __HAVE_ARCH_HUGE_PTE_CLEAR extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep, unsigned long sz); diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index cbace1c..ca8e65c 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -486,19 +486,17 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm, set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot)); } -void huge_ptep_clear_flush(struct vm_area_struct *vma, - unsigned long addr, pte_t 
*ptep) +pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) { size_t pgsize; int ncontig; - if (!pte_cont(READ_ONCE(*ptep))) { - ptep_clear_flush(vma, addr, ptep); - return; - } + if (!pte_cont(READ_ONCE(*ptep))) + return ptep_clear_flush(vma, addr, ptep); ncontig = find_num_contig(vma->vm_mm, addr, ptep, ); - clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig); + return get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig); } static int __init hugetlbpage_init(void) diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h index 7e46ebd..026ead4 100644 --- a/arch/ia64/include/asm/hugetlb.h +++ b/arch/ia64/include/asm/hugetlb.h @@ -23,9 +23,10 @@ static inline int is_hugepage_only_range(struct mm_struct *mm, #define is_hugepage_only_range is_hugepage_only_range #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH -static inline void huge_ptep_clear_flush(struct vm_area_struct *vma, -unsigned long addr, pte_t *ptep) +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) { + return *ptep; } #include diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h index c214440..fd69c88 100644 --- a/arch/mips/include/asm/hugetlb.h +++ b/arch/mips/include/asm/hugetlb.h @@ -43,16 +43,19 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm, } #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH -static inline void huge_ptep_clear_flush(struct vm_area_struct *vma, -unsigned long addr, pte_t *ptep) +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) { + pte_t pte; + /* * clear the huge pte entry firstly, so that the other smp threads will * not get old pte entry after finishing flush_tlb_page and before * setting new huge pte entry */ - huge_ptep_get_and_clear(vma->vm_mm, addr, ptep); + pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep); flush_tlb_page(vma, addr); + return pte; } #define 
__HAVE_ARCH_HUGE_PTE_NONE diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h index a69cf9e..f7f078c 100644 --- a/arch/parisc/include/asm/hugetlb.h +++ b/arch/parisc/include/asm/hugetlb.h @@ -28,9 +28,10 @@ static inline int prepare_hugepage_range(struct file *file, } #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH -sta
[PATCH v4 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping
On some architectures (like ARM64), CONT-PTE/PMD size hugetlb is supported, which means that besides PMD/PUD size hugetlb (2M and 1G), CONT-PTE/PMD sizes (64K and 32M) are also supported when a 4K base page size is used.

When unmapping a hugetlb page, we get the relevant page table entry by huge_pte_offset() only once and nuke it. This is correct for PMD or PUD size hugetlb, since they always contain only one pmd entry or pud entry in the page table.

However, this is incorrect for CONT-PTE and CONT-PMD size hugetlb, since they can contain several contiguous pte or pmd entries with the same page table attributes, so we nuke only one pte or pmd entry for a CONT-PTE/PMD size hugetlb page.

Currently, try_to_unmap() is only passed a hugetlb page when the hugetlb page is poisoned. This means we unmap only one pte entry of a CONT-PTE or CONT-PMD size poisoned hugetlb page, while the other subpages of that poisoned page remain accessible, which can cause serious issues.

So use huge_ptep_clear_flush() to nuke the hugetlb page table entries, which already takes CONT-PTE and CONT-PMD size hugetlb into account. We already use set_huge_swap_pte_at() to set a poisoned swap entry for a poisoned hugetlb page. Meanwhile, add a VM_BUG_ON() to make sure that a hugetlb page passed to try_to_unmap() is poisoned.

Signed-off-by: Baolin Wang
Reviewed-by: Muchun Song
Reviewed-by: Mike Kravetz
---
 mm/rmap.c | 39 ++-
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c index 4e96daf..219e287 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1528,6 +1528,11 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, if (folio_test_hugetlb(folio)) { /* +* The try_to_unmap() is only passed a hugetlb page +* in the case where the hugetlb page is poisoned. +*/ + VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage); + /* * huge_pmd_unshare may unmap an entire PMD page.
* There is no way of knowing exactly which PMDs may * be cached for this mm, so we must flush them all. @@ -1562,28 +1567,28 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, break; } } + pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); } else { flush_cache_page(vma, address, pte_pfn(*pvmw.pte)); - } - - /* -* Nuke the page table entry. When having to clear -* PageAnonExclusive(), we always have to flush. -*/ - if (should_defer_flush(mm, flags) && !anon_exclusive) { /* -* We clear the PTE but do not flush so potentially -* a remote CPU could still be writing to the folio. -* If the entry was previously clean then the -* architecture must guarantee that a clear->dirty -* transition on a cached TLB entry is written through -* and traps if the PTE is unmapped. +* Nuke the page table entry. When having to clear +* PageAnonExclusive(), we always have to flush. */ - pteval = ptep_get_and_clear(mm, address, pvmw.pte); + if (should_defer_flush(mm, flags) && !anon_exclusive) { + /* +* We clear the PTE but do not flush so potentially +* a remote CPU could still be writing to the folio. +* If the entry was previously clean then the +* architecture must guarantee that a clear->dirty +* transition on a cached TLB entry is written through +* and traps if the PTE is unmapped. +*/ + pteval = ptep_get_and_clear(mm, address, pvmw.pte); - set_tlb_ubc_flush_pending(mm, pte_dirty(pteval)); - } else { - pteval = ptep_clear_flush(vma, address, pvmw.pte); + set_tlb_ubc_flush_pending(mm, pte_dirty(pteval)); + } else { + pteval = ptep_clear_flush(vma, address, pvmw.pte); + } } /* -- 1.8.3.1
Re: [PATCH v3 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration
On 5/11/2022 7:28 AM, Andrew Morton wrote:
> On Tue, 10 May 2022 16:17:39 -0700 Andrew Morton wrote:
>
>>> +
>>> +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
>>> +					  unsigned long addr, pte_t *ptep)
>>> +{
>>> +	return ptep_get(ptep);
>>> +}
>>> +
>>> +static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>>> +				   pte_t *ptep, pte_t pte)
>>> +{
>>> +}
>>>  #endif	/* CONFIG_HUGETLB_PAGE */
>>
>> This blows up nommu (arm allnoconfig):
>>
>> In file included from fs/io_uring.c:71:
>> ./include/linux/hugetlb.h: In function 'huge_ptep_clear_flush':
>> ./include/linux/hugetlb.h:1100:16: error: implicit declaration of function 'ptep_get' [-Werror=implicit-function-declaration]
>>  1100 |         return ptep_get(ptep);
>>       |                ^~~~
>>
>> huge_ptep_clear_flush() is only used in CONFIG_NOMMU=n files, so I simply
>> zapped this change.
>
> Well that wasn't a great success. Doing this instead. It's pretty nasty -
> something nicer would be nicer please.

Thanks for fixing the build issue. I'll look at simplifying the dummy function. Maybe just remove the ptep_get():

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1097,7 +1097,7 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr
 static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
 					  unsigned long addr, pte_t *ptep)
 {
-	return ptep_get(ptep);
+	return *ptep;
 }

> --- a/include/linux/hugetlb.h~mm-rmap-fix-cont-pte-pmd-size-hugetlb-issue-when-migration-fix
> +++ a/include/linux/hugetlb.h
> @@ -1094,6 +1094,7 @@ static inline void set_huge_swap_pte_at(
>  {
>  }
>
> +#ifdef CONFIG_MMU
>  static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
>  					  unsigned long addr, pte_t *ptep)
>  {
> @@ -1104,6 +1105,7 @@ static inline void set_huge_pte_at(struc
>  					   pte_t *ptep, pte_t pte)
>  {
>  }
> +#endif
>  #endif	/* CONFIG_HUGETLB_PAGE */
>
>  static inline spinlock_t *huge_pte_lock(struct hstate *h,
> _
Re: [PATCH v3 0/3] Fix CONT-PTE/PMD size hugetlb issue when unmapping or migrating
On 5/10/2022 12:04 PM, Andrew Morton wrote:
> On Tue, 10 May 2022 11:45:57 +0800 Baolin Wang wrote:
>
>> Hi,
>>
>> Now migrating a hugetlb page or unmapping a poisoned hugetlb page, we'll
>> use ptep_clear_flush() and set_pte_at() to nuke the page table entry and
>> remap it, and this is incorrect for CONT-PTE or CONT-PMD size hugetlb
>> page,
>
> It would be helpful to describe why it's wrong. Something like "should
> use huge_ptep_clear_flush() and huge_ptep_clear_flush() for this purpose"?

Sorry for the confusing description. I described the problem explicitly in each patch's commit message:
https://lore.kernel.org/all/ea5abf529f0997b5430961012bfda6166c1efc8c.1652147571.git.baolin.w...@linux.alibaba.com/
https://lore.kernel.org/all/730ea4b6d292f32fb10b7a4e87dad49b0eb30474.1652147571.git.baolin.w...@linux.alibaba.com/

>> which will cause potential data consistent issue. This patch set will
>> change to use hugetlb related APIs to fix this issue, please find details
>> in each patch. Thanks.
>
> Is a cc:stable needed here? And are we able to identify a target for a
> Fixes: tag?

I think a cc:stable tag is needed. However, I am not sure about the target for the Fixes: tag. Should it point back to the introduction of CONT-PTE/PMD hugetlb, i.e.
66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit")?
[PATCH v3 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping
On some architectures (like ARM64), CONT-PTE/PMD size hugetlb is supported, which means that besides PMD/PUD size hugetlb (2M and 1G), CONT-PTE/PMD sizes (64K and 32M) are also supported when a 4K base page size is used.

When unmapping a hugetlb page, we get the relevant page table entry by huge_pte_offset() only once and nuke it. This is correct for PMD or PUD size hugetlb, since they always contain only one pmd entry or pud entry in the page table.

However, this is incorrect for CONT-PTE and CONT-PMD size hugetlb, since they can contain several contiguous pte or pmd entries with the same page table attributes, so we nuke only one pte or pmd entry for a CONT-PTE/PMD size hugetlb page.

Currently, try_to_unmap() is only passed a hugetlb page when the hugetlb page is poisoned. This means we unmap only one pte entry of a CONT-PTE or CONT-PMD size poisoned hugetlb page, while the other subpages of that poisoned page remain accessible, which can cause serious issues.

So use huge_ptep_clear_flush() to nuke the hugetlb page table entries, which already takes CONT-PTE and CONT-PMD size hugetlb into account. We already use set_huge_swap_pte_at() to set a poisoned swap entry for a poisoned hugetlb page. Meanwhile, add a VM_BUG_ON() to make sure that a hugetlb page passed to try_to_unmap() is poisoned.

Signed-off-by: Baolin Wang
Reviewed-by: Muchun Song
Reviewed-by: Mike Kravetz
---
 mm/rmap.c | 39 ++-
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c index 4e96daf..219e287 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1528,6 +1528,11 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, if (folio_test_hugetlb(folio)) { /* +* The try_to_unmap() is only passed a hugetlb page +* in the case where the hugetlb page is poisoned. +*/ + VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage); + /* * huge_pmd_unshare may unmap an entire PMD page.
* There is no way of knowing exactly which PMDs may * be cached for this mm, so we must flush them all. @@ -1562,28 +1567,28 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, break; } } + pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); } else { flush_cache_page(vma, address, pte_pfn(*pvmw.pte)); - } - - /* -* Nuke the page table entry. When having to clear -* PageAnonExclusive(), we always have to flush. -*/ - if (should_defer_flush(mm, flags) && !anon_exclusive) { /* -* We clear the PTE but do not flush so potentially -* a remote CPU could still be writing to the folio. -* If the entry was previously clean then the -* architecture must guarantee that a clear->dirty -* transition on a cached TLB entry is written through -* and traps if the PTE is unmapped. +* Nuke the page table entry. When having to clear +* PageAnonExclusive(), we always have to flush. */ - pteval = ptep_get_and_clear(mm, address, pvmw.pte); + if (should_defer_flush(mm, flags) && !anon_exclusive) { + /* +* We clear the PTE but do not flush so potentially +* a remote CPU could still be writing to the folio. +* If the entry was previously clean then the +* architecture must guarantee that a clear->dirty +* transition on a cached TLB entry is written through +* and traps if the PTE is unmapped. +*/ + pteval = ptep_get_and_clear(mm, address, pvmw.pte); - set_tlb_ubc_flush_pending(mm, pte_dirty(pteval)); - } else { - pteval = ptep_clear_flush(vma, address, pvmw.pte); + set_tlb_ubc_flush_pending(mm, pte_dirty(pteval)); + } else { + pteval = ptep_clear_flush(vma, address, pvmw.pte); + } } /* -- 1.8.3.1
[PATCH v3 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration
On some architectures (like ARM64), CONT-PTE/PMD size hugetlb is supported, which means that besides PMD/PUD size hugetlb (2M and 1G), CONT-PTE/PMD sizes (64K and 32M) are also supported when a 4K base page size is used.

When migrating a hugetlb page, we get the relevant page table entry by huge_pte_offset() only once, nuke it, and remap it with a migration pte entry. This is correct for PMD or PUD size hugetlb, since they always contain only one pmd entry or pud entry in the page table.

However, this is incorrect for CONT-PTE and CONT-PMD size hugetlb, since they can contain several contiguous pte or pmd entries with the same page table attributes. So we nuke or remap only one pte or pmd entry for a CONT-PTE/PMD size hugetlb page, which is not what hugetlb migration expects. The problem is that the data in the other subpages of the hugetlb page can still be modified while it is being migrated, which can cause a serious data consistency issue, since we did not nuke the page table entries and set migration ptes for all the subpages of the hugetlb page.

To fix this issue, use huge_ptep_clear_flush() to nuke a hugetlb page table entry, and remap it with set_huge_pte_at() and set_huge_swap_pte_at() when migrating a hugetlb page; these helpers already take CONT-PTE and CONT-PMD size hugetlb into account.
Signed-off-by: Baolin Wang Reviewed-by: Muchun Song Reviewed-by: Mike Kravetz --- include/linux/hugetlb.h | 11 +++ mm/rmap.c | 24 ++-- 2 files changed, 29 insertions(+), 6 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 306d6ef..9f71043 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -1093,6 +1093,17 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr pte_t *ptep, pte_t pte, unsigned long sz) { } + +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) +{ + return ptep_get(ptep); +} + +static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte) +{ +} #endif /* CONFIG_HUGETLB_PAGE */ static inline spinlock_t *huge_pte_lock(struct hstate *h, diff --git a/mm/rmap.c b/mm/rmap.c index 94d6b24..4e96daf 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1926,13 +1926,15 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, break; } } + + /* Nuke the hugetlb page table entry */ + pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); } else { flush_cache_page(vma, address, pte_pfn(*pvmw.pte)); + /* Nuke the page table entry. */ + pteval = ptep_clear_flush(vma, address, pvmw.pte); } - /* Nuke the page table entry. */ - pteval = ptep_clear_flush(vma, address, pvmw.pte); - /* Set the dirty flag on the folio now the pte is gone. 
*/ if (pte_dirty(pteval)) folio_mark_dirty(folio); @@ -2017,7 +2019,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, pte_t swp_pte; if (arch_unmap_one(mm, vma, address, pteval) < 0) { - set_pte_at(mm, address, pvmw.pte, pteval); + if (folio_test_hugetlb(folio)) + set_huge_pte_at(mm, address, pvmw.pte, pteval); + else + set_pte_at(mm, address, pvmw.pte, pteval); ret = false; page_vma_mapped_walk_done(); break; @@ -2026,7 +2031,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, !anon_exclusive, subpage); if (anon_exclusive && page_try_share_anon_rmap(subpage)) { - set_pte_at(mm, address, pvmw.pte, pteval); + if (folio_test_hugetlb(folio)) + set_huge_pte_at(mm, address, pvmw.pte, pteval); + else + set_pte_at(mm, address, pvmw.pte, pteval); ret = false; page_vma_mapped_walk_done(); break; @@ -2052,7 +2060,11 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, swp_pte = pte_swp_mksoft_di
[PATCH v3 1/3] mm: change huge_ptep_clear_flush() to return the original pte
It is incorrect to use ptep_clear_flush() to nuke a hugetlb page table when unmapping or migrating a hugetlb page, and will change to use huge_ptep_clear_flush() instead in the following patches. So this is a preparation patch, which changes the huge_ptep_clear_flush() to return the original pte to help to nuke a hugetlb page table. Signed-off-by: Baolin Wang Acked-by: Mike Kravetz Reviewed-by: Muchun Song --- arch/arm64/include/asm/hugetlb.h | 4 ++-- arch/arm64/mm/hugetlbpage.c| 12 +--- arch/ia64/include/asm/hugetlb.h| 4 ++-- arch/mips/include/asm/hugetlb.h| 9 ++--- arch/parisc/include/asm/hugetlb.h | 4 ++-- arch/powerpc/include/asm/hugetlb.h | 9 ++--- arch/s390/include/asm/hugetlb.h| 6 +++--- arch/sh/include/asm/hugetlb.h | 4 ++-- arch/sparc/include/asm/hugetlb.h | 4 ++-- include/asm-generic/hugetlb.h | 4 ++-- 10 files changed, 32 insertions(+), 28 deletions(-) diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h index 1242f71..616b2ca 100644 --- a/arch/arm64/include/asm/hugetlb.h +++ b/arch/arm64/include/asm/hugetlb.h @@ -39,8 +39,8 @@ extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm, extern void huge_ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep); #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH -extern void huge_ptep_clear_flush(struct vm_area_struct *vma, - unsigned long addr, pte_t *ptep); +extern pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep); #define __HAVE_ARCH_HUGE_PTE_CLEAR extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep, unsigned long sz); diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index cbace1c..ca8e65c 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -486,19 +486,17 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm, set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot)); } -void huge_ptep_clear_flush(struct vm_area_struct *vma, - unsigned long addr, pte_t *ptep) 
+pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) { size_t pgsize; int ncontig; - if (!pte_cont(READ_ONCE(*ptep))) { - ptep_clear_flush(vma, addr, ptep); - return; - } + if (!pte_cont(READ_ONCE(*ptep))) + return ptep_clear_flush(vma, addr, ptep); ncontig = find_num_contig(vma->vm_mm, addr, ptep, ); - clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig); + return get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig); } static int __init hugetlbpage_init(void) diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h index 7e46ebd..65d3811 100644 --- a/arch/ia64/include/asm/hugetlb.h +++ b/arch/ia64/include/asm/hugetlb.h @@ -23,8 +23,8 @@ static inline int is_hugepage_only_range(struct mm_struct *mm, #define is_hugepage_only_range is_hugepage_only_range #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH -static inline void huge_ptep_clear_flush(struct vm_area_struct *vma, -unsigned long addr, pte_t *ptep) +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) { } diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h index c214440..fd69c88 100644 --- a/arch/mips/include/asm/hugetlb.h +++ b/arch/mips/include/asm/hugetlb.h @@ -43,16 +43,19 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm, } #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH -static inline void huge_ptep_clear_flush(struct vm_area_struct *vma, -unsigned long addr, pte_t *ptep) +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) { + pte_t pte; + /* * clear the huge pte entry firstly, so that the other smp threads will * not get old pte entry after finishing flush_tlb_page and before * setting new huge pte entry */ - huge_ptep_get_and_clear(vma->vm_mm, addr, ptep); + pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep); flush_tlb_page(vma, addr); + return pte; } #define __HAVE_ARCH_HUGE_PTE_NONE diff --git 
a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h index a69cf9e..25bc560 100644 --- a/arch/parisc/include/asm/hugetlb.h +++ b/arch/parisc/include/asm/hugetlb.h @@ -28,8 +28,8 @@ static inline int prepare_hugepage_range(struct file *file, } #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH -static inline void huge_ptep_cle
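The arm64 hunk above returns ptep_clear_flush()'s result for a non-contiguous entry and get_clear_flush()'s result for a contiguous group. Mike notes later in the thread that for CONT-PTE/PMD the returned value has dirty or young set if ANY entry in the group had it set. The following is a minimal userspace sketch of that behaviour. The type, the bit layout and the model_* names are hypothetical illustrations, not kernel APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit layout only; these are NOT arm64's real PTE bits. */
typedef uint64_t model_pte_t;

#define MODEL_PTE_VALID (1ULL << 0)
#define MODEL_PTE_DIRTY (1ULL << 1)
#define MODEL_PTE_YOUNG (1ULL << 2)
#define MODEL_PTE_CONT  (1ULL << 3)

/*
 * Clear 'ncontig' consecutive entries and return the first one with the
 * dirty and young bits of every cleared entry folded in.
 */
static model_pte_t model_get_clear_flush(model_pte_t *ptep, size_t ncontig)
{
	model_pte_t orig = ptep[0];

	for (size_t i = 0; i < ncontig; i++) {
		model_pte_t pte = ptep[i];

		ptep[i] = 0;	/* stands in for ptep_get_and_clear() */
		orig |= pte & (MODEL_PTE_DIRTY | MODEL_PTE_YOUNG);
	}
	/* The kernel would flush the TLB for the whole range here. */
	return orig;
}

/* Non-contiguous entries are cleared alone; contiguous ones as a group. */
static model_pte_t model_huge_ptep_clear_flush(model_pte_t *ptep,
					       size_t ncontig)
{
	if (!(ptep[0] & MODEL_PTE_CONT))
		return model_get_clear_flush(ptep, 1);
	return model_get_clear_flush(ptep, ncontig);
}
```

Suppose a group has 16 contiguous entries and only entry 7 is dirty. The value returned still carries the dirty bit, and all 16 entries end up cleared. That returned value is what the later rmap patches feed into pte_dirty().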
[PATCH v3 0/3] Fix CONT-PTE/PMD size hugetlb issue when unmapping or migrating
Hi, Currently, when migrating a hugetlb page or unmapping a poisoned hugetlb page, we use ptep_clear_flush() and set_pte_at() to nuke the page table entry and remap it. This is incorrect for CONT-PTE or CONT-PMD size hugetlb pages and can cause data consistency issues. This patch set changes these paths to use the hugetlb-specific APIs instead; please find the details in each patch. Thanks. Note: Mike pointed out that huge_ptep_get() only returns one specific entry, so unlike huge_ptep_get_and_clear() it does not take into account the dirty or young bits of the other CONT-PTE/PMD entries [1]. This inconsistency is not introduced by this patch set and will be addressed in another thread [2]. Meanwhile, the uffd hugetlb case [3] pointed out by Gerald also needs a separate patch. [1] https://lore.kernel.org/linux-mm/85bd80b4-b4fd-0d3f-a2e5-149559f2f...@oracle.com/ [2] https://lore.kernel.org/all/cover.1651998586.git.baolin.w...@linux.alibaba.com/ [3] https://lore.kernel.org/linux-mm/20220503120343.6264e126@thinkpad/ Changes from v2: - Collect reviewed tags from Muchun and Mike. - Drop the unnecessary cast in hugetlb.c. - Fix build errors by adding dummy functions for !CONFIG_HUGETLB_PAGE. Changes from v1: - Add acked tag from Mike. - Update some commit messages. - Add VM_BUG_ON in try_to_unmap() for the hugetlb case. - Add an explicit void cast for huge_ptep_clear_flush() in hugetlb.c.
Baolin Wang (3): mm: change huge_ptep_clear_flush() to return the original pte mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping arch/arm64/include/asm/hugetlb.h | 4 +-- arch/arm64/mm/hugetlbpage.c| 12 +++- arch/ia64/include/asm/hugetlb.h| 4 +-- arch/mips/include/asm/hugetlb.h| 9 -- arch/parisc/include/asm/hugetlb.h | 4 +-- arch/powerpc/include/asm/hugetlb.h | 9 -- arch/s390/include/asm/hugetlb.h| 6 ++-- arch/sh/include/asm/hugetlb.h | 4 +-- arch/sparc/include/asm/hugetlb.h | 4 +-- include/asm-generic/hugetlb.h | 4 +-- include/linux/hugetlb.h| 11 +++ mm/rmap.c | 63 -- 12 files changed, 83 insertions(+), 51 deletions(-) -- 1.8.3.1
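The cover letter's core point can be checked with simple arithmetic, assuming the arm64 example with a 4K base page. A 64K CONT-PTE hugetlb page is mapped by 64K / 4K = 16 pte entries, and a 32M CONT-PMD page is mapped by 32M / 2M = 16 pmd entries. A plain 2M PMD page needs exactly one entry. The helper names below are hypothetical.

```c
#include <stddef.h>

/* Sizes from the cover letter's arm64 example with a 4K base page. */
#define MODEL_SZ_4K  (4UL << 10)
#define MODEL_SZ_64K (64UL << 10)
#define MODEL_SZ_2M  (2UL << 20)
#define MODEL_SZ_32M (32UL << 20)

/* How many page table entries map one huge page of size 'hsize'. */
static size_t entries_per_huge_page(unsigned long hsize,
				    unsigned long entry_size)
{
	return hsize / entry_size;
}

/*
 * Entries that stay mapped if only the entry returned by huge_pte_offset()
 * is nuked, as happens when ptep_clear_flush() is used on a hugetlb page.
 */
static size_t entries_left_mapped(unsigned long hsize,
				  unsigned long entry_size)
{
	return entries_per_huge_page(hsize, entry_size) - 1;
}
```

For a PMD or PUD size page the single-entry approach leaves nothing behind. For either CONT size, 15 of the 16 entries would stay live.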
Re: [PATCH v2 1/3] mm: change huge_ptep_clear_flush() to return the original pte
On 5/10/2022 4:02 AM, Mike Kravetz wrote: On 5/9/22 01:46, Baolin Wang wrote: On 5/9/2022 1:46 PM, Christophe Leroy wrote: On 08/05/2022 15:09, Baolin Wang wrote: On 5/8/2022 7:09 PM, Muchun Song wrote: On Sun, May 08, 2022 at 05:36:39PM +0800, Baolin Wang wrote: It is incorrect to use ptep_clear_flush() to nuke a hugetlb page table when unmapping or migrating a hugetlb page, and will change to use huge_ptep_clear_flush() instead in the following patches. So this is a preparation patch, which changes the huge_ptep_clear_flush() to return the original pte to help to nuke a hugetlb page table. Signed-off-by: Baolin Wang Acked-by: Mike Kravetz Reviewed-by: Muchun Song Thanks for reviewing. But one nit below: [...] diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 8605d7e..61a21af 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5342,7 +5342,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, ClearHPageRestoreReserve(new_page); /* Break COW or unshare */ - huge_ptep_clear_flush(vma, haddr, ptep); + (void)huge_ptep_clear_flush(vma, haddr, ptep); Why add a "(void)" here? Is there any warning if no "(void)"? IIUC, I think we can remove this, right? I did not see any warning without the cast, but I added it per Mike's comment [1], to be consistent with other places in hugetlb.c that explicitly cast return values to void. [1] https://lore.kernel.org/all/495c4ebe-a5b4-afb6-4cb0-956c1b18d...@oracle.com/ As far as I understand, Mike said that the cast should be accompanied by a big fat comment explaining why we ignore the value returned from huge_ptep_clear_flush(). By the way, huge_ptep_clear_flush() is not declared 'must_check', so this cast is just visual pollution and should be removed. In the meantime, the comment suggested by Mike should be added instead. Sorry for my misunderstanding. I just followed the explicit void casts used in other places in hugetlb.c.
And I am not sure it is useful to add a comment here, since we do not need the original pte value in the COW case, where a new page is mapped, and I think the code is already readable. Mike, could you clarify what comment you would like, and whether I should remove the explicit void cast? Thanks. Sorry for the confusion. In the original commit, it seemed odd to me that the signature of the function was changing and there was not an associated change to the only caller of the function. I did suggest casting to void or adding a comment. As Christophe mentions, the cast to void is not necessary. In addition, there really isn't a need for a comment as the calling code is not changed. OK. Will drop the cast in the next version. The original version of the commit without either is actually preferable. The commit message does say this is a preparation patch and the return value will be used in later patches. OK. Thanks, Mike, for clarifying. Also thanks to Muchun and Christophe.
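Christophe's argument rests on the difference between a plain function and one annotated as must-check. The small sketch below illustrates that difference. It assumes only that the kernel's __must_check expands to __attribute__((__warn_unused_result__)); the function names are hypothetical.

```c
/*
 * Local stand-in for the kernel's __must_check, which the kernel defines
 * as __attribute__((__warn_unused_result__)).
 */
#define model_must_check __attribute__((__warn_unused_result__))

/* Not annotated: callers may drop the return value without any warning. */
static int take_old_value(int *slot)
{
	int old = *slot;

	*slot = 0;
	return old;
}

/* Annotated: GCC and Clang warn when a caller ignores the result. */
static model_must_check int take_old_value_checked(int *slot)
{
	return take_old_value(slot);
}

/* Like the hugetlb_wp() caller: the old value is not needed, no cast. */
static void caller_discards_result(int *slot)
{
	take_old_value(slot);
}
```

Because huge_ptep_clear_flush() is not annotated, the unchanged hugetlb_wp() call compiles cleanly and the cast adds nothing. Even for an annotated function, GCC (unlike Clang) does not treat a (void) cast as silencing the warning.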
Re: [PATCH 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping
On 5/10/2022 12:41 AM, Peter Xu wrote: On Fri, May 06, 2022 at 12:07:13PM -0700, Mike Kravetz wrote: On 5/3/22 03:03, Gerald Schaefer wrote: On Tue, 3 May 2022 10:19:46 +0800 Baolin Wang wrote: On 5/2/2022 10:02 PM, Gerald Schaefer wrote: [...] Please see the previous code: we use the original pte value to check whether it is uffd-wp armed, and whether it needs to be marked dirty, even though hugetlbfs uses noop_dirty_folio(). pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval); Uh, ok, that wouldn't work on s390, but we also don't have CONFIG_PTE_MARKER_UFFD_WP / HAVE_ARCH_USERFAULTFD_WP set, so I guess we will be fine (for now). Still, I find it a bit unsettling that pte_install_uffd_wp_if_needed() would work on a potential hugetlb *pte, directly de-referencing it instead of using huge_ptep_get(). The !pte_none(*pte) check at the beginning would be broken in the hugetlb case for s390 (not sure about other archs, but I think s390 might be the only exception strictly requiring huge_ptep_get() for de-referencing hugetlb *pte pointers). We could have used is_vm_hugetlb_page(vma) within the helper so as to properly use either generic pte or hugetlb version of pte fetching. We may want to conditionally do set_[huge_]pte_at() too at the end. I could prepare a patch for that even if it's not really anything urgently needed. I assume that won't need to block this patchset since we need the pteval for pte_dirty() check anyway and uffd-wp definitely needs it too. OK. Thanks Peter.
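Peter suggests that the helper check is_vm_hugetlb_page(vma) and pick the matching accessor. The sketch below shows that idea; it is not the kernel's code. Gerald notes that on s390 a raw hugetlb entry must go through huge_ptep_get(), so a direct !pte_none(*pte) check is wrong there. The sketch models this with two made-up "empty" encodings. All names and constants are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t model_pte_t;

/* Made-up encodings: an empty huge entry and an empty pte differ. */
#define MODEL_PTE_EMPTY        0x400u
#define MODEL_HUGE_ENTRY_EMPTY 0x020u

struct model_vma {
	bool hugetlb;	/* stands in for is_vm_hugetlb_page(vma) */
};

static bool model_pte_none(model_pte_t pte)
{
	return pte == MODEL_PTE_EMPTY;
}

/* Stand-in for huge_ptep_get(): translate the huge-entry format. */
static model_pte_t model_huge_ptep_get(const model_pte_t *ptep)
{
	return *ptep == MODEL_HUGE_ENTRY_EMPTY ? MODEL_PTE_EMPTY : *ptep;
}

/* Helper that picks the accessor from the VMA, as suggested above. */
static model_pte_t model_read_entry(const struct model_vma *vma,
				    const model_pte_t *ptep)
{
	return vma->hugetlb ? model_huge_ptep_get(ptep) : *ptep;
}
```

In this model, dereferencing an empty huge entry directly makes model_pte_none() return false. Going through model_read_entry() gives the correct answer.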
Re: [PATCH v2 1/3] mm: change huge_ptep_clear_flush() to return the original pte
On 5/9/2022 1:46 PM, Christophe Leroy wrote: On 08/05/2022 15:09, Baolin Wang wrote: On 5/8/2022 7:09 PM, Muchun Song wrote: On Sun, May 08, 2022 at 05:36:39PM +0800, Baolin Wang wrote: It is incorrect to use ptep_clear_flush() to nuke a hugetlb page table when unmapping or migrating a hugetlb page, and will change to use huge_ptep_clear_flush() instead in the following patches. So this is a preparation patch, which changes the huge_ptep_clear_flush() to return the original pte to help to nuke a hugetlb page table. Signed-off-by: Baolin Wang Acked-by: Mike Kravetz Reviewed-by: Muchun Song Thanks for reviewing. But one nit below: [...] diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 8605d7e..61a21af 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5342,7 +5342,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, ClearHPageRestoreReserve(new_page); /* Break COW or unshare */ - huge_ptep_clear_flush(vma, haddr, ptep); + (void)huge_ptep_clear_flush(vma, haddr, ptep); Why add a "(void)" here? Is there any warning if no "(void)"? IIUC, I think we can remove this, right? I did not see any warning without the cast, but I added it per Mike's comment [1], to be consistent with other places in hugetlb.c that explicitly cast return values to void. [1] https://lore.kernel.org/all/495c4ebe-a5b4-afb6-4cb0-956c1b18d...@oracle.com/ As far as I understand, Mike said that the cast should be accompanied by a big fat comment explaining why we ignore the value returned from huge_ptep_clear_flush(). By the way, huge_ptep_clear_flush() is not declared 'must_check', so this cast is just visual pollution and should be removed. In the meantime, the comment suggested by Mike should be added instead. Sorry for my misunderstanding. I just followed the explicit void casts used in other places in hugetlb.c.
And I am not sure it is useful to add a comment like the one below, since we do not need the original pte value in the COW case, where a new page is mapped, and I think the code is already readable. Mike, could you clarify what comment you would like, and whether I should remove the explicit void cast? Thanks. /* * Just ignore the return value with new page mapped. */
Re: [PATCH v2 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration
Hi, On 5/8/2022 8:01 PM, kernel test robot wrote: Hi Baolin, I love your patch! Yet something to improve: [auto build test ERROR on akpm-mm/mm-everything] [also build test ERROR on next-20220506] [cannot apply to hnaz-mm/master arm64/for-next/core linus/master v5.18-rc5] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/intel-lab-lkp/linux/commits/Baolin-Wang/Fix-CONT-PTE-PMD-size-hugetlb-issue-when-unmapping-or-migrating/20220508-174036 base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything config: x86_64-randconfig-a013 (https://download.01.org/0day-ci/archive/20220508/202205081910.mstoc5rj-...@intel.com/config) compiler: gcc-11 (Debian 11.2.0-20) 11.2.0 reproduce (this is a W=1 build): # https://github.com/intel-lab-lkp/linux/commit/907981b27213707fdb2f8a24c107d6752a09a773 git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Baolin-Wang/Fix-CONT-PTE-PMD-size-hugetlb-issue-when-unmapping-or-migrating/20220508-174036 git checkout 907981b27213707fdb2f8a24c107d6752a09a773 # save the config file mkdir build_dir && cp config build_dir/.config make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All errors (new ones prefixed by >>): mm/rmap.c: In function 'try_to_migrate_one': mm/rmap.c:1931:34: error: implicit declaration of function 'huge_ptep_clear_flush'; did you mean 'ptep_clear_flush'? [-Werror=implicit-function-declaration] 1931 | pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); | ^ | ptep_clear_flush mm/rmap.c:1931:34: error: incompatible types when assigning to type 'pte_t' from type 'int' mm/rmap.c:2023:41: error: implicit declaration of function 'set_huge_pte_at'; did you mean 'set_huge_swap_pte_at'? 
[-Werror=implicit-function-declaration] 2023 | set_huge_pte_at(mm, address, pvmw.pte, pteval); | ^~~ | set_huge_swap_pte_at cc1: some warnings being treated as errors Thanks for reporting. I think I should add some dummy functions to hugetlb.h for the case where CONFIG_HUGETLB_PAGE is not selected. With the changes below, the build passes with your config file. diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 306d6ef..9f71043 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -1093,6 +1093,17 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr pte_t *ptep, pte_t pte, unsigned long sz) { } + +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) +{ + return ptep_get(ptep); +} + +static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte) +{ +} #endif /* CONFIG_HUGETLB_PAGE */
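The fix follows the usual config-stub pattern. When the option is off, inline dummy versions keep callers such as try_to_migrate_one() compiling. The hugetlb branch is never taken for non-hugetlb folios. Below is a userspace sketch of that pattern. MODEL_CONFIG_HUGETLB_PAGE and the model_* functions are hypothetical stand-ins; the stub bodies mirror the proposed ones.

```c
#include <stdint.h>

typedef struct { uint64_t val; } model_pte_t;

#ifdef MODEL_CONFIG_HUGETLB_PAGE
/* "Real" versions, only built when the hypothetical option is defined. */
static model_pte_t model_huge_ptep_clear_flush(model_pte_t *ptep)
{
	model_pte_t old = *ptep;

	ptep->val = 0;
	return old;
}

static void model_set_huge_pte_at(model_pte_t *ptep, model_pte_t pte)
{
	*ptep = pte;
}
#else
/* Dummy versions, shaped like the stubs proposed above. */
static inline model_pte_t model_huge_ptep_clear_flush(model_pte_t *ptep)
{
	return *ptep;	/* mirrors "return ptep_get(ptep);" */
}

static inline void model_set_huge_pte_at(model_pte_t *ptep, model_pte_t pte)
{
	(void)ptep;
	(void)pte;
}
#endif

/* A caller in the style of try_to_migrate_one(): it builds either way. */
static model_pte_t model_nuke(int is_hugetlb, model_pte_t *ptep)
{
	if (is_hugetlb)
		return model_huge_ptep_clear_flush(ptep);

	model_pte_t old = *ptep;

	ptep->val = 0;
	return old;
}
```

Without the stubs, the reference in model_nuke() would be an implicit declaration, which is the error the robot reported. In the kernel, folio_test_hugetlb() is constant false when CONFIG_HUGETLB_PAGE is off, so the stub bodies are never reached.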
Re: [PATCH v2 1/3] mm: change huge_ptep_clear_flush() to return the original pte
On 5/8/2022 7:09 PM, Muchun Song wrote: On Sun, May 08, 2022 at 05:36:39PM +0800, Baolin Wang wrote: It is incorrect to use ptep_clear_flush() to nuke a hugetlb page table when unmapping or migrating a hugetlb page, and will change to use huge_ptep_clear_flush() instead in the following patches. So this is a preparation patch, which changes the huge_ptep_clear_flush() to return the original pte to help to nuke a hugetlb page table. Signed-off-by: Baolin Wang Acked-by: Mike Kravetz Reviewed-by: Muchun Song Thanks for reviewing. But one nit below: [...] diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 8605d7e..61a21af 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5342,7 +5342,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, ClearHPageRestoreReserve(new_page); /* Break COW or unshare */ - huge_ptep_clear_flush(vma, haddr, ptep); + (void)huge_ptep_clear_flush(vma, haddr, ptep); Why add a "(void)" here? Is there any warning if no "(void)"? IIUC, I think we can remove this, right? I did not see any warning without the cast, but I added it per Mike's comment [1], to be consistent with other places in hugetlb.c that explicitly cast return values to void. [1] https://lore.kernel.org/all/495c4ebe-a5b4-afb6-4cb0-956c1b18d...@oracle.com/
[PATCH v2 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration
Some architectures (like ARM64) support CONT-PTE/PMD size hugetlb, which means that besides the PMD/PUD sizes (2M and 1G) they also support the CONT-PTE/PMD sizes (64K and 32M when a 4K page size is used). When migrating a hugetlb page, we get the relevant page table entry by huge_pte_offset() only once, nuke it, and remap it with a migration pte entry. This is correct for PMD or PUD size hugetlb, since such a page is always mapped by exactly one pmd or pud entry. However, it is incorrect for CONT-PTE and CONT-PMD size hugetlb, since such a page is mapped by several contiguous pte or pmd entries with the same page table attributes, so we nuke or remap only one pte or pmd entry of the CONT-PTE/PMD size hugetlb page, which is not what hugetlb migration expects. The problem is that the subpages' data of a hugetlb page can still be modified while the page is being migrated, because we did not nuke the page table entries and set migration ptes for all of its subpages; this can cause serious data consistency issues. To fix this issue, use huge_ptep_clear_flush() to nuke the hugetlb page table, and remap it with set_huge_pte_at() and set_huge_swap_pte_at() when migrating a hugetlb page, since these helpers already handle CONT-PTE and CONT-PMD size hugetlb. Signed-off-by: Baolin Wang --- mm/rmap.c | 24 ++-- 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index 6fdd198..7cf2408 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1924,13 +1924,15 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, break; } } + + /* Nuke the hugetlb page table entry */ + pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); } else { flush_cache_page(vma, address, pte_pfn(*pvmw.pte)); + /* Nuke the page table entry. */ + pteval = ptep_clear_flush(vma, address, pvmw.pte); } - /* Nuke the page table entry.
*/ - pteval = ptep_clear_flush(vma, address, pvmw.pte); - /* Set the dirty flag on the folio now the pte is gone. */ if (pte_dirty(pteval)) folio_mark_dirty(folio); @@ -2015,7 +2017,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, pte_t swp_pte; if (arch_unmap_one(mm, vma, address, pteval) < 0) { - set_pte_at(mm, address, pvmw.pte, pteval); + if (folio_test_hugetlb(folio)) + set_huge_pte_at(mm, address, pvmw.pte, pteval); + else + set_pte_at(mm, address, pvmw.pte, pteval); ret = false; page_vma_mapped_walk_done(); break; @@ -2024,7 +2029,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, !anon_exclusive, subpage); if (anon_exclusive && page_try_share_anon_rmap(subpage)) { - set_pte_at(mm, address, pvmw.pte, pteval); + if (folio_test_hugetlb(folio)) + set_huge_pte_at(mm, address, pvmw.pte, pteval); + else + set_pte_at(mm, address, pvmw.pte, pteval); ret = false; page_vma_mapped_walk_done(); break; @@ -2050,7 +2058,11 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, swp_pte = pte_swp_mksoft_dirty(swp_pte); if (pte_uffd_wp(pteval)) swp_pte = pte_swp_mkuffd_wp(swp_pte); - set_pte_at(mm, address, pvmw.pte, swp_pte); + if (folio_test_hugetlb(folio)) + set_huge_swap_pte_at(mm, address, pvmw.pte, +swp_pte, vma_mmu_pagesize(vma)); + else + set_pte_at(mm, address, pvmw.pte, swp_pte); trace_set_migration_pte(address, pte_val(swp_pte), compound_order(>page)); /* -- 1.8.3.1
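The error paths above switch from set_pte_at() to set_huge_pte_at() for hugetlb folios. The userspace sketch below shows why. set_pte_at() writes one entry, while set_huge_pte_at() writes every entry of a CONT-PTE/PMD group. The model_* names are hypothetical, and the model simplifies one detail noted in the comment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t model_pte_t;

/* Like set_pte_at(): writes exactly one entry. */
static void model_set_pte_at(model_pte_t *ptep, model_pte_t pte)
{
	*ptep = pte;
}

/*
 * Like set_huge_pte_at() for a CONT-PTE/PMD size: writes every entry of
 * the group. The real arm64 code also advances the page frame number per
 * entry; the model copies the value unchanged to keep things short.
 */
static void model_set_huge_pte_at(model_pte_t *ptep, model_pte_t pte,
				  size_t ncontig)
{
	for (size_t i = 0; i < ncontig; i++)
		ptep[i] = pte;
}

/* The patched error path: choose the helper by folio type. */
static void model_restore(bool folio_is_hugetlb, model_pte_t *ptep,
			  model_pte_t pteval, size_t ncontig)
{
	if (folio_is_hugetlb)
		model_set_huge_pte_at(ptep, pteval, ncontig);
	else
		model_set_pte_at(ptep, pteval);
}

static size_t model_count_mapped(const model_pte_t *ptep, size_t n)
{
	size_t mapped = 0;

	for (size_t i = 0; i < n; i++)
		if (ptep[i] != 0)
			mapped++;
	return mapped;
}
```

Passing false for a cleared 16-entry group mimics the old code, which restored a hugetlb mapping with set_pte_at(). Only one entry comes back. Passing true restores all 16.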
[PATCH v2 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping
Some architectures (like ARM64) support CONT-PTE/PMD size hugetlb, which means that besides the PMD/PUD sizes (2M and 1G) they also support the CONT-PTE/PMD sizes (64K and 32M when a 4K page size is used). When unmapping a hugetlb page, we get the relevant page table entry by huge_pte_offset() only once and nuke it. This is correct for PMD or PUD size hugetlb, since such a page is always mapped by exactly one pmd or pud entry. However, it is incorrect for CONT-PTE and CONT-PMD size hugetlb, since such a page is mapped by several contiguous pte or pmd entries with the same page table attributes, so we nuke only one pte or pmd entry of the CONT-PTE/PMD size hugetlb page. Currently, try_to_unmap() is only passed a hugetlb page in the case where the hugetlb page is poisoned. This means we unmap only one pte entry of a CONT-PTE or CONT-PMD size poisoned hugetlb page, and the other subpages of that poisoned page can still be accessed, which can cause serious issues. So use huge_ptep_clear_flush() to nuke the hugetlb page table, since it already handles CONT-PTE and CONT-PMD size hugetlb. We already use set_huge_swap_pte_at() to set a poisoned swap entry for a poisoned hugetlb page. Also add a VM_BUG_ON() in try_to_unmap() to make sure the passed hugetlb page is poisoned. Signed-off-by: Baolin Wang --- mm/rmap.c | 39 ++- 1 file changed, 22 insertions(+), 17 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index 7cf2408..37c8fd2 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1530,6 +1530,11 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, if (folio_test_hugetlb(folio)) { /* +* The try_to_unmap() is only passed a hugetlb page +* in the case where the hugetlb page is poisoned. +*/ + VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage); + /* * huge_pmd_unshare may unmap an entire PMD page.
* There is no way of knowing exactly which PMDs may * be cached for this mm, so we must flush them all. @@ -1564,28 +1569,28 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, break; } } + pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); } else { flush_cache_page(vma, address, pte_pfn(*pvmw.pte)); - } - - /* -* Nuke the page table entry. When having to clear -* PageAnonExclusive(), we always have to flush. -*/ - if (should_defer_flush(mm, flags) && !anon_exclusive) { /* -* We clear the PTE but do not flush so potentially -* a remote CPU could still be writing to the folio. -* If the entry was previously clean then the -* architecture must guarantee that a clear->dirty -* transition on a cached TLB entry is written through -* and traps if the PTE is unmapped. +* Nuke the page table entry. When having to clear +* PageAnonExclusive(), we always have to flush. */ - pteval = ptep_get_and_clear(mm, address, pvmw.pte); + if (should_defer_flush(mm, flags) && !anon_exclusive) { + /* +* We clear the PTE but do not flush so potentially +* a remote CPU could still be writing to the folio. +* If the entry was previously clean then the +* architecture must guarantee that a clear->dirty +* transition on a cached TLB entry is written through +* and traps if the PTE is unmapped. +*/ + pteval = ptep_get_and_clear(mm, address, pvmw.pte); - set_tlb_ubc_flush_pending(mm, pte_dirty(pteval)); - } else { - pteval = ptep_clear_flush(vma, address, pvmw.pte); + set_tlb_ubc_flush_pending(mm, pte_dirty(pteval)); + } else { + pteval = ptep_clear_flush(vma, address, pvmw.pte); + } } /* -- 1.8.3.1
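The VM_BUG_ON_PAGE() added above turns an assumption into a checked invariant. The assumption is that a hugetlb page only reaches try_to_unmap_one() when it is poisoned. Below is a hypothetical userspace model of that control flow, with assert() playing the role of the debug-only check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page descriptor for the model. */
struct model_page {
	bool hugetlb;
	bool hwpoison;
};

/*
 * Returns how many entries the model unmaps. For a hugetlb page the
 * assert stands in for VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage).
 */
static unsigned int model_try_to_unmap_one(const struct model_page *page,
					   unsigned int ncontig)
{
	if (page->hugetlb) {
		/* Only poisoned hugetlb pages are expected here. */
		assert(page->hwpoison);
		/* huge_ptep_clear_flush() clears the whole contiguous group. */
		return ncontig;
	}
	/* ptep_clear_flush() or ptep_get_and_clear(): one entry. */
	return 1;
}
```

assert() compiles to nothing under NDEBUG. Likewise, VM_BUG_ON_PAGE() only checks anything on CONFIG_DEBUG_VM kernels, so it documents the assumption rather than enforcing it in production builds.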
[PATCH v2 1/3] mm: change huge_ptep_clear_flush() to return the original pte
It is incorrect to use ptep_clear_flush() to nuke a hugetlb page table when unmapping or migrating a hugetlb page, and will change to use huge_ptep_clear_flush() instead in the following patches. So this is a preparation patch, which changes the huge_ptep_clear_flush() to return the original pte to help to nuke a hugetlb page table. Signed-off-by: Baolin Wang Acked-by: Mike Kravetz --- arch/arm64/include/asm/hugetlb.h | 4 ++-- arch/arm64/mm/hugetlbpage.c| 12 +--- arch/ia64/include/asm/hugetlb.h| 4 ++-- arch/mips/include/asm/hugetlb.h| 9 ++--- arch/parisc/include/asm/hugetlb.h | 4 ++-- arch/powerpc/include/asm/hugetlb.h | 9 ++--- arch/s390/include/asm/hugetlb.h| 6 +++--- arch/sh/include/asm/hugetlb.h | 4 ++-- arch/sparc/include/asm/hugetlb.h | 4 ++-- include/asm-generic/hugetlb.h | 4 ++-- mm/hugetlb.c | 2 +- 11 files changed, 33 insertions(+), 29 deletions(-) diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h index 1242f71..616b2ca 100644 --- a/arch/arm64/include/asm/hugetlb.h +++ b/arch/arm64/include/asm/hugetlb.h @@ -39,8 +39,8 @@ extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm, extern void huge_ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep); #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH -extern void huge_ptep_clear_flush(struct vm_area_struct *vma, - unsigned long addr, pte_t *ptep); +extern pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep); #define __HAVE_ARCH_HUGE_PTE_CLEAR extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep, unsigned long sz); diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index cbace1c..ca8e65c 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -486,19 +486,17 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm, set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot)); } -void huge_ptep_clear_flush(struct vm_area_struct *vma, - unsigned long addr, pte_t *ptep) +pte_t 
huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) { size_t pgsize; int ncontig; - if (!pte_cont(READ_ONCE(*ptep))) { - ptep_clear_flush(vma, addr, ptep); - return; - } + if (!pte_cont(READ_ONCE(*ptep))) + return ptep_clear_flush(vma, addr, ptep); ncontig = find_num_contig(vma->vm_mm, addr, ptep, ); - clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig); + return get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig); } static int __init hugetlbpage_init(void) diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h index 7e46ebd..65d3811 100644 --- a/arch/ia64/include/asm/hugetlb.h +++ b/arch/ia64/include/asm/hugetlb.h @@ -23,8 +23,8 @@ static inline int is_hugepage_only_range(struct mm_struct *mm, #define is_hugepage_only_range is_hugepage_only_range #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH -static inline void huge_ptep_clear_flush(struct vm_area_struct *vma, -unsigned long addr, pte_t *ptep) +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) { } diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h index c214440..fd69c88 100644 --- a/arch/mips/include/asm/hugetlb.h +++ b/arch/mips/include/asm/hugetlb.h @@ -43,16 +43,19 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm, } #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH -static inline void huge_ptep_clear_flush(struct vm_area_struct *vma, -unsigned long addr, pte_t *ptep) +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) { + pte_t pte; + /* * clear the huge pte entry firstly, so that the other smp threads will * not get old pte entry after finishing flush_tlb_page and before * setting new huge pte entry */ - huge_ptep_get_and_clear(vma->vm_mm, addr, ptep); + pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep); flush_tlb_page(vma, addr); + return pte; } #define __HAVE_ARCH_HUGE_PTE_NONE diff --git 
a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h index a69cf9e..25bc560 100644 --- a/arch/parisc/include/asm/hugetlb.h +++ b/arch/parisc/include/asm/hugetlb.h @@ -28,8 +28,8 @@ static inline int prepare_hugepage_range(struct file *file, } #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH -static inline void huge_p
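The mips hunk captures the value returned by huge_ptep_get_and_clear() instead of reading the entry separately. Returning the pte from the clearing call matters: the caller needs the value that was actually cleared, including a dirty bit that hardware may have set just before the clear. On arm64, ptep_get_and_clear() is built on an atomic exchange of the entry. The sketch below uses C11 atomics to contrast the two approaches; the names and the bit value are hypothetical. The race itself cannot be reproduced deterministically in a test, so the check only confirms that both variants return the old value.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative bit, not a real architecture's dirty bit. */
#define MODEL_PTE_DIRTY (1ULL << 1)

/*
 * Two-step version: if hardware sets the dirty bit between the load and
 * the store, that update is lost and the caller never sees it.
 */
static uint64_t model_read_then_clear(_Atomic uint64_t *ptep)
{
	uint64_t old = atomic_load(ptep);

	atomic_store(ptep, 0);
	return old;
}

/*
 * One-step version: the exchange returns exactly the value it replaced,
 * so nothing set before the clear can be missed.
 */
static uint64_t model_get_and_clear(_Atomic uint64_t *ptep)
{
	return atomic_exchange(ptep, 0);
}
```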
[PATCH v2 0/3] Fix CONT-PTE/PMD size hugetlb issue when unmapping or migrating
Hi, Currently, when migrating a hugetlb page or unmapping a poisoned hugetlb page, we use ptep_clear_flush() and set_pte_at() to nuke the page table entry and remap it. This is incorrect for CONT-PTE or CONT-PMD size hugetlb pages and can cause data consistency issues. This patch set changes these paths to use the hugetlb-specific APIs instead; please find the details in each patch. Thanks. Note: Mike pointed out that huge_ptep_get() only returns one specific entry, so unlike huge_ptep_get_and_clear() it does not take into account the dirty or young bits of the other CONT-PTE/PMD entries [1]. This inconsistency is not introduced by this patch set and will be addressed in another thread [2]. Meanwhile, the uffd hugetlb case [3] pointed out by Gerald also needs a separate patch. [1] https://lore.kernel.org/linux-mm/85bd80b4-b4fd-0d3f-a2e5-149559f2f...@oracle.com/ [2] https://lore.kernel.org/all/cover.1651998586.git.baolin.w...@linux.alibaba.com/ [3] https://lore.kernel.org/linux-mm/20220503120343.6264e126@thinkpad/ Changes from v1: - Add acked tag from Mike. - Update some commit messages. - Add VM_BUG_ON in try_to_unmap() for the hugetlb case. - Add an explicit void cast for huge_ptep_clear_flush() in hugetlb.c. Baolin Wang (3): mm: change huge_ptep_clear_flush() to return the original pte mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping arch/arm64/include/asm/hugetlb.h | 4 +-- arch/arm64/mm/hugetlbpage.c| 12 +++- arch/ia64/include/asm/hugetlb.h| 4 +-- arch/mips/include/asm/hugetlb.h| 9 -- arch/parisc/include/asm/hugetlb.h | 4 +-- arch/powerpc/include/asm/hugetlb.h | 9 -- arch/s390/include/asm/hugetlb.h| 6 ++-- arch/sh/include/asm/hugetlb.h | 4 +-- arch/sparc/include/asm/hugetlb.h | 4 +-- include/asm-generic/hugetlb.h | 4 +-- mm/hugetlb.c | 2 +- mm/rmap.c | 63 -- 12 files changed, 73 insertions(+), 52 deletions(-) -- 1.8.3.1
Re: [PATCH 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration
On 5/7/2022 10:33 AM, Baolin Wang wrote: On 5/7/2022 1:56 AM, Mike Kravetz wrote: On 5/5/22 20:39, Baolin Wang wrote: On 5/6/2022 7:53 AM, Mike Kravetz wrote: On 4/29/22 01:14, Baolin Wang wrote: On some architectures (like ARM64), it can support CONT-PTE/PMD size hugetlb, which means it can support not only PMD/PUD size hugetlb: 2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page size specified. diff --git a/mm/rmap.c b/mm/rmap.c index 6fdd198..7cf2408 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1924,13 +1924,15 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, break; } } + + /* Nuke the hugetlb page table entry */ + pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); } else { flush_cache_page(vma, address, pte_pfn(*pvmw.pte)); + /* Nuke the page table entry. */ + pteval = ptep_clear_flush(vma, address, pvmw.pte); } On arm64 with CONT-PTE/PMD the returned pteval will have dirty or young set if ANY of the PTE/PMDs had dirty or young set. Right. - /* Nuke the page table entry. */ - pteval = ptep_clear_flush(vma, address, pvmw.pte); - /* Set the dirty flag on the folio now the pte is gone. */ if (pte_dirty(pteval)) folio_mark_dirty(folio); @@ -2015,7 +2017,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, pte_t swp_pte; if (arch_unmap_one(mm, vma, address, pteval) < 0) { - set_pte_at(mm, address, pvmw.pte, pteval); + if (folio_test_hugetlb(folio)) + set_huge_pte_at(mm, address, pvmw.pte, pteval); And, we will use that pteval for ALL the PTE/PMDs here. So, we would set the dirty or young bit in ALL PTE/PMDs. Could that cause any issues? Maybe more of a question for the arm64 people. I don't think this will cause any issues, since a hugetlb page cannot be split, and we should not lose the dirty or young state if it was set on any subpage.
Meanwhile, hugetlb.c already does the same thing: pte = huge_ptep_get_and_clear(mm, address, ptep); tlb_remove_huge_tlb_entry(h, tlb, ptep, address); if (huge_pte_dirty(pte)) set_page_dirty(page); Agree that it 'should not' cause issues. It just seems inconsistent. This is not a problem specifically with your patch, just the handling of CONT-PTE/PMD entries. There does not appear to be an arm64-specific version of huge_ptep_get() that takes CONT-PTE/PMD into account. So, huge_ptep_get() would only return the one specific value. It would not take into account the dirty or young bits of CONT-PTE/PMDs like your new version of huge_ptep_get_and_clear. Is that correct? Or am I missing something? Yes, you are right. If I am correct, then code like the following may not work: static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask, unsigned long addr, unsigned long end, struct mm_walk *walk) { pte_t huge_pte = huge_ptep_get(pte); struct numa_maps *md; struct page *page; if (!pte_present(huge_pte)) return 0; page = pte_page(huge_pte); md = walk->private; gather_stats(page, md, pte_dirty(huge_pte), 1); return 0; } Right, as you said, this is inconsistent with the current huge_ptep_get() interface. So I think we can define an arch-specific huge_ptep_get() for arm64. [...] What do you think? After some investigation, I sent out an RFC patch set [1] to address this issue. We can discuss it in that thread. Thanks. [1] https://lore.kernel.org/all/cover.1651998586.git.baolin.w...@linux.alibaba.com/
Re: [PATCH 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration
On 5/7/2022 1:56 AM, Mike Kravetz wrote: On 5/5/22 20:39, Baolin Wang wrote: On 5/6/2022 7:53 AM, Mike Kravetz wrote: On 4/29/22 01:14, Baolin Wang wrote: On some architectures (like ARM64), it can support CONT-PTE/PMD size hugetlb, which means it can support not only PMD/PUD size hugetlb: 2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page size specified. diff --git a/mm/rmap.c b/mm/rmap.c index 6fdd198..7cf2408 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1924,13 +1924,15 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, break; } } + + /* Nuke the hugetlb page table entry */ + pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); } else { flush_cache_page(vma, address, pte_pfn(*pvmw.pte)); + /* Nuke the page table entry. */ + pteval = ptep_clear_flush(vma, address, pvmw.pte); } On arm64 with CONT-PTE/PMD the returned pteval will have dirty or young set if ANY of the PTE/PMDs had dirty or young set. Right. - /* Nuke the page table entry. */ - pteval = ptep_clear_flush(vma, address, pvmw.pte); - /* Set the dirty flag on the folio now the pte is gone. */ if (pte_dirty(pteval)) folio_mark_dirty(folio); @@ -2015,7 +2017,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, pte_t swp_pte; if (arch_unmap_one(mm, vma, address, pteval) < 0) { - set_pte_at(mm, address, pvmw.pte, pteval); + if (folio_test_hugetlb(folio)) + set_huge_pte_at(mm, address, pvmw.pte, pteval); And, we will use that pteval for ALL the PTE/PMDs here. So, we would set the dirty or young bit in ALL PTE/PMDs. Could that cause any issues? Maybe more of a question for the arm64 people. I don't think this will cause any issues, since a hugetlb page cannot be split, and we should not lose the dirty or young state if it was set on any subpage.
Meanwhile, we already do the same thing in hugetlb.c: pte = huge_ptep_get_and_clear(mm, address, ptep); tlb_remove_huge_tlb_entry(h, tlb, ptep, address); if (huge_pte_dirty(pte)) set_page_dirty(page); Agree that it 'should not' cause issues. It just seems inconsistent. This is not a problem specifically with your patch, just with the handling of CONT-PTE/PMD entries. There does not appear to be an arm64-specific version of huge_ptep_get() that takes CONT-PTE/PMD into account. So huge_ptep_get() would only return the value of the one specific entry; it would not take into account the dirty or young bits of the other CONT-PTE/PMD entries, as your new version of huge_ptep_get_and_clear does. Is that correct, or am I missing something? Yes, you are right. If I am correct, then code like the following may not work: static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask, unsigned long addr, unsigned long end, struct mm_walk *walk) { pte_t huge_pte = huge_ptep_get(pte); struct numa_maps *md; struct page *page; if (!pte_present(huge_pte)) return 0; page = pte_page(huge_pte); md = walk->private; gather_stats(page, md, pte_dirty(huge_pte), 1); return 0; } Right, this is inconsistent with the current huge_ptep_get() interface, as you said. So I think we can define an arch-specific huge_ptep_get() for arm64, with some sample code like below. What do you think? +pte_t huge_ptep_get(pte_t *ptep, unsigned long size) +{ + int ncontig, i; + pte_t orig_pte = ptep_get(ptep); + + if (!pte_cont(orig_pte)) + return orig_pte; + + switch (size) { + case CONT_PMD_SIZE: + ncontig = CONT_PMDS; + break; + case CONT_PTE_SIZE: + ncontig = CONT_PTES; + break; + default: + WARN_ON_ONCE(1); + return orig_pte; + } + + for (i = 0; i < ncontig; i++, ptep++) { + pte_t pte = ptep_get(ptep); + + if (pte_dirty(pte)) + orig_pte = pte_mkdirty(orig_pte); + + if (pte_young(pte)) + orig_pte = pte_mkyoung(orig_pte); + } + + return orig_pte; +}
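A userspace sketch of the aggregation idea in the sample code above, assuming a made-up PTE encoding (the SIM_PTE_* bits and sim_huge_ptep_get() are hypothetical and do not match arm64's real descriptor format). It shows why a plain read of the first entry can miss a dirty or young bit that hardware set on another entry of the contiguous block, while folding the bits from all ncontig entries does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software PTE model; bit positions are illustrative only. */
typedef uint64_t sim_pte_t;

#define SIM_PTE_VALID (1ULL << 0)
#define SIM_PTE_CONT  (1ULL << 1)
#define SIM_PTE_DIRTY (1ULL << 2)
#define SIM_PTE_YOUNG (1ULL << 3)

/*
 * Return the first entry of a block, with the dirty and young bits
 * folded in from every one of the ncontig entries. The page table
 * itself is only read, never modified.
 */
static sim_pte_t sim_huge_ptep_get(const sim_pte_t *ptep, size_t ncontig)
{
	sim_pte_t orig = ptep[0];

	/* Not a contiguous mapping: one entry describes the whole page. */
	if (!(orig & SIM_PTE_CONT))
		return orig;

	assert(ncontig > 0);
	for (size_t i = 0; i < ncontig; i++)
		orig |= ptep[i] & (SIM_PTE_DIRTY | SIM_PTE_YOUNG);

	return orig;
}
```

With a 4-entry block (kept short for brevity) where only the third entry is dirty, block[0] alone reports the page as clean, while sim_huge_ptep_get(block, 4) reports it as dirty, which is what a caller like gather_hugetlb_stats() would need.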
Re: [PATCH 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping
On 5/7/2022 2:55 AM, Mike Kravetz wrote: On 4/29/22 01:14, Baolin Wang wrote: Some architectures (like ARM64) support CONT-PTE/PMD size hugetlb, which means they support not only PMD/PUD size hugetlb (2M and 1G), but also CONT-PTE/PMD size hugetlb (64K and 32M) when a 4K page size is used. When unmapping a hugetlb page, we get the relevant page table entry by huge_pte_offset() only once and nuke it. This is correct for PMD or PUD size hugetlb, since they always contain only one pmd or pud entry in the page table. However, it is incorrect for CONT-PTE and CONT-PMD size hugetlb, since they can contain several contiguous pte or pmd entries with the same page table attributes, so we will nuke only one pte or pmd entry of such a CONT-PTE/PMD size hugetlb page. And now we only use try_to_unmap() to unmap a poisoned hugetlb page, Since try_to_unmap can be called for non-hugetlb pages, perhaps the following is more accurate? try_to_unmap is only passed a hugetlb page in the case where the hugetlb page is poisoned. Yes, will update in the next version. It does concern me that this assumption is built into the code, as pointed out in your discussion with Gerald. Should we perhaps add a VM_BUG_ON() to make sure the passed huge page is poisoned? This would be in the same 'if block' where we call adjust_range_if_pmd_sharing_possible. Good point. Will do in the next version. Thanks.
Re: [PATCH 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration
On 5/6/2022 7:53 AM, Mike Kravetz wrote: On 4/29/22 01:14, Baolin Wang wrote: Some architectures (like ARM64) support CONT-PTE/PMD size hugetlb, which means they support not only PMD/PUD size hugetlb (2M and 1G), but also CONT-PTE/PMD size hugetlb (64K and 32M) when a 4K page size is used. diff --git a/mm/rmap.c b/mm/rmap.c index 6fdd198..7cf2408 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1924,13 +1924,15 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, break; } } + + /* Nuke the hugetlb page table entry */ + pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); } else { flush_cache_page(vma, address, pte_pfn(*pvmw.pte)); + /* Nuke the page table entry. */ + pteval = ptep_clear_flush(vma, address, pvmw.pte); } On arm64 with CONT-PTE/PMD the returned pteval will have dirty or young set if ANY of the PTE/PMDs had dirty or young set. Right. - /* Nuke the page table entry. */ - pteval = ptep_clear_flush(vma, address, pvmw.pte); - /* Set the dirty flag on the folio now the pte is gone. */ if (pte_dirty(pteval)) folio_mark_dirty(folio); @@ -2015,7 +2017,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, pte_t swp_pte; if (arch_unmap_one(mm, vma, address, pteval) < 0) { - set_pte_at(mm, address, pvmw.pte, pteval); + if (folio_test_hugetlb(folio)) + set_huge_pte_at(mm, address, pvmw.pte, pteval); And, we will use that pteval for ALL the PTE/PMDs here. So, we would set the dirty or young bit in ALL PTE/PMDs. Could that cause any issues? Maybe more of a question for the arm64 people. I don't think this will cause any issues, since a hugetlb page cannot be split, and we should not lose the dirty or young state if it was set on any subpage. Meanwhile, we already do the same thing in hugetlb.c: pte = huge_ptep_get_and_clear(mm, address, ptep); tlb_remove_huge_tlb_entry(h, tlb, ptep, address); if (huge_pte_dirty(pte)) set_page_dirty(page);
Re: [PATCH 1/3] mm: change huge_ptep_clear_flush() to return the original pte
On 5/6/2022 7:15 AM, Mike Kravetz wrote: On 4/29/22 01:14, Baolin Wang wrote: It is incorrect to use ptep_clear_flush() to nuke a hugetlb page table entry when unmapping or migrating a hugetlb page; the following patches will change to use huge_ptep_clear_flush() instead. So this is a preparation patch, which changes huge_ptep_clear_flush() to return the original pte, to help nuke a hugetlb page table entry. Signed-off-by: Baolin Wang --- arch/arm64/include/asm/hugetlb.h | 4 ++-- arch/arm64/mm/hugetlbpage.c| 12 +--- arch/ia64/include/asm/hugetlb.h| 4 ++-- arch/mips/include/asm/hugetlb.h| 9 ++--- arch/parisc/include/asm/hugetlb.h | 4 ++-- arch/powerpc/include/asm/hugetlb.h | 9 ++--- arch/s390/include/asm/hugetlb.h| 6 +++--- arch/sh/include/asm/hugetlb.h | 4 ++-- arch/sparc/include/asm/hugetlb.h | 4 ++-- include/asm-generic/hugetlb.h | 4 ++-- 10 files changed, 32 insertions(+), 28 deletions(-) The above changes look straightforward. Happy that you Cc'ed impacted arch maintainers so they can at least have a look. The only user of huge_ptep_clear_flush() today is hugetlb_cow/wp() in mm/hugetlb.c. Any reason why you did not change that code? Because we do not use the return value of huge_ptep_clear_flush() in mm/hugetlb.c. At least cast the return of huge_ptep_clear_flush() to void with a comment? Sure. Will add an explicit cast in the next version. Not absolutely necessary. Acked-by: Mike Kravetz Thanks.
Re: [PATCH 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping
On 5/3/2022 6:03 PM, Gerald Schaefer wrote: On Tue, 3 May 2022 10:19:46 +0800 Baolin Wang wrote: On 5/2/2022 10:02 PM, Gerald Schaefer wrote: On Sat, 30 Apr 2022 11:22:33 +0800 Baolin Wang wrote: On 4/30/2022 4:02 AM, Gerald Schaefer wrote: On Fri, 29 Apr 2022 16:14:43 +0800 Baolin Wang wrote: Some architectures (like ARM64) support CONT-PTE/PMD size hugetlb, which means they support not only PMD/PUD size hugetlb (2M and 1G), but also CONT-PTE/PMD size hugetlb (64K and 32M) when a 4K page size is used. When unmapping a hugetlb page, we get the relevant page table entry by huge_pte_offset() only once and nuke it. This is correct for PMD or PUD size hugetlb, since they always contain only one pmd or pud entry in the page table. However, it is incorrect for CONT-PTE and CONT-PMD size hugetlb, since they can contain several contiguous pte or pmd entries with the same page table attributes, so we will nuke only one pte or pmd entry of such a CONT-PTE/PMD size hugetlb page. And now we only use try_to_unmap() to unmap a poisoned hugetlb page, which means we will unmap only one pte entry of a CONT-PTE or CONT-PMD size poisoned hugetlb page, and the other subpages of that poisoned hugetlb page can still be accessed, which may cause serious issues. So we should change to use huge_ptep_clear_flush() to nuke the hugetlb page table, which already takes CONT-PTE and CONT-PMD size hugetlb into account. Note that we already use set_huge_swap_pte_at() to set a poisoned swap entry for a poisoned hugetlb page.
Signed-off-by: Baolin Wang --- mm/rmap.c | 34 +- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index 7cf2408..1e168d7 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1564,28 +1564,28 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, break; } } + pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); Unlike in your patch 2/3, I do not see that this (huge) pteval would later be used again with set_huge_pte_at() instead of set_pte_at(). Not sure if this (huge) pteval could end up at a set_pte_at() later, but if yes, then this would be broken on s390, and you'd need to use set_huge_pte_at() instead of set_pte_at() like in your patch 2/3. IIUC, as I said in the commit message, we only unmap a poisoned hugetlb page via try_to_unmap(), and the poisoned hugetlb page will be remapped with a poisoned entry by set_huge_swap_pte_at() in try_to_unmap_one(). So I think there is no need to change set_pte_at() to set_huge_pte_at() for the other cases, since a hugetlb page will not hit them. if (PageHWPoison(subpage) && !(flags & TTU_IGNORE_HWPOISON)) { pteval = swp_entry_to_pte(make_hwpoison_entry(subpage)); if (folio_test_hugetlb(folio)) { hugetlb_count_sub(folio_nr_pages(folio), mm); set_huge_swap_pte_at(mm, address, pvmw.pte, pteval, vma_mmu_pagesize(vma)); } else { dec_mm_counter(mm, mm_counter(&folio->page)); set_pte_at(mm, address, pvmw.pte, pteval); } } OK, but wouldn't the pteval be overwritten here with pteval = swp_entry_to_pte(make_hwpoison_entry(subpage))? IOW, what sense does it make to save the returned pteval from huge_ptep_clear_flush(), when it is never used anywhere? Please see the previous code: we use the original pte value to check whether it is uffd-wp armed, and whether we need to mark the folio dirty, even though hugetlbfs uses noop_dirty_folio().
pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval); Uh, ok, that wouldn't work on s390, but we also don't have CONFIG_PTE_MARKER_UFFD_WP / HAVE_ARCH_USERFAULTFD_WP set, so I guess we will be fine (for now). OK. Still, I find it a bit unsettling that pte_install_uffd_wp_if_needed() would work on a potential hugetlb *pte, directly dereferencing it instead of using huge_ptep_get(). The !pte_none(*pte) check at the beginning would be broken in the hugetlb case for s390 (not sure about other archs, but I think s390 might be the only exception strictly requiring huge_ptep_get() for dereferencing hugetlb *pte pointers). Right, I think so too. I'll look at the uffd code in detail; it seems another patch is needed to fix the hugetlb handling for uffd. Thanks for your comments.
Re: [PATCH 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping
On 5/2/2022 10:02 PM, Gerald Schaefer wrote: On Sat, 30 Apr 2022 11:22:33 +0800 Baolin Wang wrote: On 4/30/2022 4:02 AM, Gerald Schaefer wrote: On Fri, 29 Apr 2022 16:14:43 +0800 Baolin Wang wrote: Some architectures (like ARM64) support CONT-PTE/PMD size hugetlb, which means they support not only PMD/PUD size hugetlb (2M and 1G), but also CONT-PTE/PMD size hugetlb (64K and 32M) when a 4K page size is used. When unmapping a hugetlb page, we get the relevant page table entry by huge_pte_offset() only once and nuke it. This is correct for PMD or PUD size hugetlb, since they always contain only one pmd or pud entry in the page table. However, it is incorrect for CONT-PTE and CONT-PMD size hugetlb, since they can contain several contiguous pte or pmd entries with the same page table attributes, so we will nuke only one pte or pmd entry of such a CONT-PTE/PMD size hugetlb page. And now we only use try_to_unmap() to unmap a poisoned hugetlb page, which means we will unmap only one pte entry of a CONT-PTE or CONT-PMD size poisoned hugetlb page, and the other subpages of that poisoned hugetlb page can still be accessed, which may cause serious issues. So we should change to use huge_ptep_clear_flush() to nuke the hugetlb page table, which already takes CONT-PTE and CONT-PMD size hugetlb into account. Note that we already use set_huge_swap_pte_at() to set a poisoned swap entry for a poisoned hugetlb page. Signed-off-by: Baolin Wang --- mm/rmap.c | 34 +- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index 7cf2408..1e168d7 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1564,28 +1564,28 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, break; } } + pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); Unlike in your patch 2/3, I do not see that this (huge) pteval would later be used again with set_huge_pte_at() instead of set_pte_at().
Not sure if this (huge) pteval could end up at a set_pte_at() later, but if yes, then this would be broken on s390, and you'd need to use set_huge_pte_at() instead of set_pte_at() like in your patch 2/3. IIUC, as I said in the commit message, we only unmap a poisoned hugetlb page via try_to_unmap(), and the poisoned hugetlb page will be remapped with a poisoned entry by set_huge_swap_pte_at() in try_to_unmap_one(). So I think there is no need to change set_pte_at() to set_huge_pte_at() for the other cases, since a hugetlb page will not hit them. if (PageHWPoison(subpage) && !(flags & TTU_IGNORE_HWPOISON)) { pteval = swp_entry_to_pte(make_hwpoison_entry(subpage)); if (folio_test_hugetlb(folio)) { hugetlb_count_sub(folio_nr_pages(folio), mm); set_huge_swap_pte_at(mm, address, pvmw.pte, pteval, vma_mmu_pagesize(vma)); } else { dec_mm_counter(mm, mm_counter(&folio->page)); set_pte_at(mm, address, pvmw.pte, pteval); } } OK, but wouldn't the pteval be overwritten here with pteval = swp_entry_to_pte(make_hwpoison_entry(subpage))? IOW, what sense does it make to save the returned pteval from huge_ptep_clear_flush(), when it is never used anywhere? Please see the previous code: we use the original pte value to check whether it is uffd-wp armed, and whether we need to mark the folio dirty, even though hugetlbfs uses noop_dirty_folio(). pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval); /* Set the dirty flag on the folio now the pte is gone. */ if (pte_dirty(pteval)) folio_mark_dirty(folio);
Re: [PATCH 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping
On 4/30/2022 4:02 AM, Gerald Schaefer wrote: On Fri, 29 Apr 2022 16:14:43 +0800 Baolin Wang wrote: Some architectures (like ARM64) support CONT-PTE/PMD size hugetlb, which means they support not only PMD/PUD size hugetlb (2M and 1G), but also CONT-PTE/PMD size hugetlb (64K and 32M) when a 4K page size is used. When unmapping a hugetlb page, we get the relevant page table entry by huge_pte_offset() only once and nuke it. This is correct for PMD or PUD size hugetlb, since they always contain only one pmd or pud entry in the page table. However, it is incorrect for CONT-PTE and CONT-PMD size hugetlb, since they can contain several contiguous pte or pmd entries with the same page table attributes, so we will nuke only one pte or pmd entry of such a CONT-PTE/PMD size hugetlb page. And now we only use try_to_unmap() to unmap a poisoned hugetlb page, which means we will unmap only one pte entry of a CONT-PTE or CONT-PMD size poisoned hugetlb page, and the other subpages of that poisoned hugetlb page can still be accessed, which may cause serious issues. So we should change to use huge_ptep_clear_flush() to nuke the hugetlb page table, which already takes CONT-PTE and CONT-PMD size hugetlb into account. Note that we already use set_huge_swap_pte_at() to set a poisoned swap entry for a poisoned hugetlb page. Signed-off-by: Baolin Wang --- mm/rmap.c | 34 +- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index 7cf2408..1e168d7 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1564,28 +1564,28 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, break; } } + pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); Unlike in your patch 2/3, I do not see that this (huge) pteval would later be used again with set_huge_pte_at() instead of set_pte_at().
Not sure if this (huge) pteval could end up at a set_pte_at() later, but if yes, then this would be broken on s390, and you'd need to use set_huge_pte_at() instead of set_pte_at() like in your patch 2/3. IIUC, as I said in the commit message, we only unmap a poisoned hugetlb page via try_to_unmap(), and the poisoned hugetlb page will be remapped with a poisoned entry by set_huge_swap_pte_at() in try_to_unmap_one(). So I think there is no need to change set_pte_at() to set_huge_pte_at() for the other cases, since a hugetlb page will not hit them. if (PageHWPoison(subpage) && !(flags & TTU_IGNORE_HWPOISON)) { pteval = swp_entry_to_pte(make_hwpoison_entry(subpage)); if (folio_test_hugetlb(folio)) { hugetlb_count_sub(folio_nr_pages(folio), mm); set_huge_swap_pte_at(mm, address, pvmw.pte, pteval, vma_mmu_pagesize(vma)); } else { dec_mm_counter(mm, mm_counter(&folio->page)); set_pte_at(mm, address, pvmw.pte, pteval); } } Please note that huge_ptep_get functions do not return valid PTEs on s390, and such PTEs must never be set directly with set_pte_at(), but only with set_huge_pte_at(). Background is that, for hugetlb pages, we are of course not really dealing with PTEs at this level, but rather PMDs or PUDs, depending on hugetlb size. On s390, the layout is quite different for PTEs and PMDs / PUDs, and unfortunately the hugetlb code is not properly reflecting this by using PMD or PUD types, like the THP code does. So, as a work-around, on s390, the huge_ptep_xxx functions will return only fake PTEs, which must be converted again to a proper PMD or PUD before writing them to the page table, which is what happens in set_huge_pte_at(), but not in set_pte_at(). Thanks for your explanation. As I said above, I think we already handle the hugetlb case with set_huge_swap_pte_at() in try_to_unmap_one().
[PATCH 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping
Some architectures (like ARM64) support CONT-PTE/PMD size hugetlb, which means they support not only PMD/PUD size hugetlb (2M and 1G), but also CONT-PTE/PMD size hugetlb (64K and 32M) when a 4K page size is used. When unmapping a hugetlb page, we get the relevant page table entry by huge_pte_offset() only once and nuke it. This is correct for PMD or PUD size hugetlb, since they always contain only one pmd or pud entry in the page table. However, it is incorrect for CONT-PTE and CONT-PMD size hugetlb, since they can contain several contiguous pte or pmd entries with the same page table attributes, so we will nuke only one pte or pmd entry of such a CONT-PTE/PMD size hugetlb page. And now we only use try_to_unmap() to unmap a poisoned hugetlb page, which means we will unmap only one pte entry of a CONT-PTE or CONT-PMD size poisoned hugetlb page, and the other subpages of that poisoned hugetlb page can still be accessed, which may cause serious issues. So we should change to use huge_ptep_clear_flush() to nuke the hugetlb page table, which already takes CONT-PTE and CONT-PMD size hugetlb into account. Note that we already use set_huge_swap_pte_at() to set a poisoned swap entry for a poisoned hugetlb page. Signed-off-by: Baolin Wang --- mm/rmap.c | 34 +- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index 7cf2408..1e168d7 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1564,28 +1564,28 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, break; } } + pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); } else { flush_cache_page(vma, address, pte_pfn(*pvmw.pte)); - } - - /* -* Nuke the page table entry. When having to clear -* PageAnonExclusive(), we always have to flush. -*/ - if (should_defer_flush(mm, flags) && !anon_exclusive) { /* -* We clear the PTE but do not flush so potentially -* a remote CPU could still be writing to the folio.
-* If the entry was previously clean then the -* architecture must guarantee that a clear->dirty -* transition on a cached TLB entry is written through -* and traps if the PTE is unmapped. +* Nuke the page table entry. When having to clear +* PageAnonExclusive(), we always have to flush. */ - pteval = ptep_get_and_clear(mm, address, pvmw.pte); + if (should_defer_flush(mm, flags) && !anon_exclusive) { + /* +* We clear the PTE but do not flush so potentially +* a remote CPU could still be writing to the folio. +* If the entry was previously clean then the +* architecture must guarantee that a clear->dirty +* transition on a cached TLB entry is written through +* and traps if the PTE is unmapped. +*/ + pteval = ptep_get_and_clear(mm, address, pvmw.pte); - set_tlb_ubc_flush_pending(mm, pte_dirty(pteval)); - } else { - pteval = ptep_clear_flush(vma, address, pvmw.pte); + set_tlb_ubc_flush_pending(mm, pte_dirty(pteval)); + } else { + pteval = ptep_clear_flush(vma, address, pvmw.pte); + } } /* -- 1.8.3.1
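To make the failure mode described in this patch concrete, here is a minimal model, assuming a hypothetical flat array of 64-bit entries and 16 entries per block (CONT_PTES for a 64K hugetlb page with 4K base pages). Clearing only the entry returned by huge_pte_offset() leaves 15 subpage mappings live, whereas a CONT-aware clear removes all of them. The sim_* names are illustrative only and are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sim_pte_t;

#define SIM_PTE_VALID (1ULL << 0) /* hypothetical "mapped" bit */

/* Models the old behaviour: clear only the first entry of the block. */
static sim_pte_t sim_clear_first_only(sim_pte_t *ptep)
{
	sim_pte_t old = ptep[0];

	ptep[0] = 0;
	return old;
}

/* Models a CONT-aware clear: every entry of the block is cleared. */
static void sim_clear_all(sim_pte_t *ptep, size_t ncontig)
{
	assert(ncontig > 0);
	for (size_t i = 0; i < ncontig; i++)
		ptep[i] = 0;
}

/* Count entries through which a subpage would still be reachable. */
static size_t sim_count_mapped(const sim_pte_t *ptep, size_t ncontig)
{
	size_t n = 0;

	for (size_t i = 0; i < ncontig; i++)
		if (ptep[i] & SIM_PTE_VALID)
			n++;
	return n;
}
```

In the poisoned-page case, each of those remaining valid entries is a path by which userspace could still touch part of the poisoned page.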
[PATCH 0/3] Fix CONT-PTE/PMD size hugetlb issue when unmapping or migrating
Hi, Currently, when migrating a hugetlb page or unmapping a poisoned hugetlb page, we use ptep_clear_flush() and set_pte_at() to nuke the page table entry and remap it. This is incorrect for CONT-PTE or CONT-PMD size hugetlb pages and can cause potential data consistency issues. This patch set changes to use the hugetlb-related APIs to fix this issue; please find details in each patch. Thanks. Baolin Wang (3): mm: change huge_ptep_clear_flush() to return the original pte mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping arch/arm64/include/asm/hugetlb.h | 4 +-- arch/arm64/mm/hugetlbpage.c| 12 arch/ia64/include/asm/hugetlb.h| 4 +-- arch/mips/include/asm/hugetlb.h| 9 -- arch/parisc/include/asm/hugetlb.h | 4 +-- arch/powerpc/include/asm/hugetlb.h | 9 -- arch/s390/include/asm/hugetlb.h| 6 ++-- arch/sh/include/asm/hugetlb.h | 4 +-- arch/sparc/include/asm/hugetlb.h | 4 +-- include/asm-generic/hugetlb.h | 4 +-- mm/rmap.c | 58 +++--- 11 files changed, 67 insertions(+), 51 deletions(-) -- 1.8.3.1
[PATCH 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration
Some architectures (like ARM64) support CONT-PTE/PMD size hugetlb, which means they support not only PMD/PUD size hugetlb (2M and 1G), but also CONT-PTE/PMD size hugetlb (64K and 32M) when a 4K page size is used. When migrating a hugetlb page, we get the relevant page table entry by huge_pte_offset() only once, nuke it, and remap it with a migration pte entry. This is correct for PMD or PUD size hugetlb, since they always contain only one pmd or pud entry in the page table. However, it is incorrect for CONT-PTE and CONT-PMD size hugetlb, since they can contain several contiguous pte or pmd entries with the same page table attributes. So we will nuke or remap only one pte or pmd entry of such a CONT-PTE/PMD size hugetlb page, which is not what hugetlb migration expects. The problem is that the data of the subpages can still be modified while the hugetlb page is being migrated, which can cause a serious data consistency issue, since we did not nuke the page table entries and set migration ptes for all the subpages of the hugetlb page. To fix this issue, we should change to use huge_ptep_clear_flush() to nuke a hugetlb page table entry, and remap it with set_huge_pte_at() and set_huge_swap_pte_at() when migrating a hugetlb page; these already take CONT-PTE and CONT-PMD size hugetlb into account. Signed-off-by: Baolin Wang --- mm/rmap.c | 24 ++-- 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index 6fdd198..7cf2408 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1924,13 +1924,15 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, break; } } + + /* Nuke the hugetlb page table entry */ + pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); } else { flush_cache_page(vma, address, pte_pfn(*pvmw.pte)); + /* Nuke the page table entry. */ + pteval = ptep_clear_flush(vma, address, pvmw.pte); } - /* Nuke the page table entry.
*/ - pteval = ptep_clear_flush(vma, address, pvmw.pte); - /* Set the dirty flag on the folio now the pte is gone. */ if (pte_dirty(pteval)) folio_mark_dirty(folio); @@ -2015,7 +2017,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, pte_t swp_pte; if (arch_unmap_one(mm, vma, address, pteval) < 0) { - set_pte_at(mm, address, pvmw.pte, pteval); + if (folio_test_hugetlb(folio)) + set_huge_pte_at(mm, address, pvmw.pte, pteval); + else + set_pte_at(mm, address, pvmw.pte, pteval); ret = false; page_vma_mapped_walk_done(&pvmw); break; @@ -2024,7 +2029,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, !anon_exclusive, subpage); if (anon_exclusive && page_try_share_anon_rmap(subpage)) { - set_pte_at(mm, address, pvmw.pte, pteval); + if (folio_test_hugetlb(folio)) + set_huge_pte_at(mm, address, pvmw.pte, pteval); + else + set_pte_at(mm, address, pvmw.pte, pteval); ret = false; page_vma_mapped_walk_done(&pvmw); break; @@ -2050,7 +2058,11 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, swp_pte = pte_swp_mksoft_dirty(swp_pte); if (pte_uffd_wp(pteval)) swp_pte = pte_swp_mkuffd_wp(swp_pte); - set_pte_at(mm, address, pvmw.pte, swp_pte); + if (folio_test_hugetlb(folio)) + set_huge_swap_pte_at(mm, address, pvmw.pte, +swp_pte, vma_mmu_pagesize(vma)); + else + set_pte_at(mm, address, pvmw.pte, swp_pte); trace_set_migration_pte(address, pte_val(swp_pte), compound_order(&folio->page)); /* -- 1.8.3.1
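A rough sketch of the dispatch this patch adds at each restore or remap site, assuming a hypothetical flat array of entries and treating the stored value as a non-present migration-style entry that is simply replicated into every slot. For present entries, arm64's set_huge_pte_at() also steps the PFN for each entry; that detail is omitted here. The sim_* helpers are illustrative and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sim_pte_t;

/* Like set_pte_at(): writes exactly one entry. */
static void sim_set_pte_at(sim_pte_t *ptep, sim_pte_t pte)
{
	*ptep = pte;
}

/* Like a CONT-aware huge setter for a non-present entry: the same value
 * is written to every entry of the block. */
static void sim_set_huge_pte_at(sim_pte_t *ptep, sim_pte_t pte, size_t ncontig)
{
	assert(ncontig > 0);
	for (size_t i = 0; i < ncontig; i++)
		ptep[i] = pte;
}

/* The folio_test_hugetlb()-style dispatch used at each install site. */
static void sim_install(bool is_hugetlb, sim_pte_t *ptep, sim_pte_t pte,
			size_t ncontig)
{
	if (is_hugetlb)
		sim_set_huge_pte_at(ptep, pte, ncontig);
	else
		sim_set_pte_at(ptep, pte);
}
```

Taking the non-hugetlb path for a 16-entry block leaves 15 entries without a migration entry, which is the window in which subpages could still be written during migration.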
[PATCH 1/3] mm: change huge_ptep_clear_flush() to return the original pte
It is incorrect to use ptep_clear_flush() to nuke a hugetlb page table entry when unmapping or migrating a hugetlb page; the following patches will change to use huge_ptep_clear_flush() instead. So this is a preparation patch, which changes huge_ptep_clear_flush() to return the original pte, to help nuke a hugetlb page table entry. Signed-off-by: Baolin Wang --- arch/arm64/include/asm/hugetlb.h | 4 ++-- arch/arm64/mm/hugetlbpage.c| 12 +--- arch/ia64/include/asm/hugetlb.h| 4 ++-- arch/mips/include/asm/hugetlb.h| 9 ++--- arch/parisc/include/asm/hugetlb.h | 4 ++-- arch/powerpc/include/asm/hugetlb.h | 9 ++--- arch/s390/include/asm/hugetlb.h| 6 +++--- arch/sh/include/asm/hugetlb.h | 4 ++-- arch/sparc/include/asm/hugetlb.h | 4 ++-- include/asm-generic/hugetlb.h | 4 ++-- 10 files changed, 32 insertions(+), 28 deletions(-) diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h index 1242f71..616b2ca 100644 --- a/arch/arm64/include/asm/hugetlb.h +++ b/arch/arm64/include/asm/hugetlb.h @@ -39,8 +39,8 @@ extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm, extern void huge_ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep); #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH -extern void huge_ptep_clear_flush(struct vm_area_struct *vma, - unsigned long addr, pte_t *ptep); +extern pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep); #define __HAVE_ARCH_HUGE_PTE_CLEAR extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep, unsigned long sz); diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index cbace1c..ca8e65c 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -486,19 +486,17 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm, set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot)); } -void huge_ptep_clear_flush(struct vm_area_struct *vma, - unsigned long addr, pte_t *ptep) +pte_t huge_ptep_clear_flush(struct
vm_area_struct *vma, + unsigned long addr, pte_t *ptep) { size_t pgsize; int ncontig; - if (!pte_cont(READ_ONCE(*ptep))) { - ptep_clear_flush(vma, addr, ptep); - return; - } + if (!pte_cont(READ_ONCE(*ptep))) + return ptep_clear_flush(vma, addr, ptep); ncontig = find_num_contig(vma->vm_mm, addr, ptep, &pgsize); - clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig); + return get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig); } static int __init hugetlbpage_init(void) diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h index 7e46ebd..65d3811 100644 --- a/arch/ia64/include/asm/hugetlb.h +++ b/arch/ia64/include/asm/hugetlb.h @@ -23,8 +23,8 @@ static inline int is_hugepage_only_range(struct mm_struct *mm, #define is_hugepage_only_range is_hugepage_only_range #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH -static inline void huge_ptep_clear_flush(struct vm_area_struct *vma, -unsigned long addr, pte_t *ptep) +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) { } diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h index c214440..fd69c88 100644 --- a/arch/mips/include/asm/hugetlb.h +++ b/arch/mips/include/asm/hugetlb.h @@ -43,16 +43,19 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm, } #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH -static inline void huge_ptep_clear_flush(struct vm_area_struct *vma, -unsigned long addr, pte_t *ptep) +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) { + pte_t pte; + /* * clear the huge pte entry firstly, so that the other smp threads will * not get old pte entry after finishing flush_tlb_page and before * setting new huge pte entry */ - huge_ptep_get_and_clear(vma->vm_mm, addr, ptep); + pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep); flush_tlb_page(vma, addr); + return pte; } #define __HAVE_ARCH_HUGE_PTE_NONE diff --git a/arch/parisc/include/asm/hugetlb.h
b/arch/parisc/include/asm/hugetlb.h index a69cf9e..25bc560 100644 --- a/arch/parisc/include/asm/hugetlb.h +++ b/arch/parisc/include/asm/hugetlb.h @@ -28,8 +28,8 @@ static inline int prepare_hugepage_range(struct file *file, } #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH -static inline void huge_ptep_clear_flush(struct vm_a
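The point of returning the original pte from huge_ptep_clear_flush() is that callers such as try_to_migrate_one() can still test pte_dirty(pteval) after the entries are gone. A minimal model of that contract, assuming a made-up bit layout and a counter standing in for the TLB flush (all sim_* names are hypothetical): the helper clears every entry, flushes once, and returns the first entry with the dirty and young bits folded in from the whole block, similar in spirit to what the arm64 get_clear_flush() path does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sim_pte_t;

#define SIM_PTE_VALID (1ULL << 0)
#define SIM_PTE_DIRTY (1ULL << 2)
#define SIM_PTE_YOUNG (1ULL << 3)

/* Stand-in for a TLB flush: just counts how often one was issued. */
static unsigned int sim_tlb_flushes;

/*
 * Clear all ncontig entries, flush once, and hand back the original
 * first entry with dirty/young accumulated from the whole block, so the
 * caller can still decide whether the folio must be marked dirty.
 */
static sim_pte_t sim_huge_ptep_clear_flush(sim_pte_t *ptep, size_t ncontig)
{
	sim_pte_t orig = ptep[0];

	assert(ncontig > 0);
	for (size_t i = 0; i < ncontig; i++) {
		orig |= ptep[i] & (SIM_PTE_DIRTY | SIM_PTE_YOUNG);
		ptep[i] = 0;
	}
	sim_tlb_flushes++;
	return orig;
}
```

With the old void-returning signature, the caller would have had to read the entry before clearing it, and on a contiguous block that earlier read could miss a dirty bit that hardware set on another entry.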
Re: [PATCH 6/6] cputime: Introduce cputime_to_timespec64()/timespec64_to_cputime()
On 16 July 2015 at 18:43, Thomas Gleixner t...@linutronix.de wrote: On Thu, 16 Jul 2015, Baolin Wang wrote: On 15 July 2015 at 19:55, Thomas Gleixner t...@linutronix.de wrote: On Wed, 15 Jul 2015, Baolin Wang wrote: On 15 July 2015 at 18:31, Thomas Gleixner t...@linutronix.de wrote: On Wed, 15 Jul 2015, Baolin Wang wrote: The cputime_to_timespec() and timespec_to_cputime() functions are not year 2038 safe on 32-bit systems, because struct timespec will overflow in the year 2038. And how is this relevant? cputime is not based on wall clock time at all. So what has 2038 to do with cputime? We want proper explanations WHY we need such a change. When converting the posix-cpu-timers, they call the cputime_to_timespec() function, so this function also needs a conversion. There is no requirement to convert posix-cpu-timers on their own. We need to adapt the posix cpu timers code because it shares syscalls with the other posix timers, but that still does not explain why we need these functions. posix-cpu-timers also defines some 'struct k_clock' variables, and we need to convert the callbacks of 'struct k_clock', which are not year 2038 safe on 32-bit systems. Some of the callbacks that need converting call cputime_to_timespec(), so we also want to convert cputime_to_timespec() into a year 2038 safe function to make them all ready for the year 2038. You are not getting it at all.
1) We need to change k_clock callbacks due to 2038 issues
2) posix cpu timers implement affected callbacks
3) posix cpu timers themselves and cputime are NOT affected by 2038
So we have 2 options to change the code in posix cpu timers:
A) Do the timespec/timespec64 conversion in the posix cpu timer callbacks and leave the cputime functions untouched.
B) Implement cputime/timespec64 functions to avoid #A
If you go for #B, you need to provide a reasonable explanation why it is better than #A. And that explanation has absolutely nothing to do with 2038 safety.
Very thanks for your explanation, and I'll think about that. Not everything is a 2038 issue, just because the only tool you have is a timespec64. Thanks, tglx -- Baolin.wang Best Regards ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 6/6] cputime: Introduce cputime_to_timespec64()/timespec64_to_cputime()
On 15 July 2015 at 19:55, Thomas Gleixner t...@linutronix.de wrote:
> On Wed, 15 Jul 2015, Baolin Wang wrote:
>> On 15 July 2015 at 18:31, Thomas Gleixner t...@linutronix.de wrote:
>>> On Wed, 15 Jul 2015, Baolin Wang wrote:
>>>> The cputime_to_timespec() and timespec_to_cputime() functions are not
>>>> year 2038 safe on 32bit systems because struct timespec will overflow
>>>> in the year 2038.
>>>
>>> And how is this relevant? cputime is not based on wall clock time at
>>> all. So what has 2038 to do with cputime?
>>>
>>> We want proper explanations WHY we need such a change.
>>
>> When converting the posix-cpu-timers, they call the
>> cputime_to_timespec() function, so this function needs to be converted
>> as well.
>
> There is no requirement to convert posix-cpu-timers on their own. We
> need to adapt the posix cpu timers code because it shares syscalls with
> the other posix timers, but that still does not explain why we need
> these functions.

posix-cpu-timers also defines some 'struct k_clock' variables, and we need to convert the 'struct k_clock' callbacks, which are not year 2038 safe on 32bit systems. Some of the callbacks that need converting call the cputime_to_timespec() function, so we also want to convert cputime_to_timespec() to a year 2038 safe function to make them all ready for the year 2038 issue.

>> You can see that conversion in the patch "posix-cpu-timers: Convert to
>> y2038 safe callbacks" from
>> https://git.linaro.org/people/baolin.wang/upstream_0627.git.
>
> I do not care about your random git tree. I care about proper
> changelogs. Your changelogs are just a copied boilerplate full of
> errors.
>
> Thanks,
>
> 	tglx

-- 
Baolin.wang
Best Regards
[PATCH 6/6] cputime: Introduce cputime_to_timespec64()/timespec64_to_cputime()
The cputime_to_timespec() and timespec_to_cputime() functions are not year 2038 safe on 32bit systems because struct timespec will overflow in the year 2038. This patch introduces the cputime_to_timespec64() and timespec64_to_cputime() functions, which use struct timespec64, and converts the arch-specific implementations in arch/s390 and arch/powerpc.

Signed-off-by: Baolin Wang baolin.w...@linaro.org
---
 arch/powerpc/include/asm/cputime.h    |  6 +++---
 arch/s390/include/asm/cputime.h       |  8
 include/asm-generic/cputime_jiffies.h | 10 +-
 include/asm-generic/cputime_nsecs.h   |  6 +++---
 include/linux/cputime.h               | 16
 5 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h
index e245255..5dda5c0 100644
--- a/arch/powerpc/include/asm/cputime.h
+++ b/arch/powerpc/include/asm/cputime.h
@@ -154,9 +154,9 @@ static inline cputime_t secs_to_cputime(const unsigned long sec)
 }
 
 /*
- * Convert cputime <-> timespec
+ * Convert cputime <-> timespec64
  */
-static inline void cputime_to_timespec(const cputime_t ct, struct timespec *p)
+static inline void cputime_to_timespec64(const cputime_t ct, struct timespec64 *p)
 {
 	u64 x = (__force u64) ct;
 	unsigned int frac;
@@ -168,7 +168,7 @@ static inline void cputime_to_timespec(const cputime_t ct, struct timespec *p)
 	p->tv_nsec = x;
 }
 
-static inline cputime_t timespec_to_cputime(const struct timespec *p)
+static inline cputime_t timespec64_to_cputime(const struct timespec64 *p)
 {
 	u64 ct;
 
diff --git a/arch/s390/include/asm/cputime.h b/arch/s390/include/asm/cputime.h
index 221b454..3319b51 100644
--- a/arch/s390/include/asm/cputime.h
+++ b/arch/s390/include/asm/cputime.h
@@ -81,16 +81,16 @@ static inline cputime_t secs_to_cputime(const unsigned int s)
 }
 
 /*
- * Convert cputime to timespec and back.
+ * Convert cputime to timespec64 and back.
  */
-static inline cputime_t timespec_to_cputime(const struct timespec *value)
+static inline cputime_t timespec64_to_cputime(const struct timespec64 *value)
 {
 	unsigned long long ret = value->tv_sec * CPUTIME_PER_SEC;
 	return (__force cputime_t)(ret + __div(value->tv_nsec * CPUTIME_PER_USEC, NSEC_PER_USEC));
 }
 
-static inline void cputime_to_timespec(const cputime_t cputime,
-				       struct timespec *value)
+static inline void cputime_to_timespec64(const cputime_t cputime,
+					 struct timespec64 *value)
 {
 	unsigned long long __cputime = (__force unsigned long long) cputime;
 	value->tv_nsec = (__cputime % CPUTIME_PER_SEC) * NSEC_PER_USEC / CPUTIME_PER_USEC;
diff --git a/include/asm-generic/cputime_jiffies.h b/include/asm-generic/cputime_jiffies.h
index fe386fc..54e034c 100644
--- a/include/asm-generic/cputime_jiffies.h
+++ b/include/asm-generic/cputime_jiffies.h
@@ -44,12 +44,12 @@ typedef u64 __nocast cputime64_t;
 #define secs_to_cputime(sec)		jiffies_to_cputime((sec) * HZ)
 
 /*
- * Convert cputime to timespec and back.
+ * Convert cputime to timespec64 and back.
  */
-#define timespec_to_cputime(__val)	\
-	jiffies_to_cputime(timespec_to_jiffies(__val))
-#define cputime_to_timespec(__ct,__val)	\
-	jiffies_to_timespec(cputime_to_jiffies(__ct),__val)
+#define timespec64_to_cputime(__val)	\
+	jiffies_to_cputime(timespec64_to_jiffies(__val))
+#define cputime_to_timespec64(__ct,__val)	\
+	jiffies_to_timespec64(cputime_to_jiffies(__ct),__val)
 
 /*
  * Convert cputime to timeval and back.
diff --git a/include/asm-generic/cputime_nsecs.h b/include/asm-generic/cputime_nsecs.h
index 0419485..c0cafc0 100644
--- a/include/asm-generic/cputime_nsecs.h
+++ b/include/asm-generic/cputime_nsecs.h
@@ -71,14 +71,14 @@ typedef u64 __nocast cputime64_t;
 	(__force cputime_t)((__secs) * NSEC_PER_SEC)
 
 /*
- * Convert cputime <-> timespec (nsec)
+ * Convert cputime <-> timespec64 (nsec)
  */
-static inline cputime_t timespec_to_cputime(const struct timespec *val)
+static inline cputime_t timespec64_to_cputime(const struct timespec64 *val)
 {
 	u64 ret = val->tv_sec * NSEC_PER_SEC + val->tv_nsec;
 	return (__force cputime_t) ret;
 }
-static inline void cputime_to_timespec(const cputime_t ct, struct timespec *val)
+static inline void cputime_to_timespec64(const cputime_t ct, struct timespec64 *val)
 {
 	u32 rem;
 
diff --git a/include/linux/cputime.h b/include/linux/cputime.h
index f2eb2ee..cd638a0 100644
--- a/include/linux/cputime.h
+++ b/include/linux/cputime.h
@@ -13,4 +13,20 @@
 	usecs_to_cputime((__nsecs) / NSEC_PER_USEC)
 #endif
 
+static inline cputime_t timespec_to_cputime(const struct timespec *ts)
+{
+	struct timespec64 ts64 = timespec_to_timespec64(*ts);
+
+	return timespec64_to_cputime(ts64);
+}
+
+static inline void cputime_to_timespec(const cputime_t cputime
Re: [PATCH 6/6] cputime: Introduce cputime_to_timespec64()/timespec64_to_cputime()
On 15 July 2015 at 18:31, Thomas Gleixner t...@linutronix.de wrote:
> On Wed, 15 Jul 2015, Baolin Wang wrote:
>> The cputime_to_timespec() and timespec_to_cputime() functions are not
>> year 2038 safe on 32bit systems because struct timespec will overflow
>> in the year 2038.
>
> And how is this relevant? cputime is not based on wall clock time at
> all. So what has 2038 to do with cputime?
>
> We want proper explanations WHY we need such a change.

When converting the posix-cpu-timers, they call the cputime_to_timespec() function, so this function needs to be converted as well. You can see that conversion in the patch "posix-cpu-timers: Convert to y2038 safe callbacks" from https://git.linaro.org/people/baolin.wang/upstream_0627.git.

I will also explain this in the changelog. Thanks for your comments.

> Thanks,
>
> 	tglx

-- 
Baolin.wang
Best Regards
[PATCH 0/6] Introduce 64bit accessors and structures required to address y2038 issues in the posix_clock subsystem
This patch series changes the 32-bit time types (timespec/itimerspec) to the 64-bit types (timespec64/itimerspec64), and adds new 64bit accessor functions, which are required in order to avoid y2038 issues in the posix_clock subsystem.

In order to avoid spamming people too much, I'm only sending the first few patches of the series and leaving the other patches for later. If you are interested in the whole patch series, see:
https://git.linaro.org/people/baolin.wang/upstream_0627.git

Thoughts and feedback would be appreciated.

Baolin Wang (6):
  time: Introduce struct itimerspec64
  timekeeping: Introduce current_kernel_time64()
  security: Introduce security_settime64()
  time: Introduce do_sys_settimeofday64()
  time: Introduce timespec64_to_jiffies()/jiffies_to_timespec64()
  cputime: Introduce cputime_to_timespec64()/timespec64_to_cputime()

 arch/powerpc/include/asm/cputime.h    |  6 +++---
 arch/s390/include/asm/cputime.h       |  8
 include/asm-generic/cputime_jiffies.h | 10 +-
 include/asm-generic/cputime_nsecs.h   |  6 +++---
 include/linux/cputime.h               | 16 +++
 include/linux/jiffies.h               | 22 ++---
 include/linux/lsm_hooks.h             |  5 +++--
 include/linux/security.h              | 20 ---
 include/linux/time64.h                | 35 +
 include/linux/timekeeping.h           | 24 +++---
 kernel/time/time.c                    | 28 +++---
 kernel/time/timekeeping.c             |  6 +++---
 security/commoncap.c                  |  2 +-
 security/security.c                   |  2 +-
 14 files changed, 148 insertions(+), 42 deletions(-)

-- 
1.7.9.5
Re: [PATCH v5 00/24] Convert the posix_clock_operations and k_clock structure to ready for 2038
On 12 June 2015 at 21:16, Thomas Gleixner t...@linutronix.de wrote:
> On Fri, 12 Jun 2015, Baolin Wang wrote:
>
> Sigh. Again threading of the series failed. Some patches are, the whole
> series is not. Can you please get your tools straight?
>
> You also did not manage to cc me on the security patch.
>
>> - Modify the subject line and the changelog:
>>   timekeeping: Change the implementation of timekeeping_clocktai()
>
> Sigh. How is that better than the previous one? It's more accurate, but
> equally useless.
>
> And of course you did not address my request to change the macro mess in
>
>   posix-timers: Introduce {get,put}_timespec and {get,put}_itimerspec
>
> according to the discussion with Arnd.
>
> Thanks,
>
> 	tglx

Hi Thomas,

Thanks for your comments; I'll fix the problems you pointed out.

-- 
Baolin.wang
Best Regards
[PATCH v5 00/24] Convert the posix_clock_operations and k_clock structure to ready for 2038
This patch series changes the 32-bit time types (timespec/itimerspec) to the 64-bit types (timespec64/itimerspec64), since 32-bit time types will break in the year 2038 on 32bit systems.

This patch series introduces new methods using the timespec64/itimerspec64 types, and removes the old ones using the timespec/itimerspec types, for the posix_clock_operations and k_clock structures.

---
Changes since v4:
 - Rebase the patch series.
 - Modify the subject lines and the changelogs.

Changes since v3:
 - Fix some bugs introduced in earlier versions.

Changes since v2:
 - Split the syscall conversion patch into several smaller patches.

Changes since v1:
 - Split some patches into smaller patches.
 - Add default functions for the new 64bit methods used by the syscall functions.
 - Move the do_sys_settimeofday() function to a header file.
 - Fix the EXPORT_SYMBOL issue.
 - Add new 64bit methods in the cputime_nsecs.h file.

---
Baolin Wang (24):
  time: Introduce struct itimerspec64
  timekeeping: Introduce current_kernel_time64()
  security: Introduce security_settime64()
  time: Introduce do_sys_settimeofday64()
  posix-timers: Introduce {get,put}_timespec and {get,put}_itimerspec
  posix-timers: Factor out the guts of 'timer_gettime'
  posix-timers: Implement y2038 safe timer_get64() callback
  posix-timers: Factor out the guts of 'timer_settime'
  posix-timers: Implement y2038 safe timer_set64() callback
  posix-timers: Factor out the guts of 'clock_settime'
  posix-timers: Implement y2038 safe clock_set64() callback
  posix-timers: Factor out the guts of 'clock_gettime'
  posix-timers: Implement y2038 safe clock_get64() callback
  posix-timers: Factor out the guts of 'clock_getres'
  posix-timers: Implement y2038 safe clock_getres64() callback
  timekeeping: Change the implementation of timekeeping_clocktai()
  posix-timers: Convert to y2038 safe callbacks
  mmtimer: Convert to y2038 safe callbacks
  alarmtimer: Convert to y2038 safe callbacks
  posix-clock: Convert to y2038 safe callbacks
  time: Introduce timespec64_to_jiffies()/jiffies_to_timespec64()
  cputime: Introduce cputime_to_timespec64()/timespec64_to_cputime()
  posix-cpu-timers: Convert to y2038 safe callbacks
  k_clock: Remove y2038 unsafe callbacks

 arch/powerpc/include/asm/cputime.h    |   6 +-
 arch/s390/include/asm/cputime.h       |   8 +-
 drivers/char/mmtimer.c                |  36 +++--
 drivers/ptp/ptp_clock.c               |  22 +--
 include/asm-generic/cputime_jiffies.h |  10 +-
 include/asm-generic/cputime_nsecs.h   |   6 +-
 include/linux/cputime.h               |  16 ++
 include/linux/jiffies.h               |  21 ++-
 include/linux/lsm_hooks.h             |   5 +-
 include/linux/posix-clock.h           |  10 +-
 include/linux/posix-timers.h          |  18 +--
 include/linux/security.h              |  20 ++-
 include/linux/time64.h                |  35 +
 include/linux/timekeeping.h           |  25 +++-
 kernel/time/alarmtimer.c              |  38 ++---
 kernel/time/posix-clock.c             |  20 +--
 kernel/time/posix-cpu-timers.c        |  84 ++-
 kernel/time/posix-timers.c            | 257 +
 kernel/time/time.c                    |  19 +--
 kernel/time/timekeeping.c             |   6 +-
 security/commoncap.c                  |   2 +-
 security/security.c                   |   2 +-
 22 files changed, 412 insertions(+), 254 deletions(-)

-- 
1.7.9.5
[PATCH v5 22/24] cputime: Introduce cputime_to_timespec64()/timespec64_to_cputime()
The cputime_to_timespec() and timespec_to_cputime() functions are not year 2038 safe on 32bit systems because struct timespec will overflow in the year 2038. Introduce the cputime_to_timespec64() and timespec64_to_cputime() functions, which use struct timespec64, including the arch/s390 and arch/powerpc implementations. The cputime_to_timespec() and timespec_to_cputime() functions are moved to include/linux/cputime.h as 'static inline' wrappers so that they can easily be removed in the future.

Signed-off-by: Baolin Wang baolin.w...@linaro.org
---
 arch/powerpc/include/asm/cputime.h    |  6 +++---
 arch/s390/include/asm/cputime.h       |  8
 include/asm-generic/cputime_jiffies.h | 10 +-
 include/asm-generic/cputime_nsecs.h   |  6 +++---
 include/linux/cputime.h               | 16
 5 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h
index e245255..5dda5c0 100644
--- a/arch/powerpc/include/asm/cputime.h
+++ b/arch/powerpc/include/asm/cputime.h
@@ -154,9 +154,9 @@ static inline cputime_t secs_to_cputime(const unsigned long sec)
 }
 
 /*
- * Convert cputime <-> timespec
+ * Convert cputime <-> timespec64
  */
-static inline void cputime_to_timespec(const cputime_t ct, struct timespec *p)
+static inline void cputime_to_timespec64(const cputime_t ct, struct timespec64 *p)
 {
 	u64 x = (__force u64) ct;
 	unsigned int frac;
@@ -168,7 +168,7 @@ static inline void cputime_to_timespec(const cputime_t ct, struct timespec *p)
 	p->tv_nsec = x;
 }
 
-static inline cputime_t timespec_to_cputime(const struct timespec *p)
+static inline cputime_t timespec64_to_cputime(const struct timespec64 *p)
 {
 	u64 ct;
 
diff --git a/arch/s390/include/asm/cputime.h b/arch/s390/include/asm/cputime.h
index 221b454..3319b51 100644
--- a/arch/s390/include/asm/cputime.h
+++ b/arch/s390/include/asm/cputime.h
@@ -81,16 +81,16 @@ static inline cputime_t secs_to_cputime(const unsigned int s)
 }
 
 /*
- * Convert cputime to timespec and back.
+ * Convert cputime to timespec64 and back.
  */
-static inline cputime_t timespec_to_cputime(const struct timespec *value)
+static inline cputime_t timespec64_to_cputime(const struct timespec64 *value)
 {
 	unsigned long long ret = value->tv_sec * CPUTIME_PER_SEC;
 	return (__force cputime_t)(ret + __div(value->tv_nsec * CPUTIME_PER_USEC, NSEC_PER_USEC));
 }
 
-static inline void cputime_to_timespec(const cputime_t cputime,
-				       struct timespec *value)
+static inline void cputime_to_timespec64(const cputime_t cputime,
+					 struct timespec64 *value)
 {
 	unsigned long long __cputime = (__force unsigned long long) cputime;
 	value->tv_nsec = (__cputime % CPUTIME_PER_SEC) * NSEC_PER_USEC / CPUTIME_PER_USEC;
diff --git a/include/asm-generic/cputime_jiffies.h b/include/asm-generic/cputime_jiffies.h
index fe386fc..54e034c 100644
--- a/include/asm-generic/cputime_jiffies.h
+++ b/include/asm-generic/cputime_jiffies.h
@@ -44,12 +44,12 @@ typedef u64 __nocast cputime64_t;
 #define secs_to_cputime(sec)		jiffies_to_cputime((sec) * HZ)
 
 /*
- * Convert cputime to timespec and back.
+ * Convert cputime to timespec64 and back.
  */
-#define timespec_to_cputime(__val)	\
-	jiffies_to_cputime(timespec_to_jiffies(__val))
-#define cputime_to_timespec(__ct,__val)	\
-	jiffies_to_timespec(cputime_to_jiffies(__ct),__val)
+#define timespec64_to_cputime(__val)	\
+	jiffies_to_cputime(timespec64_to_jiffies(__val))
+#define cputime_to_timespec64(__ct,__val)	\
+	jiffies_to_timespec64(cputime_to_jiffies(__ct),__val)
 
 /*
  * Convert cputime to timeval and back.
diff --git a/include/asm-generic/cputime_nsecs.h b/include/asm-generic/cputime_nsecs.h
index 0419485..c0cafc0 100644
--- a/include/asm-generic/cputime_nsecs.h
+++ b/include/asm-generic/cputime_nsecs.h
@@ -71,14 +71,14 @@ typedef u64 __nocast cputime64_t;
 	(__force cputime_t)((__secs) * NSEC_PER_SEC)
 
 /*
- * Convert cputime <-> timespec (nsec)
+ * Convert cputime <-> timespec64 (nsec)
  */
-static inline cputime_t timespec_to_cputime(const struct timespec *val)
+static inline cputime_t timespec64_to_cputime(const struct timespec64 *val)
 {
 	u64 ret = val->tv_sec * NSEC_PER_SEC + val->tv_nsec;
 	return (__force cputime_t) ret;
 }
-static inline void cputime_to_timespec(const cputime_t ct, struct timespec *val)
+static inline void cputime_to_timespec64(const cputime_t ct, struct timespec64 *val)
 {
 	u32 rem;
 
diff --git a/include/linux/cputime.h b/include/linux/cputime.h
index f2eb2ee..cd638a0 100644
--- a/include/linux/cputime.h
+++ b/include/linux/cputime.h
@@ -13,4 +13,20 @@
 	usecs_to_cputime((__nsecs) / NSEC_PER_USEC)
 #endif
 
+static inline cputime_t timespec_to_cputime(const struct timespec *ts)
+{
+	struct timespec64 ts64 = timespec_to_timespec64(*ts
Re: [PATCH v4 00/25] Convert the posix_clock_operations and k_clock structure to ready for 2038
On 3 June 2015 at 03:20, Thomas Gleixner t...@linutronix.de wrote:
> On Mon, 1 Jun 2015, Baolin Wang wrote:
>> This patch series changes the 32-bit time types (timespec/itimerspec) to
>> the 64-bit types (timespec64/itimerspec64), since 32-bit time types will
>> break in the year 2038.
>
> That's only true for 32bit systems.
>
> All in all the patch series looks rather reasonable now, except for the
> subject lines and the changelogs.
>
> The only technical objection I have is the macro conversion magic in
> patch #6. This can be done in a less cryptic and more efficient way.
>
> See the comments on the various patches and please apply them to all of
> the series.
>
> Thanks,
>
> 	tglx

Hi Thomas,

Thanks for your comments; I'll check and fix these problems.

-- 
Baolin.wang
Best Regards
[PATCH v4 00/25] Convert the posix_clock_operations and k_clock structure to ready for 2038
This patch series changes the 32-bit time types (timespec/itimerspec) to the 64-bit types (timespec64/itimerspec64), since 32-bit time types will break in the year 2038.

This patch series introduces new methods using the timespec64/itimerspec64 types, and removes the old ones using the timespec/itimerspec types, for the posix_clock_operations and k_clock structures.

Baolin Wang (25):
  time:Introduce struct itimerspec64
  timekeeping:Introduce the current_kernel_time64()
  hrtimer:Introduce hrtimer_get_res64()
  security: Introduce security_settime64()
  time:Introduce the do_sys_settimeofday64()
  posix-timers:Introduce {get,put}_timespec/{get,put}_itimerspec
  posix-timers: Split up timer_gettime()/timer_settime()/clock_settime()/
    clock_gettime()/clock_getres().
  posix-timers: Convert timer_gettime()/timer_settime()/clock_settime()/
    clock_gettime()/clock_getres() to timespec64/itimerspec64.
  mmtimer:Convert to timespec64/itimerspec64
  alarmtimer:Convert to timespec64/itimerspec64
  posix-clock:Convert to timespec64/itimerspec64
  time:Introduce timespec64_to_jiffies()/jiffies_to_timespec64()
  cputime:Introduce cputime_to_timespec64()/timespec64_to_cputime()
  posix-cpu-timers:Convert to timespec64/itimerspec64
  k_clock:Remove timespec/itimerspec

 arch/powerpc/include/asm/cputime.h    |   6 +-
 arch/s390/include/asm/cputime.h       |   8 +-
 drivers/char/mmtimer.c                |  36 +++--
 drivers/ptp/ptp_clock.c               |  26 +---
 include/asm-generic/cputime_jiffies.h |  10 +-
 include/asm-generic/cputime_nsecs.h   |   4 +-
 include/linux/cputime.h               |  16 ++
 include/linux/hrtimer.h               |  16 +-
 include/linux/jiffies.h               |  21 ++-
 include/linux/posix-clock.h           |  10 +-
 include/linux/posix-timers.h          |  18 +--
 include/linux/security.h              |  25 +++-
 include/linux/time64.h                |  35 +
 include/linux/timekeeping.h           |  26 +++-
 kernel/time/alarmtimer.c              |  43 +++---
 kernel/time/hrtimer.c                 |  10 +-
 kernel/time/posix-clock.c             |  20 +--
 kernel/time/posix-cpu-timers.c        |  84 ++-
 kernel/time/posix-timers.c            | 259 +
 kernel/time/time.c                    |  20 +--
 kernel/time/timekeeping.c             |   6 +-
 kernel/time/timekeeping.h             |   1 -
 security/commoncap.c                  |   2 +-
 security/security.c                   |   2 +-
 24 files changed, 437 insertions(+), 267 deletions(-)

-- 
1.7.9.5
[PATCH v4 23/25] cputime:Introduce the cputime_to_timespec64/timespec64_to_cputime function
This patch introduces functions for converting cputime to timespec64 and back, which replace the timespec type with the timespec64 type, including the arch/s390 and arch/powerpc implementations. These new functions will replace the old cputime_to_timespec/timespec_to_cputime functions in preparation for the 2038 issue. The cputime_to_timespec/timespec_to_cputime functions are moved to include/linux/cputime.h so that they can easily be removed.

Signed-off-by: Baolin Wang baolin.w...@linaro.org
---
 arch/powerpc/include/asm/cputime.h    |  6 +++---
 arch/s390/include/asm/cputime.h       |  8
 include/asm-generic/cputime_jiffies.h | 10 +-
 include/asm-generic/cputime_nsecs.h   |  4 ++--
 include/linux/cputime.h               | 16
 5 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h
index e245255..5dda5c0 100644
--- a/arch/powerpc/include/asm/cputime.h
+++ b/arch/powerpc/include/asm/cputime.h
@@ -154,9 +154,9 @@ static inline cputime_t secs_to_cputime(const unsigned long sec)
 }
 
 /*
- * Convert cputime <-> timespec
+ * Convert cputime <-> timespec64
  */
-static inline void cputime_to_timespec(const cputime_t ct, struct timespec *p)
+static inline void cputime_to_timespec64(const cputime_t ct, struct timespec64 *p)
 {
 	u64 x = (__force u64) ct;
 	unsigned int frac;
@@ -168,7 +168,7 @@ static inline void cputime_to_timespec(const cputime_t ct, struct timespec *p)
 	p->tv_nsec = x;
 }
 
-static inline cputime_t timespec_to_cputime(const struct timespec *p)
+static inline cputime_t timespec64_to_cputime(const struct timespec64 *p)
 {
 	u64 ct;
 
diff --git a/arch/s390/include/asm/cputime.h b/arch/s390/include/asm/cputime.h
index b91e960..1266697 100644
--- a/arch/s390/include/asm/cputime.h
+++ b/arch/s390/include/asm/cputime.h
@@ -89,16 +89,16 @@ static inline cputime_t secs_to_cputime(const unsigned int s)
 }
 
 /*
- * Convert cputime to timespec and back.
+ * Convert cputime to timespec64 and back.
  */
-static inline cputime_t timespec_to_cputime(const struct timespec *value)
+static inline cputime_t timespec64_to_cputime(const struct timespec64 *value)
 {
 	unsigned long long ret = value->tv_sec * CPUTIME_PER_SEC;
 	return (__force cputime_t)(ret + __div(value->tv_nsec * CPUTIME_PER_USEC, NSEC_PER_USEC));
 }
 
-static inline void cputime_to_timespec(const cputime_t cputime,
-				       struct timespec *value)
+static inline void cputime_to_timespec64(const cputime_t cputime,
+					 struct timespec64 *value)
 {
 	unsigned long long __cputime = (__force unsigned long long) cputime;
 #ifndef CONFIG_64BIT
diff --git a/include/asm-generic/cputime_jiffies.h b/include/asm-generic/cputime_jiffies.h
index fe386fc..54e034c 100644
--- a/include/asm-generic/cputime_jiffies.h
+++ b/include/asm-generic/cputime_jiffies.h
@@ -44,12 +44,12 @@ typedef u64 __nocast cputime64_t;
 #define secs_to_cputime(sec)		jiffies_to_cputime((sec) * HZ)
 
 /*
- * Convert cputime to timespec and back.
+ * Convert cputime to timespec64 and back.
  */
-#define timespec_to_cputime(__val)	\
-	jiffies_to_cputime(timespec_to_jiffies(__val))
-#define cputime_to_timespec(__ct,__val)	\
-	jiffies_to_timespec(cputime_to_jiffies(__ct),__val)
+#define timespec64_to_cputime(__val)	\
+	jiffies_to_cputime(timespec64_to_jiffies(__val))
+#define cputime_to_timespec64(__ct,__val)	\
+	jiffies_to_timespec64(cputime_to_jiffies(__ct),__val)
 
 /*
  * Convert cputime to timeval and back.
diff --git a/include/asm-generic/cputime_nsecs.h b/include/asm-generic/cputime_nsecs.h
index 0419485..65c875b 100644
--- a/include/asm-generic/cputime_nsecs.h
+++ b/include/asm-generic/cputime_nsecs.h
@@ -73,12 +73,12 @@ typedef u64 __nocast cputime64_t;
 /*
  * Convert cputime <-> timespec (nsec)
  */
-static inline cputime_t timespec_to_cputime(const struct timespec *val)
+static inline cputime_t timespec64_to_cputime(const struct timespec64 *val)
 {
 	u64 ret = val->tv_sec * NSEC_PER_SEC + val->tv_nsec;
 	return (__force cputime_t) ret;
 }
-static inline void cputime_to_timespec(const cputime_t ct, struct timespec *val)
+static inline void cputime_to_timespec64(const cputime_t ct, struct timespec64 *val)
 {
 	u32 rem;
 
diff --git a/include/linux/cputime.h b/include/linux/cputime.h
index f2eb2ee..e4c88da 100644
--- a/include/linux/cputime.h
+++ b/include/linux/cputime.h
@@ -13,4 +13,20 @@
 	usecs_to_cputime((__nsecs) / NSEC_PER_USEC)
 #endif
 
+static inline cputime_t timespec_to_cputime(const struct timespec *ts)
+{
+	struct timespec64 ts64 = timespec_to_timespec64(*ts);
+
+	return timespec64_to_cputime(ts64);
+}
+
+static inline void cputime_to_timespec(const cputime_t cputime,
+				       struct timespec *value)
+{
+	struct timespec64 ts64
Re: [PATCH v3 00/22] Convert the posix_clock_operations and k_clock structure to ready for 2038
On 12 May 2015 at 17:39, Arnd Bergmann a...@arndb.de wrote:
> On Monday 11 May 2015 19:08:38 Baolin Wang wrote:
>> This patch series changes the 32-bit time type (timespec/itimerspec) to
>> the 64-bit one (timespec64/itimerspec64), since 32-bit time types will
>> break in the year 2038.
>>
>> This patch series introduces new methods using the
>> timespec64/itimerspec64 types, and removes the old ones using the
>> timespec/itimerspec types, for the posix_clock_operations and k_clock
>> structures.
>>
>> It also introduces some new functions using the timespec64/itimerspec64
>> types, like current_kernel_time64(), hrtimer_get_res64(),
>> cputime_to_timespec64() and timespec64_to_cputime().
>>
>> Changes since v2:
>>  - Split the syscall conversion patch into several smaller patches.
>>
>> Baolin Wang (22):
>>   linux/time64.h:Introduce the 'struct itimerspec64' for 64bit
>>   timekeeping:Introduce the current_kernel_time64() function with timespec64 type
>>   time/hrtimer:Introduce hrtimer_get_res64() with timespec64 type for getting the timer resolution
>>   posix-timers:Split out the guts of the syscall and change the implementation for timer_gettime
>>   posix-timers:Convert to the 64bit methods for the timer_gettime syscall function
>
> I have two more very general comments about the series:
>
> a) something has gone wrong with your submission in v2 and v3 but was
>    working earlier: normally all emails should be sent by git-send-email
>    as replies to the [patch 00/22] mail. This is the default, and it is
>    enabled by the '--thread --no-chain-reply-to' options. Please try to
>    get this to work again.
>
> b) it would be better to have a little shorter subject lines, to avoid
>    line-wrapping in the list above. Here are some examples of what you
>    could use to replace the lines above:
>
>    timekeeping: introduce struct itimerspec64
>    timekeeping: introduce current_kernel_time64()
>    hrtimer: introduce hrtimer_get_res64()
>    posix-timers: split up sys_timer_gettime()
>    posix-timers: convert timer_gettime() to timespec64
>
> In general, try to come up with the shortest description that uniquely
> describes what your patch does, and move any details into the longer
> patch description.
>
> 	Arnd

OK, I'll fix these in the next patch series. Thanks for your comments.

-- 
Baolin.wang
Best Regards
[PATCH v3 00/22] Convert the posix_clock_operations and k_clock structure to ready for 2038
This patch series changes the 32-bit time type (timespec/itimerspec) to the 64-bit one (timespec64/itimerspec64), since 32-bit time types will break in the year 2038.

This patch series introduces new methods using the timespec64/itimerspec64 types, and removes the old ones using the timespec/itimerspec types, for the posix_clock_operations and k_clock structures.

It also introduces some new functions using the timespec64/itimerspec64 types, like current_kernel_time64(), hrtimer_get_res64(), cputime_to_timespec64() and timespec64_to_cputime().

Changes since v2:
 - Split the syscall conversion patch into several smaller patches.

Baolin Wang (22):
  linux/time64.h:Introduce the 'struct itimerspec64' for 64bit
  timekeeping:Introduce the current_kernel_time64() function with timespec64 type
  time/hrtimer:Introduce hrtimer_get_res64() with timespec64 type for getting the timer resolution
  posix-timers:Split out the guts of the syscall and change the implementation for timer_gettime
  posix-timers:Convert to the 64bit methods for the timer_gettime syscall function
  posix-timers:Split out the guts of the syscall and change the implementation for timer_settime
  posix-timers:Convert to the 64bit methods for the timer_settime syscall function
  posix-timers:Split out the guts of the syscall and change the implementation for clock_settime
  posix-timers:Convert to the 64bit methods for the clock_settime syscall function
  posix-timers:Split out the guts of the syscall and change the implementation for clock_gettime
  posix-timers:Convert to the 64bit methods for the clock_gettime syscall function
  posix-timers:Split out the guts of the syscall and change the implementation for clock_getres
  posix-timers:Convert to the 64bit methods for the clock_getres syscall function
  time:Introduce the do_sys_settimeofday64() function with timespec64 type
  time/posix-timers:Convert to the 64bit methods for k_clock callback functions
  char/mmtimer:Convert to the 64bit methods for k_clock callback function
  time/alarmtimer:Convert to the new 64bit methods for k_clock structure
  time/posix-clock:Convert to the 64bit methods for k_clock and posix_clock_operations structure
  time/time:Introduce the timespec64_to_jiffies/jiffies_to_timespec64 function
  cputime:Introduce the cputime_to_timespec64/timespec64_to_cputime function
  time/posix-cpu-timers:Convert to the 64bit methods for k_clock structure
  k_clock:Remove the 32bit methods with timespec/itimerspec type

 arch/powerpc/include/asm/cputime.h    |   6 +-
 arch/s390/include/asm/cputime.h       |   8 +-
 drivers/char/mmtimer.c                |  36 +++--
 drivers/ptp/ptp_clock.c               |  26 +---
 include/asm-generic/cputime_jiffies.h |  10 +-
 include/asm-generic/cputime_nsecs.h   |   4 +-
 include/linux/cputime.h               |  15 ++
 include/linux/hrtimer.h               |  12 +-
 include/linux/jiffies.h               |  21 ++-
 include/linux/posix-clock.h           |  10 +-
 include/linux/posix-timers.h          |  18 +--
 include/linux/time64.h                |  35 +
 include/linux/timekeeping.h           |  26 +++-
 kernel/time/alarmtimer.c              |  43 +++---
 kernel/time/hrtimer.c                 |  10 +-
 kernel/time/posix-clock.c             |  20 +--
 kernel/time/posix-cpu-timers.c        |  83 +-
 kernel/time/posix-timers.c            | 269 ++---
 kernel/time/time.c                    |  22 +--
 kernel/time/timekeeping.c             |   6 +-
 kernel/time/timekeeping.h             |   2 +-
 21 files changed, 428 insertions(+), 254 deletions(-)

-- 
1.7.9.5
[PATCH v3 00/22] Convert the posix_clock_operations and k_clock structure to ready for 2038
This patch series changes the 32-bit time types (timespec/itimerspec) to the 64-bit ones (timespec64/itimerspec64), since 32-bit time types will break in the year 2038. It introduces new methods with the timespec64/itimerspec64 types and removes the old ones with the timespec/itimerspec types from the posix_clock_operations and k_clock structures. It also introduces some new functions with the timespec64/itimerspec64 types, such as current_kernel_time64(), hrtimer_get_res64(), cputime_to_timespec64() and timespec64_to_cputime(). Changes since v2: - Split the syscall conversion patch into several smaller patches. Baolin Wang (22): linux/time64.h:Introduce the 'struct itimerspec64' for 64bit timekeeping:Introduce the current_kernel_time64() function with timespec64 type time/hrtimer:Introduce hrtimer_get_res64() with timespec64 type for getting the timer resolution posix-timers:Split out the guts of the syscall and change the implementation for timer_gettime posix-timers:Convert to the 64bit methods for the timer_gettime syscall function posix-timers:Split out the guts of the syscall and change the implementation for timer_settime posix-timers:Convert to the 64bit methods for the timer_settime syscall function posix-timers:Split out the guts of the syscall and change the implementation for clock_settime posix-timers:Convert to the 64bit methods for the clock_settime syscall function posix-timers:Split out the guts of the syscall and change the implementation for clock_gettime posix-timers:Convert to the 64bit methods for the clock_gettime syscall function posix-timers:Split out the guts of the syscall and change the implementation for clock_getres posix-timers:Convert to the 64bit methods for the clock_getres syscall function time:Introduce the do_sys_settimeofday64() function with timespec64 type time/posix-timers:Convert to the 64bit methods for k_clock callback functions char/mmtimer:Convert to the 64bit methods for k_clock callback function
time/alarmtimer:Convert to the new 64bit methods for k_clock structure time/posix-clock:Convert to the 64bit methods for k_clock and posix_clock_operations structure time/time:Introduce the timespec64_to_jiffies/jiffies_to_timespec64 function cputime:Introduce the cputime_to_timespec64/timespec64_to_cputime function time/posix-cpu-timers:Convert to the 64bit methods for k_clock structure k_clock:Remove the 32bit methods with timespec/itimerspec type arch/powerpc/include/asm/cputime.h|6 +- arch/s390/include/asm/cputime.h |8 +- drivers/char/mmtimer.c| 36 +++-- drivers/ptp/ptp_clock.c | 26 +--- include/asm-generic/cputime_jiffies.h | 10 +- include/asm-generic/cputime_nsecs.h |4 +- include/linux/cputime.h | 15 ++ include/linux/hrtimer.h | 12 +- include/linux/jiffies.h | 21 ++- include/linux/posix-clock.h | 10 +- include/linux/posix-timers.h | 18 +-- include/linux/time64.h| 35 + include/linux/timekeeping.h | 26 +++- kernel/time/alarmtimer.c | 43 +++--- kernel/time/hrtimer.c | 10 +- kernel/time/posix-clock.c | 20 +-- kernel/time/posix-cpu-timers.c| 83 +- kernel/time/posix-timers.c| 269 ++--- kernel/time/time.c| 22 +-- kernel/time/timekeeping.c |6 +- kernel/time/timekeeping.h |2 +- 21 files changed, 428 insertions(+), 254 deletions(-) -- 1.7.9.5
[PATCH v3 20/22] cputime:Introduce the cputime_to_timespec64/timespec64_to_cputime function
This patch introduces functions for converting cputime to timespec64 and back, replacing the timespec type with the timespec64 type, including the arch/s390 and arch/powerpc implementations. These new methods will replace the old cputime_to_timespec/timespec_to_cputime functions in preparation for the 2038 issue. The cputime_to_timespec/timespec_to_cputime functions are moved to the include/linux/cputime.h file so that they can be removed conveniently later. Signed-off-by: Baolin Wang baolin.w...@linaro.org --- arch/powerpc/include/asm/cputime.h|6 +++--- arch/s390/include/asm/cputime.h |8 include/asm-generic/cputime_jiffies.h | 10 +- include/asm-generic/cputime_nsecs.h |4 ++-- include/linux/cputime.h | 15 +++ 5 files changed, 29 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h index e245255..5dda5c0 100644 --- a/arch/powerpc/include/asm/cputime.h +++ b/arch/powerpc/include/asm/cputime.h @@ -154,9 +154,9 @@ static inline cputime_t secs_to_cputime(const unsigned long sec) } /* - * Convert cputime <-> timespec + * Convert cputime <-> timespec64 */ -static inline void cputime_to_timespec(const cputime_t ct, struct timespec *p) +static inline void cputime_to_timespec64(const cputime_t ct, struct timespec64 *p) { u64 x = (__force u64) ct; unsigned int frac; @@ -168,7 +168,7 @@ static inline void cputime_to_timespec(const cputime_t ct, struct timespec *p) p->tv_nsec = x; } -static inline cputime_t timespec_to_cputime(const struct timespec *p) +static inline cputime_t timespec64_to_cputime(const struct timespec64 *p) { u64 ct; diff --git a/arch/s390/include/asm/cputime.h b/arch/s390/include/asm/cputime.h index b91e960..1266697 100644 --- a/arch/s390/include/asm/cputime.h +++ b/arch/s390/include/asm/cputime.h @@ -89,16 +89,16 @@ static inline cputime_t secs_to_cputime(const unsigned int s) } /* - * Convert cputime to timespec and back. + * Convert cputime to timespec64 and back.
*/ -static inline cputime_t timespec_to_cputime(const struct timespec *value) +static inline cputime_t timespec64_to_cputime(const struct timespec64 *value) { unsigned long long ret = value->tv_sec * CPUTIME_PER_SEC; return (__force cputime_t)(ret + __div(value->tv_nsec * CPUTIME_PER_USEC, NSEC_PER_USEC)); } -static inline void cputime_to_timespec(const cputime_t cputime, - struct timespec *value) +static inline void cputime_to_timespec64(const cputime_t cputime, + struct timespec64 *value) { unsigned long long __cputime = (__force unsigned long long) cputime; #ifndef CONFIG_64BIT diff --git a/include/asm-generic/cputime_jiffies.h b/include/asm-generic/cputime_jiffies.h index fe386fc..54e034c 100644 --- a/include/asm-generic/cputime_jiffies.h +++ b/include/asm-generic/cputime_jiffies.h @@ -44,12 +44,12 @@ typedef u64 __nocast cputime64_t; #define secs_to_cputime(sec) jiffies_to_cputime((sec) * HZ) /* - * Convert cputime to timespec and back. + * Convert cputime to timespec64 and back. */ -#define timespec_to_cputime(__val) \ - jiffies_to_cputime(timespec_to_jiffies(__val)) -#define cputime_to_timespec(__ct,__val)\ - jiffies_to_timespec(cputime_to_jiffies(__ct),__val) +#define timespec64_to_cputime(__val) \ + jiffies_to_cputime(timespec64_to_jiffies(__val)) +#define cputime_to_timespec64(__ct,__val) \ + jiffies_to_timespec64(cputime_to_jiffies(__ct),__val) /* * Convert cputime to timeval and back.
diff --git a/include/asm-generic/cputime_nsecs.h b/include/asm-generic/cputime_nsecs.h index 0419485..65c875b 100644 --- a/include/asm-generic/cputime_nsecs.h +++ b/include/asm-generic/cputime_nsecs.h @@ -73,12 +73,12 @@ typedef u64 __nocast cputime64_t; /* * Convert cputime <-> timespec (nsec) */ -static inline cputime_t timespec_to_cputime(const struct timespec *val) +static inline cputime_t timespec64_to_cputime(const struct timespec64 *val) { u64 ret = val->tv_sec * NSEC_PER_SEC + val->tv_nsec; return (__force cputime_t) ret; } -static inline void cputime_to_timespec(const cputime_t ct, struct timespec *val) +static inline void cputime_to_timespec64(const cputime_t ct, struct timespec64 *val) { u32 rem; diff --git a/include/linux/cputime.h b/include/linux/cputime.h index f2eb2ee..f01896f 100644 --- a/include/linux/cputime.h +++ b/include/linux/cputime.h @@ -13,4 +13,19 @@ usecs_to_cputime((__nsecs) / NSEC_PER_USEC) #endif +static inline cputime_t timespec_to_cputime(const struct timespec *ts) +{ + struct timespec64 ts64 = timespec_to_timespec64(*ts); + return timespec64_to_cputime(&ts64); +} + +static inline void cputime_to_timespec(const cputime_t cputime, + struct timespec *value) +{ + struct timespec64 *ts64
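For the nanosecond-based cputime in include/asm-generic/cputime_nsecs.h, the conversion pair is plain arithmetic: seconds times NSEC_PER_SEC plus nanoseconds in one direction, division and remainder in the other. The userspace sketch below assumes cputime is a plain count of nanoseconds and that tv_sec is non-negative; the type and function names are hypothetical stand-ins for the kernel's cputime_t, timespec64_to_cputime() and cputime_to_timespec64(), not their actual definitions.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical stand-ins: cputime as a plain nanosecond count, and a
 * timespec64-like structure with a 64-bit seconds field. */
typedef uint64_t cputime_ns_sketch_t;

struct timespec64_sketch {
	int64_t tv_sec;
	long tv_nsec;
};

static cputime_ns_sketch_t
timespec64_to_cputime_sketch(const struct timespec64_sketch *val)
{
	/* Do the arithmetic in unsigned 64 bits; assumes tv_sec >= 0. */
	return (uint64_t)val->tv_sec * NSEC_PER_SEC + (uint64_t)val->tv_nsec;
}

static void cputime_to_timespec64_sketch(cputime_ns_sketch_t ct,
					 struct timespec64_sketch *val)
{
	val->tv_sec = (int64_t)(ct / NSEC_PER_SEC);
	val->tv_nsec = (long)(ct % NSEC_PER_SEC);
}
```

A round trip with 3,000,000,000 seconds, a value past the 2038 limit that a 32-bit tv_sec cannot hold, comes back unchanged because every intermediate value stays in 64 bits.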
[PATCH v2 13/15] cputime:Introduce the cputime_to_timespec64/timespec64_to_cputime function
This patch introduces functions for converting cputime to timespec64 and back, replacing the timespec type with the timespec64 type, including the arch/s390 and arch/powerpc implementations. These new methods will replace the old cputime_to_timespec/timespec_to_cputime functions in preparation for the 2038 issue. The cputime_to_timespec/timespec_to_cputime functions are moved to the include/linux/cputime.h file so that they can be removed conveniently later. Signed-off-by: Baolin Wang baolin.w...@linaro.org --- arch/powerpc/include/asm/cputime.h|6 +++--- arch/s390/include/asm/cputime.h |8 include/asm-generic/cputime_jiffies.h | 10 +- include/asm-generic/cputime_nsecs.h |4 ++-- include/linux/cputime.h | 15 +++ 5 files changed, 29 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h index e245255..5dda5c0 100644 --- a/arch/powerpc/include/asm/cputime.h +++ b/arch/powerpc/include/asm/cputime.h @@ -154,9 +154,9 @@ static inline cputime_t secs_to_cputime(const unsigned long sec) } /* - * Convert cputime <-> timespec + * Convert cputime <-> timespec64 */ -static inline void cputime_to_timespec(const cputime_t ct, struct timespec *p) +static inline void cputime_to_timespec64(const cputime_t ct, struct timespec64 *p) { u64 x = (__force u64) ct; unsigned int frac; @@ -168,7 +168,7 @@ static inline void cputime_to_timespec(const cputime_t ct, struct timespec *p) p->tv_nsec = x; } -static inline cputime_t timespec_to_cputime(const struct timespec *p) +static inline cputime_t timespec64_to_cputime(const struct timespec64 *p) { u64 ct; diff --git a/arch/s390/include/asm/cputime.h b/arch/s390/include/asm/cputime.h index b91e960..1266697 100644 --- a/arch/s390/include/asm/cputime.h +++ b/arch/s390/include/asm/cputime.h @@ -89,16 +89,16 @@ static inline cputime_t secs_to_cputime(const unsigned int s) } /* - * Convert cputime to timespec and back. + * Convert cputime to timespec64 and back.
*/ -static inline cputime_t timespec_to_cputime(const struct timespec *value) +static inline cputime_t timespec64_to_cputime(const struct timespec64 *value) { unsigned long long ret = value->tv_sec * CPUTIME_PER_SEC; return (__force cputime_t)(ret + __div(value->tv_nsec * CPUTIME_PER_USEC, NSEC_PER_USEC)); } -static inline void cputime_to_timespec(const cputime_t cputime, - struct timespec *value) +static inline void cputime_to_timespec64(const cputime_t cputime, + struct timespec64 *value) { unsigned long long __cputime = (__force unsigned long long) cputime; #ifndef CONFIG_64BIT diff --git a/include/asm-generic/cputime_jiffies.h b/include/asm-generic/cputime_jiffies.h index fe386fc..54e034c 100644 --- a/include/asm-generic/cputime_jiffies.h +++ b/include/asm-generic/cputime_jiffies.h @@ -44,12 +44,12 @@ typedef u64 __nocast cputime64_t; #define secs_to_cputime(sec) jiffies_to_cputime((sec) * HZ) /* - * Convert cputime to timespec and back. + * Convert cputime to timespec64 and back. */ -#define timespec_to_cputime(__val) \ - jiffies_to_cputime(timespec_to_jiffies(__val)) -#define cputime_to_timespec(__ct,__val)\ - jiffies_to_timespec(cputime_to_jiffies(__ct),__val) +#define timespec64_to_cputime(__val) \ + jiffies_to_cputime(timespec64_to_jiffies(__val)) +#define cputime_to_timespec64(__ct,__val) \ + jiffies_to_timespec64(cputime_to_jiffies(__ct),__val) /* * Convert cputime to timeval and back.
diff --git a/include/asm-generic/cputime_nsecs.h b/include/asm-generic/cputime_nsecs.h index 0419485..65c875b 100644 --- a/include/asm-generic/cputime_nsecs.h +++ b/include/asm-generic/cputime_nsecs.h @@ -73,12 +73,12 @@ typedef u64 __nocast cputime64_t; /* * Convert cputime <-> timespec (nsec) */ -static inline cputime_t timespec_to_cputime(const struct timespec *val) +static inline cputime_t timespec64_to_cputime(const struct timespec64 *val) { u64 ret = val->tv_sec * NSEC_PER_SEC + val->tv_nsec; return (__force cputime_t) ret; } -static inline void cputime_to_timespec(const cputime_t ct, struct timespec *val) +static inline void cputime_to_timespec64(const cputime_t ct, struct timespec64 *val) { u32 rem; diff --git a/include/linux/cputime.h b/include/linux/cputime.h index f2eb2ee..f01896f 100644 --- a/include/linux/cputime.h +++ b/include/linux/cputime.h @@ -13,4 +13,19 @@ usecs_to_cputime((__nsecs) / NSEC_PER_USEC) #endif +static inline cputime_t timespec_to_cputime(const struct timespec *ts) +{ + struct timespec64 ts64 = timespec_to_timespec64(*ts); + return timespec64_to_cputime(&ts64); +} + +static inline void cputime_to_timespec(const cputime_t cputime, + struct timespec *value) +{ + struct timespec64 *ts64
[PATCH v2 00/15] Convert the posix_clock_operations and k_clock structure to ready for 2038
This patch series changes the 32-bit time types (timespec/itimerspec) to the 64-bit ones (timespec64/itimerspec64), since 32-bit time types will break in the year 2038. It introduces new methods with the timespec64/itimerspec64 types and removes the old ones with the timespec/itimerspec types from the posix_clock_operations and k_clock structures. It also introduces some new functions with the timespec64/itimerspec64 types, such as current_kernel_time64(), hrtimer_get_res64(), cputime_to_timespec64() and timespec64_to_cputime(). Changes since v1: - Split some patches into smaller patches. - Changed the method for converting the syscalls and added some default functions for the new 64bit syscall methods. - Introduced the new function do_sys_settimeofday64() and moved the do_sys_settimeofday() function to a header file. - Fixed the EXPORT_SYMBOL issue. - Added new 64bit methods in the cputime_nsecs.h file. - Updated some patch logs. Baolin Wang (15): linux/time64.h:Introduce the 'struct itimerspec64' for 64bit timekeeping:Introduce the current_kernel_time64() function with timespec64 type time/hrtimer:Introduce hrtimer_get_res64() with timespec64 type for getting the timer resolution posix timers:Introduce the 64bit methods with timespec64 type for k_clock structure posix-timers:Split out the guts of the syscall and change the implementation posix-timers:Convert to the 64bit methods for the syscall function time:Introduce the do_sys_settimeofday64() function with timespec64 type time/posix-timers:Convert to the 64bit methods for k_clock callback functions char/mmtimer:Convert to the 64bit methods for k_clock callback function time/alarmtimer:Convert to the new methods for k_clock structure time/posix-clock:Convert to the 64bit methods for k_clock and posix_clock_operations structure time/time:Introduce the timespec64_to_jiffies/jiffies_to_timespec64 function cputime:Introduce the cputime_to_timespec64/timespec64_to_cputime function time/posix-cpu-timers:Convert to the 64bit methods for k_clock
structure k_clock:Remove the 32bit methods with timespec/itimerspec type arch/powerpc/include/asm/cputime.h|6 +- arch/s390/include/asm/cputime.h |8 +- drivers/char/mmtimer.c| 36 +++-- drivers/ptp/ptp_clock.c | 26 +--- include/asm-generic/cputime_jiffies.h | 10 +- include/asm-generic/cputime_nsecs.h |4 +- include/linux/cputime.h | 15 ++ include/linux/hrtimer.h | 12 +- include/linux/jiffies.h | 21 ++- include/linux/posix-clock.h | 10 +- include/linux/posix-timers.h | 18 +-- include/linux/time64.h| 35 + include/linux/timekeeping.h | 26 +++- kernel/time/alarmtimer.c | 43 +++--- kernel/time/hrtimer.c | 10 +- kernel/time/posix-clock.c | 20 +-- kernel/time/posix-cpu-timers.c| 83 +- kernel/time/posix-timers.c| 269 ++--- kernel/time/time.c| 22 +-- kernel/time/timekeeping.c |6 +- kernel/time/timekeeping.h |2 +- 21 files changed, 428 insertions(+), 254 deletions(-) -- 1.7.9.5
Re: [Y2038] [PATCH 05/11] time/posix-timers:Convert to the 64bit methods for k_clock callback functions
On 21 April 2015 at 16:45, Arnd Bergmann a...@arndb.de wrote: On Tuesday 21 April 2015 16:36:13 Baolin Wang wrote: On 21 April 2015 at 04:48, Thomas Gleixner t...@linutronix.de wrote: On Mon, 20 Apr 2015, Baolin Wang wrote: /* Set clock_realtime */ static int posix_clock_realtime_set(const clockid_t which_clock, - const struct timespec *tp) + const struct timespec64 *tp) { - return do_sys_settimeofday(tp, NULL); + struct timespec ts = timespec64_to_timespec(*tp); + + return do_sys_settimeofday(&ts, NULL); Sigh. No. We first provide a proper function for this, which takes a timespec64, i.e. do_sys_settimeofday64() instead of having this wrapper mess all over the place. Thanks for your comments, but if we use do_sys_settimeofday64() here, that will introduce a security bug: do_sys_settimeofday contains a capability check that normally prevents non-root users from setting the time. With your change, any user can set the system time. He was asking for a new do_sys_settimeofday64 function to be added, not using the low-level do_settimeofday64. Arnd Sorry for the misunderstanding, I'll fix that in the next patch. Thanks. -- Baolin.wang Best Regards
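The distinction Arnd draws is about layering: a checked entry point such as the requested do_sys_settimeofday64() validates its argument and checks privilege before calling the unchecked low-level setter (do_settimeofday64()), so callers cannot bypass the capability check. A minimal userspace sketch of that layering follows; the flag standing in for the CAP_SYS_TIME check and all function names are hypothetical, and the real kernel path routes the privilege check through its own helpers, which this sketch does not model.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000L

struct timespec64_sketch {
	int64_t tv_sec;
	long tv_nsec;
};

/* Hypothetical model of whether the caller holds CAP_SYS_TIME. */
static bool caller_has_cap_sys_time;
static struct timespec64_sketch system_clock;

/* Hypothetical low-level setter: no validation and no permission check.
 * Calling it directly from a user-facing path would skip the check. */
static int low_level_settime64(const struct timespec64_sketch *ts)
{
	system_clock = *ts;
	return 0;
}

/* Hypothetical checked entry point: validate, check privilege, then set. */
static int checked_settime64(const struct timespec64_sketch *ts)
{
	if (ts->tv_sec < 0 || ts->tv_nsec < 0 || ts->tv_nsec >= NSEC_PER_SEC)
		return -EINVAL;
	if (!caller_has_cap_sys_time)
		return -EPERM;
	return low_level_settime64(ts);
}
```

A 64-bit posix_clock_realtime_set() built on such a checked function needs no timespec64_to_timespec() wrapper and keeps the permission check in one place.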
Re: [PATCH 01/11] linux/time64.h:Introduce the 'struct itimerspec64' for 64bit
On 21 April 2015 at 03:14, Thomas Gleixner t...@linutronix.de wrote: On Mon, 20 Apr 2015, Baolin Wang wrote: This patch introduces the 'struct itimerspec64' for 64bit to replace itimerspec, and also introduces the conversion methods: itimerspec64_to_itimerspec() and itimerspec_to_itimerspec64(), that makes itimerspec to ready for 2038 year. Signed-off-by: Baolin Wang baolin.w...@linaro.org --- include/linux/time64.h | 13 + 1 file changed, 13 insertions(+) diff --git a/include/linux/time64.h b/include/linux/time64.h index a383147..3647bdd 100644 --- a/include/linux/time64.h +++ b/include/linux/time64.h @@ -18,6 +18,11 @@ struct timespec64 { }; #endif +struct itimerspec64 { + struct timespec64 it_interval; /* timer period */ + struct timespec64 it_value; /* timer expiration */ +}; + /* Parameters used to convert the timespec values: */ #define MSEC_PER_SEC 1000L #define USEC_PER_MSEC 1000L @@ -187,4 +192,12 @@ static __always_inline void timespec64_add_ns(struct timespec64 *a, u64 ns) #endif +#define itimerspec64_to_itimerspec(its64) \ +#define itimerspec_to_itimerspec64(its) \ 1.) Make these static inlines please. These macros are not typesafe. 2.) Use pointers to the input value. Thanks. tglx Thanks for your comments, I'll fix this in the next patch. -- Baolin.wang Best Regards
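Thomas's two requests (static inlines instead of macros, and pointers to the input value) could look roughly like the sketch below. It is userspace C with hypothetical *_sketch types standing in for struct timespec, timespec64, itimerspec and itimerspec64, not the helpers that were eventually merged; because the functions take typed pointers, passing the wrong structure is diagnosed by the compiler.

```c
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct timespec_sketch   { int32_t tv_sec; long tv_nsec; };   /* 32-bit seconds */
struct timespec64_sketch { int64_t tv_sec; long tv_nsec; };   /* 64-bit seconds */

struct itimerspec_sketch {
	struct timespec_sketch it_interval;   /* timer period */
	struct timespec_sketch it_value;      /* timer expiration */
};

struct itimerspec64_sketch {
	struct timespec64_sketch it_interval; /* timer period */
	struct timespec64_sketch it_value;    /* timer expiration */
};

static inline struct timespec64_sketch
timespec_to_timespec64_sketch(const struct timespec_sketch *ts)
{
	struct timespec64_sketch ret = { .tv_sec = ts->tv_sec, .tv_nsec = ts->tv_nsec };
	return ret;
}

/* Narrowing direction: seconds beyond INT32_MAX do not fit in the
 * 32-bit field and are lost in the cast (the 2038 problem itself). */
static inline struct timespec_sketch
timespec64_to_timespec_sketch(const struct timespec64_sketch *ts64)
{
	struct timespec_sketch ret = { .tv_sec = (int32_t)ts64->tv_sec, .tv_nsec = ts64->tv_nsec };
	return ret;
}

static inline struct itimerspec64_sketch
itimerspec_to_itimerspec64_sketch(const struct itimerspec_sketch *its)
{
	struct itimerspec64_sketch ret = {
		.it_interval = timespec_to_timespec64_sketch(&its->it_interval),
		.it_value    = timespec_to_timespec64_sketch(&its->it_value),
	};
	return ret;
}

static inline struct itimerspec_sketch
itimerspec64_to_itimerspec_sketch(const struct itimerspec64_sketch *its64)
{
	struct itimerspec_sketch ret = {
		.it_interval = timespec64_to_timespec_sketch(&its64->it_interval),
		.it_value    = timespec64_to_timespec_sketch(&its64->it_value),
	};
	return ret;
}
```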
Re: [PATCH 05/11] time/posix-timers:Convert to the 64bit methods for k_clock callback functions
On 21 April 2015 at 04:48, Thomas Gleixner t...@linutronix.de wrote: On Mon, 20 Apr 2015, Baolin Wang wrote: /* Set clock_realtime */ static int posix_clock_realtime_set(const clockid_t which_clock, - const struct timespec *tp) + const struct timespec64 *tp) { - return do_sys_settimeofday(tp, NULL); + struct timespec ts = timespec64_to_timespec(*tp); + + return do_sys_settimeofday(&ts, NULL); Sigh. No. We first provide a proper function for this, which takes a timespec64, i.e. do_sys_settimeofday64() instead of having this wrapper mess all over the place. Thanks for your comments, but if we use do_sys_settimeofday64() here, that will introduce a security bug: do_sys_settimeofday contains a capability check that normally prevents non-root users from setting the time. With your change, any user can set the system time. /* SIGEV_NONE timers are not queued ! See common_timer_get */ if (((timr->it_sigev_notify & ~SIGEV_THREAD_ID) == SIGEV_NONE)) { diff --git a/kernel/time/timekeeping.h b/kernel/time/timekeeping.h index 1d91416..144af14 100644 --- a/kernel/time/timekeeping.h +++ b/kernel/time/timekeeping.h @@ -15,7 +15,7 @@ extern u64 timekeeping_max_deferment(void); extern int timekeeping_inject_offset(struct timespec *ts); extern s32 timekeeping_get_tai_offset(void); extern void timekeeping_set_tai_offset(s32 tai_offset); -extern void timekeeping_clocktai(struct timespec *ts); +extern void timekeeping_clocktai(struct timespec64 *ts); # git grep timekeeping_clocktai() is your friend. Thanks, tglx -- Baolin.wang Best Regards
Re: [PATCH 01/11] linux/time64.h:Introduce the 'struct itimerspec64' for 64bit
On 20 April 2015 at 17:49, Sergei Shtylyov sergei.shtyl...@cogentembedded.com wrote: Hello. On 4/20/2015 8:57 AM, Baolin Wang wrote: This patch introduces the 'struct itimerspec64' for 64bit to replace itimerspec, and also introduces the conversion methods: itimerspec64_to_itimerspec() and itimerspec_to_itimerspec64(), that makes itimerspec to ready for 2038 year. 'To' not needed here. Signed-off-by: Baolin Wang baolin.w...@linaro.org [...] WBR, Sergei Hi Sergei, Sorry for the mistake. Thank you for your comments. I'll fix that in the next patch. -- Baolin.wang Best Regards
Re: [PATCH 11/11] k_clock:Remove the 32bit methods with timespec type
On 20 April 2015 at 16:42, Richard Cochran richardcoch...@gmail.com wrote: On Mon, Apr 20, 2015 at 01:57:39PM +0800, Baolin Wang wrote: @@ -911,18 +907,14 @@ retry: return -EINVAL; kc = clockid_to_kclock(timr->it_clock); - if (WARN_ON_ONCE(!kc || (!kc->timer_set && !kc->timer_set64))) { + if (WARN_ON_ONCE(!kc || !kc->timer_set64)) { error = -EINVAL; } else { - if (kc->timer_set64) { - new_spec64 = itimerspec_to_itimerspec64(new_spec); - error = kc->timer_set64(timr, flags, &new_spec64, - &old_spec64); - if (old_setting) - old_spec = itimerspec64_to_itimerspec(old_spec64); - } else { - error = kc->timer_set(timr, flags, &new_spec, rtn); - } + new_spec64 = itimerspec_to_itimerspec64(new_spec); + error = kc->timer_set64(timr, flags, &new_spec64, + &old_spec64); This statement can fit on one line. + if (old_setting) + old_spec = itimerspec64_to_itimerspec(old_spec64); } unlock_timer(timr, flag); @@ -1057,14 +1045,13 @@ SYSCALL_DEFINE2(clock_gettime, const clockid_t, which_clock, if (!kc) return -EINVAL; - if (kc->clock_get64) { - error = kc->clock_get64(which_clock, &kernel_tp64); - kernel_tp = timespec64_to_timespec(kernel_tp64); - } else { - error = kc->clock_get(which_clock, &kernel_tp); - } + error = kc->clock_get64(which_clock, &kernel_tp64); + if (!error) + return error; Wrong test, should be: if (error) ... + + kernel_tp = timespec64_to_timespec(kernel_tp64); - if (!error && copy_to_user(tp, &kernel_tp, sizeof (kernel_tp))) The (!error && ...) was correct here! + if (copy_to_user(tp, &kernel_tp, sizeof (kernel_tp))) error = -EFAULT; return error; You can simplify this like so: return copy_to_user(tp, &kernel_tp, sizeof(kernel_tp)) ?
-EFAULT : 0; @@ -1104,14 +1091,13 @@ SYSCALL_DEFINE2(clock_getres, const clockid_t, which_clock, if (!kc) return -EINVAL; - if (kc->clock_getres64) { - error = kc->clock_getres64(which_clock, &rtn_tp64); - rtn_tp = timespec64_to_timespec(rtn_tp64); - } else { - error = kc->clock_getres(which_clock, &rtn_tp); - } + error = kc->clock_getres64(which_clock, &rtn_tp64); + if (!error) + return error; Also wrong. + + rtn_tp = timespec64_to_timespec(rtn_tp64); - if (!error && tp && copy_to_user(tp, &rtn_tp, sizeof (rtn_tp))) + if (tp && copy_to_user(tp, &rtn_tp, sizeof (rtn_tp))) error = -EFAULT; return error; -- 1.7.9.5 Thanks, Richard Thanks for your comments, I'll fix these mistakes in the next patch series. -- Baolin.wang Best Regards
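Richard's corrections to the clock_gettime path come down to this order: return early only when the clock callback fails, then convert and copy out, mapping a failed copy to -EFAULT. A self-contained userspace sketch of that control flow follows; clock_get64_sketch() and copy_out_sketch() are hypothetical stand-ins for kc->clock_get64() and copy_to_user(), and the copy helper follows copy_to_user()'s convention of returning the number of bytes not copied.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct timespec_sketch   { int32_t tv_sec; long tv_nsec; };
struct timespec64_sketch { int64_t tv_sec; long tv_nsec; };

/* Hypothetical clock callback: only clock id 0 exists. */
static int clock_get64_sketch(int which_clock, struct timespec64_sketch *tp)
{
	if (which_clock != 0)
		return -EINVAL;
	tp->tv_sec = 1000;
	tp->tv_nsec = 500;
	return 0;
}

/* Hypothetical copy_to_user() stand-in: returns the number of bytes NOT
 * copied, so 0 means success; a NULL destination models a bad pointer. */
static size_t copy_out_sketch(void *dst, const void *src, size_t n)
{
	if (dst == NULL)
		return n;
	memcpy(dst, src, n);
	return 0;
}

static int clock_gettime_sketch(int which_clock, struct timespec_sketch *utp)
{
	struct timespec64_sketch kernel_tp64;
	struct timespec_sketch kernel_tp;
	int error = clock_get64_sketch(which_clock, &kernel_tp64);

	if (error)		/* bail out on failure, not on success */
		return error;

	kernel_tp.tv_sec = (int32_t)kernel_tp64.tv_sec;
	kernel_tp.tv_nsec = kernel_tp64.tv_nsec;

	/* The simplification Richard suggests: one return for the copy. */
	return copy_out_sketch(utp, &kernel_tp, sizeof(kernel_tp)) ? -EFAULT : 0;
}
```

The same early-return shape applies to the clock_getres hunk, with the extra `tp &&` guard because a NULL resolution pointer is allowed there.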
[PATCH 00/11] Convert the posix_clock_operations and k_clock structure to ready for 2038
This patch series changes the 32-bit time types (timespec/itimerspec) to the 64-bit ones (timespec64/itimerspec64), since 32-bit time types will break in the year 2038. It introduces new methods with the timespec64/itimerspec64 types and removes the old ones with the timespec/itimerspec types from the posix_clock_operations and k_clock structures. It also introduces some new functions with the timespec64/itimerspec64 types, such as current_kernel_time64(), hrtimer_get_res64(), cputime_to_timespec64() and timespec64_to_cputime(). Baolin Wang (11): linux/time64.h:Introduce the 'struct itimerspec64' for 64bit timekeeping:Introduce the current_kernel_time64() function with timespec64 type time/hrtimer:Introduce hrtimer_get_res64() with timespec64 type for getting the timer resolution posix timers:Introduce the 64bit methods with timespec64 type for k_clock structure time/posix-timers:Convert to the 64bit methods for k_clock callback functions char/mmtimer:Convert to the 64bit methods for k_clock callback function time/alarmtimer:Convert to the new methods for k_clock structure time/posix-clock:Convert to the 64bit methods for k_clock and posix_clock_operations structure cputime:Introduce the cputime_to_timespec64/timespec64_to_cputime function time/posix-cpu-timers:Convert to the 64bit methods for k_clock structure k_clock:Remove the 32bit methods with timespec type arch/powerpc/include/asm/cputime.h|6 +- arch/s390/include/asm/cputime.h |8 +- drivers/char/mmtimer.c| 36 drivers/ptp/ptp_clock.c | 26 ++ include/asm-generic/cputime_jiffies.h | 10 +-- include/linux/cputime.h | 15 include/linux/hrtimer.h | 12 ++- include/linux/jiffies.h |3 + include/linux/posix-clock.h | 10 +-- include/linux/posix-timers.h | 18 ++-- include/linux/time64.h| 13 +++ include/linux/timekeeping.h | 14 ++- kernel/time/alarmtimer.c | 43 - kernel/time/hrtimer.c | 10 +-- kernel/time/posix-clock.c | 20 ++--- kernel/time/posix-cpu-timers.c| 83 + kernel/time/posix-timers.c| 157 +++-- kernel/time/time.c| 21 +
kernel/time/timekeeping.c |6 +- kernel/time/timekeeping.h |2 +- 20 files changed, 302 insertions(+), 211 deletions(-) -- 1.7.9.5
[PATCH 01/11] linux/time64.h:Introduce the 'struct itimerspec64' for 64bit
This patch introduces the 'struct itimerspec64' for 64bit to replace itimerspec, and also introduces the conversion methods: itimerspec64_to_itimerspec() and itimerspec_to_itimerspec64(), that makes itimerspec to ready for 2038 year. Signed-off-by: Baolin Wang baolin.w...@linaro.org --- include/linux/time64.h | 13 + 1 file changed, 13 insertions(+) diff --git a/include/linux/time64.h b/include/linux/time64.h index a383147..3647bdd 100644 --- a/include/linux/time64.h +++ b/include/linux/time64.h @@ -18,6 +18,11 @@ struct timespec64 { }; #endif +struct itimerspec64 { + struct timespec64 it_interval; /* timer period */ + struct timespec64 it_value; /* timer expiration */ +}; + /* Parameters used to convert the timespec values: */ #define MSEC_PER_SEC 1000L #define USEC_PER_MSEC 1000L @@ -187,4 +192,12 @@ static __always_inline void timespec64_add_ns(struct timespec64 *a, u64 ns) #endif +#define itimerspec64_to_itimerspec(its64) \ + ({ (struct itimerspec){ .it_interval = timespec64_to_timespec((its64).it_interval), \ + .it_value = timespec64_to_timespec((its64).it_value) }; }) + +#define itimerspec_to_itimerspec64(its) \ + ({ (struct itimerspec64){ .it_interval = timespec_to_timespec64((its).it_interval), \ + .it_value = timespec_to_timespec64((its).it_value) }; }) + #endif /* _LINUX_TIME64_H */ -- 1.7.9.5
[PATCH 02/11] timekeeping:Introduce the current_kernel_time64() function with timespec64 type
This patch adds the current_kernel_time64() function with the timespec64 type, makes current_kernel_time() 'static inline' and moves it to the timekeeping.h file. This makes it convenient for users to get the current kernel time with the timespec64 type, and makes it easy to delete the current_kernel_time() function from timekeeping.h later. This prepares getting the current time for 2038. Signed-off-by: Baolin Wang baolin.w...@linaro.org --- include/linux/timekeeping.h | 10 +- kernel/time/timekeeping.c |6 +++--- 2 files changed, 12 insertions(+), 4 deletions(-) diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h index 3eaae47..c6d5ae9 100644 --- a/include/linux/timekeeping.h +++ b/include/linux/timekeeping.h @@ -18,10 +18,18 @@ extern int do_sys_settimeofday(const struct timespec *tv, * Kernel time accessors */ unsigned long get_seconds(void); -struct timespec current_kernel_time(void); +struct timespec64 current_kernel_time64(void); /* does not take xtime_lock */ struct timespec __current_kernel_time(void); +static inline struct timespec current_kernel_time(void) +{ + struct timespec64 now; + + now = current_kernel_time64(); + return timespec64_to_timespec(now); +} + /* * timespec based interfaces */ diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 91db941..8ccc02c 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -1721,7 +1721,7 @@ struct timespec __current_kernel_time(void) return timespec64_to_timespec(tk_xtime(tk)); } -struct timespec current_kernel_time(void) +struct timespec64 current_kernel_time64(void) { struct timekeeper *tk = &tk_core.timekeeper; struct timespec64 now; @@ -1733,9 +1733,9 @@ struct timespec current_kernel_time(void) now = tk_xtime(tk); } while (read_seqcount_retry(&tk_core.seq, seq)); - return timespec64_to_timespec(now); + return now; } -EXPORT_SYMBOL(current_kernel_time); +EXPORT_SYMBOL(current_kernel_time64); struct timespec64 get_monotonic_coarse64(void) { -- 1.7.9.5
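The approach in this patch keeps the old 32-bit accessor as a thin static inline around the new 64-bit one, so all the real work lives in a single 64-bit function and the legacy name can later be deleted from one header. A userspace sketch of that shape follows; the names and the fixed clock value are hypothetical, and the real current_kernel_time64() reads the timekeeper inside a seqcount retry loop rather than returning a variable.

```c
#include <stdint.h>

struct timespec_sketch   { int32_t tv_sec; long tv_nsec; };
struct timespec64_sketch { int64_t tv_sec; long tv_nsec; };

/* Hypothetical clock source; a fixed value keeps the sketch deterministic. */
static struct timespec64_sketch fake_xtime = { 1429574400, 0 };

/* 64-bit accessor: the only function that actually reads the clock. */
static struct timespec64_sketch current_time64_sketch(void)
{
	return fake_xtime;
}

/* Legacy 32-bit accessor kept as a thin static inline wrapper, so it can
 * be deleted from one header once all callers use the 64-bit version. */
static inline struct timespec_sketch current_time_sketch(void)
{
	struct timespec64_sketch now = current_time64_sketch();
	struct timespec_sketch ret = { (int32_t)now.tv_sec, now.tv_nsec };

	return ret;
}
```

Once the clock passes INT32_MAX seconds, only the 64-bit accessor still reports the right value; with GCC and Clang the narrowing cast in the 32-bit wrapper wraps to INT32_MIN.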
[PATCH 04/11] posix timers:Introduce the 64bit methods with timespec64 type for k_clock structure
This patch introduces new methods with the timespec64 type for the k_clock structure, converting the timespec type to timespec64 and the itimerspec type to itimerspec64 in the k_clock structure in preparation for the 2038 issue. It also introduces 64bit methods with the timespec64 type for the framework functions. The next step will migrate all the k_clock users to the new methods with the timespec64 and itimerspec64 types; this covers posix-timers.c, mmtimer.c, alarmtimer.c, posix-clock.c and posix-cpu-timers.c. Signed-off-by: Baolin Wang baolin.w...@linaro.org --- include/linux/posix-timers.h |9 ++ kernel/time/posix-timers.c | 65 -- 2 files changed, 59 insertions(+), 15 deletions(-) diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h index 907f3fd..35786c5 100644 --- a/include/linux/posix-timers.h +++ b/include/linux/posix-timers.h @@ -98,9 +98,13 @@ struct k_itimer { struct k_clock { int (*clock_getres) (const clockid_t which_clock, struct timespec *tp); + int (*clock_getres64) (const clockid_t which_clock, struct timespec64 *tp); int (*clock_set) (const clockid_t which_clock, const struct timespec *tp); + int (*clock_set64) (const clockid_t which_clock, + const struct timespec64 *tp); int (*clock_get) (const clockid_t which_clock, struct timespec * tp); + int (*clock_get64) (const clockid_t which_clock, struct timespec64 *tp); int (*clock_adj) (const clockid_t which_clock, struct timex *tx); int (*timer_create) (struct k_itimer *timer); int (*nsleep) (const clockid_t which_clock, int flags, @@ -109,10 +113,15 @@ struct k_clock { int (*timer_set) (struct k_itimer * timr, int flags, struct itimerspec * new_setting, struct itimerspec * old_setting); + int (*timer_set64) (struct k_itimer *timr, int flags, + struct itimerspec64 *new_setting, + struct itimerspec64 *old_setting); int (*timer_del) (struct k_itimer * timr); #define TIMER_RETRY 1 void (*timer_get) (struct k_itimer * timr, struct itimerspec * cur_setting); + void
(*timer_get64) (struct k_itimer *timr, +struct itimerspec64 *cur_setting); }; extern struct k_clock clock_posix_cpu; diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c index 31ea01f..9070387 100644 --- a/kernel/time/posix-timers.c +++ b/kernel/time/posix-timers.c @@ -522,13 +522,13 @@ void posix_timers_register_clock(const clockid_t clock_id, return; } - if (!new_clock->clock_get) { - printk(KERN_WARNING "POSIX clock id %d lacks clock_get()\n", + if (!new_clock->clock_get && !new_clock->clock_get64) { + printk(KERN_WARNING "POSIX clock id %d lacks clock_get() and clock_get64()\n", clock_id); return; } - if (!new_clock->clock_getres) { - printk(KERN_WARNING "POSIX clock id %d lacks clock_getres()\n", + if (!new_clock->clock_getres && !new_clock->clock_getres64) { + printk(KERN_WARNING "POSIX clock id %d lacks clock_getres() and clock_getres64()\n", clock_id); return; } @@ -579,7 +579,7 @@ static struct k_clock *clockid_to_kclock(const clockid_t id) return (id & CLOCKFD_MASK) == CLOCKFD ? &clock_posix_dynamic : &clock_posix_cpu; - if (id >= MAX_CLOCKS || !posix_clocks[id].clock_getres) + if (id >= MAX_CLOCKS || (!posix_clocks[id].clock_getres && !posix_clocks[id].clock_getres64)) return NULL; return &posix_clocks[id]; } @@ -771,6 +771,7 @@ SYSCALL_DEFINE2(timer_gettime, timer_t, timer_id, struct itimerspec __user *, setting) { struct itimerspec cur_setting; + struct itimerspec64 cur_setting64; struct k_itimer *timr; struct k_clock *kc; unsigned long flags; @@ -781,10 +782,16 @@ SYSCALL_DEFINE2(timer_gettime, timer_t, timer_id, return -EINVAL; kc = clockid_to_kclock(timr->it_clock); - if (WARN_ON_ONCE(!kc || !kc->timer_get)) + if (WARN_ON_ONCE(!kc || (!kc->timer_get && !kc->timer_get64))) { ret = -EINVAL; - else - kc->timer_get(timr, &cur_setting); + } else { + if (kc->timer_get64) { + kc->timer_get64(timr, &cur_setting64); + cur_setting = itimerspec64_to_itimerspec(cur_setting64); + } else { + kc->timer_get(timr, &cur_setting); + } + } unlock_timer(timr, flags); @@ -877,6 +884,7 @@
SYSCALL_DEFINE4(timer_settime, timer_t, timer_id, int, flags, { struct
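The dual-callback scheme above can be modelled in userspace to show the dispatch order used in the timer_gettime() hunk: prefer timer_get64(), narrow the result at the 32-bit boundary, and fall back to the legacy timer_get() otherwise. This is a minimal sketch, not kernel code. struct ts32/ts64, struct model_k_clock, model_timer_gettime() and demo_timer_get64() are hypothetical stand-ins, and a 32-bit time_t is modelled explicitly with int32_t.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical userspace stand-ins: a 32-bit time_t is modelled with
 * int32_t, timespec64 with int64_t seconds. */
struct ts32 { int32_t tv_sec; int32_t tv_nsec; };
struct ts64 { int64_t tv_sec; long tv_nsec; };
struct its32 { struct ts32 it_interval, it_value; };
struct its64 { struct ts64 it_interval, it_value; };

/* Hypothetical timer state: interval and remaining time in ns. */
struct model_timer { int64_t interval_ns, value_ns; };

/* Only the two timer_get variants of k_clock are modelled. */
struct model_k_clock {
	void (*timer_get)(struct model_timer *timr, struct its32 *cur);
	void (*timer_get64)(struct model_timer *timr, struct its64 *cur);
};

/* Narrowing done at the 32-bit boundary (cf. itimerspec64_to_itimerspec). */
static struct ts32 narrow_ts(struct ts64 ts)
{
	struct ts32 r = { (int32_t)ts.tv_sec, (int32_t)ts.tv_nsec };
	return r;
}

/* Same order as the timer_gettime() hunk: 64-bit callback first,
 * legacy callback as fallback, -EINVAL if neither exists. */
static int model_timer_gettime(const struct model_k_clock *kc,
			       struct model_timer *timr, struct its32 *out)
{
	if (!kc || (!kc->timer_get && !kc->timer_get64))
		return -EINVAL;
	if (kc->timer_get64) {
		struct its64 cur64;

		kc->timer_get64(timr, &cur64);
		out->it_interval = narrow_ts(cur64.it_interval);
		out->it_value = narrow_ts(cur64.it_value);
	} else {
		kc->timer_get(timr, out);
	}
	return 0;
}

/* Example driver that implements only the 64-bit callback. */
static void demo_timer_get64(struct model_timer *timr, struct its64 *cur)
{
	cur->it_interval.tv_sec = timr->interval_ns / NSEC_PER_SEC;
	cur->it_interval.tv_nsec = (long)(timr->interval_ns % NSEC_PER_SEC);
	cur->it_value.tv_sec = timr->value_ns / NSEC_PER_SEC;
	cur->it_value.tv_nsec = (long)(timr->value_ns % NSEC_PER_SEC);
}
```

Converted drivers need only provide the 64-bit callback. The narrowing happens in exactly one place, so removing the legacy path later does not touch any driver.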
[PATCH 03/11] time/hrtimer:Introduce hrtimer_get_res64() with timespec64 type for getting the timer resolution
This patch introduces the hrtimer_get_res64() function, which gets the timer resolution with the timespec64 type. It also moves hrtimer_get_res() into include/linux/hrtimer.h as a 'static inline' helper that just calls hrtimer_get_res64(). Getting the timer resolution through hrtimer_get_res64() with the timespec64 type is safe for the year 2038. Keeping the old hrtimer_get_res() as an inline wrapper in hrtimer.h makes it easy to delete later. Signed-off-by: Baolin Wang baolin.w...@linaro.org --- include/linux/hrtimer.h | 12 +++- kernel/time/hrtimer.c | 10 +- 2 files changed, 16 insertions(+), 6 deletions(-) diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h index 05f6df1..ee8ed44 100644 --- a/include/linux/hrtimer.h +++ b/include/linux/hrtimer.h @@ -383,7 +383,17 @@ static inline int hrtimer_restart(struct hrtimer *timer) /* Query timers: */ extern ktime_t hrtimer_get_remaining(const struct hrtimer *timer); -extern int hrtimer_get_res(const clockid_t which_clock, struct timespec *tp); +extern int hrtimer_get_res64(const clockid_t which_clock, +struct timespec64 *tp); + +static inline int hrtimer_get_res(const clockid_t which_clock, + struct timespec *tp) +{ + struct timespec64 *ts64; + + *ts64 = timespec_to_timespec64(*tp); + return hrtimer_get_res64(which_clock, ts64); +} extern ktime_t hrtimer_get_next_event(void); diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index bee0c1f..508d936 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -1175,24 +1175,24 @@ void hrtimer_init(struct hrtimer *timer, clockid_t clock_id, EXPORT_SYMBOL_GPL(hrtimer_init); /** - * hrtimer_get_res - get the timer resolution for a clock + * hrtimer_get_res64 - get the timer resolution for a clock * @which_clock: which clock to query - * @tp: pointer to timespec variable to store the resolution + * @tp: pointer to timespec64 variable to store the resolution * * Store the resolution of the clock selected by @which_clock in the * variable pointed to by @tp.
*/ -int hrtimer_get_res(const clockid_t which_clock, struct timespec *tp) +int hrtimer_get_res64(const clockid_t which_clock, struct timespec64 *tp) { struct hrtimer_cpu_base *cpu_base; int base = hrtimer_clockid_to_base(which_clock); cpu_base = raw_cpu_ptr(&hrtimer_bases); - *tp = ktime_to_timespec(cpu_base->clock_base[base].resolution); + *tp = ktime_to_timespec64(cpu_base->clock_base[base].resolution); return 0; } -EXPORT_SYMBOL_GPL(hrtimer_get_res); +EXPORT_SYMBOL_GPL(hrtimer_get_res64); static void __run_hrtimer(struct hrtimer *timer, ktime_t *now) { -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
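As posted, the inline hrtimer_get_res() wrapper declares `struct timespec64 *ts64` and writes through it without pointing it at any storage. It also converts the caller's input `*tp` rather than the result. A compat wrapper normally keeps the 64-bit value on the stack and narrows it after the call. In kernel terms that would presumably be `hrtimer_get_res64(which_clock, &ts64)` followed by `*tp = timespec64_to_timespec(ts64)`. The userspace sketch below shows that shape. model_get_res64() and model_get_res() are hypothetical names, and model_get_res64() simply reports a 1 ns resolution.

```c
#include <stdint.h>

/* Hypothetical stand-ins for a 32-bit timespec and for timespec64. */
struct ts32 { int32_t tv_sec; int32_t tv_nsec; };
struct ts64 { int64_t tv_sec; long tv_nsec; };

/* Stand-in for hrtimer_get_res64(): always reports 1 ns resolution. */
static int model_get_res64(int which_clock, struct ts64 *tp)
{
	(void)which_clock;
	tp->tv_sec = 0;
	tp->tv_nsec = 1;
	return 0;
}

/* Compat wrapper: the 64-bit result lives in a local variable and is
 * narrowed into the caller's structure only after the call succeeds. */
static int model_get_res(int which_clock, struct ts32 *tp)
{
	struct ts64 res64;
	int ret = model_get_res64(which_clock, &res64);

	if (ret == 0) {
		tp->tv_sec = (int32_t)res64.tv_sec;
		tp->tv_nsec = (int32_t)res64.tv_nsec;
	}
	return ret;
}
```

The same shape applies to any "old API calls new 64-bit API" wrapper: the output parameter is filled from a local 64-bit value, never the other way round.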
[PATCH 05/11] time/posix-timers:Convert to the 64bit methods for k_clock callback functions
This patch converts the timespec type to the timespec64 type, and the itimerspec type to the itimerspec64 type, for the k_clock callback functions. It also converts timekeeping_clocktai(), which is used only in posix-timers.c, from the timespec type to the timespec64 type. Signed-off-by: Baolin Wang baolin.w...@linaro.org --- include/linux/timekeeping.h |4 +- kernel/time/posix-timers.c | 102 +++ kernel/time/timekeeping.h |2 +- 3 files changed, 57 insertions(+), 51 deletions(-) diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h index c6d5ae9..bd3df93 100644 --- a/include/linux/timekeeping.h +++ b/include/linux/timekeeping.h @@ -242,9 +242,9 @@ static inline void get_monotonic_boottime64(struct timespec64 *ts) *ts = ktime_to_timespec64(ktime_get_boottime()); } -static inline void timekeeping_clocktai(struct timespec *ts) +static inline void timekeeping_clocktai(struct timespec64 *ts) { - *ts = ktime_to_timespec(ktime_get_clocktai()); + *ts = ktime_to_timespec64(ktime_get_clocktai()); } /* diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c index 9070387..47d1abf 100644 --- a/kernel/time/posix-timers.c +++ b/kernel/time/posix-timers.c @@ -132,9 +132,9 @@ static struct k_clock posix_clocks[MAX_CLOCKS]; static int common_nsleep(const clockid_t, int flags, struct timespec *t, struct timespec __user *rmtp); static int common_timer_create(struct k_itimer *new_timer); -static void common_timer_get(struct k_itimer *, struct itimerspec *); +static void common_timer_get(struct k_itimer *, struct itimerspec64 *); static int common_timer_set(struct k_itimer *, int, - struct itimerspec *, struct itimerspec *); + struct itimerspec64 *, struct itimerspec64 *); static int common_timer_del(struct k_itimer *timer); static enum hrtimer_restart posix_timer_fn(struct hrtimer *data); @@ -203,17 +203,20 @@ static inline void unlock_timer(struct k_itimer *timr, unsigned long flags) } /* Get clock_realtime */ -static int 
posix_clock_realtime_get(clockid_t which_clock, struct timespec *tp) +static int posix_clock_realtime_get(clockid_t which_clock, + struct timespec64 *tp) { - ktime_get_real_ts(tp); + ktime_get_real_ts64(tp); return 0; } /* Set clock_realtime */ static int posix_clock_realtime_set(const clockid_t which_clock, - const struct timespec *tp) + const struct timespec64 *tp) { - return do_sys_settimeofday(tp, NULL); + struct timespec ts = timespec64_to_timespec(*tp); + + return do_sys_settimeofday(&ts, NULL); } static int posix_clock_realtime_adj(const clockid_t which_clock, @@ -225,48 +228,51 @@ static int posix_clock_realtime_adj(const clockid_t which_clock, /* * Get monotonic time for posix timers */ -static int posix_ktime_get_ts(clockid_t which_clock, struct timespec *tp) +static int posix_ktime_get_ts(clockid_t which_clock, struct timespec64 *tp) { - ktime_get_ts(tp); + ktime_get_ts64(tp); return 0; } /* * Get monotonic-raw time for posix timers */ -static int posix_get_monotonic_raw(clockid_t which_clock, struct timespec *tp) +static int posix_get_monotonic_raw(clockid_t which_clock, struct timespec64 *tp) { - getrawmonotonic(tp); + getrawmonotonic64(tp); return 0; } -static int posix_get_realtime_coarse(clockid_t which_clock, struct timespec *tp) +static int posix_get_realtime_coarse(clockid_t which_clock, +struct timespec64 *tp) { - *tp = current_kernel_time(); + *tp = current_kernel_time64(); return 0; } static int posix_get_monotonic_coarse(clockid_t which_clock, - struct timespec *tp) + struct timespec64 *tp) { - *tp = get_monotonic_coarse(); + *tp = get_monotonic_coarse64(); return 0; } -static int posix_get_coarse_res(const clockid_t which_clock, struct timespec *tp) +static int posix_get_coarse_res(const clockid_t which_clock, + struct timespec64 *tp) { - *tp = ktime_to_timespec(KTIME_LOW_RES); + *tp = ktime_to_timespec64(KTIME_LOW_RES); return 0; } -static int 
posix_get_boottime(const clockid_t which_clock, + struct timespec64 *tp) { - get_monotonic_boottime(tp); + get_monotonic_boottime64(tp); return 0; } -static int posix_get_tai(clockid_t which_clock, struct timespec *tp) +static int posix_get_tai(clockid_t which_clock, struct timespec64 *tp) { timekeeping_clocktai(tp); return 0; @@ -278,57
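posix_clock_realtime_set() above still narrows the timespec64 with timespec64_to_timespec() before calling do_sys_settimeofday(). On a 32-bit kernel, where timespec's tv_sec is a 32-bit long, that narrowing is where the year 2038 limit appears: the last representable second is 2038-01-19 03:14:07 UTC. The following userspace C sketch checks that arithmetic. days_from_civil() is Howard Hinnant's public-domain date algorithm. The modulo-2^32 narrowing in narrow_to_32() assumes a two's-complement target, as with GCC and Clang.

```c
#include <stdint.h>

/* Days since 1970-01-01 for a proleptic Gregorian date
 * (Howard Hinnant's days_from_civil algorithm). */
static int64_t days_from_civil(int64_t y, unsigned m, unsigned d)
{
	y -= m <= 2;
	int64_t era = (y >= 0 ? y : y - 399) / 400;
	unsigned yoe = (unsigned)(y - era * 400);                       /* [0, 399] */
	unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  /* [0, 365] */
	unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            /* [0, 146096] */

	return era * 146097 + (int64_t)doe - 719468;
}

/* Seconds since the Unix epoch (UTC), held in 64 bits as in timespec64. */
static int64_t epoch_seconds(int64_t y, unsigned mo, unsigned d,
			     unsigned h, unsigned mi, unsigned s)
{
	return days_from_civil(y, mo, d) * 86400 + h * 3600 + mi * 60 + s;
}

/* What a 32-bit tv_sec keeps after narrowing, reduced modulo 2^32. */
static int32_t narrow_to_32(int64_t sec)
{
	return (int32_t)(uint32_t)sec;
}
```

One second after 03:14:07, the 64-bit count simply continues. The narrowed 32-bit value wraps to INT32_MIN, a date in December 1901, which is why the remaining timespec conversions have to go.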
[PATCH 06/11] char/mmtimer:Convert to the 64bit methods for k_clock callback function
This patch converts the k_clock callback functions to the 64-bit methods, changing the timespec type to timespec64 and the itimerspec type to itimerspec64. Signed-off-by: Baolin Wang baolin.w...@linaro.org --- drivers/char/mmtimer.c | 36 +--- 1 file changed, 17 insertions(+), 19 deletions(-) diff --git a/drivers/char/mmtimer.c b/drivers/char/mmtimer.c index 3d6c067..213d0bb 100644 --- a/drivers/char/mmtimer.c +++ b/drivers/char/mmtimer.c @@ -478,18 +478,18 @@ static int sgi_clock_period; static struct timespec sgi_clock_offset; static int sgi_clock_period; -static int sgi_clock_get(clockid_t clockid, struct timespec *tp) +static int sgi_clock_get(clockid_t clockid, struct timespec64 *tp) { u64 nsec; nsec = rtc_time() * sgi_clock_period + sgi_clock_offset.tv_nsec; - *tp = ns_to_timespec(nsec); + *tp = ns_to_timespec64(nsec); tp->tv_sec += sgi_clock_offset.tv_sec; return 0; }; -static int sgi_clock_set(const clockid_t clockid, const struct timespec *tp) +static int sgi_clock_set(const clockid_t clockid, const struct timespec64 *tp) { u64 nsec; @@ -657,7 +657,7 @@ static int sgi_timer_del(struct k_itimer *timr) } /* Assumption: it_lock is already held with irq's disabled */ -static void sgi_timer_get(struct k_itimer *timr, struct itimerspec *cur_setting) +static void sgi_timer_get(struct k_itimer *timr, struct itimerspec64 *cur_setting) { if (timr->it.mmtimer.clock == TIMER_OFF) { @@ -668,14 +668,14 @@ static void sgi_timer_get(struct k_itimer *timr, struct itimerspec *cur_setting) return; } - cur_setting->it_interval = ns_to_timespec(timr->it.mmtimer.incr * sgi_clock_period); - cur_setting->it_value = ns_to_timespec((timr->it.mmtimer.expires - rtc_time()) * sgi_clock_period); + cur_setting->it_interval = ns_to_timespec64(timr->it.mmtimer.incr * sgi_clock_period); + cur_setting->it_value = ns_to_timespec64((timr->it.mmtimer.expires - rtc_time()) * sgi_clock_period); } static int sgi_timer_set(struct k_itimer *timr, int flags, - struct itimerspec * 
new_setting, - struct itimerspec * old_setting) + struct itimerspec64 *new_setting, + struct itimerspec64 *old_setting) { unsigned long when, period, irqflags; int err = 0; @@ -687,8 +687,8 @@ static int sgi_timer_set(struct k_itimer *timr, int flags, sgi_timer_get(timr, old_setting); sgi_timer_del(timr); - when = timespec_to_ns(&new_setting->it_value); - period = timespec_to_ns(&new_setting->it_interval); + when = timespec64_to_ns(&new_setting->it_value); + period = timespec64_to_ns(&new_setting->it_interval); if (when == 0) /* Clear timer */ @@ -699,11 +699,9 @@ static int sgi_timer_set(struct k_itimer *timr, int flags, return -ENOMEM; if (flags & TIMER_ABSTIME) { - struct timespec n; unsigned long now; - getnstimeofday(&n); - now = timespec_to_ns(&n); + now = ktime_get_real_ns(); if (when > now) when -= now; else @@ -765,7 +763,7 @@ static int sgi_timer_set(struct k_itimer *timr, int flags, return err; } -static int sgi_clock_getres(const clockid_t which_clock, struct timespec *tp) +static int sgi_clock_getres(const clockid_t which_clock, struct timespec64 *tp) { tp->tv_sec = 0; tp->tv_nsec = sgi_clock_period; @@ -773,13 +771,13 @@ static int sgi_clock_getres(const clockid_t which_clock, struct timespec *tp) } static struct k_clock sgi_clock = { - .clock_set = sgi_clock_set, - .clock_get = sgi_clock_get, - .clock_getres = sgi_clock_getres, + .clock_set64 = sgi_clock_set, + .clock_get64 = sgi_clock_get, + .clock_getres64 = sgi_clock_getres, .timer_create = sgi_timer_create, - .timer_set = sgi_timer_set, + .timer_set64 = sgi_timer_set, .timer_del = sgi_timer_del, - .timer_get = sgi_timer_get + .timer_get64 = sgi_timer_get }; /** -- 1.7.9.5
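sgi_clock_get() builds its result in two steps. It converts a nanosecond count with ns_to_timespec64(), then adds the whole seconds of sgi_clock_offset. The sketch below models both steps in userspace. model_ns_to_ts64() follows the normalisation that ns_to_timespec64() performs: tv_nsec is kept in [0, NSEC_PER_SEC) and negative inputs are rounded toward minus infinity. The tick count and period passed to model_sgi_clock_get() are hypothetical stand-ins for rtc_time() and sgi_clock_period.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

struct ts64 { int64_t tv_sec; long tv_nsec; };  /* timespec64 stand-in */

/* Split a signed nanosecond count into seconds plus a normalised
 * nanosecond field in [0, NSEC_PER_SEC). */
static struct ts64 model_ns_to_ts64(int64_t nsec)
{
	struct ts64 ts;
	int64_t rem = nsec % NSEC_PER_SEC;

	ts.tv_sec = nsec / NSEC_PER_SEC;
	if (rem < 0) {		/* C division truncates toward zero */
		ts.tv_sec--;
		rem += NSEC_PER_SEC;
	}
	ts.tv_nsec = (long)rem;
	return ts;
}

/* sgi_clock_get()-style composition: ticks * period plus the offset's
 * nanoseconds, converted, then the offset's whole seconds added. */
static struct ts64 model_sgi_clock_get(uint64_t ticks, int period_ns,
				       struct ts64 offset)
{
	uint64_t nsec = ticks * (uint64_t)period_ns + (uint64_t)offset.tv_nsec;
	struct ts64 ts = model_ns_to_ts64((int64_t)nsec);

	ts.tv_sec += offset.tv_sec;
	return ts;
}
```

Keeping the offset's seconds out of the nanosecond sum avoids multiplying a large second count by NSEC_PER_SEC. The offset's nanoseconds can still carry into tv_sec through the normalisation.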
[PATCH 07/11] time/alarmtimer:Convert to the new methods for k_clock structure
This patch switches alarmtimer.c to the new timespec64/itimerspec64 methods of the k_clock structure, converting the timespec/itimerspec types to timespec64/itimerspec64. Signed-off-by: Baolin Wang baolin.w...@linaro.org --- kernel/time/alarmtimer.c | 43 ++- 1 file changed, 22 insertions(+), 21 deletions(-) diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c index 1b001ed..68186e1 100644 --- a/kernel/time/alarmtimer.c +++ b/kernel/time/alarmtimer.c @@ -489,35 +489,36 @@ static enum alarmtimer_restart alarm_handle_timer(struct alarm *alarm, /** * alarm_clock_getres - posix getres interface * @which_clock: clockid - * @tp: timespec to fill + * @tp: timespec64 to fill * * Returns the granularity of underlying alarm base clock */ -static int alarm_clock_getres(const clockid_t which_clock, struct timespec *tp) +static int alarm_clock_getres(const clockid_t which_clock, + struct timespec64 *tp) { clockid_t baseid = alarm_bases[clock2alarm(which_clock)].base_clockid; if (!alarmtimer_get_rtcdev()) return -EINVAL; - return hrtimer_get_res(baseid, tp); + return hrtimer_get_res64(baseid, tp); } /** * alarm_clock_get - posix clock_get interface * @which_clock: clockid - * @tp: timespec to fill. + * @tp: timespec64 to fill. * * Provides the underlying alarm base time.
*/ -static int alarm_clock_get(clockid_t which_clock, struct timespec *tp) +static int alarm_clock_get(clockid_t which_clock, struct timespec64 *tp) { struct alarm_base *base = &alarm_bases[clock2alarm(which_clock)]; if (!alarmtimer_get_rtcdev()) return -EINVAL; - *tp = ktime_to_timespec(base->gettime()); + *tp = ktime_to_timespec64(base->gettime()); return 0; } @@ -547,24 +548,24 @@ static int alarm_timer_create(struct k_itimer *new_timer) /** * alarm_timer_get - posix timer_get interface * @new_timer: k_itimer pointer - * @cur_setting: itimerspec data to fill + * @cur_setting: itimerspec64 data to fill * * Copies out the current itimerspec data */ static void alarm_timer_get(struct k_itimer *timr, - struct itimerspec *cur_setting) + struct itimerspec64 *cur_setting) { ktime_t relative_expiry_time = alarm_expires_remaining(&(timr->it.alarm.alarmtimer)); if (ktime_to_ns(relative_expiry_time) > 0) { - cur_setting->it_value = ktime_to_timespec(relative_expiry_time); + cur_setting->it_value = ktime_to_timespec64(relative_expiry_time); } else { cur_setting->it_value.tv_sec = 0; cur_setting->it_value.tv_nsec = 0; } - cur_setting->it_interval = ktime_to_timespec(timr->it.alarm.interval); + cur_setting->it_interval = ktime_to_timespec64(timr->it.alarm.interval); } /** @@ -588,14 +589,14 @@ static int alarm_timer_del(struct k_itimer *timr) * alarm_timer_set - posix timer_set interface * @timr: k_itimer pointer to be deleted * @flags: timer flags - * @new_setting: itimerspec to be used - * @old_setting: itimerspec being replaced + * @new_setting: itimerspec64 to be used + * @old_setting: itimerspec64 being replaced * * Sets the timer to new_setting, and starts the timer.
*/ static int alarm_timer_set(struct k_itimer *timr, int flags, - struct itimerspec *new_setting, - struct itimerspec *old_setting) + struct itimerspec64 *new_setting, + struct itimerspec64 *old_setting) { ktime_t exp; @@ -613,8 +614,8 @@ static int alarm_timer_set(struct k_itimer *timr, int flags, return TIMER_RETRY; /* start the timer */ - timr->it.alarm.interval = timespec_to_ktime(new_setting->it_interval); - exp = timespec_to_ktime(new_setting->it_value); + timr->it.alarm.interval = timespec64_to_ktime(new_setting->it_interval); + exp = timespec64_to_ktime(new_setting->it_value); /* Convert (if necessary) to absolute time */ if (flags != TIMER_ABSTIME) { ktime_t now; @@ -670,7 +671,7 @@ static int alarmtimer_do_nsleep(struct alarm *alarm, ktime_t absexp) /** - * update_rmtp - Update remaining timespec value + * update_rmtp - Update remaining timespec64 value * @exp: expiration time * @type: timer type * @rmtp: user pointer to remaining timepsec value @@ -824,12 +825,12 @@ static int __init alarmtimer_init(void) int error = 0; int i; struct k_clock alarm_clock = { - .clock_getres = alarm_clock_getres, - .clock_get = alarm_clock_get, + .clock_getres64 = alarm_clock_getres, + .clock_get64 = alarm_clock_get, .timer_create
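alarm_timer_get() reports the time left until expiry. It clamps a non-positive remainder to zero rather than returning a negative timespec64. Below is a minimal userspace sketch of that rule, assuming ktime_t is modelled as a signed 64-bit nanosecond count. model_ktime_t, model_ktime_to_ts64() and model_alarm_remaining() are hypothetical names.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

typedef int64_t model_ktime_t;                 /* ktime_t stand-in: signed ns */
struct ts64 { int64_t tv_sec; long tv_nsec; }; /* timespec64 stand-in */

/* ktime_to_timespec64()-style split with tv_nsec kept non-negative. */
static struct ts64 model_ktime_to_ts64(model_ktime_t kt)
{
	struct ts64 ts = { kt / NSEC_PER_SEC, (long)(kt % NSEC_PER_SEC) };

	if (ts.tv_nsec < 0) {
		ts.tv_sec--;
		ts.tv_nsec += NSEC_PER_SEC;
	}
	return ts;
}

/* alarm_timer_get()-style: time left until expiry, or zero if the
 * alarm is already due, so a negative value is never reported. */
static struct ts64 model_alarm_remaining(model_ktime_t expires,
					 model_ktime_t now)
{
	model_ktime_t rel = expires - now;
	struct ts64 zero = { 0, 0 };

	return rel > 0 ? model_ktime_to_ts64(rel) : zero;
}
```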
[PATCH 09/11] cputime:Introduce the cputime_to_timespec64/timespec64_to_cputime function
This patch introduces functions for converting cputime to timespec64 and back, replacing the timespec type with the timespec64 type, including for the arch/s390 and arch/powerpc architectures. These new functions will replace the old cputime_to_timespec()/timespec_to_cputime() functions to prepare for the year 2038 problem. The cputime_to_timespec()/timespec_to_cputime() functions are moved to include/linux/cputime.h so that they can easily be removed later. Signed-off-by: Baolin Wang baolin.w...@linaro.org --- arch/powerpc/include/asm/cputime.h|6 +++--- arch/s390/include/asm/cputime.h |8 include/asm-generic/cputime_jiffies.h | 10 +- include/linux/cputime.h | 15 +++ include/linux/jiffies.h |3 +++ kernel/time/time.c| 21 + 6 files changed, 51 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h index e245255..5dda5c0 100644 --- a/arch/powerpc/include/asm/cputime.h +++ b/arch/powerpc/include/asm/cputime.h @@ -154,9 +154,9 @@ static inline cputime_t secs_to_cputime(const unsigned long sec) } /* - * Convert cputime <-> timespec + * Convert cputime <-> timespec64 */ -static inline void cputime_to_timespec(const cputime_t ct, struct timespec *p) +static inline void cputime_to_timespec64(const cputime_t ct, struct timespec64 *p) { u64 x = (__force u64) ct; unsigned int frac; @@ -168,7 +168,7 @@ static inline void cputime_to_timespec(const cputime_t ct, struct timespec *p) p->tv_nsec = x; } -static inline cputime_t timespec_to_cputime(const struct timespec *p) +static inline cputime_t timespec64_to_cputime(const struct timespec64 *p) { u64 ct; diff --git a/arch/s390/include/asm/cputime.h b/arch/s390/include/asm/cputime.h index b91e960..1266697 100644 --- a/arch/s390/include/asm/cputime.h +++ b/arch/s390/include/asm/cputime.h @@ -89,16 +89,16 @@ static inline cputime_t secs_to_cputime(const unsigned int s) } /* - * Convert cputime to timespec and back. + * Convert cputime to timespec64 and back.
*/ -static inline cputime_t timespec_to_cputime(const struct timespec *value) +static inline cputime_t timespec64_to_cputime(const struct timespec64 *value) { unsigned long long ret = value->tv_sec * CPUTIME_PER_SEC; return (__force cputime_t)(ret + __div(value->tv_nsec * CPUTIME_PER_USEC, NSEC_PER_USEC)); } -static inline void cputime_to_timespec(const cputime_t cputime, - struct timespec *value) +static inline void cputime_to_timespec64(const cputime_t cputime, + struct timespec64 *value) { unsigned long long __cputime = (__force unsigned long long) cputime; #ifndef CONFIG_64BIT diff --git a/include/asm-generic/cputime_jiffies.h b/include/asm-generic/cputime_jiffies.h index fe386fc..ec77c0b 100644 --- a/include/asm-generic/cputime_jiffies.h +++ b/include/asm-generic/cputime_jiffies.h @@ -44,12 +44,12 @@ typedef u64 __nocast cputime64_t; #define secs_to_cputime(sec) jiffies_to_cputime((sec) * HZ) /* - * Convert cputime to timespec and back. + * Convert cputime to timespec64 and back. */ -#define timespec_to_cputime(__val) \ - jiffies_to_cputime(timespec_to_jiffies(__val)) -#define cputime_to_timespec(__ct,__val)\ - jiffies_to_timespec(cputime_to_jiffies(__ct),__val) +#define timespec64_to_cputime(__val) \ + jiffies_to_cputime(timespec64_to_jiffies(__val)) +#define cputime_to_timespec64(__ct,__val) \ + jiffies_to_timespec64(cputime_to_jiffies(__ct),__val) /* * Convert cputime to timeval and back.
diff --git a/include/linux/cputime.h b/include/linux/cputime.h index f2eb2ee..f01896f 100644 --- a/include/linux/cputime.h +++ b/include/linux/cputime.h @@ -13,4 +13,19 @@ usecs_to_cputime((__nsecs) / NSEC_PER_USEC) #endif +static inline cputime_t timespec_to_cputime(const struct timespec *ts) +{ + struct timespec64 ts64 = timespec_to_timespec64(*ts); + return timespec64_to_cputime(&ts64); +} + +static inline void cputime_to_timespec(const cputime_t cputime, + struct timespec *value) +{ + struct timespec64 *ts64; + + *ts64 = timespec_to_timespec64(*value); + cputime_to_timespec64(cputime, ts64); +} + #endif /* __LINUX_CPUTIME_H */ diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h index c367cbd..dbaa4ee 100644 --- a/include/linux/jiffies.h +++ b/include/linux/jiffies.h @@ -293,6 +293,9 @@ extern unsigned long usecs_to_jiffies(const unsigned int u); extern unsigned long timespec_to_jiffies(const struct timespec *value); extern void jiffies_to_timespec(const unsigned long jiffies, struct timespec *value); +extern unsigned long timespec64_to_jiffies(const struct timespec64 *value); +extern void jiffies_to_timespec64(const
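The s390 hunk converts a timespec64 to cputime as `tv_sec * CPUTIME_PER_SEC` plus the nanoseconds scaled by CPUTIME_PER_USEC / NSEC_PER_USEC. The userspace sketch below models that conversion and an inverse written as its mirror image. It assumes CPUTIME_PER_USEC is 4096, the s390 CPU-timer granularity of 2^-12 microseconds, which the patch does not show. The function names are hypothetical. Because both directions truncate, a round trip can lose a nanosecond.

```c
#include <stdint.h>

#define NSEC_PER_USEC 1000ULL
#define USEC_PER_SEC 1000000ULL
/* Assumption: s390 CPU-timer units of 2^-12 microseconds. */
#define CPUTIME_PER_USEC 4096ULL
#define CPUTIME_PER_SEC (CPUTIME_PER_USEC * USEC_PER_SEC)

struct ts64 { int64_t tv_sec; long tv_nsec; };  /* timespec64 stand-in */

/* Same arithmetic as the s390 timespec64_to_cputime() hunk. */
static uint64_t model_ts64_to_cputime(const struct ts64 *v)
{
	uint64_t ret = (uint64_t)v->tv_sec * CPUTIME_PER_SEC;

	return ret + (uint64_t)v->tv_nsec * CPUTIME_PER_USEC / NSEC_PER_USEC;
}

/* Inverse direction, modelled as the mirror image of the above:
 * whole seconds first, then the remainder scaled back to ns. */
static struct ts64 model_cputime_to_ts64(uint64_t ct)
{
	struct ts64 v;

	v.tv_sec = (int64_t)(ct / CPUTIME_PER_SEC);
	v.tv_nsec = (long)(ct % CPUTIME_PER_SEC * NSEC_PER_USEC / CPUTIME_PER_USEC);
	return v;
}
```

For example, 2 s + 123456789 ns comes back as 2 s + 123456788 ns, because each direction rounds down.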
[PATCH 08/11] time/posix-clock:Convert to the 64bit methods for k_clock and posix_clock_operations structure
This patch converts the posix clock operations over to the new methods with the timespec64/itimerspec64 types to make them ready for 2038; it is based on the ptp patch series. It also switches to the 64-bit methods of the k_clock structure, converting the timespec/itimerspec types to timespec64/itimerspec64. Signed-off-by: Baolin Wang baolin.w...@linaro.org --- drivers/ptp/ptp_clock.c | 26 -- include/linux/posix-clock.h | 10 +- kernel/time/posix-clock.c | 20 ++-- 3 files changed, 23 insertions(+), 33 deletions(-) diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c index bee8270..8c086e7 100644 --- a/drivers/ptp/ptp_clock.c +++ b/drivers/ptp/ptp_clock.c @@ -97,32 +97,24 @@ static s32 scaled_ppm_to_ppb(long ppm) /* posix clock implementation */ -static int ptp_clock_getres(struct posix_clock *pc, struct timespec *tp) +static int ptp_clock_getres(struct posix_clock *pc, struct timespec64 *tp) { tp->tv_sec = 0; tp->tv_nsec = 1; return 0; } -static int ptp_clock_settime(struct posix_clock *pc, const struct timespec *tp) +static int ptp_clock_settime(struct posix_clock *pc, + const struct timespec64 *tp) { struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock); - struct timespec64 ts = timespec_to_timespec64(*tp); - - return ptp->info->settime64(ptp->info, &ts); + return ptp->info->settime64(ptp->info, tp); } -static int ptp_clock_gettime(struct posix_clock *pc, struct timespec *tp) +static int ptp_clock_gettime(struct posix_clock *pc, struct timespec64 *tp) { struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock); - struct timespec64 ts; - int err; - - err = ptp->info->gettime64(ptp->info, &ts); - if (!err) - *tp = timespec64_to_timespec(ts); - - return err; + return ptp->info->gettime64(ptp->info, tp); } static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx) @@ -134,8 +126,7 @@ static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx) ops = ptp->info; if (tx->modes & ADJ_SETOFFSET) { - struct timespec ts; - ktime_t kt; + 
struct timespec64 ts; s64 delta; ts.tv_sec = tx->time.tv_sec; @@ -147,8 +138,7 @@ static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx) if ((unsigned long) ts.tv_nsec >= NSEC_PER_SEC) return -EINVAL; - kt = timespec_to_ktime(ts); - delta = ktime_to_ns(kt); + delta = timespec64_to_ns(&ts); err = ops->adjtime(ops, delta); } else if (tx->modes & ADJ_FREQUENCY) { s32 ppb = scaled_ppm_to_ppb(tx->freq); diff --git a/include/linux/posix-clock.h b/include/linux/posix-clock.h index 34c4498..fd7e22c 100644 --- a/include/linux/posix-clock.h +++ b/include/linux/posix-clock.h @@ -59,23 +59,23 @@ struct posix_clock_operations { int (*clock_adjtime)(struct posix_clock *pc, struct timex *tx); - int (*clock_gettime)(struct posix_clock *pc, struct timespec *ts); + int (*clock_gettime)(struct posix_clock *pc, struct timespec64 *ts); - int (*clock_getres) (struct posix_clock *pc, struct timespec *ts); + int (*clock_getres)(struct posix_clock *pc, struct timespec64 *ts); int (*clock_settime)(struct posix_clock *pc, - const struct timespec *ts); + const struct timespec64 *ts); int (*timer_create) (struct posix_clock *pc, struct k_itimer *kit); int (*timer_delete) (struct posix_clock *pc, struct k_itimer *kit); void (*timer_gettime)(struct posix_clock *pc, - struct k_itimer *kit, struct itimerspec *tsp); + struct k_itimer *kit, struct itimerspec64 *tsp); int (*timer_settime)(struct posix_clock *pc, struct k_itimer *kit, int flags, - struct itimerspec *tsp, struct itimerspec *old); + struct itimerspec64 *tsp, struct itimerspec64 *old); /* * Optional character device methods: */ diff --git a/kernel/time/posix-clock.c b/kernel/time/posix-clock.c index ce033c7..e21e4c1 100644 --- a/kernel/time/posix-clock.c +++ b/kernel/time/posix-clock.c @@ -297,7 +297,7 @@ out: return err; } -static int pc_clock_gettime(clockid_t id, struct timespec *ts) +static int pc_clock_gettime(clockid_t id, struct timespec64 *ts) { struct posix_clock_desc cd; int err; @@ -316,7 +316,7 @@ static 
int 
pc_clock_gettime(clockid_t id, struct timespec *ts) return err; } -static int pc_clock_getres(clockid_t id, struct timespec *ts) +static int pc_clock_getres(clockid_t id, struct timespec64 *ts
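For ADJ_SETOFFSET, ptp_clock_adjtime() builds a timespec64 from the timex offset. It rejects an out-of-range nanosecond field with a single unsigned comparison, then passes the offset to ->adjtime() as a signed nanosecond delta. The sketch below models that path in userspace. The parameter names are hypothetical, and the ADJ_NANO handling (a microsecond fraction scaled by 1000 when the flag is clear) follows the surrounding kernel function rather than anything shown in the hunk.

```c
#include <errno.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000L

struct ts64 { int64_t tv_sec; long tv_nsec; };  /* timespec64 stand-in */

/* 'sec' and 'frac' stand in for tx->time.tv_sec and tx->time.tv_usec;
 * 'nano' stands in for the ADJ_NANO mode bit. */
static int model_setoffset_delta(int64_t sec, long frac, int nano,
				 int64_t *delta)
{
	struct ts64 ts = { sec, frac };

	if (!nano)
		ts.tv_nsec *= 1000;	/* fraction was given in microseconds */

	/* One unsigned comparison rejects both negative values and values
	 * of one second or more. */
	if ((unsigned long)ts.tv_nsec >= NSEC_PER_SEC)
		return -EINVAL;

	/* Equivalent of timespec64_to_ns(&ts). */
	*delta = ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
	return 0;
}
```

A negative offset is expressed with a negative tv_sec and a non-negative tv_nsec. For example, -2 s plus 250000000 ns is a delta of -1.75 s.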
[PATCH 11/11] k_clock:Remove the 32bit methods with timespec type
All of the k_clock users have been converted to the new methods, so this patch removes the older methods with the timespec/itimerspec types. As a result, the k_clock structure is ready for the year 2038. Signed-off-by: Baolin Wang baolin.w...@linaro.org --- include/linux/posix-timers.h |9 -- kernel/time/posix-timers.c | 72 +- 2 files changed, 29 insertions(+), 52 deletions(-) diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h index 35786c5..7c3dae2 100644 --- a/include/linux/posix-timers.h +++ b/include/linux/posix-timers.h @@ -97,29 +97,20 @@ struct k_itimer { }; struct k_clock { - int (*clock_getres) (const clockid_t which_clock, struct timespec *tp); int (*clock_getres64) (const clockid_t which_clock, struct timespec64 *tp); - int (*clock_set) (const clockid_t which_clock, - const struct timespec *tp); int (*clock_set64) (const clockid_t which_clock, const struct timespec64 *tp); - int (*clock_get) (const clockid_t which_clock, struct timespec * tp); int (*clock_get64) (const clockid_t which_clock, struct timespec64 *tp); int (*clock_adj) (const clockid_t which_clock, struct timex *tx); int (*timer_create) (struct k_itimer *timer); int (*nsleep) (const clockid_t which_clock, int flags, struct timespec *, struct timespec __user *); long (*nsleep_restart) (struct restart_block *restart_block); - int (*timer_set) (struct k_itimer * timr, int flags, - struct itimerspec * new_setting, - struct itimerspec * old_setting); int (*timer_set64) (struct k_itimer *timr, int flags, struct itimerspec64 *new_setting, struct itimerspec64 *old_setting); int (*timer_del) (struct k_itimer * timr); #define TIMER_RETRY 1 - void (*timer_get) (struct k_itimer * timr, - struct itimerspec * cur_setting); void (*timer_get64) (struct k_itimer *timr, struct itimerspec64 *cur_setting); }; diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c index 47d1abf..3196ec0 100644 --- a/kernel/time/posix-timers.c +++ b/kernel/time/posix-timers.c @@ -528,13 +528,13 @@ void 
posix_timers_register_clock(const clockid_t clock_id, return; } - if (!new_clock->clock_get && !new_clock->clock_get64) { - printk(KERN_WARNING "POSIX clock id %d lacks clock_get() and clock_get64()\n", + if (!new_clock->clock_get64) { + printk(KERN_WARNING "POSIX clock id %d lacks clock_get64()\n", clock_id); return; } - if (!new_clock->clock_getres && !new_clock->clock_getres64) { - printk(KERN_WARNING "POSIX clock id %d lacks clock_getres() and clock_getres64()\n", + if (!!new_clock->clock_getres64) { + printk(KERN_WARNING "POSIX clock id %d lacks clock_getres64()\n", clock_id); return; } @@ -585,7 +585,7 @@ static struct k_clock *clockid_to_kclock(const clockid_t id) return (id & CLOCKFD_MASK) == CLOCKFD ? &clock_posix_dynamic : &clock_posix_cpu; - if (id >= MAX_CLOCKS || (!posix_clocks[id].clock_getres && !posix_clocks[id].clock_getres64)) + if (id >= MAX_CLOCKS || !posix_clocks[id].clock_getres64) return NULL; return &posix_clocks[id]; } @@ -788,15 +788,11 @@ SYSCALL_DEFINE2(timer_gettime, timer_t, timer_id, return -EINVAL; kc = clockid_to_kclock(timr->it_clock); - if (WARN_ON_ONCE(!kc || (!kc->timer_get && !kc->timer_get64))) { + if (WARN_ON_ONCE(!kc || !kc->timer_get64)) { ret = -EINVAL; } else { - if (kc->timer_get64) { - kc->timer_get64(timr, &cur_setting64); - cur_setting = itimerspec64_to_itimerspec(&cur_setting64); - } else { - kc->timer_get(timr, &cur_setting); - } + kc->timer_get64(timr, &cur_setting64); + cur_setting = itimerspec64_to_itimerspec(&cur_setting64); } unlock_timer(timr, flags); @@ -911,18 +907,14 @@ retry: return -EINVAL; kc = clockid_to_kclock(timr->it_clock); - if (WARN_ON_ONCE(!kc || (!kc->timer_set && !kc->timer_set64))) { + if (WARN_ON_ONCE(!kc || !kc->timer_set64)) { error = -EINVAL; } else { - if (kc->timer_set64) { - new_spec64 = itimerspec_to_itimerspec64(&new_spec); - error = kc->timer_set64(timr, flags, &new_spec64, - &old_spec64); - if (old_setting
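As archived, the replacement check in posix_timers_register_clock() reads `if (!!new_clock->clock_getres64)`. The double negation is true when the callback is present, so that check would reject every clock that does provide clock_getres64(). The intended test is presumably a single negation, matching the clock_get64() check just above it. The userspace sketch below models registration and the clockid_to_kclock()-style lookup with that single negation. struct model_k_clock, model_register_clock(), model_lookup() and demo_get() are hypothetical names, not kernel API.

```c
#include <stddef.h>
#include <stdint.h>

struct ts64 { int64_t tv_sec; long tv_nsec; };  /* timespec64 stand-in */

/* Cut-down k_clock with only the two mandatory 64-bit callbacks. */
struct model_k_clock {
	int (*clock_get64)(int which_clock, struct ts64 *tp);
	int (*clock_getres64)(int which_clock, struct ts64 *tp);
};

#define MODEL_MAX_CLOCKS 16
static struct model_k_clock model_clocks[MODEL_MAX_CLOCKS];

/* Returns 0 on success, -1 if the id is out of range or a mandatory
 * callback is missing; each test is a single negation (NULL check). */
static int model_register_clock(int id, const struct model_k_clock *c)
{
	if (id < 0 || id >= MODEL_MAX_CLOCKS)
		return -1;
	if (!c->clock_get64 || !c->clock_getres64)
		return -1;
	model_clocks[id] = *c;
	return 0;
}

/* clockid_to_kclock()-style lookup: a slot counts as registered only
 * if its getres64 callback is set. */
static struct model_k_clock *model_lookup(int id)
{
	if (id < 0 || id >= MODEL_MAX_CLOCKS || !model_clocks[id].clock_getres64)
		return NULL;
	return &model_clocks[id];
}

/* Trivial callback used to populate a clock in the example. */
static int demo_get(int which_clock, struct ts64 *tp)
{
	(void)which_clock;
	tp->tv_sec = 0;
	tp->tv_nsec = 0;
	return 0;
}
```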
[PATCH 10/11] time/posix-cpu-timers:Convert to the 64bit methods for k_clock structure
This patch switches posix-cpu-timers.c to the new timespec64 methods of the k_clock structure, converting the timespec/itimerspec types to timespec64/itimerspec64 for its callback functions. Signed-off-by: Baolin Wang baolin.w...@linaro.org --- kernel/time/posix-cpu-timers.c | 83 +--- 1 file changed, 44 insertions(+), 39 deletions(-) diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c index 0075da7..51cfead 100644 --- a/kernel/time/posix-cpu-timers.c +++ b/kernel/time/posix-cpu-timers.c @@ -52,7 +52,7 @@ static int check_clock(const clockid_t which_clock) } static inline unsigned long long -timespec_to_sample(const clockid_t which_clock, const struct timespec *tp) +timespec64_to_sample(const clockid_t which_clock, const struct timespec64 *tp) { unsigned long long ret; @@ -60,19 +60,19 @@ timespec_to_sample(const clockid_t which_clock, const struct timespec *tp) if (CPUCLOCK_WHICH(which_clock) == CPUCLOCK_SCHED) { ret = (unsigned long long)tp->tv_sec * NSEC_PER_SEC + tp->tv_nsec; } else { - ret = cputime_to_expires(timespec_to_cputime(tp)); + ret = cputime_to_expires(timespec64_to_cputime(tp)); } return ret; } -static void sample_to_timespec(const clockid_t which_clock, +static void sample_to_timespec64(const clockid_t which_clock, unsigned long long expires, - struct timespec *tp) + struct timespec64 *tp) { if (CPUCLOCK_WHICH(which_clock) == CPUCLOCK_SCHED) - *tp = ns_to_timespec(expires); + *tp = ns_to_timespec64(expires); else - cputime_to_timespec((__force cputime_t)expires, tp); + cputime_to_timespec64((__force cputime_t)expires, tp); } /* @@ -141,7 +141,7 @@ static inline unsigned long long virt_ticks(struct task_struct *p) } static int -posix_cpu_clock_getres(const clockid_t which_clock, struct timespec *tp) +posix_cpu_clock_getres(const clockid_t which_clock, struct timespec64 *tp) { int error = check_clock(which_clock); if (!error) { @@ -160,7 +160,7 @@ posix_cpu_clock_getres(const clockid_t which_clock, struct timespec 
*tp) } static int -posix_cpu_clock_set(const clockid_t which_clock, const struct timespec *tp) +posix_cpu_clock_set(const clockid_t which_clock, const struct timespec64 *tp) { /* * You can never reset a CPU clock, but we check for other errors @@ -263,7 +263,7 @@ static int cpu_clock_sample_group(const clockid_t which_clock, static int posix_cpu_clock_get_task(struct task_struct *tsk, const clockid_t which_clock, - struct timespec *tp) + struct timespec64 *tp) { int err = -EINVAL; unsigned long long rtn; @@ -277,13 +277,14 @@ static int posix_cpu_clock_get_task(struct task_struct *tsk, } if (!err) - sample_to_timespec(which_clock, rtn, tp); + sample_to_timespec64(which_clock, rtn, tp); return err; } -static int posix_cpu_clock_get(const clockid_t which_clock, struct timespec *tp) +static int posix_cpu_clock_get(const clockid_t which_clock, + struct timespec64 *tp) { const pid_t pid = CPUCLOCK_PID(which_clock); int err = -EINVAL; @@ -598,7 +599,7 @@ static inline void posix_cpu_timer_kick_nohz(void) { } * and try again. (This happens when the timer is in the middle of firing.) 
*/ static int posix_cpu_timer_set(struct k_itimer *timer, int timer_flags, - struct itimerspec *new, struct itimerspec *old) + struct itimerspec64 *new, struct itimerspec64 *old) { unsigned long flags; struct sighand_struct *sighand; @@ -608,7 +609,7 @@ static int posix_cpu_timer_set(struct k_itimer *timer, int timer_flags, WARN_ON_ONCE(p == NULL); - new_expires = timespec_to_sample(timer->it_clock, &new->it_value); + new_expires = timespec64_to_sample(timer->it_clock, &new->it_value); /* * Protect against sighand release/switch in exit/exec and p->cpu_timers @@ -669,7 +670,7 @@ static int posix_cpu_timer_set(struct k_itimer *timer, int timer_flags, bump_cpu_timer(timer, val); if (val < timer->it.cpu.expires) { old_expires = timer->it.cpu.expires - val; - sample_to_timespec(timer->it_clock, + sample_to_timespec64(timer->it_clock, old_expires, &old->it_value); } else { @@ -709,7 +710,7 @@ static int posix_cpu_timer_set(struct
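For CPUCLOCK_SCHED clocks, timespec64_to_sample() turns a timespec64 into a plain nanosecond count, and sample_to_timespec64() turns it back with ns_to_timespec64(). Below is a minimal userspace sketch of that round trip. It covers only the CPUCLOCK_SCHED branch (the cputime branch is omitted) and uses hypothetical names.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

struct ts64 { int64_t tv_sec; long tv_nsec; };  /* timespec64 stand-in */

/* CPUCLOCK_SCHED branch of timespec64_to_sample(): samples are ns. */
static unsigned long long model_ts64_to_sample(const struct ts64 *tp)
{
	return (unsigned long long)tp->tv_sec * NSEC_PER_SEC + tp->tv_nsec;
}

/* CPUCLOCK_SCHED branch of sample_to_timespec64(); samples are
 * unsigned, so plain division and modulo suffice here. */
static struct ts64 model_sample_to_ts64(unsigned long long expires)
{
	struct ts64 ts = { (int64_t)(expires / NSEC_PER_SEC),
			   (long)(expires % NSEC_PER_SEC) };
	return ts;
}
```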