Re: [PATCH v3] hugetlb: simplify hugetlb handling in follow_page_mask

2022-09-21 Thread Baolin Wang




On 9/19/2022 10:13 AM, Mike Kravetz wrote:

During discussions of this series [1], it was suggested that hugetlb
handling code in follow_page_mask could be simplified.  At the beginning
of follow_page_mask, there currently is a call to follow_huge_addr which
'may' handle hugetlb pages.  ia64 is the only architecture which provides
a follow_huge_addr routine that does not return an error.  Otherwise, a
check is made at each level of the page table for a hugetlb entry.  If a
hugetlb entry is found, a call to a routine associated with that entry is
made.

Currently, there are two checks for hugetlb entries at each page table
level.  The first check is of the form:
    if (p?d_huge())
        page = follow_huge_p?d();
The second check is of the form:
    if (is_hugepd())
        page = follow_huge_pd();

We can replace these checks, as well as the special handling routines
such as follow_huge_p?d() and follow_huge_pd() with a single routine to
handle hugetlb vmas.

A new routine hugetlb_follow_page_mask is called for hugetlb vmas at the
beginning of follow_page_mask.  hugetlb_follow_page_mask will use the
existing routine huge_pte_offset to walk page tables looking for hugetlb
entries.  huge_pte_offset can be overridden by architectures, and already
handles special cases such as hugepd entries.
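As a rough illustration of the control-flow change described above, the following userspace sketch models one hugetlb decision per VMA at the top of the lookup, with the page table level derived from the huge page size, instead of p?d_huge()/is_hugepd() checks at every level. All toy_* names, the struct layout and the x86-64-style sizes are hypothetical; this is not the kernel's hugetlb_follow_page_mask().

```c
#include <stdbool.h>

/* Assumed x86-64-style sizes with 4K base pages; illustration only. */
#define TOY_PMD_SIZE (1UL << 21)  /* 2 MiB */
#define TOY_PUD_SIZE (1UL << 30)  /* 1 GiB */

enum toy_lookup {
	TOY_NORMAL_WALK,  /* regular pgd -> p4d -> pud -> pmd -> pte walk */
	TOY_HUGE_AT_PUD,  /* hugetlb entry expected at the PUD level */
	TOY_HUGE_AT_PMD,  /* hugetlb entry expected at the PMD level */
	TOY_HUGE_OTHER,   /* left to an arch-specific lookup (e.g. hugepd) */
};

/* Hypothetical stand-in for a VMA and its hstate's page size. */
struct toy_vma {
	bool is_hugetlb;
	unsigned long huge_page_size;  /* only meaningful if is_hugetlb */
};

/*
 * Single hugetlb branch at the top of the lookup.  For hugetlb VMAs the
 * level is chosen from the huge page size, roughly as a generic
 * huge_pte_offset() would; other VMAs take the normal walk.
 */
static enum toy_lookup toy_follow_page_mask(const struct toy_vma *vma)
{
	if (!vma->is_hugetlb)
		return TOY_NORMAL_WALK;
	if (vma->huge_page_size == TOY_PUD_SIZE)
		return TOY_HUGE_AT_PUD;
	if (vma->huge_page_size == TOY_PMD_SIZE)
		return TOY_HUGE_AT_PMD;
	return TOY_HUGE_OTHER;
}
```

The TOY_HUGE_OTHER case stands for layouts that a real architecture would resolve in its own huge_pte_offset(); the normal walk no longer needs any hugetlb checks in this model.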

[1] https://lore.kernel.org/linux-mm/cover.1661240170.git.baolin.w...@linux.alibaba.com/

Suggested-by: David Hildenbrand 
Signed-off-by: Mike Kravetz 


LGTM, and works well on my machine. So feel free to add:
Reviewed-by: Baolin Wang 
Tested-by: Baolin Wang 


Re: [PATCH 1/4] hugetlb: skip to end of PT page mapping when pte not present

2022-06-17 Thread Baolin Wang




On 6/18/2022 1:17 AM, Mike Kravetz wrote:

On 06/17/22 10:15, Peter Xu wrote:

Hi, Mike,

On Thu, Jun 16, 2022 at 02:05:15PM -0700, Mike Kravetz wrote:

@@ -6877,6 +6896,39 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
return (pte_t *)pmd;
  }
  
+/*
+ * Return a mask that can be used to update an address to the last huge
+ * page in a page table page mapping size.  Used to skip non-present
+ * page table entries when linearly scanning address ranges.  Architectures
+ * with unique huge page to page table relationships can define their own
+ * version of this routine.
+ */
+unsigned long hugetlb_mask_last_page(struct hstate *h)
+{
+   unsigned long hp_size = huge_page_size(h);
+
+   switch (hp_size) {
+   case P4D_SIZE:
+   return PGDIR_SIZE - P4D_SIZE;
+   case PUD_SIZE:
+   return P4D_SIZE - PUD_SIZE;
+   case PMD_SIZE:
+   return PUD_SIZE - PMD_SIZE;
+   default:


Should we add a WARN_ON_ONCE() if it should never trigger?



Sure.  I will add this.


+   break; /* Should never happen */
+   }
+
+   return ~(0UL);
+}
+
+#else
+
+/* See description above.  Architectures can provide their own version. */
+__weak unsigned long hugetlb_mask_last_page(struct hstate *h)
+{
+   return ~(0UL);


I'm wondering whether it's better to return 0 rather than ~0 by default.
Could an arch with !CONFIG_ARCH_WANT_GENERAL_HUGETLB wrongly skip some
valid address ranges with ~0, or perhaps I misread?


Thank you, thank you, thank you Peter!

Yes, the 'default' return for hugetlb_mask_last_page() should be 0.  If
there is no 'optimization', we do not want to modify the address so we
want to OR with 0 not ~0.  My bad, I must have been thinking AND instead
of OR.
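To make the mask arithmetic concrete, here is a small userspace sketch. It assumes x86-64-style sizes with 4K base pages (PMD_SIZE = 2 MiB, PUD_SIZE = 1 GiB); the toy_* names are hypothetical, and the `(addr | mask) + sz` step is an assumed model of how a linear scan would use the mask, not code taken from the patch.

```c
#include <stdint.h>

/* Assumed x86-64-style sizes with 4K base pages; illustration only. */
#define TOY_PMD_SIZE (UINT64_C(1) << 21)  /* 2 MiB */
#define TOY_PUD_SIZE (UINT64_C(1) << 30)  /* 1 GiB */

/* Same arithmetic as the PMD_SIZE case of hugetlb_mask_last_page(). */
static uint64_t toy_mask_last_page_pmd(void)
{
	return TOY_PUD_SIZE - TOY_PMD_SIZE;
}

/*
 * Hypothetical scan step: after finding a non-present entry for the
 * sz-aligned address addr, OR in the mask to jump to the last huge page
 * mapped by the same page table page, then advance by one huge page.
 */
static uint64_t toy_skip(uint64_t addr, uint64_t sz, uint64_t last_page_mask)
{
	return (addr | last_page_mask) + sz;
}
```

With the PMD mask (0x3FE00000), a miss at 0x40200000 skips straight to the next 1 GiB boundary, 0x80000000. With a mask of 0 the step degenerates to `addr + sz`, the next huge page, which is the "no optimization" behavior described above. With ~0 the address is forced to all ones and the addition wraps (to 0x1FFFFF in this model), so the scan no longer lands on the next huge page.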

I will change here as well as in Baolin's patch.


Ah, I also overlooked this. Thanks Peter, and thanks Mike for updating.


[PATCH v4 0/3] Fix CONT-PTE/PMD size hugetlb issue when unmapping or migrating

2022-05-11 Thread Baolin Wang
Hi,

Currently, when migrating a hugetlb page or unmapping a poisoned hugetlb
page, we use ptep_clear_flush() and set_pte_at() to nuke the page table
entry and remap it. This is incorrect for CONT-PTE or CONT-PMD size
hugetlb pages and can cause potential data consistency issues. This patch
set changes these paths to use the hugetlb-specific APIs to fix the issue;
please find the details in each patch. Thanks.

Note: Mike pointed out that huge_ptep_get() only returns one specific
value, and does not take into account the dirty or young bits of
CONT-PTE/PMDs the way huge_ptep_get_and_clear() does [1]. This
inconsistency is not introduced by this patch set and will be addressed
in another thread [2]. Meanwhile, the uffd for hugetlb case [3] pointed
out by Gerald also needs a separate patch.

[1] https://lore.kernel.org/linux-mm/85bd80b4-b4fd-0d3f-a2e5-149559f2f...@oracle.com/
[2] https://lore.kernel.org/all/cover.1651998586.git.baolin.w...@linux.alibaba.com/
[3] https://lore.kernel.org/linux-mm/20220503120343.6264e126@thinkpad/

Changes from v3:
 - Fix build errors for !CONFIG_MMU.

Changes from v2:
 - Collect Reviewed-by tags from Muchun and Mike.
 - Drop the unnecessary casting in hugetlb.c.
 - Fix build errors by adding dummy functions for !CONFIG_HUGETLB_PAGE.

Changes from v1:
 - Add Acked-by tag from Mike.
 - Update some commit messages.
 - Add VM_BUG_ON in try_to_unmap() for the hugetlb case.
 - Add an explicit void casting for huge_ptep_clear_flush() in hugetlb.c.

Baolin Wang (3):
  mm: change huge_ptep_clear_flush() to return the original pte
  mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration
  mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping

 arch/arm64/include/asm/hugetlb.h   |  4 +--
 arch/arm64/mm/hugetlbpage.c        | 12 +++-
 arch/ia64/include/asm/hugetlb.h    |  5 +--
 arch/mips/include/asm/hugetlb.h    |  9 --
 arch/parisc/include/asm/hugetlb.h  |  5 +--
 arch/powerpc/include/asm/hugetlb.h |  9 --
 arch/s390/include/asm/hugetlb.h    |  6 ++--
 arch/sh/include/asm/hugetlb.h      |  5 +--
 arch/sparc/include/asm/hugetlb.h   |  5 +--
 include/asm-generic/hugetlb.h      |  4 +--
 include/linux/hugetlb.h            | 11 +++
 mm/rmap.c                          | 63 --
 12 files changed, 87 insertions(+), 51 deletions(-)

-- 
1.8.3.1



[PATCH v4 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration

2022-05-11 Thread Baolin Wang
Some architectures (like ARM64) support CONT-PTE/PMD size hugetlb,
which means they support not only PMD/PUD size hugetlb (2M and 1G),
but also CONT-PTE/PMD size hugetlb (64K and 32M) when a 4K base page
size is used.
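The sizes above follow from simple arithmetic. The sketch below assumes ARM64 with a 4K base page (each PTE maps 4K, each PMD maps 2M, each PUD maps 1G) and uses hypothetical toy_* names; it computes how many page table entries map one huge page of each size.

```c
/* Assumed ARM64 sizes with a 4K base page; illustration only. */
#define TOY_PAGE_SIZE (4UL << 10)  /* mapped by one PTE */
#define TOY_PMD_SIZE  (2UL << 20)  /* mapped by one PMD */
#define TOY_PUD_SIZE  (1UL << 30)  /* mapped by one PUD */

/* Entries of size entry_size needed to map a huge page of size hp_size. */
static unsigned long toy_num_entries(unsigned long hp_size,
				     unsigned long entry_size)
{
	return hp_size / entry_size;
}
```

PMD and PUD size huge pages come out as a single entry, while the 64K CONT-PTE and 32M CONT-PMD cases each need 16 entries, so handling only one entry leaves the other 15 untouched.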

When migrating a hugetlb page, we get the relevant page table entry
with huge_pte_offset() only once, nuke it, and remap it with a
migration pte entry. This is correct for PMD or PUD size hugetlb,
since such a page is always mapped by exactly one pmd or pud entry
in the page table.

However, this is incorrect for CONT-PTE and CONT-PMD size hugetlb,
since such a page is mapped by several contiguous pte or pmd entries
with the same page table attributes. We therefore nuke or remap only
one pte or pmd entry of a CONT-PTE/PMD size hugetlb page, which is
not what hugetlb migration expects. As a result, the data in the other
subpages of the hugetlb page can still be modified while the page is
being migrated, which can cause serious data consistency issues,
since we did not nuke those page table entries or set migration ptes
for those subpages.

To fix this issue, use huge_ptep_clear_flush() to nuke a hugetlb
page table entry, and remap it with set_huge_pte_at() and
set_huge_swap_pte_at() when migrating a hugetlb page; these helpers
already handle CONT-PTE and CONT-PMD size hugetlb.
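A toy userspace model of the difference, using hypothetical names: a 64K CONT-PTE huge page backed by 16 entries (4K base pages assumed). toy_clear_one() stands in for the single-entry ptep_clear_flush() pattern and toy_clear_all() for what a CONT-aware helper such as huge_ptep_clear_flush() has to achieve; neither is kernel code, and TLB flushing is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NCONTIG 16  /* 64K CONT-PTE huge page with 4K base pages */

struct toy_pte {
	bool present;
};

/* Map every entry backing the huge page. */
static void toy_map_all(struct toy_pte *ptes, size_t n)
{
	for (size_t i = 0; i < n; i++)
		ptes[i].present = true;
}

/* Single-entry pattern: only the first entry is cleared. */
static void toy_clear_one(struct toy_pte *ptes)
{
	ptes[0].present = false;
}

/* CONT-aware pattern: every entry backing the huge page is cleared. */
static void toy_clear_all(struct toy_pte *ptes, size_t n)
{
	for (size_t i = 0; i < n; i++)
		ptes[i].present = false;
}

/* Entries through which subpages can still be accessed. */
static size_t toy_count_present(const struct toy_pte *ptes, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (ptes[i].present)
			count++;
	return count;
}
```

In the single-entry case 15 of the 16 subpages remain mapped, which is the window in which they can still be written while the page is being migrated.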

Signed-off-by: Baolin Wang 
Reviewed-by: Muchun Song 
Reviewed-by: Mike Kravetz 
---
 include/linux/hugetlb.h | 11 +++
 mm/rmap.c               | 24 ++--
 2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 306d6ef..abde66e 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1093,6 +1093,17 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr
pte_t *ptep, pte_t pte, unsigned long sz)
 {
 }
+
+static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
+{
+   return *ptep;
+}
+
+static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
+  pte_t *ptep, pte_t pte)
+{
+}
 #endif /* CONFIG_HUGETLB_PAGE */
 
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
diff --git a/mm/rmap.c b/mm/rmap.c
index 94d6b24..4e96daf 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1926,13 +1926,15 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
break;
}
}
+
+   /* Nuke the hugetlb page table entry */
+   pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
} else {
flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
+   /* Nuke the page table entry. */
+   pteval = ptep_clear_flush(vma, address, pvmw.pte);
}
 
-   /* Nuke the page table entry. */
-   pteval = ptep_clear_flush(vma, address, pvmw.pte);
-
/* Set the dirty flag on the folio now the pte is gone. */
if (pte_dirty(pteval))
folio_mark_dirty(folio);
@@ -2017,7 +2019,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
pte_t swp_pte;
 
if (arch_unmap_one(mm, vma, address, pteval) < 0) {
-   set_pte_at(mm, address, pvmw.pte, pteval);
+   if (folio_test_hugetlb(folio))
+   set_huge_pte_at(mm, address, pvmw.pte, pteval);
+   else
+   set_pte_at(mm, address, pvmw.pte, pteval);
ret = false;
page_vma_mapped_walk_done();
break;
@@ -2026,7 +2031,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
   !anon_exclusive, subpage);
if (anon_exclusive &&
page_try_share_anon_rmap(subpage)) {
-   set_pte_at(mm, address, pvmw.pte, pteval);
+   if (folio_test_hugetlb(folio))
+   set_huge_pte_at(mm, address, pvmw.pte, pteval);
+   else
+   set_pte_at(mm, address, pvmw.pte, pteval);
ret = false;
page_vma_mapped_walk_done();
break;
@@ -2052,7 +2060,11 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
swp_pte = pte_swp_mksoft_di

[PATCH v4 1/3] mm: change huge_ptep_clear_flush() to return the original pte

2022-05-11 Thread Baolin Wang
It is incorrect to use ptep_clear_flush() to nuke a hugetlb page table
entry when unmapping or migrating a hugetlb page; the following patches
change these paths to use huge_ptep_clear_flush() instead.

So this is a preparation patch, which changes huge_ptep_clear_flush()
to return the original pte, so that callers can use it when nuking a
hugetlb page table entry.
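Returning the original pte matters because callers such as try_to_migrate_one() check pte_dirty(pteval) afterwards. For a CONT-PTE/PMD huge page, the hardware may have set the dirty or young bit on any of the contiguous entries, so a useful return value has to fold those bits together. The sketch below is a simplified userspace model of that idea, in the spirit of arm64's get_clear_flush(); the toy_* types and names are hypothetical and TLB flushing is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy pte holding only the bits relevant here. */
struct toy_cpte {
	bool present;
	bool dirty;
	bool young;
};

/*
 * Clear ncontig entries and return a copy of the first one with the
 * dirty and young bits ORed in from all of them.
 */
static struct toy_cpte toy_get_clear(struct toy_cpte *ptes, size_t ncontig)
{
	struct toy_cpte orig = ptes[0];

	for (size_t i = 0; i < ncontig; i++) {
		orig.dirty = orig.dirty || ptes[i].dirty;
		orig.young = orig.young || ptes[i].young;
		ptes[i] = (struct toy_cpte){ .present = false };
	}
	/* A real implementation would also flush the TLB for the range. */
	return orig;
}
```

If only the first entry's value were returned, a dirty bit set on any other entry would be lost and the caller would not mark the folio dirty.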

Signed-off-by: Baolin Wang 
Acked-by: Mike Kravetz 
Reviewed-by: Muchun Song 
---
 arch/arm64/include/asm/hugetlb.h   |  4 ++--
 arch/arm64/mm/hugetlbpage.c        | 12 +---
 arch/ia64/include/asm/hugetlb.h    |  5 +++--
 arch/mips/include/asm/hugetlb.h    |  9 ++---
 arch/parisc/include/asm/hugetlb.h  |  5 +++--
 arch/powerpc/include/asm/hugetlb.h |  9 ++---
 arch/s390/include/asm/hugetlb.h    |  6 +++---
 arch/sh/include/asm/hugetlb.h      |  5 +++--
 arch/sparc/include/asm/hugetlb.h   |  5 +++--
 include/asm-generic/hugetlb.h      |  4 ++--
 10 files changed, 36 insertions(+), 28 deletions(-)

diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 1242f71..616b2ca 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -39,8 +39,8 @@ extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
unsigned long addr, pte_t *ptep);
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
-extern void huge_ptep_clear_flush(struct vm_area_struct *vma,
- unsigned long addr, pte_t *ptep);
+extern pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+  unsigned long addr, pte_t *ptep);
 #define __HAVE_ARCH_HUGE_PTE_CLEAR
 extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
   pte_t *ptep, unsigned long sz);
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index cbace1c..ca8e65c 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -486,19 +486,17 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot));
 }
 
-void huge_ptep_clear_flush(struct vm_area_struct *vma,
-  unsigned long addr, pte_t *ptep)
+pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+   unsigned long addr, pte_t *ptep)
 {
size_t pgsize;
int ncontig;
 
-   if (!pte_cont(READ_ONCE(*ptep))) {
-   ptep_clear_flush(vma, addr, ptep);
-   return;
-   }
+   if (!pte_cont(READ_ONCE(*ptep)))
+   return ptep_clear_flush(vma, addr, ptep);
 
ncontig = find_num_contig(vma->vm_mm, addr, ptep, );
-   clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
+   return get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
 }
 
 static int __init hugetlbpage_init(void)
diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
index 7e46ebd..026ead4 100644
--- a/arch/ia64/include/asm/hugetlb.h
+++ b/arch/ia64/include/asm/hugetlb.h
@@ -23,9 +23,10 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,
 #define is_hugepage_only_range is_hugepage_only_range
 
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
-static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
-unsigned long addr, pte_t *ptep)
+static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
 {
+   return *ptep;
 }
 
 #include 
diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
index c214440..fd69c88 100644
--- a/arch/mips/include/asm/hugetlb.h
+++ b/arch/mips/include/asm/hugetlb.h
@@ -43,16 +43,19 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 }
 
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
-static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
-unsigned long addr, pte_t *ptep)
+static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
 {
+   pte_t pte;
+
/*
 * clear the huge pte entry firstly, so that the other smp threads will
 * not get old pte entry after finishing flush_tlb_page and before
 * setting new huge pte entry
 */
-   huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
+   pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
flush_tlb_page(vma, addr);
+   return pte;
 }
 
 #define __HAVE_ARCH_HUGE_PTE_NONE
diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h
index a69cf9e..f7f078c 100644
--- a/arch/parisc/include/asm/hugetlb.h
+++ b/arch/parisc/include/asm/hugetlb.h
@@ -28,9 +28,10 @@ static inline int prepare_hugepage_range(struct file *file,
 }
 
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
-sta

[PATCH v4 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping

2022-05-11 Thread Baolin Wang
Some architectures (like ARM64) support CONT-PTE/PMD size hugetlb,
which means they support not only PMD/PUD size hugetlb (2M and 1G),
but also CONT-PTE/PMD size hugetlb (64K and 32M) when a 4K base page
size is used.

When unmapping a hugetlb page, we get the relevant page table entry
with huge_pte_offset() only once and nuke it. This is correct for PMD
or PUD size hugetlb, since such a page is always mapped by exactly one
pmd or pud entry in the page table.

However, this is incorrect for CONT-PTE and CONT-PMD size hugetlb,
since such a page is mapped by several contiguous pte or pmd entries
with the same page table attributes, so we nuke only one pte or pmd
entry of a CONT-PTE/PMD size hugetlb page.

Currently, try_to_unmap() is passed a hugetlb page only when that
hugetlb page is poisoned. This means we unmap only one pte or pmd
entry of a CONT-PTE or CONT-PMD size poisoned hugetlb page, while the
other subpages of that poisoned page remain accessible, which can
cause serious issues.

So use huge_ptep_clear_flush() to nuke the hugetlb page table entries
instead, which already handles CONT-PTE and CONT-PMD size hugetlb.

We already use set_huge_swap_pte_at() to set a poisoned swap entry
for a poisoned hugetlb page. Also add a VM_BUG_ON_PAGE() in
try_to_unmap() to make sure the passed hugetlb page is poisoned.

Signed-off-by: Baolin Wang 
Reviewed-by: Muchun Song 
Reviewed-by: Mike Kravetz 
---
 mm/rmap.c | 39 ++-
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 4e96daf..219e287 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1528,6 +1528,11 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 
if (folio_test_hugetlb(folio)) {
/*
+* The try_to_unmap() is only passed a hugetlb page
+* in the case where the hugetlb page is poisoned.
+*/
+   VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage);
+   /*
 * huge_pmd_unshare may unmap an entire PMD page.
 * There is no way of knowing exactly which PMDs may
 * be cached for this mm, so we must flush them all.
@@ -1562,28 +1567,28 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
break;
}
}
+   pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
} else {
flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
-   }
-
-   /*
-* Nuke the page table entry. When having to clear
-* PageAnonExclusive(), we always have to flush.
-*/
-   if (should_defer_flush(mm, flags) && !anon_exclusive) {
/*
-* We clear the PTE but do not flush so potentially
-* a remote CPU could still be writing to the folio.
-* If the entry was previously clean then the
-* architecture must guarantee that a clear->dirty
-* transition on a cached TLB entry is written through
-* and traps if the PTE is unmapped.
+* Nuke the page table entry. When having to clear
+* PageAnonExclusive(), we always have to flush.
 */
-   pteval = ptep_get_and_clear(mm, address, pvmw.pte);
+   if (should_defer_flush(mm, flags) && !anon_exclusive) {
+   /*
+* We clear the PTE but do not flush so potentially
+* a remote CPU could still be writing to the folio.
+* If the entry was previously clean then the
+* architecture must guarantee that a clear->dirty
+* transition on a cached TLB entry is written through
+* and traps if the PTE is unmapped.
+*/
+   pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-   set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
-   } else {
-   pteval = ptep_clear_flush(vma, address, pvmw.pte);
+   set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
+   } else {
+   pteval = ptep_clear_flush(vma, address, pvmw.pte);
+   }
}
 
/*
-- 
1.8.3.1



Re: [PATCH v3 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration

2022-05-10 Thread Baolin Wang




On 5/11/2022 7:28 AM, Andrew Morton wrote:

On Tue, 10 May 2022 16:17:39 -0700 Andrew Morton wrote:


+
+static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
+{
+   return ptep_get(ptep);
+}
+
+static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
+  pte_t *ptep, pte_t pte)
+{
+}
  #endif/* CONFIG_HUGETLB_PAGE */
  


This blows up nommu (arm allnoconfig):

In file included from fs/io_uring.c:71:
./include/linux/hugetlb.h: In function 'huge_ptep_clear_flush':
./include/linux/hugetlb.h:1100:16: error: implicit declaration of function 'ptep_get' [-Werror=implicit-function-declaration]
  1100 | return ptep_get(ptep);
   |^~~~


huge_ptep_clear_flush() is only used in CONFIG_NOMMU=n files, so I simply
zapped this change.



Well that wasn't a great success.  Doing this instead.  It's pretty
nasty - something nicer would be nicer please.


Thanks for fixing the build issue. I'll look at simplifying the
dummy function. Maybe just remove the ptep_get().


diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1097,7 +1097,7 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr

 static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
  unsigned long addr, pte_t *ptep)
 {
-   return ptep_get(ptep);
+   return *ptep;
 }



--- a/include/linux/hugetlb.h~mm-rmap-fix-cont-pte-pmd-size-hugetlb-issue-when-migration-fix
+++ a/include/linux/hugetlb.h
@@ -1094,6 +1094,7 @@ static inline void set_huge_swap_pte_at(
  {
  }
  
+#ifdef CONFIG_MMU

  static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
  unsigned long addr, pte_t *ptep)
  {
@@ -1104,6 +1105,7 @@ static inline void set_huge_pte_at(struc
   pte_t *ptep, pte_t pte)
  {
  }
+#endif
  #endif/* CONFIG_HUGETLB_PAGE */
  
  static inline spinlock_t *huge_pte_lock(struct hstate *h,

_


Re: [PATCH v3 0/3] Fix CONT-PTE/PMD size hugetlb issue when unmapping or migrating

2022-05-09 Thread Baolin Wang




On 5/10/2022 12:04 PM, Andrew Morton wrote:

On Tue, 10 May 2022 11:45:57 +0800 Baolin Wang wrote:


Hi,

Now migrating a hugetlb page or unmapping a poisoned hugetlb page, we'll
use ptep_clear_flush() and set_pte_at() to nuke the page table entry
and remap it, and this is incorrect for CONT-PTE or CONT-PMD size hugetlb
page,


It would be helpful to describe why it's wrong.  Something like "should
use huge_ptep_clear_flush() and huge_ptep_clear_flush() for this
purpose"?


Sorry for the confusing description. I described the problem explicitly 
in each patch's commit message.


https://lore.kernel.org/all/ea5abf529f0997b5430961012bfda6166c1efc8c.1652147571.git.baolin.w...@linux.alibaba.com/
https://lore.kernel.org/all/730ea4b6d292f32fb10b7a4e87dad49b0eb30474.1652147571.git.baolin.w...@linux.alibaba.com/




which will cause potential data consistent issue. This patch set
will change to use hugetlb related APIs to fix this issue, please find
details in each patch. Thanks.


Is a cc:stable needed here?  And are we able to identify a target for a
Fixes: tag?


I think a cc:stable tag is needed. However, I am not sure about the
target for the Fixes: tag; should we trace it back to the introduction
of CONT-PTE/PMD hugetlb, i.e. 66b3923a1a0f ("arm64: hugetlb: add
support for PTE contiguous bit")?


[PATCH v3 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping

2022-05-09 Thread Baolin Wang
Some architectures (like ARM64) support CONT-PTE/PMD size hugetlb,
which means they support not only PMD/PUD size hugetlb (2M and 1G),
but also CONT-PTE/PMD size hugetlb (64K and 32M) when a 4K base page
size is used.

When unmapping a hugetlb page, we get the relevant page table entry
with huge_pte_offset() only once and nuke it. This is correct for PMD
or PUD size hugetlb, since such a page is always mapped by exactly one
pmd or pud entry in the page table.

However, this is incorrect for CONT-PTE and CONT-PMD size hugetlb,
since such a page is mapped by several contiguous pte or pmd entries
with the same page table attributes, so we nuke only one pte or pmd
entry of a CONT-PTE/PMD size hugetlb page.

Currently, try_to_unmap() is passed a hugetlb page only when that
hugetlb page is poisoned. This means we unmap only one pte or pmd
entry of a CONT-PTE or CONT-PMD size poisoned hugetlb page, while the
other subpages of that poisoned page remain accessible, which can
cause serious issues.

So use huge_ptep_clear_flush() to nuke the hugetlb page table entries
instead, which already handles CONT-PTE and CONT-PMD size hugetlb.

We already use set_huge_swap_pte_at() to set a poisoned swap entry
for a poisoned hugetlb page. Also add a VM_BUG_ON_PAGE() in
try_to_unmap() to make sure the passed hugetlb page is poisoned.

Signed-off-by: Baolin Wang 
Reviewed-by: Muchun Song 
Reviewed-by: Mike Kravetz 
---
 mm/rmap.c | 39 ++-
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 4e96daf..219e287 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1528,6 +1528,11 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 
if (folio_test_hugetlb(folio)) {
/*
+* The try_to_unmap() is only passed a hugetlb page
+* in the case where the hugetlb page is poisoned.
+*/
+   VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage);
+   /*
 * huge_pmd_unshare may unmap an entire PMD page.
 * There is no way of knowing exactly which PMDs may
 * be cached for this mm, so we must flush them all.
@@ -1562,28 +1567,28 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
break;
}
}
+   pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
} else {
flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
-   }
-
-   /*
-* Nuke the page table entry. When having to clear
-* PageAnonExclusive(), we always have to flush.
-*/
-   if (should_defer_flush(mm, flags) && !anon_exclusive) {
/*
-* We clear the PTE but do not flush so potentially
-* a remote CPU could still be writing to the folio.
-* If the entry was previously clean then the
-* architecture must guarantee that a clear->dirty
-* transition on a cached TLB entry is written through
-* and traps if the PTE is unmapped.
+* Nuke the page table entry. When having to clear
+* PageAnonExclusive(), we always have to flush.
 */
-   pteval = ptep_get_and_clear(mm, address, pvmw.pte);
+   if (should_defer_flush(mm, flags) && !anon_exclusive) {
+   /*
+* We clear the PTE but do not flush so potentially
+* a remote CPU could still be writing to the folio.
+* If the entry was previously clean then the
+* architecture must guarantee that a clear->dirty
+* transition on a cached TLB entry is written through
+* and traps if the PTE is unmapped.
+*/
+   pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-   set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
-   } else {
-   pteval = ptep_clear_flush(vma, address, pvmw.pte);
+   set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
+   } else {
+   pteval = ptep_clear_flush(vma, address, pvmw.pte);
+   }
}
 
/*
-- 
1.8.3.1



[PATCH v3 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration

2022-05-09 Thread Baolin Wang
Some architectures (like ARM64) support CONT-PTE/PMD size hugetlb,
which means they support not only PMD/PUD size hugetlb (2M and 1G),
but also CONT-PTE/PMD size hugetlb (64K and 32M) when a 4K base page
size is used.

When migrating a hugetlb page, we get the relevant page table entry
with huge_pte_offset() only once, nuke it, and remap it with a
migration pte entry. This is correct for PMD or PUD size hugetlb,
since such a page is always mapped by exactly one pmd or pud entry
in the page table.

However, this is incorrect for CONT-PTE and CONT-PMD size hugetlb,
since such a page is mapped by several contiguous pte or pmd entries
with the same page table attributes. We therefore nuke or remap only
one pte or pmd entry of a CONT-PTE/PMD size hugetlb page, which is
not what hugetlb migration expects. As a result, the data in the other
subpages of the hugetlb page can still be modified while the page is
being migrated, which can cause serious data consistency issues,
since we did not nuke those page table entries or set migration ptes
for those subpages.

To fix this issue, use huge_ptep_clear_flush() to nuke a hugetlb
page table entry, and remap it with set_huge_pte_at() and
set_huge_swap_pte_at() when migrating a hugetlb page; these helpers
already handle CONT-PTE and CONT-PMD size hugetlb.

Signed-off-by: Baolin Wang 
Reviewed-by: Muchun Song 
Reviewed-by: Mike Kravetz 
---
 include/linux/hugetlb.h | 11 +++
 mm/rmap.c               | 24 ++--
 2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 306d6ef..9f71043 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1093,6 +1093,17 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr
pte_t *ptep, pte_t pte, unsigned long sz)
 {
 }
+
+static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
+{
+   return ptep_get(ptep);
+}
+
+static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
+  pte_t *ptep, pte_t pte)
+{
+}
 #endif /* CONFIG_HUGETLB_PAGE */
 
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
diff --git a/mm/rmap.c b/mm/rmap.c
index 94d6b24..4e96daf 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1926,13 +1926,15 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
break;
}
}
+
+   /* Nuke the hugetlb page table entry */
+   pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
} else {
flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
+   /* Nuke the page table entry. */
+   pteval = ptep_clear_flush(vma, address, pvmw.pte);
}
 
-   /* Nuke the page table entry. */
-   pteval = ptep_clear_flush(vma, address, pvmw.pte);
-
/* Set the dirty flag on the folio now the pte is gone. */
if (pte_dirty(pteval))
folio_mark_dirty(folio);
@@ -2017,7 +2019,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
pte_t swp_pte;
 
if (arch_unmap_one(mm, vma, address, pteval) < 0) {
-   set_pte_at(mm, address, pvmw.pte, pteval);
+   if (folio_test_hugetlb(folio))
+   set_huge_pte_at(mm, address, pvmw.pte, pteval);
+   else
+   set_pte_at(mm, address, pvmw.pte, pteval);
ret = false;
page_vma_mapped_walk_done();
break;
@@ -2026,7 +2031,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
   !anon_exclusive, subpage);
if (anon_exclusive &&
page_try_share_anon_rmap(subpage)) {
-   set_pte_at(mm, address, pvmw.pte, pteval);
+   if (folio_test_hugetlb(folio))
+   set_huge_pte_at(mm, address, pvmw.pte, pteval);
+   else
+   set_pte_at(mm, address, pvmw.pte, pteval);
ret = false;
page_vma_mapped_walk_done();
break;
@@ -2052,7 +2060,11 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
swp_pte = pte_swp_mksoft_di

[PATCH v3 1/3] mm: change huge_ptep_clear_flush() to return the original pte

2022-05-09 Thread Baolin Wang
It is incorrect to use ptep_clear_flush() to nuke a hugetlb page table
entry when unmapping or migrating a hugetlb page; the following patches
change these paths to use huge_ptep_clear_flush() instead.

So this is a preparation patch, which changes huge_ptep_clear_flush()
to return the original pte, so that callers can use it when nuking a
hugetlb page table entry.

Signed-off-by: Baolin Wang 
Acked-by: Mike Kravetz 
Reviewed-by: Muchun Song 
---
 arch/arm64/include/asm/hugetlb.h   |  4 ++--
 arch/arm64/mm/hugetlbpage.c        | 12 +---
 arch/ia64/include/asm/hugetlb.h    |  4 ++--
 arch/mips/include/asm/hugetlb.h    |  9 ++---
 arch/parisc/include/asm/hugetlb.h  |  4 ++--
 arch/powerpc/include/asm/hugetlb.h |  9 ++---
 arch/s390/include/asm/hugetlb.h    |  6 +++---
 arch/sh/include/asm/hugetlb.h      |  4 ++--
 arch/sparc/include/asm/hugetlb.h   |  4 ++--
 include/asm-generic/hugetlb.h      |  4 ++--
 10 files changed, 32 insertions(+), 28 deletions(-)

diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 1242f71..616b2ca 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -39,8 +39,8 @@ extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
unsigned long addr, pte_t *ptep);
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
-extern void huge_ptep_clear_flush(struct vm_area_struct *vma,
- unsigned long addr, pte_t *ptep);
+extern pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+  unsigned long addr, pte_t *ptep);
 #define __HAVE_ARCH_HUGE_PTE_CLEAR
 extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
   pte_t *ptep, unsigned long sz);
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index cbace1c..ca8e65c 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -486,19 +486,17 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot));
 }
 
-void huge_ptep_clear_flush(struct vm_area_struct *vma,
-  unsigned long addr, pte_t *ptep)
+pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+   unsigned long addr, pte_t *ptep)
 {
size_t pgsize;
int ncontig;
 
-   if (!pte_cont(READ_ONCE(*ptep))) {
-   ptep_clear_flush(vma, addr, ptep);
-   return;
-   }
+   if (!pte_cont(READ_ONCE(*ptep)))
+   return ptep_clear_flush(vma, addr, ptep);
 
ncontig = find_num_contig(vma->vm_mm, addr, ptep, );
-   clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
+   return get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
 }
 
 static int __init hugetlbpage_init(void)
diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
index 7e46ebd..65d3811 100644
--- a/arch/ia64/include/asm/hugetlb.h
+++ b/arch/ia64/include/asm/hugetlb.h
@@ -23,8 +23,8 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,
 #define is_hugepage_only_range is_hugepage_only_range
 
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
-static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
-unsigned long addr, pte_t *ptep)
+static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
 {
 }
 
diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
index c214440..fd69c88 100644
--- a/arch/mips/include/asm/hugetlb.h
+++ b/arch/mips/include/asm/hugetlb.h
@@ -43,16 +43,19 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 }
 
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
-static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
-unsigned long addr, pte_t *ptep)
+static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
 {
+   pte_t pte;
+
/*
 * clear the huge pte entry firstly, so that the other smp threads will
 * not get old pte entry after finishing flush_tlb_page and before
 * setting new huge pte entry
 */
-   huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
+   pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
flush_tlb_page(vma, addr);
+   return pte;
 }
 
 #define __HAVE_ARCH_HUGE_PTE_NONE
diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h
index a69cf9e..25bc560 100644
--- a/arch/parisc/include/asm/hugetlb.h
+++ b/arch/parisc/include/asm/hugetlb.h
@@ -28,8 +28,8 @@ static inline int prepare_hugepage_range(struct file *file,
 }
 
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
-static inline void huge_ptep_cle
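
For reference, the arm64 change above relies on get_clear_flush() folding the dirty and young state of every entry in a contiguous range into the pte it returns, which is also the behavior Mike describes later in these threads. Below is a minimal userspace model of that idea, not kernel code: struct model_pte and model_get_clear_flush() are hypothetical names, a pte is reduced to three flags, and the TLB flush is only a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one page table entry. */
struct model_pte {
	bool present;
	bool dirty;
	bool young;
};

/*
 * Clear ncontig consecutive entries and return a copy of the first one
 * whose dirty and young flags are the OR of all cleared entries, so no
 * hardware-set state is lost when the range is torn down.
 */
static struct model_pte model_get_clear_flush(struct model_pte *ptep,
					      size_t ncontig)
{
	struct model_pte orig;

	assert(ncontig > 0);
	orig = ptep[0];
	for (size_t i = 0; i < ncontig; i++) {
		orig.dirty = orig.dirty || ptep[i].dirty;
		orig.young = orig.young || ptep[i].young;
		ptep[i] = (struct model_pte){ .present = false };
	}
	/* In the kernel a TLB flush of the whole range would follow here. */
	return orig;
}
```

A caller such as try_to_migrate_one() can then test the returned value once, for example to decide whether the folio must be marked dirty.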

[PATCH v3 0/3] Fix CONT-PTE/PMD size hugetlb issue when unmapping or migrating

2022-05-09 Thread Baolin Wang
Hi,

Currently, when migrating a hugetlb page or unmapping a poisoned
hugetlb page, we use ptep_clear_flush() and set_pte_at() to nuke the
page table entry and remap it. This is incorrect for CONT-PTE or
CONT-PMD size hugetlb pages and can cause data consistency issues.
This patch set switches to the hugetlb-specific APIs to fix this
issue; please find the details in each patch. Thanks.

Note: Mike pointed out that huge_ptep_get() only returns one specific value
and does not take into account the dirty or young bits of CONT-PTE/PMDs the
way huge_ptep_get_and_clear() does [1]. This inconsistency is not introduced
by this patch set and will be addressed in another thread [2]. Meanwhile, the
uffd for hugetlb case [3] pointed out by Gerald also needs another patch to
address.

[1] https://lore.kernel.org/linux-mm/85bd80b4-b4fd-0d3f-a2e5-149559f2f...@oracle.com/
[2] https://lore.kernel.org/all/cover.1651998586.git.baolin.w...@linux.alibaba.com/
[3] https://lore.kernel.org/linux-mm/20220503120343.6264e126@thinkpad/

Changes from v2:
 - Collect reviewed tags from Muchun and Mike.
 - Drop the unnecessary casting in hugetlb.c.
 - Fix build errors by adding dummy functions for !CONFIG_HUGETLB_PAGE.

Changes from v1:
 - Add acked tag from Mike.
 - Update some commit messages.
 - Add VM_BUG_ON in try_to_unmap() for hugetlb case.
 - Add an explicit void cast for huge_ptep_clear_flush() in hugetlb.c.

Baolin Wang (3):
  mm: change huge_ptep_clear_flush() to return the original pte
  mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration
  mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping

 arch/arm64/include/asm/hugetlb.h   |  4 +--
 arch/arm64/mm/hugetlbpage.c| 12 +++-
 arch/ia64/include/asm/hugetlb.h|  4 +--
 arch/mips/include/asm/hugetlb.h|  9 --
 arch/parisc/include/asm/hugetlb.h  |  4 +--
 arch/powerpc/include/asm/hugetlb.h |  9 --
 arch/s390/include/asm/hugetlb.h|  6 ++--
 arch/sh/include/asm/hugetlb.h  |  4 +--
 arch/sparc/include/asm/hugetlb.h   |  4 +--
 include/asm-generic/hugetlb.h  |  4 +--
 include/linux/hugetlb.h| 11 +++
 mm/rmap.c  | 63 --
 12 files changed, 83 insertions(+), 51 deletions(-)

-- 
1.8.3.1



Re: [PATCH v2 1/3] mm: change huge_ptep_clear_flush() to return the original pte

2022-05-09 Thread Baolin Wang




On 5/10/2022 4:02 AM, Mike Kravetz wrote:

On 5/9/22 01:46, Baolin Wang wrote:



On 5/9/2022 1:46 PM, Christophe Leroy wrote:



Le 08/05/2022 à 15:09, Baolin Wang a écrit :



On 5/8/2022 7:09 PM, Muchun Song wrote:

On Sun, May 08, 2022 at 05:36:39PM +0800, Baolin Wang wrote:

It is incorrect to use ptep_clear_flush() to nuke a hugetlb page
table when unmapping or migrating a hugetlb page; the following
patches will change this to use huge_ptep_clear_flush() instead.

So this is a preparation patch, which changes huge_ptep_clear_flush()
to return the original pte to help nuke a hugetlb page table.

Signed-off-by: Baolin Wang 
Acked-by: Mike Kravetz 


Reviewed-by: Muchun Song 


Thanks for reviewing.



But one nit below:

[...]

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8605d7e..61a21af 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5342,7 +5342,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
    ClearHPageRestoreReserve(new_page);
    /* Break COW or unshare */
-    huge_ptep_clear_flush(vma, haddr, ptep);
+    (void)huge_ptep_clear_flush(vma, haddr, ptep);


Why add a "(void)" here? Is there any warning if no "(void)"?
IIUC, I think we can remove this, right?


I did not see any warning without the cast; I added it per Mike's
comment [1] to keep the code consistent with other places in hugetlb.c
that explicitly cast return values to void.

[1]
https://lore.kernel.org/all/495c4ebe-a5b4-afb6-4cb0-956c1b18d...@oracle.com/



As far as I understand, Mike said that it should be accompanied by a
big fat comment explaining why we ignore the value returned from
huge_ptep_clear_flush().

By the way, huge_ptep_clear_flush() is not declared 'must_check', so this
cast is just visual pollution and should be removed.

In the meantime the comment suggested by Mike should be added instead.

Sorry for my misunderstanding. I just followed the explicit void casting
used in other places in hugetlb.c. I am not sure it is useful to add a
comment, since we do not need the original pte value in the COW case
where a new page is mapped, and I think the code is already readable.

Mike, could you help clarify what comment you would like, and whether I
should remove the explicit void cast? Thanks.



Sorry for the confusion.

In the original commit, it seemed odd to me that the signature of the
function was changing and there was not an associated change to the only
caller of the function.  I did suggest casting to void or adding a comment.
As Christophe mentions, the cast to void is not necessary.  In addition,
there really isn't a need for a comment as the calling code is not changed.


OK. Will drop the casting in next version.



The original version of the commit without either is actually preferable.
The commit message does say this is a preparation patch and the return
value will be used in later patches.


OK. Thanks, Mike, for clarifying. Also thanks to Muchun and Christophe.


Re: [PATCH 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping

2022-05-09 Thread Baolin Wang




On 5/10/2022 12:41 AM, Peter Xu wrote:

On Fri, May 06, 2022 at 12:07:13PM -0700, Mike Kravetz wrote:

On 5/3/22 03:03, Gerald Schaefer wrote:

On Tue, 3 May 2022 10:19:46 +0800
Baolin Wang  wrote:

On 5/2/2022 10:02 PM, Gerald Schaefer wrote:


[...]


Please see the previous code: we use the original pte value to check
whether it is uffd-wp armed, and whether we need to mark the folio dirty,
even though hugetlbfs uses noop_dirty_folio().

pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);


Uh, ok, that wouldn't work on s390, but we also don't have
CONFIG_PTE_MARKER_UFFD_WP / HAVE_ARCH_USERFAULTFD_WP set, so
I guess we will be fine (for now).

Still, I find it a bit unsettling that pte_install_uffd_wp_if_needed()
would work on a potential hugetlb *pte, directly de-referencing it
instead of using huge_ptep_get().

The !pte_none(*pte) check at the beginning would be broken in the
hugetlb case for s390 (not sure about other archs, but I think s390
might be the only exception strictly requiring huge_ptep_get()
for de-referencing hugetlb *pte pointers).


We could have used is_vm_hugetlb_page(vma) within the helper so as to
properly use either generic pte or hugetlb version of pte fetching.  We may
want to conditionally do set_[huge_]pte_at() too at the end.

I could prepare a patch for that even if it's not really anything urgently
needed. I assume that won't need to block this patchset since we need the
pteval for pte_dirty() check anyway and uffd-wp definitely needs it too.


OK. Thanks Peter.


Re: [PATCH v2 1/3] mm: change huge_ptep_clear_flush() to return the original pte

2022-05-09 Thread Baolin Wang




On 5/9/2022 1:46 PM, Christophe Leroy wrote:



Le 08/05/2022 à 15:09, Baolin Wang a écrit :



On 5/8/2022 7:09 PM, Muchun Song wrote:

On Sun, May 08, 2022 at 05:36:39PM +0800, Baolin Wang wrote:

It is incorrect to use ptep_clear_flush() to nuke a hugetlb page
table when unmapping or migrating a hugetlb page; the following
patches will change this to use huge_ptep_clear_flush() instead.

So this is a preparation patch, which changes huge_ptep_clear_flush()
to return the original pte to help nuke a hugetlb page table.

Signed-off-by: Baolin Wang 
Acked-by: Mike Kravetz 


Reviewed-by: Muchun Song 


Thanks for reviewing.



But one nit below:

[...]

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8605d7e..61a21af 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5342,7 +5342,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
   ClearHPageRestoreReserve(new_page);
   /* Break COW or unshare */
-    huge_ptep_clear_flush(vma, haddr, ptep);
+    (void)huge_ptep_clear_flush(vma, haddr, ptep);


Why add a "(void)" here? Is there any warning if no "(void)"?
IIUC, I think we can remove this, right?


I did not see any warning without the cast; I added it per Mike's
comment [1] to keep the code consistent with other places in hugetlb.c
that explicitly cast return values to void.

[1]
https://lore.kernel.org/all/495c4ebe-a5b4-afb6-4cb0-956c1b18d...@oracle.com/



As far as I understand, Mike said that it should be accompanied by a
big fat comment explaining why we ignore the value returned from
huge_ptep_clear_flush().

By the way, huge_ptep_clear_flush() is not declared 'must_check', so this
cast is just visual pollution and should be removed.

In the meantime the comment suggested by Mike should be added instead.

Sorry for my misunderstanding. I just followed the explicit void casting
used in other places in hugetlb.c. I am not sure it is useful to add a
comment like the one below, since we do not need the original pte value
in the COW case where a new page is mapped, and I think the code is
already readable.


Mike, could you help clarify what comment you would like, and whether I
should remove the explicit void cast? Thanks.


/*
 * Just ignore the return value with new page mapped.
 */


Re: [PATCH v2 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration

2022-05-08 Thread Baolin Wang

Hi,

On 5/8/2022 8:01 PM, kernel test robot wrote:

Hi Baolin,

I love your patch! Yet something to improve:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on next-20220506]
[cannot apply to hnaz-mm/master arm64/for-next/core linus/master v5.18-rc5]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Baolin-Wang/Fix-CONT-PTE-PMD-size-hugetlb-issue-when-unmapping-or-migrating/20220508-174036
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
config: x86_64-randconfig-a013 (https://download.01.org/0day-ci/archive/20220508/202205081910.mstoc5rj-...@intel.com/config)
compiler: gcc-11 (Debian 11.2.0-20) 11.2.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/907981b27213707fdb2f8a24c107d6752a09a773
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Baolin-Wang/Fix-CONT-PTE-PMD-size-hugetlb-issue-when-unmapping-or-migrating/20220508-174036
        git checkout 907981b27213707fdb2f8a24c107d6752a09a773
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

mm/rmap.c: In function 'try_to_migrate_one':
mm/rmap.c:1931:34: error: implicit declaration of function 'huge_ptep_clear_flush'; did you mean 'ptep_clear_flush'? [-Werror=implicit-function-declaration]
 1931 |                         pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
      |                                  ^
      |                                  ptep_clear_flush
mm/rmap.c:1931:34: error: incompatible types when assigning to type 'pte_t' from type 'int'
mm/rmap.c:2023:41: error: implicit declaration of function 'set_huge_pte_at'; did you mean 'set_huge_swap_pte_at'? [-Werror=implicit-function-declaration]
 2023 |                                         set_huge_pte_at(mm, address, pvmw.pte, pteval);
      |                                         ^~~
      |                                         set_huge_swap_pte_at
cc1: some warnings being treated as errors


Thanks for reporting. I think I should add some dummy functions to
hugetlb.h for the case where CONFIG_HUGETLB_PAGE is not selected. With
the changes below, the build passes with your config file.


diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 306d6ef..9f71043 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1093,6 +1093,17 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr
 pte_t *ptep, pte_t pte, unsigned long sz)
 {
 }
+
+static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
+{
+   return ptep_get(ptep);
+}
+
+static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
+  pte_t *ptep, pte_t pte)
+{
+}
 #endif /* CONFIG_HUGETLB_PAGE */
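
The stubs above follow a common kernel pattern: when a feature is compiled out, provide static inline functions with the same signatures so that shared callers such as mm/rmap.c still build, while the feature-specific branch is not expected to be taken at runtime. A minimal userspace sketch of the pattern, assuming a hypothetical CONFIG_MODEL_HUGETLB switch and hypothetical model_* helpers in place of the real kernel interfaces:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical switch standing in for CONFIG_HUGETLB_PAGE. */
/* #define CONFIG_MODEL_HUGETLB 1 */

typedef struct { unsigned long val; } model_pte_t;

#ifdef CONFIG_MODEL_HUGETLB
/* Real implementation would live elsewhere when the feature is built in. */
model_pte_t model_huge_clear_flush(model_pte_t *ptep);
static inline bool model_is_hugetlb(void) { return true; }
#else
/* Stub with the same signature: lets callers compile without the feature. */
static inline model_pte_t model_huge_clear_flush(model_pte_t *ptep)
{
	return *ptep;
}
static inline bool model_is_hugetlb(void) { return false; }
#endif

/* Generic path: return the old entry and clear it. */
static model_pte_t model_clear_flush(model_pte_t *ptep)
{
	model_pte_t old = *ptep;

	ptep->val = 0;
	return old;
}

/* Caller loosely modeled on try_to_migrate_one(): builds in both configs. */
static model_pte_t model_nuke(model_pte_t *ptep)
{
	if (model_is_hugetlb())
		return model_huge_clear_flush(ptep);
	return model_clear_flush(ptep);
}
```

With CONFIG_MODEL_HUGETLB left undefined, model_nuke() compiles against the stub but always takes the generic path.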


Re: [PATCH v2 1/3] mm: change huge_ptep_clear_flush() to return the original pte

2022-05-08 Thread Baolin Wang




On 5/8/2022 7:09 PM, Muchun Song wrote:

On Sun, May 08, 2022 at 05:36:39PM +0800, Baolin Wang wrote:

It is incorrect to use ptep_clear_flush() to nuke a hugetlb page
table when unmapping or migrating a hugetlb page; the following
patches will change this to use huge_ptep_clear_flush() instead.

So this is a preparation patch, which changes huge_ptep_clear_flush()
to return the original pte to help nuke a hugetlb page table.

Signed-off-by: Baolin Wang 
Acked-by: Mike Kravetz 


Reviewed-by: Muchun Song 


Thanks for reviewing.



But one nit below:

[...]

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8605d7e..61a21af 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5342,7 +5342,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 ClearHPageRestoreReserve(new_page);
 
 /* Break COW or unshare */
-   huge_ptep_clear_flush(vma, haddr, ptep);
+   (void)huge_ptep_clear_flush(vma, haddr, ptep);


Why add a "(void)" here? Is there any warning if no "(void)"?
IIUC, I think we can remove this, right?


I did not see any warning without the cast; I added it per Mike's
comment [1] to keep the code consistent with other places in hugetlb.c
that explicitly cast return values to void.


[1] https://lore.kernel.org/all/495c4ebe-a5b4-afb6-4cb0-956c1b18d...@oracle.com/


[PATCH v2 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration

2022-05-08 Thread Baolin Wang
Some architectures (like ARM64) support CONT-PTE/PMD size hugetlb,
which means they support not only PMD/PUD size hugetlb (2M and 1G),
but also CONT-PTE/PMD size hugetlb (64K and 32M) when a 4K base page
size is used.

When migrating a hugetlb page, we will get the relevant page table
entry by huge_pte_offset() only once to nuke it and remap it with
a migration pte entry. This is correct for PMD or PUD size hugetlb,
since they always contain only one pmd entry or pud entry in the
page table.

However, this is incorrect for CONT-PTE and CONT-PMD size hugetlb,
since they can contain several contiguous pte or pmd entries with
the same page table attributes. So we will nuke or remap only one pte
or pmd entry for a CONT-PTE/PMD size hugetlb page, which is not what
hugetlb migration expects. The problem is that the data in the other
subpages of the hugetlb page can still be modified while the page is
being migrated, which can cause a serious data consistency issue,
since we did not nuke the page table entries and set migration ptes
for those subpages.

To fix this issue, use huge_ptep_clear_flush() to nuke the hugetlb
page table, and remap it with set_huge_pte_at() and
set_huge_swap_pte_at() when migrating a hugetlb page; these helpers
already take CONT-PTE and CONT-PMD size hugetlb into account.

Signed-off-by: Baolin Wang 
---
 mm/rmap.c | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 6fdd198..7cf2408 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1924,13 +1924,15 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
break;
}
}
+
+   /* Nuke the hugetlb page table entry */
+   pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
} else {
flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
+   /* Nuke the page table entry. */
+   pteval = ptep_clear_flush(vma, address, pvmw.pte);
}
 
-   /* Nuke the page table entry. */
-   pteval = ptep_clear_flush(vma, address, pvmw.pte);
-
/* Set the dirty flag on the folio now the pte is gone. */
if (pte_dirty(pteval))
folio_mark_dirty(folio);
@@ -2015,7 +2017,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
pte_t swp_pte;
 
if (arch_unmap_one(mm, vma, address, pteval) < 0) {
-   set_pte_at(mm, address, pvmw.pte, pteval);
+   if (folio_test_hugetlb(folio))
+   set_huge_pte_at(mm, address, pvmw.pte, pteval);
+   else
+   set_pte_at(mm, address, pvmw.pte, pteval);
ret = false;
page_vma_mapped_walk_done();
break;
@@ -2024,7 +2029,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
   !anon_exclusive, subpage);
if (anon_exclusive &&
page_try_share_anon_rmap(subpage)) {
-   set_pte_at(mm, address, pvmw.pte, pteval);
+   if (folio_test_hugetlb(folio))
+   set_huge_pte_at(mm, address, pvmw.pte, pteval);
+   else
+   set_pte_at(mm, address, pvmw.pte, pteval);
ret = false;
page_vma_mapped_walk_done();
break;
@@ -2050,7 +2058,11 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
swp_pte = pte_swp_mksoft_dirty(swp_pte);
if (pte_uffd_wp(pteval))
swp_pte = pte_swp_mkuffd_wp(swp_pte);
-   set_pte_at(mm, address, pvmw.pte, swp_pte);
+   if (folio_test_hugetlb(folio))
+   set_huge_swap_pte_at(mm, address, pvmw.pte,
+   swp_pte, vma_mmu_pagesize(vma));
+   else
+   set_pte_at(mm, address, pvmw.pte, swp_pte);
trace_set_migration_pte(address, pte_val(swp_pte),
 compound_order(&folio->page));
/*
-- 
1.8.3.1
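
The failure mode described in this patch can be illustrated with a small userspace model, assuming a 64K CONT-PTE hugetlb page backed by 16 entries: clearing only the single entry found by huge_pte_offset() (the ptep_clear_flush() behavior) leaves 15 subpages still mapped during migration, while clearing the whole contiguous range (what huge_ptep_clear_flush() is meant to do) leaves none. All names are hypothetical, and TLB flushing and locking are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CONT_PTES 16	/* assumed: 64K hugetlb page, 4K base pages */

/* Hypothetical model of one pte: only the "mapped" state matters here. */
struct model_entry {
	bool mapped;
};

/* Models ptep_clear_flush(): acts only on the entry it is given. */
static void model_clear_one(struct model_entry *ptep)
{
	ptep->mapped = false;
}

/* Models huge_ptep_clear_flush() on a contiguous range of entries. */
static void model_clear_cont(struct model_entry *ptep, size_t ncontig)
{
	for (size_t i = 0; i < ncontig; i++)
		ptep[i].mapped = false;
}

/* Count subpages that could still be written while migration runs. */
static size_t model_still_mapped(const struct model_entry *ptep, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		count += ptep[i].mapped;
	return count;
}
```

The same reasoning applies to the remap path: restoring only one entry with set_pte_at() would leave the rest of the range unmapped, which is why the patch switches to set_huge_pte_at() there as well.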



[PATCH v2 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping

2022-05-08 Thread Baolin Wang
Some architectures (like ARM64) support CONT-PTE/PMD size hugetlb,
which means they support not only PMD/PUD size hugetlb (2M and 1G),
but also CONT-PTE/PMD size hugetlb (64K and 32M) when a 4K base page
size is used.

When unmapping a hugetlb page, we will get the relevant page table
entry by huge_pte_offset() only once to nuke it. This is correct
for PMD or PUD size hugetlb, since they always contain only one
pmd entry or pud entry in the page table.

However, this is incorrect for CONT-PTE and CONT-PMD size hugetlb,
since they can contain several contiguous pte or pmd entries with
the same page table attributes, so we will nuke only one pte or pmd
entry for a CONT-PTE/PMD size hugetlb page.

Currently, try_to_unmap() is only passed a hugetlb page when the
hugetlb page is poisoned. This means we unmap only one pte entry for
a CONT-PTE or CONT-PMD size poisoned hugetlb page, while the other
subpages of that poisoned page remain accessible, which can cause
serious issues.

So use huge_ptep_clear_flush(), which already takes CONT-PTE and
CONT-PMD size hugetlb into account, to nuke the hugetlb page table
and fix this issue.

We already use set_huge_swap_pte_at() to set a poisoned swap entry
for a poisoned hugetlb page. Also add a VM_BUG_ON() in try_to_unmap()
to make sure the hugetlb page passed in is poisoned.

Signed-off-by: Baolin Wang 
---
 mm/rmap.c | 39 ++-
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 7cf2408..37c8fd2 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1530,6 +1530,11 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 
if (folio_test_hugetlb(folio)) {
/*
+* The try_to_unmap() is only passed a hugetlb page
+* in the case where the hugetlb page is poisoned.
+*/
+   VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage);
+   /*
 * huge_pmd_unshare may unmap an entire PMD page.
 * There is no way of knowing exactly which PMDs may
 * be cached for this mm, so we must flush them all.
@@ -1564,28 +1569,28 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
break;
}
}
+   pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
} else {
flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
-   }
-
-   /*
-* Nuke the page table entry. When having to clear
-* PageAnonExclusive(), we always have to flush.
-*/
-   if (should_defer_flush(mm, flags) && !anon_exclusive) {
/*
-* We clear the PTE but do not flush so potentially
-* a remote CPU could still be writing to the folio.
-* If the entry was previously clean then the
-* architecture must guarantee that a clear->dirty
-* transition on a cached TLB entry is written through
-* and traps if the PTE is unmapped.
+* Nuke the page table entry. When having to clear
+* PageAnonExclusive(), we always have to flush.
 */
-   pteval = ptep_get_and_clear(mm, address, pvmw.pte);
+   if (should_defer_flush(mm, flags) && !anon_exclusive) {
+   /*
+* We clear the PTE but do not flush so potentially
+* a remote CPU could still be writing to the folio.
+* If the entry was previously clean then the
+* architecture must guarantee that a clear->dirty
+* transition on a cached TLB entry is written through
+* and traps if the PTE is unmapped.
+*/
+   pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-   set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
-   } else {
-   pteval = ptep_clear_flush(vma, address, pvmw.pte);
+   set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
+   } else {
+   pteval = ptep_clear_flush(vma, address, pvmw.pte);
+   }
}
 
/*
-- 
1.8.3.1
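
The sizes quoted in these commit messages follow from the arm64 contiguous-hint layout with a 4K granule, where a contiguous range is assumed to be 16 PTEs (16 * 4K = 64K) or 16 PMDs (16 * 2M = 32M), while PMD and PUD size hugetlb pages use a single entry. A small sketch of that arithmetic; the MODEL_* constants and the helper are hypothetical and only cover the 4K-granule case (other granules use different counts).

```c
#include <assert.h>

/* Assumed arm64 values for a 4K base page size. */
#define MODEL_PAGE_SIZE  (4UL << 10)              /* 4K */
#define MODEL_PMD_SIZE   (MODEL_PAGE_SIZE * 512)  /* 2M */
#define MODEL_PUD_SIZE   (MODEL_PMD_SIZE * 512)   /* 1G */
#define MODEL_CONT_PTES  16UL
#define MODEL_CONT_PMDS  16UL

/* Number of page table entries backing one hugetlb page of size sz. */
static unsigned long model_entries_per_hugepage(unsigned long sz)
{
	if (sz == MODEL_PUD_SIZE || sz == MODEL_PMD_SIZE)
		return 1;			/* single pud or pmd entry */
	if (sz == MODEL_CONT_PTES * MODEL_PAGE_SIZE)
		return MODEL_CONT_PTES;		/* 64K: 16 contiguous ptes */
	if (sz == MODEL_CONT_PMDS * MODEL_PMD_SIZE)
		return MODEL_CONT_PMDS;		/* 32M: 16 contiguous pmds */
	return 0;				/* size not modeled */
}
```

Any size for which this returns more than 1 is exactly the case where touching only the entry returned by huge_pte_offset() is not enough.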



[PATCH v2 1/3] mm: change huge_ptep_clear_flush() to return the original pte

2022-05-08 Thread Baolin Wang
It is incorrect to use ptep_clear_flush() to nuke a hugetlb page
table when unmapping or migrating a hugetlb page; the following
patches will change this to use huge_ptep_clear_flush() instead.

So this is a preparation patch, which changes huge_ptep_clear_flush()
to return the original pte to help nuke a hugetlb page table.

Signed-off-by: Baolin Wang 
Acked-by: Mike Kravetz 
---
 arch/arm64/include/asm/hugetlb.h   |  4 ++--
 arch/arm64/mm/hugetlbpage.c| 12 +---
 arch/ia64/include/asm/hugetlb.h|  4 ++--
 arch/mips/include/asm/hugetlb.h|  9 ++---
 arch/parisc/include/asm/hugetlb.h  |  4 ++--
 arch/powerpc/include/asm/hugetlb.h |  9 ++---
 arch/s390/include/asm/hugetlb.h|  6 +++---
 arch/sh/include/asm/hugetlb.h  |  4 ++--
 arch/sparc/include/asm/hugetlb.h   |  4 ++--
 include/asm-generic/hugetlb.h  |  4 ++--
 mm/hugetlb.c   |  2 +-
 11 files changed, 33 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 1242f71..616b2ca 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -39,8 +39,8 @@ extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
unsigned long addr, pte_t *ptep);
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
-extern void huge_ptep_clear_flush(struct vm_area_struct *vma,
- unsigned long addr, pte_t *ptep);
+extern pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+  unsigned long addr, pte_t *ptep);
 #define __HAVE_ARCH_HUGE_PTE_CLEAR
 extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
   pte_t *ptep, unsigned long sz);
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index cbace1c..ca8e65c 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -486,19 +486,17 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot));
 }
 
-void huge_ptep_clear_flush(struct vm_area_struct *vma,
-  unsigned long addr, pte_t *ptep)
+pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+   unsigned long addr, pte_t *ptep)
 {
size_t pgsize;
int ncontig;
 
-   if (!pte_cont(READ_ONCE(*ptep))) {
-   ptep_clear_flush(vma, addr, ptep);
-   return;
-   }
+   if (!pte_cont(READ_ONCE(*ptep)))
+   return ptep_clear_flush(vma, addr, ptep);
 
 ncontig = find_num_contig(vma->vm_mm, addr, ptep, &pgsize);
-   clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
+   return get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
 }
 
 static int __init hugetlbpage_init(void)
diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
index 7e46ebd..65d3811 100644
--- a/arch/ia64/include/asm/hugetlb.h
+++ b/arch/ia64/include/asm/hugetlb.h
@@ -23,8 +23,8 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,
 #define is_hugepage_only_range is_hugepage_only_range
 
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
-static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
-unsigned long addr, pte_t *ptep)
+static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
 {
 }
 
diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
index c214440..fd69c88 100644
--- a/arch/mips/include/asm/hugetlb.h
+++ b/arch/mips/include/asm/hugetlb.h
@@ -43,16 +43,19 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 }
 
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
-static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
-unsigned long addr, pte_t *ptep)
+static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
 {
+   pte_t pte;
+
/*
 * clear the huge pte entry firstly, so that the other smp threads will
 * not get old pte entry after finishing flush_tlb_page and before
 * setting new huge pte entry
 */
-   huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
+   pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
flush_tlb_page(vma, addr);
+   return pte;
 }
 
 #define __HAVE_ARCH_HUGE_PTE_NONE
diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h
index a69cf9e..25bc560 100644
--- a/arch/parisc/include/asm/hugetlb.h
+++ b/arch/parisc/include/asm/hugetlb.h
@@ -28,8 +28,8 @@ static inline int prepare_hugepage_range(struct file *file,
 }
 
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
-static inline void huge_p

[PATCH v2 0/3] Fix CONT-PTE/PMD size hugetlb issue when unmapping or migrating

2022-05-08 Thread Baolin Wang
Hi,

Currently, when migrating a hugetlb page or unmapping a poisoned
hugetlb page, we use ptep_clear_flush() and set_pte_at() to nuke the
page table entry and remap it. This is incorrect for CONT-PTE or
CONT-PMD size hugetlb pages and can cause data consistency issues.
This patch set switches to the hugetlb-specific APIs to fix this
issue; please find the details in each patch. Thanks.

Note: Mike pointed out that huge_ptep_get() only returns one specific value
and does not take into account the dirty or young bits of CONT-PTE/PMDs the
way huge_ptep_get_and_clear() does [1]. This inconsistency is not introduced
by this patch set and will be addressed in another thread [2]. Meanwhile, the
uffd for hugetlb case [3] pointed out by Gerald also needs another patch to
address.

[1] https://lore.kernel.org/linux-mm/85bd80b4-b4fd-0d3f-a2e5-149559f2f...@oracle.com/
[2] https://lore.kernel.org/all/cover.1651998586.git.baolin.w...@linux.alibaba.com/
[3] https://lore.kernel.org/linux-mm/20220503120343.6264e126@thinkpad/

Changes from v1:
 - Add acked tag from Mike.
 - Update some commit messages.
 - Add VM_BUG_ON in try_to_unmap() for hugetlb case.
 - Add an explicit void cast for huge_ptep_clear_flush() in hugetlb.c.

Baolin Wang (3):
  mm: change huge_ptep_clear_flush() to return the original pte
  mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration
  mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping

 arch/arm64/include/asm/hugetlb.h   |  4 +--
 arch/arm64/mm/hugetlbpage.c| 12 +++-
 arch/ia64/include/asm/hugetlb.h|  4 +--
 arch/mips/include/asm/hugetlb.h|  9 --
 arch/parisc/include/asm/hugetlb.h  |  4 +--
 arch/powerpc/include/asm/hugetlb.h |  9 --
 arch/s390/include/asm/hugetlb.h|  6 ++--
 arch/sh/include/asm/hugetlb.h  |  4 +--
 arch/sparc/include/asm/hugetlb.h   |  4 +--
 include/asm-generic/hugetlb.h  |  4 +--
 mm/hugetlb.c   |  2 +-
 mm/rmap.c  | 63 --
 12 files changed, 73 insertions(+), 52 deletions(-)

-- 
1.8.3.1



Re: [PATCH 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration

2022-05-08 Thread Baolin Wang




On 5/7/2022 10:33 AM, Baolin Wang wrote:



On 5/7/2022 1:56 AM, Mike Kravetz wrote:

On 5/5/22 20:39, Baolin Wang wrote:


On 5/6/2022 7:53 AM, Mike Kravetz wrote:

On 4/29/22 01:14, Baolin Wang wrote:

On some architectures (like ARM64), it can support CONT-PTE/PMD size
hugetlb, which means it can support not only PMD/PUD size hugetlb:
2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page
size specified.



diff --git a/mm/rmap.c b/mm/rmap.c
index 6fdd198..7cf2408 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1924,13 +1924,15 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
   break;
   }
   }
+
+    /* Nuke the hugetlb page table entry */
+    pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
   } else {
   flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
+    /* Nuke the page table entry. */
+    pteval = ptep_clear_flush(vma, address, pvmw.pte);
   }


On arm64 with CONT-PTE/PMD the returned pteval will have dirty or young set
if ANY of the PTE/PMDs had dirty or young set.


Right.




-    /* Nuke the page table entry. */
-    pteval = ptep_clear_flush(vma, address, pvmw.pte);
-
   /* Set the dirty flag on the folio now the pte is gone. */
   if (pte_dirty(pteval))
   folio_mark_dirty(folio);
@@ -2015,7 +2017,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
   pte_t swp_pte;
     if (arch_unmap_one(mm, vma, address, pteval) < 0) {
-    set_pte_at(mm, address, pvmw.pte, pteval);
+    if (folio_test_hugetlb(folio))
+    set_huge_pte_at(mm, address, pvmw.pte, pteval);


And, we will use that pteval for ALL the PTE/PMDs here.  So, we would set
the dirty or young bit in ALL PTE/PMDs.

Could that cause any issues?  Maybe more of a question for the arm64 people.


I don't think this will cause any issues, since the hugetlb page cannot
be split, and we should not lose the dirty or young state if any subpage
had it set. Meanwhile, we already do the same in hugetlb.c:


pte = huge_ptep_get_and_clear(mm, address, ptep);
tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
if (huge_pte_dirty(pte))
 set_page_dirty(page);



Agree that it 'should not' cause issues.  It just seems inconsistent.
This is not a problem specifically with your patch, just the handling of
CONT-PTE/PMD entries.

There does not appear to be an arm64 specific version of huge_ptep_get()
that takes CONT-PTE/PMD into account.  So, huge_ptep_get() would only
return the one specific value.  It would not take into account the dirty
or young bits of CONT-PTE/PMDs like your new version of
huge_ptep_get_and_clear.  Is that correct?  Or am I missing something?


Yes, you are right.



If I am correct, then code like the following may not work:

static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask,
 unsigned long addr, unsigned long end, struct mm_walk *walk)
{
 pte_t huge_pte = huge_ptep_get(pte);
 struct numa_maps *md;
 struct page *page;

 if (!pte_present(huge_pte))
 return 0;

 page = pte_page(huge_pte);

 md = walk->private;
 gather_stats(page, md, pte_dirty(huge_pte), 1);
 return 0;
}


Right, as you said, this is inconsistent with the current huge_ptep_get()
interface. So I think we can define an arch-specific huge_ptep_get()
interface for arm64, with some sample code like below. What do you think?


After some investigation, I sent out an RFC patch set [1] to address this
issue. We can discuss it in that thread. Thanks.


[1] 
https://lore.kernel.org/all/cover.1651998586.git.baolin.w...@linux.alibaba.com/


Re: [PATCH 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration

2022-05-06 Thread Baolin Wang




On 5/7/2022 1:56 AM, Mike Kravetz wrote:

On 5/5/22 20:39, Baolin Wang wrote:


On 5/6/2022 7:53 AM, Mike Kravetz wrote:

On 4/29/22 01:14, Baolin Wang wrote:

Some architectures (like ARM64) support CONT-PTE/PMD size hugetlb,
which means they support not only PMD/PUD size hugetlb (2M and 1G)
but also CONT-PTE/PMD size hugetlb (64K and 32M) when a 4K page
size is used.



diff --git a/mm/rmap.c b/mm/rmap.c
index 6fdd198..7cf2408 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1924,13 +1924,15 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
   break;
   }
   }
+
+    /* Nuke the hugetlb page table entry */
+    pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
   } else {
   flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
+    /* Nuke the page table entry. */
+    pteval = ptep_clear_flush(vma, address, pvmw.pte);
   }
   


On arm64 with CONT-PTE/PMD the returned pteval will have dirty or young set
if ANY of the PTE/PMDs had dirty or young set.


Right.




-    /* Nuke the page table entry. */
-    pteval = ptep_clear_flush(vma, address, pvmw.pte);
-
   /* Set the dirty flag on the folio now the pte is gone. */
   if (pte_dirty(pteval))
   folio_mark_dirty(folio);
@@ -2015,7 +2017,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
   pte_t swp_pte;
     if (arch_unmap_one(mm, vma, address, pteval) < 0) {
-    set_pte_at(mm, address, pvmw.pte, pteval);
+    if (folio_test_hugetlb(folio))
+    set_huge_pte_at(mm, address, pvmw.pte, pteval);


And, we will use that pteval for ALL the PTE/PMDs here.  So, we would set
the dirty or young bit in ALL PTE/PMDs.

Could that cause any issues?  Maybe more of a question for the arm64 people.


I don't think this will cause any issues, since the hugetlb page cannot be split,
and we should not lose the dirty or young state if any subpage had it set.
Meanwhile, we already do the same in hugetlb.c:

pte = huge_ptep_get_and_clear(mm, address, ptep);
tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
if (huge_pte_dirty(pte))
 set_page_dirty(page);



Agree that it 'should not' cause issues.  It just seems inconsistent.
This is not a problem specifically with your patch, just the handling of
CONT-PTE/PMD entries.

There does not appear to be an arm64 specific version of huge_ptep_get()
that takes CONT-PTE/PMD into account.  So, huge_ptep_get() would only
return the one specific value.  It would not take into account the dirty
or young bits of CONT-PTE/PMDs like your new version of
huge_ptep_get_and_clear.  Is that correct?  Or am I missing something?


Yes, you are right.



If I am correct, then code like the following may not work:

static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask,
 unsigned long addr, unsigned long end, struct mm_walk *walk)
{
 pte_t huge_pte = huge_ptep_get(pte);
 struct numa_maps *md;
 struct page *page;

 if (!pte_present(huge_pte))
 return 0;

 page = pte_page(huge_pte);

 md = walk->private;
 gather_stats(page, md, pte_dirty(huge_pte), 1);
 return 0;
}


Right, as you said, this is inconsistent with the current huge_ptep_get()
interface. So I think we can define an arch-specific huge_ptep_get()
interface for arm64, with some sample code like below. What do you think?


+pte_t huge_ptep_get(pte_t *ptep, unsigned long size)
+{
+   int ncontig, i;
+   pte_t orig_pte = ptep_get(ptep);
+
+   if (!pte_cont(orig_pte))
+   return orig_pte;
+
+   switch (size) {
+   case CONT_PMD_SIZE:
+   ncontig = CONT_PMDS;
+   break;
+   case CONT_PTE_SIZE:
+   ncontig = CONT_PTES;
+   break;
+   default:
+   WARN_ON_ONCE(1);
+   return orig_pte;
+   }
+
+   for (i = 0; i < ncontig; i++, ptep++) {
+   pte_t pte = ptep_get(ptep);
+
+   if (pte_dirty(pte))
+   orig_pte = pte_mkdirty(orig_pte);
+
+   if (pte_young(pte))
+   orig_pte = pte_mkyoung(orig_pte);
+   }
+
+   return orig_pte;
+}
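The user-space model below shows why the head-entry-only read in gather_hugetlb_stats() can miss a dirty subpage, and what the proposed function changes. sim_pte_t, sim_get_head() and sim_huge_ptep_get() are hypothetical stand-ins for pte_t, the current huge_ptep_get() and the proposed arm64 version. The CONT_* values assume a 4K granule. This is a sketch of the logic, not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed arm64 4K-granule geometry (illustrative values). */
#define CONT_PTES	16
#define CONT_PMDS	16
#define CONT_PTE_SIZE	(CONT_PTES * (4UL << 10))	/* 64K */
#define CONT_PMD_SIZE	(CONT_PMDS * (2UL << 20))	/* 32M */

/* Hypothetical simplified entry carrying only the bits of interest. */
typedef struct {
	bool cont;	/* part of a contiguous range */
	bool dirty;
	bool young;
} sim_pte_t;

/* Head-entry-only read, like a plain huge_ptep_get() today. */
static sim_pte_t sim_get_head(const sim_pte_t *ptep)
{
	return ptep[0];
}

/* Aggregating read mirroring the proposal: fold dirty/young of all entries. */
static sim_pte_t sim_huge_ptep_get(const sim_pte_t *ptep, unsigned long size)
{
	sim_pte_t orig = ptep[0];
	int ncontig, i;

	assert(ptep != 0);
	if (!orig.cont)
		return orig;

	switch (size) {
	case CONT_PMD_SIZE:
		ncontig = CONT_PMDS;
		break;
	case CONT_PTE_SIZE:
		ncontig = CONT_PTES;
		break;
	default:
		return orig;	/* the kernel sketch also WARNs here */
	}

	for (i = 0; i < ncontig; i++) {
		if (ptep[i].dirty)
			orig.dirty = true;
		if (ptep[i].young)
			orig.young = true;
	}
	return orig;
}
```

With a single dirty subpage in the middle of a 64K range, sim_get_head() reports the page as clean, while sim_huge_ptep_get() reports it as dirty.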


Re: [PATCH 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping

2022-05-06 Thread Baolin Wang




On 5/7/2022 2:55 AM, Mike Kravetz wrote:

On 4/29/22 01:14, Baolin Wang wrote:

Some architectures (like ARM64) support CONT-PTE/PMD size hugetlb,
which means they support not only PMD/PUD size hugetlb (2M and 1G)
but also CONT-PTE/PMD size hugetlb (64K and 32M) when a 4K page
size is used.

When unmapping a hugetlb page, we get the relevant page table
entry by huge_pte_offset() only once and nuke it. This is correct
for PMD or PUD size hugetlb, since such a page is always mapped by
exactly one pmd or pud entry in the page table.

However, this is incorrect for CONT-PTE and CONT-PMD size hugetlb,
since such a page is mapped by several contiguous pte or pmd entries
with the same page table attributes, so we will nuke only one pte or
pmd entry of this CONT-PTE/PMD size hugetlb page.

And now we only use try_to_unmap() to unmap a poisoned hugetlb page,


Since try_to_unmap can be called for non-hugetlb pages, perhaps the following
is more accurate?

try_to_unmap is only passed a hugetlb page in the case where the
hugetlb page is poisoned.


Yes, will update in next version.


It does concern me that this assumption is built into the code as
pointed out in your discussion with Gerald.  Should we perhaps add
a VM_BUG_ON() to make sure the passed huge page is poisoned?  This
would be in the same 'if block' where we call
adjust_range_if_pmd_sharing_possible.

Good point. Will do in next version. Thanks.


Re: [PATCH 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration

2022-05-05 Thread Baolin Wang




On 5/6/2022 7:53 AM, Mike Kravetz wrote:

On 4/29/22 01:14, Baolin Wang wrote:

On some architectures (like ARM64), it can support CONT-PTE/PMD size
hugetlb, which means it can support not only PMD/PUD size hugetlb:
2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page
size specified.



diff --git a/mm/rmap.c b/mm/rmap.c
index 6fdd198..7cf2408 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1924,13 +1924,15 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
break;
}
}
+
+   /* Nuke the hugetlb page table entry */
+   pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
} else {
flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
+   /* Nuke the page table entry. */
+   pteval = ptep_clear_flush(vma, address, pvmw.pte);
}
  


On arm64 with CONT-PTE/PMD the returned pteval will have dirty or young set
if ANY of the PTE/PMDs had dirty or young set.


Right.




-   /* Nuke the page table entry. */
-   pteval = ptep_clear_flush(vma, address, pvmw.pte);
-
/* Set the dirty flag on the folio now the pte is gone. */
if (pte_dirty(pteval))
folio_mark_dirty(folio);
@@ -2015,7 +2017,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
pte_t swp_pte;
  
   if (arch_unmap_one(mm, vma, address, pteval) < 0) {
-   set_pte_at(mm, address, pvmw.pte, pteval);
+   if (folio_test_hugetlb(folio))
+   set_huge_pte_at(mm, address, pvmw.pte, pteval);


And, we will use that pteval for ALL the PTE/PMDs here.  So, we would set
the dirty or young bit in ALL PTE/PMDs.

Could that cause any issues?  May be more of a question for the arm64 people.


I don't think this will cause any issues, since the hugetlb page cannot be
split, and we should not lose the dirty or young state if any
subpage had it set. Meanwhile, we already do the same in hugetlb.c:


pte = huge_ptep_get_and_clear(mm, address, ptep);
tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
if (huge_pte_dirty(pte))
set_page_dirty(page);


Re: [PATCH 1/3] mm: change huge_ptep_clear_flush() to return the original pte

2022-05-05 Thread Baolin Wang




On 5/6/2022 7:15 AM, Mike Kravetz wrote:

On 4/29/22 01:14, Baolin Wang wrote:

It is incorrect to use ptep_clear_flush() to nuke a hugetlb page
table entry when unmapping or migrating a hugetlb page; the following
patches will change this to use huge_ptep_clear_flush() instead.

So this is a preparation patch, which changes huge_ptep_clear_flush()
to return the original pte to help nuke a hugetlb page table entry.

Signed-off-by: Baolin Wang 
---
  arch/arm64/include/asm/hugetlb.h   |  4 ++--
  arch/arm64/mm/hugetlbpage.c| 12 +---
  arch/ia64/include/asm/hugetlb.h|  4 ++--
  arch/mips/include/asm/hugetlb.h|  9 ++---
  arch/parisc/include/asm/hugetlb.h  |  4 ++--
  arch/powerpc/include/asm/hugetlb.h |  9 ++---
  arch/s390/include/asm/hugetlb.h|  6 +++---
  arch/sh/include/asm/hugetlb.h  |  4 ++--
  arch/sparc/include/asm/hugetlb.h   |  4 ++--
  include/asm-generic/hugetlb.h  |  4 ++--
  10 files changed, 32 insertions(+), 28 deletions(-)


The above changes look straightforward.
I'm happy that you Cc'ed the impacted arch maintainers so they can at least
have a look.

The only user of huge_ptep_clear_flush() today is hugetlb_cow/wp() in
mm/hugetlb.c.  Any reason why you did not change that code?  At least


Because we do not use the return value of huge_ptep_clear_flush() in
mm/hugetlb.c.



cast the return of huge_ptep_clear_flush() to void with a comment?


Sure. I will add an explicit cast in the next version.


Not absolutely necessary.

Acked-by: Mike Kravetz 


Thanks.


Re: [PATCH 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping

2022-05-03 Thread Baolin Wang




On 5/3/2022 6:03 PM, Gerald Schaefer wrote:

On Tue, 3 May 2022 10:19:46 +0800
Baolin Wang  wrote:




On 5/2/2022 10:02 PM, Gerald Schaefer wrote:

On Sat, 30 Apr 2022 11:22:33 +0800
Baolin Wang  wrote:




On 4/30/2022 4:02 AM, Gerald Schaefer wrote:

On Fri, 29 Apr 2022 16:14:43 +0800
Baolin Wang  wrote:


Some architectures (like ARM64) support CONT-PTE/PMD size hugetlb,
which means they support not only PMD/PUD size hugetlb (2M and 1G)
but also CONT-PTE/PMD size hugetlb (64K and 32M) when a 4K page
size is used.

When unmapping a hugetlb page, we get the relevant page table
entry by huge_pte_offset() only once and nuke it. This is correct
for PMD or PUD size hugetlb, since such a page is always mapped by
exactly one pmd or pud entry in the page table.

However, this is incorrect for CONT-PTE and CONT-PMD size hugetlb,
since such a page is mapped by several contiguous pte or pmd entries
with the same page table attributes, so we will nuke only one pte or
pmd entry of this CONT-PTE/PMD size hugetlb page.

And now we only use try_to_unmap() to unmap a poisoned hugetlb page,
which means we will unmap only one pte entry of a CONT-PTE or
CONT-PMD size poisoned hugetlb page, while the other subpages of
that poisoned hugetlb page remain accessible, which can possibly
cause serious issues.

So we should use huge_ptep_clear_flush() to nuke the hugetlb
page table entries and fix this issue, since it already handles
CONT-PTE and CONT-PMD size hugetlb.

Note we've already used set_huge_swap_pte_at() to set a poisoned
swap entry for a poisoned hugetlb page.

Signed-off-by: Baolin Wang 
---
mm/rmap.c | 34 +-
1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 7cf2408..1e168d7 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1564,28 +1564,28 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
break;
}
}
+   pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);


Unlike in your patch 2/3, I do not see that this (huge) pteval would later
be used again with set_huge_pte_at() instead of set_pte_at(). Not sure if
this (huge) pteval could end up at a set_pte_at() later, but if yes, then
this would be broken on s390, and you'd need to use set_huge_pte_at()
instead of set_pte_at() like in your patch 2/3.


IIUC, as I said in the commit message, we only unmap a poisoned
hugetlb page with try_to_unmap(), and the poisoned hugetlb page will be
remapped with a poisoned entry by set_huge_swap_pte_at() in
try_to_unmap_one(). So I think there is no need to change set_pte_at()
to set_huge_pte_at() for the other cases, since a hugetlb page will
never reach them.

if (PageHWPoison(subpage) && !(flags & TTU_IGNORE_HWPOISON)) {
pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
if (folio_test_hugetlb(folio)) {
hugetlb_count_sub(folio_nr_pages(folio), mm);
set_huge_swap_pte_at(mm, address, pvmw.pte, pteval,
 vma_mmu_pagesize(vma));
} else {
dec_mm_counter(mm, mm_counter(&folio->page));
set_pte_at(mm, address, pvmw.pte, pteval);
}

}


OK, but wouldn't the pteval be overwritten here with
pteval = swp_entry_to_pte(make_hwpoison_entry(subpage))?
IOW, what sense does it make to save the returned pteval from
huge_ptep_clear_flush(), when it is never being used anywhere?


Please see the preceding code: we use the original pte value to check
whether it is uffd-wp armed, and whether we need to mark the folio dirty,
even though hugetlbfs uses noop_dirty_folio().

pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);


Uh, ok, that wouldn't work on s390, but we also don't have
CONFIG_PTE_MARKER_UFFD_WP / HAVE_ARCH_USERFAULTFD_WP set, so
I guess we will be fine (for now).


OK.



Still, I find it a bit unsettling that pte_install_uffd_wp_if_needed()
would work on a potential hugetlb *pte, directly de-referencing it
instead of using huge_ptep_get().

The !pte_none(*pte) check at the beginning would be broken in the
hugetlb case for s390 (not sure about other archs, but I think s390
might be the only exception strictly requiring huge_ptep_get()
for de-referencing hugetlb *pte pointers).


Right, I think so too. I'll look at the uffd code in detail; it seems
another patch is needed to fix hugetlb handling in uffd. Thanks for your comments.


Re: [PATCH 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping

2022-05-02 Thread Baolin Wang




On 5/2/2022 10:02 PM, Gerald Schaefer wrote:

On Sat, 30 Apr 2022 11:22:33 +0800
Baolin Wang  wrote:




On 4/30/2022 4:02 AM, Gerald Schaefer wrote:

On Fri, 29 Apr 2022 16:14:43 +0800
Baolin Wang  wrote:


Some architectures (like ARM64) support CONT-PTE/PMD size hugetlb,
which means they support not only PMD/PUD size hugetlb (2M and 1G)
but also CONT-PTE/PMD size hugetlb (64K and 32M) when a 4K page
size is used.

When unmapping a hugetlb page, we get the relevant page table
entry by huge_pte_offset() only once and nuke it. This is correct
for PMD or PUD size hugetlb, since such a page is always mapped by
exactly one pmd or pud entry in the page table.

However, this is incorrect for CONT-PTE and CONT-PMD size hugetlb,
since such a page is mapped by several contiguous pte or pmd entries
with the same page table attributes, so we will nuke only one pte or
pmd entry of this CONT-PTE/PMD size hugetlb page.

And now we only use try_to_unmap() to unmap a poisoned hugetlb page,
which means we will unmap only one pte entry of a CONT-PTE or
CONT-PMD size poisoned hugetlb page, while the other subpages of
that poisoned hugetlb page remain accessible, which can possibly
cause serious issues.

So we should use huge_ptep_clear_flush() to nuke the hugetlb
page table entries and fix this issue, since it already handles
CONT-PTE and CONT-PMD size hugetlb.

Note we've already used set_huge_swap_pte_at() to set a poisoned
swap entry for a poisoned hugetlb page.

Signed-off-by: Baolin Wang 
---
   mm/rmap.c | 34 +-
   1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 7cf2408..1e168d7 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1564,28 +1564,28 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
break;
}
}
+   pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);


Unlike in your patch 2/3, I do not see that this (huge) pteval would later
be used again with set_huge_pte_at() instead of set_pte_at(). Not sure if
this (huge) pteval could end up at a set_pte_at() later, but if yes, then
this would be broken on s390, and you'd need to use set_huge_pte_at()
instead of set_pte_at() like in your patch 2/3.


IIUC, as I said in the commit message, we only unmap a poisoned
hugetlb page with try_to_unmap(), and the poisoned hugetlb page will be
remapped with a poisoned entry by set_huge_swap_pte_at() in
try_to_unmap_one(). So I think there is no need to change set_pte_at()
to set_huge_pte_at() for the other cases, since a hugetlb page will
never reach them.

if (PageHWPoison(subpage) && !(flags & TTU_IGNORE_HWPOISON)) {
pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
if (folio_test_hugetlb(folio)) {
hugetlb_count_sub(folio_nr_pages(folio), mm);
set_huge_swap_pte_at(mm, address, pvmw.pte, pteval,
 vma_mmu_pagesize(vma));
} else {
dec_mm_counter(mm, mm_counter(&folio->page));
set_pte_at(mm, address, pvmw.pte, pteval);
}

}


OK, but wouldn't the pteval be overwritten here with
pteval = swp_entry_to_pte(make_hwpoison_entry(subpage))?
IOW, what sense does it make to save the returned pteval from
huge_ptep_clear_flush(), when it is never being used anywhere?


Please see the preceding code: we use the original pte value to check
whether it is uffd-wp armed, and whether we need to mark the folio dirty,
even though hugetlbfs uses noop_dirty_folio().


pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);

/* Set the dirty flag on the folio now the pte is gone. */
if (pte_dirty(pteval))
folio_mark_dirty(folio);


Re: [PATCH 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping

2022-04-29 Thread Baolin Wang




On 4/30/2022 4:02 AM, Gerald Schaefer wrote:

On Fri, 29 Apr 2022 16:14:43 +0800
Baolin Wang  wrote:


Some architectures (like ARM64) support CONT-PTE/PMD size hugetlb,
which means they support not only PMD/PUD size hugetlb (2M and 1G)
but also CONT-PTE/PMD size hugetlb (64K and 32M) when a 4K page
size is used.

When unmapping a hugetlb page, we get the relevant page table
entry by huge_pte_offset() only once and nuke it. This is correct
for PMD or PUD size hugetlb, since such a page is always mapped by
exactly one pmd or pud entry in the page table.

However, this is incorrect for CONT-PTE and CONT-PMD size hugetlb,
since such a page is mapped by several contiguous pte or pmd entries
with the same page table attributes, so we will nuke only one pte or
pmd entry of this CONT-PTE/PMD size hugetlb page.

And now we only use try_to_unmap() to unmap a poisoned hugetlb page,
which means we will unmap only one pte entry of a CONT-PTE or
CONT-PMD size poisoned hugetlb page, while the other subpages of
that poisoned hugetlb page remain accessible, which can possibly
cause serious issues.

So we should use huge_ptep_clear_flush() to nuke the hugetlb
page table entries and fix this issue, since it already handles
CONT-PTE and CONT-PMD size hugetlb.

Note we've already used set_huge_swap_pte_at() to set a poisoned
swap entry for a poisoned hugetlb page.

Signed-off-by: Baolin Wang 
---
  mm/rmap.c | 34 +-
  1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 7cf2408..1e168d7 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1564,28 +1564,28 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
break;
}
}
+   pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);


Unlike in your patch 2/3, I do not see that this (huge) pteval would later
be used again with set_huge_pte_at() instead of set_pte_at(). Not sure if
this (huge) pteval could end up at a set_pte_at() later, but if yes, then
this would be broken on s390, and you'd need to use set_huge_pte_at()
instead of set_pte_at() like in your patch 2/3.


IIUC, as I said in the commit message, we only unmap a poisoned
hugetlb page with try_to_unmap(), and the poisoned hugetlb page will be
remapped with a poisoned entry by set_huge_swap_pte_at() in
try_to_unmap_one(). So I think there is no need to change set_pte_at()
to set_huge_pte_at() for the other cases, since a hugetlb page will
never reach them.


if (PageHWPoison(subpage) && !(flags & TTU_IGNORE_HWPOISON)) {
pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
if (folio_test_hugetlb(folio)) {
hugetlb_count_sub(folio_nr_pages(folio), mm);
set_huge_swap_pte_at(mm, address, pvmw.pte, pteval,
 vma_mmu_pagesize(vma));
} else {
dec_mm_counter(mm, mm_counter(&folio->page));
set_pte_at(mm, address, pvmw.pte, pteval);
}

}



Please note that huge_ptep_get functions do not return valid PTEs on s390,
and such PTEs must never be set directly with set_pte_at(), but only with
set_huge_pte_at().

Background is that, for hugetlb pages, we are of course not really dealing
with PTEs at this level, but rather PMDs or PUDs, depending on hugetlb size.
On s390, the layout is quite different for PTEs and PMDs / PUDs, and
unfortunately the hugetlb code is not properly reflecting this by using
PMD or PUD types, like the THP code does.

So, as work-around, on s390, the huge_ptep_xxx functions will return
only fake PTEs, which must be converted again to a proper PMD or PUD,
before writing them to the page table, which is what happens in
set_huge_pte_at(), but not in set_pte_at().


Thanks for your explanation. As I said above, I think we already
handle the hugetlb case with set_huge_swap_pte_at() in try_to_unmap_one().


[PATCH 3/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping

2022-04-29 Thread Baolin Wang
Some architectures (like ARM64) support CONT-PTE/PMD size hugetlb,
which means they support not only PMD/PUD size hugetlb (2M and 1G)
but also CONT-PTE/PMD size hugetlb (64K and 32M) when a 4K page
size is used.

When unmapping a hugetlb page, we get the relevant page table
entry by huge_pte_offset() only once and nuke it. This is correct
for PMD or PUD size hugetlb, since such a page is always mapped by
exactly one pmd or pud entry in the page table.

However, this is incorrect for CONT-PTE and CONT-PMD size hugetlb,
since such a page is mapped by several contiguous pte or pmd entries
with the same page table attributes, so we will nuke only one pte or
pmd entry of this CONT-PTE/PMD size hugetlb page.

And now we only use try_to_unmap() to unmap a poisoned hugetlb page,
which means we will unmap only one pte entry of a CONT-PTE or
CONT-PMD size poisoned hugetlb page, while the other subpages of
that poisoned hugetlb page remain accessible, which can possibly
cause serious issues.

So we should use huge_ptep_clear_flush() to nuke the hugetlb
page table entries and fix this issue, since it already handles
CONT-PTE and CONT-PMD size hugetlb.
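The effect on a poisoned CONT-PTE page can be modelled in a few lines of user-space C, assuming 16 entries per 64K page. The helpers are hypothetical stand-ins. nuke_one() models clearing only the entry returned by huge_pte_offset(). nuke_all() models clearing the whole contiguous range, as huge_ptep_clear_flush() does. With the one-entry clear, 15 of the 16 subpages stay reachable.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_ENTRIES 16	/* assumed: 64K CONT-PTE page with a 4K granule */

/* Hypothetical model: ptep[i] is true while entry i still maps the page. */
static void map_all(bool *ptep, int n)
{
	for (int i = 0; i < n; i++)
		ptep[i] = true;
}

/* Old behaviour: only the entry found by huge_pte_offset() is nuked. */
static void nuke_one(bool *ptep)
{
	ptep[0] = false;
}

/* New behaviour: every entry of the contiguous range is nuked,
 * as huge_ptep_clear_flush() does for CONT-PTE/PMD sizes. */
static void nuke_all(bool *ptep, int n)
{
	for (int i = 0; i < n; i++)
		ptep[i] = false;
}

/* Number of subpages still reachable through the page table. */
static int still_mapped(const bool *ptep, int n)
{
	int count = 0;

	assert(n > 0);
	for (int i = 0; i < n; i++)
		if (ptep[i])
			count++;
	return count;
}
```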

Note we've already used set_huge_swap_pte_at() to set a poisoned
swap entry for a poisoned hugetlb page.

Signed-off-by: Baolin Wang 
---
 mm/rmap.c | 34 +-
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 7cf2408..1e168d7 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1564,28 +1564,28 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
break;
}
}
+   pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
} else {
flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
-   }
-
-   /*
-* Nuke the page table entry. When having to clear
-* PageAnonExclusive(), we always have to flush.
-*/
-   if (should_defer_flush(mm, flags) && !anon_exclusive) {
/*
-* We clear the PTE but do not flush so potentially
-* a remote CPU could still be writing to the folio.
-* If the entry was previously clean then the
-* architecture must guarantee that a clear->dirty
-* transition on a cached TLB entry is written through
-* and traps if the PTE is unmapped.
+* Nuke the page table entry. When having to clear
+* PageAnonExclusive(), we always have to flush.
 */
-   pteval = ptep_get_and_clear(mm, address, pvmw.pte);
+   if (should_defer_flush(mm, flags) && !anon_exclusive) {
+   /*
+* We clear the PTE but do not flush so potentially
+* a remote CPU could still be writing to the folio.
+* If the entry was previously clean then the
+* architecture must guarantee that a clear->dirty
+* transition on a cached TLB entry is written through
+* and traps if the PTE is unmapped.
+*/
+   pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-   set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
-   } else {
-   pteval = ptep_clear_flush(vma, address, pvmw.pte);
+   set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
+   } else {
+   pteval = ptep_clear_flush(vma, address, pvmw.pte);
+   }
}
 
/*
-- 
1.8.3.1



[PATCH 0/3] Fix CONT-PTE/PMD size hugetlb issue when unmapping or migrating

2022-04-29 Thread Baolin Wang
Hi,

Currently, when migrating a hugetlb page or unmapping a poisoned hugetlb
page, we use ptep_clear_flush() and set_pte_at() to nuke the page table
entry and remap it. This is incorrect for CONT-PTE or CONT-PMD size
hugetlb pages and can cause potential data consistency issues. This patch
set changes these paths to use the hugetlb-specific APIs to fix this
issue; please find details in each patch. Thanks.

Baolin Wang (3):
  mm: change huge_ptep_clear_flush() to return the original pte
  mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration
  mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when unmapping

 arch/arm64/include/asm/hugetlb.h   |  4 +--
 arch/arm64/mm/hugetlbpage.c| 12 
 arch/ia64/include/asm/hugetlb.h|  4 +--
 arch/mips/include/asm/hugetlb.h|  9 --
 arch/parisc/include/asm/hugetlb.h  |  4 +--
 arch/powerpc/include/asm/hugetlb.h |  9 --
 arch/s390/include/asm/hugetlb.h|  6 ++--
 arch/sh/include/asm/hugetlb.h  |  4 +--
 arch/sparc/include/asm/hugetlb.h   |  4 +--
 include/asm-generic/hugetlb.h  |  4 +--
 mm/rmap.c  | 58 +++---
 11 files changed, 67 insertions(+), 51 deletions(-)

-- 
1.8.3.1



[PATCH 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration

2022-04-29 Thread Baolin Wang
Some architectures (like ARM64) support CONT-PTE/PMD size hugetlb,
which means they support not only PMD/PUD size hugetlb (2M and 1G)
but also CONT-PTE/PMD size hugetlb (64K and 32M) when a 4K page
size is used.

When migrating a hugetlb page, we get the relevant page table
entry by huge_pte_offset() only once, nuke it and remap it with
a migration pte entry. This is correct for PMD or PUD size hugetlb,
since such a page is always mapped by exactly one pmd or pud entry
in the page table.

However, this is incorrect for CONT-PTE and CONT-PMD size hugetlb,
since such a page is mapped by several contiguous pte or pmd entries
with the same page table attributes. So we will nuke or remap only
one pte or pmd entry of this CONT-PTE/PMD size hugetlb page, which
is not what hugetlb migration expects. The problem is that the data
of the other subpages can still be modified while the hugetlb page
is being migrated, since we did not nuke their page table entries
and install migration ptes for them. This can cause a serious data
consistency issue.

To fix this issue, we should use huge_ptep_clear_flush() to nuke
the hugetlb page table entries, and remap them with set_huge_pte_at()
and set_huge_swap_pte_at() when migrating a hugetlb page, since these
already handle CONT-PTE and CONT-PMD size hugetlb.

Signed-off-by: Baolin Wang 
---
 mm/rmap.c | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 6fdd198..7cf2408 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1924,13 +1924,15 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
break;
}
}
+
+   /* Nuke the hugetlb page table entry */
+   pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
} else {
flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
+   /* Nuke the page table entry. */
+   pteval = ptep_clear_flush(vma, address, pvmw.pte);
}
 
-   /* Nuke the page table entry. */
-   pteval = ptep_clear_flush(vma, address, pvmw.pte);
-
/* Set the dirty flag on the folio now the pte is gone. */
if (pte_dirty(pteval))
folio_mark_dirty(folio);
@@ -2015,7 +2017,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
pte_t swp_pte;
 
if (arch_unmap_one(mm, vma, address, pteval) < 0) {
-   set_pte_at(mm, address, pvmw.pte, pteval);
+   if (folio_test_hugetlb(folio))
+   set_huge_pte_at(mm, address, pvmw.pte, pteval);
+   else
+   set_pte_at(mm, address, pvmw.pte, pteval);
ret = false;
page_vma_mapped_walk_done();
break;
@@ -2024,7 +2029,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
   !anon_exclusive, subpage);
if (anon_exclusive &&
page_try_share_anon_rmap(subpage)) {
-   set_pte_at(mm, address, pvmw.pte, pteval);
+   if (folio_test_hugetlb(folio))
+   set_huge_pte_at(mm, address, pvmw.pte, pteval);
+   else
+   set_pte_at(mm, address, pvmw.pte, pteval);
ret = false;
page_vma_mapped_walk_done();
break;
@@ -2050,7 +2058,11 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
swp_pte = pte_swp_mksoft_dirty(swp_pte);
if (pte_uffd_wp(pteval))
swp_pte = pte_swp_mkuffd_wp(swp_pte);
-   set_pte_at(mm, address, pvmw.pte, swp_pte);
+   if (folio_test_hugetlb(folio))
+   set_huge_swap_pte_at(mm, address, pvmw.pte,
+   swp_pte, vma_mmu_pagesize(vma));
+   else
+   set_pte_at(mm, address, pvmw.pte, swp_pte);
trace_set_migration_pte(address, pte_val(swp_pte),
compound_order(&folio->page));
/*
-- 
1.8.3.1



[PATCH 1/3] mm: change huge_ptep_clear_flush() to return the original pte

2022-04-29 Thread Baolin Wang
It is incorrect to use ptep_clear_flush() to nuke a hugetlb page
table entry when unmapping or migrating a hugetlb page; the following
patches will change this to use huge_ptep_clear_flush() instead.

So this is a preparation patch, which changes huge_ptep_clear_flush()
to return the original pte to help nuke a hugetlb page table entry.

Signed-off-by: Baolin Wang 
---
 arch/arm64/include/asm/hugetlb.h   |  4 ++--
 arch/arm64/mm/hugetlbpage.c| 12 +---
 arch/ia64/include/asm/hugetlb.h|  4 ++--
 arch/mips/include/asm/hugetlb.h|  9 ++---
 arch/parisc/include/asm/hugetlb.h  |  4 ++--
 arch/powerpc/include/asm/hugetlb.h |  9 ++---
 arch/s390/include/asm/hugetlb.h|  6 +++---
 arch/sh/include/asm/hugetlb.h  |  4 ++--
 arch/sparc/include/asm/hugetlb.h   |  4 ++--
 include/asm-generic/hugetlb.h  |  4 ++--
 10 files changed, 32 insertions(+), 28 deletions(-)

diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 1242f71..616b2ca 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -39,8 +39,8 @@ extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
unsigned long addr, pte_t *ptep);
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
-extern void huge_ptep_clear_flush(struct vm_area_struct *vma,
- unsigned long addr, pte_t *ptep);
+extern pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+  unsigned long addr, pte_t *ptep);
 #define __HAVE_ARCH_HUGE_PTE_CLEAR
 extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
   pte_t *ptep, unsigned long sz);
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index cbace1c..ca8e65c 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -486,19 +486,17 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot));
 }
 
-void huge_ptep_clear_flush(struct vm_area_struct *vma,
-  unsigned long addr, pte_t *ptep)
+pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+   unsigned long addr, pte_t *ptep)
 {
size_t pgsize;
int ncontig;
 
-   if (!pte_cont(READ_ONCE(*ptep))) {
-   ptep_clear_flush(vma, addr, ptep);
-   return;
-   }
+   if (!pte_cont(READ_ONCE(*ptep)))
+   return ptep_clear_flush(vma, addr, ptep);
 
ncontig = find_num_contig(vma->vm_mm, addr, ptep, &pgsize);
-   clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
+   return get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
 }
 
 static int __init hugetlbpage_init(void)
diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
index 7e46ebd..65d3811 100644
--- a/arch/ia64/include/asm/hugetlb.h
+++ b/arch/ia64/include/asm/hugetlb.h
@@ -23,8 +23,8 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,
 #define is_hugepage_only_range is_hugepage_only_range
 
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
-static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
-unsigned long addr, pte_t *ptep)
+static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
 {
 }
 
diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
index c214440..fd69c88 100644
--- a/arch/mips/include/asm/hugetlb.h
+++ b/arch/mips/include/asm/hugetlb.h
@@ -43,16 +43,19 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 }
 
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
-static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
-unsigned long addr, pte_t *ptep)
+static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
 {
+   pte_t pte;
+
/*
 * clear the huge pte entry firstly, so that the other smp threads will
 * not get old pte entry after finishing flush_tlb_page and before
 * setting new huge pte entry
 */
-   huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
+   pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
flush_tlb_page(vma, addr);
+   return pte;
 }
 
 #define __HAVE_ARCH_HUGE_PTE_NONE
diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h
index a69cf9e..25bc560 100644
--- a/arch/parisc/include/asm/hugetlb.h
+++ b/arch/parisc/include/asm/hugetlb.h
@@ -28,8 +28,8 @@ static inline int prepare_hugepage_range(struct file *file,
 }
 
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
-static inline void huge_ptep_clear_flush(struct vm_a

Re: [PATCH 6/6] cputime: Introduce cputime_to_timespec64()/timespec64_to_cputime()

2015-07-17 Thread Baolin Wang
On 16 July 2015 at 18:43, Thomas Gleixner t...@linutronix.de wrote:
 On Thu, 16 Jul 2015, Baolin Wang wrote:
 On 15 July 2015 at 19:55, Thomas Gleixner t...@linutronix.de wrote:
  On Wed, 15 Jul 2015, Baolin Wang wrote:
 
  On 15 July 2015 at 18:31, Thomas Gleixner t...@linutronix.de wrote:
   On Wed, 15 Jul 2015, Baolin Wang wrote:
  
   The cputime_to_timespec() and timespec_to_cputime() functions are
   not year 2038 safe on 32bit systems, because struct timespec will
   overflow in the year 2038.
  
   And how is this relevant? cputime is not based on wall clock time at
   all. So what has 2038 to do with cputime?
  
   We want proper explanations WHY we need such a change.
 
  When converting posix-cpu-timers, the code calls
  cputime_to_timespec(), so this function needs to be
  converted as well.
 
  There is no requirement to convert posix-cpu-timers on their own. We
  need to adopt the posix cpu timers code because it shares syscalls
  with the other posix timers, but that still does not explain why we
  need these functions.
 

 posix-cpu-timers also defines some 'struct k_clock' variables, and
 we need to convert their callbacks, which are not year 2038 safe on
 32bit systems. Some of the callbacks that need converting call
 cputime_to_timespec(), so we also want to convert cputime_to_timespec()
 into a year 2038 safe function, so that all of them are ready for the
 year 2038 issue.

 You are not getting it at all.

 1) We need to change k_clock callbacks due to 2038 issues

 2) posix cpu timers implement affected callbacks

 3) posix cpu timers themselves and cputime are NOT affected by 2038

 So we have 2 options to change the code in posix cpu timers:

A) Do the timespec/timespec64 conversion in the posix cpu timer
   callbacks and leave the cputime functions untouched.

B) Implement cputime/timespec64 functions to avoid #A

If you go for #B, you need to provide a reasonable explanation why
it is better than #A. And that explanation has absolutely nothing
to do with 2038 safety.
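Option #A can be sketched roughly as follows. This is an illustration under stated assumptions, not the posix-cpu-timers code: cputime is taken to count nanoseconds, struct timespec32 is a hypothetical stand-in for a struct timespec with a 32-bit tv_sec, and cpu_clock_sample_to_ts64() is a made-up name for a callback body. The existing cputime helper stays untouched, and the widening to timespec64 happens only at the callback boundary.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical stand-ins: nanosecond-based cputime and a timespec whose
 * tv_sec is 32 bits wide, as on a 32-bit system. */
typedef uint64_t cputime_t;
struct timespec32 { int32_t tv_sec; int32_t tv_nsec; };
struct timespec64 { int64_t tv_sec; long tv_nsec; };

/* Existing helper, left untouched under option #A. */
static void cputime_to_timespec(cputime_t ct, struct timespec32 *ts)
{
	ts->tv_sec = (int32_t)(ct / NSEC_PER_SEC);
	ts->tv_nsec = (int32_t)(ct % NSEC_PER_SEC);
}

/* Widening a 32-bit timespec to a 64-bit one is always lossless. */
static struct timespec64 timespec_to_timespec64(struct timespec32 ts)
{
	struct timespec64 ts64 = { ts.tv_sec, ts.tv_nsec };

	return ts64;
}

/* Hypothetical y2038-safe k_clock callback body: the conversion to
 * timespec64 is done here, at the interface boundary. */
static void cpu_clock_sample_to_ts64(cputime_t sample, struct timespec64 *tp)
{
	struct timespec32 ts;

	cputime_to_timespec(sample, &ts);
	*tp = timespec_to_timespec64(ts);
}
```

Because cputime measures consumed CPU time rather than wall-clock time, a 32-bit seconds field already covers roughly 68 years of it, which is why the conversion can stay at the callback level in this model.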

Many thanks for your explanation; I'll think about that.


 Not everything is a 2038 issue, just because the only tool you have is
 a timespec64.

 Thanks,

 tglx






-- 
Baolin.wang
Best Regards
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 6/6] cputime: Introduce cputime_to_timespec64()/timespec64_to_cputime()

2015-07-15 Thread Baolin Wang
On 15 July 2015 at 19:55, Thomas Gleixner t...@linutronix.de wrote:
 On Wed, 15 Jul 2015, Baolin Wang wrote:

 On 15 July 2015 at 18:31, Thomas Gleixner t...@linutronix.de wrote:
  On Wed, 15 Jul 2015, Baolin Wang wrote:
 
  The cputime_to_timespec() and timespec_to_cputime() functions are
  not year 2038 safe on 32bit systems, because struct timespec will
  overflow in the year 2038.
 
  And how is this relevant? cputime is not based on wall clock time at
  all. So what has 2038 to do with cputime?
 
  We want proper explanations WHY we need such a change.

 When converting posix-cpu-timers, the code calls
 cputime_to_timespec(), so this function needs to be
 converted as well.

 There is no requirement to convert posix-cpu-timers on their own. We
 need to adopt the posix cpu timers code because it shares syscalls
 with the other posix timers, but that still does not explain why we
 need these functions.


posix-cpu-timers also defines some 'struct k_clock' variables, and
we need to convert their callbacks, which are not year 2038 safe on
32bit systems. Some of the callbacks that need converting call
cputime_to_timespec(), so we also want to convert cputime_to_timespec()
into a year 2038 safe function, so that all of them are ready for the
year 2038 issue.

 You can see that conversion in the patch "posix-cpu-timers: Convert to
 y2038 safe callbacks" at
 https://git.linaro.org/people/baolin.wang/upstream_0627.git.

 I do not care about your random git tree. I care about proper
 changelogs. Your changelogs are just a copied boilerplate full of
 errors.

 Thanks,

 tglx



-- 
Baolin.wang
Best Regards
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 6/6] cputime: Introduce cputime_to_timespec64()/timespec64_to_cputime()

2015-07-15 Thread Baolin Wang
The cputime_to_timespec() and timespec_to_cputime() functions are
not year 2038 safe on 32bit systems, because struct timespec will
overflow in the year 2038. This patch introduces cputime_to_timespec64()
and timespec64_to_cputime() functions, which use struct timespec64,
and converts the arch-specific implementations in arch/s390 and arch/powerpc.

Signed-off-by: Baolin Wang baolin.w...@linaro.org
---
 arch/powerpc/include/asm/cputime.h|6 +++---
 arch/s390/include/asm/cputime.h   |8 
 include/asm-generic/cputime_jiffies.h |   10 +-
 include/asm-generic/cputime_nsecs.h   |6 +++---
 include/linux/cputime.h   |   16 
 5 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h
index e245255..5dda5c0 100644
--- a/arch/powerpc/include/asm/cputime.h
+++ b/arch/powerpc/include/asm/cputime.h
@@ -154,9 +154,9 @@ static inline cputime_t secs_to_cputime(const unsigned long sec)
 }
 
 /*
- * Convert cputime <-> timespec
+ * Convert cputime <-> timespec64
  */
-static inline void cputime_to_timespec(const cputime_t ct, struct timespec *p)
+static inline void cputime_to_timespec64(const cputime_t ct, struct timespec64 *p)
 {
u64 x = (__force u64) ct;
unsigned int frac;
@@ -168,7 +168,7 @@ static inline void cputime_to_timespec(const cputime_t ct, struct timespec *p)
p->tv_nsec = x;
 }
 
-static inline cputime_t timespec_to_cputime(const struct timespec *p)
+static inline cputime_t timespec64_to_cputime(const struct timespec64 *p)
 {
u64 ct;
 
diff --git a/arch/s390/include/asm/cputime.h b/arch/s390/include/asm/cputime.h
index 221b454..3319b51 100644
--- a/arch/s390/include/asm/cputime.h
+++ b/arch/s390/include/asm/cputime.h
@@ -81,16 +81,16 @@ static inline cputime_t secs_to_cputime(const unsigned int s)
 }
 
 /*
- * Convert cputime to timespec and back.
+ * Convert cputime to timespec64 and back.
  */
-static inline cputime_t timespec_to_cputime(const struct timespec *value)
+static inline cputime_t timespec64_to_cputime(const struct timespec64 *value)
 {
unsigned long long ret = value->tv_sec * CPUTIME_PER_SEC;
return (__force cputime_t)(ret + __div(value->tv_nsec * CPUTIME_PER_USEC, NSEC_PER_USEC));
 }
 
-static inline void cputime_to_timespec(const cputime_t cputime,
-  struct timespec *value)
+static inline void cputime_to_timespec64(const cputime_t cputime,
+struct timespec64 *value)
 {
unsigned long long __cputime = (__force unsigned long long) cputime;
value->tv_nsec = (__cputime % CPUTIME_PER_SEC) * NSEC_PER_USEC / CPUTIME_PER_USEC;
diff --git a/include/asm-generic/cputime_jiffies.h b/include/asm-generic/cputime_jiffies.h
index fe386fc..54e034c 100644
--- a/include/asm-generic/cputime_jiffies.h
+++ b/include/asm-generic/cputime_jiffies.h
@@ -44,12 +44,12 @@ typedef u64 __nocast cputime64_t;
 #define secs_to_cputime(sec)   jiffies_to_cputime((sec) * HZ)
 
 /*
- * Convert cputime to timespec and back.
+ * Convert cputime to timespec64 and back.
  */
-#define timespec_to_cputime(__val) \
-   jiffies_to_cputime(timespec_to_jiffies(__val))
-#define cputime_to_timespec(__ct,__val)\
-   jiffies_to_timespec(cputime_to_jiffies(__ct),__val)
+#define timespec64_to_cputime(__val)   \
+   jiffies_to_cputime(timespec64_to_jiffies(__val))
+#define cputime_to_timespec64(__ct,__val)  \
+   jiffies_to_timespec64(cputime_to_jiffies(__ct),__val)
 
 /*
  * Convert cputime to timeval and back.
diff --git a/include/asm-generic/cputime_nsecs.h b/include/asm-generic/cputime_nsecs.h
index 0419485..c0cafc0 100644
--- a/include/asm-generic/cputime_nsecs.h
+++ b/include/asm-generic/cputime_nsecs.h
@@ -71,14 +71,14 @@ typedef u64 __nocast cputime64_t;
(__force cputime_t)((__secs) * NSEC_PER_SEC)
 
 /*
- * Convert cputime <-> timespec (nsec)
+ * Convert cputime <-> timespec64 (nsec)
  */
-static inline cputime_t timespec_to_cputime(const struct timespec *val)
+static inline cputime_t timespec64_to_cputime(const struct timespec64 *val)
 {
u64 ret = val->tv_sec * NSEC_PER_SEC + val->tv_nsec;
return (__force cputime_t) ret;
 }
-static inline void cputime_to_timespec(const cputime_t ct, struct timespec *val)
+static inline void cputime_to_timespec64(const cputime_t ct, struct timespec64 *val)
 {
u32 rem;
 
diff --git a/include/linux/cputime.h b/include/linux/cputime.h
index f2eb2ee..cd638a0 100644
--- a/include/linux/cputime.h
+++ b/include/linux/cputime.h
@@ -13,4 +13,20 @@
usecs_to_cputime((__nsecs) / NSEC_PER_USEC)
 #endif
 
+static inline cputime_t timespec_to_cputime(const struct timespec *ts)
+{
+   struct timespec64 ts64 = timespec_to_timespec64(*ts);
+
+   return timespec64_to_cputime(&ts64);
+}
+
+static inline void cputime_to_timespec(const cputime_t cputime

Re: [PATCH 6/6] cputime: Introduce cputime_to_timespec64()/timespec64_to_cputime()

2015-07-15 Thread Baolin Wang
On 15 July 2015 at 18:31, Thomas Gleixner t...@linutronix.de wrote:
 On Wed, 15 Jul 2015, Baolin Wang wrote:

 The cputime_to_timespec() and timespec_to_cputime() functions are
 not year 2038 safe on 32bit systems, because struct timespec will
 overflow in the year 2038.

 And how is this relevant? cputime is not based on wall clock time at
 all. So what has 2038 to do with cputime?

 We want proper explanations WHY we need such a change.

When converting posix-cpu-timers, the code calls cputime_to_timespec(),
so this function needs to be converted as well.
You can see that conversion in the patch "posix-cpu-timers: Convert to
y2038 safe callbacks" at
https://git.linaro.org/people/baolin.wang/upstream_0627.git.
I will also explain this in the changelog. Thanks for your comments.


 Thanks,

 tglx




-- 
Baolin.wang
Best Regards

[PATCH 0/6] Introduce 64bit accessors and structures required to address y2038 issues in the posix_clock subsystem

2015-07-14 Thread Baolin Wang
This patch series changes the 32-bit time types (timespec/itimerspec) to
the 64-bit types (timespec64/itimerspec64), and adds new 64bit accessor
functions, which are required in order to avoid y2038 issues in the
posix_clock subsystem.

To avoid spamming people too much, I'm only sending the first few
patches of the series and leaving the other patches for later.

If you are interested in the whole patch series, see:
https://git.linaro.org/people/baolin.wang/upstream_0627.git

Thoughts and feedback would be appreciated.

Baolin Wang (6):
  time: Introduce struct itimerspec64
  timekeeping: Introduce current_kernel_time64()
  security: Introduce security_settime64()
  time: Introduce do_sys_settimeofday64()
  time: Introduce timespec64_to_jiffies()/jiffies_to_timespec64()
  cputime: Introduce cputime_to_timespec64()/timespec64_to_cputime()

 arch/powerpc/include/asm/cputime.h|6 +++---
 arch/s390/include/asm/cputime.h   |8 
 include/asm-generic/cputime_jiffies.h |   10 +-
 include/asm-generic/cputime_nsecs.h   |6 +++---
 include/linux/cputime.h   |   16 +++
 include/linux/jiffies.h   |   22 ++---
 include/linux/lsm_hooks.h |5 +++--
 include/linux/security.h  |   20 ---
 include/linux/time64.h|   35 +
 include/linux/timekeeping.h   |   24 +++---
 kernel/time/time.c|   28 +++---
 kernel/time/timekeeping.c |6 +++---
 security/commoncap.c  |2 +-
 security/security.c   |2 +-
 14 files changed, 148 insertions(+), 42 deletions(-)

-- 
1.7.9.5


Re: [PATCH v5 00/24] Convert the posix_clock_operations and k_clock structure to ready for 2038

2015-06-14 Thread Baolin Wang
On 12 June 2015 at 21:16, Thomas Gleixner t...@linutronix.de wrote:
 On Fri, 12 Jun 2015, Baolin Wang wrote:

 Sigh. Again threading of the series failed. Some patches are, the
 whole series is not. Can you please get your tools straight?

 You also did not manage to cc me on the security patch.

 - Modify the subject line and the changelog:

   timekeeping: Change the implementation of timekeeping_clocktai()

 Sigh. How is that better than the previous one? It's more accurate,
 but equally useless.

 And of course you did not address my request to change the macro mess
 in

   posix-timers: Introduce {get,put}_timespec and {get,put}_itimerspec

 according to the discussion with Arnd.

 Thanks,

 tglx

Hi Thomas,

Thanks for your comments; I'll fix the problems you pointed out.

-- 
Baolin.wang
Best Regards

[PATCH v5 00/24] Convert the posix_clock_operations and k_clock structure to ready for 2038

2015-06-12 Thread Baolin Wang
This patch series changes the 32-bit time types (timespec/itimerspec) to
the 64-bit types (timespec64/itimerspec64), since 32-bit time types will
break in the year 2038 on 32bit systems.
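Where exactly a signed 32-bit seconds field runs out can be checked with a small sketch. It converts a seconds-since-epoch value to a UTC calendar date using Howard Hinnant's civil_from_days algorithm (proleptic Gregorian calendar); the struct civil type and civil_from_epoch_seconds() are illustrative names, not kernel APIs.

```c
#include <stdint.h>

/* Broken-down UTC date and time of day. */
struct civil {
	int64_t year;
	unsigned month, day, hour, min, sec;
};

/* Convert seconds since 1970-01-01 00:00:00 UTC to a civil date and time. */
static struct civil civil_from_epoch_seconds(int64_t t)
{
	struct civil c;
	int64_t days = t / 86400;
	int64_t rem = t % 86400;

	if (rem < 0) {		/* keep the time of day non-negative */
		rem += 86400;
		days -= 1;
	}
	c.hour = (unsigned)(rem / 3600);
	c.min = (unsigned)(rem % 3600 / 60);
	c.sec = (unsigned)(rem % 60);

	/* civil_from_days: shift the epoch to 0000-03-01, split into 400-year eras. */
	int64_t z = days + 719468;
	int64_t era = (z >= 0 ? z : z - 146096) / 146097;
	int64_t doe = z - era * 146097;				/* day of era */
	int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
	int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);	/* day of year */
	int64_t mp = (5 * doy + 2) / 153;			/* March-based month */

	c.day = (unsigned)(doy - (153 * mp + 2) / 5 + 1);
	c.month = (unsigned)(mp < 10 ? mp + 3 : mp - 9);
	c.year = yoe + era * 400 + (c.month <= 2);
	return c;
}
```

civil_from_epoch_seconds(INT32_MAX) gives 2038-01-19 03:14:07 UTC, the last second a signed 32-bit tv_sec can represent; one second later such a field wraps to INT32_MIN, which is 1901-12-13 20:45:52 UTC. With a 64-bit tv_sec, as in struct timespec64, the count simply keeps going.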

This patch series introduces new methods with the timespec64/itimerspec64
types, and removes the old ones with the timespec/itimerspec types, for the
posix_clock_operations and k_clock structures.

---
Changes since v4:
- Rebase the patch series.
- Modify the subject line and the changelog.

Changes since v3:
- Fix some introducing bugs.

Changes since v2:
- Split the syscall conversion patch into several smaller patches.

Changes since V1:
- Split some patches into smaller patches.
- Add some default functions for the new 64bit syscall methods.
- Move do_sys_settimeofday() function to a header file.
- Fix the EXPORT_SYMBOL issue.
- Add new 64bit methods in cputime_nsecs.h file.
---

Baolin Wang (24):
  time: Introduce struct itimerspec64
  timekeeping: Introduce current_kernel_time64()
  security: Introduce security_settime64()
  time: Introduce do_sys_settimeofday64()
  posix-timers: Introduce {get,put}_timespec and {get,put}_itimerspec
  posix-timers: Factor out the guts of 'timer_gettime'
  posix-timers: Implement y2038 safe timer_get64() callback
  posix-timers: Factor out the guts of 'timer_settime'
  posix-timers: Implement y2038 safe timer_set64() callback
  posix-timers: Factor out the guts of 'clock_settime'
  posix-timers: Implement y2038 safe clock_set64() callback
  posix-timers: Factor out the guts of 'clock_gettime'
  posix-timers: Implement y2038 safe clock_get64() callback
  posix-timers: Factor out the guts of 'clock_getres'
  posix-timers: Implement y2038 safe clock_getres64() callback
  timekeeping: Change the implementation of timekeeping_clocktai()
  posix-timers: Convert to y2038 safe callbacks
  mmtimer: Convert to y2038 safe callbacks
  alarmtimer: Convert to y2038 safe callbacks
  posix-clock: Convert to y2038 safe callbacks
  time: Introduce timespec64_to_jiffies()/jiffies_to_timespec64()
  cputime: Introduce cputime_to_timespec64()/timespec64_to_cputime()
  posix-cpu-timers: Convert to y2038 safe callbacks
  k_clock: Remove y2038 unsafe callbacks

 arch/powerpc/include/asm/cputime.h|6 +-
 arch/s390/include/asm/cputime.h   |8 +-
 drivers/char/mmtimer.c|   36 +++--
 drivers/ptp/ptp_clock.c   |   22 +--
 include/asm-generic/cputime_jiffies.h |   10 +-
 include/asm-generic/cputime_nsecs.h   |6 +-
 include/linux/cputime.h   |   16 ++
 include/linux/jiffies.h   |   21 ++-
 include/linux/lsm_hooks.h |5 +-
 include/linux/posix-clock.h   |   10 +-
 include/linux/posix-timers.h  |   18 +--
 include/linux/security.h  |   20 ++-
 include/linux/time64.h|   35 +
 include/linux/timekeeping.h   |   25 +++-
 kernel/time/alarmtimer.c  |   38 ++---
 kernel/time/posix-clock.c |   20 +--
 kernel/time/posix-cpu-timers.c|   84 ++-
 kernel/time/posix-timers.c|  257 +
 kernel/time/time.c|   19 +--
 kernel/time/timekeeping.c |6 +-
 security/commoncap.c  |2 +-
 security/security.c   |2 +-
 22 files changed, 412 insertions(+), 254 deletions(-)

-- 
1.7.9.5


[PATCH v5 22/24] cputime: Introduce cputime_to_timespec64()/timespec64_to_cputime()

2015-06-12 Thread Baolin Wang
The cputime_to_timespec() and timespec_to_cputime() functions are
not year 2038 safe on 32bit systems, because struct timespec will
overflow in the year 2038. Introduce cputime_to_timespec64() and
timespec64_to_cputime() functions, which use struct timespec64,
including implementations for the arch/s390 and arch/powerpc architectures.

The cputime_to_timespec() and timespec_to_cputime() functions are
moved to include/linux/cputime.h as 'static inline' wrappers so that
they can easily be removed in the future.
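The 'static inline' wrapper approach described above can be sketched in user space. Assumptions: cputime counts nanoseconds (modelled on include/asm-generic/cputime_nsecs.h), and struct timespec32 is a hypothetical stand-in for a struct timespec with a 32-bit tv_sec. Note that timespec64_to_cputime() takes a pointer, so the wrapper has to pass &ts64.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical stand-ins: a timespec with a 32-bit tv_sec (as on 32-bit
 * systems), struct timespec64, and nanosecond-based cputime. */
typedef uint64_t cputime_t;
struct timespec32 { int32_t tv_sec; int32_t tv_nsec; };
struct timespec64 { int64_t tv_sec; long tv_nsec; };

/* New, y2038-safe primitives (each architecture would provide these). */
static inline cputime_t timespec64_to_cputime(const struct timespec64 *val)
{
	return (cputime_t)val->tv_sec * NSEC_PER_SEC + (cputime_t)val->tv_nsec;
}

static inline void cputime_to_timespec64(cputime_t ct, struct timespec64 *val)
{
	val->tv_sec = (int64_t)(ct / NSEC_PER_SEC);
	val->tv_nsec = (long)(ct % NSEC_PER_SEC);
}

/* Old API kept as thin generic wrappers, so remaining callers still build
 * and the wrappers can later be deleted in one place. */
static inline cputime_t timespec_to_cputime(const struct timespec32 *ts)
{
	struct timespec64 ts64 = { ts->tv_sec, ts->tv_nsec };

	return timespec64_to_cputime(&ts64);
}

static inline void cputime_to_timespec(cputime_t ct, struct timespec32 *ts)
{
	struct timespec64 ts64;

	cputime_to_timespec64(ct, &ts64);
	/* Narrowing: only valid while tv_sec fits in 32 bits. */
	ts->tv_sec = (int32_t)ts64.tv_sec;
	ts->tv_nsec = (int32_t)ts64.tv_nsec;
}
```

Once no caller uses the old names, the two wrappers can be deleted without touching the architecture-specific code.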

Signed-off-by: Baolin Wang baolin.w...@linaro.org
---
 arch/powerpc/include/asm/cputime.h|6 +++---
 arch/s390/include/asm/cputime.h   |8 
 include/asm-generic/cputime_jiffies.h |   10 +-
 include/asm-generic/cputime_nsecs.h   |6 +++---
 include/linux/cputime.h   |   16 
 5 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h
index e245255..5dda5c0 100644
--- a/arch/powerpc/include/asm/cputime.h
+++ b/arch/powerpc/include/asm/cputime.h
@@ -154,9 +154,9 @@ static inline cputime_t secs_to_cputime(const unsigned long sec)
 }
 
 /*
- * Convert cputime <-> timespec
+ * Convert cputime <-> timespec64
  */
-static inline void cputime_to_timespec(const cputime_t ct, struct timespec *p)
+static inline void cputime_to_timespec64(const cputime_t ct, struct timespec64 *p)
 {
u64 x = (__force u64) ct;
unsigned int frac;
@@ -168,7 +168,7 @@ static inline void cputime_to_timespec(const cputime_t ct, struct timespec *p)
p->tv_nsec = x;
 }
 
-static inline cputime_t timespec_to_cputime(const struct timespec *p)
+static inline cputime_t timespec64_to_cputime(const struct timespec64 *p)
 {
u64 ct;
 
diff --git a/arch/s390/include/asm/cputime.h b/arch/s390/include/asm/cputime.h
index 221b454..3319b51 100644
--- a/arch/s390/include/asm/cputime.h
+++ b/arch/s390/include/asm/cputime.h
@@ -81,16 +81,16 @@ static inline cputime_t secs_to_cputime(const unsigned int s)
 }
 
 /*
- * Convert cputime to timespec and back.
+ * Convert cputime to timespec64 and back.
  */
-static inline cputime_t timespec_to_cputime(const struct timespec *value)
+static inline cputime_t timespec64_to_cputime(const struct timespec64 *value)
 {
unsigned long long ret = value->tv_sec * CPUTIME_PER_SEC;
return (__force cputime_t)(ret + __div(value->tv_nsec * CPUTIME_PER_USEC, NSEC_PER_USEC));
 }
 
-static inline void cputime_to_timespec(const cputime_t cputime,
-  struct timespec *value)
+static inline void cputime_to_timespec64(const cputime_t cputime,
+struct timespec64 *value)
 {
unsigned long long __cputime = (__force unsigned long long) cputime;
value->tv_nsec = (__cputime % CPUTIME_PER_SEC) * NSEC_PER_USEC / CPUTIME_PER_USEC;
diff --git a/include/asm-generic/cputime_jiffies.h b/include/asm-generic/cputime_jiffies.h
index fe386fc..54e034c 100644
--- a/include/asm-generic/cputime_jiffies.h
+++ b/include/asm-generic/cputime_jiffies.h
@@ -44,12 +44,12 @@ typedef u64 __nocast cputime64_t;
 #define secs_to_cputime(sec)   jiffies_to_cputime((sec) * HZ)
 
 /*
- * Convert cputime to timespec and back.
+ * Convert cputime to timespec64 and back.
  */
-#define timespec_to_cputime(__val) \
-   jiffies_to_cputime(timespec_to_jiffies(__val))
-#define cputime_to_timespec(__ct,__val)\
-   jiffies_to_timespec(cputime_to_jiffies(__ct),__val)
+#define timespec64_to_cputime(__val)   \
+   jiffies_to_cputime(timespec64_to_jiffies(__val))
+#define cputime_to_timespec64(__ct,__val)  \
+   jiffies_to_timespec64(cputime_to_jiffies(__ct),__val)
 
 /*
  * Convert cputime to timeval and back.
diff --git a/include/asm-generic/cputime_nsecs.h b/include/asm-generic/cputime_nsecs.h
index 0419485..c0cafc0 100644
--- a/include/asm-generic/cputime_nsecs.h
+++ b/include/asm-generic/cputime_nsecs.h
@@ -71,14 +71,14 @@ typedef u64 __nocast cputime64_t;
(__force cputime_t)((__secs) * NSEC_PER_SEC)
 
 /*
- * Convert cputime <-> timespec (nsec)
+ * Convert cputime <-> timespec64 (nsec)
  */
-static inline cputime_t timespec_to_cputime(const struct timespec *val)
+static inline cputime_t timespec64_to_cputime(const struct timespec64 *val)
 {
u64 ret = val->tv_sec * NSEC_PER_SEC + val->tv_nsec;
return (__force cputime_t) ret;
 }
-static inline void cputime_to_timespec(const cputime_t ct, struct timespec *val)
+static inline void cputime_to_timespec64(const cputime_t ct, struct timespec64 *val)
 {
u32 rem;
 
diff --git a/include/linux/cputime.h b/include/linux/cputime.h
index f2eb2ee..cd638a0 100644
--- a/include/linux/cputime.h
+++ b/include/linux/cputime.h
@@ -13,4 +13,20 @@
usecs_to_cputime((__nsecs) / NSEC_PER_USEC)
 #endif
 
+static inline cputime_t timespec_to_cputime(const struct timespec *ts)
+{
+   struct timespec64 ts64 = timespec_to_timespec64(*ts

Re: [PATCH v4 00/25] Convert the posix_clock_operations and k_clock structure to ready for 2038

2015-06-02 Thread Baolin Wang
On 3 June 2015 at 03:20, Thomas Gleixner t...@linutronix.de wrote:

 On Mon, 1 Jun 2015, Baolin Wang wrote:

  This patch series changes the 32-bit time types (timespec/itimerspec) to
  the 64-bit types (timespec64/itimerspec64), since 32-bit time types will
  break in the year 2038.

 That's only true for 32bit systems.

 All in all the patch series looks rather reasonable now, except for
 the subject lines and the changelogs.

 The only technical objection I have is the macro conversion magic in
 patch #6. This can be done in a less cryptic and more efficient way.

 See the comments to the various patches and please apply them to all
 of the series.

 Thanks,

 tglx



Hi Thomas,

Thanks for your comments; I'll check and fix these problems.

-- 
Baolin.wang
Best Regards

[PATCH v4 00/25] Convert the posix_clock_operations and k_clock structure to ready for 2038

2015-06-01 Thread Baolin Wang
This patch series changes the 32-bit time types (timespec/itimerspec) to
the 64-bit types (timespec64/itimerspec64), since 32-bit time types will
break in the year 2038.

This patch series introduces new methods with the timespec64/itimerspec64
types, and removes the old ones with the timespec/itimerspec types, for the
posix_clock_operations and k_clock structures.

Baolin Wang (25):
  time: Introduce struct itimerspec64
  timekeeping: Introduce the current_kernel_time64()
  hrtimer: Introduce hrtimer_get_res64()
  security: Introduce security_settime64()
  time: Introduce the do_sys_settimeofday64()
  posix-timers: Introduce {get,put}_timespec/{get,put}_itimerspec
  posix-timers: Split up timer_gettime()/timer_settime()/clock_settime()/
clock_gettime()/clock_getres().
  posix-timers: Convert timer_gettime()/timer_settime()/clock_settime()/
clock_gettime()/clock_getres() to timespec64/itimerspec64.
  mmtimer: Convert to timespec64/itimerspec64
  alarmtimer: Convert to timespec64/itimerspec64
  posix-clock: Convert to timespec64/itimerspec64
  time: Introduce timespec64_to_jiffies()/jiffies_to_timespec64()
  cputime: Introduce cputime_to_timespec64()/timespec64_to_cputime()
  posix-cpu-timers: Convert to timespec64/itimerspec64
  k_clock: Remove timespec/itimerspec

 arch/powerpc/include/asm/cputime.h|6 +-
 arch/s390/include/asm/cputime.h   |8 +-
 drivers/char/mmtimer.c|   36 +++--
 drivers/ptp/ptp_clock.c   |   26 +---
 include/asm-generic/cputime_jiffies.h |   10 +-
 include/asm-generic/cputime_nsecs.h   |4 +-
 include/linux/cputime.h   |   16 ++
 include/linux/hrtimer.h   |   16 +-
 include/linux/jiffies.h   |   21 ++-
 include/linux/posix-clock.h   |   10 +-
 include/linux/posix-timers.h  |   18 +--
 include/linux/security.h  |   25 +++-
 include/linux/time64.h|   35 +
 include/linux/timekeeping.h   |   26 +++-
 kernel/time/alarmtimer.c  |   43 +++---
 kernel/time/hrtimer.c |   10 +-
 kernel/time/posix-clock.c |   20 +--
 kernel/time/posix-cpu-timers.c|   84 ++-
 kernel/time/posix-timers.c|  259 +
 kernel/time/time.c|   20 +--
 kernel/time/timekeeping.c |6 +-
 kernel/time/timekeeping.h |1 -
 security/commoncap.c  |2 +-
 security/security.c   |2 +-
 24 files changed, 437 insertions(+), 267 deletions(-)

-- 
1.7.9.5


[PATCH v4 23/25] cputime:Introduce the cputime_to_timespec64/timespec64_to_cputime function

2015-06-01 Thread Baolin Wang
This patch introduces some functions for converting cputime to timespec64
and back, replacing the timespec type with the timespec64 type, including
the arch/s390 and arch/powerpc implementations.

These new functions will replace the old cputime_to_timespec/timespec_to_cputime
functions to prepare for the year 2038 issue. The cputime_to_timespec/timespec_to_cputime
functions are moved to include/linux/cputime.h so that they can easily be removed.
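The s390 conversion in this patch scales between nanoseconds and the TOD clock's units. Below is a standalone sketch, assuming 4096 TOD units per microsecond (CPUTIME_PER_USEC) as on s390; the tod_-prefixed names are hypothetical, and the real header's separate 32-bit path (the #ifndef CONFIG_64BIT branch) is not modelled here.

```c
#include <stdint.h>

/* Assumed s390-style units: the TOD clock advances 4096 units per
 * microsecond, so cputime here counts 1/4096 microsecond ticks. */
#define CPUTIME_PER_USEC 4096ULL
#define USEC_PER_SEC     1000000ULL
#define NSEC_PER_USEC    1000ULL
#define CPUTIME_PER_SEC  (CPUTIME_PER_USEC * USEC_PER_SEC)

typedef uint64_t cputime_t;
struct timespec64 { int64_t tv_sec; long tv_nsec; };

static cputime_t tod_timespec64_to_cputime(const struct timespec64 *value)
{
	cputime_t ret = (cputime_t)value->tv_sec * CPUTIME_PER_SEC;

	/* Scale nanoseconds to TOD units; any fraction of a unit is truncated. */
	return ret + (cputime_t)value->tv_nsec * CPUTIME_PER_USEC / NSEC_PER_USEC;
}

static void tod_cputime_to_timespec64(cputime_t cputime, struct timespec64 *value)
{
	/* Sub-second remainder back to nanoseconds, again truncating. */
	value->tv_nsec = (long)((cputime % CPUTIME_PER_SEC) * NSEC_PER_USEC /
				CPUTIME_PER_USEC);
	value->tv_sec = (int64_t)(cputime / CPUTIME_PER_SEC);
}
```

Whole microseconds round-trip exactly, but a single nanosecond becomes 4 TOD units and converts back to 0 ns, because both directions truncate.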

Signed-off-by: Baolin Wang baolin.w...@linaro.org
---
 arch/powerpc/include/asm/cputime.h|6 +++---
 arch/s390/include/asm/cputime.h   |8 
 include/asm-generic/cputime_jiffies.h |   10 +-
 include/asm-generic/cputime_nsecs.h   |4 ++--
 include/linux/cputime.h   |   16 
 5 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h
index e245255..5dda5c0 100644
--- a/arch/powerpc/include/asm/cputime.h
+++ b/arch/powerpc/include/asm/cputime.h
@@ -154,9 +154,9 @@ static inline cputime_t secs_to_cputime(const unsigned long sec)
 }
 
 /*
- * Convert cputime <-> timespec
+ * Convert cputime <-> timespec64
  */
-static inline void cputime_to_timespec(const cputime_t ct, struct timespec *p)
+static inline void cputime_to_timespec64(const cputime_t ct, struct timespec64 *p)
 {
u64 x = (__force u64) ct;
unsigned int frac;
@@ -168,7 +168,7 @@ static inline void cputime_to_timespec(const cputime_t ct, struct timespec *p)
p->tv_nsec = x;
 }
 
-static inline cputime_t timespec_to_cputime(const struct timespec *p)
+static inline cputime_t timespec64_to_cputime(const struct timespec64 *p)
 {
u64 ct;
 
diff --git a/arch/s390/include/asm/cputime.h b/arch/s390/include/asm/cputime.h
index b91e960..1266697 100644
--- a/arch/s390/include/asm/cputime.h
+++ b/arch/s390/include/asm/cputime.h
@@ -89,16 +89,16 @@ static inline cputime_t secs_to_cputime(const unsigned int s)
 }
 
 /*
- * Convert cputime to timespec and back.
+ * Convert cputime to timespec64 and back.
  */
-static inline cputime_t timespec_to_cputime(const struct timespec *value)
+static inline cputime_t timespec64_to_cputime(const struct timespec64 *value)
 {
unsigned long long ret = value->tv_sec * CPUTIME_PER_SEC;
return (__force cputime_t)(ret + __div(value->tv_nsec * CPUTIME_PER_USEC, NSEC_PER_USEC));
 }
 
-static inline void cputime_to_timespec(const cputime_t cputime,
-  struct timespec *value)
+static inline void cputime_to_timespec64(const cputime_t cputime,
+  struct timespec64 *value)
 {
unsigned long long __cputime = (__force unsigned long long) cputime;
 #ifndef CONFIG_64BIT
diff --git a/include/asm-generic/cputime_jiffies.h b/include/asm-generic/cputime_jiffies.h
index fe386fc..54e034c 100644
--- a/include/asm-generic/cputime_jiffies.h
+++ b/include/asm-generic/cputime_jiffies.h
@@ -44,12 +44,12 @@ typedef u64 __nocast cputime64_t;
 #define secs_to_cputime(sec)   jiffies_to_cputime((sec) * HZ)
 
 /*
- * Convert cputime to timespec and back.
+ * Convert cputime to timespec64 and back.
  */
-#define timespec_to_cputime(__val) \
-   jiffies_to_cputime(timespec_to_jiffies(__val))
-#define cputime_to_timespec(__ct,__val)\
-   jiffies_to_timespec(cputime_to_jiffies(__ct),__val)
+#define timespec64_to_cputime(__val)   \
+   jiffies_to_cputime(timespec64_to_jiffies(__val))
+#define cputime_to_timespec64(__ct,__val)  \
+   jiffies_to_timespec64(cputime_to_jiffies(__ct),__val)
 
 /*
  * Convert cputime to timeval and back.
diff --git a/include/asm-generic/cputime_nsecs.h b/include/asm-generic/cputime_nsecs.h
index 0419485..65c875b 100644
--- a/include/asm-generic/cputime_nsecs.h
+++ b/include/asm-generic/cputime_nsecs.h
@@ -73,12 +73,12 @@ typedef u64 __nocast cputime64_t;
 /*
  * Convert cputime <-> timespec (nsec)
  */
-static inline cputime_t timespec_to_cputime(const struct timespec *val)
+static inline cputime_t timespec64_to_cputime(const struct timespec64 *val)
 {
u64 ret = val->tv_sec * NSEC_PER_SEC + val->tv_nsec;
return (__force cputime_t) ret;
 }
-static inline void cputime_to_timespec(const cputime_t ct, struct timespec *val)
+static inline void cputime_to_timespec64(const cputime_t ct, struct timespec64 *val)
 {
u32 rem;
 
diff --git a/include/linux/cputime.h b/include/linux/cputime.h
index f2eb2ee..e4c88da 100644
--- a/include/linux/cputime.h
+++ b/include/linux/cputime.h
@@ -13,4 +13,20 @@
usecs_to_cputime((__nsecs) / NSEC_PER_USEC)
 #endif
 
+static inline cputime_t timespec_to_cputime(const struct timespec *ts)
+{
+   struct timespec64 ts64 = timespec_to_timespec64(*ts);
+
+   return timespec64_to_cputime(&ts64);
+}
+
+static inline void cputime_to_timespec(const cputime_t cputime,
+   struct timespec *value)
+{
+   struct timespec64 ts64

Re: [PATCH v3 00/22] Convert the posix_clock_operations and k_clock structure to ready for 2038

2015-05-12 Thread Baolin Wang
On 12 May 2015 at 17:39, Arnd Bergmann a...@arndb.de wrote:

 On Monday 11 May 2015 19:08:38 Baolin Wang wrote:
  This patch series changes the 32-bit time types (timespec/itimerspec) to
  their 64-bit counterparts (timespec64/itimerspec64), since the 32-bit time
  types will break in the year 2038.
 
  This patch series introduces new methods with timespec64/itimerspec64
  types and removes the old ones with timespec/itimerspec types for the
  posix_clock_operations and k_clock structures.
 
  It also introduces some new functions with timespec64/itimerspec64 types,
  such as current_kernel_time64(), hrtimer_get_res64(),
  cputime_to_timespec64() and timespec64_to_cputime().
 
  Changes since v2:
-Split the syscall conversion patch into several smaller patches.
 
 
  Baolin Wang (22):
linux/time64.h:Introduce the 'struct itimerspec64' for 64bit
timekeeping:Introduce the current_kernel_time64() function with
  timespec64 type
time/hrtimer:Introduce hrtimer_get_res64() with timespec64 type for
  getting the timer resolution
posix-timers:Split out the guts of the syscall and change the
  implementation for timer_gettime
posix-timers:Convert to the 64bit methods for the timer_gettime
  syscall function

 I have two more very general comments about the series:

 a) something has gone wrong with your submission in v2 and v3 but was
working earlier: normally all emails should be sent by git-send-email
as replies to the [patch 00/22] mail. This is the default, and it
is enabled by the '--thread --no-chain-reply' options. Please try
to get this to work again.

 b) it would be better to have a little shorter subject lines, to avoid
line-wrapping in the list above. Here are some examples what you could
use to replace the lines above:

timekeeping: introduce struct itimerspec64
timekeeping: introduce current_kernel_time64()
hrtimer: introduce hrtimer_get_res64()
posix-timers: split up sys_timer_gettime()
posix-timers: convert timer_gettime() to timespec64

In general, try to come up with the shortest description that
uniquely describes what your patch does, and move any details into
the longer patch description.

 Arnd


OK, I'll fix these in the next patch series. Thanks for your comments.


-- 
Baolin.wang
Best Regards
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 00/22] Convert the posix_clock_operations and k_clock structure to ready for 2038

2015-05-11 Thread Baolin Wang
This patch series changes the 32-bit time types (timespec/itimerspec) to their 64-bit
counterparts (timespec64/itimerspec64), since the 32-bit time types will break in the year 2038.
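As a rough illustration of why a 32-bit seconds field breaks in 2038, the C sketch below (not part of the series; `tick32()`, `tick64()` and `Y2038_LAST_SECOND` are hypothetical names) advances a seconds counter past 2^31 - 1, the value reached at 2038-01-19 03:14:07 UTC. It does this once with a 32-bit field and once with a 64-bit field like the one in struct timespec64.

```c
#include <assert.h>
#include <stdint.h>

/* 2^31 - 1 seconds after 1970-01-01 00:00:00 UTC is 2038-01-19 03:14:07 UTC,
 * the last second a signed 32-bit seconds field can represent. */
#define Y2038_LAST_SECOND ((int64_t)INT32_MAX)

/* Advance a 32-bit seconds value by one second. The addition is done in
 * unsigned arithmetic to avoid signed-overflow undefined behaviour; the
 * conversion back to int32_t is implementation-defined in C and wraps
 * modulo 2^32 with GCC and Clang. */
static int32_t tick32(int32_t secs)
{
    return (int32_t)((uint32_t)secs + 1u);
}

/* The same step with a 64-bit seconds value, as in struct timespec64. */
static int64_t tick64(int64_t secs)
{
    assert(secs < INT64_MAX); /* roughly 292 billion years of headroom */
    return secs + 1;
}
```

With GCC or Clang, tick32(INT32_MAX) yields INT32_MIN, which a 32-bit time_t interprets as 1901-12-13 20:45:52 UTC. By contrast, tick64(Y2038_LAST_SECOND) simply returns 2147483648.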

This patch series introduces new methods with timespec64/itimerspec64 types
and removes the old ones with timespec/itimerspec types for the
posix_clock_operations and k_clock structures.

It also introduces some new functions with timespec64/itimerspec64 types, such as
current_kernel_time64(), hrtimer_get_res64(), cputime_to_timespec64() and
timespec64_to_cputime().

Changes since v2:
-Split the syscall conversion patch into several smaller patches.


Baolin Wang (22):
  linux/time64.h:Introduce the 'struct itimerspec64' for 64bit
  timekeeping:Introduce the current_kernel_time64() function with
timespec64 type
  time/hrtimer:Introduce hrtimer_get_res64() with timespec64 type for
getting the timer resolution
  posix-timers:Split out the guts of the syscall and change the
implementation for timer_gettime
  posix-timers:Convert to the 64bit methods for the timer_gettime
syscall function
  posix-timers:Split out the guts of the syscall and change the
implementation for timer_settime
  posix-timers:Convert to the 64bit methods for the timer_settime
syscall function
  posix-timers:Split out the guts of the syscall and change the
implementation for clock_settime
  posix-timers:Convert to the 64bit methods for the clock_settime
syscall function
  posix-timers:Split out the guts of the syscall and change the
implementation for clock_gettime
  posix-timers:Convert to the 64bit methods for the clock_gettime
syscall function
  posix-timers:Split out the guts of the syscall and change the
implementation for clock_getres
  posix-timers:Convert to the 64bit methods for the clock_getres
syscall function
  time:Introduce the do_sys_settimeofday64() function with timespec64
type
  time/posix-timers:Convert to the 64bit methods for k_clock callback
functions
  char/mmtimer:Convert to the 64bit methods for k_clock callback
function
  time/alarmtimer:Convert to the new 64bit methods for k_clock
structure
  time/posix-clock:Convert to the 64bit methods for k_clock and
posix_clock_operations structure
  time/time:Introduce the timespec64_to_jiffies/jiffies_to_timespec64
function
  cputime:Introduce the cputime_to_timespec64/timespec64_to_cputime
function
  time/posix-cpu-timers:Convert to the 64bit methods for k_clock
structure
  k_clock:Remove the 32bit methods with timespec/itimerspec type

 arch/powerpc/include/asm/cputime.h|6 +-
 arch/s390/include/asm/cputime.h   |8 +-
 drivers/char/mmtimer.c|   36 +++--
 drivers/ptp/ptp_clock.c   |   26 +---
 include/asm-generic/cputime_jiffies.h |   10 +-
 include/asm-generic/cputime_nsecs.h   |4 +-
 include/linux/cputime.h   |   15 ++
 include/linux/hrtimer.h   |   12 +-
 include/linux/jiffies.h   |   21 ++-
 include/linux/posix-clock.h   |   10 +-
 include/linux/posix-timers.h  |   18 +--
 include/linux/time64.h|   35 +
 include/linux/timekeeping.h   |   26 +++-
 kernel/time/alarmtimer.c  |   43 +++---
 kernel/time/hrtimer.c |   10 +-
 kernel/time/posix-clock.c |   20 +--
 kernel/time/posix-cpu-timers.c|   83 +-
 kernel/time/posix-timers.c|  269 ++---
 kernel/time/time.c|   22 +--
 kernel/time/timekeeping.c |6 +-
 kernel/time/timekeeping.h |2 +-
 21 files changed, 428 insertions(+), 254 deletions(-)

-- 
1.7.9.5


[PATCH v3 20/22] cputime:Introduce the cputime_to_timespec64/timespec64_to_cputime function

2015-05-11 Thread Baolin Wang
This patch introduces functions for converting cputime to timespec64 and
back, replacing the timespec type with the timespec64 type, including for
the arch/s390 and arch/powerpc architectures.

These new methods will replace the old cputime_to_timespec()/timespec_to_cputime()
functions in preparation for the year 2038 problem. The old
cputime_to_timespec()/timespec_to_cputime() functions are moved to
include/linux/cputime.h so that they can be removed easily later.
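The following is a minimal userspace sketch of the conversion pattern described above. It assumes a nanosecond-based cputime, as in include/asm-generic/cputime_nsecs.h. The types and names (`cputime_ns_t`, `struct timespec64_s`, `ts64_to_cputime()` and so on) are hypothetical stand-ins, not the kernel's definitions. The legacy wrappers route through the 64-bit helpers, which is what allows the old functions to be deleted later.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical stand-ins for the kernel types (not the real definitions). */
typedef uint64_t cputime_ns_t;                          /* CPU time in ns */
struct timespec64_s { int64_t tv_sec; long tv_nsec; };  /* y2038-safe */
struct timespec32_s { int32_t tv_sec; long tv_nsec; };  /* legacy, 32-bit seconds */

/* timespec64 -> cputime: seconds scaled to ns plus the ns remainder. */
static cputime_ns_t ts64_to_cputime(const struct timespec64_s *ts)
{
    assert(ts->tv_sec >= 0 && ts->tv_nsec >= 0 && ts->tv_nsec < NSEC_PER_SEC);
    return (uint64_t)ts->tv_sec * NSEC_PER_SEC + (uint64_t)ts->tv_nsec;
}

/* cputime -> timespec64: split into whole seconds and the ns remainder. */
static void cputime_to_ts64(cputime_ns_t ct, struct timespec64_s *ts)
{
    ts->tv_sec = (int64_t)(ct / NSEC_PER_SEC);
    ts->tv_nsec = (long)(ct % NSEC_PER_SEC);
}

/* Legacy wrappers: widen to timespec64 and reuse the 64-bit helpers, so the
 * old entry points carry no conversion logic of their own. */
static cputime_ns_t ts32_to_cputime(const struct timespec32_s *ts)
{
    struct timespec64_s ts64 = { ts->tv_sec, ts->tv_nsec };
    return ts64_to_cputime(&ts64);
}

static void cputime_to_ts32(cputime_ns_t ct, struct timespec32_s *ts)
{
    struct timespec64_s ts64;
    cputime_to_ts64(ct, &ts64);
    ts->tv_sec = (int32_t)ts64.tv_sec; /* cannot hold more than 2^31 - 1 s */
    ts->tv_nsec = ts64.tv_nsec;
}
```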

Signed-off-by: Baolin Wang baolin.w...@linaro.org
---
 arch/powerpc/include/asm/cputime.h|6 +++---
 arch/s390/include/asm/cputime.h   |8 
 include/asm-generic/cputime_jiffies.h |   10 +-
 include/asm-generic/cputime_nsecs.h   |4 ++--
 include/linux/cputime.h   |   15 +++
 5 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h
index e245255..5dda5c0 100644
--- a/arch/powerpc/include/asm/cputime.h
+++ b/arch/powerpc/include/asm/cputime.h
@@ -154,9 +154,9 @@ static inline cputime_t secs_to_cputime(const unsigned long sec)
 }
 
 /*
- * Convert cputime <-> timespec
+ * Convert cputime <-> timespec64
  */
-static inline void cputime_to_timespec(const cputime_t ct, struct timespec *p)
+static inline void cputime_to_timespec64(const cputime_t ct, struct timespec64 *p)
 {
u64 x = (__force u64) ct;
unsigned int frac;
@@ -168,7 +168,7 @@ static inline void cputime_to_timespec(const cputime_t ct, struct timespec *p)
p->tv_nsec = x;
 }
 
-static inline cputime_t timespec_to_cputime(const struct timespec *p)
+static inline cputime_t timespec64_to_cputime(const struct timespec64 *p)
 {
u64 ct;
 
diff --git a/arch/s390/include/asm/cputime.h b/arch/s390/include/asm/cputime.h
index b91e960..1266697 100644
--- a/arch/s390/include/asm/cputime.h
+++ b/arch/s390/include/asm/cputime.h
@@ -89,16 +89,16 @@ static inline cputime_t secs_to_cputime(const unsigned int s)
 }
 
 /*
- * Convert cputime to timespec and back.
+ * Convert cputime to timespec64 and back.
  */
-static inline cputime_t timespec_to_cputime(const struct timespec *value)
+static inline cputime_t timespec64_to_cputime(const struct timespec64 *value)
 {
unsigned long long ret = value->tv_sec * CPUTIME_PER_SEC;
return (__force cputime_t)(ret + __div(value->tv_nsec * CPUTIME_PER_USEC, NSEC_PER_USEC));
 }
 
-static inline void cputime_to_timespec(const cputime_t cputime,
-  struct timespec *value)
+static inline void cputime_to_timespec64(const cputime_t cputime,
+  struct timespec64 *value)
 {
unsigned long long __cputime = (__force unsigned long long) cputime;
 #ifndef CONFIG_64BIT
diff --git a/include/asm-generic/cputime_jiffies.h b/include/asm-generic/cputime_jiffies.h
index fe386fc..54e034c 100644
--- a/include/asm-generic/cputime_jiffies.h
+++ b/include/asm-generic/cputime_jiffies.h
@@ -44,12 +44,12 @@ typedef u64 __nocast cputime64_t;
 #define secs_to_cputime(sec)   jiffies_to_cputime((sec) * HZ)
 
 /*
- * Convert cputime to timespec and back.
+ * Convert cputime to timespec64 and back.
  */
-#define timespec_to_cputime(__val) \
-   jiffies_to_cputime(timespec_to_jiffies(__val))
-#define cputime_to_timespec(__ct,__val)\
-   jiffies_to_timespec(cputime_to_jiffies(__ct),__val)
+#define timespec64_to_cputime(__val)   \
+   jiffies_to_cputime(timespec64_to_jiffies(__val))
+#define cputime_to_timespec64(__ct,__val)  \
+   jiffies_to_timespec64(cputime_to_jiffies(__ct),__val)
 
 /*
  * Convert cputime to timeval and back.
diff --git a/include/asm-generic/cputime_nsecs.h b/include/asm-generic/cputime_nsecs.h
index 0419485..65c875b 100644
--- a/include/asm-generic/cputime_nsecs.h
+++ b/include/asm-generic/cputime_nsecs.h
@@ -73,12 +73,12 @@ typedef u64 __nocast cputime64_t;
 /*
  * Convert cputime <-> timespec (nsec)
  */
-static inline cputime_t timespec_to_cputime(const struct timespec *val)
+static inline cputime_t timespec64_to_cputime(const struct timespec64 *val)
 {
u64 ret = val->tv_sec * NSEC_PER_SEC + val->tv_nsec;
return (__force cputime_t) ret;
 }
-static inline void cputime_to_timespec(const cputime_t ct, struct timespec *val)
+static inline void cputime_to_timespec64(const cputime_t ct, struct timespec64 *val)
 {
u32 rem;
 
diff --git a/include/linux/cputime.h b/include/linux/cputime.h
index f2eb2ee..f01896f 100644
--- a/include/linux/cputime.h
+++ b/include/linux/cputime.h
@@ -13,4 +13,19 @@
usecs_to_cputime((__nsecs) / NSEC_PER_USEC)
 #endif
 
+static inline cputime_t timespec_to_cputime(const struct timespec *ts)
+{
+   struct timespec64 ts64 = timespec_to_timespec64(*ts);
+   return timespec64_to_cputime(&ts64);
+}
+
+static inline void cputime_to_timespec(const cputime_t cputime,
+   struct timespec *value)
+{
+   struct timespec64 *ts64

[PATCH v2 13/15] cputime:Introduce the cputime_to_timespec64/timespec64_to_cputime function

2015-04-30 Thread Baolin Wang
This patch introduces functions for converting cputime to timespec64 and
back, replacing the timespec type with the timespec64 type, including for
the arch/s390 and arch/powerpc architectures.

These new methods will replace the old cputime_to_timespec()/timespec_to_cputime()
functions in preparation for the year 2038 problem. The old
cputime_to_timespec()/timespec_to_cputime() functions are moved to
include/linux/cputime.h so that they can be removed easily later.

Signed-off-by: Baolin Wang baolin.w...@linaro.org
---
 arch/powerpc/include/asm/cputime.h|6 +++---
 arch/s390/include/asm/cputime.h   |8 
 include/asm-generic/cputime_jiffies.h |   10 +-
 include/asm-generic/cputime_nsecs.h   |4 ++--
 include/linux/cputime.h   |   15 +++
 5 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h
index e245255..5dda5c0 100644
--- a/arch/powerpc/include/asm/cputime.h
+++ b/arch/powerpc/include/asm/cputime.h
@@ -154,9 +154,9 @@ static inline cputime_t secs_to_cputime(const unsigned long sec)
 }
 
 /*
- * Convert cputime <-> timespec
+ * Convert cputime <-> timespec64
  */
-static inline void cputime_to_timespec(const cputime_t ct, struct timespec *p)
+static inline void cputime_to_timespec64(const cputime_t ct, struct timespec64 *p)
 {
u64 x = (__force u64) ct;
unsigned int frac;
@@ -168,7 +168,7 @@ static inline void cputime_to_timespec(const cputime_t ct, struct timespec *p)
p->tv_nsec = x;
 }
 
-static inline cputime_t timespec_to_cputime(const struct timespec *p)
+static inline cputime_t timespec64_to_cputime(const struct timespec64 *p)
 {
u64 ct;
 
diff --git a/arch/s390/include/asm/cputime.h b/arch/s390/include/asm/cputime.h
index b91e960..1266697 100644
--- a/arch/s390/include/asm/cputime.h
+++ b/arch/s390/include/asm/cputime.h
@@ -89,16 +89,16 @@ static inline cputime_t secs_to_cputime(const unsigned int s)
 }
 
 /*
- * Convert cputime to timespec and back.
+ * Convert cputime to timespec64 and back.
  */
-static inline cputime_t timespec_to_cputime(const struct timespec *value)
+static inline cputime_t timespec64_to_cputime(const struct timespec64 *value)
 {
unsigned long long ret = value->tv_sec * CPUTIME_PER_SEC;
return (__force cputime_t)(ret + __div(value->tv_nsec * CPUTIME_PER_USEC, NSEC_PER_USEC));
 }
 
-static inline void cputime_to_timespec(const cputime_t cputime,
-  struct timespec *value)
+static inline void cputime_to_timespec64(const cputime_t cputime,
+  struct timespec64 *value)
 {
unsigned long long __cputime = (__force unsigned long long) cputime;
 #ifndef CONFIG_64BIT
diff --git a/include/asm-generic/cputime_jiffies.h b/include/asm-generic/cputime_jiffies.h
index fe386fc..54e034c 100644
--- a/include/asm-generic/cputime_jiffies.h
+++ b/include/asm-generic/cputime_jiffies.h
@@ -44,12 +44,12 @@ typedef u64 __nocast cputime64_t;
 #define secs_to_cputime(sec)   jiffies_to_cputime((sec) * HZ)
 
 /*
- * Convert cputime to timespec and back.
+ * Convert cputime to timespec64 and back.
  */
-#define timespec_to_cputime(__val) \
-   jiffies_to_cputime(timespec_to_jiffies(__val))
-#define cputime_to_timespec(__ct,__val)\
-   jiffies_to_timespec(cputime_to_jiffies(__ct),__val)
+#define timespec64_to_cputime(__val)   \
+   jiffies_to_cputime(timespec64_to_jiffies(__val))
+#define cputime_to_timespec64(__ct,__val)  \
+   jiffies_to_timespec64(cputime_to_jiffies(__ct),__val)
 
 /*
  * Convert cputime to timeval and back.
diff --git a/include/asm-generic/cputime_nsecs.h b/include/asm-generic/cputime_nsecs.h
index 0419485..65c875b 100644
--- a/include/asm-generic/cputime_nsecs.h
+++ b/include/asm-generic/cputime_nsecs.h
@@ -73,12 +73,12 @@ typedef u64 __nocast cputime64_t;
 /*
  * Convert cputime <-> timespec (nsec)
  */
-static inline cputime_t timespec_to_cputime(const struct timespec *val)
+static inline cputime_t timespec64_to_cputime(const struct timespec64 *val)
 {
u64 ret = val->tv_sec * NSEC_PER_SEC + val->tv_nsec;
return (__force cputime_t) ret;
 }
-static inline void cputime_to_timespec(const cputime_t ct, struct timespec *val)
+static inline void cputime_to_timespec64(const cputime_t ct, struct timespec64 *val)
 {
u32 rem;
 
diff --git a/include/linux/cputime.h b/include/linux/cputime.h
index f2eb2ee..f01896f 100644
--- a/include/linux/cputime.h
+++ b/include/linux/cputime.h
@@ -13,4 +13,19 @@
usecs_to_cputime((__nsecs) / NSEC_PER_USEC)
 #endif
 
+static inline cputime_t timespec_to_cputime(const struct timespec *ts)
+{
+   struct timespec64 ts64 = timespec_to_timespec64(*ts);
+   return timespec64_to_cputime(&ts64);
+}
+
+static inline void cputime_to_timespec(const cputime_t cputime,
+   struct timespec *value)
+{
+   struct timespec64 *ts64

[PATCH v2 00/15] Convert the posix_clock_operations and k_clock structure to ready for 2038

2015-04-30 Thread Baolin Wang
This patch series changes the 32-bit time types (timespec/itimerspec) to their 64-bit
counterparts (timespec64/itimerspec64), since the 32-bit time types will break in the year 2038.

This patch series introduces new methods with timespec64/itimerspec64 types
and removes the old ones with timespec/itimerspec types for the
posix_clock_operations and k_clock structures.

It also introduces some new functions with timespec64/itimerspec64 types, such as
current_kernel_time64(), hrtimer_get_res64(), cputime_to_timespec64() and
timespec64_to_cputime().

Changes since v1:
-Split some patches into smaller patches.
-Changed the approach for converting the syscalls and added some default
 functions for the new 64-bit syscall methods.
-Introduced the new function do_sys_settimeofday64() and moved
 do_sys_settimeofday() to a header file.
-Fixed the EXPORT_SYMBOL issue.
-Added new 64-bit methods to cputime_nsecs.h.
-Updated some patch logs.

Baolin Wang (15):
  linux/time64.h:Introduce the 'struct itimerspec64' for 64bit
  timekeeping:Introduce the current_kernel_time64() function with
timespec64 type
  time/hrtimer:Introduce hrtimer_get_res64() with timespec64 type for
getting the timer resolution
  posix timers:Introduce the 64bit methods with timespec64 type for
k_clock structure
  posix-timers:Split out the guts of the syscall and change the
implementation
  posix-timers:Convert to the 64bit methods for the syscall function
  time:Introduce the do_sys_settimeofday64() function with timespec64
type
  time/posix-timers:Convert to the 64bit methods for k_clock callback
functions
  char/mmtimer:Convert to the 64bit methods for k_clock callback
function
  time/alarmtimer:Convert to the new methods for k_clock structure
  time/posix-clock:Convert to the 64bit methods for k_clock and
posix_clock_operations structure
  time/time:Introduce the timespec64_to_jiffies/jiffies_to_timespec64
function
  cputime:Introduce the cputime_to_timespec64/timespec64_to_cputime
function
  time/posix-cpu-timers:Convert to the 64bit methods for k_clock
structure
  k_clock:Remove the 32bit methods with timespec/itimerspec type

 arch/powerpc/include/asm/cputime.h|6 +-
 arch/s390/include/asm/cputime.h   |8 +-
 drivers/char/mmtimer.c|   36 +++--
 drivers/ptp/ptp_clock.c   |   26 +---
 include/asm-generic/cputime_jiffies.h |   10 +-
 include/asm-generic/cputime_nsecs.h   |4 +-
 include/linux/cputime.h   |   15 ++
 include/linux/hrtimer.h   |   12 +-
 include/linux/jiffies.h   |   21 ++-
 include/linux/posix-clock.h   |   10 +-
 include/linux/posix-timers.h  |   18 +--
 include/linux/time64.h|   35 +
 include/linux/timekeeping.h   |   26 +++-
 kernel/time/alarmtimer.c  |   43 +++---
 kernel/time/hrtimer.c |   10 +-
 kernel/time/posix-clock.c |   20 +--
 kernel/time/posix-cpu-timers.c|   83 +-
 kernel/time/posix-timers.c|  269 ++---
 kernel/time/time.c|   22 +--
 kernel/time/timekeeping.c |6 +-
 kernel/time/timekeeping.h |2 +-
 21 files changed, 428 insertions(+), 254 deletions(-)

-- 
1.7.9.5


Re: [Y2038] [PATCH 05/11] time/posix-timers:Convert to the 64bit methods for k_clock callback functions

2015-04-21 Thread Baolin Wang
On 21 April 2015 at 16:45, Arnd Bergmann a...@arndb.de wrote:

 On Tuesday 21 April 2015 16:36:13 Baolin Wang wrote:
  On 21 April 2015 at 04:48, Thomas Gleixner t...@linutronix.de wrote:
 
   On Mon, 20 Apr 2015, Baolin Wang wrote:
 /* Set clock_realtime */
 static int posix_clock_realtime_set(const clockid_t which_clock,
- const struct timespec *tp)
+ const struct timespec64 *tp)
 {
- return do_sys_settimeofday(tp, NULL);
+ struct timespec ts = timespec64_to_timespec(*tp);
+
+ return do_sys_settimeofday(ts, NULL);
  
   Sigh. No. We first provide a proper function for this, which takes a
   timespec64, i.e. do_sys_settimeofday64() instead of having this
   wrapper mess all over the place.
  
 
  Thanks for your comments, but using do_sys_settimeofday64() here would
  introduce a security bug: do_sys_settimeofday contains a capability
  check that normally prevents non-root users from setting the time.
 
  With your change, any user can set the system time.

 He was asking for a new do_sys_settimeofday64 function to be added,
 not using the low-level do_settimeofday64.

 Arnd
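The distinction Arnd draws is about where the permission check lives. The following is a hypothetical userspace model of that layering, not the kernel code. In it, `sys_set_clock()` stands in for a do_sys_settimeofday64()-style entry point, `set_clock_unchecked()` stands in for the low-level do_settimeofday64(), and a boolean stands in for the CAP_SYS_TIME check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000L

struct ts64 { int64_t tv_sec; long tv_nsec; };  /* hypothetical timespec64 */

/* Hypothetical global state standing in for the kernel's. */
static bool caller_may_set_time;   /* models the CAP_SYS_TIME check */
static struct ts64 realtime_clock; /* models the realtime clock */

static bool ts64_valid(const struct ts64 *ts)
{
    return ts->tv_sec >= 0 && ts->tv_nsec >= 0 && ts->tv_nsec < NSEC_PER_SEC;
}

/* Low-level setter, like do_settimeofday64(): no permission check. */
static int set_clock_unchecked(const struct ts64 *ts)
{
    assert(ts64_valid(ts)); /* callers are expected to validate first */
    realtime_clock = *ts;
    return 0;
}

/* Syscall-level setter, in the role of a do_sys_settimeofday64(): it
 * validates the value and checks permission before touching the clock. */
static int sys_set_clock(const struct ts64 *ts)
{
    if (!ts64_valid(ts))
        return -EINVAL;
    if (!caller_may_set_time)
        return -EPERM;
    return set_clock_unchecked(ts);
}

/* posix_clock_realtime_set() can then pass the timespec64 straight through,
 * with no timespec round trip and no lost permission check. */
static int realtime_set(const struct ts64 *tp)
{
    return sys_set_clock(tp);
}
```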


Sorry for the misunderstanding; I'll fix that in the next patch. Thanks.

-- 
Baolin.wang
Best Regards

Re: [PATCH 01/11] linux/time64.h:Introduce the 'struct itimerspec64' for 64bit

2015-04-21 Thread Baolin Wang
On 21 April 2015 at 03:14, Thomas Gleixner t...@linutronix.de wrote:

 On Mon, 20 Apr 2015, Baolin Wang wrote:
  This patch introduces the 'struct itimerspec64' for 64bit to replace itimerspec,
  and also introduces the conversion methods: itimerspec64_to_itimerspec() and
  itimerspec_to_itimerspec64(), that makes itimerspec to ready for 2038 year.
 
  Signed-off-by: Baolin Wang baolin.w...@linaro.org
  ---
   include/linux/time64.h |   13 +
   1 file changed, 13 insertions(+)
 
  diff --git a/include/linux/time64.h b/include/linux/time64.h
  index a383147..3647bdd 100644
  --- a/include/linux/time64.h
  +++ b/include/linux/time64.h
  @@ -18,6 +18,11 @@ struct timespec64 {
   };
   #endif
 
  +struct itimerspec64 {
  + struct timespec64 it_interval;  /* timer period */
  + struct timespec64 it_value; /* timer expiration */
  +};
  +
   /* Parameters used to convert the timespec values: */
   #define MSEC_PER_SEC 1000L
   #define USEC_PER_MSEC1000L
  @@ -187,4 +192,12 @@ static __always_inline void timespec64_add_ns(struct timespec64 *a, u64 ns)
 
   #endif
 
  +#define itimerspec64_to_itimerspec(its64) \

  +#define itimerspec_to_itimerspec64(its) \

 1.) Make these static inlines please. These macros are not typesafe.

 2.) Use pointers to the input value.

 Thanks.

 tglx
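Following tglx's two points, here is a sketch of the conversions written as type-checked static inline functions that take their input by pointer. The struct and function names are hypothetical stand-ins. The legacy structs model a 32-bit platform where tv_sec is 32 bits wide. Returning the converted structure by value is one possible design and is assumed here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; the legacy structs model a 32-bit platform. */
struct old_timespec   { int32_t tv_sec; long tv_nsec; };
struct old_itimerspec { struct old_timespec it_interval, it_value; };
struct ts64           { int64_t tv_sec; long tv_nsec; };
struct itimerspec64_s { struct ts64 it_interval, it_value; };

/* Debug check: a 64-bit seconds value must fit before it is narrowed. */
static inline int32_t narrow_sec(int64_t sec)
{
    assert(sec >= INT32_MIN && sec <= INT32_MAX);
    return (int32_t)sec;
}

/* Functions rather than macros: the compiler checks the argument type,
 * and the input is passed by const pointer as requested in review. */
static inline struct itimerspec64_s
itimerspec_to_itimerspec64_sketch(const struct old_itimerspec *its)
{
    struct itimerspec64_s ret = {
        .it_interval = { its->it_interval.tv_sec, its->it_interval.tv_nsec },
        .it_value    = { its->it_value.tv_sec,    its->it_value.tv_nsec },
    };
    return ret;
}

static inline struct old_itimerspec
itimerspec64_to_itimerspec_sketch(const struct itimerspec64_s *its64)
{
    struct old_itimerspec ret = {
        .it_interval = { narrow_sec(its64->it_interval.tv_sec),
                         its64->it_interval.tv_nsec },
        .it_value    = { narrow_sec(its64->it_value.tv_sec),
                         its64->it_value.tv_nsec },
    };
    return ret;
}
```

Passing, for example, a `struct ts64 *` to either function is diagnosed by the compiler. A macro that simply expands its argument would not provide that check.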



Thanks for your comments, I'll fix this in the next patch.
-- 
Baolin.wang
Best Regards

Re: [PATCH 05/11] time/posix-timers:Convert to the 64bit methods for k_clock callback functions

2015-04-21 Thread Baolin Wang
On 21 April 2015 at 04:48, Thomas Gleixner t...@linutronix.de wrote:

 On Mon, 20 Apr 2015, Baolin Wang wrote:
   /* Set clock_realtime */
   static int posix_clock_realtime_set(const clockid_t which_clock,
  - const struct timespec *tp)
  + const struct timespec64 *tp)
   {
  - return do_sys_settimeofday(tp, NULL);
  + struct timespec ts = timespec64_to_timespec(*tp);
  +
  + return do_sys_settimeofday(ts, NULL);

 Sigh. No. We first provide a proper function for this, which takes a
 timespec64, i.e. do_sys_settimeofday64() instead of having this
 wrapper mess all over the place.


Thanks for your comments, but using do_sys_settimeofday64() here would
introduce a security bug: do_sys_settimeofday contains a capability
check that normally prevents non-root users from setting the time.

With your change, any user can set the system time.


/* SIGEV_NONE timers are not queued ! See common_timer_get */
if (((timr->it_sigev_notify & ~SIGEV_THREAD_ID) == SIGEV_NONE)) {
  diff --git a/kernel/time/timekeeping.h b/kernel/time/timekeeping.h
  index 1d91416..144af14 100644
  --- a/kernel/time/timekeeping.h
  +++ b/kernel/time/timekeeping.h
  @@ -15,7 +15,7 @@ extern u64 timekeeping_max_deferment(void);
   extern int timekeeping_inject_offset(struct timespec *ts);
   extern s32 timekeeping_get_tai_offset(void);
   extern void timekeeping_set_tai_offset(s32 tai_offset);
  -extern void timekeeping_clocktai(struct timespec *ts);
  +extern void timekeeping_clocktai(struct timespec64 *ts);

 # git grep timekeeping_clocktai() is your friend.

 Thanks,

 tglx




-- 
Baolin.wang
Best Regards

Re: [PATCH 01/11] linux/time64.h:Introduce the 'struct itimerspec64' for 64bit

2015-04-20 Thread Baolin Wang
On 20 April 2015 at 17:49, Sergei Shtylyov sergei.shtyl...@cogentembedded.com wrote:

 Hello.

 On 4/20/2015 8:57 AM, Baolin Wang wrote:

  This patch introduces the 'struct itimerspec64' for 64bit to replace itimerspec,
 and also introduces the conversion methods: itimerspec64_to_itimerspec() and
 itimerspec_to_itimerspec64(), that makes itimerspec to ready for 2038 year.


"To" is not needed here.

  Signed-off-by: Baolin Wang baolin.w...@linaro.org


 [...]

 WBR, Sergei


Hi Sergei,

Sorry for the mistake. Thank you for your comments. I'll fix that in the next
patch.


-- 
Baolin.wang
Best Regards

Re: [PATCH 11/11] k_clock:Remove the 32bit methods with timespec type

2015-04-20 Thread Baolin Wang
On 20 April 2015 at 16:42, Richard Cochran richardcoch...@gmail.com wrote:

 On Mon, Apr 20, 2015 at 01:57:39PM +0800, Baolin Wang wrote:

  @@ -911,18 +907,14 @@ retry:
return -EINVAL;
 
kc = clockid_to_kclock(timr->it_clock);
  - if (WARN_ON_ONCE(!kc || (!kc->timer_set && !kc->timer_set64))) {
  + if (WARN_ON_ONCE(!kc || !kc->timer_set64)) {
error = -EINVAL;
} else {
  - if (kc->timer_set64) {
  - new_spec64 = itimerspec_to_itimerspec64(new_spec);
  - error = kc->timer_set64(timr, flags, new_spec64,
  - old_spec64);
  - if (old_setting)
  - old_spec = itimerspec64_to_itimerspec(old_spec64);
  - } else {
  - error = kc->timer_set(timr, flags, new_spec, rtn);
  - }
  + new_spec64 = itimerspec_to_itimerspec64(new_spec);
  + error = kc->timer_set64(timr, flags, new_spec64,
  + old_spec64);

 This statement can fit on one line.

  + if (old_setting)
  + old_spec = itimerspec64_to_itimerspec(old_spec64);
}
 
unlock_timer(timr, flag);

  @@ -1057,14 +1045,13 @@ SYSCALL_DEFINE2(clock_gettime, const clockid_t, which_clock,
if (!kc)
return -EINVAL;
 
  - if (kc->clock_get64) {
  - error = kc->clock_get64(which_clock, kernel_tp64);
  - kernel_tp = timespec64_to_timespec(kernel_tp64);
  - } else {
  - error = kc->clock_get(which_clock, kernel_tp);
  - }
  + error = kc->clock_get64(which_clock, kernel_tp64);
  + if (!error)
  + return error;

 Wrong test, should be: if (error) ...

  +
  + kernel_tp = timespec64_to_timespec(kernel_tp64);
 
  - if (!error && copy_to_user(tp, kernel_tp, sizeof (kernel_tp)))

 The (!error && ...) was correct here!

  + if (copy_to_user(tp, kernel_tp, sizeof (kernel_tp)))
error = -EFAULT;
 
return error;

 You can simplify this like so:

 return copy_to_user(tp, kernel_tp, sizeof(kernel_tp)) ? -EFAULT : 0;

  @@ -1104,14 +1091,13 @@ SYSCALL_DEFINE2(clock_getres, const clockid_t, which_clock,
if (!kc)
return -EINVAL;
 
  - if (kc->clock_getres64) {
  - error = kc->clock_getres64(which_clock, rtn_tp64);
  - rtn_tp = timespec64_to_timespec(rtn_tp64);
  - } else {
  - error = kc->clock_getres(which_clock, rtn_tp);
  - }
  + error = kc->clock_getres64(which_clock, rtn_tp64);
  + if (!error)
  + return error;

 Also wrong.

  +
  + rtn_tp = timespec64_to_timespec(rtn_tp64);
 
  - if (!error && tp && copy_to_user(tp, rtn_tp, sizeof (rtn_tp)))
  + if (tp && copy_to_user(tp, rtn_tp, sizeof (rtn_tp)))
error = -EFAULT;
 
return error;
  --
  1.7.9.5
 

 Thanks,
 Richard


Thanks for your comments, I'll fix these mistakes in the next patch series.



-- 
Baolin.wang
Best Regards

[PATCH 00/11] Convert the posix_clock_operations and k_clock structure to ready for 2038

2015-04-20 Thread Baolin Wang
This patch series changes the 32-bit time types (timespec/itimerspec) to their 64-bit
counterparts (timespec64/itimerspec64), since the 32-bit time types will break in the year 2038.

This patch series introduces new methods with timespec64/itimerspec64 types
and removes the old ones with timespec/itimerspec types for the
posix_clock_operations and k_clock structures.

It also introduces some new functions with timespec64/itimerspec64 types, such as
current_kernel_time64(), hrtimer_get_res64(), cputime_to_timespec64() and
timespec64_to_cputime().
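While both method sets exist, before the final patch removes the 32-bit methods, callers have to dispatch on whichever callback a clock provides. The following is a hypothetical model of that dispatch. `struct clock_ops` and `ops_clock_get64()` are illustrative names, not the kernel's struct k_clock, and the fallback policy is an assumption made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct ts32 { int32_t tv_sec; long tv_nsec; };  /* hypothetical legacy timespec */
struct ts64 { int64_t tv_sec; long tv_nsec; };  /* hypothetical timespec64 */

/* Hypothetical ops table: during the transition a clock may provide the
 * legacy callback, the 64-bit one, or both. */
struct clock_ops {
    int (*clock_get)(int which, struct ts32 *tp);    /* legacy, to be removed */
    int (*clock_get64)(int which, struct ts64 *tp);  /* y2038-safe */
};

/* Prefer the 64-bit callback; otherwise call the legacy one and widen. */
static int ops_clock_get64(const struct clock_ops *ops, int which,
                           struct ts64 *tp)
{
    assert(ops != NULL && tp != NULL);
    if (ops->clock_get64)
        return ops->clock_get64(which, tp);
    if (ops->clock_get) {
        struct ts32 old;
        int err = ops->clock_get(which, &old);
        if (!err) {
            tp->tv_sec = old.tv_sec;
            tp->tv_nsec = old.tv_nsec;
        }
        return err;
    }
    return -EINVAL; /* clock implements neither method */
}

/* Test doubles. */
static int legacy_get(int which, struct ts32 *tp)
{
    (void)which;
    tp->tv_sec = 100;
    tp->tv_nsec = 1;
    return 0;
}

static int new_get(int which, struct ts64 *tp)
{
    (void)which;
    tp->tv_sec = 5000000000LL; /* beyond the 32-bit range */
    tp->tv_nsec = 2;
    return 0;
}
```

Once every clock implements the 64-bit callback, the fallback branch and the legacy field can be dropped. That corresponds to the last step the series lists.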

Baolin Wang (11):
  linux/time64.h:Introduce the 'struct itimerspec64' for 64bit
  timekeeping:Introduce the current_kernel_time64() function with
timespec64 type
  time/hrtimer:Introduce hrtimer_get_res64() with timespec64 type for
getting the timer resolution
  posix timers:Introduce the 64bit methods with timespec64 type for
k_clock structure
  time/posix-timers:Convert to the 64bit methods for k_clock callback
functions
  char/mmtimer:Convert to the 64bit methods for k_clock callback
function
  time/alarmtimer:Convert to the new methods for k_clock structure
  time/posix-clock:Convert to the 64bit methods for k_clock and
posix_clock_operations structure
  cputime:Introduce the cputime_to_timespec64/timespec64_to_cputime
function
  time/posix-cpu-timers:Convert to the 64bit methods for k_clock
structure
  k_clock:Remove the 32bit methods with timespec type

 arch/powerpc/include/asm/cputime.h|6 +-
 arch/s390/include/asm/cputime.h   |8 +-
 drivers/char/mmtimer.c|   36 
 drivers/ptp/ptp_clock.c   |   26 ++
 include/asm-generic/cputime_jiffies.h |   10 +--
 include/linux/cputime.h   |   15 
 include/linux/hrtimer.h   |   12 ++-
 include/linux/jiffies.h   |3 +
 include/linux/posix-clock.h   |   10 +--
 include/linux/posix-timers.h  |   18 ++--
 include/linux/time64.h|   13 +++
 include/linux/timekeeping.h   |   14 ++-
 kernel/time/alarmtimer.c  |   43 -
 kernel/time/hrtimer.c |   10 +--
 kernel/time/posix-clock.c |   20 ++---
 kernel/time/posix-cpu-timers.c|   83 +
 kernel/time/posix-timers.c|  157 +++--
 kernel/time/time.c|   21 +
 kernel/time/timekeeping.c |6 +-
 kernel/time/timekeeping.h |2 +-
 20 files changed, 302 insertions(+), 211 deletions(-)

-- 
1.7.9.5


[PATCH 01/11] linux/time64.h:Introduce the 'struct itimerspec64' for 64bit

2015-04-20 Thread Baolin Wang
This patch introduces 'struct itimerspec64', a 64-bit replacement for
itimerspec, along with the conversion helpers itimerspec64_to_itimerspec() and
itimerspec_to_itimerspec64(), to prepare itimerspec users for the year 2038.

Signed-off-by: Baolin Wang baolin.w...@linaro.org
---
 include/linux/time64.h |   13 +
 1 file changed, 13 insertions(+)

diff --git a/include/linux/time64.h b/include/linux/time64.h
index a383147..3647bdd 100644
--- a/include/linux/time64.h
+++ b/include/linux/time64.h
@@ -18,6 +18,11 @@ struct timespec64 {
 };
 #endif
 
+struct itimerspec64 {
+   struct timespec64 it_interval;  /* timer period */
+   struct timespec64 it_value; /* timer expiration */
+};
+
 /* Parameters used to convert the timespec values: */
 #define MSEC_PER_SEC   1000L
 #define USEC_PER_MSEC  1000L
@@ -187,4 +192,12 @@ static __always_inline void timespec64_add_ns(struct timespec64 *a, u64 ns)
 
 #endif
 
+#define itimerspec64_to_itimerspec(its64) \
+   ({ (struct itimerspec){ .it_interval = timespec64_to_timespec((its64).it_interval), \
+   .it_value = timespec64_to_timespec((its64).it_value) }; })
+
+#define itimerspec_to_itimerspec64(its) \
+   ({ (struct itimerspec64){ .it_interval = timespec_to_timespec64((its).it_interval), \
+ .it_value = timespec_to_timespec64((its).it_value) }; })
+
 #endif /* _LINUX_TIME64_H */
-- 
1.7.9.5
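The two macros above convert an itimerspec field by field. A minimal user-space sketch of the same conversion follows; ts32/ts64/its32/its64 and the helper names are hypothetical mirrors of the kernel types, not the real headers.

```c
#include <stdint.h>

/* Hypothetical user-space mirrors of the kernel types (not the real headers). */
struct ts32 { int32_t tv_sec; long tv_nsec; };
struct ts64 { int64_t tv_sec; long tv_nsec; };
struct its32 { struct ts32 it_interval; struct ts32 it_value; };
struct its64 { struct ts64 it_interval; struct ts64 it_value; };

/* Narrowing: seconds beyond INT32_MAX do not fit (the 2038 problem). */
static struct ts32 ts64_to_ts32(struct ts64 t)
{
	struct ts32 r = { .tv_sec = (int32_t)t.tv_sec, .tv_nsec = t.tv_nsec };
	return r;
}

/* Widening is always lossless. */
static struct ts64 ts32_to_ts64(struct ts32 t)
{
	struct ts64 r = { .tv_sec = t.tv_sec, .tv_nsec = t.tv_nsec };
	return r;
}

/* Same shape as itimerspec64_to_itimerspec(): convert both members. */
static struct its32 its64_to_its32(struct its64 its)
{
	struct its32 r = {
		.it_interval = ts64_to_ts32(its.it_interval),
		.it_value = ts64_to_ts32(its.it_value),
	};
	return r;
}

/* Same shape as itimerspec_to_itimerspec64(). */
static struct its64 its32_to_its64(struct its32 its)
{
	struct its64 r = {
		.it_interval = ts32_to_ts64(its.it_interval),
		.it_value = ts32_to_ts64(its.it_value),
	};
	return r;
}
```

The sketch uses plain static functions instead of GCC statement-expression macros so the compiler type-checks the argument; that is a choice of the sketch, not of the patch.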


[PATCH 02/11] timekeeping:Introduce the current_kernel_time64() function with timespec64 type

2015-04-20 Thread Baolin Wang
This patch adds the current_kernel_time64() function, which returns a
timespec64, and turns current_kernel_time() into a 'static inline' wrapper in
timekeeping.h.

This lets callers get the current kernel time as a timespec64, and makes it
easy to delete current_kernel_time() from timekeeping.h later, so reading the
current time is ready for 2038.

Signed-off-by: Baolin Wang baolin.w...@linaro.org
---
 include/linux/timekeeping.h |   10 +-
 kernel/time/timekeeping.c   |6 +++---
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index 3eaae47..c6d5ae9 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -18,10 +18,18 @@ extern int do_sys_settimeofday(const struct timespec *tv,
  * Kernel time accessors
  */
 unsigned long get_seconds(void);
-struct timespec current_kernel_time(void);
+struct timespec64 current_kernel_time64(void);
 /* does not take xtime_lock */
 struct timespec __current_kernel_time(void);
 
+static inline struct timespec current_kernel_time(void)
+{
+   struct timespec64 now;
+
+   now = current_kernel_time64();
+   return timespec64_to_timespec(now);
+}
+
 /*
  * timespec based interfaces
  */
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 91db941..8ccc02c 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1721,7 +1721,7 @@ struct timespec __current_kernel_time(void)
return timespec64_to_timespec(tk_xtime(tk));
 }
 
-struct timespec current_kernel_time(void)
+struct timespec64 current_kernel_time64(void)
 {
struct timekeeper *tk = tk_core.timekeeper;
struct timespec64 now;
@@ -1733,9 +1733,9 @@ struct timespec current_kernel_time(void)
now = tk_xtime(tk);
 } while (read_seqcount_retry(&tk_core.seq, seq));
 
-   return timespec64_to_timespec(now);
+   return now;
 }
-EXPORT_SYMBOL(current_kernel_time);
+EXPORT_SYMBOL(current_kernel_time64);
 
 struct timespec64 get_monotonic_coarse64(void)
 {
-- 
1.7.9.5
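The patch keeps current_kernel_time() as a static inline wrapper around the new 64-bit accessor, so the 64-bit function is the only real implementation. A minimal user-space sketch of that pattern follows; ts32/ts64, current_time64_sketch(), current_time_sketch() and the fixed clock value are hypothetical stand-ins, not kernel APIs.

```c
#include <stdint.h>

/* Hypothetical user-space stand-ins for the two time types. */
struct ts32 { int32_t tv_sec; long tv_nsec; };
struct ts64 { int64_t tv_sec; long tv_nsec; };

/* Hypothetical fixed clock value so the sketch is deterministic. */
static const int64_t fake_now_sec = 1429488000;

/* The single real implementation, returning the 64-bit type. */
static struct ts64 current_time64_sketch(void)
{
	struct ts64 now = { fake_now_sec, 0 };
	return now;
}

/* Legacy accessor kept as a thin inline wrapper over the 64-bit one. */
static inline struct ts32 current_time_sketch(void)
{
	struct ts64 now = current_time64_sketch();
	struct ts32 r = { (int32_t)now.tv_sec, now.tv_nsec };
	return r;
}
```

Once every caller uses the 64-bit accessor, the wrapper can be removed from the header in one place, which is the motivation the patch gives.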


[PATCH 04/11] posix timers:Introduce the 64bit methods with timespec64 type for k_clock structure

2015-04-20 Thread Baolin Wang
This patch introduces new methods using the timespec64 type for the k_clock
structure, converting timespec to timespec64 and itimerspec to itimerspec64 in
preparation for 2038.

It also introduces 64-bit variants, using timespec64, of the framework
functions.

The next step will migrate all k_clock users to the new timespec64/itimerspec64
methods, covering posix-timers.c, mmtimer.c, alarmtimer.c, posix-clock.c and
posix-cpu-timers.c.
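During the transition a k_clock may provide either the legacy or the 64-bit callback, and the framework code checks which one is present. A minimal sketch of that dispatch follows, preferring the 64-bit method and widening the legacy result otherwise; k_clock_sketch, kclock_get64() and the two sample getters are hypothetical, and the patch's own timer_gettime() goes the other direction (narrowing for user space).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct ts32 { int32_t tv_sec; long tv_nsec; };
struct ts64 { int64_t tv_sec; long tv_nsec; };

/* Hypothetical cut-down k_clock carrying both getter variants. */
struct k_clock_sketch {
	int (*clock_get)(int which_clock, struct ts32 *tp);
	int (*clock_get64)(int which_clock, struct ts64 *tp);
};

/* Prefer the 64-bit method; fall back to the legacy one and widen. */
static int kclock_get64(const struct k_clock_sketch *kc, int which_clock,
			struct ts64 *tp)
{
	if (kc->clock_get64)
		return kc->clock_get64(which_clock, tp);
	if (kc->clock_get) {
		struct ts32 t32;
		int err = kc->clock_get(which_clock, &t32);

		if (!err) {
			tp->tv_sec = t32.tv_sec;
			tp->tv_nsec = t32.tv_nsec;
		}
		return err;
	}
	/* Neither method: the patch refuses to register such a clock. */
	return -EINVAL;
}

static int legacy_get(int c, struct ts32 *tp)
{
	(void)c;
	tp->tv_sec = 10;
	tp->tv_nsec = 1;
	return 0;
}

static int new_get(int c, struct ts64 *tp)
{
	(void)c;
	tp->tv_sec = 5000000000LL;	/* beyond the 32-bit range */
	tp->tv_nsec = 2;
	return 0;
}
```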

Signed-off-by: Baolin Wang baolin.w...@linaro.org
---
 include/linux/posix-timers.h |9 ++
 kernel/time/posix-timers.c   |   65 --
 2 files changed, 59 insertions(+), 15 deletions(-)

diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index 907f3fd..35786c5 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -98,9 +98,13 @@ struct k_itimer {
 
 struct k_clock {
int (*clock_getres) (const clockid_t which_clock, struct timespec *tp);
+   int (*clock_getres64) (const clockid_t which_clock, struct timespec64 *tp);
int (*clock_set) (const clockid_t which_clock,
  const struct timespec *tp);
+   int (*clock_set64) (const clockid_t which_clock,
+   const struct timespec64 *tp);
int (*clock_get) (const clockid_t which_clock, struct timespec * tp);
+   int (*clock_get64) (const clockid_t which_clock, struct timespec64 *tp);
int (*clock_adj) (const clockid_t which_clock, struct timex *tx);
int (*timer_create) (struct k_itimer *timer);
int (*nsleep) (const clockid_t which_clock, int flags,
@@ -109,10 +113,15 @@ struct k_clock {
int (*timer_set) (struct k_itimer * timr, int flags,
  struct itimerspec * new_setting,
  struct itimerspec * old_setting);
+   int (*timer_set64) (struct k_itimer *timr, int flags,
+   struct itimerspec64 *new_setting,
+   struct itimerspec64 *old_setting);
int (*timer_del) (struct k_itimer * timr);
 #define TIMER_RETRY 1
void (*timer_get) (struct k_itimer * timr,
   struct itimerspec * cur_setting);
+   void (*timer_get64) (struct k_itimer *timr,
+struct itimerspec64 *cur_setting);
 };
 
 extern struct k_clock clock_posix_cpu;
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 31ea01f..9070387 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -522,13 +522,13 @@ void posix_timers_register_clock(const clockid_t clock_id,
return;
}
 
-   if (!new_clock->clock_get) {
-   printk(KERN_WARNING "POSIX clock id %d lacks clock_get()\n",
+   if (!new_clock->clock_get && !new_clock->clock_get64) {
+   printk(KERN_WARNING "POSIX clock id %d lacks clock_get() and clock_get64()\n",
   clock_id);
return;
}
-   if (!new_clock->clock_getres) {
-   printk(KERN_WARNING "POSIX clock id %d lacks clock_getres()\n",
+   if (!new_clock->clock_getres && !new_clock->clock_getres64) {
+   printk(KERN_WARNING "POSIX clock id %d lacks clock_getres() and clock_getres64()\n",
   clock_id);
return;
}
@@ -579,7 +579,7 @@ static struct k_clock *clockid_to_kclock(const clockid_t id)
 return (id & CLOCKFD_MASK) == CLOCKFD ?
 &clock_posix_dynamic : &clock_posix_cpu;
 
-   if (id >= MAX_CLOCKS || !posix_clocks[id].clock_getres)
+   if (id >= MAX_CLOCKS || (!posix_clocks[id].clock_getres && !posix_clocks[id].clock_getres64))
 return NULL;
 return &posix_clocks[id];
 }
@@ -771,6 +771,7 @@ SYSCALL_DEFINE2(timer_gettime, timer_t, timer_id,
struct itimerspec __user *, setting)
 {
struct itimerspec cur_setting;
+   struct itimerspec64 cur_setting64;
struct k_itimer *timr;
struct k_clock *kc;
unsigned long flags;
@@ -781,10 +782,16 @@ SYSCALL_DEFINE2(timer_gettime, timer_t, timer_id,
return -EINVAL;
 
 kc = clockid_to_kclock(timr->it_clock);
-   if (WARN_ON_ONCE(!kc || !kc->timer_get))
+   if (WARN_ON_ONCE(!kc || (!kc->timer_get && !kc->timer_get64))) {
 ret = -EINVAL;
-   else
-   kc->timer_get(timr, &cur_setting);
+   } else {
+   if (kc->timer_get64) {
+   kc->timer_get64(timr, &cur_setting64);
+   cur_setting = itimerspec64_to_itimerspec(cur_setting64);
+   } else {
+   kc->timer_get(timr, &cur_setting);
+   }
+   }
 
unlock_timer(timr, flags);
 
@@ -877,6 +884,7 @@ SYSCALL_DEFINE4(timer_settime, timer_t, timer_id, int, flags,
 {
struct

[PATCH 03/11] time/hrtimer:Introduce hrtimer_get_res64() with timespec64 type for getting the timer resolution

2015-04-20 Thread Baolin Wang
This patch introduces the hrtimer_get_res64() function, which returns the timer
resolution as a timespec64, and moves hrtimer_get_res() into
include/linux/hrtimer.h as a 'static inline' helper that just calls
hrtimer_get_res64().

Getting the timer resolution through hrtimer_get_res64() is safe beyond 2038,
and keeping the old hrtimer_get_res() as an inline wrapper in hrtimer.h makes it
easy to delete later.

Signed-off-by: Baolin Wang baolin.w...@linaro.org
---
 include/linux/hrtimer.h |   12 +++-
 kernel/time/hrtimer.c   |   10 +-
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 05f6df1..ee8ed44 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -383,7 +383,17 @@ static inline int hrtimer_restart(struct hrtimer *timer)
 
 /* Query timers: */
 extern ktime_t hrtimer_get_remaining(const struct hrtimer *timer);
-extern int hrtimer_get_res(const clockid_t which_clock, struct timespec *tp);
+extern int hrtimer_get_res64(const clockid_t which_clock,
+struct timespec64 *tp);
+
+static inline int hrtimer_get_res(const clockid_t which_clock,
+ struct timespec *tp)
+{
+   struct timespec64 ts64;
+   int ret = hrtimer_get_res64(which_clock, &ts64);
+   *tp = timespec64_to_timespec(ts64);
+   return ret;
+}
 
 extern ktime_t hrtimer_get_next_event(void);
 
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index bee0c1f..508d936 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1175,24 +1175,24 @@ void hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
 EXPORT_SYMBOL_GPL(hrtimer_init);
 
 /**
- * hrtimer_get_res - get the timer resolution for a clock
+ * hrtimer_get_res64 - get the timer resolution for a clock
  * @which_clock: which clock to query
- * @tp: pointer to timespec variable to store the resolution
+ * @tp: pointer to timespec64 variable to store the resolution
  *
  * Store the resolution of the clock selected by @which_clock in the
  * variable pointed to by @tp.
  */
-int hrtimer_get_res(const clockid_t which_clock, struct timespec *tp)
+int hrtimer_get_res64(const clockid_t which_clock, struct timespec64 *tp)
 {
struct hrtimer_cpu_base *cpu_base;
int base = hrtimer_clockid_to_base(which_clock);
 
cpu_base = raw_cpu_ptr(hrtimer_bases);
-   *tp = ktime_to_timespec(cpu_base->clock_base[base].resolution);
+   *tp = ktime_to_timespec64(cpu_base->clock_base[base].resolution);
 
return 0;
 }
-EXPORT_SYMBOL_GPL(hrtimer_get_res);
+EXPORT_SYMBOL_GPL(hrtimer_get_res64);
 
 static void __run_hrtimer(struct hrtimer *timer, ktime_t *now)
 {
-- 
1.7.9.5


[PATCH 05/11] time/posix-timers:Convert to the 64bit methods for k_clock callback functions

2015-04-20 Thread Baolin Wang
This patch converts timespec to timespec64 and itimerspec to itimerspec64 in the
k_clock callback functions.

It also converts timekeeping_clocktai(), which is used only in posix-timers.c,
from timespec to timespec64.

Signed-off-by: Baolin Wang baolin.w...@linaro.org
---
 include/linux/timekeeping.h |4 +-
 kernel/time/posix-timers.c  |  102 +++
 kernel/time/timekeeping.h   |2 +-
 3 files changed, 57 insertions(+), 51 deletions(-)

diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index c6d5ae9..bd3df93 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -242,9 +242,9 @@ static inline void get_monotonic_boottime64(struct timespec64 *ts)
*ts = ktime_to_timespec64(ktime_get_boottime());
 }
 
-static inline void timekeeping_clocktai(struct timespec *ts)
+static inline void timekeeping_clocktai(struct timespec64 *ts)
 {
-   *ts = ktime_to_timespec(ktime_get_clocktai());
+   *ts = ktime_to_timespec64(ktime_get_clocktai());
 }
 
 /*
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 9070387..47d1abf 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -132,9 +132,9 @@ static struct k_clock posix_clocks[MAX_CLOCKS];
 static int common_nsleep(const clockid_t, int flags, struct timespec *t,
 struct timespec __user *rmtp);
 static int common_timer_create(struct k_itimer *new_timer);
-static void common_timer_get(struct k_itimer *, struct itimerspec *);
+static void common_timer_get(struct k_itimer *, struct itimerspec64 *);
 static int common_timer_set(struct k_itimer *, int,
-   struct itimerspec *, struct itimerspec *);
+   struct itimerspec64 *, struct itimerspec64 *);
 static int common_timer_del(struct k_itimer *timer);
 
 static enum hrtimer_restart posix_timer_fn(struct hrtimer *data);
@@ -203,17 +203,20 @@ static inline void unlock_timer(struct k_itimer *timr, unsigned long flags)
 }
 
 /* Get clock_realtime */
-static int posix_clock_realtime_get(clockid_t which_clock, struct timespec *tp)
+static int posix_clock_realtime_get(clockid_t which_clock,
+   struct timespec64 *tp)
 {
-   ktime_get_real_ts(tp);
+   ktime_get_real_ts64(tp);
return 0;
 }
 
 /* Set clock_realtime */
 static int posix_clock_realtime_set(const clockid_t which_clock,
-   const struct timespec *tp)
+   const struct timespec64 *tp)
 {
-   return do_sys_settimeofday(tp, NULL);
+   struct timespec ts = timespec64_to_timespec(*tp);
+
+   return do_sys_settimeofday(&ts, NULL);
 }
 
 static int posix_clock_realtime_adj(const clockid_t which_clock,
@@ -225,48 +228,51 @@ static int posix_clock_realtime_adj(const clockid_t which_clock,
 /*
  * Get monotonic time for posix timers
  */
-static int posix_ktime_get_ts(clockid_t which_clock, struct timespec *tp)
+static int posix_ktime_get_ts(clockid_t which_clock, struct timespec64 *tp)
 {
-   ktime_get_ts(tp);
+   ktime_get_ts64(tp);
return 0;
 }
 
 /*
  * Get monotonic-raw time for posix timers
  */
-static int posix_get_monotonic_raw(clockid_t which_clock, struct timespec *tp)
+static int posix_get_monotonic_raw(clockid_t which_clock, struct timespec64 *tp)
 {
-   getrawmonotonic(tp);
+   getrawmonotonic64(tp);
return 0;
 }
 
 
-static int posix_get_realtime_coarse(clockid_t which_clock, struct timespec *tp)
+static int posix_get_realtime_coarse(clockid_t which_clock,
+struct timespec64 *tp)
 {
-   *tp = current_kernel_time();
+   *tp = current_kernel_time64();
return 0;
 }
 
 static int posix_get_monotonic_coarse(clockid_t which_clock,
-   struct timespec *tp)
+   struct timespec64 *tp)
 {
-   *tp = get_monotonic_coarse();
+   *tp = get_monotonic_coarse64();
return 0;
 }
 
-static int posix_get_coarse_res(const clockid_t which_clock, struct timespec *tp)
+static int posix_get_coarse_res(const clockid_t which_clock,
+   struct timespec64 *tp)
 {
-   *tp = ktime_to_timespec(KTIME_LOW_RES);
+   *tp = ktime_to_timespec64(KTIME_LOW_RES);
return 0;
 }
 
-static int posix_get_boottime(const clockid_t which_clock, struct timespec *tp)
+static int posix_get_boottime(const clockid_t which_clock,
+ struct timespec64 *tp)
 {
-   get_monotonic_boottime(tp);
+   get_monotonic_boottime64(tp);
return 0;
 }
 
-static int posix_get_tai(clockid_t which_clock, struct timespec *tp)
+static int posix_get_tai(clockid_t which_clock, struct timespec64 *tp)
 {
timekeeping_clocktai(tp);
return 0;
@@ -278,57

[PATCH 06/11] char/mmtimer:Convert to the 64bit methods for k_clock callback function

2015-04-20 Thread Baolin Wang
This patch converts the k_clock callback functions in mmtimer.c to the 64-bit
methods, replacing timespec with timespec64 and itimerspec with itimerspec64.

Signed-off-by: Baolin Wang baolin.w...@linaro.org
---
 drivers/char/mmtimer.c |   36 +---
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/drivers/char/mmtimer.c b/drivers/char/mmtimer.c
index 3d6c067..213d0bb 100644
--- a/drivers/char/mmtimer.c
+++ b/drivers/char/mmtimer.c
@@ -478,18 +478,18 @@ static int sgi_clock_period;
 static struct timespec sgi_clock_offset;
 static int sgi_clock_period;
 
-static int sgi_clock_get(clockid_t clockid, struct timespec *tp)
+static int sgi_clock_get(clockid_t clockid, struct timespec64 *tp)
 {
u64 nsec;
 
nsec = rtc_time() * sgi_clock_period
+ sgi_clock_offset.tv_nsec;
-   *tp = ns_to_timespec(nsec);
+   *tp = ns_to_timespec64(nsec);
 tp->tv_sec += sgi_clock_offset.tv_sec;
return 0;
 };
 
-static int sgi_clock_set(const clockid_t clockid, const struct timespec *tp)
+static int sgi_clock_set(const clockid_t clockid, const struct timespec64 *tp)
 {
 
u64 nsec;
@@ -657,7 +657,7 @@ static int sgi_timer_del(struct k_itimer *timr)
 }
 
 /* Assumption: it_lock is already held with irq's disabled */
-static void sgi_timer_get(struct k_itimer *timr, struct itimerspec *cur_setting)
+static void sgi_timer_get(struct k_itimer *timr, struct itimerspec64 *cur_setting)
 {
 
 if (timr->it.mmtimer.clock == TIMER_OFF) {
@@ -668,14 +668,14 @@ static void sgi_timer_get(struct k_itimer *timr, struct itimerspec *cur_setting)
 return;
 }
 
-   cur_setting->it_interval = ns_to_timespec(timr->it.mmtimer.incr * sgi_clock_period);
-   cur_setting->it_value = ns_to_timespec((timr->it.mmtimer.expires - rtc_time()) * sgi_clock_period);
+   cur_setting->it_interval = ns_to_timespec64(timr->it.mmtimer.incr * sgi_clock_period);
+   cur_setting->it_value = ns_to_timespec64((timr->it.mmtimer.expires - rtc_time()) * sgi_clock_period);
 }
 
 
 static int sgi_timer_set(struct k_itimer *timr, int flags,
-   struct itimerspec * new_setting,
-   struct itimerspec * old_setting)
+   struct itimerspec64 *new_setting,
+   struct itimerspec64 *old_setting)
 {
unsigned long when, period, irqflags;
int err = 0;
@@ -687,8 +687,8 @@ static int sgi_timer_set(struct k_itimer *timr, int flags,
sgi_timer_get(timr, old_setting);
 
sgi_timer_del(timr);
-   when = timespec_to_ns(&new_setting->it_value);
-   period = timespec_to_ns(&new_setting->it_interval);
+   when = timespec64_to_ns(&new_setting->it_value);
+   period = timespec64_to_ns(&new_setting->it_interval);
 
if (when == 0)
/* Clear timer */
@@ -699,11 +699,9 @@ static int sgi_timer_set(struct k_itimer *timr, int flags,
return -ENOMEM;
 
 if (flags & TIMER_ABSTIME) {
-   struct timespec n;
 unsigned long now;
 
-   getnstimeofday(&n);
-   now = timespec_to_ns(&n);
+   now = ktime_get_real_ns();
 if (when > now)
when -= now;
else
@@ -765,7 +763,7 @@ static int sgi_timer_set(struct k_itimer *timr, int flags,
return err;
 }
 
-static int sgi_clock_getres(const clockid_t which_clock, struct timespec *tp)
+static int sgi_clock_getres(const clockid_t which_clock, struct timespec64 *tp)
 {
 tp->tv_sec = 0;
 tp->tv_nsec = sgi_clock_period;
@@ -773,13 +771,13 @@ static int sgi_clock_getres(const clockid_t which_clock, struct timespec *tp)
 }
 
 static struct k_clock sgi_clock = {
-   .clock_set  = sgi_clock_set,
-   .clock_get  = sgi_clock_get,
-   .clock_getres   = sgi_clock_getres,
+   .clock_set64= sgi_clock_set,
+   .clock_get64= sgi_clock_get,
+   .clock_getres64 = sgi_clock_getres,
.timer_create   = sgi_timer_create,
-   .timer_set  = sgi_timer_set,
+   .timer_set64= sgi_timer_set,
.timer_del  = sgi_timer_del,
-   .timer_get  = sgi_timer_get
+   .timer_get64= sgi_timer_get
 };
 
 /**
-- 
1.7.9.5
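sgi_clock_get() above computes ticks * period plus the offset's nanosecond part, splits that into seconds and nanoseconds, and then adds the offset's seconds. A user-space sketch of that arithmetic follows; ns_to_ts64_sketch(), clock_read_sketch() and their parameters are hypothetical, standing in for ns_to_timespec64(), rtc_time(), sgi_clock_period and sgi_clock_offset.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

struct ts64 { int64_t tv_sec; long tv_nsec; };

/* Split a non-negative nanosecond count into seconds + nanoseconds. */
static struct ts64 ns_to_ts64_sketch(uint64_t nsec)
{
	struct ts64 ts = { (int64_t)(nsec / NSEC_PER_SEC), (long)(nsec % NSEC_PER_SEC) };
	return ts;
}

/*
 * Same steps as sgi_clock_get(): ticks * period plus the offset's
 * nanoseconds, normalised, then the offset's whole seconds added.
 */
static struct ts64 clock_read_sketch(uint64_t ticks, uint64_t period_ns,
				     struct ts64 offset)
{
	struct ts64 tp = ns_to_ts64_sketch(ticks * period_ns + (uint64_t)offset.tv_nsec);

	tp.tv_sec += offset.tv_sec;	/* tv_nsec is already in [0, 1e9) */
	return tp;
}
```

Adding the nanosecond part of the offset before splitting lets the split carry any overflow into seconds, so no separate normalisation step is needed afterwards.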


[PATCH 07/11] time/alarmtimer:Convert to the new methods for k_clock structure

2015-04-20 Thread Baolin Wang
This patch switches alarmtimer.c to the new timespec64/itimerspec64 methods of
the k_clock structure, converting the timespec/itimerspec types to
timespec64/itimerspec64.
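alarm_timer_get() reports the remaining time only when it is positive and zero otherwise, then converts the interval. A user-space sketch of that logic follows; the nanosecond arguments stand in for the ktime_t values the patch reads, and ns_to_ts64_sketch() and alarm_timer_get_sketch() are hypothetical names.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

struct ts64 { int64_t tv_sec; long tv_nsec; };
struct its64 { struct ts64 it_interval; struct ts64 it_value; };

/* Split a non-negative nanosecond count (stand-in for ktime_to_timespec64). */
static struct ts64 ns_to_ts64_sketch(int64_t ns)
{
	struct ts64 ts = { ns / NSEC_PER_SEC, (long)(ns % NSEC_PER_SEC) };
	return ts;
}

/* remaining_ns / interval_ns stand in for the alarm's ktime_t values. */
static void alarm_timer_get_sketch(int64_t remaining_ns, int64_t interval_ns,
				   struct its64 *cur)
{
	if (remaining_ns > 0) {
		cur->it_value = ns_to_ts64_sketch(remaining_ns);
	} else {
		/* Already expired: report zero rather than a negative time. */
		cur->it_value.tv_sec = 0;
		cur->it_value.tv_nsec = 0;
	}
	cur->it_interval = ns_to_ts64_sketch(interval_ns);
}
```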

Signed-off-by: Baolin Wang baolin.w...@linaro.org
---
 kernel/time/alarmtimer.c |   43 ++-
 1 file changed, 22 insertions(+), 21 deletions(-)

diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 1b001ed..68186e1 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -489,35 +489,36 @@ static enum alarmtimer_restart alarm_handle_timer(struct alarm *alarm,
 /**
  * alarm_clock_getres - posix getres interface
  * @which_clock: clockid
- * @tp: timespec to fill
+ * @tp: timespec64 to fill
  *
  * Returns the granularity of underlying alarm base clock
  */
-static int alarm_clock_getres(const clockid_t which_clock, struct timespec *tp)
+static int alarm_clock_getres(const clockid_t which_clock,
+   struct timespec64 *tp)
 {
clockid_t baseid = alarm_bases[clock2alarm(which_clock)].base_clockid;
 
if (!alarmtimer_get_rtcdev())
return -EINVAL;
 
-   return hrtimer_get_res(baseid, tp);
+   return hrtimer_get_res64(baseid, tp);
 }
 
 /**
  * alarm_clock_get - posix clock_get interface
  * @which_clock: clockid
- * @tp: timespec to fill.
+ * @tp: timespec64 to fill.
  *
  * Provides the underlying alarm base time.
  */
-static int alarm_clock_get(clockid_t which_clock, struct timespec *tp)
+static int alarm_clock_get(clockid_t which_clock, struct timespec64 *tp)
 {
 struct alarm_base *base = &alarm_bases[clock2alarm(which_clock)];
 
if (!alarmtimer_get_rtcdev())
return -EINVAL;
 
-   *tp = ktime_to_timespec(base->gettime());
+   *tp = ktime_to_timespec64(base->gettime());
return 0;
 }
 
@@ -547,24 +548,24 @@ static int alarm_timer_create(struct k_itimer *new_timer)
 /**
  * alarm_timer_get - posix timer_get interface
  * @new_timer: k_itimer pointer
- * @cur_setting: itimerspec data to fill
+ * @cur_setting: itimerspec64 data to fill
  *
  * Copies out the current itimerspec data
  */
 static void alarm_timer_get(struct k_itimer *timr,
-   struct itimerspec *cur_setting)
+   struct itimerspec64 *cur_setting)
 {
ktime_t relative_expiry_time =
 alarm_expires_remaining(&(timr->it.alarm.alarmtimer));
 
 if (ktime_to_ns(relative_expiry_time) > 0) {
-   cur_setting->it_value = ktime_to_timespec(relative_expiry_time);
+   cur_setting->it_value = ktime_to_timespec64(relative_expiry_time);
 } else {
 cur_setting->it_value.tv_sec = 0;
 cur_setting->it_value.tv_nsec = 0;
 }
 
-   cur_setting->it_interval = ktime_to_timespec(timr->it.alarm.interval);
+   cur_setting->it_interval = ktime_to_timespec64(timr->it.alarm.interval);
 }
 
 /**
@@ -588,14 +589,14 @@ static int alarm_timer_del(struct k_itimer *timr)
  * alarm_timer_set - posix timer_set interface
  * @timr: k_itimer pointer to be deleted
  * @flags: timer flags
- * @new_setting: itimerspec to be used
- * @old_setting: itimerspec being replaced
+ * @new_setting: itimerspec64 to be used
+ * @old_setting: itimerspec64 being replaced
  *
  * Sets the timer to new_setting, and starts the timer.
  */
 static int alarm_timer_set(struct k_itimer *timr, int flags,
-   struct itimerspec *new_setting,
-   struct itimerspec *old_setting)
+   struct itimerspec64 *new_setting,
+   struct itimerspec64 *old_setting)
 {
ktime_t exp;
 
@@ -613,8 +614,8 @@ static int alarm_timer_set(struct k_itimer *timr, int flags,
return TIMER_RETRY;
 
/* start the timer */
-   timr->it.alarm.interval = timespec_to_ktime(new_setting->it_interval);
-   exp = timespec_to_ktime(new_setting->it_value);
+   timr->it.alarm.interval = timespec64_to_ktime(new_setting->it_interval);
+   exp = timespec64_to_ktime(new_setting->it_value);
/* Convert (if necessary) to absolute time */
if (flags != TIMER_ABSTIME) {
ktime_t now;
@@ -670,7 +671,7 @@ static int alarmtimer_do_nsleep(struct alarm *alarm, ktime_t absexp)
 
 
 /**
- * update_rmtp - Update remaining timespec value
+ * update_rmtp - Update remaining timespec64 value
  * @exp: expiration time
  * @type: timer type
  * @rmtp: user pointer to remaining timepsec value
@@ -824,12 +825,12 @@ static int __init alarmtimer_init(void)
int error = 0;
int i;
struct k_clock alarm_clock = {
-   .clock_getres   = alarm_clock_getres,
-   .clock_get  = alarm_clock_get,
+   .clock_getres64 = alarm_clock_getres,
+   .clock_get64= alarm_clock_get,
.timer_create

[PATCH 09/11] cputime:Introduce the cputime_to_timespec64/timespec64_to_cputime function

2015-04-20 Thread Baolin Wang
This patch introduces functions for converting cputime to timespec64 and back,
replacing the timespec type with timespec64, including the arch/s390 and
arch/powerpc implementations.

These new functions replace the old cputime_to_timespec()/timespec_to_cputime()
functions in preparation for 2038. The cputime_to_timespec()/timespec_to_cputime()
functions are moved to include/linux/cputime.h so they can be removed easily
later.

Signed-off-by: Baolin Wang baolin.w...@linaro.org
---
 arch/powerpc/include/asm/cputime.h|6 +++---
 arch/s390/include/asm/cputime.h   |8 
 include/asm-generic/cputime_jiffies.h |   10 +-
 include/linux/cputime.h   |   15 +++
 include/linux/jiffies.h   |3 +++
 kernel/time/time.c|   21 +
 6 files changed, 51 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h
index e245255..5dda5c0 100644
--- a/arch/powerpc/include/asm/cputime.h
+++ b/arch/powerpc/include/asm/cputime.h
@@ -154,9 +154,9 @@ static inline cputime_t secs_to_cputime(const unsigned long sec)
 }
 
 /*
- * Convert cputime <-> timespec
+ * Convert cputime <-> timespec64
  */
-static inline void cputime_to_timespec(const cputime_t ct, struct timespec *p)
+static inline void cputime_to_timespec64(const cputime_t ct, struct timespec64 *p)
 {
u64 x = (__force u64) ct;
unsigned int frac;
@@ -168,7 +168,7 @@ static inline void cputime_to_timespec(const cputime_t ct, struct timespec *p)
 p->tv_nsec = x;
 }
 
-static inline cputime_t timespec_to_cputime(const struct timespec *p)
+static inline cputime_t timespec64_to_cputime(const struct timespec64 *p)
 {
u64 ct;
 
diff --git a/arch/s390/include/asm/cputime.h b/arch/s390/include/asm/cputime.h
index b91e960..1266697 100644
--- a/arch/s390/include/asm/cputime.h
+++ b/arch/s390/include/asm/cputime.h
@@ -89,16 +89,16 @@ static inline cputime_t secs_to_cputime(const unsigned int s)
 }
 
 /*
- * Convert cputime to timespec and back.
+ * Convert cputime to timespec64 and back.
  */
-static inline cputime_t timespec_to_cputime(const struct timespec *value)
+static inline cputime_t timespec64_to_cputime(const struct timespec64 *value)
 {
 unsigned long long ret = value->tv_sec * CPUTIME_PER_SEC;
 return (__force cputime_t)(ret + __div(value->tv_nsec * CPUTIME_PER_USEC, NSEC_PER_USEC));
 }
 
-static inline void cputime_to_timespec(const cputime_t cputime,
-  struct timespec *value)
+static inline void cputime_to_timespec64(const cputime_t cputime,
+  struct timespec64 *value)
 {
unsigned long long __cputime = (__force unsigned long long) cputime;
 #ifndef CONFIG_64BIT
diff --git a/include/asm-generic/cputime_jiffies.h b/include/asm-generic/cputime_jiffies.h
index fe386fc..ec77c0b 100644
--- a/include/asm-generic/cputime_jiffies.h
+++ b/include/asm-generic/cputime_jiffies.h
@@ -44,12 +44,12 @@ typedef u64 __nocast cputime64_t;
 #define secs_to_cputime(sec)   jiffies_to_cputime((sec) * HZ)
 
 /*
- * Convert cputime to timespec and back.
+ * Convert cputime to timespec64 and back.
  */
-#define timespec_to_cputime(__val) \
-   jiffies_to_cputime(timespec_to_jiffies(__val))
-#define cputime_to_timespec(__ct,__val)\
-   jiffies_to_timespec(cputime_to_jiffies(__ct),__val)
+#define timespec64_to_cputime(__val)  \
+   jiffies_to_cputime(timespec64_to_jiffies(__val))
+#define cputime_to_timespec64(__ct,__val)  \
+   jiffies_to_timespec64(cputime_to_jiffies(__ct),__val)
 
 /*
  * Convert cputime to timeval and back.
diff --git a/include/linux/cputime.h b/include/linux/cputime.h
index f2eb2ee..f01896f 100644
--- a/include/linux/cputime.h
+++ b/include/linux/cputime.h
@@ -13,4 +13,19 @@
usecs_to_cputime((__nsecs) / NSEC_PER_USEC)
 #endif
 
+static inline cputime_t timespec_to_cputime(const struct timespec *ts)
+{
+   struct timespec64 ts64 = timespec_to_timespec64(*ts);
+   return timespec64_to_cputime(&ts64);
+}
+
+static inline void cputime_to_timespec(const cputime_t cputime,
+   struct timespec *value)
+{
+   struct timespec64 ts64;
+
+   cputime_to_timespec64(cputime, &ts64);
+   *value = timespec64_to_timespec(ts64);
+}
+
 #endif /* __LINUX_CPUTIME_H */
diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h
index c367cbd..dbaa4ee 100644
--- a/include/linux/jiffies.h
+++ b/include/linux/jiffies.h
@@ -293,6 +293,9 @@ extern unsigned long usecs_to_jiffies(const unsigned int u);
 extern unsigned long timespec_to_jiffies(const struct timespec *value);
 extern void jiffies_to_timespec(const unsigned long jiffies,
struct timespec *value);
+extern unsigned long timespec64_to_jiffies(const struct timespec64 *value);
+extern void jiffies_to_timespec64(const

[PATCH 08/11] time/posix-clock:Convert to the 64bit methods for k_clock and posix_clock_operations structure

2015-04-20 Thread Baolin Wang
This patch converts the posix clock operations to the new timespec64/itimerspec64
methods to make them ready for 2038; it is based on the PTP patch series.

It also switches to the 64-bit methods of the k_clock structure, converting the
timespec/itimerspec types to timespec64/itimerspec64.
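ptp_clock_adjtime() below rejects an out-of-range tv_nsec with a single unsigned comparison and then turns the offset into a nanosecond delta. A user-space sketch of that check follows, assuming the offset arrives in normalized form (tv_nsec in [0, 1e9), negative offsets carried in tv_sec); offset_to_delta_sketch() is a hypothetical name.

```c
#include <errno.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000L

struct ts64 { int64_t tv_sec; long tv_nsec; };

/*
 * Casting tv_nsec to unsigned long makes a negative value wrap to a huge
 * number, so one comparison rejects both tv_nsec < 0 and tv_nsec >= 1e9.
 */
static int offset_to_delta_sketch(const struct ts64 *ts, int64_t *delta)
{
	if ((unsigned long)ts->tv_nsec >= NSEC_PER_SEC)
		return -EINVAL;
	*delta = ts->tv_sec * (int64_t)NSEC_PER_SEC + ts->tv_nsec;
	return 0;
}
```

For example, an offset of minus half a second is {-1, 500000000}, which yields a delta of -500000000 ns.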

Signed-off-by: Baolin Wang baolin.w...@linaro.org
---
 drivers/ptp/ptp_clock.c |   26 --
 include/linux/posix-clock.h |   10 +-
 kernel/time/posix-clock.c   |   20 ++--
 3 files changed, 23 insertions(+), 33 deletions(-)

diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index bee8270..8c086e7 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -97,32 +97,24 @@ static s32 scaled_ppm_to_ppb(long ppm)
 
 /* posix clock implementation */
 
-static int ptp_clock_getres(struct posix_clock *pc, struct timespec *tp)
+static int ptp_clock_getres(struct posix_clock *pc, struct timespec64 *tp)
 {
 tp->tv_sec = 0;
 tp->tv_nsec = 1;
return 0;
 }
 
-static int ptp_clock_settime(struct posix_clock *pc, const struct timespec *tp)
+static int ptp_clock_settime(struct posix_clock *pc,
+   const struct timespec64 *tp)
 {
struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
-   struct timespec64 ts = timespec_to_timespec64(*tp);
-
-   return ptp->info->settime64(ptp->info, &ts);
+   return ptp->info->settime64(ptp->info, tp);
 }
 
-static int ptp_clock_gettime(struct posix_clock *pc, struct timespec *tp)
+static int ptp_clock_gettime(struct posix_clock *pc, struct timespec64 *tp)
 {
struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
-   struct timespec64 ts;
-   int err;
-
-   err = ptp->info->gettime64(ptp->info, &ts);
-   if (!err)
-   *tp = timespec64_to_timespec(ts);
-
-   return err;
+   return ptp->info->gettime64(ptp->info, tp);
 }
 
 static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx)
@@ -134,8 +126,7 @@ static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx)
ops = ptp->info;
 
if (tx->modes & ADJ_SETOFFSET) {
-   struct timespec ts;
-   ktime_t kt;
+   struct timespec64 ts;
s64 delta;
 
ts.tv_sec  = tx->time.tv_sec;
@@ -147,8 +138,7 @@ static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx)
if ((unsigned long) ts.tv_nsec >= NSEC_PER_SEC)
return -EINVAL;
 
-   kt = timespec_to_ktime(ts);
-   delta = ktime_to_ns(kt);
+   delta = timespec64_to_ns(&ts);
err = ops->adjtime(ops, delta);
} else if (tx->modes & ADJ_FREQUENCY) {
s32 ppb = scaled_ppm_to_ppb(tx->freq);
diff --git a/include/linux/posix-clock.h b/include/linux/posix-clock.h
index 34c4498..fd7e22c 100644
--- a/include/linux/posix-clock.h
+++ b/include/linux/posix-clock.h
@@ -59,23 +59,23 @@ struct posix_clock_operations {
 
int  (*clock_adjtime)(struct posix_clock *pc, struct timex *tx);
 
-   int  (*clock_gettime)(struct posix_clock *pc, struct timespec *ts);
+   int  (*clock_gettime)(struct posix_clock *pc, struct timespec64 *ts);
 
-   int  (*clock_getres) (struct posix_clock *pc, struct timespec *ts);
+   int  (*clock_getres)(struct posix_clock *pc, struct timespec64 *ts);
 
int  (*clock_settime)(struct posix_clock *pc,
- const struct timespec *ts);
+ const struct timespec64 *ts);
 
int  (*timer_create) (struct posix_clock *pc, struct k_itimer *kit);
 
int  (*timer_delete) (struct posix_clock *pc, struct k_itimer *kit);
 
void (*timer_gettime)(struct posix_clock *pc,
- struct k_itimer *kit, struct itimerspec *tsp);
+ struct k_itimer *kit, struct itimerspec64 *tsp);
 
int  (*timer_settime)(struct posix_clock *pc,
  struct k_itimer *kit, int flags,
- struct itimerspec *tsp, struct itimerspec *old);
+ struct itimerspec64 *tsp, struct itimerspec64 *old);
/*
 * Optional character device methods:
 */
diff --git a/kernel/time/posix-clock.c b/kernel/time/posix-clock.c
index ce033c7..e21e4c1 100644
--- a/kernel/time/posix-clock.c
+++ b/kernel/time/posix-clock.c
@@ -297,7 +297,7 @@ out:
return err;
 }
 
-static int pc_clock_gettime(clockid_t id, struct timespec *ts)
+static int pc_clock_gettime(clockid_t id, struct timespec64 *ts)
 {
struct posix_clock_desc cd;
int err;
@@ -316,7 +316,7 @@ static int pc_clock_gettime(clockid_t id, struct timespec *ts)
return err;
 }
 
-static int pc_clock_getres(clockid_t id, struct timespec *ts)
+static int pc_clock_getres(clockid_t id, struct timespec64 *ts

[PATCH 11/11] k_clock:Remove the 32bit methods with timespec type

2015-04-20 Thread Baolin Wang
All of the k_clock users have been converted to the new methods. This patch
removes the older methods with timespec/itimerspec type. As a result, the
k_clock structure is ready for the year 2038.
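With the 32-bit callbacks gone, registration and lookup only need to test the 64-bit pointers. Below is a hedged user-space sketch of that pattern. demo_k_clock, demo_register_clock() and demo_clockid_to_kclock() are hypothetical names loosely modelled on posix_timers_register_clock() and clockid_to_kclock(). Unlike the kernel function, which only prints a warning, the sketch returns an error code so that the result can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef int demo_clockid_t;

struct demo_timespec64 {
	int64_t tv_sec;
	long tv_nsec;
};

/* Hypothetical cut-down k_clock: only the 64-bit methods remain. */
struct demo_k_clock {
	int (*clock_getres64)(demo_clockid_t which, struct demo_timespec64 *tp);
	int (*clock_get64)(demo_clockid_t which, struct demo_timespec64 *tp);
};

#define DEMO_MAX_CLOCKS 16
static struct demo_k_clock demo_clocks[DEMO_MAX_CLOCKS];

/* Returns 0 on success, -1 if the id is out of range or a mandatory
 * 64-bit callback is missing. */
static int demo_register_clock(demo_clockid_t id, const struct demo_k_clock *kc)
{
	assert(kc != NULL);
	if (id < 0 || id >= DEMO_MAX_CLOCKS)
		return -1;
	if (!kc->clock_get64) {
		fprintf(stderr, "clock id %d lacks clock_get64()\n", id);
		return -1;
	}
	if (!kc->clock_getres64) {
		fprintf(stderr, "clock id %d lacks clock_getres64()\n", id);
		return -1;
	}
	demo_clocks[id] = *kc;
	return 0;
}

/* A slot counts as registered only if its 64-bit getres callback is set. */
static struct demo_k_clock *demo_clockid_to_kclock(demo_clockid_t id)
{
	if (id < 0 || id >= DEMO_MAX_CLOCKS || !demo_clocks[id].clock_getres64)
		return NULL;
	return &demo_clocks[id];
}

/* Example callback reporting 1 ns resolution, like ptp_clock_getres(). */
static int demo_getres(demo_clockid_t which, struct demo_timespec64 *tp)
{
	(void)which;
	tp->tv_sec = 0;
	tp->tv_nsec = 1;
	return 0;
}
```

A clock that supplies only clock_getres64 is rejected. Once both 64-bit callbacks are present, lookup succeeds without any fallback branch.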

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 include/linux/posix-timers.h |9 --
 kernel/time/posix-timers.c   |   72 +-
 2 files changed, 29 insertions(+), 52 deletions(-)

diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index 35786c5..7c3dae2 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -97,29 +97,20 @@ struct k_itimer {
 };
 
 struct k_clock {
-   int (*clock_getres) (const clockid_t which_clock, struct timespec *tp);
int (*clock_getres64) (const clockid_t which_clock, struct timespec64 *tp);
-   int (*clock_set) (const clockid_t which_clock,
- const struct timespec *tp);
int (*clock_set64) (const clockid_t which_clock,
const struct timespec64 *tp);
-   int (*clock_get) (const clockid_t which_clock, struct timespec * tp);
int (*clock_get64) (const clockid_t which_clock, struct timespec64 *tp);
int (*clock_adj) (const clockid_t which_clock, struct timex *tx);
int (*timer_create) (struct k_itimer *timer);
int (*nsleep) (const clockid_t which_clock, int flags,
   struct timespec *, struct timespec __user *);
long (*nsleep_restart) (struct restart_block *restart_block);
-   int (*timer_set) (struct k_itimer * timr, int flags,
- struct itimerspec * new_setting,
- struct itimerspec * old_setting);
int (*timer_set64) (struct k_itimer *timr, int flags,
struct itimerspec64 *new_setting,
struct itimerspec64 *old_setting);
int (*timer_del) (struct k_itimer * timr);
 #define TIMER_RETRY 1
-   void (*timer_get) (struct k_itimer * timr,
-  struct itimerspec * cur_setting);
void (*timer_get64) (struct k_itimer *timr,
 struct itimerspec64 *cur_setting);
 };
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 47d1abf..3196ec0 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -528,13 +528,13 @@ void posix_timers_register_clock(const clockid_t clock_id,
return;
}
 
-   if (!new_clock->clock_get && !new_clock->clock_get64) {
-   printk(KERN_WARNING "POSIX clock id %d lacks clock_get() and clock_get64()\n",
+   if (!new_clock->clock_get64) {
+   printk(KERN_WARNING "POSIX clock id %d lacks clock_get64()\n",
   clock_id);
return;
}
-   if (!new_clock->clock_getres && !new_clock->clock_getres64) {
-   printk(KERN_WARNING "POSIX clock id %d lacks clock_getres() and clock_getres64()\n",
+   if (!new_clock->clock_getres64) {
+   printk(KERN_WARNING "POSIX clock id %d lacks clock_getres64()\n",
   clock_id);
return;
}
@@ -585,7 +585,7 @@ static struct k_clock *clockid_to_kclock(const clockid_t id)
return (id & CLOCKFD_MASK) == CLOCKFD ?
&clock_posix_dynamic : &clock_posix_cpu;
 
-   if (id >= MAX_CLOCKS || (!posix_clocks[id].clock_getres && !posix_clocks[id].clock_getres64))
+   if (id >= MAX_CLOCKS || !posix_clocks[id].clock_getres64)
return NULL;
return &posix_clocks[id];
 }
@@ -788,15 +788,11 @@ SYSCALL_DEFINE2(timer_gettime, timer_t, timer_id,
return -EINVAL;
 
kc = clockid_to_kclock(timr->it_clock);
-   if (WARN_ON_ONCE(!kc || (!kc->timer_get && !kc->timer_get64))) {
+   if (WARN_ON_ONCE(!kc || !kc->timer_get64)) {
ret = -EINVAL;
} else {
-   if (kc->timer_get64) {
-   kc->timer_get64(timr, &cur_setting64);
-   cur_setting = itimerspec64_to_itimerspec(&cur_setting64);
-   } else {
-   kc->timer_get(timr, &cur_setting);
-   }
+   kc->timer_get64(timr, &cur_setting64);
+   cur_setting = itimerspec64_to_itimerspec(&cur_setting64);
}
 
unlock_timer(timr, flags);
@@ -911,18 +907,14 @@ retry:
return -EINVAL;
 
kc = clockid_to_kclock(timr->it_clock);
-   if (WARN_ON_ONCE(!kc || (!kc->timer_set && !kc->timer_set64))) {
+   if (WARN_ON_ONCE(!kc || !kc->timer_set64)) {
error = -EINVAL;
} else {
-   if (kc->timer_set64) {
-   new_spec64 = itimerspec_to_itimerspec64(&new_spec);
-   error = kc->timer_set64(timr, flags, &new_spec64,
-   &old_spec64);
-   if (old_setting

[PATCH 10/11] time/posix-cpu-timers:Convert to the 64bit methods for k_clock structure

2015-04-20 Thread Baolin Wang
This patch switches the callback functions in posix-cpu-timers.c to the new
64bit methods of the k_clock structure, converting the timespec/itimerspec
types to timespec64/itimerspec64.
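For the CPUCLOCK_SCHED clock, the sample that timespec64_to_sample() produces is simply the timespec64 expressed in nanoseconds. Below is a minimal user-space sketch of that round trip. It assumes a stand-in struct (demo_timespec64) and uses hypothetical helper names. It covers only the nanosecond branch, not the cputime conversion that the other CPU clocks use.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000LL

/* Hypothetical stand-in for struct timespec64. */
struct demo_timespec64 {
	int64_t tv_sec;
	long tv_nsec;
};

/* Mirrors the CPUCLOCK_SCHED branch: seconds scaled to ns plus the ns part. */
static unsigned long long demo_timespec64_to_ns(const struct demo_timespec64 *tp)
{
	assert(tp->tv_nsec >= 0 && tp->tv_nsec < DEMO_NSEC_PER_SEC);
	return (unsigned long long)tp->tv_sec * DEMO_NSEC_PER_SEC + tp->tv_nsec;
}

/* Inverse for non-negative samples, in the spirit of ns_to_timespec64(). */
static struct demo_timespec64 demo_ns_to_timespec64(unsigned long long ns)
{
	struct demo_timespec64 ts;

	ts.tv_sec = (int64_t)(ns / DEMO_NSEC_PER_SEC);
	ts.tv_nsec = (long)(ns % DEMO_NSEC_PER_SEC);
	return ts;
}
```

Because tv_sec is 64-bit, a value past 2038 such as {2147483648, 5} converts to 2147483648000000005 ns and back without loss.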

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 kernel/time/posix-cpu-timers.c |   83 +---
 1 file changed, 44 insertions(+), 39 deletions(-)

diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 0075da7..51cfead 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -52,7 +52,7 @@ static int check_clock(const clockid_t which_clock)
 }
 
 static inline unsigned long long
-timespec_to_sample(const clockid_t which_clock, const struct timespec *tp)
+timespec64_to_sample(const clockid_t which_clock, const struct timespec64 *tp)
 {
unsigned long long ret;
 
@@ -60,19 +60,19 @@ timespec_to_sample(const clockid_t which_clock, const struct timespec *tp)
if (CPUCLOCK_WHICH(which_clock) == CPUCLOCK_SCHED) {
ret = (unsigned long long)tp->tv_sec * NSEC_PER_SEC + tp->tv_nsec;
} else {
-   ret = cputime_to_expires(timespec_to_cputime(tp));
+   ret = cputime_to_expires(timespec64_to_cputime(tp));
}
return ret;
 }
 
-static void sample_to_timespec(const clockid_t which_clock,
+static void sample_to_timespec64(const clockid_t which_clock,
   unsigned long long expires,
-  struct timespec *tp)
+  struct timespec64 *tp)
 {
if (CPUCLOCK_WHICH(which_clock) == CPUCLOCK_SCHED)
-   *tp = ns_to_timespec(expires);
+   *tp = ns_to_timespec64(expires);
else
-   cputime_to_timespec((__force cputime_t)expires, tp);
+   cputime_to_timespec64((__force cputime_t)expires, tp);
 }
 
 /*
@@ -141,7 +141,7 @@ static inline unsigned long long virt_ticks(struct task_struct *p)
 }
 
 static int
-posix_cpu_clock_getres(const clockid_t which_clock, struct timespec *tp)
+posix_cpu_clock_getres(const clockid_t which_clock, struct timespec64 *tp)
 {
int error = check_clock(which_clock);
if (!error) {
@@ -160,7 +160,7 @@ posix_cpu_clock_getres(const clockid_t which_clock, struct timespec *tp)
 }
 
 static int
-posix_cpu_clock_set(const clockid_t which_clock, const struct timespec *tp)
+posix_cpu_clock_set(const clockid_t which_clock, const struct timespec64 *tp)
 {
/*
 * You can never reset a CPU clock, but we check for other errors
@@ -263,7 +263,7 @@ static int cpu_clock_sample_group(const clockid_t which_clock,
 
 static int posix_cpu_clock_get_task(struct task_struct *tsk,
const clockid_t which_clock,
-   struct timespec *tp)
+   struct timespec64 *tp)
 {
int err = -EINVAL;
unsigned long long rtn;
@@ -277,13 +277,14 @@ static int posix_cpu_clock_get_task(struct task_struct *tsk,
}
 
if (!err)
-   sample_to_timespec(which_clock, rtn, tp);
+   sample_to_timespec64(which_clock, rtn, tp);
 
return err;
 }
 
 
-static int posix_cpu_clock_get(const clockid_t which_clock, struct timespec *tp)
+static int posix_cpu_clock_get(const clockid_t which_clock,
+   struct timespec64 *tp)
 {
const pid_t pid = CPUCLOCK_PID(which_clock);
int err = -EINVAL;
@@ -598,7 +599,7 @@ static inline void posix_cpu_timer_kick_nohz(void) { }
  * and try again.  (This happens when the timer is in the middle of firing.)
  */
 static int posix_cpu_timer_set(struct k_itimer *timer, int timer_flags,
-  struct itimerspec *new, struct itimerspec *old)
+  struct itimerspec64 *new, struct itimerspec64 *old)
 {
unsigned long flags;
struct sighand_struct *sighand;
@@ -608,7 +609,7 @@ static int posix_cpu_timer_set(struct k_itimer *timer, int timer_flags,
 
WARN_ON_ONCE(p == NULL);
 
-   new_expires = timespec_to_sample(timer->it_clock, &new->it_value);
+   new_expires = timespec64_to_sample(timer->it_clock, &new->it_value);
 
/*
 * Protect against sighand release/switch in exit/exec and p->cpu_timers
@@ -669,7 +670,7 @@ static int posix_cpu_timer_set(struct k_itimer *timer, int timer_flags,
bump_cpu_timer(timer, val);
if (val < timer->it.cpu.expires) {
old_expires = timer->it.cpu.expires - val;
-   sample_to_timespec(timer->it_clock,
+   sample_to_timespec64(timer->it_clock,
   old_expires,
   &old->it_value);
} else {
@@ -709,7 +710,7 @@ static int posix_cpu_timer_set(struct