Re: [PATCH V2 1/2] powerpc/mm/slice: Move slice_mask struct definition to slice.c

2017-02-13 Thread Aneesh Kumar K.V



On Tuesday 14 February 2017 11:55 AM, Michael Ellerman wrote:

"Aneesh Kumar K.V"  writes:


diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index b3f45e413a60..08ac27eae408 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -37,7 +37,16 @@
  #include 
  
  static DEFINE_SPINLOCK(slice_convert_lock);

-
+/*
+ * One bit per slice. We have lower slices which cover 256MB segments
+ * upto 4G range. That gets us 16 low slices. For the rest we track slices
+ * in 1TB size.

Can we tighten this comment up a bit.

What about:


+ * One bit per slice. The low slices cover the range 0 - 4GB, each
   * slice being 256MB in size, for 16 low slices. The high slices
   * cover the rest of the address space at 1TB granularity, with the
   * exception of high slice 0 which covers the range 4GB - 1TB.

OK?



good.




+ * 64 below is actually SLICE_NUM_HIGH to fixup complie errros

That line is bogus AFAICS, it refers to the old hardcoded value (prior
to 512), I'll drop it.



Thanks


-aneesh




Re: [PATCH V2 1/2] powerpc/mm/slice: Move slice_mask struct definition to slice.c

2017-02-13 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:

> diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
> index b3f45e413a60..08ac27eae408 100644
> --- a/arch/powerpc/mm/slice.c
> +++ b/arch/powerpc/mm/slice.c
> @@ -37,7 +37,16 @@
>  #include 
>  
>  static DEFINE_SPINLOCK(slice_convert_lock);
> -
> +/*
> + * One bit per slice. We have lower slices which cover 256MB segments
> + * upto 4G range. That gets us 16 low slices. For the rest we track slices
> + * in 1TB size.

Can we tighten this comment up a bit.

What about:

> + * One bit per slice. The low slices cover the range 0 - 4GB, each
>   * slice being 256MB in size, for 16 low slices. The high slices
>   * cover the rest of the address space at 1TB granularity, with the
>   * exception of high slice 0 which covers the range 4GB - 1TB.

OK?

> + * 64 below is actually SLICE_NUM_HIGH to fixup complie errros

That line is bogus AFAICS, it refers to the old hardcoded value (prior
to 512), I'll drop it.

> + */
> +struct slice_mask {
> + u64 low_slices;
> + DECLARE_BITMAP(high_slices, SLICE_NUM_HIGH);
> +};


cheers


Re: [PATCH] KVM: PPC: Book3S: Ratelimit copy data failure error messages

2017-02-13 Thread Vipin K Parashar

Forwarded same patch to k...@vger.kernel.org

and kvm-...@vger.kernel.org too.


On Tuesday 14 February 2017 12:26 AM, Vipin K Parashar wrote:

kvm_ppc_mmu_book3s_32/64 xlat() log "KVM can't copy data" error
upon failing to copy user data to kernel space. This floods kernel
log once such fails occur in short time period. Ratelimit this
error to avoid flooding kernel logs upon copy data failures.

Signed-off-by: Vipin K Parashar 
---
  arch/powerpc/kvm/book3s_32_mmu.c | 3 ++-
  arch/powerpc/kvm/book3s_64_mmu.c | 3 ++-
  2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
index a2eb6d3..ca8f960 100644
--- a/arch/powerpc/kvm/book3s_32_mmu.c
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -224,7 +224,8 @@ static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu 
*vcpu, gva_t eaddr,
ptem = kvmppc_mmu_book3s_32_get_ptem(sre, eaddr, primary);

if(copy_from_user(pteg, (void __user *)ptegp, sizeof(pteg))) {
-   printk(KERN_ERR "KVM: Can't copy data from 0x%lx!\n", ptegp);
+   if (printk_ratelimit())
+   printk(KERN_ERR "KVM: Can't copy data from 0x%lx!\n", 
ptegp);
goto no_page_found;
}

diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c
index b9131aa..b420aca 100644
--- a/arch/powerpc/kvm/book3s_64_mmu.c
+++ b/arch/powerpc/kvm/book3s_64_mmu.c
@@ -265,7 +265,8 @@ static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu 
*vcpu, gva_t eaddr,
goto no_page_found;

if(copy_from_user(pteg, (void __user *)ptegp, sizeof(pteg))) {
-   printk(KERN_ERR "KVM can't copy data from 0x%lx!\n", ptegp);
+   if (printk_ratelimit())
+   printk(KERN_ERR "KVM can't copy data from 0x%lx!\n", 
ptegp);
goto no_page_found;
}





linux-next: build failure after merge of the akpm-current tree

2017-02-13 Thread Stephen Rothwell
Hi Andrew,

After merging the akpm-current tree, today's linux-next build (powerpc
ppc64_defconfig) failed like this:

arch/powerpc/lib/code-patching.c:61:16: error: expected '=', ',', ';', 'asm' or 
'__attribute__' before 'is_conditional_branch'
 bool __kprobes is_conditional_branch(unsigned int instr)
^

Caused by commit

  916c821aaf13 ("kprobes: move kprobe declarations to asm-generic/kprobes.h")

interacting with commit

  51c9c0843993 ("powerpc/kprobes: Implement Optprobes")

from the powerpc tree.

I have applied this merge fix patch for today:

From: Stephen Rothwell 
Date: Tue, 14 Feb 2017 16:56:11 +1100
Subject: [PATCH] powerpc/kprobes: fixup for kprobes declarations moving

Signed-off-by: Stephen Rothwell 
---
 arch/powerpc/lib/code-patching.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 0899315e1434..0d3002b7e2b4 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 
 
 int patch_instruction(unsigned int *addr, unsigned int instr)
-- 
2.10.2

-- 
Cheers,
Stephen Rothwell


Re: [PATCH V2 1/2] mm/autonuma: Let architecture override how the write bit should be stashed in a protnone pte.

2017-02-13 Thread Aneesh Kumar K.V



On Tuesday 14 February 2017 11:19 AM, Michael Ellerman wrote:

"Aneesh Kumar K.V"  writes:


Autonuma preserves the write permission across numa fault to avoid taking
a writefault after a numa fault (Commit: b191f9b106ea " mm: numa: preserve PTE
write permissions across a NUMA hinting fault"). Architecture can implement
protnone in different ways and some may choose to implement that by clearing 
Read/
Write/Exec bit of pte. Setting the write bit on such pte can result in wrong
behaviour. Fix this up by allowing arch to override how to save the write bit
on a protnone pte.

This is pretty obviously a nop on arches that don't implement the new
hooks, but it'd still be good to get an ack from someone in mm land
before I merge it.



To get it apply cleanly you may need
http://ozlabs.org/~akpm/mmots/broken-out/mm-autonuma-dont-use-set_pte_at-when-updating-protnone-ptes.patch
http://ozlabs.org/~akpm/mmots/broken-out/mm-autonuma-dont-use-set_pte_at-when-updating-protnone-ptes-fix.patch

They are strictly not needed after the saved write patch. But I didn't 
request to drop them, because the patch helps us

to get closer to the goal of no ste_pte_at() call on present ptes.

-aneesh




Re: [PATCH V2 1/2] mm/autonuma: Let architecture override how the write bit should be stashed in a protnone pte.

2017-02-13 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:

> Autonuma preserves the write permission across numa fault to avoid taking
> a writefault after a numa fault (Commit: b191f9b106ea " mm: numa: preserve PTE
> write permissions across a NUMA hinting fault"). Architecture can implement
> protnone in different ways and some may choose to implement that by clearing 
> Read/
> Write/Exec bit of pte. Setting the write bit on such pte can result in wrong
> behaviour. Fix this up by allowing arch to override how to save the write bit
> on a protnone pte.

This is pretty obviously a nop on arches that don't implement the new
hooks, but it'd still be good to get an ack from someone in mm land
before I merge it.

cheers

> Acked-By: Michael Neuling 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  include/asm-generic/pgtable.h | 16 
>  mm/huge_memory.c  |  4 ++--
>  mm/memory.c   |  2 +-
>  mm/mprotect.c |  4 ++--
>  4 files changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
> index 18af2bcefe6a..b6f3a8a4b738 100644
> --- a/include/asm-generic/pgtable.h
> +++ b/include/asm-generic/pgtable.h
> @@ -192,6 +192,22 @@ static inline void ptep_set_wrprotect(struct mm_struct 
> *mm, unsigned long addres
>  }
>  #endif
>  
> +#ifndef pte_savedwrite
> +#define pte_savedwrite pte_write
> +#endif
> +
> +#ifndef pte_mk_savedwrite
> +#define pte_mk_savedwrite pte_mkwrite
> +#endif
> +
> +#ifndef pmd_savedwrite
> +#define pmd_savedwrite pmd_write
> +#endif
> +
> +#ifndef pmd_mk_savedwrite
> +#define pmd_mk_savedwrite pmd_mkwrite
> +#endif
> +
>  #ifndef __HAVE_ARCH_PMDP_SET_WRPROTECT
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  static inline void pmdp_set_wrprotect(struct mm_struct *mm,
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 9a6bd6c8d55a..2f0f855ec911 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1300,7 +1300,7 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t 
> pmd)
>   goto out;
>  clear_pmdnuma:
>   BUG_ON(!PageLocked(page));
> - was_writable = pmd_write(pmd);
> + was_writable = pmd_savedwrite(pmd);
>   pmd = pmd_modify(pmd, vma->vm_page_prot);
>   pmd = pmd_mkyoung(pmd);
>   if (was_writable)
> @@ -1555,7 +1555,7 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t 
> *pmd,
>   entry = pmdp_huge_get_and_clear_notify(mm, addr, pmd);
>   entry = pmd_modify(entry, newprot);
>   if (preserve_write)
> - entry = pmd_mkwrite(entry);
> + entry = pmd_mk_savedwrite(entry);
>   ret = HPAGE_PMD_NR;
>   set_pmd_at(mm, addr, pmd, entry);
>   BUG_ON(vma_is_anonymous(vma) && !preserve_write &&
> diff --git a/mm/memory.c b/mm/memory.c
> index e78bf72f30dd..88c24f89d6d3 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3388,7 +3388,7 @@ static int do_numa_page(struct vm_fault *vmf)
>   int target_nid;
>   bool migrated = false;
>   pte_t pte;
> - bool was_writable = pte_write(vmf->orig_pte);
> + bool was_writable = pte_savedwrite(vmf->orig_pte);
>   int flags = 0;
>  
>   /*
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index f9c07f54dd62..15f5c174a7c1 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -113,13 +113,13 @@ static unsigned long change_pte_range(struct 
> vm_area_struct *vma, pmd_t *pmd,
>   ptent = ptep_modify_prot_start(mm, addr, pte);
>   ptent = pte_modify(ptent, newprot);
>   if (preserve_write)
> - ptent = pte_mkwrite(ptent);
> + ptent = pte_mk_savedwrite(ptent);
>  
>   /* Avoid taking write faults for known dirty pages */
>   if (dirty_accountable && pte_dirty(ptent) &&
>   (pte_soft_dirty(ptent) ||
>!(vma->vm_flags & VM_SOFTDIRTY))) {
> - ptent = pte_mkwrite(ptent);
> + ptent = pte_mk_savedwrite(ptent);
>   }
>   ptep_modify_prot_commit(mm, addr, pte, ptent);
>   pages++;
> -- 
> 2.7.4



[PATCH V2 2/2] powerpc/mm/autonuma: Switch ppc64 to its own implementeation of saved write

2017-02-13 Thread Aneesh Kumar K.V
With this our protnone becomes a present pte with READ/WRITE/EXEC bit cleared.
By default we also set _PAGE_PRIVILEGED on such pte. This is now used to help
us identify a protnone pte that as saved write bit. For such pte, we will clear
the _PAGE_PRIVILEGED bit. The pte still remain non-accessible from both user
and kernel.

Acked-By: Michael Neuling 
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  3 +++
 arch/powerpc/include/asm/book3s/64/pgtable.h  | 32 +--
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 0735d5a8049f..8720a406bbbe 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -16,6 +16,9 @@
 #include 
 #include 
 
+#ifndef __ASSEMBLY__
+#include 
+#endif
 /*
  * This is necessary to get the definition of PGTABLE_RANGE which we
  * need for various slices related matters. Note that this isn't the
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index fef738229a68..c684ef6cbd10 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -441,8 +441,8 @@ static inline pte_t pte_clear_soft_dirty(pte_t pte)
  */
 static inline int pte_protnone(pte_t pte)
 {
-   return (pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_PRIVILEGED)) ==
-   cpu_to_be64(_PAGE_PRESENT | _PAGE_PRIVILEGED);
+   return (pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_RWX)) ==
+   cpu_to_be64(_PAGE_PRESENT);
 }
 #endif /* CONFIG_NUMA_BALANCING */
 
@@ -512,6 +512,32 @@ static inline pte_t pte_mkhuge(pte_t pte)
return pte;
 }
 
+#define pte_mk_savedwrite pte_mk_savedwrite
+static inline pte_t pte_mk_savedwrite(pte_t pte)
+{
+   /*
+* Used by Autonuma subsystem to preserve the write bit
+* while marking the pte PROT_NONE. Only allow this
+* on PROT_NONE pte
+*/
+   VM_BUG_ON((pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_RWX | 
_PAGE_PRIVILEGED)) !=
+ cpu_to_be64(_PAGE_PRESENT | _PAGE_PRIVILEGED));
+   return __pte(pte_val(pte) & ~_PAGE_PRIVILEGED);
+}
+
+#define pte_savedwrite pte_savedwrite
+static inline bool pte_savedwrite(pte_t pte)
+{
+   /*
+* Saved write ptes are prot none ptes that doesn't have
+* privileged bit sit. We mark prot none as one which has
+* present and pviliged bit set and RWX cleared. To mark
+* protnone which used to have _PAGE_WRITE set we clear
+* the privileged bit.
+*/
+   return !(pte_raw(pte) & cpu_to_be64(_PAGE_RWX | _PAGE_PRIVILEGED));
+}
+
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
/* FIXME!! check whether this need to be a conditional */
@@ -873,6 +899,7 @@ static inline pte_t *pmdp_ptep(pmd_t *pmd)
 #define pmd_mkclean(pmd)   pte_pmd(pte_mkclean(pmd_pte(pmd)))
 #define pmd_mkyoung(pmd)   pte_pmd(pte_mkyoung(pmd_pte(pmd)))
 #define pmd_mkwrite(pmd)   pte_pmd(pte_mkwrite(pmd_pte(pmd)))
+#define pmd_mk_savedwrite(pmd) pte_pmd(pte_mk_savedwrite(pmd_pte(pmd)))
 
 #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
 #define pmd_soft_dirty(pmd)pte_soft_dirty(pmd_pte(pmd))
@@ -889,6 +916,7 @@ static inline int pmd_protnone(pmd_t pmd)
 
 #define __HAVE_ARCH_PMD_WRITE
 #define pmd_write(pmd) pte_write(pmd_pte(pmd))
+#define pmd_savedwrite(pmd)pte_savedwrite(pmd_pte(pmd))
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 extern pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot);
-- 
2.7.4



[PATCH V2 1/2] mm/autonuma: Let architecture override how the write bit should be stashed in a protnone pte.

2017-02-13 Thread Aneesh Kumar K.V
Autonuma preserves the write permission across numa fault to avoid taking
a writefault after a numa fault (Commit: b191f9b106ea " mm: numa: preserve PTE
write permissions across a NUMA hinting fault"). Architecture can implement
protnone in different ways and some may choose to implement that by clearing 
Read/
Write/Exec bit of pte. Setting the write bit on such pte can result in wrong
behaviour. Fix this up by allowing arch to override how to save the write bit
on a protnone pte.

Acked-By: Michael Neuling 
Signed-off-by: Aneesh Kumar K.V 
---
 include/asm-generic/pgtable.h | 16 
 mm/huge_memory.c  |  4 ++--
 mm/memory.c   |  2 +-
 mm/mprotect.c |  4 ++--
 4 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 18af2bcefe6a..b6f3a8a4b738 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -192,6 +192,22 @@ static inline void ptep_set_wrprotect(struct mm_struct 
*mm, unsigned long addres
 }
 #endif
 
+#ifndef pte_savedwrite
+#define pte_savedwrite pte_write
+#endif
+
+#ifndef pte_mk_savedwrite
+#define pte_mk_savedwrite pte_mkwrite
+#endif
+
+#ifndef pmd_savedwrite
+#define pmd_savedwrite pmd_write
+#endif
+
+#ifndef pmd_mk_savedwrite
+#define pmd_mk_savedwrite pmd_mkwrite
+#endif
+
 #ifndef __HAVE_ARCH_PMDP_SET_WRPROTECT
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static inline void pmdp_set_wrprotect(struct mm_struct *mm,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9a6bd6c8d55a..2f0f855ec911 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1300,7 +1300,7 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
goto out;
 clear_pmdnuma:
BUG_ON(!PageLocked(page));
-   was_writable = pmd_write(pmd);
+   was_writable = pmd_savedwrite(pmd);
pmd = pmd_modify(pmd, vma->vm_page_prot);
pmd = pmd_mkyoung(pmd);
if (was_writable)
@@ -1555,7 +1555,7 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t 
*pmd,
entry = pmdp_huge_get_and_clear_notify(mm, addr, pmd);
entry = pmd_modify(entry, newprot);
if (preserve_write)
-   entry = pmd_mkwrite(entry);
+   entry = pmd_mk_savedwrite(entry);
ret = HPAGE_PMD_NR;
set_pmd_at(mm, addr, pmd, entry);
BUG_ON(vma_is_anonymous(vma) && !preserve_write &&
diff --git a/mm/memory.c b/mm/memory.c
index e78bf72f30dd..88c24f89d6d3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3388,7 +3388,7 @@ static int do_numa_page(struct vm_fault *vmf)
int target_nid;
bool migrated = false;
pte_t pte;
-   bool was_writable = pte_write(vmf->orig_pte);
+   bool was_writable = pte_savedwrite(vmf->orig_pte);
int flags = 0;
 
/*
diff --git a/mm/mprotect.c b/mm/mprotect.c
index f9c07f54dd62..15f5c174a7c1 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -113,13 +113,13 @@ static unsigned long change_pte_range(struct 
vm_area_struct *vma, pmd_t *pmd,
ptent = ptep_modify_prot_start(mm, addr, pte);
ptent = pte_modify(ptent, newprot);
if (preserve_write)
-   ptent = pte_mkwrite(ptent);
+   ptent = pte_mk_savedwrite(ptent);
 
/* Avoid taking write faults for known dirty pages */
if (dirty_accountable && pte_dirty(ptent) &&
(pte_soft_dirty(ptent) ||
 !(vma->vm_flags & VM_SOFTDIRTY))) {
-   ptent = pte_mkwrite(ptent);
+   ptent = pte_mk_savedwrite(ptent);
}
ptep_modify_prot_commit(mm, addr, pte, ptent);
pages++;
-- 
2.7.4



[PATCH V2 0/2] Numabalancing preserve write fix

2017-02-13 Thread Aneesh Kumar K.V
This patch series address an issue w.r.t THP migration and autonuma
preserve write feature. migrate_misplaced_transhuge_page() cannot deal with
concurrent modification of the page. It does a page copy without
following the migration pte sequence. IIUC, this was done to keep the
migration simpler and at the time of implemenation we didn't had THP
page cache which would have required a more elaborate migration scheme.
That means thp autonuma migration expect the protnone with saved write
to be done such that both kernel and user cannot update
the page content. This patch series enables archs like ppc64 to do that.
We are good with the hash translation mode with the current code,
because we never create a hardware page table entry for a protnone pte. 

Changes from V1:
* Update the patch so that it apply cleanly to upstream.
* Add acked-by from Michael Neuling

Aneesh Kumar K.V (2):
  mm/autonuma: Let architecture override how the write bit should be
stashed in a protnone pte.
  powerpc/mm/autonuma: Switch ppc64 to its own implementeation of saved
write

 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  3 +++
 arch/powerpc/include/asm/book3s/64/pgtable.h  | 32 +--
 include/asm-generic/pgtable.h | 16 ++
 mm/huge_memory.c  |  4 ++--
 mm/memory.c   |  2 +-
 mm/mprotect.c |  4 ++--
 6 files changed, 54 insertions(+), 7 deletions(-)

-- 
2.7.4



Re: [PATCH 3/5] selftests: Fix the .S and .S -> .o rules

2017-02-13 Thread Bamvor Zhang Jian
Tested-by: Bamvor Jian Zhang 

On 9 February 2017 at 16:56, Michael Ellerman  wrote:
> Both these rules incorrectly use $< (first prerequisite) rather than
> $^ (all prerequisites), meaning they don't work if we're using more than
> one .S file as input. Switch them to using $^.
>
> They also don't include $(CPPFLAGS) and other variables used in the
> default rules, which breaks targets that require those. Fix that by
> using the builtin $(COMPILE.S) and $(LINK.S) rules.
>
> Fixes: a8ba798bc8ec ("selftests: enable O and KBUILD_OUTPUT")
> Signed-off-by: Michael Ellerman 
> ---
>  tools/testing/selftests/lib.mk | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
> index 98841c54763a..ce96d80ad64f 100644
> --- a/tools/testing/selftests/lib.mk
> +++ b/tools/testing/selftests/lib.mk
> @@ -54,9 +54,9 @@ $(OUTPUT)/%:%.c
> $(LINK.c) $^ $(LDLIBS) -o $@
>
>  $(OUTPUT)/%.o:%.S
> -   $(CC) $(ASFLAGS) -c $< -o $@
> +   $(COMPILE.S) $^ -o $@
>
>  $(OUTPUT)/%:%.S
> -   $(CC) $(ASFLAGS) $< -o $@
> +   $(LINK.S) $^ $(LDLIBS) -o $@
>
>  .PHONY: run_tests all clean install emit_tests
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-api" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] selftests: Fix the .c linking rule

2017-02-13 Thread Bamvor Zhang Jian
Tested-by: Bamvor Jian Zhang 

On 9 February 2017 at 16:56, Michael Ellerman  wrote:
> Currently we can't build some tests, for example:
>
>   $ make -C tools/testing/selftests/ TARGETS=vm
>   ...
>   gcc -Wall -I ../../../../usr/include   -lrt -lpthread 
> ../../../../usr/include/linux/kernel.h userfaultfd.c -o 
> tools/testing/selftests/vm/userfaultfd
>   /tmp/ccmOkQSM.o: In function `stress':
>   userfaultfd.c:(.text+0xc60): undefined reference to `pthread_create'
>   userfaultfd.c:(.text+0xca5): undefined reference to `pthread_create'
>   userfaultfd.c:(.text+0xcee): undefined reference to `pthread_create'
>   userfaultfd.c:(.text+0xd30): undefined reference to `pthread_create'
>   userfaultfd.c:(.text+0xd77): undefined reference to `pthread_join'
>   userfaultfd.c:(.text+0xe7d): undefined reference to `pthread_join'
>   userfaultfd.c:(.text+0xe9f): undefined reference to `pthread_cancel'
>   userfaultfd.c:(.text+0xec6): undefined reference to `pthread_join'
>   userfaultfd.c:(.text+0xf14): undefined reference to `pthread_join'
>   /tmp/ccmOkQSM.o: In function `userfaultfd_stress':
>   userfaultfd.c:(.text+0x13e2): undefined reference to 
> `pthread_attr_setstacksize'
>   collect2: error: ld returned 1 exit status
>
> This is because the rule for linking .c files to binaries is incorrect.
>
> The first bug is that it uses $< (first prerequisite) instead of $^ (all
> preqrequisites), fix it by using ^$.
>
> Secondly the ordering of the prerequisites vs $(LDLIBS) is wrong,
> meaning on toolchains that use --as-needed we fail to link (as above).
> Fix that by placing $(LDLIBS) *after* ^$.
>
> Finally switch to using the default rule $(LINK.c), so that we get
> $(CPPFLAGS) etc. included.
>
> Fixes: a8ba798bc8ec ("selftests: enable O and KBUILD_OUTPUT")
> Signed-off-by: Michael Ellerman 
> ---
>  tools/testing/selftests/lib.mk | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
> index 17ed4bbe3963..98841c54763a 100644
> --- a/tools/testing/selftests/lib.mk
> +++ b/tools/testing/selftests/lib.mk
> @@ -51,7 +51,7 @@ clean:
> $(RM) -r $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) 
> $(TEST_GEN_FILES) $(EXTRA_CLEAN)
>
>  $(OUTPUT)/%:%.c
> -   $(CC) $(CFLAGS) $(LDFLAGS) $(LDLIBS) $< -o $@
> +   $(LINK.c) $^ $(LDLIBS) -o $@
>
>  $(OUTPUT)/%.o:%.S
> $(CC) $(ASFLAGS) -c $< -o $@
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-api" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5] selftests: Fix selftests build to just build, not run tests

2017-02-13 Thread Bamvor Zhang Jian
Tested by: Bamvor Jian Zhang 

On 9 February 2017 at 16:56, Michael Ellerman  wrote:
> In commit 88baa78d1f31 ("selftests: remove duplicated all and clean
> target"), the "all" target was removed from individual Makefiles and
> added to lib.mk.
>
> However the "all" target was added to lib.mk *after* the existing
> "runtests" target. This means "runtests" becomes the first (default)
> target for most of our Makefiles.
>
> This has the effect of causing a plain "make" to build *and run* the
> tests. Which is at best rude, but depending on which tests are run could
> oops someone's build machine.
>
>   $ make -C tools/testing/selftests/
>   ...
>   make[1]: Entering directory 'tools/testing/selftests/bpf'
>   gcc -Wall -O2 -I../../../../usr/include   test_verifier.c -o 
> tools/testing/selftests/bpf/test_verifier
>   gcc -Wall -O2 -I../../../../usr/include   test_maps.c -o 
> tools/testing/selftests/bpf/test_maps
>   gcc -Wall -O2 -I../../../../usr/include   test_lru_map.c -o 
> tools/testing/selftests/bpf/test_lru_map
>   #0 add+sub+mul FAIL
>   Failed to load prog 'Function not implemented'!
>   #1 unreachable FAIL
>   Unexpected error message!
>   #2 unreachable2 FAIL
>   ...
>
> Fix it by moving the "all" target to the start of lib.mk, making it the
> default target.
>
> Fixes: 88baa78d1f31 ("selftests: remove duplicated all and clean target")
> Signed-off-by: Michael Ellerman 
> ---
>  tools/testing/selftests/lib.mk | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
> index 01bb7782a35e..17ed4bbe3963 100644
> --- a/tools/testing/selftests/lib.mk
> +++ b/tools/testing/selftests/lib.mk
> @@ -2,6 +2,11 @@
>  # Makefile can operate with or without the kbuild infrastructure.
>  CC := $(CROSS_COMPILE)gcc
>
> +TEST_GEN_PROGS := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_PROGS))
> +TEST_GEN_FILES := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_FILES))
> +
> +all: $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES)
> +
>  define RUN_TESTS
> @for TEST in $(TEST_GEN_PROGS) $(TEST_PROGS); do \
> BASENAME_TEST=`basename $$TEST`;\
> @@ -42,11 +47,6 @@ endef
>  emit_tests:
> $(EMIT_TESTS)
>
> -TEST_GEN_PROGS := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_PROGS))
> -TEST_GEN_FILES := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_FILES))
> -
> -all: $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES)
> -
>  clean:
> $(RM) -r $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) 
> $(TEST_GEN_FILES) $(EXTRA_CLEAN)
>
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-api" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] powerpc/mm/radix: Skip ptesync in pte update helpers

2017-02-13 Thread Michael Neuling
On Thu, 2017-02-09 at 08:28 +0530, Aneesh Kumar K.V wrote:
> We do them at the start of tlb flush, and we are sure a pte update will be
> followed by a tlbflush. Hence we can skip the ptesync in pte update helpers.
> 
> Signed-off-by: Aneesh Kumar K.V 

Tested-by: Michael Neuling 

> ---
>  arch/powerpc/include/asm/book3s/64/radix.h | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h
> b/arch/powerpc/include/asm/book3s/64/radix.h
> index fcf822d6c204..77e590c77299 100644
> --- a/arch/powerpc/include/asm/book3s/64/radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> @@ -144,13 +144,11 @@ static inline unsigned long radix__pte_update(struct
> mm_struct *mm,
>    * new value of pte
>    */
>   new_pte = (old_pte | set) & ~clr;
> - asm volatile("ptesync" : : : "memory");
>   radix__flush_tlb_pte_p9_dd1(old_pte, mm, addr);
>   if (new_pte)
>   __radix_pte_update(ptep, 0, new_pte);
>   } else
>   old_pte = __radix_pte_update(ptep, clr, set);
> - asm volatile("ptesync" : : : "memory");
>   if (!huge)
>   assert_pte_locked(mm, addr);
>  
> @@ -195,7 +193,6 @@ static inline void radix__ptep_set_access_flags(struct
> mm_struct *mm,
>   unsigned long old_pte, new_pte;
>  
>   old_pte = __radix_pte_update(ptep, ~0, 0);
> - asm volatile("ptesync" : : : "memory");
>   /*
>    * new value of pte
>    */


Re: [PATCH 2/3] powerpc/mm/radix: Use ptep_get_and_clear_full when clearing pte for full mm

2017-02-13 Thread Michael Neuling
On Thu, 2017-02-09 at 08:28 +0530, Aneesh Kumar K.V wrote:
> This helps us to do some optimization for application exit case, where we can
> skip the DD1 style pte update sequence.
> 
> Signed-off-by: Aneesh Kumar K.V 

Tested-by: Michael Neuling 

> ---
>  arch/powerpc/include/asm/book3s/64/pgtable.h | 17 +
>  arch/powerpc/include/asm/book3s/64/radix.h   | 23 ++-
>  2 files changed, 39 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h
> b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 6f15bde94da2..e91ada786d48 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -373,6 +373,23 @@ static inline pte_t ptep_get_and_clear(struct mm_struct
> *mm,
>   return __pte(old);
>  }
>  
> +#define __HAVE_ARCH_PTEP_GET_AND_CLEAR_FULL
> +static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
> + unsigned long addr,
> + pte_t *ptep, int full)
> +{
> + if (full && radix_enabled()) {
> + /*
> +  * Let's skip the DD1 style pte update here. We know that
> +  * this is a full mm pte clear and hence can be sure there is
> +  * no parallel set_pte.
> +  */
> + return radix__ptep_get_and_clear_full(mm, addr, ptep, full);
> + }
> + return ptep_get_and_clear(mm, addr, ptep);
> +}
> +
> +
>  static inline void pte_clear(struct mm_struct *mm, unsigned long addr,
>    pte_t * ptep)
>  {
> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h
> b/arch/powerpc/include/asm/book3s/64/radix.h
> index 70a3cdcdbe47..fcf822d6c204 100644
> --- a/arch/powerpc/include/asm/book3s/64/radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> @@ -139,7 +139,7 @@ static inline unsigned long radix__pte_update(struct
> mm_struct *mm,
>  
>   unsigned long new_pte;
>  
> - old_pte = __radix_pte_update(ptep, ~0, 0);
> + old_pte = __radix_pte_update(ptep, ~0ul, 0);
>   /*
>    * new value of pte
>    */
> @@ -157,6 +157,27 @@ static inline unsigned long radix__pte_update(struct
> mm_struct *mm,
>   return old_pte;
>  }
>  
> +static inline pte_t radix__ptep_get_and_clear_full(struct mm_struct *mm,
> +    unsigned long addr,
> +    pte_t *ptep, int full)
> +{
> + unsigned long old_pte;
> +
> + if (full) {
> + /*
> +  * If we are trying to clear the pte, we can skip
> +  * the DD1 pte update sequence and batch the tlb flush. The
> +  * tlb flush batching is done by mmu gather code. We
> +  * still keep the cmp_xchg update to make sure we get
> +  * correct R/C bit which might be updated via Nest MMU.
> +  */
> + old_pte = __radix_pte_update(ptep, ~0ul, 0);
> + } else
> + old_pte = radix__pte_update(mm, addr, ptep, ~0ul, 0, 0);
> +
> + return __pte(old_pte);
> +}
> +
>  /*
>   * Set the dirty and/or accessed bits atomically in a linux PTE, this
>   * function doesn't need to invalidate tlb.


Re: [PATCH 1/3] powerpc/mm/radix: Update pte update sequence for pte clear case

2017-02-13 Thread Michael Neuling
On Thu, 2017-02-09 at 08:28 +0530, Aneesh Kumar K.V wrote:
> In the kernel we do follow the below sequence in different code paths.
> pte = ptep_get_clear(ptep)
> 
> set_pte_at(ptep, pte)
> 
> We do that for mremap, autonuma protection update and softdirty clearing. This
> implies our optimization to skip a tlb flush when clearing a pte update is
> not valid, because for DD1 system that followup set_pte_at will be done witout
> doing the required tlbflush. Fix that by always doing the dd1 style pte update
> irrespective of new_pte value. In a later patch we will optimize the
> application
> exit case.
> 
> Signed-off-by: Benjamin Herrenschmidt 
> Signed-off-by: Aneesh Kumar K.V 

Tested-by: Michael Neuling 

> ---
>  arch/powerpc/include/asm/book3s/64/radix.h | 12 +++-
>  1 file changed, 3 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h
> b/arch/powerpc/include/asm/book3s/64/radix.h
> index b4d1302387a3..70a3cdcdbe47 100644
> --- a/arch/powerpc/include/asm/book3s/64/radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> @@ -144,16 +144,10 @@ static inline unsigned long radix__pte_update(struct
> mm_struct *mm,
>    * new value of pte
>    */
>   new_pte = (old_pte | set) & ~clr;
> - /*
> -  * If we are trying to clear the pte, we can skip
> -  * the below sequence and batch the tlb flush. The
> -  * tlb flush batching is done by mmu gather code
> -  */
> - if (new_pte) {
> - asm volatile("ptesync" : : : "memory");
> - radix__flush_tlb_pte_p9_dd1(old_pte, mm, addr);
> + asm volatile("ptesync" : : : "memory");
> + radix__flush_tlb_pte_p9_dd1(old_pte, mm, addr);
> + if (new_pte)
>   __radix_pte_update(ptep, 0, new_pte);
> - }
>   } else
>   old_pte = __radix_pte_update(ptep, clr, set);
>   asm volatile("ptesync" : : : "memory");


Re: [PATCH] powerpc/mm: Fix build break with RADIX=y & HUGETLBFS=n

2017-02-13 Thread Aneesh Kumar K.V
Michael Ellerman  writes:

> If we enable RADIX but disable HUGETLBFS, the build breaks with:
>
>   arch/powerpc/mm/pgtable-radix.c:557:7: error: implicit declaration of 
> function 'pmd_huge'
>   arch/powerpc/mm/pgtable-radix.c:588:7: error: implicit declaration of 
> function 'pud_huge'
>
> Fix it by stubbing those functions when HUGETLBFS=n.
>
Reviewed-by: Aneesh Kumar K.V 


> Fixes: 4b5d62ca17a1 ("powerpc/mm: add radix__remove_section_mapping()")
> Signed-off-by: Michael Ellerman 
> ---
>  arch/powerpc/include/asm/book3s/64/pgtable-4k.h  | 5 +
>  arch/powerpc/include/asm/book3s/64/pgtable-64k.h | 3 +++
>  2 files changed, 8 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h 
> b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
> index 9db83b4e017d..8708a0239a56 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
> @@ -47,7 +47,12 @@ static inline int hugepd_ok(hugepd_t hpd)
>   return hash__hugepd_ok(hpd);
>  }
>  #define is_hugepd(hpd)   (hugepd_ok(hpd))
> +
> +#else /* !CONFIG_HUGETLB_PAGE */
> +static inline int pmd_huge(pmd_t pmd) { return 0; }
> +static inline int pud_huge(pud_t pud) { return 0; }
>  #endif /* CONFIG_HUGETLB_PAGE */
> +
>  #endif /* __ASSEMBLY__ */
>
>  #endif /*_ASM_POWERPC_BOOK3S_64_PGTABLE_4K_H */
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable-64k.h 
> b/arch/powerpc/include/asm/book3s/64/pgtable-64k.h
> index 198aff33c380..2ce4209399ed 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable-64k.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable-64k.h
> @@ -46,6 +46,9 @@ static inline int hugepd_ok(hugepd_t hpd)
>  }
>  #define is_hugepd(pdep)  0
>
> +#else /* !CONFIG_HUGETLB_PAGE */
> +static inline int pmd_huge(pmd_t pmd) { return 0; }
> +static inline int pud_huge(pud_t pud) { return 0; }
>  #endif /* CONFIG_HUGETLB_PAGE */
>
>  static inline int remap_4k_pfn(struct vm_area_struct *vma, unsigned long 
> addr,
> -- 
> 2.7.4



Re: [PATCH] powerpc/xmon: add debugfs entry for xmon

2017-02-13 Thread Pan Xinhui



在 2017/2/14 10:35, Nicholas Piggin 写道:

On Mon, 13 Feb 2017 19:00:42 -0200
"Guilherme G. Piccoli"  wrote:


Currently the xmon debugger is set only via kernel boot command-line.
It's disabled by default, and can be enabled with "xmon=on" on the
command-line. Also, xmon may be accessed via sysrq mechanism, but once
we enter xmon via sysrq,  it's  kept enabled until system is rebooted,
even if we exit the debugger. A kernel crash will then lead to xmon
instance, instead of triggering a kdump procedure (if configured), for
example.

This patch introduces a debugfs entry for xmon, allowing user to query
its current state and change it if desired. Basically, the "xmon" file
to read from/write to is under the debugfs mount point, on powerpc
directory. Reading this file will provide the current state of the
debugger, one of the following: "on", "off", "early" or "nobt". Writing
one of these states to the file will take immediate effect on the debugger.

Signed-off-by: Guilherme G. Piccoli 
---
* I had this patch partially done for some time, and after a discussion
at the kernel slack channel latest week, I decided to rebase and fix
some remaining bugs. I'd change 'x' option to always disable the debugger,
since with this patch we can always re-enable xmon, but today I noticed
Pan's patch on the mailing list, so perhaps his approach of adding a flag
to 'x' option is preferable. I can change this in a V2, if requested.
Thanks in advance!


xmon state changing after the first sysrq+x violates principle of least
astonishment, so I think that should be fixed.


hi, Nick
yes, as long as xmon is disabled during boot, it should still be disabled after 
existing xmon.
My patch does not fix that as it need people add one more char 'z' following 
'x'.
I will provide a new patch to fix that.


Then the question is, is it worth making it runtime configurable with xmon
command or debugfs tunables?


They are options for people to turn xmon features on or off. Maybe people 
needn't this.
However I am not a fan  of debugfs this time as I am used to using xmon cmds. :)

Hi, Guilherme
So in the end, my thought is that: 1) cmd x|X will exit xmon and keep xmon in 
the original state(indicated by var xmon_off).
2) Then add options to turn some features on/off. And debugfs maybe not fit for 
this. But I am also wondering at same time, are people needing this?

thanks
xinhui


Thanks,
Nick





Re: [PATCH 2/2] powerpc/mm/autonuma: Switch ppc64 to its own implementeation of saved write

2017-02-13 Thread Michael Neuling
On Thu, 2017-02-09 at 08:30 +0530, Aneesh Kumar K.V wrote:
> With this our protnone becomes a present pte with READ/WRITE/EXEC bit cleared.
> By default we also set _PAGE_PRIVILEGED on such pte. This is now used to help
> us identify a protnone pte that as saved write bit. For such pte, we will
> clear
> the _PAGE_PRIVILEGED bit. The pte still remain non-accessible from both user
> and kernel.
> 
> Signed-off-by: Aneesh Kumar K.V 


FWIW I've tested this, so:

Acked-By: Michael Neuling 

> ---
>  arch/powerpc/include/asm/book3s/64/mmu-hash.h |  3 +++
>  arch/powerpc/include/asm/book3s/64/pgtable.h  | 32 +-
> -
>  2 files changed, 33 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> index 0735d5a8049f..8720a406bbbe 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> @@ -16,6 +16,9 @@
>  #include 
>  #include 
>  
> +#ifndef __ASSEMBLY__
> +#include 
> +#endif
>  /*
>   * This is necessary to get the definition of PGTABLE_RANGE which we
>   * need for various slices related matters. Note that this isn't the
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h
> b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index e91ada786d48..efff910a84b1 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -443,8 +443,8 @@ static inline pte_t pte_clear_soft_dirty(pte_t pte)
>   */
>  static inline int pte_protnone(pte_t pte)
>  {
> - return (pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_PRIVILEGED))
> ==
> - cpu_to_be64(_PAGE_PRESENT | _PAGE_PRIVILEGED);
> + return (pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_RWX)) ==
> + cpu_to_be64(_PAGE_PRESENT);
>  }
>  #endif /* CONFIG_NUMA_BALANCING */
>  
> @@ -514,6 +514,32 @@ static inline pte_t pte_mkhuge(pte_t pte)
>   return pte;
>  }
>  
> +#define pte_mk_savedwrite pte_mk_savedwrite
> +static inline pte_t pte_mk_savedwrite(pte_t pte)
> +{
> + /*
> +  * Used by Autonuma subsystem to preserve the write bit
> +  * while marking the pte PROT_NONE. Only allow this
> +  * on PROT_NONE pte
> +  */
> + VM_BUG_ON((pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_RWX |
> _PAGE_PRIVILEGED)) !=
> +   cpu_to_be64(_PAGE_PRESENT | _PAGE_PRIVILEGED));
> + return __pte(pte_val(pte) & ~_PAGE_PRIVILEGED);
> +}
> +
> +#define pte_savedwrite pte_savedwrite
> +static inline bool pte_savedwrite(pte_t pte)
> +{
> + /*
> +  * Saved write ptes are prot none ptes that doesn't have
> +  * privileged bit sit. We mark prot none as one which has
> +  * present and pviliged bit set and RWX cleared. To mark
> +  * protnone which used to have _PAGE_WRITE set we clear
> +  * the privileged bit.
> +  */
> + return !(pte_raw(pte) & cpu_to_be64(_PAGE_RWX | _PAGE_PRIVILEGED));
> +}
> +
>  static inline pte_t pte_mkdevmap(pte_t pte)
>  {
>   return __pte(pte_val(pte) | _PAGE_SPECIAL|_PAGE_DEVMAP);
> @@ -885,6 +911,7 @@ static inline pte_t *pmdp_ptep(pmd_t *pmd)
>  #define pmd_mkclean(pmd) pte_pmd(pte_mkclean(pmd_pte(pmd)))
>  #define pmd_mkyoung(pmd) pte_pmd(pte_mkyoung(pmd_pte(pmd)))
>  #define pmd_mkwrite(pmd) pte_pmd(pte_mkwrite(pmd_pte(pmd)))
> +#define pmd_mk_savedwrite(pmd)   pte_pmd(pte_mk_savedwrite(pmd_pte(pmd))
> )
>  
>  #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
>  #define pmd_soft_dirty(pmd)pte_soft_dirty(pmd_pte(pmd))
> @@ -901,6 +928,7 @@ static inline int pmd_protnone(pmd_t pmd)
>  
>  #define __HAVE_ARCH_PMD_WRITE
>  #define pmd_write(pmd)   pte_write(pmd_pte(pmd))
> +#define pmd_savedwrite(pmd)  pte_savedwrite(pmd_pte(pmd))
>  
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  extern pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot);


Re: [PATCH 1/2] mm/autonuma: Let architecture override how the write bit should be stashed in a protnone pte.

2017-02-13 Thread Michael Neuling
On Thu, 2017-02-09 at 08:30 +0530, Aneesh Kumar K.V wrote:
> Autonuma preserves the write permission across numa fault to avoid taking
> a writefault after a numa fault (Commit: b191f9b106ea " mm: numa: preserve PTE
> write permissions across a NUMA hinting fault"). Architecture can implement
> protnone in different ways and some may choose to implement that by clearing
> Read/
> Write/Exec bit of pte. Setting the write bit on such pte can result in wrong
> behaviour. Fix this up by allowing arch to override how to save the write bit
> on a protnone pte.
> 
> Signed-off-by: Aneesh Kumar K.V 

FWIW this is pretty simple and helps with us in powerpc...

Acked-By: Michael Neuling 

> ---
>  include/asm-generic/pgtable.h | 16 
>  mm/huge_memory.c  |  4 ++--
>  mm/memory.c   |  2 +-
>  mm/mprotect.c |  4 ++--
>  4 files changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
> index 18af2bcefe6a..b6f3a8a4b738 100644
> --- a/include/asm-generic/pgtable.h
> +++ b/include/asm-generic/pgtable.h
> @@ -192,6 +192,22 @@ static inline void ptep_set_wrprotect(struct mm_struct
> *mm, unsigned long addres
>  }
>  #endif
>  
> +#ifndef pte_savedwrite
> +#define pte_savedwrite pte_write
> +#endif
> +
> +#ifndef pte_mk_savedwrite
> +#define pte_mk_savedwrite pte_mkwrite
> +#endif
> +
> +#ifndef pmd_savedwrite
> +#define pmd_savedwrite pmd_write
> +#endif
> +
> +#ifndef pmd_mk_savedwrite
> +#define pmd_mk_savedwrite pmd_mkwrite
> +#endif
> +
>  #ifndef __HAVE_ARCH_PMDP_SET_WRPROTECT
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  static inline void pmdp_set_wrprotect(struct mm_struct *mm,
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 9a6bd6c8d55a..2f0f855ec911 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1300,7 +1300,7 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t
> pmd)
>   goto out;
>  clear_pmdnuma:
>   BUG_ON(!PageLocked(page));
> - was_writable = pmd_write(pmd);
> + was_writable = pmd_savedwrite(pmd);
>   pmd = pmd_modify(pmd, vma->vm_page_prot);
>   pmd = pmd_mkyoung(pmd);
>   if (was_writable)
> @@ -1555,7 +1555,7 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t
> *pmd,
>   entry = pmdp_huge_get_and_clear_notify(mm, addr,
> pmd);
>   entry = pmd_modify(entry, newprot);
>   if (preserve_write)
> - entry = pmd_mkwrite(entry);
> + entry = pmd_mk_savedwrite(entry);
>   ret = HPAGE_PMD_NR;
>   set_pmd_at(mm, addr, pmd, entry);
>   BUG_ON(vma_is_anonymous(vma) && !preserve_write &&
> diff --git a/mm/memory.c b/mm/memory.c
> index e78bf72f30dd..88c24f89d6d3 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3388,7 +3388,7 @@ static int do_numa_page(struct vm_fault *vmf)
>   int target_nid;
>   bool migrated = false;
>   pte_t pte;
> - bool was_writable = pte_write(vmf->orig_pte);
> + bool was_writable = pte_savedwrite(vmf->orig_pte);
>   int flags = 0;
>  
>   /*
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index f9c07f54dd62..15f5c174a7c1 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -113,13 +113,13 @@ static unsigned long change_pte_range(struct
> vm_area_struct *vma, pmd_t *pmd,
>   ptent = ptep_modify_prot_start(mm, addr, pte);
>   ptent = pte_modify(ptent, newprot);
>   if (preserve_write)
> - ptent = pte_mkwrite(ptent);
> + ptent = pte_mk_savedwrite(ptent);
>  
>   /* Avoid taking write faults for known dirty pages */
>   if (dirty_accountable && pte_dirty(ptent) &&
>   (pte_soft_dirty(ptent) ||
>    !(vma->vm_flags & VM_SOFTDIRTY))) {
> - ptent = pte_mkwrite(ptent);
> + ptent = pte_mk_savedwrite(ptent);
>   }
>   ptep_modify_prot_commit(mm, addr, pte, ptent);
>   pages++;


linux-next: manual merge of the kvm tree with the powerpc tree

2017-02-13 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the kvm tree got a conflict in:

  arch/powerpc/platforms/pseries/lpar.c

between commit:

  dbcf929c0062 ("powerpc/pseries: Add support for hash table resizing")

from the powerpc tree and commit:

  cc3d2940133d ("powerpc/64: Enable use of radix MMU under hypervisor on 
POWER9")

from the kvm tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/powerpc/platforms/pseries/lpar.c
index c2e13a51f369,0587655aea69..
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@@ -611,112 -609,29 +611,135 @@@ static int __init disable_bulk_remove(c
  
  __setup("bulk_remove=", disable_bulk_remove);
  
 +#define HPT_RESIZE_TIMEOUT1 /* ms */
 +
 +struct hpt_resize_state {
 +  unsigned long shift;
 +  int commit_rc;
 +};
 +
 +static int pseries_lpar_resize_hpt_commit(void *data)
 +{
 +  struct hpt_resize_state *state = data;
 +
 +  state->commit_rc = plpar_resize_hpt_commit(0, state->shift);
 +  if (state->commit_rc != H_SUCCESS)
 +  return -EIO;
 +
 +  /* Hypervisor has transitioned the HTAB, update our globals */
 +  ppc64_pft_size = state->shift;
 +  htab_size_bytes = 1UL << ppc64_pft_size;
 +  htab_hash_mask = (htab_size_bytes >> 7) - 1;
 +
 +  return 0;
 +}
 +
 +/* Must be called in user context */
 +static int pseries_lpar_resize_hpt(unsigned long shift)
 +{
 +  struct hpt_resize_state state = {
 +  .shift = shift,
 +  .commit_rc = H_FUNCTION,
 +  };
 +  unsigned int delay, total_delay = 0;
 +  int rc;
 +  ktime_t t0, t1, t2;
 +
 +  might_sleep();
 +
 +  if (!firmware_has_feature(FW_FEATURE_HPT_RESIZE))
 +  return -ENODEV;
 +
 +  printk(KERN_INFO "lpar: Attempting to resize HPT to shift %lu\n",
 + shift);
 +
 +  t0 = ktime_get();
 +
 +  rc = plpar_resize_hpt_prepare(0, shift);
 +  while (H_IS_LONG_BUSY(rc)) {
 +  delay = get_longbusy_msecs(rc);
 +  total_delay += delay;
 +  if (total_delay > HPT_RESIZE_TIMEOUT) {
 +  /* prepare with shift==0 cancels an in-progress resize 
*/
 +  rc = plpar_resize_hpt_prepare(0, 0);
 +  if (rc != H_SUCCESS)
 +  printk(KERN_WARNING
 + "lpar: Unexpected error %d cancelling 
timed out HPT resize\n",
 + rc);
 +  return -ETIMEDOUT;
 +  }
 +  msleep(delay);
 +  rc = plpar_resize_hpt_prepare(0, shift);
 +  };
 +
 +  switch (rc) {
 +  case H_SUCCESS:
 +  /* Continue on */
 +  break;
 +
 +  case H_PARAMETER:
 +  return -EINVAL;
 +  case H_RESOURCE:
 +  return -EPERM;
 +  default:
 +  printk(KERN_WARNING
 + "lpar: Unexpected error %d from H_RESIZE_HPT_PREPARE\n",
 + rc);
 +  return -EIO;
 +  }
 +
 +  t1 = ktime_get();
 +
 +  rc = stop_machine(pseries_lpar_resize_hpt_commit, , NULL);
 +
 +  t2 = ktime_get();
 +
 +  if (rc != 0) {
 +  switch (state.commit_rc) {
 +  case H_PTEG_FULL:
 +  printk(KERN_WARNING
 + "lpar: Hash collision while resizing HPT\n");
 +  return -ENOSPC;
 +
 +  default:
 +  printk(KERN_WARNING
 + "lpar: Unexpected error %d from 
H_RESIZE_HPT_COMMIT\n",
 + state.commit_rc);
 +  return -EIO;
 +  };
 +  }
 +
 +  printk(KERN_INFO
 + "lpar: HPT resize to shift %lu complete (%lld ms / %lld ms)\n",
 + shift, (long long) ktime_ms_delta(t1, t0),
 + (long long) ktime_ms_delta(t2, t1));
 +
 +  return 0;
 +}
 +
+ /* Actually only used for radix, so far */
+ static int pseries_lpar_register_process_table(unsigned long base,
+   unsigned long page_size, unsigned long table_size)
+ {
+   long rc;
+   unsigned long flags = PROC_TABLE_NEW;
+ 
+   if (radix_enabled())
+   flags |= PROC_TABLE_RADIX | PROC_TABLE_GTSE;
+   for (;;) {
+   rc = plpar_hcall_norets(H_REGISTER_PROC_TBL, flags, base,
+   page_size, table_size);
+   if (!H_IS_LONG_BUSY(rc))
+   break;
+   mdelay(get_longbusy_msecs(rc));
+   }
+  

linux-next: manual merge of the kvm tree with the powerpc tree

2017-02-13 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the kvm tree got a conflict in:

  arch/powerpc/include/asm/prom.h

between commit:

  0de0fb09bbce ("powerpc/pseries: Advertise HPT resizing support via CAS")

from the powerpc tree and commit:

  3f4ab2f83b4e ("powerpc/pseries: Fixes for the "ibm,architecture-vec-5" 
options")

from the kvm tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/powerpc/include/asm/prom.h
index 00fcfcbdd053,8af2546ea593..
--- a/arch/powerpc/include/asm/prom.h
+++ b/arch/powerpc/include/asm/prom.h
@@@ -151,11 -153,17 +153,18 @@@ struct of_drconf_cell 
  #define OV5_XCMO  0x0440  /* Page Coalescing */
  #define OV5_TYPE1_AFFINITY0x0580  /* Type 1 NUMA affinity */
  #define OV5_PRRN  0x0540  /* Platform Resource Reassignment */
 +#define OV5_RESIZE_HPT0x0601  /* Hash Page Table resizing */
- #define OV5_PFO_HW_RNG0x0E80  /* PFO Random Number Generator 
*/
- #define OV5_PFO_HW_8420x0E40  /* PFO Compression Accelerator 
*/
- #define OV5_PFO_HW_ENCR   0x0E20  /* PFO Encryption Accelerator */
- #define OV5_SUB_PROCESSORS0x0F01  /* 1,2,or 4 Sub-Processors supported */
+ #define OV5_PFO_HW_RNG0x1180  /* PFO Random Number Generator 
*/
+ #define OV5_PFO_HW_8420x1140  /* PFO Compression Accelerator 
*/
+ #define OV5_PFO_HW_ENCR   0x1120  /* PFO Encryption Accelerator */
+ #define OV5_SUB_PROCESSORS0x1501  /* 1,2,or 4 Sub-Processors supported */
+ #define OV5_XIVE_EXPLOIT  0x1701  /* XIVE exploitation supported */
+ #define OV5_MMU_RADIX_300 0x1880  /* ISA v3.00 radix MMU supported */
+ #define OV5_MMU_HASH_300  0x1840  /* ISA v3.00 hash MMU supported */
+ #define OV5_MMU_SEGM_RADIX0x1820  /* radix mode (no segmentation) */
+ #define OV5_MMU_PROC_TBL  0x1810  /* hcall selects SLB or proc table */
+ #define OV5_MMU_SLB   0x1800  /* always use SLB */
+ #define OV5_MMU_GTSE  0x1808  /* Guest translation shootdown */
  
  /* Option Vector 6: IBM PAPR hints */
  #define OV6_LINUX 0x02/* Linux is our OS */


Re: [PATCH 1/2] net: fs_enet: Fix an error handling path

2017-02-13 Thread David Miller
From: Christophe JAILLET 
Date: Fri, 10 Feb 2017 21:17:06 +0100

> 'of_node_put(fpi->phy_node)' should also be called if we branch to
> 'out_deregister_fixed_link' error handling path.
> 
> Signed-off-by: Christophe JAILLET 

Applied.


Re: [PATCH 2/2] net: fs_enet: Simplify code

2017-02-13 Thread David Miller
From: Christophe JAILLET 
Date: Fri, 10 Feb 2017 21:17:19 +0100

> There is no need to use an intermediate variable to handle an error code
> in this case.
> 
> Signed-off-by: Christophe JAILLET 

Applied.


linux-next: manual merge of the kvm tree with the powerpc tree

2017-02-13 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the kvm tree got a conflict in:

  arch/powerpc/include/asm/hvcall.h

between commit:

  64b40ffbc830 ("powerpc/pseries: Add hypercall wrappers for hash page table 
resizing")

from the powerpc tree and commit:

  cc3d2940133d ("powerpc/64: Enable use of radix MMU under hypervisor on 
POWER9")

from the kvm tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/powerpc/include/asm/hvcall.h
index 490c4b9f4e3a,54d11b3a6bf7..
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@@ -276,8 -276,7 +276,9 @@@
  #define H_GET_MPP_X   0x314
  #define H_SET_MODE0x31C
  #define H_CLEAR_HPT   0x358
 +#define H_RESIZE_HPT_PREPARE  0x36C
 +#define H_RESIZE_HPT_COMMIT   0x370
+ #define H_REGISTER_PROC_TBL   0x37C
  #define H_SIGNAL_SYS_RESET0x380
  #define MAX_HCALL_OPCODE  H_SIGNAL_SYS_RESET
  


Re: [RFC][PATCH] powerpc/64s: optimise syscall entry with relon hypercalls

2017-02-13 Thread Nicholas Piggin
On Mon, 13 Feb 2017 11:04:06 +
David Laight  wrote:

> From: Nicholas Piggin
> > Sent: 10 February 2017 18:23
> > After bc3551257a ("powerpc/64: Allow for relocation-on interrupts from
> > guest to host"), a getppid() system call goes from 307 cycles to 358
> > cycles (+17%). This is due significantly to the scratch SPR used by the
> > hypercall.
> > 
> > It turns out there are a some volatile registers common to both system
> > call and hypercall (in particular, r12, cr0, ctr), which can be used to
> > avoid the SPR and some other overheads for the system call case. This
> > brings getppid to 320 cycles (+4%).  
> ...
> > + * syscall register convention is in 
> > Documentation/powerpc/syscall64-abi.txt
> > + *
> > + * For hypercalls, the register convention is as follows:
> > + * r0 volatile
> > + * r1-2 nonvolatile
> > + * r3 volatile parameter and return value for status
> > + * r4-r10 volatile input and output value
> > + * r11 volatile hypercall number and output value
> > + * r12 volatile
> > + * r13-r31 nonvolatile
> > + * LR nonvolatile
> > + * CTR volatile
> > + * XER volatile
> > + * CR0-1 CR5-7 volatile
> > + * CR2-4 nonvolatile
> > + * Other registers nonvolatile
> > + *
> > + * The intersection of volatile registers that don't contain possible
> > + * inputs is: r12, cr0, xer, ctr. We may use these as scratch regs
> > + * upon entry without saving.  
> 
> Except that they must surely be set to some known value on exit in order
> to avoid leaking information to the guest.

True. I don't see why that's a problem for the entry code though.

Thanks,
Nick


Re: [PATCH] powerpc/xmon: add debugfs entry for xmon

2017-02-13 Thread Nicholas Piggin
On Mon, 13 Feb 2017 19:00:42 -0200
"Guilherme G. Piccoli"  wrote:

> Currently the xmon debugger is set only via kernel boot command-line.
> It's disabled by default, and can be enabled with "xmon=on" on the
> command-line. Also, xmon may be accessed via sysrq mechanism, but once
> we enter xmon via sysrq,  it's  kept enabled until system is rebooted,
> even if we exit the debugger. A kernel crash will then lead to xmon
> instance, instead of triggering a kdump procedure (if configured), for
> example.
> 
> This patch introduces a debugfs entry for xmon, allowing user to query
> its current state and change it if desired. Basically, the "xmon" file
> to read from/write to is under the debugfs mount point, on powerpc
> directory. Reading this file will provide the current state of the
> debugger, one of the following: "on", "off", "early" or "nobt". Writing
> one of these states to the file will take immediate effect on the debugger.
> 
> Signed-off-by: Guilherme G. Piccoli 
> ---
> * I had this patch partially done for some time, and after a discussion
> at the kernel slack channel latest week, I decided to rebase and fix
> some remaining bugs. I'd change 'x' option to always disable the debugger,
> since with this patch we can always re-enable xmon, but today I noticed
> Pan's patch on the mailing list, so perhaps his approach of adding a flag
> to 'x' option is preferable. I can change this in a V2, if requested.
> Thanks in advance!

xmon state changing after the first sysrq+x violates principle of least
astonishment, so I think that should be fixed.

Then the question is, is it worth making it runtime configurable with xmon
command or debugfs tunables?

Thanks,
Nick


Re: [PATCH v2] powerpc: Blacklist GCC 5.4 6.1 and 6.2

2017-02-13 Thread Segher Boessenkool
On Tue, Feb 14, 2017 at 11:25:43AM +1100, Cyril Bur wrote:
> > > At the time of writing GCC 5.4 is the most recent and is affected. GCC
> > > 6.3 contains the backported fix, has been tested and appears safe to
> > > use.
> > 
> > 6.3 is (of course) the newer release; 5.4 is a maintenance release of
> > a compiler that is a year older.
> 
> Yes. I think the point I was trying to make is that since they
> backported the fix to 5.x and 6.x then I expect that 5.5 will have the
> fix but since it doesn't exist yet, I can't be sure. I'll add something
> to that effect.

The patch has been backported to the GCC 5 branch; it will be part of
any future GCC 5 release (5.5 and later, if any later will happen; 5.5
will).

Don't be so unsure about these things, we aren't *that* incompetent ;-)

> > Please mention the GCC PR # somewhere in the code, too?
> 
> Sure.

Thanks!


Segher


[PATCH] powerpc/mm: Fix build break with RADIX=y & HUGETLBFS=n

2017-02-13 Thread Michael Ellerman
If we enable RADIX but disable HUGETLBFS, the build breaks with:

  arch/powerpc/mm/pgtable-radix.c:557:7: error: implicit declaration of 
function 'pmd_huge'
  arch/powerpc/mm/pgtable-radix.c:588:7: error: implicit declaration of 
function 'pud_huge'

Fix it by stubbing those functions when HUGETLBFS=n.

Fixes: 4b5d62ca17a1 ("powerpc/mm: add radix__remove_section_mapping()")
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/book3s/64/pgtable-4k.h  | 5 +
 arch/powerpc/include/asm/book3s/64/pgtable-64k.h | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h 
b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
index 9db83b4e017d..8708a0239a56 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
@@ -47,7 +47,12 @@ static inline int hugepd_ok(hugepd_t hpd)
return hash__hugepd_ok(hpd);
 }
 #define is_hugepd(hpd) (hugepd_ok(hpd))
+
+#else /* !CONFIG_HUGETLB_PAGE */
+static inline int pmd_huge(pmd_t pmd) { return 0; }
+static inline int pud_huge(pud_t pud) { return 0; }
 #endif /* CONFIG_HUGETLB_PAGE */
+
 #endif /* __ASSEMBLY__ */
 
 #endif /*_ASM_POWERPC_BOOK3S_64_PGTABLE_4K_H */
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable-64k.h 
b/arch/powerpc/include/asm/book3s/64/pgtable-64k.h
index 198aff33c380..2ce4209399ed 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable-64k.h
@@ -46,6 +46,9 @@ static inline int hugepd_ok(hugepd_t hpd)
 }
 #define is_hugepd(pdep)0
 
+#else /* !CONFIG_HUGETLB_PAGE */
+static inline int pmd_huge(pmd_t pmd) { return 0; }
+static inline int pud_huge(pud_t pud) { return 0; }
 #endif /* CONFIG_HUGETLB_PAGE */
 
 static inline int remap_4k_pfn(struct vm_area_struct *vma, unsigned long addr,
-- 
2.7.4



Re: [PATCH 1/5] selftests: Fix selftests build to just build, not run tests

2017-02-13 Thread Michael Ellerman
Michael Ellerman  writes:

> In commit 88baa78d1f31 ("selftests: remove duplicated all and clean
> target"), the "all" target was removed from individual Makefiles and
> added to lib.mk.
>
> However the "all" target was added to lib.mk *after* the existing
> "runtests" target. This means "runtests" becomes the first (default)
> target for most of our Makefiles.
...
>
> Fix it by moving the "all" target to the start of lib.mk, making it the
> default target.
>
> Fixes: 88baa78d1f31 ("selftests: remove duplicated all and clean target")
> Signed-off-by: Michael Ellerman 

Hi Shuah,

Can you please merge this series into linux-next?

The selftests are badly broken otherwise.

cheers


[PATCH v5 15/15] livepatch: allow removal of a disabled patch

2017-02-13 Thread Josh Poimboeuf
From: Miroslav Benes 

Currently we do not allow patch module to unload since there is no
method to determine if a task is still running in the patched code.

The consistency model gives us the way because when the unpatching
finishes we know that all tasks were marked as safe to call an original
function. Thus every new call to the function calls the original code
and at the same time no task can be somewhere in the patched code,
because it had to leave that code to be marked as safe.

We can safely let the patch module go after that.

Completion is used for synchronization between module removal and sysfs
infrastructure in a similar way to commit 942e443127e9 ("module: Fix
mod->mkobj.kobj potentially freed too early").

Note that we still do not allow the removal for immediate model, that is
no consistency model. The module refcount may increase in this case if
somebody disables and enables the patch several times. This should not
cause any harm.

With this change a call to try_module_get() is moved to
__klp_enable_patch from klp_register_patch to make module reference
counting symmetric (module_put() is in a patch disable path) and to
allow to take a new reference to a disabled module when being enabled.

Finally, we need to be very careful about possible races between
klp_unregister_patch(), kobject_put() functions and operations
on the related sysfs files.

kobject_put(>kobj) must be called without klp_mutex. Otherwise,
it might be blocked by enabled_store() that needs the mutex as well.
In addition, enabled_store() must check if the patch was not
unregisted in the meantime.

There is no need to do the same for other kobject_put() callsites
at the moment. Their sysfs operations neither take the lock nor
they access any data that might be freed in the meantime.

There was an attempt to use kobjects the right way and prevent these
races by design. But it made the patch definition more complicated
and opened another can of worms. See
https://lkml.kernel.org/r/1464018848-4303-1-git-send-email-pmla...@suse.com

[Thanks to Petr Mladek for improving the commit message.]

Signed-off-by: Miroslav Benes 
Signed-off-by: Josh Poimboeuf 
Reviewed-by: Petr Mladek 
---
 Documentation/livepatch/livepatch.txt | 28 
 include/linux/livepatch.h |  3 ++
 kernel/livepatch/core.c   | 80 ++-
 kernel/livepatch/transition.c | 12 +-
 samples/livepatch/livepatch-sample.c  |  1 -
 5 files changed, 72 insertions(+), 52 deletions(-)

diff --git a/Documentation/livepatch/livepatch.txt 
b/Documentation/livepatch/livepatch.txt
index 4f2aec8..ecdb181 100644
--- a/Documentation/livepatch/livepatch.txt
+++ b/Documentation/livepatch/livepatch.txt
@@ -316,8 +316,15 @@ section "Livepatch life-cycle" below for more details 
about these
 two operations.
 
 Module removal is only safe when there are no users of the underlying
-functions.  The immediate consistency model is not able to detect this;
-therefore livepatch modules cannot be removed. See "Limitations" below.
+functions. The immediate consistency model is not able to detect this. The
+code just redirects the functions at the very beginning and it does not
+check if the functions are in use. In other words, it knows when the
+functions get called but it does not know when the functions return.
+Therefore it cannot be decided when the livepatch module can be safely
+removed. This is solved by a hybrid consistency model. When the system is
+transitioned to a new patch state (patched/unpatched) it is guaranteed that
+no task sleeps or runs in the old code.
+
 
 5. Livepatch life-cycle
 ===
@@ -469,23 +476,6 @@ The current Livepatch implementation has several 
limitations:
 by "notrace".
 
 
-  + Livepatch modules can not be removed.
-
-The current implementation just redirects the functions at the very
-beginning. It does not check if the functions are in use. In other
-words, it knows when the functions get called but it does not
-know when the functions return. Therefore it can not decide when
-the livepatch module can be safely removed.
-
-This will get most likely solved once a more complex consistency model
-is supported. The idea is that a safe state for patching should also
-mean a safe state for removing the patch.
-
-Note that the patch itself might get disabled by writing zero
-to /sys/kernel/livepatch//enabled. It causes that the new
-code will not longer get called. But it does not guarantee
-that anyone is not sleeping anywhere in the new code.
-
 
   + Livepatch works reliably only when the dynamic ftrace is located at
 the very beginning of the function.
diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
index ed90ad1..194991e 100644
--- a/include/linux/livepatch.h
+++ b/include/linux/livepatch.h
@@ -23,6 +23,7 @@
 
 #include 
 

[PATCH v5 14/15] livepatch: add /proc//patch_state

2017-02-13 Thread Josh Poimboeuf
Expose the per-task patch state value so users can determine which tasks
are holding up completion of a patching operation.

Signed-off-by: Josh Poimboeuf 
Reviewed-by: Petr Mladek 
Reviewed-by: Miroslav Benes 
---
 Documentation/filesystems/proc.txt | 18 ++
 fs/proc/base.c | 15 +++
 2 files changed, 33 insertions(+)

diff --git a/Documentation/filesystems/proc.txt 
b/Documentation/filesystems/proc.txt
index c94b467..9036dbf 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -44,6 +44,7 @@ Table of Contents
   3.8   /proc//fdinfo/ - Information about opened file
   3.9   /proc//map_files - Information about memory mapped files
   3.10  /proc//timerslack_ns - Task timerslack value
+  3.11 /proc//patch_state - Livepatch patch operation state
 
   4Configuring procfs
   4.1  Mount options
@@ -1887,6 +1888,23 @@ Valid values are from 0 - ULLONG_MAX
 An application setting the value must have PTRACE_MODE_ATTACH_FSCREDS level
 permissions on the task specified to change its timerslack_ns value.
 
+3.11   /proc//patch_state - Livepatch patch operation state
+-
+When CONFIG_LIVEPATCH is enabled, this file displays the value of the
+patch state for the task.
+
+A value of '-1' indicates that no patch is in transition.
+
+A value of '0' indicates that a patch is in transition and the task is
+unpatched.  If the patch is being enabled, then the task hasn't been
+patched yet.  If the patch is being disabled, then the task has already
+been unpatched.
+
+A value of '1' indicates that a patch is in transition and the task is
+patched.  If the patch is being enabled, then the task has already been
+patched.  If the patch is being disabled, then the task hasn't been
+unpatched yet.
+
 
 --
 Configuring procfs
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 6e86558..5145f40 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2828,6 +2828,15 @@ static int proc_pid_personality(struct seq_file *m, 
struct pid_namespace *ns,
return err;
 }
 
+#ifdef CONFIG_LIVEPATCH
+static int proc_pid_patch_state(struct seq_file *m, struct pid_namespace *ns,
+   struct pid *pid, struct task_struct *task)
+{
+   seq_printf(m, "%d\n", task->patch_state);
+   return 0;
+}
+#endif /* CONFIG_LIVEPATCH */
+
 /*
  * Thread groups
  */
@@ -2927,6 +2936,9 @@ static const struct pid_entry tgid_base_stuff[] = {
REG("timers", S_IRUGO, proc_timers_operations),
 #endif
REG("timerslack_ns", S_IRUGO|S_IWUGO, 
proc_pid_set_timerslack_ns_operations),
+#ifdef CONFIG_LIVEPATCH
+   ONE("patch_state",  S_IRUSR, proc_pid_patch_state),
+#endif
 };
 
 static int proc_tgid_base_readdir(struct file *file, struct dir_context *ctx)
@@ -3309,6 +3321,9 @@ static const struct pid_entry tid_base_stuff[] = {
REG("projid_map", S_IRUGO|S_IWUSR, proc_projid_map_operations),
REG("setgroups",  S_IRUGO|S_IWUSR, proc_setgroups_operations),
 #endif
+#ifdef CONFIG_LIVEPATCH
+   ONE("patch_state",  S_IRUSR, proc_pid_patch_state),
+#endif
 };
 
 static int proc_tid_base_readdir(struct file *file, struct dir_context *ctx)
-- 
2.7.4



[PATCH v5 13/15] livepatch: change to a per-task consistency model

2017-02-13 Thread Josh Poimboeuf
Change livepatch to use a basic per-task consistency model.  This is the
foundation which will eventually enable us to patch those ~10% of
security patches which change function or data semantics.  This is the
biggest remaining piece needed to make livepatch more generally useful.

This code stems from the design proposal made by Vojtech [1] in November
2014.  It's a hybrid of kGraft and kpatch: it uses kGraft's per-task
consistency and syscall barrier switching combined with kpatch's stack
trace switching.  There are also a number of fallback options which make
it quite flexible.

Patches are applied on a per-task basis, when the task is deemed safe to
switch over.  When a patch is enabled, livepatch enters into a
transition state where tasks are converging to the patched state.
Usually this transition state can complete in a few seconds.  The same
sequence occurs when a patch is disabled, except the tasks converge from
the patched state to the unpatched state.

An interrupt handler inherits the patched state of the task it
interrupts.  The same is true for forked tasks: the child inherits the
patched state of the parent.

Livepatch uses several complementary approaches to determine when it's
safe to patch tasks:

1. The first and most effective approach is stack checking of sleeping
   tasks.  If no affected functions are on the stack of a given task,
   the task is patched.  In most cases this will patch most or all of
   the tasks on the first try.  Otherwise it'll keep trying
   periodically.  This option is only available if the architecture has
   reliable stacks (HAVE_RELIABLE_STACKTRACE).

2. The second approach, if needed, is kernel exit switching.  A
   task is switched when it returns to user space from a system call, a
   user space IRQ, or a signal.  It's useful in the following cases:

   a) Patching I/O-bound user tasks which are sleeping on an affected
  function.  In this case you have to send SIGSTOP and SIGCONT to
  force it to exit the kernel and be patched.
   b) Patching CPU-bound user tasks.  If the task is highly CPU-bound
  then it will get patched the next time it gets interrupted by an
  IRQ.
   c) In the future it could be useful for applying patches for
  architectures which don't yet have HAVE_RELIABLE_STACKTRACE.  In
  this case you would have to signal most of the tasks on the
  system.  However this isn't supported yet because there's
  currently no way to patch kthreads without
  HAVE_RELIABLE_STACKTRACE.

3. For idle "swapper" tasks, since they don't ever exit the kernel, they
   instead have a klp_update_patch_state() call in the idle loop which
   allows them to be patched before the CPU enters the idle state.

   (Note there's not yet such an approach for kthreads.)

All the above approaches may be skipped by setting the 'immediate' flag
in the 'klp_patch' struct, which will disable per-task consistency and
patch all tasks immediately.  This can be useful if the patch doesn't
change any function or data semantics.  Note that, even with this flag
set, it's possible that some tasks may still be running with an old
version of the function, until that function returns.

There's also an 'immediate' flag in the 'klp_func' struct which allows
you to specify that certain functions in the patch can be applied
without per-task consistency.  This might be useful if you want to patch
a common function like schedule(), and the function change doesn't need
consistency but the rest of the patch does.

For architectures which don't have HAVE_RELIABLE_STACKTRACE, the user
must set patch->immediate which causes all tasks to be patched
immediately.  This option should be used with care, only when the patch
doesn't change any function or data semantics.

In the future, architectures which don't have HAVE_RELIABLE_STACKTRACE
may be allowed to use per-task consistency if we can come up with
another way to patch kthreads.

The /sys/kernel/livepatch//transition file shows whether a patch
is in transition.  Only a single patch (the topmost patch on the stack)
can be in transition at a given time.  A patch can remain in transition
indefinitely, if any of the tasks are stuck in the initial patch state.

A transition can be reversed and effectively canceled by writing the
opposite value to the /sys/kernel/livepatch//enabled file while
the transition is in progress.  Then all the tasks will attempt to
converge back to the original patch state.

[1] https://lkml.kernel.org/r/20141107140458.ga21...@suse.cz

Signed-off-by: Josh Poimboeuf 
---
 Documentation/ABI/testing/sysfs-kernel-livepatch |   8 +
 Documentation/livepatch/livepatch.txt| 186 +++-
 include/linux/init_task.h|   9 +
 include/linux/livepatch.h|  42 +-
 include/linux/sched.h|   3 +
 kernel/fork.c|   3 +
 kernel/livepatch/Makefile| 

[PATCH v5 12/15] livepatch: store function sizes

2017-02-13 Thread Josh Poimboeuf
For the consistency model we'll need to know the sizes of the old and
new functions to determine if they're on the stacks of any tasks.

Signed-off-by: Josh Poimboeuf 
Acked-by: Miroslav Benes 
Reviewed-by: Petr Mladek 
Reviewed-by: Kamalesh Babulal 
---
 include/linux/livepatch.h |  3 +++
 kernel/livepatch/core.c   | 16 
 2 files changed, 19 insertions(+)

diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
index 9787a63..6602b34 100644
--- a/include/linux/livepatch.h
+++ b/include/linux/livepatch.h
@@ -37,6 +37,8 @@
  * @old_addr:  the address of the function being patched
  * @kobj:  kobject for sysfs resources
  * @stack_node:list node for klp_ops func_stack list
+ * @old_size:  size of the old function
+ * @new_size:  size of the new function
  * @patched:   the func has been added to the klp_ops list
  */
 struct klp_func {
@@ -56,6 +58,7 @@ struct klp_func {
unsigned long old_addr;
struct kobject kobj;
struct list_head stack_node;
+   unsigned long old_size, new_size;
bool patched;
 };
 
diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index 83c4949..10ba3a1 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -584,6 +584,22 @@ static int klp_init_object_loaded(struct klp_patch *patch,
 >old_addr);
if (ret)
return ret;
+
+   ret = kallsyms_lookup_size_offset(func->old_addr,
+ >old_size, NULL);
+   if (!ret) {
+   pr_err("kallsyms size lookup failed for '%s'\n",
+  func->old_name);
+   return -ENOENT;
+   }
+
+   ret = kallsyms_lookup_size_offset((unsigned long)func->new_func,
+ >new_size, NULL);
+   if (!ret) {
+   pr_err("kallsyms size lookup failed for '%s' 
replacement\n",
+  func->old_name);
+   return -ENOENT;
+   }
}
 
return 0;
-- 
2.7.4



[PATCH v5 11/15] livepatch: use kstrtobool() in enabled_store()

2017-02-13 Thread Josh Poimboeuf
The sysfs enabled value is a boolean, so kstrtobool() is a better fit
for parsing the input string since it does the range checking for us.

Suggested-by: Petr Mladek 
Signed-off-by: Josh Poimboeuf 
Acked-by: Miroslav Benes 
Reviewed-by: Petr Mladek 
---
 kernel/livepatch/core.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index 6a137e1..83c4949 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -408,26 +408,23 @@ static ssize_t enabled_store(struct kobject *kobj, struct 
kobj_attribute *attr,
 {
struct klp_patch *patch;
int ret;
-   unsigned long val;
+   bool enabled;
 
-   ret = kstrtoul(buf, 10, );
+   ret = kstrtobool(buf, );
if (ret)
-   return -EINVAL;
-
-   if (val > 1)
-   return -EINVAL;
+   return ret;
 
patch = container_of(kobj, struct klp_patch, kobj);
 
mutex_lock(_mutex);
 
-   if (patch->enabled == val) {
+   if (patch->enabled == enabled) {
/* already in requested state */
ret = -EINVAL;
goto err;
}
 
-   if (val) {
+   if (enabled) {
ret = __klp_enable_patch(patch);
if (ret)
goto err;
-- 
2.7.4



[PATCH v5 10/15] livepatch: move patching functions into patch.c

2017-02-13 Thread Josh Poimboeuf
Move functions related to the actual patching of functions and objects
into a new patch.c file.

Signed-off-by: Josh Poimboeuf 
Acked-by: Miroslav Benes 
Reviewed-by: Petr Mladek 
Reviewed-by: Kamalesh Babulal 
---
 kernel/livepatch/Makefile |   2 +-
 kernel/livepatch/core.c   | 202 +--
 kernel/livepatch/patch.c  | 213 ++
 kernel/livepatch/patch.h  |  32 +++
 4 files changed, 247 insertions(+), 202 deletions(-)
 create mode 100644 kernel/livepatch/patch.c
 create mode 100644 kernel/livepatch/patch.h

diff --git a/kernel/livepatch/Makefile b/kernel/livepatch/Makefile
index e8780c0..e136dad 100644
--- a/kernel/livepatch/Makefile
+++ b/kernel/livepatch/Makefile
@@ -1,3 +1,3 @@
 obj-$(CONFIG_LIVEPATCH) += livepatch.o
 
-livepatch-objs := core.o
+livepatch-objs := core.o patch.o
diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index 47ed643..6a137e1 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -24,32 +24,13 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
-
-/**
- * struct klp_ops - structure for tracking registered ftrace ops structs
- *
- * A single ftrace_ops is shared between all enabled replacement functions
- * (klp_func structs) which have the same old_addr.  This allows the switch
- * between function versions to happen instantaneously by updating the klp_ops
- * struct's func_stack list.  The winner is the klp_func at the top of the
- * func_stack (front of the list).
- *
- * @node:  node for the global klp_ops list
- * @func_stack:list head for the stack of klp_func's (active func is 
on top)
- * @fops:  registered ftrace ops struct
- */
-struct klp_ops {
-   struct list_head node;
-   struct list_head func_stack;
-   struct ftrace_ops fops;
-};
+#include "patch.h"
 
 /*
  * The klp_mutex protects the global lists and state transitions of any
@@ -60,28 +41,12 @@ struct klp_ops {
 static DEFINE_MUTEX(klp_mutex);
 
 static LIST_HEAD(klp_patches);
-static LIST_HEAD(klp_ops);
 
 static struct kobject *klp_root_kobj;
 
 /* TODO: temporary stub */
 void klp_update_patch_state(struct task_struct *task) {}
 
-static struct klp_ops *klp_find_ops(unsigned long old_addr)
-{
-   struct klp_ops *ops;
-   struct klp_func *func;
-
-   list_for_each_entry(ops, _ops, node) {
-   func = list_first_entry(>func_stack, struct klp_func,
-   stack_node);
-   if (func->old_addr == old_addr)
-   return ops;
-   }
-
-   return NULL;
-}
-
 static bool klp_is_module(struct klp_object *obj)
 {
return obj->name;
@@ -314,171 +279,6 @@ static int klp_write_object_relocations(struct module 
*pmod,
return ret;
 }
 
-static void notrace klp_ftrace_handler(unsigned long ip,
-  unsigned long parent_ip,
-  struct ftrace_ops *fops,
-  struct pt_regs *regs)
-{
-   struct klp_ops *ops;
-   struct klp_func *func;
-
-   ops = container_of(fops, struct klp_ops, fops);
-
-   rcu_read_lock();
-   func = list_first_or_null_rcu(>func_stack, struct klp_func,
- stack_node);
-   if (WARN_ON_ONCE(!func))
-   goto unlock;
-
-   klp_arch_set_pc(regs, (unsigned long)func->new_func);
-unlock:
-   rcu_read_unlock();
-}
-
-/*
- * Convert a function address into the appropriate ftrace location.
- *
- * Usually this is just the address of the function, but on some architectures
- * it's more complicated so allow them to provide a custom behaviour.
- */
-#ifndef klp_get_ftrace_location
-static unsigned long klp_get_ftrace_location(unsigned long faddr)
-{
-   return faddr;
-}
-#endif
-
-static void klp_unpatch_func(struct klp_func *func)
-{
-   struct klp_ops *ops;
-
-   if (WARN_ON(!func->patched))
-   return;
-   if (WARN_ON(!func->old_addr))
-   return;
-
-   ops = klp_find_ops(func->old_addr);
-   if (WARN_ON(!ops))
-   return;
-
-   if (list_is_singular(>func_stack)) {
-   unsigned long ftrace_loc;
-
-   ftrace_loc = klp_get_ftrace_location(func->old_addr);
-   if (WARN_ON(!ftrace_loc))
-   return;
-
-   WARN_ON(unregister_ftrace_function(>fops));
-   WARN_ON(ftrace_set_filter_ip(>fops, ftrace_loc, 1, 0));
-
-   list_del_rcu(>stack_node);
-   list_del(>node);
-   kfree(ops);
-   } else {
-   list_del_rcu(>stack_node);
-   }
-
-   func->patched = false;
-}
-
-static int klp_patch_func(struct klp_func *func)
-{
-   struct klp_ops *ops;
-   int ret;
-
-

[PATCH v5 09/15] livepatch: remove unnecessary object loaded check

2017-02-13 Thread Josh Poimboeuf
klp_patch_object()'s callers already ensure that the object is loaded,
so its call to klp_is_object_loaded() is unnecessary.

This will also make it possible to move the patching code into a
separate file.

Signed-off-by: Josh Poimboeuf 
Acked-by: Miroslav Benes 
Reviewed-by: Petr Mladek 
Reviewed-by: Kamalesh Babulal 
---
 kernel/livepatch/core.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index 2dbd355..47ed643 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -467,9 +467,6 @@ static int klp_patch_object(struct klp_object *obj)
if (WARN_ON(obj->patched))
return -EINVAL;
 
-   if (WARN_ON(!klp_is_object_loaded(obj)))
-   return -EINVAL;
-
klp_for_each_func(obj, func) {
ret = klp_patch_func(func);
if (ret) {
-- 
2.7.4



[PATCH v5 08/15] livepatch: separate enabled and patched states

2017-02-13 Thread Josh Poimboeuf
Once we have a consistency model, patches and their objects will be
enabled and disabled at different times.  For example, when a patch is
disabled, its loaded objects' funcs can remain registered with ftrace
indefinitely until the unpatching operation is complete and they're no
longer in use.

It's less confusing if we give them different names: patches can be
enabled or disabled; objects (and their funcs) can be patched or
unpatched:

- Enabled means that a patch is logically enabled (but not necessarily
  fully applied).

- Patched means that an object's funcs are registered with ftrace and
  added to the klp_ops func stack.

Also, since these states are binary, represent them with booleans
instead of ints.

Signed-off-by: Josh Poimboeuf 
Acked-by: Miroslav Benes 
Reviewed-by: Petr Mladek 
Reviewed-by: Kamalesh Babulal 
---
 include/linux/livepatch.h | 17 ---
 kernel/livepatch/core.c   | 72 +++
 2 files changed, 42 insertions(+), 47 deletions(-)

diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
index 5cc20e5..9787a63 100644
--- a/include/linux/livepatch.h
+++ b/include/linux/livepatch.h
@@ -28,11 +28,6 @@
 
 #include 
 
-enum klp_state {
-   KLP_DISABLED,
-   KLP_ENABLED
-};
-
 /**
  * struct klp_func - function structure for live patching
  * @old_name:  name of the function to be patched
@@ -41,8 +36,8 @@ enum klp_state {
  * can be found (optional)
  * @old_addr:  the address of the function being patched
  * @kobj:  kobject for sysfs resources
- * @state: tracks function-level patch application state
  * @stack_node:list node for klp_ops func_stack list
+ * @patched:   the func has been added to the klp_ops list
  */
 struct klp_func {
/* external */
@@ -60,8 +55,8 @@ struct klp_func {
/* internal */
unsigned long old_addr;
struct kobject kobj;
-   enum klp_state state;
struct list_head stack_node;
+   bool patched;
 };
 
 /**
@@ -71,7 +66,7 @@ struct klp_func {
  * @kobj:  kobject for sysfs resources
  * @mod:   kernel module associated with the patched object
  * (NULL for vmlinux)
- * @state: tracks object-level patch application state
+ * @patched:   the object's funcs have been added to the klp_ops list
  */
 struct klp_object {
/* external */
@@ -81,7 +76,7 @@ struct klp_object {
/* internal */
struct kobject kobj;
struct module *mod;
-   enum klp_state state;
+   bool patched;
 };
 
 /**
@@ -90,7 +85,7 @@ struct klp_object {
  * @objs:  object entries for kernel objects to be patched
  * @list:  list node for global list of registered patches
  * @kobj:  kobject for sysfs resources
- * @state: tracks patch-level application state
+ * @enabled:   the patch is enabled (but operation may be incomplete)
  */
 struct klp_patch {
/* external */
@@ -100,7 +95,7 @@ struct klp_patch {
/* internal */
struct list_head list;
struct kobject kobj;
-   enum klp_state state;
+   bool enabled;
 };
 
 #define klp_for_each_object(patch, obj) \
diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index 217b39d..2dbd355 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -348,11 +348,11 @@ static unsigned long klp_get_ftrace_location(unsigned 
long faddr)
 }
 #endif
 
-static void klp_disable_func(struct klp_func *func)
+static void klp_unpatch_func(struct klp_func *func)
 {
struct klp_ops *ops;
 
-   if (WARN_ON(func->state != KLP_ENABLED))
+   if (WARN_ON(!func->patched))
return;
if (WARN_ON(!func->old_addr))
return;
@@ -378,10 +378,10 @@ static void klp_disable_func(struct klp_func *func)
list_del_rcu(>stack_node);
}
 
-   func->state = KLP_DISABLED;
+   func->patched = false;
 }
 
-static int klp_enable_func(struct klp_func *func)
+static int klp_patch_func(struct klp_func *func)
 {
struct klp_ops *ops;
int ret;
@@ -389,7 +389,7 @@ static int klp_enable_func(struct klp_func *func)
if (WARN_ON(!func->old_addr))
return -EINVAL;
 
-   if (WARN_ON(func->state != KLP_DISABLED))
+   if (WARN_ON(func->patched))
return -EINVAL;
 
ops = klp_find_ops(func->old_addr);
@@ -437,7 +437,7 @@ static int klp_enable_func(struct klp_func *func)
list_add_rcu(>stack_node, >func_stack);
}
 
-   func->state = KLP_ENABLED;
+   func->patched = true;
 
return 0;
 
@@ -448,36 +448,36 @@ static int klp_enable_func(struct klp_func *func)
return ret;
 }
 
-static void klp_disable_object(struct klp_object *obj)
+static void klp_unpatch_object(struct klp_object *obj)
 {
struct klp_func *func;
 
klp_for_each_func(obj, func)
-   if 

[PATCH v5 07/15] livepatch/s390: add TIF_PATCH_PENDING thread flag

2017-02-13 Thread Josh Poimboeuf
From: Miroslav Benes 

Update a task's patch state when returning from a system call or user
space interrupt, or after handling a signal.

This greatly increases the chances of a patch operation succeeding.  If
a task is I/O bound, it can be patched when returning from a system
call.  If a task is CPU bound, it can be patched when returning from an
interrupt.  If a task is sleeping on a to-be-patched function, the user
can send SIGSTOP and SIGCONT to force it to switch.

Since there are two ways the syscall can be restarted on return from a
signal handling process, it is important to clear the flag before
do_signal() is called. Otherwise we could miss the migration if we used
SIGSTOP/SIGCONT procedure or fake signal to migrate patching blocking
tasks. If we place our hook to sysc_work label in entry before
TIF_SIGPENDING is evaluated we kill two birds with one stone. The task
is correctly migrated in all return paths from a syscall.

Signed-off-by: Miroslav Benes 
Signed-off-by: Josh Poimboeuf 
---
 arch/s390/include/asm/thread_info.h |  2 ++
 arch/s390/kernel/entry.S| 31 ++-
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/thread_info.h 
b/arch/s390/include/asm/thread_info.h
index 4977668..646845e 100644
--- a/arch/s390/include/asm/thread_info.h
+++ b/arch/s390/include/asm/thread_info.h
@@ -56,6 +56,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct 
task_struct *src);
 #define TIF_SIGPENDING 1   /* signal pending */
 #define TIF_NEED_RESCHED   2   /* rescheduling necessary */
 #define TIF_UPROBE 3   /* breakpointed or single-stepping */
+#define TIF_PATCH_PENDING  4   /* pending live patching update */
 
 #define TIF_31BIT  16  /* 32bit process */
 #define TIF_MEMDIE 17  /* is terminating due to OOM killer */
@@ -74,6 +75,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct 
task_struct *src);
 #define _TIF_SIGPENDING_BITUL(TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED  _BITUL(TIF_NEED_RESCHED)
 #define _TIF_UPROBE_BITUL(TIF_UPROBE)
+#define _TIF_PATCH_PENDING _BITUL(TIF_PATCH_PENDING)
 
 #define _TIF_31BIT _BITUL(TIF_31BIT)
 #define _TIF_SINGLE_STEP   _BITUL(TIF_SINGLE_STEP)
diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
index 34ab7e8..9a15eac 100644
--- a/arch/s390/kernel/entry.S
+++ b/arch/s390/kernel/entry.S
@@ -47,7 +47,7 @@ STACK_SIZE  = 1 << STACK_SHIFT
 STACK_INIT = STACK_SIZE - STACK_FRAME_OVERHEAD - __PT_SIZE
 
 _TIF_WORK  = (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_NEED_RESCHED | \
-  _TIF_UPROBE)
+  _TIF_UPROBE | _TIF_PATCH_PENDING)
 _TIF_TRACE = (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | _TIF_SECCOMP | \
   _TIF_SYSCALL_TRACEPOINT)
 _CIF_WORK  = (_CIF_MCCK_PENDING | _CIF_ASCE | _CIF_FPU)
@@ -333,6 +333,11 @@ ENTRY(system_call)
 #endif
TSTMSK  __PT_FLAGS(%r11),_PIF_PER_TRAP
jo  .Lsysc_singlestep
+#ifdef CONFIG_LIVEPATCH
+   TSTMSK  __TI_flags(%r12),_TIF_PATCH_PENDING
+   jo  .Lsysc_patch_pending# handle live patching just before
+   # signals and possible syscall restart
+#endif
TSTMSK  __TI_flags(%r12),_TIF_SIGPENDING
jo  .Lsysc_sigpending
TSTMSK  __TI_flags(%r12),_TIF_NOTIFY_RESUME
@@ -405,6 +410,16 @@ ENTRY(system_call)
 #endif
 
 #
+# _TIF_PATCH_PENDING is set, call klp_update_patch_state
+#
+#ifdef CONFIG_LIVEPATCH
+.Lsysc_patch_pending:
+   lg  %r2,__LC_CURRENT# pass pointer to task struct
+   larl%r14,.Lsysc_return
+   jg  klp_update_patch_state
+#endif
+
+#
 # _PIF_PER_TRAP is set, call do_per_trap
 #
 .Lsysc_singlestep:
@@ -654,6 +669,10 @@ ENTRY(io_int_handler)
jo  .Lio_mcck_pending
TSTMSK  __TI_flags(%r12),_TIF_NEED_RESCHED
jo  .Lio_reschedule
+#ifdef CONFIG_LIVEPATCH
+   TSTMSK  __TI_flags(%r12),_TIF_PATCH_PENDING
+   jo  .Lio_patch_pending
+#endif
TSTMSK  __TI_flags(%r12),_TIF_SIGPENDING
jo  .Lio_sigpending
TSTMSK  __TI_flags(%r12),_TIF_NOTIFY_RESUME
@@ -700,6 +719,16 @@ ENTRY(io_int_handler)
j   .Lio_return
 
 #
+# _TIF_PATCH_PENDING is set, call klp_update_patch_state
+#
+#ifdef CONFIG_LIVEPATCH
+.Lio_patch_pending:
+   lg  %r2,__LC_CURRENT# pass pointer to task struct
+   larl%r14,.Lio_return
+   jg  klp_update_patch_state
+#endif
+
+#
 # _TIF_SIGPENDING or is set, call do_signal
 #
 .Lio_sigpending:
-- 
2.7.4



[PATCH v5 06/15] livepatch/s390: reorganize TIF thread flag bits

2017-02-13 Thread Josh Poimboeuf
From: Jiri Slaby 

Group the TIF thread flag bits by their inclusion in the _TIF_WORK and
_TIF_TRACE macros.

Signed-off-by: Jiri Slaby 
Signed-off-by: Josh Poimboeuf 
Reviewed-by: Miroslav Benes 
---
 arch/s390/include/asm/thread_info.h | 22 ++
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/s390/include/asm/thread_info.h 
b/arch/s390/include/asm/thread_info.h
index a5b54a4..4977668 100644
--- a/arch/s390/include/asm/thread_info.h
+++ b/arch/s390/include/asm/thread_info.h
@@ -51,14 +51,12 @@ int arch_dup_task_struct(struct task_struct *dst, struct 
task_struct *src);
 /*
  * thread information flags bit numbers
  */
+/* _TIF_WORK bits */
 #define TIF_NOTIFY_RESUME  0   /* callback before returning to user */
 #define TIF_SIGPENDING 1   /* signal pending */
 #define TIF_NEED_RESCHED   2   /* rescheduling necessary */
-#define TIF_SYSCALL_TRACE  3   /* syscall trace active */
-#define TIF_SYSCALL_AUDIT  4   /* syscall auditing active */
-#define TIF_SECCOMP5   /* secure computing */
-#define TIF_SYSCALL_TRACEPOINT 6   /* syscall tracepoint instrumentation */
-#define TIF_UPROBE 7   /* breakpointed or single-stepping */
+#define TIF_UPROBE 3   /* breakpointed or single-stepping */
+
 #define TIF_31BIT  16  /* 32bit process */
 #define TIF_MEMDIE 17  /* is terminating due to OOM killer */
 #define TIF_RESTORE_SIGMASK18  /* restore signal mask in do_signal() */
@@ -66,15 +64,23 @@ int arch_dup_task_struct(struct task_struct *dst, struct 
task_struct *src);
 #define TIF_BLOCK_STEP 20  /* This task is block stepped */
 #define TIF_UPROBE_SINGLESTEP  21  /* This task is uprobe single stepped */
 
+/* _TIF_TRACE bits */
+#define TIF_SYSCALL_TRACE  24  /* syscall trace active */
+#define TIF_SYSCALL_AUDIT  25  /* syscall auditing active */
+#define TIF_SECCOMP26  /* secure computing */
+#define TIF_SYSCALL_TRACEPOINT 27  /* syscall tracepoint instrumentation */
+
 #define _TIF_NOTIFY_RESUME _BITUL(TIF_NOTIFY_RESUME)
 #define _TIF_SIGPENDING_BITUL(TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED  _BITUL(TIF_NEED_RESCHED)
+#define _TIF_UPROBE_BITUL(TIF_UPROBE)
+
+#define _TIF_31BIT _BITUL(TIF_31BIT)
+#define _TIF_SINGLE_STEP   _BITUL(TIF_SINGLE_STEP)
+
 #define _TIF_SYSCALL_TRACE _BITUL(TIF_SYSCALL_TRACE)
 #define _TIF_SYSCALL_AUDIT _BITUL(TIF_SYSCALL_AUDIT)
 #define _TIF_SECCOMP   _BITUL(TIF_SECCOMP)
 #define _TIF_SYSCALL_TRACEPOINT_BITUL(TIF_SYSCALL_TRACEPOINT)
-#define _TIF_UPROBE_BITUL(TIF_UPROBE)
-#define _TIF_31BIT _BITUL(TIF_31BIT)
-#define _TIF_SINGLE_STEP   _BITUL(TIF_SINGLE_STEP)
 
 #endif /* _ASM_THREAD_INFO_H */
-- 
2.7.4



[PATCH v5 05/15] livepatch/powerpc: add TIF_PATCH_PENDING thread flag

2017-02-13 Thread Josh Poimboeuf
Add the TIF_PATCH_PENDING thread flag to enable the new livepatch
per-task consistency model for powerpc.  The bit getting set indicates
the thread has a pending patch which needs to be applied when the thread
exits the kernel.

The bit is included in the _TIF_USER_WORK_MASK macro so that
do_notify_resume() and klp_update_patch_state() get called when the bit
is set.

Signed-off-by: Josh Poimboeuf 
Reviewed-by: Petr Mladek 
Reviewed-by: Miroslav Benes 
Reviewed-by: Kamalesh Babulal 
---
 arch/powerpc/include/asm/thread_info.h | 4 +++-
 arch/powerpc/kernel/signal.c   | 4 
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index 87e4b2d..6fc6464 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -92,6 +92,7 @@ static inline struct thread_info *current_thread_info(void)
   TIF_NEED_RESCHED */
 #define TIF_32BIT  4   /* 32 bit binary */
 #define TIF_RESTORE_TM 5   /* need to restore TM FP/VEC/VSX */
+#define TIF_PATCH_PENDING  6   /* pending live patching update */
 #define TIF_SYSCALL_AUDIT  7   /* syscall auditing active */
 #define TIF_SINGLESTEP 8   /* singlestepping active */
 #define TIF_NOHZ   9   /* in adaptive nohz mode */
@@ -115,6 +116,7 @@ static inline struct thread_info *current_thread_info(void)
 #define _TIF_POLLING_NRFLAG(1<

[PATCH v5 04/15] livepatch/x86: add TIF_PATCH_PENDING thread flag

2017-02-13 Thread Josh Poimboeuf
Add the TIF_PATCH_PENDING thread flag to enable the new livepatch
per-task consistency model for x86_64.  The bit getting set indicates
the thread has a pending patch which needs to be applied when the thread
exits the kernel.

The bit is placed in the _TIF_ALLWORK_MASK macro, which results in
exit_to_usermode_loop() calling klp_update_patch_state() when it's set.

Signed-off-by: Josh Poimboeuf 
Acked-by: Andy Lutomirski 
Reviewed-by: Petr Mladek 
Reviewed-by: Miroslav Benes 
Reviewed-by: Kamalesh Babulal 
---
 arch/x86/entry/common.c| 9 ++---
 arch/x86/include/asm/thread_info.h | 4 +++-
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index b83c61c..6a9d564 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -129,14 +130,13 @@ static long syscall_trace_enter(struct pt_regs *regs)
 
 #define EXIT_TO_USERMODE_LOOP_FLAGS\
(_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE |   \
-_TIF_NEED_RESCHED | _TIF_USER_RETURN_NOTIFY)
+_TIF_NEED_RESCHED | _TIF_USER_RETURN_NOTIFY | _TIF_PATCH_PENDING)
 
 static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags)
 {
/*
 * In order to return to user mode, we need to have IRQs off with
-* none of _TIF_SIGPENDING, _TIF_NOTIFY_RESUME, _TIF_USER_RETURN_NOTIFY,
-* _TIF_UPROBE, or _TIF_NEED_RESCHED set.  Several of these flags
+* none of EXIT_TO_USERMODE_LOOP_FLAGS set.  Several of these flags
 * can be set at any time on preemptable kernels if we have IRQs on,
 * so we need to loop.  Disabling preemption wouldn't help: doing the
 * work to clear some of the flags can sleep.
@@ -163,6 +163,9 @@ static void exit_to_usermode_loop(struct pt_regs *regs, u32 
cached_flags)
if (cached_flags & _TIF_USER_RETURN_NOTIFY)
fire_user_return_notifiers();
 
+   if (cached_flags & _TIF_PATCH_PENDING)
+   klp_update_patch_state(current);
+
/* Disable IRQs and retry */
local_irq_disable();
 
diff --git a/arch/x86/include/asm/thread_info.h 
b/arch/x86/include/asm/thread_info.h
index 207d0d9..83372dc 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -84,6 +84,7 @@ struct thread_info {
 #define TIF_SECCOMP8   /* secure computing */
 #define TIF_USER_RETURN_NOTIFY 11  /* notify kernel of userspace return */
 #define TIF_UPROBE 12  /* breakpointed or singlestepping */
+#define TIF_PATCH_PENDING  13  /* pending live patching update */
 #define TIF_NOTSC  16  /* TSC is not accessible in userland */
 #define TIF_IA32   17  /* IA32 compatibility process */
 #define TIF_NOHZ   19  /* in adaptive nohz mode */
@@ -107,6 +108,7 @@ struct thread_info {
 #define _TIF_SECCOMP   (1 << TIF_SECCOMP)
 #define _TIF_USER_RETURN_NOTIFY(1 << TIF_USER_RETURN_NOTIFY)
 #define _TIF_UPROBE(1 << TIF_UPROBE)
+#define _TIF_PATCH_PENDING (1 << TIF_PATCH_PENDING)
 #define _TIF_NOTSC (1 << TIF_NOTSC)
 #define _TIF_IA32  (1 << TIF_IA32)
 #define _TIF_NOHZ  (1 << TIF_NOHZ)
@@ -133,7 +135,7 @@ struct thread_info {
(_TIF_SYSCALL_TRACE | _TIF_NOTIFY_RESUME | _TIF_SIGPENDING |\
 _TIF_NEED_RESCHED | _TIF_SINGLESTEP | _TIF_SYSCALL_EMU |   \
 _TIF_SYSCALL_AUDIT | _TIF_USER_RETURN_NOTIFY | _TIF_UPROBE |   \
-_TIF_NOHZ | _TIF_SYSCALL_TRACEPOINT)
+_TIF_PATCH_PENDING | _TIF_NOHZ | _TIF_SYSCALL_TRACEPOINT)
 
 /* flags to check in __switch_to() */
 #define _TIF_WORK_CTXSW
\
-- 
2.7.4



[PATCH v5 03/15] livepatch: create temporary klp_update_patch_state() stub

2017-02-13 Thread Josh Poimboeuf
Create temporary stubs for klp_update_patch_state() so we can add
TIF_PATCH_PENDING to different architectures in separate patches without
breaking build bisectability.

Signed-off-by: Josh Poimboeuf 
Reviewed-by: Petr Mladek 
---
 include/linux/livepatch.h | 5 -
 kernel/livepatch/core.c   | 3 +++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
index 9072f04..5cc20e5 100644
--- a/include/linux/livepatch.h
+++ b/include/linux/livepatch.h
@@ -123,10 +123,13 @@ void arch_klp_init_object_loaded(struct klp_patch *patch,
 int klp_module_coming(struct module *mod);
 void klp_module_going(struct module *mod);
 
+void klp_update_patch_state(struct task_struct *task);
+
 #else /* !CONFIG_LIVEPATCH */
 
 static inline int klp_module_coming(struct module *mod) { return 0; }
-static inline void klp_module_going(struct module *mod) { }
+static inline void klp_module_going(struct module *mod) {}
+static inline void klp_update_patch_state(struct task_struct *task) {}
 
 #endif /* CONFIG_LIVEPATCH */
 
diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index af46438..217b39d 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -64,6 +64,9 @@ static LIST_HEAD(klp_ops);
 
 static struct kobject *klp_root_kobj;
 
+/* TODO: temporary stub */
+void klp_update_patch_state(struct task_struct *task) {}
+
 static struct klp_ops *klp_find_ops(unsigned long old_addr)
 {
struct klp_ops *ops;
-- 
2.7.4



[PATCH v5 02/15] x86/entry: define _TIF_ALLWORK_MASK flags explicitly

2017-02-13 Thread Josh Poimboeuf
The _TIF_ALLWORK_MASK macro automatically includes the least-significant
16 bits of the thread_info flags, which is less than obvious and tends
to create confusion and surprises when reading or modifying the code.

Define the flags explicitly.

Signed-off-by: Josh Poimboeuf 
Reviewed-by: Petr Mladek 
Reviewed-by: Miroslav Benes 
Reviewed-by: Kamalesh Babulal 
---
 arch/x86/include/asm/thread_info.h | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/thread_info.h 
b/arch/x86/include/asm/thread_info.h
index ad6f5eb0..207d0d9 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -73,9 +73,6 @@ struct thread_info {
  * thread information flags
  * - these are process state flags that various assembly files
  *   may need to access
- * - pending work-to-be-done flags are in LSW
- * - other flags in MSW
- * Warning: layout of LSW is hardcoded in entry.S
  */
 #define TIF_SYSCALL_TRACE  0   /* syscall trace active */
 #define TIF_NOTIFY_RESUME  1   /* callback before returning to user */
@@ -103,8 +100,8 @@ struct thread_info {
 #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
 #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
 #define _TIF_SIGPENDING(1 << TIF_SIGPENDING)
-#define _TIF_SINGLESTEP(1 << TIF_SINGLESTEP)
 #define _TIF_NEED_RESCHED  (1 << TIF_NEED_RESCHED)
+#define _TIF_SINGLESTEP(1 << TIF_SINGLESTEP)
 #define _TIF_SYSCALL_EMU   (1 << TIF_SYSCALL_EMU)
 #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT)
 #define _TIF_SECCOMP   (1 << TIF_SECCOMP)
@@ -133,8 +130,10 @@ struct thread_info {
 
 /* work to do on any return to user space */
 #define _TIF_ALLWORK_MASK  \
-   ((0x & ~_TIF_SECCOMP) | _TIF_SYSCALL_TRACEPOINT |   \
-   _TIF_NOHZ)
+   (_TIF_SYSCALL_TRACE | _TIF_NOTIFY_RESUME | _TIF_SIGPENDING |\
+_TIF_NEED_RESCHED | _TIF_SINGLESTEP | _TIF_SYSCALL_EMU |   \
+_TIF_SYSCALL_AUDIT | _TIF_USER_RETURN_NOTIFY | _TIF_UPROBE |   \
+_TIF_NOHZ | _TIF_SYSCALL_TRACEPOINT)
 
 /* flags to check in __switch_to() */
 #define _TIF_WORK_CTXSW
\
-- 
2.7.4



[PATCH v5 01/15] stacktrace/x86: add function for detecting reliable stack traces

2017-02-13 Thread Josh Poimboeuf
For live patching and possibly other use cases, a stack trace is only
useful if it can be assured that it's completely reliable.  Add a new
save_stack_trace_tsk_reliable() function to achieve that.

Note that if the target task isn't the current task, and the target task
is allowed to run, then it could be writing the stack while the unwinder
is reading it, resulting in possible corruption.  So the caller of
save_stack_trace_tsk_reliable() must ensure that the task is either
'current' or inactive.

save_stack_trace_tsk_reliable() relies on the x86 unwinder's detection
of pt_regs on the stack.  If the pt_regs are not user-mode registers
from a syscall, then they indicate an in-kernel interrupt or exception
(e.g. preemption or a page fault), in which case the stack is considered
unreliable due to the nature of frame pointers.

It also relies on the x86 unwinder's detection of other issues, such as:

- corrupted stack data
- stack grows the wrong way
- stack walk doesn't reach the bottom
- user didn't provide a large enough entries array

Such issues are reported by checking unwind_error() and !unwind_done().

Also add CONFIG_HAVE_RELIABLE_STACKTRACE so arch-independent code can
determine at build time whether the function is implemented.

Signed-off-by: Josh Poimboeuf 
---
 arch/Kconfig   |  6 +++
 arch/x86/Kconfig   |  1 +
 arch/x86/include/asm/unwind.h  |  6 +++
 arch/x86/kernel/stacktrace.c   | 96 +-
 arch/x86/kernel/unwind_frame.c |  2 +
 include/linux/stacktrace.h |  9 ++--
 kernel/stacktrace.c| 12 +-
 7 files changed, 126 insertions(+), 6 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 80f3e5e..478b939 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -749,6 +749,12 @@ config HAVE_STACK_VALIDATION
  Architecture supports the 'objtool check' host tool command, which
  performs compile-time stack metadata validation.
 
+config HAVE_RELIABLE_STACKTRACE
+   bool
+   help
+ Architecture has a save_stack_trace_tsk_reliable() function which
+ only returns a stack trace if it can guarantee the trace is reliable.
+
 config HAVE_ARCH_HASH
bool
default n
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ea82a7b..e79fbf8 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -160,6 +160,7 @@ config X86
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
select HAVE_REGS_AND_STACK_ACCESS_API
+   select HAVE_RELIABLE_STACKTRACE if X86_64 && FRAME_POINTER && 
STACK_VALIDATION
select HAVE_STACK_VALIDATIONif X86_64
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_UNSTABLE_SCHED_CLOCK
diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h
index 6fa75b1..137e9cce 100644
--- a/arch/x86/include/asm/unwind.h
+++ b/arch/x86/include/asm/unwind.h
@@ -11,6 +11,7 @@ struct unwind_state {
unsigned long stack_mask;
struct task_struct *task;
int graph_idx;
+   bool error;
 #ifdef CONFIG_FRAME_POINTER
unsigned long *bp, *orig_sp;
struct pt_regs *regs;
@@ -40,6 +41,11 @@ void unwind_start(struct unwind_state *state, struct 
task_struct *task,
__unwind_start(state, task, regs, first_frame);
 }
 
+static inline bool unwind_error(struct unwind_state *state)
+{
+   return state->error;
+}
+
 #ifdef CONFIG_FRAME_POINTER
 
 static inline
diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index 0653788..c5490d9 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -74,6 +74,101 @@ void save_stack_trace_tsk(struct task_struct *tsk, struct 
stack_trace *trace)
 }
 EXPORT_SYMBOL_GPL(save_stack_trace_tsk);
 
+#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
+
+#define STACKTRACE_DUMP_ONCE(task) ({  \
+   static bool __section(.data.unlikely) __dumped; \
+   \
+   if (!__dumped) {\
+   __dumped = true;\
+   WARN_ON(1); \
+   show_stack(task, NULL); \
+   }   \
+})
+
+static int __save_stack_trace_reliable(struct stack_trace *trace,
+  struct task_struct *task)
+{
+   struct unwind_state state;
+   struct pt_regs *regs;
+   unsigned long addr;
+
+   for (unwind_start(, task, NULL, NULL); !unwind_done();
+unwind_next_frame()) {
+
+   regs = unwind_get_entry_regs();
+   if (regs) {
+   /*
+* Kernel mode registers on the stack indicate an
+* in-kernel interrupt or exception (e.g., preemption
+* 

[PATCH v5 00/15] livepatch: hybrid consistency model

2017-02-13 Thread Josh Poimboeuf
Here's v5 of the consistency model, targeted for 4.12.  Only a few minor
changes this time.

I would very much appreciate reviews/acks from the following:

- Michael Ellerman for the powerpc changes in patch 5.

- Heiko Carstens for the s390 changes in patches 6 & 7.

- Peter Zijlstra/Ingo Molnar for the use of task_rq_lock() and the
  modification of do_idle() in patch 13.

Thanks!

Based on linux-next/master (20170213).

v5:
- return -EINVAL in __save_stack_trace_reliable()
- only call show_stack() once
- add save_stack_trace_tsk_reliable() define for !CONFIG_STACKTRACE
- update kernel version and date in ABI doc
- make suggested improvements to livepatch.txt
- update barrier comments
- remove klp_try_complete_transition() call from klp_start_transition()
- move end of klp_try_complete_transition() into klp_complete_transition()
- fix __klp_enable_patch() error path
- check for transition in klp_module_going()

v4:
- add warnings for "impossible" scenarios in __save_stack_trace_reliable()
- sort _TIF_ALLWORK_MASK flags
- move klp_transition_work to transition.c.  This resulted in the following 
  related changes:
  - klp_mutex is now visible to transition.c
  - klp_start_transition() now calls klp_try_complete_transition()
  - klp_try_complete_transition() now sets up the work
  - rearrange code in transition.c accordingly
- klp_reverse_transition(): clear TIF flags and call synchronize_rcu()
- klp_try_complete_transition(): do synchronize_rcu() only when unpatching
- klp_start_transition(): only set TIF flags when necessary
- klp_complete_transition(): add synchronize_rcu() when patching
- klp_ftrace_handler(): put WARN_ON_ONCE back in and add comment
- use for_each_possible_cpu() to patch offline idle tasks
- add warnings to sample module when setting patch.immediate
- don't use pr_debug() with the task rq lock
- add documentation about porting consistency model to other arches
- move klp_patch_pending() to patch 13
- improve several comments and commit messages

v3:
- rebase on new x86 unwinder
- force !HAVE_RELIABLE_STACKTRACE arches to use patch->immediate for
  now, because we don't have a way to transition kthreads otherwise
- rebase s390 TIF_PATCH_PENDING patch onto latest entry code
- update barrier comments and move barrier from the end of
  klp_init_transition() to its callers
- "klp_work" -> "klp_transition_work"
- "klp_patch_task()" -> "klp_update_patch_state()"
- explicit _TIF_ALLWORK_MASK
- change klp_reverse_transition() to not try to complete transition.
  instead modify the work queue delay to zero.
- get rid of klp_schedule_work() in favor of calling
  schedule_delayed_work() directly with a KLP_TRANSITION_DELAY
- initialize klp_target_state to KLP_UNDEFINED
- move klp_target_state assignment to before patch->immediate check in
  klp_init_transition()
- rcu_read_lock() in klp_update_patch_state(), test the thread flag in
  patch task, synchronize_rcu() in klp_complete_transition()
- use kstrtobool() in enabled_store()
- change task_rq_lock() argument type to struct rq_flags
- add several WARN_ON_ONCE assertions for klp_target_state and
  task->patch_state

v2:
- "universe" -> "patch state"
- rename klp_update_task_universe() -> klp_patch_task()
- add preempt IRQ tracking (TF_PREEMPT_IRQ)
- fix print_context_stack_reliable() bug
- improve print_context_stack_reliable() comments
- klp_ftrace_handler comment fixes
- add "patch_state" proc file to tid_base_stuff
- schedule work even for !RELIABLE_STACKTRACE
- forked child inherits patch state from parent
- add detailed comment to livepatch.h klp_func definition about the
  klp_func patched/transition state transitions
- update exit_to_usermode_loop() comment
- clear all TIF_KLP_NEED_UPDATE flags in klp_complete_transition()
- remove unnecessary function externs
- add livepatch documentation, sysfs documentation, /proc documentation
- /proc/pid/patch_state: -1 means no patch is currently being applied/reverted
- "TIF_KLP_NEED_UPDATE" -> "TIF_PATCH_PENDING"
- support for s390 and powerpc-le
- don't assume stacks with dynamic ftrace trampolines are reliable
- add _TIF_ALLWORK_MASK info to commit log

v1.9:
- revive from the dead and rebased
- reliable stacks!
- add support for immediate consistency model
- add a ton of comments
- fix up memory barriers
- remove "allow patch modules to be removed" patch for now, it still 
  needs more discussion and thought - it can be done with something
- "proc/pid/universe" -> "proc/pid/patch_status"
- remove WARN_ON_ONCE from !func condition in ftrace handler -- can
  happen because of RCU
- keep klp_mutex private by putting the work_fn in core.c
- convert states from int to boolean
- remove obsolete '@state' comments
- several header file and include improvements suggested by Jiri S
- change kallsyms_lookup_size_offset() errors from EINVAL -

Re: [PATCH v2] powerpc: Blacklist GCC 5.4 6.1 and 6.2

2017-02-13 Thread Cyril Bur
On Mon, 2017-02-13 at 09:44 -0600, Segher Boessenkool wrote:
> Hi Cyril,
> 
> On Mon, Feb 13, 2017 at 02:35:36PM +1100, Cyril Bur wrote:
> > A bug in the -02 optimisation of GCC 5.4 6.1 and 6.2 causes
> > setup_command_line() to not pass the correct first argument to strcpy
> > and therefore not actually copy the command_line.
> 
> There is no such thing as an "-O2 optimisation".

Right, perhaps I should have phrased it as "One of the -O2 level
optimisations of GCC 5.4, 6.1 and 6.2 causes setup_command_line() to
not pass the correct first argument to strcpy and therefore not
actually copy the command_line, -O1 does not have this problem."

> 
> > At the time of writing GCC 5.4 is the most recent and is affected. GCC
> > 6.3 contains the backported fix, has been tested and appears safe to
> > use.
> 
> 6.3 is (of course) the newer release; 5.4 is a maintenance release of
> a compiler that is a year older.

Yes. I think the point I was trying to make is that since they
backported the fix to 5.x and 6.x then I expect that 5.5 will have the
fix but since it doesn't exist yet, I can't be sure. I'll add something
to that effect.

> 
> > +# - gcc-5.4, 6.1, 6.2 don't copy the command_line around correctly
> > +   echo -n '*** GCC-5.4 6.1 6.2 have a bad -O2 optimisation ' ; \
> > +   echo 'which will cause lost command_line options (at least).' ; 
> > \
> 
> Maybe something more like
> 
> "GCC 5.4, 6.1, and 6.2 have a bug that results in a kernel that does
> not boot.  Please use GCC 6.3 or later.".

"that may not boot" is more accurate, if it can boot without a
command_line param it might just do so.

> 
> Please mention the GCC PR # somewhere in the code, too?
> 

Sure.

Thanks,

Cyril

> 
> Segher


[PATCH] powerpc/xmon: add debugfs entry for xmon

2017-02-13 Thread Guilherme G. Piccoli
Currently the xmon debugger is set only via kernel boot command-line.
It's disabled by default, and can be enabled with "xmon=on" on the
command-line. Also, xmon may be accessed via sysrq mechanism, but once
we enter xmon via sysrq,  it's  kept enabled until system is rebooted,
even if we exit the debugger. A kernel crash will then lead to xmon
instance, instead of triggering a kdump procedure (if configured), for
example.

This patch introduces a debugfs entry for xmon, allowing user to query
its current state and change it if desired. Basically, the "xmon" file
to read from/write to is under the debugfs mount point, on powerpc
directory. Reading this file will provide the current state of the
debugger, one of the following: "on", "off", "early" or "nobt". Writing
one of these states to the file will take immediate effect on the debugger.

Signed-off-by: Guilherme G. Piccoli 
---
* I had this patch partially done for some time, and after a discussion
at the kernel slack channel latest week, I decided to rebase and fix
some remaining bugs. I'd change 'x' option to always disable the debugger,
since with this patch we can always re-enable xmon, but today I noticed
Pan's patch on the mailing list, so perhaps his approach of adding a flag
to 'x' option is preferable. I can change this in a V2, if requested.
Thanks in advance!

 arch/powerpc/xmon/xmon.c | 124 +++
 1 file changed, 105 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 9c0e17c..5fb39db 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -29,6 +29,12 @@
 #include 
 #include 
 
+#ifdef CONFIG_DEBUG_FS
+#include 
+#include 
+#include 
+#endif
+
 #include 
 #include 
 #include 
@@ -184,7 +190,12 @@ static void dump_tlb_44x(void);
 static void dump_tlb_book3e(void);
 #endif
 
-static int xmon_no_auto_backtrace;
+/* xmon_state values */
+#define XMON_OFF   0
+#define XMON_ON1
+#define XMON_EARLY 2
+#define XMON_NOBT  3
+static int xmon_state;
 
 #ifdef CONFIG_PPC64
 #define REG"%.16lx"
@@ -880,8 +891,8 @@ cmds(struct pt_regs *excp)
last_cmd = NULL;
xmon_regs = excp;
 
-   if (!xmon_no_auto_backtrace) {
-   xmon_no_auto_backtrace = 1;
+   if (xmon_state != XMON_NOBT) {
+   xmon_state = XMON_NOBT;
xmon_show_stack(excp->gpr[1], excp->link, excp->nip);
}
 
@@ -3244,6 +3255,26 @@ static void xmon_init(int enable)
}
 }
 
+static int parse_xmon(char *p)
+{
+   if (!p || strncmp(p, "early", 5) == 0) {
+   /* just "xmon" is equivalent to "xmon=early" */
+   xmon_init(1);
+   xmon_state = XMON_EARLY;
+   } else if (strncmp(p, "on", 2) == 0) {
+   xmon_init(1);
+   xmon_state = XMON_ON;
+   } else if (strncmp(p, "off", 3) == 0) {
+   xmon_init(0);
+   xmon_state = XMON_OFF;
+   } else if (strncmp(p, "nobt", 4) == 0)
+   xmon_state = XMON_NOBT;
+   else
+   return 1;
+
+   return 0;
+}
+
 #ifdef CONFIG_MAGIC_SYSRQ
 static void sysrq_handle_xmon(int key)
 {
@@ -3266,34 +3297,89 @@ static int __init setup_xmon_sysrq(void)
 __initcall(setup_xmon_sysrq);
 #endif /* CONFIG_MAGIC_SYSRQ */
 
-static int __initdata xmon_early, xmon_off;
+#ifdef CONFIG_DEBUG_FS
+static ssize_t xmon_dbgfs_read(struct file *file, char __user *ubuffer,
+   size_t len, loff_t *offset)
+{
+   int buf_len = 0;
+   char buf[6] = { 0 };
 
-static int __init early_parse_xmon(char *p)
+   switch (xmon_state) {
+   case XMON_OFF:
+   buf_len = sprintf(buf, "off");
+   break;
+   case XMON_ON:
+   buf_len = sprintf(buf, "on");
+   break;
+   case XMON_EARLY:
+   buf_len = sprintf(buf, "early");
+   break;
+   case XMON_NOBT:
+   buf_len = sprintf(buf, "nobt");
+   break;
+   }
+
+   return simple_read_from_buffer(ubuffer, len, offset, buf, buf_len);
+}
+
+static ssize_t xmon_dbgfs_write(struct file *file, const char __user *ubuffer,
+   size_t len, loff_t *offset)
 {
-   if (!p || strncmp(p, "early", 5) == 0) {
-   /* just "xmon" is equivalent to "xmon=early" */
-   xmon_init(1);
-   xmon_early = 1;
-   } else if (strncmp(p, "on", 2) == 0)
-   xmon_init(1);
-   else if (strncmp(p, "off", 3) == 0)
-   xmon_off = 1;
-   else if (strncmp(p, "nobt", 4) == 0)
-   xmon_no_auto_backtrace = 1;
-   else
-   return 1;
+   int ret, not_copied;
+   char *buf;
+
+   /* Valid states are on, off, early and nobt. */
+   if ((*offset != 0) || (len <= 0) || (len > 6))
+return -EINVAL;
+
+   buf = kzalloc(len + 1, 

[PATCH] KVM: PPC: Book3S: Ratelimit copy data failure error messages

2017-02-13 Thread Vipin K Parashar
kvm_ppc_mmu_book3s_32/64 xlat() log "KVM can't copy data" error
upon failing to copy user data to kernel space. This floods kernel
log once such fails occur in short time period. Ratelimit this
error to avoid flooding kernel logs upon copy data failures.

Signed-off-by: Vipin K Parashar 
---
 arch/powerpc/kvm/book3s_32_mmu.c | 3 ++-
 arch/powerpc/kvm/book3s_64_mmu.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
index a2eb6d3..ca8f960 100644
--- a/arch/powerpc/kvm/book3s_32_mmu.c
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -224,7 +224,8 @@ static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu 
*vcpu, gva_t eaddr,
ptem = kvmppc_mmu_book3s_32_get_ptem(sre, eaddr, primary);
 
if(copy_from_user(pteg, (void __user *)ptegp, sizeof(pteg))) {
-   printk(KERN_ERR "KVM: Can't copy data from 0x%lx!\n", ptegp);
+   if (printk_ratelimit())
+   printk(KERN_ERR "KVM: Can't copy data from 0x%lx!\n", 
ptegp);
goto no_page_found;
}
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c
index b9131aa..b420aca 100644
--- a/arch/powerpc/kvm/book3s_64_mmu.c
+++ b/arch/powerpc/kvm/book3s_64_mmu.c
@@ -265,7 +265,8 @@ static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu 
*vcpu, gva_t eaddr,
goto no_page_found;
 
if(copy_from_user(pteg, (void __user *)ptegp, sizeof(pteg))) {
-   printk(KERN_ERR "KVM can't copy data from 0x%lx!\n", ptegp);
+   if (printk_ratelimit())
+   printk(KERN_ERR "KVM can't copy data from 0x%lx!\n", 
ptegp);
goto no_page_found;
}
 
-- 
2.7.4



Re: [PATCH v2] powerpc: Blacklist GCC 5.4 6.1 and 6.2

2017-02-13 Thread Segher Boessenkool
Hi Cyril,

On Mon, Feb 13, 2017 at 02:35:36PM +1100, Cyril Bur wrote:
> A bug in the -02 optimisation of GCC 5.4 6.1 and 6.2 causes
> setup_command_line() to not pass the correct first argument to strcpy
> and therefore not actually copy the command_line.

There is no such thing as an "-O2 optimisation".

> At the time of writing GCC 5.4 is the most recent and is affected. GCC
> 6.3 contains the backported fix, has been tested and appears safe to
> use.

6.3 is (of course) the newer release; 5.4 is a maintenance release of
a compiler that is a year older.

> +# - gcc-5.4, 6.1, 6.2 don't copy the command_line around correctly

> + echo -n '*** GCC-5.4 6.1 6.2 have a bad -O2 optimisation ' ; \
> + echo 'which will cause lost command_line options (at least).' ; 
> \

Maybe something more like

"GCC 5.4, 6.1, and 6.2 have a bug that results in a kernel that does
not boot.  Please use GCC 6.3 or later.".

Please mention the GCC PR # somewhere in the code, too?


Segher


[PATCH V2 2/2] powerpc/mm/slice: Update slice mask printing to use bitmap printing.

2017-02-13 Thread Aneesh Kumar K.V
We now get output like below which is much better.

[0.935306]  good_mask low_slice: 0-15
[0.935360]  good_mask high_slice: 0-511

Compared to

[0.953414]  good_mask: - 1.

I also fixed an error with slice_dbg printing.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/slice.c | 30 +++---
 1 file changed, 7 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 08ac27eae408..d3701b0f439f 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -53,29 +53,13 @@ int _slice_debug = 1;
 
 static void slice_print_mask(const char *label, struct slice_mask mask)
 {
-   char*p, buf[SLICE_NUM_LOW + 3 + SLICE_NUM_HIGH + 1];
-   int i;
-
if (!_slice_debug)
return;
-   p = buf;
-   for (i = 0; i < SLICE_NUM_LOW; i++)
-   *(p++) = (mask.low_slices & (1 << i)) ? '1' : '0';
-   *(p++) = ' ';
-   *(p++) = '-';
-   *(p++) = ' ';
-   for (i = 0; i < SLICE_NUM_HIGH; i++) {
-   if (test_bit(i, mask.high_slices))
-   *(p++) = '1';
-   else
-   *(p++) = '0';
-   }
-   *(p++) = 0;
-
-   printk(KERN_DEBUG "%s:%s\n", label, buf);
+   pr_devel("%s low_slice: %*pbl\n", label, (int)SLICE_NUM_LOW, 
_slices);
+   pr_devel("%s high_slice: %*pbl\n", label, (int)SLICE_NUM_HIGH, 
mask.high_slices);
 }
 
-#define slice_dbg(fmt...) do { if (_slice_debug) pr_debug(fmt); } while(0)
+#define slice_dbg(fmt...) do { if (_slice_debug) pr_devel(fmt); } while (0)
 
 #else
 
@@ -242,8 +226,8 @@ static void slice_convert(struct mm_struct *mm, struct 
slice_mask mask, int psiz
}
 
slice_dbg(" lsps=%lx, hsps=%lx\n",
- mm->context.low_slices_psize,
- mm->context.high_slices_psize);
+ (unsigned long)mm->context.low_slices_psize,
+ (unsigned long)mm->context.high_slices_psize);
 
spin_unlock_irqrestore(_convert_lock, flags);
 
@@ -685,8 +669,8 @@ void slice_set_user_psize(struct mm_struct *mm, unsigned 
int psize)
 
 
slice_dbg(" lsps=%lx, hsps=%lx\n",
- mm->context.low_slices_psize,
- mm->context.high_slices_psize);
+ (unsigned long)mm->context.low_slices_psize,
+ (unsigned long)mm->context.high_slices_psize);
 
  bail:
spin_unlock_irqrestore(_convert_lock, flags);
-- 
2.7.4



[PATCH V2 1/2] powerpc/mm/slice: Move slice_mask struct definition to slice.c

2017-02-13 Thread Aneesh Kumar K.V
This structure definition need not be in a header since this is used only by
slice.c file. So move it to slice.c. This also allow us to use SLICE_NUM_HIGH
instead of 512 and also helps in getting rid of one BUILD_BUG_ON().

I also switch the low_slices type to u64 from u16. This doesn't have an impact
on size of struct due to padding added with u16 type. This helps in using
bitmap printing function for printing slice mask.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/page_64.h | 11 ---
 arch/powerpc/mm/slice.c| 13 ++---
 2 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/page_64.h 
b/arch/powerpc/include/asm/page_64.h
index 9b60e9455c6e..3ecfc2734b50 100644
--- a/arch/powerpc/include/asm/page_64.h
+++ b/arch/powerpc/include/asm/page_64.h
@@ -99,17 +99,6 @@ extern u64 ppc64_pft_size;
 #define GET_HIGH_SLICE_INDEX(addr) ((addr) >> SLICE_HIGH_SHIFT)
 
 #ifndef __ASSEMBLY__
-/*
- * One bit per slice. We have lower slices which cover 256MB segments
- * upto 4G range. That gets us 16 low slices. For the rest we track slices
- * in 1TB size.
- * 64 below is actually SLICE_NUM_HIGH to fixup complie errros
- */
-struct slice_mask {
-   u16 low_slices;
-   DECLARE_BITMAP(high_slices, 512);
-};
-
 struct mm_struct;
 
 extern unsigned long slice_get_unmapped_area(unsigned long addr,
diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index b3f45e413a60..08ac27eae408 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -37,7 +37,16 @@
 #include 
 
 static DEFINE_SPINLOCK(slice_convert_lock);
-
+/*
+ * One bit per slice. We have lower slices which cover 256MB segments
+ * upto 4G range. That gets us 16 low slices. For the rest we track slices
+ * in 1TB size.
+ * 64 below is actually SLICE_NUM_HIGH to fixup complie errros
+ */
+struct slice_mask {
+   u64 low_slices;
+   DECLARE_BITMAP(high_slices, SLICE_NUM_HIGH);
+};
 
 #ifdef DEBUG
 int _slice_debug = 1;
@@ -407,8 +416,6 @@ unsigned long slice_get_unmapped_area(unsigned long addr, 
unsigned long len,
struct mm_struct *mm = current->mm;
unsigned long newaddr;
 
-   /* Make sure high_slices bitmap size is same as we expected */
-   BUILD_BUG_ON(512 != SLICE_NUM_HIGH);
/*
 * init different masks
 */
-- 
2.7.4



Re: [PATCH] powerpc/mm/slice: Update slice mask printing to use bitmap printing.

2017-02-13 Thread Aneesh Kumar K.V



On Monday 13 February 2017 04:40 PM, Aneesh Kumar K.V wrote:

We now get output like below which is much better.

[0.935306]  good_mask low_slice: 4-5,9,11-13
[0.935360]  good_mask high_slice: 0-511
[0.935385]  mask low_slice: 3-6,8,10-12
[0.935397]  mask high_slice:

Compared to

[0.953414]  good_mask: - 1.

I also fixed an error with slice_dbg printing.

Signed-off-by: Aneesh Kumar K.V 
---
  arch/powerpc/mm/slice.c | 30 +++---
  1 file changed, 7 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index b3f45e413a60..0575897fdbe3 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -44,29 +44,13 @@ int _slice_debug = 1;

  static void slice_print_mask(const char *label, struct slice_mask mask)
  {
-   char*p, buf[SLICE_NUM_LOW + 3 + SLICE_NUM_HIGH + 1];
-   int i;
-
if (!_slice_debug)
return;
-   p = buf;
-   for (i = 0; i < SLICE_NUM_LOW; i++)
-   *(p++) = (mask.low_slices & (1 << i)) ? '1' : '0';
-   *(p++) = ' ';
-   *(p++) = '-';
-   *(p++) = ' ';
-   for (i = 0; i < SLICE_NUM_HIGH; i++) {
-   if (test_bit(i, mask.high_slices))
-   *(p++) = '1';
-   else
-   *(p++) = '0';
-   }
-   *(p++) = 0;
-
-   printk(KERN_DEBUG "%s:%s\n", label, buf);
+   pr_devel("%s low_slice: %*pbl\n", label, (int)SLICE_NUM_LOW, 
_slices);



This doesn't work as expected because low_slices is of type u16. I am 
fixing that to be u64.




+   pr_devel("%s high_slice: %*pbl\n", label, (int)SLICE_NUM_HIGH, 
mask.high_slices);
  }

-#define slice_dbg(fmt...) do { if (_slice_debug) pr_debug(fmt); } while(0)
+#define slice_dbg(fmt...) do { if (_slice_debug) pr_devel(fmt); } while (0)

  #else

@@ -233,8 +217,8 @@ static void slice_convert(struct mm_struct *mm, struct 
slice_mask mask, int psiz
}

slice_dbg(" lsps=%lx, hsps=%lx\n",
- mm->context.low_slices_psize,
- mm->context.high_slices_psize);
+ (unsigned long)mm->context.low_slices_psize,
+ (unsigned long)mm->context.high_slices_psize);

spin_unlock_irqrestore(_convert_lock, flags);

@@ -678,8 +662,8 @@ void slice_set_user_psize(struct mm_struct *mm, unsigned 
int psize)


slice_dbg(" lsps=%lx, hsps=%lx\n",
- mm->context.low_slices_psize,
- mm->context.high_slices_psize);
+ (unsigned long)mm->context.low_slices_psize,
+ (unsigned long)mm->context.high_slices_psize);

   bail:
spin_unlock_irqrestore(_convert_lock, flags);


-aneesh



[PATCH] powerpc/perf: Avoid FAB_*_MATCH checks for power9

2017-02-13 Thread Madhavan Srinivasan
Since power9 does not support FAB_*_MATCH bits in MMCR1,
avoid these checks for power9. For this, patch factor out
code in isa207_get_constraint() to retain these checks
only for power8.

Patch also updates the comment in power9-pmu raw event
encode layout to remove FAB_*_MATCH.

Finally for power9, patch adds additional check for
threshold events when adding the thresh mask and value in
isa207_get_constraint().

fixes: 7ffd948fae4c ('powerpc/perf: factor out power8 pmu functions')
fixes: 18201b204286 ('powerpc/perf: power9 raw event format encoding')
Signed-off-by: Ravi Bangoria 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/isa207-common.c | 58 ++-
 arch/powerpc/perf/power9-pmu.c|  8 ++
 2 files changed, 42 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/perf/isa207-common.c 
b/arch/powerpc/perf/isa207-common.c
index 50e598cf644b..2703a1e340e7 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -97,6 +97,28 @@ static unsigned long combine_shift(unsigned long pmc)
return MMCR1_COMBINE_SHIFT(pmc);
 }
 
+static inline bool event_is_threshold(u64 event)
+{
+   return (event >> EVENT_THR_SEL_SHIFT) & EVENT_THR_SEL_MASK;
+}
+
+static bool is_thresh_cmp_valid(u64 event)
+{
+   unsigned int cmp, exp;
+
+   /*
+* Check the mantissa upper two bits are not zero, unless the
+* exponent is also zero. See the THRESH_CMP_MANTISSA doc.
+*/
+   cmp = (event >> EVENT_THR_CMP_SHIFT) & EVENT_THR_CMP_MASK;
+   exp = cmp >> 7;
+
+   if (exp && (cmp & 0x60) == 0)
+   return false;
+
+   return true;
+}
+
 int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp)
 {
unsigned int unit, pmc, cache, ebb;
@@ -163,28 +185,26 @@ int isa207_get_constraint(u64 event, unsigned long 
*maskp, unsigned long *valp)
value |= CNST_SAMPLE_VAL(event >> EVENT_SAMPLE_SHIFT);
}
 
-   /*
-* Special case for PM_MRK_FAB_RSP_MATCH and PM_MRK_FAB_RSP_MATCH_CYC,
-* the threshold control bits are used for the match value.
-*/
-   if (event_is_fab_match(event)) {
-   mask  |= CNST_FAB_MATCH_MASK;
-   value |= CNST_FAB_MATCH_VAL(event >> EVENT_THR_CTL_SHIFT);
+   if (cpu_has_feature(CPU_FTR_ARCH_300))  {
+   if (event_is_threshold(event) && is_thresh_cmp_valid(event)) {
+   mask  |= CNST_THRESH_MASK;
+   value |= CNST_THRESH_VAL(event >> EVENT_THRESH_SHIFT);
+   }
} else {
/*
-* Check the mantissa upper two bits are not zero, unless the
-* exponent is also zero. See the THRESH_CMP_MANTISSA doc.
+* Special case for PM_MRK_FAB_RSP_MATCH and 
PM_MRK_FAB_RSP_MATCH_CYC,
+* the threshold control bits are used for the match value.
 */
-   unsigned int cmp, exp;
-
-   cmp = (event >> EVENT_THR_CMP_SHIFT) & EVENT_THR_CMP_MASK;
-   exp = cmp >> 7;
-
-   if (exp && (cmp & 0x60) == 0)
-   return -1;
+   if (event_is_fab_match(event)) {
+   mask  |= CNST_FAB_MATCH_MASK;
+   value |= CNST_FAB_MATCH_VAL(event >> 
EVENT_THR_CTL_SHIFT);
+   } else {
+   if (!is_thresh_cmp_valid(event))
+   return -1;
 
-   mask  |= CNST_THRESH_MASK;
-   value |= CNST_THRESH_VAL(event >> EVENT_THRESH_SHIFT);
+   mask  |= CNST_THRESH_MASK;
+   value |= CNST_THRESH_VAL(event >> EVENT_THRESH_SHIFT);
+   }
}
 
if (!pmc && ebb)
@@ -279,7 +299,7 @@ int isa207_compute_mmcr(u64 event[], int n_ev,
 * PM_MRK_FAB_RSP_MATCH and PM_MRK_FAB_RSP_MATCH_CYC,
 * the threshold bits are used for the match value.
 */
-   if (event_is_fab_match(event[i])) {
+   if (!cpu_has_feature(CPU_FTR_ARCH_300) && 
event_is_fab_match(event[i])) {
mmcr1 |= ((event[i] >> EVENT_THR_CTL_SHIFT) &
  EVENT_THR_CTL_MASK) << MMCR1_FAB_SHIFT;
} else {
diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c
index 7332634e18c9..7950cee7d617 100644
--- a/arch/powerpc/perf/power9-pmu.c
+++ b/arch/powerpc/perf/power9-pmu.c
@@ -22,7 +22,7 @@
  * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - 
- - |
  *   | | [ ]   [ ] [  thresh_cmp ]   [  thresh_ctl 
  ]
  *   | |  | | |
- *   | |  *- IFM (Linux)|thresh start/stop OR FAB match -*
+ *   | |  *- IFM (Linux)|

[PATCH] powerpc/mm/slice: Update slice mask printing to use bitmap printing.

2017-02-13 Thread Aneesh Kumar K.V
We now get output like below which is much better.

[0.935306]  good_mask low_slice: 4-5,9,11-13
[0.935360]  good_mask high_slice: 0-511
[0.935385]  mask low_slice: 3-6,8,10-12
[0.935397]  mask high_slice:

Compared to

[0.953414]  good_mask: - 1.

I also fixed an error with slice_dbg printing.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/slice.c | 30 +++---
 1 file changed, 7 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index b3f45e413a60..0575897fdbe3 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -44,29 +44,13 @@ int _slice_debug = 1;
 
 static void slice_print_mask(const char *label, struct slice_mask mask)
 {
-   char*p, buf[SLICE_NUM_LOW + 3 + SLICE_NUM_HIGH + 1];
-   int i;
-
if (!_slice_debug)
return;
-   p = buf;
-   for (i = 0; i < SLICE_NUM_LOW; i++)
-   *(p++) = (mask.low_slices & (1 << i)) ? '1' : '0';
-   *(p++) = ' ';
-   *(p++) = '-';
-   *(p++) = ' ';
-   for (i = 0; i < SLICE_NUM_HIGH; i++) {
-   if (test_bit(i, mask.high_slices))
-   *(p++) = '1';
-   else
-   *(p++) = '0';
-   }
-   *(p++) = 0;
-
-   printk(KERN_DEBUG "%s:%s\n", label, buf);
+   pr_devel("%s low_slice: %*pbl\n", label, (int)SLICE_NUM_LOW, 
_slices);
+   pr_devel("%s high_slice: %*pbl\n", label, (int)SLICE_NUM_HIGH, 
mask.high_slices);
 }
 
-#define slice_dbg(fmt...) do { if (_slice_debug) pr_debug(fmt); } while(0)
+#define slice_dbg(fmt...) do { if (_slice_debug) pr_devel(fmt); } while (0)
 
 #else
 
@@ -233,8 +217,8 @@ static void slice_convert(struct mm_struct *mm, struct 
slice_mask mask, int psiz
}
 
slice_dbg(" lsps=%lx, hsps=%lx\n",
- mm->context.low_slices_psize,
- mm->context.high_slices_psize);
+ (unsigned long)mm->context.low_slices_psize,
+ (unsigned long)mm->context.high_slices_psize);
 
spin_unlock_irqrestore(_convert_lock, flags);
 
@@ -678,8 +662,8 @@ void slice_set_user_psize(struct mm_struct *mm, unsigned 
int psize)
 
 
slice_dbg(" lsps=%lx, hsps=%lx\n",
- mm->context.low_slices_psize,
- mm->context.high_slices_psize);
+ (unsigned long)mm->context.low_slices_psize,
+ (unsigned long)mm->context.high_slices_psize);
 
  bail:
spin_unlock_irqrestore(_convert_lock, flags);
-- 
2.7.4



RE: [RFC][PATCH] powerpc/64s: optimise syscall entry with relon hypercalls

2017-02-13 Thread David Laight
From: Nicholas Piggin
> Sent: 10 February 2017 18:23
> After bc3551257a ("powerpc/64: Allow for relocation-on interrupts from
> guest to host"), a getppid() system call goes from 307 cycles to 358
> cycles (+17%). This is due significantly to the scratch SPR used by the
> hypercall.
> 
> It turns out there are a some volatile registers common to both system
> call and hypercall (in particular, r12, cr0, ctr), which can be used to
> avoid the SPR and some other overheads for the system call case. This
> brings getppid to 320 cycles (+4%).
...
> + * syscall register convention is in Documentation/powerpc/syscall64-abi.txt
> + *
> + * For hypercalls, the register convention is as follows:
> + * r0 volatile
> + * r1-2 nonvolatile
> + * r3 volatile parameter and return value for status
> + * r4-r10 volatile input and output value
> + * r11 volatile hypercall number and output value
> + * r12 volatile
> + * r13-r31 nonvolatile
> + * LR nonvolatile
> + * CTR volatile
> + * XER volatile
> + * CR0-1 CR5-7 volatile
> + * CR2-4 nonvolatile
> + * Other registers nonvolatile
> + *
> + * The intersection of volatile registers that don't contain possible
> + * inputs is: r12, cr0, xer, ctr. We may use these as scratch regs
> + * upon entry without saving.

Except that they must surely be set to some known value on exit in order
to avoid leaking information to the guest.

David



Re: [RFC] implement QUEUED spinlocks on powerpc

2017-02-13 Thread panxinhui


在 2017/2/7 下午2:46, Eric Dumazet 写道:
> On Mon, Feb 6, 2017 at 10:21 PM, panxinhui  wrote:
> 
>> hi all
>> I do some netperf tests and get some benchmark results.
>> I also attach my test script and netperf-result(Excel)
>>
HI, all
I use loopback interface to run netperf tests,
#tc qd add dev lo root pfifo limit 1
#ip link
1: lo:  mtu 65536 qdisc pfifo state UNKNOWN mode DEFAULT 
group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

and put the result in netperf.xlsx(excel)

It is a 32 vcpus P8 machine, with 32Gib memory.

This time spinlock is the best one, qspinlock > pvqspinlock. So sad.

thanks
xinhui
>> There are two machine. one runs netserver and the other runs netperf
>> benchmark. 1000Mbps network is connected with them.
>>
>> #ip link infomation
>> 2: eth0:  mtu 1500 qdisc pfifo_fast state
>> UNKNOWN mode DEFAULT group default qlen 1000
>>  link/ether ba:68:9c:14:32:02 brd ff:ff:ff:ff:ff:ff
>>
>> According to the results, there is not much performance gap with each other.
>> And as we are only testing the throughput, the pvqspinlock shows the
>> overhead of its pv stuff. but qspinlock shows a little improvement than
>> spinlock. My simple summary in this testcase is
>> qspinlock > spinlock > pvqspinlock.
>>
>> when run 200 concurrent netperf, I paste the total throughput here.
>>
>> concurrent runners| total throughput | variance
>> ---
>> spinlock| 199 | 66882.8 | 89.93
>> ---
>> qspinlock   | 199 | 66350.4 | 72.0239
>> ---
>> pvqspinlock | 199 | 64740.5 | 85.7837
>>
>> You could see more data in nerperf.xlsx
>>
>> thanks
>> xinhui
> 
> 
> Hi xinhui
> 
> 1Gbit NIC is too slow for this use case. I would try a 10Gbit NIC at least...
> 
> Alternatively, you could use loopback interface.  (netperf -H 127.0.0.1)
> 
> tc qd add dev lo root pfifo limit 1
> 


netperf.xlsx
Description: MS-Excel 2007 spreadsheet


[PATCH] add a const to ioread* routines to fix compile testing

2017-02-13 Thread Cédric Le Goater
On some architectures, the ioread routines are still using a non-const
argument for the address parameter. Let's change that to be consistent
with the others and fix compile testing (ARM drivers on Intel for
instance).

Signed-off-by: Cédric Le Goater 
---

 I am not sure how we should handle these changes, so, here is a big
 patch for all architectures to let maintainers decide. I suppose we
 could merge the patch in one arch, and first see how the 0-Day bot
 reacts.

 The patch can be found on this branch :

   https://github.com/legoater/linux/tree/aspeed

 Compiled on :
   
   arm
   arm64
   avr32   
   frv
   ia64
   alpha
   m68k
   parisc32
   parisc64
   mips
   mips64
   sh32
   sparc32
   sparc64
   x86_64
   i386
   powerpc32
   powerpc64
   powerpc64le
   s390x

 arch/alpha/include/asm/core_apecs.h  |  6 ++--
 arch/alpha/include/asm/core_cia.h|  6 ++--
 arch/alpha/include/asm/core_lca.h|  6 ++--
 arch/alpha/include/asm/core_marvel.h |  4 +--
 arch/alpha/include/asm/core_mcpcia.h |  6 ++--
 arch/alpha/include/asm/core_t2.h |  2 +-
 arch/alpha/include/asm/io.h  |  2 +-
 arch/alpha/include/asm/io_trivial.h  |  6 ++--
 arch/alpha/include/asm/jensen.h  |  2 +-
 arch/alpha/include/asm/machvec.h |  6 ++--
 arch/alpha/kernel/core_marvel.c  |  2 +-
 arch/alpha/kernel/io.c   | 12 
 arch/frv/include/asm/io.h| 12 
 arch/frv/include/asm/mb-regs.h   |  6 ++--
 arch/mips/lib/iomap.c| 22 ++---
 arch/parisc/lib/iomap.c  | 60 ++--
 arch/powerpc/kernel/iomap.c  | 20 ++--
 arch/sh/kernel/iomap.c   | 22 ++---
 arch/sparc/include/asm/io_64.h   |  6 ++--
 include/asm-generic/iomap.h  | 20 ++--
 lib/iomap.c  | 22 ++---
 21 files changed, 125 insertions(+), 125 deletions(-)

diff --git a/arch/alpha/include/asm/core_apecs.h 
b/arch/alpha/include/asm/core_apecs.h
index 6785ff7e02bc..a4c88d2a66f0 100644
--- a/arch/alpha/include/asm/core_apecs.h
+++ b/arch/alpha/include/asm/core_apecs.h
@@ -383,7 +383,7 @@ struct el_apecs_procdata
}   \
} while (0)
 
-__EXTERN_INLINE unsigned int apecs_ioread8(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int apecs_ioread8(const void __iomem *xaddr)
 {
unsigned long addr = (unsigned long) xaddr;
unsigned long result, base_and_type;
@@ -419,7 +419,7 @@ __EXTERN_INLINE void apecs_iowrite8(u8 b, void __iomem 
*xaddr)
*(vuip) ((addr << 5) + base_and_type) = w;
 }
 
-__EXTERN_INLINE unsigned int apecs_ioread16(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int apecs_ioread16(const void __iomem *xaddr)
 {
unsigned long addr = (unsigned long) xaddr;
unsigned long result, base_and_type;
@@ -455,7 +455,7 @@ __EXTERN_INLINE void apecs_iowrite16(u16 b, void __iomem 
*xaddr)
*(vuip) ((addr << 5) + base_and_type) = w;
 }
 
-__EXTERN_INLINE unsigned int apecs_ioread32(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int apecs_ioread32(const void __iomem *xaddr)
 {
unsigned long addr = (unsigned long) xaddr;
if (addr < APECS_DENSE_MEM)
diff --git a/arch/alpha/include/asm/core_cia.h 
b/arch/alpha/include/asm/core_cia.h
index 9e0516c0ca27..fdc029953b90 100644
--- a/arch/alpha/include/asm/core_cia.h
+++ b/arch/alpha/include/asm/core_cia.h
@@ -341,7 +341,7 @@ struct el_CIA_sysdata_mcheck {
 #define vuip   volatile unsigned int __force *
 #define vulp   volatile unsigned long __force *
 
-__EXTERN_INLINE unsigned int cia_ioread8(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int cia_ioread8(const void __iomem *xaddr)
 {
unsigned long addr = (unsigned long) xaddr;
unsigned long result, base_and_type;
@@ -373,7 +373,7 @@ __EXTERN_INLINE void cia_iowrite8(u8 b, void __iomem *xaddr)
*(vuip) ((addr << 5) + base_and_type) = w;
 }
 
-__EXTERN_INLINE unsigned int cia_ioread16(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int cia_ioread16(const void __iomem *xaddr)
 {
unsigned long addr = (unsigned long) xaddr;
unsigned long result, base_and_type;
@@ -403,7 +403,7 @@ __EXTERN_INLINE void cia_iowrite16(u16 b, void __iomem 
*xaddr)
*(vuip) ((addr << 5) + base_and_type) = w;
 }
 
-__EXTERN_INLINE unsigned int cia_ioread32(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int cia_ioread32(const void __iomem *xaddr)
 {
unsigned long addr = (unsigned long) xaddr;
if (addr < CIA_DENSE_MEM)
diff --git a/arch/alpha/include/asm/core_lca.h 
b/arch/alpha/include/asm/core_lca.h
index 8ee6c516279c..25277e989731 100644
--- a/arch/alpha/include/asm/core_lca.h
+++ b/arch/alpha/include/asm/core_lca.h
@@ -229,7 +229,7 @@ union el_lca {
} while (0)
 
 
-__EXTERN_INLINE unsigned int lca_ioread8(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int lca_ioread8(const void __iomem *xaddr)
 {

Re: [PATCH 1/3] kprobes: introduce weak variant of kprobe_exceptions_notify

2017-02-13 Thread Naveen N. Rao
On 2017/02/10 02:41PM, Michael Ellerman wrote:
> "Naveen N. Rao"  writes:
> 
> > kprobe_exceptions_notify() is not used on some of the architectures such
> > as arm[64] and powerpc anymore. Introduce a weak variant for such
> > architectures.
> 
> I'll merge patch 1 & 3 via the powerpc tree for v4.11.
> 
> You can then send patch 2 to the arm guys after -rc1, or for 4.12.

Sure, thanks!

- Naveen