[PATCH 0/5] Use per-CPU temporary mappings for patching

2020-06-02 Thread Christopher M. Riedl
When compiled with CONFIG_STRICT_KERNEL_RWX, the kernel must create
temporary mappings when patching itself. These mappings temporarily
override the strict RWX text protections to permit a write. Currently,
powerpc allocates a per-CPU VM area for patching. Patching occurs as
follows:

1. Map page of text to be patched to per-CPU VM area w/
   PAGE_KERNEL protection
2. Patch text
3. Remove the temporary mapping

While the VM area is per-CPU, the mapping is actually inserted into the
kernel page tables. Presumably, this could allow another CPU to access
the normally write-protected text - either maliciously or accidentally -
via this same mapping if the address of the VM area is known. Ideally,
the mapping should be kept local to the CPU doing the patching (or any
other sensitive operations requiring temporarily overriding memory
protections) [0].

x86 introduced "temporary mm" structs which allow the creation of
mappings local to a particular CPU [1]. This series intends to bring the
notion of a temporary mm to powerpc and harden powerpc by using such a
mapping for patching a kernel with strict RWX permissions.

The first patch introduces the temporary mm struct and API for powerpc
along with a new function to retrieve a current hw breakpoint.

The second patch uses the `poking_init` init hook added by the x86
patches to initialize a temporary mm and patching address. The patching
address is randomized between 0 and DEFAULT_MAP_WINDOW-PAGE_SIZE. The
upper limit is necessary due to how the hash MMU operates - by default
the space above DEFAULT_MAP_WINDOW is not available. For now, both hash
and radix randomize inside this range. The number of possible random
addresses is dependent on PAGE_SIZE and limited by DEFAULT_MAP_WINDOW.

Bits of entropy with 64K page size on BOOK3S_64:

bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)

PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
bits of entropy = log2(128TB / 64K)
bits of entropy = 31

Randomization occurs only once during initialization at boot.

The third patch replaces the VM area with the temporary mm in the
patching code. The page for patching has to be mapped PAGE_SHARED with
the hash MMU since hash prevents the kernel from accessing userspace
pages with PAGE_PRIVILEGED bit set. On the radix MMU the page is mapped with
PAGE_KERNEL which has the added benefit that we can skip KUAP. 

The fourth and fifth patches implement an LKDTM "proof-of-concept" test
which exploits the previous vulnerability (i.e. the mapping during patching
is exposed in kernel page tables and accessible by other CPUs). The LKDTM
test is somewhat "rough" in that it uses a brute-force approach - I am open
to any suggestions and/or ideas to improve this. Currently, the LKDTM test
passes with this series on POWER8 (hash) and POWER9 (radix, hash) and fails
without this series (i.e. the temporary mapping for patching is exposed to
CPUs other than the patching CPU).

The test can be applied to a tree without this new series by first
adding this in arch/powerpc/lib/code-patching.c:

 #ifdef CONFIG_STRICT_KERNEL_RWX
 static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);

+#ifdef CONFIG_LKDTM
+unsigned long read_cpu_patching_addr(unsigned int cpu)
+{
+   return (unsigned long)(per_cpu(text_poke_area, cpu))->addr;
+}
+#endif
+
 static int text_area_cpu_up(unsigned int cpu)
 {
struct vm_struct *area;

And then applying the last patch of this series which adds the LKDTM test,
(powerpc: Add LKDTM test to hijack a patch mapping).

Tested on QEMU (POWER8, POWER9), POWER8 VM, and a Blackbird (8-core POWER9).

v1: * Rebase on linuxppc/next (4336b9337824)
* Save and restore second hw watchpoint
* Use new ppc_inst_* functions for patching check and in LKDTM
  test

rfc-v2: * Many fixes and improvements mostly based on extensive feedback and
  testing by Christophe Leroy (thanks!).
* Make patching_mm and patching_addr static and move '__ro_after_init'
  to after the variable name (more common in other parts of the kernel)
* Use 'asm/debug.h' header instead of 'asm/hw_breakpoint.h' to fix
  PPC64e compile
* Add comment explaining why we use BUG_ON() during the init call to
  setup for patching later
* Move ptep into patch_mapping to avoid walking page tables a second
  time when unmapping the temporary mapping
* Use KUAP under non-radix, also manually dirty the PTE for patch
  mapping on non-BOOK3S_64 platforms
* Properly return any error from __patch_instruction
* Do not use 'memcmp' where a simple comparison is appropriate
* Simplify expression for patch address by removing pointer maths
* Add LKDTM test


[0]: https://github.com/linuxppc/issues/issues/224
[1]: https://lore.kernel.org/kernel-hardening/20190426232303.28381-1-nadav.a...@gmail.com/


Christopher M. Riedl 

[PATCH 5/5] powerpc: Add LKDTM test to hijack a patch mapping

2020-06-02 Thread Christopher M. Riedl
When live patching with STRICT_KERNEL_RWX, the CPU doing the patching
must use a temporary mapping which allows for writing to kernel text.
During the entire window of time when this temporary mapping is in use,
another CPU could write to the same mapping and maliciously alter kernel
text. Implement an LKDTM test to attempt to exploit such an opening when
a CPU is patching under STRICT_KERNEL_RWX. The test is only implemented
on powerpc for now.

The LKDTM "hijack" test works as follows:

1. A CPU executes an infinite loop to patch an instruction.
   This is the "patching" CPU.
2. Another CPU attempts to write to the address of the temporary
   mapping used by the "patching" CPU. This other CPU is the
   "hijacker" CPU. The hijack either fails with a segfault or
   succeeds, in which case some kernel text is now overwritten.

How to run the test:

mount -t debugfs none /sys/kernel/debug
(echo HIJACK_PATCH > /sys/kernel/debug/provoke-crash/DIRECT)

Signed-off-by: Christopher M. Riedl 
---
 drivers/misc/lkdtm/core.c  |   1 +
 drivers/misc/lkdtm/lkdtm.h |   1 +
 drivers/misc/lkdtm/perms.c | 101 +
 3 files changed, 103 insertions(+)

diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
index a5e344df9166..482e72f6a1e1 100644
--- a/drivers/misc/lkdtm/core.c
+++ b/drivers/misc/lkdtm/core.c
@@ -145,6 +145,7 @@ static const struct crashtype crashtypes[] = {
CRASHTYPE(WRITE_RO),
CRASHTYPE(WRITE_RO_AFTER_INIT),
CRASHTYPE(WRITE_KERN),
+   CRASHTYPE(HIJACK_PATCH),
CRASHTYPE(REFCOUNT_INC_OVERFLOW),
CRASHTYPE(REFCOUNT_ADD_OVERFLOW),
CRASHTYPE(REFCOUNT_INC_NOT_ZERO_OVERFLOW),
diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h
index 601a2156a0d4..bfcf3542370d 100644
--- a/drivers/misc/lkdtm/lkdtm.h
+++ b/drivers/misc/lkdtm/lkdtm.h
@@ -62,6 +62,7 @@ void lkdtm_EXEC_USERSPACE(void);
 void lkdtm_EXEC_NULL(void);
 void lkdtm_ACCESS_USERSPACE(void);
 void lkdtm_ACCESS_NULL(void);
+void lkdtm_HIJACK_PATCH(void);
 
 /* lkdtm_refcount.c */
 void lkdtm_REFCOUNT_INC_OVERFLOW(void);
diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c
index 62f76d506f04..8bda3b56bc78 100644
--- a/drivers/misc/lkdtm/perms.c
+++ b/drivers/misc/lkdtm/perms.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /* Whether or not to fill the target memory area with do_nothing(). */
@@ -213,6 +214,106 @@ void lkdtm_ACCESS_NULL(void)
*ptr = tmp;
 }
 
+#if defined(CONFIG_PPC) && defined(CONFIG_STRICT_KERNEL_RWX)
+#include 
+
+extern unsigned long read_cpu_patching_addr(unsigned int cpu);
+
+static struct ppc_inst * const patch_site = (struct ppc_inst *)&do_nothing;
+
+static int lkdtm_patching_cpu(void *data)
+{
+   int err = 0;
+   struct ppc_inst insn = ppc_inst(0xdeadbeef);
+
+   pr_info("starting patching_cpu=%d\n", smp_processor_id());
+   do {
+   err = patch_instruction(patch_site, insn);
+   } while (ppc_inst_equal(ppc_inst_read(READ_ONCE(patch_site)), insn) &&
+   !err && !kthread_should_stop());
+
+   if (err)
+   pr_warn("patch_instruction returned error: %d\n", err);
+
+   set_current_state(TASK_INTERRUPTIBLE);
+   while (!kthread_should_stop()) {
+   schedule();
+   set_current_state(TASK_INTERRUPTIBLE);
+   }
+
+   return err;
+}
+
+void lkdtm_HIJACK_PATCH(void)
+{
+   struct task_struct *patching_kthrd;
+   struct ppc_inst original_insn;
+   int patching_cpu, hijacker_cpu, attempts;
+   unsigned long addr;
+   bool hijacked;
+
+   if (num_online_cpus() < 2) {
+   pr_warn("need at least two cpus\n");
+   return;
+   }
+
+   original_insn = ppc_inst_read(READ_ONCE(patch_site));
+
+   hijacker_cpu = smp_processor_id();
+   patching_cpu = cpumask_any_but(cpu_online_mask, hijacker_cpu);
+
+   patching_kthrd = kthread_create_on_node(&lkdtm_patching_cpu, NULL,
+   cpu_to_node(patching_cpu),
+   "lkdtm_patching_cpu");
+   kthread_bind(patching_kthrd, patching_cpu);
+   wake_up_process(patching_kthrd);
+
+   addr = offset_in_page(patch_site) | read_cpu_patching_addr(patching_cpu);
+
+   pr_info("starting hijacker_cpu=%d\n", hijacker_cpu);
+   for (attempts = 0; attempts < 10; ++attempts) {
+   /* Use __put_user to catch faults without an Oops */
+   hijacked = !__put_user(0xbad00bad, (unsigned int *)addr);
+
+   if (hijacked) {
+   if (kthread_stop(patching_kthrd))
+   goto out;
+   break;
+   }
+   }
+   pr_info("hijack attempts: %d\n", attempts);
+
+   if (hijacked) {
+   if (*(unsigned int 

[PATCH 2/5] powerpc/lib: Initialize a temporary mm for code patching

2020-06-02 Thread Christopher M. Riedl
When code patching a STRICT_KERNEL_RWX kernel the page containing the
address to be patched is temporarily mapped with permissive memory
protections. Currently, a per-cpu vmalloc patch area is used for this
purpose. While the patch area is per-cpu, the temporary page mapping is
inserted into the kernel page tables for the duration of the patching.
The mapping is exposed to CPUs other than the patching CPU - this is
undesirable from a hardening perspective.

Use the `poking_init` init hook to prepare a temporary mm and patching
address. Initialize the temporary mm by copying the init mm. Choose a
randomized patching address inside the temporary mm userspace address
portion. The next patch uses the temporary mm and patching address for
code patching.

Based on x86 implementation:

commit 4fc19708b165
("x86/alternatives: Initialize temporary mm for patching")

Signed-off-by: Christopher M. Riedl 
---
 arch/powerpc/lib/code-patching.c | 33 
 1 file changed, 33 insertions(+)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 5ecf0d635a8d..599114f63b44 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -11,6 +11,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -45,6 +47,37 @@ int raw_patch_instruction(struct ppc_inst *addr, struct 
ppc_inst instr)
 }
 
 #ifdef CONFIG_STRICT_KERNEL_RWX
+
+static struct mm_struct *patching_mm __ro_after_init;
+static unsigned long patching_addr __ro_after_init;
+
+void __init poking_init(void)
+{
+   spinlock_t *ptl; /* for protecting pte table */
+   pte_t *ptep;
+
+   /*
+* Some parts of the kernel (static keys for example) depend on
+* successful code patching. Code patching under STRICT_KERNEL_RWX
+* requires this setup - otherwise we cannot patch at all. We use
+* BUG_ON() here and later since an early failure is preferred to
+* buggy behavior and/or strange crashes later.
+*/
+   patching_mm = copy_init_mm();
+   BUG_ON(!patching_mm);
+
+   /*
+* In hash we cannot go above DEFAULT_MAP_WINDOW easily.
+* XXX: Do we want additional bits of entropy for radix?
+*/
+   patching_addr = (get_random_long() & PAGE_MASK) %
+   (DEFAULT_MAP_WINDOW - PAGE_SIZE);
+
+   ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
+   BUG_ON(!ptep);
+   pte_unmap_unlock(ptep, ptl);
+}
+
 static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
 
 static int text_area_cpu_up(unsigned int cpu)
-- 
2.26.2



[PATCH 3/5] powerpc/lib: Use a temporary mm for code patching

2020-06-02 Thread Christopher M. Riedl
Currently, code patching a STRICT_KERNEL_RWX kernel exposes the temporary
mappings to other CPUs. These mappings should be kept local to the CPU
doing the patching. Use the pre-initialized temporary mm and patching
address for this purpose. Also add a check after patching to ensure the
patch succeeded.

Use the KUAP functions on non-BOOK3S_64 platforms since the temporary
mapping for patching uses a userspace address (to keep the mapping
local). On BOOK3S_64 platforms hash does not implement KUAP and on radix
the use of PAGE_KERNEL sets EAA[0] for the PTE which means the AMR
(KUAP) protection is ignored (see PowerISA v3.0b, Fig. 35).

Based on x86 implementation:

commit b3fd8e83ada0
("x86/alternatives: Use temporary mm for text poking")

Signed-off-by: Christopher M. Riedl 
---
 arch/powerpc/lib/code-patching.c | 148 ---
 1 file changed, 55 insertions(+), 93 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 599114f63b44..df0765845204 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int __patch_instruction(struct ppc_inst *exec_addr, struct ppc_inst 
instr,
   struct ppc_inst *patch_addr)
@@ -78,101 +79,58 @@ void __init poking_init(void)
pte_unmap_unlock(ptep, ptl);
 }
 
-static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
-
-static int text_area_cpu_up(unsigned int cpu)
-{
-   struct vm_struct *area;
-
-   area = get_vm_area(PAGE_SIZE, VM_ALLOC);
-   if (!area) {
-   WARN_ONCE(1, "Failed to create text area for cpu %d\n",
-   cpu);
-   return -1;
-   }
-   this_cpu_write(text_poke_area, area);
-
-   return 0;
-}
-
-static int text_area_cpu_down(unsigned int cpu)
-{
-   free_vm_area(this_cpu_read(text_poke_area));
-   return 0;
-}
-
-/*
- * Run as a late init call. This allows all the boot time patching to be done
- * simply by patching the code, and then we're called here prior to
- * mark_rodata_ro(), which happens after all init calls are run. Although
- * BUG_ON() is rude, in this case it should only happen if ENOMEM, and we judge
- * it as being preferable to a kernel that will crash later when someone tries
- * to use patch_instruction().
- */
-static int __init setup_text_poke_area(void)
-{
-   BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
-   "powerpc/text_poke:online", text_area_cpu_up,
-   text_area_cpu_down));
-
-   return 0;
-}
-late_initcall(setup_text_poke_area);
+struct patch_mapping {
+   spinlock_t *ptl; /* for protecting pte table */
+   pte_t *ptep;
+   struct temp_mm temp_mm;
+};
 
 /*
  * This can be called for kernel text or a module.
  */
-static int map_patch_area(void *addr, unsigned long text_poke_addr)
+static int map_patch(const void *addr, struct patch_mapping *patch_mapping)
 {
-   unsigned long pfn;
-   int err;
+   struct page *page;
+   pte_t pte;
+   pgprot_t pgprot;
 
if (is_vmalloc_addr(addr))
-   pfn = vmalloc_to_pfn(addr);
+   page = vmalloc_to_page(addr);
else
-   pfn = __pa_symbol(addr) >> PAGE_SHIFT;
+   page = virt_to_page(addr);
 
-   err = map_kernel_page(text_poke_addr, (pfn << PAGE_SHIFT), PAGE_KERNEL);
+   if (radix_enabled())
+   pgprot = PAGE_KERNEL;
+   else
+   pgprot = PAGE_SHARED;
 
-   pr_devel("Mapped addr %lx with pfn %lx:%d\n", text_poke_addr, pfn, err);
-   if (err)
+   patch_mapping->ptep = get_locked_pte(patching_mm, patching_addr,
+					     &patch_mapping->ptl);
+   if (unlikely(!patch_mapping->ptep)) {
+   pr_warn("map patch: failed to allocate pte for patching\n");
return -1;
+   }
+
+   pte = mk_pte(page, pgprot);
+   if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64))
+   pte = pte_mkdirty(pte);
+   set_pte_at(patching_mm, patching_addr, patch_mapping->ptep, pte);
+
+   init_temp_mm(&patch_mapping->temp_mm, patching_mm);
+   use_temporary_mm(&patch_mapping->temp_mm);
 
return 0;
 }
 
-static inline int unmap_patch_area(unsigned long addr)
+static void unmap_patch(struct patch_mapping *patch_mapping)
 {
-   pte_t *ptep;
-   pmd_t *pmdp;
-   pud_t *pudp;
-   pgd_t *pgdp;
-
-   pgdp = pgd_offset_k(addr);
-   if (unlikely(!pgdp))
-   return -EINVAL;
-
-   pudp = pud_offset(pgdp, addr);
-   if (unlikely(!pudp))
-   return -EINVAL;
-
-   pmdp = pmd_offset(pudp, addr);
-   if (unlikely(!pmdp))
-   return -EINVAL;
-
-   ptep = pte_offset_kernel(pmdp, addr);
-   if (unlikely(!ptep))
-   return -EINVAL;
+   /* In hash, pte_clear flushes the tlb */
+   pte_clear(patching_mm, patching_addr, 

[PATCH 4/5] powerpc/lib: Add LKDTM accessor for patching addr

2020-06-02 Thread Christopher M. Riedl
When live patching a STRICT_KERNEL_RWX kernel, a mapping is installed at
a "patching address" with temporary write permissions. Provide an
LKDTM-only accessor function for this address in preparation for an
LKDTM test which attempts to "hijack" this mapping by writing to it from
another CPU.

Signed-off-by: Christopher M. Riedl 
---
 arch/powerpc/lib/code-patching.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index df0765845204..c23453049116 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -52,6 +52,13 @@ int raw_patch_instruction(struct ppc_inst *addr, struct 
ppc_inst instr)
 static struct mm_struct *patching_mm __ro_after_init;
 static unsigned long patching_addr __ro_after_init;
 
+#ifdef CONFIG_LKDTM
+unsigned long read_cpu_patching_addr(unsigned int cpu)
+{
+   return patching_addr;
+}
+#endif
+
 void __init poking_init(void)
 {
spinlock_t *ptl; /* for protecting pte table */
-- 
2.26.2



[PATCH 1/5] powerpc/mm: Introduce temporary mm

2020-06-02 Thread Christopher M. Riedl
x86 supports the notion of a temporary mm which restricts access to
temporary PTEs to a single CPU. A temporary mm is useful for situations
where a CPU needs to perform sensitive operations (such as patching a
STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing
said mappings to other CPUs. A side benefit is that other CPU TLBs do
not need to be flushed when the temporary mm is torn down.

Mappings in the temporary mm can be set in the userspace portion of the
address-space.

Interrupts must be disabled while the temporary mm is in use. HW
breakpoints, which may have been set by userspace as watchpoints on
addresses now within the temporary mm, are saved and disabled when
loading the temporary mm. The HW breakpoints are restored when unloading
the temporary mm. All HW breakpoints are indiscriminately disabled while
the temporary mm is in use.

Based on x86 implementation:

commit cefa929c034e
("x86/mm: Introduce temporary mm structs")

Signed-off-by: Christopher M. Riedl 
---
 arch/powerpc/include/asm/debug.h   |  1 +
 arch/powerpc/include/asm/mmu_context.h | 64 ++
 arch/powerpc/kernel/process.c  |  5 ++
 3 files changed, 70 insertions(+)

diff --git a/arch/powerpc/include/asm/debug.h b/arch/powerpc/include/asm/debug.h
index ec57daf87f40..827350c9bcf3 100644
--- a/arch/powerpc/include/asm/debug.h
+++ b/arch/powerpc/include/asm/debug.h
@@ -46,6 +46,7 @@ static inline int debugger_fault_handler(struct pt_regs 
*regs) { return 0; }
 #endif
 
 void __set_breakpoint(int nr, struct arch_hw_breakpoint *brk);
+void __get_breakpoint(int nr, struct arch_hw_breakpoint *brk);
 bool ppc_breakpoint_available(void);
 #ifdef CONFIG_PPC_ADV_DEBUG_REGS
 extern void do_send_trap(struct pt_regs *regs, unsigned long address,
diff --git a/arch/powerpc/include/asm/mmu_context.h 
b/arch/powerpc/include/asm/mmu_context.h
index 1a474f6b1992..9269c7c7b04e 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -10,6 +10,7 @@
 #include
 #include 
 #include 
+#include 
 
 /*
  * Most if the context management is out of line
@@ -300,5 +301,68 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm,
return 0;
 }
 
+struct temp_mm {
+   struct mm_struct *temp;
+   struct mm_struct *prev;
+   bool is_kernel_thread;
+   struct arch_hw_breakpoint brk[HBP_NUM_MAX];
+};
+
+static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct *mm)
+{
+   temp_mm->temp = mm;
+   temp_mm->prev = NULL;
+   temp_mm->is_kernel_thread = false;
+   memset(&temp_mm->brk, 0, sizeof(temp_mm->brk));
+}
+
+static inline void use_temporary_mm(struct temp_mm *temp_mm)
+{
+   lockdep_assert_irqs_disabled();
+
+   temp_mm->is_kernel_thread = current->mm == NULL;
+   if (temp_mm->is_kernel_thread)
+   temp_mm->prev = current->active_mm;
+   else
+   temp_mm->prev = current->mm;
+
+   /*
+* Hash requires a non-NULL current->mm to allocate a userspace address
+* when handling a page fault. Does not appear to hurt in Radix either.
+*/
+   current->mm = temp_mm->temp;
+   switch_mm_irqs_off(NULL, temp_mm->temp, current);
+
+   if (ppc_breakpoint_available()) {
+   struct arch_hw_breakpoint null_brk = {0};
+   int i = 0;
+
+   for (; i < nr_wp_slots(); ++i) {
+   __get_breakpoint(i, &temp_mm->brk[i]);
+   if (temp_mm->brk[i].type != 0)
+   __set_breakpoint(i, &null_brk);
+   }
+   }
+}
+
+static inline void unuse_temporary_mm(struct temp_mm *temp_mm)
+{
+   lockdep_assert_irqs_disabled();
+
+   if (temp_mm->is_kernel_thread)
+   current->mm = NULL;
+   else
+   current->mm = temp_mm->prev;
+   switch_mm_irqs_off(NULL, temp_mm->prev, current);
+
+   if (ppc_breakpoint_available()) {
+   int i = 0;
+
+   for (; i < nr_wp_slots(); ++i)
+   if (temp_mm->brk[i].type != 0)
+   __set_breakpoint(i, &temp_mm->brk[i]);
+   }
+}
+
 #endif /* __KERNEL__ */
 #endif /* __ASM_POWERPC_MMU_CONTEXT_H */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 048d64c4e115..3973144f6980 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -825,6 +825,11 @@ static inline int set_breakpoint_8xx(struct 
arch_hw_breakpoint *brk)
return 0;
 }
 
+void __get_breakpoint(int nr, struct arch_hw_breakpoint *brk)
+{
+   memcpy(brk, this_cpu_ptr(&current_brk[nr]), sizeof(*brk));
+}
+
 void __set_breakpoint(int nr, struct arch_hw_breakpoint *brk)
 {
memcpy(this_cpu_ptr(&current_brk[nr]), brk, sizeof(*brk));
-- 
2.26.2



Re: cxl: Fix kobject memory leak in cxl_sysfs_afu_new_cr()

2020-06-02 Thread Markus Elfring
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst?id=f359287765c04711ff54fbd11645271d8e5ff763#n465
>> I just used the original author's label, should I replace all his labels
>> like'err','err1' with reasonable one.
>
> No.

Do you insist on deviating from the current Linux coding style?

Regards,
Markus


[powerpc:merge] BUILD SUCCESS 4d8244d005b2a92c872a0a993c3aa94b5842e56b

2020-06-02 Thread kbuild test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  
merge
branch HEAD: 4d8244d005b2a92c872a0a993c3aa94b5842e56b  Automatic merge of 
'master', 'next' and 'fixes' (2020-06-02 11:51)

elapsed time: 922m

configs tested: 119
configs skipped: 10

The following configs have been built successfully.
More configs may be tested in the coming days.

arm defconfig
arm  allyesconfig
arm  allmodconfig
arm   allnoconfig
arm64allyesconfig
arm64allmodconfig
arm64 allnoconfig
arm64   defconfig
arm   imx_v4_v5_defconfig
h8300 edosk2674_defconfig
pariscgeneric-64bit_defconfig
m68k amcore_defconfig
arm davinci_all_defconfig
m68kq40_defconfig
sh   se7343_defconfig
mipsnlm_xlp_defconfig
arm   corgi_defconfig
mips  bmips_stb_defconfig
armrealview_defconfig
sparcallyesconfig
alpha   defconfig
arm  prima2_defconfig
s390  allnoconfig
mips  allnoconfig
mipsgpr_defconfig
sh sh7710voipgw_defconfig
ia64 allmodconfig
powerpc  storcenter_defconfig
mips  decstation_64_defconfig
sh   rts7751r2dplus_defconfig
sh magicpanelr2_defconfig
pariscallnoconfig
mips   ci20_defconfig
mips   ip22_defconfig
s390 alldefconfig
c6xevmc6472_defconfig
i386  allnoconfig
i386 allyesconfig
i386defconfig
i386  debian-10.3
ia64defconfig
ia64  allnoconfig
ia64 allyesconfig
m68k allmodconfig
m68k  allnoconfig
m68k   sun3_defconfig
m68kdefconfig
m68k allyesconfig
nios2   defconfig
nios2allyesconfig
openriscdefconfig
c6x  allyesconfig
c6x   allnoconfig
openrisc allyesconfig
nds32   defconfig
nds32 allnoconfig
csky allyesconfig
cskydefconfig
alphaallyesconfig
xtensa  defconfig
xtensa   allyesconfig
h8300allyesconfig
h8300allmodconfig
arc defconfig
arc  allyesconfig
sh   allmodconfig
shallnoconfig
microblazeallnoconfig
mips allyesconfig
mips allmodconfig
parisc  defconfig
parisc   allyesconfig
parisc   allmodconfig
powerpc defconfig
powerpc  allyesconfig
powerpc  rhel-kconfig
powerpc  allmodconfig
powerpc   allnoconfig
i386 randconfig-a001-20200602
i386 randconfig-a006-20200602
i386 randconfig-a002-20200602
i386 randconfig-a005-20200602
i386 randconfig-a003-20200602
i386 randconfig-a004-20200602
x86_64   randconfig-a011-20200602
x86_64   randconfig-a016-20200602
x86_64   randconfig-a013-20200602
x86_64   randconfig-a012-20200602
x86_64   randconfig-a014-20200602
x86_64   randconfig-a015-20200602
i386 randconfig-a014-20200602
i386 randconfig-a015-20200602
i386 randconfig-a011-20200602
i386 randconfig-a016-20200602
i386 randconfig-a012-20200602
i386 randconfig-a013-20200602
riscvallyesconfig
riscv allnoconfig
riscv   defconfig
riscvallmodconfig
s390

[powerpc:next] BUILD SUCCESS 4336b9337824a60a0b10013c622caeee99460db5

2020-06-02 Thread kbuild test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  
next
branch HEAD: 4336b9337824a60a0b10013c622caeee99460db5  powerpc/pseries: Make 
vio and ibmebus initcalls pseries specific

elapsed time: 919m

configs tested: 118
configs skipped: 12

The following configs have been built successfully.
More configs may be tested in the coming days.

arm defconfig
arm  allyesconfig
arm  allmodconfig
arm   allnoconfig
arm64allyesconfig
arm64   defconfig
arm64allmodconfig
arm64 allnoconfig
arm davinci_all_defconfig
m68kq40_defconfig
sh   se7343_defconfig
mipsnlm_xlp_defconfig
arm   corgi_defconfig
sparcallyesconfig
mips  bmips_stb_defconfig
armrealview_defconfig
alpha   defconfig
arm  prima2_defconfig
s390  allnoconfig
mips  allnoconfig
mipsgpr_defconfig
sh sh7710voipgw_defconfig
ia64 allmodconfig
powerpc  storcenter_defconfig
mips  decstation_64_defconfig
sh   rts7751r2dplus_defconfig
sh magicpanelr2_defconfig
pariscallnoconfig
mips   ci20_defconfig
mips   ip22_defconfig
powerpc  alldefconfig
mipsmaltaup_defconfig
mipsjmr3927_defconfig
s390 alldefconfig
c6xevmc6472_defconfig
i386  allnoconfig
i386 allyesconfig
i386defconfig
i386  debian-10.3
ia64defconfig
ia64  allnoconfig
ia64 allyesconfig
m68k allmodconfig
m68k  allnoconfig
m68k   sun3_defconfig
m68kdefconfig
m68k allyesconfig
nios2   defconfig
nios2allyesconfig
openriscdefconfig
c6x  allyesconfig
c6x   allnoconfig
openrisc allyesconfig
nds32   defconfig
nds32 allnoconfig
csky allyesconfig
cskydefconfig
alphaallyesconfig
xtensa  defconfig
xtensa   allyesconfig
h8300allyesconfig
h8300allmodconfig
arc defconfig
arc  allyesconfig
sh   allmodconfig
shallnoconfig
microblazeallnoconfig
mips allyesconfig
mips allmodconfig
parisc  defconfig
parisc   allyesconfig
parisc   allmodconfig
powerpc defconfig
powerpc  allyesconfig
powerpc  rhel-kconfig
powerpc  allmodconfig
powerpc   allnoconfig
i386 randconfig-a001-20200602
i386 randconfig-a006-20200602
i386 randconfig-a002-20200602
i386 randconfig-a005-20200602
i386 randconfig-a003-20200602
i386 randconfig-a004-20200602
x86_64   randconfig-a011-20200602
x86_64   randconfig-a016-20200602
x86_64   randconfig-a013-20200602
x86_64   randconfig-a012-20200602
x86_64   randconfig-a014-20200602
x86_64   randconfig-a015-20200602
i386 randconfig-a014-20200602
i386 randconfig-a015-20200602
i386 randconfig-a011-20200602
i386 randconfig-a016-20200602
i386 randconfig-a012-20200602
i386 randconfig-a013-20200602
riscvallyesconfig
riscv allnoconfig
riscv   defconfig
riscvallmodconfig
s390 allyesconfig
s390

Re: [PATCH 1/4] dma-mapping: move the remaining DMA API calls out of line

2020-06-02 Thread Alexey Kardashevskiy



On 09/05/2020 18:19, Christoph Hellwig wrote:
> On Tue, May 05, 2020 at 02:18:37PM +1000, Alexey Kardashevskiy wrote:
>>
>>
>> On 17/04/2020 17:58, Christoph Hellwig wrote:
>>> On Wed, Apr 15, 2020 at 09:21:37PM +1000, Alexey Kardashevskiy wrote:
 And the fact they were exported leaves possibility that there is a
 driver somewhere relying on these symbols or distro kernel won't build
 because the symbol disappeared from exports (I do not know what KABI
 guarantees or if mainline kernel cares).
>>>
>>> We absolutely do not care.  In fact for abuses of APIs that drivers
>>> should not use we almost care to make them private and break people
>>> abusing them.
>>
>> ok :)
>>
 I do not care in particular but
 some might, a line separated with empty lines in the commit log would do.
>>>
>>> I'll add a blurb for the next version.
>>
>>
>> Has it gone anywhere? Thanks,
> 
> I've been hoping for the sg_buf helpers to land first, as they need
> backporting and would conflict.  Do you urgently need the series?

Any progress with sg_buf helpers stuff? Thanks,



-- 
Alexey


[PATCH 3/3] ASoC: fsl_easrc: Fix "Function parameter not described" warnings

2020-06-02 Thread Shengjiu Wang
Obtained with:
$ make W=1

sound/soc/fsl/fsl_easrc.c:403: warning: Function parameter or member 'easrc' 
not described in 'fsl_easrc_normalize_filter'
sound/soc/fsl/fsl_easrc.c:403: warning: Function parameter or member 'infilter' 
not described in 'fsl_easrc_normalize_filter'
sound/soc/fsl/fsl_easrc.c:403: warning: Function parameter or member 
'outfilter' not described in 'fsl_easrc_normalize_filter'
sound/soc/fsl/fsl_easrc.c:403: warning: Function parameter or member 'shift' 
not described in 'fsl_easrc_normalize_filter'

Fixes: 955ac624058f ("ASoC: fsl_easrc: Add EASRC ASoC CPU DAI drivers")
Signed-off-by: Shengjiu Wang 
Reported-by: kbuild test robot 
---
 sound/soc/fsl/fsl_easrc.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/sound/soc/fsl/fsl_easrc.c b/sound/soc/fsl/fsl_easrc.c
index 7d8bf9d47842..2f6b3d8bfcfc 100644
--- a/sound/soc/fsl/fsl_easrc.c
+++ b/sound/soc/fsl/fsl_easrc.c
@@ -389,11 +389,11 @@ static int fsl_easrc_resampler_config(struct fsl_asrc 
*easrc)
  *  For input int[16, 24, 32] -> output float32
  *  scale it by multiplying filter coefficients by 2^-15, 2^-23, 2^-31
  *  input:
- *  asrc:  Structure pointer of fsl_asrc
- *  infilter : Pointer to non-scaled input filter
- *  shift:  The multiply factor
+ *  @easrc:  Structure pointer of fsl_asrc
+ *  @infilter : Pointer to non-scaled input filter
+ *  @shift:  The multiply factor
  *  output:
- *  outfilter: scaled filter
+ *  @outfilter: scaled filter
  */
 static int fsl_easrc_normalize_filter(struct fsl_asrc *easrc,
  u64 *infilter,
-- 
2.21.0



[PATCH 2/3] ASoC: fsl_easrc: Fix -Wunused-but-set-variable

2020-06-02 Thread Shengjiu Wang
Obtained with:
$ make W=1

sound/soc/fsl/fsl_easrc.c: In function 'fsl_easrc_set_rs_ratio':
sound/soc/fsl/fsl_easrc.c:182:15: warning: variable 'int_bits' set but not used 
[-Wunused-but-set-variable]
  unsigned int int_bits;
   ^
sound/soc/fsl/fsl_easrc.c: In function 'fsl_easrc_set_ctx_organziation':
sound/soc/fsl/fsl_easrc.c:1204:17: warning: variable 'dev' set but not used 
[-Wunused-but-set-variable]
  struct device *dev;
 ^
sound/soc/fsl/fsl_easrc.c: In function 'fsl_easrc_release_context':
sound/soc/fsl/fsl_easrc.c:1294:17: warning: variable 'dev' set but not used 
[-Wunused-but-set-variable]
  struct device *dev;
 ^
Fixes: 955ac624058f ("ASoC: fsl_easrc: Add EASRC ASoC CPU DAI drivers")
Signed-off-by: Shengjiu Wang 
Reported-by: kbuild test robot 
---
 sound/soc/fsl/fsl_easrc.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/sound/soc/fsl/fsl_easrc.c b/sound/soc/fsl/fsl_easrc.c
index f227308a50e2..7d8bf9d47842 100644
--- a/sound/soc/fsl/fsl_easrc.c
+++ b/sound/soc/fsl/fsl_easrc.c
@@ -179,22 +179,21 @@ static int fsl_easrc_set_rs_ratio(struct fsl_asrc_pair 
*ctx)
struct fsl_easrc_ctx_priv *ctx_priv = ctx->private;
unsigned int in_rate = ctx_priv->in_params.norm_rate;
unsigned int out_rate = ctx_priv->out_params.norm_rate;
-   unsigned int int_bits;
unsigned int frac_bits;
u64 val;
u32 *r;
 
switch (easrc_priv->rs_num_taps) {
case EASRC_RS_32_TAPS:
-   int_bits = 5;
+   /* integer bits = 5; */
frac_bits = 39;
break;
case EASRC_RS_64_TAPS:
-   int_bits = 6;
+   /* integer bits = 6; */
frac_bits = 38;
break;
case EASRC_RS_128_TAPS:
-   int_bits = 7;
+   /* integer bits = 7; */
frac_bits = 37;
break;
default:
@@ -1201,7 +1200,6 @@ static int fsl_easrc_set_ctx_format(struct fsl_asrc_pair *ctx,
 static int fsl_easrc_set_ctx_organziation(struct fsl_asrc_pair *ctx)
 {
struct fsl_easrc_ctx_priv *ctx_priv;
-   struct device *dev;
struct fsl_asrc *easrc;
 
if (!ctx)
@@ -1209,7 +1207,6 @@ static int fsl_easrc_set_ctx_organziation(struct fsl_asrc_pair *ctx)
 
easrc = ctx->asrc;
ctx_priv = ctx->private;
-   dev = &easrc->pdev->dev;
 
/* input interleaving parameters */
regmap_update_bits(easrc->regmap, REG_EASRC_CIA(ctx->index),
@@ -1291,13 +1288,11 @@ static void fsl_easrc_release_context(struct fsl_asrc_pair *ctx)
 {
unsigned long lock_flags;
struct fsl_asrc *easrc;
-   struct device *dev;
 
if (!ctx)
return;
 
easrc = ctx->asrc;
-   dev = &easrc->pdev->dev;
 
	spin_lock_irqsave(&easrc->lock, lock_flags);
 
-- 
2.21.0



[PATCH 0/3] ASoC: fsl_easrc: Fix several warnings

2020-06-02 Thread Shengjiu Wang
Fix several warnings with "make W=1"

Shengjiu Wang (3):
  ASoC: fsl_easrc: Fix -Wmissing-prototypes warning
  ASoC: fsl_easrc: Fix -Wunused-but-set-variable
  ASoC: fsl_easrc: Fix "Function parameter not described" warnings

 sound/soc/fsl/fsl_easrc.c | 42 +--
 1 file changed, 18 insertions(+), 24 deletions(-)

-- 
2.21.0



[PATCH 1/3] ASoC: fsl_easrc: Fix -Wmissing-prototypes warning

2020-06-02 Thread Shengjiu Wang
Obtained with:
$ make W=1

sound/soc/fsl/fsl_easrc.c:967:5: warning: no previous prototype for function 'fsl_easrc_config_context' [-Wmissing-prototypes]
int fsl_easrc_config_context(struct fsl_asrc *easrc, unsigned int ctx_id)
    ^
sound/soc/fsl/fsl_easrc.c:967:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int fsl_easrc_config_context(struct fsl_asrc *easrc, unsigned int ctx_id)
^
static
sound/soc/fsl/fsl_easrc.c:1128:5: warning: no previous prototype for function 'fsl_easrc_set_ctx_format' [-Wmissing-prototypes]
int fsl_easrc_set_ctx_format(struct fsl_asrc_pair *ctx,
    ^
sound/soc/fsl/fsl_easrc.c:1128:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int fsl_easrc_set_ctx_format(struct fsl_asrc_pair *ctx,
^
static
sound/soc/fsl/fsl_easrc.c:1201:5: warning: no previous prototype for function 'fsl_easrc_set_ctx_organziation' [-Wmissing-prototypes]
int fsl_easrc_set_ctx_organziation(struct fsl_asrc_pair *ctx)
    ^
sound/soc/fsl/fsl_easrc.c:1201:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int fsl_easrc_set_ctx_organziation(struct fsl_asrc_pair *ctx)
^
static
sound/soc/fsl/fsl_easrc.c:1245:5: warning: no previous prototype for function 'fsl_easrc_request_context' [-Wmissing-prototypes]
int fsl_easrc_request_context(int channels, struct fsl_asrc_pair *ctx)
    ^
sound/soc/fsl/fsl_easrc.c:1245:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int fsl_easrc_request_context(int channels, struct fsl_asrc_pair *ctx)
^
static
sound/soc/fsl/fsl_easrc.c:1290:6: warning: no previous prototype for function 'fsl_easrc_release_context' [-Wmissing-prototypes]
void fsl_easrc_release_context(struct fsl_asrc_pair *ctx)
     ^
sound/soc/fsl/fsl_easrc.c:1290:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
void fsl_easrc_release_context(struct fsl_asrc_pair *ctx)
^
static
sound/soc/fsl/fsl_easrc.c:1317:5: warning: no previous prototype for function 'fsl_easrc_start_context' [-Wmissing-prototypes]
int fsl_easrc_start_context(struct fsl_asrc_pair *ctx)
    ^
sound/soc/fsl/fsl_easrc.c:1317:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int fsl_easrc_start_context(struct fsl_asrc_pair *ctx)
^
static
sound/soc/fsl/fsl_easrc.c:1335:5: warning: no previous prototype for function 'fsl_easrc_stop_context' [-Wmissing-prototypes]
int fsl_easrc_stop_context(struct fsl_asrc_pair *ctx)
    ^
sound/soc/fsl/fsl_easrc.c:1335:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int fsl_easrc_stop_context(struct fsl_asrc_pair *ctx)
^
static
sound/soc/fsl/fsl_easrc.c:1382:18: warning: no previous prototype for function 'fsl_easrc_get_dma_channel' [-Wmissing-prototypes]
struct dma_chan *fsl_easrc_get_dma_channel(struct fsl_asrc_pair *ctx,
                 ^
sound/soc/fsl/fsl_easrc.c:1382:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
struct dma_chan *fsl_easrc_get_dma_channel(struct fsl_asrc_pair *ctx,
^
static

Fixes: 955ac624058f ("ASoC: fsl_easrc: Add EASRC ASoC CPU DAI drivers")
Signed-off-by: Shengjiu Wang 
Reported-by: kbuild test robot 
---
 sound/soc/fsl/fsl_easrc.c | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/sound/soc/fsl/fsl_easrc.c b/sound/soc/fsl/fsl_easrc.c
index c6b5eb2d2af7..f227308a50e2 100644
--- a/sound/soc/fsl/fsl_easrc.c
+++ b/sound/soc/fsl/fsl_easrc.c
@@ -964,7 +964,7 @@ static int fsl_easrc_release_slot(struct fsl_asrc *easrc, unsigned int ctx_id)
  *
  * Configure the register relate with context.
  */
-int fsl_easrc_config_context(struct fsl_asrc *easrc, unsigned int ctx_id)
+static int fsl_easrc_config_context(struct fsl_asrc *easrc, unsigned int ctx_id)
 {
struct fsl_easrc_ctx_priv *ctx_priv;
struct fsl_asrc_pair *ctx;
@@ -1125,9 +1125,9 @@ static int fsl_easrc_process_format(struct fsl_asrc_pair *ctx,
return 0;
 }
 
-int fsl_easrc_set_ctx_format(struct fsl_asrc_pair *ctx,
-snd_pcm_format_t *in_raw_format,
-snd_pcm_format_t *out_raw_format)
+static int fsl_easrc_set_ctx_format(struct fsl_asrc_pair *ctx,
+   snd_pcm_format_t *in_raw_format,
+   snd_pcm_format_t *out_raw_format)
 {
struct fsl_asrc *easrc = ctx->asrc;
struct fsl_easrc_ctx_priv *ctx_priv = ctx->private;
@@ -1198,7 +1198,7 @@ int fsl_easrc_set_ctx_format(struct fsl_asrc_pair *ctx,
  * to conform with this format. Interleaving parameters are accessed
  * through the ASRC_CTRL_IN_ACCESSa and ASRC_CTRL_OUT_ACCESSa registers
  */
-int fsl_easrc_set_ctx_organziation(struct fsl_asrc_pair *ctx)
+static int 
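The patch above is truncated in this archive, but the pattern it applies — marking file-local functions `static` so the prototype warning goes away — can be demonstrated stand-alone (the names below are illustrative, not from the driver):

```shell
cat > proto.c <<'EOF'
/* Externally visible with no prototype in any header: the warning fires. */
int helper(int x) { return x + 1; }

/* The suggested fix: 'static' limits the symbol to this translation unit. */
static int quiet_helper(int x) { return x + 2; }

int main(void) { return quiet_helper(helper(0)) == 3 ? 0 : 1; }
EOF
gcc -Wmissing-prototypes -c proto.c -o /dev/null 2> warn.txt
cat warn.txt
```

Only `helper` is reported; the static function (and `main`, which gcc exempts) stay quiet.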

Re: [PATCH v2] selftests: powerpc: Add test for execute-disabled pkeys

2020-06-02 Thread Michael Ellerman
Sandipan Das  writes:
> Hi Michael,
>
> Thanks for your suggestions. I had a few questions regarding some
> of them.
>
> On 29/05/20 7:18 am, Michael Ellerman wrote:
>>> [...]
>>> +
>>> +static void pkeyreg_set(unsigned long uamr)
>>> +{
>>> +   asm volatile("isync; mtspr  0xd, %0; isync;" : : "r"(uamr));
>>> +}
>> 
>> You can use mtspr() there, but you'll need to add the isync's yourself.
>> 
>
> Would it make sense to add a new macro that adds the CSI instructions?
> Something like this.

I guess. I'm not sure there's that many places that need it, it's just
the pkey tests I think.

I'd be more inclined to have a set_amr() helper that includes the
isyncs, rather than a generic mtspr() version.

> diff --git a/tools/testing/selftests/powerpc/include/reg.h b/tools/testing/selftests/powerpc/include/reg.h
> index 022c5076b2c5..d60f66380cad 100644
> --- a/tools/testing/selftests/powerpc/include/reg.h
> +++ b/tools/testing/selftests/powerpc/include/reg.h
> @@ -15,6 +15,10 @@
>  #define mtspr(rn, v)   asm volatile("mtspr " _str(rn) ",%0" : \
> : "r" ((unsigned long)(v)) \
> : "memory")
> +#define mtspr_csi(rn, v)   ({ \
> +   asm volatile("isync; mtspr " _str(rn) ",%0; isync;" : \
> +   : "r" ((unsigned long)(v)) \
> +   : "memory"); })
>  
>  #define mb()   asm volatile("sync" : : : "memory");
>  #define barrier()  asm volatile("" : : : "memory");
>
>
> I also noticed that two of the ptrace-related pkey tests were also not
> using CSIs. I will fix those too.
>
>>> [...]
>>> +   /* The following two cases will avoid SEGV_PKUERR */
>>> +   ftype = -1;
>>> +   fpkey = -1;
>>> +
>>> +   /*
>>> +* Read an instruction word from the address when AMR bits
>>> +* are not set.
>> 
>> You should explain for people who aren't familiar with the ISA that "AMR
>> bits not set" means "read/write access allowed".
>> 
>>> +*
>>> +* This should not generate a fault as having PROT_EXEC
>>> +* implicitly allows reads. The pkey currently restricts
>> 
>> Whether PROT_EXEC implies read is not well defined (see the man page).
>> If you want to test this case I think you'd be better off specifying
>> PROT_EXEC | PROT_READ explicitly.
>> 
>
> But I guess specifying PROT_EXEC | PROT_READ defeats the purpose? I can
> tweak the passing condition though based on whether READ_IMPLIES_EXEC is
> set in the personality.
>
>> [...]
>>> +   FAIL_IF(faults != 0 || fcode != SEGV_ACCERR);
>>> +
>>> +   /* The following three cases will generate SEGV_PKUERR */
>>> +   ftype = PKEY_DISABLE_ACCESS;
>>> +   fpkey = pkey;
>>> +
>>> +   /*
>>> +* Read an instruction word from the address when AMR bits
>>> +* are set.
>>> +*
>>> +* This should generate a pkey fault based on AMR bits only
>>> +* as having PROT_EXEC implicitly allows reads.
>> 
>> Again would be better to specify PROT_READ IMHO.
>> 
>
> I can use a personality check here too.
>
>>> +*/
>>> +   faults = 1;
>>> +   FAIL_IF(sys_pkey_mprotect(insns, pgsize, PROT_EXEC, pkey) != 0);
>>> +   printf("read from %p, pkey is execute-disabled, access-disabled\n",
>>> +  (void *) faddr);
>>> +   pkey_set_rights(pkey, PKEY_DISABLE_ACCESS);
>>> +   i = *faddr;
>>> +   FAIL_IF(faults != 0 || fcode != SEGV_PKUERR);
>>> +
>>> +   /*
>>> +* Write an instruction word to the address when AMR bits
>>> +* are set.
>>> +*
>>> +* This should generate two faults. First, a pkey fault based
>>> +* on AMR bits and then an access fault based on PROT_EXEC.
>>> +*/
>>> +   faults = 2;
>> 
>> Setting faults to the expected value and decrementing it in the fault
>> handler is kind of weird.
>> 
>> I think it would be clearer if faults was just a zero-based counter of
>> how many faults we've taken, and then you test that it's == 2 below.
>> 
>>> +   FAIL_IF(sys_pkey_mprotect(insns, pgsize, PROT_EXEC, pkey) != 0);
>>> +   printf("write to %p, pkey is execute-disabled, access-disabled\n",
>>> +  (void *) faddr);
>>> +   pkey_set_rights(pkey, PKEY_DISABLE_ACCESS);
>>> +   *faddr = 0x6000;/* nop */
>>> +   FAIL_IF(faults != 0 || fcode != SEGV_ACCERR);
>> 
>> ie. FAIL_IF(faults != 2 || ... )
>> 
>
> Agreed, it is weird. IIRC, I did this to make sure that if the test
> kept getting repeated faults at the same address and exceeded the
> maximum number of expected faults i.e. it gets another fault when
> 'faults' is already zero, then the signal handler will attempt to
> let the program continue by giving all permissions to the page and
> also the pkey. Would it make sense to just rename 'faults' to
> something like 'remaining_faults'?

It seems like you've tried to make the code cope with a situation that
should not happen, and would indicate a bug if it did happen, in which
case I think it would be fine if the test just timed out.

But if you want to handle it 

Re: [PATCH] cxl: Fix kobject memory leak in cxl_sysfs_afu_new_cr()

2020-06-02 Thread Michael Ellerman
"wanghai (M)"  writes:
> 在 2020/6/3 1:20, Markus Elfring 写道:
>>> Fix it by adding a call to kobject_put() in the error path of
>>> kobject_init_and_add().
>> Thanks for another completion of the exception handling.
>>
>> Would an other patch subject be a bit nicer?
> Thanks for the guidance, I will perfect this description and send a v2
>>
>> …
>>> +++ b/drivers/misc/cxl/sysfs.c
>>> @@ -624,7 +624,7 @@ static struct afu_config_record *cxl_sysfs_afu_new_cr(struct cxl_afu *afu, int c
>>> rc = kobject_init_and_add(&cr->kobj, &afu_config_record_type,
>>>   &afu->dev.kobj, "cr%i", cr->cr);
>>> if (rc)
>>> -   goto err;
>>> +   goto err1;
>> …
>>
>> Can an other label be more reasonable here?
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst?id=f359287765c04711ff54fbd11645271d8e5ff763#n465
> I just used the original author's labels. Should I replace all of them,
> like 'err' and 'err1', with more reasonable ones?

No.

cheers


Re: linux-next: manual merge of the rcu tree with the powerpc tree

2020-06-02 Thread Stephen Rothwell
Hi all,

On Tue, 19 May 2020 17:23:16 +1000 Stephen Rothwell wrote:
>
> Hi all,
> 
> Today's linux-next merge of the rcu tree got a conflict in:
> 
>   arch/powerpc/kernel/traps.c
> 
> between commit:
> 
>   116ac378bb3f ("powerpc/64s: machine check interrupt update NMI accounting")
> 
> from the powerpc tree and commit:
> 
>   187416eeb388 ("hardirq/nmi: Allow nested nmi_enter()")
> 
> from the rcu tree.

This is now a conflict between commit

  69ea03b56ed2 ("hardirq/nmi: Allow nested nmi_enter()")

from Linus' tree and the above powerpc tree commit.
-- 
Cheers,
Stephen Rothwell




Re: [PATCH] cxl: Fix kobject memory leak in cxl_sysfs_afu_new_cr()

2020-06-02 Thread wanghai (M)



在 2020/6/3 1:20, Markus Elfring 写道:

Fix it by adding a call to kobject_put() in the error path of
kobject_init_and_add().

Thanks for another completion of the exception handling.

Would an other patch subject be a bit nicer?

Thanks for the guidance, I will perfect this description and send a v2


…

+++ b/drivers/misc/cxl/sysfs.c
@@ -624,7 +624,7 @@ static struct afu_config_record *cxl_sysfs_afu_new_cr(struct cxl_afu *afu, int c
	rc = kobject_init_and_add(&cr->kobj, &afu_config_record_type,
				  &afu->dev.kobj, "cr%i", cr->cr);
if (rc)
-   goto err;
+   goto err1;

…

Can an other label be more reasonable here?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst?id=f359287765c04711ff54fbd11645271d8e5ff763#n465
I just used the original author's labels. Should I replace all of them,
like 'err' and 'err1', with more reasonable ones?




Re: [RESEND PATCH v9 1/5] powerpc: Document details on H_SCM_HEALTH hcall

2020-06-02 Thread Ira Weiny
On Tue, Jun 02, 2020 at 03:44:34PM +0530, Vaibhav Jain wrote:
> Add documentation to 'papr_hcalls.rst' describing the bitmap flags
> that are returned from H_SCM_HEALTH hcall as per the PAPR-SCM
> specification.
> 
> Cc: "Aneesh Kumar K . V" 
> Cc: Dan Williams 
> Cc: Michael Ellerman 
> Cc: Ira Weiny 

Acked-by: Ira Weiny 

> Signed-off-by: Vaibhav Jain 
> ---
> Changelog:
> 
> Resend:
> * None
> 
> v8..v9:
> * s/SCM/PMEM device. [ Dan Williams, Aneesh ]
> 
> v7..v8:
> * Added a clarification on bit-ordering of Health Bitmap
> 
> Resend:
> * None
> 
> v6..v7:
> * None
> 
> v5..v6:
> * New patch in the series
> ---
>  Documentation/powerpc/papr_hcalls.rst | 46 ---
>  1 file changed, 42 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/powerpc/papr_hcalls.rst b/Documentation/powerpc/papr_hcalls.rst
> index 3493631a60f8..48fcf1255a33 100644
> --- a/Documentation/powerpc/papr_hcalls.rst
> +++ b/Documentation/powerpc/papr_hcalls.rst
> @@ -220,13 +220,51 @@ from the LPAR memory.
>  **H_SCM_HEALTH**
>  
>  | Input: drcIndex
> -| Out: *health-bitmap, health-bit-valid-bitmap*
> +| Out: *health-bitmap (r4), health-bit-valid-bitmap (r5)*
>  | Return Value: *H_Success, H_Parameter, H_Hardware*
>  
>  Given a DRC Index return the info on predictive failure and overall health of
> -the NVDIMM. The asserted bits in the health-bitmap indicate a single predictive
> -failure and health-bit-valid-bitmap indicate which bits in health-bitmap are
> -valid.
> +the PMEM device. The asserted bits in the health-bitmap indicate one or more states
> +(described in table below) of the PMEM device and health-bit-valid-bitmap indicate
> +which bits in health-bitmap are valid. The bits are reported in
> +reverse bit ordering, for example a value of 0xC400
> +indicates bits 0, 1, and 5 are valid.
> +
> +Health Bitmap Flags:
> +
> ++------+----------------------------------------------------------------------+
> +|  Bit |   Definition                                                         |
> ++======+======================================================================+
> +|  00  | PMEM device is unable to persist memory contents.                    |
> +|      | If the system is powered down, nothing will be saved.                |
> ++------+----------------------------------------------------------------------+
> +|  01  | PMEM device failed to persist memory contents. Either contents were  |
> +|      | not saved successfully on power down or were not restored properly   |
> +|      | on power up.                                                         |
> ++------+----------------------------------------------------------------------+
> +|  02  | PMEM device contents are persisted from previous IPL. The data from  |
> +|      | the last boot were successfully restored.                            |
> ++------+----------------------------------------------------------------------+
> +|  03  | PMEM device contents are not persisted from previous IPL. There was  |
> +|      | no data to restore from the last boot.                               |
> ++------+----------------------------------------------------------------------+
> +|  04  | PMEM device memory life remaining is critically low                  |
> ++------+----------------------------------------------------------------------+
> +|  05  | PMEM device will be garded off next IPL due to failure               |
> ++------+----------------------------------------------------------------------+
> +|  06  | PMEM device contents cannot persist due to current platform health   |
> +|      | status. A hardware failure may prevent data from being saved or      |
> +|      | restored.                                                            |
> ++------+----------------------------------------------------------------------+
> +|  07  | PMEM device is unable to persist memory contents in certain          |
> +|      | conditions                                                           |
> ++------+----------------------------------------------------------------------+
> +|  08  | PMEM device is encrypted                                             |
> ++------+----------------------------------------------------------------------+
> +|  09  | PMEM device has successfully completed a requested erase or secure   |
> +|      | erase procedure.                                                     |
> ++------+----------------------------------------------------------------------+
> +|10:63 | Reserved / Unused                                                    |
> ++------+----------------------------------------------------------------------+
>  
>  **H_SCM_PERFORMANCE_STATS**
>  
> -- 
> 2.26.2
> 
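A digest note on the "reverse bit ordering" wording above: the bitmap uses the MSB0 convention, where bit 0 is the most significant bit. The helper below is mine, and since the example value is truncated to 0xC400 in this archive, the sketch assumes it names the top byte of the 64-bit bitmap:

```python
def valid_health_bits(bitmap, width=64):
    """Decode an MSB0 bitmap: bit 0 is the most significant bit."""
    return [i for i in range(width) if bitmap & (1 << (width - 1 - i))]

# Top byte 0xC4 = 0b11000100 -> MSB0 bits 0, 1 and 5 are set,
# matching the example in the hcall documentation above.
print(valid_health_bits(0xC4 << 56))  # -> [0, 1, 5]
```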


Re: [RESEND PATCH v9 5/5] powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH

2020-06-02 Thread Ira Weiny
On Tue, Jun 02, 2020 at 03:44:38PM +0530, Vaibhav Jain wrote:
> This patch implements support for PDSM request 'PAPR_PDSM_HEALTH'
> that returns a newly introduced 'struct nd_papr_pdsm_health' instance
> containing dimm health information back to user space in response to
> ND_CMD_CALL. This functionality is implemented in newly introduced
> papr_pdsm_health() that queries the nvdimm health information and
> then copies this information to the package payload whose layout is
> defined by 'struct nd_papr_pdsm_health'.
> 
> The patch also introduces a new member 'struct papr_scm_priv.health'
> thats an instance of 'struct nd_papr_pdsm_health' to cache the health
> information of a nvdimm. As a result functions drc_pmem_query_health()
> and flags_show() are updated to populate and use this new struct
> instead of a u64 integer that was earlier used.
> 
> Cc: "Aneesh Kumar K . V" 
> Cc: Dan Williams 
> Cc: Michael Ellerman 
> Cc: Ira Weiny 
> Reviewed-by: Aneesh Kumar K.V 
> Signed-off-by: Vaibhav Jain 
> ---
> Changelog:
> 
> Resend:
> * Added ack from Aneesh.
> 
> v8..v9:
> * s/PAPR_SCM_PDSM_HEALTH/PAPR_PDSM_HEALTH/g  [ Dan , Aneesh ]
> * s/PAPR_SCM_PSDM_DIMM_*/PAPR_PDSM_DIMM_*/g
> * Renamed papr_scm_get_health() to papr_psdm_health()
> * Updated patch description to replace papr-scm dimm with nvdimm.
> 
> v7..v8:
> * None
> 
> Resend:
> * None
> 
> v6..v7:
> * Updated flags_show() to use seq_buf_printf(). [Mpe]
> * Updated papr_scm_get_health() to use newly introduced
>   __drc_pmem_query_health() bypassing the cache [Mpe].
> 
> v5..v6:
> * Added attribute '__packed' to 'struct nd_papr_pdsm_health_v1' to
>   guard against the possibility of different compilers adding different
>   paddings to the struct [ Dan Williams ]
> 
> * Updated 'struct nd_papr_pdsm_health_v1' to use __u8 instead of
>   'bool' and also updated drc_pmem_query_health() to take this into
>   account. [ Dan Williams ]
> 
> v4..v5:
> * None
> 
> v3..v4:
> * Call the DSM_PAPR_SCM_HEALTH service function from
>   papr_scm_service_dsm() instead of papr_scm_ndctl(). [Aneesh]
> 
> v2..v3:
> * Updated struct nd_papr_scm_dimm_health_stat_v1 to use '__xx' types
>   as its exported to the userspace [Aneesh]
> * Changed the constants DSM_PAPR_SCM_DIMM_XX indicating dimm health
>   from enum to #defines [Aneesh]
> 
> v1..v2:
> * New patch in the series
> ---
>  arch/powerpc/include/uapi/asm/papr_pdsm.h |  39 +++
>  arch/powerpc/platforms/pseries/papr_scm.c | 125 +++---
>  2 files changed, 147 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
> index 6407fefcc007..411725a91591 100644
> --- a/arch/powerpc/include/uapi/asm/papr_pdsm.h
> +++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
> @@ -115,6 +115,7 @@ struct nd_pdsm_cmd_pkg {
>   */
>  enum papr_pdsm {
>   PAPR_PDSM_MIN = 0x0,
> + PAPR_PDSM_HEALTH,
>   PAPR_PDSM_MAX,
>  };
>  
> @@ -133,4 +134,42 @@ static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
>   return (void *)(pcmd->payload);
>  }
>  
> +/* Various nvdimm health indicators */
> +#define PAPR_PDSM_DIMM_HEALTHY   0
> +#define PAPR_PDSM_DIMM_UNHEALTHY 1
> +#define PAPR_PDSM_DIMM_CRITICAL  2
> +#define PAPR_PDSM_DIMM_FATAL 3
> +
> +/*
> + * Struct exchanged between kernel & ndctl in for PAPR_PDSM_HEALTH
> + * Various flags indicate the health status of the dimm.
> + *
> + * dimm_unarmed  : Dimm not armed. So contents won't persist.
> + * dimm_bad_shutdown : Previous shutdown did not persist contents.
> + * dimm_bad_restore  : Contents from previous shutdown weren't restored.
> + * dimm_scrubbed : Contents of the dimm have been scrubbed.
> + * dimm_locked   : Contents of the dimm can't be modified until CEC reboot
> + * dimm_encrypted: Contents of dimm are encrypted.
> + * dimm_health   : Dimm health indicator. One of PAPR_PDSM_DIMM_
> + */
> +struct nd_papr_pdsm_health_v1 {
> + __u8 dimm_unarmed;
> + __u8 dimm_bad_shutdown;
> + __u8 dimm_bad_restore;
> + __u8 dimm_scrubbed;
> + __u8 dimm_locked;
> + __u8 dimm_encrypted;
> + __u16 dimm_health;
> +} __packed;
> +
> +/*
> + * Typedef the current struct for dimm_health so that any application
> + * or kernel recompiled after introducing a new version automatically
> + * supports the new version.
> + */
> +#define nd_papr_pdsm_health nd_papr_pdsm_health_v1
> +
> +/* Current version number for the dimm health struct */

This can't be the 'current' version.  You will need a list of versions you
support.  Because if the user passes in an old version you need to be able to
respond with that old version.  Also if you plan to support 'return X for a Y
query' then the user will need both X and Y defined to interpret X.

> +#define ND_PAPR_PDSM_HEALTH_VERSION 1
> +
>  #endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
> diff --git 

Re: [RESEND PATCH v9 4/5] ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods

2020-06-02 Thread Ira Weiny
On Tue, Jun 02, 2020 at 01:51:49PM -0700, 'Ira Weiny' wrote:
> On Tue, Jun 02, 2020 at 03:44:37PM +0530, Vaibhav Jain wrote:

...

> > +
> > +/*
> > + * PDSM Envelope:
> > + *
> > + * The ioctl ND_CMD_CALL transfers data between user-space and kernel via
> > + * envelope which consists of a header and user-defined payload sections.
> > + * The header is described by 'struct nd_pdsm_cmd_pkg' which expects a
> > + * payload following it and accessible via 'nd_pdsm_cmd_pkg.payload' field.
> > + * There is a reserved field that can be used to introduce new fields to the
> > + * structure in future. It also tries to ensure that 'nd_pdsm_cmd_pkg.payload'
> > + * lies at an 8-byte boundary.
> > + *
> > + *  +-+-+---+
> > + *  |   64-Bytes  |   16-Bytes  |   Max 176-Bytes   |
> > + *  +-+-+---+
> > + *  |   nd_pdsm_cmd_pkg |   |
> > + *  |-+ |   |
> > + *  |  nd_cmd_pkg | |   |
> > + *  +-+-+---+
> > + *  | nd_family   | |   |
> > + *  | nd_size_out | cmd_status  |   |
> > + *  | nd_size_in  | payload_version | payload   |
> > + *  | nd_command  | reserved|   |
> > + *  | nd_fw_size  | |   |
> > + *  +-+-+---+

One more comment WRT nd_size_[in|out].  I know that it is defined as the size
of the FW payload but normally when you nest headers 'size' in Header A
represents everything after Header A, including Header B.  In this case that
would be including nd_pdsm_cmd_pkg...

It looks like that is not what you have done?  Or perhaps I missed it?

Ira

> > + *
> > + * PDSM Header:
> > + *
> > + * The header is defined as 'struct nd_pdsm_cmd_pkg' which embeds a
> > + * 'struct nd_cmd_pkg' instance. The PDSM command is assigned to member
> > + * 'nd_cmd_pkg.nd_command'. Apart from size information of the envelope 
> > which is
> > + * contained in 'struct nd_cmd_pkg', the header also has members following 
>  ^
> ...  the  ...
> 
> > + * members:
> > + *
> > + * 'cmd_status': (Out) Errors if any encountered while 
> > servicing PDSM.
> > + * 'payload_version'   : (In/Out) Version number associated with the 
> > payload.
> > + * 'reserved'  : Not used and reserved for future.
> > + *
> > + * PDSM Payload:
> > + *
> > + * The layout of the PDSM Payload is defined by various structs shared 
> > between
> > + * papr_scm and libndctl so that contents of payload can be interpreted. 
> > During
> > + * servicing of a PDSM the papr_scm module will read input args from the 
> > payload
> > + * field by casting its contents to an appropriate struct pointer based on 
> > the
> > + * PDSM command. Similarly the output of servicing the PDSM command will be
> > + * copied to the payload field using the same struct.
> > + *
> > + * 'libnvdimm' enforces a hard limit of 256 bytes on the envelope size, 
> > which
> > + * leaves around 176 bytes for the envelope payload (ignoring any padding 
> > that
> > + * the compiler may silently introduce).
> > + *
> > + * Payload Version:
> > + *
> > + * A 'payload_version' field is present in PDSM header that indicates a 
> > specific
> > + * version of the structure present in PDSM Payload for a given PDSM 
> > command.
> > + * This provides backward compatibility in case the PDSM Payload structure
> > + * evolves and different structures are supported by 'papr_scm' and 
> > 'libndctl'.
> > + *
> > + * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the 
> > version
> > + * of the payload struct it supports via 'payload_version' field. The 
> > 'papr_scm'
> > + * module when servicing the PDSM envelope checks the 'payload_version' 
> > and then
> > + * uses 'payload struct version' == MIN('payload_version field',
> > + * 'max payload-struct-version supported by papr_scm') to service the PDSM.
> > + * After servicing the PDSM, 'papr_scm' put the negotiated version of 
> > payload
> > + * struct in returned 'payload_version' field.
> > + *
> > + * Libndctl on receiving the envelope back from papr_scm again checks the
> > + * 'payload_version' field and based on it use the appropriate version dsm
> > + * struct to parse the results.
> > + *
> > + * Backward Compatibility:
> > + *
> > + * Above scheme of exchanging different versioned PDSM struct between 
> > libndctl
> > + * and papr_scm should provide backward compatibility until following two
> > + * assumptions/conditions when defining new 

Re: [RESEND PATCH v9 4/5] ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods

2020-06-02 Thread Ira Weiny
On Tue, Jun 02, 2020 at 03:44:37PM +0530, Vaibhav Jain wrote:
> Introduce support for PAPR NVDIMM Specific Methods (PDSM) in papr_scm
> module and add the command family NVDIMM_FAMILY_PAPR to the white list
> of NVDIMM command sets. Also advertise support for ND_CMD_CALL for the
> nvdimm command mask and implement necessary scaffolding in the module
> to handle ND_CMD_CALL ioctl and PDSM requests that we receive.
> 
> The layout of the PDSM request as we expect from libnvdimm/libndctl is
> described in newly introduced uapi header 'papr_pdsm.h' which
> defines a new 'struct nd_pdsm_cmd_pkg' header. This header is used
> to communicate the PDSM request via member
> 'nd_cmd_pkg.nd_command' and size of payload that need to be
> sent/received for servicing the PDSM.
> 
> A new function is_cmd_valid() is implemented that reads the args to
> papr_scm_ndctl() and performs sanity tests on them. A new function
> papr_scm_service_pdsm() is introduced and is called from
> papr_scm_ndctl() in case a PDSM request is received via ND_CMD_CALL
> command from libnvdimm.
> 
> Cc: "Aneesh Kumar K . V" 
> Cc: Dan Williams 
> Cc: Michael Ellerman 
> Cc: Ira Weiny 
> Reviewed-by: Aneesh Kumar K.V 
> Signed-off-by: Vaibhav Jain 
> ---
> Changelog:
> 
> Resend:
> * Added ack from Aneesh.
> 
> v8..v9:
> * Reduced the usage of term SCM replacing it with appropriate
>   replacement [ Dan Williams, Aneesh ]
> * Renamed 'papr_scm_pdsm.h' to 'papr_pdsm.h'
> * s/PAPR_SCM_PDSM_*/PAPR_PDSM_*/g
> * s/NVDIMM_FAMILY_PAPR_SCM/NVDIMM_FAMILY_PAPR/g
> * Minor updates to 'papr_psdm.h' to replace usage of term 'SCM'.
> * Minor update to patch description.
> 
> v7..v8:
> * Removed the 'payload_offset' field from 'struct
>   nd_pdsm_cmd_pkg'. Instead command payload is always assumed to start
>   at 'nd_pdsm_cmd_pkg.payload'. [ Aneesh ]
> * To enable introducing new fields to 'struct nd_pdsm_cmd_pkg',
>   'reserved' field of 10-bytes is introduced. [ Aneesh ]
> * Fixed a typo in "Backward Compatibility" section of papr_scm_pdsm.h
>   [ Ira ]
> 
> Resend:
> * None
> 
> v6..v7 :
> * Removed the re-definitions of __packed macro from papr_scm_pdsm.h
>   [Mpe].
> * Removed the usage of __KERNEL__ macros in papr_scm_pdsm.h [Mpe].
> * Removed macros that were unused in papr_scm.c from papr_scm_pdsm.h
>   [Mpe].
> * Made functions defined in papr_scm_pdsm.h as static inline. [Mpe]
> 
> v5..v6 :
> * Changed the usage of the term DSM to PDSM to distinguish it from the
>   ACPI term [ Dan Williams ]
> * Renamed papr_scm_dsm.h to papr_scm_pdsm.h and updated various struct
>   to reflect the new terminology.
> * Updated the patch description and title to reflect the new terminology.
> * Squashed patch to introduce new command family in 'ndctl.h' with
>   this patch [ Dan Williams ]
> * Updated the papr_scm_pdsm method starting index from 0x1 to 0x0
>   [ Dan Williams ]
> * Removed redundant license text from the papr_scm_psdm.h file.
>   [ Dan Williams ]
> * s/envelop/envelope/ at various places [ Dan Williams ]
> * Added '__packed' attribute to command package header to guard
>   against different compilers adding paddings between the fields.
>   [ Dan Williams]
> * Converted various pr_debug to dev_debug [ Dan Williams ]
> 
> v4..v5 :
> * None
> 
> v3..v4 :
> * None
> 
> v2..v3 :
> * Updated the patch prefix to 'ndctl/uapi' [Aneesh]
> 
> v1..v2 :
> * None
> ---
>  arch/powerpc/include/uapi/asm/papr_pdsm.h | 136 ++
>  arch/powerpc/platforms/pseries/papr_scm.c | 101 +++-
>  include/uapi/linux/ndctl.h|   1 +
>  3 files changed, 232 insertions(+), 6 deletions(-)
>  create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h
> 
> diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
> new file mode 100644
> index ..6407fefcc007
> --- /dev/null
> +++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
> @@ -0,0 +1,136 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +/*
> + * PAPR nvDimm Specific Methods (PDSM) and structs for libndctl
> + *
> + * (C) Copyright IBM 2020
> + *
> + * Author: Vaibhav Jain 
> + */
> +
> +#ifndef _UAPI_ASM_POWERPC_PAPR_PDSM_H_
> +#define _UAPI_ASM_POWERPC_PAPR_PDSM_H_
> +
> +#include 
> +
> +/*
> + * PDSM Envelope:
> + *
> + * The ioctl ND_CMD_CALL transfers data between user-space and kernel via
> + * envelope which consists of a header and user-defined payload sections.
> + * The header is described by 'struct nd_pdsm_cmd_pkg' which expects a
> + * payload following it and accessible via 'nd_pdsm_cmd_pkg.payload' field.
> + * There is a reserved field that can be used to introduce new fields to the
> + * structure in future. It also tries to ensure that 'nd_pdsm_cmd_pkg.payload'
> + * lies at an 8-byte boundary.
> + *
> + *  +-+-+---+
> + *  |   64-Bytes  |   16-Bytes  |   Max 176-Bytes   |
> + *  

[PATCH v5 2/3] arch: wire-up close_range()

2020-06-02 Thread Christian Brauner
This wires up the close_range() syscall into all arches at once.

Suggested-by: Arnd Bergmann 
Signed-off-by: Christian Brauner 
Reviewed-by: Oleg Nesterov 
Acked-by: Arnd Bergmann 
Acked-by: Michael Ellerman  (powerpc)
Cc: Jann Horn 
Cc: David Howells 
Cc: Dmitry V. Levin 
Cc: Linus Torvalds 
Cc: Al Viro 
Cc: Florian Weimer 
Cc: linux-...@vger.kernel.org
Cc: linux-al...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-i...@vger.kernel.org
Cc: linux-m...@lists.linux-m68k.org
Cc: linux-m...@vger.kernel.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux-a...@vger.kernel.org
Cc: x...@kernel.org
---
/* v2 */
not present

/* v3 */
not present

/* v4 */
introduced
- Arnd Bergmann :
  - split into two patches:
1. add close_range()
2. add syscall to all arches at once
  - bump __NR_compat_syscalls in arch/arm64/include/asm/unistd.h

/* v5 */
unchanged
---
 arch/alpha/kernel/syscalls/syscall.tbl  | 1 +
 arch/arm/tools/syscall.tbl  | 1 +
 arch/arm64/include/asm/unistd.h | 2 +-
 arch/arm64/include/asm/unistd32.h   | 2 ++
 arch/ia64/kernel/syscalls/syscall.tbl   | 1 +
 arch/m68k/kernel/syscalls/syscall.tbl   | 1 +
 arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   | 1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   | 1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   | 1 +
 arch/parisc/kernel/syscalls/syscall.tbl | 1 +
 arch/powerpc/kernel/syscalls/syscall.tbl| 1 +
 arch/s390/kernel/syscalls/syscall.tbl   | 1 +
 arch/sh/kernel/syscalls/syscall.tbl | 1 +
 arch/sparc/kernel/syscalls/syscall.tbl  | 1 +
 arch/x86/entry/syscalls/syscall_32.tbl  | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl  | 1 +
 arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
 include/uapi/asm-generic/unistd.h   | 4 +++-
 19 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl 
b/arch/alpha/kernel/syscalls/syscall.tbl
index 36d42da7466a..67ef02ead4da 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -475,5 +475,6 @@
 543	common	fspick			sys_fspick
 544	common	pidfd_open		sys_pidfd_open
 # 545 reserved for clone3
+546	common	close_range		sys_close_range
 547	common	openat2			sys_openat2
 548	common	pidfd_getfd		sys_pidfd_getfd
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 4d1cf74a2caa..13c5652137fb 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -449,5 +449,6 @@
 433	common	fspick			sys_fspick
 434	common	pidfd_open		sys_pidfd_open
 435	common	clone3			sys_clone3
+436	common	close_range		sys_close_range
 437	common	openat2			sys_openat2
 438	common	pidfd_getfd		sys_pidfd_getfd
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 803039d504de..3b859596840d 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -38,7 +38,7 @@
 #define __ARM_NR_compat_set_tls	(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls   439
+#define __NR_compat_syscalls   440
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h 
b/arch/arm64/include/asm/unistd32.h
index c1c61635f89c..902bfb136002 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -879,6 +879,8 @@ __SYSCALL(__NR_fspick, sys_fspick)
 __SYSCALL(__NR_pidfd_open, sys_pidfd_open)
 #define __NR_clone3 435
 __SYSCALL(__NR_clone3, sys_clone3)
+#define __NR_close_range 436
+__SYSCALL(__NR_close_range, sys_close_range)
 #define __NR_openat2 437
 __SYSCALL(__NR_openat2, sys_openat2)
 #define __NR_pidfd_getfd 438
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl 
b/arch/ia64/kernel/syscalls/syscall.tbl
index 042911e670b8..df2e14da6a29 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -356,5 +356,6 @@
 433	common	fspick			sys_fspick
 434	common	pidfd_open		sys_pidfd_open
 # 435 reserved for clone3
+436	common	close_range		sys_close_range
 437	common	openat2			sys_openat2
 438	common	pidfd_getfd		sys_pidfd_getfd
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl 
b/arch/m68k/kernel/syscalls/syscall.tbl
index f4f49fcb76d0..553b8858e667 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ 

Re: [PATCH 1/4] powerpc: Add a ppc_inst_as_str() helper

2020-06-02 Thread Segher Boessenkool
On Tue, Jun 02, 2020 at 03:27:25PM +1000, Jordan Niethe wrote:
> There are quite a few places where instructions are printed, this is
> done using a '%x' format specifier. With the introduction of prefixed
> instructions, this does not work well. Currently in these places,
> ppc_inst_val() is used for the value for %x so only the first word of
> prefixed instructions is printed.
> 
> When the instructions are word instructions, only a single word should
> be printed. For prefixed instructions both the prefix and suffix should
> be printed. To accommodate both of these situations, instead of a '%x'
> specifier use '%s' and introduce a helper, __ppc_inst_as_str(), which
> returns a char *. The char * that __ppc_inst_as_str() returns is a buffer
> that is passed to it by the caller.
> 
> It is cumbersome to require every caller of __ppc_inst_as_str() to now
> declare a buffer. To make it more convenient to use __ppc_inst_as_str(),
> wrap it in a macro that uses a compound statement to allocate a buffer
> on the caller's stack before calling it.
> 
> Signed-off-by: Jordan Niethe 
Acked-by: Segher Boessenkool 

> +#define PPC_INST_STR_LEN sizeof("0x00000000 0x00000000")

sizeof is not a function or anything function-like; it's just an
operator, like (unary) "+" or "-" or dereference "*".  The parentheses
are completely redundant here.  And it kind of matters, as written a
reader might think that a string object is actually constructed here.

But, so very many people do this, I'm not going to fight this fight ;-)


Segher


Re: [PATCH 11/14] x86/entry: Clarify irq_{enter,exit}_rcu()

2020-06-02 Thread Qian Cai
On Tue, Jun 02, 2020 at 05:05:11PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 02, 2020 at 10:42:35AM -0400, Qian Cai wrote:
> 
> > Reverting this commit fixed the POWER9 boot warning,
> 
> ARGH, I'm an idiot. Please try this instead:
>
> 
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index a3eb6eba8c41..c4201b7f42b1 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -438,7 +438,7 @@ void irq_exit_rcu(void)
>   */
>  void irq_exit(void)
>  {
> - irq_exit_rcu();
> + __irq_exit_rcu();
>   rcu_irq_exit();
>/* must be last! */
>   lockdep_hardirq_exit();

This works fine.


Re: [RESEND PATCH v9 3/5] powerpc/papr_scm: Fetch nvdimm health information from PHYP

2020-06-02 Thread Ira Weiny
On Tue, Jun 02, 2020 at 03:44:36PM +0530, Vaibhav Jain wrote:
> Implement support for fetching nvdimm health information via
> H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair
> of 64-bit bitmaps, the bitwise AND of which is then stored in
> 'struct papr_scm_priv' and subsequently partially exposed to
> user-space via newly introduced dimm specific attribute
> 'papr/flags'. Since the hcall is costly, the health information is
> cached and only re-queried, 60s after the previous successful hcall.
> 
> The patch also adds documentation text describing the flags reported by
> the new sysfs attribute 'papr/flags', introduced at
> Documentation/ABI/testing/sysfs-bus-papr-pmem.
> 
> [1] commit 58b278f568f0 ("powerpc: Provide initial documentation for
> PAPR hcalls")
> 
> Cc: "Aneesh Kumar K . V" 
> Cc: Dan Williams 
> Cc: Michael Ellerman 
> Cc: Ira Weiny 
> Reviewed-by: Aneesh Kumar K.V 
> Signed-off-by: Vaibhav Jain 
> ---
> Changelog:
> 
> Resend:
> * Added ack from Aneesh.
> 
> v8..v9:
> * Rename some variables and defines to reduce usage of term SCM
>   replacing it with PMEM [Dan Williams, Aneesh]
> * s/PAPR_SCM_DIMM/PAPR_PMEM/g
> * s/papr_scm_nd_attributes/papr_nd_attributes/g
> * s/papr_scm_nd_attribute_group/papr_nd_attribute_group/g
> * s/papr_scm_dimm_attr_groups/papr_nd_attribute_groups/g
> * Renamed file sysfs-bus-papr-scm to sysfs-bus-papr-pmem
> 
> v7..v8:
> * Update type of variable 'rc' in __drc_pmem_query_health() and
>   drc_pmem_query_health() to long and int respectively. [ Ira ]
> * Updated the patch description to s/64 bit Big Endian Number/64-bit
>   bitmap/ [ Ira, Aneesh ].
> 
> Resend:
> * None
> 
> v6..v7 :
> * Used the exported buf_seq_printf() function to generate content for
>   'papr/flags'
> * Moved the PAPR_SCM_DIMM_* bit-flags macro definitions to papr_scm.c
>   and removed the papr_scm.h file [Mpe]
> * Some minor consistency issued in sysfs-bus-papr-scm
>   documentation. [Mpe]
> * s/dimm_mutex/health_mutex/g [Mpe]
> * Split drc_pmem_query_health() into two function one of which takes
>   care of caching and locking. [Mpe]
> * Fixed a local copy creation of dimm health information using
>   READ_ONCE(). [Mpe]
> 
> v5..v6 :
> * Change the flags sysfs attribute from 'papr_flags' to 'papr/flags'
>   [Dan Williams]
> * Include documentation for 'papr/flags' attr [Dan Williams]
> * Change flag 'save_fail' to 'flush_fail' [Dan Williams]
> * Caching of health bitmap to reduce expensive hcalls [Dan Williams]
> * Removed usage of PPC_BIT from 'papr-scm.h' header [Mpe]
> * Replaced two __be64 integers from papr_scm_priv to a single u64
>   integer [Mpe]
> * Updated patch description to reflect the changes made in this
>   version.
> * Removed avoidable usage of 'papr_scm_priv.dimm_mutex' from
>   flags_show() [Dan Williams]
> 
> v4..v5 :
> * None
> 
> v3..v4 :
> * None
> 
> v2..v3 :
> * Removed PAPR_SCM_DIMM_HEALTH_NON_CRITICAL as a condition for
>NVDIMM unarmed [Aneesh]
> 
> v1..v2 :
> * New patch in the series.
> ---
>  Documentation/ABI/testing/sysfs-bus-papr-pmem |  27 +++
>  arch/powerpc/platforms/pseries/papr_scm.c | 169 +-
>  2 files changed, 194 insertions(+), 2 deletions(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem 
> b/Documentation/ABI/testing/sysfs-bus-papr-pmem
> new file mode 100644
> index ..5b10d036a8d4
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem
> @@ -0,0 +1,27 @@
> +What:		/sys/bus/nd/devices/nmemX/papr/flags
> +Date:		Apr, 2020
> +KernelVersion:	v5.8
> +Contact:	linuxppc-dev , linux-nvd...@lists.01.org,
> +Description:
> + (RO) Report flags indicating various states of a
> + papr-pmem NVDIMM device. Each flag maps to one or
> + more bits set in the dimm-health-bitmap retrieved in
> + response to the H_SCM_HEALTH hcall. The details of the
> + bit flags returned in response to this hcall are
> + available at 'Documentation/powerpc/papr_hcalls.rst'.
> + Below are the flags reported in this sysfs file:
> +
> + * "not_armed"   : Indicates that NVDIMM contents will not
> +   survive a power cycle.
> + * "flush_fail"  : Indicates that NVDIMM contents
> +   couldn't be flushed during last
> +   shut-down event.
> + * "restore_fail": Indicates that NVDIMM contents
> +   couldn't be restored during NVDIMM
> +   initialization.
> + * "encrypted"   : NVDIMM contents are encrypted.
> + * "smart_notify": There is health event for the NVDIMM.
> + * "scrubbed": Indicating that contents of the
> +   NVDIMM have been scrubbed.
> +   

RE: [RFC PATCH 1/2] libnvdimm: Add prctl control for disabling synchronous fault support.

2020-06-02 Thread Williams, Dan J
[ forgive formatting, a series of unfortunate events has me using Outlook for 
the moment ]

> From: Jan Kara 
> > > > These flags are device properties that affect the kernel and
> > > > userspace's handling of persistence.
> > > >
> > >
> > > That will not handle the scenario with multiple applications using
> > > the same fsdax mount point where one is updated to use the new
> > > instruction and the other is not.
> >
> > Right, it needs to be a global setting / flag day to switch from one
> > regime to another. Per-process control is a recipe for disaster.
> 
> First I'd like to mention that hopefully the concern is mostly theoretical 
> since
> as Aneesh wrote above, real persistent memory never shipped for PPC and
> so there are very few apps (if any) using the old way to ensure cache
> flushing.
> 
> But I'd like to understand why do you think per-process control is a recipe 
> for
> disaster? Because from my POV the sysfs interface you propose is actually
> difficult to use in practice. As a distributor, you have hard time picking the
> default because you have a choice between picking safe option which is
> going to confuse users because of failing MAP_SYNC and unsafe option
> where everyone will be happy until someone loses data because of some
> ancient application using wrong instructions to persist data. Poor experience
> for users either way. And when a distro defaults to the "safe option", the
> burden is on the sysadmin to toggle the switch but how is he supposed to
> decide when that is safe? First he has to understand what the problem
> actually is, then he has to audit all the applications using pmem whether they
> use the new instruction - which is IMO a lot of effort if you have a couple of
> applications and practically infeasible if you have more of them.
> So IMO the burden should be *on the application* to declare that it is aware
> of the new instructions to flush pmem on the platform and only to such
> application the kernel should give the trust to use MAP_SYNC mappings.

The "disaster" in my mind is this need to globally change the ABI for 
persistence semantics for all of Linux because one CPU wants a do over. What 
does a generic "MAP_SYNC_ENABLE" knob even mean to the existing deployed base 
of persistent memory applications? Yes, sysfs is awkward, but it's trying to 
provide some relief without imposing unexplainable semantics on everyone else. 
I think a comprehensive (overengineered) solution would involve not introducing 
another "I know what I'm doing" flag to the interface, but maybe requiring 
applications to call a pmem sync API in something like a vsyscall. Or, also 
overengineered, some binary translation / interpretation to actively detect and 
kill applications that deploy the old instructions. Something horrid like: on 
the first write fault to a MAP_SYNC mapping, look ahead in the binary for the 
correct sync sequence and kill the application otherwise. That would at least provide 
some enforcement and safety without requiring other architectures to consider 
what MAP_SYNC_ENABLE means to them.


Re: ppc64le and 32-bit LE userland compatibility

2020-06-02 Thread Segher Boessenkool
On Tue, Jun 02, 2020 at 01:50:32PM +, Joseph Myers wrote:
> On Mon, 1 Jun 2020, Segher Boessenkool wrote:
> 
> > > The supported glibc ABIs are listed at 
> > > .
> > 
> > powerpcle-linux already does work somewhat, and that should also he
> > worth something, official or not ;-)
> > 
> > (It has worked for very many years already...  That is, I have built it
> > a lot, I have no idea about running a full distro that way).
> 
> Greg McGary's patches as referenced at 
>  were 
> never merged, and I don't know how big the changes to .S files were or if 
> they were ever posted.  Many files were subsequently fixed as part of 
> bringing up support for powerpc64le, but without actual 32-bit LE testing 
> as part of each release cycle there's not much evidence of correctness for 
> LE of code not used for powerpc64le.

Yeah sorry, I meant I have built the toolchain parts a lot, not libc.
GCC and binutils.  I reckon there is more work to do in libc, certainly
for configuration and similar.


Segher


Re: [PATCH] cxl: Fix kobject memory leak in cxl_sysfs_afu_new_cr()

2020-06-02 Thread Markus Elfring
> Fix it by adding a call to kobject_put() in the error path of
> kobject_init_and_add().

Thanks for another completion of the exception handling.

Would another patch subject be a bit nicer?


…
> +++ b/drivers/misc/cxl/sysfs.c
> @@ -624,7 +624,7 @@ static struct afu_config_record 
> *cxl_sysfs_afu_new_cr(struct cxl_afu *afu, int c
>   rc = kobject_init_and_add(&cr->kobj, &afu_config_record_type,
> 			    &afu->dev.kobj, "cr%i", cr->cr);
>   if (rc)
> - goto err;
> + goto err1;
…

Can another label be more reasonable here?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst?id=f359287765c04711ff54fbd11645271d8e5ff763#n465

Regards,
Markus


Re: [PATCH] cxl: Fix kobject memleak

2020-06-02 Thread Frederic Barrat




On 02/06/2020 at 14:07, Wang Hai wrote:

Currently the error return path from kobject_init_and_add() is not
followed by a call to kobject_put() - which means we are leaking
the kobject.

Fix it by adding a call to kobject_put() in the error path of
kobject_init_and_add().

Fixes: b087e6190ddc ("cxl: Export optional AFU configuration record in sysfs")
Reported-by: Hulk Robot 
Signed-off-by: Wang Hai 



Indeed, a call to kobject_put() is needed when the init fails.
Thanks!
Acked-by: Frederic Barrat 



---
  drivers/misc/cxl/sysfs.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/cxl/sysfs.c b/drivers/misc/cxl/sysfs.c
index f0263d1..d97a243 100644
--- a/drivers/misc/cxl/sysfs.c
+++ b/drivers/misc/cxl/sysfs.c
@@ -624,7 +624,7 @@ static struct afu_config_record 
*cxl_sysfs_afu_new_cr(struct cxl_afu *afu, int c
	rc = kobject_init_and_add(&cr->kobj, &afu_config_record_type,
				  &afu->dev.kobj, "cr%i", cr->cr);
if (rc)
-   goto err;
+   goto err1;
  
	rc = sysfs_create_bin_file(&cr->kobj, &cr->config_attr);

if (rc)



Re: [PATCH v2] cxl: Remove dead Kconfig option

2020-06-02 Thread Frederic Barrat




On 02/06/2020 at 09:05, Andrew Donnellan wrote:

The CXL_AFU_DRIVER_OPS Kconfig option was added to coordinate merging of
new features. It no longer serves any purpose, so remove it.

Signed-off-by: Andrew Donnellan 



Acked-by: Frederic Barrat 




---
v1->v2:
- keep CXL_LIB for now to avoid breaking a driver that is currently out of
tree
---
  drivers/misc/cxl/Kconfig | 4 ----
  1 file changed, 4 deletions(-)

diff --git a/drivers/misc/cxl/Kconfig b/drivers/misc/cxl/Kconfig
index 39eec9031487..984114f7cee9 100644
--- a/drivers/misc/cxl/Kconfig
+++ b/drivers/misc/cxl/Kconfig
@@ -7,9 +7,6 @@ config CXL_BASE
bool
select PPC_COPRO_BASE
  
-config CXL_AFU_DRIVER_OPS
-	bool
-
  config CXL_LIB
bool
  
@@ -17,7 +14,6 @@ config CXL

tristate "Support for IBM Coherent Accelerators (CXL)"
depends on PPC_POWERNV && PCI_MSI && EEH
select CXL_BASE
-   select CXL_AFU_DRIVER_OPS
select CXL_LIB
default m
help



Re: ppc64le and 32-bit LE userland compatibility

2020-06-02 Thread Michal Suchánek
On Tue, Jun 02, 2020 at 05:40:39PM +0200, Daniel Kolesa wrote:
> 
> 
> On Tue, Jun 2, 2020, at 17:27, Michal Suchánek wrote:
> > On Tue, Jun 02, 2020 at 05:13:25PM +0200, Daniel Kolesa wrote:
> > > On Tue, Jun 2, 2020, at 16:23, Michal Suchánek wrote:
> > > > On Tue, Jun 02, 2020 at 01:40:23PM +, Joseph Myers wrote:
> > > > > On Tue, 2 Jun 2020, Daniel Kolesa wrote:
> > > > > 
> > > > > > not be limited to being just userspace under ppc64le, but should be 
> > > > > > runnable on a native kernel as well, which should not be limited to 
> > > > > > any 
> > > > > > particular baseline other than just PowerPC.
> > > > > 
> > > > > This is a fairly unusual approach to bringing up a new ABI.  Since 
> > > > > new 
> > > > > ABIs are more likely to be used on new systems rather than switching 
> > > > > ABI 
> > > > > on an existing installation, and since it can take quite some time 
> > > > > for all 
> > > > > the software support for a new ABI to become widely available in 
> > > > > distributions, people developing new ABIs are likely to think about 
> > > > > what 
> > > > > new systems are going to be relevant in a few years' time when 
> > > > > working out 
> > > > > the minimum hardware requirements for the new ABI.  (The POWER8 
> > > > > minimum 
> > > > > for powerpc64le fits in with that, for example.)
> > > > That means that you cannot run ppc64le on FSL embedded CPUs (which lack
> > > > the vector instructions in LE mode). Which may be fine with you but
> > > > other people may want to support these. Can't really say if that's good
> > > > idea or not but I don't foresee them going away in a few years, either.
> > > 
> > > well, ppc64le already cannot be run on those, as far as I know (I don't 
> > > think it's possible to build ppc64le userland without VSX in any 
> > > configuration)
> > 
> > What hardware are you targetting then? I did not notice anything
> > specific mentioned in the thread.
> > 
> > Naturally on POWER the first cpu that has LE support is POWER8 so you
> > can count on all other POWER8 features to be present. With other
> > architecture variants the situation is different.
> 
> This is not true; nearly every 32-bit PowerPC CPU has LE support (all the way 
> back to 6xx), these would be the native-hardware targets for the port (would 
> need kernel support implemented, but it's technically possible).
I find dealing with memory management issues on 32bit architectures a
pain. There is never enough address space.
> 
> As far as 64-bit CPUs go, POWER7 is the first one that could in practice run 
> the current ppc64le configuration, but in glibc it's limited to POWER8 and in 
> gcc the default for powerpc64le is also POWER8 (however, it is perfectly 
> possible to configure gcc for POWER7 and use musl libc with it).
That's interesting. I guess I was tricked but the glibc limitation.

Thanks

Michal


Re: ppc64le and 32-bit LE userland compatibility

2020-06-02 Thread Daniel Kolesa



On Tue, Jun 2, 2020, at 17:27, Michal Suchánek wrote:
> On Tue, Jun 02, 2020 at 05:13:25PM +0200, Daniel Kolesa wrote:
> > On Tue, Jun 2, 2020, at 16:23, Michal Suchánek wrote:
> > > On Tue, Jun 02, 2020 at 01:40:23PM +, Joseph Myers wrote:
> > > > On Tue, 2 Jun 2020, Daniel Kolesa wrote:
> > > > 
> > > > > not be limited to being just userspace under ppc64le, but should be 
> > > > > runnable on a native kernel as well, which should not be limited to 
> > > > > any 
> > > > > particular baseline other than just PowerPC.
> > > > 
> > > > This is a fairly unusual approach to bringing up a new ABI.  Since new 
> > > > ABIs are more likely to be used on new systems rather than switching 
> > > > ABI 
> > > > on an existing installation, and since it can take quite some time for 
> > > > all 
> > > > the software support for a new ABI to become widely available in 
> > > > distributions, people developing new ABIs are likely to think about 
> > > > what 
> > > > new systems are going to be relevant in a few years' time when working 
> > > > out 
> > > > the minimum hardware requirements for the new ABI.  (The POWER8 minimum 
> > > > for powerpc64le fits in with that, for example.)
> > > That means that you cannot run ppc64le on FSL embedded CPUs (which lack
> > > the vector instructions in LE mode). Which may be fine with you but
> > > other people may want to support these. Can't really say if that's good
> > > idea or not but I don't foresee them going away in a few years, either.
> > 
> > well, ppc64le already cannot be run on those, as far as I know (I don't 
> > think it's possible to build ppc64le userland without VSX in any 
> > configuration)
> 
> What hardware are you targetting then? I did not notice anything
> specific mentioned in the thread.
> 
> Naturally on POWER the first cpu that has LE support is POWER8 so you
> can count on all other POWER8 features to be present. With other
> architecture variants the situation is different.

This is not true; nearly every 32-bit PowerPC CPU has LE support (all the way 
back to 6xx), these would be the native-hardware targets for the port (would 
need kernel support implemented, but it's technically possible).

As far as 64-bit CPUs go, POWER7 is the first one that could in practice run 
the current ppc64le configuration, but in glibc it's limited to POWER8 and in 
gcc the default for powerpc64le is also POWER8 (however, it is perfectly 
possible to configure gcc for POWER7 and use musl libc with it).

> 
> Thanks
> 
> Michal
>


Re: ppc64le and 32-bit LE userland compatibility

2020-06-02 Thread Michal Suchánek
On Tue, Jun 02, 2020 at 05:13:25PM +0200, Daniel Kolesa wrote:
> On Tue, Jun 2, 2020, at 16:23, Michal Suchánek wrote:
> > On Tue, Jun 02, 2020 at 01:40:23PM +, Joseph Myers wrote:
> > > On Tue, 2 Jun 2020, Daniel Kolesa wrote:
> > > 
> > > > not be limited to being just userspace under ppc64le, but should be 
> > > > runnable on a native kernel as well, which should not be limited to any 
> > > > particular baseline other than just PowerPC.
> > > 
> > > This is a fairly unusual approach to bringing up a new ABI.  Since new 
> > > ABIs are more likely to be used on new systems rather than switching ABI 
> > > on an existing installation, and since it can take quite some time for 
> > > all 
> > > the software support for a new ABI to become widely available in 
> > > distributions, people developing new ABIs are likely to think about what 
> > > new systems are going to be relevant in a few years' time when working 
> > > out 
> > > the minimum hardware requirements for the new ABI.  (The POWER8 minimum 
> > > for powerpc64le fits in with that, for example.)
> > That means that you cannot run ppc64le on FSL embedded CPUs (which lack
> > the vector instructions in LE mode). Which may be fine with you but
> > other people may want to support these. Can't really say if that's good
> > idea or not but I don't foresee them going away in a few years, either.
> 
> well, ppc64le already cannot be run on those, as far as I know (I don't think 
> it's possible to build ppc64le userland without VSX in any configuration)

What hardware are you targetting then? I did not notice anything
specific mentioned in the thread.

Naturally on POWER the first cpu that has LE support is POWER8 so you
can count on all other POWER8 features to be present. With other
architecture variants the situation is different.

Thanks

Michal


Re: ppc64le and 32-bit LE userland compatibility

2020-06-02 Thread Daniel Kolesa
On Tue, Jun 2, 2020, at 16:23, Michal Suchánek wrote:
> On Tue, Jun 02, 2020 at 01:40:23PM +, Joseph Myers wrote:
> > On Tue, 2 Jun 2020, Daniel Kolesa wrote:
> > 
> > > not be limited to being just userspace under ppc64le, but should be 
> > > runnable on a native kernel as well, which should not be limited to any 
> > > particular baseline other than just PowerPC.
> > 
> > This is a fairly unusual approach to bringing up a new ABI.  Since new 
> > ABIs are more likely to be used on new systems rather than switching ABI 
> > on an existing installation, and since it can take quite some time for all 
> > the software support for a new ABI to become widely available in 
> > distributions, people developing new ABIs are likely to think about what 
> > new systems are going to be relevant in a few years' time when working out 
> > the minimum hardware requirements for the new ABI.  (The POWER8 minimum 
> > for powerpc64le fits in with that, for example.)
> That means that you cannot run ppc64le on FSL embedded CPUs (which lack
> the vector instructions in LE mode). Which may be fine with you but
> other people may want to support these. Can't really say if that's good
> idea or not but I don't foresee them going away in a few years, either.

well, ppc64le already cannot be run on those, as far as I know (I don't think 
it's possible to build ppc64le userland without VSX in any configuration)

> 
> Thanks
> 
> Michal
>

Daniel


Re: [PATCH 11/14] x86/entry: Clarify irq_{enter,exit}_rcu()

2020-06-02 Thread Peter Zijlstra
On Tue, Jun 02, 2020 at 10:42:35AM -0400, Qian Cai wrote:

> Reverting this commit fixed the POWER9 boot warning,

ARGH, I'm an idiot. Please try this instead:


diff --git a/kernel/softirq.c b/kernel/softirq.c
index a3eb6eba8c41..c4201b7f42b1 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -438,7 +438,7 @@ void irq_exit_rcu(void)
  */
 void irq_exit(void)
 {
-   irq_exit_rcu();
+   __irq_exit_rcu();
rcu_irq_exit();
 /* must be last! */
lockdep_hardirq_exit();




Re: ppc64le and 32-bit LE userland compatibility

2020-06-02 Thread Daniel Kolesa
On Tue, Jun 2, 2020, at 15:40, Joseph Myers wrote:
> On Tue, 2 Jun 2020, Daniel Kolesa wrote:
> 
> > not be limited to being just userspace under ppc64le, but should be 
> > runnable on a native kernel as well, which should not be limited to any 
> > particular baseline other than just PowerPC.
> 
> This is a fairly unusual approach to bringing up a new ABI.  Since new 
> ABIs are more likely to be used on new systems rather than switching ABI 
> on an existing installation, and since it can take quite some time for all 
> the software support for a new ABI to become widely available in 
> distributions, people developing new ABIs are likely to think about what 
> new systems are going to be relevant in a few years' time when working out 
> the minimum hardware requirements for the new ABI.  (The POWER8 minimum 
> for powerpc64le fits in with that, for example.)

In this case the whole point is targeting existing hardware that we already 
have. It also aims to largely utilize things that are already present in all 
parts of the toolchain, otherwise it's a lot of effort with questionable 
benefit and artificially limits the baseline for no good reason.

> 
> > either the AIX/ELFv1 nor the ELFv2 ABIs) If we were to introduce new 
> > ports, what would those use? ld64.so.3 for BE/v2? ld.so.2 for LE/32-bit? 
> 
> Rather than relying on numbers such as "3" or "2" in a particular place 
> being unique across all (architecture, ABI) pairs supported by glibc, 
> something more obviously specific to a particular architecture and ABI, 
> e.g. ld-linux-powerpc64be-elfv2.so.1, would be better.

Yes, agreed on that - probably ld-linux-powerpc64-elfv2.so.1, to match existing 
conventions (powerpc64 implicitly refers to BE in target triples, etc). It's 
just inconsistent with the existing ports, but I guess the reason for those is 
legacy in the first place, so not much point in worrying about that.

> 
> -- 
> Joseph S. Myers
> jos...@codesourcery.com
>

Daniel


Re: [PATCH] powerpc: Fix misleading small cores print

2020-06-02 Thread Gautham R Shenoy
On Fri, May 29, 2020 at 09:07:31AM +1000, Michael Neuling wrote:
> Currently when we boot on a big core system, we get this print:
>   [0.040500] Using small cores at SMT level
> 
> This is misleading as we've actually detected big cores.
> 
> This patch clears up the print to say we've detected big cores but are
> using small cores for scheduling.

Thanks for making the print more meaningful.


> 
> Signed-off-by: Michael Neuling 

FWIW,
Acked-by: Gautham R. Shenoy 
> ---
>  arch/powerpc/kernel/smp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 6d2a3a3666..c820c95162 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1383,7 +1383,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
> 
>  #ifdef CONFIG_SCHED_SMT
>   if (has_big_cores) {
> - pr_info("Using small cores at SMT level\n");
> + pr_info("Big cores detected but using small core scheduling\n");
>   power9_topology[0].mask = smallcore_smt_mask;
>   powerpc_topology[0].mask = smallcore_smt_mask;
>   }
> -- 
> 2.26.2
> 


Re: [PATCH 11/14] x86/entry: Clarify irq_{enter,exit}_rcu()

2020-06-02 Thread Qian Cai
On Fri, May 29, 2020 at 11:27:39PM +0200, Peter Zijlstra wrote:
> Because:
> 
>   irq_enter_rcu() includes lockdep_hardirq_enter()
>   irq_exit_rcu() does *NOT* include lockdep_hardirq_exit()
> 
> Which resulted in two 'stray' lockdep_hardirq_exit() calls in
> idtentry.h, and me spending a long time trying to find the matching
> enter calls.
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> ---
>  arch/x86/include/asm/idtentry.h |2 --
>  kernel/softirq.c|   19 +--
>  2 files changed, 13 insertions(+), 8 deletions(-)
> 
[]
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -404,12 +404,7 @@ static inline void tick_irq_exit(void)
>  #endif
>  }
>  
> -/**
> - * irq_exit_rcu() - Exit an interrupt context without updating RCU
> - *
> - * Also processes softirqs if needed and possible.
> - */
> -void irq_exit_rcu(void)
> +static inline void __irq_exit_rcu(void)
>  {
>  #ifndef __ARCH_IRQ_EXIT_IRQS_DISABLED
>   local_irq_disable();
> @@ -425,6 +420,18 @@ void irq_exit_rcu(void)
>  }
>  
>  /**
> + * irq_exit_rcu() - Exit an interrupt context without updating RCU
> + *
> + * Also processes softirqs if needed and possible.
> + */
> +void irq_exit_rcu(void)
> +{
> + __irq_exit_rcu();
> +  /* must be last! */
> + lockdep_hardirq_exit();
> +}
> +
> +/**
>   * irq_exit - Exit an interrupt context, update RCU and lockdep
>   *
>   * Also processes softirqs if needed and possible.
> 
>

Reverting this commit fixed the POWER9 boot warning:

[0.005196][T0] clocksource: timebase: mask: 0x 
max_cycles: 0x761537d007, max_idle_ns: 440795202126 ns
[0.012502][T0] clocksource: timebase mult[1f4] shift[24] registered
[0.030273][T0] [ cut here ]
[0.034421][T0] DEBUG_LOCKS_WARN_ON(current->hardirq_context)
[0.034433][T0] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:3680 
lockdep_hardirqs_on_prepare+0x29c/0x2d0
[0.045874][T0] Modules linked in:
[0.047977][T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
5.7.0-next-20200602 #1
[0.053187][    T0] NIP:  c01d2fec LR: c01d2fe8 CTR: 
c074b0a0
[0.057395][T0] REGS: c130f810 TRAP: 0700   Not tainted  
(5.7.0-next-20200602)
[0.062614][T0] MSR:  90021033   CR: 
48000422  XER: 2004
[0.069856][T0] CFAR: c010e448 IRQMASK: 1
[0.069856][T0] GPR00: c01d2fe8 c130faa0 
c130aa00 002d
[0.069856][T0] GPR04: c133c3b0 000d 
6e6f635f 72727563284e4f5f
[0.069856][T0] GPR08: 0002 c0dcf230 
0001 c12b0280
[0.069856][T0] GPR12:  c57b 
 
[0.069856][T0] GPR16:   
 
[0.069856][T0] GPR20:  0001 
10004d9c 100053ed
[0.069856][T0] GPR24: 10005411 0001 
0002 0003
[0.069856][T0] GPR28:   
 c3e3b008
[0.117846][T0] NIP [c01d2fec] 
lockdep_hardirqs_on_prepare+0x29c/0x2d0
[0.123052][T0] LR [c01d2fe8] 
lockdep_hardirqs_on_prepare+0x298/0x2d0
[0.127248][T0] Call Trace:
[0.129337][T0] [c130faa0] [c01d2fe8] 
lockdep_hardirqs_on_prepare+0x298/0x2d0 (unreliable)
[0.137613][T0] [c130fb10] [c02d3834] 
trace_hardirqs_on+0x94/0x230
trace_hardirqs_on at kernel/trace/trace_preemptirq.c:49
[0.141824][T0] [c130fb60] [c0039100] 
interrupt_exit_kernel_prepare+0x110/0x1f0
interrupt_exit_kernel_prepare at arch/powerpc/kernel/syscall_64.c:337
[0.148069][T0] [c130fbc0] [c000f328] 
interrupt_return+0x118/0x1c0
[0.152281][T0] --- interrupt: 900 at arch_local_irq_restore+0xc0/0xd0
arch_local_irq_restore at arch/powerpc/kernel/irq.c:367
(inlined by) arch_local_irq_restore at arch/powerpc/kernel/irq.c:318
[0.152281][T0] LR = start_kernel+0x7f0/0x9dc
[0.153579][T0] [c130fec0] [c1208fa8] 
init_on_free+0x0/0x2b0 (unreliable)
[0.159810][T0] [c130fee0] [c0c845c8] 
start_kernel+0x7e4/0x9dc
start_kernel at init/main.c:961 (discriminator 3)
[0.165017][T0] [c130ff90] [c000c890] 
start_here_common+0x1c/0x8c
[0.169224][T0] Instruction dump:
[0.171324][T0] 0fe0 e8010080 ebc10060 ebe10068 7c0803a6 4bfffe7c 
3c82ff8b 3c62ff8a
[0.177558][T0] 38848808 3863e460 4bf3b3fd 6000 <0fe0> e8010080 
ebc10060 ebe10068
[0.183796][T0] irq event stamp: 16
[0.186904][T0] hardirqs las

Re: ppc64le and 32-bit LE userland compatibility

2020-06-02 Thread Michal Suchánek
On Tue, Jun 02, 2020 at 01:40:23PM +, Joseph Myers wrote:
> On Tue, 2 Jun 2020, Daniel Kolesa wrote:
> 
> > not be limited to being just userspace under ppc64le, but should be 
> > runnable on a native kernel as well, which should not be limited to any 
> > particular baseline other than just PowerPC.
> 
> This is a fairly unusual approach to bringing up a new ABI.  Since new 
> ABIs are more likely to be used on new systems rather than switching ABI 
> on an existing installation, and since it can take quite some time for all 
> the software support for a new ABI to become widely available in 
> distributions, people developing new ABIs are likely to think about what 
> new systems are going to be relevant in a few years' time when working out 
> the minimum hardware requirements for the new ABI.  (The POWER8 minimum 
> for powerpc64le fits in with that, for example.)
That means that you cannot run ppc64le on FSL embedded CPUs (which lack
the vector instructions in LE mode), which may be fine with you, but
other people may want to support these. I can't really say whether that's
a good idea or not, but I don't foresee them going away in a few years, either.

Thanks

Michal


Re: ppc64le and 32-bit LE userland compatibility

2020-06-02 Thread Joseph Myers
On Mon, 1 Jun 2020, Segher Boessenkool wrote:

> > The supported glibc ABIs are listed at 
> > .
> 
> powerpcle-linux already does work somewhat, and that should also he
> worth something, official or not ;-)
> 
> (It has worked for very many years already...  That is, I have built it
> a lot, I have no idea about running a full distro that way).

Greg McGary's patches as referenced at 
 were 
never merged, and I don't know how big the changes to .S files were or if 
they were ever posted.  Many files were subsequently fixed as part of 
bringing up support for powerpc64le, but without actual 32-bit LE testing 
as part of each release cycle there's not much evidence of correctness for 
LE of code not used for powerpc64le.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: ppc64le and 32-bit LE userland compatibility

2020-06-02 Thread Joseph Myers
On Tue, 2 Jun 2020, Daniel Kolesa wrote:

> not be limited to being just userspace under ppc64le, but should be 
> runnable on a native kernel as well, which should not be limited to any 
> particular baseline other than just PowerPC.

This is a fairly unusual approach to bringing up a new ABI.  Since new 
ABIs are more likely to be used on new systems rather than switching ABI 
on an existing installation, and since it can take quite some time for all 
the software support for a new ABI to become widely available in 
distributions, people developing new ABIs are likely to think about what 
new systems are going to be relevant in a few years' time when working out 
the minimum hardware requirements for the new ABI.  (The POWER8 minimum 
for powerpc64le fits in with that, for example.)

> either the AIX/ELFv1 nor the ELFv2 ABIs) If we were to introduce new 
> ports, what would those use? ld64.so.3 for BE/v2? ld.so.2 for LE/32-bit? 

Rather than relying on numbers such as "3" or "2" in a particular place 
being unique across all (architecture, ABI) pairs supported by glibc, 
something more obviously specific to a particular architecture and ABI, 
e.g. ld-linux-powerpc64be-elfv2.so.1, would be better.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] powerpc/kvm: Enable support for ISA v3.1 guests

2020-06-02 Thread kbuild test robot
Hi Alistair,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.7 next-20200529]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Alistair-Popple/powerpc-kvm-Enable-support-for-ISA-v3-1-guests/20200602-140435
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-defconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross 
ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot 

All error/warnings (new ones prefixed by >>, old ones prefixed by <<):

arch/powerpc/kvm/book3s_hv.c: In function 'kvmppc_set_arch_compat':
<<  from arch/powerpc/kvm/book3s_hv.c:81:
>> arch/powerpc/kvm/book3s_hv.c:356:22: error: 'CPU_FTR_ARCH_31' undeclared 
>> (first use in this function); did you mean 'CPU_FTR_ARCH_300'?
356 |  if (cpu_has_feature(CPU_FTR_ARCH_31))
|  ^~~
|  CPU_FTR_ARCH_300
arch/powerpc/kvm/book3s_hv.c:356:22: note: each undeclared identifier is 
reported only once for each function it appears in
<<  from arch/powerpc/kvm/book3s_hv.c:81:
>> arch/powerpc/kvm/book3s_hv.c:348:25: error: 'PCR_ARCH_300' undeclared (first 
>> use in this function); did you mean 'PVR_ARCH_300'?
348 | #define PCR_ARCH_31 (PCR_ARCH_300 << 1)
| ^~~~
<<  from arch/powerpc/kvm/book3s_hv.c:81:
>> arch/powerpc/kvm/book3s_hv.c:357:18: note: in expansion of macro 
>> 'PCR_ARCH_31'
357 |   host_pcr_bit = PCR_ARCH_31;
|  ^~~
arch/powerpc/kvm/book3s_hv.c: At top level:
arch/powerpc/kvm/book3s_hv.c:3521:5: error: no previous prototype for 
'kvmhv_p9_guest_entry' [-Werror=missing-prototypes]
3521 | int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
| ^~~~
cc1: all warnings being treated as errors

vim +356 arch/powerpc/kvm/book3s_hv.c

   346  
   347  /* Dummy value used in computing PCR value below */
 > 348  #define PCR_ARCH_31    (PCR_ARCH_300 << 1)
   349  
   350  static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 
arch_compat)
   351  {
   352  unsigned long host_pcr_bit = 0, guest_pcr_bit = 0;
   353  struct kvmppc_vcore *vc = vcpu->arch.vcore;
   354  
   355  /* We can (emulate) our own architecture version and anything 
older */
 > 356  if (cpu_has_feature(CPU_FTR_ARCH_31))
 > 357  host_pcr_bit = PCR_ARCH_31;
   358  else if (cpu_has_feature(CPU_FTR_ARCH_300))
   359  host_pcr_bit = PCR_ARCH_300;
   360  else if (cpu_has_feature(CPU_FTR_ARCH_207S))
   361  host_pcr_bit = PCR_ARCH_207;
   362  else if (cpu_has_feature(CPU_FTR_ARCH_206))
   363  host_pcr_bit = PCR_ARCH_206;
   364  else
   365  host_pcr_bit = PCR_ARCH_205;
   366  
   367  /* Determine lowest PCR bit needed to run guest in given PVR 
level */
   368  guest_pcr_bit = host_pcr_bit;
   369  if (arch_compat) {
   370  switch (arch_compat) {
   371  case PVR_ARCH_205:
   372  guest_pcr_bit = PCR_ARCH_205;
   373  break;
   374  case PVR_ARCH_206:
   375  case PVR_ARCH_206p:
   376  guest_pcr_bit = PCR_ARCH_206;
   377  break;
   378  case PVR_ARCH_207:
   379  guest_pcr_bit = PCR_ARCH_207;
   380  break;
   381  case PVR_ARCH_300:
   382  guest_pcr_bit = PCR_ARCH_300;
   383  break;
   384  case PVR_ARCH_31:
   385  guest_pcr_bit = PCR_ARCH_31;
   386  break;
   387  default:
   388  return -EINVAL;
   389  }
   390  }
   391  
   392  /* Check requested PCR bits don't exceed our capabilities */
   393  if (guest_pcr_bit > host_pcr_bit)
   394  return -EINVAL;
   395  
   396  spin_lock(&vc->lock);
   397  vc->arch_compat = arch_compat;
   398   

[PATCH] cxl: Fix kobject memleak

2020-06-02 Thread Wang Hai
Currently the error return path from kobject_init_and_add() is not
followed by a call to kobject_put() - which means we are leaking
the kobject.

Fix it by adding a call to kobject_put() in the error path of
kobject_init_and_add().

Fixes: b087e6190ddc ("cxl: Export optional AFU configuration record in sysfs")
Reported-by: Hulk Robot 
Signed-off-by: Wang Hai 
---
 drivers/misc/cxl/sysfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/cxl/sysfs.c b/drivers/misc/cxl/sysfs.c
index f0263d1..d97a243 100644
--- a/drivers/misc/cxl/sysfs.c
+++ b/drivers/misc/cxl/sysfs.c
@@ -624,7 +624,7 @@ static struct afu_config_record 
*cxl_sysfs_afu_new_cr(struct cxl_afu *afu, int c
rc = kobject_init_and_add(&cr->kobj, &afu_config_record_type,
  &afu->dev.kobj, "cr%i", cr->cr);
if (rc)
-   goto err;
+   goto err1;
 
rc = sysfs_create_bin_file(&cr->kobj, &cr->config_attr);
if (rc)
-- 
1.8.3.1



Re: [PATCH] powerpc/nvram: Replace kmalloc with kzalloc in the error message

2020-06-02 Thread Dan Carpenter
On Tue, Jun 02, 2020 at 09:23:57PM +1000, Michael Ellerman wrote:
> Markus Elfring  writes:
>  Please just remove the message instead, it's a tiny allocation that's
>  unlikely to ever fail, and the caller will print an error anyway.
> >>>
> >>> How do you think about to take another look at a previous update 
> >>> suggestion
> >>> like the following?
> >>>
> >>> powerpc/nvram: Delete three error messages for a failed memory allocation
> >>> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/00845261-8528-d011-d3b8-e9355a231...@users.sourceforge.net/
> >>> https://lore.kernel.org/linuxppc-dev/00845261-8528-d011-d3b8-e9355a231...@users.sourceforge.net/
> >>> https://lore.kernel.org/patchwork/patch/752720/
> >>> https://lkml.org/lkml/2017/1/19/537
> >>
> >> That deleted the messages from nvram_scan_partitions(), but neither of
> >> the callers of nvram_scan_partitions() check its return value or print
> >> anything if it fails. So removing those messages would make those
> >> failures silent which is not what we want.
> >
> > * How do you think about information like the following?
> >   
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst?id=f359287765c04711ff54fbd11645271d8e5ff763#n883
> > “…
> > These generic allocation functions all emit a stack dump on failure when 
> > used
> > without __GFP_NOWARN so there is no use in emitting an additional failure
> > message when NULL is returned.
> > …”
> 
> Are you sure that's actually true?
> 
> A quick look around in slub.c leads me to:
> 
> slab_out_of_memory(struct kmem_cache *s, gfp_t gfpflags, int nid)
> {
> #ifdef CONFIG_SLUB_DEBUG

You first have to enable EXPERT mode before you can disable SLUB_DEBUG.
So that hopefully means you *really* want to save memory.  It doesn't
make sense to add a bunch of memory-wasting printks when the users want
to go to extra lengths to conserve memory.

regards,
dan carpenter



Re: [PATCH] powerpc/nvram: Replace kmalloc with kzalloc in the error message

2020-06-02 Thread Michael Ellerman
Markus Elfring  writes:
 Please just remove the message instead, it's a tiny allocation that's
 unlikely to ever fail, and the caller will print an error anyway.
>>>
>>> How do you think about to take another look at a previous update suggestion
>>> like the following?
>>>
>>> powerpc/nvram: Delete three error messages for a failed memory allocation
>>> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/00845261-8528-d011-d3b8-e9355a231...@users.sourceforge.net/
>>> https://lore.kernel.org/linuxppc-dev/00845261-8528-d011-d3b8-e9355a231...@users.sourceforge.net/
>>> https://lore.kernel.org/patchwork/patch/752720/
>>> https://lkml.org/lkml/2017/1/19/537
>>
>> That deleted the messages from nvram_scan_partitions(), but neither of
>> the callers of nvram_scan_partitions() check its return value or print
>> anything if it fails. So removing those messages would make those
>> failures silent which is not what we want.
>
> * How do you think about information like the following?
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst?id=f359287765c04711ff54fbd11645271d8e5ff763#n883
> “…
> These generic allocation functions all emit a stack dump on failure when used
> without __GFP_NOWARN so there is no use in emitting an additional failure
> message when NULL is returned.
> …”

Are you sure that's actually true?

A quick look around in slub.c leads me to:

slab_out_of_memory(struct kmem_cache *s, gfp_t gfpflags, int nid)
{
#ifdef CONFIG_SLUB_DEBUG
static DEFINE_RATELIMIT_STATE(slub_oom_rs, DEFAULT_RATELIMIT_INTERVAL,
  DEFAULT_RATELIMIT_BURST);
int node;
struct kmem_cache_node *n;

if ((gfpflags & __GFP_NOWARN) || !__ratelimit(&slub_oom_rs))
return;

pr_warn("SLUB: Unable to allocate memory on node %d, gfp=%#x(%pGg)\n",
nid, gfpflags, &gfpflags);
pr_warn("  cache: %s, object size: %u, buffer size: %u, default order: 
%u, min order: %u\n",
s->name, s->object_size, s->size, oo_order(s->oo),
oo_order(s->min));

if (oo_order(s->min) > get_order(s->object_size))
pr_warn("  %s debugging increased min order, use slub_debug=O 
to disable.\n",
s->name);

for_each_kmem_cache_node(s, node, n) {
unsigned long nr_slabs;
unsigned long nr_objs;
unsigned long nr_free;

nr_free  = count_partial(n, count_free);
nr_slabs = node_nr_slabs(n);
nr_objs  = node_nr_objs(n);

pr_warn("  node %d: slabs: %ld, objs: %ld, free: %ld\n",
node, nr_slabs, nr_objs, nr_free);
}
#endif
}

Which looks a lot like it won't print anything when CONFIG_SLUB_DEBUG=n.

But maybe I'm looking in the wrong place?

cheers


Re: [PATCH] hw_breakpoint: Fix build warnings with clang

2020-06-02 Thread Ravi Bangoria




On 6/2/20 4:30 PM, Michael Ellerman wrote:

Christophe Leroy  writes:

Le 02/06/2020 à 06:12, Ravi Bangoria a écrit :

kbuild test robot reported a few build warnings with hw_breakpoint code
when compiled with clang[1]. Fix those.

[1]: https://lore.kernel.org/linuxppc-dev/202005192233.oi9cjrta%25...@intel.com/



This should have mentioned that some of the errors were recently
introduced by your commit.


Sure, will take care next time.




Reported-by: kbuild test robot 
Signed-off-by: Ravi Bangoria 
---
Note: Prepared on top of powerpc/next.

   arch/powerpc/include/asm/hw_breakpoint.h | 3 ---
   include/linux/hw_breakpoint.h| 4 
   2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_breakpoint.h 
b/arch/powerpc/include/asm/hw_breakpoint.h
index f42a55eb77d2..cb424799da0d 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -70,9 +70,6 @@ extern int hw_breakpoint_exceptions_notify(struct 
notifier_block *unused,
unsigned long val, void *data);
   int arch_install_hw_breakpoint(struct perf_event *bp);
   void arch_uninstall_hw_breakpoint(struct perf_event *bp);
-int arch_reserve_bp_slot(struct perf_event *bp);
-void arch_release_bp_slot(struct perf_event *bp);
-void arch_unregister_hw_breakpoint(struct perf_event *bp);
   void hw_breakpoint_pmu_read(struct perf_event *bp);
   extern void flush_ptrace_hw_breakpoint(struct task_struct *tsk);
   
diff --git a/include/linux/hw_breakpoint.h b/include/linux/hw_breakpoint.h

index 6058c3844a76..521481f0d320 100644
--- a/include/linux/hw_breakpoint.h
+++ b/include/linux/hw_breakpoint.h
@@ -80,6 +80,10 @@ extern int dbg_reserve_bp_slot(struct perf_event *bp);
   extern int dbg_release_bp_slot(struct perf_event *bp);
   extern int reserve_bp_slot(struct perf_event *bp);
   extern void release_bp_slot(struct perf_event *bp);
+extern int hw_breakpoint_weight(struct perf_event *bp);
+extern int arch_reserve_bp_slot(struct perf_event *bp);
+extern void arch_release_bp_slot(struct perf_event *bp);
+extern void arch_unregister_hw_breakpoint(struct perf_event *bp);


Please no new 'extern'. In the old days 'extern' keyword was used, but
new code shall not introduce new unnecessary usage of 'extern' keyword.
See report from Checkpatch below:

WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description
(prefer a maximum 75 chars per line)
#9:
[1]:
https://lore.kernel.org/linuxppc-dev/202005192233.oi9cjrta%25...@intel.com/

CHECK:AVOID_EXTERNS: extern prototypes should be avoided in .h files
#40: FILE: include/linux/hw_breakpoint.h:83:
+extern int hw_breakpoint_weight(struct perf_event *bp);


I fixed it up when applying.


Thanks Michael.

Ravi


Re: [PATCH] hw_breakpoint: Fix build warnings with clang

2020-06-02 Thread Michael Ellerman
Christophe Leroy  writes:
> Le 02/06/2020 à 06:12, Ravi Bangoria a écrit :
>> kbuild test robot reported a few build warnings with hw_breakpoint code
>> when compiled with clang[1]. Fix those.
>> 
>> [1]: 
>> https://lore.kernel.org/linuxppc-dev/202005192233.oi9cjrta%25...@intel.com/
>> 

This should have mentioned that some of the errors were recently
introduced by your commit.

>> Reported-by: kbuild test robot 
>> Signed-off-by: Ravi Bangoria 
>> ---
>> Note: Prepared on top of powerpc/next.
>> 
>>   arch/powerpc/include/asm/hw_breakpoint.h | 3 ---
>>   include/linux/hw_breakpoint.h| 4 
>>   2 files changed, 4 insertions(+), 3 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/hw_breakpoint.h 
>> b/arch/powerpc/include/asm/hw_breakpoint.h
>> index f42a55eb77d2..cb424799da0d 100644
>> --- a/arch/powerpc/include/asm/hw_breakpoint.h
>> +++ b/arch/powerpc/include/asm/hw_breakpoint.h
>> @@ -70,9 +70,6 @@ extern int hw_breakpoint_exceptions_notify(struct 
>> notifier_block *unused,
>>  unsigned long val, void *data);
>>   int arch_install_hw_breakpoint(struct perf_event *bp);
>>   void arch_uninstall_hw_breakpoint(struct perf_event *bp);
>> -int arch_reserve_bp_slot(struct perf_event *bp);
>> -void arch_release_bp_slot(struct perf_event *bp);
>> -void arch_unregister_hw_breakpoint(struct perf_event *bp);
>>   void hw_breakpoint_pmu_read(struct perf_event *bp);
>>   extern void flush_ptrace_hw_breakpoint(struct task_struct *tsk);
>>   
>> diff --git a/include/linux/hw_breakpoint.h b/include/linux/hw_breakpoint.h
>> index 6058c3844a76..521481f0d320 100644
>> --- a/include/linux/hw_breakpoint.h
>> +++ b/include/linux/hw_breakpoint.h
>> @@ -80,6 +80,10 @@ extern int dbg_reserve_bp_slot(struct perf_event *bp);
>>   extern int dbg_release_bp_slot(struct perf_event *bp);
>>   extern int reserve_bp_slot(struct perf_event *bp);
>>   extern void release_bp_slot(struct perf_event *bp);
>> +extern int hw_breakpoint_weight(struct perf_event *bp);
>> +extern int arch_reserve_bp_slot(struct perf_event *bp);
>> +extern void arch_release_bp_slot(struct perf_event *bp);
>> +extern void arch_unregister_hw_breakpoint(struct perf_event *bp);
>
> Please no new 'extern'. In the old days 'extern' keyword was used, but 
> new code shall not introduce new unnecessary usage of 'extern' keyword. 
> See report from Checkpatch below:
>
> WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description 
> (prefer a maximum 75 chars per line)
> #9:
> [1]: 
> https://lore.kernel.org/linuxppc-dev/202005192233.oi9cjrta%25...@intel.com/
>
> CHECK:AVOID_EXTERNS: extern prototypes should be avoided in .h files
> #40: FILE: include/linux/hw_breakpoint.h:83:
> +extern int hw_breakpoint_weight(struct perf_event *bp);

I fixed it up when applying.

cheers


Re: [PATCH] powerpc/32: disable KASAN with pages bigger than 16k

2020-06-02 Thread Michael Ellerman
Christophe Leroy  writes:
> Le 28/05/2020 à 12:17, Christophe Leroy a écrit :
>> Mapping of early shadow area is implemented by using a single static
>> page table having all entries pointing to the same early shadow page.
>> The shadow area must therefore occupy full PGD entries.
>> 
>> The shadow area has a size of 128Mbytes starting at 0xf800.
>> With 4k pages, a PGD entry is 4Mbytes
>> With 16k pages, a PGD entry is 64Mbytes
>> With 64k pages, a PGD entry is 256Mbytes which is too big.
>
> It's with 32k pages that a PGD entry is 256Mbytes.
>
> With 64k pages, a PGD entry is 1Gbytes which is too big.
>
> Michael, can you fix the commit log ?

Yes.

  powerpc/32: Disable KASAN with pages bigger than 16k
  
  Mapping of early shadow area is implemented by using a single static
  page table having all entries pointing to the same early shadow page.
  The shadow area must therefore occupy full PGD entries.
  
  The shadow area has a size of 128MB starting at 0xf800.
  With 4k pages, a PGD entry is 4MB
  With 16k pages, a PGD entry is 64MB
  With 64k pages, a PGD entry is 1GB which is too big.
  
  Until we rework the early shadow mapping, disable KASAN when the page
  size is too big.

  Fixes: 2edb16efc899 ("powerpc/32: Add KASAN support")
  Cc: sta...@vger.kernel.org # v5.2+
  Reported-by: kbuild test robot 
  Signed-off-by: Christophe Leroy 
  Signed-off-by: Michael Ellerman 
  Link: 
https://lore.kernel.org/r/7195fcde7314ccbf7a081b356084a69d421b10d4.1590660977.git.christophe.le...@csgroup.eu


cheers


[RESEND PATCH v9 2/5] seq_buf: Export seq_buf_printf

2020-06-02 Thread Vaibhav Jain
'seq_buf' provides a very useful abstraction for writing to a string
buffer without needing to worry about it overflowing. However, even
though the API has been stable for a couple of years now, it is still
not exported to kernel loadable modules, limiting its usage.

Hence this patch updates 'seq_buf.c' to mark seq_buf_printf(), which is
part of the seq_buf API, as exported to kernel loadable GPL modules.
This symbol will be used in later parts of this patch-set to simplify
content creation for a sysfs attribute.

Cc: Piotr Maziarz 
Cc: Cezary Rojewski 
Cc: Christoph Hellwig 
Cc: Steven Rostedt 
Cc: Borislav Petkov 
Acked-by: Steven Rostedt (VMware) 
Signed-off-by: Vaibhav Jain 
---
Changelog:

Resend:
* Added ack from Steven Rostedt

v8..v9:
* None

v7..v8:
* Updated the patch title [ Christoph Hellwig ]
* Updated patch description to replace the confusing term 'external kernel
  modules' with 'kernel loadable modules'.

Resend:
* Added ack from Steven Rostedt

v6..v7:
* New patch in the series
---
 lib/seq_buf.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/seq_buf.c b/lib/seq_buf.c
index 4e865d42ab03..707453f5d58e 100644
--- a/lib/seq_buf.c
+++ b/lib/seq_buf.c
@@ -91,6 +91,7 @@ int seq_buf_printf(struct seq_buf *s, const char *fmt, ...)
 
return ret;
 }
+EXPORT_SYMBOL_GPL(seq_buf_printf);
 
 #ifdef CONFIG_BINARY_PRINTF
 /**
-- 
2.26.2



[RESEND PATCH v9 5/5] powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH

2020-06-02 Thread Vaibhav Jain
This patch implements support for PDSM request 'PAPR_PDSM_HEALTH'
that returns a newly introduced 'struct nd_papr_pdsm_health' instance
containing dimm health information back to user space in response to
ND_CMD_CALL. This functionality is implemented in newly introduced
papr_pdsm_health() that queries the nvdimm health information and
then copies this information to the package payload whose layout is
defined by 'struct nd_papr_pdsm_health'.

The patch also introduces a new member 'struct papr_scm_priv.health',
an instance of 'struct nd_papr_pdsm_health', to cache the health
information of an nvdimm. As a result, the functions drc_pmem_query_health()
and flags_show() are updated to populate and use this new struct
instead of the u64 integer used earlier.

Cc: "Aneesh Kumar K . V" 
Cc: Dan Williams 
Cc: Michael Ellerman 
Cc: Ira Weiny 
Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Vaibhav Jain 
---
Changelog:

Resend:
* Added ack from Aneesh.

v8..v9:
* s/PAPR_SCM_PDSM_HEALTH/PAPR_PDSM_HEALTH/g  [ Dan , Aneesh ]
* s/PAPR_SCM_PDSM_DIMM_*/PAPR_PDSM_DIMM_*/g
* Renamed papr_scm_get_health() to papr_pdsm_health()
* Updated patch description to replace papr-scm dimm with nvdimm.

v7..v8:
* None

Resend:
* None

v6..v7:
* Updated flags_show() to use seq_buf_printf(). [Mpe]
* Updated papr_scm_get_health() to use newly introduced
  __drc_pmem_query_health() bypassing the cache [Mpe].

v5..v6:
* Added attribute '__packed' to 'struct nd_papr_pdsm_health_v1' to
  guard against the possibility of different compilers adding different
  padding to the struct [ Dan Williams ]

* Updated 'struct nd_papr_pdsm_health_v1' to use __u8 instead of
  'bool' and also updated drc_pmem_query_health() to take this into
  account. [ Dan Williams ]

v4..v5:
* None

v3..v4:
* Call the DSM_PAPR_SCM_HEALTH service function from
  papr_scm_service_dsm() instead of papr_scm_ndctl(). [Aneesh]

v2..v3:
* Updated struct nd_papr_scm_dimm_health_stat_v1 to use '__xx' types
  as its exported to the userspace [Aneesh]
* Changed the constants DSM_PAPR_SCM_DIMM_XX indicating dimm health
  from enum to #defines [Aneesh]

v1..v2:
* New patch in the series
---
 arch/powerpc/include/uapi/asm/papr_pdsm.h |  39 +++
 arch/powerpc/platforms/pseries/papr_scm.c | 125 +++---
 2 files changed, 147 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h 
b/arch/powerpc/include/uapi/asm/papr_pdsm.h
index 6407fefcc007..411725a91591 100644
--- a/arch/powerpc/include/uapi/asm/papr_pdsm.h
+++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
@@ -115,6 +115,7 @@ struct nd_pdsm_cmd_pkg {
  */
 enum papr_pdsm {
PAPR_PDSM_MIN = 0x0,
+   PAPR_PDSM_HEALTH,
PAPR_PDSM_MAX,
 };
 
@@ -133,4 +134,42 @@ static inline void *pdsm_cmd_to_payload(struct 
nd_pdsm_cmd_pkg *pcmd)
return (void *)(pcmd->payload);
 }
 
+/* Various nvdimm health indicators */
+#define PAPR_PDSM_DIMM_HEALTHY   0
+#define PAPR_PDSM_DIMM_UNHEALTHY 1
+#define PAPR_PDSM_DIMM_CRITICAL  2
+#define PAPR_PDSM_DIMM_FATAL 3
+
+/*
+ * Struct exchanged between kernel & ndctl for PAPR_PDSM_HEALTH
+ * Various flags indicate the health status of the dimm.
+ *
+ * dimm_unarmed: Dimm not armed. So contents won't persist.
+ * dimm_bad_shutdown   : Previous shutdown did not persist contents.
+ * dimm_bad_restore: Contents from previous shutdown weren't restored.
+ * dimm_scrubbed   : Contents of the dimm have been scrubbed.
+ * dimm_locked : Contents of the dimm can't be modified until CEC reboot.
+ * dimm_encrypted  : Contents of dimm are encrypted.
+ * dimm_health : Dimm health indicator. One of PAPR_PDSM_DIMM_
+ */
+struct nd_papr_pdsm_health_v1 {
+   __u8 dimm_unarmed;
+   __u8 dimm_bad_shutdown;
+   __u8 dimm_bad_restore;
+   __u8 dimm_scrubbed;
+   __u8 dimm_locked;
+   __u8 dimm_encrypted;
+   __u16 dimm_health;
+} __packed;
+
+/*
+ * Typedef the current struct for dimm_health so that any application
+ * or kernel recompiled after introducing a new version automatically
+ * supports the new version.
+ */
+#define nd_papr_pdsm_health nd_papr_pdsm_health_v1
+
+/* Current version number for the dimm health struct */
+#define ND_PAPR_PDSM_HEALTH_VERSION 1
+
 #endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index 5e2237e7ec08..c0606c0c659c 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -88,7 +88,7 @@ struct papr_scm_priv {
unsigned long lasthealth_jiffies;
 
/* Health information for the dimm */
-   u64 health_bitmap;
+   struct nd_papr_pdsm_health health;
 };
 
 static int drc_pmem_bind(struct papr_scm_priv *p)
@@ -201,6 +201,7 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
 static int __drc_pmem_query_health(struct papr_scm_priv *p)
 {
unsigned long 

[RESEND PATCH v9 4/5] ndctl/papr_scm, uapi: Add support for PAPR nvdimm specific methods

2020-06-02 Thread Vaibhav Jain
Introduce support for PAPR NVDIMM Specific Methods (PDSM) in papr_scm
module and add the command family NVDIMM_FAMILY_PAPR to the whitelist 
of NVDIMM command sets. Also advertise support for ND_CMD_CALL for the
nvdimm command mask and implement necessary scaffolding in the module
to handle ND_CMD_CALL ioctl and PDSM requests that we receive.

The layout of the PDSM request as we expect it from libnvdimm/libndctl
is described in the newly introduced uapi header 'papr_pdsm.h', which
defines a new 'struct nd_pdsm_cmd_pkg' header. This header is used
to communicate the PDSM request via the member 'nd_cmd_pkg.nd_command'
and the size of the payload that needs to be sent/received for
servicing the PDSM.

A new function is_cmd_valid() is implemented that reads the args to
papr_scm_ndctl() and performs sanity tests on them. A new function
papr_scm_service_pdsm() is introduced and is called from
papr_scm_ndctl() when a PDSM request is received via the ND_CMD_CALL
command from libnvdimm.

Cc: "Aneesh Kumar K . V" 
Cc: Dan Williams 
Cc: Michael Ellerman 
Cc: Ira Weiny 
Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Vaibhav Jain 
---
Changelog:

Resend:
* Added ack from Aneesh.

v8..v9:
* Reduced the usage of term SCM replacing it with appropriate
  replacement [ Dan Williams, Aneesh ]
* Renamed 'papr_scm_pdsm.h' to 'papr_pdsm.h'
* s/PAPR_SCM_PDSM_*/PAPR_PDSM_*/g
* s/NVDIMM_FAMILY_PAPR_SCM/NVDIMM_FAMILY_PAPR/g
* Minor updates to 'papr_psdm.h' to replace usage of term 'SCM'.
* Minor update to patch description.

v7..v8:
* Removed the 'payload_offset' field from 'struct
  nd_pdsm_cmd_pkg'. Instead, the command payload is always assumed to
  start at 'nd_pdsm_cmd_pkg.payload'. [ Aneesh ]
* To enable introducing new fields to 'struct nd_pdsm_cmd_pkg', a
  10-byte 'reserved' field is introduced. [ Aneesh ]
* Fixed a typo in "Backward Compatibility" section of papr_scm_pdsm.h
  [ Ira ]

Resend:
* None

v6..v7 :
* Removed the re-definitions of __packed macro from papr_scm_pdsm.h
  [Mpe].
* Removed the usage of __KERNEL__ macros in papr_scm_pdsm.h [Mpe].
* Removed macros that were unused in papr_scm.c from papr_scm_pdsm.h
  [Mpe].
* Made functions defined in papr_scm_pdsm.h as static inline. [Mpe]

v5..v6 :
* Changed the usage of the term DSM to PDSM to distinguish it from the
  ACPI term [ Dan Williams ]
* Renamed papr_scm_dsm.h to papr_scm_pdsm.h and updated various struct
  to reflect the new terminology.
* Updated the patch description and title to reflect the new terminology.
* Squashed patch to introduce new command family in 'ndctl.h' with
  this patch [ Dan Williams ]
* Updated the papr_scm_pdsm method starting index from 0x1 to 0x0
  [ Dan Williams ]
* Removed redundant license text from the papr_scm_psdm.h file.
  [ Dan Williams ]
* s/envelop/envelope/ at various places [ Dan Williams ]
* Added '__packed' attribute to the command package header to guard
  against different compilers adding padding between the fields.
  [ Dan Williams ]
* Converted various pr_debug to dev_debug [ Dan Williams ]

v4..v5 :
* None

v3..v4 :
* None

v2..v3 :
* Updated the patch prefix to 'ndctl/uapi' [Aneesh]

v1..v2 :
* None
---
 arch/powerpc/include/uapi/asm/papr_pdsm.h | 136 ++
 arch/powerpc/platforms/pseries/papr_scm.c | 101 +++-
 include/uapi/linux/ndctl.h|   1 +
 3 files changed, 232 insertions(+), 6 deletions(-)
 create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h

diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h 
b/arch/powerpc/include/uapi/asm/papr_pdsm.h
new file mode 100644
index ..6407fefcc007
--- /dev/null
+++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
@@ -0,0 +1,136 @@
+/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
+/*
+ * PAPR nvDimm Specific Methods (PDSM) and structs for libndctl
+ *
+ * (C) Copyright IBM 2020
+ *
+ * Author: Vaibhav Jain 
+ */
+
+#ifndef _UAPI_ASM_POWERPC_PAPR_PDSM_H_
+#define _UAPI_ASM_POWERPC_PAPR_PDSM_H_
+
+#include 
+
+/*
+ * PDSM Envelope:
+ *
+ * The ioctl ND_CMD_CALL transfers data between user-space and kernel via
+ * envelope which consists of a header and user-defined payload sections.
+ * The header is described by 'struct nd_pdsm_cmd_pkg' which expects a
+ * payload following it and accessible via 'nd_pdsm_cmd_pkg.payload' field.
+ * There is a reserved field that can be used to introduce new fields to the
+ * structure in future. It also tries to ensure that 'nd_pdsm_cmd_pkg.payload'
+ * lies at an 8-byte boundary.
+ *
+ *  +-+-+---+
+ *  |   64-Bytes  |   16-Bytes  |   Max 176-Bytes   |
+ *  +-+-+---+
+ *  |   nd_pdsm_cmd_pkg |   |
+ *  |-+ |   |
+ *  |  nd_cmd_pkg | |   |
+ *  +-+-+---+
+ *  | nd_family   

[RESEND PATCH v9 3/5] powerpc/papr_scm: Fetch nvdimm health information from PHYP

2020-06-02 Thread Vaibhav Jain
Implement support for fetching nvdimm health information via the
H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair
of 64-bit bitmaps, the bitwise AND of which is then stored in
'struct papr_scm_priv' and subsequently partially exposed to
user-space via the newly introduced dimm-specific attribute
'papr/flags'. Since the hcall is costly, the health information is
cached and only re-queried 60s after the previous successful hcall.

The patch also adds documentation describing the flags reported by
the new sysfs attribute 'papr/flags' at
Documentation/ABI/testing/sysfs-bus-papr-pmem.

[1] commit 58b278f568f0 ("powerpc: Provide initial documentation for
PAPR hcalls")

Cc: "Aneesh Kumar K . V" 
Cc: Dan Williams 
Cc: Michael Ellerman 
Cc: Ira Weiny 
Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Vaibhav Jain 
---
Changelog:

Resend:
* Added ack from Aneesh.

v8..v9:
* Rename some variables and defines to reduce usage of term SCM
  replacing it with PMEM [Dan Williams, Aneesh]
* s/PAPR_SCM_DIMM/PAPR_PMEM/g
* s/papr_scm_nd_attributes/papr_nd_attributes/g
* s/papr_scm_nd_attribute_group/papr_nd_attribute_group/g
* s/papr_scm_dimm_attr_groups/papr_nd_attribute_groups/g
* Renamed file sysfs-bus-papr-scm to sysfs-bus-papr-pmem

v7..v8:
* Update type of variable 'rc' in __drc_pmem_query_health() and
  drc_pmem_query_health() to long and int respectively. [ Ira ]
* Updated the patch description to s/64 bit Big Endian Number/64-bit
  bitmap/ [ Ira, Aneesh ].

Resend:
* None

v6..v7 :
* Used the exported seq_buf_printf() function to generate content for
  'papr/flags'
* Moved the PAPR_SCM_DIMM_* bit-flags macro definitions to papr_scm.c
  and removed the papr_scm.h file [Mpe]
* Fixed some minor consistency issues in the sysfs-bus-papr-scm
  documentation. [Mpe]
* s/dimm_mutex/health_mutex/g [Mpe]
* Split drc_pmem_query_health() into two function one of which takes
  care of caching and locking. [Mpe]
* Fixed a local copy creation of dimm health information using
  READ_ONCE(). [Mpe]

v5..v6 :
* Change the flags sysfs attribute from 'papr_flags' to 'papr/flags'
  [Dan Williams]
* Include documentation for 'papr/flags' attr [Dan Williams]
* Change flag 'save_fail' to 'flush_fail' [Dan Williams]
* Caching of health bitmap to reduce expensive hcalls [Dan Williams]
* Removed usage of PPC_BIT from 'papr-scm.h' header [Mpe]
* Replaced two __be64 integers from papr_scm_priv to a single u64
  integer [Mpe]
* Updated patch description to reflect the changes made in this
  version.
* Removed avoidable usage of 'papr_scm_priv.dimm_mutex' from
  flags_show() [Dan Williams]

v4..v5 :
* None

v3..v4 :
* None

v2..v3 :
* Removed PAPR_SCM_DIMM_HEALTH_NON_CRITICAL as a condition for
  NVDIMM unarmed [Aneesh]

v1..v2 :
* New patch in the series.
---
 Documentation/ABI/testing/sysfs-bus-papr-pmem |  27 +++
 arch/powerpc/platforms/pseries/papr_scm.c | 169 +-
 2 files changed, 194 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem

diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem 
b/Documentation/ABI/testing/sysfs-bus-papr-pmem
new file mode 100644
index ..5b10d036a8d4
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem
@@ -0,0 +1,27 @@
+What:  /sys/bus/nd/devices/nmemX/papr/flags
+Date:  Apr, 2020
+KernelVersion: v5.8
+Contact:   linuxppc-dev , 
linux-nvd...@lists.01.org,
+Description:
+   (RO) Report flags indicating various states of a
+   papr-pmem NVDIMM device. Each flag maps to one or
+   more bits set in the dimm-health-bitmap retrieved in
+   response to the H_SCM_HEALTH hcall. The details of the
+   bit flags returned in response to this hcall are
+   available at 'Documentation/powerpc/papr_hcalls.rst'.
+   Below are the flags reported in this sysfs file:
+
+   * "not_armed"   : Indicates that NVDIMM contents will not
+ survive a power cycle.
+   * "flush_fail"  : Indicates that NVDIMM contents
+ couldn't be flushed during last
+ shut-down event.
+   * "restore_fail": Indicates that NVDIMM contents
+ couldn't be restored during NVDIMM
+ initialization.
+   * "encrypted"   : NVDIMM contents are encrypted.
+   * "smart_notify": There is health event for the NVDIMM.
+   * "scrubbed": Indicating that contents of the
+ NVDIMM have been scrubbed.
+   * "locked"  : Indicating that NVDIMM contents cant
+ be modified until next power cycle.
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index f35592423380..149431594839 100644
--- 

[RESEND PATCH v9 1/5] powerpc: Document details on H_SCM_HEALTH hcall

2020-06-02 Thread Vaibhav Jain
Add documentation to 'papr_hcalls.rst' describing the bitmap flags
that are returned from H_SCM_HEALTH hcall as per the PAPR-SCM
specification.

Cc: "Aneesh Kumar K . V" 
Cc: Dan Williams 
Cc: Michael Ellerman 
Cc: Ira Weiny 
Signed-off-by: Vaibhav Jain 
---
Changelog:

Resend:
* None

v8..v9:
* s/SCM/PMEM device. [ Dan Williams, Aneesh ]

v7..v8:
* Added a clarification on bit-ordering of Health Bitmap

Resend:
* None

v6..v7:
* None

v5..v6:
* New patch in the series
---
 Documentation/powerpc/papr_hcalls.rst | 46 ---
 1 file changed, 42 insertions(+), 4 deletions(-)

diff --git a/Documentation/powerpc/papr_hcalls.rst 
b/Documentation/powerpc/papr_hcalls.rst
index 3493631a60f8..48fcf1255a33 100644
--- a/Documentation/powerpc/papr_hcalls.rst
+++ b/Documentation/powerpc/papr_hcalls.rst
@@ -220,13 +220,51 @@ from the LPAR memory.
 **H_SCM_HEALTH**
 
 | Input: drcIndex
-| Out: *health-bitmap, health-bit-valid-bitmap*
+| Out: *health-bitmap (r4), health-bit-valid-bitmap (r5)*
 | Return Value: *H_Success, H_Parameter, H_Hardware*
 
 Given a DRC Index return the info on predictive failure and overall health of
-the NVDIMM. The asserted bits in the health-bitmap indicate a single predictive
-failure and health-bit-valid-bitmap indicate which bits in health-bitmap are
-valid.
+the PMEM device. The asserted bits in the health-bitmap indicate one or more
+states (described in the table below) of the PMEM device and the
+health-bit-valid-bitmap indicates which bits in health-bitmap are valid. The
+bits are reported in reverse bit ordering, for example a value of
+0xC400000000000000 indicates bits 0, 1, and 5 are valid.
+
+Health Bitmap Flags:
+
++------+---------------------------------------------------------------------+
+|  Bit | Definition                                                          |
++======+=====================================================================+
+|  00  | PMEM device is unable to persist memory contents.                   |
+|      | If the system is powered down, nothing will be saved.               |
++------+---------------------------------------------------------------------+
+|  01  | PMEM device failed to persist memory contents. Either contents were |
+|      | not saved successfully on power down or were not restored properly  |
+|      | on power up.                                                        |
++------+---------------------------------------------------------------------+
+|  02  | PMEM device contents are persisted from previous IPL. The data from |
+|      | the last boot were successfully restored.                           |
++------+---------------------------------------------------------------------+
+|  03  | PMEM device contents are not persisted from previous IPL. There was |
+|      | no data to restore from the last boot.                              |
++------+---------------------------------------------------------------------+
+|  04  | PMEM device memory life remaining is critically low                 |
++------+---------------------------------------------------------------------+
+|  05  | PMEM device will be garded off next IPL due to failure              |
++------+---------------------------------------------------------------------+
+|  06  | PMEM device contents cannot persist due to current platform health  |
+|      | status. A hardware failure may prevent data from being saved or     |
+|      | restored.                                                           |
++------+---------------------------------------------------------------------+
+|  07  | PMEM device is unable to persist memory contents in certain         |
+|      | conditions                                                          |
++------+---------------------------------------------------------------------+
+|  08  | PMEM device is encrypted                                            |
++------+---------------------------------------------------------------------+
+|  09  | PMEM device has successfully completed a requested erase or secure  |
+|      | erase procedure.                                                    |
++------+---------------------------------------------------------------------+
+|10:63 | Reserved / Unused                                                   |
++------+---------------------------------------------------------------------+
 
 **H_SCM_PERFORMANCE_STATS**
 
-- 
2.26.2



[RESEND PATCH v9 0/5] powerpc/papr_scm: Add support for reporting nvdimm health

2020-06-02 Thread Vaibhav Jain
Changes since v9 [1]:
* Added acks from Aneesh and Steven Rostedt.

Changes since v8 [2]:

* Updated proposed changes to remove usage of term 'SCM' due to
  ambiguity with 'PMEM' and 'NVDIMM'. [ Dan Williams ]
* Replaced the usage of term 'SCM' with 'PMEM' in most contexts.
  [ Aneesh ]
* Renamed NVDIMM health defines from PAPR_SCM_DIMM_* to PAPR_PMEM_* .
* Updates to various newly introduced identifiers in 'papr_scm.c'
  removing the 'SCM' prefix from their names.
* Renamed NVDIMM_FAMILY_PAPR_SCM to NVDIMM_FAMILY_PAPR
* Renamed PAPR_SCM_PDSM_HEALTH PAPR_PDSM_HEALTH
* Renamed uapi header 'papr_scm_pdsm.h' to 'papr_pdsm.h'.
* Renamed sysfs ABI document from sysfs-bus-papr-scm to
  sysfs-bus-papr-pmem.
* No behavioural changes from v8 [2].

[1] 
https://lore.kernel.org/linux-nvdimm/20200529214719.223344-1-vaib...@linux.ibm.com
[2] 
https://lore.kernel.org/linux-nvdimm/20200527041244.37821-1-vaib...@linux.ibm.com/
---

The PAPR standard[3][5] provides mechanisms to query the health and
performance stats of an NVDIMM via various hcalls as described in
Ref[4]. Until now these stats were neither available nor exposed to
user-space tools like 'ndctl'. This is partly due to the PAPR platform
not having support for ACPI and NFIT, hence 'ndctl' is unable to query
and report the dimm health status and a user had no way to determine
the current health status of an NVDIMM.

To overcome this limitation, this patch-set updates the papr_scm
kernel module to query and fetch NVDIMM health stats using the hcalls
described in Ref[4]. These health and performance stats are then
exposed to userspace via sysfs and PAPR-NVDIMM-Specific-Methods (PDSM)
issued by libndctl.

These changes coupled with the proposed ndctl changes located at
Ref[6] should provide a way for the user to retrieve NVDIMM health
status using ndctl.

Below is a sample output using the proposed kernel + ndctl for a PAPR
NVDIMM in an emulation environment:

 # ndctl list -DH
[
  {
"dev":"nmem0",
"health":{
  "health_state":"fatal",
  "shutdown_state":"dirty"
}
  }
]

Dimm health report output on a pseries guest lpar with vPMEM or HMS
based NVDIMMs that are in perfectly healthy conditions:

 # ndctl list -d nmem0 -H
[
  {
"dev":"nmem0",
"health":{
  "health_state":"ok",
  "shutdown_state":"clean"
}
  }
]

PAPR NVDIMM-Specific-Methods(PDSM)
==

PDSM requests are issued by vendor-specific code in libndctl to
execute certain operations or fetch information from NVDIMMs. PDSM
requests can be sent to the papr_scm module via libndctl (userspace)
and libnvdimm (kernel) using the ND_CMD_CALL ioctl command, which is
handled in the dimm control function papr_scm_ndctl(). The current
patchset proposes a single PDSM to retrieve NVDIMM health, defined in
the newly introduced uapi header 'papr_pdsm.h'. Support for more
PDSMs will be added in the future.

Structure of the patch-set
==

The patch-set starts with a doc patch documenting details of the
H_SCM_HEALTH hcall. The second patch exports the kernel symbol
seq_buf_printf() that's used in subsequent patches to generate sysfs
attribute content.

The third patch implements support for fetching NVDIMM health
information from PHYP and partially exposing it to user-space via an
NVDIMM sysfs flag.

The fourth patch implements support for servicing PDSM commands in
the papr_scm module.

Finally, the last patch implements support for servicing the PDSM
'PAPR_PDSM_HEALTH' that returns the NVDIMM health information to
libndctl.

References:
[3] "Power Architecture Platform Reference"
  https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
[4] commit 58b278f568f0
 ("powerpc: Provide initial documentation for PAPR hcalls")
[5] "Linux on Power Architecture Platform Reference"
 https://members.openpowerfoundation.org/document/dl/469
[6] https://github.com/vaibhav92/ndctl/tree/papr_scm_health_v9

---

Vaibhav Jain (5):
  powerpc: Document details on H_SCM_HEALTH hcall
  seq_buf: Export seq_buf_printf
  powerpc/papr_scm: Fetch nvdimm health information from PHYP
  ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
  powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH

 Documentation/ABI/testing/sysfs-bus-papr-pmem |  27 ++
 Documentation/powerpc/papr_hcalls.rst |  46 ++-
 arch/powerpc/include/uapi/asm/papr_pdsm.h | 175 +
 arch/powerpc/platforms/pseries/papr_scm.c | 363 +-
 include/uapi/linux/ndctl.h|   1 +
 lib/seq_buf.c |   1 +
 6 files changed, 600 insertions(+), 13 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem
 create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h

-- 
2.26.2



Re: [PATCH v1 3/4] KVM: PPC: Book3S HV: migrate remaining normal-GFNs to secure-GFNs in H_SVM_INIT_DONE

2020-06-02 Thread Bharata B Rao
On Mon, Jun 01, 2020 at 12:05:35PM -0700, Ram Pai wrote:
> On Mon, Jun 01, 2020 at 05:25:18PM +0530, Bharata B Rao wrote:
> > On Sat, May 30, 2020 at 07:27:50PM -0700, Ram Pai wrote:
> > > H_SVM_INIT_DONE incorrectly assumes that the Ultravisor has explicitly
> > > called H_SVM_PAGE_IN for all secure pages.
> > 
> > I don't think that is quite true. HV doesn't assume anything about
> > secure pages by itself.
> 
> Yes. Currently, it does not assume anything about secure pages.  But I am
> proposing that it should consider all pages (except the shared pages) as
> secure pages, when H_SVM_INIT_DONE is called.

Ok, then maybe also add the proposed changes to the H_SVM_INIT_DONE
documentation.

> 
> In other words, HV should treat all pages; except shared pages, as
> secure pages once H_SVM_INIT_DONE is called. And this includes pages
> added subsequently through memory hotplug.

So after H_SVM_INIT_DONE, if HV touches a secure page for any
reason and gets encrypted contents via page-out, HV drops the
device pfn at that time. So what state would we be in then? We have
completed H_SVM_INIT_DONE, but still have a normal (but encrypted)
page in HV?

> 
> Yes. the Ultravisor can explicitly request the HV to move the pages
> individually.  But that will slow down the transition too significantly.
> It takes above 20min to transition them, for a SVM of size 100G.
> 
> With this proposed enhancement, the switch completes in a few seconds.

I think, many pages during initial switch and most pages for hotplugged
memory are zero pages, for which we don't anyway issue UV page-in calls.
So the 20min saving you are observing is purely due to hcall overhead?

How about extending H_SVM_PAGE_IN interface or a new hcall to request
multiple pages in one request?

Also, how about requesting for bigger page sizes (2M)? Ralph Campbell
had patches that added THP support for migrate_vma_* calls.

> 
> > 
> > > These GFNs continue to be
> > > normal GFNs associated with normal PFNs; when infact, these GFNs should
> > > have been secure GFNs associated with device PFNs.
> > 
> > Transition to secure state is driven by SVM/UV and HV just responds to
> > hcalls by issuing appropriate uvcalls. SVM/UV is in the best position to
> > determine the required pages that need to be moved into secure side.
> > HV just responds to it and tracks such pages as device private pages.
> > 
> > If SVM/UV doesn't get in all the pages to secure side by the time
> > of H_SVM_INIT_DONE, the remaining pages are just normal (shared or
> > otherwise) pages as far as HV is concerned.  Why should HV assume that
> > SVM/UV didn't ask for a few pages and hence push those pages during
> > H_SVM_INIT_DONE?
> 
> By definition, SVM is a VM backed by secure pages.
> Hence all pages(except shared) must turn secure when a VM switches to SVM.
> 
> UV is interested in only a certain pages for the VM, which it will
> request explicitly through H_SVM_PAGE_IN.  All other pages, need not
> be paged-in through UV_PAGE_IN.  They just need to be switched to
> device-pages.
> 
> > 
> > I think UV should drive the movement of pages into secure side both
> > of boot-time SVM memory and hot-plugged memory. HV does memslot
> > registration uvcall when new memory is plugged in, UV should explicitly
> > get the required pages in at that time instead of expecting HV to drive
> > the same.
> > 
> > > +static int uv_migrate_mem_slot(struct kvm *kvm,
> > > + const struct kvm_memory_slot *memslot)
> > > +{
> > > + unsigned long gfn = memslot->base_gfn;
> > > + unsigned long end;
> > > + bool downgrade = false;
> > > + struct vm_area_struct *vma;
> > > + int i, ret = 0;
> > > + unsigned long start = gfn_to_hva(kvm, gfn);
> > > +
> > > + if (kvm_is_error_hva(start))
> > > + return H_STATE;
> > > +
> > > + end = start + (memslot->npages << PAGE_SHIFT);
> > > +
> > > + down_write(>mm->mmap_sem);
> > > +
> > > + mutex_lock(>arch.uvmem_lock);
> > > + vma = find_vma_intersection(kvm->mm, start, end);
> > > + if (!vma || vma->vm_start > start || vma->vm_end < end) {
> > > + ret = H_STATE;
> > > + goto out_unlock;
> > > + }
> > > +
> > > + ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
> > > +   MADV_UNMERGEABLE, >vm_flags);
> > > + downgrade_write(>mm->mmap_sem);
> > > + downgrade = true;
> > > + if (ret) {
> > > + ret = H_STATE;
> > > + goto out_unlock;
> > > + }
> > > +
> > > + for (i = 0; i < memslot->npages; i++, ++gfn) {
> > > + /* skip paged-in pages and shared pages */
> > > + if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL) ||
> > > + kvmppc_gfn_is_uvmem_shared(gfn, kvm))
> > > + continue;
> > > +
> > > + start = gfn_to_hva(kvm, gfn);
> > > + end = start + (1UL << PAGE_SHIFT);
> > > + ret = kvmppc_svm_migrate_page(vma, start, end,
> > > + (gfn << PAGE_SHIFT), kvm, PAGE_SHIFT, false);
> > > +
> > > + if (ret)
> > > + 

Re: [PATCH v1 4/4] KVM: PPC: Book3S HV: migrate hot plugged memory

2020-06-02 Thread Laurent Dufour

Le 31/05/2020 à 04:27, Ram Pai a écrit :

From: Laurent Dufour 

When a memory slot is hot plugged to an SVM, GFNs associated with that
memory slot automatically default to secure GFNs. Hence migrate the
PFNs associated with these GFNs to device-PFNs.

uv_migrate_mem_slot() is called to achieve that. It will not call
UV_PAGE_IN since this request is ignored by the Ultravisor.
NOTE: the Ultravisor does not trust any page content provided by
the Hypervisor once the VM turns secure.

Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Bharata B Rao 
Cc: Aneesh Kumar K.V 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Cc: Thiago Jung Bauermann 
Cc: David Gibson 
Cc: Claudio Carvalho 
Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ram Pai 
(fixed merge conflicts. Modified the commit message)
Signed-off-by: Laurent Dufour 
---
  arch/powerpc/include/asm/kvm_book3s_uvmem.h |  4 
  arch/powerpc/kvm/book3s_hv.c| 11 +++
  arch/powerpc/kvm/book3s_hv_uvmem.c  |  3 +--
  3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h 
b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
index f0c5708..2ec2e5afb 100644
--- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
+++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
@@ -23,6 +23,7 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
  void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
 struct kvm *kvm, bool skip_page_out,
 bool purge_gfn);
+int uv_migrate_mem_slot(struct kvm *kvm, const struct kvm_memory_slot 
*memslot);
  #else
  static inline int kvmppc_uvmem_init(void)
  {
@@ -78,5 +79,8 @@ static inline int kvmppc_send_page_to_uv(struct kvm *kvm, 
unsigned long gfn)
  kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
struct kvm *kvm, bool skip_page_out,
bool purge_gfn) { }
+
+static int uv_migrate_mem_slot(struct kvm *kvm,
+   const struct kvm_memory_slot *memslot);


That line was not part of the patch I sent to you!



  #endif /* CONFIG_PPC_UV */
  #endif /* __ASM_KVM_BOOK3S_UVMEM_H__ */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 4c62bfe..604d062 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4516,13 +4516,16 @@ static void kvmppc_core_commit_memory_region_hv(struct 
kvm *kvm,
case KVM_MR_CREATE:
if (kvmppc_uvmem_slot_init(kvm, new))
return;
-   uv_register_mem_slot(kvm->arch.lpid,
-new->base_gfn << PAGE_SHIFT,
-new->npages * PAGE_SIZE,
-0, new->id);
+   if (uv_register_mem_slot(kvm->arch.lpid,
+new->base_gfn << PAGE_SHIFT,
+new->npages * PAGE_SIZE,
+0, new->id))
+   return;
+   uv_migrate_mem_slot(kvm, new);
break;
case KVM_MR_DELETE:
uv_unregister_mem_slot(kvm->arch.lpid, old->id);
+   kvmppc_uvmem_drop_pages(old, kvm, true, true);


Again that line has been changed from the patch I sent to you. The last 'true'
argument does not belong here.


Is that series really building?


kvmppc_uvmem_slot_free(kvm, old);
break;
default:
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 36dda1d..1fa5f2a 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -377,8 +377,7 @@ static int kvmppc_svm_migrate_page(struct vm_area_struct 
*vma,
return ret;
  }
  
-static int uv_migrate_mem_slot(struct kvm *kvm,

-   const struct kvm_memory_slot *memslot)
+int uv_migrate_mem_slot(struct kvm *kvm, const struct kvm_memory_slot *memslot)
  {
unsigned long gfn = memslot->base_gfn;
unsigned long end;





Re: [RFC PATCH 1/2] libnvdimm: Add prctl control for disabling synchronous fault support.

2020-06-02 Thread Aneesh Kumar K.V
"Aneesh Kumar K.V"  writes:

> On 6/1/20 5:37 PM, Michal Suchánek wrote:
>> On Mon, Jun 01, 2020 at 05:31:50PM +0530, Aneesh Kumar K.V wrote:
>>> On 6/1/20 3:39 PM, Jan Kara wrote:
 On Fri 29-05-20 16:25:35, Aneesh Kumar K.V wrote:
> On 5/29/20 3:22 PM, Jan Kara wrote:
>> On Fri 29-05-20 15:07:31, Aneesh Kumar K.V wrote:
>>> Thanks Michal. I also missed Jeff in this email thread.
>>
>> And I think you'll also need some of the sched maintainers for the prctl
>> bits...
>>
>>> On 5/29/20 3:03 PM, Michal Suchánek wrote:
 Adding Jan

 On Fri, May 29, 2020 at 11:11:39AM +0530, Aneesh Kumar K.V wrote:
> With POWER10, architecture is adding new pmem flush and sync 
> instructions.
> The kernel should prevent the usage of MAP_SYNC if applications are 
> not using
> the new instructions on newer hardware.
>
> This patch adds a prctl option MAP_SYNC_ENABLE that can be used to 
> enable
> the usage of MAP_SYNC. The kernel config option is added to allow the 
> user
> to control whether MAP_SYNC should be enabled by default or not.
>
> Signed-off-by: Aneesh Kumar K.V 
>> ...
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 8c700f881d92..d5a9a363e81e 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -963,6 +963,12 @@ __cacheline_aligned_in_smp 
> DEFINE_SPINLOCK(mmlist_lock);
>  static unsigned long default_dump_filter = 
> MMF_DUMP_FILTER_DEFAULT;
> +#ifdef CONFIG_ARCH_MAP_SYNC_DISABLE
> +unsigned long default_map_sync_mask = MMF_DISABLE_MAP_SYNC_MASK;
> +#else
> +unsigned long default_map_sync_mask = 0;
> +#endif
> +
>>
>> I'm not sure CONFIG is really the right approach here. For a distro that 
>> would
>> basically mean to disable MAP_SYNC for all PPC kernels unless application
>> explicitly uses the right prctl. Shouldn't we rather initialize
>> default_map_sync_mask on boot based on whether the CPU we run on requires
>> new flush instructions or not? Otherwise the patch looks sensible.
>>
>
> yes that is correct. We ideally want to deny MAP_SYNC only w.r.t POWER10.
> But on a virtualized platform there is no easy way to detect that. We 
> could
> ideally hook this into the nvdimm driver where we look at the new compat
> string ibm,persistent-memory-v2 and then disable MAP_SYNC
> if we find a device with the specific value.

 Hum, couldn't we set some flag for nvdimm devices with
 "ibm,persistent-memory-v2" property and then check it during mmap(2) time
 and when the device has this propery and the mmap(2) caller doesn't have
 the prctl set, we'd disallow MAP_SYNC? That should make things mostly
 seamless, shouldn't it? Only apps that want to use MAP_SYNC on these
 devices would need to use prctl(MMF_DISABLE_MAP_SYNC, 0) but then these
 applications need to be aware of new instructions so this isn't that much
 additional burden...
>>>
>>> I am not sure application would want to add that much details/knowledge
>>> about a platform in their code. I was expecting application to do
>>>
>>> #ifdef __ppc64__
>>>  prctl(MAP_SYNC_ENABLE, 1, 0, 0, 0));
>>> #endif
>>>  a = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE,
>>>  MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
>>>
>>>
>>> For that code all the complexity that we add w.r.t ibm,persistent-memory-v2
>>> is not useful. Do you see a value in making all these device specific rather
>>> than a conditional on  __ppc64__?
>
>> If the vpmem devices continue to work with the old instruction on
>> POWER10 then it makes sense to make this per-device.
>
> vPMEM doesn't have write_cache and hence it is synchronous even without 
> using any specific flush instruction. The question is do we want to have
> different programming steps when running on vPMEM vs a persistent PMEM 
> device on ppc64.
>
> I will work on the device specific ENABLE flag and then we can compare 
> the kernel complexity against the added benefit.

I have posted an RFC v2 [1] that implements a device-specific MAP_SYNC
enable/disable feature. The Posted changes also add a dax flag suggested
by Dan. With device-specific MAP_SYNC enable/disable, it was just a sysfs
file export of the same flag. 

1. 
https://lore.kernel.org/linuxppc-dev/20200602074909.36738-1-aneesh.ku...@linux.ibm.com/

-aneesh


[RFC PATCH v2 2/5] powerpc/pmem: Disable synchronous fault by default

2020-06-02 Thread Aneesh Kumar K.V
This adds a kernel config option that controls whether MAP_SYNC is
enabled by default. With POWER10, the architecture adds new pmem flush
and sync instructions. The kernel should prevent the usage of MAP_SYNC
if applications are not using the new instructions on newer hardware.

This config allows the user to control whether MAP_SYNC should be
enabled by default or not.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/platforms/Kconfig.cputype | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index 27a81c291be8..f8694838ad4e 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -383,6 +383,15 @@ config PPC_KUEP
 
  If you're unsure, say Y.
 
+config ARCH_MAP_SYNC_DISABLE
+   bool "Disable synchronous fault support (MAP_SYNC)"
+   default y
+   help
+ Disable support for synchronous fault with nvdimm namespaces.
+
+ If you're unsure, say Y.
+
+
 config PPC_HAVE_KUAP
bool
 
-- 
2.26.2



[RFC PATCH v2 5/5] libnvdimm: Add prctl control for disabling synchronous fault support

2020-06-02 Thread Aneesh Kumar K.V
With POWER10, the architecture adds new pmem flush and sync
instructions. The kernel should prevent the usage of MAP_SYNC if
applications are not using the new instructions on newer hardware.

This patch adds a prctl option MAP_SYNC_ENABLE that can be used to enable
the usage of MAP_SYNC. This is in addition to the namespace-specific control
already added (/sys/bus/nd/devices/region0/pfn0.1/block/pmem0/dax/sync_fault).

With this patch, if the device supports synchronous fault, then an application
can enable the synchronous fault support using the prctl() interface even if
the platform disabled it for the namespace.

Signed-off-by: Aneesh Kumar K.V 
---
 include/linux/dax.h|  5 +++--
 include/linux/sched/coredump.h | 13 ++---
 include/uapi/linux/prctl.h |  3 +++
 kernel/fork.c  |  8 +++-
 kernel/sys.c   | 18 ++
 5 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/include/linux/dax.h b/include/linux/dax.h
index c4a3551557de..0733aae23828 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -80,9 +80,10 @@ static inline bool daxdev_mapping_supported(struct 
vm_area_struct *vma,
if (!IS_DAX(file_inode(vma->vm_file)))
return false;
/*
-* check MAP_SYNC is disabled by platform for this device.
+* MAP_SYNC is disabled by platform for this device.
+* check for prctl.
 */
-   if (!dax_synchronous_enabled(dax_dev))
+   if (!dax_synchronous_enabled(dax_dev) && !map_sync_enabled(vma->vm_mm))
return false;
 
return dax_synchronous(dax_dev);
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index ecdc6542070f..35698adc3d13 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -72,9 +72,16 @@ static inline int get_dumpable(struct mm_struct *mm)
 #define MMF_DISABLE_THP24  /* disable THP for all VMAs */
 #define MMF_OOM_VICTIM 25  /* mm is the oom victim */
 #define MMF_OOM_REAP_QUEUED26  /* mm was queued for oom_reaper */
-#define MMF_DISABLE_THP_MASK   (1 << MMF_DISABLE_THP)
+#define MMF_ENABLE_MAP_SYNC27  /* enable MAP_SYNC for VMAs of this mm */
+#define MMF_DISABLE_THP_MASK   (1 << MMF_DISABLE_THP)
+#define MMF_ENABLE_MAP_SYNC_MASK   (1 << MMF_ENABLE_MAP_SYNC)
 
-#define MMF_INIT_MASK  (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
-MMF_DISABLE_THP_MASK)
+#define MMF_INIT_MASK  (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK | \
+   MMF_DISABLE_THP_MASK | MMF_ENABLE_MAP_SYNC_MASK)
+
+static inline bool map_sync_enabled(struct mm_struct *mm)
+{
+   return !!(mm->flags & MMF_ENABLE_MAP_SYNC_MASK);
+}
 
 #endif /* _LINUX_SCHED_COREDUMP_H */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 07b4f8131e36..ee4cde32d5cf 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -238,4 +238,7 @@ struct prctl_mm_map {
 #define PR_SET_IO_FLUSHER  57
 #define PR_GET_IO_FLUSHER  58
 
+#define PR_SET_MAP_SYNC_ENABLE 59
+#define PR_GET_MAP_SYNC_ENABLE 60
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 8c700f881d92..d50cac15ef41 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -963,6 +963,12 @@ __cacheline_aligned_in_smp DEFINE_SPINLOCK(mmlist_lock);
 
 static unsigned long default_dump_filter = MMF_DUMP_FILTER_DEFAULT;
 
+#ifndef CONFIG_ARCH_MAP_SYNC_DISABLE
+unsigned long default_map_sync_mask = MMF_ENABLE_MAP_SYNC_MASK;
+#else
+unsigned long default_map_sync_mask = 0;
+#endif
+
 static int __init coredump_filter_setup(char *s)
 {
default_dump_filter =
@@ -1039,7 +1045,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
mm->flags = current->mm->flags & MMF_INIT_MASK;
mm->def_flags = current->mm->def_flags & VM_INIT_DEF_MASK;
} else {
-   mm->flags = default_dump_filter;
+   mm->flags = default_dump_filter | default_map_sync_mask;
mm->def_flags = 0;
}
 
diff --git a/kernel/sys.c b/kernel/sys.c
index d325f3ab624a..5011912831b0 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2450,6 +2450,24 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
		clear_bit(MMF_DISABLE_THP, &me->mm->flags);
		up_write(&me->mm->mmap_sem);
		break;
break;
+
+   case PR_GET_MAP_SYNC_ENABLE:
+   if (arg2 || arg3 || arg4 || arg5)
+   return -EINVAL;
+		error = !!test_bit(MMF_ENABLE_MAP_SYNC, &me->mm->flags);
+   break;
+   case PR_SET_MAP_SYNC_ENABLE:
+   if (arg3 || arg4 || arg5)
+   return -EINVAL;
+		if (down_write_killable(&me->mm->mmap_sem))
+   return -EINTR;
+

[RFC PATCH v2 4/5] powerpc/papr_scm: disable MAP_SYNC for newer hardware

2020-06-02 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/platforms/pseries/papr_scm.c | 17 -
 drivers/nvdimm/of_pmem.c  |  7 +++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index f35592423380..30474d4cd109 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -30,6 +30,7 @@ struct papr_scm_priv {
uint64_t block_size;
int metadata_size;
bool is_volatile;
+   bool disable_map_sync;
 
uint64_t bound_addr;
 
@@ -340,11 +341,18 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
	ndr_desc.mapping = &mapping;
	ndr_desc.num_mappings = 1;
	ndr_desc.nd_set = &p->nd_set;
+	set_bit(ND_REGION_SYNC_ENABLED, &ndr_desc.flags);
 
if (p->is_volatile)
		p->region = nvdimm_volatile_region_create(p->bus, &ndr_desc);
	else {
		set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags);
+		/*
+		 * for a persistent region, check if the platform needs to
+		 * force MAP_SYNC disable.
+		 */
+		if (p->disable_map_sync)
+			clear_bit(ND_REGION_SYNC_ENABLED, &ndr_desc.flags);
		p->region = nvdimm_pmem_region_create(p->bus, &ndr_desc);
}
if (!p->region) {
@@ -365,7 +373,7 @@ err:nvdimm_bus_unregister(p->bus);
 
 static int papr_scm_probe(struct platform_device *pdev)
 {
-   struct device_node *dn = pdev->dev.of_node;
+   struct device_node *dn;
u32 drc_index, metadata_size;
u64 blocks, block_size;
struct papr_scm_priv *p;
@@ -373,6 +381,10 @@ static int papr_scm_probe(struct platform_device *pdev)
u64 uuid[2];
int rc;
 
+	dn = dev_of_node(&pdev->dev);
+   if (!dn)
+   return -ENXIO;
+
/* check we have all the required DT properties */
if (of_property_read_u32(dn, "ibm,my-drc-index", _index)) {
		dev_err(&pdev->dev, "%pOF: missing drc-index!\n", dn);
@@ -402,6 +414,9 @@ static int papr_scm_probe(struct platform_device *pdev)
/* optional DT properties */
of_property_read_u32(dn, "ibm,metadata-size", _size);
 
+   if (of_device_is_compatible(dn, "ibm,pmemory-v2"))
+   p->disable_map_sync = true;
+
p->dn = dn;
p->drc_index = drc_index;
p->block_size = block_size;
diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
index 6826a274a1f1..a6cc3488e552 100644
--- a/drivers/nvdimm/of_pmem.c
+++ b/drivers/nvdimm/of_pmem.c
@@ -59,12 +59,19 @@ static int of_pmem_region_probe(struct platform_device *pdev)
		ndr_desc.res = &pdev->resource[i];
		ndr_desc.of_node = np;
		set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
+		set_bit(ND_REGION_SYNC_ENABLED, &ndr_desc.flags);
 
if (is_volatile)
			region = nvdimm_volatile_region_create(bus, &ndr_desc);
		else {
			set_bit(ND_REGION_PERSIST_MEMCTRL, &ndr_desc.flags);
+			/*
+			 * For a persistent region, check for a newer device.
+			 */
+			if (of_device_is_compatible(np, "pmem-region-v2"))
+				clear_bit(ND_REGION_SYNC_ENABLED, &ndr_desc.flags);
			region = nvdimm_pmem_region_create(bus, &ndr_desc);
+
}
 
if (!region)
-- 
2.26.2



[RFC PATCH v2 3/5] libnvdimm/dax: Make DAXDEV_SYNC_ENABLED flag region-specific

2020-06-02 Thread Aneesh Kumar K.V
This patch makes the sync-fault enable/disable feature more fine-grained
by allowing region-wise control of it.

In a follow-up patch, only ppc64 devices with the compat string
"ibm,pmemory-v2" will disable the sync-fault feature.

Signed-off-by: Aneesh Kumar K.V 
---
 drivers/dax/bus.c|  2 +-
 drivers/dax/super.c  | 16 +---
 drivers/nvdimm/pmem.c|  4 
 drivers/nvdimm/region_devs.c | 16 
 include/linux/dax.h  | 16 
 include/linux/libnvdimm.h|  4 
 6 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index df238c8b6ef2..8a825ecff49b 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -420,7 +420,7 @@ struct dev_dax *__devm_create_dev_dax(struct dax_region *dax_region, int id,
 * No 'host' or dax_operations since there is no access to this
 * device outside of mmap of the resulting character device.
 */
-	dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
+	dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC | DAXDEV_F_SYNC_ENABLED);
if (IS_ERR(dax_dev)) {
rc = PTR_ERR(dax_dev);
goto err;
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 980f7be7e56d..f93e6649d452 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -260,10 +260,11 @@ static ssize_t write_cache_store(struct device *dev,
 }
 static DEVICE_ATTR_RW(write_cache);
 
-static bool dax_synchronous_enabled(struct dax_device *dax_dev)
+bool __dax_synchronous_enabled(struct dax_device *dax_dev)
 {
	return test_bit(DAXDEV_SYNC_ENABLED, &dax_dev->flags);
 }
+EXPORT_SYMBOL_GPL(__dax_synchronous_enabled);
 
 static void set_dax_synchronous_enable(struct dax_device *dax_dev, bool enable)
 {
@@ -280,6 +281,7 @@ static void set_dax_synchronous_enable(struct dax_device *dax_dev, bool enable)
 static ssize_t sync_fault_show(struct device *dev,
struct device_attribute *attr, char *buf)
 {
+   int enabled;
struct dax_device *dax_dev = dax_get_by_host(dev_name(dev));
ssize_t rc;
 
@@ -287,7 +289,8 @@ static ssize_t sync_fault_show(struct device *dev,
if (!dax_dev)
return -ENXIO;
 
-   rc = sprintf(buf, "%d\n", !!__dax_synchronous(dax_dev));
+	enabled = (dax_synchronous(dax_dev) && dax_synchronous_enabled(dax_dev));
+   rc = sprintf(buf, "%d\n", enabled);
put_dax(dax_dev);
return rc;
 }
@@ -461,17 +464,13 @@ EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
 
 bool __dax_synchronous(struct dax_device *dax_dev)
 {
-	return test_bit(DAXDEV_SYNC, &dax_dev->flags) &&
-		test_bit(DAXDEV_SYNC_ENABLED, &dax_dev->flags);
+	return test_bit(DAXDEV_SYNC, &dax_dev->flags);
 }
 EXPORT_SYMBOL_GPL(__dax_synchronous);
 
 void __set_dax_synchronous(struct dax_device *dax_dev)
 {
	set_bit(DAXDEV_SYNC, &dax_dev->flags);
-#ifndef CONFIG_ARCH_MAP_SYNC_DISABLE
-	set_bit(DAXDEV_SYNC_ENABLED, &dax_dev->flags);
-#endif
 }
 EXPORT_SYMBOL_GPL(__set_dax_synchronous);
 
@@ -665,6 +664,9 @@ struct dax_device *alloc_dax(void *private, const char *__host,
if (flags & DAXDEV_F_SYNC)
set_dax_synchronous(dax_dev);
 
+   if (flags & DAXDEV_F_SYNC_ENABLED)
+   set_dax_synchronous_enable(dax_dev, true);
+
return dax_dev;
 
  err_dev:
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 2df6994acf83..dc9c269eb50d 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -485,6 +485,10 @@ static int pmem_attach_disk(struct device *dev,
 
if (is_nvdimm_sync(nd_region))
flags = DAXDEV_F_SYNC;
+
+   if (is_nvdimm_sync_enabled(nd_region))
+   flags |= DAXDEV_F_SYNC_ENABLED;
+
	dax_dev = alloc_dax(pmem, disk->disk_name, &pmem_dax_ops, flags);
if (IS_ERR(dax_dev)) {
put_disk(disk);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index ccbb5b43b8b2..2181560cf655 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1283,6 +1283,22 @@ bool is_nvdimm_sync(struct nd_region *nd_region)
 }
 EXPORT_SYMBOL_GPL(is_nvdimm_sync);
 
+bool is_nvdimm_sync_enabled(struct nd_region *nd_region)
+{
+#ifdef CONFIG_ARCH_MAP_SYNC_DISABLE
+	if (is_nd_volatile(&nd_region->dev))
+		return true;
+
+	return is_nd_pmem(&nd_region->dev) &&
+		test_bit(ND_REGION_SYNC_ENABLED, &nd_region->flags);
+#else
+   return true;
+#endif
+
+}
+EXPORT_SYMBOL_GPL(is_nvdimm_sync_enabled);
+
+
 struct conflict_context {
struct nd_region *nd_region;
resource_size_t start, size;
diff --git a/include/linux/dax.h b/include/linux/dax.h
index d7af5d243f24..c4a3551557de 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -10,6 +10,9 @@
 /* Flag for synchronous flush */
 #define DAXDEV_F_SYNC (1UL << 0)
 
+/* flag for platform forcing synchronous flush disable */

[RFC PATCH v2 1/5] libnvdimm/dax: Add a dax flag to control synchronous fault support

2020-06-02 Thread Aneesh Kumar K.V
With POWER10, the architecture adds new pmem flush and sync instructions.
The kernel should prevent the use of MAP_SYNC if applications are not using
the new instructions on newer hardware.

This patch adds a dax attribute
(/sys/bus/nd/devices/region0/pfn0.1/block/pmem0/dax/sync_fault)
which can be used to control this flag. If the device supports synchronous
flush, then userspace can update this attribute to enable/disable synchronous
faults. The attribute is only visible if the write cache is enabled on the
device.

Signed-off-by: Aneesh Kumar K.V 
---
 drivers/dax/super.c | 73 -
 mm/Kconfig  |  3 ++
 2 files changed, 75 insertions(+), 1 deletion(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 8e32345be0f7..980f7be7e56d 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -198,6 +198,12 @@ enum dax_device_flags {
DAXDEV_WRITE_CACHE,
/* flag to check if device supports synchronous flush */
DAXDEV_SYNC,
+   /*
+* flag to indicate whether synchronous flush is enabled.
+	 * Some platforms may want to disable synchronous flush support
+	 * even though the device supports it.
+*/
+   DAXDEV_SYNC_ENABLED,
 };
 
 /**
@@ -254,6 +260,60 @@ static ssize_t write_cache_store(struct device *dev,
 }
 static DEVICE_ATTR_RW(write_cache);
 
+static bool dax_synchronous_enabled(struct dax_device *dax_dev)
+{
+	return test_bit(DAXDEV_SYNC_ENABLED, &dax_dev->flags);
+}
+
+static void set_dax_synchronous_enable(struct dax_device *dax_dev, bool enable)
+{
+	if (!test_bit(DAXDEV_SYNC, &dax_dev->flags))
+		return;
+
+	if (enable)
+		set_bit(DAXDEV_SYNC_ENABLED, &dax_dev->flags);
+	else
+		clear_bit(DAXDEV_SYNC_ENABLED, &dax_dev->flags);
+}
+
+
+static ssize_t sync_fault_show(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   struct dax_device *dax_dev = dax_get_by_host(dev_name(dev));
+   ssize_t rc;
+
+   WARN_ON_ONCE(!dax_dev);
+   if (!dax_dev)
+   return -ENXIO;
+
+   rc = sprintf(buf, "%d\n", !!__dax_synchronous(dax_dev));
+   put_dax(dax_dev);
+   return rc;
+}
+
+static ssize_t sync_fault_store(struct device *dev,
+   struct device_attribute *attr, const char *buf, size_t len)
+{
+   bool enable_sync;
+	int rc = strtobool(buf, &enable_sync);
+   struct dax_device *dax_dev = dax_get_by_host(dev_name(dev));
+
+   WARN_ON_ONCE(!dax_dev);
+   if (!dax_dev)
+   return -ENXIO;
+
+   if (rc)
+   len = rc;
+   else
+   set_dax_synchronous_enable(dax_dev, enable_sync);
+
+   put_dax(dax_dev);
+   return len;
+}
+
+static DEVICE_ATTR_RW(sync_fault);
+
 static umode_t dax_visible(struct kobject *kobj, struct attribute *a, int n)
 {
struct device *dev = container_of(kobj, typeof(*dev), kobj);
@@ -267,11 +327,18 @@ static umode_t dax_visible(struct kobject *kobj, struct attribute *a, int n)
	if (a == &dev_attr_write_cache.attr)
		return 0;
 #endif
+	if (a == &dev_attr_sync_fault.attr) {
+   if (dax_write_cache_enabled(dax_dev))
+   return a->mode;
+   return 0;
+   }
+
return a->mode;
 }
 
 static struct attribute *dax_attributes[] = {
	&dev_attr_write_cache.attr,
+	&dev_attr_sync_fault.attr,
NULL,
 };
 
@@ -394,13 +461,17 @@ EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
 
 bool __dax_synchronous(struct dax_device *dax_dev)
 {
-	return test_bit(DAXDEV_SYNC, &dax_dev->flags);
+	return test_bit(DAXDEV_SYNC, &dax_dev->flags) &&
+		test_bit(DAXDEV_SYNC_ENABLED, &dax_dev->flags);
 }
 EXPORT_SYMBOL_GPL(__dax_synchronous);
 
 void __set_dax_synchronous(struct dax_device *dax_dev)
 {
	set_bit(DAXDEV_SYNC, &dax_dev->flags);
+#ifndef CONFIG_ARCH_MAP_SYNC_DISABLE
+	set_bit(DAXDEV_SYNC_ENABLED, &dax_dev->flags);
+#endif
 }
 EXPORT_SYMBOL_GPL(__set_dax_synchronous);
 
diff --git a/mm/Kconfig b/mm/Kconfig
index c1acc34c1c35..38fd7cfbfca8 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -867,4 +867,7 @@ config ARCH_HAS_HUGEPD
 config MAPPING_DIRTY_HELPERS
 bool
 
+config ARCH_MAP_SYNC_DISABLE
+   bool
+
 endmenu
-- 
2.26.2



Re: [PATCH 1/4] powerpc: Add a ppc_inst_as_str() helper

2020-06-02 Thread Joel Stanley
On Tue, 2 Jun 2020 at 05:31, Jordan Niethe  wrote:
>
> There are quite a few places where instructions are printed; this is
> done using a '%x' format specifier. With the introduction of prefixed
> instructions, this does not work well. Currently in these places,
> ppc_inst_val() is used for the value for %x, so only the first word of
> prefixed instructions is printed.
>
> When the instruction is a word instruction, only a single word should
> be printed. For prefixed instructions both the prefix and suffix should
> be printed. To accommodate both of these situations, instead of a '%x'
> specifier use '%s' and introduce a helper, __ppc_inst_as_str(), which
> returns a char *. The char * that __ppc_inst_as_str() returns is the
> buffer passed to it by the caller.
>
> It is cumbersome to require every caller of __ppc_inst_as_str() to
> declare a buffer. To make it more convenient to use __ppc_inst_as_str(),
> wrap it in a macro that uses a compound statement to allocate a buffer
> on the caller's stack before calling it.
>
> Signed-off-by: Jordan Niethe 

Reviewed-by: Joel Stanley 

> ---
>  arch/powerpc/include/asm/inst.h  | 19 +++
>  arch/powerpc/kernel/kprobes.c|  2 +-
>  arch/powerpc/kernel/trace/ftrace.c   | 26 +-
>  arch/powerpc/lib/test_emulate_step.c |  4 ++--
>  arch/powerpc/xmon/xmon.c |  2 +-
>  5 files changed, 36 insertions(+), 17 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h
> index 45f3ec868258..3df7806e6dc3 100644
> --- a/arch/powerpc/include/asm/inst.h
> +++ b/arch/powerpc/include/asm/inst.h
> @@ -122,6 +122,25 @@ static inline u64 ppc_inst_as_u64(struct ppc_inst x)
>  #endif
>  }
>
> +#define PPC_INST_STR_LEN sizeof("0x 0x")
> +
> +static inline char *__ppc_inst_as_str(char str[PPC_INST_STR_LEN], struct 
> ppc_inst x)
> +{
> +   if (ppc_inst_prefixed(x))
> +   sprintf(str, "0x%08x 0x%08x", ppc_inst_val(x), 
> ppc_inst_suffix(x));
> +   else
> +   sprintf(str, "0x%08x", ppc_inst_val(x));
> +
> +   return str;
> +}
> +
> +#define ppc_inst_as_str(x) \
> +({ \
> +   char __str[PPC_INST_STR_LEN];   \
> +   __ppc_inst_as_str(__str, x);\
> +   __str;  \
> +})
> +
>  int probe_user_read_inst(struct ppc_inst *inst,
>  struct ppc_inst __user *nip);
>
> diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> index 227510df8c55..d0797171dba3 100644
> --- a/arch/powerpc/kernel/kprobes.c
> +++ b/arch/powerpc/kernel/kprobes.c
> @@ -244,7 +244,7 @@ static int try_to_emulate(struct kprobe *p, struct 
> pt_regs *regs)
>  * So, we should never get here... but, its still
>  * good to catch them, just in case...
>  */
> -   printk("Can't step on instruction %x\n", ppc_inst_val(insn));
> +   printk("Can't step on instruction %s\n", 
> ppc_inst_as_str(insn));
> BUG();
> } else {
> /*
> diff --git a/arch/powerpc/kernel/trace/ftrace.c 
> b/arch/powerpc/kernel/trace/ftrace.c
> index 5e399628f51a..da11a26d8213 100644
> --- a/arch/powerpc/kernel/trace/ftrace.c
> +++ b/arch/powerpc/kernel/trace/ftrace.c
> @@ -73,8 +73,8 @@ ftrace_modify_code(unsigned long ip, struct ppc_inst old, 
> struct ppc_inst new)
>
> /* Make sure it is what we expect it to be */
> if (!ppc_inst_equal(replaced, old)) {
> -   pr_err("%p: replaced (%#x) != old (%#x)",
> -   (void *)ip, ppc_inst_val(replaced), ppc_inst_val(old));
> +   pr_err("%p: replaced (%s) != old (%s)",
> +   (void *)ip, ppc_inst_as_str(replaced), ppc_inst_as_str(old));
> return -EINVAL;
> }
>
> @@ -137,7 +137,7 @@ __ftrace_make_nop(struct module *mod,
>
>         /* Make sure that this is still a 24bit jump */
> if (!is_bl_op(op)) {
> -   pr_err("Not expected bl: opcode is %x\n", ppc_inst_val(op));
> +   pr_err("Not expected bl: opcode is %s\n", 
> ppc_inst_as_str(op));
> return -EINVAL;
> }
>
> @@ -172,8 +172,8 @@ __ftrace_make_nop(struct module *mod,
> /* We expect either a mflr r0, or a std r0, LRSAVE(r1) */
> if (!ppc_inst_equal(op, ppc_inst(PPC_INST_MFLR)) &&
> !ppc_inst_equal(op, ppc_inst(PPC_INST_STD_LR))) {
> -   pr_err("Unexpected instruction %08x around bl _mcount\n",
> -  ppc_inst_val(op));
> +   pr_err("Unexpected instruction %s around bl _mcount\n",
> +  ppc_inst_as_str(op));
> return -EINVAL;
> }
>  #else
> @@ -203,7 +203,7 @@ __ftrace_make_nop(struct module *mod,
> }
>
> if (!ppc_inst_equal(op,  ppc_inst(PPC_INST_LD_TOC))) {
> -   pr_err("Expected %08x found %08x\n", 

[PATCH v2] cxl: Remove dead Kconfig option

2020-06-02 Thread Andrew Donnellan
The CXL_AFU_DRIVER_OPS Kconfig option was added to coordinate merging of
new features. It no longer serves any purpose, so remove it.

Signed-off-by: Andrew Donnellan 

---
v1->v2:
- keep CXL_LIB for now to avoid breaking a driver that is currently out of
tree
---
 drivers/misc/cxl/Kconfig | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/misc/cxl/Kconfig b/drivers/misc/cxl/Kconfig
index 39eec9031487..984114f7cee9 100644
--- a/drivers/misc/cxl/Kconfig
+++ b/drivers/misc/cxl/Kconfig
@@ -7,9 +7,6 @@ config CXL_BASE
bool
select PPC_COPRO_BASE
 
-config CXL_AFU_DRIVER_OPS
-   bool
-
 config CXL_LIB
bool
 
@@ -17,7 +14,6 @@ config CXL
tristate "Support for IBM Coherent Accelerators (CXL)"
depends on PPC_POWERNV && PCI_MSI && EEH
select CXL_BASE
-   select CXL_AFU_DRIVER_OPS
select CXL_LIB
default m
help
-- 
2.20.1



Re: [PATCH] powerpc/kvm: Enable support for ISA v3.1 guests

2020-06-02 Thread Jordan Niethe
On Tue, Jun 2, 2020 at 3:55 PM Alistair Popple  wrote:
>
> Adds support for emulating ISAv3.1 guests by adding the appropriate PCR
> and FSCR bits.
>
> Signed-off-by: Alistair Popple 
> ---
>  arch/powerpc/include/asm/reg.h |  1 +
>  arch/powerpc/kvm/book3s_hv.c   | 11 ---
>  2 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index 773f76402392..d77040d0588a 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -1348,6 +1348,7 @@
>  #define PVR_ARCH_206p  0x0f13
>  #define PVR_ARCH_207   0x0f04
>  #define PVR_ARCH_300   0x0f05
> +#define PVR_ARCH_31	0x0f06
>
>  /* Macros for setting and retrieving special purpose registers */
>  #ifndef __ASSEMBLY__
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 93493f0cbfe8..359bb2ed43e1 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -345,7 +345,7 @@ static void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 
> pvr)
>  }
>
>  /* Dummy value used in computing PCR value below */
> -#define PCR_ARCH_300   (PCR_ARCH_207 << 1)
> +#define PCR_ARCH_31	(PCR_ARCH_300 << 1)
>
>  static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat)
>  {
> @@ -353,7 +353,9 @@ static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, 
> u32 arch_compat)
> struct kvmppc_vcore *vc = vcpu->arch.vcore;
>
> /* We can (emulate) our own architecture version and anything older */
> -   if (cpu_has_feature(CPU_FTR_ARCH_300))
> +   if (cpu_has_feature(CPU_FTR_ARCH_31))
> +   host_pcr_bit = PCR_ARCH_31;
> +   else if (cpu_has_feature(CPU_FTR_ARCH_300))
> host_pcr_bit = PCR_ARCH_300;
> else if (cpu_has_feature(CPU_FTR_ARCH_207S))
> host_pcr_bit = PCR_ARCH_207;
> @@ -379,6 +381,9 @@ static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, 
> u32 arch_compat)
> case PVR_ARCH_300:
> guest_pcr_bit = PCR_ARCH_300;
> break;
> +   case PVR_ARCH_31:
> +   guest_pcr_bit = PCR_ARCH_31;
> +   break;
> default:
> return -EINVAL;
> }
> @@ -2318,7 +2323,7 @@ static int kvmppc_core_vcpu_create_hv(struct kvm_vcpu 
> *vcpu)
>  * to trap and then we emulate them.
>  */
The comment above this:
"...
 * Set the default HFSCR for the guest from the host value.
 * This value is only used on POWER9..."
would need to be updated.
> vcpu->arch.hfscr = HFSCR_TAR | HFSCR_EBB | HFSCR_PM | HFSCR_BHRB |
> -   HFSCR_DSCR | HFSCR_VECVSX | HFSCR_FP;
> +   HFSCR_DSCR | HFSCR_VECVSX | HFSCR_FP | HFSCR_PREFIX;
> if (cpu_has_feature(CPU_FTR_HVMODE)) {
> vcpu->arch.hfscr &= mfspr(SPRN_HFSCR);
> if (cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST))
> --
> 2.20.1
>