Re: [BISECTED] kexec regression on PowerBook G4

2019-05-22 Thread Christophe Leroy

Hi,

On 22/05/2019 at 23:17, Aaro Koskinen wrote:

Hi,

On Wed, May 22, 2019 at 10:33:35PM +0200, LEROY Christophe wrote:

Can you provide full details of the Oops you get? And also your System.map?


System.map is below. The oops log is attached as a JPEG (crappy camera
shot, apologies, I hope it gets through), as the only way I can see it
is on the frame buffer display.


Also build with CONFIG_PPC_PTDUMP and mount /sys/kernel/debug and give
content of /sys/kernel/debug/powerpc/block_address_translation and
.../segment_registers before the failing kexec, and also
/sys/kernel/debug/kernel_page_tables


The kernel that fails is essentially headless without any input access. I
could probably do this if needed, but it's going to take a while...



Ok, the Oops confirms that the error is due to executing the kexec 
control code which is located outside the kernel text area.


The change I proposed yesterday doesn't work because, on book3s/32, NX 
protection is implemented by marking the segments no-execute and using 
IBATs for the kernel text.


Can you try the patch I sent out a few minutes ago? 
(https://patchwork.ozlabs.org/patch/1103827/)


Thanks
Christophe


[RFC PATCH] powerpc: fix kexec failure on book3s/32

2019-05-22 Thread Christophe Leroy
Fixes: 63b2bc619565 ("powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX")
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/machine_kexec_32.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/machine_kexec_32.c b/arch/powerpc/kernel/machine_kexec_32.c
index affe5dcce7f4..b6a4250b9ee0 100644
--- a/arch/powerpc/kernel/machine_kexec_32.c
+++ b/arch/powerpc/kernel/machine_kexec_32.c
@@ -54,6 +54,8 @@ void default_machine_kexec(struct kimage *image)
memcpy((void *)reboot_code_buffer, relocate_new_kernel,
relocate_new_kernel_size);
 
+   mtsrin(mfsrin(reboot_code_buffer) & ~SR_NX, reboot_code_buffer);
+
flush_icache_range(reboot_code_buffer,
reboot_code_buffer + KEXEC_CONTROL_PAGE_SIZE);
printk(KERN_INFO "Bye!\n");
-- 
2.13.3



[RFC PATCH 7/7] powerpc: Book3S 64-bit "heavyweight" KASAN support

2019-05-22 Thread Daniel Axtens
KASAN support on powerpc64 is interesting:

 - We want to be able to support inline instrumentation so as to be
   able to catch global and stack issues.

 - We run a lot of code at boot in real mode. This includes stuff like
   printk(), so it's not feasible to just disable instrumentation
   around it.

   [For those not immersed in ppc64, in real mode, the top nibble or
   byte (depending on radix/hash mmu) of the address is ignored. To
   make things work, we put the linear mapping at
   0xc000. This means that a pointer to part of the linear
   mapping will work both in real mode, where it will be interpreted
   as a physical address of the form 0x000..., and out of real mode,
   where it will go via the linear mapping.]

 - Inline instrumentation requires a fixed offset.

 - Because of our running things in real mode, the offset has to
   point to valid memory both in and out of real mode.

This makes finding somewhere to put the KASAN shadow region a bit fun.

One approach is just to give up on inline instrumentation; and this is
what the 64 bit book3e code does. This way we can delay all checks
until after we get everything set up to our satisfaction. However,
we'd really like to do better.

What we can do - if we know _at compile time_ how much physical memory
we have - is to set aside the top 1/8th of the memory and use that.
This is a big hammer (hence the "heavyweight" name) and comes with 2
big consequences:

 - kernels will simply fail to boot on machines with less memory than
   specified when compiling.

 - kernels running on machines with more memory than specified when
   compiling will simply ignore the extra memory.

If you can bear this consequence, you get pretty full support for
KASAN.
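
To make the trade-off concrete, here is a rough sketch of the arithmetic
involved (illustrative only; the names and base addresses below are
assumptions, not the values this patch uses):

/*
 * Generic KASAN maps each 8 bytes of memory to 1 shadow byte:
 *   shadow = (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET
 * so covering all of RAM needs RAM/8 bytes of shadow. With the amount
 * of RAM known at compile time, the top 1/8th can be reserved for the
 * shadow and the offset baked into the kernel image.
 */
#define KASAN_SHADOW_SCALE_SHIFT	3
#define PHYS_MEM_BYTES		(2048ULL << 20)	/* assumed 2 GiB, stand-in for the Kconfig value */
#define SHADOW_SIZE		(PHYS_MEM_BYTES >> KASAN_SHADOW_SCALE_SHIFT)
#define SHADOW_PHYS_BASE	(PHYS_MEM_BYTES - SHADOW_SIZE)	/* top 1/8th of RAM */
#define LINEAR_MAP_BASE		0xc000000000000000UL
/* chosen so that the shadow of LINEAR_MAP_BASE lands in the reserved region */
#define KASAN_SHADOW_OFFSET	(LINEAR_MAP_BASE + SHADOW_PHYS_BASE - \
				 (LINEAR_MAP_BASE >> KASAN_SHADOW_SCALE_SHIFT))

static inline void *kasan_mem_to_shadow(const void *addr)
{
	return (void *)(((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
			+ KASAN_SHADOW_OFFSET);
}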

This is still pretty WIP but I wanted to get it out there sooner
rather than later. Ongoing work:

 - Currently incompatible with KUAP (top priority to fix)

 - Currently incompatible with ftrace (no idea why yet)

 - Only supports radix at the moment

 - Very minimal testing (boots a Ubuntu VM, test_kasan runs)

 - Extend 'lightweight' outline support from book3e that will work
   without requiring memory to be known at compile time.

 - It assumes physical memory is contiguous. I don't really think
   we can get around this, so we should try to ensure it.

Despite the limitations, it can still find bugs,
e.g. http://patchwork.ozlabs.org/patch/1103775/

Massive thanks to mpe, who had the idea for the initial design.

Signed-off-by: Daniel Axtens 

---

Tested on qemu-pseries and qemu-powernv, seems to work on both
of those. Does not work on the talos that I tested on, no idea
why yet.

---
 arch/powerpc/Kconfig |  1 +
 arch/powerpc/Kconfig.debug   | 15 +
 arch/powerpc/Makefile|  7 ++
 arch/powerpc/include/asm/kasan.h | 45 +
 arch/powerpc/kernel/prom.c   | 40 
 arch/powerpc/mm/kasan/Makefile   |  1 +
 arch/powerpc/mm/kasan/kasan_init_book3s_64.c | 67 
 7 files changed, 176 insertions(+)
 create mode 100644 arch/powerpc/mm/kasan/kasan_init_book3s_64.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 4e266b019dd7..203cd07cf6e0 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -171,6 +171,7 @@ config PPC
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_KASAN  if PPC32
	select HAVE_ARCH_KASAN			if PPC_BOOK3E_64 && !SPARSEMEM_VMEMMAP
+	select HAVE_ARCH_KASAN			if PPC_BOOK3S_64 && !CONFIG_FTRACE && !PPC_KUAP
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if COMPAT
diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 23a37facc854..c0916408668c 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -394,6 +394,21 @@ config PPC_FAST_ENDIAN_SWITCH
 help
  If you're unsure what this is, say N.
 
+config PHYS_MEM_SIZE_FOR_KASAN
+   int "Physical memory size for KASAN (MB)"
+   depends on KASAN && PPC_BOOK3S_64
+   help
+ To get inline instrumentation support for KASAN on 64-bit Book3S
+ machines, you need to specify how much physical memory your system
+ has. A shadow offset will be calculated based on this figure, which
+ will be compiled in to the kernel. KASAN will use this offset to
+ access its shadow region, which is used to verify memory accesses.
+
+ If you attempt to boot on a system with less memory than you specify
+ here, your system will fail to boot very early in the process. If you
+ boot on a system with more memory than you specify, the extra memory
+ will be wasted - it will be reserved and not used.
+
 config KASAN_SHADOW_OFFSET
hex
depends on KASAN && PPC32
diff --git a/arch/powerpc/Makefile 

[RFC PATCH 6/7] kasan: allow arches to hook into global registration

2019-05-22 Thread Daniel Axtens
Not all arches have a specific space carved out for modules -
some, such as powerpc, just use regular vmalloc space. Therefore,
globals in these modules cannot be backed by real shadow memory.

In order to allow arches to perform this check, add a hook.
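
As an illustration of how an arch might use the hook (hypothetical code,
not part of this patch; the vmalloc test below is only an assumption about
what an arch would want to check):

#include <linux/mm.h>	/* is_vmalloc_addr() */

#define kasan_arch_can_register_global kasan_arch_can_register_global
static inline bool kasan_arch_can_register_global(const void *addr)
{
	/* Assume only addresses outside vmalloc space have real shadow backing. */
	return !is_vmalloc_addr(addr);
}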

Signed-off-by: Daniel Axtens 
---
 include/linux/kasan.h | 5 +
 mm/kasan/generic.c| 3 +++
 2 files changed, 8 insertions(+)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index dfee2b42d799..4752749e4797 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -18,6 +18,11 @@ struct task_struct;
 static inline bool kasan_arch_is_ready(void)   { return true; }
 #endif
 
+#ifndef kasan_arch_can_register_global
+static inline bool kasan_arch_can_register_global(const void *addr)	{ return true; }
+#endif
+
+
 #ifndef ARCH_HAS_KASAN_EARLY_SHADOW
 extern unsigned char kasan_early_shadow_page[PAGE_SIZE];
 extern pte_t kasan_early_shadow_pte[PTRS_PER_PTE];
diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index 0336f31bbae3..935b06f659a0 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -208,6 +208,9 @@ static void register_global(struct kasan_global *global)
 {
size_t aligned_size = round_up(global->size, KASAN_SHADOW_SCALE_SIZE);
 
+   if (!kasan_arch_can_register_global(global->beg))
+   return;
+
kasan_unpoison_shadow(global->beg, global->size);
 
kasan_poison_shadow(global->beg + aligned_size,
-- 
2.19.1



[RFC PATCH 5/7] kasan: allow arches to provide their own early shadow setup

2019-05-22 Thread Daniel Axtens
powerpc supports several different MMUs. In particular, book3s
machines support both a hash-table based MMU and a radix MMU.
These MMUs support different numbers of entries per directory
level: the PTRS_PER_* values reference variables. This leads to compiler
errors, as global variables must have constant sizes.

Allow architectures to manage their own early shadow variables
so we can work around this on powerpc.
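
A hypothetical arch-side header fragment showing the intent (the
ARCH_HAS_KASAN_EARLY_SHADOW name comes from this series, but the
MAX_PTRS_PER_* bounds below are made-up placeholders, not the real
powerpc values):

#define ARCH_HAS_KASAN_EARLY_SHADOW

/* compile-time upper bounds covering both hash and radix geometries (assumed) */
#define MAX_PTRS_PER_PTE	512
#define MAX_PTRS_PER_PMD	512
#define MAX_PTRS_PER_PUD	1024

extern unsigned char kasan_early_shadow_page[PAGE_SIZE];
extern pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE];
extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD];
extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD];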

Signed-off-by: Daniel Axtens 
---
 include/linux/kasan.h |  2 ++
 mm/kasan/init.c   | 10 ++
 2 files changed, 12 insertions(+)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index a630d53f1a36..dfee2b42d799 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -18,11 +18,13 @@ struct task_struct;
 static inline bool kasan_arch_is_ready(void)   { return true; }
 #endif
 
+#ifndef ARCH_HAS_KASAN_EARLY_SHADOW
 extern unsigned char kasan_early_shadow_page[PAGE_SIZE];
 extern pte_t kasan_early_shadow_pte[PTRS_PER_PTE];
 extern pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD];
 extern pud_t kasan_early_shadow_pud[PTRS_PER_PUD];
 extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];
+#endif
 
 int kasan_populate_early_shadow(const void *shadow_start,
const void *shadow_end);
diff --git a/mm/kasan/init.c b/mm/kasan/init.c
index ce45c491ebcd..2522382bf374 100644
--- a/mm/kasan/init.c
+++ b/mm/kasan/init.c
@@ -31,10 +31,14 @@
  *   - Latter it reused it as zero shadow to cover large ranges of memory
  * that allowed to access, but not handled by kasan (vmalloc/vmemmap ...).
  */
+#ifndef ARCH_HAS_KASAN_EARLY_SHADOW
 unsigned char kasan_early_shadow_page[PAGE_SIZE] __page_aligned_bss;
+#endif
 
 #if CONFIG_PGTABLE_LEVELS > 4
+#ifndef ARCH_HAS_KASAN_EARLY_SHADOW
 p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D] __page_aligned_bss;
+#endif
 static inline bool kasan_p4d_table(pgd_t pgd)
 {
return pgd_page(pgd) == virt_to_page(lm_alias(kasan_early_shadow_p4d));
@@ -46,7 +50,9 @@ static inline bool kasan_p4d_table(pgd_t pgd)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 3
+#ifndef ARCH_HAS_KASAN_EARLY_SHADOW
 pud_t kasan_early_shadow_pud[PTRS_PER_PUD] __page_aligned_bss;
+#endif
 static inline bool kasan_pud_table(p4d_t p4d)
 {
return p4d_page(p4d) == virt_to_page(lm_alias(kasan_early_shadow_pud));
@@ -58,7 +64,9 @@ static inline bool kasan_pud_table(p4d_t p4d)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 2
+#ifndef ARCH_HAS_KASAN_EARLY_SHADOW
 pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD] __page_aligned_bss;
+#endif
 static inline bool kasan_pmd_table(pud_t pud)
 {
return pud_page(pud) == virt_to_page(lm_alias(kasan_early_shadow_pmd));
@@ -69,7 +77,9 @@ static inline bool kasan_pmd_table(pud_t pud)
return false;
 }
 #endif
+#ifndef ARCH_HAS_KASAN_EARLY_SHADOW
 pte_t kasan_early_shadow_pte[PTRS_PER_PTE] __page_aligned_bss;
+#endif
 
 static inline bool kasan_pte_table(pmd_t pmd)
 {
-- 
2.19.1



[RFC PATCH 4/7] powerpc: KASAN for 64bit Book3E

2019-05-22 Thread Daniel Axtens
Wire up KASAN. Only outline instrumentation is supported.

The KASAN shadow area is mapped into vmemmap space:
0x8000 0400   to 0x8000 0600  .
To do this we require that vmemmap be disabled. (This is the default
in the kernel config that QorIQ provides for the machine in their
SDK anyway - they use flat memory.)

Only the kernel linear mapping (0xc000...) is checked. The vmalloc and
ioremap areas (also in 0x800...) are all mapped to the zero page. As
with the Book3S hash series, this requires overriding the memory <->
shadow mapping.

Also, as with both previous 64-bit series, early instrumentation is not
supported.  It would allow us to drop the check_return_arch_not_ready()
hook in the KASAN core, but it's tricky to get it set up early enough:
we need it set up before the first call to instrumented code like printk().
Perhaps in the future.

Only KASAN_MINIMAL works.

Tested on e6500. KVM, kexec and xmon have not been tested.

The test_kasan module fires warnings as expected, except for the
following tests:

 - Expected/by design:
kasan test: memcg_accounted_kmem_cache allocate memcg accounted object

 - Due to only supporting KASAN_MINIMAL:
kasan test: kasan_stack_oob out-of-bounds on stack
kasan test: kasan_global_oob out-of-bounds global variable
kasan test: kasan_alloca_oob_left out-of-bounds to left on alloca
kasan test: kasan_alloca_oob_right out-of-bounds to right on alloca
kasan test: use_after_scope_test use-after-scope on int
kasan test: use_after_scope_test use-after-scope on array

Thanks to those who have done the heavy lifting over the past several
years:
 - Christophe's 32 bit series: 
https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-February/185379.html
 - Aneesh's Book3S hash series: https://lwn.net/Articles/655642/
 - Balbir's Book3S radix series: https://patchwork.ozlabs.org/patch/795211/

Cc: Christophe Leroy 
Cc: Aneesh Kumar K.V 
Cc: Balbir Singh 
Signed-off-by: Daniel Axtens 
[- Removed EXPORT_SYMBOL of the static key
 - Fixed most checkpatch problems
 - Replaced kasan_zero_page[] by kasan_early_shadow_page[]
 - Reduced casting mess by using intermediate locals
 - Fixed build failure on pmac32_defconfig]
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig |  1 +
 arch/powerpc/Kconfig.debug   |  2 +-
 arch/powerpc/include/asm/kasan.h | 71 
 arch/powerpc/mm/kasan/Makefile   |  1 +
 arch/powerpc/mm/kasan/kasan_init_book3e_64.c | 50 ++
 arch/powerpc/mm/nohash/Makefile  |  5 ++
 6 files changed, 129 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/mm/kasan/kasan_init_book3e_64.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 6a66a2da5b1a..4e266b019dd7 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -170,6 +170,7 @@ config PPC
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_KASAN  if PPC32
+   select HAVE_ARCH_KASAN  if PPC_BOOK3E_64 && !SPARSEMEM_VMEMMAP
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if COMPAT
diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index c59920920ddc..23a37facc854 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -396,5 +396,5 @@ config PPC_FAST_ENDIAN_SWITCH
 
 config KASAN_SHADOW_OFFSET
hex
-   depends on KASAN
+   depends on KASAN && PPC32
default 0xe000
diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index 296e51c2f066..ae410f0e060d 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -21,12 +21,15 @@
 #define KASAN_SHADOW_START (KASAN_SHADOW_OFFSET + \
 (PAGE_OFFSET >> KASAN_SHADOW_SCALE_SHIFT))
 
+#ifdef CONFIG_PPC32
 #define KASAN_SHADOW_OFFSET	ASM_CONST(CONFIG_KASAN_SHADOW_OFFSET)
 
 #define KASAN_SHADOW_END   0UL
 
 #define KASAN_SHADOW_SIZE  (KASAN_SHADOW_END - KASAN_SHADOW_START)
 
+#endif /* CONFIG_PPC32 */
+
 #ifdef CONFIG_KASAN
 void kasan_early_init(void);
 void kasan_mmu_init(void);
@@ -36,5 +39,73 @@ static inline void kasan_init(void) { }
 static inline void kasan_mmu_init(void) { }
 #endif
 
+#ifdef CONFIG_PPC_BOOK3E_64
+#include 
+#include 
+
+/*
+ * We don't put this in Kconfig as we only support KASAN_MINIMAL, and
+ * that will be disabled if the symbol is available in Kconfig
+ */
+#define KASAN_SHADOW_OFFSET	ASM_CONST(0x68000400)
+
+#define KASAN_SHADOW_SIZE  (KERN_VIRT_SIZE >> KASAN_SHADOW_SCALE_SHIFT)
+
+extern struct static_key_false powerpc_kasan_enabled_key;
+extern unsigned char kasan_early_shadow_page[];
+
+static inline bool kasan_arch_is_ready_book3e(void)
+{
+   if (static_branch_likely(&powerpc_kasan_enabled_key))
+   return true;
+   return false;
+}
+#define kasan_arch_is_ready kasan_arch_is_ready_book3e

[RFC PATCH 3/7] kasan: allow architectures to provide an outline readiness check

2019-05-22 Thread Daniel Axtens
In powerpc (as I understand it), we spend a lot of time in boot
running in real mode before MMU paging is initialised. During
this time we call a lot of generic code, including printk(). If
we try to access the shadow region during this time, things fail.

My attempts to move early init before the first printk have not
been successful. (Both previous RFCs for ppc64 - by 2 different
people - have needed this trick too!)

So, allow architectures to define a kasan_arch_is_ready()
hook that bails out of check_memory_region_inline() unless the
arch has done all of the init.
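
An arch would then provide something like the following (hypothetical
sketch; the static key name is assumed, and patch 4/7 of this series
shows the actual book3e variant):

#include <linux/jump_label.h>

extern struct static_key_false arch_kasan_ready_key;	/* assumed name */

#define kasan_arch_is_ready kasan_arch_is_ready
static inline bool kasan_arch_is_ready(void)
{
	/* flipped to true once the shadow is fully set up */
	return static_branch_likely(&arch_kasan_ready_key);
}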

Link: https://lore.kernel.org/patchwork/patch/592820/ # ppc64 hash series
Link: https://patchwork.ozlabs.org/patch/795211/  # ppc radix series
Originally-by: Balbir Singh 
Cc: Aneesh Kumar K.V 
Signed-off-by: Daniel Axtens 
[check_return_arch_not_ready() ==> static inline kasan_arch_is_ready()]
Signed-off-by: Christophe Leroy 
---
 include/linux/kasan.h | 4 
 mm/kasan/generic.c| 3 +++
 2 files changed, 7 insertions(+)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index f6261840f94c..a630d53f1a36 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -14,6 +14,10 @@ struct task_struct;
 #include 
 #include 
 
+#ifndef kasan_arch_is_ready
+static inline bool kasan_arch_is_ready(void)   { return true; }
+#endif
+
 extern unsigned char kasan_early_shadow_page[PAGE_SIZE];
 extern pte_t kasan_early_shadow_pte[PTRS_PER_PTE];
 extern pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD];
diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index a5b28e3ceacb..0336f31bbae3 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -170,6 +170,9 @@ static __always_inline void check_memory_region_inline(unsigned long addr,
size_t size, bool write,
unsigned long ret_ip)
 {
+   if (!kasan_arch_is_ready())
+   return;
+
if (unlikely(size == 0))
return;
 
-- 
2.19.1



[RFC PATCH 2/7] kasan: allow architectures to manage the memory-to-shadow mapping

2019-05-22 Thread Daniel Axtens
Currently, shadow addresses are always (addr >> shift) + offset.
However, for powerpc, the virtual address space is fragmented in
ways that make this simple scheme impractical.

Allow architectures to override:
 - kasan_shadow_to_mem
 - kasan_mem_to_shadow
 - addr_has_shadow

Rename addr_has_shadow to kasan_addr_has_shadow: once it can be
overridden it is visible in more places, which increases the
risk of name collisions.

If architectures do not #define their own versions, the generic
code will continue to run as usual.
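
On the arch side, the override mechanism looks roughly like this (a
minimal sketch; the offset value is a made-up placeholder, and a real
arch would substitute its own, possibly piecewise, mapping):

/* in the arch's asm/kasan.h */
#define KASAN_SHADOW_OFFSET	0xe000000000000000UL	/* assumed value */

#define kasan_mem_to_shadow kasan_mem_to_shadow
static inline void *kasan_mem_to_shadow(const void *addr)
{
	return (void *)(((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
			+ KASAN_SHADOW_OFFSET);
}

#define kasan_shadow_to_mem kasan_shadow_to_mem
static inline const void *kasan_shadow_to_mem(const void *shadow_addr)
{
	return (const void *)(((unsigned long)shadow_addr - KASAN_SHADOW_OFFSET)
			      << KASAN_SHADOW_SCALE_SHIFT);
}

Because the generic definitions are wrapped in #ifndef, defining a macro
with the same name is enough for the generic code to pick up the arch
version.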

Reviewed-by: Dmitry Vyukov 
Signed-off-by: Daniel Axtens 
Signed-off-by: Christophe Leroy 
---
 include/linux/kasan.h | 2 ++
 mm/kasan/generic.c| 2 +-
 mm/kasan/generic_report.c | 2 +-
 mm/kasan/kasan.h  | 6 +-
 mm/kasan/report.c | 6 +++---
 mm/kasan/tags.c   | 2 +-
 6 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index b40ea104dd36..f6261840f94c 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -23,11 +23,13 @@ extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];
 int kasan_populate_early_shadow(const void *shadow_start,
const void *shadow_end);
 
+#ifndef kasan_mem_to_shadow
 static inline void *kasan_mem_to_shadow(const void *addr)
 {
return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
+ KASAN_SHADOW_OFFSET;
 }
+#endif
 
 /* Enable reporting bugs after kasan_disable_current() */
 extern void kasan_enable_current(void);
diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index 9e5c989dab8c..a5b28e3ceacb 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -173,7 +173,7 @@ static __always_inline void check_memory_region_inline(unsigned long addr,
if (unlikely(size == 0))
return;
 
-   if (unlikely(!addr_has_shadow((void *)addr))) {
+   if (unlikely(!kasan_addr_has_shadow((void *)addr))) {
kasan_report(addr, size, write, ret_ip);
return;
}
diff --git a/mm/kasan/generic_report.c b/mm/kasan/generic_report.c
index 36c645939bc9..6caafd61fc3a 100644
--- a/mm/kasan/generic_report.c
+++ b/mm/kasan/generic_report.c
@@ -107,7 +107,7 @@ static const char *get_wild_bug_type(struct kasan_access_info *info)
 
 const char *get_bug_type(struct kasan_access_info *info)
 {
-   if (addr_has_shadow(info->access_addr))
+   if (kasan_addr_has_shadow(info->access_addr))
return get_shadow_bug_type(info);
return get_wild_bug_type(info);
 }
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index 3ce956efa0cb..8fcbe4027929 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -110,16 +110,20 @@ struct kasan_alloc_meta *get_alloc_info(struct kmem_cache *cache,
 struct kasan_free_meta *get_free_info(struct kmem_cache *cache,
const void *object);
 
+#ifndef kasan_shadow_to_mem
 static inline const void *kasan_shadow_to_mem(const void *shadow_addr)
 {
return (void *)(((unsigned long)shadow_addr - KASAN_SHADOW_OFFSET)
<< KASAN_SHADOW_SCALE_SHIFT);
 }
+#endif
 
-static inline bool addr_has_shadow(const void *addr)
+#ifndef kasan_addr_has_shadow
+static inline bool kasan_addr_has_shadow(const void *addr)
 {
return (addr >= kasan_shadow_to_mem((void *)KASAN_SHADOW_START));
 }
+#endif
 
 void kasan_poison_shadow(const void *address, size_t size, u8 value);
 
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index 03a443579386..a713b64c232b 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -298,7 +298,7 @@ void __kasan_report(unsigned long addr, size_t size, bool is_write, unsigned lon
untagged_addr = reset_tag(tagged_addr);
 
info.access_addr = tagged_addr;
-   if (addr_has_shadow(untagged_addr))
+   if (kasan_addr_has_shadow(untagged_addr))
info.first_bad_addr = find_first_bad_addr(tagged_addr, size);
else
info.first_bad_addr = untagged_addr;
@@ -309,11 +309,11 @@ void __kasan_report(unsigned long addr, size_t size, bool is_write, unsigned lon
start_report();
 
print_error_description();
-   if (addr_has_shadow(untagged_addr))
+   if (kasan_addr_has_shadow(untagged_addr))
print_tags(get_tag(tagged_addr), info.first_bad_addr);
pr_err("\n");
 
-   if (addr_has_shadow(untagged_addr)) {
+   if (kasan_addr_has_shadow(untagged_addr)) {
print_address_description(untagged_addr);
pr_err("\n");
print_shadow_for_address(info.first_bad_addr);
diff --git a/mm/kasan/tags.c b/mm/kasan/tags.c
index 87ebee0a6aea..661c23dd5340 100644
--- a/mm/kasan/tags.c
+++ b/mm/kasan/tags.c
@@ -109,7 +109,7 @@ void check_memory_region(unsigned long addr, size_t size, bool write,
return;
 
untagged_addr = reset_tag((const void *)addr);
-   if (unlikely(!addr_has_shadow(untagged_addr))) {

[RFC PATCH 1/7] kasan: do not open-code addr_has_shadow

2019-05-22 Thread Daniel Axtens
We have a couple of places checking for the existence of a shadow
mapping for an address by open-coding the inverse of the check in
addr_has_shadow.

Replace the open-coded versions with the helper. This will be
needed in future to allow architectures to override the layout
of the shadow mapping.

Reviewed-by: Andrew Donnellan 
Reviewed-by: Dmitry Vyukov 
Signed-off-by: Daniel Axtens 
Signed-off-by: Christophe Leroy 
---
 mm/kasan/generic.c | 3 +--
 mm/kasan/tags.c| 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index 504c79363a34..9e5c989dab8c 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -173,8 +173,7 @@ static __always_inline void check_memory_region_inline(unsigned long addr,
if (unlikely(size == 0))
return;
 
-   if (unlikely((void *)addr <
-   kasan_shadow_to_mem((void *)KASAN_SHADOW_START))) {
+   if (unlikely(!addr_has_shadow((void *)addr))) {
kasan_report(addr, size, write, ret_ip);
return;
}
diff --git a/mm/kasan/tags.c b/mm/kasan/tags.c
index 63fca3172659..87ebee0a6aea 100644
--- a/mm/kasan/tags.c
+++ b/mm/kasan/tags.c
@@ -109,8 +109,7 @@ void check_memory_region(unsigned long addr, size_t size, bool write,
return;
 
untagged_addr = reset_tag((const void *)addr);
-   if (unlikely(untagged_addr <
-   kasan_shadow_to_mem((void *)KASAN_SHADOW_START))) {
+   if (unlikely(!addr_has_shadow(untagged_addr))) {
kasan_report(addr, size, write, ret_ip);
return;
}
-- 
2.19.1



[RFC PATCH 0/7] powerpc: KASAN for 64-bit 3s radix

2019-05-22 Thread Daniel Axtens
Building on the work of Christophe, Aneesh and Balbir, I've ported
KASAN to Book3S radix.

It builds on top Christophe's work on 32bit, and includes my work for
64-bit Book3E (3S doesn't really depend on 3E, but it was handy to
have around when developing and debugging).

This provides full inline instrumentation on radix, but does require
that you be able to specify the amount of memory on the system at
compile time. More details in patch 7.

Regards,
Daniel

Daniel Axtens (7):
  kasan: do not open-code addr_has_shadow
  kasan: allow architectures to manage the memory-to-shadow mapping
  kasan: allow architectures to provide an outline readiness check
  powerpc: KASAN for 64bit Book3E
  kasan: allow arches to provide their own early shadow setup
  kasan: allow arches to hook into global registration
  powerpc: Book3S 64-bit "heavyweight" KASAN support

 arch/powerpc/Kconfig |   2 +
 arch/powerpc/Kconfig.debug   |  17 ++-
 arch/powerpc/Makefile|   7 ++
 arch/powerpc/include/asm/kasan.h | 116 +++
 arch/powerpc/kernel/prom.c   |  40 +++
 arch/powerpc/mm/kasan/Makefile   |   2 +
 arch/powerpc/mm/kasan/kasan_init_book3e_64.c |  50 
 arch/powerpc/mm/kasan/kasan_init_book3s_64.c |  67 +++
 arch/powerpc/mm/nohash/Makefile  |   5 +
 include/linux/kasan.h|  13 +++
 mm/kasan/generic.c   |   9 +-
 mm/kasan/generic_report.c|   2 +-
 mm/kasan/init.c  |  10 ++
 mm/kasan/kasan.h |   6 +-
 mm/kasan/report.c|   6 +-
 mm/kasan/tags.c  |   3 +-
 16 files changed, 345 insertions(+), 10 deletions(-)
 create mode 100644 arch/powerpc/mm/kasan/kasan_init_book3e_64.c
 create mode 100644 arch/powerpc/mm/kasan/kasan_init_book3s_64.c

-- 
2.19.1



Re: [PATCH 11/12] powerpc/pseries/svm: Force SWIOTLB for secure guests

2019-05-22 Thread Thiago Jung Bauermann


Hello Christoph,

Thanks for reviewing the patch!

Christoph Hellwig  writes:

>> diff --git a/arch/powerpc/include/asm/mem_encrypt.h 
>> b/arch/powerpc/include/asm/mem_encrypt.h
>> new file mode 100644
>> index ..45d5e4d0e6e0
>> --- /dev/null
>> +++ b/arch/powerpc/include/asm/mem_encrypt.h
>> @@ -0,0 +1,19 @@
>> +/* SPDX-License-Identifier: GPL-2.0+ */
>> +/*
>> + * SVM helper functions
>> + *
>> + * Copyright 2019 IBM Corporation
>> + */
>> +
>> +#ifndef _ASM_POWERPC_MEM_ENCRYPT_H
>> +#define _ASM_POWERPC_MEM_ENCRYPT_H
>> +
>> +#define sme_me_mask 0ULL
>> +
>> +static inline bool sme_active(void) { return false; }
>> +static inline bool sev_active(void) { return false; }
>> +
>> +int set_memory_encrypted(unsigned long addr, int numpages);
>> +int set_memory_decrypted(unsigned long addr, int numpages);
>> +
>> +#endif /* _ASM_POWERPC_MEM_ENCRYPT_H */
>
> S/390 seems to be adding a stub header just like this.  Can you please
> clean up the Kconfig and generic headers bits for memory encryption so
> that we don't need all this boilerplate code?

Yes, that's a good idea. Will do.

>>  config PPC_SVM
>>  bool "Secure virtual machine (SVM) support for POWER"
>>  depends on PPC_PSERIES
>> +select SWIOTLB
>> +select ARCH_HAS_MEM_ENCRYPT
>>  default n
>
> n is the default default, no need to explictly specify it.

Indeed. Changed for the next version.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center



Re: [PATCH] powerpc: Fix loading of kernel + initramfs with kexec_file_load()

2019-05-22 Thread Thiago Jung Bauermann


Dave Young  writes:

> On 05/22/19 at 07:01pm, Thiago Jung Bauermann wrote:
>> Commit b6664ba42f14 ("s390, kexec_file: drop arch_kexec_mem_walk()")
>> changed kexec_add_buffer() to skip searching for a memory location if
>> kexec_buf.mem is already set, and use the address that is there.
>> 
>> In powerpc code we reuse a kexec_buf variable for loading both the kernel
>> and the initramfs by resetting some of the fields between those uses, but
>> not mem. This causes kexec_add_buffer() to try to load the kernel at the
>> same address where initramfs will be loaded, which is naturally rejected:
>> 
>>   # kexec -s -l --initrd initramfs vmlinuz
>>   kexec_file_load failed: Invalid argument
>> 
>> Setting the mem field before every call to kexec_add_buffer() fixes this
>> regression.
>> 
>> Fixes: b6664ba42f14 ("s390, kexec_file: drop arch_kexec_mem_walk()")
>> Signed-off-by: Thiago Jung Bauermann 
>> ---
>>  arch/powerpc/kernel/kexec_elf_64.c | 6 +-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> Reviewed-by: Dave Young 

Thanks!

-- 
Thiago Jung Bauermann
IBM Linux Technology Center



Re: [PATCH] powerpc: Fix loading of kernel + initramfs with kexec_file_load()

2019-05-22 Thread Dave Young
On 05/22/19 at 07:01pm, Thiago Jung Bauermann wrote:
> Commit b6664ba42f14 ("s390, kexec_file: drop arch_kexec_mem_walk()")
> changed kexec_add_buffer() to skip searching for a memory location if
> kexec_buf.mem is already set, and use the address that is there.
> 
> In powerpc code we reuse a kexec_buf variable for loading both the kernel
> and the initramfs by resetting some of the fields between those uses, but
> not mem. This causes kexec_add_buffer() to try to load the kernel at the
> same address where initramfs will be loaded, which is naturally rejected:
> 
>   # kexec -s -l --initrd initramfs vmlinuz
>   kexec_file_load failed: Invalid argument
> 
> Setting the mem field before every call to kexec_add_buffer() fixes this
> regression.
> 
> Fixes: b6664ba42f14 ("s390, kexec_file: drop arch_kexec_mem_walk()")
> Signed-off-by: Thiago Jung Bauermann 
> ---
>  arch/powerpc/kernel/kexec_elf_64.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/kexec_elf_64.c 
> b/arch/powerpc/kernel/kexec_elf_64.c
> index ba4f18a43ee8..52a29fc73730 100644
> --- a/arch/powerpc/kernel/kexec_elf_64.c
> +++ b/arch/powerpc/kernel/kexec_elf_64.c
> @@ -547,6 +547,7 @@ static int elf_exec_load(struct kimage *image, struct 
> elfhdr *ehdr,
>   kbuf.memsz = phdr->p_memsz;
>   kbuf.buf_align = phdr->p_align;
>   kbuf.buf_min = phdr->p_paddr + base;
> + kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
>   ret = kexec_add_buffer();
>   if (ret)
>   goto out;
> @@ -581,7 +582,8 @@ static void *elf64_load(struct kimage *image, char 
> *kernel_buf,
>   struct kexec_buf kbuf = { .image = image, .buf_min = 0,
> .buf_max = ppc64_rma_size };
>   struct kexec_buf pbuf = { .image = image, .buf_min = 0,
> -   .buf_max = ppc64_rma_size, .top_down = true };
> +   .buf_max = ppc64_rma_size, .top_down = true,
> +   .mem = KEXEC_BUF_MEM_UNKNOWN };
>  
>   ret = build_elf_exec_info(kernel_buf, kernel_len, , _info);
>   if (ret)
> @@ -606,6 +608,7 @@ static void *elf64_load(struct kimage *image, char 
> *kernel_buf,
>   kbuf.bufsz = kbuf.memsz = initrd_len;
>   kbuf.buf_align = PAGE_SIZE;
>   kbuf.top_down = false;
> + kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
>   ret = kexec_add_buffer();
>   if (ret)
>   goto out;
> @@ -638,6 +641,7 @@ static void *elf64_load(struct kimage *image, char 
> *kernel_buf,
>   kbuf.bufsz = kbuf.memsz = fdt_size;
>   kbuf.buf_align = PAGE_SIZE;
>   kbuf.top_down = true;
> + kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
>   ret = kexec_add_buffer();
>   if (ret)
>   goto out;
> 
> 
> ___
> kexec mailing list
> ke...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

Reviewed-by: Dave Young 

Thanks
Dave


[PATCH] powerpc/powernv: fix variable "c" set but not used

2019-05-22 Thread Qian Cai
The commit 58629c0dc349 ("powerpc/powernv/npu: Fault user page into the
hypervisor's pagetable") introduced a variable "c" to be used in
__get_user() and __get_user_nocheck() which need to stay as macros for
performance reasons, and "c" is not actually used in
pnv_npu2_handle_fault(),

arch/powerpc/platforms/powernv/npu-dma.c: In function 'pnv_npu2_handle_fault':
arch/powerpc/platforms/powernv/npu-dma.c:1122:7: warning: variable 'c'
set but not used [-Wunused-but-set-variable]

Fixed it by appending the __maybe_unused attribute, so compilers would
ignore it.

Signed-off-by: Qian Cai 
---
 arch/powerpc/platforms/powernv/npu-dma.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index 495550432f3d..5bbe59573ee6 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -1119,7 +1119,8 @@ int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
int i, is_write;
struct page *page[1];
const char __user *u;
-   char c;
+   /* To silence a -Wunused-but-set-variable warning. */
+   char c __maybe_unused;
 
/* mmap_sem should be held so the struct_mm must be present */
struct mm_struct *mm = context->mm;
-- 
2.20.1 (Apple Git-117)



[PATCH] powerpc: pseries/hvconsole: fix stack overread

2019-05-22 Thread Daniel Axtens
While developing kasan for 64-bit book3s, I hit the following stack
over-read.

It occurs because the hypercall to put characters onto the terminal
takes 2 longs (128 bits/16 bytes) of characters at a time, and so
hvc_put_chars would unconditionally copy 16 bytes from the argument
buffer, regardless of supplied length. However, sometimes
hvc_put_chars is called with less than 16 characters, leading to the
error.

Use memcpy to copy the correct length.

==
BUG: KASAN: stack-out-of-bounds in hvc_put_chars+0x44/0xc0
Read of size 8 at addr c169fac0 by task swapper/0

CPU: 0 PID: 0 Comm: swapper Not tainted 5.1.0-rc2-00065-g7e26a58cb076 #43
Call Trace:
[c169f770] [c0e83900] dump_stack+0xc4/0x114 (unreliable)
[c169f7c0] [c03f3034] print_address_description+0xd0/0x3cc
[c169f850] [c03f2c0c] kasan_report+0x20c/0x224
[c169f920] [c03f4808] __asan_load8+0x198/0x330
[c169f9c0] [c00d7264] hvc_put_chars+0x44/0xc0
[c169fa40] [c089b998] hvterm_raw_put_chars+0x78/0xb0
[c169fa80] [c089bff0] udbg_hvc_putc+0x110/0x1a0
[c169fb30] [c0036610] udbg_write+0xa0/0x1a0
[c169fb80] [c01b9cd4] console_unlock+0x694/0x810
[c169fc80] [c01bc5ec] vprintk_emit+0x24c/0x310
[c169fcf0] [c01bde04] vprintk_func+0xd4/0x250
[c169fd40] [c01bd088] printk+0x38/0x4c
[c169fd60] [c12ec4a0] kasan_init+0x330/0x350
[c169fde0] [c12dc304] setup_arch+0x4b4/0x504
[c169fe70] [c12d3e50] start_kernel+0x10c/0x868
[c169ff90] [c000b360] start_here_common+0x1c/0x53c

Memory state around the buggy address:
 c169f980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 c169fa00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>c169fa80: 00 00 00 00 f1 f1 f1 f1 01 f2 f2 f2 00 00 00 00
   ^
 c169fb00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 c169fb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==

CC: Dmitry Vyukov 
Signed-off-by: Daniel Axtens 
---
 arch/powerpc/platforms/pseries/hvconsole.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/hvconsole.c b/arch/powerpc/platforms/pseries/hvconsole.c
index 74da18de853a..c39907b635eb 100644
--- a/arch/powerpc/platforms/pseries/hvconsole.c
+++ b/arch/powerpc/platforms/pseries/hvconsole.c
@@ -67,9 +67,10 @@ EXPORT_SYMBOL(hvc_get_chars);
  */
 int hvc_put_chars(uint32_t vtermno, const char *buf, int count)
 {
-   unsigned long *lbuf = (unsigned long *) buf;
+   unsigned long lbuf[2];
long ret;
 
+   memcpy(lbuf, buf, count);
 
/* hcall will ret H_PARAMETER if 'count' exceeds firmware max.*/
if (count > MAX_VIO_PUT_CHARS)
-- 
2.19.1



[PATCH] powerpc: Fix loading of kernel + initramfs with kexec_file_load()

2019-05-22 Thread Thiago Jung Bauermann
Commit b6664ba42f14 ("s390, kexec_file: drop arch_kexec_mem_walk()")
changed kexec_add_buffer() to skip searching for a memory location if
kexec_buf.mem is already set, and use the address that is there.

In powerpc code we reuse a kexec_buf variable for loading both the kernel
and the initramfs by resetting some of the fields between those uses, but
not mem. This causes kexec_add_buffer() to try to load the kernel at the
same address where initramfs will be loaded, which is naturally rejected:

  # kexec -s -l --initrd initramfs vmlinuz
  kexec_file_load failed: Invalid argument

Setting the mem field before every call to kexec_add_buffer() fixes this
regression.

Fixes: b6664ba42f14 ("s390, kexec_file: drop arch_kexec_mem_walk()")
Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/kernel/kexec_elf_64.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/kexec_elf_64.c b/arch/powerpc/kernel/kexec_elf_64.c
index ba4f18a43ee8..52a29fc73730 100644
--- a/arch/powerpc/kernel/kexec_elf_64.c
+++ b/arch/powerpc/kernel/kexec_elf_64.c
@@ -547,6 +547,7 @@ static int elf_exec_load(struct kimage *image, struct elfhdr *ehdr,
kbuf.memsz = phdr->p_memsz;
kbuf.buf_align = phdr->p_align;
kbuf.buf_min = phdr->p_paddr + base;
+   kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
ret = kexec_add_buffer();
if (ret)
goto out;
@@ -581,7 +582,8 @@ static void *elf64_load(struct kimage *image, char *kernel_buf,
struct kexec_buf kbuf = { .image = image, .buf_min = 0,
  .buf_max = ppc64_rma_size };
struct kexec_buf pbuf = { .image = image, .buf_min = 0,
- .buf_max = ppc64_rma_size, .top_down = true };
+ .buf_max = ppc64_rma_size, .top_down = true,
+ .mem = KEXEC_BUF_MEM_UNKNOWN };
 
ret = build_elf_exec_info(kernel_buf, kernel_len, , _info);
if (ret)
@@ -606,6 +608,7 @@ static void *elf64_load(struct kimage *image, char *kernel_buf,
kbuf.bufsz = kbuf.memsz = initrd_len;
kbuf.buf_align = PAGE_SIZE;
kbuf.top_down = false;
+   kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
ret = kexec_add_buffer();
if (ret)
goto out;
@@ -638,6 +641,7 @@ static void *elf64_load(struct kimage *image, char *kernel_buf,
kbuf.bufsz = kbuf.memsz = fdt_size;
kbuf.buf_align = PAGE_SIZE;
kbuf.top_down = true;
+   kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
ret = kexec_add_buffer();
if (ret)
goto out;



Re: [PATCH] misc: remove redundant 'default n' from Kconfig-s

2019-05-22 Thread Arnd Bergmann
On Mon, May 20, 2019 at 4:10 PM Bartlomiej Zolnierkiewicz
 wrote:
>
> 'default n' is the default value for any bool or tristate Kconfig
> setting so there is no need to write it explicitly.
>
> Also since commit f467c5640c29 ("kconfig: only write '# CONFIG_FOO
> is not set' for visible symbols") the Kconfig behavior is the same
> regardless of 'default n' being present or not:
>
> ...
> One side effect of (and the main motivation for) this change is making
> the following two definitions behave exactly the same:
>
> config FOO
> bool
>
> config FOO
> bool
> default n
>
> With this change, neither of these will generate a
> '# CONFIG_FOO is not set' line (assuming FOO isn't selected/implied).
> That might make it clearer to people that a bare 'default n' is
> redundant.
> ...
>
> Signed-off-by: Bartlomiej Zolnierkiewicz 

Acked-by: Arnd Bergmann 


Re: [BISECTED] kexec regression on PowerBook G4

2019-05-22 Thread LEROY Christophe

Aaro Koskinen wrote:


Hi,

On Wed, May 22, 2019 at 07:44:56AM +, Christophe Leroy wrote:

On 05/22/2019 06:14 AM, Christophe Leroy wrote:
>On 22/05/2019 at 00:18, Aaro Koskinen wrote:
>>I was trying to upgrade from v5.0 -> v5.1 on PowerBook G4, but when
>>trying
>>to kexec a kernel the system gets stuck (no errors seen on the console).
>
>Do you mean you are trying to kexec a v5.1 kernel from a v5.0 kernel, or
>do you have a working v5.1 kernel, but kexec doesn't work with it ?
>
>>
>>Bisected to: 93c4a162b014 ("powerpc/6xx: Store PGDIR physical address
>>in a SPRG"). This commit doesn't revert cleanly anymore but I tested
>>that the one before works OK.
>
>Not sure that's the problem. There was a problem with that commit, but it
>was fixed by 4622a2d43101 ("powerpc/6xx: fix setup and use of
>SPRN_SPRG_PGDIR for hash32").
>You probably hit some commit between those two during bisect, that's
>likely the reason why you ended here.
>
>Can you restart your bisect from 4622a2d43101 ?
>
>If you have CONFIG_SMP, maybe you should also consider taking 397d2300b08c
>("powerpc/32s: fix flush_hash_pages() on SMP"). Stable 5.1.4 includes it.
>
>>
>>With current Linus HEAD (9c7db5004280), it gets a bit further but still
>>doesn't work: now I get an error on the console after kexec "Starting
>>new kernel! ... Bye!":
>>
>>kernel tried to execute exec-protected page (...) - exploit attempt?
>
>Interesting.
>
>Do you have CONFIG_STRICT_KERNEL_RWX=y in your .config ? If so, can you
>retry without it ?

After looking at the code, I don't think CONFIG_STRICT_KERNEL_RWX will make
any difference. Can you try the patch below?


Doesn't help (git refuses the patch as corrupted, so I had to do those
changes manually, but I'm pretty sure I got it right).

I still get the "kernel tried to execute exec-protected page...". What
should I try next?


Can you provide full details of the Oops you get? And also your System.map?
Also build with CONFIG_PPC_PTDUMP and mount /sys/kernel/debug and give  
content of /sys/kernel/debug/powerpc/block_address_translation and  
.../segment_registers before the failing kexec, and also  
/sys/kernel/debug/kernel_page_tables


Thx
Christophe



A.


From 8c1039da0d0f26cdf995156a905fc97fe7bda36c Mon Sep 17 00:00:00 2001
From: Christophe Leroy 
Date: Wed, 22 May 2019 07:28:42 +
Subject: [PATCH] Fix Kexec

---
 arch/powerpc/include/asm/pgtable.h | 2 ++
 arch/powerpc/kernel/machine_kexec_32.c | 4 
 arch/powerpc/mm/pgtable_32.c   | 2 +-
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 3f53be60fb01..642eea937229 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -140,6 +140,8 @@ static inline void pte_frag_set(mm_context_t *ctx, void *p)
 }
 #endif

+int change_page_attr(struct page *page, int numpages, pgprot_t prot);
+
 #endif /* __ASSEMBLY__ */

 #endif /* _ASM_POWERPC_PGTABLE_H */
diff --git a/arch/powerpc/kernel/machine_kexec_32.c b/arch/powerpc/kernel/machine_kexec_32.c
index affe5dcce7f4..4f719501e6ae 100644
--- a/arch/powerpc/kernel/machine_kexec_32.c
+++ b/arch/powerpc/kernel/machine_kexec_32.c
@@ -54,6 +54,10 @@ void default_machine_kexec(struct kimage *image)
memcpy((void *)reboot_code_buffer, relocate_new_kernel,
relocate_new_kernel_size);

+   change_page_attr(image->control_code_page,
+ALIGN(KEXEC_CONTROL_PAGE_SIZE, PAGE_SIZE) >> PAGE_SHIFT,
+PAGE_KERNEL_TEXT);
+
flush_icache_range(reboot_code_buffer,
reboot_code_buffer + KEXEC_CONTROL_PAGE_SIZE);
printk(KERN_INFO "Bye!\n");
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 16ada373b32b..0e4651d803fc 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -340,7 +340,7 @@ static int __change_page_attr_noflush(struct page *page, pgprot_t prot)
  *
  * THIS DOES NOTHING WITH BAT MAPPINGS, DEBUG USE ONLY
  */
-static int change_page_attr(struct page *page, int numpages, pgprot_t prot)
+int change_page_attr(struct page *page, int numpages, pgprot_t prot)
 {
int i, err = 0;
unsigned long flags;
--
2.13.3





[Bug 203125] Kernel 5.1-rc1 fails to boot on a PowerMac G4 3,6: Caused by (from SRR1=141020): Transfer error ack signal

2019-05-22 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=203125

Erhard F. (erhar...@mailbox.org) changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |CODE_FIX

--- Comment #11 from Erhard F. (erhar...@mailbox.org) ---
Your fix landed in 5.1.4 stable now, the G4 boots fine again. Thanks!

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

Re: [BISECTED] kexec regression on PowerBook G4

2019-05-22 Thread Aaro Koskinen
Hi,

On Wed, May 22, 2019 at 07:44:56AM +, Christophe Leroy wrote:
> On 05/22/2019 06:14 AM, Christophe Leroy wrote:
> >On 22/05/2019 at 00:18, Aaro Koskinen wrote:
> >>I was trying to upgrade from v5.0 -> v5.1 on PowerBook G4, but when
> >>trying
> >>to kexec a kernel the system gets stuck (no errors seen on the console).
> >
> >Do you mean you are trying to kexec a v5.1 kernel from a v5.0 kernel, or
> >do you have a working v5.1 kernel, but kexec doesn't work with it ?
> >
> >>
> >>Bisected to: 93c4a162b014 ("powerpc/6xx: Store PGDIR physical address
> >>in a SPRG"). This commit doesn't revert cleanly anymore but I tested
> >>that the one before works OK.
> >
> >Not sure that's the problem. There was a problem with that commit, but it
> >was fixed by 4622a2d43101 ("powerpc/6xx: fix setup and use of
> >SPRN_SPRG_PGDIR for hash32").
> >You probably hit some commit between those two during bisect, that's
> >likely the reason why you ended here.
> >
> >Can you restart your bisect from 4622a2d43101 ?
> >
> >If you have CONFIG_SMP, maybe you should also consider taking 397d2300b08c
> >("powerpc/32s: fix flush_hash_pages() on SMP"). Stable 5.1.4 includes it.
> >
> >>
> >>With current Linus HEAD (9c7db5004280), it gets a bit further but still
> >>doesn't work: now I get an error on the console after kexec "Starting
> >>new kernel! ... Bye!":
> >>
> >>kernel tried to execute exec-protected page (...) - exploit attempt?
> >
> >Interesting.
> >
> >Do you have CONFIG_STRICT_KERNEL_RWX=y in your .config ? If so, can you
> >retry without it ?
> 
> After looking at the code, I don't think CONFIG_STRICT_KERNEL_RWX will make
> any difference. Can you try the patch below?

Doesn't help (git refuses the patch as corrupted, so I had to do those
changes manually, but I'm pretty sure I got it right).

I still get the "kernel tried to execute exec-protected page...". What
should I try next?

A.

> From 8c1039da0d0f26cdf995156a905fc97fe7bda36c Mon Sep 17 00:00:00 2001
> From: Christophe Leroy 
> Date: Wed, 22 May 2019 07:28:42 +
> Subject: [PATCH] Fix Kexec
> 
> ---
>  arch/powerpc/include/asm/pgtable.h | 2 ++
>  arch/powerpc/kernel/machine_kexec_32.c | 4 
>  arch/powerpc/mm/pgtable_32.c   | 2 +-
>  3 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/pgtable.h
> b/arch/powerpc/include/asm/pgtable.h
> index 3f53be60fb01..642eea937229 100644
> --- a/arch/powerpc/include/asm/pgtable.h
> +++ b/arch/powerpc/include/asm/pgtable.h
> @@ -140,6 +140,8 @@ static inline void pte_frag_set(mm_context_t *ctx, void
> *p)
>  }
>  #endif
> 
> +int change_page_attr(struct page *page, int numpages, pgprot_t prot);
> +
>  #endif /* __ASSEMBLY__ */
> 
>  #endif /* _ASM_POWERPC_PGTABLE_H */
> diff --git a/arch/powerpc/kernel/machine_kexec_32.c
> b/arch/powerpc/kernel/machine_kexec_32.c
> index affe5dcce7f4..4f719501e6ae 100644
> --- a/arch/powerpc/kernel/machine_kexec_32.c
> +++ b/arch/powerpc/kernel/machine_kexec_32.c
> @@ -54,6 +54,10 @@ void default_machine_kexec(struct kimage *image)
>   memcpy((void *)reboot_code_buffer, relocate_new_kernel,
>   relocate_new_kernel_size);
> 
> + change_page_attr(image->control_code_page,
> +  ALIGN(KEXEC_CONTROL_PAGE_SIZE, PAGE_SIZE) >> 
> PAGE_SHIFT,
> +  PAGE_KERNEL_TEXT);
> +
>   flush_icache_range(reboot_code_buffer,
>   reboot_code_buffer + KEXEC_CONTROL_PAGE_SIZE);
>   printk(KERN_INFO "Bye!\n");
> diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
> index 16ada373b32b..0e4651d803fc 100644
> --- a/arch/powerpc/mm/pgtable_32.c
> +++ b/arch/powerpc/mm/pgtable_32.c
> @@ -340,7 +340,7 @@ static int __change_page_attr_noflush(struct page *page,
> pgprot_t prot)
>   *
>   * THIS DOES NOTHING WITH BAT MAPPINGS, DEBUG USE ONLY
>   */
> -static int change_page_attr(struct page *page, int numpages, pgprot_t prot)
> +int change_page_attr(struct page *page, int numpages, pgprot_t prot)
>  {
>   int i, err = 0;
>   unsigned long flags;
> -- 
> 2.13.3


Re: [BISECTED] kexec regression on PowerBook G4

2019-05-22 Thread Aaro Koskinen
Hi,

On Wed, May 22, 2019 at 08:14:23AM +0200, Christophe Leroy wrote:
> On 22/05/2019 at 00:18, Aaro Koskinen wrote:
> >I was trying to upgrade from v5.0 -> v5.1 on PowerBook G4, but when trying
> >to kexec a kernel the system gets stuck (no errors seen on the console).
> 
> Do you mean you are trying to kexec a v5.1 kernel from a v5.0 kernel, or do
> you have a working v5.1 kernel, but kexec doesn't work with it ?

To summarize, my system's boot goes like this:

Open Firmware -> kernel A (small due to OF limit) -> (kexec) -> kernel B (big)

First both A & B were at v5.0 ==> boot works.
Then I upgraded B to v5.1 ==> boot works.
Then I upgraded A to v5.1 ==> boot fails.

So the issue must be in A. So after bisecting I got the following result:

Kernel A with commit 93c4a162b014 ==> fails
Kernel A with commit 93c4a162b014^1 ==> works

> >Bisected to: 93c4a162b014 ("powerpc/6xx: Store PGDIR physical address
> >in a SPRG"). This commit doesn't revert cleanly anymore but I tested
> >that the one before works OK.
> 
> Not sure that's the problem. There was a problem with that commit, but it
> was fixed by 4622a2d43101 ("powerpc/6xx: fix setup and use of
> SPRN_SPRG_PGDIR for hash32").
> You probably hit some commit between those two during bisect, that's likely
> the reason why you ended here.
> 
> Can you restart your bisect from 4622a2d43101 ?

This is not a good commit to start with, as it already gives "kernel
tried to execute exec protected page..." after the "Bye!" message.

> If you have CONFIG_SMP, maybe you should also consider taking 397d2300b08c
> ("powerpc/32s: fix flush_hash_pages() on SMP"). Stable 5.1.4 includes it.

This is UP computer and CONFIG_SMP is not set.

> >With current Linus HEAD (9c7db5004280), it gets a bit further but still
> >doesn't work: now I get an error on the console after kexec "Starting
> >new kernel! ... Bye!":
> >
> > kernel tried to execute exec-protected page (...) - exploit attempt?
> 
> Interesting.
> 
> Do you have CONFIG_STRICT_KERNEL_RWX=y in your .config ? If so, can you
> retry without it ?

I don't set that option.

A.


[PATCH AUTOSEL 4.4 13/92] ASoC: fsl_sai: Update is_slave_mode with correct value

2019-05-22 Thread Sasha Levin
From: Daniel Baluta 

[ Upstream commit ddb351145a967ee791a0fb0156852ec2fcb746ba ]

is_slave_mode defaults to false because sai structure
that contains it is kzalloc'ed.

However, if we change the configuration from SAI slave to SAI master,
is_slave_mode will remain set to true, although with the SAI as master
it should be set to false.

Fix this by updating is_slave_mode for each call of
fsl_sai_set_dai_fmt.

Signed-off-by: Daniel Baluta 
Acked-by: Nicolin Chen 
Signed-off-by: Mark Brown 
Signed-off-by: Sasha Levin 
---
 sound/soc/fsl/fsl_sai.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c
index 08b460ba06efc..61d2d955f26a6 100644
--- a/sound/soc/fsl/fsl_sai.c
+++ b/sound/soc/fsl/fsl_sai.c
@@ -260,12 +260,14 @@ static int fsl_sai_set_dai_fmt_tr(struct snd_soc_dai *cpu_dai,
case SND_SOC_DAIFMT_CBS_CFS:
val_cr2 |= FSL_SAI_CR2_BCD_MSTR;
val_cr4 |= FSL_SAI_CR4_FSD_MSTR;
+   sai->is_slave_mode = false;
break;
case SND_SOC_DAIFMT_CBM_CFM:
sai->is_slave_mode = true;
break;
case SND_SOC_DAIFMT_CBS_CFM:
val_cr2 |= FSL_SAI_CR2_BCD_MSTR;
+   sai->is_slave_mode = false;
break;
case SND_SOC_DAIFMT_CBM_CFS:
val_cr4 |= FSL_SAI_CR4_FSD_MSTR;
-- 
2.20.1



[PATCH AUTOSEL 4.4 04/92] powerpc/boot: Fix missing check of lseek() return value

2019-05-22 Thread Sasha Levin
From: Bo YU 

[ Upstream commit 5d085ec04a000fefb5182d3b03ee46ca96d8389b ]

This is detected by Coverity scan: CID: 1440481

Signed-off-by: Bo YU 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/boot/addnote.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/boot/addnote.c b/arch/powerpc/boot/addnote.c
index 9d9f6f334d3cc..3da3e2b1b51bc 100644
--- a/arch/powerpc/boot/addnote.c
+++ b/arch/powerpc/boot/addnote.c
@@ -223,7 +223,11 @@ main(int ac, char **av)
PUT_16(E_PHNUM, np + 2);
 
/* write back */
-   lseek(fd, (long) 0, SEEK_SET);
+   i = lseek(fd, (long) 0, SEEK_SET);
+   if (i < 0) {
+   perror("lseek");
+   exit(1);
+   }
i = write(fd, buf, n);
if (i < 0) {
perror("write");
-- 
2.20.1



[PATCH AUTOSEL 4.9 022/114] ASoC: fsl_sai: Update is_slave_mode with correct value

2019-05-22 Thread Sasha Levin
From: Daniel Baluta 

[ Upstream commit ddb351145a967ee791a0fb0156852ec2fcb746ba ]

is_slave_mode defaults to false because sai structure
that contains it is kzalloc'ed.

However, if we change the configuration from SAI slave to SAI master,
is_slave_mode will remain set to true, although with the SAI as master
it should be set to false.

Fix this by updating is_slave_mode for each call of
fsl_sai_set_dai_fmt.

Signed-off-by: Daniel Baluta 
Acked-by: Nicolin Chen 
Signed-off-by: Mark Brown 
Signed-off-by: Sasha Levin 
---
 sound/soc/fsl/fsl_sai.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c
index 9fadf7e31c5f8..cb43f57f978b1 100644
--- a/sound/soc/fsl/fsl_sai.c
+++ b/sound/soc/fsl/fsl_sai.c
@@ -274,12 +274,14 @@ static int fsl_sai_set_dai_fmt_tr(struct snd_soc_dai *cpu_dai,
case SND_SOC_DAIFMT_CBS_CFS:
val_cr2 |= FSL_SAI_CR2_BCD_MSTR;
val_cr4 |= FSL_SAI_CR4_FSD_MSTR;
+   sai->is_slave_mode = false;
break;
case SND_SOC_DAIFMT_CBM_CFM:
sai->is_slave_mode = true;
break;
case SND_SOC_DAIFMT_CBS_CFM:
val_cr2 |= FSL_SAI_CR2_BCD_MSTR;
+   sai->is_slave_mode = false;
break;
case SND_SOC_DAIFMT_CBM_CFS:
val_cr4 |= FSL_SAI_CR4_FSD_MSTR;
-- 
2.20.1



[PATCH AUTOSEL 4.9 008/114] powerpc/boot: Fix missing check of lseek() return value

2019-05-22 Thread Sasha Levin
From: Bo YU 

[ Upstream commit 5d085ec04a000fefb5182d3b03ee46ca96d8389b ]

This is detected by Coverity scan: CID: 1440481

Signed-off-by: Bo YU 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/boot/addnote.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/boot/addnote.c b/arch/powerpc/boot/addnote.c
index 9d9f6f334d3cc..3da3e2b1b51bc 100644
--- a/arch/powerpc/boot/addnote.c
+++ b/arch/powerpc/boot/addnote.c
@@ -223,7 +223,11 @@ main(int ac, char **av)
PUT_16(E_PHNUM, np + 2);
 
/* write back */
-   lseek(fd, (long) 0, SEEK_SET);
+   i = lseek(fd, (long) 0, SEEK_SET);
+   if (i < 0) {
+   perror("lseek");
+   exit(1);
+   }
i = write(fd, buf, n);
if (i < 0) {
perror("write");
-- 
2.20.1



[PATCH AUTOSEL 4.14 037/167] ASoC: fsl_sai: Update is_slave_mode with correct value

2019-05-22 Thread Sasha Levin
From: Daniel Baluta 

[ Upstream commit ddb351145a967ee791a0fb0156852ec2fcb746ba ]

is_slave_mode defaults to false because sai structure
that contains it is kzalloc'ed.

However, if we change the configuration from SAI slave to SAI master,
is_slave_mode will remain set to true, although with the SAI as master
it should be set to false.

Fix this by updating is_slave_mode for each call of
fsl_sai_set_dai_fmt.

Signed-off-by: Daniel Baluta 
Acked-by: Nicolin Chen 
Signed-off-by: Mark Brown 
Signed-off-by: Sasha Levin 
---
 sound/soc/fsl/fsl_sai.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c
index 18e5ce81527d2..c1c733b573a7f 100644
--- a/sound/soc/fsl/fsl_sai.c
+++ b/sound/soc/fsl/fsl_sai.c
@@ -274,12 +274,14 @@ static int fsl_sai_set_dai_fmt_tr(struct snd_soc_dai *cpu_dai,
case SND_SOC_DAIFMT_CBS_CFS:
val_cr2 |= FSL_SAI_CR2_BCD_MSTR;
val_cr4 |= FSL_SAI_CR4_FSD_MSTR;
+   sai->is_slave_mode = false;
break;
case SND_SOC_DAIFMT_CBM_CFM:
sai->is_slave_mode = true;
break;
case SND_SOC_DAIFMT_CBS_CFM:
val_cr2 |= FSL_SAI_CR2_BCD_MSTR;
+   sai->is_slave_mode = false;
break;
case SND_SOC_DAIFMT_CBM_CFS:
val_cr4 |= FSL_SAI_CR4_FSD_MSTR;
-- 
2.20.1



[PATCH AUTOSEL 4.14 015/167] powerpc/boot: Fix missing check of lseek() return value

2019-05-22 Thread Sasha Levin
From: Bo YU 

[ Upstream commit 5d085ec04a000fefb5182d3b03ee46ca96d8389b ]

This is detected by Coverity scan: CID: 1440481

Signed-off-by: Bo YU 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/boot/addnote.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/boot/addnote.c b/arch/powerpc/boot/addnote.c
index 9d9f6f334d3cc..3da3e2b1b51bc 100644
--- a/arch/powerpc/boot/addnote.c
+++ b/arch/powerpc/boot/addnote.c
@@ -223,7 +223,11 @@ main(int ac, char **av)
PUT_16(E_PHNUM, np + 2);
 
/* write back */
-   lseek(fd, (long) 0, SEEK_SET);
+   i = lseek(fd, (long) 0, SEEK_SET);
+   if (i < 0) {
+   perror("lseek");
+   exit(1);
+   }
i = write(fd, buf, n);
if (i < 0) {
perror("write");
-- 
2.20.1



[PATCH AUTOSEL 4.14 014/167] powerpc/perf: Return accordingly on invalid chip-id in

2019-05-22 Thread Sasha Levin
From: Anju T Sudhakar 

[ Upstream commit a913e5e8b43be1d3897a141ce61c1ec071cad89c ]

Nest hardware counter memory resides in a per-chip reserved memory region.
During nest_imc_event_init(), the chip-id of the event's CPU is used to
calculate the base memory address for that CPU. Return a proper error
code if the calculated chip_id is invalid.

Reported-by: Dan Carpenter 
Fixes: 885dcd709ba91 ("powerpc/perf: Add nest IMC PMU support")
Reviewed-by: Madhavan Srinivasan 
Signed-off-by: Anju T Sudhakar 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/perf/imc-pmu.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index b73961b95c345..994e4392cac5c 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -481,6 +481,11 @@ static int nest_imc_event_init(struct perf_event *event)
 * Get the base memory addresss for this cpu.
 */
chip_id = cpu_to_chip_id(event->cpu);
+
+   /* Return, if chip_id is not valid */
+   if (chip_id < 0)
+   return -ENODEV;
+
pcni = pmu->mem_info;
do {
if (pcni->id == chip_id) {
-- 
2.20.1



[PATCH AUTOSEL 4.19 057/244] ASoC: fsl_sai: Update is_slave_mode with correct value

2019-05-22 Thread Sasha Levin
From: Daniel Baluta 

[ Upstream commit ddb351145a967ee791a0fb0156852ec2fcb746ba ]

is_slave_mode defaults to false because sai structure
that contains it is kzalloc'ed.

Anyhow, if we change the configuration from SAI slave to SAI master,
is_slave_mode will remain set to true although, with the SAI now being
master, it should be set to false.

Fix this by updating is_slave_mode for each call of
fsl_sai_set_dai_fmt.

Signed-off-by: Daniel Baluta 
Acked-by: Nicolin Chen 
Signed-off-by: Mark Brown 
Signed-off-by: Sasha Levin 
---
 sound/soc/fsl/fsl_sai.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c
index 4163f2cfc06fc..bfc5b21d0c3f9 100644
--- a/sound/soc/fsl/fsl_sai.c
+++ b/sound/soc/fsl/fsl_sai.c
@@ -268,12 +268,14 @@ static int fsl_sai_set_dai_fmt_tr(struct snd_soc_dai 
*cpu_dai,
case SND_SOC_DAIFMT_CBS_CFS:
val_cr2 |= FSL_SAI_CR2_BCD_MSTR;
val_cr4 |= FSL_SAI_CR4_FSD_MSTR;
+   sai->is_slave_mode = false;
break;
case SND_SOC_DAIFMT_CBM_CFM:
sai->is_slave_mode = true;
break;
case SND_SOC_DAIFMT_CBS_CFM:
val_cr2 |= FSL_SAI_CR2_BCD_MSTR;
+   sai->is_slave_mode = false;
break;
case SND_SOC_DAIFMT_CBM_CFS:
val_cr4 |= FSL_SAI_CR4_FSD_MSTR;
-- 
2.20.1



[PATCH AUTOSEL 4.19 033/244] powerpc/watchdog: Use hrtimers for per-CPU heartbeat

2019-05-22 Thread Sasha Levin
From: Nicholas Piggin 

[ Upstream commit 7ae3f6e130e8dc6188b59e3b4ebc2f16e9c8d053 ]

Using a jiffies timer creates a dependency on the tick_do_timer_cpu
incrementing jiffies. If that CPU has locked up and jiffies is not
incrementing, the watchdog heartbeat timer for all CPUs stops and
creates false positives and confusing warnings on local CPUs, and
also causes the SMP detector to stop, so the root cause is never
detected.

Fix this by using hrtimer based timers for the watchdog heartbeat,
like the generic kernel hardlockup detector.

Cc: Gautham R. Shenoy 
Reported-by: Ravikumar Bangoria 
Signed-off-by: Nicholas Piggin 
Tested-by: Ravi Bangoria 
Reported-by: Ravi Bangoria 
Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kernel/watchdog.c | 81 +-
 1 file changed, 40 insertions(+), 41 deletions(-)

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 3c6ab22a0c4e3..af3c15a1d41eb 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -77,7 +77,7 @@ static u64 wd_smp_panic_timeout_tb __read_mostly; /* panic 
other CPUs */
 
 static u64 wd_timer_period_ms __read_mostly;  /* interval between heartbeat */
 
-static DEFINE_PER_CPU(struct timer_list, wd_timer);
+static DEFINE_PER_CPU(struct hrtimer, wd_hrtimer);
 static DEFINE_PER_CPU(u64, wd_timer_tb);
 
 /* SMP checker bits */
@@ -293,21 +293,21 @@ void soft_nmi_interrupt(struct pt_regs *regs)
nmi_exit();
 }
 
-static void wd_timer_reset(unsigned int cpu, struct timer_list *t)
-{
-   t->expires = jiffies + msecs_to_jiffies(wd_timer_period_ms);
-   if (wd_timer_period_ms > 1000)
-   t->expires = __round_jiffies_up(t->expires, cpu);
-   add_timer_on(t, cpu);
-}
-
-static void wd_timer_fn(struct timer_list *t)
+static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 {
int cpu = smp_processor_id();
 
+   if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
+   return HRTIMER_NORESTART;
+
+   if (!cpumask_test_cpu(cpu, &watchdog_cpumask))
+   return HRTIMER_NORESTART;
+
watchdog_timer_interrupt(cpu);
 
-   wd_timer_reset(cpu, t);
+   hrtimer_forward_now(hrtimer, ms_to_ktime(wd_timer_period_ms));
+
+   return HRTIMER_RESTART;
 }
 
 void arch_touch_nmi_watchdog(void)
@@ -323,37 +323,22 @@ void arch_touch_nmi_watchdog(void)
 }
 EXPORT_SYMBOL(arch_touch_nmi_watchdog);
 
-static void start_watchdog_timer_on(unsigned int cpu)
-{
-   struct timer_list *t = per_cpu_ptr(&wd_timer, cpu);
-
-   per_cpu(wd_timer_tb, cpu) = get_tb();
-
-   timer_setup(t, wd_timer_fn, TIMER_PINNED);
-   wd_timer_reset(cpu, t);
-}
-
-static void stop_watchdog_timer_on(unsigned int cpu)
-{
-   struct timer_list *t = per_cpu_ptr(&wd_timer, cpu);
-
-   del_timer_sync(t);
-}
-
-static int start_wd_on_cpu(unsigned int cpu)
+static void start_watchdog(void *arg)
 {
+   struct hrtimer *hrtimer = this_cpu_ptr(&wd_hrtimer);
+   int cpu = smp_processor_id();
unsigned long flags;
 
if (cpumask_test_cpu(cpu, &wd_cpus_enabled)) {
WARN_ON(1);
-   return 0;
+   return;
}
 
if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
-   return 0;
+   return;
 
if (!cpumask_test_cpu(cpu, &watchdog_cpumask))
-   return 0;
+   return;
 
wd_smp_lock();
cpumask_set_cpu(cpu, &wd_cpus_enabled);
@@ -363,27 +348,40 @@ static int start_wd_on_cpu(unsigned int cpu)
}
wd_smp_unlock();
 
-   start_watchdog_timer_on(cpu);
+   *this_cpu_ptr(&wd_timer_tb) = get_tb();
 
-   return 0;
+   hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+   hrtimer->function = watchdog_timer_fn;
+   hrtimer_start(hrtimer, ms_to_ktime(wd_timer_period_ms),
+ HRTIMER_MODE_REL_PINNED);
 }
 
-static int stop_wd_on_cpu(unsigned int cpu)
+static int start_watchdog_on_cpu(unsigned int cpu)
 {
+   return smp_call_function_single(cpu, start_watchdog, NULL, true);
+}
+
+static void stop_watchdog(void *arg)
+{
+   struct hrtimer *hrtimer = this_cpu_ptr(&wd_hrtimer);
+   int cpu = smp_processor_id();
unsigned long flags;
 
if (!cpumask_test_cpu(cpu, &wd_cpus_enabled))
-   return 0; /* Can happen in CPU unplug case */
+   return; /* Can happen in CPU unplug case */
 
-   stop_watchdog_timer_on(cpu);
+   hrtimer_cancel(hrtimer);
 
wd_smp_lock();
cpumask_clear_cpu(cpu, &wd_cpus_enabled);
wd_smp_unlock();
 
wd_smp_clear_cpu_pending(cpu, get_tb());
+}
 
-   return 0;
+static int stop_watchdog_on_cpu(unsigned int cpu)
+{
+   return smp_call_function_single(cpu, stop_watchdog, NULL, true);
 }
 
 static void watchdog_calc_timeouts(void)
@@ -402,7 +400,7 @@ void watchdog_nmi_stop(void)
int cpu;
 
for_each_cpu(cpu, &wd_cpus_enabled)

[PATCH AUTOSEL 4.19 024/244] powerpc/perf: Fix loop exit condition in nest_imc_event_init

2019-05-22 Thread Sasha Levin
From: Anju T Sudhakar 

[ Upstream commit 860b7d2286236170a36f94946d03ca9888d32571 ]

The data structure (i.e. struct imc_mem_info) to hold the memory address
information for nest imc units is allocated based on the number of nodes
in the system.

nest_imc_event_init() traverses this struct array to calculate the memory
base address for the event-cpu. If we fail to find a match for the event
cpu's chip-id in the imc_mem_info struct array, then the do-while loop will
iterate until we crash.

Fix this by changing the loop exit condition to be based on the number of
non-zero vbase elements in the array, since the allocation is done for
nr_chips + 1.

Reported-by: Dan Carpenter 
Fixes: 885dcd709ba91 ("powerpc/perf: Add nest IMC PMU support")
Signed-off-by: Anju T Sudhakar 
Reviewed-by: Madhavan Srinivasan 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/perf/imc-pmu.c   | 2 +-
 arch/powerpc/platforms/powernv/opal-imc.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 3cebfdf362116..5553226770748 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -508,7 +508,7 @@ static int nest_imc_event_init(struct perf_event *event)
break;
}
pcni++;
-   } while (pcni);
+   } while (pcni->vbase != 0);
 
if (!flag)
return -ENODEV;
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c 
b/arch/powerpc/platforms/powernv/opal-imc.c
index 58a07948c76e7..3d27f02695e41 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -127,7 +127,7 @@ static int imc_get_mem_addr_nest(struct device_node *node,
nr_chips))
goto error;
 
-   pmu_ptr->mem_info = kcalloc(nr_chips, sizeof(*pmu_ptr->mem_info),
+   pmu_ptr->mem_info = kcalloc(nr_chips + 1, sizeof(*pmu_ptr->mem_info),
GFP_KERNEL);
if (!pmu_ptr->mem_info)
goto error;
-- 
2.20.1



[PATCH AUTOSEL 4.19 023/244] powerpc/boot: Fix missing check of lseek() return value

2019-05-22 Thread Sasha Levin
From: Bo YU 

[ Upstream commit 5d085ec04a000fefb5182d3b03ee46ca96d8389b ]

This is detected by Coverity scan: CID: 1440481

Signed-off-by: Bo YU 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/boot/addnote.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/boot/addnote.c b/arch/powerpc/boot/addnote.c
index 9d9f6f334d3cc..3da3e2b1b51bc 100644
--- a/arch/powerpc/boot/addnote.c
+++ b/arch/powerpc/boot/addnote.c
@@ -223,7 +223,11 @@ main(int ac, char **av)
PUT_16(E_PHNUM, np + 2);
 
/* write back */
-   lseek(fd, (long) 0, SEEK_SET);
+   i = lseek(fd, (long) 0, SEEK_SET);
+   if (i < 0) {
+   perror("lseek");
+   exit(1);
+   }
i = write(fd, buf, n);
if (i < 0) {
perror("write");
-- 
2.20.1



[PATCH AUTOSEL 4.19 022/244] powerpc/perf: Return accordingly on invalid chip-id in

2019-05-22 Thread Sasha Levin
From: Anju T Sudhakar 

[ Upstream commit a913e5e8b43be1d3897a141ce61c1ec071cad89c ]

Nest hardware counter memory resides in a per-chip reserve-memory.
During nest_imc_event_init(), the chip-id of the event-cpu is considered
to calculate the base memory address for that cpu. Return a proper error
condition if the calculated chip_id is invalid.

Reported-by: Dan Carpenter 
Fixes: 885dcd709ba91 ("powerpc/perf: Add nest IMC PMU support")
Reviewed-by: Madhavan Srinivasan 
Signed-off-by: Anju T Sudhakar 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/perf/imc-pmu.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 1fafc32b12a0f..3cebfdf362116 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -496,6 +496,11 @@ static int nest_imc_event_init(struct perf_event *event)
 * Get the base memory addresss for this cpu.
 */
chip_id = cpu_to_chip_id(event->cpu);
+
+   /* Return, if chip_id is not valid */
+   if (chip_id < 0)
+   return -ENODEV;
+
pcni = pmu->mem_info;
do {
if (pcni->id == chip_id) {
-- 
2.20.1



[PATCH AUTOSEL 5.0 072/317] ASoC: fsl_sai: Update is_slave_mode with correct value

2019-05-22 Thread Sasha Levin
From: Daniel Baluta 

[ Upstream commit ddb351145a967ee791a0fb0156852ec2fcb746ba ]

is_slave_mode defaults to false because sai structure
that contains it is kzalloc'ed.

Anyhow, if we change the configuration from SAI slave to SAI master,
is_slave_mode will remain set to true although, with the SAI now being
master, it should be set to false.

Fix this by updating is_slave_mode for each call of
fsl_sai_set_dai_fmt.

Signed-off-by: Daniel Baluta 
Acked-by: Nicolin Chen 
Signed-off-by: Mark Brown 
Signed-off-by: Sasha Levin 
---
 sound/soc/fsl/fsl_sai.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c
index 4163f2cfc06fc..bfc5b21d0c3f9 100644
--- a/sound/soc/fsl/fsl_sai.c
+++ b/sound/soc/fsl/fsl_sai.c
@@ -268,12 +268,14 @@ static int fsl_sai_set_dai_fmt_tr(struct snd_soc_dai 
*cpu_dai,
case SND_SOC_DAIFMT_CBS_CFS:
val_cr2 |= FSL_SAI_CR2_BCD_MSTR;
val_cr4 |= FSL_SAI_CR4_FSD_MSTR;
+   sai->is_slave_mode = false;
break;
case SND_SOC_DAIFMT_CBM_CFM:
sai->is_slave_mode = true;
break;
case SND_SOC_DAIFMT_CBS_CFM:
val_cr2 |= FSL_SAI_CR2_BCD_MSTR;
+   sai->is_slave_mode = false;
break;
case SND_SOC_DAIFMT_CBM_CFS:
val_cr4 |= FSL_SAI_CR4_FSD_MSTR;
-- 
2.20.1



[PATCH AUTOSEL 5.0 042/317] powerpc/watchdog: Use hrtimers for per-CPU heartbeat

2019-05-22 Thread Sasha Levin
From: Nicholas Piggin 

[ Upstream commit 7ae3f6e130e8dc6188b59e3b4ebc2f16e9c8d053 ]

Using a jiffies timer creates a dependency on the tick_do_timer_cpu
incrementing jiffies. If that CPU has locked up and jiffies is not
incrementing, the watchdog heartbeat timer for all CPUs stops and
creates false positives and confusing warnings on local CPUs, and
also causes the SMP detector to stop, so the root cause is never
detected.

Fix this by using hrtimer based timers for the watchdog heartbeat,
like the generic kernel hardlockup detector.

Cc: Gautham R. Shenoy 
Reported-by: Ravikumar Bangoria 
Signed-off-by: Nicholas Piggin 
Tested-by: Ravi Bangoria 
Reported-by: Ravi Bangoria 
Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kernel/watchdog.c | 81 +-
 1 file changed, 40 insertions(+), 41 deletions(-)

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 3c6ab22a0c4e3..af3c15a1d41eb 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -77,7 +77,7 @@ static u64 wd_smp_panic_timeout_tb __read_mostly; /* panic 
other CPUs */
 
 static u64 wd_timer_period_ms __read_mostly;  /* interval between heartbeat */
 
-static DEFINE_PER_CPU(struct timer_list, wd_timer);
+static DEFINE_PER_CPU(struct hrtimer, wd_hrtimer);
 static DEFINE_PER_CPU(u64, wd_timer_tb);
 
 /* SMP checker bits */
@@ -293,21 +293,21 @@ void soft_nmi_interrupt(struct pt_regs *regs)
nmi_exit();
 }
 
-static void wd_timer_reset(unsigned int cpu, struct timer_list *t)
-{
-   t->expires = jiffies + msecs_to_jiffies(wd_timer_period_ms);
-   if (wd_timer_period_ms > 1000)
-   t->expires = __round_jiffies_up(t->expires, cpu);
-   add_timer_on(t, cpu);
-}
-
-static void wd_timer_fn(struct timer_list *t)
+static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 {
int cpu = smp_processor_id();
 
+   if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
+   return HRTIMER_NORESTART;
+
+   if (!cpumask_test_cpu(cpu, &watchdog_cpumask))
+   return HRTIMER_NORESTART;
+
watchdog_timer_interrupt(cpu);
 
-   wd_timer_reset(cpu, t);
+   hrtimer_forward_now(hrtimer, ms_to_ktime(wd_timer_period_ms));
+
+   return HRTIMER_RESTART;
 }
 
 void arch_touch_nmi_watchdog(void)
@@ -323,37 +323,22 @@ void arch_touch_nmi_watchdog(void)
 }
 EXPORT_SYMBOL(arch_touch_nmi_watchdog);
 
-static void start_watchdog_timer_on(unsigned int cpu)
-{
-   struct timer_list *t = per_cpu_ptr(&wd_timer, cpu);
-
-   per_cpu(wd_timer_tb, cpu) = get_tb();
-
-   timer_setup(t, wd_timer_fn, TIMER_PINNED);
-   wd_timer_reset(cpu, t);
-}
-
-static void stop_watchdog_timer_on(unsigned int cpu)
-{
-   struct timer_list *t = per_cpu_ptr(&wd_timer, cpu);
-
-   del_timer_sync(t);
-}
-
-static int start_wd_on_cpu(unsigned int cpu)
+static void start_watchdog(void *arg)
 {
+   struct hrtimer *hrtimer = this_cpu_ptr(&wd_hrtimer);
+   int cpu = smp_processor_id();
unsigned long flags;
 
if (cpumask_test_cpu(cpu, &wd_cpus_enabled)) {
WARN_ON(1);
-   return 0;
+   return;
}
 
if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
-   return 0;
+   return;
 
if (!cpumask_test_cpu(cpu, &watchdog_cpumask))
-   return 0;
+   return;
 
wd_smp_lock();
cpumask_set_cpu(cpu, &wd_cpus_enabled);
@@ -363,27 +348,40 @@ static int start_wd_on_cpu(unsigned int cpu)
}
wd_smp_unlock();
 
-   start_watchdog_timer_on(cpu);
*this_cpu_ptr(&wd_timer_tb) = get_tb();
 
-   return 0;
+   hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+   hrtimer->function = watchdog_timer_fn;
+   hrtimer_start(hrtimer, ms_to_ktime(wd_timer_period_ms),
+ HRTIMER_MODE_REL_PINNED);
 }
 
-static int stop_wd_on_cpu(unsigned int cpu)
+static int start_watchdog_on_cpu(unsigned int cpu)
 {
+   return smp_call_function_single(cpu, start_watchdog, NULL, true);
+}
+
+static void stop_watchdog(void *arg)
+{
+   struct hrtimer *hrtimer = this_cpu_ptr(&wd_hrtimer);
+   int cpu = smp_processor_id();
unsigned long flags;
 
if (!cpumask_test_cpu(cpu, &wd_cpus_enabled))
-   return 0; /* Can happen in CPU unplug case */
+   return; /* Can happen in CPU unplug case */
 
-   stop_watchdog_timer_on(cpu);
+   hrtimer_cancel(hrtimer);
 
wd_smp_lock();
cpumask_clear_cpu(cpu, &wd_cpus_enabled);
wd_smp_unlock();
 
wd_smp_clear_cpu_pending(cpu, get_tb());
+}
 
-   return 0;
+static int stop_watchdog_on_cpu(unsigned int cpu)
+{
+   return smp_call_function_single(cpu, stop_watchdog, NULL, true);
 }
 
 static void watchdog_calc_timeouts(void)
@@ -402,7 +400,7 @@ void watchdog_nmi_stop(void)
int cpu;
 
for_each_cpu(cpu, &wd_cpus_enabled)

[PATCH AUTOSEL 5.0 031/317] powerpc/perf: Fix loop exit condition in nest_imc_event_init

2019-05-22 Thread Sasha Levin
From: Anju T Sudhakar 

[ Upstream commit 860b7d2286236170a36f94946d03ca9888d32571 ]

The data structure (i.e. struct imc_mem_info) to hold the memory address
information for nest imc units is allocated based on the number of nodes
in the system.

nest_imc_event_init() traverses this struct array to calculate the memory
base address for the event-cpu. If we fail to find a match for the event
cpu's chip-id in the imc_mem_info struct array, then the do-while loop will
iterate until we crash.

Fix this by changing the loop exit condition to be based on the number of
non-zero vbase elements in the array, since the allocation is done for
nr_chips + 1.

Reported-by: Dan Carpenter 
Fixes: 885dcd709ba91 ("powerpc/perf: Add nest IMC PMU support")
Signed-off-by: Anju T Sudhakar 
Reviewed-by: Madhavan Srinivasan 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/perf/imc-pmu.c   | 2 +-
 arch/powerpc/platforms/powernv/opal-imc.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 4f34c7557bdb7..d1009fe3130b1 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -508,7 +508,7 @@ static int nest_imc_event_init(struct perf_event *event)
break;
}
pcni++;
-   } while (pcni);
+   } while (pcni->vbase != 0);
 
if (!flag)
return -ENODEV;
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c 
b/arch/powerpc/platforms/powernv/opal-imc.c
index 58a07948c76e7..3d27f02695e41 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -127,7 +127,7 @@ static int imc_get_mem_addr_nest(struct device_node *node,
nr_chips))
goto error;
 
-   pmu_ptr->mem_info = kcalloc(nr_chips, sizeof(*pmu_ptr->mem_info),
+   pmu_ptr->mem_info = kcalloc(nr_chips + 1, sizeof(*pmu_ptr->mem_info),
GFP_KERNEL);
if (!pmu_ptr->mem_info)
goto error;
-- 
2.20.1



[PATCH AUTOSEL 5.0 030/317] powerpc/boot: Fix missing check of lseek() return value

2019-05-22 Thread Sasha Levin
From: Bo YU 

[ Upstream commit 5d085ec04a000fefb5182d3b03ee46ca96d8389b ]

This is detected by Coverity scan: CID: 1440481

Signed-off-by: Bo YU 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/boot/addnote.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/boot/addnote.c b/arch/powerpc/boot/addnote.c
index 9d9f6f334d3cc..3da3e2b1b51bc 100644
--- a/arch/powerpc/boot/addnote.c
+++ b/arch/powerpc/boot/addnote.c
@@ -223,7 +223,11 @@ main(int ac, char **av)
PUT_16(E_PHNUM, np + 2);
 
/* write back */
-   lseek(fd, (long) 0, SEEK_SET);
+   i = lseek(fd, (long) 0, SEEK_SET);
+   if (i < 0) {
+   perror("lseek");
+   exit(1);
+   }
i = write(fd, buf, n);
if (i < 0) {
perror("write");
-- 
2.20.1



[PATCH AUTOSEL 5.0 029/317] powerpc/perf: Return accordingly on invalid chip-id in

2019-05-22 Thread Sasha Levin
From: Anju T Sudhakar 

[ Upstream commit a913e5e8b43be1d3897a141ce61c1ec071cad89c ]

Nest hardware counter memory resides in a per-chip reserve-memory.
During nest_imc_event_init(), the chip-id of the event-cpu is considered
to calculate the base memory address for that cpu. Return a proper error
condition if the calculated chip_id is invalid.

Reported-by: Dan Carpenter 
Fixes: 885dcd709ba91 ("powerpc/perf: Add nest IMC PMU support")
Reviewed-by: Madhavan Srinivasan 
Signed-off-by: Anju T Sudhakar 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/perf/imc-pmu.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index f292a3f284f1c..4f34c7557bdb7 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -496,6 +496,11 @@ static int nest_imc_event_init(struct perf_event *event)
 * Get the base memory addresss for this cpu.
 */
chip_id = cpu_to_chip_id(event->cpu);
+
+   /* Return, if chip_id is not valid */
+   if (chip_id < 0)
+   return -ENODEV;
+
pcni = pmu->mem_info;
do {
if (pcni->id == chip_id) {
-- 
2.20.1



[PATCH AUTOSEL 5.1 049/375] powerpc/watchdog: Use hrtimers for per-CPU heartbeat

2019-05-22 Thread Sasha Levin
From: Nicholas Piggin 

[ Upstream commit 7ae3f6e130e8dc6188b59e3b4ebc2f16e9c8d053 ]

Using a jiffies timer creates a dependency on the tick_do_timer_cpu
incrementing jiffies. If that CPU has locked up and jiffies is not
incrementing, the watchdog heartbeat timer for all CPUs stops and
creates false positives and confusing warnings on local CPUs, and
also causes the SMP detector to stop, so the root cause is never
detected.

Fix this by using hrtimer based timers for the watchdog heartbeat,
like the generic kernel hardlockup detector.

Cc: Gautham R. Shenoy 
Reported-by: Ravikumar Bangoria 
Signed-off-by: Nicholas Piggin 
Tested-by: Ravi Bangoria 
Reported-by: Ravi Bangoria 
Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kernel/watchdog.c | 81 +-
 1 file changed, 40 insertions(+), 41 deletions(-)

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 3c6ab22a0c4e3..af3c15a1d41eb 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -77,7 +77,7 @@ static u64 wd_smp_panic_timeout_tb __read_mostly; /* panic 
other CPUs */
 
 static u64 wd_timer_period_ms __read_mostly;  /* interval between heartbeat */
 
-static DEFINE_PER_CPU(struct timer_list, wd_timer);
+static DEFINE_PER_CPU(struct hrtimer, wd_hrtimer);
 static DEFINE_PER_CPU(u64, wd_timer_tb);
 
 /* SMP checker bits */
@@ -293,21 +293,21 @@ void soft_nmi_interrupt(struct pt_regs *regs)
nmi_exit();
 }
 
-static void wd_timer_reset(unsigned int cpu, struct timer_list *t)
-{
-   t->expires = jiffies + msecs_to_jiffies(wd_timer_period_ms);
-   if (wd_timer_period_ms > 1000)
-   t->expires = __round_jiffies_up(t->expires, cpu);
-   add_timer_on(t, cpu);
-}
-
-static void wd_timer_fn(struct timer_list *t)
+static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 {
int cpu = smp_processor_id();
 
+   if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
+   return HRTIMER_NORESTART;
+
+   if (!cpumask_test_cpu(cpu, &watchdog_cpumask))
+   return HRTIMER_NORESTART;
+
watchdog_timer_interrupt(cpu);
 
-   wd_timer_reset(cpu, t);
+   hrtimer_forward_now(hrtimer, ms_to_ktime(wd_timer_period_ms));
+
+   return HRTIMER_RESTART;
 }
 
 void arch_touch_nmi_watchdog(void)
@@ -323,37 +323,22 @@ void arch_touch_nmi_watchdog(void)
 }
 EXPORT_SYMBOL(arch_touch_nmi_watchdog);
 
-static void start_watchdog_timer_on(unsigned int cpu)
-{
-   struct timer_list *t = per_cpu_ptr(&wd_timer, cpu);
-
-   per_cpu(wd_timer_tb, cpu) = get_tb();
-
-   timer_setup(t, wd_timer_fn, TIMER_PINNED);
-   wd_timer_reset(cpu, t);
-}
-
-static void stop_watchdog_timer_on(unsigned int cpu)
-{
-   struct timer_list *t = per_cpu_ptr(&wd_timer, cpu);
-
-   del_timer_sync(t);
-}
-
-static int start_wd_on_cpu(unsigned int cpu)
+static void start_watchdog(void *arg)
 {
+   struct hrtimer *hrtimer = this_cpu_ptr(&wd_hrtimer);
+   int cpu = smp_processor_id();
unsigned long flags;
 
if (cpumask_test_cpu(cpu, &wd_cpus_enabled)) {
WARN_ON(1);
-   return 0;
+   return;
}
 
if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
-   return 0;
+   return;
 
if (!cpumask_test_cpu(cpu, &watchdog_cpumask))
-   return 0;
+   return;
 
wd_smp_lock();
cpumask_set_cpu(cpu, &wd_cpus_enabled);
@@ -363,27 +348,40 @@ static int start_wd_on_cpu(unsigned int cpu)
}
wd_smp_unlock();
 
-   start_watchdog_timer_on(cpu);
*this_cpu_ptr(&wd_timer_tb) = get_tb();
 
-   return 0;
+   hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+   hrtimer->function = watchdog_timer_fn;
+   hrtimer_start(hrtimer, ms_to_ktime(wd_timer_period_ms),
+ HRTIMER_MODE_REL_PINNED);
 }
 
-static int stop_wd_on_cpu(unsigned int cpu)
+static int start_watchdog_on_cpu(unsigned int cpu)
 {
+   return smp_call_function_single(cpu, start_watchdog, NULL, true);
+}
+
+static void stop_watchdog(void *arg)
+{
+   struct hrtimer *hrtimer = this_cpu_ptr(&wd_hrtimer);
+   int cpu = smp_processor_id();
unsigned long flags;
 
if (!cpumask_test_cpu(cpu, &wd_cpus_enabled))
-   return 0; /* Can happen in CPU unplug case */
+   return; /* Can happen in CPU unplug case */
 
-   stop_watchdog_timer_on(cpu);
+   hrtimer_cancel(hrtimer);
 
wd_smp_lock();
cpumask_clear_cpu(cpu, &wd_cpus_enabled);
wd_smp_unlock();
 
wd_smp_clear_cpu_pending(cpu, get_tb());
+}
 
-   return 0;
+static int stop_watchdog_on_cpu(unsigned int cpu)
+{
+   return smp_call_function_single(cpu, stop_watchdog, NULL, true);
 }
 
 static void watchdog_calc_timeouts(void)
@@ -402,7 +400,7 @@ void watchdog_nmi_stop(void)
int cpu;
 
for_each_cpu(cpu, &wd_cpus_enabled)

[PATCH AUTOSEL 5.1 036/375] powerpc/perf: Fix loop exit condition in nest_imc_event_init

2019-05-22 Thread Sasha Levin
From: Anju T Sudhakar 

[ Upstream commit 860b7d2286236170a36f94946d03ca9888d32571 ]

The data structure (i.e. struct imc_mem_info) to hold the memory address
information for nest imc units is allocated based on the number of nodes
in the system.

nest_imc_event_init() traverses this struct array to calculate the memory
base address for the event-cpu. If we fail to find a match for the event
cpu's chip-id in the imc_mem_info struct array, then the do-while loop will
iterate until we crash.

Fix this by changing the loop exit condition to be based on the number of
non-zero vbase elements in the array, since the allocation is done for
nr_chips + 1.

Reported-by: Dan Carpenter 
Fixes: 885dcd709ba91 ("powerpc/perf: Add nest IMC PMU support")
Signed-off-by: Anju T Sudhakar 
Reviewed-by: Madhavan Srinivasan 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/perf/imc-pmu.c   | 2 +-
 arch/powerpc/platforms/powernv/opal-imc.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 6159e9edddfd0..2d12f0037e3a5 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -499,7 +499,7 @@ static int nest_imc_event_init(struct perf_event *event)
break;
}
pcni++;
-   } while (pcni);
+   } while (pcni->vbase != 0);
 
if (!flag)
return -ENODEV;
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c 
b/arch/powerpc/platforms/powernv/opal-imc.c
index 58a07948c76e7..3d27f02695e41 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -127,7 +127,7 @@ static int imc_get_mem_addr_nest(struct device_node *node,
nr_chips))
goto error;
 
-   pmu_ptr->mem_info = kcalloc(nr_chips, sizeof(*pmu_ptr->mem_info),
+   pmu_ptr->mem_info = kcalloc(nr_chips + 1, sizeof(*pmu_ptr->mem_info),
GFP_KERNEL);
if (!pmu_ptr->mem_info)
goto error;
-- 
2.20.1



[PATCH AUTOSEL 5.1 035/375] powerpc/boot: Fix missing check of lseek() return value

2019-05-22 Thread Sasha Levin
From: Bo YU 

[ Upstream commit 5d085ec04a000fefb5182d3b03ee46ca96d8389b ]

This is detected by Coverity scan: CID: 1440481

Signed-off-by: Bo YU 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/boot/addnote.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/boot/addnote.c b/arch/powerpc/boot/addnote.c
index 9d9f6f334d3cc..3da3e2b1b51bc 100644
--- a/arch/powerpc/boot/addnote.c
+++ b/arch/powerpc/boot/addnote.c
@@ -223,7 +223,11 @@ main(int ac, char **av)
PUT_16(E_PHNUM, np + 2);
 
/* write back */
-   lseek(fd, (long) 0, SEEK_SET);
+   i = lseek(fd, (long) 0, SEEK_SET);
+   if (i < 0) {
+   perror("lseek");
+   exit(1);
+   }
i = write(fd, buf, n);
if (i < 0) {
perror("write");
-- 
2.20.1



[PATCH AUTOSEL 5.1 034/375] powerpc/perf: Return accordingly on invalid chip-id in

2019-05-22 Thread Sasha Levin
From: Anju T Sudhakar 

[ Upstream commit a913e5e8b43be1d3897a141ce61c1ec071cad89c ]

Nest hardware counter memory resides in a per-chip reserve-memory.
During nest_imc_event_init(), the chip-id of the event-cpu is considered
to calculate the base memory address for that cpu. Return a proper error
condition if the calculated chip_id is invalid.

Reported-by: Dan Carpenter 
Fixes: 885dcd709ba91 ("powerpc/perf: Add nest IMC PMU support")
Reviewed-by: Madhavan Srinivasan 
Signed-off-by: Anju T Sudhakar 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/perf/imc-pmu.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index b1c37cc3fa98b..6159e9edddfd0 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -487,6 +487,11 @@ static int nest_imc_event_init(struct perf_event *event)
 * Get the base memory addresss for this cpu.
 */
chip_id = cpu_to_chip_id(event->cpu);
+
+   /* Return, if chip_id is not valid */
+   if (chip_id < 0)
+   return -ENODEV;
+
pcni = pmu->mem_info;
do {
if (pcni->id == chip_id) {
-- 
2.20.1



Re: [PATCH v1 1/2] open: add close_range()

2019-05-22 Thread Oleg Nesterov
On 05/22, Christian Brauner wrote:
>
> +static struct file *pick_file(struct files_struct *files, unsigned fd)
>  {
> - struct file *file;
> + struct file *file = NULL;
>   struct fdtable *fdt;
>  
>   spin_lock(&files->file_lock);
> @@ -632,15 +629,65 @@ int __close_fd(struct files_struct *files, unsigned fd)
>   goto out_unlock;
>   rcu_assign_pointer(fdt->fd[fd], NULL);
>   __put_unused_fd(files, fd);
> - spin_unlock(&files->file_lock);
> - return filp_close(file, files);
>  
>  out_unlock:
>   spin_unlock(&files->file_lock);
> - return -EBADF;
> + return file;

...

> +int __close_range(struct files_struct *files, unsigned fd, unsigned max_fd)
> +{
> + unsigned int cur_max;
> +
> + if (fd > max_fd)
> + return -EINVAL;
> +
> + rcu_read_lock();
> + cur_max = files_fdtable(files)->max_fds;
> + rcu_read_unlock();
> +
> + /* cap to last valid index into fdtable */
> + if (max_fd >= cur_max)
> + max_fd = cur_max - 1;
> +
> + while (fd <= max_fd) {
> + struct file *file;
> +
> + file = pick_file(files, fd++);

Well, how about something like

static unsigned int find_next_opened_fd(struct fdtable *fdt, unsigned 
start)
{
unsigned int maxfd = fdt->max_fds;
unsigned int maxbit = maxfd / BITS_PER_LONG;
unsigned int bitbit = start / BITS_PER_LONG;

bitbit = find_next_bit(fdt->full_fds_bits, maxbit, bitbit) * 
BITS_PER_LONG;
if (bitbit > maxfd)
return maxfd;
if (bitbit > start)
start = bitbit;
return find_next_bit(fdt->open_fds, maxfd, start);
}

unsigned close_next_fd(struct files_struct *files, unsigned start, 
unsigned maxfd)
{
unsigned fd;
struct file *file;
struct fdtable *fdt;

spin_lock(&files->file_lock);
fdt = files_fdtable(files);
fd = find_next_opened_fd(fdt, start);
if (fd >= fdt->max_fds || fd > maxfd) {
fd = -1;
goto out;
}

file = fdt->fd[fd];
rcu_assign_pointer(fdt->fd[fd], NULL);
__put_unused_fd(files, fd);
out:
spin_unlock(&files->file_lock);

if (fd == -1u)
return fd;

filp_close(file, files);
return fd + 1;
}

?

Then close_range() can do

while (fd < max_fd)
fd = close_next_fd(fd, maxfd);

Oleg.



[PATCH] powerpc/powernv: fix a W=1 compilation warning

2019-05-22 Thread Qian Cai
The commit b575c731fe58 ("powerpc/powernv/npu: Add set/unset window
helpers") called pnv_npu_set_window() in a void function
pnv_npu_dma_set_32(), but the return code from pnv_npu_set_window() has
no use there as all the error logging happen in pnv_npu_set_window(),
so just remove the unused variable to avoid a compilation warning,

arch/powerpc/platforms/powernv/npu-dma.c: In function
'pnv_npu_dma_set_32':
arch/powerpc/platforms/powernv/npu-dma.c:198:10: warning: variable ‘rc’
set but not used [-Wunused-but-set-variable]

Signed-off-by: Qian Cai 
---
 arch/powerpc/platforms/powernv/npu-dma.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
b/arch/powerpc/platforms/powernv/npu-dma.c
index 495550432f3d..035208ed591f 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -195,7 +195,6 @@ static void pnv_npu_dma_set_32(struct pnv_ioda_pe *npe)
 {
struct pci_dev *gpdev;
struct pnv_ioda_pe *gpe;
-   int64_t rc;
 
/*
 * Find the assoicated PCI devices and get the dma window
@@ -208,8 +207,8 @@ static void pnv_npu_dma_set_32(struct pnv_ioda_pe *npe)
if (!gpe)
return;
 
-   rc = pnv_npu_set_window(&npe->table_group, 0,
-   gpe->table_group.tables[0]);
+   pnv_npu_set_window(&npe->table_group, 0,
+  gpe->table_group.tables[0]);
 
/*
 * NVLink devices use the same TCE table configuration as
-- 
1.8.3.1



[PATCH v1 2/2] tests: add close_range() tests

2019-05-22 Thread Christian Brauner
This adds basic tests for the new close_range() syscall.
- test that no invalid flags can be passed
- test that a range of file descriptors is correctly closed
- test that a range of file descriptors is correctly closed if there
  are already closed file descriptors in the range
- test that max_fd is correctly capped to the current fdtable maximum

Signed-off-by: Christian Brauner 
Cc: Arnd Bergmann 
Cc: Jann Horn 
Cc: David Howells 
Cc: Dmitry V. Levin 
Cc: Oleg Nesterov 
Cc: Linus Torvalds 
Cc: Florian Weimer 
Cc: linux-...@vger.kernel.org
---
v1: unchanged
---
 tools/testing/selftests/Makefile  |   1 +
 tools/testing/selftests/core/.gitignore   |   1 +
 tools/testing/selftests/core/Makefile |   6 +
 .../testing/selftests/core/close_range_test.c | 128 ++
 4 files changed, 136 insertions(+)
 create mode 100644 tools/testing/selftests/core/.gitignore
 create mode 100644 tools/testing/selftests/core/Makefile
 create mode 100644 tools/testing/selftests/core/close_range_test.c

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 9781ca79794a..06e57fabbff9 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -4,6 +4,7 @@ TARGETS += bpf
 TARGETS += breakpoints
 TARGETS += capabilities
 TARGETS += cgroup
+TARGETS += core
 TARGETS += cpufreq
 TARGETS += cpu-hotplug
 TARGETS += drivers/dma-buf
diff --git a/tools/testing/selftests/core/.gitignore 
b/tools/testing/selftests/core/.gitignore
new file mode 100644
index ..6e6712ce5817
--- /dev/null
+++ b/tools/testing/selftests/core/.gitignore
@@ -0,0 +1 @@
+close_range_test
diff --git a/tools/testing/selftests/core/Makefile 
b/tools/testing/selftests/core/Makefile
new file mode 100644
index ..de3ae68aa345
--- /dev/null
+++ b/tools/testing/selftests/core/Makefile
@@ -0,0 +1,6 @@
+CFLAGS += -g -I../../../../usr/include/ -I../../../../include
+
+TEST_GEN_PROGS := close_range_test
+
+include ../lib.mk
+
diff --git a/tools/testing/selftests/core/close_range_test.c 
b/tools/testing/selftests/core/close_range_test.c
new file mode 100644
index ..ab10cd205ab9
--- /dev/null
+++ b/tools/testing/selftests/core/close_range_test.c
@@ -0,0 +1,128 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "../kselftest.h"
+
+static inline int sys_close_range(unsigned int fd, unsigned int max_fd,
+ unsigned int flags)
+{
+   return syscall(__NR_close_range, fd, max_fd, flags);
+}
+
+#ifndef ARRAY_SIZE
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+#endif
+
+int main(int argc, char **argv)
+{
+   const char *test_name = "close_range";
+   int i, ret;
+   int open_fds[100];
+   int fd_max, fd_mid, fd_min;
+
+   ksft_set_plan(7);
+
+   for (i = 0; i < ARRAY_SIZE(open_fds); i++) {
+   int fd;
+
+   fd = open("/dev/null", O_RDONLY | O_CLOEXEC);
+   if (fd < 0) {
+   if (errno == ENOENT)
+   ksft_exit_skip(
+   "%s test: skipping test since /dev/null 
does not exist\n",
+   test_name);
+
+   ksft_exit_fail_msg(
+   "%s test: %s - failed to open /dev/null\n",
+   strerror(errno), test_name);
+   }
+
+   open_fds[i] = fd;
+   }
+
+   fd_min = open_fds[0];
+   fd_max = open_fds[99];
+
+   ret = sys_close_range(fd_min, fd_max, 1);
+   if (!ret)
+   ksft_exit_fail_msg(
+   "%s test: managed to pass invalid flag value\n",
+   test_name);
+   ksft_test_result_pass("do not allow invalid flag values for 
close_range()\n");
+
+   fd_mid = open_fds[50];
+   ret = sys_close_range(fd_min, fd_mid, 0);
+   if (ret < 0)
+   ksft_exit_fail_msg(
+   "%s test: Failed to close range of file descriptors 
from 4 to 50\n",
+   test_name);
+   ksft_test_result_pass("close_range() from %d to %d\n", fd_min, fd_mid);
+
+   for (i = 0; i <= 50; i++) {
+   ret = fcntl(open_fds[i], F_GETFL);
+   if (ret >= 0)
+   ksft_exit_fail_msg(
+   "%s test: Failed to close range of file 
descriptors from 4 to 50\n",
+   test_name);
+   }
+   ksft_test_result_pass("fcntl() verify closed range from %d to %d\n", 
fd_min, fd_mid);
+
+   /* create a couple of gaps */
+   close(57);
+   close(78);
+   close(81);
+   close(82);
+   close(84);
+   close(90);
+
+   fd_mid = open_fds[51];
+   /* Choose slightly lower limit and leave some fds for a later test */
+

[PATCH v1 1/2] open: add close_range()

2019-05-22 Thread Christian Brauner
This adds the close_range() syscall. It allows to efficiently close a range
of file descriptors up to all file descriptors of a calling task.

The syscall came up in a recent discussion around the new mount API and
making new file descriptor types cloexec by default. During this
discussion, Al suggested the close_range() syscall (cf. [1]). Note, a
syscall in this manner has been requested by various people over time.

First, it helps to close all file descriptors of an exec()ing task. This
can be done safely via (quoting Al's example from [1] verbatim):

/* that exec is sensitive */
unshare(CLONE_FILES);
/* we don't want anything past stderr here */
close_range(3, ~0U);
execve();

The code snippet above is one way of working around the problem that file
descriptors are not cloexec by default. This is aggravated by the fact that
we can't just switch them over without massively regressing userspace. For
a whole class of programs having an in-kernel method of closing all file
descriptors is very helpful (e.g. daemons, service managers, programming
language standard libraries, container managers etc.).
(Please note, unshare(CLONE_FILES) should only be needed if the calling
 task is multi-threaded and shares the file descriptor table with another
 thread in which case two threads could race with one thread allocating
 file descriptors and the other one closing them via close_range(). For the
 general case close_range() before the execve() is sufficient.)

Second, it allows userspace to avoid implementing closing all file
descriptors by parsing through /proc//fd/* and calling close() on each
file descriptor. From looking at various large(ish) userspace code bases
this or similar patterns are very common in:
- service managers (cf. [4])
- libcs (cf. [6])
- container runtimes (cf. [5])
- programming language runtimes/standard libraries
  - Python (cf. [2])
  - Rust (cf. [7], [8])
As Dmitry pointed out there's even a long-standing glibc bug about missing
kernel support for this task (cf. [3]).
In addition, the syscall will also work for tasks that do not have procfs
mounted and on kernels that do not have procfs support compiled in. In such
situations the only way to make sure that all file descriptors are closed
is to call close() on each file descriptor up to UINT_MAX or RLIMIT_NOFILE,
OPEN_MAX trickery (cf. comment [8] on Rust).
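
Until a libc wrapper shows up, callers would have to invoke the new
syscall directly. As a rough sketch only -- mirroring the helper used in
the selftest in patch 2/2 and assuming __NR_close_range is exported by
the kernel headers -- that boils down to:

	#define _GNU_SOURCE
	#include <unistd.h>
	#include <sys/syscall.h>

	static inline int sys_close_range(unsigned int fd, unsigned int max_fd,
					  unsigned int flags)
	{
		/* raw syscall invocation; no glibc wrapper exists yet */
		return syscall(__NR_close_range, fd, max_fd, flags);
	}

	/* e.g. close everything past stderr before an exec */
	/* sys_close_range(3, ~0U, 0); */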

The performance is striking. For good measure, comparing the following
simple close_all_fds() userspace implementation that is essentially just
glibc's version in [6]:

static int close_all_fds(void)
{
int dir_fd;
DIR *dir;
struct dirent *direntp;

dir = opendir("/proc/self/fd");
if (!dir)
return -1;
dir_fd = dirfd(dir);
while ((direntp = readdir(dir))) {
int fd;
if (strcmp(direntp->d_name, ".") == 0)
continue;
if (strcmp(direntp->d_name, "..") == 0)
continue;
fd = atoi(direntp->d_name);
if (fd == dir_fd || fd == 0 || fd == 1 || fd == 2)
continue;
close(fd);
}
closedir(dir);
return 0;
}

to close_range() yields:
1. closing 4 open files:
   - close_all_fds(): ~280 us
   - close_range():~24 us

2. closing 1000 open files:
   - close_all_fds(): ~5000 us
   - close_range():   ~800 us

close_range() is designed to allow for some flexibility. Specifically, it
does not simply always close all open file descriptors of a task. Instead,
callers can specify an upper bound.
This is e.g. useful for scenarios where specific file descriptors are
created with well-known numbers that are supposed to be excluded from
getting closed.
For extra paranoia close_range() comes with a flags argument. This can e.g.
be used to implement extensions. One can imagine userspace wanting to stop
at the first error instead of ignoring errors under certain circumstances.
There might be other valid ideas in the future. In any case, a flag
argument doesn't hurt and keeps us on the safe side.
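
The syscall entry point itself is not shown in the hunks quoted here; as
a purely illustrative sketch of the flag guard described above (not
necessarily the exact code in this patch), rejecting unknown flags looks
roughly like:

	SYSCALL_DEFINE3(close_range, unsigned int, fd, unsigned int, max_fd,
			unsigned int, flags)
	{
		/* no flags defined yet; keep them free for future extensions */
		if (flags)
			return -EINVAL;

		return __close_range(current->files, fd, max_fd);
	}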

From an implementation side this is kept rather dumb. It saw some input
from David and Jann but all nonsense is obviously my own!
- Errors to close file descriptors are currently ignored. (Could be changed
  by setting a flag in the future if needed.)
- __close_range() is a rather simplistic wrapper around __close_fd().
  My reasoning behind this is based on the nature of how __close_fd() needs
  to release an fd. But maybe I misunderstood specifics:
  We take the files_lock and rcu-dereference the fdtable of the calling
  task, we find the entry in the fdtable, get the file and need to release
  files_lock before calling filp_close().
  In the meantime the fdtable might have been altered so we can't just
  retake the spinlock and keep the old rcu-reference of the fdtable
  around. Instead we need to grab a fresh reference to 

Re: [PATCH v3 3/3] kselftest: Extend vDSO selftest to clock_getres

2019-05-22 Thread Vincenzo Frascino
Hi Christophe,

thank you for your review.

On 22/05/2019 12:50, Christophe Leroy wrote:
> 
> 
> Le 22/05/2019 à 13:07, Vincenzo Frascino a écrit :
>> The current version of the multiarch vDSO selftest verifies only
>> gettimeofday.
>>
>> Extend the vDSO selftest to clock_getres, to verify that the
>> syscall and the vDSO library function return the same information.
>>
>> The extension has been used to verify the hrtimer_resolution fix.
>>
>> Cc: Shuah Khan 
>> Signed-off-by: Vincenzo Frascino 
>> ---
>>
>> Note: This patch is independent from the others in this series, hence it
>> can be merged separately by the kselftest maintainers.
>>
>>   tools/testing/selftests/vDSO/Makefile |   2 +
>>   .../selftests/vDSO/vdso_clock_getres.c| 137 ++
>>   2 files changed, 139 insertions(+)
>>   create mode 100644 tools/testing/selftests/vDSO/vdso_clock_getres.c
>>
>> diff --git a/tools/testing/selftests/vDSO/Makefile 
>> b/tools/testing/selftests/vDSO/Makefile
>> index 9e03d61f52fd..d5c5bfdf1ac1 100644
>> --- a/tools/testing/selftests/vDSO/Makefile
>> +++ b/tools/testing/selftests/vDSO/Makefile
>> @@ -5,6 +5,7 @@ uname_M := $(shell uname -m 2>/dev/null || echo not)
>>   ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/)
>>   
>>   TEST_GEN_PROGS := $(OUTPUT)/vdso_test
>> +TEST_GEN_PROGS += $(OUTPUT)/vdso_clock_getres
>>   ifeq ($(ARCH),x86)
>>   TEST_GEN_PROGS += $(OUTPUT)/vdso_standalone_test_x86
>>   endif
>> @@ -18,6 +19,7 @@ endif
>>   
>>   all: $(TEST_GEN_PROGS)
>>   $(OUTPUT)/vdso_test: parse_vdso.c vdso_test.c
>> +$(OUTPUT)/vdso_clock_getres: vdso_clock_getres.c
>>   $(OUTPUT)/vdso_standalone_test_x86: vdso_standalone_test_x86.c parse_vdso.c
>>  $(CC) $(CFLAGS) $(CFLAGS_vdso_standalone_test_x86) \
>>  vdso_standalone_test_x86.c parse_vdso.c \
>> diff --git a/tools/testing/selftests/vDSO/vdso_clock_getres.c 
>> b/tools/testing/selftests/vDSO/vdso_clock_getres.c
>> new file mode 100644
>> index ..341a9bc34ffc
>> --- /dev/null
>> +++ b/tools/testing/selftests/vDSO/vdso_clock_getres.c
>> @@ -0,0 +1,137 @@
>> +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
>> +/*
>> + * vdso_clock_getres.c: Sample code to test clock_getres.
>> + * Copyright (c) 2019 Arm Ltd.
>> + *
>> + * Compile with:
>> + * gcc -std=gnu99 vdso_clock_getres.c
>> + *
>> + * Tested on ARM, ARM64, MIPS32, x86 (32-bit and 64-bit),
>> + * Power (32-bit and 64-bit), S390x (32-bit and 64-bit).
>> + * Might work on other architectures.
>> + */
>> +
>> +#define _GNU_SOURCE
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include "../kselftest.h"
>> +
>> +static long syscall_clock_getres(clockid_t _clkid, struct timespec *_ts)
>> +{
>> +long ret;
>> +
>> +ret = syscall(SYS_clock_getres, _clkid, _ts);
>> +
>> +return ret;
>> +}
>> +
>> +const char *vdso_clock_name[12] = {
>> +"CLOCK_REALTIME",
>> +"CLOCK_MONOTONIC",
>> +"CLOCK_PROCESS_CPUTIME_ID",
>> +"CLOCK_THREAD_CPUTIME_ID",
>> +"CLOCK_MONOTONIC_RAW",
>> +"CLOCK_REALTIME_COARSE",
>> +"CLOCK_MONOTONIC_COARSE",
>> +"CLOCK_BOOTTIME",
>> +"CLOCK_REALTIME_ALARM",
>> +"CLOCK_BOOTTIME_ALARM",
>> +"CLOCK_SGI_CYCLE",
>> +"CLOCK_TAI",
>> +};
>> +
>> +/*
>> + * This function calls clock_getres in vdso and by system call
>> + * with different values for clock_id.
>> + *
>> + * Example of output:
>> + *
>> + * clock_id: CLOCK_REALTIME [PASS]
>> + * clock_id: CLOCK_BOOTTIME [PASS]
>> + * clock_id: CLOCK_TAI [PASS]
>> + * clock_id: CLOCK_REALTIME_COARSE [PASS]
>> + * clock_id: CLOCK_MONOTONIC [PASS]
>> + * clock_id: CLOCK_MONOTONIC_RAW [PASS]
>> + * clock_id: CLOCK_MONOTONIC_COARSE [PASS]
>> + */
>> +static inline int vdso_test_clock(unsigned int clock_id)
>> +{
>> +struct timespec x, y;
>> +
>> +printf("clock_id: %s", vdso_clock_name[clock_id]);
>> +clock_getres(clock_id, &x);
>> +syscall_clock_getres(clock_id, &y);
>> +
>> +if ((x.tv_sec != y.tv_sec) || (x.tv_sec != y.tv_sec)) {
>> +printf(" [FAIL]\n");
>> +return KSFT_FAIL;
>> +}
>> +
>> +printf(" [PASS]\n");
>> +return 0;
>> +}
>> +
>> +int main(int argc, char **argv)
>> +{
>> +int ret;
>> +
>> +#if _POSIX_TIMERS > 0
>> +
>> +#ifdef CLOCK_REALTIME
> 
> Why do you need that #ifdef and all the ones below ?
> 
> CLOCK_REALTIME (and others) is defined in include/uapi/linux/time.h, so 
> it should be there when you build the test, shouldn't it ?
> 

In implementing this test I tried to follow what the man page for
clock_gettime(2) defines in terms of availability of the timers. Since I do not
know how old the userspace headers are, I think it is a good idea to check that
the clocks are defined before trying to use them.

>> +ret = vdso_test_clock(CLOCK_REALTIME);
>> +if (ret)
>> +goto out;
> 
> Why that goto ? Nothing 

Re: [PATCH v3 3/3] kselftest: Extend vDSO selftest to clock_getres

2019-05-22 Thread Vincenzo Frascino
Hi Christophe,

thank you for your review.

On 22/05/2019 12:50, Christophe Leroy wrote:
> 
> 
> Le 22/05/2019 à 13:07, Vincenzo Frascino a écrit :
>> The current version of the multiarch vDSO selftest verifies only
>> gettimeofday.
>>
>> Extend the vDSO selftest to clock_getres, to verify that the
>> syscall and the vDSO library function return the same information.
>>
>> The extension has been used to verify the hrtimer_resolution fix.
>>
>> Cc: Shuah Khan 
>> Signed-off-by: Vincenzo Frascino 
>> ---
>>
>> Note: This patch is independent from the others in this series, hence it
>> can be merged separately by the kselftest maintainers.
>>
>>   tools/testing/selftests/vDSO/Makefile |   2 +
>>   .../selftests/vDSO/vdso_clock_getres.c| 137 ++
>>   2 files changed, 139 insertions(+)
>>   create mode 100644 tools/testing/selftests/vDSO/vdso_clock_getres.c
>>
>> diff --git a/tools/testing/selftests/vDSO/Makefile 
>> b/tools/testing/selftests/vDSO/Makefile
>> index 9e03d61f52fd..d5c5bfdf1ac1 100644
>> --- a/tools/testing/selftests/vDSO/Makefile
>> +++ b/tools/testing/selftests/vDSO/Makefile
>> @@ -5,6 +5,7 @@ uname_M := $(shell uname -m 2>/dev/null || echo not)
>>   ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/)
>>   
>>   TEST_GEN_PROGS := $(OUTPUT)/vdso_test
>> +TEST_GEN_PROGS += $(OUTPUT)/vdso_clock_getres
>>   ifeq ($(ARCH),x86)
>>   TEST_GEN_PROGS += $(OUTPUT)/vdso_standalone_test_x86
>>   endif
>> @@ -18,6 +19,7 @@ endif
>>   
>>   all: $(TEST_GEN_PROGS)
>>   $(OUTPUT)/vdso_test: parse_vdso.c vdso_test.c
>> +$(OUTPUT)/vdso_clock_getres: vdso_clock_getres.c
>>   $(OUTPUT)/vdso_standalone_test_x86: vdso_standalone_test_x86.c parse_vdso.c
>>  $(CC) $(CFLAGS) $(CFLAGS_vdso_standalone_test_x86) \
>>  vdso_standalone_test_x86.c parse_vdso.c \
>> diff --git a/tools/testing/selftests/vDSO/vdso_clock_getres.c 
>> b/tools/testing/selftests/vDSO/vdso_clock_getres.c
>> new file mode 100644
>> index ..341a9bc34ffc
>> --- /dev/null
>> +++ b/tools/testing/selftests/vDSO/vdso_clock_getres.c
>> @@ -0,0 +1,137 @@
>> +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
>> +/*
>> + * vdso_clock_getres.c: Sample code to test clock_getres.
>> + * Copyright (c) 2019 Arm Ltd.
>> + *
>> + * Compile with:
>> + * gcc -std=gnu99 vdso_clock_getres.c
>> + *
>> + * Tested on ARM, ARM64, MIPS32, x86 (32-bit and 64-bit),
>> + * Power (32-bit and 64-bit), S390x (32-bit and 64-bit).
>> + * Might work on other architectures.
>> + */
>> +
>> +#define _GNU_SOURCE
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include "../kselftest.h"
>> +
>> +static long syscall_clock_getres(clockid_t _clkid, struct timespec *_ts)
>> +{
>> +long ret;
>> +
>> +ret = syscall(SYS_clock_getres, _clkid, _ts);
>> +
>> +return ret;
>> +}
>> +
>> +const char *vdso_clock_name[12] = {
>> +"CLOCK_REALTIME",
>> +"CLOCK_MONOTONIC",
>> +"CLOCK_PROCESS_CPUTIME_ID",
>> +"CLOCK_THREAD_CPUTIME_ID",
>> +"CLOCK_MONOTONIC_RAW",
>> +"CLOCK_REALTIME_COARSE",
>> +"CLOCK_MONOTONIC_COARSE",
>> +"CLOCK_BOOTTIME",
>> +"CLOCK_REALTIME_ALARM",
>> +"CLOCK_BOOTTIME_ALARM",
>> +"CLOCK_SGI_CYCLE",
>> +"CLOCK_TAI",
>> +};
>> +
>> +/*
>> + * This function calls clock_getres in vdso and by system call
>> + * with different values for clock_id.
>> + *
>> + * Example of output:
>> + *
>> + * clock_id: CLOCK_REALTIME [PASS]
>> + * clock_id: CLOCK_BOOTTIME [PASS]
>> + * clock_id: CLOCK_TAI [PASS]
>> + * clock_id: CLOCK_REALTIME_COARSE [PASS]
>> + * clock_id: CLOCK_MONOTONIC [PASS]
>> + * clock_id: CLOCK_MONOTONIC_RAW [PASS]
>> + * clock_id: CLOCK_MONOTONIC_COARSE [PASS]
>> + */
>> +static inline int vdso_test_clock(unsigned int clock_id)
>> +{
>> +struct timespec x, y;
>> +
>> +printf("clock_id: %s", vdso_clock_name[clock_id]);
>> +clock_getres(clock_id, &x);
>> +syscall_clock_getres(clock_id, &y);
>> +
>> +if ((x.tv_sec != y.tv_sec) || (x.tv_sec != y.tv_sec)) {
>> +printf(" [FAIL]\n");
>> +return KSFT_FAIL;
>> +}
>> +
>> +printf(" [PASS]\n");
>> +return 0;
>> +}
>> +
>> +int main(int argc, char **argv)
>> +{
>> +int ret;
>> +
>> +#if _POSIX_TIMERS > 0
>> +
>> +#ifdef CLOCK_REALTIME
> 
> Why do you need that #ifdef and all the ones below ?
> 
> CLOCK_REALTIME (and others) is defined in include/uapi/linux/time.h, so 
> it should be there when you build the test, shouldn't it ?
> 

In implementing this test I followed what the man page for clock_gettime(2)
defines in terms of availability of the timers. Since I do not know how old the
userspace headers are, I think it is a good idea to check that the clocks are
defined before trying to use them.

>> +ret = vdso_test_clock(CLOCK_REALTIME);
>> +if (ret)
>> +goto out;
> 
> Why that goto ? Nothing is 

Re: [RFC PATCH] mm/nvdimm: Fix kernel crash on devm_mremap_pages_release

2019-05-22 Thread Aneesh Kumar K.V
"Aneesh Kumar K.V"  writes:

> On 5/14/19 9:45 AM, Dan Williams wrote:
>> [ add Keith who was looking at something similar ]
>> 

...

>>
>> If it's reserved then we should not be accessing, even if the above
>> works in practice. Isn't the fix something more like this to fix up
>> the assumptions at release time?
>> 
>> diff --git a/kernel/memremap.c b/kernel/memremap.c
>> index a856cb5ff192..9074ba14572c 100644
>> --- a/kernel/memremap.c
>> +++ b/kernel/memremap.c
>> @@ -90,6 +90,7 @@ static void devm_memremap_pages_release(void *data)
>>struct device *dev = pgmap->dev;
>>struct resource *res = &pgmap->res;
>>resource_size_t align_start, align_size;
>> + struct vmem_altmap *altmap = pgmap->altmap_valid ? &pgmap->altmap : NULL;
>>unsigned long pfn;
>>int nid;
>> 
>> @@ -102,7 +103,10 @@ static void devm_memremap_pages_release(void *data)
>>align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE)
>>- align_start;
>> 
>> - nid = page_to_nid(pfn_to_page(align_start >> PAGE_SHIFT));
>> + pfn = align_start >> PAGE_SHIFT;
>> + if (altmap)
>> + pfn += vmem_altmap_offset(altmap);
>> + nid = page_to_nid(pfn_to_page(pfn));
>> 
>>mem_hotplug_begin();
>>if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
>> @@ -110,8 +114,7 @@ static void devm_memremap_pages_release(void *data)
>>__remove_pages(page_zone(pfn_to_page(pfn)), pfn,
>>align_size >> PAGE_SHIFT, NULL);
>>} else {
>> - arch_remove_memory(nid, align_start, align_size,
>> - pgmap->altmap_valid ? &pgmap->altmap : NULL);
>> + arch_remove_memory(nid, align_start, align_size, altmap);
>>kasan_remove_zero_shadow(__va(align_start), align_size);
>>}
>>mem_hotplug_done();
>> 
> I did try that first. I was not sure about that. From the memory add vs 
> remove perspective.
>
> devm_memremap_pages:
>
> align_start = res->start & ~(SECTION_SIZE - 1);
> align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE)
>   - align_start;
> align_end = align_start + align_size - 1;
>
> error = arch_add_memory(nid, align_start, align_size, altmap,
>   false);
>
>
> devm_memremap_pages_release:
>
> /* pages are dead and unused, undo the arch mapping */
> align_start = res->start & ~(SECTION_SIZE - 1);
> align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE)
>   - align_start;
>
> arch_remove_memory(nid, align_start, align_size,
>   pgmap->altmap_valid ? &pgmap->altmap : NULL);
>
>
> Now if we are fixing the memremap_pages_release, shouldn't we adjust 
> align_start w.r.t. memremap_pages too? And I was not sure what that means
> w.r.t. add/remove alignment requirements.
>
> What is the intended usage of reserve area? I guess we want that part to 
> be added? if so shouldn't we remove them?

We need to initialize the struct page backing the reserve area too, right?
Where should we do that?

-aneesh



Re: Failure to boot G4: dt_headr_start=0x01501000

2019-05-22 Thread Christophe Leroy




Le 22/05/2019 à 14:15, Mathieu Malaterre a écrit :

Hi all,

I have not boot my G4 in a while, today using master here is what I see:

done
Setting btext !
W=640 H=488 LB=768 addr=0x9c008000
copying OF device tree...
starting device tree allocs at 01401000
otloc_up(0010, 0013d948)
   trying: 0x01401000
   trying: 0x01501000
  -> 01501000
   alloc_bottom : 01601000
   alloc_top: 2000
   alloc_top_hi : 2000
   nmo_top  : 2000
   ram_top  : 2000
Building dt strings...
Building dt structure...
reserved memory map:
   00d4 - 006c1000
Device tree strings 0x01502000 -> 0x0007
Device tree struct 0x01503000 -> 0x0007
Quiescing Open Firmware ...
Booting Linux via __start() @ 0x00140
->dt_headr_start=0x01501000

Any suggestions before I start a bisect ?



Have you tried without CONFIG_PPC_KUEP and CONFIG_PPC_KUAP ?

Christophe


[PATCH] tty: serial: cpm_uart - fix init when SMC is relocated

2019-05-22 Thread Christophe Leroy
SMC relocation can also be activated earlier by the bootloader,
so the driver's behaviour cannot rely on selected kernel config.

When the SMC is relocated, CPM_CR_INIT_TRX cannot be used.

But the only thing CPM_CR_INIT_TRX does is to clear the
rstate and tstate registers, so this can be done manually,
even when SMC is not relocated.

Signed-off-by: Christophe Leroy 
Fixes: 9ab921201444 ("cpm_uart: fix non-console port startup bug")
---
 drivers/tty/serial/cpm_uart/cpm_uart_core.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/tty/serial/cpm_uart/cpm_uart_core.c 
b/drivers/tty/serial/cpm_uart/cpm_uart_core.c
index b929c7ae3a27..7bab9a3eda92 100644
--- a/drivers/tty/serial/cpm_uart/cpm_uart_core.c
+++ b/drivers/tty/serial/cpm_uart/cpm_uart_core.c
@@ -407,7 +407,16 @@ static int cpm_uart_startup(struct uart_port *port)
clrbits16(&pinfo->sccp->scc_sccm, UART_SCCM_RX);
}
cpm_uart_initbd(pinfo);
-   cpm_line_cr_cmd(pinfo, CPM_CR_INIT_TRX);
+   if (IS_SMC(pinfo)) {
+   out_be32(&pinfo->smcup->smc_rstate, 0);
+   out_be32(&pinfo->smcup->smc_tstate, 0);
+   out_be16(&pinfo->smcup->smc_rbptr,
+            in_be16(&pinfo->smcup->smc_rbase));
+   out_be16(&pinfo->smcup->smc_tbptr,
+            in_be16(&pinfo->smcup->smc_tbase));
+   } else {
+   cpm_line_cr_cmd(pinfo, CPM_CR_INIT_TRX);
+   }
}
/* Install interrupt handler. */
retval = request_irq(port->irq, cpm_uart_int, 0, "cpm_uart", port);
@@ -861,16 +870,14 @@ static void cpm_uart_init_smc(struct uart_cpm_port *pinfo)
 (u8 __iomem *)pinfo->tx_bd_base - DPRAM_BASE);
 
 /*
- *  In case SMC1 is being relocated...
+ *  In case SMC is being relocated...
  */
-#if defined (CONFIG_I2C_SPI_SMC1_UCODE_PATCH)
out_be16(&up->smc_rbptr, in_be16(&pinfo->smcup->smc_rbase));
out_be16(&up->smc_tbptr, in_be16(&pinfo->smcup->smc_tbase));
out_be32(&up->smc_rstate, 0);
out_be32(&up->smc_tstate, 0);
out_be16(&up->smc_brkcr, 1);  /* number of break chars */
out_be16(&up->smc_brkec, 0);
-#endif
 
/* Set up the uart parameters in the
 * parameter ram.
@@ -884,8 +891,6 @@ static void cpm_uart_init_smc(struct uart_cpm_port *pinfo)
out_be16(&up->smc_brkec, 0);
out_be16(&up->smc_brkcr, 1);
 
-   cpm_line_cr_cmd(pinfo, CPM_CR_INIT_TRX);
-
/* Set UART mode, 8 bit, no parity, one stop.
 * Enable receive and transmit.
 */
-- 
2.13.3



Failure to boot G4: dt_headr_start=0x01501000

2019-05-22 Thread Mathieu Malaterre
Hi all,

I have not booted my G4 in a while; today, using master, here is what I see:

done
Setting btext !
W=640 H=488 LB=768 addr=0x9c008000
copying OF device tree...
starting device tree allocs at 01401000
alloc_up(0010, 0013d948)
  trying: 0x01401000
  trying: 0x01501000
 -> 01501000
  alloc_bottom : 01601000
  alloc_top: 2000
  alloc_top_hi : 2000
   rmo_top  : 2000
  ram_top  : 2000
Building dt strings...
Building dt structure...
reserved memory map:
  00d4 - 006c1000
Device tree strings 0x01502000 -> 0x0007
Device tree struct 0x01503000 -> 0x0007
Quiescing Open Firmware ...
Booting Linux via __start() @ 0x00140
->dt_headr_start=0x01501000

Any suggestions before I start a bisect ?

Thanks


Re: [PATCH v3 3/3] kselftest: Extend vDSO selftest to clock_getres

2019-05-22 Thread Christophe Leroy




On 22/05/2019 at 13:07, Vincenzo Frascino wrote:

The current version of the multiarch vDSO selftest verifies only
gettimeofday.

Extend the vDSO selftest to clock_getres, to verify that the
syscall and the vDSO library function return the same information.

The extension has been used to verify the hrtimer_resolution fix.

Cc: Shuah Khan 
Signed-off-by: Vincenzo Frascino 
---

Note: This patch is independent from the others in this series, hence it
can be merged singularly by the kselftest maintainers.

  tools/testing/selftests/vDSO/Makefile |   2 +
  .../selftests/vDSO/vdso_clock_getres.c| 137 ++
  2 files changed, 139 insertions(+)
  create mode 100644 tools/testing/selftests/vDSO/vdso_clock_getres.c

diff --git a/tools/testing/selftests/vDSO/Makefile 
b/tools/testing/selftests/vDSO/Makefile
index 9e03d61f52fd..d5c5bfdf1ac1 100644
--- a/tools/testing/selftests/vDSO/Makefile
+++ b/tools/testing/selftests/vDSO/Makefile
@@ -5,6 +5,7 @@ uname_M := $(shell uname -m 2>/dev/null || echo not)
  ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/)
  
  TEST_GEN_PROGS := $(OUTPUT)/vdso_test

+TEST_GEN_PROGS += $(OUTPUT)/vdso_clock_getres
  ifeq ($(ARCH),x86)
  TEST_GEN_PROGS += $(OUTPUT)/vdso_standalone_test_x86
  endif
@@ -18,6 +19,7 @@ endif
  
  all: $(TEST_GEN_PROGS)

  $(OUTPUT)/vdso_test: parse_vdso.c vdso_test.c
+$(OUTPUT)/vdso_clock_getres: vdso_clock_getres.c
  $(OUTPUT)/vdso_standalone_test_x86: vdso_standalone_test_x86.c parse_vdso.c
$(CC) $(CFLAGS) $(CFLAGS_vdso_standalone_test_x86) \
vdso_standalone_test_x86.c parse_vdso.c \
diff --git a/tools/testing/selftests/vDSO/vdso_clock_getres.c 
b/tools/testing/selftests/vDSO/vdso_clock_getres.c
new file mode 100644
index ..341a9bc34ffc
--- /dev/null
+++ b/tools/testing/selftests/vDSO/vdso_clock_getres.c
@@ -0,0 +1,137 @@
+// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
+/*
+ * vdso_clock_getres.c: Sample code to test clock_getres.
+ * Copyright (c) 2019 Arm Ltd.
+ *
+ * Compile with:
+ * gcc -std=gnu99 vdso_clock_getres.c
+ *
+ * Tested on ARM, ARM64, MIPS32, x86 (32-bit and 64-bit),
+ * Power (32-bit and 64-bit), S390x (32-bit and 64-bit).
+ * Might work on other architectures.
+ */
+
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "../kselftest.h"
+
+static long syscall_clock_getres(clockid_t _clkid, struct timespec *_ts)
+{
+   long ret;
+
+   ret = syscall(SYS_clock_getres, _clkid, _ts);
+
+   return ret;
+}
+
+const char *vdso_clock_name[12] = {
+   "CLOCK_REALTIME",
+   "CLOCK_MONOTONIC",
+   "CLOCK_PROCESS_CPUTIME_ID",
+   "CLOCK_THREAD_CPUTIME_ID",
+   "CLOCK_MONOTONIC_RAW",
+   "CLOCK_REALTIME_COARSE",
+   "CLOCK_MONOTONIC_COARSE",
+   "CLOCK_BOOTTIME",
+   "CLOCK_REALTIME_ALARM",
+   "CLOCK_BOOTTIME_ALARM",
+   "CLOCK_SGI_CYCLE",
+   "CLOCK_TAI",
+};
+
+/*
+ * This function calls clock_getres in vdso and by system call
+ * with different values for clock_id.
+ *
+ * Example of output:
+ *
+ * clock_id: CLOCK_REALTIME [PASS]
+ * clock_id: CLOCK_BOOTTIME [PASS]
+ * clock_id: CLOCK_TAI [PASS]
+ * clock_id: CLOCK_REALTIME_COARSE [PASS]
+ * clock_id: CLOCK_MONOTONIC [PASS]
+ * clock_id: CLOCK_MONOTONIC_RAW [PASS]
+ * clock_id: CLOCK_MONOTONIC_COARSE [PASS]
+ */
+static inline int vdso_test_clock(unsigned int clock_id)
+{
+   struct timespec x, y;
+
+   printf("clock_id: %s", vdso_clock_name[clock_id]);
+   clock_getres(clock_id, &x);
+   syscall_clock_getres(clock_id, &y);
+
+   if ((x.tv_sec != y.tv_sec) || (x.tv_nsec != y.tv_nsec)) {
+   printf(" [FAIL]\n");
+   return KSFT_FAIL;
+   }
+
+   printf(" [PASS]\n");
+   return 0;
+}
+
+int main(int argc, char **argv)
+{
+   int ret;
+
+#if _POSIX_TIMERS > 0
+
+#ifdef CLOCK_REALTIME


Why do you need that #ifdef and all the ones below ?

CLOCK_REALTIME (and others) is defined in include/uapi/linux/time.h, so 
it should be there when you build the test, shouldn't it ?



+   ret = vdso_test_clock(CLOCK_REALTIME);
+   if (ret)
+   goto out;


Why that goto ? Nothing is done at out, so a 'return ret' would be 
better I think.


And do we really want to stop at first failure ? Wouldn't it be better 
to run all the tests regardless ?
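
Something like this (just a quick untested sketch, reusing the names
from the patch) would run everything and still report a failure at the
end:

	int ret = 0;

#ifdef CLOCK_REALTIME
	ret |= vdso_test_clock(CLOCK_REALTIME);
#endif
#ifdef CLOCK_MONOTONIC
	ret |= vdso_test_clock(CLOCK_MONOTONIC);
#endif
	/* ... and so on for the other clocks ... */

	return ret ? KSFT_FAIL : KSFT_PASS;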


Christophe


+#endif
+
+#ifdef CLOCK_BOOTTIME
+   ret = vdso_test_clock(CLOCK_BOOTTIME);
+   if (ret)
+   goto out;
+#endif
+
+#ifdef CLOCK_TAI
+   ret = vdso_test_clock(CLOCK_TAI);
+   if (ret)
+   goto out;
+#endif
+
+#ifdef CLOCK_REALTIME_COARSE
+   ret = vdso_test_clock(CLOCK_REALTIME_COARSE);
+   if (ret)
+   goto out;
+#endif
+
+#ifdef CLOCK_MONOTONIC
+   ret = vdso_test_clock(CLOCK_MONOTONIC);
+   if (ret)
+   goto out;
+#endif
+
+#ifdef 

[PATCH v3 2/3] s390: Fix vDSO clock_getres()

2019-05-22 Thread Vincenzo Frascino
clock_getres in the vDSO library has to preserve the same behaviour
of posix_get_hrtimer_res().

In particular, posix_get_hrtimer_res() does:
sec = 0;
ns = hrtimer_resolution;
and hrtimer_resolution depends on the enablement of the high
resolution timers that can happen either at compile or at run time.

Fix the s390 vdso implementation of clock_getres keeping a copy of
hrtimer_resolution in vdso data and using that directly.

Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Signed-off-by: Vincenzo Frascino 
Acked-by: Martin Schwidefsky 
---

Note: This patch is independent from the others in this series, hence it
can be merged singularly by the s390 maintainers.

 arch/s390/include/asm/vdso.h   |  1 +
 arch/s390/kernel/asm-offsets.c |  2 +-
 arch/s390/kernel/time.c|  1 +
 arch/s390/kernel/vdso32/clock_getres.S | 12 +++-
 arch/s390/kernel/vdso64/clock_getres.S | 10 +-
 5 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/arch/s390/include/asm/vdso.h b/arch/s390/include/asm/vdso.h
index 169d7604eb80..f3ba84fa9bd1 100644
--- a/arch/s390/include/asm/vdso.h
+++ b/arch/s390/include/asm/vdso.h
@@ -36,6 +36,7 @@ struct vdso_data {
__u32 tk_shift; /* Shift used for xtime_nsec0x60 */
__u32 ts_dir;   /* TOD steering direction   0x64 */
__u64 ts_end;   /* TOD steering end 0x68 */
+   __u32 hrtimer_res;  /* hrtimer resolution   0x70 */
 };
 
 struct vdso_per_cpu_data {
diff --git a/arch/s390/kernel/asm-offsets.c b/arch/s390/kernel/asm-offsets.c
index 41ac4ad21311..4a229a60b24a 100644
--- a/arch/s390/kernel/asm-offsets.c
+++ b/arch/s390/kernel/asm-offsets.c
@@ -76,6 +76,7 @@ int main(void)
OFFSET(__VDSO_TK_SHIFT, vdso_data, tk_shift);
OFFSET(__VDSO_TS_DIR, vdso_data, ts_dir);
OFFSET(__VDSO_TS_END, vdso_data, ts_end);
+   OFFSET(__VDSO_CLOCK_REALTIME_RES, vdso_data, hrtimer_res);
OFFSET(__VDSO_ECTG_BASE, vdso_per_cpu_data, ectg_timer_base);
OFFSET(__VDSO_ECTG_USER, vdso_per_cpu_data, ectg_user_time);
OFFSET(__VDSO_CPU_NR, vdso_per_cpu_data, cpu_nr);
@@ -87,7 +88,6 @@ int main(void)
DEFINE(__CLOCK_REALTIME_COARSE, CLOCK_REALTIME_COARSE);
DEFINE(__CLOCK_MONOTONIC_COARSE, CLOCK_MONOTONIC_COARSE);
DEFINE(__CLOCK_THREAD_CPUTIME_ID, CLOCK_THREAD_CPUTIME_ID);
-   DEFINE(__CLOCK_REALTIME_RES, MONOTONIC_RES_NSEC);
DEFINE(__CLOCK_COARSE_RES, LOW_RES_NSEC);
BLANK();
/* idle data offsets */
diff --git a/arch/s390/kernel/time.c b/arch/s390/kernel/time.c
index e8766beee5ad..8ea9db599d38 100644
--- a/arch/s390/kernel/time.c
+++ b/arch/s390/kernel/time.c
@@ -310,6 +310,7 @@ void update_vsyscall(struct timekeeper *tk)
 
vdso_data->tk_mult = tk->tkr_mono.mult;
vdso_data->tk_shift = tk->tkr_mono.shift;
+   vdso_data->hrtimer_res = hrtimer_resolution;
smp_wmb();
++vdso_data->tb_update_count;
 }
diff --git a/arch/s390/kernel/vdso32/clock_getres.S 
b/arch/s390/kernel/vdso32/clock_getres.S
index eaf9cf1417f6..fecd7684c645 100644
--- a/arch/s390/kernel/vdso32/clock_getres.S
+++ b/arch/s390/kernel/vdso32/clock_getres.S
@@ -18,20 +18,22 @@
 __kernel_clock_getres:
CFI_STARTPROC
basr    %r1,0
-   la  %r1,4f-.(%r1)
+10:al  %r1,4f-10b(%r1)
+   l   %r0,__VDSO_CLOCK_REALTIME_RES(%r1)
chi %r2,__CLOCK_REALTIME
je  0f
chi %r2,__CLOCK_MONOTONIC
je  0f
-   la  %r1,5f-4f(%r1)
+   basr    %r1,0
+   la  %r1,5f-.(%r1)
+   l   %r0,0(%r1)
chi %r2,__CLOCK_REALTIME_COARSE
je  0f
chi %r2,__CLOCK_MONOTONIC_COARSE
jne 3f
 0: ltr %r3,%r3
jz  2f  /* res == NULL */
-1: l   %r0,0(%r1)
-   xc  0(4,%r3),0(%r3) /* set tp->tv_sec to zero */
+1: xc  0(4,%r3),0(%r3) /* set tp->tv_sec to zero */
st  %r0,4(%r3)  /* store tp->tv_usec */
 2: lhi %r2,0
br  %r14
@@ -39,6 +41,6 @@ __kernel_clock_getres:
svc 0
br  %r14
CFI_ENDPROC
-4: .long   __CLOCK_REALTIME_RES
+4: .long   _vdso_data - 10b
 5: .long   __CLOCK_COARSE_RES
.size   __kernel_clock_getres,.-__kernel_clock_getres
diff --git a/arch/s390/kernel/vdso64/clock_getres.S 
b/arch/s390/kernel/vdso64/clock_getres.S
index 081435398e0a..022b58c980db 100644
--- a/arch/s390/kernel/vdso64/clock_getres.S
+++ b/arch/s390/kernel/vdso64/clock_getres.S
@@ -17,12 +17,14 @@
.type  __kernel_clock_getres,@function
 __kernel_clock_getres:
CFI_STARTPROC
-   larl    %r1,4f
+   larl    %r1,3f
+   lg  %r0,0(%r1)
cghi%r2,__CLOCK_REALTIME_COARSE
je  0f
cghi%r2,__CLOCK_MONOTONIC_COARSE
je  0f
-

[PATCH v3 3/3] kselftest: Extend vDSO selftest to clock_getres

2019-05-22 Thread Vincenzo Frascino
The current version of the multiarch vDSO selftest verifies only
gettimeofday.

Extend the vDSO selftest to clock_getres, to verify that the
syscall and the vDSO library function return the same information.

The extension has been used to verify the hrtimer_resolution fix.

Cc: Shuah Khan 
Signed-off-by: Vincenzo Frascino 
---

Note: This patch is independent from the others in this series, hence it
can be merged singularly by the kselftest maintainers.

 tools/testing/selftests/vDSO/Makefile |   2 +
 .../selftests/vDSO/vdso_clock_getres.c| 137 ++
 2 files changed, 139 insertions(+)
 create mode 100644 tools/testing/selftests/vDSO/vdso_clock_getres.c

diff --git a/tools/testing/selftests/vDSO/Makefile 
b/tools/testing/selftests/vDSO/Makefile
index 9e03d61f52fd..d5c5bfdf1ac1 100644
--- a/tools/testing/selftests/vDSO/Makefile
+++ b/tools/testing/selftests/vDSO/Makefile
@@ -5,6 +5,7 @@ uname_M := $(shell uname -m 2>/dev/null || echo not)
 ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/)
 
 TEST_GEN_PROGS := $(OUTPUT)/vdso_test
+TEST_GEN_PROGS += $(OUTPUT)/vdso_clock_getres
 ifeq ($(ARCH),x86)
 TEST_GEN_PROGS += $(OUTPUT)/vdso_standalone_test_x86
 endif
@@ -18,6 +19,7 @@ endif
 
 all: $(TEST_GEN_PROGS)
 $(OUTPUT)/vdso_test: parse_vdso.c vdso_test.c
+$(OUTPUT)/vdso_clock_getres: vdso_clock_getres.c
 $(OUTPUT)/vdso_standalone_test_x86: vdso_standalone_test_x86.c parse_vdso.c
$(CC) $(CFLAGS) $(CFLAGS_vdso_standalone_test_x86) \
vdso_standalone_test_x86.c parse_vdso.c \
diff --git a/tools/testing/selftests/vDSO/vdso_clock_getres.c 
b/tools/testing/selftests/vDSO/vdso_clock_getres.c
new file mode 100644
index ..341a9bc34ffc
--- /dev/null
+++ b/tools/testing/selftests/vDSO/vdso_clock_getres.c
@@ -0,0 +1,137 @@
+// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
+/*
+ * vdso_clock_getres.c: Sample code to test clock_getres.
+ * Copyright (c) 2019 Arm Ltd.
+ *
+ * Compile with:
+ * gcc -std=gnu99 vdso_clock_getres.c
+ *
+ * Tested on ARM, ARM64, MIPS32, x86 (32-bit and 64-bit),
+ * Power (32-bit and 64-bit), S390x (32-bit and 64-bit).
+ * Might work on other architectures.
+ */
+
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "../kselftest.h"
+
+static long syscall_clock_getres(clockid_t _clkid, struct timespec *_ts)
+{
+   long ret;
+
+   ret = syscall(SYS_clock_getres, _clkid, _ts);
+
+   return ret;
+}
+
+const char *vdso_clock_name[12] = {
+   "CLOCK_REALTIME",
+   "CLOCK_MONOTONIC",
+   "CLOCK_PROCESS_CPUTIME_ID",
+   "CLOCK_THREAD_CPUTIME_ID",
+   "CLOCK_MONOTONIC_RAW",
+   "CLOCK_REALTIME_COARSE",
+   "CLOCK_MONOTONIC_COARSE",
+   "CLOCK_BOOTTIME",
+   "CLOCK_REALTIME_ALARM",
+   "CLOCK_BOOTTIME_ALARM",
+   "CLOCK_SGI_CYCLE",
+   "CLOCK_TAI",
+};
+
+/*
+ * This function calls clock_getres in vdso and by system call
+ * with different values for clock_id.
+ *
+ * Example of output:
+ *
+ * clock_id: CLOCK_REALTIME [PASS]
+ * clock_id: CLOCK_BOOTTIME [PASS]
+ * clock_id: CLOCK_TAI [PASS]
+ * clock_id: CLOCK_REALTIME_COARSE [PASS]
+ * clock_id: CLOCK_MONOTONIC [PASS]
+ * clock_id: CLOCK_MONOTONIC_RAW [PASS]
+ * clock_id: CLOCK_MONOTONIC_COARSE [PASS]
+ */
+static inline int vdso_test_clock(unsigned int clock_id)
+{
+   struct timespec x, y;
+
+   printf("clock_id: %s", vdso_clock_name[clock_id]);
+   clock_getres(clock_id, &x);
+   syscall_clock_getres(clock_id, &y);
+
+   if ((x.tv_sec != y.tv_sec) || (x.tv_nsec != y.tv_nsec)) {
+   printf(" [FAIL]\n");
+   return KSFT_FAIL;
+   }
+
+   printf(" [PASS]\n");
+   return 0;
+}
+
+int main(int argc, char **argv)
+{
+   int ret;
+
+#if _POSIX_TIMERS > 0
+
+#ifdef CLOCK_REALTIME
+   ret = vdso_test_clock(CLOCK_REALTIME);
+   if (ret)
+   goto out;
+#endif
+
+#ifdef CLOCK_BOOTTIME
+   ret = vdso_test_clock(CLOCK_BOOTTIME);
+   if (ret)
+   goto out;
+#endif
+
+#ifdef CLOCK_TAI
+   ret = vdso_test_clock(CLOCK_TAI);
+   if (ret)
+   goto out;
+#endif
+
+#ifdef CLOCK_REALTIME_COARSE
+   ret = vdso_test_clock(CLOCK_REALTIME_COARSE);
+   if (ret)
+   goto out;
+#endif
+
+#ifdef CLOCK_MONOTONIC
+   ret = vdso_test_clock(CLOCK_MONOTONIC);
+   if (ret)
+   goto out;
+#endif
+
+#ifdef CLOCK_MONOTONIC_RAW
+   ret = vdso_test_clock(CLOCK_MONOTONIC_RAW);
+   if (ret)
+   goto out;
+#endif
+
+#ifdef CLOCK_MONOTONIC_COARSE
+   ret = vdso_test_clock(CLOCK_MONOTONIC_COARSE);
+   if (ret)
+   goto out;
+#endif
+
+#endif
+
+out:
+   return ret;
+}
-- 
2.21.0



[PATCH v3 1/3] powerpc: Fix vDSO clock_getres()

2019-05-22 Thread Vincenzo Frascino
clock_getres in the vDSO library has to preserve the same behaviour
of posix_get_hrtimer_res().

In particular, posix_get_hrtimer_res() does:
sec = 0;
ns = hrtimer_resolution;
and hrtimer_resolution depends on the enablement of the high
resolution timers that can happen either at compile or at run time.

Fix the powerpc vdso implementation of clock_getres keeping a copy of
hrtimer_resolution in vdso data and using that directly.

Fixes: a7f290dad32e ("[PATCH] powerpc: Merge vdso's and add vdso support
to 32 bits kernel")
Cc: sta...@vger.kernel.org
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Signed-off-by: Vincenzo Frascino 
Reviewed-by: Christophe Leroy 
---

Note: This patch is independent from the others in this series, hence it
can be merged singularly by the powerpc maintainers.

 arch/powerpc/include/asm/vdso_datapage.h  | 2 ++
 arch/powerpc/kernel/asm-offsets.c | 2 +-
 arch/powerpc/kernel/time.c| 1 +
 arch/powerpc/kernel/vdso32/gettimeofday.S | 7 +--
 arch/powerpc/kernel/vdso64/gettimeofday.S | 7 +--
 5 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/vdso_datapage.h 
b/arch/powerpc/include/asm/vdso_datapage.h
index bbc06bd72b1f..4333b9a473dc 100644
--- a/arch/powerpc/include/asm/vdso_datapage.h
+++ b/arch/powerpc/include/asm/vdso_datapage.h
@@ -86,6 +86,7 @@ struct vdso_data {
__s32 wtom_clock_nsec;  /* Wall to monotonic clock nsec 
*/
__s64 wtom_clock_sec;   /* Wall to monotonic clock sec 
*/
struct timespec stamp_xtime;/* xtime as at tb_orig_stamp */
+   __u32 hrtimer_res;  /* hrtimer resolution */
__u32 syscall_map_64[SYSCALL_MAP_SIZE]; /* map of syscalls  */
__u32 syscall_map_32[SYSCALL_MAP_SIZE]; /* map of syscalls */
 };
@@ -107,6 +108,7 @@ struct vdso_data {
__s32 wtom_clock_nsec;
struct timespec stamp_xtime;/* xtime as at tb_orig_stamp */
__u32 stamp_sec_fraction;   /* fractional seconds of stamp_xtime */
+   __u32 hrtimer_res;  /* hrtimer resolution */
__u32 syscall_map_32[SYSCALL_MAP_SIZE]; /* map of syscalls */
__u32 dcache_block_size;/* L1 d-cache block size */
__u32 icache_block_size;/* L1 i-cache block size */
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 8e02444e9d3d..dfc40f29f2b9 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -389,6 +389,7 @@ int main(void)
OFFSET(WTOM_CLOCK_NSEC, vdso_data, wtom_clock_nsec);
OFFSET(STAMP_XTIME, vdso_data, stamp_xtime);
OFFSET(STAMP_SEC_FRAC, vdso_data, stamp_sec_fraction);
+   OFFSET(CLOCK_REALTIME_RES, vdso_data, hrtimer_res);
OFFSET(CFG_ICACHE_BLOCKSZ, vdso_data, icache_block_size);
OFFSET(CFG_DCACHE_BLOCKSZ, vdso_data, dcache_block_size);
OFFSET(CFG_ICACHE_LOGBLOCKSZ, vdso_data, icache_log_block_size);
@@ -419,7 +420,6 @@ int main(void)
DEFINE(CLOCK_REALTIME_COARSE, CLOCK_REALTIME_COARSE);
DEFINE(CLOCK_MONOTONIC_COARSE, CLOCK_MONOTONIC_COARSE);
DEFINE(NSEC_PER_SEC, NSEC_PER_SEC);
-   DEFINE(CLOCK_REALTIME_RES, MONOTONIC_RES_NSEC);
 
 #ifdef CONFIG_BUG
DEFINE(BUG_ENTRY_SIZE, sizeof(struct bug_entry));
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 325d60633dfa..4ea4e9d7a58e 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -963,6 +963,7 @@ void update_vsyscall(struct timekeeper *tk)
vdso_data->wtom_clock_nsec = tk->wall_to_monotonic.tv_nsec;
vdso_data->stamp_xtime = xt;
vdso_data->stamp_sec_fraction = frac_sec;
+   vdso_data->hrtimer_res = hrtimer_resolution;
smp_wmb();
++(vdso_data->tb_update_count);
 }
diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S 
b/arch/powerpc/kernel/vdso32/gettimeofday.S
index afd516b572f8..2b5f9e83c610 100644
--- a/arch/powerpc/kernel/vdso32/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso32/gettimeofday.S
@@ -160,12 +160,15 @@ V_FUNCTION_BEGIN(__kernel_clock_getres)
cror    cr0*4+eq,cr0*4+eq,cr1*4+eq
bne cr0,99f
 
+   mflr    r12
+  .cfi_register lr,r12
+   bl  __get_datapage@local
+   lwz r5,CLOCK_REALTIME_RES(r3)
+   mtlr    r12
li  r3,0
cmpli   cr0,r4,0
crclr   cr0*4+so
beqlr
-   lis r5,CLOCK_REALTIME_RES@h
-   ori r5,r5,CLOCK_REALTIME_RES@l
stw r3,TSPC32_TV_SEC(r4)
stw r5,TSPC32_TV_NSEC(r4)
blr
diff --git a/arch/powerpc/kernel/vdso64/gettimeofday.S 
b/arch/powerpc/kernel/vdso64/gettimeofday.S
index 1f324c28705b..f07730f73d5e 100644
--- a/arch/powerpc/kernel/vdso64/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso64/gettimeofday.S
@@ -190,12 +190,15 @@ V_FUNCTION_BEGIN(__kernel_clock_getres)
cror

[PATCH v3 0/3] Fix vDSO clock_getres()

2019-05-22 Thread Vincenzo Frascino
clock_getres in the vDSO library has to preserve the same behaviour
of posix_get_hrtimer_res().

In particular, posix_get_hrtimer_res() does:
sec = 0;
ns = hrtimer_resolution;
and hrtimer_resolution depends on the enablement of the high
resolution timers that can happen either at compile or at run time.

A possible fix is to change the vdso implementation of clock_getres,
keeping a copy of hrtimer_resolution in vdso data and using that
directly [1].
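
In rough C terms (illustrative only -- each architecture actually does
this in its vDSO assembly), the idea is:

	/* kernel side, update_vsyscall() */
	vdso_data->hrtimer_res = hrtimer_resolution;

	/* vDSO side, clock_getres() for CLOCK_REALTIME/CLOCK_MONOTONIC */
	if (res) {
		res->tv_sec = 0;
		res->tv_nsec = vdso_data->hrtimer_res;
	}
	return 0;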

This patchset implements the proposed fix for arm64, powerpc, s390,
nds32 and adds a test to verify that the syscall and the vdso library
implementation of clock_getres return the same values.

Even if these patches are unified by the same topic, there is no
dependency between them, hence they can be merged singularly by each
arch maintainer.

Note: arm64 and nds32 respective fixes have been merged in 5.2-rc1,
hence they have been removed from this series.

[1] https://marc.info/?l=linux-arm-kernel=155110381930196=2

Changes:

v3:
  - Rebased on 5.2-rc1.
  - Addressed review comments.
v2:
  - Rebased on 5.1-rc5.
  - Addressed review comments.

Cc: Christophe Leroy 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Cc: Shuah Khan 
Cc: Thomas Gleixner 
Cc: Arnd Bergmann 
Signed-off-by: Vincenzo Frascino 

Vincenzo Frascino (3):
  powerpc: Fix vDSO clock_getres()
  s390: Fix vDSO clock_getres()
  kselftest: Extend vDSO selftest to clock_getres

 arch/powerpc/include/asm/vdso_datapage.h  |   2 +
 arch/powerpc/kernel/asm-offsets.c |   2 +-
 arch/powerpc/kernel/time.c|   1 +
 arch/powerpc/kernel/vdso32/gettimeofday.S |   7 +-
 arch/powerpc/kernel/vdso64/gettimeofday.S |   7 +-
 arch/s390/include/asm/vdso.h  |   1 +
 arch/s390/kernel/asm-offsets.c|   2 +-
 arch/s390/kernel/time.c   |   1 +
 arch/s390/kernel/vdso32/clock_getres.S|  12 +-
 arch/s390/kernel/vdso64/clock_getres.S|  10 +-
 tools/testing/selftests/vDSO/Makefile |   2 +
 .../selftests/vDSO/vdso_clock_getres.c| 137 ++
 12 files changed, 168 insertions(+), 16 deletions(-)
 create mode 100644 tools/testing/selftests/vDSO/vdso_clock_getres.c

-- 
2.21.0



[PATCH] spi: spi-fsl-spi: call spi_finalize_current_message() at the end

2019-05-22 Thread Christophe Leroy
spi_finalize_current_message() shall be called once all
actions are finished, otherwise the last actions might
step over a newly started transfer.

Fixes: c592becbe704 ("spi: fsl-(e)spi: migrate to generic master queueing")
Signed-off-by: Christophe Leroy 
---
 drivers/spi/spi-fsl-spi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/spi/spi-fsl-spi.c b/drivers/spi/spi-fsl-spi.c
index b36ac6aa3b1f..7fbdaf066719 100644
--- a/drivers/spi/spi-fsl-spi.c
+++ b/drivers/spi/spi-fsl-spi.c
@@ -432,7 +432,6 @@ static int fsl_spi_do_one_msg(struct spi_master *master,
}
 
m->status = status;
-   spi_finalize_current_message(master);
 
if (status || !cs_change) {
ndelay(nsecs);
@@ -440,6 +439,7 @@ static int fsl_spi_do_one_msg(struct spi_master *master,
}
 
fsl_spi_setup_transfer(spi, NULL);
+   spi_finalize_current_message(master);
return 0;
 }
 
-- 
2.13.3



Re: [RFC PATCH V2 3/3] mm/nvdimm: Use correct #defines instead of opencoding

2019-05-22 Thread Satheesh Rajendran
On Wed, May 22, 2019 at 01:57:01PM +0530, Aneesh Kumar K.V wrote:
> The npfn-related change is needed to fix the kernel message
> 
> "number of pfns truncated from 2617344 to 163584"
> 
> The change makes sure the npfns stored in the superblock is the right value.
> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  drivers/nvdimm/label.c   | 2 +-
>  drivers/nvdimm/pfn_devs.c| 6 +++---
>  drivers/nvdimm/region_devs.c | 8 
>  3 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/nvdimm/label.c b/drivers/nvdimm/label.c
> index f3d753d3169c..bc6de8fb0153 100644
> --- a/drivers/nvdimm/label.c
> +++ b/drivers/nvdimm/label.c
> @@ -361,7 +361,7 @@ static bool slot_valid(struct nvdimm_drvdata *ndd,
> 
>   /* check that DPA allocations are page aligned */
>   if ((__le64_to_cpu(nd_label->dpa)
> - | __le64_to_cpu(nd_label->rawsize)) % SZ_4K)
> + | __le64_to_cpu(nd_label->rawsize)) % PAGE_SIZE)
>   return false;
> 
>   /* check checksum */
> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> index 39fa8cf8ef58..9fc2e514e28a 100644
> --- a/drivers/nvdimm/pfn_devs.c
> +++ b/drivers/nvdimm/pfn_devs.c
> @@ -769,8 +769,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
>* when populating the vmemmap. This *should* be equal to
>* PMD_SIZE for most architectures.
>*/
> - offset = ALIGN(start + reserve + 64 * npfns,
> - max(nd_pfn->align, PMD_SIZE)) - start;
> + offset = ALIGN(start + reserve + sizeof(struct page) * npfns,
> +max(nd_pfn->align, PMD_SIZE)) - start;
>   } else if (nd_pfn->mode == PFN_MODE_RAM)
>   offset = ALIGN(start + reserve, nd_pfn->align) - start;
>   else
> @@ -782,7 +782,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
>   return -ENXIO;
>   }
> 
> - npfns = (size - offset - start_pad - end_trunc) / SZ_4K;
> + npfns = (size - offset - start_pad - end_trunc) / PAGE_SIZE;
>   pfn_sb->mode = cpu_to_le32(nd_pfn->mode);
>   pfn_sb->dataoff = cpu_to_le64(offset);
>   pfn_sb->npfns = cpu_to_le64(npfns);
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index b4ef7d9ff22e..2d8facea5a03 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -994,10 +994,10 @@ static struct nd_region *nd_region_create(struct 
> nvdimm_bus *nvdimm_bus,
>   struct nd_mapping_desc *mapping = &ndr_desc->mapping[i];
>   struct nvdimm *nvdimm = mapping->nvdimm;
> 
> - if ((mapping->start | mapping->size) % SZ_4K) {
> - dev_err(&nvdimm_bus->dev, "%s: %s mapping%d is not 4K aligned\n",
> - caller, dev_name(&nvdimm->dev), i);
> -
> + if ((mapping->start | mapping->size) % PAGE_SIZE) {
> + dev_err(&nvdimm_bus->dev,
> + "%s: %s mapping%d is not 4K aligned\n",

s/not 4K aligned/not PAGE_SIZE aligned/ ?

I hope the error msg gets changed as well.

Regards,
-Satheesh.
> + caller, dev_name(&nvdimm->dev), i);
>   return NULL;
>   }
> 
> -- 
> 2.21.0
> 



[RFC PATCH V2 1/3] mm/nvdimm: Add PFN_MIN_VERSION support

2019-05-22 Thread Aneesh Kumar K.V
This allows us to make changes in a backward incompatible way. I have
kept the PFN_MIN_VERSION in this patch '0' because we are not introducing
any incompatible changes in this patch. We also may want to backport this
to older kernels.

The error looks like

  dax0.1: init failed, superblock min version 1, kernel support version 0

and the namespace is marked disabled

$ndctl list -Ni
[
  {
"dev":"namespace0.0",
"mode":"fsdax",
"map":"mem",
"size":10737418240,
"uuid":"9605de6d-cefa-4a87-99cd-dec28b02cffe",
"state":"disabled"
  }
]

Signed-off-by: Aneesh Kumar K.V 
---
 drivers/nvdimm/pfn.h  |  9 -
 drivers/nvdimm/pfn_devs.c |  8 
 drivers/nvdimm/pmem.c | 26 ++
 3 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/drivers/nvdimm/pfn.h b/drivers/nvdimm/pfn.h
index dde9853453d3..5fd29242745a 100644
--- a/drivers/nvdimm/pfn.h
+++ b/drivers/nvdimm/pfn.h
@@ -20,6 +20,12 @@
 #define PFN_SIG_LEN 16
 #define PFN_SIG "NVDIMM_PFN_INFO\0"
 #define DAX_SIG "NVDIMM_DAX_INFO\0"
+/*
+ * increment this when we are making changes such that older
+ * kernel should fail to initialize that namespace.
+ */
+
+#define PFN_MIN_VERSION 0
 
 struct nd_pfn_sb {
u8 signature[PFN_SIG_LEN];
@@ -36,7 +42,8 @@ struct nd_pfn_sb {
__le32 end_trunc;
/* minor-version-2 record the base alignment of the mapping */
__le32 align;
-   u8 padding[4000];
+   __le16 min_version;
+   u8 padding[3998];
__le64 checksum;
 };
 
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 01f40672507f..a2268cf262f5 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -439,6 +439,13 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
if (nvdimm_read_bytes(ndns, SZ_4K, pfn_sb, sizeof(*pfn_sb), 0))
return -ENXIO;
 
+   if (le16_to_cpu(pfn_sb->min_version) > PFN_MIN_VERSION) {
+   dev_err(&nd_pfn->dev,
+   "init failed, superblock min version %ld kernel support version %ld\n",
+   le16_to_cpu(pfn_sb->min_version), PFN_MIN_VERSION);
+   return -EOPNOTSUPP;
+   }
+
if (memcmp(pfn_sb->signature, sig, PFN_SIG_LEN) != 0)
return -ENODEV;
 
@@ -769,6 +776,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
memcpy(pfn_sb->parent_uuid, nd_dev_to_uuid(&ndns->dev), 16);
pfn_sb->version_major = cpu_to_le16(1);
pfn_sb->version_minor = cpu_to_le16(2);
+   pfn_sb->min_version = cpu_to_le16(PFN_MIN_VERSION);
pfn_sb->start_pad = cpu_to_le32(start_pad);
pfn_sb->end_trunc = cpu_to_le32(end_trunc);
pfn_sb->align = cpu_to_le32(nd_pfn->align);
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 845c5b430cdd..406427c064d9 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -490,6 +490,7 @@ static int pmem_attach_disk(struct device *dev,
 
 static int nd_pmem_probe(struct device *dev)
 {
+   int ret;
struct nd_namespace_common *ndns;
 
ndns = nvdimm_namespace_common_probe(dev);
@@ -505,12 +506,29 @@ static int nd_pmem_probe(struct device *dev)
if (is_nd_pfn(dev))
return pmem_attach_disk(dev, ndns);
 
-   /* if we find a valid info-block we'll come back as that personality */
-   if (nd_btt_probe(dev, ndns) == 0 || nd_pfn_probe(dev, ndns) == 0
-   || nd_dax_probe(dev, ndns) == 0)
+   ret = nd_btt_probe(dev, ndns);
+   if (ret == 0)
return -ENXIO;
+   else if (ret == -EOPNOTSUPP)
+   return ret;
 
-   /* ...otherwise we're just a raw pmem device */
+   ret = nd_pfn_probe(dev, ndns);
+   if (ret == 0)
+   return -ENXIO;
+   else if (ret == -EOPNOTSUPP)
+   return ret;
+
+   ret = nd_dax_probe(dev, ndns);
+   if (ret == 0)
+   return -ENXIO;
+   else if (ret == -EOPNOTSUPP)
+   return ret;
+   /*
+* We have two failure conditions here: there is no
+* info reserve block, or we found a valid info reserve block
+* but failed to initialize the pfn superblock.
+* Don't create a raw pmem disk for the second case.
+*/
return pmem_attach_disk(dev, ndns);
 }
 
-- 
2.21.0



[RFC PATCH V2 3/3] mm/nvdimm: Use correct #defines instead of opencoding

2019-05-22 Thread Aneesh Kumar K.V
The npfn-related change is needed to fix the kernel message

"number of pfns truncated from 2617344 to 163584"

The change makes sure the npfns stored in the superblock is the right value.
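
(For reference, the numbers in that message are consistent with a 64K
PAGE_SIZE kernel reading an npfns that was computed with SZ_4K:
2617344 * 4K == 163584 * 64K, i.e. a factor of 16, which is exactly
what dividing by PAGE_SIZE instead of SZ_4K addresses.)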

Signed-off-by: Aneesh Kumar K.V 
---
 drivers/nvdimm/label.c   | 2 +-
 drivers/nvdimm/pfn_devs.c| 6 +++---
 drivers/nvdimm/region_devs.c | 8 
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/nvdimm/label.c b/drivers/nvdimm/label.c
index f3d753d3169c..bc6de8fb0153 100644
--- a/drivers/nvdimm/label.c
+++ b/drivers/nvdimm/label.c
@@ -361,7 +361,7 @@ static bool slot_valid(struct nvdimm_drvdata *ndd,
 
/* check that DPA allocations are page aligned */
if ((__le64_to_cpu(nd_label->dpa)
-   | __le64_to_cpu(nd_label->rawsize)) % SZ_4K)
+   | __le64_to_cpu(nd_label->rawsize)) % PAGE_SIZE)
return false;
 
/* check checksum */
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 39fa8cf8ef58..9fc2e514e28a 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -769,8 +769,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 * when populating the vmemmap. This *should* be equal to
 * PMD_SIZE for most architectures.
 */
-   offset = ALIGN(start + reserve + 64 * npfns,
-   max(nd_pfn->align, PMD_SIZE)) - start;
+   offset = ALIGN(start + reserve + sizeof(struct page) * npfns,
+  max(nd_pfn->align, PMD_SIZE)) - start;
} else if (nd_pfn->mode == PFN_MODE_RAM)
offset = ALIGN(start + reserve, nd_pfn->align) - start;
else
@@ -782,7 +782,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
return -ENXIO;
}
 
-   npfns = (size - offset - start_pad - end_trunc) / SZ_4K;
+   npfns = (size - offset - start_pad - end_trunc) / PAGE_SIZE;
pfn_sb->mode = cpu_to_le32(nd_pfn->mode);
pfn_sb->dataoff = cpu_to_le64(offset);
pfn_sb->npfns = cpu_to_le64(npfns);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index b4ef7d9ff22e..2d8facea5a03 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -994,10 +994,10 @@ static struct nd_region *nd_region_create(struct 
nvdimm_bus *nvdimm_bus,
struct nd_mapping_desc *mapping = &ndr_desc->mapping[i];
struct nvdimm *nvdimm = mapping->nvdimm;
 
-   if ((mapping->start | mapping->size) % SZ_4K) {
-   dev_err(&nvdimm_bus->dev, "%s: %s mapping%d is not 4K aligned\n",
-   caller, dev_name(&nvdimm->dev), i);
-
+   if ((mapping->start | mapping->size) % PAGE_SIZE) {
+   dev_err(&nvdimm_bus->dev,
+   "%s: %s mapping%d is not 4K aligned\n",
+   caller, dev_name(&nvdimm->dev), i);
return NULL;
}
 
-- 
2.21.0



[RFC PATCH V2 2/3] mm/nvdimm: Add page size and struct page size to pfn superblock

2019-05-22 Thread Aneesh Kumar K.V
This is needed so that we don't wrongly initialize a namespace
which doesn't have enough space reserved for holding struct pages
with the current kernel.

We also increment PFN_MIN_VERSION to make sure that older kernel
won't initialize namespace created with newer kernel.
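
As a rough worked example (numbers are only illustrative): for a 10GB
namespace, a kernel using 64K pages and a 64-byte struct page needs
about (10GB / 64K) * 64 = ~10MB of memmap reserve, while a kernel
using 4K pages needs (10GB / 4K) * 64 = ~160MB. A namespace laid out
by the former does not have enough space reserved for the latter,
which is why both page_size and page_struct_size have to be recorded
and checked.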

Signed-off-by: Aneesh Kumar K.V 
---
 drivers/nvdimm/pfn.h  |  7 +--
 drivers/nvdimm/pfn_devs.c | 19 ++-
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/nvdimm/pfn.h b/drivers/nvdimm/pfn.h
index 5fd29242745a..ba11738ca8a2 100644
--- a/drivers/nvdimm/pfn.h
+++ b/drivers/nvdimm/pfn.h
@@ -25,7 +25,7 @@
  * kernel should fail to initialize that namespace.
  */
 
-#define PFN_MIN_VERSION 0
+#define PFN_MIN_VERSION 1
 
 struct nd_pfn_sb {
u8 signature[PFN_SIG_LEN];
@@ -43,7 +43,10 @@ struct nd_pfn_sb {
/* minor-version-2 record the base alignment of the mapping */
__le32 align;
__le16 min_version;
-   u8 padding[3998];
+   /* minor-version-3 record the page size and struct page size */
+   __le16 page_struct_size;
+   __le32 page_size;
+   u8 padding[3992];
__le64 checksum;
 };
 
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index a2268cf262f5..39fa8cf8ef58 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -466,6 +466,15 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
if (__le16_to_cpu(pfn_sb->version_minor) < 2)
pfn_sb->align = 0;
 
+   if (__le16_to_cpu(pfn_sb->version_minor) < 3) {
+   /*
+* For a large part we use PAGE_SIZE. But we
+* do have some accounting code using SZ_4K.
+*/
+   pfn_sb->page_struct_size = cpu_to_le16(64);
+   pfn_sb->page_size = cpu_to_le32(SZ_4K);
+   }
+
switch (le32_to_cpu(pfn_sb->mode)) {
case PFN_MODE_RAM:
case PFN_MODE_PMEM:
@@ -481,6 +490,12 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
align = 1UL << ilog2(offset);
mode = le32_to_cpu(pfn_sb->mode);
 
+   if (le32_to_cpu(pfn_sb->page_size) != PAGE_SIZE)
+   return -EOPNOTSUPP;
+
+   if (le16_to_cpu(pfn_sb->page_struct_size) != sizeof(struct page))
+   return -EOPNOTSUPP;
+
if (!nd_pfn->uuid) {
/*
 * When probing a namepace via nd_pfn_probe() the uuid
@@ -775,11 +790,13 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
memcpy(pfn_sb->uuid, nd_pfn->uuid, 16);
memcpy(pfn_sb->parent_uuid, nd_dev_to_uuid(&ndns->dev), 16);
pfn_sb->version_major = cpu_to_le16(1);
-   pfn_sb->version_minor = cpu_to_le16(2);
+   pfn_sb->version_minor = cpu_to_le16(3);
pfn_sb->min_version = cpu_to_le16(PFN_MIN_VERSION);
pfn_sb->start_pad = cpu_to_le32(start_pad);
pfn_sb->end_trunc = cpu_to_le32(end_trunc);
pfn_sb->align = cpu_to_le32(nd_pfn->align);
+   pfn_sb->page_struct_size = cpu_to_le16(sizeof(struct page));
+   pfn_sb->page_size = cpu_to_le32(PAGE_SIZE);
checksum = nd_sb_checksum((struct nd_gen_sb *) pfn_sb);
pfn_sb->checksum = cpu_to_le64(checksum);
 
-- 
2.21.0



Re: [PATCH 1/2] open: add close_range()

2019-05-22 Thread Christian Brauner
On Tue, May 21, 2019 at 10:23 PM Linus Torvalds
 wrote:
>
> On Tue, May 21, 2019 at 9:41 AM Christian Brauner  
> wrote:
> >
> > Yeah, you mentioned this before. I do like being able to specify an
> > upper bound to have the ability to place fds strategically after said
> > upper bound.
>
> I suspect that's the case.
>
> And if somebody really wants to just close everything and uses a large
> upper bound, we can - if we really want to - just compare the upper
> bound to the file table size, and do an optimized case for that. We do
> that upper bound comparison anyway to limit the size of the walk, so
> *if* it's a big deal, that case could then do the whole "shrink
> fdtable" case too.

Makes sense.

>
> But I don't believe it's worth optimizing for unless somebody really
> has a load where that is shown to be a big deal.   Just do the silly
> and simple loop, and add a cond_resched() in the loop, like
> close_files() does for the "we have a _lot_ of files open" case.

Ok. I will resend a v1 later with the cond_resched() logic you and Al
suggested added.
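
For the archives, the shape of the loop being discussed is roughly the
following (sketch only -- it ignores the locking/RCU details and is not
the actual v1; __close_fd() and cond_resched() are existing helpers):

	static int __close_range(unsigned int fd, unsigned int max_fd)
	{
		struct files_struct *files = current->files;

		/* clamp the upper bound to the current fdtable size */
		max_fd = min(max_fd, files_fdtable(files)->max_fds - 1);

		for (; fd <= max_fd; fd++) {
			__close_fd(files, fd);
			cond_resched();
		}
		return 0;
	}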

Thanks!
Christian


Re: [BISECTED] kexec regression on PowerBook G4

2019-05-22 Thread Christophe Leroy

Hi Again,

On 05/22/2019 06:14 AM, Christophe Leroy wrote:

Hi Aaro,

On 22/05/2019 at 00:18, Aaro Koskinen wrote:

Hi,

I was trying to upgrade from v5.0 -> v5.1 on PowerBook G4, but when trying
to kexec a kernel the system gets stuck (no errors seen on the console).


Do you mean you are trying to kexec a v5.1 kernel from a v5.0 kernel, or 
do you have a working v5.1 kernel, but kexec doesn't work with it ?




Bisected to: 93c4a162b014 ("powerpc/6xx: Store PGDIR physical address
in a SPRG"). This commit doesn't revert cleanly anymore but I tested
that the one before works OK.


Not sure that's the problem. There was a problem with that commit, but 
it was fixed by 4622a2d43101 ("powerpc/6xx: fix setup and use of 
SPRN_SPRG_PGDIR for hash32").
You probably hit some commit between those two during bisect, that's 
likely the reason why you ended here.


Can you restart your bisect from 4622a2d43101 ?

If you have CONFIG_SMP, maybe you should also consider taking 
397d2300b08c ("powerpc/32s: fix flush_hash_pages() on SMP"). Stable 
5.1.4 includes it.




With current Linus HEAD (9c7db5004280), it gets a bit further but still
doesn't work: now I get an error on the console after kexec "Starting
new kernel! ... Bye!":

kernel tried to execute exec-protected page (...) - exploit attempt?


Interesting.

Do you have CONFIG_STRICT_KERNEL_RWX=y in your .config ? If so, can you 
retry without it ?


After looking at the code, I don't think CONFIG_STRICT_KERNEL_RWX will 
make any difference. Can you try the patch below ?


From 8c1039da0d0f26cdf995156a905fc97fe7bda36c Mon Sep 17 00:00:00 2001
From: Christophe Leroy 
Date: Wed, 22 May 2019 07:28:42 +
Subject: [PATCH] Fix Kexec

---
 arch/powerpc/include/asm/pgtable.h | 2 ++
 arch/powerpc/kernel/machine_kexec_32.c | 4 
 arch/powerpc/mm/pgtable_32.c   | 2 +-
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h

index 3f53be60fb01..642eea937229 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -140,6 +140,8 @@ static inline void pte_frag_set(mm_context_t *ctx, 
void *p)

 }
 #endif

+int change_page_attr(struct page *page, int numpages, pgprot_t prot);
+
 #endif /* __ASSEMBLY__ */

 #endif /* _ASM_POWERPC_PGTABLE_H */
diff --git a/arch/powerpc/kernel/machine_kexec_32.c 
b/arch/powerpc/kernel/machine_kexec_32.c

index affe5dcce7f4..4f719501e6ae 100644
--- a/arch/powerpc/kernel/machine_kexec_32.c
+++ b/arch/powerpc/kernel/machine_kexec_32.c
@@ -54,6 +54,10 @@ void default_machine_kexec(struct kimage *image)
memcpy((void *)reboot_code_buffer, relocate_new_kernel,
relocate_new_kernel_size);

+   change_page_attr(image->control_code_page,
+                    ALIGN(KEXEC_CONTROL_PAGE_SIZE, PAGE_SIZE) >> PAGE_SHIFT,
+                    PAGE_KERNEL_TEXT);
+
flush_icache_range(reboot_code_buffer,
reboot_code_buffer + KEXEC_CONTROL_PAGE_SIZE);
printk(KERN_INFO "Bye!\n");
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 16ada373b32b..0e4651d803fc 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -340,7 +340,7 @@ static int __change_page_attr_noflush(struct page 
*page, pgprot_t prot)

  *
  * THIS DOES NOTHING WITH BAT MAPPINGS, DEBUG USE ONLY
  */
-static int change_page_attr(struct page *page, int numpages, pgprot_t prot)
+int change_page_attr(struct page *page, int numpages, pgprot_t prot)
 {
int i, err = 0;
unsigned long flags;
--
2.13.3


Re: [PATCH] powerpc/powernv: Return for invalid IMC domain

2019-05-22 Thread Anju T Sudhakar

Hi,

On 5/21/19 5:18 PM, Michael Ellerman wrote:

Anju T Sudhakar  writes:

Currently init_imc_pmu() can fail either because a pmu registration
is attempted for an IMC unit with an invalid domain (i.e. an IMC node
not supported by the kernel), or because something went wrong while
registering a valid IMC unit. In both cases the kernel provides a
'Registration failed' error message.

Example:
Log message when the trace-imc node is not supported by the kernel,
but skiboot provides a trace-imc node.

So for kernel, trace-imc node is now an unknown domain.

[1.731870] nest_phb5_imc performance monitor hardware support registered
[1.731944] nest_powerbus0_imc performance monitor hardware support 
registered
[1.734458] thread_imc performance monitor hardware support registered
[1.734460] IMC Unknown Device type
[1.734462] IMC PMU (null) Register failed
[1.734558] nest_xlink0_imc performance monitor hardware support registered
[1.734614] nest_xlink1_imc performance monitor hardware support registered
[1.734670] nest_xlink2_imc performance monitor hardware support registered
[1.747043] Initialise system trusted keyrings
[1.747054] Key type blacklist registered


To avoid ambiguity in the error message, return for an invalid domain
before attempting a pmu registration.

What do we print once the patch is applied?



Once the patch is applied, we return for invalid domains, so we will
only have the "IMC Unknown Device type" message printed for *unknown*
domains.

And the "IMC PMU (null) Register failed" message will appear only if
the registration fails for a *known* domain.
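
(Roughly speaking, the change boils down to an early bail-out along
these lines before the pmu registration is attempted -- the exact
constant/field names here are only illustrative:)

	if (pmu_ptr->domain == IMC_DOMAIN_UNKNOWN)
		return -EINVAL;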



Thanks,

Anju




Re: [RFC PATCH 1/3] mm/nvdimm: Add PFN_MIN_VERSION support

2019-05-22 Thread Aneesh Kumar K.V

On 5/22/19 11:50 AM, Aneesh Kumar K.V wrote:

This allows us to make changes in a backward incompatible way. I have
kept the PFN_MIN_VERSION in this patch '0' because we are not introducing
any incompatible changes in this patch. We also may want to backport this
to older kernels.

Signed-off-by: Aneesh Kumar K.V 
---
  drivers/nvdimm/pfn.h  |  9 -
  drivers/nvdimm/pfn_devs.c |  4 
  drivers/nvdimm/pmem.c | 26 ++
  3 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/nvdimm/pfn.h b/drivers/nvdimm/pfn.h
index dde9853453d3..1b10ae5773b6 100644
--- a/drivers/nvdimm/pfn.h
+++ b/drivers/nvdimm/pfn.h
@@ -20,6 +20,12 @@
  #define PFN_SIG_LEN 16
  #define PFN_SIG "NVDIMM_PFN_INFO\0"
  #define DAX_SIG "NVDIMM_DAX_INFO\0"
+/*
+ * increment this when we are making changes such that older
+ * kernel should fail to initialize that namespace.
+ */
+
+#define PFN_MIN_VERSION 0
  
  struct nd_pfn_sb {

u8 signature[PFN_SIG_LEN];
@@ -36,7 +42,8 @@ struct nd_pfn_sb {
__le32 end_trunc;
/* minor-version-2 record the base alignment of the mapping */
__le32 align;
-   u8 padding[4000];
+   __le16 min_verison;
+   u8 padding[3998];
__le64 checksum;
  };
  
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c

index 01f40672507f..3250de70a7b3 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -439,6 +439,9 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
if (nvdimm_read_bytes(ndns, SZ_4K, pfn_sb, sizeof(*pfn_sb), 0))
return -ENXIO;
  
+	if (le16_to_cpu(pfn_sb->min_version > PFN_MIN_VERSION))

+   return -EOPNOTSUPP;


+   if (le16_to_cpu(pfn_sb->min_version) > PFN_MIN_VERSION)
+   return -EOPNOTSUPP;



-aneesh



[RFC PATCH 3/3] mm/nvdimm: Use correct #defines instead of opencoding

2019-05-22 Thread Aneesh Kumar K.V
The npfn-related change is needed to fix the kernel message

"number of pfns truncated from 2617344 to 163584"

The change makes sure the npfns stored in the superblock is the right value.

Signed-off-by: Aneesh Kumar K.V 
---
 drivers/nvdimm/label.c   | 2 +-
 drivers/nvdimm/pfn_devs.c| 6 +++---
 drivers/nvdimm/region_devs.c | 8 
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/nvdimm/label.c b/drivers/nvdimm/label.c
index f3d753d3169c..bc6de8fb0153 100644
--- a/drivers/nvdimm/label.c
+++ b/drivers/nvdimm/label.c
@@ -361,7 +361,7 @@ static bool slot_valid(struct nvdimm_drvdata *ndd,
 
/* check that DPA allocations are page aligned */
if ((__le64_to_cpu(nd_label->dpa)
-   | __le64_to_cpu(nd_label->rawsize)) % SZ_4K)
+   | __le64_to_cpu(nd_label->rawsize)) % PAGE_SIZE)
return false;
 
/* check checksum */
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 94918a4e6e73..f549bddc680c 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -765,8 +765,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 * when populating the vmemmap. This *should* be equal to
 * PMD_SIZE for most architectures.
 */
-   offset = ALIGN(start + reserve + 64 * npfns,
-   max(nd_pfn->align, PMD_SIZE)) - start;
+   offset = ALIGN(start + reserve + sizeof(struct page) * npfns,
+  max(nd_pfn->align, PMD_SIZE)) - start;
} else if (nd_pfn->mode == PFN_MODE_RAM)
offset = ALIGN(start + reserve, nd_pfn->align) - start;
else
@@ -778,7 +778,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
return -ENXIO;
}
 
-   npfns = (size - offset - start_pad - end_trunc) / SZ_4K;
+   npfns = (size - offset - start_pad - end_trunc) / PAGE_SIZE;
pfn_sb->mode = cpu_to_le32(nd_pfn->mode);
pfn_sb->dataoff = cpu_to_le64(offset);
pfn_sb->npfns = cpu_to_le64(npfns);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index b4ef7d9ff22e..2d8facea5a03 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -994,10 +994,10 @@ static struct nd_region *nd_region_create(struct 
nvdimm_bus *nvdimm_bus,
struct nd_mapping_desc *mapping = &ndr_desc->mapping[i];
struct nvdimm *nvdimm = mapping->nvdimm;
 
-   if ((mapping->start | mapping->size) % SZ_4K) {
-   dev_err(&nvdimm_bus->dev, "%s: %s mapping%d is not 4K aligned\n",
-   caller, dev_name(&nvdimm->dev), i);
-
+   if ((mapping->start | mapping->size) % PAGE_SIZE) {
+   dev_err(&nvdimm_bus->dev,
+   "%s: %s mapping%d is not 4K aligned\n",
+   caller, dev_name(&nvdimm->dev), i);
return NULL;
}
 
-- 
2.21.0



[RFC PATCH 2/3] mm/nvdimm: Add page size and struct page size to pfn superblock

2019-05-22 Thread Aneesh Kumar K.V
This is needed so that we don't wrongly initialize a namespace
which doesn't have enough space reserved for holding struct pages
with the current kernel.

We also increment PFN_MIN_VERSION to make sure that older kernel
won't initialize namespace created with newer kernel.

Signed-off-by: Aneesh Kumar K.V 
---
 drivers/nvdimm/pfn.h  |  9 ++---
 drivers/nvdimm/pfn_devs.c | 19 ++-
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/nvdimm/pfn.h b/drivers/nvdimm/pfn.h
index 1b10ae5773b6..ba11738ca8a2 100644
--- a/drivers/nvdimm/pfn.h
+++ b/drivers/nvdimm/pfn.h
@@ -25,7 +25,7 @@
  * kernel should fail to initialize that namespace.
  */
 
-#define PFN_MIN_VERSION 0
+#define PFN_MIN_VERSION 1
 
 struct nd_pfn_sb {
u8 signature[PFN_SIG_LEN];
@@ -42,8 +42,11 @@ struct nd_pfn_sb {
__le32 end_trunc;
/* minor-version-2 record the base alignment of the mapping */
__le32 align;
-   __le16 min_verison;
-   u8 padding[3998];
+   __le16 min_version;
+   /* minor-version-3 record the page size and struct page size */
+   __le16 page_struct_size;
+   __le32 page_size;
+   u8 padding[3992];
__le64 checksum;
 };
 
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 3250de70a7b3..94918a4e6e73 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -462,6 +462,15 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
if (__le16_to_cpu(pfn_sb->version_minor) < 2)
pfn_sb->align = 0;
 
+   if (__le16_to_cpu(pfn_sb->version_minor) < 3) {
+   /*
+* For a large part we use PAGE_SIZE. But we
+* do have some accounting code using SZ_4K.
+*/
+   pfn_sb->page_struct_size = cpu_to_le16(64);
+   pfn_sb->page_size = cpu_to_le32(SZ_4K);
+   }
+
switch (le32_to_cpu(pfn_sb->mode)) {
case PFN_MODE_RAM:
case PFN_MODE_PMEM:
@@ -477,6 +486,12 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
align = 1UL << ilog2(offset);
mode = le32_to_cpu(pfn_sb->mode);
 
+   if (le32_to_cpu(pfn_sb->page_size) != PAGE_SIZE)
+   return -EOPNOTSUPP;
+
+   if (le16_to_cpu(pfn_sb->page_struct_size) != sizeof(struct page))
+   return -EOPNOTSUPP;
+
if (!nd_pfn->uuid) {
/*
 * When probing a namepace via nd_pfn_probe() the uuid
@@ -771,11 +786,13 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
memcpy(pfn_sb->uuid, nd_pfn->uuid, 16);
memcpy(pfn_sb->parent_uuid, nd_dev_to_uuid(&ndns->dev), 16);
pfn_sb->version_major = cpu_to_le16(1);
-   pfn_sb->version_minor = cpu_to_le16(2);
+   pfn_sb->version_minor = cpu_to_le16(3);
pfn_sb->min_version = cpu_to_le16(PFN_MIN_VERSION);
pfn_sb->start_pad = cpu_to_le32(start_pad);
pfn_sb->end_trunc = cpu_to_le32(end_trunc);
pfn_sb->align = cpu_to_le32(nd_pfn->align);
+   pfn_sb->page_struct_size = cpu_to_le16(sizeof(struct page));
+   pfn_sb->page_size = cpu_to_le32(PAGE_SIZE);
checksum = nd_sb_checksum((struct nd_gen_sb *) pfn_sb);
pfn_sb->checksum = cpu_to_le64(checksum);
 
-- 
2.21.0



[RFC PATCH 1/3] mm/nvdimm: Add PFN_MIN_VERSION support

2019-05-22 Thread Aneesh Kumar K.V
This allows us to make changes in a backward incompatible way. I have
kept the PFN_MIN_VERSION in this patch '0' because we are not introducing
any incompatible changes in this patch. We also may want to backport this
to older kernels.

Signed-off-by: Aneesh Kumar K.V 
---
 drivers/nvdimm/pfn.h  |  9 -
 drivers/nvdimm/pfn_devs.c |  4 
 drivers/nvdimm/pmem.c | 26 ++
 3 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/nvdimm/pfn.h b/drivers/nvdimm/pfn.h
index dde9853453d3..1b10ae5773b6 100644
--- a/drivers/nvdimm/pfn.h
+++ b/drivers/nvdimm/pfn.h
@@ -20,6 +20,12 @@
 #define PFN_SIG_LEN 16
 #define PFN_SIG "NVDIMM_PFN_INFO\0"
 #define DAX_SIG "NVDIMM_DAX_INFO\0"
+/*
+ * increment this when we are making changes such that older
+ * kernel should fail to initialize that namespace.
+ */
+
+#define PFN_MIN_VERSION 0
 
 struct nd_pfn_sb {
u8 signature[PFN_SIG_LEN];
@@ -36,7 +42,8 @@ struct nd_pfn_sb {
__le32 end_trunc;
/* minor-version-2 record the base alignment of the mapping */
__le32 align;
-   u8 padding[4000];
+   __le16 min_verison;
+   u8 padding[3998];
__le64 checksum;
 };
 
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 01f40672507f..3250de70a7b3 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -439,6 +439,9 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
if (nvdimm_read_bytes(ndns, SZ_4K, pfn_sb, sizeof(*pfn_sb), 0))
return -ENXIO;
 
+   if (le16_to_cpu(pfn_sb->min_version > PFN_MIN_VERSION))
+   return -EOPNOTSUPP;
+
if (memcmp(pfn_sb->signature, sig, PFN_SIG_LEN) != 0)
return -ENODEV;
 
@@ -769,6 +772,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
memcpy(pfn_sb->parent_uuid, nd_dev_to_uuid(&ndns->dev), 16);
pfn_sb->version_major = cpu_to_le16(1);
pfn_sb->version_minor = cpu_to_le16(2);
+   pfn_sb->min_version = cpu_to_le16(PFN_MIN_VERSION);
pfn_sb->start_pad = cpu_to_le32(start_pad);
pfn_sb->end_trunc = cpu_to_le32(end_trunc);
pfn_sb->align = cpu_to_le32(nd_pfn->align);
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 845c5b430cdd..406427c064d9 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -490,6 +490,7 @@ static int pmem_attach_disk(struct device *dev,
 
 static int nd_pmem_probe(struct device *dev)
 {
+   int ret;
struct nd_namespace_common *ndns;
 
ndns = nvdimm_namespace_common_probe(dev);
@@ -505,12 +506,29 @@ static int nd_pmem_probe(struct device *dev)
if (is_nd_pfn(dev))
return pmem_attach_disk(dev, ndns);
 
-   /* if we find a valid info-block we'll come back as that personality */
-   if (nd_btt_probe(dev, ndns) == 0 || nd_pfn_probe(dev, ndns) == 0
-   || nd_dax_probe(dev, ndns) == 0)
+   ret = nd_btt_probe(dev, ndns);
+   if (ret == 0)
return -ENXIO;
+   else if (ret == -EOPNOTSUPP)
+   return ret;
 
-   /* ...otherwise we're just a raw pmem device */
+   ret = nd_pfn_probe(dev, ndns);
+   if (ret == 0)
+   return -ENXIO;
+   else if (ret == -EOPNOTSUPP)
+   return ret;
+
+   ret = nd_dax_probe(dev, ndns);
+   if (ret == 0)
+   return -ENXIO;
+   else if (ret == -EOPNOTSUPP)
+   return ret;
+   /*
+* We have two failure conditions here: there is no
+* info reserve block, or we found a valid info reserve block
+* but failed to initialize the pfn superblock.
+* Don't create a raw pmem disk for the second case.
+*/
return pmem_attach_disk(dev, ndns);
 }
 
-- 
2.21.0



Re: [BISECTED] kexec regression on PowerBook G4

2019-05-22 Thread Christophe Leroy

Hi Aaro,

On 22/05/2019 at 00:18, Aaro Koskinen wrote:

Hi,

I was trying to upgrade from v5.0 -> v5.1 on PowerBook G4, but when trying
to kexec a kernel the system gets stuck (no errors seen on the console).


Do you mean you are trying to kexec a v5.1 kernel from a v5.0 kernel, or 
do you have a working v5.1 kernel, but kexec doesn't work with it ?




Bisected to: 93c4a162b014 ("powerpc/6xx: Store PGDIR physical address
in a SPRG"). This commit doesn't revert cleanly anymore but I tested
that the one before works OK.


Not sure that's the problem. There was a problem with that commit, but 
it was fixed by 4622a2d43101 ("powerpc/6xx: fix setup and use of 
SPRN_SPRG_PGDIR for hash32").
You probably hit some commit between those two during bisect, that's 
likely the reason why you ended here.


Can you restart your bisect from 4622a2d43101 ?

If you have CONFIG_SMP, maybe you should also consider taking 
397d2300b08c ("powerpc/32s: fix flush_hash_pages() on SMP"). Stable 
5.1.4 includes it.




With current Linus HEAD (9c7db5004280), it gets a bit further but still
doesn't work: now I get an error on the console after kexec "Starting
new kernel! ... Bye!":

kernel tried to execute exec-protected page (...) - exploit attempt?


Interesting.

Do you have CONFIG_STRICT_KERNEL_RWX=y in your .config ? If so, can you 
retry without it ?


Thanks
Christophe



A.