[PATCH] powerpc/mm: Fix build error with FLATMEM book3s64 config

2019-04-03 Thread Aneesh Kumar K.V
The current value of MAX_PHYSMEM_BITS cannot work with 32 bit configs.
We used to leave MAX_PHYSMEM_BITS undefined without SPARSEMEM, and 32
bit configs never expected a value to be set for MAX_PHYSMEM_BITS.

Dependent code such as zsmalloc derived the right values based on
other fields. Instead of finding a single value that works with
different configs, use the new values only for book3s_64. For 64 bit
booke, use the definition of MAX_PHYSMEM_BITS as per commit
a7df61a0e2b6 ("[PATCH] ppc64: Increase sparsemem defaults"). That
change was done in 2005 and will hopefully work with book3e 64.
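To see what these limits imply for SPARSEMEM sizing: the number of memory sections grows with MAX_PHYSMEM_BITS. A small standalone model (assuming SECTION_SIZE_BITS of 24, i.e. 16MB sections; the helper name is illustrative, not from the patch):

```c
#include <stdint.h>

/* Illustrative model: SPARSEMEM tracks memory in fixed-size sections,
 * and the section count is 2^(MAX_PHYSMEM_BITS - SECTION_SIZE_BITS).
 * SECTION_SIZE_BITS = 24 (16MB sections) is an assumption here. */
#define SECTION_SIZE_BITS 24

static uint64_t nr_mem_sections(unsigned int max_physmem_bits)
{
	return UINT64_C(1) << (max_physmem_bits - SECTION_SIZE_BITS);
}
```

Under these assumptions, 46 bits gives 2^22 sections while 51 bits gives 2^27, which is why the larger value is gated on SPARSEMEM_EXTREME to keep the section array's memory footprint manageable.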

Fixes: 8bc086899816 ("powerpc/mm: Only define MAX_PHYSMEM_BITS in SPARSEMEM configurations")
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/mmu.h | 15 +++++++++++++++
 arch/powerpc/include/asm/mmu.h           | 15 ---------------
 arch/powerpc/include/asm/nohash/64/mmu.h |  2 ++
 3 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index 1ceee000c18d..a809bdd77322 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -35,6 +35,21 @@ typedef pte_t *pgtable_t;
 
 #endif /* __ASSEMBLY__ */
 
+/*
+ * If we store section details in page->flags we can't increase the MAX_PHYSMEM_BITS
+ * if we increase SECTIONS_WIDTH we will not store node details in page->flags and
+ * page_to_nid does a page->section->node lookup
+ * Hence only increase for VMEMMAP. Further depending on SPARSEMEM_EXTREME reduce
+ * memory requirements with large number of sections.
+ * 51 bits is the max physical real address on POWER9
+ */
+#if defined(CONFIG_SPARSEMEM_VMEMMAP) && defined(CONFIG_SPARSEMEM_EXTREME) && \
+	defined(CONFIG_PPC_64K_PAGES)
+#define MAX_PHYSMEM_BITS 51
+#else
+#define MAX_PHYSMEM_BITS 46
+#endif
+
 /* 64-bit classic hash table MMU */
 #include 
 
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index 598cdcdd1355..78d53c4396ac 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -341,21 +341,6 @@ static inline bool strict_kernel_rwx_enabled(void)
  */
 #define MMU_PAGE_COUNT 16
 
-/*
- * If we store section details in page->flags we can't increase the MAX_PHYSMEM_BITS
- * if we increase SECTIONS_WIDTH we will not store node details in page->flags and
- * page_to_nid does a page->section->node lookup
- * Hence only increase for VMEMMAP. Further depending on SPARSEMEM_EXTREME reduce
- * memory requirements with large number of sections.
- * 51 bits is the max physical real address on POWER9
- */
-#if defined(CONFIG_SPARSEMEM_VMEMMAP) && defined(CONFIG_SPARSEMEM_EXTREME) && \
-	defined (CONFIG_PPC_64K_PAGES)
-#define MAX_PHYSMEM_BITS	51
-#elif defined(CONFIG_SPARSEMEM)
-#define MAX_PHYSMEM_BITS	46
-#endif
-
 #ifdef CONFIG_PPC_BOOK3S_64
 #include 
 #else /* CONFIG_PPC_BOOK3S_64 */
diff --git a/arch/powerpc/include/asm/nohash/64/mmu.h b/arch/powerpc/include/asm/nohash/64/mmu.h
index e6585480dfc4..81cf30c370e5 100644
--- a/arch/powerpc/include/asm/nohash/64/mmu.h
+++ b/arch/powerpc/include/asm/nohash/64/mmu.h
@@ -2,6 +2,8 @@
 #ifndef _ASM_POWERPC_NOHASH_64_MMU_H_
 #define _ASM_POWERPC_NOHASH_64_MMU_H_
 
+#define MAX_PHYSMEM_BITS	44
+
 /* Freescale Book-E software loaded TLB or Book-3e (ISA 2.06+) MMU */
 #include 
 
-- 
2.20.1



Re: [PATCH 3/5] s390: Fix vDSO clock_getres()

2019-04-03 Thread Martin Schwidefsky
On Mon,  1 Apr 2019 12:51:50 +0100
Vincenzo Frascino  wrote:

> clock_getres in the vDSO library has to preserve the same behaviour
> of posix_get_hrtimer_res().
> 
> In particular, posix_get_hrtimer_res() does:
> sec = 0;
> ns = hrtimer_resolution;
> and hrtimer_resolution depends on the enablement of the high
> resolution timers that can happen either at compile or at run time.
> 
> Fix the s390 vdso implementation of clock_getres keeping a copy of
> hrtimer_resolution in vdso data and using that directly.
> 
> Cc: Martin Schwidefsky 
> Cc: Heiko Carstens 
> Signed-off-by: Vincenzo Frascino 
> ---
>  arch/s390/include/asm/vdso.h   |  1 +
>  arch/s390/kernel/asm-offsets.c |  2 +-
>  arch/s390/kernel/time.c|  1 +
>  arch/s390/kernel/vdso32/clock_getres.S | 17 -
>  arch/s390/kernel/vdso64/clock_getres.S | 15 ++-
>  5 files changed, 25 insertions(+), 11 deletions(-)

I tried this patch and in principle this works. In that regard
Acked-by: Martin Schwidefsky 

But I wonder if the loop to check the update counter is really
necessary. The hrtimer_resolution value can only change once, with
the first call to hrtimer_switch_to_hres(). With the TOD clock
as the only clock available on s390 we always have the ability
to do hrtimers. It then all depends on the highres=[on|off] kernel
parameter what value we get with clock_getres().
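The behaviour the vDSO must match can be modelled in plain C — a user-space sketch, not the s390 code; HZ and the highres switch are assumptions for illustration:

```c
#include <stdint.h>

/* Hypothetical model of posix_get_hrtimer_res(): the vDSO has to
 * report the same resolution the syscall would.  hrtimer_resolution
 * is 1 ns once highres mode is active, otherwise one jiffy in ns.
 * HZ = 100 is an assumed value for this sketch. */
#define NSEC_PER_SEC 1000000000u
#define HZ 100u

struct timespec_model { int64_t tv_sec; int64_t tv_nsec; };

static struct timespec_model clock_getres_model(int highres_enabled)
{
	uint32_t hrtimer_resolution =
		highres_enabled ? 1u : NSEC_PER_SEC / HZ;
	struct timespec_model ts = { .tv_sec = 0, .tv_nsec = hrtimer_resolution };

	return ts;
}
```

Since the value switches at most once at boot, a vDSO that copies it into the vdso data page reports 1 ns with highres on, or one tick (10 ms at the assumed HZ=100) with highres=off.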

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: [PATCH] powerpc: config: skiroot: Add (back) MLX5 ethernet support

2019-04-03 Thread Mathieu Malaterre
On Wed, Apr 3, 2019 at 2:51 AM Joel Stanley  wrote:
>
> It turns out that some defconfig changes and kernel config option
> changes meant we accidentally dropped Ethernet support for Mellanox CX5
> cards.
>
> Reported-by: Carol L Soto 
> Suggested-by: Carol L Soto 
> Signed-off-by: Stewart Smith 
> Signed-off-by: Joel Stanley 

Fixes: cbc39809a398 ("powerpc/configs: Update skiroot defconfig")

?

> ---
>  arch/powerpc/configs/skiroot_defconfig | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/powerpc/configs/skiroot_defconfig b/arch/powerpc/configs/skiroot_defconfig
> index 5ba131c30f6b..6038b9347d9e 100644
> --- a/arch/powerpc/configs/skiroot_defconfig
> +++ b/arch/powerpc/configs/skiroot_defconfig
> @@ -163,6 +163,8 @@ CONFIG_S2IO=m
>  CONFIG_MLX4_EN=m
>  # CONFIG_MLX4_CORE_GEN2 is not set
>  CONFIG_MLX5_CORE=m
> +CONFIG_MLX5_CORE_EN=y
> +# CONFIG_MLX5_EN_RXNFC is not set
>  # CONFIG_NET_VENDOR_MICREL is not set
>  # CONFIG_NET_VENDOR_MICROSEMI is not set
>  CONFIG_MYRI10GE=m
> --
> 2.20.1
>


[PATCH -next] ibmvnic: remove set but not used variable 'netdev'

2019-04-03 Thread Yue Haibing
From: YueHaibing 

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/ibm/ibmvnic.c: In function '__ibmvnic_reset':
drivers/net/ethernet/ibm/ibmvnic.c:1971:21: warning: variable 'netdev' set but not used [-Wunused-but-set-variable]

It has never been used since its introduction in
commit ed651a10875f ("ibmvnic: Updated reset handling")
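For reference, the warning fires on exactly this pattern — a variable that is assigned but whose value is never read. A minimal reproduction (hypothetical code, compile with gcc -Wunused-but-set-variable):

```c
/* Minimal reproduction of gcc's -Wunused-but-set-variable: 'cached'
 * is written but never read, so the store is dead and the variable
 * can be removed outright -- the same situation as 'netdev' here. */
static int second_entry(const int *table)
{
	int cached;          /* gcc: variable 'cached' set but not used */

	cached = table[0];   /* dead store, like netdev = adapter->netdev */

	return table[1];     /* 'cached' never contributes to the result */
}
```

Deleting the variable and its assignment, as this patch does, is the idiomatic fix; casting to void would only hide the dead store.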

Signed-off-by: YueHaibing 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 25b8e04..20c4e08 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1968,13 +1968,11 @@ static void __ibmvnic_reset(struct work_struct *work)
 {
struct ibmvnic_rwi *rwi;
struct ibmvnic_adapter *adapter;
-   struct net_device *netdev;
bool we_lock_rtnl = false;
u32 reset_state;
int rc = 0;
 
adapter = container_of(work, struct ibmvnic_adapter, ibmvnic_reset);
-   netdev = adapter->netdev;
 
/* netif_set_real_num_xx_queues needs to take rtnl lock here
 * unless wait_for_reset is set, in which case the rtnl lock
-- 
2.7.0




Re: [PATCH -next] ibmvnic: remove set but not used variable 'netdev'

2019-04-03 Thread Mukesh Ojha



On 4/3/2019 1:24 PM, Yue Haibing wrote:

From: YueHaibing 

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/ibm/ibmvnic.c: In function '__ibmvnic_reset':
drivers/net/ethernet/ibm/ibmvnic.c:1971:21: warning: variable 'netdev' set but not used [-Wunused-but-set-variable]

It has never been used since its introduction in
commit ed651a10875f ("ibmvnic: Updated reset handling")

Signed-off-by: YueHaibing 

Reviewed-by: Mukesh Ojha 

Cheers,
-Mukesh

---
  drivers/net/ethernet/ibm/ibmvnic.c | 2 --
  1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 25b8e04..20c4e08 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1968,13 +1968,11 @@ static void __ibmvnic_reset(struct work_struct *work)
  {
struct ibmvnic_rwi *rwi;
struct ibmvnic_adapter *adapter;
-   struct net_device *netdev;
bool we_lock_rtnl = false;
u32 reset_state;
int rc = 0;
  
	adapter = container_of(work, struct ibmvnic_adapter, ibmvnic_reset);
-   netdev = adapter->netdev;

	/* netif_set_real_num_xx_queues needs to take rtnl lock here
	 * unless wait_for_reset is set, in which case the rtnl lock


Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

2019-04-03 Thread Will Deacon
Hi Michael,

On Wed, Apr 03, 2019 at 01:47:50PM +1100, Michael Ellerman wrote:
> Arnd Bergmann  writes:
> > diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
> > index b18abb0c3dae..00f5a63c8d9a 100644
> > --- a/arch/powerpc/kernel/syscalls/syscall.tbl
> > +++ b/arch/powerpc/kernel/syscalls/syscall.tbl
> > @@ -505,3 +505,7 @@
> >  42132  rt_sigtimedwait_time64  sys_rt_sigtimedwait 
> > compat_sys_rt_sigtimedwait_time64
> >  42232  futex_time64sys_futex   
> > sys_futex
> >  42332  sched_rr_get_interval_time64
> > sys_sched_rr_get_interval   sys_sched_rr_get_interval
> > +424common  pidfd_send_signal   sys_pidfd_send_signal
> > +425common  io_uring_setup  sys_io_uring_setup
> > +426common  io_uring_enter  sys_io_uring_enter
> > +427common  io_uring_register   sys_io_uring_register
> 
> Acked-by: Michael Ellerman  (powerpc)
> 
> Lightly tested.
> 
> The pidfd_test selftest passes.

That reports pass for me too, although it fails to unshare the pid ns, which I
assume is benign.

> Ran the io_uring example from fio, which prints lots of:

How did you invoke that? I had a play with the tests in:

  git://git.kernel.dk/liburing

but I quickly ran into the kernel oops below.

Will

--->8

will@autoplooker:~/liburing/test$ ./io_uring_register 
RELIMIT_MEMLOCK: 67108864 (67108864)
[   35.477875] Unable to handle kernel NULL pointer dereference at virtual 
address 0070
[   35.478969] Mem abort info:
[   35.479296]   ESR = 0x9604
[   35.479785]   Exception class = DABT (current EL), IL = 32 bits
[   35.480528]   SET = 0, FnV = 0
[   35.480980]   EA = 0, S1PTW = 0
[   35.481345] Data abort info:
[   35.481680]   ISV = 0, ISS = 0x0004
[   35.482267]   CM = 0, WnR = 0
[   35.482618] user pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval)
[   35.483486] [0070] pgd=
[   35.484041] Internal error: Oops: 9604 [#1] PREEMPT SMP
[   35.484788] Modules linked in:
[   35.485311] CPU: 113 PID: 3973 Comm: io_uring_regist Not tainted 
5.1.0-rc3-00012-g40b114779944 #1
[   35.486712] Hardware name: linux,dummy-virt (DT)
[   35.487450] pstate: 2045 (nzCv daif +PAN -UAO)
[   35.488228] pc : link_pwq+0x10/0x60
[   35.488794] lr : apply_wqattrs_commit+0xe0/0x118
[   35.489550] sp : 17e2bbc0
[   35.490088] x29: 17e2bbc0 x28: 8004b9118000 
[   35.490939] x27:  x26: 8004c21c4200 
[   35.491786] x25: 0004 x24: 1123e1b0 
[   35.492640] x23: 8004c539 x22: 8004bb440500 
[   35.493502] x21: 8004bb440500 x20: 0070 
[   35.494355] x19: 0022 x18:  
[   35.495202] x17:  x16:  
[   35.496054] x15:  x14: 7e0012e8a240 
[   35.496910] x13: 4a73a5e663e2 x12:  
[   35.497764] x11: 0001 x10: 0070 
[   35.498611] x9 : 8004cb49d610 x8 :  
[   35.499462] x7 : 8004c4ff9c70 x6 : 8004cb49ccb0 
[   35.500308] x5 : 8004c66cc4c0 x4 : 0001 
[   35.501173] x3 :  x2 : 0040 
[   35.502019] x1 : 0004 x0 :  
[   35.502872] Process io_uring_regist (pid: 3973, stack limit = 
0x(ptrval))
[   35.504052] Call trace:
[   35.504463]  link_pwq+0x10/0x60
[   35.504987]  apply_wqattrs_commit+0xe0/0x118
[   35.505681]  apply_workqueue_attrs_locked+0x3c/0x80
[   35.506460]  apply_workqueue_attrs+0x3c/0x60
[   35.507152]  alloc_workqueue+0x264/0x430
[   35.507786]  io_uring_setup+0x478/0x6a8
[   35.508414]  __arm64_sys_io_uring_setup+0x18/0x20
[   35.509183]  el0_svc_common+0x80/0xf0
[   35.509786]  el0_svc_handler+0x2c/0x80
[   35.510393]  el0_svc+0x8/0xc
[   35.510873] Code: a9bd7bfd 910003fd a90153f3 9101c014 (f9403802) 
[   35.511843] ---[ end trace 0a53e45ee26def4c ]---
Segmentation fault


Re: [PATCH v3 5/5] Lib: sort.h: remove the size argument from the swap function

2019-04-03 Thread Andy Shevchenko
On Tue, Apr 02, 2019 at 11:55:25PM +0300, Andrey Abramov wrote:
> Removes size argument from the swap function because:
>   1) It wasn't used.
>   2) Custom swap function knows what kind of objects it swaps,
>   so it already knows their sizes.
> 
> Signed-off-by: Andrey Abramov 
> Reviewed-by: George Spelvin 

FWIW,
Reviewed-by: Andy Shevchenko 
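The argument above — a typed swap callback already knows its element size — can be illustrated with a small standalone example (hypothetical struct, mirroring the new two-argument signature the patch introduces):

```c
/* Hypothetical element type: the callback knows it swaps two of
 * these, so no size parameter is needed -- this matches the new
 * (void *, void *) swap signature. */
struct entry {
	int key;
	int val;
};

static void entry_swap(void *_a, void *_b)
{
	struct entry *a = _a, *b = _b;
	struct entry tmp = *a;	/* size comes from the type itself */

	*a = *b;
	*b = tmp;
}
```

Compare orc_sort_swap() in the patch below, which likewise swaps whole struct orc_entry values without ever consulting the removed size argument.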

> ---
>  arch/x86/kernel/unwind_orc.c | 2 +-
>  include/linux/sort.h | 2 +-
>  kernel/jump_label.c  | 2 +-
>  lib/extable.c| 2 +-
>  lib/sort.c   | 7 +++
>  5 files changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
> index 89be1be1790c..dc410b567189 100644
> --- a/arch/x86/kernel/unwind_orc.c
> +++ b/arch/x86/kernel/unwind_orc.c
> @@ -176,7 +176,7 @@ static struct orc_entry *orc_find(unsigned long ip)
>   return orc_ftrace_find(ip);
>  }
>  
> -static void orc_sort_swap(void *_a, void *_b, int size)
> +static void orc_sort_swap(void *_a, void *_b)
>  {
>   struct orc_entry *orc_a, *orc_b;
>   struct orc_entry orc_tmp;
> diff --git a/include/linux/sort.h b/include/linux/sort.h
> index 2b99a5dd073d..13bb4635b5f1 100644
> --- a/include/linux/sort.h
> +++ b/include/linux/sort.h
> @@ -6,6 +6,6 @@
>  
>  void sort(void *base, size_t num, size_t size,
> int (*cmp)(const void *, const void *),
> -   void (*swap)(void *, void *, int));
> +   void (*swap)(void *, void *));
>  
>  #endif
> diff --git a/kernel/jump_label.c b/kernel/jump_label.c
> index bad96b476eb6..6b1187b8a060 100644
> --- a/kernel/jump_label.c
> +++ b/kernel/jump_label.c
> @@ -45,7 +45,7 @@ static int jump_label_cmp(const void *a, const void *b)
>   return 0;
>  }
>  
> -static void jump_label_swap(void *a, void *b, int size)
> +static void jump_label_swap(void *a, void *b)
>  {
>   long delta = (unsigned long)a - (unsigned long)b;
>   struct jump_entry *jea = a;
> diff --git a/lib/extable.c b/lib/extable.c
> index f54996fdd0b8..0515a94538ca 100644
> --- a/lib/extable.c
> +++ b/lib/extable.c
> @@ -28,7 +28,7 @@ static inline unsigned long ex_to_insn(const struct exception_table_entry *x)
>  #ifndef ARCH_HAS_RELATIVE_EXTABLE
>  #define swap_ex  NULL
>  #else
> -static void swap_ex(void *a, void *b, int size)
> +static void swap_ex(void *a, void *b)
>  {
>   struct exception_table_entry *x = a, *y = b, tmp;
>   int delta = b - a;
> diff --git a/lib/sort.c b/lib/sort.c
> index 50855ea8c262..8704750e6bde 100644
> --- a/lib/sort.c
> +++ b/lib/sort.c
> @@ -114,7 +114,7 @@ static void swap_bytes(void *a, void *b, size_t n)
>   } while (n);
>  }
>  
> -typedef void (*swap_func_t)(void *a, void *b, int size);
> +typedef void (*swap_func_t)(void *a, void *b);
>  
>  /*
>   * The values are arbitrary as long as they can't be confused with
> @@ -138,7 +138,7 @@ static void do_swap(void *a, void *b, size_t size, swap_func_t swap_func)
>   else if (swap_func == SWAP_BYTES)
>   swap_bytes(a, b, size);
>   else
> - swap_func(a, b, (int)size);
> + swap_func(a, b);
>  }
>  
>  /**
> @@ -186,8 +186,7 @@ static size_t parent(size_t i, unsigned int lsbit, size_t size)
>   * it less suitable for kernel use.
>   */
>  void sort(void *base, size_t num, size_t size,
> -   int (*cmp_func)(const void *, const void *),
> -   void (*swap_func)(void *, void *, int size))
> +   int (*cmp_func)(const void *, const void *), swap_func_t swap_func)
>  {
>   /* pre-scale counters for performance */
>   size_t n = num * size, a = (num/2) * size;
> -- 
> 2.21.0
> 
> 

-- 
With Best Regards,
Andy Shevchenko




Re: [PATCH 3/5] s390: Fix vDSO clock_getres()

2019-04-03 Thread Thomas Gleixner
On Wed, 3 Apr 2019, Martin Schwidefsky wrote:

> On Mon,  1 Apr 2019 12:51:50 +0100
> Vincenzo Frascino  wrote:
> 
> > clock_getres in the vDSO library has to preserve the same behaviour
> > of posix_get_hrtimer_res().
> > 
> > In particular, posix_get_hrtimer_res() does:
> > sec = 0;
> > ns = hrtimer_resolution;
> > and hrtimer_resolution depends on the enablement of the high
> > resolution timers that can happen either at compile or at run time.
> > 
> > Fix the s390 vdso implementation of clock_getres keeping a copy of
> > hrtimer_resolution in vdso data and using that directly.
> > 
> > Cc: Martin Schwidefsky 
> > Cc: Heiko Carstens 
> > Signed-off-by: Vincenzo Frascino 
> > ---
> >  arch/s390/include/asm/vdso.h   |  1 +
> >  arch/s390/kernel/asm-offsets.c |  2 +-
> >  arch/s390/kernel/time.c|  1 +
> >  arch/s390/kernel/vdso32/clock_getres.S | 17 -
> >  arch/s390/kernel/vdso64/clock_getres.S | 15 ++-
> >  5 files changed, 25 insertions(+), 11 deletions(-)
> 
> I tried this patch and in principle this works. In that regard
> Acked-by: Martin Schwidefsky 
> 
> But I wonder if the loop to check the update counter is really
> necessary. The hrtimer_resolution value can only changes once with
> the first call to hrtimer_switch_to_hres(). With the TOD clock
> as the only clock available on s390 we always have the ability
> to do hrtimer. It then all depends on the highres=[on|off] kernel
> parameter what value we get with clock_getres().

Yes, it's not changing after boot anymore.

Thanks,

tglx


RE: [PATCH] ASoC: fsl_esai: Support synchronous mode

2019-04-03 Thread S.j. Wang
Hi

> 
> > > On Mon, Apr 01, 2019 at 11:39:10AM +, S.j. Wang wrote:
> > > > In ESAI synchronous mode, the clock is generated by Tx, So we
> > > > should always set registers of Tx which relate with the bit clock
> > > > and frame clock generation (TCCR, TCR, ECR), even there is only Rx is
> working.
> > > >
> > > > Signed-off-by: Shengjiu Wang 
> > > > ---
> > > >  sound/soc/fsl/fsl_esai.c | 28 +++-
> > > >  1 file changed, 27 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/sound/soc/fsl/fsl_esai.c b/sound/soc/fsl/fsl_esai.c
> > > > index 3623aa9a6f2e..d9fcddd55c02 100644
> > > > --- a/sound/soc/fsl/fsl_esai.c
> > > > +++ b/sound/soc/fsl/fsl_esai.c
> > > > @@ -230,6 +230,21 @@ static int fsl_esai_set_dai_sysclk(struct
> > > snd_soc_dai *dai, int clk_id,
> > > > return -EINVAL;
> > > > }
> > > >
> > > > +   if (esai_priv->synchronous && !tx) {
> > > > +   switch (clk_id) {
> > > > +   case ESAI_HCKR_FSYS:
> > > > +   fsl_esai_set_dai_sysclk(dai, ESAI_HCKT_FSYS,
> > > > +   freq, dir);
> > > > +   break;
> > > > +   case ESAI_HCKR_EXTAL:
> > > > +   fsl_esai_set_dai_sysclk(dai, ESAI_HCKT_EXTAL,
> > > > +   freq, dir);
> > >
> > > Not sure why you call set_dai_sysclk inside set_dai_sysclk again. It
> > > feels very confusing to do so, especially without a comments.
> >
> > For sync mode, only RX is enabled,  the register of tx should be set,
> > so call the Set_dai_sysclk again.
> 
> Yea, I understood that. But why not just replace RX with TX on the register-
> writing level? Do we need to set both TCCR and RCCR? Your change in
> hw_params() only sets TCCR inside fsl_esai_set_bclk(), so we probably only
> need to change TCCR for recordings running in sync mode, right?
> 
> From the commit message, it feels like that only the clock-related fields in
> the TX registers need to be set. Things like calculation and setting the
> direction of HCKx pin don't need to run again.
> 
> > > > @@ -537,10 +552,21 @@ static int fsl_esai_hw_params(struct
> > > > snd_pcm_substream *substream,
> > > >
> > > > bclk = params_rate(params) * slot_width * esai_priv->slots;
> > > >
> > > > -   ret = fsl_esai_set_bclk(dai, tx, bclk);
> > > > +   ret = fsl_esai_set_bclk(dai, esai_priv->synchronous ? true : tx,
> > > > +bclk);
> > > > if (ret)
> > > > return ret;
> > > >
> > > > +   if (esai_priv->synchronous && !tx) {
> > > > +   /* Use Normal mode to support monaural audio */
> > > > +   regmap_update_bits(esai_priv->regmap, REG_ESAI_TCR,
> > > > +  ESAI_xCR_xMOD_MASK,
> > > params_channels(params) > 1 ?
> > > > +  ESAI_xCR_xMOD_NETWORK : 0);
> > > > +
> > > > +   mask = ESAI_xCR_xSWS_MASK | ESAI_xCR_PADC;
> > > > +   val = ESAI_xCR_xSWS(slot_width, width) | ESAI_xCR_PADC;
> > > > +   regmap_update_bits(esai_priv->regmap, REG_ESAI_TCR,
> > > mask, val);
> > > > +   }
> > >
> > > Does synchronous mode require to set both TCR and RCR? or just TCR?
> 
> > Both TCR and RCR.  RCR will be set in normal flow.
> 
> OK. Settings both xCRs makes sense. Would you please try this:
> 
> ===
> @@ -537,14 +552,20 @@ static int fsl_esai_hw_params(struct
> snd_pcm_substream *substream,
> 
> bclk = params_rate(params) * slot_width * esai_priv->slots;
> 
> -   ret = fsl_esai_set_bclk(dai, tx, bclk);
> +   /* Synchronous mode uses TX clock generator */
> +   ret = fsl_esai_set_bclk(dai, esai_priv->synchronous || tx,
> + bclk);
> if (ret)
> return ret;
> 
> +   mask = ESAI_xCR_xMOD_MASK | ESAI_xCR_xSWS_MASK;
> +   val = ESAI_xCR_xSWS(slot_width, width);
> /* Use Normal mode to support monaural audio */
> -   regmap_update_bits(esai_priv->regmap, REG_ESAI_xCR(tx),
> -  ESAI_xCR_xMOD_MASK, params_channels(params) > 1 ?
> -  ESAI_xCR_xMOD_NETWORK : 0);
> +   val |= params_channels(params) > 1 ? ESAI_xCR_xMOD_NETWORK : 0;
> +
> +   regmap_update_bits(esai_priv->regmap, REG_ESAI_xCR(tx), mask, val);
> +   /* Recording in synchronous mode needs to set TCR also */
> +   if (!tx && esai_priv->synchronous)
> +   regmap_update_bits(esai_priv->regmap, REG_ESAI_TCR,
> + mask, val);
> 
> regmap_update_bits(esai_priv->regmap, REG_ESAI_xFCR(tx),
>ESAI_xFCR_xFR_MASK, ESAI_xFCR_xFR); @@ -556,10
> +577,10 @@ static int fsl_esai_hw_params(struct snd_pcm_substream
> *substream,
> 
> regmap_update_bits(esai_priv->regmap, REG_ESAI_xFCR(tx), mask, val);
> 
> -   mask = ESAI_xCR_xSWS_MASK | (tx ? ESAI_xCR_PADC : 0);
> -   

[PATCH V2] ASoC: fsl_esai: Support synchronous mode

2019-04-03 Thread S.j. Wang
In ESAI synchronous mode, the clock is generated by Tx, so
we should always set the Tx registers that relate to bit
clock and frame clock generation (TCCR, TCR, ECR), even
when only Rx is working.

Signed-off-by: Shengjiu Wang 
---
changes in v2
- refine the patch according to Nicolin's comments
- merge plain setting.

 sound/soc/fsl/fsl_esai.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/sound/soc/fsl/fsl_esai.c b/sound/soc/fsl/fsl_esai.c
index 3623aa9a6f2e..f2e7b27f8447 100644
--- a/sound/soc/fsl/fsl_esai.c
+++ b/sound/soc/fsl/fsl_esai.c
@@ -218,7 +218,7 @@ static int fsl_esai_set_dai_sysclk(struct snd_soc_dai *dai, 
int clk_id,
 {
struct fsl_esai *esai_priv = snd_soc_dai_get_drvdata(dai);
struct clk *clksrc = esai_priv->extalclk;
-   bool tx = clk_id <= ESAI_HCKT_EXTAL;
+   bool tx = (clk_id <= ESAI_HCKT_EXTAL || esai_priv->synchronous);
bool in = dir == SND_SOC_CLOCK_IN;
u32 ratio, ecr = 0;
unsigned long clk_rate;
@@ -253,7 +253,7 @@ static int fsl_esai_set_dai_sysclk(struct snd_soc_dai *dai, 
int clk_id,
ecr |= ESAI_ECR_ETI;
/* fall through */
case ESAI_HCKR_EXTAL:
-   ecr |= ESAI_ECR_ERI;
+   ecr |= esai_priv->synchronous ? ESAI_ECR_ETI : ESAI_ECR_ERI;
break;
default:
return -EINVAL;
@@ -537,10 +537,18 @@ static int fsl_esai_hw_params(struct snd_pcm_substream 
*substream,
 
bclk = params_rate(params) * slot_width * esai_priv->slots;
 
-   ret = fsl_esai_set_bclk(dai, tx, bclk);
+   ret = fsl_esai_set_bclk(dai, esai_priv->synchronous || tx, bclk);
if (ret)
return ret;
 
+   mask = ESAI_xCR_xSWS_MASK;
+   val = ESAI_xCR_xSWS(slot_width, width);
+
+   regmap_update_bits(esai_priv->regmap, REG_ESAI_xCR(tx), mask, val);
+   /* Recording in synchronous mode needs to set TCR also */
+   if (!tx && esai_priv->synchronous)
+   regmap_update_bits(esai_priv->regmap, REG_ESAI_TCR, mask, val);
+
/* Use Normal mode to support monaural audio */
regmap_update_bits(esai_priv->regmap, REG_ESAI_xCR(tx),
   ESAI_xCR_xMOD_MASK, params_channels(params) > 1 ?
@@ -556,10 +564,9 @@ static int fsl_esai_hw_params(struct snd_pcm_substream 
*substream,
 
regmap_update_bits(esai_priv->regmap, REG_ESAI_xFCR(tx), mask, val);
 
-   mask = ESAI_xCR_xSWS_MASK | (tx ? ESAI_xCR_PADC : 0);
-   val = ESAI_xCR_xSWS(slot_width, width) | (tx ? ESAI_xCR_PADC : 0);
-
-   regmap_update_bits(esai_priv->regmap, REG_ESAI_xCR(tx), mask, val);
+   if (tx)
+   regmap_update_bits(esai_priv->regmap, REG_ESAI_TCR,
+   ESAI_xCR_PADC, ESAI_xCR_PADC);
 
/* Remove ESAI personal reset by configuring ESAI_PCRC and ESAI_PRRC */
regmap_update_bits(esai_priv->regmap, REG_ESAI_PRRC,
-- 
1.9.1
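The core rule of the patch — clock-producing register writes always target the Tx side when the ESAI runs synchronously, even for a record stream — can be reduced to one predicate (a hypothetical user-space model, not driver code):

```c
#include <stdbool.h>

/* Model of the direction fix: in synchronous mode the bit/frame clock
 * is generated by Tx, so clock-related writes must go to the Tx
 * register bank even for a record (Rx) stream.  Names are
 * illustrative, not the driver's. */
enum clk_bank { BANK_TX, BANK_RX };

static enum clk_bank esai_clk_bank(bool tx_stream, bool synchronous)
{
	return (tx_stream || synchronous) ? BANK_TX : BANK_RX;
}
```

This is the same condition the patch passes to fsl_esai_set_bclk() as `esai_priv->synchronous || tx`.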



Re: [5/7] cpufreq/pasemi: Checking implementation of pas_cpufreq_cpu_init()

2019-04-03 Thread Markus Elfring
> @@ -146,6 +146,7 @@ static int pas_cpufreq_cpu_init(struct cpufreq_policy *policy)
>
>   cpu = of_get_cpu_node(policy->cpu, NULL);
>
> + of_node_put(cpu);
>   if (!cpu)
>   goto out;

Would the statement “return -ENODEV” be nicer as exception handling
in the if branch at this place in the source code?
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/cpufreq/pasemi-cpufreq.c?id=bf97b82f37c6d90e16de001d0659644c57fa490d#n137

Regards,
Markus


Re: [PATCH 1/6] mm: change locked_vm's type from unsigned long to atomic64_t

2019-04-03 Thread Daniel Jordan
On Wed, Apr 03, 2019 at 06:46:07AM +0200, Christophe Leroy wrote:
> 
> 
> Le 02/04/2019 à 22:41, Daniel Jordan a écrit :
> > Taking and dropping mmap_sem to modify a single counter, locked_vm, is
> > overkill when the counter could be synchronized separately.
> > 
> > Make mmap_sem a little less coarse by changing locked_vm to an atomic,
> > the 64-bit variety to avoid issues with overflow on 32-bit systems.
> 
> Can you elaborate on the above ? Previously it was 'unsigned long', what
> were the issues ?

Sure, I responded to this in another thread from this series.

> If there was such issues, shouldn't there be a first patch
> moving it from unsigned long to u64 before this atomic64_t change ? Or at
> least it should be clearly explain here what the issues are and how
> switching to a 64 bit counter fixes them.

Yes, I can explain the motivation in the next version.


Re: [PATCH 1/6] mm: change locked_vm's type from unsigned long to atomic64_t

2019-04-03 Thread Daniel Jordan
On Tue, Apr 02, 2019 at 04:43:57PM -0700, Davidlohr Bueso wrote:
> On Tue, 02 Apr 2019, Andrew Morton wrote:
> 
> > Also, we didn't remove any down_write(mmap_sem)s from core code so I'm
> > thinking that the benefit of removing a few mmap_sem-takings from a few
> > obscure drivers (sorry ;)) is pretty small.
> 
> afaik porting the remaining incorrect users of locked_vm to pinned_vm was
> the next step before this one, which made converting locked_vm to atomic
> hardly worth it. Daniel?
 
Right, as you know I tried those incorrect users first, but there were concerns
about user-visible changes regarding RLIMIT_MEMLOCK and pinned_vm/locked_vm
without the accounting problem between all three being solved.

To my knowledge no one has a solution for that, so in the meantime I'm taking
the incremental step of getting rid of mmap_sem for locked_vm users.  The
locked_vm -> pinned_vm conversion can happen later.


Re: [PATCH 5/6] powerpc/mmu: drop mmap_sem now that locked_vm is atomic

2019-04-03 Thread Daniel Jordan
On Wed, Apr 03, 2019 at 06:58:45AM +0200, Christophe Leroy wrote:
> Le 02/04/2019 à 22:41, Daniel Jordan a écrit :
> > With locked_vm now an atomic, there is no need to take mmap_sem as
> > writer.  Delete and refactor accordingly.
> 
> Could you please detail the change ?

Ok, I'll be more specific in the next version, using some of your language in
fact.  :)

> It looks like this is not the only
> change. I'm wondering what the consequences are.
> 
> Before we did:
> - lock
> - calculate future value
> - check the future value is acceptable
> - update value if future value acceptable
> - return error if future value non acceptable
> - unlock
> 
> Now we do:
> - atomic update with future (possibly too high) value
> - check the new value is acceptable
> - atomic update back with older value if new value not acceptable and return
> error
> 
> So if a concurrent action wants to increase locked_vm with an acceptable
> step while another one has temporarily set it too high, it will now fail.
> 
> I think we should keep the previous approach and do a cmpxchg after
> validating the new value.

That's a good idea, and especially worth doing considering that an arbitrary
number of threads that charge a low amount of locked_vm can fail just because
one thread charges lots of it.

pinned_vm appears to be broken the same way, so I can fix it too unless someone
beats me to it.
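Christophe's suggested ordering — validate the prospective value, then publish it with a compare-and-swap — might look like this in a user-space sketch (C11 atomics standing in for atomic64_t; the limit parameter is an assumption for illustration):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the validate-then-cmpxchg approach: the counter is never
 * published with an over-limit value, so concurrent chargers cannot
 * spuriously fail against another thread's transient overshoot. */
static _Atomic int64_t locked_vm;

static bool locked_vm_try_charge(int64_t npages, int64_t limit)
{
	int64_t old = atomic_load(&locked_vm);
	int64_t new;

	do {
		new = old + npages;
		if (new > limit)
			return false;	/* reject before updating the counter */
	} while (!atomic_compare_exchange_weak(&locked_vm, &old, new));

	return true;
}
```

Unlike the add-then-undo scheme, a failed charge here leaves the counter untouched, so a concurrent small charge within the limit still succeeds.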


Re: Question about Power8/9, PHB3/4 and setting of DMA mask

2019-04-03 Thread Christoph Hellwig
On Sun, Mar 31, 2019 at 12:50:21PM +0300, Oded Gabbay wrote:
> Due to some limitation in Goya, the driver first need to allocate a
> 2MB chunk in a DMA-able address under 39 bits and then we would like
> to move to using up to 48 bits. Therefore, the driver first tries to
> set the DMA mask to 39 bits, allocate the 2MB area and later on,
> change the DMA mask to 48 bits. On x86 this works fine.

You can't just change the DMA mask while you have active allocations,
this will fail for many implementations.


Re: [PATCH V2] ASoC: fsl_esai: Support synchronous mode

2019-04-03 Thread Nicolin Chen
This looks better :)

On Wed, Apr 03, 2019 at 10:07:40AM +, S.j. Wang wrote:
> @@ -218,7 +218,7 @@ static int fsl_esai_set_dai_sysclk(struct snd_soc_dai 
> *dai, int clk_id,
>  {
>   struct fsl_esai *esai_priv = snd_soc_dai_get_drvdata(dai);
>   struct clk *clksrc = esai_priv->extalclk;
> - bool tx = clk_id <= ESAI_HCKT_EXTAL;
> + bool tx = (clk_id <= ESAI_HCKT_EXTAL || esai_priv->synchronous);
>   bool in = dir == SND_SOC_CLOCK_IN;
>   u32 ratio, ecr = 0;
>   unsigned long clk_rate;
> @@ -253,7 +253,7 @@ static int fsl_esai_set_dai_sysclk(struct snd_soc_dai 
> *dai, int clk_id,
>   ecr |= ESAI_ECR_ETI;
>   /* fall through */

Btw, I am also wondering if the fall-through here is a bug,
because I don't recall any specific reason to fall through
here. Can you please help confirm? Perhaps we need to submit
a separate fix as well, replacing it with a "break;".

>   case ESAI_HCKR_EXTAL:
> - ecr |= ESAI_ECR_ERI;
> + ecr |= esai_priv->synchronous ? ESAI_ECR_ETI : ESAI_ECR_ERI;
>   break;
>   default:
>   return -EINVAL;

> @@ -537,10 +537,18 @@ static int fsl_esai_hw_params(struct snd_pcm_substream 
> *substream,
>  
>   bclk = params_rate(params) * slot_width * esai_priv->slots;
>  
> - ret = fsl_esai_set_bclk(dai, tx, bclk);
> + ret = fsl_esai_set_bclk(dai, esai_priv->synchronous || tx, bclk);
>   if (ret)
>   return ret;
>  
> + mask = ESAI_xCR_xSWS_MASK;
> + val = ESAI_xCR_xSWS(slot_width, width);
> +
> + regmap_update_bits(esai_priv->regmap, REG_ESAI_xCR(tx), mask, val);
> + /* Recording in synchronous mode needs to set TCR also */
> + if (!tx && esai_priv->synchronous)
> + regmap_update_bits(esai_priv->regmap, REG_ESAI_TCR, mask, val);
> +
>   /* Use Normal mode to support monaural audio */
>   regmap_update_bits(esai_priv->regmap, REG_ESAI_xCR(tx),
>  ESAI_xCR_xMOD_MASK, params_channels(params) > 1 ?
> @@ -556,10 +564,9 @@ static int fsl_esai_hw_params(struct snd_pcm_substream 
> *substream,
>  
>   regmap_update_bits(esai_priv->regmap, REG_ESAI_xFCR(tx), mask, val);
>  
> - mask = ESAI_xCR_xSWS_MASK | (tx ? ESAI_xCR_PADC : 0);
> - val = ESAI_xCR_xSWS(slot_width, width) | (tx ? ESAI_xCR_PADC : 0);
> -
> - regmap_update_bits(esai_priv->regmap, REG_ESAI_xCR(tx), mask, val);
> + if (tx)
> + regmap_update_bits(esai_priv->regmap, REG_ESAI_TCR,
> + ESAI_xCR_PADC, ESAI_xCR_PADC);

Mind aligning the indentation here like the one below?
regmap_update_bits(esai_priv->regmap, REG_ESAI_TCR,
   ESAI_xCR_PADC, ESAI_xCR_PADC);

Once you fix the indentation, add this:

Acked-by: Nicolin Chen 

Thanks


[PATCH v1 04/15] powerpc/mm: move pgtable_t in asm/mmu.h

2019-04-03 Thread Christophe Leroy
pgtable_t is now identical for all subarches, so move it to the
top level asm/mmu.h.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/mmu-hash.h | 4 
 arch/powerpc/include/asm/book3s/64/mmu.h  | 8 
 arch/powerpc/include/asm/mmu.h| 3 +++
 arch/powerpc/include/asm/nohash/32/mmu.h  | 6 --
 arch/powerpc/include/asm/nohash/64/mmu.h  | 6 --
 5 files changed, 3 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
index 5cb588395fdc..2612d7a1688c 100644
--- a/arch/powerpc/include/asm/book3s/32/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
@@ -10,8 +10,6 @@
  * BATs
  */
 
-#include 
-
 /* Block size masks */
 #define BL_128K	0x000
 #define BL_256K 0x001
@@ -49,8 +47,6 @@ struct ppc_bat {
u32 batu;
u32 batl;
 };
-
-typedef pte_t *pgtable_t;
 #endif /* !__ASSEMBLY__ */
 
 /*
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
index 1ceee000c18d..b98b5b304307 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -25,14 +25,6 @@ struct mmu_psize_def {
};
 };
 extern struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT];
-
-/*
- * For BOOK3s 64 with 4k and 64K linux page size
- * we want to use pointers, because the page table
- * actually store pfn
- */
-typedef pte_t *pgtable_t;
-
 #endif /* __ASSEMBLY__ */
 
 /* 64-bit classic hash table MMU */
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index 598cdcdd1355..d10dc714f95f 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -124,6 +124,9 @@
 #ifndef __ASSEMBLY__
 #include 
 #include 
+#include 
+
+typedef pte_t *pgtable_t;
 
 #ifdef CONFIG_PPC_FSL_BOOK3E
 #include 
diff --git a/arch/powerpc/include/asm/nohash/32/mmu.h 
b/arch/powerpc/include/asm/nohash/32/mmu.h
index 7d94a36d57d2..af0e8b54876a 100644
--- a/arch/powerpc/include/asm/nohash/32/mmu.h
+++ b/arch/powerpc/include/asm/nohash/32/mmu.h
@@ -2,8 +2,6 @@
 #ifndef _ASM_POWERPC_NOHASH_32_MMU_H_
 #define _ASM_POWERPC_NOHASH_32_MMU_H_
 
-#include 
-
 #if defined(CONFIG_40x)
 /* 40x-style software loaded TLB */
 #include 
@@ -18,8 +16,4 @@
 #include 
 #endif
 
-#ifndef __ASSEMBLY__
-typedef pte_t *pgtable_t;
-#endif
-
 #endif /* _ASM_POWERPC_NOHASH_32_MMU_H_ */
diff --git a/arch/powerpc/include/asm/nohash/64/mmu.h 
b/arch/powerpc/include/asm/nohash/64/mmu.h
index 3376f5222d24..87871d027b75 100644
--- a/arch/powerpc/include/asm/nohash/64/mmu.h
+++ b/arch/powerpc/include/asm/nohash/64/mmu.h
@@ -2,13 +2,7 @@
 #ifndef _ASM_POWERPC_NOHASH_64_MMU_H_
 #define _ASM_POWERPC_NOHASH_64_MMU_H_
 
-#include 
-
 /* Freescale Book-E software loaded TLB or Book-3e (ISA 2.06+) MMU */
 #include 
 
-#ifndef __ASSEMBLY__
-typedef pte_t *pgtable_t;
-#endif
-
 #endif /* _ASM_POWERPC_NOHASH_64_MMU_H_ */
-- 
2.13.3



[PATCH v1 02/15] powerpc/mm: define __pud_free_tlb() at all time on nohash/64

2019-04-03 Thread Christophe Leroy
CONFIG_PPC_64K_PAGES is not selectable on nohash/64, so
__pud_free_tlb() can be defined at all times.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/nohash/64/pgalloc.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/64/pgalloc.h 
b/arch/powerpc/include/asm/nohash/64/pgalloc.h
index 66d086f85bd5..ded453f9b5a8 100644
--- a/arch/powerpc/include/asm/nohash/64/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/64/pgalloc.h
@@ -171,12 +171,9 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, 
pgtable_t table,
 
 #define __pmd_free_tlb(tlb, pmd, addr)   \
pgtable_free_tlb(tlb, pmd, PMD_CACHE_INDEX)
-#ifndef CONFIG_PPC_64K_PAGES
 #define __pud_free_tlb(tlb, pud, addr)   \
pgtable_free_tlb(tlb, pud, PUD_INDEX_SIZE)
 
-#endif /* CONFIG_PPC_64K_PAGES */
-
 #define check_pgt_cache()  do { } while (0)
 
 #endif /* _ASM_POWERPC_PGALLOC_64_H */
-- 
2.13.3



[PATCH v1 09/15] powerpc/mm: inline pte_alloc_one_kernel() and pte_alloc_one() on PPC32

2019-04-03 Thread Christophe Leroy
pte_alloc_one_kernel() and pte_alloc_one() are simple calls to
pte_fragment_alloc(), so they are good candidates for inlining, as
already done on PPC64.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/pgalloc.h | 15 ---
 arch/powerpc/include/asm/nohash/32/pgalloc.h | 15 ---
 arch/powerpc/mm/pgtable_32.c | 10 --
 3 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h 
b/arch/powerpc/include/asm/book3s/32/pgalloc.h
index 645af86cd072..0ed856068bb8 100644
--- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -59,10 +59,19 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t 
*pmdp,
 
 #define pmd_pgtable(pmd) ((pgtable_t)pmd_page_vaddr(pmd))
 
-extern pte_t *pte_alloc_one_kernel(struct mm_struct *mm);
-extern pgtable_t pte_alloc_one(struct mm_struct *mm);
-void pte_frag_destroy(void *pte_frag);
 pte_t *pte_fragment_alloc(struct mm_struct *mm, int kernel);
+
+static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
+{
+   return (pte_t *)pte_fragment_alloc(mm, 1);
+}
+
+static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
+{
+   return (pgtable_t)pte_fragment_alloc(mm, 0);
+}
+
+void pte_frag_destroy(void *pte_frag);
 void pte_fragment_free(unsigned long *table, int kernel);
 
 static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h 
b/arch/powerpc/include/asm/nohash/32/pgalloc.h
index ea265a578eb0..1d41508f0676 100644
--- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
@@ -77,10 +77,19 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t 
*pmdp,
 #define pmd_pgtable(pmd) ((pgtable_t)pmd_page_vaddr(pmd))
 #endif
 
-extern pte_t *pte_alloc_one_kernel(struct mm_struct *mm);
-extern pgtable_t pte_alloc_one(struct mm_struct *mm);
-void pte_frag_destroy(void *pte_frag);
 pte_t *pte_fragment_alloc(struct mm_struct *mm, int kernel);
+
+static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
+{
+   return (pte_t *)pte_fragment_alloc(mm, 1);
+}
+
+static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
+{
+   return (pgtable_t)pte_fragment_alloc(mm, 0);
+}
+
+void pte_frag_destroy(void *pte_frag);
 void pte_fragment_free(unsigned long *table, int kernel);
 
 static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index a1c3062f0665..d02fe3ce64db 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -43,16 +43,6 @@ EXPORT_SYMBOL(ioremap_bot);  /* aka VMALLOC_END */
 
 extern char etext[], _stext[], _sinittext[], _einittext[];
 
-pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
-{
-   return (pte_t *)pte_fragment_alloc(mm, 1);
-}
-
-pgtable_t pte_alloc_one(struct mm_struct *mm)
-{
-   return (pgtable_t)pte_fragment_alloc(mm, 0);
-}
-
 void __iomem *
 ioremap(phys_addr_t addr, unsigned long size)
 {
-- 
2.13.3



[PATCH v1 12/15] powerpc/mm: Only keep one version of pmd_populate() functions on nohash/32

2019-04-03 Thread Christophe Leroy
Use IS_ENABLED(CONFIG_BOOKE) to provide single versions of
pmd_populate() and pmd_populate_kernel().

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/nohash/32/pgalloc.h | 28 
 1 file changed, 8 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h 
b/arch/powerpc/include/asm/nohash/32/pgalloc.h
index 4615801aa953..7ee8e27070f4 100644
--- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
@@ -25,37 +25,25 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t 
*pgd)
 #define __pmd_free_tlb(tlb,x,a)	do { } while (0)
 /* #define pgd_populate(mm, pmd, pte)  BUG() */
 
-#ifndef CONFIG_BOOKE
-
 static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp,
   pte_t *pte)
 {
-   *pmdp = __pmd(__pa(pte) | _PMD_PRESENT);
+   if (IS_ENABLED(CONFIG_BOOKE))
+   *pmdp = __pmd((unsigned long)pte | _PMD_PRESENT);
+   else
+   *pmdp = __pmd(__pa(pte) | _PMD_PRESENT);
 }
 
 static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
pgtable_t pte_page)
 {
-   *pmdp = __pmd(__pa(pte_page) | _PMD_USER | _PMD_PRESENT);
+   if (IS_ENABLED(CONFIG_BOOKE))
+   *pmdp = __pmd((unsigned long)pte_page | _PMD_PRESENT);
+   else
+   *pmdp = __pmd(__pa(pte_page) | _PMD_USER | _PMD_PRESENT);
 }
 
 #define pmd_pgtable(pmd) ((pgtable_t)pmd_page_vaddr(pmd))
-#else
-
-static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp,
-  pte_t *pte)
-{
-   *pmdp = __pmd((unsigned long)pte | _PMD_PRESENT);
-}
-
-static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
-   pgtable_t pte_page)
-{
-   *pmdp = __pmd((unsigned long)pte_page | _PMD_PRESENT);
-}
-
-#define pmd_pgtable(pmd) ((pgtable_t)pmd_page_vaddr(pmd))
-#endif
 
 static inline void pgtable_free(void *table, unsigned index_size)
 {
-- 
2.13.3



Re: [PATCH 0/4] Enabling secure boot on PowerNV systems

2019-04-03 Thread Claudio Carvalho


On 4/3/19 10:21 AM, Michael Ellerman wrote:
> Hi Claudio,
>
> Thanks for posting this.
>
> Claudio Carvalho  writes:
>> This patch set is part of a series that implements secure boot on
>> PowerNV systems.
>>
>> In order to verify the OS kernel on PowerNV, secure boot requires X.509
>> certificates trusted by the platform, the secure boot modes, and several
>> other pieces of information. These are stored in secure variables
>> controlled by OPAL, also known as OPAL secure variables.
>>
>> This patch set adds the following features:
>>
>> 1. Enable efivarfs by selecting CONFIG_EFI in the CONFIG_OPAL_SECVAR
>>introduced in this patch set. With CONFIG_EFIVAR_FS, userspace tools can
>>be used to manage the secure variables.
>> 2. Add support for OPAL secure variables by overwriting the EFI hooks
>>(get_variable, get_next_variable, set_variable and query_variable_info)
>>with OPAL call wrappers. There is probably a better way to add this
>>support, for example, we are investigating if we could register the
>>efivar_operations rather than overwriting the EFI hooks. In this patch
>>set, CONFIG_OPAL_SECVAR selects CONFIG_EFI. If, instead, we registered
>>efivar_operations, CONFIG_EFIVAR_FS would need to depend on
>>    CONFIG_EFI || CONFIG_OPAL_SECVAR. Comments or suggestions on the
>>preferred technique would be greatly appreciated.
> I am *very* reluctant to start selecting CONFIG_EFI on powerpc.
>
> Simply because we don't actually have EFI, and I worry we're going to
> both break assumptions in the EFI code as well as impose requirements on
> the powerpc code that aren't really necessary.

Yes, we agree. We are working on the v2 and it is not going to depend on
CONFIG_EFI. Rather, the IMA arch policies will make the OPAL calls directly.


>
> So I'd definitely prefer we go the route of enabling efivarfs with an
> alternate backend.

Right, I'm investigating how we can do that, but it looks like we should
post that as a separate patchset to avoid delaying upstreaming signature
verification based on the secure boot variables.

Thanks,
Claudio


>
> Better still would be a generic secure variable interface as Matt
> suggests, if the userspace tools can be relatively easily adapted to use
> that interface.
>
> cheers
>



[PATCH v1 05/15] powerpc/mm: get rid of nohash/32/mmu.h and nohash/64/mmu.h

2019-04-03 Thread Christophe Leroy
Those files have no real added value, especially the 64-bit one,
which only includes the common book3e mmu.h that is also
included from the 32-bit side.

So let's do the final inclusion directly from nohash/mmu.h.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/nohash/32/mmu.h | 19 ---
 arch/powerpc/include/asm/nohash/64/mmu.h |  8 
 arch/powerpc/include/asm/nohash/mmu.h| 16 
 3 files changed, 12 insertions(+), 31 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/nohash/32/mmu.h
 delete mode 100644 arch/powerpc/include/asm/nohash/64/mmu.h

diff --git a/arch/powerpc/include/asm/nohash/32/mmu.h 
b/arch/powerpc/include/asm/nohash/32/mmu.h
deleted file mode 100644
index af0e8b54876a..
--- a/arch/powerpc/include/asm/nohash/32/mmu.h
+++ /dev/null
@@ -1,19 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_POWERPC_NOHASH_32_MMU_H_
-#define _ASM_POWERPC_NOHASH_32_MMU_H_
-
-#if defined(CONFIG_40x)
-/* 40x-style software loaded TLB */
-#include 
-#elif defined(CONFIG_44x)
-/* 44x-style software loaded TLB */
-#include 
-#elif defined(CONFIG_PPC_BOOK3E_MMU)
-/* Freescale Book-E software loaded TLB or Book-3e (ISA 2.06+) MMU */
-#include 
-#elif defined (CONFIG_PPC_8xx)
-/* Motorola/Freescale 8xx software loaded TLB */
-#include 
-#endif
-
-#endif /* _ASM_POWERPC_NOHASH_32_MMU_H_ */
diff --git a/arch/powerpc/include/asm/nohash/64/mmu.h 
b/arch/powerpc/include/asm/nohash/64/mmu.h
deleted file mode 100644
index 87871d027b75..
--- a/arch/powerpc/include/asm/nohash/64/mmu.h
+++ /dev/null
@@ -1,8 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_POWERPC_NOHASH_64_MMU_H_
-#define _ASM_POWERPC_NOHASH_64_MMU_H_
-
-/* Freescale Book-E software loaded TLB or Book-3e (ISA 2.06+) MMU */
-#include 
-
-#endif /* _ASM_POWERPC_NOHASH_64_MMU_H_ */
diff --git a/arch/powerpc/include/asm/nohash/mmu.h 
b/arch/powerpc/include/asm/nohash/mmu.h
index a037cb1efb57..edc793e5f08f 100644
--- a/arch/powerpc/include/asm/nohash/mmu.h
+++ b/arch/powerpc/include/asm/nohash/mmu.h
@@ -2,10 +2,18 @@
 #ifndef _ASM_POWERPC_NOHASH_MMU_H_
 #define _ASM_POWERPC_NOHASH_MMU_H_
 
-#ifdef CONFIG_PPC64
-#include 
-#else
-#include 
+#if defined(CONFIG_40x)
+/* 40x-style software loaded TLB */
+#include 
+#elif defined(CONFIG_44x)
+/* 44x-style software loaded TLB */
+#include 
+#elif defined(CONFIG_PPC_BOOK3E_MMU)
+/* Freescale Book-E software loaded TLB or Book-3e (ISA 2.06+) MMU */
+#include 
+#elif defined (CONFIG_PPC_8xx)
+/* Motorola/Freescale 8xx software loaded TLB */
+#include 
 #endif
 
 #endif /* _ASM_POWERPC_NOHASH_MMU_H_ */
-- 
2.13.3



[PATCH v1 11/15] powerpc/mm: refactor definition of pgtable_cache[]

2019-04-03 Thread Christophe Leroy
pgtable_cache[] is the same for the 4 subarches, so let's make it common.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/pgalloc.h | 21 -
 arch/powerpc/include/asm/book3s/64/pgalloc.h | 22 --
 arch/powerpc/include/asm/nohash/32/pgalloc.h | 21 -
 arch/powerpc/include/asm/nohash/64/pgalloc.h | 22 --
 arch/powerpc/include/asm/pgalloc.h   | 21 +
 5 files changed, 21 insertions(+), 86 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h 
b/arch/powerpc/include/asm/book3s/32/pgalloc.h
index 46422309d6e0..1b9b5c228230 100644
--- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -5,26 +5,6 @@
 #include 
 #include 
 
-/*
- * Functions that deal with pagetables that could be at any level of
- * the table need to be passed an "index_size" so they know how to
- * handle allocation.  For PTE pages (which are linked to a struct
- * page for now, and drawn from the main get_free_pages() pool), the
- * allocation size will be (2^index_size * sizeof(pointer)) and
- * allocations are drawn from the kmem_cache in PGT_CACHE(index_size).
- *
- * The maximum index size needs to be big enough to allow any
- * pagetable sizes we need, but small enough to fit in the low bits of
- * any page table pointer.  In other words all pagetables, even tiny
- * ones, must be aligned to allow at least enough low 0 bits to
- * contain this value.  This value is also used as a mask, so it must
- * be one less than a power of two.
- */
-#define MAX_PGTABLE_INDEX_SIZE 0xf
-
-extern struct kmem_cache *pgtable_cache[];
-#define PGT_CACHE(shift) pgtable_cache[shift]
-
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
return kmem_cache_alloc(PGT_CACHE(PGD_INDEX_SIZE),
@@ -69,7 +49,6 @@ static inline void pgtable_free(void *table, unsigned 
index_size)
}
 }
 
-#define check_pgt_cache()  do { } while (0)
 #define get_hugepd_cache_index(x)  (x)
 
 #ifdef CONFIG_SMP
diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index cfd48d8cc055..df2dce6afe14 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -19,26 +19,6 @@ struct vmemmap_backing {
 };
 extern struct vmemmap_backing *vmemmap_list;
 
-/*
- * Functions that deal with pagetables that could be at any level of
- * the table need to be passed an "index_size" so they know how to
- * handle allocation.  For PTE pages (which are linked to a struct
- * page for now, and drawn from the main get_free_pages() pool), the
- * allocation size will be (2^index_size * sizeof(pointer)) and
- * allocations are drawn from the kmem_cache in PGT_CACHE(index_size).
- *
- * The maximum index size needs to be big enough to allow any
- * pagetable sizes we need, but small enough to fit in the low bits of
- * any page table pointer.  In other words all pagetables, even tiny
- * ones, must be aligned to allow at least enough low 0 bits to
- * contain this value.  This value is also used as a mask, so it must
- * be one less than a power of two.
- */
-#define MAX_PGTABLE_INDEX_SIZE 0xf
-
-extern struct kmem_cache *pgtable_cache[];
-#define PGT_CACHE(shift) pgtable_cache[shift]
-
 extern pmd_t *pmd_fragment_alloc(struct mm_struct *, unsigned long);
 extern void pmd_fragment_free(unsigned long *);
 extern void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift);
@@ -199,8 +179,6 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, 
pgtable_t table,
pgtable_free_tlb(tlb, table, PTE_INDEX);
 }
 
-#define check_pgt_cache()  do { } while (0)
-
 extern atomic_long_t direct_pages_count[MMU_PAGE_COUNT];
 static inline void update_page_count(int psize, long count)
 {
diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h 
b/arch/powerpc/include/asm/nohash/32/pgalloc.h
index e96ef2fde2ca..4615801aa953 100644
--- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
@@ -5,26 +5,6 @@
 #include 
 #include 
 
-/*
- * Functions that deal with pagetables that could be at any level of
- * the table need to be passed an "index_size" so they know how to
- * handle allocation.  For PTE pages (which are linked to a struct
- * page for now, and drawn from the main get_free_pages() pool), the
- * allocation size will be (2^index_size * sizeof(pointer)) and
- * allocations are drawn from the kmem_cache in PGT_CACHE(index_size).
- *
- * The maximum index size needs to be big enough to allow any
- * pagetable sizes we need, but small enough to fit in the low bits of
- * any page table pointer.  In other words all pagetables, even tiny
- * ones, must be aligned to allow at least enough low 0 bits to
- * contain this value.  This value is also used as a mask, so it must
- * be one less than a power of two.
- */
-#define 

[PATCH] powerpc/powernv: Add mmap to opal export sysfs nodes

2019-04-03 Thread Jordan Niethe
The sysfs nodes created under /opal/exports/ do not currently support
mmap. Skiboot trace buffers are exported here in the series
https://patchwork.ozlabs.org/cover/1073501/. Adding mmap support makes
it possible to use the functions for reading traces in external/trace.
This improves on the current read/lseek method as it handles cases like
the buffer wrapping and overflowing.

Signed-off-by: Jordan Niethe 
---
v2: ensure only whole pages can be mapped
---
 arch/powerpc/platforms/powernv/opal.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/opal.c 
b/arch/powerpc/platforms/powernv/opal.c
index 2b0eca104f86..3611b5b9c5d2 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -714,6 +714,15 @@ static ssize_t export_attr_read(struct file *fp, struct 
kobject *kobj,
   bin_attr->size);
 }
 
+static int export_attr_mmap(struct file *fp, struct kobject *kobj,
+   struct bin_attribute *attr,
+   struct vm_area_struct *vma)
+{
+   return remap_pfn_range(vma, vma->vm_start,
+  __pa(attr->private) >> PAGE_SHIFT,
+  attr->size, PAGE_READONLY);
+}
+
 /*
  * opal_export_attrs: creates a sysfs node for each property listed in
  * the device-tree under /ibm,opal/firmware/exports/
@@ -759,6 +768,9 @@ static void opal_export_attrs(void)
attr->attr.name = kstrdup(prop->name, GFP_KERNEL);
attr->attr.mode = 0400;
attr->read = export_attr_read;
+   /* Ensure only whole pages are mapped */
+   if (vals[0] % PAGE_SIZE == 0 && vals[1] % PAGE_SIZE == 0)
+   attr->mmap = export_attr_mmap;
attr->private = __va(vals[0]);
attr->size = vals[1];
 
-- 
2.20.1



[PATCH v1 03/15] powerpc/mm: convert Book3E 64 to pte_fragment

2019-04-03 Thread Christophe Leroy
Book3E 64 is the only subarch not using pte_fragment. In order
to allow refactorisation, this patch converts it to pte_fragment.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/mmu_context.h   |  6 -
 arch/powerpc/include/asm/nohash/64/mmu.h |  4 +++-
 arch/powerpc/include/asm/nohash/64/pgalloc.h | 33 ++--
 arch/powerpc/mm/Makefile |  4 ++--
 arch/powerpc/mm/mmu_context.c|  2 +-
 5 files changed, 18 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h 
b/arch/powerpc/include/asm/mmu_context.h
index 6ee8195a2ffb..66a3805dc935 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -228,13 +228,7 @@ static inline void enter_lazy_tlb(struct mm_struct *mm,
 #endif
 }
 
-#ifdef CONFIG_PPC_BOOK3E_64
-static inline void arch_exit_mmap(struct mm_struct *mm)
-{
-}
-#else
 extern void arch_exit_mmap(struct mm_struct *mm);
-#endif
 
 static inline void arch_unmap(struct mm_struct *mm,
  struct vm_area_struct *vma,
diff --git a/arch/powerpc/include/asm/nohash/64/mmu.h 
b/arch/powerpc/include/asm/nohash/64/mmu.h
index e6585480dfc4..3376f5222d24 100644
--- a/arch/powerpc/include/asm/nohash/64/mmu.h
+++ b/arch/powerpc/include/asm/nohash/64/mmu.h
@@ -2,11 +2,13 @@
 #ifndef _ASM_POWERPC_NOHASH_64_MMU_H_
 #define _ASM_POWERPC_NOHASH_64_MMU_H_
 
+#include 
+
 /* Freescale Book-E software loaded TLB or Book-3e (ISA 2.06+) MMU */
 #include 
 
 #ifndef __ASSEMBLY__
-typedef struct page *pgtable_t;
+typedef pte_t *pgtable_t;
 #endif
 
 #endif /* _ASM_POWERPC_NOHASH_64_MMU_H_ */
diff --git a/arch/powerpc/include/asm/nohash/64/pgalloc.h 
b/arch/powerpc/include/asm/nohash/64/pgalloc.h
index ded453f9b5a8..7fb87235f845 100644
--- a/arch/powerpc/include/asm/nohash/64/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/64/pgalloc.h
@@ -76,10 +76,10 @@ static inline void pmd_populate_kernel(struct mm_struct 
*mm, pmd_t *pmd,
 static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
pgtable_t pte_page)
 {
-   pmd_set(pmd, (unsigned long)page_address(pte_page));
+   pmd_set(pmd, (unsigned long)pte_page);
 }
 
-#define pmd_pgtable(pmd) pmd_page(pmd)
+#define pmd_pgtable(pmd) ((pgtable_t)pmd_page_vaddr(pmd))
 
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
@@ -92,44 +92,35 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t 
*pmd)
kmem_cache_free(PGT_CACHE(PMD_CACHE_INDEX), pmd);
 }
 
+pte_t *pte_fragment_alloc(struct mm_struct *mm, int kernel);
 
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
 {
-   return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+   return (pte_t *)pte_fragment_alloc(mm, 1);
 }
 
 static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
 {
-   struct page *page;
-   pte_t *pte;
-
-   pte = (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT);
-   if (!pte)
-   return NULL;
-   page = virt_to_page(pte);
-   if (!pgtable_page_ctor(page)) {
-   __free_page(page);
-   return NULL;
-   }
-   return page;
+   return (pgtable_t)pte_fragment_alloc(mm, 0);
 }
 
+void pte_frag_destroy(void *pte_frag);
+void pte_fragment_free(unsigned long *table, int kernel);
+
 static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 {
-   free_page((unsigned long)pte);
+   pte_fragment_free((unsigned long *)pte, 1);
 }
 
 static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
 {
-   pgtable_page_dtor(ptepage);
-   __free_page(ptepage);
+   pte_fragment_free((unsigned long *)ptepage, 0);
 }
 
 static inline void pgtable_free(void *table, int shift)
 {
if (!shift) {
-   pgtable_page_dtor(virt_to_page(table));
-   free_page((unsigned long)table);
+   pte_fragment_free((unsigned long *)table, 0);
} else {
BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
kmem_cache_free(PGT_CACHE(shift), table);
@@ -166,7 +157,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, 
pgtable_t table,
  unsigned long address)
 {
tlb_flush_pgtable(tlb, address);
-   pgtable_free_tlb(tlb, page_address(table), 0);
+   pgtable_free_tlb(tlb, table, 0);
 }
 
 #define __pmd_free_tlb(tlb, pmd, addr)   \
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 3c1bd9fa23cd..138c772d58d1 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -9,6 +9,7 @@ CFLAGS_REMOVE_slb.o = $(CC_FLAGS_FTRACE)
 
 obj-y  := fault.o mem.o pgtable.o mmap.o \
   init_$(BITS).o pgtable_$(BITS).o \
+  pgtable-frag.o \
   init-common.o mmu_context.o drmem.o
 

[PATCH v1 14/15] powerpc/mm: refactor pmd_pgtable()

2019-04-03 Thread Christophe Leroy
pmd_pgtable() is identical on the 4 subarches, refactor it.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/pgalloc.h | 2 --
 arch/powerpc/include/asm/book3s/64/pgalloc.h | 5 -
 arch/powerpc/include/asm/nohash/32/pgalloc.h | 2 --
 arch/powerpc/include/asm/nohash/64/pgalloc.h | 2 --
 arch/powerpc/include/asm/pgalloc.h   | 5 +
 5 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h 
b/arch/powerpc/include/asm/book3s/32/pgalloc.h
index 1b9b5c228230..998317702630 100644
--- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -37,8 +37,6 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t 
*pmdp,
*pmdp = __pmd(__pa(pte_page) | _PMD_PRESENT);
 }
 
-#define pmd_pgtable(pmd) ((pgtable_t)pmd_page_vaddr(pmd))
-
 static inline void pgtable_free(void *table, unsigned index_size)
 {
if (!index_size) {
diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index df2dce6afe14..053a7940504e 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -163,11 +163,6 @@ static inline void pmd_populate(struct mm_struct *mm, 
pmd_t *pmd,
*pmd = __pmd(__pgtable_ptr_val(pte_page) | PMD_VAL_BITS);
 }
 
-static inline pgtable_t pmd_pgtable(pmd_t pmd)
-{
-   return (pgtable_t)pmd_page_vaddr(pmd);
-}
-
 static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
  unsigned long address)
 {
diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h 
b/arch/powerpc/include/asm/nohash/32/pgalloc.h
index 6c0f5151dc1d..137761b01588 100644
--- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
@@ -43,6 +43,4 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t 
*pmdp,
*pmdp = __pmd(__pa(pte_page) | _PMD_USER | _PMD_PRESENT);
 }
 
-#define pmd_pgtable(pmd) ((pgtable_t)pmd_page_vaddr(pmd))
-
 #endif /* _ASM_POWERPC_PGALLOC_32_H */
diff --git a/arch/powerpc/include/asm/nohash/64/pgalloc.h 
b/arch/powerpc/include/asm/nohash/64/pgalloc.h
index c636feced1ff..5a0ea63c77c7 100644
--- a/arch/powerpc/include/asm/nohash/64/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/64/pgalloc.h
@@ -59,8 +59,6 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t 
*pmd,
pmd_set(pmd, (unsigned long)pte_page);
 }
 
-#define pmd_pgtable(pmd) ((pgtable_t)pmd_page_vaddr(pmd))
-
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
return kmem_cache_alloc(PGT_CACHE(PMD_CACHE_INDEX),
diff --git a/arch/powerpc/include/asm/pgalloc.h 
b/arch/powerpc/include/asm/pgalloc.h
index 5761bee0f004..2b2c60a1a66d 100644
--- a/arch/powerpc/include/asm/pgalloc.h
+++ b/arch/powerpc/include/asm/pgalloc.h
@@ -72,4 +72,9 @@ static inline void check_pgt_cache(void) { }
 #include 
 #endif
 
+static inline pgtable_t pmd_pgtable(pmd_t pmd)
+{
+   return (pgtable_t)pmd_page_vaddr(pmd);
+}
+
 #endif /* _ASM_POWERPC_PGALLOC_H */
-- 
2.13.3



[PATCH v1 01/15] powerpc/mm: drop __bad_pte()

2019-04-03 Thread Christophe Leroy
This has never been called (at least since the kernel has been
in git), so drop it.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/pgalloc.h | 2 --
 arch/powerpc/include/asm/nohash/32/pgalloc.h | 2 --
 2 files changed, 4 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h 
b/arch/powerpc/include/asm/book3s/32/pgalloc.h
index 3633502e102c..645af86cd072 100644
--- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -22,8 +22,6 @@
  */
 #define MAX_PGTABLE_INDEX_SIZE 0xf
 
-extern void __bad_pte(pmd_t *pmd);
-
 extern struct kmem_cache *pgtable_cache[];
 #define PGT_CACHE(shift) pgtable_cache[shift]
 
diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h 
b/arch/powerpc/include/asm/nohash/32/pgalloc.h
index bd186e85b4f7..ea265a578eb0 100644
--- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
@@ -22,8 +22,6 @@
  */
 #define MAX_PGTABLE_INDEX_SIZE 0xf
 
-extern void __bad_pte(pmd_t *pmd);
-
 extern struct kmem_cache *pgtable_cache[];
 #define PGT_CACHE(shift) pgtable_cache[shift]
 
-- 
2.13.3



[PATCH v1 10/15] powerpc/mm: refactor pte_alloc_one() and pte_free() families definition.

2019-04-03 Thread Christophe Leroy
Functions pte_alloc_one(), pte_alloc_one_kernel(), pte_free(),
pte_free_kernel() are identical for the four subarches.

This patch moves their definitions to a common place.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/pgalloc.h | 25 -
 arch/powerpc/include/asm/book3s/64/pgalloc.h | 22 --
 arch/powerpc/include/asm/nohash/32/pgalloc.h | 25 -
 arch/powerpc/include/asm/nohash/64/pgalloc.h | 25 -
 arch/powerpc/include/asm/pgalloc.h   | 25 +
 5 files changed, 25 insertions(+), 97 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h 
b/arch/powerpc/include/asm/book3s/32/pgalloc.h
index 0ed856068bb8..46422309d6e0 100644
--- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -59,31 +59,6 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t 
*pmdp,
 
 #define pmd_pgtable(pmd) ((pgtable_t)pmd_page_vaddr(pmd))
 
-pte_t *pte_fragment_alloc(struct mm_struct *mm, int kernel);
-
-static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
-{
-   return (pte_t *)pte_fragment_alloc(mm, 1);
-}
-
-static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
-{
-   return (pgtable_t)pte_fragment_alloc(mm, 0);
-}
-
-void pte_frag_destroy(void *pte_frag);
-void pte_fragment_free(unsigned long *table, int kernel);
-
-static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
-{
-   pte_fragment_free((unsigned long *)pte, 1);
-}
-
-static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
-{
-   pte_fragment_free((unsigned long *)ptepage, 0);
-}
-
 static inline void pgtable_free(void *table, unsigned index_size)
 {
if (!index_size) {
diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index 138bc2ecc0c4..cfd48d8cc055 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -39,9 +39,7 @@ extern struct vmemmap_backing *vmemmap_list;
 extern struct kmem_cache *pgtable_cache[];
 #define PGT_CACHE(shift) pgtable_cache[shift]
 
-extern pte_t *pte_fragment_alloc(struct mm_struct *, int);
 extern pmd_t *pmd_fragment_alloc(struct mm_struct *, unsigned long);
-extern void pte_fragment_free(unsigned long *, int);
 extern void pmd_fragment_free(unsigned long *);
 extern void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift);
 #ifdef CONFIG_SMP
@@ -190,26 +188,6 @@ static inline pgtable_t pmd_pgtable(pmd_t pmd)
return (pgtable_t)pmd_page_vaddr(pmd);
 }
 
-static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
-{
-   return (pte_t *)pte_fragment_alloc(mm, 1);
-}
-
-static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
-{
-   return (pgtable_t)pte_fragment_alloc(mm, 0);
-}
-
-static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
-{
-   pte_fragment_free((unsigned long *)pte, 1);
-}
-
-static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
-{
-   pte_fragment_free((unsigned long *)ptepage, 0);
-}
-
 static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
  unsigned long address)
 {
diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h b/arch/powerpc/include/asm/nohash/32/pgalloc.h
index 1d41508f0676..e96ef2fde2ca 100644
--- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
@@ -77,31 +77,6 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
 #define pmd_pgtable(pmd) ((pgtable_t)pmd_page_vaddr(pmd))
 #endif
 
-pte_t *pte_fragment_alloc(struct mm_struct *mm, int kernel);
-
-static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
-{
-   return (pte_t *)pte_fragment_alloc(mm, 1);
-}
-
-static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
-{
-   return (pgtable_t)pte_fragment_alloc(mm, 0);
-}
-
-void pte_frag_destroy(void *pte_frag);
-void pte_fragment_free(unsigned long *table, int kernel);
-
-static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
-{
-   pte_fragment_free((unsigned long *)pte, 1);
-}
-
-static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
-{
-   pte_fragment_free((unsigned long *)ptepage, 0);
-}
-
 static inline void pgtable_free(void *table, unsigned index_size)
 {
if (!index_size) {
diff --git a/arch/powerpc/include/asm/nohash/64/pgalloc.h b/arch/powerpc/include/asm/nohash/64/pgalloc.h
index 7fb87235f845..98de4f3b0306 100644
--- a/arch/powerpc/include/asm/nohash/64/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/64/pgalloc.h
@@ -92,31 +92,6 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
kmem_cache_free(PGT_CACHE(PMD_CACHE_INDEX), pmd);
 }
 
-pte_t *pte_fragment_alloc(struct mm_struct *mm, int kernel);
-
-static inline pte_t *pte_alloc_one_kernel(struct 

[PATCH v1 15/15] powerpc/mm: refactor pgd_alloc() and pgd_free() on nohash

2019-04-03 Thread Christophe Leroy
pgd_alloc() and pgd_free() are identical on nohash 32 and 64.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/nohash/32/pgalloc.h | 11 ---
 arch/powerpc/include/asm/nohash/64/pgalloc.h | 11 ---
 arch/powerpc/include/asm/nohash/pgalloc.h| 12 
 3 files changed, 12 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h b/arch/powerpc/include/asm/nohash/32/pgalloc.h
index 137761b01588..11eac371e7e0 100644
--- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
@@ -5,17 +5,6 @@
 #include 
 #include 
 
-static inline pgd_t *pgd_alloc(struct mm_struct *mm)
-{
-   return kmem_cache_alloc(PGT_CACHE(PGD_INDEX_SIZE),
-   pgtable_gfp_flags(mm, GFP_KERNEL));
-}
-
-static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
-{
-   kmem_cache_free(PGT_CACHE(PGD_INDEX_SIZE), pgd);
-}
-
 /*
  * We don't have any real pmd's, and this code never triggers because
  * the pgd will always be present..
diff --git a/arch/powerpc/include/asm/nohash/64/pgalloc.h b/arch/powerpc/include/asm/nohash/64/pgalloc.h
index 5a0ea63c77c7..62321cd12da9 100644
--- a/arch/powerpc/include/asm/nohash/64/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/64/pgalloc.h
@@ -18,17 +18,6 @@ struct vmemmap_backing {
 };
 extern struct vmemmap_backing *vmemmap_list;
 
-static inline pgd_t *pgd_alloc(struct mm_struct *mm)
-{
-   return kmem_cache_alloc(PGT_CACHE(PGD_INDEX_SIZE),
-   pgtable_gfp_flags(mm, GFP_KERNEL));
-}
-
-static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
-{
-   kmem_cache_free(PGT_CACHE(PGD_INDEX_SIZE), pgd);
-}
-
 #define pgd_populate(MM, PGD, PUD) pgd_set(PGD, (unsigned long)PUD)
 
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
diff --git a/arch/powerpc/include/asm/nohash/pgalloc.h b/arch/powerpc/include/asm/nohash/pgalloc.h
index 4fccac6af3ad..332b13b4ecdb 100644
--- a/arch/powerpc/include/asm/nohash/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/pgalloc.h
@@ -3,6 +3,7 @@
 #define _ASM_POWERPC_NOHASH_PGALLOC_H
 
 #include 
+#include 
 
 extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
 #ifdef CONFIG_PPC64
@@ -16,6 +17,17 @@ static inline void tlb_flush_pgtable(struct mmu_gather *tlb,
 }
 #endif /* !CONFIG_PPC_BOOK3E */
 
+static inline pgd_t *pgd_alloc(struct mm_struct *mm)
+{
+   return kmem_cache_alloc(PGT_CACHE(PGD_INDEX_SIZE),
+   pgtable_gfp_flags(mm, GFP_KERNEL));
+}
+
+static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
+{
+   kmem_cache_free(PGT_CACHE(PGD_INDEX_SIZE), pgd);
+}
+
 #ifdef CONFIG_PPC64
 #include 
 #else
-- 
2.13.3



[PATCH v1 06/15] powerpc/Kconfig: select PPC_MM_SLICES from subarch type

2019-04-03 Thread Christophe Leroy
Let's select PPC_MM_SLICES from the subarch config item instead of
doing it via defaults declaration in the PPC_MM_SLICES item itself.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/platforms/Kconfig.cputype | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 842b2c7e156a..a46a0adb634d 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -34,6 +34,7 @@ config PPC_8xx
bool "Freescale 8xx"
select FSL_SOC
select SYS_SUPPORTS_HUGETLBFS
+   select PPC_MM_SLICES if HUGETLB_PAGE
 
 config 40x
bool "AMCC 40x"
@@ -75,6 +76,7 @@ config PPC_BOOK3S_64
select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
select ARCH_SUPPORTS_NUMA_BALANCING
select IRQ_WORK
+   select PPC_MM_SLICES
 
 config PPC_BOOK3E_64
bool "Embedded processors"
@@ -360,8 +362,6 @@ config PPC_BOOK3E_MMU
 
 config PPC_MM_SLICES
bool
-   default y if PPC_BOOK3S_64
-   default y if PPC_8xx && HUGETLB_PAGE
 
 config PPC_HAVE_PMU_SUPPORT
bool
-- 
2.13.3
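The two Kconfig idioms this patch trades between can be sketched side by side (symbol names are from the patch; the point is that the dependency is now driven by the user-visible subarch option via `select`, rather than by `default y if` on the hidden helper symbol):

```kconfig
# Before: the hidden symbol pulls itself in via defaults
config PPC_MM_SLICES
	bool
	default y if PPC_8xx && HUGETLB_PAGE

# After: each subarch that needs it selects it explicitly
config PPC_8xx
	bool "Freescale 8xx"
	select PPC_MM_SLICES if HUGETLB_PAGE

config PPC_MM_SLICES
	bool
```

With `select`, the relationship is visible where the subarch is defined, which tends to be easier to audit than scattered `default y if` clauses.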



Re: [PATCH] powerpc: config: skiroot: Add (back) MLX5 ethernet support

2019-04-03 Thread Stewart Smith
Mathieu Malaterre  writes:
> On Wed, Apr 3, 2019 at 2:51 AM Joel Stanley  wrote:
>>
>> It turns out that some defconfig changes and kernel config option
>> changes meant we accidentally dropped Ethernet support for Mellanox CX5
>> cards.
>>
>> Reported-by: Carol L Soto 
>> Suggested-by: Carol L Soto 
>> Signed-off-by: Stewart Smith 
>> Signed-off-by: Joel Stanley 
>
> Fixes: cbc39809a398 ("powerpc/configs: Update skiroot defconfig")
>
> ?

Yes.

-- 
Stewart Smith
OPAL Architect, IBM.



[PATCH v1 00/15] Refactor pgalloc stuff

2019-04-03 Thread Christophe Leroy
This series converts book3e64 to pte_fragment and refactors
things that are common among subarches.

Christophe Leroy (15):
  powerpc/mm: drop __bad_pte()
  powerpc/mm: define __pud_free_tlb() at all time on nohash/64
  powerpc/mm: convert Book3E 64 to pte_fragment
  powerpc/mm: move pgtable_t in asm/mmu.h
  powerpc/mm: get rid of nohash/32/mmu.h and nohash/64/mmu.h
  powerpc/Kconfig: select PPC_MM_SLICES from subarch type
  powerpc/book3e: move early_alloc_pgtable() to init section
  powerpc/mm: don't use pte_alloc_kernel() until slab is available on
PPC32
  powerpc/mm: inline pte_alloc_one_kernel() and pte_alloc_one() on PPC32
  powerpc/mm: refactor pte_alloc_one() and pte_free() families
definition.
  powerpc/mm: refactor definition of pgtable_cache[]
  powerpc/mm: Only keep one version of pmd_populate() functions on
nohash/32
  powerpc/mm: refactor pgtable freeing functions on nohash
  powerpc/mm: refactor pmd_pgtable()
  powerpc/mm: refactor pgd_alloc() and pgd_free() on nohash

 arch/powerpc/include/asm/book3s/32/mmu-hash.h |   4 -
 arch/powerpc/include/asm/book3s/32/pgalloc.h  |  41 -
 arch/powerpc/include/asm/book3s/64/mmu.h  |   8 --
 arch/powerpc/include/asm/book3s/64/pgalloc.h  |  49 --
 arch/powerpc/include/asm/mmu.h|   3 +
 arch/powerpc/include/asm/mmu_context.h|   6 --
 arch/powerpc/include/asm/nohash/32/mmu.h  |  25 --
 arch/powerpc/include/asm/nohash/32/pgalloc.h  | 123 ++
 arch/powerpc/include/asm/nohash/64/mmu.h  |  12 ---
 arch/powerpc/include/asm/nohash/64/pgalloc.h  | 117 +---
 arch/powerpc/include/asm/nohash/mmu.h |  16 +++-
 arch/powerpc/include/asm/nohash/pgalloc.h |  56 
 arch/powerpc/include/asm/pgalloc.h|  51 +++
 arch/powerpc/mm/Makefile  |   4 +-
 arch/powerpc/mm/mmu_context.c |   2 +-
 arch/powerpc/mm/pgtable-book3e.c  |   4 +-
 arch/powerpc/mm/pgtable_32.c  |  42 +
 arch/powerpc/platforms/Kconfig.cputype|   4 +-
 18 files changed, 165 insertions(+), 402 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/nohash/32/mmu.h
 delete mode 100644 arch/powerpc/include/asm/nohash/64/mmu.h

-- 
2.13.3



[PATCH v1 07/15] powerpc/book3e: move early_alloc_pgtable() to init section

2019-04-03 Thread Christophe Leroy
early_alloc_pgtable() is only used during init.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/pgtable-book3e.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-book3e.c b/arch/powerpc/mm/pgtable-book3e.c
index 1032ef7aaf62..f6fc709697ee 100644
--- a/arch/powerpc/mm/pgtable-book3e.c
+++ b/arch/powerpc/mm/pgtable-book3e.c
@@ -55,7 +55,7 @@ void vmemmap_remove_mapping(unsigned long start,
 #endif
 #endif /* CONFIG_SPARSEMEM_VMEMMAP */
 
-static __ref void *early_alloc_pgtable(unsigned long size)
+static void __init *early_alloc_pgtable(unsigned long size)
 {
void *ptr;
 
@@ -74,7 +74,7 @@ static __ref void *early_alloc_pgtable(unsigned long size)
  * map_kernel_page adds an entry to the ioremap page table
  * and adds an entry to the HPT, possibly bolting it
  */
-int map_kernel_page(unsigned long ea, unsigned long pa, pgprot_t prot)
+int __ref map_kernel_page(unsigned long ea, unsigned long pa, pgprot_t prot)
 {
pgd_t *pgdp;
pud_t *pudp;
-- 
2.13.3



[PATCH v1 08/15] powerpc/mm: don't use pte_alloc_kernel() until slab is available on PPC32

2019-04-03 Thread Christophe Leroy
In the same way as PPC64, implement early allocation functions and
avoid calling pte_alloc_kernel() before slab is available.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/pgtable_32.c | 34 --
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 6e56a6240bfa..a1c3062f0665 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -43,11 +43,8 @@ EXPORT_SYMBOL(ioremap_bot);  /* aka VMALLOC_END */
 
 extern char etext[], _stext[], _sinittext[], _einittext[];
 
-__ref pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
+pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
 {
-   if (!slab_is_available())
-   return memblock_alloc(PTE_FRAG_SIZE, PTE_FRAG_SIZE);
-
return (pte_t *)pte_fragment_alloc(mm, 1);
 }
 
@@ -205,7 +202,29 @@ void iounmap(volatile void __iomem *addr)
 }
 EXPORT_SYMBOL(iounmap);
 
-int map_kernel_page(unsigned long va, phys_addr_t pa, pgprot_t prot)
+static void __init *early_alloc_pgtable(unsigned long size)
+{
+   void *ptr = memblock_alloc(size, size);
+
+   if (!ptr)
+   panic("%s: Failed to allocate %lu bytes align=0x%lx\n",
+ __func__, size, size);
+
+   return ptr;
+}
+
+static pte_t __init *early_pte_alloc_kernel(pmd_t *pmdp, unsigned long va)
+{
+   if (pmd_none(*pmdp)) {
+   pte_t *ptep = early_alloc_pgtable(PTE_FRAG_SIZE);
+
+   pmd_populate_kernel(&init_mm, pmdp, ptep);
+   }
+   return pte_offset_kernel(pmdp, va);
+}
+
+
+int __ref map_kernel_page(unsigned long va, phys_addr_t pa, pgprot_t prot)
 {
pmd_t *pd;
pte_t *pg;
@@ -214,7 +233,10 @@ int map_kernel_page(unsigned long va, phys_addr_t pa, pgprot_t prot)
/* Use upper 10 bits of VA to index the first level map */
pd = pmd_offset(pud_offset(pgd_offset_k(va), va), va);
/* Use middle 10 bits of VA to index the second-level map */
-   pg = pte_alloc_kernel(pd, va);
+   if (likely(slab_is_available()))
+   pg = pte_alloc_kernel(pd, va);
+   else
+   pg = early_pte_alloc_kernel(pd, va);
if (pg != 0) {
err = 0;
/* The PTE should never be already set nor present in the
-- 
2.13.3



[PATCH v1 13/15] powerpc/mm: refactor pgtable freeing functions on nohash

2019-04-03 Thread Christophe Leroy
pgtable_free() and others are identical on nohash/32 and 64,
so move them into asm/nohash/pgalloc.h

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/nohash/32/pgalloc.h | 43 ---
 arch/powerpc/include/asm/nohash/64/pgalloc.h | 43 ---
 arch/powerpc/include/asm/nohash/pgalloc.h| 44 
 3 files changed, 44 insertions(+), 86 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h b/arch/powerpc/include/asm/nohash/32/pgalloc.h
index 7ee8e27070f4..6c0f5151dc1d 100644
--- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
@@ -45,47 +45,4 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
 
 #define pmd_pgtable(pmd) ((pgtable_t)pmd_page_vaddr(pmd))
 
-static inline void pgtable_free(void *table, unsigned index_size)
-{
-   if (!index_size) {
-   pte_fragment_free((unsigned long *)table, 0);
-   } else {
-   BUG_ON(index_size > MAX_PGTABLE_INDEX_SIZE);
-   kmem_cache_free(PGT_CACHE(index_size), table);
-   }
-}
-
-#define get_hugepd_cache_index(x)  (x)
-
-#ifdef CONFIG_SMP
-static inline void pgtable_free_tlb(struct mmu_gather *tlb,
-   void *table, int shift)
-{
-   unsigned long pgf = (unsigned long)table;
-   BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
-   pgf |= shift;
-   tlb_remove_table(tlb, (void *)pgf);
-}
-
-static inline void __tlb_remove_table(void *_table)
-{
-   void *table = (void *)((unsigned long)_table & ~MAX_PGTABLE_INDEX_SIZE);
-   unsigned shift = (unsigned long)_table & MAX_PGTABLE_INDEX_SIZE;
-
-   pgtable_free(table, shift);
-}
-#else
-static inline void pgtable_free_tlb(struct mmu_gather *tlb,
-   void *table, int shift)
-{
-   pgtable_free(table, shift);
-}
-#endif
-
-static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
- unsigned long address)
-{
-   tlb_flush_pgtable(tlb, address);
-   pgtable_free_tlb(tlb, table, 0);
-}
 #endif /* _ASM_POWERPC_PGALLOC_32_H */
diff --git a/arch/powerpc/include/asm/nohash/64/pgalloc.h 
b/arch/powerpc/include/asm/nohash/64/pgalloc.h
index ffc86d42816d..c636feced1ff 100644
--- a/arch/powerpc/include/asm/nohash/64/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/64/pgalloc.h
@@ -72,49 +72,6 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
kmem_cache_free(PGT_CACHE(PMD_CACHE_INDEX), pmd);
 }
 
-static inline void pgtable_free(void *table, int shift)
-{
-   if (!shift) {
-   pte_fragment_free((unsigned long *)table, 0);
-   } else {
-   BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
-   kmem_cache_free(PGT_CACHE(shift), table);
-   }
-}
-
-#define get_hugepd_cache_index(x)  (x)
-#ifdef CONFIG_SMP
-static inline void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift)
-{
-   unsigned long pgf = (unsigned long)table;
-
-   BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
-   pgf |= shift;
-   tlb_remove_table(tlb, (void *)pgf);
-}
-
-static inline void __tlb_remove_table(void *_table)
-{
-   void *table = (void *)((unsigned long)_table & ~MAX_PGTABLE_INDEX_SIZE);
-   unsigned shift = (unsigned long)_table & MAX_PGTABLE_INDEX_SIZE;
-
-   pgtable_free(table, shift);
-}
-
-#else
-static inline void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift)
-{
-   pgtable_free(table, shift);
-}
-#endif
-
-static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
- unsigned long address)
-{
-   tlb_flush_pgtable(tlb, address);
-   pgtable_free_tlb(tlb, table, 0);
-}
-
 #define __pmd_free_tlb(tlb, pmd, addr)   \
pgtable_free_tlb(tlb, pmd, PMD_CACHE_INDEX)
 #define __pud_free_tlb(tlb, pud, addr)   \
diff --git a/arch/powerpc/include/asm/nohash/pgalloc.h b/arch/powerpc/include/asm/nohash/pgalloc.h
index 0634f2949438..4fccac6af3ad 100644
--- a/arch/powerpc/include/asm/nohash/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/pgalloc.h
@@ -21,4 +21,48 @@ static inline void tlb_flush_pgtable(struct mmu_gather *tlb,
 #else
 #include 
 #endif
+
+static inline void pgtable_free(void *table, int shift)
+{
+   if (!shift) {
+   pte_fragment_free((unsigned long *)table, 0);
+   } else {
+   BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
+   kmem_cache_free(PGT_CACHE(shift), table);
+   }
+}
+
+#define get_hugepd_cache_index(x)  (x)
+
+#ifdef CONFIG_SMP
+static inline void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift)
+{
+   unsigned long pgf = (unsigned long)table;
+
+   BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
+   pgf |= shift;
+   tlb_remove_table(tlb, (void *)pgf);
+}
+
+static inline void __tlb_remove_table(void *_table)
+{
+ 

Re: [PATCH v4 1/4] ocxl: Rename struct link to ocxl_link

2019-04-03 Thread Frederic Barrat




Le 25/03/2019 à 06:34, Alastair D'Silva a écrit :

From: Alastair D'Silva 

The term 'link' is ambiguous (especially when the struct is used for a
list), so rename it for clarity.

Signed-off-by: Alastair D'Silva 
Reviewed-by: Greg Kurz 
---


Acked-by: Frederic Barrat 



  drivers/misc/ocxl/file.c |  5 ++---
  drivers/misc/ocxl/link.c | 36 ++--
  2 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
index e6a607488f8a..009e09b7ded5 100644
--- a/drivers/misc/ocxl/file.c
+++ b/drivers/misc/ocxl/file.c
@@ -151,10 +151,9 @@ static long afu_ioctl_enable_p9_wait(struct ocxl_context *ctx,
	mutex_unlock(&ctx->status_mutex);
  
  		if (status == ATTACHED) {

-   int rc;
-   struct link *link = ctx->afu->fn->link;
+   int rc = ocxl_link_update_pe(ctx->afu->fn->link,
+   ctx->pasid, ctx->tidr);
  
-			rc = ocxl_link_update_pe(link, ctx->pasid, ctx->tidr);

if (rc)
return rc;
}
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index d50b861d7e57..8d2690a1a9de 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -76,7 +76,7 @@ struct spa {
   * limited number of opencapi slots on a system and lookup is only
   * done when the device is probed
   */
-struct link {
+struct ocxl_link {
struct list_head list;
struct kref ref;
int domain;
@@ -179,7 +179,7 @@ static void xsl_fault_handler_bh(struct work_struct *fault_work)
  
  static irqreturn_t xsl_fault_handler(int irq, void *data)

  {
-   struct link *link = (struct link *) data;
+   struct ocxl_link *link = (struct ocxl_link *) data;
struct spa *spa = link->spa;
u64 dsisr, dar, pe_handle;
struct pe_data *pe_data;
@@ -256,7 +256,7 @@ static int map_irq_registers(struct pci_dev *dev, struct spa *spa)
	&spa->reg_tfc, &spa->reg_pe_handle);
  }
  
-static int setup_xsl_irq(struct pci_dev *dev, struct link *link)

+static int setup_xsl_irq(struct pci_dev *dev, struct ocxl_link *link)
  {
struct spa *spa = link->spa;
int rc;
@@ -311,7 +311,7 @@ static int setup_xsl_irq(struct pci_dev *dev, struct link *link)
return rc;
  }
  
-static void release_xsl_irq(struct link *link)

+static void release_xsl_irq(struct ocxl_link *link)
  {
struct spa *spa = link->spa;
  
@@ -323,7 +323,7 @@ static void release_xsl_irq(struct link *link)

unmap_irq_registers(spa);
  }
  
-static int alloc_spa(struct pci_dev *dev, struct link *link)

+static int alloc_spa(struct pci_dev *dev, struct ocxl_link *link)
  {
struct spa *spa;
  
@@ -350,7 +350,7 @@ static int alloc_spa(struct pci_dev *dev, struct link *link)

return 0;
  }
  
-static void free_spa(struct link *link)

+static void free_spa(struct ocxl_link *link)
  {
struct spa *spa = link->spa;
  
@@ -364,12 +364,12 @@ static void free_spa(struct link *link)

}
  }
  
-static int alloc_link(struct pci_dev *dev, int PE_mask, struct link **out_link)

+static int alloc_link(struct pci_dev *dev, int PE_mask, struct ocxl_link **out_link)
  {
-   struct link *link;
+   struct ocxl_link *link;
int rc;
  
-	link = kzalloc(sizeof(struct link), GFP_KERNEL);

+   link = kzalloc(sizeof(struct ocxl_link), GFP_KERNEL);
if (!link)
return -ENOMEM;
  
@@ -405,7 +405,7 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, struct link **out_link)

return rc;
  }
  
-static void free_link(struct link *link)

+static void free_link(struct ocxl_link *link)
  {
release_xsl_irq(link);
free_spa(link);
@@ -415,7 +415,7 @@ static void free_link(struct link *link)
  int ocxl_link_setup(struct pci_dev *dev, int PE_mask, void **link_handle)
  {
int rc = 0;
-   struct link *link;
+   struct ocxl_link *link;
  
	mutex_lock(&links_list_lock);

	list_for_each_entry(link, &links_list, list) {
@@ -442,7 +442,7 @@ EXPORT_SYMBOL_GPL(ocxl_link_setup);
  
  static void release_xsl(struct kref *ref)

  {
-   struct link *link = container_of(ref, struct link, ref);
+   struct ocxl_link *link = container_of(ref, struct ocxl_link, ref);
  
	list_del(&link->list);

/* call platform code before releasing data */
@@ -452,7 +452,7 @@ static void release_xsl(struct kref *ref)
  
  void ocxl_link_release(struct pci_dev *dev, void *link_handle)

  {
-   struct link *link = (struct link *) link_handle;
+   struct ocxl_link *link = (struct ocxl_link *) link_handle;
  
	mutex_lock(&links_list_lock);

	kref_put(&link->ref, release_xsl);
@@ -488,7 +488,7 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
void *xsl_err_data)
  {
-   

Re: [PATCH 3/5] s390: Fix vDSO clock_getres()

2019-04-03 Thread Vincenzo Frascino


On 03/04/2019 11:06, Thomas Gleixner wrote:
> On Wed, 3 Apr 2019, Martin Schwidefsky wrote:
> 
>> On Mon,  1 Apr 2019 12:51:50 +0100
>> Vincenzo Frascino  wrote:
>>
>>> clock_getres in the vDSO library has to preserve the same behaviour
>>> of posix_get_hrtimer_res().
>>>
>>> In particular, posix_get_hrtimer_res() does:
>>> sec = 0;
>>> ns = hrtimer_resolution;
>>> and hrtimer_resolution depends on the enablement of the high
>>> resolution timers that can happen either at compile or at run time.
>>>
>>> Fix the s390 vdso implementation of clock_getres keeping a copy of
>>> hrtimer_resolution in vdso data and using that directly.
>>>
>>> Cc: Martin Schwidefsky 
>>> Cc: Heiko Carstens 
>>> Signed-off-by: Vincenzo Frascino 
>>> ---
>>>  arch/s390/include/asm/vdso.h   |  1 +
>>>  arch/s390/kernel/asm-offsets.c |  2 +-
>>>  arch/s390/kernel/time.c|  1 +
>>>  arch/s390/kernel/vdso32/clock_getres.S | 17 -
>>>  arch/s390/kernel/vdso64/clock_getres.S | 15 ++-
>>>  5 files changed, 25 insertions(+), 11 deletions(-)
>>
>> I tried this patch and in principle this works. In that regard
>> Acked-by: Martin Schwidefsky 
>>
>> But I wonder if the loop to check the update counter is really
>> necessary. The hrtimer_resolution value can only changes once with
>> the first call to hrtimer_switch_to_hres(). With the TOD clock
>> as the only clock available on s390 we always have the ability
>> to do hrtimer. It then all depends on the highres=[on|off] kernel
>> parameter what value we get with clock_getres().
> 
> Yes, it's not changing after boot anymore.
> 
> Thanks,
> 
>   tglx
> 

Ok, I will remove the loop from both the implementations and post it with v2.

-- 
Regards,
Vincenzo


Re: [5/7] cpufreq/pasemi: Checking implementation of pas_cpufreq_cpu_init()

2019-04-03 Thread Dan Carpenter
On Wed, Apr 03, 2019 at 04:23:54PM +0200, Markus Elfring wrote:
> > @@ -146,6 +146,7 @@  static int pas_cpufreq_cpu_init(struct cpufreq_policy *policy)
> >
> > cpu = of_get_cpu_node(policy->cpu, NULL);
> >
> > +   of_node_put(cpu);
> > if (!cpu)
> > goto out;
> 
> Can the statement “return -ENODEV” be nicer as exception handling
> in the if branch of this source code place?
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/cpufreq/pasemi-cpufreq.c?id=bf97b82f37c6d90e16de001d0659644c57fa490d#n137
> 

Why am I only receiving one side of this conversation?

I don't know why you're responding to...  It's not required to fix/change
unrelated style choices.  If people want, they can just focus on their
own thing.

regards,
dan carpenter



Re: [PATCH v4 3/4] ocxl: Remove superfluous 'extern' from headers

2019-04-03 Thread Frederic Barrat




Le 25/03/2019 à 06:34, Alastair D'Silva a écrit :

From: Alastair D'Silva 

The 'extern' keyword adds no value here.

Signed-off-by: Alastair D'Silva 
---


Acked-by: Frederic Barrat 



  drivers/misc/ocxl/ocxl_internal.h | 54 +++
  include/misc/ocxl.h   | 36 ++---
  2 files changed, 44 insertions(+), 46 deletions(-)

diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
index a32f2151029f..321b29e77f45 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -16,7 +16,6 @@
  
  extern struct pci_driver ocxl_pci_driver;
  
-

  struct ocxl_fn {
struct device dev;
int bar_used[3];
@@ -92,41 +91,40 @@ struct ocxl_process_element {
__be32 software_state;
  };
  
+struct ocxl_afu *ocxl_afu_get(struct ocxl_afu *afu);

+void ocxl_afu_put(struct ocxl_afu *afu);
  
-extern struct ocxl_afu *ocxl_afu_get(struct ocxl_afu *afu);

-extern void ocxl_afu_put(struct ocxl_afu *afu);
-
-extern int ocxl_create_cdev(struct ocxl_afu *afu);
-extern void ocxl_destroy_cdev(struct ocxl_afu *afu);
-extern int ocxl_register_afu(struct ocxl_afu *afu);
-extern void ocxl_unregister_afu(struct ocxl_afu *afu);
+int ocxl_create_cdev(struct ocxl_afu *afu);
+void ocxl_destroy_cdev(struct ocxl_afu *afu);
+int ocxl_register_afu(struct ocxl_afu *afu);
+void ocxl_unregister_afu(struct ocxl_afu *afu);
  
-extern int ocxl_file_init(void);

-extern void ocxl_file_exit(void);
+int ocxl_file_init(void);
+void ocxl_file_exit(void);
  
-extern int ocxl_pasid_afu_alloc(struct ocxl_fn *fn, u32 size);

-extern void ocxl_pasid_afu_free(struct ocxl_fn *fn, u32 start, u32 size);
-extern int ocxl_actag_afu_alloc(struct ocxl_fn *fn, u32 size);
-extern void ocxl_actag_afu_free(struct ocxl_fn *fn, u32 start, u32 size);
+int ocxl_pasid_afu_alloc(struct ocxl_fn *fn, u32 size);
+void ocxl_pasid_afu_free(struct ocxl_fn *fn, u32 start, u32 size);
+int ocxl_actag_afu_alloc(struct ocxl_fn *fn, u32 size);
+void ocxl_actag_afu_free(struct ocxl_fn *fn, u32 start, u32 size);
  
-extern struct ocxl_context *ocxl_context_alloc(void);

-extern int ocxl_context_init(struct ocxl_context *ctx, struct ocxl_afu *afu,
+struct ocxl_context *ocxl_context_alloc(void);
+int ocxl_context_init(struct ocxl_context *ctx, struct ocxl_afu *afu,
struct address_space *mapping);
-extern int ocxl_context_attach(struct ocxl_context *ctx, u64 amr);
-extern int ocxl_context_mmap(struct ocxl_context *ctx,
+int ocxl_context_attach(struct ocxl_context *ctx, u64 amr);
+int ocxl_context_mmap(struct ocxl_context *ctx,
struct vm_area_struct *vma);
-extern int ocxl_context_detach(struct ocxl_context *ctx);
-extern void ocxl_context_detach_all(struct ocxl_afu *afu);
-extern void ocxl_context_free(struct ocxl_context *ctx);
+int ocxl_context_detach(struct ocxl_context *ctx);
+void ocxl_context_detach_all(struct ocxl_afu *afu);
+void ocxl_context_free(struct ocxl_context *ctx);
  
-extern int ocxl_sysfs_add_afu(struct ocxl_afu *afu);

-extern void ocxl_sysfs_remove_afu(struct ocxl_afu *afu);
+int ocxl_sysfs_add_afu(struct ocxl_afu *afu);
+void ocxl_sysfs_remove_afu(struct ocxl_afu *afu);
  
-extern int ocxl_afu_irq_alloc(struct ocxl_context *ctx, u64 *irq_offset);

-extern int ocxl_afu_irq_free(struct ocxl_context *ctx, u64 irq_offset);
-extern void ocxl_afu_irq_free_all(struct ocxl_context *ctx);
-extern int ocxl_afu_irq_set_fd(struct ocxl_context *ctx, u64 irq_offset,
+int ocxl_afu_irq_alloc(struct ocxl_context *ctx, u64 *irq_offset);
+int ocxl_afu_irq_free(struct ocxl_context *ctx, u64 irq_offset);
+void ocxl_afu_irq_free_all(struct ocxl_context *ctx);
+int ocxl_afu_irq_set_fd(struct ocxl_context *ctx, u64 irq_offset,
int eventfd);
-extern u64 ocxl_afu_irq_get_addr(struct ocxl_context *ctx, u64 irq_offset);
+u64 ocxl_afu_irq_get_addr(struct ocxl_context *ctx, u64 irq_offset);
  
  #endif /* _OCXL_INTERNAL_H_ */

diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
index 9ff6ddc28e22..4544573cc93c 100644
--- a/include/misc/ocxl.h
+++ b/include/misc/ocxl.h
@@ -53,7 +53,7 @@ struct ocxl_fn_config {
   * Read the configuration space of a function and fill in a
   * ocxl_fn_config structure with all the function details
   */
-extern int ocxl_config_read_function(struct pci_dev *dev,
+int ocxl_config_read_function(struct pci_dev *dev,
struct ocxl_fn_config *fn);
  
  /*

@@ -62,14 +62,14 @@ extern int ocxl_config_read_function(struct pci_dev *dev,
   * AFU indexes can be sparse, so a driver should check all indexes up
   * to the maximum found in the function description
   */
-extern int ocxl_config_check_afu_index(struct pci_dev *dev,
+int ocxl_config_check_afu_index(struct pci_dev *dev,
struct ocxl_fn_config *fn, int afu_idx);
  
  /*

   * Read the configuration space of a function for the AFU specified by
   * the index 

Re: [PATCH v4 2/4] ocxl: read_pasid never returns an error, so make it void

2019-04-03 Thread Frederic Barrat




Le 25/03/2019 à 06:34, Alastair D'Silva a écrit :

From: Alastair D'Silva 

No need for a return value in read_pasid as it only returns 0.

Signed-off-by: Alastair D'Silva 
Reviewed-by: Greg Kurz 
---


Acked-by: Frederic Barrat 



  drivers/misc/ocxl/config.c | 9 ++---
  1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index 8f2c5d8bd2ee..4dc11897237d 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -68,7 +68,7 @@ static int find_dvsec_afu_ctrl(struct pci_dev *dev, u8 afu_idx)
return 0;
  }
  
-static int read_pasid(struct pci_dev *dev, struct ocxl_fn_config *fn)

+static void read_pasid(struct pci_dev *dev, struct ocxl_fn_config *fn)
  {
u16 val;
int pos;
@@ -89,7 +89,6 @@ static int read_pasid(struct pci_dev *dev, struct ocxl_fn_config *fn)
  out:
	dev_dbg(&dev->dev, "PASID capability:\n");
	dev_dbg(&dev->dev, "  Max PASID log = %d\n", fn->max_pasid_log);
-   return 0;
  }
  
  static int read_dvsec_tl(struct pci_dev *dev, struct ocxl_fn_config *fn)

@@ -205,11 +204,7 @@ int ocxl_config_read_function(struct pci_dev *dev, struct ocxl_fn_config *fn)
  {
int rc;
  
-	rc = read_pasid(dev, fn);

-   if (rc) {
-   dev_err(&dev->dev, "Invalid PASID configuration: %d\n", rc);
-   return -ENODEV;
-   }
+   read_pasid(dev, fn);
  
  	rc = read_dvsec_tl(dev, fn);

if (rc) {





Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

2019-04-03 Thread Will Deacon
On Wed, Apr 03, 2019 at 09:39:52AM -0600, Jens Axboe wrote:
> On 4/3/19 9:19 AM, Will Deacon wrote:
> > On Wed, Apr 03, 2019 at 07:49:26AM -0600, Jens Axboe wrote:
> >> On 4/3/19 5:11 AM, Will Deacon wrote:
> >>> will@autoplooker:~/liburing/test$ ./io_uring_register 
> >>> RELIMIT_MEMLOCK: 67108864 (67108864)
> >>> [   35.477875] Unable to handle kernel NULL pointer dereference at 
> >>> virtual address 0070
> >>> [   35.478969] Mem abort info:
> >>> [   35.479296]   ESR = 0x9604
> >>> [   35.479785]   Exception class = DABT (current EL), IL = 32 bits
> >>> [   35.480528]   SET = 0, FnV = 0
> >>> [   35.480980]   EA = 0, S1PTW = 0
> >>> [   35.481345] Data abort info:
> >>> [   35.481680]   ISV = 0, ISS = 0x0004
> >>> [   35.482267]   CM = 0, WnR = 0
> >>> [   35.482618] user pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval)
> >>> [   35.483486] [0070] pgd=
> >>> [   35.484041] Internal error: Oops: 9604 [#1] PREEMPT SMP
> >>> [   35.484788] Modules linked in:
> >>> [   35.485311] CPU: 113 PID: 3973 Comm: io_uring_regist Not tainted 
> >>> 5.1.0-rc3-00012-g40b114779944 #1
> >>> [   35.486712] Hardware name: linux,dummy-virt (DT)
> >>> [   35.487450] pstate: 2045 (nzCv daif +PAN -UAO)
> >>> [   35.488228] pc : link_pwq+0x10/0x60
> >>> [   35.488794] lr : apply_wqattrs_commit+0xe0/0x118
> >>> [   35.489550] sp : 17e2bbc0
> >>
> >> Huh, this looks odd, it's crashing inside the wq setup.
> > 
> > Enabling KASAN seems to indicate a double-free, which may well be related.
> 
> Does this help?

Yes, thanks for the quick patch. Feel free to add:

Reported-by: Will Deacon 
Tested-by: Will Deacon 

if you spin a proper patch.

Will

> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index bbdbd56cf2ac..07d6ef195d05 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -2215,6 +2215,7 @@ static int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
>   fput(ctx->user_files[i]);
>  
>   kfree(ctx->user_files);
> + ctx->user_files = NULL;
>   ctx->nr_user_files = 0;
>   return ret;
>   }
> 
> -- 
> Jens Axboe
> 


[PATCH v8 05/20] KVM: PPC: Book3S HV: Remove pmd_is_leaf()

2019-04-03 Thread Steven Price
Since pmd_large() is now always available, pmd_is_leaf() is redundant.
Replace all uses with calls to pmd_large().

CC: Benjamin Herrenschmidt 
CC: Michael Ellerman 
CC: Paul Mackerras 
CC: kvm-...@vger.kernel.org
CC: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Steven Price 
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index f55ef071883f..1b57b4e3f819 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -363,12 +363,6 @@ static void kvmppc_pte_free(pte_t *ptep)
kmem_cache_free(kvm_pte_cache, ptep);
 }
 
-/* Like pmd_huge() and pmd_large(), but works regardless of config options */
-static inline int pmd_is_leaf(pmd_t pmd)
-{
-   return !!(pmd_val(pmd) & _PAGE_PTE);
-}
-
 static pmd_t *kvmppc_pmd_alloc(void)
 {
return kmem_cache_alloc(kvm_pmd_cache, GFP_KERNEL);
@@ -460,7 +454,7 @@ static void kvmppc_unmap_free_pmd(struct kvm *kvm, pmd_t *pmd, bool full,
for (im = 0; im < PTRS_PER_PMD; ++im, ++p) {
if (!pmd_present(*p))
continue;
-   if (pmd_is_leaf(*p)) {
+   if (pmd_large(*p)) {
if (full) {
pmd_clear(p);
} else {
@@ -593,7 +587,7 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte,
else if (level <= 1)
new_pmd = kvmppc_pmd_alloc();
 
-   if (level == 0 && !(pmd && pmd_present(*pmd) && !pmd_is_leaf(*pmd)))
+   if (level == 0 && !(pmd && pmd_present(*pmd) && !pmd_large(*pmd)))
new_ptep = kvmppc_pte_alloc();
 
/* Check if we might have been invalidated; let the guest retry if so */
@@ -662,7 +656,7 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte,
new_pmd = NULL;
}
pmd = pmd_offset(pud, gpa);
-   if (pmd_is_leaf(*pmd)) {
+   if (pmd_large(*pmd)) {
unsigned long lgpa = gpa & PMD_MASK;
 
/* Check if we raced and someone else has set the same thing */
-- 
2.20.1
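Both the removed pmd_is_leaf() and the book3s64 pmd_large() that replaces it reduce to testing the _PAGE_PTE software bit, which is why the substitution is behaviour-preserving. A minimal userspace sketch of that test (the bit value and the names here are illustrative stand-ins, not the real book3s64 definitions):

```c
#include <stdint.h>
#include <assert.h>

/* Stand-in for _PAGE_PTE: on real book3s64 this is a specific
 * software bit in the radix/hash PTE format; 0x1 is illustrative. */
#define DEMO_PAGE_PTE 0x1UL

typedef struct { uint64_t val; } demo_pmd_t;

/* A "leaf" PMD maps a huge page directly instead of pointing at a
 * lower-level page-table page: detected by the PTE marker bit. */
static int demo_pmd_large(demo_pmd_t pmd)
{
    return !!(pmd.val & DEMO_PAGE_PTE);
}
```

With identical tests at both call sites, keeping the single unconditional pmd_large() definition is the simpler choice.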



Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

2019-04-03 Thread Will Deacon
Hi Jens,

On Wed, Apr 03, 2019 at 07:49:26AM -0600, Jens Axboe wrote:
> On 4/3/19 5:11 AM, Will Deacon wrote:
> > will@autoplooker:~/liburing/test$ ./io_uring_register 
> > RELIMIT_MEMLOCK: 67108864 (67108864)
> > [   35.477875] Unable to handle kernel NULL pointer dereference at virtual 
> > address 0070
> > [   35.478969] Mem abort info:
> > [   35.479296]   ESR = 0x9604
> > [   35.479785]   Exception class = DABT (current EL), IL = 32 bits
> > [   35.480528]   SET = 0, FnV = 0
> > [   35.480980]   EA = 0, S1PTW = 0
> > [   35.481345] Data abort info:
> > [   35.481680]   ISV = 0, ISS = 0x0004
> > [   35.482267]   CM = 0, WnR = 0
> > [   35.482618] user pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval)
> > [   35.483486] [0070] pgd=
> > [   35.484041] Internal error: Oops: 9604 [#1] PREEMPT SMP
> > [   35.484788] Modules linked in:
> > [   35.485311] CPU: 113 PID: 3973 Comm: io_uring_regist Not tainted 
> > 5.1.0-rc3-00012-g40b114779944 #1
> > [   35.486712] Hardware name: linux,dummy-virt (DT)
> > [   35.487450] pstate: 2045 (nzCv daif +PAN -UAO)
> > [   35.488228] pc : link_pwq+0x10/0x60
> > [   35.488794] lr : apply_wqattrs_commit+0xe0/0x118
> > [   35.489550] sp : 17e2bbc0
> 
> Huh, this looks odd, it's crashing inside the wq setup.

Enabling KASAN seems to indicate a double-free, which may well be related.

Will

[  149.890370] 
==
[  149.891266] BUG: KASAN: double-free or invalid-free in 
io_sqe_files_unregister+0xa8/0x140
[  149.892218] 
[  149.892411] CPU: 113 PID: 3974 Comm: io_uring_regist Tainted: GB 
5.1.0-rc3-00012-g40b114779944 #3
[  149.893623] Hardware name: linux,dummy-virt (DT)
[  149.894169] Call trace:
[  149.894539]  dump_backtrace+0x0/0x228
[  149.895172]  show_stack+0x14/0x20
[  149.895747]  dump_stack+0xe8/0x124
[  149.896335]  print_address_description+0x60/0x258
[  149.897148]  kasan_report_invalid_free+0x78/0xb8
[  149.897936]  __kasan_slab_free+0x1fc/0x228
[  149.898641]  kasan_slab_free+0x10/0x18
[  149.899283]  kfree+0x70/0x1f8
[  149.899798]  io_sqe_files_unregister+0xa8/0x140
[  149.900574]  io_ring_ctx_wait_and_kill+0x190/0x3c0
[  149.901402]  io_uring_release+0x2c/0x48
[  149.902068]  __fput+0x18c/0x510
[  149.902612]  fput+0xc/0x18
[  149.903146]  task_work_run+0xf0/0x148
[  149.903778]  do_notify_resume+0x554/0x748
[  149.904467]  work_pending+0x8/0x10
[  149.905060] 
[  149.905331] Allocated by task 3974:
[  149.905934]  __kasan_kmalloc.isra.0.part.1+0x48/0xf8
[  149.906786]  __kasan_kmalloc.isra.0+0xb8/0xd8
[  149.907531]  kasan_kmalloc+0xc/0x18
[  149.908134]  __kmalloc+0x168/0x248
[  149.908724]  __arm64_sys_io_uring_register+0x2b8/0x15a8
[  149.909622]  el0_svc_common+0x100/0x258
[  149.910281]  el0_svc_handler+0x48/0xc0
[  149.910928]  el0_svc+0x8/0xc
[  149.911425] 
[  149.911696] Freed by task 3974:
[  149.912242]  __kasan_slab_free+0x114/0x228
[  149.912955]  kasan_slab_free+0x10/0x18
[  149.913602]  kfree+0x70/0x1f8
[  149.914118]  __arm64_sys_io_uring_register+0xc2c/0x15a8
[  149.915009]  el0_svc_common+0x100/0x258
[  149.915670]  el0_svc_handler+0x48/0xc0
[  149.916317]  el0_svc+0x8/0xc
[  149.916817] 
[  149.917101] The buggy address belongs to the object at 8004ce07ed00
[  149.917101]  which belongs to the cache kmalloc-128 of size 128
[  149.919197] The buggy address is located 0 bytes inside of
[  149.919197]  128-byte region [8004ce07ed00, 8004ce07ed80)
[  149.921142] The buggy address belongs to the page:
[  149.921953] page:7e0013381f00 count:1 mapcount:0 
mapping:800503417c00 index:0x0 compound_mapcount: 0
[  149.923595] flags: 0x10010200(slab|head)
[  149.924388] raw: 10010200 dead0100 dead0200 
800503417c00
[  149.925706] raw:  80400040 0001 

[  149.927011] page dumped because: kasan: bad access detected
[  149.927956] 
[  149.928224] Memory state around the buggy address:
[  149.929054]  8004ce07ec00: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc 
fc
[  149.930274]  8004ce07ec80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc 
fc
[  149.931494] >8004ce07ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb
[  149.932712]^
[  149.933281]  8004ce07ed80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc 
fc
[  149.934508]  8004ce07ee00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc 
fc
[  149.935725] 
==


Re: [PATCH v2 4/6] powerpc: use common ptrace_syscall_enter hook to handle _TIF_SYSCALL_EMU

2019-04-03 Thread Will Deacon
Hi Oleg,

On Tue, Mar 19, 2019 at 06:32:33PM +0100, Oleg Nesterov wrote:
> On 03/19, Oleg Nesterov wrote:
> >
> > Well, personally I see no point... Again, after the trivial simplification
> > x86 does
> >
> > if (work & (_TIF_SYSCALL_EMU | _TIF_SYSCALL_TRACE)) {
> > ret = tracehook_report_syscall_entry(regs);
> > if (ret || (work & _TIF_SYSCALL_EMU))
> > return -1L;
> > }
> >
> > this looks simple enough for copy-and-paste.
> >
> > > If there's a better way to achieve the same
> >
> > I can only say that if we add a common helper, I think it should absorb
> > tracehook_report_syscall_entry() and handle both TIF's just like the code
> > above does. Not sure this makes any sense.
> 
> this won't work, looking at 6/6 I see that arm64 needs to distinguish
> _TRACE and _EMU ... I don't understand this code, but it looks suspicious.
> If tracehook_report_syscall_entry() returns nonzero the tracee was killed,
> syscall_trace_enter() should just return.
> 
> To me this is another indication that consolidation makes no sense ;)

The reason I'm pushing for consolidation here is because I think it's the
only sane way to maintain the tracing and debug hooks on the syscall
entry/exit paths. Having to look at all the different arch implementations
and distil the portable semantics is a nightmare and encourages gradual
divergence over time. Given that we don't support this SYSCALL_EMU stuff
on arm64 today, we have the opportunity to make this generic and allow other
architectures (e.g. riscv) to hook in the same way that we do. It clearly
shouldn't affect the behaviour of existing architectures which already
support the functionality.

However, I also agree that this patch series looks dodgy as it stands -- we
shouldn't have code paths that can result in calling
tracehook_report_syscall_entry() twice.

Will
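For reference, the consolidated shape that Oleg's quoted x86 snippet suggests — both TIF flags handled around a single report call — can be sketched in userspace as follows. Here report() stands in for tracehook_report_syscall_entry(), and the flag values are hypothetical, not any architecture's real definitions:

```c
#include <stddef.h>
#include <assert.h>

#define TIF_SYSCALL_TRACE 0x1UL
#define TIF_SYSCALL_EMU   0x2UL

/* Stub tracers for demonstration: one lets the syscall proceed,
 * one signals that the tracer aborted/killed the tracee. */
static int report_ok(void *regs)   { (void)regs; return 0; }
static int report_kill(void *regs) { (void)regs; return 1; }

/* Returns -1 when the syscall must not run, 0 otherwise — the
 * shape of the x86 snippet quoted above, not any arch's real code.
 * The report happens at most once, which is the property the
 * series under review fails to guarantee. */
static long syscall_enter_demo(unsigned long work,
                               int (*report)(void *regs), void *regs)
{
    if (work & (TIF_SYSCALL_EMU | TIF_SYSCALL_TRACE)) {
        int ret = report(regs);
        if (ret || (work & TIF_SYSCALL_EMU))
            return -1L;
    }
    return 0;
}
```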


[PATCH v8 04/20] powerpc: mm: Add p?d_large() definitions

2019-04-03 Thread Steven Price
walk_page_range() is going to be allowed to walk page tables other than
those of user space. For this it needs to know when it has reached a
'leaf' entry in the page tables. This information is provided by the
p?d_large() functions/macros.

For powerpc pmd_large() was already implemented, so hoist it out of the
CONFIG_TRANSPARENT_HUGEPAGE condition and implement the other levels.

CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Michael Ellerman 
CC: linuxppc-dev@lists.ozlabs.org
CC: kvm-...@vger.kernel.org
Signed-off-by: Steven Price 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 30 ++--
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 581f91be9dd4..f6d1ac8b832e 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -897,6 +897,12 @@ static inline int pud_present(pud_t pud)
return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PRESENT));
 }
 
+#define pud_large  pud_large
+static inline int pud_large(pud_t pud)
+{
+   return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
+}
+
 extern struct page *pud_page(pud_t pud);
 extern struct page *pmd_page(pmd_t pmd);
 static inline pte_t pud_pte(pud_t pud)
@@ -940,6 +946,12 @@ static inline int pgd_present(pgd_t pgd)
return !!(pgd_raw(pgd) & cpu_to_be64(_PAGE_PRESENT));
 }
 
+#define pgd_large  pgd_large
+static inline int pgd_large(pgd_t pgd)
+{
+   return !!(pgd_raw(pgd) & cpu_to_be64(_PAGE_PTE));
+}
+
 static inline pte_t pgd_pte(pgd_t pgd)
 {
return __pte_raw(pgd_raw(pgd));
@@ -1093,6 +1105,15 @@ static inline bool pmd_access_permitted(pmd_t pmd, bool write)
return pte_access_permitted(pmd_pte(pmd), write);
 }
 
+#define pmd_large  pmd_large
+/*
+ * returns true for pmd migration entries, THP, devmap, hugetlb
+ */
+static inline int pmd_large(pmd_t pmd)
+{
+   return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
+}
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 extern pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot);
 extern pmd_t mk_pmd(struct page *page, pgprot_t pgprot);
@@ -1119,15 +1140,6 @@ pmd_hugepage_update(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp,
return hash__pmd_hugepage_update(mm, addr, pmdp, clr, set);
 }
 
-/*
- * returns true for pmd migration entries, THP, devmap, hugetlb
- * But compile time dependent on THP config
- */
-static inline int pmd_large(pmd_t pmd)
-{
-   return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
-}
-
 static inline pmd_t pmd_mknotpresent(pmd_t pmd)
 {
return __pmd(pmd_val(pmd) & ~_PAGE_PRESENT);
-- 
2.20.1
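The reason walk_page_range() needs the p?d_large() helpers is so a generic walker knows when an entry is the mapping itself and must not be dereferenced as a pointer to a lower-level table. A hedged sketch of that walker shape, with one level shown (LEAF_BIT, entry_t and the other names are illustrative stand-ins for _PAGE_PTE and the kernel's real entry types):

```c
#include <stddef.h>
#include <stdint.h>
#include <assert.h>

#define LEAF_BIT 0x1UL   /* stand-in for _PAGE_PTE */

typedef struct { uint64_t val; } entry_t;

struct level {
    entry_t *entries;
    size_t nr;
};

static int entry_large(entry_t e) { return !!(e.val & LEAF_BIT); }

/* Mirrors the shape of walk_page_range(): at each level, a
 * p?d_large() hit means "this entry is the final mapping", so the
 * walker stops there instead of descending. Here we just count
 * leaf entries at the top level. */
static size_t count_leaves(const struct level *top)
{
    size_t i, leaves = 0;
    for (i = 0; i < top->nr; i++) {
        if (entry_large(top->entries[i]))
            leaves++;    /* huge mapping: do not descend */
        /* else: a real walker would recurse into the next level */
    }
    return leaves;
}
```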



Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

2019-04-03 Thread Jens Axboe
On 4/3/19 9:49 AM, Will Deacon wrote:
> On Wed, Apr 03, 2019 at 09:39:52AM -0600, Jens Axboe wrote:
>> On 4/3/19 9:19 AM, Will Deacon wrote:
>>> On Wed, Apr 03, 2019 at 07:49:26AM -0600, Jens Axboe wrote:
 On 4/3/19 5:11 AM, Will Deacon wrote:
> will@autoplooker:~/liburing/test$ ./io_uring_register 
> RELIMIT_MEMLOCK: 67108864 (67108864)
> [   35.477875] Unable to handle kernel NULL pointer dereference at 
> virtual address 0070
> [   35.478969] Mem abort info:
> [   35.479296]   ESR = 0x9604
> [   35.479785]   Exception class = DABT (current EL), IL = 32 bits
> [   35.480528]   SET = 0, FnV = 0
> [   35.480980]   EA = 0, S1PTW = 0
> [   35.481345] Data abort info:
> [   35.481680]   ISV = 0, ISS = 0x0004
> [   35.482267]   CM = 0, WnR = 0
> [   35.482618] user pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval)
> [   35.483486] [0070] pgd=
> [   35.484041] Internal error: Oops: 9604 [#1] PREEMPT SMP
> [   35.484788] Modules linked in:
> [   35.485311] CPU: 113 PID: 3973 Comm: io_uring_regist Not tainted 
> 5.1.0-rc3-00012-g40b114779944 #1
> [   35.486712] Hardware name: linux,dummy-virt (DT)
> [   35.487450] pstate: 2045 (nzCv daif +PAN -UAO)
> [   35.488228] pc : link_pwq+0x10/0x60
> [   35.488794] lr : apply_wqattrs_commit+0xe0/0x118
> [   35.489550] sp : 17e2bbc0

 Huh, this looks odd, it's crashing inside the wq setup.
>>>
>>> Enabling KASAN seems to indicate a double-free, which may well be related.
>>
>> Does this help?
> 
> Yes, thanks for the quick patch. Feel free to add:
> 
> Reported-by: Will Deacon 
> Tested-by: Will Deacon 
> 
> if you spin a proper patch.

Great, thanks for reporting/testing.

-- 
Jens Axboe



Re: [PATCH v4 4/4] ocxl: Remove some unused exported symbols

2019-04-03 Thread Frederic Barrat




On 25/03/2019 at 06:34, Alastair D'Silva wrote:

From: Alastair D'Silva 

Remove some unused exported symbols.

Signed-off-by: Alastair D'Silva 
---


Acked-by: Frederic Barrat 



  drivers/misc/ocxl/config.c|  4 +---
  drivers/misc/ocxl/ocxl_internal.h | 23 +++
  include/misc/ocxl.h   | 23 ---
  3 files changed, 24 insertions(+), 26 deletions(-)

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index 4dc11897237d..5e65acb8e134 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -2,8 +2,8 @@
  // Copyright 2017 IBM Corp.
  #include 
  #include 
-#include 
  #include 
+#include "ocxl_internal.h"
  
  #define EXTRACT_BIT(val, bit) (!!(val & BIT(bit)))

  #define EXTRACT_BITS(val, s, e) ((val & GENMASK(e, s)) >> s)
@@ -299,7 +299,6 @@ int ocxl_config_check_afu_index(struct pci_dev *dev,
}
return 1;
  }
-EXPORT_SYMBOL_GPL(ocxl_config_check_afu_index);
  
  static int read_afu_name(struct pci_dev *dev, struct ocxl_fn_config *fn,
			struct ocxl_afu_config *afu)
@@ -535,7 +534,6 @@ int ocxl_config_get_pasid_info(struct pci_dev *dev, int *count)
  {
return pnv_ocxl_get_pasid_count(dev, count);
  }
-EXPORT_SYMBOL_GPL(ocxl_config_get_pasid_info);
  
  void ocxl_config_set_afu_pasid(struct pci_dev *dev, int pos, int pasid_base,
			u32 pasid_count_log)
diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
index 321b29e77f45..06fd98c989c8 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -107,6 +107,29 @@ void ocxl_pasid_afu_free(struct ocxl_fn *fn, u32 start, u32 size);
  int ocxl_actag_afu_alloc(struct ocxl_fn *fn, u32 size);
  void ocxl_actag_afu_free(struct ocxl_fn *fn, u32 start, u32 size);
  
+/*
+ * Get the max PASID value that can be used by the function
+ */
+int ocxl_config_get_pasid_info(struct pci_dev *dev, int *count);
+
+/*
+ * Check if an AFU index is valid for the given function.
+ *
+ * AFU indexes can be sparse, so a driver should check all indexes up
+ * to the maximum found in the function description
+ */
+int ocxl_config_check_afu_index(struct pci_dev *dev,
+   struct ocxl_fn_config *fn, int afu_idx);
+
+/**
+ * Update values within a Process Element
+ *
+ * link_handle: the link handle associated with the process element
+ * pasid: the PASID for the AFU context
+ * tid: the new thread id for the process element
+ */
+int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid);
+
  struct ocxl_context *ocxl_context_alloc(void);
  int ocxl_context_init(struct ocxl_context *ctx, struct ocxl_afu *afu,
struct address_space *mapping);
diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
index 4544573cc93c..9530d3be1b30 100644
--- a/include/misc/ocxl.h
+++ b/include/misc/ocxl.h
@@ -56,15 +56,6 @@ struct ocxl_fn_config {
  int ocxl_config_read_function(struct pci_dev *dev,
struct ocxl_fn_config *fn);
  
-/*
- * Check if an AFU index is valid for the given function.
- *
- * AFU indexes can be sparse, so a driver should check all indexes up
- * to the maximum found in the function description
- */
-int ocxl_config_check_afu_index(struct pci_dev *dev,
-   struct ocxl_fn_config *fn, int afu_idx);
-
  /*
   * Read the configuration space of a function for the AFU specified by
   * the index 'afu_idx'. Fills in a ocxl_afu_config structure
@@ -74,11 +65,6 @@ int ocxl_config_read_afu(struct pci_dev *dev,
struct ocxl_afu_config *afu,
u8 afu_idx);
  
-/*
- * Get the max PASID value that can be used by the function
- */
-int ocxl_config_get_pasid_info(struct pci_dev *dev, int *count);
-
  /*
   * Tell an AFU, by writing in the configuration space, the PASIDs that
   * it can use. Range starts at 'pasid_base' and its size is a multiple
@@ -188,15 +174,6 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
void *xsl_err_data);
  
-/**
- * Update values within a Process Element
- *
- * link_handle: the link handle associated with the process element
- * pasid: the PASID for the AFU context
- * tid: the new thread id for the process element
- */
-int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid);
-
  /*
   * Remove a Process Element from the Shared Process Area for a link
   */





Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

2019-04-03 Thread Jens Axboe
On 4/3/19 9:19 AM, Will Deacon wrote:
> Hi Jens,
> 
> On Wed, Apr 03, 2019 at 07:49:26AM -0600, Jens Axboe wrote:
>> On 4/3/19 5:11 AM, Will Deacon wrote:
>>> will@autoplooker:~/liburing/test$ ./io_uring_register 
>>> RELIMIT_MEMLOCK: 67108864 (67108864)
>>> [   35.477875] Unable to handle kernel NULL pointer dereference at virtual 
>>> address 0070
>>> [   35.478969] Mem abort info:
>>> [   35.479296]   ESR = 0x9604
>>> [   35.479785]   Exception class = DABT (current EL), IL = 32 bits
>>> [   35.480528]   SET = 0, FnV = 0
>>> [   35.480980]   EA = 0, S1PTW = 0
>>> [   35.481345] Data abort info:
>>> [   35.481680]   ISV = 0, ISS = 0x0004
>>> [   35.482267]   CM = 0, WnR = 0
>>> [   35.482618] user pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval)
>>> [   35.483486] [0070] pgd=
>>> [   35.484041] Internal error: Oops: 9604 [#1] PREEMPT SMP
>>> [   35.484788] Modules linked in:
>>> [   35.485311] CPU: 113 PID: 3973 Comm: io_uring_regist Not tainted 
>>> 5.1.0-rc3-00012-g40b114779944 #1
>>> [   35.486712] Hardware name: linux,dummy-virt (DT)
>>> [   35.487450] pstate: 2045 (nzCv daif +PAN -UAO)
>>> [   35.488228] pc : link_pwq+0x10/0x60
>>> [   35.488794] lr : apply_wqattrs_commit+0xe0/0x118
>>> [   35.489550] sp : 17e2bbc0
>>
>> Huh, this looks odd, it's crashing inside the wq setup.
> 
> Enabling KASAN seems to indicate a double-free, which may well be related.

Does this help?


diff --git a/fs/io_uring.c b/fs/io_uring.c
index bbdbd56cf2ac..07d6ef195d05 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2215,6 +2215,7 @@ static int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
fput(ctx->user_files[i]);
 
kfree(ctx->user_files);
+   ctx->user_files = NULL;
ctx->nr_user_files = 0;
return ret;
}

-- 
Jens Axboe
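The KASAN report above is a textbook stale-pointer double-free: the error path in io_sqe_files_register() frees ctx->user_files but leaves the pointer set, so the later teardown in io_sqe_files_unregister() frees it again. Jens's one-liner works because kfree(NULL), like free(NULL), is a defined no-op. A userspace sketch of the pattern (demo_ctx and release_files are illustrative names, not the real io_uring structures):

```c
#include <stdlib.h>
#include <assert.h>

struct demo_ctx {
    int *user_files;
    unsigned nr_user_files;
};

/* Free the table, then clear the pointer so a later teardown that
 * calls this again sees NULL: free(NULL) is a no-op, so NULLing the
 * pointer turns a would-be double-free into a harmless call. */
static void release_files(struct demo_ctx *ctx)
{
    free(ctx->user_files);
    ctx->user_files = NULL;   /* the one-line fix from the thread */
    ctx->nr_user_files = 0;
}
```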



Re: [PATCH 1/6] mm: change locked_vm's type from unsigned long to atomic64_t

2019-04-03 Thread Daniel Jordan
On Tue, Apr 02, 2019 at 03:04:24PM -0700, Andrew Morton wrote:
> On Tue,  2 Apr 2019 16:41:53 -0400 Daniel Jordan  
> wrote:
> >  static long kvmppc_account_memlimit(unsigned long stt_pages, bool inc)
> >  {
> > long ret = 0;
> > +   s64 locked_vm;
> >  
> > if (!current || !current->mm)
> > return ret; /* process exited */
> >  
> > down_write(&current->mm->mmap_sem);
> >  
> > +   locked_vm = atomic64_read(&current->mm->locked_vm);
> > if (inc) {
> > unsigned long locked, lock_limit;
> >  
> > -   locked = current->mm->locked_vm + stt_pages;
> > +   locked = locked_vm + stt_pages;
> > lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> > if (locked > lock_limit && !capable(CAP_IPC_LOCK))
> > ret = -ENOMEM;
> > else
> > -   current->mm->locked_vm += stt_pages;
> > +   atomic64_add(stt_pages, &current->mm->locked_vm);
> > } else {
> > -   if (WARN_ON_ONCE(stt_pages > current->mm->locked_vm))
> > -   stt_pages = current->mm->locked_vm;
> > +   if (WARN_ON_ONCE(stt_pages > locked_vm))
> > +   stt_pages = locked_vm;
> >  
> > -   current->mm->locked_vm -= stt_pages;
> > +   atomic64_sub(stt_pages, &current->mm->locked_vm);
> > }
> 
> With the current code, current->mm->locked_vm cannot go negative. 
> After the patch, it can go negative.  If someone else decreased
> current->mm->locked_vm between this function's atomic64_read() and
> atomic64_sub().
> 
> I guess this is a can't-happen in this case because the racing code
> which performed the modification would have taken it negative anyway.
> 
> But this all makes me rather queazy.

mmap_sem is still held in this patch, so updates to locked_vm are still
serialized and I don't think what you describe can happen.  A later patch
removes mmap_sem, of course, but it also rewrites the code to do something
different.  This first patch is just a mechanical type change from unsigned
long to atomic64_t.

So...does this alleviate your symptoms?

> Also, we didn't remove any down_write(mmap_sem)s from core code so I'm
> thinking that the benefit of removing a few mmap_sem-takings from a few
> obscure drivers (sorry ;)) is pretty small.

Not sure about the other drivers, but vfio type1 isn't obscure.  We use it
extensively in our cloud, and from Andrea's __GFP_THISNODE thread a few months
back it seems Red Hat also uses it:

  https://lore.kernel.org/linux-mm/20180820032204.9591-3-aarca...@redhat.com/

> Also, the argument for switching 32-bit arches to a 64-bit counter was
> suspiciously vague.  What overflow issues?  Or are we just being lazy?

If user-controlled values are used to increase locked_vm, multiple threads
doing it at once on a 32-bit system could theoretically cause overflow, so in
the absence of atomic overflow checking, the 64-bit counter on 32b is defensive
programming.

I wouldn't have thought to do it, but Jason Gunthorpe raised the same issue in
the pinned_vm series:

  https://lore.kernel.org/linux-mm/20190115205311.gd22...@mellanox.com/

I'm fine with changing it to atomic_long_t if the scenario is too theoretical
for people.


Anyway, thanks for looking at this.
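As noted above, this first patch keeps mmap_sem held, so the atomic64_read()/atomic64_add() pair is still serialized; the later lock-free version needs the limit check and the update to be a single atomic step. A hedged userspace sketch of that cmpxchg loop using C11 atomics (the function name and shape illustrate the idea, not the kernel's actual code):

```c
#include <stdatomic.h>
#include <stdint.h>
#include <assert.h>

/* Try to charge 'pages' against '*locked_vm' without exceeding
 * 'limit', with no external lock: on a race the compare-exchange
 * fails, 'old' is refreshed, and the limit is re-checked. Returns
 * 0 on success, -1 if the charge would exceed the limit. */
static int try_charge_locked_vm(_Atomic int64_t *locked_vm,
                                int64_t pages, int64_t limit)
{
    int64_t old = atomic_load(locked_vm);
    do {
        if (old + pages > limit)
            return -1;   /* would exceed RLIMIT_MEMLOCK */
    } while (!atomic_compare_exchange_weak(locked_vm, &old, old + pages));
    return 0;
}
```

Because the check and the add commit together, the counter can never be pushed past the limit by concurrent callers, which is the property the mmap_sem critical section provided before.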


Re: [PATCH 0/6] convert locked_vm from unsigned long to atomic64_t

2019-04-03 Thread Daniel Jordan
On Wed, Apr 03, 2019 at 08:51:13AM -0400, Steven Sistare wrote:
> On 4/2/2019 4:41 PM, Daniel Jordan wrote:
> > [1] 
> > https://lore.kernel.org/linux-mm/20190211224437.25267-1-daniel.m.jor...@oracle.com/
> 
>   You could clean all 6 patches up nicely with a common subroutine that
> increases locked_vm subject to the rlimit.  Pass a bool arg that is true if
> the  limit should be enforced, !dma->lock_cap for one call site, and
> !capable(CAP_IPC_LOCK) for the rest.  Push the warnings and debug statements
> to the subroutine as well.  One patch could refactor, and a second could
> change the locking method.

Yes, I tried writing, but didn't end up including, such a subroutine for [1].
The devil was in the details, but with the cmpxchg business, it's more
worthwhile to iron all those out.  I'll give it a try.


[RFC PATCH kernel v2] powerpc/powernv: Isolate NVLinks between GV100GL on Witherspoon

2019-04-03 Thread Alexey Kardashevskiy
The NVIDIA V100 SXM2 GPUs are connected to the CPU via PCIe links and
(on POWER9) NVLinks. In addition to that, GPUs themselves have direct
peer to peer NVLinks in groups of 2 to 4 GPUs. At the moment the POWERNV
platform puts all interconnected GPUs to the same IOMMU group.

However the user may want to pass individual GPUs to the userspace so
in order to do so we need to put them into separate IOMMU groups and
cut off the interconnects.

Thankfully V100 GPUs implement an interface to do so by programming a link
disabling mask to BAR0 of a GPU. Once a link is disabled in a GPU using
this interface, it cannot be re-enabled until the secondary bus reset is
issued to the GPU.

This adds an extra step to the secondary bus reset handler (the one used
for such GPUs) to block NVLinks to GPUs which do not belong to the same
group as the GPU being reset.

This adds a new "isolate_nvlink" kernel parameter to allow GPU isolation;
when enabled, every GPU gets its own IOMMU group. The new parameter is off
by default to preserve the existing behaviour.

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v2:
* this is rework of [PATCH kernel RFC 0/2] vfio, powerpc/powernv: Isolate 
GV100GL
but this time it is contained in the powernv platform
---
 arch/powerpc/platforms/powernv/Makefile  |   2 +-
 arch/powerpc/platforms/powernv/pci.h |   1 +
 arch/powerpc/platforms/powernv/eeh-powernv.c |   1 +
 arch/powerpc/platforms/powernv/npu-dma.c |  24 +++-
 arch/powerpc/platforms/powernv/nvlinkgpu.c   | 131 +++
 5 files changed, 156 insertions(+), 3 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/nvlinkgpu.c

diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index da2e99efbd04..60a10d3b36eb 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -6,7 +6,7 @@ obj-y   += opal-msglog.o opal-hmi.o 
opal-power.o opal-irqchip.o
 obj-y  += opal-kmsg.o opal-powercap.o opal-psr.o 
opal-sensor-groups.o
 
 obj-$(CONFIG_SMP)  += smp.o subcore.o subcore-asm.o
-obj-$(CONFIG_PCI)  += pci.o pci-ioda.o npu-dma.o pci-ioda-tce.o
+obj-$(CONFIG_PCI)  += pci.o pci-ioda.o npu-dma.o pci-ioda-tce.o nvlinkgpu.o
 obj-$(CONFIG_CXL_BASE) += pci-cxl.o
 obj-$(CONFIG_EEH)  += eeh-powernv.o
 obj-$(CONFIG_PPC_SCOM) += opal-xscom.o
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 8e36da379252..9fd3f391482c 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -250,5 +250,6 @@ extern void pnv_pci_unlink_table_and_group(struct iommu_table *tbl,
 extern void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
void *tce_mem, u64 tce_size,
u64 dma_offset, unsigned int page_shift);
+extern void pnv_try_isolate_nvidia_v100(struct pci_dev *gpdev);
 
 #endif /* __POWERNV_PCI_H */
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index f38078976c5d..464b097d9635 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -937,6 +937,7 @@ void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
}
+   pnv_try_isolate_nvidia_v100(dev);
 }
 
 static void pnv_eeh_wait_for_pending(struct pci_dn *pdn, const char *type,
diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index dc23d9d2a7d9..017eae8197e7 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -529,6 +529,23 @@ static void pnv_comp_attach_table_group(struct npu_comp *npucomp,
++npucomp->pe_num;
 }
 
+static bool isolate_nvlink;
+
+static int __init parse_isolate_nvlink(char *p)
+{
+   bool val;
+
+   if (!p)
+   val = true;
+   else if (kstrtobool(p, &val))
+   return -EINVAL;
+
+   isolate_nvlink = val;
+
+   return 0;
+}
+early_param("isolate_nvlink", parse_isolate_nvlink);
+
 struct iommu_table_group *pnv_try_setup_npu_table_group(struct pnv_ioda_pe *pe)
 {
struct iommu_table_group *table_group;
@@ -549,7 +566,7 @@ struct iommu_table_group *pnv_try_setup_npu_table_group(struct pnv_ioda_pe *pe)
 
hose = pci_bus_to_host(npdev->bus);
 
-   if (hose->npu) {
+   if (hose->npu && !isolate_nvlink) {
table_group = &hose->npu->npucomp.table_group;
 
if (!table_group->group) {
@@ -559,7 +576,7 @@ struct iommu_table_group *pnv_try_setup_npu_table_group(struct pnv_ioda_pe *pe)
pe->pe_number);
}
} else {
-   /* Create a group for 1 GPU and attached NPUs for POWER8 */
+   /*
+* Create a group for 1 
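The parse_isolate_nvlink() handler above follows the usual kernel boot-parameter convention: a bare "isolate_nvlink" enables the feature, otherwise the value is parsed as a boolean. A userspace sketch of that convention (parse_bool() is a simplified stand-in for the kernel's kstrtobool(), accepting only a few spellings):

```c
#include <string.h>
#include <assert.h>

/* Simplified stand-in for kstrtobool(): returns 0 on success with
 * *res set, -1 on an unrecognized string. */
static int parse_bool(const char *s, int *res)
{
    if (!strcmp(s, "1") || !strcmp(s, "y") || !strcmp(s, "on"))  { *res = 1; return 0; }
    if (!strcmp(s, "0") || !strcmp(s, "n") || !strcmp(s, "off")) { *res = 0; return 0; }
    return -1;
}

/* Mirrors parse_isolate_nvlink(): no value means true, an explicit
 * value is parsed, and junk is rejected without changing the flag. */
static int parse_isolate_nvlink_demo(const char *p, int *isolate)
{
    int val;
    if (!p) { *isolate = 1; return 0; }  /* bare "isolate_nvlink" */
    if (parse_bool(p, &val))
        return -1;
    *isolate = val;
    return 0;
}
```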

VLC doesn't play videos anymore since the PowerPC fixes 5.1-3

2019-04-03 Thread Christian Zigotzky

On 03 April 2019 at 07:05AM, Christophe Leroy wrote:

On 03/04/2019 at 05:52, Christian Zigotzky wrote:

Please test VLC with the RC3 of kernel 5.1.

The removing of the PowerPC fixes 5.1-3 has solved the VLC issue. 
Another user has already confirmed that [1]. This isn’t an April 
Fool‘s. ;-)


Could you bisect to identify the guilty commit ?

Thanks
Christophe



Thanks

[1] 
http://forum.hyperion-entertainment.com/viewtopic.php?f=58=4256=20#p47561





Hello Christophe,

I have found the problematic patch. The following patch from the PowerPC 
fixes 5.1-3 is responsible for the VLC issue.


diff --git a/arch/powerpc/include/asm/vdso_datapage.h b/arch/powerpc/include/asm/vdso_datapage.h

index 1afe90ade595..bbc06bd72b1f 100644
--- a/arch/powerpc/include/asm/vdso_datapage.h
+++ b/arch/powerpc/include/asm/vdso_datapage.h
@@ -82,10 +82,10 @@ struct vdso_data {
 	__u32 icache_block_size;	/* L1 i-cache block size */
 	__u32 dcache_log_block_size;	/* L1 d-cache log block size */
 	__u32 icache_log_block_size;	/* L1 i-cache log block size */
-	__s32 wtom_clock_sec;		/* Wall to monotonic clock */
-	__s32 wtom_clock_nsec;
-	struct timespec stamp_xtime;	/* xtime as at tb_orig_stamp */
-	__u32 stamp_sec_fraction;	/* fractional seconds of stamp_xtime */
+	__u32 stamp_sec_fraction;	/* fractional seconds of stamp_xtime */
+	__s32 wtom_clock_nsec;		/* Wall to monotonic clock nsec */
+	__s64 wtom_clock_sec;		/* Wall to monotonic clock sec */
+	struct timespec stamp_xtime;	/* xtime as at tb_orig_stamp */
 	__u32 syscall_map_64[SYSCALL_MAP_SIZE]; /* map of syscalls */
 	__u32 syscall_map_32[SYSCALL_MAP_SIZE]; /* map of syscalls */
 };

-

Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/arch/powerpc/include/asm/vdso_datapage.h?h=v5.1-rc2=a5ed1e96cafde5ba48638f486bfca0685dc6ddc9


I created a patch for solving the VLC issue today.

vdso_datapage_vlc.patch:

diff -rupN a/arch/powerpc/include/asm/vdso_datapage.h b/arch/powerpc/include/asm/vdso_datapage.h
--- a/arch/powerpc/include/asm/vdso_datapage.h	2019-04-03 22:56:44.560645936 +0200
+++ b/arch/powerpc/include/asm/vdso_datapage.h	2019-04-04 02:20:09.479361827 +0200
@@ -82,10 +82,10 @@ struct vdso_data {
 	__u32 icache_block_size;	/* L1 i-cache block size */
 	__u32 dcache_log_block_size;	/* L1 d-cache log block size */
 	__u32 icache_log_block_size;	/* L1 i-cache log block size */
-	__u32 stamp_sec_fraction;	/* fractional seconds of stamp_xtime */
-	__s32 wtom_clock_nsec;		/* Wall to monotonic clock nsec */
-	__s64 wtom_clock_sec;		/* Wall to monotonic clock sec */
-	struct timespec stamp_xtime;	/* xtime as at tb_orig_stamp */
+	__s32 wtom_clock_sec;		/* Wall to monotonic clock */
+	__s32 wtom_clock_nsec;
+	struct timespec stamp_xtime;	/* xtime as at tb_orig_stamp */
+	__u32 stamp_sec_fraction;	/* fractional seconds of stamp_xtime */
 	__u32 syscall_map_64[SYSCALL_MAP_SIZE]; /* map of syscalls */
 	__u32 syscall_map_32[SYSCALL_MAP_SIZE]; /* map of syscalls */
 };

-

Cheers,
Christian
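The reason reordering these fields breaks VLC is that the vdso_data layout is ABI: the vDSO assembly loads each field at a hard-coded byte offset, so changing the struct without regenerating those offsets makes the fast-path time calls read the wrong word — which is also why reverting only the C declaration, as the patch above does, is risky, as Christophe points out in his reply below. A sketch of how the reorder moves the offsets (simplified stand-ins for the two layouts, not the full struct):

```c
#include <stddef.h>
#include <stdint.h>
#include <assert.h>

/* Old layout: wtom_clock_sec is the first of these fields, 32 bits. */
struct vdso_old {
    int32_t wtom_clock_sec;
    int32_t wtom_clock_nsec;
};

/* New layout from commit b5b4453e7912: the field is widened to 64
 * bits and moved after two 32-bit fields, so its byte offset within
 * this region changes from 0 to 8. Assembly that still loads 32 bits
 * at the old offset now reads stamp_sec_fraction instead. */
struct vdso_new {
    uint32_t stamp_sec_fraction;
    int32_t  wtom_clock_nsec;
    int64_t  wtom_clock_sec;
};
```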


Re: [PATCH AUTOSEL 5.0 015/262] memblock: memblock_phys_alloc_try_nid(): don't panic

2019-04-03 Thread Sasha Levin

On Thu, Mar 28, 2019 at 07:57:21AM +0200, Mike Rapoport wrote:

Hi,

On Wed, Mar 27, 2019 at 01:57:50PM -0400, Sasha Levin wrote:

From: Mike Rapoport 

[ Upstream commit 337555744e6e39dd1d87698c6084dd88a606d60a ]

The memblock_phys_alloc_try_nid() function tries to allocate memory from
the requested node and then falls back to allocation from any node in
the system.  The memblock_alloc_base() fallback used by this function
panics if the allocation fails.

Replace the memblock_alloc_base() fallback with the direct call to
memblock_alloc_range_nid() and update the memblock_phys_alloc_try_nid()
callers to check the returned value and panic in case of error.


This is a part of memblock refactoring, I don't think it should be applied
to -stable.


Dropped, thanks!

--
Thanks,
Sasha


Re: VLC doesn't play videos anymore since the PowerPC fixes 5.1-3

2019-04-03 Thread Christophe Leroy




On 04/04/2019 at 02:58, Christian Zigotzky wrote:

On 03 April 2019 at 07:05AM, Christophe Leroy wrote:

On 03/04/2019 at 05:52, Christian Zigotzky wrote:

Please test VLC with the RC3 of kernel 5.1.

Removing the PowerPC fixes 5.1-3 has solved the VLC issue. 
Another user has already confirmed that [1]. This isn't an April 
Fool's. ;-)


Could you bisect to identify the guilty commit ?

Thanks
Christophe



Thanks

[1] http://forum.hyperion-entertainment.com/viewtopic.php?f=58&t=4256&start=20#p47561






Hello Christophe,

I have found the problematic patch. The following patch from the PowerPC 
fixes 5.1-3 is responsible for the VLC issue.


That change is part of the following commit:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.1-rc2&id=b5b4453e7912f056da1ca7572574cada32ecb60c

Just changing back the type of wtom_clock_sec to 32 bits without 
changing back the loading instruction is likely to give unexpected 
results on PPC64.


Are you using 32 bits or 64 bits powerpc ?

Christophe



diff --git a/arch/powerpc/include/asm/vdso_datapage.h b/arch/powerpc/include/asm/vdso_datapage.h
index 1afe90ade595..bbc06bd72b1f 100644
--- a/arch/powerpc/include/asm/vdso_datapage.h
+++ b/arch/powerpc/include/asm/vdso_datapage.h
@@ -82,10 +82,10 @@ struct vdso_data {
     __u32 icache_block_size;  /* L1 i-cache block size */
     __u32 dcache_log_block_size;  /* L1 d-cache log block size */
     __u32 icache_log_block_size;  /* L1 i-cache log block size */
-   __s32 wtom_clock_sec; /* Wall to monotonic clock */
-   __s32 wtom_clock_nsec;
-   struct timespec stamp_xtime;   /* xtime as at tb_orig_stamp */
-   __u32 stamp_sec_fraction;   /* fractional seconds of stamp_xtime */
+   __u32 stamp_sec_fraction;  /* fractional seconds of stamp_xtime */
+   __s32 wtom_clock_nsec; /* Wall to monotonic clock nsec */
+   __s64 wtom_clock_sec; /* Wall to monotonic clock sec */
+   struct timespec stamp_xtime;  /* xtime as at tb_orig_stamp */
    __u32 syscall_map_64[SYSCALL_MAP_SIZE]; /* map of syscalls */
    __u32 syscall_map_32[SYSCALL_MAP_SIZE]; /* map of syscalls */
  };


Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/arch/powerpc/include/asm/vdso_datapage.h?h=v5.1-rc2&id=a5ed1e96cafde5ba48638f486bfca0685dc6ddc9



I created a patch for solving the VLC issue today.

vdso_datapage_vlc.patch:

diff -rupN a/arch/powerpc/include/asm/vdso_datapage.h b/arch/powerpc/include/asm/vdso_datapage.h
--- a/arch/powerpc/include/asm/vdso_datapage.h  2019-04-03 22:56:44.560645936 +0200
+++ b/arch/powerpc/include/asm/vdso_datapage.h  2019-04-04 02:20:09.479361827 +0200
@@ -82,10 +82,10 @@ struct vdso_data {
    __u32 icache_block_size;        /* L1 i-cache block size */
    __u32 dcache_log_block_size;    /* L1 d-cache log block size */
    __u32 icache_log_block_size;    /* L1 i-cache log block size */
-   __u32 stamp_sec_fraction;       /* fractional seconds of stamp_xtime */
-   __s32 wtom_clock_nsec;          /* Wall to monotonic clock nsec */
-   __s64 wtom_clock_sec;           /* Wall to monotonic clock sec */
-   struct timespec stamp_xtime;    /* xtime as at tb_orig_stamp */
+   __s32 wtom_clock_sec;           /* Wall to monotonic clock */
+   __s32 wtom_clock_nsec;
+   struct timespec stamp_xtime;    /* xtime as at tb_orig_stamp */
+   __u32 stamp_sec_fraction;       /* fractional seconds of stamp_xtime */
    __u32 syscall_map_64[SYSCALL_MAP_SIZE]; /* map of syscalls */
    __u32 syscall_map_32[SYSCALL_MAP_SIZE]; /* map of syscalls */
 };


Cheers,
Christian


Re: [PATCH 5/6 v3] syscalls: Remove start and number from syscall_get_arguments() args

2019-04-03 Thread Paul Burton
Hi Steven,

On Mon, Apr 01, 2019 at 09:41:09AM -0400, Steven Rostedt wrote:
> From: "Steven Rostedt (Red Hat)" 
> 
> At Linux Plumbers, Andy Lutomirski approached me and pointed out that the
> function call syscall_get_arguments() implemented in x86 was horribly
> written and not optimized for the standard case of passing in 0 and 6 for
> the starting index and the number of system calls to get. When looking at
> all the users of this function, I discovered that all instances pass in only
> 0 and 6 for these arguments. Instead of having this function handle
> different cases that are never used, simply rewrite it to return the first 6
> arguments of a system call.
> 
> This should help out the performance of tracing system calls by ptrace,
> ftrace and perf.
> 
> Link: http://lkml.kernel.org/r/20161107213233.754809...@goodmis.org
> 
> Cc: Oleg Nesterov 
> Cc: Thomas Gleixner 
> Cc: Kees Cook 
> Cc: Andy Lutomirski 
> Cc: Dominik Brodowski 
> Cc: Dave Martin 
> Cc: "Dmitry V. Levin" 
> Cc: x...@kernel.org
> Cc: linux-snps-...@lists.infradead.org
> Cc: linux-ker...@vger.kernel.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-c6x-...@linux-c6x.org
> Cc: uclinux-h8-de...@lists.sourceforge.jp
> Cc: linux-hexa...@vger.kernel.org
> Cc: linux-i...@vger.kernel.org
> Cc: linux-m...@vger.kernel.org
> Cc: nios2-...@lists.rocketboards.org
> Cc: openr...@lists.librecores.org
> Cc: linux-par...@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-ri...@lists.infradead.org
> Cc: linux-s...@vger.kernel.org
> Cc: linux...@vger.kernel.org
> Cc: sparcli...@vger.kernel.org
> Cc: linux...@lists.infradead.org
> Cc: linux-xte...@linux-xtensa.org
> Cc: linux-a...@vger.kernel.org
> Reported-by: Andy Lutomirski 
> Signed-off-by: Steven Rostedt (VMware) 

Acked-by: Paul Burton  # MIPS parts

Thanks,
Paul


Re: [PATCH] powerpc/xmon: add read-only mode

2019-04-03 Thread Andrew Donnellan

On 4/4/19 12:02 am, Christopher M Riedl wrote:



On March 29, 2019 at 12:49 AM Andrew Donnellan wrote:


On 29/3/19 3:21 pm, cmr wrote:

Operations which write to memory should be restricted on secure systems
and optionally to avoid self-destructive behaviors.


For reference:
   - https://github.com/linuxppc/issues/issues/219
   - https://github.com/linuxppc/issues/issues/232

Perhaps clarify what is meant here by "secure systems".

Otherwise commit message looks good.



I will reword this for the next patch to reflect the verbiage in the referenced
github issue -- ie. Secure Boot and not violating secure boot integrity by 
using xmon.


Sounds good.






---
   arch/powerpc/Kconfig.debug |  7 +++
   arch/powerpc/xmon/xmon.c   | 24 
   2 files changed, 31 insertions(+)

diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 4e00cb0a5464..33cc01adf4cb 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -117,6 +117,13 @@ config XMON_DISASSEMBLY
  to say Y here, unless you're building for a memory-constrained
  system.
   
+config XMON_RO

+   bool "Set xmon read-only mode"
+   depends on XMON
+   default y
+   help
+ Disable state- and memory-altering write operations in xmon.


The meaning of this option is a bit unclear.

  From the code - it looks like what this option actually does is enable
RO mode *by default*. In which case it should probably be called
XMON_RO_DEFAULT and the description should note that RW mode can still
be enabled via a cmdline option.



Based on Christophe's feedback the default will change for this option in the
next patch. I will also add the cmdline options to the description for clarity.



Yep, adding a description of the cmdline options is also a good idea.




+
   config DEBUGGER
bool
depends on KGDB || XMON
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index a0f44f992360..c13ee73cdfd4 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -80,6 +80,7 @@ static int set_indicator_token = RTAS_UNKNOWN_SERVICE;
   #endif
   static unsigned long in_xmon __read_mostly = 0;
   static int xmon_on = IS_ENABLED(CONFIG_XMON_DEFAULT);
+static int xmon_ro = IS_ENABLED(CONFIG_XMON_RO);
   
   static unsigned long adrs;

   static int size = 1;
@@ -1042,6 +1043,8 @@ cmds(struct pt_regs *excp)
set_lpp_cmd();
break;
case 'b':
+   if (xmon_ro == 1)
+   break;


For all these cases - it would be much better to print an error message
somewhere when we abort due to read-only mode.



I included print messages initially but then thought about how xmon is intended
for "power" users. I can add print statements to avoid confusion and frustration
since the operations are just "silently" dropped -- *if* that aligns with xmon's
"philosophy".



Power users often want a straightforward self-explanatory UX more than 
anyone :)



--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

2019-04-03 Thread Jens Axboe
On 4/3/19 5:11 AM, Will Deacon wrote:
> Hi Michael,
> 
> On Wed, Apr 03, 2019 at 01:47:50PM +1100, Michael Ellerman wrote:
>> Arnd Bergmann  writes:
>>> diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
>>> index b18abb0c3dae..00f5a63c8d9a 100644
>>> --- a/arch/powerpc/kernel/syscalls/syscall.tbl
>>> +++ b/arch/powerpc/kernel/syscalls/syscall.tbl
>>> @@ -505,3 +505,7 @@
>>>  421  32      rt_sigtimedwait_time64          sys_rt_sigtimedwait             compat_sys_rt_sigtimedwait_time64
>>>  422  32      futex_time64                    sys_futex                       sys_futex
>>>  423  32      sched_rr_get_interval_time64    sys_sched_rr_get_interval       sys_sched_rr_get_interval
>>> +424common  pidfd_send_signal   sys_pidfd_send_signal
>>> +425common  io_uring_setup  sys_io_uring_setup
>>> +426common  io_uring_enter  sys_io_uring_enter
>>> +427common  io_uring_register   sys_io_uring_register
>>
>> Acked-by: Michael Ellerman  (powerpc)
>>
>> Lightly tested.
>>
>> The pidfd_test selftest passes.
> 
> That reports pass for me too, although it fails to unshare the pid ns, which I
> assume is benign.
> 
>> Ran the io_uring example from fio, which prints lots of:
> 
> How did you invoke that? I had a play with the tests in:

It's t/io_uring from the fio repo:

git://git.kernel.dk/fio

and you just run it ala:

# make t/io_uring
# t/io_uring /dev/some_device

>   git://git.kernel.dk/liburing
> 
> but I quickly ran into the kernel oops below.
> 
> Will
> 
> --->8
> 
> will@autoplooker:~/liburing/test$ ./io_uring_register 
> RELIMIT_MEMLOCK: 67108864 (67108864)
> [   35.477875] Unable to handle kernel NULL pointer dereference at virtual 
> address 0070
> [   35.478969] Mem abort info:
> [   35.479296]   ESR = 0x9604
> [   35.479785]   Exception class = DABT (current EL), IL = 32 bits
> [   35.480528]   SET = 0, FnV = 0
> [   35.480980]   EA = 0, S1PTW = 0
> [   35.481345] Data abort info:
> [   35.481680]   ISV = 0, ISS = 0x0004
> [   35.482267]   CM = 0, WnR = 0
> [   35.482618] user pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval)
> [   35.483486] [0070] pgd=
> [   35.484041] Internal error: Oops: 9604 [#1] PREEMPT SMP
> [   35.484788] Modules linked in:
> [   35.485311] CPU: 113 PID: 3973 Comm: io_uring_regist Not tainted 
> 5.1.0-rc3-00012-g40b114779944 #1
> [   35.486712] Hardware name: linux,dummy-virt (DT)
> [   35.487450] pstate: 2045 (nzCv daif +PAN -UAO)
> [   35.488228] pc : link_pwq+0x10/0x60
> [   35.488794] lr : apply_wqattrs_commit+0xe0/0x118
> [   35.489550] sp : 17e2bbc0

Huh, this looks odd, it's crashing inside the wq setup.


-- 
Jens Axboe



Re: [PATCH -next] ocxl: remove set but not used variables 'tid' and 'lpid'

2019-04-03 Thread Frederic Barrat




On 29/03/2019 at 16:44, Yue Haibing wrote:

From: YueHaibing 

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/misc/ocxl/link.c: In function 'xsl_fault_handler':
drivers/misc/ocxl/link.c:187:17: warning: variable 'tid' set but not used [-Wunused-but-set-variable]
drivers/misc/ocxl/link.c:187:6: warning: variable 'lpid' set but not used [-Wunused-but-set-variable]

They are never used and can be removed.

Signed-off-by: YueHaibing 
---


Acked-by: Frederic Barrat 



  drivers/misc/ocxl/link.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index d50b861..3be07e9 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -184,7 +184,7 @@ static irqreturn_t xsl_fault_handler(int irq, void *data)
u64 dsisr, dar, pe_handle;
struct pe_data *pe_data;
struct ocxl_process_element *pe;
-   int lpid, pid, tid;
+   int pid;
bool schedule = false;
  
  	read_irq(spa, &dsisr, &dar, &pe_handle);

@@ -192,9 +192,7 @@ static irqreturn_t xsl_fault_handler(int irq, void *data)
  
  	WARN_ON(pe_handle > SPA_PE_MASK);

pe = spa->spa_mem + pe_handle;
-   lpid = be32_to_cpu(pe->lpid);
pid = be32_to_cpu(pe->pid);
-   tid = be32_to_cpu(pe->tid);
/* We could be reading all null values here if the PE is being
 * removed while an interrupt kicks in. It's not supposed to
 * happen if the driver notified the AFU to terminate the





Re: [PATCH 0/6] convert locked_vm from unsigned long to atomic64_t

2019-04-03 Thread Steven Sistare
On 4/2/2019 4:41 PM, Daniel Jordan wrote:
> Hi,
> 
> From patch 1:
> 
>   Taking and dropping mmap_sem to modify a single counter, locked_vm, is
>   overkill when the counter could be synchronized separately.
>   
>   Make mmap_sem a little less coarse by changing locked_vm to an atomic,
>   the 64-bit variety to avoid issues with overflow on 32-bit systems.
> 
> This is a more conservative alternative to [1] with no user-visible
> effects.  Thanks to Alexey Kardashevskiy for pointing out the racy
> atomics and to Alex Williamson, Christoph Lameter, Ira Weiny, and Jason
> Gunthorpe for their comments on [1].
> 
> Davidlohr Bueso recently did a similar conversion for pinned_vm[2].
> 
> Testing
>  1. passes LTP mlock[all], munlock[all], fork, mmap, and mremap tests in an
> x86 kvm guest
>  2. a VFIO-enabled x86 kvm guest shows the same VmLck in
> /proc/pid/status before and after this change
>  3. cross-compiles on powerpc
> 
> The series is based on v5.1-rc3.  Please consider for 5.2.
> 
> Daniel
> 
> [1] https://lore.kernel.org/linux-mm/20190211224437.25267-1-daniel.m.jor...@oracle.com/
> [2] https://lore.kernel.org/linux-mm/20190206175920.31082-1-d...@stgolabs.net/
> 
> Daniel Jordan (6):
>   mm: change locked_vm's type from unsigned long to atomic64_t
>   vfio/type1: drop mmap_sem now that locked_vm is atomic
>   vfio/spapr_tce: drop mmap_sem now that locked_vm is atomic
>   fpga/dlf/afu: drop mmap_sem now that locked_vm is atomic
>   powerpc/mmu: drop mmap_sem now that locked_vm is atomic
>   kvm/book3s: drop mmap_sem now that locked_vm is atomic
> 
>  arch/powerpc/kvm/book3s_64_vio.c| 34 ++--
>  arch/powerpc/mm/mmu_context_iommu.c | 28 +---
>  drivers/fpga/dfl-afu-dma-region.c   | 40 -
>  drivers/vfio/vfio_iommu_spapr_tce.c | 37 --
>  drivers/vfio/vfio_iommu_type1.c | 31 +-
>  fs/proc/task_mmu.c  |  2 +-
>  include/linux/mm_types.h|  2 +-
>  kernel/fork.c   |  2 +-
>  mm/debug.c  |  5 ++--
>  mm/mlock.c  |  4 +--
>  mm/mmap.c   | 18 ++---
>  mm/mremap.c |  6 ++---
>  12 files changed, 89 insertions(+), 120 deletions(-)
> 
> base-commit: 79a3aaa7b82e3106be97842dedfd8429248896e6

Hi Daniel,
  You could clean all 6 patches up nicely with a common subroutine that
increases locked_vm subject to the rlimit. Pass a bool arg that is true if
the limit should be enforced: !dma->lock_cap for one call site, and
!capable(CAP_IPC_LOCK) for the rest. Push the warnings and debug statements
into the subroutine as well. One patch could refactor, and a second could
change the locking method.

- Steve


Re: [PATCH] powerpc/xmon: add read-only mode

2019-04-03 Thread Christopher M Riedl


> On March 29, 2019 at 12:49 AM Andrew Donnellan wrote:
> 
> 
> On 29/3/19 3:21 pm, cmr wrote:
> > Operations which write to memory should be restricted on secure systems
> > and optionally to avoid self-destructive behaviors.
> 
> For reference:
>   - https://github.com/linuxppc/issues/issues/219
>   - https://github.com/linuxppc/issues/issues/232
> 
> Perhaps clarify what is meant here by "secure systems".
> 
> Otherwise commit message looks good.
> 

I will reword this for the next patch to reflect the verbiage in the referenced
github issue -- ie. Secure Boot and not violating secure boot integrity by 
using xmon.

> 
> > ---
> >   arch/powerpc/Kconfig.debug |  7 +++
> >   arch/powerpc/xmon/xmon.c   | 24 
> >   2 files changed, 31 insertions(+)
> > 
> > diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
> > index 4e00cb0a5464..33cc01adf4cb 100644
> > --- a/arch/powerpc/Kconfig.debug
> > +++ b/arch/powerpc/Kconfig.debug
> > @@ -117,6 +117,13 @@ config XMON_DISASSEMBLY
> >   to say Y here, unless you're building for a memory-constrained
> >   system.
> >   
> > +config XMON_RO
> > +   bool "Set xmon read-only mode"
> > +   depends on XMON
> > +   default y
> > +   help
> > + Disable state- and memory-altering write operations in xmon.
> 
> The meaning of this option is a bit unclear.
> 
>  From the code - it looks like what this option actually does is enable 
> RO mode *by default*. In which case it should probably be called 
> XMON_RO_DEFAULT and the description should note that RW mode can still 
> be enabled via a cmdline option.
>

Based on Christophe's feedback the default will change for this option in the
next patch. I will also add the cmdline options to the description for clarity.

>
> > +
> >   config DEBUGGER
> > bool
> > depends on KGDB || XMON
> > diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
> > index a0f44f992360..c13ee73cdfd4 100644
> > --- a/arch/powerpc/xmon/xmon.c
> > +++ b/arch/powerpc/xmon/xmon.c
> > @@ -80,6 +80,7 @@ static int set_indicator_token = RTAS_UNKNOWN_SERVICE;
> >   #endif
> >   static unsigned long in_xmon __read_mostly = 0;
> >   static int xmon_on = IS_ENABLED(CONFIG_XMON_DEFAULT);
> > +static int xmon_ro = IS_ENABLED(CONFIG_XMON_RO);
> >   
> >   static unsigned long adrs;
> >   static int size = 1;
> > @@ -1042,6 +1043,8 @@ cmds(struct pt_regs *excp)
> > set_lpp_cmd();
> > break;
> > case 'b':
> > +   if (xmon_ro == 1)
> > +   break;
> 
> For all these cases - it would be much better to print an error message 
> somewhere when we abort due to read-only mode.
> 

I included print messages initially but then thought about how xmon is intended
for "power" users. I can add print statements to avoid confusion and frustration
since the operations are just "silently" dropped -- *if* that aligns with xmon's
"philosophy".


Re: [PATCH] powerpc/xmon: add read-only mode

2019-04-03 Thread Christopher M Riedl
> On April 3, 2019 at 12:15 AM Christophe Leroy  wrote:
> 
> 
> 
> 
> On 03/04/2019 at 05:38, Christopher M Riedl wrote:
> >> On March 29, 2019 at 3:41 AM Christophe Leroy wrote:
> >>
> >>
> >>
> >>
> >> On 29/03/2019 at 05:21, cmr wrote:
> >>> Operations which write to memory should be restricted on secure systems
> >>> and optionally to avoid self-destructive behaviors.
> >>>
> >>> Add a config option, XMON_RO, to control default xmon behavior along
> >>> with kernel cmdline options xmon=ro and xmon=rw for explicit control.
> >>> The default is to enable read-only mode.
> >>>
> >>> The following xmon operations are affected:
> >>> memops:
> >>>   disable memmove
> >>>   disable memset
> >>> memex:
> >>>   no-op'd mwrite
> >>> super_regs:
> >>>   no-op'd write_spr
> >>> bpt_cmds:
> >>>   disable
> >>> proc_call:
> >>>   disable
> >>>
> >>> Signed-off-by: cmr 
> >>
> >> A Fully qualified name should be used.
> > 
> > What do you mean by fully-qualified here? PPC_XMON_RO? (PPC_)XMON_READONLY?
> 
> I mean it should be
> 
> Signed-off-by: Christopher M Riedl 
> 
> instead of
> 
> Signed-off-by: cmr 
> 

Hehe, thanks :)

> > 
> >>
> >>> ---
> >>>arch/powerpc/Kconfig.debug |  7 +++
> >>>arch/powerpc/xmon/xmon.c   | 24 
> >>>2 files changed, 31 insertions(+)
> >>>
> >>> diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
> >>> index 4e00cb0a5464..33cc01adf4cb 100644
> >>> --- a/arch/powerpc/Kconfig.debug
> >>> +++ b/arch/powerpc/Kconfig.debug
> >>> @@ -117,6 +117,13 @@ config XMON_DISASSEMBLY
> >>> to say Y here, unless you're building for a memory-constrained
> >>> system.
> >>>
> >>> +config XMON_RO
> >>> + bool "Set xmon read-only mode"
> >>> + depends on XMON
> >>> + default y
> >>
> >> Should it really be always default y ?
> >> I would set default 'y' only when some security options are also set.
> >>
> > 
> > This is a good point, I based this on an internal Slack suggestion but 
> > giving this more thought, disabling read-only mode by default makes more 
> > sense. I'm not sure what security options could be set though?
> > 
> 
> Maybe starting with CONFIG_STRICT_KERNEL_RWX
> 
> Another point that may also be addressed by your patch is the definition 
> of PAGE_KERNEL_TEXT:
> 
> #if defined(CONFIG_KGDB) || defined(CONFIG_XMON) || 
> defined(CONFIG_BDI_SWITCH) ||\
>   defined(CONFIG_KPROBES) || defined(CONFIG_DYNAMIC_FTRACE)
> #define PAGE_KERNEL_TEXT  PAGE_KERNEL_X
> #else
> #define PAGE_KERNEL_TEXT  PAGE_KERNEL_ROX
> #endif
> 
> The above let me think that it would be better if you add a config 
> XMON_RW instead of XMON_RO, with default !STRICT_KERNEL_RWX
> 
> Christophe

Thanks! I like that a lot better, this, along with your other suggestions
in the initial review, will be in the next version.


Re: [PATCH 0/4] Enabling secure boot on PowerNV systems

2019-04-03 Thread Michael Ellerman
Hi Claudio,

Thanks for posting this.

Claudio Carvalho  writes:
> This patch set is part of a series that implements secure boot on
> PowerNV systems.
>
> In order to verify the OS kernel on PowerNV, secure boot requires X.509
> certificates trusted by the platform, the secure boot modes, and several
> other pieces of information. These are stored in secure variables
> controlled by OPAL, also known as OPAL secure variables.
>
> This patch set adds the following features:
>
> 1. Enable efivarfs by selecting CONFIG_EFI in the CONFIG_OPAL_SECVAR
>introduced in this patch set. With CONFIG_EFIVAR_FS, userspace tools can
>be used to manage the secure variables.
> 2. Add support for OPAL secure variables by overwriting the EFI hooks
>(get_variable, get_next_variable, set_variable and query_variable_info)
>with OPAL call wrappers. There is probably a better way to add this
>support, for example, we are investigating if we could register the
>efivar_operations rather than overwriting the EFI hooks. In this patch
>set, CONFIG_OPAL_SECVAR selects CONFIG_EFI. If, instead, we registered
>efivar_operations, CONFIG_EFIVAR_FS would need to depend on
   CONFIG_EFI || CONFIG_OPAL_SECVAR. Comments or suggestions on the
>preferred technique would be greatly appreciated.

I am *very* reluctant to start selecting CONFIG_EFI on powerpc.

Simply because we don't actually have EFI, and I worry we're going to
both break assumptions in the EFI code as well as impose requirements on
the powerpc code that aren't really necessary.

So I'd definitely prefer we go the route of enabling efivarfs with an
alternate backend.

Better still would be a generic secure variable interface as Matt
suggests, if the userspace tools can be relatively easily adapted to use
that interface.

cheers