Re: [PATCH 4/7] ocxl: Rename pnv_ocxl_spa_remove_pe to clarify its action

2018-04-16 Thread Andrew Donnellan

On 17/04/18 12:09, Alastair D'Silva wrote:

From: Alastair D'Silva 

The function removes the process element from the NPU cache.

Signed-off-by: Alastair D'Silva 



Hmm, personally I'd suggest pnv_ocxl_spa_clear_cache() because it's just 
a wrapper around the OPAL call of a similar name.


But I don't feel strongly about this at all, so:

Acked-by: Andrew Donnellan 


--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



Re: [PATCH 2/7] powerpc: Use TIDR CPU feature to control TIDR allocation

2018-04-16 Thread Alastair D'Silva
On Tue, 2018-04-17 at 14:21 +1000, Andrew Donnellan wrote:
> On 17/04/18 12:09, Alastair D'Silva wrote:
> > From: Alastair D'Silva 
> > 
> > Switch the use of TIDR on its CPU feature, rather than assuming it
> > is available based on architecture.
> > 
> > Signed-off-by: Alastair D'Silva 
> 
> There's a use of TIDR in restore_sprs() that's behind the ARCH_300 flag
> as well. Ideally it should never trigger in the !P9_TIDR case, but you
> might want to update that too for clarity?
> 
> 

Thanks for the review, I'll include your suggestions in the next set.

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



Re: [PATCH] cxl: Configure PSL to not use APC virtual machines

2018-04-16 Thread Alastair D'Silva
On Tue, 2018-04-17 at 10:41 +0530, Vaibhav Jain wrote:
> APC virtual machines aren't used on POWER-9 chips and are already
> disabled in on-chip CAPP. They also need to be disabled on the PSL
> via
> 'PSL Data Send Control Register' by setting bit(47). This forces the
> PSL to send commands to CAPP with queue.id == 0.
> 
> Signed-off-by: Vaibhav Jain 
> ---
>  drivers/misc/cxl/pci.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
> index c32432168e6b..af30ee848d35 100644
> --- a/drivers/misc/cxl/pci.c
> +++ b/drivers/misc/cxl/pci.c
> @@ -516,9 +516,9 @@ static int init_implementation_adapter_regs_psl9(struct cxl *adapter,
>   cxl_p1_write(adapter, CXL_PSL9_FIR_CNTL, psl_fircntl);
>  
>   /* Setup the PSL to transmit packets on the PCIe before the
> -  * CAPP is enabled
> +  * CAPP is enabled. Make sure that CAPP virtual machines are disabled
>    */
> - cxl_p1_write(adapter, CXL_PSL9_DSNDCTL, 0x000100102A10ULL);
> + cxl_p1_write(adapter, CXL_PSL9_DSNDCTL, 0x000100112A10ULL);
>  
>   /*
>* A response to an ASB_Notify request is returned by the
> 

Reviewed-by: Alastair D'Silva 

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



Re: [PATCH] cxl: Configure PSL to not use APC virtual machines

2018-04-16 Thread Andrew Donnellan

On 17/04/18 15:11, Vaibhav Jain wrote:

APC virtual machines aren't used on POWER-9 chips and are already
disabled in on-chip CAPP. They also need to be disabled on the PSL via
'PSL Data Send Control Register' by setting bit(47). This forces the
PSL to send commands to CAPP with queue.id == 0.

Signed-off-by: Vaibhav Jain 


LGTM. Does this need to be sent to stable?

Acked-by: Andrew Donnellan 

--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



[PATCH] cxl: Configure PSL to not use APC virtual machines

2018-04-16 Thread Vaibhav Jain
APC virtual machines aren't used on POWER-9 chips and are already
disabled in on-chip CAPP. They also need to be disabled on the PSL via
'PSL Data Send Control Register' by setting bit(47). This forces the
PSL to send commands to CAPP with queue.id == 0.

Signed-off-by: Vaibhav Jain 
---
 drivers/misc/cxl/pci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index c32432168e6b..af30ee848d35 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -516,9 +516,9 @@ static int init_implementation_adapter_regs_psl9(struct cxl *adapter,
cxl_p1_write(adapter, CXL_PSL9_FIR_CNTL, psl_fircntl);
 
/* Setup the PSL to transmit packets on the PCIe before the
-* CAPP is enabled
+* CAPP is enabled. Make sure that CAPP virtual machines are disabled
 */
-   cxl_p1_write(adapter, CXL_PSL9_DSNDCTL, 0x000100102A10ULL);
+   cxl_p1_write(adapter, CXL_PSL9_DSNDCTL, 0x000100112A10ULL);
 
/*
 * A response to an ASB_Notify request is returned by the
-- 
2.14.3



Re: [PATCH 2/7] powerpc: Use TIDR CPU feature to control TIDR allocation

2018-04-16 Thread Andrew Donnellan

On 17/04/18 12:09, Alastair D'Silva wrote:

From: Alastair D'Silva 

Switch the use of TIDR on its CPU feature, rather than assuming it
is available based on architecture.

Signed-off-by: Alastair D'Silva 


There's a use of TIDR in restore_sprs() that's behind the ARCH_300 flag
as well. Ideally it should never trigger in the !P9_TIDR case, but you
might want to update that too for clarity?



---
  arch/powerpc/kernel/process.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 1237f13fed51..a3e0a3e06d5a 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1570,7 +1570,7 @@ void clear_thread_tidr(struct task_struct *t)
if (!t->thread.tidr)
return;
  
-	if (!cpu_has_feature(CPU_FTR_ARCH_300)) {
+   if (!cpu_has_feature(CPU_FTR_P9_TIDR)) {
WARN_ON_ONCE(1);
return;
}
@@ -1593,7 +1593,7 @@ int set_thread_tidr(struct task_struct *t)
  {
int rc;
  
-	if (!cpu_has_feature(CPU_FTR_ARCH_300))
+   if (!cpu_has_feature(CPU_FTR_P9_TIDR))
return -EINVAL;
  
  	if (t != current)




--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



Re: [PATCH 1/7] powerpc: Add TIDR CPU feature for Power9

2018-04-16 Thread Andrew Donnellan

On 17/04/18 12:09, Alastair D'Silva wrote:

diff --git a/arch/powerpc/include/asm/switch_to.h 
b/arch/powerpc/include/asm/switch_to.h
index be8c9fa23983..5b03d8a82409 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -94,6 +94,5 @@ static inline void clear_task_ebb(struct task_struct *t)
  extern int set_thread_uses_vas(void);
  
  extern int set_thread_tidr(struct task_struct *t);
-extern void clear_thread_tidr(struct task_struct *t);


This hunk looks like it really belongs in patch 3.

Apart from that, I'm not really familiar with the CPU features code but 
nothing seems overly wrong...


Reviewed-by: Andrew Donnellan 

--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



Re: [PATCH 7/7] ocxl: Document new OCXL IOCTLs

2018-04-16 Thread Andrew Donnellan

On 17/04/18 12:09, Alastair D'Silva wrote:

From: Alastair D'Silva 

Signed-off-by: Alastair D'Silva 
---
  Documentation/accelerators/ocxl.rst | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/Documentation/accelerators/ocxl.rst 
b/Documentation/accelerators/ocxl.rst
index ddcc58d01cfb..144595a80a1c 100644
--- a/Documentation/accelerators/ocxl.rst
+++ b/Documentation/accelerators/ocxl.rst
@@ -157,6 +157,16 @@ OCXL_IOCTL_GET_METADATA:
Obtains configuration information from the card, such as the size of
MMIO areas, the AFU version, and the PASID for the current context.
  
+OCXL_IOCTL_ENABLE_P9_WAIT:

+
+  Allows the AFU to wake a userspace thread executing 'wait'. Returns
+  information to userspace to allow it to configure the AFU.


Note that this is only available on POWER9.


+
+OCXL_IOCTL_GET_PLATFORM:
+
+  Notifies userspace as to the platform the kernel believes we are on,
+  which may differ from what userspace believes. Also reports which CPU
+  features are usable from userspace.


The first sentence here doesn't seem to relate to anything that 
GET_PLATFORM actually does - afaict you're just passing flags which I 
suppose imply what the correct platform is, but really they're just 
feature flags?


--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



[PATCH 6/7] ocxl: Add an IOCTL so userspace knows which platform the kernel requires

2018-04-16 Thread Alastair D'Silva
From: Alastair D'Silva 

In order for a userspace AFU driver to call the Power9 specific
OCXL_IOCTL_ENABLE_P9_WAIT, it needs to verify that it can actually
make that call.

Signed-off-by: Alastair D'Silva 
---
 drivers/misc/ocxl/file.c | 25 +
 include/uapi/misc/ocxl.h |  4 
 2 files changed, 29 insertions(+)

diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
index eb409a469f21..5a9f4f85aafd 100644
--- a/drivers/misc/ocxl/file.c
+++ b/drivers/misc/ocxl/file.c
@@ -168,12 +168,32 @@ static long afu_ioctl_enable_p9_wait(struct ocxl_context *ctx,
 }
 #endif
 
+
+static long afu_ioctl_get_platform(struct ocxl_context *ctx,
+   struct ocxl_ioctl_platform __user *uarg)
+{
+   struct ocxl_ioctl_platform arg;
+
+   memset(&arg, 0, sizeof(arg));
+
+#ifdef CONFIG_PPC64
+   if (cpu_has_feature(CPU_FTR_P9_TIDR))
+   arg.flags[0] |= OCXL_IOCTL_PLATFORM_FLAGS0_P9_WAIT;
+#endif
+
+   if (copy_to_user(uarg, &arg, sizeof(arg)))
+   return -EFAULT;
+
+   return 0;
+}
+
#define CMD_STR(x) (x == OCXL_IOCTL_ATTACH ? "ATTACH" : \
x == OCXL_IOCTL_IRQ_ALLOC ? "IRQ_ALLOC" :   \
x == OCXL_IOCTL_IRQ_FREE ? "IRQ_FREE" : \
x == OCXL_IOCTL_IRQ_SET_FD ? "IRQ_SET_FD" : \
x == OCXL_IOCTL_GET_METADATA ? "GET_METADATA" : \
x == OCXL_IOCTL_ENABLE_P9_WAIT ? "ENABLE_P9_WAIT" : \
+   x == OCXL_IOCTL_GET_PLATFORM ? "GET_PLATFORM" : \
"UNKNOWN")
 
 static long afu_ioctl(struct file *file, unsigned int cmd,
@@ -239,6 +259,11 @@ static long afu_ioctl(struct file *file, unsigned int cmd,
break;
 #endif
 
+   case OCXL_IOCTL_GET_PLATFORM:
+   rc = afu_ioctl_get_platform(ctx,
+   (struct ocxl_ioctl_platform __user *) args);
+   break;
+
default:
rc = -EINVAL;
}
diff --git a/include/uapi/misc/ocxl.h b/include/uapi/misc/ocxl.h
index 8d2748e69c84..7bdd3efcf294 100644
--- a/include/uapi/misc/ocxl.h
+++ b/include/uapi/misc/ocxl.h
@@ -55,6 +55,9 @@ struct ocxl_ioctl_p9_wait {
__u64 reserved3[3];
 };
 
+#define OCXL_IOCTL_PLATFORM_FLAGS0_P9_WAIT 0x01
+struct ocxl_ioctl_platform {
+   __u64 flags[4];
+};
 
 struct ocxl_ioctl_irq_fd {
@@ -72,5 +75,6 @@ struct ocxl_ioctl_irq_fd {
 #define OCXL_IOCTL_IRQ_SET_FD  _IOW(OCXL_MAGIC, 0x13, struct ocxl_ioctl_irq_fd)
#define OCXL_IOCTL_GET_METADATA _IOR(OCXL_MAGIC, 0x14, struct ocxl_ioctl_metadata)
#define OCXL_IOCTL_ENABLE_P9_WAIT  _IOR(OCXL_MAGIC, 0x15, struct ocxl_ioctl_p9_wait)
+#define OCXL_IOCTL_GET_PLATFORM _IOR(OCXL_MAGIC, 0x16, struct ocxl_ioctl_platform)
 
 #endif /* _UAPI_MISC_OCXL_H */
-- 
2.14.3



[PATCH 7/7] ocxl: Document new OCXL IOCTLs

2018-04-16 Thread Alastair D'Silva
From: Alastair D'Silva 

Signed-off-by: Alastair D'Silva 
---
 Documentation/accelerators/ocxl.rst | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/Documentation/accelerators/ocxl.rst 
b/Documentation/accelerators/ocxl.rst
index ddcc58d01cfb..144595a80a1c 100644
--- a/Documentation/accelerators/ocxl.rst
+++ b/Documentation/accelerators/ocxl.rst
@@ -157,6 +157,16 @@ OCXL_IOCTL_GET_METADATA:
   Obtains configuration information from the card, such as the size of
   MMIO areas, the AFU version, and the PASID for the current context.
 
+OCXL_IOCTL_ENABLE_P9_WAIT:
+
+  Allows the AFU to wake a userspace thread executing 'wait'. Returns
+  information to userspace to allow it to configure the AFU.
+
+OCXL_IOCTL_GET_PLATFORM:
+
+  Notifies userspace as to the platform the kernel believes we are on,
+  which may differ from what userspace believes. Also reports which CPU
+  features are usable from userspace.
 
 mmap
 
-- 
2.14.3



[PATCH 3/7] powerpc: use task_pid_nr() for TID allocation

2018-04-16 Thread Alastair D'Silva
From: Alastair D'Silva 

The current implementation of TID allocation, using a global IDR, may
result in an errant process starving the system of available TIDs.
Instead, use task_pid_nr(), as mentioned by the original author. The
scenario described which prevented its use is not applicable, as
set_thread_tidr can only be called after the task struct has been
populated.

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/kernel/process.c | 97 +--
 1 file changed, 1 insertion(+), 96 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index a3e0a3e06d5a..56ff7eb5ff79 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1496,103 +1496,12 @@ int set_thread_uses_vas(void)
 }
 
 #ifdef CONFIG_PPC64
-static DEFINE_SPINLOCK(vas_thread_id_lock);
-static DEFINE_IDA(vas_thread_ida);
-
-/*
- * We need to assign a unique thread id to each thread in a process.
- *
- * This thread id, referred to as TIDR, and separate from the Linux's tgid,
- * is intended to be used to direct an ASB_Notify from the hardware to the
- * thread, when a suitable event occurs in the system.
- *
- * One such event is a "paste" instruction in the context of Fast Thread
- * Wakeup (aka Core-to-core wake up in the Virtual Accelerator Switchboard
- * (VAS) in POWER9.
- *
- * To get a unique TIDR per process we could simply reuse task_pid_nr() but
- * the problem is that task_pid_nr() is not yet available when copy_thread() is
- * called. Fixing that would require changing more intrusive arch-neutral
- * code in code path in copy_process()?.
- *
- * Further, to assign unique TIDRs within each process, we need an atomic
- * field (or an IDR) in task_struct, which again intrudes into the arch-
- * neutral code. So try to assign globally unique TIDRs for now.
- *
- * NOTE: TIDR 0 indicates that the thread does not need a TIDR value.
- *  For now, only threads that expect to be notified by the VAS
- *  hardware need a TIDR value and we assign values > 0 for those.
- */
-#define MAX_THREAD_CONTEXT ((1 << 16) - 1)
-static int assign_thread_tidr(void)
-{
-   int index;
-   int err;
-   unsigned long flags;
-
-again:
-   if (!ida_pre_get(&vas_thread_ida, GFP_KERNEL))
-   return -ENOMEM;
-
-   spin_lock_irqsave(&vas_thread_id_lock, flags);
-   err = ida_get_new_above(&vas_thread_ida, 1, &index);
-   spin_unlock_irqrestore(&vas_thread_id_lock, flags);
-
-   if (err == -EAGAIN)
-   goto again;
-   else if (err)
-   return err;
-
-   if (index > MAX_THREAD_CONTEXT) {
-   spin_lock_irqsave(&vas_thread_id_lock, flags);
-   ida_remove(&vas_thread_ida, index);
-   spin_unlock_irqrestore(&vas_thread_id_lock, flags);
-   return -ENOMEM;
-   }
-
-   return index;
-}
-
-static void free_thread_tidr(int id)
-{
-   unsigned long flags;
-
-   spin_lock_irqsave(&vas_thread_id_lock, flags);
-   ida_remove(&vas_thread_ida, id);
-   spin_unlock_irqrestore(&vas_thread_id_lock, flags);
-}
-
-/*
- * Clear any TIDR value assigned to this thread.
- */
-void clear_thread_tidr(struct task_struct *t)
-{
-   if (!t->thread.tidr)
-   return;
-
-   if (!cpu_has_feature(CPU_FTR_P9_TIDR)) {
-   WARN_ON_ONCE(1);
-   return;
-   }
-
-   mtspr(SPRN_TIDR, 0);
-   free_thread_tidr(t->thread.tidr);
-   t->thread.tidr = 0;
-}
-
-void arch_release_task_struct(struct task_struct *t)
-{
-   clear_thread_tidr(t);
-}
-
 /*
  * Assign a unique TIDR (thread id) for task @t and set it in the thread
  * structure. For now, we only support setting TIDR for 'current' task.
  */
 int set_thread_tidr(struct task_struct *t)
 {
-   int rc;
-
if (!cpu_has_feature(CPU_FTR_P9_TIDR))
return -EINVAL;
 
@@ -1602,11 +1511,7 @@ int set_thread_tidr(struct task_struct *t)
if (t->thread.tidr)
return 0;
 
-   rc = assign_thread_tidr();
-   if (rc < 0)
-   return rc;
-
-   t->thread.tidr = rc;
+   t->thread.tidr = (u16)task_pid_nr(t);
mtspr(SPRN_TIDR, t->thread.tidr);
 
return 0;
-- 
2.14.3



[PATCH 2/7] powerpc: Use TIDR CPU feature to control TIDR allocation

2018-04-16 Thread Alastair D'Silva
From: Alastair D'Silva 

Switch the use of TIDR on its CPU feature, rather than assuming it
is available based on architecture.

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/kernel/process.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 1237f13fed51..a3e0a3e06d5a 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1570,7 +1570,7 @@ void clear_thread_tidr(struct task_struct *t)
if (!t->thread.tidr)
return;
 
-   if (!cpu_has_feature(CPU_FTR_ARCH_300)) {
+   if (!cpu_has_feature(CPU_FTR_P9_TIDR)) {
WARN_ON_ONCE(1);
return;
}
@@ -1593,7 +1593,7 @@ int set_thread_tidr(struct task_struct *t)
 {
int rc;
 
-   if (!cpu_has_feature(CPU_FTR_ARCH_300))
+   if (!cpu_has_feature(CPU_FTR_P9_TIDR))
return -EINVAL;
 
if (t != current)
-- 
2.14.3



[PATCH 4/7] ocxl: Rename pnv_ocxl_spa_remove_pe to clarify its action

2018-04-16 Thread Alastair D'Silva
From: Alastair D'Silva 

The function removes the process element from the NPU cache.

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/include/asm/pnv-ocxl.h   | 2 +-
 arch/powerpc/platforms/powernv/ocxl.c | 4 ++--
 drivers/misc/ocxl/link.c  | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/pnv-ocxl.h 
b/arch/powerpc/include/asm/pnv-ocxl.h
index f6945d3bc971..208b5503f4ed 100644
--- a/arch/powerpc/include/asm/pnv-ocxl.h
+++ b/arch/powerpc/include/asm/pnv-ocxl.h
@@ -28,7 +28,7 @@ extern int pnv_ocxl_map_xsl_regs(struct pci_dev *dev, void 
__iomem **dsisr,
 extern int pnv_ocxl_spa_setup(struct pci_dev *dev, void *spa_mem, int PE_mask,
void **platform_data);
 extern void pnv_ocxl_spa_release(void *platform_data);
-extern int pnv_ocxl_spa_remove_pe(void *platform_data, int pe_handle);
+extern int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle);
 
 extern int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr);
 extern void pnv_ocxl_free_xive_irq(u32 irq);
diff --git a/arch/powerpc/platforms/powernv/ocxl.c 
b/arch/powerpc/platforms/powernv/ocxl.c
index fa9b53af3c7b..8c65aacda9c8 100644
--- a/arch/powerpc/platforms/powernv/ocxl.c
+++ b/arch/powerpc/platforms/powernv/ocxl.c
@@ -475,7 +475,7 @@ void pnv_ocxl_spa_release(void *platform_data)
 }
 EXPORT_SYMBOL_GPL(pnv_ocxl_spa_release);
 
-int pnv_ocxl_spa_remove_pe(void *platform_data, int pe_handle)
+int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle)
 {
struct spa_data *data = (struct spa_data *) platform_data;
int rc;
@@ -483,7 +483,7 @@ int pnv_ocxl_spa_remove_pe(void *platform_data, int pe_handle)
rc = opal_npu_spa_clear_cache(data->phb_opal_id, data->bdfn, pe_handle);
return rc;
 }
-EXPORT_SYMBOL_GPL(pnv_ocxl_spa_remove_pe);
+EXPORT_SYMBOL_GPL(pnv_ocxl_spa_remove_pe_from_cache);
 
 int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr)
 {
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index f30790582dc0..656e8610eec2 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -599,7 +599,7 @@ int ocxl_link_remove_pe(void *link_handle, int pasid)
 * On powerpc, the entry needs to be cleared from the context
 * cache of the NPU.
 */
-   rc = pnv_ocxl_spa_remove_pe(link->platform_data, pe_handle);
+   rc = pnv_ocxl_spa_remove_pe_from_cache(link->platform_data, pe_handle);
WARN_ON(rc);
 
	pe_data = radix_tree_delete(&link->pe_tree, pe_handle);
-- 
2.14.3



[PATCH 5/7] ocxl: Expose the thread_id needed for wait on p9

2018-04-16 Thread Alastair D'Silva
From: Alastair D'Silva 

In order to successfully issue as_notify, an AFU needs to know the TID
to notify, which in turn means that this information should be
available in userspace so it can be communicated to the AFU.

Signed-off-by: Alastair D'Silva 
---
 drivers/misc/ocxl/context.c   |  5 +++-
 drivers/misc/ocxl/file.c  | 53 +++
 drivers/misc/ocxl/link.c  | 36 ++
 drivers/misc/ocxl/ocxl_internal.h |  1 +
 include/misc/ocxl.h   |  9 +++
 include/uapi/misc/ocxl.h  | 10 
 6 files changed, 113 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
index 909e8807824a..95f74623113e 100644
--- a/drivers/misc/ocxl/context.c
+++ b/drivers/misc/ocxl/context.c
@@ -34,6 +34,8 @@ int ocxl_context_init(struct ocxl_context *ctx, struct ocxl_afu *afu,
	mutex_init(&ctx->xsl_error_lock);
	mutex_init(&ctx->irq_lock);
	idr_init(&ctx->irq_idr);
+   ctx->tidr = 0;
+
/*
 * Keep a reference on the AFU to make sure it's valid for the
 * duration of the life of the context
@@ -65,6 +67,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
 {
int rc;
 
+   // Locks both status & tidr
	mutex_lock(&ctx->status_mutex);
if (ctx->status != OPENED) {
rc = -EIO;
@@ -72,7 +75,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
}
 
rc = ocxl_link_add_pe(ctx->afu->fn->link, ctx->pasid,
-   current->mm->context.id, 0, amr, current->mm,
+   current->mm->context.id, ctx->tidr, amr, current->mm,
xsl_fault_error, ctx);
if (rc)
goto out;
diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
index 038509e5d031..eb409a469f21 100644
--- a/drivers/misc/ocxl/file.c
+++ b/drivers/misc/ocxl/file.c
@@ -5,6 +5,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include "ocxl_internal.h"
 
 
@@ -123,11 +125,55 @@ static long afu_ioctl_get_metadata(struct ocxl_context *ctx,
return 0;
 }
 
+#ifdef CONFIG_PPC64
+static long afu_ioctl_enable_p9_wait(struct ocxl_context *ctx,
+   struct ocxl_ioctl_p9_wait __user *uarg)
+{
+   struct ocxl_ioctl_p9_wait arg;
+
+   memset(&arg, 0, sizeof(arg));
+
+   if (cpu_has_feature(CPU_FTR_P9_TIDR)) {
+   enum ocxl_context_status status;
+
+   // Locks both status & tidr
+   mutex_lock(&ctx->status_mutex);
+   if (!ctx->tidr) {
+   if (set_thread_tidr(current))
+   return -ENOENT;
+
+   ctx->tidr = current->thread.tidr;
+   }
+
+   status = ctx->status;
+   mutex_unlock(&ctx->status_mutex);
+
+   if (status == ATTACHED) {
+   int rc;
+   struct link *link = ctx->afu->fn->link;
+
+   rc = ocxl_link_update_pe(link, ctx->pasid, ctx->tidr);
+   if (rc)
+   return rc;
+   }
+
+   arg.thread_id = ctx->tidr;
+   } else
+   return -ENOENT;
+
+   if (copy_to_user(uarg, &arg, sizeof(arg)))
+   return -EFAULT;
+
+   return 0;
+}
+#endif
+
#define CMD_STR(x) (x == OCXL_IOCTL_ATTACH ? "ATTACH" : \
x == OCXL_IOCTL_IRQ_ALLOC ? "IRQ_ALLOC" :   \
x == OCXL_IOCTL_IRQ_FREE ? "IRQ_FREE" : \
x == OCXL_IOCTL_IRQ_SET_FD ? "IRQ_SET_FD" : \
x == OCXL_IOCTL_GET_METADATA ? "GET_METADATA" : \
x == OCXL_IOCTL_ENABLE_P9_WAIT ? "ENABLE_P9_WAIT" : \
"UNKNOWN")
 
 static long afu_ioctl(struct file *file, unsigned int cmd,
@@ -186,6 +232,13 @@ static long afu_ioctl(struct file *file, unsigned int cmd,
(struct ocxl_ioctl_metadata __user *) args);
break;
 
+#ifdef CONFIG_PPC64
+   case OCXL_IOCTL_ENABLE_P9_WAIT:
+   rc = afu_ioctl_enable_p9_wait(ctx,
+   (struct ocxl_ioctl_p9_wait __user *) args);
+   break;
+#endif
+
default:
rc = -EINVAL;
}
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index 656e8610eec2..88876ae8f330 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -544,6 +544,42 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
 }
 EXPORT_SYMBOL_GPL(ocxl_link_add_pe);
 
+int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid)
+{
+   struct link *link = (struct link *) link_handle;
+   struct spa *spa = link->spa;
+   struct ocxl_process_element *pe;
+   int pe_handle, rc;
+
+   if 

[PATCH 1/7] powerpc: Add TIDR CPU feature for Power9

2018-04-16 Thread Alastair D'Silva
From: Alastair D'Silva 

This patch adds a CPU feature bit to show whether the CPU has
the TIDR register available, enabling as_notify/wait in userspace.

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/include/asm/cputable.h  | 3 ++-
 arch/powerpc/include/asm/switch_to.h | 1 -
 arch/powerpc/kernel/dt_cpu_ftrs.c| 1 +
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index 4e332f3531c5..54c4cbbe57b4 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -215,6 +215,7 @@ static inline void cpu_feature_keys_init(void) { }
 #define CPU_FTR_P9_TM_HV_ASSIST	LONG_ASM_CONST(0x1000)
 #define CPU_FTR_P9_TM_XER_SO_BUG   LONG_ASM_CONST(0x2000)
 #define CPU_FTR_P9_TLBIE_BUG   LONG_ASM_CONST(0x4000)
+#define CPU_FTR_P9_TIDR		LONG_ASM_CONST(0x8000)
 
 #ifndef __ASSEMBLY__
 
@@ -462,7 +463,7 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_PKEY | \
-   CPU_FTR_P9_TLBIE_BUG)
+   CPU_FTR_P9_TLBIE_BUG | CPU_FTR_P9_TIDR)
 #define CPU_FTRS_POWER9_DD1 ((CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD1) & \
 (~CPU_FTR_SAO))
 #define CPU_FTRS_POWER9_DD2_0 CPU_FTRS_POWER9
diff --git a/arch/powerpc/include/asm/switch_to.h 
b/arch/powerpc/include/asm/switch_to.h
index be8c9fa23983..5b03d8a82409 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -94,6 +94,5 @@ static inline void clear_task_ebb(struct task_struct *t)
 extern int set_thread_uses_vas(void);
 
 extern int set_thread_tidr(struct task_struct *t);
-extern void clear_thread_tidr(struct task_struct *t);
 
 #endif /* _ASM_POWERPC_SWITCH_TO_H */
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c 
b/arch/powerpc/kernel/dt_cpu_ftrs.c
index 11a3a4fed3fb..10f8b7f55637 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -722,6 +722,7 @@ static __init void cpufeatures_cpu_quirks(void)
if ((version & 0x) == 0x004e) {
cur_cpu_spec->cpu_features &= ~(CPU_FTR_DAWR);
cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_BUG;
+   cur_cpu_spec->cpu_features |= CPU_FTR_P9_TIDR;
}
 }
 
-- 
2.14.3



[PATCH 0/7] ocxl: Implement Power9 as_notify/wait for OpenCAPI

2018-04-16 Thread Alastair D'Silva
From: Alastair D'Silva 

The Power 9 as_notify/wait feature provides a lower latency way to
signal a thread that work is complete. This series enables the use of
this feature from OpenCAPI adapters, as well as addressing a potential
starvation issue when allocating thread IDs.

Alastair D'Silva (7):
  powerpc: Add TIDR CPU feature for Power9
  powerpc: Use TIDR CPU feature to control TIDR allocation
  powerpc: use task_pid_nr() for TID allocation
  ocxl: Rename pnv_ocxl_spa_remove_pe to clarify its action
  ocxl: Expose the thread_id needed for wait on p9
  ocxl: Add an IOCTL so userspace knows which platform the kernel
requires
  ocxl: Document new OCXL IOCTLs

 Documentation/accelerators/ocxl.rst   | 10 
 arch/powerpc/include/asm/cputable.h   |  3 +-
 arch/powerpc/include/asm/pnv-ocxl.h   |  2 +-
 arch/powerpc/include/asm/switch_to.h  |  1 -
 arch/powerpc/kernel/dt_cpu_ftrs.c |  1 +
 arch/powerpc/kernel/process.c | 99 +--
 arch/powerpc/platforms/powernv/ocxl.c |  4 +-
 drivers/misc/ocxl/context.c   |  5 +-
 drivers/misc/ocxl/file.c  | 78 +++
 drivers/misc/ocxl/link.c  | 38 +-
 drivers/misc/ocxl/ocxl_internal.h |  1 +
 include/misc/ocxl.h   |  9 
 include/uapi/misc/ocxl.h  | 14 +
 13 files changed, 161 insertions(+), 104 deletions(-)

-- 
2.14.3



[PATCH] powerpc/64s: Default l1d_size to 64K in RFI fallback flush

2018-04-16 Thread Michael Ellerman
From: Madhavan Srinivasan 

If there is no d-cache-size property in the device tree, l1d_size could
be zero. We don't actually expect that to happen, it's only been seen
on mambo (simulator) in some configurations.

A zero-size l1d_size leads to the loop in the asm wrapping around to
2^64-1, and then walking off the end of the fallback area and
eventually causing a page fault which is fatal.

Just default to 64K which is correct on some CPUs, and sane enough to
not cause a crash on others.

Fixes: aa8a5e0062ac9 ('powerpc/64s: Add support for RFI flush of L1-D cache')
Signed-off-by: Madhavan Srinivasan 
[mpe: Rewrite comment and change log]
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/setup_64.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 44c30dd38067..b78f142a4148 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -890,6 +890,17 @@ static void __ref init_fallback_flush(void)
return;
 
l1d_size = ppc64_caches.l1d.size;
+
+   /*
+* If there is no d-cache-size property in the device tree, l1d_size
+* could be zero. That leads to the loop in the asm wrapping around to
+* 2^64-1, and then walking off the end of the fallback area and
+* eventually causing a page fault which is fatal. Just default to
+* something vaguely sane.
+*/
+   if (!l1d_size)
+   l1d_size = (64 * 1024);
+
limit = min(ppc64_bolted_size(), ppc64_rma_size);
 
/*
-- 
2.14.1



Re: [PATCH v2 5/5] KVM: PPC: Book3S HV: radix do not clear partition scoped page table when page fault races with other vCPUs.

2018-04-16 Thread Nicholas Piggin
On Mon, 16 Apr 2018 14:32:40 +1000
Nicholas Piggin  wrote:

> When running a SMP radix guest, KVM can get into page fault / tlbie
> storms -- hundreds of thousands to the same address from different
> threads -- due to partition scoped page faults invalidating the
> page table entry if it was found to be already set up by a racing
> CPU.
> 
> What can happen is that guest threads can hit page faults for the
> same addresses, this can happen when KSM or THP takes out a commonly
> used page. gRA zero (the interrupt vectors and important kernel text)
> was a common one. Multiple CPUs will page fault and contend on the
> same lock, when one CPU sets up the page table and releases the lock,
> the next will find the new entry and invalidate it before installing
> its own, which causes other page faults which invalidate that entry,
> etc.
> 
> The solution to this is to avoid invalidating the entry or flushing
> TLBs in case of a race. The pte may still need bits updated, but
> those are to add R/C or relax access restrictions so no flush is
> required.
> 
> This solves the page fault / tlbie storms.

Oh, I didn't notice "KVM: PPC: Book3S HV: Radix page fault handler
optimizations" does much the same thing as this one and it's been
merged upstream now.

That also adds a partition scoped PWC flush that I'll add to
powerpc/mm, so I'll rebase this series.

Thanks,
Nick


Re: [PATCH 00/32] docs/vm: convert to ReST format

2018-04-16 Thread Jonathan Corbet
On Sun, 15 Apr 2018 20:36:56 +0300
Mike Rapoport  wrote:

> I didn't mean we should keep it as unorganized jumble of stuff and I agree
> that splitting the documentation by audience is better because developers
> already know how to find it :)
> 
> I just thought that putting the doc into the place should not be done
> immediately after mechanical ReST conversion but rather after improving the
> contents.

OK, this is fine.  I'll go ahead and apply the set, but then I'll be
watching to see that the other improvements come :)

In applying the set, there was a significant set of conflicts with
vm/hmm.rst; hopefully I've sorted those out properly.

Thanks,

jon


Re: [PATCH] drivers/of: Introduce ARCH_HAS_OWN_OF_NUMA

2018-04-16 Thread Rob Herring
On Mon, Apr 9, 2018 at 4:05 PM, Dan Williams  wrote:
> On Mon, Apr 9, 2018 at 1:52 PM, Rob Herring  wrote:
>> On Mon, Apr 9, 2018 at 2:46 AM, Oliver O'Halloran  wrote:
>>> Some OF platforms (pseries and some SPARC systems) has their own
>>> implementations of NUMA affinity detection rather than using the generic
>>> OF_NUMA driver, which mainly exists for arm64. For other platforms one
>>> of two fallbacks provided by the base OF driver are used depending on
>>> CONFIG_NUMA.
>>>
>>> In the CONFIG_NUMA=n case the fallback is an inline function in of.h.
>>> In the =y case the fallback is a real function which is defined as a
>>> weak symbol so that it may be overwritten by the architecture if desired.
>>>
>>> The problem with this arrangement is that the real implementations all
>>> export of_node_to_nid(). Unfortunately it's not possible to export the
>>> fallback since it would clash with the non-weak version. As a result
>>> we get build failures when:
>>>
>>> a) CONFIG_NUMA=y && CONFIG_OF=y, and
>>> b) The platform doesn't implement of_node_to_nid(), and
>>> c) A module uses of_node_to_nid()
>>>
>>> Given b) will be true for most platforms this is fairly easy to hit
>>> and has been observed on ia64 and x86.
>>
>> How specifically do we hit this? The only module I see using
>> of_node_to_nid in mainline is Cell EDAC driver.
>
> The of_pmem driver is using it currently pending for a 4.17 pull
> request. Stephen hit the compile failure in -next.

I took a look at this. The correct fix here is to use dev_to_node() instead:

diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
index 85013bad35de..0a701837dfc0 100644
--- a/drivers/nvdimm/of_pmem.c
+++ b/drivers/nvdimm/of_pmem.c
@@ -67,7 +67,7 @@ static int of_pmem_region_probe(struct platform_device *pdev)
 */
	memset(&ndr_desc, 0, sizeof(ndr_desc));
ndr_desc.attr_groups = region_attr_groups;
-   ndr_desc.numa_node = of_node_to_nid(np);
+   ndr_desc.numa_node = dev_to_node(&pdev->dev);
	ndr_desc.res = &pdev->resource[i];
ndr_desc.of_node = np;
	set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);


And we should remove the exported symbol.

I'll send a proper patch.

Rob


Re: [PATCH] ibmvnic: Clear pending interrupt after device reset

2018-04-16 Thread David Miller
From: Thomas Falcon 
Date: Sun, 15 Apr 2018 18:53:36 -0500

> Due to a firmware bug, the hypervisor can send an interrupt to a
> transmit or receive queue just prior to a partition migration, not
> allowing the device enough time to handle it and send an EOI. When
> the partition migrates, the interrupt is lost but an "EOI-pending"
> flag for the interrupt line is still set in firmware. No further
> interrupts will be sent until that flag is cleared, effectively
> freezing that queue. To work around this, the driver will disable the
> hardware interrupt and send an H_EOI signal prior to re-enabling it.
> This will flush the pending EOI and allow the driver to continue
> operation.
> 
> Signed-off-by: Thomas Falcon 

Applied, thanks Thomas.


Re: [PATCH v2 6/6] fsl_pmc: update device bindings

2018-04-16 Thread Rob Herring
On Wed, Apr 11, 2018 at 02:35:51PM +0800, Ran Wang wrote:
> From: Li Yang 

Needs a commit msg and the subject should give some indication of what 
the update is. And also start with "dt-bindings: ..."

> 
> Signed-off-by: Li Yang 
> Signed-off-by: Zhao Chenhui 
> Signed-off-by: Ran Wang 
> ---
> Changes in v2:
>   - new file
> 
>  .../devicetree/bindings/powerpc/fsl/pmc.txt|   59 +++
>  1 files changed, 34 insertions(+), 25 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/powerpc/fsl/pmc.txt 
> b/Documentation/devicetree/bindings/powerpc/fsl/pmc.txt
> index 07256b7..f1f749f 100644
> --- a/Documentation/devicetree/bindings/powerpc/fsl/pmc.txt
> +++ b/Documentation/devicetree/bindings/powerpc/fsl/pmc.txt
> @@ -9,15 +9,20 @@ Properties:
>  
>"fsl,mpc8548-pmc" should be listed for any chip whose PMC is
>compatible.  "fsl,mpc8536-pmc" should also be listed for any chip
> -  whose PMC is compatible, and implies deep-sleep capability.
> +  whose PMC is compatible, and implies deep-sleep capability and
> +  wake on user-defined packet (wakeup on ARP).
> +
> +  "fsl,p1022-pmc" should be listed for any chip whose PMC is
> +  compatible, and implies lossless Ethernet capability during sleep.
>  
>"fsl,mpc8641d-pmc" should be listed for any chip whose PMC is
>compatible; all statements below that apply to "fsl,mpc8548-pmc" also
>apply to "fsl,mpc8641d-pmc".
>  
>Compatibility does not include bit assignments in SCCR/PMCDR/DEVDISR; these
> -  bit assignments are indicated via the sleep specifier in each device's
> -  sleep property.
> +  bit assignments are indicated via the clock nodes.  Devices which have a
> +  controllable clock source should have a "fsl,pmc-handle" property pointing
> +  to the clock node.
>  
>  - reg: For devices compatible with "fsl,mpc8349-pmc", the first resource
>is the PMC block, and the second resource is the Clock Configuration
> @@ -33,31 +38,35 @@ Properties:
>this is a phandle to an "fsl,gtm" node on which timer 4 can be used as
>a wakeup source from deep sleep.
>  
> -Sleep specifiers:
> +Clock nodes:
> +The clock nodes are to describe the masks in PM controller registers for each
> +soc clock.
> +- fsl,pmcdr-mask: For "fsl,mpc8548-pmc"-compatible devices, the mask will be
> +  ORed into PMCDR before suspend if the device using this clock is the 
> wake-up
> +  source and need to be running during low power mode; clear the mask if
> +  otherwise.
>  
> -  fsl,mpc8349-pmc: Sleep specifiers consist of one cell.  For each bit
> -  that is set in the cell, the corresponding bit in SCCR will be saved
> -  and cleared on suspend, and restored on resume.  This sleep controller
> -  supports disabling and resuming devices at any time.
> +- fsl,sccr-mask: For "fsl,mpc8349-pmc"-compatible devices, the corresponding
> +  bit specified by the mask in SCCR will be saved and cleared on suspend, and
> +  restored on resume.
>  
> -  fsl,mpc8536-pmc: Sleep specifiers consist of three cells, the third of
> -  which will be ORed into PMCDR upon suspend, and cleared from PMCDR
> -  upon resume.  The first two cells are as described for fsl,mpc8578-pmc.
> -  This sleep controller only supports disabling devices during system
> -  sleep, or permanently.
> -
> -  fsl,mpc8548-pmc: Sleep specifiers consist of one or two cells, the
> -  first of which will be ORed into DEVDISR (and the second into
> -  DEVDISR2, if present -- this cell should be zero or absent if the
> -  hardware does not have DEVDISR2) upon a request for permanent device
> -  disabling.  This sleep controller does not support configuring devices
> -  to disable during system sleep (unless supported by another compatible
> -  match), or dynamically.

You seem to be breaking backwards compatibility with this change. I 
doubt that is okay on these platforms.


> +- fsl,devdisr-mask: Contain one or two cells, depending on the availability 
> of
> +  DEVDISR2 register.  For compatible devices, the mask will be ORed into 
> DEVDISR
> +  or DEVDISR2 when the clock should be permanently disabled.
>  
>  Example:
>  
> - power@b00 {
> - compatible = "fsl,mpc8313-pmc", "fsl,mpc8349-pmc";
> - reg = <0xb00 0x100 0xa00 0x100>;
> - interrupts = <80 8>;
> + power@e0070 {
> + compatible = "fsl,mpc8536-pmc", "fsl,mpc8548-pmc";
> + reg = <0xe0070 0x20>;
> +
> + etsec1_clk: soc-clk@24 {
> + fsl,pmcdr-mask = <0x0080>;
> + };
> + etsec2_clk: soc-clk@25 {
> + fsl,pmcdr-mask = <0x0040>;
> + };
> + etsec3_clk: soc-clk@26 {
> + fsl,pmcdr-mask = <0x0020>;
> + };
>   };
> -- 
> 1.7.1
> 

[PATCH 5/5] powerpc/lib: Add alt patching test of branching past the last instruction

2018-04-16 Thread Michael Ellerman
Add a test of the relative branch patching logic in the alternate
section feature fixup code. This tests that if we branch past the last
instruction of the alternate section, the branch is not patched.
That's because the assembler will have created a branch that already
points to the first instruction after the patched section, which is
correct and needs no further patching.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/lib/feature-fixups-test.S | 36 ++
 arch/powerpc/lib/feature-fixups.c  | 11 +++
 2 files changed, 47 insertions(+)

diff --git a/arch/powerpc/lib/feature-fixups-test.S 
b/arch/powerpc/lib/feature-fixups-test.S
index dd05afcbcde3..f16cec989506 100644
--- a/arch/powerpc/lib/feature-fixups-test.S
+++ b/arch/powerpc/lib/feature-fixups-test.S
@@ -171,6 +171,42 @@ globl(ftr_fixup_test6_expected)
or  2,2,2
or  3,3,3
 
+globl(ftr_fixup_test7)
+   or  1,1,1
+BEGIN_FTR_SECTION
+   or  2,2,2
+   or  2,2,2
+   or  2,2,2
+   or  2,2,2
+   or  2,2,2
+   or  2,2,2
+   or  2,2,2
+FTR_SECTION_ELSE
+2: b   3f
+3: or  5,5,5
+   beq 3b
+   b   1f
+   or  6,6,6
+   b   2b
+   bdnz3b
+1:
+ALT_FTR_SECTION_END(0, 1)
+   or  1,1,1
+   or  1,1,1
+
+globl(end_ftr_fixup_test7)
+   nop
+
+globl(ftr_fixup_test7_expected)
+   or  1,1,1
+2: b   3f
+3: or  5,5,5
+   beq 3b
+   b   1f
+   or  6,6,6
+   b   2b
+   bdnz3b
+1: or  1,1,1
 
 #if 0
 /* Test that if we have a larger else case the assembler spots it and
diff --git a/arch/powerpc/lib/feature-fixups.c 
b/arch/powerpc/lib/feature-fixups.c
index 097b45bd9de4..f3e46d4edd72 100644
--- a/arch/powerpc/lib/feature-fixups.c
+++ b/arch/powerpc/lib/feature-fixups.c
@@ -425,6 +425,16 @@ static void 
test_alternative_case_with_external_branch(void)
check(memcmp(ftr_fixup_test6, ftr_fixup_test6_expected, size) == 0);
 }
 
+static void test_alternative_case_with_branch_to_end(void)
+{
+   extern unsigned int ftr_fixup_test7[];
+   extern unsigned int end_ftr_fixup_test7[];
+   extern unsigned int ftr_fixup_test7_expected[];
+   int size = 4 * (end_ftr_fixup_test7 - ftr_fixup_test7);
+
+   check(memcmp(ftr_fixup_test7, ftr_fixup_test7_expected, size) == 0);
+}
+
 static void test_cpu_macros(void)
 {
extern u8 ftr_fixup_test_FTR_macros[];
@@ -480,6 +490,7 @@ static int __init test_feature_fixups(void)
test_alternative_case_too_small();
test_alternative_case_with_branch();
test_alternative_case_with_external_branch();
+   test_alternative_case_with_branch_to_end();
test_cpu_macros();
test_fw_macros();
test_lwsync_macros();
-- 
2.14.1



[PATCH 4/5] powerpc/lib: Rename ftr_fixup_test7 to ftr_fixup_test_too_big

2018-04-16 Thread Michael Ellerman
We want this to remain the last test (because it's disabled by
default), so give it a non-numbered name so we don't have to renumber
it when adding new tests before it.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/lib/feature-fixups-test.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/lib/feature-fixups-test.S 
b/arch/powerpc/lib/feature-fixups-test.S
index 12ff0f673956..dd05afcbcde3 100644
--- a/arch/powerpc/lib/feature-fixups-test.S
+++ b/arch/powerpc/lib/feature-fixups-test.S
@@ -176,7 +176,7 @@ globl(ftr_fixup_test6_expected)
 /* Test that if we have a larger else case the assembler spots it and
  * reports an error. #if 0'ed so as not to break the build normally.
  */
-ftr_fixup_test7:
+ftr_fixup_test_too_big:
or  1,1,1
 BEGIN_FTR_SECTION
or  2,2,2
-- 
2.14.1



[PATCH 3/5] powerpc/lib: Fix the feature fixup tests to actually work

2018-04-16 Thread Michael Ellerman
The code patching code has always been a bit confused about whether
it's best to use void *, unsigned int *, char *, etc. to point to
instructions. In fact in the feature fixups tests we use both unsigned
int[] and u8[] in different places.

Unfortunately the tests that use unsigned int[] calculate the size of
the code blocks using subtraction of those unsigned int pointers, and
then pass the result to memcmp(). This means we're only comparing 1/4
of the bytes we need to, because we need to multiply by
sizeof(unsigned int) to get the number of *bytes*.

The result is that the tests do all the patching and then only compare
some of the resulting code, so patching bugs that only affect the
last 3/4 of the code could slip through undetected. It turns out that
hasn't been happening, although one test had a bad expected case (see
previous commit).

Fix it for now by multiplying the size by 4 in the affected functions.

Fixes: 362e7701fd18 ("powerpc: Add self-tests of the feature fixup code")
Epic-brown-paper-bag-by: Michael Ellerman 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/lib/feature-fixups.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/lib/feature-fixups.c 
b/arch/powerpc/lib/feature-fixups.c
index 288fe4f0db4e..097b45bd9de4 100644
--- a/arch/powerpc/lib/feature-fixups.c
+++ b/arch/powerpc/lib/feature-fixups.c
@@ -285,7 +285,7 @@ static void test_basic_patching(void)
extern unsigned int end_ftr_fixup_test1[];
extern unsigned int ftr_fixup_test1_orig[];
extern unsigned int ftr_fixup_test1_expected[];
-   int size = end_ftr_fixup_test1 - ftr_fixup_test1;
+   int size = 4 * (end_ftr_fixup_test1 - ftr_fixup_test1);
 
fixup.value = fixup.mask = 8;
	fixup.start_off = calc_offset(&fixup, ftr_fixup_test1 + 1);
@@ -317,7 +317,7 @@ static void test_alternative_patching(void)
extern unsigned int ftr_fixup_test2_orig[];
extern unsigned int ftr_fixup_test2_alt[];
extern unsigned int ftr_fixup_test2_expected[];
-   int size = end_ftr_fixup_test2 - ftr_fixup_test2;
+   int size = 4 * (end_ftr_fixup_test2 - ftr_fixup_test2);
 
fixup.value = fixup.mask = 0xF;
	fixup.start_off = calc_offset(&fixup, ftr_fixup_test2 + 1);
@@ -349,7 +349,7 @@ static void test_alternative_case_too_big(void)
extern unsigned int end_ftr_fixup_test3[];
extern unsigned int ftr_fixup_test3_orig[];
extern unsigned int ftr_fixup_test3_alt[];
-   int size = end_ftr_fixup_test3 - ftr_fixup_test3;
+   int size = 4 * (end_ftr_fixup_test3 - ftr_fixup_test3);
 
fixup.value = fixup.mask = 0xC;
	fixup.start_off = calc_offset(&fixup, ftr_fixup_test3 + 1);
@@ -376,7 +376,7 @@ static void test_alternative_case_too_small(void)
extern unsigned int ftr_fixup_test4_orig[];
extern unsigned int ftr_fixup_test4_alt[];
extern unsigned int ftr_fixup_test4_expected[];
-   int size = end_ftr_fixup_test4 - ftr_fixup_test4;
+   int size = 4 * (end_ftr_fixup_test4 - ftr_fixup_test4);
unsigned long flag;
 
/* Check a high-bit flag */
@@ -410,7 +410,7 @@ static void test_alternative_case_with_branch(void)
extern unsigned int ftr_fixup_test5[];
extern unsigned int end_ftr_fixup_test5[];
extern unsigned int ftr_fixup_test5_expected[];
-   int size = end_ftr_fixup_test5 - ftr_fixup_test5;
+   int size = 4 * (end_ftr_fixup_test5 - ftr_fixup_test5);
 
check(memcmp(ftr_fixup_test5, ftr_fixup_test5_expected, size) == 0);
 }
@@ -420,7 +420,7 @@ static void test_alternative_case_with_external_branch(void)
extern unsigned int ftr_fixup_test6[];
extern unsigned int end_ftr_fixup_test6[];
extern unsigned int ftr_fixup_test6_expected[];
-   int size = end_ftr_fixup_test6 - ftr_fixup_test6;
+   int size = 4 * (end_ftr_fixup_test6 - ftr_fixup_test6);
 
check(memcmp(ftr_fixup_test6, ftr_fixup_test6_expected, size) == 0);
 }
-- 
2.14.1



[PATCH 2/5] powerpc/lib: Fix feature fixup test of external branch

2018-04-16 Thread Michael Ellerman
The expected case for this test was wrong, the source of the alternate
code sequence is:

  FTR_SECTION_ELSE
  2:or  2,2,2
PPC_LCMPI   r3,1
beq 3f
blt 2b
b   3f
b   1b
  ALT_FTR_SECTION_END(0, 1)
  3:or  1,1,1
or  2,2,2
  4:or  3,3,3

So when it's patched the '3' label should still be on the 'or 1,1,1',
and the 4 label is irrelevant and can be removed.

Fixes: 362e7701fd18 ("powerpc: Add self-tests of the feature fixup code")
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/lib/feature-fixups-test.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/lib/feature-fixups-test.S 
b/arch/powerpc/lib/feature-fixups-test.S
index f4613118132e..12ff0f673956 100644
--- a/arch/powerpc/lib/feature-fixups-test.S
+++ b/arch/powerpc/lib/feature-fixups-test.S
@@ -167,9 +167,9 @@ globl(ftr_fixup_test6_expected)
blt 2b
b   3f
b   1b
-2: or  1,1,1
+3: or  1,1,1
or  2,2,2
-3: or  3,3,3
+   or  3,3,3
 
 
 #if 0
-- 
2.14.1



[PATCH 1/5] powerpc/lib: Fix off-by-one in alternate feature patching

2018-04-16 Thread Michael Ellerman
When we patch an alternate feature section, we have to adjust any
relative branches that branch out of the alternate section.

But currently we have a bug if we have a branch that points to past
the last instruction of the alternate section, eg:

  FTR_SECTION_ELSE
  1: b   2f
 or  6,6,6
  2:
  ALT_FTR_SECTION_END(...)
 nop

This will result in a relative branch at 1 with a target that equals
the end of the alternate section.

That branch does not need adjusting when it's moved to the non-else
location. Currently we do adjust it, resulting in a branch that goes
off into the link-time location of the else section, which is junk.

The fix is to not patch branches that have a target == end of the
alternate section.

Fixes: d20fe50a7b3c ("KVM: PPC: Book3S HV: Branch inside feature section")
Fixes: 9b1a735de64c ("powerpc: Add logic to patch alternative feature sections")
Cc: sta...@vger.kernel.org # v2.6.27+
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/lib/feature-fixups.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/lib/feature-fixups.c 
b/arch/powerpc/lib/feature-fixups.c
index 35f80ab7cbd8..288fe4f0db4e 100644
--- a/arch/powerpc/lib/feature-fixups.c
+++ b/arch/powerpc/lib/feature-fixups.c
@@ -55,7 +55,7 @@ static int patch_alt_instruction(unsigned int *src, unsigned 
int *dest,
unsigned int *target = (unsigned int *)branch_target(src);
 
/* Branch within the section doesn't need translating */
-   if (target < alt_start || target >= alt_end) {
+   if (target < alt_start || target > alt_end) {
instr = translate_branch(dest, src);
if (!instr)
return 1;
-- 
2.14.1



Re: powerpc/modules: Fix crashes by adding CONFIG_RELOCATABLE to vermagic

2018-04-16 Thread Michael Ellerman
Ard Biesheuvel  writes:

> On 11 April 2018 at 16:49, Michael Ellerman
>  wrote:
>> On Tue, 2018-04-10 at 01:22:06 UTC, Michael Ellerman wrote:
>>> If you build the kernel with CONFIG_RELOCATABLE=n, then install the
>>> modules, rebuild the kernel with CONFIG_RELOCATABLE=y and leave the
>>> old modules installed, we crash something like:
>>>
>>>   Unable to handle kernel paging request for data at address 
>>> 0xd00018d66cef
>>>   Faulting instruction address: 0xc21ddd08
>>>   Oops: Kernel access of bad area, sig: 11 [#1]
>>>   Modules linked in: x_tables autofs4
>>>   CPU: 2 PID: 1 Comm: systemd Not tainted 4.16.0-rc6-gcc_ubuntu_le-g99fec39 
>>> #1
>>>   ...
>>>   NIP check_version.isra.22+0x118/0x170
>>>   Call Trace:
>>> __ksymtab_xt_unregister_table+0x58/0xfcb8 [x_tables] 
>>> (unreliable)
>>> resolve_symbol+0xb4/0x150
>>> load_module+0x10e8/0x29a0
>>> SyS_finit_module+0x110/0x140
>>> system_call+0x58/0x6c
>>>
>>> This happens because since commit 71810db27c1c ("modversions: treat
>>> symbol CRCs as 32 bit quantities"), a relocatable kernel encodes and
>>> handles symbol CRCs differently from a non-relocatable kernel.
>>>
>>> Although it's possible we could try and detect this situation and
>>> handle it, it's much more robust to simply make the state of
>>> CONFIG_RELOCATABLE part of the module vermagic.
>>>
>>> Fixes: 71810db27c1c ("modversions: treat symbol CRCs as 32 bit quantities")
>>> Signed-off-by: Michael Ellerman 
>>
>> Applied to powerpc fixes.
>>
>> https://git.kernel.org/powerpc/c/73aca179d78eaa11604ba0783a6d8b
>
> Thanks for the cc. I guess this only affects powerpc, given that it is
> the only arch that switches between CRC immediate values and CRC
> offsets depending on the configuration.

No worries.

Is there any reason we shouldn't always turn on CONFIG_MODULE_REL_CRCS?
It seems to work, but I wanted to test it more before switching to that,
hence the quick fix above.


arch/um looks like it might be switching based on config, but I don't
know enough to say:

  config LD_SCRIPT_STATIC
bool
default y
depends on STATIC_LINK
  
  config LD_SCRIPT_DYN
bool
default y
depends on !LD_SCRIPT_STATIC
  select MODULE_REL_CRCS if MODVERSIONS


cheers


Applied "ASoC: fsl_esai: Add freq check in set_dai_sysclk()" to the asoc tree

2018-04-16 Thread Mark Brown
The patch

   ASoC: fsl_esai: Add freq check in set_dai_sysclk()

has been applied to the asoc tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git 

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

From 8a2278b7fb3df67cd415c679ba1a0e5e4a1761a7 Mon Sep 17 00:00:00 2001
From: Nicolin Chen 
Date: Sun, 8 Apr 2018 17:33:54 -0700
Subject: [PATCH] ASoC: fsl_esai: Add freq check in set_dai_sysclk()

The freq parameter indicates the physical frequency of an actual
input clock or a desired frequency of an output clock for HCKT/R.
It should never be passed 0, as that might cause a division by zero.

So this patch adds a check to fix it.

Signed-off-by: Nicolin Chen 
Reviewed-by: Fabio Estevam 
Signed-off-by: Mark Brown 
---
 sound/soc/fsl/fsl_esai.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/sound/soc/fsl/fsl_esai.c b/sound/soc/fsl/fsl_esai.c
index da8fd98c7f51..d79e99ef31ad 100644
--- a/sound/soc/fsl/fsl_esai.c
+++ b/sound/soc/fsl/fsl_esai.c
@@ -226,6 +226,12 @@ static int fsl_esai_set_dai_sysclk(struct snd_soc_dai 
*dai, int clk_id,
unsigned long clk_rate;
int ret;
 
+   if (freq == 0) {
+   dev_err(dai->dev, "%sput freq of HCK%c should not be 0Hz\n",
+   in ? "in" : "out", tx ? 'T' : 'R');
+   return -EINVAL;
+   }
+
/* Bypass divider settings if the requirement doesn't change */
if (freq == esai_priv->hck_rate[tx] && dir == esai_priv->hck_dir[tx])
return 0;
-- 
2.17.0



Re: [PATCH v2] cxl: Set the PBCQ Tunnel BAR register when enabling capi mode

2018-04-16 Thread christophe lombard

Le 13/04/2018 à 13:59, Philippe Bergheaud a écrit :

Skiboot used to set the default Tunnel BAR register value when capi mode
was enabled. This approach was ok for the cxl driver, but prevented other
drivers from choosing different values.

Skiboot versions > 5.11 will not set the default value any longer. This
patch modifies the cxl driver to set/reset the Tunnel BAR register when
entering/exiting the cxl mode, with pnv_pci_set_tunnel_bar().

Signed-off-by: Philippe Bergheaud 
---


Thanks

Reviewed-by: Christophe Lombard 



[PATCH V1 11/11] powerpc/book3s64: Enable split pmd ptlock.

2018-04-16 Thread Aneesh Kumar K.V
From: "Aneesh Kumar K.V" 

Testing with a threaded version of mmap_bench, which allocates 1G chunks, and
with a large number of threads, we find:

without patch

32.72%  mmap_bench  [kernel.vmlinux][k] do_raw_spin_lock
|
---do_raw_spin_lock
   |
--32.68%--0
  |
  |--15.82%--pte_fragment_alloc
  |  |
  |   --15.79%--do_huge_pmd_anonymous_page
  | __handle_mm_fault
  | handle_mm_fault
  | __do_page_fault
  | handle_page_fault
  | test_mmap
  | test_mmap
  | start_thread
  | __clone
  |
  |--14.95%--do_huge_pmd_anonymous_page
  |  __handle_mm_fault
  |  handle_mm_fault
  |  __do_page_fault
  |  handle_page_fault
  |  test_mmap
  |  test_mmap
  |  start_thread
  |  __clone
  |

with patch

12.89%  mmap_bench  [kernel.vmlinux][k] do_raw_spin_lock
|
---do_raw_spin_lock
   |
--12.83%--0
  |
  |--3.21%--pagevec_lru_move_fn
  |  __lru_cache_add
  |  |
  |   --2.74%--do_huge_pmd_anonymous_page
  | __handle_mm_fault
  | handle_mm_fault
  | __do_page_fault
  | handle_page_fault
  | test_mmap
  | test_mmap
  | start_thread
  | __clone
  |
  |--3.11%--do_huge_pmd_anonymous_page
  |  __handle_mm_fault
  |  handle_mm_fault
  |  __do_page_fault
  |  handle_page_fault
  |  test_mmap
  |  test_mmap
  |  start_thread
  |  __clone

.
  |
   --0.55%--pte_fragment_alloc
 |
  --0.55%--do_huge_pmd_anonymous_page
__handle_mm_fault
handle_mm_fault
__do_page_fault
handle_page_fault
test_mmap
test_mmap
start_thread
__clone

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/platforms/Kconfig.cputype | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index 67d3125d0610..cc892dcfa114 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -292,6 +292,10 @@ config PPC_STD_MMU_32
def_bool y
depends on PPC_STD_MMU && PPC32
 
+config ARCH_ENABLE_SPLIT_PMD_PTLOCK
+   def_bool y
+   depends on PPC_BOOK3S_64
+
 config PPC_RADIX_MMU
bool "Radix MMU Support"
depends on PPC_BOOK3S_64
-- 
2.14.3



[PATCH V1 10/11] powerpc/mm: Use page fragments for allocation page table at PMD level

2018-04-16 Thread Aneesh Kumar K.V
From: "Aneesh Kumar K.V" 

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash.h| 10 --
 arch/powerpc/include/asm/book3s/64/pgalloc.h |  8 +++-
 arch/powerpc/include/asm/book3s/64/pgtable.h |  4 ++--
 arch/powerpc/mm/hash_utils_64.c  |  1 -
 arch/powerpc/mm/pgtable-book3s64.c   |  3 +--
 arch/powerpc/mm/pgtable-radix.c  |  1 -
 arch/powerpc/mm/pgtable_64.c |  2 --
 7 files changed, 6 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index cc8cd656ccfe..0387b155f13d 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -23,16 +23,6 @@
 H_PUD_INDEX_SIZE + H_PGD_INDEX_SIZE + 
PAGE_SHIFT)
 #define H_PGTABLE_RANGE(ASM_CONST(1) << H_PGTABLE_EADDR_SIZE)
 
-#if (defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLB_PAGE)) && \
-   defined(CONFIG_PPC_64K_PAGES)
-/*
- * only with hash 64k we need to use the second half of pmd page table
- * to store pointer to deposited pgtable_t
- */
-#define H_PMD_CACHE_INDEX  (H_PMD_INDEX_SIZE + 1)
-#else
-#define H_PMD_CACHE_INDEX  H_PMD_INDEX_SIZE
-#endif
 /*
  * We store the slot details in the second half of page table.
  * Increase the pud level table so that hugetlb ptes can be stored
diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index 005f400cbf30..01ee40f11f3a 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -90,8 +90,7 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 * need to do this for 4k.
 */
 #if defined(CONFIG_HUGETLB_PAGE) && defined(CONFIG_PPC_64K_PAGES) && \
-   ((H_PGD_INDEX_SIZE == H_PUD_CACHE_INDEX) ||  \
-(H_PGD_INDEX_SIZE == H_PMD_CACHE_INDEX))
+   (H_PGD_INDEX_SIZE == H_PUD_CACHE_INDEX)
memset(pgd, 0, PGD_TABLE_SIZE);
 #endif
return pgd;
@@ -138,13 +137,12 @@ static inline void __pud_free_tlb(struct mmu_gather *tlb, 
pud_t *pud,
 
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-   return kmem_cache_alloc(PGT_CACHE(PMD_CACHE_INDEX),
-   pgtable_gfp_flags(mm, GFP_KERNEL));
+   return pmd_fragment_alloc(mm, addr);
 }
 
 static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
 {
-   kmem_cache_free(PGT_CACHE(PMD_CACHE_INDEX), pmd);
+   pmd_fragment_free((unsigned long *)pmd);
 }
 
 static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index c9db19512b3c..c233915abb68 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -212,13 +212,13 @@ extern unsigned long __pte_index_size;
 extern unsigned long __pmd_index_size;
 extern unsigned long __pud_index_size;
 extern unsigned long __pgd_index_size;
-extern unsigned long __pmd_cache_index;
 extern unsigned long __pud_cache_index;
 #define PTE_INDEX_SIZE  __pte_index_size
 #define PMD_INDEX_SIZE  __pmd_index_size
 #define PUD_INDEX_SIZE  __pud_index_size
 #define PGD_INDEX_SIZE  __pgd_index_size
-#define PMD_CACHE_INDEX __pmd_cache_index
+/* pmd table use page table fragments */
+#define PMD_CACHE_INDEX  0
 #define PUD_CACHE_INDEX __pud_cache_index
 /*
  * Because of use of pte fragments and THP, size of page table
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 63b1c1882e22..e25a6b0cd01e 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1018,7 +1018,6 @@ void __init hash__early_init_mmu(void)
__pud_index_size = H_PUD_INDEX_SIZE;
__pgd_index_size = H_PGD_INDEX_SIZE;
__pud_cache_index = H_PUD_CACHE_INDEX;
-   __pmd_cache_index = H_PMD_CACHE_INDEX;
__pte_table_size = H_PTE_TABLE_SIZE;
__pmd_table_size = H_PMD_TABLE_SIZE;
__pud_table_size = H_PUD_TABLE_SIZE;
diff --git a/arch/powerpc/mm/pgtable-book3s64.c 
b/arch/powerpc/mm/pgtable-book3s64.c
index 47323ed8d7b5..abda2b92f1ba 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -400,7 +400,7 @@ static inline void pgtable_free(void *table, int index)
pte_fragment_free(table, 0);
break;
case PMD_INDEX:
-   kmem_cache_free(PGT_CACHE(PMD_CACHE_INDEX), table);
+   pmd_fragment_free(table);
break;
case PUD_INDEX:
kmem_cache_free(PGT_CACHE(PUD_CACHE_INDEX), table);
@@ -431,7 +431,6 @@ void __tlb_remove_table(void *_table)
 #else
 void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int index)
 {
-
return 

[PATCH V1 09/11] powerpc/mm: Implement helpers for pagetable fragment support at PMD level

2018-04-16 Thread Aneesh Kumar K.V
From: "Aneesh Kumar K.V" 

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-4k.h   |  2 +
 arch/powerpc/include/asm/book3s/64/hash-64k.h  |  7 +++
 arch/powerpc/include/asm/book3s/64/mmu.h   |  1 +
 arch/powerpc/include/asm/book3s/64/pgalloc.h   |  2 +
 arch/powerpc/include/asm/book3s/64/pgtable.h   |  6 ++
 arch/powerpc/include/asm/book3s/64/radix-4k.h  |  3 +
 arch/powerpc/include/asm/book3s/64/radix-64k.h |  4 ++
 arch/powerpc/mm/hash_utils_64.c|  2 +
 arch/powerpc/mm/mmu_context_book3s64.c | 37 ++--
 arch/powerpc/mm/pgtable-book3s64.c | 84 ++
 arch/powerpc/mm/pgtable-radix.c|  2 +
 11 files changed, 144 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h 
b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index 00c4db2a7682..9a3798660cef 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -42,6 +42,8 @@
 /* 8 bytes per each pte entry */
 #define H_PTE_FRAG_SIZE_SHIFT  (H_PTE_INDEX_SIZE + 3)
 #define H_PTE_FRAG_NR  (PAGE_SIZE >> H_PTE_FRAG_SIZE_SHIFT)
+#define H_PMD_FRAG_SIZE_SHIFT  (H_PMD_INDEX_SIZE + 3)
+#define H_PMD_FRAG_NR  (PAGE_SIZE >> H_PMD_FRAG_SIZE_SHIFT)
 
 /* memory key bits, only 8 keys supported */
 #define H_PTE_PKEY_BIT00
diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index cc82745355b3..c81793d47af9 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -46,6 +46,13 @@
 #define H_PTE_FRAG_SIZE_SHIFT  (H_PTE_INDEX_SIZE + 3 + 1)
 #define H_PTE_FRAG_NR  (PAGE_SIZE >> H_PTE_FRAG_SIZE_SHIFT)
 
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLB_PAGE)
+#define H_PMD_FRAG_SIZE_SHIFT  (H_PMD_INDEX_SIZE + 3 + 1)
+#else
+#define H_PMD_FRAG_SIZE_SHIFT  (H_PMD_INDEX_SIZE + 3)
+#endif
+#define H_PMD_FRAG_NR  (PAGE_SIZE >> H_PMD_FRAG_SIZE_SHIFT)
+
 #ifndef __ASSEMBLY__
 #include 
 
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
index fde7803a2261..9c8c669a6b6a 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -138,6 +138,7 @@ typedef struct {
 * pagetable fragment support
 */
void *pte_frag;
+   void *pmd_frag;
 #ifdef CONFIG_SPAPR_TCE_IOMMU
struct list_head iommu_group_mem_list;
 #endif
diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index ed313b8d3fac..005f400cbf30 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -42,7 +42,9 @@ extern struct kmem_cache *pgtable_cache[];
})
 
 extern pte_t *pte_fragment_alloc(struct mm_struct *, unsigned long, int);
+extern pmd_t *pmd_fragment_alloc(struct mm_struct *, unsigned long);
 extern void pte_fragment_free(unsigned long *, int);
+extern void pmd_fragment_free(unsigned long *);
 extern void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift);
 #ifdef CONFIG_SMP
 extern void __tlb_remove_table(void *_table);
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 9462bc18806c..c9db19512b3c 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -246,6 +246,12 @@ extern unsigned long __pte_frag_size_shift;
 #define PTE_FRAG_SIZE_SHIFT __pte_frag_size_shift
 #define PTE_FRAG_SIZE (1UL << PTE_FRAG_SIZE_SHIFT)
 
+extern unsigned long __pmd_frag_nr;
+#define PMD_FRAG_NR __pmd_frag_nr
+extern unsigned long __pmd_frag_size_shift;
+#define PMD_FRAG_SIZE_SHIFT __pmd_frag_size_shift
+#define PMD_FRAG_SIZE (1UL << PMD_FRAG_SIZE_SHIFT)
+
 #define PTRS_PER_PTE   (1 << PTE_INDEX_SIZE)
 #define PTRS_PER_PMD   (1 << PMD_INDEX_SIZE)
 #define PTRS_PER_PUD   (1 << PUD_INDEX_SIZE)
diff --git a/arch/powerpc/include/asm/book3s/64/radix-4k.h 
b/arch/powerpc/include/asm/book3s/64/radix-4k.h
index ca366ec86310..863c3e8286fb 100644
--- a/arch/powerpc/include/asm/book3s/64/radix-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/radix-4k.h
@@ -15,4 +15,7 @@
 #define RADIX_PTE_FRAG_SIZE_SHIFT  (RADIX_PTE_INDEX_SIZE + 3)
 #define RADIX_PTE_FRAG_NR  (PAGE_SIZE >> RADIX_PTE_FRAG_SIZE_SHIFT)
 
+#define RADIX_PMD_FRAG_SIZE_SHIFT  (RADIX_PMD_INDEX_SIZE + 3)
+#define RADIX_PMD_FRAG_NR  (PAGE_SIZE >> RADIX_PMD_FRAG_SIZE_SHIFT)
+
 #endif /* _ASM_POWERPC_PGTABLE_RADIX_4K_H */
diff --git a/arch/powerpc/include/asm/book3s/64/radix-64k.h 
b/arch/powerpc/include/asm/book3s/64/radix-64k.h
index 830082496876..ccb78ca9d0c5 100644
--- a/arch/powerpc/include/asm/book3s/64/radix-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/radix-64k.h
@@ -16,4 +16,8 @@
  */
 #define RADIX_PTE_FRAG_SIZE_SHIFT  

[PATCH V1 08/11] powerpc/book3s64/mm: Simplify the rcu callback for page table free

2018-04-16 Thread Aneesh Kumar K.V
From: "Aneesh Kumar K.V" 

Instead of encoding the shift in the table address, use an enumerated index
value. This allows us to do different things in the callback for pte and pmd
tables.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pgalloc.h | 10 +++
 arch/powerpc/include/asm/book3s/64/pgtable.h | 10 +++
 arch/powerpc/mm/pgtable-book3s64.c   | 45 
 3 files changed, 41 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index 826171568192..ed313b8d3fac 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -124,14 +124,14 @@ static inline void pud_populate(struct mm_struct *mm, 
pud_t *pud, pmd_t *pmd)
 }
 
 static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
-  unsigned long address)
+ unsigned long address)
 {
/*
 * By now all the pud entries should be none entries. So go
 * ahead and flush the page walk cache
 */
flush_tlb_pgtable(tlb, address);
-   pgtable_free_tlb(tlb, pud, PUD_CACHE_INDEX);
+   pgtable_free_tlb(tlb, pud, PUD_INDEX);
 }
 
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
@@ -146,14 +146,14 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t 
*pmd)
 }
 
 static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
-  unsigned long address)
+ unsigned long address)
 {
/*
 * By now all the pud entries should be none entries. So go
 * ahead and flush the page walk cache
 */
flush_tlb_pgtable(tlb, address);
-return pgtable_free_tlb(tlb, pmd, PMD_CACHE_INDEX);
+   return pgtable_free_tlb(tlb, pmd, PMD_INDEX);
 }
 
 static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd,
@@ -203,7 +203,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, 
pgtable_t table,
 * ahead and flush the page walk cache
 */
flush_tlb_pgtable(tlb, address);
-   pgtable_free_tlb(tlb, table, 0);
+   pgtable_free_tlb(tlb, table, PTE_INDEX);
 }
 
 #define check_pgt_cache()  do { } while (0)
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 47b5ffc8715d..9462bc18806c 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -273,6 +273,16 @@ extern unsigned long __pte_frag_size_shift;
 /* Bits to mask out from a PGD to get to the PUD page */
 #define PGD_MASKED_BITS		0xc0000000000000ffUL
 
+/*
+ * Used as an indicator for rcu callback functions
+ */
+enum pgtable_index {
+   PTE_INDEX = 0,
+   PMD_INDEX,
+   PUD_INDEX,
+   PGD_INDEX,
+};
+
 extern unsigned long __vmalloc_start;
 extern unsigned long __vmalloc_end;
 #define VMALLOC_START  __vmalloc_start
diff --git a/arch/powerpc/mm/pgtable-book3s64.c 
b/arch/powerpc/mm/pgtable-book3s64.c
index fc42cccb96c7..0a05e99b54a1 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -309,38 +309,45 @@ void pte_fragment_free(unsigned long *table, int kernel)
}
 }
 
+static inline void pgtable_free(void *table, int index)
+{
+   switch (index) {
+   case PTE_INDEX:
+   pte_fragment_free(table, 0);
+   break;
+   case PMD_INDEX:
+   kmem_cache_free(PGT_CACHE(PMD_CACHE_INDEX), table);
+   break;
+   case PUD_INDEX:
+   kmem_cache_free(PGT_CACHE(PUD_CACHE_INDEX), table);
+   break;
+   /* We don't free pgd table via RCU callback */
+   default:
+   BUG();
+   }
+}
+
 #ifdef CONFIG_SMP
-void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift)
+void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int index)
 {
unsigned long pgf = (unsigned long)table;
 
-   BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
-   pgf |= shift;
+   BUG_ON(index > MAX_PGTABLE_INDEX_SIZE);
+   pgf |= index;
tlb_remove_table(tlb, (void *)pgf);
 }
 
 void __tlb_remove_table(void *_table)
 {
void *table = (void *)((unsigned long)_table & ~MAX_PGTABLE_INDEX_SIZE);
-   unsigned int shift = (unsigned long)_table & MAX_PGTABLE_INDEX_SIZE;
+   unsigned int index = (unsigned long)_table & MAX_PGTABLE_INDEX_SIZE;
 
-   if (!shift)
-   /* PTE page needs special handling */
-   pte_fragment_free(table, 0);
-   else {
-   BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
-   kmem_cache_free(PGT_CACHE(shift), table);
-   }
+   return pgtable_free(table, index);
 }
 #else
-void pgtable_free_tlb(struct 

[PATCH V1 07/11] powerpc/mm/book3s64/4k: Switch 4k pagesize config to use pagetable fragment

2018-04-16 Thread Aneesh Kumar K.V
From: "Aneesh Kumar K.V" 

The 4K config uses one full page at level 4 of the pagetable. Add support for
single-fragment allocation in the pagetable fragment code and use that for the
4K config. This makes both the 4K and 64K configs use the same code path. Later
we will switch pmd to use the page table fragment code as well. This is done
only for 64-bit platforms, since they are the ones using page table fragment
support.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-4k.h |  6 --
 arch/powerpc/include/asm/book3s/64/mmu.h |  6 +++---
 arch/powerpc/include/asm/book3s/64/pgalloc.h | 26 --
 arch/powerpc/mm/mmu_context_book3s64.c   | 10 --
 arch/powerpc/mm/pgtable-book3s64.c   | 11 ---
 5 files changed, 15 insertions(+), 44 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h 
b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index 4b5423030d4b..00c4db2a7682 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -38,8 +38,10 @@
 #define H_PAGE_4K_PFN  0x0
 #define H_PAGE_THP_HUGE 0x0
 #define H_PAGE_COMBO   0x0
-#define H_PTE_FRAG_NR  0
-#define H_PTE_FRAG_SIZE_SHIFT  0
+
+/* 8 bytes per each pte entry */
+#define H_PTE_FRAG_SIZE_SHIFT  (H_PTE_INDEX_SIZE + 3)
+#define H_PTE_FRAG_NR  (PAGE_SIZE >> H_PTE_FRAG_SIZE_SHIFT)
 
 /* memory key bits, only 8 keys supported */
 #define H_PTE_PKEY_BIT0	0
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
index 5094696eecd6..fde7803a2261 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -134,10 +134,10 @@ typedef struct {
 #ifdef CONFIG_PPC_SUBPAGE_PROT
struct subpage_prot_table spt;
 #endif /* CONFIG_PPC_SUBPAGE_PROT */
-#ifdef CONFIG_PPC_64K_PAGES
-   /* for 4K PTE fragment support */
+   /*
+* pagetable fragment support
+*/
void *pte_frag;
-#endif
 #ifdef CONFIG_SPAPR_TCE_IOMMU
struct list_head iommu_group_mem_list;
 #endif
diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index 558a159600ad..826171568192 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -173,31 +173,6 @@ static inline pgtable_t pmd_pgtable(pmd_t pmd)
return (pgtable_t)pmd_page_vaddr(pmd);
 }
 
-#ifdef CONFIG_PPC_4K_PAGES
-static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
- unsigned long address)
-{
-   return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
-}
-
-static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
- unsigned long address)
-{
-   struct page *page;
-   pte_t *pte;
-
-   pte = (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT);
-   if (!pte)
-   return NULL;
-   page = virt_to_page(pte);
-   if (!pgtable_page_ctor(page)) {
-   __free_page(page);
-   return NULL;
-   }
-   return pte;
-}
-#else /* if CONFIG_PPC_64K_PAGES */
-
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
  unsigned long address)
 {
@@ -209,7 +184,6 @@ static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
 {
return (pgtable_t)pte_fragment_alloc(mm, address, 0);
 }
-#endif
 
 static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 {
diff --git a/arch/powerpc/mm/mmu_context_book3s64.c 
b/arch/powerpc/mm/mmu_context_book3s64.c
index b75194dff64c..87ee78973a35 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -159,9 +159,7 @@ int init_new_context(struct task_struct *tsk, struct 
mm_struct *mm)
 
mm->context.id = index;
 
-#ifdef CONFIG_PPC_64K_PAGES
mm->context.pte_frag = NULL;
-#endif
 #ifdef CONFIG_SPAPR_TCE_IOMMU
mm_iommu_init(mm);
 #endif
@@ -192,7 +190,6 @@ static void destroy_contexts(mm_context_t *ctx)
	spin_unlock(&mmu_context_lock);
 }
 
-#ifdef CONFIG_PPC_64K_PAGES
 static void destroy_pagetable_page(struct mm_struct *mm)
 {
int count;
@@ -213,13 +210,6 @@ static void destroy_pagetable_page(struct mm_struct *mm)
}
 }
 
-#else
-static inline void destroy_pagetable_page(struct mm_struct *mm)
-{
-   return;
-}
-#endif
-
 void destroy_context(struct mm_struct *mm)
 {
 #ifdef CONFIG_SPAPR_TCE_IOMMU
diff --git a/arch/powerpc/mm/pgtable-book3s64.c 
b/arch/powerpc/mm/pgtable-book3s64.c
index e4e1b2d4ca27..fc42cccb96c7 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -225,7 +225,7 @@ void mmu_partition_table_set_entry(unsigned int lpid, 
unsigned long dw0,
asm volatile("eieio; tlbsync; ptesync" : : : "memory");
 }
 

[PATCH V1 06/11] powerpc/mm/nohash: Remove pte fragment dependency from nohash

2018-04-16 Thread Aneesh Kumar K.V
From: "Aneesh Kumar K.V" 

Now that we have removed 64K page size support, the RCU page table free can
be much simpler for nohash. Make a copy of the RCU callback in the pgalloc.h
header, similar to nohash 32. We could possibly merge the 32-bit and 64-bit
versions there, but that is left for a later patch.

We also move the book3s-specific handler to pgtable-book3s64.c. This will be
updated in a later patch to handle the split pmd ptlock.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/nohash/64/pgalloc.h |  57 +++---
 arch/powerpc/mm/pgtable-book3s64.c   | 114 +++
 arch/powerpc/mm/pgtable_64.c | 114 ---
 3 files changed, 159 insertions(+), 126 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/64/pgalloc.h 
b/arch/powerpc/include/asm/nohash/64/pgalloc.h
index a6baf3c13bb5..21624ff1f065 100644
--- a/arch/powerpc/include/asm/nohash/64/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/64/pgalloc.h
@@ -84,6 +84,18 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t 
*pmd,
 
 #define pmd_pgtable(pmd) pmd_page(pmd)
 
+static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
+{
+   return kmem_cache_alloc(PGT_CACHE(PMD_CACHE_INDEX),
+   pgtable_gfp_flags(mm, GFP_KERNEL));
+}
+
+static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
+{
+   kmem_cache_free(PGT_CACHE(PMD_CACHE_INDEX), pmd);
+}
+
+
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
  unsigned long address)
 {
@@ -118,26 +130,47 @@ static inline void pte_free(struct mm_struct *mm, 
pgtable_t ptepage)
__free_page(ptepage);
 }
 
-extern void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift);
+static inline void pgtable_free(void *table, int shift)
+{
+   if (!shift) {
+   pgtable_page_dtor(table);
+   free_page((unsigned long)table);
+   } else {
+   BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
+   kmem_cache_free(PGT_CACHE(shift), table);
+   }
+}
+
 #ifdef CONFIG_SMP
-extern void __tlb_remove_table(void *_table);
-#endif
-static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
- unsigned long address)
+static inline void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int 
shift)
 {
-   tlb_flush_pgtable(tlb, address);
-   pgtable_free_tlb(tlb, page_address(table), 0);
+   unsigned long pgf = (unsigned long)table;
+
+   BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
+   pgf |= shift;
+   tlb_remove_table(tlb, (void *)pgf);
 }
 
-static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
+static inline void __tlb_remove_table(void *_table)
 {
-   return kmem_cache_alloc(PGT_CACHE(PMD_CACHE_INDEX),
-   pgtable_gfp_flags(mm, GFP_KERNEL));
+   void *table = (void *)((unsigned long)_table & ~MAX_PGTABLE_INDEX_SIZE);
+   unsigned shift = (unsigned long)_table & MAX_PGTABLE_INDEX_SIZE;
+
+   pgtable_free(table, shift);
 }
 
-static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
+#else
+static inline void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int 
shift)
 {
-   kmem_cache_free(PGT_CACHE(PMD_CACHE_INDEX), pmd);
+   pgtable_free(table, shift);
+}
+#endif
+
+static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
+ unsigned long address)
+{
+   tlb_flush_pgtable(tlb, address);
+   pgtable_free_tlb(tlb, page_address(table), 0);
 }
 
 #define __pmd_free_tlb(tlb, pmd, addr)   \
diff --git a/arch/powerpc/mm/pgtable-book3s64.c 
b/arch/powerpc/mm/pgtable-book3s64.c
index e1c304183172..e4e1b2d4ca27 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -225,3 +225,117 @@ void mmu_partition_table_set_entry(unsigned int lpid, 
unsigned long dw0,
asm volatile("eieio; tlbsync; ptesync" : : : "memory");
 }
 EXPORT_SYMBOL_GPL(mmu_partition_table_set_entry);
+#ifdef CONFIG_PPC_64K_PAGES
+static pte_t *get_pte_from_cache(struct mm_struct *mm)
+{
+   void *pte_frag, *ret;
+
+   spin_lock(&mm->page_table_lock);
+   ret = mm->context.pte_frag;
+   if (ret) {
+   pte_frag = ret + PTE_FRAG_SIZE;
+   /*
+* If we have taken up all the fragments mark PTE page NULL
+*/
+   if (((unsigned long)pte_frag & ~PAGE_MASK) == 0)
+   pte_frag = NULL;
+   mm->context.pte_frag = pte_frag;
+   }
+   spin_unlock(&mm->page_table_lock);
+   return (pte_t *)ret;
+}
+
+static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
+{
+   void *ret = NULL;
+   struct page *page;
+
+   if (!kernel) {
+   page = alloc_page(PGALLOC_GFP | __GFP_ACCOUNT);
+ 

[PATCH V1 05/11] powerpc/mm/book3e/64: Remove unsupported 64K page size from 64bit booke

2018-04-16 Thread Aneesh Kumar K.V
From: "Aneesh Kumar K.V" 

We have in Kconfig

config PPC_64K_PAGES
bool "64k page size"
depends on !PPC_FSL_BOOK3E && (44x || PPC_BOOK3S_64 || PPC_BOOK3E_64)
select HAVE_ARCH_SOFT_DIRTY if PPC_BOOK3S_64

The only supported 64-bit BOOK3E platform is FSL_BOOK3E. Remove the dead 64K
page support code from 64-bit nohash.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/mmu-book3e.h|  6 ---
 arch/powerpc/include/asm/nohash/64/pgalloc.h | 60 
 arch/powerpc/include/asm/nohash/64/pgtable-64k.h | 57 --
 arch/powerpc/include/asm/nohash/64/pgtable.h |  8 ++--
 4 files changed, 4 insertions(+), 127 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/nohash/64/pgtable-64k.h

diff --git a/arch/powerpc/include/asm/mmu-book3e.h 
b/arch/powerpc/include/asm/mmu-book3e.h
index cda94a0f5146..e20072972e35 100644
--- a/arch/powerpc/include/asm/mmu-book3e.h
+++ b/arch/powerpc/include/asm/mmu-book3e.h
@@ -230,10 +230,6 @@ typedef struct {
	unsigned int	id;
	unsigned int	active;
unsigned long   vdso_base;
-#ifdef CONFIG_PPC_64K_PAGES
-   /* for 4K PTE fragment support */
-   void *pte_frag;
-#endif
 } mm_context_t;
 
 /* Page size definitions, common between 32 and 64-bit
@@ -275,8 +271,6 @@ static inline unsigned int mmu_psize_to_shift(unsigned int 
mmu_psize)
  */
 #if defined(CONFIG_PPC_4K_PAGES)
 #define mmu_virtual_psize  MMU_PAGE_4K
-#elif defined(CONFIG_PPC_64K_PAGES)
-#define mmu_virtual_psize  MMU_PAGE_64K
 #else
 #error Unsupported page size
 #endif
diff --git a/arch/powerpc/include/asm/nohash/64/pgalloc.h 
b/arch/powerpc/include/asm/nohash/64/pgalloc.h
index 9721c7867b9c..a6baf3c13bb5 100644
--- a/arch/powerpc/include/asm/nohash/64/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/64/pgalloc.h
@@ -52,8 +52,6 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
kmem_cache_free(PGT_CACHE(PGD_INDEX_SIZE), pgd);
 }
 
-#ifndef CONFIG_PPC_64K_PAGES
-
 #define pgd_populate(MM, PGD, PUD) pgd_set(PGD, (unsigned long)PUD)
 
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
@@ -131,64 +129,6 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, 
pgtable_t table,
pgtable_free_tlb(tlb, page_address(table), 0);
 }
 
-#else /* if CONFIG_PPC_64K_PAGES */
-
-extern pte_t *pte_fragment_alloc(struct mm_struct *, unsigned long, int);
-extern void pte_fragment_free(unsigned long *, int);
-extern void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift);
-#ifdef CONFIG_SMP
-extern void __tlb_remove_table(void *_table);
-#endif
-
-#define pud_populate(mm, pud, pmd) pud_set(pud, (unsigned long)pmd)
-
-static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd,
-  pte_t *pte)
-{
-   pmd_set(pmd, (unsigned long)pte);
-}
-
-static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
-   pgtable_t pte_page)
-{
-   pmd_set(pmd, (unsigned long)pte_page);
-}
-
-static inline pgtable_t pmd_pgtable(pmd_t pmd)
-{
-   return (pgtable_t)(pmd_val(pmd) & ~PMD_MASKED_BITS);
-}
-
-static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
- unsigned long address)
-{
-   return (pte_t *)pte_fragment_alloc(mm, address, 1);
-}
-
-static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
-   unsigned long address)
-{
-   return (pgtable_t)pte_fragment_alloc(mm, address, 0);
-}
-
-static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
-{
-   pte_fragment_free((unsigned long *)pte, 1);
-}
-
-static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
-{
-   pte_fragment_free((unsigned long *)ptepage, 0);
-}
-
-static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
- unsigned long address)
-{
-   tlb_flush_pgtable(tlb, address);
-   pgtable_free_tlb(tlb, table, 0);
-}
-#endif /* CONFIG_PPC_64K_PAGES */
-
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
return kmem_cache_alloc(PGT_CACHE(PMD_CACHE_INDEX),
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable-64k.h 
b/arch/powerpc/include/asm/nohash/64/pgtable-64k.h
deleted file mode 100644
index 7210c2818e41..
--- a/arch/powerpc/include/asm/nohash/64/pgtable-64k.h
+++ /dev/null
@@ -1,57 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_POWERPC_NOHASH_64_PGTABLE_64K_H
-#define _ASM_POWERPC_NOHASH_64_PGTABLE_64K_H
-
-#define __ARCH_USE_5LEVEL_HACK
-#include 
-
-
-#define PTE_INDEX_SIZE  8
-#define PMD_INDEX_SIZE  10
-#define PUD_INDEX_SIZE 0
-#define PGD_INDEX_SIZE  12
-
-/*
- * we support 32 fragments per PTE page of 64K size
- */
-#define PTE_FRAG_NR	32
-/*
- * We use a 2K PTE page 

[PATCH V1 04/11] powerpc/mm: Rename pte fragment functions

2018-04-16 Thread Aneesh Kumar K.V
From: "Aneesh Kumar K.V" 

We rename the alloc and get_from_cache helpers to indicate they operate on pte
fragments. In a later patch we will add pmd fragment support.

No functional change in this patch.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/pgtable_64.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index a41784dd2042..3873f94a3ae9 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -314,7 +314,7 @@ struct page *pmd_page(pmd_t pmd)
 }
 
 #ifdef CONFIG_PPC_64K_PAGES
-static pte_t *get_from_cache(struct mm_struct *mm)
+static pte_t *get_pte_from_cache(struct mm_struct *mm)
 {
void *pte_frag, *ret;
 
@@ -333,7 +333,7 @@ static pte_t *get_from_cache(struct mm_struct *mm)
return (pte_t *)ret;
 }
 
-static pte_t *__alloc_for_cache(struct mm_struct *mm, int kernel)
+static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
 {
void *ret = NULL;
struct page *page;
@@ -372,12 +372,13 @@ pte_t *pte_fragment_alloc(struct mm_struct *mm, unsigned 
long vmaddr, int kernel
 {
pte_t *pte;
 
-   pte = get_from_cache(mm);
+   pte = get_pte_from_cache(mm);
if (pte)
return pte;
 
-   return __alloc_for_cache(mm, kernel);
+   return __alloc_for_ptecache(mm, kernel);
 }
+
 #endif /* CONFIG_PPC_64K_PAGES */
 
 void pte_fragment_free(unsigned long *table, int kernel)
-- 
2.14.3



[PATCH V1 03/11] powerpc/mm: Use pmd_lockptr instead of opencoding it

2018-04-16 Thread Aneesh Kumar K.V
From: "Aneesh Kumar K.V" 

In a later patch we switch the pmd lock from mm->page_table_lock to a split
pmd ptlock. To avoid compilation issues there, use the pmd_lockptr helper now.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/pgtable-book3s64.c | 4 ++--
 arch/powerpc/mm/pgtable-hash64.c   | 8 +---
 arch/powerpc/mm/pgtable-radix.c| 2 +-
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-book3s64.c 
b/arch/powerpc/mm/pgtable-book3s64.c
index 35913b0b6d56..e1c304183172 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -37,7 +37,7 @@ int pmdp_set_access_flags(struct vm_area_struct *vma, 
unsigned long address,
int changed;
 #ifdef CONFIG_DEBUG_VM
WARN_ON(!pmd_trans_huge(*pmdp) && !pmd_devmap(*pmdp));
-   assert_spin_locked(&vma->vm_mm->page_table_lock);
+   assert_spin_locked(pmd_lockptr(vma->vm_mm, pmdp));
 #endif
changed = !pmd_same(*(pmdp), entry);
if (changed) {
@@ -62,7 +62,7 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
 {
 #ifdef CONFIG_DEBUG_VM
WARN_ON(pte_present(pmd_pte(*pmdp)) && !pte_protnone(pmd_pte(*pmdp)));
-   assert_spin_locked(&mm->page_table_lock);
+   assert_spin_locked(pmd_lockptr(mm, pmdp));
WARN_ON(!(pmd_trans_huge(pmd) || pmd_devmap(pmd)));
 #endif
trace_hugepage_set_pmd(addr, pmd_val(pmd));
diff --git a/arch/powerpc/mm/pgtable-hash64.c b/arch/powerpc/mm/pgtable-hash64.c
index 199bfda5f0d9..692bfc9e372c 100644
--- a/arch/powerpc/mm/pgtable-hash64.c
+++ b/arch/powerpc/mm/pgtable-hash64.c
@@ -193,7 +193,7 @@ unsigned long hash__pmd_hugepage_update(struct mm_struct 
*mm, unsigned long addr
 
 #ifdef CONFIG_DEBUG_VM
WARN_ON(!hash__pmd_trans_huge(*pmdp) && !pmd_devmap(*pmdp));
-   assert_spin_locked(&mm->page_table_lock);
+   assert_spin_locked(pmd_lockptr(mm, pmdp));
 #endif
 
__asm__ __volatile__(
@@ -265,7 +265,8 @@ void hash__pgtable_trans_huge_deposit(struct mm_struct *mm, 
pmd_t *pmdp,
  pgtable_t pgtable)
 {
pgtable_t *pgtable_slot;
-   assert_spin_locked(&mm->page_table_lock);
+
+   assert_spin_locked(pmd_lockptr(mm, pmdp));
/*
 * we store the pgtable in the second half of PMD
 */
@@ -285,7 +286,8 @@ pgtable_t hash__pgtable_trans_huge_withdraw(struct 
mm_struct *mm, pmd_t *pmdp)
pgtable_t pgtable;
pgtable_t *pgtable_slot;
 
-   assert_spin_locked(&mm->page_table_lock);
+   assert_spin_locked(pmd_lockptr(mm, pmdp));
+
pgtable_slot = (pgtable_t *)pmdp + PTRS_PER_PMD;
pgtable = *pgtable_slot;
/*
diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index f1891e215e39..473415750cbf 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -975,7 +975,7 @@ unsigned long radix__pmd_hugepage_update(struct mm_struct 
*mm, unsigned long add
 
 #ifdef CONFIG_DEBUG_VM
WARN_ON(!radix__pmd_trans_huge(*pmdp) && !pmd_devmap(*pmdp));
-   assert_spin_locked(&mm->page_table_lock);
+   assert_spin_locked(pmd_lockptr(mm, pmdp));
 #endif
 
old = radix__pte_update(mm, addr, (pte_t *)pmdp, clr, set, 1);
-- 
2.14.3



[PATCH V1 02/11] powerpc/kvm: Switch kvm pmd allocator to custom allocator

2018-04-16 Thread Aneesh Kumar K.V
From: "Aneesh Kumar K.V" 

In the next set of patches, we will switch the pmd allocator to use page
fragments, and the locking will be updated to a split pmd ptlock. We want to
avoid using fragments for the partition-scoped table, so use a slab cache
similar to the one used for the level 4 table.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 36 +-
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index a57eafec4dc2..ccdf3761eec0 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -200,6 +200,7 @@ void kvmppc_radix_set_pte_at(struct kvm *kvm, unsigned long 
addr,
 }
 
 static struct kmem_cache *kvm_pte_cache;
+static struct kmem_cache *kvm_pmd_cache;
 
 static pte_t *kvmppc_pte_alloc(void)
 {
@@ -217,6 +218,16 @@ static inline int pmd_is_leaf(pmd_t pmd)
return !!(pmd_val(pmd) & _PAGE_PTE);
 }
 
+static pmd_t *kvmppc_pmd_alloc(void)
+{
+   return kmem_cache_alloc(kvm_pmd_cache, GFP_KERNEL);
+}
+
+static void kvmppc_pmd_free(pmd_t *pmdp)
+{
+   kmem_cache_free(kvm_pmd_cache, pmdp);
+}
+
 static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa,
 unsigned int level, unsigned long mmu_seq)
 {
@@ -239,7 +250,7 @@ static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, 
unsigned long gpa,
if (pud && pud_present(*pud) && !pud_huge(*pud))
pmd = pmd_offset(pud, gpa);
else if (level <= 1)
-   new_pmd = pmd_alloc_one(kvm->mm, gpa);
+   new_pmd = kvmppc_pmd_alloc();
 
if (level == 0 && !(pmd && pmd_present(*pmd) && !pmd_is_leaf(*pmd)))
new_ptep = kvmppc_pte_alloc();
@@ -382,7 +393,7 @@ static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, 
unsigned long gpa,
if (new_pud)
pud_free(kvm->mm, new_pud);
if (new_pmd)
-   pmd_free(kvm->mm, new_pmd);
+   kvmppc_pmd_free(new_pmd);
if (new_ptep)
kvmppc_pte_free(new_ptep);
return ret;
@@ -758,7 +769,7 @@ void kvmppc_free_radix(struct kvm *kvm)
kvmppc_pte_free(pte);
pmd_clear(pmd);
}
-   pmd_free(kvm->mm, pmd_offset(pud, 0));
+   kvmppc_pmd_free(pmd_offset(pud, 0));
pud_clear(pud);
}
pud_free(kvm->mm, pud_offset(pgd, 0));
@@ -770,20 +781,35 @@ void kvmppc_free_radix(struct kvm *kvm)
 
 static void pte_ctor(void *addr)
 {
-   memset(addr, 0, PTE_TABLE_SIZE);
+   memset(addr, 0, RADIX_PTE_TABLE_SIZE);
+}
+
+static void pmd_ctor(void *addr)
+{
+   memset(addr, 0, RADIX_PMD_TABLE_SIZE);
 }
 
 int kvmppc_radix_init(void)
 {
-   unsigned long size = sizeof(void *) << PTE_INDEX_SIZE;
+   unsigned long size = sizeof(void *) << RADIX_PTE_INDEX_SIZE;
 
kvm_pte_cache = kmem_cache_create("kvm-pte", size, size, 0, pte_ctor);
if (!kvm_pte_cache)
return -ENOMEM;
+
+   size = sizeof(void *) << RADIX_PMD_INDEX_SIZE;
+
+   kvm_pmd_cache = kmem_cache_create("kvm-pmd", size, size, 0, pmd_ctor);
+   if (!kvm_pmd_cache) {
+   kmem_cache_destroy(kvm_pte_cache);
+   return -ENOMEM;
+   }
+
return 0;
 }
 
 void kvmppc_radix_exit(void)
 {
kmem_cache_destroy(kvm_pte_cache);
+   kmem_cache_destroy(kvm_pmd_cache);
 }
-- 
2.14.3



[PATCH V1 01/11] powerpc/mm/book3s64: Move book3s64 code to pgtable-book3s64

2018-04-16 Thread Aneesh Kumar K.V
From: "Aneesh Kumar K.V" 

Only code movement; this avoids an #ifdef.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/pgtable-book3s64.c | 54 
 arch/powerpc/mm/pgtable_64.c   | 56 --
 2 files changed, 54 insertions(+), 56 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-book3s64.c 
b/arch/powerpc/mm/pgtable-book3s64.c
index 518518fb7c45..35913b0b6d56 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -9,10 +9,13 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #include 
 #include 
+#include 
+#include 
 
 #include "mmu_decl.h"
 #include 
@@ -171,3 +174,54 @@ int __meminit remove_section_mapping(unsigned long start, 
unsigned long end)
return hash__remove_section_mapping(start, end);
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
+
+void __init mmu_partition_table_init(void)
+{
+   unsigned long patb_size = 1UL << PATB_SIZE_SHIFT;
+   unsigned long ptcr;
+
+   BUILD_BUG_ON_MSG((PATB_SIZE_SHIFT > 36), "Partition table size too 
large.");
+   partition_tb = __va(memblock_alloc_base(patb_size, patb_size,
+   MEMBLOCK_ALLOC_ANYWHERE));
+
+   /* Initialize the Partition Table with no entries */
+   memset((void *)partition_tb, 0, patb_size);
+
+   /*
+* update partition table control register,
+* 64 K size.
+*/
+   ptcr = __pa(partition_tb) | (PATB_SIZE_SHIFT - 12);
+   mtspr(SPRN_PTCR, ptcr);
+   powernv_set_nmmu_ptcr(ptcr);
+}
+
+void mmu_partition_table_set_entry(unsigned int lpid, unsigned long dw0,
+  unsigned long dw1)
+{
+   unsigned long old = be64_to_cpu(partition_tb[lpid].patb0);
+
+   partition_tb[lpid].patb0 = cpu_to_be64(dw0);
+   partition_tb[lpid].patb1 = cpu_to_be64(dw1);
+
+   /*
+* Global flush of TLBs and partition table caches for this lpid.
+* The type of flush (hash or radix) depends on what the previous
+* use of this partition ID was, not the new use.
+*/
+   asm volatile("ptesync" : : : "memory");
+   if (old & PATB_HR) {
+   asm volatile(PPC_TLBIE_5(%0,%1,2,0,1) : :
+"r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
+   asm volatile(PPC_TLBIE_5(%0,%1,2,1,1) : :
+"r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
+   trace_tlbie(lpid, 0, TLBIEL_INVAL_SET_LPID, lpid, 2, 0, 1);
+   } else {
+   asm volatile(PPC_TLBIE_5(%0,%1,2,0,0) : :
+"r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
+   trace_tlbie(lpid, 0, TLBIEL_INVAL_SET_LPID, lpid, 2, 0, 0);
+   }
+   /* do we need fixup here ?*/
+   asm volatile("eieio; tlbsync; ptesync" : : : "memory");
+}
+EXPORT_SYMBOL_GPL(mmu_partition_table_set_entry);
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 9bf659d5078c..a41784dd2042 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -33,7 +33,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 
@@ -47,13 +46,11 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
 #include 
 #include 
-#include 
 
 #include "mmu_decl.h"
 
@@ -429,59 +426,6 @@ void pgtable_free_tlb(struct mmu_gather *tlb, void *table, 
int shift)
 }
 #endif
 
-#ifdef CONFIG_PPC_BOOK3S_64
-void __init mmu_partition_table_init(void)
-{
-   unsigned long patb_size = 1UL << PATB_SIZE_SHIFT;
-   unsigned long ptcr;
-
-   BUILD_BUG_ON_MSG((PATB_SIZE_SHIFT > 36), "Partition table size too 
large.");
-   partition_tb = __va(memblock_alloc_base(patb_size, patb_size,
-   MEMBLOCK_ALLOC_ANYWHERE));
-
-   /* Initialize the Partition Table with no entries */
-   memset((void *)partition_tb, 0, patb_size);
-
-   /*
-* update partition table control register,
-* 64 K size.
-*/
-   ptcr = __pa(partition_tb) | (PATB_SIZE_SHIFT - 12);
-   mtspr(SPRN_PTCR, ptcr);
-   powernv_set_nmmu_ptcr(ptcr);
-}
-
-void mmu_partition_table_set_entry(unsigned int lpid, unsigned long dw0,
-  unsigned long dw1)
-{
-   unsigned long old = be64_to_cpu(partition_tb[lpid].patb0);
-
-   partition_tb[lpid].patb0 = cpu_to_be64(dw0);
-   partition_tb[lpid].patb1 = cpu_to_be64(dw1);
-
-   /*
-* Global flush of TLBs and partition table caches for this lpid.
-* The type of flush (hash or radix) depends on what the previous
-* use of this partition ID was, not the new use.
-*/
-   asm volatile("ptesync" : : : "memory");
-   if (old & PATB_HR) {
-   asm volatile(PPC_TLBIE_5(%0,%1,2,0,1) : :
-"r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
-  

[PATCH] powerpc/8xx: Build fix with Hugetlbfs enabled

2018-04-16 Thread Aneesh Kumar K.V
8xx uses slice code when hugetlbfs is enabled. We missed a header include on
8xx, which resulted in the build failure below.

config: mpc885_ads_defconfig + CONFIG_HUGETLBFS

   CC  arch/powerpc/mm/slice.o
arch/powerpc/mm/slice.c: In function 'slice_get_unmapped_area':
arch/powerpc/mm/slice.c:655:2: error: implicit declaration of function 
'need_extra_context' [-Werror=implicit-function-declaration]
arch/powerpc/mm/slice.c:656:3: error: implicit declaration of function 
'alloc_extended_context' [-Werror=implicit-function-declaration]
cc1: all warnings being treated as errors
make[1]: *** [arch/powerpc/mm/slice.o] Error 1
make: *** [arch/powerpc/mm] Error 2

On PPC64, mmu_context.h was included via linux/pkeys.h.

CC: Christophe LEROY 
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/slice.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 9cd87d11fe4e..205fe557ca10 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static DEFINE_SPINLOCK(slice_convert_lock);
 
-- 
2.14.3



[PATCH V1 00/11] powerpc/mm/book3s64: Support for split pmd ptlock

2018-04-16 Thread Aneesh Kumar K.V
This patch series adds a split pmd pagetable lock for book3s64. nohash64
should also be able to switch to this; I need to work out the code dependency.
This series might also have broken the build on platforms other than book3s64.
I am sending it early to get feedback on whether we should continue with this
approach.

We switch the pmd allocator to use something similar to what we already use
for level 4 pagetable allocation: we get an order-0 page, divide it into
fragments, and hand out a fragment on each request for a pmd pagetable. The
pmd lock is now stashed in the struct page backing the allocated page.

The series helps reduce lock contention on mm->page_table_lock.


without patch

32.72%  mmap_bench  [kernel.vmlinux][k] do_raw_spin_lock
|
---do_raw_spin_lock
   |
--32.68%--0
  |
  |--15.82%--pte_fragment_alloc
  |  |
  |   --15.79%--do_huge_pmd_anonymous_page
  | __handle_mm_fault
  | handle_mm_fault
  | __do_page_fault
  | handle_page_fault
  | test_mmap
  | test_mmap
  | start_thread
  | __clone
  |
  |--14.95%--do_huge_pmd_anonymous_page
  |  __handle_mm_fault
  |  handle_mm_fault
  |  __do_page_fault
  |  handle_page_fault
  |  test_mmap
  |  test_mmap
  |  start_thread
  |  __clone
  |

with patch

12.89%  mmap_bench  [kernel.vmlinux][k] do_raw_spin_lock
|
---do_raw_spin_lock
   |
--12.83%--0
  |
  |--3.21%--pagevec_lru_move_fn
  |  __lru_cache_add
  |  |
  |   --2.74%--do_huge_pmd_anonymous_page
  | __handle_mm_fault
  | handle_mm_fault
  | __do_page_fault
  | handle_page_fault
  | test_mmap
  | test_mmap
  | start_thread
  | __clone
  |
  |--3.11%--do_huge_pmd_anonymous_page
  |  __handle_mm_fault
  |  handle_mm_fault
  |  __do_page_fault
  |  handle_page_fault
  |  test_mmap
  |  test_mmap
  |  start_thread
  |  __clone

.
  |
   --0.55%--pte_fragment_alloc
 |
  --0.55%--do_huge_pmd_anonymous_page
__handle_mm_fault
handle_mm_fault
__do_page_fault
handle_page_fault
test_mmap
test_mmap
start_thread
__clone



Aneesh Kumar K.V (11):
  powerpc/mm/book3s64: Move book3s64 code to pgtable-book3s64
  powerpc/kvm: Switch kvm pmd allocator to custom allocator
  powerpc/mm: Use pmd_lockptr instead of opencoding it
  powerpc/mm: Rename pte fragment functions
  powerpc/mm/book3e/64: Remove unsupported 64Kpage size from 64bit booke
  powerpc/mm/nohash: Remove pte fragment dependency from nohash
  powerpc/mm/book3s64/4k: Switch 4k pagesize config to use pagetable
fragment
  powerpc/book3s64/mm: Simplify the rcu callback for page table free
  powerpc/mm: Implement helpers for pagetable fragment support at PMD
level
  powerpc/mm: Use page fragments for allocation page table at PMD level
  powerpc/book3s64: Enable split pmd ptlock.

 arch/powerpc/include/asm/book3s/64/hash-4k.h |   8 +-
 

[PATCHv2 3/3] powerpc/cpu: post the event cpux add/remove instead of online/offline during hotplug

2018-04-16 Thread Pingfan Liu
Technically speaking, echo 1/0 > cpuX/online covers only a subset of cpu
hotplug/unplug, i.e. add/remove; the latter also includes the physical
adding/removing of a cpu device. Some user space tools, such as kexec-tools,
rely on the add/remove events to automatically rebuild the dtb.
If the dtb is not rebuilt correctly, the 2nd kernel may hang due to the
missing boot-cpu-hwid info in the dtb.

The steps to trigger the bug: (suppose 8 threads/core)
drmgr -c cpu -r -q 1
systemctl restart kdump.service
drmgr -c cpu -a -q 1
taskset -c 11 sh -c "echo c > /proc/sysrq-trigger"

Then, failure info:
[  205.299528] SysRq : Trigger a crash
[  205.299551] Unable to handle kernel paging request for data at address 
0x
[  205.299558] Faulting instruction address: 0xc06001a0
[  205.299564] Oops: Kernel access of bad area, sig: 11 [#1]
[  205.299569] SMP NR_CPUS=2048 NUMA pSeries
[  205.299575] Modules linked in: macsec sctp_diag sctp tcp_diag udp_diag 
inet_diag unix_diag af_packet_diag netlink_diag ip6t_rpfilter ipt_REJECT 
nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6
xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc 
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle 
ip6table_security ip6table_raw iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack 
iptable_mangle iptable_security iptable_raw ebtable_filter ebtables 
ip6table_filter ip6_tables iptable_filter xfs libcrc32c sg
pseries_rng binfmt_misc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif 
crct10dif_generic crct10dif_common ibmvscsi scsi_transport_srp ibmveth scsi_tgt 
dm_mirror dm_region_hash dm_log dm_mod
[  205.299658] CPU: 11 PID: 2521 Comm: bash Not tainted 
3.10.0-799.el7.ppc64le #1
[  205.299664] task: c0017bcd15e0 ti: c0014f41 task.ti: 
c0014f41
[  205.299670] NIP: c06001a0 LR: c0600ddc CTR: 
c0600180
[  205.299676] REGS: c0014f413a70 TRAP: 0300   Not tainted  
(3.10.0-799.el7.ppc64le)
[  205.299681] MSR: 80009033   CR: 28222822  
XER: 0001
[  205.299696] CFAR: c0009368 DAR:  DSISR: 4200 
SOFTE: 1
GPR00: c0600dbc c0014f413cf0 c1263200 0063
GPR04: c19ca818 c19db5f8 00c2 c140aa30
GPR08: 0007 0001  c140fc60
GPR12: c0600180 c7b36300 10139e58 4000
GPR16: 1013b5d0  101306fc 10139de4
GPR20: 10139de8 10093150  
GPR24: 1013b5e0 100fa0e8 0007 c11af1c8
GPR28: 0063 c11af588 c1179ba8 0002
[  205.299770] NIP [c06001a0] sysrq_handle_crash+0x20/0x30
[  205.299776] LR [c0600ddc] write_sysrq_trigger+0x10c/0x230
[  205.299781] Call Trace:
[  205.299786] [c0014f413cf0] [c0600dbc] 
write_sysrq_trigger+0xec/0x230 (unreliable)
[  205.299794] [c0014f413d90] [c03eb2c4] 
proc_reg_write+0x84/0x120
[  205.299801] [c0014f413dd0] [c0330a80] SyS_write+0x150/0x400
[  205.299808] [c0014f413e30] [c000a184] system_call+0x38/0xb4
[  205.299813] Instruction dump:
[  205.299816] 409effb8 7fc3f378 4bfff381 4bac 3c4c00c6 38423080 
3d42fff1 394a6930
[  205.299827] 3921 912a 7c0004ac 3940 <992a> 4e800020 
6000 6042
[  205.299838] ---[ end trace f590a5dbd3f63aab ]---
[  205.301812]
[  205.301829] Sending IPI to other CPUs
[  205.302846] IPI complete
I'm in purgatory
  -- > hang up here

This patch uses the register_cpu()/unregister_cpu() interface to fix the problem.

Signed-off-by: Pingfan Liu 
Reported-by: Hari Bathini 
Reviewed-by: Hari Bathini 
---
 arch/powerpc/include/asm/smp.h   | 1 +
 arch/powerpc/kernel/sysfs.c  | 2 +-
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 3 +++
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index fac963e..3ef730d 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -35,6 +35,7 @@ extern int spinning_secondaries;
 extern void cpu_die(void);
 extern int cpu_to_chip_id(int cpu);
 
+DECLARE_PER_CPU(struct cpu, cpu_devices);
 #ifdef CONFIG_SMP
 
 struct smp_ops_t {
diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index a05ab5e..dbbcc96 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -26,7 +26,7 @@
 #include 
 #endif
 
-static DEFINE_PER_CPU(struct cpu, cpu_devices);
+DEFINE_PER_CPU(struct cpu, cpu_devices);
 
 /*
  * SMT snooze delay stuff, 64-bit only for 

[PATCHv2 2/3] powerpc/cpu: dynamically create/destroy the file physical_id during hotplug

2018-04-16 Thread Pingfan Liu
In order to cope with the incoming patch [3/3], which causes the dir
/sys/../cpuX to be created/destroyed during hotplug, we also need to
create/destroy the file cpuX/physical_id dynamically.

Signed-off-by: Pingfan Liu 
Reported-by: Hari Bathini 
Reviewed-by: Hari Bathini 
---
 arch/powerpc/kernel/sysfs.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 04d0bbd..a05ab5e 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -716,6 +716,16 @@ static struct device_attribute pa6t_attrs[] = {
 #endif /* HAS_PPC_PMC_PA6T */
 #endif /* HAS_PPC_PMC_CLASSIC */
 
+/* Only valid if CPU is present. */
+static ssize_t show_physical_id(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   struct cpu *cpu = container_of(dev, struct cpu, dev);
+
+   return sprintf(buf, "%d\n", get_hard_smp_processor_id(cpu->dev.id));
+}
+static DEVICE_ATTR(physical_id, 0444, show_physical_id, NULL);
+
 static int register_cpu_online(unsigned int cpu)
 {
	struct cpu *c = &per_cpu(cpu_devices, cpu);
@@ -723,6 +733,8 @@ static int register_cpu_online(unsigned int cpu)
struct device_attribute *attrs, *pmc_attrs;
int i, nattrs;
 
+	device_create_file(&c->dev, &dev_attr_physical_id);
+
/* For cpus present at boot a reference was already grabbed in 
register_cpu() */
if (!s->of_node)
s->of_node = of_get_cpu_node(cpu, NULL);
@@ -816,6 +828,7 @@ static int unregister_cpu_online(unsigned int cpu)
 
BUG_ON(!c->hotpluggable);
 
+	device_remove_file(s, &dev_attr_physical_id);
 #ifdef CONFIG_PPC64
if (cpu_has_feature(CPU_FTR_SMT))
		device_remove_file(s, &dev_attr_smt_snooze_delay);
@@ -1017,16 +1030,6 @@ static void register_nodes(void)
 
 #endif
 
-/* Only valid if CPU is present. */
-static ssize_t show_physical_id(struct device *dev,
-   struct device_attribute *attr, char *buf)
-{
-   struct cpu *cpu = container_of(dev, struct cpu, dev);
-
-   return sprintf(buf, "%d\n", get_hard_smp_processor_id(cpu->dev.id));
-}
-static DEVICE_ATTR(physical_id, 0444, show_physical_id, NULL);
-
 static int __init topology_init(void)
 {
int cpu, r;
@@ -1049,7 +1052,6 @@ static int __init topology_init(void)
if (cpu_online(cpu) || c->hotpluggable) {
register_cpu(c, cpu);
 
-	device_create_file(&c->dev, &dev_attr_physical_id);
}
}
r = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "powerpc/topology:online",
-- 
2.7.4



[PATCHv2 1/3] powerpc/cpuidle: dynamically register/unregister cpuidle_device during hotplug

2018-04-16 Thread Pingfan Liu
cpuidle_device is touched during cpu hotplug. In order to cope with the
incoming patch [3/3], which causes the dir /sys/../cpuX to be
created/destroyed during hotplug, we also need to create the file
cpuX/cpuidle dynamically.

Signed-off-by: Pingfan Liu 
Reviewed-by: Hari Bathini 
---
 drivers/cpuidle/cpuidle-powernv.c | 2 ++
 drivers/cpuidle/cpuidle-pseries.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/drivers/cpuidle/cpuidle-powernv.c 
b/drivers/cpuidle/cpuidle-powernv.c
index 1a8234e..962c944 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -144,6 +144,7 @@ static int powernv_cpuidle_cpu_online(unsigned int cpu)
struct cpuidle_device *dev = per_cpu(cpuidle_devices, cpu);
 
if (dev && cpuidle_get_driver()) {
+   cpuidle_register_device(dev);
cpuidle_pause_and_lock();
cpuidle_enable_device(dev);
cpuidle_resume_and_unlock();
@@ -159,6 +160,7 @@ static int powernv_cpuidle_cpu_dead(unsigned int cpu)
cpuidle_pause_and_lock();
cpuidle_disable_device(dev);
cpuidle_resume_and_unlock();
+   cpuidle_unregister_device(dev);
}
return 0;
 }
diff --git a/drivers/cpuidle/cpuidle-pseries.c 
b/drivers/cpuidle/cpuidle-pseries.c
index 9e56bc4..a53be8a 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -193,6 +193,7 @@ static int pseries_cpuidle_cpu_online(unsigned int cpu)
struct cpuidle_device *dev = per_cpu(cpuidle_devices, cpu);
 
if (dev && cpuidle_get_driver()) {
+   cpuidle_register_device(dev);
cpuidle_pause_and_lock();
cpuidle_enable_device(dev);
cpuidle_resume_and_unlock();
@@ -208,6 +209,7 @@ static int pseries_cpuidle_cpu_dead(unsigned int cpu)
cpuidle_pause_and_lock();
cpuidle_disable_device(dev);
cpuidle_resume_and_unlock();
+   cpuidle_unregister_device(dev);
}
return 0;
 }
-- 
2.7.4



[PATCHv2 0/3] post the event cpux add/remove besides online/offline during hotplug

2018-04-16 Thread Pingfan Liu
v1->v2:
   -1. Improve the commit log and explain how to reproduce the bug in [3/3].
   -2. Re-split the series: [3/3] is the motivation, while [1~2/3] are
preparation.

Pingfan Liu (3):
  powerpc/cpuidle: dynamically register/unregister cpuidle_device during
hotplug
  powerpc/cpu: dynamically create/destroy the file physical_id during
hotplug
  powerpc/cpu: post the event cpux add/remove instead of online/offline
during hotplug

 arch/powerpc/include/asm/smp.h   |  1 +
 arch/powerpc/kernel/sysfs.c  | 26 ++
 arch/powerpc/platforms/pseries/hotplug-cpu.c |  3 +++
 drivers/cpuidle/cpuidle-powernv.c|  2 ++
 drivers/cpuidle/cpuidle-pseries.c|  2 ++
 5 files changed, 22 insertions(+), 12 deletions(-)

-- 
2.7.4