Re: [PATCH v6 16/22] powerpc/book3s64/kuap: Improve error reporting with KUAP
Christophe Leroy writes: > Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit : >> With hash translation use DSISR_KEYFAULT to identify a wrong access. >> With Radix we look at the AMR value and type of fault. >> >> Signed-off-by: Aneesh Kumar K.V >> --- >> arch/powerpc/include/asm/book3s/32/kup.h | 4 +-- >> arch/powerpc/include/asm/book3s/64/kup.h | 27 >> arch/powerpc/include/asm/kup.h | 4 +-- >> arch/powerpc/include/asm/nohash/32/kup-8xx.h | 4 +-- >> arch/powerpc/mm/fault.c | 2 +- >> 5 files changed, 29 insertions(+), 12 deletions(-) >> >> diff --git a/arch/powerpc/include/asm/book3s/32/kup.h >> b/arch/powerpc/include/asm/book3s/32/kup.h >> index 32fd4452e960..b18cd931e325 100644 >> --- a/arch/powerpc/include/asm/book3s/32/kup.h >> +++ b/arch/powerpc/include/asm/book3s/32/kup.h >> @@ -177,8 +177,8 @@ static inline void restore_user_access(unsigned long >> flags) >> allow_user_access(to, to, end - addr, KUAP_READ_WRITE); >> } >> >> -static inline bool >> -bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) >> +static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long >> address, >> + bool is_write, unsigned long error_code) >> { >> unsigned long begin = regs->kuap & 0xf000; >> unsigned long end = regs->kuap << 28; >> diff --git a/arch/powerpc/include/asm/book3s/64/kup.h >> b/arch/powerpc/include/asm/book3s/64/kup.h >> index 4a3d0d601745..2922c442a218 100644 >> --- a/arch/powerpc/include/asm/book3s/64/kup.h >> +++ b/arch/powerpc/include/asm/book3s/64/kup.h >> @@ -301,12 +301,29 @@ static inline void set_kuap(unsigned long value) >> isync(); >> } >> >> -static inline bool >> -bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) >> +#define RADIX_KUAP_BLOCK_READ UL(0x4000) >> +#define RADIX_KUAP_BLOCK_WRITE UL(0x8000) >> + >> +static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long >> address, >> + bool is_write, unsigned long error_code) >> { >> -return WARN(mmu_has_feature(MMU_FTR_KUAP) && >> 
-(regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : >> AMR_KUAP_BLOCK_READ)), >> -"Bug: %s fault blocked by AMR!", is_write ? "Write" : >> "Read"); >> +if (!mmu_has_feature(MMU_FTR_KUAP)) >> +return false; >> + >> +if (radix_enabled()) { >> +/* >> + * Will be a storage protection fault. >> + * Only check the details of AMR[0] >> + */ >> +return WARN((regs->kuap & (is_write ? RADIX_KUAP_BLOCK_WRITE : >> RADIX_KUAP_BLOCK_READ)), >> +"Bug: %s fault blocked by AMR!", is_write ? "Write" >> : "Read");

> I think it is pointless to keep the WARN() here.
>
> I have a series aiming at removing them. See
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/cc9129bdda1dbc2f0a09cf45fece7d0b0e690784.1605541983.git.christophe.le...@csgroup.eu/

Can we do this as a separate patch as you posted above? We can drop the
WARN in that while keeping the hash branch to look at the DSISR value.

-aneesh
Re: [PATCH v6 09/22] powerpc/exec: Set thread.regs early during exec
Le 26/11/2020 à 08:38, Aneesh Kumar K.V a écrit : Christophe Leroy writes: Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit : +++ b/arch/powerpc/kernel/process.c @@ -1530,10 +1530,32 @@ void flush_thread(void) #ifdef CONFIG_PPC_BOOK3S_64 void arch_setup_new_exec(void) { - if (radix_enabled()) - return; - hash__setup_new_exec(); + if (!radix_enabled()) + hash__setup_new_exec(); + + /* +* If we exec out of a kernel thread then thread.regs will not be +* set. Do it now. +*/ + if (!current->thread.regs) { + struct pt_regs *regs = task_stack_page(current) + THREAD_SIZE; + current->thread.regs = regs - 1; + } + +} +#else +void arch_setup_new_exec(void) +{ + /* +* If we exec out of a kernel thread then thread.regs will not be +* set. Do it now. +*/ + if (!current->thread.regs) { + struct pt_regs *regs = task_stack_page(current) + THREAD_SIZE; + current->thread.regs = regs - 1; + } } + #endif No need to duplicate arch_setup_new_exec() I think. radix_enabled() is defined at all time so the first function should be valid at all time. arch/powerpc/kernel/process.c: In function ‘arch_setup_new_exec’: arch/powerpc/kernel/process.c:1529:3: error: implicit declaration of function ‘hash__setup_new_exec’; did you mean ‘arch_setup_new_exec’? [-Werror=implicit-function-declaration] 1529 | hash__setup_new_exec(); | ^~~~ | arch_setup_new_exec That requires us to have hash__setup_new_exec prototype for all platforms. Yes indeed. So maybe, just enclose that part in the #ifdef instead of duplicating the common part ? Christophe
Re: [PATCH v6 09/22] powerpc/exec: Set thread.regs early during exec
Christophe Leroy writes: > Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit : > +++ b/arch/powerpc/kernel/process.c >> @@ -1530,10 +1530,32 @@ void flush_thread(void) >> #ifdef CONFIG_PPC_BOOK3S_64 >> void arch_setup_new_exec(void) >> { >> -if (radix_enabled()) >> -return; >> -hash__setup_new_exec(); >> +if (!radix_enabled()) >> +hash__setup_new_exec(); >> + >> +/* >> + * If we exec out of a kernel thread then thread.regs will not be >> + * set. Do it now. >> + */ >> +if (!current->thread.regs) { >> +struct pt_regs *regs = task_stack_page(current) + THREAD_SIZE; >> +current->thread.regs = regs - 1; >> +} >> + >> +} >> +#else >> +void arch_setup_new_exec(void) >> +{ >> +/* >> + * If we exec out of a kernel thread then thread.regs will not be >> + * set. Do it now. >> + */ >> +if (!current->thread.regs) { >> +struct pt_regs *regs = task_stack_page(current) + THREAD_SIZE; >> +current->thread.regs = regs - 1; >> +} >> } >> + >> #endif > > No need to duplicate arch_setup_new_exec() I think. radix_enabled() is > defined at all time so the > first function should be valid at all time. > arch/powerpc/kernel/process.c: In function ‘arch_setup_new_exec’: arch/powerpc/kernel/process.c:1529:3: error: implicit declaration of function ‘hash__setup_new_exec’; did you mean ‘arch_setup_new_exec’? [-Werror=implicit-function-declaration] 1529 | hash__setup_new_exec(); | ^~~~ | arch_setup_new_exec That requires us to have hash__setup_new_exec prototype for all platforms. -aneesh
Re: [PATCH] tpm: ibmvtpm: fix error return code in tpm_ibmvtpm_probe()
On Tue, 2020-11-24 at 21:52 +0800, Wang Hai wrote:
> Fix to return a negative error code from the error handling
> case instead of 0, as done elsewhere in this function.
>
> Fixes: d8d74ea3c002 ("tpm: ibmvtpm: Wait for buffer to be set before proceeding")
> Reported-by: Hulk Robot
> Signed-off-by: Wang Hai

Provide a reasoning for -ETIMEDOUT in the commit message.

/Jarkko

> ---
>  drivers/char/tpm/tpm_ibmvtpm.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c
> index 994385bf37c0..813eb2cac0ce 100644
> --- a/drivers/char/tpm/tpm_ibmvtpm.c
> +++ b/drivers/char/tpm/tpm_ibmvtpm.c
> @@ -687,6 +687,7 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
>  				      ibmvtpm->rtce_buf != NULL,
>  				      HZ)) {
>  		dev_err(dev, "CRQ response timed out\n");
> +		rc = -ETIMEDOUT;
>  		goto init_irq_cleanup;
>  	}
Re: [PATCH v6 04/22] powerpc/book3s64/kuap/kuep: Move uamor setup to pkey init
"Aneesh Kumar K.V" writes: > This patch consolidates UAMOR update across pkey, kuap and kuep features. > The boot cpu initialize UAMOR via pkey init and both radix/hash do the > secondary cpu UAMOR init in early_init_mmu_secondary. > > We don't check for mmu_feature in radix secondary init because UAMOR > is a supported SPRN with all CPUs supporting radix translation. > The old code was not updating UAMOR if we had smap disabled and smep enabled. > This change handles that case. > > Signed-off-by: Aneesh Kumar K.V > --- > arch/powerpc/mm/book3s64/radix_pgtable.c | 8 +--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c > b/arch/powerpc/mm/book3s64/radix_pgtable.c > index 3adcf730f478..bfe441af916a 100644 > --- a/arch/powerpc/mm/book3s64/radix_pgtable.c > +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c > @@ -620,9 +620,6 @@ void setup_kuap(bool disabled) > cur_cpu_spec->mmu_features |= MMU_FTR_RADIX_KUAP; > } > > - /* Make sure userspace can't change the AMR */ > - mtspr(SPRN_UAMOR, 0); > - > /* >* Set the default kernel AMR values on all cpus. >*/ > @@ -721,6 +718,11 @@ void radix__early_init_mmu_secondary(void) > > radix__switch_mmu_context(NULL, _mm); > tlbiel_all(); > + > +#ifdef CONFIG_PPC_PKEY > + /* Make sure userspace can't change the AMR */ > + mtspr(SPRN_UAMOR, 0); > +#endif If PPC_PKEY is disabled I think this leaves UAMOR unset, which means it could potentially allow AMR to be used as a covert channel between processes. cheers
Re: [PATCH v6 03/22] powerpc/book3s64/kuap/kuep: Make KUAP and KUEP a subfeature of PPC_MEM_KEYS
"Aneesh Kumar K.V" writes: > The next set of patches adds support for kuap with hash translation. > Hence make KUAP a BOOK3S_64 feature. Also make it a subfeature of > PPC_MEM_KEYS. Hash translation is going to use pkeys to support > KUAP/KUEP. Adding this dependency reduces the code complexity and > enables us to move some of the initialization code to pkeys.c The subject and change log don't really match the patch anymore since you incorporated my changes. This adds a new CONFIG called PPC_PKEY which is enabled if either PKEY or KUAP/KUEP is enabled etc. cheers > Signed-off-by: Aneesh Kumar K.V > --- > .../powerpc/include/asm/book3s/64/kup-radix.h | 4 ++-- > arch/powerpc/include/asm/book3s/64/mmu.h | 2 +- > arch/powerpc/include/asm/ptrace.h | 7 +- > arch/powerpc/kernel/asm-offsets.c | 3 +++ > arch/powerpc/mm/book3s64/Makefile | 2 +- > arch/powerpc/mm/book3s64/pkeys.c | 24 --- > arch/powerpc/platforms/Kconfig.cputype| 5 > 7 files changed, 33 insertions(+), 14 deletions(-) > > diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h > b/arch/powerpc/include/asm/book3s/64/kup-radix.h > index 28716e2f13e3..68eaa2fac3ab 100644 > --- a/arch/powerpc/include/asm/book3s/64/kup-radix.h > +++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h > @@ -16,7 +16,7 @@ > #ifdef CONFIG_PPC_KUAP > BEGIN_MMU_FTR_SECTION_NESTED(67) > mfspr \gpr1, SPRN_AMR > - ld \gpr2, STACK_REGS_KUAP(r1) > + ld \gpr2, STACK_REGS_AMR(r1) > cmpd\gpr1, \gpr2 > beq 998f > isync > @@ -48,7 +48,7 @@ > bne \msr_pr_cr, 99f > .endif > mfspr \gpr1, SPRN_AMR > - std \gpr1, STACK_REGS_KUAP(r1) > + std \gpr1, STACK_REGS_AMR(r1) > li \gpr2, (AMR_KUAP_BLOCKED >> AMR_KUAP_SHIFT) > sldi\gpr2, \gpr2, AMR_KUAP_SHIFT > cmpd\use_cr, \gpr1, \gpr2 > diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h > b/arch/powerpc/include/asm/book3s/64/mmu.h > index e0b52940e43c..a2a015066bae 100644 > --- a/arch/powerpc/include/asm/book3s/64/mmu.h > +++ b/arch/powerpc/include/asm/book3s/64/mmu.h > @@ -199,7 +199,7 @@ extern int 
mmu_io_psize; > void mmu_early_init_devtree(void); > void hash__early_init_devtree(void); > void radix__early_init_devtree(void); > -#ifdef CONFIG_PPC_MEM_KEYS > +#ifdef CONFIG_PPC_PKEY > void pkey_early_init_devtree(void); > #else > static inline void pkey_early_init_devtree(void) {} > diff --git a/arch/powerpc/include/asm/ptrace.h > b/arch/powerpc/include/asm/ptrace.h > index e2c778c176a3..e7f1caa007a4 100644 > --- a/arch/powerpc/include/asm/ptrace.h > +++ b/arch/powerpc/include/asm/ptrace.h > @@ -53,9 +53,14 @@ struct pt_regs > #ifdef CONFIG_PPC64 > unsigned long ppr; > #endif > + union { > #ifdef CONFIG_PPC_KUAP > - unsigned long kuap; > + unsigned long kuap; > #endif > +#ifdef CONFIG_PPC_PKEY > + unsigned long amr; > +#endif > + }; > }; > unsigned long __pad[2]; /* Maintain 16 byte interrupt stack > alignment */ > }; > diff --git a/arch/powerpc/kernel/asm-offsets.c > b/arch/powerpc/kernel/asm-offsets.c > index c2722ff36e98..418a0b314a33 100644 > --- a/arch/powerpc/kernel/asm-offsets.c > +++ b/arch/powerpc/kernel/asm-offsets.c > @@ -354,6 +354,9 @@ int main(void) > STACK_PT_REGS_OFFSET(_PPR, ppr); > #endif /* CONFIG_PPC64 */ > > +#ifdef CONFIG_PPC_PKEY > + STACK_PT_REGS_OFFSET(STACK_REGS_AMR, amr); > +#endif > #ifdef CONFIG_PPC_KUAP > STACK_PT_REGS_OFFSET(STACK_REGS_KUAP, kuap); > #endif > diff --git a/arch/powerpc/mm/book3s64/Makefile > b/arch/powerpc/mm/book3s64/Makefile > index fd393b8be14f..1b56d3af47d4 100644 > --- a/arch/powerpc/mm/book3s64/Makefile > +++ b/arch/powerpc/mm/book3s64/Makefile > @@ -17,7 +17,7 @@ endif > obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += hash_hugepage.o > obj-$(CONFIG_PPC_SUBPAGE_PROT) += subpage_prot.o > obj-$(CONFIG_SPAPR_TCE_IOMMU)+= iommu_api.o > -obj-$(CONFIG_PPC_MEM_KEYS) += pkeys.o > +obj-$(CONFIG_PPC_PKEY) += pkeys.o > > # Instrumenting the SLB fault path can lead to duplicate SLB entries > KCOV_INSTRUMENT_slb.o := n > diff --git a/arch/powerpc/mm/book3s64/pkeys.c > b/arch/powerpc/mm/book3s64/pkeys.c > index 
b1d091a97611..7dc71f85683d 100644 > --- a/arch/powerpc/mm/book3s64/pkeys.c > +++ b/arch/powerpc/mm/book3s64/pkeys.c > @@ -89,12 +89,14 @@ static int scan_pkey_feature(void) > } > } > > +#ifdef CONFIG_PPC_MEM_KEYS > /* >* Adjust the upper limit, based on the number of bits supported by >* arch-neutral code. >*/ > pkeys_total = min_t(int, pkeys_total, > ((ARCH_VM_PKEY_FLAGS >> VM_PKEY_SHIFT) + 1)); > +#endif > return pkeys_total; > } > > @@ -102,6 +104,7
Re: [PATCH v3 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()
Laurent Vivier writes: > With virtio multiqueue, normally each queue IRQ is mapped to a CPU. > > But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity") > this is broken on pseries. > > The affinity is correctly computed in msi_desc but this is not applied > to the system IRQs. > > It appears the affinity is correctly passed to rtas_setup_msi_irqs() but > lost at this point and never passed to irq_domain_alloc_descs() > (see commit 06ee6d571f0e ("genirq: Add affinity hint to irq allocation")) > because irq_create_mapping() doesn't take an affinity parameter. > > As the previous patch has added the affinity parameter to > irq_create_mapping() we can forward the affinity from rtas_setup_msi_irqs() > to irq_domain_alloc_descs(). > > With this change, the virtqueues are correctly dispatched between the CPUs > on pseries. > > BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1702939 > Signed-off-by: Laurent Vivier > Reviewed-by: Greg Kurz > --- > arch/powerpc/platforms/pseries/msi.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) Acked-by: Michael Ellerman cheers > diff --git a/arch/powerpc/platforms/pseries/msi.c > b/arch/powerpc/platforms/pseries/msi.c > index 133f6adcb39c..b3ac2455faad 100644 > --- a/arch/powerpc/platforms/pseries/msi.c > +++ b/arch/powerpc/platforms/pseries/msi.c > @@ -458,7 +458,8 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int > nvec_in, int type) > return hwirq; > } > > - virq = irq_create_mapping(NULL, hwirq); > + virq = irq_create_mapping_affinity(NULL, hwirq, > +entry->affinity); > > if (!virq) { > pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq); > -- > 2.28.0
Re: [PATCH v3 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()
Marc Zyngier writes: > On 2020-11-25 16:24, Laurent Vivier wrote: >> On 25/11/2020 17:05, Denis Kirjanov wrote: >>> On 11/25/20, Laurent Vivier wrote: With virtio multiqueue, normally each queue IRQ is mapped to a CPU. But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity") this is broken on pseries. >>> >>> Please add "Fixes" tag. >> >> In fact, the code in commit 0d9f0a52c8b9f is correct. >> >> The problem is with MSI/X irq affinity and pseries. So this patch >> fixes more than virtio_scsi. I put this information because this >> commit allows to clearly show the problem. Perhaps I should remove >> this line in fact? > > This patch does not fix virtio_scsi at all, which as you noticed, is > correct. It really fixes the PPC MSI setup, which is starting to show > its age. So getting rid of the reference seems like the right thing to > do. It's still useful to refer to that commit if the code worked prior to that commit. But you should make it clearer that 0d9f0a52c8b9f wasn't in error, it just exposed an existing shortcoming of the arch code. cheers
Re: [PATCH v6 07/22] powerpc/book3s64/kuap: Rename MMU_FTR_RADIX_KUAP to MMU_FTR_KUAP
"Aneesh Kumar K.V" writes: > diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h > index 255a1837e9f7..f5c7a17c198a 100644 > --- a/arch/powerpc/include/asm/mmu.h > +++ b/arch/powerpc/include/asm/mmu.h > @@ -28,6 +28,11 @@ > * Individual features below. > */ > > +/* > + * Supports KUAP (key 0 controlling userspace addresses) on radix > + */ That comment needs updating. I think this feature now means we have either key 0 controlling uaccess on radix OR we're using the AMR to manually implement KUAP. > +#define MMU_FTR_KUAP ASM_CONST(0x0200) I agree with Christophe that this name is now too generic. With that name one would expect it to be enabled on the 32-bit CPUs that implement KUAP. Maybe MMU_FTR_BOOK3S_KUAP ? If in future the other MMUs want an MMU feature for KUAP then we could rename it to MMU_FTR_KUAP, but we'd need to be careful with ifdefs to make sure it guards the right things. cheers
[PATCH 11/13] ibmvfc: set and track hw queue in ibmvfc_event struct
Extract the hwq id from a SCSI command and store it in the ibmvfc_event structure to identify which Sub-CRQ to send the command down when channels are being utilized. Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.c | 5 + drivers/scsi/ibmvscsi/ibmvfc.h | 1 + 2 files changed, 6 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 55893d09f883..f686c2cb0de2 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -1387,6 +1387,7 @@ static void ibmvfc_init_event(struct ibmvfc_event *evt, evt->crq.format = format; evt->done = done; evt->eh_comp = NULL; + evt->hwq = 0; } /** @@ -1738,6 +1739,8 @@ static int ibmvfc_queuecommand_lck(struct scsi_cmnd *cmnd, struct ibmvfc_cmd *vfc_cmd; struct ibmvfc_fcp_cmd_iu *iu; struct ibmvfc_event *evt; + u32 tag_and_hwq = blk_mq_unique_tag(cmnd->request); + u16 hwq = blk_mq_unique_tag_to_hwq(tag_and_hwq); int rc; if (unlikely((rc = fc_remote_port_chkready(rport))) || @@ -1765,6 +1768,8 @@ static int ibmvfc_queuecommand_lck(struct scsi_cmnd *cmnd, } vfc_cmd->correlation = cpu_to_be64(evt); + if (vhost->using_channels) + evt->hwq = hwq % vhost->scsi_scrqs.active_queues; if (likely(!(rc = ibmvfc_map_sg_data(cmnd, evt, vfc_cmd, vhost->dev return ibmvfc_send_event(evt, vhost, 0); diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h index 04086ffbfca7..abda910ae33d 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.h +++ b/drivers/scsi/ibmvscsi/ibmvfc.h @@ -781,6 +781,7 @@ struct ibmvfc_event { struct completion comp; struct completion *eh_comp; struct timer_list timer; + u16 hwq; }; /* a pool of event structs for use */ -- 2.27.0
[PATCH 13/13] ibmvfc: register Sub-CRQ handles with VIOS during channel setup
If the ibmvfc client adapter requests channels it must submit a number of Sub-CRQ handles matching the number of channels being requested. The VIOS in its response will overwrite the actual number of channel resources allocated which may be less than what was requested. The client then must store the VIOS Sub-CRQ handle for each queue. This VIOS handle is needed as a parameter with h_send_sub_crq(). Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.c | 32 +++- 1 file changed, 31 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 897e3236534d..6bb1028bbe44 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -4494,15 +4494,35 @@ static void ibmvfc_discover_targets(struct ibmvfc_host *vhost) static void ibmvfc_channel_setup_done(struct ibmvfc_event *evt) { struct ibmvfc_host *vhost = evt->vhost; + struct ibmvfc_channel_setup *setup = vhost->channel_setup_buf; + struct ibmvfc_scsi_channels *scrqs = >scsi_scrqs; u32 mad_status = be16_to_cpu(evt->xfer_iu->channel_setup.common.status); int level = IBMVFC_DEFAULT_LOG_LEVEL; + int flags, active_queues, i; ibmvfc_free_event(evt); switch (mad_status) { case IBMVFC_MAD_SUCCESS: ibmvfc_dbg(vhost, "Channel Setup succeded\n"); + flags = be32_to_cpu(setup->flags); vhost->do_enquiry = 0; + active_queues = be32_to_cpu(setup->num_scsi_subq_channels); + scrqs->active_queues = active_queues; + + if (flags & IBMVFC_CHANNELS_CANCELED) { + ibmvfc_dbg(vhost, "Channels Canceled\n"); + vhost->using_channels = 0; + } else { + if (active_queues) + vhost->using_channels = 1; + for (i = 0; i < active_queues; i++) + scrqs->scrqs[i].vios_cookie = + be64_to_cpu(setup->channel_handles[i]); + + ibmvfc_dbg(vhost, "Using %u channels\n", + vhost->scsi_scrqs.active_queues); + } break; case IBMVFC_MAD_FAILED: level += ibmvfc_retry_host_init(vhost); @@ -4526,9 +4546,19 @@ static void ibmvfc_channel_setup(struct ibmvfc_host *vhost) struct 
ibmvfc_channel_setup_mad *mad; struct ibmvfc_channel_setup *setup_buf = vhost->channel_setup_buf; struct ibmvfc_event *evt = ibmvfc_get_event(vhost); + struct ibmvfc_scsi_channels *scrqs = >scsi_scrqs; + unsigned int num_channels = + min(vhost->client_scsi_channels, vhost->max_vios_scsi_channels); + int i; memset(setup_buf, 0, sizeof(*setup_buf)); - setup_buf->flags = cpu_to_be32(IBMVFC_CANCEL_CHANNELS); + if (num_channels == 0) + setup_buf->flags = cpu_to_be32(IBMVFC_CANCEL_CHANNELS); + else { + setup_buf->num_scsi_subq_channels = cpu_to_be32(num_channels); + for (i = 0; i < num_channels; i++) + setup_buf->channel_handles[i] = cpu_to_be64(scrqs->scrqs[i].cookie); + } ibmvfc_init_event(evt, ibmvfc_channel_setup_done, IBMVFC_MAD_FORMAT); mad = >iu.channel_setup; -- 2.27.0
[PATCH 12/13] ibmvfc: send commands down HW Sub-CRQ when channelized
When the client has negotiated the use of channels, all vfcFrames are
required to go down a Sub-CRQ channel or it is a protocol violation. If
the adapter state is channelized, submit vfcFrames to the appropriate
Sub-CRQ via the h_send_sub_crq() helper.

Signed-off-by: Tyrel Datwyler
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index f686c2cb0de2..897e3236534d 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -701,6 +701,15 @@ static int ibmvfc_send_crq(struct ibmvfc_host *vhost, u64 word1, u64 word2)
 	return plpar_hcall_norets(H_SEND_CRQ, vdev->unit_address, word1, word2);
 }
 
+static int ibmvfc_send_sub_crq(struct ibmvfc_host *vhost, u64 cookie, u64 word1,
+			       u64 word2, u64 word3, u64 word4)
+{
+	struct vio_dev *vdev = to_vio_dev(vhost->dev);
+
+	return plpar_hcall_norets(H_SEND_SUB_CRQ, vdev->unit_address, cookie,
+				  word1, word2, word3, word4);
+}
+
 /**
  * ibmvfc_send_crq_init - Send a CRQ init message
  * @vhost:	ibmvfc host struct
@@ -1524,8 +1533,17 @@ static int ibmvfc_send_event(struct ibmvfc_event *evt,
 
 	mb();
 
-	if ((rc = ibmvfc_send_crq(vhost, be64_to_cpu(crq_as_u64[0]),
-				  be64_to_cpu(crq_as_u64[1])))) {
+	if (vhost->using_channels && evt->crq.format == IBMVFC_CMD_FORMAT)
+		rc = ibmvfc_send_sub_crq(vhost,
+					 vhost->scsi_scrqs.scrqs[evt->hwq].vios_cookie,
+					 be64_to_cpu(crq_as_u64[0]),
+					 be64_to_cpu(crq_as_u64[1]),
+					 0, 0);
+	else
+		rc = ibmvfc_send_crq(vhost, be64_to_cpu(crq_as_u64[0]),
+				     be64_to_cpu(crq_as_u64[1]));
+
+	if (rc) {
 		list_del(&evt->queue);
 		del_timer(&evt->timer);
--
2.27.0
[PATCH 01/13] ibmvfc: add vhost fields and defaults for MQ enablement
Introduce several new vhost fields for managing MQ state of the adapter as well as initial defaults for MQ enablement. Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.c | 7 +++ drivers/scsi/ibmvscsi/ibmvfc.h | 9 + 2 files changed, 16 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 42e4d35e0d35..cd609d19e6a1 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -5167,6 +5167,7 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id) shost->max_sectors = IBMVFC_MAX_SECTORS; shost->max_cmd_len = IBMVFC_MAX_CDB_LEN; shost->unique_id = shost->host_no; + shost->nr_hw_queues = IBMVFC_SCSI_HW_QUEUES; vhost = shost_priv(shost); INIT_LIST_HEAD(>sent); @@ -5178,6 +5179,12 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id) vhost->partition_number = -1; vhost->log_level = log_level; vhost->task_set = 1; + + vhost->mq_enabled = IBMVFC_MQ; + vhost->client_scsi_channels = IBMVFC_SCSI_CHANNELS; + vhost->using_channels = 0; + vhost->do_enquiry = 1; + strcpy(vhost->partition_name, "UNKNOWN"); init_waitqueue_head(>work_wait_q); init_waitqueue_head(>init_wait_q); diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h index 9d58cfd774d3..8225bdbb127e 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.h +++ b/drivers/scsi/ibmvscsi/ibmvfc.h @@ -41,6 +41,11 @@ #define IBMVFC_DEFAULT_LOG_LEVEL 2 #define IBMVFC_MAX_CDB_LEN 16 #define IBMVFC_CLS3_ERROR 0 +#define IBMVFC_MQ 0 +#define IBMVFC_SCSI_CHANNELS 0 +#define IBMVFC_SCSI_HW_QUEUES 1 +#define IBMVFC_MIG_NO_SUB_TO_CRQ 0 +#define IBMVFC_MIG_NO_N_TO_M 0 /* * Ensure we have resources for ERP and initialization: @@ -826,6 +831,10 @@ struct ibmvfc_host { int delay_init; int scan_complete; int logged_in; + int mq_enabled; + int using_channels; + int do_enquiry; + int client_scsi_channels; int aborting_passthru; int events_to_log; #define IBMVFC_AE_LINKUP 0x0001 -- 2.27.0
[PATCH 05/13] ibmvfc: add Sub-CRQ IRQ enable/disable routine
Each Sub-CRQ has its own interrupt. A hypercall is required to toggle the IRQ state. Provide the necessary mechanism via a helper function. Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.c | 20 1 file changed, 20 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 571abdb48384..6eaedda4917a 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -3351,6 +3351,26 @@ static void ibmvfc_tasklet(void *data) spin_unlock_irqrestore(vhost->host->host_lock, flags); } +static int ibmvfc_toggle_scrq_irq(struct ibmvfc_sub_queue *scrq, int enable) +{ + struct device *dev = scrq->vhost->dev; + struct vio_dev *vdev = to_vio_dev(dev); + unsigned long rc; + int irq_action = H_ENABLE_VIO_INTERRUPT; + + if (!enable) + irq_action = H_DISABLE_VIO_INTERRUPT; + + rc = plpar_hcall_norets(H_VIOCTL, vdev->unit_address, irq_action, + scrq->hw_irq, 0, 0); + + if (rc) + dev_err(dev, "Couldn't %s sub-crq[%lu] irq. rc=%ld\n", + enable ? "enable" : "disable", scrq->hwq_id, rc); + + return rc; +} + /** * ibmvfc_init_tgt - Set the next init job step for the target * @tgt: ibmvfc target struct -- 2.27.0
[PATCH 09/13] ibmvfc: implement channel enquiry and setup commands
New NPIV_ENQUIRY_CHANNEL and NPIV_SETUP_CHANNEL management datagrams (MADs) were defined in a previous patchset. If the client advertises a desire to use channels and the partner VIOS is channel capable then the client must proceed with channel enquiry to determine the maximum number of channels the VIOS is capable of providing, and registering SubCRQs via channel setup with the VIOS immediately following NPIV Login. This handshaking should not be performed for subsequent NPIV Logins unless the CRQ connection has been reset. Implement these two new MADs and issue them following a successful NPIV login where the VIOS has set the SUPPORT_CHANNELS capability bit in the NPIV Login response. Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.c | 135 - drivers/scsi/ibmvscsi/ibmvfc.h | 3 + 2 files changed, 136 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 53db6da20923..40a945712bdb 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -804,6 +804,8 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost) spin_lock_irqsave(vhost->host->host_lock, flags); vhost->state = IBMVFC_NO_CRQ; vhost->logged_in = 0; + vhost->do_enquiry = 1; + vhost->using_channels = 0; /* Clean out the queue */ memset(crq->msgs, 0, PAGE_SIZE); @@ -4462,6 +4464,118 @@ static void ibmvfc_discover_targets(struct ibmvfc_host *vhost) ibmvfc_link_down(vhost, IBMVFC_LINK_DEAD); } +static void ibmvfc_channel_setup_done(struct ibmvfc_event *evt) +{ + struct ibmvfc_host *vhost = evt->vhost; + u32 mad_status = be16_to_cpu(evt->xfer_iu->channel_setup.common.status); + int level = IBMVFC_DEFAULT_LOG_LEVEL; + + ibmvfc_free_event(evt); + + switch (mad_status) { + case IBMVFC_MAD_SUCCESS: + ibmvfc_dbg(vhost, "Channel Setup succeded\n"); + vhost->do_enquiry = 0; + break; + case IBMVFC_MAD_FAILED: + level += ibmvfc_retry_host_init(vhost); + ibmvfc_log(vhost, level, "Channel Setup failed\n"); + fallthrough; + 
case IBMVFC_MAD_DRIVER_FAILED: + return; + default: + dev_err(vhost->dev, "Invalid Channel Setup response: 0x%x\n", + mad_status); + ibmvfc_link_down(vhost, IBMVFC_LINK_DEAD); + return; + } + + ibmvfc_set_host_action(vhost, IBMVFC_HOST_ACTION_QUERY); + wake_up(>work_wait_q); +} + +static void ibmvfc_channel_setup(struct ibmvfc_host *vhost) +{ + struct ibmvfc_channel_setup_mad *mad; + struct ibmvfc_channel_setup *setup_buf = vhost->channel_setup_buf; + struct ibmvfc_event *evt = ibmvfc_get_event(vhost); + + memset(setup_buf, 0, sizeof(*setup_buf)); + setup_buf->flags = cpu_to_be32(IBMVFC_CANCEL_CHANNELS); + + ibmvfc_init_event(evt, ibmvfc_channel_setup_done, IBMVFC_MAD_FORMAT); + mad = >iu.channel_setup; + memset(mad, 0, sizeof(*mad)); + mad->common.version = cpu_to_be32(1); + mad->common.opcode = cpu_to_be32(IBMVFC_CHANNEL_SETUP); + mad->common.length = cpu_to_be16(sizeof(*mad)); + mad->buffer.va = cpu_to_be64(vhost->channel_setup_dma); + mad->buffer.len = cpu_to_be32(sizeof(*vhost->channel_setup_buf)); + + ibmvfc_set_host_action(vhost, IBMVFC_HOST_ACTION_INIT_WAIT); + + if (!ibmvfc_send_event(evt, vhost, default_timeout)) + ibmvfc_dbg(vhost, "Sent channel setup\n"); + else + ibmvfc_link_down(vhost, IBMVFC_LINK_DOWN); +} + +static void ibmvfc_channel_enquiry_done(struct ibmvfc_event *evt) +{ + struct ibmvfc_host *vhost = evt->vhost; + struct ibmvfc_channel_enquiry *rsp = >xfer_iu->channel_enquiry; + u32 mad_status = be16_to_cpu(rsp->common.status); + int level = IBMVFC_DEFAULT_LOG_LEVEL; + + switch (mad_status) { + case IBMVFC_MAD_SUCCESS: + ibmvfc_dbg(vhost, "Channel Enquiry succeeded\n"); + vhost->max_vios_scsi_channels = be32_to_cpu(rsp->num_scsi_subq_channels); + break; + case IBMVFC_MAD_FAILED: + level += ibmvfc_retry_host_init(vhost); + ibmvfc_log(vhost, level, "Channel Enquiry failed\n"); + ibmvfc_free_event(evt); + fallthrough; + case IBMVFC_MAD_DRIVER_FAILED: + ibmvfc_free_event(evt); + return; + default: + dev_err(vhost->dev, "Invalid Channel Enquiry 
response: 0x%x\n", + mad_status); + ibmvfc_link_down(vhost, IBMVFC_LINK_DEAD); + ibmvfc_free_event(evt); + return; + } + + ibmvfc_channel_setup(vhost); +} + +static void ibmvfc_channel_enquiry(struct ibmvfc_host *vhost) +{ + struct ibmvfc_channel_enquiry
[PATCH 03/13] ibmvfc: add Subordinate CRQ definitions
Subordinate Command Response Queues (Sub CRQ) are used in conjunction with the primary CRQ when more than one queue is needed by the virtual IO adapter. Recent phyp firmware versions support Sub CRQ's with ibmvfc adapters. This feature is a prerequisite for supporting multiple hardware backed submission queues in the vfc adapter. The Sub CRQ command element differs from the standard CRQ in that it is 32bytes long as opposed to 16bytes for the latter. Despite this extra 16bytes the ibmvfc protocol will use the original CRQ command element mapped to the first 16bytes of the Sub CRQ element initially. Add definitions for the Sub CRQ command element and queue. Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.h | 23 +++ 1 file changed, 23 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h index 8225bdbb127e..084ecdfe51ea 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.h +++ b/drivers/scsi/ibmvscsi/ibmvfc.h @@ -656,6 +656,29 @@ struct ibmvfc_crq_queue { dma_addr_t msg_token; }; +struct ibmvfc_sub_crq { + struct ibmvfc_crq crq; + __be64 reserved[2]; +} __packed __aligned(8); + +struct ibmvfc_sub_queue { + struct ibmvfc_sub_crq *msgs; + dma_addr_t msg_token; + int size, cur; + struct ibmvfc_host *vhost; + unsigned long cookie; + unsigned long vios_cookie; + unsigned long hw_irq; + unsigned long irq; + unsigned long hwq_id; + char name[32]; +}; + +struct ibmvfc_scsi_channels { + struct ibmvfc_sub_queue *scrqs; + unsigned int active_queues; +}; + enum ibmvfc_ae_link_state { IBMVFC_AE_LS_LINK_UP= 0x01, IBMVFC_AE_LS_LINK_BOUNCED = 0x02, -- 2.27.0
[PATCH 08/13] ibmvfc: map/request irq and register Sub-CRQ interrupt handler
Create an irq mapping for the hw_irq number provided from phyp firmware. Request an irq to be assigned to our Sub-CRQ interrupt handler. Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 4fb782fa2c66..53db6da20923 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -5119,12 +5119,34 @@ static int ibmvfc_register_scsi_channel(struct ibmvfc_host *vhost, goto reg_failed; } + scrq->irq = irq_create_mapping(NULL, scrq->hw_irq); + + if (!scrq->irq) { + rc = -EINVAL; + dev_err(dev, "Error mapping sub-crq[%d] irq\n", index); + goto irq_failed; + } + + snprintf(scrq->name, sizeof(scrq->name), "ibmvfc-%x-scsi%d", +vdev->unit_address, index); + rc = request_irq(scrq->irq, ibmvfc_interrupt_scsi, 0, scrq->name, scrq); + + if (rc) { + dev_err(dev, "Couldn't register sub-crq[%d] irq\n", index); + irq_dispose_mapping(scrq->irq); + goto irq_failed; + } + + scrq->hwq_id = index; scrq->vhost = vhost; LEAVE; return 0; +irq_failed: + do { + rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address, scrq->cookie); + } while (rc == H_BUSY || H_IS_LONG_BUSY(rc)); reg_failed: dma_unmap_single(dev, scrq->msg_token, PAGE_SIZE, DMA_BIDIRECTIONAL); dma_map_failed: -- 2.27.0
[PATCH 04/13] ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ Channels
Allocate a set of Sub-CRQs in advance. During channel setup the client and VIOS negotiate the number of queues the VIOS supports and the number that the client desires to request. It's possible that the final channel resources allocated are fewer than requested, but the client is still responsible for sending handles for every queue it is hoping for. Also, provide deallocation cleanup routines. Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.c | 115 + drivers/scsi/ibmvscsi/ibmvfc.h | 1 + 2 files changed, 116 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 260b82e3cc01..571abdb48384 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -4983,6 +4983,114 @@ static int ibmvfc_init_crq(struct ibmvfc_host *vhost) return retrc; } +static int ibmvfc_register_scsi_channel(struct ibmvfc_host *vhost, + int index) +{ + struct device *dev = vhost->dev; + struct vio_dev *vdev = to_vio_dev(dev); + struct ibmvfc_sub_queue *scrq = &vhost->scsi_scrqs.scrqs[index]; + int rc = -ENOMEM; + + ENTER; + + scrq->msgs = (struct ibmvfc_sub_crq *)get_zeroed_page(GFP_KERNEL); + if (!scrq->msgs) + return rc; + + scrq->size = PAGE_SIZE / sizeof(*scrq->msgs); + scrq->msg_token = dma_map_single(dev, scrq->msgs, PAGE_SIZE, +DMA_BIDIRECTIONAL); + + if (dma_mapping_error(dev, scrq->msg_token)) + goto dma_map_failed; + + rc = h_reg_sub_crq(vdev->unit_address, scrq->msg_token, PAGE_SIZE, + &scrq->cookie, &scrq->hw_irq); + + if (rc) { + dev_warn(dev, "Error registering sub-crq: %d\n", rc); + dev_warn(dev, "Firmware may not support MQ\n"); + goto reg_failed; + } + + scrq->hwq_id = index; + scrq->vhost = vhost; + + LEAVE; + return 0; + +reg_failed: + dma_unmap_single(dev, scrq->msg_token, PAGE_SIZE, DMA_BIDIRECTIONAL); +dma_map_failed: + free_page((unsigned long)scrq->msgs); + LEAVE; + return rc; +} + +static void ibmvfc_deregister_scsi_channel(struct ibmvfc_host *vhost, int index) +{ + struct device *dev = vhost->dev; + struct 
vio_dev *vdev = to_vio_dev(dev); + struct ibmvfc_sub_queue *scrq = &vhost->scsi_scrqs.scrqs[index]; + long rc; + + ENTER; + + do { + rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address, + scrq->cookie); + } while (rc == H_BUSY || H_IS_LONG_BUSY(rc)); + + if (rc) + dev_err(dev, "Failed to free sub-crq[%d]: rc=%ld\n", index, rc); + + dma_unmap_single(dev, scrq->msg_token, PAGE_SIZE, DMA_BIDIRECTIONAL); + free_page((unsigned long)scrq->msgs); + LEAVE; +} + +static int ibmvfc_init_sub_crqs(struct ibmvfc_host *vhost) +{ + int i, j; + + ENTER; + + vhost->scsi_scrqs.scrqs = kcalloc(vhost->client_scsi_channels, + sizeof(*vhost->scsi_scrqs.scrqs), + GFP_KERNEL); + if (!vhost->scsi_scrqs.scrqs) + return -1; + + for (i = 0; i < vhost->client_scsi_channels; i++) { + if (ibmvfc_register_scsi_channel(vhost, i)) { + for (j = i; j > 0; j--) + ibmvfc_deregister_scsi_channel(vhost, j - 1); + kfree(vhost->scsi_scrqs.scrqs); + LEAVE; + return -1; + } + } + + LEAVE; + return 0; + } + +static void ibmvfc_release_sub_crqs(struct ibmvfc_host *vhost) +{ + int i; + + ENTER; + if (!vhost->scsi_scrqs.scrqs) + return; + + for (i = 0; i < vhost->client_scsi_channels; i++) + ibmvfc_deregister_scsi_channel(vhost, i); + + vhost->scsi_scrqs.active_queues = 0; + kfree(vhost->scsi_scrqs.scrqs); + LEAVE; +} + /** * ibmvfc_free_mem - Free memory for vhost * @vhost: ibmvfc host struct @@ -5239,6 +5347,12 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id) goto remove_shost; } + if (vhost->mq_enabled) { + rc = ibmvfc_init_sub_crqs(vhost); + if (rc) + dev_warn(dev, "Failed to allocate Sub-CRQs. rc=%d\n", rc); + } + if (shost_to_fc_host(shost)->rqst_q) blk_queue_max_segments(shost_to_fc_host(shost)->rqst_q, 1); dev_set_drvdata(dev, vhost); @@ -5296,6 +5410,7 @@ static int ibmvfc_remove(struct vio_dev *vdev) ibmvfc_purge_requests(vhost, DID_ERROR); spin_unlock_irqrestore(vhost->host->host_lock, flags); ibmvfc_free_event_pool(vhost); + ibmvfc_release_sub_crqs(vhost);
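The unwind loop in ibmvfc_init_sub_crqs() above — register queues in order, deregister in reverse on the first failure — can be modeled in userspace. register_one(), deregister_one(), and the fail_at knob below are hypothetical stand-ins, not driver functions:

```c
/* Userspace sketch of the partial-allocation rollback pattern: register
 * queues 0..n-1 in order and, on the first failure, deregister the ones
 * already registered, newest first, so no resource leaks past the error. */
#include <assert.h>

static int fail_at = -1;        /* test knob: index that fails, -1 = none */
static int registered[8];       /* tracks which queues are live */

static int register_one(int i)
{
	if (i == fail_at)
		return -1;
	registered[i] = 1;
	return 0;
}

static void deregister_one(int i)
{
	registered[i] = 0;
}

static int init_queues(int n)
{
	int i, j;

	for (i = 0; i < n; i++) {
		if (register_one(i)) {
			/* unwind everything registered so far */
			for (j = i; j > 0; j--)
				deregister_one(j - 1);
			return -1;
		}
	}
	return 0;
}
```

The `for (j = i; j > 0; j--)` shape mirrors the patch exactly: the failing index `i` was never registered, so the unwind starts at `i - 1`.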
[PATCH 02/13] ibmvfc: define hcall wrapper for registering a Sub-CRQ
Sub-CRQs are registered with firmware via a hypercall. Abstract that interface into a simpler helper function. Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index cd609d19e6a1..260b82e3cc01 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -138,6 +138,20 @@ static void ibmvfc_tgt_move_login(struct ibmvfc_target *); static const char *unknown_error = "unknown error"; +static long h_reg_sub_crq(unsigned long unit_address, unsigned long ioba, + unsigned long length, unsigned long *cookie, + unsigned long *irq) +{ + unsigned long retbuf[PLPAR_HCALL_BUFSIZE]; + long rc; + + rc = plpar_hcall(H_REG_SUB_CRQ, retbuf, unit_address, ioba, length); + *cookie = retbuf[0]; + *irq = retbuf[1]; + + return rc; +} + static int ibmvfc_check_caps(struct ibmvfc_host *vhost, unsigned long cap_flags) { u64 host_caps = be64_to_cpu(vhost->login_buf->resp.capabilities); -- 2.27.0
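The wrapper pattern is easy to see in isolation: plpar_hcall() hands back a status code plus a fixed-size return buffer, and the helper unpacks just the two values callers care about. fake_plpar_hcall() below is a stand-in for the real hypervisor call, with made-up return values:

```c
/* Userspace sketch of the hcall-wrapper pattern used by h_reg_sub_crq():
 * hide the raw return buffer behind named out-parameters. */
#include <assert.h>

#define PLPAR_HCALL_BUFSIZE 4

/* Stand-in for the hypervisor call; values are arbitrary for the sketch. */
static long fake_plpar_hcall(unsigned long retbuf[PLPAR_HCALL_BUFSIZE],
			     unsigned long unit_address, unsigned long ioba,
			     unsigned long length)
{
	(void)unit_address; (void)ioba; (void)length;
	retbuf[0] = 0xc0ffee;   /* queue cookie */
	retbuf[1] = 42;         /* hardware irq number */
	return 0;               /* success */
}

static long h_reg_sub_crq(unsigned long unit_address, unsigned long ioba,
			  unsigned long length, unsigned long *cookie,
			  unsigned long *irq)
{
	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
	long rc;

	rc = fake_plpar_hcall(retbuf, unit_address, ioba, length);
	*cookie = retbuf[0];
	*irq = retbuf[1];
	return rc;
}
```

Callers get a plain status code and two named values instead of having to know which retbuf slot holds what.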
[PATCH 00/13] ibmvfc: initial MQ development
Recent updates in pHyp Firmware and VIOS releases provide new infrastructure towards enabling Subordinate Command Response Queues (Sub-CRQs) such that each Sub-CRQ is a channel backed by an actual hardware queue in the FC stack on the partner VIOS. Sub-CRQs are registered with the firmware via hypercalls and then negotiated with the VIOS via new Management Datagrams (MADs) for channel setup. This initial implementation adds the necessary Sub-CRQ framework and implements the new MADs for negotiating and assigning a set of Sub-CRQs to associated VIOS HW backed channels. The event pool and locking still leverage the legacy single queue implementation, and as such lock contention is problematic when increasing the number of queues. However, this initial work demonstrates a 1.2x increase in IOPs when configured with two HW queues despite lock contention. Tyrel Datwyler (13): ibmvfc: add vhost fields and defaults for MQ enablement ibmvfc: define hcall wrapper for registering a Sub-CRQ ibmvfc: add Subordinate CRQ definitions ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ Channels ibmvfc: add Sub-CRQ IRQ enable/disable routine ibmvfc: add handlers to drain and complete Sub-CRQ responses ibmvfc: define Sub-CRQ interrupt handler routine ibmvfc: map/request irq and register Sub-CRQ interrupt handler ibmvfc: implement channel enquiry and setup commands ibmvfc: advertise client support for using hardware channels ibmvfc: set and track hw queue in ibmvfc_event struct ibmvfc: send commands down HW Sub-CRQ when channelized ibmvfc: register Sub-CRQ handles with VIOS during channel setup drivers/scsi/ibmvscsi/ibmvfc.c | 460 - drivers/scsi/ibmvscsi/ibmvfc.h | 37 +++ 2 files changed, 493 insertions(+), 4 deletions(-) -- 2.27.0
[PATCH 07/13] ibmvfc: define Sub-CRQ interrupt handler routine
Simple handler that calls the Sub-CRQ drain routine directly. Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index a8730522920e..4fb782fa2c66 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -3443,6 +3443,20 @@ static void ibmvfc_drain_sub_crq(struct ibmvfc_sub_queue *scrq) } } +static irqreturn_t ibmvfc_interrupt_scsi(int irq, void *scrq_instance) +{ + struct ibmvfc_sub_queue *scrq = (struct ibmvfc_sub_queue *)scrq_instance; + struct ibmvfc_host *vhost = scrq->vhost; + unsigned long flags; + + spin_lock_irqsave(vhost->host->host_lock, flags); + ibmvfc_toggle_scrq_irq(scrq, 0); + ibmvfc_drain_sub_crq(scrq); + spin_unlock_irqrestore(vhost->host->host_lock, flags); + + return IRQ_HANDLED; +} + /** * ibmvfc_init_tgt - Set the next init job step for the target * @tgt: ibmvfc target struct -- 2.27.0
[PATCH 10/13] ibmvfc: advertise client support for using hardware channels
Previous patches have plumbed the necessary Sub-CRQ interface and channel negotiation MADs to support fully channelized hardware queues. Advertise client support via the NPIV Login capability IBMVFC_CAN_USE_CHANNELS when the client has MQ enabled via vhost->mq_enabled, or when channels were already in use during a subsequent NPIV Login. The latter is required because channel support is only renegotiated after a CRQ pair is broken. Simple NPIV Logout/Logins require the client to continue to advertise the channel capability until the CRQ pair between the client and the VIOS is broken. Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 40a945712bdb..55893d09f883 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -1272,6 +1272,10 @@ static void ibmvfc_set_login_info(struct ibmvfc_host *vhost) login_info->max_cmds = cpu_to_be32(max_requests + IBMVFC_NUM_INTERNAL_REQ); login_info->capabilities = cpu_to_be64(IBMVFC_CAN_MIGRATE | IBMVFC_CAN_SEND_VF_WWPN); + + if (vhost->mq_enabled || vhost->using_channels) + login_info->capabilities |= cpu_to_be64(IBMVFC_CAN_USE_CHANNELS); + login_info->async.va = cpu_to_be64(vhost->async_crq.msg_token); login_info->async.len = cpu_to_be32(vhost->async_crq.size * sizeof(*vhost->async_crq.msgs)); strncpy(login_info->partition_name, vhost->partition_name, IBMVFC_MAX_NAME); -- 2.27.0
[PATCH 06/13] ibmvfc: add handlers to drain and complete Sub-CRQ responses
The logic for iterating over the Sub-CRQ responses is similar to that of the primary CRQ. Add the necessary handlers for processing those responses. Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.c | 72 ++ 1 file changed, 72 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 6eaedda4917a..a8730522920e 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -3371,6 +3371,78 @@ static int ibmvfc_toggle_scrq_irq(struct ibmvfc_sub_queue *scrq, int enable) return rc; } +static void ibmvfc_handle_scrq(struct ibmvfc_crq *crq, struct ibmvfc_host *vhost) +{ + struct ibmvfc_event *evt = (struct ibmvfc_event *)be64_to_cpu(crq->ioba); + + switch (crq->valid) { + case IBMVFC_CRQ_CMD_RSP: + break; + default: + dev_err(vhost->dev, "Got an invalid message type 0x%02x\n", crq->valid); + return; + } + + /* The only kind of payload CRQs we should get are responses to +* things we send. Make sure this response is to something we +* actually sent +*/ + if (unlikely(!ibmvfc_valid_event(&vhost->pool, evt))) { + dev_err(vhost->dev, "Returned correlation_token 0x%08llx is invalid!\n", + crq->ioba); + return; + } + + if (unlikely(atomic_read(&evt->free))) { + dev_err(vhost->dev, "Received duplicate correlation_token 0x%08llx!\n", + crq->ioba); + return; + } + + del_timer(&evt->timer); + list_del(&evt->queue); + ibmvfc_trc_end(evt); + evt->done(evt); +} + +static struct ibmvfc_crq *ibmvfc_next_scrq(struct ibmvfc_sub_queue *scrq) +{ + struct ibmvfc_crq *crq; + + crq = &scrq->msgs[scrq->cur].crq; + if (crq->valid & 0x80) { + if (++scrq->cur == scrq->size) + scrq->cur = 0; + rmb(); + } else + crq = NULL; + + return crq; +} + +static void ibmvfc_drain_sub_crq(struct ibmvfc_sub_queue *scrq) +{ + struct ibmvfc_crq *crq; + int done = 0; + + while (!done) { + while ((crq = ibmvfc_next_scrq(scrq)) != NULL) { + ibmvfc_handle_scrq(crq, scrq->vhost); + crq->valid = 0; + wmb(); + } + + ibmvfc_toggle_scrq_irq(scrq, 1); + if ((crq = 
ibmvfc_next_scrq(scrq)) != NULL) { + ibmvfc_toggle_scrq_irq(scrq, 0); + ibmvfc_handle_scrq(crq, scrq->vhost); + crq->valid = 0; + wmb(); + } else + done = 1; + } +} + /** * ibmvfc_init_tgt - Set the next init job step for the target * @tgt: ibmvfc target struct -- 2.27.0
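The drain loop above follows a classic lost-interrupt avoidance pattern: process every valid entry, re-enable the interrupt, then poll once more in case an entry landed in the window between the last check and the re-enable. A userspace model of just that control flow (a plain array stands in for the DMA-mapped Sub-CRQ page; no memory barriers are modeled):

```c
/* Userspace model of the ibmvfc_drain_sub_crq() control flow: drain,
 * re-arm the interrupt, and re-check once so a late entry is not left
 * unprocessed until the next interrupt fires. */
#include <assert.h>
#include <stddef.h>

#define QSZ 4
static int msgs[QSZ];       /* nonzero = valid entry (stand-in for crq->valid) */
static int cur;             /* consumer cursor */
static int irq_enabled;
static int handled;

static int *next_msg(void)
{
	int *m = &msgs[cur];

	if (!*m)
		return NULL;
	cur = (cur + 1) % QSZ;
	return m;
}

static void drain(void)
{
	int *m;
	int done = 0;

	while (!done) {
		while ((m = next_msg()) != NULL) {
			handled++;
			*m = 0;             /* mark entry consumed */
		}
		irq_enabled = 1;            /* re-arm the interrupt */
		m = next_msg();
		if (m) {                    /* raced: entry arrived late */
			irq_enabled = 0;
			handled++;
			*m = 0;
		} else {
			done = 1;
		}
	}
}
```

Without the re-check after re-arming, an entry posted in that window would only be noticed on the next interrupt, which may never come if the device considers it already signaled.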
[PATCH net v3 1/9] ibmvnic: handle inconsistent login with reset
Inconsistent login with the vnicserver is causing the device to be removed. This does not give the device a chance to recover from error state. This patch schedules a FATAL reset instead to bring the adapter up. Fixes: 032c5e82847a2 ("Driver for IBM System i/p VNIC protocol") Signed-off-by: Dany Madden Signed-off-by: Lijun Pan --- drivers/net/ethernet/ibm/ibmvnic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 2aa40b2f225c..dcb23015b6b4 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -4412,7 +4412,7 @@ static int handle_login_rsp(union ibmvnic_crq *login_rsp_crq, adapter->req_rx_add_queues != be32_to_cpu(login_rsp->num_rxadd_subcrqs))) { dev_err(dev, "FATAL: Inconsistent login and login rsp\n"); - ibmvnic_remove(adapter->vdev); + ibmvnic_reset(adapter, VNIC_RESET_FATAL); return -EIO; } size_array = (u64 *)((u8 *)(adapter->login_rsp_buf) + -- 2.26.2
[PATCH net v3 9/9] ibmvnic: reduce wait for completion time
Reduce the wait time for Command Response Queue response from 30 seconds to 20 seconds, as recommended by VIOS and Power Hypervisor teams. Fixes: bd0b672313941 ("ibmvnic: Move login and queue negotiation into ibmvnic_open") Fixes: 53da09e92910f ("ibmvnic: Add set_link_state routine for setting adapter link state") Signed-off-by: Dany Madden --- drivers/net/ethernet/ibm/ibmvnic.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index a17856be2828..d6b2686aed0f 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -834,7 +834,7 @@ static void release_napi(struct ibmvnic_adapter *adapter) static int ibmvnic_login(struct net_device *netdev) { struct ibmvnic_adapter *adapter = netdev_priv(netdev); - unsigned long timeout = msecs_to_jiffies(30000); + unsigned long timeout = msecs_to_jiffies(20000); int retry_count = 0; int retries = 10; bool retry; @@ -938,7 +938,7 @@ static void release_resources(struct ibmvnic_adapter *adapter) static int set_link_state(struct ibmvnic_adapter *adapter, u8 link_state) { struct net_device *netdev = adapter->netdev; - unsigned long timeout = msecs_to_jiffies(30000); + unsigned long timeout = msecs_to_jiffies(20000); union ibmvnic_crq crq; bool resend; int rc; @@ -5125,7 +5125,7 @@ static int init_crq_queue(struct ibmvnic_adapter *adapter) static int ibmvnic_reset_init(struct ibmvnic_adapter *adapter, bool reset) { struct device *dev = &adapter->vdev->dev; - unsigned long timeout = msecs_to_jiffies(30000); + unsigned long timeout = msecs_to_jiffies(20000); u64 old_num_rx_queues, old_num_tx_queues; int rc; -- 2.26.2
[PATCH net v3 7/9] ibmvnic: send_login should check for crq errors
send_login() does not check for the result of ibmvnic_send_crq() of the login request. This results in the driver needlessly retrying the login 10 times even when CRQ is no longer active. Check the return code and give up in case of errors in sending the CRQ. The only time we want to retry is if we get a PARTIALSUCCESS response from the partner. Fixes: 032c5e82847a2 ("Driver for IBM System i/p VNIC protocol") Signed-off-by: Dany Madden Signed-off-by: Sukadev Bhattiprolu --- drivers/net/ethernet/ibm/ibmvnic.c | 18 -- 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 55b07bd4c741..9005fab09e15 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -850,10 +850,8 @@ static int ibmvnic_login(struct net_device *netdev) adapter->init_done_rc = 0; reinit_completion(&adapter->init_done); rc = send_login(adapter); - if (rc) { - netdev_warn(netdev, "Unable to login\n"); + if (rc) return rc; - } if (!wait_for_completion_timeout(&adapter->init_done, timeout)) { @@ -3727,15 +3725,16 @@ static int send_login(struct ibmvnic_adapter *adapter) struct ibmvnic_login_rsp_buffer *login_rsp_buffer; struct ibmvnic_login_buffer *login_buffer; struct device *dev = &adapter->vdev->dev; + struct vnic_login_client_data *vlcd; dma_addr_t rsp_buffer_token; dma_addr_t buffer_token; size_t rsp_buffer_size; union ibmvnic_crq crq; + int client_data_len; size_t buffer_size; __be64 *tx_list_p; __be64 *rx_list_p; - int client_data_len; - struct vnic_login_client_data *vlcd; + int rc; int i; if (!adapter->tx_scrq || !adapter->rx_scrq) { @@ -3841,16 +3840,23 @@ static int send_login(struct ibmvnic_adapter *adapter) crq.login.len = cpu_to_be32(buffer_size); adapter->login_pending = true; - ibmvnic_send_crq(adapter, &crq); + rc = ibmvnic_send_crq(adapter, &crq); + if (rc) { + adapter->login_pending = false; + netdev_err(adapter->netdev, "Failed to send login, rc=%d\n", rc); + goto buf_rsp_map_failed; + } return 0; 
buf_rsp_map_failed: kfree(login_rsp_buffer); + adapter->login_rsp_buf = NULL; buf_rsp_alloc_failed: dma_unmap_single(dev, buffer_token, buffer_size, DMA_TO_DEVICE); buf_map_failed: kfree(login_buffer); + adapter->login_buf = NULL; buf_alloc_failed: return -1; } -- 2.26.2
[PATCH net v3 8/9] ibmvnic: no reset timeout for 5 seconds after reset
Reset timeout is going off right after adapter reset. This patch ensures that a timeout reset is only scheduled if at least 5 seconds have passed since the last reset. 5 seconds is the default watchdog timeout. Fixes: ed651a10875f1 ("ibmvnic: Updated reset handling") Signed-off-by: Dany Madden --- drivers/net/ethernet/ibm/ibmvnic.c | 11 +-- drivers/net/ethernet/ibm/ibmvnic.h | 2 ++ 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 9005fab09e15..a17856be2828 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -2253,6 +2253,7 @@ static void __ibmvnic_reset(struct work_struct *work) rc = do_reset(adapter, rwi, reset_state); } kfree(rwi); + adapter->last_reset_time = jiffies; if (rc) netdev_dbg(adapter->netdev, "Reset failed, rc=%d\n", rc); @@ -2356,7 +2357,13 @@ static void ibmvnic_tx_timeout(struct net_device *dev, unsigned int txqueue) "Adapter is resetting, skip timeout reset\n"); return; } - + /* No queuing up reset until at least 5 seconds (default watchdog val) +* after last reset +*/ + if (time_before(jiffies, (adapter->last_reset_time + dev->watchdog_timeo))) { + netdev_dbg(dev, "Not yet time to tx timeout.\n"); + return; + } ibmvnic_reset(adapter, VNIC_RESET_TIMEOUT); } @@ -5277,7 +5284,7 @@ static int ibmvnic_probe(struct vio_dev *dev, const struct vio_device_id *id) adapter->state = VNIC_PROBED; adapter->wait_for_reset = false; - + adapter->last_reset_time = jiffies; return 0; ibmvnic_register_fail: diff --git a/drivers/net/ethernet/ibm/ibmvnic.h b/drivers/net/ethernet/ibm/ibmvnic.h index 6f0a701c4a38..b21092f5f9c1 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.h +++ b/drivers/net/ethernet/ibm/ibmvnic.h @@ -1088,6 +1088,8 @@ struct ibmvnic_adapter { unsigned long resetting; bool napi_enabled, from_passive_init; bool login_pending; + /* last device reset time */ + unsigned long last_reset_time; bool failover_pending; bool force_reset_recovery; -- 2.26.2
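The gating added to ibmvnic_tx_timeout() relies on the kernel's wrap-safe jiffies comparison. A minimal sketch of the check, with time_before() reimplemented the way include/linux/jiffies.h defines it (signed subtraction, so the comparison survives counter wraparound):

```c
/* Sketch of the tx-timeout rate limit: skip scheduling a reset if fewer
 * than watchdog_timeo ticks have passed since the last reset. */
#include <assert.h>

/* Wrap-safe "a is before b" comparison, as the kernel defines it. */
static inline int time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Returns 1 if a timeout reset should be scheduled now. */
static int should_reset(unsigned long now, unsigned long last_reset_time,
			unsigned long watchdog_timeo)
{
	return !time_before(now, last_reset_time + watchdog_timeo);
}
```

The signed cast is what makes this work across wraparound: a naive `now < last + timeo` comparison would misfire when the tick counter overflows, whereas `(long)(a - b)` stays correct as long as the two timestamps are within half the counter range of each other.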
[PATCH net v3 6/9] ibmvnic: track pending login
From: Sukadev Bhattiprolu If after ibmvnic sends a LOGIN it gets a FAILOVER, it is possible that the worker thread will start the reset process and free the login response buffer before it gets a (now stale) LOGIN_RSP. The ibmvnic tasklet will then try to access the login response buffer and crash. Have ibmvnic track pending logins and discard any stale login responses. Fixes: 032c5e82847a ("Driver for IBM System i/p VNIC protocol") Signed-off-by: Sukadev Bhattiprolu --- drivers/net/ethernet/ibm/ibmvnic.c | 17 + drivers/net/ethernet/ibm/ibmvnic.h | 1 + 2 files changed, 18 insertions(+) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index e2f9b0e9dea8..55b07bd4c741 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -3839,6 +3839,8 @@ static int send_login(struct ibmvnic_adapter *adapter) crq.login.cmd = LOGIN; crq.login.ioba = cpu_to_be32(buffer_token); crq.login.len = cpu_to_be32(buffer_size); + + adapter->login_pending = true; ibmvnic_send_crq(adapter, &crq); return 0; @@ -4391,6 +4393,15 @@ static int handle_login_rsp(union ibmvnic_crq *login_rsp_crq, u64 *size_array; int i; + /* CHECK: Test/set of login_pending does not need to be atomic +* because only ibmvnic_tasklet tests/clears this. +*/ + if (!adapter->login_pending) { + netdev_warn(netdev, "Ignoring unexpected login response\n"); + return 0; + } + adapter->login_pending = false; + dma_unmap_single(dev, adapter->login_buf_token, adapter->login_buf_sz, DMA_TO_DEVICE); dma_unmap_single(dev, adapter->login_rsp_buf_token, @@ -4762,6 +4773,11 @@ static void ibmvnic_handle_crq(union ibmvnic_crq *crq, case IBMVNIC_CRQ_INIT: dev_info(dev, "Partner initialized\n"); adapter->from_passive_init = true; + /* Discard any stale login responses from prev reset. +* CHECK: should we clear even on INIT_COMPLETE? 
+*/ + adapter->login_pending = false; + + if (!completion_done(&adapter->init_done)) { + complete(&adapter->init_done); + adapter->init_done_rc = -EIO; + } @@ -5191,6 +5207,7 @@ static int ibmvnic_probe(struct vio_dev *dev, const struct vio_device_id *id) dev_set_drvdata(&dev->dev, netdev); adapter->vdev = dev; adapter->netdev = netdev; + adapter->login_pending = false; ether_addr_copy(adapter->mac_addr, mac_addr_p); ether_addr_copy(netdev->dev_addr, adapter->mac_addr); diff --git a/drivers/net/ethernet/ibm/ibmvnic.h b/drivers/net/ethernet/ibm/ibmvnic.h index 217dcc7ded70..6f0a701c4a38 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.h +++ b/drivers/net/ethernet/ibm/ibmvnic.h @@ -1087,6 +1087,7 @@ struct ibmvnic_adapter { struct delayed_work ibmvnic_delayed_reset; unsigned long resetting; bool napi_enabled, from_passive_init; + bool login_pending; bool failover_pending; bool force_reset_recovery; -- 2.26.2
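The login_pending handshake reduces to a tiny state machine: a response only counts if a login is actually outstanding, and a partner re-init clears the flag so a stale response from before the reset is dropped. A sketch with plain ints standing in for the adapter fields (no locking modeled, matching the single-tasklet assumption in the patch's comment):

```c
/* Model of the pending-flag pattern from this patch: discard responses
 * that arrive when no request is outstanding, e.g. after a reset. */
#include <assert.h>

static int login_pending;
static int logins_processed;

static void send_login(void)     { login_pending = 1; }
static void partner_reinit(void) { login_pending = 0; } /* CRQ INIT seen */

/* Returns 1 if the response was processed, 0 if discarded as stale. */
static int handle_login_rsp(void)
{
	if (!login_pending)
		return 0;           /* stale or duplicate: discard */
	login_pending = 0;
	logins_processed++;
	return 1;
}
```

The key property: a LOGIN_RSP arriving after a FAILOVER-triggered re-init is ignored instead of dereferencing a freed response buffer, and a duplicate response is likewise a no-op because the flag was already cleared.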
[PATCH net v3 2/9] ibmvnic: stop free_all_rwi on failed reset
When ibmvnic fails to reset, it breaks out of the reset loop and frees all of the remaining resets from the workqueue. Doing so prevents the adapter from recovering if no reset is scheduled after that. Instead, have the driver continue to process resets on the workqueue. Remove the no longer needed free_all_rwi(). Fixes: ed651a10875f1 ("ibmvnic: Updated reset handling") Signed-off-by: Dany Madden --- drivers/net/ethernet/ibm/ibmvnic.c | 22 +++--- 1 file changed, 3 insertions(+), 19 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index dcb23015b6b4..d5a927bb4954 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -2173,17 +2173,6 @@ static struct ibmvnic_rwi *get_next_rwi(struct ibmvnic_adapter *adapter) return rwi; } -static void free_all_rwi(struct ibmvnic_adapter *adapter) -{ - struct ibmvnic_rwi *rwi; - - rwi = get_next_rwi(adapter); - while (rwi) { - kfree(rwi); - rwi = get_next_rwi(adapter); - } -} - static void __ibmvnic_reset(struct work_struct *work) { struct ibmvnic_rwi *rwi; @@ -2253,9 +2242,9 @@ static void __ibmvnic_reset(struct work_struct *work) else adapter->state = reset_state; rc = 0; - } else if (rc && rc != IBMVNIC_INIT_FAILED && - !adapter->force_reset_recovery) - break; + } + if (rc) + netdev_dbg(adapter->netdev, "Reset failed, rc=%d\n", rc); rwi = get_next_rwi(adapter); @@ -2269,11 +2258,6 @@ static void __ibmvnic_reset(struct work_struct *work) complete(&adapter->reset_done); } - if (rc) { - netdev_dbg(adapter->netdev, "Reset failed\n"); - free_all_rwi(adapter); - } - clear_bit_unlock(0, &adapter->resetting); } -- 2.26.2
[PATCH net v3 3/9] ibmvnic: avoid memset null scrq msgs
scrq->msgs could be NULL during device reset, causing Linux to crash. So, check before memset scrq->msgs. Fixes: c8b2ad0a4a901 ("ibmvnic: Sanitize entire SCRQ buffer on reset") Signed-off-by: Dany Madden Signed-off-by: Lijun Pan --- drivers/net/ethernet/ibm/ibmvnic.c | 19 +++ 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index d5a927bb4954..b08f95017825 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -2845,15 +2845,26 @@ static int reset_one_sub_crq_queue(struct ibmvnic_adapter *adapter, { int rc; + if (!scrq) { + netdev_dbg(adapter->netdev, "Invalid scrq reset\n"); + return -EINVAL; + } + if (scrq->irq) { free_irq(scrq->irq, scrq); irq_dispose_mapping(scrq->irq); scrq->irq = 0; } - - memset(scrq->msgs, 0, 4 * PAGE_SIZE); - atomic_set(&scrq->used, 0); - scrq->cur = 0; + if (scrq->msgs) { + memset(scrq->msgs, 0, 4 * PAGE_SIZE); + atomic_set(&scrq->used, 0); + scrq->cur = 0; + } else { + netdev_dbg(adapter->netdev, "Invalid scrq reset\n"); + return -EINVAL; + } rc = h_reg_sub_crq(adapter->vdev->unit_address, scrq->msg_token, 4 * PAGE_SIZE, &scrq->crq_num, &scrq->hw_irq); -- 2.26.2
[PATCH net v3 5/9] ibmvnic: delay next reset if hard reset fails
From: Sukadev Bhattiprolu If auto-priority failover is enabled, the backing device needs time to settle if hard resetting fails for any reason. Add a delay of 60 seconds before retrying the hard-reset. Fixes: 2770a7984db5 ("ibmvnic: Introduce hard reset recovery") Signed-off-by: Sukadev Bhattiprolu --- drivers/net/ethernet/ibm/ibmvnic.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index ff474a790181..e2f9b0e9dea8 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -2242,6 +2242,14 @@ static void __ibmvnic_reset(struct work_struct *work) rc = do_hard_reset(adapter, rwi, reset_state); rtnl_unlock(); } + if (rc) { + /* give backing device time to settle down */ + netdev_dbg(adapter->netdev, + "[S:%d] Hard reset failed, waiting 60 secs\n", + adapter->state); + set_current_state(TASK_UNINTERRUPTIBLE); + schedule_timeout(60 * HZ); + } } else if (!(rwi->reset_reason == VNIC_RESET_FATAL && adapter->from_passive_init)) { rc = do_reset(adapter, rwi, reset_state); -- 2.26.2
[PATCH net v3 4/9] ibmvnic: restore adapter state on failed reset
In a failed reset, driver could end up in VNIC_PROBED or VNIC_CLOSED state and cannot recover in subsequent resets, leaving it offline. This patch restores the adapter state to reset_state, the original state when reset was called. Fixes: b27507bb59ed5 ("net/ibmvnic: unlock rtnl_lock in reset so linkwatch_event can run") Fixes: 2770a7984db58 ("ibmvnic: Introduce hard reset recovery") Signed-off-by: Dany Madden --- drivers/net/ethernet/ibm/ibmvnic.c | 67 -- 1 file changed, 36 insertions(+), 31 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index b08f95017825..ff474a790181 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -1857,7 +1857,7 @@ static int do_change_param_reset(struct ibmvnic_adapter *adapter, if (reset_state == VNIC_OPEN) { rc = __ibmvnic_close(netdev); if (rc) - return rc; + goto out; } release_resources(adapter); @@ -1875,24 +1875,25 @@ static int do_change_param_reset(struct ibmvnic_adapter *adapter, } rc = ibmvnic_reset_init(adapter, true); - if (rc) - return IBMVNIC_INIT_FAILED; + if (rc) { + rc = IBMVNIC_INIT_FAILED; + goto out; + } /* If the adapter was in PROBE state prior to the reset, * exit here. 
*/ if (reset_state == VNIC_PROBED) - return 0; + goto out; rc = ibmvnic_login(netdev); if (rc) { - adapter->state = reset_state; - return rc; + goto out; } rc = init_resources(adapter); if (rc) - return rc; + goto out; ibmvnic_disable_irqs(adapter); @@ -1902,8 +1903,10 @@ static int do_change_param_reset(struct ibmvnic_adapter *adapter, return 0; rc = __ibmvnic_open(netdev); - if (rc) - return IBMVNIC_OPEN_FAILED; + if (rc) { + rc = IBMVNIC_OPEN_FAILED; + goto out; + } /* refresh device's multicast list */ ibmvnic_set_multi(netdev); @@ -1912,7 +1915,10 @@ static int do_change_param_reset(struct ibmvnic_adapter *adapter, for (i = 0; i < adapter->req_rx_queues; i++) napi_schedule(>napi[i]); - return 0; +out: + if (rc) + adapter->state = reset_state; + return rc; } /** @@ -2015,7 +2021,6 @@ static int do_reset(struct ibmvnic_adapter *adapter, rc = ibmvnic_login(netdev); if (rc) { - adapter->state = reset_state; goto out; } @@ -2083,6 +2088,9 @@ static int do_reset(struct ibmvnic_adapter *adapter, rc = 0; out: + /* restore the adapter state if reset failed */ + if (rc) + adapter->state = reset_state; rtnl_unlock(); return rc; @@ -2115,43 +2123,46 @@ static int do_hard_reset(struct ibmvnic_adapter *adapter, if (rc) { netdev_err(adapter->netdev, "Couldn't initialize crq. rc=%d\n", rc); - return rc; + goto out; } rc = ibmvnic_reset_init(adapter, false); if (rc) - return rc; + goto out; /* If the adapter was in PROBE state prior to the reset, * exit here. 
*/ if (reset_state == VNIC_PROBED) - return 0; + goto out; rc = ibmvnic_login(netdev); - if (rc) { - adapter->state = VNIC_PROBED; - return 0; - } + if (rc) + goto out; rc = init_resources(adapter); if (rc) - return rc; + goto out; ibmvnic_disable_irqs(adapter); adapter->state = VNIC_CLOSED; if (reset_state == VNIC_CLOSED) - return 0; + goto out; rc = __ibmvnic_open(netdev); - if (rc) - return IBMVNIC_OPEN_FAILED; + if (rc) { + rc = IBMVNIC_OPEN_FAILED; + goto out; + } call_netdevice_notifiers(NETDEV_NOTIFY_PEERS, netdev); call_netdevice_notifiers(NETDEV_RESEND_IGMP, netdev); - - return 0; +out: + /* restore adapter state if reset failed */ + if (rc) + adapter->state = reset_state; + return rc; } static struct ibmvnic_rwi *get_next_rwi(struct ibmvnic_adapter *adapter) @@ -2236,13 +2247,7 @@ static void __ibmvnic_reset(struct work_struct *work) rc = do_reset(adapter, rwi, reset_state); } kfree(rwi); - if (rc == IBMVNIC_OPEN_FAILED) { - if (list_empty(&adapter->rwi_list)) -
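The shape this patch converges on — a single out: label that restores the state saved at entry, instead of scattered early returns each leaving the adapter wherever it happened to be — can be sketched standalone. do_login() and the fail_login knob are hypothetical stand-ins for the driver's real steps:

```c
/* Sketch of the error-path pattern: every failure funnels through one
 * "out:" label that restores the state captured before the reset began. */
#include <assert.h>

enum state { PROBED, CLOSED, OPEN };

static enum state adapter_state;
static int fail_login;          /* test knob: force the login step to fail */

static int do_login(void) { return fail_login ? -1 : 0; }

static int do_reset(enum state reset_state)
{
	int rc;

	adapter_state = PROBED;     /* reset drops to a low-level state */
	rc = do_login();
	if (rc)
		goto out;
	adapter_state = OPEN;
out:
	if (rc)                     /* restore state if reset failed */
		adapter_state = reset_state;
	return rc;
}
```

The benefit over per-site restores is that adding a new failing step can't forget the restore: any `goto out` with a nonzero rc gets it for free.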
[PATCH net v3 0/9] ibmvnic: assorted bug fixes
Assorted fixes for ibmvnic originated from "[PATCH net 00/15] ibmvnic: assorted bug fixes" sent by Lijun Pan. v3 Changes as suggested by Jakub Kicinski: - Add a space between variable declaration and code in patch 3/9. Checkpatch does not catch this. - Unwrapped Fixes lines in patch 9/9. - Removed all extra lines between Fixes and Signed-off-by lines in all patches. v2 Changes as suggested by Jakub Kicinski: - Added "Fixes" to each patch. - Remove "ibmvnic: process HMC disable command" from the series. Submitting it separately to net-next. - Squash v1 "ibmvnic: remove free_all_rwi function" into ibmvnic: stop free_all_rwi on failed reset. Dany Madden (7): ibmvnic: handle inconsistent login with reset ibmvnic: stop free_all_rwi on failed reset ibmvnic: avoid memset null scrq msgs ibmvnic: restore adapter state on failed reset ibmvnic: send_login should check for crq errors ibmvnic: no reset timeout for 5 seconds after reset ibmvnic: reduce wait for completion time Sukadev Bhattiprolu (2): ibmvnic: delay next reset if hard reset fails ibmvnic: track pending login drivers/net/ethernet/ibm/ibmvnic.c | 168 ++--- drivers/net/ethernet/ibm/ibmvnic.h | 3 + 2 files changed, 106 insertions(+), 65 deletions(-) -- 2.26.2
[powerpc:next] BUILD SUCCESS 0bd4b96d99108b7ea9bac0573957483be7781d70
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
branch HEAD: 0bd4b96d99108b7ea9bac0573957483be7781d70  powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations

elapsed time: 960m

configs tested: 130
configs skipped: 3

The following configs have been built successfully. More configs may be tested in the coming days.

gcc tested configs:
arm defconfig, arm64 allyesconfig, arm64 defconfig, arm allyesconfig, arm allmodconfig, powerpc xes_mpc85xx_defconfig, powerpc sequoia_defconfig, arm oxnas_v6_defconfig, powerpc ep8248e_defconfig, arm corgi_defconfig, powerpc mpc834x_itxgp_defconfig, sh allmodconfig, powerpc lite5200b_defconfig, ia64 tiger_defconfig, sh se7722_defconfig, arm tct_hammer_defconfig, sh se7721_defconfig, mips maltaaprp_defconfig, arm nhk8815_defconfig, mips ar7_defconfig, sh titan_defconfig, powerpc mpc83xx_defconfig, powerpc allmodconfig, m68k stmark2_defconfig, powerpc mpc866_ads_defconfig, m68k apollo_defconfig, powerpc64 defconfig, sh apsh4a3a_defconfig, powerpc mpc512x_defconfig, s390 defconfig, nios2 defconfig, mips rs90_defconfig, nios2 3c120_defconfig, arm qcom_defconfig, mips db1xxx_defconfig, powerpc fsp2_defconfig, c6x evmc6472_defconfig, sh rsk7203_defconfig, arm mvebu_v7_defconfig, mips decstation_r4k_defconfig, parisc alldefconfig, mips rm200_defconfig, sh sh7770_generic_defconfig, powerpc gamecube_defconfig, arm trizeps4_defconfig, powerpc mpc836x_mds_defconfig, mips cavium_octeon_defconfig, sh kfr2r09-romimage_defconfig, arm mv78xx0_defconfig, mips maltasmvp_defconfig, m68k defconfig, sh sh7763rdp_defconfig, sparc alldefconfig, arm magician_defconfig, powerpc tqm8548_defconfig, sh sh7785lcr_defconfig, arm clps711x_defconfig, powerpc sbc8548_defconfig, arm lpc32xx_defconfig, sh dreamcast_defconfig, powerpc mpc8313_rdb_defconfig, xtensa alldefconfig, arm lpd270_defconfig, powerpc ppa8548_defconfig, mips ip27_defconfig, sh rsk7201_defconfig, ia64 allmodconfig, ia64 defconfig, ia64 allyesconfig, m68k allmodconfig, m68k allyesconfig, arc allyesconfig, nds32 allnoconfig, c6x allyesconfig, nds32 defconfig, nios2 allyesconfig, csky defconfig, alpha defconfig, alpha allyesconfig, xtensa allyesconfig, h8300 allyesconfig, arc defconfig, parisc defconfig, s390 allyesconfig, parisc allyesconfig, i386 allyesconfig, sparc allyesconfig, sparc defconfig, i386 defconfig, mips allyesconfig, mips allmodconfig, powerpc allyesconfig, powerpc allnoconfig, i386 randconfig-a004-20201125, i386 randconfig-a003-20201125, i386 randconfig-a002-20201125, i386 randconfig-a005-20201125, i386 randconfig-a001-20201125, i386 randconfig-a006-20201125, x86_64 randconfig
[powerpc:next-test] BUILD SUCCESS 6cc5522b62bbc176e1a5666c401466a37ffc746e
defconfig, mips allyesconfig, mips allmodconfig, powerpc allyesconfig, powerpc allnoconfig, i386 randconfig-a004-20201125, i386 randconfig-a003-20201125, i386 randconfig-a002-20201125, i386 randconfig-a005-20201125, i386 randconfig-a001-20201125, i386 randconfig-a006-20201125, x86_64 randconfig-a015-20201125, x86_64 randconfig-a011-20201125, x86_64 randconfig-a014-20201125, x86_64 randconfig-a016-20201125, x86_64 randconfig-a012-20201125, x86_64 randconfig-a013-20201125, i386 randconfig-a012-20201125, i386 randconfig-a013-20201125, i386 randconfig-a011-20201125, i386 randconfig-a016-20201125, i386 randconfig-a014-20201125, i386 randconfig-a015-20201125, riscv nommu_k210_defconfig, riscv allyesconfig, riscv nommu_virt_defconfig, riscv allnoconfig, riscv defconfig, riscv rv32_defconfig, riscv allmodconfig, x86_64 rhel, x86_64 allyesconfig, x86_64 rhel-7.6-kselftests, x86_64 defconfig, x86_64 rhel-8.3, x86_64 kexec

clang tested configs:
x86_64 randconfig-a006-20201125, x86_64 randconfig-a005-20201125, x86_64 randconfig-a003-20201125, x86_64 randconfig-a004-20201125, x86_64 randconfig-a002-20201125, x86_64 randconfig-a001-20201125

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
[powerpc:merge] BUILD SUCCESS 4c202167192a77481310a3cacae9f12618b92216
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge
branch HEAD: 4c202167192a77481310a3cacae9f12618b92216  Automatic merge of 'next' into merge (2020-11-25 15:11)

elapsed time: 960m

configs tested: 130
configs skipped: 2

The following configs have been built successfully. More configs may be tested in the coming days.

gcc tested configs:
arm defconfig, arm64 allyesconfig, arm64 defconfig, arm allyesconfig, arm allmodconfig, powerpc xes_mpc85xx_defconfig, powerpc ep8248e_defconfig, powerpc sequoia_defconfig, arm oxnas_v6_defconfig, arm corgi_defconfig, powerpc mpc834x_itxgp_defconfig, sh allmodconfig, powerpc lite5200b_defconfig, ia64 tiger_defconfig, sh se7722_defconfig, arm tct_hammer_defconfig, sh se7721_defconfig, arm nhk8815_defconfig, mips maltaaprp_defconfig, mips ar7_defconfig, sh titan_defconfig, powerpc mpc83xx_defconfig, powerpc allmodconfig, m68k stmark2_defconfig, powerpc mpc866_ads_defconfig, m68k apollo_defconfig, powerpc64 defconfig, sh apsh4a3a_defconfig, powerpc mpc512x_defconfig, s390 defconfig, sh rsk7264_defconfig, arm vexpress_defconfig, mips ath25_defconfig, powerpc canyonlands_defconfig, arm pleb_defconfig, x86_64 alldefconfig, arm neponset_defconfig, sh migor_defconfig, sh rsk7203_defconfig, arm mvebu_v7_defconfig, mips decstation_r4k_defconfig, parisc alldefconfig, mips rm200_defconfig, sh sh7770_generic_defconfig, powerpc gamecube_defconfig, arm trizeps4_defconfig, powerpc mpc836x_mds_defconfig, mips cavium_octeon_defconfig, sh kfr2r09-romimage_defconfig, arm mv78xx0_defconfig, mips maltasmvp_defconfig, m68k defconfig, sh sh7763rdp_defconfig, sparc alldefconfig, arm magician_defconfig, powerpc tqm8548_defconfig, sh sh7785lcr_defconfig, powerpc mpc8313_rdb_defconfig, xtensa alldefconfig, arm lpd270_defconfig, powerpc ppa8548_defconfig, arm pxa_defconfig, mips malta_kvm_defconfig, riscv alldefconfig, c6x evmc6474_defconfig, ia64 allmodconfig, ia64 defconfig, ia64 allyesconfig, m68k allmodconfig, m68k allyesconfig, nios2 defconfig, arc allyesconfig, nds32 allnoconfig, c6x allyesconfig, nds32 defconfig, nios2 allyesconfig, csky defconfig, alpha defconfig, alpha allyesconfig, xtensa allyesconfig, h8300 allyesconfig, arc defconfig, parisc defconfig, s390 allyesconfig, parisc allyesconfig, i386 allyesconfig, sparc allyesconfig, sparc defconfig, i386 defconfig, mips allyesconfig, mips allmodconfig, powerpc allyesconfig, powerpc allnoconfig, i386 randconfig-a004-20201125, i386 randconfig-a003-20201125, i386 randconfig-a002-20201125, i386 randconfig-a005-20201125, i386 randconfig-a001-20201125, i386 randconfig-a006-20201125, x86_64 randconfig-a015-20201125, x86_64
Re: [PATCH v3 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()
On Wed, 25 Nov 2020 16:42:30 + Marc Zyngier wrote:

> On 2020-11-25 16:24, Laurent Vivier wrote:
> > On 25/11/2020 17:05, Denis Kirjanov wrote:
> >> On 11/25/20, Laurent Vivier wrote:
> >>> With virtio multiqueue, normally each queue IRQ is mapped to a CPU.
> >>>
> >>> But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ
> >>> affinity")
> >>> this is broken on pseries.
> >>
> >> Please add "Fixes" tag.
> >
> > In fact, the code in commit 0d9f0a52c8b9f is correct.
> >
> > The problem is with MSI/X irq affinity and pseries. So this patch
> > fixes more than virtio_scsi. I put this information because this
> > commit allows to clearly show the problem. Perhaps I should remove
> > this line in fact?
>
> This patch does not fix virtio_scsi at all, which as you noticed, is
> correct. It really fixes the PPC MSI setup, which is starting to show
> its age. So getting rid of the reference seems like the right thing to
> do.
>
> I'm also not keen on the BugId thing. It should really be a lore link.
> I also cannot find any such tag in the kernel, nor is it a documented
> practice. The last reference to a Bugzilla entry seems to have happened
> with 786b5219081ff16 (five years ago).

My bad, I suggested BugId to Laurent but the intent was actually BugLink, which seems to be commonly used in the kernel.

Cheers,

--
Greg

> Thanks,
>
> M.
Re: [PATCH] net/ethernet/freescale: Fix incorrect IS_ERR_VALUE macro usages
On Tue, Nov 24, 2020 at 8:00 PM liwei (GF) wrote:
>
> Hi Yang,
>
> On 2020/11/25 6:13, Li Yang wrote:
> > On Tue, Nov 24, 2020 at 3:44 PM Li Yang wrote:
> >>
> >> On Tue, Nov 24, 2020 at 12:24 AM Wei Li wrote:
> >>>
> >>> IS_ERR_VALUE macro should be used only with unsigned long type.
> >>> Especially it works incorrectly with unsigned shorter types on
> >>> 64bit machines.
> >>
> >> This is truly a problem for the driver to run on 64-bit architectures.
> >> But from an earlier discussion
> >> https://patchwork.kernel.org/project/linux-kbuild/patch/1464384685-347275-1-git-send-email-a...@arndb.de/,
> >> the preferred solution would be removing the IS_ERR_VALUE() usage or
> >> making the values unsigned long.
> >>
> >> It looks like we are having a bigger problem with the 64-bit support
> >> for the driver: the offset variables can also be real pointers
> >> which cannot be held in 32-bit data types (when uf_info->bd_mem_part
> >> == MEM_PART_SYSTEM). So actually we have to change these offsets to
> >> unsigned long, otherwise we are having more serious issues on 64-bit
> >> systems. Are you willing to make such changes or do you want us to deal
> >> with it?
> >
> > Well, it looks like this hardware block was never integrated on a
> > 64-bit SoC and will very likely stay so. So probably we can keep
> > the driver 32-bit only. It is currently limited to PPC32 in Kconfig,
> > how did you build it for 64-bit?
> >
>
> Thank you for providing the earlier discussion archive. In fact, this
> issue is detected by our static analysis tool.

Thanks for the effort, but this probably is a false positive for the static analysis tool, as the 64-bit case is not buildable.

> From my view, there is no harm in fixing these potential misuses. But if you
> really have decided to keep the driver 32-bit only, please just ignore this
> patch.

It is not an easy task to add proper 64-bit support, so probably we just keep it 32-bit only for now. Thanks for the patch anyway.
Regards,
Leo

>
> Thanks,
> Wei
>
> >>> Fixes: 4c35630ccda5 ("[POWERPC] Change rheap functions to use ulongs
> >>> instead of pointers")
> >>> Signed-off-by: Wei Li
> >>> ---
> >>>  drivers/net/ethernet/freescale/ucc_geth.c | 30 +++
> >>>  1 file changed, 15 insertions(+), 15 deletions(-)
> >>>
> >>> diff --git a/drivers/net/ethernet/freescale/ucc_geth.c
> >>> b/drivers/net/ethernet/freescale/ucc_geth.c
> >>> index 714b501be7d0..8656d9be256a 100644
> >>> --- a/drivers/net/ethernet/freescale/ucc_geth.c
> >>> +++ b/drivers/net/ethernet/freescale/ucc_geth.c
> >>> @@ -286,7 +286,7 @@ static int fill_init_enet_entries(struct ucc_geth_private *ugeth,
> >>>  		else {
> >>>  			init_enet_offset =
> >>>  			    qe_muram_alloc(thread_size, thread_alignment);
> >>> -			if (IS_ERR_VALUE(init_enet_offset)) {
> >>> +			if (IS_ERR_VALUE((unsigned long)(int)init_enet_offset)) {
> >>>  				if (netif_msg_ifup(ugeth))
> >>>  					pr_err("Can not allocate DPRAM memory\n");
> >>>  				qe_put_snum((u8) snum);
> >>> @@ -2223,7 +2223,7 @@ static int ucc_geth_alloc_tx(struct ucc_geth_private *ugeth)
> >>>  		ugeth->tx_bd_ring_offset[j] =
> >>>  		    qe_muram_alloc(length,
> >>>  				   UCC_GETH_TX_BD_RING_ALIGNMENT);
> >>> -		if (!IS_ERR_VALUE(ugeth->tx_bd_ring_offset[j]))
> >>> +		if (!IS_ERR_VALUE((unsigned long)(int)ugeth->tx_bd_ring_offset[j]))
> >>>  			ugeth->p_tx_bd_ring[j] =
> >>>  			    (u8 __iomem *) qe_muram_addr(ugeth->
> >>>  							 tx_bd_ring_offset[j]);
> >>> @@ -2300,7 +2300,7 @@ static int ucc_geth_alloc_rx(struct ucc_geth_private *ugeth)
> >>>  		ugeth->rx_bd_ring_offset[j] =
> >>>  		    qe_muram_alloc(length,
> >>>  				   UCC_GETH_RX_BD_RING_ALIGNMENT);
> >>> -		if (!IS_ERR_VALUE(ugeth->rx_bd_ring_offset[j]))
> >>> +		if (!IS_ERR_VALUE((unsigned long)(int)ugeth->rx_bd_ring_offset[j]))
> >>>  			ugeth->p_rx_bd_ring[j] =
> >>>  			    (u8 __iomem *) qe_muram_addr(ugeth->
> >>>  							 rx_bd_ring_offset[j]);
> >>> @@ -2510,7 +2510,7 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
> >>>  	ugeth->tx_glbl_pram_offset =
> >>>  	    qe_muram_alloc(sizeof(struct ucc_geth_tx_global_pram),
> >>>  			   UCC_GETH_TX_GLOBAL_PRAM_ALIGNMENT);
> >>> -	if
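The type problem discussed in this thread is easy to demonstrate in userspace: when a negative errno is stored in a 32-bit offset variable on a 64-bit machine, the implicit zero-extension moves the value out of the errno window and IS_ERR_VALUE() never fires. A sketch under stated assumptions: the macro below is a simplified version of the kernel's (without the `(void *)` cast and `unlikely()`), and `alloc_fail()` is a made-up stand-in for `qe_muram_alloc()` failing with -ENOMEM:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ERRNO 4095UL
/* simplified version of IS_ERR_VALUE from include/linux/err.h:
 * an error is any value in the top MAX_ERRNO slots of unsigned long */
#define IS_ERR_VALUE(x) ((unsigned long)(x) >= -MAX_ERRNO)

/* hypothetical allocator: returns -ENOMEM (-12) truncated into a
 * 32-bit variable, as the ucc_geth offset fields do */
static uint32_t alloc_fail(void)
{
	return (uint32_t)-12;
}

static int detected_plain(uint32_t off)
{
	/* zero-extends on a 64-bit machine: 0xfffffff4 is far below
	 * the errno window, so the error is silently missed */
	return IS_ERR_VALUE(off);
}

static int detected_with_cast(uint32_t off)
{
	/* the patch's (unsigned long)(int) cast sign-extends first,
	 * putting the value back into the errno window */
	return IS_ERR_VALUE((unsigned long)(int)off);
}
```

On a 32-bit build both variants agree, which is why the bug only matters for the hypothetical 64-bit case debated above.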
Re: [PATCH v3 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()
On 2020-11-25 16:24, Laurent Vivier wrote:
> On 25/11/2020 17:05, Denis Kirjanov wrote:
>> On 11/25/20, Laurent Vivier wrote:
>>> With virtio multiqueue, normally each queue IRQ is mapped to a CPU.
>>>
>>> But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity")
>>> this is broken on pseries.
>>
>> Please add "Fixes" tag.
>
> In fact, the code in commit 0d9f0a52c8b9f is correct.
>
> The problem is with MSI/X irq affinity and pseries. So this patch
> fixes more than virtio_scsi. I put this information because this
> commit allows to clearly show the problem. Perhaps I should remove
> this line in fact?

This patch does not fix virtio_scsi at all, which as you noticed, is correct. It really fixes the PPC MSI setup, which is starting to show its age. So getting rid of the reference seems like the right thing to do.

I'm also not keen on the BugId thing. It should really be a lore link. I also cannot find any such tag in the kernel, nor is it a documented practice. The last reference to a Bugzilla entry seems to have happened with 786b5219081ff16 (five years ago).

Thanks,

M.
--
Jazz is not dead. It just smells funny...
Re: [PATCH v3 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()
On 25/11/2020 17:05, Denis Kirjanov wrote:
> On 11/25/20, Laurent Vivier wrote:
>> With virtio multiqueue, normally each queue IRQ is mapped to a CPU.
>>
>> But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity")
>> this is broken on pseries.
>
> Please add "Fixes" tag.

In fact, the code in commit 0d9f0a52c8b9f is correct.

The problem is with MSI/X irq affinity and pseries. So this patch fixes more than virtio_scsi. I put this information because this commit allows to clearly show the problem. Perhaps I should remove this line in fact?

Thanks,
Laurent

>
> Thanks!
>
>>
>> The affinity is correctly computed in msi_desc but this is not applied
>> to the system IRQs.
>>
>> It appears the affinity is correctly passed to rtas_setup_msi_irqs() but
>> lost at this point and never passed to irq_domain_alloc_descs()
>> (see commit 06ee6d571f0e ("genirq: Add affinity hint to irq allocation"))
>> because irq_create_mapping() doesn't take an affinity parameter.
>>
>> As the previous patch has added the affinity parameter to
>> irq_create_mapping() we can forward the affinity from rtas_setup_msi_irqs()
>> to irq_domain_alloc_descs().
>>
>> With this change, the virtqueues are correctly dispatched between the CPUs
>> on pseries.
>>
>> BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1702939
>> Signed-off-by: Laurent Vivier
>> Reviewed-by: Greg Kurz
>> ---
>>  arch/powerpc/platforms/pseries/msi.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/msi.c
>> b/arch/powerpc/platforms/pseries/msi.c
>> index 133f6adcb39c..b3ac2455faad 100644
>> --- a/arch/powerpc/platforms/pseries/msi.c
>> +++ b/arch/powerpc/platforms/pseries/msi.c
>> @@ -458,7 +458,8 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
>>  			return hwirq;
>>  		}
>>
>> -		virq = irq_create_mapping(NULL, hwirq);
>> +		virq = irq_create_mapping_affinity(NULL, hwirq,
>> +						   entry->affinity);
>>
>>  		if (!virq) {
>>  			pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
>> --
>> 2.28.0
>>
>>
>
Re: [PATCH v2 1/2] genirq: add an irq_create_mapping_affinity() function
On Wed, 25 Nov 2020 12:16:56 +0100 Laurent Vivier wrote:

> This function adds an affinity parameter to irq_create_mapping().
> This parameter is needed to pass it to irq_domain_alloc_descs().
>
> irq_create_mapping() is a wrapper around irq_create_mapping_affinity()
> to pass NULL for the affinity parameter.
>
> No functional change.
>
> Signed-off-by: Laurent Vivier
> ---

Reviewed-by: Greg Kurz

>  include/linux/irqdomain.h | 12 ++--
>  kernel/irq/irqdomain.c    | 13 -
>  2 files changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
> index 71535e87109f..ea5a337e0f8b 100644
> --- a/include/linux/irqdomain.h
> +++ b/include/linux/irqdomain.h
> @@ -384,11 +384,19 @@ extern void irq_domain_associate_many(struct irq_domain *domain,
>  extern void irq_domain_disassociate(struct irq_domain *domain,
>  				    unsigned int irq);
>
> -extern unsigned int irq_create_mapping(struct irq_domain *host,
> -				       irq_hw_number_t hwirq);
> +extern unsigned int irq_create_mapping_affinity(struct irq_domain *host,
> +						irq_hw_number_t hwirq,
> +						const struct irq_affinity_desc *affinity);
>  extern unsigned int irq_create_fwspec_mapping(struct irq_fwspec *fwspec);
>  extern void irq_dispose_mapping(unsigned int virq);
>
> +static inline unsigned int irq_create_mapping(struct irq_domain *host,
> +					      irq_hw_number_t hwirq)
> +{
> +	return irq_create_mapping_affinity(host, hwirq, NULL);
> +}
> +
> +
>  /**
>   * irq_linear_revmap() - Find a linux irq from a hw irq number.
>   * @domain: domain owning this hardware interrupt
> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
> index cf8b374b892d..e4ca69608f3b 100644
> --- a/kernel/irq/irqdomain.c
> +++ b/kernel/irq/irqdomain.c
> @@ -624,17 +624,19 @@ unsigned int irq_create_direct_mapping(struct irq_domain *domain)
>  EXPORT_SYMBOL_GPL(irq_create_direct_mapping);
>
>  /**
> - * irq_create_mapping() - Map a hardware interrupt into linux irq space
> + * irq_create_mapping_affinity() - Map a hardware interrupt into linux irq space
>   * @domain: domain owning this hardware interrupt or NULL for default domain
>   * @hwirq: hardware irq number in that domain space
> + * @affinity: irq affinity
>   *
>   * Only one mapping per hardware interrupt is permitted. Returns a linux
>   * irq number.
>   * If the sense/trigger is to be specified, set_irq_type() should be called
>   * on the number returned from that call.
>   */
> -unsigned int irq_create_mapping(struct irq_domain *domain,
> -				irq_hw_number_t hwirq)
> +unsigned int irq_create_mapping_affinity(struct irq_domain *domain,
> +					 irq_hw_number_t hwirq,
> +					 const struct irq_affinity_desc *affinity)
>  {
>  	struct device_node *of_node;
>  	int virq;
> @@ -660,7 +662,8 @@ unsigned int irq_create_mapping(struct irq_domain *domain,
>  	}
>
>  	/* Allocate a virtual interrupt number */
> -	virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node), NULL);
> +	virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node),
> +				      affinity);
>  	if (virq <= 0) {
>  		pr_debug("-> virq allocation failed\n");
>  		return 0;
> @@ -676,7 +679,7 @@ unsigned int irq_create_mapping(struct irq_domain *domain,
>
>  	return virq;
>  }
> -EXPORT_SYMBOL_GPL(irq_create_mapping);
> +EXPORT_SYMBOL_GPL(irq_create_mapping_affinity);
>
>  /**
>   * irq_create_strict_mappings() - Map a range of hw irqs to fixed linux irqs
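The shape of the change reviewed above — extend the function with a new parameter, and keep the old name as a static inline wrapper that passes NULL so no existing caller has to change — can be sketched outside the kernel. Everything here is illustrative: the struct is a placeholder and the return-value encoding exists only so the behavior is observable:

```c
#include <assert.h>
#include <stddef.h>

struct irq_affinity_desc { unsigned int cpu; };

/* extended entry point: takes the affinity hint; NULL means "no hint" */
static unsigned int create_mapping_affinity(unsigned int hwirq,
                                            const struct irq_affinity_desc *affinity)
{
	/* illustrative only: flag in the top bit whether a hint was given */
	return affinity ? (hwirq | 0x80000000u) : hwirq;
}

/* the old name survives as a trivial inline wrapper, exactly like
 * irq_create_mapping() in the patch, so existing callers are untouched */
static inline unsigned int create_mapping(unsigned int hwirq)
{
	return create_mapping_affinity(hwirq, NULL);
}
```

The design choice worth noting: by exporting only the new symbol and making the old name an inline wrapper, the patch stays "no functional change" for every caller that does not opt in.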
Re: [PATCH v3 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()
On 11/25/20, Laurent Vivier wrote:
> With virtio multiqueue, normally each queue IRQ is mapped to a CPU.
>
> But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity")
> this is broken on pseries.

Please add "Fixes" tag.

Thanks!

>
> The affinity is correctly computed in msi_desc but this is not applied
> to the system IRQs.
>
> It appears the affinity is correctly passed to rtas_setup_msi_irqs() but
> lost at this point and never passed to irq_domain_alloc_descs()
> (see commit 06ee6d571f0e ("genirq: Add affinity hint to irq allocation"))
> because irq_create_mapping() doesn't take an affinity parameter.
>
> As the previous patch has added the affinity parameter to
> irq_create_mapping() we can forward the affinity from rtas_setup_msi_irqs()
> to irq_domain_alloc_descs().
>
> With this change, the virtqueues are correctly dispatched between the CPUs
> on pseries.
>
> BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1702939
> Signed-off-by: Laurent Vivier
> Reviewed-by: Greg Kurz
> ---
>  arch/powerpc/platforms/pseries/msi.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/pseries/msi.c
> b/arch/powerpc/platforms/pseries/msi.c
> index 133f6adcb39c..b3ac2455faad 100644
> --- a/arch/powerpc/platforms/pseries/msi.c
> +++ b/arch/powerpc/platforms/pseries/msi.c
> @@ -458,7 +458,8 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
>  			return hwirq;
>  		}
>
> -		virq = irq_create_mapping(NULL, hwirq);
> +		virq = irq_create_mapping_affinity(NULL, hwirq,
> +						   entry->affinity);
>
>  		if (!virq) {
>  			pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
> --
> 2.28.0
>
>
[PATCH V4 4/5] ocxl: Add mmu notifier
Add an invalidate_range mmu notifier, when required (ATSD access of MMIO registers is available), to initiate TLB invalidation commands. For the time being, the ATSD0 set of registers is used by default.

The pasid and bdf values have to be configured in the Process Element Entry. The PEE must be set up to match the BDF/PASID of the AFU.

Acked-by: Frederic Barrat
Signed-off-by: Christophe Lombard
---
 drivers/misc/ocxl/link.c | 62 +++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index 77381dda2c45..129d4eddc4d2 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -2,8 +2,10 @@
 // Copyright 2017 IBM Corp.
 #include
 #include
+#include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -33,6 +35,7 @@
 
 #define SPA_PE_VALID		0x8000
 
+struct ocxl_link;
 
 struct pe_data {
 	struct mm_struct *mm;
@@ -41,6 +44,8 @@ struct pe_data {
 	/* opaque pointer to be passed to the above callback */
 	void *xsl_err_data;
 	struct rcu_head rcu;
+	struct ocxl_link *link;
+	struct mmu_notifier mmu_notifier;
 };
 
 struct spa {
@@ -83,6 +88,8 @@ struct ocxl_link {
 	int domain;
 	int bus;
 	int dev;
+	void __iomem *arva;	/* ATSD register virtual address */
+	spinlock_t atsd_lock;	/* to serialize shootdowns */
 	atomic_t irq_available;
 	struct spa *spa;
 	void *platform_data;
@@ -388,6 +395,7 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, struct ocxl_link **out_link)
 	link->bus = dev->bus->number;
 	link->dev = PCI_SLOT(dev->devfn);
 	atomic_set(&link->irq_available, MAX_IRQ_PER_LINK);
+	spin_lock_init(&link->atsd_lock);
 
 	rc = alloc_spa(dev, link);
 	if (rc)
@@ -403,6 +411,13 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, struct ocxl_link **out_link)
 	if (rc)
 		goto err_xsl_irq;
 
+	/* if link->arva is not defined, MMIO registers are not used to
+	 * generate TLB invalidate. PowerBus snooping is enabled.
+	 * Otherwise, PowerBus snooping is disabled. TLB Invalidates are
+	 * initiated using MMIO registers.
+	 */
+	pnv_ocxl_map_lpar(dev, mfspr(SPRN_LPID), 0, &link->arva);
+
 	*out_link = link;
 	return 0;
 
@@ -454,6 +469,11 @@ static void release_xsl(struct kref *ref)
 {
 	struct ocxl_link *link = container_of(ref, struct ocxl_link, ref);
 
+	if (link->arva) {
+		pnv_ocxl_unmap_lpar(link->arva);
+		link->arva = NULL;
+	}
+
 	list_del(&link->list);
 	/* call platform code before releasing data */
 	pnv_ocxl_spa_release(link->platform_data);
@@ -470,6 +490,26 @@ void ocxl_link_release(struct pci_dev *dev, void *link_handle)
 }
 EXPORT_SYMBOL_GPL(ocxl_link_release);
 
+static void invalidate_range(struct mmu_notifier *mn,
+			     struct mm_struct *mm,
+			     unsigned long start, unsigned long end)
+{
+	struct pe_data *pe_data = container_of(mn, struct pe_data, mmu_notifier);
+	struct ocxl_link *link = pe_data->link;
+	unsigned long addr, pid, page_size = PAGE_SIZE;
+
+	pid = mm->context.id;
+
+	spin_lock(&link->atsd_lock);
+	for (addr = start; addr < end; addr += page_size)
+		pnv_ocxl_tlb_invalidate(link->arva, pid, addr, page_size);
+	spin_unlock(&link->atsd_lock);
+}
+
+static const struct mmu_notifier_ops ocxl_mmu_notifier_ops = {
+	.invalidate_range = invalidate_range,
+};
+
 static u64 calculate_cfg_state(bool kernel)
 {
 	u64 state;
@@ -526,6 +566,8 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
 	pe_data->mm = mm;
 	pe_data->xsl_err_cb = xsl_err_cb;
 	pe_data->xsl_err_data = xsl_err_data;
+	pe_data->link = link;
+	pe_data->mmu_notifier.ops = &ocxl_mmu_notifier_ops;
 
 	memset(pe, 0, sizeof(struct ocxl_process_element));
 	pe->config_state = cpu_to_be64(calculate_cfg_state(pidr == 0));
@@ -542,8 +584,16 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
 	 * by the nest MMU. If we have a kernel context, TLBIs are
 	 * already global.
 	 */
-	if (mm)
+	if (mm) {
 		mm_context_add_copro(mm);
+		if (link->arva) {
+			/* Use MMIO registers for the TLB Invalidate
+			 * operations.
+			 */
+			mmu_notifier_register(&pe_data->mmu_notifier, mm);
+		}
+	}
+
 	/*
 	 * Barrier is to make sure PE is visible in the SPA before it
 	 * is used by the device. It also helps with the global TLBI
@@ -674,6 +724,16 @@ int ocxl_link_remove_pe(void *link_handle, int pasid)
 		WARN(1, "Couldn't find pe data when removing PE\n");
 	} else {
 		if
[PATCH V4 2/5] ocxl: Initiate a TLB invalidate command
When a TLB Invalidate is required for the Logical Partition, the following sequence has to be performed:

1. Load the MMIO ATSD AVA register with the necessary value, if required.
2. Write the MMIO ATSD launch register to initiate the TLB Invalidate command.
3. Poll the MMIO ATSD status register to determine when the TLB Invalidate has been completed.

Signed-off-by: Christophe Lombard
---
 arch/powerpc/include/asm/pnv-ocxl.h   | 51 ++++++++++
 arch/powerpc/platforms/powernv/ocxl.c | 69 +++++++++++++++
 2 files changed, 120 insertions(+)

diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h
index 60c3c74427d9..9acd1fbf1197 100644
--- a/arch/powerpc/include/asm/pnv-ocxl.h
+++ b/arch/powerpc/include/asm/pnv-ocxl.h
@@ -3,12 +3,59 @@
 #ifndef _ASM_PNV_OCXL_H
 #define _ASM_PNV_OCXL_H
 
+#include
 #include
 
 #define PNV_OCXL_TL_MAX_TEMPLATE	63
 #define PNV_OCXL_TL_BITS_PER_RATE	4
 #define PNV_OCXL_TL_RATE_BUF_SIZE	((PNV_OCXL_TL_MAX_TEMPLATE+1) * PNV_OCXL_TL_BITS_PER_RATE / 8)
 
+#define PNV_OCXL_ATSD_TIMEOUT		1
+
+/* TLB Management Instructions */
+#define PNV_OCXL_ATSD_LNCH		0x00
+/* Radix Invalidate */
+#define PNV_OCXL_ATSD_LNCH_R		PPC_BIT(0)
+/* Radix Invalidation Control
+ * 0b00 Just invalidate TLB.
+ * 0b01 Invalidate just Page Walk Cache.
+ * 0b10 Invalidate TLB, Page Walk Cache, and any
+ * caching of Partition and Process Table Entries.
+ */
+#define PNV_OCXL_ATSD_LNCH_RIC		PPC_BITMASK(1, 2)
+/* Number and Page Size of translations to be invalidated */
+#define PNV_OCXL_ATSD_LNCH_LP		PPC_BITMASK(3, 10)
+/* Invalidation Criteria
+ * 0b00 Invalidate just the target VA.
+ * 0b01 Invalidate matching PID.
+ */
+#define PNV_OCXL_ATSD_LNCH_IS		PPC_BITMASK(11, 12)
+/* 0b1: Process Scope, 0b0: Partition Scope */
+#define PNV_OCXL_ATSD_LNCH_PRS		PPC_BIT(13)
+/* Invalidation Flag */
+#define PNV_OCXL_ATSD_LNCH_B		PPC_BIT(14)
+/* Actual Page Size to be invalidated
+ * 000 4KB
+ * 101 64KB
+ * 001 2MB
+ * 010 1GB
+ */
+#define PNV_OCXL_ATSD_LNCH_AP		PPC_BITMASK(15, 17)
+/* Defines the large page select
+ * L=0b0 for 4KB pages
+ * L=0b1 for large pages)
+ */
+#define PNV_OCXL_ATSD_LNCH_L		PPC_BIT(18)
+/* Process ID */
+#define PNV_OCXL_ATSD_LNCH_PID		PPC_BITMASK(19, 38)
+/* NoFlush - Assumed to be 0b0 */
+#define PNV_OCXL_ATSD_LNCH_F		PPC_BIT(39)
+#define PNV_OCXL_ATSD_LNCH_OCAPI_SLBI	PPC_BIT(40)
+#define PNV_OCXL_ATSD_LNCH_OCAPI_SINGLETON	PPC_BIT(41)
+#define PNV_OCXL_ATSD_AVA		0x08
+#define PNV_OCXL_ATSD_AVA_AVA		PPC_BITMASK(0, 51)
+#define PNV_OCXL_ATSD_STAT		0x10
+
 int pnv_ocxl_get_actag(struct pci_dev *dev, u16 *base, u16 *enabled, u16 *supported);
 int pnv_ocxl_get_pasid_count(struct pci_dev *dev, int *count);
@@ -31,4 +78,8 @@ int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle);
 
 int pnv_ocxl_map_lpar(struct pci_dev *dev, uint64_t lparid,
 		      uint64_t lpcr, void __iomem **arva);
 void pnv_ocxl_unmap_lpar(void __iomem *arva);
+void pnv_ocxl_tlb_invalidate(void __iomem *arva,
+			     unsigned long pid,
+			     unsigned long addr,
+			     unsigned long page_size);
 #endif /* _ASM_PNV_OCXL_H */
diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
index 57fc1062677b..9105efcf242a 100644
--- a/arch/powerpc/platforms/powernv/ocxl.c
+++ b/arch/powerpc/platforms/powernv/ocxl.c
@@ -528,3 +528,72 @@ void pnv_ocxl_unmap_lpar(void __iomem *arva)
 	iounmap(arva);
 }
 EXPORT_SYMBOL_GPL(pnv_ocxl_unmap_lpar);
+
+void pnv_ocxl_tlb_invalidate(void __iomem *arva,
+			     unsigned long pid,
+			     unsigned long addr,
+			     unsigned long page_size)
+{
+	unsigned long timeout = jiffies + (HZ * PNV_OCXL_ATSD_TIMEOUT);
+	u64 val = 0ull;
+	int pend;
+	u8 size;
+
+	if (!(arva))
+		return;
+
+	if (addr) {
+		/* load Abbreviated Virtual Address register with
+		 * the necessary value
+		 */
+		val |= FIELD_PREP(PNV_OCXL_ATSD_AVA_AVA, addr >> (63-51));
+		out_be64(arva + PNV_OCXL_ATSD_AVA, val);
+	}
+
+	/* Write access initiates a shoot down to initiate the
+	 * TLB Invalidate command
+	 */
+	val = PNV_OCXL_ATSD_LNCH_R;
+	val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_RIC, 0b10);
+	if (addr)
+		val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_IS, 0b00);
+	else {
+		val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_IS, 0b01);
+		val |= PNV_OCXL_ATSD_LNCH_OCAPI_SINGLETON;
+	}
+	val |= PNV_OCXL_ATSD_LNCH_PRS;
+	/* Actual Page Size to be invalidated
+	 * 000 4KB
+	 * 101 64KB
+
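The launch-register value in this patch is composed with IBM big-endian bit numbering, where PPC_BIT(0) is the most significant bit of the 64-bit register. That composition can be checked in userspace with simplified versions of the macros (the real PPC_BIT/PPC_BITMASK live in arch/powerpc/include/asm/bitops.h and FIELD_PREP in include/linux/bitfield.h; this is a sketch, not the kernel code):

```c
#include <assert.h>

/* IBM numbering: bit 0 is the MSB of the 64-bit register */
#define PPC_BIT(bit)		(1UL << (63 - (bit)))
#define PPC_BITMASK(bs, be)	((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))

/* simplified FIELD_PREP: shift val up to the low bit of mask */
#define FIELD_PREP(mask, val) \
	(((unsigned long)(val) << __builtin_ctzl(mask)) & (mask))

#define ATSD_LNCH_R	PPC_BIT(0)		/* Radix Invalidate */
#define ATSD_LNCH_RIC	PPC_BITMASK(1, 2)	/* Invalidation Control */

/* base of the launch value built by pnv_ocxl_tlb_invalidate():
 * RIC=0b10 means invalidate TLB, Page Walk Cache, and cached
 * partition/process table entries */
static unsigned long atsd_launch_base(void)
{
	return ATSD_LNCH_R | FIELD_PREP(ATSD_LNCH_RIC, 2 /* 0b10 */);
}
```

The assertions below pin down the numbering: PPC_BIT(0) is the top bit, and the RIC field occupies the next two bits down.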
[PATCH V4 5/5] ocxl: Add new kernel traces
Add specific kernel traces which provide information on mmu notifiers and on page ranges.

Acked-by: Frederic Barrat
Signed-off-by: Christophe Lombard
---
 drivers/misc/ocxl/link.c  |  4 +++
 drivers/misc/ocxl/trace.h | 64 +++++++++++++++
 2 files changed, 68 insertions(+)

diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index 129d4eddc4d2..ab039c115381 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -499,6 +499,7 @@ static void invalidate_range(struct mmu_notifier *mn,
 	unsigned long addr, pid, page_size = PAGE_SIZE;
 
 	pid = mm->context.id;
+	trace_ocxl_mmu_notifier_range(start, end, pid);
 
 	spin_lock(&link->atsd_lock);
 	for (addr = start; addr < end; addr += page_size)
@@ -590,6 +591,7 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
 			/* Use MMIO registers for the TLB Invalidate
 			 * operations.
 			 */
+			trace_ocxl_init_mmu_notifier(pasid, mm->context.id);
 			mmu_notifier_register(&pe_data->mmu_notifier, mm);
 		}
 	}
@@ -725,6 +727,8 @@ int ocxl_link_remove_pe(void *link_handle, int pasid)
 	} else {
 		if (pe_data->mm) {
 			if (link->arva) {
+				trace_ocxl_release_mmu_notifier(pasid,
+								pe_data->mm->context.id);
 				mmu_notifier_unregister(&pe_data->mmu_notifier,
 							pe_data->mm);
 				spin_lock(&link->atsd_lock);
diff --git a/drivers/misc/ocxl/trace.h b/drivers/misc/ocxl/trace.h
index 17e21cb2addd..a33a5094ff6c 100644
--- a/drivers/misc/ocxl/trace.h
+++ b/drivers/misc/ocxl/trace.h
@@ -8,6 +8,70 @@
 
 #include
 
+
+TRACE_EVENT(ocxl_mmu_notifier_range,
+	TP_PROTO(unsigned long start, unsigned long end, unsigned long pidr),
+	TP_ARGS(start, end, pidr),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, start)
+		__field(unsigned long, end)
+		__field(unsigned long, pidr)
+	),
+
+	TP_fast_assign(
+		__entry->start = start;
+		__entry->end = end;
+		__entry->pidr = pidr;
+	),
+
+	TP_printk("start=0x%lx end=0x%lx pidr=0x%lx",
+		__entry->start,
+		__entry->end,
+		__entry->pidr
+	)
+);
+
+TRACE_EVENT(ocxl_init_mmu_notifier,
+	TP_PROTO(int pasid, unsigned long pidr),
+	TP_ARGS(pasid, pidr),
+
+	TP_STRUCT__entry(
+		__field(int, pasid)
+		__field(unsigned long, pidr)
+	),
+
+	TP_fast_assign(
+		__entry->pasid = pasid;
+		__entry->pidr = pidr;
+	),
+
+	TP_printk("pasid=%d, pidr=0x%lx",
+		__entry->pasid,
+		__entry->pidr
+	)
+);
+
+TRACE_EVENT(ocxl_release_mmu_notifier,
+	TP_PROTO(int pasid, unsigned long pidr),
+	TP_ARGS(pasid, pidr),
+
+	TP_STRUCT__entry(
+		__field(int, pasid)
+		__field(unsigned long, pidr)
+	),
+
+	TP_fast_assign(
+		__entry->pasid = pasid;
+		__entry->pidr = pidr;
+	),
+
+	TP_printk("pasid=%d, pidr=0x%lx",
+		__entry->pasid,
+		__entry->pidr
+	)
+);
+
 DECLARE_EVENT_CLASS(ocxl_context,
 	TP_PROTO(pid_t pid, void *spa, int pasid, u32 pidr, u32 tidr),
 	TP_ARGS(pid, spa, pasid, pidr, tidr),
-- 
2.28.0
[PATCH V4 1/5] ocxl: Assign a register set to a Logical Partition
Platform specific function to assign a register set to a Logical Partition. The "ibm,mmio-atsd" property, provided by the firmware, contains the 16 base ATSD physical addresses (ATSD0 through ATSD15) of the set of MMIO registers (XTS MMIO ATSDx LPARID/AVA/launch/status register). For the time being, the ATSD0 set of registers is used by default. Acked-by: Frederic Barrat Signed-off-by: Christophe Lombard --- arch/powerpc/include/asm/pnv-ocxl.h | 3 ++ arch/powerpc/platforms/powernv/ocxl.c | 45 +++ 2 files changed, 48 insertions(+) diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h index d37ededca3ee..60c3c74427d9 100644 --- a/arch/powerpc/include/asm/pnv-ocxl.h +++ b/arch/powerpc/include/asm/pnv-ocxl.h @@ -28,4 +28,7 @@ int pnv_ocxl_spa_setup(struct pci_dev *dev, void *spa_mem, int PE_mask, void **p void pnv_ocxl_spa_release(void *platform_data); int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle); +int pnv_ocxl_map_lpar(struct pci_dev *dev, uint64_t lparid, + uint64_t lpcr, void __iomem **arva); +void pnv_ocxl_unmap_lpar(void __iomem *arva); #endif /* _ASM_PNV_OCXL_H */ diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c index ecdad219d704..57fc1062677b 100644 --- a/arch/powerpc/platforms/powernv/ocxl.c +++ b/arch/powerpc/platforms/powernv/ocxl.c @@ -483,3 +483,48 @@ int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle) return rc; } EXPORT_SYMBOL_GPL(pnv_ocxl_spa_remove_pe_from_cache); + +int pnv_ocxl_map_lpar(struct pci_dev *dev, uint64_t lparid, + uint64_t lpcr, void __iomem **arva) +{ + struct pci_controller *hose = pci_bus_to_host(dev->bus); + struct pnv_phb *phb = hose->private_data; + u64 mmio_atsd; + int rc; + + /* ATSD physical address. +* ATSD LAUNCH register: write access initiates a shoot down to +* initiate the TLB Invalidate command. 
+ */
+	rc = of_property_read_u64_index(hose->dn, "ibm,mmio-atsd",
+					0, &mmio_atsd);
+	if (rc) {
+		dev_info(&dev->dev, "No available ATSD found\n");
+		return rc;
+	}
+
+	/* Assign a register set to a Logical Partition and MMIO ATSD
+	 * LPARID register to the required value.
+	 */
+	rc = opal_npu_map_lpar(phb->opal_id, pci_dev_id(dev),
+			       lparid, lpcr);
+	if (rc) {
+		dev_err(&dev->dev, "Error mapping device to LPAR: %d\n", rc);
+		return rc;
+	}
+
+	*arva = ioremap(mmio_atsd, 24);
+	if (!(*arva)) {
+		dev_warn(&dev->dev, "ioremap failed - mmio_atsd: %#llx\n", mmio_atsd);
+		rc = -ENOMEM;
+	}
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_map_lpar);
+
+void pnv_ocxl_unmap_lpar(void __iomem *arva)
+{
+	iounmap(arva);
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_unmap_lpar);
-- 
2.28.0
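The three-step setup in pnv_ocxl_map_lpar() (read the ATSD base address from firmware, map the LPAR via OPAL, ioremap the register set) follows the usual early-return error ladder. A minimal userspace sketch of that control flow, with made-up stub functions standing in for the firmware and OPAL calls:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for the firmware/OPAL steps; names are illustrative only. */
static int fake_read_atsd(uint64_t *mmio_atsd)
{
	*mmio_atsd = 0x6030000000000ULL; /* "ibm,mmio-atsd" DT property, index 0 */
	return 0;
}

static int fake_opal_map_lpar(uint64_t lparid, uint64_t lpcr)
{
	(void)lparid; (void)lpcr;        /* opal_npu_map_lpar() */
	return 0;
}

static void *fake_ioremap(uint64_t phys, size_t len)
{
	(void)len;                       /* 24 bytes, as in the patch */
	return (void *)(uintptr_t)phys;
}

/* Mirrors the shape of pnv_ocxl_map_lpar(): each step either succeeds or
 * returns early; only the final ioremap() failure maps to -ENOMEM. */
static int toy_map_lpar(uint64_t lparid, uint64_t lpcr, void **arva)
{
	uint64_t mmio_atsd;
	int rc;

	rc = fake_read_atsd(&mmio_atsd);
	if (rc)
		return rc;

	rc = fake_opal_map_lpar(lparid, lpcr);
	if (rc)
		return rc;

	*arva = fake_ioremap(mmio_atsd, 24);
	if (!*arva)
		return -12; /* -ENOMEM */
	return 0;
}
```

The unmap side is symmetric: a single iounmap() of the returned virtual address.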
[PATCH V4 3/5] ocxl: Update the Process Element Entry
To complete the MMIO based mechanism, the fields: PASID, bus, device and function of the Process Element Entry have to be filled. (See OpenCAPI Power Platform Architecture document) Hypervisor Process Element Entry Word 0 1 7 8 .. 12 13 ..15 16 19 20 ... 31 0 OSL Configuration State (0:31) 1 OSL Configuration State (32:63) 2 PASID |Reserved 3 Bus | Device|Function |Reserved 4 Reserved 5 Reserved 6 Acked-by: Frederic Barrat Signed-off-by: Christophe Lombard --- drivers/misc/ocxl/context.c | 4 +++- drivers/misc/ocxl/link.c | 4 +++- drivers/misc/ocxl/ocxl_internal.h | 9 ++--- drivers/scsi/cxlflash/ocxl_hw.c | 6 -- include/misc/ocxl.h | 2 +- 5 files changed, 17 insertions(+), 8 deletions(-) diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c index c21f65a5c762..9eb0d93b01c6 100644 --- a/drivers/misc/ocxl/context.c +++ b/drivers/misc/ocxl/context.c @@ -70,6 +70,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr, struct mm_struct *mm) { int rc; unsigned long pidr = 0; + struct pci_dev *dev; // Locks both status & tidr mutex_lock(>status_mutex); @@ -81,8 +82,9 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr, struct mm_struct *mm) if (mm) pidr = mm->context.id; + dev = to_pci_dev(ctx->afu->fn->dev.parent); rc = ocxl_link_add_pe(ctx->afu->fn->link, ctx->pasid, pidr, ctx->tidr, - amr, mm, xsl_fault_error, ctx); + amr, pci_dev_id(dev), mm, xsl_fault_error, ctx); if (rc) goto out; diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c index fd73d3bc0eb6..77381dda2c45 100644 --- a/drivers/misc/ocxl/link.c +++ b/drivers/misc/ocxl/link.c @@ -494,7 +494,7 @@ static u64 calculate_cfg_state(bool kernel) } int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr, - u64 amr, struct mm_struct *mm, + u64 amr, u16 bdf, struct mm_struct *mm, void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr), void *xsl_err_data) { @@ -529,6 +529,8 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr, memset(pe, 0, 
sizeof(struct ocxl_process_element)); pe->config_state = cpu_to_be64(calculate_cfg_state(pidr == 0)); + pe->pasid = cpu_to_be32(pasid << (31 - 19)); + pe->bdf = cpu_to_be16(bdf); pe->lpid = cpu_to_be32(mfspr(SPRN_LPID)); pe->pid = cpu_to_be32(pidr); pe->tid = cpu_to_be32(tidr); diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h index 0bad0a123af6..10125a22d5a5 100644 --- a/drivers/misc/ocxl/ocxl_internal.h +++ b/drivers/misc/ocxl/ocxl_internal.h @@ -84,13 +84,16 @@ struct ocxl_context { struct ocxl_process_element { __be64 config_state; - __be32 reserved1[11]; + __be32 pasid; + __be16 bdf; + __be16 reserved1; + __be32 reserved2[9]; __be32 lpid; __be32 tid; __be32 pid; - __be32 reserved2[10]; + __be32 reserved3[10]; __be64 amr; - __be32 reserved3[3]; + __be32 reserved4[3]; __be32 software_state; }; diff --git a/drivers/scsi/cxlflash/ocxl_hw.c b/drivers/scsi/cxlflash/ocxl_hw.c index e4e0d767b98e..244fc27215dc 100644 --- a/drivers/scsi/cxlflash/ocxl_hw.c +++ b/drivers/scsi/cxlflash/ocxl_hw.c @@ -329,6 +329,7 @@ static int start_context(struct ocxlflash_context *ctx) struct ocxl_hw_afu *afu = ctx->hw_afu; struct ocxl_afu_config *acfg = >acfg; void *link_token = afu->link_token; + struct pci_dev *pdev = afu->pdev; struct device *dev = afu->dev; bool master = ctx->master; struct mm_struct *mm; @@ -360,8 +361,9 @@ static int start_context(struct ocxlflash_context *ctx) mm = current->mm; } - rc = ocxl_link_add_pe(link_token, ctx->pe, pid, 0, 0, mm, - ocxlflash_xsl_fault, ctx); + rc = ocxl_link_add_pe(link_token, ctx->pe, pid, 0, 0, + pci_dev_id(pdev), mm, ocxlflash_xsl_fault, + ctx); if (unlikely(rc)) { dev_err(dev, "%s: ocxl_link_add_pe failed rc=%d\n", __func__, rc); diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h index e013736e275d..3ed736da02c8 100644 --- a/include/misc/ocxl.h +++ b/include/misc/ocxl.h @@ -447,7 +447,7 @@ void ocxl_link_release(struct pci_dev *dev, void *link_handle); * defined */ int ocxl_link_add_pe(void 
*link_handle, int pasid, u32 pidr, u32 tidr, - u64 amr, struct mm_struct *mm, +
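Patch 3/5 fills the PASID into word 2 of the Process Element Entry with `pe->pasid = cpu_to_be32(pasid << (31 - 19));`. The shift looks odd until you recall that the PEE layout uses IBM bit numbering (MSB is bit 0), so a field occupying bits 0..19 of a 32-bit word needs a left shift of (31 - 19) = 12. A small sketch of the packing (byte-swap omitted, since host order suffices to show the shift):

```c
#include <assert.h>
#include <stdint.h>

/* Place a PASID in IBM bits 0..19 of a 32-bit PEE word, as in:
 *     pe->pasid = cpu_to_be32(pasid << (31 - 19));
 * Bits 20..31 (IBM numbering) remain reserved/zero. */
static uint32_t pack_pasid(uint32_t pasid)
{
	return pasid << (31 - 19);
}
```

The bus/device/function word next to it is simpler: the 16-bit BDF from pci_dev_id() goes straight through cpu_to_be16().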
[PATCH V4 0/5] ocxl: MMIO invalidation support
OpenCAPI 4.0/5.0 TLBI/SLBI snooping is not used, due to performance problems caused by the PAU having to process all incoming TLBI/SLBI commands, which causes them to back up on the PowerBus.

When the Address Translation Mode requires TLB operations to be initiated using MMIO registers, a set of registers like the following is used:
• XTS MMIO ATSD0 LPARID register
• XTS MMIO ATSD0 AVA register
• XTS MMIO ATSD0 launch register, write access initiates a shoot down
• XTS MMIO ATSD0 status register

The MMIO based mechanism also blocks the NPU/PAU from snooping TLBIE commands on the PowerBus. The shootdown commands (ATSD) are instead generated using MMIO registers in the NPU/PAU and sent to the device.

Signed-off-by: Christophe Lombard
---
Changelog[v4]
 - Rebase to latest upstream.
 - Correct a typo in page size

Changelog[v3]
 - Rebase to latest upstream.
 - Add page_size argument in pnv_ocxl_tlb_invalidate()
 - Remove double pointer

Changelog[v2]
 - Rebase to latest upstream.
 - Create a set of smaller patches
 - Move the device tree parsing and ioremap() for the shootdown page in a platform-specific file (powernv)
 - Release the shootdown page in release_xsl()
 - Initialize atsd_lock
 - Move the code to initiate the TLB Invalidate command in a platform-specific file (powernv)
 - Use the notifier invalidate_range
---

Christophe Lombard (5):
  ocxl: Assign a register set to a Logical Partition
  ocxl: Initiate a TLB invalidate command
  ocxl: Update the Process Element Entry
  ocxl: Add mmu notifier
  ocxl: Add new kernel traces

 arch/powerpc/include/asm/pnv-ocxl.h   |  54
 arch/powerpc/platforms/powernv/ocxl.c | 114 ++
 drivers/misc/ocxl/context.c           |   4 +-
 drivers/misc/ocxl/link.c              |  70 +++-
 drivers/misc/ocxl/ocxl_internal.h     |   9 +-
 drivers/misc/ocxl/trace.h             |  64 +++
 drivers/scsi/cxlflash/ocxl_hw.c       |   6 +-
 include/misc/ocxl.h                   |   2 +-
 8 files changed, 314 insertions(+), 9 deletions(-)
-- 
2.28.0
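The register set described in the cover letter implies a program/launch/poll sequence per shootdown: write the address into AVA, write the launch register to start the invalidation, then poll status until the device reports completion. A toy model of that handshake — the struct layout and status semantics here are illustrative, not the real XTS MMIO ATSD register definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of one ATSD register set (names from the cover letter). */
struct toy_atsd {
	uint64_t lparid;
	uint64_t ava;    /* address to shoot down */
	uint64_t launch; /* write access initiates the shoot down */
	uint64_t status; /* toy convention: non-zero while busy */
};

static void toy_shootdown(struct toy_atsd *r, uint64_t addr)
{
	r->ava = addr;
	r->launch = 1; /* kick off the TLB invalidate */
	r->status = 1; /* device would report busy... */
	r->status = 0; /* ...then complete (no real latency modeled here) */
	while (r->status)
		;      /* poll until the invalidation completes */
}
```

On real hardware the LPARID register is set up once at map time (patch 1/5), and the AVA/launch/status cycle repeats per invalidated page, serialized by a lock.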
Re: [PATCH net 1/2] ibmvnic: Ensure that SCRQ entry reads are correctly ordered
On 11/24/20 11:43 PM, Michael Ellerman wrote: Thomas Falcon writes: Ensure that received Subordinate Command-Response Queue (SCRQ) entries are properly read in order by the driver. These queues are used in the ibmvnic device to process RX buffer and TX completion descriptors. dma_rmb barriers have been added after checking for a pending descriptor to ensure the correct descriptor entry is checked and after reading the SCRQ descriptor to ensure the entire descriptor is read before processing. Fixes: 032c5e828 ("Driver for IBM System i/p VNIC protocol") Signed-off-by: Thomas Falcon --- drivers/net/ethernet/ibm/ibmvnic.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 2aa40b2..489ed5e 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -2403,6 +2403,8 @@ static int ibmvnic_poll(struct napi_struct *napi, int budget) if (!pending_scrq(adapter, adapter->rx_scrq[scrq_num])) break; + /* ensure that we do not prematurely exit the polling loop */ + dma_rmb(); I'd be happier if these comments were more specific about which read(s) they are ordering vs which other read(s). I'm sure it's obvious to you, but it may not be to a future author, and/or after the code has been refactored over time. Thank you for reviewing! I will submit a v2 soon with clearer comments on the reads being ordered here. 
Thanks, Tom next = ibmvnic_next_scrq(adapter, adapter->rx_scrq[scrq_num]); rx_buff = (struct ibmvnic_rx_buff *)be64_to_cpu(next-> @@ -3098,6 +3100,9 @@ static int ibmvnic_complete_tx(struct ibmvnic_adapter *adapter, unsigned int pool = scrq->pool_index; int num_entries = 0; + /* ensure that the correct descriptor entry is read */ + dma_rmb(); + next = ibmvnic_next_scrq(adapter, scrq); for (i = 0; i < next->tx_comp.num_comps; i++) { if (next->tx_comp.rcs[i]) { @@ -3498,6 +3503,9 @@ static union sub_crq *ibmvnic_next_scrq(struct ibmvnic_adapter *adapter, } spin_unlock_irqrestore(>lock, flags); + /* ensure that the entire SCRQ descriptor is read */ + dma_rmb(); + return entry; } cheers
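The ordering requirement behind all three dma_rmb() sites is the same: the device writes the descriptor body first and the valid/pending flag last, so the driver must not consume body fields read before (or speculatively alongside) the flag. A userspace sketch of the pattern, with a compiler barrier standing in for dma_rmb() (which on a real kernel additionally orders reads from DMA-coherent memory):

```c
#include <assert.h>

/* Stand-in for dma_rmb(); illustrative of placement only. */
#define toy_dma_rmb() __asm__ __volatile__("" ::: "memory")

struct toy_scrq_desc {
	volatile int in_use;  /* written LAST by the device */
	volatile int payload; /* written FIRST by the device */
};

/* Check the pending flag, then issue the read barrier BEFORE reading the
 * rest of the descriptor, so a stale payload read cannot be paired with a
 * fresh flag read. Returns 1 and fills *out if a descriptor was pending. */
static int toy_poll_one(struct toy_scrq_desc *d, int *out)
{
	if (!d->in_use)
		return 0;  /* nothing pending, exit the polling loop */
	toy_dma_rmb();     /* order flag read before payload read */
	*out = d->payload;
	return 1;
}
```

This matches Michael's review point: a comment at each barrier should name which read is being ordered against which (flag vs. descriptor body), not just say "ensure ordering".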
[PATCH v3 1/2] genirq/irqdomain: Add an irq_create_mapping_affinity() function
There is currently no way to convey the affinity of an interrupt via irq_create_mapping(), which creates issues for devices that expect that affinity to be managed by the kernel. In order to sort this out, rename irq_create_mapping() to irq_create_mapping_affinity() with an additional affinity parameter that can conveniently passed down to irq_domain_alloc_descs(). irq_create_mapping() is then re-implemented as a wrapper around irq_create_mapping_affinity(). Signed-off-by: Laurent Vivier Reviewed-by: Greg Kurz --- include/linux/irqdomain.h | 12 ++-- kernel/irq/irqdomain.c| 13 - 2 files changed, 18 insertions(+), 7 deletions(-) diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h index 71535e87109f..ea5a337e0f8b 100644 --- a/include/linux/irqdomain.h +++ b/include/linux/irqdomain.h @@ -384,11 +384,19 @@ extern void irq_domain_associate_many(struct irq_domain *domain, extern void irq_domain_disassociate(struct irq_domain *domain, unsigned int irq); -extern unsigned int irq_create_mapping(struct irq_domain *host, - irq_hw_number_t hwirq); +extern unsigned int irq_create_mapping_affinity(struct irq_domain *host, + irq_hw_number_t hwirq, + const struct irq_affinity_desc *affinity); extern unsigned int irq_create_fwspec_mapping(struct irq_fwspec *fwspec); extern void irq_dispose_mapping(unsigned int virq); +static inline unsigned int irq_create_mapping(struct irq_domain *host, + irq_hw_number_t hwirq) +{ + return irq_create_mapping_affinity(host, hwirq, NULL); +} + + /** * irq_linear_revmap() - Find a linux irq from a hw irq number. 
* @domain: domain owning this hardware interrupt diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c index cf8b374b892d..e4ca69608f3b 100644 --- a/kernel/irq/irqdomain.c +++ b/kernel/irq/irqdomain.c @@ -624,17 +624,19 @@ unsigned int irq_create_direct_mapping(struct irq_domain *domain) EXPORT_SYMBOL_GPL(irq_create_direct_mapping); /** - * irq_create_mapping() - Map a hardware interrupt into linux irq space + * irq_create_mapping_affinity() - Map a hardware interrupt into linux irq space * @domain: domain owning this hardware interrupt or NULL for default domain * @hwirq: hardware irq number in that domain space + * @affinity: irq affinity * * Only one mapping per hardware interrupt is permitted. Returns a linux * irq number. * If the sense/trigger is to be specified, set_irq_type() should be called * on the number returned from that call. */ -unsigned int irq_create_mapping(struct irq_domain *domain, - irq_hw_number_t hwirq) +unsigned int irq_create_mapping_affinity(struct irq_domain *domain, + irq_hw_number_t hwirq, + const struct irq_affinity_desc *affinity) { struct device_node *of_node; int virq; @@ -660,7 +662,8 @@ unsigned int irq_create_mapping(struct irq_domain *domain, } /* Allocate a virtual interrupt number */ - virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node), NULL); + virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node), + affinity); if (virq <= 0) { pr_debug("-> virq allocation failed\n"); return 0; @@ -676,7 +679,7 @@ unsigned int irq_create_mapping(struct irq_domain *domain, return virq; } -EXPORT_SYMBOL_GPL(irq_create_mapping); +EXPORT_SYMBOL_GPL(irq_create_mapping_affinity); /** * irq_create_strict_mappings() - Map a range of hw irqs to fixed linux irqs -- 2.28.0
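The refactoring pattern in this patch — rename the function to the extended form, then keep the old name as a static inline wrapper passing NULL — preserves every existing caller while exporting only the new symbol. A minimal sketch of the same pattern on a toy API (names are invented; the real signatures are irq_create_mapping()/irq_create_mapping_affinity()):

```c
#include <assert.h>
#include <stddef.h>

/* Toy affinity descriptor; the real one is struct irq_affinity_desc. */
struct toy_affinity { int first_cpu; };

/* New, extended entry point: takes the extra affinity parameter and is the
 * one actually implemented (and, in the kernel, EXPORT_SYMBOL_GPL'd). */
static unsigned int toy_create_mapping_affinity(unsigned int hwirq,
						const struct toy_affinity *aff)
{
	/* toy behaviour: encode whether an affinity was supplied */
	return hwirq * 2 + (aff ? 1 : 0);
}

/* Old name kept as a thin inline wrapper, so existing callers compile
 * unchanged and get the NULL-affinity behaviour they always had. */
static inline unsigned int toy_create_mapping(unsigned int hwirq)
{
	return toy_create_mapping_affinity(hwirq, NULL);
}
```

Because the wrapper is static inline in the header, no new exported symbol is needed for the legacy name, which is why the EXPORT_SYMBOL_GPL line simply moves to the new function.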
[PATCH v3 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()
With virtio multiqueue, normally each queue IRQ is mapped to a CPU. But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity") this is broken on pseries. The affinity is correctly computed in msi_desc but this is not applied to the system IRQs. It appears the affinity is correctly passed to rtas_setup_msi_irqs() but lost at this point and never passed to irq_domain_alloc_descs() (see commit 06ee6d571f0e ("genirq: Add affinity hint to irq allocation")) because irq_create_mapping() doesn't take an affinity parameter. As the previous patch has added the affinity parameter to irq_create_mapping() we can forward the affinity from rtas_setup_msi_irqs() to irq_domain_alloc_descs(). With this change, the virtqueues are correctly dispatched between the CPUs on pseries. BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1702939 Signed-off-by: Laurent Vivier Reviewed-by: Greg Kurz --- arch/powerpc/platforms/pseries/msi.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c index 133f6adcb39c..b3ac2455faad 100644 --- a/arch/powerpc/platforms/pseries/msi.c +++ b/arch/powerpc/platforms/pseries/msi.c @@ -458,7 +458,8 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type) return hwirq; } - virq = irq_create_mapping(NULL, hwirq); + virq = irq_create_mapping_affinity(NULL, hwirq, + entry->affinity); if (!virq) { pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq); -- 2.28.0
[PATCH v3 0/2] powerpc/pseries: fix MSI/X IRQ affinity on pseries
With virtio, in multiqueue case, each queue IRQ is normally bound to a different CPU using the affinity mask. This works fine on x86_64 but totally ignored on pseries. This is not obvious at first look because irqbalance is doing some balancing to improve that. It appears that the "managed" flag set in the MSI entry is never copied to the system IRQ entry. This series passes the affinity mask from rtas_setup_msi_irqs() to irq_domain_alloc_descs() by adding an affinity parameter to irq_create_mapping(). The first patch adds the parameter (no functional change), the second patch passes the actual affinity mask to irq_create_mapping() in rtas_setup_msi_irqs(). For instance, with 32 CPUs VM and 32 queues virtio-scsi interface: ... -smp 32 -device virtio-scsi-pci,id=virtio_scsi_pci0,num_queues=32 for IRQ in $(grep virtio2-request /proc/interrupts |cut -d: -f1); do for file in /proc/irq/$IRQ/ ; do echo -n "IRQ: $(basename $file) CPU: " ; cat $file/smp_affinity_list done done Without the patch (and without irqbalanced) IRQ: 268 CPU: 0-31 IRQ: 269 CPU: 0-31 IRQ: 270 CPU: 0-31 IRQ: 271 CPU: 0-31 IRQ: 272 CPU: 0-31 IRQ: 273 CPU: 0-31 IRQ: 274 CPU: 0-31 IRQ: 275 CPU: 0-31 IRQ: 276 CPU: 0-31 IRQ: 277 CPU: 0-31 IRQ: 278 CPU: 0-31 IRQ: 279 CPU: 0-31 IRQ: 280 CPU: 0-31 IRQ: 281 CPU: 0-31 IRQ: 282 CPU: 0-31 IRQ: 283 CPU: 0-31 IRQ: 284 CPU: 0-31 IRQ: 285 CPU: 0-31 IRQ: 286 CPU: 0-31 IRQ: 287 CPU: 0-31 IRQ: 288 CPU: 0-31 IRQ: 289 CPU: 0-31 IRQ: 290 CPU: 0-31 IRQ: 291 CPU: 0-31 IRQ: 292 CPU: 0-31 IRQ: 293 CPU: 0-31 IRQ: 294 CPU: 0-31 IRQ: 295 CPU: 0-31 IRQ: 296 CPU: 0-31 IRQ: 297 CPU: 0-31 IRQ: 298 CPU: 0-31 IRQ: 299 CPU: 0-31 With the patch: IRQ: 265 CPU: 0 IRQ: 266 CPU: 1 IRQ: 267 CPU: 2 IRQ: 268 CPU: 3 IRQ: 269 CPU: 4 IRQ: 270 CPU: 5 IRQ: 271 CPU: 6 IRQ: 272 CPU: 7 IRQ: 273 CPU: 8 IRQ: 274 CPU: 9 IRQ: 275 CPU: 10 IRQ: 276 CPU: 11 IRQ: 277 CPU: 12 IRQ: 278 CPU: 13 IRQ: 279 CPU: 14 IRQ: 280 CPU: 15 IRQ: 281 CPU: 16 IRQ: 282 CPU: 17 IRQ: 283 CPU: 18 IRQ: 284 CPU: 19 IRQ: 285 CPU: 20 
IRQ: 286 CPU: 21 IRQ: 287 CPU: 22 IRQ: 288 CPU: 23 IRQ: 289 CPU: 24 IRQ: 290 CPU: 25 IRQ: 291 CPU: 26 IRQ: 292 CPU: 27 IRQ: 293 CPU: 28 IRQ: 294 CPU: 29 IRQ: 295 CPU: 30 IRQ: 299 CPU: 31 This matches what we have on an x86_64 system. v3: update changelog of PATCH 1 with comments from Thomas Gleixner and Marc Zyngier. v2: add a wrapper around original irq_create_mapping() with the affinity parameter. Update comments Laurent Vivier (2): genirq/irqdomain: Add an irq_create_mapping_affinity() function powerpc/pseries: pass MSI affinity to irq_create_mapping() arch/powerpc/platforms/pseries/msi.c | 3 ++- include/linux/irqdomain.h| 12 ++-- kernel/irq/irqdomain.c | 13 - 3 files changed, 20 insertions(+), 8 deletions(-) -- 2.28.0
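The "with the patch" listing shows the managed-affinity outcome: with as many queues as CPUs, queue i lands on CPU i. A toy of that spread — note the real kernel uses irq_create_affinity_masks() and is more subtle (housekeeping CPUs, NUMA-aware packing), so plain round-robin is only the intuition, not the actual algorithm:

```c
#include <assert.h>

/* Simplified queue-to-CPU spread: 1:1 when nqueues == ncpus (the case in
 * the cover letter), wrapping round-robin otherwise. */
static int toy_queue_cpu(int queue, int ncpus)
{
	return queue % ncpus;
}
```

With 32 queues and 32 CPUs this reproduces the one-CPU-per-IRQ affinity lists shown above, matching the x86_64 behaviour the series is aligning with.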
Re: [PATCH v2 1/2] genirq: add an irq_create_mapping_affinity() function
On 25/11/2020 15:54, Marc Zyngier wrote: > On 2020-11-25 14:09, Laurent Vivier wrote: >> On 25/11/2020 14:20, Thomas Gleixner wrote: >>> Laurent, >>> >>> On Wed, Nov 25 2020 at 12:16, Laurent Vivier wrote: >>> >>> The proper subsystem prefix is: 'genirq/irqdomain:' and the first letter >>> after the colon wants to be uppercase. >> >> Ok. >> This function adds an affinity parameter to irq_create_mapping(). This parameter is needed to pass it to irq_domain_alloc_descs(). >>> >>> A changelog has to explain the WHY. 'The parameter is needed' is not >>> really useful information. >>> >> >> The reason of this change is explained in PATCH 2. >> >> I have two patches, one to change the interface with no functional >> change (PATCH 1) and >> one to fix the problem (PATCH 2). Moreover they don't cover the same >> subsystems. >> >> I can either: >> - merge the two patches >> - or make a reference in the changelog of PATCH 1 to PATCH 2 >> (something like "(see folowing patch "powerpc/pseries: pass MSI affinity to >> irq_create_mapping()")") >> - or copy some information from PATCH 2 >> (something like "this parameter is needed by rtas_setup_msi_irqs() >> to pass the affinity >> to irq_domain_alloc_descs() to fix multiqueue affinity") >> >> What do you prefer? > > How about something like this for the first patch: > > "There is currently no way to convey the affinity of an interrupt > via irq_create_mapping(), which creates issues for devices that > expect that affinity to be managed by the kernel. > > In order to sort this out, rename irq_create_mapping() to > irq_create_mapping_affinity() with an additional affinity parameter > that can conveniently passed down to irq_domain_alloc_descs(). > > irq_create_mapping() is then re-implemented as a wrapper around > irq_create_mapping_affinity()." It looks perfect. I update the changelog with that. Thanks, Laurent
Re: [PATCH v6 03/22] powerpc/book3s64/kuap/kuep: Make KUAP and KUEP a subfeature of PPC_MEM_KEYS
Christophe Leroy writes: > Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit : > diff --git a/arch/powerpc/mm/book3s64/pkeys.c > b/arch/powerpc/mm/book3s64/pkeys.c >> index b1d091a97611..7dc71f85683d 100644 >> --- a/arch/powerpc/mm/book3s64/pkeys.c >> +++ b/arch/powerpc/mm/book3s64/pkeys.c >> @@ -89,12 +89,14 @@ static int scan_pkey_feature(void) >> } >> } >> >> +#ifdef CONFIG_PPC_MEM_KEYS >> /* >> * Adjust the upper limit, based on the number of bits supported by >> * arch-neutral code. >> */ >> pkeys_total = min_t(int, pkeys_total, >> ((ARCH_VM_PKEY_FLAGS >> VM_PKEY_SHIFT) + 1)); > > I don't think we need an #ifdef here. I thing an 'if > (IS_ENABLED(CONFIG_PPC_MEM_KEYS))' should make it. ppc64/arch/powerpc/mm/book3s64/pkeys.c: In function ‘scan_pkey_feature’: ppc64/arch/powerpc/mm/book3s64/pkeys.c:98:33: error: ‘VM_PKEY_SHIFT’ undeclared (first use in this function) 98 | ((ARCH_VM_PKEY_FLAGS >> VM_PKEY_SHIFT) + 1)); | ^ pkey headers only include arch headers if PPC_MEM_KEYS is enabled. ie, #ifdef CONFIG_ARCH_HAS_PKEYS #include #else /* ! CONFIG_ARCH_HAS_PKEYS */ #define arch_max_pkey() (1) #define execute_only_pkey(mm) (0) #define arch_override_mprotect_pkey(vma, prot, pkey) (0) #define PKEY_DEDICATED_EXECUTE_ONLY 0 #define ARCH_VM_PKEY_FLAGS 0 .. Sorting that out should be another patch series. > >> +#endif >> return pkeys_total; >> }
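The build failure above illustrates the one case where IS_ENABLED() cannot replace an #ifdef: the dead branch is still parsed and name-resolved, so every identifier it mentions (here VM_PKEY_SHIFT) must be declared in all configurations. A reduced demonstration of the working direction of the pattern — the branch is compiled but folded away when the config constant is 0:

```c
#include <assert.h>

/* IS_ENABLED-style check reduced to its essence: a compile-time 0/1. */
#define TOY_CONFIG_FEATURE 0

static int feature_limit(int total)
{
	if (TOY_CONFIG_FEATURE) {
		/* Dead code when the config is off, but still parsed:
		 * every identifier used here must exist in ALL configs,
		 * which is exactly why the pkeys code keeps its #ifdef
		 * while VM_PKEY_SHIFT may be undeclared. */
		total = total > 8 ? 8 : total;
	}
	return total;
}
```

The alternative Aneesh alludes to — always providing fallback definitions (as the generic pkeys header does with `#define ARCH_VM_PKEY_FLAGS 0` etc.) — would let IS_ENABLED() work here, but untangling those headers is a separate series.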
Re: [PATCH v2 1/2] genirq: add an irq_create_mapping_affinity() function
On 2020-11-25 14:09, Laurent Vivier wrote: On 25/11/2020 14:20, Thomas Gleixner wrote: Laurent, On Wed, Nov 25 2020 at 12:16, Laurent Vivier wrote: The proper subsystem prefix is: 'genirq/irqdomain:' and the first letter after the colon wants to be uppercase. Ok. This function adds an affinity parameter to irq_create_mapping(). This parameter is needed to pass it to irq_domain_alloc_descs(). A changelog has to explain the WHY. 'The parameter is needed' is not really useful information. The reason of this change is explained in PATCH 2. I have two patches, one to change the interface with no functional change (PATCH 1) and one to fix the problem (PATCH 2). Moreover they don't cover the same subsystems. I can either: - merge the two patches - or make a reference in the changelog of PATCH 1 to PATCH 2 (something like "(see folowing patch "powerpc/pseries: pass MSI affinity to irq_create_mapping()")") - or copy some information from PATCH 2 (something like "this parameter is needed by rtas_setup_msi_irqs() to pass the affinity to irq_domain_alloc_descs() to fix multiqueue affinity") What do you prefer? How about something like this for the first patch: "There is currently no way to convey the affinity of an interrupt via irq_create_mapping(), which creates issues for devices that expect that affinity to be managed by the kernel. In order to sort this out, rename irq_create_mapping() to irq_create_mapping_affinity() with an additional affinity parameter that can conveniently passed down to irq_domain_alloc_descs(). irq_create_mapping() is then re-implemented as a wrapper around irq_create_mapping_affinity()." Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH V3 5/5] ocxl: Add new kernel traces
On 24/11/2020 10:58, Christophe Lombard wrote: Add specific kernel traces which provide information on mmu notifier and on pages range. Signed-off-by: Christophe Lombard --- Acked-by: Frederic Barrat drivers/misc/ocxl/link.c | 4 +++ drivers/misc/ocxl/trace.h | 64 +++ 2 files changed, 68 insertions(+) diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c index 129d4eddc4d2..ab039c115381 100644 --- a/drivers/misc/ocxl/link.c +++ b/drivers/misc/ocxl/link.c @@ -499,6 +499,7 @@ static void invalidate_range(struct mmu_notifier *mn, unsigned long addr, pid, page_size = PAGE_SIZE; pid = mm->context.id; + trace_ocxl_mmu_notifier_range(start, end, pid); spin_lock(>atsd_lock); for (addr = start; addr < end; addr += page_size) @@ -590,6 +591,7 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr, /* Use MMIO registers for the TLB Invalidate * operations. */ + trace_ocxl_init_mmu_notifier(pasid, mm->context.id); mmu_notifier_register(_data->mmu_notifier, mm); } } @@ -725,6 +727,8 @@ int ocxl_link_remove_pe(void *link_handle, int pasid) } else { if (pe_data->mm) { if (link->arva) { + trace_ocxl_release_mmu_notifier(pasid, + pe_data->mm->context.id); mmu_notifier_unregister(_data->mmu_notifier, pe_data->mm); spin_lock(>atsd_lock); diff --git a/drivers/misc/ocxl/trace.h b/drivers/misc/ocxl/trace.h index 17e21cb2addd..a33a5094ff6c 100644 --- a/drivers/misc/ocxl/trace.h +++ b/drivers/misc/ocxl/trace.h @@ -8,6 +8,70 @@ #include + +TRACE_EVENT(ocxl_mmu_notifier_range, + TP_PROTO(unsigned long start, unsigned long end, unsigned long pidr), + TP_ARGS(start, end, pidr), + + TP_STRUCT__entry( + __field(unsigned long, start) + __field(unsigned long, end) + __field(unsigned long, pidr) + ), + + TP_fast_assign( + __entry->start = start; + __entry->end = end; + __entry->pidr = pidr; + ), + + TP_printk("start=0x%lx end=0x%lx pidr=0x%lx", + __entry->start, + __entry->end, + __entry->pidr + ) +); + +TRACE_EVENT(ocxl_init_mmu_notifier, + TP_PROTO(int pasid, unsigned 
long pidr), + TP_ARGS(pasid, pidr), + + TP_STRUCT__entry( + __field(int, pasid) + __field(unsigned long, pidr) + ), + + TP_fast_assign( + __entry->pasid = pasid; + __entry->pidr = pidr; + ), + + TP_printk("pasid=%d, pidr=0x%lx", + __entry->pasid, + __entry->pidr + ) +); + +TRACE_EVENT(ocxl_release_mmu_notifier, + TP_PROTO(int pasid, unsigned long pidr), + TP_ARGS(pasid, pidr), + + TP_STRUCT__entry( + __field(int, pasid) + __field(unsigned long, pidr) + ), + + TP_fast_assign( + __entry->pasid = pasid; + __entry->pidr = pidr; + ), + + TP_printk("pasid=%d, pidr=0x%lx", + __entry->pasid, + __entry->pidr + ) +); + DECLARE_EVENT_CLASS(ocxl_context, TP_PROTO(pid_t pid, void *spa, int pasid, u32 pidr, u32 tidr), TP_ARGS(pid, spa, pasid, pidr, tidr),
Re: [PATCH V3 4/5] ocxl: Add mmu notifier
On 24/11/2020 10:58, Christophe Lombard wrote: Add invalidate_range mmu notifier, when required (ATSD access of MMIO registers is available), to initiate TLB invalidation commands. For the time being, the ATSD0 set of registers is used by default. The pasid and bdf values have to be configured in the Process Element Entry. The PEE must be set up to match the BDF/PASID of the AFU. Signed-off-by: Christophe Lombard --- That looks ok too. Acked-by: Frederic Barrat drivers/misc/ocxl/link.c | 62 +++- 1 file changed, 61 insertions(+), 1 deletion(-) diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c index 77381dda2c45..129d4eddc4d2 100644 --- a/drivers/misc/ocxl/link.c +++ b/drivers/misc/ocxl/link.c @@ -2,8 +2,10 @@ // Copyright 2017 IBM Corp. #include #include +#include #include #include +#include #include #include #include @@ -33,6 +35,7 @@ #define SPA_PE_VALID 0x8000 +struct ocxl_link; struct pe_data { struct mm_struct *mm; @@ -41,6 +44,8 @@ struct pe_data { /* opaque pointer to be passed to the above callback */ void *xsl_err_data; struct rcu_head rcu; + struct ocxl_link *link; + struct mmu_notifier mmu_notifier; }; struct spa { @@ -83,6 +88,8 @@ struct ocxl_link { int domain; int bus; int dev; + void __iomem *arva; /* ATSD register virtual address */ + spinlock_t atsd_lock; /* to serialize shootdowns */ atomic_t irq_available; struct spa *spa; void *platform_data; @@ -388,6 +395,7 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, struct ocxl_link **out_l link->bus = dev->bus->number; link->dev = PCI_SLOT(dev->devfn); atomic_set(>irq_available, MAX_IRQ_PER_LINK); + spin_lock_init(>atsd_lock); rc = alloc_spa(dev, link); if (rc) @@ -403,6 +411,13 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, struct ocxl_link **out_l if (rc) goto err_xsl_irq; + /* if link->arva is not defeined, MMIO registers are not used to +* generate TLB invalidate. PowerBus snooping is enabled. +* Otherwise, PowerBus snooping is disabled. 
TLB Invalidates are +* initiated using MMIO registers. +*/ + pnv_ocxl_map_lpar(dev, mfspr(SPRN_LPID), 0, >arva); + *out_link = link; return 0; @@ -454,6 +469,11 @@ static void release_xsl(struct kref *ref) { struct ocxl_link *link = container_of(ref, struct ocxl_link, ref); + if (link->arva) { + pnv_ocxl_unmap_lpar(link->arva); + link->arva = NULL; + } + list_del(>list); /* call platform code before releasing data */ pnv_ocxl_spa_release(link->platform_data); @@ -470,6 +490,26 @@ void ocxl_link_release(struct pci_dev *dev, void *link_handle) } EXPORT_SYMBOL_GPL(ocxl_link_release); +static void invalidate_range(struct mmu_notifier *mn, +struct mm_struct *mm, +unsigned long start, unsigned long end) +{ + struct pe_data *pe_data = container_of(mn, struct pe_data, mmu_notifier); + struct ocxl_link *link = pe_data->link; + unsigned long addr, pid, page_size = PAGE_SIZE; + + pid = mm->context.id; + + spin_lock(>atsd_lock); + for (addr = start; addr < end; addr += page_size) + pnv_ocxl_tlb_invalidate(link->arva, pid, addr, page_size); + spin_unlock(>atsd_lock); +} + +static const struct mmu_notifier_ops ocxl_mmu_notifier_ops = { + .invalidate_range = invalidate_range, +}; + static u64 calculate_cfg_state(bool kernel) { u64 state; @@ -526,6 +566,8 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr, pe_data->mm = mm; pe_data->xsl_err_cb = xsl_err_cb; pe_data->xsl_err_data = xsl_err_data; + pe_data->link = link; + pe_data->mmu_notifier.ops = _mmu_notifier_ops; memset(pe, 0, sizeof(struct ocxl_process_element)); pe->config_state = cpu_to_be64(calculate_cfg_state(pidr == 0)); @@ -542,8 +584,16 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr, * by the nest MMU. If we have a kernel context, TLBIs are * already global. */ - if (mm) + if (mm) { mm_context_add_copro(mm); + if (link->arva) { + /* Use MMIO registers for the TLB Invalidate +* operations. 
+*/ + mmu_notifier_register(_data->mmu_notifier, mm); + } + } + /* * Barrier is to make sure PE is visible in the SPA before it * is used by the device. It also helps with the global TLBI @@ -674,6 +724,16 @@ int ocxl_link_remove_pe(void *link_handle, int pasid) WARN(1,
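The invalidate_range() notifier above walks the range one page at a time, issuing a shootdown per page under atsd_lock. A toy of just the loop arithmetic — it counts invalidations instead of touching MMIO, and omits the locking that the real code needs to serialize the launch/status register sequence:

```c
#include <assert.h>

/* Walk [start, end) in page_size steps, one shootdown per page, as in
 * ocxl's invalidate_range(). Returns how many invalidations would be
 * issued (each would be a pnv_ocxl_tlb_invalidate() call). */
static int toy_invalidate_range(unsigned long start, unsigned long end,
				unsigned long page_size)
{
	unsigned long addr;
	int count = 0;

	for (addr = start; addr < end; addr += page_size)
		count++;
	return count;
}
```

Using PAGE_SIZE unconditionally is the simple-but-correct choice here: for a huge-page range it issues more ATSD operations than strictly necessary, but never misses a translation.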
Re: [PATCH v6 10/22] powerpc/book3s64/pkeys: Store/restore userspace AMR/IAMR correctly on entry and exit from kernel
Re: [PATCH V3 3/5] ocxl: Update the Process Element Entry
On 24/11/2020 10:58, Christophe Lombard wrote: To complete the MMIO based mechanism, the fields: PASID, bus, device and function of the Process Element Entry have to be filled. (See OpenCAPI Power Platform Architecture document) Hypervisor Process Element Entry Word 0 1 7 8 .. 12 13 ..15 16 19 20 ... 31 0 OSL Configuration State (0:31) 1 OSL Configuration State (32:63) 2 PASID |Reserved 3 Bus | Device|Function |Reserved 4 Reserved 5 Reserved 6 Signed-off-by: Christophe Lombard --- LGTM Acked-by: Frederic Barrat drivers/misc/ocxl/context.c | 4 +++- drivers/misc/ocxl/link.c | 4 +++- drivers/misc/ocxl/ocxl_internal.h | 9 ++--- drivers/scsi/cxlflash/ocxl_hw.c | 6 -- include/misc/ocxl.h | 2 +- 5 files changed, 17 insertions(+), 8 deletions(-) diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c index c21f65a5c762..9eb0d93b01c6 100644 --- a/drivers/misc/ocxl/context.c +++ b/drivers/misc/ocxl/context.c @@ -70,6 +70,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr, struct mm_struct *mm) { int rc; unsigned long pidr = 0; + struct pci_dev *dev; // Locks both status & tidr mutex_lock(>status_mutex); @@ -81,8 +82,9 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr, struct mm_struct *mm) if (mm) pidr = mm->context.id; + dev = to_pci_dev(ctx->afu->fn->dev.parent); rc = ocxl_link_add_pe(ctx->afu->fn->link, ctx->pasid, pidr, ctx->tidr, - amr, mm, xsl_fault_error, ctx); + amr, pci_dev_id(dev), mm, xsl_fault_error, ctx); if (rc) goto out; diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c index fd73d3bc0eb6..77381dda2c45 100644 --- a/drivers/misc/ocxl/link.c +++ b/drivers/misc/ocxl/link.c @@ -494,7 +494,7 @@ static u64 calculate_cfg_state(bool kernel) } int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr, - u64 amr, struct mm_struct *mm, + u64 amr, u16 bdf, struct mm_struct *mm, void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr), void *xsl_err_data) { @@ -529,6 +529,8 @@ int ocxl_link_add_pe(void 
*link_handle, int pasid, u32 pidr, u32 tidr, memset(pe, 0, sizeof(struct ocxl_process_element)); pe->config_state = cpu_to_be64(calculate_cfg_state(pidr == 0)); + pe->pasid = cpu_to_be32(pasid << (31 - 19)); + pe->bdf = cpu_to_be16(bdf); pe->lpid = cpu_to_be32(mfspr(SPRN_LPID)); pe->pid = cpu_to_be32(pidr); pe->tid = cpu_to_be32(tidr); diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h index 0bad0a123af6..10125a22d5a5 100644 --- a/drivers/misc/ocxl/ocxl_internal.h +++ b/drivers/misc/ocxl/ocxl_internal.h @@ -84,13 +84,16 @@ struct ocxl_context { struct ocxl_process_element { __be64 config_state; - __be32 reserved1[11]; + __be32 pasid; + __be16 bdf; + __be16 reserved1; + __be32 reserved2[9]; __be32 lpid; __be32 tid; __be32 pid; - __be32 reserved2[10]; + __be32 reserved3[10]; __be64 amr; - __be32 reserved3[3]; + __be32 reserved4[3]; __be32 software_state; }; diff --git a/drivers/scsi/cxlflash/ocxl_hw.c b/drivers/scsi/cxlflash/ocxl_hw.c index e4e0d767b98e..244fc27215dc 100644 --- a/drivers/scsi/cxlflash/ocxl_hw.c +++ b/drivers/scsi/cxlflash/ocxl_hw.c @@ -329,6 +329,7 @@ static int start_context(struct ocxlflash_context *ctx) struct ocxl_hw_afu *afu = ctx->hw_afu; struct ocxl_afu_config *acfg = >acfg; void *link_token = afu->link_token; + struct pci_dev *pdev = afu->pdev; struct device *dev = afu->dev; bool master = ctx->master; struct mm_struct *mm; @@ -360,8 +361,9 @@ static int start_context(struct ocxlflash_context *ctx) mm = current->mm; } - rc = ocxl_link_add_pe(link_token, ctx->pe, pid, 0, 0, mm, - ocxlflash_xsl_fault, ctx); + rc = ocxl_link_add_pe(link_token, ctx->pe, pid, 0, 0, + pci_dev_id(pdev), mm, ocxlflash_xsl_fault, + ctx); if (unlikely(rc)) { dev_err(dev, "%s: ocxl_link_add_pe failed rc=%d\n", __func__, rc); diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h index e013736e275d..3ed736da02c8 100644 --- a/include/misc/ocxl.h +++ b/include/misc/ocxl.h @@ -447,7 +447,7 @@ void ocxl_link_release(struct pci_dev *dev, void 
*link_handle); * defined */ int ocxl_link_add_pe(void *link_handle, int pasid, u32
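The two packing operations in the patch above, `pe->bdf = cpu_to_be16(bdf)` fed by `pci_dev_id()` and `pe->pasid = cpu_to_be32(pasid << (31 - 19))`, can be sketched in plain C. The helpers below are stand-ins, not the kernel's: they model `pci_dev_id()` packing bus and devfn into 16 bits, and the left-justification of the 20-bit PASID into bits 0..19 (IBM big-endian bit numbering) of word 2 of the Process Element Entry, which is what the shift by `(31 - 19)` achieves before the byte swap.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for pci_dev_id(): bus number in the high byte, devfn in the
 * low byte, giving the 16-bit BDF stored in pe->bdf. */
static uint16_t bdf_from_bus_devfn(uint8_t bus, uint8_t devfn)
{
	return ((uint16_t)bus << 8) | devfn;
}

/* Left-justify the 20-bit PASID into IBM bits 0..19 of a 32-bit word,
 * matching pasid << (31 - 19) in ocxl_link_add_pe(). */
static uint32_t pe_pasid_word(uint32_t pasid)
{
	return pasid << (31 - 19);
}
```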
Re: [PATCH v2 1/2] genirq: add an irq_create_mapping_affinity() function
On 25/11/2020 14:20, Thomas Gleixner wrote: > Laurent, > > On Wed, Nov 25 2020 at 12:16, Laurent Vivier wrote: > > The proper subsystem prefix is: 'genirq/irqdomain:' and the first letter > after the colon wants to be uppercase. Ok. >> This function adds an affinity parameter to irq_create_mapping(). >> This parameter is needed to pass it to irq_domain_alloc_descs(). > > A changelog has to explain the WHY. 'The parameter is needed' is not > really useful information. > The reason for this change is explained in PATCH 2. I have two patches, one to change the interface with no functional change (PATCH 1) and one to fix the problem (PATCH 2). Moreover they don't cover the same subsystems. I can either: - merge the two patches - or make a reference in the changelog of PATCH 1 to PATCH 2 (something like "(see following patch "powerpc/pseries: pass MSI affinity to irq_create_mapping()")") - or copy some information from PATCH 2 (something like "this parameter is needed by rtas_setup_msi_irqs() to pass the affinity to irq_domain_alloc_descs() to fix multiqueue affinity") What do you prefer? Thanks, Laurent
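The shape of the interface split being discussed can be sketched as follows. This is a toy model, not the kernel's code: the types and bodies are stand-ins, and the point is only that the existing `irq_create_mapping()` becomes a thin wrapper passing a NULL affinity descriptor, so current callers are unaffected while a caller such as `rtas_setup_msi_irqs()` can forward a real affinity through to descriptor allocation.

```c
#include <assert.h>
#include <stddef.h>

struct irq_affinity_desc;	/* opaque stand-in */

/* New entry point: affinity is forwarded (in the kernel) to
 * irq_domain_alloc_descs(); here we just return the hwirq as a
 * stand-in for the allocated virq. */
static unsigned int irq_create_mapping_affinity(void *domain,
		unsigned long hwirq,
		const struct irq_affinity_desc *affinity)
{
	(void)domain;
	(void)affinity;
	return (unsigned int)hwirq;
}

/* Old entry point, kept as a wrapper: no functional change for
 * existing callers. */
static unsigned int irq_create_mapping(void *domain, unsigned long hwirq)
{
	return irq_create_mapping_affinity(domain, hwirq, NULL);
}
```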
Re: [PATCH V3 2/5] ocxl: Initiate a TLB invalidate command
On 24/11/2020 10:58, Christophe Lombard wrote: When a TLB Invalidate is required for the Logical Partition, the following sequence has to be performed: 1. Load MMIO ATSD AVA register with the necessary value, if required. 2. Write the MMIO ATSD launch register to initiate the TLB Invalidate command. 3. Poll the MMIO ATSD status register to determine when the TLB Invalidate has been completed. Signed-off-by: Christophe Lombard --- arch/powerpc/include/asm/pnv-ocxl.h | 51 +++ arch/powerpc/platforms/powernv/ocxl.c | 70 +++ 2 files changed, 121 insertions(+) diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h index 60c3c74427d9..9acd1fbf1197 100644 --- a/arch/powerpc/include/asm/pnv-ocxl.h +++ b/arch/powerpc/include/asm/pnv-ocxl.h @@ -3,12 +3,59 @@ #ifndef _ASM_PNV_OCXL_H #define _ASM_PNV_OCXL_H +#include #include #define PNV_OCXL_TL_MAX_TEMPLATE63 #define PNV_OCXL_TL_BITS_PER_RATE 4 #define PNV_OCXL_TL_RATE_BUF_SIZE ((PNV_OCXL_TL_MAX_TEMPLATE+1) * PNV_OCXL_TL_BITS_PER_RATE / 8) +#define PNV_OCXL_ATSD_TIMEOUT 1 + +/* TLB Management Instructions */ +#define PNV_OCXL_ATSD_LNCH 0x00 +/* Radix Invalidate */ +#define PNV_OCXL_ATSD_LNCH_R PPC_BIT(0) +/* Radix Invalidation Control + * 0b00 Just invalidate TLB. + * 0b01 Invalidate just Page Walk Cache. + * 0b10 Invalidate TLB, Page Walk Cache, and any + * caching of Partition and Process Table Entries. + */ +#define PNV_OCXL_ATSD_LNCH_RIC PPC_BITMASK(1, 2) +/* Number and Page Size of translations to be invalidated */ +#define PNV_OCXL_ATSD_LNCH_LPPPC_BITMASK(3, 10) +/* Invalidation Criteria + * 0b00 Invalidate just the target VA. + * 0b01 Invalidate matching PID. 
+ */ +#define PNV_OCXL_ATSD_LNCH_ISPPC_BITMASK(11, 12) +/* 0b1: Process Scope, 0b0: Partition Scope */ +#define PNV_OCXL_ATSD_LNCH_PRS PPC_BIT(13) +/* Invalidation Flag */ +#define PNV_OCXL_ATSD_LNCH_B PPC_BIT(14) +/* Actual Page Size to be invalidated + * 000 4KB + * 101 64KB + * 001 2MB + * 010 1GB + */ +#define PNV_OCXL_ATSD_LNCH_APPPC_BITMASK(15, 17) +/* Defines the large page select + * L=0b0 for 4KB pages + * L=0b1 for large pages) + */ +#define PNV_OCXL_ATSD_LNCH_L PPC_BIT(18) +/* Process ID */ +#define PNV_OCXL_ATSD_LNCH_PID PPC_BITMASK(19, 38) +/* NoFlush – Assumed to be 0b0 */ +#define PNV_OCXL_ATSD_LNCH_F PPC_BIT(39) +#define PNV_OCXL_ATSD_LNCH_OCAPI_SLBIPPC_BIT(40) +#define PNV_OCXL_ATSD_LNCH_OCAPI_SINGLETON PPC_BIT(41) +#define PNV_OCXL_ATSD_AVA 0x08 +#define PNV_OCXL_ATSD_AVA_AVAPPC_BITMASK(0, 51) +#define PNV_OCXL_ATSD_STAT 0x10 + int pnv_ocxl_get_actag(struct pci_dev *dev, u16 *base, u16 *enabled, u16 *supported); int pnv_ocxl_get_pasid_count(struct pci_dev *dev, int *count); @@ -31,4 +78,8 @@ int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle); int pnv_ocxl_map_lpar(struct pci_dev *dev, uint64_t lparid, uint64_t lpcr, void __iomem **arva); void pnv_ocxl_unmap_lpar(void __iomem *arva); +void pnv_ocxl_tlb_invalidate(void __iomem *arva, +unsigned long pid, +unsigned long addr, +unsigned long page_size); #endif /* _ASM_PNV_OCXL_H */ diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c index 57fc1062677b..f665846d2b28 100644 --- a/arch/powerpc/platforms/powernv/ocxl.c +++ b/arch/powerpc/platforms/powernv/ocxl.c @@ -528,3 +528,73 @@ void pnv_ocxl_unmap_lpar(void __iomem *arva) iounmap(arva); } EXPORT_SYMBOL_GPL(pnv_ocxl_unmap_lpar); + +void pnv_ocxl_tlb_invalidate(void __iomem *arva, +unsigned long pid, +unsigned long addr, +unsigned long page_size) +{ + unsigned long timeout = jiffies + (HZ * PNV_OCXL_ATSD_TIMEOUT); + u64 val = 0ull; + int pend; + u8 size; + + if (!(arva)) + return; + + if 
(addr) { + /* load Abbreviated Virtual Address register with +* the necessary value +*/ + val |= FIELD_PREP(PNV_OCXL_ATSD_AVA_AVA, addr >> (63-51)); + out_be64(arva + PNV_OCXL_ATSD_AVA, val); + } + + /* Write access initiates a shoot down to initiate the +* TLB Invalidate command +*/ + val = PNV_OCXL_ATSD_LNCH_R; + if (addr) { + val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_RIC, 0b00); + val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_IS, 0b00); + } else { + val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_RIC, 0b10); + val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_IS, 0b01); + val |= PNV_OCXL_ATSD_LNCH_OCAPI_SINGLETON;
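How the ATSD launch value is composed can be shown with simplified stand-ins for the kernel's `PPC_BIT`/`PPC_BITMASK`/`FIELD_PREP` (IBM bit 0 is the most significant bit of the 64-bit register). The sketch below follows the quoted `pnv_ocxl_tlb_invalidate()` path for a single-VA invalidate (RIC=0b00, IS=0b00, R set); including the PID field here is an assumption, since the quoted excerpt is cut off before the PID/PRS handling.

```c
#include <assert.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the MSB. */
#define PPC_BIT(b)        (1ULL << (63 - (b)))
#define PPC_BITMASK(s, e) ((PPC_BIT(s) - PPC_BIT(e)) + PPC_BIT(s))

#define ATSD_LNCH_R    PPC_BIT(0)
#define ATSD_LNCH_RIC  PPC_BITMASK(1, 2)
#define ATSD_LNCH_IS   PPC_BITMASK(11, 12)
#define ATSD_LNCH_PID  PPC_BITMASK(19, 38)

/* Minimal FIELD_PREP: shift the value to the field's position. */
static uint64_t field_prep(uint64_t mask, uint64_t val)
{
	return (val << __builtin_ctzll(mask)) & mask;
}

/* Launch value for a single-VA radix invalidate, as in the addr != 0
 * branch above; PID placement is assumed, see lead-in. */
static uint64_t atsd_launch_single_va(uint64_t pid)
{
	uint64_t val = ATSD_LNCH_R;
	val |= field_prep(ATSD_LNCH_RIC, 0);	/* just invalidate the TLB */
	val |= field_prep(ATSD_LNCH_IS, 0);	/* target VA only */
	val |= field_prep(ATSD_LNCH_PID, pid);
	return val;
}
```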
Re: [PATCH v6 16/22] powerpc/book3s64/kuap: Improve error reporting with KUAP
Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit : With hash translation use DSISR_KEYFAULT to identify a wrong access. With Radix we look at the AMR value and type of fault. Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/book3s/32/kup.h | 4 +-- arch/powerpc/include/asm/book3s/64/kup.h | 27 arch/powerpc/include/asm/kup.h | 4 +-- arch/powerpc/include/asm/nohash/32/kup-8xx.h | 4 +-- arch/powerpc/mm/fault.c | 2 +- 5 files changed, 29 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/32/kup.h b/arch/powerpc/include/asm/book3s/32/kup.h index 32fd4452e960..b18cd931e325 100644 --- a/arch/powerpc/include/asm/book3s/32/kup.h +++ b/arch/powerpc/include/asm/book3s/32/kup.h @@ -177,8 +177,8 @@ static inline void restore_user_access(unsigned long flags) allow_user_access(to, to, end - addr, KUAP_READ_WRITE); } -static inline bool -bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) +static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long address, + bool is_write, unsigned long error_code) { unsigned long begin = regs->kuap & 0xf000; unsigned long end = regs->kuap << 28; diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h index 4a3d0d601745..2922c442a218 100644 --- a/arch/powerpc/include/asm/book3s/64/kup.h +++ b/arch/powerpc/include/asm/book3s/64/kup.h @@ -301,12 +301,29 @@ static inline void set_kuap(unsigned long value) isync(); } -static inline bool -bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) +#define RADIX_KUAP_BLOCK_READ UL(0x4000) +#define RADIX_KUAP_BLOCK_WRITE UL(0x8000) + +static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long address, + bool is_write, unsigned long error_code) { - return WARN(mmu_has_feature(MMU_FTR_KUAP) && - (regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : AMR_KUAP_BLOCK_READ)), - "Bug: %s fault blocked by AMR!", is_write ? 
"Write" : "Read"); + if (!mmu_has_feature(MMU_FTR_KUAP)) + return false; + + if (radix_enabled()) { + /* +* Will be a storage protection fault. +* Only check the details of AMR[0] +*/ + return WARN((regs->kuap & (is_write ? RADIX_KUAP_BLOCK_WRITE : RADIX_KUAP_BLOCK_READ)), + "Bug: %s fault blocked by AMR!", is_write ? "Write" : "Read"); I think it is pointless to keep the WARN() here. I have a series aiming at removing them. See https://patchwork.ozlabs.org/project/linuxppc-dev/patch/cc9129bdda1dbc2f0a09cf45fece7d0b0e690784.1605541983.git.christophe.le...@csgroup.eu/ + } + /* +* We don't want to WARN here because userspace can setup +* keys such that a kernel access to user address can cause +* fault +*/ + return !!(error_code & DSISR_KEYFAULT); } static __always_inline void allow_user_access(void __user *to, const void __user *from, diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h index a06e50b68d40..952be0414f43 100644 --- a/arch/powerpc/include/asm/kup.h +++ b/arch/powerpc/include/asm/kup.h @@ -59,8 +59,8 @@ void setup_kuap(bool disabled); #else static inline void setup_kuap(bool disabled) { } -static inline bool -bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) +static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long address, + bool is_write, unsigned long error_code) { return false; } diff --git a/arch/powerpc/include/asm/nohash/32/kup-8xx.h b/arch/powerpc/include/asm/nohash/32/kup-8xx.h index 567cdc557402..7bdd9e5b63ed 100644 --- a/arch/powerpc/include/asm/nohash/32/kup-8xx.h +++ b/arch/powerpc/include/asm/nohash/32/kup-8xx.h @@ -60,8 +60,8 @@ static inline void restore_user_access(unsigned long flags) mtspr(SPRN_MD_AP, flags); } -static inline bool -bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) +static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long address, + bool is_write, unsigned long error_code) { return WARN(!((regs->kuap ^ MD_APG_KUAP) & 
0xff00), "Bug: fault blocked by AP register !"); diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c index 0add963a849b..c91621df0c61 100644 --- a/arch/powerpc/mm/fault.c +++ b/arch/powerpc/mm/fault.c @@ -227,7 +227,7 @@ static bool bad_kernel_fault(struct pt_regs *regs, unsigned long error_code, // Read/write fault in a valid region (the exception table search passed // above), but blocked by KUAP is bad, it can never
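The two detection paths in the reworked `bad_kuap_fault()` can be sketched in plain C. Note the archive truncated the constants in the patch text; the full 64-bit values below are assumed from the kernel's `AMR_KUAP_BLOCK_READ`/`AMR_KUAP_BLOCK_WRITE`, and `DSISR_KEYFAULT` is the book3s key-fault bit. On radix the saved AMR in `pt_regs` says whether the access was blocked; on hash translation the hardware reports the key fault directly in the DSISR, and the WARN is deliberately omitted because userspace-configured keys can legitimately trigger it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed full values; the patch text shows them truncated. */
#define RADIX_KUAP_BLOCK_READ  0x4000000000000000ULL
#define RADIX_KUAP_BLOCK_WRITE 0x8000000000000000ULL
#define DSISR_KEYFAULT         0x00200000ULL

/* Radix: inspect AMR[0..1] as saved in pt_regs at interrupt entry. */
static bool radix_bad_kuap_fault(uint64_t saved_amr, bool is_write)
{
	return saved_amr & (is_write ? RADIX_KUAP_BLOCK_WRITE
				     : RADIX_KUAP_BLOCK_READ);
}

/* Hash: the key-fault bit in the error code identifies a wrong access. */
static bool hash_bad_kuap_fault(uint64_t error_code)
{
	return !!(error_code & DSISR_KEYFAULT);
}
```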
Re: [PATCH] ASoC: fsl_xcvr: fix potential resource leak
On Tue, 24 Nov 2020 16:19:57 +0200, Viorel Suman (OSS) wrote: > "fw" variable must be released before return. Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next Thanks! [1/1] ASoC: fsl_xcvr: fix potential resource leak commit: 373c2cebf42772434c8dd0deffc3b3886ea8f1eb All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark
Re: [PATCH v6 11/22] powerpc/book3s64/pkeys: Inherit correctly on fork.
On 11/25/20 7:24 PM, Christophe Leroy wrote: Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit : Child thread.kuap value is inherited from the parent in copy_thread_tls. We still need to make sure when the child returns from a fork in the kernel we start with the kernel default AMR value. Reviewed-by: Sandipan Das Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/kernel/process.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index b6b8a845e454..733680de0ba4 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -1768,6 +1768,17 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, childregs->ppr = DEFAULT_PPR; p->thread.tidr = 0; +#endif + /* + * Run with the current AMR value of the kernel + */ +#ifdef CONFIG_PPC_KUAP + if (mmu_has_feature(MMU_FTR_KUAP)) + kregs->kuap = AMR_KUAP_BLOCKED; +#endif Do we need that ifdef at all ? Shouldn't mmu_has_feature(MMU_FTR_KUAP) be always false and get optimised out when CONFIG_PPC_KUAP is not defined ? +#ifdef CONFIG_PPC_KUEP + if (mmu_has_feature(MMU_FTR_KUEP)) + kregs->iamr = AMR_KUEP_BLOCKED; Same ? #endif kregs->nip = ppc_function_entry(f); return 0; Not really. I did hit a compile error with this patch on mpc885_ads_defconfig and that required me to do modified arch/powerpc/kernel/process.c @@ -1772,11 +1772,10 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, /* * Run with the current AMR value of the kernel */ -#ifdef CONFIG_PPC_KUAP +#ifdef CONFIG_PPC_PKEY if (mmu_has_feature(MMU_FTR_KUAP)) - kregs->kuap = AMR_KUAP_BLOCKED; -#endif -#ifdef CONFIG_PPC_KUEP + kregs->amr = AMR_KUAP_BLOCKED; + if (mmu_has_feature(MMU_FTR_KUEP)) kregs->iamr = AMR_KUEP_BLOCKED; #endif
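The compile error Aneesh mentions is worth making concrete: the `#ifdef` cannot simply be dropped in favour of the feature test, because the `pt_regs` member being assigned only exists when the option is compiled in, so the guarded assignment would not even build on a config like mpc885_ads_defconfig, even though `mmu_has_feature()` would fold to false. A minimal sketch, with illustrative toy names:

```c
#include <assert.h>

/* Toggle to model CONFIG_PPC_PKEY being set; left undefined here. */
/* #define TOY_CONFIG_PPC_PKEY */

struct toy_regs {
	unsigned long nip;
#ifdef TOY_CONFIG_PPC_PKEY
	unsigned long amr;	/* member exists only with the option on */
	unsigned long iamr;
#endif
};

/* Without the #ifdef around the assignment, this function would fail to
 * compile when the option is off, regardless of any runtime check. */
static void toy_copy_thread(struct toy_regs *kregs)
{
	kregs->nip = 0;
#ifdef TOY_CONFIG_PPC_PKEY
	kregs->amr = ~0UL;	/* AMR_KUAP_BLOCKED in the real code */
#endif
}
```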
Re: [PATCH v6 10/22] powerpc/book3s64/pkeys: Store/restore userspace AMR/IAMR correctly on entry and exit from kernel
On 11/25/20 7:22 PM, Christophe Leroy wrote: Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit : This prepare kernel to operate with a different value than userspace AMR/IAMR. For this, AMR/IAMR need to be saved and restored on entry and return from the kernel. With KUAP we modify kernel AMR when accessing user address from the kernel via copy_to/from_user interfaces. We don't need to modify IAMR value in similar fashion. If MMU_FTR_PKEY is enabled we need to save AMR/IAMR in pt_regs on entering kernel from userspace. If not we can assume that AMR/IAMR is not modified from userspace. We need to save AMR if we have MMU_FTR_KUAP feature enabled and we are interrupted within kernel. This is required so that if we get interrupted within copy_to/from_user we continue with the right AMR value. If we hae MMU_FTR_KUEP enabled we need to restore IAMR on return to userspace beause kernel will be running with a different IAMR value. Reviewed-by: Sandipan Das Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/book3s/64/kup.h | 222 +++ arch/powerpc/include/asm/ptrace.h | 5 +- arch/powerpc/kernel/asm-offsets.c | 2 + arch/powerpc/kernel/entry_64.S | 6 +- arch/powerpc/kernel/exceptions-64s.S | 4 +- arch/powerpc/kernel/syscall_64.c | 32 +++- 6 files changed, 225 insertions(+), 46 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h index 1d38eab83d48..4dbb2d53fd8f 100644 --- a/arch/powerpc/include/asm/book3s/64/kup.h +++ b/arch/powerpc/include/asm/book3s/64/kup.h @@ -13,17 +13,46 @@ #ifdef __ASSEMBLY__ -.macro kuap_restore_amr gpr1, gpr2 -#ifdef CONFIG_PPC_KUAP +.macro kuap_restore_user_amr gpr1 +#if defined(CONFIG_PPC_PKEY) BEGIN_MMU_FTR_SECTION_NESTED(67) - mfspr \gpr1, SPRN_AMR + /* + * AMR and IAMR are going to be different when + * returning to userspace. 
+ */ + ld \gpr1, STACK_REGS_AMR(r1) + isync + mtspr SPRN_AMR, \gpr1 + /* + * Restore IAMR only when returning to userspace + */ + ld \gpr1, STACK_REGS_IAMR(r1) + mtspr SPRN_IAMR, \gpr1 + + /* No isync required, see kuap_restore_user_amr() */ + END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_PKEY, 67) +#endif +.endm + +.macro kuap_restore_kernel_amr gpr1, gpr2 +#if defined(CONFIG_PPC_PKEY) + + BEGIN_MMU_FTR_SECTION_NESTED(67) + /* + * AMR is going to be mostly the same since we are + * returning to the kernel. Compare and do a mtspr. + */ ld \gpr2, STACK_REGS_AMR(r1) + mfspr \gpr1, SPRN_AMR cmpd \gpr1, \gpr2 - beq 998f + beq 100f isync mtspr SPRN_AMR, \gpr2 - /* No isync required, see kuap_restore_amr() */ -998: + /* + * No isync required, see kuap_restore_amr() + * No need to restore IAMR when returning to kernel space. + */ +100: END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 67) #endif .endm @@ -42,23 +71,98 @@ .endm #endif +/* + * if (pkey) { + * + * save AMR -> stack; + * if (kuap) { + * if (AMR != BLOCKED) + * KUAP_BLOCKED -> AMR; + * } + * if (from_user) { + * save IAMR -> stack; + * if (kuep) { + * KUEP_BLOCKED ->IAMR + * } + * } + * return; + * } + * + * if (kuap) { + * if (from_kernel) { + * save AMR -> stack; + * if (AMR != BLOCKED) + * KUAP_BLOCKED -> AMR; + * } + * + * } + */ .macro kuap_save_amr_and_lock gpr1, gpr2, use_cr, msr_pr_cr -#ifdef CONFIG_PPC_KUAP +#if defined(CONFIG_PPC_PKEY) + + /* + * if both pkey and kuap is disabled, nothing to do + */ + BEGIN_MMU_FTR_SECTION_NESTED(68) + b 100f // skip_save_amr + END_MMU_FTR_SECTION_NESTED_IFCLR(MMU_FTR_PKEY | MMU_FTR_KUAP, 68) + + /* + * if pkey is disabled and we are entering from userspace + * don't do anything. + */ BEGIN_MMU_FTR_SECTION_NESTED(67) .ifnb \msr_pr_cr - bne \msr_pr_cr, 99f + /* + * Without pkey we are not changing AMR outside the kernel + * hence skip this completely. 
+ */ + bne \msr_pr_cr, 100f // from userspace .endif + END_MMU_FTR_SECTION_NESTED_IFCLR(MMU_FTR_PKEY, 67) + + /* + * pkey is enabled or pkey is disabled but entering from kernel + */ mfspr \gpr1, SPRN_AMR std \gpr1, STACK_REGS_AMR(r1) - li \gpr2, (AMR_KUAP_BLOCKED >> AMR_KUAP_SHIFT) - sldi \gpr2, \gpr2, AMR_KUAP_SHIFT + + /* + * update kernel AMR with AMR_KUAP_BLOCKED only + * if KUAP feature is enabled + */ + BEGIN_MMU_FTR_SECTION_NESTED(69) + LOAD_REG_IMMEDIATE(\gpr2, AMR_KUAP_BLOCKED) cmpd \use_cr, \gpr1, \gpr2 - beq \use_cr, 99f - // We don't isync here because we very recently entered via rfid + beq \use_cr, 102f + /* + * We
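The "compare and do a mtspr" idea in `kuap_restore_kernel_amr` above can be modelled in C: the SPR write (preceded by `isync`) is expensive, so it is skipped whenever the stacked value already matches the live register. The SPR is modelled as a plain variable and the counter stands in for the avoided writes.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t spr_amr;	/* models SPRN_AMR */
static int mtspr_count;	/* counts actual register writes */

/* Only write the SPR when the value saved on the stack at interrupt
 * entry differs from the current one; on hardware the write is the
 * isync + mtspr pair the assembly guards with cmpd/beq. */
static void kuap_restore_kernel_amr_model(uint64_t stacked_amr)
{
	if (spr_amr != stacked_amr) {
		spr_amr = stacked_amr;
		mtspr_count++;
	}
}
```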
Re: [PATCH v6 11/22] powerpc/book3s64/pkeys: Inherit correctly on fork.
Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit : Child thread.kuap value is inherited from the parent in copy_thread_tls. We still need to make sure when the child returns from a fork in the kernel we start with the kernel default AMR value. Reviewed-by: Sandipan Das Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/kernel/process.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index b6b8a845e454..733680de0ba4 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -1768,6 +1768,17 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, childregs->ppr = DEFAULT_PPR; p->thread.tidr = 0; +#endif + /* +* Run with the current AMR value of the kernel +*/ +#ifdef CONFIG_PPC_KUAP + if (mmu_has_feature(MMU_FTR_KUAP)) + kregs->kuap = AMR_KUAP_BLOCKED; +#endif Do we need that ifdef at all ? Shouldn't mmu_has_feature(MMU_FTR_KUAP) be always false and get optimised out when CONFIG_PPC_KUAP is not defined ? +#ifdef CONFIG_PPC_KUEP + if (mmu_has_feature(MMU_FTR_KUEP)) + kregs->iamr = AMR_KUEP_BLOCKED; Same ? #endif kregs->nip = ppc_function_entry(f); return 0;
Re: [PATCH v6 07/22] powerpc/book3s64/kuap: Rename MMU_FTR_RADIX_KUAP to MMU_FTR_KUAP
On 11/25/20 7:13 PM, Christophe Leroy wrote: Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit : This is in preparate to adding support for kuap with hash translation. In preparation for that rename/move kuap related functions to non radix names. Also move the feature bit closer to MMU_FTR_KUEP. It was obvious with MMU_FTR_RADIX_KUAP that it was only for Radix PPC64. Now, do we expect it to be applies on PPC32 as well or is it still for PPC64 only ? Right now this is PPC64 only. I added +config PPC_PKEY + def_bool y + depends on PPC_BOOK3S_64 + depends on PPC_MEM_KEYS || PPC_KUAP || PPC_KUEP to select the base bits needed for both KUAP and MEM_KEYS. I haven't looked at PPC32 to see if we can implement it there also. Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/book3s/64/kup.h | 18 +- arch/powerpc/include/asm/mmu.h | 14 +++--- arch/powerpc/mm/book3s64/pkeys.c | 2 +- 3 files changed, 17 insertions(+), 17 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h index 39d2e3a0d64d..1d38eab83d48 100644 --- a/arch/powerpc/include/asm/book3s/64/kup.h +++ b/arch/powerpc/include/asm/book3s/64/kup.h @@ -24,7 +24,7 @@ mtspr SPRN_AMR, \gpr2 /* No isync required, see kuap_restore_amr() */ 998: - END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_RADIX_KUAP, 67) + END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 67) #endif .endm @@ -37,7 +37,7 @@ sldi \gpr2, \gpr2, AMR_KUAP_SHIFT 999: tdne \gpr1, \gpr2 EMIT_BUG_ENTRY 999b, __FILE__, __LINE__, (BUGFLAG_WARNING | BUGFLAG_ONCE) - END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_RADIX_KUAP, 67) + END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 67) #endif .endm #endif @@ -58,7 +58,7 @@ mtspr SPRN_AMR, \gpr2 isync 99: - END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_RADIX_KUAP, 67) + END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 67) #endif .endm @@ -73,7 +73,7 @@ DECLARE_STATIC_KEY_FALSE(uaccess_flush_key); static inline void kuap_restore_amr(struct pt_regs *regs, unsigned long amr) { - if 
(mmu_has_feature(MMU_FTR_RADIX_KUAP) && unlikely(regs->kuap != amr)) { + if (mmu_has_feature(MMU_FTR_KUAP) && unlikely(regs->kuap != amr)) { isync(); mtspr(SPRN_AMR, regs->kuap); /* @@ -86,7 +86,7 @@ static inline void kuap_restore_amr(struct pt_regs *regs, unsigned long amr) static inline unsigned long kuap_get_and_check_amr(void) { - if (mmu_has_feature(MMU_FTR_RADIX_KUAP)) { + if (mmu_has_feature(MMU_FTR_KUAP)) { unsigned long amr = mfspr(SPRN_AMR); if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG)) /* kuap_check_amr() */ WARN_ON_ONCE(amr != AMR_KUAP_BLOCKED); @@ -97,7 +97,7 @@ static inline unsigned long kuap_get_and_check_amr(void) static inline void kuap_check_amr(void) { - if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && mmu_has_feature(MMU_FTR_RADIX_KUAP)) + if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && mmu_has_feature(MMU_FTR_KUAP)) WARN_ON_ONCE(mfspr(SPRN_AMR) != AMR_KUAP_BLOCKED); } @@ -116,7 +116,7 @@ static inline unsigned long get_kuap(void) * This has no effect in terms of actually blocking things on hash, * so it doesn't break anything. */ - if (!early_mmu_has_feature(MMU_FTR_RADIX_KUAP)) + if (!early_mmu_has_feature(MMU_FTR_KUAP)) return AMR_KUAP_BLOCKED; return mfspr(SPRN_AMR); @@ -124,7 +124,7 @@ static inline unsigned long get_kuap(void) static inline void set_kuap(unsigned long value) { - if (!early_mmu_has_feature(MMU_FTR_RADIX_KUAP)) + if (!early_mmu_has_feature(MMU_FTR_KUAP)) return; /* @@ -139,7 +139,7 @@ static inline void set_kuap(unsigned long value) static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) { - return WARN(mmu_has_feature(MMU_FTR_RADIX_KUAP) && + return WARN(mmu_has_feature(MMU_FTR_KUAP) && (regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : AMR_KUAP_BLOCK_READ)), "Bug: %s fault blocked by AMR!", is_write ? 
"Write" : "Read"); } diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h index 255a1837e9f7..f5c7a17c198a 100644 --- a/arch/powerpc/include/asm/mmu.h +++ b/arch/powerpc/include/asm/mmu.h @@ -28,6 +28,11 @@ * Individual features below. */ +/* + * Supports KUAP (key 0 controlling userspace addresses) on radix + */ +#define MMU_FTR_KUAP ASM_CONST(0x0200) + /* * Support for KUEP feature. */ @@ -120,11 +125,6 @@ */ #define MMU_FTR_1T_SEGMENT ASM_CONST(0x4000) -/* - * Supports KUAP (key 0 controlling userspace addresses) on radix - */ -#define MMU_FTR_RADIX_KUAP ASM_CONST(0x8000) - /* MMU feature bit sets for various CPUs */ #define MMU_FTRS_DEFAULT_HPTE_ARCH_V2 \ MMU_FTR_HPTE_TABLE |
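The rename is mechanical, but the underlying mechanism is easy to model: MMU features are bits in a mask, and `mmu_has_feature()` is conceptually a bit test (the real kernel patches these tests into fixed branches at boot via feature sections). Bit positions below are illustrative, not the kernel's `ASM_CONST` values.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative feature bits; real values live in asm/mmu.h. */
#define TOY_MMU_FTR_KUAP (1UL << 9)
#define TOY_MMU_FTR_KUEP (1UL << 10)
#define TOY_MMU_FTR_PKEY (1UL << 11)

static unsigned long cur_mmu_features;

/* Conceptual mmu_has_feature(): a bit test against the active mask. */
static bool toy_mmu_has_feature(unsigned long ftr)
{
	return (cur_mmu_features & ftr) != 0;
}
```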
Re: [PATCH v6 10/22] powerpc/book3s64/pkeys: Store/restore userspace AMR/IAMR correctly on entry and exit from kernel
Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit : This prepare kernel to operate with a different value than userspace AMR/IAMR. For this, AMR/IAMR need to be saved and restored on entry and return from the kernel. With KUAP we modify kernel AMR when accessing user address from the kernel via copy_to/from_user interfaces. We don't need to modify IAMR value in similar fashion. If MMU_FTR_PKEY is enabled we need to save AMR/IAMR in pt_regs on entering kernel from userspace. If not we can assume that AMR/IAMR is not modified from userspace. We need to save AMR if we have MMU_FTR_KUAP feature enabled and we are interrupted within kernel. This is required so that if we get interrupted within copy_to/from_user we continue with the right AMR value. If we hae MMU_FTR_KUEP enabled we need to restore IAMR on return to userspace beause kernel will be running with a different IAMR value. Reviewed-by: Sandipan Das Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/book3s/64/kup.h | 222 +++ arch/powerpc/include/asm/ptrace.h| 5 +- arch/powerpc/kernel/asm-offsets.c| 2 + arch/powerpc/kernel/entry_64.S | 6 +- arch/powerpc/kernel/exceptions-64s.S | 4 +- arch/powerpc/kernel/syscall_64.c | 32 +++- 6 files changed, 225 insertions(+), 46 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h index 1d38eab83d48..4dbb2d53fd8f 100644 --- a/arch/powerpc/include/asm/book3s/64/kup.h +++ b/arch/powerpc/include/asm/book3s/64/kup.h @@ -13,17 +13,46 @@ #ifdef __ASSEMBLY__ -.macro kuap_restore_amr gpr1, gpr2 -#ifdef CONFIG_PPC_KUAP +.macro kuap_restore_user_amr gpr1 +#if defined(CONFIG_PPC_PKEY) BEGIN_MMU_FTR_SECTION_NESTED(67) - mfspr \gpr1, SPRN_AMR + /* +* AMR and IAMR are going to be different when +* returning to userspace. 
+*/ + ld \gpr1, STACK_REGS_AMR(r1) + isync + mtspr SPRN_AMR, \gpr1 + /* +* Restore IAMR only when returning to userspace +*/ + ld \gpr1, STACK_REGS_IAMR(r1) + mtspr SPRN_IAMR, \gpr1 + + /* No isync required, see kuap_restore_user_amr() */ + END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_PKEY, 67) +#endif +.endm + +.macro kuap_restore_kernel_amr gpr1, gpr2 +#if defined(CONFIG_PPC_PKEY) + + BEGIN_MMU_FTR_SECTION_NESTED(67) + /* +* AMR is going to be mostly the same since we are +* returning to the kernel. Compare and do a mtspr. +*/ ld \gpr2, STACK_REGS_AMR(r1) + mfspr \gpr1, SPRN_AMR cmpd\gpr1, \gpr2 - beq 998f + beq 100f isync mtspr SPRN_AMR, \gpr2 - /* No isync required, see kuap_restore_amr() */ -998: + /* +* No isync required, see kuap_restore_amr() +* No need to restore IAMR when returning to kernel space. +*/ +100: END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 67) #endif .endm @@ -42,23 +71,98 @@ .endm #endif +/* + * if (pkey) { + * + * save AMR -> stack; + * if (kuap) { + * if (AMR != BLOCKED) + * KUAP_BLOCKED -> AMR; + * } + * if (from_user) { + * save IAMR -> stack; + * if (kuep) { + * KUEP_BLOCKED ->IAMR + * } + * } + * return; + * } + * + * if (kuap) { + * if (from_kernel) { + * save AMR -> stack; + * if (AMR != BLOCKED) + * KUAP_BLOCKED -> AMR; + * } + * + * } + */ .macro kuap_save_amr_and_lock gpr1, gpr2, use_cr, msr_pr_cr -#ifdef CONFIG_PPC_KUAP +#if defined(CONFIG_PPC_PKEY) + + /* +* if both pkey and kuap is disabled, nothing to do +*/ + BEGIN_MMU_FTR_SECTION_NESTED(68) + b 100f // skip_save_amr + END_MMU_FTR_SECTION_NESTED_IFCLR(MMU_FTR_PKEY | MMU_FTR_KUAP, 68) + + /* +* if pkey is disabled and we are entering from userspace +* don't do anything. +*/ BEGIN_MMU_FTR_SECTION_NESTED(67) .ifnb \msr_pr_cr - bne \msr_pr_cr, 99f + /* +* Without pkey we are not changing AMR outside the kernel +* hence skip this completely. 
+*/ + bne \msr_pr_cr, 100f // from userspace .endif +END_MMU_FTR_SECTION_NESTED_IFCLR(MMU_FTR_PKEY, 67) + + /* +* pkey is enabled or pkey is disabled but entering from kernel +*/ mfspr \gpr1, SPRN_AMR std \gpr1, STACK_REGS_AMR(r1) - li \gpr2, (AMR_KUAP_BLOCKED >> AMR_KUAP_SHIFT) - sldi\gpr2, \gpr2, AMR_KUAP_SHIFT + + /* +* update kernel AMR with AMR_KUAP_BLOCKED only +* if KUAP feature is
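The pseudocode comment above `kuap_save_amr_and_lock` can be rendered directly in C, which makes the entry-time decision tree easier to check: with pkey, AMR is always saved (and blocked if KUAP is on), and IAMR is saved/blocked only when entering from userspace; without pkey, only KUAP-on-kernel-entry saves and blocks AMR. The flag encoding is illustrative.

```c
#include <assert.h>
#include <stdbool.h>

enum { SAVED_AMR = 1, BLOCKED_AMR = 2, SAVED_IAMR = 4, BLOCKED_IAMR = 8 };

/* C rendering of the pseudocode comment in the patch; returns which
 * save/block actions the entry path performs. */
static int entry_actions(bool pkey, bool kuap, bool kuep,
			 bool from_user, bool amr_already_blocked)
{
	int acts = 0;

	if (pkey) {
		acts |= SAVED_AMR;
		if (kuap && !amr_already_blocked)
			acts |= BLOCKED_AMR;
		if (from_user) {
			acts |= SAVED_IAMR;
			if (kuep)
				acts |= BLOCKED_IAMR;
		}
		return acts;
	}

	if (kuap && !from_user) {
		acts |= SAVED_AMR;
		if (!amr_already_blocked)
			acts |= BLOCKED_AMR;
	}
	return acts;
}
```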
Re: [PATCH v6 09/22] powerpc/exec: Set thread.regs early during exec
Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit : In later patches during exec, we would like to access default regs.amr to control access to the user mapping. Having thread.regs set early makes the code changes simpler. Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/thread_info.h | 2 -- arch/powerpc/kernel/process.c | 37 +- 2 files changed, 25 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h index 46a210b03d2b..de4c911d9ced 100644 --- a/arch/powerpc/include/asm/thread_info.h +++ b/arch/powerpc/include/asm/thread_info.h @@ -77,10 +77,8 @@ struct thread_info { /* how to get the thread information struct from C */ extern int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src); -#ifdef CONFIG_PPC_BOOK3S_64 void arch_setup_new_exec(void); #define arch_setup_new_exec arch_setup_new_exec -#endif #endif /* __ASSEMBLY__ */ diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index d421a2c7f822..b6b8a845e454 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -1530,10 +1530,32 @@ void flush_thread(void) #ifdef CONFIG_PPC_BOOK3S_64 void arch_setup_new_exec(void) { - if (radix_enabled()) - return; - hash__setup_new_exec(); + if (!radix_enabled()) + hash__setup_new_exec(); + + /* +* If we exec out of a kernel thread then thread.regs will not be +* set. Do it now. +*/ + if (!current->thread.regs) { + struct pt_regs *regs = task_stack_page(current) + THREAD_SIZE; + current->thread.regs = regs - 1; + } + +} +#else +void arch_setup_new_exec(void) +{ + /* +* If we exec out of a kernel thread then thread.regs will not be +* set. Do it now. +*/ + if (!current->thread.regs) { + struct pt_regs *regs = task_stack_page(current) + THREAD_SIZE; + current->thread.regs = regs - 1; + } } + #endif No need to duplicate arch_setup_new_exec() I think. radix_enabled() is defined at all time so the first function should be valid at all time. 
#ifdef CONFIG_PPC64 @@ -1765,15 +1787,6 @@ void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp) preload_new_slb_context(start, sp); #endif - /* -* If we exec out of a kernel thread then thread.regs will not be -* set. Do it now. -*/ - if (!current->thread.regs) { - struct pt_regs *regs = task_stack_page(current) + THREAD_SIZE; - current->thread.regs = regs - 1; - } - #ifdef CONFIG_PPC_TRANSACTIONAL_MEM /* * Clear any transactional state, we're exec()ing. The cause is
Re: [PATCH V3 1/5] ocxl: Assign a register set to a Logical Partition
On 24/11/2020 10:58, Christophe Lombard wrote: Platform specific function to assign a register set to a Logical Partition. The "ibm,mmio-atsd" property, provided by the firmware, contains the 16 base ATSD physical addresses (ATSD0 through ATSD15) of the set of MMIO registers (XTS MMIO ATSDx LPARID/AVA/launch/status register). For the time being, the ATSD0 set of registers is used by default. Signed-off-by: Christophe Lombard --- Looks good, thanks for the updates! Acked-by: Frederic Barrat arch/powerpc/include/asm/pnv-ocxl.h | 3 ++ arch/powerpc/platforms/powernv/ocxl.c | 45 +++ 2 files changed, 48 insertions(+) diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h index d37ededca3ee..60c3c74427d9 100644 --- a/arch/powerpc/include/asm/pnv-ocxl.h +++ b/arch/powerpc/include/asm/pnv-ocxl.h @@ -28,4 +28,7 @@ int pnv_ocxl_spa_setup(struct pci_dev *dev, void *spa_mem, int PE_mask, void **p void pnv_ocxl_spa_release(void *platform_data); int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle); +int pnv_ocxl_map_lpar(struct pci_dev *dev, uint64_t lparid, + uint64_t lpcr, void __iomem **arva); +void pnv_ocxl_unmap_lpar(void __iomem *arva); #endif /* _ASM_PNV_OCXL_H */ diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c index ecdad219d704..57fc1062677b 100644 --- a/arch/powerpc/platforms/powernv/ocxl.c +++ b/arch/powerpc/platforms/powernv/ocxl.c @@ -483,3 +483,48 @@ int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle) return rc; } EXPORT_SYMBOL_GPL(pnv_ocxl_spa_remove_pe_from_cache); + +int pnv_ocxl_map_lpar(struct pci_dev *dev, uint64_t lparid, + uint64_t lpcr, void __iomem **arva) +{ + struct pci_controller *hose = pci_bus_to_host(dev->bus); + struct pnv_phb *phb = hose->private_data; + u64 mmio_atsd; + int rc; + + /* ATSD physical address. +* ATSD LAUNCH register: write access initiates a shoot down to +* initiate the TLB Invalidate command. 
+*/ + rc = of_property_read_u64_index(hose->dn, "ibm,mmio-atsd", + 0, &mmio_atsd); + if (rc) { + dev_info(&dev->dev, "No available ATSD found\n"); + return rc; + } + + /* Assign a register set to a Logical Partition and MMIO ATSD +* LPARID register to the required value. +*/ + rc = opal_npu_map_lpar(phb->opal_id, pci_dev_id(dev), + lparid, lpcr); + if (rc) { + dev_err(&dev->dev, "Error mapping device to LPAR: %d\n", rc); + return rc; + } + + *arva = ioremap(mmio_atsd, 24); + if (!(*arva)) { + dev_warn(&dev->dev, "ioremap failed - mmio_atsd: %#llx\n", mmio_atsd); + rc = -ENOMEM; + } + + return rc; +} +EXPORT_SYMBOL_GPL(pnv_ocxl_map_lpar); + +void pnv_ocxl_unmap_lpar(void __iomem *arva) +{ + iounmap(arva); +} +EXPORT_SYMBOL_GPL(pnv_ocxl_unmap_lpar);
Re: [PATCH v6 07/22] powerpc/book3s64/kuap: Rename MMU_FTR_RADIX_KUAP to MMU_FTR_KUAP
Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit : This is in preparate to adding support for kuap with hash translation. In preparation for that rename/move kuap related functions to non radix names. Also move the feature bit closer to MMU_FTR_KUEP. It was obvious with MMU_FTR_RADIX_KUAP that it was only for Radix PPC64. Now, do we expect it to be applies on PPC32 as well or is it still for PPC64 only ? Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/book3s/64/kup.h | 18 +- arch/powerpc/include/asm/mmu.h | 14 +++--- arch/powerpc/mm/book3s64/pkeys.c | 2 +- 3 files changed, 17 insertions(+), 17 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h index 39d2e3a0d64d..1d38eab83d48 100644 --- a/arch/powerpc/include/asm/book3s/64/kup.h +++ b/arch/powerpc/include/asm/book3s/64/kup.h @@ -24,7 +24,7 @@ mtspr SPRN_AMR, \gpr2 /* No isync required, see kuap_restore_amr() */ 998: - END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_RADIX_KUAP, 67) + END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 67) #endif .endm @@ -37,7 +37,7 @@ sldi\gpr2, \gpr2, AMR_KUAP_SHIFT 999: tdne\gpr1, \gpr2 EMIT_BUG_ENTRY 999b, __FILE__, __LINE__, (BUGFLAG_WARNING | BUGFLAG_ONCE) - END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_RADIX_KUAP, 67) + END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 67) #endif .endm #endif @@ -58,7 +58,7 @@ mtspr SPRN_AMR, \gpr2 isync 99: - END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_RADIX_KUAP, 67) + END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 67) #endif .endm @@ -73,7 +73,7 @@ DECLARE_STATIC_KEY_FALSE(uaccess_flush_key); static inline void kuap_restore_amr(struct pt_regs *regs, unsigned long amr) { - if (mmu_has_feature(MMU_FTR_RADIX_KUAP) && unlikely(regs->kuap != amr)) { + if (mmu_has_feature(MMU_FTR_KUAP) && unlikely(regs->kuap != amr)) { isync(); mtspr(SPRN_AMR, regs->kuap); /* @@ -86,7 +86,7 @@ static inline void kuap_restore_amr(struct pt_regs *regs, unsigned long amr) static inline unsigned long 
kuap_get_and_check_amr(void) { - if (mmu_has_feature(MMU_FTR_RADIX_KUAP)) { + if (mmu_has_feature(MMU_FTR_KUAP)) { unsigned long amr = mfspr(SPRN_AMR); if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG)) /* kuap_check_amr() */ WARN_ON_ONCE(amr != AMR_KUAP_BLOCKED); @@ -97,7 +97,7 @@ static inline unsigned long kuap_get_and_check_amr(void) static inline void kuap_check_amr(void) { - if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && mmu_has_feature(MMU_FTR_RADIX_KUAP)) + if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && mmu_has_feature(MMU_FTR_KUAP)) WARN_ON_ONCE(mfspr(SPRN_AMR) != AMR_KUAP_BLOCKED); } @@ -116,7 +116,7 @@ static inline unsigned long get_kuap(void) * This has no effect in terms of actually blocking things on hash, * so it doesn't break anything. */ - if (!early_mmu_has_feature(MMU_FTR_RADIX_KUAP)) + if (!early_mmu_has_feature(MMU_FTR_KUAP)) return AMR_KUAP_BLOCKED; return mfspr(SPRN_AMR); @@ -124,7 +124,7 @@ static inline unsigned long get_kuap(void) static inline void set_kuap(unsigned long value) { - if (!early_mmu_has_feature(MMU_FTR_RADIX_KUAP)) + if (!early_mmu_has_feature(MMU_FTR_KUAP)) return; /* @@ -139,7 +139,7 @@ static inline void set_kuap(unsigned long value) static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) { - return WARN(mmu_has_feature(MMU_FTR_RADIX_KUAP) && + return WARN(mmu_has_feature(MMU_FTR_KUAP) && (regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : AMR_KUAP_BLOCK_READ)), "Bug: %s fault blocked by AMR!", is_write ? "Write" : "Read"); } diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h index 255a1837e9f7..f5c7a17c198a 100644 --- a/arch/powerpc/include/asm/mmu.h +++ b/arch/powerpc/include/asm/mmu.h @@ -28,6 +28,11 @@ * Individual features below. */ +/* + * Supports KUAP (key 0 controlling userspace addresses) on radix + */ +#define MMU_FTR_KUAP ASM_CONST(0x0200) + /* * Support for KUEP feature. 
*/ @@ -120,11 +125,6 @@ */ #define MMU_FTR_1T_SEGMENTASM_CONST(0x4000) -/* - * Supports KUAP (key 0 controlling userspace addresses) on radix - */ -#define MMU_FTR_RADIX_KUAP ASM_CONST(0x8000) - /* MMU feature bit sets for various CPUs */ #define MMU_FTRS_DEFAULT_HPTE_ARCH_V2 \ MMU_FTR_HPTE_TABLE | MMU_FTR_PPCAS_ARCH_V2 @@ -187,10 +187,10 @@ enum { #ifdef CONFIG_PPC_RADIX_MMU MMU_FTR_TYPE_RADIX | MMU_FTR_GTSE | +#endif /* CONFIG_PPC_RADIX_MMU
Re: [PATCH v6 04/22] powerpc/book3s64/kuap/kuep: Move uamor setup to pkey init
Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit : This patch consolidates UAMOR update across pkey, kuap and kuep features. The boot cpu initialize UAMOR via pkey init and both radix/hash do the secondary cpu UAMOR init in early_init_mmu_secondary. We don't check for mmu_feature in radix secondary init because UAMOR is a supported SPRN with all CPUs supporting radix translation. The old code was not updating UAMOR if we had smap disabled and smep enabled. This change handles that case. Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/mm/book3s64/radix_pgtable.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c index 3adcf730f478..bfe441af916a 100644 --- a/arch/powerpc/mm/book3s64/radix_pgtable.c +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c @@ -620,9 +620,6 @@ void setup_kuap(bool disabled) cur_cpu_spec->mmu_features |= MMU_FTR_RADIX_KUAP; } - /* Make sure userspace can't change the AMR */ - mtspr(SPRN_UAMOR, 0); - /* * Set the default kernel AMR values on all cpus. */ @@ -721,6 +718,11 @@ void radix__early_init_mmu_secondary(void) radix__switch_mmu_context(NULL, &init_mm); tlbiel_all(); + +#ifdef CONFIG_PPC_PKEY It should be possible to use an 'if' with IS_ENABLED(CONFIG_PPC_PKEY) instead of this #ifdef + /* Make sure userspace can't change the AMR */ + mtspr(SPRN_UAMOR, 0); +#endif } void radix__mmu_cleanup_all(void)
Re: [PATCH v6 03/22] powerpc/book3s64/kuap/kuep: Make KUAP and KUEP a subfeature of PPC_MEM_KEYS
Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit : The next set of patches adds support for kuap with hash translation. Hence make KUAP a BOOK3S_64 feature. Also make it a subfeature of PPC_MEM_KEYS. Hash translation is going to use pkeys to support KUAP/KUEP. Adding this dependency reduces the code complexity and enables us to move some of the initialization code to pkeys.c Signed-off-by: Aneesh Kumar K.V --- .../powerpc/include/asm/book3s/64/kup-radix.h | 4 ++-- arch/powerpc/include/asm/book3s/64/mmu.h | 2 +- arch/powerpc/include/asm/ptrace.h | 7 +- arch/powerpc/kernel/asm-offsets.c | 3 +++ arch/powerpc/mm/book3s64/Makefile | 2 +- arch/powerpc/mm/book3s64/pkeys.c | 24 --- arch/powerpc/platforms/Kconfig.cputype| 5 7 files changed, 33 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h index 28716e2f13e3..68eaa2fac3ab 100644 --- a/arch/powerpc/include/asm/book3s/64/kup-radix.h +++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h @@ -16,7 +16,7 @@ #ifdef CONFIG_PPC_KUAP BEGIN_MMU_FTR_SECTION_NESTED(67) mfspr \gpr1, SPRN_AMR - ld \gpr2, STACK_REGS_KUAP(r1) + ld \gpr2, STACK_REGS_AMR(r1) cmpd\gpr1, \gpr2 beq 998f isync @@ -48,7 +48,7 @@ bne \msr_pr_cr, 99f .endif mfspr \gpr1, SPRN_AMR - std \gpr1, STACK_REGS_KUAP(r1) + std \gpr1, STACK_REGS_AMR(r1) li \gpr2, (AMR_KUAP_BLOCKED >> AMR_KUAP_SHIFT) sldi\gpr2, \gpr2, AMR_KUAP_SHIFT cmpd\use_cr, \gpr1, \gpr2 diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h index e0b52940e43c..a2a015066bae 100644 --- a/arch/powerpc/include/asm/book3s/64/mmu.h +++ b/arch/powerpc/include/asm/book3s/64/mmu.h @@ -199,7 +199,7 @@ extern int mmu_io_psize; void mmu_early_init_devtree(void); void hash__early_init_devtree(void); void radix__early_init_devtree(void); -#ifdef CONFIG_PPC_MEM_KEYS +#ifdef CONFIG_PPC_PKEY void pkey_early_init_devtree(void); #else static inline void pkey_early_init_devtree(void) {} diff 
--git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h index e2c778c176a3..e7f1caa007a4 100644 --- a/arch/powerpc/include/asm/ptrace.h +++ b/arch/powerpc/include/asm/ptrace.h @@ -53,9 +53,14 @@ struct pt_regs #ifdef CONFIG_PPC64 unsigned long ppr; #endif + union { #ifdef CONFIG_PPC_KUAP - unsigned long kuap; + unsigned long kuap; #endif +#ifdef CONFIG_PPC_PKEY + unsigned long amr; +#endif + }; }; unsigned long __pad[2]; /* Maintain 16 byte interrupt stack alignment */ }; diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index c2722ff36e98..418a0b314a33 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -354,6 +354,9 @@ int main(void) STACK_PT_REGS_OFFSET(_PPR, ppr); #endif /* CONFIG_PPC64 */ +#ifdef CONFIG_PPC_PKEY + STACK_PT_REGS_OFFSET(STACK_REGS_AMR, amr); +#endif #ifdef CONFIG_PPC_KUAP STACK_PT_REGS_OFFSET(STACK_REGS_KUAP, kuap); #endif diff --git a/arch/powerpc/mm/book3s64/Makefile b/arch/powerpc/mm/book3s64/Makefile index fd393b8be14f..1b56d3af47d4 100644 --- a/arch/powerpc/mm/book3s64/Makefile +++ b/arch/powerpc/mm/book3s64/Makefile @@ -17,7 +17,7 @@ endif obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += hash_hugepage.o obj-$(CONFIG_PPC_SUBPAGE_PROT)+= subpage_prot.o obj-$(CONFIG_SPAPR_TCE_IOMMU) += iommu_api.o -obj-$(CONFIG_PPC_MEM_KEYS) += pkeys.o +obj-$(CONFIG_PPC_PKEY) += pkeys.o # Instrumenting the SLB fault path can lead to duplicate SLB entries KCOV_INSTRUMENT_slb.o := n diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c index b1d091a97611..7dc71f85683d 100644 --- a/arch/powerpc/mm/book3s64/pkeys.c +++ b/arch/powerpc/mm/book3s64/pkeys.c @@ -89,12 +89,14 @@ static int scan_pkey_feature(void) } } +#ifdef CONFIG_PPC_MEM_KEYS /* * Adjust the upper limit, based on the number of bits supported by * arch-neutral code. */ pkeys_total = min_t(int, pkeys_total, ((ARCH_VM_PKEY_FLAGS >> VM_PKEY_SHIFT) + 1)); I don't think we need an #ifdef here. 
I think an 'if (IS_ENABLED(CONFIG_PPC_MEM_KEYS))' should make it. +#endif return pkeys_total; } @@ -102,6 +104,7 @@ void __init pkey_early_init_devtree(void) { int pkeys_total, i; +#ifdef CONFIG_PPC_MEM_KEYS /* * We define PKEY_DISABLE_EXECUTE in addition to the arch-neutral * generic defines for
Re: [PATCH v2 1/2] genirq: add an irq_create_mapping_affinity() function
Laurent, On Wed, Nov 25 2020 at 12:16, Laurent Vivier wrote: The proper subsystem prefix is: 'genirq/irqdomain:' and the first letter after the colon wants to be uppercase. > This function adds an affinity parameter to irq_create_mapping(). > This parameter is needed to pass it to irq_domain_alloc_descs(). A changelog has to explain the WHY. 'The parameter is needed' is not really useful information. Thanks, tglx
Re: [PATCH v2 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()
On Wed, 25 Nov 2020 12:16:57 +0100 Laurent Vivier wrote: > With virtio multiqueue, normally each queue IRQ is mapped to a CPU. > > But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity") > this is broken on pseries. > > The affinity is correctly computed in msi_desc but this is not applied > to the system IRQs. > > It appears the affinity is correctly passed to rtas_setup_msi_irqs() but > lost at this point and never passed to irq_domain_alloc_descs() > (see commit 06ee6d571f0e ("genirq: Add affinity hint to irq allocation")) > because irq_create_mapping() doesn't take an affinity parameter. > > As the previous patch has added the affinity parameter to > irq_create_mapping() we can forward the affinity from rtas_setup_msi_irqs() > to irq_domain_alloc_descs(). > > With this change, the virtqueues are correctly dispatched between the CPUs > on pseries. > Since it is public, maybe add: BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1702939 ? > Signed-off-by: Laurent Vivier > --- Anyway, Reviewed-by: Greg Kurz > arch/powerpc/platforms/pseries/msi.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/platforms/pseries/msi.c > b/arch/powerpc/platforms/pseries/msi.c > index 133f6adcb39c..b3ac2455faad 100644 > --- a/arch/powerpc/platforms/pseries/msi.c > +++ b/arch/powerpc/platforms/pseries/msi.c > @@ -458,7 +458,8 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int > nvec_in, int type) > return hwirq; > } > > - virq = irq_create_mapping(NULL, hwirq); > + virq = irq_create_mapping_affinity(NULL, hwirq, > +entry->affinity); > > if (!virq) { > pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
Re: C vdso
Christophe Leroy writes: > Quoting Michael Ellerman : > >> Christophe Leroy writes: >>> Le 03/11/2020 à 19:13, Christophe Leroy a écrit : Le 23/10/2020 à 15:24, Michael Ellerman a écrit : > Christophe Leroy writes: >> Le 24/09/2020 à 15:17, Christophe Leroy a écrit : >>> Le 17/09/2020 à 14:33, Michael Ellerman a écrit : Christophe Leroy writes: > > What is the status with the generic C vdso merge ? > In some mail, you mentionned having difficulties getting it working on > ppc64, any progress ? What's the problem ? Can I help ? Yeah sorry I was hoping to get time to work on it but haven't been able to. It's causing crashes on ppc64 ie. big endian. > ... >>> >>> Can you tell what defconfig you are using ? I have been able to >>> setup a full glibc PPC64 cross >>> compilation chain and been able to test it under QEMU with >>> success, using Nathan's vdsotest tool. >> >> What config are you using ? > > ppc64_defconfig + guest.config > > Or pseries_defconfig. > > I'm using Ubuntu GCC 9.3.0 mostly, but it happens with other > toolchains too. > > At a minimum we're seeing relocations in the output, which is a problem: > > $ readelf -r build\~/arch/powerpc/kernel/vdso64/vdso64.so > Relocation section '.rela.dyn' at offset 0x12a8 contains 8 entries: > Offset Info Type Sym. Value > Sym. Name + Addend > 1368 0016 R_PPC64_RELATIVE 7c0 > 1370 0016 R_PPC64_RELATIVE 9300 > 1380 0016 R_PPC64_RELATIVE 970 > 1388 0016 R_PPC64_RELATIVE 9300 > 1398 0016 R_PPC64_RELATIVE a90 > 13a0 0016 R_PPC64_RELATIVE 9300 > 13b0 0016 R_PPC64_RELATIVE b20 > 13b8 0016 R_PPC64_RELATIVE 9300 Looks like it's due to the OPD and relation between the function() and .function() By using DOTSYM() in the 'bl' call, that's directly the dot function which is called and the OPD is not used anymore, it can get dropped. Now I get .rela.dyn full of 0, don't know if we should drop it explicitely. >>> >>> What is the status now with latest version of CVDSO ? 
I saw you had >>> it in next-test for some time, >>> it is not there anymore today. >> >> Still having some trouble with the compat VDSO. >> >> eg: >> >> $ ./vdsotest clock-gettime-monotonic verify >> timestamp obtained from kernel predates timestamp >> previously obtained from libc/vDSO: >> [1346, 821441653] (vDSO) >> [570, 769440040] (kernel) >> >> >> And similar for all clocks except the coarse ones. >> > > Ok, I managed to get the same with QEMU. Looking at the binary, I only > see an mftb instead of the mftbu/mftb/mftbu triplet. > > Fix below. Can you carry it, or do you prefer a full patch from me ? > The easiest would be either to squash it into [v13,4/8] > ("powerpc/time: Move timebase functions into new asm/timebase.h"), or > to add it between patch 4 and 5 ? I can squash it in. cheers
Re: [PATCH] powerpc: Use the common INIT_DATA_SECTION macro in vmlinux.lds.S
On Wed, 4 Nov 2020 18:59:10 +0800, Youling Tang wrote: > Use the common INIT_DATA_SECTION rule for the linker script in an effort > to regularize the linker script. Applied to powerpc/next. [1/1] powerpc: Use the common INIT_DATA_SECTION macro in vmlinux.lds.S https://git.kernel.org/powerpc/c/fdcfeaba38e5b183045f5b079af94f97658eabe6 cheers
Re: [PATCH] Revert "powerpc/pseries/hotplug-cpu: Remove double free in error path"
On Tue, 10 Nov 2020 21:07:52 -0500, Zhang Xiaoxu wrote: > This reverts commit a0ff72f9f5a780341e7ff5e9ba50a0dad5fa1980. > > Since the commit b015f6bc9547 ("powerpc/pseries: Add cpu DLPAR > support for drc-info property"), the 'cpu_drcs' wouldn't be double > freed when the 'cpus' node is not found. > > So we needn't apply this patch; otherwise, the memory will leak. Applied to powerpc/next. [1/1] Revert "powerpc/pseries/hotplug-cpu: Remove double free in error path" https://git.kernel.org/powerpc/c/a40fdaf1420d6e6bda0dd2df1e6806013e58dbe1 cheers
Re: [PATCH] powerpc/powernv/sriov: fix unsigned int win compared to less than zero
On Tue, 10 Nov 2020 19:19:30 +0800, xiakaixu1...@gmail.com wrote: > Fix coccicheck warning: > > ./arch/powerpc/platforms/powernv/pci-sriov.c:443:7-10: WARNING: Unsigned > expression compared with zero: win < 0 > ./arch/powerpc/platforms/powernv/pci-sriov.c:462:7-10: WARNING: Unsigned > expression compared with zero: win < 0 Applied to powerpc/next. [1/1] powerpc/powernv/sriov: fix unsigned int win compared to less than zero https://git.kernel.org/powerpc/c/027717a45ca251a7ba67a63db359994836962cd2 cheers
Re: [PATCH] powerpc/mm: Fix comparing pointer to 0 warning
On Tue, 10 Nov 2020 10:56:01 +0800, xiakaixu1...@gmail.com wrote: > Fixes coccicheck warning: > > ./arch/powerpc/mm/pgtable_32.c:87:11-12: WARNING comparing pointer to 0 > > Avoid pointer type value compared to 0. Applied to powerpc/next. [1/1] powerpc/mm: Fix comparing pointer to 0 warning https://git.kernel.org/powerpc/c/b84bf098fcc49ed6bf4b0a8bed52e9df0e8f1de7 cheers
Re: [PATCHv2] selftests/powerpc/eeh: disable kselftest timeout setting for eeh-basic
On Fri, 23 Oct 2020 10:45:39 +0800, Po-Hsu Lin wrote: > The eeh-basic test got its own 60 seconds timeout (defined in commit > 414f50434aa2 "selftests/eeh: Bump EEH wait time to 60s") per breakable > device. > > And we have discovered that the number of breakable devices varies > on different hardware. The device recovery time ranges from 0 to 35 > seconds. In our test pool it will take about 30 seconds to run on a > Power8 system that with 5 breakable devices, 60 seconds to run on a > Power9 system that with 4 breakable devices. > > [...] Applied to powerpc/next. [1/1] selftests/powerpc/eeh: disable kselftest timeout setting for eeh-basic https://git.kernel.org/powerpc/c/f5eca0b279117f25020112a2f65ec9c3ea25f3ac cheers
Re: [PATCH] powerpc/ps3: Drop unused DBG macro
On Fri, 23 Oct 2020 14:13:05 +1100, Michael Ellerman wrote: > This DBG macro is unused, and has been unused since the file was > originally merged into mainline. Just drop it. Applied to powerpc/next. [1/1] powerpc/ps3: Drop unused DBG macro https://git.kernel.org/powerpc/c/cb5d4c465f31bc44b8bbd4934678c2b140a2ad29 cheers
Re: [PATCH] powerpc/85xx: Fix declaration made after definition
On Fri, 23 Oct 2020 13:08:38 +1100, Michael Ellerman wrote: > Currently the clang build of corenet64_smp_defconfig fails with: > > arch/powerpc/platforms/85xx/corenet_generic.c:210:1: error: > attribute declaration must precede definition > machine_arch_initcall(corenet_generic, corenet_gen_publish_devices); > > Fix it by moving the initcall definition prior to the machine > definition, and directly below the function it calls, which is the > usual style anyway. Applied to powerpc/next. [1/1] powerpc/85xx: Fix declaration made after definition https://git.kernel.org/powerpc/c/ef78f2dd2398ce8ed9eeaab9c9f8af2e15f5d870 cheers
Re: [PATCH] powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe()
On Wed, 28 Oct 2020 17:15:51 +0800, Qinglang Miao wrote: > I noticed that iounmap() of msgr_block_addr before return from > mpic_msgr_probe() in the error handling case is missing. So use > devm_ioremap() instead of just ioremap() when remapping the message > register block, so the mapping will be automatically released on > probe failure. Applied to powerpc/next. [1/1] powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe() https://git.kernel.org/powerpc/c/ffa1797040c5da391859a9556be7b735acbe1242 cheers
Re: [PATCH] powerpc/64s/perf: perf interrupt does not have to get_user_pages to access user memory
On Wed, 11 Nov 2020 22:01:51 +1000, Nicholas Piggin wrote: > read_user_stack_slow that walks user address translation by hand is > only required on hash, because a hash fault can not be serviced from > "NMI" context (to avoid re-entering the hash code) so the user stack > can be mapped into Linux page tables but not accessible by the CPU. > > Radix MMU mode does not have this restriction. A page fault failure > would indicate the page is not accessible via get_user_pages either, > so avoid this on radix. Applied to powerpc/next. [1/1] powerpc/64s/perf: perf interrupt does not have to get_user_pages to access user memory https://git.kernel.org/powerpc/c/987c426320cce72d1b28f55c8603b239e4f7187c cheers
Re: [PATCH v2 0/8] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations
On Wed, 11 Nov 2020 15:53:14 +0100, David Hildenbrand wrote: > Based on latest linux/master > > powernv/memtrace is the only in-kernel user that rips out random memory > it never added (doesn't own) in order to allocate memory without a > linear mapping. Let's stop abusing memory hot(un)plug infrastructure for > that - use alloc_contig_pages() for allocating memory and remove the > linear mapping manually. > > [...] Applied to powerpc/next. [1/8] powerpc/powernv/memtrace: Don't leak kernel memory to user space https://git.kernel.org/powerpc/c/c74cf7a3d59a21b290fe0468f5b470d0b8ee37df [2/8] powerpc/powernv/memtrace: Fix crashing the kernel when enabling concurrently https://git.kernel.org/powerpc/c/d6718941a2767fb383e105d257d2105fe4f15f0e [3/8] powerpc/mm: factor out creating/removing linear mapping https://git.kernel.org/powerpc/c/4abb1e5b63ac3281275315fc6b0cde0b9c2e2e42 [4/8] powerpc/mm: protect linear mapping modifications by a mutex https://git.kernel.org/powerpc/c/e5b2af044f31bf18defa557a8cd11c23caefa34c [5/8] powerpc/mm: print warning in arch_remove_linear_mapping() https://git.kernel.org/powerpc/c/1f73ad3e8d755dbec52fcec98618a7ce4de12af2 [6/8] powerpc/book3s64/hash: Drop WARN_ON in hash__remove_section_mapping() https://git.kernel.org/powerpc/c/d8bd9a121c2f2bc8b36da930dc91b69fd2a705e2 [7/8] powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory() https://git.kernel.org/powerpc/c/ca2c36cae9d48b180ea51259e35ab3d95d327df2 [8/8] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations https://git.kernel.org/powerpc/c/0bd4b96d99108b7ea9bac0573957483be7781d70 cheers
Re: [PATCH v4 1/2] powerpc/64: Set up a kernel stack for secondaries before cpu_restore()
On Wed, 14 Oct 2020 18:28:36 +1100, Jordan Niethe wrote: > Currently in generic_secondary_smp_init(), cur_cpu_spec->cpu_restore() > is called before a stack has been set up in r1. This was previously fine > as the cpu_restore() functions were implemented in assembly and did not > use a stack. However commit 5a61ef74f269 ("powerpc/64s: Support new > device tree binding for discovering CPU features") used > __restore_cpu_cpufeatures() as the cpu_restore() function for a > device-tree features based cputable entry. This is a C function and > hence uses a stack in r1. > > [...] Applied to powerpc/next. [1/2] powerpc/64: Set up a kernel stack for secondaries before cpu_restore() https://git.kernel.org/powerpc/c/3c0b976bf20d236c57adcefa80f86a0a1d737727 [2/2] powerpc/64s: Convert some cpu_setup() and cpu_restore() functions to C https://git.kernel.org/powerpc/c/344fbab991a568dc33ad90711b489d870e18d26d cheers
Re: [PATCH v1 0/4] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations
On Thu, 29 Oct 2020 17:27:14 +0100, David Hildenbrand wrote: > powernv/memtrace is the only in-kernel user that rips out random memory > it never added (doesn't own) in order to allocate memory without a > linear mapping. Let's stop abusing memory hot(un)plug infrastructure for > that - use alloc_contig_pages() for allocating memory and remove the > linear mapping manually. > > The original idea was discussed in: > https://lkml.kernel.org/r/48340e96-7e6b-736f-9e23-d3111b915...@redhat.com > > [...] Applied to powerpc/next. [1/4] powerpc/mm: factor out creating/removing linear mapping https://git.kernel.org/powerpc/c/4abb1e5b63ac3281275315fc6b0cde0b9c2e2e42 [2/4] powerpc/mm: print warning in arch_remove_linear_mapping() https://git.kernel.org/powerpc/c/1f73ad3e8d755dbec52fcec98618a7ce4de12af2 [3/4] powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory() https://git.kernel.org/powerpc/c/ca2c36cae9d48b180ea51259e35ab3d95d327df2 [4/4] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations https://git.kernel.org/powerpc/c/0bd4b96d99108b7ea9bac0573957483be7781d70 cheers
Re: [PATCH v2 1/3] powerpc/64s: Replace RFI by RFI_TO_KERNEL and remove RFI
On Sun, 8 Nov 2020 16:57:35 + (UTC), Christophe Leroy wrote: > In head_64.S, we have two places using RFI to return to > kernel. Use RFI_TO_KERNEL instead. > > They are the two only places using RFI on book3s/64, so > the RFI macro can go away. Applied to powerpc/next. [1/3] powerpc/64s: Replace RFI by RFI_TO_KERNEL and remove RFI https://git.kernel.org/powerpc/c/879add7720172ffd2986c44587510fabb7af52f5 [2/3] powerpc: Replace RFI by rfi on book3s/32 and booke https://git.kernel.org/powerpc/c/120c0518ec321f33cdc4670059fb76e96ceb56eb [3/3] powerpc: Remove RFI macro https://git.kernel.org/powerpc/c/62182e6c0faf75117f8d1719c118bb5fc8574012 cheers
Re: [PATCH v13 0/8] powerpc: switch VDSO to C implementation
On Tue, 3 Nov 2020 18:07:11 + (UTC), Christophe Leroy wrote: > This is a series to switch powerpc VDSO to generic C implementation. > > Changes in v13: > - Reorganised headers to avoid the need for a fake 32 bits config for > building VDSO32 on PPC64 > - Rebased after the removal of powerpc 601 > - Using DOTSYM() macro to call functions directly without using OPD > - Explicitely dropped .opd and .got1 sections which are now unused > > [...] Patch 1 applied to powerpc/next. [1/8] powerpc/feature: Fix CPU_FTRS_ALWAYS by removing CPU_FTRS_GENERIC_32 https://git.kernel.org/powerpc/c/78665179e569c7e1fe102fb6c21d0f5b6951f084 cheers
Re: [PATCH] powerpc/bitops: Fix possible undefined behaviour with fls() and fls64()
On Thu, 22 Oct 2020 14:05:46 + (UTC), Christophe Leroy wrote: > fls() and fls64() are using __builtin_clz() and __builtin_clzll(). > On powerpc, those builtins trivially use the cntlzw and cntlzd power > instructions. > > Although those instructions provide the expected result with > input argument 0, __builtin_clz() and __builtin_clzll() are > documented as undefined for value 0. > > [...] Applied to powerpc/next. [1/1] powerpc/bitops: Fix possible undefined behaviour with fls() and fls64() https://git.kernel.org/powerpc/c/1891ef21d92c4801ea082ee8ed478e304ddc6749 cheers
Re: [PATCH] powerpc: avoid broken GCC __attribute__((optimize))
On Wed, 28 Oct 2020 09:04:33 +0100, Ard Biesheuvel wrote:
> Commit 7053f80d9696 ("powerpc/64: Prevent stack protection in early boot")
> introduced a couple of uses of __attribute__((optimize)) with function
> scope, to disable the stack protector in some early boot code.
>
> Unfortunately, and this is documented in the GCC man pages [0], overriding
> function attributes for optimization is broken, and is only supported for
> debug scenarios, not for production: the problem appears to be that
> setting GCC -f flags using this method will cause it to forget about some
> or all other optimization settings that have been applied.
>
> [...]

Applied to powerpc/next.

[1/1] powerpc: Avoid broken GCC __attribute__((optimize))
      https://git.kernel.org/powerpc/c/a7223f5bfcaeade4a86d35263493bcda6c940891

cheers
Re: [PATCH v2] powerpc/mm: Update tlbiel loop on POWER10
On Wed, 7 Oct 2020 11:03:05 +0530, Aneesh Kumar K.V wrote:
> With POWER10, a single tlbiel instruction invalidates all the congruence
> classes of the TLB, so we need to issue only one tlbiel with SET=0.

Applied to powerpc/next.

[1/1] powerpc/mm: Update tlbiel loop on POWER10
      https://git.kernel.org/powerpc/c/e80639405c40127727812a0e1f8a65ba9979f146

cheers
Re: [PATCH] powerpc/mm: move setting pte specific flags to pfn_pmd
On Thu, 22 Oct 2020 14:41:15 +0530, Aneesh Kumar K.V wrote:
> powerpc used to set the PTE-specific flags in set_pte_at(), which is
> different from other architectures. To be consistent with other
> architectures, powerpc updated pfn_pte() to set _PAGE_PTE in
> commit 379c926d6334 ("powerpc/mm: move setting pte specific flags to pfn_pte").
>
> That commit didn't do the same w.r.t. pfn_pmd() because we expect
> pmd_mkhuge() to do that. But as per Linus that is a bad rule [1].
> Hence update pfn_pmd() to set _PAGE_PTE.
>
> [...]

Applied to powerpc/next.

[1/1] powerpc/mm: Move setting PTE specific flags to pfn_pmd()
      https://git.kernel.org/powerpc/c/53f45ecc9cd04b4b963f3040f2a54c3baf03b229

cheers
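The rule being applied here — the pfn-to-entry conversion helper sets the distinguishing software bit itself, instead of relying on a later mkhuge-style helper — can be sketched as follows. The names and bit positions are hypothetical stand-ins, not the kernel's actual page-table layout.

```c
#include <stdint.h>

#define SK_PAGE_PTE	(1ULL << 62)	/* hypothetical stand-in for _PAGE_PTE */
#define SK_PFN_SHIFT	12

typedef struct { uint64_t val; } sk_pmd_t;

/* The conversion helper marks the entry itself, so no caller can
 * forget to do it via a separate pmd_mkhuge()-style helper. */
static sk_pmd_t sk_pfn_pmd(uint64_t pfn, uint64_t prot)
{
	return (sk_pmd_t){ (pfn << SK_PFN_SHIFT) | prot | SK_PAGE_PTE };
}
```

The design point is that an invariant ("every pmd built from a pfn carries the marker bit") is enforced at the single construction site rather than at every call site.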
[PATCH v2 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()
With virtio multiqueue, normally each queue IRQ is mapped to a CPU. But since
commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity") this is broken
on pseries: the affinity is correctly computed in msi_desc but this is not
applied to the system IRQs.

It appears the affinity is correctly passed to rtas_setup_msi_irqs() but lost
at this point and never passed to irq_domain_alloc_descs() (see commit
06ee6d571f0e ("genirq: Add affinity hint to irq allocation")) because
irq_create_mapping() doesn't take an affinity parameter.

As the previous patch has added the affinity parameter to
irq_create_mapping(), we can forward the affinity from rtas_setup_msi_irqs()
to irq_domain_alloc_descs().

With this change, the virtqueues are correctly dispatched between the CPUs on
pseries.

Signed-off-by: Laurent Vivier
---
 arch/powerpc/platforms/pseries/msi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index 133f6adcb39c..b3ac2455faad 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -458,7 +458,8 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
 			return hwirq;
 		}

-		virq = irq_create_mapping(NULL, hwirq);
+		virq = irq_create_mapping_affinity(NULL, hwirq,
+						   entry->affinity);

 		if (!virq) {
 			pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
-- 
2.28.0
[PATCH v2 1/2] genirq: add an irq_create_mapping_affinity() function
This function adds an affinity parameter to irq_create_mapping(). This
parameter is needed to pass it to irq_domain_alloc_descs().

irq_create_mapping() is a wrapper around irq_create_mapping_affinity() to
pass NULL for the affinity parameter.

No functional change.

Signed-off-by: Laurent Vivier
---
 include/linux/irqdomain.h | 12 ++--
 kernel/irq/irqdomain.c    | 13 -
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index 71535e87109f..ea5a337e0f8b 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -384,11 +384,19 @@ extern void irq_domain_associate_many(struct irq_domain *domain,
 extern void irq_domain_disassociate(struct irq_domain *domain,
 				    unsigned int irq);

-extern unsigned int irq_create_mapping(struct irq_domain *host,
-				       irq_hw_number_t hwirq);
+extern unsigned int irq_create_mapping_affinity(struct irq_domain *host,
+				      irq_hw_number_t hwirq,
+				      const struct irq_affinity_desc *affinity);
 extern unsigned int irq_create_fwspec_mapping(struct irq_fwspec *fwspec);
 extern void irq_dispose_mapping(unsigned int virq);

+static inline unsigned int irq_create_mapping(struct irq_domain *host,
+					      irq_hw_number_t hwirq)
+{
+	return irq_create_mapping_affinity(host, hwirq, NULL);
+}
+
+
 /**
  * irq_linear_revmap() - Find a linux irq from a hw irq number.
  * @domain: domain owning this hardware interrupt

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index cf8b374b892d..e4ca69608f3b 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -624,17 +624,19 @@ unsigned int irq_create_direct_mapping(struct irq_domain *domain)
 EXPORT_SYMBOL_GPL(irq_create_direct_mapping);

 /**
- * irq_create_mapping() - Map a hardware interrupt into linux irq space
+ * irq_create_mapping_affinity() - Map a hardware interrupt into linux irq space
  * @domain: domain owning this hardware interrupt or NULL for default domain
  * @hwirq: hardware irq number in that domain space
+ * @affinity: irq affinity
  *
  * Only one mapping per hardware interrupt is permitted. Returns a linux
  * irq number.
  * If the sense/trigger is to be specified, set_irq_type() should be called
  * on the number returned from that call.
  */
-unsigned int irq_create_mapping(struct irq_domain *domain,
-				irq_hw_number_t hwirq)
+unsigned int irq_create_mapping_affinity(struct irq_domain *domain,
+					 irq_hw_number_t hwirq,
+					 const struct irq_affinity_desc *affinity)
 {
 	struct device_node *of_node;
 	int virq;
@@ -660,7 +662,8 @@ unsigned int irq_create_mapping(struct irq_domain *domain,
 	}

 	/* Allocate a virtual interrupt number */
-	virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node), NULL);
+	virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node),
+				      affinity);
 	if (virq <= 0) {
 		pr_debug("-> virq allocation failed\n");
 		return 0;
@@ -676,7 +679,7 @@ unsigned int irq_create_mapping(struct irq_domain *domain,
 	return virq;
 }
-EXPORT_SYMBOL_GPL(irq_create_mapping);
+EXPORT_SYMBOL_GPL(irq_create_mapping_affinity);

 /**
  * irq_create_strict_mappings() - Map a range of hw irqs to fixed linux irqs
-- 
2.28.0
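The API-extension pattern this patch uses — rename the implementation to take the extra parameter, and keep the old name as a thin inline wrapper passing a default — can be sketched in isolation. The names and return-value encoding below are hypothetical, chosen only to make the sketch testable.

```c
#include <stddef.h>

struct affinity_desc { int cpu; };	/* hypothetical stand-in */

/* New implementation: takes the extra affinity argument.  A real
 * implementation would forward it to the descriptor allocator; here
 * we just encode whether an affinity was supplied. */
static unsigned int create_mapping_affinity(unsigned int hwirq,
					    const struct affinity_desc *affinity)
{
	return affinity ? hwirq + 1000u : hwirq;
}

/* The old name survives as a wrapper passing NULL, so every existing
 * caller keeps compiling and behaving exactly as before. */
static inline unsigned int create_mapping(unsigned int hwirq)
{
	return create_mapping_affinity(hwirq, NULL);
}
```

The benefit, as in the patch, is that the "no functional change" step can be reviewed and merged separately from the one caller that starts passing a real affinity.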
[PATCH v2 0/2] powerpc/pseries: fix MSI/X IRQ affinity on pseries
With virtio, in the multiqueue case, each queue IRQ is normally bound to a
different CPU using the affinity mask. This works fine on x86_64 but is
totally ignored on pseries. This is not obvious at first look because
irqbalance is doing some balancing to improve that.

It appears that the "managed" flag set in the MSI entry is never copied to
the system IRQ entry.

This series passes the affinity mask from rtas_setup_msi_irqs() to
irq_domain_alloc_descs() by adding an affinity parameter to
irq_create_mapping(). The first patch adds the parameter (no functional
change), the second patch passes the actual affinity mask to
irq_create_mapping() in rtas_setup_msi_irqs().

For instance, with a 32 CPUs VM and a 32 queues virtio-scsi interface:

  ... -smp 32 -device virtio-scsi-pci,id=virtio_scsi_pci0,num_queues=32

  for IRQ in $(grep virtio2-request /proc/interrupts | cut -d: -f1); do
    for file in /proc/irq/$IRQ/ ; do
      echo -n "IRQ: $(basename $file) CPU: " ; cat $file/smp_affinity_list
    done
  done

Without the patch (and without irqbalanced):

  IRQ: 268 CPU: 0-31
  IRQ: 269 CPU: 0-31
  IRQ: 270 CPU: 0-31
  IRQ: 271 CPU: 0-31
  IRQ: 272 CPU: 0-31
  IRQ: 273 CPU: 0-31
  IRQ: 274 CPU: 0-31
  IRQ: 275 CPU: 0-31
  IRQ: 276 CPU: 0-31
  IRQ: 277 CPU: 0-31
  IRQ: 278 CPU: 0-31
  IRQ: 279 CPU: 0-31
  IRQ: 280 CPU: 0-31
  IRQ: 281 CPU: 0-31
  IRQ: 282 CPU: 0-31
  IRQ: 283 CPU: 0-31
  IRQ: 284 CPU: 0-31
  IRQ: 285 CPU: 0-31
  IRQ: 286 CPU: 0-31
  IRQ: 287 CPU: 0-31
  IRQ: 288 CPU: 0-31
  IRQ: 289 CPU: 0-31
  IRQ: 290 CPU: 0-31
  IRQ: 291 CPU: 0-31
  IRQ: 292 CPU: 0-31
  IRQ: 293 CPU: 0-31
  IRQ: 294 CPU: 0-31
  IRQ: 295 CPU: 0-31
  IRQ: 296 CPU: 0-31
  IRQ: 297 CPU: 0-31
  IRQ: 298 CPU: 0-31
  IRQ: 299 CPU: 0-31

With the patch:

  IRQ: 265 CPU: 0
  IRQ: 266 CPU: 1
  IRQ: 267 CPU: 2
  IRQ: 268 CPU: 3
  IRQ: 269 CPU: 4
  IRQ: 270 CPU: 5
  IRQ: 271 CPU: 6
  IRQ: 272 CPU: 7
  IRQ: 273 CPU: 8
  IRQ: 274 CPU: 9
  IRQ: 275 CPU: 10
  IRQ: 276 CPU: 11
  IRQ: 277 CPU: 12
  IRQ: 278 CPU: 13
  IRQ: 279 CPU: 14
  IRQ: 280 CPU: 15
  IRQ: 281 CPU: 16
  IRQ: 282 CPU: 17
  IRQ: 283 CPU: 18
  IRQ: 284 CPU: 19
  IRQ: 285 CPU: 20
  IRQ: 286 CPU: 21
  IRQ: 287 CPU: 22
  IRQ: 288 CPU: 23
  IRQ: 289 CPU: 24
  IRQ: 290 CPU: 25
  IRQ: 291 CPU: 26
  IRQ: 292 CPU: 27
  IRQ: 293 CPU: 28
  IRQ: 294 CPU: 29
  IRQ: 295 CPU: 30
  IRQ: 299 CPU: 31

This matches what we have on an x86_64 system.

v2: add a wrapper around the original irq_create_mapping() with the affinity
    parameter. Update comments.

Laurent Vivier (2):
  genirq: add an irq_create_mapping_affinity() function
  powerpc/pseries: pass MSI affinity to irq_create_mapping()

 arch/powerpc/platforms/pseries/msi.c |  3 ++-
 include/linux/irqdomain.h            | 12 ++--
 kernel/irq/irqdomain.c               | 13 -
 3 files changed, 20 insertions(+), 8 deletions(-)

-- 
2.28.0
Re: [PATCH 1/2] powerpc: sstep: Fix load and update instructions
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 855457ed09b5..25a5436be6c6 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -2157,11 +2157,15 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,

 		case 23:	/* lwzx */
 		case 55:	/* lwzux */
+			if (u && (ra == 0 || ra == rd))
+				return -1;

I guess you also need to split case 23 and 55?

- Ravi
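The condition being added can be checked in isolation. This is a user-space sketch with a hypothetical helper name: the Power ISA makes update-form loads such as lwzux invalid when RA is 0 or RA equals RT, which is why the emulation code refuses (returns -1) rather than emulating them.

```c
#include <stdbool.h>

/* Sketch of the invalid-form test for update-form loads (e.g. lwzux):
 * RA = 0 or RA = RT is an invalid instruction form per the Power ISA,
 * so an instruction emulator should decline to emulate it. */
static bool bad_update_form(bool update, int ra, int rt)
{
	return update && (ra == 0 || ra == rt);
}
```

Ravi's point about splitting case 23 and 55 follows directly: the check only applies when `update` is true, i.e. to the lwzux (update) case, not to lwzx.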
Re: C vdso
Quoting Michael Ellerman:

Christophe Leroy writes:

On 03/11/2020 at 19:13, Christophe Leroy wrote:

On 23/10/2020 at 15:24, Michael Ellerman wrote:

Christophe Leroy writes:

On 24/09/2020 at 15:17, Christophe Leroy wrote:

On 17/09/2020 at 14:33, Michael Ellerman wrote:

Christophe Leroy writes:

What is the status with the generic C vdso merge? In some mail, you
mentioned having difficulties getting it working on ppc64, any progress?
What's the problem? Can I help?

Yeah sorry, I was hoping to get time to work on it but haven't been able
to. It's causing crashes on ppc64, i.e. big endian.

...

Can you tell what defconfig you are using? I have been able to set up a
full glibc PPC64 cross-compilation chain and been able to test it under
QEMU with success, using Nathan's vdsotest tool. What config are you using?

ppc64_defconfig + guest.config, or pseries_defconfig. I'm using Ubuntu
GCC 9.3.0 mostly, but it happens with other toolchains too. At a minimum
we're seeing relocations in the output, which is a problem:

  $ readelf -r build~/arch/powerpc/kernel/vdso64/vdso64.so

  Relocation section '.rela.dyn' at offset 0x12a8 contains 8 entries:
    Offset  Info  Type              Sym. Value  Sym. Name + Addend
    1368    0016  R_PPC64_RELATIVE  7c0
    1370    0016  R_PPC64_RELATIVE  9300
    1380    0016  R_PPC64_RELATIVE  970
    1388    0016  R_PPC64_RELATIVE  9300
    1398    0016  R_PPC64_RELATIVE  a90
    13a0    0016  R_PPC64_RELATIVE  9300
    13b0    0016  R_PPC64_RELATIVE  b20
    13b8    0016  R_PPC64_RELATIVE  9300

Looks like it's due to the OPD and the relation between function() and
.function(). By using DOTSYM() in the 'bl' call, it's directly the dot
function which is called and the OPD is not used anymore, so it can get
dropped. Now I get .rela.dyn full of 0, don't know if we should drop it
explicitly.

What is the status now with the latest version of the C VDSO? I saw you
had it in next-test for some time, it is not there anymore today.

Still having some trouble with the compat VDSO, eg:

  $ ./vdsotest clock-gettime-monotonic verify
  timestamp obtained from kernel predates timestamp
  previously obtained from libc/vDSO:
  [1346, 821441653] (vDSO)
  [570, 769440040] (kernel)

And similar for all clocks except the coarse ones.

Ok, I managed to get the same with QEMU. Looking at the binary, I only see
an mftb instead of the mftbu/mftb/mftbu triplet. Fix below. Can you carry
it, or do you prefer a full patch from me? The easiest would be either to
squash it into [v13,4/8] ("powerpc/time: Move timebase functions into new
asm/timebase.h"), or to add it between patch 4 and 5.

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index f877a576b338..c3473eb031a3 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1419,7 +1419,7 @@ static inline void msr_check_and_clear(unsigned long bits)
 		__msr_check_and_clear(bits);
 }

-#if defined(CONFIG_PPC_CELL) || defined(CONFIG_E500)
+#if defined(__powerpc64__) && (defined(CONFIG_PPC_CELL) || defined(CONFIG_E500))
 #define mftb()		({unsigned long rval;			\
 			asm volatile(				\
 				"90:	mfspr %0, %2;\n"	\

diff --git a/arch/powerpc/include/asm/timebase.h b/arch/powerpc/include/asm/timebase.h
index a8eae3adaa91..7b372976f5a5 100644
--- a/arch/powerpc/include/asm/timebase.h
+++ b/arch/powerpc/include/asm/timebase.h
@@ -21,7 +21,7 @@ static inline u64 get_tb(void)
 {
 	unsigned int tbhi, tblo, tbhi2;

-	if (IS_ENABLED(CONFIG_PPC64))
+	if (IS_BUILTIN(__powerpc64__))
 		return mftb();

 	do {
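The mftbu/mftb/mftbu triplet discussed above exists because a 64-bit timebase read on a 32-bit CPU takes two register reads, and the low half can carry into the high half between them. A user-space sketch of the retry loop (with fake register reads standing in for the mftbu/mftb instructions, and a fixed fake timebase value so the sketch is deterministic):

```c
#include <stdint.h>

static uint64_t fake_tb = 0x00000001ffffffffULL;	/* stand-in for the hardware timebase */

static uint32_t read_tbu(void) { return (uint32_t)(fake_tb >> 32); }	/* mftbu stand-in */
static uint32_t read_tbl(void) { return (uint32_t)fake_tb; }		/* mftb stand-in  */

/* Read high, then low, then high again; if the high half changed,
 * the low half wrapped between the two reads, so retry. */
static uint64_t get_tb_sketch(void)
{
	uint32_t tbhi, tblo, tbhi2;

	do {
		tbhi = read_tbu();
		tblo = read_tbl();
		tbhi2 = read_tbu();
	} while (tbhi2 != tbhi);

	return ((uint64_t)tbhi << 32) | tblo;
}
```

With only a single low-half read (the lone mftb Christophe spotted), a wrap between the halves can yield a value that jumps backwards, which is exactly the "timestamp predates" symptom vdsotest reports.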
Re: [PATCH v4 10/18] dt-bindings: usb: Convert DWC USB3 bindings to DT schema
On Sat, Nov 21, 2020 at 06:42:28AM -0600, Rob Herring wrote:
> On Thu, Nov 12, 2020 at 01:29:46PM +0300, Serge Semin wrote:
> > On Wed, Nov 11, 2020 at 02:14:23PM -0600, Rob Herring wrote:
> > > On Wed, Nov 11, 2020 at 12:08:45PM +0300, Serge Semin wrote:
> > > > DWC USB3 DT node is supposed to be compliant with the Generic xHCI
> > > > Controller schema, but with additional vendor-specific properties, the
> > > > controller-specific reference clocks and PHYs. So let's convert the
> > > > currently available legacy text-based DWC USB3 bindings to the DT schema
> > > > and make sure the DWC USB3 nodes are also validated against the
> > > > usb-xhci.yaml schema.
> > > >
> > > > Note we have to discard the nodename restriction of being prefixed with
> > > > "dwc3@" string, since in accordance with the usb-hcd.yaml schema USB nodes
> > > > are supposed to be named as "^usb(@.*)".
> > > >
> > > > Signed-off-by: Serge Semin
> > > >
> > > > ---
> > > >
> > > > Changelog v2:
> > > > - Discard '|' from the descriptions, since we don't need to preserve
> > > >   the text formatting in any of them.
> > > > - Drop quotes from around the string constants.
> > > > - Fix the "clock-names" prop description to be referring the enumerated
> > > >   clock-names instead of the ones from the Databook.
> > > >
> > > > Changelog v3:
> > > > - Apply usb-xhci.yaml# schema only if the controller is supposed to work
> > > >   as either host or otg.
> > > >
> > > > Changelog v4:
> > > > - Apply usb-drd.yaml schema first. If the controller is configured
> > > >   to work in a gadget mode only, then apply the usb.yaml schema too,
> > > >   otherwise apply the usb-xhci.yaml schema.
> > > > - Discard Rob's Reviewed-by tag. Please review the patch one more time.
> > > >
> > > > ---
> > > >  .../devicetree/bindings/usb/dwc3.txt       | 125
> > > >  .../devicetree/bindings/usb/snps,dwc3.yaml | 303 ++
> > > >  2 files changed, 303 insertions(+), 125 deletions(-)
> > > >  delete mode 100644 Documentation/devicetree/bindings/usb/dwc3.txt
> > > >  create mode 100644 Documentation/devicetree/bindings/usb/snps,dwc3.yaml
> > > >
> > > > diff --git a/Documentation/devicetree/bindings/usb/snps,dwc3.yaml b/Documentation/devicetree/bindings/usb/snps,dwc3.yaml
> > > > new file mode 100644
> > > > index ..079617891da6
> > > > --- /dev/null
> > > > +++ b/Documentation/devicetree/bindings/usb/snps,dwc3.yaml
> > > > @@ -0,0 +1,303 @@
> > > > +# SPDX-License-Identifier: GPL-2.0
> > > > +%YAML 1.2
> > > > +---
> > > > +$id: http://devicetree.org/schemas/usb/snps,dwc3.yaml#
> > > > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > > > +
> > > > +title: Synopsys DesignWare USB3 Controller
> > > > +
> > > > +maintainers:
> > > > +  - Felipe Balbi
> > > > +
> > > > +description:
> > > > +  This is usually a subnode to DWC3 glue to which it is connected, but can also
> > > > +  be presented as a standalone DT node with an optional vendor-specific
> > > > +  compatible string.
> > > > +
> > > > +allOf:
> > > > +  - $ref: usb-drd.yaml#
> > > > +  - if:
> > > > +      properties:
> > > > +        dr_mode:
> > > > +          const: peripheral
>
> Another thing, this evaluates to true if dr_mode is not present. You
> need to add 'required'?

Right. Will something like this do that?

+ allOf:
+   - $ref: usb-drd.yaml#
+   - if:
+       properties:
+         dr_mode:
+           const: peripheral
+       required:
+         - dr_mode
+     then:
+       $ref: usb.yaml#
+     else:
+       $ref: usb-xhci.yaml#

> If dr_mode is otg, then don't you need to apply
> both usb.yaml and usb-xhci.yaml?

No, I don't. Since there is no peripheral-specific DT schema, the only
schema any USB-gadget node needs to pass is usb.yaml, which is already
included into the usb-xhci.yaml schema.

So for pure OTG devices with xHCI host and gadget capabilities it's enough
to evaluate: allOf: [$ref: usb-drd.yaml#, $ref: usb-xhci.yaml#]. Please see
the sketch/ASCII-figure below and the following text for details.

-Sergey

> > > > +    then:
> > > > +      $ref: usb.yaml#
> > >
> > > This part could be done in usb-drd.yaml?
> >
> > Originally I was thinking about that, but then in order to minimize
> > the properties validation I've decided to split the properties in
> > accordance with the USB controllers functionality:
> >
> >              +- USB Gadget/Peripheral Controller. There is no
> >              |  specific schema for the gadgets since there are no
> >              |  common gadget properties (at least I failed to find
> >              |  ones). So the pure gadget controllers need to be
> >              |  validated just against the usb.yaml schema.
> >              |
> > usb.yaml <---+-- usb-hcd.yaml - Generic USB Host Controller. The schema
> >              ^   turns out to include the OHCI/UHCI/EHCI
> >              |   properties, which AFAICS