Re: [PATCH v6 16/22] powerpc/book3s64/kuap: Improve error reporting with KUAP
Christophe Leroy writes: > Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit : >> With hash translation use DSISR_KEYFAULT to identify a wrong access. >> With Radix we look at the AMR value and type of fault. >> >> Signed-off-by: Aneesh Kumar K.V >> --- >> arch/powerpc/include/asm/book3s/32/kup.h | 4 +-- >> arch/powerpc/include/asm/book3s/64/kup.h | 27 >> arch/powerpc/include/asm/kup.h | 4 +-- >> arch/powerpc/include/asm/nohash/32/kup-8xx.h | 4 +-- >> arch/powerpc/mm/fault.c | 2 +- >> 5 files changed, 29 insertions(+), 12 deletions(-) >> >> diff --git a/arch/powerpc/include/asm/book3s/32/kup.h >> b/arch/powerpc/include/asm/book3s/32/kup.h >> index 32fd4452e960..b18cd931e325 100644 >> --- a/arch/powerpc/include/asm/book3s/32/kup.h >> +++ b/arch/powerpc/include/asm/book3s/32/kup.h >> @@ -177,8 +177,8 @@ static inline void restore_user_access(unsigned long >> flags) >> allow_user_access(to, to, end - addr, KUAP_READ_WRITE); >> } >> >> -static inline bool >> -bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) >> +static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long >> address, >> + bool is_write, unsigned long error_code) >> { >> unsigned long begin = regs->kuap & 0xf000; >> unsigned long end = regs->kuap << 28; >> diff --git a/arch/powerpc/include/asm/book3s/64/kup.h >> b/arch/powerpc/include/asm/book3s/64/kup.h >> index 4a3d0d601745..2922c442a218 100644 >> --- a/arch/powerpc/include/asm/book3s/64/kup.h >> +++ b/arch/powerpc/include/asm/book3s/64/kup.h >> @@ -301,12 +301,29 @@ static inline void set_kuap(unsigned long value) >> isync(); >> } >> >> -static inline bool >> -bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) >> +#define RADIX_KUAP_BLOCK_READ UL(0x4000) >> +#define RADIX_KUAP_BLOCK_WRITE UL(0x8000) >> + >> +static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long >> address, >> + bool is_write, unsigned long error_code) >> { >> -return WARN(mmu_has_feature(MMU_FTR_KUAP) && >> 
-(regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : >> AMR_KUAP_BLOCK_READ)), >> -"Bug: %s fault blocked by AMR!", is_write ? "Write" : >> "Read"); >> +if (!mmu_has_feature(MMU_FTR_KUAP)) >> +return false; >> + >> +if (radix_enabled()) { >> +/* >> + * Will be a storage protection fault. >> + * Only check the details of AMR[0] >> + */ >> +return WARN((regs->kuap & (is_write ? RADIX_KUAP_BLOCK_WRITE : >> RADIX_KUAP_BLOCK_READ)), >> +"Bug: %s fault blocked by AMR!", is_write ? "Write" >> : "Read");

> I think it is pointless to keep the WARN() here.
>
> I have a series aiming at removing them. See
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/cc9129bdda1dbc2f0a09cf45fece7d0b0e690784.1605541983.git.christophe.le...@csgroup.eu/

Can we do this as a separate patch as you posted above? We can drop the
WARN in that while keeping the hash branch to look at the DSISR value.

-aneesh
Re: [PATCH v6 09/22] powerpc/exec: Set thread.regs early during exec
Le 26/11/2020 à 08:38, Aneesh Kumar K.V a écrit : Christophe Leroy writes: Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit : +++ b/arch/powerpc/kernel/process.c @@ -1530,10 +1530,32 @@ void flush_thread(void) #ifdef CONFIG_PPC_BOOK3S_64 void arch_setup_new_exec(void) { - if (radix_enabled()) - return; - hash__setup_new_exec(); + if (!radix_enabled()) + hash__setup_new_exec(); + + /* +* If we exec out of a kernel thread then thread.regs will not be +* set. Do it now. +*/ + if (!current->thread.regs) { + struct pt_regs *regs = task_stack_page(current) + THREAD_SIZE; + current->thread.regs = regs - 1; + } + +} +#else +void arch_setup_new_exec(void) +{ + /* +* If we exec out of a kernel thread then thread.regs will not be +* set. Do it now. +*/ + if (!current->thread.regs) { + struct pt_regs *regs = task_stack_page(current) + THREAD_SIZE; + current->thread.regs = regs - 1; + } } + #endif No need to duplicate arch_setup_new_exec() I think. radix_enabled() is defined at all time so the first function should be valid at all time. arch/powerpc/kernel/process.c: In function ‘arch_setup_new_exec’: arch/powerpc/kernel/process.c:1529:3: error: implicit declaration of function ‘hash__setup_new_exec’; did you mean ‘arch_setup_new_exec’? [-Werror=implicit-function-declaration] 1529 | hash__setup_new_exec(); | ^~~~ | arch_setup_new_exec That requires us to have hash__setup_new_exec prototype for all platforms. Yes indeed. So maybe, just enclose that part in the #ifdef instead of duplicating the common part ? Christophe
Re: [PATCH v6 09/22] powerpc/exec: Set thread.regs early during exec
Christophe Leroy writes: > Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit : > +++ b/arch/powerpc/kernel/process.c >> @@ -1530,10 +1530,32 @@ void flush_thread(void) >> #ifdef CONFIG_PPC_BOOK3S_64 >> void arch_setup_new_exec(void) >> { >> -if (radix_enabled()) >> -return; >> -hash__setup_new_exec(); >> +if (!radix_enabled()) >> +hash__setup_new_exec(); >> + >> +/* >> + * If we exec out of a kernel thread then thread.regs will not be >> + * set. Do it now. >> + */ >> +if (!current->thread.regs) { >> +struct pt_regs *regs = task_stack_page(current) + THREAD_SIZE; >> +current->thread.regs = regs - 1; >> +} >> + >> +} >> +#else >> +void arch_setup_new_exec(void) >> +{ >> +/* >> + * If we exec out of a kernel thread then thread.regs will not be >> + * set. Do it now. >> + */ >> +if (!current->thread.regs) { >> +struct pt_regs *regs = task_stack_page(current) + THREAD_SIZE; >> +current->thread.regs = regs - 1; >> +} >> } >> + >> #endif > > No need to duplicate arch_setup_new_exec() I think. radix_enabled() is > defined at all time so the > first function should be valid at all time. > arch/powerpc/kernel/process.c: In function ‘arch_setup_new_exec’: arch/powerpc/kernel/process.c:1529:3: error: implicit declaration of function ‘hash__setup_new_exec’; did you mean ‘arch_setup_new_exec’? [-Werror=implicit-function-declaration] 1529 | hash__setup_new_exec(); | ^~~~ | arch_setup_new_exec That requires us to have hash__setup_new_exec prototype for all platforms. -aneesh
Re: [PATCH] tpm: ibmvtpm: fix error return code in tpm_ibmvtpm_probe()
On Tue, 2020-11-24 at 21:52 +0800, Wang Hai wrote:
> Fix to return a negative error code from the error handling
> case instead of 0, as done elsewhere in this function.
>
> Fixes: d8d74ea3c002 ("tpm: ibmvtpm: Wait for buffer to be set before proceeding")
> Reported-by: Hulk Robot
> Signed-off-by: Wang Hai

Provide a reasoning for -ETIMEDOUT in the commit message.

/Jarkko

> ---
>  drivers/char/tpm/tpm_ibmvtpm.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c
> index 994385bf37c0..813eb2cac0ce 100644
> --- a/drivers/char/tpm/tpm_ibmvtpm.c
> +++ b/drivers/char/tpm/tpm_ibmvtpm.c
> @@ -687,6 +687,7 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
>  				      ibmvtpm->rtce_buf != NULL,
>  				      HZ)) {
>  		dev_err(dev, "CRQ response timed out\n");
> +		rc = -ETIMEDOUT;
>  		goto init_irq_cleanup;
>  	}
Re: [PATCH v6 04/22] powerpc/book3s64/kuap/kuep: Move uamor setup to pkey init
"Aneesh Kumar K.V" writes: > This patch consolidates UAMOR update across pkey, kuap and kuep features. > The boot cpu initialize UAMOR via pkey init and both radix/hash do the > secondary cpu UAMOR init in early_init_mmu_secondary. > > We don't check for mmu_feature in radix secondary init because UAMOR > is a supported SPRN with all CPUs supporting radix translation. > The old code was not updating UAMOR if we had smap disabled and smep enabled. > This change handles that case. > > Signed-off-by: Aneesh Kumar K.V > --- > arch/powerpc/mm/book3s64/radix_pgtable.c | 8 +--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c > b/arch/powerpc/mm/book3s64/radix_pgtable.c > index 3adcf730f478..bfe441af916a 100644 > --- a/arch/powerpc/mm/book3s64/radix_pgtable.c > +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c > @@ -620,9 +620,6 @@ void setup_kuap(bool disabled) > cur_cpu_spec->mmu_features |= MMU_FTR_RADIX_KUAP; > } > > - /* Make sure userspace can't change the AMR */ > - mtspr(SPRN_UAMOR, 0); > - > /* >* Set the default kernel AMR values on all cpus. >*/ > @@ -721,6 +718,11 @@ void radix__early_init_mmu_secondary(void) > > radix__switch_mmu_context(NULL, _mm); > tlbiel_all(); > + > +#ifdef CONFIG_PPC_PKEY > + /* Make sure userspace can't change the AMR */ > + mtspr(SPRN_UAMOR, 0); > +#endif If PPC_PKEY is disabled I think this leaves UAMOR unset, which means it could potentially allow AMR to be used as a covert channel between processes. cheers
Re: [PATCH v6 03/22] powerpc/book3s64/kuap/kuep: Make KUAP and KUEP a subfeature of PPC_MEM_KEYS
"Aneesh Kumar K.V" writes: > The next set of patches adds support for kuap with hash translation. > Hence make KUAP a BOOK3S_64 feature. Also make it a subfeature of > PPC_MEM_KEYS. Hash translation is going to use pkeys to support > KUAP/KUEP. Adding this dependency reduces the code complexity and > enables us to move some of the initialization code to pkeys.c The subject and change log don't really match the patch anymore since you incorporated my changes. This adds a new CONFIG called PPC_PKEY which is enabled if either PKEY or KUAP/KUEP is enabled etc. cheers > Signed-off-by: Aneesh Kumar K.V > --- > .../powerpc/include/asm/book3s/64/kup-radix.h | 4 ++-- > arch/powerpc/include/asm/book3s/64/mmu.h | 2 +- > arch/powerpc/include/asm/ptrace.h | 7 +- > arch/powerpc/kernel/asm-offsets.c | 3 +++ > arch/powerpc/mm/book3s64/Makefile | 2 +- > arch/powerpc/mm/book3s64/pkeys.c | 24 --- > arch/powerpc/platforms/Kconfig.cputype| 5 > 7 files changed, 33 insertions(+), 14 deletions(-) > > diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h > b/arch/powerpc/include/asm/book3s/64/kup-radix.h > index 28716e2f13e3..68eaa2fac3ab 100644 > --- a/arch/powerpc/include/asm/book3s/64/kup-radix.h > +++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h > @@ -16,7 +16,7 @@ > #ifdef CONFIG_PPC_KUAP > BEGIN_MMU_FTR_SECTION_NESTED(67) > mfspr \gpr1, SPRN_AMR > - ld \gpr2, STACK_REGS_KUAP(r1) > + ld \gpr2, STACK_REGS_AMR(r1) > cmpd\gpr1, \gpr2 > beq 998f > isync > @@ -48,7 +48,7 @@ > bne \msr_pr_cr, 99f > .endif > mfspr \gpr1, SPRN_AMR > - std \gpr1, STACK_REGS_KUAP(r1) > + std \gpr1, STACK_REGS_AMR(r1) > li \gpr2, (AMR_KUAP_BLOCKED >> AMR_KUAP_SHIFT) > sldi\gpr2, \gpr2, AMR_KUAP_SHIFT > cmpd\use_cr, \gpr1, \gpr2 > diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h > b/arch/powerpc/include/asm/book3s/64/mmu.h > index e0b52940e43c..a2a015066bae 100644 > --- a/arch/powerpc/include/asm/book3s/64/mmu.h > +++ b/arch/powerpc/include/asm/book3s/64/mmu.h > @@ -199,7 +199,7 @@ extern int 
mmu_io_psize; > void mmu_early_init_devtree(void); > void hash__early_init_devtree(void); > void radix__early_init_devtree(void); > -#ifdef CONFIG_PPC_MEM_KEYS > +#ifdef CONFIG_PPC_PKEY > void pkey_early_init_devtree(void); > #else > static inline void pkey_early_init_devtree(void) {} > diff --git a/arch/powerpc/include/asm/ptrace.h > b/arch/powerpc/include/asm/ptrace.h > index e2c778c176a3..e7f1caa007a4 100644 > --- a/arch/powerpc/include/asm/ptrace.h > +++ b/arch/powerpc/include/asm/ptrace.h > @@ -53,9 +53,14 @@ struct pt_regs > #ifdef CONFIG_PPC64 > unsigned long ppr; > #endif > + union { > #ifdef CONFIG_PPC_KUAP > - unsigned long kuap; > + unsigned long kuap; > #endif > +#ifdef CONFIG_PPC_PKEY > + unsigned long amr; > +#endif > + }; > }; > unsigned long __pad[2]; /* Maintain 16 byte interrupt stack > alignment */ > }; > diff --git a/arch/powerpc/kernel/asm-offsets.c > b/arch/powerpc/kernel/asm-offsets.c > index c2722ff36e98..418a0b314a33 100644 > --- a/arch/powerpc/kernel/asm-offsets.c > +++ b/arch/powerpc/kernel/asm-offsets.c > @@ -354,6 +354,9 @@ int main(void) > STACK_PT_REGS_OFFSET(_PPR, ppr); > #endif /* CONFIG_PPC64 */ > > +#ifdef CONFIG_PPC_PKEY > + STACK_PT_REGS_OFFSET(STACK_REGS_AMR, amr); > +#endif > #ifdef CONFIG_PPC_KUAP > STACK_PT_REGS_OFFSET(STACK_REGS_KUAP, kuap); > #endif > diff --git a/arch/powerpc/mm/book3s64/Makefile > b/arch/powerpc/mm/book3s64/Makefile > index fd393b8be14f..1b56d3af47d4 100644 > --- a/arch/powerpc/mm/book3s64/Makefile > +++ b/arch/powerpc/mm/book3s64/Makefile > @@ -17,7 +17,7 @@ endif > obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += hash_hugepage.o > obj-$(CONFIG_PPC_SUBPAGE_PROT) += subpage_prot.o > obj-$(CONFIG_SPAPR_TCE_IOMMU)+= iommu_api.o > -obj-$(CONFIG_PPC_MEM_KEYS) += pkeys.o > +obj-$(CONFIG_PPC_PKEY) += pkeys.o > > # Instrumenting the SLB fault path can lead to duplicate SLB entries > KCOV_INSTRUMENT_slb.o := n > diff --git a/arch/powerpc/mm/book3s64/pkeys.c > b/arch/powerpc/mm/book3s64/pkeys.c > index 
b1d091a97611..7dc71f85683d 100644 > --- a/arch/powerpc/mm/book3s64/pkeys.c > +++ b/arch/powerpc/mm/book3s64/pkeys.c > @@ -89,12 +89,14 @@ static int scan_pkey_feature(void) > } > } > > +#ifdef CONFIG_PPC_MEM_KEYS > /* >* Adjust the upper limit, based on the number of bits supported by >* arch-neutral code. >*/ > pkeys_total = min_t(int, pkeys_total, > ((ARCH_VM_PKEY_FLAGS >> VM_PKEY_SHIFT) + 1)); > +#endif > return pkeys_total; > } > > @@ -102,6 +104,7
Re: [PATCH v3 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()
Laurent Vivier writes: > With virtio multiqueue, normally each queue IRQ is mapped to a CPU. > > But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity") > this is broken on pseries. > > The affinity is correctly computed in msi_desc but this is not applied > to the system IRQs. > > It appears the affinity is correctly passed to rtas_setup_msi_irqs() but > lost at this point and never passed to irq_domain_alloc_descs() > (see commit 06ee6d571f0e ("genirq: Add affinity hint to irq allocation")) > because irq_create_mapping() doesn't take an affinity parameter. > > As the previous patch has added the affinity parameter to > irq_create_mapping() we can forward the affinity from rtas_setup_msi_irqs() > to irq_domain_alloc_descs(). > > With this change, the virtqueues are correctly dispatched between the CPUs > on pseries. > > BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1702939 > Signed-off-by: Laurent Vivier > Reviewed-by: Greg Kurz > --- > arch/powerpc/platforms/pseries/msi.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) Acked-by: Michael Ellerman cheers > diff --git a/arch/powerpc/platforms/pseries/msi.c > b/arch/powerpc/platforms/pseries/msi.c > index 133f6adcb39c..b3ac2455faad 100644 > --- a/arch/powerpc/platforms/pseries/msi.c > +++ b/arch/powerpc/platforms/pseries/msi.c > @@ -458,7 +458,8 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int > nvec_in, int type) > return hwirq; > } > > - virq = irq_create_mapping(NULL, hwirq); > + virq = irq_create_mapping_affinity(NULL, hwirq, > +entry->affinity); > > if (!virq) { > pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq); > -- > 2.28.0
Re: [PATCH v3 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()
Marc Zyngier writes: > On 2020-11-25 16:24, Laurent Vivier wrote: >> On 25/11/2020 17:05, Denis Kirjanov wrote: >>> On 11/25/20, Laurent Vivier wrote: With virtio multiqueue, normally each queue IRQ is mapped to a CPU. But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity") this is broken on pseries. >>> >>> Please add "Fixes" tag. >> >> In fact, the code in commit 0d9f0a52c8b9f is correct. >> >> The problem is with MSI/X irq affinity and pseries. So this patch >> fixes more than virtio_scsi. I put this information because this >> commit allows to clearly show the problem. Perhaps I should remove >> this line in fact? > > This patch does not fix virtio_scsi at all, which as you noticed, is > correct. It really fixes the PPC MSI setup, which is starting to show > its age. So getting rid of the reference seems like the right thing to > do. It's still useful to refer to that commit if the code worked prior to that commit. But you should make it clearer that 0d9f0a52c8b9f wasn't in error, it just exposed an existing shortcoming of the arch code. cheers
Re: [PATCH v6 07/22] powerpc/book3s64/kuap: Rename MMU_FTR_RADIX_KUAP to MMU_FTR_KUAP
"Aneesh Kumar K.V" writes: > diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h > index 255a1837e9f7..f5c7a17c198a 100644 > --- a/arch/powerpc/include/asm/mmu.h > +++ b/arch/powerpc/include/asm/mmu.h > @@ -28,6 +28,11 @@ > * Individual features below. > */ > > +/* > + * Supports KUAP (key 0 controlling userspace addresses) on radix > + */ That comment needs updating. I think this feature now means we have either key 0 controlling uaccess on radix OR we're using the AMR to manually implement KUAP. > +#define MMU_FTR_KUAP ASM_CONST(0x0200) I agree with Christophe that this name is now too generic. With that name one would expect it to be enabled on the 32-bit CPUs that implement KUAP. Maybe MMU_FTR_BOOK3S_KUAP ? If in future the other MMUs want an MMU feature for KUAP then we could rename it to MMU_FTR_KUAP, but we'd need to be careful with ifdefs to make sure it guards the right things. cheers
[PATCH 11/13] ibmvfc: set and track hw queue in ibmvfc_event struct
Extract the hwq id from a SCSI command and store it in the ibmvfc_event structure to identify which Sub-CRQ to send the command down when channels are being utilized. Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.c | 5 + drivers/scsi/ibmvscsi/ibmvfc.h | 1 + 2 files changed, 6 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 55893d09f883..f686c2cb0de2 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -1387,6 +1387,7 @@ static void ibmvfc_init_event(struct ibmvfc_event *evt, evt->crq.format = format; evt->done = done; evt->eh_comp = NULL; + evt->hwq = 0; } /** @@ -1738,6 +1739,8 @@ static int ibmvfc_queuecommand_lck(struct scsi_cmnd *cmnd, struct ibmvfc_cmd *vfc_cmd; struct ibmvfc_fcp_cmd_iu *iu; struct ibmvfc_event *evt; + u32 tag_and_hwq = blk_mq_unique_tag(cmnd->request); + u16 hwq = blk_mq_unique_tag_to_hwq(tag_and_hwq); int rc; if (unlikely((rc = fc_remote_port_chkready(rport))) || @@ -1765,6 +1768,8 @@ static int ibmvfc_queuecommand_lck(struct scsi_cmnd *cmnd, } vfc_cmd->correlation = cpu_to_be64(evt); + if (vhost->using_channels) + evt->hwq = hwq % vhost->scsi_scrqs.active_queues; if (likely(!(rc = ibmvfc_map_sg_data(cmnd, evt, vfc_cmd, vhost->dev return ibmvfc_send_event(evt, vhost, 0); diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h index 04086ffbfca7..abda910ae33d 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.h +++ b/drivers/scsi/ibmvscsi/ibmvfc.h @@ -781,6 +781,7 @@ struct ibmvfc_event { struct completion comp; struct completion *eh_comp; struct timer_list timer; + u16 hwq; }; /* a pool of event structs for use */ -- 2.27.0
[PATCH 13/13] ibmvfc: register Sub-CRQ handles with VIOS during channel setup
If the ibmvfc client adapter requests channels it must submit a number of Sub-CRQ handles matching the number of channels being requested. The VIOS in its response will overwrite the actual number of channel resources allocated which may be less than what was requested. The client then must store the VIOS Sub-CRQ handle for each queue. This VIOS handle is needed as a parameter with h_send_sub_crq(). Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.c | 32 +++- 1 file changed, 31 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 897e3236534d..6bb1028bbe44 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -4494,15 +4494,35 @@ static void ibmvfc_discover_targets(struct ibmvfc_host *vhost) static void ibmvfc_channel_setup_done(struct ibmvfc_event *evt) { struct ibmvfc_host *vhost = evt->vhost; + struct ibmvfc_channel_setup *setup = vhost->channel_setup_buf; + struct ibmvfc_scsi_channels *scrqs = >scsi_scrqs; u32 mad_status = be16_to_cpu(evt->xfer_iu->channel_setup.common.status); int level = IBMVFC_DEFAULT_LOG_LEVEL; + int flags, active_queues, i; ibmvfc_free_event(evt); switch (mad_status) { case IBMVFC_MAD_SUCCESS: ibmvfc_dbg(vhost, "Channel Setup succeded\n"); + flags = be32_to_cpu(setup->flags); vhost->do_enquiry = 0; + active_queues = be32_to_cpu(setup->num_scsi_subq_channels); + scrqs->active_queues = active_queues; + + if (flags & IBMVFC_CHANNELS_CANCELED) { + ibmvfc_dbg(vhost, "Channels Canceled\n"); + vhost->using_channels = 0; + } else { + if (active_queues) + vhost->using_channels = 1; + for (i = 0; i < active_queues; i++) + scrqs->scrqs[i].vios_cookie = + be64_to_cpu(setup->channel_handles[i]); + + ibmvfc_dbg(vhost, "Using %u channels\n", + vhost->scsi_scrqs.active_queues); + } break; case IBMVFC_MAD_FAILED: level += ibmvfc_retry_host_init(vhost); @@ -4526,9 +4546,19 @@ static void ibmvfc_channel_setup(struct ibmvfc_host *vhost) struct 
ibmvfc_channel_setup_mad *mad; struct ibmvfc_channel_setup *setup_buf = vhost->channel_setup_buf; struct ibmvfc_event *evt = ibmvfc_get_event(vhost); + struct ibmvfc_scsi_channels *scrqs = >scsi_scrqs; + unsigned int num_channels = + min(vhost->client_scsi_channels, vhost->max_vios_scsi_channels); + int i; memset(setup_buf, 0, sizeof(*setup_buf)); - setup_buf->flags = cpu_to_be32(IBMVFC_CANCEL_CHANNELS); + if (num_channels == 0) + setup_buf->flags = cpu_to_be32(IBMVFC_CANCEL_CHANNELS); + else { + setup_buf->num_scsi_subq_channels = cpu_to_be32(num_channels); + for (i = 0; i < num_channels; i++) + setup_buf->channel_handles[i] = cpu_to_be64(scrqs->scrqs[i].cookie); + } ibmvfc_init_event(evt, ibmvfc_channel_setup_done, IBMVFC_MAD_FORMAT); mad = >iu.channel_setup; -- 2.27.0
[PATCH 12/13] ibmvfc: send commands down HW Sub-CRQ when channelized
When the client has negotiated the use of channels, all vfcFrames are
required to go down a Sub-CRQ channel or it is a protocol violation. If
the adapter state is channelized, submit vfcFrames to the appropriate
Sub-CRQ via the h_send_sub_crq() helper.

Signed-off-by: Tyrel Datwyler
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index f686c2cb0de2..897e3236534d 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -701,6 +701,15 @@ static int ibmvfc_send_crq(struct ibmvfc_host *vhost, u64 word1, u64 word2)
 	return plpar_hcall_norets(H_SEND_CRQ, vdev->unit_address, word1, word2);
 }
 
+static int ibmvfc_send_sub_crq(struct ibmvfc_host *vhost, u64 cookie, u64 word1,
+			       u64 word2, u64 word3, u64 word4)
+{
+	struct vio_dev *vdev = to_vio_dev(vhost->dev);
+
+	return plpar_hcall_norets(H_SEND_SUB_CRQ, vdev->unit_address, cookie,
+				  word1, word2, word3, word4);
+}
+
 /**
  * ibmvfc_send_crq_init - Send a CRQ init message
  * @vhost:	ibmvfc host struct
@@ -1524,8 +1533,17 @@ static int ibmvfc_send_event(struct ibmvfc_event *evt,
 
 	mb();
 
-	if ((rc = ibmvfc_send_crq(vhost, be64_to_cpu(crq_as_u64[0]),
-				  be64_to_cpu(crq_as_u64[1])))) {
+	if (vhost->using_channels && evt->crq.format == IBMVFC_CMD_FORMAT)
+		rc = ibmvfc_send_sub_crq(vhost,
+					 vhost->scsi_scrqs.scrqs[evt->hwq].vios_cookie,
+					 be64_to_cpu(crq_as_u64[0]),
+					 be64_to_cpu(crq_as_u64[1]),
+					 0, 0);
+	else
+		rc = ibmvfc_send_crq(vhost, be64_to_cpu(crq_as_u64[0]),
+				     be64_to_cpu(crq_as_u64[1]));
+
+	if (rc) {
 		list_del(&evt->queue);
 		del_timer(&evt->timer);
--
2.27.0
[PATCH 01/13] ibmvfc: add vhost fields and defaults for MQ enablement
Introduce several new vhost fields for managing MQ state of the adapter as well as initial defaults for MQ enablement. Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.c | 7 +++ drivers/scsi/ibmvscsi/ibmvfc.h | 9 + 2 files changed, 16 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 42e4d35e0d35..cd609d19e6a1 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -5167,6 +5167,7 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id) shost->max_sectors = IBMVFC_MAX_SECTORS; shost->max_cmd_len = IBMVFC_MAX_CDB_LEN; shost->unique_id = shost->host_no; + shost->nr_hw_queues = IBMVFC_SCSI_HW_QUEUES; vhost = shost_priv(shost); INIT_LIST_HEAD(>sent); @@ -5178,6 +5179,12 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id) vhost->partition_number = -1; vhost->log_level = log_level; vhost->task_set = 1; + + vhost->mq_enabled = IBMVFC_MQ; + vhost->client_scsi_channels = IBMVFC_SCSI_CHANNELS; + vhost->using_channels = 0; + vhost->do_enquiry = 1; + strcpy(vhost->partition_name, "UNKNOWN"); init_waitqueue_head(>work_wait_q); init_waitqueue_head(>init_wait_q); diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h index 9d58cfd774d3..8225bdbb127e 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.h +++ b/drivers/scsi/ibmvscsi/ibmvfc.h @@ -41,6 +41,11 @@ #define IBMVFC_DEFAULT_LOG_LEVEL 2 #define IBMVFC_MAX_CDB_LEN 16 #define IBMVFC_CLS3_ERROR 0 +#define IBMVFC_MQ 0 +#define IBMVFC_SCSI_CHANNELS 0 +#define IBMVFC_SCSI_HW_QUEUES 1 +#define IBMVFC_MIG_NO_SUB_TO_CRQ 0 +#define IBMVFC_MIG_NO_N_TO_M 0 /* * Ensure we have resources for ERP and initialization: @@ -826,6 +831,10 @@ struct ibmvfc_host { int delay_init; int scan_complete; int logged_in; + int mq_enabled; + int using_channels; + int do_enquiry; + int client_scsi_channels; int aborting_passthru; int events_to_log; #define IBMVFC_AE_LINKUP 0x0001 -- 2.27.0
[PATCH 05/13] ibmvfc: add Sub-CRQ IRQ enable/disable routine
Each Sub-CRQ has its own interrupt. A hypercall is required to toggle the IRQ state. Provide the necessary mechanism via a helper function. Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.c | 20 1 file changed, 20 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 571abdb48384..6eaedda4917a 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -3351,6 +3351,26 @@ static void ibmvfc_tasklet(void *data) spin_unlock_irqrestore(vhost->host->host_lock, flags); } +static int ibmvfc_toggle_scrq_irq(struct ibmvfc_sub_queue *scrq, int enable) +{ + struct device *dev = scrq->vhost->dev; + struct vio_dev *vdev = to_vio_dev(dev); + unsigned long rc; + int irq_action = H_ENABLE_VIO_INTERRUPT; + + if (!enable) + irq_action = H_DISABLE_VIO_INTERRUPT; + + rc = plpar_hcall_norets(H_VIOCTL, vdev->unit_address, irq_action, + scrq->hw_irq, 0, 0); + + if (rc) + dev_err(dev, "Couldn't %s sub-crq[%lu] irq. rc=%ld\n", + enable ? "enable" : "disable", scrq->hwq_id, rc); + + return rc; +} + /** * ibmvfc_init_tgt - Set the next init job step for the target * @tgt: ibmvfc target struct -- 2.27.0
[PATCH 09/13] ibmvfc: implement channel enquiry and setup commands
New NPIV_ENQUIRY_CHANNEL and NPIV_SETUP_CHANNEL management datagrams (MADs) were defined in a previous patchset. If the client advertises a desire to use channels and the partner VIOS is channel capable then the client must proceed with channel enquiry to determine the maximum number of channels the VIOS is capable of providing, and registering SubCRQs via channel setup with the VIOS immediately following NPIV Login. This handshaking should not be performed for subsequent NPIV Logins unless the CRQ connection has been reset. Implement these two new MADs and issue them following a successful NPIV login where the VIOS has set the SUPPORT_CHANNELS capability bit in the NPIV Login response. Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.c | 135 - drivers/scsi/ibmvscsi/ibmvfc.h | 3 + 2 files changed, 136 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 53db6da20923..40a945712bdb 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -804,6 +804,8 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost) spin_lock_irqsave(vhost->host->host_lock, flags); vhost->state = IBMVFC_NO_CRQ; vhost->logged_in = 0; + vhost->do_enquiry = 1; + vhost->using_channels = 0; /* Clean out the queue */ memset(crq->msgs, 0, PAGE_SIZE); @@ -4462,6 +4464,118 @@ static void ibmvfc_discover_targets(struct ibmvfc_host *vhost) ibmvfc_link_down(vhost, IBMVFC_LINK_DEAD); } +static void ibmvfc_channel_setup_done(struct ibmvfc_event *evt) +{ + struct ibmvfc_host *vhost = evt->vhost; + u32 mad_status = be16_to_cpu(evt->xfer_iu->channel_setup.common.status); + int level = IBMVFC_DEFAULT_LOG_LEVEL; + + ibmvfc_free_event(evt); + + switch (mad_status) { + case IBMVFC_MAD_SUCCESS: + ibmvfc_dbg(vhost, "Channel Setup succeded\n"); + vhost->do_enquiry = 0; + break; + case IBMVFC_MAD_FAILED: + level += ibmvfc_retry_host_init(vhost); + ibmvfc_log(vhost, level, "Channel Setup failed\n"); + fallthrough; + 
case IBMVFC_MAD_DRIVER_FAILED: + return; + default: + dev_err(vhost->dev, "Invalid Channel Setup response: 0x%x\n", + mad_status); + ibmvfc_link_down(vhost, IBMVFC_LINK_DEAD); + return; + } + + ibmvfc_set_host_action(vhost, IBMVFC_HOST_ACTION_QUERY); + wake_up(>work_wait_q); +} + +static void ibmvfc_channel_setup(struct ibmvfc_host *vhost) +{ + struct ibmvfc_channel_setup_mad *mad; + struct ibmvfc_channel_setup *setup_buf = vhost->channel_setup_buf; + struct ibmvfc_event *evt = ibmvfc_get_event(vhost); + + memset(setup_buf, 0, sizeof(*setup_buf)); + setup_buf->flags = cpu_to_be32(IBMVFC_CANCEL_CHANNELS); + + ibmvfc_init_event(evt, ibmvfc_channel_setup_done, IBMVFC_MAD_FORMAT); + mad = >iu.channel_setup; + memset(mad, 0, sizeof(*mad)); + mad->common.version = cpu_to_be32(1); + mad->common.opcode = cpu_to_be32(IBMVFC_CHANNEL_SETUP); + mad->common.length = cpu_to_be16(sizeof(*mad)); + mad->buffer.va = cpu_to_be64(vhost->channel_setup_dma); + mad->buffer.len = cpu_to_be32(sizeof(*vhost->channel_setup_buf)); + + ibmvfc_set_host_action(vhost, IBMVFC_HOST_ACTION_INIT_WAIT); + + if (!ibmvfc_send_event(evt, vhost, default_timeout)) + ibmvfc_dbg(vhost, "Sent channel setup\n"); + else + ibmvfc_link_down(vhost, IBMVFC_LINK_DOWN); +} + +static void ibmvfc_channel_enquiry_done(struct ibmvfc_event *evt) +{ + struct ibmvfc_host *vhost = evt->vhost; + struct ibmvfc_channel_enquiry *rsp = >xfer_iu->channel_enquiry; + u32 mad_status = be16_to_cpu(rsp->common.status); + int level = IBMVFC_DEFAULT_LOG_LEVEL; + + switch (mad_status) { + case IBMVFC_MAD_SUCCESS: + ibmvfc_dbg(vhost, "Channel Enquiry succeeded\n"); + vhost->max_vios_scsi_channels = be32_to_cpu(rsp->num_scsi_subq_channels); + break; + case IBMVFC_MAD_FAILED: + level += ibmvfc_retry_host_init(vhost); + ibmvfc_log(vhost, level, "Channel Enquiry failed\n"); + ibmvfc_free_event(evt); + fallthrough; + case IBMVFC_MAD_DRIVER_FAILED: + ibmvfc_free_event(evt); + return; + default: + dev_err(vhost->dev, "Invalid Channel Enquiry 
response: 0x%x\n", + mad_status); + ibmvfc_link_down(vhost, IBMVFC_LINK_DEAD); + ibmvfc_free_event(evt); + return; + } + + ibmvfc_channel_setup(vhost); +} + +static void ibmvfc_channel_enquiry(struct ibmvfc_host *vhost) +{ + struct ibmvfc_channel_enquiry
[PATCH 03/13] ibmvfc: add Subordinate CRQ definitions
Subordinate Command Response Queues (Sub CRQ) are used in conjunction with the primary CRQ when more than one queue is needed by the virtual IO adapter. Recent phyp firmware versions support Sub CRQ's with ibmvfc adapters. This feature is a prerequisite for supporting multiple hardware backed submission queues in the vfc adapter. The Sub CRQ command element differs from the standard CRQ in that it is 32bytes long as opposed to 16bytes for the latter. Despite this extra 16bytes the ibmvfc protocol will use the original CRQ command element mapped to the first 16bytes of the Sub CRQ element initially. Add definitions for the Sub CRQ command element and queue. Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.h | 23 +++ 1 file changed, 23 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h index 8225bdbb127e..084ecdfe51ea 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.h +++ b/drivers/scsi/ibmvscsi/ibmvfc.h @@ -656,6 +656,29 @@ struct ibmvfc_crq_queue { dma_addr_t msg_token; }; +struct ibmvfc_sub_crq { + struct ibmvfc_crq crq; + __be64 reserved[2]; +} __packed __aligned(8); + +struct ibmvfc_sub_queue { + struct ibmvfc_sub_crq *msgs; + dma_addr_t msg_token; + int size, cur; + struct ibmvfc_host *vhost; + unsigned long cookie; + unsigned long vios_cookie; + unsigned long hw_irq; + unsigned long irq; + unsigned long hwq_id; + char name[32]; +}; + +struct ibmvfc_scsi_channels { + struct ibmvfc_sub_queue *scrqs; + unsigned int active_queues; +}; + enum ibmvfc_ae_link_state { IBMVFC_AE_LS_LINK_UP= 0x01, IBMVFC_AE_LS_LINK_BOUNCED = 0x02, -- 2.27.0
[PATCH 08/13] ibmvfc: map/request irq and register Sub-CRQ interrupt handler
Create an irq mapping for the hw_irq number provided from phyp firmware. Request an irq to be assigned to our Sub-CRQ interrupt handler. Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 4fb782fa2c66..53db6da20923 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -5119,12 +5119,34 @@ static int ibmvfc_register_scsi_channel(struct ibmvfc_host *vhost, goto reg_failed; } + scrq->irq = irq_create_mapping(NULL, scrq->hw_irq); + + if (!scrq->irq) { + rc = -EINVAL; + dev_err(dev, "Error mapping sub-crq[%d] irq\n", index); + goto irq_failed; + } + + snprintf(scrq->name, sizeof(scrq->name), "ibmvfc-%x-scsi%d", +vdev->unit_address, index); + rc = request_irq(scrq->irq, ibmvfc_interrupt_scsi, 0, scrq->name, scrq); + + if (rc) { + dev_err(dev, "Couldn't register sub-crq[%d] irq\n", index); + irq_dispose_mapping(scrq->irq); + goto irq_failed; + } + + scrq->hwq_id = index; scrq->vhost = vhost; LEAVE; return 0; +irq_failed: + do { + rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address, scrq->cookie); + } while (rc == H_BUSY || H_IS_LONG_BUSY(rc)); reg_failed: dma_unmap_single(dev, scrq->msg_token, PAGE_SIZE, DMA_BIDIRECTIONAL); dma_map_failed: -- 2.27.0
[PATCH 04/13] ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ Channels
Allocate a set of Sub-CRQs in advance. During channel setup the client and VIOS negotiate the number of queues the VIOS supports and the number that the client desires to request. It's possible that the final channel resources allocated are fewer than requested, but the client is still responsible for sending handles for every queue it is hoping for. Also, provide deallocation cleanup routines. Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.c | 115 + drivers/scsi/ibmvscsi/ibmvfc.h | 1 + 2 files changed, 116 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 260b82e3cc01..571abdb48384 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -4983,6 +4983,114 @@ static int ibmvfc_init_crq(struct ibmvfc_host *vhost) return retrc; } +static int ibmvfc_register_scsi_channel(struct ibmvfc_host *vhost, + int index) +{ + struct device *dev = vhost->dev; + struct vio_dev *vdev = to_vio_dev(dev); + struct ibmvfc_sub_queue *scrq = &vhost->scsi_scrqs.scrqs[index]; + int rc = -ENOMEM; + + ENTER; + + scrq->msgs = (struct ibmvfc_sub_crq *)get_zeroed_page(GFP_KERNEL); + if (!scrq->msgs) + return rc; + + scrq->size = PAGE_SIZE / sizeof(*scrq->msgs); + scrq->msg_token = dma_map_single(dev, scrq->msgs, PAGE_SIZE, +DMA_BIDIRECTIONAL); + + if (dma_mapping_error(dev, scrq->msg_token)) + goto dma_map_failed; + + rc = h_reg_sub_crq(vdev->unit_address, scrq->msg_token, PAGE_SIZE, + &scrq->cookie, &scrq->hw_irq); + + if (rc) { + dev_warn(dev, "Error registering sub-crq: %d\n", rc); + dev_warn(dev, "Firmware may not support MQ\n"); + goto reg_failed; + } + + scrq->hwq_id = index; + scrq->vhost = vhost; + + LEAVE; + return 0; + +reg_failed: + dma_unmap_single(dev, scrq->msg_token, PAGE_SIZE, DMA_BIDIRECTIONAL); +dma_map_failed: + free_page((unsigned long)scrq->msgs); + LEAVE; + return rc; +} + +static void ibmvfc_deregister_scsi_channel(struct ibmvfc_host *vhost, int index) +{ + struct device *dev = vhost->dev; + struct 
vio_dev *vdev = to_vio_dev(dev); + struct ibmvfc_sub_queue *scrq = &vhost->scsi_scrqs.scrqs[index]; + long rc; + + ENTER; + + do { + rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address, + scrq->cookie); + } while (rc == H_BUSY || H_IS_LONG_BUSY(rc)); + + if (rc) + dev_err(dev, "Failed to free sub-crq[%d]: rc=%ld\n", index, rc); + + dma_unmap_single(dev, scrq->msg_token, PAGE_SIZE, DMA_BIDIRECTIONAL); + free_page((unsigned long)scrq->msgs); + LEAVE; +} + +static int ibmvfc_init_sub_crqs(struct ibmvfc_host *vhost) +{ + int i, j; + + ENTER; + + vhost->scsi_scrqs.scrqs = kcalloc(vhost->client_scsi_channels, + sizeof(*vhost->scsi_scrqs.scrqs), + GFP_KERNEL); + if (!vhost->scsi_scrqs.scrqs) + return -1; + + for (i = 0; i < vhost->client_scsi_channels; i++) { + if (ibmvfc_register_scsi_channel(vhost, i)) { + for (j = i; j > 0; j--) + ibmvfc_deregister_scsi_channel(vhost, j - 1); + kfree(vhost->scsi_scrqs.scrqs); + LEAVE; + return -1; + } + } + + LEAVE; + return 0; + } + +static void ibmvfc_release_sub_crqs(struct ibmvfc_host *vhost) +{ + int i; + + ENTER; + if (!vhost->scsi_scrqs.scrqs) + return; + + for (i = 0; i < vhost->client_scsi_channels; i++) + ibmvfc_deregister_scsi_channel(vhost, i); + + vhost->scsi_scrqs.active_queues = 0; + kfree(vhost->scsi_scrqs.scrqs); + LEAVE; +} + /** * ibmvfc_free_mem - Free memory for vhost * @vhost: ibmvfc host struct @@ -5239,6 +5347,12 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id) goto remove_shost; } + if (vhost->mq_enabled) { + rc = ibmvfc_init_sub_crqs(vhost); + if (rc) + dev_warn(dev, "Failed to allocate Sub-CRQs. rc=%d\n", rc); + } + if (shost_to_fc_host(shost)->rqst_q) blk_queue_max_segments(shost_to_fc_host(shost)->rqst_q, 1); dev_set_drvdata(dev, vhost); @@ -5296,6 +5410,7 @@ static int ibmvfc_remove(struct vio_dev *vdev) ibmvfc_purge_requests(vhost, DID_ERROR); spin_unlock_irqrestore(vhost->host->host_lock, flags); ibmvfc_free_event_pool(vhost); + ibmvfc_release_sub_crqs(vhost);
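The unwind loop in ibmvfc_init_sub_crqs() above — register queues in order, deregister in reverse on the first failure — can be modeled in userspace. register_one(), deregister_one(), and the fail_at knob below are hypothetical stand-ins, not driver functions:

```c
/* Userspace sketch of the partial-allocation rollback pattern: register
 * queues 0..n-1 in order and, on the first failure, deregister the ones
 * already registered, newest first, so no resource leaks past the error. */
#include <assert.h>

static int fail_at = -1;        /* test knob: index that fails, -1 = none */
static int registered[8];       /* tracks which queues are live */

static int register_one(int i)
{
	if (i == fail_at)
		return -1;
	registered[i] = 1;
	return 0;
}

static void deregister_one(int i)
{
	registered[i] = 0;
}

static int init_queues(int n)
{
	int i, j;

	for (i = 0; i < n; i++) {
		if (register_one(i)) {
			/* unwind everything registered so far */
			for (j = i; j > 0; j--)
				deregister_one(j - 1);
			return -1;
		}
	}
	return 0;
}
```

The `for (j = i; j > 0; j--)` shape mirrors the patch exactly: the failing index `i` was never registered, so the unwind starts at `i - 1`.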
[PATCH 02/13] ibmvfc: define hcall wrapper for registering a Sub-CRQ
Sub-CRQs are registered with firmware via a hypercall. Abstract that interface into a simpler helper function. Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index cd609d19e6a1..260b82e3cc01 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -138,6 +138,20 @@ static void ibmvfc_tgt_move_login(struct ibmvfc_target *); static const char *unknown_error = "unknown error"; +static long h_reg_sub_crq(unsigned long unit_address, unsigned long ioba, + unsigned long length, unsigned long *cookie, + unsigned long *irq) +{ + unsigned long retbuf[PLPAR_HCALL_BUFSIZE]; + long rc; + + rc = plpar_hcall(H_REG_SUB_CRQ, retbuf, unit_address, ioba, length); + *cookie = retbuf[0]; + *irq = retbuf[1]; + + return rc; +} + static int ibmvfc_check_caps(struct ibmvfc_host *vhost, unsigned long cap_flags) { u64 host_caps = be64_to_cpu(vhost->login_buf->resp.capabilities); -- 2.27.0
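The wrapper pattern is easy to see in isolation: plpar_hcall() hands back a status code plus a fixed-size return buffer, and the helper unpacks just the two values callers care about. fake_plpar_hcall() below is a stand-in for the real hypervisor call, with made-up return values:

```c
/* Userspace sketch of the hcall-wrapper pattern used by h_reg_sub_crq():
 * hide the raw return buffer behind named out-parameters. */
#include <assert.h>

#define PLPAR_HCALL_BUFSIZE 4

/* Stand-in for the hypervisor call; values are arbitrary for the sketch. */
static long fake_plpar_hcall(unsigned long retbuf[PLPAR_HCALL_BUFSIZE],
			     unsigned long unit_address, unsigned long ioba,
			     unsigned long length)
{
	(void)unit_address; (void)ioba; (void)length;
	retbuf[0] = 0xc0ffee;   /* queue cookie */
	retbuf[1] = 42;         /* hardware irq number */
	return 0;               /* success */
}

static long h_reg_sub_crq(unsigned long unit_address, unsigned long ioba,
			  unsigned long length, unsigned long *cookie,
			  unsigned long *irq)
{
	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
	long rc;

	rc = fake_plpar_hcall(retbuf, unit_address, ioba, length);
	*cookie = retbuf[0];
	*irq = retbuf[1];
	return rc;
}
```

Callers get a plain status code and two named values instead of having to know which retbuf slot holds what.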
[PATCH 00/13] ibmvfc: initial MQ development
Recent updates in pHyp Firmware and VIOS releases provide new infrastructure towards enabling Subordinate Command Response Queues (Sub-CRQs) such that each Sub-CRQ is a channel backed by an actual hardware queue in the FC stack on the partner VIOS. Sub-CRQs are registered with the firmware via hypercalls and then negotiated with the VIOS via new Management Datagrams (MADs) for channel setup. This initial implementation adds the necessary Sub-CRQ framework and implements the new MADs for negotiating and assigning a set of Sub-CRQs to associated VIOS HW backed channels. The event pool and locking still leverage the legacy single queue implementation, and as such lock contention is problematic when increasing the number of queues. However, this initial work demonstrates a 1.2x increase in IOPs when configured with two HW queues despite lock contention. Tyrel Datwyler (13): ibmvfc: add vhost fields and defaults for MQ enablement ibmvfc: define hcall wrapper for registering a Sub-CRQ ibmvfc: add Subordinate CRQ definitions ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ Channels ibmvfc: add Sub-CRQ IRQ enable/disable routine ibmvfc: add handlers to drain and complete Sub-CRQ responses ibmvfc: define Sub-CRQ interrupt handler routine ibmvfc: map/request irq and register Sub-CRQ interrupt handler ibmvfc: implement channel enquiry and setup commands ibmvfc: advertise client support for using hardware channels ibmvfc: set and track hw queue in ibmvfc_event struct ibmvfc: send commands down HW Sub-CRQ when channelized ibmvfc: register Sub-CRQ handles with VIOS during channel setup drivers/scsi/ibmvscsi/ibmvfc.c | 460 - drivers/scsi/ibmvscsi/ibmvfc.h | 37 +++ 2 files changed, 493 insertions(+), 4 deletions(-) -- 2.27.0
[PATCH 07/13] ibmvfc: define Sub-CRQ interrupt handler routine
Simple handler that calls the Sub-CRQ drain routine directly. Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index a8730522920e..4fb782fa2c66 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -3443,6 +3443,20 @@ static void ibmvfc_drain_sub_crq(struct ibmvfc_sub_queue *scrq) } } +static irqreturn_t ibmvfc_interrupt_scsi(int irq, void *scrq_instance) +{ + struct ibmvfc_sub_queue *scrq = (struct ibmvfc_sub_queue *)scrq_instance; + struct ibmvfc_host *vhost = scrq->vhost; + unsigned long flags; + + spin_lock_irqsave(vhost->host->host_lock, flags); + ibmvfc_toggle_scrq_irq(scrq, 0); + ibmvfc_drain_sub_crq(scrq); + spin_unlock_irqrestore(vhost->host->host_lock, flags); + + return IRQ_HANDLED; +} + /** * ibmvfc_init_tgt - Set the next init job step for the target * @tgt: ibmvfc target struct -- 2.27.0
[PATCH 10/13] ibmvfc: advertise client support for using hardware channels
Previous patches have plumbed the necessary Sub-CRQ interface and channel negotiation MADs to support fully channelized hardware queues. Advertise client support via the NPIV Login capability IBMVFC_CAN_USE_CHANNELS when the client has MQ enabled via vhost->mq_enabled, or when channels were already in use during a subsequent NPIV Login. The latter is required because channel support is only renegotiated after a CRQ pair is broken. Simple NPIV Logout/Logins require the client to continue to advertise the channel capability until the CRQ pair between the client and the VIOS is broken. Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 40a945712bdb..55893d09f883 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -1272,6 +1272,10 @@ static void ibmvfc_set_login_info(struct ibmvfc_host *vhost) login_info->max_cmds = cpu_to_be32(max_requests + IBMVFC_NUM_INTERNAL_REQ); login_info->capabilities = cpu_to_be64(IBMVFC_CAN_MIGRATE | IBMVFC_CAN_SEND_VF_WWPN); + + if (vhost->mq_enabled || vhost->using_channels) + login_info->capabilities |= cpu_to_be64(IBMVFC_CAN_USE_CHANNELS); + login_info->async.va = cpu_to_be64(vhost->async_crq.msg_token); login_info->async.len = cpu_to_be32(vhost->async_crq.size * sizeof(*vhost->async_crq.msgs)); strncpy(login_info->partition_name, vhost->partition_name, IBMVFC_MAX_NAME); -- 2.27.0
[PATCH 06/13] ibmvfc: add handlers to drain and complete Sub-CRQ responses
The logic for iterating over the Sub-CRQ responses is similar to that of the primary CRQ. Add the necessary handlers for processing those responses. Signed-off-by: Tyrel Datwyler --- drivers/scsi/ibmvscsi/ibmvfc.c | 72 ++ 1 file changed, 72 insertions(+) diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 6eaedda4917a..a8730522920e 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -3371,6 +3371,78 @@ static int ibmvfc_toggle_scrq_irq(struct ibmvfc_sub_queue *scrq, int enable) return rc; } +static void ibmvfc_handle_scrq(struct ibmvfc_crq *crq, struct ibmvfc_host *vhost) +{ + struct ibmvfc_event *evt = (struct ibmvfc_event *)be64_to_cpu(crq->ioba); + + switch (crq->valid) { + case IBMVFC_CRQ_CMD_RSP: + break; + default: + dev_err(vhost->dev, "Got an invalid message type 0x%02x\n", crq->valid); + return; + } + + /* The only kind of payload CRQs we should get are responses to +* things we send. Make sure this response is to something we +* actually sent +*/ + if (unlikely(!ibmvfc_valid_event(&vhost->pool, evt))) { + dev_err(vhost->dev, "Returned correlation_token 0x%08llx is invalid!\n", + crq->ioba); + return; + } + + if (unlikely(atomic_read(&evt->free))) { + dev_err(vhost->dev, "Received duplicate correlation_token 0x%08llx!\n", + crq->ioba); + return; + } + + del_timer(&evt->timer); + list_del(&evt->queue); + ibmvfc_trc_end(evt); + evt->done(evt); +} + +static struct ibmvfc_crq *ibmvfc_next_scrq(struct ibmvfc_sub_queue *scrq) +{ + struct ibmvfc_crq *crq; + + crq = &scrq->msgs[scrq->cur].crq; + if (crq->valid & 0x80) { + if (++scrq->cur == scrq->size) + scrq->cur = 0; + rmb(); + } else + crq = NULL; + + return crq; +} + +static void ibmvfc_drain_sub_crq(struct ibmvfc_sub_queue *scrq) +{ + struct ibmvfc_crq *crq; + int done = 0; + + while (!done) { + while ((crq = ibmvfc_next_scrq(scrq)) != NULL) { + ibmvfc_handle_scrq(crq, scrq->vhost); + crq->valid = 0; + wmb(); + } + + ibmvfc_toggle_scrq_irq(scrq, 1); + if ((crq = 
ibmvfc_next_scrq(scrq)) != NULL) { + ibmvfc_toggle_scrq_irq(scrq, 0); + ibmvfc_handle_scrq(crq, scrq->vhost); + crq->valid = 0; + wmb(); + } else + done = 1; + } +} + /** * ibmvfc_init_tgt - Set the next init job step for the target * @tgt: ibmvfc target struct -- 2.27.0
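The drain loop above follows a classic lost-interrupt avoidance pattern: process every valid entry, re-enable the interrupt, then poll once more in case an entry landed in the window between the last check and the re-enable. A userspace model of just that control flow (a plain array stands in for the DMA-mapped Sub-CRQ page; no memory barriers are modeled):

```c
/* Userspace model of the ibmvfc_drain_sub_crq() control flow: drain,
 * re-arm the interrupt, and re-check once so a late entry is not left
 * unprocessed until the next interrupt fires. */
#include <assert.h>
#include <stddef.h>

#define QSZ 4
static int msgs[QSZ];       /* nonzero = valid entry (stand-in for crq->valid) */
static int cur;             /* consumer cursor */
static int irq_enabled;
static int handled;

static int *next_msg(void)
{
	int *m = &msgs[cur];

	if (!*m)
		return NULL;
	cur = (cur + 1) % QSZ;
	return m;
}

static void drain(void)
{
	int *m;
	int done = 0;

	while (!done) {
		while ((m = next_msg()) != NULL) {
			handled++;
			*m = 0;             /* mark entry consumed */
		}
		irq_enabled = 1;            /* re-arm the interrupt */
		m = next_msg();
		if (m) {                    /* raced: entry arrived late */
			irq_enabled = 0;
			handled++;
			*m = 0;
		} else {
			done = 1;
		}
	}
}
```

Without the re-check after re-arming, an entry posted in that window would only be noticed on the next interrupt, which may never come if the device considers it already signaled.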
[PATCH net v3 1/9] ibmvnic: handle inconsistent login with reset
Inconsistent login with the vnicserver is causing the device to be removed. This does not give the device a chance to recover from error state. This patch schedules a FATAL reset instead to bring the adapter up. Fixes: 032c5e82847a2 ("Driver for IBM System i/p VNIC protocol") Signed-off-by: Dany Madden Signed-off-by: Lijun Pan --- drivers/net/ethernet/ibm/ibmvnic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 2aa40b2f225c..dcb23015b6b4 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -4412,7 +4412,7 @@ static int handle_login_rsp(union ibmvnic_crq *login_rsp_crq, adapter->req_rx_add_queues != be32_to_cpu(login_rsp->num_rxadd_subcrqs))) { dev_err(dev, "FATAL: Inconsistent login and login rsp\n"); - ibmvnic_remove(adapter->vdev); + ibmvnic_reset(adapter, VNIC_RESET_FATAL); return -EIO; } size_array = (u64 *)((u8 *)(adapter->login_rsp_buf) + -- 2.26.2
[PATCH net v3 9/9] ibmvnic: reduce wait for completion time
Reduce the wait time for Command Response Queue response from 30 seconds to 20 seconds, as recommended by VIOS and Power Hypervisor teams. Fixes: bd0b672313941 ("ibmvnic: Move login and queue negotiation into ibmvnic_open") Fixes: 53da09e92910f ("ibmvnic: Add set_link_state routine for setting adapter link state") Signed-off-by: Dany Madden --- drivers/net/ethernet/ibm/ibmvnic.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index a17856be2828..d6b2686aed0f 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -834,7 +834,7 @@ static void release_napi(struct ibmvnic_adapter *adapter) static int ibmvnic_login(struct net_device *netdev) { struct ibmvnic_adapter *adapter = netdev_priv(netdev); - unsigned long timeout = msecs_to_jiffies(30000); + unsigned long timeout = msecs_to_jiffies(20000); int retry_count = 0; int retries = 10; bool retry; @@ -938,7 +938,7 @@ static void release_resources(struct ibmvnic_adapter *adapter) static int set_link_state(struct ibmvnic_adapter *adapter, u8 link_state) { struct net_device *netdev = adapter->netdev; - unsigned long timeout = msecs_to_jiffies(30000); + unsigned long timeout = msecs_to_jiffies(20000); union ibmvnic_crq crq; bool resend; int rc; @@ -5125,7 +5125,7 @@ static int init_crq_queue(struct ibmvnic_adapter *adapter) static int ibmvnic_reset_init(struct ibmvnic_adapter *adapter, bool reset) { struct device *dev = &adapter->vdev->dev; - unsigned long timeout = msecs_to_jiffies(30000); + unsigned long timeout = msecs_to_jiffies(20000); u64 old_num_rx_queues, old_num_tx_queues; int rc; -- 2.26.2
[PATCH net v3 7/9] ibmvnic: send_login should check for crq errors
send_login() does not check for the result of ibmvnic_send_crq() of the login request. This results in the driver needlessly retrying the login 10 times even when CRQ is no longer active. Check the return code and give up in case of errors in sending the CRQ. The only time we want to retry is if we get a PARTIALSUCCESS response from the partner. Fixes: 032c5e82847a2 ("Driver for IBM System i/p VNIC protocol") Signed-off-by: Dany Madden Signed-off-by: Sukadev Bhattiprolu --- drivers/net/ethernet/ibm/ibmvnic.c | 18 -- 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 55b07bd4c741..9005fab09e15 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -850,10 +850,8 @@ static int ibmvnic_login(struct net_device *netdev) adapter->init_done_rc = 0; reinit_completion(&adapter->init_done); rc = send_login(adapter); - if (rc) { - netdev_warn(netdev, "Unable to login\n"); + if (rc) return rc; - } if (!wait_for_completion_timeout(&adapter->init_done, timeout)) { @@ -3727,15 +3725,16 @@ static int send_login(struct ibmvnic_adapter *adapter) struct ibmvnic_login_rsp_buffer *login_rsp_buffer; struct ibmvnic_login_buffer *login_buffer; struct device *dev = &adapter->vdev->dev; + struct vnic_login_client_data *vlcd; dma_addr_t rsp_buffer_token; dma_addr_t buffer_token; size_t rsp_buffer_size; union ibmvnic_crq crq; + int client_data_len; size_t buffer_size; __be64 *tx_list_p; __be64 *rx_list_p; - int client_data_len; - struct vnic_login_client_data *vlcd; + int rc; int i; if (!adapter->tx_scrq || !adapter->rx_scrq) { @@ -3841,16 +3840,23 @@ static int send_login(struct ibmvnic_adapter *adapter) crq.login.len = cpu_to_be32(buffer_size); adapter->login_pending = true; - ibmvnic_send_crq(adapter, &crq); + rc = ibmvnic_send_crq(adapter, &crq); + if (rc) { + adapter->login_pending = false; + netdev_err(adapter->netdev, "Failed to send login, rc=%d\n", rc); + goto buf_rsp_map_failed; + } return 0; 
buf_rsp_map_failed: kfree(login_rsp_buffer); + adapter->login_rsp_buf = NULL; buf_rsp_alloc_failed: dma_unmap_single(dev, buffer_token, buffer_size, DMA_TO_DEVICE); buf_map_failed: kfree(login_buffer); + adapter->login_buf = NULL; buf_alloc_failed: return -1; } -- 2.26.2
[PATCH net v3 8/9] ibmvnic: no reset timeout for 5 seconds after reset
Reset timeout is going off right after adapter reset. This patch ensures that a timeout reset is only scheduled if at least 5 seconds have passed since the last reset. 5 seconds is the default watchdog timeout. Fixes: ed651a10875f1 ("ibmvnic: Updated reset handling") Signed-off-by: Dany Madden --- drivers/net/ethernet/ibm/ibmvnic.c | 11 +-- drivers/net/ethernet/ibm/ibmvnic.h | 2 ++ 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 9005fab09e15..a17856be2828 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -2253,6 +2253,7 @@ static void __ibmvnic_reset(struct work_struct *work) rc = do_reset(adapter, rwi, reset_state); } kfree(rwi); + adapter->last_reset_time = jiffies; if (rc) netdev_dbg(adapter->netdev, "Reset failed, rc=%d\n", rc); @@ -2356,7 +2357,13 @@ static void ibmvnic_tx_timeout(struct net_device *dev, unsigned int txqueue) "Adapter is resetting, skip timeout reset\n"); return; } - + /* No queuing up reset until at least 5 seconds (default watchdog val) +* after last reset +*/ + if (time_before(jiffies, (adapter->last_reset_time + dev->watchdog_timeo))) { + netdev_dbg(dev, "Not yet time to tx timeout.\n"); + return; + } ibmvnic_reset(adapter, VNIC_RESET_TIMEOUT); } @@ -5277,7 +5284,7 @@ static int ibmvnic_probe(struct vio_dev *dev, const struct vio_device_id *id) adapter->state = VNIC_PROBED; adapter->wait_for_reset = false; - + adapter->last_reset_time = jiffies; return 0; ibmvnic_register_fail: diff --git a/drivers/net/ethernet/ibm/ibmvnic.h b/drivers/net/ethernet/ibm/ibmvnic.h index 6f0a701c4a38..b21092f5f9c1 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.h +++ b/drivers/net/ethernet/ibm/ibmvnic.h @@ -1088,6 +1088,8 @@ struct ibmvnic_adapter { unsigned long resetting; bool napi_enabled, from_passive_init; bool login_pending; + /* last device reset time */ + unsigned long last_reset_time; bool failover_pending; bool force_reset_recovery; -- 2.26.2
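The gating added to ibmvnic_tx_timeout() relies on the kernel's wrap-safe jiffies comparison. A minimal sketch of the check, with time_before() reimplemented the way include/linux/jiffies.h defines it (signed subtraction, so the comparison survives counter wraparound):

```c
/* Sketch of the tx-timeout rate limit: skip scheduling a reset if fewer
 * than watchdog_timeo ticks have passed since the last reset. */
#include <assert.h>

/* Wrap-safe "a is before b" comparison, as the kernel defines it. */
static inline int time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Returns 1 if a timeout reset should be scheduled now. */
static int should_reset(unsigned long now, unsigned long last_reset_time,
			unsigned long watchdog_timeo)
{
	return !time_before(now, last_reset_time + watchdog_timeo);
}
```

The signed cast is what makes this work across wraparound: a naive `now < last + timeo` comparison would misfire when the tick counter overflows, whereas `(long)(a - b)` stays correct as long as the two timestamps are within half the counter range of each other.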
[PATCH net v3 6/9] ibmvnic: track pending login
From: Sukadev Bhattiprolu If after ibmvnic sends a LOGIN it gets a FAILOVER, it is possible that the worker thread will start the reset process and free the login response buffer before it gets a (now stale) LOGIN_RSP. The ibmvnic tasklet will then try to access the login response buffer and crash. Have ibmvnic track pending logins and discard any stale login responses. Fixes: 032c5e82847a ("Driver for IBM System i/p VNIC protocol") Signed-off-by: Sukadev Bhattiprolu --- drivers/net/ethernet/ibm/ibmvnic.c | 17 + drivers/net/ethernet/ibm/ibmvnic.h | 1 + 2 files changed, 18 insertions(+) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index e2f9b0e9dea8..55b07bd4c741 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -3839,6 +3839,8 @@ static int send_login(struct ibmvnic_adapter *adapter) crq.login.cmd = LOGIN; crq.login.ioba = cpu_to_be32(buffer_token); crq.login.len = cpu_to_be32(buffer_size); + + adapter->login_pending = true; ibmvnic_send_crq(adapter, &crq); return 0; @@ -4391,6 +4393,15 @@ static int handle_login_rsp(union ibmvnic_crq *login_rsp_crq, u64 *size_array; int i; + /* CHECK: Test/set of login_pending does not need to be atomic +* because only ibmvnic_tasklet tests/clears this. +*/ + if (!adapter->login_pending) { + netdev_warn(netdev, "Ignoring unexpected login response\n"); + return 0; + } + adapter->login_pending = false; + dma_unmap_single(dev, adapter->login_buf_token, adapter->login_buf_sz, DMA_TO_DEVICE); dma_unmap_single(dev, adapter->login_rsp_buf_token, @@ -4762,6 +4773,11 @@ static void ibmvnic_handle_crq(union ibmvnic_crq *crq, case IBMVNIC_CRQ_INIT: dev_info(dev, "Partner initialized\n"); adapter->from_passive_init = true; + /* Discard any stale login responses from prev reset. +* CHECK: should we clear even on INIT_COMPLETE? 
+*/ + adapter->login_pending = false; + + if (!completion_done(&adapter->init_done)) { + complete(&adapter->init_done); + adapter->init_done_rc = -EIO; + } @@ -5191,6 +5207,7 @@ static int ibmvnic_probe(struct vio_dev *dev, const struct vio_device_id *id) dev_set_drvdata(&dev->dev, netdev); adapter->vdev = dev; adapter->netdev = netdev; + adapter->login_pending = false; ether_addr_copy(adapter->mac_addr, mac_addr_p); ether_addr_copy(netdev->dev_addr, adapter->mac_addr); diff --git a/drivers/net/ethernet/ibm/ibmvnic.h b/drivers/net/ethernet/ibm/ibmvnic.h index 217dcc7ded70..6f0a701c4a38 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.h +++ b/drivers/net/ethernet/ibm/ibmvnic.h @@ -1087,6 +1087,7 @@ struct ibmvnic_adapter { struct delayed_work ibmvnic_delayed_reset; unsigned long resetting; bool napi_enabled, from_passive_init; + bool login_pending; bool failover_pending; bool force_reset_recovery; -- 2.26.2
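The login_pending handshake reduces to a tiny state machine: a response only counts if a login is actually outstanding, and a partner re-init clears the flag so a stale response from before the reset is dropped. A sketch with plain ints standing in for the adapter fields (no locking modeled, matching the single-tasklet assumption in the patch's comment):

```c
/* Model of the pending-flag pattern from this patch: discard responses
 * that arrive when no request is outstanding, e.g. after a reset. */
#include <assert.h>

static int login_pending;
static int logins_processed;

static void send_login(void)     { login_pending = 1; }
static void partner_reinit(void) { login_pending = 0; } /* CRQ INIT seen */

/* Returns 1 if the response was processed, 0 if discarded as stale. */
static int handle_login_rsp(void)
{
	if (!login_pending)
		return 0;           /* stale or duplicate: discard */
	login_pending = 0;
	logins_processed++;
	return 1;
}
```

The key property: a LOGIN_RSP arriving after a FAILOVER-triggered re-init is ignored instead of dereferencing a freed response buffer, and a duplicate response is likewise a no-op because the flag was already cleared.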
[PATCH net v3 2/9] ibmvnic: stop free_all_rwi on failed reset
When ibmvnic fails to reset, it breaks out of the reset loop and frees all of the remaining resets from the workqueue. Doing so prevents the adapter from recovering if no reset is scheduled after that. Instead, have the driver continue to process resets on the workqueue. Remove the no longer needed free_all_rwi(). Fixes: ed651a10875f1 ("ibmvnic: Updated reset handling") Signed-off-by: Dany Madden --- drivers/net/ethernet/ibm/ibmvnic.c | 22 +++--- 1 file changed, 3 insertions(+), 19 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index dcb23015b6b4..d5a927bb4954 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -2173,17 +2173,6 @@ static struct ibmvnic_rwi *get_next_rwi(struct ibmvnic_adapter *adapter) return rwi; } -static void free_all_rwi(struct ibmvnic_adapter *adapter) -{ - struct ibmvnic_rwi *rwi; - - rwi = get_next_rwi(adapter); - while (rwi) { - kfree(rwi); - rwi = get_next_rwi(adapter); - } -} - static void __ibmvnic_reset(struct work_struct *work) { struct ibmvnic_rwi *rwi; @@ -2253,9 +2242,9 @@ static void __ibmvnic_reset(struct work_struct *work) else adapter->state = reset_state; rc = 0; - } else if (rc && rc != IBMVNIC_INIT_FAILED && - !adapter->force_reset_recovery) - break; + } + if (rc) + netdev_dbg(adapter->netdev, "Reset failed, rc=%d\n", rc); rwi = get_next_rwi(adapter); @@ -2269,11 +2258,6 @@ static void __ibmvnic_reset(struct work_struct *work) complete(&adapter->reset_done); } - if (rc) { - netdev_dbg(adapter->netdev, "Reset failed\n"); - free_all_rwi(adapter); - } - clear_bit_unlock(0, &adapter->resetting); } -- 2.26.2
[PATCH net v3 3/9] ibmvnic: avoid memset null scrq msgs
scrq->msgs could be NULL during device reset, causing Linux to crash. So, check before memset scrq->msgs. Fixes: c8b2ad0a4a901 ("ibmvnic: Sanitize entire SCRQ buffer on reset") Signed-off-by: Dany Madden Signed-off-by: Lijun Pan --- drivers/net/ethernet/ibm/ibmvnic.c | 19 +++ 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index d5a927bb4954..b08f95017825 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -2845,15 +2845,26 @@ static int reset_one_sub_crq_queue(struct ibmvnic_adapter *adapter, { int rc; + if (!scrq) { + netdev_dbg(adapter->netdev, "Invalid scrq reset\n"); + return -EINVAL; + } + if (scrq->irq) { free_irq(scrq->irq, scrq); irq_dispose_mapping(scrq->irq); scrq->irq = 0; } - - memset(scrq->msgs, 0, 4 * PAGE_SIZE); - atomic_set(&scrq->used, 0); - scrq->cur = 0; + if (scrq->msgs) { + memset(scrq->msgs, 0, 4 * PAGE_SIZE); + atomic_set(&scrq->used, 0); + scrq->cur = 0; + } else { + netdev_dbg(adapter->netdev, "Invalid scrq reset\n"); + return -EINVAL; + } rc = h_reg_sub_crq(adapter->vdev->unit_address, scrq->msg_token, 4 * PAGE_SIZE, &scrq->crq_num, &scrq->hw_irq); -- 2.26.2
[PATCH net v3 5/9] ibmvnic: delay next reset if hard reset fails
From: Sukadev Bhattiprolu If auto-priority failover is enabled, the backing device needs time to settle if hard resetting fails for any reason. Add a delay of 60 seconds before retrying the hard-reset. Fixes: 2770a7984db5 ("ibmvnic: Introduce hard reset recovery") Signed-off-by: Sukadev Bhattiprolu --- drivers/net/ethernet/ibm/ibmvnic.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index ff474a790181..e2f9b0e9dea8 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -2242,6 +2242,14 @@ static void __ibmvnic_reset(struct work_struct *work) rc = do_hard_reset(adapter, rwi, reset_state); rtnl_unlock(); } + if (rc) { + /* give backing device time to settle down */ + netdev_dbg(adapter->netdev, + "[S:%d] Hard reset failed, waiting 60 secs\n", + adapter->state); + set_current_state(TASK_UNINTERRUPTIBLE); + schedule_timeout(60 * HZ); + } } else if (!(rwi->reset_reason == VNIC_RESET_FATAL && adapter->from_passive_init)) { rc = do_reset(adapter, rwi, reset_state); -- 2.26.2
[PATCH net v3 4/9] ibmvnic: restore adapter state on failed reset
In a failed reset, driver could end up in VNIC_PROBED or VNIC_CLOSED state and cannot recover in subsequent resets, leaving it offline. This patch restores the adapter state to reset_state, the original state when reset was called. Fixes: b27507bb59ed5 ("net/ibmvnic: unlock rtnl_lock in reset so linkwatch_event can run") Fixes: 2770a7984db58 ("ibmvnic: Introduce hard reset recovery") Signed-off-by: Dany Madden --- drivers/net/ethernet/ibm/ibmvnic.c | 67 -- 1 file changed, 36 insertions(+), 31 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index b08f95017825..ff474a790181 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -1857,7 +1857,7 @@ static int do_change_param_reset(struct ibmvnic_adapter *adapter, if (reset_state == VNIC_OPEN) { rc = __ibmvnic_close(netdev); if (rc) - return rc; + goto out; } release_resources(adapter); @@ -1875,24 +1875,25 @@ static int do_change_param_reset(struct ibmvnic_adapter *adapter, } rc = ibmvnic_reset_init(adapter, true); - if (rc) - return IBMVNIC_INIT_FAILED; + if (rc) { + rc = IBMVNIC_INIT_FAILED; + goto out; + } /* If the adapter was in PROBE state prior to the reset, * exit here. 
*/ if (reset_state == VNIC_PROBED) - return 0; + goto out; rc = ibmvnic_login(netdev); if (rc) { - adapter->state = reset_state; - return rc; + goto out; } rc = init_resources(adapter); if (rc) - return rc; + goto out; ibmvnic_disable_irqs(adapter); @@ -1902,8 +1903,10 @@ static int do_change_param_reset(struct ibmvnic_adapter *adapter, return 0; rc = __ibmvnic_open(netdev); - if (rc) - return IBMVNIC_OPEN_FAILED; + if (rc) { + rc = IBMVNIC_OPEN_FAILED; + goto out; + } /* refresh device's multicast list */ ibmvnic_set_multi(netdev); @@ -1912,7 +1915,10 @@ static int do_change_param_reset(struct ibmvnic_adapter *adapter, for (i = 0; i < adapter->req_rx_queues; i++) napi_schedule(>napi[i]); - return 0; +out: + if (rc) + adapter->state = reset_state; + return rc; } /** @@ -2015,7 +2021,6 @@ static int do_reset(struct ibmvnic_adapter *adapter, rc = ibmvnic_login(netdev); if (rc) { - adapter->state = reset_state; goto out; } @@ -2083,6 +2088,9 @@ static int do_reset(struct ibmvnic_adapter *adapter, rc = 0; out: + /* restore the adapter state if reset failed */ + if (rc) + adapter->state = reset_state; rtnl_unlock(); return rc; @@ -2115,43 +2123,46 @@ static int do_hard_reset(struct ibmvnic_adapter *adapter, if (rc) { netdev_err(adapter->netdev, "Couldn't initialize crq. rc=%d\n", rc); - return rc; + goto out; } rc = ibmvnic_reset_init(adapter, false); if (rc) - return rc; + goto out; /* If the adapter was in PROBE state prior to the reset, * exit here. 
*/ if (reset_state == VNIC_PROBED) - return 0; + goto out; rc = ibmvnic_login(netdev); - if (rc) { - adapter->state = VNIC_PROBED; - return 0; - } + if (rc) + goto out; rc = init_resources(adapter); if (rc) - return rc; + goto out; ibmvnic_disable_irqs(adapter); adapter->state = VNIC_CLOSED; if (reset_state == VNIC_CLOSED) - return 0; + goto out; rc = __ibmvnic_open(netdev); - if (rc) - return IBMVNIC_OPEN_FAILED; + if (rc) { + rc = IBMVNIC_OPEN_FAILED; + goto out; + } call_netdevice_notifiers(NETDEV_NOTIFY_PEERS, netdev); call_netdevice_notifiers(NETDEV_RESEND_IGMP, netdev); - - return 0; +out: + /* restore adapter state if reset failed */ + if (rc) + adapter->state = reset_state; + return rc; } static struct ibmvnic_rwi *get_next_rwi(struct ibmvnic_adapter *adapter) @@ -2236,13 +2247,7 @@ static void __ibmvnic_reset(struct work_struct *work) rc = do_reset(adapter, rwi, reset_state); } kfree(rwi); - if (rc == IBMVNIC_OPEN_FAILED) { - if (list_empty(&adapter->rwi_list)) -
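The shape this patch converges on — a single out: label that restores the state saved at entry, instead of scattered early returns each leaving the adapter wherever it happened to be — can be sketched standalone. do_login() and the fail_login knob are hypothetical stand-ins for the driver's real steps:

```c
/* Sketch of the error-path pattern: every failure funnels through one
 * "out:" label that restores the state captured before the reset began. */
#include <assert.h>

enum state { PROBED, CLOSED, OPEN };

static enum state adapter_state;
static int fail_login;          /* test knob: force the login step to fail */

static int do_login(void) { return fail_login ? -1 : 0; }

static int do_reset(enum state reset_state)
{
	int rc;

	adapter_state = PROBED;     /* reset drops to a low-level state */
	rc = do_login();
	if (rc)
		goto out;
	adapter_state = OPEN;
out:
	if (rc)                     /* restore state if reset failed */
		adapter_state = reset_state;
	return rc;
}
```

The benefit over per-site restores is that adding a new failing step can't forget the restore: any `goto out` with a nonzero rc gets it for free.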
[PATCH net v3 0/9] ibmvnic: assorted bug fixes
Assorted fixes for ibmvnic originated from "[PATCH net 00/15] ibmvnic: assorted bug fixes" sent by Lijun Pan. v3 Changes as suggested by Jakub Kicinski: - Add a space between variable declaration and code in patch 3/9. Checkpatch does not catch this. - Unwrapped Fixes lines in patch 9/9. - Removed all extra lines between Fixes and Signed-off-by lines in all patches. v2 Changes as suggested by Jakub Kicinski: - Added "Fixes" to each patch. - Remove "ibmvnic: process HMC disable command" from the series. Submitting it separately to net-next. - Squash v1 "ibmvnic: remove free_all_rwi function" into ibmvnic: stop free_all_rwi on failed reset. Dany Madden (7): ibmvnic: handle inconsistent login with reset ibmvnic: stop free_all_rwi on failed reset ibmvnic: avoid memset null scrq msgs ibmvnic: restore adapter state on failed reset ibmvnic: send_login should check for crq errors ibmvnic: no reset timeout for 5 seconds after reset ibmvnic: reduce wait for completion time Sukadev Bhattiprolu (2): ibmvnic: delay next reset if hard reset fails ibmvnic: track pending login drivers/net/ethernet/ibm/ibmvnic.c | 168 ++--- drivers/net/ethernet/ibm/ibmvnic.h | 3 + 2 files changed, 106 insertions(+), 65 deletions(-) -- 2.26.2
[powerpc:next] BUILD SUCCESS 0bd4b96d99108b7ea9bac0573957483be7781d70
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
branch HEAD: 0bd4b96d99108b7ea9bac0573957483be7781d70  powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations

elapsed time: 960m

configs tested: 130
configs skipped: 3

The following configs have been built successfully. More configs may be tested in the coming days.

gcc tested configs:
arm defconfig, arm64 allyesconfig, arm64 defconfig, arm allyesconfig, arm allmodconfig, powerpc xes_mpc85xx_defconfig, powerpc sequoia_defconfig, arm oxnas_v6_defconfig, powerpc ep8248e_defconfig, arm corgi_defconfig, powerpc mpc834x_itxgp_defconfig, sh allmodconfig, powerpc lite5200b_defconfig, ia64 tiger_defconfig, sh se7722_defconfig, arm tct_hammer_defconfig, sh se7721_defconfig, mips maltaaprp_defconfig, arm nhk8815_defconfig, mips ar7_defconfig, sh titan_defconfig, powerpc mpc83xx_defconfig, powerpc allmodconfig, m68k stmark2_defconfig, powerpc mpc866_ads_defconfig, m68k apollo_defconfig, powerpc64 defconfig, sh apsh4a3a_defconfig, powerpc mpc512x_defconfig, s390 defconfig, nios2 defconfig, mips rs90_defconfig, nios2 3c120_defconfig, arm qcom_defconfig, mips db1xxx_defconfig, powerpc fsp2_defconfig, c6x evmc6472_defconfig, sh rsk7203_defconfig, arm mvebu_v7_defconfig, mips decstation_r4k_defconfig, parisc alldefconfig, mips rm200_defconfig, sh sh7770_generic_defconfig, powerpc gamecube_defconfig, arm trizeps4_defconfig, powerpc mpc836x_mds_defconfig, mips cavium_octeon_defconfig, sh kfr2r09-romimage_defconfig, arm mv78xx0_defconfig, mips maltasmvp_defconfig, m68k defconfig, sh sh7763rdp_defconfig, sparc alldefconfig, arm magician_defconfig, powerpc tqm8548_defconfig, sh sh7785lcr_defconfig, arm clps711x_defconfig, powerpc sbc8548_defconfig, arm lpc32xx_defconfig, sh dreamcast_defconfig, powerpc mpc8313_rdb_defconfig, xtensa alldefconfig, arm lpd270_defconfig, powerpc ppa8548_defconfig, mips ip27_defconfig, sh rsk7201_defconfig, ia64 allmodconfig, ia64 defconfig, ia64 allyesconfig, m68k allmodconfig, m68k allyesconfig, arc allyesconfig, nds32 allnoconfig, c6x allyesconfig, nds32 defconfig, nios2 allyesconfig, csky defconfig, alpha defconfig, alpha allyesconfig, xtensa allyesconfig, h8300 allyesconfig, arc defconfig, parisc defconfig, s390 allyesconfig, parisc allyesconfig, i386 allyesconfig, sparc allyesconfig, sparc defconfig, i386 defconfig, mips allyesconfig, mips allmodconfig, powerpc allyesconfig, powerpc allnoconfig, i386 randconfig-a004-20201125, i386 randconfig-a003-20201125, i386 randconfig-a002-20201125, i386 randconfig-a005-20201125, i386 randconfig-a001-20201125, i386 randconfig-a006-20201125, x86_64 randconfig
[powerpc:next-test] BUILD SUCCESS 6cc5522b62bbc176e1a5666c401466a37ffc746e
defconfig, mips allyesconfig, mips allmodconfig, powerpc allyesconfig, powerpc allnoconfig, i386 randconfig-a004-20201125, i386 randconfig-a003-20201125, i386 randconfig-a002-20201125, i386 randconfig-a005-20201125, i386 randconfig-a001-20201125, i386 randconfig-a006-20201125, x86_64 randconfig-a015-20201125, x86_64 randconfig-a011-20201125, x86_64 randconfig-a014-20201125, x86_64 randconfig-a016-20201125, x86_64 randconfig-a012-20201125, x86_64 randconfig-a013-20201125, i386 randconfig-a012-20201125, i386 randconfig-a013-20201125, i386 randconfig-a011-20201125, i386 randconfig-a016-20201125, i386 randconfig-a014-20201125, i386 randconfig-a015-20201125, riscv nommu_k210_defconfig, riscv allyesconfig, riscv nommu_virt_defconfig, riscv allnoconfig, riscv defconfig, riscv rv32_defconfig, riscv allmodconfig, x86_64 rhel, x86_64 allyesconfig, x86_64 rhel-7.6-kselftests, x86_64 defconfig, x86_64 rhel-8.3, x86_64 kexec

clang tested configs:
x86_64 randconfig-a006-20201125, x86_64 randconfig-a005-20201125, x86_64 randconfig-a003-20201125, x86_64 randconfig-a004-20201125, x86_64 randconfig-a002-20201125, x86_64 randconfig-a001-20201125

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
[powerpc:merge] BUILD SUCCESS 4c202167192a77481310a3cacae9f12618b92216
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge
branch HEAD: 4c202167192a77481310a3cacae9f12618b92216  Automatic merge of 'next' into merge (2020-11-25 15:11)

elapsed time: 960m

configs tested: 130
configs skipped: 2

The following configs have been built successfully. More configs may be tested in the coming days.

gcc tested configs:
arm defconfig, arm64 allyesconfig, arm64 defconfig, arm allyesconfig, arm allmodconfig, powerpc xes_mpc85xx_defconfig, powerpc ep8248e_defconfig, powerpc sequoia_defconfig, arm oxnas_v6_defconfig, arm corgi_defconfig, powerpc mpc834x_itxgp_defconfig, sh allmodconfig, powerpc lite5200b_defconfig, ia64 tiger_defconfig, sh se7722_defconfig, arm tct_hammer_defconfig, sh se7721_defconfig, arm nhk8815_defconfig, mips maltaaprp_defconfig, mips ar7_defconfig, sh titan_defconfig, powerpc mpc83xx_defconfig, powerpc allmodconfig, m68k stmark2_defconfig, powerpc mpc866_ads_defconfig, m68k apollo_defconfig, powerpc64 defconfig, sh apsh4a3a_defconfig, powerpc mpc512x_defconfig, s390 defconfig, sh rsk7264_defconfig, arm vexpress_defconfig, mips ath25_defconfig, powerpc canyonlands_defconfig, arm pleb_defconfig, x86_64 alldefconfig, arm neponset_defconfig, sh migor_defconfig, sh rsk7203_defconfig, arm mvebu_v7_defconfig, mips decstation_r4k_defconfig, parisc alldefconfig, mips rm200_defconfig, sh sh7770_generic_defconfig, powerpc gamecube_defconfig, arm trizeps4_defconfig, powerpc mpc836x_mds_defconfig, mips cavium_octeon_defconfig, sh kfr2r09-romimage_defconfig, arm mv78xx0_defconfig, mips maltasmvp_defconfig, m68k defconfig, sh sh7763rdp_defconfig, sparc alldefconfig, arm magician_defconfig, powerpc tqm8548_defconfig, sh sh7785lcr_defconfig, powerpc mpc8313_rdb_defconfig, xtensa alldefconfig, arm lpd270_defconfig, powerpc ppa8548_defconfig, arm pxa_defconfig, mips malta_kvm_defconfig, riscv alldefconfig, c6x evmc6474_defconfig, ia64 allmodconfig, ia64 defconfig, ia64 allyesconfig, m68k allmodconfig, m68k allyesconfig, nios2 defconfig, arc allyesconfig, nds32 allnoconfig, c6x allyesconfig, nds32 defconfig, nios2 allyesconfig, csky defconfig, alpha defconfig, alpha allyesconfig, xtensa allyesconfig, h8300 allyesconfig, arc defconfig, parisc defconfig, s390 allyesconfig, parisc allyesconfig, i386 allyesconfig, sparc allyesconfig, sparc defconfig, i386 defconfig, mips allyesconfig, mips allmodconfig, powerpc allyesconfig, powerpc allnoconfig, i386 randconfig-a004-20201125, i386 randconfig-a003-20201125, i386 randconfig-a002-20201125, i386 randconfig-a005-20201125, i386 randconfig-a001-20201125, i386 randconfig-a006-20201125, x86_64 randconfig-a015-20201125, x86_64
Re: [PATCH v3 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()
On Wed, 25 Nov 2020 16:42:30 + Marc Zyngier wrote:

> On 2020-11-25 16:24, Laurent Vivier wrote:
> > On 25/11/2020 17:05, Denis Kirjanov wrote:
> >> On 11/25/20, Laurent Vivier wrote:
> >>> With virtio multiqueue, normally each queue IRQ is mapped to a CPU.
> >>>
> >>> But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ
> >>> affinity")
> >>> this is broken on pseries.
> >>
> >> Please add "Fixes" tag.
> >
> > In fact, the code in commit 0d9f0a52c8b9f is correct.
> >
> > The problem is with MSI/X irq affinity and pseries. So this patch
> > fixes more than virtio_scsi. I put this information because this
> > commit allows to clearly show the problem. Perhaps I should remove
> > this line in fact?
>
> This patch does not fix virtio_scsi at all, which as you noticed, is
> correct. It really fixes the PPC MSI setup, which is starting to show
> its age. So getting rid of the reference seems like the right thing to
> do.
>
> I'm also not keen on the BugId thing. It should really be a lore link.
> I also cannot find any such tag in the kernel, nor is it a documented
> practice. The last reference to a Bugzilla entry seems to have happened
> with 786b5219081ff16 (five years ago).

My bad, I suggested BugId to Laurent but the intent was actually BugLink, which seems to be commonly used in the kernel.

Cheers,

--
Greg

> Thanks,
>
> M.
Re: [PATCH] net/ethernet/freescale: Fix incorrect IS_ERR_VALUE macro usages
On Tue, Nov 24, 2020 at 8:00 PM liwei (GF) wrote:
>
> Hi Yang,
>
> On 2020/11/25 6:13, Li Yang wrote:
> > On Tue, Nov 24, 2020 at 3:44 PM Li Yang wrote:
> >>
> >> On Tue, Nov 24, 2020 at 12:24 AM Wei Li wrote:
> >>>
> >>> IS_ERR_VALUE macro should be used only with unsigned long type.
> >>> Especially it works incorrectly with unsigned shorter types on
> >>> 64bit machines.
> >>
> >> This is truly a problem for the driver to run on 64-bit architectures.
> >> But from an earlier discussion
> >> https://patchwork.kernel.org/project/linux-kbuild/patch/1464384685-347275-1-git-send-email-a...@arndb.de/,
> >> the preferred solution would be removing the IS_ERR_VALUE() usage or
> >> making the values unsigned long.
> >>
> >> It looks like we are having a bigger problem with the 64-bit support
> >> for the driver: the offset variables can also be real pointers
> >> which cannot be held in 32-bit data types (when uf_info->bd_mem_part
> >> == MEM_PART_SYSTEM). So actually we have to change these offsets to
> >> unsigned long, otherwise we are having more serious issues on 64-bit
> >> systems. Are you willing to make such changes or do you want us to deal
> >> with it?
> >
> > Well, it looks like this hardware block was never integrated on a
> > 64-bit SoC and will very likely stay so. So probably we can keep
> > the driver 32-bit only. It is currently limited to PPC32 in Kconfig,
> > how did you build it for 64-bit?
> >
>
> Thank you for providing the earlier discussion archive. In fact, this
> issue is detected by our static analysis tool.

Thanks for the effort, but this probably is a false positive for the static analysis tool, as the 64-bit case is not buildable.

> From my view, there is no harm in fixing these potential misuses. But if you
> really have decided to keep the driver 32-bit only, please just ignore this
> patch.

It is not an easy task to add proper 64-bit support, so probably we just keep it 32-bit only for now. Thanks for the patch anyway.
Regards,
Leo

>
> Thanks,
> Wei
>
> >>> Fixes: 4c35630ccda5 ("[POWERPC] Change rheap functions to use ulongs
> >>> instead of pointers")
> >>> Signed-off-by: Wei Li
> >>> ---
> >>>  drivers/net/ethernet/freescale/ucc_geth.c | 30 +++
> >>>  1 file changed, 15 insertions(+), 15 deletions(-)
> >>>
> >>> diff --git a/drivers/net/ethernet/freescale/ucc_geth.c
> >>> b/drivers/net/ethernet/freescale/ucc_geth.c
> >>> index 714b501be7d0..8656d9be256a 100644
> >>> --- a/drivers/net/ethernet/freescale/ucc_geth.c
> >>> +++ b/drivers/net/ethernet/freescale/ucc_geth.c
> >>> @@ -286,7 +286,7 @@ static int fill_init_enet_entries(struct ucc_geth_private *ugeth,
> >>>  		else {
> >>>  			init_enet_offset =
> >>>  			    qe_muram_alloc(thread_size, thread_alignment);
> >>> -			if (IS_ERR_VALUE(init_enet_offset)) {
> >>> +			if (IS_ERR_VALUE((unsigned long)(int)init_enet_offset)) {
> >>>  				if (netif_msg_ifup(ugeth))
> >>>  					pr_err("Can not allocate DPRAM memory\n");
> >>>  				qe_put_snum((u8) snum);
> >>> @@ -2223,7 +2223,7 @@ static int ucc_geth_alloc_tx(struct ucc_geth_private *ugeth)
> >>>  		ugeth->tx_bd_ring_offset[j] =
> >>>  		    qe_muram_alloc(length,
> >>>  				   UCC_GETH_TX_BD_RING_ALIGNMENT);
> >>> -		if (!IS_ERR_VALUE(ugeth->tx_bd_ring_offset[j]))
> >>> +		if (!IS_ERR_VALUE((unsigned long)(int)ugeth->tx_bd_ring_offset[j]))
> >>>  			ugeth->p_tx_bd_ring[j] =
> >>>  			    (u8 __iomem *) qe_muram_addr(ugeth->
> >>>  							 tx_bd_ring_offset[j]);
> >>> @@ -2300,7 +2300,7 @@ static int ucc_geth_alloc_rx(struct ucc_geth_private *ugeth)
> >>>  		ugeth->rx_bd_ring_offset[j] =
> >>>  		    qe_muram_alloc(length,
> >>>  				   UCC_GETH_RX_BD_RING_ALIGNMENT);
> >>> -		if (!IS_ERR_VALUE(ugeth->rx_bd_ring_offset[j]))
> >>> +		if (!IS_ERR_VALUE((unsigned long)(int)ugeth->rx_bd_ring_offset[j]))
> >>>  			ugeth->p_rx_bd_ring[j] =
> >>>  			    (u8 __iomem *) qe_muram_addr(ugeth->
> >>>  							 rx_bd_ring_offset[j]);
> >>> @@ -2510,7 +2510,7 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
> >>>  	ugeth->tx_glbl_pram_offset =
> >>>  	    qe_muram_alloc(sizeof(struct ucc_geth_tx_global_pram),
> >>>  			   UCC_GETH_TX_GLOBAL_PRAM_ALIGNMENT);
> >>> -	if
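The type problem discussed in this thread is easy to demonstrate in userspace: when a negative errno is stored in a 32-bit offset variable on a 64-bit machine, the implicit zero-extension moves the value out of the errno window and IS_ERR_VALUE() never fires. A sketch under stated assumptions: the macro below is a simplified version of the kernel's (without the `(void *)` cast and `unlikely()`), and `alloc_fail()` is a made-up stand-in for `qe_muram_alloc()` failing with -ENOMEM:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ERRNO 4095UL
/* simplified version of IS_ERR_VALUE from include/linux/err.h:
 * an error is any value in the top MAX_ERRNO slots of unsigned long */
#define IS_ERR_VALUE(x) ((unsigned long)(x) >= -MAX_ERRNO)

/* hypothetical allocator: returns -ENOMEM (-12) truncated into a
 * 32-bit variable, as the ucc_geth offset fields do */
static uint32_t alloc_fail(void)
{
	return (uint32_t)-12;
}

static int detected_plain(uint32_t off)
{
	/* zero-extends on a 64-bit machine: 0xfffffff4 is far below
	 * the errno window, so the error is silently missed */
	return IS_ERR_VALUE(off);
}

static int detected_with_cast(uint32_t off)
{
	/* the patch's (unsigned long)(int) cast sign-extends first,
	 * putting the value back into the errno window */
	return IS_ERR_VALUE((unsigned long)(int)off);
}
```

On a 32-bit build both variants agree, which is why the bug only matters for the hypothetical 64-bit case debated above.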
Re: [PATCH v3 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()
On 2020-11-25 16:24, Laurent Vivier wrote:
> On 25/11/2020 17:05, Denis Kirjanov wrote:
>> On 11/25/20, Laurent Vivier wrote:
>>> With virtio multiqueue, normally each queue IRQ is mapped to a CPU.
>>>
>>> But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity")
>>> this is broken on pseries.
>>
>> Please add "Fixes" tag.
>
> In fact, the code in commit 0d9f0a52c8b9f is correct.
>
> The problem is with MSI/X irq affinity and pseries. So this patch
> fixes more than virtio_scsi. I put this information because this
> commit allows to clearly show the problem. Perhaps I should remove
> this line in fact?

This patch does not fix virtio_scsi at all, which as you noticed, is correct. It really fixes the PPC MSI setup, which is starting to show its age. So getting rid of the reference seems like the right thing to do.

I'm also not keen on the BugId thing. It should really be a lore link. I also cannot find any such tag in the kernel, nor is it a documented practice. The last reference to a Bugzilla entry seems to have happened with 786b5219081ff16 (five years ago).

Thanks,

M.
--
Jazz is not dead. It just smells funny...
Re: [PATCH v3 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()
On 25/11/2020 17:05, Denis Kirjanov wrote:
> On 11/25/20, Laurent Vivier wrote:
>> With virtio multiqueue, normally each queue IRQ is mapped to a CPU.
>>
>> But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity")
>> this is broken on pseries.
>
> Please add "Fixes" tag.

In fact, the code in commit 0d9f0a52c8b9f is correct.

The problem is with MSI/X irq affinity and pseries. So this patch fixes more than virtio_scsi. I put this information because this commit allows to clearly show the problem. Perhaps I should remove this line in fact?

Thanks,
Laurent

>
> Thanks!
>
>>
>> The affinity is correctly computed in msi_desc but this is not applied
>> to the system IRQs.
>>
>> It appears the affinity is correctly passed to rtas_setup_msi_irqs() but
>> lost at this point and never passed to irq_domain_alloc_descs()
>> (see commit 06ee6d571f0e ("genirq: Add affinity hint to irq allocation"))
>> because irq_create_mapping() doesn't take an affinity parameter.
>>
>> As the previous patch has added the affinity parameter to
>> irq_create_mapping() we can forward the affinity from rtas_setup_msi_irqs()
>> to irq_domain_alloc_descs().
>>
>> With this change, the virtqueues are correctly dispatched between the CPUs
>> on pseries.
>>
>> BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1702939
>> Signed-off-by: Laurent Vivier
>> Reviewed-by: Greg Kurz
>> ---
>>  arch/powerpc/platforms/pseries/msi.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/msi.c
>> b/arch/powerpc/platforms/pseries/msi.c
>> index 133f6adcb39c..b3ac2455faad 100644
>> --- a/arch/powerpc/platforms/pseries/msi.c
>> +++ b/arch/powerpc/platforms/pseries/msi.c
>> @@ -458,7 +458,8 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
>>  			return hwirq;
>>  		}
>>
>> -		virq = irq_create_mapping(NULL, hwirq);
>> +		virq = irq_create_mapping_affinity(NULL, hwirq,
>> +						   entry->affinity);
>>
>>  		if (!virq) {
>>  			pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
>> --
>> 2.28.0
>>
>>
>
Re: [PATCH v2 1/2] genirq: add an irq_create_mapping_affinity() function
On Wed, 25 Nov 2020 12:16:56 +0100 Laurent Vivier wrote:

> This function adds an affinity parameter to irq_create_mapping().
> This parameter is needed to pass it to irq_domain_alloc_descs().
>
> irq_create_mapping() is a wrapper around irq_create_mapping_affinity()
> to pass NULL for the affinity parameter.
>
> No functional change.
>
> Signed-off-by: Laurent Vivier
> ---

Reviewed-by: Greg Kurz

>  include/linux/irqdomain.h | 12 ++--
>  kernel/irq/irqdomain.c    | 13 -
>  2 files changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
> index 71535e87109f..ea5a337e0f8b 100644
> --- a/include/linux/irqdomain.h
> +++ b/include/linux/irqdomain.h
> @@ -384,11 +384,19 @@ extern void irq_domain_associate_many(struct irq_domain *domain,
>  extern void irq_domain_disassociate(struct irq_domain *domain,
>  				    unsigned int irq);
>
> -extern unsigned int irq_create_mapping(struct irq_domain *host,
> -				       irq_hw_number_t hwirq);
> +extern unsigned int irq_create_mapping_affinity(struct irq_domain *host,
> +						irq_hw_number_t hwirq,
> +						const struct irq_affinity_desc *affinity);
>  extern unsigned int irq_create_fwspec_mapping(struct irq_fwspec *fwspec);
>  extern void irq_dispose_mapping(unsigned int virq);
>
> +static inline unsigned int irq_create_mapping(struct irq_domain *host,
> +					      irq_hw_number_t hwirq)
> +{
> +	return irq_create_mapping_affinity(host, hwirq, NULL);
> +}
> +
> +
>  /**
>   * irq_linear_revmap() - Find a linux irq from a hw irq number.
>   * @domain: domain owning this hardware interrupt
> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
> index cf8b374b892d..e4ca69608f3b 100644
> --- a/kernel/irq/irqdomain.c
> +++ b/kernel/irq/irqdomain.c
> @@ -624,17 +624,19 @@ unsigned int irq_create_direct_mapping(struct irq_domain *domain)
>  EXPORT_SYMBOL_GPL(irq_create_direct_mapping);
>
>  /**
> - * irq_create_mapping() - Map a hardware interrupt into linux irq space
> + * irq_create_mapping_affinity() - Map a hardware interrupt into linux irq space
>   * @domain: domain owning this hardware interrupt or NULL for default domain
>   * @hwirq: hardware irq number in that domain space
> + * @affinity: irq affinity
>   *
>   * Only one mapping per hardware interrupt is permitted. Returns a linux
>   * irq number.
>   * If the sense/trigger is to be specified, set_irq_type() should be called
>   * on the number returned from that call.
>   */
> -unsigned int irq_create_mapping(struct irq_domain *domain,
> -				irq_hw_number_t hwirq)
> +unsigned int irq_create_mapping_affinity(struct irq_domain *domain,
> +					 irq_hw_number_t hwirq,
> +					 const struct irq_affinity_desc *affinity)
>  {
>  	struct device_node *of_node;
>  	int virq;
> @@ -660,7 +662,8 @@ unsigned int irq_create_mapping(struct irq_domain *domain,
>  	}
>
>  	/* Allocate a virtual interrupt number */
> -	virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node), NULL);
> +	virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node),
> +				      affinity);
>  	if (virq <= 0) {
>  		pr_debug("-> virq allocation failed\n");
>  		return 0;
> @@ -676,7 +679,7 @@ unsigned int irq_create_mapping(struct irq_domain *domain,
>
>  	return virq;
>  }
> -EXPORT_SYMBOL_GPL(irq_create_mapping);
> +EXPORT_SYMBOL_GPL(irq_create_mapping_affinity);
>
>  /**
>   * irq_create_strict_mappings() - Map a range of hw irqs to fixed linux irqs
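The shape of the change reviewed above — extend the function with a new parameter, and keep the old name as a static inline wrapper that passes NULL so no existing caller has to change — can be sketched outside the kernel. Everything here is illustrative: the struct is a placeholder and the return-value encoding exists only so the behavior is observable:

```c
#include <assert.h>
#include <stddef.h>

struct irq_affinity_desc { unsigned int cpu; };

/* extended entry point: takes the affinity hint; NULL means "no hint" */
static unsigned int create_mapping_affinity(unsigned int hwirq,
                                            const struct irq_affinity_desc *affinity)
{
	/* illustrative only: flag in the top bit whether a hint was given */
	return affinity ? (hwirq | 0x80000000u) : hwirq;
}

/* the old name survives as a trivial inline wrapper, exactly like
 * irq_create_mapping() in the patch, so existing callers are untouched */
static inline unsigned int create_mapping(unsigned int hwirq)
{
	return create_mapping_affinity(hwirq, NULL);
}
```

The design choice worth noting: by exporting only the new symbol and making the old name an inline wrapper, the patch stays "no functional change" for every caller that does not opt in.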
Re: [PATCH v3 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()
On 11/25/20, Laurent Vivier wrote:
> With virtio multiqueue, normally each queue IRQ is mapped to a CPU.
>
> But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity")
> this is broken on pseries.

Please add "Fixes" tag.

Thanks!

>
> The affinity is correctly computed in msi_desc but this is not applied
> to the system IRQs.
>
> It appears the affinity is correctly passed to rtas_setup_msi_irqs() but
> lost at this point and never passed to irq_domain_alloc_descs()
> (see commit 06ee6d571f0e ("genirq: Add affinity hint to irq allocation"))
> because irq_create_mapping() doesn't take an affinity parameter.
>
> As the previous patch has added the affinity parameter to
> irq_create_mapping() we can forward the affinity from rtas_setup_msi_irqs()
> to irq_domain_alloc_descs().
>
> With this change, the virtqueues are correctly dispatched between the CPUs
> on pseries.
>
> BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1702939
> Signed-off-by: Laurent Vivier
> Reviewed-by: Greg Kurz
> ---
>  arch/powerpc/platforms/pseries/msi.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/pseries/msi.c
> b/arch/powerpc/platforms/pseries/msi.c
> index 133f6adcb39c..b3ac2455faad 100644
> --- a/arch/powerpc/platforms/pseries/msi.c
> +++ b/arch/powerpc/platforms/pseries/msi.c
> @@ -458,7 +458,8 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
>  			return hwirq;
>  		}
>
> -		virq = irq_create_mapping(NULL, hwirq);
> +		virq = irq_create_mapping_affinity(NULL, hwirq,
> +						   entry->affinity);
>
>  		if (!virq) {
>  			pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
> --
> 2.28.0
>
>
[PATCH V4 4/5] ocxl: Add mmu notifier
Add an invalidate_range mmu notifier, when required (ATSD access of MMIO registers is available), to initiate TLB invalidation commands. For the time being, the ATSD0 set of registers is used by default.

The pasid and bdf values have to be configured in the Process Element Entry. The PEE must be set up to match the BDF/PASID of the AFU.

Acked-by: Frederic Barrat
Signed-off-by: Christophe Lombard
---
 drivers/misc/ocxl/link.c | 62 +++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index 77381dda2c45..129d4eddc4d2 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -2,8 +2,10 @@
 // Copyright 2017 IBM Corp.
 #include
 #include
+#include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -33,6 +35,7 @@
 
 #define SPA_PE_VALID		0x8000
 
+struct ocxl_link;
 
 struct pe_data {
 	struct mm_struct *mm;
@@ -41,6 +44,8 @@ struct pe_data {
 	/* opaque pointer to be passed to the above callback */
 	void *xsl_err_data;
 	struct rcu_head rcu;
+	struct ocxl_link *link;
+	struct mmu_notifier mmu_notifier;
 };
 
 struct spa {
@@ -83,6 +88,8 @@ struct ocxl_link {
 	int domain;
 	int bus;
 	int dev;
+	void __iomem *arva;	/* ATSD register virtual address */
+	spinlock_t atsd_lock;	/* to serialize shootdowns */
 	atomic_t irq_available;
 	struct spa *spa;
 	void *platform_data;
@@ -388,6 +395,7 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, struct ocxl_link **out_link)
 	link->bus = dev->bus->number;
 	link->dev = PCI_SLOT(dev->devfn);
 	atomic_set(&link->irq_available, MAX_IRQ_PER_LINK);
+	spin_lock_init(&link->atsd_lock);
 
 	rc = alloc_spa(dev, link);
 	if (rc)
@@ -403,6 +411,13 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, struct ocxl_link **out_link)
 	if (rc)
 		goto err_xsl_irq;
 
+	/* if link->arva is not defined, MMIO registers are not used to
+	 * generate TLB invalidate. PowerBus snooping is enabled.
+	 * Otherwise, PowerBus snooping is disabled. TLB Invalidates are
+	 * initiated using MMIO registers.
+	 */
+	pnv_ocxl_map_lpar(dev, mfspr(SPRN_LPID), 0, &link->arva);
+
 	*out_link = link;
 	return 0;
 
@@ -454,6 +469,11 @@ static void release_xsl(struct kref *ref)
 {
 	struct ocxl_link *link = container_of(ref, struct ocxl_link, ref);
 
+	if (link->arva) {
+		pnv_ocxl_unmap_lpar(link->arva);
+		link->arva = NULL;
+	}
+
 	list_del(&link->list);
 	/* call platform code before releasing data */
 	pnv_ocxl_spa_release(link->platform_data);
@@ -470,6 +490,26 @@ void ocxl_link_release(struct pci_dev *dev, void *link_handle)
 }
 EXPORT_SYMBOL_GPL(ocxl_link_release);
 
+static void invalidate_range(struct mmu_notifier *mn,
+			     struct mm_struct *mm,
+			     unsigned long start, unsigned long end)
+{
+	struct pe_data *pe_data = container_of(mn, struct pe_data, mmu_notifier);
+	struct ocxl_link *link = pe_data->link;
+	unsigned long addr, pid, page_size = PAGE_SIZE;
+
+	pid = mm->context.id;
+
+	spin_lock(&link->atsd_lock);
+	for (addr = start; addr < end; addr += page_size)
+		pnv_ocxl_tlb_invalidate(link->arva, pid, addr, page_size);
+	spin_unlock(&link->atsd_lock);
+}
+
+static const struct mmu_notifier_ops ocxl_mmu_notifier_ops = {
+	.invalidate_range = invalidate_range,
+};
+
 static u64 calculate_cfg_state(bool kernel)
 {
 	u64 state;
@@ -526,6 +566,8 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
 	pe_data->mm = mm;
 	pe_data->xsl_err_cb = xsl_err_cb;
 	pe_data->xsl_err_data = xsl_err_data;
+	pe_data->link = link;
+	pe_data->mmu_notifier.ops = &ocxl_mmu_notifier_ops;
 
 	memset(pe, 0, sizeof(struct ocxl_process_element));
 	pe->config_state = cpu_to_be64(calculate_cfg_state(pidr == 0));
@@ -542,8 +584,16 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
 	 * by the nest MMU. If we have a kernel context, TLBIs are
 	 * already global.
 	 */
-	if (mm)
+	if (mm) {
 		mm_context_add_copro(mm);
+		if (link->arva) {
+			/* Use MMIO registers for the TLB Invalidate
+			 * operations.
+			 */
+			mmu_notifier_register(&pe_data->mmu_notifier, mm);
+		}
+	}
+
 	/*
 	 * Barrier is to make sure PE is visible in the SPA before it
 	 * is used by the device. It also helps with the global TLBI
@@ -674,6 +724,16 @@ int ocxl_link_remove_pe(void *link_handle, int pasid)
 		WARN(1, "Couldn't find pe data when removing PE\n");
 	} else {
 		if
[PATCH V4 2/5] ocxl: Initiate a TLB invalidate command
When a TLB Invalidate is required for the Logical Partition, the following sequence has to be performed:

1. Load the MMIO ATSD AVA register with the necessary value, if required.
2. Write the MMIO ATSD launch register to initiate the TLB Invalidate command.
3. Poll the MMIO ATSD status register to determine when the TLB Invalidate has been completed.

Signed-off-by: Christophe Lombard
---
 arch/powerpc/include/asm/pnv-ocxl.h   | 51 ++++++++++
 arch/powerpc/platforms/powernv/ocxl.c | 69 +++++++++++++++
 2 files changed, 120 insertions(+)

diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h
index 60c3c74427d9..9acd1fbf1197 100644
--- a/arch/powerpc/include/asm/pnv-ocxl.h
+++ b/arch/powerpc/include/asm/pnv-ocxl.h
@@ -3,12 +3,59 @@
 #ifndef _ASM_PNV_OCXL_H
 #define _ASM_PNV_OCXL_H
 
+#include
 #include
 
 #define PNV_OCXL_TL_MAX_TEMPLATE	63
 #define PNV_OCXL_TL_BITS_PER_RATE	4
 #define PNV_OCXL_TL_RATE_BUF_SIZE	((PNV_OCXL_TL_MAX_TEMPLATE+1) * PNV_OCXL_TL_BITS_PER_RATE / 8)
 
+#define PNV_OCXL_ATSD_TIMEOUT		1
+
+/* TLB Management Instructions */
+#define PNV_OCXL_ATSD_LNCH		0x00
+/* Radix Invalidate */
+#define PNV_OCXL_ATSD_LNCH_R		PPC_BIT(0)
+/* Radix Invalidation Control
+ * 0b00 Just invalidate TLB.
+ * 0b01 Invalidate just Page Walk Cache.
+ * 0b10 Invalidate TLB, Page Walk Cache, and any
+ * caching of Partition and Process Table Entries.
+ */
+#define PNV_OCXL_ATSD_LNCH_RIC		PPC_BITMASK(1, 2)
+/* Number and Page Size of translations to be invalidated */
+#define PNV_OCXL_ATSD_LNCH_LP		PPC_BITMASK(3, 10)
+/* Invalidation Criteria
+ * 0b00 Invalidate just the target VA.
+ * 0b01 Invalidate matching PID.
+ */
+#define PNV_OCXL_ATSD_LNCH_IS		PPC_BITMASK(11, 12)
+/* 0b1: Process Scope, 0b0: Partition Scope */
+#define PNV_OCXL_ATSD_LNCH_PRS		PPC_BIT(13)
+/* Invalidation Flag */
+#define PNV_OCXL_ATSD_LNCH_B		PPC_BIT(14)
+/* Actual Page Size to be invalidated
+ * 000 4KB
+ * 101 64KB
+ * 001 2MB
+ * 010 1GB
+ */
+#define PNV_OCXL_ATSD_LNCH_AP		PPC_BITMASK(15, 17)
+/* Defines the large page select
+ * L=0b0 for 4KB pages
+ * L=0b1 for large pages)
+ */
+#define PNV_OCXL_ATSD_LNCH_L		PPC_BIT(18)
+/* Process ID */
+#define PNV_OCXL_ATSD_LNCH_PID		PPC_BITMASK(19, 38)
+/* NoFlush - Assumed to be 0b0 */
+#define PNV_OCXL_ATSD_LNCH_F		PPC_BIT(39)
+#define PNV_OCXL_ATSD_LNCH_OCAPI_SLBI	PPC_BIT(40)
+#define PNV_OCXL_ATSD_LNCH_OCAPI_SINGLETON	PPC_BIT(41)
+#define PNV_OCXL_ATSD_AVA		0x08
+#define PNV_OCXL_ATSD_AVA_AVA		PPC_BITMASK(0, 51)
+#define PNV_OCXL_ATSD_STAT		0x10
+
 int pnv_ocxl_get_actag(struct pci_dev *dev, u16 *base, u16 *enabled, u16 *supported);
 int pnv_ocxl_get_pasid_count(struct pci_dev *dev, int *count);
@@ -31,4 +78,8 @@ int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle);
 
 int pnv_ocxl_map_lpar(struct pci_dev *dev, uint64_t lparid,
 		      uint64_t lpcr, void __iomem **arva);
 void pnv_ocxl_unmap_lpar(void __iomem *arva);
+void pnv_ocxl_tlb_invalidate(void __iomem *arva,
+			     unsigned long pid,
+			     unsigned long addr,
+			     unsigned long page_size);
 #endif /* _ASM_PNV_OCXL_H */
diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
index 57fc1062677b..9105efcf242a 100644
--- a/arch/powerpc/platforms/powernv/ocxl.c
+++ b/arch/powerpc/platforms/powernv/ocxl.c
@@ -528,3 +528,72 @@ void pnv_ocxl_unmap_lpar(void __iomem *arva)
 	iounmap(arva);
 }
 EXPORT_SYMBOL_GPL(pnv_ocxl_unmap_lpar);
+
+void pnv_ocxl_tlb_invalidate(void __iomem *arva,
+			     unsigned long pid,
+			     unsigned long addr,
+			     unsigned long page_size)
+{
+	unsigned long timeout = jiffies + (HZ * PNV_OCXL_ATSD_TIMEOUT);
+	u64 val = 0ull;
+	int pend;
+	u8 size;
+
+	if (!(arva))
+		return;
+
+	if (addr) {
+		/* load Abbreviated Virtual Address register with
+		 * the necessary value
+		 */
+		val |= FIELD_PREP(PNV_OCXL_ATSD_AVA_AVA, addr >> (63-51));
+		out_be64(arva + PNV_OCXL_ATSD_AVA, val);
+	}
+
+	/* Write access initiates a shoot down to initiate the
+	 * TLB Invalidate command
+	 */
+	val = PNV_OCXL_ATSD_LNCH_R;
+	val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_RIC, 0b10);
+	if (addr)
+		val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_IS, 0b00);
+	else {
+		val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_IS, 0b01);
+		val |= PNV_OCXL_ATSD_LNCH_OCAPI_SINGLETON;
+	}
+	val |= PNV_OCXL_ATSD_LNCH_PRS;
+	/* Actual Page Size to be invalidated
+	 * 000 4KB
+	 * 101 64KB
+
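The launch-register value in this patch is composed with IBM big-endian bit numbering, where PPC_BIT(0) is the most significant bit of the 64-bit register. That composition can be checked in userspace with simplified versions of the macros (the real PPC_BIT/PPC_BITMASK live in arch/powerpc/include/asm/bitops.h and FIELD_PREP in include/linux/bitfield.h; this is a sketch, not the kernel code):

```c
#include <assert.h>

/* IBM numbering: bit 0 is the MSB of the 64-bit register */
#define PPC_BIT(bit)		(1UL << (63 - (bit)))
#define PPC_BITMASK(bs, be)	((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))

/* simplified FIELD_PREP: shift val up to the low bit of mask */
#define FIELD_PREP(mask, val) \
	(((unsigned long)(val) << __builtin_ctzl(mask)) & (mask))

#define ATSD_LNCH_R	PPC_BIT(0)		/* Radix Invalidate */
#define ATSD_LNCH_RIC	PPC_BITMASK(1, 2)	/* Invalidation Control */

/* base of the launch value built by pnv_ocxl_tlb_invalidate():
 * RIC=0b10 means invalidate TLB, Page Walk Cache, and cached
 * partition/process table entries */
static unsigned long atsd_launch_base(void)
{
	return ATSD_LNCH_R | FIELD_PREP(ATSD_LNCH_RIC, 2 /* 0b10 */);
}
```

The assertions below pin down the numbering: PPC_BIT(0) is the top bit, and the RIC field occupies the next two bits down.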
[PATCH V4 5/5] ocxl: Add new kernel traces
Add specific kernel traces which provide information on mmu notifiers and on page ranges.

Acked-by: Frederic Barrat
Signed-off-by: Christophe Lombard
---
 drivers/misc/ocxl/link.c  |  4 +++
 drivers/misc/ocxl/trace.h | 64 +++++++++++++++
 2 files changed, 68 insertions(+)

diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index 129d4eddc4d2..ab039c115381 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -499,6 +499,7 @@ static void invalidate_range(struct mmu_notifier *mn,
 	unsigned long addr, pid, page_size = PAGE_SIZE;
 
 	pid = mm->context.id;
+	trace_ocxl_mmu_notifier_range(start, end, pid);
 
 	spin_lock(&link->atsd_lock);
 	for (addr = start; addr < end; addr += page_size)
@@ -590,6 +591,7 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
 			/* Use MMIO registers for the TLB Invalidate
 			 * operations.
 			 */
+			trace_ocxl_init_mmu_notifier(pasid, mm->context.id);
 			mmu_notifier_register(&pe_data->mmu_notifier, mm);
 		}
 	}
@@ -725,6 +727,8 @@ int ocxl_link_remove_pe(void *link_handle, int pasid)
 	} else {
 		if (pe_data->mm) {
 			if (link->arva) {
+				trace_ocxl_release_mmu_notifier(pasid,
+								pe_data->mm->context.id);
 				mmu_notifier_unregister(&pe_data->mmu_notifier,
 							pe_data->mm);
 				spin_lock(&link->atsd_lock);
diff --git a/drivers/misc/ocxl/trace.h b/drivers/misc/ocxl/trace.h
index 17e21cb2addd..a33a5094ff6c 100644
--- a/drivers/misc/ocxl/trace.h
+++ b/drivers/misc/ocxl/trace.h
@@ -8,6 +8,70 @@
 
 #include
 
+
+TRACE_EVENT(ocxl_mmu_notifier_range,
+	TP_PROTO(unsigned long start, unsigned long end, unsigned long pidr),
+	TP_ARGS(start, end, pidr),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, start)
+		__field(unsigned long, end)
+		__field(unsigned long, pidr)
+	),
+
+	TP_fast_assign(
+		__entry->start = start;
+		__entry->end = end;
+		__entry->pidr = pidr;
+	),
+
+	TP_printk("start=0x%lx end=0x%lx pidr=0x%lx",
+		__entry->start,
+		__entry->end,
+		__entry->pidr
+	)
+);
+
+TRACE_EVENT(ocxl_init_mmu_notifier,
+	TP_PROTO(int pasid, unsigned long pidr),
+	TP_ARGS(pasid, pidr),
+
+	TP_STRUCT__entry(
+		__field(int, pasid)
+		__field(unsigned long, pidr)
+	),
+
+	TP_fast_assign(
+		__entry->pasid = pasid;
+		__entry->pidr = pidr;
+	),
+
+	TP_printk("pasid=%d, pidr=0x%lx",
+		__entry->pasid,
+		__entry->pidr
+	)
+);
+
+TRACE_EVENT(ocxl_release_mmu_notifier,
+	TP_PROTO(int pasid, unsigned long pidr),
+	TP_ARGS(pasid, pidr),
+
+	TP_STRUCT__entry(
+		__field(int, pasid)
+		__field(unsigned long, pidr)
+	),
+
+	TP_fast_assign(
+		__entry->pasid = pasid;
+		__entry->pidr = pidr;
+	),
+
+	TP_printk("pasid=%d, pidr=0x%lx",
+		__entry->pasid,
+		__entry->pidr
+	)
+);
+
 DECLARE_EVENT_CLASS(ocxl_context,
 	TP_PROTO(pid_t pid, void *spa, int pasid, u32 pidr, u32 tidr),
 	TP_ARGS(pid, spa, pasid, pidr, tidr),
-- 
2.28.0
[PATCH V4 1/5] ocxl: Assign a register set to a Logical Partition
Platform specific function to assign a register set to a Logical Partition. The "ibm,mmio-atsd" property, provided by the firmware, contains the 16 base ATSD physical addresses (ATSD0 through ATSD15) of the set of MMIO registers (XTS MMIO ATSDx LPARID/AVA/launch/status register). For the time being, the ATSD0 set of registers is used by default. Acked-by: Frederic Barrat Signed-off-by: Christophe Lombard --- arch/powerpc/include/asm/pnv-ocxl.h | 3 ++ arch/powerpc/platforms/powernv/ocxl.c | 45 +++ 2 files changed, 48 insertions(+) diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h index d37ededca3ee..60c3c74427d9 100644 --- a/arch/powerpc/include/asm/pnv-ocxl.h +++ b/arch/powerpc/include/asm/pnv-ocxl.h @@ -28,4 +28,7 @@ int pnv_ocxl_spa_setup(struct pci_dev *dev, void *spa_mem, int PE_mask, void **p void pnv_ocxl_spa_release(void *platform_data); int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle); +int pnv_ocxl_map_lpar(struct pci_dev *dev, uint64_t lparid, + uint64_t lpcr, void __iomem **arva); +void pnv_ocxl_unmap_lpar(void __iomem *arva); #endif /* _ASM_PNV_OCXL_H */ diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c index ecdad219d704..57fc1062677b 100644 --- a/arch/powerpc/platforms/powernv/ocxl.c +++ b/arch/powerpc/platforms/powernv/ocxl.c @@ -483,3 +483,48 @@ int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle) return rc; } EXPORT_SYMBOL_GPL(pnv_ocxl_spa_remove_pe_from_cache); + +int pnv_ocxl_map_lpar(struct pci_dev *dev, uint64_t lparid, + uint64_t lpcr, void __iomem **arva) +{ + struct pci_controller *hose = pci_bus_to_host(dev->bus); + struct pnv_phb *phb = hose->private_data; + u64 mmio_atsd; + int rc; + + /* ATSD physical address. +* ATSD LAUNCH register: write access initiates a shoot down to +* initiate the TLB Invalidate command. 
+ */
+	rc = of_property_read_u64_index(hose->dn, "ibm,mmio-atsd",
+					0, &mmio_atsd);
+	if (rc) {
+		dev_info(&dev->dev, "No available ATSD found\n");
+		return rc;
+	}
+
+	/* Assign a register set to a Logical Partition and MMIO ATSD
+	 * LPARID register to the required value.
+	 */
+	rc = opal_npu_map_lpar(phb->opal_id, pci_dev_id(dev),
+			       lparid, lpcr);
+	if (rc) {
+		dev_err(&dev->dev, "Error mapping device to LPAR: %d\n", rc);
+		return rc;
+	}
+
+	*arva = ioremap(mmio_atsd, 24);
+	if (!(*arva)) {
+		dev_warn(&dev->dev, "ioremap failed - mmio_atsd: %#llx\n", mmio_atsd);
+		rc = -ENOMEM;
+	}
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_map_lpar);
+
+void pnv_ocxl_unmap_lpar(void __iomem *arva)
+{
+	iounmap(arva);
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_unmap_lpar);
-- 
2.28.0
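The three-step setup in pnv_ocxl_map_lpar() (read the ATSD base address from firmware, map the LPAR via OPAL, ioremap the register set) follows the usual early-return error ladder. A minimal userspace sketch of that control flow, with made-up stub functions standing in for the firmware and OPAL calls:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for the firmware/OPAL steps; names are illustrative only. */
static int fake_read_atsd(uint64_t *mmio_atsd)
{
	*mmio_atsd = 0x6030000000000ULL; /* "ibm,mmio-atsd" DT property, index 0 */
	return 0;
}

static int fake_opal_map_lpar(uint64_t lparid, uint64_t lpcr)
{
	(void)lparid; (void)lpcr;        /* opal_npu_map_lpar() */
	return 0;
}

static void *fake_ioremap(uint64_t phys, size_t len)
{
	(void)len;                       /* 24 bytes, as in the patch */
	return (void *)(uintptr_t)phys;
}

/* Mirrors the shape of pnv_ocxl_map_lpar(): each step either succeeds or
 * returns early; only the final ioremap() failure maps to -ENOMEM. */
static int toy_map_lpar(uint64_t lparid, uint64_t lpcr, void **arva)
{
	uint64_t mmio_atsd;
	int rc;

	rc = fake_read_atsd(&mmio_atsd);
	if (rc)
		return rc;

	rc = fake_opal_map_lpar(lparid, lpcr);
	if (rc)
		return rc;

	*arva = fake_ioremap(mmio_atsd, 24);
	if (!*arva)
		return -12; /* -ENOMEM */
	return 0;
}
```

The unmap side is symmetric: a single iounmap() of the returned virtual address.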
[PATCH V4 3/5] ocxl: Update the Process Element Entry
To complete the MMIO based mechanism, the fields: PASID, bus, device and function of the Process Element Entry have to be filled. (See OpenCAPI Power Platform Architecture document) Hypervisor Process Element Entry Word 0 1 7 8 .. 12 13 ..15 16 19 20 ... 31 0 OSL Configuration State (0:31) 1 OSL Configuration State (32:63) 2 PASID |Reserved 3 Bus | Device|Function |Reserved 4 Reserved 5 Reserved 6 Acked-by: Frederic Barrat Signed-off-by: Christophe Lombard --- drivers/misc/ocxl/context.c | 4 +++- drivers/misc/ocxl/link.c | 4 +++- drivers/misc/ocxl/ocxl_internal.h | 9 ++--- drivers/scsi/cxlflash/ocxl_hw.c | 6 -- include/misc/ocxl.h | 2 +- 5 files changed, 17 insertions(+), 8 deletions(-) diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c index c21f65a5c762..9eb0d93b01c6 100644 --- a/drivers/misc/ocxl/context.c +++ b/drivers/misc/ocxl/context.c @@ -70,6 +70,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr, struct mm_struct *mm) { int rc; unsigned long pidr = 0; + struct pci_dev *dev; // Locks both status & tidr mutex_lock(>status_mutex); @@ -81,8 +82,9 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr, struct mm_struct *mm) if (mm) pidr = mm->context.id; + dev = to_pci_dev(ctx->afu->fn->dev.parent); rc = ocxl_link_add_pe(ctx->afu->fn->link, ctx->pasid, pidr, ctx->tidr, - amr, mm, xsl_fault_error, ctx); + amr, pci_dev_id(dev), mm, xsl_fault_error, ctx); if (rc) goto out; diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c index fd73d3bc0eb6..77381dda2c45 100644 --- a/drivers/misc/ocxl/link.c +++ b/drivers/misc/ocxl/link.c @@ -494,7 +494,7 @@ static u64 calculate_cfg_state(bool kernel) } int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr, - u64 amr, struct mm_struct *mm, + u64 amr, u16 bdf, struct mm_struct *mm, void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr), void *xsl_err_data) { @@ -529,6 +529,8 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr, memset(pe, 0, 
sizeof(struct ocxl_process_element)); pe->config_state = cpu_to_be64(calculate_cfg_state(pidr == 0)); + pe->pasid = cpu_to_be32(pasid << (31 - 19)); + pe->bdf = cpu_to_be16(bdf); pe->lpid = cpu_to_be32(mfspr(SPRN_LPID)); pe->pid = cpu_to_be32(pidr); pe->tid = cpu_to_be32(tidr); diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h index 0bad0a123af6..10125a22d5a5 100644 --- a/drivers/misc/ocxl/ocxl_internal.h +++ b/drivers/misc/ocxl/ocxl_internal.h @@ -84,13 +84,16 @@ struct ocxl_context { struct ocxl_process_element { __be64 config_state; - __be32 reserved1[11]; + __be32 pasid; + __be16 bdf; + __be16 reserved1; + __be32 reserved2[9]; __be32 lpid; __be32 tid; __be32 pid; - __be32 reserved2[10]; + __be32 reserved3[10]; __be64 amr; - __be32 reserved3[3]; + __be32 reserved4[3]; __be32 software_state; }; diff --git a/drivers/scsi/cxlflash/ocxl_hw.c b/drivers/scsi/cxlflash/ocxl_hw.c index e4e0d767b98e..244fc27215dc 100644 --- a/drivers/scsi/cxlflash/ocxl_hw.c +++ b/drivers/scsi/cxlflash/ocxl_hw.c @@ -329,6 +329,7 @@ static int start_context(struct ocxlflash_context *ctx) struct ocxl_hw_afu *afu = ctx->hw_afu; struct ocxl_afu_config *acfg = >acfg; void *link_token = afu->link_token; + struct pci_dev *pdev = afu->pdev; struct device *dev = afu->dev; bool master = ctx->master; struct mm_struct *mm; @@ -360,8 +361,9 @@ static int start_context(struct ocxlflash_context *ctx) mm = current->mm; } - rc = ocxl_link_add_pe(link_token, ctx->pe, pid, 0, 0, mm, - ocxlflash_xsl_fault, ctx); + rc = ocxl_link_add_pe(link_token, ctx->pe, pid, 0, 0, + pci_dev_id(pdev), mm, ocxlflash_xsl_fault, + ctx); if (unlikely(rc)) { dev_err(dev, "%s: ocxl_link_add_pe failed rc=%d\n", __func__, rc); diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h index e013736e275d..3ed736da02c8 100644 --- a/include/misc/ocxl.h +++ b/include/misc/ocxl.h @@ -447,7 +447,7 @@ void ocxl_link_release(struct pci_dev *dev, void *link_handle); * defined */ int ocxl_link_add_pe(void 
*link_handle, int pasid, u32 pidr, u32 tidr, - u64 amr, struct mm_struct *mm, +
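Patch 3/5 fills the PASID into word 2 of the Process Element Entry with `pe->pasid = cpu_to_be32(pasid << (31 - 19));`. The shift looks odd until you recall that the PEE layout uses IBM bit numbering (MSB is bit 0), so a field occupying bits 0..19 of a 32-bit word needs a left shift of (31 - 19) = 12. A small sketch of the packing (byte-swap omitted, since host order suffices to show the shift):

```c
#include <assert.h>
#include <stdint.h>

/* Place a PASID in IBM bits 0..19 of a 32-bit PEE word, as in:
 *     pe->pasid = cpu_to_be32(pasid << (31 - 19));
 * Bits 20..31 (IBM numbering) remain reserved/zero. */
static uint32_t pack_pasid(uint32_t pasid)
{
	return pasid << (31 - 19);
}
```

The bus/device/function word next to it is simpler: the 16-bit BDF from pci_dev_id() goes straight through cpu_to_be16().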
[PATCH V4 0/5] ocxl: MMIO invalidation support
OpenCAPI 4.0/5.0 TLBI/SLBI snooping is not used, due to performance problems caused by the PAU having to process all incoming TLBI/SLBI commands, which causes them to back up on the PowerBus.

When the Address Translation Mode requires TLB operations to be initiated using MMIO registers, a set of registers like the following is used:
• XTS MMIO ATSD0 LPARID register
• XTS MMIO ATSD0 AVA register
• XTS MMIO ATSD0 launch register, write access initiates a shoot down
• XTS MMIO ATSD0 status register

The MMIO based mechanism also blocks the NPU/PAU from snooping TLBIE commands on the PowerBus. The shootdown commands (ATSD) are instead generated using MMIO registers in the NPU/PAU and sent to the device.

Signed-off-by: Christophe Lombard
---
Changelog[v4]
 - Rebase to latest upstream.
 - Correct a typo in page size

Changelog[v3]
 - Rebase to latest upstream.
 - Add page_size argument in pnv_ocxl_tlb_invalidate()
 - Remove double pointer

Changelog[v2]
 - Rebase to latest upstream.
 - Create a set of smaller patches
 - Move the device tree parsing and ioremap() for the shootdown page in a platform-specific file (powernv)
 - Release the shootdown page in release_xsl()
 - Initialize atsd_lock
 - Move the code to initiate the TLB Invalidate command in a platform-specific file (powernv)
 - Use the notifier invalidate_range
---

Christophe Lombard (5):
  ocxl: Assign a register set to a Logical Partition
  ocxl: Initiate a TLB invalidate command
  ocxl: Update the Process Element Entry
  ocxl: Add mmu notifier
  ocxl: Add new kernel traces

 arch/powerpc/include/asm/pnv-ocxl.h   |  54
 arch/powerpc/platforms/powernv/ocxl.c | 114 ++
 drivers/misc/ocxl/context.c           |   4 +-
 drivers/misc/ocxl/link.c              |  70 +++-
 drivers/misc/ocxl/ocxl_internal.h     |   9 +-
 drivers/misc/ocxl/trace.h             |  64 +++
 drivers/scsi/cxlflash/ocxl_hw.c       |   6 +-
 include/misc/ocxl.h                   |   2 +-
 8 files changed, 314 insertions(+), 9 deletions(-)
-- 
2.28.0
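The register set described in the cover letter implies a program/launch/poll sequence per shootdown: write the address into AVA, write the launch register to start the invalidation, then poll status until the device reports completion. A toy model of that handshake — the struct layout and status semantics here are illustrative, not the real XTS MMIO ATSD register definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of one ATSD register set (names from the cover letter). */
struct toy_atsd {
	uint64_t lparid;
	uint64_t ava;    /* address to shoot down */
	uint64_t launch; /* write access initiates the shoot down */
	uint64_t status; /* toy convention: non-zero while busy */
};

static void toy_shootdown(struct toy_atsd *r, uint64_t addr)
{
	r->ava = addr;
	r->launch = 1; /* kick off the TLB invalidate */
	r->status = 1; /* device would report busy... */
	r->status = 0; /* ...then complete (no real latency modeled here) */
	while (r->status)
		;      /* poll until the invalidation completes */
}
```

On real hardware the LPARID register is set up once at map time (patch 1/5), and the AVA/launch/status cycle repeats per invalidated page, serialized by a lock.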
Re: [PATCH net 1/2] ibmvnic: Ensure that SCRQ entry reads are correctly ordered
On 11/24/20 11:43 PM, Michael Ellerman wrote: Thomas Falcon writes: Ensure that received Subordinate Command-Response Queue (SCRQ) entries are properly read in order by the driver. These queues are used in the ibmvnic device to process RX buffer and TX completion descriptors. dma_rmb barriers have been added after checking for a pending descriptor to ensure the correct descriptor entry is checked and after reading the SCRQ descriptor to ensure the entire descriptor is read before processing. Fixes: 032c5e828 ("Driver for IBM System i/p VNIC protocol") Signed-off-by: Thomas Falcon --- drivers/net/ethernet/ibm/ibmvnic.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 2aa40b2..489ed5e 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -2403,6 +2403,8 @@ static int ibmvnic_poll(struct napi_struct *napi, int budget) if (!pending_scrq(adapter, adapter->rx_scrq[scrq_num])) break; + /* ensure that we do not prematurely exit the polling loop */ + dma_rmb(); I'd be happier if these comments were more specific about which read(s) they are ordering vs which other read(s). I'm sure it's obvious to you, but it may not be to a future author, and/or after the code has been refactored over time. Thank you for reviewing! I will submit a v2 soon with clearer comments on the reads being ordered here. 
Thanks, Tom next = ibmvnic_next_scrq(adapter, adapter->rx_scrq[scrq_num]); rx_buff = (struct ibmvnic_rx_buff *)be64_to_cpu(next-> @@ -3098,6 +3100,9 @@ static int ibmvnic_complete_tx(struct ibmvnic_adapter *adapter, unsigned int pool = scrq->pool_index; int num_entries = 0; + /* ensure that the correct descriptor entry is read */ + dma_rmb(); + next = ibmvnic_next_scrq(adapter, scrq); for (i = 0; i < next->tx_comp.num_comps; i++) { if (next->tx_comp.rcs[i]) { @@ -3498,6 +3503,9 @@ static union sub_crq *ibmvnic_next_scrq(struct ibmvnic_adapter *adapter, } spin_unlock_irqrestore(>lock, flags); + /* ensure that the entire SCRQ descriptor is read */ + dma_rmb(); + return entry; } cheers
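The ordering requirement behind all three dma_rmb() sites is the same: the device writes the descriptor body first and the valid/pending flag last, so the driver must not consume body fields read before (or speculatively alongside) the flag. A userspace sketch of the pattern, with a compiler barrier standing in for dma_rmb() (which on a real kernel additionally orders reads from DMA-coherent memory):

```c
#include <assert.h>

/* Stand-in for dma_rmb(); illustrative of placement only. */
#define toy_dma_rmb() __asm__ __volatile__("" ::: "memory")

struct toy_scrq_desc {
	volatile int in_use;  /* written LAST by the device */
	volatile int payload; /* written FIRST by the device */
};

/* Check the pending flag, then issue the read barrier BEFORE reading the
 * rest of the descriptor, so a stale payload read cannot be paired with a
 * fresh flag read. Returns 1 and fills *out if a descriptor was pending. */
static int toy_poll_one(struct toy_scrq_desc *d, int *out)
{
	if (!d->in_use)
		return 0;  /* nothing pending, exit the polling loop */
	toy_dma_rmb();     /* order flag read before payload read */
	*out = d->payload;
	return 1;
}
```

This matches Michael's review point: a comment at each barrier should name which read is being ordered against which (flag vs. descriptor body), not just say "ensure ordering".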
[PATCH v3 1/2] genirq/irqdomain: Add an irq_create_mapping_affinity() function
There is currently no way to convey the affinity of an interrupt via irq_create_mapping(), which creates issues for devices that expect that affinity to be managed by the kernel. In order to sort this out, rename irq_create_mapping() to irq_create_mapping_affinity() with an additional affinity parameter that can conveniently passed down to irq_domain_alloc_descs(). irq_create_mapping() is then re-implemented as a wrapper around irq_create_mapping_affinity(). Signed-off-by: Laurent Vivier Reviewed-by: Greg Kurz --- include/linux/irqdomain.h | 12 ++-- kernel/irq/irqdomain.c| 13 - 2 files changed, 18 insertions(+), 7 deletions(-) diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h index 71535e87109f..ea5a337e0f8b 100644 --- a/include/linux/irqdomain.h +++ b/include/linux/irqdomain.h @@ -384,11 +384,19 @@ extern void irq_domain_associate_many(struct irq_domain *domain, extern void irq_domain_disassociate(struct irq_domain *domain, unsigned int irq); -extern unsigned int irq_create_mapping(struct irq_domain *host, - irq_hw_number_t hwirq); +extern unsigned int irq_create_mapping_affinity(struct irq_domain *host, + irq_hw_number_t hwirq, + const struct irq_affinity_desc *affinity); extern unsigned int irq_create_fwspec_mapping(struct irq_fwspec *fwspec); extern void irq_dispose_mapping(unsigned int virq); +static inline unsigned int irq_create_mapping(struct irq_domain *host, + irq_hw_number_t hwirq) +{ + return irq_create_mapping_affinity(host, hwirq, NULL); +} + + /** * irq_linear_revmap() - Find a linux irq from a hw irq number. 
* @domain: domain owning this hardware interrupt diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c index cf8b374b892d..e4ca69608f3b 100644 --- a/kernel/irq/irqdomain.c +++ b/kernel/irq/irqdomain.c @@ -624,17 +624,19 @@ unsigned int irq_create_direct_mapping(struct irq_domain *domain) EXPORT_SYMBOL_GPL(irq_create_direct_mapping); /** - * irq_create_mapping() - Map a hardware interrupt into linux irq space + * irq_create_mapping_affinity() - Map a hardware interrupt into linux irq space * @domain: domain owning this hardware interrupt or NULL for default domain * @hwirq: hardware irq number in that domain space + * @affinity: irq affinity * * Only one mapping per hardware interrupt is permitted. Returns a linux * irq number. * If the sense/trigger is to be specified, set_irq_type() should be called * on the number returned from that call. */ -unsigned int irq_create_mapping(struct irq_domain *domain, - irq_hw_number_t hwirq) +unsigned int irq_create_mapping_affinity(struct irq_domain *domain, + irq_hw_number_t hwirq, + const struct irq_affinity_desc *affinity) { struct device_node *of_node; int virq; @@ -660,7 +662,8 @@ unsigned int irq_create_mapping(struct irq_domain *domain, } /* Allocate a virtual interrupt number */ - virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node), NULL); + virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node), + affinity); if (virq <= 0) { pr_debug("-> virq allocation failed\n"); return 0; @@ -676,7 +679,7 @@ unsigned int irq_create_mapping(struct irq_domain *domain, return virq; } -EXPORT_SYMBOL_GPL(irq_create_mapping); +EXPORT_SYMBOL_GPL(irq_create_mapping_affinity); /** * irq_create_strict_mappings() - Map a range of hw irqs to fixed linux irqs -- 2.28.0
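The refactoring pattern in this patch — rename the function to the extended form, then keep the old name as a static inline wrapper passing NULL — preserves every existing caller while exporting only the new symbol. A minimal sketch of the same pattern on a toy API (names are invented; the real signatures are irq_create_mapping()/irq_create_mapping_affinity()):

```c
#include <assert.h>
#include <stddef.h>

/* Toy affinity descriptor; the real one is struct irq_affinity_desc. */
struct toy_affinity { int first_cpu; };

/* New, extended entry point: takes the extra affinity parameter and is the
 * one actually implemented (and, in the kernel, EXPORT_SYMBOL_GPL'd). */
static unsigned int toy_create_mapping_affinity(unsigned int hwirq,
						const struct toy_affinity *aff)
{
	/* toy behaviour: encode whether an affinity was supplied */
	return hwirq * 2 + (aff ? 1 : 0);
}

/* Old name kept as a thin inline wrapper, so existing callers compile
 * unchanged and get the NULL-affinity behaviour they always had. */
static inline unsigned int toy_create_mapping(unsigned int hwirq)
{
	return toy_create_mapping_affinity(hwirq, NULL);
}
```

Because the wrapper is static inline in the header, no new exported symbol is needed for the legacy name, which is why the EXPORT_SYMBOL_GPL line simply moves to the new function.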
[PATCH v3 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()
With virtio multiqueue, normally each queue IRQ is mapped to a CPU. But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity") this is broken on pseries. The affinity is correctly computed in msi_desc but this is not applied to the system IRQs. It appears the affinity is correctly passed to rtas_setup_msi_irqs() but lost at this point and never passed to irq_domain_alloc_descs() (see commit 06ee6d571f0e ("genirq: Add affinity hint to irq allocation")) because irq_create_mapping() doesn't take an affinity parameter. As the previous patch has added the affinity parameter to irq_create_mapping() we can forward the affinity from rtas_setup_msi_irqs() to irq_domain_alloc_descs(). With this change, the virtqueues are correctly dispatched between the CPUs on pseries. BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1702939 Signed-off-by: Laurent Vivier Reviewed-by: Greg Kurz --- arch/powerpc/platforms/pseries/msi.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c index 133f6adcb39c..b3ac2455faad 100644 --- a/arch/powerpc/platforms/pseries/msi.c +++ b/arch/powerpc/platforms/pseries/msi.c @@ -458,7 +458,8 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type) return hwirq; } - virq = irq_create_mapping(NULL, hwirq); + virq = irq_create_mapping_affinity(NULL, hwirq, + entry->affinity); if (!virq) { pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq); -- 2.28.0
[PATCH v3 0/2] powerpc/pseries: fix MSI/X IRQ affinity on pseries
With virtio, in multiqueue case, each queue IRQ is normally bound to a different CPU using the affinity mask. This works fine on x86_64 but totally ignored on pseries. This is not obvious at first look because irqbalance is doing some balancing to improve that. It appears that the "managed" flag set in the MSI entry is never copied to the system IRQ entry. This series passes the affinity mask from rtas_setup_msi_irqs() to irq_domain_alloc_descs() by adding an affinity parameter to irq_create_mapping(). The first patch adds the parameter (no functional change), the second patch passes the actual affinity mask to irq_create_mapping() in rtas_setup_msi_irqs(). For instance, with 32 CPUs VM and 32 queues virtio-scsi interface: ... -smp 32 -device virtio-scsi-pci,id=virtio_scsi_pci0,num_queues=32 for IRQ in $(grep virtio2-request /proc/interrupts |cut -d: -f1); do for file in /proc/irq/$IRQ/ ; do echo -n "IRQ: $(basename $file) CPU: " ; cat $file/smp_affinity_list done done Without the patch (and without irqbalanced) IRQ: 268 CPU: 0-31 IRQ: 269 CPU: 0-31 IRQ: 270 CPU: 0-31 IRQ: 271 CPU: 0-31 IRQ: 272 CPU: 0-31 IRQ: 273 CPU: 0-31 IRQ: 274 CPU: 0-31 IRQ: 275 CPU: 0-31 IRQ: 276 CPU: 0-31 IRQ: 277 CPU: 0-31 IRQ: 278 CPU: 0-31 IRQ: 279 CPU: 0-31 IRQ: 280 CPU: 0-31 IRQ: 281 CPU: 0-31 IRQ: 282 CPU: 0-31 IRQ: 283 CPU: 0-31 IRQ: 284 CPU: 0-31 IRQ: 285 CPU: 0-31 IRQ: 286 CPU: 0-31 IRQ: 287 CPU: 0-31 IRQ: 288 CPU: 0-31 IRQ: 289 CPU: 0-31 IRQ: 290 CPU: 0-31 IRQ: 291 CPU: 0-31 IRQ: 292 CPU: 0-31 IRQ: 293 CPU: 0-31 IRQ: 294 CPU: 0-31 IRQ: 295 CPU: 0-31 IRQ: 296 CPU: 0-31 IRQ: 297 CPU: 0-31 IRQ: 298 CPU: 0-31 IRQ: 299 CPU: 0-31 With the patch: IRQ: 265 CPU: 0 IRQ: 266 CPU: 1 IRQ: 267 CPU: 2 IRQ: 268 CPU: 3 IRQ: 269 CPU: 4 IRQ: 270 CPU: 5 IRQ: 271 CPU: 6 IRQ: 272 CPU: 7 IRQ: 273 CPU: 8 IRQ: 274 CPU: 9 IRQ: 275 CPU: 10 IRQ: 276 CPU: 11 IRQ: 277 CPU: 12 IRQ: 278 CPU: 13 IRQ: 279 CPU: 14 IRQ: 280 CPU: 15 IRQ: 281 CPU: 16 IRQ: 282 CPU: 17 IRQ: 283 CPU: 18 IRQ: 284 CPU: 19 IRQ: 285 CPU: 20 
IRQ: 286 CPU: 21 IRQ: 287 CPU: 22 IRQ: 288 CPU: 23 IRQ: 289 CPU: 24 IRQ: 290 CPU: 25 IRQ: 291 CPU: 26 IRQ: 292 CPU: 27 IRQ: 293 CPU: 28 IRQ: 294 CPU: 29 IRQ: 295 CPU: 30 IRQ: 299 CPU: 31 This matches what we have on an x86_64 system. v3: update changelog of PATCH 1 with comments from Thomas Gleixner and Marc Zyngier. v2: add a wrapper around original irq_create_mapping() with the affinity parameter. Update comments Laurent Vivier (2): genirq/irqdomain: Add an irq_create_mapping_affinity() function powerpc/pseries: pass MSI affinity to irq_create_mapping() arch/powerpc/platforms/pseries/msi.c | 3 ++- include/linux/irqdomain.h| 12 ++-- kernel/irq/irqdomain.c | 13 - 3 files changed, 20 insertions(+), 8 deletions(-) -- 2.28.0
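The "with the patch" listing shows the managed-affinity outcome: with as many queues as CPUs, queue i lands on CPU i. A toy of that spread — note the real kernel uses irq_create_affinity_masks() and is more subtle (housekeeping CPUs, NUMA-aware packing), so plain round-robin is only the intuition, not the actual algorithm:

```c
#include <assert.h>

/* Simplified queue-to-CPU spread: 1:1 when nqueues == ncpus (the case in
 * the cover letter), wrapping round-robin otherwise. */
static int toy_queue_cpu(int queue, int ncpus)
{
	return queue % ncpus;
}
```

With 32 queues and 32 CPUs this reproduces the one-CPU-per-IRQ affinity lists shown above, matching the x86_64 behaviour the series is aligning with.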
Re: [PATCH v2 1/2] genirq: add an irq_create_mapping_affinity() function
On 25/11/2020 15:54, Marc Zyngier wrote: > On 2020-11-25 14:09, Laurent Vivier wrote: >> On 25/11/2020 14:20, Thomas Gleixner wrote: >>> Laurent, >>> >>> On Wed, Nov 25 2020 at 12:16, Laurent Vivier wrote: >>> >>> The proper subsystem prefix is: 'genirq/irqdomain:' and the first letter >>> after the colon wants to be uppercase. >> >> Ok. >> This function adds an affinity parameter to irq_create_mapping(). This parameter is needed to pass it to irq_domain_alloc_descs(). >>> >>> A changelog has to explain the WHY. 'The parameter is needed' is not >>> really useful information. >>> >> >> The reason of this change is explained in PATCH 2. >> >> I have two patches, one to change the interface with no functional >> change (PATCH 1) and >> one to fix the problem (PATCH 2). Moreover they don't cover the same >> subsystems. >> >> I can either: >> - merge the two patches >> - or make a reference in the changelog of PATCH 1 to PATCH 2 >> (something like "(see folowing patch "powerpc/pseries: pass MSI affinity to >> irq_create_mapping()")") >> - or copy some information from PATCH 2 >> (something like "this parameter is needed by rtas_setup_msi_irqs() >> to pass the affinity >> to irq_domain_alloc_descs() to fix multiqueue affinity") >> >> What do you prefer? > > How about something like this for the first patch: > > "There is currently no way to convey the affinity of an interrupt > via irq_create_mapping(), which creates issues for devices that > expect that affinity to be managed by the kernel. > > In order to sort this out, rename irq_create_mapping() to > irq_create_mapping_affinity() with an additional affinity parameter > that can conveniently passed down to irq_domain_alloc_descs(). > > irq_create_mapping() is then re-implemented as a wrapper around > irq_create_mapping_affinity()." It looks perfect. I update the changelog with that. Thanks, Laurent
Re: [PATCH v6 03/22] powerpc/book3s64/kuap/kuep: Make KUAP and KUEP a subfeature of PPC_MEM_KEYS
Christophe Leroy writes: > Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit : > diff --git a/arch/powerpc/mm/book3s64/pkeys.c > b/arch/powerpc/mm/book3s64/pkeys.c >> index b1d091a97611..7dc71f85683d 100644 >> --- a/arch/powerpc/mm/book3s64/pkeys.c >> +++ b/arch/powerpc/mm/book3s64/pkeys.c >> @@ -89,12 +89,14 @@ static int scan_pkey_feature(void) >> } >> } >> >> +#ifdef CONFIG_PPC_MEM_KEYS >> /* >> * Adjust the upper limit, based on the number of bits supported by >> * arch-neutral code. >> */ >> pkeys_total = min_t(int, pkeys_total, >> ((ARCH_VM_PKEY_FLAGS >> VM_PKEY_SHIFT) + 1)); > > I don't think we need an #ifdef here. I thing an 'if > (IS_ENABLED(CONFIG_PPC_MEM_KEYS))' should make it. ppc64/arch/powerpc/mm/book3s64/pkeys.c: In function ‘scan_pkey_feature’: ppc64/arch/powerpc/mm/book3s64/pkeys.c:98:33: error: ‘VM_PKEY_SHIFT’ undeclared (first use in this function) 98 | ((ARCH_VM_PKEY_FLAGS >> VM_PKEY_SHIFT) + 1)); | ^ pkey headers only include arch headers if PPC_MEM_KEYS is enabled. ie, #ifdef CONFIG_ARCH_HAS_PKEYS #include #else /* ! CONFIG_ARCH_HAS_PKEYS */ #define arch_max_pkey() (1) #define execute_only_pkey(mm) (0) #define arch_override_mprotect_pkey(vma, prot, pkey) (0) #define PKEY_DEDICATED_EXECUTE_ONLY 0 #define ARCH_VM_PKEY_FLAGS 0 .. Sorting that out should be another patch series. > >> +#endif >> return pkeys_total; >> }
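The build failure above illustrates the one case where IS_ENABLED() cannot replace an #ifdef: the dead branch is still parsed and name-resolved, so every identifier it mentions (here VM_PKEY_SHIFT) must be declared in all configurations. A reduced demonstration of the working direction of the pattern — the branch is compiled but folded away when the config constant is 0:

```c
#include <assert.h>

/* IS_ENABLED-style check reduced to its essence: a compile-time 0/1. */
#define TOY_CONFIG_FEATURE 0

static int feature_limit(int total)
{
	if (TOY_CONFIG_FEATURE) {
		/* Dead code when the config is off, but still parsed:
		 * every identifier used here must exist in ALL configs,
		 * which is exactly why the pkeys code keeps its #ifdef
		 * while VM_PKEY_SHIFT may be undeclared. */
		total = total > 8 ? 8 : total;
	}
	return total;
}
```

The alternative Aneesh alludes to — always providing fallback definitions (as the generic pkeys header does with `#define ARCH_VM_PKEY_FLAGS 0` etc.) — would let IS_ENABLED() work here, but untangling those headers is a separate series.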
Re: [PATCH v2 1/2] genirq: add an irq_create_mapping_affinity() function
On 2020-11-25 14:09, Laurent Vivier wrote: On 25/11/2020 14:20, Thomas Gleixner wrote: Laurent, On Wed, Nov 25 2020 at 12:16, Laurent Vivier wrote: The proper subsystem prefix is: 'genirq/irqdomain:' and the first letter after the colon wants to be uppercase. Ok. This function adds an affinity parameter to irq_create_mapping(). This parameter is needed to pass it to irq_domain_alloc_descs(). A changelog has to explain the WHY. 'The parameter is needed' is not really useful information. The reason of this change is explained in PATCH 2. I have two patches, one to change the interface with no functional change (PATCH 1) and one to fix the problem (PATCH 2). Moreover they don't cover the same subsystems. I can either: - merge the two patches - or make a reference in the changelog of PATCH 1 to PATCH 2 (something like "(see folowing patch "powerpc/pseries: pass MSI affinity to irq_create_mapping()")") - or copy some information from PATCH 2 (something like "this parameter is needed by rtas_setup_msi_irqs() to pass the affinity to irq_domain_alloc_descs() to fix multiqueue affinity") What do you prefer? How about something like this for the first patch: "There is currently no way to convey the affinity of an interrupt via irq_create_mapping(), which creates issues for devices that expect that affinity to be managed by the kernel. In order to sort this out, rename irq_create_mapping() to irq_create_mapping_affinity() with an additional affinity parameter that can conveniently passed down to irq_domain_alloc_descs(). irq_create_mapping() is then re-implemented as a wrapper around irq_create_mapping_affinity()." Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH V3 5/5] ocxl: Add new kernel traces
On 24/11/2020 10:58, Christophe Lombard wrote: Add specific kernel traces which provide information on mmu notifier and on pages range. Signed-off-by: Christophe Lombard --- Acked-by: Frederic Barrat drivers/misc/ocxl/link.c | 4 +++ drivers/misc/ocxl/trace.h | 64 +++ 2 files changed, 68 insertions(+) diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c index 129d4eddc4d2..ab039c115381 100644 --- a/drivers/misc/ocxl/link.c +++ b/drivers/misc/ocxl/link.c @@ -499,6 +499,7 @@ static void invalidate_range(struct mmu_notifier *mn, unsigned long addr, pid, page_size = PAGE_SIZE; pid = mm->context.id; + trace_ocxl_mmu_notifier_range(start, end, pid); spin_lock(>atsd_lock); for (addr = start; addr < end; addr += page_size) @@ -590,6 +591,7 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr, /* Use MMIO registers for the TLB Invalidate * operations. */ + trace_ocxl_init_mmu_notifier(pasid, mm->context.id); mmu_notifier_register(_data->mmu_notifier, mm); } } @@ -725,6 +727,8 @@ int ocxl_link_remove_pe(void *link_handle, int pasid) } else { if (pe_data->mm) { if (link->arva) { + trace_ocxl_release_mmu_notifier(pasid, + pe_data->mm->context.id); mmu_notifier_unregister(_data->mmu_notifier, pe_data->mm); spin_lock(>atsd_lock); diff --git a/drivers/misc/ocxl/trace.h b/drivers/misc/ocxl/trace.h index 17e21cb2addd..a33a5094ff6c 100644 --- a/drivers/misc/ocxl/trace.h +++ b/drivers/misc/ocxl/trace.h @@ -8,6 +8,70 @@ #include + +TRACE_EVENT(ocxl_mmu_notifier_range, + TP_PROTO(unsigned long start, unsigned long end, unsigned long pidr), + TP_ARGS(start, end, pidr), + + TP_STRUCT__entry( + __field(unsigned long, start) + __field(unsigned long, end) + __field(unsigned long, pidr) + ), + + TP_fast_assign( + __entry->start = start; + __entry->end = end; + __entry->pidr = pidr; + ), + + TP_printk("start=0x%lx end=0x%lx pidr=0x%lx", + __entry->start, + __entry->end, + __entry->pidr + ) +); + +TRACE_EVENT(ocxl_init_mmu_notifier, + TP_PROTO(int pasid, unsigned 
long pidr), + TP_ARGS(pasid, pidr), + + TP_STRUCT__entry( + __field(int, pasid) + __field(unsigned long, pidr) + ), + + TP_fast_assign( + __entry->pasid = pasid; + __entry->pidr = pidr; + ), + + TP_printk("pasid=%d, pidr=0x%lx", + __entry->pasid, + __entry->pidr + ) +); + +TRACE_EVENT(ocxl_release_mmu_notifier, + TP_PROTO(int pasid, unsigned long pidr), + TP_ARGS(pasid, pidr), + + TP_STRUCT__entry( + __field(int, pasid) + __field(unsigned long, pidr) + ), + + TP_fast_assign( + __entry->pasid = pasid; + __entry->pidr = pidr; + ), + + TP_printk("pasid=%d, pidr=0x%lx", + __entry->pasid, + __entry->pidr + ) +); + DECLARE_EVENT_CLASS(ocxl_context, TP_PROTO(pid_t pid, void *spa, int pasid, u32 pidr, u32 tidr), TP_ARGS(pid, spa, pasid, pidr, tidr),
Re: [PATCH V3 4/5] ocxl: Add mmu notifier
On 24/11/2020 10:58, Christophe Lombard wrote: Add invalidate_range mmu notifier, when required (ATSD access of MMIO registers is available), to initiate TLB invalidation commands. For the time being, the ATSD0 set of registers is used by default. The pasid and bdf values have to be configured in the Process Element Entry. The PEE must be set up to match the BDF/PASID of the AFU. Signed-off-by: Christophe Lombard --- That looks ok too. Acked-by: Frederic Barrat drivers/misc/ocxl/link.c | 62 +++- 1 file changed, 61 insertions(+), 1 deletion(-) diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c index 77381dda2c45..129d4eddc4d2 100644 --- a/drivers/misc/ocxl/link.c +++ b/drivers/misc/ocxl/link.c @@ -2,8 +2,10 @@ // Copyright 2017 IBM Corp. #include #include +#include #include #include +#include #include #include #include @@ -33,6 +35,7 @@ #define SPA_PE_VALID 0x8000 +struct ocxl_link; struct pe_data { struct mm_struct *mm; @@ -41,6 +44,8 @@ struct pe_data { /* opaque pointer to be passed to the above callback */ void *xsl_err_data; struct rcu_head rcu; + struct ocxl_link *link; + struct mmu_notifier mmu_notifier; }; struct spa { @@ -83,6 +88,8 @@ struct ocxl_link { int domain; int bus; int dev; + void __iomem *arva; /* ATSD register virtual address */ + spinlock_t atsd_lock; /* to serialize shootdowns */ atomic_t irq_available; struct spa *spa; void *platform_data; @@ -388,6 +395,7 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, struct ocxl_link **out_l link->bus = dev->bus->number; link->dev = PCI_SLOT(dev->devfn); atomic_set(>irq_available, MAX_IRQ_PER_LINK); + spin_lock_init(>atsd_lock); rc = alloc_spa(dev, link); if (rc) @@ -403,6 +411,13 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, struct ocxl_link **out_l if (rc) goto err_xsl_irq; + /* if link->arva is not defeined, MMIO registers are not used to +* generate TLB invalidate. PowerBus snooping is enabled. +* Otherwise, PowerBus snooping is disabled. 
TLB Invalidates are +* initiated using MMIO registers. +*/ + pnv_ocxl_map_lpar(dev, mfspr(SPRN_LPID), 0, >arva); + *out_link = link; return 0; @@ -454,6 +469,11 @@ static void release_xsl(struct kref *ref) { struct ocxl_link *link = container_of(ref, struct ocxl_link, ref); + if (link->arva) { + pnv_ocxl_unmap_lpar(link->arva); + link->arva = NULL; + } + list_del(>list); /* call platform code before releasing data */ pnv_ocxl_spa_release(link->platform_data); @@ -470,6 +490,26 @@ void ocxl_link_release(struct pci_dev *dev, void *link_handle) } EXPORT_SYMBOL_GPL(ocxl_link_release); +static void invalidate_range(struct mmu_notifier *mn, +struct mm_struct *mm, +unsigned long start, unsigned long end) +{ + struct pe_data *pe_data = container_of(mn, struct pe_data, mmu_notifier); + struct ocxl_link *link = pe_data->link; + unsigned long addr, pid, page_size = PAGE_SIZE; + + pid = mm->context.id; + + spin_lock(>atsd_lock); + for (addr = start; addr < end; addr += page_size) + pnv_ocxl_tlb_invalidate(link->arva, pid, addr, page_size); + spin_unlock(>atsd_lock); +} + +static const struct mmu_notifier_ops ocxl_mmu_notifier_ops = { + .invalidate_range = invalidate_range, +}; + static u64 calculate_cfg_state(bool kernel) { u64 state; @@ -526,6 +566,8 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr, pe_data->mm = mm; pe_data->xsl_err_cb = xsl_err_cb; pe_data->xsl_err_data = xsl_err_data; + pe_data->link = link; + pe_data->mmu_notifier.ops = _mmu_notifier_ops; memset(pe, 0, sizeof(struct ocxl_process_element)); pe->config_state = cpu_to_be64(calculate_cfg_state(pidr == 0)); @@ -542,8 +584,16 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr, * by the nest MMU. If we have a kernel context, TLBIs are * already global. */ - if (mm) + if (mm) { mm_context_add_copro(mm); + if (link->arva) { + /* Use MMIO registers for the TLB Invalidate +* operations. 
+*/ + mmu_notifier_register(_data->mmu_notifier, mm); + } + } + /* * Barrier is to make sure PE is visible in the SPA before it * is used by the device. It also helps with the global TLBI @@ -674,6 +724,16 @@ int ocxl_link_remove_pe(void *link_handle, int pasid) WARN(1,
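The invalidate_range() notifier above walks the range one page at a time, issuing a shootdown per page under atsd_lock. A toy of just the loop arithmetic — it counts invalidations instead of touching MMIO, and omits the locking that the real code needs to serialize the launch/status register sequence:

```c
#include <assert.h>

/* Walk [start, end) in page_size steps, one shootdown per page, as in
 * ocxl's invalidate_range(). Returns how many invalidations would be
 * issued (each would be a pnv_ocxl_tlb_invalidate() call). */
static int toy_invalidate_range(unsigned long start, unsigned long end,
				unsigned long page_size)
{
	unsigned long addr;
	int count = 0;

	for (addr = start; addr < end; addr += page_size)
		count++;
	return count;
}
```

Using PAGE_SIZE unconditionally is the simple-but-correct choice here: for a huge-page range it issues more ATSD operations than strictly necessary, but never misses a translation.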
Re: [PATCH v6 10/22] powerpc/book3s64/pkeys: Store/restore userspace AMR/IAMR correctly on entry and exit from kernel
Re: [PATCH V3 3/5] ocxl: Update the Process Element Entry
On 24/11/2020 10:58, Christophe Lombard wrote: To complete the MMIO based mechanism, the fields: PASID, bus, device and function of the Process Element Entry have to be filled. (See OpenCAPI Power Platform Architecture document) Hypervisor Process Element Entry Word 0 1 7 8 .. 12 13 ..15 16 19 20 ... 31 0 OSL Configuration State (0:31) 1 OSL Configuration State (32:63) 2 PASID |Reserved 3 Bus | Device|Function |Reserved 4 Reserved 5 Reserved 6 Signed-off-by: Christophe Lombard --- LGTM Acked-by: Frederic Barrat drivers/misc/ocxl/context.c | 4 +++- drivers/misc/ocxl/link.c | 4 +++- drivers/misc/ocxl/ocxl_internal.h | 9 ++--- drivers/scsi/cxlflash/ocxl_hw.c | 6 -- include/misc/ocxl.h | 2 +- 5 files changed, 17 insertions(+), 8 deletions(-) diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c index c21f65a5c762..9eb0d93b01c6 100644 --- a/drivers/misc/ocxl/context.c +++ b/drivers/misc/ocxl/context.c @@ -70,6 +70,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr, struct mm_struct *mm) { int rc; unsigned long pidr = 0; + struct pci_dev *dev; // Locks both status & tidr mutex_lock(>status_mutex); @@ -81,8 +82,9 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr, struct mm_struct *mm) if (mm) pidr = mm->context.id; + dev = to_pci_dev(ctx->afu->fn->dev.parent); rc = ocxl_link_add_pe(ctx->afu->fn->link, ctx->pasid, pidr, ctx->tidr, - amr, mm, xsl_fault_error, ctx); + amr, pci_dev_id(dev), mm, xsl_fault_error, ctx); if (rc) goto out; diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c index fd73d3bc0eb6..77381dda2c45 100644 --- a/drivers/misc/ocxl/link.c +++ b/drivers/misc/ocxl/link.c @@ -494,7 +494,7 @@ static u64 calculate_cfg_state(bool kernel) } int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr, - u64 amr, struct mm_struct *mm, + u64 amr, u16 bdf, struct mm_struct *mm, void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr), void *xsl_err_data) { @@ -529,6 +529,8 @@ int ocxl_link_add_pe(void 
*link_handle, int pasid, u32 pidr, u32 tidr, memset(pe, 0, sizeof(struct ocxl_process_element)); pe->config_state = cpu_to_be64(calculate_cfg_state(pidr == 0)); + pe->pasid = cpu_to_be32(pasid << (31 - 19)); + pe->bdf = cpu_to_be16(bdf); pe->lpid = cpu_to_be32(mfspr(SPRN_LPID)); pe->pid = cpu_to_be32(pidr); pe->tid = cpu_to_be32(tidr); diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h index 0bad0a123af6..10125a22d5a5 100644 --- a/drivers/misc/ocxl/ocxl_internal.h +++ b/drivers/misc/ocxl/ocxl_internal.h @@ -84,13 +84,16 @@ struct ocxl_context { struct ocxl_process_element { __be64 config_state; - __be32 reserved1[11]; + __be32 pasid; + __be16 bdf; + __be16 reserved1; + __be32 reserved2[9]; __be32 lpid; __be32 tid; __be32 pid; - __be32 reserved2[10]; + __be32 reserved3[10]; __be64 amr; - __be32 reserved3[3]; + __be32 reserved4[3]; __be32 software_state; }; diff --git a/drivers/scsi/cxlflash/ocxl_hw.c b/drivers/scsi/cxlflash/ocxl_hw.c index e4e0d767b98e..244fc27215dc 100644 --- a/drivers/scsi/cxlflash/ocxl_hw.c +++ b/drivers/scsi/cxlflash/ocxl_hw.c @@ -329,6 +329,7 @@ static int start_context(struct ocxlflash_context *ctx) struct ocxl_hw_afu *afu = ctx->hw_afu; struct ocxl_afu_config *acfg = >acfg; void *link_token = afu->link_token; + struct pci_dev *pdev = afu->pdev; struct device *dev = afu->dev; bool master = ctx->master; struct mm_struct *mm; @@ -360,8 +361,9 @@ static int start_context(struct ocxlflash_context *ctx) mm = current->mm; } - rc = ocxl_link_add_pe(link_token, ctx->pe, pid, 0, 0, mm, - ocxlflash_xsl_fault, ctx); + rc = ocxl_link_add_pe(link_token, ctx->pe, pid, 0, 0, + pci_dev_id(pdev), mm, ocxlflash_xsl_fault, + ctx); if (unlikely(rc)) { dev_err(dev, "%s: ocxl_link_add_pe failed rc=%d\n", __func__, rc); diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h index e013736e275d..3ed736da02c8 100644 --- a/include/misc/ocxl.h +++ b/include/misc/ocxl.h @@ -447,7 +447,7 @@ void ocxl_link_release(struct pci_dev *dev, void 
*link_handle); * defined */ int ocxl_link_add_pe(void *link_handle, int pasid, u32
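The two packing operations in the patch above, `pe->bdf = cpu_to_be16(bdf)` fed by `pci_dev_id()` and `pe->pasid = cpu_to_be32(pasid << (31 - 19))`, can be sketched in plain C. The helpers below are stand-ins, not the kernel's: they model `pci_dev_id()` packing bus and devfn into 16 bits, and the left-justification of the 20-bit PASID into bits 0..19 (IBM big-endian bit numbering) of word 2 of the Process Element Entry, which is what the shift by `(31 - 19)` achieves before the byte swap.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for pci_dev_id(): bus number in the high byte, devfn in the
 * low byte, giving the 16-bit BDF stored in pe->bdf. */
static uint16_t bdf_from_bus_devfn(uint8_t bus, uint8_t devfn)
{
	return ((uint16_t)bus << 8) | devfn;
}

/* Left-justify the 20-bit PASID into IBM bits 0..19 of a 32-bit word,
 * matching pasid << (31 - 19) in ocxl_link_add_pe(). */
static uint32_t pe_pasid_word(uint32_t pasid)
{
	return pasid << (31 - 19);
}
```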
Re: [PATCH v2 1/2] genirq: add an irq_create_mapping_affinity() function
On 25/11/2020 14:20, Thomas Gleixner wrote: > Laurent, > > On Wed, Nov 25 2020 at 12:16, Laurent Vivier wrote: > > The proper subsystem prefix is: 'genirq/irqdomain:' and the first letter > after the colon wants to be uppercase. Ok. >> This function adds an affinity parameter to irq_create_mapping(). >> This parameter is needed to pass it to irq_domain_alloc_descs(). > > A changelog has to explain the WHY. 'The parameter is needed' is not > really useful information. > The reason for this change is explained in PATCH 2. I have two patches, one to change the interface with no functional change (PATCH 1) and one to fix the problem (PATCH 2). Moreover they don't cover the same subsystems. I can either: - merge the two patches - or make a reference in the changelog of PATCH 1 to PATCH 2 (something like "(see following patch "powerpc/pseries: pass MSI affinity to irq_create_mapping()")") - or copy some information from PATCH 2 (something like "this parameter is needed by rtas_setup_msi_irqs() to pass the affinity to irq_domain_alloc_descs() to fix multiqueue affinity") What do you prefer? Thanks, Laurent
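The shape of the interface split being discussed can be sketched as follows. This is a toy model, not the kernel's code: the types and bodies are stand-ins, and the point is only that the existing `irq_create_mapping()` becomes a thin wrapper passing a NULL affinity descriptor, so current callers are unaffected while a caller such as `rtas_setup_msi_irqs()` can forward a real affinity through to descriptor allocation.

```c
#include <assert.h>
#include <stddef.h>

struct irq_affinity_desc;	/* opaque stand-in */

/* New entry point: affinity is forwarded (in the kernel) to
 * irq_domain_alloc_descs(); here we just return the hwirq as a
 * stand-in for the allocated virq. */
static unsigned int irq_create_mapping_affinity(void *domain,
		unsigned long hwirq,
		const struct irq_affinity_desc *affinity)
{
	(void)domain;
	(void)affinity;
	return (unsigned int)hwirq;
}

/* Old entry point, kept as a wrapper: no functional change for
 * existing callers. */
static unsigned int irq_create_mapping(void *domain, unsigned long hwirq)
{
	return irq_create_mapping_affinity(domain, hwirq, NULL);
}
```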
Re: [PATCH V3 2/5] ocxl: Initiate a TLB invalidate command
On 24/11/2020 10:58, Christophe Lombard wrote: When a TLB Invalidate is required for the Logical Partition, the following sequence has to be performed: 1. Load MMIO ATSD AVA register with the necessary value, if required. 2. Write the MMIO ATSD launch register to initiate the TLB Invalidate command. 3. Poll the MMIO ATSD status register to determine when the TLB Invalidate has been completed. Signed-off-by: Christophe Lombard --- arch/powerpc/include/asm/pnv-ocxl.h | 51 +++ arch/powerpc/platforms/powernv/ocxl.c | 70 +++ 2 files changed, 121 insertions(+) diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h index 60c3c74427d9..9acd1fbf1197 100644 --- a/arch/powerpc/include/asm/pnv-ocxl.h +++ b/arch/powerpc/include/asm/pnv-ocxl.h @@ -3,12 +3,59 @@ #ifndef _ASM_PNV_OCXL_H #define _ASM_PNV_OCXL_H +#include #include #define PNV_OCXL_TL_MAX_TEMPLATE63 #define PNV_OCXL_TL_BITS_PER_RATE 4 #define PNV_OCXL_TL_RATE_BUF_SIZE ((PNV_OCXL_TL_MAX_TEMPLATE+1) * PNV_OCXL_TL_BITS_PER_RATE / 8) +#define PNV_OCXL_ATSD_TIMEOUT 1 + +/* TLB Management Instructions */ +#define PNV_OCXL_ATSD_LNCH 0x00 +/* Radix Invalidate */ +#define PNV_OCXL_ATSD_LNCH_R PPC_BIT(0) +/* Radix Invalidation Control + * 0b00 Just invalidate TLB. + * 0b01 Invalidate just Page Walk Cache. + * 0b10 Invalidate TLB, Page Walk Cache, and any + * caching of Partition and Process Table Entries. + */ +#define PNV_OCXL_ATSD_LNCH_RIC PPC_BITMASK(1, 2) +/* Number and Page Size of translations to be invalidated */ +#define PNV_OCXL_ATSD_LNCH_LPPPC_BITMASK(3, 10) +/* Invalidation Criteria + * 0b00 Invalidate just the target VA. + * 0b01 Invalidate matching PID. 
+ */ +#define PNV_OCXL_ATSD_LNCH_ISPPC_BITMASK(11, 12) +/* 0b1: Process Scope, 0b0: Partition Scope */ +#define PNV_OCXL_ATSD_LNCH_PRS PPC_BIT(13) +/* Invalidation Flag */ +#define PNV_OCXL_ATSD_LNCH_B PPC_BIT(14) +/* Actual Page Size to be invalidated + * 000 4KB + * 101 64KB + * 001 2MB + * 010 1GB + */ +#define PNV_OCXL_ATSD_LNCH_APPPC_BITMASK(15, 17) +/* Defines the large page select + * L=0b0 for 4KB pages + * L=0b1 for large pages) + */ +#define PNV_OCXL_ATSD_LNCH_L PPC_BIT(18) +/* Process ID */ +#define PNV_OCXL_ATSD_LNCH_PID PPC_BITMASK(19, 38) +/* NoFlush – Assumed to be 0b0 */ +#define PNV_OCXL_ATSD_LNCH_F PPC_BIT(39) +#define PNV_OCXL_ATSD_LNCH_OCAPI_SLBIPPC_BIT(40) +#define PNV_OCXL_ATSD_LNCH_OCAPI_SINGLETON PPC_BIT(41) +#define PNV_OCXL_ATSD_AVA 0x08 +#define PNV_OCXL_ATSD_AVA_AVAPPC_BITMASK(0, 51) +#define PNV_OCXL_ATSD_STAT 0x10 + int pnv_ocxl_get_actag(struct pci_dev *dev, u16 *base, u16 *enabled, u16 *supported); int pnv_ocxl_get_pasid_count(struct pci_dev *dev, int *count); @@ -31,4 +78,8 @@ int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle); int pnv_ocxl_map_lpar(struct pci_dev *dev, uint64_t lparid, uint64_t lpcr, void __iomem **arva); void pnv_ocxl_unmap_lpar(void __iomem *arva); +void pnv_ocxl_tlb_invalidate(void __iomem *arva, +unsigned long pid, +unsigned long addr, +unsigned long page_size); #endif /* _ASM_PNV_OCXL_H */ diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c index 57fc1062677b..f665846d2b28 100644 --- a/arch/powerpc/platforms/powernv/ocxl.c +++ b/arch/powerpc/platforms/powernv/ocxl.c @@ -528,3 +528,73 @@ void pnv_ocxl_unmap_lpar(void __iomem *arva) iounmap(arva); } EXPORT_SYMBOL_GPL(pnv_ocxl_unmap_lpar); + +void pnv_ocxl_tlb_invalidate(void __iomem *arva, +unsigned long pid, +unsigned long addr, +unsigned long page_size) +{ + unsigned long timeout = jiffies + (HZ * PNV_OCXL_ATSD_TIMEOUT); + u64 val = 0ull; + int pend; + u8 size; + + if (!(arva)) + return; + + if 
(addr) { + /* load Abbreviated Virtual Address register with +* the necessary value +*/ + val |= FIELD_PREP(PNV_OCXL_ATSD_AVA_AVA, addr >> (63-51)); + out_be64(arva + PNV_OCXL_ATSD_AVA, val); + } + + /* Write access initiates a shoot down to initiate the +* TLB Invalidate command +*/ + val = PNV_OCXL_ATSD_LNCH_R; + if (addr) { + val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_RIC, 0b00); + val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_IS, 0b00); + } else { + val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_RIC, 0b10); + val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_IS, 0b01); + val |= PNV_OCXL_ATSD_LNCH_OCAPI_SINGLETON;
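How the ATSD launch value is composed can be shown with simplified stand-ins for the kernel's `PPC_BIT`/`PPC_BITMASK`/`FIELD_PREP` (IBM bit 0 is the most significant bit of the 64-bit register). The sketch below follows the quoted `pnv_ocxl_tlb_invalidate()` path for a single-VA invalidate (RIC=0b00, IS=0b00, R set); including the PID field here is an assumption, since the quoted excerpt is cut off before the PID/PRS handling.

```c
#include <assert.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the MSB. */
#define PPC_BIT(b)        (1ULL << (63 - (b)))
#define PPC_BITMASK(s, e) ((PPC_BIT(s) - PPC_BIT(e)) + PPC_BIT(s))

#define ATSD_LNCH_R    PPC_BIT(0)
#define ATSD_LNCH_RIC  PPC_BITMASK(1, 2)
#define ATSD_LNCH_IS   PPC_BITMASK(11, 12)
#define ATSD_LNCH_PID  PPC_BITMASK(19, 38)

/* Minimal FIELD_PREP: shift the value to the field's position. */
static uint64_t field_prep(uint64_t mask, uint64_t val)
{
	return (val << __builtin_ctzll(mask)) & mask;
}

/* Launch value for a single-VA radix invalidate, as in the addr != 0
 * branch above; PID placement is assumed, see lead-in. */
static uint64_t atsd_launch_single_va(uint64_t pid)
{
	uint64_t val = ATSD_LNCH_R;
	val |= field_prep(ATSD_LNCH_RIC, 0);	/* just invalidate the TLB */
	val |= field_prep(ATSD_LNCH_IS, 0);	/* target VA only */
	val |= field_prep(ATSD_LNCH_PID, pid);
	return val;
}
```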
Re: [PATCH v6 16/22] powerpc/book3s64/kuap: Improve error reporting with KUAP
Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit : With hash translation use DSISR_KEYFAULT to identify a wrong access. With Radix we look at the AMR value and type of fault. Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/book3s/32/kup.h | 4 +-- arch/powerpc/include/asm/book3s/64/kup.h | 27 arch/powerpc/include/asm/kup.h | 4 +-- arch/powerpc/include/asm/nohash/32/kup-8xx.h | 4 +-- arch/powerpc/mm/fault.c | 2 +- 5 files changed, 29 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/32/kup.h b/arch/powerpc/include/asm/book3s/32/kup.h index 32fd4452e960..b18cd931e325 100644 --- a/arch/powerpc/include/asm/book3s/32/kup.h +++ b/arch/powerpc/include/asm/book3s/32/kup.h @@ -177,8 +177,8 @@ static inline void restore_user_access(unsigned long flags) allow_user_access(to, to, end - addr, KUAP_READ_WRITE); } -static inline bool -bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) +static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long address, + bool is_write, unsigned long error_code) { unsigned long begin = regs->kuap & 0xf000; unsigned long end = regs->kuap << 28; diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h index 4a3d0d601745..2922c442a218 100644 --- a/arch/powerpc/include/asm/book3s/64/kup.h +++ b/arch/powerpc/include/asm/book3s/64/kup.h @@ -301,12 +301,29 @@ static inline void set_kuap(unsigned long value) isync(); } -static inline bool -bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) +#define RADIX_KUAP_BLOCK_READ UL(0x4000) +#define RADIX_KUAP_BLOCK_WRITE UL(0x8000) + +static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long address, + bool is_write, unsigned long error_code) { - return WARN(mmu_has_feature(MMU_FTR_KUAP) && - (regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : AMR_KUAP_BLOCK_READ)), - "Bug: %s fault blocked by AMR!", is_write ? 
"Write" : "Read"); + if (!mmu_has_feature(MMU_FTR_KUAP)) + return false; + + if (radix_enabled()) { + /* +* Will be a storage protection fault. +* Only check the details of AMR[0] +*/ + return WARN((regs->kuap & (is_write ? RADIX_KUAP_BLOCK_WRITE : RADIX_KUAP_BLOCK_READ)), + "Bug: %s fault blocked by AMR!", is_write ? "Write" : "Read"); I think it is pointless to keep the WARN() here. I have a series aiming at removing them. See https://patchwork.ozlabs.org/project/linuxppc-dev/patch/cc9129bdda1dbc2f0a09cf45fece7d0b0e690784.1605541983.git.christophe.le...@csgroup.eu/ + } + /* +* We don't want to WARN here because userspace can setup +* keys such that a kernel access to user address can cause +* fault +*/ + return !!(error_code & DSISR_KEYFAULT); } static __always_inline void allow_user_access(void __user *to, const void __user *from, diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h index a06e50b68d40..952be0414f43 100644 --- a/arch/powerpc/include/asm/kup.h +++ b/arch/powerpc/include/asm/kup.h @@ -59,8 +59,8 @@ void setup_kuap(bool disabled); #else static inline void setup_kuap(bool disabled) { } -static inline bool -bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) +static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long address, + bool is_write, unsigned long error_code) { return false; } diff --git a/arch/powerpc/include/asm/nohash/32/kup-8xx.h b/arch/powerpc/include/asm/nohash/32/kup-8xx.h index 567cdc557402..7bdd9e5b63ed 100644 --- a/arch/powerpc/include/asm/nohash/32/kup-8xx.h +++ b/arch/powerpc/include/asm/nohash/32/kup-8xx.h @@ -60,8 +60,8 @@ static inline void restore_user_access(unsigned long flags) mtspr(SPRN_MD_AP, flags); } -static inline bool -bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) +static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long address, + bool is_write, unsigned long error_code) { return WARN(!((regs->kuap ^ MD_APG_KUAP) & 
0xff00), "Bug: fault blocked by AP register !"); diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c index 0add963a849b..c91621df0c61 100644 --- a/arch/powerpc/mm/fault.c +++ b/arch/powerpc/mm/fault.c @@ -227,7 +227,7 @@ static bool bad_kernel_fault(struct pt_regs *regs, unsigned long error_code, // Read/write fault in a valid region (the exception table search passed // above), but blocked by KUAP is bad, it can never
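The two detection paths in the reworked `bad_kuap_fault()` can be sketched in plain C. Note the archive truncated the constants in the patch text; the full 64-bit values below are assumed from the kernel's `AMR_KUAP_BLOCK_READ`/`AMR_KUAP_BLOCK_WRITE`, and `DSISR_KEYFAULT` is the book3s key-fault bit. On radix the saved AMR in `pt_regs` says whether the access was blocked; on hash translation the hardware reports the key fault directly in the DSISR, and the WARN is deliberately omitted because userspace-configured keys can legitimately trigger it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed full values; the patch text shows them truncated. */
#define RADIX_KUAP_BLOCK_READ  0x4000000000000000ULL
#define RADIX_KUAP_BLOCK_WRITE 0x8000000000000000ULL
#define DSISR_KEYFAULT         0x00200000ULL

/* Radix: inspect AMR[0..1] as saved in pt_regs at interrupt entry. */
static bool radix_bad_kuap_fault(uint64_t saved_amr, bool is_write)
{
	return saved_amr & (is_write ? RADIX_KUAP_BLOCK_WRITE
				     : RADIX_KUAP_BLOCK_READ);
}

/* Hash: the key-fault bit in the error code identifies a wrong access. */
static bool hash_bad_kuap_fault(uint64_t error_code)
{
	return !!(error_code & DSISR_KEYFAULT);
}
```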
Re: [PATCH] ASoC: fsl_xcvr: fix potential resource leak
On Tue, 24 Nov 2020 16:19:57 +0200, Viorel Suman (OSS) wrote: > "fw" variable must be released before return. Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next Thanks! [1/1] ASoC: fsl_xcvr: fix potential resource leak commit: 373c2cebf42772434c8dd0deffc3b3886ea8f1eb All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark
Re: [PATCH v6 11/22] powerpc/book3s64/pkeys: Inherit correctly on fork.
On 11/25/20 7:24 PM, Christophe Leroy wrote: Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit : Child thread.kuap value is inherited from the parent in copy_thread_tls. We still need to make sure when the child returns from a fork in the kernel we start with the kernel default AMR value. Reviewed-by: Sandipan Das Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/kernel/process.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index b6b8a845e454..733680de0ba4 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -1768,6 +1768,17 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, childregs->ppr = DEFAULT_PPR; p->thread.tidr = 0; +#endif + /* + * Run with the current AMR value of the kernel + */ +#ifdef CONFIG_PPC_KUAP + if (mmu_has_feature(MMU_FTR_KUAP)) + kregs->kuap = AMR_KUAP_BLOCKED; +#endif Do we need that ifdef at all ? Shouldn't mmu_has_feature(MMU_FTR_KUAP) be always false and get optimised out when CONFIG_PPC_KUAP is not defined ? +#ifdef CONFIG_PPC_KUEP + if (mmu_has_feature(MMU_FTR_KUEP)) + kregs->iamr = AMR_KUEP_BLOCKED; Same ? #endif kregs->nip = ppc_function_entry(f); return 0; Not really. I did hit a compile error with this patch on mpc885_ads_defconfig and that required me to do modified arch/powerpc/kernel/process.c @@ -1772,11 +1772,10 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, /* * Run with the current AMR value of the kernel */ -#ifdef CONFIG_PPC_KUAP +#ifdef CONFIG_PPC_PKEY if (mmu_has_feature(MMU_FTR_KUAP)) - kregs->kuap = AMR_KUAP_BLOCKED; -#endif -#ifdef CONFIG_PPC_KUEP + kregs->amr = AMR_KUAP_BLOCKED; + if (mmu_has_feature(MMU_FTR_KUEP)) kregs->iamr = AMR_KUEP_BLOCKED; #endif
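The compile error Aneesh mentions is worth making concrete: the `#ifdef` cannot simply be dropped in favour of the feature test, because the `pt_regs` member being assigned only exists when the option is compiled in, so the guarded assignment would not even build on a config like mpc885_ads_defconfig, even though `mmu_has_feature()` would fold to false. A minimal sketch, with illustrative toy names:

```c
#include <assert.h>

/* Toggle to model CONFIG_PPC_PKEY being set; left undefined here. */
/* #define TOY_CONFIG_PPC_PKEY */

struct toy_regs {
	unsigned long nip;
#ifdef TOY_CONFIG_PPC_PKEY
	unsigned long amr;	/* member exists only with the option on */
	unsigned long iamr;
#endif
};

/* Without the #ifdef around the assignment, this function would fail to
 * compile when the option is off, regardless of any runtime check. */
static void toy_copy_thread(struct toy_regs *kregs)
{
	kregs->nip = 0;
#ifdef TOY_CONFIG_PPC_PKEY
	kregs->amr = ~0UL;	/* AMR_KUAP_BLOCKED in the real code */
#endif
}
```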
Re: [PATCH v6 10/22] powerpc/book3s64/pkeys: Store/restore userspace AMR/IAMR correctly on entry and exit from kernel
On 11/25/20 7:22 PM, Christophe Leroy wrote: Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit : This prepare kernel to operate with a different value than userspace AMR/IAMR. For this, AMR/IAMR need to be saved and restored on entry and return from the kernel. With KUAP we modify kernel AMR when accessing user address from the kernel via copy_to/from_user interfaces. We don't need to modify IAMR value in similar fashion. If MMU_FTR_PKEY is enabled we need to save AMR/IAMR in pt_regs on entering kernel from userspace. If not we can assume that AMR/IAMR is not modified from userspace. We need to save AMR if we have MMU_FTR_KUAP feature enabled and we are interrupted within kernel. This is required so that if we get interrupted within copy_to/from_user we continue with the right AMR value. If we hae MMU_FTR_KUEP enabled we need to restore IAMR on return to userspace beause kernel will be running with a different IAMR value. Reviewed-by: Sandipan Das Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/book3s/64/kup.h | 222 +++ arch/powerpc/include/asm/ptrace.h | 5 +- arch/powerpc/kernel/asm-offsets.c | 2 + arch/powerpc/kernel/entry_64.S | 6 +- arch/powerpc/kernel/exceptions-64s.S | 4 +- arch/powerpc/kernel/syscall_64.c | 32 +++- 6 files changed, 225 insertions(+), 46 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h index 1d38eab83d48..4dbb2d53fd8f 100644 --- a/arch/powerpc/include/asm/book3s/64/kup.h +++ b/arch/powerpc/include/asm/book3s/64/kup.h @@ -13,17 +13,46 @@ #ifdef __ASSEMBLY__ -.macro kuap_restore_amr gpr1, gpr2 -#ifdef CONFIG_PPC_KUAP +.macro kuap_restore_user_amr gpr1 +#if defined(CONFIG_PPC_PKEY) BEGIN_MMU_FTR_SECTION_NESTED(67) - mfspr \gpr1, SPRN_AMR + /* + * AMR and IAMR are going to be different when + * returning to userspace. 
+ */ + ld \gpr1, STACK_REGS_AMR(r1) + isync + mtspr SPRN_AMR, \gpr1 + /* + * Restore IAMR only when returning to userspace + */ + ld \gpr1, STACK_REGS_IAMR(r1) + mtspr SPRN_IAMR, \gpr1 + + /* No isync required, see kuap_restore_user_amr() */ + END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_PKEY, 67) +#endif +.endm + +.macro kuap_restore_kernel_amr gpr1, gpr2 +#if defined(CONFIG_PPC_PKEY) + + BEGIN_MMU_FTR_SECTION_NESTED(67) + /* + * AMR is going to be mostly the same since we are + * returning to the kernel. Compare and do a mtspr. + */ ld \gpr2, STACK_REGS_AMR(r1) + mfspr \gpr1, SPRN_AMR cmpd \gpr1, \gpr2 - beq 998f + beq 100f isync mtspr SPRN_AMR, \gpr2 - /* No isync required, see kuap_restore_amr() */ -998: + /* + * No isync required, see kuap_restore_amr() + * No need to restore IAMR when returning to kernel space. + */ +100: END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 67) #endif .endm @@ -42,23 +71,98 @@ .endm #endif +/* + * if (pkey) { + * + * save AMR -> stack; + * if (kuap) { + * if (AMR != BLOCKED) + * KUAP_BLOCKED -> AMR; + * } + * if (from_user) { + * save IAMR -> stack; + * if (kuep) { + * KUEP_BLOCKED ->IAMR + * } + * } + * return; + * } + * + * if (kuap) { + * if (from_kernel) { + * save AMR -> stack; + * if (AMR != BLOCKED) + * KUAP_BLOCKED -> AMR; + * } + * + * } + */ .macro kuap_save_amr_and_lock gpr1, gpr2, use_cr, msr_pr_cr -#ifdef CONFIG_PPC_KUAP +#if defined(CONFIG_PPC_PKEY) + + /* + * if both pkey and kuap is disabled, nothing to do + */ + BEGIN_MMU_FTR_SECTION_NESTED(68) + b 100f // skip_save_amr + END_MMU_FTR_SECTION_NESTED_IFCLR(MMU_FTR_PKEY | MMU_FTR_KUAP, 68) + + /* + * if pkey is disabled and we are entering from userspace + * don't do anything. + */ BEGIN_MMU_FTR_SECTION_NESTED(67) .ifnb \msr_pr_cr - bne \msr_pr_cr, 99f + /* + * Without pkey we are not changing AMR outside the kernel + * hence skip this completely. 
+ */ + bne \msr_pr_cr, 100f // from userspace .endif + END_MMU_FTR_SECTION_NESTED_IFCLR(MMU_FTR_PKEY, 67) + + /* + * pkey is enabled or pkey is disabled but entering from kernel + */ mfspr \gpr1, SPRN_AMR std \gpr1, STACK_REGS_AMR(r1) - li \gpr2, (AMR_KUAP_BLOCKED >> AMR_KUAP_SHIFT) - sldi \gpr2, \gpr2, AMR_KUAP_SHIFT + + /* + * update kernel AMR with AMR_KUAP_BLOCKED only + * if KUAP feature is enabled + */ + BEGIN_MMU_FTR_SECTION_NESTED(69) + LOAD_REG_IMMEDIATE(\gpr2, AMR_KUAP_BLOCKED) cmpd \use_cr, \gpr1, \gpr2 - beq \use_cr, 99f - // We don't isync here because we very recently entered via rfid + beq \use_cr, 102f + /* + * We
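The "compare and do a mtspr" idea in `kuap_restore_kernel_amr` above can be modelled in C: the SPR write (preceded by `isync`) is expensive, so it is skipped whenever the stacked value already matches the live register. The SPR is modelled as a plain variable and the counter stands in for the avoided writes.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t spr_amr;	/* models SPRN_AMR */
static int mtspr_count;	/* counts actual register writes */

/* Only write the SPR when the value saved on the stack at interrupt
 * entry differs from the current one; on hardware the write is the
 * isync + mtspr pair the assembly guards with cmpd/beq. */
static void kuap_restore_kernel_amr_model(uint64_t stacked_amr)
{
	if (spr_amr != stacked_amr) {
		spr_amr = stacked_amr;
		mtspr_count++;
	}
}
```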
Re: [PATCH v6 11/22] powerpc/book3s64/pkeys: Inherit correctly on fork.
Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit : Child thread.kuap value is inherited from the parent in copy_thread_tls. We still need to make sure when the child returns from a fork in the kernel we start with the kernel default AMR value. Reviewed-by: Sandipan Das Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/kernel/process.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index b6b8a845e454..733680de0ba4 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -1768,6 +1768,17 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, childregs->ppr = DEFAULT_PPR; p->thread.tidr = 0; +#endif + /* +* Run with the current AMR value of the kernel +*/ +#ifdef CONFIG_PPC_KUAP + if (mmu_has_feature(MMU_FTR_KUAP)) + kregs->kuap = AMR_KUAP_BLOCKED; +#endif Do we need that ifdef at all ? Shouldn't mmu_has_feature(MMU_FTR_KUAP) be always false and get optimised out when CONFIG_PPC_KUAP is not defined ? +#ifdef CONFIG_PPC_KUEP + if (mmu_has_feature(MMU_FTR_KUEP)) + kregs->iamr = AMR_KUEP_BLOCKED; Same ? #endif kregs->nip = ppc_function_entry(f); return 0;
Re: [PATCH v6 07/22] powerpc/book3s64/kuap: Rename MMU_FTR_RADIX_KUAP to MMU_FTR_KUAP
On 11/25/20 7:13 PM, Christophe Leroy wrote: Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit : This is in preparate to adding support for kuap with hash translation. In preparation for that rename/move kuap related functions to non radix names. Also move the feature bit closer to MMU_FTR_KUEP. It was obvious with MMU_FTR_RADIX_KUAP that it was only for Radix PPC64. Now, do we expect it to be applies on PPC32 as well or is it still for PPC64 only ? Right now this is PPC64 only. I added +config PPC_PKEY + def_bool y + depends on PPC_BOOK3S_64 + depends on PPC_MEM_KEYS || PPC_KUAP || PPC_KUEP to select the base bits needed for both KUAP and MEM_KEYS. I haven't looked at PPC32 to see if we can implement it there also. Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/book3s/64/kup.h | 18 +- arch/powerpc/include/asm/mmu.h | 14 +++--- arch/powerpc/mm/book3s64/pkeys.c | 2 +- 3 files changed, 17 insertions(+), 17 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h index 39d2e3a0d64d..1d38eab83d48 100644 --- a/arch/powerpc/include/asm/book3s/64/kup.h +++ b/arch/powerpc/include/asm/book3s/64/kup.h @@ -24,7 +24,7 @@ mtspr SPRN_AMR, \gpr2 /* No isync required, see kuap_restore_amr() */ 998: - END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_RADIX_KUAP, 67) + END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 67) #endif .endm @@ -37,7 +37,7 @@ sldi \gpr2, \gpr2, AMR_KUAP_SHIFT 999: tdne \gpr1, \gpr2 EMIT_BUG_ENTRY 999b, __FILE__, __LINE__, (BUGFLAG_WARNING | BUGFLAG_ONCE) - END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_RADIX_KUAP, 67) + END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 67) #endif .endm #endif @@ -58,7 +58,7 @@ mtspr SPRN_AMR, \gpr2 isync 99: - END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_RADIX_KUAP, 67) + END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 67) #endif .endm @@ -73,7 +73,7 @@ DECLARE_STATIC_KEY_FALSE(uaccess_flush_key); static inline void kuap_restore_amr(struct pt_regs *regs, unsigned long amr) { - if 
(mmu_has_feature(MMU_FTR_RADIX_KUAP) && unlikely(regs->kuap != amr)) { + if (mmu_has_feature(MMU_FTR_KUAP) && unlikely(regs->kuap != amr)) { isync(); mtspr(SPRN_AMR, regs->kuap); /* @@ -86,7 +86,7 @@ static inline void kuap_restore_amr(struct pt_regs *regs, unsigned long amr) static inline unsigned long kuap_get_and_check_amr(void) { - if (mmu_has_feature(MMU_FTR_RADIX_KUAP)) { + if (mmu_has_feature(MMU_FTR_KUAP)) { unsigned long amr = mfspr(SPRN_AMR); if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG)) /* kuap_check_amr() */ WARN_ON_ONCE(amr != AMR_KUAP_BLOCKED); @@ -97,7 +97,7 @@ static inline unsigned long kuap_get_and_check_amr(void) static inline void kuap_check_amr(void) { - if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && mmu_has_feature(MMU_FTR_RADIX_KUAP)) + if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && mmu_has_feature(MMU_FTR_KUAP)) WARN_ON_ONCE(mfspr(SPRN_AMR) != AMR_KUAP_BLOCKED); } @@ -116,7 +116,7 @@ static inline unsigned long get_kuap(void) * This has no effect in terms of actually blocking things on hash, * so it doesn't break anything. */ - if (!early_mmu_has_feature(MMU_FTR_RADIX_KUAP)) + if (!early_mmu_has_feature(MMU_FTR_KUAP)) return AMR_KUAP_BLOCKED; return mfspr(SPRN_AMR); @@ -124,7 +124,7 @@ static inline unsigned long get_kuap(void) static inline void set_kuap(unsigned long value) { - if (!early_mmu_has_feature(MMU_FTR_RADIX_KUAP)) + if (!early_mmu_has_feature(MMU_FTR_KUAP)) return; /* @@ -139,7 +139,7 @@ static inline void set_kuap(unsigned long value) static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) { - return WARN(mmu_has_feature(MMU_FTR_RADIX_KUAP) && + return WARN(mmu_has_feature(MMU_FTR_KUAP) && (regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : AMR_KUAP_BLOCK_READ)), "Bug: %s fault blocked by AMR!", is_write ? 
"Write" : "Read"); } diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h index 255a1837e9f7..f5c7a17c198a 100644 --- a/arch/powerpc/include/asm/mmu.h +++ b/arch/powerpc/include/asm/mmu.h @@ -28,6 +28,11 @@ * Individual features below. */ +/* + * Supports KUAP (key 0 controlling userspace addresses) on radix + */ +#define MMU_FTR_KUAP ASM_CONST(0x0200) + /* * Support for KUEP feature. */ @@ -120,11 +125,6 @@ */ #define MMU_FTR_1T_SEGMENT ASM_CONST(0x4000) -/* - * Supports KUAP (key 0 controlling userspace addresses) on radix - */ -#define MMU_FTR_RADIX_KUAP ASM_CONST(0x8000) - /* MMU feature bit sets for various CPUs */ #define MMU_FTRS_DEFAULT_HPTE_ARCH_V2 \ MMU_FTR_HPTE_TABLE |
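The rename is mechanical, but the underlying mechanism is easy to model: MMU features are bits in a mask, and `mmu_has_feature()` is conceptually a bit test (the real kernel patches these tests into fixed branches at boot via feature sections). Bit positions below are illustrative, not the kernel's `ASM_CONST` values.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative feature bits; real values live in asm/mmu.h. */
#define TOY_MMU_FTR_KUAP (1UL << 9)
#define TOY_MMU_FTR_KUEP (1UL << 10)
#define TOY_MMU_FTR_PKEY (1UL << 11)

static unsigned long cur_mmu_features;

/* Conceptual mmu_has_feature(): a bit test against the active mask. */
static bool toy_mmu_has_feature(unsigned long ftr)
{
	return (cur_mmu_features & ftr) != 0;
}
```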
Re: [PATCH v6 10/22] powerpc/book3s64/pkeys: Store/restore userspace AMR/IAMR correctly on entry and exit from kernel
Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit : This prepare kernel to operate with a different value than userspace AMR/IAMR. For this, AMR/IAMR need to be saved and restored on entry and return from the kernel. With KUAP we modify kernel AMR when accessing user address from the kernel via copy_to/from_user interfaces. We don't need to modify IAMR value in similar fashion. If MMU_FTR_PKEY is enabled we need to save AMR/IAMR in pt_regs on entering kernel from userspace. If not we can assume that AMR/IAMR is not modified from userspace. We need to save AMR if we have MMU_FTR_KUAP feature enabled and we are interrupted within kernel. This is required so that if we get interrupted within copy_to/from_user we continue with the right AMR value. If we hae MMU_FTR_KUEP enabled we need to restore IAMR on return to userspace beause kernel will be running with a different IAMR value. Reviewed-by: Sandipan Das Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/book3s/64/kup.h | 222 +++ arch/powerpc/include/asm/ptrace.h| 5 +- arch/powerpc/kernel/asm-offsets.c| 2 + arch/powerpc/kernel/entry_64.S | 6 +- arch/powerpc/kernel/exceptions-64s.S | 4 +- arch/powerpc/kernel/syscall_64.c | 32 +++- 6 files changed, 225 insertions(+), 46 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h index 1d38eab83d48..4dbb2d53fd8f 100644 --- a/arch/powerpc/include/asm/book3s/64/kup.h +++ b/arch/powerpc/include/asm/book3s/64/kup.h @@ -13,17 +13,46 @@ #ifdef __ASSEMBLY__ -.macro kuap_restore_amr gpr1, gpr2 -#ifdef CONFIG_PPC_KUAP +.macro kuap_restore_user_amr gpr1 +#if defined(CONFIG_PPC_PKEY) BEGIN_MMU_FTR_SECTION_NESTED(67) - mfspr \gpr1, SPRN_AMR + /* +* AMR and IAMR are going to be different when +* returning to userspace. 
+*/ + ld \gpr1, STACK_REGS_AMR(r1) + isync + mtspr SPRN_AMR, \gpr1 + /* +* Restore IAMR only when returning to userspace +*/ + ld \gpr1, STACK_REGS_IAMR(r1) + mtspr SPRN_IAMR, \gpr1 + + /* No isync required, see kuap_restore_user_amr() */ + END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_PKEY, 67) +#endif +.endm + +.macro kuap_restore_kernel_amr gpr1, gpr2 +#if defined(CONFIG_PPC_PKEY) + + BEGIN_MMU_FTR_SECTION_NESTED(67) + /* +* AMR is going to be mostly the same since we are +* returning to the kernel. Compare and do a mtspr. +*/ ld \gpr2, STACK_REGS_AMR(r1) + mfspr \gpr1, SPRN_AMR cmpd\gpr1, \gpr2 - beq 998f + beq 100f isync mtspr SPRN_AMR, \gpr2 - /* No isync required, see kuap_restore_amr() */ -998: + /* +* No isync required, see kuap_restore_amr() +* No need to restore IAMR when returning to kernel space. +*/ +100: END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 67) #endif .endm @@ -42,23 +71,98 @@ .endm #endif +/* + * if (pkey) { + * + * save AMR -> stack; + * if (kuap) { + * if (AMR != BLOCKED) + * KUAP_BLOCKED -> AMR; + * } + * if (from_user) { + * save IAMR -> stack; + * if (kuep) { + * KUEP_BLOCKED ->IAMR + * } + * } + * return; + * } + * + * if (kuap) { + * if (from_kernel) { + * save AMR -> stack; + * if (AMR != BLOCKED) + * KUAP_BLOCKED -> AMR; + * } + * + * } + */ .macro kuap_save_amr_and_lock gpr1, gpr2, use_cr, msr_pr_cr -#ifdef CONFIG_PPC_KUAP +#if defined(CONFIG_PPC_PKEY) + + /* +* if both pkey and kuap is disabled, nothing to do +*/ + BEGIN_MMU_FTR_SECTION_NESTED(68) + b 100f // skip_save_amr + END_MMU_FTR_SECTION_NESTED_IFCLR(MMU_FTR_PKEY | MMU_FTR_KUAP, 68) + + /* +* if pkey is disabled and we are entering from userspace +* don't do anything. +*/ BEGIN_MMU_FTR_SECTION_NESTED(67) .ifnb \msr_pr_cr - bne \msr_pr_cr, 99f + /* +* Without pkey we are not changing AMR outside the kernel +* hence skip this completely. 
+*/ + bne \msr_pr_cr, 100f // from userspace .endif +END_MMU_FTR_SECTION_NESTED_IFCLR(MMU_FTR_PKEY, 67) + + /* +* pkey is enabled or pkey is disabled but entering from kernel +*/ mfspr \gpr1, SPRN_AMR std \gpr1, STACK_REGS_AMR(r1) - li \gpr2, (AMR_KUAP_BLOCKED >> AMR_KUAP_SHIFT) - sldi\gpr2, \gpr2, AMR_KUAP_SHIFT + + /* +* update kernel AMR with AMR_KUAP_BLOCKED only +* if KUAP feature is
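The pseudocode comment above `kuap_save_amr_and_lock` can be rendered directly in C, which makes the entry-time decision tree easier to check: with pkey, AMR is always saved (and blocked if KUAP is on), and IAMR is saved/blocked only when entering from userspace; without pkey, only KUAP-on-kernel-entry saves and blocks AMR. The flag encoding is illustrative.

```c
#include <assert.h>
#include <stdbool.h>

enum { SAVED_AMR = 1, BLOCKED_AMR = 2, SAVED_IAMR = 4, BLOCKED_IAMR = 8 };

/* C rendering of the pseudocode comment in the patch; returns which
 * save/block actions the entry path performs. */
static int entry_actions(bool pkey, bool kuap, bool kuep,
			 bool from_user, bool amr_already_blocked)
{
	int acts = 0;

	if (pkey) {
		acts |= SAVED_AMR;
		if (kuap && !amr_already_blocked)
			acts |= BLOCKED_AMR;
		if (from_user) {
			acts |= SAVED_IAMR;
			if (kuep)
				acts |= BLOCKED_IAMR;
		}
		return acts;
	}

	if (kuap && !from_user) {
		acts |= SAVED_AMR;
		if (!amr_already_blocked)
			acts |= BLOCKED_AMR;
	}
	return acts;
}
```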
Re: [PATCH v6 09/22] powerpc/exec: Set thread.regs early during exec
Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit : In later patches during exec, we would like to access default regs.amr to control access to the user mapping. Having thread.regs set early makes the code changes simpler. Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/thread_info.h | 2 -- arch/powerpc/kernel/process.c | 37 +- 2 files changed, 25 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h index 46a210b03d2b..de4c911d9ced 100644 --- a/arch/powerpc/include/asm/thread_info.h +++ b/arch/powerpc/include/asm/thread_info.h @@ -77,10 +77,8 @@ struct thread_info { /* how to get the thread information struct from C */ extern int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src); -#ifdef CONFIG_PPC_BOOK3S_64 void arch_setup_new_exec(void); #define arch_setup_new_exec arch_setup_new_exec -#endif #endif /* __ASSEMBLY__ */ diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index d421a2c7f822..b6b8a845e454 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -1530,10 +1530,32 @@ void flush_thread(void) #ifdef CONFIG_PPC_BOOK3S_64 void arch_setup_new_exec(void) { - if (radix_enabled()) - return; - hash__setup_new_exec(); + if (!radix_enabled()) + hash__setup_new_exec(); + + /* +* If we exec out of a kernel thread then thread.regs will not be +* set. Do it now. +*/ + if (!current->thread.regs) { + struct pt_regs *regs = task_stack_page(current) + THREAD_SIZE; + current->thread.regs = regs - 1; + } + +} +#else +void arch_setup_new_exec(void) +{ + /* +* If we exec out of a kernel thread then thread.regs will not be +* set. Do it now. +*/ + if (!current->thread.regs) { + struct pt_regs *regs = task_stack_page(current) + THREAD_SIZE; + current->thread.regs = regs - 1; + } } + #endif No need to duplicate arch_setup_new_exec() I think. radix_enabled() is defined at all time so the first function should be valid at all time. 
#ifdef CONFIG_PPC64 @@ -1765,15 +1787,6 @@ void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp) preload_new_slb_context(start, sp); #endif - /* -* If we exec out of a kernel thread then thread.regs will not be -* set. Do it now. -*/ - if (!current->thread.regs) { - struct pt_regs *regs = task_stack_page(current) + THREAD_SIZE; - current->thread.regs = regs - 1; - } - #ifdef CONFIG_PPC_TRANSACTIONAL_MEM /* * Clear any transactional state, we're exec()ing. The cause is
Re: [PATCH V3 1/5] ocxl: Assign a register set to a Logical Partition
On 24/11/2020 10:58, Christophe Lombard wrote: Platform specific function to assign a register set to a Logical Partition. The "ibm,mmio-atsd" property, provided by the firmware, contains the 16 base ATSD physical addresses (ATSD0 through ATSD15) of the set of MMIO registers (XTS MMIO ATSDx LPARID/AVA/launch/status register). For the time being, the ATSD0 set of registers is used by default. Signed-off-by: Christophe Lombard --- Looks good, thanks for the updates! Acked-by: Frederic Barrat arch/powerpc/include/asm/pnv-ocxl.h | 3 ++ arch/powerpc/platforms/powernv/ocxl.c | 45 +++ 2 files changed, 48 insertions(+) diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h index d37ededca3ee..60c3c74427d9 100644 --- a/arch/powerpc/include/asm/pnv-ocxl.h +++ b/arch/powerpc/include/asm/pnv-ocxl.h @@ -28,4 +28,7 @@ int pnv_ocxl_spa_setup(struct pci_dev *dev, void *spa_mem, int PE_mask, void **p void pnv_ocxl_spa_release(void *platform_data); int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle); +int pnv_ocxl_map_lpar(struct pci_dev *dev, uint64_t lparid, + uint64_t lpcr, void __iomem **arva); +void pnv_ocxl_unmap_lpar(void __iomem *arva); #endif /* _ASM_PNV_OCXL_H */ diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c index ecdad219d704..57fc1062677b 100644 --- a/arch/powerpc/platforms/powernv/ocxl.c +++ b/arch/powerpc/platforms/powernv/ocxl.c @@ -483,3 +483,48 @@ int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle) return rc; } EXPORT_SYMBOL_GPL(pnv_ocxl_spa_remove_pe_from_cache); + +int pnv_ocxl_map_lpar(struct pci_dev *dev, uint64_t lparid, + uint64_t lpcr, void __iomem **arva) +{ + struct pci_controller *hose = pci_bus_to_host(dev->bus); + struct pnv_phb *phb = hose->private_data; + u64 mmio_atsd; + int rc; + + /* ATSD physical address. +* ATSD LAUNCH register: write access initiates a shoot down to +* initiate the TLB Invalidate command. 
+*/ + rc = of_property_read_u64_index(hose->dn, "ibm,mmio-atsd", + 0, &mmio_atsd); + if (rc) { + dev_info(&dev->dev, "No available ATSD found\n"); + return rc; + } + + /* Assign a register set to a Logical Partition and MMIO ATSD +* LPARID register to the required value. +*/ + rc = opal_npu_map_lpar(phb->opal_id, pci_dev_id(dev), + lparid, lpcr); + if (rc) { + dev_err(&dev->dev, "Error mapping device to LPAR: %d\n", rc); + return rc; + } + + *arva = ioremap(mmio_atsd, 24); + if (!(*arva)) { + dev_warn(&dev->dev, "ioremap failed - mmio_atsd: %#llx\n", mmio_atsd); + rc = -ENOMEM; + } + + return rc; +} +EXPORT_SYMBOL_GPL(pnv_ocxl_map_lpar); + +void pnv_ocxl_unmap_lpar(void __iomem *arva) +{ + iounmap(arva); +} +EXPORT_SYMBOL_GPL(pnv_ocxl_unmap_lpar);
Re: [PATCH v6 07/22] powerpc/book3s64/kuap: Rename MMU_FTR_RADIX_KUAP to MMU_FTR_KUAP
Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit : This is in preparate to adding support for kuap with hash translation. In preparation for that rename/move kuap related functions to non radix names. Also move the feature bit closer to MMU_FTR_KUEP. It was obvious with MMU_FTR_RADIX_KUAP that it was only for Radix PPC64. Now, do we expect it to be applies on PPC32 as well or is it still for PPC64 only ? Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/book3s/64/kup.h | 18 +- arch/powerpc/include/asm/mmu.h | 14 +++--- arch/powerpc/mm/book3s64/pkeys.c | 2 +- 3 files changed, 17 insertions(+), 17 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h index 39d2e3a0d64d..1d38eab83d48 100644 --- a/arch/powerpc/include/asm/book3s/64/kup.h +++ b/arch/powerpc/include/asm/book3s/64/kup.h @@ -24,7 +24,7 @@ mtspr SPRN_AMR, \gpr2 /* No isync required, see kuap_restore_amr() */ 998: - END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_RADIX_KUAP, 67) + END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 67) #endif .endm @@ -37,7 +37,7 @@ sldi\gpr2, \gpr2, AMR_KUAP_SHIFT 999: tdne\gpr1, \gpr2 EMIT_BUG_ENTRY 999b, __FILE__, __LINE__, (BUGFLAG_WARNING | BUGFLAG_ONCE) - END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_RADIX_KUAP, 67) + END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 67) #endif .endm #endif @@ -58,7 +58,7 @@ mtspr SPRN_AMR, \gpr2 isync 99: - END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_RADIX_KUAP, 67) + END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 67) #endif .endm @@ -73,7 +73,7 @@ DECLARE_STATIC_KEY_FALSE(uaccess_flush_key); static inline void kuap_restore_amr(struct pt_regs *regs, unsigned long amr) { - if (mmu_has_feature(MMU_FTR_RADIX_KUAP) && unlikely(regs->kuap != amr)) { + if (mmu_has_feature(MMU_FTR_KUAP) && unlikely(regs->kuap != amr)) { isync(); mtspr(SPRN_AMR, regs->kuap); /* @@ -86,7 +86,7 @@ static inline void kuap_restore_amr(struct pt_regs *regs, unsigned long amr) static inline unsigned long 
kuap_get_and_check_amr(void) { - if (mmu_has_feature(MMU_FTR_RADIX_KUAP)) { + if (mmu_has_feature(MMU_FTR_KUAP)) { unsigned long amr = mfspr(SPRN_AMR); if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG)) /* kuap_check_amr() */ WARN_ON_ONCE(amr != AMR_KUAP_BLOCKED); @@ -97,7 +97,7 @@ static inline unsigned long kuap_get_and_check_amr(void) static inline void kuap_check_amr(void) { - if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && mmu_has_feature(MMU_FTR_RADIX_KUAP)) + if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && mmu_has_feature(MMU_FTR_KUAP)) WARN_ON_ONCE(mfspr(SPRN_AMR) != AMR_KUAP_BLOCKED); } @@ -116,7 +116,7 @@ static inline unsigned long get_kuap(void) * This has no effect in terms of actually blocking things on hash, * so it doesn't break anything. */ - if (!early_mmu_has_feature(MMU_FTR_RADIX_KUAP)) + if (!early_mmu_has_feature(MMU_FTR_KUAP)) return AMR_KUAP_BLOCKED; return mfspr(SPRN_AMR); @@ -124,7 +124,7 @@ static inline unsigned long get_kuap(void) static inline void set_kuap(unsigned long value) { - if (!early_mmu_has_feature(MMU_FTR_RADIX_KUAP)) + if (!early_mmu_has_feature(MMU_FTR_KUAP)) return; /* @@ -139,7 +139,7 @@ static inline void set_kuap(unsigned long value) static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) { - return WARN(mmu_has_feature(MMU_FTR_RADIX_KUAP) && + return WARN(mmu_has_feature(MMU_FTR_KUAP) && (regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : AMR_KUAP_BLOCK_READ)), "Bug: %s fault blocked by AMR!", is_write ? "Write" : "Read"); } diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h index 255a1837e9f7..f5c7a17c198a 100644 --- a/arch/powerpc/include/asm/mmu.h +++ b/arch/powerpc/include/asm/mmu.h @@ -28,6 +28,11 @@ * Individual features below. */ +/* + * Supports KUAP (key 0 controlling userspace addresses) on radix + */ +#define MMU_FTR_KUAP ASM_CONST(0x0200) + /* * Support for KUEP feature. 
*/ @@ -120,11 +125,6 @@ */ #define MMU_FTR_1T_SEGMENTASM_CONST(0x4000) -/* - * Supports KUAP (key 0 controlling userspace addresses) on radix - */ -#define MMU_FTR_RADIX_KUAP ASM_CONST(0x8000) - /* MMU feature bit sets for various CPUs */ #define MMU_FTRS_DEFAULT_HPTE_ARCH_V2 \ MMU_FTR_HPTE_TABLE | MMU_FTR_PPCAS_ARCH_V2 @@ -187,10 +187,10 @@ enum { #ifdef CONFIG_PPC_RADIX_MMU MMU_FTR_TYPE_RADIX | MMU_FTR_GTSE | +#endif /* CONFIG_PPC_RADIX_MMU
Re: [PATCH v6 04/22] powerpc/book3s64/kuap/kuep: Move uamor setup to pkey init
Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit : This patch consolidates UAMOR update across pkey, kuap and kuep features. The boot cpu initialize UAMOR via pkey init and both radix/hash do the secondary cpu UAMOR init in early_init_mmu_secondary. We don't check for mmu_feature in radix secondary init because UAMOR is a supported SPRN with all CPUs supporting radix translation. The old code was not updating UAMOR if we had smap disabled and smep enabled. This change handles that case. Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/mm/book3s64/radix_pgtable.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c index 3adcf730f478..bfe441af916a 100644 --- a/arch/powerpc/mm/book3s64/radix_pgtable.c +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c @@ -620,9 +620,6 @@ void setup_kuap(bool disabled) cur_cpu_spec->mmu_features |= MMU_FTR_RADIX_KUAP; } - /* Make sure userspace can't change the AMR */ - mtspr(SPRN_UAMOR, 0); - /* * Set the default kernel AMR values on all cpus. */ @@ -721,6 +718,11 @@ void radix__early_init_mmu_secondary(void) radix__switch_mmu_context(NULL, &init_mm); tlbiel_all(); + +#ifdef CONFIG_PPC_PKEY It should be possible to use an 'if' with IS_ENABLED(CONFIG_PPC_PKEY) instead of this #ifdef + /* Make sure userspace can't change the AMR */ + mtspr(SPRN_UAMOR, 0); +#endif } void radix__mmu_cleanup_all(void)
Re: [PATCH v6 03/22] powerpc/book3s64/kuap/kuep: Make KUAP and KUEP a subfeature of PPC_MEM_KEYS
Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit : The next set of patches adds support for kuap with hash translation. Hence make KUAP a BOOK3S_64 feature. Also make it a subfeature of PPC_MEM_KEYS. Hash translation is going to use pkeys to support KUAP/KUEP. Adding this dependency reduces the code complexity and enables us to move some of the initialization code to pkeys.c Signed-off-by: Aneesh Kumar K.V --- .../powerpc/include/asm/book3s/64/kup-radix.h | 4 ++-- arch/powerpc/include/asm/book3s/64/mmu.h | 2 +- arch/powerpc/include/asm/ptrace.h | 7 +- arch/powerpc/kernel/asm-offsets.c | 3 +++ arch/powerpc/mm/book3s64/Makefile | 2 +- arch/powerpc/mm/book3s64/pkeys.c | 24 --- arch/powerpc/platforms/Kconfig.cputype| 5 7 files changed, 33 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h index 28716e2f13e3..68eaa2fac3ab 100644 --- a/arch/powerpc/include/asm/book3s/64/kup-radix.h +++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h @@ -16,7 +16,7 @@ #ifdef CONFIG_PPC_KUAP BEGIN_MMU_FTR_SECTION_NESTED(67) mfspr \gpr1, SPRN_AMR - ld \gpr2, STACK_REGS_KUAP(r1) + ld \gpr2, STACK_REGS_AMR(r1) cmpd\gpr1, \gpr2 beq 998f isync @@ -48,7 +48,7 @@ bne \msr_pr_cr, 99f .endif mfspr \gpr1, SPRN_AMR - std \gpr1, STACK_REGS_KUAP(r1) + std \gpr1, STACK_REGS_AMR(r1) li \gpr2, (AMR_KUAP_BLOCKED >> AMR_KUAP_SHIFT) sldi\gpr2, \gpr2, AMR_KUAP_SHIFT cmpd\use_cr, \gpr1, \gpr2 diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h index e0b52940e43c..a2a015066bae 100644 --- a/arch/powerpc/include/asm/book3s/64/mmu.h +++ b/arch/powerpc/include/asm/book3s/64/mmu.h @@ -199,7 +199,7 @@ extern int mmu_io_psize; void mmu_early_init_devtree(void); void hash__early_init_devtree(void); void radix__early_init_devtree(void); -#ifdef CONFIG_PPC_MEM_KEYS +#ifdef CONFIG_PPC_PKEY void pkey_early_init_devtree(void); #else static inline void pkey_early_init_devtree(void) {} diff 
--git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h index e2c778c176a3..e7f1caa007a4 100644 --- a/arch/powerpc/include/asm/ptrace.h +++ b/arch/powerpc/include/asm/ptrace.h @@ -53,9 +53,14 @@ struct pt_regs #ifdef CONFIG_PPC64 unsigned long ppr; #endif + union { #ifdef CONFIG_PPC_KUAP - unsigned long kuap; + unsigned long kuap; #endif +#ifdef CONFIG_PPC_PKEY + unsigned long amr; +#endif + }; }; unsigned long __pad[2]; /* Maintain 16 byte interrupt stack alignment */ }; diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index c2722ff36e98..418a0b314a33 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -354,6 +354,9 @@ int main(void) STACK_PT_REGS_OFFSET(_PPR, ppr); #endif /* CONFIG_PPC64 */ +#ifdef CONFIG_PPC_PKEY + STACK_PT_REGS_OFFSET(STACK_REGS_AMR, amr); +#endif #ifdef CONFIG_PPC_KUAP STACK_PT_REGS_OFFSET(STACK_REGS_KUAP, kuap); #endif diff --git a/arch/powerpc/mm/book3s64/Makefile b/arch/powerpc/mm/book3s64/Makefile index fd393b8be14f..1b56d3af47d4 100644 --- a/arch/powerpc/mm/book3s64/Makefile +++ b/arch/powerpc/mm/book3s64/Makefile @@ -17,7 +17,7 @@ endif obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += hash_hugepage.o obj-$(CONFIG_PPC_SUBPAGE_PROT)+= subpage_prot.o obj-$(CONFIG_SPAPR_TCE_IOMMU) += iommu_api.o -obj-$(CONFIG_PPC_MEM_KEYS) += pkeys.o +obj-$(CONFIG_PPC_PKEY) += pkeys.o # Instrumenting the SLB fault path can lead to duplicate SLB entries KCOV_INSTRUMENT_slb.o := n diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c index b1d091a97611..7dc71f85683d 100644 --- a/arch/powerpc/mm/book3s64/pkeys.c +++ b/arch/powerpc/mm/book3s64/pkeys.c @@ -89,12 +89,14 @@ static int scan_pkey_feature(void) } } +#ifdef CONFIG_PPC_MEM_KEYS /* * Adjust the upper limit, based on the number of bits supported by * arch-neutral code. */ pkeys_total = min_t(int, pkeys_total, ((ARCH_VM_PKEY_FLAGS >> VM_PKEY_SHIFT) + 1)); I don't think we need an #ifdef here. 
I think an 'if (IS_ENABLED(CONFIG_PPC_MEM_KEYS))' should make it. +#endif return pkeys_total; } @@ -102,6 +104,7 @@ void __init pkey_early_init_devtree(void) { int pkeys_total, i; +#ifdef CONFIG_PPC_MEM_KEYS /* * We define PKEY_DISABLE_EXECUTE in addition to the arch-neutral * generic defines for
Re: [PATCH v2 1/2] genirq: add an irq_create_mapping_affinity() function
Laurent, On Wed, Nov 25 2020 at 12:16, Laurent Vivier wrote: The proper subsystem prefix is: 'genirq/irqdomain:' and the first letter after the colon wants to be uppercase. > This function adds an affinity parameter to irq_create_mapping(). > This parameter is needed to pass it to irq_domain_alloc_descs(). A changelog has to explain the WHY. 'The parameter is needed' is not really useful information. Thanks, tglx
Re: [PATCH v2 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()
On Wed, 25 Nov 2020 12:16:57 +0100 Laurent Vivier wrote: > With virtio multiqueue, normally each queue IRQ is mapped to a CPU. > > But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity") > this is broken on pseries. > > The affinity is correctly computed in msi_desc but this is not applied > to the system IRQs. > > It appears the affinity is correctly passed to rtas_setup_msi_irqs() but > lost at this point and never passed to irq_domain_alloc_descs() > (see commit 06ee6d571f0e ("genirq: Add affinity hint to irq allocation")) > because irq_create_mapping() doesn't take an affinity parameter. > > As the previous patch has added the affinity parameter to > irq_create_mapping() we can forward the affinity from rtas_setup_msi_irqs() > to irq_domain_alloc_descs(). > > With this change, the virtqueues are correctly dispatched between the CPUs > on pseries. > Since it is public, maybe add: BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1702939 ? > Signed-off-by: Laurent Vivier > --- Anyway, Reviewed-by: Greg Kurz > arch/powerpc/platforms/pseries/msi.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/platforms/pseries/msi.c > b/arch/powerpc/platforms/pseries/msi.c > index 133f6adcb39c..b3ac2455faad 100644 > --- a/arch/powerpc/platforms/pseries/msi.c > +++ b/arch/powerpc/platforms/pseries/msi.c > @@ -458,7 +458,8 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int > nvec_in, int type) > return hwirq; > } > > - virq = irq_create_mapping(NULL, hwirq); > + virq = irq_create_mapping_affinity(NULL, hwirq, > +entry->affinity); > > if (!virq) { > pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
Re: C vdso
Christophe Leroy writes: > Quoting Michael Ellerman : > >> Christophe Leroy writes: >>> Le 03/11/2020 à 19:13, Christophe Leroy a écrit : Le 23/10/2020 à 15:24, Michael Ellerman a écrit : > Christophe Leroy writes: >> Le 24/09/2020 à 15:17, Christophe Leroy a écrit : >>> Le 17/09/2020 à 14:33, Michael Ellerman a écrit : Christophe Leroy writes: > > What is the status with the generic C vdso merge ? > In some mail, you mentionned having difficulties getting it working on > ppc64, any progress ? What's the problem ? Can I help ? Yeah sorry I was hoping to get time to work on it but haven't been able to. It's causing crashes on ppc64 ie. big endian. > ... >>> >>> Can you tell what defconfig you are using ? I have been able to >>> setup a full glibc PPC64 cross >>> compilation chain and been able to test it under QEMU with >>> success, using Nathan's vdsotest tool. >> >> What config are you using ? > > ppc64_defconfig + guest.config > > Or pseries_defconfig. > > I'm using Ubuntu GCC 9.3.0 mostly, but it happens with other > toolchains too. > > At a minimum we're seeing relocations in the output, which is a problem: > > $ readelf -r build\~/arch/powerpc/kernel/vdso64/vdso64.so > Relocation section '.rela.dyn' at offset 0x12a8 contains 8 entries: > Offset Info Type Sym. Value > Sym. Name + Addend > 1368 0016 R_PPC64_RELATIVE 7c0 > 1370 0016 R_PPC64_RELATIVE 9300 > 1380 0016 R_PPC64_RELATIVE 970 > 1388 0016 R_PPC64_RELATIVE 9300 > 1398 0016 R_PPC64_RELATIVE a90 > 13a0 0016 R_PPC64_RELATIVE 9300 > 13b0 0016 R_PPC64_RELATIVE b20 > 13b8 0016 R_PPC64_RELATIVE 9300 Looks like it's due to the OPD and relation between the function() and .function() By using DOTSYM() in the 'bl' call, that's directly the dot function which is called and the OPD is not used anymore, it can get dropped. Now I get .rela.dyn full of 0, don't know if we should drop it explicitely. >>> >>> What is the status now with latest version of CVDSO ? 
I saw you had >>> it in next-test for some time, >>> it is not there anymore today. >> >> Still having some trouble with the compat VDSO. >> >> eg: >> >> $ ./vdsotest clock-gettime-monotonic verify >> timestamp obtained from kernel predates timestamp >> previously obtained from libc/vDSO: >> [1346, 821441653] (vDSO) >> [570, 769440040] (kernel) >> >> >> And similar for all clocks except the coarse ones. >> > > Ok, I managed to get the same with QEMU. Looking at the binary, I only > see an mftb instead of the mftbu/mftb/mftbu triplet. > > Fix below. Can you carry it, or do you prefer a full patch from me ? > The easiest would be either to squash it into [v13,4/8] > ("powerpc/time: Move timebase functions into new asm/timebase.h"), or > to add it between patch 4 and 5 ? I can squash it in. cheers
Re: [PATCH] powerpc: Use the common INIT_DATA_SECTION macro in vmlinux.lds.S
On Wed, 4 Nov 2020 18:59:10 +0800, Youling Tang wrote: > Use the common INIT_DATA_SECTION rule for the linker script in an effort > to regularize the linker script. Applied to powerpc/next. [1/1] powerpc: Use the common INIT_DATA_SECTION macro in vmlinux.lds.S https://git.kernel.org/powerpc/c/fdcfeaba38e5b183045f5b079af94f97658eabe6 cheers
Re: [PATCH] Revert "powerpc/pseries/hotplug-cpu: Remove double free in error path"
On Tue, 10 Nov 2020 21:07:52 -0500, Zhang Xiaoxu wrote: > This reverts commit a0ff72f9f5a780341e7ff5e9ba50a0dad5fa1980. > > Since the commit b015f6bc9547 ("powerpc/pseries: Add cpu DLPAR > support for drc-info property"), the 'cpu_drcs' wouldn't be double > freed when the 'cpus' node is not found. > > So we needn't apply this patch; otherwise, the memory will leak. Applied to powerpc/next. [1/1] Revert "powerpc/pseries/hotplug-cpu: Remove double free in error path" https://git.kernel.org/powerpc/c/a40fdaf1420d6e6bda0dd2df1e6806013e58dbe1 cheers
Re: [PATCH] powerpc/powernv/sriov: fix unsigned int win compared to less than zero
On Tue, 10 Nov 2020 19:19:30 +0800, xiakaixu1...@gmail.com wrote: > Fix coccicheck warning: > > ./arch/powerpc/platforms/powernv/pci-sriov.c:443:7-10: WARNING: Unsigned > expression compared with zero: win < 0 > ./arch/powerpc/platforms/powernv/pci-sriov.c:462:7-10: WARNING: Unsigned > expression compared with zero: win < 0 Applied to powerpc/next. [1/1] powerpc/powernv/sriov: fix unsigned int win compared to less than zero https://git.kernel.org/powerpc/c/027717a45ca251a7ba67a63db359994836962cd2 cheers
Re: [PATCH] powerpc/mm: Fix comparing pointer to 0 warning
On Tue, 10 Nov 2020 10:56:01 +0800, xiakaixu1...@gmail.com wrote: > Fixes coccicheck warning: > > ./arch/powerpc/mm/pgtable_32.c:87:11-12: WARNING comparing pointer to 0 > > Avoid pointer type value compared to 0. Applied to powerpc/next. [1/1] powerpc/mm: Fix comparing pointer to 0 warning https://git.kernel.org/powerpc/c/b84bf098fcc49ed6bf4b0a8bed52e9df0e8f1de7 cheers
Re: [PATCHv2] selftests/powerpc/eeh: disable kselftest timeout setting for eeh-basic
On Fri, 23 Oct 2020 10:45:39 +0800, Po-Hsu Lin wrote: > The eeh-basic test got its own 60 seconds timeout (defined in commit > 414f50434aa2 "selftests/eeh: Bump EEH wait time to 60s") per breakable > device. > > And we have discovered that the number of breakable devices varies > on different hardware. The device recovery time ranges from 0 to 35 > seconds. In our test pool it will take about 30 seconds to run on a > Power8 system that with 5 breakable devices, 60 seconds to run on a > Power9 system that with 4 breakable devices. > > [...] Applied to powerpc/next. [1/1] selftests/powerpc/eeh: disable kselftest timeout setting for eeh-basic https://git.kernel.org/powerpc/c/f5eca0b279117f25020112a2f65ec9c3ea25f3ac cheers
Re: [PATCH] powerpc/ps3: Drop unused DBG macro
On Fri, 23 Oct 2020 14:13:05 +1100, Michael Ellerman wrote: > This DBG macro is unused, and has been unused since the file was > originally merged into mainline. Just drop it. Applied to powerpc/next. [1/1] powerpc/ps3: Drop unused DBG macro https://git.kernel.org/powerpc/c/cb5d4c465f31bc44b8bbd4934678c2b140a2ad29 cheers
Re: [PATCH] powerpc/85xx: Fix declaration made after definition
On Fri, 23 Oct 2020 13:08:38 +1100, Michael Ellerman wrote: > Currently the clang build of corenet64_smp_defconfig fails with: > > arch/powerpc/platforms/85xx/corenet_generic.c:210:1: error: > attribute declaration must precede definition > machine_arch_initcall(corenet_generic, corenet_gen_publish_devices); > > Fix it by moving the initcall definition prior to the machine > definition, and directly below the function it calls, which is the > usual style anyway. Applied to powerpc/next. [1/1] powerpc/85xx: Fix declaration made after definition https://git.kernel.org/powerpc/c/ef78f2dd2398ce8ed9eeaab9c9f8af2e15f5d870 cheers
Re: [PATCH] powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe()
On Wed, 28 Oct 2020 17:15:51 +0800, Qinglang Miao wrote: > I noticed that iounmap() of msgr_block_addr before return from > mpic_msgr_probe() in the error handling case is missing. So use > devm_ioremap() instead of just ioremap() when remapping the message > register block, so the mapping will be automatically released on > probe failure. Applied to powerpc/next. [1/1] powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe() https://git.kernel.org/powerpc/c/ffa1797040c5da391859a9556be7b735acbe1242 cheers
Re: [PATCH] powerpc/64s/perf: perf interrupt does not have to get_user_pages to access user memory
On Wed, 11 Nov 2020 22:01:51 +1000, Nicholas Piggin wrote: > read_user_stack_slow that walks user address translation by hand is > only required on hash, because a hash fault can not be serviced from > "NMI" context (to avoid re-entering the hash code) so the user stack > can be mapped into Linux page tables but not accessible by the CPU. > > Radix MMU mode does not have this restriction. A page fault failure > would indicate the page is not accessible via get_user_pages either, > so avoid this on radix. Applied to powerpc/next. [1/1] powerpc/64s/perf: perf interrupt does not have to get_user_pages to access user memory https://git.kernel.org/powerpc/c/987c426320cce72d1b28f55c8603b239e4f7187c cheers
Re: [PATCH v2 0/8] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations
On Wed, 11 Nov 2020 15:53:14 +0100, David Hildenbrand wrote: > Based on latest linux/master > > powernv/memtrace is the only in-kernel user that rips out random memory > it never added (doesn't own) in order to allocate memory without a > linear mapping. Let's stop abusing memory hot(un)plug infrastructure for > that - use alloc_contig_pages() for allocating memory and remove the > linear mapping manually. > > [...] Applied to powerpc/next. [1/8] powerpc/powernv/memtrace: Don't leak kernel memory to user space https://git.kernel.org/powerpc/c/c74cf7a3d59a21b290fe0468f5b470d0b8ee37df [2/8] powerpc/powernv/memtrace: Fix crashing the kernel when enabling concurrently https://git.kernel.org/powerpc/c/d6718941a2767fb383e105d257d2105fe4f15f0e [3/8] powerpc/mm: factor out creating/removing linear mapping https://git.kernel.org/powerpc/c/4abb1e5b63ac3281275315fc6b0cde0b9c2e2e42 [4/8] powerpc/mm: protect linear mapping modifications by a mutex https://git.kernel.org/powerpc/c/e5b2af044f31bf18defa557a8cd11c23caefa34c [5/8] powerpc/mm: print warning in arch_remove_linear_mapping() https://git.kernel.org/powerpc/c/1f73ad3e8d755dbec52fcec98618a7ce4de12af2 [6/8] powerpc/book3s64/hash: Drop WARN_ON in hash__remove_section_mapping() https://git.kernel.org/powerpc/c/d8bd9a121c2f2bc8b36da930dc91b69fd2a705e2 [7/8] powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory() https://git.kernel.org/powerpc/c/ca2c36cae9d48b180ea51259e35ab3d95d327df2 [8/8] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations https://git.kernel.org/powerpc/c/0bd4b96d99108b7ea9bac0573957483be7781d70 cheers
Re: [PATCH v4 1/2] powerpc/64: Set up a kernel stack for secondaries before cpu_restore()
On Wed, 14 Oct 2020 18:28:36 +1100, Jordan Niethe wrote: > Currently in generic_secondary_smp_init(), cur_cpu_spec->cpu_restore() > is called before a stack has been set up in r1. This was previously fine > as the cpu_restore() functions were implemented in assembly and did not > use a stack. However commit 5a61ef74f269 ("powerpc/64s: Support new > device tree binding for discovering CPU features") used > __restore_cpu_cpufeatures() as the cpu_restore() function for a > device-tree features based cputable entry. This is a C function and > hence uses a stack in r1. > > [...] Applied to powerpc/next. [1/2] powerpc/64: Set up a kernel stack for secondaries before cpu_restore() https://git.kernel.org/powerpc/c/3c0b976bf20d236c57adcefa80f86a0a1d737727 [2/2] powerpc/64s: Convert some cpu_setup() and cpu_restore() functions to C https://git.kernel.org/powerpc/c/344fbab991a568dc33ad90711b489d870e18d26d cheers
Re: [PATCH v1 0/4] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations
On Thu, 29 Oct 2020 17:27:14 +0100, David Hildenbrand wrote: > powernv/memtrace is the only in-kernel user that rips out random memory > it never added (doesn't own) in order to allocate memory without a > linear mapping. Let's stop abusing memory hot(un)plug infrastructure for > that - use alloc_contig_pages() for allocating memory and remove the > linear mapping manually. > > The original idea was discussed in: > https://lkml.kernel.org/r/48340e96-7e6b-736f-9e23-d3111b915...@redhat.com > > [...] Applied to powerpc/next. [1/4] powerpc/mm: factor out creating/removing linear mapping https://git.kernel.org/powerpc/c/4abb1e5b63ac3281275315fc6b0cde0b9c2e2e42 [2/4] powerpc/mm: print warning in arch_remove_linear_mapping() https://git.kernel.org/powerpc/c/1f73ad3e8d755dbec52fcec98618a7ce4de12af2 [3/4] powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory() https://git.kernel.org/powerpc/c/ca2c36cae9d48b180ea51259e35ab3d95d327df2 [4/4] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations https://git.kernel.org/powerpc/c/0bd4b96d99108b7ea9bac0573957483be7781d70 cheers
Re: [PATCH v2 1/3] powerpc/64s: Replace RFI by RFI_TO_KERNEL and remove RFI
On Sun, 8 Nov 2020 16:57:35 + (UTC), Christophe Leroy wrote: > In head_64.S, we have two places using RFI to return to > kernel. Use RFI_TO_KERNEL instead. > > They are the two only places using RFI on book3s/64, so > the RFI macro can go away. Applied to powerpc/next. [1/3] powerpc/64s: Replace RFI by RFI_TO_KERNEL and remove RFI https://git.kernel.org/powerpc/c/879add7720172ffd2986c44587510fabb7af52f5 [2/3] powerpc: Replace RFI by rfi on book3s/32 and booke https://git.kernel.org/powerpc/c/120c0518ec321f33cdc4670059fb76e96ceb56eb [3/3] powerpc: Remove RFI macro https://git.kernel.org/powerpc/c/62182e6c0faf75117f8d1719c118bb5fc8574012 cheers
Re: [PATCH v13 0/8] powerpc: switch VDSO to C implementation
On Tue, 3 Nov 2020 18:07:11 + (UTC), Christophe Leroy wrote: > This is a series to switch powerpc VDSO to generic C implementation. > > Changes in v13: > - Reorganised headers to avoid the need for a fake 32 bits config for > building VDSO32 on PPC64 > - Rebased after the removal of powerpc 601 > - Using DOTSYM() macro to call functions directly without using OPD > - Explicitely dropped .opd and .got1 sections which are now unused > > [...] Patch 1 applied to powerpc/next. [1/8] powerpc/feature: Fix CPU_FTRS_ALWAYS by removing CPU_FTRS_GENERIC_32 https://git.kernel.org/powerpc/c/78665179e569c7e1fe102fb6c21d0f5b6951f084 cheers
Re: [PATCH] powerpc/bitops: Fix possible undefined behaviour with fls() and fls64()
On Thu, 22 Oct 2020 14:05:46 + (UTC), Christophe Leroy wrote: > fls() and fls64() are using __builtin_clz() and __builtin_clzll(). > On powerpc, those builtins trivially use the cntlzw and cntlzd power > instructions. > > Although those instructions provide the expected result with > input argument 0, __builtin_clz() and __builtin_clzll() are > documented as undefined for value 0. > > [...] Applied to powerpc/next. [1/1] powerpc/bitops: Fix possible undefined behaviour with fls() and fls64() https://git.kernel.org/powerpc/c/1891ef21d92c4801ea082ee8ed478e304ddc6749 cheers
Re: [PATCH] powerpc: avoid broken GCC __attribute__((optimize))
On Wed, 28 Oct 2020 09:04:33 +0100, Ard Biesheuvel wrote:
> Commit 7053f80d9696 ("powerpc/64: Prevent stack protection in early boot")
> introduced a couple of uses of __attribute__((optimize)) with function
> scope, to disable the stack protector in some early boot code.
>
> Unfortunately, and this is documented in the GCC man pages [0], overriding
> function attributes for optimization is broken, and is only supported for
> debug scenarios, not for production: the problem appears to be that
> setting GCC -f flags using this method will cause it to forget about some
> or all other optimization settings that have been applied.
>
> [...]

Applied to powerpc/next.

[1/1] powerpc: Avoid broken GCC __attribute__((optimize))
      https://git.kernel.org/powerpc/c/a7223f5bfcaeade4a86d35263493bcda6c940891

cheers
Re: [PATCH v2] powerpc/mm: Update tlbiel loop on POWER10
On Wed, 7 Oct 2020 11:03:05 +0530, Aneesh Kumar K.V wrote:
> With POWER10, a single tlbiel instruction invalidates all the congruence
> classes of the TLB, so we need to issue only one tlbiel with SET=0.

Applied to powerpc/next.

[1/1] powerpc/mm: Update tlbiel loop on POWER10
      https://git.kernel.org/powerpc/c/e80639405c40127727812a0e1f8a65ba9979f146

cheers
Re: [PATCH] powerpc/mm: move setting pte specific flags to pfn_pmd
On Thu, 22 Oct 2020 14:41:15 +0530, Aneesh Kumar K.V wrote:
> powerpc used to set the PTE-specific flags in set_pte_at(), which is
> different from other architectures. To be consistent with other
> architectures, powerpc updated pfn_pte() to set _PAGE_PTE in
> commit 379c926d6334 ("powerpc/mm: move setting pte specific flags to pfn_pte").
>
> That commit didn't do the same w.r.t. pfn_pmd() because we expect
> pmd_mkhuge() to do that. But as per Linus that is a bad rule [1].
> Hence update pfn_pmd() to set _PAGE_PTE.
>
> [...]

Applied to powerpc/next.

[1/1] powerpc/mm: Move setting PTE specific flags to pfn_pmd()
      https://git.kernel.org/powerpc/c/53f45ecc9cd04b4b963f3040f2a54c3baf03b229

cheers
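The rule being applied here — the pfn-to-entry conversion helper sets the distinguishing software bit itself, instead of relying on a later mkhuge-style helper — can be sketched as follows. The names and bit positions are hypothetical stand-ins, not the kernel's actual page-table layout.

```c
#include <stdint.h>

#define SK_PAGE_PTE	(1ULL << 62)	/* hypothetical stand-in for _PAGE_PTE */
#define SK_PFN_SHIFT	12

typedef struct { uint64_t val; } sk_pmd_t;

/* The conversion helper marks the entry itself, so no caller can
 * forget to do it via a separate pmd_mkhuge()-style helper. */
static sk_pmd_t sk_pfn_pmd(uint64_t pfn, uint64_t prot)
{
	return (sk_pmd_t){ (pfn << SK_PFN_SHIFT) | prot | SK_PAGE_PTE };
}
```

The design point is that an invariant ("every pmd built from a pfn carries the marker bit") is enforced at the single construction site rather than at every call site.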
[PATCH v2 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()
With virtio multiqueue, normally each queue IRQ is mapped to a CPU. But since
commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity") this is broken
on pseries: the affinity is correctly computed in msi_desc but this is not
applied to the system IRQs.

It appears the affinity is correctly passed to rtas_setup_msi_irqs() but lost
at this point and never passed to irq_domain_alloc_descs() (see commit
06ee6d571f0e ("genirq: Add affinity hint to irq allocation")) because
irq_create_mapping() doesn't take an affinity parameter.

As the previous patch has added the affinity parameter to
irq_create_mapping(), we can forward the affinity from rtas_setup_msi_irqs()
to irq_domain_alloc_descs().

With this change, the virtqueues are correctly dispatched between the CPUs on
pseries.

Signed-off-by: Laurent Vivier
---
 arch/powerpc/platforms/pseries/msi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index 133f6adcb39c..b3ac2455faad 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -458,7 +458,8 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
 			return hwirq;
 		}

-		virq = irq_create_mapping(NULL, hwirq);
+		virq = irq_create_mapping_affinity(NULL, hwirq,
+						   entry->affinity);

 		if (!virq) {
 			pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
-- 
2.28.0
[PATCH v2 1/2] genirq: add an irq_create_mapping_affinity() function
This function adds an affinity parameter to irq_create_mapping(). This
parameter is needed to pass it to irq_domain_alloc_descs().

irq_create_mapping() is a wrapper around irq_create_mapping_affinity() to
pass NULL for the affinity parameter.

No functional change.

Signed-off-by: Laurent Vivier
---
 include/linux/irqdomain.h | 12 ++--
 kernel/irq/irqdomain.c    | 13 -
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index 71535e87109f..ea5a337e0f8b 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -384,11 +384,19 @@ extern void irq_domain_associate_many(struct irq_domain *domain,
 extern void irq_domain_disassociate(struct irq_domain *domain,
 				    unsigned int irq);

-extern unsigned int irq_create_mapping(struct irq_domain *host,
-				       irq_hw_number_t hwirq);
+extern unsigned int irq_create_mapping_affinity(struct irq_domain *host,
+				      irq_hw_number_t hwirq,
+				      const struct irq_affinity_desc *affinity);
 extern unsigned int irq_create_fwspec_mapping(struct irq_fwspec *fwspec);
 extern void irq_dispose_mapping(unsigned int virq);

+static inline unsigned int irq_create_mapping(struct irq_domain *host,
+					      irq_hw_number_t hwirq)
+{
+	return irq_create_mapping_affinity(host, hwirq, NULL);
+}
+
+
 /**
  * irq_linear_revmap() - Find a linux irq from a hw irq number.
  * @domain: domain owning this hardware interrupt

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index cf8b374b892d..e4ca69608f3b 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -624,17 +624,19 @@ unsigned int irq_create_direct_mapping(struct irq_domain *domain)
 EXPORT_SYMBOL_GPL(irq_create_direct_mapping);

 /**
- * irq_create_mapping() - Map a hardware interrupt into linux irq space
+ * irq_create_mapping_affinity() - Map a hardware interrupt into linux irq space
  * @domain: domain owning this hardware interrupt or NULL for default domain
  * @hwirq: hardware irq number in that domain space
+ * @affinity: irq affinity
  *
  * Only one mapping per hardware interrupt is permitted. Returns a linux
  * irq number.
  * If the sense/trigger is to be specified, set_irq_type() should be called
  * on the number returned from that call.
  */
-unsigned int irq_create_mapping(struct irq_domain *domain,
-				irq_hw_number_t hwirq)
+unsigned int irq_create_mapping_affinity(struct irq_domain *domain,
+					 irq_hw_number_t hwirq,
+					 const struct irq_affinity_desc *affinity)
 {
 	struct device_node *of_node;
 	int virq;
@@ -660,7 +662,8 @@ unsigned int irq_create_mapping(struct irq_domain *domain,
 	}

 	/* Allocate a virtual interrupt number */
-	virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node), NULL);
+	virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node),
+				      affinity);
 	if (virq <= 0) {
 		pr_debug("-> virq allocation failed\n");
 		return 0;
@@ -676,7 +679,7 @@ unsigned int irq_create_mapping(struct irq_domain *domain,
 	return virq;
 }
-EXPORT_SYMBOL_GPL(irq_create_mapping);
+EXPORT_SYMBOL_GPL(irq_create_mapping_affinity);

 /**
  * irq_create_strict_mappings() - Map a range of hw irqs to fixed linux irqs
-- 
2.28.0
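The API-extension pattern this patch uses — rename the implementation to take the extra parameter, and keep the old name as a thin inline wrapper passing a default — can be sketched in isolation. The names and return-value encoding below are hypothetical, chosen only to make the sketch testable.

```c
#include <stddef.h>

struct affinity_desc { int cpu; };	/* hypothetical stand-in */

/* New implementation: takes the extra affinity argument.  A real
 * implementation would forward it to the descriptor allocator; here
 * we just encode whether an affinity was supplied. */
static unsigned int create_mapping_affinity(unsigned int hwirq,
					    const struct affinity_desc *affinity)
{
	return affinity ? hwirq + 1000u : hwirq;
}

/* The old name survives as a wrapper passing NULL, so every existing
 * caller keeps compiling and behaving exactly as before. */
static inline unsigned int create_mapping(unsigned int hwirq)
{
	return create_mapping_affinity(hwirq, NULL);
}
```

The benefit, as in the patch, is that the "no functional change" step can be reviewed and merged separately from the one caller that starts passing a real affinity.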
[PATCH v2 0/2] powerpc/pseries: fix MSI/X IRQ affinity on pseries
With virtio, in the multiqueue case, each queue IRQ is normally bound to a
different CPU using the affinity mask. This works fine on x86_64 but is
totally ignored on pseries. This is not obvious at first look because
irqbalance is doing some balancing to improve that.

It appears that the "managed" flag set in the MSI entry is never copied to
the system IRQ entry.

This series passes the affinity mask from rtas_setup_msi_irqs() to
irq_domain_alloc_descs() by adding an affinity parameter to
irq_create_mapping(). The first patch adds the parameter (no functional
change), the second patch passes the actual affinity mask to
irq_create_mapping() in rtas_setup_msi_irqs().

For instance, with a 32 CPUs VM and a 32 queues virtio-scsi interface:

  ... -smp 32 -device virtio-scsi-pci,id=virtio_scsi_pci0,num_queues=32

  for IRQ in $(grep virtio2-request /proc/interrupts | cut -d: -f1); do
    for file in /proc/irq/$IRQ/ ; do
      echo -n "IRQ: $(basename $file) CPU: " ; cat $file/smp_affinity_list
    done
  done

Without the patch (and without irqbalanced):

  IRQ: 268 CPU: 0-31
  IRQ: 269 CPU: 0-31
  IRQ: 270 CPU: 0-31
  IRQ: 271 CPU: 0-31
  IRQ: 272 CPU: 0-31
  IRQ: 273 CPU: 0-31
  IRQ: 274 CPU: 0-31
  IRQ: 275 CPU: 0-31
  IRQ: 276 CPU: 0-31
  IRQ: 277 CPU: 0-31
  IRQ: 278 CPU: 0-31
  IRQ: 279 CPU: 0-31
  IRQ: 280 CPU: 0-31
  IRQ: 281 CPU: 0-31
  IRQ: 282 CPU: 0-31
  IRQ: 283 CPU: 0-31
  IRQ: 284 CPU: 0-31
  IRQ: 285 CPU: 0-31
  IRQ: 286 CPU: 0-31
  IRQ: 287 CPU: 0-31
  IRQ: 288 CPU: 0-31
  IRQ: 289 CPU: 0-31
  IRQ: 290 CPU: 0-31
  IRQ: 291 CPU: 0-31
  IRQ: 292 CPU: 0-31
  IRQ: 293 CPU: 0-31
  IRQ: 294 CPU: 0-31
  IRQ: 295 CPU: 0-31
  IRQ: 296 CPU: 0-31
  IRQ: 297 CPU: 0-31
  IRQ: 298 CPU: 0-31
  IRQ: 299 CPU: 0-31

With the patch:

  IRQ: 265 CPU: 0
  IRQ: 266 CPU: 1
  IRQ: 267 CPU: 2
  IRQ: 268 CPU: 3
  IRQ: 269 CPU: 4
  IRQ: 270 CPU: 5
  IRQ: 271 CPU: 6
  IRQ: 272 CPU: 7
  IRQ: 273 CPU: 8
  IRQ: 274 CPU: 9
  IRQ: 275 CPU: 10
  IRQ: 276 CPU: 11
  IRQ: 277 CPU: 12
  IRQ: 278 CPU: 13
  IRQ: 279 CPU: 14
  IRQ: 280 CPU: 15
  IRQ: 281 CPU: 16
  IRQ: 282 CPU: 17
  IRQ: 283 CPU: 18
  IRQ: 284 CPU: 19
  IRQ: 285 CPU: 20
  IRQ: 286 CPU: 21
  IRQ: 287 CPU: 22
  IRQ: 288 CPU: 23
  IRQ: 289 CPU: 24
  IRQ: 290 CPU: 25
  IRQ: 291 CPU: 26
  IRQ: 292 CPU: 27
  IRQ: 293 CPU: 28
  IRQ: 294 CPU: 29
  IRQ: 295 CPU: 30
  IRQ: 299 CPU: 31

This matches what we have on an x86_64 system.

v2: add a wrapper around the original irq_create_mapping() with the affinity
    parameter. Update comments.

Laurent Vivier (2):
  genirq: add an irq_create_mapping_affinity() function
  powerpc/pseries: pass MSI affinity to irq_create_mapping()

 arch/powerpc/platforms/pseries/msi.c |  3 ++-
 include/linux/irqdomain.h            | 12 ++--
 kernel/irq/irqdomain.c               | 13 -
 3 files changed, 20 insertions(+), 8 deletions(-)

-- 
2.28.0
Re: [PATCH 1/2] powerpc: sstep: Fix load and update instructions
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 855457ed09b5..25a5436be6c6 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -2157,11 +2157,15 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,

 		case 23:	/* lwzx */
 		case 55:	/* lwzux */
+			if (u && (ra == 0 || ra == rd))
+				return -1;

I guess you also need to split case 23 and 55?

- Ravi
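The condition being added can be checked in isolation. This is a user-space sketch with a hypothetical helper name: the Power ISA makes update-form loads such as lwzux invalid when RA is 0 or RA equals RT, which is why the emulation code refuses (returns -1) rather than emulating them.

```c
#include <stdbool.h>

/* Sketch of the invalid-form test for update-form loads (e.g. lwzux):
 * RA = 0 or RA = RT is an invalid instruction form per the Power ISA,
 * so an instruction emulator should decline to emulate it. */
static bool bad_update_form(bool update, int ra, int rt)
{
	return update && (ra == 0 || ra == rt);
}
```

Ravi's point about splitting case 23 and 55 follows directly: the check only applies when `update` is true, i.e. to the lwzux (update) case, not to lwzx.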
Re: C vdso
Quoting Michael Ellerman:

Christophe Leroy writes:

On 03/11/2020 at 19:13, Christophe Leroy wrote:

On 23/10/2020 at 15:24, Michael Ellerman wrote:

Christophe Leroy writes:

On 24/09/2020 at 15:17, Christophe Leroy wrote:

On 17/09/2020 at 14:33, Michael Ellerman wrote:

Christophe Leroy writes:

What is the status with the generic C vdso merge? In some mail, you
mentioned having difficulties getting it working on ppc64, any progress?
What's the problem? Can I help?

Yeah sorry, I was hoping to get time to work on it but haven't been able
to. It's causing crashes on ppc64, i.e. big endian.

...

Can you tell what defconfig you are using? I have been able to set up a
full glibc PPC64 cross-compilation chain and been able to test it under
QEMU with success, using Nathan's vdsotest tool. What config are you using?

ppc64_defconfig + guest.config, or pseries_defconfig. I'm using Ubuntu
GCC 9.3.0 mostly, but it happens with other toolchains too. At a minimum
we're seeing relocations in the output, which is a problem:

  $ readelf -r build~/arch/powerpc/kernel/vdso64/vdso64.so

  Relocation section '.rela.dyn' at offset 0x12a8 contains 8 entries:
    Offset  Info  Type              Sym. Value  Sym. Name + Addend
    1368    0016  R_PPC64_RELATIVE  7c0
    1370    0016  R_PPC64_RELATIVE  9300
    1380    0016  R_PPC64_RELATIVE  970
    1388    0016  R_PPC64_RELATIVE  9300
    1398    0016  R_PPC64_RELATIVE  a90
    13a0    0016  R_PPC64_RELATIVE  9300
    13b0    0016  R_PPC64_RELATIVE  b20
    13b8    0016  R_PPC64_RELATIVE  9300

Looks like it's due to the OPD and the relation between function() and
.function(). By using DOTSYM() in the 'bl' call, it's directly the dot
function which is called and the OPD is not used anymore, so it can get
dropped. Now I get .rela.dyn full of 0, don't know if we should drop it
explicitly.

What is the status now with the latest version of the C VDSO? I saw you
had it in next-test for some time, it is not there anymore today.

Still having some trouble with the compat VDSO, eg:

  $ ./vdsotest clock-gettime-monotonic verify
  timestamp obtained from kernel predates timestamp
  previously obtained from libc/vDSO:
  [1346, 821441653] (vDSO)
  [570, 769440040] (kernel)

And similar for all clocks except the coarse ones.

Ok, I managed to get the same with QEMU. Looking at the binary, I only see
an mftb instead of the mftbu/mftb/mftbu triplet. Fix below. Can you carry
it, or do you prefer a full patch from me? The easiest would be either to
squash it into [v13,4/8] ("powerpc/time: Move timebase functions into new
asm/timebase.h"), or to add it between patch 4 and 5.

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index f877a576b338..c3473eb031a3 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1419,7 +1419,7 @@ static inline void msr_check_and_clear(unsigned long bits)
 		__msr_check_and_clear(bits);
 }

-#if defined(CONFIG_PPC_CELL) || defined(CONFIG_E500)
+#if defined(__powerpc64__) && (defined(CONFIG_PPC_CELL) || defined(CONFIG_E500))
 #define mftb()		({unsigned long rval;			\
 			asm volatile(				\
 				"90:	mfspr %0, %2;\n"	\

diff --git a/arch/powerpc/include/asm/timebase.h b/arch/powerpc/include/asm/timebase.h
index a8eae3adaa91..7b372976f5a5 100644
--- a/arch/powerpc/include/asm/timebase.h
+++ b/arch/powerpc/include/asm/timebase.h
@@ -21,7 +21,7 @@ static inline u64 get_tb(void)
 {
 	unsigned int tbhi, tblo, tbhi2;

-	if (IS_ENABLED(CONFIG_PPC64))
+	if (IS_BUILTIN(__powerpc64__))
 		return mftb();

 	do {
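The mftbu/mftb/mftbu triplet discussed above exists because a 64-bit timebase read on a 32-bit CPU takes two register reads, and the low half can carry into the high half between them. A user-space sketch of the retry loop (with fake register reads standing in for the mftbu/mftb instructions, and a fixed fake timebase value so the sketch is deterministic):

```c
#include <stdint.h>

static uint64_t fake_tb = 0x00000001ffffffffULL;	/* stand-in for the hardware timebase */

static uint32_t read_tbu(void) { return (uint32_t)(fake_tb >> 32); }	/* mftbu stand-in */
static uint32_t read_tbl(void) { return (uint32_t)fake_tb; }		/* mftb stand-in  */

/* Read high, then low, then high again; if the high half changed,
 * the low half wrapped between the two reads, so retry. */
static uint64_t get_tb_sketch(void)
{
	uint32_t tbhi, tblo, tbhi2;

	do {
		tbhi = read_tbu();
		tblo = read_tbl();
		tbhi2 = read_tbu();
	} while (tbhi2 != tbhi);

	return ((uint64_t)tbhi << 32) | tblo;
}
```

With only a single low-half read (the lone mftb Christophe spotted), a wrap between the halves can yield a value that jumps backwards, which is exactly the "timestamp predates" symptom vdsotest reports.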
Re: [PATCH v4 10/18] dt-bindings: usb: Convert DWC USB3 bindings to DT schema
On Sat, Nov 21, 2020 at 06:42:28AM -0600, Rob Herring wrote:
> On Thu, Nov 12, 2020 at 01:29:46PM +0300, Serge Semin wrote:
> > On Wed, Nov 11, 2020 at 02:14:23PM -0600, Rob Herring wrote:
> > > On Wed, Nov 11, 2020 at 12:08:45PM +0300, Serge Semin wrote:
> > > > DWC USB3 DT node is supposed to be compliant with the Generic xHCI
> > > > Controller schema, but with additional vendor-specific properties, the
> > > > controller-specific reference clocks and PHYs. So let's convert the
> > > > currently available legacy text-based DWC USB3 bindings to the DT schema
> > > > and make sure the DWC USB3 nodes are also validated against the
> > > > usb-xhci.yaml schema.
> > > >
> > > > Note we have to discard the nodename restriction of being prefixed with
> > > > "dwc3@" string, since in accordance with the usb-hcd.yaml schema USB nodes
> > > > are supposed to be named as "^usb(@.*)".
> > > >
> > > > Signed-off-by: Serge Semin
> > > >
> > > > ---
> > > >
> > > > Changelog v2:
> > > > - Discard '|' from the descriptions, since we don't need to preserve
> > > >   the text formatting in any of them.
> > > > - Drop quotes from around the string constants.
> > > > - Fix the "clock-names" prop description to be referring the enumerated
> > > >   clock-names instead of the ones from the Databook.
> > > >
> > > > Changelog v3:
> > > > - Apply usb-xhci.yaml# schema only if the controller is supposed to work
> > > >   as either host or otg.
> > > >
> > > > Changelog v4:
> > > > - Apply usb-drd.yaml schema first. If the controller is configured
> > > >   to work in a gadget mode only, then apply the usb.yaml schema too,
> > > >   otherwise apply the usb-xhci.yaml schema.
> > > > - Discard Rob's Reviewed-by tag. Please review the patch one more time.
> > > >
> > > > ---
> > > >  .../devicetree/bindings/usb/dwc3.txt       | 125
> > > >  .../devicetree/bindings/usb/snps,dwc3.yaml | 303 ++
> > > >  2 files changed, 303 insertions(+), 125 deletions(-)
> > > >  delete mode 100644 Documentation/devicetree/bindings/usb/dwc3.txt
> > > >  create mode 100644 Documentation/devicetree/bindings/usb/snps,dwc3.yaml
> > > >
> > > > diff --git a/Documentation/devicetree/bindings/usb/snps,dwc3.yaml b/Documentation/devicetree/bindings/usb/snps,dwc3.yaml
> > > > new file mode 100644
> > > > index ..079617891da6
> > > > --- /dev/null
> > > > +++ b/Documentation/devicetree/bindings/usb/snps,dwc3.yaml
> > > > @@ -0,0 +1,303 @@
> > > > +# SPDX-License-Identifier: GPL-2.0
> > > > +%YAML 1.2
> > > > +---
> > > > +$id: http://devicetree.org/schemas/usb/snps,dwc3.yaml#
> > > > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > > > +
> > > > +title: Synopsys DesignWare USB3 Controller
> > > > +
> > > > +maintainers:
> > > > +  - Felipe Balbi
> > > > +
> > > > +description:
> > > > +  This is usually a subnode to DWC3 glue to which it is connected, but can also
> > > > +  be presented as a standalone DT node with an optional vendor-specific
> > > > +  compatible string.
> > > > +
> > > > +allOf:
> > > > +  - $ref: usb-drd.yaml#
> > > > +  - if:
> > > > +      properties:
> > > > +        dr_mode:
> > > > +          const: peripheral
>
> Another thing, this evaluates to true if dr_mode is not present. You
> need to add 'required'?

Right. Will something like this do that?

+ allOf:
+   - $ref: usb-drd.yaml#
+   - if:
+       properties:
+         dr_mode:
+           const: peripheral
+       required:
+         - dr_mode
+     then:
+       $ref: usb.yaml#
+     else:
+       $ref: usb-xhci.yaml#

> If dr_mode is otg, then don't you need to apply
> both usb.yaml and usb-xhci.yaml?

No, I don't. Since there is no peripheral-specific DT schema, the only
schema any USB-gadget node needs to pass is usb.yaml, which is already
included into the usb-xhci.yaml schema.

So for pure OTG devices with xHCI host and gadget capabilities it's enough
to evaluate: allOf: [$ref: usb-drd.yaml#, $ref: usb-xhci.yaml#]. Please see
the sketch/ASCII-figure below and the following text for details.

-Sergey

> > > > +    then:
> > > > +      $ref: usb.yaml#
> > >
> > > This part could be done in usb-drd.yaml?
> >
> > Originally I was thinking about that, but then in order to minimize
> > the properties validation I've decided to split the properties in
> > accordance with the USB controllers functionality:
> >
> >              +- USB Gadget/Peripheral Controller. There is no
> >              |  specific schema for the gadgets since there are no
> >              |  common gadget properties (at least I failed to find
> >              |  ones). So the pure gadget controllers need to be
> >              |  validated just against the usb.yaml schema.
> >              |
> > usb.yaml <---+-- usb-hcd.yaml - Generic USB Host Controller. The schema
> >              ^   turns out to include the OHCI/UHCI/EHCI
> >              |   properties, which AFAICS