Re: [PATCH v1 05/10] powerpc/mm: Do early ioremaps from top to bottom on PPC64 too.

2019-08-13 Thread Christoph Hellwig
On Tue, Aug 13, 2019 at 08:11:38PM +, Christophe Leroy wrote:
> Until the vmalloc system is up and running, ioremap basically
> allocates addresses at the border of the IOREMAP area.

Note that while a few other architectures have a magic hack like powerpc
to make ioremap work before vmalloc, the normal practice would be
to explicitly use early_ioremap.  I guess your change is fine for now,
but it might make sense to convert powerpc to the explicit
early_ioremap scheme as well.
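
For reference, on architectures that select GENERIC_EARLY_IOREMAP the
explicit pattern looks roughly like the sketch below (powerpc would
first have to wire up early_ioremap_setup(); the function and names
here are only illustrative):

	#include <asm/early_ioremap.h>

	void __init early_probe_device(phys_addr_t pa)
	{
		/* temporary fixmap-backed mapping, usable before vmalloc is up */
		void __iomem *regs = early_ioremap(pa, PAGE_SIZE);

		if (!regs)
			return;
		/* ... read/write the device registers ... */
		early_iounmap(regs, PAGE_SIZE);	/* must be torn down explicitly */
	}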


Re: [PATCH v1 10/10] powerpc/mm: refactor ioremap_range() and use ioremap_page_range()

2019-08-13 Thread Christoph Hellwig
Somehow this series is missing a cover letter.

While you are touching all this "fun", can you also look into killing
__ioremap?  It seems to be a weird non-standard version of ioremap_prot
(probably predating ioremap_prot) that is missing a few lines of code
setting attributes that might not even be applicable for the two drivers
calling it.
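
A hypothetical conversion of such a call site (the flags shown are only
illustrative, not taken from either driver) would be:

	- vaddr = __ioremap(paddr, size, _PAGE_NO_CACHE | _PAGE_GUARDED);
	+ vaddr = ioremap_prot(paddr, size, _PAGE_NO_CACHE | _PAGE_GUARDED);

with ioremap_prot() then applying the attribute fixups (dirty for
writable mappings, no user/exec) that __ioremap skips.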


Re: [PATCH v1 02/10] powerpc/mm: rework io-workaround invocation.

2019-08-13 Thread Christoph Hellwig
On Tue, Aug 13, 2019 at 08:11:34PM +, Christophe Leroy wrote:
> ppc_md.ioremap() is only used for I/O workaround on CELL platform,
> so indirect function call can be avoided.
> 
> This patch reworks the io-workaround and ioremap() functions to
> use static keys for the activation of io-workaround.
> 
> When CONFIG_PPC_IO_WORKAROUNDS or CONFIG_PPC_INDIRECT_MMIO are not
> selected, the I/O workaround ioremap() is compiled out and the
> static key is not used at all.

Why bother with the complex static key?  ioremap isn't exactly a fast
path.  Just make it a normal branch if enabled, with the option to
compile it out entirely as in your patch.
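
Something like the following sketch would do (the variable name is
made up):

	#ifdef CONFIG_PPC_IO_WORKAROUNDS
	extern bool io_workaround_active;	/* set once during init */
	static inline bool iowa_is_active(void)
	{
		return io_workaround_active;
	}
	#else
	static inline bool iowa_is_active(void)
	{
		return false;	/* lets the compiler drop the workaround path */
	}
	#endif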


Re: [PATCH] powerpc/32s: fix boot failure with DEBUG_PAGEALLOC without KASAN.

2019-08-13 Thread Christoph Hellwig
On Wed, Aug 14, 2019 at 05:28:35AM +, Christophe Leroy wrote:
> When KASAN is selected, the definitive hash table has to be
> set up later, but there is already an early temporary one.
> 
> When KASAN is not selected, there is no early hash table,
> so the setup of the definitive hash table cannot be delayed.

I think you also want to add this information to the code itself
as comments.
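
Something along these lines in MMU_init_hw(), say (the wording is only
a suggestion):

	/*
	 * With KASAN, head_32.S has already set up an early hash table,
	 * so patching the definitive one is deferred to
	 * MMU_init_hw_patch(), called from head_32.S.  Without KASAN
	 * there is no early hash table, so patch it right away.
	 */
	if (IS_ENABLED(CONFIG_KASAN))
		return;

	MMU_init_hw_patch();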


Re: [REGRESSION] Boot failure with DEBUG_PAGEALLOC on Wii, after PPC32 KASAN patches

2019-08-13 Thread Christophe Leroy

Hi

On 13/08/2019 at 17:51, Jonathan Neuschäfer wrote:

Hi,

I noticed that my Nintendo Wii doesn't boot with wii_defconfig plus
CONFIG_DEBUG_PAGEALLOC=y and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
on recent kernels. I get a splash like this one:

[0.022245] BUG: Unable to handle kernel data access at 0x6601
[0.025172] Faulting instruction address: 0xc01afa48
[0.027522] Oops: Kernel access of bad area, sig: 11 [#1]
[0.030076] BE PAGE_SIZE=4K MMU=Hash PREEMPT DEBUG_PAGEALLOC wii


[...]



(Without CONFIG_DEBUG_PAGEALLOC I haven't noticed any problems.)


'git bisect' says:

72f208c6a8f7bc78ef5248babd9e6ed6302bd2a0 is the first bad commit
commit 72f208c6a8f7bc78ef5248babd9e6ed6302bd2a0
Author: Christophe Leroy 
Date:   Fri Apr 26 16:23:35 2019 +

 powerpc/32s: move hash code patching out of MMU_init_hw()



[...]




I can revert this commit, and then 5.3-rc2 (plus a patchset adding a
serial driver) boot again.

Christophe, is there anything I should test in order to figure out how
to fix this properly?


I just sent out a patch that should fix it. Please test and tell me.

Thanks
Christophe


[PATCH] powerpc/32s: fix boot failure with DEBUG_PAGEALLOC without KASAN.

2019-08-13 Thread Christophe Leroy
When KASAN is selected, the definitive hash table has to be
set up later, but there is already an early temporary one.

When KASAN is not selected, there is no early hash table,
so the setup of the definitive hash table cannot be delayed.

Reported-by: Jonathan Neuschafer 
Fixes: 72f208c6a8f7 ("powerpc/32s: move hash code patching out of MMU_init_hw()")
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_32.S  | 2 ++
 arch/powerpc/mm/book3s32/mmu.c | 5 +
 2 files changed, 7 insertions(+)

diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index f255e22184b4..c8b4f7ed318c 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -897,9 +897,11 @@ start_here:
bl  machine_init
bl  __save_cpu_setup
bl  MMU_init
+#ifdef CONFIG_KASAN
 BEGIN_MMU_FTR_SECTION
bl  MMU_init_hw_patch
 END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE)
+#endif
 
 /*
  * Go back to running unmapped so we can load up new values
diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index e249fbf6b9c3..6ddcbfad5c9e 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -358,6 +358,11 @@ void __init MMU_init_hw(void)
hash_mb2 = hash_mb = 32 - LG_HPTEG_SIZE - lg_n_hpteg;
if (lg_n_hpteg > 16)
hash_mb2 = 16 - LG_HPTEG_SIZE;
+
+   if (IS_ENABLED(CONFIG_KASAN))
+   return;
+
+   MMU_init_hw_patch();
 }
 
 void __init MMU_init_hw_patch(void)
-- 
2.13.3



Re: [PATCH v1 08/10] powerpc/mm: move __ioremap_at() and __iounmap_at() into ioremap.c

2019-08-13 Thread Christoph Hellwig
> +/**
> + * __iounmap_from - Low level function to tear down the page tables
> + *  for an IO mapping. This is used for mappings that
> + *  are manipulated manually, like partial unmapping of
> + *  PCI IOs or ISA space.
> + */
> +void __iounmap_at(void *ea, unsigned long size)

The comment doesn't match the function name (__iounmap_from vs.
__iounmap_at).  That's why I usually don't even add the function name,
so that it can't get out of sync.
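
i.e. something like:

	/**
	 * Low level function to tear down the page tables for an IO
	 * mapping.  Used for mappings that are manipulated manually,
	 * like partial unmapping of PCI I/Os or ISA space.
	 */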


Re: [PATCH v1 01/10] powerpc/mm: drop ppc_md.iounmap()

2019-08-13 Thread Christoph Hellwig
On Tue, Aug 13, 2019 at 08:11:33PM +, Christophe Leroy wrote:
> ppc_md.iounmap() is never set, drop it.
> 
> Signed-off-by: Christophe Leroy 

Hah, I was just going to send the same patch as part of a tree-wide
ioremap related series..

Reviewed-by: Christoph Hellwig 


Re: [PATCH v2 1/3] KVM: PPC: Book3S HV: Fix race in re-enabling XIVE escalation interrupts

2019-08-13 Thread Jordan Niethe
On Tue, 2019-08-13 at 20:03 +1000, Paul Mackerras wrote:
> Escalation interrupts are interrupts sent to the host by the XIVE
> hardware when it has an interrupt to deliver to a guest VCPU but that
> VCPU is not running anywhere in the system.  Hence we disable the
> escalation interrupt for the VCPU being run when we enter the guest
> and re-enable it when the guest does an H_CEDE hypercall indicating
> it is idle.
> 
> It is possible that an escalation interrupt gets generated just as we
> are entering the guest.  In that case the escalation interrupt may be
> using a queue entry in one of the interrupt queues, and that queue
> entry may not have been processed when the guest exits with an
> H_CEDE.
> The existing entry code detects this situation and does not clear the
> vcpu->arch.xive_esc_on flag as an indication that there is a pending
> queue entry (if the queue entry gets processed, xive_esc_irq() will
> clear the flag).  There is a comment in the code saying that if the
> flag is still set on H_CEDE, we have to abort the cede rather than
> re-enabling the escalation interrupt, lest we end up with two
> occurrences of the escalation interrupt in the interrupt queue.
> 
> However, the exit code doesn't do that; it aborts the cede in the
> sense
> that vcpu->arch.ceded gets cleared, but it still enables the
> escalation
> interrupt by setting the source's PQ bits to 00.  Instead we need to
> set the PQ bits to 10, indicating that an interrupt has been
> triggered.
> We also need to avoid setting vcpu->arch.xive_esc_on in this case
> (i.e. vcpu->arch.xive_esc_on seen to be set on H_CEDE) because
> xive_esc_irq() will run at some point and clear it, and if we race
> with
> that we may end up with an incorrect result (i.e. xive_esc_on set
> when
> the escalation interrupt has just been handled).
> 
> It is extremely unlikely that having two queue entries would cause
> observable problems; theoretically it could cause queue overflow, but
> the CPU would have to have thousands of interrupts targeted to it
> for
> that to be possible.  However, this fix will also make it possible to
> determine accurately whether there is an unhandled escalation
> interrupt in the queue, which will be needed by the following patch.
> 
> Cc: sta...@vger.kernel.org # v4.16+
> Fixes: 9b9b13a6d153 ("KVM: PPC: Book3S HV: Keep XIVE escalation
> interrupt masked unless ceded")
> Signed-off-by: Paul Mackerras 
> ---
> v2: don't set xive_esc_on if we're not using a XIVE escalation
> interrupt.
> 
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 36 +
> 
>  1 file changed, 23 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index 337e644..2e7e788 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -2831,29 +2831,39 @@ kvm_cede_prodded:
>  kvm_cede_exit:
>   ld  r9, HSTATE_KVM_VCPU(r13)
>  #ifdef CONFIG_KVM_XICS
> - /* Abort if we still have a pending escalation */
> + /* are we using XIVE with single escalation? */
> + ld  r10, VCPU_XIVE_ESC_VADDR(r9)
> + cmpdi   r10, 0
> + beq 3f
> + li  r6, XIVE_ESB_SET_PQ_00
Would it make sense to move the above instruction down into the 4: label
instead? If we do not branch to 4, r6 is overwritten anyway.
I think that would save a load immediate when we do not branch to 4. Also
it would mean that you could use r5 everywhere instead of changing it to
r6.
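
As a sketch of that rearrangement (keeping the VCPU_XIVE_ESC_VADDR
check above unchanged, and reusing r5 as suggested):

	lbz	r5, VCPU_XIVE_ESC_ON(r9)
	cmpwi	r5, 0
	beq	4f
	li	r0, 0
	stb	r0, VCPU_CEDED(r9)
	li	r5, XIVE_ESB_SET_PQ_10
	b	5f
4:	li	r5, XIVE_ESB_SET_PQ_00	/* only loaded on this path */
	li	r0, 1
	stb	r0, VCPU_XIVE_ESC_ON(r9)
	sync
5:	/* load from the ESB page using r5, as in the patch */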
> + /*
> +  * If we still have a pending escalation, abort the cede,
> +  * and we must set PQ to 10 rather than 00 so that we don't
> +  * potentially end up with two entries for the escalation
> +  * interrupt in the XIVE interrupt queue.  In that case
> +  * we also don't want to set xive_esc_on to 1 here in
> +  * case we race with xive_esc_irq().
> +  */
>   lbz r5, VCPU_XIVE_ESC_ON(r9)
>   cmpwi   r5, 0
> - beq 1f
> + beq 4f
>   li  r0, 0
>   stb r0, VCPU_CEDED(r9)
> -1:   /* Enable XIVE escalation */
> - li  r5, XIVE_ESB_SET_PQ_00
> + li  r6, XIVE_ESB_SET_PQ_10
> + b   5f
> +4:   li  r0, 1
> + stb r0, VCPU_XIVE_ESC_ON(r9)
> + /* make sure store to xive_esc_on is seen before xive_esc_irq
> runs */
> + sync
> +5:   /* Enable XIVE escalation */
>   mfmsr   r0
>   andi.   r0, r0, MSR_DR  /* in real mode? */
>   beq 1f
> - ld  r10, VCPU_XIVE_ESC_VADDR(r9)
> - cmpdi   r10, 0
> - beq 3f
> - ldx r0, r10, r5
> + ldx r0, r10, r6
>   b   2f
>  1:   ld  r10, VCPU_XIVE_ESC_RADDR(r9)
> - cmpdi   r10, 0
> - beq 3f
> - ldcix   r0, r10, r5
> + ldcix   r0, r10, r6
>  2:   sync
> - li  r0, 1
> - stb r0, VCPU_XIVE_ESC_ON(r9)
>  #endif /* CONFIG_KVM_XICS */
>  3:   b   guest_exit_cont
>  



Re: [PATCH v5 1/4] nvdimm: Consider probe return -EOPNOTSUPP as success

2019-08-13 Thread Dan Williams
Hi Aneesh, logic looks correct but there are some cleanups I'd like to
see and a lead-in patch that I attached.

I've started prefixing nvdimm patches with:

libnvdimm/$component:

...since this patch mostly impacts the pmem driver let's prefix it
"libnvdimm/pmem: "

On Fri, Aug 9, 2019 at 12:45 AM Aneesh Kumar K.V
 wrote:
>
> This patch add -EOPNOTSUPP as return from probe callback to

s/This patch add/Add/

No need to say "this patch" it's obviously a patch.

> indicate we were not able to initialize a namespace due to pfn superblock
> feature/version mismatch. We want to consider this a probe success so that
> we can create a new namespace seed and thereby avoid marking the failed
> namespace as the seed namespace.

Please replace usage of "we" with the exact agent involved as which
"we" is being referred to gets confusing for the reader.

i.e. "indicate that the pmem driver was not..." "The nvdimm core wants
to consider this...".

>
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  drivers/nvdimm/bus.c  |  2 +-
>  drivers/nvdimm/pmem.c | 26 ++
>  2 files changed, 23 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
> index 798c5c4aea9c..16c35e6446a7 100644
> --- a/drivers/nvdimm/bus.c
> +++ b/drivers/nvdimm/bus.c
> @@ -95,7 +95,7 @@ static int nvdimm_bus_probe(struct device *dev)
> rc = nd_drv->probe(dev);
> debug_nvdimm_unlock(dev);
>
> -   if (rc == 0)
> +   if (rc == 0 || rc == -EOPNOTSUPP)
> nd_region_probe_success(nvdimm_bus, dev);

This now makes the nd_region_probe_success() helper obviously misnamed
since it now wants to take actions on non-probe success. I attached a
lead-in cleanup that you can pull into your series that renames that
routine to nd_region_advance_seeds().

When you rebase this needs a comment about why EOPNOTSUPP has special handling.

> else
> nd_region_disable(nvdimm_bus, dev);
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index 4c121dd03dd9..3f498881dd28 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -490,6 +490,7 @@ static int pmem_attach_disk(struct device *dev,
>
>  static int nd_pmem_probe(struct device *dev)
>  {
> +   int ret;
> struct nd_namespace_common *ndns;
>
> ndns = nvdimm_namespace_common_probe(dev);
> @@ -505,12 +506,29 @@ static int nd_pmem_probe(struct device *dev)
> if (is_nd_pfn(dev))
> return pmem_attach_disk(dev, ndns);
>
> -   /* if we find a valid info-block we'll come back as that personality 
> */
> -   if (nd_btt_probe(dev, ndns) == 0 || nd_pfn_probe(dev, ndns) == 0
> -   || nd_dax_probe(dev, ndns) == 0)

Similar need for an updated comment here to explain the special
translation of error codes.
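
e.g. (the wording is only a suggestion):

	/*
	 * If we find a valid info-block we'll come back as that
	 * personality; -EOPNOTSUPP means a valid pfn/dax info-block with
	 * an unsupported feature/version, which must still count as a
	 * successful probe so that a new namespace seed gets created.
	 */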

> +   ret = nd_btt_probe(dev, ndns);
> +   if (ret == 0)
> return -ENXIO;
> +   else if (ret == -EOPNOTSUPP)

Are there cases where the btt driver needs to return EOPNOTSUPP? I'd
otherwise like to keep this special casing constrained to the pfn /
dax info block cases.
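
i.e. roughly (a sketch of the constrained version):

	ret = nd_btt_probe(dev, ndns);
	if (ret == 0)
		return -ENXIO;

	ret = nd_pfn_probe(dev, ndns);
	if (ret == 0)
		return -ENXIO;
	if (ret == -EOPNOTSUPP)
		return ret;	/* pfn superblock feature/version mismatch */

	ret = nd_dax_probe(dev, ndns);
	/* ... same pattern as the nd_pfn_probe() case ... */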
From 9ec13a8672e87e0b1c5b9427ab926168e53d55bc Mon Sep 17 00:00:00 2001
From: Dan Williams 
Date: Tue, 13 Aug 2019 13:09:27 -0700
Subject: [PATCH] libnvdimm/region: Rewrite _probe_success() to
 _advance_seeds()

The nd_region_probe_success() helper collides seed management with
nvdimm->busy tracking. Given the 'busy' increment is handled internal to the
nd_region driver 'probe' path, move the decrement to the 'remove' path.
With that cleanup the routine can be renamed to the more descriptive
nd_region_advance_seeds().

The change is prompted by an incoming need to optionally advance the
seeds on other events besides 'probe' success.

Cc: "Aneesh Kumar K.V" 
Signed-off-by: Dan Williams 
---
 drivers/nvdimm/bus.c|  7 +---
 drivers/nvdimm/namespace_devs.c | 34 ++---
 drivers/nvdimm/nd-core.h|  3 +-
 drivers/nvdimm/region_devs.c| 68 +
 4 files changed, 41 insertions(+), 71 deletions(-)

diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 29479d3b01b0..ee6de34ae525 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -95,10 +95,8 @@ static int nvdimm_bus_probe(struct device *dev)
 	rc = nd_drv->probe(dev);
 	debug_nvdimm_unlock(dev);
 
-	if (rc == 0)
-		nd_region_probe_success(nvdimm_bus, dev);
-	else
-		nd_region_disable(nvdimm_bus, dev);
+	if (rc == 0 && dev->parent && is_nd_region(dev->parent))
+		nd_region_advance_seeds(to_nd_region(dev->parent), dev);
 	nvdimm_bus_probe_end(nvdimm_bus);
 
 	dev_dbg(_bus->dev, "END: %s.probe(%s) = %d\n", dev->driver->name,
@@ -121,7 +119,6 @@ static int nvdimm_bus_remove(struct device *dev)
 		rc = nd_drv->remove(dev);
 		debug_nvdimm_unlock(dev);
 	}
-	nd_region_disable(nvdimm_bus, dev);
 
 	dev_dbg(_bus->dev, "%s.remove(%s) = %d\n", dev->driver->name,
 			dev_name(dev), rc);
diff --git 

[PATCHv6 2/2] PCI: layerscape: Add CONFIG_PCI_LAYERSCAPE_EP to build EP/RC separately

2019-08-13 Thread Xiaowei Bao
Add CONFIG_PCI_LAYERSCAPE_EP to build EP/RC separately.

Signed-off-by: Xiaowei Bao 
---
v2:
 - No change.
v3:
 - modify the commit message.
v4:
 - send the patch again with '--to'.
v5:
 - No change.
v6:
 - remove the [EXT] tag of the $SUBJECT in email.

 drivers/pci/controller/dwc/Kconfig  | 20 ++--
 drivers/pci/controller/dwc/Makefile |  3 ++-
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/controller/dwc/Kconfig 
b/drivers/pci/controller/dwc/Kconfig
index 6ea778a..869c645 100644
--- a/drivers/pci/controller/dwc/Kconfig
+++ b/drivers/pci/controller/dwc/Kconfig
@@ -131,13 +131,29 @@ config PCI_KEYSTONE_EP
  DesignWare core functions to implement the driver.
 
 config PCI_LAYERSCAPE
-   bool "Freescale Layerscape PCIe controller"
+   bool "Freescale Layerscape PCIe controller - Host mode"
depends on OF && (ARM || ARCH_LAYERSCAPE || COMPILE_TEST)
depends on PCI_MSI_IRQ_DOMAIN
select MFD_SYSCON
select PCIE_DW_HOST
help
- Say Y here if you want PCIe controller support on Layerscape SoCs.
+ Say Y here if you want to enable PCIe controller support on Layerscape
+ SoCs to work in Host mode.
+ This controller can work either as EP or RC. The RCW[HOST_AGT_PEX]
+ determines which PCIe controller works in EP mode and which PCIe
+ controller works in RC mode.
+
+config PCI_LAYERSCAPE_EP
+   bool "Freescale Layerscape PCIe controller - Endpoint mode"
+   depends on OF && (ARM || ARCH_LAYERSCAPE || COMPILE_TEST)
+   depends on PCI_ENDPOINT
+   select PCIE_DW_EP
+   help
+ Say Y here if you want to enable PCIe controller support on Layerscape
+ SoCs to work in Endpoint mode.
+ This controller can work either as EP or RC. The RCW[HOST_AGT_PEX]
+ determines which PCIe controller works in EP mode and which PCIe
+ controller works in RC mode.
 
 config PCI_HISI
depends on OF && (ARM64 || COMPILE_TEST)
diff --git a/drivers/pci/controller/dwc/Makefile 
b/drivers/pci/controller/dwc/Makefile
index b085dfd..824fde7 100644
--- a/drivers/pci/controller/dwc/Makefile
+++ b/drivers/pci/controller/dwc/Makefile
@@ -8,7 +8,8 @@ obj-$(CONFIG_PCI_EXYNOS) += pci-exynos.o
 obj-$(CONFIG_PCI_IMX6) += pci-imx6.o
 obj-$(CONFIG_PCIE_SPEAR13XX) += pcie-spear13xx.o
 obj-$(CONFIG_PCI_KEYSTONE) += pci-keystone.o
-obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o pci-layerscape-ep.o
+obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o
+obj-$(CONFIG_PCI_LAYERSCAPE_EP) += pci-layerscape-ep.o
 obj-$(CONFIG_PCIE_QCOM) += pcie-qcom.o
 obj-$(CONFIG_PCIE_ARMADA_8K) += pcie-armada8k.o
 obj-$(CONFIG_PCIE_ARTPEC6) += pcie-artpec6.o
-- 
2.9.5



[PATCHv6 1/2] PCI: layerscape: Add the bar_fixed_64bit property in EP driver.

2019-08-13 Thread Xiaowei Bao
The layerscape PCIe controller has just 4 BARs: BAR0 and BAR1 are
32-bit, BAR2 and BAR4 are 64-bit. This is determined by the hardware,
so set bar_fixed_64bit to 0x14.

Signed-off-by: Xiaowei Bao 
---
v2:
 - Replace value 0x14 with a macro.
v3:
 - No change.
v4:
 - send the patch again with '--to'.
v5:
 - fix the commit message.
v6:
 - remove the [EXT] tag of the $SUBJECT in email.

 drivers/pci/controller/dwc/pci-layerscape-ep.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c 
b/drivers/pci/controller/dwc/pci-layerscape-ep.c
index be61d96..ca9aa45 100644
--- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
+++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
@@ -44,6 +44,7 @@ static const struct pci_epc_features ls_pcie_epc_features = {
.linkup_notifier = false,
.msi_capable = true,
.msix_capable = false,
+   .bar_fixed_64bit = (1 << BAR_2) | (1 << BAR_4),
 };
 
 static const struct pci_epc_features*
-- 
2.9.5



Re: [PATCH 1/2] powerpc: rewrite LOAD_REG_IMMEDIATE() as an intelligent macro

2019-08-13 Thread Paul Mackerras
On Tue, Aug 13, 2019 at 09:59:35AM +, Christophe Leroy wrote:

[snip]

> +.macro __LOAD_REG_IMMEDIATE r, x
> + .if \x & ~0xffffffff != 0
> + __LOAD_REG_IMMEDIATE_32 \r, (\x) >> 32
> + rldicr  \r, \r, 32, 31
> + .if (\x) & 0xffff0000 != 0
> + oris \r, \r, (\x)@__AS_ATHIGH
> + .endif
> + .if (\x) & 0xffff != 0
> + ori \r, \r, (\x)@l
> + .endif
> + .endif
> + .else
> + __LOAD_REG_IMMEDIATE_32 \r, \x
> + .endif
> +.endm

Doesn't this force all negative constants, even small ones, to use
the long sequence?  For example, __LOAD_REG_IMMEDIATE r3, -1 will
generate (as far as I can see):

li  r3, -1
rldicr  r3, r3, 32, 31
oris    r3, r3, 0xffff
ori     r3, r3, 0xffff

which seems suboptimal.
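
One way to avoid it (a sketch, not from the patch, testing the signed
32-bit range instead of masking the high bits) would be:

	.macro __LOAD_REG_IMMEDIATE r, x
		.if (\x) >= 0x80000000 || (\x) < -0x80000000
			/* full 64-bit sequence as in the patch */
		.else
			__LOAD_REG_IMMEDIATE_32 \r, \x	/* li handles -1 in one insn */
		.endif
	.endm

so that small negative constants take the short path.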

Paul.


Re: [RFC PATCH] powerpc/64s/radix: introduce option to disable broadcast tlbie

2019-08-13 Thread Michael Ellerman
Hi Nick,

Just a few comments.

Nicholas Piggin  writes:
> diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c 
> b/arch/powerpc/mm/book3s64/radix_tlb.c
> index 71f7fede2fa4..56ceecbd3d5c 100644
> --- a/arch/powerpc/mm/book3s64/radix_tlb.c
> +++ b/arch/powerpc/mm/book3s64/radix_tlb.c
> @@ -285,6 +286,30 @@ static inline void _tlbie_pid(unsigned long pid, 
> unsigned long ric)
>   asm volatile("eieio; tlbsync; ptesync": : :"memory");
>  }
>  
> +struct tlbiel_pid {
> + unsigned long pid;
> + unsigned long ric;
> +};
> +
> +static void do_tlbiel_pid(void *info)
> +{
> + struct tlbiel_pid *t = info;
> +
> + if (t->ric == RIC_FLUSH_TLB)
> + _tlbiel_pid(t->pid, RIC_FLUSH_TLB);
> + else if (t->ric == RIC_FLUSH_PWC)
> + _tlbiel_pid(t->pid, RIC_FLUSH_PWC);
> + else
> + _tlbiel_pid(t->pid, RIC_FLUSH_ALL);
> +}
> +
> +static inline void _tlbiel_pid_broadcast(const struct cpumask *cpus,
> + unsigned long pid, unsigned long ric)

Can we call these "multicast" instead of "broadcast"?

I think that's more accurate, and avoids confusion with tlbie which
literally does a broadcast (at least architecturally).

> @@ -524,6 +604,12 @@ static bool mm_needs_flush_escalation(struct mm_struct 
> *mm)
>   return false;
>  }
>  
> +static bool tlbie_enabled = true;
> +static bool use_tlbie(void)
> +{
> + return tlbie_enabled;
> +}

No synchronisation, but that's OK. Would probably be good to have
a comment though explaining why.

We could use a static_key but I guess the overhead of a comparison and
branch is in the noise vs the tlbie/tlbiel.
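
Something like this above use_tlbie(), perhaps (wording is only a
suggestion):

	/*
	 * tlbie_enabled is only flipped via debugfs; a racing reader may
	 * briefly see a stale value, which is harmless because both the
	 * tlbie and tlbiel paths are functionally correct, they only
	 * differ in performance.
	 */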

> @@ -1100,3 +1221,13 @@ extern void radix_kvm_prefetch_workaround(struct 
> mm_struct *mm)
>  }
>  EXPORT_SYMBOL_GPL(radix_kvm_prefetch_workaround);
>  #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
> +
> +static int __init radix_tlb_setup(void)
> +{
> + debugfs_create_bool("tlbie_enabled", 0600,
> + powerpc_debugfs_root,
> + &tlbie_enabled);
> +
> + return 0;
> +}
> +arch_initcall(radix_tlb_setup);

For working around hardware bugs we would want a command line parameter
or other boot time way to flip this. But I guess you're saying because
we haven't converted all uses of tlbie we can't really support that
anyway, and so a runtime switch is sufficient?
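
(A boot-time switch would only need something like the sketch below,
with a made-up parameter name:

	static int __init parse_tlbie_off(char *arg)
	{
		tlbie_enabled = false;
		return 0;
	}
	early_param("tlbie_off", parse_tlbie_off);

)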

cheers


[PATCH v1 09/10] powerpc/mm: make __ioremap_caller() common to PPC32 and PPC64

2019-08-13 Thread Christophe Leroy
The PPC32 and PPC64 versions of __ioremap_caller() do the same thing.
Define a common one.

__ioremap() is not reused because most of the tests included in it are
unnecessary when coming from __ioremap_caller().

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/ioremap.c| 99 
 arch/powerpc/mm/pgtable_32.c | 75 -
 arch/powerpc/mm/pgtable_64.c | 61 ---
 3 files changed, 99 insertions(+), 136 deletions(-)

diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c
index 889ee656cf64..537c9148cea1 100644
--- a/arch/powerpc/mm/ioremap.c
+++ b/arch/powerpc/mm/ioremap.c
@@ -76,6 +76,105 @@ void __iomem *ioremap_prot(phys_addr_t addr, unsigned long 
size, unsigned long f
 }
 EXPORT_SYMBOL(ioremap_prot);
 
+int __weak ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size,
+pgprot_t prot, int nid)
+{
+   unsigned long i;
+
+   for (i = 0; i < size; i += PAGE_SIZE) {
+   int err = map_kernel_page(ea + i, pa + i, prot);
+
+   if (err) {
+   if (slab_is_available())
+   unmap_kernel_range(ea, size);
+   else
+   WARN_ON_ONCE(1); /* Should clean up */
+   return err;
+   }
+   }
+
+   return 0;
+}
+
+void __iomem *__ioremap_caller(phys_addr_t addr, unsigned long size,
+  pgprot_t prot, void *caller)
+{
+   phys_addr_t pa = addr & PAGE_MASK;
+   int ret;
+   unsigned long va;
+
+   size = PAGE_ALIGN(addr + size) - pa;
+
+#ifdef CONFIG_PPC64
+   /* We don't support the 4K PFN hack with ioremap */
+   if (pgprot_val(prot) & H_PAGE_4K_PFN)
+   return NULL;
+#else
+   /*
+* If the address lies within the first 16 MB, assume it's in ISA
+* memory space
+*/
+   if (pa < SZ_16M)
+   pa += _ISA_MEM_BASE;
+
+#ifndef CONFIG_CRASH_DUMP
+   /*
+* Don't allow anybody to remap normal RAM that we're using.
+* mem_init() sets high_memory so only do the check after that.
+*/
+   if (slab_is_available() && pa <= virt_to_phys(high_memory - 1) &&
+   page_is_ram(__phys_to_pfn(pa))) {
+   pr_err("%s(): phys addr 0x%llx is RAM lr %ps\n", __func__,
+  (unsigned long long)pa, __builtin_return_address(0));
+   return NULL;
+   }
+#endif
+#endif /* CONFIG_PPC64 */
+
+   if (size == 0 || pa == 0)
+   return NULL;
+
+   /*
+* Is it already mapped?  Perhaps overlapped by a previous
+* mapping.
+*/
+   va = p_block_mapped(pa);
+   if (va)
+   return (void __iomem *)va + (addr & ~PAGE_MASK);
+
+   /*
+* Choose an address to map it to.
+* Once the vmalloc system is running, we use it.
+* Before that, we map using addresses going
+* down from ioremap_bot.  vmalloc will use
+* the addresses from IOREMAP_BASE through
+* ioremap_bot
+*
+*/
+   if (slab_is_available()) {
+   struct vm_struct *area;
+
+   area = __get_vm_area_caller(size, VM_IOREMAP, IOREMAP_BASE,
+   ioremap_bot, caller);
+   if (area == NULL)
+   return NULL;
+
+   area->phys_addr = pa;
+   va = (unsigned long)area->addr;
+   } else {
+   ioremap_bot -= size;
+   va = ioremap_bot;
+   }
+   ret = ioremap_range(va, pa, size, prot, NUMA_NO_NODE);
+   if (!ret)
+   return (void __iomem *)va + (addr & ~PAGE_MASK);
+
+   if (!slab_is_available())
+   ioremap_bot += size;
+
+   return NULL;
+}
+
 /*
  * Unmap an IO region and remove it from vmalloc'd list.
  * Access to IO memory should be serialized by driver.
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 4597f45e4dc6..bacf3b85191c 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -35,81 +35,6 @@
 
 extern char etext[], _stext[], _sinittext[], _einittext[];
 
-void __iomem *
-__ioremap_caller(phys_addr_t addr, unsigned long size, pgprot_t prot, void 
*caller)
-{
-   unsigned long v, i;
-   phys_addr_t p;
-   int err;
-
-   /*
-* Choose an address to map it to.
-* Once the vmalloc system is running, we use it.
-* Before then, we use space going down from IOREMAP_TOP
-* (ioremap_bot records where we're up to).
-*/
-   p = addr & PAGE_MASK;
-   size = PAGE_ALIGN(addr + size) - p;
-
-   /*
-* If the address lies within the first 16 MB, assume it's in ISA
-* memory space
-*/
-   if (p < 16*1024*1024)
-   p += _ISA_MEM_BASE;
-
-#ifndef CONFIG_CRASH_DUMP
-   /*
-* Don't 

[PATCH v1 10/10] powerpc/mm: refactor ioremap_range() and use ioremap_page_range()

2019-08-13 Thread Christophe Leroy
book3s64's ioremap_range() is almost the same as the fallback
ioremap_range(), except that it calls radix__ioremap_range() when radix
is enabled.

radix__ioremap_range() is also very similar to the other ones, except
that it calls ioremap_page_range() when slab is available.

Let's keep only one version of ioremap_range(), which calls
ioremap_page_range() on all platforms when slab is available.

At the same time, drop the nid parameter, which is not used.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/64/radix.h |  3 ---
 arch/powerpc/mm/book3s64/pgtable.c | 21 -
 arch/powerpc/mm/book3s64/radix_pgtable.c   | 20 
 arch/powerpc/mm/ioremap.c  | 23 +--
 4 files changed, 13 insertions(+), 54 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/radix.h 
b/arch/powerpc/include/asm/book3s/64/radix.h
index e04a839cb5b9..574eca33f893 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -266,9 +266,6 @@ extern void radix__vmemmap_remove_mapping(unsigned long 
start,
 extern int radix__map_kernel_page(unsigned long ea, unsigned long pa,
 pgprot_t flags, unsigned int psz);
 
-extern int radix__ioremap_range(unsigned long ea, phys_addr_t pa,
-   unsigned long size, pgprot_t prot, int nid);
-
 static inline unsigned long radix__get_tree_size(void)
 {
unsigned long rts_field;
diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
b/arch/powerpc/mm/book3s64/pgtable.c
index 7d0e0d0d22c4..4c8bed856533 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -446,24 +446,3 @@ int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
 
return true;
 }
-
-int ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size, 
pgprot_t prot, int nid)
-{
-   unsigned long i;
-
-   if (radix_enabled())
-   return radix__ioremap_range(ea, pa, size, prot, nid);
-
-   for (i = 0; i < size; i += PAGE_SIZE) {
-   int err = map_kernel_page(ea + i, pa + i, prot);
-   if (err) {
-   if (slab_is_available())
-   unmap_kernel_range(ea, size);
-   else
-   WARN_ON_ONCE(1); /* Should clean up */
-   return err;
-   }
-   }
-
-   return 0;
-}
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 11303e2fffb1..d39edbb07bd1 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -1218,26 +1218,6 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
return 1;
 }
 
-int radix__ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size,
-   pgprot_t prot, int nid)
-{
-   if (likely(slab_is_available())) {
-   int err = ioremap_page_range(ea, ea + size, pa, prot);
-   if (err)
-   unmap_kernel_range(ea, size);
-   return err;
-   } else {
-   unsigned long i;
-
-   for (i = 0; i < size; i += PAGE_SIZE) {
-   int err = map_kernel_page(ea + i, pa + i, prot);
-   if (WARN_ON_ONCE(err)) /* Should clean up */
-   return err;
-   }
-   return 0;
-   }
-}
-
 int __init arch_ioremap_p4d_supported(void)
 {
return 0;
diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c
index 537c9148cea1..dc538d7f2467 100644
--- a/arch/powerpc/mm/ioremap.c
+++ b/arch/powerpc/mm/ioremap.c
@@ -76,21 +76,24 @@ void __iomem *ioremap_prot(phys_addr_t addr, unsigned long 
size, unsigned long f
 }
 EXPORT_SYMBOL(ioremap_prot);
 
-int __weak ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size,
-pgprot_t prot, int nid)
+static int ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size,
+pgprot_t prot)
 {
unsigned long i;
 
+   if (slab_is_available()) {
+   int err = ioremap_page_range(ea, ea + size, pa, prot);
+
+   if (err)
+   unmap_kernel_range(ea, size);
+   return err;
+   }
+
for (i = 0; i < size; i += PAGE_SIZE) {
int err = map_kernel_page(ea + i, pa + i, prot);
 
-   if (err) {
-   if (slab_is_available())
-   unmap_kernel_range(ea, size);
-   else
-   WARN_ON_ONCE(1); /* Should clean up */
+   if (WARN_ON_ONCE(err))  /* Should clean up */
return err;
-   }
}
 
return 0;
@@ -165,7 +168,7 @@ void __iomem *__ioremap_caller(phys_addr_t addr, unsigned 
long size,
   

[PATCH v1 08/10] powerpc/mm: move __ioremap_at() and __iounmap_at() into ioremap.c

2019-08-13 Thread Christophe Leroy
Although __ioremap_at() and __iounmap_at() are specific to PPC64,
let's move them into ioremap.c, as it wouldn't be worth creating an
ioremap_64.c only for those two functions.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/ioremap.c| 43 +++
 arch/powerpc/mm/pgtable_64.c | 42 --
 2 files changed, 43 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c
index 57d742509cec..889ee656cf64 100644
--- a/arch/powerpc/mm/ioremap.c
+++ b/arch/powerpc/mm/ioremap.c
@@ -103,3 +103,46 @@ void iounmap(volatile void __iomem *token)
vunmap(addr);
 }
 EXPORT_SYMBOL(iounmap);
+
+#ifdef CONFIG_PPC64
+/**
+ * __ioremap_at - Low level function to establish the page tables
+ *for an IO mapping
+ */
+void __iomem *__ioremap_at(phys_addr_t pa, void *ea, unsigned long size, 
pgprot_t prot)
+{
+   /* We don't support the 4K PFN hack with ioremap */
+   if (pgprot_val(prot) & H_PAGE_4K_PFN)
+   return NULL;
+
+   if ((ea + size) >= (void *)IOREMAP_END) {
+   pr_warn("Outside the supported range\n");
+   return NULL;
+   }
+
+   WARN_ON(pa & ~PAGE_MASK);
+   WARN_ON(((unsigned long)ea) & ~PAGE_MASK);
+   WARN_ON(size & ~PAGE_MASK);
+
+   if (ioremap_range((unsigned long)ea, pa, size, prot, NUMA_NO_NODE))
+   return NULL;
+
+   return (void __iomem *)ea;
+}
+EXPORT_SYMBOL(__ioremap_at);
+
+/**
+ * __iounmap_from - Low level function to tear down the page tables
+ *  for an IO mapping. This is used for mappings that
+ *  are manipulated manually, like partial unmapping of
+ *  PCI IOs or ISA space.
+ */
+void __iounmap_at(void *ea, unsigned long size)
+{
+   WARN_ON(((unsigned long)ea) & ~PAGE_MASK);
+   WARN_ON(size & ~PAGE_MASK);
+
+   unmap_kernel_range((unsigned long)ea, size);
+}
+EXPORT_SYMBOL(__iounmap_at);
+#endif
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index b50a53a0a42b..32220f7381d7 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -119,45 +119,6 @@ int __weak ioremap_range(unsigned long ea, phys_addr_t pa, 
unsigned long size, p
return 0;
 }
 
-/**
- * __ioremap_at - Low level function to establish the page tables
- *for an IO mapping
- */
-void __iomem *__ioremap_at(phys_addr_t pa, void *ea, unsigned long size, 
pgprot_t prot)
-{
-   /* We don't support the 4K PFN hack with ioremap */
-   if (pgprot_val(prot) & H_PAGE_4K_PFN)
-   return NULL;
-
-   if ((ea + size) >= (void *)IOREMAP_END) {
-   pr_warn("Outside the supported range\n");
-   return NULL;
-   }
-
-   WARN_ON(pa & ~PAGE_MASK);
-   WARN_ON(((unsigned long)ea) & ~PAGE_MASK);
-   WARN_ON(size & ~PAGE_MASK);
-
-   if (ioremap_range((unsigned long)ea, pa, size, prot, NUMA_NO_NODE))
-   return NULL;
-
-   return (void __iomem *)ea;
-}
-
-/**
- * __iounmap_from - Low level function to tear down the page tables
- *  for an IO mapping. This is used for mappings that
- *  are manipulated manually, like partial unmapping of
- *  PCI IOs or ISA space.
- */
-void __iounmap_at(void *ea, unsigned long size)
-{
-   WARN_ON(((unsigned long)ea) & ~PAGE_MASK);
-   WARN_ON(size & ~PAGE_MASK);
-
-   unmap_kernel_range((unsigned long)ea, size);
-}
-
 void __iomem * __ioremap_caller(phys_addr_t addr, unsigned long size,
pgprot_t prot, void *caller)
 {
@@ -201,9 +162,6 @@ void __iomem * __ioremap_caller(phys_addr_t addr, unsigned 
long size,
return ret;
 }
 
-EXPORT_SYMBOL(__ioremap_at);
-EXPORT_SYMBOL(__iounmap_at);
-
 #ifndef __PAGETABLE_PUD_FOLDED
 /* 4 level page table */
 struct page *pgd_page(pgd_t pgd)
-- 
2.13.3



[PATCH v1 07/10] powerpc/mm: move iounmap() into ioremap.c and drop __iounmap()

2019-08-13 Thread Christophe Leroy
On PPC64, iounmap() does nothing other than call __iounmap(),
and it is the only user of __iounmap().
__iounmap() is almost identical to the PPC32 iounmap().

Let's define a common iounmap() and drop __iounmap().

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/pgtable.h |  2 ++
 arch/powerpc/include/asm/io.h|  5 -
 arch/powerpc/include/asm/nohash/32/pgtable.h |  2 ++
 arch/powerpc/mm/ioremap.c| 31 
 arch/powerpc/mm/pgtable_32.c | 14 -
 arch/powerpc/mm/pgtable_64.c | 28 -
 6 files changed, 35 insertions(+), 47 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index aa1bc5f8da90..af34554d19e8 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -165,6 +165,8 @@ int map_kernel_page(unsigned long va, phys_addr_t pa, 
pgprot_t prot);
#define IOREMAP_TOP	KVIRT_TOP
 #endif
 
+#define IOREMAP_BASE   VMALLOC_START
+
 /*
  * Just any arbitrary offset to the start of the vmalloc VM area: the
  * current 16MB value just means that there will be a 64MB "hole" after the
diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index 23e5d5d16c7e..02d6256fe1ea 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -712,9 +712,6 @@ static inline void iosync(void)
  * * __ioremap_caller is the same as above but takes an explicit caller
  *   reference rather than using __builtin_return_address(0)
  *
- * * __iounmap, is the low level implementation used by iounmap and cannot
- *   be hooked (but can be used by a hook on iounmap)
- *
  */
 extern void __iomem *ioremap(phys_addr_t address, unsigned long size);
 extern void __iomem *ioremap_prot(phys_addr_t address, unsigned long size,
@@ -734,8 +731,6 @@ extern void __iomem *__ioremap(phys_addr_t, unsigned long 
size,
 extern void __iomem *__ioremap_caller(phys_addr_t, unsigned long size,
  pgprot_t prot, void *caller);
 
-extern void __iounmap(volatile void __iomem *addr);
-
 extern void __iomem * __ioremap_at(phys_addr_t pa, void *ea,
   unsigned long size, pgprot_t prot);
 extern void __iounmap_at(void *ea, unsigned long size);
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h 
b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 7ce2a7c9fade..09f2739ab556 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -93,6 +93,8 @@ int map_kernel_page(unsigned long va, phys_addr_t pa, 
pgprot_t prot);
#define IOREMAP_TOP	KVIRT_TOP
 #endif
 
+#define IOREMAP_BASE   VMALLOC_START
+
 /*
  * Just any arbitrary offset to the start of the vmalloc VM area: the
  * current 16MB value just means that there will be a 64MB "hole" after the
diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c
index 0c23660522ca..57d742509cec 100644
--- a/arch/powerpc/mm/ioremap.c
+++ b/arch/powerpc/mm/ioremap.c
@@ -1,7 +1,10 @@
 // SPDX-License-Identifier: GPL-2.0-or-later
 
 #include 
+#include 
+#include 
 #include 
+#include 
 
 unsigned long ioremap_bot;
 EXPORT_SYMBOL(ioremap_bot);
@@ -72,3 +75,31 @@ void __iomem *ioremap_prot(phys_addr_t addr, unsigned long 
size, unsigned long f
return __ioremap_caller(addr, size, pte_pgprot(pte), caller);
 }
 EXPORT_SYMBOL(ioremap_prot);
+
+/*
+ * Unmap an IO region and remove it from vmalloc'd list.
+ * Access to IO memory should be serialized by driver.
+ */
+void iounmap(volatile void __iomem *token)
+{
+   void *addr;
+
+   /*
+* If mapped by BATs then there is nothing to do.
+*/
+   if (v_block_mapped((unsigned long)token))
+   return;
+
+   if (!slab_is_available())
+   return;
+
+   addr = (void *)((unsigned long __force)PCI_FIX_ADDR(token) & PAGE_MASK);
+   if (WARN_ON((unsigned long)addr < IOREMAP_BASE))
+   return;
+   if ((unsigned long)addr >= ioremap_bot) {
+   pr_warn("Attempt to %s early bolted mapping at 0x%p\n", 
__func__, addr);
+   return;
+   }
+   vunmap(addr);
+}
+EXPORT_SYMBOL(iounmap);
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 7efdb1dee19b..4597f45e4dc6 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -110,20 +110,6 @@ __ioremap_caller(phys_addr_t addr, unsigned long size, 
pgprot_t prot, void *call
return (void __iomem *) (v + ((unsigned long)addr & ~PAGE_MASK));
 }
 
-void iounmap(volatile void __iomem *addr)
-{
-   /*
-* If mapped by BATs then there is nothing to do.
-* Calling vfree() generates a benign warning.
-*/
-   if (v_block_mapped((unsigned long)addr))
-   return;
-
-   if (addr > high_memory && (unsigned long) addr < 

[PATCH v1 06/10] powerpc/mm: make ioremap_bot common to all

2019-08-13 Thread Christophe Leroy
Drop multiple definitions of ioremap_bot and make one common to
all subarches.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/pgtable.h | 2 --
 arch/powerpc/include/asm/book3s/64/pgtable.h | 1 -
 arch/powerpc/include/asm/nohash/32/pgtable.h | 2 --
 arch/powerpc/include/asm/pgtable.h   | 2 ++
 arch/powerpc/mm/ioremap.c| 3 +++
 arch/powerpc/mm/mmu_decl.h   | 1 -
 arch/powerpc/mm/nohash/tlb.c | 2 ++
 arch/powerpc/mm/pgtable_32.c | 3 ---
 arch/powerpc/mm/pgtable_64.c | 3 ---
 9 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 838de59f6754..aa1bc5f8da90 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -201,8 +201,6 @@ int map_kernel_page(unsigned long va, phys_addr_t pa, 
pgprot_t prot);
 #include 
 #include 
 
-extern unsigned long ioremap_bot;
-
 /* Bits to mask out from a PGD to get to the PUD page */
#define PGD_MASKED_BITS	0
 
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 8308f32e9782..11819e3c755e 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -289,7 +289,6 @@ extern unsigned long __kernel_io_end;
 #define KERN_IO_END __kernel_io_end
 
 extern struct page *vmemmap;
-extern unsigned long ioremap_bot;
 extern unsigned long pci_io_base;
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h 
b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 0284f8f5305f..7ce2a7c9fade 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -11,8 +11,6 @@
#include <asm/mmu.h>	/* For sub-arch specific PPC_PIN_SIZE */
 #include 
 
-extern unsigned long ioremap_bot;
-
 #ifdef CONFIG_44x
 extern int icache_44x_need_flush;
 #endif
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index c58ba7963688..c54bb68c1354 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -68,6 +68,8 @@ extern pgd_t swapper_pg_dir[];
 
 extern void paging_init(void);
 
+extern unsigned long ioremap_bot;
+
 /*
  * kern_addr_valid is intended to indicate whether an address is a valid
  * kernel address.  Most 32-bit archs define it as always true (like this)
diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c
index a44d9e4c948a..0c23660522ca 100644
--- a/arch/powerpc/mm/ioremap.c
+++ b/arch/powerpc/mm/ioremap.c
@@ -3,6 +3,9 @@
 #include 
 #include 
 
+unsigned long ioremap_bot;
+EXPORT_SYMBOL(ioremap_bot);
+
 void __iomem *__ioremap(phys_addr_t addr, unsigned long size, unsigned long 
flags)
 {
return __ioremap_caller(addr, size, __pgprot(flags), 
__builtin_return_address(0));
diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index 32c1a191c28a..6ee64d5e2824 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -108,7 +108,6 @@ extern u8 early_hash[];
 
 #endif /* CONFIG_PPC32 */
 
-extern unsigned long ioremap_bot;
 extern unsigned long __max_low_memory;
 extern phys_addr_t __initial_memory_limit_addr;
 extern phys_addr_t total_memory;
diff --git a/arch/powerpc/mm/nohash/tlb.c b/arch/powerpc/mm/nohash/tlb.c
index d4acf6fa0596..350a54f70a37 100644
--- a/arch/powerpc/mm/nohash/tlb.c
+++ b/arch/powerpc/mm/nohash/tlb.c
@@ -704,6 +704,8 @@ static void __init early_init_mmu_global(void)
 * for use by the TLB miss code
 */
linear_map_top = memblock_end_of_DRAM();
+
+   ioremap_bot = IOREMAP_END;
 }
 
 static void __init early_mmu_set_memory_limit(void)
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 8126c2d1afbf..7efdb1dee19b 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -33,9 +33,6 @@
 
 #include 
 
-unsigned long ioremap_bot;
-EXPORT_SYMBOL(ioremap_bot);/* aka VMALLOC_END */
-
 extern char etext[], _stext[], _sinittext[], _einittext[];
 
 void __iomem *
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 0f0b1e1ea5ab..d631659c8859 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -99,9 +99,6 @@ unsigned long __pte_frag_nr;
 EXPORT_SYMBOL(__pte_frag_nr);
 unsigned long __pte_frag_size_shift;
 EXPORT_SYMBOL(__pte_frag_size_shift);
-unsigned long ioremap_bot;
-#else /* !CONFIG_PPC_BOOK3S_64 */
-unsigned long ioremap_bot = IOREMAP_END;
 #endif
 
 int __weak ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size, 
pgprot_t prot, int nid)
-- 
2.13.3



[PATCH v1 05/10] powerpc/mm: Do early ioremaps from top to bottom on PPC64 too.

2019-08-13 Thread Christophe Leroy
Until the vmalloc system is up and running, ioremap basically
allocates addresses at the border of the IOREMAP area.

On PPC32, addresses are allocated down from the top of the area
while on PPC64, addresses are allocated up from the base of the
area.

On PPC32, the base of the vmalloc area is not known yet when
ioremap() starts being used, while the end of it is fixed. On PPC64,
both the start and the end are already fixed when ioremap() starts
being used.

Changing the PPC64 behaviour is the lightest change, so change PPC64
ioremap() to allocate addresses from the top as PPC32 does.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/book3s64/hash_utils.c|  2 +-
 arch/powerpc/mm/book3s64/radix_pgtable.c |  2 +-
 arch/powerpc/mm/pgtable_64.c | 18 +-
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/hash_utils.c 
b/arch/powerpc/mm/book3s64/hash_utils.c
index e6d471058597..0f954dc40346 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -1030,7 +1030,7 @@ void __init hash__early_init_mmu(void)
__kernel_io_start = H_KERN_IO_START;
__kernel_io_end = H_KERN_IO_END;
vmemmap = (struct page *)H_VMEMMAP_START;
-   ioremap_bot = IOREMAP_BASE;
+   ioremap_bot = IOREMAP_END;
 
 #ifdef CONFIG_PCI
pci_io_base = ISA_IO_BASE;
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index b4ca9e95e678..11303e2fffb1 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -611,7 +611,7 @@ void __init radix__early_init_mmu(void)
__kernel_io_start = RADIX_KERN_IO_START;
__kernel_io_end = RADIX_KERN_IO_END;
vmemmap = (struct page *)RADIX_VMEMMAP_START;
-   ioremap_bot = IOREMAP_BASE;
+   ioremap_bot = IOREMAP_END;
 
 #ifdef CONFIG_PCI
pci_io_base = ISA_IO_BASE;
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 6fa2e969bf0e..0f0b1e1ea5ab 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -101,7 +101,7 @@ unsigned long __pte_frag_size_shift;
 EXPORT_SYMBOL(__pte_frag_size_shift);
 unsigned long ioremap_bot;
 #else /* !CONFIG_PPC_BOOK3S_64 */
-unsigned long ioremap_bot = IOREMAP_BASE;
+unsigned long ioremap_bot = IOREMAP_END;
 #endif
 
 int __weak ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size, 
pgprot_t prot, int nid)
@@ -169,11 +169,11 @@ void __iomem * __ioremap_caller(phys_addr_t addr, 
unsigned long size,
 
/*
 * Choose an address to map it to.
-* Once the imalloc system is running, we use it.
+* Once the vmalloc system is running, we use it.
 * Before that, we map using addresses going
-* up from ioremap_bot.  imalloc will use
-* the addresses from ioremap_bot through
-* IMALLOC_END
+* down from ioremap_bot.  vmalloc will use
+* the addresses from IOREMAP_BASE through
+* ioremap_bot
 * 
 */
paligned = addr & PAGE_MASK;
@@ -186,7 +186,7 @@ void __iomem * __ioremap_caller(phys_addr_t addr, unsigned 
long size,
struct vm_struct *area;
 
area = __get_vm_area_caller(size, VM_IOREMAP,
-   ioremap_bot, IOREMAP_END,
+   IOREMAP_BASE, ioremap_bot,
caller);
if (area == NULL)
return NULL;
@@ -194,9 +194,9 @@ void __iomem * __ioremap_caller(phys_addr_t addr, unsigned 
long size,
area->phys_addr = paligned;
ret = __ioremap_at(paligned, area->addr, size, prot);
} else {
-   ret = __ioremap_at(paligned, (void *)ioremap_bot, size, prot);
+   ret = __ioremap_at(paligned, (void *)ioremap_bot - size, size, 
prot);
if (ret)
-   ioremap_bot += size;
+   ioremap_bot -= size;
}
 
if (ret)
@@ -217,7 +217,7 @@ void __iounmap(volatile void __iomem *token)

addr = (void *) ((unsigned long __force)
 PCI_FIX_ADDR(token) & PAGE_MASK);
-   if ((unsigned long)addr < ioremap_bot) {
+   if ((unsigned long)addr >= ioremap_bot) {
printk(KERN_WARNING "Attempt to iounmap early bolted mapping"
   " at 0x%p\n", addr);
return;
-- 
2.13.3



[PATCH v1 02/10] powerpc/mm: rework io-workaround invocation.

2019-08-13 Thread Christophe Leroy
ppc_md.ioremap() is only used for I/O workaround on CELL platform,
so indirect function call can be avoided.

This patch reworks the io-workaround and ioremap() functions to
use static keys for the activation of io-workaround.

When CONFIG_PPC_IO_WORKAROUNDS or CONFIG_PPC_INDIRECT_MMIO are not
selected, the I/O workaround ioremap() is compiled out and the
static key is not used at all.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/io-workarounds.h | 19 +++
 arch/powerpc/include/asm/machdep.h|  2 --
 arch/powerpc/kernel/io-workarounds.c  | 11 ++-
 arch/powerpc/mm/pgtable_64.c  | 17 +
 4 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/io-workarounds.h 
b/arch/powerpc/include/asm/io-workarounds.h
index 01567ea4ceaf..ce337d17ac40 100644
--- a/arch/powerpc/include/asm/io-workarounds.h
+++ b/arch/powerpc/include/asm/io-workarounds.h
@@ -8,6 +8,7 @@
 #ifndef _IO_WORKAROUNDS_H
 #define _IO_WORKAROUNDS_H
 
+#ifdef CONFIG_PPC_IO_WORKAROUNDS
 #include 
 #include 
 
@@ -32,4 +33,22 @@ extern int spiderpci_iowa_init(struct iowa_bus *, void *);
 #define SPIDER_PCI_DUMMY_READ  0x0810
 #define SPIDER_PCI_DUMMY_READ_BASE 0x0814
 
+#endif
+
+#if defined(CONFIG_PPC_IO_WORKAROUNDS) && defined(CONFIG_PPC_INDIRECT_MMIO)
+DECLARE_STATIC_KEY_FALSE(iowa_key);
+static inline bool iowa_is_active(void)
+{
+   return static_branch_unlikely(&iowa_key);
+}
+#else
+static inline bool iowa_is_active(void)
+{
+   return false;
+}
+#endif
+
+void __iomem *iowa_ioremap(phys_addr_t addr, unsigned long size,
+  pgprot_t prot, void *caller);
+
 #endif /* _IO_WORKAROUNDS_H */
diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index 3370df4bdaa0..657ec893bdcb 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -31,8 +31,6 @@ struct pci_host_bridge;
 struct machdep_calls {
char*name;
 #ifdef CONFIG_PPC64
-   void __iomem *  (*ioremap)(phys_addr_t addr, unsigned long size,
-  pgprot_t prot, void *caller);
 #ifdef CONFIG_PM
void(*iommu_save)(void);
void(*iommu_restore)(void);
diff --git a/arch/powerpc/kernel/io-workarounds.c 
b/arch/powerpc/kernel/io-workarounds.c
index fbd2d0007c52..8b5b2aa70840 100644
--- a/arch/powerpc/kernel/io-workarounds.c
+++ b/arch/powerpc/kernel/io-workarounds.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 
+DEFINE_STATIC_KEY_FALSE(iowa_key);
 
 #define IOWA_MAX_BUS   8
 
@@ -149,8 +150,8 @@ static const struct ppc_pci_io iowa_pci_io = {
 };
 
 #ifdef CONFIG_PPC_INDIRECT_MMIO
-static void __iomem *iowa_ioremap(phys_addr_t addr, unsigned long size,
- pgprot_t prot, void *caller)
+void __iomem *iowa_ioremap(phys_addr_t addr, unsigned long size,
+  pgprot_t prot, void *caller)
 {
struct iowa_bus *bus;
void __iomem *res = __ioremap_caller(addr, size, prot, caller);
@@ -163,8 +164,6 @@ static void __iomem *iowa_ioremap(phys_addr_t addr, 
unsigned long size,
}
return res;
 }
-#else /* CONFIG_PPC_INDIRECT_MMIO */
-#define iowa_ioremap NULL
 #endif /* !CONFIG_PPC_INDIRECT_MMIO */
 
 /* Enable IO workaround */
@@ -175,7 +174,9 @@ static void io_workaround_init(void)
if (io_workaround_inited)
return;
ppc_pci_io = iowa_pci_io;
-   ppc_md.ioremap = iowa_ioremap;
+#ifdef CONFIG_PPC_INDIRECT_MMIO
+   static_branch_enable(&iowa_key);
+#endif
io_workaround_inited = 1;
 }
 
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 11eb90ea2d4f..194efc6f39fb 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -214,8 +215,8 @@ void __iomem * ioremap(phys_addr_t addr, unsigned long size)
pgprot_t prot = pgprot_noncached(PAGE_KERNEL);
void *caller = __builtin_return_address(0);
 
-   if (ppc_md.ioremap)
-   return ppc_md.ioremap(addr, size, prot, caller);
+   if (iowa_is_active())
+   return iowa_ioremap(addr, size, prot, caller);
return __ioremap_caller(addr, size, prot, caller);
 }
 
@@ -224,8 +225,8 @@ void __iomem * ioremap_wc(phys_addr_t addr, unsigned long 
size)
pgprot_t prot = pgprot_noncached_wc(PAGE_KERNEL);
void *caller = __builtin_return_address(0);
 
-   if (ppc_md.ioremap)
-   return ppc_md.ioremap(addr, size, prot, caller);
+   if (iowa_is_active())
+   return iowa_ioremap(addr, size, prot, caller);
return __ioremap_caller(addr, size, prot, caller);
 }
 
@@ -234,8 +235,8 @@ void __iomem *ioremap_coherent(phys_addr_t addr, unsigned 
long size)
pgprot_t prot = pgprot_cached(PAGE_KERNEL);
void *caller = 

[PATCH v1 01/10] powerpc/mm: drop ppc_md.iounmap()

2019-08-13 Thread Christophe Leroy
ppc_md.iounmap() is never set, drop it.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/machdep.h | 2 --
 arch/powerpc/mm/pgtable_64.c   | 5 +
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index c43d6eca9edd..3370df4bdaa0 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -33,8 +33,6 @@ struct machdep_calls {
 #ifdef CONFIG_PPC64
void __iomem *  (*ioremap)(phys_addr_t addr, unsigned long size,
   pgprot_t prot, void *caller);
-   void(*iounmap)(volatile void __iomem *token);
-
 #ifdef CONFIG_PM
void(*iommu_save)(void);
void(*iommu_restore)(void);
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 9ad59b733984..11eb90ea2d4f 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -285,10 +285,7 @@ void __iounmap(volatile void __iomem *token)
 
 void iounmap(volatile void __iomem *token)
 {
-   if (ppc_md.iounmap)
-   ppc_md.iounmap(token);
-   else
-   __iounmap(token);
+   __iounmap(token);
 }
 
 EXPORT_SYMBOL(ioremap);
-- 
2.13.3



[PATCH v1 03/10] powerpc/mm: move common 32/64 bits ioremap functions into ioremap.c

2019-08-13 Thread Christophe Leroy
ioremap(), __ioremap(), ioremap_wc() and ioremap_coherent() are
now identical on PPC32 and PPC64 as iowa_is_active() will always
return false on PPC32. Move them into a new common location called
ioremap.c.

Although ioremap_wt() only exists on PPC32, move it into ioremap.c
as well. As it is the only one specific to PPC32, it is not worth
creating an ioremap_32.c file, and leaving it in pgtable_32.c would
make it the only ioremap function in that file at the end of the
series.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/Makefile |  2 +-
 arch/powerpc/mm/ioremap.c| 52 
 arch/powerpc/mm/pgtable_32.c | 43 
 arch/powerpc/mm/pgtable_64.c | 39 -
 4 files changed, 53 insertions(+), 83 deletions(-)
 create mode 100644 arch/powerpc/mm/ioremap.c

diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 0f499db315d6..29c682fe9144 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -7,7 +7,7 @@ ccflags-$(CONFIG_PPC64) := $(NO_MINIMAL_TOC)
 
 obj-y  := fault.o mem.o pgtable.o mmap.o \
   init_$(BITS).o pgtable_$(BITS).o \
-  pgtable-frag.o \
+  pgtable-frag.o ioremap.o \
   init-common.o mmu_context.o drmem.o
 obj-$(CONFIG_PPC_MMU_NOHASH)   += nohash/
 obj-$(CONFIG_PPC_BOOK3S_32)+= book3s32/
diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c
new file mode 100644
index ..89479ee88344
--- /dev/null
+++ b/arch/powerpc/mm/ioremap.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include 
+#include 
+
+void __iomem *__ioremap(phys_addr_t addr, unsigned long size, unsigned long 
flags)
+{
+   return __ioremap_caller(addr, size, __pgprot(flags), 
__builtin_return_address(0));
+}
+EXPORT_SYMBOL(__ioremap);
+
+void __iomem *ioremap(phys_addr_t addr, unsigned long size)
+{
+   pgprot_t prot = pgprot_noncached(PAGE_KERNEL);
+   void *caller = __builtin_return_address(0);
+
+   if (iowa_is_active())
+   return iowa_ioremap(addr, size, prot, caller);
+   return __ioremap_caller(addr, size, prot, caller);
+}
+EXPORT_SYMBOL(ioremap);
+
+void __iomem *ioremap_wc(phys_addr_t addr, unsigned long size)
+{
+   pgprot_t prot = pgprot_noncached_wc(PAGE_KERNEL);
+   void *caller = __builtin_return_address(0);
+
+   if (iowa_is_active())
+   return iowa_ioremap(addr, size, prot, caller);
+   return __ioremap_caller(addr, size, prot, caller);
+}
+EXPORT_SYMBOL(ioremap_wc);
+
+#ifdef CONFIG_PPC32
+void __iomem *ioremap_wt(phys_addr_t addr, unsigned long size)
+{
+   pgprot_t prot = pgprot_cached_wthru(PAGE_KERNEL);
+
+   return __ioremap_caller(addr, size, prot, __builtin_return_address(0));
+}
+EXPORT_SYMBOL(ioremap_wt);
+#endif
+
+void __iomem *ioremap_coherent(phys_addr_t addr, unsigned long size)
+{
+   pgprot_t prot = pgprot_cached(PAGE_KERNEL);
+   void *caller = __builtin_return_address(0);
+
+   if (iowa_is_active())
+   return iowa_ioremap(addr, size, prot, caller);
+   return __ioremap_caller(addr, size, prot, caller);
+}
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 35cb96cfc258..1999ec11706d 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -39,42 +39,6 @@ EXPORT_SYMBOL(ioremap_bot);  /* aka VMALLOC_END */
 extern char etext[], _stext[], _sinittext[], _einittext[];
 
 void __iomem *
-ioremap(phys_addr_t addr, unsigned long size)
-{
-   pgprot_t prot = pgprot_noncached(PAGE_KERNEL);
-
-   return __ioremap_caller(addr, size, prot, __builtin_return_address(0));
-}
-EXPORT_SYMBOL(ioremap);
-
-void __iomem *
-ioremap_wc(phys_addr_t addr, unsigned long size)
-{
-   pgprot_t prot = pgprot_noncached_wc(PAGE_KERNEL);
-
-   return __ioremap_caller(addr, size, prot, __builtin_return_address(0));
-}
-EXPORT_SYMBOL(ioremap_wc);
-
-void __iomem *
-ioremap_wt(phys_addr_t addr, unsigned long size)
-{
-   pgprot_t prot = pgprot_cached_wthru(PAGE_KERNEL);
-
-   return __ioremap_caller(addr, size, prot, __builtin_return_address(0));
-}
-EXPORT_SYMBOL(ioremap_wt);
-
-void __iomem *
-ioremap_coherent(phys_addr_t addr, unsigned long size)
-{
-   pgprot_t prot = pgprot_cached(PAGE_KERNEL);
-
-   return __ioremap_caller(addr, size, prot, __builtin_return_address(0));
-}
-EXPORT_SYMBOL(ioremap_coherent);
-
-void __iomem *
 ioremap_prot(phys_addr_t addr, unsigned long size, unsigned long flags)
 {
pte_t pte = __pte(flags);
@@ -92,12 +56,6 @@ ioremap_prot(phys_addr_t addr, unsigned long size, unsigned long flags)
 EXPORT_SYMBOL(ioremap_prot);
 
 void __iomem *
-__ioremap(phys_addr_t addr, unsigned long size, unsigned long flags)
-{
-   return __ioremap_caller(addr, size, __pgprot(flags), 

[PATCH v1 04/10] powerpc/mm: move ioremap_prot() into ioremap.c

2019-08-13 Thread Christophe Leroy
Both versions of ioremap_prot() are identical; move them into ioremap.c.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/ioremap.c| 19 +++
 arch/powerpc/mm/pgtable_32.c | 17 -
 arch/powerpc/mm/pgtable_64.c | 24 
 3 files changed, 19 insertions(+), 41 deletions(-)

diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c
index 89479ee88344..a44d9e4c948a 100644
--- a/arch/powerpc/mm/ioremap.c
+++ b/arch/powerpc/mm/ioremap.c
@@ -50,3 +50,22 @@ void __iomem *ioremap_coherent(phys_addr_t addr, unsigned long size)
return iowa_ioremap(addr, size, prot, caller);
return __ioremap_caller(addr, size, prot, caller);
 }
+
+void __iomem *ioremap_prot(phys_addr_t addr, unsigned long size, unsigned long flags)
+{
+   pte_t pte = __pte(flags);
+   void *caller = __builtin_return_address(0);
+
+   /* writeable implies dirty for kernel addresses */
+   if (pte_write(pte))
+   pte = pte_mkdirty(pte);
+
+   /* we don't want to let _PAGE_USER and _PAGE_EXEC leak out */
+   pte = pte_exprotect(pte);
+   pte = pte_mkprivileged(pte);
+
+   if (iowa_is_active())
+   return iowa_ioremap(addr, size, pte_pgprot(pte), caller);
+   return __ioremap_caller(addr, size, pte_pgprot(pte), caller);
+}
+EXPORT_SYMBOL(ioremap_prot);
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 1999ec11706d..8126c2d1afbf 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -39,23 +39,6 @@ EXPORT_SYMBOL(ioremap_bot);  /* aka VMALLOC_END */
 extern char etext[], _stext[], _sinittext[], _einittext[];
 
 void __iomem *
-ioremap_prot(phys_addr_t addr, unsigned long size, unsigned long flags)
-{
-   pte_t pte = __pte(flags);
-
-   /* writeable implies dirty for kernel addresses */
-   if (pte_write(pte))
-   pte = pte_mkdirty(pte);
-
-   /* we don't want to let _PAGE_USER and _PAGE_EXEC leak out */
-   pte = pte_exprotect(pte);
-   pte = pte_mkprivileged(pte);
-
-   return __ioremap_caller(addr, size, pte_pgprot(pte), __builtin_return_address(0));
-}
-EXPORT_SYMBOL(ioremap_prot);
-
-void __iomem *
 __ioremap_caller(phys_addr_t addr, unsigned long size, pgprot_t prot, void *caller)
 {
unsigned long v, i;
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 3ad921ac4862..6fa2e969bf0e 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -204,29 +204,6 @@ void __iomem * __ioremap_caller(phys_addr_t addr, unsigned long size,
return ret;
 }
 
-void __iomem * ioremap_prot(phys_addr_t addr, unsigned long size,
-unsigned long flags)
-{
-   pte_t pte = __pte(flags);
-   void *caller = __builtin_return_address(0);
-
-   /* writeable implies dirty for kernel addresses */
-   if (pte_write(pte))
-   pte = pte_mkdirty(pte);
-
-   /* we don't want to let _PAGE_EXEC leak out */
-   pte = pte_exprotect(pte);
-   /*
-* Force kernel mapping.
-*/
-   pte = pte_mkprivileged(pte);
-
-   if (iowa_is_active())
-   return iowa_ioremap(addr, size, pte_pgprot(pte), caller);
-   return __ioremap_caller(addr, size, pte_pgprot(pte), caller);
-}
-
-
 /*  
  * Unmap an IO region and remove it from imalloc'd list.
  * Access to IO memory should be serialized by driver.
@@ -253,7 +230,6 @@ void iounmap(volatile void __iomem *token)
__iounmap(token);
 }
 
-EXPORT_SYMBOL(ioremap_prot);
 EXPORT_SYMBOL(__ioremap_at);
 EXPORT_SYMBOL(iounmap);
 EXPORT_SYMBOL(__iounmap);
-- 
2.13.3



Re: [PATCH v2 2/3] powerpc/rtas: allow rescheduling while changing cpu states

2019-08-13 Thread Nathan Lynch
Gautham R Shenoy  writes:

> On Sat, Aug 3, 2019 at 1:03 AM Nathan Lynch  wrote:
>>
>> rtas_cpu_state_change_mask() potentially operates on scores of cpus,
>> so explicitly allow rescheduling in the loop body.
>>
>
> Are we seeing softlockups/rcu stalls while running this ?

I have not seen a report yet, but since the loop is bound only by the
number of processors in the LPAR I suspect it's only a matter of time.

> Reviewed-by: Gautham R. Shenoy 

Thanks!


Re: [PATCH v2 2/3] powerpc/rtas: allow rescheduling while changing cpu states

2019-08-13 Thread Gautham R Shenoy
On Sat, Aug 3, 2019 at 1:03 AM Nathan Lynch  wrote:
>
> rtas_cpu_state_change_mask() potentially operates on scores of cpus,
> so explicitly allow rescheduling in the loop body.
>

Are we seeing softlockups/rcu stalls while running this ?

> Signed-off-by: Nathan Lynch 

Reviewed-by: Gautham R. Shenoy 

> ---
>  arch/powerpc/kernel/rtas.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 05824eb4323b..b7ca2fde68a9 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -16,6 +16,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -898,6 +899,7 @@ static int rtas_cpu_state_change_mask(enum rtas_cpu_state 
> state,
> cpumask_clear_cpu(cpu, cpus);
> }
> }
> +   cond_resched();
> }
>
> return ret;
> --
> 2.20.1
>


-- 
Thanks and Regards
gautham.


[RFC PATCH] bpf: handle 32-bit zext during constant blinding

2019-08-13 Thread Naveen N. Rao
Since BPF constant blinding is performed after the verifier pass, certain
ALU32 instructions inserted by blinding don't have the corresponding zext
instruction inserted after them. This causes a kernel oops on powerpc,
which can be reproduced by running 'test_cgroup_storage' with
bpf_jit_harden=2.

Fix this by emitting BPF_ZEXT during constant blinding if
prog->aux->verifier_zext is set.
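
For illustration, a blinded 32-bit ALU immediate then ends up as the
following sequence (this mirrors the BPF_ALU32 hunk below, it is not
new behaviour beyond the added zext):

	*to++ = BPF_ALU32_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ from->imm);
	*to++ = BPF_ALU32_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
	*to++ = BPF_ALU32_REG(from->code, from->dst_reg, BPF_REG_AX);
	*to++ = BPF_ZEXT_REG(from->dst_reg);	/* new: explicit zero-extension */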

Fixes: a4b1d3c1ddf6cb ("bpf: verifier: insert zero extension according to 
analysis result")
Reported-by: Michael Ellerman 
Signed-off-by: Naveen N. Rao 
---
This approach (the location where zext is being introduced below, in 
particular) works for powerpc, but I am not entirely sure if this is 
sufficient for other architectures as well. This is broken on v5.3-rc4.

- Naveen


 kernel/bpf/core.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 8191a7db2777..d84146e6fd9e 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -890,7 +890,8 @@ int bpf_jit_get_func_addr(const struct bpf_prog *prog,
 
 static int bpf_jit_blind_insn(const struct bpf_insn *from,
  const struct bpf_insn *aux,
- struct bpf_insn *to_buff)
+ struct bpf_insn *to_buff,
+ bool emit_zext)
 {
struct bpf_insn *to = to_buff;
u32 imm_rnd = get_random_int();
@@ -939,6 +940,8 @@ static int bpf_jit_blind_insn(const struct bpf_insn *from,
*to++ = BPF_ALU32_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ from->imm);
*to++ = BPF_ALU32_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
*to++ = BPF_ALU32_REG(from->code, from->dst_reg, BPF_REG_AX);
+   if (emit_zext)
+   *to++ = BPF_ZEXT_REG(from->dst_reg);
break;
 
case BPF_ALU64 | BPF_ADD | BPF_K:
@@ -992,6 +995,10 @@ static int bpf_jit_blind_insn(const struct bpf_insn *from,
off -= 2;
*to++ = BPF_ALU32_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ from->imm);
*to++ = BPF_ALU32_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
+   if (emit_zext) {
+   *to++ = BPF_ZEXT_REG(BPF_REG_AX);
+   off--;
+   }
*to++ = BPF_JMP32_REG(from->code, from->dst_reg, BPF_REG_AX,
  off);
break;
@@ -1005,6 +1012,8 @@ static int bpf_jit_blind_insn(const struct bpf_insn *from,
case 0: /* Part 2 of BPF_LD | BPF_IMM | BPF_DW. */
*to++ = BPF_ALU32_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ aux[0].imm);
*to++ = BPF_ALU32_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
+   if (emit_zext)
+   *to++ = BPF_ZEXT_REG(BPF_REG_AX);
*to++ = BPF_ALU64_REG(BPF_OR,  aux[0].dst_reg, BPF_REG_AX);
break;
 
@@ -1088,7 +1097,8 @@ struct bpf_prog *bpf_jit_blind_constants(struct bpf_prog *prog)
insn[1].code == 0)
memcpy(aux, insn, sizeof(aux));
 
-   rewritten = bpf_jit_blind_insn(insn, aux, insn_buff);
+   rewritten = bpf_jit_blind_insn(insn, aux, insn_buff,
+   clone->aux->verifier_zext);
if (!rewritten)
continue;
 
-- 
2.22.0



Re: [PATCH v2 1/3] powerpc/rtas: use device model APIs and serialization during LPM

2019-08-13 Thread Gautham R Shenoy
Hello Nathan,

On Sat, Aug 3, 2019 at 1:06 AM Nathan Lynch  wrote:
>
> The LPAR migration implementation and userspace-initiated cpu hotplug
> can interleave their executions like so:
>
> 1. Set cpu 7 offline via sysfs.
>
> 2. Begin a partition migration, whose implementation requires the OS
>to ensure all present cpus are online; cpu 7 is onlined:
>
>  rtas_ibm_suspend_me -> rtas_online_cpus_mask -> cpu_up
>
>This sets cpu 7 online in all respects except for the cpu's
>corresponding struct device; dev->offline remains true.
>
> 3. Set cpu 7 online via sysfs. _cpu_up() determines that cpu 7 is
>already online and returns success. The driver core (device_online)
>sets dev->offline = false.
>
> 4. The migration completes and restores cpu 7 to offline state:
>
>  rtas_ibm_suspend_me -> rtas_offline_cpus_mask -> cpu_down
>
> This leaves cpu7 in a state where the driver core considers the cpu
> device online, but in all other respects it is offline and
> unused. Attempts to online the cpu via sysfs appear to succeed but the
> driver core actually does not pass the request to the lower-level
> cpuhp support code. This makes the cpu unusable until the cpu device
> is manually set offline and then online again via sysfs.
>
> Instead of directly calling cpu_up/cpu_down, the migration code should
> use the higher-level device core APIs to maintain consistent state and
> serialize operations.
>
> Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to 
> migration/hibernation")
> Signed-off-by: Nathan Lynch 

Looks good to me. This locking scheme makes the code consistent with
dlpar_cpu() which also uses the  high-level device APIs.

Reviewed-by: Gautham R. Shenoy 
> ---
>  arch/powerpc/kernel/rtas.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 5faf0a64c92b..05824eb4323b 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -871,15 +871,17 @@ static int rtas_cpu_state_change_mask(enum 
> rtas_cpu_state state,
> return 0;
>
> for_each_cpu(cpu, cpus) {
> +   struct device *dev = get_cpu_device(cpu);
> +
> switch (state) {
> case DOWN:
> -   cpuret = cpu_down(cpu);
> +   cpuret = device_offline(dev);
> break;
> case UP:
> -   cpuret = cpu_up(cpu);
> +   cpuret = device_online(dev);
> break;
> }
> -   if (cpuret) {
> +   if (cpuret < 0) {
> pr_debug("%s: cpu_%s for cpu#%d returned %d.\n",
> __func__,
> ((state == UP) ? "up" : "down"),
> @@ -968,6 +970,8 @@ int rtas_ibm_suspend_me(u64 handle)
> data.token = rtas_token("ibm,suspend-me");
> data.complete = &done;
>
> +   lock_device_hotplug();
> +
> /* All present CPUs must be online */
> cpumask_andnot(offline_mask, cpu_present_mask, cpu_online_mask);
> cpuret = rtas_online_cpus_mask(offline_mask);
> @@ -1006,6 +1010,7 @@ int rtas_ibm_suspend_me(u64 handle)
> __func__);
>
>  out:
> +   unlock_device_hotplug();
> free_cpumask_var(offline_mask);
> return atomic_read(&data.error);
>  }
> --
> 2.20.1
>


-- 
Thanks and Regards
gautham.


[REGRESSION] Boot failure with DEBUG_PAGEALLOC on Wii, after PPC32 KASAN patches

2019-08-13 Thread Jonathan Neuschäfer
Hi,

I noticed that my Nintendo Wii doesn't boot with wii_defconfig plus
CONFIG_DEBUG_PAGEALLOC=y and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
on recent kernels. I get a splash like this one:

[0.022245] BUG: Unable to handle kernel data access at 0x6601
[0.025172] Faulting instruction address: 0xc01afa48
[0.027522] Oops: Kernel access of bad area, sig: 11 [#1]
[0.030076] BE PAGE_SIZE=4K MMU=Hash PREEMPT DEBUG_PAGEALLOC wii
[0.032917] Modules linked in:
[0.034368] CPU: 0 PID: 0 Comm: swapper Not tainted 
5.1.0-rc3-wii-00151-g9a634f40158a #1337
[0.038318] NIP:  c01afa48 LR: c0195fd0 CTR: 
[0.040707] REGS: c0c15e78 TRAP: 0300   Not tainted  
(5.1.0-rc3-wii-00151-g9a634f40158a)
[0.044531] MSR:  9032   CR: 84000844  XER: 
[0.047708] DAR: 6601 DSISR: 4000
[0.047708] GPR00: c0919998 c0c15f30 c0bad460 c0bad434   
 01010101
[0.047708] GPR08: 0002 0001 0002 0110 44000842 7b67efdb 
b3a9f2fa 7763f327
[0.047708] GPR16: f5bff97f 797ebc55 3aafa378 e76bacd3 af931fb0  
013de444 00d009b0
[0.047708] GPR24: c0951504 c0c3 d3efdcc0 c0951504 c0951500  
c0878fe0 c0878fe0
[0.065470] NIP [c01afa48] fs_context_for_mount+0x8/0x1c
[0.067988] LR [c0195fd0] vfs_kern_mount.part.6+0x24/0xb0
[0.070540] Call Trace:
[0.071699] [c0c15f40] [c019404c] get_fs_type+0x98/0x14c
[0.074214] [c0c15f60] [c0919998] mnt_init+0x16c/0x264
[0.076645] [c0c15f90] [c0919594] vfs_caches_init+0x7c/0x94
[0.079283] [c0c15fb0] [c0900c34] start_kernel+0x41c/0x480
[0.081878] [c0c15ff0] [346c] 0x346c
[0.083731] Instruction dump:
[0.085135] 7d005028 31080001 7d00512d 40a2fff4 2f9a 419e000c 387a0054 
48195e99
[0.088805] 935f000c 4bfffef4 9421fff0 7c852378 <80066601> 00725100 3880 
38210010
[0.092568] ---[ end trace 7373e1c0f977bdb3 ]---
[0.094750]
[1.083137] Kernel panic - not syncing: Attempted to kill the idle task!

(Without CONFIG_DEBUG_PAGEALLOC I haven't noticed any problems.)


'git bisect' says:

72f208c6a8f7bc78ef5248babd9e6ed6302bd2a0 is the first bad commit
commit 72f208c6a8f7bc78ef5248babd9e6ed6302bd2a0
Author: Christophe Leroy 
Date:   Fri Apr 26 16:23:35 2019 +

powerpc/32s: move hash code patching out of MMU_init_hw()

For KASAN, hash table handling will be activated early for
accessing to KASAN shadow areas.

In order to avoid any modification of the hash functions while
they are still used with the early hash table, the code patching
is moved out of MMU_init_hw() and put close to the big-bang switch
to the final hash table.

Signed-off-by: Christophe Leroy 
Signed-off-by: Michael Ellerman 


I can revert this commit, and then 5.3-rc2 (plus a patchset adding a
serial driver) boot again.

Christophe, is there anything I should test in order to figure out how
to fix this properly?


Thanks,
Jonathan Neuschäfer


signature.asc
Description: PGP signature


[Bug 204371] BUG kmalloc-4k (Tainted: G W ): Object padding overwritten

2019-08-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=204371

Christophe Leroy (christophe.le...@c-s.fr) changed:

   What|Removed |Added

 CC||christophe.le...@c-s.fr

--- Comment #16 from Christophe Leroy (christophe.le...@c-s.fr) ---
Interesting.

I see in that commit that in fs/btrfs/free-space-cache.c, copy_page() is
called with entry->bitmap as the destination.

entry->bitmap is allocated with kmalloc() so there is a possibility that
entry->bitmap is not page aligned.

copy_page() in arch/powerpc/kernel/misc_32.S assumes that source and
destination are aligned on cache lines at least.
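
To illustrate the pattern (a sketch with made-up names, not the actual
btrfs code):

	static void *copy_bitmap_sketch(void *src)
	{
		/* kmalloc() only guarantees ARCH_KMALLOC_MINALIGN alignment */
		void *bitmap = kmalloc(PAGE_SIZE, GFP_NOFS);

		if (bitmap)
			copy_page(bitmap, src); /* needs cache-line aligned dst/src */
		return bitmap;
	}

A plain memcpy(bitmap, src, PAGE_SIZE) would not have that alignment
requirement.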

-- 
You are receiving this mail because:
You are on the CC list for the bug.

Re: [PATCH v4 13/25] powernv/fadump: support copying multiple kernel memory regions

2019-08-13 Thread Mahesh J Salgaonkar
On 2019-07-16 17:03:30 Tue, Hari Bathini wrote:
> Firmware uses 32-bit field for region size while copying/backing-up
> memory during MPIPL. So, the maximum copy size for a region would
> be a page less than 4GB (aligned to pagesize) but FADump capture
> kernel usually needs more memory than that to be preserved to avoid
> running into out of memory errors.
> 
> So, request firmware to copy multiple kernel memory regions instead
> of just one (which worked fine for pseries as 64-bit field was used
> for size there). With support to copy multiple kernel memory regions,
> also handle holes in the memory area to be preserved. Support as many
> as 128 kernel memory regions. This allows having an adequate FADump
> capture kernel size for different scenarios.

Can you split this patch into two? One for handling holes in boot memory,
and the other for handling the 4GB region size? That would make the
changes easier to review.

Thanks,
-Mahesh.

> 
> Signed-off-by: Hari Bathini 
> ---
>  arch/powerpc/kernel/fadump-common.c  |   15 ++
>  arch/powerpc/kernel/fadump-common.h  |   16 ++
>  arch/powerpc/kernel/fadump.c |  173 
> ++
>  arch/powerpc/platforms/powernv/opal-fadump.c |   25 +++-
>  arch/powerpc/platforms/powernv/opal-fadump.h |5 -
>  arch/powerpc/platforms/pseries/rtas-fadump.c |   12 ++
>  arch/powerpc/platforms/pseries/rtas-fadump.h |5 +
>  7 files changed, 211 insertions(+), 40 deletions(-)
> 



Re: [PATCH v4 12/25] powernv/fadump: define register/un-register callback functions

2019-08-13 Thread Mahesh J Salgaonkar
On 2019-07-16 17:03:23 Tue, Hari Bathini wrote:
> Make OPAL calls to register and un-register with firmware for MPIPL.
> 
> Signed-off-by: Hari Bathini 
> ---
>  arch/powerpc/platforms/powernv/opal-fadump.c |   71 
> +-
>  1 file changed, 69 insertions(+), 2 deletions(-)
> 
[...]
> @@ -88,12 +104,63 @@ static int opal_fadump_setup_kernel_metadata(struct 
> fw_dump *fadump_conf)
>  
>  static int opal_fadump_register_fadump(struct fw_dump *fadump_conf)
>  {
> - return -EIO;
> + int i, err = -EIO;
> + s64 rc;
> +
> + for (i = 0; i < opal_fdm->region_cnt; i++) {
> + rc = opal_mpipl_update(OPAL_MPIPL_ADD_RANGE,
> +opal_fdm->rgn[i].src,
> +opal_fdm->rgn[i].dest,
> +opal_fdm->rgn[i].size);
> + if (rc != OPAL_SUCCESS)

On error, you may want to remove the ranges which have been added so far
and reset opal_fdm->registered_regions.

> + break;
> +
> + opal_fdm->registered_regions++;
> + }
> +
> + switch (rc) {
> + case OPAL_SUCCESS:
> + pr_info("Registration is successful!\n");
> + fadump_conf->dump_registered = 1;
> + err = 0;
> + break;
> + case OPAL_UNSUPPORTED:
> + pr_err("Support not available.\n");
> + fadump_conf->fadump_supported = 0;
> + fadump_conf->fadump_enabled = 0;
> + break;
> + case OPAL_INTERNAL_ERROR:
> + pr_err("Failed to register. Hardware Error(%lld).\n", rc);
> + break;
> + case OPAL_PARAMETER:
> + pr_err("Failed to register. Parameter Error(%lld).\n", rc);
> + break;
> + case OPAL_PERMISSION:

You may want to remove this check. With the latest opal mpipl patches,
opal_mpipl_update() no longer returns OPAL_PERMISSION.

Even if opal did, we cannot say fadump is already registered just by
looking at the return status of a single entry addition.

Thanks,
-Mahesh.

> + pr_err("Already registered!\n");
> + fadump_conf->dump_registered = 1;
> + err = -EEXIST;
> + break;
> + default:
> + pr_err("Failed to register. Unknown Error(%lld).\n", rc);
> + break;
> + }
> +
> + return err;
>  }



Re: [PATCH v2 2/3] KVM: PPC: Book3S HV: Don't push XIVE context when not using XIVE device

2019-08-13 Thread Cédric Le Goater
On 13/08/2019 12:01, Paul Mackerras wrote:
> At present, when running a guest on POWER9 using HV KVM but not using
> an in-kernel interrupt controller (XICS or XIVE), for example if QEMU
> is run with the kernel_irqchip=off option, the guest entry code goes
> ahead and tries to load the guest context into the XIVE hardware, even
> though no context has been set up.
> 
> To fix this, we check that the "CAM word" is non-zero before pushing
> it to the hardware.  The CAM word is initialized to a non-zero value
> in kvmppc_xive_connect_vcpu() and kvmppc_xive_native_connect_vcpu(),
> and is now cleared in kvmppc_xive_{,native_}cleanup_vcpu.

If a "CAM word" is defined, it means the vCPU (VP) was enabled at the
XIVE HW level. So this is the criteria to consider that a vCPU needs
to update (push) its XIVE thread interrupt context when scheduled
to run.


Reviewed-by: Cédric Le Goater 

Thanks,

C.

> 
> Cc: sta...@vger.kernel.org # v4.11+
> Reported-by: Cédric Le Goater 
> Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt 
> controller")
> Signed-off-by: Paul Mackerras 
> ---
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |  2 ++
>  arch/powerpc/kvm/book3s_xive.c  | 11 ++-
>  arch/powerpc/kvm/book3s_xive_native.c   |  3 +++
>  3 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index 2e7e788..07181d0 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -942,6 +942,8 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
>   ld  r11, VCPU_XIVE_SAVED_STATE(r4)
>   li  r9, TM_QW1_OS
>   lwz r8, VCPU_XIVE_CAM_WORD(r4)
> + cmpwi   r8, 0
> + beq no_xive
>   li  r7, TM_QW1_OS + TM_WORD2
>   mfmsr   r0
>   andi.   r0, r0, MSR_DR  /* in real mode? */
> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> index 09f838a..586867e 100644
> --- a/arch/powerpc/kvm/book3s_xive.c
> +++ b/arch/powerpc/kvm/book3s_xive.c
> @@ -67,8 +67,14 @@ void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu)
>   void __iomem *tima = local_paca->kvm_hstate.xive_tima_virt;
>   u64 pq;
>  
> - if (!tima)
> + /*
> +  * Nothing to do if the platform doesn't have a XIVE
> +  * or this vCPU doesn't have its own XIVE context
> +  * (e.g. because it's not using an in-kernel interrupt controller).
> +  */
> + if (!tima || !vcpu->arch.xive_cam_word)
>   return;
> +
>   eieio();
>   __raw_writeq(vcpu->arch.xive_saved_state.w01, tima + TM_QW1_OS);
>   __raw_writel(vcpu->arch.xive_cam_word, tima + TM_QW1_OS + TM_WORD2);
> @@ -1146,6 +1152,9 @@ void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu)
>   /* Disable the VP */
>   xive_native_disable_vp(xc->vp_id);
>  
> + /* Clear the cam word so guest entry won't try to push context */
> + vcpu->arch.xive_cam_word = 0;
> +
>   /* Free the queues */
>   for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) {
>   struct xive_q *q = &xc->queues[i];
> diff --git a/arch/powerpc/kvm/book3s_xive_native.c 
> b/arch/powerpc/kvm/book3s_xive_native.c
> index 368427f..11b91b4 100644
> --- a/arch/powerpc/kvm/book3s_xive_native.c
> +++ b/arch/powerpc/kvm/book3s_xive_native.c
> @@ -81,6 +81,9 @@ void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu)
>   /* Disable the VP */
>   xive_native_disable_vp(xc->vp_id);
>  
> + /* Clear the cam word so guest entry won't try to push context */
> + vcpu->arch.xive_cam_word = 0;
> +
>   /* Free the queues */
>   for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) {
>   kvmppc_xive_native_cleanup_queue(vcpu, i);
> 



[Bug 204479] KASAN hit at modprobe zram

2019-08-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=204479

Erhard F. (erhar...@mailbox.org) changed:

   What|Removed |Added

 Attachment #284271|0   |1
is obsolete||

--- Comment #21 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 284361
  --> https://bugzilla.kernel.org/attachment.cgi?id=284361&action=edit
kernel .config (5.3-rc4, PowerMac G4 DP)

-- 
You are receiving this mail because:
You are on the CC list for the bug.

[Bug 204479] KASAN hit at modprobe zram

2019-08-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=204479

--- Comment #20 from Erhard F. (erhar...@mailbox.org) ---
(In reply to Christophe Leroy from comment #18)
> Two possibilities, either the value in .rodata.cst16 is wrong or the stack
> gets corrupted.
> 
> Maybe you could try disabling KASAN in lib/raid6/Makefile for altivec8.o ?
> Or maybe for the entire lib/raid6/ directory, just to see what happens ?
Disabled KASAN with KASAN_SANITIZE := n in lib/raid6/Makefile. As you can see
in my latest dmesg, the G4 continues booting without further issues.

If btrfs gets loaded it still fails with KASAN (will update bug #204397).

Another funny issue. Mounting my nfs share works via:
modprobe nfs
mount /media/distanthome

If I mount it without modprobing nfs beforehand I get:
[...]
[   66.271748]
==
[   66.272076] BUG: KASAN: global-out-of-bounds in _copy_to_iter+0x3d4/0x5a8
[   66.272331] Write of size 4096 at addr f1c27000 by task modprobe/312

[   66.272598] CPU: 0 PID: 312 Comm: modprobe Tainted: GW
5.3.0-rc4+ #1
[   66.272883] Call Trace:
[   66.272964] [e100b848] [c075026c] dump_stack+0xb0/0x10c (unreliable)
[   66.273211] [e100b878] [c02334a8] print_address_description+0x80/0x45c
[   66.273456] [e100b908] [c0233128] __kasan_report+0x140/0x188
[   66.273667] [e100b948] [c0233fbc] check_memory_region+0x28/0x184
[   66.273889] [e100b958] [c023206c] memcpy+0x48/0x74
[   66.274061] [e100b978] [c044342c] _copy_to_iter+0x3d4/0x5a8
[   66.274265] [e100baa8] [c04437a8] copy_page_to_iter+0x90/0x550
[   66.274482] [e100bb08] [c01b6898] generic_file_read_iter+0x5c8/0x7bc
[   66.274720] [e100bb78] [c0249034] __vfs_read+0x1b0/0x1f4
[   66.274912] [e100bca8] [c0249134] vfs_read+0xbc/0x124
[   66.275094] [e100bcd8] [c02491f0] kernel_read+0x54/0x70
[   66.275284] [e100bd08] [c02535c8] kernel_read_file+0x240/0x358
[   66.275499] [e100bdb8] [c02537cc] kernel_read_file_from_fd+0x54/0x74
[   66.275737] [e100bdf8] [c01068ac] sys_finit_module+0xd8/0x140
[   66.275949] [e100bf38] [c001a274] ret_from_syscall+0x0/0x34
[   66.276152] --- interrupt: c01 at 0xa602c4
   LR = 0xbe87c4


[   66.276417] Memory state around the buggy address:
[   66.276588]  f1c27a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   66.276824]  f1c27a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   66.277060] >f1c27b00: 00 00 00 00 00 00 00 00 05 fa fa fa fa fa fa fa
[   66.277293]^
[   66.277453]  f1c27b80: 07 fa fa fa fa fa fa fa 00 03 fa fa fa fa fa fa
[   66.277688]  f1c27c00: 04 fa fa fa fa fa fa fa 00 06 fa fa fa fa fa fa
[   66.277920]
==
[   66.428224] RPC: Registered named UNIX socket transport module.
[   66.428484] RPC: Registered udp transport module.
[   66.428647] RPC: Registered tcp transport module.
[   66.428809] RPC: Registered tcp NFSv4.1 backchannel transport module.
[   66.741275] Key type dns_resolver registered
[   67.974192] NFS: Registering the id_resolver key type
[   67.974534] Key type id_resolver registered
[   67.974681] Key type id_legacy registered


But maybe it's better to not open too many ppc32 KASAN related bugs for now. ;)
It probably can wait until your patches are in some later 5.3-rc, I guess.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

[Bug 204479] KASAN hit at modprobe zram

2019-08-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=204479

--- Comment #19 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 284355
  --> https://bugzilla.kernel.org/attachment.cgi?id=284355&action=edit
dmesg (kernel 5.3-rc4 + shadow patch + parallel patch, PowerMac G4 DP)

-- 
You are receiving this mail because:
You are on the CC list for the bug.

RE: [EXT] Re: [PATCHv5 1/2] PCI: layerscape: Add the bar_fixed_64bit property in EP driver.

2019-08-13 Thread Xiaowei Bao


> -Original Message-
> From: Lorenzo Pieralisi 
> Sent: 2019年8月13日 18:04
> To: Xiaowei Bao 
> Cc: bhelg...@google.com; M.h. Lian ; Mingkai Hu
> ; Roy Zang ;
> l.st...@pengutronix.de; kis...@ti.com; tpie...@impinj.com; Leonard
> Crestez ; andrew.smir...@gmail.com;
> yue.w...@amlogic.com; hayashi.kunih...@socionext.com;
> d...@amazon.co.uk; jon...@amazon.com; linux-...@vger.kernel.org;
> linux-ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org;
> linux-arm-ker...@lists.infradead.org
> Subject: [EXT] Re: [PATCHv5 1/2] PCI: layerscape: Add the bar_fixed_64bit
> property in EP driver.
> 
> Caution: EXT Email
> 
> git log --oneline --follow drivers/pci/controller/dwc/pci-layerscape.c
> 
> Do you see any commit with a $SUBJECT ending with a period ?
> 
> There is not. So remove it from yours too.
OK, thanks a lot, I will remove it in the next version of the patch. I have
to get approval from the IT team of our company first.
> 
> On Tue, Aug 13, 2019 at 02:28:39PM +0800, Xiaowei Bao wrote:
> > The PCIe controller of layerscape just have 4 BARs, BAR0 and BAR1 is
> > 32bit, BAR2 and BAR4 is 64bit, this is determined by hardware, so set
> > the bar_fixed_64bit with 0x14.
> >
> > Signed-off-by: Xiaowei Bao 
> > ---
> > v2:
> >  - Replace value 0x14 with a macro.
> > v3:
> >  - No change.
> > v4:
> >  - send the patch again with '--to'.
> > v5:
> >  - fix the commit message.
> >
> >  drivers/pci/controller/dwc/pci-layerscape-ep.c | 1 +
> >  1 file changed, 1 insertion(+)
> 
> scripts/get_maintainer.pl -f drivers/pci/controller/dwc/pci-layerscape-ep.c
> Now, with the output you get justify all the people you send this email to.
> 
> So, again, trim the CC list and it is the last time I tell you.
Do you mean that I should use 'scripts/get_maintainer.pl -f drivers/pci/controller/
dwc/pci-layerscape-ep.c' to get the list of people I need to send to? I used the
command 'scripts/get_maintainer.pl *.patch' to get the mail list before.
If yes, I will use the command you provided. Thanks a lot.
> 
> Before sending patches on mailing lists use git --dry-run to check the emails
> you are sending.
> 
> Thanks,
> Lorenzo
> 
> > diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > index be61d96..ca9aa45 100644
> > --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > @@ -44,6 +44,7 @@ static const struct pci_epc_features
> ls_pcie_epc_features = {
> >   .linkup_notifier = false,
> >   .msi_capable = true,
> >   .msix_capable = false,
> > + .bar_fixed_64bit = (1 << BAR_2) | (1 << BAR_4),
> >  };
> >
> >  static const struct pci_epc_features*
> > --
> > 2.9.5
> >


Re: [PATCH v4 11/25] powernv/fadump: register kernel metadata address with opal

2019-08-13 Thread Mahesh J Salgaonkar
On 2019-07-16 17:03:15 Tue, Hari Bathini wrote:
> OPAL allows registering address with it in the first kernel and
> retrieving it after MPIPL. Setup kernel metadata and register its
> address with OPAL to use it for processing the crash dump.
> 
> Signed-off-by: Hari Bathini 
> ---
>  arch/powerpc/kernel/fadump-common.h  |4 +
>  arch/powerpc/kernel/fadump.c |   65 ++-
>  arch/powerpc/platforms/powernv/opal-fadump.c |   73 
> ++
>  arch/powerpc/platforms/powernv/opal-fadump.h |   37 +
>  arch/powerpc/platforms/pseries/rtas-fadump.c |   32 +--
>  5 files changed, 177 insertions(+), 34 deletions(-)
>  create mode 100644 arch/powerpc/platforms/powernv/opal-fadump.h
> 
[...]
> @@ -346,30 +349,42 @@ int __init fadump_reserve_mem(void)
>* use memblock_find_in_range() here since it doesn't allocate
>* from bottom to top.
>*/
> - for (base = fw_dump.boot_memory_size;
> -  base <= (memory_boundary - size);
> -  base += size) {
> + while (base <= (memory_boundary - size)) {
>   if (memblock_is_region_memory(base, size) &&
>   !memblock_is_region_reserved(base, size))
>   break;
> +
> + base += size;
>   }
> - if ((base > (memory_boundary - size)) ||
> - memblock_reserve(base, size)) {
> +
> + if (base > (memory_boundary - size)) {
> + pr_err("Failed to find memory chunk for reservation\n");
> + goto error_out;
> + }
> + fw_dump.reserve_dump_area_start = base;
> +
> + /*
> +  * Calculate the kernel metadata address and register it with
> +  * f/w if the platform supports.
> +  */
> + if (fw_dump.ops->setup_kernel_metadata(_dump) < 0)
> + goto error_out;

I see setup_kernel_metadata() registers the metadata address with opal without
having any minimum data initialized in it. Secondly, why can't this wait until
registration? I think we should defer this until fadump registration.
What if the kernel crashes before the metadata area is initialized?

> +
> + if (memblock_reserve(base, size)) {
>   pr_err("Failed to reserve memory\n");
> - return 0;
> + goto error_out;
>   }
[...]
> -
>  static struct fadump_ops rtas_fadump_ops = {
> - .init_fadump_mem_struct = rtas_fadump_init_mem_struct,
> - .register_fadump= rtas_fadump_register_fadump,
> - .unregister_fadump  = rtas_fadump_unregister_fadump,
> - .invalidate_fadump  = rtas_fadump_invalidate_fadump,
> - .process_fadump = rtas_fadump_process_fadump,
> - .fadump_region_show = rtas_fadump_region_show,
> - .fadump_trigger = rtas_fadump_trigger,
> + .init_fadump_mem_struct = rtas_fadump_init_mem_struct,
> + .get_kernel_metadata_size   = rtas_fadump_get_kernel_metadata_size,
> + .setup_kernel_metadata  = rtas_fadump_setup_kernel_metadata,
> + .register_fadump= rtas_fadump_register_fadump,
> + .unregister_fadump  = rtas_fadump_unregister_fadump,
> + .invalidate_fadump  = rtas_fadump_invalidate_fadump,
> + .process_fadump = rtas_fadump_process_fadump,
> + .fadump_region_show = rtas_fadump_region_show,
> + .fadump_trigger = rtas_fadump_trigger,

Can you make the tab/space changes in your previous patch, where these
members were initially introduced? That way this patch only shows the new
members that are added.

Thanks,
-Mahesh.



[PATCH v2 1/3] KVM: PPC: Book3S HV: Fix race in re-enabling XIVE escalation interrupts

2019-08-13 Thread Paul Mackerras
Escalation interrupts are interrupts sent to the host by the XIVE
hardware when it has an interrupt to deliver to a guest VCPU but that
VCPU is not running anywhere in the system.  Hence we disable the
escalation interrupt for the VCPU being run when we enter the guest
and re-enable it when the guest does an H_CEDE hypercall indicating
it is idle.

It is possible that an escalation interrupt gets generated just as we
are entering the guest.  In that case the escalation interrupt may be
using a queue entry in one of the interrupt queues, and that queue
entry may not have been processed when the guest exits with an H_CEDE.
The existing entry code detects this situation and does not clear the
vcpu->arch.xive_esc_on flag as an indication that there is a pending
queue entry (if the queue entry gets processed, xive_esc_irq() will
clear the flag).  There is a comment in the code saying that if the
flag is still set on H_CEDE, we have to abort the cede rather than
re-enabling the escalation interrupt, lest we end up with two
occurrences of the escalation interrupt in the interrupt queue.

However, the exit code doesn't do that; it aborts the cede in the sense
that vcpu->arch.ceded gets cleared, but it still enables the escalation
interrupt by setting the source's PQ bits to 00.  Instead we need to
set the PQ bits to 10, indicating that an interrupt has been triggered.
We also need to avoid setting vcpu->arch.xive_esc_on in this case
(i.e. vcpu->arch.xive_esc_on seen to be set on H_CEDE) because
xive_esc_irq() will run at some point and clear it, and if we race with
that we may end up with an incorrect result (i.e. xive_esc_on set when
the escalation interrupt has just been handled).
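
For reference, and as my own summary rather than something from the patch,
the ESB PQ states behave roughly as follows:

  PQ = 00  enabled: the next event triggers a notification (queue entry)
  PQ = 10  pending: an event is already queued; further events only set Q
  PQ = 01  off/masked: events do not generate a notification
  PQ = 11  pending, with a further event latched in Q

So setting PQ to 10 records that a queue entry may exist while preventing
a second notification from being generated.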

It is extremely unlikely that having two queue entries would cause
observable problems; theoretically it could cause queue overflow, but
the CPU would have to have thousands of interrupts targetted to it for
that to be possible.  However, this fix will also make it possible to
determine accurately whether there is an unhandled escalation
interrupt in the queue, which will be needed by the following patch.

Cc: sta...@vger.kernel.org # v4.16+
Fixes: 9b9b13a6d153 ("KVM: PPC: Book3S HV: Keep XIVE escalation interrupt 
masked unless ceded")
Signed-off-by: Paul Mackerras 
---
v2: don't set xive_esc_on if we're not using a XIVE escalation
interrupt.

 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 36 +
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 337e644..2e7e788 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2831,29 +2831,39 @@ kvm_cede_prodded:
 kvm_cede_exit:
ld  r9, HSTATE_KVM_VCPU(r13)
 #ifdef CONFIG_KVM_XICS
-   /* Abort if we still have a pending escalation */
+   /* are we using XIVE with single escalation? */
+   ld  r10, VCPU_XIVE_ESC_VADDR(r9)
+   cmpdi   r10, 0
+   beq 3f
+   li  r6, XIVE_ESB_SET_PQ_00
+   /*
+* If we still have a pending escalation, abort the cede,
+* and we must set PQ to 10 rather than 00 so that we don't
+* potentially end up with two entries for the escalation
+* interrupt in the XIVE interrupt queue.  In that case
+* we also don't want to set xive_esc_on to 1 here in
+* case we race with xive_esc_irq().
+*/
lbz r5, VCPU_XIVE_ESC_ON(r9)
cmpwi   r5, 0
-   beq 1f
+   beq 4f
li  r0, 0
stb r0, VCPU_CEDED(r9)
-1: /* Enable XIVE escalation */
-   li  r5, XIVE_ESB_SET_PQ_00
+   li  r6, XIVE_ESB_SET_PQ_10
+   b   5f
+4: li  r0, 1
+   stb r0, VCPU_XIVE_ESC_ON(r9)
+   /* make sure store to xive_esc_on is seen before xive_esc_irq runs */
+   sync
+5: /* Enable XIVE escalation */
mfmsr   r0
andi.   r0, r0, MSR_DR  /* in real mode? */
beq 1f
-   ld  r10, VCPU_XIVE_ESC_VADDR(r9)
-   cmpdi   r10, 0
-   beq 3f
-   ldx r0, r10, r5
+   ldx r0, r10, r6
b   2f
 1: ld  r10, VCPU_XIVE_ESC_RADDR(r9)
-   cmpdi   r10, 0
-   beq 3f
-   ldcix   r0, r10, r5
+   ldcix   r0, r10, r6
 2: sync
-   li  r0, 1
-   stb r0, VCPU_XIVE_ESC_ON(r9)
 #endif /* CONFIG_KVM_XICS */
 3: b   guest_exit_cont
 
-- 
2.7.4



[PATCH v2 0/3] powerpc/xive: Fix race condition leading to host crashes and hangs

2019-08-13 Thread Paul Mackerras
This series fixes a race condition that has been observed in testing
on POWER9 machines running KVM guests.  An interrupt being freed by
free_irq() can have an instance present in a XIVE interrupt queue,
which can then be presented to the generic interrupt code after the
data structures for it have been freed, leading to a variety of
crashes and hangs.

This series is based on current upstream kernel source plus Cédric Le
Goater's patch "KVM: PPC: Book3S HV: XIVE: Free escalation interrupts
before disabling the VP", which is a pre-requisite for this series.
As it touches both KVM and generic PPC code, this series will probably
go in via Michael Ellerman's powerpc tree.

V2 of this patch series adds a patch fixing a bug noticed by Cédric,
and also fixes a bug in patch 1/2 of the v1 series.

Paul.

 arch/powerpc/include/asm/xive.h |  8 +++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 38 +-
 arch/powerpc/kvm/book3s_xive.c  | 42 +++-
 arch/powerpc/kvm/book3s_xive.h  |  2 +
 arch/powerpc/kvm/book3s_xive_native.c   |  6 +++
 arch/powerpc/sysdev/xive/common.c   | 87 -
 6 files changed, 146 insertions(+), 37 deletions(-)


[PATCH v2 2/3] KVM: PPC: Book3S HV: Don't push XIVE context when not using XIVE device

2019-08-13 Thread Paul Mackerras
At present, when running a guest on POWER9 using HV KVM but not using
an in-kernel interrupt controller (XICS or XIVE), for example if QEMU
is run with the kernel_irqchip=off option, the guest entry code goes
ahead and tries to load the guest context into the XIVE hardware, even
though no context has been set up.

To fix this, we check that the "CAM word" is non-zero before pushing
it to the hardware.  The CAM word is initialized to a non-zero value
in kvmppc_xive_connect_vcpu() and kvmppc_xive_native_connect_vcpu(),
and is now cleared in kvmppc_xive_{,native_}cleanup_vcpu.

Cc: sta...@vger.kernel.org # v4.11+
Reported-by: Cédric Le Goater 
Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt 
controller")
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  2 ++
 arch/powerpc/kvm/book3s_xive.c  | 11 ++-
 arch/powerpc/kvm/book3s_xive_native.c   |  3 +++
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 2e7e788..07181d0 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -942,6 +942,8 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
ld  r11, VCPU_XIVE_SAVED_STATE(r4)
li  r9, TM_QW1_OS
lwz r8, VCPU_XIVE_CAM_WORD(r4)
+   cmpwi   r8, 0
+   beq no_xive
li  r7, TM_QW1_OS + TM_WORD2
mfmsr   r0
andi.   r0, r0, MSR_DR  /* in real mode? */
diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index 09f838a..586867e 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -67,8 +67,14 @@ void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu)
void __iomem *tima = local_paca->kvm_hstate.xive_tima_virt;
u64 pq;
 
-   if (!tima)
+   /*
+* Nothing to do if the platform doesn't have a XIVE
+* or this vCPU doesn't have its own XIVE context
+* (e.g. because it's not using an in-kernel interrupt controller).
+*/
+   if (!tima || !vcpu->arch.xive_cam_word)
return;
+
eieio();
__raw_writeq(vcpu->arch.xive_saved_state.w01, tima + TM_QW1_OS);
__raw_writel(vcpu->arch.xive_cam_word, tima + TM_QW1_OS + TM_WORD2);
@@ -1146,6 +1152,9 @@ void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu)
/* Disable the VP */
xive_native_disable_vp(xc->vp_id);
 
+   /* Clear the cam word so guest entry won't try to push context */
+   vcpu->arch.xive_cam_word = 0;
+
/* Free the queues */
for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) {
struct xive_q *q = &xc->queues[i];
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index 368427f..11b91b4 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -81,6 +81,9 @@ void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu)
/* Disable the VP */
xive_native_disable_vp(xc->vp_id);
 
+   /* Clear the cam word so guest entry won't try to push context */
+   vcpu->arch.xive_cam_word = 0;
+
/* Free the queues */
for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) {
kvmppc_xive_native_cleanup_queue(vcpu, i);
-- 
2.7.4



[PATCH v2 3/3] powerpc/xive: Implement get_irqchip_state method for XIVE to fix shutdown race

2019-08-13 Thread Paul Mackerras
Testing has revealed the existence of a race condition where a XIVE
interrupt being shut down can be in one of the XIVE interrupt queues
(of which there are up to 8 per CPU, one for each priority) at the
point where free_irq() is called.  If this happens, can return an
interrupt number which has been shut down.  This can lead to various
symptoms:

- irq_to_desc(irq) can be NULL.  In this case, no end-of-interrupt
  function gets called, resulting in the CPU's elevated interrupt
  priority (numerically lowered CPPR) never gets reset.  That then
  means that the CPU stops processing interrupts, causing device
  timeouts and other errors in various device drivers.

- The irq descriptor or related data structures can be in the process
  of being freed as the interrupt code is using them.  This typically
  leads to crashes due to bad pointer dereferences.

This race is basically what commit 62e0468650c3 ("genirq: Add optional
hardware synchronization for shutdown", 2019-06-28) is intended to
fix, given a get_irqchip_state() method for the interrupt controller
being used.  It works by polling the interrupt controller when an
interrupt is being freed until the controller says it is not pending.

With XIVE, the PQ bits of the interrupt source indicate the state of
the interrupt source, and in particular the P bit goes from 0 to 1 at
the point where the hardware writes an entry into the interrupt queue
that this interrupt is directed towards.  Normally, the code will then
process the interrupt and do an end-of-interrupt (EOI) operation which
will reset PQ to 00 (assuming another interrupt hasn't been generated
in the meantime).  However, there are situations where the code resets
P even though a queue entry exists (for example, by setting PQ to 01,
which disables the interrupt source), and also situations where the
code leaves P at 1 after removing the queue entry (for example, this
is done for escalation interrupts so they cannot fire again until
they are explicitly re-enabled).

The code already has a 'saved_p' flag for the interrupt source which
indicates that a queue entry exists, although it isn't maintained
consistently.  This patch adds a 'stale_p' flag to indicate that
P has been left at 1 after processing a queue entry, and adds code
to set and clear saved_p and stale_p as necessary to maintain a
consistent indication of whether a queue entry may or may not exist.

With this, we can implement xive_get_irqchip_state() by looking at
stale_p, saved_p and the ESB PQ bits for the interrupt.
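
Roughly, the new method amounts to the following simplified sketch (the
sysdev/xive/common.c hunk of this patch is the authoritative version;
XIVE_ESB_INVALID is a constant introduced by this patch):

	static int xive_get_irqchip_state(struct irq_data *data,
					  enum irqchip_irq_state which, bool *state)
	{
		struct xive_irq_data *xd = irq_data_get_irq_handler_data(data);
		u8 pq;

		switch (which) {
		case IRQCHIP_STATE_ACTIVE:
			pq = xive_esb_read(xd, XIVE_ESB_GET);
			/*
			 * A queue entry may exist if P is set and not known
			 * to be stale, or if saved_p recorded one.
			 */
			*state = (pq != XIVE_ESB_INVALID) && !xd->stale_p &&
				 (xd->saved_p || !!(pq & XIVE_ESB_VAL_P));
			return 0;
		default:
			return -EINVAL;
		}
	}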

There is some additional code to handle escalation interrupts
properly; because they are enabled and disabled in KVM assembly code,
which does not have access to the xive_irq_data struct for the
escalation interrupt.  Hence, stale_p may be incorrect when the
escalation interrupt is freed in kvmppc_xive_{,native_}cleanup_vcpu().
Fortunately, we can fix it up by looking at vcpu->arch.xive_esc_on,
with some careful attention to barriers in order to ensure the correct
result if xive_esc_irq() races with kvmppc_xive_cleanup_vcpu().

Finally, this adds code to make noise on the console (pr_crit and
WARN_ON(1)) if we find an interrupt queue entry for an interrupt
which does not have a descriptor.  While this won't catch the race
reliably, if it does get triggered it will be an indication that
the race is occurring and needs to be debugged.

Signed-off-by: Paul Mackerras 
---
v2: call xive_cleanup_single_escalation
from kvmppc_xive_native_cleanup_vcpu() too.

 arch/powerpc/include/asm/xive.h   |  8 
 arch/powerpc/kvm/book3s_xive.c| 31 +
 arch/powerpc/kvm/book3s_xive.h|  2 +
 arch/powerpc/kvm/book3s_xive_native.c |  3 ++
 arch/powerpc/sysdev/xive/common.c | 87 ++-
 5 files changed, 108 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h
index e401698..efb0e59 100644
--- a/arch/powerpc/include/asm/xive.h
+++ b/arch/powerpc/include/asm/xive.h
@@ -46,7 +46,15 @@ struct xive_irq_data {
 
/* Setup/used by frontend */
int target;
+   /*
+* saved_p means that there is a queue entry for this interrupt
+* in some CPU's queue (not including guest vcpu queues), even
+* if P is not set in the source ESB.
+* stale_p means that there is no queue entry for this interrupt
+* in some CPU's queue, even if P is set in the source ESB.
+*/
bool saved_p;
+   bool stale_p;
 };
 #define XIVE_IRQ_FLAG_STORE_EOI0x01
 #define XIVE_IRQ_FLAG_LSI  0x02
diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index 586867e..591bfb4 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -166,6 +166,9 @@ static irqreturn_t xive_esc_irq(int irq, void *data)
 */
vcpu->arch.xive_esc_on = false;
 
+   /* This orders xive_esc_on = false vs. subsequent stale_p = true */
+   

Re: [PATCHv5 1/2] PCI: layerscape: Add the bar_fixed_64bit property in EP driver.

2019-08-13 Thread Lorenzo Pieralisi
git log --oneline --follow drivers/pci/controller/dwc/pci-layerscape.c

Do you see any commit with a $SUBJECT ending with a period ?

There is not. So remove it from yours too.

On Tue, Aug 13, 2019 at 02:28:39PM +0800, Xiaowei Bao wrote:
> The PCIe controller of layerscape just have 4 BARs, BAR0 and BAR1
> is 32bit, BAR2 and BAR4 is 64bit, this is determined by hardware,
> so set the bar_fixed_64bit with 0x14.
> 
> Signed-off-by: Xiaowei Bao 
> ---
> v2:
>  - Replace value 0x14 with a macro.
> v3:
>  - No change.
> v4:
>  - send the patch again with '--to'.
> v5:
>  - fix the commit message.
> 
>  drivers/pci/controller/dwc/pci-layerscape-ep.c | 1 +
>  1 file changed, 1 insertion(+)

scripts/get_maintainer.pl -f drivers/pci/controller/dwc/pci-layerscape-ep.c
Now, with the output you get justify all the people you send this email
to.

So, again, trim the CC list and it is the last time I tell you.

Before sending patches on mailing lists use git --dry-run to check
the emails you are sending.

Thanks,
Lorenzo

> diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c 
> b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> index be61d96..ca9aa45 100644
> --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> @@ -44,6 +44,7 @@ static const struct pci_epc_features ls_pcie_epc_features = 
> {
>   .linkup_notifier = false,
>   .msi_capable = true,
>   .msix_capable = false,
> + .bar_fixed_64bit = (1 << BAR_2) | (1 << BAR_4),
>  };
>  
>  static const struct pci_epc_features*
> -- 
> 2.9.5
> 


[PATCH 2/2] powerpc/32: replace LOAD_MSR_KERNEL() by LOAD_REG_IMMEDIATE()

2019-08-13 Thread Christophe Leroy
LOAD_MSR_KERNEL() and LOAD_REG_IMMEDIATE() do the same thing in the
same way. Drop LOAD_MSR_KERNEL().

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/entry_32.S | 18 +-
 arch/powerpc/kernel/head_32.h  | 21 -
 2 files changed, 13 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 54fab22c9a43..972b05504a0a 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -230,7 +230,7 @@ transfer_to_handler_cont:
 */
lis r12,reenable_mmu@h
ori r12,r12,reenable_mmu@l
-   LOAD_MSR_KERNEL(r0, MSR_KERNEL)
+   LOAD_REG_IMMEDIATE(r0, MSR_KERNEL)
mtspr   SPRN_SRR0,r12
mtspr   SPRN_SRR1,r0
SYNC
@@ -304,7 +304,7 @@ stack_ovf:
addir1,r1,THREAD_SIZE-STACK_FRAME_OVERHEAD
lis r9,StackOverflow@ha
addir9,r9,StackOverflow@l
-   LOAD_MSR_KERNEL(r10,MSR_KERNEL)
+   LOAD_REG_IMMEDIATE(r10,MSR_KERNEL)
 #if defined(CONFIG_PPC_8xx) && defined(CONFIG_PERF_EVENTS)
mtspr   SPRN_NRI, r0
 #endif
@@ -324,7 +324,7 @@ trace_syscall_entry_irq_off:
bl  trace_hardirqs_on
 
/* Now enable for real */
-   LOAD_MSR_KERNEL(r10, MSR_KERNEL | MSR_EE)
+   LOAD_REG_IMMEDIATE(r10, MSR_KERNEL | MSR_EE)
mtmsr   r10
 
REST_GPR(0, r1)
@@ -394,7 +394,7 @@ ret_from_syscall:
 #endif
mr  r6,r3
/* disable interrupts so current_thread_info()->flags can't change */
-   LOAD_MSR_KERNEL(r10,MSR_KERNEL) /* doesn't include MSR_EE */
+   LOAD_REG_IMMEDIATE(r10,MSR_KERNEL)  /* doesn't include MSR_EE */
/* Note: We don't bother telling lockdep about it */
SYNC
MTMSRD(r10)
@@ -824,7 +824,7 @@ ret_from_except:
 * can't change between when we test it and when we return
 * from the interrupt. */
/* Note: We don't bother telling lockdep about it */
-   LOAD_MSR_KERNEL(r10,MSR_KERNEL)
+   LOAD_REG_IMMEDIATE(r10,MSR_KERNEL)
SYNC/* Some chip revs have problems here... */
MTMSRD(r10) /* disable interrupts */
 
@@ -991,7 +991,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_NEED_PAIRED_STWCX)
 * can restart the exception exit path at the label
 * exc_exit_restart below.  -- paulus
 */
-   LOAD_MSR_KERNEL(r10,MSR_KERNEL & ~MSR_RI)
+   LOAD_REG_IMMEDIATE(r10,MSR_KERNEL & ~MSR_RI)
SYNC
MTMSRD(r10) /* clear the RI bit */
.globl exc_exit_restart
@@ -1066,7 +1066,7 @@ exc_exit_restart_end:
REST_NVGPRS(r1);\
lwz r3,_MSR(r1);\
andi.   r3,r3,MSR_PR;   \
-   LOAD_MSR_KERNEL(r10,MSR_KERNEL);\
+   LOAD_REG_IMMEDIATE(r10,MSR_KERNEL); \
bne user_exc_return;\
lwz r0,GPR0(r1);\
lwz r2,GPR2(r1);\
@@ -1236,7 +1236,7 @@ recheck:
 * neither. Those disable/enable cycles used to peek at
 * TI_FLAGS aren't advertised.
 */
-   LOAD_MSR_KERNEL(r10,MSR_KERNEL)
+   LOAD_REG_IMMEDIATE(r10,MSR_KERNEL)
SYNC
MTMSRD(r10) /* disable interrupts */
lwz r9,TI_FLAGS(r2)
@@ -1329,7 +1329,7 @@ _GLOBAL(enter_rtas)
lwz r4,RTASBASE(r4)
mfmsr   r9
stw r9,8(r1)
-   LOAD_MSR_KERNEL(r0,MSR_KERNEL)
+   LOAD_REG_IMMEDIATE(r0,MSR_KERNEL)
SYNC/* disable interrupts so SRR0/1 */
MTMSRD(r0)  /* don't get trashed */
li  r9,MSR_KERNEL & ~(MSR_IR|MSR_DR)
diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index 4a692553651f..8abc7783dbe5 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -5,19 +5,6 @@
 #include /* for STACK_FRAME_REGS_MARKER */
 
 /*
- * MSR_KERNEL is > 0x8000 on 4xx/Book-E since it include MSR_CE.
- */
-.macro __LOAD_MSR_KERNEL r, x
-.if \x >= 0x8000
-   lis \r, (\x)@h
-   ori \r, \r, (\x)@l
-.else
-   li \r, (\x)
-.endif
-.endm
-#define LOAD_MSR_KERNEL(r, x) __LOAD_MSR_KERNEL r, x
-
-/*
  * Exception entry code.  This code runs with address translation
  * turned off, i.e. using physical addresses.
  * We assume sprg3 has the physical address of the current
@@ -92,7 +79,7 @@
 #ifdef CONFIG_40x
rlwinm  r9,r9,0,14,12   /* clear MSR_WE (necessary?) */
 #else
-   LOAD_MSR_KERNEL(r10, MSR_KERNEL & ~(MSR_IR|MSR_DR)) /* can take exceptions */
+   LOAD_REG_IMMEDIATE(r10, MSR_KERNEL & ~(MSR_IR|MSR_DR)) /* can take exceptions */
MTMSRD(r10) /* (except for mach check 

[PATCH 1/2] powerpc: rewrite LOAD_REG_IMMEDIATE() as an intelligent macro

2019-08-13 Thread Christophe Leroy
Today LOAD_REG_IMMEDIATE() is a basic #define which loads all
parts of a value into a register, including the parts that are zero.

This means always 2 instructions on PPC32 and always 5 instructions
on PPC64. And those instructions cannot run in parallel as they are
all updating the same register.

Ex: LOAD_REG_IMMEDIATE(r1,THREAD_SIZE) in head_64.S results in:

3c 20 00 00 lis r1,0
60 21 00 00 ori r1,r1,0
78 21 07 c6 rldicr  r1,r1,32,31
64 21 00 00 orisr1,r1,0
60 21 40 00 ori r1,r1,16384

Rewrite LOAD_REG_IMMEDIATE() with a GAS macro in order to skip
the parts that are zero.

Rename existing LOAD_REG_IMMEDIATE() as LOAD_REG_IMMEDIATE_SYM()
and use that one for loading value of symbols which are not known
at compile time.

Now LOAD_REG_IMMEDIATE(r1,THREAD_SIZE) in head_64.S results in:

38 20 40 00 li  r1,16384
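
(For a constant with all four halfwords non-zero, for example
LOAD_REG_IMMEDIATE(r1, 0x123456789abcdef0), the macro still emits the
full lis/ori/rldicr/oris/ori sequence; the win is for the common case
of constants with zero parts.)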

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/ppc_asm.h   | 42 +++-
 arch/powerpc/kernel/exceptions-64e.S | 10 -
 arch/powerpc/kernel/head_64.S|  2 +-
 3 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
index e0637730a8e7..9a7c2ca9b714 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -311,13 +311,43 @@ GLUE(.,name):
addis   reg,reg,(name - 0b)@ha; \
addireg,reg,(name - 0b)@l;
 
-#ifdef __powerpc64__
-#ifdef HAVE_AS_ATHIGH
+#if defined(__powerpc64__) && defined(HAVE_AS_ATHIGH)
 #define __AS_ATHIGH high
 #else
 #define __AS_ATHIGH h
 #endif
-#define LOAD_REG_IMMEDIATE(reg,expr)   \
+
+.macro __LOAD_REG_IMMEDIATE_32 r, x
+   .if (\x) >= 0x8000 || (\x) < -0x8000
+   lis \r, (\x)@__AS_ATHIGH
+   .if (\x) & 0xffff != 0
+   ori \r, \r, (\x)@l
+   .endif
+   .else
+   li \r, (\x)@l
+   .endif
+.endm
+
+.macro __LOAD_REG_IMMEDIATE r, x
+   .if \x & ~0xffffffff != 0
+   __LOAD_REG_IMMEDIATE_32 \r, (\x) >> 32
+   rldicr  \r, \r, 32, 31
+   .if (\x) & 0xffff0000 != 0
+   oris \r, \r, (\x)@__AS_ATHIGH
+   .endif
+   .if (\x) & 0xffff != 0
+   ori \r, \r, (\x)@l
+   .endif
+   .else
+   __LOAD_REG_IMMEDIATE_32 \r, \x
+   .endif
+.endm
+
+#ifdef __powerpc64__
+
+#define LOAD_REG_IMMEDIATE(reg, expr) __LOAD_REG_IMMEDIATE reg, expr
+
+#define LOAD_REG_IMMEDIATE_SYM(reg,expr)   \
lis reg,(expr)@highest; \
ori reg,reg,(expr)@higher;  \
rldicr  reg,reg,32,31;  \
@@ -335,11 +365,13 @@ GLUE(.,name):
 
 #else /* 32-bit */
 
-#define LOAD_REG_IMMEDIATE(reg,expr)   \
+#define LOAD_REG_IMMEDIATE(reg, expr) __LOAD_REG_IMMEDIATE_32 reg, expr
+
+#define LOAD_REG_IMMEDIATE_SYM(reg,expr)   \
lis reg,(expr)@ha;  \
addireg,reg,(expr)@l;
 
-#define LOAD_REG_ADDR(reg,name)LOAD_REG_IMMEDIATE(reg, name)
+#define LOAD_REG_ADDR(reg,name)LOAD_REG_IMMEDIATE_SYM(reg, name)
 
 #define LOAD_REG_ADDRBASE(reg, name)   lis reg,name@ha
 #define ADDROFF(name)  name@l
diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index 1cfb3da4a84a..898aae6da167 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -751,8 +751,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
ld  r14,interrupt_base_book3e@got(r15)
ld  r15,__end_interrupts@got(r15)
 #else
-   LOAD_REG_IMMEDIATE(r14,interrupt_base_book3e)
-   LOAD_REG_IMMEDIATE(r15,__end_interrupts)
+   LOAD_REG_IMMEDIATE_SYM(r14,interrupt_base_book3e)
+   LOAD_REG_IMMEDIATE_SYM(r15,__end_interrupts)
 #endif
cmpld   cr0,r10,r14
cmpld   cr1,r10,r15
@@ -821,8 +821,8 @@ kernel_dbg_exc:
ld  r14,interrupt_base_book3e@got(r15)
ld  r15,__end_interrupts@got(r15)
 #else
-   LOAD_REG_IMMEDIATE(r14,interrupt_base_book3e)
-   LOAD_REG_IMMEDIATE(r15,__end_interrupts)
+   LOAD_REG_IMMEDIATE_SYM(r14,interrupt_base_book3e)
+   LOAD_REG_IMMEDIATE_SYM(r15,__end_interrupts)
 #endif
cmpld   cr0,r10,r14
cmpld   cr1,r10,r15
@@ -1449,7 +1449,7 @@ a2_tlbinit_code_start:
 a2_tlbinit_after_linear_map:
 
/* Now we branch the new virtual address mapped by this entry */
-   LOAD_REG_IMMEDIATE(r3,1f)
+   LOAD_REG_IMMEDIATE_SYM(r3,1f)
mtctr   r3
bctr
 
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index 91d297e696dd..1fd44761e997 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -635,7 +635,7 @@ __after_prom_start:
sub r5,r5,r11
 #else
/* just copy interrupts */
-   LOAD_REG_IMMEDIATE(r5, FIXED_SYMBOL_ABS_ADDR(__end_interrupts))
+   LOAD_REG_IMMEDIATE_SYM(r5, FIXED_SYMBOL_ABS_ADDR(__end_interrupts))

Re: [EXT] Re: [PATCHv5 1/2] PCI: layerscape: Add the bar_fixed_64bit property in EP driver.

2019-08-13 Thread Lorenzo Pieralisi
You should fix your email client set-up to avoid sticking an [EXT]
tag to your emails' $SUBJECT.

On Tue, Aug 13, 2019 at 07:39:48AM +, Xiaowei Bao wrote:
> 
> 
> > -Original Message-
> > From: Kishon Vijay Abraham I 
> > Sent: 2019年8月13日 15:30
> > To: Xiaowei Bao ; lorenzo.pieral...@arm.com;
> > bhelg...@google.com; M.h. Lian ; Mingkai Hu
> > ; Roy Zang ;
> > l.st...@pengutronix.de; tpie...@impinj.com; Leonard Crestez
> > ; andrew.smir...@gmail.com;
> > yue.w...@amlogic.com; hayashi.kunih...@socionext.com;
> > d...@amazon.co.uk; jon...@amazon.com; linux-...@vger.kernel.org;
> > linux-ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org;
> > linux-arm-ker...@lists.infradead.org
> > Subject: [EXT] Re: [PATCHv5 1/2] PCI: layerscape: Add the bar_fixed_64bit
> > property in EP driver.
> > 
> > Caution: EXT Email

See above, this "Caution" stuff should disappear.

Also, quoting the email header is useless, please configure your email
client to remove it.

Thanks,
Lorenzo

> > On 13/08/19 11:58 AM, Xiaowei Bao wrote:
> > > The PCIe controller of layerscape has just 4 BARs: BAR0 and BAR1 are
> > > 32-bit, BAR2 and BAR4 are 64-bit. This is determined by hardware, so set
> > > the bar_fixed_64bit to 0x14.
> > >
> > > Signed-off-by: Xiaowei Bao 
> > 
> > Acked-by: Kishon Vijay Abraham I 
> > > ---
> > > v2:
> > >  - Replace value 0x14 with a macro.
> > > v3:
> > >  - No change.
> > > v4:
> > >  - send the patch again with '--to'.
> > > v5:
> > >  - fix the commit message.
> > >
> > >  drivers/pci/controller/dwc/pci-layerscape-ep.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > > b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > > index be61d96..ca9aa45 100644
> > > --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > > +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > > @@ -44,6 +44,7 @@ static const struct pci_epc_features
> > ls_pcie_epc_features = {
> > >   .linkup_notifier = false,
> > >   .msi_capable = true,
> > >   .msix_capable = false,
> > > + .bar_fixed_64bit = (1 << BAR_2) | (1 << BAR_4),
> > >  };
> > >
> > >  static const struct pci_epc_features*
> I checked the other platforms; they also use 'static const struct
> pci_epc_features', and I can read the correct value through this
> definition in the pci-epf-test.c file.
> > >


[Bug 204371] BUG kmalloc-4k (Tainted: G W ): Object padding overwritten

2019-08-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=204371

--- Comment #15 from Erhard F. (erhar...@mailbox.org) ---
On Fri, 09 Aug 2019 12:31:26 +
bugzilla-dae...@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=204371
# cat ~/bisect01.log 
Bisecting: 37903 revisions left to test after this (roughly 15 steps)
[9abf8acea297b4c65f5fa3206e2b8e468e730e84] Merge tag 'tty-4.17-rc1' of
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Bisecting: 19051 revisions left to test after this (roughly 14 steps)
[7c00e8ae041b349992047769af741b67379ce19a] Merge tag 'armsoc-soc' of
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Bisecting: 9762 revisions left to test after this (roughly 13 steps)
[dafa5f6577a9eecd2941add553d1672c30b02364] Merge branch 'linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Bisecting: 4644 revisions left to test after this (roughly 12 steps)
[2ed9db3074fcd8d12709fe40ff0e691d74229818] net: sched: cls_api: fix dead code
in switch
Bisecting: 2319 revisions left to test after this (roughly 11 steps)
[b219a1d2de0c025318475e3bbf8e3215cf49d083] Merge branch 'for-next' of
git://git.kernel.org/pub/scm/linux/kernel/git/shli/md
Bisecting: 1153 revisions left to test after this (roughly 10 steps)
[85a0b791bc17f7a49280b33e2905d109c062a47b] Merge branch 'for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Bisecting: 629 revisions left to test after this (roughly 9 steps)
[10f3e23f07cb0c20f9bcb77a5b5a7eb2a1b2a2fe] Merge tag 'ext4_for_linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Bisecting: 273 revisions left to test after this (roughly 8 steps)
[575b94386bd539a7d803aee9fd4a8d275844c40f] Merge tag 'locks-v4.19-1' of
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Bisecting: 136 revisions left to test after this (roughly 7 steps)
[d7e8555b1dd493c809e56e359974eecabe7d3fde] btrfs: remove unused member
async_submit_bio::fs_info
Bisecting: 68 revisions left to test after this (roughly 6 steps)
[389305b2aa68723c754f88d9dbd268a400e10664] btrfs: relocation: Only remove reloc
rb_trees if reloc control has been initialized
Bisecting: 34 revisions left to test after this (roughly 5 steps)
[d814a49198eafa6163698bdd93961302f3a877a4] btrfs: use correct compare function
of dirty_metadata_bytes
Bisecting: 16 revisions left to test after this (roughly 4 steps)
[c7b562c5480322ffaf591f45a4ff7ee089340ab4] btrfs: raid56: catch errors from
full_stripe_write
Bisecting: 8 revisions left to test after this (roughly 3 steps)
[65ad010488a5cc0f123a9924f7ad26a1b3f6a4f6] btrfs: pass only eb to
num_extent_pages
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[37508515621551538addaf826ab4b8a9aaf0a382] btrfs: simplify some assignments of
inode numbers
Bisecting: 1 revision left to test after this (roughly 1 step)
[69d2480456d1baf027a86e530989d7bedd698d5f] btrfs: use copy_page for copying
pages instead of memcpy
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[3ffbd68c48320730ef64ebfb5e639220f1f65483] btrfs: simplify pointer chasing of
local fs_info variables
69d2480456d1baf027a86e530989d7bedd698d5f is the first bad commit
commit 69d2480456d1baf027a86e530989d7bedd698d5f
Author: David Sterba 
Date:   Fri Jun 29 10:56:44 2018 +0200

btrfs: use copy_page for copying pages instead of memcpy

Use the helper that's possibly optimized for full page copies.

Signed-off-by: David Sterba 

:040000 040000 87de10a38618c1655c3266ff5a31358068fa1ca6
d0a2612d260215acaff66adaa5183ebd29a4b710 M      fs
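The one-line rationale of that commit ("use the helper that's possibly
optimized for full page copies") boils down to the following pattern; this
is an illustrative sketch, not the actual btrfs hunk:

#include <linux/mm.h>		/* copy_page(), PAGE_SIZE */
#include <linux/string.h>	/* memcpy() */

static void copy_one_page(void *dst_kaddr, void *src_kaddr)
{
	/* Before the bisected commit: a generic byte copy of one page. */
	/* memcpy(dst_kaddr, src_kaddr, PAGE_SIZE); */

	/* After: copy_page() may use an arch-optimized full-page copy. */
	copy_page(dst_kaddr, src_kaddr);
}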

-- 
You are receiving this mail because:
You are on the CC list for the bug.

[Bug 204371] BUG kmalloc-4k (Tainted: G W ): Object padding overwritten

2019-08-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=204371

Erhard F. (erhar...@mailbox.org) changed:

           What    |Removed |Added
----------------------------------------------------------------------------
 Attachment #284035|0       |1
        is obsolete|        |

--- Comment #14 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 284353
  --> https://bugzilla.kernel.org/attachment.cgi?id=284353&action=edit
kernel .config (PowerMac G4 DP, kernel 4.18.0-rc8+, final bisect)

-- 
You are receiving this mail because:
You are on the CC list for the bug.

Re: [PATCHv5 1/2] PCI: layerscape: Add the bar_fixed_64bit property in EP driver.

2019-08-13 Thread Kishon Vijay Abraham I



On 13/08/19 11:58 AM, Xiaowei Bao wrote:
> The PCIe controller of layerscape has just 4 BARs: BAR0 and BAR1
> are 32-bit, BAR2 and BAR4 are 64-bit. This is determined by hardware,
> so set the bar_fixed_64bit to 0x14.
> 
> Signed-off-by: Xiaowei Bao 

Acked-by: Kishon Vijay Abraham I 
> ---
> v2:
>  - Replace value 0x14 with a macro.
> v3:
>  - No change.
> v4:
>  - send the patch again with '--to'.
> v5:
>  - fix the commit message.
> 
>  drivers/pci/controller/dwc/pci-layerscape-ep.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c 
> b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> index be61d96..ca9aa45 100644
> --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> @@ -44,6 +44,7 @@ static const struct pci_epc_features ls_pcie_epc_features = 
> {
>   .linkup_notifier = false,
>   .msi_capable = true,
>   .msix_capable = false,
> + .bar_fixed_64bit = (1 << BAR_2) | (1 << BAR_4),
>  };
>  
>  static const struct pci_epc_features*
> 


RE: [EXT] Re: [PATCHv5 1/2] PCI: layerscape: Add the bar_fixed_64bit property in EP driver.

2019-08-13 Thread Xiaowei Bao


> -Original Message-
> From: Kishon Vijay Abraham I 
> Sent: 2019年8月13日 15:30
> To: Xiaowei Bao ; lorenzo.pieral...@arm.com;
> bhelg...@google.com; M.h. Lian ; Mingkai Hu
> ; Roy Zang ;
> l.st...@pengutronix.de; tpie...@impinj.com; Leonard Crestez
> ; andrew.smir...@gmail.com;
> yue.w...@amlogic.com; hayashi.kunih...@socionext.com;
> d...@amazon.co.uk; jon...@amazon.com; linux-...@vger.kernel.org;
> linux-ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org;
> linux-arm-ker...@lists.infradead.org
> Subject: [EXT] Re: [PATCHv5 1/2] PCI: layerscape: Add the bar_fixed_64bit
> property in EP driver.
> 
> Caution: EXT Email
> 
> On 13/08/19 11:58 AM, Xiaowei Bao wrote:
> > The PCIe controller of layerscape has just 4 BARs: BAR0 and BAR1 are
> > 32-bit, BAR2 and BAR4 are 64-bit. This is determined by hardware, so set
> > the bar_fixed_64bit to 0x14.
> >
> > Signed-off-by: Xiaowei Bao 
> 
> Acked-by: Kishon Vijay Abraham I 
> > ---
> > v2:
> >  - Replace value 0x14 with a macro.
> > v3:
> >  - No change.
> > v4:
> >  - send the patch again with '--to'.
> > v5:
> >  - fix the commit message.
> >
> >  drivers/pci/controller/dwc/pci-layerscape-ep.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > index be61d96..ca9aa45 100644
> > --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > @@ -44,6 +44,7 @@ static const struct pci_epc_features
> ls_pcie_epc_features = {
> >   .linkup_notifier = false,
> >   .msi_capable = true,
> >   .msix_capable = false,
> > + .bar_fixed_64bit = (1 << BAR_2) | (1 << BAR_4),
> >  };
> >
> >  static const struct pci_epc_features*
I checked the other platforms; they also use 'static const struct
pci_epc_features', and I can read the correct value through this
definition in the pci-epf-test.c file.
> >


[PATCHv5 2/2] PCI: layerscape: Add CONFIG_PCI_LAYERSCAPE_EP to build EP/RC separately

2019-08-13 Thread Xiaowei Bao
Add CONFIG_PCI_LAYERSCAPE_EP to build EP/RC separately.

Signed-off-by: Xiaowei Bao 
---
v2:
 - No change.
v3:
 - modify the commit message.
v4:
 - send the patch again with '--to'.
v5:
 - No change.

 drivers/pci/controller/dwc/Kconfig  | 20 ++++++++++++++++++--
 drivers/pci/controller/dwc/Makefile |  3 ++-
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/controller/dwc/Kconfig 
b/drivers/pci/controller/dwc/Kconfig
index 6ea778a..869c645 100644
--- a/drivers/pci/controller/dwc/Kconfig
+++ b/drivers/pci/controller/dwc/Kconfig
@@ -131,13 +131,29 @@ config PCI_KEYSTONE_EP
  DesignWare core functions to implement the driver.
 
 config PCI_LAYERSCAPE
-   bool "Freescale Layerscape PCIe controller"
+   bool "Freescale Layerscape PCIe controller - Host mode"
depends on OF && (ARM || ARCH_LAYERSCAPE || COMPILE_TEST)
depends on PCI_MSI_IRQ_DOMAIN
select MFD_SYSCON
select PCIE_DW_HOST
help
- Say Y here if you want PCIe controller support on Layerscape SoCs.
+ Say Y here if you want to enable PCIe controller support on Layerscape
+ SoCs to work in Host mode.
+ This controller can work either as EP or RC. The RCW[HOST_AGT_PEX]
+ determines which PCIe controller works in EP mode and which PCIe
+ controller works in RC mode.
+
+config PCI_LAYERSCAPE_EP
+   bool "Freescale Layerscape PCIe controller - Endpoint mode"
+   depends on OF && (ARM || ARCH_LAYERSCAPE || COMPILE_TEST)
+   depends on PCI_ENDPOINT
+   select PCIE_DW_EP
+   help
+ Say Y here if you want to enable PCIe controller support on Layerscape
+ SoCs to work in Endpoint mode.
+ This controller can work either as EP or RC. The RCW[HOST_AGT_PEX]
+ determines which PCIe controller works in EP mode and which PCIe
+ controller works in RC mode.
 
 config PCI_HISI
depends on OF && (ARM64 || COMPILE_TEST)
diff --git a/drivers/pci/controller/dwc/Makefile 
b/drivers/pci/controller/dwc/Makefile
index b085dfd..824fde7 100644
--- a/drivers/pci/controller/dwc/Makefile
+++ b/drivers/pci/controller/dwc/Makefile
@@ -8,7 +8,8 @@ obj-$(CONFIG_PCI_EXYNOS) += pci-exynos.o
 obj-$(CONFIG_PCI_IMX6) += pci-imx6.o
 obj-$(CONFIG_PCIE_SPEAR13XX) += pcie-spear13xx.o
 obj-$(CONFIG_PCI_KEYSTONE) += pci-keystone.o
-obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o pci-layerscape-ep.o
+obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o
+obj-$(CONFIG_PCI_LAYERSCAPE_EP) += pci-layerscape-ep.o
 obj-$(CONFIG_PCIE_QCOM) += pcie-qcom.o
 obj-$(CONFIG_PCIE_ARMADA_8K) += pcie-armada8k.o
 obj-$(CONFIG_PCIE_ARTPEC6) += pcie-artpec6.o
-- 
2.9.5

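For reference, a sketch of what the split configuration looks like (symbol
names as introduced by this patch; which options you enable depends on how
RCW[HOST_AGT_PEX] straps each controller):

# Host (RC) mode only:
CONFIG_PCI_LAYERSCAPE=y
# CONFIG_PCI_LAYERSCAPE_EP is not set

# Endpoint mode as well (needs the endpoint framework):
CONFIG_PCI_ENDPOINT=y
CONFIG_PCI_LAYERSCAPE_EP=y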


[PATCHv5 1/2] PCI: layerscape: Add the bar_fixed_64bit property in EP driver.

2019-08-13 Thread Xiaowei Bao
The PCIe controller of layerscape has just 4 BARs: BAR0 and BAR1
are 32-bit, BAR2 and BAR4 are 64-bit. This is determined by hardware,
so set the bar_fixed_64bit to 0x14.

Signed-off-by: Xiaowei Bao 
---
v2:
 - Replace value 0x14 with a macro.
v3:
 - No change.
v4:
 - send the patch again with '--to'.
v5:
 - fix the commit message.

 drivers/pci/controller/dwc/pci-layerscape-ep.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c 
b/drivers/pci/controller/dwc/pci-layerscape-ep.c
index be61d96..ca9aa45 100644
--- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
+++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
@@ -44,6 +44,7 @@ static const struct pci_epc_features ls_pcie_epc_features = {
.linkup_notifier = false,
.msi_capable = true,
.msix_capable = false,
+   .bar_fixed_64bit = (1 << BAR_2) | (1 << BAR_4),
 };
 
 static const struct pci_epc_features*
-- 
2.9.5

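To make the 0x14 value concrete: bar_fixed_64bit is a per-BAR bitmask, and
a fixed 64-bit BAR consumes the following BAR as well. A minimal user-space
sketch of the mask logic (the enum mirrors the kernel's BAR_0..BAR_5; this
is an illustration, not the pci-epf-test code):

#include <stdio.h>

/* Mirrors enum pci_barno from the kernel's EPC framework. */
enum pci_barno { BAR_0, BAR_1, BAR_2, BAR_3, BAR_4, BAR_5 };

int main(void)
{
	unsigned int bar_fixed_64bit = (1 << BAR_2) | (1 << BAR_4); /* 0x14 */

	printf("bar_fixed_64bit = 0x%x\n", bar_fixed_64bit);
	for (int bar = BAR_0; bar <= BAR_5; bar++)
		if (bar_fixed_64bit & (1 << bar))
			printf("BAR%d is a fixed 64-bit BAR (pairs with BAR%d)\n",
			       bar, bar + 1);
	return 0;
}

Function drivers obtain these flags through pci_epc_get_features(), which
is how pci-epf-test learns to allocate BAR2 and BAR4 as 64-bit.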


RE: [EXT] Re: [PATCHv4 1/2] PCI: layerscape: Add the bar_fixed_64bit property in EP driver.

2019-08-13 Thread Xiaowei Bao


> -Original Message-
> From: Kishon Vijay Abraham I 
> Sent: 2019年8月13日 12:36
> To: Xiaowei Bao ; lorenzo.pieral...@arm.com;
> bhelg...@google.com; M.h. Lian ; Mingkai Hu
> ; Roy Zang ;
> l.st...@pengutronix.de; tpie...@impinj.com; Leonard Crestez
> ; andrew.smir...@gmail.com;
> yue.w...@amlogic.com; hayashi.kunih...@socionext.com;
> d...@amazon.co.uk; jon...@amazon.com; linux-...@vger.kernel.org;
> linux-ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org;
> linux-arm-ker...@lists.infradead.org
> Subject: [EXT] Re: [PATCHv4 1/2] PCI: layerscape: Add the bar_fixed_64bit
> property in EP driver.
> 
> Caution: EXT Email
> 
> On 13/08/19 8:23 AM, Xiaowei Bao wrote:
> > The PCIe controller of layerscape just have 4 BARs, BAR0 and BAR1 is
> > 32bit, BAR3 and BAR4 is 64bit, this is determined by hardware,
> 
> Do you mean BAR2 instead of BAR3 here?
Yes.
> 
> Thanks
> Kishon
> 
> > so set the bar_fixed_64bit with 0x14.
> >
> > Signed-off-by: Xiaowei Bao 
> > ---
> > v2:
> >  - Replace value 0x14 with a macro.
> > v3:
> >  - No change.
> > v4:
> >  - send the patch again with '--to'.
> >
> >  drivers/pci/controller/dwc/pci-layerscape-ep.c |1 +
> >  1 files changed, 1 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > index be61d96..227c33b 100644
> > --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > @@ -44,6 +44,7 @@ static int ls_pcie_establish_link(struct dw_pcie *pci)
> >   .linkup_notifier = false,
> >   .msi_capable = true,
> >   .msix_capable = false,
> > + .bar_fixed_64bit = (1 << BAR_2) | (1 << BAR_4),
> >  };
> >
> >  static const struct pci_epc_features*
> >