Re: [PATCH 05/26] vfio: KVM: Pass get/put helpers from KVM to VFIO, don't do circular lookup

2023-12-01 Thread Sean Christopherson
On Mon, Sep 18, 2023, Jason Gunthorpe wrote:
> On Mon, Sep 18, 2023 at 08:49:57AM -0700, Sean Christopherson wrote:
> > On Mon, Sep 18, 2023, Jason Gunthorpe wrote:
> > > On Fri, Sep 15, 2023 at 05:30:57PM -0700, Sean Christopherson wrote:
> > > > Explicitly pass KVM's get/put helpers to VFIO when attaching a VM to
> > > > VFIO instead of having VFIO do a symbol lookup back into KVM.  Having both
> > > > KVM and VFIO do symbol lookups increases the overall complexity and places
> > > > an unnecessary dependency on KVM (from VFIO) without adding any value.
> > > > 
> > > > Signed-off-by: Sean Christopherson 
> > > > ---
> > > >  drivers/vfio/vfio.h  |  2 ++
> > > >  drivers/vfio/vfio_main.c | 74 +++-
> > > >  include/linux/vfio.h |  4 ++-
> > > >  virt/kvm/vfio.c  |  9 +++--
> > > >  4 files changed, 47 insertions(+), 42 deletions(-)
> > > 
> > > I don't mind this, but Christoph had disliked my prior attempt to do
> > > this with function pointers..
> > > 
> > > The get can be inlined, IIRC, what about putting a pointer to the put
> > > inside the kvm struct?
> > 
> > That wouldn't allow us to achieve our goal, which is to hide the details of
> > "struct kvm" from VFIO (and the rest of the kernel).
> 
> > What's the objection to handing VFIO a function pointer?
> 
> Hmm, looks like it was this thread:
> 
>  
> https://lore.kernel.org/r/0-v1-33906a626da1+16b0-vfio_kvm_no_group_...@nvidia.com
> 
> Your rationale looks a little better to me.
> 
> > > Then the normal kvm get/put don't have to be exported symbols at all?
> > 
> > The export of kvm_get_kvm_safe() can go away (I forgot to do that in this series),
> > but kvm_get_kvm() will hang around as it's needed by KVM sub-modules (PPC and x86),
> > KVMGT (x86), and drivers/s390/crypto/vfio_ap_ops.c (no idea what to call that beast).
> 
> My thought would be to keep it as an inline, there should be some way
> to do that without breaking your desire to hide the bulk of the kvm
> struct content. Like put the refcount as the first element in the
struct and just don't ifdef it away?

That doesn't work because of the need to invoke kvm_destroy_vm() when the last
reference is put, i.e. all of kvm_destroy_vm() would need to be inlined (LOL) or
VFIO would need to do a symbol lookup on kvm_destroy_vm(), which puts us back at
square one.

There's one more wrinkle: this patch is buggy in that it doesn't ensure the
liveliness of KVM-the-module, i.e. nothing prevents userspace from unloading
kvm.ko while VFIO still holds a reference to a kvm structure, and so invoking
->put_kvm() could jump into freed code.  To fix that, KVM would also need to
pass along a module pointer :-(

One thought would be to have vac.ko (tentative name), which is the "base" module
that will hold the KVM/virtualization bits that need to be singletons, i.e. can't
be per-KVM, provide the symbols needed for VFIO to manage references.  But that
just ends up moving the module reference trickiness into VAC+KVM, e.g. vac.ko
would still need to be handed a function pointer in order to call back into the
correct kvm.ko code.

Hrm, but I suspect the vac.ko <=> kvm.ko interactions will need to deal with
module shenanigans anyways, and the shenanigans would only affect crazy people
like us that actually want multiple KVM modules.

I'll plan on going that route.  The very worst case scenario is that it just
punts this conversation down to a possible future.  Dropping this patch and the
previous prep patch won't meaningfully affect the goals of this series, as only
kvm_get_kvm_safe() and kvm_get_kvm() would need to be exposed outside of
#ifdef __KVM__.  Then we can figure out what to do with them if/when the whole
multi-KVM thing comes along.


Re: [PATCH 1/6] x86: Use PCI_HEADER_TYPE_* instead of literals

2023-12-01 Thread Bjorn Helgaas
[+cc scsi, powerpc folks]

On Fri, Dec 01, 2023 at 02:44:47PM -0600, Bjorn Helgaas wrote:
> On Fri, Nov 24, 2023 at 11:09:13AM +0200, Ilpo Järvinen wrote:
> > Replace 0x7f and 0x80 literals with PCI_HEADER_TYPE_* defines.
> > 
> > Signed-off-by: Ilpo Järvinen 
> 
> Applied entire series on the PCI "enumeration" branch for v6.8,
> thanks!
> 
> If anybody wants to take pieces separately, let me know and I'll drop
> from PCI.

OK, b4 picked up the entire series but I was only cc'd on this first
patch, so I missed the responses about EDAC, xtensa, bcma already
being applied elsewhere.

So I kept these in the PCI tree:

  420ac76610d7 ("scsi: lpfc: Use PCI_HEADER_TYPE_MFD instead of literal")
  3773343dd890 ("powerpc/fsl-pci: Use PCI_HEADER_TYPE_MASK instead of literal")
  197e0da1f1a3 ("x86/pci: Use PCI_HEADER_TYPE_* instead of literals")

and dropped the others.

x86, SCSI, powerpc folks, if you want to take these instead, let me
know and I'll drop them.

> > ---
> >  arch/x86/kernel/aperture_64.c  | 3 +--
> >  arch/x86/kernel/early-quirks.c | 4 ++--
> >  2 files changed, 3 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
> > index 4feaa670d578..89c0c8a3fc7e 100644
> > --- a/arch/x86/kernel/aperture_64.c
> > +++ b/arch/x86/kernel/aperture_64.c
> > @@ -259,10 +259,9 @@ static u32 __init search_agp_bridge(u32 *order, int *valid_agp)
> > order);
> > }
> >  
> > -   /* No multi-function device? */
> > type = read_pci_config_byte(bus, slot, func,
> >PCI_HEADER_TYPE);
> > -   if (!(type & 0x80))
> > +   if (!(type & PCI_HEADER_TYPE_MFD))
> > break;
> > }
> > }
> > diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
> > index a6c1867fc7aa..59f4aefc6bc1 100644
> > --- a/arch/x86/kernel/early-quirks.c
> > +++ b/arch/x86/kernel/early-quirks.c
> > @@ -779,13 +779,13 @@ static int __init check_dev_quirk(int num, int slot, int func)
> > type = read_pci_config_byte(num, slot, func,
> > PCI_HEADER_TYPE);
> >  
> > -   if ((type & 0x7f) == PCI_HEADER_TYPE_BRIDGE) {
> > +   if ((type & PCI_HEADER_TYPE_MASK) == PCI_HEADER_TYPE_BRIDGE) {
> > sec = read_pci_config_byte(num, slot, func, PCI_SECONDARY_BUS);
> > if (sec > num)
> > early_pci_scan_bus(sec);
> > }
> >  
> > -   if (!(type & 0x80))
> > +   if (!(type & PCI_HEADER_TYPE_MFD))
> > return -1;
> >  
> > return 0;
> > -- 
> > 2.30.2
> > 


RE: [PATCH v5 4/4] PCI: layerscape: Add suspend/resume for ls1043a

2023-12-01 Thread Roy Zang
> From: Frank Li 
> Subject: [PATCH v5 4/4] PCI: layerscape: Add suspend/resume for ls1043a
> 
> Add suspend/resume support for Layerscape LS1043a.
> 
> In the suspend path, PME_Turn_Off message is sent to the endpoint to
> transition the link to L2/L3_Ready state. In this SoC, there is no way to check if
> the controller has received the PME_To_Ack from the endpoint or not. So to be
> on the safer side, the driver just waits for PCIE_PME_TO_L2_TIMEOUT_US
> before asserting the SoC specific PMXMTTURNOFF bit to complete the
> PME_Turn_Off handshake. Then the link would enter L2/L3 state depending on
> the VAUX supply.
> 
> In the resume path, the link is brought back from L2 to L0 by doing a software
> reset.
> 
> Signed-off-by: Frank Li 
Acked-by: Roy Zang 
Roy


RE: [PATCH v5 3/4] PCI: layerscape(ep): Rename pf_* as pf_lut_*

2023-12-01 Thread Roy Zang
> From: Frank Li 
> Subject: [PATCH v5 3/4] PCI: layerscape(ep): Rename pf_* as pf_lut_*
> 
> 'pf' and 'lut' are just different names in different chips; basically it is an MMIO
> base address plus an offset.
> 
> Rename it to avoid duplicate pf_* and lut_* in the driver.
> 
> Signed-off-by: Frank Li 
Acked-by: Roy Zang 
Roy


RE: [PATCH v5 2/4] PCI: layerscape: Add suspend/resume for ls1021a

2023-12-01 Thread Roy Zang
> From: Frank Li 
> Subject: [PATCH v5 2/4] PCI: layerscape: Add suspend/resume for ls1021a
> 
> Add suspend/resume support for Layerscape LS1021a.
> 
> In the suspend path, PME_Turn_Off message is sent to the endpoint to
> transition the link to L2/L3_Ready state. In this SoC, there is no way to check if
> the controller has received the PME_To_Ack from the endpoint or not. So to be
> on the safer side, the driver just waits for PCIE_PME_TO_L2_TIMEOUT_US
> before asserting the SoC specific PMXMTTURNOFF bit to complete the
> PME_Turn_Off handshake. Then the link would enter L2/L3 state depending on
> the VAUX supply.
> 
> In the resume path, the link is brought back from L2 to L0 by doing a software
> reset.
> 
> Signed-off-by: Frank Li 
Acked-by: Roy Zang 
Roy


RE: [PATCH v5 1/4] PCI: layerscape: Add function pointer for exit_from_l2()

2023-12-01 Thread Roy Zang
> From: Frank Li 
> Subject: [PATCH v5 1/4] PCI: layerscape: Add function pointer for exit_from_l2()
> 
> Since different SoCs require different sequences for exiting L2, let's add a
> separate "exit_from_l2()" callback. This callback can be used to execute SoC
> specific sequences.
> 
> Change the ls_pcie_exit_from_l2() return value from void to int. Return an error if
> exit_from_l2() fails in the resume flow.
> 
> Reviewed-by: Manivannan Sadhasivam 
> Signed-off-by: Frank Li 
Acked-by: Roy Zang 
Roy


[PATCH v5 4/4] PCI: layerscape: Add suspend/resume for ls1043a

2023-12-01 Thread Frank Li
Add suspend/resume support for Layerscape LS1043a.

In the suspend path, PME_Turn_Off message is sent to the endpoint to
transition the link to L2/L3_Ready state. In this SoC, there is no way to
check if the controller has received the PME_To_Ack from the endpoint or
not. So to be on the safer side, the driver just waits for
PCIE_PME_TO_L2_TIMEOUT_US before asserting the SoC specific PMXMTTURNOFF
bit to complete the PME_Turn_Off handshake. Then the link would enter L2/L3
state depending on the VAUX supply.

In the resume path, the link is brought back from L2 to L0 by doing a
software reset.

Signed-off-by: Frank Li 
---

Notes:
Change from v4 to v5
- update commit message
- use comments
/* Reset the PEX wrapper to bring the link out of L2 */

Change from v3 to v4
- Call scfg_pcie_send_turnoff_msg() shared with ls1021a
- update commit message

Change from v2 to v3
- Remove ls_pcie_lut_readl(writel) function

Change from v1 to v2
- Update subject 'a' to 'A'

 drivers/pci/controller/dwc/pci-layerscape.c | 63 -
 1 file changed, 62 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/controller/dwc/pci-layerscape.c b/drivers/pci/controller/dwc/pci-layerscape.c
index a9151e98fde6f..715365e91f8ef 100644
--- a/drivers/pci/controller/dwc/pci-layerscape.c
+++ b/drivers/pci/controller/dwc/pci-layerscape.c
@@ -41,6 +41,15 @@
 #define SCFG_PEXSFTRSTCR   0x190
 #define PEXSR(idx) BIT(idx)
 
+/* LS1043A PEX PME control register */
+#define SCFG_PEXPMECR  0x144
+#define PEXPME(idx)BIT(31 - (idx) * 4)
+
+/* LS1043A PEX LUT debug register */
+#define LS_PCIE_LDBG   0x7fc
+#define LDBG_SRBIT(30)
+#define LDBG_WEBIT(31)
+
 #define PCIE_IATU_NUM  6
 
 struct ls_pcie_drvdata {
@@ -224,6 +233,45 @@ static int ls1021a_pcie_exit_from_l2(struct dw_pcie_rp *pp)
return scfg_pcie_exit_from_l2(pcie->scfg, SCFG_PEXSFTRSTCR, PEXSR(pcie->index));
 }
 
+static void ls1043a_pcie_send_turnoff_msg(struct dw_pcie_rp *pp)
+{
+   struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+   struct ls_pcie *pcie = to_ls_pcie(pci);
+
+   scfg_pcie_send_turnoff_msg(pcie->scfg, SCFG_PEXPMECR, PEXPME(pcie->index));
+}
+
+static int ls1043a_pcie_exit_from_l2(struct dw_pcie_rp *pp)
+{
+   struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+   struct ls_pcie *pcie = to_ls_pcie(pci);
+   u32 val;
+
+   /*
+* Reset the PEX wrapper to bring the link out of L2.
+* LDBG_WE: allows the user to have write access to the PEXDBG[SR] for both setting and
+*  clearing the soft reset on the PEX module.
+* LDBG_SR: When SR is set to 1, the PEX module enters soft reset.
+*/
+   val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
+   val |= LDBG_WE;
+   ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
+
+   val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
+   val |= LDBG_SR;
+   ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
+
+   val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
+   val &= ~LDBG_SR;
+   ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
+
+   val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
+   val &= ~LDBG_WE;
+   ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
+
+   return 0;
+}
+
 static const struct dw_pcie_host_ops ls_pcie_host_ops = {
.host_init = ls_pcie_host_init,
.pme_turn_off = ls_pcie_send_turnoff_msg,
@@ -241,6 +289,19 @@ static const struct ls_pcie_drvdata ls1021a_drvdata = {
.exit_from_l2 = ls1021a_pcie_exit_from_l2,
 };
 
+static const struct dw_pcie_host_ops ls1043a_pcie_host_ops = {
+   .host_init = ls_pcie_host_init,
+   .pme_turn_off = ls1043a_pcie_send_turnoff_msg,
+};
+
+static const struct ls_pcie_drvdata ls1043a_drvdata = {
+   .pf_lut_off = 0x1,
+   .pm_support = true,
+   .scfg_support = true,
+   .ops = &ls1043a_pcie_host_ops,
+   .exit_from_l2 = ls1043a_pcie_exit_from_l2,
+};
+
 static const struct ls_pcie_drvdata layerscape_drvdata = {
.pf_lut_off = 0xc,
.pm_support = true,
@@ -252,7 +313,7 @@ static const struct of_device_id ls_pcie_of_match[] = {
{ .compatible = "fsl,ls1012a-pcie", .data = _drvdata },
{ .compatible = "fsl,ls1021a-pcie", .data = _drvdata },
{ .compatible = "fsl,ls1028a-pcie", .data = _drvdata },
-   { .compatible = "fsl,ls1043a-pcie", .data = _drvdata },
+   { .compatible = "fsl,ls1043a-pcie", .data = _drvdata },
{ .compatible = "fsl,ls1046a-pcie", .data = _drvdata },
{ .compatible = "fsl,ls2080a-pcie", .data = _drvdata },
{ .compatible = "fsl,ls2085a-pcie", .data = _drvdata },
-- 
2.34.1



[PATCH v5 3/4] PCI: layerscape(ep): Rename pf_* as pf_lut_*

2023-12-01 Thread Frank Li
'pf' and 'lut' are just different names in different chips; basically it is
an MMIO base address plus an offset.

Rename it to avoid duplicate pf_* and lut_* in the driver.

Signed-off-by: Frank Li 
---

Notes:
pf_lut is better than pf_* or lut_* because some chips use 'pf' and some chips
use 'lut'.

Change from v4 to v5
- rename layerscape-ep code also
change from v1 to v4
- new patch at v3

 .../pci/controller/dwc/pci-layerscape-ep.c| 16 -
 drivers/pci/controller/dwc/pci-layerscape.c   | 36 +--
 2 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c b/drivers/pci/controller/dwc/pci-layerscape-ep.c
index 3d3c50ef4b6ff..2ca339f938a86 100644
--- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
+++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
@@ -49,7 +49,7 @@ struct ls_pcie_ep {
boolbig_endian;
 };
 
-static u32 ls_lut_readl(struct ls_pcie_ep *pcie, u32 offset)
+static u32 ls_pcie_pf_lut_readl(struct ls_pcie_ep *pcie, u32 offset)
 {
struct dw_pcie *pci = pcie->pci;
 
@@ -59,7 +59,7 @@ static u32 ls_lut_readl(struct ls_pcie_ep *pcie, u32 offset)
return ioread32(pci->dbi_base + offset);
 }
 
-static void ls_lut_writel(struct ls_pcie_ep *pcie, u32 offset, u32 value)
+static void ls_pcie_pf_lut_writel(struct ls_pcie_ep *pcie, u32 offset, u32 value)
 {
struct dw_pcie *pci = pcie->pci;
 
@@ -76,8 +76,8 @@ static irqreturn_t ls_pcie_ep_event_handler(int irq, void *dev_id)
u32 val, cfg;
u8 offset;
 
-   val = ls_lut_readl(pcie, PEX_PF0_PME_MES_DR);
-   ls_lut_writel(pcie, PEX_PF0_PME_MES_DR, val);
+   val = ls_pcie_pf_lut_readl(pcie, PEX_PF0_PME_MES_DR);
+   ls_pcie_pf_lut_writel(pcie, PEX_PF0_PME_MES_DR, val);
 
if (!val)
return IRQ_NONE;
@@ -96,9 +96,9 @@ static irqreturn_t ls_pcie_ep_event_handler(int irq, void *dev_id)
dw_pcie_writel_dbi(pci, offset + PCI_EXP_LNKCAP, pcie->lnkcap);
dw_pcie_dbi_ro_wr_dis(pci);
 
-   cfg = ls_lut_readl(pcie, PEX_PF0_CONFIG);
+   cfg = ls_pcie_pf_lut_readl(pcie, PEX_PF0_CONFIG);
cfg |= PEX_PF0_CFG_READY;
-   ls_lut_writel(pcie, PEX_PF0_CONFIG, cfg);
+   ls_pcie_pf_lut_writel(pcie, PEX_PF0_CONFIG, cfg);
dw_pcie_ep_linkup(&pcie->ep);
 
dev_dbg(pci->dev, "Link up\n");
@@ -130,10 +130,10 @@ static int ls_pcie_ep_interrupt_init(struct ls_pcie_ep *pcie,
}
 
/* Enable interrupts */
-   val = ls_lut_readl(pcie, PEX_PF0_PME_MES_IER);
+   val = ls_pcie_pf_lut_readl(pcie, PEX_PF0_PME_MES_IER);
val |=  PEX_PF0_PME_MES_IER_LDDIE | PEX_PF0_PME_MES_IER_HRDIE |
PEX_PF0_PME_MES_IER_LUDIE;
-   ls_lut_writel(pcie, PEX_PF0_PME_MES_IER, val);
+   ls_pcie_pf_lut_writel(pcie, PEX_PF0_PME_MES_IER, val);
 
return 0;
 }
diff --git a/drivers/pci/controller/dwc/pci-layerscape.c b/drivers/pci/controller/dwc/pci-layerscape.c
index 8bdaae9be7d56..a9151e98fde6f 100644
--- a/drivers/pci/controller/dwc/pci-layerscape.c
+++ b/drivers/pci/controller/dwc/pci-layerscape.c
@@ -44,7 +44,7 @@
 #define PCIE_IATU_NUM  6
 
 struct ls_pcie_drvdata {
-   const u32 pf_off;
+   const u32 pf_lut_off;
const struct dw_pcie_host_ops *ops;
int (*exit_from_l2)(struct dw_pcie_rp *pp);
bool scfg_support;
@@ -54,13 +54,13 @@ struct ls_pcie_drvdata {
 struct ls_pcie {
struct dw_pcie *pci;
const struct ls_pcie_drvdata *drvdata;
-   void __iomem *pf_base;
+   void __iomem *pf_lut_base;
struct regmap *scfg;
int index;
bool big_endian;
 };
 
-#define ls_pcie_pf_readl_addr(addr)ls_pcie_pf_readl(pcie, addr)
+#define ls_pcie_pf_lut_readl_addr(addr)ls_pcie_pf_lut_readl(pcie, addr)
 #define to_ls_pcie(x)  dev_get_drvdata((x)->dev)
 
 static bool ls_pcie_is_bridge(struct ls_pcie *pcie)
@@ -101,20 +101,20 @@ static void ls_pcie_fix_error_response(struct ls_pcie *pcie)
iowrite32(PCIE_ABSERR_SETTING, pci->dbi_base + PCIE_ABSERR);
 }
 
-static u32 ls_pcie_pf_readl(struct ls_pcie *pcie, u32 off)
+static u32 ls_pcie_pf_lut_readl(struct ls_pcie *pcie, u32 off)
 {
if (pcie->big_endian)
-   return ioread32be(pcie->pf_base + off);
+   return ioread32be(pcie->pf_lut_base + off);
 
-   return ioread32(pcie->pf_base + off);
+   return ioread32(pcie->pf_lut_base + off);
 }
 
-static void ls_pcie_pf_writel(struct ls_pcie *pcie, u32 off, u32 val)
+static void ls_pcie_pf_lut_writel(struct ls_pcie *pcie, u32 off, u32 val)
 {
if (pcie->big_endian)
-   iowrite32be(val, pcie->pf_base + off);
+   iowrite32be(val, pcie->pf_lut_base + off);
else
-   iowrite32(val, pcie->pf_base + off);
+   iowrite32(val, pcie->pf_lut_base + off);
 }
 
 static void 

[PATCH v5 2/4] PCI: layerscape: Add suspend/resume for ls1021a

2023-12-01 Thread Frank Li
Add suspend/resume support for Layerscape LS1021a.

In the suspend path, PME_Turn_Off message is sent to the endpoint to
transition the link to L2/L3_Ready state. In this SoC, there is no way to
check if the controller has received the PME_To_Ack from the endpoint or
not. So to be on the safer side, the driver just waits for
PCIE_PME_TO_L2_TIMEOUT_US before asserting the SoC specific PMXMTTURNOFF
bit to complete the PME_Turn_Off handshake. Then the link would enter L2/L3
state depending on the VAUX supply.

In the resume path, the link is brought back from L2 to L0 by doing a
software reset.

Signed-off-by: Frank Li 
---

Notes:
Change from v4 to v5
- update commit message
- remove an empty line
- use comments
/* Reset the PEX wrapper to bring the link out of L2 */
- pci->pp.ops = pcie->drvdata->ops, ls_pcie_host_ops to the "ops" member of layerscape_drvdata
- don't set pcie->scfg = NULL at error path

Change from v3 to v4
- update commit message.
- it is reset a glue logic part for PCI controller.
- use regmap_write_bits() to reduce code change.

Change from v2 to v3
- update according to mani's feedback
change from v1 to v2
- change subject 'a' to 'A'

 drivers/pci/controller/dwc/pci-layerscape.c | 81 -
 1 file changed, 80 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/controller/dwc/pci-layerscape.c b/drivers/pci/controller/dwc/pci-layerscape.c
index aea89926bcc4f..8bdaae9be7d56 100644
--- a/drivers/pci/controller/dwc/pci-layerscape.c
+++ b/drivers/pci/controller/dwc/pci-layerscape.c
@@ -35,11 +35,19 @@
 #define PF_MCR_PTOMR   BIT(0)
 #define PF_MCR_EXL2S   BIT(1)
 
+/* LS1021A PEXn PM Write Control Register */
+#define SCFG_PEXPMWRCR(idx)(0x5c + (idx) * 0x64)
+#define PMXMTTURNOFF   BIT(31)
+#define SCFG_PEXSFTRSTCR   0x190
+#define PEXSR(idx) BIT(idx)
+
 #define PCIE_IATU_NUM  6
 
 struct ls_pcie_drvdata {
const u32 pf_off;
+   const struct dw_pcie_host_ops *ops;
int (*exit_from_l2)(struct dw_pcie_rp *pp);
+   bool scfg_support;
bool pm_support;
 };
 
@@ -47,6 +55,8 @@ struct ls_pcie {
struct dw_pcie *pci;
const struct ls_pcie_drvdata *drvdata;
void __iomem *pf_base;
+   struct regmap *scfg;
+   int index;
bool big_endian;
 };
 
@@ -171,18 +181,70 @@ static int ls_pcie_host_init(struct dw_pcie_rp *pp)
return 0;
 }
 
+static void scfg_pcie_send_turnoff_msg(struct regmap *scfg, u32 reg, u32 mask)
+{
+   /* Send PME_Turn_Off message */
+   regmap_write_bits(scfg, reg, mask, mask);
+
+   /*
+* There is no specific register to check for PME_To_Ack from endpoint.
+* So on the safe side, wait for PCIE_PME_TO_L2_TIMEOUT_US.
+*/
+   mdelay(PCIE_PME_TO_L2_TIMEOUT_US/1000);
+
+   /*
+* Layerscape hardware reference manual recommends clearing the 
PMXMTTURNOFF bit
+* to complete the PME_Turn_Off handshake.
+*/
+   regmap_write_bits(scfg, reg, mask, 0);
+}
+
+static void ls1021a_pcie_send_turnoff_msg(struct dw_pcie_rp *pp)
+{
+   struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+   struct ls_pcie *pcie = to_ls_pcie(pci);
+
+   scfg_pcie_send_turnoff_msg(pcie->scfg, SCFG_PEXPMWRCR(pcie->index), PMXMTTURNOFF);
+}
+
+static int scfg_pcie_exit_from_l2(struct regmap *scfg, u32 reg, u32 mask)
+{
+   /* Reset the PEX wrapper to bring the link out of L2 */
+   regmap_write_bits(scfg, reg, mask, mask);
+   regmap_write_bits(scfg, reg, mask, 0);
+
+   return 0;
+}
+
+static int ls1021a_pcie_exit_from_l2(struct dw_pcie_rp *pp)
+{
+   struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+   struct ls_pcie *pcie = to_ls_pcie(pci);
+
+   return scfg_pcie_exit_from_l2(pcie->scfg, SCFG_PEXSFTRSTCR, PEXSR(pcie->index));
+}
+
 static const struct dw_pcie_host_ops ls_pcie_host_ops = {
.host_init = ls_pcie_host_init,
.pme_turn_off = ls_pcie_send_turnoff_msg,
 };
 
+static const struct dw_pcie_host_ops ls1021a_pcie_host_ops = {
+   .host_init = ls_pcie_host_init,
+   .pme_turn_off = ls1021a_pcie_send_turnoff_msg,
+};
+
 static const struct ls_pcie_drvdata ls1021a_drvdata = {
-   .pm_support = false,
+   .pm_support = true,
+   .scfg_support = true,
+   .ops = &ls1021a_pcie_host_ops,
+   .exit_from_l2 = ls1021a_pcie_exit_from_l2,
 };
 
 static const struct ls_pcie_drvdata layerscape_drvdata = {
.pf_off = 0xc,
.pm_support = true,
+   .ops = &ls_pcie_host_ops,
.exit_from_l2 = ls_pcie_exit_from_l2,
 };
 
@@ -205,6 +267,8 @@ static int ls_pcie_probe(struct platform_device *pdev)
struct dw_pcie *pci;
struct ls_pcie *pcie;
struct resource *dbi_base;
+   u32 index[2];
+   int ret;
 
pcie = devm_kzalloc(dev, sizeof(*pcie), GFP_KERNEL);
if (!pcie)
@@ -220,6 +284,7 @@ static int ls_pcie_probe(struct 

[PATCH v5 1/4] PCI: layerscape: Add function pointer for exit_from_l2()

2023-12-01 Thread Frank Li
Since different SoCs require different sequences for exiting L2, let's add
a separate "exit_from_l2()" callback. This callback can be used to execute
SoC specific sequences.

Change the ls_pcie_exit_from_l2() return value from void to int. Return an
error if exit_from_l2() fails in the resume flow.

Reviewed-by: Manivannan Sadhasivam 
Signed-off-by: Frank Li 
---

Notes:
Change from v4 to v5
- none
Change from v3 to v4
- update commit message
  Add mani's review by tag
Change from v2 to v3
- fixed according to mani's feedback
  1. update commit message
  2. move dw_pcie_host_ops to next patch
  3. check return value from exit_from_l2()
Change from v1 to v2
- change subject 'a' to 'A'

 drivers/pci/controller/dwc/pci-layerscape.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/controller/dwc/pci-layerscape.c b/drivers/pci/controller/dwc/pci-layerscape.c
index 37956e09c65bd..aea89926bcc4f 100644
--- a/drivers/pci/controller/dwc/pci-layerscape.c
+++ b/drivers/pci/controller/dwc/pci-layerscape.c
@@ -39,6 +39,7 @@
 
 struct ls_pcie_drvdata {
const u32 pf_off;
+   int (*exit_from_l2)(struct dw_pcie_rp *pp);
bool pm_support;
 };
 
@@ -125,7 +126,7 @@ static void ls_pcie_send_turnoff_msg(struct dw_pcie_rp *pp)
dev_err(pcie->pci->dev, "PME_Turn_off timeout\n");
 }
 
-static void ls_pcie_exit_from_l2(struct dw_pcie_rp *pp)
+static int ls_pcie_exit_from_l2(struct dw_pcie_rp *pp)
 {
struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
struct ls_pcie *pcie = to_ls_pcie(pci);
@@ -150,6 +151,8 @@ static void ls_pcie_exit_from_l2(struct dw_pcie_rp *pp)
 1);
if (ret)
dev_err(pcie->pci->dev, "L2 exit timeout\n");
+
+   return ret;
 }
 
 static int ls_pcie_host_init(struct dw_pcie_rp *pp)
@@ -180,6 +183,7 @@ static const struct ls_pcie_drvdata ls1021a_drvdata = {
 static const struct ls_pcie_drvdata layerscape_drvdata = {
.pf_off = 0xc,
.pm_support = true,
+   .exit_from_l2 = ls_pcie_exit_from_l2,
 };
 
 static const struct of_device_id ls_pcie_of_match[] = {
@@ -247,11 +251,14 @@ static int ls_pcie_suspend_noirq(struct device *dev)
 static int ls_pcie_resume_noirq(struct device *dev)
 {
struct ls_pcie *pcie = dev_get_drvdata(dev);
+   int ret;
 
if (!pcie->drvdata->pm_support)
return 0;
 
-   ls_pcie_exit_from_l2(&pcie->pci->pp);
+   ret = pcie->drvdata->exit_from_l2(&pcie->pci->pp);
+   if (ret)
+   return ret;
 
return dw_pcie_resume_noirq(pcie->pci);
 }
-- 
2.34.1



[PATCH v5 0/4] PCI: layerscape: Add suspend/resume support for ls1043 and ls1021

2023-12-01 Thread Frank Li
Add suspend/resume support for ls1043 and ls1021.

For the change log, see each patch.

Frank Li (4):
  PCI: layerscape: Add function pointer for exit_from_l2()
  PCI: layerscape: Add suspend/resume for ls1021a
  PCI: layerscape(ep): Rename pf_* as pf_lut_*
  PCI: layerscape: Add suspend/resume for ls1043a

 .../pci/controller/dwc/pci-layerscape-ep.c|  16 +-
 drivers/pci/controller/dwc/pci-layerscape.c   | 189 --
 2 files changed, 176 insertions(+), 29 deletions(-)

-- 
2.34.1



[PATCH 11/12] KVM: PPC: Reduce reliance on analyse_instr() in mmio emulation

2023-12-01 Thread Vaibhav Jain
From: Jordan Niethe 

Commit 709236039964 ("KVM: PPC: Reimplement non-SIMD LOAD/STORE
instruction mmio emulation with analyse_instr() input") and
commit 2b33cb585f94 ("KVM: PPC: Reimplement LOAD_FP/STORE_FP instruction
mmio emulation with analyse_instr() input") made
kvmppc_emulate_loadstore() use the results from analyse_instr() for
instruction emulation. In particular, the effective address from
analyse_instr() is used for UPDATE type instructions, and the fact that
op.val is already endian-corrected is used in the STORE case.

However, these changes now have some negative implications for the
nestedv2 case.  For analyse_instr() to determine the correct effective
address, the GPRs must be loaded from the L0. This is not needed as
vcpu->arch.vaddr_accessed is already set. Change back to using
vcpu->arch.vaddr_accessed.

In the STORE case, use the kvmppc_get_gpr() value instead of op.val.
kvmppc_get_gpr() will reload from the L0 if needed in the nestedv2 case.
This means whether a byte reversal is needed must now be passed to
kvmppc_handle_store(), like in the kvmppc_handle_load() case.

This means the call to kvmhv_nestedv2_reload_ptregs() can be avoided as
there is no concern about op.val being stale. Drop the call to
kvmhv_nestedv2_mark_dirty_ptregs() as without the call to
kvmhv_nestedv2_reload_ptregs(), stale state could be marked as valid.

This is fine, as the required dirty marking is already handled for
the UPDATE case by the call to kvmppc_set_gpr(). For LOADs, it is
handled in kvmppc_complete_mmio_load(). This is called either directly
in __kvmppc_handle_load() if the load can be handled in KVM, or on the
next kvm_arch_vcpu_ioctl_run() if an exit was required.

Signed-off-by: Jordan Niethe 
---
 arch/powerpc/kvm/emulate_loadstore.c | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kvm/emulate_loadstore.c b/arch/powerpc/kvm/emulate_loadstore.c
index 077fd88a0b68..ec60c7979718 100644
--- a/arch/powerpc/kvm/emulate_loadstore.c
+++ b/arch/powerpc/kvm/emulate_loadstore.c
@@ -93,7 +93,6 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
 
emulated = EMULATE_FAIL;
vcpu->arch.regs.msr = kvmppc_get_msr(vcpu);
-   kvmhv_nestedv2_reload_ptregs(vcpu, &vcpu->arch.regs);
if (analyse_instr(&op, &vcpu->arch.regs, inst) == 0) {
int type = op.type & INSTR_TYPE_MASK;
int size = GETSIZE(op.type);
@@ -112,7 +111,7 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
op.reg, size, !instr_byte_swap);
 
if ((op.type & UPDATE) && (emulated != EMULATE_FAIL))
-   kvmppc_set_gpr(vcpu, op.update_reg, op.ea);
+   kvmppc_set_gpr(vcpu, op.update_reg, vcpu->arch.vaddr_accessed);
 
break;
}
@@ -132,7 +131,7 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
 KVM_MMIO_REG_FPR|op.reg, size, 1);
 
if ((op.type & UPDATE) && (emulated != EMULATE_FAIL))
-   kvmppc_set_gpr(vcpu, op.update_reg, op.ea);
+   kvmppc_set_gpr(vcpu, op.update_reg, vcpu->arch.vaddr_accessed);
 
break;
 #endif
@@ -224,16 +223,17 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
break;
}
 #endif
-   case STORE:
-   /* if need byte reverse, op.val has been reversed by
-* analyse_instr().
-*/
-   emulated = kvmppc_handle_store(vcpu, op.val, size, 1);
+   case STORE: {
+   int instr_byte_swap = op.type & BYTEREV;
+
+   emulated = kvmppc_handle_store(vcpu, kvmppc_get_gpr(vcpu, op.reg),
+  size, !instr_byte_swap);
 
if ((op.type & UPDATE) && (emulated != EMULATE_FAIL))
-   kvmppc_set_gpr(vcpu, op.update_reg, op.ea);
+   kvmppc_set_gpr(vcpu, op.update_reg, vcpu->arch.vaddr_accessed);
 
break;
+   }
 #ifdef CONFIG_PPC_FPU
case STORE_FP:
if (kvmppc_check_fp_disabled(vcpu))
@@ -254,7 +254,7 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
kvmppc_get_fpr(vcpu, op.reg), size, 1);
 
if ((op.type & UPDATE) && (emulated != EMULATE_FAIL))
-   kvmppc_set_gpr(vcpu, op.update_reg, op.ea);
+   kvmppc_set_gpr(vcpu, op.update_reg, vcpu->arch.vaddr_accessed);
 
break;
 #endif
@@ -358,7 +358,6 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
}
 
trace_kvm_ppc_instr(ppc_inst_val(inst), 

[PATCH 09/12] KVM: PPC: Book3S HV nestedv2: Do not call H_COPY_TOFROM_GUEST

2023-12-01 Thread Vaibhav Jain
From: Jordan Niethe 

H_COPY_TOFROM_GUEST is part of the nestedv1 API and so should not be
called by a nestedv2 host. Do not attempt to call it.

Signed-off-by: Jordan Niethe 
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 916af6c153a5..4a1abb9f7c05 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -40,6 +40,9 @@ unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid,
unsigned long quadrant, ret = n;
bool is_load = !!to;
 
+   if (kvmhv_is_nestedv2())
+   return H_UNSUPPORTED;
+
/* Can't access quadrants 1 or 2 in non-HV mode, call the HV to do it */
if (kvmhv_on_pseries())
return plpar_hcall_norets(H_COPY_TOFROM_GUEST, lpid, pid, eaddr,
-- 
2.42.0



[PATCH 12/12] KVM: PPC: Book3S HV nestedv2: Do not cancel pending decrementer exception

2023-12-01 Thread Vaibhav Jain
From: Jordan Niethe 

In the nestedv2 case, if there is a pending decrementer exception, the
L1 must get the L2's timebase from the L0 to see if the exception should
be cancelled. This adds the overhead of a H_GUEST_GET_STATE call to the
likely case in which the decrementer should not be cancelled.

Avoid this logic for the nestedv2 case.

Signed-off-by: Jordan Niethe 
---
 arch/powerpc/kvm/book3s_hv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2ee3f2478570..e48126a59ba7 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4834,7 +4834,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
 * entering a nested guest in which case the decrementer is now owned
 * by L2 and the L1 decrementer is provided in hdec_expires
 */
-   if (kvmppc_core_pending_dec(vcpu) &&
+   if (!kvmhv_is_nestedv2() && kvmppc_core_pending_dec(vcpu) &&
((tb < kvmppc_dec_expires_host_tb(vcpu)) ||
 (trap == BOOK3S_INTERRUPT_SYSCALL &&
  kvmppc_get_gpr(vcpu, 3) == H_ENTER_NESTED)))
-- 
2.42.0
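As an aside, the saving here comes purely from C's short-circuit evaluation: placing the cheap kvmhv_is_nestedv2() test first in the && chain means the hcall-backed pending-decrementer check is never evaluated on a nestedv2 host. A minimal userspace sketch of the idea (the stub names below are hypothetical, not kernel code):

```c
#include <assert.h>

/* Hypothetical stand-ins, not kernel code: is_nestedv2() models
 * kvmhv_is_nestedv2(); expensive_pending_dec() models a check whose
 * evaluation costs an H_GUEST_GET_STATE hcall round trip. */
static int expensive_calls;

static int is_nestedv2(void)
{
	return 1;	/* pretend we are a nestedv2 L1 */
}

static int expensive_pending_dec(void)
{
	expensive_calls++;	/* models the hcall round trip */
	return 0;
}

/* The cheap feature test goes first, so on a nestedv2 host the &&
 * chain short-circuits and the expensive check is never evaluated. */
static int should_cancel_dec(void)
{
	return !is_nestedv2() && expensive_pending_dec();
}
```

On a nestedv2 host the expensive check is skipped entirely, which is exactly the effect of the one-line hunk above.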



[PATCH 03/12] KVM: PPC: Book3S HV nestedv2: Do not check msr on hcalls

2023-12-01 Thread Vaibhav Jain
From: Jordan Niethe 

The check for an hcall coming from userspace exists to support KVM-PR. This
is not supported for nestedv2, and the L0 will directly inject the necessary
exception into the L2 if userspace performs an hcall. Avoid checking the
MSR and thus avoid an H_GUEST_GET_STATE hcall in the L1.

Signed-off-by: Jordan Niethe 
---
 arch/powerpc/kvm/book3s_hv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 5543e8490cd9..069c336b6f3c 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1688,7 +1688,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
{
int i;
 
-   if (unlikely(__kvmppc_get_msr_hv(vcpu) & MSR_PR)) {
+   if (!kvmhv_is_nestedv2() && unlikely(__kvmppc_get_msr_hv(vcpu) & MSR_PR)) {
/*
 * Guest userspace executed sc 1. This can only be
 * reached by the P9 path because the old path
@@ -4949,7 +4949,7 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
if (run->exit_reason == KVM_EXIT_PAPR_HCALL) {
accumulate_time(vcpu, >arch.hcall);
 
-   if (WARN_ON_ONCE(__kvmppc_get_msr_hv(vcpu) & MSR_PR)) {
+   if (!kvmhv_is_nestedv2() && WARN_ON_ONCE(__kvmppc_get_msr_hv(vcpu) & MSR_PR)) {
/*
 * These should have been caught reflected
 * into the guest by now. Final sanity check:
-- 
2.42.0



[PATCH 10/12] KVM: PPC: Book3S HV nestedv2: Register the VPA with the L0

2023-12-01 Thread Vaibhav Jain
From: Jordan Niethe 

In the nestedv2 case, the L1 may register the L2's VPA with the L0. This
allows the L0 to manage the L2's dispatch count, as well as enable
possible performance optimisations by seeing if certain resources are
not being used by the L2 (such as the PMCs).

Use the H_GUEST_SET_STATE call to inform the L0 of the L2's VPA
address. This cannot be done in the H_GUEST_VCPU_RUN input buffer.

Signed-off-by: Jordan Niethe 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |  1 +
 arch/powerpc/kvm/book3s_hv.c | 38 ++--
 arch/powerpc/kvm/book3s_hv_nestedv2.c| 29 ++
 3 files changed, 59 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 2477021bff54..d8729ec81ca0 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -682,6 +682,7 @@ void kvmhv_nestedv2_vcpu_free(struct kvm_vcpu *vcpu, struct 
kvmhv_nestedv2_io *i
 int kvmhv_nestedv2_flush_vcpu(struct kvm_vcpu *vcpu, u64 time_limit);
 int kvmhv_nestedv2_set_ptbl_entry(unsigned long lpid, u64 dw0, u64 dw1);
 int kvmhv_nestedv2_parse_output(struct kvm_vcpu *vcpu);
+int kvmhv_nestedv2_set_vpa(struct kvm_vcpu *vcpu, unsigned long vpa);
 
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 47fe470375df..2ee3f2478570 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -650,7 +650,8 @@ static unsigned long do_h_register_vpa(struct kvm_vcpu 
*vcpu,
return err;
 }
 
-static void kvmppc_update_vpa(struct kvm_vcpu *vcpu, struct kvmppc_vpa *vpap)
+static void kvmppc_update_vpa(struct kvm_vcpu *vcpu, struct kvmppc_vpa *vpap,
+  struct kvmppc_vpa *old_vpap)
 {
struct kvm *kvm = vcpu->kvm;
void *va;
@@ -690,9 +691,8 @@ static void kvmppc_update_vpa(struct kvm_vcpu *vcpu, struct 
kvmppc_vpa *vpap)
kvmppc_unpin_guest_page(kvm, va, gpa, false);
va = NULL;
}
-   if (vpap->pinned_addr)
-   kvmppc_unpin_guest_page(kvm, vpap->pinned_addr, vpap->gpa,
-   vpap->dirty);
+   *old_vpap = *vpap;
+
vpap->gpa = gpa;
vpap->pinned_addr = va;
vpap->dirty = false;
@@ -702,6 +702,9 @@ static void kvmppc_update_vpa(struct kvm_vcpu *vcpu, struct 
kvmppc_vpa *vpap)
 
 static void kvmppc_update_vpas(struct kvm_vcpu *vcpu)
 {
+   struct kvm *kvm = vcpu->kvm;
+   struct kvmppc_vpa old_vpa = { 0 };
+
if (!(vcpu->arch.vpa.update_pending ||
  vcpu->arch.slb_shadow.update_pending ||
  vcpu->arch.dtl.update_pending))
@@ -709,17 +712,34 @@ static void kvmppc_update_vpas(struct kvm_vcpu *vcpu)
 
spin_lock(>arch.vpa_update_lock);
if (vcpu->arch.vpa.update_pending) {
-   kvmppc_update_vpa(vcpu, >arch.vpa);
-   if (vcpu->arch.vpa.pinned_addr)
+   kvmppc_update_vpa(vcpu, >arch.vpa, _vpa);
+   if (old_vpa.pinned_addr) {
+   if (kvmhv_is_nestedv2())
+   kvmhv_nestedv2_set_vpa(vcpu, ~0ull);
+   kvmppc_unpin_guest_page(kvm, old_vpa.pinned_addr, old_vpa.gpa,
+   old_vpa.dirty);
+   }
+   if (vcpu->arch.vpa.pinned_addr) {
init_vpa(vcpu, vcpu->arch.vpa.pinned_addr);
+   if (kvmhv_is_nestedv2())
+   kvmhv_nestedv2_set_vpa(vcpu, __pa(vcpu->arch.vpa.pinned_addr));
+   }
}
if (vcpu->arch.dtl.update_pending) {
-   kvmppc_update_vpa(vcpu, >arch.dtl);
+   kvmppc_update_vpa(vcpu, >arch.dtl, _vpa);
+   if (old_vpa.pinned_addr)
+   kvmppc_unpin_guest_page(kvm, old_vpa.pinned_addr, old_vpa.gpa,
+   old_vpa.dirty);
vcpu->arch.dtl_ptr = vcpu->arch.dtl.pinned_addr;
vcpu->arch.dtl_index = 0;
}
-   if (vcpu->arch.slb_shadow.update_pending)
-   kvmppc_update_vpa(vcpu, >arch.slb_shadow);
+   if (vcpu->arch.slb_shadow.update_pending) {
+   kvmppc_update_vpa(vcpu, >arch.slb_shadow, _vpa);
+   if (old_vpa.pinned_addr)
+   kvmppc_unpin_guest_page(kvm, old_vpa.pinned_addr, old_vpa.gpa,
+   old_vpa.dirty);
+   }
+
spin_unlock(>arch.vpa_update_lock);
 }
 
diff --git a/arch/powerpc/kvm/book3s_hv_nestedv2.c 
b/arch/powerpc/kvm/book3s_hv_nestedv2.c
index fd3c4f2d9480..5378eb40b162 100644
--- a/arch/powerpc/kvm/book3s_hv_nestedv2.c
+++ b/arch/powerpc/kvm/book3s_hv_nestedv2.c
@@ -855,6 +855,35 @@ int kvmhv_nestedv2_set_ptbl_entry(unsigned long lpid, u64 
dw0, u64 dw1)
 }
 

[PATCH 08/12] KVM: PPC: Book3S HV nestedv2: Avoid msr check in kvmppc_handle_exit_hv()

2023-12-01 Thread Vaibhav Jain
From: Jordan Niethe 

The MSR check in kvmppc_handle_exit_hv() is not needed for nestedv2 hosts;
skip it to avoid an H_GUEST_GET_STATE hcall.

Signed-off-by: Jordan Niethe 
---
 arch/powerpc/kvm/book3s_hv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 4dc6a928073f..47fe470375df 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1597,7 +1597,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 * That can happen due to a bug, or due to a machine check
 * occurring at just the wrong time.
 */
-   if (__kvmppc_get_msr_hv(vcpu) & MSR_HV) {
+   if (!kvmhv_is_nestedv2() && (__kvmppc_get_msr_hv(vcpu) & MSR_HV)) {
printk(KERN_EMERG "KVM trap in HV mode!\n");
printk(KERN_EMERG "trap=0x%x | pc=0x%lx | msr=0x%llx\n",
vcpu->arch.trap, kvmppc_get_pc(vcpu),
-- 
2.42.0



[PATCH 04/12] KVM: PPC: Book3S HV nestedv2: Get the PID only if needed to copy tofrom a guest

2023-12-01 Thread Vaibhav Jain
From: Jordan Niethe 

kvmhv_copy_tofrom_guest_radix() gets the PID at the start of the
function. If the PID ends up unused, this is a wasted H_GUEST_GET_STATE
hcall on nestedv2 hosts. Move the assignment to the point where the PID is used.

Suggested-by: Nicholas Piggin 
Signed-off-by: Jordan Niethe 
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 175a8eb2681f..916af6c153a5 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -97,7 +97,7 @@ static long kvmhv_copy_tofrom_guest_radix(struct kvm_vcpu 
*vcpu, gva_t eaddr,
  void *to, void *from, unsigned long n)
 {
int lpid = vcpu->kvm->arch.lpid;
-   int pid = kvmppc_get_pid(vcpu);
+   int pid;
 
/* This would cause a data segment intr so don't allow the access */
if (eaddr & (0x3FFUL << 52))
@@ -110,6 +110,8 @@ static long kvmhv_copy_tofrom_guest_radix(struct kvm_vcpu 
*vcpu, gva_t eaddr,
/* If accessing quadrant 3 then pid is expected to be 0 */
if (((eaddr >> 62) & 0x3) == 0x3)
pid = 0;
+   else
+   pid = kvmppc_get_pid(vcpu);
 
eaddr &= ~(0xFFFUL << 52);
 
-- 
2.42.0
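The pattern in this patch — deferring an expensive getter until its result is actually needed — can be sketched outside the kernel. The stubs below are hypothetical stand-ins (stub_get_pid() models kvmppc_get_pid(), whose read costs an H_GUEST_GET_STATE hcall on a nestedv2 host), not the real code:

```c
#include <assert.h>

/* Counts how many times the hcall-backed getter is invoked. */
static int get_state_hcalls;

static int stub_get_pid(void)
{
	get_state_hcalls++;	/* models the hcall round trip */
	return 42;
}

/* Eager variant: always pays for the getter, as before the patch. */
static int pid_eager(unsigned long eaddr)
{
	int pid = stub_get_pid();

	/* Quadrant 3 accesses expect pid == 0. */
	if (((eaddr >> 62) & 0x3) == 0x3)
		pid = 0;
	return pid;
}

/* Lazy variant: only fetches the PID when the quadrant needs it. */
static int pid_lazy(unsigned long eaddr)
{
	int pid;

	if (((eaddr >> 62) & 0x3) == 0x3)
		pid = 0;
	else
		pid = stub_get_pid();
	return pid;
}
```

For a quadrant-3 address the lazy variant performs no hcall at all, which is the saving the patch is after.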



[PATCH 07/12] KVM: PPC: Book3S HV nestedv2: Do not inject certain interrupts

2023-12-01 Thread Vaibhav Jain
From: Jordan Niethe 

There is no need to inject an external interrupt in
kvmppc_book3s_irqprio_deliver() as the test for BOOK3S_IRQPRIO_EXTERNAL
in kvmhv_run_single_vcpu() before guest entry will raise LPCR_MER if
needed. There is also no need to inject the decrementer interrupt as
this will be raised within the L2 if needed. Avoiding these injections
reduces H_GUEST_GET_STATE hcalls by the L1.

Suggested-by: Nicholas Piggin 
Signed-off-by: Jordan Niethe 
---
 arch/powerpc/kvm/book3s.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 6cd20ab9e94e..8acec144120e 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -302,11 +302,11 @@ static int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu 
*vcpu,
 
switch (priority) {
case BOOK3S_IRQPRIO_DECREMENTER:
-   deliver = (kvmppc_get_msr(vcpu) & MSR_EE) && !crit;
+   deliver = !kvmhv_is_nestedv2() && (kvmppc_get_msr(vcpu) & MSR_EE) && !crit;
vec = BOOK3S_INTERRUPT_DECREMENTER;
break;
case BOOK3S_IRQPRIO_EXTERNAL:
-   deliver = (kvmppc_get_msr(vcpu) & MSR_EE) && !crit;
+   deliver = !kvmhv_is_nestedv2() && (kvmppc_get_msr(vcpu) & MSR_EE) && !crit;
vec = BOOK3S_INTERRUPT_EXTERNAL;
break;
case BOOK3S_IRQPRIO_SYSTEM_RESET:
-- 
2.42.0



[PATCH 00/12] KVM: PPC: Nested APIv2 : Performance improvements

2023-12-01 Thread Vaibhav Jain
From: vaibhav 


This patch series introduces a series of performance improvements to the
recently added support for Nested APIv2 PPC64 guests via [1]. Details on
Nested APIv2 for PPC64 guests are available in Documentation/powerpc/kvm-nested.rst.

This patch series introduces various optimizations for Nested APIv2
guests, namely:

* Reduce the number of times the L1 hypervisor requests L2 state from the L0.
* Register the L2 VPA with the L1.
* Optimize interrupt delivery for some interrupt types.
* Optimize emulation of MMIO loads/stores for the L2 in the L1.

The hcalls needed for testing these patches have been implemented in the
spapr qemu model and are available at [2].

There are scripts available to assist in setting up an environment for
testing nested guests at [3].

These patches are a consequence of insights from an ongoing performance
engineering effort to improve the performance of Nested APIv2
guests. Special thanks go to:
* Gautam Menghani
* Jordan Niethe
* Nicholas Piggin
* Vaidyanathan Srinivasan

Refs:
[1] https://lore.kernel.org/all/20230905034658.82835-1-jniet...@gmail.com
[2] https://github.com/planetharsh/qemu/tree/upstream-0714-kop
[3] https://github.com/iamjpn/kvm-powervm-test

Jordan Niethe (11):
  KVM: PPC: Book3S HV nestedv2: Invalidate RPT before deleting a guest
  KVM: PPC: Book3S HV nestedv2: Avoid reloading the tb offset
  KVM: PPC: Book3S HV nestedv2: Do not check msr on hcalls
  KVM: PPC: Book3S HV nestedv2: Get the PID only if needed to copy
tofrom a guest
  KVM: PPC: Book3S HV nestedv2: Ensure LPCR_MER bit is passed to the L0
  KVM: PPC: Book3S HV nestedv2: Do not inject certain interrupts
  KVM: PPC: Book3S HV nestedv2: Avoid msr check in
kvmppc_handle_exit_hv()
  KVM: PPC: Book3S HV nestedv2: Do not call H_COPY_TOFROM_GUEST
  KVM: PPC: Book3S HV nestedv2: Register the VPA with the L0
  KVM: PPC: Reduce reliance on analyse_instr() in mmio emulation
  KVM: PPC: Book3S HV nestedv2: Do not cancel pending decrementer
exception

Nicholas Piggin (1):
  KVM: PPC: Book3S HV: Handle pending exceptions on guest entry with
MSR_EE

 arch/powerpc/include/asm/kvm_book3s.h| 10 +++-
 arch/powerpc/include/asm/kvm_book3s_64.h |  1 +
 arch/powerpc/kvm/book3s.c|  4 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c   |  7 ++-
 arch/powerpc/kvm/book3s_hv.c | 72 +---
 arch/powerpc/kvm/book3s_hv_nested.c  |  2 +-
 arch/powerpc/kvm/book3s_hv_nestedv2.c| 29 ++
 arch/powerpc/kvm/emulate_loadstore.c | 21 ---
 8 files changed, 107 insertions(+), 39 deletions(-)

-- 
2.42.0



[PATCH 06/12] KVM: PPC: Book3S HV: Handle pending exceptions on guest entry with MSR_EE

2023-12-01 Thread Vaibhav Jain
From: Nicholas Piggin 

Commit 026728dc5d41 ("KVM: PPC: Book3S HV P9: Inject pending xive
interrupts at guest entry") changed guest entry so that if external
interrupts are enabled, BOOK3S_IRQPRIO_EXTERNAL is not tested for. Test
for this regardless of MSR_EE.

For an L1 host, do not inject an interrupt, but always
use LPCR_MER. The L0 can inject an interrupt if it desires.

Fixes: 026728dc5d41 ("KVM: PPC: Book3S HV P9: Inject pending xive interrupts at 
guest entry")
Signed-off-by: Nicholas Piggin 
[jpn: use kvmpcc_get_msr(), write commit message]
Signed-off-by: Jordan Niethe 
---
 arch/powerpc/kvm/book3s_hv.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6d1f0bca27aa..4dc6a928073f 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4738,13 +4738,19 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 
time_limit,
 
if (!nested) {
kvmppc_core_prepare_to_enter(vcpu);
-   if (__kvmppc_get_msr_hv(vcpu) & MSR_EE) {
-   if (xive_interrupt_pending(vcpu))
+   if (test_bit(BOOK3S_IRQPRIO_EXTERNAL,
+>arch.pending_exceptions) ||
+   xive_interrupt_pending(vcpu)) {
+   /*
+* For nested HV, don't synthesize but always pass MER,
+* the L0 will be able to optimise that more
+* effectively than manipulating registers directly.
+*/
+   if (!kvmhv_on_pseries() && (__kvmppc_get_msr_hv(vcpu) & MSR_EE))
kvmppc_inject_interrupt_hv(vcpu,
-   BOOK3S_INTERRUPT_EXTERNAL, 0);
-   } else if (test_bit(BOOK3S_IRQPRIO_EXTERNAL,
->arch.pending_exceptions)) {
-   lpcr |= LPCR_MER;
+   BOOK3S_INTERRUPT_EXTERNAL, 0);
+   else
+   lpcr |= LPCR_MER;
}
} else if (vcpu->arch.pending_exceptions ||
   vcpu->arch.doorbell_request ||
-- 
2.42.0



[PATCH 05/12] KVM: PPC: Book3S HV nestedv2: Ensure LPCR_MER bit is passed to the L0

2023-12-01 Thread Vaibhav Jain
From: Jordan Niethe 

LPCR_MER is conditionally set during entry to a guest if there is a
pending external interrupt. In the nestedv2 case, this change is not
being communicated to the L0, which means it is not being set in the L2.
Ensure the updated LPCR value is passed to the L0.

Signed-off-by: Jordan Niethe 
---
 arch/powerpc/kvm/book3s_hv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 069c336b6f3c..6d1f0bca27aa 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4084,6 +4084,8 @@ static int kvmhv_vcpu_entry_nestedv2(struct kvm_vcpu 
*vcpu, u64 time_limit,
if (rc < 0)
return -EINVAL;
 
+   kvmppc_gse_put_u64(io->vcpu_run_input, KVMPPC_GSID_LPCR, lpcr);
+
accumulate_time(vcpu, >arch.in_guest);
rc = plpar_guest_run_vcpu(0, vcpu->kvm->arch.lpid, vcpu->vcpu_id,
  , );
-- 
2.42.0



[PATCH 01/12] KVM: PPC: Book3S HV nestedv2: Invalidate RPT before deleting a guest

2023-12-01 Thread Vaibhav Jain
From: Jordan Niethe 

An L0 must invalidate the L2's RPT during H_GUEST_DELETE if this has not
already been done. This is a slow operation, which means H_GUEST_DELETE
must return H_BUSY multiple times before completing. Invalidate the
tables before deleting the guest so there is less work for the L0 to do.

Signed-off-by: Jordan Niethe 
---
 arch/powerpc/include/asm/kvm_book3s.h | 1 +
 arch/powerpc/kvm/book3s_hv.c  | 6 --
 arch/powerpc/kvm/book3s_hv_nested.c   | 2 +-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 4f527d09c92b..a37736ed3728 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -302,6 +302,7 @@ void kvmhv_nested_exit(void);
 void kvmhv_vm_nested_init(struct kvm *kvm);
 long kvmhv_set_partition_table(struct kvm_vcpu *vcpu);
 long kvmhv_copy_tofrom_guest_nested(struct kvm_vcpu *vcpu);
+void kvmhv_flush_lpid(u64 lpid);
 void kvmhv_set_ptbl_entry(u64 lpid, u64 dw0, u64 dw1);
 void kvmhv_release_all_nested(struct kvm *kvm);
 long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 1ed6ec140701..5543e8490cd9 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -5691,10 +5691,12 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
kvmhv_set_ptbl_entry(kvm->arch.lpid, 0, 0);
}
 
-   if (kvmhv_is_nestedv2())
+   if (kvmhv_is_nestedv2()) {
+   kvmhv_flush_lpid(kvm->arch.lpid);
plpar_guest_delete(0, kvm->arch.lpid);
-   else
+   } else {
kvmppc_free_lpid(kvm->arch.lpid);
+   }
 
kvmppc_free_pimap(kvm);
 }
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
b/arch/powerpc/kvm/book3s_hv_nested.c
index 3b658b8696bc..5c375ec1a3c6 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -503,7 +503,7 @@ void kvmhv_nested_exit(void)
}
 }
 
-static void kvmhv_flush_lpid(u64 lpid)
+void kvmhv_flush_lpid(u64 lpid)
 {
long rc;
 
-- 
2.42.0
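The H_BUSY dynamic this patch avoids can be sketched with a toy model. Everything below is hypothetical stand-in code, not kernel or PAPR code: stub_guest_delete() models an H_GUEST_DELETE that can only invalidate a bounded chunk of the RPT per call, and stub_flush_lpid() models kvmhv_flush_lpid() emptying the tables up front:

```c
#include <assert.h>

#define H_SUCCESS 0
#define H_BUSY    1

/* How much RPT invalidation work is left for the "L0" to do. */
static int pages_to_invalidate;

static void stub_flush_lpid(void)
{
	pages_to_invalidate = 0;	/* models kvmhv_flush_lpid() */
}

/* Each call retires a bounded chunk, returning H_BUSY until done. */
static int stub_guest_delete(void)
{
	if (pages_to_invalidate > 0) {
		pages_to_invalidate -= 1;
		return H_BUSY;
	}
	return H_SUCCESS;
}

/* Delete the guest, retrying on H_BUSY; returns the number of
 * delete hcalls it took to complete. */
static int delete_guest(int flush_first)
{
	int calls = 0;

	if (flush_first)
		stub_flush_lpid();
	while (stub_guest_delete() == H_BUSY)
		calls++;
	return calls + 1;
}
```

Flushing first collapses the H_BUSY retry loop to a single successful delete call, which mirrors the ordering the patch introduces in kvmppc_core_destroy_vm_hv().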



[PATCH 02/12] KVM: PPC: Book3S HV nestedv2: Avoid reloading the tb offset

2023-12-01 Thread Vaibhav Jain
From: Jordan Niethe 

The kvmppc_get_tb_offset() getter reloads KVMPPC_GSID_TB_OFFSET from the
L0 for a nestedv2 host. This is unnecessary as the value does not change.
KVMPPC_GSID_TB_OFFSET also need not be reloaded in
kvmppc_{s,g}et_dec_expires().

Signed-off-by: Jordan Niethe 
---
 arch/powerpc/include/asm/kvm_book3s.h | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index a37736ed3728..3e1e2a698c9e 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -594,13 +594,17 @@ static inline u##size kvmppc_get_##reg(struct kvm_vcpu 
*vcpu) \
 
 
 KVMPPC_BOOK3S_VCORE_ACCESSOR(vtb, 64, KVMPPC_GSID_VTB)
-KVMPPC_BOOK3S_VCORE_ACCESSOR(tb_offset, 64, KVMPPC_GSID_TB_OFFSET)
 KVMPPC_BOOK3S_VCORE_ACCESSOR_GET(arch_compat, 32, KVMPPC_GSID_LOGICAL_PVR)
 KVMPPC_BOOK3S_VCORE_ACCESSOR_GET(lpcr, 64, KVMPPC_GSID_LPCR)
+KVMPPC_BOOK3S_VCORE_ACCESSOR_SET(tb_offset, 64, KVMPPC_GSID_TB_OFFSET)
+
+static inline u64 kvmppc_get_tb_offset(struct kvm_vcpu *vcpu)
+{
+   return vcpu->arch.vcore->tb_offset;
+}
 
 static inline u64 kvmppc_get_dec_expires(struct kvm_vcpu *vcpu)
 {
-   WARN_ON(kvmhv_nestedv2_cached_reload(vcpu, KVMPPC_GSID_TB_OFFSET) < 0);
WARN_ON(kvmhv_nestedv2_cached_reload(vcpu, KVMPPC_GSID_DEC_EXPIRY_TB) < 0);
return vcpu->arch.dec_expires;
 }
@@ -608,7 +612,6 @@ static inline u64 kvmppc_get_dec_expires(struct kvm_vcpu 
*vcpu)
 static inline void kvmppc_set_dec_expires(struct kvm_vcpu *vcpu, u64 val)
 {
vcpu->arch.dec_expires = val;
-   WARN_ON(kvmhv_nestedv2_cached_reload(vcpu, KVMPPC_GSID_TB_OFFSET) < 0);
kvmhv_nestedv2_mark_dirty(vcpu, KVMPPC_GSID_DEC_EXPIRY_TB);
 }
 
-- 
2.42.0
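The trade-off in this patch is a general one: a getter for a value that is immutable after setup should read the cached field rather than refresh it from a remote authority on every call. A toy sketch, with hypothetical stub names (tb_offset_reloading() models the generated accessor that calls kvmhv_nestedv2_cached_reload(); tb_offset_cached() models the open-coded replacement above):

```c
#include <assert.h>

/* Counts simulated reload hcalls issued to the "L0". */
static int reload_hcalls;

/* The timebase offset never changes after guest setup. */
static unsigned long cached_tb_offset = 0x1000;

/* Old behaviour: refresh from the L0 on every read. */
static unsigned long tb_offset_reloading(void)
{
	reload_hcalls++;	/* models kvmhv_nestedv2_cached_reload() */
	return cached_tb_offset;
}

/* New behaviour: trust the cache, since the value is immutable. */
static unsigned long tb_offset_cached(void)
{
	return cached_tb_offset;
}
```

Both return the same value, but only the first pays an hcall per read; dropping the reload from the hot getters is the entire optimisation.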



[powerpc:next-test] BUILD SUCCESS 0f9c7c805ff837d0d0ffeaa9cc16d9664c9aa325

2023-12-01 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next-test
branch HEAD: 0f9c7c805ff837d0d0ffeaa9cc16d9664c9aa325  selftests/powerpc: Check 
all FPRs in fpu_syscall test

elapsed time: 1453m

configs tested: 206
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

tested configs:
alpha allnoconfig   gcc  
alphaallyesconfig   gcc  
alpha   defconfig   gcc  
arc  allmodconfig   gcc  
arc   allnoconfig   gcc  
arc  allyesconfig   gcc  
arc defconfig   gcc  
archsdk_defconfig   gcc  
arc   randconfig-001-20231201   gcc  
arc   randconfig-002-20231201   gcc  
arc   tb10x_defconfig   gcc  
arcvdk_hs38_defconfig   gcc  
arm  allmodconfig   gcc  
arm   allnoconfig   gcc  
arm  allyesconfig   gcc  
arm defconfig   clang
arm mv78xx0_defconfig   clang
armneponset_defconfig   clang
arm   randconfig-001-20231201   gcc  
arm   randconfig-002-20231201   gcc  
arm   randconfig-003-20231201   gcc  
arm   randconfig-004-20231201   gcc  
armspear6xx_defconfig   gcc  
arm64allmodconfig   clang
arm64 allnoconfig   gcc  
arm64   defconfig   gcc  
arm64 randconfig-001-20231201   gcc  
arm64 randconfig-002-20231201   gcc  
arm64 randconfig-003-20231201   gcc  
arm64 randconfig-004-20231201   gcc  
csky allmodconfig   gcc  
csky  allnoconfig   gcc  
csky allyesconfig   gcc  
cskydefconfig   gcc  
csky  randconfig-001-20231201   gcc  
csky  randconfig-002-20231201   gcc  
hexagon  allmodconfig   clang
hexagon   allnoconfig   clang
hexagon  allyesconfig   clang
hexagon defconfig   clang
hexagon   randconfig-001-20231201   clang
hexagon   randconfig-002-20231201   clang
i386 allmodconfig   clang
i386  allnoconfig   clang
i386 allyesconfig   clang
i386 buildonly-randconfig-001-20231130   gcc  
i386 buildonly-randconfig-002-20231130   gcc  
i386 buildonly-randconfig-003-20231130   gcc  
i386 buildonly-randconfig-004-20231130   gcc  
i386 buildonly-randconfig-005-20231130   gcc  
i386 buildonly-randconfig-006-20231130   gcc  
i386defconfig   gcc  
i386  randconfig-001-20231130   gcc  
i386  randconfig-002-20231130   gcc  
i386  randconfig-003-20231130   gcc  
i386  randconfig-004-20231130   gcc  
i386  randconfig-005-20231130   gcc  
i386  randconfig-006-20231130   gcc  
i386  randconfig-011-20231130   clang
i386  randconfig-011-20231201   clang
i386  randconfig-012-20231130   clang
i386  randconfig-012-20231201   clang
i386  randconfig-013-20231130   clang
i386  randconfig-013-20231201   clang
i386  randconfig-014-20231130   clang
i386  randconfig-014-20231201   clang
i386  randconfig-015-20231130   clang
i386  randconfig-015-20231201   clang
i386  randconfig-016-20231130   clang
i386  randconfig-016-20231201   clang
loongarchallmodconfig   gcc  
loongarch allnoconfig   gcc  
loongarchallyesconfig   gcc  
loongarch   defconfig   gcc  
loongarch loongson3_defconfig   gcc  
loongarch randconfig-001-20231201   gcc  
loongarch randconfig-002-20231201   gcc  
m68k allmodconfig   gcc  
m68k  allnoconfig   gcc  
m68k allyesconfig   gcc  
m68kdefconfig   gcc  
m68k  hp300_defconfig   gcc  
m68k   virt_defconfig   gcc  
microblaze   allmodconfig   gcc  
microblazeallnoconfig   gcc  
microblaze

[powerpc:next] BUILD SUCCESS ede66cd22441820cbd399936bf84fdc4294bc7fa

2023-12-01 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next
branch HEAD: ede66cd22441820cbd399936bf84fdc4294bc7fa  powerpc/64s: Fix 
CONFIG_NUMA=n build due to create_section_mapping()

elapsed time: 1451m

configs tested: 200
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

tested configs:
alpha allnoconfig   gcc  
alphaallyesconfig   gcc  
alpha   defconfig   gcc  
arc  allmodconfig   gcc  
arc   allnoconfig   gcc  
arc  allyesconfig   gcc  
arc defconfig   gcc  
archsdk_defconfig   gcc  
arc   randconfig-001-20231201   gcc  
arc   randconfig-002-20231201   gcc  
arc   tb10x_defconfig   gcc  
arcvdk_hs38_defconfig   gcc  
arm  allmodconfig   gcc  
arm   allnoconfig   gcc  
arm  allyesconfig   gcc  
arm defconfig   clang
arm   randconfig-001-20231201   gcc  
arm   randconfig-002-20231201   gcc  
arm   randconfig-003-20231201   gcc  
arm   randconfig-004-20231201   gcc  
armspear6xx_defconfig   gcc  
arm64allmodconfig   clang
arm64 allnoconfig   gcc  
arm64   defconfig   gcc  
arm64 randconfig-001-20231201   gcc  
arm64 randconfig-002-20231201   gcc  
arm64 randconfig-003-20231201   gcc  
arm64 randconfig-004-20231201   gcc  
csky allmodconfig   gcc  
csky  allnoconfig   gcc  
csky allyesconfig   gcc  
cskydefconfig   gcc  
csky  randconfig-001-20231201   gcc  
csky  randconfig-002-20231201   gcc  
hexagon  allmodconfig   clang
hexagon   allnoconfig   clang
hexagon  allyesconfig   clang
hexagon defconfig   clang
i386 allmodconfig   clang
i386  allnoconfig   clang
i386 allyesconfig   clang
i386 buildonly-randconfig-001-20231130   gcc  
i386 buildonly-randconfig-002-20231130   gcc  
i386 buildonly-randconfig-003-20231130   gcc  
i386 buildonly-randconfig-004-20231130   gcc  
i386 buildonly-randconfig-005-20231130   gcc  
i386 buildonly-randconfig-006-20231130   gcc  
i386defconfig   gcc  
i386  randconfig-001-20231130   gcc  
i386  randconfig-002-20231130   gcc  
i386  randconfig-003-20231130   gcc  
i386  randconfig-004-20231130   gcc  
i386  randconfig-005-20231130   gcc  
i386  randconfig-006-20231130   gcc  
i386  randconfig-011-20231130   clang
i386  randconfig-011-20231201   clang
i386  randconfig-012-20231130   clang
i386  randconfig-012-20231201   clang
i386  randconfig-013-20231130   clang
i386  randconfig-013-20231201   clang
i386  randconfig-014-20231130   clang
i386  randconfig-014-20231201   clang
i386  randconfig-015-20231130   clang
i386  randconfig-015-20231201   clang
i386  randconfig-016-20231130   clang
i386  randconfig-016-20231201   clang
loongarchallmodconfig   gcc  
loongarch allnoconfig   gcc  
loongarchallyesconfig   gcc  
loongarch   defconfig   gcc  
loongarch loongson3_defconfig   gcc  
loongarch randconfig-001-20231201   gcc  
loongarch randconfig-002-20231201   gcc  
m68k allmodconfig   gcc  
m68k  allnoconfig   gcc  
m68k allyesconfig   gcc  
m68kdefconfig   gcc  
m68k  hp300_defconfig   gcc  
m68k   virt_defconfig   gcc  
microblaze   allmodconfig   gcc  
microblazeallnoconfig   gcc  
microblaze   allyesconfig   gcc  
microblaze  defconfig   gcc  
mips allmodconfig   gcc  
mips  allnoconfig   clang
mips

Re: [PATCH v2] powerpc/book3s/hash: Drop _PAGE_PRIVILEGED from PAGE_NONE

2023-12-01 Thread Christophe Leroy


Le 01/12/2023 à 11:35, Michael Ellerman a écrit :
> "Aneesh Kumar K.V"  writes:
>> There used to be a dependency on _PAGE_PRIVILEGED with pte_savedwrite.
>> But that got dropped by
>> commit 6a56ccbcf6c6 ("mm/autonuma: use can_change_(pte|pmd)_writable() to 
>> replace savedwrite")
>>
>> With the change in this patch numa fault pte (pte_protnone()) gets mapped as 
>> regular user pte
>> with RWX cleared (no-access) whereas earlier it used to be mapped 
>> _PAGE_PRIVILEGED.
>>
>> Hash fault handling code did get some WARN_ON added because those
>> functions are not expected to get called with _PAGE_READ cleared.
>> commit 18061c17c8ec ("powerpc/mm: Update PROTFAULT handling in the page 
>> fault path")
>> explains the details.
>   
> You say "did get" which makes me think you're talking about the past.
> But I think you're referring to the WARN_ON you are adding in this patch?
> 
>> Also revert commit 1abce0580b89 ("powerpc/64s: Fix __pte_needs_flush() false 
>> positive warning")
> 
> That could be done separately as a follow-up couldn't it? Would reduce
> the diff size.
> 
>> Signed-off-by: Aneesh Kumar K.V 
>> ---
>>   arch/powerpc/include/asm/book3s/64/pgtable.h  | 9 +++--
>>   arch/powerpc/include/asm/book3s/64/tlbflush.h | 9 ++---
>>   arch/powerpc/mm/book3s64/hash_utils.c | 7 +++
>>   3 files changed, 12 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
>> b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> index cb77eddca54b..2cc58ac74080 100644
>> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
>> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> @@ -17,12 +17,6 @@
>>   #define _PAGE_EXEC 0x1 /* execute permission */
>>   #define _PAGE_WRITE0x2 /* write access allowed */
>>   #define _PAGE_READ 0x4 /* read access allowed */
>> -#define _PAGE_NA_PAGE_PRIVILEGED
>   
>> -#define _PAGE_NAX   _PAGE_EXEC
>> -#define _PAGE_RO_PAGE_READ
>> -#define _PAGE_ROX   (_PAGE_READ | _PAGE_EXEC)
>> -#define _PAGE_RW(_PAGE_READ | _PAGE_WRITE)
>> -#define _PAGE_RWX   (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
>   
> Those are unrelated I think?

They are related. Those exists only because _PAGE_NA is different from 
the one defined in asm/pgtable-masks.h

As soon as you remove _PAGE_PRIVILEGED from _PAGE_NA, everything become 
standard and is taken from asm/pgtable-masks.h

> 
>>   #define _PAGE_PRIVILEGED   0x8 /* kernel access only */
>>   #define _PAGE_SAO  0x00010 /* Strong access order */
>>   #define _PAGE_NON_IDEMPOTENT   0x00020 /* non idempotent memory */
>> @@ -529,6 +523,9 @@ static inline bool pte_user(pte_t pte)
>>   }
>>   
>>   #define pte_access_permitted pte_access_permitted
>> +/*
>> + * execute-only mappings return false
>> + */
> 
> That would fit better in the existing comment block inside the function
> I think. Normally this location would be a function description comment.
> 
>>   static inline bool pte_access_permitted(pte_t pte, bool write)
>>   {
>>  /*
>ie. here
> 
> cheers


Re: [PATCH v3 5/7] kexec_file, ricv: print out debugging message if required

2023-12-01 Thread Conor Dooley
On Thu, Nov 30, 2023 at 10:39:53AM +0800, Baoquan He wrote:

$subject has a typo in the arch bit :)

> Replace pr_debug() with the newly added kexec_dprintk() in kexec_file
> loading related codes.

Commit messages should be understandable in isolation, but this only
explains (part of) what is obvious in the diff. Why is this change
being made?

> 
> And also remove kexec_image_info() because the content has been printed
> out in generic code.
> 
> Signed-off-by: Baoquan He 
> ---
>  arch/riscv/kernel/elf_kexec.c | 11 ++-
>  arch/riscv/kernel/machine_kexec.c | 26 --
>  2 files changed, 6 insertions(+), 31 deletions(-)
> 
> diff --git a/arch/riscv/kernel/elf_kexec.c b/arch/riscv/kernel/elf_kexec.c
> index e60fbd8660c4..5bd1ec3341fe 100644
> --- a/arch/riscv/kernel/elf_kexec.c
> +++ b/arch/riscv/kernel/elf_kexec.c
> @@ -216,7 +216,6 @@ static void *elf_kexec_load(struct kimage *image, char 
> *kernel_buf,
>   if (ret)
>   goto out;
>   kernel_start = image->start;
> - pr_notice("The entry point of kernel at 0x%lx\n", image->start);
>  
>   /* Add the kernel binary to the image */
>   ret = riscv_kexec_elf_load(image, , _info,
> @@ -252,8 +251,8 @@ static void *elf_kexec_load(struct kimage *image, char 
> *kernel_buf,
>   image->elf_load_addr = kbuf.mem;
>   image->elf_headers_sz = headers_sz;
>  
> - pr_debug("Loaded elf core header at 0x%lx bufsz=0x%lx 
> memsz=0x%lx\n",
> -  image->elf_load_addr, kbuf.bufsz, kbuf.memsz);
> + kexec_dprintk("Loaded elf core header at 0x%lx bufsz=0x%lx 
> memsz=0x%lx\n",
> +   image->elf_load_addr, kbuf.bufsz, kbuf.memsz);
>  
>   /* Setup cmdline for kdump kernel case */
>   modified_cmdline = setup_kdump_cmdline(image, cmdline,
> @@ -275,6 +274,8 @@ static void *elf_kexec_load(struct kimage *image, char 
> *kernel_buf,
>   pr_err("Error loading purgatory ret=%d\n", ret);
>   goto out;
>   }
> + kexec_dprintk("Loaded purgatory at 0x%lx\n", kbuf.mem);
> +
>   ret = kexec_purgatory_get_set_symbol(image, "riscv_kernel_entry",
>_start,
>sizeof(kernel_start), 0);
> @@ -293,7 +294,7 @@ static void *elf_kexec_load(struct kimage *image, char 
> *kernel_buf,
>   if (ret)
>   goto out;
>   initrd_pbase = kbuf.mem;

> - pr_notice("Loaded initrd at 0x%lx\n", initrd_pbase);
> + kexec_dprintk("Loaded initrd at 0x%lx\n", initrd_pbase);

This is not a pr_debug().

>   }
>  
>   /* Add the DTB to the image */
> @@ -318,7 +319,7 @@ static void *elf_kexec_load(struct kimage *image, char 
> *kernel_buf,
>   }
>   /* Cache the fdt buffer address for memory cleanup */
>   image->arch.fdt = fdt;

> - pr_notice("Loaded device tree at 0x%lx\n", kbuf.mem);
> + kexec_dprintk("Loaded device tree at 0x%lx\n", kbuf.mem);

Neither is this. Why are they being moved from pr_notice()?

Thanks,
Conor.

>   goto out;
>  
>  out_free_fdt:
> diff --git a/arch/riscv/kernel/machine_kexec.c 
> b/arch/riscv/kernel/machine_kexec.c
> index 2d139b724bc8..ed9cad20c039 100644
> --- a/arch/riscv/kernel/machine_kexec.c
> +++ b/arch/riscv/kernel/machine_kexec.c
> @@ -18,30 +18,6 @@
>  #include 
>  #include 
>  
> -/*
> - * kexec_image_info - Print received image details
> - */
> -static void
> -kexec_image_info(const struct kimage *image)
> -{
> - unsigned long i;
> -
> - pr_debug("Kexec image info:\n");
> - pr_debug("\ttype:%d\n", image->type);
> - pr_debug("\tstart:   %lx\n", image->start);
> - pr_debug("\thead:%lx\n", image->head);
> - pr_debug("\tnr_segments: %lu\n", image->nr_segments);
> -
> - for (i = 0; i < image->nr_segments; i++) {
> - pr_debug("\tsegment[%lu]: %016lx - %016lx", i,
> - image->segment[i].mem,
> - image->segment[i].mem + image->segment[i].memsz);
> - pr_debug("\t\t0x%lx bytes, %lu pages\n",
> - (unsigned long) image->segment[i].memsz,
> - (unsigned long) image->segment[i].memsz /  PAGE_SIZE);
> - }
> -}
> -
>  /*
>   * machine_kexec_prepare - Initialize kexec
>   *
> @@ -60,8 +36,6 @@ machine_kexec_prepare(struct kimage *image)
>   unsigned int control_code_buffer_sz = 0;
>   int i = 0;
>  
> - kexec_image_info(image);
> -
>   /* Find the Flattened Device Tree and save its physical address */
>   for (i = 0; i < image->nr_segments; i++) {
>   if (image->segment[i].memsz <= sizeof(fdt))
> -- 
> 2.41.0
> 




Re: [PATCH v2] powerpc/book3s/hash: Drop _PAGE_PRIVILEGED from PAGE_NONE

2023-12-01 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:
> There used to be a dependency on _PAGE_PRIVILEGED with pte_savedwrite.
> But that got dropped by
> commit 6a56ccbcf6c6 ("mm/autonuma: use can_change_(pte|pmd)_writable() to 
> replace savedwrite")
>
> With the change in this patch numa fault pte (pte_protnone()) gets mapped as 
> regular user pte
> with RWX cleared (no-access) whereas earlier it used to be mapped 
> _PAGE_PRIVILEGED.
>
> Hash fault handling code did get some WARN_ON added because those
> functions are not expected to get called with _PAGE_READ cleared.
> commit 18061c17c8ec ("powerpc/mm: Update PROTFAULT handling in the page fault 
> path")
> explains the details.
 
You say "did get" which makes me think you're talking about the past.
But I think you're referring to the WARN_ON you are adding in this patch?

> Also revert commit 1abce0580b89 ("powerpc/64s: Fix __pte_needs_flush() false 
> positive warning")

That could be done separately as a follow-up couldn't it? Would reduce
the diff size.

> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/include/asm/book3s/64/pgtable.h  | 9 +++--
>  arch/powerpc/include/asm/book3s/64/tlbflush.h | 9 ++---
>  arch/powerpc/mm/book3s64/hash_utils.c | 7 +++
>  3 files changed, 12 insertions(+), 13 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index cb77eddca54b..2cc58ac74080 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -17,12 +17,6 @@
>  #define _PAGE_EXEC   0x1 /* execute permission */
>  #define _PAGE_WRITE  0x2 /* write access allowed */
>  #define _PAGE_READ   0x4 /* read access allowed */
> -#define _PAGE_NA _PAGE_PRIVILEGED
 
> -#define _PAGE_NAX    _PAGE_EXEC
> -#define _PAGE_RO     _PAGE_READ
> -#define _PAGE_ROX    (_PAGE_READ | _PAGE_EXEC)
> -#define _PAGE_RW     (_PAGE_READ | _PAGE_WRITE)
> -#define _PAGE_RWX    (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
 
Those are unrelated I think?

>  #define _PAGE_PRIVILEGED 0x8 /* kernel access only */
>  #define _PAGE_SAO0x00010 /* Strong access order */
>  #define _PAGE_NON_IDEMPOTENT 0x00020 /* non idempotent memory */
> @@ -529,6 +523,9 @@ static inline bool pte_user(pte_t pte)
>  }
>  
>  #define pte_access_permitted pte_access_permitted
> +/*
> + * execute-only mappings return false
> + */

That would fit better in the existing comment block inside the function
I think. Normally this location would be a function description comment.

>  static inline bool pte_access_permitted(pte_t pte, bool write)
>  {
>   /*
  ie. here

cheers


Re: [PATCH v2] powerpc/mm: Fix null-pointer dereference in pgtable_cache_add

2023-12-01 Thread Michael Ellerman
Kunwu Chan  writes:
> kasprintf() returns a pointer to dynamically allocated memory
> which can be NULL upon failure. Ensure the allocation was successful
> by checking the pointer validity.
>
> Suggested-by: Christophe Leroy 
> Suggested-by: Michael Ellerman 
> Signed-off-by: Kunwu Chan 
> ---
> v2: Use "panic" instead of "return"
> ---
>  arch/powerpc/mm/init-common.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/powerpc/mm/init-common.c b/arch/powerpc/mm/init-common.c
> index 119ef491f797..9788950b33f5 100644
> --- a/arch/powerpc/mm/init-common.c
> +++ b/arch/powerpc/mm/init-common.c
> @@ -139,6 +139,8 @@ void pgtable_cache_add(unsigned int shift)
>  
>   align = max_t(unsigned long, align, minalign);
>   name = kasprintf(GFP_KERNEL, "pgtable-2^%d", shift);
> + if (!name)
> + panic("Failed to allocate memory for order %d", shift);
>   new = kmem_cache_create(name, table_size, align, 0, ctor(shift));
>   if (!new)
>   panic("Could not allocate pgtable cache for order %d", shift);

It would be nice to avoid two calls to panic. Can you reorganise the
logic so that there's only one? Initialising new to NULL might help.

cheers


Re: [PATCH][next] powerpc/crypto: Avoid -Wstringop-overflow warnings

2023-12-01 Thread Herbert Xu
On Tue, Nov 21, 2023 at 12:52:44PM -0600, Gustavo A. R. Silva wrote:
> The compiler doesn't know that `32` is an offset into the Hash table:
> 
>  56 struct Hash_ctx {
>  57 u8 H[16];   /* subkey */
>  58 u8 Htable[256]; /* Xi, Hash table(offset 32) */
>  59 };
> 
> So, it legitimately complains about a potential out-of-bounds issue
> if `256 bytes` are accessed in `htable` (this implies going
> `32 bytes` beyond the boundaries of `Htable`):
> 
> arch/powerpc/crypto/aes-gcm-p10-glue.c: In function 'gcmp10_init':
> arch/powerpc/crypto/aes-gcm-p10-glue.c:120:9: error: 'gcm_init_htable' accessing 256 bytes in a region of size 224 [-Werror=stringop-overflow=]
>   120 | gcm_init_htable(hash->Htable+32, hash->H);
>   | ^
> arch/powerpc/crypto/aes-gcm-p10-glue.c:120:9: note: referencing argument 1 of type 'unsigned char[256]'
> arch/powerpc/crypto/aes-gcm-p10-glue.c:120:9: note: referencing argument 2 of type 'unsigned char[16]'
> arch/powerpc/crypto/aes-gcm-p10-glue.c:40:17: note: in a call to function 'gcm_init_htable'
>    40 | asmlinkage void gcm_init_htable(unsigned char htable[256], unsigned char Xi[16]);
>   | ^~~
> 
> Address this by avoiding specifying the size of `htable` in the function
> prototype; and just for consistency, do the same for parameter `Xi`.
> 
> Reported-by: Stephen Rothwell 
> Closes: 
> https://lore.kernel.org/linux-next/20231121131903.68a37...@canb.auug.org.au/
> Signed-off-by: Gustavo A. R. Silva 
> ---
>  arch/powerpc/crypto/aes-gcm-p10-glue.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] powerpc/irq: Allow softirq to hardirq stack transition

2023-12-01 Thread Christophe Leroy


Le 01/12/2023 à 11:05, Michael Ellerman a écrit :
> Christophe Leroy  writes:
>> Le 30/11/2023 à 13:50, Michael Ellerman a écrit :
>>> Allow a transition from the softirq stack to the hardirq stack when
>>> handling a hardirq. Doing so means a hardirq received while deep in
>>> softirq processing is less likely to cause a stack overflow of the
>>> softirq stack.
>>>
>>> Previously it wasn't safe to do so because irq_exit() (which initiates
>>> softirq processing) was called on the hardirq stack.
>>>
>>> That was changed in commit 1b1b6a6f4cc0 ("powerpc: handle irq_enter/
>>> irq_exit in interrupt handler wrappers") and 1346d00e1bdf ("powerpc:
>>> Don't select HAVE_IRQ_EXIT_ON_IRQ_STACK").
>>>
>>> The allowed transitions are now:
>>>- process stack -> hardirq stack
>>>- process stack -> softirq stack
>>>- process stack -> softirq stack -> hardirq stack
>>
>> It means you don't like my patch
>> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/6cd9d8bb2258d8b51999c2584eac74423d2b5e29.1657203774.git.christophe.le...@csgroup.eu/
>> ?
> 
> I did like your patch :)
> 
> But then we got reports of folks hitting stack overflow in some distro
> kernels, and in at least some cases it was a hardirq coming in during
> softirq handling and overflowing the softirq stack.

Fair enough, I'll discard it.

> 
>> I never got any feedback.
> 
> Sorry, not enough hours in the day.
> 

Yes same problem here :)


Re: [PATCH] powerpc/irq: Allow softirq to hardirq stack transition

2023-12-01 Thread Michael Ellerman
Christophe Leroy  writes:
> Le 30/11/2023 à 13:50, Michael Ellerman a écrit :
>> Allow a transition from the softirq stack to the hardirq stack when
>> handling a hardirq. Doing so means a hardirq received while deep in
>> softirq processing is less likely to cause a stack overflow of the
>> softirq stack.
>> 
>> Previously it wasn't safe to do so because irq_exit() (which initiates
>> softirq processing) was called on the hardirq stack.
>> 
>> That was changed in commit 1b1b6a6f4cc0 ("powerpc: handle irq_enter/
>> irq_exit in interrupt handler wrappers") and 1346d00e1bdf ("powerpc:
>> Don't select HAVE_IRQ_EXIT_ON_IRQ_STACK").
>> 
>> The allowed transitions are now:
>>   - process stack -> hardirq stack
>>   - process stack -> softirq stack
>>   - process stack -> softirq stack -> hardirq stack
>
> It means you don't like my patch 
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/6cd9d8bb2258d8b51999c2584eac74423d2b5e29.1657203774.git.christophe.le...@csgroup.eu/
>  
> ?

I did like your patch :)

But then we got reports of folks hitting stack overflow in some distro
kernels, and in at least some cases it was a hardirq coming in during
softirq handling and overflowing the softirq stack.

> I never got any feedback.

Sorry, not enough hours in the day.

cheers


Re: [PATCH] scsi: ibmvscsi: replace deprecated strncpy with strscpy

2023-12-01 Thread Michael Ellerman
Justin Stitt  writes:
> strncpy() is deprecated for use on NUL-terminated destination strings
> [1] and as such we should prefer more robust and less ambiguous string
> interfaces.
>
> We expect partition_name to be NUL-terminated based on its usage with
> format strings:
> |   dev_info(hostdata->dev, "host srp version: %s, "
> |"host partition %s (%d), OS %d, max io %u\n",
> |hostdata->madapter_info.srp_version,
> |hostdata->madapter_info.partition_name,
> |be32_to_cpu(hostdata->madapter_info.partition_number),
> |be32_to_cpu(hostdata->madapter_info.os_type),
> |be32_to_cpu(hostdata->madapter_info.port_max_txu[0]));
> ...
> |   len = snprintf(buf, PAGE_SIZE, "%s\n",
> |hostdata->madapter_info.partition_name);
>
> Moreover, NUL-padding is not required as madapter_info is explicitly
> memset to 0:
> |   memset(>madapter_info, 0x00,
> |   sizeof(hostdata->madapter_info));
>
> Considering the above, a suitable replacement is `strscpy` [2] due to
> the fact that it guarantees NUL-termination on the destination buffer
> without unnecessarily NUL-padding.
>
> Link: 
> https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
>  [1]
> Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html 
> [2]
> Link: https://github.com/KSPP/linux/issues/90
> Cc: linux-harden...@vger.kernel.org
> Signed-off-by: Justin Stitt 
> ---
> Note: build-tested only.

I gave it a quick boot, no issues.

Tested-by: Michael Ellerman  (powerpc)

cheers


Re: [PATCH 15/17] soc: fsl: cpm1: qmc: Handle timeslot entries at channel start() and stop()

2023-12-01 Thread Herve Codina
Hi Arnd,

On Wed, 29 Nov 2023 15:03:02 +0100
"Arnd Bergmann"  wrote:

> On Tue, Nov 28, 2023, at 15:08, Herve Codina wrote:
> > @@ -272,6 +274,8 @@ int qmc_chan_get_info(struct qmc_chan *chan, struct qmc_chan_info *info)
> > if (ret)
> > return ret;
> > 
> > +   spin_lock_irqsave(>ts_lock, flags);
> > +
> > info->mode = chan->mode;
> > info->rx_fs_rate = tsa_info.rx_fs_rate;
> > info->rx_bit_rate = tsa_info.rx_bit_rate;
> > @@ -280,6 +284,8 @@ int qmc_chan_get_info(struct qmc_chan *chan, struct qmc_chan_info *info)
> > info->tx_bit_rate = tsa_info.tx_bit_rate;
> > info->nb_rx_ts = hweight64(chan->rx_ts_mask);
> > 
> > +   spin_unlock_irqrestore(>ts_lock, flags);
> > +
> > return 0;
> >  }  
> 
> I would normally use spin_lock_irq() instead of spin_lock_irqsave()
> in functions that are only called outside of atomic context.

I would prefer to keep spin_lock_irqsave() here.
This function is part of the API and so it's quite difficult to ensure
that all calls (current and future) will be done outside of an atomic
context.

> 
> > +static int qmc_chan_start_rx(struct qmc_chan *chan);
> > +
> >  int qmc_chan_stop(struct qmc_chan *chan, int direction)
> >  {  
> ... 
> > -static void qmc_chan_start_rx(struct qmc_chan *chan)
> > +static int qmc_setup_chan_trnsync(struct qmc *qmc, struct qmc_chan *chan);
> > +
> > +static int qmc_chan_start_rx(struct qmc_chan *chan)
> >  {  
> 
> Can you reorder the static functions in a way that avoids the
> forward declarations?

Yes, sure.
I will do that in the next iteration.

Thanks for the review,

Best regards,
Hervé


Re: [kvm-unit-tests PATCH v1 00/18] arm/arm64: Rework cache maintenance at boot

2023-12-01 Thread Shaoqin Huang

Hi Alexandru,

Just take your time. I also appreciate your work. :)

Thanks,
Shaoqin

On 11/30/23 18:35, Alexandru Elisei wrote:

Hi,

Thank you so much for reviving this, much appreciated.

I wanted to let you know that I definitely plan to review the series as
soon as possible; unfortunately I don't believe I will be able to do that
for at least 2 weeks.

Thanks,
Alex

On Thu, Nov 30, 2023 at 04:07:02AM -0500, Shaoqin Huang wrote:

Hi,

I'm posting Alexandru's patch set [1] rebased on the latest branch with the
conflicts resolved. There are no big changes compared to the original code.

As version 1 of this series was posted one year ago, let me first recall
its intention and what it does: see the cover letter at link [2].

When series [1] was written, efi support for arm64 [3] had not yet been
merged into kvm-unit-tests, but it has been merged since. Directly rebasing
series [1] onto the latest branch would break the efi tests, mainly because
Patch #15 ("arm/arm64: Enable the MMU early") moves mmu_enable() out of
setup_mmu(), which would leave the efi test with the MMU disabled. So I made
a small change in efi_mem_init() that makes the efi test also enable the MMU
early, which makes it work.

Another change worth noting is in Patch #17 ("arm/arm64: Perform dcache
maintenance"). In efi_mem_init(), the MMU is disabled, a new pagetable is
built, and the MMU is re-enabled. If asm_mmu_disable cleaned and invalidated
the data caches for the entire memory, we would not need to worry about the
dcache: after the MMU is disabled, mmu_setup_early() re-enables it and takes
care of all the cache maintenance. But the situation changed, because Patch
#18 ("arm/arm64: Rework the cache maintenance in asm_mmu_disable") only
cleans and invalidates the data caches for the stack memory area. So we need
to clean and invalidate the data caches manually before disabling the MMU.
I'm not confident about the current cache maintenance in the efi setup path,
so I'm asking for your help to review whether it is right.

I also dropped one patch ("s390: Do not use the physical allocator") from [1]
since it caused the s390 test to fail.

This series may include bugs, so I would really appreciate your review to
improve it together.

You can get the code from:

$ git clone https://gitlab.com/shahuang/kvm-unit-tests.git \
-b arm-arm64-rework-cache-maintenance-at-boot-v1

[1] 
https://gitlab.arm.com/linux-arm/kvm-unit-tests-ae/-/tree/arm-arm64-rework-cache-maintenance-at-boot-v2-wip2
[2] https://lore.kernel.org/all/20220809091558.14379-1-alexandru.eli...@arm.com/
[3] 
https://patchwork.kernel.org/project/kvm/cover/20230530160924.82158-1-nikos.nikole...@arm.com/

Changelog:
--
RFC->v1:
   - Gathered Reviewed-by tags.
   - Various changes to commit messages and comments to hopefully make the code
 easier to understand.
   - Patch #8 ("lib/alloc_phys: Expand documentation with usage and
 limitations") is new.
   - Folded patch "arm: page.h: Add missing libcflat.h include" into #17
 ("arm/arm64: Perform dcache maintenance at boot").
   - Reordered the series to group patches that touch aproximately the same code
 together - the patches that change the physical allocator are now first,
 followed come the patches that change how the secondaries are brought 
online.
   - Fixed several nasty bugs where the r4 register was being clobbered in the 
arm
 assembly.
   - Unmap the early UART address if the DTB address does not match the early
 address.
   - Added dcache maintenance when a page table is modified with the MMU 
disabled.
   - Moved the cache maintenance when disabling the MMU to be executed before 
the
 MMU is disabled.
   - Rebased onto the latest branch, where efi support has been merged.
   - Made the efi test also enable the MMU early.
   - Added cache maintenance on the efi setup path, especially before mmu_disable.

RFC: 
https://lore.kernel.org/all/20220809091558.14379-1-alexandru.eli...@arm.com/

Alexandru Elisei (18):
   Makefile: Define __ASSEMBLY__ for assembly files
   powerpc: Replace the physical allocator with the page allocator
   lib/alloc_phys: Initialize align_min
   lib/alloc_phys: Consolidate allocate functions into memalign_early()
   lib/alloc_phys: Remove locking
   lib/alloc_phys: Remove allocation accounting
   lib/alloc_phys: Add callback to perform cache maintenance
   lib/alloc_phys: Expand documentation with usage and limitations
   arm/arm64: Zero secondary CPUs' stack
   arm/arm64: Allocate secondaries' stack using the page allocator
   arm/arm64: assembler.h: Replace size with end address for
 dcache_by_line_op
   arm/arm64: Add C functions for doing cache maintenance
   arm/arm64: Configure secondaries' stack before enabling the MMU
   arm/arm64: Use pgd_alloc() to allocate mmu_idmap
   arm/arm64: Enable the MMU early
   arm/arm64: Map the 

[powerpc:fixes-test] BUILD SUCCESS e0ecbc227526df88604d776a5ccaf74a6bbb0160

2023-12-01 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
fixes-test
branch HEAD: e0ecbc227526df88604d776a5ccaf74a6bbb0160  powerpc/ftrace: Fix 
stack teardown in ftrace_no_trace

elapsed time: 1240m

configs tested: 166
configs skipped: 90

The following configs have been built successfully.
More configs may be tested in the coming days.

tested configs:
alpha allnoconfig   gcc  
alpha   defconfig   gcc  
arc   allnoconfig   gcc  
arc defconfig   gcc  
archsdk_defconfig   gcc  
arc   randconfig-001-20231201   gcc  
arc   randconfig-002-20231201   gcc  
arc   tb10x_defconfig   gcc  
arcvdk_hs38_defconfig   gcc  
arm   allnoconfig   gcc  
arm   randconfig-001-20231201   gcc  
arm   randconfig-002-20231201   gcc  
arm   randconfig-003-20231201   gcc  
arm   randconfig-004-20231201   gcc  
armspear6xx_defconfig   gcc  
arm64allmodconfig   clang
arm64 allnoconfig   gcc  
arm64   defconfig   gcc  
arm64 randconfig-001-20231201   gcc  
arm64 randconfig-002-20231201   gcc  
arm64 randconfig-003-20231201   gcc  
arm64 randconfig-004-20231201   gcc  
csky  allnoconfig   gcc  
cskydefconfig   gcc  
csky  randconfig-001-20231201   gcc  
csky  randconfig-002-20231201   gcc  
hexagon  allmodconfig   clang
hexagon  allyesconfig   clang
i386 allmodconfig   clang
i386  allnoconfig   clang
i386 allyesconfig   clang
i386  randconfig-011-20231201   clang
i386  randconfig-012-20231201   clang
i386  randconfig-013-20231201   clang
i386  randconfig-014-20231201   clang
i386  randconfig-015-20231201   clang
i386  randconfig-016-20231201   clang
loongarchallmodconfig   gcc  
loongarch allnoconfig   gcc  
loongarchallyesconfig   gcc  
loongarch   defconfig   gcc  
loongarch loongson3_defconfig   gcc  
loongarch randconfig-001-20231201   gcc  
loongarch randconfig-002-20231201   gcc  
m68k allmodconfig   gcc  
m68k  allnoconfig   gcc  
m68k allyesconfig   gcc  
m68kdefconfig   gcc  
m68k  hp300_defconfig   gcc  
m68k   virt_defconfig   gcc  
microblaze   allmodconfig   gcc  
microblazeallnoconfig   gcc  
microblaze   allyesconfig   gcc  
microblaze  defconfig   gcc  
mips allmodconfig   gcc  
mips allyesconfig   gcc  
mips db1xxx_defconfig   gcc  
mips  fuloong2e_defconfig   gcc  
mips loongson1b_defconfig   gcc  
nios2allmodconfig   gcc  
nios2 allnoconfig   gcc  
nios2allyesconfig   gcc  
nios2   defconfig   gcc  
nios2 randconfig-001-20231201   gcc  
nios2 randconfig-002-20231201   gcc  
openrisc allmodconfig   gcc  
openrisc  allnoconfig   gcc  
openrisc allyesconfig   gcc  
openriscdefconfig   gcc  
parisc   allmodconfig   gcc  
pariscallnoconfig   gcc  
parisc   allyesconfig   gcc  
parisc  defconfig   gcc  
pariscgeneric-64bit_defconfig   gcc  
pariscrandconfig-001-20231201   gcc  
pariscrandconfig-002-20231201   gcc  
parisc64defconfig   gcc  
powerpc  allmodconfig   clang
powerpc   allnoconfig   gcc  
powerpc  allyesconfig   clang
powerpc  bamboo_defconfig   gcc  
powerpc   eiger_defconfig   gcc  
powerpc rainier_defconfig   gcc  
powerpc   randconfig-001-20231201   gcc  
powerpc   randconfig-002-20231201   gcc  
powerpc