Re: [PATCH v4 4/4] PCI: layerscape: Add suspend/resume for ls1043a

2023-11-30 Thread Manivannan Sadhasivam
On Thu, Nov 30, 2023 at 03:22:14PM -0500, Frank Li wrote:
> On Thu, Nov 30, 2023 at 03:17:39PM -0500, Frank Li wrote:
> > On Thu, Nov 30, 2023 at 10:21:00PM +0530, Manivannan Sadhasivam wrote:
> > > On Wed, Nov 29, 2023 at 04:44:12PM -0500, Frank Li wrote:
> > > > In the suspend path, PME_Turn_Off message is sent to the endpoint to
> > > > transition the link to L2/L3_Ready state. In this SoC, there is no way 
> > > > to
> > > > check if the controller has received the PME_To_Ack from the endpoint or
> > > > not. So to be on the safer side, the driver just waits for
> > > > PCIE_PME_TO_L2_TIMEOUT_US before asserting the SoC specific PMXMTTURNOFF
> > > > bit to complete the PME_Turn_Off handshake. This link would then enter
> > > > L2/L3 state depending on the VAUX supply.
> > > > 
> > > > In the resume path, the link is brought back from L2 to L0 by doing a
> > > > software reset.
> > > > 
> > > 
> > > Same comment on the patch description as on patch 2/4.
> > > 
> > > > Signed-off-by: Frank Li 
> > > > ---
> > > > 
> > > > Notes:
> > > > Change from v3 to v4
> > > > - Call scfg_pcie_send_turnoff_msg() shared with ls1021a
> > > > - update commit message
> > > > 
> > > > Change from v2 to v3
> > > > - Remove ls_pcie_lut_readl(writel) function
> > > > 
> > > > Change from v1 to v2
> > > > - Update subject 'a' to 'A'
> > > > 
> > > >  drivers/pci/controller/dwc/pci-layerscape.c | 63 -
> > > >  1 file changed, 62 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/pci/controller/dwc/pci-layerscape.c 
> > > > b/drivers/pci/controller/dwc/pci-layerscape.c
> > > > index 590e07bb27002..d39700b3afaaa 100644
> > > > --- a/drivers/pci/controller/dwc/pci-layerscape.c
> > > > +++ b/drivers/pci/controller/dwc/pci-layerscape.c
> > > > @@ -41,6 +41,15 @@
> > > >  #define SCFG_PEXSFTRSTCR   0x190
> > > >  #define PEXSR(idx) BIT(idx)
> > > >  
> > > > +/* LS1043A PEX PME control register */
> > > > +#define SCFG_PEXPMECR  0x144
> > > > +#define PEXPME(idx)BIT(31 - (idx) * 4)
> > > > +
> > > > +/* LS1043A PEX LUT debug register */
> > > > +#define LS_PCIE_LDBG   0x7fc
> > > > +#define LDBG_SRBIT(30)
> > > > +#define LDBG_WEBIT(31)
> > > > +
> > > >  #define PCIE_IATU_NUM  6
> > > >  
> > > >  struct ls_pcie_drvdata {
> > > > @@ -225,6 +234,45 @@ static int ls1021a_pcie_exit_from_l2(struct 
> > > > dw_pcie_rp *pp)
> > > > return scfg_pcie_exit_from_l2(pcie->scfg, SCFG_PEXSFTRSTCR, 
> > > > PEXSR(pcie->index));
> > > >  }
> > > >  
> > > > +static void ls1043a_pcie_send_turnoff_msg(struct dw_pcie_rp *pp)
> > > > +{
> > > > +   struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > > > +   struct ls_pcie *pcie = to_ls_pcie(pci);
> > > > +
> > > > +   scfg_pcie_send_turnoff_msg(pcie->scfg, SCFG_PEXPMECR, 
> > > > PEXPME(pcie->index));
> > > > +}
> > > > +
> > > > +static int ls1043a_pcie_exit_from_l2(struct dw_pcie_rp *pp)
> > > > +{
> > > > +   struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > > > +   struct ls_pcie *pcie = to_ls_pcie(pci);
> > > > +   u32 val;
> > > > +
> > > > +   /*
> > > > +* Only way let PEX module exit L2 is do a software reset.
> > > 
> > > Can you expand PEX? What is it used for?
> > > 
> > > Also if the reset is only for the PEX module, please use the same comment 
> > > in
> > > both patches 2 and 4. Patch 2 doesn't mention PEX in the comment.
> > 
> > After read spec again, I think PEX is pci express. So it should software
> > reset controller. I don't know what exactly did in the chip. But without
> > below code, PCIe can't exit L2/L3.
> > 
> > Any harmful if dwc controller reset? Anyway these code works well with
> > intel network card.
> 
> Sorry, sent too quick. It is PCIe express wrapper
> 
> Copy from spec: 
> 
> "PEXLDBG[SR]. Once set the
> PEXLDBG[SR] will enable the soft reset to the PEX wrapper."
> 

Okay. Please use the below comment in both patches 2 and 4:

/* Reset the PEX wrapper to bring the link out of L2 */

- Mani

> Frank
> 
> > 
> > Frank
> > 
> > > 
> > > - Mani
> > > 
> > > > +* LDBG_WE: allows the user to have write access to the 
> > > > PEXDBG[SR] for both setting and
> > > > +*  clearing the soft reset on the PEX module.
> > > > +* LDBG_SR: When SR is set to 1, the PEX module enters soft 
> > > > reset.
> > > > +*/
> > > > +   val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
> > > > +   val |= LDBG_WE;
> > > > +   ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
> > > > +
> > > > +   val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
> > > > +   val |= LDBG_SR;
> > > > +   ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
> > > > +
> > > > +   val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
> > > > +   val &= ~LDBG_SR;
> > > > +   ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
> > > > +
> > > > +   val = 

Re: [PATCH v2] powerpc/pseries/vas: Use usleep_range() to support HCALL delay

2023-11-30 Thread Haren Myneni




On 11/29/23 6:07 PM, Michael Ellerman wrote:

Haren Myneni  writes:

VAS allocate, modify and deallocate HCALLs returns
H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC for busy
delay and expects OS to reissue HCALL after that delay. But using
msleep() will often sleep at least 20 msecs even though the
hypervisor expects to reissue these HCALLs after 1 or 10msecs.
It might cause these HCALLs takes longer when multiple threads
issue open or close VAS windows simultaneously.

So instead of msleep(), use usleep_range() to ensure sleep with
the expected value before issuing HCALL again.

Signed-off-by: Haren Myneni 
Suggested-by: Nathan Lynch 

---
v1 -> v2:
- Use usleep_range instead of using RTAS sleep routine as
   suggested by Nathan
---
  arch/powerpc/platforms/pseries/vas.c | 24 +++-
  1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/vas.c 
b/arch/powerpc/platforms/pseries/vas.c
index 71d52a670d95..bade4402741f 100644
--- a/arch/powerpc/platforms/pseries/vas.c
+++ b/arch/powerpc/platforms/pseries/vas.c
@@ -36,9 +36,31 @@ static bool migration_in_progress;
  
  static long hcall_return_busy_check(long rc)

  {
+   unsigned int ms;
+
/* Check if we are stalled for some time */
if (H_IS_LONG_BUSY(rc)) {
-   msleep(get_longbusy_msecs(rc));
+   ms = get_longbusy_msecs(rc);
+   /*
+* Allocate, Modify and Deallocate HCALLs returns
+* H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC
+* for the long delay. So the delay should always be 1
+* or 10msecs, but sleeps 1msec in case if the long
+* delay is > H_LONG_BUSY_ORDER_10_MSEC.
+*/
+   if (ms > 10)
+   ms = 1;
  
I don't understand this. The hypervisor asked you to sleep for more than

10 milliseconds, so instead you sleep for 1?

I can understand that we don't want to usleep() for the longer durations
that could be returned, but so shouldn't the code be using msleep() for
those values?

Sleeping for a very short duration definitely seems wrong.


Allocate, modify and deallocate HCALLs return only 1MSECS and 10MSECS 
for long delay. we should not expect > 10MSECS for these HCALLs. Hence 
ms = 1 if ms > 10


But it is confusing. So will use ms = 10 for ms >= 10 as Nathan suggested.





+   /*
+* msleep() will often sleep at least 20 msecs even
+* though the hypervisor expects to reissue these
  
That makes it sound like the hypervisor is reissuing the hcalls.


Better would be "the hypervisor suggests the kernel should reissue the
hcall after ...".


+* HCALLs after 1 or 10msecs. So use usleep_range()
+* to sleep with the expected value.
+*
+* See Documentation/timers/timers-howto.rst on using
+* the value range in usleep_range().
+*/
+   usleep_range(ms * 100, ms * 1000);


If ms == 1, then that's 100 usecs, which is not 1 millisecond?

Please use USEC_PER_MSEC.


Using usleep_range() same way as mentioned in  rtas_busy_delay().


Thanks
Haren




rc = H_BUSY;
} else if (rc == H_BUSY) {
cond_resched();


cheers



[powerpc:fixes] BUILD SUCCESS dc158d23b33df9033bcc8e7117e8591dd2f9d125

2023-11-30 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
fixes
branch HEAD: dc158d23b33df9033bcc8e7117e8591dd2f9d125  KVM: PPC: Book3S HV: Fix 
KVM_RUN clobbering FP/VEC user registers

elapsed time: 1203m

configs tested: 166
configs skipped: 90

The following configs have been built successfully.
More configs may be tested in the coming days.

tested configs:
alpha allnoconfig   gcc  
alpha   defconfig   gcc  
arc   allnoconfig   gcc  
arc defconfig   gcc  
archsdk_defconfig   gcc  
arc   randconfig-001-20231201   gcc  
arc   randconfig-002-20231201   gcc  
arc   tb10x_defconfig   gcc  
arcvdk_hs38_defconfig   gcc  
arm   allnoconfig   gcc  
arm   randconfig-001-20231201   gcc  
arm   randconfig-002-20231201   gcc  
arm   randconfig-003-20231201   gcc  
arm   randconfig-004-20231201   gcc  
armspear6xx_defconfig   gcc  
arm64allmodconfig   clang
arm64 allnoconfig   gcc  
arm64   defconfig   gcc  
arm64 randconfig-001-20231201   gcc  
arm64 randconfig-002-20231201   gcc  
arm64 randconfig-003-20231201   gcc  
arm64 randconfig-004-20231201   gcc  
csky  allnoconfig   gcc  
cskydefconfig   gcc  
csky  randconfig-001-20231201   gcc  
csky  randconfig-002-20231201   gcc  
hexagon  allmodconfig   clang
hexagon  allyesconfig   clang
i386 allmodconfig   clang
i386  allnoconfig   clang
i386 allyesconfig   clang
i386  randconfig-011-20231201   clang
i386  randconfig-012-20231201   clang
i386  randconfig-013-20231201   clang
i386  randconfig-014-20231201   clang
i386  randconfig-015-20231201   clang
i386  randconfig-016-20231201   clang
loongarchallmodconfig   gcc  
loongarch allnoconfig   gcc  
loongarchallyesconfig   gcc  
loongarch   defconfig   gcc  
loongarch loongson3_defconfig   gcc  
loongarch randconfig-001-20231201   gcc  
loongarch randconfig-002-20231201   gcc  
m68k allmodconfig   gcc  
m68k  allnoconfig   gcc  
m68k allyesconfig   gcc  
m68kdefconfig   gcc  
m68k  hp300_defconfig   gcc  
m68k   virt_defconfig   gcc  
microblaze   allmodconfig   gcc  
microblazeallnoconfig   gcc  
microblaze   allyesconfig   gcc  
microblaze  defconfig   gcc  
mips allmodconfig   gcc  
mips allyesconfig   gcc  
mips db1xxx_defconfig   gcc  
mips  fuloong2e_defconfig   gcc  
mips loongson1b_defconfig   gcc  
nios2allmodconfig   gcc  
nios2 allnoconfig   gcc  
nios2allyesconfig   gcc  
nios2   defconfig   gcc  
nios2 randconfig-001-20231201   gcc  
nios2 randconfig-002-20231201   gcc  
openrisc allmodconfig   gcc  
openrisc  allnoconfig   gcc  
openrisc allyesconfig   gcc  
openriscdefconfig   gcc  
parisc   allmodconfig   gcc  
pariscallnoconfig   gcc  
parisc   allyesconfig   gcc  
parisc  defconfig   gcc  
pariscgeneric-64bit_defconfig   gcc  
pariscrandconfig-001-20231201   gcc  
pariscrandconfig-002-20231201   gcc  
parisc64defconfig   gcc  
powerpc  allmodconfig   clang
powerpc   allnoconfig   gcc  
powerpc  allyesconfig   clang
powerpc  bamboo_defconfig   gcc  
powerpc   eiger_defconfig   gcc  
powerpc rainier_defconfig   gcc  
powerpc   randconfig-001-20231201   gcc  
powerpc   randconfig-002-20231201   gcc  
powerpc   

Re: linux-next: build failure after merge of the mm tree

2023-11-30 Thread Michael Ellerman
Andrew Morton  writes:
> On Fri, 01 Dec 2023 09:39:20 +1100 Michael Ellerman  
> wrote:
>
>> > I am still carrying this patch (it should probably go into the mm
>> > tree).  Is someone going to pick it up (assuming it is correct)?
>> 
>> I applied it to my next a few days ago, but I must have forgotten to
>> push. It's in there now.
>
> I'll keep a copy in mm.git, to keep the dependencies nice.  I added
> your acked-by.

Sure thing. Thanks.

cheers


Re: [PATCH v4 4/4] PCI: layerscape: Add suspend/resume for ls1043a

2023-11-30 Thread Manivannan Sadhasivam
On Thu, Nov 30, 2023 at 03:17:39PM -0500, Frank Li wrote:
> On Thu, Nov 30, 2023 at 10:21:00PM +0530, Manivannan Sadhasivam wrote:
> > On Wed, Nov 29, 2023 at 04:44:12PM -0500, Frank Li wrote:
> > > In the suspend path, PME_Turn_Off message is sent to the endpoint to
> > > transition the link to L2/L3_Ready state. In this SoC, there is no way to
> > > check if the controller has received the PME_To_Ack from the endpoint or
> > > not. So to be on the safer side, the driver just waits for
> > > PCIE_PME_TO_L2_TIMEOUT_US before asserting the SoC specific PMXMTTURNOFF
> > > bit to complete the PME_Turn_Off handshake. This link would then enter
> > > L2/L3 state depending on the VAUX supply.
> > > 
> > > In the resume path, the link is brought back from L2 to L0 by doing a
> > > software reset.
> > > 
> > 
> > Same comment on the patch description as on patch 2/4.
> > 
> > > Signed-off-by: Frank Li 
> > > ---
> > > 
> > > Notes:
> > > Change from v3 to v4
> > > - Call scfg_pcie_send_turnoff_msg() shared with ls1021a
> > > - update commit message
> > > 
> > > Change from v2 to v3
> > > - Remove ls_pcie_lut_readl(writel) function
> > > 
> > > Change from v1 to v2
> > > - Update subject 'a' to 'A'
> > > 
> > >  drivers/pci/controller/dwc/pci-layerscape.c | 63 -
> > >  1 file changed, 62 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/pci/controller/dwc/pci-layerscape.c 
> > > b/drivers/pci/controller/dwc/pci-layerscape.c
> > > index 590e07bb27002..d39700b3afaaa 100644
> > > --- a/drivers/pci/controller/dwc/pci-layerscape.c
> > > +++ b/drivers/pci/controller/dwc/pci-layerscape.c
> > > @@ -41,6 +41,15 @@
> > >  #define SCFG_PEXSFTRSTCR 0x190
> > >  #define PEXSR(idx)   BIT(idx)
> > >  
> > > +/* LS1043A PEX PME control register */
> > > +#define SCFG_PEXPMECR0x144
> > > +#define PEXPME(idx)  BIT(31 - (idx) * 4)
> > > +
> > > +/* LS1043A PEX LUT debug register */
> > > +#define LS_PCIE_LDBG 0x7fc
> > > +#define LDBG_SR  BIT(30)
> > > +#define LDBG_WE  BIT(31)
> > > +
> > >  #define PCIE_IATU_NUM6
> > >  
> > >  struct ls_pcie_drvdata {
> > > @@ -225,6 +234,45 @@ static int ls1021a_pcie_exit_from_l2(struct 
> > > dw_pcie_rp *pp)
> > >   return scfg_pcie_exit_from_l2(pcie->scfg, SCFG_PEXSFTRSTCR, 
> > > PEXSR(pcie->index));
> > >  }
> > >  
> > > +static void ls1043a_pcie_send_turnoff_msg(struct dw_pcie_rp *pp)
> > > +{
> > > + struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > > + struct ls_pcie *pcie = to_ls_pcie(pci);
> > > +
> > > + scfg_pcie_send_turnoff_msg(pcie->scfg, SCFG_PEXPMECR, 
> > > PEXPME(pcie->index));
> > > +}
> > > +
> > > +static int ls1043a_pcie_exit_from_l2(struct dw_pcie_rp *pp)
> > > +{
> > > + struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > > + struct ls_pcie *pcie = to_ls_pcie(pci);
> > > + u32 val;
> > > +
> > > + /*
> > > +  * Only way let PEX module exit L2 is do a software reset.
> > 
> > Can you expand PEX? What is it used for?
> > 
> > Also if the reset is only for the PEX module, please use the same comment in
> > both patches 2 and 4. Patch 2 doesn't mention PEX in the comment.
> 
> After read spec again, I think PEX is pci express. So it should software
> reset controller. I don't know what exactly did in the chip. But without
> below code, PCIe can't exit L2/L3.
> 
> Any harmful if dwc controller reset? Anyway these code works well with
> intel network card.

If it is a DWC controller reset, then we need to program all CSRs like DBI
etc... But from your reply it seems like the reset is limited to some module,
so it is fine.

- Mani

> 
> Frank
> 
> > 
> > - Mani
> > 
> > > +  * LDBG_WE: allows the user to have write access to the PEXDBG[SR] for 
> > > both setting and
> > > +  *  clearing the soft reset on the PEX module.
> > > +  * LDBG_SR: When SR is set to 1, the PEX module enters soft reset.
> > > +  */
> > > + val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
> > > + val |= LDBG_WE;
> > > + ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
> > > +
> > > + val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
> > > + val |= LDBG_SR;
> > > + ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
> > > +
> > > + val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
> > > + val &= ~LDBG_SR;
> > > + ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
> > > +
> > > + val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
> > > + val &= ~LDBG_WE;
> > > + ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
> > > +
> > > + return 0;
> > > +}
> > > +
> > >  static const struct dw_pcie_host_ops ls_pcie_host_ops = {
> > >   .host_init = ls_pcie_host_init,
> > >   .pme_turn_off = ls_pcie_send_turnoff_msg,
> > > @@ -242,6 +290,19 @@ static const struct ls_pcie_drvdata ls1021a_drvdata 
> > > = {
> > >   .exit_from_l2 = ls1021a_pcie_exit_from_l2,
> > >  };
> > >  
> > > +static const struct dw_pcie_host_ops ls1043a_pcie_host_ops = {
> > > + .host_init = ls_pcie_host_init,
> > > + 

[PATCH] powerpc/44x: select I2C for CURRITUCK

2023-11-30 Thread Randy Dunlap
Fix build errors when CURRITUCK=y and I2C is not builtin (=m or is
not set). Fixes these build errors:

powerpc-linux-ld: arch/powerpc/platforms/44x/ppc476.o: in function 
`avr_halt_system':
ppc476.c:(.text+0x58): undefined reference to `i2c_smbus_write_byte_data'
powerpc-linux-ld: arch/powerpc/platforms/44x/ppc476.o: in function 
`ppc47x_device_probe':
ppc476.c:(.init.text+0x18): undefined reference to `i2c_register_driver'

Fixes: 2a2c74b2efcb ("IBM Akebono: Add the Akebono platform")
Signed-off-by: Randy Dunlap 
Reported-by: kernel test robot 
Closes: lore.kernel.org/r/202312010820.cmdwf5x9-...@intel.com
Cc: Alistair Popple 
Cc: Benjamin Herrenschmidt 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman 
---
 arch/powerpc/platforms/44x/Kconfig |1 +
 1 file changed, 1 insertion(+)

diff -- a/arch/powerpc/platforms/44x/Kconfig 
b/arch/powerpc/platforms/44x/Kconfig
--- a/arch/powerpc/platforms/44x/Kconfig
+++ b/arch/powerpc/platforms/44x/Kconfig
@@ -173,6 +173,7 @@ config ISS4xx
 config CURRITUCK
bool "IBM Currituck (476fpe) Support"
depends on PPC_47x
+   select I2C
select SWIOTLB
select 476FPE
select FORCE_PCI


Re: [PATCH v2] powerpc/pseries/vas: Use usleep_range() to support HCALL delay

2023-11-30 Thread Haren Myneni




On 11/29/23 5:43 PM, Nathan Lynch wrote:

Haren Myneni  writes:

VAS allocate, modify and deallocate HCALLs returns
H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC for busy
delay and expects OS to reissue HCALL after that delay. But using
msleep() will often sleep at least 20 msecs even though the
hypervisor expects to reissue these HCALLs after 1 or 10msecs.


I would word this as "the architecture suggests that the OS reissue
these [...]" instead of framing it as something the platform "expects".


It might cause these HCALLs takes longer when multiple threads
issue open or close VAS windows simultaneously.


This is imprecise. Over-sleeping by the OS doesn't cause individual
hcalls to take longer. It is more accurate to say that the higher-level
operation (allocate, modify, free) may take longer than necessary in
cases where the OS must retry the hcalls involved.


Correct, takes longer with multiple threads opening/closing windows. I 
will make it clear.





So instead of msleep(), use usleep_range() to ensure sleep with
the expected value before issuing HCALL again.

Signed-off-by: Haren Myneni 
Suggested-by: Nathan Lynch 

---
v1 -> v2:
- Use usleep_range instead of using RTAS sleep routine as
   suggested by Nathan
---
  arch/powerpc/platforms/pseries/vas.c | 24 +++-
  1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/vas.c 
b/arch/powerpc/platforms/pseries/vas.c
index 71d52a670d95..bade4402741f 100644
--- a/arch/powerpc/platforms/pseries/vas.c
+++ b/arch/powerpc/platforms/pseries/vas.c
@@ -36,9 +36,31 @@ static bool migration_in_progress;
  
  static long hcall_return_busy_check(long rc)

  {
+   unsigned int ms;


This should move down into the H_IS_LONG_BUSY() block if it's not used
outside of it.


+
/* Check if we are stalled for some time */
if (H_IS_LONG_BUSY(rc)) {
-   msleep(get_longbusy_msecs(rc));
+   ms = get_longbusy_msecs(rc);
+   /*
+* Allocate, Modify and Deallocate HCALLs returns
+* H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC
+* for the long delay. So the delay should always be 1
+* or 10msecs, but sleeps 1msec in case if the long
+* delay is > H_LONG_BUSY_ORDER_10_MSEC.
+*/
+   if (ms > 10)
+   ms = 1;


It's strange to coerce ms to 1 when it's greater than 10. Just clamp it
to 10, e.g.

 ms = clamp(get_longbusy_msecs(rc), 1, 10);


Sure, these HCALLs should not return > H_LONG_BUSY_ORDER_10_MSEC.




+
+   /*
+* msleep() will often sleep at least 20 msecs even
+* though the hypervisor expects to reissue these
+* HCALLs after 1 or 10msecs. So use usleep_range()
+* to sleep with the expected value.
+*
+* See Documentation/timers/timers-howto.rst on using
+* the value range in usleep_range().
+*/
+   usleep_range(ms * 100, ms * 1000);


If there's going to be commentary here I think it should just explain
why potentially sleeping for less than the suggested time is OK. There
is wording you can crib in rtas_busy_delay().



rc = H_BUSY;
} else if (rc == H_BUSY) {
cond_resched();
--
2.26.3


Re: [PATCH] powerpc/mm: Fix null-pointer dereference in pgtable_cache_add

2023-11-30 Thread Kunwu Chan

Thanks for your reply.

Ok, I know what you mean, when name is NULL. The process should be 
aborted and the specific reason for the error should be printed, not 
just return.


I will update v2 patch with "panic".

Thanks again,
Kunwu

On 2023/11/28 19:32, Michael Ellerman wrote:

Kunwu Chan  writes:

Hi Christophe,

Thanks for your reply.
It's my bad. According your reply, i read the code in
sysfs_do_create_link_sd.There is a null pointer check indeed.

My intention was to check null pointer after memory allocation.
Whether we can add a comment here for someone like me, the null pointer
check is no need here?


I don't mind there being a NULL check for name.

But the code shouldn't silently return if name can't be allocated.
Notice that if we can't create the cache we *panic*. A failure to
allocate name, which causes us to skip the cache creation, needs to also
panic.

cheers


On 2023/11/24 23:17, Christophe Leroy wrote:



Le 22/11/2023 à 10:00, Kunwu Chan a écrit :

[Vous ne recevez pas souvent de courriers de chen...@kylinos.cn. Découvrez 
pourquoi ceci est important à https://aka.ms/LearnAboutSenderIdentification ]

kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure. Ensure the allocation was successful
by checking the pointer validity.


Are you sure this is needed ? Did you check what happens what name is NULL ?

If I followed stuff correctly, I end up in function
sysfs_do_create_link_sd() which already handles the NULL name case which
a big hammer warning.



Signed-off-by: Kunwu Chan 
---
arch/powerpc/mm/init-common.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/mm/init-common.c b/arch/powerpc/mm/init-common.c
index 119ef491f797..0884fc601c46 100644
--- a/arch/powerpc/mm/init-common.c
+++ b/arch/powerpc/mm/init-common.c
@@ -139,6 +139,8 @@ void pgtable_cache_add(unsigned int shift)

   align = max_t(unsigned long, align, minalign);
   name = kasprintf(GFP_KERNEL, "pgtable-2^%d", shift);
+   if (!name)
+   return;
   new = kmem_cache_create(name, table_size, align, 0, ctor(shift));
   if (!new)
   panic("Could not allocate pgtable cache for order %d", 
shift);
--
2.34.1



Re: [PATCH 5/5] powerpc/64s: Fix CONFIG_NUMA=n build

2023-11-30 Thread Arnd Bergmann
On Thu, Nov 30, 2023, at 07:43, Michael Ellerman wrote:
> "Arnd Bergmann"  writes:
>> On Wed, Nov 29, 2023, at 14:19, Michael Ellerman wrote:
>>> diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
>>> index 7f9ff0640124..72341b9fb552 100644
>>> --- a/arch/powerpc/mm/mmu_decl.h
>>> +++ b/arch/powerpc/mm/mmu_decl.h
>>> +
>>> +#ifdef CONFIG_MEMORY_HOTPLUG
>>> +int create_section_mapping(unsigned long start, unsigned long end,
>>> +  int nid, pgprot_t prot);
>>> +#endif
>>
>> This one should probably go next to the remove_section_mapping()
>> declaration in arch/powerpc/include/asm/sparsemem.h for consistency.
> 
> That doesn't work due to:
>
> In file included from ../include/linux/numa.h:26,
>  from ../include/linux/async.h:13,
>  from ../init/initramfs.c:3:
> ../arch/powerpc/include/asm/sparsemem.h:19:44: error: unknown type name 
> ‘pgprot_t’
>19 |   int nid, pgprot_t prot);
>   |^~~~
>
> Which might be fixable, but I'd rather just move
> remove_section_mapping() into mmu_decl.h as well.

Ok, makes sense.

 Arnd


[PATCH v2] powerpc/mm: Fix null-pointer dereference in pgtable_cache_add

2023-11-30 Thread Kunwu Chan
kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure. Ensure the allocation was successful
by checking the pointer validity.

Suggested-by: Christophe Leroy 
Suggested-by: Michael Ellerman 
Signed-off-by: Kunwu Chan 
---
v2: Use "panic" instead of "return"
---
 arch/powerpc/mm/init-common.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/mm/init-common.c b/arch/powerpc/mm/init-common.c
index 119ef491f797..9788950b33f5 100644
--- a/arch/powerpc/mm/init-common.c
+++ b/arch/powerpc/mm/init-common.c
@@ -139,6 +139,8 @@ void pgtable_cache_add(unsigned int shift)
 
align = max_t(unsigned long, align, minalign);
name = kasprintf(GFP_KERNEL, "pgtable-2^%d", shift);
+   if (!name)
+   panic("Failed to allocate memory for order %d", shift);
new = kmem_cache_create(name, table_size, align, 0, ctor(shift));
if (!new)
panic("Could not allocate pgtable cache for order %d", shift);
-- 
2.34.1



Re: [kvm-unit-tests PATCH v1 00/18] arm/arm64: Rework cache maintenance at boot

2023-11-30 Thread Alexandru Elisei
Hi,

Thank you so much for reviving this, much appreciated.

I wanted to let you know that I definitely plan to review the series as
soon as possible, unfortunately I don't believe I won't be able to do that
for at least 2 weeks.

Thanks,
Alex

On Thu, Nov 30, 2023 at 04:07:02AM -0500, Shaoqin Huang wrote:
> Hi,
> 
> I'm posting Alexandru's patch set[1] rebased on the latest branch with the
> conflicts being resolved. No big changes compare to its original code.
> 
> As this version 1 of this series was posted one years ago, I would first let 
> you
> recall it, what's the intention of this series and what this series do. You 
> can
> view it by click the link[2] and view the cover-letter.
> 
> Since when writing the series[1], the efi support for arm64[3] hasn't been
> merged into the kvm-unit-tests, but now the efi support for arm64 has been
> merged. Directly rebase the series[1] onto the latest branch will break the 
> efi
> tests. This is mainly because the Patch #15 ("arm/arm64: Enable the MMU 
> early")
> moves the mmu_enable() out of the setup_mmu(), which causes the efi test will
> not enable the mmu. So I do a small change in the efi_mem_init() which makes 
> the
> efi test also enable the MMU early, and make it works.
> 
> And another change should be noticed is in the Patch #17 ("arm/arm64: Perform
> dcache maintenance"). In the efi_mem_init(), it will disable the mmu, and 
> build
> a new pagetable and re-enable the mmu, if the asm_mmu_disable clean and
> invalidate the data caches for entire memory, we don't need to care the dcache
> and after mmu disabled, we use the mmu_setup_early() to re-enable the mmu, 
> which
> takes care all the cache maintenance. But the situation changes since the 
> Patch
> #18 ("arm/arm64: Rework the cache maintenance in asm_mmu_disable") only clean
> and invalidate the data caches for the stack memory area. So we need to clean
> and invalidate the data caches manually before disable the mmu, I'm not
> confident about current cache maintenance at the efi setup patch, so I ask for
> your help to review it if it's right or not.
> 
> And I also drop one patch ("s390: Do not use the physical allocator") from[1]
> since this cause s390 test to fail.
> 
> This series may include bug, so I really appreciate your review to improve 
> this
> series together.
> 
> You can get the code from:
> 
> $ git clone https://gitlab.com/shahuang/kvm-unit-tests.git \
>   -b arm-arm64-rework-cache-maintenance-at-boot-v1
> 
> [1] 
> https://gitlab.arm.com/linux-arm/kvm-unit-tests-ae/-/tree/arm-arm64-rework-cache-maintenance-at-boot-v2-wip2
> [2] 
> https://lore.kernel.org/all/20220809091558.14379-1-alexandru.eli...@arm.com/
> [3] 
> https://patchwork.kernel.org/project/kvm/cover/20230530160924.82158-1-nikos.nikole...@arm.com/
> 
> Changelog:
> --
> RFC->v1:
>   - Gathered Reviewed-by tags.
>   - Various changes to commit messages and comments to hopefully make the code
> easier to understand.
>   - Patches #8 ("lib/alloc_phys: Expand documentation with usage and 
> limitations")
> are new.
>   - Folded patch "arm: page.h: Add missing libcflat.h include" into #17
> ("arm/arm64: Perform dcache maintenance at boot").
>   - Reordered the series to group patches that touch aproximately the same 
> code
> together - the patches that change the physical allocator are now first,
> followed come the patches that change how the secondaries are brought 
> online.
>   - Fixed several nasty bugs where the r4 register was being clobbered in the 
> arm
> assembly.
>   - Unmap the early UART address if the DTB address does not match the early
> address.
>   - Added dcache maintenance when a page table is modified with the MMU 
> disabled.
>   - Moved the cache maintenance when disabling the MMU to be executed before 
> the
> MMU is disabled.
>   - Rebase it on lasted branch which efi support has been merged.
>   - Make the efi test also enable MMU early.
>   - Add cache maintenance on efi setup path especially before mmu_disable.
> 
> RFC: 
> https://lore.kernel.org/all/20220809091558.14379-1-alexandru.eli...@arm.com/
> 
> Alexandru Elisei (18):
>   Makefile: Define __ASSEMBLY__ for assembly files
>   powerpc: Replace the physical allocator with the page allocator
>   lib/alloc_phys: Initialize align_min
>   lib/alloc_phys: Consolidate allocate functions into memalign_early()
>   lib/alloc_phys: Remove locking
>   lib/alloc_phys: Remove allocation accounting
>   lib/alloc_phys: Add callback to perform cache maintenance
>   lib/alloc_phys: Expand documentation with usage and limitations
>   arm/arm64: Zero secondary CPUs' stack
>   arm/arm64: Allocate secondaries' stack using the page allocator
>   arm/arm64: assembler.h: Replace size with end address for
> dcache_by_line_op
>   arm/arm64: Add C functions for doing cache maintenance
>   arm/arm64: Configure secondaries' stack before enabling the MMU
>   arm/arm64: Use pgd_alloc() to allocate mmu_idmap
>   

Re: Ping? Re: [PATCH rc] kvm: Prevent compiling virt/kvm/vfio.c unless VFIO is selected

2023-11-30 Thread Jason Gunthorpe
On Wed, Nov 29, 2023 at 06:02:08PM -0800, Sean Christopherson wrote:

> > > Ah, it's the same warning, I just missed the CONFIG_MODULES=n requirement.
> > 
> > Oh, wait, doesn't that mean the approach won't work? IIRC doesn't
> > symbol_get turn into just  when non-modular turning this into a
> > link failure without the kconfig part?
> 
> Yes, but it doesn't cause linker errors.  IIUC, because the extern declaration
> is tagged "weak", a dummy default is used.  E.g. on x86, this is what is 
> generated
> with VFIO=y

Oh wow that is some pretty dark magic there :|

Jason


[PATCH] powerpc/irq: Allow softirq to hardirq stack transition

2023-11-30 Thread Michael Ellerman
Allow a transition from the softirq stack to the hardirq stack when
handling a hardirq. Doing so means a hardirq received while deep in
softirq processing is less likely to cause a stack overflow of the
softirq stack.

Previously it wasn't safe to do so because irq_exit() (which initiates
softirq processing) was called on the hardirq stack.

That was changed in commit 1b1b6a6f4cc0 ("powerpc: handle irq_enter/
irq_exit in interrupt handler wrappers") and 1346d00e1bdf ("powerpc:
Don't select HAVE_IRQ_EXIT_ON_IRQ_STACK").

The allowed transitions are now:
 - process stack -> hardirq stack
 - process stack -> softirq stack
 - process stack -> softirq stack -> hardirq stack

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/irq.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 6f7d4edaa0bc..7504ceec5c58 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -284,15 +284,14 @@ static __always_inline void call_do_irq(struct pt_regs 
*regs, void *sp)
 void __do_IRQ(struct pt_regs *regs)
 {
struct pt_regs *old_regs = set_irq_regs(regs);
-   void *cursp, *irqsp, *sirqsp;
+   void *cursp, *irqsp;
 
/* Switch to the irq stack to handle this */
cursp = (void *)(current_stack_pointer & ~(THREAD_SIZE - 1));
irqsp = hardirq_ctx[raw_smp_processor_id()];
-   sirqsp = softirq_ctx[raw_smp_processor_id()];
 
/* Already there ? If not switch stack and call */
-   if (unlikely(cursp == irqsp || cursp == sirqsp))
+   if (unlikely(cursp == irqsp))
__do_irq(regs, current_stack_pointer);
else
call_do_irq(regs, irqsp);
-- 
2.41.0



[PATCH 1/2] powerpc/mm: Fix build failures due to arch_reserved_kernel_pages()

2023-11-30 Thread Michael Ellerman
With NUMA=n and FA_DUMP=y or PRESERVE_FA_DUMP=y the build fails with:

  arch/powerpc/kernel/fadump.c:1739:22: error: no previous prototype for 
‘arch_reserved_kernel_pages’ [-Werror=missing-prototypes]
  1739 | unsigned long __init arch_reserved_kernel_pages(void)
   |  ^~

The prototype for arch_reserved_kernel_pages() is in include/linux/mm.h,
but it's guarded by __HAVE_ARCH_RESERVED_KERNEL_PAGES. The powerpc
headers define __HAVE_ARCH_RESERVED_KERNEL_PAGES in asm/mmzone.h, which
is not included into the generic headers when NUMA=n.

Move the definition of __HAVE_ARCH_RESERVED_KERNEL_PAGES into asm/mmu.h
which is included regardless of NUMA=n.

Additionally the ifdef around __HAVE_ARCH_RESERVED_KERNEL_PAGES needs to
also check for CONFIG_PRESERVE_FA_DUMP.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/mmu.h| 4 
 arch/powerpc/include/asm/mmzone.h | 3 ---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index 52cc25864a1b..d8b7e246a32f 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -412,5 +412,9 @@ extern void *abatron_pteptrs[2];
 #include 
 #endif
 
+#if defined(CONFIG_FA_DUMP) || defined(CONFIG_PRESERVE_FA_DUMP)
+#define __HAVE_ARCH_RESERVED_KERNEL_PAGES
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_MMU_H_ */
diff --git a/arch/powerpc/include/asm/mmzone.h 
b/arch/powerpc/include/asm/mmzone.h
index 4740ca230d36..da827d2d0866 100644
--- a/arch/powerpc/include/asm/mmzone.h
+++ b/arch/powerpc/include/asm/mmzone.h
@@ -42,9 +42,6 @@ u64 memory_hotplug_max(void);
 #else
 #define memory_hotplug_max() memblock_end_of_DRAM()
 #endif /* CONFIG_NUMA */
-#ifdef CONFIG_FA_DUMP
-#define __HAVE_ARCH_RESERVED_KERNEL_PAGES
-#endif
 
 #endif /* __KERNEL__ */
 #endif /* _ASM_MMZONE_H_ */
-- 
2.41.0



[kvm-unit-tests PATCH v1 02/18] powerpc: Replace the physical allocator with the page allocator

2023-11-30 Thread Shaoqin Huang
From: Alexandru Elisei 

The spapr_hcall test makes two page sized allocations using the physical
allocator. Replace the physical allocator with the page allocator, which
has has more features (like support for freeing allocations), and would
allow for further simplification of the physical allocator.

CC: Laurent Vivier 
CC: Thomas Huth 
CC: kvm-...@vger.kernel.org
Signed-off-by: Alexandru Elisei 
---
 lib/powerpc/setup.c | 9 ++---
 powerpc/Makefile.common | 1 +
 powerpc/spapr_hcall.c   | 5 +++--
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/lib/powerpc/setup.c b/lib/powerpc/setup.c
index 1be4c030..80fd38ae 100644
--- a/lib/powerpc/setup.c
+++ b/lib/powerpc/setup.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -111,6 +112,7 @@ static void mem_init(phys_addr_t freemem_start)
struct mem_region primary, mem = {
.start = (phys_addr_t)-1,
};
+   phys_addr_t base, top;
int nr_regs, i;
 
nr_regs = dt_get_memory_params(regs, NR_MEM_REGIONS);
@@ -146,9 +148,10 @@ static void mem_init(phys_addr_t freemem_start)
__physical_start = mem.start;   /* PHYSICAL_START */
__physical_end = mem.end;   /* PHYSICAL_END */
 
-   phys_alloc_init(freemem_start, primary.end - freemem_start);
-   phys_alloc_set_minimum_alignment(__icache_bytes > __dcache_bytes
-? __icache_bytes : __dcache_bytes);
+   base = PAGE_ALIGN(freemem_start) >> PAGE_SHIFT;
+   top = primary.end >> PAGE_SHIFT;
+   page_alloc_init_area(0, base, top);
+   page_alloc_ops_enable();
 }
 
 void setup(const void *fdt)
diff --git a/powerpc/Makefile.common b/powerpc/Makefile.common
index f8f47490..ae70443a 100644
--- a/powerpc/Makefile.common
+++ b/powerpc/Makefile.common
@@ -34,6 +34,7 @@ include $(SRCDIR)/scripts/asm-offsets.mak
 cflatobjs += lib/util.o
 cflatobjs += lib/getchar.o
 cflatobjs += lib/alloc_phys.o
+cflatobjs += lib/alloc_page.o
 cflatobjs += lib/alloc.o
 cflatobjs += lib/devicetree.o
 cflatobjs += lib/migrate.o
diff --git a/powerpc/spapr_hcall.c b/powerpc/spapr_hcall.c
index e9b5300a..77ab4187 100644
--- a/powerpc/spapr_hcall.c
+++ b/powerpc/spapr_hcall.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -58,8 +59,8 @@ static void test_h_page_init(int argc, char **argv)
if (argc > 1)
report_abort("Unsupported argument: '%s'", argv[1]);
 
-   dst = memalign(PAGE_SIZE, PAGE_SIZE);
-   src = memalign(PAGE_SIZE, PAGE_SIZE);
+   dst = alloc_page();
+   src = alloc_page();
if (!dst || !src)
report_abort("Failed to alloc memory");
 
-- 
2.40.1



Re: [PATCH v2 1/3] powerpc/code-patching: Add generic memory patching

2023-11-30 Thread Naveen N Rao
On Mon, Oct 16, 2023 at 04:01:45PM +1100, Benjamin Gray wrote:
> patch_instruction() is designed for patching instructions in otherwise
> readonly memory. Other consumers also sometimes need to patch readonly
> memory, so have abused patch_instruction() for arbitrary data patches.
> 
> This is a problem on ppc64 as patch_instruction() decides on the patch
> width using the 'instruction' opcode to see if it's a prefixed
> instruction. Data that triggers this can lead to larger writes, possibly
> crossing a page boundary and failing the write altogether.
> 
> Introduce patch_uint(), and patch_ulong(), with aliases patch_u32(), and
> patch_u64() (on ppc64) designed for aligned data patches. The patch
> size is now determined by the called function, and is passed as an
> additional parameter to generic internals.
> 
> While the instruction flushing is not required for data patches, the
> use cases for data patching (mainly module loading and static calls)
> are less performance sensitive than for instruction patching
> (ftrace activation).

That's debatable. While it is nice to be able to activate function 
tracing quickly, it is not necessarily a hot path. On the flip side, I 
do have a use case for data patching for ftrace activation :)

> So the instruction flushing remains unconditional
> in this patch.
> 
> ppc32 does not support prefixed instructions, so is unaffected by the
> original issue. Care is taken in not exposing the size parameter in the
> public (non-static) interface, so the compiler can const-propagate it
> away.
> 
> Signed-off-by: Benjamin Gray 
> 
> ---
> 
> v2: * Deduplicate patch_32() definition
> * Use u32 for val32
> * Remove noinline
> ---
>  arch/powerpc/include/asm/code-patching.h | 33 
>  arch/powerpc/lib/code-patching.c | 66 ++--
>  2 files changed, 83 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/code-patching.h 
> b/arch/powerpc/include/asm/code-patching.h
> index 3f881548fb61..7c6056bb1706 100644
> --- a/arch/powerpc/include/asm/code-patching.h
> +++ b/arch/powerpc/include/asm/code-patching.h
> @@ -75,6 +75,39 @@ int patch_branch(u32 *addr, unsigned long target, int 
> flags);
>  int patch_instruction(u32 *addr, ppc_inst_t instr);
>  int raw_patch_instruction(u32 *addr, ppc_inst_t instr);
>  
> +/*
> + * patch_uint() and patch_ulong() should only be called on addresses where 
> the
> + * patch does not cross a cacheline, otherwise it may not be flushed properly
> + * and mixes of new and stale data may be observed. It cannot cross a page
> + * boundary, as only the target page is mapped as writable.

Should we enforce alignment requirements, especially for patch_ulong() 
on 64-bit powerpc? I am not sure if there are use cases for unaligned 
64-bit writes. That should also ensure that the write doesn't cross a 
cacheline.

> + *
> + * patch_instruction() and other instruction patchers automatically satisfy 
> this
> + * requirement due to instruction alignment requirements.
> + */
> +
> +#ifdef CONFIG_PPC64
> +
> +int patch_uint(void *addr, unsigned int val);
> +int patch_ulong(void *addr, unsigned long val);
> +
> +#define patch_u64 patch_ulong
> +
> +#else
> +
> +static inline int patch_uint(u32 *addr, unsigned int val)

Is there a reason to use u32 * here?

> +{
> + return patch_instruction(addr, ppc_inst(val));
> +}
> +
> +static inline int patch_ulong(void *addr, unsigned long val)
> +{
> + return patch_instruction(addr, ppc_inst(val));
> +}
> +
> +#endif
> +
> +#define patch_u32 patch_uint
> +
>  static inline unsigned long patch_site_addr(s32 *site)
>  {
>   return (unsigned long)site + *site;
> diff --git a/arch/powerpc/lib/code-patching.c 
> b/arch/powerpc/lib/code-patching.c
> index b00112d7ad46..60289332412f 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -20,15 +20,14 @@
>  #include 
>  #include 
>  
> -static int __patch_instruction(u32 *exec_addr, ppc_inst_t instr, u32 
> *patch_addr)
> +static int __patch_memory(void *exec_addr, unsigned long val, void 
> *patch_addr,
> +   bool is_dword)
>  {
> - if (!ppc_inst_prefixed(instr)) {
> - u32 val = ppc_inst_val(instr);
> + if (!IS_ENABLED(CONFIG_PPC64) || likely(!is_dword)) {
> + u32 val32 = val;

Would be good to add a comment indicating the need for this for BE.

- Naveen



Re: [PATCH] perf test record+probe_libc_inet_pton: Fix call chain match on powerpc

2023-11-30 Thread Likhitha Korrapati

Hi Arnaldo,

Thank you for pointing it. From next time I will take care of it.

-Likhitha.

On 30/11/23 02:26, Arnaldo Carvalho de Melo wrote:

Em Sun, Nov 26, 2023 at 02:09:14AM -0500, Likhitha Korrapati escreveu:

The perf test "probe libc's inet_pton & backtrace it with ping" fails on
powerpc as below:

root@xxx perf]# perf test -v "probe libc's inet_pton & backtrace it with
ping"
  85: probe libc's inet_pton & backtrace it with ping :
--- start ---
test child forked, pid 96028
ping 96056 [002] 127271.101961: probe_libc:inet_pton: (7fffa1779a60)
7fffa1779a60 __GI___inet_pton+0x0
(/usr/lib64/glibc-hwcaps/power10/libc.so.6)
7fffa172a73c getaddrinfo+0x121c
(/usr/lib64/glibc-hwcaps/power10/libc.so.6)
FAIL: expected backtrace entry
"gaih_inet.*\+0x[[:xdigit:]]+[[:space:]]\(/usr/lib64/glibc-hwcaps/power10/libc.so.6\)$"
got "7fffa172a73c getaddrinfo+0x121c
(/usr/lib64/glibc-hwcaps/power10/libc.so.6)"
test child finished with -1
 end 

Try to have quoted output, the ones separated by  at the beginning
of the line indented two spaces, so as to avoid:

perf test record+probe_libc_inet_pton: Fix call chain match on powerpc

The perf test "probe libc's inet_pton & backtrace it with ping" fails on
powerpc as below:

root@xxx perf]# perf test -v "probe libc's inet_pton & backtrace it with
ping"
  85: probe libc's inet_pton & backtrace it with ping :

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# Author:Likhitha Korrapati 
# Date:  Sun Nov 26 02:09:14 2023 -0500


I'm copy and pasting from the original post, thanks!

- Arnaldo


probe libc's inet_pton & backtrace it with ping: FAILED!

This test installs a probe on libc's inet_pton function, which will use
uprobes and then uses perf trace on a ping to localhost. It gets 3
levels deep backtrace and checks whether it is what we expected or not.

The test started failing from RHEL 9.4 where as it works in previous
distro version (RHEL 9.2). Test expects gaih_inet function to be part of
backtrace. But in the glibc version (2.34-86) which is part of distro
where it fails, this function is missing and hence the test is failing.

 From nm and ping command output we can confirm that gaih_inet function
is not present in the expected backtrace for glibc version glibc-2.34-86

[root@xxx perf]# nm /usr/lib64/glibc-hwcaps/power10/libc.so.6 | grep gaih_inet
001273e0 t gaih_inet_serv
001cd8d8 r gaih_inet_typeproto

[root@xxx perf]# perf script -i /tmp/perf.data.6E8
ping  104048 [000] 128582.508976: probe_libc:inet_pton: (7fff83779a60)
 7fff83779a60 __GI___inet_pton+0x0
(/usr/lib64/glibc-hwcaps/power10/libc.so.6)
 7fff8372a73c getaddrinfo+0x121c
(/usr/lib64/glibc-hwcaps/power10/libc.so.6)
11dc73534 [unknown] (/usr/bin/ping)
 7fff8362a8c4 __libc_start_call_main+0x84
(/usr/lib64/glibc-hwcaps/power10/libc.so.6)

FAIL: expected backtrace entry
"gaih_inet.*\+0x[[:xdigit:]]+[[:space:]]\(/usr/lib64/glibc-hwcaps/power10/libc.so.6\)$"
got "7fff9d52a73c getaddrinfo+0x121c
(/usr/lib64/glibc-hwcaps/power10/libc.so.6)"

With version glibc-2.34-60 gaih_inet function is present as part of the
expected backtrace. So we cannot just remove the gaih_inet function from
the backtrace.

[root@xxx perf]# nm /usr/lib64/glibc-hwcaps/power10/libc.so.6 | grep gaih_inet
00130490 t gaih_inet.constprop.0
0012e830 t gaih_inet_serv
001d45e4 r gaih_inet_typeproto

[root@xxx perf]# ./perf script -i /tmp/perf.data.b6S
ping   67906 [000] 22699.591699: probe_libc:inet_pton_3: (7fffbdd80820)
 7fffbdd80820 __GI___inet_pton+0x0
(/usr/lib64/glibc-hwcaps/power10/libc.so.6)
 7fffbdd31160 gaih_inet.constprop.0+0xcd0
(/usr/lib64/glibc-hwcaps/power10/libc.so.6)
 7fffbdd31c7c getaddrinfo+0x14c
(/usr/lib64/glibc-hwcaps/power10/libc.so.6)
1140d3558 [unknown] (/usr/bin/ping)

This patch solves this issue by doing a conditional skip. If there is a
gaih_inet function present in the libc then it will be added to the
expected backtrace else the function will be skipped from being added
to the expected backtrace.

Output with the patch

[root@xxx perf]# ./perf test -v "probe libc's inet_pton & backtrace it
with ping"
  83: probe libc's inet_pton & backtrace it with ping :
--- start ---
test child forked, pid 102662
ping 102692 [000] 127935.549973: probe_libc:inet_pton: (7fff93379a60)
7fff93379a60 __GI___inet_pton+0x0
(/usr/lib64/glibc-hwcaps/power10/libc.so.6)
7fff9332a73c getaddrinfo+0x121c
(/usr/lib64/glibc-hwcaps/power10/libc.so.6)
11ef03534 [unknown] (/usr/bin/ping)
test child finished with 0
 end 
probe libc's inet_pton & backtrace it with ping: Ok

Signed-off-by: Likhitha Korrapati 
Reported-by: Disha Goel 
---
  tools/perf/tests/shell/record+probe_libc_inet_pton.sh | 5 -
  1 file changed, 4 insertions(+), 1 

[kvm-unit-tests PATCH v1 00/18] arm/arm64: Rework cache maintenance at boot

2023-11-30 Thread Shaoqin Huang
Hi,

I'm posting Alexandru's patch set[1] rebased on the latest branch with the
conflicts being resolved. No big changes compare to its original code.

As this version 1 of this series was posted one years ago, I would first let you
recall it, what's the intention of this series and what this series do. You can
view it by click the link[2] and view the cover-letter.

Since when writing the series[1], the efi support for arm64[3] hasn't been
merged into the kvm-unit-tests, but now the efi support for arm64 has been
merged. Directly rebase the series[1] onto the latest branch will break the efi
tests. This is mainly because the Patch #15 ("arm/arm64: Enable the MMU early")
moves the mmu_enable() out of the setup_mmu(), which causes the efi test will
not enable the mmu. So I do a small change in the efi_mem_init() which makes the
efi test also enable the MMU early, and make it works.

And another change should be noticed is in the Patch #17 ("arm/arm64: Perform
dcache maintenance"). In the efi_mem_init(), it will disable the mmu, and build
a new pagetable and re-enable the mmu, if the asm_mmu_disable clean and
invalidate the data caches for entire memory, we don't need to care the dcache
and after mmu disabled, we use the mmu_setup_early() to re-enable the mmu, which
takes care all the cache maintenance. But the situation changes since the Patch
#18 ("arm/arm64: Rework the cache maintenance in asm_mmu_disable") only clean
and invalidate the data caches for the stack memory area. So we need to clean
and invalidate the data caches manually before disable the mmu, I'm not
confident about current cache maintenance at the efi setup patch, so I ask for
your help to review it if it's right or not.

And I also drop one patch ("s390: Do not use the physical allocator") from[1]
since this cause s390 test to fail.

This series may include bug, so I really appreciate your review to improve this
series together.

You can get the code from:

$ git clone https://gitlab.com/shahuang/kvm-unit-tests.git \
-b arm-arm64-rework-cache-maintenance-at-boot-v1

[1] 
https://gitlab.arm.com/linux-arm/kvm-unit-tests-ae/-/tree/arm-arm64-rework-cache-maintenance-at-boot-v2-wip2
[2] https://lore.kernel.org/all/20220809091558.14379-1-alexandru.eli...@arm.com/
[3] 
https://patchwork.kernel.org/project/kvm/cover/20230530160924.82158-1-nikos.nikole...@arm.com/

Changelog:
--
RFC->v1:
  - Gathered Reviewed-by tags.
  - Various changes to commit messages and comments to hopefully make the code
easier to understand.
  - Patches #8 ("lib/alloc_phys: Expand documentation with usage and 
limitations")
are new.
  - Folded patch "arm: page.h: Add missing libcflat.h include" into #17
("arm/arm64: Perform dcache maintenance at boot").
  - Reordered the series to group patches that touch aproximately the same code
together - the patches that change the physical allocator are now first,
followed come the patches that change how the secondaries are brought 
online.
  - Fixed several nasty bugs where the r4 register was being clobbered in the 
arm
assembly.
  - Unmap the early UART address if the DTB address does not match the early
address.
  - Added dcache maintenance when a page table is modified with the MMU 
disabled.
  - Moved the cache maintenance when disabling the MMU to be executed before the
MMU is disabled.
  - Rebase it on lasted branch which efi support has been merged.
  - Make the efi test also enable MMU early.
  - Add cache maintenance on efi setup path especially before mmu_disable.

RFC: 
https://lore.kernel.org/all/20220809091558.14379-1-alexandru.eli...@arm.com/

Alexandru Elisei (18):
  Makefile: Define __ASSEMBLY__ for assembly files
  powerpc: Replace the physical allocator with the page allocator
  lib/alloc_phys: Initialize align_min
  lib/alloc_phys: Consolidate allocate functions into memalign_early()
  lib/alloc_phys: Remove locking
  lib/alloc_phys: Remove allocation accounting
  lib/alloc_phys: Add callback to perform cache maintenance
  lib/alloc_phys: Expand documentation with usage and limitations
  arm/arm64: Zero secondary CPUs' stack
  arm/arm64: Allocate secondaries' stack using the page allocator
  arm/arm64: assembler.h: Replace size with end address for
dcache_by_line_op
  arm/arm64: Add C functions for doing cache maintenance
  arm/arm64: Configure secondaries' stack before enabling the MMU
  arm/arm64: Use pgd_alloc() to allocate mmu_idmap
  arm/arm64: Enable the MMU early
  arm/arm64: Map the UART when creating the translation tables
  arm/arm64: Perform dcache maintenance at boot
  arm/arm64: Rework the cache maintenance in asm_mmu_disable

 Makefile   |   5 +-
 arm/Makefile.arm   |   4 +-
 arm/Makefile.arm64 |   4 +-
 arm/Makefile.common|   6 +-
 arm/cstart.S   |  71 +++--
 arm/cstart64.S |  76 +--
 lib/alloc_phys.c   | 122 

[kvm-unit-tests PATCH v1 01/18] Makefile: Define __ASSEMBLY__ for assembly files

2023-11-30 Thread Shaoqin Huang
From: Alexandru Elisei 

There are 25 header files today (found with grep -r "#ifndef __ASSEMBLY__)
with functionality relies on the __ASSEMBLY__ prepocessor constant being
correctly defined to work correctly. So far, kvm-unit-tests has relied on
the assembly files to define the constant before including any header
files which depend on it.

Let's make sure that nobody gets this wrong and define it as a compiler
constant when compiling assembly files. __ASSEMBLY__ is now defined for all
.S files, even those that didn't set it explicitely before.

Reviewed-by: Nikos Nikoleris 
Reviewed-by: Andrew Jones 
Signed-off-by: Alexandru Elisei 
Signed-off-by: Shaoqin Huang 
---
 Makefile   | 5 -
 arm/cstart.S   | 1 -
 arm/cstart64.S | 1 -
 powerpc/cstart64.S | 1 -
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/Makefile b/Makefile
index 602910dd..27ed14e6 100644
--- a/Makefile
+++ b/Makefile
@@ -92,6 +92,9 @@ CFLAGS += -Woverride-init -Wmissing-prototypes 
-Wstrict-prototypes
 
 autodepend-flags = -MMD -MP -MF $(dir $*).$(notdir $*).d
 
+AFLAGS  = $(CFLAGS)
+AFLAGS += -D__ASSEMBLY__
+
 LDFLAGS += -nostdlib $(no_pie) -z noexecstack
 
 $(libcflat): $(cflatobjs)
@@ -113,7 +116,7 @@ directories:
@mkdir -p $(OBJDIRS)
 
 %.o: %.S
-   $(CC) $(CFLAGS) -c -nostdlib -o $@ $<
+   $(CC) $(AFLAGS) -c -nostdlib -o $@ $<
 
 -include */.*.d */*/.*.d
 
diff --git a/arm/cstart.S b/arm/cstart.S
index 3dd71ed9..b24ecabc 100644
--- a/arm/cstart.S
+++ b/arm/cstart.S
@@ -5,7 +5,6 @@
  *
  * This work is licensed under the terms of the GNU LGPL, version 2.
  */
-#define __ASSEMBLY__
 #include 
 #include 
 #include 
diff --git a/arm/cstart64.S b/arm/cstart64.S
index bc2be45a..a8ad6dc8 100644
--- a/arm/cstart64.S
+++ b/arm/cstart64.S
@@ -5,7 +5,6 @@
  *
  * This work is licensed under the terms of the GNU GPL, version 2.
  */
-#define __ASSEMBLY__
 #include 
 #include 
 #include 
diff --git a/powerpc/cstart64.S b/powerpc/cstart64.S
index 34e39341..fa32ef24 100644
--- a/powerpc/cstart64.S
+++ b/powerpc/cstart64.S
@@ -5,7 +5,6 @@
  *
  * This work is licensed under the terms of the GNU LGPL, version 2.
  */
-#define __ASSEMBLY__
 #include 
 #include 
 #include 
-- 
2.40.1



Re: [PATCH v2 2/3] powerpc/64: Convert patch_instruction() to patch_u32()

2023-11-30 Thread Naveen N Rao
On Mon, Oct 16, 2023 at 04:01:46PM +1100, Benjamin Gray wrote:
> This use of patch_instruction() is working on 32 bit data, and can fail
> if the data looks like a prefixed instruction and the extra write
> crosses a page boundary. Use patch_u32() to fix the write size.
> 
> Fixes: 8734b41b3efe ("powerpc/module_64: Fix livepatching for RO modules")
> Link: https://lore.kernel.org/all/20230203004649.1f59dbd4@yea/
> Signed-off-by: Benjamin Gray 
> 
> ---
> 
> v2: * Added the fixes tag, it seems appropriate even if the subject does
>   mention a more robust solution being required.
> 
> patch_u64() should be more efficient, but judging from the bug report
> it doesn't seem like the data is doubleword aligned.

That doesn't look to be the case anymore due to commits 77e69ee7ce07 
("powerpc/64: modules support building with PCREL addresing") and 
7e3a68be42e1 ("powerpc/64: vmlinux support building with PCREL 
addresing")

- Naveen



[PATCH 2/2] powerpc: Fix build error due to is_valid_bugaddr()

2023-11-30 Thread Michael Ellerman
With CONFIG_GENERIC_BUG=n the build fails with:

  arch/powerpc/kernel/traps.c:1442:5: error: no previous prototype for 
‘is_valid_bugaddr’ [-Werror=missing-prototypes]
  1442 | int is_valid_bugaddr(unsigned long addr)
   | ^~~~

The prototype is only defined, and the function is only needed, when
CONFIG_GENERIC_BUG=y, so move the implementation under that.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/traps.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 5ea2014aff90..11e062b47d3f 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1439,10 +1439,12 @@ static int emulate_instruction(struct pt_regs *regs)
return -EINVAL;
 }
 
+#ifdef CONFIG_GENERIC_BUG
 int is_valid_bugaddr(unsigned long addr)
 {
return is_kernel_addr(addr);
 }
+#endif
 
 #ifdef CONFIG_MATH_EMULATION
 static int emulate_math(struct pt_regs *regs)
-- 
2.41.0



Re: [PATCH 1/2] kexec: fix KEXEC_FILE dependencies

2023-11-30 Thread Eric DeVolder



On 11/30/23 10:56, Andrew Morton wrote:

On Thu, 2 Nov 2023 16:03:18 +0800 Baoquan He  wrote:


CONFIG_KEXEC_FILE, but still get purgatory code built in which is
totally useless.

Not sure if I think too much over this.

I see your point here, and I would suggest changing the
CONFIG_ARCH_SUPPORTS_KEXEC_PURGATORY symbol to just indicate
the availability of the purgatory code for the arch, rather
than actually controlling the code itself. I already mentioned
this for s390, but riscv would need the same thing on top.

I think the change below should address your concern.

Since no new comment, do you mind spinning v2 to wrap all these up?

This patchset remains in mm-hotfixes-unstable from the previous -rc
cycle.  Eric, do you have any comments?  Arnd, do you plan on a v2?  If
not, should I merge v1?  If so, should I now add cc:stable?


My apologies, I lost this. I've looked at these changes, and I am in 
favor of these changes.


Furthermore, I ran the following thru the Kconfig regression script, and 
did not find anything!


I believe the following patch represents the current discussion threads 
around Kconfig and KEXEC/CRASH.


Reviewed-by: Eric DeVolder 

Tested-by: Eric DeVolder 

Thanks!

eric

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 6f105ee4f3cf..1f11a62809f2 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -608,10 +608,10 @@ config ARCH_SUPPORTS_KEXEC
 def_bool PPC_BOOK3S || PPC_E500 || (44x && !SMP)

 config ARCH_SUPPORTS_KEXEC_FILE
-    def_bool PPC64 && CRYPTO=y && CRYPTO_SHA256=y
+    def_bool PPC64

 config ARCH_SUPPORTS_KEXEC_PURGATORY
-    def_bool KEXEC_FILE
+    def_bool y

 config ARCH_SELECTS_KEXEC_FILE
 def_bool y
diff --git a/arch/riscv/Kbuild b/arch/riscv/Kbuild
index d25ad1c19f88..ab181d187c23 100644
--- a/arch/riscv/Kbuild
+++ b/arch/riscv/Kbuild
@@ -5,7 +5,7 @@ obj-$(CONFIG_BUILTIN_DTB) += boot/dts/
 obj-y += errata/
 obj-$(CONFIG_KVM) += kvm/

-obj-$(CONFIG_ARCH_SUPPORTS_KEXEC_PURGATORY) += purgatory/
+obj-$(CONFIG_KEXEC_FILE) += purgatory/

 # for cleaning
 subdir- += boot
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 95a2a06acc6a..98857d76e458 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -702,9 +702,7 @@ config ARCH_SELECTS_KEXEC_FILE
 select KEXEC_ELF

 config ARCH_SUPPORTS_KEXEC_PURGATORY
-    def_bool KEXEC_FILE
-    depends on CRYPTO=y
-    depends on CRYPTO_SHA256=y
+    def_bool y

 config ARCH_SUPPORTS_CRASH_DUMP
 def_bool y
diff --git a/arch/riscv/kernel/elf_kexec.c b/arch/riscv/kernel/elf_kexec.c
index e60fbd8660c4..3ac341d296db 100644
--- a/arch/riscv/kernel/elf_kexec.c
+++ b/arch/riscv/kernel/elf_kexec.c
@@ -266,7 +266,7 @@ static void *elf_kexec_load(struct kimage *image, 
char *kernel_buf,

     cmdline = modified_cmdline;
 }

-#ifdef CONFIG_ARCH_SUPPORTS_KEXEC_PURGATORY
+#ifdef CONFIG_KEXEC_FILE
 /* Add purgatory to the image */
 kbuf.top_down = true;
 kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
@@ -280,7 +280,7 @@ static void *elf_kexec_load(struct kimage *image, 
char *kernel_buf,

                  sizeof(kernel_start), 0);
 if (ret)
     pr_err("Error update purgatory ret=%d\n", ret);
-#endif /* CONFIG_ARCH_SUPPORTS_KEXEC_PURGATORY */
+#endif /* CONFIG_KEXEC_FILE */

 /* Add the initrd to the image */
 if (initrd != NULL) {
diff --git a/arch/s390/Kbuild b/arch/s390/Kbuild
index a5d3503b353c..f2ce80b65551 100644
--- a/arch/s390/Kbuild
+++ b/arch/s390/Kbuild
@@ -7,7 +7,7 @@ obj-$(CONFIG_S390_HYPFS)    += hypfs/
 obj-$(CONFIG_APPLDATA_BASE)    += appldata/
 obj-y                += net/
 obj-$(CONFIG_PCI)        += pci/
-obj-$(CONFIG_ARCH_SUPPORTS_KEXEC_PURGATORY) += purgatory/
+obj-$(CONFIG_KEXEC_FILE) += purgatory/

 # for cleaning
 subdir- += boot tools
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 3bec98d20283..d5d8f99d1f25 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -254,13 +254,13 @@ config ARCH_SUPPORTS_KEXEC
 def_bool y

 config ARCH_SUPPORTS_KEXEC_FILE
-    def_bool CRYPTO && CRYPTO_SHA256 && CRYPTO_SHA256_S390
+    def_bool y

 config ARCH_SUPPORTS_KEXEC_SIG
 def_bool MODULE_SIG_FORMAT

 config ARCH_SUPPORTS_KEXEC_PURGATORY
-    def_bool KEXEC_FILE
+    def_bool y

 config ARCH_SUPPORTS_CRASH_DUMP
 def_bool y
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3762f41bb092..1566748f16c4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2072,7 +2072,7 @@ config ARCH_SUPPORTS_KEXEC
 def_bool y

 config ARCH_SUPPORTS_KEXEC_FILE
-    def_bool X86_64 && CRYPTO && CRYPTO_SHA256
+    def_bool X86_64

 config ARCH_SELECTS_KEXEC_FILE
 def_bool y
@@ -2080,7 +2080,7 @@ config ARCH_SELECTS_KEXEC_FILE
 select HAVE_IMA_KEXEC if IMA

 config ARCH_SUPPORTS_KEXEC_PURGATORY
-    def_bool KEXEC_FILE
+    def_bool y

 config ARCH_SUPPORTS_KEXEC_SIG
 def_bool y
diff --git a/kernel/Kconfig.kexec b/kernel/Kconfig.kexec
index 7aff28ded2f4..92120e396008 100644
--- a/kernel/Kconfig.kexec
+++ 

Re: [PATCH] scsi: ibmvfc: replace deprecated strncpy with strscpy

2023-11-30 Thread Kees Cook
On Mon, Oct 30, 2023 at 07:04:33PM +, Justin Stitt wrote:
> strncpy() is deprecated for use on NUL-terminated destination strings
> [1] and as such we should prefer more robust and less ambiguous string
> interfaces.
> 
> We expect these fields to be NUL-terminated as the property names from
> which they are derived are also NUL-terminated.
> 
> Moreover, NUL-padding is not required as our destination buffers are
> already NUL-allocated and any future NUL-byte assignments are redundant
> (like the ones that strncpy() does).
> ibmvfc_probe() ->
> |   struct ibmvfc_host *vhost;
> |   struct Scsi_Host *shost;
> ...
> | shost = scsi_host_alloc(_template, sizeof(*vhost));
> ... **side note: is this a bug? Looks like a type to me   ^**

I think this is the "privsize", so *vhost is correct, IIUC.

> ...
> | vhost = shost_priv(shost);

I.e. vhost is a part of the shost allocation

> 
> ... where shost_priv() is:
> |   static inline void *shost_priv(struct Scsi_Host *shost)
> |   {
> | return (void *)shost->hostdata;
> |   }
> 
> .. and:
> scsi_host_alloc() ->
> | shost = kzalloc(sizeof(struct Scsi_Host) + privsize, GFP_KERNEL);

As seen here. :)

> 
> And for login_info->..., NUL-padding is also not required as it is
> explicitly memset to 0:
> | memset(login_info, 0, sizeof(*login_info));
> 
> Considering the above, a suitable replacement is `strscpy` [2] due to
> the fact that it guarantees NUL-termination on the destination buffer
> without unnecessarily NUL-padding.
> 
> Link: 
> https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
>  [1]
> Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html 
> [2]
> Link: https://github.com/KSPP/linux/issues/90
> Cc: linux-harden...@vger.kernel.org
> Signed-off-by: Justin Stitt 

Yeah, this conversion looks correct to me too.

Reviewed-by: Kees Cook 

-Kees

> ---
> Note: build-tested only.
> 
> Found with: $ rg "strncpy\("
> ---
>  drivers/scsi/ibmvscsi/ibmvfc.c | 12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index ce9eb00e2ca0..e73a39b1c832 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -1455,7 +1455,7 @@ static void ibmvfc_gather_partition_info(struct 
> ibmvfc_host *vhost)
>  
>   name = of_get_property(rootdn, "ibm,partition-name", NULL);
>   if (name)
> - strncpy(vhost->partition_name, name, 
> sizeof(vhost->partition_name));
> + strscpy(vhost->partition_name, name, 
> sizeof(vhost->partition_name));
>   num = of_get_property(rootdn, "ibm,partition-no", NULL);
>   if (num)
>   vhost->partition_number = *num;
> @@ -1498,13 +1498,15 @@ static void ibmvfc_set_login_info(struct ibmvfc_host 
> *vhost)
>   login_info->async.va = cpu_to_be64(vhost->async_crq.msg_token);
>   login_info->async.len = cpu_to_be32(async_crq->size *
>   sizeof(*async_crq->msgs.async));
> - strncpy(login_info->partition_name, vhost->partition_name, 
> IBMVFC_MAX_NAME);
> - strncpy(login_info->device_name,
> - dev_name(>host->shost_gendev), IBMVFC_MAX_NAME);
> + strscpy(login_info->partition_name, vhost->partition_name,
> + sizeof(login_info->partition_name));
> +
> + strscpy(login_info->device_name,
> + dev_name(>host->shost_gendev), 
> sizeof(login_info->device_name));
>  
>   location = of_get_property(of_node, "ibm,loc-code", NULL);
>   location = location ? location : dev_name(vhost->dev);
> - strncpy(login_info->drc_name, location, IBMVFC_MAX_NAME);
> + strscpy(login_info->drc_name, location, sizeof(login_info->drc_name));
>  }
>  
>  /**
> 
> ---
> base-commit: ffc253263a1375a65fa6c9f62a893e9767fbebfa
> change-id: 20231030-strncpy-drivers-scsi-ibmvscsi-ibmvfc-c-ccfce3255d58
> 
> Best regards,
> --
> Justin Stitt 
> 

-- 
Kees Cook


Re: [PATCH] scsi: ibmvscsi: replace deprecated strncpy with strscpy

2023-11-30 Thread Kees Cook
On Mon, Oct 30, 2023 at 08:40:48PM +, Justin Stitt wrote:
> strncpy() is deprecated for use on NUL-terminated destination strings
> [1] and as such we should prefer more robust and less ambiguous string
> interfaces.
> 
> We expect partition_name to be NUL-terminated based on its usage with
> format strings:
> |   dev_info(hostdata->dev, "host srp version: %s, "
> |"host partition %s (%d), OS %d, max io %u\n",
> |hostdata->madapter_info.srp_version,
> |hostdata->madapter_info.partition_name,
> |be32_to_cpu(hostdata->madapter_info.partition_number),
> |be32_to_cpu(hostdata->madapter_info.os_type),
> |be32_to_cpu(hostdata->madapter_info.port_max_txu[0]));
> ...
> |   len = snprintf(buf, PAGE_SIZE, "%s\n",
> |hostdata->madapter_info.partition_name);
> 
> Moreover, NUL-padding is not required as madapter_info is explicitly
> memset to 0:
> |   memset(>madapter_info, 0x00,
> |   sizeof(hostdata->madapter_info));
> 
> Considering the above, a suitable replacement is `strscpy` [2] due to
> the fact that it guarantees NUL-termination on the destination buffer
> without unnecessarily NUL-padding.
> 
> Link: 
> https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
>  [1]
> Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html 
> [2]
> Link: https://github.com/KSPP/linux/issues/90
> Cc: linux-harden...@vger.kernel.org
> Signed-off-by: Justin Stitt 

Agreed; this conversion looks correct to me too.

Reviewed-by: Kees Cook 

-- 
Kees Cook


Re: [PATCH RFC 06/12] mm/gup: Drop folio_fast_pin_allowed() in hugepd processing

2023-11-30 Thread Peter Xu
On Fri, Nov 24, 2023 at 11:07:51AM -0500, Peter Xu wrote:
> On Fri, Nov 24, 2023 at 09:06:01AM +, Ryan Roberts wrote:
> > I don't have any micro-benchmarks for GUP though, if that's your question. 
> > Is
> > there an easy-to-use test I can run to get some numbers? I'd be happy to 
> > try it out.
> 
> Thanks Ryan.  Then nothing is needed to be tested if gup is not yet touched
> from your side, afaict.  I'll see whether I can provide some rough numbers
> instead in the next post (I'll probably only be able to test it in a VM,
> though, but hopefully that should still reflect mostly the truth).

An update: I finished a round of 64K cont_pte test, in the slow gup micro
benchmark I see ~15% perf degrade with this patchset applied on a VM on top
of Apple M1.

Frankly that's even less than I expected, considering not only how slow gup
THP used to be, but also on the fact that that's a tight loop over slow
gup, which in normal cases shouldn't happen: "present" ptes normally goes
to fast-gup, while !present goes into a fault following it.  I assume
that's why nobody cared slow gup for THP before.  I think adding cont_pte
support shouldn't be very hard, but that will include making cont_pte idea
global just for arm64 and riscv Svnapot.

The current plan is I'll add that performance number into my commit message
only, as I don't ever expect any real workload will regress with it.  Maybe
a global cont_pte api will be needed at some point, but perhaps not yet
feel strongly for this use case.

Please feel free to raise any concerns otherwise.

Thanks,

-- 
Peter Xu



Re: [PATCH v4 05/13] powerpc/rtas: Facilitate high-level call sequences

2023-11-30 Thread Nathan Lynch
Nathan Lynch  writes:
> Alternatively, sys_rtas() could be refactored into locking and
> non-locking paths, e.g.
>
> static long __do_sys_rtas(struct rtas_function *func)
> {
>   // [ ... acquire rtas_lock, enter RTAS, fetch any error etc ... ]
> }

Of course this conflicts with the code generated by SYSCALL_DEFINE1(rtas), so
a different function name would be needed here.


Re: linux-next: build failure after merge of the mm tree

2023-11-30 Thread Stephen Rothwell
Hi all,

On Mon, 27 Nov 2023 14:48:52 +1100 Stephen Rothwell  
wrote:
> 
> Just cc'ing the PowerPC guys to see if my fix is sensible.
> 
> On Mon, 27 Nov 2023 13:28:09 +1100 Stephen Rothwell  
> wrote:
> >
> > After merging the mm tree, today's linux-next build (powerpc64
> > allnoconfig) failed like this:
> > 
> > arch/powerpc/mm/book3s64/pgtable.c:557:5: error: no previous prototype for 
> > 'pmd_move_must_withdraw' [-Werror=missing-prototypes]
> >   557 | int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
> >   | ^~
> > cc1: all warnings being treated as errors
> > 
> > Caused by commit
> > 
> >   c6345dfa6e3e ("Makefile.extrawarn: turn on missing-prototypes globally")
> > 
> > I have added the following patch for today (which could be applied to
> > the mm or powerpc trees):
> > 
> > From 194805b44c11b4c0aa28bdcdc0bb0d82acef394c Mon Sep 17 00:00:00 2001
> > From: Stephen Rothwell 
> > Date: Mon, 27 Nov 2023 13:08:57 +1100
> > Subject: [PATCH] powerpc: pmd_move_must_withdraw() is only needed for
> >  CONFIG_TRANSPARENT_HUGEPAGE
> > 
> > Signed-off-by: Stephen Rothwell 
> > ---
> >  arch/powerpc/mm/book3s64/pgtable.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
> > b/arch/powerpc/mm/book3s64/pgtable.c
> > index be229290a6a7..3438ab72c346 100644
> > --- a/arch/powerpc/mm/book3s64/pgtable.c
> > +++ b/arch/powerpc/mm/book3s64/pgtable.c
> > @@ -542,6 +542,7 @@ void ptep_modify_prot_commit(struct vm_area_struct 
> > *vma, unsigned long addr,
> > set_pte_at(vma->vm_mm, addr, ptep, pte);
> >  }
> >  
> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >  /*
> >   * For hash translation mode, we use the deposited table to store hash slot
> >   * information and they are stored at PTRS_PER_PMD offset from related pmd
> > @@ -563,6 +564,7 @@ int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
> >  
> > return true;
> >  }
> > +#endif
> >  
> >  /*
> >   * Does the CPU support tlbie?
> > -- 
> > 2.40.1  

I am still carrying this patch (it should probably go into the mm
tree).  Is someone going to pick it up (assuming it is correct)?

-- 
Cheers,
Stephen Rothwell


pgpKkhduvXRmu.pgp
Description: OpenPGP digital signature


Re: linux-next: build failure after merge of the mm tree

2023-11-30 Thread Andrew Morton
On Fri, 1 Dec 2023 09:04:39 +1100 Stephen Rothwell  
wrote:

> Hi all,
> 
> > > diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
> > > b/arch/powerpc/mm/book3s64/pgtable.c
> > > index be229290a6a7..3438ab72c346 100644
> > > --- a/arch/powerpc/mm/book3s64/pgtable.c
> > > +++ b/arch/powerpc/mm/book3s64/pgtable.c
> > > @@ -542,6 +542,7 @@ void ptep_modify_prot_commit(struct vm_area_struct 
> > > *vma, unsigned long addr,
> > >   set_pte_at(vma->vm_mm, addr, ptep, pte);
> > >  }
> > >  
> > > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > >  /*
> > >   * For hash translation mode, we use the deposited table to store hash 
> > > slot
> > >   * information and they are stored at PTRS_PER_PMD offset from related 
> > > pmd
> > > @@ -563,6 +564,7 @@ int pmd_move_must_withdraw(struct spinlock 
> > > *new_pmd_ptl,
> > >  
> > >   return true;
> > >  }
> > > +#endif
> > >  
> > >  /*
> > >   * Does the CPU support tlbie?
> > > -- 
> > > 2.40.1  
> 
> I am still carrying this patch (it should probably go into the mm
> tree).  Is someone going to pick it up (assuming it is correct)?

AFAIK we're still awaiting input from the ppc team.

I'll grab it.  If it breaks things then we-told-you-so!


Re: linux-next: build failure after merge of the mm tree

2023-11-30 Thread Michael Ellerman
Stephen Rothwell  writes:
> On Mon, 27 Nov 2023 14:48:52 +1100 Stephen Rothwell  
> wrote:
>> 
>> Just cc'ing the PowerPC guys to see if my fix is sensible.
>> 
>> On Mon, 27 Nov 2023 13:28:09 +1100 Stephen Rothwell  
>> wrote:
>> >
>> > After merging the mm tree, today's linux-next build (powerpc64
>> > allnoconfig) failed like this:
>> > 
>> > arch/powerpc/mm/book3s64/pgtable.c:557:5: error: no previous prototype for 
>> > 'pmd_move_must_withdraw' [-Werror=missing-prototypes]
>> >   557 | int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
>> >   | ^~
>> > cc1: all warnings being treated as errors
>> > 
>> > Caused by commit
>> > 
>> >   c6345dfa6e3e ("Makefile.extrawarn: turn on missing-prototypes globally")
>> > 
>> > I have added the following patch for today (which could be applied to
>> > the mm or powerpc trees):
>> > 
>> > From 194805b44c11b4c0aa28bdcdc0bb0d82acef394c Mon Sep 17 00:00:00 2001
>> > From: Stephen Rothwell 
>> > Date: Mon, 27 Nov 2023 13:08:57 +1100
>> > Subject: [PATCH] powerpc: pmd_move_must_withdraw() is only needed for
>> >  CONFIG_TRANSPARENT_HUGEPAGE
>> > 
>> > Signed-off-by: Stephen Rothwell 
>> > ---
>> >  arch/powerpc/mm/book3s64/pgtable.c | 2 ++
>> >  1 file changed, 2 insertions(+)
>> > 
>> > diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
>> > b/arch/powerpc/mm/book3s64/pgtable.c
>> > index be229290a6a7..3438ab72c346 100644
>> > --- a/arch/powerpc/mm/book3s64/pgtable.c
>> > +++ b/arch/powerpc/mm/book3s64/pgtable.c
>> > @@ -542,6 +542,7 @@ void ptep_modify_prot_commit(struct vm_area_struct 
>> > *vma, unsigned long addr,
>> >set_pte_at(vma->vm_mm, addr, ptep, pte);
>> >  }
>> >  
>> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> >  /*
>> >   * For hash translation mode, we use the deposited table to store hash 
>> > slot
>> >   * information and they are stored at PTRS_PER_PMD offset from related pmd
>> > @@ -563,6 +564,7 @@ int pmd_move_must_withdraw(struct spinlock 
>> > *new_pmd_ptl,
>> >  
>> >return true;
>> >  }
>> > +#endif
>> >  
>> >  /*
>> >   * Does the CPU support tlbie?
>> > -- 
>> > 2.40.1  
>
> I am still carrying this patch (it should probably go into the mm
> tree).  Is someone going to pick it up (assuming it is correct)?

I applied it to my next a few days ago, but I must have forgotten to
push. It's in there now.

cheers


Re: [PATCH v4 05/13] powerpc/rtas: Facilitate high-level call sequences

2023-11-30 Thread Michael Ellerman
Nathan Lynch  writes:
> Michael Ellerman  writes:
>> Nathan Lynch  writes:
>>> Michael Ellerman  writes:
 Nathan Lynch via B4 Relay 
 writes:
> From: Nathan Lynch 
>
> On RTAS platforms there is a general restriction that the OS must not
> enter RTAS on more than one CPU at a time. This low-level
> serialization requirement is satisfied by holding a spin
> lock (rtas_lock) across most RTAS function invocations.
 ...
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 1fc0b3fffdd1..52f2242d0c28 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -581,6 +652,28 @@ static const struct rtas_function 
> *rtas_token_to_function(s32 token)
>   return NULL;
>  }
>  
> +static void __rtas_function_lock(struct rtas_function *func)
> +{
> + if (func && func->lock)
> + mutex_lock(func->lock);
> +}

 This is obviously going to defeat most static analysis tools.
>>>
>>> I guess it's not that obvious to me :-) Is it because the mutex_lock()
>>> is conditional? I'll improve this if it's possible.
>>
>> Well maybe I'm not giving modern static analysis tools enough credit :)
>>
>> But what I mean that it's not easy to reason about what the function
>> does in isolation. ie. all you can say is that it may or may not lock a
>> mutex, and you can't say which mutex.
>
> I've pulled the thread on this a little bit and here is what I can do:
>
> * Discard rtas_lock_function() and rtas_unlock_function() and make the
>   function mutexes extern as needed. As of now only
>   rtas_ibm_get_vpd_lock will need to be exposed. This enables us to put
>   __acquires(), __releases(), and __must_hold() annotations in
>   papr-vpd.c since it explicitly manipulates the mutex.
>
> * Then sys_rtas() becomes the only site that needs
>   __rtas_function_lock() and __rtas_function_unlock(), which can be
>   open-coded and commented (and, one hopes, not emulated elsewhere).
>
> This will look something like:
>
> SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
> {
> struct rtas_function *func = rtas_token_to_function(token);
>
> if (func->lock)
> mutex_lock(func->lock);
>
> [ ... acquire rtas_lock, enter RTAS, fetch any errors ... ]
>
> if (func->lock)
> mutex_unlock(func->lock);
>
> The indirection seems unavoidable since we're working backwards from a
> token value (supplied by the user and not known at build time) to the
> function descriptor.
>
> Is that tolerable for now?

Yeah. Thanks for looking into it.

I wasn't unhappy with the original version, but just slightly uneasy
about the locking via pointer.

But that new proposal sounds good, more code will have static lock
annotations, and only sys_rtas() which is already weird, will have the
dynamic stuff.

> Alternatively, sys_rtas() could be refactored into locking and
> non-locking paths, e.g.
>
> static long __do_sys_rtas(struct rtas_function *func)
> {
>   // [ ... acquire rtas_lock, enter RTAS, fetch any error etc ... ]
> }
>
> static long do_sys_rtas(struct rtas_function *func, struct mutex *mtx)
> {
>   mutex_lock(mtx);
>   ret = __do_sys_rtas(func);
>   mutex_unlock(mtx);
>
>   return ret;
> }
>
> SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
> {
>   // real code does copy_from_user etc
>   struct rtas_function *func = rtas_token_to_function(uargs->token);
>   long ret;
>
>   // [ ... input validation and filtering ... ]
>
>   if (func->lock)
>   ret = do_sys_rtas(func, func->lock);
>   else
>   ret = __do_sys_rtas(func);
>
>   // [ ... copy out results ... ]
>
>   return ret;
> }

You could go even further and switch on the token, and handle each case
separately so that you can then statically take the appropriate lock.
But that's probably overkill.

cheers


Re: linux-next: build failure after merge of the mm tree

2023-11-30 Thread Andrew Morton
On Fri, 01 Dec 2023 09:39:20 +1100 Michael Ellerman  wrote:

> > I am still carrying this patch (it should probably go into the mm
> > tree).  Is someone going to pick it up (assuming it is correct)?
> 
> I applied it to my next a few days ago, but I must have forgotten to
> push. It's in there now.

I'll keep a copy in mm.git, to keep the dependencies nice.  I added
your acked-by.


Re: [PATCH v4 2/4] PCI: layerscape: Add suspend/resume for ls1021a

2023-11-30 Thread Manivannan Sadhasivam
On Wed, Nov 29, 2023 at 04:44:10PM -0500, Frank Li wrote:
> ls1021a add suspend/resume support.
> 

"Add suspend/resume support for Layerscape LS1021a"

> In the suspend path, PME_Turn_Off message is sent to the endpoint to
> transition the link to L2/L3_Ready state. In this SoC, there is no way to
> check if the controller has received the PME_To_Ack from the endpoint or
> not. So to be on the safer side, the driver just waits for
> PCIE_PME_TO_L2_TIMEOUT_US before asserting the SoC specific PMXMTTURNOFF
> bit to complete the PME_Turn_Off handshake. This link would then enter

"Then the link would enter L2/L3 state depending on the VAUX supply."

> L2/L3 state depending on the VAUX supply.
> 
> In the resume path, the link is brought back from L2 to L0 by doing a
> software reset.
> 
> Signed-off-by: Frank Li 

Couple of comments below. But the patch is looking good now. Thanks for the
update.

> ---
> 
> Notes:
> Change from v3 to v4
> - update commit message.
> - it is reset a glue logic part for PCI controller.
> - use regmap_write_bits() to reduce code change.
> 
> Change from v2 to v3
> - update according to mani's feedback
> change from v1 to v2
> - change subject 'a' to 'A'
> 
>  drivers/pci/controller/dwc/pci-layerscape.c | 83 -
>  1 file changed, 82 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/dwc/pci-layerscape.c 
> b/drivers/pci/controller/dwc/pci-layerscape.c
> index aea89926bcc4f..42bca2c3b5c3e 100644
> --- a/drivers/pci/controller/dwc/pci-layerscape.c
> +++ b/drivers/pci/controller/dwc/pci-layerscape.c
> @@ -35,11 +35,19 @@
>  #define PF_MCR_PTOMR BIT(0)
>  #define PF_MCR_EXL2S BIT(1)
>  
> +/* LS1021A PEXn PM Write Control Register */
> +#define SCFG_PEXPMWRCR(idx)  (0x5c + (idx) * 0x64)
> +#define PMXMTTURNOFF BIT(31)
> +#define SCFG_PEXSFTRSTCR 0x190
> +#define PEXSR(idx)   BIT(idx)
> +
>  #define PCIE_IATU_NUM6
>  
>  struct ls_pcie_drvdata {
>   const u32 pf_off;
> + const struct dw_pcie_host_ops *ops;
>   int (*exit_from_l2)(struct dw_pcie_rp *pp);
> + bool scfg_support;
>   bool pm_support;
>  };
>  
> @@ -47,6 +55,8 @@ struct ls_pcie {
>   struct dw_pcie *pci;
>   const struct ls_pcie_drvdata *drvdata;
>   void __iomem *pf_base;
> + struct regmap *scfg;
> + int index;
>   bool big_endian;
>  };
>  
> @@ -171,13 +181,65 @@ static int ls_pcie_host_init(struct dw_pcie_rp *pp)
>   return 0;
>  }
>  
> +static void scfg_pcie_send_turnoff_msg(struct regmap *scfg, u32 reg, u32 
> mask)
> +{
> + /* Send PME_Turn_Off message */
> + regmap_write_bits(scfg, reg, mask, mask);
> +
> + /*
> +  * There is no specific register to check for PME_To_Ack from endpoint.
> +  * So on the safe side, wait for PCIE_PME_TO_L2_TIMEOUT_US.
> +  */
> + mdelay(PCIE_PME_TO_L2_TIMEOUT_US/1000);
> +
> + /*
> +  * Layerscape hardware reference manual recommends clearing the 
> PMXMTTURNOFF bit
> +  * to complete the PME_Turn_Off handshake.
> +  */
> + regmap_write_bits(scfg, reg, mask, 0);
> +}
> +
> +static void ls1021a_pcie_send_turnoff_msg(struct dw_pcie_rp *pp)
> +{
> + struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> + struct ls_pcie *pcie = to_ls_pcie(pci);
> +
> + scfg_pcie_send_turnoff_msg(pcie->scfg, SCFG_PEXPMWRCR(pcie->index), 
> PMXMTTURNOFF);
> +}
> +
> +static int scfg_pcie_exit_from_l2(struct regmap *scfg, u32 reg, u32 mask)
> +{
> + /* Only way exit from l2 is that do software reset */

"Only way exit from L2 is by doing a software reset of the controller"

I really hope that the reset is not a global controller reset i.e., not
resetting all CSRs.

> + regmap_write_bits(scfg, reg, mask, mask);
> +

No need of a newline.

> + regmap_write_bits(scfg, reg, mask, 0);
> +
> + return 0;
> +}
> +
> +static int ls1021a_pcie_exit_from_l2(struct dw_pcie_rp *pp)
> +{
> + struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> + struct ls_pcie *pcie = to_ls_pcie(pci);
> +
> + return scfg_pcie_exit_from_l2(pcie->scfg, SCFG_PEXSFTRSTCR, 
> PEXSR(pcie->index));
> +}
> +
>  static const struct dw_pcie_host_ops ls_pcie_host_ops = {
>   .host_init = ls_pcie_host_init,
>   .pme_turn_off = ls_pcie_send_turnoff_msg,
>  };
>  
> +static const struct dw_pcie_host_ops ls1021a_pcie_host_ops = {
> + .host_init = ls_pcie_host_init,
> + .pme_turn_off = ls1021a_pcie_send_turnoff_msg,
> +};
> +
>  static const struct ls_pcie_drvdata ls1021a_drvdata = {
> - .pm_support = false,
> + .pm_support = true,
> + .scfg_support = true,
> + .ops = _pcie_host_ops,
> + .exit_from_l2 = ls1021a_pcie_exit_from_l2,
>  };
>  
>  static const struct ls_pcie_drvdata layerscape_drvdata = {
> @@ -205,6 +267,8 @@ static int ls_pcie_probe(struct platform_device *pdev)
>   struct dw_pcie *pci;
>   struct ls_pcie *pcie;
>   struct resource *dbi_base;
> 

Re: [PATCH v2] powerpc: Don't clobber fr0/vs0 during fp|altivec register save

2023-11-30 Thread Timothy Pearson
- Original Message -
> From: "Michael Ellerman" 
> To: "Timothy Pearson" 
> Cc: "Jens Axboe" , "regressions" 
> , "npiggin" ,
> "christophe leroy" , "linuxppc-dev" 
> 
> Sent: Tuesday, November 28, 2023 6:57:01 AM
> Subject: Re: [PATCH v2] powerpc: Don't clobber fr0/vs0 during fp|altivec 
> register  save

> Michael Ellerman  writes:
>> Timothy Pearson  writes:
>>
>>> Just wanted to check back and see if this patch was going to be queued
>>> up soon?  We're still having to work around / advertise the data
>>> destruction issues the underlying bug is causing on e.g. Debian
>>> Stable.
>>
>> Yeah I'll apply it this week, so it will be in rc4.
> 
> I reworked the change log to include the exact call path I identified
> instead of the more high level description you had. And tweaked a few
> other bits of wording and so on, apparently fr0 is a kernelism, the ABI
> and binutils calls it f0.
> 
> I'm not sure how wedded you were to your change log, so if you dislike
> my edits let me know and we can come up with a joint one.
> 
> The actual patch is unchanged.
> 
> cheers

The commit message looks OK to me.  I've also seen application crashes as a 
result of the register corruption, but that may be a minor detail that isn't 
really worth updating things over at this point -- those come from e.g. glibc 
using vs0 as part of a path that processes pointer information, typically seen 
where there's a need to replicate the same pointer to adjacent fields in a data 
struct.


Re: [PATCH v4 3/4] PCI: layerscape: Rename pf_* as pf_lut_*

2023-11-30 Thread Manivannan Sadhasivam
On Wed, Nov 29, 2023 at 04:44:11PM -0500, Frank Li wrote:
> 'pf' and 'lut' is just difference name in difference chips, but basic it is
> a MMIO base address plus an offset.
> 
> Rename it to avoid duplicate pf_* and lut_* in driver.
> 
> Signed-off-by: Frank Li 

Reviewed-by: Manivannan Sadhasivam 

Can you fix the name in pci-layerscape-ep.c also?

- Mani

> ---
> 
> Notes:
> pf_lut is better than pf_* or lut* because some chip use 'pf', some chip
> use 'lut'.
> 
> change from v1 to v4
> - new patch at v3
> 
>  drivers/pci/controller/dwc/pci-layerscape.c | 34 ++---
>  1 file changed, 17 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/pci-layerscape.c 
> b/drivers/pci/controller/dwc/pci-layerscape.c
> index 42bca2c3b5c3e..590e07bb27002 100644
> --- a/drivers/pci/controller/dwc/pci-layerscape.c
> +++ b/drivers/pci/controller/dwc/pci-layerscape.c
> @@ -44,7 +44,7 @@
>  #define PCIE_IATU_NUM6
>  
>  struct ls_pcie_drvdata {
> - const u32 pf_off;
> + const u32 pf_lut_off;
>   const struct dw_pcie_host_ops *ops;
>   int (*exit_from_l2)(struct dw_pcie_rp *pp);
>   bool scfg_support;
> @@ -54,13 +54,13 @@ struct ls_pcie_drvdata {
>  struct ls_pcie {
>   struct dw_pcie *pci;
>   const struct ls_pcie_drvdata *drvdata;
> - void __iomem *pf_base;
> + void __iomem *pf_lut_base;
>   struct regmap *scfg;
>   int index;
>   bool big_endian;
>  };
>  
> -#define ls_pcie_pf_readl_addr(addr)  ls_pcie_pf_readl(pcie, addr)
> +#define ls_pcie_pf_lut_readl_addr(addr)  ls_pcie_pf_lut_readl(pcie, addr)
>  #define to_ls_pcie(x)dev_get_drvdata((x)->dev)
>  
>  static bool ls_pcie_is_bridge(struct ls_pcie *pcie)
> @@ -101,20 +101,20 @@ static void ls_pcie_fix_error_response(struct ls_pcie 
> *pcie)
>   iowrite32(PCIE_ABSERR_SETTING, pci->dbi_base + PCIE_ABSERR);
>  }
>  
> -static u32 ls_pcie_pf_readl(struct ls_pcie *pcie, u32 off)
> +static u32 ls_pcie_pf_lut_readl(struct ls_pcie *pcie, u32 off)
>  {
>   if (pcie->big_endian)
> - return ioread32be(pcie->pf_base + off);
> + return ioread32be(pcie->pf_lut_base + off);
>  
> - return ioread32(pcie->pf_base + off);
> + return ioread32(pcie->pf_lut_base + off);
>  }
>  
> -static void ls_pcie_pf_writel(struct ls_pcie *pcie, u32 off, u32 val)
> +static void ls_pcie_pf_lut_writel(struct ls_pcie *pcie, u32 off, u32 val)
>  {
>   if (pcie->big_endian)
> - iowrite32be(val, pcie->pf_base + off);
> + iowrite32be(val, pcie->pf_lut_base + off);
>   else
> - iowrite32(val, pcie->pf_base + off);
> + iowrite32(val, pcie->pf_lut_base + off);
>  }
>  
>  static void ls_pcie_send_turnoff_msg(struct dw_pcie_rp *pp)
> @@ -124,11 +124,11 @@ static void ls_pcie_send_turnoff_msg(struct dw_pcie_rp 
> *pp)
>   u32 val;
>   int ret;
>  
> - val = ls_pcie_pf_readl(pcie, LS_PCIE_PF_MCR);
> + val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_PF_MCR);
>   val |= PF_MCR_PTOMR;
> - ls_pcie_pf_writel(pcie, LS_PCIE_PF_MCR, val);
> + ls_pcie_pf_lut_writel(pcie, LS_PCIE_PF_MCR, val);
>  
> - ret = readx_poll_timeout(ls_pcie_pf_readl_addr, LS_PCIE_PF_MCR,
> + ret = readx_poll_timeout(ls_pcie_pf_lut_readl_addr, LS_PCIE_PF_MCR,
>val, !(val & PF_MCR_PTOMR),
>PCIE_PME_TO_L2_TIMEOUT_US/10,
>PCIE_PME_TO_L2_TIMEOUT_US);
> @@ -147,15 +147,15 @@ static int ls_pcie_exit_from_l2(struct dw_pcie_rp *pp)
>* Set PF_MCR_EXL2S bit in LS_PCIE_PF_MCR register for the link
>* to exit L2 state.
>*/
> - val = ls_pcie_pf_readl(pcie, LS_PCIE_PF_MCR);
> + val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_PF_MCR);
>   val |= PF_MCR_EXL2S;
> - ls_pcie_pf_writel(pcie, LS_PCIE_PF_MCR, val);
> + ls_pcie_pf_lut_writel(pcie, LS_PCIE_PF_MCR, val);
>  
>   /*
>* L2 exit timeout of 10ms is not defined in the specifications,
>* it was chosen based on empirical observations.
>*/
> - ret = readx_poll_timeout(ls_pcie_pf_readl_addr, LS_PCIE_PF_MCR,
> + ret = readx_poll_timeout(ls_pcie_pf_lut_readl_addr, LS_PCIE_PF_MCR,
>val, !(val & PF_MCR_EXL2S),
>1000,
>1);
> @@ -243,7 +243,7 @@ static const struct ls_pcie_drvdata ls1021a_drvdata = {
>  };
>  
>  static const struct ls_pcie_drvdata layerscape_drvdata = {
> - .pf_off = 0xc,
> + .pf_lut_off = 0xc,
>   .pm_support = true,
>   .exit_from_l2 = ls_pcie_exit_from_l2,
>  };
> @@ -293,7 +293,7 @@ static int ls_pcie_probe(struct platform_device *pdev)
>  
>   pcie->big_endian = of_property_read_bool(dev->of_node, "big-endian");
>  
> - pcie->pf_base = pci->dbi_base + pcie->drvdata->pf_off;
> + pcie->pf_lut_base = pci->dbi_base + pcie->drvdata->pf_lut_off;
>  

Re: [PATCH v4 4/4] PCI: layerscape: Add suspend/resume for ls1043a

2023-11-30 Thread Manivannan Sadhasivam
On Wed, Nov 29, 2023 at 04:44:12PM -0500, Frank Li wrote:
> In the suspend path, PME_Turn_Off message is sent to the endpoint to
> transition the link to L2/L3_Ready state. In this SoC, there is no way to
> check if the controller has received the PME_To_Ack from the endpoint or
> not. So to be on the safer side, the driver just waits for
> PCIE_PME_TO_L2_TIMEOUT_US before asserting the SoC specific PMXMTTURNOFF
> bit to complete the PME_Turn_Off handshake. This link would then enter
> L2/L3 state depending on the VAUX supply.
> 
> In the resume path, the link is brought back from L2 to L0 by doing a
> software reset.
> 

Same comment on the patch description as on patch 2/4.

> Signed-off-by: Frank Li 
> ---
> 
> Notes:
> Change from v3 to v4
> - Call scfg_pcie_send_turnoff_msg() shared with ls1021a
> - update commit message
> 
> Change from v2 to v3
> - Remove ls_pcie_lut_readl(writel) function
> 
> Change from v1 to v2
> - Update subject 'a' to 'A'
> 
>  drivers/pci/controller/dwc/pci-layerscape.c | 63 -
>  1 file changed, 62 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/dwc/pci-layerscape.c 
> b/drivers/pci/controller/dwc/pci-layerscape.c
> index 590e07bb27002..d39700b3afaaa 100644
> --- a/drivers/pci/controller/dwc/pci-layerscape.c
> +++ b/drivers/pci/controller/dwc/pci-layerscape.c
> @@ -41,6 +41,15 @@
>  #define SCFG_PEXSFTRSTCR 0x190
>  #define PEXSR(idx)   BIT(idx)
>  
> +/* LS1043A PEX PME control register */
> +#define SCFG_PEXPMECR0x144
> +#define PEXPME(idx)  BIT(31 - (idx) * 4)
> +
> +/* LS1043A PEX LUT debug register */
> +#define LS_PCIE_LDBG 0x7fc
> +#define LDBG_SR  BIT(30)
> +#define LDBG_WE  BIT(31)
> +
>  #define PCIE_IATU_NUM6
>  
>  struct ls_pcie_drvdata {
> @@ -225,6 +234,45 @@ static int ls1021a_pcie_exit_from_l2(struct dw_pcie_rp 
> *pp)
>   return scfg_pcie_exit_from_l2(pcie->scfg, SCFG_PEXSFTRSTCR, 
> PEXSR(pcie->index));
>  }
>  
> +static void ls1043a_pcie_send_turnoff_msg(struct dw_pcie_rp *pp)
> +{
> + struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> + struct ls_pcie *pcie = to_ls_pcie(pci);
> +
> + scfg_pcie_send_turnoff_msg(pcie->scfg, SCFG_PEXPMECR, 
> PEXPME(pcie->index));
> +}
> +
> +static int ls1043a_pcie_exit_from_l2(struct dw_pcie_rp *pp)
> +{
> + struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> + struct ls_pcie *pcie = to_ls_pcie(pci);
> + u32 val;
> +
> + /*
> +  * Only way let PEX module exit L2 is do a software reset.

Can you expand PEX? What is it used for?

Also if the reset is only for the PEX module, please use the same comment in
both patches 2 and 4. Patch 2 doesn't mention PEX in the comment.

- Mani

> +  * LDBG_WE: allows the user to have write access to the PEXDBG[SR] for 
> both setting and
> +  *  clearing the soft reset on the PEX module.
> +  * LDBG_SR: When SR is set to 1, the PEX module enters soft reset.
> +  */
> + val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
> + val |= LDBG_WE;
> + ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
> +
> + val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
> + val |= LDBG_SR;
> + ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
> +
> + val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
> + val &= ~LDBG_SR;
> + ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
> +
> + val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
> + val &= ~LDBG_WE;
> + ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
> +
> + return 0;
> +}
> +
>  static const struct dw_pcie_host_ops ls_pcie_host_ops = {
>   .host_init = ls_pcie_host_init,
>   .pme_turn_off = ls_pcie_send_turnoff_msg,
> @@ -242,6 +290,19 @@ static const struct ls_pcie_drvdata ls1021a_drvdata = {
>   .exit_from_l2 = ls1021a_pcie_exit_from_l2,
>  };
>  
> +static const struct dw_pcie_host_ops ls1043a_pcie_host_ops = {
> + .host_init = ls_pcie_host_init,
> + .pme_turn_off = ls1043a_pcie_send_turnoff_msg,
> +};
> +
> +static const struct ls_pcie_drvdata ls1043a_drvdata = {
> + .pf_lut_off = 0x1,
> + .pm_support = true,
> + .scfg_support = true,
> + .ops = _pcie_host_ops,
> + .exit_from_l2 = ls1043a_pcie_exit_from_l2,
> +};
> +
>  static const struct ls_pcie_drvdata layerscape_drvdata = {
>   .pf_lut_off = 0xc,
>   .pm_support = true,
> @@ -252,7 +313,7 @@ static const struct of_device_id ls_pcie_of_match[] = {
>   { .compatible = "fsl,ls1012a-pcie", .data = _drvdata },
>   { .compatible = "fsl,ls1021a-pcie", .data = _drvdata },
>   { .compatible = "fsl,ls1028a-pcie", .data = _drvdata },
> - { .compatible = "fsl,ls1043a-pcie", .data = _drvdata },
> + { .compatible = "fsl,ls1043a-pcie", .data = _drvdata },
>   { .compatible = "fsl,ls1046a-pcie", .data = _drvdata },
>   { .compatible = "fsl,ls2080a-pcie", .data = _drvdata },
>   { 

Re: [PATCH] powerpc/irq: Allow softirq to hardirq stack transition

2023-11-30 Thread Christophe Leroy


Le 30/11/2023 à 13:50, Michael Ellerman a écrit :
> Allow a transition from the softirq stack to the hardirq stack when
> handling a hardirq. Doing so means a hardirq received while deep in
> softirq processing is less likely to cause a stack overflow of the
> softirq stack.
> 
> Previously it wasn't safe to do so because irq_exit() (which initiates
> softirq processing) was called on the hardirq stack.
> 
> That was changed in commit 1b1b6a6f4cc0 ("powerpc: handle irq_enter/
> irq_exit in interrupt handler wrappers") and 1346d00e1bdf ("powerpc:
> Don't select HAVE_IRQ_EXIT_ON_IRQ_STACK").
> 
> The allowed transitions are now:
>   - process stack -> hardirq stack
>   - process stack -> softirq stack
>   - process stack -> softirq stack -> hardirq stack

It means you don't like my patch 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/6cd9d8bb2258d8b51999c2584eac74423d2b5e29.1657203774.git.christophe.le...@csgroup.eu/
 
?

I never got any feedback.


> 
> Signed-off-by: Michael Ellerman 
> ---
>   arch/powerpc/kernel/irq.c | 5 ++---
>   1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
> index 6f7d4edaa0bc..7504ceec5c58 100644
> --- a/arch/powerpc/kernel/irq.c
> +++ b/arch/powerpc/kernel/irq.c
> @@ -284,15 +284,14 @@ static __always_inline void call_do_irq(struct pt_regs 
> *regs, void *sp)
>   void __do_IRQ(struct pt_regs *regs)
>   {
>   struct pt_regs *old_regs = set_irq_regs(regs);
> - void *cursp, *irqsp, *sirqsp;
> + void *cursp, *irqsp;
>   
>   /* Switch to the irq stack to handle this */
>   cursp = (void *)(current_stack_pointer & ~(THREAD_SIZE - 1));
>   irqsp = hardirq_ctx[raw_smp_processor_id()];
> - sirqsp = softirq_ctx[raw_smp_processor_id()];
>   
>   /* Already there ? If not switch stack and call */
> - if (unlikely(cursp == irqsp || cursp == sirqsp))
> + if (unlikely(cursp == irqsp))
>   __do_irq(regs, current_stack_pointer);
>   else
>   call_do_irq(regs, irqsp);


Re: [PATCH 1/2] kexec: fix KEXEC_FILE dependencies

2023-11-30 Thread Andrew Morton
On Thu, 2 Nov 2023 16:03:18 +0800 Baoquan He  wrote:

> > > CONFIG_KEXEC_FILE, but still get purgatory code built in which is
> > > totally useless.
> > >
> > > Not sure if I think too much over this.
> > 
> > I see your point here, and I would suggest changing the
> > CONFIG_ARCH_SUPPORTS_KEXEC_PURGATORY symbol to just indicate
> > the availability of the purgatory code for the arch, rather
> > than actually controlling the code itself. I already mentioned
> > this for s390, but riscv would need the same thing on top.
> > 
> > I think the change below should address your concern.
> 
> Since no new comment, do you mind spinning v2 to wrap all these up?

This patchset remains in mm-hotfixes-unstable from the previous -rc
cycle.  Eric, do you have any comments?  Arnd, do you plan on a v2?  If
not, should I merge v1?  If so, should I now add cc:stable?


Re: [PATCH 2/3] powerpc/pseries/memhp: Remove unbalanced dlpar_release_drc() call

2023-11-30 Thread Scott Cheloha
> On Nov 28, 2023, at 9:21 AM, Nathan Lynch  wrote:
> 
> Nick Child  writes:
>> Hi Nathan,
>> Patches 1 and 3 LGTM
> 
> thanks.
> 
>> Regarding this patch, dlpar_memory_remove_by_count() calls 
>> dlpar_add_lmb() and does not free drc on add error.
>> dlpar_add_lmb() is called here in error recovery so probably
>> not a big deal.
>> 
>> This is all new code to me but it looks like if the requested
>> number of lmbs cannot be removed then it attempts to add back
>> the ones that were successfully removed. So if you cannot add
>> an lmb that WAS successfully removed, it seems sane to also
>> release the drc.
> 
> Maybe I'll drop this one for now and turn my attention to removing all
> the high-level rollback logic in this code. There's no reliable way to
> accomplish what it's trying to do (i.e. restore the original quantity of
> LMBs) and it just complicates things.

Removing all of the rollback code is a wonderful idea.



Re: [PATCH v4 4/4] PCI: layerscape: Add suspend/resume for ls1043a

2023-11-30 Thread Frank Li
On Thu, Nov 30, 2023 at 10:21:00PM +0530, Manivannan Sadhasivam wrote:
> On Wed, Nov 29, 2023 at 04:44:12PM -0500, Frank Li wrote:
> > In the suspend path, PME_Turn_Off message is sent to the endpoint to
> > transition the link to L2/L3_Ready state. In this SoC, there is no way to
> > check if the controller has received the PME_To_Ack from the endpoint or
> > not. So to be on the safer side, the driver just waits for
> > PCIE_PME_TO_L2_TIMEOUT_US before asserting the SoC specific PMXMTTURNOFF
> > bit to complete the PME_Turn_Off handshake. This link would then enter
> > L2/L3 state depending on the VAUX supply.
> > 
> > In the resume path, the link is brought back from L2 to L0 by doing a
> > software reset.
> > 
> 
> Same comment on the patch description as on patch 2/4.
> 
> > Signed-off-by: Frank Li 
> > ---
> > 
> > Notes:
> > Change from v3 to v4
> > - Call scfg_pcie_send_turnoff_msg() shared with ls1021a
> > - update commit message
> > 
> > Change from v2 to v3
> > - Remove ls_pcie_lut_readl(writel) function
> > 
> > Change from v1 to v2
> > - Update subject 'a' to 'A'
> > 
> >  drivers/pci/controller/dwc/pci-layerscape.c | 63 -
> >  1 file changed, 62 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pci/controller/dwc/pci-layerscape.c 
> > b/drivers/pci/controller/dwc/pci-layerscape.c
> > index 590e07bb27002..d39700b3afaaa 100644
> > --- a/drivers/pci/controller/dwc/pci-layerscape.c
> > +++ b/drivers/pci/controller/dwc/pci-layerscape.c
> > @@ -41,6 +41,15 @@
> >  #define SCFG_PEXSFTRSTCR   0x190
> >  #define PEXSR(idx) BIT(idx)
> >  
> > +/* LS1043A PEX PME control register */
> > +#define SCFG_PEXPMECR  0x144
> > +#define PEXPME(idx)BIT(31 - (idx) * 4)
> > +
> > +/* LS1043A PEX LUT debug register */
> > +#define LS_PCIE_LDBG   0x7fc
> > +#define LDBG_SRBIT(30)
> > +#define LDBG_WEBIT(31)
> > +
> >  #define PCIE_IATU_NUM  6
> >  
> >  struct ls_pcie_drvdata {
> > @@ -225,6 +234,45 @@ static int ls1021a_pcie_exit_from_l2(struct dw_pcie_rp 
> > *pp)
> > return scfg_pcie_exit_from_l2(pcie->scfg, SCFG_PEXSFTRSTCR, 
> > PEXSR(pcie->index));
> >  }
> >  
> > +static void ls1043a_pcie_send_turnoff_msg(struct dw_pcie_rp *pp)
> > +{
> > +   struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > +   struct ls_pcie *pcie = to_ls_pcie(pci);
> > +
> > +   scfg_pcie_send_turnoff_msg(pcie->scfg, SCFG_PEXPMECR, 
> > PEXPME(pcie->index));
> > +}
> > +
> > +static int ls1043a_pcie_exit_from_l2(struct dw_pcie_rp *pp)
> > +{
> > +   struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > +   struct ls_pcie *pcie = to_ls_pcie(pci);
> > +   u32 val;
> > +
> > +   /*
> > +* Only way let PEX module exit L2 is do a software reset.
> 
> Can you expand PEX? What is it used for?
> 
> Also if the reset is only for the PEX module, please use the same comment in
> both patches 2 and 4. Patch 2 doesn't mention PEX in the comment.

After read spec again, I think PEX is pci express. So it should software
reset controller. I don't know what exactly did in the chip. But without
below code, PCIe can't exit L2/L3.

Any harmful if dwc controller reset? Anyway these code works well with
intel network card.

Frank

> 
> - Mani
> 
> > +* LDBG_WE: allows the user to have write access to the PEXDBG[SR] for 
> > both setting and
> > +*  clearing the soft reset on the PEX module.
> > +* LDBG_SR: When SR is set to 1, the PEX module enters soft reset.
> > +*/
> > +   val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
> > +   val |= LDBG_WE;
> > +   ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
> > +
> > +   val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
> > +   val |= LDBG_SR;
> > +   ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
> > +
> > +   val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
> > +   val &= ~LDBG_SR;
> > +   ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
> > +
> > +   val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
> > +   val &= ~LDBG_WE;
> > +   ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
> > +
> > +   return 0;
> > +}
> > +
> >  static const struct dw_pcie_host_ops ls_pcie_host_ops = {
> > .host_init = ls_pcie_host_init,
> > .pme_turn_off = ls_pcie_send_turnoff_msg,
> > @@ -242,6 +290,19 @@ static const struct ls_pcie_drvdata ls1021a_drvdata = {
> > .exit_from_l2 = ls1021a_pcie_exit_from_l2,
> >  };
> >  
> > +static const struct dw_pcie_host_ops ls1043a_pcie_host_ops = {
> > +   .host_init = ls_pcie_host_init,
> > +   .pme_turn_off = ls1043a_pcie_send_turnoff_msg,
> > +};
> > +
> > +static const struct ls_pcie_drvdata ls1043a_drvdata = {
> > +   .pf_lut_off = 0x1,
> > +   .pm_support = true,
> > +   .scfg_support = true,
> > +   .ops = _pcie_host_ops,
> > +   .exit_from_l2 = ls1043a_pcie_exit_from_l2,
> > +};
> > +
> >  static const struct ls_pcie_drvdata layerscape_drvdata = {
> > .pf_lut_off = 0xc,
> > .pm_support = true,
> > @@ 

Re: [PATCH v4 4/4] PCI: layerscape: Add suspend/resume for ls1043a

2023-11-30 Thread Frank Li
On Thu, Nov 30, 2023 at 03:17:39PM -0500, Frank Li wrote:
> On Thu, Nov 30, 2023 at 10:21:00PM +0530, Manivannan Sadhasivam wrote:
> > On Wed, Nov 29, 2023 at 04:44:12PM -0500, Frank Li wrote:
> > > In the suspend path, PME_Turn_Off message is sent to the endpoint to
> > > transition the link to L2/L3_Ready state. In this SoC, there is no way to
> > > check if the controller has received the PME_To_Ack from the endpoint or
> > > not. So to be on the safer side, the driver just waits for
> > > PCIE_PME_TO_L2_TIMEOUT_US before asserting the SoC specific PMXMTTURNOFF
> > > bit to complete the PME_Turn_Off handshake. This link would then enter
> > > L2/L3 state depending on the VAUX supply.
> > > 
> > > In the resume path, the link is brought back from L2 to L0 by doing a
> > > software reset.
> > > 
> > 
> > Same comment on the patch description as on patch 2/4.
> > 
> > > Signed-off-by: Frank Li 
> > > ---
> > > 
> > > Notes:
> > > Change from v3 to v4
> > > - Call scfg_pcie_send_turnoff_msg() shared with ls1021a
> > > - update commit message
> > > 
> > > Change from v2 to v3
> > > - Remove ls_pcie_lut_readl(writel) function
> > > 
> > > Change from v1 to v2
> > > - Update subject 'a' to 'A'
> > > 
> > >  drivers/pci/controller/dwc/pci-layerscape.c | 63 -
> > >  1 file changed, 62 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/pci/controller/dwc/pci-layerscape.c 
> > > b/drivers/pci/controller/dwc/pci-layerscape.c
> > > index 590e07bb27002..d39700b3afaaa 100644
> > > --- a/drivers/pci/controller/dwc/pci-layerscape.c
> > > +++ b/drivers/pci/controller/dwc/pci-layerscape.c
> > > @@ -41,6 +41,15 @@
> > >  #define SCFG_PEXSFTRSTCR 0x190
> > >  #define PEXSR(idx)   BIT(idx)
> > >  
> > > +/* LS1043A PEX PME control register */
> > > +#define SCFG_PEXPMECR0x144
> > > +#define PEXPME(idx)  BIT(31 - (idx) * 4)
> > > +
> > > +/* LS1043A PEX LUT debug register */
> > > +#define LS_PCIE_LDBG 0x7fc
> > > +#define LDBG_SR  BIT(30)
> > > +#define LDBG_WE  BIT(31)
> > > +
> > >  #define PCIE_IATU_NUM6
> > >  
> > >  struct ls_pcie_drvdata {
> > > @@ -225,6 +234,45 @@ static int ls1021a_pcie_exit_from_l2(struct 
> > > dw_pcie_rp *pp)
> > >   return scfg_pcie_exit_from_l2(pcie->scfg, SCFG_PEXSFTRSTCR, 
> > > PEXSR(pcie->index));
> > >  }
> > >  
> > > +static void ls1043a_pcie_send_turnoff_msg(struct dw_pcie_rp *pp)
> > > +{
> > > + struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > > + struct ls_pcie *pcie = to_ls_pcie(pci);
> > > +
> > > + scfg_pcie_send_turnoff_msg(pcie->scfg, SCFG_PEXPMECR, 
> > > PEXPME(pcie->index));
> > > +}
> > > +
> > > +static int ls1043a_pcie_exit_from_l2(struct dw_pcie_rp *pp)
> > > +{
> > > + struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > > + struct ls_pcie *pcie = to_ls_pcie(pci);
> > > + u32 val;
> > > +
> > > + /*
> > > +  * Only way let PEX module exit L2 is do a software reset.
> > 
> > Can you expand PEX? What is it used for?
> > 
> > Also if the reset is only for the PEX module, please use the same comment in
> > both patches 2 and 4. Patch 2 doesn't mention PEX in the comment.
> 
> After read spec again, I think PEX is pci express. So it should software
> reset controller. I don't know what exactly did in the chip. But without
> below code, PCIe can't exit L2/L3.
> 
> Any harmful if dwc controller reset? Anyway these code works well with
> intel network card.

Sorry, sent too quick. It is PCIe express wrapper

Copy from spec: 

"PEXLDBG[SR]. Once set the
PEXLDBG[SR] will enable the soft reset to the PEX wrapper."

Frank

> 
> Frank
> 
> > 
> > - Mani
> > 
> > > +  * LDBG_WE: allows the user to have write access to the PEXDBG[SR] for 
> > > both setting and
> > > +  *  clearing the soft reset on the PEX module.
> > > +  * LDBG_SR: When SR is set to 1, the PEX module enters soft reset.
> > > +  */
> > > + val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
> > > + val |= LDBG_WE;
> > > + ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
> > > +
> > > + val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
> > > + val |= LDBG_SR;
> > > + ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
> > > +
> > > + val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
> > > + val &= ~LDBG_SR;
> > > + ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
> > > +
> > > + val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
> > > + val &= ~LDBG_WE;
> > > + ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
> > > +
> > > + return 0;
> > > +}
> > > +
> > >  static const struct dw_pcie_host_ops ls_pcie_host_ops = {
> > >   .host_init = ls_pcie_host_init,
> > >   .pme_turn_off = ls_pcie_send_turnoff_msg,
> > > @@ -242,6 +290,19 @@ static const struct ls_pcie_drvdata ls1021a_drvdata 
> > > = {
> > >   .exit_from_l2 = ls1021a_pcie_exit_from_l2,
> > >  };
> > >  
> > > +static const struct dw_pcie_host_ops ls1043a_pcie_host_ops = {
> > > + .host_init = ls_pcie_host_init,
> > > + .pme_turn_off = 

Re: [PATCH v4 05/13] powerpc/rtas: Facilitate high-level call sequences

2023-11-30 Thread Nathan Lynch
Michael Ellerman  writes:

> Nathan Lynch  writes:
>> Michael Ellerman  writes:
>>
>>> Nathan Lynch via B4 Relay 
>>> writes:
 From: Nathan Lynch 

 On RTAS platforms there is a general restriction that the OS must not
 enter RTAS on more than one CPU at a time. This low-level
 serialization requirement is satisfied by holding a spin
 lock (rtas_lock) across most RTAS function invocations.
>>> ...
 diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
 index 1fc0b3fffdd1..52f2242d0c28 100644
 --- a/arch/powerpc/kernel/rtas.c
 +++ b/arch/powerpc/kernel/rtas.c
 @@ -581,6 +652,28 @@ static const struct rtas_function 
 *rtas_token_to_function(s32 token)
return NULL;
  }
  
 +static void __rtas_function_lock(struct rtas_function *func)
 +{
 +  if (func && func->lock)
 +  mutex_lock(func->lock);
 +}
>>>
>>> This is obviously going to defeat most static analysis tools.
>>
>> I guess it's not that obvious to me :-) Is it because the mutex_lock()
>> is conditional? I'll improve this if it's possible.
>
> Well maybe I'm not giving modern static analysis tools enough credit :)
>
> But what I mean that it's not easy to reason about what the function
> does in isolation. ie. all you can say is that it may or may not lock a
> mutex, and you can't say which mutex.

I've pulled the thread on this a little bit and here is what I can do:

* Discard rtas_lock_function() and rtas_unlock_function() and make the
  function mutexes extern as needed. As of now only
  rtas_ibm_get_vpd_lock will need to be exposed. This enables us to put
  __acquires(), __releases(), and __must_hold() annotations in
  papr-vpd.c since it explicitly manipulates the mutex.

* Then sys_rtas() becomes the only site that needs
  __rtas_function_lock() and __rtas_function_unlock(), which can be
  open-coded and commented (and, one hopes, not emulated elsewhere).

This will look something like:

SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
{
struct rtas_function *func = rtas_token_to_function(token);

if (func->lock)
mutex_lock(func->lock);

[ ... acquire rtas_lock, enter RTAS, fetch any errors ... ]

if (func->lock)
mutex_unlock(func->lock);

The indirection seems unavoidable since we're working backwards from a
token value (supplied by the user and not known at build time) to the
function descriptor.

Is that tolerable for now?

Alternatively, sys_rtas() could be refactored into locking and
non-locking paths, e.g.

static long __do_sys_rtas(struct rtas_function *func)
{
// [ ... acquire rtas_lock, enter RTAS, fetch any error etc ... ]
}

static long do_sys_rtas(struct rtas_function *func, struct mutex *mtx)
{
mutex_lock(mtx);
ret = __do_sys_rtas(func);
mutex_unlock(mtx);

return ret;
}

SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
{
// real code does copy_from_user etc
struct rtas_function *func = rtas_token_to_function(uargs->token);
long ret;

// [ ... input validation and filtering ... ]

if (func->lock)
ret = do_sys_rtas(func, func->lock);
else
ret = __do_sys_rtas(func);

// [ ... copy out results ... ]

return ret;
}


Re: [PATCH v4 05/13] powerpc/rtas: Facilitate high-level call sequences

2023-11-30 Thread Nathan Lynch
Michael Ellerman  writes:
> Nathan Lynch  writes:
>> Michael Ellerman  writes:
>>> Nathan Lynch  writes:
 Michael Ellerman  writes:
> Nathan Lynch via B4 Relay 
> writes:
>> From: Nathan Lynch 
>>
>> On RTAS platforms there is a general restriction that the OS must not
>> enter RTAS on more than one CPU at a time. This low-level
>> serialization requirement is satisfied by holding a spin
>> lock (rtas_lock) across most RTAS function invocations.
> ...
>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>> index 1fc0b3fffdd1..52f2242d0c28 100644
>> --- a/arch/powerpc/kernel/rtas.c
>> +++ b/arch/powerpc/kernel/rtas.c
>> @@ -581,6 +652,28 @@ static const struct rtas_function 
>> *rtas_token_to_function(s32 token)
>>  return NULL;
>>  }
>>  
>> +static void __rtas_function_lock(struct rtas_function *func)
>> +{
>> +if (func && func->lock)
>> +mutex_lock(func->lock);
>> +}
>
> This is obviously going to defeat most static analysis tools.

 I guess it's not that obvious to me :-) Is it because the mutex_lock()
 is conditional? I'll improve this if it's possible.
>>>
>>> Well maybe I'm not giving modern static analysis tools enough credit :)
>>>
>>> But what I mean that it's not easy to reason about what the function
>>> does in isolation. ie. all you can say is that it may or may not lock a
>>> mutex, and you can't say which mutex.
>>
>> I've pulled the thread on this a little bit and here is what I can do:
>>
>> * Discard rtas_lock_function() and rtas_unlock_function() and make the
>>   function mutexes extern as needed. As of now only
>>   rtas_ibm_get_vpd_lock will need to be exposed. This enables us to put
>>   __acquires(), __releases(), and __must_hold() annotations in
>>   papr-vpd.c since it explicitly manipulates the mutex.
>>
>> * Then sys_rtas() becomes the only site that needs
>>   __rtas_function_lock() and __rtas_function_unlock(), which can be
>>   open-coded and commented (and, one hopes, not emulated elsewhere).
>>
>> This will look something like:
>>
>> SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>> {
>> struct rtas_function *func = rtas_token_to_function(token);
>>
>> if (func->lock)
>> mutex_lock(func->lock);
>>
>> [ ... acquire rtas_lock, enter RTAS, fetch any errors ... ]
>>
>> if (func->lock)
>> mutex_unlock(func->lock);
>>
>> The indirection seems unavoidable since we're working backwards from a
>> token value (supplied by the user and not known at build time) to the
>> function descriptor.
>>
>> Is that tolerable for now?
>
> Yeah. Thanks for looking into it.
>
> I wasn't unhappy with the original version, but just slightly uneasy
> about the locking via pointer.
>
> But that new proposal sounds good, more code will have static lock
> annotations, and only sys_rtas() which is already weird, will have the
> dynamic stuff.

OK, I'll work that up then.