[PATCH] powerpc/configs/powernv: Add IGB=y

2023-04-19 Thread Michael Ellerman
Some powernv machines use IGB for networking, so build the driver in to
enable net booting on such machines.
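
Building it in (=y rather than =m) matters for netboot, since the NIC has
to come up before any module-loading userspace exists.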

Suggested-by: Nicholas Piggin 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/configs/powernv_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/configs/powernv_defconfig b/arch/powerpc/configs/powernv_defconfig
index c92652575064..92e3a8fea04a 100644
--- a/arch/powerpc/configs/powernv_defconfig
+++ b/arch/powerpc/configs/powernv_defconfig
@@ -170,6 +170,7 @@ CONFIG_S2IO=m
 CONFIG_E100=y
 CONFIG_E1000=y
 CONFIG_E1000E=y
+CONFIG_IGB=y
 CONFIG_IXGB=m
 CONFIG_IXGBE=m
 CONFIG_I40E=m
-- 
2.39.2



[PATCH 1/2] powerpc/configs/64s: Use EXT4 to mount EXT2 filesystems

2023-04-19 Thread Michael Ellerman
The ext4 code will mount ext2 filesystems; there is no need to build in both.
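
This works because ext4's Kconfig provides EXT4_USE_FOR_EXT2 (default y
once EXT2_FS is disabled), which registers the ext2 filesystem type with
the ext4 driver, so existing "mount -t ext2" users keep working unchanged.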

Suggested-by: Nicholas Piggin 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/configs/ppc64_defconfig | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig
index a17cb31105e3..2836190448d5 100644
--- a/arch/powerpc/configs/ppc64_defconfig
+++ b/arch/powerpc/configs/ppc64_defconfig
@@ -319,10 +319,6 @@ CONFIG_VIRTIO_BALLOON=m
 CONFIG_VHOST_NET=m
 CONFIG_RAS=y
 CONFIG_LIBNVDIMM=y
-CONFIG_EXT2_FS=y
-CONFIG_EXT2_FS_XATTR=y
-CONFIG_EXT2_FS_POSIX_ACL=y
-CONFIG_EXT2_FS_SECURITY=y
 CONFIG_EXT4_FS=y
 CONFIG_EXT4_FS_POSIX_ACL=y
 CONFIG_EXT4_FS_SECURITY=y
-- 
2.39.2



[PATCH 2/2] powerpc/configs/64s: Drop JFS Filesystem

2023-04-19 Thread Michael Ellerman
It is unlikely that anyone is still regularly using JFS, so drop it from
the defconfig.

Suggested-by: Nicholas Piggin 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/configs/ppc64_defconfig | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig
index 2836190448d5..7e8bc53f4e64 100644
--- a/arch/powerpc/configs/ppc64_defconfig
+++ b/arch/powerpc/configs/ppc64_defconfig
@@ -322,9 +322,6 @@ CONFIG_LIBNVDIMM=y
 CONFIG_EXT4_FS=y
 CONFIG_EXT4_FS_POSIX_ACL=y
 CONFIG_EXT4_FS_SECURITY=y
-CONFIG_JFS_FS=m
-CONFIG_JFS_POSIX_ACL=y
-CONFIG_JFS_SECURITY=y
 CONFIG_XFS_FS=y
 CONFIG_XFS_POSIX_ACL=y
 CONFIG_BTRFS_FS=m
-- 
2.39.2



Re: [PATCH 10/11] iommu: Split iommu_group_add_device()

2023-04-19 Thread Baolu Lu

On 4/20/23 12:11 AM, Jason Gunthorpe wrote:

@@ -451,16 +454,17 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
goto out_unlock;
  
  	group = dev->iommu_group;

-   ret = iommu_group_add_device(group, dev);
+   gdev = iommu_group_alloc_device(group, dev);
mutex_lock(&group->mutex);
-   if (ret)
+   if (IS_ERR(gdev)) {
+   ret = PTR_ERR(gdev);
goto err_put_group;
+   }
  
+	list_add_tail(&gdev->list, &group->devices);


Do we need to put

dev->iommu_group = group;

here?


if (group_list && !group->default_domain && list_empty(&group->entry))
list_add_tail(&group->entry, group_list);
mutex_unlock(&group->mutex);
-   iommu_group_put(group);
-
mutex_unlock(&iommu_probe_device_lock);
  
  	return 0;


Best regards,
baolu


Re: [PATCH 08/11] iommu: Always destroy the iommu_group during iommu_release_device()

2023-04-19 Thread Baolu Lu

On 4/20/23 12:11 AM, Jason Gunthorpe wrote:

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index dbaf3ed9012c45..a82516c8ea87ad 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -569,7 +569,6 @@ static void __iommu_group_remove_device(struct device *dev)
dev->iommu_group = NULL;
goto out;


Nit: given that the line below has been removed, can the above simply be a
loop break?


}
-   WARN(true, "Corrupted iommu_group device_list");
  out:
mutex_unlock(&group->mutex);


Best regards,
baolu



[PATCH v2 4/4] PCI/DPC: Disable DPC interrupt during suspend

2023-04-19 Thread Kai-Heng Feng
A PCIe service that shares an IRQ with PME may cause a spurious wakeup on
system suspend.

Since AER is conditionally disabled in the previous patch, apply the same
logic to DPC, which depends on AER to work.

PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states
that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready
(D3hot), L2 (D3cold with aux power) and L3 (D3cold), so we don't lose
much by disabling DPC during system suspend.

This is very similar to previous attempts to suspend AER and DPC [1],
but with a different rationale.

[1] 
https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.f...@canonical.com/
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295

Signed-off-by: Kai-Heng Feng 
---
v2:
 - Only disable DPC IRQ.
 - No more check on PME IRQ#.

 drivers/pci/pcie/dpc.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index a5d7c69b764e..98bdefde6df1 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -385,6 +385,30 @@ static int dpc_probe(struct pcie_device *dev)
return status;
 }
 
+static int dpc_suspend(struct pcie_device *dev)
+{
+   struct pci_dev *pdev = dev->port;
+   u16 ctl;
+
+   pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl);
+   ctl &= ~PCI_EXP_DPC_CTL_INT_EN;
+   pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl);
+
+   return 0;
+}
+
+static int dpc_resume(struct pcie_device *dev)
+{
+   struct pci_dev *pdev = dev->port;
+   u16 ctl;
+
+   pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl);
+   ctl |= PCI_EXP_DPC_CTL_INT_EN;
+   pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl);
+
+   return 0;
+}
+
 static void dpc_remove(struct pcie_device *dev)
 {
struct pci_dev *pdev = dev->port;
@@ -400,6 +424,8 @@ static struct pcie_port_service_driver dpcdriver = {
.port_type  = PCIE_ANY_PORT,
.service= PCIE_PORT_SERVICE_DPC,
.probe  = dpc_probe,
+   .suspend= dpc_suspend,
+   .resume = dpc_resume,
.remove = dpc_remove,
 };
 
-- 
2.34.1



[PATCH v2 3/4] PCI/AER: Disable AER interrupt on suspend

2023-04-19 Thread Kai-Heng Feng
A PCIe service that shares an IRQ with PME may cause a spurious wakeup on
system suspend.

PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states
that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready
(D3hot), L2 (D3cold with aux power) and L3 (D3cold), so we don't lose
much by disabling AER during system suspend.

This is very similar to previous attempts to suspend AER and DPC [1],
but with a different rationale.

[1] 
https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.f...@canonical.com/
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295

Signed-off-by: Kai-Heng Feng 
---
v2:
 - Only disable AER IRQ.
 - No more check on PME IRQ#.
 - Use helper.

 drivers/pci/pcie/aer.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 1420e1f27105..9c07fdbeb52d 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev)
return 0;
 }
 
+static int aer_suspend(struct pcie_device *dev)
+{
+   struct aer_rpc *rpc = get_service_data(dev);
+   struct pci_dev *pdev = rpc->rpd;
+
+   aer_disable_irq(pdev);
+
+   return 0;
+}
+
+static int aer_resume(struct pcie_device *dev)
+{
+   struct aer_rpc *rpc = get_service_data(dev);
+   struct pci_dev *pdev = rpc->rpd;
+
+   aer_enable_irq(pdev);
+
+   return 0;
+}
+
 /**
  * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP
  * @dev: pointer to Root Port, RCEC, or RCiEP
@@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = {
.service= PCIE_PORT_SERVICE_AER,
 
.probe  = aer_probe,
+   .suspend= aer_suspend,
+   .resume = aer_resume,
.remove = aer_remove,
 };
 
-- 
2.34.1



[PATCH v2 2/4] PCI/AER: Factor out interrupt toggling into helpers

2023-04-19 Thread Kai-Heng Feng
There are several places that enable and disable the AER interrupt, so
move that logic into helpers.

Signed-off-by: Kai-Heng Feng 
---
v2:
 - New patch.

 drivers/pci/pcie/aer.c | 45 +-
 1 file changed, 27 insertions(+), 18 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index f6c24ded134c..1420e1f27105 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1227,6 +1227,28 @@ static irqreturn_t aer_irq(int irq, void *context)
return IRQ_WAKE_THREAD;
 }
 
+static void aer_enable_irq(struct pci_dev *pdev)
+{
+   int aer = pdev->aer_cap;
+   u32 reg32;
+
+   /* Enable Root Port's interrupt in response to error messages */
+   pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
+   reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
+   pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32);
+}
+
+static void aer_disable_irq(struct pci_dev *pdev)
+{
+   int aer = pdev->aer_cap;
+   u32 reg32;
+
+   /* Disable Root's interrupt in response to error messages */
+   pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
+   reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
+   pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32);
+}
+
 /**
  * aer_enable_rootport - enable Root Port's interrupts when receiving messages
  * @rpc: pointer to a Root Port data structure
@@ -1256,10 +1278,7 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, &reg32);
pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, reg32);
 
-   /* Enable Root Port's interrupt in response to error messages */
-   pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
-   reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
-   pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32);
+   aer_enable_irq(pdev);
 }
 
 /**
@@ -1274,10 +1293,7 @@ static void aer_disable_rootport(struct aer_rpc *rpc)
int aer = pdev->aer_cap;
u32 reg32;
 
-   /* Disable Root's interrupt in response to error messages */
-   pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
-   reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
-   pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32);
+   aer_disable_irq(pdev);
 
/* Clear Root's error status reg */
pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, &reg32);
@@ -1372,12 +1388,8 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
 */
aer = root ? root->aer_cap : 0;
 
-   if ((host->native_aer || pcie_ports_native) && aer) {
-   /* Disable Root's interrupt in response to error messages */
-   pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
-   reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
-   pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
-   }
+   if ((host->native_aer || pcie_ports_native) && aer)
+   aer_disable_irq(root);
 
if (type == PCI_EXP_TYPE_RC_EC || type == PCI_EXP_TYPE_RC_END) {
rc = pcie_reset_flr(dev, PCI_RESET_DO_RESET);
@@ -1396,10 +1408,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
pci_read_config_dword(root, aer + PCI_ERR_ROOT_STATUS, &reg32);
pci_write_config_dword(root, aer + PCI_ERR_ROOT_STATUS, reg32);
 
-   /* Enable Root Port's interrupt in response to error messages */
-   pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
-   reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
-   pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
+   aer_enable_irq(root);
}
 
return rc ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED;
-- 
2.34.1



Re: [PATCH] Revert "ASoC: fsl: remove unnecessary dai_link->platform"

2023-04-19 Thread Kuninori Morimoto


Hi Shengjiu
Cc Mark

Thank you for the patch

> This reverts commit 33683cbf49b5412061cb1e4c876063fdef86def4.
> 
> dai_link->platform is needed. The platform component is
> "snd_dmaengine_pcm", which is registered from the cpu driver.
> 
> If dai_link->platform is not assigned, then the platform
> component will not be probed, and there will be this issue:
> 
> aplay: main:831: audio open error: Invalid argument
> 
> Signed-off-by: Shengjiu Wang 
> ---

And sorry for the noise from my patch. I understand the issue now.

Can I ask two things?

My original patch removed 3 platforms.
I now understand that 2 of them are used as
soc-generic-dmaengine-pcm (the 1st and 3rd platforms).

I think we should have a comment here explaining
why the dummy component is needed. Do you agree?

I also wonder about the 2nd platform: is it the same?
I'm asking because it doesn't have the of_node that the other 2 platforms have.

Thank you for your help !!

Best regards
---
Kuninori Morimoto


[PATCH v8 04/10] nmi: backtrace: Allow runtime arch specific override

2023-04-19 Thread Douglas Anderson
From: Sumit Garg 

Add a boolean return to arch_trigger_cpumask_backtrace() to support a
use-case where a particular architecture detects at runtime whether it
supports NMI backtrace, or whether it should fall back to the default
implementation using SMP cross-calls.

Currently the example of such an architecture is arm64 with the pseudo-NMI
feature, which is only available on platforms that support GICv3 or a
later version.
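
For callers the boolean makes the fallback explicit; a minimal sketch of
the intended shape (the fallback helper name is illustrative, not from
this series):

	/*
	 * Prefer an NMI-based backtrace; if the architecture reports at
	 * runtime that NMIs are unavailable, fall back to dumps driven by
	 * regular-IRQ SMP cross-calls.
	 */
	if (!arch_trigger_cpumask_backtrace(mask, exclude_self))
		ipi_cpumask_backtrace(mask, exclude_self);	/* hypothetical */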

Signed-off-by: Sumit Garg 
Tested-by: Chen-Yu Tsai 
Signed-off-by: Douglas Anderson 
---

Changes in v8:
- Add loongarch support, too

 arch/arm/include/asm/irq.h   |  2 +-
 arch/arm/kernel/smp.c|  3 ++-
 arch/loongarch/include/asm/irq.h |  2 +-
 arch/loongarch/kernel/process.c  |  3 ++-
 arch/mips/include/asm/irq.h  |  2 +-
 arch/mips/kernel/process.c   |  3 ++-
 arch/powerpc/include/asm/nmi.h   |  2 +-
 arch/powerpc/kernel/stacktrace.c |  3 ++-
 arch/sparc/include/asm/irq_64.h  |  2 +-
 arch/sparc/kernel/process_64.c   |  4 +++-
 arch/x86/include/asm/irq.h   |  2 +-
 arch/x86/kernel/apic/hw_nmi.c|  3 ++-
 include/linux/nmi.h  | 12 
 13 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/arch/arm/include/asm/irq.h b/arch/arm/include/asm/irq.h
index a7c2337b0c7d..e6b62c7d6f0e 100644
--- a/arch/arm/include/asm/irq.h
+++ b/arch/arm/include/asm/irq.h
@@ -32,7 +32,7 @@ void init_IRQ(void);
 #ifdef CONFIG_SMP
 #include <linux/cpumask.h>
 
-extern void arch_trigger_cpumask_backtrace(const cpumask_t *mask,
+extern bool arch_trigger_cpumask_backtrace(const cpumask_t *mask,
   bool exclude_self);
 #define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
 #endif
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 0b8c25763adc..acb97d9219b1 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -849,7 +849,8 @@ static void raise_nmi(cpumask_t *mask)
__ipi_send_mask(ipi_desc[IPI_CPU_BACKTRACE], mask);
 }
 
-void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
+bool arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
 {
nmi_trigger_cpumask_backtrace(mask, exclude_self, raise_nmi);
+   return true;
 }
diff --git a/arch/loongarch/include/asm/irq.h b/arch/loongarch/include/asm/irq.h
index a115e8999c69..c7a152d6bf0c 100644
--- a/arch/loongarch/include/asm/irq.h
+++ b/arch/loongarch/include/asm/irq.h
@@ -40,7 +40,7 @@ void spurious_interrupt(void);
 #define NR_IRQS_LEGACY 16
 
 #define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
-void arch_trigger_cpumask_backtrace(const struct cpumask *mask, bool exclude_self);
+bool arch_trigger_cpumask_backtrace(const struct cpumask *mask, bool exclude_self);
 
 #define MAX_IO_PICS 2
 #define NR_IRQS(64 + (256 * MAX_IO_PICS))
diff --git a/arch/loongarch/kernel/process.c b/arch/loongarch/kernel/process.c
index fa2443c7afb2..8f7f818f5c4e 100644
--- a/arch/loongarch/kernel/process.c
+++ b/arch/loongarch/kernel/process.c
@@ -339,9 +339,10 @@ static void raise_backtrace(cpumask_t *mask)
}
 }
 
-void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
+bool arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
 {
nmi_trigger_cpumask_backtrace(mask, exclude_self, raise_backtrace);
+   return true;
 }
 
 #ifdef CONFIG_64BIT
diff --git a/arch/mips/include/asm/irq.h b/arch/mips/include/asm/irq.h
index 44f9824c1d8c..daf16173486a 100644
--- a/arch/mips/include/asm/irq.h
+++ b/arch/mips/include/asm/irq.h
@@ -77,7 +77,7 @@ extern int cp0_fdc_irq;
 
 extern int get_c0_fdc_int(void);
 
-void arch_trigger_cpumask_backtrace(const struct cpumask *mask,
+bool arch_trigger_cpumask_backtrace(const struct cpumask *mask,
bool exclude_self);
 #define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
 
diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index 093dbbd6b843..7d538571830a 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -750,9 +750,10 @@ static void raise_backtrace(cpumask_t *mask)
}
 }
 
-void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
+bool arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
 {
nmi_trigger_cpumask_backtrace(mask, exclude_self, raise_backtrace);
+   return true;
 }
 
 int mips_get_process_fp_mode(struct task_struct *task)
diff --git a/arch/powerpc/include/asm/nmi.h b/arch/powerpc/include/asm/nmi.h
index c3c7adef74de..135f65adcf63 100644
--- a/arch/powerpc/include/asm/nmi.h
+++ b/arch/powerpc/include/asm/nmi.h
@@ -12,7 +12,7 @@ static inline void watchdog_nmi_set_timeout_pct(u64 pct) {}
 #endif
 
 #ifdef CONFIG_NMI_IPI
-extern void arch_trigger_cpumask_backtrace(const cpumask_t *mask,
+extern bool arch_trigger_cpumask_backtrace(const cpumask_t *mask,
   bool exclude_self);
 #define 

Re: [PATCH 4/33] mm: add utility functions for ptdesc

2023-04-19 Thread Vishal Moola
On Wed, Apr 19, 2023 at 6:34 AM Vernon Yang  wrote:
>
> On Mon, Apr 17, 2023 at 01:50:19PM -0700, Vishal Moola wrote:
> > Introduce utility functions setting the foundation for ptdescs. These
> > will also assist in the splitting out of ptdesc from struct page.
> >
> > ptdesc_alloc() is defined to allocate new ptdesc pages as compound
> > pages. This is to standardize ptdescs by allowing for one allocation
> > and one free function, in contrast to 2 allocation and 2 free functions.
> >
> > Signed-off-by: Vishal Moola (Oracle) 
> > ---
> >  include/asm-generic/tlb.h | 11 ++
> >  include/linux/mm.h| 44 +++
> >  include/linux/pgtable.h   | 13 
> >  3 files changed, 68 insertions(+)
> >
> > diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> > index b46617207c93..6bade9e0e799 100644
> > --- a/include/asm-generic/tlb.h
> > +++ b/include/asm-generic/tlb.h
> > @@ -481,6 +481,17 @@ static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
> >   return tlb_remove_page_size(tlb, page, PAGE_SIZE);
> >  }
> >
> > +static inline void tlb_remove_ptdesc(struct mmu_gather *tlb, void *pt)
> > +{
> > + tlb_remove_table(tlb, pt);
> > +}
> > +
> > +/* Like tlb_remove_ptdesc, but for page-like page directories. */
> > +static inline void tlb_remove_page_ptdesc(struct mmu_gather *tlb, struct ptdesc *pt)
> > +{
> > + tlb_remove_page(tlb, ptdesc_page(pt));
> > +}
> > +
> >  static inline void tlb_change_page_size(struct mmu_gather *tlb,
> >unsigned int page_size)
> >  {
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index b18848ae7e22..ec3cbe2fa665 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -2744,6 +2744,45 @@ static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long a
> >  }
> >  #endif /* CONFIG_MMU */
> >
> > +static inline struct ptdesc *virt_to_ptdesc(const void *x)
> > +{
> > + return page_ptdesc(virt_to_head_page(x));
> > +}
> > +
> > +static inline void *ptdesc_to_virt(struct ptdesc *pt)
> > +{
> > + return page_to_virt(ptdesc_page(pt));
> > +}
> > +
> > +static inline void *ptdesc_address(struct ptdesc *pt)
> > +{
> > + return folio_address(ptdesc_folio(pt));
> > +}
> > +
> > +static inline bool ptdesc_is_reserved(struct ptdesc *pt)
> > +{
> > + return folio_test_reserved(ptdesc_folio(pt));
> > +}
> > +
> > +static inline struct ptdesc *ptdesc_alloc(gfp_t gfp, unsigned int order)
> > +{
> > + struct page *page = alloc_pages(gfp | __GFP_COMP, order);
> > +
> > + return page_ptdesc(page);
> > +}
> > +
> > +static inline void ptdesc_free(struct ptdesc *pt)
> > +{
> > + struct page *page = ptdesc_page(pt);
> > +
> > + __free_pages(page, compound_order(page));
> > +}
> > +
> > +static inline void ptdesc_clear(void *x)
> > +{
> > + clear_page(x);
> > +}
> > +
> >  #if USE_SPLIT_PTE_PTLOCKS
> >  #if ALLOC_SPLIT_PTLOCKS
> >  void __init ptlock_cache_init(void);
> > @@ -2970,6 +3009,11 @@ static inline void mark_page_reserved(struct page *page)
> >   adjust_managed_page_count(page, -1);
> >  }
> >
> > +static inline void free_reserved_ptdesc(struct ptdesc *pt)
> > +{
> > + free_reserved_page(ptdesc_page(pt));
> > +}
> > +
> >  /*
> >   * Default method to free all the __init memory into the buddy system.
> >   * The freed pages will be poisoned with pattern "poison" if it's within
> > diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> > index 7cc6ea057ee9..7cd803aa38eb 100644
> > --- a/include/linux/pgtable.h
> > +++ b/include/linux/pgtable.h
> > @@ -97,6 +97,19 @@ TABLE_MATCH(ptl, ptl);
> >  #undef TABLE_MATCH
> >  static_assert(sizeof(struct ptdesc) <= sizeof(struct page));
> >
> > +#define ptdesc_page(pt)  (_Generic((pt),   \
> > + const struct ptdesc *:  (const struct page *)(pt),  \
> > + struct ptdesc *:(struct page *)(pt)))
> > +
> > +#define ptdesc_folio(pt) (_Generic((pt), \
> > + const struct ptdesc *:  (const struct folio *)(pt), \
> > + struct ptdesc *:(struct folio *)(pt)))
> > +
> > +static inline struct ptdesc *page_ptdesc(struct page *page)
> > +{
> > + return (struct ptdesc *)page;
> > +}
>
> Hi Vishal,
>
> I'm a little curious: why does page_ptdesc() use an inline function instead
> of a macro? If there is any magic here, please tell me, thank you very much.

No magic here, I was mainly basing it off Matthew's netmem
series. I'm not too clear on when to use macros vs inlines
myself :/.

If there's a benefit to having it be a macro let me
know and I can make that change in v2.
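
FWIW, the usual argument for the inline function is that it type-checks
its argument, while a plain cast macro accepts any pointer. A small
stand-alone sketch (illustrative struct bodies, not the kernel's):

	struct page { unsigned long flags; };
	struct ptdesc { unsigned long pt_flags; };

	/* inline: anything but a struct page * draws a compiler diagnostic */
	static inline struct ptdesc *page_ptdesc(struct page *page)
	{
		return (struct ptdesc *)page;
	}

	/* macro: the cast silently accepts an int *, a void *, anything */
	#define page_ptdesc_m(p)	((struct ptdesc *)(p))

The _Generic() wrappers quoted above stay macros because they must
preserve the const qualifier of their argument, which a single inline
function cannot do.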


Re: [PATCH] Revert "ASoC: fsl: remove unnecessary dai_link->platform"

2023-04-19 Thread Mark Brown
On Wed, 19 Apr 2023 18:29:18 +0800, Shengjiu Wang wrote:
> This reverts commit 33683cbf49b5412061cb1e4c876063fdef86def4.
> 
> dai_link->platform is needed. The platform component is
> "snd_dmaengine_pcm", which is registered from the cpu driver.
> 
> If dai_link->platform is not assigned, then the platform
> component will not be probed, and there will be this issue:
> 
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] Revert "ASoC: fsl: remove unnecessary dai_link->platform"
  commit: 09cda705860125ffee1b1359b1da79f8e0c77a40

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark



Re: [PATCH v3 00/19] arch: Consolidate <asm/fb.h>

2023-04-19 Thread Helge Deller

Hi Thomas,


On 17.04.23 at 16:12, Arnd Bergmann wrote:
>> On Mon, Apr 17, 2023, at 14:56, Thomas Zimmermann wrote:

Various architectures provide <asm/fb.h> with helpers for fbdev
framebuffer devices. Share the contained code where possible. There
is already <asm-generic/fb.h>, which implements generic (as in
'empty') functions of the fbdev helpers. The header was added in
commit aafe4dbed0bf ("asm-generic: add generic versions of common
headers"), but never used.

Each per-architecture header file declares and/or implements fbdev
helpers and defines a preprocessor token for each. The generic
header then provides the remaining helpers. It works like the I/O
helpers in <asm-generic/io.h>.
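
A minimal sketch of that override pattern, with an illustrative helper
name rather than the real fbdev helpers:

	/* arch header, e.g. <asm/fb.h>: implement a helper and announce it */
	static inline int fb_helper(void) { return 1; /* arch-specific */ }
	#define fb_helper fb_helper
	#include <asm-generic/fb.h>

	/* <asm-generic/fb.h>: supply a default only if the arch did not */
	#ifndef fb_helper
	static inline int fb_helper(void) { return 0; /* generic fallback */ }
	#endif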


Looks all good to me,

Acked-by: Arnd Bergmann 


Thanks a lot. I know that Helge wants to test the PARISC changes, so
I'll keep this series pending for a bit longer. I'd like to merge the
patches through the DRM tree, if no one objects.


Yes, the patches are good and I've tested them on parisc. Thanks!

You may add:
Acked-by: Helge Deller 
to the series and take it through the drm tree.

Helge


[PATCH 04/11] iommu: Simplify the __iommu_group_remove_device() flow

2023-04-19 Thread Jason Gunthorpe
Instead of returning the struct group_device and then later freeing it, do
the entire free under the group->mutex and defer only putting the
iommu_group.

It is safe to remove the sysfs_links and free memory while holding that
mutex.

Move the sanity assert of the group status into
__iommu_group_free_device().

The next patch will improve upon this and consolidate the group put and
the mutex into __iommu_group_remove_device().

__iommu_group_free_device() is close to being the paired undo of
iommu_group_add_device(); following patches will improve on that.

Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/iommu.c | 83 ---
 1 file changed, 39 insertions(+), 44 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index e08856c17121d8..471f19f7de8c4a 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -466,32 +466,8 @@ int iommu_probe_device(struct device *dev)
 
 }
 
-/*
- * Remove a device from a group's device list and return the group device
- * if successful.
- */
-static struct group_device *
-__iommu_group_remove_device(struct iommu_group *group, struct device *dev)
-{
-   struct group_device *device;
-
-   lockdep_assert_held(&group->mutex);
-   for_each_group_device(group, device) {
-   if (device->dev == dev) {
-   list_del(&device->list);
-   return device;
-   }
-   }
-
-   return NULL;
-}
-
-/*
- * Release a device from its group and decrements the iommu group reference
- * count.
- */
-static void __iommu_group_release_device(struct iommu_group *group,
-struct group_device *grp_dev)
+static void __iommu_group_free_device(struct iommu_group *group,
+ struct group_device *grp_dev)
 {
struct device *dev = grp_dev->dev;
 
@@ -500,16 +476,45 @@ static void __iommu_group_release_device(struct iommu_group *group,
 
trace_remove_device_from_group(group->id, dev);
 
+   /*
+* If the group has become empty then ownership must have been
+* released, and the current domain must be set back to NULL or
+* the default domain.
+*/
+   if (list_empty(&group->devices))
+   WARN_ON(group->owner_cnt ||
+   group->domain != group->default_domain);
+
kfree(grp_dev->name);
kfree(grp_dev);
dev->iommu_group = NULL;
-   iommu_group_put(group);
 }
 
-static void iommu_release_device(struct device *dev)
+/*
+ * Remove the iommu_group from the struct device. The attached group must be
+ * put by the caller after releasing the group->mutex.
+ */
+static void __iommu_group_remove_device(struct device *dev)
 {
struct iommu_group *group = dev->iommu_group;
struct group_device *device;
+
+   lockdep_assert_held(&group->mutex);
+   for_each_group_device(group, device) {
+   if (device->dev != dev)
+   continue;
+
+   list_del(&device->list);
+   __iommu_group_free_device(group, device);
+   /* Caller must put iommu_group */
+   return;
+   }
+   WARN(true, "Corrupted iommu_group device_list");
+}
+
+static void iommu_release_device(struct device *dev)
+{
+   struct iommu_group *group = dev->iommu_group;
const struct iommu_ops *ops;
 
if (!dev->iommu || !group)
@@ -518,16 +523,7 @@ static void iommu_release_device(struct device *dev)
iommu_device_unlink(dev->iommu->iommu_dev, dev);
 
mutex_lock(&group->mutex);
-   device = __iommu_group_remove_device(group, dev);
-
-   /*
-* If the group has become empty then ownership must have been released,
-* and the current domain must be set back to NULL or the default
-* domain.
-*/
-   if (list_empty(&group->devices))
-   WARN_ON(group->owner_cnt ||
-   group->domain != group->default_domain);
+   __iommu_group_remove_device(dev);
 
/*
 * release_device() must stop using any attached domain on the device.
@@ -543,8 +539,8 @@ static void iommu_release_device(struct device *dev)
ops->release_device(dev);
mutex_unlock(&group->mutex);
 
-   if (device)
-   __iommu_group_release_device(group, device);
+   /* Pairs with the get in iommu_group_add_device() */
+   iommu_group_put(group);
 
module_put(ops->owner);
dev_iommu_free(dev);
@@ -1103,7 +1099,6 @@ EXPORT_SYMBOL_GPL(iommu_group_add_device);
 void iommu_group_remove_device(struct device *dev)
 {
struct iommu_group *group = dev->iommu_group;
-   struct group_device *device;
 
if (!group)
return;
@@ -1111,11 +1106,11 @@ void iommu_group_remove_device(struct device *dev)
dev_info(dev, "Removing from iommu group %d\n", group->id);
 
mutex_lock(>mutex);
-   device = 

[PATCH 05/11] iommu: Add iommu_init/deinit_driver() paired functions

2023-04-19 Thread Jason Gunthorpe
Move the driver init and destruction code into two logically paired
functions.

There is a subtle ordering dependency in how the group's domains are
freed: the current code does the kobject_put() on the group, which will
hopefully trigger the free of the domains before the module_put() that
protects the domain->ops.

Reorganize this to be explicit and documented. The domains are cleaned up
by iommu_deinit_driver() if it is the last device to be deinit'd from the
group.  This must be done in a specific order - after
ops->release_device() and before the module_put(). Make it very clear and
obvious by putting the order directly in one function.

Leave WARN_ONs in case the refcounting gets messed up somehow.

This also moves the module_put() and dev_iommu_free() under the
group->mutex to keep the code simple.

Building paired functions like this helps ensure that error cleanup flows
in __iommu_probe_device() are correct because they share the same code
that handles the normal flow. These details become relevant as following
patches add more error unwind into __iommu_probe_device(), and ultimately
a following series adds fine-grained locking to __iommu_probe_device().

Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/iommu.c | 186 --
 1 file changed, 108 insertions(+), 78 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 471f19f7de8c4a..e428de5b386833 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -328,10 +328,95 @@ static u32 dev_iommu_get_max_pasids(struct device *dev)
return min_t(u32, max_pasids, dev->iommu->iommu_dev->max_pasids);
 }
 
+static int iommu_init_driver(struct device *dev, const struct iommu_ops *ops)
+{
+   struct iommu_device *iommu_dev;
+   struct iommu_group *group;
+   int ret;
+
+   if (!dev_iommu_get(dev))
+   return -ENOMEM;
+
+   if (!try_module_get(ops->owner)) {
+   ret = -EINVAL;
+   goto err_free;
+   }
+
+   iommu_dev = ops->probe_device(dev);
+   if (IS_ERR(iommu_dev)) {
+   ret = PTR_ERR(iommu_dev);
+   goto err_module_put;
+   }
+
+   group = ops->device_group(dev);
+   if (WARN_ON_ONCE(group == NULL))
+   group = ERR_PTR(-EINVAL);
+   if (IS_ERR(group)) {
+   ret = PTR_ERR(group);
+   goto err_release;
+   }
+   dev->iommu_group = group;
+
+   dev->iommu->iommu_dev = iommu_dev;
+   dev->iommu->max_pasids = dev_iommu_get_max_pasids(dev);
+   if (ops->is_attach_deferred)
+   dev->iommu->attach_deferred = ops->is_attach_deferred(dev);
+   return 0;
+
+err_release:
+   if (ops->release_device)
+   ops->release_device(dev);
+err_module_put:
+   module_put(ops->owner);
+err_free:
+   dev_iommu_free(dev);
+   return ret;
+}
+
+static void iommu_deinit_driver(struct device *dev)
+{
+   struct iommu_group *group = dev->iommu_group;
+   const struct iommu_ops *ops = dev_iommu_ops(dev);
+
+   lockdep_assert_held(&group->mutex);
+
+   /*
+* release_device() must stop using any attached domain on the device.
+* If there are still other devices in the group they are not affected
+* by this callback.
+*
+* The IOMMU driver must set the device to either an identity or
+* blocking translation and stop using any domain pointer, as it is
+* going to be freed.
+*/
+   if (ops->release_device)
+   ops->release_device(dev);
+
+   /*
+* If this is the last driver to use the group then we must free the
+* domains before we do the module_put().
+*/
+   if (list_empty(&group->devices)) {
+   if (group->default_domain) {
+   iommu_domain_free(group->default_domain);
+   group->default_domain = NULL;
+   }
+   if (group->blocking_domain) {
+   iommu_domain_free(group->blocking_domain);
+   group->blocking_domain = NULL;
+   }
+   group->domain = NULL;
+   }
+
+   /* Caller must put iommu_group */
+   dev->iommu_group = NULL;
+   module_put(ops->owner);
+   dev_iommu_free(dev);
+}
+
 static int __iommu_probe_device(struct device *dev, struct list_head *group_list)
 {
const struct iommu_ops *ops = dev->bus->iommu_ops;
-   struct iommu_device *iommu_dev;
struct iommu_group *group;
static DEFINE_MUTEX(iommu_probe_device_lock);
int ret;
@@ -353,62 +438,30 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
goto out_unlock;
}
 
-   if (!dev_iommu_get(dev)) {
-   ret = -ENOMEM;
+   ret = iommu_init_driver(dev, ops);
+   if (ret)
goto out_unlock;
-   }
-
-   if (!try_module_get(ops->owner)) {
-  

[PATCH 06/11] iommu: Move the iommu driver sysfs setup into iommu_init/deinit_driver()

2023-04-19 Thread Jason Gunthorpe
It makes logical sense that once the driver is attached to the device the
sysfs links appear, even if we haven't fully created the group_device or
attached the device to a domain.

Fix the missing error handling on sysfs creation since
iommu_init_driver() can trivially handle this.

Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/iommu-sysfs.c |  6 --
 drivers/iommu/iommu.c   | 13 +
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/iommu-sysfs.c b/drivers/iommu/iommu-sysfs.c
index 99869217fbec7d..c8aba0e2a30d70 100644
--- a/drivers/iommu/iommu-sysfs.c
+++ b/drivers/iommu/iommu-sysfs.c
@@ -107,9 +107,6 @@ int iommu_device_link(struct iommu_device *iommu, struct device *link)
 {
int ret;
 
-   if (!iommu || IS_ERR(iommu))
-   return -ENODEV;
-
ret = sysfs_add_link_to_group(&iommu->dev->kobj, "devices",
  &link->kobj, dev_name(link));
if (ret)
@@ -126,9 +123,6 @@ EXPORT_SYMBOL_GPL(iommu_device_link);
 
 void iommu_device_unlink(struct iommu_device *iommu, struct device *link)
 {
-   if (!iommu || IS_ERR(iommu))
-   return;
-
-   sysfs_remove_link(&link->kobj, "iommu");
sysfs_remove_link_from_group(&iommu->dev->kobj, "devices", dev_name(link));
 }
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index e428de5b386833..dbaf3ed9012c45 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -348,12 +348,16 @@ static int iommu_init_driver(struct device *dev, const struct iommu_ops *ops)
goto err_module_put;
}
 
+   ret = iommu_device_link(iommu_dev, dev);
+   if (ret)
+   goto err_release;
+
group = ops->device_group(dev);
if (WARN_ON_ONCE(group == NULL))
group = ERR_PTR(-EINVAL);
if (IS_ERR(group)) {
ret = PTR_ERR(group);
-   goto err_release;
+   goto err_unlink;
}
dev->iommu_group = group;
 
@@ -363,6 +367,8 @@ static int iommu_init_driver(struct device *dev, const struct iommu_ops *ops)
dev->iommu->attach_deferred = ops->is_attach_deferred(dev);
return 0;
 
+err_unlink:
+   iommu_device_unlink(iommu_dev, dev);
 err_release:
if (ops->release_device)
ops->release_device(dev);
@@ -380,6 +386,8 @@ static void iommu_deinit_driver(struct device *dev)
 
lockdep_assert_held(>mutex);
 
+   iommu_device_unlink(dev->iommu->iommu_dev, dev);
+
/*
 * release_device() must stop using any attached domain on the device.
 * If there are still other devices in the group they are not effected
@@ -454,7 +462,6 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
iommu_group_put(group);
 
mutex_unlock(&iommu_probe_device_lock);
-   iommu_device_link(dev->iommu->iommu_dev, dev);
 
return 0;
 
@@ -577,8 +584,6 @@ static void iommu_release_device(struct device *dev)
if (!dev->iommu || !group)
return;
 
-   iommu_device_unlink(dev->iommu->iommu_dev, dev);
-
__iommu_group_remove_device(dev);
 }
 
-- 
2.40.0



[PATCH 09/11] iommu/power: Remove iommu_del_device()

2023-04-19 Thread Jason Gunthorpe
This is only called from a BUS_NOTIFY_DEL_DEVICE notifier and it only
calls iommu_group_remove_device().

The core code now cleans up any iommu_group, even without a driver, during
BUS_NOTIFY_REMOVED_DEVICE. There is no reason for POWER to install its own
bus notifiers and duplicate the core code's work, remove this code.

Signed-off-by: Jason Gunthorpe 
---
 arch/powerpc/include/asm/iommu.h   |  5 -
 arch/powerpc/kernel/iommu.c| 17 -
 arch/powerpc/platforms/powernv/pci.c   | 25 -
 arch/powerpc/platforms/pseries/iommu.c | 25 -
 4 files changed, 72 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 7e29c73e3dd48d..55d6213dbeaf42 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -205,7 +205,6 @@ extern void iommu_register_group(struct iommu_table_group *table_group,
 int pci_domain_number, unsigned long pe_num);
 extern int iommu_add_device(struct iommu_table_group *table_group,
struct device *dev);
-extern void iommu_del_device(struct device *dev);
 extern long iommu_tce_xchg(struct mm_struct *mm, struct iommu_table *tbl,
unsigned long entry, unsigned long *hpa,
enum dma_data_direction *direction);
@@ -227,10 +226,6 @@ static inline int iommu_add_device(struct iommu_table_group *table_group,
 {
return 0;
 }
-
-static inline void iommu_del_device(struct device *dev)
-{
-}
 #endif /* !CONFIG_IOMMU_API */
 
 u64 dma_iommu_get_required_mask(struct device *dev);
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index ee95937bdaf14e..f02dd2149394e2 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1162,21 +1162,4 @@ int iommu_add_device(struct iommu_table_group *table_group, struct device *dev)
return iommu_group_add_device(table_group->group, dev);
 }
 EXPORT_SYMBOL_GPL(iommu_add_device);
-
-void iommu_del_device(struct device *dev)
-{
-   /*
-* Some devices might not have IOMMU table and group
-* and we needn't detach them from the associated
-* IOMMU groups
-*/
-   if (!device_iommu_mapped(dev)) {
-   pr_debug("iommu_tce: skipping device %s with no tbl\n",
-dev_name(dev));
-   return;
-   }
-
-   iommu_group_remove_device(dev);
-}
-EXPORT_SYMBOL_GPL(iommu_del_device);
 #endif /* CONFIG_IOMMU_API */
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 233a50e65fcedd..7725492097b627 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -865,28 +865,3 @@ void __init pnv_pci_init(void)
/* Configure IOMMU DMA hooks */
set_pci_dma_ops(&dma_iommu_ops);
 }
-
-static int pnv_tce_iommu_bus_notifier(struct notifier_block *nb,
-   unsigned long action, void *data)
-{
-   struct device *dev = data;
-
-   switch (action) {
-   case BUS_NOTIFY_DEL_DEVICE:
-   iommu_del_device(dev);
-   return 0;
-   default:
-   return 0;
-   }
-}
-
-static struct notifier_block pnv_tce_iommu_bus_nb = {
-   .notifier_call = pnv_tce_iommu_bus_notifier,
-};
-
-static int __init pnv_tce_iommu_bus_notifier_init(void)
-{
-   bus_register_notifier(&pci_bus_type, &pnv_tce_iommu_bus_nb);
-   return 0;
-}
-machine_subsys_initcall_sync(powernv, pnv_tce_iommu_bus_notifier_init);
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index c74b71d4733d40..7818ace838ce61 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1699,28 +1699,3 @@ static int __init disable_multitce(char *str)
 }
 
 __setup("multitce=", disable_multitce);
-
-static int tce_iommu_bus_notifier(struct notifier_block *nb,
-   unsigned long action, void *data)
-{
-   struct device *dev = data;
-
-   switch (action) {
-   case BUS_NOTIFY_DEL_DEVICE:
-   iommu_del_device(dev);
-   return 0;
-   default:
-   return 0;
-   }
-}
-
-static struct notifier_block tce_iommu_bus_nb = {
-   .notifier_call = tce_iommu_bus_notifier,
-};
-
-static int __init tce_iommu_bus_notifier_init(void)
-{
-   bus_register_notifier(&pci_bus_type, &tce_iommu_bus_nb);
-   return 0;
-}
-machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init);
-- 
2.40.0



[PATCH 08/11] iommu: Always destroy the iommu_group during iommu_release_device()

2023-04-19 Thread Jason Gunthorpe
Have release fully clean up the iommu-related parts of the struct device,
no matter what state they are in.

POWER creates iommu_groups without drivers attached, and the next patch
removes the open-coding of this same cleanup from POWER.

Split the logic so that the three things owned by the iommu core are
always cleaned up:
 - Any attached iommu_group
 - Any allocated dev->iommu, e.g. for fwspec
 - Any attached driver via a struct group_device

This fixes a bug where a fwspec created without an iommu_group being
probed would not be freed.

Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/iommu.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index dbaf3ed9012c45..a82516c8ea87ad 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -569,7 +569,6 @@ static void __iommu_group_remove_device(struct device *dev)
dev->iommu_group = NULL;
goto out;
}
-   WARN(true, "Corrupted iommu_group device_list");
 out:
mutex_unlock(&group->mutex);
 
@@ -581,10 +580,12 @@ static void iommu_release_device(struct device *dev)
 {
struct iommu_group *group = dev->iommu_group;
 
-   if (!dev->iommu || !group)
-   return;
+   if (group)
+   __iommu_group_remove_device(dev);
 
-   __iommu_group_remove_device(dev);
+   /* Free any fwspec if no iommu_driver was ever attached */
+   if (dev->iommu)
+   dev_iommu_free(dev);
 }
 
 static int __init iommu_set_def_domain_type(char *str)
-- 
2.40.0



[PATCH 10/11] iommu: Split iommu_group_add_device()

2023-04-19 Thread Jason Gunthorpe
Move the list_add_tail() for the group_device into the critical region
that immediately follows in __iommu_probe_device(). This avoids one case
of unlocking and immediately re-locking the group->mutex.

Consistently make the caller responsible for setting dev->iommu_group;
prior patches moved this into iommu_init_driver(), so make the no-driver
path do this in iommu_group_add_device().

This completes making __iommu_group_free_device() and
iommu_group_alloc_device() into paired functions.

Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/iommu.c | 66 ---
 1 file changed, 43 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index a82516c8ea87ad..5ebff82041f2d1 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -128,6 +128,8 @@ static int iommu_create_device_direct_mappings(struct iommu_domain *domain,
   struct device *dev);
 static ssize_t iommu_group_store_type(struct iommu_group *group,
  const char *buf, size_t count);
+static struct group_device *iommu_group_alloc_device(struct iommu_group *group,
+struct device *dev);
 
 #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)  \
 struct iommu_group_attribute iommu_group_attr_##_name =\
@@ -427,6 +429,7 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
const struct iommu_ops *ops = dev->bus->iommu_ops;
struct iommu_group *group;
static DEFINE_MUTEX(iommu_probe_device_lock);
+   struct group_device *gdev;
int ret;
 
if (!ops)
@@ -451,16 +454,17 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
goto out_unlock;
 
group = dev->iommu_group;
-   ret = iommu_group_add_device(group, dev);
+   gdev = iommu_group_alloc_device(group, dev);
mutex_lock(&group->mutex);
-   if (ret)
+   if (IS_ERR(gdev)) {
+   ret = PTR_ERR(gdev);
goto err_put_group;
+   }
 
+   list_add_tail(&gdev->list, &group->devices);
if (group_list && !group->default_domain && list_empty(&group->entry))
list_add_tail(&group->entry, group_list);
mutex_unlock(&group->mutex);
-   iommu_group_put(group);
-
mutex_unlock(&iommu_probe_device_lock);
 
return 0;
@@ -572,7 +576,10 @@ static void __iommu_group_remove_device(struct device *dev)
 out:
mutex_unlock(&group->mutex);
 
-   /* Pairs with the get in iommu_group_add_device() */
+   /*
+* Pairs with the get in iommu_init_driver() or
+* iommu_group_add_device()
+*/
iommu_group_put(group);
 }
 
@@ -1061,22 +1068,16 @@ static int iommu_create_device_direct_mappings(struct iommu_domain *domain,
return ret;
 }
 
-/**
- * iommu_group_add_device - add a device to an iommu group
- * @group: the group into which to add the device (reference should be held)
- * @dev: the device
- *
- * This function is called by an iommu driver to add a device into a
- * group.  Adding a device increments the group reference count.
- */
-int iommu_group_add_device(struct iommu_group *group, struct device *dev)
+/* This is undone by __iommu_group_free_device() */
+static struct group_device *iommu_group_alloc_device(struct iommu_group *group,
+struct device *dev)
 {
int ret, i = 0;
struct group_device *device;
 
device = kzalloc(sizeof(*device), GFP_KERNEL);
if (!device)
-   return -ENOMEM;
+   return ERR_PTR(-ENOMEM);
 
device->dev = dev;
 
@@ -1107,17 +1108,11 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
goto err_free_name;
}
 
-   iommu_group_ref_get(group);
-   dev->iommu_group = group;
-
-   mutex_lock(&group->mutex);
-   list_add_tail(&device->list, &group->devices);
-   mutex_unlock(&group->mutex);
trace_add_device_to_group(group->id, dev);
 
dev_info(dev, "Adding to iommu group %d\n", group->id);
 
-   return 0;
+   return device;
 
 err_free_name:
kfree(device->name);
@@ -1126,7 +1121,32 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
 err_free_device:
kfree(device);
dev_err(dev, "Failed to add to iommu group %d: %d\n", group->id, ret);
-   return ret;
+   return ERR_PTR(ret);
+}
+
+/**
+ * iommu_group_add_device - add a device to an iommu group
+ * @group: the group into which to add the device (reference should be held)
+ * @dev: the device
+ *
+ * This function is called by an iommu driver to add a device into a
+ * group.  Adding a device increments the group reference count.
+ */
+int iommu_group_add_device(struct iommu_group *group, struct device *dev)
+{
+   struct group_device *gdev;
+
+   gdev = 

[PATCH 02/11] iommu: Use iommu_group_ref_get/put() for dev->iommu_group

2023-04-19 Thread Jason Gunthorpe
No reason to open code this, use the proper helper functions.

Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/iommu.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index c486e648402d5c..73e9f50fba9dd2 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -496,7 +496,7 @@ static void __iommu_group_release_device(struct iommu_group *group,
kfree(grp_dev->name);
kfree(grp_dev);
dev->iommu_group = NULL;
-   kobject_put(group->devices_kobj);
+   iommu_group_put(group);
 }
 
 static void iommu_release_device(struct device *dev)
@@ -1063,8 +1063,7 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
goto err_free_name;
}
 
-   kobject_get(group->devices_kobj);
-
+   iommu_group_ref_get(group);
dev->iommu_group = group;
 
mutex_lock(&group->mutex);
-- 
2.40.0



[PATCH 03/11] iommu: Inline iommu_group_get_for_dev() into __iommu_probe_device()

2023-04-19 Thread Jason Gunthorpe
This is the only caller, and it doesn't need the generality of the
function. We already know there is no iommu_group, so it is simply two
function calls.

Moving it here allows the following patches to split the logic in these
functions.

Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/iommu.c | 50 ---
 1 file changed, 9 insertions(+), 41 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 73e9f50fba9dd2..e08856c17121d8 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -126,7 +126,6 @@ static int iommu_setup_default_domain(struct iommu_group *group,
  int target_type);
 static int iommu_create_device_direct_mappings(struct iommu_domain *domain,
   struct device *dev);
-static struct iommu_group *iommu_group_get_for_dev(struct device *dev);
 static ssize_t iommu_group_store_type(struct iommu_group *group,
  const char *buf, size_t count);
 
@@ -375,12 +374,18 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
if (ops->is_attach_deferred)
dev->iommu->attach_deferred = ops->is_attach_deferred(dev);
 
-   group = iommu_group_get_for_dev(dev);
+   group = ops->device_group(dev);
+   if (WARN_ON_ONCE(group == NULL))
+   group = ERR_PTR(-EINVAL);
if (IS_ERR(group)) {
ret = PTR_ERR(group);
goto out_release;
}
 
+   ret = iommu_group_add_device(group, dev);
+   if (ret)
+   goto err_put_group;
+
mutex_lock(&group->mutex);
if (group_list && !group->default_domain && list_empty(&group->entry))
list_add_tail(&group->entry, group_list);
@@ -392,6 +397,8 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
 
return 0;
 
+err_put_group:
+   iommu_group_put(group);
 out_release:
if (ops->release_device)
ops->release_device(dev);
@@ -1666,45 +1673,6 @@ iommu_group_alloc_default_domain(struct iommu_group *group, int req_type)
return dom;
 }
 
-/**
- * iommu_group_get_for_dev - Find or create the IOMMU group for a device
- * @dev: target device
- *
- * This function is intended to be called by IOMMU drivers and extended to
- * support common, bus-defined algorithms when determining or creating the
- * IOMMU group for a device.  On success, the caller will hold a reference
- * to the returned IOMMU group, which will already include the provided
- * device.  The reference should be released with iommu_group_put().
- */
-static struct iommu_group *iommu_group_get_for_dev(struct device *dev)
-{
-   const struct iommu_ops *ops = dev_iommu_ops(dev);
-   struct iommu_group *group;
-   int ret;
-
-   group = iommu_group_get(dev);
-   if (group)
-   return group;
-
-   group = ops->device_group(dev);
-   if (WARN_ON_ONCE(group == NULL))
-   return ERR_PTR(-EINVAL);
-
-   if (IS_ERR(group))
-   return group;
-
-   ret = iommu_group_add_device(group, dev);
-   if (ret)
-   goto out_put_group;
-
-   return group;
-
-out_put_group:
-   iommu_group_put(group);
-
-   return ERR_PTR(ret);
-}
-
 struct iommu_domain *iommu_group_default_domain(struct iommu_group *group)
 {
return group->default_domain;
-- 
2.40.0



[PATCH 11/11] iommu: Avoid locking/unlocking for iommu_probe_device()

2023-04-19 Thread Jason Gunthorpe
Remove the race where a hotplug of a device into an existing group will
have the device installed in the group->devices, but not yet attached to
the group's current domain.

Move the group attachment logic from iommu_probe_device() and put it under
the same mutex that updates the group->devices list so everything is
atomic under the lock.

We retain the two-step setup of the default domain for the
bus_iommu_probe() case solely so that we have a more complete view of the
group when creating the default domain for boot time devices. This is not
generally necessary with the current code structure but seems to be
supporting some odd corner cases like alias RIDs and IOMMU_RESV_DIRECT or
driver bugs returning different default_domain types for the same group.

During bus_iommu_probe() the group will have a device list but both
group->default_domain and group->domain will be NULL.

Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/iommu.c | 78 +++
 1 file changed, 35 insertions(+), 43 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5ebff82041f2d1..8fc230eb36d65f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -130,6 +130,8 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
  const char *buf, size_t count);
 static struct group_device *iommu_group_alloc_device(struct iommu_group *group,
 struct device *dev);
+static void __iommu_group_free_device(struct iommu_group *group,
+ struct group_device *grp_dev);
 
 #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)  \
 struct iommu_group_attribute iommu_group_attr_##_name =\
@@ -461,14 +463,39 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
goto err_put_group;
}
 
+   /*
+* The gdev must be in the list before calling
+* iommu_setup_default_domain()
+*/
list_add_tail(&gdev->list, &group->devices);
-   if (group_list && !group->default_domain && list_empty(&group->entry))
-   list_add_tail(&group->entry, group_list);
+   WARN_ON(group->default_domain && !group->domain);
+   if (group->default_domain)
+   iommu_create_device_direct_mappings(group->default_domain, dev);
+   if (group->domain) {
+   ret = __iommu_device_set_domain(group, dev, group->domain, 0);
+   if (ret)
+   goto err_remove_gdev;
+   } else if (!group->default_domain && !group_list) {
+   ret = iommu_setup_default_domain(group, 0);
+   if (ret)
+   goto err_remove_gdev;
+   } else if (!group->default_domain) {
+   /*
+* With a group_list argument we defer the default_domain setup
+* to the caller by providing a de-duplicated list of groups
+* that need further setup.
+*/
+   if (list_empty(&group->entry))
+   list_add_tail(&group->entry, group_list);
+   }
mutex_unlock(&group->mutex);
mutex_unlock(&iommu_probe_device_lock);
 
return 0;
 
+err_remove_gdev:
+   list_del(&gdev->list);
+   __iommu_group_free_device(group, gdev);
 err_put_group:
iommu_deinit_driver(dev);
mutex_unlock(&group->mutex);
@@ -482,52 +509,17 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
 int iommu_probe_device(struct device *dev)
 {
const struct iommu_ops *ops;
-   struct iommu_group *group;
int ret;
 
ret = __iommu_probe_device(dev, NULL);
if (ret)
-   goto err_out;
-
-   group = iommu_group_get(dev);
-   if (!group) {
-   ret = -ENODEV;
-   goto err_release;
-   }
-
-   mutex_lock(&group->mutex);
-
-   if (group->default_domain)
-   iommu_create_device_direct_mappings(group->default_domain, dev);
-
-   if (group->domain) {
-   ret = __iommu_device_set_domain(group, dev, group->domain, 0);
-   if (ret)
-   goto err_unlock;
-   } else if (!group->default_domain) {
-   ret = iommu_setup_default_domain(group, 0);
-   if (ret)
-   goto err_unlock;
-   }
-
-   mutex_unlock(&group->mutex);
-   iommu_group_put(group);
+   return ret;
 
ops = dev_iommu_ops(dev);
if (ops->probe_finalize)
ops->probe_finalize(dev);
 
return 0;
-
-err_unlock:
-   mutex_unlock(&group->mutex);
-   iommu_group_put(group);
-err_release:
-   iommu_release_device(dev);
-
-err_out:
-   return ret;
-
 }
 
 static void __iommu_group_free_device(struct iommu_group *group,
@@ -1809,11 +1801,6 @@ int bus_iommu_probe(struct bus_type *bus)
LIST_HEAD(group_list);
int ret;
 
-   /*
-

[PATCH 00/11] Consolidate the probe_device path

2023-04-19 Thread Jason Gunthorpe
Now that the domain allocation path is less duplicated we can tackle the
probe_device path. Details of this are spread across several functions;
broadly, move most of the code into __iommu_probe_device() and organize it
more strictly in terms of paired do/undo functions.

Make the locking simpler by obtaining the group->mutex fewer times and
avoiding adding a half-initialized device to an initialized
group. Previously we would lock/unlock the group three times on these
paths.

This locking change is the primary point of the series; creating the
paired do/undo functions is a path to organizing the setup code under a
single lock while still having a logical, not duplicated, error unwind.
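
Concretely, the pairing gives __iommu_probe_device() a single goto-unwind
ladder; a condensed sketch of the shape (details elided, see the patches
for the real code):

	ret = iommu_init_driver(dev, ops);		/* do */
	if (ret)
		goto out_unlock;
	gdev = iommu_group_alloc_device(group, dev);	/* do */
	if (IS_ERR(gdev)) {
		ret = PTR_ERR(gdev);
		goto err_deinit;
	}
	...
	return 0;

err_deinit:
	iommu_deinit_driver(dev);			/* paired undo */
out_unlock:
	mutex_unlock(&iommu_probe_device_lock);
	return ret;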

This follows the prior series:

https://lore.kernel.org/r/0-v4-79d0c229580a+650-iommu_err_unwind_...@nvidia.com

Jason Gunthorpe (11):
  iommu: Have __iommu_probe_device() check for already probed devices
  iommu: Use iommu_group_ref_get/put() for dev->iommu_group
  iommu: Inline iommu_group_get_for_dev() into __iommu_probe_device()
  iommu: Simplify the __iommu_group_remove_device() flow
  iommu: Add iommu_init/deinit_driver() paired functions
  iommu: Move the iommu driver sysfs setup into
iommu_init/deinit_driver()
  iommu: Do not export iommu_device_link/unlink()
  iommu: Always destroy the iommu_group during iommu_release_device()
  iommu/power: Remove iommu_del_device()
  iommu: Split iommu_group_add_device()
  iommu: Avoid locking/unlocking for iommu_probe_device()

 arch/powerpc/include/asm/iommu.h   |   5 -
 arch/powerpc/kernel/iommu.c|  17 -
 arch/powerpc/platforms/powernv/pci.c   |  25 --
 arch/powerpc/platforms/pseries/iommu.c |  25 --
 drivers/acpi/scan.c|   2 +-
 drivers/iommu/intel/iommu.c|   7 -
 drivers/iommu/iommu-sysfs.c|   8 -
 drivers/iommu/iommu.c  | 411 +
 drivers/iommu/of_iommu.c   |   2 +-
 9 files changed, 212 insertions(+), 290 deletions(-)


base-commit: 172314c88ed17bd838404d837bfb256d9bfd4e3d
-- 
2.40.0



[PATCH 01/11] iommu: Have __iommu_probe_device() check for already probed devices

2023-04-19 Thread Jason Gunthorpe
This is a step toward making __iommu_probe_device() self contained.

It should, under proper locking, check if the device is already associated
with an iommu driver and resolve parallel probes. All but one of the
callers open code this test using two different means, but they all
rely on dev->iommu_group.

Currently the bus_iommu_probe()/probe_iommu_group() and
probe_acpi_namespace_devices() rejects already probed devices with an
unlocked read of dev->iommu_group. The OF and ACPI "replay" functions use
device_iommu_mapped() which is the same read without the pointless
refcount.

Move this test into __iommu_probe_device() and put it under the
iommu_probe_device_lock. The store to dev->iommu_group is in
iommu_group_add_device() which is also called under this lock for iommu
driver devices, making it properly locked.

The only path that didn't have this check is the hotplug path triggered by
BUS_NOTIFY_ADD_DEVICE. The only way to get dev->iommu_group assigned
outside the probe path is via iommu_group_add_device(). Today there are
only three callers, VFIO no-iommu, powernv and power pseries - none of
these cases probe iommu drivers. Thus adding this additional check is
safe.

Signed-off-by: Jason Gunthorpe 
---
 drivers/acpi/scan.c |  2 +-
 drivers/iommu/intel/iommu.c |  7 ---
 drivers/iommu/iommu.c   | 19 +--
 drivers/iommu/of_iommu.c|  2 +-
 4 files changed, 11 insertions(+), 19 deletions(-)

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 0c6f06abe3f47f..945866f3bd8ebd 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1579,7 +1579,7 @@ static const struct iommu_ops 
*acpi_iommu_configure_id(struct device *dev,
 * If we have reason to believe the IOMMU driver missed the initial
 * iommu_probe_device() call for dev, replay it to get things in order.
 */
-   if (!err && dev->bus && !device_iommu_mapped(dev))
+   if (!err && dev->bus)
err = iommu_probe_device(dev);
 
/* Ignore all other errors apart from EPROBE_DEFER */
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index b871a6afd80321..3c37b50c121c2d 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3763,7 +3763,6 @@ static int __init probe_acpi_namespace_devices(void)
for_each_active_dev_scope(drhd->devices,
  drhd->devices_cnt, i, dev) {
struct acpi_device_physical_node *pn;
-   struct iommu_group *group;
struct acpi_device *adev;
 
if (dev->bus != _bus_type)
@@ -3773,12 +3772,6 @@ static int __init probe_acpi_namespace_devices(void)
mutex_lock(&adev->physical_node_lock);
list_for_each_entry(pn,
&adev->physical_node_list, node) {
-   group = iommu_group_get(pn->dev);
-   if (group) {
-   iommu_group_put(group);
-   continue;
-   }
-
ret = iommu_probe_device(pn->dev);
if (ret)
break;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 6bd275fb640441..c486e648402d5c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -347,9 +347,16 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
 * but for now enforcing a simple global ordering is fine.
 */
mutex_lock(&iommu_probe_device_lock);
+
+   /* Device is probed already if in a group */
+   if (dev->iommu_group) {
+   ret = 0;
+   goto out_unlock;
+   }
+
if (!dev_iommu_get(dev)) {
ret = -ENOMEM;
-   goto err_unlock;
+   goto out_unlock;
}
 
if (!try_module_get(ops->owner)) {
@@ -395,7 +402,7 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
 err_free:
dev_iommu_free(dev);
 
-err_unlock:
+out_unlock:
mutex_unlock(&iommu_probe_device_lock);
 
return ret;
@@ -1707,16 +1714,8 @@ struct iommu_domain *iommu_group_default_domain(struct iommu_group *group)
 static int probe_iommu_group(struct device *dev, void *data)
 {
struct list_head *group_list = data;
-   struct iommu_group *group;
int ret;
 
-   /* Device is probed already if in a group */
-   group = iommu_group_get(dev);
-   if (group) {
-   iommu_group_put(group);
-   return 0;
-   }
-
ret = __iommu_probe_device(dev, group_list);
if (ret == -ENODEV)
ret = 0;
diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 40f57d293a79d4..157b286e36bf3a 100644
--- a/drivers/iommu/of_iommu.c

[PATCH 07/11] iommu: Do not export iommu_device_link/unlink()

2023-04-19 Thread Jason Gunthorpe
These are not used outside iommu.c, they should not be available to
modular code.

Signed-off-by: Jason Gunthorpe 
---
 drivers/iommu/iommu-sysfs.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/iommu/iommu-sysfs.c b/drivers/iommu/iommu-sysfs.c
index c8aba0e2a30d70..cbe378c34ba3eb 100644
--- a/drivers/iommu/iommu-sysfs.c
+++ b/drivers/iommu/iommu-sysfs.c
@@ -119,11 +119,9 @@ int iommu_device_link(struct iommu_device *iommu, struct device *link)
 
return ret;
 }
-EXPORT_SYMBOL_GPL(iommu_device_link);
 
 void iommu_device_unlink(struct iommu_device *iommu, struct device *link)
 {
sysfs_remove_link(&link->kobj, "iommu");
sysfs_remove_link_from_group(&iommu->dev->kobj, "devices",
dev_name(link));
 }
-EXPORT_SYMBOL_GPL(iommu_device_unlink);
-- 
2.40.0



[PATCH v2] powerpc/iommu: DMA address offset is incorrectly calculated with 2MB TCEs

2023-04-19 Thread Gaurav Batra
When DMA window is backed by 2MB TCEs, the DMA address for the mapped
page should be the offset of the page relative to the 2MB TCE. The code
was incorrectly setting the DMA address to the beginning of the TCE
range.

Mellanox driver is reporting timeout trying to ENABLE_HCA for an SR-IOV
ethernet port, when DMA window is backed by 2MB TCEs.

Fixes: 3872731187141d5d0a5c4fb30007b8b9ec36a44d
Signed-off-by: Gaurav Batra 

Reviewed-by: Greg Joyce 
Reviewed-by: Brian King 
---
 arch/powerpc/kernel/iommu.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index ee95937bdaf1..ca57526ce47a 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -517,7 +517,7 @@ int ppc_iommu_map_sg(struct device *dev, struct iommu_table *tbl,
/* Convert entry to a dma_addr_t */
entry += tbl->it_offset;
dma_addr = entry << tbl->it_page_shift;
-   dma_addr |= (s->offset & ~IOMMU_PAGE_MASK(tbl));
+   dma_addr |= (vaddr & ~IOMMU_PAGE_MASK(tbl));
 
DBG("  - %lu pages, entry: %lx, dma_addr: %lx\n",
npages, entry, dma_addr);
@@ -904,6 +904,7 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
unsigned int order;
unsigned int nio_pages, io_order;
struct page *page;
+   int tcesize = (1 << tbl->it_page_shift);
 
size = PAGE_ALIGN(size);
order = get_order(size);
@@ -930,7 +931,8 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
memset(ret, 0, size);
 
/* Set up tces to cover the allocated range */
-   nio_pages = size >> tbl->it_page_shift;
+   nio_pages = IOMMU_PAGE_ALIGN(size, tbl) >> tbl->it_page_shift;
+
io_order = get_iommu_order(size, tbl);
mapping = iommu_alloc(dev, tbl, ret, nio_pages, DMA_BIDIRECTIONAL,
  mask >> tbl->it_page_shift, io_order, 0);
@@ -938,7 +940,8 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
free_pages((unsigned long)ret, order);
return NULL;
}
-   *dma_handle = mapping;
+
+   *dma_handle = mapping | ((u64)ret & (tcesize - 1));
return ret;
 }
 
-- 
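
To make the offset arithmetic above concrete — a hedged illustration
with made-up numbers, ignoring tbl->it_offset for simplicity:

	/* 2MB TCEs: tbl->it_page_shift = 21, ~IOMMU_PAGE_MASK(tbl) = 0x1fffff */
	unsigned long vaddr = 0x10003000;	/* address being mapped */
	long entry = vaddr >> 21;		/* TCE index 0x80 */
	dma_addr_t dma_addr = entry << 21;	/* 0x10000000, start of the TCE */

	dma_addr |= vaddr & 0x1fffff;		/* 0x10003000 with the fix */

The old code OR'ed in s->offset, the offset within the 4KB page (0 for
a page-aligned buffer), so it returned 0x10000000 and silently dropped
the 0x3000. The iommu_alloc_coherent() hunk applies the same idea to
*dma_handle via (u64)ret & (tcesize - 1).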



Re: [PATCH] ASoC: fsl: Simplify an error message

2023-04-19 Thread Mark Brown
On Sun, 16 Apr 2023 08:29:34 +0200, Christophe JAILLET wrote:
> dev_err_probe() already display the error code. There is no need to
> duplicate it explicitly in the error message.
> 
> 

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] ASoC: fsl: Simplify an error message
  commit: 574399f4c997ad71fab95dd875a9ff55424f9a3d

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark



Re: [PATCH] ASoC: fsl_asrc_dma: fix potential null-ptr-deref

2023-04-19 Thread Mark Brown
On Mon, 17 Apr 2023 06:32:42 -0700, Nikita Zhandarovich wrote:
> dma_request_slave_channel() may return NULL which will lead to
> NULL pointer dereference error in 'tmp_chan->private'.
> 
> Correct this behaviour by, first, switching from deprecated function
> dma_request_slave_channel() to dma_request_chan(). Secondly, enable
> sanity check for the resulting value of dma_request_chan().
> Also, fix description that follows the enacted changes and that
> concerns the use of dma_request_slave_channel().
> 
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] ASoC: fsl_asrc_dma: fix potential null-ptr-deref
  commit: 86a24e99c97234f87d9f70b528a691150e145197

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark



Re: [PATCH] ASoC: fsl_sai: Fix pins setting for i.MX8QM platform

2023-04-19 Thread Mark Brown
On Tue, 18 Apr 2023 17:42:59 +0800, Chancel Liu wrote:
> SAI on i.MX8QM platform supports the data lines up to 4. So the pins
> setting should be corrected to 4.
> 
> 

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] ASoC: fsl_sai: Fix pins setting for i.MX8QM platform
  commit: 238787157d83969e5149c8e99787d5d90e85fbe5

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark



Re: [PATCH 08/21] riscv: dma-mapping: only invalidate after DMA, not flush

2023-04-19 Thread Palmer Dabbelt

On Mon, 27 Mar 2023 05:13:04 PDT (-0700), a...@kernel.org wrote:

From: Arnd Bergmann 

No other architecture intentionally writes back dirty cache lines into
a buffer that a device has just finished writing into. If the cache is
clean, this has no effect at all, but if a cacheline in the buffer has
actually been written by the CPU, there is a driver bug that is likely
made worse by overwriting that buffer.

Signed-off-by: Arnd Bergmann 
---
 arch/riscv/mm/dma-noncoherent.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
index d919efab6eba..640f4c496d26 100644
--- a/arch/riscv/mm/dma-noncoherent.c
+++ b/arch/riscv/mm/dma-noncoherent.c
@@ -42,7 +42,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
break;
case DMA_FROM_DEVICE:
case DMA_BIDIRECTIONAL:
-   ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
+   ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
break;
default:
break;


Acked-by: Palmer Dabbelt 


Re: [PATCH 09/21] riscv: dma-mapping: skip invalidation before bidirectional DMA

2023-04-19 Thread Palmer Dabbelt

On Mon, 27 Mar 2023 05:13:05 PDT (-0700), a...@kernel.org wrote:

From: Arnd Bergmann 

For a DMA_BIDIRECTIONAL transfer, the caches have to be cleaned
first to let the device see data written by the CPU, and invalidated
after the transfer to let the CPU see data written by the device.

riscv also invalidates the caches before the transfer, which does
not appear to serve any purpose.

Signed-off-by: Arnd Bergmann 
---
 arch/riscv/mm/dma-noncoherent.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
index 640f4c496d26..69c80b2155a1 100644
--- a/arch/riscv/mm/dma-noncoherent.c
+++ b/arch/riscv/mm/dma-noncoherent.c
@@ -25,7 +25,7 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
break;
case DMA_BIDIRECTIONAL:
-   ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
+   ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
break;
default:
break;


Acked-by: Palmer Dabbelt 
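
Taken together with the previous patch, the riscv cache maintenance for
the cases these two patches touch works out as below — a hedged summary
using the ALT_CMO_OP() operation names, not an exhaustive description
of the file:

  direction          arch_sync_dma_for_device   arch_sync_dma_for_cpu
  DMA_TO_DEVICE      clean (writeback)          nothing
  DMA_FROM_DEVICE    (untouched by this pair)   inval
  DMA_BIDIRECTIONAL  clean (writeback)          inval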


Re: [PATCH v3 5/6] PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem dev handler

2023-04-19 Thread Robert Richter
Bjorn,

On 18.04.23 00:00:58, Robert Richter wrote:
> On 14.04.23 16:32:54, Bjorn Helgaas wrote:
> > On Thu, Apr 13, 2023 at 01:40:52PM +0200, Robert Richter wrote:
> > > On 12.04.23 17:02:33, Bjorn Helgaas wrote:
> > > > On Tue, Apr 11, 2023 at 01:03:01PM -0500, Terry Bowman wrote:

> > I'm mostly interested in the PCI entities involved because that's all
> > aer.c can deal with.  For the above, I think the PCI core only knows
> > about these:
> > 
> >   00:00.0 RCEC  with AER, RCEC EA includes 00:01.0
> >   00:01.0 RCiEP with AER
> > 
> > aer_irq() would handle AER interrupts from 00:00.0.
> > cxl_handle_error() would be called for 00:00.0 and would call
> > handle_error_source() for everything below it (only 00:01.0 here).
> > 
> > > > The current code uses pcie_walk_rcec() in this path, which basically
> > > > searches below a Root Port or RCEC for devices that have an AER error
> > > > status bit set, add them to the e_info[] list, and call
> > > > handle_error_source() for each one:
> > > 
> > > For reference, this series adds support to handle RCH downstream
> > > port-detected errors as described in CXL 3.0, 12.2.1.1.
> > > 
> > > This flow looks correct to me, see comments inline.
> > 
> > We seem to be on the same page here, so I'll trim it out.
> > 
> > > ...
> > > > So we insert cxl_handle_error() in handle_error_source(), where it
> > > > gets called for the RCEC, and then it uses pcie_walk_rcec() again to
> > > > forcibly call handle_error_source() for *every* device "below" the
> > > > RCEC (even though they don't have AER error status bits set).
> > > 
> > > The CXL device contains the links to the dport's caps. Also, there can
> > > be multiple RCs with CXL devs connected to them. So we must search for
> > > all CXL devices now, determine the corresponding dport and inspect
> > > both the PCIe AER and CXL RAS caps.
> > > 
> > > > Then handle_error_source() ultimately calls the CXL driver err_handler
> > > > entry points (.cor_error_detected(), .error_detected(), etc), which
> > > > can look at the CXL-specific error status in the CXL RAS or RCRB or
> > > > whatever.
> > > 
> > > The AER driver (portdrv) does not have the knowledge of CXL internals.
> > > Thus the approach is to pass dport errors to the cxl_mem driver to
> > > handle it there in addition to cxl mem dev errors.
> > > 
> > > > So this basically looks like a workaround for the fact that the AER
> > > > code only calls handle_error_source() when it finds AER error status,
> > > > and CXL doesn't *set* that AER error status.  There's not that much
> > > > code here, but it seems like a quite a bit of complexity in an area
> > > > that is already pretty complicated.
> > 
> > My main point here (correct me if I got this wrong) is that:
> > 
> >   - A RCEC generates an AER interrupt
> > 
> >   - find_source_device() searches all devices below the RCEC and
> > builds a list everything for which to call handle_error_source()
> 
> find_source_device() does not walk the RCEC if the error source is the
> RCEC itself (note that find_device_iter() is called for the root/rcec
> device first and then exits early).
> 
> > 
> >   - cxl_handle_error() *again* looks at all devices below the same
> > RCEC and calls handle_error_source() for each one
> > 
> > So the main difference here is that the existing flow only calls
> > handle_error_source() when it finds an error logged in an AER status
> > register, while the new CXL flow calls handle_error_source() for
> > *every* device below the RCEC.
> 
> That is limited as much as possible:
> 
>  * The RCEC walk to handle CXL dport errors is done only in case of
>internal errors, for an RCEC only (not a port) (check in
>cxl_handle_error()).
> 
>  * Internal errors are only enabled for RCECs connected to CXL devices
>(handles_cxl_errors()).
> 
>  * The handler is only called if it is a CXL memory device (class code
>set and zero devfn) (check in cxl_handle_error_iter()).
> 
> An optimization I see here is to convert some runtime checks to cached
> values determined during device enumeration (CXL device list, RCEC is
> associated with CXL devices). Some sort of RCEC-to-CXL-dev
> association, similar to rcec->rcec_ea.
> 
> > 
> > I think it's OK to do that, but the almost recursive structure and the
> > unusual reference counting make the overall AER flow much harder to
> > understand.
> > 
> > What if we changed is_error_source() to add every CXL.mem device it
> > finds to the e_info[] list, which I think could nicely encapsulate the
> > idea that "CXL devices have error state we don't know how to interpret
> > here"?  Would the existing loop in aer_process_err_devices() then do
> > what you need?
> 
> I did not want to mix this with devices determined by the Error Source
> Identification Register. CXL device may not be the error source of an
> error which may cause some unwanted side-effects. We must also touch
> AER_MAX_MULTI_ERR_DEVICES then and how the dev list is implemented 

Re: [PATCH v3 5/6] PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem dev handler

2023-04-19 Thread Robert Richter
Dan,

thanks for review, see comments inline.

On 17.04.23 18:01:41, Dan Williams wrote:
> Terry Bowman wrote:
> > From: Robert Richter 
> > 
> > In Restricted CXL Device (RCD) mode a CXL device is exposed as an
> > RCiEP, but CXL downstream and upstream ports are not enumerated and
> > not visible in the PCIe hierarchy. Protocol and link errors are sent
> > to an RCEC.
> > 
> > Restricted CXL host (RCH) downstream port-detected errors are signaled
> > as internal AER errors, either Uncorrectable Internal Error (UIE) or
> > Corrected Internal Errors (CIE). The error source is the id of the
> > RCEC. A CXL handler must then inspect the error status in various CXL
> > registers residing in the dport's component register space (CXL RAS
> > cap) or the dport's RCRB (AER ext cap). [1]
> > 
> > Errors showing up in the RCEC's error handler must be handled and
> > connected to the CXL subsystem. Implement this by forwarding the error
> > to all CXL devices below the RCEC. Since the entire CXL device is
> > controlled only using PCIe Configuration Space of device 0, Function
> > 0, only pass it there [2]. These devices have the Memory Device class
> > code set (PCI_CLASS_MEMORY_CXL, 502h) and the existing cxl_pci driver
> > can implement the handler. In addition to errors directed to the CXL
> > endpoint device, the handler must also inspect the CXL downstream
> > port's CXL RAS and PCIe AER external capabilities that is connected to
> > the device.
> > 
> > Since CXL downstream port errors are signaled using internal errors,
> > the handler requires those errors to be unmasked. This is the subject of a
> > follow-on patch.
> > 
> > The reason for choosing this implementation is that a CXL RCEC device
> > is bound to the AER port driver, but the driver does not allow it to
> > register a custom specific handler to support CXL. Connecting the RCEC
> > hard-wired with a CXL handler does not work, as the CXL subsystem
> > might not be present all the time. The alternative to add an
> > implementation to the portdrv to allow the registration of a custom
> > RCEC error handler isn't worth doing as CXL would be its only user.
> > Instead, just check for an CXL RCEC and pass it down to the connected
> > CXL device's error handler. With this approach the code can entirely
> > be implemented in the PCIe AER driver and is independent of the CXL
> > subsystem. The CXL driver only provides the handler.
> > 
> > [1] CXL 3.0 spec, 12.2.1.1 RCH Downstream Port-detected Errors
> > [2] CXL 3.0 spec, 8.1.3 PCIe DVSEC for CXL Devices
> > 
> > Co-developed-by: Terry Bowman 
> > Signed-off-by: Robert Richter 
> > Signed-off-by: Terry Bowman 
> > Cc: "Oliver O'Halloran" 
> > Cc: Bjorn Helgaas 
> > Cc: Mahesh J Salgaonkar 
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Cc: linux-...@vger.kernel.org
> > ---
> >  drivers/pci/pcie/Kconfig |  8 ++
> >  drivers/pci/pcie/aer.c   | 61 
> >  2 files changed, 69 insertions(+)
> > 
> > diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
> > index 228652a59f27..b0dbd864d3a3 100644
> > --- a/drivers/pci/pcie/Kconfig
> > +++ b/drivers/pci/pcie/Kconfig
> > @@ -49,6 +49,14 @@ config PCIEAER_INJECT
> >   gotten from:
> >  
> > https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/
> >  
> > +config PCIEAER_CXL
> > +   bool "PCI Express CXL RAS support"
> > +   default y
> > +   depends on PCIEAER && CXL_PCI
> > +   help
> > + This enables CXL error handling for Restricted CXL Hosts
> > + (RCHs).
> > +
> >  #
> >  # PCI Express ECRC
> >  #
> > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > index 7a25b62d9e01..171a08fd8ebd 100644
> > --- a/drivers/pci/pcie/aer.c
> > +++ b/drivers/pci/pcie/aer.c
> > @@ -946,6 +946,65 @@ static bool find_source_device(struct pci_dev *parent,
> > return true;
> >  }
> >  
> > +#ifdef CONFIG_PCIEAER_CXL
> > +
> > +static bool is_cxl_mem_dev(struct pci_dev *dev)
> > +{
> > +   /*
> > +* A CXL device is controlled only using PCIe Configuration
> > +* Space of device 0, Function 0.
> > +*/
> > +   if (dev->devfn != PCI_DEVFN(0, 0))
> > +   return false;
> > +
> > +   /* Right now there is only a CXL.mem driver */
> > +   if ((dev->class >> 8) != PCI_CLASS_MEMORY_CXL)
> > +   return false;
> > +
> > +   return true;
> > +}
> 
> This part feels broken because most of the errors of concern here are CXL
> link generic and that can involve CXL.cache and CXL.mem errors on
> devices that are not PCI_CLASS_MEMORY_CXL. This situation feels like it
> wants formal acknowledgement in 'struct pci_dev' that CXL links ride on
> top of PCIe links.

There is already rcec->rcec_ea that holds the RCEC-to-endpoint
association. Determining if the RCiEP is a CXL dev is a small check
which is exactly what is_cxl_mem_dev() is for. I don't see a benefit
in holding the same information in an additional cxl_link structure.

And as you also said below, for RCRB handling a 

Re: [PATCH] Revert "ASoC: fsl: remove unnecessary dai_link->platform"

2023-04-19 Thread Mark Brown
On Wed, Apr 19, 2023 at 06:29:18PM +0800, Shengjiu Wang wrote:
> This reverts commit 33683cbf49b5412061cb1e4c876063fdef86def4.

Please include human readable descriptions of things like commits and
issues being discussed in e-mail in your mails, this makes them much
easier for humans to read especially when they have no internet access.
I do frequently catch up on my mail on flights or while otherwise
travelling so this is even more pressing for me than just being about
making things a bit easier to read.

Please submit patches using subject lines reflecting the style for the
subsystem, this makes it easier for people to identify relevant patches.
Look at what existing commits in the area you're changing are doing and
make sure your subject lines visually resemble what they're doing.
There's no need to resubmit to fix this alone.


signature.asc
Description: PGP signature


Re: [PATCH 3/3] mm/mmu_gather: send tlb_remove_table_smp_sync IPI only to CPUs in kernel mode

2023-04-19 Thread Marcelo Tosatti
On Wed, Apr 19, 2023 at 01:30:57PM +0200, David Hildenbrand wrote:
> On 06.04.23 20:27, Peter Zijlstra wrote:
> > On Thu, Apr 06, 2023 at 05:51:52PM +0200, David Hildenbrand wrote:
> > > On 06.04.23 17:02, Peter Zijlstra wrote:
> > 
> > > > DavidH, what do you think about reviving Jann's patches here:
> > > > 
> > > > https://bugs.chromium.org/p/project-zero/issues/detail?id=2365#c1
> > > > 
> > > > Those are far more invasive, but afaict they seem to do the right thing.
> > > > 
> > > 
> > > I recall seeing those while discussed on secur...@kernel.org. What we
> > > currently have was (IMHO for good reasons) deemed better to fix the issue,
> > > especially when caring about backports and getting it right.
> > 
> > Yes, and I think that was the right call. However, we can now revisit
> > without having the pressure of a known defect and backport
> > considerations.
> > 
> > > The alternative that was discussed in that context IIRC was to simply
> > > allocate a fresh page table, place the fresh page table into the list
> > > instead, and simply free the old page table (then using common machinery).
> > > 
> > > TBH, I'd wish (and recently raised) that we could just stop wasting memory
> > > on page tables for THPs that are maybe never going to get PTE-mapped ... and
> > > eventually just allocate on demand (with some caching?) and handle the
> > > places where we're OOM and cannot PTE-map a THP in some decent way.
> > > 
> > > ... instead of trying to figure out how to deal with these page tables we
> > > cannot free but have to special-case simply because of GUP-fast.
> > 
> > Not keeping them around sounds good to me, but I'm not *that* familiar
> > with the THP code, most of that happened after I stopped tracking mm. So
> > I'm not sure how feasible it is.
> > 
> > But it does look entirely feasible to rework this page-table freeing
> > along the lines Jann did.
> 
> It's most probably more feasible, although the easiest would be to just
> allocate a fresh page table to deposit and free the old one using the mmu
> gatherer.
> 
> This way we can avoid the khugepaged usage of tlb_remove_table_smp_sync(), but not
> the tlb_remove_table_one() usage. I suspect khugepaged isn't really relevant
> in RT kernels (IIRC, most of RT setups disable THP completely).

People will disable khugepaged because it causes IPIs (and the fact one
has to disable khugepaged is a configuration overhead, and a source of
headache for configuring the realtime system, since one can forget to
do that, etc).

But people do want to run non-RT applications along with RT applications
(in case you have a single box in a privileged location, for example).

> 
> tlb_remove_table_one() only triggers if __get_free_page(GFP_NOWAIT |
> __GFP_NOWARN); fails. IIUC, that can happen easily under memory pressure
> because it doesn't wait for direct reclaim.
> 
> I don't know much about RT workloads (so I'd appreciate some feedback), but
> I guess we can run int memory pressure as well due to some !rt housekeeping
> task on the system?

Yes, exactly (memory for -RT app will be mlocked).



Re: [PATCH 3/3] mm/mmu_gather: send tlb_remove_table_smp_sync IPI only to CPUs in kernel mode

2023-04-19 Thread David Hildenbrand

On 06.04.23 20:27, Peter Zijlstra wrote:

On Thu, Apr 06, 2023 at 05:51:52PM +0200, David Hildenbrand wrote:

On 06.04.23 17:02, Peter Zijlstra wrote:



DavidH, what do you think about reviving Jann's patches here:

https://bugs.chromium.org/p/project-zero/issues/detail?id=2365#c1

Those are far more invasive, but afaict they seem to do the right thing.



I recall seeing those while discussed on secur...@kernel.org. What we
currently have was (IMHO for good reasons) deemed better to fix the issue,
especially when caring about backports and getting it right.


Yes, and I think that was the right call. However, we can now revisit
without having the pressure of a known defect and backport
considerations.


The alternative that was discussed in that context IIRC was to simply
allocate a fresh page table, place the fresh page table into the list
instead, and simply free the old page table (then using common machinery).

TBH, I'd wish (and recently raised) that we could just stop wasting memory
on page tables for THPs that are maybe never going to get PTE-mapped ... and
eventually just allocate on demand (with some caching?) and handle the
places where we're OOM and cannot PTE-map a THP in some decent way.

... instead of trying to figure out how to deal with these page tables we
cannot free but have to special-case simply because of GUP-fast.


Not keeping them around sounds good to me, but I'm not *that* familiar
with the THP code, most of that happened after I stopped tracking mm. So
I'm not sure how feasible it is.

But it does look entirely feasible to rework this page-table freeing
along the lines Jann did.


It's most probably more feasible, although the easiest would be to just 
allocate a fresh page table to deposit and free the old one using the 
mmu gatherer.
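
A hedged sketch of that idea using the existing deposit/withdraw
helpers — assuming a caller that already holds the appropriate locks
and has mm, pmdp and a struct mmu_gather *tlb at hand (pgtable_t is
arch-specific, e.g. a struct page * on x86):

	pgtable_t fresh = pte_alloc_one(mm);

	if (fresh) {
		/* swap the deposited page table for a fresh one ... */
		pgtable_t old = pgtable_trans_huge_withdraw(mm, pmdp);

		pgtable_trans_huge_deposit(mm, pmdp, fresh);
		/* ... and let the mmu gatherer free the old one safely */
		tlb_remove_table(tlb, old);
	}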


This way we can avoid the khugepaged usage of tlb_remove_table_smp_sync(), but
not the tlb_remove_table_one() usage. I suspect khugepaged isn't really 
relevant in RT kernels (IIRC, most of RT setups disable THP completely).


tlb_remove_table_one() only triggers if __get_free_page(GFP_NOWAIT | 
__GFP_NOWARN); fails. IIUC, that can happen easily under memory pressure 
because it doesn't wait for direct reclaim.


I don't know much about RT workloads (so I'd appreciate some feedback), 
but I guess we can run into memory pressure as well due to some !rt
housekeeping task on the system?


--
Thanks,

David / dhildenb



Re: [PATCH v3 02/14] arm64: drop ranges in definition of ARCH_FORCE_MAX_ORDER

2023-04-19 Thread Justin Forbes
On Wed, Apr 19, 2023 at 6:12 AM Catalin Marinas  wrote:
>
> On Tue, Apr 18, 2023 at 03:05:57PM -0700, Andrew Morton wrote:
> > On Wed, 12 Apr 2023 18:27:08 +0100 Catalin Marinas 
> >  wrote:
> > > > It sounds nice in theory. In practice, EXPERT hides too much. When you
> > > > flip expert, you expose over a 175ish new config options which are
> > > > hidden behind EXPERT.  You don't have to know what you are doing just
> > > > with the MAX_ORDER, but a whole bunch more as well.  If everyone were
> > > > already running 10, this might be less of a problem. At least Fedora
> > > > and RHEL are running 13 for 4K pages on aarch64. This was not some
> > > > accidental choice, we had to carry a patch to even allow it for a
> > > > while.  If this does go in as is, we will likely just carry a patch to
> > > > remove the "if EXPERT", but that is a bit of a disservice to users who
> > > > might be trying to debug something else upstream, bisecting upstream
> > > > kernels or testing a patch.  In those cases, people tend to use
> > > > pristine upstream sources without distro patches to verify, and they
> > > > tend to use their existing configs. With this change, their MAX_ORDER
> > > > will drop to 10 from 13 silently.   That can look like a different
> > > > issue enough to ruin a bisect or have them give bad feedback on a
> > > > patch because it introduces a "regression" which is not a regression
> > > > at all, but a config change they couldn't see.
> > >
> > > If we remove EXPERT (as prior to this patch), I'd rather keep the ranges
> > > and avoid having to explain to people why some random MAX_ORDER doesn't
> > > build (keeping the range would also make sense for randconfig, not sure
> > > we got to any conclusion there).
> >
> > Well this doesn't seem to have got anywhere.  I think I'll send the
> > patchset into Linus for the next merge window as-is.  Please let's take
> > a look at this Kconfig presentation issue during the following -rc
> > cycle.
>
> That's fine by me. I have a slight preference to drop EXPERT and keep
> the ranges in, especially if it affects current distro kernels. Debian
> seems to enable EXPERT already in their arm64 kernel config but I'm not
> sure about the Fedora or other distro kernels. If they don't, we can
> fix/revert this Kconfig entry once the merging window is closed.

Fedora and RHEL do not enable EXPERT.

Justin


Re: [PATCH 3/3] mm/mmu_gather: send tlb_remove_table_smp_sync IPI only to CPUs in kernel mode

2023-04-19 Thread Marcelo Tosatti
On Thu, Apr 06, 2023 at 03:32:06PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 06, 2023 at 09:49:22AM -0300, Marcelo Tosatti wrote:
> 
> > > > 2) Depends on the application and the definition of "occasional".
> > > > 
> > > > For certain types of applications (for example PLC software or
> > > > RAN processing), upon occurrence of an event, it is necessary to
> > > > complete a certain task in a maximum amount of time (deadline).
> > > 
> > > If the application is properly NOHZ_FULL and never does a kernel entry,
> > > it will never get that IPI. If it is a pile of shit and does kernel
> > > entries while it pretends to be NOHZ_FULL it gets to keep the pieces and
> > > no amount of crying will get me to care.
> > 
> > I suppose it's common practice to use certain system calls in latency
> > sensitive applications, for example nanosleep. Some examples:
> > 
> > 1) cyclictest   (nanosleep)
> 
cyclictest is not a NOHZ_FULL application, if you think it is, you're
> deluded.

In the field (what end-users do in production):

cyclictest runs on NOHZ_FULL cores.
PLC type programs run on NOHZ_FULL cores.

So according to the physical reality I observe, I am not deluded.

> > 2) PLC programs (nanosleep)
> 
> What's a PLC? Programmable Logic Circuit?

Programmable logic controller.

> > A system call does not necessarily have to take locks, does it ?
> 
> This all is unrelated to locks

OK.

> > Or even if the application does system calls, but runs under a VM,
> > then you are requiring it to never VM-exit.
> 
> That seems to be a goal for performance anyway.

Not sure what you mean.

> > This reduces the flexibility of developing such applications.
> 
> Yeah, that's the cards you're dealt, deal with it.

This is not what happens in the field.



[PATCH] Revert "ASoC: fsl: remove unnecessary dai_link->platform"

2023-04-19 Thread Shengjiu Wang
This reverts commit 33683cbf49b5412061cb1e4c876063fdef86def4.

dai_link->platform is needed. The platform component is
"snd_dmaengine_pcm", which is registered from the CPU driver.

If dai_link->platform is not assigned, then the platform
component will not be probed, and there will be an issue:

aplay: main:831: audio open error: Invalid argument

Signed-off-by: Shengjiu Wang 
---
 sound/soc/fsl/imx-audmix.c | 14 ++
 sound/soc/fsl/imx-spdif.c  |  5 -
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/sound/soc/fsl/imx-audmix.c b/sound/soc/fsl/imx-audmix.c
index 2c57fe9d2d08..1292a845c424 100644
--- a/sound/soc/fsl/imx-audmix.c
+++ b/sound/soc/fsl/imx-audmix.c
@@ -207,8 +207,8 @@ static int imx_audmix_probe(struct platform_device *pdev)
for (i = 0; i < num_dai; i++) {
struct snd_soc_dai_link_component *dlc;
 
-   /* for CPU/Codec x 2 */
-   dlc = devm_kcalloc(&pdev->dev, 4, sizeof(*dlc), GFP_KERNEL);
+   /* for CPU/Codec/Platform x 2 */
+   dlc = devm_kcalloc(&pdev->dev, 6, sizeof(*dlc), GFP_KERNEL);
if (!dlc)
return -ENOMEM;
 
@@ -240,9 +240,11 @@ static int imx_audmix_probe(struct platform_device *pdev)
 
priv->dai[i].cpus = &dlc[0];
priv->dai[i].codecs = &dlc[1];
+   priv->dai[i].platforms = &dlc[2];
 
priv->dai[i].num_cpus = 1;
priv->dai[i].num_codecs = 1;
+   priv->dai[i].num_platforms = 1;
 
priv->dai[i].name = dai_name;
priv->dai[i].stream_name = "HiFi-AUDMIX-FE";
@@ -250,6 +252,7 @@ static int imx_audmix_probe(struct platform_device *pdev)
priv->dai[i].codecs->name = "snd-soc-dummy";
priv->dai[i].cpus->of_node = args.np;
priv->dai[i].cpus->dai_name = dev_name(&cpu_pdev->dev);
+   priv->dai[i].platforms->of_node = args.np;
priv->dai[i].dynamic = 1;
priv->dai[i].dpcm_playback = 1;
priv->dai[i].dpcm_capture = (i == 0 ? 1 : 0);
@@ -264,17 +267,20 @@ static int imx_audmix_probe(struct platform_device *pdev)
be_cp = devm_kasprintf(&pdev->dev, GFP_KERNEL,
   "AUDMIX-Capture-%d", i);
 
-   priv->dai[num_dai + i].cpus = &dlc[2];
-   priv->dai[num_dai + i].codecs = &dlc[3];
+   priv->dai[num_dai + i].cpus = &dlc[3];
+   priv->dai[num_dai + i].codecs = &dlc[4];
+   priv->dai[num_dai + i].platforms = &dlc[5];
 
priv->dai[num_dai + i].num_cpus = 1;
priv->dai[num_dai + i].num_codecs = 1;
+   priv->dai[num_dai + i].num_platforms = 1;
 
priv->dai[num_dai + i].name = be_name;
priv->dai[num_dai + i].codecs->dai_name = "snd-soc-dummy-dai";
priv->dai[num_dai + i].codecs->name = "snd-soc-dummy";
priv->dai[num_dai + i].cpus->of_node = audmix_np;
priv->dai[num_dai + i].cpus->dai_name = be_name;
+   priv->dai[num_dai + i].platforms->name = "snd-soc-dummy";
priv->dai[num_dai + i].no_pcm = 1;
priv->dai[num_dai + i].dpcm_playback = 1;
priv->dai[num_dai + i].dpcm_capture  = 1;
diff --git a/sound/soc/fsl/imx-spdif.c b/sound/soc/fsl/imx-spdif.c
index 114b49660193..4446fba755b9 100644
--- a/sound/soc/fsl/imx-spdif.c
+++ b/sound/soc/fsl/imx-spdif.c
@@ -26,7 +26,7 @@ static int imx_spdif_audio_probe(struct platform_device *pdev)
}
 
data = devm_kzalloc(&pdev->dev, sizeof(*data), GFP_KERNEL);
-   comp = devm_kzalloc(&pdev->dev, 2 * sizeof(*comp), GFP_KERNEL);
+   comp = devm_kzalloc(&pdev->dev, 3 * sizeof(*comp), GFP_KERNEL);
if (!data || !comp) {
ret = -ENOMEM;
goto end;
@@ -34,15 +34,18 @@ static int imx_spdif_audio_probe(struct platform_device *pdev)
 
data->dai.cpus  = &comp[0];
data->dai.codecs= &comp[1];
+   data->dai.platforms = &comp[2];
 
data->dai.num_cpus  = 1;
data->dai.num_codecs= 1;
+   data->dai.num_platforms = 1;
 
data->dai.name = "S/PDIF PCM";
data->dai.stream_name = "S/PDIF PCM";
data->dai.codecs->dai_name = "snd-soc-dummy-dai";
data->dai.codecs->name = "snd-soc-dummy";
data->dai.cpus->of_node = spdif_np;
+   data->dai.platforms->of_node = spdif_np;
data->dai.playback_only = true;
data->dai.capture_only = true;
 
-- 
2.34.1



Re: [PATCH v3 02/14] arm64: drop ranges in definition of ARCH_FORCE_MAX_ORDER

2023-04-19 Thread Catalin Marinas
On Tue, Apr 18, 2023 at 03:05:57PM -0700, Andrew Morton wrote:
> On Wed, 12 Apr 2023 18:27:08 +0100 Catalin Marinas  
> wrote:
> > > It sounds nice in theory. In practice, EXPERT hides too much. When you
> > > flip expert, you expose over a 175ish new config options which are
> > > hidden behind EXPERT.  You don't have to know what you are doing just
> > > with the MAX_ORDER, but a whole bunch more as well.  If everyone were
> > > already running 10, this might be less of a problem. At least Fedora
> > > and RHEL are running 13 for 4K pages on aarch64. This was not some
> > > accidental choice, we had to carry a patch to even allow it for a
> > > while.  If this does go in as is, we will likely just carry a patch to
> > > remove the "if EXPERT", but that is a bit of a disservice to users who
> > > might be trying to debug something else upstream, bisecting upstream
> > > kernels or testing a patch.  In those cases, people tend to use
> > > pristine upstream sources without distro patches to verify, and they
> > > tend to use their existing configs. With this change, their MAX_ORDER
> > > will drop to 10 from 13 silently.   That can look like a different
> > > issue enough to ruin a bisect or have them give bad feedback on a
> > > patch because it introduces a "regression" which is not a regression
> > > at all, but a config change they couldn't see.
> > 
> > If we remove EXPERT (as prior to this patch), I'd rather keep the ranges
> > and avoid having to explain to people why some random MAX_ORDER doesn't
> > build (keeping the range would also make sense for randconfig, not sure
> > we got to any conclusion there).
> 
> Well this doesn't seem to have got anywhere.  I think I'll send the
> patchset into Linus for the next merge window as-is.  Please let's take
> a look at this Kconfig presentation issue during the following -rc
> cycle.

That's fine by me. I have a slight preference to drop EXPERT and keep
the ranges in, especially if it affects current distro kernels. Debian
seems to enable EXPERT already in their arm64 kernel config but I'm not
sure about the Fedora or other distro kernels. If they don't, we can
fix/revert this Kconfig entry once the merging window is closed.

-- 
Catalin


Re: [PATCH] ASoC: fsl: Simplify an error message

2023-04-19 Thread Shengjiu Wang
On Sun, Apr 16, 2023 at 2:29 PM Christophe JAILLET <
christophe.jail...@wanadoo.fr> wrote:

> dev_err_probe() already display the error code. There is no need to
> duplicate it explicitly in the error message.
>
> Signed-off-by: Christophe JAILLET 
>

Acked-by: Shengjiu Wang 

Best regards
wang shengjiu

> ---
>  sound/soc/fsl/fsl-asoc-card.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c
> index bffa1048d31e..40870668ee24 100644
> --- a/sound/soc/fsl/fsl-asoc-card.c
> +++ b/sound/soc/fsl/fsl-asoc-card.c
> @@ -858,7 +858,7 @@ static int fsl_asoc_card_probe(struct platform_device *pdev)
>
> ret = devm_snd_soc_register_card(&pdev->dev, &priv->card);
> if (ret) {
> -   dev_err_probe(&pdev->dev, ret, "snd_soc_register_card failed: %d\n", ret);
> +   dev_err_probe(&pdev->dev, ret, "snd_soc_register_card failed\n");
> goto asrc_fail;
> }
>
> --
> 2.34.1
>
>


Re: [PATCH] ASoC: fsl_asrc_dma: fix potential null-ptr-deref

2023-04-19 Thread Shengjiu Wang
On Mon, Apr 17, 2023 at 9:33 PM Nikita Zhandarovich <
n.zhandarov...@fintech.ru> wrote:

> dma_request_slave_channel() may return NULL which will lead to
> NULL pointer dereference error in 'tmp_chan->private'.
>
> Correct this behaviour by, first, switching from deprecated function
> dma_request_slave_channel() to dma_request_chan(). Secondly, enable
> sanity check for the resulting value of dma_request_chan().
> Also, fix description that follows the enacted changes and that
> concerns the use of dma_request_slave_channel().
>
> Fixes: 706e2c881158 ("ASoC: fsl_asrc_dma: Reuse the dma channel if
> available in Back-End")
> Co-developed-by: Natalia Petrova 
> Signed-off-by: Nikita Zhandarovich 
>

Acked-by: Shengjiu Wang 

Best regards
wang shengjiu

> ---
>  sound/soc/fsl/fsl_asrc_dma.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/sound/soc/fsl/fsl_asrc_dma.c b/sound/soc/fsl/fsl_asrc_dma.c
> index 3b81a465814a..05a7d1588d20 100644
> --- a/sound/soc/fsl/fsl_asrc_dma.c
> +++ b/sound/soc/fsl/fsl_asrc_dma.c
> @@ -209,14 +209,19 @@ static int fsl_asrc_dma_hw_params(struct snd_soc_component *component,
> be_chan = soc_component_to_pcm(component_be)->chan[substream->stream];
> tmp_chan = be_chan;
> }
> -   if (!tmp_chan)
> -   tmp_chan = dma_request_slave_channel(dev_be, tx ? "tx" : "rx");
> +   if (!tmp_chan) {
> +   tmp_chan = dma_request_chan(dev_be, tx ? "tx" : "rx");
> +   if (IS_ERR(tmp_chan)) {
> +   dev_err(dev, "failed to request DMA channel for Back-End\n");
> +   return -EINVAL;
> +   }
> +   }
>
> /*
>  * An EDMA DEV_TO_DEV channel is fixed and bound with DMA event of each
>  * peripheral, unlike SDMA channel that is allocated dynamically. So no
>  * need to configure dma_request and dma_request2, but get dma_chan of
> -* Back-End device directly via dma_request_slave_channel.
> +* Back-End device directly via dma_request_chan.
>  */
> if (!asrc->use_edma) {
> /* Get DMA request of Back-End */
>


Re: [PATCH] ASoC: fsl_sai: Fix pins setting for i.MX8QM platform

2023-04-19 Thread Shengjiu Wang
On Tue, Apr 18, 2023 at 5:44 PM Chancel Liu  wrote:

> SAI on i.MX8QM platform supports the data lines up to 4. So the pins
> setting should be corrected to 4.
>
> Fixes: eba0f0077519 ("ASoC: fsl_sai: Enable combine mode soft")
> Signed-off-by: Chancel Liu 
>

Acked-by: Shengjiu Wang 

Best regards
wang shengjiu

> ---
>  sound/soc/fsl/fsl_sai.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c
> index 07d13dca852e..abdaffb00fbd 100644
> --- a/sound/soc/fsl/fsl_sai.c
> +++ b/sound/soc/fsl/fsl_sai.c
> @@ -1544,7 +1544,7 @@ static const struct fsl_sai_soc_data fsl_sai_imx8qm_data = {
> .use_imx_pcm = true,
> .use_edma = true,
> .fifo_depth = 64,
> -   .pins = 1,
> +   .pins = 4,
> .reg_offset = 0,
> .mclk0_is_mclk1 = false,
> .flags = 0,
> --
> 2.25.1
>
>


Re: [PATCH 01/33] s390: Use _pt_s390_gaddr for gmap address tracking

2023-04-19 Thread Vishal Moola
On Tue, Apr 18, 2023 at 8:45 AM David Hildenbrand  wrote:
>
> On 17.04.23 22:50, Vishal Moola (Oracle) wrote:
> > s390 uses page->index to keep track of page tables for the guest address
> > space. In an attempt to consolidate the usage of page fields in s390,
> > replace _pt_pad_2 with _pt_s390_gaddr to replace page->index in gmap.
> >
> > This will help with the splitting of struct ptdesc from struct page, as
> > well as allow s390 to use _pt_frag_refcount for fragmented page table
> > tracking.
> >
> > Since page->_pt_s390_gaddr aliases with mapping, ensure its set to NULL
> > before freeing the pages as well.
> >
> > Signed-off-by: Vishal Moola (Oracle) 
> > ---
>
> [...]
>
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 3fc9e680f174..2616d64c0e8c 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -144,7 +144,7 @@ struct page {
> >   struct {/* Page table pages */
> >   unsigned long _pt_pad_1;/* compound_head */
> >   pgtable_t pmd_huge_pte; /* protected by page->ptl */
> > - unsigned long _pt_pad_2;/* mapping */
> > + unsigned long _pt_s390_gaddr;   /* mapping */
> >   union {
> >   struct mm_struct *pt_mm; /* x86 pgds only */
> >   atomic_t pt_frag_refcount; /* powerpc */
>
> The confusing part is that these gmap page tables are not ordinary
> process page tables that we would ordinarily place into this section
> here. That's why they are also not allocated/freed using the typical
> page table constructor/destructor ...

I initially thought the same, so I was quite confused when I saw
__gmap_segment_gaddr was using pmd_pgtable_page().

Although they are not ordinary process page tables, since we
eventually want to move them out of struct page, I think shifting them
to be in ptdescs, being a memory descriptor for page tables, makes
the most sense.

Another option is to leave pmd_pgtable_page() as is just for this case.
Or we can revert commit 7e25de77bc5ea which uses the function here
and then figure out where these gmap page table pages will go later.


[PATCH] powerpc/iommu: DMA address offset is incorrectly calculated with 2MB TCEs

2023-04-19 Thread Gaurav Batra
When DMA window is backed by 2MB TCEs, the DMA address for the mapped
page should be the offset of the page relative to the 2MB TCE. The code
was incorrectly setting the DMA address to the beginning of the TCE
range.

Mellanox driver is reporting timeout trying to ENABLE_HCA for an SR-IOV
ethernet port, when DMA window is backed by 2MB TCEs.

Signed-off-by: Gaurav Batra 

Reviewed-by: Greg Joyce 
Reviewed-by: Brian King 
---
 arch/powerpc/kernel/iommu.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index ee95937bdaf1..ca57526ce47a 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -517,7 +517,7 @@ int ppc_iommu_map_sg(struct device *dev, struct iommu_table *tbl,
/* Convert entry to a dma_addr_t */
entry += tbl->it_offset;
dma_addr = entry << tbl->it_page_shift;
-   dma_addr |= (s->offset & ~IOMMU_PAGE_MASK(tbl));
+   dma_addr |= (vaddr & ~IOMMU_PAGE_MASK(tbl));
 
DBG("  - %lu pages, entry: %lx, dma_addr: %lx\n",
npages, entry, dma_addr);
@@ -904,6 +904,7 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
unsigned int order;
unsigned int nio_pages, io_order;
struct page *page;
+   int tcesize = (1 << tbl->it_page_shift);
 
size = PAGE_ALIGN(size);
order = get_order(size);
@@ -930,7 +931,8 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
memset(ret, 0, size);
 
/* Set up tces to cover the allocated range */
-   nio_pages = size >> tbl->it_page_shift;
+   nio_pages = IOMMU_PAGE_ALIGN(size, tbl) >> tbl->it_page_shift;
+
io_order = get_iommu_order(size, tbl);
mapping = iommu_alloc(dev, tbl, ret, nio_pages, DMA_BIDIRECTIONAL,
  mask >> tbl->it_page_shift, io_order, 0);
@@ -938,7 +940,8 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
free_pages((unsigned long)ret, order);
return NULL;
}
-   *dma_handle = mapping;
+
+   *dma_handle = mapping | ((u64)ret & (tcesize - 1));
return ret;
 }
 
-- 



Re: [PATCH v9 4/4] tpm/kexec: Duplicate TPM measurement log in of-tree for kexec

2023-04-19 Thread Simon Horman
On Tue, Apr 18, 2023 at 09:44:09AM -0400, Stefan Berger wrote:
> The memory area of the TPM measurement log is currently not properly
> duplicated for carrying it across kexec when an Open Firmware
> Devicetree is used. Therefore, the contents of the log get corrupted.
> Fix this for the kexec_file_load() syscall by allocating a buffer and
> copying the contents of the existing log into it. The new buffer is
> preserved across the kexec and a pointer to it is available when the new
> kernel is started. To achieve this, store the allocated buffer's address
> in the flattened device tree (fdt) under the name linux,tpm-kexec-buffer
> and search for this entry early in the kernel startup before the TPM
> subsystem starts up. Adjust the pointer in the of-tree stored under
> linux,sml-base to point to this buffer holding the preserved log. The TPM
> driver can then read the base address from this entry when making the log
> available. Invalidate the log by removing 'linux,sml-base' from the
> devicetree if anything goes wrong with updating the buffer.
> 
> Use subsys_initcall() to call the function to restore the buffer even if
> the TPM subsystem or driver are not used. This allows the buffer to be
> carried across the next kexec without involvement of the TPM subsystem
> and ensures a valid buffer pointed to by the of-tree.

Hi Stefan,

some minor feedback from my side.

> Use the subsys_initcall(), rather than an ealier initcall, since

nit via checkpatch.pl --codespell: s/ealier/earlier/

> page_is_ram() in get_kexec_buffer() only starts working at this stage.
> 
> Signed-off-by: Stefan Berger 
> Cc: Rob Herring 
> Cc: Frank Rowand 
> Cc: Eric Biederman 
> Tested-by: Nageswara R Sastry 
> Tested-by: Coiby Xu 
> Reviewed-by: Rob Herring 

...

> +void tpm_add_kexec_buffer(struct kimage *image)
> +{
> + struct kexec_buf kbuf = { .image = image, .buf_align = 1,
> +   .buf_min = 0, .buf_max = ULONG_MAX,
> +   .top_down = true };
> + struct device_node *np;
> + void *buffer;
> + u32 size;
> + u64 base;
> + int ret;
> +
> + if (!IS_ENABLED(CONFIG_PPC64))
> + return;
> +
> + np = of_find_node_by_name(NULL, "vtpm");
> + if (!np)
> + return;
> +
> + if (of_tpm_get_sml_parameters(np, &base, &size) < 0)
> + return;
> +
> + buffer = vmalloc(size);
> + if (!buffer)
> + return;
> + memcpy(buffer, __va(base), size);
> +
> + kbuf.buffer = buffer;
> + kbuf.bufsz = size;
> + kbuf.memsz = size;
> + ret = kexec_add_buffer(&kbuf);
> + if (ret) {
> + pr_err("Error passing over kexec TPM measurement log buffer: %d\n",
> +ret);

Does buffer need to be freed here?

> + return;
> + }
> +
> + image->tpm_buffer = buffer;
> + image->tpm_buffer_addr = kbuf.mem;
> + image->tpm_buffer_size = size;
> +}
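
If it does — a hedged sketch of the cleanup this would imply, with
vfree() pairing the vmalloc() above:

	ret = kexec_add_buffer(&kbuf);
	if (ret) {
		pr_err("Error passing over kexec TPM measurement log buffer: %d\n",
		       ret);
		vfree(buffer);
		return;
	}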


[PATCH 33/33] mm: Remove pgtable_{pmd, pte}_page_{ctor, dtor}() wrappers

2023-04-19 Thread Vishal Moola (Oracle)
These functions are no longer necessary. Remove them and cleanup
Documentation referencing them.

Signed-off-by: Vishal Moola (Oracle) 
---
 Documentation/mm/split_page_table_lock.rst| 12 +--
 .../zh_CN/mm/split_page_table_lock.rst| 14 ++---
 include/linux/mm.h| 20 ---
 3 files changed, 13 insertions(+), 33 deletions(-)

diff --git a/Documentation/mm/split_page_table_lock.rst b/Documentation/mm/split_page_table_lock.rst
index 50ee0dfc95be..b3c612183135 100644
--- a/Documentation/mm/split_page_table_lock.rst
+++ b/Documentation/mm/split_page_table_lock.rst
@@ -53,7 +53,7 @@ Support of split page table lock by an architecture
 ===
 
 There's no need in special enabling of PTE split page table lock: everything
-required is done by pgtable_pte_page_ctor() and pgtable_pte_page_dtor(), which
+required is done by ptdesc_pte_ctor() and ptdesc_pte_dtor(), which
 must be called on PTE table allocation / freeing.
 
 Make sure the architecture doesn't use slab allocator for page table
 allocation: slab uses page->slab_cache for its pages.
 This field shares storage with page->ptl.
 PMD split lock only makes sense if you have more than two page table
 levels.
 
-PMD split lock enabling requires pgtable_pmd_page_ctor() call on PMD table
-allocation and pgtable_pmd_page_dtor() on freeing.
+PMD split lock enabling requires ptdesc_pmd_ctor() call on PMD table
+allocation and ptdesc_pmd_dtor() on freeing.
 
 Allocation usually happens in pmd_alloc_one(), freeing in pmd_free() and
 pmd_free_tlb(), but make sure you cover all PMD table allocation / freeing
@@ -72,7 +72,7 @@ paths: i.e X86_PAE preallocate few PMDs on pgd_alloc().
 
 With everything in place you can set CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK.
 
-NOTE: pgtable_pte_page_ctor() and pgtable_pmd_page_ctor() can fail -- it must
+NOTE: ptdesc_pte_ctor() and ptdesc_pmd_ctor() can fail -- it must
 be handled properly.
 
 page->ptl
@@ -92,7 +92,7 @@ trick:
split lock with enabled DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC, but costs
one more cache line for indirect access;
 
-The spinlock_t allocated in pgtable_pte_page_ctor() for PTE table and in
-pgtable_pmd_page_ctor() for PMD table.
+The spinlock_t allocated in ptdesc_pte_ctor() for PTE table and in
+ptdesc_pmd_ctor() for PMD table.
 
 Please, never access page->ptl directly -- use appropriate helper.
diff --git a/Documentation/translations/zh_CN/mm/split_page_table_lock.rst b/Documentation/translations/zh_CN/mm/split_page_table_lock.rst
index 4fb7aa666037..a3323eb9dc40 100644
--- a/Documentation/translations/zh_CN/mm/split_page_table_lock.rst
+++ b/Documentation/translations/zh_CN/mm/split_page_table_lock.rst
@@ -56,16 +56,16 @@ Hugetlb特定的辅助函数:
 架构对分页表锁的支持
 
 
-没有必要特别启用PTE分页表锁:所有需要的东西都由pgtable_pte_page_ctor()
-和pgtable_pte_page_dtor()完成,它们必须在PTE表分配/释放时被调用。
+没有必要特别启用PTE分页表锁:所有需要的东西都由ptdesc_pte_ctor()
+和ptdesc_pte_dtor()完成,它们必须在PTE表分配/释放时被调用。
 
 确保架构不使用slab分配器来分配页表:slab使用page->slab_cache来分配其页
 面。这个区域与page->ptl共享存储。
 
 PMD分页锁只有在你有两个以上的页表级别时才有意义。
 
-启用PMD分页锁需要在PMD表分配时调用pgtable_pmd_page_ctor(),在释放时调
-用pgtable_pmd_page_dtor()。
+启用PMD分页锁需要在PMD表分配时调用ptdesc_pmd_ctor(),在释放时调
+用ptdesc_pmd_dtor()。
 
 分配通常发生在pmd_alloc_one()中,释放发生在pmd_free()和pmd_free_tlb()
 中,但要确保覆盖所有的PMD表分配/释放路径:即X86_PAE在pgd_alloc()中预先
@@ -73,7 +73,7 @@ PMD分页锁只有在你有两个以上的页表级别时才有意义。
 
 一切就绪后,你可以设置CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK。
 
-注意:pgtable_pte_page_ctor()和pgtable_pmd_page_ctor()可能失败--必
+注意:ptdesc_pte_ctor()和ptdesc_pmd_ctor()可能失败--必
 须正确处理。
 
 page->ptl
@@ -90,7 +90,7 @@ page->ptl用于访问分割页表锁,其中'page'是包含该表的页面struc
的指针并动态分配它。这允许在启用DEBUG_SPINLOCK或DEBUG_LOCK_ALLOC的
情况下使用分页锁,但由于间接访问而多花了一个缓存行。
 
-PTE表的spinlock_t分配在pgtable_pte_page_ctor()中,PMD表的spinlock_t
-分配在pgtable_pmd_page_ctor()中。
+PTE表的spinlock_t分配在ptdesc_pte_ctor()中,PMD表的spinlock_t
+分配在ptdesc_pmd_ctor()中。
 
 请不要直接访问page->ptl - -使用适当的辅助函数。
diff --git a/include/linux/mm.h b/include/linux/mm.h
index cb136d2fdf74..e08638dc58cf 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2858,11 +2858,6 @@ static inline bool ptdesc_pte_ctor(struct ptdesc *ptdesc)
return true;
 }
 
-static inline bool pgtable_pte_page_ctor(struct page *page)
-{
-   return ptdesc_pte_ctor(page_ptdesc(page));
-}
-
 static inline void ptdesc_pte_dtor(struct ptdesc *ptdesc)
 {
struct folio *folio = ptdesc_folio(ptdesc);
@@ -2872,11 +2867,6 @@ static inline void ptdesc_pte_dtor(struct ptdesc *ptdesc)
lruvec_stat_sub_folio(folio, NR_PAGETABLE);
 }
 
-static inline void pgtable_pte_page_dtor(struct page *page)
-{
-   ptdesc_pte_dtor(page_ptdesc(page));
-}
-
 #define pte_offset_map_lock(mm, pmd, address, ptlp)\
 ({ \
spinlock_t *__ptl = pte_lockptr(mm, pmd);   \
@@ -2967,11 +2957,6 @@ static inline bool ptdesc_pmd_ctor(struct ptdesc *ptdesc)
return true;
 }
 
-static inline bool pgtable_pmd_page_ctor(struct page *page)
-{
-  

[PATCH 31/33] sparc: Convert pgtable_pte_page_{ctor, dtor}() to ptdesc equivalents

2023-04-19 Thread Vishal Moola (Oracle)
Part of the conversions to replace pgtable pte constructor/destructors with
ptdesc equivalents.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/sparc/mm/srmmu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/sparc/mm/srmmu.c b/arch/sparc/mm/srmmu.c
index 13f027afc875..964938aa7b88 100644
--- a/arch/sparc/mm/srmmu.c
+++ b/arch/sparc/mm/srmmu.c
@@ -355,7 +355,8 @@ pgtable_t pte_alloc_one(struct mm_struct *mm)
return NULL;
page = pfn_to_page(__nocache_pa((unsigned long)ptep) >> PAGE_SHIFT);
	spin_lock(&mm->page_table_lock);
-   if (page_ref_inc_return(page) == 2 && !pgtable_pte_page_ctor(page)) {
+   if (page_ref_inc_return(page) == 2 &&
+   !ptdesc_pte_ctor(page_ptdesc(page))) {
page_ref_dec(page);
ptep = NULL;
}
@@ -371,7 +372,7 @@ void pte_free(struct mm_struct *mm, pgtable_t ptep)
page = pfn_to_page(__nocache_pa((unsigned long)ptep) >> PAGE_SHIFT);
	spin_lock(&mm->page_table_lock);
if (page_ref_dec_return(page) == 1)
-   pgtable_pte_page_dtor(page);
+   ptdesc_pte_dtor(page_ptdesc(page));
	spin_unlock(&mm->page_table_lock);
 
srmmu_free_nocache(ptep, SRMMU_PTE_TABLE_SIZE);
-- 
2.39.2



[PATCH 32/33] um: Convert {pmd, pte}_free_tlb() to use ptdescs

2023-04-19 Thread Vishal Moola (Oracle)
Part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents. Also cleans up some spacing issues.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/um/include/asm/pgalloc.h | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/um/include/asm/pgalloc.h b/arch/um/include/asm/pgalloc.h
index 8ec7cd46dd96..760b029505c1 100644
--- a/arch/um/include/asm/pgalloc.h
+++ b/arch/um/include/asm/pgalloc.h
@@ -25,19 +25,19 @@
  */
 extern pgd_t *pgd_alloc(struct mm_struct *);
 
-#define __pte_free_tlb(tlb,pte, address)   \
-do {   \
-   pgtable_pte_page_dtor(pte); \
-   tlb_remove_page((tlb),(pte));   \
+#define __pte_free_tlb(tlb, pte, address)  \
+do {   \
+   ptdesc_pte_dtor(page_ptdesc(pte));  \
+   tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte)));  \
 } while (0)
 
 #ifdef CONFIG_3_LEVEL_PGTABLES
 
-#define __pmd_free_tlb(tlb, pmd, address)  \
-do {   \
-   pgtable_pmd_page_dtor(virt_to_page(pmd));   \
-   tlb_remove_page((tlb),virt_to_page(pmd));   \
-} while (0)\
+#define __pmd_free_tlb(tlb, pmd, address)  \
+do {   \
+   ptdesc_pmd_dtor(virt_to_ptdesc(pmd));   \
+   tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pmd)); \
+} while (0)
 
 #endif
 
-- 
2.39.2



[PATCH 30/33] sparc64: Convert various functions to use ptdescs

2023-04-19 Thread Vishal Moola (Oracle)
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/sparc/mm/init_64.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 04f9db0c3111..eedb3e03b1fe 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2893,14 +2893,15 @@ pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
 
 pgtable_t pte_alloc_one(struct mm_struct *mm)
 {
-   struct page *page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-   if (!page)
+   struct ptdesc *ptdesc = ptdesc_alloc(GFP_KERNEL | __GFP_ZERO, 0);
+
+   if (!ptdesc)
return NULL;
-   if (!pgtable_pte_page_ctor(page)) {
-   __free_page(page);
+   if (!ptdesc_pte_ctor(ptdesc)) {
+   ptdesc_free(ptdesc);
return NULL;
}
-   return (pte_t *) page_address(page);
+   return (pte_t *) ptdesc_address(ptdesc);
 }
 
 void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
@@ -2910,10 +2911,10 @@ void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 
 static void __pte_free(pgtable_t pte)
 {
-   struct page *page = virt_to_page(pte);
+   struct ptdesc *ptdesc = virt_to_ptdesc(pte);
 
-   pgtable_pte_page_dtor(page);
-   __free_page(page);
+   ptdesc_pte_dtor(ptdesc);
+   ptdesc_free(ptdesc);
 }
 
 void pte_free(struct mm_struct *mm, pgtable_t pte)
-- 
2.39.2



[PATCH 29/33] sh: Convert pte_free_tlb() to use ptdescs

2023-04-19 Thread Vishal Moola (Oracle)
Part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents. Also cleans up some spacing issues.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/sh/include/asm/pgalloc.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/sh/include/asm/pgalloc.h b/arch/sh/include/asm/pgalloc.h
index a9e98233c4d4..30e1823d2347 100644
--- a/arch/sh/include/asm/pgalloc.h
+++ b/arch/sh/include/asm/pgalloc.h
@@ -31,10 +31,10 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t 
*pmd,
set_pmd(pmd, __pmd((unsigned long)page_address(pte)));
 }
 
-#define __pte_free_tlb(tlb,pte,addr)   \
-do {   \
-   pgtable_pte_page_dtor(pte); \
-   tlb_remove_page((tlb), (pte));  \
+#define __pte_free_tlb(tlb, pte, addr) \
+do {   \
+   ptdesc_pte_dtor(page_ptdesc(pte));  \
+   tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte)));  \
 } while (0)
 
 #endif /* __ASM_SH_PGALLOC_H */
-- 
2.39.2



[PATCH 27/33] openrisc: Convert __pte_free_tlb() to use ptdescs

2023-04-19 Thread Vishal Moola (Oracle)
Part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/openrisc/include/asm/pgalloc.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/openrisc/include/asm/pgalloc.h 
b/arch/openrisc/include/asm/pgalloc.h
index b7b2b8d16fad..14e641686281 100644
--- a/arch/openrisc/include/asm/pgalloc.h
+++ b/arch/openrisc/include/asm/pgalloc.h
@@ -66,10 +66,10 @@ extern inline pgd_t *pgd_alloc(struct mm_struct *mm)
 
 extern pte_t *pte_alloc_one_kernel(struct mm_struct *mm);
 
-#define __pte_free_tlb(tlb, pte, addr) \
-do {   \
-   pgtable_pte_page_dtor(pte); \
-   tlb_remove_page((tlb), (pte));  \
+#define __pte_free_tlb(tlb, pte, addr) \
+do {   \
+   ptdesc_pte_dtor(page_ptdesc(pte));  \
+   tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte)));  \
 } while (0)
 
 #endif
-- 
2.39.2



[PATCH 28/33] riscv: Convert alloc_{pmd, pte}_late() to use ptdescs

2023-04-19 Thread Vishal Moola (Oracle)
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.

Some of the functions use the *get*page*() helper functions. Convert
these to use ptdesc_alloc() and ptdesc_address() instead to help
standardize page tables further.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/riscv/include/asm/pgalloc.h |  8 
 arch/riscv/mm/init.c | 16 ++--
 2 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
index 59dc12b5b7e8..cb5536403bd8 100644
--- a/arch/riscv/include/asm/pgalloc.h
+++ b/arch/riscv/include/asm/pgalloc.h
@@ -153,10 +153,10 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 
 #endif /* __PAGETABLE_PMD_FOLDED */
 
-#define __pte_free_tlb(tlb, pte, buf)   \
-do {\
-   pgtable_pte_page_dtor(pte); \
-   tlb_remove_page((tlb), pte);\
+#define __pte_free_tlb(tlb, pte, buf)  \
+do {   \
+   ptdesc_pte_dtor(page_ptdesc(pte));  \
+   tlb_remove_page_ptdesc((tlb), page_ptdesc(pte));\
 } while (0)
 #endif /* CONFIG_MMU */
 
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 0f14f4a8d179..2737cbc4ad12 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -346,12 +346,10 @@ static inline phys_addr_t __init 
alloc_pte_fixmap(uintptr_t va)
 
 static phys_addr_t __init alloc_pte_late(uintptr_t va)
 {
-   unsigned long vaddr;
-
-   vaddr = __get_free_page(GFP_KERNEL);
-   BUG_ON(!vaddr || !pgtable_pte_page_ctor(virt_to_page(vaddr)));
+   struct ptdesc *ptdesc = ptdesc_alloc(GFP_KERNEL, 0);
 
-   return __pa(vaddr);
+   BUG_ON(!ptdesc || !ptdesc_pte_ctor(ptdesc));
+   return __pa((pte_t *)ptdesc_address(ptdesc));
 }
 
 static void __init create_pte_mapping(pte_t *ptep,
@@ -429,12 +427,10 @@ static phys_addr_t __init alloc_pmd_fixmap(uintptr_t va)
 
 static phys_addr_t __init alloc_pmd_late(uintptr_t va)
 {
-   unsigned long vaddr;
-
-   vaddr = __get_free_page(GFP_KERNEL);
-   BUG_ON(!vaddr || !pgtable_pmd_page_ctor(virt_to_page(vaddr)));
+   struct ptdesc *ptdesc = ptdesc_alloc(GFP_KERNEL, 0);
 
-   return __pa(vaddr);
+   BUG_ON(!ptdesc || !ptdesc_pmd_ctor(ptdesc));
+   return __pa((pmd_t *)ptdesc_address(ptdesc));
 }
 
 static void __init create_pmd_mapping(pmd_t *pmdp,
-- 
2.39.2



[PATCH 26/33] nios2: Convert __pte_free_tlb() to use ptdescs

2023-04-19 Thread Vishal Moola (Oracle)
Part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/nios2/include/asm/pgalloc.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/nios2/include/asm/pgalloc.h b/arch/nios2/include/asm/pgalloc.h
index ecd1657bb2ce..ed868f4c0ca9 100644
--- a/arch/nios2/include/asm/pgalloc.h
+++ b/arch/nios2/include/asm/pgalloc.h
@@ -28,10 +28,10 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t 
*pmd,
 
 extern pgd_t *pgd_alloc(struct mm_struct *mm);
 
-#define __pte_free_tlb(tlb, pte, addr) \
-   do {\
-   pgtable_pte_page_dtor(pte); \
-   tlb_remove_page((tlb), (pte));  \
+#define __pte_free_tlb(tlb, pte, addr) \
+   do {\
+   ptdesc_pte_dtor(page_ptdesc(pte));  \
+   tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte)));  \
} while (0)
 
 #endif /* _ASM_NIOS2_PGALLOC_H */
-- 
2.39.2



[PATCH 25/33] mips: Convert various functions to use ptdescs

2023-04-19 Thread Vishal Moola (Oracle)
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.

Some of the functions use the *get*page*() helper functions. Convert
these to use ptdesc_alloc() and ptdesc_address() instead to help
standardize page tables further.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/mips/include/asm/pgalloc.h | 31 +--
 arch/mips/mm/pgtable.c  |  7 ---
 2 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/arch/mips/include/asm/pgalloc.h b/arch/mips/include/asm/pgalloc.h
index f72e737dda21..7f7cc3140b27 100644
--- a/arch/mips/include/asm/pgalloc.h
+++ b/arch/mips/include/asm/pgalloc.h
@@ -51,13 +51,13 @@ extern pgd_t *pgd_alloc(struct mm_struct *mm);
 
 static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 {
-   free_pages((unsigned long)pgd, PGD_TABLE_ORDER);
+   ptdesc_free(virt_to_ptdesc(pgd));
 }
 
-#define __pte_free_tlb(tlb,pte,address)\
-do {   \
-   pgtable_pte_page_dtor(pte); \
-   tlb_remove_page((tlb), pte);\
+#define __pte_free_tlb(tlb, pte, address)  \
+do {   \
+   ptdesc_pte_dtor(page_ptdesc(pte));  \
+   tlb_remove_page_ptdesc((tlb), page_ptdesc(pte));\
 } while (0)
 
 #ifndef __PAGETABLE_PMD_FOLDED
@@ -65,18 +65,18 @@ do {
\
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
 {
pmd_t *pmd;
-   struct page *pg;
+   struct ptdesc *ptdesc;
 
-   pg = alloc_pages(GFP_KERNEL_ACCOUNT, PMD_TABLE_ORDER);
-   if (!pg)
+   ptdesc = ptdesc_alloc(GFP_KERNEL_ACCOUNT, PMD_TABLE_ORDER);
+   if (!ptdesc)
return NULL;
 
-   if (!pgtable_pmd_page_ctor(pg)) {
-   __free_pages(pg, PMD_TABLE_ORDER);
+   if (!ptdesc_pmd_ctor(ptdesc)) {
+   ptdesc_free(ptdesc);
return NULL;
}
 
-   pmd = (pmd_t *)page_address(pg);
+   pmd = (pmd_t *)ptdesc_address(ptdesc);
pmd_init(pmd);
return pmd;
 }
@@ -90,10 +90,13 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, 
unsigned long address)
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long address)
 {
pud_t *pud;
+   struct ptdesc *ptdesc = ptdesc_alloc(GFP_KERNEL, PUD_TABLE_ORDER);
 
-   pud = (pud_t *) __get_free_pages(GFP_KERNEL, PUD_TABLE_ORDER);
-   if (pud)
-   pud_init(pud);
+   if (!ptdesc)
+   return NULL;
+   pud = (pud_t *)ptdesc_address(ptdesc);
+
+   pud_init(pud);
return pud;
 }
 
diff --git a/arch/mips/mm/pgtable.c b/arch/mips/mm/pgtable.c
index b13314be5d0e..d626db9ac224 100644
--- a/arch/mips/mm/pgtable.c
+++ b/arch/mips/mm/pgtable.c
@@ -10,10 +10,11 @@
 
 pgd_t *pgd_alloc(struct mm_struct *mm)
 {
-   pgd_t *ret, *init;
+   pgd_t *init, *ret = NULL;
+   struct ptdesc *ptdesc = ptdesc_alloc(GFP_KERNEL, PGD_TABLE_ORDER);
 
-   ret = (pgd_t *) __get_free_pages(GFP_KERNEL, PGD_TABLE_ORDER);
-   if (ret) {
+   if (ptdesc) {
+   ret = (pgd_t *) ptdesc_address(ptdesc);
	init = pgd_offset(&init_mm, 0UL);
pgd_init(ret);
memcpy(ret + USER_PTRS_PER_PGD, init + USER_PTRS_PER_PGD,
-- 
2.39.2



[PATCH 24/33] m68k: Convert various functions to use ptdescs

2023-04-19 Thread Vishal Moola (Oracle)
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.

Some of the functions use the *get*page*() helper functions. Convert
these to use ptdesc_alloc() and ptdesc_address() instead to help
standardize page tables further.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/m68k/include/asm/mcf_pgalloc.h  | 41 ++--
 arch/m68k/include/asm/sun3_pgalloc.h |  8 +++---
 arch/m68k/mm/motorola.c  |  4 +--
 3 files changed, 27 insertions(+), 26 deletions(-)

diff --git a/arch/m68k/include/asm/mcf_pgalloc.h 
b/arch/m68k/include/asm/mcf_pgalloc.h
index 5c2c0a864524..b0e909e23e14 100644
--- a/arch/m68k/include/asm/mcf_pgalloc.h
+++ b/arch/m68k/include/asm/mcf_pgalloc.h
@@ -7,20 +7,19 @@
 
 extern inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 {
-   free_page((unsigned long) pte);
+   ptdesc_free(virt_to_ptdesc(pte));
 }
 
 extern const char bad_pmd_string[];
 
 extern inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
 {
-   unsigned long page = __get_free_page(GFP_DMA);
+   struct ptdesc *ptdesc = ptdesc_alloc(GFP_DMA | __GFP_ZERO, 0);
 
-   if (!page)
+   if (!ptdesc)
return NULL;
 
-   memset((void *)page, 0, PAGE_SIZE);
-   return (pte_t *) (page);
+   return (pte_t *) (ptdesc_address(ptdesc));
 }
 
 extern inline pmd_t *pmd_alloc_kernel(pgd_t *pgd, unsigned long address)
@@ -35,36 +34,36 @@ extern inline pmd_t *pmd_alloc_kernel(pgd_t *pgd, unsigned 
long address)
 static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pgtable,
  unsigned long address)
 {
-   struct page *page = virt_to_page(pgtable);
+   struct ptdesc *ptdesc = virt_to_ptdesc(pgtable);
 
-   pgtable_pte_page_dtor(page);
-   __free_page(page);
+   ptdesc_pte_dtor(ptdesc);
+   ptdesc_free(ptdesc);
 }
 
 static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
 {
-   struct page *page = alloc_pages(GFP_DMA, 0);
+   struct ptdesc *ptdesc = ptdesc_alloc(GFP_DMA, 0);
pte_t *pte;
 
-   if (!page)
+   if (!ptdesc)
return NULL;
-   if (!pgtable_pte_page_ctor(page)) {
-   __free_page(page);
+   if (!ptdesc_pte_ctor(ptdesc)) {
+   ptdesc_free(ptdesc);
return NULL;
}
 
-   pte = page_address(page);
-   clear_page(pte);
+   pte = ptdesc_address(ptdesc);
+   ptdesc_clear(pte);
 
return pte;
 }
 
 static inline void pte_free(struct mm_struct *mm, pgtable_t pgtable)
 {
-   struct page *page = virt_to_page(pgtable);
+	struct ptdesc *ptdesc = virt_to_ptdesc(pgtable);
 
-   pgtable_pte_page_dtor(page);
-   __free_page(page);
+   ptdesc_pte_dtor(ptdesc);
+   ptdesc_free(ptdesc);
 }
 
 /*
@@ -75,16 +74,18 @@ static inline void pte_free(struct mm_struct *mm, pgtable_t 
pgtable)
 
 static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 {
-   free_page((unsigned long) pgd);
+   ptdesc_free(virt_to_ptdesc(pgd));
 }
 
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
pgd_t *new_pgd;
+	struct ptdesc *ptdesc = ptdesc_alloc(GFP_DMA | __GFP_NOWARN, 0);
 
-   new_pgd = (pgd_t *)__get_free_page(GFP_DMA | __GFP_NOWARN);
-   if (!new_pgd)
+   if (!ptdesc)
return NULL;
+   new_pgd = (pgd_t *) ptdesc_address(ptdesc);
+
memcpy(new_pgd, swapper_pg_dir, PTRS_PER_PGD * sizeof(pgd_t));
memset(new_pgd, 0, PAGE_OFFSET >> PGDIR_SHIFT);
return new_pgd;
diff --git a/arch/m68k/include/asm/sun3_pgalloc.h 
b/arch/m68k/include/asm/sun3_pgalloc.h
index 198036aff519..013d375fc239 100644
--- a/arch/m68k/include/asm/sun3_pgalloc.h
+++ b/arch/m68k/include/asm/sun3_pgalloc.h
@@ -17,10 +17,10 @@
 
 extern const char bad_pmd_string[];
 
-#define __pte_free_tlb(tlb,pte,addr)   \
-do {   \
-   pgtable_pte_page_dtor(pte); \
-   tlb_remove_page((tlb), pte);\
+#define __pte_free_tlb(tlb, pte, addr) \
+do {   \
+   ptdesc_pte_dtor(page_ptdesc(pte));  \
+   tlb_remove_page_ptdesc((tlb), page_ptdesc(pte));\
 } while (0)
 
 static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd, pte_t 
*pte)
diff --git a/arch/m68k/mm/motorola.c b/arch/m68k/mm/motorola.c
index 911301224078..1e47b977bcf1 100644
--- a/arch/m68k/mm/motorola.c
+++ b/arch/m68k/mm/motorola.c
@@ -161,7 +161,7 @@ void *get_pointer_table(int type)
 * m68k doesn't have SPLIT_PTE_PTLOCKS for not having
 * SMP.
 */
-   pgtable_pte_page_ctor(virt_to_page(page));
+   ptdesc_pte_ctor(virt_to_ptdesc(page));
 

[PATCH 22/33] hexagon: Convert __pte_free_tlb() to use ptdescs

2023-04-19 Thread Vishal Moola (Oracle)
Part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/hexagon/include/asm/pgalloc.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/hexagon/include/asm/pgalloc.h 
b/arch/hexagon/include/asm/pgalloc.h
index f0c47e6a7427..0f8432430e68 100644
--- a/arch/hexagon/include/asm/pgalloc.h
+++ b/arch/hexagon/include/asm/pgalloc.h
@@ -87,10 +87,10 @@ static inline void pmd_populate_kernel(struct mm_struct 
*mm, pmd_t *pmd,
max_kernel_seg = pmdindex;
 }
 
-#define __pte_free_tlb(tlb, pte, addr) \
-do {   \
-   pgtable_pte_page_dtor((pte));   \
-   tlb_remove_page((tlb), (pte));  \
+#define __pte_free_tlb(tlb, pte, addr) \
+do {   \
+   ptdesc_pte_dtor((page_ptdesc(pte)));\
+   tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte)));  \
 } while (0)
 
 #endif
-- 
2.39.2



[PATCH 23/33] loongarch: Convert various functions to use ptdescs

2023-04-19 Thread Vishal Moola (Oracle)
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.

Some of the functions use the *get*page*() helper functions. Convert
these to use ptdesc_alloc() and ptdesc_address() instead to help
standardize page tables further.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/loongarch/include/asm/pgalloc.h | 27 +++
 arch/loongarch/mm/pgtable.c  |  7 ---
 2 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/arch/loongarch/include/asm/pgalloc.h 
b/arch/loongarch/include/asm/pgalloc.h
index af1d1e4a6965..1fe074f85b6b 100644
--- a/arch/loongarch/include/asm/pgalloc.h
+++ b/arch/loongarch/include/asm/pgalloc.h
@@ -45,9 +45,9 @@ extern void pagetable_init(void);
 extern pgd_t *pgd_alloc(struct mm_struct *mm);
 
 #define __pte_free_tlb(tlb, pte, address)  \
-do {   \
-   pgtable_pte_page_dtor(pte); \
-   tlb_remove_page((tlb), pte);\
+do {   \
+   ptdesc_pte_dtor(page_ptdesc(pte));  \
+   tlb_remove_page_ptdesc((tlb), page_ptdesc(pte));\
 } while (0)
 
 #ifndef __PAGETABLE_PMD_FOLDED
@@ -55,18 +55,18 @@ do {
\
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
 {
pmd_t *pmd;
-   struct page *pg;
+   struct ptdesc *ptdesc;
 
-   pg = alloc_page(GFP_KERNEL_ACCOUNT);
-   if (!pg)
+   ptdesc = ptdesc_alloc(GFP_KERNEL_ACCOUNT, 0);
+   if (!ptdesc)
return NULL;
 
-   if (!pgtable_pmd_page_ctor(pg)) {
-   __free_page(pg);
+   if (!ptdesc_pmd_ctor(ptdesc)) {
+   ptdesc_free(ptdesc);
return NULL;
}
 
-   pmd = (pmd_t *)page_address(pg);
+   pmd = (pmd_t *)ptdesc_address(ptdesc);
pmd_init(pmd);
return pmd;
 }
@@ -80,10 +80,13 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, 
unsigned long address)
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long address)
 {
pud_t *pud;
+   struct ptdesc *ptdesc = ptdesc_alloc(GFP_KERNEL, 0);
 
-   pud = (pud_t *) __get_free_page(GFP_KERNEL);
-   if (pud)
-   pud_init(pud);
+   if (!ptdesc)
+   return NULL;
+   pud = (pud_t *)ptdesc_address(ptdesc);
+
+   pud_init(pud);
return pud;
 }
 
diff --git a/arch/loongarch/mm/pgtable.c b/arch/loongarch/mm/pgtable.c
index 36a6dc0148ae..ff07b8f1ef30 100644
--- a/arch/loongarch/mm/pgtable.c
+++ b/arch/loongarch/mm/pgtable.c
@@ -11,10 +11,11 @@
 
 pgd_t *pgd_alloc(struct mm_struct *mm)
 {
-   pgd_t *ret, *init;
+   pgd_t *init, *ret = NULL;
+   struct ptdesc *ptdesc = ptdesc_alloc(GFP_KERNEL, 0);
 
-   ret = (pgd_t *) __get_free_page(GFP_KERNEL);
-   if (ret) {
+   if (ptdesc) {
+   ret = (pgd_t *)ptdesc_address(ptdesc);
		init = pgd_offset(&init_mm, 0UL);
pgd_init(ret);
memcpy(ret + USER_PTRS_PER_PGD, init + USER_PTRS_PER_PGD,
-- 
2.39.2



[PATCH 21/33] csky: Convert __pte_free_tlb() to use ptdescs

2023-04-19 Thread Vishal Moola (Oracle)
Part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/csky/include/asm/pgalloc.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/csky/include/asm/pgalloc.h b/arch/csky/include/asm/pgalloc.h
index 7d57e5da0914..af26f1191b43 100644
--- a/arch/csky/include/asm/pgalloc.h
+++ b/arch/csky/include/asm/pgalloc.h
@@ -63,8 +63,8 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 
 #define __pte_free_tlb(tlb, pte, address)  \
 do {   \
-   pgtable_pte_page_dtor(pte); \
-   tlb_remove_page(tlb, pte);  \
+   ptdesc_pte_dtor(page_ptdesc(pte));  \
+   tlb_remove_page_ptdesc(tlb, page_ptdesc(pte));  \
 } while (0)
 
 extern void pagetable_init(void);
-- 
2.39.2



[PATCH 20/33] arm64: Convert various functions to use ptdescs

2023-04-19 Thread Vishal Moola (Oracle)
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.
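
(Not part of the patch, just to make the pattern explicit: levels that have
a constructor/destructor pair the dtor with the mmu_gather handoff, while
levels without one, such as pud here, only switch the handoff itself from
struct page to struct ptdesc. Both helpers were introduced earlier in this
series:

	ptdesc_pte_dtor(ptdesc);			/* pte/pmd levels */
	tlb_remove_ptdesc(tlb, ptdesc);

	tlb_remove_ptdesc(tlb, virt_to_ptdesc(pudp));	/* pud level, no dtor */
)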

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/arm64/include/asm/tlb.h | 14 --
 arch/arm64/mm/mmu.c  |  7 ---
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index c995d1f4594f..6cb70c247e30 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -75,18 +75,20 @@ static inline void tlb_flush(struct mmu_gather *tlb)
 static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
  unsigned long addr)
 {
-   pgtable_pte_page_dtor(pte);
-   tlb_remove_table(tlb, pte);
+   struct ptdesc *ptdesc = page_ptdesc(pte);
+
+   ptdesc_pte_dtor(ptdesc);
+   tlb_remove_ptdesc(tlb, ptdesc);
 }
 
 #if CONFIG_PGTABLE_LEVELS > 2
 static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
  unsigned long addr)
 {
-   struct page *page = virt_to_page(pmdp);
+   struct ptdesc *ptdesc = virt_to_ptdesc(pmdp);
 
-   pgtable_pmd_page_dtor(page);
-   tlb_remove_table(tlb, page);
+   ptdesc_pmd_dtor(ptdesc);
+   tlb_remove_ptdesc(tlb, ptdesc);
 }
 #endif
 
@@ -94,7 +96,7 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, 
pmd_t *pmdp,
 static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pudp,
  unsigned long addr)
 {
-   tlb_remove_table(tlb, virt_to_page(pudp));
+   tlb_remove_ptdesc(tlb, virt_to_ptdesc(pudp));
 }
 #endif
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index af6bc8403ee4..5ba005fd607e 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -426,6 +426,7 @@ static phys_addr_t __pgd_pgtable_alloc(int shift)
 static phys_addr_t pgd_pgtable_alloc(int shift)
 {
phys_addr_t pa = __pgd_pgtable_alloc(shift);
+   struct ptdesc *ptdesc = page_ptdesc(phys_to_page(pa));
 
/*
 * Call proper page table ctor in case later we need to
@@ -433,12 +434,12 @@ static phys_addr_t pgd_pgtable_alloc(int shift)
 * this pre-allocated page table.
 *
 * We don't select ARCH_ENABLE_SPLIT_PMD_PTLOCK if pmd is
-* folded, and if so pgtable_pmd_page_ctor() becomes nop.
+	 * folded, and if so ptdesc_pmd_ctor() becomes nop.
 */
if (shift == PAGE_SHIFT)
-   BUG_ON(!pgtable_pte_page_ctor(phys_to_page(pa)));
+		BUG_ON(!ptdesc_pte_ctor(ptdesc));
else if (shift == PMD_SHIFT)
-   BUG_ON(!pgtable_pmd_page_ctor(phys_to_page(pa)));
+		BUG_ON(!ptdesc_pmd_ctor(ptdesc));
 
return pa;
 }
-- 
2.39.2



[PATCH 19/33] arm: Convert various functions to use ptdescs

2023-04-19 Thread Vishal Moola (Oracle)
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.

late_alloc() also uses the __get_free_pages() helper function. Convert
this to use ptdesc_alloc() and ptdesc_address() instead to help
standardize page tables further.
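
One detail worth spelling out, since late_alloc() hands back a kernel
virtual address: ptdesc_alloc() returns the descriptor, not a usable
pointer, so the conversion has to go through ptdesc_address(). A minimal
sketch of the intended shape, using only helpers introduced earlier in this
series:

	struct ptdesc *ptdesc = ptdesc_alloc(GFP_PGTABLE_KERNEL, get_order(sz));

	if (!ptdesc || !ptdesc_pte_ctor(ptdesc))
		BUG();
	return ptdesc_address(ptdesc);	/* the table's VA, not the ptdesc */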

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/arm/include/asm/tlb.h | 12 +++-
 arch/arm/mm/mmu.c  |  6 +++---
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
index b8cbe03ad260..9ab8a6929d35 100644
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -39,7 +39,9 @@ static inline void __tlb_remove_table(void *_table)
 static inline void
 __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, unsigned long addr)
 {
-   pgtable_pte_page_dtor(pte);
+   struct ptdesc *ptdesc = page_ptdesc(pte);
+
+   ptdesc_pte_dtor(ptdesc);
 
 #ifndef CONFIG_ARM_LPAE
/*
@@ -50,17 +52,17 @@ __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, 
unsigned long addr)
__tlb_adjust_range(tlb, addr - PAGE_SIZE, 2 * PAGE_SIZE);
 #endif
 
-   tlb_remove_table(tlb, pte);
+   tlb_remove_ptdesc(tlb, ptdesc);
 }
 
 static inline void
 __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp, unsigned long addr)
 {
 #ifdef CONFIG_ARM_LPAE
-   struct page *page = virt_to_page(pmdp);
+   struct ptdesc *ptdesc = virt_to_ptdesc(pmdp);
 
-   pgtable_pmd_page_dtor(page);
-   tlb_remove_table(tlb, page);
+   ptdesc_pmd_dtor(ptdesc);
+   tlb_remove_ptdesc(tlb, ptdesc);
 #endif
 }
 
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 463fc2a8448f..7add505bd797 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -737,11 +737,11 @@ static void __init *early_alloc(unsigned long sz)
 
 static void *__init late_alloc(unsigned long sz)
 {
-   void *ptr = (void *)__get_free_pages(GFP_PGTABLE_KERNEL, get_order(sz));
+	struct ptdesc *ptdesc = ptdesc_alloc(GFP_PGTABLE_KERNEL, get_order(sz));
 
-   if (!ptr || !pgtable_pte_page_ctor(virt_to_page(ptr)))
+   if (!ptdesc || !ptdesc_pte_ctor(ptdesc))
BUG();
-   return ptr;
+	return ptdesc_address(ptdesc);
 }
 
 static pte_t * __init arm_pte_alloc(pmd_t *pmd, unsigned long addr,
-- 
2.39.2



[PATCH 18/33] pgalloc: Convert various functions to use ptdescs

2023-04-19 Thread Vishal Moola (Oracle)
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.

Some of the functions use the *get*page*() helper functions. Convert
these to use ptdesc_alloc() and ptdesc_address() instead to help
standardize page tables further.
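
Schematically, every conversion in this patch follows the same shape (shown
here as a sketch for reviewers, not as new code):

	/* before: page based */
	struct page *pte = alloc_page(gfp);

	if (!pte)
		return NULL;
	if (!pgtable_pte_page_ctor(pte)) {
		__free_page(pte);
		return NULL;
	}

	/* after: ptdesc based, order 0 */
	struct ptdesc *ptdesc = ptdesc_alloc(gfp, 0);

	if (!ptdesc)
		return NULL;
	if (!ptdesc_pte_ctor(ptdesc)) {
		ptdesc_free(ptdesc);
		return NULL;
	}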

Signed-off-by: Vishal Moola (Oracle) 
---
 include/asm-generic/pgalloc.h | 62 +--
 1 file changed, 37 insertions(+), 25 deletions(-)

diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
index a7cf825befae..7d4a1f5d3c17 100644
--- a/include/asm-generic/pgalloc.h
+++ b/include/asm-generic/pgalloc.h
@@ -18,7 +18,11 @@
  */
 static inline pte_t *__pte_alloc_one_kernel(struct mm_struct *mm)
 {
-   return (pte_t *)__get_free_page(GFP_PGTABLE_KERNEL);
+   struct ptdesc *ptdesc = ptdesc_alloc(GFP_PGTABLE_KERNEL, 0);
+
+   if (!ptdesc)
+   return NULL;
+   return (pte_t *)ptdesc_address(ptdesc);
 }
 
 #ifndef __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL
@@ -41,7 +45,7 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct 
*mm)
  */
 static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 {
-   free_page((unsigned long)pte);
+   ptdesc_free(virt_to_ptdesc(pte));
 }
 
 /**
@@ -49,7 +53,7 @@ static inline void pte_free_kernel(struct mm_struct *mm, 
pte_t *pte)
  * @mm: the mm_struct of the current context
  * @gfp: GFP flags to use for the allocation
  *
- * Allocates a page and runs the pgtable_pte_page_ctor().
+ * Allocates a ptdesc and runs the ptdesc_pte_ctor().
  *
  * This function is intended for architectures that need
  * anything beyond simple page allocation or must have custom GFP flags.
@@ -58,17 +62,17 @@ static inline void pte_free_kernel(struct mm_struct *mm, 
pte_t *pte)
  */
 static inline pgtable_t __pte_alloc_one(struct mm_struct *mm, gfp_t gfp)
 {
-   struct page *pte;
+   struct ptdesc *ptdesc;
 
-   pte = alloc_page(gfp);
-   if (!pte)
+   ptdesc = ptdesc_alloc(gfp, 0);
+   if (!ptdesc)
return NULL;
-   if (!pgtable_pte_page_ctor(pte)) {
-   __free_page(pte);
+   if (!ptdesc_pte_ctor(ptdesc)) {
+   ptdesc_free(ptdesc);
return NULL;
}
 
-   return pte;
+   return ptdesc_page(ptdesc);
 }
 
 #ifndef __HAVE_ARCH_PTE_ALLOC_ONE
@@ -76,7 +80,7 @@ static inline pgtable_t __pte_alloc_one(struct mm_struct *mm, 
gfp_t gfp)
  * pte_alloc_one - allocate a page for PTE-level user page table
  * @mm: the mm_struct of the current context
  *
- * Allocates a page and runs the pgtable_pte_page_ctor().
+ * Allocates a ptdesc and runs the ptdesc_pte_ctor().
  *
  * Return: `struct page` initialized as page table or %NULL on error
  */
@@ -98,8 +102,10 @@ static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
  */
 static inline void pte_free(struct mm_struct *mm, struct page *pte_page)
 {
-   pgtable_pte_page_dtor(pte_page);
-   __free_page(pte_page);
+   struct ptdesc *ptdesc = page_ptdesc(pte_page);
+
+   ptdesc_pte_dtor(ptdesc);
+   ptdesc_free(ptdesc);
 }
 
 
@@ -110,7 +116,7 @@ static inline void pte_free(struct mm_struct *mm, struct 
page *pte_page)
  * pmd_alloc_one - allocate a page for PMD-level page table
  * @mm: the mm_struct of the current context
  *
- * Allocates a page and runs the pgtable_pmd_page_ctor().
+ * Allocates a ptdesc and runs the ptdesc_pmd_ctor().
  * Allocations use %GFP_PGTABLE_USER in user context and
  * %GFP_PGTABLE_KERNEL in kernel context.
  *
@@ -118,28 +124,30 @@ static inline void pte_free(struct mm_struct *mm, struct 
page *pte_page)
  */
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-   struct page *page;
+   struct ptdesc *ptdesc;
gfp_t gfp = GFP_PGTABLE_USER;
 
	if (mm == &init_mm)
gfp = GFP_PGTABLE_KERNEL;
-   page = alloc_page(gfp);
-   if (!page)
+   ptdesc = ptdesc_alloc(gfp, 0);
+   if (!ptdesc)
return NULL;
-   if (!pgtable_pmd_page_ctor(page)) {
-   __free_page(page);
+   if (!ptdesc_pmd_ctor(ptdesc)) {
+   ptdesc_free(ptdesc);
return NULL;
}
-   return (pmd_t *)page_address(page);
+   return (pmd_t *)ptdesc_address(ptdesc);
 }
 #endif
 
 #ifndef __HAVE_ARCH_PMD_FREE
 static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
 {
+   struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
+
BUG_ON((unsigned long)pmd & (PAGE_SIZE-1));
-   pgtable_pmd_page_dtor(virt_to_page(pmd));
-   free_page((unsigned long)pmd);
+   ptdesc_pmd_dtor(ptdesc);
+   ptdesc_free(ptdesc);
 }
 #endif
 
@@ -149,11 +157,15 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t 
*pmd)
 
 static inline pud_t *__pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-   gfp_t gfp = GFP_PGTABLE_USER;
+   gfp_t gfp = GFP_PGTABLE_USER | __GFP_ZERO;
+   struct 

[PATCH 17/33] mm: Remove page table members from struct page

2023-04-19 Thread Vishal Moola (Oracle)
The page table members are now split out into their own ptdesc struct.
Remove them from struct page.
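
This is safe because struct ptdesc still overlays struct page: the
TABLE_MATCH() assertions kept in include/linux/pgtable.h (the macro was
introduced earlier in this series; its definition is shown here for context)
pin each ptdesc field to the offset of its struct page counterpart:

	#define TABLE_MATCH(pg, pt)						\
		static_assert(offsetof(struct page, pg) == offsetof(struct ptdesc, pt))

	TABLE_MATCH(flags, __page_flags);
	static_assert(sizeof(struct ptdesc) <= sizeof(struct page));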

Signed-off-by: Vishal Moola (Oracle) 
---
 include/linux/mm_types.h | 14 --
 include/linux/pgtable.h  |  3 ---
 2 files changed, 17 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 2616d64c0e8c..4355f95abc5a 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -141,20 +141,6 @@ struct page {
struct {/* Tail pages of compound page */
unsigned long compound_head;/* Bit zero is set */
};
-   struct {/* Page table pages */
-   unsigned long _pt_pad_1;/* compound_head */
-   pgtable_t pmd_huge_pte; /* protected by page->ptl */
-   unsigned long _pt_s390_gaddr;   /* mapping */
-   union {
-   struct mm_struct *pt_mm; /* x86 pgds only */
-   atomic_t pt_frag_refcount; /* powerpc */
-   };
-#if ALLOC_SPLIT_PTLOCKS
-   spinlock_t *ptl;
-#else
-   spinlock_t ptl;
-#endif
-   };
struct {/* ZONE_DEVICE pages */
/** @pgmap: Points to the hosting device page map. */
struct dev_pagemap *pgmap;
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 7cd803aa38eb..8cacdf1fc411 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -91,9 +91,6 @@ TABLE_MATCH(flags, __page_flags);
 TABLE_MATCH(compound_head, pt_list);
 TABLE_MATCH(compound_head, _pt_pad_1);
 TABLE_MATCH(mapping, _pt_s390_gaddr);
-TABLE_MATCH(pmd_huge_pte, pmd_huge_pte);
-TABLE_MATCH(pt_mm, pt_mm);
-TABLE_MATCH(ptl, ptl);
 #undef TABLE_MATCH
 static_assert(sizeof(struct ptdesc) <= sizeof(struct page));
 
-- 
2.39.2



[PATCH 16/33] s390: Convert various pgalloc functions to use ptdescs

2023-04-19 Thread Vishal Moola (Oracle)
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.

Some of the functions use the *get*page*() helper functions. Convert
these to use ptdesc_alloc() and ptdesc_address() instead to help
standardize page tables further.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/s390/include/asm/pgalloc.h |   4 +-
 arch/s390/include/asm/tlb.h |   4 +-
 arch/s390/mm/pgalloc.c  | 108 
 3 files changed, 59 insertions(+), 57 deletions(-)

diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
index 17eb618f1348..9841481560ae 100644
--- a/arch/s390/include/asm/pgalloc.h
+++ b/arch/s390/include/asm/pgalloc.h
@@ -86,7 +86,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, 
unsigned long vmaddr)
if (!table)
return NULL;
crst_table_init(table, _SEGMENT_ENTRY_EMPTY);
-   if (!pgtable_pmd_page_ctor(virt_to_page(table))) {
+   if (!ptdesc_pmd_ctor(virt_to_ptdesc(table))) {
crst_table_free(mm, table);
return NULL;
}
@@ -97,7 +97,7 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
 {
if (mm_pmd_folded(mm))
return;
-   pgtable_pmd_page_dtor(virt_to_page(pmd));
+   ptdesc_pmd_dtor(virt_to_ptdesc(pmd));
crst_table_free(mm, (unsigned long *) pmd);
 }
 
diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
index b91f4a9b044c..1388c819b467 100644
--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -89,12 +89,12 @@ static inline void pmd_free_tlb(struct mmu_gather *tlb, 
pmd_t *pmd,
 {
if (mm_pmd_folded(tlb->mm))
return;
-   pgtable_pmd_page_dtor(virt_to_page(pmd));
+   ptdesc_pmd_dtor(virt_to_ptdesc(pmd));
__tlb_adjust_range(tlb, address, PAGE_SIZE);
tlb->mm->context.flush_mm = 1;
tlb->freed_tables = 1;
tlb->cleared_puds = 1;
-   tlb_remove_table(tlb, pmd);
+   tlb_remove_ptdesc(tlb, pmd);
 }
 
 /*
diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 6b99932abc66..16a29d2cfe85 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -43,17 +43,17 @@ __initcall(page_table_register_sysctl);
 
 unsigned long *crst_table_alloc(struct mm_struct *mm)
 {
-   struct page *page = alloc_pages(GFP_KERNEL, CRST_ALLOC_ORDER);
+	struct ptdesc *ptdesc = ptdesc_alloc(GFP_KERNEL, CRST_ALLOC_ORDER);
 
-   if (!page)
+   if (!ptdesc)
return NULL;
-   arch_set_page_dat(page, CRST_ALLOC_ORDER);
-   return (unsigned long *) page_to_virt(page);
+   arch_set_page_dat(ptdesc_page(ptdesc), CRST_ALLOC_ORDER);
+   return (unsigned long *) ptdesc_to_virt(ptdesc);
 }
 
 void crst_table_free(struct mm_struct *mm, unsigned long *table)
 {
-   free_pages((unsigned long)table, CRST_ALLOC_ORDER);
+	ptdesc_free(virt_to_ptdesc(table));
 }
 
 static void __crst_table_upgrade(void *arg)
@@ -140,21 +140,21 @@ static inline unsigned int atomic_xor_bits(atomic_t *v, 
unsigned int bits)
 
 struct page *page_table_alloc_pgste(struct mm_struct *mm)
 {
-   struct page *page;
+	struct ptdesc *ptdesc;
u64 *table;
 
-   page = alloc_page(GFP_KERNEL);
-   if (page) {
-   table = (u64 *)page_to_virt(page);
+   ptdesc = ptdesc_alloc(GFP_KERNEL, 0);
+   if (ptdesc) {
+		table = (u64 *)ptdesc_to_virt(ptdesc);
memset64(table, _PAGE_INVALID, PTRS_PER_PTE);
memset64(table + PTRS_PER_PTE, 0, PTRS_PER_PTE);
}
-   return page;
+   return ptdesc_page(ptdesc);
 }
 
 void page_table_free_pgste(struct page *page)
 {
-   __free_page(page);
+   ptdesc_free(page_ptdesc(page));
 }
 
 #endif /* CONFIG_PGSTE */
@@ -230,7 +230,7 @@ void page_table_free_pgste(struct page *page)
 unsigned long *page_table_alloc(struct mm_struct *mm)
 {
unsigned long *table;
-   struct page *page;
+   struct ptdesc *ptdesc;
unsigned int mask, bit;
 
/* Try to get a fragment of a 4K page as a 2K page table */
@@ -238,9 +238,9 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
table = NULL;
	spin_lock_bh(&mm->context.lock);
	if (!list_empty(&mm->context.pgtable_list)) {
-		page = list_first_entry(&mm->context.pgtable_list,
-					struct page, lru);
-		mask = atomic_read(&page->pt_frag_refcount);
+		ptdesc = list_first_entry(&mm->context.pgtable_list,
+					struct ptdesc, pt_list);
+		mask = atomic_read(&ptdesc->pt_frag_refcount);
/*
 * The pending removal bits must also be checked.
 * Failure to do so might lead to an impossible
@@ -253,13 

[PATCH 14/33] x86: Convert various functions to use ptdescs

2023-04-19 Thread Vishal Moola (Oracle)
In order to split struct ptdesc from struct page, convert various
functions to use ptdescs.

Some of the functions use the *get*page*() helper functions. Convert
these to use ptdesc_alloc() and ptdesc_address() instead to help
standardize page tables further.
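
The one x86-specific field is pt_mm. Note that the struct page interface of
pgd_page_get_mm() is kept for existing callers, with the ptdesc lookup done
internally (condensed from the diff below):

	static void pgd_set_mm(pgd_t *pgd, struct mm_struct *mm)
	{
		virt_to_ptdesc(pgd)->pt_mm = mm;
	}

	struct mm_struct *pgd_page_get_mm(struct page *page)
	{
		return page_ptdesc(page)->pt_mm;
	}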

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/x86/mm/pgtable.c | 46 +--
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index afab0bc7862b..9b6f81c8eb32 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -52,7 +52,7 @@ early_param("userpte", setup_userpte);
 
 void ___pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
 {
-   pgtable_pte_page_dtor(pte);
+   ptdesc_pte_dtor(page_ptdesc(pte));
paravirt_release_pte(page_to_pfn(pte));
paravirt_tlb_remove_table(tlb, pte);
 }
@@ -60,7 +60,7 @@ void ___pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
 #if CONFIG_PGTABLE_LEVELS > 2
 void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
 {
-   struct page *page = virt_to_page(pmd);
+   struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
paravirt_release_pmd(__pa(pmd) >> PAGE_SHIFT);
/*
 * NOTE! For PAE, any changes to the top page-directory-pointer-table
@@ -69,8 +69,8 @@ void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
 #ifdef CONFIG_X86_PAE
tlb->need_flush_all = 1;
 #endif
-   pgtable_pmd_page_dtor(page);
-   paravirt_tlb_remove_table(tlb, page);
+   ptdesc_pmd_dtor(ptdesc);
+   paravirt_tlb_remove_table(tlb, ptdesc_page(ptdesc));
 }
 
 #if CONFIG_PGTABLE_LEVELS > 3
@@ -92,16 +92,16 @@ void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d)
 
 static inline void pgd_list_add(pgd_t *pgd)
 {
-   struct page *page = virt_to_page(pgd);
+   struct ptdesc *ptdesc = virt_to_ptdesc(pgd);
 
-	list_add(&page->lru, &pgd_list);
+	list_add(&ptdesc->pt_list, &pgd_list);
 }
 
 static inline void pgd_list_del(pgd_t *pgd)
 {
-   struct page *page = virt_to_page(pgd);
+   struct ptdesc *ptdesc = virt_to_ptdesc(pgd);
 
-	list_del(&page->lru);
+	list_del(&ptdesc->pt_list);
 }
 
 #define UNSHARED_PTRS_PER_PGD  \
@@ -112,12 +112,12 @@ static inline void pgd_list_del(pgd_t *pgd)
 
 static void pgd_set_mm(pgd_t *pgd, struct mm_struct *mm)
 {
-   virt_to_page(pgd)->pt_mm = mm;
+   virt_to_ptdesc(pgd)->pt_mm = mm;
 }
 
 struct mm_struct *pgd_page_get_mm(struct page *page)
 {
-   return page->pt_mm;
+   return page_ptdesc(page)->pt_mm;
 }
 
 static void pgd_ctor(struct mm_struct *mm, pgd_t *pgd)
@@ -213,11 +213,14 @@ void pud_populate(struct mm_struct *mm, pud_t *pudp, 
pmd_t *pmd)
 static void free_pmds(struct mm_struct *mm, pmd_t *pmds[], int count)
 {
int i;
+   struct ptdesc *ptdesc;
 
for (i = 0; i < count; i++)
if (pmds[i]) {
-   pgtable_pmd_page_dtor(virt_to_page(pmds[i]));
-   free_page((unsigned long)pmds[i]);
+   ptdesc = virt_to_ptdesc(pmds[i]);
+
+   ptdesc_pmd_dtor(ptdesc);
+   ptdesc_free(ptdesc);
mm_dec_nr_pmds(mm);
}
 }
@@ -232,16 +235,21 @@ static int preallocate_pmds(struct mm_struct *mm, pmd_t 
*pmds[], int count)
gfp &= ~__GFP_ACCOUNT;
 
for (i = 0; i < count; i++) {
-   pmd_t *pmd = (pmd_t *)__get_free_page(gfp);
-   if (!pmd)
+   pmd_t *pmd = NULL;
+   struct ptdesc *ptdesc = ptdesc_alloc(gfp, 0);
+
+   if (!ptdesc)
failed = true;
-   if (pmd && !pgtable_pmd_page_ctor(virt_to_page(pmd))) {
-   free_page((unsigned long)pmd);
-   pmd = NULL;
+   if (ptdesc && !ptdesc_pmd_ctor(ptdesc)) {
+   ptdesc_free(ptdesc);
+   ptdesc = NULL;
failed = true;
}
-   if (pmd)
+   if (ptdesc) {
mm_inc_nr_pmds(mm);
+   pmd = (pmd_t *)ptdesc_address(ptdesc);
+   }
+
pmds[i] = pmd;
}
 
@@ -838,7 +846,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 
free_page((unsigned long)pmd_sv);
 
-   pgtable_pmd_page_dtor(virt_to_page(pmd));
+   ptdesc_pmd_dtor(virt_to_ptdesc(pmd));
free_page((unsigned long)pmd);
 
return 1;
-- 
2.39.2



[PATCH 15/33] s390: Convert various gmap functions to use ptdescs

2023-04-19 Thread Vishal Moola (Oracle)
In order to split struct ptdesc from struct page, convert various
functions to use ptdescs.

Some of the functions use the *get*page*() helper functions. Convert
these to use ptdesc_alloc() and ptdesc_address() instead to help
standardize page tables further.
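
The recurring change here is the list threading: CRST and page-table pages
were kept on the gmap lists via page->lru, and now use ptdesc->pt_list, e.g.
(condensed from the diff below):

	list_add(&ptdesc->pt_list, &gmap->crst_list);
	...
	list_for_each_entry_safe(ptdesc, next, &gmap->crst_list, pt_list) {
		ptdesc->_pt_s390_gaddr = 0;
		ptdesc_free(ptdesc);
	}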

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/s390/mm/gmap.c | 230 
 1 file changed, 128 insertions(+), 102 deletions(-)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index a61ea1a491dc..9c6ea1d16e09 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -34,7 +34,7 @@
 static struct gmap *gmap_alloc(unsigned long limit)
 {
struct gmap *gmap;
-   struct page *page;
+   struct ptdesc *ptdesc;
unsigned long *table;
unsigned long etype, atype;
 
@@ -67,12 +67,12 @@ static struct gmap *gmap_alloc(unsigned long limit)
	spin_lock_init(&gmap->guest_table_lock);
	spin_lock_init(&gmap->shadow_lock);
	refcount_set(&gmap->ref_count, 1);
-   page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
-   if (!page)
+   ptdesc = ptdesc_alloc(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
+   if (!ptdesc)
goto out_free;
-   page->_pt_s390_gaddr = 0;
-	list_add(&page->lru, &gmap->crst_list);
-   table = page_to_virt(page);
+   ptdesc->_pt_s390_gaddr = 0;
+	list_add(&ptdesc->pt_list, &gmap->crst_list);
+   table = ptdesc_to_virt(ptdesc);
crst_table_init(table, etype);
gmap->table = table;
gmap->asce = atype | _ASCE_TABLE_LENGTH |
@@ -181,25 +181,25 @@ static void gmap_rmap_radix_tree_free(struct 
radix_tree_root *root)
  */
 static void gmap_free(struct gmap *gmap)
 {
-   struct page *page, *next;
+   struct ptdesc *ptdesc, *next;
 
/* Flush tlb of all gmaps (if not already done for shadows) */
if (!(gmap_is_shadow(gmap) && gmap->removed))
gmap_flush_tlb(gmap);
/* Free all segment & region tables. */
-	list_for_each_entry_safe(page, next, &gmap->crst_list, lru) {
-   page->_pt_s390_gaddr = 0;
-   __free_pages(page, CRST_ALLOC_ORDER);
+	list_for_each_entry_safe(ptdesc, next, &gmap->crst_list, pt_list) {
+   ptdesc->_pt_s390_gaddr = 0;
+   ptdesc_free(ptdesc);
}
	gmap_radix_tree_free(&gmap->guest_to_host);
	gmap_radix_tree_free(&gmap->host_to_guest);
 
/* Free additional data for a shadow gmap */
if (gmap_is_shadow(gmap)) {
-   /* Free all page tables. */
-		list_for_each_entry_safe(page, next, &gmap->pt_list, lru) {
-   page->_pt_s390_gaddr = 0;
-   page_table_free_pgste(page);
+   /* Free all ptdesc tables. */
+		list_for_each_entry_safe(ptdesc, next, &gmap->pt_list, pt_list) {
+   ptdesc->_pt_s390_gaddr = 0;
+   page_table_free_pgste(ptdesc_page(ptdesc));
}
		gmap_rmap_radix_tree_free(&gmap->host_to_rmap);
/* Release reference to the parent */
@@ -308,27 +308,27 @@ EXPORT_SYMBOL_GPL(gmap_get_enabled);
 static int gmap_alloc_table(struct gmap *gmap, unsigned long *table,
unsigned long init, unsigned long gaddr)
 {
-   struct page *page;
+   struct ptdesc *ptdesc;
unsigned long *new;
 
/* since we dont free the gmap table until gmap_free we can unlock */
-   page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
-   if (!page)
+   ptdesc = ptdesc_alloc(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
+   if (!ptdesc)
return -ENOMEM;
-   new = page_to_virt(page);
+   new = ptdesc_to_virt(ptdesc);
crst_table_init(new, init);
	spin_lock(&gmap->guest_table_lock);
	if (*table & _REGION_ENTRY_INVALID) {
-		list_add(&page->lru, &gmap->crst_list);
+		list_add(&ptdesc->pt_list, &gmap->crst_list);
*table = __pa(new) | _REGION_ENTRY_LENGTH |
(*table & _REGION_ENTRY_TYPE_MASK);
-   page->_pt_s390_gaddr = gaddr;
-   page = NULL;
+   ptdesc->_pt_s390_gaddr = gaddr;
+   ptdesc = NULL;
}
	spin_unlock(&gmap->guest_table_lock);
-   if (page) {
-   page->_pt_s390_gaddr = 0;
-   __free_pages(page, CRST_ALLOC_ORDER);
+   if (ptdesc) {
+   ptdesc->_pt_s390_gaddr = 0;
+   ptdesc_free(ptdesc);
}
return 0;
 }
@@ -341,13 +341,13 @@ static int gmap_alloc_table(struct gmap *gmap, unsigned 
long *table,
  */
 static unsigned long __gmap_segment_gaddr(unsigned long *entry)
 {
-   struct page *page;
+   struct ptdesc *ptdesc;
unsigned long offset;
 
offset = (unsigned long) entry / sizeof(unsigned long);
offset = (offset & (PTRS_PER_PMD - 1)) * PMD_SIZE;
-   page = pmd_pgtable_page((pmd_t *) entry);
-   return page->_pt_s390_gaddr + offset;
+   ptdesc = 

[PATCH 13/33] powerpc: Convert various functions to use ptdescs

2023-04-19 Thread Vishal Moola (Oracle)
In order to split struct ptdesc from struct page, convert various
functions to use ptdescs.
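
For reference, the fragment lifecycle this patch moves from struct page to
struct ptdesc (a sketch of the existing logic, not new behaviour): one page
hosts PTE_FRAG_NR (or PMD_FRAG_NR) fragments, and pt_frag_refcount counts
the live ones:

	/* first fragment handed out */
	atomic_set(&ptdesc->pt_frag_refcount, 1);
	/* page parked in the per-mm cache */
	atomic_set(&ptdesc->pt_frag_refcount, PMD_FRAG_NR);
	/* each free drops one reference; the last tears the page down */
	if (atomic_dec_and_test(&ptdesc->pt_frag_refcount)) {
		ptdesc_pmd_dtor(ptdesc);
		ptdesc_free(ptdesc);
	}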

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/powerpc/mm/book3s64/mmu_context.c | 10 +++---
 arch/powerpc/mm/book3s64/pgtable.c | 32 +-
 arch/powerpc/mm/pgtable-frag.c | 46 +-
 3 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/mmu_context.c 
b/arch/powerpc/mm/book3s64/mmu_context.c
index c766e4c26e42..b22ad2839897 100644
--- a/arch/powerpc/mm/book3s64/mmu_context.c
+++ b/arch/powerpc/mm/book3s64/mmu_context.c
@@ -246,15 +246,15 @@ static void destroy_contexts(mm_context_t *ctx)
 static void pmd_frag_destroy(void *pmd_frag)
 {
int count;
-   struct page *page;
+   struct ptdesc *ptdesc;
 
-   page = virt_to_page(pmd_frag);
+   ptdesc = virt_to_ptdesc(pmd_frag);
/* drop all the pending references */
count = ((unsigned long)pmd_frag & ~PAGE_MASK) >> PMD_FRAG_SIZE_SHIFT;
/* We allow PTE_FRAG_NR fragments from a PTE page */
-	if (atomic_sub_and_test(PMD_FRAG_NR - count, &page->pt_frag_refcount)) {
-   pgtable_pmd_page_dtor(page);
-   __free_page(page);
+	if (atomic_sub_and_test(PMD_FRAG_NR - count, &ptdesc->pt_frag_refcount)) {
+   ptdesc_pmd_dtor(ptdesc);
+   ptdesc_free(ptdesc);
}
 }
 
diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
b/arch/powerpc/mm/book3s64/pgtable.c
index 85c84e89e3ea..7693be80c0f9 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -306,22 +306,22 @@ static pmd_t *get_pmd_from_cache(struct mm_struct *mm)
 static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
 {
void *ret = NULL;
-   struct page *page;
+   struct ptdesc *ptdesc;
gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO;
 
	if (mm == &init_mm)
gfp &= ~__GFP_ACCOUNT;
-   page = alloc_page(gfp);
-   if (!page)
+	ptdesc = ptdesc_alloc(gfp, 0);
+   if (!ptdesc)
return NULL;
-   if (!pgtable_pmd_page_ctor(page)) {
-   __free_pages(page, 0);
+   if (!ptdesc_pmd_ctor(ptdesc)) {
+   ptdesc_free(ptdesc);
return NULL;
}
 
-	atomic_set(&page->pt_frag_refcount, 1);
+	atomic_set(&ptdesc->pt_frag_refcount, 1);
 
-   ret = page_address(page);
+   ret = ptdesc_address(ptdesc);
/*
 * if we support only one fragment just return the
 * allocated page.
@@ -331,12 +331,12 @@ static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
 
	spin_lock(&mm->page_table_lock);
/*
-* If we find pgtable_page set, we return
+* If we find ptdesc_page set, we return
 * the allocated page with single fragment
 * count.
 */
if (likely(!mm->context.pmd_frag)) {
-		atomic_set(&page->pt_frag_refcount, PMD_FRAG_NR);
+		atomic_set(&ptdesc->pt_frag_refcount, PMD_FRAG_NR);
mm->context.pmd_frag = ret + PMD_FRAG_SIZE;
}
	spin_unlock(&mm->page_table_lock);
@@ -357,15 +357,15 @@ pmd_t *pmd_fragment_alloc(struct mm_struct *mm, unsigned 
long vmaddr)
 
 void pmd_fragment_free(unsigned long *pmd)
 {
-   struct page *page = virt_to_page(pmd);
+   struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
 
-   if (PageReserved(page))
-   return free_reserved_page(page);
+   if (ptdesc_is_reserved(ptdesc))
+   return free_reserved_ptdesc(ptdesc);
 
-	BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
-	if (atomic_dec_and_test(&page->pt_frag_refcount)) {
-   pgtable_pmd_page_dtor(page);
-   __free_page(page);
+	BUG_ON(atomic_read(&ptdesc->pt_frag_refcount) <= 0);
+	if (atomic_dec_and_test(&ptdesc->pt_frag_refcount)) {
+   ptdesc_pmd_dtor(ptdesc);
+   ptdesc_free(ptdesc);
}
 }
 
diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
index 20652daa1d7e..cf08831fa7c3 100644
--- a/arch/powerpc/mm/pgtable-frag.c
+++ b/arch/powerpc/mm/pgtable-frag.c
@@ -18,15 +18,15 @@
 void pte_frag_destroy(void *pte_frag)
 {
int count;
-   struct page *page;
+   struct ptdesc *ptdesc;
 
-   page = virt_to_page(pte_frag);
+   ptdesc = virt_to_ptdesc(pte_frag);
/* drop all the pending references */
count = ((unsigned long)pte_frag & ~PAGE_MASK) >> PTE_FRAG_SIZE_SHIFT;
/* We allow PTE_FRAG_NR fragments from a PTE page */
-	if (atomic_sub_and_test(PTE_FRAG_NR - count, &page->pt_frag_refcount)) {
-   pgtable_pte_page_dtor(page);
-   __free_page(page);
+	if (atomic_sub_and_test(PTE_FRAG_NR - count, &ptdesc->pt_frag_refcount)) {
+   ptdesc_pte_dtor(ptdesc);
+   ptdesc_free(ptdesc);
}
 }
 
@@ -55,25 +55,25 @@ static pte_t *get_pte_from_cache(struct mm_struct *mm)
 static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
 {
   

[PATCH 11/33] mm: Convert ptlock_free() to use ptdescs

2023-04-19 Thread Vishal Moola (Oracle)
This removes some direct accesses to struct page, working towards
splitting out struct ptdesc from struct page.

Signed-off-by: Vishal Moola (Oracle) 
---
 include/linux/mm.h | 10 +-
 mm/memory.c|  4 ++--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2390fc2542aa..17a64cfd1430 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2787,7 +2787,7 @@ static inline void ptdesc_clear(void *x)
 #if ALLOC_SPLIT_PTLOCKS
 void __init ptlock_cache_init(void);
 bool ptlock_alloc(struct ptdesc *ptdesc);
-extern void ptlock_free(struct page *page);
+void ptlock_free(struct ptdesc *ptdesc);
 
 static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc)
 {
@@ -2803,7 +2803,7 @@ static inline bool ptlock_alloc(struct ptdesc *ptdesc)
return true;
 }
 
-static inline void ptlock_free(struct page *page)
+static inline void ptlock_free(struct ptdesc *ptdesc)
 {
 }
 
@@ -2844,7 +2844,7 @@ static inline spinlock_t *pte_lockptr(struct mm_struct 
*mm, pmd_t *pmd)
 }
 static inline void ptlock_cache_init(void) {}
 static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; }
-static inline void ptlock_free(struct page *page) {}
+static inline void ptlock_free(struct ptdesc *ptdesc) {}
 #endif /* USE_SPLIT_PTE_PTLOCKS */
 
 static inline bool pgtable_pte_page_ctor(struct page *page)
@@ -2858,7 +2858,7 @@ static inline bool pgtable_pte_page_ctor(struct page 
*page)
 
 static inline void pgtable_pte_page_dtor(struct page *page)
 {
-   ptlock_free(page);
+   ptlock_free(page_ptdesc(page));
__ClearPageTable(page);
dec_lruvec_page_state(page, NR_PAGETABLE);
 }
@@ -2916,7 +2916,7 @@ static inline void pmd_ptlock_free(struct ptdesc *ptdesc)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
VM_BUG_ON_PAGE(ptdesc->pmd_huge_pte, ptdesc_page(ptdesc));
 #endif
-   ptlock_free(ptdesc_page(ptdesc));
+   ptlock_free(ptdesc);
 }
 
 #define pmd_huge_pte(mm, pmd) (pmd_ptdesc(pmd)->pmd_huge_pte)
diff --git a/mm/memory.c b/mm/memory.c
index 37d408ac1b8d..ca74425c9405 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5937,8 +5937,8 @@ bool ptlock_alloc(struct ptdesc *ptdesc)
return true;
 }
 
-void ptlock_free(struct page *page)
+void ptlock_free(struct ptdesc *ptdesc)
 {
-   kmem_cache_free(page_ptl_cachep, page->ptl);
+   kmem_cache_free(page_ptl_cachep, ptdesc->ptl);
 }
 #endif
-- 
2.39.2



[PATCH 12/33] mm: Create ptdesc equivalents for pgtable_{pte,pmd}_page_{ctor,dtor}

2023-04-19 Thread Vishal Moola (Oracle)
Create ptdesc_pte_ctor(), ptdesc_pmd_ctor(), ptdesc_pte_dtor(), and
ptdesc_pmd_dtor(), and make the original pgtable constructors/destructors
wrappers around them.
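
Callers that still operate on struct page keep working unchanged through the
wrappers; converted callers can skip the page round-trip. Schematically:

	if (!pgtable_pte_page_ctor(page))	/* == ptdesc_pte_ctor(page_ptdesc(page)) */
		return NULL;

	if (!ptdesc_pte_ctor(ptdesc))		/* converted call sites */
		return NULL;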

Signed-off-by: Vishal Moola (Oracle) 
---
 include/linux/mm.h | 56 ++
 1 file changed, 42 insertions(+), 14 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 17a64cfd1430..cb136d2fdf74 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2847,20 +2847,34 @@ static inline bool ptlock_init(struct ptdesc *ptdesc) { 
return true; }
 static inline void ptlock_free(struct ptdesc *ptdesc) {}
 #endif /* USE_SPLIT_PTE_PTLOCKS */
 
-static inline bool pgtable_pte_page_ctor(struct page *page)
+static inline bool ptdesc_pte_ctor(struct ptdesc *ptdesc)
 {
-   if (!ptlock_init(page_ptdesc(page)))
+   struct folio *folio = ptdesc_folio(ptdesc);
+
+   if (!ptlock_init(ptdesc))
return false;
-   __SetPageTable(page);
-   inc_lruvec_page_state(page, NR_PAGETABLE);
+	__SetPageTable(&folio->page);
+   lruvec_stat_add_folio(folio, NR_PAGETABLE);
return true;
 }
 
+static inline bool pgtable_pte_page_ctor(struct page *page)
+{
+   return ptdesc_pte_ctor(page_ptdesc(page));
+}
+
+static inline void ptdesc_pte_dtor(struct ptdesc *ptdesc)
+{
+   struct folio *folio = ptdesc_folio(ptdesc);
+
+   ptlock_free(ptdesc);
+	__ClearPageTable(&folio->page);
+   lruvec_stat_sub_folio(folio, NR_PAGETABLE);
+}
+
 static inline void pgtable_pte_page_dtor(struct page *page)
 {
-   ptlock_free(page_ptdesc(page));
-   __ClearPageTable(page);
-   dec_lruvec_page_state(page, NR_PAGETABLE);
+   ptdesc_pte_dtor(page_ptdesc(page));
 }
 
 #define pte_offset_map_lock(mm, pmd, address, ptlp)\
@@ -2942,20 +2956,34 @@ static inline spinlock_t *pmd_lock(struct mm_struct 
*mm, pmd_t *pmd)
return ptl;
 }
 
-static inline bool pgtable_pmd_page_ctor(struct page *page)
+static inline bool ptdesc_pmd_ctor(struct ptdesc *ptdesc)
 {
-   if (!pmd_ptlock_init(page_ptdesc(page)))
+   struct folio *folio = ptdesc_folio(ptdesc);
+
+   if (!pmd_ptlock_init(ptdesc))
return false;
-   __SetPageTable(page);
-   inc_lruvec_page_state(page, NR_PAGETABLE);
+	__SetPageTable(&folio->page);
+   lruvec_stat_add_folio(folio, NR_PAGETABLE);
return true;
 }
 
+static inline bool pgtable_pmd_page_ctor(struct page *page)
+{
+   return ptdesc_pmd_ctor(page_ptdesc(page));
+}
+
+static inline void ptdesc_pmd_dtor(struct ptdesc *ptdesc)
+{
+   struct folio *folio = ptdesc_folio(ptdesc);
+
+   pmd_ptlock_free(ptdesc);
+	__ClearPageTable(&folio->page);
+   lruvec_stat_sub_folio(folio, NR_PAGETABLE);
+}
+
 static inline void pgtable_pmd_page_dtor(struct page *page)
 {
-   pmd_ptlock_free(page_ptdesc(page));
-   __ClearPageTable(page);
-   dec_lruvec_page_state(page, NR_PAGETABLE);
+   ptdesc_pmd_dtor(page_ptdesc(page));
 }
 
 /*
-- 
2.39.2



[PATCH 10/33] mm: Convert pmd_ptlock_free() to use ptdescs

2023-04-19 Thread Vishal Moola (Oracle)
This removes some direct accesses to struct page, working towards
splitting out struct ptdesc from struct page.

Signed-off-by: Vishal Moola (Oracle) 
---
 include/linux/mm.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index d2485a110936..2390fc2542aa 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2911,12 +2911,12 @@ static inline bool pmd_ptlock_init(struct ptdesc 
*ptdesc)
return ptlock_init(ptdesc);
 }
 
-static inline void pmd_ptlock_free(struct page *page)
+static inline void pmd_ptlock_free(struct ptdesc *ptdesc)
 {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-   VM_BUG_ON_PAGE(page->pmd_huge_pte, page);
+   VM_BUG_ON_PAGE(ptdesc->pmd_huge_pte, ptdesc_page(ptdesc));
 #endif
-   ptlock_free(page);
+   ptlock_free(ptdesc_page(ptdesc));
 }
 
 #define pmd_huge_pte(mm, pmd) (pmd_ptdesc(pmd)->pmd_huge_pte)
@@ -2929,7 +2929,7 @@ static inline spinlock_t *pmd_lockptr(struct mm_struct 
*mm, pmd_t *pmd)
 }
 
 static inline bool pmd_ptlock_init(struct ptdesc *ptdesc) { return true; }
-static inline void pmd_ptlock_free(struct page *page) {}
+static inline void pmd_ptlock_free(struct ptdesc *ptdesc) {}
 
 #define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte)
 
@@ -2953,7 +2953,7 @@ static inline bool pgtable_pmd_page_ctor(struct page 
*page)
 
 static inline void pgtable_pmd_page_dtor(struct page *page)
 {
-   pmd_ptlock_free(page);
+   pmd_ptlock_free(page_ptdesc(page));
__ClearPageTable(page);
dec_lruvec_page_state(page, NR_PAGETABLE);
 }
-- 
2.39.2



[PATCH 09/33] mm: Convert ptlock_init() to use ptdescs

2023-04-19 Thread Vishal Moola (Oracle)
This removes some direct accesses to struct page, working towards
splitting out struct ptdesc from struct page.

Signed-off-by: Vishal Moola (Oracle) 
---
 include/linux/mm.h | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7eb562909b2c..d2485a110936 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2818,7 +2818,7 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
return ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
 }
 
-static inline bool ptlock_init(struct page *page)
+static inline bool ptlock_init(struct ptdesc *ptdesc)
 {
/*
 * prep_new_page() initialize page->private (and therefore page->ptl)
@@ -2827,10 +2827,10 @@ static inline bool ptlock_init(struct page *page)
 * It can happen if arch try to use slab for page table allocation:
 * slab code uses page->slab_cache, which share storage with page->ptl.
 */
-   VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page);
-   if (!ptlock_alloc(page_ptdesc(page)))
+   VM_BUG_ON_PAGE(*(unsigned long *)&ptdesc->ptl, ptdesc_page(ptdesc));
+   if (!ptlock_alloc(ptdesc))
return false;
-   spin_lock_init(ptlock_ptr(page_ptdesc(page)));
+   spin_lock_init(ptlock_ptr(ptdesc));
return true;
 }
 
@@ -2843,13 +2843,13 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
	return &mm->page_table_lock;
 }
 static inline void ptlock_cache_init(void) {}
-static inline bool ptlock_init(struct page *page) { return true; }
+static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; }
 static inline void ptlock_free(struct page *page) {}
 #endif /* USE_SPLIT_PTE_PTLOCKS */
 
 static inline bool pgtable_pte_page_ctor(struct page *page)
 {
-   if (!ptlock_init(page))
+   if (!ptlock_init(page_ptdesc(page)))
return false;
__SetPageTable(page);
inc_lruvec_page_state(page, NR_PAGETABLE);
@@ -2908,7 +2908,7 @@ static inline bool pmd_ptlock_init(struct ptdesc *ptdesc)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
ptdesc->pmd_huge_pte = NULL;
 #endif
-   return ptlock_init(ptdesc_page(ptdesc));
+   return ptlock_init(ptdesc);
 }
 
 static inline void pmd_ptlock_free(struct page *page)
-- 
2.39.2



[PATCH 08/33] mm: Convert pmd_ptlock_init() to use ptdescs

2023-04-19 Thread Vishal Moola (Oracle)
This removes some direct accesses to struct page, working towards
splitting out struct ptdesc from struct page.

Signed-off-by: Vishal Moola (Oracle) 
---
 include/linux/mm.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ed8dd0464841..7eb562909b2c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2903,12 +2903,12 @@ static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
return ptlock_ptr(pmd_ptdesc(pmd));
 }
 
-static inline bool pmd_ptlock_init(struct page *page)
+static inline bool pmd_ptlock_init(struct ptdesc *ptdesc)
 {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-   page->pmd_huge_pte = NULL;
+   ptdesc->pmd_huge_pte = NULL;
 #endif
-   return ptlock_init(page);
+   return ptlock_init(ptdesc_page(ptdesc));
 }
 
 static inline void pmd_ptlock_free(struct page *page)
@@ -2928,7 +2928,7 @@ static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
	return &mm->page_table_lock;
 }
 
-static inline bool pmd_ptlock_init(struct page *page) { return true; }
+static inline bool pmd_ptlock_init(struct ptdesc *ptdesc) { return true; }
 static inline void pmd_ptlock_free(struct page *page) {}
 
 #define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte)
@@ -2944,7 +2944,7 @@ static inline spinlock_t *pmd_lock(struct mm_struct *mm, pmd_t *pmd)
 
 static inline bool pgtable_pmd_page_ctor(struct page *page)
 {
-   if (!pmd_ptlock_init(page))
+   if (!pmd_ptlock_init(page_ptdesc(page)))
return false;
__SetPageTable(page);
inc_lruvec_page_state(page, NR_PAGETABLE);
-- 
2.39.2



[PATCH 07/33] mm: Convert ptlock_ptr() to use ptdescs

2023-04-19 Thread Vishal Moola (Oracle)
This removes some direct accesses to struct page, working towards
splitting out struct ptdesc from struct page.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/x86/xen/mmu_pv.c |  2 +-
 include/linux/mm.h| 14 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index fdc91deece7e..a1c9f8dcbb5a 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -651,7 +651,7 @@ static spinlock_t *xen_pte_lock(struct page *page, struct mm_struct *mm)
spinlock_t *ptl = NULL;
 
 #if USE_SPLIT_PTE_PTLOCKS
-   ptl = ptlock_ptr(page);
+   ptl = ptlock_ptr(page_ptdesc(page));
	spin_lock_nest_lock(ptl, &mm->page_table_lock);
 #endif
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 17dc6e37ea03..ed8dd0464841 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2789,9 +2789,9 @@ void __init ptlock_cache_init(void);
 bool ptlock_alloc(struct ptdesc *ptdesc);
 extern void ptlock_free(struct page *page);
 
-static inline spinlock_t *ptlock_ptr(struct page *page)
+static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc)
 {
-   return page->ptl;
+   return ptdesc->ptl;
 }
 #else /* ALLOC_SPLIT_PTLOCKS */
 static inline void ptlock_cache_init(void)
@@ -2807,15 +2807,15 @@ static inline void ptlock_free(struct page *page)
 {
 }
 
-static inline spinlock_t *ptlock_ptr(struct page *page)
+static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc)
 {
-   return &page->ptl;
+   return &ptdesc->ptl;
 }
 #endif /* ALLOC_SPLIT_PTLOCKS */
 
 static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
 {
-   return ptlock_ptr(pmd_page(*pmd));
+   return ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
 }
 
 static inline bool ptlock_init(struct page *page)
@@ -2830,7 +2830,7 @@ static inline bool ptlock_init(struct page *page)
	VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page);
if (!ptlock_alloc(page_ptdesc(page)))
return false;
-   spin_lock_init(ptlock_ptr(page));
+   spin_lock_init(ptlock_ptr(page_ptdesc(page)));
return true;
 }
 
@@ -2900,7 +2900,7 @@ static inline struct ptdesc *pmd_ptdesc(pmd_t *pmd)
 
 static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
 {
-   return ptlock_ptr(ptdesc_page(pmd_ptdesc(pmd)));
+   return ptlock_ptr(pmd_ptdesc(pmd));
 }
 
 static inline bool pmd_ptlock_init(struct page *page)
-- 
2.39.2



[PATCH 06/33] mm: Convert ptlock_alloc() to use ptdescs

2023-04-19 Thread Vishal Moola (Oracle)
This removes some direct accesses to struct page, working towards
splitting out struct ptdesc from struct page.

Signed-off-by: Vishal Moola (Oracle) 
---
 include/linux/mm.h | 6 +++---
 mm/memory.c| 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 069187e84e35..17dc6e37ea03 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2786,7 +2786,7 @@ static inline void ptdesc_clear(void *x)
 #if USE_SPLIT_PTE_PTLOCKS
 #if ALLOC_SPLIT_PTLOCKS
 void __init ptlock_cache_init(void);
-extern bool ptlock_alloc(struct page *page);
+bool ptlock_alloc(struct ptdesc *ptdesc);
 extern void ptlock_free(struct page *page);
 
 static inline spinlock_t *ptlock_ptr(struct page *page)
@@ -2798,7 +2798,7 @@ static inline void ptlock_cache_init(void)
 {
 }
 
-static inline bool ptlock_alloc(struct page *page)
+static inline bool ptlock_alloc(struct ptdesc *ptdesc)
 {
return true;
 }
@@ -2828,7 +2828,7 @@ static inline bool ptlock_init(struct page *page)
 * slab code uses page->slab_cache, which share storage with page->ptl.
 */
	VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page);
-   if (!ptlock_alloc(page))
+   if (!ptlock_alloc(page_ptdesc(page)))
return false;
spin_lock_init(ptlock_ptr(page));
return true;
diff --git a/mm/memory.c b/mm/memory.c
index d4d7df041b6f..37d408ac1b8d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5926,14 +5926,14 @@ void __init ptlock_cache_init(void)
SLAB_PANIC, NULL);
 }
 
-bool ptlock_alloc(struct page *page)
+bool ptlock_alloc(struct ptdesc *ptdesc)
 {
spinlock_t *ptl;
 
ptl = kmem_cache_alloc(page_ptl_cachep, GFP_KERNEL);
if (!ptl)
return false;
-   page->ptl = ptl;
+   ptdesc->ptl = ptl;
return true;
 }
 
-- 
2.39.2



[PATCH 05/33] mm: Convert pmd_pgtable_page() to pmd_ptdesc()

2023-04-19 Thread Vishal Moola (Oracle)
Converts pmd_pgtable_page() to pmd_ptdesc() and all its callers. This
removes some direct accesses to struct page, working towards splitting
out struct ptdesc from struct page.

Signed-off-by: Vishal Moola (Oracle) 
---
 include/linux/mm.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ec3cbe2fa665..069187e84e35 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2892,15 +2892,15 @@ static inline void pgtable_pte_page_dtor(struct page *page)
 
 #if USE_SPLIT_PMD_PTLOCKS
 
-static inline struct page *pmd_pgtable_page(pmd_t *pmd)
+static inline struct ptdesc *pmd_ptdesc(pmd_t *pmd)
 {
unsigned long mask = ~(PTRS_PER_PMD * sizeof(pmd_t) - 1);
-   return virt_to_page((void *)((unsigned long) pmd & mask));
+   return virt_to_ptdesc((void *)((unsigned long) pmd & mask));
 }
 
 static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
 {
-   return ptlock_ptr(pmd_pgtable_page(pmd));
+   return ptlock_ptr(ptdesc_page(pmd_ptdesc(pmd)));
 }
 
 static inline bool pmd_ptlock_init(struct page *page)
@@ -2919,7 +2919,7 @@ static inline void pmd_ptlock_free(struct page *page)
ptlock_free(page);
 }
 
-#define pmd_huge_pte(mm, pmd) (pmd_pgtable_page(pmd)->pmd_huge_pte)
+#define pmd_huge_pte(mm, pmd) (pmd_ptdesc(pmd)->pmd_huge_pte)
 
 #else
 
-- 
2.39.2



[PATCH 04/33] mm: add utility functions for ptdesc

2023-04-19 Thread Vishal Moola (Oracle)
Introduce utility functions setting the foundation for ptdescs. These
will also assist in the splitting out of ptdesc from struct page.

ptdesc_alloc() is defined to allocate new ptdesc pages as compound
pages. This is to standardize ptdescs by allowing for one allocation
and one free function, in contrast to 2 allocation and 2 free functions.
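
As a rough usage sketch (mine, not part of the patch; real callers are
converted later in the series), the point is a single allocation and a
single free routine for a table of any order:

	/* Hypothetical caller of the helpers added below. */
	struct ptdesc *ptdesc = ptdesc_alloc(GFP_KERNEL, 0);

	if (!ptdesc)
		return NULL;
	/* ... install ptdesc_address(ptdesc) as a page table ... */
	ptdesc_free(ptdesc);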

Signed-off-by: Vishal Moola (Oracle) 
---
 include/asm-generic/tlb.h | 11 ++
 include/linux/mm.h| 44 +++
 include/linux/pgtable.h   | 13 
 3 files changed, 68 insertions(+)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index b46617207c93..6bade9e0e799 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -481,6 +481,17 @@ static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
return tlb_remove_page_size(tlb, page, PAGE_SIZE);
 }
 
+static inline void tlb_remove_ptdesc(struct mmu_gather *tlb, void *pt)
+{
+   tlb_remove_table(tlb, pt);
+}
+
+/* Like tlb_remove_ptdesc, but for page-like page directories. */
+static inline void tlb_remove_page_ptdesc(struct mmu_gather *tlb, struct ptdesc *pt)
+{
+   tlb_remove_page(tlb, ptdesc_page(pt));
+}
+
 static inline void tlb_change_page_size(struct mmu_gather *tlb,
 unsigned int page_size)
 {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b18848ae7e22..ec3cbe2fa665 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2744,6 +2744,45 @@ static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long a
 }
 #endif /* CONFIG_MMU */
 
+static inline struct ptdesc *virt_to_ptdesc(const void *x)
+{
+   return page_ptdesc(virt_to_head_page(x));
+}
+
+static inline void *ptdesc_to_virt(struct ptdesc *pt)
+{
+   return page_to_virt(ptdesc_page(pt));
+}
+
+static inline void *ptdesc_address(struct ptdesc *pt)
+{
+   return folio_address(ptdesc_folio(pt));
+}
+
+static inline bool ptdesc_is_reserved(struct ptdesc *pt)
+{
+   return folio_test_reserved(ptdesc_folio(pt));
+}
+
+static inline struct ptdesc *ptdesc_alloc(gfp_t gfp, unsigned int order)
+{
+   struct page *page = alloc_pages(gfp | __GFP_COMP, order);
+
+   return page_ptdesc(page);
+}
+
+static inline void ptdesc_free(struct ptdesc *pt)
+{
+   struct page *page = ptdesc_page(pt);
+
+   __free_pages(page, compound_order(page));
+}
+
+static inline void ptdesc_clear(void *x)
+{
+   clear_page(x);
+}
+
 #if USE_SPLIT_PTE_PTLOCKS
 #if ALLOC_SPLIT_PTLOCKS
 void __init ptlock_cache_init(void);
@@ -2970,6 +3009,11 @@ static inline void mark_page_reserved(struct page *page)
adjust_managed_page_count(page, -1);
 }
 
+static inline void free_reserved_ptdesc(struct ptdesc *pt)
+{
+   free_reserved_page(ptdesc_page(pt));
+}
+
 /*
  * Default method to free all the __init memory into the buddy system.
  * The freed pages will be poisoned with pattern "poison" if it's within
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 7cc6ea057ee9..7cd803aa38eb 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -97,6 +97,19 @@ TABLE_MATCH(ptl, ptl);
 #undef TABLE_MATCH
 static_assert(sizeof(struct ptdesc) <= sizeof(struct page));
 
+#define ptdesc_page(pt)    (_Generic((pt), \
+   const struct ptdesc *:  (const struct page *)(pt),  \
+   struct ptdesc *:(struct page *)(pt)))
+
+#define ptdesc_folio(pt)   (_Generic((pt), \
+   const struct ptdesc *:  (const struct folio *)(pt), \
+   struct ptdesc *:(struct folio *)(pt)))
+
+static inline struct ptdesc *page_ptdesc(struct page *page)
+{
+   return (struct ptdesc *)page;
+}
+
 /*
  * A page table page can be thought of an array like this: pXd_t[PTRS_PER_PxD]
  *
-- 
2.39.2



[PATCH 03/33] pgtable: Create struct ptdesc

2023-04-19 Thread Vishal Moola (Oracle)
Currently, page table information is stored within struct page. As part
of simplifying struct page, create struct ptdesc for page table
information.
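
The overlay is only safe if each ptdesc field lands at the same offset
as the struct page field it aliases, so the patch pins the layout down
at build time. A condensed sketch of the mechanism (the full list is in
the diff below):

	#define TABLE_MATCH(pg, pt) \
		static_assert(offsetof(struct page, pg) == offsetof(struct ptdesc, pt))

	/* Any layout drift now fails the build instead of silently
	 * corrupting state at runtime. */
	TABLE_MATCH(flags, __page_flags);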

Signed-off-by: Vishal Moola (Oracle) 
---
 include/linux/pgtable.h | 50 +
 1 file changed, 50 insertions(+)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 023918666dd4..7cc6ea057ee9 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -47,6 +47,56 @@
 #define pmd_pgtable(pmd) pmd_page(pmd)
 #endif
 
+/**
+ * struct ptdesc - Memory descriptor for page tables.
+ * @__page_flags: Same as page flags. Unused for page tables.
+ * @pt_list: List of used page tables. Used for s390 and x86.
+ * @_pt_pad_1: Padding that aliases with page's compound head.
+ * @pmd_huge_pte: Protected by ptdesc->ptl, used for THPs.
+ * @_pt_s390_gaddr: Aliases with page's mapping. Used for s390 gmap only.
+ * @pt_mm: Used for x86 pgds.
+ * @pt_frag_refcount: For fragmented page table tracking. Powerpc and s390 only.
+ * @ptl: Lock for the page table.
+ *
+ * This struct overlays struct page for now. Do not modify without a good
+ * understanding of the issues.
+ */
+struct ptdesc {
+   unsigned long __page_flags;
+
+   union {
+   struct list_head pt_list;
+   struct {
+   unsigned long _pt_pad_1;
+   pgtable_t pmd_huge_pte;
+   };
+   };
+   unsigned long _pt_s390_gaddr;
+
+   union {
+   struct mm_struct *pt_mm;
+   atomic_t pt_frag_refcount;
+   };
+
+#if ALLOC_SPLIT_PTLOCKS
+   spinlock_t *ptl;
+#else
+   spinlock_t ptl;
+#endif
+};
+
+#define TABLE_MATCH(pg, pt) \
+   static_assert(offsetof(struct page, pg) == offsetof(struct ptdesc, pt))
+TABLE_MATCH(flags, __page_flags);
+TABLE_MATCH(compound_head, pt_list);
+TABLE_MATCH(compound_head, _pt_pad_1);
+TABLE_MATCH(mapping, _pt_s390_gaddr);
+TABLE_MATCH(pmd_huge_pte, pmd_huge_pte);
+TABLE_MATCH(pt_mm, pt_mm);
+TABLE_MATCH(ptl, ptl);
+#undef TABLE_MATCH
+static_assert(sizeof(struct ptdesc) <= sizeof(struct page));
+
 /*
  * A page table page can be thought of an array like this: pXd_t[PTRS_PER_PxD]
  *
-- 
2.39.2



[PATCH 02/33] s390: Use pt_frag_refcount for pagetables

2023-04-19 Thread Vishal Moola (Oracle)
s390 currently uses _refcount to identify fragmented page tables.
The page table struct already has a member pt_frag_refcount used by
powerpc, so have s390 use that instead of the _refcount field as well.
This improves the safety for _refcount and the page table tracking.

This also allows us to simplify the tracking since we can once again use
the lower byte of pt_frag_refcount instead of the upper byte of _refcount.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/s390/mm/pgalloc.c | 38 +++---
 1 file changed, 15 insertions(+), 23 deletions(-)

diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 66ab68db9842..6b99932abc66 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -182,20 +182,17 @@ void page_table_free_pgste(struct page *page)
  * As follows from the above, no unallocated or fully allocated parent
  * pages are contained in mm_context_t::pgtable_list.
  *
- * The upper byte (bits 24-31) of the parent page _refcount is used
+ * The lower byte (bits 0-7) of the parent page pt_frag_refcount is used
  * for tracking contained 2KB-pgtables and has the following format:
  *
 *   PP  AA
- * 01234567    upper byte (bits 24-31) of struct page::_refcount
+ * 01234567    lower byte (bits 0-7) of struct page::pt_frag_refcount
 *   ||  ||
 *   ||  |+--- upper 2KB-pgtable is allocated
 *   ||  +---- lower 2KB-pgtable is allocated
 *   |+------- upper 2KB-pgtable is pending for removal
 *   +-------- lower 2KB-pgtable is pending for removal
  *
- * (See commit 620b4e903179 ("s390: use _refcount for pgtables") on why
- * using _refcount is possible).
- *
  * When 2KB-pgtable is allocated the corresponding AA bit is set to 1.
  * The parent page is either:
  *   - added to mm_context_t::pgtable_list in case the second half of the
@@ -243,11 +240,12 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
	if (!list_empty(&mm->context.pgtable_list)) {
		page = list_first_entry(&mm->context.pgtable_list,
				struct page, lru);
-		mask = atomic_read(&page->_refcount) >> 24;
+		mask = atomic_read(&page->pt_frag_refcount);
/*
 * The pending removal bits must also be checked.
 * Failure to do so might lead to an impossible
-* value of (i.e 0x13 or 0x23) written to _refcount.
+* value of (i.e 0x13 or 0x23) written to
+* pt_frag_refcount.
 * Such values violate the assumption that pending and
 * allocation bits are mutually exclusive, and the rest
 * of the code unrails as result. That could lead to
@@ -259,8 +257,8 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
bit = mask & 1; /* =1 -> second 2K */
if (bit)
table += PTRS_PER_PTE;
-			atomic_xor_bits(&page->_refcount,
-					0x01U << (bit + 24));
+			atomic_xor_bits(&page->pt_frag_refcount,
+					0x01U << bit);
			list_del(&page->lru);
}
}
@@ -281,12 +279,12 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
table = (unsigned long *) page_to_virt(page);
if (mm_alloc_pgste(mm)) {
/* Return 4K page table with PGSTEs */
-		atomic_xor_bits(&page->_refcount, 0x03U << 24);
+		atomic_xor_bits(&page->pt_frag_refcount, 0x03U);
		memset64((u64 *)table, _PAGE_INVALID, PTRS_PER_PTE);
		memset64((u64 *)table + PTRS_PER_PTE, 0, PTRS_PER_PTE);
	} else {
		/* Return the first 2K fragment of the page */
-		atomic_xor_bits(&page->_refcount, 0x01U << 24);
+		atomic_xor_bits(&page->pt_frag_refcount, 0x01U);
		memset64((u64 *)table, _PAGE_INVALID, 2 * PTRS_PER_PTE);
		spin_lock_bh(&mm->context.lock);
		list_add(&page->lru, &mm->context.pgtable_list);
@@ -323,22 +321,19 @@ void page_table_free(struct mm_struct *mm, unsigned long *table)
 * will happen outside of the critical section from this
 * function or from __tlb_remove_table()
 */
-	mask = atomic_xor_bits(&page->_refcount, 0x11U << (bit + 24));
-	mask >>= 24;
+	mask = atomic_xor_bits(&page->pt_frag_refcount, 0x11U << bit);
	if (mask & 0x03U)
		list_add(&page->lru, &mm->context.pgtable_list);
	else
		list_del(&page->lru);
	spin_unlock_bh(&mm->context.lock);
-	mask = atomic_xor_bits(&page->_refcount, 0x10U << (bit + 24));
-   

[PATCH 01/33] s390: Use _pt_s390_gaddr for gmap address tracking

2023-04-19 Thread Vishal Moola (Oracle)
s390 uses page->index to keep track of page tables for the guest address
space. In an attempt to consolidate the usage of page fields in s390,
replace _pt_pad_2 with _pt_s390_gaddr, which takes over the role of
page->index in gmap.

This will help with the splitting of struct ptdesc from struct page, as
well as allow s390 to use _pt_frag_refcount for fragmented page table
tracking.

Since page->_pt_s390_gaddr aliases with mapping, ensure it is set to NULL
before freeing the pages as well.
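
To illustrate the aliasing hazard (my sketch, not code from the patch):
the field shares storage with page->mapping, and the page allocator
complains about pages freed with a non-NULL mapping.

	page->_pt_s390_gaddr = gaddr;	/* same words as page->mapping */
	/* ... */
	page->_pt_s390_gaddr = 0;	/* clear the alias before freeing */
	__free_pages(page, CRST_ALLOC_ORDER);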

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/s390/mm/gmap.c  | 50 +++-
 include/linux/mm_types.h |  2 +-
 2 files changed, 35 insertions(+), 17 deletions(-)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 5a716bdcba05..a61ea1a491dc 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -70,7 +70,7 @@ static struct gmap *gmap_alloc(unsigned long limit)
page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
if (!page)
goto out_free;
-   page->index = 0;
+   page->_pt_s390_gaddr = 0;
	list_add(&page->lru, &gmap->crst_list);
table = page_to_virt(page);
crst_table_init(table, etype);
@@ -187,16 +187,20 @@ static void gmap_free(struct gmap *gmap)
if (!(gmap_is_shadow(gmap) && gmap->removed))
gmap_flush_tlb(gmap);
/* Free all segment & region tables. */
-	list_for_each_entry_safe(page, next, &gmap->crst_list, lru)
+	list_for_each_entry_safe(page, next, &gmap->crst_list, lru) {
+   page->_pt_s390_gaddr = 0;
__free_pages(page, CRST_ALLOC_ORDER);
+   }
	gmap_radix_tree_free(&gmap->guest_to_host);
	gmap_radix_tree_free(&gmap->host_to_guest);
 
/* Free additional data for a shadow gmap */
if (gmap_is_shadow(gmap)) {
/* Free all page tables. */
-		list_for_each_entry_safe(page, next, &gmap->pt_list, lru)
+		list_for_each_entry_safe(page, next, &gmap->pt_list, lru) {
+   page->_pt_s390_gaddr = 0;
page_table_free_pgste(page);
+   }
		gmap_rmap_radix_tree_free(&gmap->host_to_rmap);
/* Release reference to the parent */
gmap_put(gmap->parent);
@@ -318,12 +322,14 @@ static int gmap_alloc_table(struct gmap *gmap, unsigned long *table,
		list_add(&page->lru, &gmap->crst_list);
*table = __pa(new) | _REGION_ENTRY_LENGTH |
(*table & _REGION_ENTRY_TYPE_MASK);
-   page->index = gaddr;
+   page->_pt_s390_gaddr = gaddr;
page = NULL;
}
	spin_unlock(&gmap->guest_table_lock);
-   if (page)
+   if (page) {
+   page->_pt_s390_gaddr = 0;
__free_pages(page, CRST_ALLOC_ORDER);
+   }
return 0;
 }
 
@@ -341,7 +347,7 @@ static unsigned long __gmap_segment_gaddr(unsigned long *entry)
offset = (unsigned long) entry / sizeof(unsigned long);
offset = (offset & (PTRS_PER_PMD - 1)) * PMD_SIZE;
page = pmd_pgtable_page((pmd_t *) entry);
-   return page->index + offset;
+   return page->_pt_s390_gaddr + offset;
 }
 
 /**
@@ -1351,6 +1357,7 @@ static void gmap_unshadow_pgt(struct gmap *sg, unsigned long raddr)
/* Free page table */
page = phys_to_page(pgt);
	list_del(&page->lru);
+   page->_pt_s390_gaddr = 0;
page_table_free_pgste(page);
 }
 
@@ -1379,6 +1386,7 @@ static void __gmap_unshadow_sgt(struct gmap *sg, unsigned long raddr,
/* Free page table */
page = phys_to_page(pgt);
		list_del(&page->lru);
+   page->_pt_s390_gaddr = 0;
page_table_free_pgste(page);
}
 }
@@ -1409,6 +1417,7 @@ static void gmap_unshadow_sgt(struct gmap *sg, unsigned long raddr)
/* Free segment table */
page = phys_to_page(sgt);
	list_del(&page->lru);
+   page->_pt_s390_gaddr = 0;
__free_pages(page, CRST_ALLOC_ORDER);
 }
 
@@ -1437,6 +1446,7 @@ static void __gmap_unshadow_r3t(struct gmap *sg, unsigned long raddr,
/* Free segment table */
page = phys_to_page(sgt);
		list_del(&page->lru);
+   page->_pt_s390_gaddr = 0;
__free_pages(page, CRST_ALLOC_ORDER);
}
 }
@@ -1467,6 +1477,7 @@ static void gmap_unshadow_r3t(struct gmap *sg, unsigned long raddr)
/* Free region 3 table */
page = phys_to_page(r3t);
	list_del(&page->lru);
+   page->_pt_s390_gaddr = 0;
__free_pages(page, CRST_ALLOC_ORDER);
 }
 
@@ -1495,6 +1506,7 @@ static void __gmap_unshadow_r2t(struct gmap *sg, unsigned long raddr,
/* Free region 3 table */
page = phys_to_page(r3t);
		list_del(&page->lru);
+   page->_pt_s390_gaddr = 0;
__free_pages(page, CRST_ALLOC_ORDER);
}
 }
@@ -1525,6 +1537,7 @@ static void gmap_unshadow_r2t(struct gmap *sg, unsigned long raddr)

[PATCH 00/33] Split ptdesc from struct page

2023-04-19 Thread Vishal Moola (Oracle)
The MM subsystem is trying to shrink struct page. This patchset
introduces a memory descriptor for page table tracking - struct ptdesc.

This patchset introduces ptdesc, splits ptdesc from struct page, and
converts many callers of page table constructor/destructors to use ptdescs.

Ptdesc is a foundation to further standardize page tables, and eventually
allow for dynamic allocation of page tables independent of struct page.
However, the use of pages for page table tracking is quite deeply
ingrained and varied across architectures, so there is still a lot of
work to be done before that can happen.

This series is rebased on next-20230417.
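
As rough orientation for reviewers, most conversions follow one shape.
The sketch below is mine (the helpers come from later patches, the
caller is invented), not a quote from any single patch:

	/* Before: page-table state is driven through struct page. */
	struct page *page = alloc_pages(GFP_KERNEL | __GFP_COMP, 0);

	if (page && !pgtable_pte_page_ctor(page))
		__free_pages(page, 0);

	/* After: the same state is driven through the ptdesc layer. */
	struct ptdesc *ptdesc = ptdesc_alloc(GFP_KERNEL, 0);

	if (ptdesc && !ptdesc_pte_ctor(ptdesc))
		ptdesc_free(ptdesc);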

Vishal Moola (Oracle) (33):
  s390: Use _pt_s390_gaddr for gmap address tracking
  s390: Use pt_frag_refcount for pagetables
  pgtable: Create struct ptdesc
  mm: add utility functions for ptdesc
  mm: Convert pmd_pgtable_page() to pmd_ptdesc()
  mm: Convert ptlock_alloc() to use ptdescs
  mm: Convert ptlock_ptr() to use ptdescs
  mm: Convert pmd_ptlock_init() to use ptdescs
  mm: Convert ptlock_init() to use ptdescs
  mm: Convert pmd_ptlock_free() to use ptdescs
  mm: Convert ptlock_free() to use ptdescs
  mm: Create ptdesc equivalents for pgtable_{pte,pmd}_page_{ctor,dtor}
  powerpc: Convert various functions to use ptdescs
  x86: Convert various functions to use ptdescs
  s390: Convert various gmap functions to use ptdescs
  s390: Convert various pgalloc functions to use ptdescs
  mm: Remove page table members from struct page
  pgalloc: Convert various functions to use ptdescs
  arm: Convert various functions to use ptdescs
  arm64: Convert various functions to use ptdescs
  csky: Convert __pte_free_tlb() to use ptdescs
  hexagon: Convert __pte_free_tlb() to use ptdescs
  loongarch: Convert various functions to use ptdescs
  m68k: Convert various functions to use ptdescs
  mips: Convert various functions to use ptdescs
  nios2: Convert __pte_free_tlb() to use ptdescs
  openrisc: Convert __pte_free_tlb() to use ptdescs
  riscv: Convert alloc_{pmd, pte}_late() to use ptdescs
  sh: Convert pte_free_tlb() to use ptdescs
  sparc64: Convert various functions to use ptdescs
  sparc: Convert pgtable_pte_page_{ctor, dtor}() to ptdesc equivalents
  um: Convert {pmd, pte}_free_tlb() to use ptdescs
  mm: Remove pgtable_{pmd, pte}_page_{ctor, dtor}() wrappers

 Documentation/mm/split_page_table_lock.rst|  12 +-
 .../zh_CN/mm/split_page_table_lock.rst|  14 +-
 arch/arm/include/asm/tlb.h|  12 +-
 arch/arm/mm/mmu.c |   6 +-
 arch/arm64/include/asm/tlb.h  |  14 +-
 arch/arm64/mm/mmu.c   |   7 +-
 arch/csky/include/asm/pgalloc.h   |   4 +-
 arch/hexagon/include/asm/pgalloc.h|   8 +-
 arch/loongarch/include/asm/pgalloc.h  |  27 ++-
 arch/loongarch/mm/pgtable.c   |   7 +-
 arch/m68k/include/asm/mcf_pgalloc.h   |  41 ++--
 arch/m68k/include/asm/sun3_pgalloc.h  |   8 +-
 arch/m68k/mm/motorola.c   |   4 +-
 arch/mips/include/asm/pgalloc.h   |  31 +--
 arch/mips/mm/pgtable.c|   7 +-
 arch/nios2/include/asm/pgalloc.h  |   8 +-
 arch/openrisc/include/asm/pgalloc.h   |   8 +-
 arch/powerpc/mm/book3s64/mmu_context.c|  10 +-
 arch/powerpc/mm/book3s64/pgtable.c|  32 +--
 arch/powerpc/mm/pgtable-frag.c|  46 ++--
 arch/riscv/include/asm/pgalloc.h  |   8 +-
 arch/riscv/mm/init.c  |  16 +-
 arch/s390/include/asm/pgalloc.h   |   4 +-
 arch/s390/include/asm/tlb.h   |   4 +-
 arch/s390/mm/gmap.c   | 218 +++---
 arch/s390/mm/pgalloc.c| 126 +-
 arch/sh/include/asm/pgalloc.h |   8 +-
 arch/sparc/mm/init_64.c   |  17 +-
 arch/sparc/mm/srmmu.c |   5 +-
 arch/um/include/asm/pgalloc.h |  18 +-
 arch/x86/mm/pgtable.c |  46 ++--
 arch/x86/xen/mmu_pv.c |   2 +-
 include/asm-generic/pgalloc.h |  62 +++--
 include/asm-generic/tlb.h |  11 +
 include/linux/mm.h| 138 +++
 include/linux/mm_types.h  |  14 --
 include/linux/pgtable.h   |  60 +
 mm/memory.c   |   8 +-
 38 files changed, 625 insertions(+), 446 deletions(-)

-- 
2.39.2



Re: [PATCH 1/4] add generic builtin command line

2023-04-19 Thread Tomas Mudrunka
This seems quite useful. Can you please merge it?


[PATCH] ASoC: fsl_asrc_dma: fix potential null-ptr-deref

2023-04-19 Thread Nikita Zhandarovich
dma_request_slave_channel() may return NULL, which will lead to a
NULL pointer dereference in 'tmp_chan->private'.

Correct this behaviour by, first, switching from the deprecated function
dma_request_slave_channel() to dma_request_chan(). Secondly, add a
sanity check on the resulting value of dma_request_chan().
Also update the comment that still refers to
dma_request_slave_channel().

Fixes: 706e2c881158 ("ASoC: fsl_asrc_dma: Reuse the dma channel if available in Back-End")
Co-developed-by: Natalia Petrova 
Signed-off-by: Nikita Zhandarovich 
---
 sound/soc/fsl/fsl_asrc_dma.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/sound/soc/fsl/fsl_asrc_dma.c b/sound/soc/fsl/fsl_asrc_dma.c
index 3b81a465814a..05a7d1588d20 100644
--- a/sound/soc/fsl/fsl_asrc_dma.c
+++ b/sound/soc/fsl/fsl_asrc_dma.c
@@ -209,14 +209,19 @@ static int fsl_asrc_dma_hw_params(struct snd_soc_component *component,
		be_chan = soc_component_to_pcm(component_be)->chan[substream->stream];
tmp_chan = be_chan;
}
-   if (!tmp_chan)
-   tmp_chan = dma_request_slave_channel(dev_be, tx ? "tx" : "rx");
+   if (!tmp_chan) {
+   tmp_chan = dma_request_chan(dev_be, tx ? "tx" : "rx");
+   if (IS_ERR(tmp_chan)) {
+			dev_err(dev, "failed to request DMA channel for Back-End\n");
+   return -EINVAL;
+   }
+   }
 
/*
 * An EDMA DEV_TO_DEV channel is fixed and bound with DMA event of each
 * peripheral, unlike SDMA channel that is allocated dynamically. So no
 * need to configure dma_request and dma_request2, but get dma_chan of
-* Back-End device directly via dma_request_slave_channel.
+* Back-End device directly via dma_request_chan.
 */
if (!asrc->use_edma) {
/* Get DMA request of Back-End */
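
For context, a generic sketch of the API difference (not code from this
driver): dma_request_slave_channel() returned NULL on failure, while
dma_request_chan() returns an ERR_PTR-encoded error, so callers must
test with IS_ERR() and can recover the errno with PTR_ERR():

	struct dma_chan *chan = dma_request_chan(dev, "tx");

	if (IS_ERR(chan))
		return PTR_ERR(chan);	/* e.g. -EPROBE_DEFER */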


Re: [PATCH] KVM: PPC: BOOK3S: book3s_hv_nested.c: improve branch prediction for k.alloc

2023-04-19 Thread Kautuk Consul
On 2023-04-12 12:34:13, Kautuk Consul wrote:
> Hi,
> 
> On 2023-04-11 16:35:10, Michael Ellerman wrote:
> > Kautuk Consul  writes:
> > > On 2023-04-07 09:01:29, Sean Christopherson wrote:
> > >> On Fri, Apr 07, 2023, Bagas Sanjaya wrote:
> > >> > On Fri, Apr 07, 2023 at 05:31:47AM -0400, Kautuk Consul wrote:
> > >> > > I used the unlikely() macro on the return values of the k.alloc
> > >> > > calls and found that it changes the code generation a bit.
> > >> > > Optimize all return paths of k.alloc calls by improving
> > >> > > branch prediction on return value of k.alloc.
> > >> 
> > >> Nit, this is improving code generation, not branch prediction.
> > > Sorry my mistake.
> > >> 
> > >> > What about below?
> > >> > 
> > >> > "Improve branch prediction on kmalloc() and kzalloc() call by using
> > >> > unlikely() macro to optimize their return paths."
> > >> 
> > >> Another nit, using unlikely() doesn't necessarily provide a measurable 
> > >> optimization.
> > >> As above, it does often improve code generation for the happy path, but 
> > >> that doesn't
> > >> always equate to improved performance, e.g. if the CPU can easily 
> > >> predict the branch
> > >> and/or there is no impact on the cache footprint.
> > 
> > > I see. I will submit a v2 of the patch with a better and more accurate
> > > description. Does anyone else have any comments before I do so ?
> >  
> > In general I think unlikely should be saved for cases where either the
> > compiler is generating terrible code, or the likelyness of the condition
> > might be surprising to a human reader.
> > 
> > eg. if you had some code that does a NULL check and it's *expected* that
> > the value is NULL, then wrapping that check in likely() actually adds
> > information for a human reader.
> > 
> > Also please don't use unlikely in init paths or other cold paths, it
> > clutters the code (only slightly but a little) and that's not worth the
> > possible tiny benefit for code that only runs once or infrequently.
> > 
> > I would expect the compilers to do the right thing in all
> > these cases without the unlikely. But if you can demonstrate that they
> > meaningfully improve the code generation with a before/after
> > dissassembly then I'd be interested.
> Just FYI, the last email by kautuk.consul...@gmail.com was by me.
> That last email contains a diff file attachment which compares 2 files:
> before my changes and after my changes.
> This diff file shows a lot of changes in code generation. I'm assuming
> all those changes come from the compiler optimizing the return
> paths of the k.alloc calls.
> Kindly review and comment.
Any comments on the numerous code generation changes shown by the
files I attached to this mail chain? Sorry, I don't have concrete
figures of any type to prove that this leads to any measurable performance
improvements. I am just assuming that the compiler's modified code
generation (due to the use of the unlikely macro) would be optimal.

Thanks.
> > cheers
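
For readers following the thread, the construct being debated is the
standard branch-layout hint. A minimal self-contained illustration (not
taken from the patch) of what unlikely() expands to and how it is used:

	#include <stdlib.h>

	#define unlikely(x)	__builtin_expect(!!(x), 0)

	void *setup(size_t n)
	{
		void *buf = malloc(n);

		/* The hint only steers code layout, keeping the success
		 * path as the straight-line fall-through; it never
		 * changes what the program computes. */
		if (unlikely(!buf))
			return NULL;
		return buf;
	}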


Re: [PATCH v3 05/14] ia64: don't allow users to override ARCH_FORCE_MAX_ORDER

2023-04-19 Thread Mike Rapoport
On Sat, Mar 25, 2023 at 02:38:15PM +0800, Kefeng Wang wrote:
> 
> 
> On 2023/3/25 14:08, Mike Rapoport wrote:
> > From: "Mike Rapoport (IBM)" 
> > 
> > It is enough to keep default values for base and huge pages without
> > letting users to override ARCH_FORCE_MAX_ORDER.
> > 
> > Drop the prompt to make the option unvisible in *config.
> > 
> > Acked-by: Kirill A. Shutemov 
> > Reviewed-by: Zi Yan 
> > Signed-off-by: Mike Rapoport (IBM) 
> > ---
> >   arch/ia64/Kconfig | 3 +--
> >   1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
> > index 0d2f41fa56ee..b61437cae162 100644
> > --- a/arch/ia64/Kconfig
> > +++ b/arch/ia64/Kconfig
> > @@ -202,8 +202,7 @@ config IA64_CYCLONE
> >   If you're unsure, answer N.
> >   config ARCH_FORCE_MAX_ORDER
> > -   int "MAX_ORDER (10 - 16)"  if !HUGETLB_PAGE
> > -   range 10 16  if !HUGETLB_PAGE
> > +   int
> > default "16" if HUGETLB_PAGE
> > default "10"
> 
> It seems that we could drop the following part?

ia64 can have 64k pages, so with MAX_ORDER==16 we'd need at least 32 bits
for section size
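
Spelling out the arithmetic: with 64k pages PAGE_SHIFT is 16, so
MAX_ORDER (16) + PAGE_SHIFT (16) = 32, which exceeds the default
SECTION_SIZE_BITS of 30. The override below therefore still has to stay.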
 
> diff --git a/arch/ia64/include/asm/sparsemem.h
> b/arch/ia64/include/asm/sparsemem.h
> index a58f8b466d96..18187551b183 100644
> --- a/arch/ia64/include/asm/sparsemem.h
> +++ b/arch/ia64/include/asm/sparsemem.h
> @@ -11,11 +11,6 @@
> 
>  #define SECTION_SIZE_BITS  (30)
>  #define MAX_PHYSMEM_BITS   (50)
> -#ifdef CONFIG_ARCH_FORCE_MAX_ORDER
> -#if (CONFIG_ARCH_FORCE_MAX_ORDER + PAGE_SHIFT > SECTION_SIZE_BITS)
> -#undef SECTION_SIZE_BITS
> -#define SECTION_SIZE_BITS (CONFIG_ARCH_FORCE_MAX_ORDER + PAGE_SHIFT)
> -#endif
>  #endif
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH] ASoC: fsl_sai: Fix pins setting for i.MX8QM platform

2023-04-19 Thread Iuliana Prodan

On 4/18/2023 12:42 PM, Chancel Liu wrote:

SAI on the i.MX8QM platform supports up to 4 data lines, so the pins
setting should be corrected to 4.

Fixes: eba0f0077519 ("ASoC: fsl_sai: Enable combine mode soft")
Signed-off-by: Chancel Liu 
---


Reviewed-by: Iuliana Prodan 

Thanks,
Iulia


  sound/soc/fsl/fsl_sai.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c
index 07d13dca852e..abdaffb00fbd 100644
--- a/sound/soc/fsl/fsl_sai.c
+++ b/sound/soc/fsl/fsl_sai.c
@@ -1544,7 +1544,7 @@ static const struct fsl_sai_soc_data fsl_sai_imx8qm_data = {
.use_imx_pcm = true,
.use_edma = true,
.fifo_depth = 64,
-   .pins = 1,
+   .pins = 4,
.reg_offset = 0,
.mclk0_is_mclk1 = false,
.flags = 0,


Re: [PATCH 01/33] s390: Use _pt_s390_gaddr for gmap address tracking

2023-04-19 Thread David Hildenbrand

On 18.04.23 23:33, Vishal Moola wrote:

On Tue, Apr 18, 2023 at 8:45 AM David Hildenbrand  wrote:


On 17.04.23 22:50, Vishal Moola (Oracle) wrote:

s390 uses page->index to keep track of page tables for the guest address
space. In an attempt to consolidate the usage of page fields in s390,
replace _pt_pad_2 with _pt_s390_gaddr, which takes over the role of
page->index in gmap.

This will help with the splitting of struct ptdesc from struct page, as
well as allow s390 to use _pt_frag_refcount for fragmented page table
tracking.

Since page->_pt_s390_gaddr aliases with mapping, ensure it is set to NULL
before freeing the pages as well.

Signed-off-by: Vishal Moola (Oracle) 
---


[...]


diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 3fc9e680f174..2616d64c0e8c 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -144,7 +144,7 @@ struct page {
   struct {/* Page table pages */
   unsigned long _pt_pad_1;/* compound_head */
   pgtable_t pmd_huge_pte; /* protected by page->ptl */
- unsigned long _pt_pad_2;/* mapping */
+ unsigned long _pt_s390_gaddr;   /* mapping */
   union {
   struct mm_struct *pt_mm; /* x86 pgds only */
   atomic_t pt_frag_refcount; /* powerpc */


The confusing part is that these gmap page tables are not ordinary
process page tables that we would ordinarily place into this section
here. That's why they are also not allocated/freed using the typical
page table constructor/destructor ...


I initially thought the same, so I was quite confused when I saw
__gmap_segment_gaddr was using pmd_pgtable_page().

Although they are not ordinary process page tables, since we
eventually want to move them out of struct page, I think shifting them
into ptdescs, the memory descriptor for page tables, makes the most
sense.


Seeing utilities like tlb_remove_page_ptdesc() that don't really apply 
to such page tables, I wonder if we should much rather treat such 
shadow/auxiliary/... page tables (just like other architectures like 
x86, arm, ... employ as well) as a distinct type.


And have ptdesc be the common type for all process page tables.



Another option is to leave pmd_pgtable_page() as is just for this case.
Or we can revert commit 7e25de77bc5ea which uses the function here
then figure out where these gmap pages table pages will go later.


I'm always confused when reading gmap code, so let me have another look :)

The confusing part is that s390x shares the lowest level page tables 
(PTE tables) between the process and gmap ("guest mapping", similar to 
EPT on x86-64). It maps these process PTE tables (covering 1 MiB) into 
gmap-specific PMD tables.


pmd_pgtable_page() should indeed always give us a gmap-specific 
PMD-table. In fact, something allocated via gmap_alloc_table().


Decoupling both concepts sounds like a good idea.

--
Thanks,

David / dhildenb