[PATCH] arm32: Avoid using solaris syntax for .section directive

2023-07-31 Thread Khem Raj
The assembler from binutils 2.41 rejects this syntax:

.section "name"[, flags...]

where the flags can be #alloc, #write or #execinstr.
Switch to the ELF syntax:

.section name[, "flags"[, @type]]

[1] https://ftp.gnu.org/old-gnu/Manuals/gas-2.9.1/html_chapter/as_7.html#SEC119

Signed-off-by: Khem Raj 
---
 xen/arch/arm/arm32/proc-v7.S | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/arm32/proc-v7.S b/xen/arch/arm/arm32/proc-v7.S
index c90a31d80f..6d3d19b873 100644
--- a/xen/arch/arm/arm32/proc-v7.S
+++ b/xen/arch/arm/arm32/proc-v7.S
@@ -29,7 +29,7 @@ brahma15mp_init:
 mcr   CP32(r0, ACTLR)
 mov   pc, lr
 
-.section ".proc.info", #alloc
+.section .proc.info, "a"
 .type __v7_ca15mp_proc_info, #object
 __v7_ca15mp_proc_info:
 .long 0x410FC0F0 /* Cortex-A15 */
@@ -38,7 +38,7 @@ __v7_ca15mp_proc_info:
 .long caxx_processor
 .size __v7_ca15mp_proc_info, . - __v7_ca15mp_proc_info
 
-.section ".proc.info", #alloc
+.section .proc.info, "a"
 .type __v7_ca7mp_proc_info, #object
 __v7_ca7mp_proc_info:
 .long 0x410FC070 /* Cortex-A7 */
@@ -47,7 +47,7 @@ __v7_ca7mp_proc_info:
 .long caxx_processor
 .size __v7_ca7mp_proc_info, . - __v7_ca7mp_proc_info
 
-.section ".proc.info", #alloc
+.section .proc.info, "a"
 .type __v7_brahma15mp_proc_info, #object
 __v7_brahma15mp_proc_info:
 .long 0x420F00F0 /* Broadcom Brahma-B15 */
-- 
2.41.0




Re: Python in Domain Configurations

2023-07-31 Thread Elliott Mitchell
On Mon, Jul 31, 2023 at 06:09:41PM +0100, Ian Jackson wrote:
> Elliott Mitchell writes ("Re: Python in Domain Configurations"):
> > On Mon, Jul 31, 2023 at 05:59:55AM +0200, Marek Marczykowski-Górecki wrote:
> > > So, IMHO reducing config file from a full python (like it used to be in
> > > xend times) into a static file with well defined syntax was an
> > > improvement. Let's not go backward.
> 
> I'm no longer working on this codebase, but since I've been CC'd:
> 
> I was one of the people who replaced the Python-based config parsing
> with the current arrangements.  We didn't just do this because we were
> replacing xend (whose use of Python as implementation language made it
> appear convenient to just read and execute the configs as Python
> code).
> 
> We did it for the reasons Marek gives.  It's true that the existing
> format is not as well specified as it could be.  It was intended as a
> plausible subset of Python literal syntax.  We chose that syntax to
> preserve compatibility with the vast majority of existing config files
> and to provide something familiar.  (And it seems we did achieve those
> goals.)
> 
> The disk configuration syntax is particularly warty, but we inherited
> much of that from the Python version.

Okay.  I do note that allowing full Python would make scripted domain
creation easier.  While I have one use for re-adding the functionality,
I'm sure others would come up with high-value uses too.


> > > As for your original problem, IIUC you would like to add some data that
> > > would _not_ be interpreted by libxl, right? For that you can use
> > > comments with some specific marker for your script. This approach used
> > > to work well for SysV init script, and in fact for a very similar use case
> > > (ordering and dependencies, among other things).
> > 
> > That is /not/ the issue.  `xl` simply ignores any variables which it
> > doesn't interpret (this is in fact a Bad Thing).
> 
> I forget, but isn't there some kind of scheme for warning about
> unrecognised configuration options ?

There certainly should be, and there may have been one in the past, but
no, there isn't one now.  One advantage of using Python is that processed
data could be removed as the domain is created, and anything left over
could then be warned about.  Though such a check could be added to the
existing parser too.
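For illustration, the "consume what you understand, warn about the rest"
idea might look like this in Python — a rough sketch only; the key names
and warning text are mine, not libxl's:

```python
# Sketch: pop recognised options from a parsed config dict, then warn
# about whatever remains.  Key names here are illustrative.
KNOWN_KEYS = {"name", "memory", "vcpus", "disk", "vif"}

def consume_config(cfg):
    """Remove recognised keys from cfg; warn about the leftovers."""
    domain = {k: cfg.pop(k) for k in list(cfg) if k in KNOWN_KEYS}
    for leftover in sorted(cfg):
        print(f"warning: unrecognised config option '{leftover}'")
    return domain

cfg = {"name": "guest0", "memory": 512, "mem0ry": 1024}  # note the typo'd key
dom = consume_config(cfg)
# dom holds the recognised options; a warning is printed for "mem0ry"
```

The same pattern would work inside the existing parser: track which keys
the domain-creation code actually consumed, and report the rest.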

> >  I need to know what the limits to the syntax are.
> 
> I agree that it's not great that the syntax is not 100% documented.
> The parser is in
>   tools/libs/util/libxlu_cfg_y.y
>   tools/libs/util/libxlu_cfg_l.l
> I'm sure patches to improve the docs would be welcome.

That is merely the grammar definition.  There is a bunch of extra code
for interfacing with the parser, and then there is the layer above that.

> Note that it is still a *subset* of Python, so if you wish to use a
> Python interpreter to parse it in your own tooling, you're very
> welcome to do so.

Yup, and this will make that simpler.  Having Python dictionaries would
make some things even easier though.
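For anyone taking the "parse it with a Python interpreter" route, a rough
sketch follows — not a validated parser, and since it executes the file,
it must only be used on trusted configs (the sample config below is mine,
not from any real deployment):

```python
# Sketch: load an xl-style domain config by exec'ing it in a namespace
# with no builtins.  This works because the format is (mostly) a subset
# of Python assignment/literal syntax; it is NOT a hardened parser.
def load_xl_config(text):
    ns = {}
    exec(compile(text, "<domain.cfg>", "exec"), {"__builtins__": {}}, ns)
    return ns

sample = '''
name = "guest0"
memory = 512
vcpus = 2
disk = [ 'format=raw, vdev=xvda, access=rw, target=/dev/vg/guest0' ]
'''
cfg = load_xl_config(sample)
# cfg is now a plain dict: cfg["name"], cfg["vcpus"], cfg["disk"][0], ...
```

Emptying `__builtins__` blocks the obvious accidents, but it is not a
security boundary; a hostile config can still do damage.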


> > Notice how many init scripts do `. /etc/default/` to load
> > configuration?  I'm thinking it would be very handy to use a similar
> > technique to load domain.cfg files, with Python being the interpreter.
> 
> I don't think this is a good idea.  Both because I don't think the
> functionality available in a Python interpreter should be available in
> the libxl configuration, and because Python is a large and complex
> dependency which we don't want to pull in here.

While PvGRUB and TianoCore seem likely to displace PyGRUB, PyGRUB still
functions on Arm (to which PvGRUB hasn't been ported yet).  As such there
is still a Python dependency.

To me the greater concern is the daemon process.  The issue is that, for
the daemon, the large amount of functionality built into `xl` is a
similar liability; even the temporary data seems a sizeable one for the
daemon.  Additionally, if the daemon were a separate executable, its
process name could be changed via execve(), making it clearer that the
process is expected to remain behind.

> > I also think some portions of the domain.cfg format might work better
> > with full Python syntax.  For example might it be handier to allow:
> > 
> > disk = [
> > {
> > 'vdev': 'xvda',
> > 'format': 'raw',
> > 'access': 'rw',
> > 'target': '/dev/disk/by-path/foo-bar-baz',
> > },
> > ]
> 
> I agree that something like this would be nice.  I don't think it
> should be done by importing Python.  These two files - the main part
> of the existing parser - are only 183 lines of code including comments.
> Extending it (and the support code in libxlu_cfg.c) to do dictionaries
> as well as lists doesn't seem like it would make it too much bigger.

That ignores all the interfacing code.  Add in the interfacing and it is
closer to 2000 lines.  If you count merely the file parser, that alone is
about 1000 lines.

I suspect I could interface to libpython with 200-300 lines.  Those would
allow far more 

[PATCH v4 13/13] xen/arm: mmu: enable SMMU subsystem only in MMU

2023-07-31 Thread Henry Wang
From: Penny Zheng 

The SMMU subsystem is only supported on MMU systems, so make it depend
on CONFIG_HAS_MMU.

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
Signed-off-by: Henry Wang 
---
v4:
- No change
v3:
- new patch
---
 xen/drivers/passthrough/Kconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/Kconfig b/xen/drivers/passthrough/Kconfig
index 864fcf3b0c..5a8d666829 100644
--- a/xen/drivers/passthrough/Kconfig
+++ b/xen/drivers/passthrough/Kconfig
@@ -5,6 +5,7 @@ config HAS_PASSTHROUGH
 if ARM
 config ARM_SMMU
bool "ARM SMMUv1 and v2 driver"
+   depends on HAS_MMU
default y
---help---
  Support for implementations of the ARM System MMU architecture
@@ -15,7 +16,7 @@ config ARM_SMMU
 
 config ARM_SMMU_V3
bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support" if EXPERT
-   depends on ARM_64 && (!ACPI || BROKEN)
+   depends on ARM_64 && (!ACPI || BROKEN) && HAS_MMU
---help---
 Support for implementations of the ARM System MMU architecture
 version 3. Driver is in experimental stage and should not be used in
-- 
2.25.1




[PATCH v4 12/13] xen/arm: mmu: relocate copy_from_paddr() to setup.c

2023-07-31 Thread Henry Wang
From: Penny Zheng 

copy_from_paddr() is declared in asm/setup.h, so it is better
implemented in setup.c.

The current copy_from_paddr() implementation is MMU-specific, so this
commit moves it into mmu/setup.c.  This also makes it easier to
implement an MPU version of copy_from_paddr() in a later commit.

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
Signed-off-by: Henry Wang 
---
v4:
- No change
v3:
- new commit
---
 xen/arch/arm/kernel.c| 27 ---
 xen/arch/arm/mmu/setup.c | 27 +++
 2 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
index 508c54824d..0d433a32e7 100644
--- a/xen/arch/arm/kernel.c
+++ b/xen/arch/arm/kernel.c
@@ -41,33 +41,6 @@ struct minimal_dtb_header {
 
 #define DTB_MAGIC 0xd00dfeedU
 
-/**
- * copy_from_paddr - copy data from a physical address
- * @dst: destination virtual address
- * @paddr: source physical address
- * @len: length to copy
- */
-void __init copy_from_paddr(void *dst, paddr_t paddr, unsigned long len)
-{
-void *src = (void *)FIXMAP_ADDR(FIXMAP_MISC);
-
-while (len) {
-unsigned long l, s;
-
-s = paddr & (PAGE_SIZE-1);
-l = min(PAGE_SIZE - s, len);
-
-set_fixmap(FIXMAP_MISC, maddr_to_mfn(paddr), PAGE_HYPERVISOR_WC);
-memcpy(dst, src + s, l);
-clean_dcache_va_range(dst, l);
-clear_fixmap(FIXMAP_MISC);
-
-paddr += l;
-dst += l;
-len -= l;
-}
-}
-
 static void __init place_modules(struct kernel_info *info,
  paddr_t kernbase, paddr_t kernend)
 {
diff --git a/xen/arch/arm/mmu/setup.c b/xen/arch/arm/mmu/setup.c
index e05cca3f86..889ada6b87 100644
--- a/xen/arch/arm/mmu/setup.c
+++ b/xen/arch/arm/mmu/setup.c
@@ -329,6 +329,33 @@ void __init setup_mm(void)
 }
 #endif
 
+/*
+ * copy_from_paddr - copy data from a physical address
+ * @dst: destination virtual address
+ * @paddr: source physical address
+ * @len: length to copy
+ */
+void __init copy_from_paddr(void *dst, paddr_t paddr, unsigned long len)
+{
+void *src = (void *)FIXMAP_ADDR(FIXMAP_MISC);
+
+while (len) {
+unsigned long l, s;
+
+s = paddr & (PAGE_SIZE-1);
+l = min(PAGE_SIZE - s, len);
+
+set_fixmap(FIXMAP_MISC, maddr_to_mfn(paddr), PAGE_HYPERVISOR_WC);
+memcpy(dst, src + s, l);
+clean_dcache_va_range(dst, l);
+clear_fixmap(FIXMAP_MISC);
+
+paddr += l;
+dst += l;
+len -= l;
+}
+}
+
 /*
  * Local variables:
  * mode: C
-- 
2.25.1




[PATCH v4 11/13] xen/arm: mmu: move MMU specific P2M code to mmu/p2m.{c,h}

2023-07-31 Thread Henry Wang
From: Penny Zheng 

The current P2M implementation is designed for MMU systems only.
Move the MMU-specific code into mmu/p2m.c and keep only generic code,
such as the VMID allocator, in p2m.c.  Also move MMU-specific
definitions and declarations, such as p2m_tlb_flush_sync(), to mmu/p2m.h.
Expose the previously static functions p2m_vmid_allocator_init(),
p2m_alloc_vmid(), __p2m_set_entry() and setup_virt_paging_one()
for further MPU usage.

With this code movement, the global variable max_vmid is used in
multiple files instead of a single one (and will be used in the MPU P2M
implementation), so declare it in the header and drop its "static"
qualifier.

Add #ifdef CONFIG_HAS_MMU around the p2m_tlb_flush_sync() call in
p2m_write_unlock(), since the future MPU work does not need it.

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
Signed-off-by: Henry Wang 
---
v4:
- Rework the patch to drop the unnecessary changes.
- Rework the commit msg a bit.
v3:
- remove MPU stubs
- adapt to the introduction of new directories: mmu/
v2:
- new commit
---
 xen/arch/arm/include/asm/mmu/p2m.h |   18 +
 xen/arch/arm/include/asm/p2m.h |   33 +-
 xen/arch/arm/mmu/Makefile  |1 +
 xen/arch/arm/mmu/p2m.c | 1610 +
 xen/arch/arm/p2m.c | 1772 ++--
 5 files changed, 1745 insertions(+), 1689 deletions(-)
 create mode 100644 xen/arch/arm/include/asm/mmu/p2m.h
 create mode 100644 xen/arch/arm/mmu/p2m.c

diff --git a/xen/arch/arm/include/asm/mmu/p2m.h 
b/xen/arch/arm/include/asm/mmu/p2m.h
new file mode 100644
index 00..f829e325ce
--- /dev/null
+++ b/xen/arch/arm/include/asm/mmu/p2m.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef __ARM_MMU_P2M_H__
+#define __ARM_MMU_P2M_H__
+
+struct p2m_domain;
+void p2m_force_tlb_flush_sync(struct p2m_domain *p2m);
+void p2m_tlb_flush_sync(struct p2m_domain *p2m);
+
+#endif /* __ARM_MMU_P2M_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/include/asm/p2m.h b/xen/arch/arm/include/asm/p2m.h
index 940495d42b..37401461fa 100644
--- a/xen/arch/arm/include/asm/p2m.h
+++ b/xen/arch/arm/include/asm/p2m.h
@@ -19,6 +19,22 @@ extern unsigned int p2m_root_level;
 #define P2M_ROOT_ORDERp2m_root_order
 #define P2M_ROOT_LEVEL p2m_root_level
 
+#define MAX_VMID_8_BIT  (1UL << 8)
+#define MAX_VMID_16_BIT (1UL << 16)
+
+#define INVALID_VMID 0 /* VMID 0 is reserved */
+
+#ifdef CONFIG_ARM_64
+extern unsigned int max_vmid;
+/* VMID is by default 8 bit width on AArch64 */
+#define MAX_VMID   max_vmid
+#else
+/* VMID is always 8 bit width on AArch32 */
+#define MAX_VMIDMAX_VMID_8_BIT
+#endif
+
+#define P2M_ROOT_PAGES(1<<P2M_ROOT_ORDER)
 
+#ifdef CONFIG_HAS_MMU
+#include 
+#endif
+
 static inline bool arch_acquire_resource_check(struct domain *d)
 {
 /*
@@ -180,7 +200,11 @@ void p2m_altp2m_check(struct vcpu *v, uint16_t idx)
  */
 void p2m_restrict_ipa_bits(unsigned int ipa_bits);
 
+void p2m_vmid_allocator_init(void);
+int p2m_alloc_vmid(struct domain *d);
+
 /* Second stage paging setup, to be called on all CPUs */
+void setup_virt_paging_one(void *data);
 void setup_virt_paging(void);
 
 /* Init the datastructures for later use by the p2m code */
@@ -242,8 +266,6 @@ static inline int p2m_is_write_locked(struct p2m_domain 
*p2m)
 return rw_is_write_locked(&p2m->lock);
 }
 
-void p2m_tlb_flush_sync(struct p2m_domain *p2m);
-
 /* Look up the MFN corresponding to a domain's GFN. */
 mfn_t p2m_lookup(struct domain *d, gfn_t gfn, p2m_type_t *t);
 
@@ -269,6 +291,13 @@ int p2m_set_entry(struct p2m_domain *p2m,
   p2m_type_t t,
   p2m_access_t a);
 
+int __p2m_set_entry(struct p2m_domain *p2m,
+gfn_t sgfn,
+unsigned int page_order,
+mfn_t smfn,
+p2m_type_t t,
+p2m_access_t a);
+
 bool p2m_resolve_translation_fault(struct domain *d, gfn_t gfn);
 
 void p2m_clear_root_pages(struct p2m_domain *p2m);
diff --git a/xen/arch/arm/mmu/Makefile b/xen/arch/arm/mmu/Makefile
index 4aa1fb466d..a4f07ab90a 100644
--- a/xen/arch/arm/mmu/Makefile
+++ b/xen/arch/arm/mmu/Makefile
@@ -1,2 +1,3 @@
 obj-y += mm.o
+obj-y += p2m.o
 obj-y += setup.o
diff --git a/xen/arch/arm/mmu/p2m.c b/xen/arch/arm/mmu/p2m.c
new file mode 100644
index 00..a916e2318c
--- /dev/null
+++ b/xen/arch/arm/mmu/p2m.c
@@ -0,0 +1,1610 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+unsigned int __read_mostly p2m_root_order;
+unsigned int __read_mostly p2m_root_level;
+
+static mfn_t __read_mostly empty_root_mfn;
+
+static uint64_t generate_vttbr(uint16_t vmid, mfn_t root_mfn)
+{
+return (mfn_to_maddr(root_mfn) | ((uint64_t)vmid << 48));
+}
+
+static struct page_info *p2m_alloc_page(struct domain *d)
+{
+struct page_info *pg;
+
+

[PATCH v4 09/13] xen/arm: mm: Use generic variable/function names for extendability

2023-07-31 Thread Henry Wang
From: Penny Zheng 

In preparation for MPU support, which will share some variables and
functions between the MMU and MPU systems, rename the affected
variables/functions to more generic names:
- init_ttbr -> init_mm
- mmu_init_secondary_cpu() -> mm_init_secondary_cpu()
- init_secondary_pagetables() -> init_secondary_mm()
- Add a wrapper update_mm_mapping() for the MMU system's
  update_identity_mapping()

Modify the related in-code comments to reflect the above changes, and
take the opportunity to fix their incorrect coding style.

Signed-off-by: Penny Zheng 
Signed-off-by: Henry Wang 
---
v4:
- Extract the renaming part from the original patch:
  "[v3,13/52] xen/mmu: extract mmu-specific codes from mm.c/mm.h"
---
 xen/arch/arm/arm32/head.S   |  4 ++--
 xen/arch/arm/arm64/mmu/head.S   |  2 +-
 xen/arch/arm/arm64/mmu/mm.c | 11 ---
 xen/arch/arm/arm64/smpboot.c|  6 +++---
 xen/arch/arm/include/asm/arm64/mm.h |  7 ---
 xen/arch/arm/include/asm/mm.h   | 10 ++
 xen/arch/arm/mmu/mm.c   | 20 ++--
 xen/arch/arm/smpboot.c  |  4 ++--
 8 files changed, 36 insertions(+), 28 deletions(-)

diff --git a/xen/arch/arm/arm32/head.S b/xen/arch/arm/arm32/head.S
index 33b038e7e0..03ab68578a 100644
--- a/xen/arch/arm/arm32/head.S
+++ b/xen/arch/arm/arm32/head.S
@@ -238,11 +238,11 @@ GLOBAL(init_secondary)
 secondary_switched:
 /*
  * Non-boot CPUs need to move on to the proper pagetables, which were
- * setup in init_secondary_pagetables.
+ * setup in init_secondary_mm.
  *
  * XXX: This is not compliant with the Arm Arm.
  */
-mov_w r4, init_ttbr  /* VA of HTTBR value stashed by CPU 0 */
+mov_w r4, init_mm/* VA of HTTBR value stashed by CPU 0 */
 ldrd  r4, r5, [r4]   /* Actual value */
 dsb
 mcrr  CP64(r4, r5, HTTBR)
diff --git a/xen/arch/arm/arm64/mmu/head.S b/xen/arch/arm/arm64/mmu/head.S
index 6bd94c3a45..6e638ff082 100644
--- a/xen/arch/arm/arm64/mmu/head.S
+++ b/xen/arch/arm/arm64/mmu/head.S
@@ -303,7 +303,7 @@ ENDPROC(enable_mmu)
 ENTRY(enable_secondary_cpu_mm)
 mov   x5, lr
 
-load_paddr x0, init_ttbr
+load_paddr x0, init_mm
 ldr   x0, [x0]
 
 blenable_mmu
diff --git a/xen/arch/arm/arm64/mmu/mm.c b/xen/arch/arm/arm64/mmu/mm.c
index 78b7c7eb00..ed0fc5ff7b 100644
--- a/xen/arch/arm/arm64/mmu/mm.c
+++ b/xen/arch/arm/arm64/mmu/mm.c
@@ -106,7 +106,7 @@ void __init arch_setup_page_tables(void)
 prepare_runtime_identity_mapping();
 }
 
-void update_identity_mapping(bool enable)
+static void update_identity_mapping(bool enable)
 {
 paddr_t id_addr = virt_to_maddr(_start);
 int rc;
@@ -120,6 +120,11 @@ void update_identity_mapping(bool enable)
 BUG_ON(rc);
 }
 
+void update_mm_mapping(bool enable)
+{
+update_identity_mapping(enable);
+}
+
 extern void switch_ttbr_id(uint64_t ttbr);
 
 typedef void (switch_ttbr_fn)(uint64_t ttbr);
@@ -131,7 +136,7 @@ void __init switch_ttbr(uint64_t ttbr)
 lpae_t pte;
 
 /* Enable the identity mapping in the boot page tables */
-update_identity_mapping(true);
+update_mm_mapping(true);
 
 /* Enable the identity mapping in the runtime page tables */
 pte = pte_of_xenaddr((vaddr_t)switch_ttbr_id);
@@ -148,7 +153,7 @@ void __init switch_ttbr(uint64_t ttbr)
  * Note it is not necessary to disable it in the boot page tables
  * because they are not going to be used by this CPU anymore.
  */
-update_identity_mapping(false);
+update_mm_mapping(false);
 }
 
 /*
diff --git a/xen/arch/arm/arm64/smpboot.c b/xen/arch/arm/arm64/smpboot.c
index 9637f42469..2b1d086a1e 100644
--- a/xen/arch/arm/arm64/smpboot.c
+++ b/xen/arch/arm/arm64/smpboot.c
@@ -111,18 +111,18 @@ int arch_cpu_up(int cpu)
 if ( !smp_enable_ops[cpu].prepare_cpu )
 return -ENODEV;
 
-update_identity_mapping(true);
+update_mm_mapping(true);
 
 rc = smp_enable_ops[cpu].prepare_cpu(cpu);
 if ( rc )
-update_identity_mapping(false);
+update_mm_mapping(false);
 
 return rc;
 }
 
 void arch_cpu_up_finish(void)
 {
-update_identity_mapping(false);
+update_mm_mapping(false);
 }
 
 /*
diff --git a/xen/arch/arm/include/asm/arm64/mm.h 
b/xen/arch/arm/include/asm/arm64/mm.h
index e0bd23a6ed..7a389c4b21 100644
--- a/xen/arch/arm/include/asm/arm64/mm.h
+++ b/xen/arch/arm/include/asm/arm64/mm.h
@@ -15,13 +15,14 @@ static inline bool arch_mfns_in_directmap(unsigned long 
mfn, unsigned long nr)
 void arch_setup_page_tables(void);
 
 /*
- * Enable/disable the identity mapping in the live page-tables (i.e.
- * the one pointed by TTBR_EL2).
+ * In MMU system, enable/disable the identity mapping in the live
+ * page-tables (i.e. the one pointed by TTBR_EL2) through
+ * update_identity_mapping().
  *
  * Note that nested call (e.g. enable=true, enable=true) is not
  * supported.
  */
-void 

[PATCH v4 08/13] xen/arm: Fold pmap and fixmap into MMU system

2023-07-31 Thread Henry Wang
From: Penny Zheng 

fixmap and pmap are MMU-specific features, so fold them into the MMU
system.  Fold pmap by moving the HAS_PMAP Kconfig selection under
HAS_MMU.  Fold fixmap by moving the implementation of virt_to_fix() to
mmu/mm.c, so that unnecessary stubs can be avoided.

Signed-off-by: Penny Zheng 
Signed-off-by: Henry Wang 
---
v4:
- Rework "[v3,11/52] xen/arm: mmu: fold FIXMAP into MMU system",
  change the order of this patch and avoid introducing stubs.
---
 xen/arch/arm/Kconfig  | 2 +-
 xen/arch/arm/include/asm/fixmap.h | 7 +--
 xen/arch/arm/mmu/mm.c | 7 +++
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 0e38e9ba17..1b60a938c2 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -15,7 +15,6 @@ config ARM
select HAS_DEVICE_TREE
select HAS_PASSTHROUGH
select HAS_PDX
-   select HAS_PMAP
select HAS_UBSAN
select IOMMU_FORCE_PT_SHARE
 
@@ -71,6 +70,7 @@ choice
 
 config HAS_MMU
bool "MMU for a VMSA system"
+   select HAS_PMAP
 endchoice
 
 source "arch/Kconfig"
diff --git a/xen/arch/arm/include/asm/fixmap.h 
b/xen/arch/arm/include/asm/fixmap.h
index 734eb9b1d4..5d5de6995a 100644
--- a/xen/arch/arm/include/asm/fixmap.h
+++ b/xen/arch/arm/include/asm/fixmap.h
@@ -36,12 +36,7 @@ extern void clear_fixmap(unsigned int map);
 
 #define fix_to_virt(slot) ((void *)FIXMAP_ADDR(slot))
 
-static inline unsigned int virt_to_fix(vaddr_t vaddr)
-{
-BUG_ON(vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START);
-
-return ((vaddr - FIXADDR_START) >> PAGE_SHIFT);
-}
+extern unsigned int virt_to_fix(vaddr_t vaddr);
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/xen/arch/arm/mmu/mm.c b/xen/arch/arm/mmu/mm.c
index b70982e9d6..1d6267e6c5 100644
--- a/xen/arch/arm/mmu/mm.c
+++ b/xen/arch/arm/mmu/mm.c
@@ -1136,6 +1136,13 @@ int __init populate_pt_range(unsigned long virt, 
unsigned long nr_mfns)
 return xen_pt_update(virt, INVALID_MFN, nr_mfns, _PAGE_POPULATE);
 }
 
+unsigned int virt_to_fix(vaddr_t vaddr)
+{
+BUG_ON(vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START);
+
+return ((vaddr - FIXADDR_START) >> PAGE_SHIFT);
+}
+
 /*
  * Local variables:
  * mode: C
-- 
2.25.1




[PATCH v4 10/13] xen/arm: mmu: move MMU-specific setup_mm to mmu/setup.c

2023-07-31 Thread Henry Wang
From: Penny Zheng 

setup_mm() is used by Xen to set up the memory management subsystem at
boot time: the boot allocator, direct mapping, xenheap initialization,
frametable and static memory pages.

A later MPU system could inherit some components seamlessly, such as
the boot allocator, whilst others, such as the xenheap, need a
different implementation on MPU.  Some components, such as the direct
mapping, are specific to the MMU only.

This commit moves the MMU-specific components into mmu/setup.c, in
preparation for implementing an MPU version of setup_mm() in a future
commit.  Also make init_pdx(), init_staticmem_pages(), setup_mm() and
populate_boot_allocator() public for the future MPU implementation.

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
Signed-off-by: Henry Wang 
---
v4:
- No change
---
 xen/arch/arm/include/asm/setup.h |   5 +
 xen/arch/arm/mmu/Makefile|   1 +
 xen/arch/arm/mmu/setup.c | 339 +++
 xen/arch/arm/setup.c | 326 +
 4 files changed, 349 insertions(+), 322 deletions(-)
 create mode 100644 xen/arch/arm/mmu/setup.c

diff --git a/xen/arch/arm/include/asm/setup.h b/xen/arch/arm/include/asm/setup.h
index f0f64d228c..0922549631 100644
--- a/xen/arch/arm/include/asm/setup.h
+++ b/xen/arch/arm/include/asm/setup.h
@@ -156,6 +156,11 @@ struct bootcmdline 
*boot_cmdline_find_by_kind(bootmodule_kind kind);
 struct bootcmdline * boot_cmdline_find_by_name(const char *name);
 const char *boot_module_kind_as_string(bootmodule_kind kind);
 
+extern void init_pdx(void);
+extern void init_staticmem_pages(void);
+extern void populate_boot_allocator(void);
+extern void setup_mm(void);
+
 extern uint32_t hyp_traps_vector[];
 void init_traps(void);
 
diff --git a/xen/arch/arm/mmu/Makefile b/xen/arch/arm/mmu/Makefile
index b18cec4836..4aa1fb466d 100644
--- a/xen/arch/arm/mmu/Makefile
+++ b/xen/arch/arm/mmu/Makefile
@@ -1 +1,2 @@
 obj-y += mm.o
+obj-y += setup.o
diff --git a/xen/arch/arm/mmu/setup.c b/xen/arch/arm/mmu/setup.c
new file mode 100644
index 00..e05cca3f86
--- /dev/null
+++ b/xen/arch/arm/mmu/setup.c
@@ -0,0 +1,339 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * xen/arch/arm/mmu/setup.c
+ *
+ * MMU-specific early bringup code for an ARMv7-A with virt extensions.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifdef CONFIG_ARM_32
+static unsigned long opt_xenheap_megabytes __initdata;
+integer_param("xenheap_megabytes", opt_xenheap_megabytes);
+
+/*
+ * Returns the end address of the highest region in the range s..e
+ * with required size and alignment that does not conflict with the
+ * modules from first_mod to nr_modules.
+ *
+ * For non-recursive callers first_mod should normally be 0 (all
+ * modules and Xen itself) or 1 (all modules but not Xen).
+ */
+static paddr_t __init consider_modules(paddr_t s, paddr_t e,
+   uint32_t size, paddr_t align,
+   int first_mod)
+{
+const struct bootmodules *mi = &bootinfo.modules;
+int i;
+int nr;
+
+s = (s+align-1) & ~(align-1);
+e = e & ~(align-1);
+
+if ( s > e ||  e - s < size )
+return 0;
+
+/* First check the boot modules */
+for ( i = first_mod; i < mi->nr_mods; i++ )
+{
+paddr_t mod_s = mi->module[i].start;
+paddr_t mod_e = mod_s + mi->module[i].size;
+
+if ( s < mod_e && mod_s < e )
+{
+mod_e = consider_modules(mod_e, e, size, align, i+1);
+if ( mod_e )
+return mod_e;
+
+return consider_modules(s, mod_s, size, align, i+1);
+}
+}
+
+/* Now check any fdt reserved areas. */
+
+nr = fdt_num_mem_rsv(device_tree_flattened);
+
+for ( ; i < mi->nr_mods + nr; i++ )
+{
+paddr_t mod_s, mod_e;
+
+if ( fdt_get_mem_rsv_paddr(device_tree_flattened,
+   i - mi->nr_mods,
+   &mod_s, &mod_e ) < 0 )
+/* If we can't read it, pretend it doesn't exist... */
+continue;
+
+/* fdt_get_mem_rsv_paddr returns length */
+mod_e += mod_s;
+
+if ( s < mod_e && mod_s < e )
+{
+mod_e = consider_modules(mod_e, e, size, align, i+1);
+if ( mod_e )
+return mod_e;
+
+return consider_modules(s, mod_s, size, align, i+1);
+}
+}
+
+/*
+ * i is the current bootmodule we are evaluating, across all
+ * possible kinds of bootmodules.
+ *
+ * When retrieving the corresponding reserved-memory addresses, we
+ * need to index the bootinfo.reserved_mem bank starting from 0, and
+ * only counting the reserved-memory modules. Hence, we need to use
+ * i - nr.
+ */
+nr += mi->nr_mods;
+for ( ; i - nr < bootinfo.reserved_mem.nr_banks; i++ )
+{
+paddr_t r_s = bootinfo.reserved_mem.bank[i - 

[PATCH v4 07/13] xen/arm: Extract MMU-specific code

2023-07-31 Thread Henry Wang
Currently, most of the MMU-specific code is in mm.{c,h}. To make the
mm code extendable, this commit extracts the MMU-specific code by first:
- Creating an arch/arm/include/asm/mmu/ subdir.
- Creating an arch/arm/mmu/ subdir.

Then move the MMU-specific code into the new mmu subdirectories, which
includes the changes below:
- Move arch/arm/arm64/mm.c to arch/arm/arm64/mmu/mm.c
- Move MMU-related declarations in arch/arm/include/asm/mm.h to
  arch/arm/include/asm/mmu/mm.h
- Move the MMU-related declarations dump_pt_walk() in asm/page.h
  and pte_of_xenaddr() in asm/setup.h to the new asm/mmu/mm.h.
- Move MMU-related code in arch/arm/mm.c to arch/arm/mmu/mm.c.

Also modify the build system (the Makefiles, in this case) to pick up
the above code movement.

This patch is a pure code movement, no functional change intended.

Signed-off-by: Henry Wang 
---
With the code movement of this patch, the descriptions on top of
xen/arch/arm/mm.c and xen/arch/arm/mmu/mm.c might need some changes,
suggestions?
v4:
- Rework "[v3,13/52] xen/mmu: extract mmu-specific codes from
  mm.c/mm.h" against the latest staging branch; only do the code movement
  in this patch to ease the review.
---
 xen/arch/arm/Makefile |1 +
 xen/arch/arm/arm64/Makefile   |1 -
 xen/arch/arm/arm64/mmu/Makefile   |1 +
 xen/arch/arm/arm64/{ => mmu}/mm.c |0
 xen/arch/arm/include/asm/mm.h |   20 +-
 xen/arch/arm/include/asm/mmu/mm.h |   55 ++
 xen/arch/arm/include/asm/page.h   |   15 -
 xen/arch/arm/include/asm/setup.h  |3 -
 xen/arch/arm/mm.c | 1119 
 xen/arch/arm/mmu/Makefile |1 +
 xen/arch/arm/mmu/mm.c | 1146 +
 11 files changed, 1208 insertions(+), 1154 deletions(-)
 rename xen/arch/arm/arm64/{ => mmu}/mm.c (100%)
 create mode 100644 xen/arch/arm/include/asm/mmu/mm.h
 create mode 100644 xen/arch/arm/mmu/Makefile
 create mode 100644 xen/arch/arm/mmu/mm.c

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 7bf07e9920..9801b9dfd0 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -1,6 +1,7 @@
 obj-$(CONFIG_ARM_32) += arm32/
 obj-$(CONFIG_ARM_64) += arm64/
 obj-$(CONFIG_ACPI) += acpi/
+obj-$(CONFIG_HAS_MMU) += mmu/
 obj-$(CONFIG_HAS_PCI) += pci/
 ifneq ($(CONFIG_NO_PLAT),y)
 obj-y += platforms/
diff --git a/xen/arch/arm/arm64/Makefile b/xen/arch/arm/arm64/Makefile
index ce8749f046..731d00cc8a 100644
--- a/xen/arch/arm/arm64/Makefile
+++ b/xen/arch/arm/arm64/Makefile
@@ -11,7 +11,6 @@ obj-y += entry.o
 obj-y += head.o
 obj-y += insn.o
 obj-$(CONFIG_LIVEPATCH) += livepatch.o
-obj-y += mm.o
 obj-y += smc.o
 obj-y += smpboot.o
 obj-$(CONFIG_ARM64_SVE) += sve.o sve-asm.o
diff --git a/xen/arch/arm/arm64/mmu/Makefile b/xen/arch/arm/arm64/mmu/Makefile
index 3340058c08..a8a750a3d0 100644
--- a/xen/arch/arm/arm64/mmu/Makefile
+++ b/xen/arch/arm/arm64/mmu/Makefile
@@ -1 +1,2 @@
 obj-y += head.o
+obj-y += mm.o
diff --git a/xen/arch/arm/arm64/mm.c b/xen/arch/arm/arm64/mmu/mm.c
similarity index 100%
rename from xen/arch/arm/arm64/mm.c
rename to xen/arch/arm/arm64/mmu/mm.c
diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index 4262165ce2..e70ce4dc61 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -14,6 +14,10 @@
 # error "unknown ARM variant"
 #endif
 
+#ifdef CONFIG_HAS_MMU
+#include 
+#endif
+
 /* Align Xen to a 2 MiB boundary. */
 #define XEN_PADDR_ALIGN (1 << 21)
 
@@ -165,13 +169,6 @@ struct page_info
 #define _PGC_need_scrub   _PGC_allocated
 #define PGC_need_scrubPGC_allocated
 
-extern mfn_t directmap_mfn_start, directmap_mfn_end;
-extern vaddr_t directmap_virt_end;
-#ifdef CONFIG_ARM_64
-extern vaddr_t directmap_virt_start;
-extern unsigned long directmap_base_pdx;
-#endif
-
 #ifdef CONFIG_ARM_32
 #define is_xen_heap_page(page) is_xen_heap_mfn(page_to_mfn(page))
 #define is_xen_heap_mfn(mfn) ({ \
@@ -194,7 +191,6 @@ extern unsigned long directmap_base_pdx;
 
 #define maddr_get_owner(ma)   (page_get_owner(maddr_to_page((ma
 
-#define frame_table ((struct page_info *)FRAMETABLE_VIRT_START)
 /* PDX of the first page in the frame table. */
 extern unsigned long frametable_base_pdx;
 
@@ -207,8 +203,6 @@ extern unsigned long total_pages;
 extern void setup_pagetables(unsigned long boot_phys_offset);
 /* Map FDT in boot pagetable */
 extern void *early_fdt_map(paddr_t fdt_paddr);
-/* Switch to a new root page-tables */
-extern void switch_ttbr(uint64_t ttbr);
 /* Remove early mappings */
 extern void remove_early_mappings(void);
 /* Allocate and initialise pagetables for a secondary CPU. Sets init_ttbr to the
@@ -216,12 +210,6 @@ extern void remove_early_mappings(void);
 extern int init_secondary_pagetables(int cpu);
 /* Switch secondary CPUS to its own pagetables and finalise MMU setup */
 extern void mmu_init_secondary_cpu(void);
-/*
- * For Arm32, set up the direct-mapped xenheap: up to 1GB of contiguous,
- * always-mapped memory. 

[PATCH v4 06/13] xen/arm64: Move setup_fixmap() to create_page_tables()

2023-07-31 Thread Henry Wang
The original assembly setup_fixmap() is actually doing two separate
tasks: one is enabling the early UART when earlyprintk is on, and the
other is setting up the fixmap (even when earlyprintk is off).

Per discussion in [1], since commit
9d267c049d92 ("xen/arm64: Rework the memory layout"), there is no
chance that the fixmap and the mapping of early UART will clash with
the 1:1 mapping. Therefore the mapping of both the fixmap and the
early UART can be moved to the end of create_page_tables(). For the
future MPU support work, the early UART mapping could then be moved
into prepare_early_mappings().

No functional change intended.

[1] 
https://lore.kernel.org/xen-devel/78862bb8-fd7f-5a51-a7ae-3c5b5998e...@xen.org/

Signed-off-by: Henry Wang 
---
v4:
- Rework "[v3,12/52] xen/mmu: extract early uart mapping from setup_fixmap"
---
 xen/arch/arm/arm64/head.S |  1 -
 xen/arch/arm/arm64/mmu/head.S | 50 ---
 2 files changed, 17 insertions(+), 34 deletions(-)

diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index e4f579a48e..56f68a8e37 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -275,7 +275,6 @@ real_start_efi:
 b enable_boot_cpu_mm
 
 primary_switched:
-blsetup_fixmap
 #ifdef CONFIG_EARLY_PRINTK
 /* Use a virtual address to access the UART. */
 ldr   x23, =EARLY_UART_VIRTUAL_ADDRESS
diff --git a/xen/arch/arm/arm64/mmu/head.S b/xen/arch/arm/arm64/mmu/head.S
index b7c3dd423a..6bd94c3a45 100644
--- a/xen/arch/arm/arm64/mmu/head.S
+++ b/xen/arch/arm/arm64/mmu/head.S
@@ -231,6 +231,23 @@ link_from_second_id:
 create_table_entry boot_second_id, boot_third_id, x19, 2, x0, x1, x2
 link_from_third_id:
 create_mapping_entry boot_third_id, x19, x19, x0, x1, x2
+
+#ifdef CONFIG_EARLY_PRINTK
+/* Add UART to the fixmap table */
+ldr   x0, =EARLY_UART_VIRTUAL_ADDRESS
+/* x23: Early UART base physical address */
+create_mapping_entry xen_fixmap, x0, x23, x1, x2, x3, type=PT_DEV_L3
+#endif
+/* Map fixmap into boot_second */
+ldr   x0, =FIXMAP_ADDR(0)
+create_table_entry boot_second, xen_fixmap, x0, 2, x1, x2, x3
+/* Ensure any page table updates made above have occurred. */
+dsb   nshst
+/*
+ * The fixmap area will be used soon after. So ensure no hardware
+ * translation happens before the dsb completes.
+ */
+isb
 ret
 
 virtphys_clash:
@@ -395,39 +412,6 @@ identity_mapping_removed:
 ret
 ENDPROC(remove_identity_mapping)
 
-/*
- * Map the UART in the fixmap (when earlyprintk is used) and hook the
- * fixmap table in the page tables.
- *
- * The fixmap cannot be mapped in create_page_tables because it may
- * clash with the 1:1 mapping.
- *
- * Inputs:
- *   x20: Physical offset
- *   x23: Early UART base physical address
- *
- * Clobbers x0 - x3
- */
-ENTRY(setup_fixmap)
-#ifdef CONFIG_EARLY_PRINTK
-/* Add UART to the fixmap table */
-ldr   x0, =EARLY_UART_VIRTUAL_ADDRESS
-create_mapping_entry xen_fixmap, x0, x23, x1, x2, x3, type=PT_DEV_L3
-#endif
-/* Map fixmap into boot_second */
-ldr   x0, =FIXMAP_ADDR(0)
-create_table_entry boot_second, xen_fixmap, x0, 2, x1, x2, x3
-/* Ensure any page table updates made above have occurred. */
-dsb   nshst
-/*
- * The fixmap area will be used soon after. So ensure no hardware
- * translation happens before the dsb completes.
- */
-isb
-
-ret
-ENDPROC(setup_fixmap)
-
 /* Fail-stop */
 fail:   PRINT("- Boot failed -\r\n")
 1:  wfe
-- 
2.25.1




[PATCH v4 05/13] xen/arm: Move MMU related definitions from config.h to mmu/layout.h

2023-07-31 Thread Henry Wang
From: Wei Chen 

Xen defines some global configuration macros for Arm in config.h.
However there are some address layout related definitions that are
defined for MMU systems only, and these definitions could not be
used by MPU systems. Adding ifdefs with CONFIG_HAS_MPU to gate these
definitions will result in a messy and hard-to-read/maintain code.

So move MMU related definitions to a new file, i.e. mmu/layout.h to
avoid spreading "#ifdef" everywhere.

Signed-off-by: Wei Chen 
Signed-off-by: Penny Zheng 
Signed-off-by: Henry Wang 
---
v4:
- Rebase on top of latest staging to pick the recent UBSAN change
  to the layout.
- Use #ifdef CONFIG_HAS_MMU instead of #ifndef CONFIG_HAS_MPU, add
  a #else case.
- Rework commit message.
v3:
- name the new header layout.h
v2:
- Remove duplicated FIXMAP definitions from config_mmu.h
---
 xen/arch/arm/include/asm/config.h | 132 +--
 xen/arch/arm/include/asm/mmu/layout.h | 146 ++
 2 files changed, 149 insertions(+), 129 deletions(-)
 create mode 100644 xen/arch/arm/include/asm/mmu/layout.h

diff --git a/xen/arch/arm/include/asm/config.h 
b/xen/arch/arm/include/asm/config.h
index 83cbf6b0cb..a3cde7f2d7 100644
--- a/xen/arch/arm/include/asm/config.h
+++ b/xen/arch/arm/include/asm/config.h
@@ -71,136 +71,10 @@
 #include 
 #include 
 
-/*
- * ARM32 layout:
- *   0  -   2M   Unmapped
- *   2M -  10M   Xen text, data, bss
- *  10M -  12M   Fixmap: special-purpose 4K mapping slots
- *  12M -  16M   Early boot mapping of FDT
- *  16M -  18M   Livepatch vmap (if compiled in)
- *
- *  32M - 128M   Frametable: 32 bytes per page for 12GB of RAM
- * 256M -   1G   VMAP: ioremap and early_ioremap use this virtual address
- *space
- *
- *   1G -   2G   Xenheap: always-mapped memory
- *   2G -   4G   Domheap: on-demand-mapped
- *
- * ARM64 layout:
- * 0x - 0x01ff (2TB, L0 slots [0..3])
- *
- *  Reserved to identity map Xen
- *
- * 0x0200 - 0x027f (512GB, L0 slot [4])
- *  (Relative offsets)
- *   0  -   2M   Unmapped
- *   2M -  10M   Xen text, data, bss
- *  10M -  12M   Fixmap: special-purpose 4K mapping slots
- *  12M -  16M   Early boot mapping of FDT
- *  16M -  18M   Livepatch vmap (if compiled in)
- *
- *   1G -   2G   VMAP: ioremap and early_ioremap
- *
- *  32G -  64G   Frametable: 56 bytes per page for 2TB of RAM
- *
- * 0x0280 - 0x7fff (125TB, L0 slots [5..255])
- *  Unused
- *
- * 0x8000 - 0x84ff (5TB, L0 slots [256..265])
- *  1:1 mapping of RAM
- *
- * 0x8500 - 0x (123TB, L0 slots [266..511])
- *  Unused
- */
-
-#ifdef CONFIG_ARM_32
-#define XEN_VIRT_START  _AT(vaddr_t, MB(2))
+#ifdef CONFIG_HAS_MMU
+#include 
 #else
-
-#define SLOT0_ENTRY_BITS  39
-#define SLOT0(slot) (_AT(vaddr_t,slot) << SLOT0_ENTRY_BITS)
-#define SLOT0_ENTRY_SIZE  SLOT0(1)
-
-#define XEN_VIRT_START  (SLOT0(4) + _AT(vaddr_t, MB(2)))
-#endif
-
-/*
- * Reserve enough space so both UBSAN and GCOV can be enabled together
- * plus some slack for future growth.
- */
-#define XEN_VIRT_SIZE   _AT(vaddr_t, MB(8))
-#define XEN_NR_ENTRIES(lvl) (XEN_VIRT_SIZE / XEN_PT_LEVEL_SIZE(lvl))
-
-#define FIXMAP_VIRT_START   (XEN_VIRT_START + XEN_VIRT_SIZE)
-#define FIXMAP_VIRT_SIZE_AT(vaddr_t, MB(2))
-
-#define FIXMAP_ADDR(n)  (FIXMAP_VIRT_START + (n) * PAGE_SIZE)
-
-#define BOOT_FDT_VIRT_START (FIXMAP_VIRT_START + FIXMAP_VIRT_SIZE)
-#define BOOT_FDT_VIRT_SIZE  _AT(vaddr_t, MB(4))
-
-#ifdef CONFIG_LIVEPATCH
-#define LIVEPATCH_VMAP_START(BOOT_FDT_VIRT_START + BOOT_FDT_VIRT_SIZE)
-#define LIVEPATCH_VMAP_SIZE_AT(vaddr_t, MB(2))
-#endif
-
-#define HYPERVISOR_VIRT_START  XEN_VIRT_START
-
-#ifdef CONFIG_ARM_32
-
-#define CONFIG_SEPARATE_XENHEAP 1
-
-#define FRAMETABLE_VIRT_START  _AT(vaddr_t, MB(32))
-#define FRAMETABLE_SIZEMB(128-32)
-#define FRAMETABLE_NR  (FRAMETABLE_SIZE / sizeof(*frame_table))
-
-#define VMAP_VIRT_START_AT(vaddr_t, MB(256))
-#define VMAP_VIRT_SIZE _AT(vaddr_t, GB(1) - MB(256))
-
-#define XENHEAP_VIRT_START _AT(vaddr_t, GB(1))
-#define XENHEAP_VIRT_SIZE  _AT(vaddr_t, GB(1))
-
-#define DOMHEAP_VIRT_START _AT(vaddr_t, GB(2))
-#define DOMHEAP_VIRT_SIZE  _AT(vaddr_t, GB(2))
-
-#define DOMHEAP_ENTRIES1024  /* 1024 2MB mapping slots */
-
-/* Number of domheap pagetable pages required at the second level (2MB mappings) */
-#define DOMHEAP_SECOND_PAGES (DOMHEAP_VIRT_SIZE >> FIRST_SHIFT)
-
-/*
- * The temporary area is overlapping with the domheap area. This may
- * be used to create an alias of the first slot containing Xen mappings
- * when turning on/off the MMU.
- */
-#define TEMPORARY_AREA_FIRST_SLOT(first_table_offset(DOMHEAP_VIRT_START))
-
-/* Calculate the address in the temporary area */
-#define TEMPORARY_AREA_ADDR(addr)   \
- (((addr) & ~XEN_PT_LEVEL_MASK(1)) |

[PATCH v4 04/13] xen/arm64: Split and move MMU-specific head.S to mmu/head.S

2023-07-31 Thread Henry Wang
The MMU specific code in head.S will not be used on MPU systems.
Instead of introducing more #ifdefs which will bring complexity
to the code, move MMU related code to mmu/head.S and keep common
code in head.S. Two notes while moving:
- As "fail" in the original head.S is very simple and its name is too
  likely to conflict, duplicate it in mmu/head.S instead of
  exporting it.
- Use ENTRY() for enable_secondary_cpu_mm, enable_boot_cpu_mm and
  setup_fixmap to please the compiler after the code movement.

Also move the assembly macros shared by head.S and mmu/head.S to
macros.h.

Note that only the first 4KB of the Xen image will be mapped as
identity (PA == VA). At the moment, Xen guarantees this by having
everything that needs to be used in the identity mapping in the
.text.header section of head.S, and the size is checked at link time,
via _idmap_start and _idmap_end, to ensure it fits in 4KB.
Since we are introducing a new head.S in this patch, we could add
.text.header to the new file to guarantee that all identity-map code
stays in the first 4KB. However, the order of these two files within
that 4KB depends on the build toolchain. Hence, introduce a new
section named .text.idmap for the region between _idmap_start and
_idmap_end, and force, in the Xen linker script, the .text.idmap
contents to be linked after .text.header. This ensures the code of
head.S is always at the top of the Xen binary.
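The ordering described above can be sketched as a GNU ld script fragment (a minimal illustration, not the actual xen.lds.S change; only the names .text.header, .text.idmap, _idmap_start and _idmap_end are taken from this patch):

```
SECTIONS
{
  .text : {
    _idmap_start = .;
    *(.text.header)   /* head.S entry code, forced to come first */
    *(.text.idmap)    /* identity-mapped code from mmu/head.S */
    _idmap_end = .;
    *(.text)
  }
}
```

Because input sections are emitted in the order they are named inside an output section, .text.header always precedes .text.idmap regardless of the object-file link order.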

Signed-off-by: Henry Wang 
Signed-off-by: Wei Chen 
---
v4:
- Rework "[v3,08/52] xen/arm64: move MMU related code from
  head.S to mmu/head.S"
- Don't move the "yet to shared" macro such as print_reg.
- Fold "[v3,04/52] xen/arm: add .text.idmap in ld script for Xen
  identity map sections" to this patch. Rework commit msg.
---
 xen/arch/arm/arm64/Makefile |   1 +
 xen/arch/arm/arm64/head.S   | 497 +---
 xen/arch/arm/arm64/mmu/Makefile |   1 +
 xen/arch/arm/arm64/mmu/head.S   | 487 +++
 xen/arch/arm/include/asm/arm64/macros.h |  36 ++
 xen/arch/arm/xen.lds.S  |   1 +
 6 files changed, 527 insertions(+), 496 deletions(-)
 create mode 100644 xen/arch/arm/arm64/mmu/Makefile
 create mode 100644 xen/arch/arm/arm64/mmu/head.S

diff --git a/xen/arch/arm/arm64/Makefile b/xen/arch/arm/arm64/Makefile
index 54ad55c75c..ce8749f046 100644
--- a/xen/arch/arm/arm64/Makefile
+++ b/xen/arch/arm/arm64/Makefile
@@ -1,4 +1,5 @@
 obj-y += lib/
+obj-$(CONFIG_HAS_MMU) += mmu/
 
 obj-y += cache.o
 obj-y += cpufeature.o
diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index b29bffce5b..e4f579a48e 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -28,17 +28,6 @@
 #include 
 #endif
 
-#define PT_PT 0xf7f /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=1 P=1 */
-#define PT_MEM0xf7d /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=0 P=1 */
-#define PT_MEM_L3 0xf7f /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=1 P=1 */
-#define PT_DEV0xe71 /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=100 T=0 P=1 */
-#define PT_DEV_L3 0xe73 /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=100 T=1 P=1 */
-
-/* Convenience defines to get slot used by Xen mapping. */
-#define XEN_ZEROETH_SLOTzeroeth_table_offset(XEN_VIRT_START)
-#define XEN_FIRST_SLOT  first_table_offset(XEN_VIRT_START)
-#define XEN_SECOND_SLOT second_table_offset(XEN_VIRT_START)
-
 #define __HEAD_FLAG_PAGE_SIZE   ((PAGE_SHIFT - 10) / 2)
 
 #define __HEAD_FLAG_PHYS_BASE   1
@@ -85,19 +74,7 @@
  *  x30 - lr
  */
 
-#ifdef CONFIG_EARLY_PRINTK
-/*
- * Macro to print a string to the UART, if there is one.
- *
- * Clobbers x0 - x3
- */
-#define PRINT(_s)  \
-mov   x3, lr ; \
-adr_l x0, 98f ;\
-blasm_puts ;   \
-mov   lr, x3 ; \
-RODATA_STR(98, _s)
-
+#ifdef CONFIG_EARLY_PRINTK
 /*
  * Macro to print the value of register \xb
  *
@@ -111,31 +88,11 @@
 .endm
 
 #else /* CONFIG_EARLY_PRINTK */
-#define PRINT(s)
-
 .macro print_reg xb
 .endm
 
 #endif /* !CONFIG_EARLY_PRINTK */
 
-/*
- * Pseudo-op for PC relative adr ,  where  is
- * within the range +/- 4GB of the PC.
- *
- * @dst: destination register (64 bit wide)
- * @sym: name of the symbol
- */
-.macro  adr_l, dst, sym
-adrp \dst, \sym
-add  \dst, \dst, :lo12:\sym
-.endm
-
-/* Load the physical address of a symbol into xb */
-.macro load_paddr xb, sym
-ldr \xb, =\sym
-add \xb, \xb, x20
-.endm
-
 .section .text.header, "ax", %progbits
 /*.aarch64*/
 
@@ -472,413 +429,6 @@ cpu_init:
 ret
 ENDPROC(cpu_init)
 
-/*
- * Macro to find the slot number at a given page-table level
- *
- * slot: slot computed
- * virt: virtual address
- * lvl:  page-table level
- */
-.macro get_table_slot, slot, virt, lvl
-ubfx  \slot, \virt, #XEN_PT_LEVEL_SHIFT(\lvl), #XEN_PT_LPAE_SHIFT
-.endm
-
-/*
- * Macro to create a page table entry in \ptbl to \tbl
- * ptbl:table symbol where the entry will be created
- * tbl: physical address of the table to 

[PATCH v4 03/13] xen/arm64: prepare for moving MMU related code from head.S

2023-07-31 Thread Henry Wang
From: Wei Chen 

We want to reuse head.S for MPU systems, but some of its code is
implemented for MMU systems only. We will move such code to a
separate MMU-specific file. Before that, this patch fixes some
indentation to make the code easier to review:
1. Fix the indentation and incorrect style of code comments.
2. Fix the indentation of the .text.header section.
3. Rename puts() to asm_puts() for global export.

Signed-off-by: Wei Chen 
Signed-off-by: Penny Zheng 
Signed-off-by: Henry Wang 
---
v4:
- Rebase to pick the adr -> adr_l change in PRINT(_s).
- Correct the in-code comment for asm_puts() and add a note to
  mention that asm_puts() should only be called from assembly.
- Drop redundant puts (now asm_puts) under CONFIG_EARLY_PRINTK.
v3:
-  fix commit message
-  Rename puts() to asm_puts() for global export
v2:
-  New patch.
---
 xen/arch/arm/arm64/head.S | 46 ---
 1 file changed, 24 insertions(+), 22 deletions(-)

diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 2af9f974d5..b29bffce5b 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -94,7 +94,7 @@
 #define PRINT(_s)  \
 mov   x3, lr ; \
 adr_l x0, 98f ;\
-blputs;\
+blasm_puts ;   \
 mov   lr, x3 ; \
 RODATA_STR(98, _s)
 
@@ -136,21 +136,21 @@
 add \xb, \xb, x20
 .endm
 
-.section .text.header, "ax", %progbits
-/*.aarch64*/
+.section .text.header, "ax", %progbits
+/*.aarch64*/
 
-/*
- * Kernel startup entry point.
- * ---
- *
- * The requirements are:
- *   MMU = off, D-cache = off, I-cache = on or off,
- *   x0 = physical address to the FDT blob.
- *
- * This must be the very first address in the loaded image.
- * It should be linked at XEN_VIRT_START, and loaded at any
- * 4K-aligned address.
- */
+/*
+ * Kernel startup entry point.
+ * ---
+ *
+ * The requirements are:
+ *   MMU = off, D-cache = off, I-cache = on or off,
+ *   x0 = physical address to the FDT blob.
+ *
+ * This must be the very first address in the loaded image.
+ * It should be linked at XEN_VIRT_START, and loaded at any
+ * 4K-aligned address.
+ */
 
 GLOBAL(start)
 /*
@@ -535,7 +535,7 @@ ENDPROC(cpu_init)
  * Macro to create a mapping entry in \tbl to \phys. Only mapping in 3rd
  * level table (i.e page granularity) is supported.
  *
- * ptbl: table symbol where the entry will be created
+ * ptbl:table symbol where the entry will be created
  * virt:virtual address
  * phys:physical address (should be page aligned)
  * tmp1:scratch register
@@ -970,19 +970,22 @@ init_uart:
 ret
 ENDPROC(init_uart)
 
-/* Print early debug messages.
+/*
+ * Print early debug messages.
+ * Note: This function is only supposed to be called from assembly.
  * x0: Nul-terminated string to print.
  * x23: Early UART base address
- * Clobbers x0-x1 */
-puts:
+ * Clobbers x0-x1
+ */
+ENTRY(asm_puts)
 early_uart_ready x23, 1
 ldrb  w1, [x0], #1   /* Load next char */
 cbz   w1, 1f /* Exit on nul */
 early_uart_transmit x23, w1
-b puts
+b asm_puts
 1:
 ret
-ENDPROC(puts)
+ENDPROC(asm_puts)
 
 /*
  * Print a 64-bit number in hex.
@@ -1012,7 +1015,6 @@ hex:.ascii "0123456789abcdef"
 
 ENTRY(early_puts)
 init_uart:
-puts:
 putn:   ret
 
 #endif /* !CONFIG_EARLY_PRINTK */
-- 
2.25.1




[PATCH v4 02/13] xen/arm: Introduce 'choice' for memory system architecture

2023-07-31 Thread Henry Wang
There are two types of memory system architectures available for
Arm-based systems, namely the Virtual Memory System Architecture (VMSA)
and the Protected Memory System Architecture (PMSA). According to
ARM DDI 0487G.a, a VMSA provides a Memory Management Unit (MMU) that
controls address translation, access permissions, and memory attribute
determination and checking, for memory accesses made by the PE. And
according to ARM DDI 0600A.c, the PMSA supports a unified memory
protection scheme where a Memory Protection Unit (MPU) manages
instruction and data access. Currently, Xen only supports VMSA.

As a preparation for the Xen MPU (PMSA) support, introduce a Kconfig
choice under the "Architecture Features" menu for the user to choose
the memory system architecture. Since currently only VMSA is
supported, only add the bool CONFIG_HAS_MMU to keep the default
behavior consistent. Once PMSA/MPU is supported in Xen, the user will
be able to choose either VMSA or PMSA, but not both.

Suggested-by: Julien Grall 
Signed-off-by: Henry Wang 
---
v4:
- Completely rework "[v3,06/52] xen/arm: introduce CONFIG_HAS_MMU"
---
 xen/arch/arm/Kconfig | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index fd57a82dd2..0e38e9ba17 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -59,6 +59,20 @@ config PADDR_BITS
default 40 if ARM_PA_BITS_40
default 48 if ARM_64
 
+choice
+   prompt "Memory system architecture"
+   default HAS_MMU
+   help
+ User can choose the memory system architecture.
+ A Virtual Memory System Architecture (VMSA) provides a Memory Management
+ Unit (MMU) that controls address translation, access permissions, and
+ memory attribute determination and checking, for memory accesses made by
+ the PE.
+
+config HAS_MMU
+   bool "MMU for a VMSA system"
+endchoice
+
 source "arch/Kconfig"
 
 config ACPI
-- 
2.25.1




[PATCH v4 01/13] xen/arm64: head.S: Introduce enable_{boot,secondary}_cpu_mm

2023-07-31 Thread Henry Wang
From: Wei Chen 

At the moment, on MMU system, enable_mmu() will return to an
address in the 1:1 mapping, then each path is responsible to
switch to virtual runtime mapping. Then remove_identity_mapping()
is called on the boot CPU to remove all 1:1 mapping.

Since remove_identity_mapping() is not necessary on non-MMU systems,
and to avoid creating an empty function for them while keeping a
single code flow in arm64/head.S, we move the path switch and
remove_identity_mapping() into enable_mmu() on MMU systems.

As remove_identity_mapping() should be called for the boot CPU only,
we introduce enable_boot_cpu_mm() for the boot CPU and
enable_secondary_cpu_mm() for secondary CPUs in this patch.

Signed-off-by: Wei Chen 
Signed-off-by: Penny Zheng 
Signed-off-by: Henry Wang 
---
v4:
- Clarify remove_identity_mapping() is called on boot CPU and keep
  the function/proc format consistent in commit msg.
- Drop inaccurate (due to the refactor) in-code comment.
- Rename enable_{boot,runtime}_mmu to enable_{boot,secondary}_cpu_mm.
- Reword the in-code comment on top of enable_{boot,secondary}_cpu_mm.
- Call "fail" for unreachable code.
v3:
- new patch
---
 xen/arch/arm/arm64/head.S | 89 ++-
 1 file changed, 70 insertions(+), 19 deletions(-)

diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 31cdb54d74..2af9f974d5 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -313,21 +313,11 @@ real_start_efi:
 
 blcheck_cpu_mode
 blcpu_init
-blcreate_page_tables
-load_paddr x0, boot_pgtable
-blenable_mmu
 
-/* We are still in the 1:1 mapping. Jump to the runtime Virtual Address. */
-ldr   x0, =primary_switched
-brx0
+ldr   lr, =primary_switched
+b enable_boot_cpu_mm
+
 primary_switched:
-/*
- * The 1:1 map may clash with other parts of the Xen virtual memory
- * layout. As it is not used anymore, remove it completely to
- * avoid having to worry about replacing existing mapping
- * afterwards.
- */
-blremove_identity_mapping
 blsetup_fixmap
 #ifdef CONFIG_EARLY_PRINTK
 /* Use a virtual address to access the UART. */
@@ -372,13 +362,10 @@ GLOBAL(init_secondary)
 #endif
 blcheck_cpu_mode
 blcpu_init
-load_paddr x0, init_ttbr
-ldr   x0, [x0]
-blenable_mmu
 
-/* We are still in the 1:1 mapping. Jump to the runtime Virtual Address. */
-ldr   x0, =secondary_switched
-brx0
+ldr   lr, =secondary_switched
+b enable_secondary_cpu_mm
+
 secondary_switched:
 #ifdef CONFIG_EARLY_PRINTK
 /* Use a virtual address to access the UART. */
@@ -737,6 +724,70 @@ enable_mmu:
 ret
 ENDPROC(enable_mmu)
 
+/*
+ * Enable mm (turn on the data cache and the MMU) for secondary CPUs.
+ * The function will return to the virtual address provided in LR (e.g. the
+ * runtime mapping).
+ *
+ * Inputs:
+ *   lr : Virtual address to return to.
+ *
+ * Clobbers x0 - x5
+ */
+enable_secondary_cpu_mm:
+mov   x5, lr
+
+load_paddr x0, init_ttbr
+ldr   x0, [x0]
+
+blenable_mmu
+mov   lr, x5
+
+/* return to secondary_switched */
+ret
+ENDPROC(enable_secondary_cpu_mm)
+
+/*
+ * Enable mm (turn on the data cache and the MMU) for the boot CPU.
+ * The function will return to the virtual address provided in LR (e.g. the
+ * runtime mapping).
+ *
+ * Inputs:
+ *   lr : Virtual address to return to.
+ *
+ * Clobbers x0 - x5
+ */
+enable_boot_cpu_mm:
+mov   x5, lr
+
+blcreate_page_tables
+load_paddr x0, boot_pgtable
+
+blenable_mmu
+mov   lr, x5
+
+/*
+ * The MMU is turned on and we are in the 1:1 mapping. Switch
+ * to the runtime mapping.
+ */
+ldr   x0, =1f
+brx0
+1:
+/*
+ * The 1:1 map may clash with other parts of the Xen virtual memory
+ * layout. As it is not used anymore, remove it completely to
+ * avoid having to worry about replacing existing mapping
+ * afterwards. Function will return to primary_switched.
+ */
+b remove_identity_mapping
+
+/*
+ * Below is supposed to be unreachable code, as "ret" in
 * remove_identity_mapping will use the return address in LR in advance.
+ */
+b fail
+ENDPROC(enable_boot_cpu_mm)
+
 /*
  * Remove the 1:1 map from the page-tables. It is not easy to keep track
  * where the 1:1 map was mapped, so we will look for the top-level entry
-- 
2.25.1




[PATCH v4 00/13] xen/arm: Split MMU code as the prepration of MPU work

2023-07-31 Thread Henry Wang
Based on the discussion in the Xen Summit [1], sending this series out after
addressing the comments in v3 [2] as the preparation work to add MPU support.

Mostly code movement, with some Kconfig and build system (mainly
Makefile) adjustments. No functional change expected.

[1] 
https://lore.kernel.org/xen-devel/as8pr08mb799122f8b0cb841ded64f48192...@as8pr08mb7991.eurprd08.prod.outlook.com/
[2] 
https://lore.kernel.org/xen-devel/20230626033443.2943270-1-penny.zh...@arm.com/

Henry Wang (4):
  xen/arm: Introduce 'choice' for memory system architecture
  xen/arm64: Split and move MMU-specific head.S to mmu/head.S
  xen/arm64: Move setup_fixmap() to create_page_tables()
  xen/arm: Extract MMU-specific code

Penny Zheng (6):
  xen/arm: Fold pmap and fixmap into MMU system
  xen/arm: mm: Use generic variable/function names for extendability
  xen/arm: mmu: move MMU-specific setup_mm to mmu/setup.c
  xen/arm: mmu: move MMU specific P2M code to mmu/p2m.{c,h}
  xen/arm: mmu: relocate copy_from_paddr() to setup.c
  xen/arm: mmu: enable SMMU subsystem only in MMU

Wei Chen (3):
  xen/arm64: head.S: Introduce enable_{boot,secondary}_cpu_mm
  xen/arm64: prepare for moving MMU related code from head.S
  xen/arm: Move MMU related definitions from config.h to mmu/layout.h

 xen/arch/arm/Kconfig|   16 +-
 xen/arch/arm/Makefile   |1 +
 xen/arch/arm/arm32/head.S   |4 +-
 xen/arch/arm/arm64/Makefile |2 +-
 xen/arch/arm/arm64/head.S   |  497 +--
 xen/arch/arm/arm64/mmu/Makefile |2 +
 xen/arch/arm/arm64/mmu/head.S   |  471 ++
 xen/arch/arm/arm64/{ => mmu}/mm.c   |   11 +-
 xen/arch/arm/arm64/smpboot.c|6 +-
 xen/arch/arm/include/asm/arm64/macros.h |   36 +
 xen/arch/arm/include/asm/arm64/mm.h |7 +-
 xen/arch/arm/include/asm/config.h   |  132 +-
 xen/arch/arm/include/asm/fixmap.h   |7 +-
 xen/arch/arm/include/asm/mm.h   |   28 +-
 xen/arch/arm/include/asm/mmu/layout.h   |  146 ++
 xen/arch/arm/include/asm/mmu/mm.h   |   55 +
 xen/arch/arm/include/asm/mmu/p2m.h  |   18 +
 xen/arch/arm/include/asm/p2m.h  |   33 +-
 xen/arch/arm/include/asm/page.h |   15 -
 xen/arch/arm/include/asm/setup.h|8 +-
 xen/arch/arm/kernel.c   |   27 -
 xen/arch/arm/mm.c   | 1119 --
 xen/arch/arm/mmu/Makefile   |3 +
 xen/arch/arm/mmu/mm.c   | 1153 +++
 xen/arch/arm/mmu/p2m.c  | 1610 
 xen/arch/arm/mmu/setup.c|  366 +
 xen/arch/arm/p2m.c  | 1772 ++-
 xen/arch/arm/setup.c|  326 +
 xen/arch/arm/smpboot.c  |4 +-
 xen/arch/arm/xen.lds.S  |1 +
 xen/drivers/passthrough/Kconfig |3 +-
 31 files changed, 4064 insertions(+), 3815 deletions(-)
 create mode 100644 xen/arch/arm/arm64/mmu/Makefile
 create mode 100644 xen/arch/arm/arm64/mmu/head.S
 rename xen/arch/arm/arm64/{ => mmu}/mm.c (95%)
 create mode 100644 xen/arch/arm/include/asm/mmu/layout.h
 create mode 100644 xen/arch/arm/include/asm/mmu/mm.h
 create mode 100644 xen/arch/arm/include/asm/mmu/p2m.h
 create mode 100644 xen/arch/arm/mmu/Makefile
 create mode 100644 xen/arch/arm/mmu/mm.c
 create mode 100644 xen/arch/arm/mmu/p2m.c
 create mode 100644 xen/arch/arm/mmu/setup.c

-- 
2.25.1




RE: [PATCH v4 4/4] x86/iommu: pass full IO-APIC RTE for remapping table update

2023-07-31 Thread Tian, Kevin
> From: Roger Pau Monne 
> Sent: Friday, July 28, 2023 5:57 PM
> 
> So that the remapping entry can be updated atomically when possible.
> 
> Doing such update atomically will avoid Xen having to mask the IO-APIC
> pin prior to performing any interrupt movements (ie: changing the
> destination and vector fields), as the interrupt remapping entry is
> always consistent.
> 
> This also simplifies some of the logic on both the VT-d and AMD-Vi
> implementations, as having the full RTE available instead of half of
> it avoids having to possibly read and update the missing other half
> from hardware.
> 
> While there remove the explicit zeroing of new_ire fields in
> ioapic_rte_to_remap_entry() and initialize the variable at definition
> so all fields are zeroed.  Note fields could also be initialized with
> final values at definition, but I found that likely too much to be
> done at this time.
> 
> Signed-off-by: Roger Pau Monné 

Reviewed-by: Kevin Tian 


[xen-unstable-smoke test] 182098: tolerable all pass - PUSHED

2023-07-31 Thread osstest service owner
flight 182098 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/182098/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  70eb862b01023c45b943e8ff92ef4f6a7e9e8950
baseline version:
 xen  c0dd53b8cbd1e47e9c89873a9265a7170bdc6b4c

Last test of basis   182091  2023-07-31 13:00:28 Z0 days
Testing same since   182098  2023-07-31 20:00:37 Z0 days1 attempts


People who touched revisions under test:
  Anthony PERARD 
  Federico Serafini 
  Jan Beulich 
  Jason Andryuk 
  Nicola Vetrini 
  Peter Hoyes 
  Roger Pau Monné 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   c0dd53b8cb..70eb862b01  70eb862b01023c45b943e8ff92ef4f6a7e9e8950 -> smoke



[PATCH] arm32: Avoid using solaris syntax for .section directive

2023-07-31 Thread Khem Raj
Assembler from binutils 2.41 rejects this syntax

.section "name"[, flags...]

where flags could be #alloc, #write or #execinstr.
Switch to using the ELF syntax:

.section name[, "flags"[, @type]]

[1] https://ftp.gnu.org/old-gnu/Manuals/gas-2.9.1/html_chapter/as_7.html#SEC119
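For reference, the two spellings side by side in GAS syntax (a minimal sketch; the section name .proc.info is taken from the diff below, and the explicit %progbits type is optional):

```asm
/* Solaris-style syntax, rejected by GAS from binutils 2.41: */
/*   .section ".proc.info", #alloc                            */

/* ELF-style syntax accepted by GAS: name, quoted flags, optional type.
 * "a" marks the section as allocatable, equivalent to #alloc above.  */
.section .proc.info, "a", %progbits
```

Note that on Arm the ELF type is spelled with `%` rather than `@`, since `@` starts a comment in Arm assembly.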

Signed-off-by: Khem Raj 
---
 xen/arch/arm/arm32/proc-v7.S | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/arm32/proc-v7.S b/xen/arch/arm/arm32/proc-v7.S
index c90a31d80f..6d3d19b873 100644
--- a/xen/arch/arm/arm32/proc-v7.S
+++ b/xen/arch/arm/arm32/proc-v7.S
@@ -29,7 +29,7 @@ brahma15mp_init:
 mcr   CP32(r0, ACTLR)
 mov   pc, lr
 
-.section ".proc.info", #alloc
+.section .proc.info, "a"
 .type __v7_ca15mp_proc_info, #object
 __v7_ca15mp_proc_info:
 .long 0x410FC0F0 /* Cortex-A15 */
@@ -38,7 +38,7 @@ __v7_ca15mp_proc_info:
 .long caxx_processor
 .size __v7_ca15mp_proc_info, . - __v7_ca15mp_proc_info
 
-.section ".proc.info", #alloc
+.section .proc.info, "a"
 .type __v7_ca7mp_proc_info, #object
 __v7_ca7mp_proc_info:
 .long 0x410FC070 /* Cortex-A7 */
@@ -47,7 +47,7 @@ __v7_ca7mp_proc_info:
 .long caxx_processor
 .size __v7_ca7mp_proc_info, . - __v7_ca7mp_proc_info
 
-.section ".proc.info", #alloc
+.section .proc.info, "a"
 .type __v7_brahma15mp_proc_info, #object
 __v7_brahma15mp_proc_info:
 .long 0x420F00F0 /* Broadcom Brahma-B15 */
-- 
2.41.0








RE: [PATCH v9 00/36] x86: enable FRED for x86-64

2023-07-31 Thread Li, Xin3
> > Are you saying that you only got a subset of this patch set?
> 
> No, I'm saying I don't want to waste a bunch of time tracking down exactly
> which commit a 36 patch series is based on.  E.g. I just refreshed
> tip/master and still get:
> 
> Applying: x86/idtentry: Incorporate definitions/declarations of the FRED
> external interrupt handler type
> error: sha1 information is lacking or useless (arch/x86/include/asm/idtentry.h).
> error: could not build fake ancestor
> Patch failed at 0024 x86/idtentry: Incorporate definitions/declarations of
> the FRED external interrupt handler type
> hint: Use 'git am --show-current-patch=diff' to see the failed patch

That is due to the following patch set (originally from tglx) not
being merged yet:

https://lore.kernel.org/lkml/20230621171248.6805-1-xin3...@intel.com/

Sigh, I should have mentioned it in the cover letter.

As mentioned in the cover letter, 2 patches are sent out separately
as pre-FRED patches:
https://lore.kernel.org/lkml/20230706051443.2054-1-xin3...@intel.com/
https://lore.kernel.org/lkml/20230706052231.2183-1-xin3...@intel.com/

Sorry it's a bit complicated.

Got to mention, just in case you want to try out FRED, the current
public Intel Simics emulator has not been updated to support FRED 5.0 yet
(it only supports FRED 3.0). The plan is late Q3 or early Q4.



Re: [PATCH 2/3] xen/ppc: Relocate kernel to physical address 0 on boot

2023-07-31 Thread Shawn Anastasio
On 7/31/23 10:46 AM, Jan Beulich wrote:
> On 29.07.2023 00:21, Shawn Anastasio wrote:
>> Introduce a small assembly loop in `start` to copy the kernel to
>> physical address 0 before continuing. This ensures that the physical
>> address lines up with XEN_VIRT_START (0xc000) and allows us
>> to identity map the kernel when the MMU is set up in the next patch.
> 
> So PPC guarantees there's always a reasonable amount of memory at 0,
> and that's available for use?

Both Linux and FreeBSD rely on this being the case, so it's essentially
a de facto standard, though I'm not aware of any specification that
guarantees it.

>> --- a/xen/arch/ppc/ppc64/head.S
>> +++ b/xen/arch/ppc/ppc64/head.S
>> @@ -18,6 +18,33 @@ ENTRY(start)
>>  addis   %r2, %r12, .TOC.-1b@ha
>>  addi%r2, %r2, .TOC.-1b@l
>>  
>> +/*
>> + * Copy Xen to physical address zero and jump to XEN_VIRT_START
>> + * (0xc000). This works because the hardware will ignore the top
>> + * four address bits when the MMU is off.
>> + */
>> +LOAD_REG_ADDR(%r1, start)
> 
> I think you really mean _start here (which is missing from the linker
> script),

The PIC patch series fixes the missing _start definition in the linker
script. In the cover letter of v2 I'll add a clear note that this series
is based on that one.

> not start. See also Andrew's recent related RISC-V change.

Good point. In practice this worked because the `start` function was the
first thing in the first section of the linker script, but of course
using _start here is more correct.

> 
>> +LOAD_IMM64(%r12, XEN_VIRT_START)
>> +
>> +/* If we're at the correct address, skip copy */
>> +cmpld   %r1, %r12
>> +beq .L_correct_address
> 
> Can this ever be the case, especially with the MMU-off behavior you
> describe in the comment above? Wouldn't you need to ignore the top
> four bits in the comparison?

It will always be the case after the code jumps to XEN_VIRT_START after
the copy takes place. I could have it jump past the copy loop entirely,
but then I'd need to duplicate the TOC setup.

>> +/* Copy bytes until _end */
>> +LOAD_REG_ADDR(%r11, _end)
>> +addi%r1, %r1, -8
>> +li  %r13, -8
>> +.L_copy_xen:
>> +ldu %r10, 8(%r1)
>> +stdu%r10, 8(%r13)
>> +cmpld   %r1, %r11
>> +blt .L_copy_xen
>> +
>> +/* Jump to XEN_VIRT_START */
>> +mtctr   %r12
>> +bctr
>> +.L_correct_address:
> 
> Can the two regions potentially overlap? Looking at the ELF header
> it's not clear to me what guarantees there are that this can't
> happen.

As I understand it, any bootloader that placed the kernel at a low
enough address for this to be an issue wouldn't be able to boot Linux or
FreeBSD, so in practice it's a safe bet that this won't be the case.

> Jan

Thanks,
Shawn



Re: [PATCH v9 00/36] x86: enable FRED for x86-64

2023-07-31 Thread Sean Christopherson
On Mon, Jul 31, 2023, Xin3 Li wrote:
> > > This patch set enables the Intel flexible return and event delivery
> > > (FRED) architecture for x86-64.
> > 
> > ...
> > 
> > > --
> > > 2.34.1
> > 
> > What is this based on?
> 
> The tip tree master branch.
> 
> > FYI, you're using a version of git that will (mostly)
> > automatically generate the base, e.g. I do
> > 
> >   git format-patch --base=HEAD~$nr ...
> > 
> > in my scripts, where $nr is the number of patches I am sending.  My specific
> > approach requires HEAD~$nr to be a publicly visible object/commit, but
> > that should be the case the vast majority of the time anyways.
> 
> Are you saying that you only got a subset of this patch set?

No, I'm saying I don't want to waste a bunch of time tracking down exactly which
commit a 36 patch series is based on.  E.g. I just refreshed tip/master and
still get:

Applying: x86/idtentry: Incorporate definitions/declarations of the FRED 
external interrupt handler type
error: sha1 information is lacking or useless (arch/x86/include/asm/idtentry.h).
error: could not build fake ancestor
Patch failed at 0024 x86/idtentry: Incorporate definitions/declarations of the 
FRED external interrupt handler type
hint: Use 'git am --show-current-patch=diff' to see the failed patch

> HPA told me he only got patches 0-25/36.
> 
> And I got several undeliverable email notifications, saying
> "
> The following message to  was undeliverable.
> The reason for the problem:
> 5.x.1 - Maximum number of delivery attempts exceeded. [Default] 450-'4.7.25 
> Client host rejected: cannot find your hostname, [134.134.136.31]'
> "
> 
> I guess there were some problems with the Intel mail system last night;
> I should probably resend this patch set later.

Yes, lore also appears to be missing patches.  I grabbed the mbox off of KVM's
patchwork instance.



RE: [PATCH v9 00/36] x86: enable FRED for x86-64

2023-07-31 Thread Li, Xin3
> > This patch set enables the Intel flexible return and event delivery
> > (FRED) architecture for x86-64.
> 
> ...
> 
> > --
> > 2.34.1
> 
> What is this based on?

The tip tree master branch.

> FYI, you're using a version of git that will (mostly)
> automatically generate the base, e.g. I do
> 
>   git format-patch --base=HEAD~$nr ...
> 
> in my scripts, where $nr is the number of patches I am sending.  My specific
> approach requires HEAD~$nr to be a publicly visible object/commit, but that
> should be the case the vast majority of the time anyways.

Are you saying that you only got a subset of this patch set?

HPA told me he only got patches 0-25/36.

And I got several undeliverable email notifications, saying
"
The following message to  was undeliverable.
The reason for the problem:
5.x.1 - Maximum number of delivery attempts exceeded. [Default] 450-'4.7.25 
Client host rejected: cannot find your hostname, [134.134.136.31]'
"

I guess there were some problems with the Intel mail system last night;
I should probably resend this patch set later.



Re: [PATCH v9 00/36] x86: enable FRED for x86-64

2023-07-31 Thread Sean Christopherson
On Sun, Jul 30, 2023, Xin Li wrote:
> This patch set enables the Intel flexible return and event delivery
> (FRED) architecture for x86-64.

...

> -- 
> 2.34.1

What is this based on?  FYI, you're using a version of git that will (mostly)
automatically generate the base, e.g. I do 

  git format-patch --base=HEAD~$nr ...

in my scripts, where $nr is the number of patches I am sending.  My specific
approach requires HEAD~$nr to be a publicly visible object/commit, but that
should be the case the vast majority of the time anyways.



[ovmf test] 182090: all pass - PUSHED

2023-07-31 Thread osstest service owner
flight 182090 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/182090/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 677f2c6f1509da21258e02957b869b71b008fc61
baseline version:
 ovmf 70f3e62dc73d28962b833373246ef25c865c575e

Last test of basis   182084  2023-07-31 01:44:53 Z    0 days
Testing same since   182090  2023-07-31 12:44:11 Z    0 days    1 attempts


People who touched revisions under test:
  Ard Biesheuvel 
  Ard Biesheuvel  # Debian clang version 14.0.6
  Sunil V L 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   70f3e62dc7..677f2c6f15  677f2c6f1509da21258e02957b869b71b008fc61 -> xen-tested-master



Re: [PATCH v9 29/36] x86/fred: FRED entry/exit and dispatch code

2023-07-31 Thread H. Peter Anvin

On 7/30/23 23:41, Xin Li wrote:

+static DEFINE_FRED_HANDLER(fred_other_default)
+{
+   regs->vector = X86_TRAP_UD;
+   fred_emulate_fault(regs);
+}
+
+static DEFINE_FRED_HANDLER(fred_syscall)
+{
+   regs->orig_ax = regs->ax;
+   regs->ax = -ENOSYS;
+   do_syscall_64(regs, regs->orig_ax);
+}
+
+#if IS_ENABLED(CONFIG_IA32_EMULATION)
+/*
+ * Emulate SYSENTER if applicable. This is not the preferred system
+ * call in 32-bit mode under FRED, rather int $0x80 is preferred and
+ * exported in the vdso.
+ */
+static DEFINE_FRED_HANDLER(fred_sysenter)
+{
+   regs->orig_ax = regs->ax;
+   regs->ax = -ENOSYS;
+   do_fast_syscall_32(regs);
+}
+#else
+#define fred_sysenter fred_other_default
+#endif
+
+static DEFINE_FRED_HANDLER(fred_other)
+{
+   static const fred_handler user_other_handlers[FRED_NUM_OTHER_VECTORS] =
+   {
+   /*
+* Vector 0 of the other event type is not used
+* per FRED spec 5.0.
+*/
+   [0] = fred_other_default,
+   [FRED_SYSCALL]  = fred_syscall,
+   [FRED_SYSENTER] = fred_sysenter
+   };
+
+   user_other_handlers[regs->vector](regs);
+}


OK, this is wrong.

Dispatching like fred_syscall() is only valid for syscall64, which means 
you have to check regs->l is set in addition to the correct regs->vector 
to determine validity.


Similarly, sysenter is only valid if regs->l is clear.

The best way is probably to drop the dispatch table here and just do an 
if ... else if ... else statement; gcc is smart enough that it will 
combine the vector test and the L bit test into a single mask and 
compare. This also allows stubs to be inlined.


However, emulating #UD on events other than wrong mode of SYSCALL and 
SYSENTER may be a bad idea. It would probably be better to invoke 
fred_bad_event() in that case.


Something like this:

+static DEFINE_FRED_HANDLER(fred_other_default)
+{
+   regs->vector = X86_TRAP_UD;
+   fred_emulate_fault(regs);
+}

1) rename this to fred_emulate_ud (since that is what it actually does.)

... then ...

/* The compiler can fold these into a single test */

if (likely(regs->vector == FRED_SYSCALL && regs->l)) {
fred_syscall64(regs);
} else if (likely(regs->vector == FRED_SYSENTER && !regs->l)) {
fred_sysenter32(regs);
} else if (regs->vector == FRED_SYSCALL ||
   regs->vector == FRED_SYSENTER) {
/* Invalid SYSCALL or SYSENTER instruction */
fred_emulate_ud(regs);
} else {
/* Unknown event */
fred_bad_event(regs);
}

... or the SYSCALL64 and SYSENTER32 can be inlined with the appropriate 
comment (gcc will do so regardless.)


-hpa






Re: [PATCH v9 29/36] x86/fred: FRED entry/exit and dispatch code

2023-07-31 Thread H. Peter Anvin

On 7/30/23 23:41, Xin Li wrote:

+
+static DEFINE_FRED_HANDLER(fred_sw_interrupt_user)
+{
+   /*
+* In compat mode INT $0x80 (32bit system call) is
+* performance-critical. Handle it first.
+*/
+   if (IS_ENABLED(CONFIG_IA32_EMULATION) &&
+   likely(regs->vector == IA32_SYSCALL_VECTOR)) {
+   regs->orig_ax = regs->ax;
+   regs->ax = -ENOSYS;
+   return do_int80_syscall_32(regs);
+   }


We can presumably drop the early out here as well...


+
+   /*
+* Some software exceptions can also be triggered as
+* int instructions, for historical reasons.
+*/
+   switch (regs->vector) {
+   case X86_TRAP_BP:
+   case X86_TRAP_OF:
+   fred_emulate_trap(regs);
+   break;
+   default:
+   regs->vector = X86_TRAP_GP;
+   fred_emulate_fault(regs);
+   break;
+   }
+}
+





Re: ack needed [XEN PATCH v3] xen/sched: mechanical renaming to address MISRA C:2012 Rule 5.3

2023-07-31 Thread Stefano Stabellini
George, Dario,

Please ack


On Fri, 28 Jul 2023, Stefano Stabellini wrote:
> On Fri, 28 Jul 2023, Nicola Vetrini wrote:
> > Rule 5.3 has the following headline:
> > "An identifier declared in an inner scope shall not hide an
> > identifier declared in an outer scope"
> > 
> > The renaming s/sched_id/scheduler_id/ of the function defined in
> > 'xen/common/sched/core.c' prevents any hiding of that function
> > by the instances of homonymous function parameters that
> > are defined in inner scopes.
> > 
> > Similarly, the renames
> > - s/ops/operations/ for the static variable in 'xen/common/sched/core.c'
> > - s/do_softirq/needs_softirq/
> > are introduced for variables, to avoid any conflict with homonymous
> > parameters or function identifiers.
> > 
> > Moreover, the variable 'loop' defined at 'xen/common/sched/credit2.c:3887'
> > has been dropped, in favour of the homonymous variable declared in the
> > outer scope. This in turn requires a modification of the printk call that
> > involves it.
> > 
> > Signed-off-by: Nicola Vetrini 
> 
> Reviewed-by: Stefano Stabellini 
> 
> 
> > ---
> > Changes in v3:
> > - removed stray changes to address the remarks
> > Changes in v2:
> > - s/softirq/needs_softirq/
> > - Dropped local variable 'it'
> > - Renamed the 'ops' static variable instead of function parameters
> > in the idle scheduler for coherence.
> > 
> > Note: local variable 'j' in xen/common/sched/credit2.c:3812' should
> > probably be unsigned as well, but I saw while editing the patch
> > that it's used as a parameter to 'dump_pcpu', which takes an int.
> > Changing the types of parameters used in these calls is probably
> > a good target for another patch, as it's not relevant
> > w.r.t. Rule 5.3.
> > ---
> >  xen/common/sched/core.c| 28 ++--
> >  xen/common/sched/credit2.c |  6 +++---
> >  xen/common/sysctl.c|  2 +-
> >  xen/include/xen/sched.h|  2 +-
> >  4 files changed, 19 insertions(+), 19 deletions(-)
> > 
> > diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> > index 022f548652..12deefa745 100644
> > --- a/xen/common/sched/core.c
> > +++ b/xen/common/sched/core.c
> >  extern const struct scheduler *__start_schedulers_array[], *__end_schedulers_arr
> >  #define NUM_SCHEDULERS (__end_schedulers_array - __start_schedulers_array)
> >  #define schedulers __start_schedulers_array
> > 
> > -static struct scheduler __read_mostly ops;
> > +static struct scheduler __read_mostly operations;
> > 
> >  static bool scheduler_active;
> > 
> > @@ -171,7 +171,7 @@ static inline struct scheduler *dom_scheduler(const struct domain *d)
> >   * is the default scheduler that has been, choosen at boot.
> >   */
> >  ASSERT(is_idle_domain(d));
> > -return &ops;
> > +return &operations;
> >  }
> > 
> >  static inline struct scheduler *unit_scheduler(const struct sched_unit *unit)
> > @@ -2040,10 +2040,10 @@ long do_set_timer_op(s_time_t timeout)
> >  return 0;
> >  }
> > 
> > -/* sched_id - fetch ID of current scheduler */
> > -int sched_id(void)
> > +/* scheduler_id - fetch ID of current scheduler */
> > +int scheduler_id(void)
> >  {
> > -return ops.sched_id;
> > +return operations.sched_id;
> >  }
> > 
> >  /* Adjust scheduling parameter for a given domain. */
> > @@ -2579,7 +2579,7 @@ static void cf_check sched_slave(void)
> >  struct sched_unit*prev = vprev->sched_unit, *next;
> >  s_time_t  now;
> >  spinlock_t   *lock;
> > -bool  do_softirq = false;
> > +bool  needs_softirq = false;
> >  unsigned int  cpu = smp_processor_id();
> > 
> >  ASSERT_NOT_IN_ATOMIC();
> > @@ -2604,7 +2604,7 @@ static void cf_check sched_slave(void)
> >  return;
> >  }
> > 
> > -do_softirq = true;
> > +needs_softirq = true;
> >  }
> > 
> >  if ( !prev->rendezvous_in_cnt )
> > @@ -2614,7 +2614,7 @@ static void cf_check sched_slave(void)
> >  rcu_read_unlock(&sched_res_rculock);
> > 
> >  /* Check for failed forced context switch. */
> > -if ( do_softirq )
> > +if ( needs_softirq )
> >  raise_softirq(SCHEDULE_SOFTIRQ);
> > 
> >  return;
> > @@ -3016,14 +3016,14 @@ void __init scheduler_init(void)
> >  BUG_ON(!scheduler);
> >  printk("Using '%s' (%s)\n", scheduler->name, scheduler->opt_name);
> >  }
> > -ops = *scheduler;
> > +operations = *scheduler;
> > 
> >  if ( cpu_schedule_up(0) )
> >  BUG();
> >  register_cpu_notifier(&cpu_schedule_nfb);
> > 
> > -printk("Using scheduler: %s (%s)\n", ops.name, ops.opt_name);
> > -if ( sched_init(&ops) )
> > +printk("Using scheduler: %s (%s)\n", operations.name, operations.opt_name);
> > +if ( sched_init(&operations) )
> >  panic("scheduler returned error on init\n");
> > 
> >  if ( sched_ratelimit_us &&
> > @@ -3363,7 +3363,7 @@ int 

Re: [XEN PATCH 2/4] amd/iommu: rename functions to address MISRA C:2012 Rule 5.3

2023-07-31 Thread Stefano Stabellini
On Mon, 31 Jul 2023, Nicola Vetrini wrote:
> The functions 'machine_bfd' and 'guest_bfd' have gained the
> prefix 'get_' to avoid the mutual shadowing with the homonymous
> parameters in these functions.
> 
> Signed-off-by: Nicola Vetrini 

Reviewed-by: Stefano Stabellini 


> ---
>  xen/drivers/passthrough/amd/iommu_guest.c | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/amd/iommu_guest.c 
> b/xen/drivers/passthrough/amd/iommu_guest.c
> index 80a331f546..47a912126a 100644
> --- a/xen/drivers/passthrough/amd/iommu_guest.c
> +++ b/xen/drivers/passthrough/amd/iommu_guest.c
> @@ -38,12 +38,12 @@
>  (reg)->hi = (val) >> 32; \
>  } while (0)
>  
> -static unsigned int machine_bdf(struct domain *d, uint16_t guest_bdf)
> +static unsigned int get_machine_bdf(struct domain *d, uint16_t guest_bdf)
>  {
>  return guest_bdf;
>  }
>  
> -static uint16_t guest_bdf(struct domain *d, uint16_t machine_bdf)
> +static uint16_t get_guest_bdf(struct domain *d, uint16_t machine_bdf)
>  {
>  return machine_bdf;
>  }
> @@ -195,7 +195,7 @@ void guest_iommu_add_ppr_log(struct domain *d, u32 entry[])
>  log = map_domain_page(_mfn(mfn)) + (tail & ~PAGE_MASK);
>  
>  /* Convert physical device id back into virtual device id */
> -gdev_id = guest_bdf(d, iommu_get_devid_from_cmd(entry[0]));
> +gdev_id = get_guest_bdf(d, iommu_get_devid_from_cmd(entry[0]));
>  iommu_set_devid_to_cmd(&entry[0], gdev_id);
>  
>  memcpy(log, entry, sizeof(ppr_entry_t));
> @@ -245,7 +245,7 @@ void guest_iommu_add_event_log(struct domain *d, u32 entry[])
>  log = map_domain_page(_mfn(mfn)) + (tail & ~PAGE_MASK);
>  
>  /* re-write physical device id into virtual device id */
> -dev_id = guest_bdf(d, iommu_get_devid_from_cmd(entry[0]));
> +dev_id = get_guest_bdf(d, iommu_get_devid_from_cmd(entry[0]));
>  iommu_set_devid_to_cmd(&entry[0], dev_id);
>  memcpy(log, entry, sizeof(event_entry_t));
>  
> @@ -268,7 +268,7 @@ static int do_complete_ppr_request(struct domain *d, cmd_entry_t *cmd)
>  uint16_t dev_id;
>  struct amd_iommu *iommu;
>  
> -dev_id = machine_bdf(d, iommu_get_devid_from_cmd(cmd->data[0]));
> +dev_id = get_machine_bdf(d, iommu_get_devid_from_cmd(cmd->data[0]));
>  iommu = find_iommu_for_device(0, dev_id);
>  
>  if ( !iommu )
> @@ -320,7 +320,7 @@ static int do_invalidate_iotlb_pages(struct domain *d, cmd_entry_t *cmd)
>  struct amd_iommu *iommu;
>  uint16_t dev_id;
>  
> -dev_id = machine_bdf(d, iommu_get_devid_from_cmd(cmd->data[0]));
> +dev_id = get_machine_bdf(d, iommu_get_devid_from_cmd(cmd->data[0]));
>  
>  iommu = find_iommu_for_device(0, dev_id);
>  if ( !iommu )
> @@ -396,7 +396,7 @@ static int do_invalidate_dte(struct domain *d, cmd_entry_t *cmd)
>  
>  g_iommu = domain_iommu(d);
>  gbdf = iommu_get_devid_from_cmd(cmd->data[0]);
> -mbdf = machine_bdf(d, gbdf);
> +mbdf = get_machine_bdf(d, gbdf);
>  
>  /* Guest can only update DTEs for its passthru devices */
>  if ( mbdf == 0 || gbdf == 0 )
> -- 
> 2.34.1
> 



Re: [XEN PATCH 1/4] xen/pci: rename local variable to address MISRA C:2012 Rule 5.3

2023-07-31 Thread Stefano Stabellini
On Mon, 31 Jul 2023, Nicola Vetrini wrote:
> On 31/07/2023 16:16, Jan Beulich wrote:
> > On 31.07.2023 15:34, Nicola Vetrini wrote:
> > > --- a/xen/drivers/passthrough/pci.c
> > > +++ b/xen/drivers/passthrough/pci.c
> > > @@ -650,12 +650,12 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
> > >  struct pci_seg *pseg;
> > >  struct pci_dev *pdev;
> > >  unsigned int slot = PCI_SLOT(devfn), func = PCI_FUNC(devfn);
> > > -const char *pdev_type;
> > > +const char *pci_dev_type;
> > 
> > I've always been wondering what purpose the pdev_ prefix served here.
> > There's no other "type" variable in the function, so why make the name
> > longer? (I'm okay to adjust on commit, provided you agree.)
> > 
> > Jan
> 
> No objections.

I reviewed the patch and it is correct:

Reviewed-by: Stefano Stabellini 


Jan, feel free to pick any name you prefer on commit, e.g. "type".




Re: [XEN PATCH] xen/arm/IRQ: uniform irq_set_affinity() with x86 version

2023-07-31 Thread Stefano Stabellini
On Mon, 31 Jul 2023, Federico Serafini wrote:
> Change parameter name of irq_set_affinity() to uniform the function
> prototype with the one used by x86.
> 
> No functional changes.
> 
> Signed-off-by: Federico Serafini 

Reviewed-by: Stefano Stabellini 


> ---
>  xen/arch/arm/include/asm/irq.h | 2 +-
>  xen/arch/arm/irq.c | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/arm/include/asm/irq.h b/xen/arch/arm/include/asm/irq.h
> index 105b33b37d..c8044b0371 100644
> --- a/xen/arch/arm/include/asm/irq.h
> +++ b/xen/arch/arm/include/asm/irq.h
>  int platform_get_irq(const struct dt_device_node *device, int index);
>  
>  int platform_get_irq_byname(const struct dt_device_node *np, const char *name);
>  
> -void irq_set_affinity(struct irq_desc *desc, const cpumask_t *cpu_mask);
> +void irq_set_affinity(struct irq_desc *desc, const cpumask_t *mask);
>  
>  /*
>   * Use this helper in places that need to know whether the IRQ type is
> diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
> index 054bb281d8..09648db17a 100644
> --- a/xen/arch/arm/irq.c
> +++ b/xen/arch/arm/irq.c
> @@ -175,10 +175,10 @@ static inline struct domain *irq_get_domain(struct irq_desc *desc)
>  return irq_get_guest_info(desc)->d;
>  }
>  
> -void irq_set_affinity(struct irq_desc *desc, const cpumask_t *cpu_mask)
> +void irq_set_affinity(struct irq_desc *desc, const cpumask_t *mask)
>  {
>  if ( desc != NULL )
> -desc->handler->set_affinity(desc, cpu_mask);
> +desc->handler->set_affinity(desc, mask);
>  }
>  
>  int request_irq(unsigned int irq, unsigned int irqflags,
> -- 
> 2.34.1
> 



Re: [XEN PATCH] xen/sched: address violations of MISRA C:2012 Rules 8.2 and 8.3

2023-07-31 Thread Stefano Stabellini
On Mon, 31 Jul 2023, Federico Serafini wrote:
> Give a name to unnamed parameters to address violations of
> MISRA C:2012 Rule 8.2 ("Function types shall be in prototype form with
> named parameters").
> Keep consistency between parameter names and types used in function
> declarations and the ones used in the corresponding function
> definitions, thus addressing violations of MISRA C:2012 Rule 8.3
> ("All declarations of an object or function shall use the same names
> and type qualifiers").
> 
> No functional changes.
> 
> Signed-off-by: Federico Serafini 

Reviewed-by: Stefano Stabellini 


> ---
>  xen/common/sched/compat.c  |  2 +-
>  xen/common/sched/credit2.c |  3 ++-
>  xen/common/sched/private.h |  2 +-
>  xen/include/xen/sched.h| 10 +-
>  4 files changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/common/sched/compat.c b/xen/common/sched/compat.c
> index 040b4caca2..a596e3a226 100644
> --- a/xen/common/sched/compat.c
> +++ b/xen/common/sched/compat.c
> @@ -39,7 +39,7 @@ static int compat_poll(struct compat_sched_poll *compat)
>  
>  #include "core.c"
>  
> -int compat_set_timer_op(u32 lo, s32 hi)
> +int compat_set_timer_op(uint32_t lo, int32_t hi)
>  {
>  return do_set_timer_op(((s64)hi << 32) | lo);
>  }
> diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
> index 87a1e31ee9..7e23fabebb 100644
> --- a/xen/common/sched/credit2.c
> +++ b/xen/common/sched/credit2.c
> @@ -1480,7 +1480,8 @@ static inline void runq_remove(struct csched2_unit *svc)
>  list_del_init(>runq_elem);
>  }
>  
> -void burn_credits(struct csched2_runqueue_data *rqd, struct csched2_unit *, s_time_t);
> +void burn_credits(struct csched2_runqueue_data *rqd, struct csched2_unit *svc,
> +  s_time_t now);
>  
>  static inline void
>  tickle_cpu(unsigned int cpu, struct csched2_runqueue_data *rqd)
> diff --git a/xen/common/sched/private.h b/xen/common/sched/private.h
> index 0527a8c70d..c516976c37 100644
> --- a/xen/common/sched/private.h
> +++ b/xen/common/sched/private.h
> @@ -629,7 +629,7 @@ int cpu_disable_scheduler(unsigned int cpu);
>  int schedule_cpu_add(unsigned int cpu, struct cpupool *c);
>  struct cpu_rm_data *alloc_cpu_rm_data(unsigned int cpu, bool aff_alloc);
>  void free_cpu_rm_data(struct cpu_rm_data *mem, unsigned int cpu);
> -int schedule_cpu_rm(unsigned int cpu, struct cpu_rm_data *mem);
> +int schedule_cpu_rm(unsigned int cpu, struct cpu_rm_data *data);
>  int sched_move_domain(struct domain *d, struct cpupool *c);
>  void sched_migrate_timers(unsigned int cpu);
>  struct cpupool *cpupool_get_by_id(unsigned int poolid);
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index 854f3e32c0..5be61bb252 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -789,8 +789,8 @@ int  sched_init_vcpu(struct vcpu *v);
>  void sched_destroy_vcpu(struct vcpu *v);
>  int  sched_init_domain(struct domain *d, unsigned int poolid);
>  void sched_destroy_domain(struct domain *d);
> -long sched_adjust(struct domain *, struct xen_domctl_scheduler_op *);
> -long sched_adjust_global(struct xen_sysctl_scheduler_op *);
> +long sched_adjust(struct domain *d, struct xen_domctl_scheduler_op *op);
> +long sched_adjust_global(struct xen_sysctl_scheduler_op *op);
>  int  sched_id(void);
>  
>  /*
> @@ -831,11 +831,11 @@ void context_switch(
>  
>  /*
>   * As described above, context_switch() must call this function when the
> - * local CPU is no longer running in @prev's context, and @prev's context is
> + * local CPU is no longer running in @vprev's context, and @vprev's context is
>   * saved to memory. Alternatively, if implementing lazy context switching,
> - * ensure that invoking sync_vcpu_execstate() will switch and commit @prev.
> + * ensure that invoking sync_vcpu_execstate() will switch and commit @vprev.
>   */
> -void sched_context_switched(struct vcpu *prev, struct vcpu *vnext);
> +void sched_context_switched(struct vcpu *vprev, struct vcpu *vnext);
>  
>  /* Called by the scheduler to continue running the current VCPU. */
>  void continue_running(
> -- 
> 2.34.1
> 



Re: [PATCH] x86/HVM: tidy _hvm_load_entry() for style

2023-07-31 Thread Stefano Stabellini
On Mon, 31 Jul 2023, Jan Beulich wrote:
> The primary goal is to eliminate the Misra-non-compliance of "desc"
> shadowing at least the local variable in hvm_load(). Suffix both local
> variables with underscores, while also
> - dropping leading underscores from parameter names (applying this also
>   to the two wrapper macros),
> - correcting indentation,
> - correcting brace placement,
> - dropping unnecessary parentheses around parameter uses when those are
>   passed on as plain arguments.

you might want (or not want) to mention the s/1/true/ and s/0/false/


> No functional change intended.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Stefano Stabellini 


> --- a/xen/arch/x86/include/asm/hvm/save.h
> +++ b/xen/arch/x86/include/asm/hvm/save.h
> @@ -47,30 +47,32 @@ void _hvm_read_entry(struct hvm_domain_c
>   * Unmarshalling: check, then copy. Evaluates to zero on success. This load
>   * function requires the save entry to be the same size as the dest structure.
>   */
> -#define _hvm_load_entry(_x, _h, _dst, _strict) ({   \
> -int r;  \
> -struct hvm_save_descriptor *desc\
> -= (struct hvm_save_descriptor *)&(_h)->data[(_h)->cur]; \
> -if ( (r = _hvm_check_entry((_h), HVM_SAVE_CODE(_x), \
> -   HVM_SAVE_LENGTH(_x), (_strict))) == 0 )  \
> +#define _hvm_load_entry(x, h, dst, strict) ({   \
> +int r_; \
> +struct hvm_save_descriptor *desc_   \
> += (struct hvm_save_descriptor *)&(h)->data[(h)->cur];   \
> +if ( (r_ = _hvm_check_entry(h, HVM_SAVE_CODE(x),\
> +HVM_SAVE_LENGTH(x), strict)) == 0 ) \
>  {   \
> -_hvm_read_entry((_h), (_dst), HVM_SAVE_LENGTH(_x)); \
> -if ( HVM_SAVE_HAS_COMPAT(_x) && \
> - desc->length != HVM_SAVE_LENGTH(_x) )  \
> -r = HVM_SAVE_FIX_COMPAT(_x, (_dst), desc->length);  \
> +_hvm_read_entry(h, dst, HVM_SAVE_LENGTH(x));\
> +if ( HVM_SAVE_HAS_COMPAT(x) &&  \
> + desc_->length != HVM_SAVE_LENGTH(x) )  \
> +r_ = HVM_SAVE_FIX_COMPAT(x, dst, desc_->length);\
>  }   \
> -else if (HVM_SAVE_HAS_COMPAT(_x)\
> - && (r = _hvm_check_entry((_h), HVM_SAVE_CODE(_x),  \
> -   HVM_SAVE_LENGTH_COMPAT(_x), (_strict))) == 0 ) { \
> -_hvm_read_entry((_h), (_dst), HVM_SAVE_LENGTH_COMPAT(_x));  \
> -r = HVM_SAVE_FIX_COMPAT(_x, (_dst), desc->length);  \
> +else if (HVM_SAVE_HAS_COMPAT(x) \
> + && (r_ = _hvm_check_entry(h, HVM_SAVE_CODE(x), \
> +   HVM_SAVE_LENGTH_COMPAT(x),   \
> +   strict)) == 0 )  \
> +{   \
> +_hvm_read_entry(h, dst, HVM_SAVE_LENGTH_COMPAT(x)); \
> +r_ = HVM_SAVE_FIX_COMPAT(x, dst, desc_->length);\
>  }   \
> -r; })
> +r_; })
>  
> -#define hvm_load_entry(_x, _h, _dst)\
> -_hvm_load_entry(_x, _h, _dst, 1)
> -#define hvm_load_entry_zeroextend(_x, _h, _dst) \
> -_hvm_load_entry(_x, _h, _dst, 0)
> +#define hvm_load_entry(x, h, dst)\
> +_hvm_load_entry(x, h, dst, true)
> +#define hvm_load_entry_zeroextend(x, h, dst) \
> +_hvm_load_entry(x, h, dst, false)
>  
>  /* Unmarshalling: what is the instance ID of the next entry? */
>  static inline unsigned int hvm_load_instance(const struct hvm_domain_context *h)
> 



Re: [PATCH v6 0/9] Allow dynamic allocation of software IO TLB bounce buffers

2023-07-31 Thread Petr Tesařík
On Mon, 31 Jul 2023 18:04:09 +0200,
Christoph Hellwig  wrote:

> I was just going to apply this, but patch 1 seems to have a non-trivial
> conflict with the is_swiotlb_active removal in pci-dma.c.  Can you resend
> against the current dma-mapping for-next tree?

Sure thing, will re-send tomorrow morning.

Petr T



Re: [PATCH 00/24] ALSA: Generic PCM copy ops using sockptr_t

2023-07-31 Thread Mark Brown
On Mon, Jul 31, 2023 at 09:30:29PM +0200, Takashi Iwai wrote:
> Mark Brown wrote:

> > It really feels like we ought to rename, or add an alias for, the type
> > if we're going to start using it more widely - it's not helping to make
> > the code clearer.

> That was my very first impression, too, but I changed my mind after
> seeing the already used code.  An alias might work, either typedef or
> define genptr_t or such as sockptr_t.  But we'll need to copy the
> bunch of helper functions, too...

I would predict that if the type becomes more widely used that'll happen
eventually and the longer it's left the more work it'll be.


signature.asc
Description: PGP signature


Re: [PATCH 00/24] ALSA: Generic PCM copy ops using sockptr_t

2023-07-31 Thread Takashi Iwai
On Mon, 31 Jul 2023 19:20:54 +0200,
Mark Brown wrote:
> 
> On Mon, Jul 31, 2023 at 05:46:54PM +0200, Takashi Iwai wrote:
> 
> > this is a patch set to clean up the PCM copy ops using sockptr_t as a
> > "universal" pointer, inspired by the recent patch from Andy
> > Shevchenko:
> >   
> > https://lore.kernel.org/r/20230721100146.67293-1-andriy.shevche...@linux.intel.com
> 
> > Even though it sounds a bit weird, sockptr_t is a generic type that is
> > used already in wide ranges, and it can fit our purpose, too.  With
> > sockptr_t, the former split of copy_user and copy_kernel PCM ops can
> > be unified again gracefully.
> 
> It really feels like we ought to rename, or add an alias for, the type
> if we're going to start using it more widely - it's not helping to make
> the code clearer.

That was my very first impression, too, but I changed my mind after
seeing the already used code.  An alias might work, either typedef or
define genptr_t or such as sockptr_t.  But we'll need to copy the
bunch of helper functions, too...


Takashi
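
Stepping back from the thread for a moment, the idea being debated is easy to sketch outside the kernel. The snippet below is not the kernel's actual sockptr_t definition; `genptr_t` is the hypothetical alias name floated above, and the user-copy branch is stubbed with memcpy() since copy_from_user() does not exist in userspace:

```c
#include <stdbool.h>
#include <string.h>

/* A pointer tagged with the address space it refers to, so a single
 * copy helper can replace separate copy_user / copy_kernel callbacks. */
typedef struct {
    void *ptr;
    bool is_kernel;
} genptr_t;

static genptr_t kernel_genptr(void *p)
{
    genptr_t gp = { .ptr = p, .is_kernel = true };
    return gp;
}

static int copy_from_genptr(void *dst, genptr_t src, size_t n)
{
    if (src.is_kernel) {
        memcpy(dst, src.ptr, n);
        return 0;
    }
    /* In the kernel this branch would be copy_from_user(). */
    memcpy(dst, src.ptr, n);
    return 0;
}
```

A PCM copy op taking such a tagged pointer works identically for in-kernel and userspace buffers, which is the unification the cover letter describes.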



[xen-unstable-smoke test] 182091: tolerable all pass - PUSHED

2023-07-31 Thread osstest service owner
flight 182091 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/182091/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  c0dd53b8cbd1e47e9c89873a9265a7170bdc6b4c
baseline version:
 xen  fff3c99f84589a876fcd8467ea99f2c8d9ff8d21

Last test of basis   182066  2023-07-29 02:00:29 Z2 days
Testing same since   182091  2023-07-31 13:00:28 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   fff3c99f84..c0dd53b8cb  c0dd53b8cbd1e47e9c89873a9265a7170bdc6b4c -> smoke



Re: [PATCH 3/5] xen/ppc: Add OPAL API definition header file

2023-07-31 Thread Shawn Anastasio
On 7/31/23 10:59 AM, Jan Beulich wrote:
> On 28.07.2023 23:35, Shawn Anastasio wrote:
>> OPAL (OpenPower Abstraction Layer) is the interface exposed by firmware
>> on PowerNV (bare metal) systems. Import Linux's header defining the
>> API and related information.
> 
> To help future updating, mentioning version (or commit) at which this
> snapshot was taken would be helpful.

Sounds reasonable. I'll reference the Linux commit that this was pulled
from in the commit message.

> Jan

Thanks,
Shawn



Re: [PATCH 2/5] xen/ppc: Switch to medium PIC code model

2023-07-31 Thread Shawn Anastasio
On 7/31/23 10:58 AM, Jan Beulich wrote:
> On 28.07.2023 23:35, Shawn Anastasio wrote:
>> --- a/xen/arch/ppc/ppc64/head.S
>> +++ b/xen/arch/ppc/ppc64/head.S
>> @@ -1,9 +1,11 @@
>>  /* SPDX-License-Identifier: GPL-2.0-or-later */
>>  
>>  #include 
>> +#include 
>>  
>>  .section .text.header, "ax", %progbits
>>  
>> +
>>  ENTRY(start)
> 
> Nit: Stray change?
> 
>> @@ -11,16 +13,19 @@ ENTRY(start)
>>  FIXUP_ENDIAN
>>  
>>  /* set up the TOC pointer */
>> -LOAD_IMM32(%r2, .TOC.)
>> +bcl 20, 31, .+4
> 
> Could you use a label name instead of .+4? Aiui you really mean
> 
>> +1:  mflr%r12
> 
> ... "1f" there?

Yes, good point. I'll point out that this form of the `bcl` instruction
is specifically defined in the ISA specification as the recommended
way to obtain the address of the next instruction, and hardware
implementations presumably optimize it. Using a label instead of +4
would of course be fine as long as the label immediately follows the
bcl, but if the label was elsewhere then the optimization that the ISA
allows for this specific instruction might not be hit. Just something
that should be kept in mind in case this code is ever refactored.

I'll change it to 1f in v2.

> 
> Jan

Thanks,
Shawn



Re: [PATCH 1/5] xen/lib: Move simple_strtoul from common/vsprintf.c to lib

2023-07-31 Thread Shawn Anastasio
On 7/31/23 10:52 AM, Jan Beulich wrote:
> On 28.07.2023 23:35, Shawn Anastasio wrote:
>> Move the simple_strtoul routine which is used throughout the codebase
>> from vsprintf.c to its own file in xen/lib.
>>
>> This allows libfdt to be built on ppc64 even though xen/common doesn't
>> build yet.
>>
>> Signed-off-by: Shawn Anastasio 
>> ---
>>  xen/common/vsprintf.c| 37 -
>>  xen/lib/Makefile |  1 +
>>  xen/lib/simple_strtoul.c | 40 
>>  3 files changed, 41 insertions(+), 37 deletions(-)
>>  create mode 100644 xen/lib/simple_strtoul.c
> 
> What about its siblings? It'll be irritating to find one here and the
> other there.

I was debating whether to do this or not and ultimately decided to only
make the minimum changes that were required right now. I can go ahead
and make the change for its siblings as well.

> Also please no underscores in (new) filenames unless there's a reason
> for this. In the case here, though, I question the need for "simple"
> in the file name in the first place.

From a look at the other files in xen/lib there seemed to be a
convention of naming files after the exact function they implement.
Would you rather I rename it to just strtoul.c? Or simple-strtoul.c?

>> --- /dev/null
>> +++ b/xen/lib/simple_strtoul.c
>> @@ -0,0 +1,40 @@
>> +/*
>> + *  Copyright (C) 1991, 1992  Linus Torvalds
>> + */
>> +
>> +#include 
>> +
>> +/**
>> + * simple_strtoul - convert a string to an unsigned long
>> + * @cp: The start of the string
>> + * @endp: A pointer to the end of the parsed string will be placed here
>> + * @base: The number base to use
>> + */
>> +unsigned long simple_strtoul(
>> +const char *cp, const char **endp, unsigned int base)
>> +{
>> +unsigned long result = 0,value;
>> +
>> +if (!base) {
>> +base = 10;
>> +if (*cp == '0') {
>> +base = 8;
>> +cp++;
>> +if ((toupper(*cp) == 'X') && isxdigit(cp[1])) {
>> +cp++;
>> +base = 16;
>> +}
>> +}
>> +} else if (base == 16) {
>> +if (cp[0] == '0' && toupper(cp[1]) == 'X')
>> +cp += 2;
>> +}
>> +while (isxdigit(*cp) &&
>> +   (value = isdigit(*cp) ? *cp-'0' : toupper(*cp)-'A'+10) < base) {
>> +result = result*base + value;
>> +cp++;
>> +}
>> +if (endp)
>> +*endp = cp;
>> +return result;
>> +}
> 
> While moving, I think it would be nice if this stopped using neither
> Xen nor Linux style. I'm not going to insist, but doing such style
> adjustments right here would be quite nice.

Especially if I'm going to be moving its siblings, I'd rather just copy
the functions verbatim for this patch, if that's acceptable.

> Jan

Thanks,
Shawn
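
As an aside for anyone wanting to sanity-check the routine while moving it: its base auto-detection is easy to exercise in a standalone build. The body below is the one from the patch, with only the include swapped so it compiles outside Xen:

```c
#include <ctype.h>

/* Copied from the patch above; <xen/kernel.h> replaced by <ctype.h>
 * so the example builds standalone. */
static unsigned long simple_strtoul(
    const char *cp, const char **endp, unsigned int base)
{
    unsigned long result = 0, value;

    if (!base) {
        base = 10;
        if (*cp == '0') {
            base = 8;
            cp++;
            if ((toupper(*cp) == 'X') && isxdigit(cp[1])) {
                cp++;
                base = 16;
            }
        }
    } else if (base == 16) {
        if (cp[0] == '0' && toupper(cp[1]) == 'X')
            cp += 2;
    }
    while (isxdigit(*cp) &&
           (value = isdigit(*cp) ? *cp - '0' : toupper(*cp) - 'A' + 10) < base) {
        result = result * base + value;
        cp++;
    }
    if (endp)
        *endp = cp;
    return result;
}
```

With base 0 the prefix selects the radix: "0x1A" parses as hex (26), "0755" as octal (493), "42" as decimal.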



Re: [PATCH v2 5/5] pdx: Add CONFIG_HAS_PDX_COMPRESSION as a common Kconfig option

2023-07-31 Thread Andrew Cooper
On 31/07/2023 9:00 am, Jan Beulich wrote:
> On 28.07.2023 18:58, Andrew Cooper wrote:
>> On 28/07/2023 5:36 pm, Andrew Cooper wrote:
>>> On 28/07/2023 8:59 am, Alejandro Vallejo wrote:
 Adds a new compile-time flag to allow disabling pdx compression and
 compiles out compression-related code/data. It also shorts the pdx<->pfn
 conversion macros and creates stubs for masking functions.

 While at it, removes the old arch-defined CONFIG_HAS_PDX flag, as it was
 not removable in practice.

 Signed-off-by: Alejandro Vallejo 
 ---
 v2:
   * Merged v1/patch2: Removal of CONFIG_HAS_PDX here (Jan)
>>> This series is now looking fine, except for the Kconfig aspect.
>>>
>>> This is not something any user or developer should ever be queried
>>> about.  The feedback on the documentation patches alone show that it's
>>> not understood well by the maintainers, even if the principle is accepted.
>>>
>>> There is never any reason to have this active on x86.
> We can of course continue to disagree here. At least with EXPERT=y
> selecting this option ought to remain possible for x86. Whether or
> not the original systems this scheme was developed for ever went
> public, such systems did (do) exist, and hence running Xen sensibly
> on them (without losing all memory except that on node 0) ought to
> be possible.

There's one system which never made its way into production,
support-for-which in the no-op case is causing a 10% perf hit in at
least one metric, 17% in another.

... and your seriously arguing that we should continue to take this perf
hit?

It is very likely that said machine(s) aren't even powered on these
days, and even if it is on, the vendor can take the overhead of turning
PDX compression on until such time as they make a production system.

Furthermore, it is unrealistic to think that such a machine will ever
make its way into production.  Linux has never had PDX compression, and
by-and-large if it doesn't run Linux, you can't sell it in the first place.


It is utterly unreasonable to be carrying this overhead in the first
place.  PDX compression *should not* have been committed on-by-default
in the first place.  (Yes, I know there was no Kconfig back then, and
the review process was non-existent, but someone really should have said
no).

It is equally unreasonable to offer people (under Expert or not) an
ability to shoot themselves in the foot like this.


If in the very unlikely case that such a system does come into
existence, we can consider re-enabling PDX compression (and by that, I
mean doing it in a less invasive way in the common case), but there's
very little chance this will ever be a path we need to take.

>>   Indeed, Julien's
>>> quick metric shows how much performance we waste by having it enabled.
>> Further to this, bloat-o-meter says net -30k of code and there are
>> plenty of fastpaths getting a several cacheline reduction from this.
> A similar reduction was achieved

Really?  You think that replacing the innermost shift and masking with
an alternative that has a shorter instruction sequence gets you the same
net reduction in code?

I do not find that claim credible.

>  by the BMI2-alt-patching series I
> had put together, yet you weren't willing to come to consensus on
> it.

You have AMD machines, and your patch was alleged to be a performance
improvement.  So the fact you didn't spot the problems with PEXT/PDEP on
all AMD hardware prior to Fam19h suggests there was insufficient testing
for an alleged performance improvement.

The patch you posted:

1) Added extra complexity via alternatives, and
2) Reduced performance on AMD systems prior to Fam19h.

in an area of code which is useless on all shipping x86 systems.

You literally micro-anti-optimised a no-op path to a more expensive (on
one vendor at least) no-op path, claiming it to be a performance
improvement.

There is no possible way any form of your patch can ever beat
Alejandro's work of just compiling-out the useless logic wholesale.

~Andrew
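
(Aside, for readers following the thread: the conversion being argued about is essentially a mask-and-shift pair that squeezes a run of constant-zero address bits out of the frame-number space, so frame tables need not cover the holes of a sparse memory map. A rough standalone sketch, with made-up mask/shift values rather than the boot-time-derived ones Xen actually computes:)

```c
/* Illustrative parameters: suppose bits 16-19 of every valid PFN are
 * zero, so the PDX space can drop them.  Xen derives the equivalents
 * of these at boot from the real RAM banks; the values here are
 * invented for the example. */
#define BOTTOM_MASK 0xffffUL  /* low bits kept in place */
#define HOLE_SHIFT  4         /* width of the all-zero run */

static unsigned long pfn_to_pdx(unsigned long pfn)
{
    /* Keep the low bits, slide everything above the hole down. */
    return (pfn & BOTTOM_MASK) | ((pfn & ~BOTTOM_MASK) >> HOLE_SHIFT);
}

static unsigned long pdx_to_pfn(unsigned long pdx)
{
    /* Inverse: slide the upper bits back up over the hole. */
    return (pdx & BOTTOM_MASK) | ((pdx & ~BOTTOM_MASK) << HOLE_SHIFT);
}
```

The shift and mask run on every conversion even when the hole is empty, which is the no-op overhead the series compiles out.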



Re: [PATCH 00/24] ALSA: Generic PCM copy ops using sockptr_t

2023-07-31 Thread Mark Brown
On Mon, Jul 31, 2023 at 05:46:54PM +0200, Takashi Iwai wrote:

> this is a patch set to clean up the PCM copy ops using sockptr_t as a
> "universal" pointer, inspired by the recent patch from Andy
> Shevchenko:
>   
> https://lore.kernel.org/r/20230721100146.67293-1-andriy.shevche...@linux.intel.com

> Even though it sounds a bit weird, sockptr_t is a generic type that is
> used already in wide ranges, and it can fit our purpose, too.  With
> sockptr_t, the former split of copy_user and copy_kernel PCM ops can
> be unified again gracefully.

It really feels like we ought to rename, or add an alias for, the type
if we're going to start using it more widely - it's not helping to make
the code clearer.


signature.asc
Description: PGP signature


Re: Python in Domain Configurations

2023-07-31 Thread Ian Jackson
Elliott Mitchell writes ("Re: Python in Domain Configurations"):
> On Mon, Jul 31, 2023 at 05:59:55AM +0200, Marek Marczykowski-Górecki wrote:
> > So, IMHO reducing config file from a full python (like it used to be in
> > xend times) into a static file with well defined syntax was an
> > improvement. Lets not go backward.

I'm no longer working on this codebase, but since I've been CC'd:

I was one of the people who replaced the Python-based config parsing
with the current arrangements.  We didn't just do this because we were
replacing xend (whose use of Python as implementation language made it
appear convenient to just read and execute the configs as Python
code).

We did it for the reasons Marek gives.  It's true that the existing
format is not as well specified as it could be.  It was intended as a
plausible subset of Python literal syntax.  We chose that syntax to
preserve compatibility with the vast majority of existing config files
and to provide something familiar.  (And it seems we did achieve those
goals.)

The disk configuration syntax is particularly warty, but we inherited
much of that from the Python version.

If we had a free choice today, I might advocate for TOML.  But I don't
see any value in changing the concrete syntax now.

> > As for your original problem, IIUC you would like to add some data that
> > would _not_ be interpreted by libxl, right? For that you can use
> > comments with some specific marker for your script. This approach used
> > to work well for SysV init script, and in fact for a very similar use case
> > (ordering and dependencies, among other things).
> 
> That is /not/ the issue.  `xl` simply ignores any variables which it
> doesn't interpret (this is in fact a Bad Thing).

I forget, but isn't there some kind of scheme for warning about
unrecognised configuration options?

>  I need to know what the limits to the syntax are.

I agree that it's not great that the syntax is not 100% documented.
The parser is in
  tools/libs/util/libxlu_cfg_y.y
  tools/libs/util/libxlu_cfg_l.l
I'm sure patches to improve the docs would be welcome.

Note that it is still a *subset* of Python, so if you wish to use a
Python interpreter to parse it in your own tooling, you're very
welcome to do so.

> Notice how many init scripts do `. /etc/default/` to load
> configuration?  I'm thinking it would be very handy to use a similar
> technique to load domain.cfg files, with Python being the interpreter.

I don't think this is a good idea.  Both because I don't think the
functionality available in a Python interpreter should be available in
the libxl configuration, and because Python is a large and complex
dependency which we don't want to pull in here.

> I also think some portions of the domain.cfg format might work better
> with full Python syntax.  For example might it be handier to allow:
> 
> disk = [
>   {
>   'vdev': 'xvda',
>   'format': 'raw',
>   'access': 'rw',
>   'target': '/dev/disk/by-path/foo-bar-baz',
>   },
> ]

I agree that something like this would be nice.  I don't think it
should be done by importing Python.  These two files - the main part
of the existing parser - are only 183 loc including comments.
Extending it (and the support code in libxlu_cfg.c) to do dictionaries
as well as lists doesn't seem like it would make it too much bigger.
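
To illustrate how small that surface is, the common scalar assignments can even be recognized with a couple of sscanf() patterns. This is a toy sketch only, not the real grammar (libxlu also handles lists, escapes, comments, and continuation rules):

```c
#include <stdio.h>

/* Classify one "key = value" line of an xl-config-like subset.
 * Returns 1 for a quoted string value, 0 for an integer, -1 otherwise.
 * Purely illustrative; the real parser is libxlu_cfg_y.y / libxlu_cfg_l.l. */
static int parse_scalar(const char *line, char *key, long *num, char *str)
{
    /* Try `key = "string"` first; the literal quote rejects integers. */
    if (sscanf(line, " %63[A-Za-z_0-9] = \"%255[^\"]\"", key, str) == 2)
        return 1;
    if (sscanf(line, " %63[A-Za-z_0-9] = %ld", key, num) == 2)
        return 0;
    return -1;
}
```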

Thanks,
Ian.

-- 
Ian JacksonThese opinions are my own.  

Pronouns: they/he.  If I emailed you from @fyvzl.net or @evade.org.uk,
that is a private address which bypasses my fierce spamfilter.



[PATCH mm-unstable v8 17/31] arm: Convert various functions to use ptdescs

2023-07-31 Thread Vishal Moola (Oracle)
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.

late_alloc() also uses the __get_free_pages() helper function. Convert
this to use pagetable_alloc() and ptdesc_address() instead to help
standardize page tables further.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Mike Rapoport (IBM) 
---
 arch/arm/include/asm/tlb.h | 12 +++-
 arch/arm/mm/mmu.c  |  7 ---
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
index b8cbe03ad260..f40d06ad5d2a 100644
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -39,7 +39,9 @@ static inline void __tlb_remove_table(void *_table)
 static inline void
 __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, unsigned long addr)
 {
-   pgtable_pte_page_dtor(pte);
+   struct ptdesc *ptdesc = page_ptdesc(pte);
+
+   pagetable_pte_dtor(ptdesc);
 
 #ifndef CONFIG_ARM_LPAE
/*
@@ -50,17 +52,17 @@ __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, 
unsigned long addr)
__tlb_adjust_range(tlb, addr - PAGE_SIZE, 2 * PAGE_SIZE);
 #endif
 
-   tlb_remove_table(tlb, pte);
+   tlb_remove_ptdesc(tlb, ptdesc);
 }
 
 static inline void
 __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp, unsigned long addr)
 {
 #ifdef CONFIG_ARM_LPAE
-   struct page *page = virt_to_page(pmdp);
+   struct ptdesc *ptdesc = virt_to_ptdesc(pmdp);
 
-   pgtable_pmd_page_dtor(page);
-   tlb_remove_table(tlb, page);
+   pagetable_pmd_dtor(ptdesc);
+   tlb_remove_ptdesc(tlb, ptdesc);
 #endif
 }
 
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 13fc4bb5f792..fdeaee30d167 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -737,11 +737,12 @@ static void __init *early_alloc(unsigned long sz)
 
 static void *__init late_alloc(unsigned long sz)
 {
-   void *ptr = (void *)__get_free_pages(GFP_PGTABLE_KERNEL, get_order(sz));
+   void *ptdesc = pagetable_alloc(GFP_PGTABLE_KERNEL & ~__GFP_HIGHMEM,
+   get_order(sz));
 
-   if (!ptr || !pgtable_pte_page_ctor(virt_to_page(ptr)))
+   if (!ptdesc || !pagetable_pte_ctor(ptdesc))
BUG();
-   return ptr;
+   return ptdesc_to_virt(ptdesc);
 }
 
 static pte_t * __init arm_pte_alloc(pmd_t *pmd, unsigned long addr,
-- 
2.40.1




[PATCH mm-unstable v8 30/31] um: Convert {pmd, pte}_free_tlb() to use ptdescs

2023-07-31 Thread Vishal Moola (Oracle)
Part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents. Also cleans up some spacing issues.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Mike Rapoport (IBM) 
---
 arch/um/include/asm/pgalloc.h | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/um/include/asm/pgalloc.h b/arch/um/include/asm/pgalloc.h
index 8ec7cd46dd96..de5e31c64793 100644
--- a/arch/um/include/asm/pgalloc.h
+++ b/arch/um/include/asm/pgalloc.h
@@ -25,19 +25,19 @@
  */
 extern pgd_t *pgd_alloc(struct mm_struct *);
 
-#define __pte_free_tlb(tlb,pte, address)   \
-do {   \
-   pgtable_pte_page_dtor(pte); \
-   tlb_remove_page((tlb),(pte));   \
+#define __pte_free_tlb(tlb, pte, address)  \
+do {   \
+   pagetable_pte_dtor(page_ptdesc(pte));   \
+   tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte)));  \
 } while (0)
 
 #ifdef CONFIG_3_LEVEL_PGTABLES
 
-#define __pmd_free_tlb(tlb, pmd, address)  \
-do {   \
-   pgtable_pmd_page_dtor(virt_to_page(pmd));   \
-   tlb_remove_page((tlb),virt_to_page(pmd));   \
-} while (0)\
+#define __pmd_free_tlb(tlb, pmd, address)  \
+do {   \
+   pagetable_pmd_dtor(virt_to_ptdesc(pmd));\
+   tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pmd)); \
+} while (0)
 
 #endif
 
-- 
2.40.1




[PATCH mm-unstable v8 31/31] mm: Remove pgtable_{pmd, pte}_page_{ctor, dtor}() wrappers

2023-07-31 Thread Vishal Moola (Oracle)
These functions are no longer necessary. Remove them and cleanup
Documentation referencing them.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Mike Rapoport (IBM) 
---
 Documentation/mm/split_page_table_lock.rst| 12 +--
 .../zh_CN/mm/split_page_table_lock.rst| 14 ++---
 include/linux/mm.h| 20 ---
 3 files changed, 13 insertions(+), 33 deletions(-)

diff --git a/Documentation/mm/split_page_table_lock.rst b/Documentation/mm/split_page_table_lock.rst
index a834fad9de12..e4f6972eb6c0 100644
--- a/Documentation/mm/split_page_table_lock.rst
+++ b/Documentation/mm/split_page_table_lock.rst
@@ -58,7 +58,7 @@ Support of split page table lock by an architecture
 ===
 
 There's no need in special enabling of PTE split page table lock: everything
-required is done by pgtable_pte_page_ctor() and pgtable_pte_page_dtor(), which
+required is done by pagetable_pte_ctor() and pagetable_pte_dtor(), which
 must be called on PTE table allocation / freeing.
 
 Make sure the architecture doesn't use slab allocator for page table
@@ -68,8 +68,8 @@ This field shares storage with page->ptl.
 PMD split lock only makes sense if you have more than two page table
 levels.
 
-PMD split lock enabling requires pgtable_pmd_page_ctor() call on PMD table
-allocation and pgtable_pmd_page_dtor() on freeing.
+PMD split lock enabling requires pagetable_pmd_ctor() call on PMD table
+allocation and pagetable_pmd_dtor() on freeing.
 
 Allocation usually happens in pmd_alloc_one(), freeing in pmd_free() and
 pmd_free_tlb(), but make sure you cover all PMD table allocation / freeing
@@ -77,7 +77,7 @@ paths: i.e X86_PAE preallocate few PMDs on pgd_alloc().
 
 With everything in place you can set CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK.
 
-NOTE: pgtable_pte_page_ctor() and pgtable_pmd_page_ctor() can fail -- it must
+NOTE: pagetable_pte_ctor() and pagetable_pmd_ctor() can fail -- it must
 be handled properly.
 
 page->ptl
@@ -97,7 +97,7 @@ trick:
split lock with enabled DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC, but costs
one more cache line for indirect access;
 
-The spinlock_t allocated in pgtable_pte_page_ctor() for PTE table and in
-pgtable_pmd_page_ctor() for PMD table.
+The spinlock_t allocated in pagetable_pte_ctor() for PTE table and in
+pagetable_pmd_ctor() for PMD table.
 
 Please, never access page->ptl directly -- use appropriate helper.
diff --git a/Documentation/translations/zh_CN/mm/split_page_table_lock.rst b/Documentation/translations/zh_CN/mm/split_page_table_lock.rst
index 4fb7aa666037..a2c288670a24 100644
--- a/Documentation/translations/zh_CN/mm/split_page_table_lock.rst
+++ b/Documentation/translations/zh_CN/mm/split_page_table_lock.rst
@@ -56,16 +56,16 @@ Hugetlb特定的辅助函数:
 架构对分页表锁的支持
 
 
-没有必要特别启用PTE分页表锁:所有需要的东西都由pgtable_pte_page_ctor()
-和pgtable_pte_page_dtor()完成,它们必须在PTE表分配/释放时被调用。
+没有必要特别启用PTE分页表锁:所有需要的东西都由pagetable_pte_ctor()
+和pagetable_pte_dtor()完成,它们必须在PTE表分配/释放时被调用。
 
 确保架构不使用slab分配器来分配页表:slab使用page->slab_cache来分配其页
 面。这个区域与page->ptl共享存储。
 
 PMD分页锁只有在你有两个以上的页表级别时才有意义。
 
-启用PMD分页锁需要在PMD表分配时调用pgtable_pmd_page_ctor(),在释放时调
-用pgtable_pmd_page_dtor()。
+启用PMD分页锁需要在PMD表分配时调用pagetable_pmd_ctor(),在释放时调
+用pagetable_pmd_dtor()。
 
 分配通常发生在pmd_alloc_one()中,释放发生在pmd_free()和pmd_free_tlb()
 中,但要确保覆盖所有的PMD表分配/释放路径:即X86_PAE在pgd_alloc()中预先
@@ -73,7 +73,7 @@ PMD分页锁只有在你有两个以上的页表级别时才有意义。
 
 一切就绪后,你可以设置CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK。
 
-注意:pgtable_pte_page_ctor()和pgtable_pmd_page_ctor()可能失败--必
+注意:pagetable_pte_ctor()和pagetable_pmd_ctor()可能失败--必
 须正确处理。
 
 page->ptl
@@ -90,7 +90,7 @@ page->ptl用于访问分割页表锁,其中'page'是包含该表的页面struc
的指针并动态分配它。这允许在启用DEBUG_SPINLOCK或DEBUG_LOCK_ALLOC的
情况下使用分页锁,但由于间接访问而多花了一个缓存行。
 
-PTE表的spinlock_t分配在pgtable_pte_page_ctor()中,PMD表的spinlock_t
-分配在pgtable_pmd_page_ctor()中。
+PTE表的spinlock_t分配在pagetable_pte_ctor()中,PMD表的spinlock_t
+分配在pagetable_pmd_ctor()中。
 
 请不要直接访问page->ptl - -使用适当的辅助函数。
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bd3d99d81984..e4e34ecbc2ea 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2913,11 +2913,6 @@ static inline bool pagetable_pte_ctor(struct ptdesc *ptdesc)
return true;
 }
 
-static inline bool pgtable_pte_page_ctor(struct page *page)
-{
-   return pagetable_pte_ctor(page_ptdesc(page));
-}
-
 static inline void pagetable_pte_dtor(struct ptdesc *ptdesc)
 {
struct folio *folio = ptdesc_folio(ptdesc);
@@ -2927,11 +2922,6 @@ static inline void pagetable_pte_dtor(struct ptdesc *ptdesc)
lruvec_stat_sub_folio(folio, NR_PAGETABLE);
 }
 
-static inline void pgtable_pte_page_dtor(struct page *page)
-{
-   pagetable_pte_dtor(page_ptdesc(page));
-}
-
 pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp);
 static inline pte_t *pte_offset_map(pmd_t *pmd, unsigned long addr)
 {
@@ -3038,11 +3028,6 @@ static inline bool pagetable_pmd_ctor(struct ptdesc *ptdesc)
return true;
 }
 

[PATCH mm-unstable v8 19/31] csky: Convert __pte_free_tlb() to use ptdescs

2023-07-31 Thread Vishal Moola (Oracle)
Part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Guo Ren 
Acked-by: Mike Rapoport (IBM) 
---
 arch/csky/include/asm/pgalloc.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/csky/include/asm/pgalloc.h b/arch/csky/include/asm/pgalloc.h
index 7d57e5da0914..9c84c9012e53 100644
--- a/arch/csky/include/asm/pgalloc.h
+++ b/arch/csky/include/asm/pgalloc.h
@@ -63,8 +63,8 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 
 #define __pte_free_tlb(tlb, pte, address)  \
 do {   \
-   pgtable_pte_page_dtor(pte); \
-   tlb_remove_page(tlb, pte);  \
+   pagetable_pte_dtor(page_ptdesc(pte));   \
+   tlb_remove_page_ptdesc(tlb, page_ptdesc(pte));  \
 } while (0)
 
 extern void pagetable_init(void);
-- 
2.40.1




[PATCH mm-unstable v8 14/31] s390: Convert various pgalloc functions to use ptdescs

2023-07-31 Thread Vishal Moola (Oracle)
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.

Some of the functions use the *get*page*() helper functions. Convert
these to use pagetable_alloc() and ptdesc_address() instead to help
standardize page tables further.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Mike Rapoport (IBM) 
---
 arch/s390/include/asm/pgalloc.h |   4 +-
 arch/s390/include/asm/tlb.h |   4 +-
 arch/s390/mm/pgalloc.c  | 128 
 3 files changed, 69 insertions(+), 67 deletions(-)

diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
index 89a9d5ef94f8..376b4b23bdaa 100644
--- a/arch/s390/include/asm/pgalloc.h
+++ b/arch/s390/include/asm/pgalloc.h
@@ -86,7 +86,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long vmaddr)
if (!table)
return NULL;
crst_table_init(table, _SEGMENT_ENTRY_EMPTY);
-   if (!pgtable_pmd_page_ctor(virt_to_page(table))) {
+   if (!pagetable_pmd_ctor(virt_to_ptdesc(table))) {
crst_table_free(mm, table);
return NULL;
}
@@ -97,7 +97,7 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
 {
if (mm_pmd_folded(mm))
return;
-   pgtable_pmd_page_dtor(virt_to_page(pmd));
+   pagetable_pmd_dtor(virt_to_ptdesc(pmd));
crst_table_free(mm, (unsigned long *) pmd);
 }
 
diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
index b91f4a9b044c..383b1f91442c 100644
--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -89,12 +89,12 @@ static inline void pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
 {
if (mm_pmd_folded(tlb->mm))
return;
-   pgtable_pmd_page_dtor(virt_to_page(pmd));
+   pagetable_pmd_dtor(virt_to_ptdesc(pmd));
__tlb_adjust_range(tlb, address, PAGE_SIZE);
tlb->mm->context.flush_mm = 1;
tlb->freed_tables = 1;
tlb->cleared_puds = 1;
-   tlb_remove_table(tlb, pmd);
+   tlb_remove_ptdesc(tlb, pmd);
 }
 
 /*
diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index d7374add7820..07fc660a24aa 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -43,17 +43,17 @@ __initcall(page_table_register_sysctl);
 
 unsigned long *crst_table_alloc(struct mm_struct *mm)
 {
-   struct page *page = alloc_pages(GFP_KERNEL, CRST_ALLOC_ORDER);
+   struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, CRST_ALLOC_ORDER);
 
-   if (!page)
+   if (!ptdesc)
return NULL;
-   arch_set_page_dat(page, CRST_ALLOC_ORDER);
-   return (unsigned long *) page_to_virt(page);
+   arch_set_page_dat(ptdesc_page(ptdesc), CRST_ALLOC_ORDER);
+   return (unsigned long *) ptdesc_to_virt(ptdesc);
 }
 
 void crst_table_free(struct mm_struct *mm, unsigned long *table)
 {
-   free_pages((unsigned long)table, CRST_ALLOC_ORDER);
+   pagetable_free(virt_to_ptdesc(table));
 }
 
 static void __crst_table_upgrade(void *arg)
@@ -140,21 +140,21 @@ static inline unsigned int atomic_xor_bits(atomic_t *v, unsigned int bits)
 
 struct page *page_table_alloc_pgste(struct mm_struct *mm)
 {
-   struct page *page;
+   struct ptdesc *ptdesc;
u64 *table;
 
-   page = alloc_page(GFP_KERNEL);
-   if (page) {
-   table = (u64 *)page_to_virt(page);
+   ptdesc = pagetable_alloc(GFP_KERNEL, 0);
+   if (ptdesc) {
+   table = (u64 *)ptdesc_to_virt(ptdesc);
memset64(table, _PAGE_INVALID, PTRS_PER_PTE);
memset64(table + PTRS_PER_PTE, 0, PTRS_PER_PTE);
}
-   return page;
+   return ptdesc_page(ptdesc);
 }
 
 void page_table_free_pgste(struct page *page)
 {
-   __free_page(page);
+   pagetable_free(page_ptdesc(page));
 }
 
 #endif /* CONFIG_PGSTE */
@@ -242,7 +242,7 @@ void page_table_free_pgste(struct page *page)
 unsigned long *page_table_alloc(struct mm_struct *mm)
 {
unsigned long *table;
-   struct page *page;
+   struct ptdesc *ptdesc;
unsigned int mask, bit;
 
/* Try to get a fragment of a 4K page as a 2K page table */
@@ -250,9 +250,9 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
table = NULL;
	spin_lock_bh(&mm->context.lock);
	if (!list_empty(&mm->context.pgtable_list)) {
-		page = list_first_entry(&mm->context.pgtable_list,
-					struct page, lru);
-		mask = atomic_read(&page->_refcount) >> 24;
+		ptdesc = list_first_entry(&mm->context.pgtable_list,
+					struct ptdesc, pt_list);
+		mask = atomic_read(&ptdesc->_refcount) >> 24;
/*
 * The pending removal bits must also be checked.

[PATCH mm-unstable v8 16/31] pgalloc: Convert various functions to use ptdescs

2023-07-31 Thread Vishal Moola (Oracle)
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.

Some of the functions use the *get*page*() helper functions. Convert
these to use pagetable_alloc() and ptdesc_address() instead to help
standardize page tables further.

Signed-off-by: Vishal Moola (Oracle) 
---
 include/asm-generic/pgalloc.h | 88 +--
 1 file changed, 52 insertions(+), 36 deletions(-)

diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
index a7cf825befae..c75d4a753849 100644
--- a/include/asm-generic/pgalloc.h
+++ b/include/asm-generic/pgalloc.h
@@ -8,7 +8,7 @@
 #define GFP_PGTABLE_USER   (GFP_PGTABLE_KERNEL | __GFP_ACCOUNT)
 
 /**
- * __pte_alloc_one_kernel - allocate a page for PTE-level kernel page table
+ * __pte_alloc_one_kernel - allocate memory for a PTE-level kernel page table
  * @mm: the mm_struct of the current context
  *
  * This function is intended for architectures that need
@@ -18,12 +18,17 @@
  */
 static inline pte_t *__pte_alloc_one_kernel(struct mm_struct *mm)
 {
-   return (pte_t *)__get_free_page(GFP_PGTABLE_KERNEL);
+   struct ptdesc *ptdesc = pagetable_alloc(GFP_PGTABLE_KERNEL &
+   ~__GFP_HIGHMEM, 0);
+
+   if (!ptdesc)
+   return NULL;
+   return ptdesc_address(ptdesc);
 }
 
 #ifndef __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL
 /**
- * pte_alloc_one_kernel - allocate a page for PTE-level kernel page table
+ * pte_alloc_one_kernel - allocate memory for a PTE-level kernel page table
  * @mm: the mm_struct of the current context
  *
  * Return: pointer to the allocated memory or %NULL on error
@@ -35,40 +40,40 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct 
*mm)
 #endif
 
 /**
- * pte_free_kernel - free PTE-level kernel page table page
+ * pte_free_kernel - free PTE-level kernel page table memory
  * @mm: the mm_struct of the current context
  * @pte: pointer to the memory containing the page table
  */
 static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 {
-   free_page((unsigned long)pte);
+   pagetable_free(virt_to_ptdesc(pte));
 }
 
 /**
- * __pte_alloc_one - allocate a page for PTE-level user page table
+ * __pte_alloc_one - allocate memory for a PTE-level user page table
  * @mm: the mm_struct of the current context
  * @gfp: GFP flags to use for the allocation
  *
- * Allocates a page and runs the pgtable_pte_page_ctor().
+ * Allocate memory for a page table and ptdesc and runs pagetable_pte_ctor().
  *
  * This function is intended for architectures that need
  * anything beyond simple page allocation or must have custom GFP flags.
  *
- * Return: `struct page` initialized as page table or %NULL on error
+ * Return: `struct page` referencing the ptdesc or %NULL on error
  */
 static inline pgtable_t __pte_alloc_one(struct mm_struct *mm, gfp_t gfp)
 {
-   struct page *pte;
+   struct ptdesc *ptdesc;
 
-   pte = alloc_page(gfp);
-   if (!pte)
+   ptdesc = pagetable_alloc(gfp, 0);
+   if (!ptdesc)
return NULL;
-   if (!pgtable_pte_page_ctor(pte)) {
-   __free_page(pte);
+   if (!pagetable_pte_ctor(ptdesc)) {
+   pagetable_free(ptdesc);
return NULL;
}
 
-   return pte;
+   return ptdesc_page(ptdesc);
 }
 
 #ifndef __HAVE_ARCH_PTE_ALLOC_ONE
@@ -76,9 +81,9 @@ static inline pgtable_t __pte_alloc_one(struct mm_struct *mm, 
gfp_t gfp)
  * pte_alloc_one - allocate a page for PTE-level user page table
  * @mm: the mm_struct of the current context
  *
- * Allocates a page and runs the pgtable_pte_page_ctor().
+ * Allocate memory for a page table and ptdesc and runs pagetable_pte_ctor().
  *
- * Return: `struct page` initialized as page table or %NULL on error
+ * Return: `struct page` referencing the ptdesc or %NULL on error
  */
 static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
 {
@@ -92,14 +97,16 @@ static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
  */
 
 /**
- * pte_free - free PTE-level user page table page
+ * pte_free - free PTE-level user page table memory
  * @mm: the mm_struct of the current context
- * @pte_page: the `struct page` representing the page table
+ * @pte_page: the `struct page` referencing the ptdesc
  */
 static inline void pte_free(struct mm_struct *mm, struct page *pte_page)
 {
-   pgtable_pte_page_dtor(pte_page);
-   __free_page(pte_page);
+   struct ptdesc *ptdesc = page_ptdesc(pte_page);
+
+   pagetable_pte_dtor(ptdesc);
+   pagetable_free(ptdesc);
 }
 
 
@@ -107,10 +114,11 @@ static inline void pte_free(struct mm_struct *mm, struct 
page *pte_page)
 
 #ifndef __HAVE_ARCH_PMD_ALLOC_ONE
 /**
- * pmd_alloc_one - allocate a page for PMD-level page table
+ * pmd_alloc_one - allocate memory for a PMD-level page table
  * @mm: the mm_struct of the current context
  *
- * Allocates a page and runs the 

[PATCH mm-unstable v8 22/31] m68k: Convert various functions to use ptdescs

2023-07-31 Thread Vishal Moola (Oracle)
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.

Some of the functions use the *get*page*() helper functions. Convert
these to use pagetable_alloc() and ptdesc_address() instead to help
standardize page tables further.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Mike Rapoport (IBM) 
Acked-by: Geert Uytterhoeven 
---
 arch/m68k/include/asm/mcf_pgalloc.h  | 47 ++--
 arch/m68k/include/asm/sun3_pgalloc.h |  8 ++---
 arch/m68k/mm/motorola.c  |  4 +--
 3 files changed, 30 insertions(+), 29 deletions(-)

diff --git a/arch/m68k/include/asm/mcf_pgalloc.h 
b/arch/m68k/include/asm/mcf_pgalloc.h
index 5c2c0a864524..302c5bf67179 100644
--- a/arch/m68k/include/asm/mcf_pgalloc.h
+++ b/arch/m68k/include/asm/mcf_pgalloc.h
@@ -5,22 +5,22 @@
 #include 
 #include 
 
-extern inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
+static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 {
-   free_page((unsigned long) pte);
+   pagetable_free(virt_to_ptdesc(pte));
 }
 
 extern const char bad_pmd_string[];
 
-extern inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
+static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
 {
-   unsigned long page = __get_free_page(GFP_DMA);
+   struct ptdesc *ptdesc = pagetable_alloc((GFP_DMA | __GFP_ZERO) &
+   ~__GFP_HIGHMEM, 0);
 
-   if (!page)
+   if (!ptdesc)
return NULL;
 
-   memset((void *)page, 0, PAGE_SIZE);
-   return (pte_t *) (page);
+   return ptdesc_address(ptdesc);
 }
 
 extern inline pmd_t *pmd_alloc_kernel(pgd_t *pgd, unsigned long address)
@@ -35,36 +35,34 @@ extern inline pmd_t *pmd_alloc_kernel(pgd_t *pgd, unsigned 
long address)
 static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pgtable,
  unsigned long address)
 {
-   struct page *page = virt_to_page(pgtable);
+   struct ptdesc *ptdesc = virt_to_ptdesc(pgtable);
 
-   pgtable_pte_page_dtor(page);
-   __free_page(page);
+   pagetable_pte_dtor(ptdesc);
+   pagetable_free(ptdesc);
 }
 
 static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
 {
-   struct page *page = alloc_pages(GFP_DMA, 0);
+   struct ptdesc *ptdesc = pagetable_alloc(GFP_DMA | __GFP_ZERO, 0);
pte_t *pte;
 
-   if (!page)
+   if (!ptdesc)
return NULL;
-   if (!pgtable_pte_page_ctor(page)) {
-   __free_page(page);
+   if (!pagetable_pte_ctor(ptdesc)) {
+   pagetable_free(ptdesc);
return NULL;
}
 
-   pte = page_address(page);
-   clear_page(pte);
-
+   pte = ptdesc_address(ptdesc);
return pte;
 }
 
 static inline void pte_free(struct mm_struct *mm, pgtable_t pgtable)
 {
-   struct page *page = virt_to_page(pgtable);
+   struct ptdesc *ptdesc = virt_to_ptdesc(pgtable);
 
-   pgtable_pte_page_dtor(page);
-   __free_page(page);
+   pagetable_pte_dtor(ptdesc);
+   pagetable_free(ptdesc);
 }
 
 /*
@@ -75,16 +73,19 @@ static inline void pte_free(struct mm_struct *mm, pgtable_t 
pgtable)
 
 static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 {
-   free_page((unsigned long) pgd);
+   pagetable_free(virt_to_ptdesc(pgd));
 }
 
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
pgd_t *new_pgd;
+   struct ptdesc *ptdesc = pagetable_alloc((GFP_DMA | __GFP_NOWARN) &
+   ~__GFP_HIGHMEM, 0);
 
-   new_pgd = (pgd_t *)__get_free_page(GFP_DMA | __GFP_NOWARN);
-   if (!new_pgd)
+   if (!ptdesc)
return NULL;
+   new_pgd = ptdesc_address(ptdesc);
+
memcpy(new_pgd, swapper_pg_dir, PTRS_PER_PGD * sizeof(pgd_t));
memset(new_pgd, 0, PAGE_OFFSET >> PGDIR_SHIFT);
return new_pgd;
diff --git a/arch/m68k/include/asm/sun3_pgalloc.h 
b/arch/m68k/include/asm/sun3_pgalloc.h
index 198036aff519..ff48573db2c0 100644
--- a/arch/m68k/include/asm/sun3_pgalloc.h
+++ b/arch/m68k/include/asm/sun3_pgalloc.h
@@ -17,10 +17,10 @@
 
 extern const char bad_pmd_string[];
 
-#define __pte_free_tlb(tlb,pte,addr)   \
-do {   \
-   pgtable_pte_page_dtor(pte); \
-   tlb_remove_page((tlb), pte);\
+#define __pte_free_tlb(tlb, pte, addr) \
+do {   \
+   pagetable_pte_dtor(page_ptdesc(pte));   \
+   tlb_remove_page_ptdesc((tlb), page_ptdesc(pte));\
 } while (0)
 
 static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd, pte_t 
*pte)
diff --git a/arch/m68k/mm/motorola.c b/arch/m68k/mm/motorola.c
index c75984e2d86b..594575a0780c 100644
--- a/arch/m68k/mm/motorola.c
+++ b/arch/m68k/mm/motorola.c
@@ -161,7 +161,7 @@ 

[PATCH mm-unstable v8 28/31] sparc64: Convert various functions to use ptdescs

2023-07-31 Thread Vishal Moola (Oracle)
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Mike Rapoport (IBM) 
---
 arch/sparc/mm/init_64.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 0d7fd793924c..9a63a3e08e40 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2893,14 +2893,15 @@ pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
 
 pgtable_t pte_alloc_one(struct mm_struct *mm)
 {
-   struct page *page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-   if (!page)
+   struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL | __GFP_ZERO, 0);
+
+   if (!ptdesc)
return NULL;
-   if (!pgtable_pte_page_ctor(page)) {
-   __free_page(page);
+   if (!pagetable_pte_ctor(ptdesc)) {
+   pagetable_free(ptdesc);
return NULL;
}
-   return (pte_t *) page_address(page);
+   return ptdesc_address(ptdesc);
 }
 
 void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
@@ -2910,10 +2911,10 @@ void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 
 static void __pte_free(pgtable_t pte)
 {
-   struct page *page = virt_to_page(pte);
+   struct ptdesc *ptdesc = virt_to_ptdesc(pte);
 
-   pgtable_pte_page_dtor(page);
-   __free_page(page);
+   pagetable_pte_dtor(ptdesc);
+   pagetable_free(ptdesc);
 }
 
 void pte_free(struct mm_struct *mm, pgtable_t pte)
-- 
2.40.1




[PATCH mm-unstable v8 24/31] nios2: Convert __pte_free_tlb() to use ptdescs

2023-07-31 Thread Vishal Moola (Oracle)
Part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Mike Rapoport (IBM) 
Acked-by: Dinh Nguyen 
---
 arch/nios2/include/asm/pgalloc.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/nios2/include/asm/pgalloc.h b/arch/nios2/include/asm/pgalloc.h
index ecd1657bb2ce..ce6bb8e74271 100644
--- a/arch/nios2/include/asm/pgalloc.h
+++ b/arch/nios2/include/asm/pgalloc.h
@@ -28,10 +28,10 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t 
*pmd,
 
 extern pgd_t *pgd_alloc(struct mm_struct *mm);
 
-#define __pte_free_tlb(tlb, pte, addr) \
-   do {\
-   pgtable_pte_page_dtor(pte); \
-   tlb_remove_page((tlb), (pte));  \
+#define __pte_free_tlb(tlb, pte, addr) \
+   do {\
+   pagetable_pte_dtor(page_ptdesc(pte));   \
+   tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte)));  \
} while (0)
 
 #endif /* _ASM_NIOS2_PGALLOC_H */
-- 
2.40.1




[PATCH mm-unstable v8 20/31] hexagon: Convert __pte_free_tlb() to use ptdescs

2023-07-31 Thread Vishal Moola (Oracle)
Part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Mike Rapoport (IBM) 
---
 arch/hexagon/include/asm/pgalloc.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/hexagon/include/asm/pgalloc.h 
b/arch/hexagon/include/asm/pgalloc.h
index f0c47e6a7427..55988625e6fb 100644
--- a/arch/hexagon/include/asm/pgalloc.h
+++ b/arch/hexagon/include/asm/pgalloc.h
@@ -87,10 +87,10 @@ static inline void pmd_populate_kernel(struct mm_struct 
*mm, pmd_t *pmd,
max_kernel_seg = pmdindex;
 }
 
-#define __pte_free_tlb(tlb, pte, addr) \
-do {   \
-   pgtable_pte_page_dtor((pte));   \
-   tlb_remove_page((tlb), (pte));  \
+#define __pte_free_tlb(tlb, pte, addr) \
+do {   \
+   pagetable_pte_dtor((page_ptdesc(pte))); \
+   tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte)));  \
 } while (0)
 
 #endif
-- 
2.40.1




[PATCH mm-unstable v8 23/31] mips: Convert various functions to use ptdescs

2023-07-31 Thread Vishal Moola (Oracle)
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.

Some of the functions use the *get*page*() helper functions. Convert
these to use pagetable_alloc() and ptdesc_address() instead to help
standardize page tables further.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Mike Rapoport (IBM) 
---
 arch/mips/include/asm/pgalloc.h | 32 ++--
 arch/mips/mm/pgtable.c  |  8 +---
 2 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/arch/mips/include/asm/pgalloc.h b/arch/mips/include/asm/pgalloc.h
index f72e737dda21..40e40a7eb94a 100644
--- a/arch/mips/include/asm/pgalloc.h
+++ b/arch/mips/include/asm/pgalloc.h
@@ -51,13 +51,13 @@ extern pgd_t *pgd_alloc(struct mm_struct *mm);
 
 static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 {
-   free_pages((unsigned long)pgd, PGD_TABLE_ORDER);
+   pagetable_free(virt_to_ptdesc(pgd));
 }
 
-#define __pte_free_tlb(tlb,pte,address)\
-do {   \
-   pgtable_pte_page_dtor(pte); \
-   tlb_remove_page((tlb), pte);\
+#define __pte_free_tlb(tlb, pte, address)  \
+do {   \
+   pagetable_pte_dtor(page_ptdesc(pte));   \
+   tlb_remove_page_ptdesc((tlb), page_ptdesc(pte));\
 } while (0)
 
 #ifndef __PAGETABLE_PMD_FOLDED
@@ -65,18 +65,18 @@ do {
\
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
 {
pmd_t *pmd;
-   struct page *pg;
+   struct ptdesc *ptdesc;
 
-   pg = alloc_pages(GFP_KERNEL_ACCOUNT, PMD_TABLE_ORDER);
-   if (!pg)
+   ptdesc = pagetable_alloc(GFP_KERNEL_ACCOUNT, PMD_TABLE_ORDER);
+   if (!ptdesc)
return NULL;
 
-   if (!pgtable_pmd_page_ctor(pg)) {
-   __free_pages(pg, PMD_TABLE_ORDER);
+   if (!pagetable_pmd_ctor(ptdesc)) {
+   pagetable_free(ptdesc);
return NULL;
}
 
-   pmd = (pmd_t *)page_address(pg);
+   pmd = ptdesc_address(ptdesc);
pmd_init(pmd);
return pmd;
 }
@@ -90,10 +90,14 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, 
unsigned long address)
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long address)
 {
pud_t *pud;
+   struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL & ~__GFP_HIGHMEM,
+   PUD_TABLE_ORDER);
 
-   pud = (pud_t *) __get_free_pages(GFP_KERNEL, PUD_TABLE_ORDER);
-   if (pud)
-   pud_init(pud);
+   if (!ptdesc)
+   return NULL;
+   pud = ptdesc_address(ptdesc);
+
+   pud_init(pud);
return pud;
 }
 
diff --git a/arch/mips/mm/pgtable.c b/arch/mips/mm/pgtable.c
index b13314be5d0e..1506e458040d 100644
--- a/arch/mips/mm/pgtable.c
+++ b/arch/mips/mm/pgtable.c
@@ -10,10 +10,12 @@
 
 pgd_t *pgd_alloc(struct mm_struct *mm)
 {
-   pgd_t *ret, *init;
+   pgd_t *init, *ret = NULL;
+   struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL & ~__GFP_HIGHMEM,
+   PGD_TABLE_ORDER);
 
-   ret = (pgd_t *) __get_free_pages(GFP_KERNEL, PGD_TABLE_ORDER);
-   if (ret) {
+   if (ptdesc) {
+   ret = ptdesc_address(ptdesc);
init = pgd_offset(&init_mm, 0UL);
pgd_init(ret);
memcpy(ret + USER_PTRS_PER_PGD, init + USER_PTRS_PER_PGD,
-- 
2.40.1




[PATCH mm-unstable v8 29/31] sparc: Convert pgtable_pte_page_{ctor, dtor}() to ptdesc equivalents

2023-07-31 Thread Vishal Moola (Oracle)
Part of the conversions to replace pgtable pte constructor/destructors with
ptdesc equivalents.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Mike Rapoport (IBM) 
---
 arch/sparc/mm/srmmu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/sparc/mm/srmmu.c b/arch/sparc/mm/srmmu.c
index 13f027afc875..8393faa3e596 100644
--- a/arch/sparc/mm/srmmu.c
+++ b/arch/sparc/mm/srmmu.c
@@ -355,7 +355,8 @@ pgtable_t pte_alloc_one(struct mm_struct *mm)
return NULL;
page = pfn_to_page(__nocache_pa((unsigned long)ptep) >> PAGE_SHIFT);
spin_lock(&mm->page_table_lock);
-   if (page_ref_inc_return(page) == 2 && !pgtable_pte_page_ctor(page)) {
+   if (page_ref_inc_return(page) == 2 &&
+   !pagetable_pte_ctor(page_ptdesc(page))) {
page_ref_dec(page);
ptep = NULL;
}
@@ -371,7 +372,7 @@ void pte_free(struct mm_struct *mm, pgtable_t ptep)
page = pfn_to_page(__nocache_pa((unsigned long)ptep) >> PAGE_SHIFT);
spin_lock(&mm->page_table_lock);
if (page_ref_dec_return(page) == 1)
-   pgtable_pte_page_dtor(page);
+   pagetable_pte_dtor(page_ptdesc(page));
spin_unlock(&mm->page_table_lock);
 
srmmu_free_nocache(ptep, SRMMU_PTE_TABLE_SIZE);
-- 
2.40.1




[PATCH mm-unstable v8 18/31] arm64: Convert various functions to use ptdescs

2023-07-31 Thread Vishal Moola (Oracle)
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Mike Rapoport (IBM) 
Acked-by: Catalin Marinas 
---
 arch/arm64/include/asm/tlb.h | 14 --
 arch/arm64/mm/mmu.c  |  7 ---
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index c995d1f4594f..2c29239d05c3 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -75,18 +75,20 @@ static inline void tlb_flush(struct mmu_gather *tlb)
 static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
  unsigned long addr)
 {
-   pgtable_pte_page_dtor(pte);
-   tlb_remove_table(tlb, pte);
+   struct ptdesc *ptdesc = page_ptdesc(pte);
+
+   pagetable_pte_dtor(ptdesc);
+   tlb_remove_ptdesc(tlb, ptdesc);
 }
 
 #if CONFIG_PGTABLE_LEVELS > 2
 static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
  unsigned long addr)
 {
-   struct page *page = virt_to_page(pmdp);
+   struct ptdesc *ptdesc = virt_to_ptdesc(pmdp);
 
-   pgtable_pmd_page_dtor(page);
-   tlb_remove_table(tlb, page);
+   pagetable_pmd_dtor(ptdesc);
+   tlb_remove_ptdesc(tlb, ptdesc);
 }
 #endif
 
@@ -94,7 +96,7 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, 
pmd_t *pmdp,
 static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pudp,
  unsigned long addr)
 {
-   tlb_remove_table(tlb, virt_to_page(pudp));
+   tlb_remove_ptdesc(tlb, virt_to_ptdesc(pudp));
 }
 #endif
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 95d360805f8a..47781bec6171 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -426,6 +426,7 @@ static phys_addr_t __pgd_pgtable_alloc(int shift)
 static phys_addr_t pgd_pgtable_alloc(int shift)
 {
phys_addr_t pa = __pgd_pgtable_alloc(shift);
+   struct ptdesc *ptdesc = page_ptdesc(phys_to_page(pa));
 
/*
 * Call proper page table ctor in case later we need to
@@ -433,12 +434,12 @@ static phys_addr_t pgd_pgtable_alloc(int shift)
 * this pre-allocated page table.
 *
 * We don't select ARCH_ENABLE_SPLIT_PMD_PTLOCK if pmd is
-* folded, and if so pgtable_pmd_page_ctor() becomes nop.
+* folded, and if so pagetable_pte_ctor() becomes nop.
 */
if (shift == PAGE_SHIFT)
-   BUG_ON(!pgtable_pte_page_ctor(phys_to_page(pa)));
+   BUG_ON(!pagetable_pte_ctor(ptdesc));
else if (shift == PMD_SHIFT)
-   BUG_ON(!pgtable_pmd_page_ctor(phys_to_page(pa)));
+   BUG_ON(!pagetable_pmd_ctor(ptdesc));
 
return pa;
 }
-- 
2.40.1




[PATCH mm-unstable v8 21/31] loongarch: Convert various functions to use ptdescs

2023-07-31 Thread Vishal Moola (Oracle)
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.

Some of the functions use the *get*page*() helper functions. Convert
these to use pagetable_alloc() and ptdesc_address() instead to help
standardize page tables further.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Mike Rapoport (IBM) 
---
 arch/loongarch/include/asm/pgalloc.h | 27 +++
 arch/loongarch/mm/pgtable.c  |  7 ---
 2 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/arch/loongarch/include/asm/pgalloc.h 
b/arch/loongarch/include/asm/pgalloc.h
index af1d1e4a6965..23f5b1107246 100644
--- a/arch/loongarch/include/asm/pgalloc.h
+++ b/arch/loongarch/include/asm/pgalloc.h
@@ -45,9 +45,9 @@ extern void pagetable_init(void);
 extern pgd_t *pgd_alloc(struct mm_struct *mm);
 
 #define __pte_free_tlb(tlb, pte, address)  \
-do {   \
-   pgtable_pte_page_dtor(pte); \
-   tlb_remove_page((tlb), pte);\
+do {   \
+   pagetable_pte_dtor(page_ptdesc(pte));   \
+   tlb_remove_page_ptdesc((tlb), page_ptdesc(pte));\
 } while (0)
 
 #ifndef __PAGETABLE_PMD_FOLDED
@@ -55,18 +55,18 @@ do {
\
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
 {
pmd_t *pmd;
-   struct page *pg;
+   struct ptdesc *ptdesc;
 
-   pg = alloc_page(GFP_KERNEL_ACCOUNT);
-   if (!pg)
+   ptdesc = pagetable_alloc(GFP_KERNEL_ACCOUNT, 0);
+   if (!ptdesc)
return NULL;
 
-   if (!pgtable_pmd_page_ctor(pg)) {
-   __free_page(pg);
+   if (!pagetable_pmd_ctor(ptdesc)) {
+   pagetable_free(ptdesc);
return NULL;
}
 
-   pmd = (pmd_t *)page_address(pg);
+   pmd = ptdesc_address(ptdesc);
pmd_init(pmd);
return pmd;
 }
@@ -80,10 +80,13 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, 
unsigned long address)
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long address)
 {
pud_t *pud;
+   struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL & ~__GFP_HIGHMEM, 0);
 
-   pud = (pud_t *) __get_free_page(GFP_KERNEL);
-   if (pud)
-   pud_init(pud);
+   if (!ptdesc)
+   return NULL;
+   pud = ptdesc_address(ptdesc);
+
+   pud_init(pud);
return pud;
 }
 
diff --git a/arch/loongarch/mm/pgtable.c b/arch/loongarch/mm/pgtable.c
index 36a6dc0148ae..5bd102b51f7c 100644
--- a/arch/loongarch/mm/pgtable.c
+++ b/arch/loongarch/mm/pgtable.c
@@ -11,10 +11,11 @@
 
 pgd_t *pgd_alloc(struct mm_struct *mm)
 {
-   pgd_t *ret, *init;
+   pgd_t *init, *ret = NULL;
+   struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL & ~__GFP_HIGHMEM, 0);
 
-   ret = (pgd_t *) __get_free_page(GFP_KERNEL);
-   if (ret) {
+   if (ptdesc) {
+   ret = (pgd_t *)ptdesc_address(ptdesc);
init = pgd_offset(&init_mm, 0UL);
pgd_init(ret);
memcpy(ret + USER_PTRS_PER_PGD, init + USER_PTRS_PER_PGD,
-- 
2.40.1




[PATCH mm-unstable v8 27/31] sh: Convert pte_free_tlb() to use ptdescs

2023-07-31 Thread Vishal Moola (Oracle)
Part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents. Also cleans up some spacing issues.

Signed-off-by: Vishal Moola (Oracle) 
Reviewed-by: Geert Uytterhoeven 
Acked-by: John Paul Adrian Glaubitz 
Acked-by: Mike Rapoport (IBM) 
---
 arch/sh/include/asm/pgalloc.h | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/sh/include/asm/pgalloc.h b/arch/sh/include/asm/pgalloc.h
index a9e98233c4d4..5d8577ab1591 100644
--- a/arch/sh/include/asm/pgalloc.h
+++ b/arch/sh/include/asm/pgalloc.h
@@ -2,6 +2,7 @@
 #ifndef __ASM_SH_PGALLOC_H
 #define __ASM_SH_PGALLOC_H
 
+#include 
 #include 
 
 #define __HAVE_ARCH_PMD_ALLOC_ONE
@@ -31,10 +32,10 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t 
*pmd,
set_pmd(pmd, __pmd((unsigned long)page_address(pte)));
 }
 
-#define __pte_free_tlb(tlb,pte,addr)   \
-do {   \
-   pgtable_pte_page_dtor(pte); \
-   tlb_remove_page((tlb), (pte));  \
+#define __pte_free_tlb(tlb, pte, addr) \
+do {   \
+   pagetable_pte_dtor(page_ptdesc(pte));   \
+   tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte)));  \
 } while (0)
 
 #endif /* __ASM_SH_PGALLOC_H */
-- 
2.40.1




[PATCH mm-unstable v8 26/31] riscv: Convert alloc_{pmd, pte}_late() to use ptdescs

2023-07-31 Thread Vishal Moola (Oracle)
As part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents, convert various page table functions to use ptdescs.

Some of the functions use the *get*page*() helper functions. Convert
these to use pagetable_alloc() and ptdesc_address() instead to help
standardize page tables further.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Palmer Dabbelt 
Acked-by: Mike Rapoport (IBM) 
---
 arch/riscv/include/asm/pgalloc.h |  8 
 arch/riscv/mm/init.c | 16 ++--
 2 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
index 59dc12b5b7e8..d169a4f41a2e 100644
--- a/arch/riscv/include/asm/pgalloc.h
+++ b/arch/riscv/include/asm/pgalloc.h
@@ -153,10 +153,10 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 
 #endif /* __PAGETABLE_PMD_FOLDED */
 
-#define __pte_free_tlb(tlb, pte, buf)   \
-do {\
-   pgtable_pte_page_dtor(pte); \
-   tlb_remove_page((tlb), pte);\
+#define __pte_free_tlb(tlb, pte, buf)  \
+do {   \
+   pagetable_pte_dtor(page_ptdesc(pte));   \
+   tlb_remove_page_ptdesc((tlb), page_ptdesc(pte));\
 } while (0)
 #endif /* CONFIG_MMU */
 
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 9ce504737d18..430a3d05a841 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -353,12 +353,10 @@ static inline phys_addr_t __init 
alloc_pte_fixmap(uintptr_t va)
 
 static phys_addr_t __init alloc_pte_late(uintptr_t va)
 {
-   unsigned long vaddr;
-
-   vaddr = __get_free_page(GFP_KERNEL);
-   BUG_ON(!vaddr || !pgtable_pte_page_ctor(virt_to_page((void *)vaddr)));
+   struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL & ~__GFP_HIGHMEM, 0);
 
-   return __pa(vaddr);
+   BUG_ON(!ptdesc || !pagetable_pte_ctor(ptdesc));
+   return __pa((pte_t *)ptdesc_address(ptdesc));
 }
 
 static void __init create_pte_mapping(pte_t *ptep,
@@ -436,12 +434,10 @@ static phys_addr_t __init alloc_pmd_fixmap(uintptr_t va)
 
 static phys_addr_t __init alloc_pmd_late(uintptr_t va)
 {
-   unsigned long vaddr;
-
-   vaddr = __get_free_page(GFP_KERNEL);
-   BUG_ON(!vaddr || !pgtable_pmd_page_ctor(virt_to_page((void *)vaddr)));
+   struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL & ~__GFP_HIGHMEM, 0);
 
-   return __pa(vaddr);
+   BUG_ON(!ptdesc || !pagetable_pmd_ctor(ptdesc));
+   return __pa((pmd_t *)ptdesc_address(ptdesc));
 }
 
 static void __init create_pmd_mapping(pmd_t *pmdp,
-- 
2.40.1




[PATCH mm-unstable v8 15/31] mm: Remove page table members from struct page

2023-07-31 Thread Vishal Moola (Oracle)
The page table members are now split out into their own ptdesc struct.
Remove them from struct page.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Mike Rapoport (IBM) 
---
 include/linux/mm_types.h | 18 --
 include/linux/pgtable.h  |  3 ---
 2 files changed, 21 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index da538ff68953..aae6af098031 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -141,24 +141,6 @@ struct page {
struct {/* Tail pages of compound page */
unsigned long compound_head;/* Bit zero is set */
};
-   struct {/* Page table pages */
-   unsigned long _pt_pad_1;/* compound_head */
-   pgtable_t pmd_huge_pte; /* protected by page->ptl */
-   /*
-* A PTE page table page might be freed by use of
-* rcu_head: which overlays those two fields above.
-*/
-   unsigned long _pt_pad_2;/* mapping */
-   union {
-   struct mm_struct *pt_mm; /* x86 pgds only */
-   atomic_t pt_frag_refcount; /* powerpc */
-   };
-#if ALLOC_SPLIT_PTLOCKS
-   spinlock_t *ptl;
-#else
-   spinlock_t ptl;
-#endif
-   };
struct {/* ZONE_DEVICE pages */
/** @pgmap: Points to the hosting device page map. */
struct dev_pagemap *pgmap;
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 250fdeba68f3..1a984c300d45 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1051,10 +1051,7 @@ struct ptdesc {
 TABLE_MATCH(flags, __page_flags);
 TABLE_MATCH(compound_head, pt_list);
 TABLE_MATCH(compound_head, _pt_pad_1);
-TABLE_MATCH(pmd_huge_pte, pmd_huge_pte);
 TABLE_MATCH(mapping, __page_mapping);
-TABLE_MATCH(pt_mm, pt_mm);
-TABLE_MATCH(ptl, ptl);
 TABLE_MATCH(rcu_head, pt_rcu_head);
 TABLE_MATCH(page_type, __page_type);
 TABLE_MATCH(_refcount, _refcount);
-- 
2.40.1




[PATCH mm-unstable v8 25/31] openrisc: Convert __pte_free_tlb() to use ptdescs

2023-07-31 Thread Vishal Moola (Oracle)
Part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Mike Rapoport (IBM) 
---
 arch/openrisc/include/asm/pgalloc.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/openrisc/include/asm/pgalloc.h 
b/arch/openrisc/include/asm/pgalloc.h
index b7b2b8d16fad..c6a73772a546 100644
--- a/arch/openrisc/include/asm/pgalloc.h
+++ b/arch/openrisc/include/asm/pgalloc.h
@@ -66,10 +66,10 @@ extern inline pgd_t *pgd_alloc(struct mm_struct *mm)
 
 extern pte_t *pte_alloc_one_kernel(struct mm_struct *mm);
 
-#define __pte_free_tlb(tlb, pte, addr) \
-do {   \
-   pgtable_pte_page_dtor(pte); \
-   tlb_remove_page((tlb), (pte));  \
+#define __pte_free_tlb(tlb, pte, addr) \
+do {   \
+   pagetable_pte_dtor(page_ptdesc(pte));   \
+   tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte)));  \
 } while (0)
 
 #endif
-- 
2.40.1




[PATCH mm-unstable v8 13/31] x86: Convert various functions to use ptdescs

2023-07-31 Thread Vishal Moola (Oracle)
In order to split struct ptdesc from struct page, convert various
functions to use ptdescs.

Some of the functions use the *get*page*() helper functions. Convert
these to use pagetable_alloc() and ptdesc_address() instead to help
standardize page tables further.

Signed-off-by: Vishal Moola (Oracle) 
---
 arch/x86/mm/pgtable.c | 47 ++-
 1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 15a8009a4480..d3a93e8766ee 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -52,7 +52,7 @@ early_param("userpte", setup_userpte);
 
 void ___pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
 {
-   pgtable_pte_page_dtor(pte);
+   pagetable_pte_dtor(page_ptdesc(pte));
paravirt_release_pte(page_to_pfn(pte));
paravirt_tlb_remove_table(tlb, pte);
 }
@@ -60,7 +60,7 @@ void ___pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
 #if CONFIG_PGTABLE_LEVELS > 2
 void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
 {
-   struct page *page = virt_to_page(pmd);
+   struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
paravirt_release_pmd(__pa(pmd) >> PAGE_SHIFT);
/*
 * NOTE! For PAE, any changes to the top page-directory-pointer-table
@@ -69,8 +69,8 @@ void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
 #ifdef CONFIG_X86_PAE
tlb->need_flush_all = 1;
 #endif
-   pgtable_pmd_page_dtor(page);
-   paravirt_tlb_remove_table(tlb, page);
+   pagetable_pmd_dtor(ptdesc);
+   paravirt_tlb_remove_table(tlb, ptdesc_page(ptdesc));
 }
 
 #if CONFIG_PGTABLE_LEVELS > 3
@@ -92,16 +92,16 @@ void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d)
 
 static inline void pgd_list_add(pgd_t *pgd)
 {
-   struct page *page = virt_to_page(pgd);
+   struct ptdesc *ptdesc = virt_to_ptdesc(pgd);
 
-   list_add(&page->lru, &pgd_list);
+   list_add(&ptdesc->pt_list, &pgd_list);
 }
 
 static inline void pgd_list_del(pgd_t *pgd)
 {
-   struct page *page = virt_to_page(pgd);
+   struct ptdesc *ptdesc = virt_to_ptdesc(pgd);
 
-   list_del(&page->lru);
+   list_del(&ptdesc->pt_list);
 }
 
 #define UNSHARED_PTRS_PER_PGD  \
@@ -112,12 +112,12 @@ static inline void pgd_list_del(pgd_t *pgd)
 
 static void pgd_set_mm(pgd_t *pgd, struct mm_struct *mm)
 {
-   virt_to_page(pgd)->pt_mm = mm;
+   virt_to_ptdesc(pgd)->pt_mm = mm;
 }
 
 struct mm_struct *pgd_page_get_mm(struct page *page)
 {
-   return page->pt_mm;
+   return page_ptdesc(page)->pt_mm;
 }
 
 static void pgd_ctor(struct mm_struct *mm, pgd_t *pgd)
@@ -213,11 +213,14 @@ void pud_populate(struct mm_struct *mm, pud_t *pudp, 
pmd_t *pmd)
 static void free_pmds(struct mm_struct *mm, pmd_t *pmds[], int count)
 {
int i;
+   struct ptdesc *ptdesc;
 
for (i = 0; i < count; i++)
if (pmds[i]) {
-   pgtable_pmd_page_dtor(virt_to_page(pmds[i]));
-   free_page((unsigned long)pmds[i]);
+   ptdesc = virt_to_ptdesc(pmds[i]);
+
+   pagetable_pmd_dtor(ptdesc);
+   pagetable_free(ptdesc);
mm_dec_nr_pmds(mm);
}
 }
@@ -230,18 +233,24 @@ static int preallocate_pmds(struct mm_struct *mm, pmd_t 
*pmds[], int count)
 
if (mm == &init_mm)
gfp &= ~__GFP_ACCOUNT;
+   gfp &= ~__GFP_HIGHMEM;
 
for (i = 0; i < count; i++) {
-   pmd_t *pmd = (pmd_t *)__get_free_page(gfp);
-   if (!pmd)
+   pmd_t *pmd = NULL;
+   struct ptdesc *ptdesc = pagetable_alloc(gfp, 0);
+
+   if (!ptdesc)
failed = true;
-   if (pmd && !pgtable_pmd_page_ctor(virt_to_page(pmd))) {
-   free_page((unsigned long)pmd);
-   pmd = NULL;
+   if (ptdesc && !pagetable_pmd_ctor(ptdesc)) {
+   pagetable_free(ptdesc);
+   ptdesc = NULL;
failed = true;
}
-   if (pmd)
+   if (ptdesc) {
mm_inc_nr_pmds(mm);
+   pmd = ptdesc_address(ptdesc);
+   }
+
pmds[i] = pmd;
}
 
@@ -830,7 +839,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 
free_page((unsigned long)pmd_sv);
 
-   pgtable_pmd_page_dtor(virt_to_page(pmd));
+   pagetable_pmd_dtor(virt_to_ptdesc(pmd));
free_page((unsigned long)pmd);
 
return 1;
-- 
2.40.1




[PATCH mm-unstable v8 12/31] powerpc: Convert various functions to use ptdescs

2023-07-31 Thread Vishal Moola (Oracle)
In order to split struct ptdesc from struct page, convert various
functions to use ptdescs.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Mike Rapoport (IBM) 
---
 arch/powerpc/mm/book3s64/mmu_context.c | 10 ++---
 arch/powerpc/mm/book3s64/pgtable.c | 32 +++---
 arch/powerpc/mm/pgtable-frag.c | 58 +-
 3 files changed, 50 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/mmu_context.c b/arch/powerpc/mm/book3s64/mmu_context.c
index c766e4c26e42..1715b07c630c 100644
--- a/arch/powerpc/mm/book3s64/mmu_context.c
+++ b/arch/powerpc/mm/book3s64/mmu_context.c
@@ -246,15 +246,15 @@ static void destroy_contexts(mm_context_t *ctx)
 static void pmd_frag_destroy(void *pmd_frag)
 {
int count;
-   struct page *page;
+   struct ptdesc *ptdesc;
 
-   page = virt_to_page(pmd_frag);
+   ptdesc = virt_to_ptdesc(pmd_frag);
/* drop all the pending references */
count = ((unsigned long)pmd_frag & ~PAGE_MASK) >> PMD_FRAG_SIZE_SHIFT;
/* We allow PTE_FRAG_NR fragments from a PTE page */
-   if (atomic_sub_and_test(PMD_FRAG_NR - count, &page->pt_frag_refcount)) {
-   pgtable_pmd_page_dtor(page);
-   __free_page(page);
+   if (atomic_sub_and_test(PMD_FRAG_NR - count, &ptdesc->pt_frag_refcount)) {
+   pagetable_pmd_dtor(ptdesc);
+   pagetable_free(ptdesc);
}
 }
 
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 85c84e89e3ea..1212deeabe15 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -306,22 +306,22 @@ static pmd_t *get_pmd_from_cache(struct mm_struct *mm)
 static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
 {
void *ret = NULL;
-   struct page *page;
+   struct ptdesc *ptdesc;
gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO;
 
if (mm == _mm)
gfp &= ~__GFP_ACCOUNT;
-   page = alloc_page(gfp);
-   if (!page)
+   ptdesc = pagetable_alloc(gfp, 0);
+   if (!ptdesc)
return NULL;
-   if (!pgtable_pmd_page_ctor(page)) {
-   __free_pages(page, 0);
+   if (!pagetable_pmd_ctor(ptdesc)) {
+   pagetable_free(ptdesc);
return NULL;
}
 
-   atomic_set(&page->pt_frag_refcount, 1);
+   atomic_set(&ptdesc->pt_frag_refcount, 1);
 
-   ret = page_address(page);
+   ret = ptdesc_address(ptdesc);
/*
 * if we support only one fragment just return the
 * allocated page.
@@ -331,12 +331,12 @@ static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
 
spin_lock(&mm->page_table_lock);
/*
-* If we find pgtable_page set, we return
+* If we find ptdesc_page set, we return
 * the allocated page with single fragment
 * count.
 */
if (likely(!mm->context.pmd_frag)) {
-   atomic_set(&page->pt_frag_refcount, PMD_FRAG_NR);
+   atomic_set(&ptdesc->pt_frag_refcount, PMD_FRAG_NR);
mm->context.pmd_frag = ret + PMD_FRAG_SIZE;
}
spin_unlock(&mm->page_table_lock);
@@ -357,15 +357,15 @@ pmd_t *pmd_fragment_alloc(struct mm_struct *mm, unsigned long vmaddr)
 
 void pmd_fragment_free(unsigned long *pmd)
 {
-   struct page *page = virt_to_page(pmd);
+   struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
 
-   if (PageReserved(page))
-   return free_reserved_page(page);
+   if (pagetable_is_reserved(ptdesc))
+   return free_reserved_ptdesc(ptdesc);
 
-   BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
-   if (atomic_dec_and_test(&page->pt_frag_refcount)) {
-   pgtable_pmd_page_dtor(page);
-   __free_page(page);
+   BUG_ON(atomic_read(&ptdesc->pt_frag_refcount) <= 0);
+   if (atomic_dec_and_test(&ptdesc->pt_frag_refcount)) {
+   pagetable_pmd_dtor(ptdesc);
+   pagetable_free(ptdesc);
}
 }
 
diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
index 0c6b68130025..8c31802f97e8 100644
--- a/arch/powerpc/mm/pgtable-frag.c
+++ b/arch/powerpc/mm/pgtable-frag.c
@@ -18,15 +18,15 @@
 void pte_frag_destroy(void *pte_frag)
 {
int count;
-   struct page *page;
+   struct ptdesc *ptdesc;
 
-   page = virt_to_page(pte_frag);
+   ptdesc = virt_to_ptdesc(pte_frag);
/* drop all the pending references */
count = ((unsigned long)pte_frag & ~PAGE_MASK) >> PTE_FRAG_SIZE_SHIFT;
/* We allow PTE_FRAG_NR fragments from a PTE page */
-   if (atomic_sub_and_test(PTE_FRAG_NR - count, &page->pt_frag_refcount)) {
-   pgtable_pte_page_dtor(page);
-   __free_page(page);
+   if (atomic_sub_and_test(PTE_FRAG_NR - count, &ptdesc->pt_frag_refcount)) {
+   pagetable_pte_dtor(ptdesc);
+   pagetable_free(ptdesc);
}
 }
 
@@ -55,25 +55,25 @@ static pte_t *get_pte_from_cache(struct mm_struct *mm)
 static pte_t 

[PATCH mm-unstable v8 11/31] mm: Create ptdesc equivalents for pgtable_{pte,pmd}_page_{ctor,dtor}

2023-07-31 Thread Vishal Moola (Oracle)
Create pagetable_pte_ctor(), pagetable_pmd_ctor(), pagetable_pte_dtor(),
and pagetable_pmd_dtor(), and make the original pgtable
constructors/destructors wrappers around them.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Mike Rapoport (IBM) 
---
 include/linux/mm.h | 56 ++
 1 file changed, 42 insertions(+), 14 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ffddae95af78..bd3d99d81984 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2902,20 +2902,34 @@ static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; }
 static inline void ptlock_free(struct ptdesc *ptdesc) {}
 #endif /* USE_SPLIT_PTE_PTLOCKS */
 
-static inline bool pgtable_pte_page_ctor(struct page *page)
+static inline bool pagetable_pte_ctor(struct ptdesc *ptdesc)
 {
-   if (!ptlock_init(page_ptdesc(page)))
+   struct folio *folio = ptdesc_folio(ptdesc);
+
+   if (!ptlock_init(ptdesc))
return false;
-   __SetPageTable(page);
-   inc_lruvec_page_state(page, NR_PAGETABLE);
+   __folio_set_pgtable(folio);
+   lruvec_stat_add_folio(folio, NR_PAGETABLE);
return true;
 }
 
+static inline bool pgtable_pte_page_ctor(struct page *page)
+{
+   return pagetable_pte_ctor(page_ptdesc(page));
+}
+
+static inline void pagetable_pte_dtor(struct ptdesc *ptdesc)
+{
+   struct folio *folio = ptdesc_folio(ptdesc);
+
+   ptlock_free(ptdesc);
+   __folio_clear_pgtable(folio);
+   lruvec_stat_sub_folio(folio, NR_PAGETABLE);
+}
+
 static inline void pgtable_pte_page_dtor(struct page *page)
 {
-   ptlock_free(page_ptdesc(page));
-   __ClearPageTable(page);
-   dec_lruvec_page_state(page, NR_PAGETABLE);
+   pagetable_pte_dtor(page_ptdesc(page));
 }
 
 pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp);
@@ -3013,20 +3027,34 @@ static inline spinlock_t *pmd_lock(struct mm_struct *mm, pmd_t *pmd)
return ptl;
 }
 
-static inline bool pgtable_pmd_page_ctor(struct page *page)
+static inline bool pagetable_pmd_ctor(struct ptdesc *ptdesc)
 {
-   if (!pmd_ptlock_init(page_ptdesc(page)))
+   struct folio *folio = ptdesc_folio(ptdesc);
+
+   if (!pmd_ptlock_init(ptdesc))
return false;
-   __SetPageTable(page);
-   inc_lruvec_page_state(page, NR_PAGETABLE);
+   __folio_set_pgtable(folio);
+   lruvec_stat_add_folio(folio, NR_PAGETABLE);
return true;
 }
 
+static inline bool pgtable_pmd_page_ctor(struct page *page)
+{
+   return pagetable_pmd_ctor(page_ptdesc(page));
+}
+
+static inline void pagetable_pmd_dtor(struct ptdesc *ptdesc)
+{
+   struct folio *folio = ptdesc_folio(ptdesc);
+
+   pmd_ptlock_free(ptdesc);
+   __folio_clear_pgtable(folio);
+   lruvec_stat_sub_folio(folio, NR_PAGETABLE);
+}
+
 static inline void pgtable_pmd_page_dtor(struct page *page)
 {
-   pmd_ptlock_free(page_ptdesc(page));
-   __ClearPageTable(page);
-   dec_lruvec_page_state(page, NR_PAGETABLE);
+   pagetable_pmd_dtor(page_ptdesc(page));
 }
 
 /*
-- 
2.40.1




[PATCH mm-unstable v8 08/31] mm: Convert ptlock_init() to use ptdescs

2023-07-31 Thread Vishal Moola (Oracle)
This removes some direct accesses to struct page, working towards
splitting out struct ptdesc from struct page.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Mike Rapoport (IBM) 
---
 include/linux/mm.h | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 52ef09c100a2..675972d3f7e4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2873,7 +2873,7 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
return ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
 }
 
-static inline bool ptlock_init(struct page *page)
+static inline bool ptlock_init(struct ptdesc *ptdesc)
 {
/*
 * prep_new_page() initialize page->private (and therefore page->ptl)
@@ -2882,10 +2882,10 @@ static inline bool ptlock_init(struct page *page)
 * It can happen if arch try to use slab for page table allocation:
 * slab code uses page->slab_cache, which share storage with page->ptl.
 */
-   VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page);
-   if (!ptlock_alloc(page_ptdesc(page)))
+   VM_BUG_ON_PAGE(*(unsigned long *)&ptdesc->ptl, ptdesc_page(ptdesc));
+   if (!ptlock_alloc(ptdesc))
return false;
-   spin_lock_init(ptlock_ptr(page_ptdesc(page)));
+   spin_lock_init(ptlock_ptr(ptdesc));
return true;
 }
 
@@ -2898,13 +2898,13 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
return &mm->page_table_lock;
 }
 static inline void ptlock_cache_init(void) {}
-static inline bool ptlock_init(struct page *page) { return true; }
+static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; }
 static inline void ptlock_free(struct page *page) {}
 #endif /* USE_SPLIT_PTE_PTLOCKS */
 
 static inline bool pgtable_pte_page_ctor(struct page *page)
 {
-   if (!ptlock_init(page))
+   if (!ptlock_init(page_ptdesc(page)))
return false;
__SetPageTable(page);
inc_lruvec_page_state(page, NR_PAGETABLE);
@@ -2979,7 +2979,7 @@ static inline bool pmd_ptlock_init(struct ptdesc *ptdesc)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
ptdesc->pmd_huge_pte = NULL;
 #endif
-   return ptlock_init(ptdesc_page(ptdesc));
+   return ptlock_init(ptdesc);
 }
 
 static inline void pmd_ptlock_free(struct page *page)
-- 
2.40.1




[PATCH mm-unstable v8 09/31] mm: Convert pmd_ptlock_free() to use ptdescs

2023-07-31 Thread Vishal Moola (Oracle)
This removes some direct accesses to struct page, working towards
splitting out struct ptdesc from struct page.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Mike Rapoport (IBM) 
---
 include/linux/mm.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 675972d3f7e4..774fe83c0c16 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2982,12 +2982,12 @@ static inline bool pmd_ptlock_init(struct ptdesc *ptdesc)
return ptlock_init(ptdesc);
 }
 
-static inline void pmd_ptlock_free(struct page *page)
+static inline void pmd_ptlock_free(struct ptdesc *ptdesc)
 {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-   VM_BUG_ON_PAGE(page->pmd_huge_pte, page);
+   VM_BUG_ON_PAGE(ptdesc->pmd_huge_pte, ptdesc_page(ptdesc));
 #endif
-   ptlock_free(page);
+   ptlock_free(ptdesc_page(ptdesc));
 }
 
 #define pmd_huge_pte(mm, pmd) (pmd_ptdesc(pmd)->pmd_huge_pte)
@@ -3000,7 +3000,7 @@ static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
 }
 
 static inline bool pmd_ptlock_init(struct ptdesc *ptdesc) { return true; }
-static inline void pmd_ptlock_free(struct page *page) {}
+static inline void pmd_ptlock_free(struct ptdesc *ptdesc) {}
 
 #define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte)
 
@@ -3024,7 +3024,7 @@ static inline bool pgtable_pmd_page_ctor(struct page *page)
 
 static inline void pgtable_pmd_page_dtor(struct page *page)
 {
-   pmd_ptlock_free(page);
+   pmd_ptlock_free(page_ptdesc(page));
__ClearPageTable(page);
dec_lruvec_page_state(page, NR_PAGETABLE);
 }
-- 
2.40.1




[PATCH mm-unstable v8 10/31] mm: Convert ptlock_free() to use ptdescs

2023-07-31 Thread Vishal Moola (Oracle)
This removes some direct accesses to struct page, working towards
splitting out struct ptdesc from struct page.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Mike Rapoport (IBM) 
---
 include/linux/mm.h | 10 +-
 mm/memory.c|  4 ++--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 774fe83c0c16..ffddae95af78 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2842,7 +2842,7 @@ static inline void pagetable_free(struct ptdesc *pt)
 #if ALLOC_SPLIT_PTLOCKS
 void __init ptlock_cache_init(void);
 bool ptlock_alloc(struct ptdesc *ptdesc);
-extern void ptlock_free(struct page *page);
+void ptlock_free(struct ptdesc *ptdesc);
 
 static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc)
 {
@@ -2858,7 +2858,7 @@ static inline bool ptlock_alloc(struct ptdesc *ptdesc)
return true;
 }
 
-static inline void ptlock_free(struct page *page)
+static inline void ptlock_free(struct ptdesc *ptdesc)
 {
 }
 
@@ -2899,7 +2899,7 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
 }
 static inline void ptlock_cache_init(void) {}
 static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; }
-static inline void ptlock_free(struct page *page) {}
+static inline void ptlock_free(struct ptdesc *ptdesc) {}
 #endif /* USE_SPLIT_PTE_PTLOCKS */
 
 static inline bool pgtable_pte_page_ctor(struct page *page)
@@ -2913,7 +2913,7 @@ static inline bool pgtable_pte_page_ctor(struct page *page)
 
 static inline void pgtable_pte_page_dtor(struct page *page)
 {
-   ptlock_free(page);
+   ptlock_free(page_ptdesc(page));
__ClearPageTable(page);
dec_lruvec_page_state(page, NR_PAGETABLE);
 }
@@ -2987,7 +2987,7 @@ static inline void pmd_ptlock_free(struct ptdesc *ptdesc)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
VM_BUG_ON_PAGE(ptdesc->pmd_huge_pte, ptdesc_page(ptdesc));
 #endif
-   ptlock_free(ptdesc_page(ptdesc));
+   ptlock_free(ptdesc);
 }
 
 #define pmd_huge_pte(mm, pmd) (pmd_ptdesc(pmd)->pmd_huge_pte)
diff --git a/mm/memory.c b/mm/memory.c
index 4fee273595e2..e5e370cdac23 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6242,8 +6242,8 @@ bool ptlock_alloc(struct ptdesc *ptdesc)
return true;
 }
 
-void ptlock_free(struct page *page)
+void ptlock_free(struct ptdesc *ptdesc)
 {
-   kmem_cache_free(page_ptl_cachep, page->ptl);
+   kmem_cache_free(page_ptl_cachep, ptdesc->ptl);
 }
 #endif
-- 
2.40.1




[PATCH mm-unstable v8 07/31] mm: Convert pmd_ptlock_init() to use ptdescs

2023-07-31 Thread Vishal Moola (Oracle)
This removes some direct accesses to struct page, working towards
splitting out struct ptdesc from struct page.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Mike Rapoport (IBM) 
---
 include/linux/mm.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index c155f82dd2cc..52ef09c100a2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2974,12 +2974,12 @@ static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
return ptlock_ptr(pmd_ptdesc(pmd));
 }
 
-static inline bool pmd_ptlock_init(struct page *page)
+static inline bool pmd_ptlock_init(struct ptdesc *ptdesc)
 {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-   page->pmd_huge_pte = NULL;
+   ptdesc->pmd_huge_pte = NULL;
 #endif
-   return ptlock_init(page);
+   return ptlock_init(ptdesc_page(ptdesc));
 }
 
 static inline void pmd_ptlock_free(struct page *page)
@@ -2999,7 +2999,7 @@ static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
return &mm->page_table_lock;
 }
 
-static inline bool pmd_ptlock_init(struct page *page) { return true; }
+static inline bool pmd_ptlock_init(struct ptdesc *ptdesc) { return true; }
 static inline void pmd_ptlock_free(struct page *page) {}
 
 #define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte)
@@ -3015,7 +3015,7 @@ static inline spinlock_t *pmd_lock(struct mm_struct *mm, pmd_t *pmd)
 
 static inline bool pgtable_pmd_page_ctor(struct page *page)
 {
-   if (!pmd_ptlock_init(page))
+   if (!pmd_ptlock_init(page_ptdesc(page)))
return false;
__SetPageTable(page);
inc_lruvec_page_state(page, NR_PAGETABLE);
-- 
2.40.1




[PATCH mm-unstable v8 05/31] mm: Convert ptlock_alloc() to use ptdescs

2023-07-31 Thread Vishal Moola (Oracle)
This removes some direct accesses to struct page, working towards
splitting out struct ptdesc from struct page.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Mike Rapoport (IBM) 
---
 include/linux/mm.h | 6 +++---
 mm/memory.c| 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index bf552a106e4a..b3fce0bfe201 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2841,7 +2841,7 @@ static inline void pagetable_free(struct ptdesc *pt)
 #if USE_SPLIT_PTE_PTLOCKS
 #if ALLOC_SPLIT_PTLOCKS
 void __init ptlock_cache_init(void);
-extern bool ptlock_alloc(struct page *page);
+bool ptlock_alloc(struct ptdesc *ptdesc);
 extern void ptlock_free(struct page *page);
 
 static inline spinlock_t *ptlock_ptr(struct page *page)
@@ -2853,7 +2853,7 @@ static inline void ptlock_cache_init(void)
 {
 }
 
-static inline bool ptlock_alloc(struct page *page)
+static inline bool ptlock_alloc(struct ptdesc *ptdesc)
 {
return true;
 }
@@ -2883,7 +2883,7 @@ static inline bool ptlock_init(struct page *page)
 * slab code uses page->slab_cache, which share storage with page->ptl.
 */
VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page);
-   if (!ptlock_alloc(page))
+   if (!ptlock_alloc(page_ptdesc(page)))
return false;
spin_lock_init(ptlock_ptr(page));
return true;
diff --git a/mm/memory.c b/mm/memory.c
index 2130bad76eb1..4fee273595e2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6231,14 +6231,14 @@ void __init ptlock_cache_init(void)
SLAB_PANIC, NULL);
 }
 
-bool ptlock_alloc(struct page *page)
+bool ptlock_alloc(struct ptdesc *ptdesc)
 {
spinlock_t *ptl;
 
ptl = kmem_cache_alloc(page_ptl_cachep, GFP_KERNEL);
if (!ptl)
return false;
-   page->ptl = ptl;
+   ptdesc->ptl = ptl;
return true;
 }
 
-- 
2.40.1




[PATCH mm-unstable v8 03/31] mm: add utility functions for ptdesc

2023-07-31 Thread Vishal Moola (Oracle)
Introduce utility functions setting the foundation for ptdescs. These
will also assist in the splitting out of ptdesc from struct page.

Functions that focus on the descriptor are prefixed with ptdesc_* while
functions that focus on the pagetable are prefixed with pagetable_*.

pagetable_alloc() is defined to allocate new ptdesc pages as compound
pages. This is to standardize ptdescs by allowing for one allocation
and one free function, in contrast to 2 allocation and 2 free functions.

Signed-off-by: Vishal Moola (Oracle) 
---
 include/asm-generic/tlb.h | 11 +++
 include/linux/mm.h| 61 +++
 include/linux/pgtable.h   | 12 
 3 files changed, 84 insertions(+)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index bc32a2284c56..129a3a759976 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -480,6 +480,17 @@ static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
return tlb_remove_page_size(tlb, page, PAGE_SIZE);
 }
 
+static inline void tlb_remove_ptdesc(struct mmu_gather *tlb, void *pt)
+{
+   tlb_remove_table(tlb, pt);
+}
+
+/* Like tlb_remove_ptdesc, but for page-like page directories. */
+static inline void tlb_remove_page_ptdesc(struct mmu_gather *tlb, struct ptdesc *pt)
+{
+   tlb_remove_page(tlb, ptdesc_page(pt));
+}
+
 static inline void tlb_change_page_size(struct mmu_gather *tlb,
 unsigned int page_size)
 {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2ba73f09ae4a..3fda0ad41cf2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2787,6 +2787,57 @@ static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long a
 }
 #endif /* CONFIG_MMU */
 
+static inline struct ptdesc *virt_to_ptdesc(const void *x)
+{
+   return page_ptdesc(virt_to_page(x));
+}
+
+static inline void *ptdesc_to_virt(const struct ptdesc *pt)
+{
+   return page_to_virt(ptdesc_page(pt));
+}
+
+static inline void *ptdesc_address(const struct ptdesc *pt)
+{
+   return folio_address(ptdesc_folio(pt));
+}
+
+static inline bool pagetable_is_reserved(struct ptdesc *pt)
+{
+   return folio_test_reserved(ptdesc_folio(pt));
+}
+
+/**
+ * pagetable_alloc - Allocate pagetables
+ * @gfp:    GFP flags
+ * @order:  desired pagetable order
+ *
+ * pagetable_alloc allocates memory for page tables as well as a page table
+ * descriptor to describe that memory.
+ *
+ * Return: The ptdesc describing the allocated page tables.
+ */
+static inline struct ptdesc *pagetable_alloc(gfp_t gfp, unsigned int order)
+{
+   struct page *page = alloc_pages(gfp | __GFP_COMP, order);
+
+   return page_ptdesc(page);
+}
+
+/**
+ * pagetable_free - Free pagetables
+ * @pt:     The page table descriptor
+ *
+ * pagetable_free frees the memory of all page tables described by a page
+ * table descriptor and the memory for the descriptor itself.
+ */
+static inline void pagetable_free(struct ptdesc *pt)
+{
+   struct page *page = ptdesc_page(pt);
+
+   __free_pages(page, compound_order(page));
+}
+
 #if USE_SPLIT_PTE_PTLOCKS
 #if ALLOC_SPLIT_PTLOCKS
 void __init ptlock_cache_init(void);
@@ -2913,6 +2964,11 @@ static inline struct page *pmd_pgtable_page(pmd_t *pmd)
return virt_to_page((void *)((unsigned long) pmd & mask));
 }
 
+static inline struct ptdesc *pmd_ptdesc(pmd_t *pmd)
+{
+   return page_ptdesc(pmd_pgtable_page(pmd));
+}
+
 static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
 {
return ptlock_ptr(pmd_pgtable_page(pmd));
@@ -3025,6 +3081,11 @@ static inline void mark_page_reserved(struct page *page)
adjust_managed_page_count(page, -1);
 }
 
+static inline void free_reserved_ptdesc(struct ptdesc *pt)
+{
+   free_reserved_page(ptdesc_page(pt));
+}
+
 /*
  * Default method to free all the __init memory into the buddy system.
  * The freed pages will be poisoned with pattern "poison" if it's within
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 1f92514d54b0..250fdeba68f3 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1064,6 +1064,18 @@ TABLE_MATCH(memcg_data, pt_memcg_data);
 #undef TABLE_MATCH
 static_assert(sizeof(struct ptdesc) <= sizeof(struct page));
 
+#define ptdesc_page(pt)(_Generic((pt), \
+   const struct ptdesc *:  (const struct page *)(pt),  \
+   struct ptdesc *:(struct page *)(pt)))
+
+#define ptdesc_folio(pt)   (_Generic((pt), \
+   const struct ptdesc *:  (const struct folio *)(pt), \
+   struct ptdesc *:(struct folio *)(pt)))
+
+#define page_ptdesc(p) (_Generic((p),  \
+   const struct page *:(const struct ptdesc *)(p), \
+   struct page *:  (struct ptdesc *)(p)))
+
 /*
  * No-op macros 

[PATCH mm-unstable v8 06/31] mm: Convert ptlock_ptr() to use ptdescs

2023-07-31 Thread Vishal Moola (Oracle)
This removes some direct accesses to struct page, working towards
splitting out struct ptdesc from struct page.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Mike Rapoport (IBM) 
---
 arch/x86/xen/mmu_pv.c |  2 +-
 include/linux/mm.h| 14 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index e0a975165de7..8796ec310483 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -667,7 +667,7 @@ static spinlock_t *xen_pte_lock(struct page *page, struct mm_struct *mm)
spinlock_t *ptl = NULL;
 
 #if USE_SPLIT_PTE_PTLOCKS
-   ptl = ptlock_ptr(page);
+   ptl = ptlock_ptr(page_ptdesc(page));
spin_lock_nest_lock(ptl, >page_table_lock);
 #endif
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b3fce0bfe201..c155f82dd2cc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2844,9 +2844,9 @@ void __init ptlock_cache_init(void);
 bool ptlock_alloc(struct ptdesc *ptdesc);
 extern void ptlock_free(struct page *page);
 
-static inline spinlock_t *ptlock_ptr(struct page *page)
+static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc)
 {
-   return page->ptl;
+   return ptdesc->ptl;
 }
 #else /* ALLOC_SPLIT_PTLOCKS */
 static inline void ptlock_cache_init(void)
@@ -2862,15 +2862,15 @@ static inline void ptlock_free(struct page *page)
 {
 }
 
-static inline spinlock_t *ptlock_ptr(struct page *page)
+static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc)
 {
-   return &page->ptl;
+   return &ptdesc->ptl;
 }
 #endif /* ALLOC_SPLIT_PTLOCKS */
 
 static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
 {
-   return ptlock_ptr(pmd_page(*pmd));
+   return ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
 }
 
 static inline bool ptlock_init(struct page *page)
@@ -2885,7 +2885,7 @@ static inline bool ptlock_init(struct page *page)
VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page);
if (!ptlock_alloc(page_ptdesc(page)))
return false;
-   spin_lock_init(ptlock_ptr(page));
+   spin_lock_init(ptlock_ptr(page_ptdesc(page)));
return true;
 }
 
@@ -2971,7 +2971,7 @@ static inline struct ptdesc *pmd_ptdesc(pmd_t *pmd)
 
 static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
 {
-   return ptlock_ptr(ptdesc_page(pmd_ptdesc(pmd)));
+   return ptlock_ptr(pmd_ptdesc(pmd));
 }
 
 static inline bool pmd_ptlock_init(struct page *page)
-- 
2.40.1




[PATCH mm-unstable v8 02/31] pgtable: Create struct ptdesc

2023-07-31 Thread Vishal Moola (Oracle)
Currently, page table information is stored within struct page. As part
of simplifying struct page, create struct ptdesc for page table
information.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Mike Rapoport (IBM) 
---
 include/linux/pgtable.h | 71 +
 1 file changed, 71 insertions(+)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 5f36c055794b..1f92514d54b0 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -993,6 +993,77 @@ static inline void ptep_modify_prot_commit(struct vm_area_struct *vma,
 #endif /* __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION */
 #endif /* CONFIG_MMU */
 
+
+/**
+ * struct ptdesc - Memory descriptor for page tables.
+ * @__page_flags:     Same as page flags. Unused for page tables.
+ * @pt_rcu_head:      For freeing page table pages.
+ * @pt_list:          List of used page tables. Used for s390 and x86.
+ * @_pt_pad_1:        Padding that aliases with page's compound head.
+ * @pmd_huge_pte:     Protected by ptdesc->ptl, used for THPs.
+ * @__page_mapping:   Aliases with page->mapping. Unused for page tables.
+ * @pt_mm:            Used for x86 pgds.
+ * @pt_frag_refcount: For fragmented page table tracking. Powerpc and s390 only.
+ * @_pt_pad_2:        Padding to ensure proper alignment.
+ * @ptl:              Lock for the page table.
+ * @__page_type:      Same as page->page_type. Unused for page tables.
+ * @_refcount:        Same as page refcount. Used for s390 page tables.
+ * @pt_memcg_data:    Memcg data. Tracked for page tables here.
+ *
+ * This struct overlays struct page for now. Do not modify without a good
+ * understanding of the issues.
+ */
+struct ptdesc {
+   unsigned long __page_flags;
+
+   union {
+   struct rcu_head pt_rcu_head;
+   struct list_head pt_list;
+   struct {
+   unsigned long _pt_pad_1;
+   pgtable_t pmd_huge_pte;
+   };
+   };
+   unsigned long __page_mapping;
+
+   union {
+   struct mm_struct *pt_mm;
+   atomic_t pt_frag_refcount;
+   };
+
+   union {
+   unsigned long _pt_pad_2;
+#if ALLOC_SPLIT_PTLOCKS
+   spinlock_t *ptl;
+#else
+   spinlock_t ptl;
+#endif
+   };
+   unsigned int __page_type;
+   atomic_t _refcount;
+#ifdef CONFIG_MEMCG
+   unsigned long pt_memcg_data;
+#endif
+};
+
+#define TABLE_MATCH(pg, pt)\
+   static_assert(offsetof(struct page, pg) == offsetof(struct ptdesc, pt))
+TABLE_MATCH(flags, __page_flags);
+TABLE_MATCH(compound_head, pt_list);
+TABLE_MATCH(compound_head, _pt_pad_1);
+TABLE_MATCH(pmd_huge_pte, pmd_huge_pte);
+TABLE_MATCH(mapping, __page_mapping);
+TABLE_MATCH(pt_mm, pt_mm);
+TABLE_MATCH(ptl, ptl);
+TABLE_MATCH(rcu_head, pt_rcu_head);
+TABLE_MATCH(page_type, __page_type);
+TABLE_MATCH(_refcount, _refcount);
+#ifdef CONFIG_MEMCG
+TABLE_MATCH(memcg_data, pt_memcg_data);
+#endif
+#undef TABLE_MATCH
+static_assert(sizeof(struct ptdesc) <= sizeof(struct page));
+
 /*
  * No-op macros that just return the current protection value. Defined here
  * because these macros can be used even if CONFIG_MMU is not defined.
-- 
2.40.1




[PATCH mm-unstable v8 04/31] mm: Convert pmd_pgtable_page() callers to use pmd_ptdesc()

2023-07-31 Thread Vishal Moola (Oracle)
Converts internal pmd_pgtable_page() callers to use pmd_ptdesc(). This
removes some direct accesses to struct page, working towards splitting
out struct ptdesc from struct page.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Mike Rapoport (IBM) 
---
 include/linux/mm.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3fda0ad41cf2..bf552a106e4a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2971,7 +2971,7 @@ static inline struct ptdesc *pmd_ptdesc(pmd_t *pmd)
 
 static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
 {
-   return ptlock_ptr(pmd_pgtable_page(pmd));
+   return ptlock_ptr(ptdesc_page(pmd_ptdesc(pmd)));
 }
 
 static inline bool pmd_ptlock_init(struct page *page)
@@ -2990,7 +2990,7 @@ static inline void pmd_ptlock_free(struct page *page)
ptlock_free(page);
 }
 
-#define pmd_huge_pte(mm, pmd) (pmd_pgtable_page(pmd)->pmd_huge_pte)
+#define pmd_huge_pte(mm, pmd) (pmd_ptdesc(pmd)->pmd_huge_pte)
 
 #else
 
-- 
2.40.1




[PATCH mm-unstable v8 00/31] Split ptdesc from struct page

2023-07-31 Thread Vishal Moola (Oracle)
The MM subsystem is trying to shrink struct page. This patchset
introduces a memory descriptor for page table tracking - struct ptdesc.

This patchset introduces ptdesc, splits ptdesc from struct page, and
converts many callers of page table constructor/destructors to use ptdescs.

Ptdesc is a foundation to further standardize page tables, and eventually
allow for dynamic allocation of page tables independent of struct page.
However, the use of pages for page table tracking is quite deeply
ingrained and varied across architectures, so there is still a lot of
work to be done before that can happen.

This is rebased on mm-unstable.

v8:
  Fix some compiler issues

v7:
  Drop s390 gmap ptdesc conversions - gmap is an unnecessary complication
that can be dealt with later
  Be more thorough with ptdesc struct sanity checks and comments
  Rebase onto mm-unstable

Vishal Moola (Oracle) (31):
  mm: Add PAGE_TYPE_OP folio functions
  pgtable: Create struct ptdesc
  mm: add utility functions for ptdesc
  mm: Convert pmd_pgtable_page() callers to use pmd_ptdesc()
  mm: Convert ptlock_alloc() to use ptdescs
  mm: Convert ptlock_ptr() to use ptdescs
  mm: Convert pmd_ptlock_init() to use ptdescs
  mm: Convert ptlock_init() to use ptdescs
  mm: Convert pmd_ptlock_free() to use ptdescs
  mm: Convert ptlock_free() to use ptdescs
  mm: Create ptdesc equivalents for pgtable_{pte,pmd}_page_{ctor,dtor}
  powerpc: Convert various functions to use ptdescs
  x86: Convert various functions to use ptdescs
  s390: Convert various pgalloc functions to use ptdescs
  mm: Remove page table members from struct page
  pgalloc: Convert various functions to use ptdescs
  arm: Convert various functions to use ptdescs
  arm64: Convert various functions to use ptdescs
  csky: Convert __pte_free_tlb() to use ptdescs
  hexagon: Convert __pte_free_tlb() to use ptdescs
  loongarch: Convert various functions to use ptdescs
  m68k: Convert various functions to use ptdescs
  mips: Convert various functions to use ptdescs
  nios2: Convert __pte_free_tlb() to use ptdescs
  openrisc: Convert __pte_free_tlb() to use ptdescs
  riscv: Convert alloc_{pmd, pte}_late() to use ptdescs
  sh: Convert pte_free_tlb() to use ptdescs
  sparc64: Convert various functions to use ptdescs
  sparc: Convert pgtable_pte_page_{ctor, dtor}() to ptdesc equivalents
  um: Convert {pmd, pte}_free_tlb() to use ptdescs
  mm: Remove pgtable_{pmd, pte}_page_{ctor, dtor}() wrappers

 Documentation/mm/split_page_table_lock.rst|  12 +-
 .../zh_CN/mm/split_page_table_lock.rst|  14 +-
 arch/arm/include/asm/tlb.h|  12 +-
 arch/arm/mm/mmu.c |   7 +-
 arch/arm64/include/asm/tlb.h  |  14 +-
 arch/arm64/mm/mmu.c   |   7 +-
 arch/csky/include/asm/pgalloc.h   |   4 +-
 arch/hexagon/include/asm/pgalloc.h|   8 +-
 arch/loongarch/include/asm/pgalloc.h  |  27 ++--
 arch/loongarch/mm/pgtable.c   |   7 +-
 arch/m68k/include/asm/mcf_pgalloc.h   |  47 +++---
 arch/m68k/include/asm/sun3_pgalloc.h  |   8 +-
 arch/m68k/mm/motorola.c   |   4 +-
 arch/mips/include/asm/pgalloc.h   |  32 ++--
 arch/mips/mm/pgtable.c|   8 +-
 arch/nios2/include/asm/pgalloc.h  |   8 +-
 arch/openrisc/include/asm/pgalloc.h   |   8 +-
 arch/powerpc/mm/book3s64/mmu_context.c|  10 +-
 arch/powerpc/mm/book3s64/pgtable.c|  32 ++--
 arch/powerpc/mm/pgtable-frag.c|  58 +++
 arch/riscv/include/asm/pgalloc.h  |   8 +-
 arch/riscv/mm/init.c  |  16 +-
 arch/s390/include/asm/pgalloc.h   |   4 +-
 arch/s390/include/asm/tlb.h   |   4 +-
 arch/s390/mm/pgalloc.c| 128 +++
 arch/sh/include/asm/pgalloc.h |   9 +-
 arch/sparc/mm/init_64.c   |  17 +-
 arch/sparc/mm/srmmu.c |   5 +-
 arch/um/include/asm/pgalloc.h |  18 +--
 arch/x86/mm/pgtable.c |  47 +++---
 arch/x86/xen/mmu_pv.c |   2 +-
 include/asm-generic/pgalloc.h |  88 +-
 include/asm-generic/tlb.h |  11 ++
 include/linux/mm.h| 151 +-
 include/linux/mm_types.h  |  18 ---
 include/linux/page-flags.h|  30 +++-
 include/linux/pgtable.h   |  80 ++
 mm/memory.c   |   8 +-
 38 files changed, 586 insertions(+), 385 deletions(-)

-- 
2.40.1




[PATCH mm-unstable v8 01/31] mm: Add PAGE_TYPE_OP folio functions

2023-07-31 Thread Vishal Moola (Oracle)
No folio equivalents for page type operations have been defined, so
define them for later folio conversions.

Also changes the Page##uname macros to take in const struct page* since
we only read the memory here.

Signed-off-by: Vishal Moola (Oracle) 
Acked-by: Mike Rapoport (IBM) 
---
 include/linux/page-flags.h | 30 +++---
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 92a2063a0a23..9218028caf33 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -908,6 +908,8 @@ static inline bool is_page_hwpoison(struct page *page)
 
 #define PageType(page, flag)   \
((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
+#define folio_test_type(folio, flag)   \
+   ((folio->page.page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
 
 static inline int page_type_has_type(unsigned int page_type)
 {
@@ -919,27 +921,41 @@ static inline int page_has_type(struct page *page)
return page_type_has_type(page->page_type);
 }
 
-#define PAGE_TYPE_OPS(uname, lname)\
-static __always_inline int Page##uname(struct page *page)  \
+#define PAGE_TYPE_OPS(uname, lname, fname) \
+static __always_inline int Page##uname(const struct page *page) \
 {  \
return PageType(page, PG_##lname);  \
 }  \
+static __always_inline int folio_test_##fname(const struct folio *folio)\
+{  \
+   return folio_test_type(folio, PG_##lname);  \
+}  \
 static __always_inline void __SetPage##uname(struct page *page) \
 {  \
VM_BUG_ON_PAGE(!PageType(page, 0), page);   \
page->page_type &= ~PG_##lname; \
 }  \
+static __always_inline void __folio_set_##fname(struct folio *folio)   \
+{  \
+   VM_BUG_ON_FOLIO(!folio_test_type(folio, 0), folio); \
+   folio->page.page_type &= ~PG_##lname;   \
+}  \
 static __always_inline void __ClearPage##uname(struct page *page)  \
 {  \
VM_BUG_ON_PAGE(!Page##uname(page), page);   \
page->page_type |= PG_##lname;  \
-}
+}  \
+static __always_inline void __folio_clear_##fname(struct folio *folio) \
+{  \
+   VM_BUG_ON_FOLIO(!folio_test_##fname(folio), folio); \
+   folio->page.page_type |= PG_##lname;\
+}  \
 
 /*
  * PageBuddy() indicates that the page is free and in the buddy system
  * (see mm/page_alloc.c).
  */
-PAGE_TYPE_OPS(Buddy, buddy)
+PAGE_TYPE_OPS(Buddy, buddy, buddy)
 
 /*
  * PageOffline() indicates that the page is logically offline although the
@@ -963,7 +979,7 @@ PAGE_TYPE_OPS(Buddy, buddy)
  * pages should check PageOffline() and synchronize with such drivers using
  * page_offline_freeze()/page_offline_thaw().
  */
-PAGE_TYPE_OPS(Offline, offline)
+PAGE_TYPE_OPS(Offline, offline, offline)
 
 extern void page_offline_freeze(void);
 extern void page_offline_thaw(void);
@@ -973,12 +989,12 @@ extern void page_offline_end(void);
 /*
  * Marks pages in use as page tables.
  */
-PAGE_TYPE_OPS(Table, table)
+PAGE_TYPE_OPS(Table, table, pgtable)
 
 /*
  * Marks guardpages used with debug_pagealloc.
  */
-PAGE_TYPE_OPS(Guard, guard)
+PAGE_TYPE_OPS(Guard, guard, guard)
 
 extern bool is_free_buddy_page(struct page *page);
 
-- 
2.40.1




Xen Security Advisory 433 v3 (CVE-2023-20593) - x86/AMD: Zenbleed

2023-07-31 Thread Xen . org security team
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Xen Security Advisory CVE-2023-20593 / XSA-433
   version 3

  x86/AMD: Zenbleed

UPDATES IN VERSION 3
====================

The patch provided with earlier versions was buggy.  It unintentionally
disabled more bits than expected in the control register.  The contents
of this register are not generally known, so the effects on the system
are unknown.

A patch correcting this error has been committed and backported to all stable
trees which got the XSA-433 fix originally.  Additionally, it is attached to
this advisory as xsa433-bugfix.patch, and applicable to all branches in this
form.

ISSUE DESCRIPTION
=================

Researchers at Google have discovered Zenbleed, a hardware bug causing
corruption of the vector registers.

When a VZEROUPPER instruction is discarded as part of a bad transient
execution path, its effects on internal tracking are not unwound
correctly.  This manifests as the wrong micro-architectural state
becoming architectural, and corrupting the vector registers.

Note: While this malfunction is related to speculative execution, this
  is not a speculative side-channel vulnerability.

The corruption is not random.  It happens to be stale values from the
physical vector register file, a structure competitively shared between
sibling threads.  Therefore, an attacker can directly access data from
the sibling thread, or from a more privileged context.

For more details, see:
  https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7008.html
  
https://github.com/google/security-research/security/advisories/GHSA-v6wh-rxpg-cmm8

IMPACT
======

With very low probability, corruption of the vector registers can occur.
This data corruption causes mis-calculations in subsequent logic.

An attacker can exploit this bug to read data from different contexts on
the same core.  Examples of such data include key material, ciphertext and
plaintext from the AES-NI instructions, or the contents of REP-MOVS
instructions, commonly used to implement memcpy().

VULNERABLE SYSTEMS
==================

Systems running all versions of Xen are affected.

This bug is specific to the AMD Zen2 microarchitecture.  AMD do not
believe that other microarchitectures are affected.

MITIGATION
==========

This issue can be mitigated by disabling AVX, either by booting Xen with
`cpuid=no-avx` on the command line, or by specifying `cpuid="host:avx=0"` in
the vm.cfg file of all untrusted VMs.  However, this will come with a
significant impact on the system and is not recommended for anyone able to
deploy the microcode or patch described below.

RESOLUTION
==========

AMD are producing microcode updates to address the bug.  Consult your
dom0 OS vendor.  This microcode is effective when late-loaded, which can
be performed on a live system without reboot.

In cases where microcode is not available, the appropriate attached
patch updates Xen to use a control register to avoid the issue.

Note that patches for released versions are generally prepared to
apply to the stable branches, and may not apply cleanly to the most
recent release tarball.  Downstreams are encouraged to update to the
tip of the stable branch before applying these patches.

xsa433.patch   xen-unstable
xsa433-4.17.patch  Xen 4.17.x
xsa433-4.16.patch  Xen 4.16.x
xsa433-4.15.patch  Xen 4.15.x
xsa433-4.14.patch  Xen 4.14.x

xsa433-bugfix.patchxen-unstable - Xen 4.14.x

$ sha256sum xsa433*
a9331733b63e3e566f1436a48e9bd9e8b86eb48da6a8ced72ff4affb7859e027  xsa433.patch
6f1db2a2078b0152631f819f8ddee21720dabe185ec49dc9806d4a9d3478adfd  xsa433-4.14.patch
ca3a92605195307ae9b6ff87240beb52a097c125a760c919d7b9a0aff6e557c0  xsa433-4.15.patch
e5e94b3de68842a1c8d222802fb204d64acd118e3293c8e909dfaf3ada23d912  xsa433-4.16.patch
41d12104869b7e8307cd93af1af12b4fd75a669aeff15d31b234dc72981ae407  xsa433-4.17.patch
b197e45aef1f47b6aebc005f876e3f593c2f32b9e5164a195f487cea6e174f75  xsa433-bugfix.patch
$

NOTE CONCERNING TIMELINE
========================

This issue is subject to coordinated disclosure on August 8th.  The
discoverer chose to publish details ahead of this timeline.
-BEGIN PGP SIGNATURE-

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAmTH6HQMHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZlIoH/jv0CJKyFgiaOLp4DFeLfzKLHJDbLKywj0bv4Q3V
wgrWVYwzVbpPwvuArS1dOujgEosTiUggKbzDPEpHa5reVKeeLwCBFxMrU+KYRf9h
6eglOJfiW73xxyggnvQLyh3tEGY0sQF0+OFQMsN5twiXsZS0pxLPomq0slun1VkV
8ZDl4FKjmEmAurE7fOtVdvzwZ6tKVLNaGYIm4wUwNZ0Cd4qo1GHIHsvUT9ZPFc82
jwMjCwk7Ca0Iv1GMyXESwOyR/0tLm07nT9isdkXcVFNgg8JL4f2CxGK9Vt97POEw
w9KVo3SoBf+/vY4Fk4HGSXieEofzVBDjO5NkPhESEC+3oMw=
=Z3fJ
-END PGP SIGNATURE-


xsa433.patch
Description: Binary data


xsa433-4.14.patch
Description: Binary data


xsa433-4.15.patch
Description: Binary data


xsa433-4.16.patch
Description: Binary data


xsa433-4.17.patch
Description: Binary data


xsa433-bugfix.patch
Description: Binary data

Re: Python in Domain Configurations

2023-07-31 Thread Elliott Mitchell
On Mon, Jul 31, 2023 at 05:59:55AM +0200, Marek Marczykowski-Górecki wrote:
> On Mon, Jul 24, 2023 at 01:28:24PM -0700, Elliott Mitchell wrote:
> > On Fri, Jul 07, 2023 at 03:13:07PM -0700, Elliott Mitchell wrote:
> > > 
> > > The only context I could find was 54fbaf446b and
> > > https://wiki.xenproject.org/wiki/PythonInXlConfig which don't explain
> > > the reasoning.
> > > 
> > > Would the maintainers be amenable to revisiting the decision to remove
> > > support for full Python in domain configuration files?
> > 
> > Any chance of this getting a response?
> > 
> > On examination it appears domain configuration files are a proper subset
> > of Python.  The interface to the parser is a bit interesting, but it
> > looks fairly simple to replace the parser with libpython.
> > 
> > My goal is to create an init script for some automatically started
> > domains.  Issue is there can be ordering concerns with domain start/stop,
> > and this seems best handled by adding an extra setting to the
> > configuration files.  If full Python syntax is available, I can use that
> > for this extra data.

> I don't know full history here, but from my point of view, having a
> full-fledged script as a config file is undesirable for several reasons:
>  - it's easy to have unintended side effects of just loading a config
>file
>  - loading config file can no longer be assumed to be "cheap"
>  - dynamic config file means you can no longer rely on file timestamp/hash
>to check if anything changed (I don't think it's an issue for the
>current xl/libxl, but could be for some higher level tools)
>  - leads to issues with various sandboxes - for example SELinux policy
>allowing scripted config files would be excessively permissive
> 
> So, IMHO reducing config file from a full python (like it used to be in
> xend times) into a static file with well defined syntax was an
> improvement. Let's not go backward.

I wouldn't really call the existing format "well defined".  While there
are patterns which are followed, some of those have rather a lot of
wiggle room.

I'm still looking, but I suspect libpython can be told to only accept
trivial operations (assignments to variables) and reject anything which
includes a jump or conditional.


> As for your original problem, IIUC you would like to add some data that
> would _not_ be interpreted by libxl, right? For that you can use
> comments with some specific marker for your script. This approach used
> to work well for SysV init script, and in fact for a very similar use case
> (ordering and dependencies, among other things).

That is /not/ the issue.  `xl` simply ignores any variables which it
doesn't interpret (this is in fact a Bad Thing).  I need to know what the
limits to the syntax are.

Notice how many init scripts do `. /etc/default/` to load
configuration?  I'm thinking it would be very handy to use a similar
technique to load domain.cfg files, with Python being the interpreter.


I also think some portions of the domain.cfg format might work better
with full Python syntax. For example, might it be handier to allow:

disk = [
{
'vdev': 'xvda',
'format': 'raw',
'access': 'rw',
'target': '/dev/disk/by-path/foo-bar-baz',
},
]


It looks pretty feasible to replace the low-level parser with libpython.
Now to examine the "ast" module and find out whether a file can be
loaded while rejecting conditionals.


-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi

2023-07-31 Thread Chen, Jiqian
Hi,

On 2023/3/18 04:55, Stefano Stabellini wrote:
> On Fri, 17 Mar 2023, Roger Pau Monné wrote:
>> On Fri, Mar 17, 2023 at 11:15:37AM -0700, Stefano Stabellini wrote:
>>> On Fri, 17 Mar 2023, Roger Pau Monné wrote:
 On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote:
> On 17.03.2023 00:19, Stefano Stabellini wrote:
>> On Thu, 16 Mar 2023, Jan Beulich wrote:
>>> So yes, it then all boils down to that Linux-
>>> internal question.
>>
>> Excellent question but we'll have to wait for Ray as he is the one with
>> access to the hardware. But I have this data I can share in the
>> meantime:
>>
>> [1.260378] IRQ to pin mappings:
>> [1.260387] IRQ1 -> 0:1
>> [1.260395] IRQ2 -> 0:2
>> [1.260403] IRQ3 -> 0:3
>> [1.260410] IRQ4 -> 0:4
>> [1.260418] IRQ5 -> 0:5
>> [1.260425] IRQ6 -> 0:6
>> [1.260432] IRQ7 -> 0:7
>> [1.260440] IRQ8 -> 0:8
>> [1.260447] IRQ9 -> 0:9
>> [1.260455] IRQ10 -> 0:10
>> [1.260462] IRQ11 -> 0:11
>> [1.260470] IRQ12 -> 0:12
>> [1.260478] IRQ13 -> 0:13
>> [1.260485] IRQ14 -> 0:14
>> [1.260493] IRQ15 -> 0:15
>> [1.260505] IRQ106 -> 1:8
>> [1.260513] IRQ112 -> 1:4
>> [1.260521] IRQ116 -> 1:13
>> [1.260529] IRQ117 -> 1:14
>> [1.260537] IRQ118 -> 1:15
>> [1.260544]  done.
>
> And what does Linux think are IRQs 16 ... 105? Have you compared with
> Linux running baremetal on the same hardware?

 So I have some emails from Ray from he time he was looking into this,
 and on Linux dom0 PVH dmesg there is:

 [0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec0, GSI 0-23
 [0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55

 So it seems the vIO-APIC data provided by Xen to dom0 is at least
 consistent.
  
>> And I think Ray traced the point in Linux where Linux gives us an IRQ ==
>> 112 (which is the one causing issues):
>>
>> __acpi_register_gsi->
>> acpi_register_gsi_ioapic->
>> mp_map_gsi_to_irq->
>> mp_map_pin_to_irq->
>> __irq_resolve_mapping()
>>
>> if (likely(data)) {
>> desc = irq_data_to_desc(data);
>> if (irq)
>> *irq = data->irq;
>> /* this IRQ is 112, IO-APIC-34 domain */
>> }


 Could this all be a result of patch 4/5 in the Linux series ("[RFC
 PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different
 __acpi_register_gsi hook is installed for PVH in order to setup GSIs
 using PHYSDEV ops instead of doing it natively from the IO-APIC?

 FWIW, the introduced function in that patch
 (acpi_register_gsi_xen_pvh()) seems to unconditionally call
 acpi_register_gsi_ioapic() without checking if the GSI is already
 registered, which might lead to multiple IRQs being allocated for the
 same underlying GSI?
>>>
>>> I understand this point and I think it needs investigating.
>>>
>>>
 As I commented there, I think that approach is wrong.  If the GSI has
 not been mapped in Xen (because dom0 hasn't unmasked the respective
 IO-APIC pin) we should add some logic in the toolstack to map it
 before attempting to bind.
>>>
>>> But this statement confuses me. The toolstack doesn't get involved in
>>> IRQ setup for PCI devices for HVM guests?
>>
>> It does for GSI interrupts AFAICT, see pci_add_dm_done() and the call
>> to xc_physdev_map_pirq().  I'm not sure whether that's a remnant that
>> cold be removed (maybe for qemu-trad only?) or it's also required by
>> QEMU upstream, I would have to investigate more.
> 
> You are right. I am not certain, but it seems like a mistake in the
> toolstack to me. In theory, pci_add_dm_done should only be needed for PV
> guests, not for HVM guests. I am not sure. But I can see the call to
> xc_physdev_map_pirq you were referring to now.
> 
> 
>> It's my understanding it's in pci_add_dm_done() where Ray was getting
>> the mismatched IRQ vs GSI number.
> 
> I think the mismatch was actually caused by the xc_physdev_map_pirq call
> from QEMU, which makes sense because in any case it should happen before
> the same call done by pci_add_dm_done (pci_add_dm_done is called after
> sending the pci passthrough QMP command to QEMU). So the first to hit
> the IRQ!=GSI problem would be QEMU.


Sorry for replying to you so late, and thank you all for the review. I realized 
that your questions mainly focus on the following points: 1. Why is irq not 
equal to gsi? 2. Why do I translate between irq and gsi? 3. Why do I call 
PHYSDEVOP_map_pirq in acpi_register_gsi_xen_pvh()? 4. Why do I call 
PHYSDEVOP_setup_gsi in 

RE: [PATCH v9 05/36] x86/opcode: Add ERETU, ERETS instructions to x86-opcode-map

2023-07-31 Thread Li, Xin3
> > Add instruction opcodes used by FRED ERETU/ERETS to x86-opcode-map.
> >
> > Opcode numbers are per FRED spec v5.0.
> >
> > Signed-off-by: H. Peter Anvin (Intel) 
> > Tested-by: Shan Kang 
> > Signed-off-by: Xin Li 
> 
> This looks good to me. (ERETS has the opcode F2 0F 01 CA, ERETU has the opcode
> F3 0F 01 CA)
> 
> Reviewed-by: Masami Hiramatsu (Google) 
> 
> Thank you,

Thanks! Will add your RB.
  Xin




Re: [RFC PATCH 1/5] [WIP]misra: add entries to the excluded list

2023-07-31 Thread Luca Fancellu


> On 31 Jul 2023, at 16:20, Jan Beulich  wrote:
> 
> On 31.07.2023 17:11, Luca Fancellu wrote:
 +{
 +"rel_path": "arch/x86/include/asm/bug.h",
 +"comment": "Includes mostly assembly macro"
 +},
>>> 
>>> Mind me asking why assembly macros wouldn't want maintaining in proper
>>> style?
>> 
>> From what I know (experts on CF correct me if I am wrong) clang-format is 
>> meant to format only some languages
>> (C/C++/...) and assembly is not one of them, so what is happening is that 
>> most of the time clang-format breaks
>> it, in fact we are formatting only .c and .h, but since we have some headers 
>> with assembly macros, I’ve seen some issues
>> that range from really ugly formatting to build breakage.
>> 
>> One thing we could do is to move the assembly-only content into dedicated 
>> headers (_asm.h ?)
>> so that we could easily use a name regex to exclude "*_asm.h” from 
>> clang-format? And also these headers could #error if
>> included when __ASSEMBLY__ is not defined?
> 
> In principle this may be a route to go (naming aside), but first of all
> I wonder what "assembler macros" are to you: We use both C macros and
> true assembler macros in assembly code. The former I would hope formatting
> tools don't have an issue with.

Yes, C macros are clearly not an issue; true assembler macros are, like the 
example below:

.macro BUG_FRAME type, line, file_str, second_frame, msg

.if \type >= BUGFRAME_NR
.error "Invalid BUGFRAME index"
.endif

.L\@ud: ud2a

.pushsection .rodata.str1, "aMS", @progbits, 1
 .L\@s1: .asciz "\file_str"
.popsection

.pushsection .bug_frames.\type, "a", @progbits
.p2align 2
.L\@bf:
.long (.L\@ud - .L\@bf) + \
   ((\line >> BUG_LINE_LO_WIDTH) << BUG_DISP_WIDTH)
.long (.L\@s1 - .L\@bf) + \
   ((\line & ((1 << BUG_LINE_LO_WIDTH) - 1)) << BUG_DISP_WIDTH)

.if \second_frame
.pushsection .rodata.str1, "aMS", @progbits, 1
.L\@s2: .asciz "\msg"
.popsection
.long 0, (.L\@s2 - .L\@bf)
.endif
.popsection
.endm

I don’t think CF has knowledge of the tokens .macro/.endm/.if/[...] and
so it formats them in weird ways

> 
> Jan



Re: [PATCH 4/5] xen/ppc: Parse device tree for OPAL node on PowerNV

2023-07-31 Thread Jan Beulich
On 28.07.2023 23:35, Shawn Anastasio wrote:
> --- a/xen/arch/ppc/arch.mk
> +++ b/xen/arch/ppc/arch.mk
> @@ -10,5 +10,5 @@ CFLAGS += -mstrict-align -mcmodel=medium -mabi=elfv2 -fPIC -mno-altivec -mno-vsx
>  LDFLAGS += -m elf64lppc
>  
>  # TODO: Drop override when more of the build is working
> -override ALL_OBJS-y = arch/$(SRCARCH)/built_in.o
> -override ALL_LIBS-y =
> +override ALL_OBJS-y = arch/$(SRCARCH)/built_in.o common/libfdt/built_in.o lib/built_in.o
> +override ALL_LIBS-y = lib/lib.a

Can't you drop the ALL_LIBS-y override right here?

Jan



Re: [PATCH v6 0/9] Allow dynamic allocation of software IO TLB bounce buffers

2023-07-31 Thread Christoph Hellwig
I was just going to apply this, but patch 1 seems to have a non-trivial
conflict with the is_swiotlb_active removal in pci-dma.c.  Can you resend
against the current dma-mapping for-next tree?



Re: [PATCH 3/5] xen/ppc: Add OPAL API definition header file

2023-07-31 Thread Jan Beulich
On 28.07.2023 23:35, Shawn Anastasio wrote:
> OPAL (OpenPower Abstraction Layer) is the interface exposed by firmware
on PowerNV (bare metal) systems. Import Linux's header defining the
> API and related information.

To help future updating, mentioning version (or commit) at which this
snapshot was taken would be helpful.

Jan




Re: [PATCH 2/5] xen/ppc: Switch to medium PIC code model

2023-07-31 Thread Jan Beulich
On 28.07.2023 23:35, Shawn Anastasio wrote:
> --- a/xen/arch/ppc/ppc64/head.S
> +++ b/xen/arch/ppc/ppc64/head.S
> @@ -1,9 +1,11 @@
>  /* SPDX-License-Identifier: GPL-2.0-or-later */
>  
>  #include 
> +#include 
>  
>  .section .text.header, "ax", %progbits
>  
> +
>  ENTRY(start)

Nit: Stray change?

> @@ -11,16 +13,19 @@ ENTRY(start)
>  FIXUP_ENDIAN
>  
>  /* set up the TOC pointer */
> -LOAD_IMM32(%r2, .TOC.)
> +bcl  20, 31, .+4

Could you use a label name instead of .+4? Aiui you really mean

> +1:  mflr%r12

... "1f" there?

Jan

> +addis   %r2, %r12, .TOC.-1b@ha
> +addi%r2, %r2, .TOC.-1b@l
>  
>  /* set up the initial stack */
> -LOAD_IMM32(%r1, cpu0_boot_stack)
> +LOAD_REG_ADDR(%r1, cpu0_boot_stack)
>  li  %r11, 0
>  stdu%r11, -STACK_FRAME_OVERHEAD(%r1)
>  
>  /* clear .bss */
> -LOAD_IMM32(%r14, __bss_start)
> -LOAD_IMM32(%r15, __bss_end)
> +LOAD_REG_ADDR(%r14, __bss_start)
> +LOAD_REG_ADDR(%r15, __bss_end)
>  1:
>  std %r11, 0(%r14)
>  addi%r14, %r14, 8




Re: [PATCH 1/5] xen/lib: Move simple_strtoul from common/vsprintf.c to lib

2023-07-31 Thread Jan Beulich
On 28.07.2023 23:35, Shawn Anastasio wrote:
> Move the simple_strtoul routine which is used throughout the codebase
> from vsprintf.c to its own file in xen/lib.
> 
> This allows libfdt to be built on ppc64 even though xen/common doesn't
> build yet.
> 
> Signed-off-by: Shawn Anastasio 
> ---
>  xen/common/vsprintf.c| 37 -
>  xen/lib/Makefile |  1 +
>  xen/lib/simple_strtoul.c | 40 
>  3 files changed, 41 insertions(+), 37 deletions(-)
>  create mode 100644 xen/lib/simple_strtoul.c

What about its siblings? It'll be irritating to find one here and the
other there.

Also please no underscores in (new) filenames unless there's a reason
for this. In the case here, though, I question the need for "simple"
in the file name in the first place.

> --- /dev/null
> +++ b/xen/lib/simple_strtoul.c
> @@ -0,0 +1,40 @@
> +/*
> + *  Copyright (C) 1991, 1992  Linus Torvalds
> + */
> +
> +#include 
> +
> +/**
> + * simple_strtoul - convert a string to an unsigned long
> + * @cp: The start of the string
> + * @endp: A pointer to the end of the parsed string will be placed here
> + * @base: The number base to use
> + */
> +unsigned long simple_strtoul(
> +const char *cp, const char **endp, unsigned int base)
> +{
> +unsigned long result = 0,value;
> +
> +if (!base) {
> +base = 10;
> +if (*cp == '0') {
> +base = 8;
> +cp++;
> +if ((toupper(*cp) == 'X') && isxdigit(cp[1])) {
> +cp++;
> +base = 16;
> +}
> +}
> +} else if (base == 16) {
> +if (cp[0] == '0' && toupper(cp[1]) == 'X')
> +cp += 2;
> +}
> +while (isxdigit(*cp) &&
> +   (value = isdigit(*cp) ? *cp-'0' : toupper(*cp)-'A'+10) < base) {
> +result = result*base + value;
> +cp++;
> +}
> +if (endp)
> +*endp = cp;
> +return result;
> +}

While moving, I think it would be nice if this code, which currently
follows neither Xen nor Linux style, were adjusted to one of the two.
I'm not going to insist, but doing such style adjustments right here
would be quite nice.

Jan



[PATCH 00/24] ALSA: Generic PCM copy ops using sockptr_t

2023-07-31 Thread Takashi Iwai
Hi,

this is a patch set to clean up the PCM copy ops using sockptr_t as a
"universal" pointer, inspired by the recent patch from Andy
Shevchenko:
  
https://lore.kernel.org/r/20230721100146.67293-1-andriy.shevche...@linux.intel.com

Even though it sounds a bit weird, sockptr_t is a generic type that is
already used in a wide range of places, and it can fit our purpose, too.
With sockptr_t, the former split of copy_user and copy_kernel PCM ops
can be unified again gracefully.

The patch set introduces the new PCM ops, converting users, and drops
the old PCM ops.  Most of conversions are straightforward, simply
replacing copy_*_user() with copy_*_sockptr() variants.

Note that the conversion in ASoC will fix a potential problem of ASoC
PCM that has existed for a long time.  Since the ASoC component takes
care of only copy_user, the conversion from/to kernel space might have
been missing.  With this patch set, both cases are handled with
sockptr_t by a single callback.

The patches are lightly tested (with a faked PCM copy implementation
on HD-audio), while most of the patches are only compile-tested.


Takashi

===

Cc: Andy Shevchenko 
Cc: Andrey Utkin 
Cc: Anton Sviridenko 
Cc: Arnaud Pouliquen 
Cc: Banajit Goswami 
Cc: Bluecherry Maintainers 
Cc: Claudiu Beznea 
Cc: Ismael Luceno 
Cc: Lars-Peter Clausen 
Cc: Mark Brown 
Cc: Mauro Carvalho Chehab 
Cc: Oleksandr Andrushchenko 
Cc: Olivier Moysan 
Cc: Srinivas Kandagatla 
Cc: linux-me...@vger.kernel.org
Cc: xen-devel@lists.xenproject.org

===

Takashi Iwai (24):
  ALSA: pcm: Add copy ops with universal sockptr_t
  ALSA: core: Add memory copy helpers between sockptr and iomem
  ALSA: dummy: Convert to generic PCM copy ops
  ALSA: gus: Convert to generic PCM copy ops
  ALSA: emu8000: Convert to generic PCM copy ops
  ALSA: es1938: Convert to generic PCM copy ops
  ALSA: korg1212: Convert to generic PCM copy ops
  ALSA: nm256: Convert to generic PCM copy ops
  ALSA: rme32: Convert to generic PCM copy ops
  ALSA: rme96: Convert to generic PCM copy ops
  ALSA: hdsp: Convert to generic PCM copy ops
  ALSA: rme9652: Convert to generic PCM copy ops
  ALSA: sh: Convert to generic PCM copy ops
  ALSA: xen: Convert to generic PCM copy ops
  ALSA: pcmtest: Update comment about PCM copy ops
  media: solo6x10: Convert to generic PCM copy ops
  ASoC: component: Add generic PCM copy ops
  ASoC: mediatek: Convert to generic PCM copy ops
  ASoC: qcom: Convert to generic PCM copy ops
  ASoC: dmaengine: Convert to generic PCM copy ops
  ASoC: dmaengine: Use sockptr_t for process callback, too
  ALSA: doc: Update description for the new PCM copy ops
  ASoC: pcm: Drop obsoleted PCM copy_user ops
  ALSA: pcm: Drop obsoleted PCM copy_user and copy_kernel ops

 .../kernel-api/writing-an-alsa-driver.rst | 59 +-
 drivers/media/pci/solo6x10/solo6x10-g723.c| 41 ++
 include/sound/dmaengine_pcm.h |  2 +-
 include/sound/pcm.h   | 12 +--
 include/sound/soc-component.h | 14 ++--
 sound/core/memory.c   | 39 +
 sound/core/pcm_lib.c  | 81 +--
 sound/core/pcm_native.c   |  2 +-
 sound/drivers/dummy.c | 12 +--
 sound/drivers/pcmtest.c   |  2 +-
 sound/isa/gus/gus_pcm.c   | 23 +-
 sound/isa/sb/emu8000_pcm.c| 79 +-
 sound/pci/es1938.c| 31 ++-
 sound/pci/korg1212/korg1212.c | 46 +++
 sound/pci/nm256/nm256.c   | 42 ++
 sound/pci/rme32.c | 50 +++-
 sound/pci/rme96.c | 48 +++
 sound/pci/rme9652/hdsp.c  | 42 ++
 sound/pci/rme9652/rme9652.c   | 46 ++-
 sound/sh/sh_dac_audio.c   | 25 +-
 sound/soc/atmel/mchp-pdmc.c   |  2 +-
 sound/soc/mediatek/common/mtk-btcvsd.c| 22 ++---
 sound/soc/qcom/lpass-platform.c   | 12 +--
 sound/soc/soc-component.c | 10 +--
 sound/soc/soc-generic-dmaengine-pcm.c | 18 ++---
 sound/soc/soc-pcm.c   |  4 +-
 sound/soc/stm/stm32_sai_sub.c |  2 +-
 sound/xen/xen_snd_front_alsa.c| 55 +++--
 28 files changed, 251 insertions(+), 570 deletions(-)

-- 
2.35.3




[PATCH 14/24] ALSA: xen: Convert to generic PCM copy ops

2023-07-31 Thread Takashi Iwai
This patch converts the xen frontend driver code to use the new
unified PCM copy callback.  It's a straightforward conversion from
*_user() to *_sockptr() variants.

Cc: Oleksandr Andrushchenko 
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Takashi Iwai 
---
 sound/xen/xen_snd_front_alsa.c | 55 +++---
 1 file changed, 10 insertions(+), 45 deletions(-)

diff --git a/sound/xen/xen_snd_front_alsa.c b/sound/xen/xen_snd_front_alsa.c
index 7a3dfce97c15..a7e2cba3ff4e 100644
--- a/sound/xen/xen_snd_front_alsa.c
+++ b/sound/xen/xen_snd_front_alsa.c
@@ -602,38 +602,24 @@ static snd_pcm_uframes_t alsa_pointer(struct snd_pcm_substream *substream)
return (snd_pcm_uframes_t)atomic_read(>hw_ptr);
 }
 
-static int alsa_pb_copy_user(struct snd_pcm_substream *substream,
-int channel, unsigned long pos, void __user *src,
-unsigned long count)
+static int alsa_pb_copy(struct snd_pcm_substream *substream,
+   int channel, unsigned long pos, sockptr_t src,
+   unsigned long count)
 {
struct xen_snd_front_pcm_stream_info *stream = stream_get(substream);
 
if (unlikely(pos + count > stream->buffer_sz))
return -EINVAL;
 
-   if (copy_from_user(stream->buffer + pos, src, count))
+   if (copy_from_sockptr(stream->buffer + pos, src, count))
return -EFAULT;
 
return xen_snd_front_stream_write(>evt_pair->req, pos, count);
 }
 
-static int alsa_pb_copy_kernel(struct snd_pcm_substream *substream,
-  int channel, unsigned long pos, void *src,
-  unsigned long count)
-{
-   struct xen_snd_front_pcm_stream_info *stream = stream_get(substream);
-
-   if (unlikely(pos + count > stream->buffer_sz))
-   return -EINVAL;
-
-   memcpy(stream->buffer + pos, src, count);
-
-   return xen_snd_front_stream_write(>evt_pair->req, pos, count);
-}
-
-static int alsa_cap_copy_user(struct snd_pcm_substream *substream,
- int channel, unsigned long pos, void __user *dst,
- unsigned long count)
+static int alsa_cap_copy(struct snd_pcm_substream *substream,
+int channel, unsigned long pos, sockptr_t dst,
+unsigned long count)
 {
struct xen_snd_front_pcm_stream_info *stream = stream_get(substream);
int ret;
@@ -645,29 +631,10 @@ static int alsa_cap_copy_user(struct snd_pcm_substream *substream,
if (ret < 0)
return ret;
 
-   return copy_to_user(dst, stream->buffer + pos, count) ?
+   return copy_to_sockptr(dst, stream->buffer + pos, count) ?
-EFAULT : 0;
 }
 
-static int alsa_cap_copy_kernel(struct snd_pcm_substream *substream,
-   int channel, unsigned long pos, void *dst,
-   unsigned long count)
-{
-   struct xen_snd_front_pcm_stream_info *stream = stream_get(substream);
-   int ret;
-
-   if (unlikely(pos + count > stream->buffer_sz))
-   return -EINVAL;
-
-   ret = xen_snd_front_stream_read(>evt_pair->req, pos, count);
-   if (ret < 0)
-   return ret;
-
-   memcpy(dst, stream->buffer + pos, count);
-
-   return 0;
-}
-
 static int alsa_pb_fill_silence(struct snd_pcm_substream *substream,
int channel, unsigned long pos,
unsigned long count)
@@ -697,8 +664,7 @@ static const struct snd_pcm_ops snd_drv_alsa_playback_ops = {
.prepare= alsa_prepare,
.trigger= alsa_trigger,
.pointer= alsa_pointer,
-   .copy_user  = alsa_pb_copy_user,
-   .copy_kernel= alsa_pb_copy_kernel,
+   .copy   = alsa_pb_copy,
.fill_silence   = alsa_pb_fill_silence,
 };
 
@@ -710,8 +676,7 @@ static const struct snd_pcm_ops snd_drv_alsa_capture_ops = {
.prepare= alsa_prepare,
.trigger= alsa_trigger,
.pointer= alsa_pointer,
-   .copy_user  = alsa_cap_copy_user,
-   .copy_kernel= alsa_cap_copy_kernel,
+   .copy   = alsa_cap_copy,
 };
 
 static int new_pcm_instance(struct xen_snd_front_card_info *card_info,
-- 
2.35.3




Re: [PATCH 2/3] xen/ppc: Relocate kernel to physical address 0 on boot

2023-07-31 Thread Jan Beulich
On 29.07.2023 00:21, Shawn Anastasio wrote:
> Introduce a small assembly loop in `start` to copy the kernel to
> physical address 0 before continuing. This ensures that the physical
> address lines up with XEN_VIRT_START (0xc000) and allows us
> to identity map the kernel when the MMU is set up in the next patch.

So PPC guarantees there's always a reasonable amount of memory at 0,
and that's available for use?

> --- a/xen/arch/ppc/ppc64/head.S
> +++ b/xen/arch/ppc/ppc64/head.S
> @@ -18,6 +18,33 @@ ENTRY(start)
>  addis   %r2, %r12, .TOC.-1b@ha
>  addi%r2, %r2, .TOC.-1b@l
>  
> +/*
> + * Copy Xen to physical address zero and jump to XEN_VIRT_START
> + * (0xc000). This works because the hardware will ignore the top
> + * four address bits when the MMU is off.
> + */
> +LOAD_REG_ADDR(%r1, start)

I think you really mean _start here (which is missing from the linker
script), not start. See also Andrew's recent related RISC-V change.

> +LOAD_IMM64(%r12, XEN_VIRT_START)
> +
> +/* If we're at the correct address, skip copy */
> +cmpld   %r1, %r12
> +beq .L_correct_address

Can this ever be the case, especially with the MMU-off behavior you
describe in the comment above? Wouldn't you need to ignore the top
four bits in the comparison?

> +/* Copy bytes until _end */
> +LOAD_REG_ADDR(%r11, _end)
> +addi%r1, %r1, -8
> +li  %r13, -8
> +.L_copy_xen:
> +ldu %r10, 8(%r1)
> +stdu%r10, 8(%r13)
> +cmpld   %r1, %r11
> +blt .L_copy_xen
> +
> +/* Jump to XEN_VIRT_START */
> +mtctr   %r12
> +bctr
> +.L_correct_address:

Can the two regions potentially overlap? Looking at the ELF header
it's not clear to me what guarantees there are that this can't
happen.

Jan



Re: [PATCH v2 5/5] pdx: Add CONFIG_HAS_PDX_COMPRESSION as a common Kconfig option

2023-07-31 Thread Jan Beulich
On 28.07.2023 09:59, Alejandro Vallejo wrote:
> --- a/xen/common/pdx.c
> +++ b/xen/common/pdx.c
> @@ -31,11 +31,15 @@ unsigned long __read_mostly pdx_group_valid[BITS_TO_LONGS(
>  
>  bool __mfn_valid(unsigned long mfn)
>  {
> -if ( unlikely(evaluate_nospec(mfn >= max_page)) )
> +bool invalid = mfn >= max_page;
> +#ifdef CONFIG_PDX_COMPRESSION
> +invalid |= mfn & pfn_hole_mask;
> +#endif

Nit: Declaration(s) and statement(s) separated by a blank line please.

> @@ -49,6 +53,8 @@ void set_pdx_range(unsigned long smfn, unsigned long emfn)
>  __set_bit(idx, pdx_group_valid);
>  }
>  
> +#ifdef CONFIG_PDX_COMPRESSION
> +
>  /*
>   * Diagram to make sense of the following variables. The masks and shifts
>   * are done on mfn values in order to convert to/from pdx:

Nit: With a blank line after #ifdef, 

> @@ -175,6 +181,7 @@ void __init pfn_pdx_hole_setup(unsigned long mask)
>  pfn_top_mask= ~(pfn_pdx_bottom_mask | pfn_hole_mask);
>  ma_top_mask = pfn_top_mask << PAGE_SHIFT;
>  }
> +#endif /* CONFIG_PDX_COMPRESSION */
>  
>  
>  /*

... we would typically also have one before #endif. In the case here
you could even leverage that there are already (wrongly) two consecutive
blank lines.

> @@ -100,6 +98,8 @@ bool __mfn_valid(unsigned long mfn);
>  #define mfn_to_pdx(mfn) pfn_to_pdx(mfn_x(mfn))
>  #define pdx_to_mfn(pdx) _mfn(pdx_to_pfn(pdx))
>  
> +#ifdef CONFIG_PDX_COMPRESSION
> +
>  extern unsigned long pfn_pdx_bottom_mask, ma_va_bottom_mask;
>  extern unsigned int pfn_pdx_hole_shift;
>  extern unsigned long pfn_hole_mask;
> @@ -205,8 +205,39 @@ static inline uint64_t directmapoff_to_maddr(unsigned long offset)
>   * position marks a potentially compressible bit.
>   */
>  void pfn_pdx_hole_setup(unsigned long mask);
> +#else /* CONFIG_PDX_COMPRESSION */
> +
> +/* Without PDX compression we can skip some computations */

Same here for the #else then.

Jan



Re: [PATCH v2 3/5] mm/pdx: Standardize region validation wrt pdx compression

2023-07-31 Thread Jan Beulich
On 28.07.2023 09:59, Alejandro Vallejo wrote:
> --- a/xen/arch/x86/x86_64/mm.c
> +++ b/xen/arch/x86/x86_64/mm.c
> @@ -1159,6 +1159,9 @@ static int mem_hotadd_check(unsigned long spfn, unsigned long epfn)
>  {
>  unsigned long s, e, length, sidx, eidx;
>  
> +paddr_t mem_base = pfn_to_paddr(spfn);
> +unsigned long mem_npages = epfn - spfn;
> +
>  if ( (spfn >= epfn) )
>  return 0;

While occasionally groups of declarations indeed want separating, the
rule of thumb is that the first blank line after declarations separates
them from statements. I don't see reason here to diverge from this.

> @@ -1660,6 +1663,8 @@ static bool __init cf_check rt_range_valid(unsigned long smfn, unsigned long emf
>  
>  void __init efi_init_memory(void)
>  {
> +paddr_t mem_base;
> +unsigned long mem_npages;

Why in the outermost scope when ...

> @@ -1732,6 +1737,9 @@ void __init efi_init_memory(void)
>  smfn = PFN_DOWN(desc->PhysicalStart);
>  emfn = PFN_UP(desc->PhysicalStart + len);
>  
> +mem_base = pfn_to_paddr(smfn);
> +mem_npages = emfn - smfn;
> +
>  if ( desc->Attribute & EFI_MEMORY_WB )
>  prot |= _PAGE_WB;
>  else if ( desc->Attribute & EFI_MEMORY_WT )
> @@ -1759,8 +1767,7 @@ void __init efi_init_memory(void)
>  prot |= _PAGE_NX;
>  
>  if ( pfn_to_pdx(emfn - 1) < (DIRECTMAP_SIZE >> PAGE_SHIFT) &&
> - !(smfn & pfn_hole_mask) &&
> - !((smfn ^ (emfn - 1)) & ~pfn_pdx_bottom_mask) )
> + pdx_is_region_compressible(mem_base, mem_npages))
>  {
>  if ( (unsigned long)mfn_to_virt(emfn - 1) >= HYPERVISOR_VIRT_END )
>  prot &= ~_PAGE_GLOBAL;

... you use the variables only in an inner one?

> --- a/xen/common/pdx.c
> +++ b/xen/common/pdx.c
> @@ -88,7 +88,7 @@ bool __mfn_valid(unsigned long mfn)
>  }
>  
>  /* Sets all bits from the most-significant 1-bit down to the LSB */
> -static uint64_t __init fill_mask(uint64_t mask)
> +static uint64_t fill_mask(uint64_t mask)
>  {
>  while (mask & (mask + 1))
>  mask |= mask + 1;

I see why you want __init dropped here, but the function wasn't written
for "common use" and hence may want improving first when intended for
more frequent (post-init) use as well. Then again I wonder why original
checking all got away without using this function ...

Jan



Re: [RFC PATCH 1/5] [WIP]misra: add entries to the excluded list

2023-07-31 Thread Jan Beulich
On 31.07.2023 17:11, Luca Fancellu wrote:
>>> +{
>>> +"rel_path": "arch/x86/include/asm/bug.h",
>>> +"comment": "Includes mostly assembly macro"
>>> +},
>>
>> Mind me asking why assembly macros wouldn't want maintaining in proper
>> style?
> 
> From what I know (experts on CF correct me if I am wrong) clang-format is 
> meant to format only some languages
> (C/C++/...) and assembly is not one of them, so what is happening is that 
> most of the time clang-format breaks
> it, in fact we are formatting only .c and .h, but since we have some headers 
> with assembly macros, I’ve seen some issues
> that ranges from really ugly formatting to build break.
> 
> One thing we could do, is to export the headers that contain only assembly 
> stuffs in dedicate headers (_asm.h ?)
> so that we could easily use a name regex to exclude "*_asm.h” from 
> clang-format? And also these headers could #error if
> included when __ASSEMBLY__ is not defined?

In principle this may be a route to go (naming aside), but first of all
I wonder what "assembler macros" are to you: We use both C macros and
true assembler macros in assembly code. The former I would hope formatting
tools don't have an issue with.

Jan



Re: [PATCH v2 2/5] mm: Factor out the pdx compression logic in ma/va converters

2023-07-31 Thread Jan Beulich
On 28.07.2023 09:59, Alejandro Vallejo wrote:
> --- a/xen/include/xen/pdx.h
> +++ b/xen/include/xen/pdx.h
> @@ -160,6 +160,31 @@ static inline unsigned long pdx_to_pfn(unsigned long pdx)
>  #define mfn_to_pdx(mfn) pfn_to_pdx(mfn_x(mfn))
>  #define pdx_to_mfn(pdx) _mfn(pdx_to_pfn(pdx))
>  
> +/**
> + * Computes the offset into the direct map of an maddr
> + *
> + * @param ma Machine address
> + * @return Offset on the direct map where that
> + * machine address can be accessed
> + */
> +static inline unsigned long maddr_to_directmapoff(uint64_t ma)

Was there prior agreement to use uint64_t here and ...

> +{
> +return ((ma & ma_top_mask) >> pfn_pdx_hole_shift) |
> +   (ma & ma_va_bottom_mask);
> +}
> +
> +/**
> + * Computes a machine address given a direct map offset
> + *
> + * @param offset Offset into the direct map
> + * @return Corresponding machine address of that virtual location
> + */
> +static inline uint64_t directmapoff_to_maddr(unsigned long offset)

... here, not paddr_t?

Also you use unsigned long for the offset here, but size_t for
maddr_to_directmapoff()'s return value in __maddr_to_virt().
Would be nice if this was consistent within the patch.

Especially since the names of the helper functions are longish,
I'm afraid I'm not fully convinced of the transformation. But I'm
also not meaning to stand in the way, if everyone else wants to
move in that direction.

Jan



Re: [RFC PATCH 1/5] [WIP]misra: add entries to the excluded list

2023-07-31 Thread Luca Fancellu
Hi Jan,

>> +{
>> +"rel_path": "arch/x86/include/asm/bug.h",
>> +"comment": "Includes mostly assembly macro"
>> +},
> 
> Mind me asking why assembly macros wouldn't want maintaining in proper
> style?

From what I know (experts on CF correct me if I am wrong) clang-format is meant 
to format only some languages
(C/C++/...) and assembly is not one of them, so what is happening is that most 
of the time clang-format breaks
it, in fact we are formatting only .c and .h, but since we have some headers 
with assembly macros, I’ve seen some issues
that ranges from really ugly formatting to build break.

One thing we could do, is to export the headers that contain only assembly 
stuffs in dedicate headers (_asm.h ?)
so that we could easily use a name regex to exclude "*_asm.h” from 
clang-format? And also these headers could #error if
included when __ASSEMBLY__ is not defined?

But this requires some agreement on what is the best way I guess, you can know 
better if it’s feasible or not.

>> 
>> +{
>> +"rel_path": "include/public/**/**/*.h",
>> +"comment": "Public headers are quite sensitive to format tools"
>> +},
>> +{
>> +"rel_path": "include/public/**/*.h",
>> +"comment": "Public headers are quite sensitive to format tools"
>> +},
> 
> The common meaning of ** that I know is "any level directories", but
> since you use **/**/ above that can't be it here. Could you clarify
> what the difference of */ and **/ is here (or maybe in JSON in general)?

Yes I’ve found that python glob, that we use to solve the wildcard, solves the 
** only for one level,
maybe we could do something better to solve that, but for now I left it as it 
is to focus on the
clang-format configuration side.

Cheers,
Luca

> 
> Jan
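Luca's observation about `**` can be reproduced directly: Python's `glob` only expands `**` across any number of directory levels when `recursive=True`; without it, `**` behaves like a single `*`. A quick standalone illustration (temporary files only):

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Layout: base/a.h, base/s1/b.h, base/s1/s2/c.h
    os.makedirs(os.path.join(base, "s1", "s2"))
    for rel in ("a.h", os.path.join("s1", "b.h"),
                os.path.join("s1", "s2", "c.h")):
        open(os.path.join(base, rel), "w").close()

    pattern = os.path.join(base, "**", "*.h")

    # Without recursive=True, '**' matches exactly one path component,
    # so only the header one level down is found.
    shallow = sorted(os.path.relpath(p, base) for p in glob.glob(pattern))

    # With recursive=True, '**' matches zero or more directory levels.
    deep = sorted(os.path.relpath(p, base)
                  for p in glob.glob(pattern, recursive=True))

    print(shallow)
    print(deep)
```

On a POSIX system this prints `['s1/b.h']` for the shallow match and all three headers for the deep one — which is why an exclude list written with `**/**/` patterns needs either `recursive=True` in the checker or explicit per-depth entries.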



Re: [RFC PATCH 1/5] [WIP]misra: add entries to the excluded list

2023-07-31 Thread Jan Beulich
On 28.07.2023 10:11, Luca Fancellu wrote:
> Add entries to the exclusion list, so that they can be excluded
> from the formatter tool.
> 
> TBD: add a field on each entry to understand for what tool is the
> exclusion
> 
> Signed-off-by: Luca Fancellu 
> ---
>  docs/misra/exclude-list.json | 88 
>  1 file changed, 88 insertions(+)
> 
> diff --git a/docs/misra/exclude-list.json b/docs/misra/exclude-list.json
> index ca1e2dd678ff..c103c69209c9 100644
> --- a/docs/misra/exclude-list.json
> +++ b/docs/misra/exclude-list.json
> @@ -1,6 +1,10 @@
>  {
>  "version": "1.0",
>  "content": [
> +{
> +"rel_path": "arch/arm/arm32/lib/assembler.h",
> +"comment": "Includes mostly assembly macro and it's meant to be included only in assembly code"
> +},
>  {
>  "rel_path": "arch/arm/arm64/cpufeature.c",
>  "comment": "Imported from Linux, ignore for now"
> @@ -13,6 +17,26 @@
>  "rel_path": "arch/arm/arm64/lib/find_next_bit.c",
>  "comment": "Imported from Linux, ignore for now"
>  },
> +{
> +"rel_path": "arch/arm/include/asm/arm32/macros.h",
> +"comment": "Includes only assembly macro"
> +},
> +{
> +"rel_path": "arch/arm/include/asm/arm64/macros.h",
> +"comment": "Includes only assembly macro"
> +},
> +{
> +"rel_path": "arch/arm/include/asm/alternative.h",
> +"comment": "Imported from Linux, ignore for now"
> +},
> +{
> +"rel_path": "arch/arm/include/asm/asm_defns.h",
> +"comment": "Includes mostly assembly macro"
> +},
> +{
> +"rel_path": "arch/arm/include/asm/macros.h",
> +"comment": "Includes mostly assembly macro and it's meant to be included only in assembly code"
> +},
>  {
>  "rel_path": "arch/x86/acpi/boot.c",
>  "comment": "Imported from Linux, ignore for now"
> @@ -69,6 +93,30 @@
>  "rel_path": "arch/x86/cpu/mwait-idle.c",
>  "comment": "Imported from Linux, ignore for now"
>  },
> +{
> +"rel_path": "arch/x86/include/asm/alternative-asm.h",
> +"comment": "Includes mostly assembly macro and it's meant to be included only in assembly code"
> +},
> +{
> +"rel_path": "arch/x86/include/asm/asm_defns.h",
> +"comment": "Includes mostly assembly macro"
> +},
> +{
> +"rel_path": "arch/x86/include/asm/asm-defns.h",
> +"comment": "Includes mostly assembly macro"
> +},
> +{
> +"rel_path": "arch/x86/include/asm/bug.h",
> +"comment": "Includes mostly assembly macro"
> +},

Mind me asking why assembly macros wouldn't want maintaining in proper
style?

> +{
> +"rel_path": "arch/x86/include/asm/mpspec.h",
> +"comment": "Imported from Linux, also case ranges are not handled by clang-format, ignore for now"
> +},
> +{
> +"rel_path": "arch/x86/include/asm/spec_ctrl_asm.h",
> +"comment": "Includes mostly assembly macro"
> +},
>  {
>  "rel_path": "arch/x86/delay.c",
>  "comment": "Imported from Linux, ignore for now"
> @@ -181,6 +229,42 @@
>  "rel_path": "drivers/video/font_*",
>  "comment": "Imported from Linux, ignore for now"
>  },
> +{
> +"rel_path": "include/efi/*.h",
> +"comment": "Imported from gnu-efi-3.0k"
> +},
> +{
> +"rel_path": "include/public/arch-x86/cpufeatureset.h",
> +"comment": "This file contains some inputs for the gen-cpuid.py script, leave it out"
> +},
> +{
> +"rel_path": "include/public/**/**/*.h",
> +"comment": "Public headers are quite sensitive to format tools"
> +},
> +{
> +"rel_path": "include/public/**/*.h",
> +"comment": "Public headers are quite sensitive to format tools"
> +},

The common meaning of ** that I know is "any level directories", but
since you use **/**/ above that can't be it here. Could you clarify
what the difference of */ and **/ is here (or maybe in JSON in general)?

Jan



Re: [XEN PATCH 1/4] xen/pci: rename local variable to address MISRA C:2012 Rule 5.3

2023-07-31 Thread Nicola Vetrini

On 31/07/2023 16:16, Jan Beulich wrote:

On 31.07.2023 15:34, Nicola Vetrini wrote:

--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -650,12 +650,12 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
 struct pci_seg *pseg;
 struct pci_dev *pdev;
 unsigned int slot = PCI_SLOT(devfn), func = PCI_FUNC(devfn);
-const char *pdev_type;
+const char *pci_dev_type;


I've always been wondering what purpose the pdev_ prefix served here.
There's no other "type" variable in the function, so why make the name
longer? (I'm okay to adjust on commit, provided you agree.)

Jan


No objections.

--
Nicola Vetrini, BSc
Software Engineer, BUGSENG srl (https://bugseng.com)



Re: [XEN PATCH] xen/arm: mechanical renaming to address MISRA C:2012 Rule 5.3

2023-07-31 Thread Julien Grall

Hi,

On 28/07/2023 13:14, Nicola Vetrini wrote:

On 21/07/2023 17:54, Julien Grall wrote:

Hi,

On 21/07/2023 16:22, Nicola Vetrini wrote:

Rule 5.3 has the following headline:
"An identifier declared in an inner scope shall not hide an
identifier declared in an outer scope"

The function parameters renamed in this patch are hiding a variable defined
in an enclosing scope or a function identifier.

The following renames have been made:
- s/guest_mode/guest_mode_on/ to distinguish from function 'guest_mode'
- s/struct module_name/struct module_info to distinguish from the homonymous


Typo: Missing '/' after 'module_info'.

parameters, since the structure contains more information than just the name.
- s/file_name/file_info in 'xen/arch/arm/efi/efi-boot.h' for consistency with


Same here.


the previous renaming.

Signed-off-by: Nicola Vetrini 


Assuming there are no other comments, I would be Ok to fix it on commit. So:


Acked-by: Julien Grall 

Cheers,


I don't see any further comments on this. Are you ok with committing it?


Yes. This was committed by Stefano on Friday evening.

Cheers,

--
Julien Grall



Re: [XEN PATCH 3/4] xen: rename variables and parameters to address MISRA C:2012 Rule 5.3

2023-07-31 Thread Jan Beulich
On 31.07.2023 15:35, Nicola Vetrini wrote:
> Rule 5.3 has the following headline:
> "An identifier declared in an inner scope shall not hide an
> identifier declared in an outer scope"
> 
> Local variables have been suitably renamed to address some violations
> of this rule:
> - s/cmp/c/ because it shadows the union declared at line 87.
> - s/nodes/numa_nodes/ shadows the static variable declared at line 18.
> - s/ctrl/controller/ because the homonymous function parameter is later
>   read.
> - s/baud/baud_rate/ to avoid shadowing the enum constant defined
>   at line 1391.
> 
> No functional change.
> 
> Signed-off-by: Nicola Vetrini 
> ---
>  xen/common/compat/memory.c   |  6 +++---
>  xen/common/numa.c| 36 ++--
>  xen/drivers/char/ehci-dbgp.c |  4 ++--
>  xen/drivers/char/ns16550.c   |  4 ++--
>  4 files changed, 25 insertions(+), 25 deletions(-)

This is an odd mix of files touched in a single patch. How about splitting
into two, one for common/ and one for drivers/?

> --- a/xen/common/compat/memory.c
> +++ b/xen/common/compat/memory.c
> @@ -321,12 +321,12 @@ int compat_memory_op(unsigned int cmd, 
> XEN_GUEST_HANDLE_PARAM(void) compat)
>  
>  case XENMEM_remove_from_physmap:
>  {
> -struct compat_remove_from_physmap cmp;
> +struct compat_remove_from_physmap c;

The intention of the outer scope cmp is to avoid such inner scope
ones then consuming extra stack space. This wants making part of the
union there.

> --- a/xen/common/numa.c
> +++ b/xen/common/numa.c
> @@ -382,7 +382,7 @@ static bool __init numa_process_nodes(paddr_t start, 
> paddr_t end)
>   * 0 if memnodmap[] too small (or shift too small)
>   * -1 if node overlap or lost ram (shift too big)
>   */
> -static int __init populate_memnodemap(const struct node *nodes,
> +static int __init populate_memnodemap(const struct node *numa_nodes,
>unsigned int numnodes, unsigned int shift,
>const nodeid_t *nodeids)
>  {
> @@ -393,8 +393,8 @@ static int __init populate_memnodemap(const struct node *nodes,
>  
>  for ( i = 0; i < numnodes; i++ )
>  {
> -unsigned long spdx = paddr_to_pdx(nodes[i].start);
> -unsigned long epdx = paddr_to_pdx(nodes[i].end - 1);
> +unsigned long spdx = paddr_to_pdx(numa_nodes[i].start);
> +unsigned long epdx = paddr_to_pdx(numa_nodes[i].end - 1);
>  
>  if ( spdx > epdx )
>  continue;
> @@ -440,7 +440,7 @@ static int __init allocate_cachealigned_memnodemap(void)
>   * The LSB of all start addresses in the node map is the value of the
>   * maximum possible shift.
>   */
> -static unsigned int __init extract_lsb_from_nodes(const struct node *nodes,
> +static unsigned int __init extract_lsb_from_nodes(const struct node *numa_nodes,
>nodeid_t numnodes,
>const nodeid_t *nodeids)
>  {
> @@ -449,8 +449,8 @@ static unsigned int __init extract_lsb_from_nodes(const struct node *nodes,
>  
>  for ( i = 0; i < numnodes; i++ )
>  {
> -unsigned long spdx = paddr_to_pdx(nodes[i].start);
> -unsigned long epdx = paddr_to_pdx(nodes[i].end - 1) + 1;
> +unsigned long spdx = paddr_to_pdx(numa_nodes[i].start);
> +unsigned long epdx = paddr_to_pdx(numa_nodes[i].end - 1) + 1;
>  
>  if ( spdx >= epdx )
>  continue;
> @@ -475,10 +475,10 @@ static unsigned int __init extract_lsb_from_nodes(const struct node *nodes,
>  return i;
>  }
>  
> -int __init compute_hash_shift(const struct node *nodes,
> +int __init compute_hash_shift(const struct node *numa_nodes,
>unsigned int numnodes, const nodeid_t *nodeids)
>  {
> -unsigned int shift = extract_lsb_from_nodes(nodes, numnodes, nodeids);
> +unsigned int shift = extract_lsb_from_nodes(numa_nodes, numnodes, nodeids);
>  
>  if ( memnodemapsize <= ARRAY_SIZE(_memnodemap) )
>  memnodemap = _memnodemap;
> @@ -487,7 +487,7 @@ int __init compute_hash_shift(const struct node *nodes,
>  
>  printk(KERN_DEBUG "NUMA: Using %u for the hash shift\n", shift);
>  
> -if ( populate_memnodemap(nodes, numnodes, shift, nodeids) != 1 )
> +if ( populate_memnodemap(numa_nodes, numnodes, shift, nodeids) != 1 )
>  {
>  printk(KERN_INFO "Your memory is not aligned you need to "
> "rebuild your hypervisor with a bigger NODEMAPSIZE "
> @@ -541,7 +541,7 @@ static int __init numa_emulation(unsigned long start_pfn,
>  {
>  int ret;
>  unsigned int i;
> -struct node nodes[MAX_NUMNODES];
> +struct node numa_nodes[MAX_NUMNODES];
>  uint64_t sz = pfn_to_paddr(end_pfn - start_pfn) / numa_fake;
>  
>  /* Kludge needed for the hash function */
> @@ -556,22 +556,22 @@ static int __init numa_emulation(unsigned long 
> 

Re: [XEN PATCH 2/4] amd/iommu: rename functions to address MISRA C:2012 Rule 5.3

2023-07-31 Thread Jan Beulich
On 31.07.2023 15:35, Nicola Vetrini wrote:
> The functions 'machine_bfd' and 'guest_bfd' have gained the
> prefix 'get_' to avoid the mutual shadowing with the homonymous
> parameters in these functions.
> 
> Signed-off-by: Nicola Vetrini 

Acked-by: Jan Beulich 

Of course there are several other oddities, but in the end the entire file
is a single big one, I'm afraid.

Jan


