Re: [PATCH v4 2/2] xen/arm: Enlarge identity map space to 10TB

2023-11-16 Thread Leo Yan
On Thu, Nov 16, 2023 at 02:21:18PM +, Julien Grall wrote:

[...]

> I have done the changes and directly committed the series. So no need to
> respin.

Thanks a lot, Julien!

Leo



Re: [PATCH v4] xen/arm: Skip memory nodes if not enabled

2023-11-06 Thread Leo Yan
Hi Julien,

On Mon, Nov 06, 2023 at 09:52:45AM +, Julien Grall wrote:

[...]

> > Gentle ping.
> > 
> > I don't see this patch is landed in Xen master or staging branch, should
> > anything I need to follow up?
> 
> The tree has been frozen for the past few weeks for any patches that are not
> meant for 4.18. We branched for 4.18 last week so staging is now in
> soft-reopening until 4.18 is released.

Thank you for the info.

> I am aware of few patches that are ready to be merged. But I haven't yet
> gone through all of them and merge to 4.19. It will probably be done once
> 4.18 is released.

Sure, good to know this.  I will wait a bit and just let me know if
I need to follow up anything.

Thanks,
Leo



Re: [PATCH v4 2/2] xen/arm: Enlarge identity map space to 10TB

2023-11-06 Thread Leo Yan
Hi Julien,

On Mon, Nov 06, 2023 at 09:39:24AM +, Julien Grall wrote:

[...]

> > I would like to check if here is anything specific I should follow up
> > on. Based on the discussion in this thread, I've come to the following
> > conclusions:
> > 
> > - Remove the fixes tags;
> > - Add a description in commit log, something like:
> >"Since commit 1c78d76b67e1 ('xen/arm64: mm: Introduce helpers to
> > prepare/enable/disable the identity mapping'), Xen will fail to boot
> > up if it's loaded in memory above 2TB. This commit fixes the
> > regression introduced by that commit."
> > - Add tages:
> >A review tag from Michal Orzel
> >A review tag from Bertrand Marquis
> >A test tag from Henry Wang
> > 
> > Should I repin a new patch set to address the items mentioned above?
> 
> You will also want to update the documentation after
> "docs/arm: Document where Xen should be loaded in
> memory"

Will do.

> > Another question is for the 'Release-acked-by' tag.  Henry gave this
> > tag, but I don't know how to handle it if I need to respin this patch.
> > Seems to me this is a special tag only for release process, so I don't
> > need to include it in the new patch, right?
> 
> The release-acked-by tag is only necessary during freeze period if the patch
> will land in the next release (i.e. 4.18). In this case, your patch will be
> part of the 4.19, so you can remove the release-acked-by.

Okay, I will _not_ include release-acked-by tag in the respin.

> If your patch was targeting 4.19, then it is usually fine to keep the
> release-acked-by even for the respin. It means that the release manager is
> happy for the patch to go assuming the patch has all the necessary reviews.

Thanks for explaination.

Leo



Re: [PATCH v4 2/2] xen/arm: Enlarge identity map space to 10TB

2023-11-05 Thread Leo Yan
Hi all,

On Wed, Oct 18, 2023 at 07:11:11PM +0100, Julien Grall wrote:

[...]

> > Anyway, both Bertrand and you seems to be against the Fixes tag here. So
> > I can compromise with the "This commit fixes...". However, can Bertrand
> > or you update process/send-patches.pandoc so it is clear for a
> > contributor when they should add Fixes tag (which BTW I still disagree
> > with but if the majority agrees, then I will not nack)?
> 
> We had a chat about this during the Arm maintainer calls. The disagreement
> boiled down to the fact that SUPPORT.md (or the documentation) doesn't say
> anything about whether loading Xen above 2TB was supported or not. Depending
> on the view, one could consider a bug or not.
> 
> Looking through the documentation, the best place to document might actually
> be misc/arm/booting.txt where we already have some requirements to boot Xen
> (such the binary must be entered in NS EL2 mode).
> 
> I will prepare a patch and send one.

I saw Julien's patch "docs/arm: Document where Xen should be loaded in
memory" has landed on the staging branch [1].

Should I respin this patch set and update the document to 10TB
limitation?  I need maintainer's sugguestion, otherwise I don't know
how to proceed for this patch set.

Thanks,
Leo

[1] 
https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=5415b2b2118bd78d8a04f276a8312f7f0cb1a466



Re: [PATCH v4] xen/arm: Skip memory nodes if not enabled

2023-11-05 Thread Leo Yan
Hi maintainers,

On Fri, Oct 13, 2023 at 08:04:42PM +0800, Leo Yan wrote:
> Currently, Xen doesn't check the status property of memory/reserved
> memory nodes, which may lead to the following issues:
> 
> - If a memory node has a status "disabled" it implies that it should
>   not be used. Xen does not handle the status property for the memory
>   node and ends up using it.
> 
> - If a reserved memory node has a status "disabled", it means that this
>   region is no longer reserved and can be used, but the "disabled"
>   status is not handled by Xen.
> 
>   Xen passes the intact device tree binding of the reserved memory nodes
>   to Dom0 and creates a memory node to cover reserved regions. Disabled
>   reserved memory nodes are ignored by the Dom0 Linux kernel, thus the
>   Dom0 Linux kernel will continue to allocate pages from such a region.
> 
>   On the other hand, since the disabled status is not handled by Xen,
>   the disabled reserved memory regions are excluded from the page
>   management in Xen which results in Xen being unable to obtain the
>   corresponding MFN, in the end, Xen reports error like:
> 
>   (XEN) arch/arm/p2m.c:2202: d0v0: Failing to acquire the MFN 0x1a02dc
> 
> This patch introduces a function device_tree_node_is_available(). If it
> detects a memory node is not enabled, Xen will not add the memory region
> into the memory lists. In the end, this avoids to generate the memory
> node for the disabled memory regions sent to the kernel and the kernel
> cannot use the disabled memory nodes any longer.
> 
> Since this patch adds checking device node's status in the
> device_tree_get_meminfo() function, except it checks for memory nodes
> and reserved memory nodes, it also supports status for static memory
> and static heap.
> 
> Suggested-by: Michal Orzel 
> Signed-off-by: Leo Yan 
> Reviewed-by: Michal Orzel 

Gentle ping.

I don't see this patch is landed in Xen master or staging branch, should
anything I need to follow up?

Thanks,
Leo



Re: [PATCH v4 2/2] xen/arm: Enlarge identity map space to 10TB

2023-10-19 Thread Leo Yan
Hi Julien,

On Wed, Oct 18, 2023 at 07:11:11PM +0100, Julien Grall wrote:
> On 18/10/2023 11:59, Julien Grall wrote:
> > On 17/10/2023 20:58, Stefano Stabellini wrote:

[...]

> > I don't really see the problem for someone to mistakenly backport this
> > patch. In fact, this could potentially save them a lot of debugging if
> > it happens that Xen is loaded above 2TB.
> > 
> > Anyway, both Bertrand and you seems to be against the Fixes tag here. So
> > I can compromise with the "This commit fixes...". However, can Bertrand
> > or you update process/send-patches.pandoc so it is clear for a
> > contributor when they should add Fixes tag (which BTW I still disagree
> > with but if the majority agrees, then I will not nack)?
> 
> We had a chat about this during the Arm maintainer calls. The disagreement
> boiled down to the fact that SUPPORT.md (or the documentation) doesn't say
> anything about whether loading Xen above 2TB was supported or not. Depending
> on the view, one could consider a bug or not.
> 
> Looking through the documentation, the best place to document might actually
> be misc/arm/booting.txt where we already have some requirements to boot Xen
> (such the binary must be entered in NS EL2 mode).
> 
> I will prepare a patch and send one.


I would like to check if here is anything specific I should follow up
on. Based on the discussion in this thread, I've come to the following
conclusions:

- Remove the fixes tags;
- Add a description in commit log, something like:
  "Since commit 1c78d76b67e1 ('xen/arm64: mm: Introduce helpers to
   prepare/enable/disable the identity mapping'), Xen will fail to boot
   up if it's loaded in memory above 2TB. This commit fixes the
   regression introduced by that commit."
- Add tages:
  A review tag from Michal Orzel
  A review tag from Bertrand Marquis
  A test tag from Henry Wang

Should I repin a new patch set to address the items mentioned above?

Another question is for the 'Release-acked-by' tag.  Henry gave this
tag, but I don't know how to handle it if I need to respin this patch.
Seems to me this is a special tag only for release process, so I don't
need to include it in the new patch, right?

Thanks all of you for the efforts on this patch set!

Leo



Re: [PATCH v4 2/2] xen/arm: Enlarge identity map space to 10TB

2023-10-16 Thread Leo Yan
On Mon, Oct 16, 2023 at 01:40:26PM +, Bertrand Marquis wrote:

[...]

> > This patch enlarges identity map space to 10TB, allowing module loading
> > within the range of [0x0 .. 0x09ff__].
> > 
> > Fixes: 1c78d76b67 ("xen/arm64: mm: Introduce helpers to 
> > prepare/enable/disable")
> 
> I agree with Michal here, this is not a fix so this should be removed (can be 
> done
> on commit).

This is fine for me.

I'd like to confirm with maintainers that should I spin a new patch
set to remove the fix tag?  Or maintainers could help to remove it
when pick up this patch set.

And thanks for review, Michal and Bertrand.

Leo

> > Reported-by: Alexey Klimov 
> > Signed-off-by: Leo Yan 
> 
> Reviewed-by: Bertrand Marquis 
> 
> Cheers
> Bertrand



[PATCH v4 2/2] xen/arm: Enlarge identity map space to 10TB

2023-10-13 Thread Leo Yan
On ADLink AVA platform (Ampere Altra SoC with 32 Arm Neoverse N1 cores),
the physical memory regions are:

  DRAM memory regions:
Node[0] Region[0]: 0x8000 - 0x
Node[0] Region[1]: 0x0800 - 0x08007fff
Node[0] Region[2]: 0x0801 - 0x0807

The UEFI loads Xen hypervisor and DTB into the high memory, the kernel
and ramdisk images are loaded into the low memory space:

  (XEN) MODULE[0]: 0807f6df - 0807f6f3e000 Xen
  (XEN) MODULE[1]: 0807f8054000 - 0807f8056000 Device Tree
  (XEN) MODULE[2]: fa834000 - fc5de1d5 Ramdisk
  (XEN) MODULE[3]: fc5df000 - ffb3f810 Kernel

In this case, the Xen binary is loaded above 8TB, which exceeds the
maximum supported identity map space of 2TB in Xen. Consequently, the
system fails to boot.

This patch enlarges identity map space to 10TB, allowing module loading
within the range of [0x0 .. 0x09ff__].

Fixes: 1c78d76b67 ("xen/arm64: mm: Introduce helpers to prepare/enable/disable")
Reported-by: Alexey Klimov 
Signed-off-by: Leo Yan 
---
 xen/arch/arm/arm64/mm.c   | 6 --
 xen/arch/arm/include/asm/mmu/layout.h | 8 
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/xen/arch/arm/arm64/mm.c b/xen/arch/arm/arm64/mm.c
index 78b7c7eb00..cb69df0661 100644
--- a/xen/arch/arm/arm64/mm.c
+++ b/xen/arch/arm/arm64/mm.c
@@ -41,7 +41,8 @@ static void __init prepare_boot_identity_mapping(void)
 clear_page(boot_third_id);
 
 if ( id_offsets[0] >= IDENTITY_MAPPING_AREA_NR_L0 )
-panic("Cannot handle ID mapping above 2TB\n");
+panic("Cannot handle ID mapping above %uTB\n",
+  IDENTITY_MAPPING_AREA_NR_L0 >> 1);
 
 /* Link first ID table */
 pte = mfn_to_xen_entry(virt_to_mfn(boot_first_id), MT_NORMAL);
@@ -74,7 +75,8 @@ static void __init prepare_runtime_identity_mapping(void)
 DECLARE_OFFSETS(id_offsets, id_addr);
 
 if ( id_offsets[0] >= IDENTITY_MAPPING_AREA_NR_L0 )
-panic("Cannot handle ID mapping above 2TB\n");
+panic("Cannot handle ID mapping above %uTB\n",
+  IDENTITY_MAPPING_AREA_NR_L0 >> 1);
 
 /* Link first ID table */
 pte = pte_of_xenaddr((vaddr_t)xen_first_id);
diff --git a/xen/arch/arm/include/asm/mmu/layout.h 
b/xen/arch/arm/include/asm/mmu/layout.h
index 2cb2382fbf..eac7eef885 100644
--- a/xen/arch/arm/include/asm/mmu/layout.h
+++ b/xen/arch/arm/include/asm/mmu/layout.h
@@ -19,11 +19,11 @@
  *   2G -   4G   Domheap: on-demand-mapped
  *
  * ARM64 layout:
- * 0x - 0x01ff (2TB, L0 slots [0..3])
+ * 0x - 0x09ff (10TB, L0 slots [0..19])
  *
  *  Reserved to identity map Xen
  *
- * 0x0200 - 0x027f (512GB, L0 slot [4])
+ * 0x0a00 - 0x0a7f (512GB, L0 slot [20])
  *  (Relative offsets)
  *   0  -   2M   Unmapped
  *   2M -  10M   Xen text, data, bss
@@ -35,7 +35,7 @@
  *
  *  32G -  64G   Frametable: 56 bytes per page for 2TB of RAM
  *
- * 0x0280 - 0x7fff (125TB, L0 slots [5..255])
+ * 0x0a80 - 0x7fff (512GB+117TB, L0 slots [21..255])
  *  Unused
  *
  * 0x8000 - 0x84ff (5TB, L0 slots [256..265])
@@ -49,7 +49,7 @@
 #define XEN_VIRT_START  _AT(vaddr_t, MB(2))
 #else
 
-#define IDENTITY_MAPPING_AREA_NR_L0 4
+#define IDENTITY_MAPPING_AREA_NR_L0 20
 #define XEN_VM_MAPPING  SLOT0(IDENTITY_MAPPING_AREA_NR_L0)
 
 #define SLOT0_ENTRY_BITS  39
-- 
2.39.2




[PATCH v4 0/2] xen/arm: Enlarge identity map space

2023-10-13 Thread Leo Yan
The latest Xen fails to boot on ADLink AVA platform.  Alexey Klimov root
caused the issue is related with the commit 1c78d76b67 ("xen/arm64: mm:
Introduce helpers to prepare/enable/disable").

This is because on ADLink AVA platform, it loads Xen hypervisor to the
address above 8TB and hence causes Xen panic (see patch 02 for details).

To fix this issue, this series is to enlarge identity map space to 10TB.

Note, I did a smoke test for this patch set, but we have no chance to
test it on the AVA board which reported the issue since the board is
bricked now.  And it's likely we don't need this patch set anymore after
updating firmware on the AVA machine.

Hope this patch set can be helpful for the system with big memory, I'd
leave the judgement to maintainers for picking it or not.

Changes from v3:
- Changed the identity map space from 127TB to 10TB (Julien Grall).

Changes from v2:
- Kept macro naming IDENTITY_MAPPING_AREA_NR_L0 and removed introduced
  macros (Julien Grall).
- Minor improvement for coding style (Julien Grall).
- Added platform's detail in the patch 02 commit log (Julien Grall).

Changes from v1:
- Rebased on staging branch (Bertrand);
- Added Alexey Klimov tested tag for patch 01 (Alexey).
- Corrected the printing log with dynamically calculation ID map size.


Leo Yan (2):
  xen/arm: Add macro XEN_VM_MAPPING
  xen/arm: Enlarge identity map space to 10TB

 xen/arch/arm/arm64/mm.c   |  6 --
 xen/arch/arm/include/asm/mmu/layout.h | 17 +
 2 files changed, 13 insertions(+), 10 deletions(-)

-- 
2.39.2




[PATCH v4 1/2] xen/arm: Add macro XEN_VM_MAPPING

2023-10-13 Thread Leo Yan
Xen maps the virtual memory space starting from L0 slot 4, so it's open
coded for macros with the offset '4'.

For more readable, add a new macro XEN_VM_MAPPING which defines the
start slot for Xen virtual memory mapping, and all virtual memory
regions are defined based on it.

Acked-by: Julien Grall 
Signed-off-by: Leo Yan 
---
 xen/arch/arm/include/asm/mmu/layout.h | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/xen/arch/arm/include/asm/mmu/layout.h 
b/xen/arch/arm/include/asm/mmu/layout.h
index da6be276ac..2cb2382fbf 100644
--- a/xen/arch/arm/include/asm/mmu/layout.h
+++ b/xen/arch/arm/include/asm/mmu/layout.h
@@ -49,11 +49,14 @@
 #define XEN_VIRT_START  _AT(vaddr_t, MB(2))
 #else
 
+#define IDENTITY_MAPPING_AREA_NR_L0 4
+#define XEN_VM_MAPPING  SLOT0(IDENTITY_MAPPING_AREA_NR_L0)
+
 #define SLOT0_ENTRY_BITS  39
 #define SLOT0(slot) (_AT(vaddr_t,slot) << SLOT0_ENTRY_BITS)
 #define SLOT0_ENTRY_SIZE  SLOT0(1)
 
-#define XEN_VIRT_START  (SLOT0(4) + _AT(vaddr_t, MB(2)))
+#define XEN_VIRT_START  (XEN_VM_MAPPING + _AT(vaddr_t, MB(2)))
 #endif
 
 /*
@@ -116,12 +119,10 @@
 
 #else /* ARM_64 */
 
-#define IDENTITY_MAPPING_AREA_NR_L0  4
-
-#define VMAP_VIRT_START  (SLOT0(4) + GB(1))
+#define VMAP_VIRT_START  (XEN_VM_MAPPING + GB(1))
 #define VMAP_VIRT_SIZE   GB(1)
 
-#define FRAMETABLE_VIRT_START  (SLOT0(4) + GB(32))
+#define FRAMETABLE_VIRT_START  (XEN_VM_MAPPING + GB(32))
 #define FRAMETABLE_SIZEGB(32)
 #define FRAMETABLE_NR  (FRAMETABLE_SIZE / sizeof(*frame_table))
 
-- 
2.39.2




[PATCH v4] xen/arm: Skip memory nodes if not enabled

2023-10-13 Thread Leo Yan
Currently, Xen doesn't check the status property of memory/reserved
memory nodes, which may lead to the following issues:

- If a memory node has a status "disabled" it implies that it should
  not be used. Xen does not handle the status property for the memory
  node and ends up using it.

- If a reserved memory node has a status "disabled", it means that this
  region is no longer reserved and can be used, but the "disabled"
  status is not handled by Xen.

  Xen passes the intact device tree binding of the reserved memory nodes
  to Dom0 and creates a memory node to cover reserved regions. Disabled
  reserved memory nodes are ignored by the Dom0 Linux kernel, thus the
  Dom0 Linux kernel will continue to allocate pages from such a region.

  On the other hand, since the disabled status is not handled by Xen,
  the disabled reserved memory regions are excluded from the page
  management in Xen which results in Xen being unable to obtain the
  corresponding MFN, in the end, Xen reports error like:

  (XEN) arch/arm/p2m.c:2202: d0v0: Failing to acquire the MFN 0x1a02dc

This patch introduces a function device_tree_node_is_available(). If it
detects a memory node is not enabled, Xen will not add the memory region
into the memory lists. In the end, this avoids to generate the memory
node for the disabled memory regions sent to the kernel and the kernel
cannot use the disabled memory nodes any longer.

Since this patch adds checking device node's status in the
device_tree_get_meminfo() function, except it checks for memory nodes
and reserved memory nodes, it also supports status for static memory
and static heap.

Suggested-by: Michal Orzel 
Signed-off-by: Leo Yan 
Reviewed-by: Michal Orzel 
---

Changes from v3:
- Refined the commit log to avoid ambiguous description (Michal Orzel).

Changes from v2:
- Added checking for the DT property length (Julien Grall, Michal Orzel).

Changes from v1:
- Renamed function to device_tree_node_is_available() (Michal Orzel);
- Polished coding style (Michal Orzel);
- Refined commit log (Michal Orzel, Julien Grall).

 xen/arch/arm/bootfdt.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
index 2673ad17a1..d73f8e4971 100644
--- a/xen/arch/arm/bootfdt.c
+++ b/xen/arch/arm/bootfdt.c
@@ -16,6 +16,24 @@
 #include 
 #include 
 
+static bool __init device_tree_node_is_available(const void *fdt, int node)
+{
+const char *status;
+int len;
+
+status = fdt_getprop(fdt, node, "status", &len);
+if ( !status )
+return true;
+
+if ( len > 0 )
+{
+if ( !strcmp(status, "ok") || !strcmp(status, "okay") )
+return true;
+}
+
+return false;
+}
+
 static bool __init device_tree_node_matches(const void *fdt, int node,
 const char *match)
 {
@@ -97,6 +115,9 @@ static int __init device_tree_get_meminfo(const void *fdt, 
int node,
 paddr_t start, size;
 struct meminfo *mem = data;
 
+if ( !device_tree_node_is_available(fdt, node) )
+return 0;
+
 if ( address_cells < 1 || size_cells < 1 )
 {
 printk("fdt: property `%s': invalid #address-cells or #size-cells",
-- 
2.39.2





Re: [PATCH v3] xen/arm: Skip memory nodes if not enabled

2023-10-13 Thread Leo Yan
Hi Michal,

On Fri, Oct 13, 2023 at 01:00:12PM +0200, Michal Orzel wrote:
> Hi Leo,
> 
> On 13/10/2023 12:29, Leo Yan wrote:
> > 
> > 
> > Currently, the Xen hypervisor doesn't handle the status, the issue can
> > be described from two perspectives: the memory nodes and the reserved
> > memory nodes.
> The first part about the status is a bit ambiguous.
> How about (this can be done on commit):
> Currently, Xen doesn't check the status property of memory/reserved
> memory nodes, which may lead to the following issues:

Agreed. Will send the new patch soon.

[...]

> Reviewed-by: Michal Orzel 

Will add your review tag. Thanks for review and suggestion!

Leo



Re: [PATCH v2] xen/arm: Skip memory nodes if not enabled

2023-10-13 Thread Leo Yan
Hi Michal,

On Fri, Sep 29, 2023 at 10:11:19AM +0200, Michal Orzel wrote:

[...]

> > +static bool __init device_tree_node_is_available(const void *fdt, int node)
> > +{
> > +const char *status = fdt_getprop(fdt, node, "status", NULL);
> Please see Julien's comment for v1. To save some jumps,instructions
> we should also check for length of the property to be > 0, just like we do in 
> dt_device_is_available().

Sorry for I missed Julien's comment.  Have sent patch v3 for review.

And apologize for late replying.

Leo



[PATCH v3] xen/arm: Skip memory nodes if not enabled

2023-10-13 Thread Leo Yan
Currently, the Xen hypervisor doesn't handle the status, the issue can
be described from two perspectives: the memory nodes and the reserved
memory nodes.

- If a memory node has a status "disabled" it implies that it should
  not be used. Xen does not handle the status property for the memory
  node and ends up using it.

- If a reserved memory node has a status "disabled", it means that this
  region is no longer reserved and can be used, but the "disabled"
  status is not handled by Xen.

  Xen passes the intact device tree binding of the reserved memory nodes
  to Dom0 and creates a memory node to cover reserved regions. Disabled
  reserved memory nodes are ignored by the Dom0 Linux kernel, thus the
  Dom0 Linux kernel will continue to allocate pages from such a region.

  On the other hand, since the disabled status is not handled by Xen,
  the disabled reserved memory regions are excluded from the page
  management in Xen which results in Xen being unable to obtain the
  corresponding MFN, in the end, Xen reports error like:

  (XEN) arch/arm/p2m.c:2202: d0v0: Failing to acquire the MFN 0x1a02dc

This patch introduces a function device_tree_node_is_available(). If it
detects a memory node is not enabled, Xen will not add the memory region
into the memory lists. In the end, this avoids to generate the memory
node for the disabled memory regions sent to the kernel and the kernel
cannot use the disabled memory nodes any longer.

Since this patch adds checking device node's status in the
device_tree_get_meminfo() function, except it checks for memory nodes
and reserved memory nodes, it also supports status for static memory
and static heap.

Suggested-by: Michal Orzel 
Signed-off-by: Leo Yan 
---

Changes from v2:
- Added checking for the DT property length (Julien Grall, Michal Orzel).

Changes from v1:
- Renamed function to device_tree_node_is_available() (Michal Orzel);
- Polished coding style (Michal Orzel);
- Refined commit log (Michal Orzel, Julien Grall).

 xen/arch/arm/bootfdt.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
index 2673ad17a1..d73f8e4971 100644
--- a/xen/arch/arm/bootfdt.c
+++ b/xen/arch/arm/bootfdt.c
@@ -16,6 +16,24 @@
 #include 
 #include 
 
+static bool __init device_tree_node_is_available(const void *fdt, int node)
+{
+const char *status;
+int len;
+
+status = fdt_getprop(fdt, node, "status", &len);
+if ( !status )
+return true;
+
+if ( len > 0 )
+{
+if ( !strcmp(status, "ok") || !strcmp(status, "okay") )
+return true;
+}
+
+return false;
+}
+
 static bool __init device_tree_node_matches(const void *fdt, int node,
 const char *match)
 {
@@ -97,6 +115,9 @@ static int __init device_tree_get_meminfo(const void *fdt, 
int node,
 paddr_t start, size;
 struct meminfo *mem = data;
 
+if ( !device_tree_node_is_available(fdt, node) )
+return 0;
+
 if ( address_cells < 1 || size_cells < 1 )
 {
 printk("fdt: property `%s': invalid #address-cells or #size-cells",
-- 
2.39.2




Re: [PATCH] xen/arm: Skip memory nodes if not enabled

2023-09-28 Thread Leo Yan
Hi Michal, Julien,

On Wed, Sep 27, 2023 at 02:49:23PM +0200, Michal Orzel wrote:

[...]

> Either way is fine. The advantage of placing the check in boot_fdt_info() is
> that we can have a single check instead of duplicated and we do the check 
> before
> the "first" use which happens to be the banks sorting. The advantage of 
> setup_mm()
> is that it is the one dealing with memory banks and is called after 
> early_print_info()
> so user can see some additional info.
> 
> @Leo, will you send a patch for that? Don't feel obliged as it is not 
> strictly related
> to your patch in which case I can handle it.

@Michal, since you have much better sense than me for adding check for
memory banks, I'd like to leave it to you.

I just refined the patch v2 according to your comments and sent out
for review.

Thanks,
Leo



[PATCH v2] xen/arm: Skip memory nodes if not enabled

2023-09-28 Thread Leo Yan
Currently, the Xen hypervisor doesn't handle the status, the issue can
be described from two perspectives: the memory nodes and the reserved
memory nodes.

- If a memory node has a status "disabled" it implies that it should
  not be used. Xen does not handle the status property for the memory
  node and ends up using it.

- If a reserved memory node has a status "disabled", it means that this
  region is no longer reserved and can be used, but the "disabled"
  status is not handled by Xen.

  Xen passes the intact device tree binding of the reserved memory nodes
  to Dom0 and creates a memory node to cover reserved regions. Disabled
  reserved memory nodes are ignored by the Dom0 Linux kernel, thus the
  Dom0 Linux kernel will continue to allocate pages from such a region.

  On the other hand, since the disabled status is not handled by Xen,
  the disabled reserved memory regions are excluded from the page
  management in Xen which results in Xen being unable to obtain the
  corresponding MFN, in the end, Xen reports error like:

  (XEN) arch/arm/p2m.c:2202: d0v0: Failing to acquire the MFN 0x1a02dc

This patch introduces a function device_tree_node_is_available(). If it
detects a memory node is not enabled, Xen will not add the memory region
into the memory lists. In the end, this avoids to generate the memory
node for the disabled memory regions sent to the kernel and the kernel
cannot use the disabled memory nodes any longer.

Since this patch adds checking device node's status in the
device_tree_get_meminfo() function, except it checks for memory nodes
and reserved memory nodes, it also supports status for static memory
and static heap.

Suggested-by: Michal Orzel 
Signed-off-by: Leo Yan 
---

Changes from v1:
- Renamed function to device_tree_node_is_available() (Michal Orzel);
- Polished coding style (Michal Orzel);
- Refined commit log (Michal Orzel, Julien Grall).

 xen/arch/arm/bootfdt.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
index 2673ad17a1..1b80d2d622 100644
--- a/xen/arch/arm/bootfdt.c
+++ b/xen/arch/arm/bootfdt.c
@@ -16,6 +16,19 @@
 #include 
 #include 
 
+static bool __init device_tree_node_is_available(const void *fdt, int node)
+{
+const char *status = fdt_getprop(fdt, node, "status", NULL);
+
+if ( !status )
+return true;
+
+if ( !strcmp(status, "ok") || !strcmp(status, "okay") )
+return true;
+
+return false;
+}
+
 static bool __init device_tree_node_matches(const void *fdt, int node,
 const char *match)
 {
@@ -97,6 +110,9 @@ static int __init device_tree_get_meminfo(const void *fdt, 
int node,
 paddr_t start, size;
 struct meminfo *mem = data;
 
+if ( !device_tree_node_is_available(fdt, node) )
+return 0;
+
 if ( address_cells < 1 || size_cells < 1 )
 {
 printk("fdt: property `%s': invalid #address-cells or #size-cells",
-- 
2.39.2




Re: [PATCH] xen/arm: Skip memory nodes if not enabled

2023-09-26 Thread Leo Yan
Hi Luca,

On Tue, Sep 26, 2023 at 10:10:57AM +, Luca Fancellu wrote:

[...]

> >> I might be wrong, but reading the specifications seems that “status” is 
> >> not a property
> >> of the child nodes of /reserved-memory, so I’m not sure Xen should do 
> >> something about it.
> >
> > Please take a look at dt documentation (v0.4) for /memory and 
> > /reserved-memory.
> > Under the tables listing possible properties, there is a statement:
> > Note: All other standard properties (Section 2.3) are allowed but are 
> > optional.
> 
> Thanks for pointing that out, I missed that bit.

Though ePAPR [1] doesn't explicitly say "status" property can be
included by memory nodes and reserved memory nodes, "status" property
is a standard property.

The Linux kernel checks "status" property for memory nodes and reserved
memory nodes with the of_fdt_device_is_available() function. Actually,
I just reuse this function and rename it in current patch :)

Thanks,
Leo

[1] https://elinux.org/images/c/cf/Power_ePAPR_APPROVED_v1.1.pdf



Re: [PATCH] xen/arm: Skip memory nodes if not enabled

2023-09-26 Thread Leo Yan
Hi Michal,

On Tue, Sep 26, 2023 at 10:36:04AM +0200, Michal Orzel wrote:

[...]

> > Essentially, this issue is caused by the Xen hypervisor which misses to
> > handle the status for the memory nodes (for both the normal memory nodes
> > and the reserved memory nodes) and transfers them to the Linux kernel.
> > 
> > This patch introduces a function memory_node_is_available(). If it
> > detects a memory node is not enabled, the hypervisor will not add the
> > memory region into the memory lists. In the end, this avoids to generate
> > the memory nodes from the memory lists sent to the kernel and the kernel
> > cannot use the disabled memory nodes any longer.
> 
> Interesting. So FWICS, we have 2 issues that have a common ground:
>
> 1) If the reserved memory node has a status "disabled", it implies that this 
> region
> is no longer reserved and can be used which is not handled today by Xen and 
> leads
> to the above mentioned problem.

Correct.

> 2) If the memory node has a status "disabled" it implies that it should not 
> be used
> which is not the case in current Xen. This means that at the moment, Xen 
> would try
> to use such a memory region which is incorrect.

Exactly.

> I think the commit msg should be more generic and focus on these two issues.
> Summary:
> Xen does not handle the status property of memory nodes and ends up using 
> them.
> Xen does not handle the status property of reserved memory nodes. If status 
> is disabled
> it means the reserved region shall no longer be treated as reserved. Xen 
> passes the reserved
> memory nodes untouched to dom0 fdt and creates a memory node to cover 
> reserved regions.
> Disabled reserved memory nodes are ignored by the guest which leaves us with 
> the exposed
> memory nodes. Guest can access such a region but it is excluded from the page 
> management in Xen
> which results in Xen being unable to obtain the corresponding MFN.

Yes, thanks a lot for good summary!

In theory, a good practice should use two separate patches to fix two
issues respectively. Given we can use simple change to fix these two
issues in one patch, I will amend the patch's commit log with your
summary for better recording these issues.

> > Signed-off-by: Leo Yan 
> > ---
> >  xen/arch/arm/bootfdt.c | 16 
> >  1 file changed, 16 insertions(+)
> > 
> > diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
> > index 2673ad17a1..b46dde06a9 100644
> > --- a/xen/arch/arm/bootfdt.c
> > +++ b/xen/arch/arm/bootfdt.c
> > @@ -206,11 +206,27 @@ int __init device_tree_for_each_node(const void *fdt, 
> > int node,
> >  return 0;
> >  }
> > 
> > +static bool __init memory_node_is_available(const void *fdt, unsigned long 
> > node)
>
> This function is not strictly related to memory node so it would be better to 
> call it e.g. device_tree_node_is_available.
> This way it can be used in the future for other nodes if needed. Also, I 
> would move it somewhere near the top of the file
> next to other helpers.

Will do.

> Also, node should be of type 'int'

Will fix.

> > +{
> > +const char *status = fdt_getprop(fdt, node, "status", NULL);
> > +
> > +if (!status)
> white spaces between brackets and condition
> if ( !status )

Will fix.

> > +return true;
> > +
> > +if (!strcmp(status, "ok") || !strcmp(status, "okay"))
> white spaces between brackets and condition
> if ( !strcmp(status, "ok") || !strcmp(status, "okay") )

Will fix.

> > +return true;
> > +
> > +return false;
> > +}
> > +
> >  static int __init process_memory_node(const void *fdt, int node,
> >const char *name, int depth,
> >u32 address_cells, u32 size_cells,
> >void *data)
> >  {
> > +if (!memory_node_is_available(fdt, node))
> > +return 0;
>
> I would move this check to device_tree_get_meminfo()

Okay, I will do it.

> > +
> >  return device_tree_get_meminfo(fdt, node, "reg", address_cells, 
> > size_cells,
> > data, MEMBANK_DEFAULT);
> >  }
> > --
> > 2.39.2
> > 
> > 
> 
> Also, I think it would be nice to add ASSERT(bootinfo.mem.nr_banks); e.g. in 
> boot_fdt_info().
> Otherwise the panic from Xen when there is no memory bank:
> The frametable cannot cover the physical region ...
> is not really meaningful for normal users.

I'd like to use a separate patch to validate the memory banks.
> 
> This is just my suggestion (@Julien ?)

Thank you a lot for review.

Leo



Re: RFC: arm64: Handling reserved memory nodes

2023-09-26 Thread Leo Yan
Hi Julien,

On Wed, Sep 20, 2023 at 11:36:24AM +0100, Julien Grall wrote:

[...]

> > Host dt:
> > memory@4000 {
> >  reg = <0x00 0x4000 0x01 0x00>;
> >  device_type = "memory";
> > };
> > 
> > reserved-memory {
> >  #size-cells = <0x02>;
> >  #address-cells = <0x02>;
> >  ranges;
> > 
> >  test@5000 {
> >  reg = <0x00 0x5000 0x00 0x1000>;
> >  no-map;
> >  };
> > };
> > 
> > Xen:
> > (XEN) MODULE[0]: 4ac0 - 4ad65000 Xen
> > (XEN) MODULE[1]: 4ae0 - 4ae03000 Device Tree
> > (XEN) MODULE[2]: 42c0 - 4aa8ea8b Ramdisk
> > (XEN) MODULE[3]: 4040 - 42b3 Kernel
> > (XEN)  RESVD[0]: 5000 - 5fff
> > ...
> > (XEN) BANK[0] 0x00c000-0x01 (1024MB)
> > 
> > Linux dom0:
> > [0.00] OF: reserved mem: 0x5000..0x5fff 
> > (262144 KiB) nomap non-reusable test@5000
> 
> So Linux should tell whether a region has been reserved. @Leo, can you share
> with us the serial console? Can you confirm the version of Xen you are
> using?

Finally, I located the issue is caused by the reserved memory node
with "disabled" status. In the end, it mismatches for handling the
disabled memory nodes between the Xen hypervisor and the Linux kernel.

I think I should set the status property from the reserved memory
nodes for my debugging platform. But for the case that the memory
nodes (for both normal and reserved nodes) which are disabled, the Xen
hypervisor should ignore them. So I sent a new patch to address the
issue for disabled memory nodes:

https://lore.kernel.org/xen-devel/2023092605.180811-1-leo@linaro.org/T/#u

Hope now we are clear for the issue. If not, please let me know.

Thanks,
Leo



[PATCH] xen/arm: Skip memory nodes if not enabled

2023-09-25 Thread Leo Yan
During the Linux kernel booting, an error is reported by the Xen
hypervisor:

  (XEN) arch/arm/p2m.c:2202: d0v0: Failing to acquire the MFN 0x1a02dc

The kernel attempts to use an invalid memory frame number, which can be
converted to: 0x1a02dc << PAGE_SHIFT, resulting in 0x1_a02d_c000.

The invalid memory frame falls into a reserved memory node, which is
defined in the device tree as below:

  reserved-memory {
  #address-cells = <0x02>;
  #size-cells = <0x02>;
  ranges;

  ...

  ethosn_reserved {
  compatible = "shared-dma-pool";
  reg = <0x01 0xa000 0x00 0x2000>;
  status = "disabled";
  no-map;
  };

  ...
  };

Xen excludes all reserved memory regions from the frame management
through the dt_unreserved_regions() function. On the other hand, the
reserved memory nodes are passed to the Linux kernel. However, the Linux
kernel ignores the 'ethosn_reserved' node since its status is
"disabled". This leads to the Linux kernel to allocate pages from the
reserved memory range.

As a result, when the kernel passes the data structure residing in the
frame 0x1_a02d_c000 in the Xen hypervisor, the hypervisor detects that
it misses to manage the frame and reports the error.

Essentially, this issue is caused by the Xen hypervisor which misses to
handle the status for the memory nodes (for both the normal memory nodes
and the reserved memory nodes) and transfers them to the Linux kernel.

This patch introduces a function memory_node_is_available(). If it
detects a memory node is not enabled, the hypervisor will not add the
memory region into the memory lists. In the end, this avoids to generate
the memory nodes from the memory lists sent to the kernel and the kernel
cannot use the disabled memory nodes any longer.

Signed-off-by: Leo Yan 
---
 xen/arch/arm/bootfdt.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
index 2673ad17a1..b46dde06a9 100644
--- a/xen/arch/arm/bootfdt.c
+++ b/xen/arch/arm/bootfdt.c
@@ -206,11 +206,27 @@ int __init device_tree_for_each_node(const void *fdt, int 
node,
 return 0;
 }
 
+static bool __init memory_node_is_available(const void *fdt, unsigned long 
node)
+{
+const char *status = fdt_getprop(fdt, node, "status", NULL);
+
+if (!status)
+return true;
+
+if (!strcmp(status, "ok") || !strcmp(status, "okay"))
+return true;
+
+return false;
+}
+
 static int __init process_memory_node(const void *fdt, int node,
   const char *name, int depth,
   u32 address_cells, u32 size_cells,
   void *data)
 {
+if (!memory_node_is_available(fdt, node))
+return 0;
+
 return device_tree_get_meminfo(fdt, node, "reg", address_cells, size_cells,
data, MEMBANK_DEFAULT);
 }
-- 
2.39.2




Re: RFC: arm64: Handling reserved memory nodes

2023-09-20 Thread Leo Yan
On Wed, Sep 20, 2023 at 11:36:24AM +0100, Julien Grall wrote:

[...]

> Can you confirm the version of Xen you are using?

For this question, I use a very new Xen repo with staging branch which
I git pulled at last week.

Thanks,
Leo



Re: RFC: arm64: Handling reserved memory nodes

2023-09-20 Thread Leo Yan
On Wed, Sep 20, 2023 at 11:36:24AM +0100, Julien Grall wrote:

[...]

> > > Could you confirm the Xen does write reserved memory nodes?  Or Xen
> > > converts the reserved memory nodes to normal memory nodes as I
> > > describe above :)
> > 
> > Xen passes the /reserved-memory node unchanged from host device tree to 
> > dom0 fdt.
> > Apart from that it creates an additional memory node covering the reserved 
> > ranges.
> > Take a look at this example run(based on qemu):
> 
> Thanks for providing an example! This is quite handy.

Yeah, thanks a lot for sharing the DT binding, Michal!

> > Host dt:
> > memory@4000 {
> >  reg = <0x00 0x4000 0x01 0x00>;
> >  device_type = "memory";
> > };
> > 
> > reserved-memory {
> >  #size-cells = <0x02>;
> >  #address-cells = <0x02>;
> >  ranges;
> > 
> >  test@5000 {
> >  reg = <0x00 0x5000 0x00 0x1000>;
> >  no-map;
> >  };
> > };
> > 
> > Xen:
> > (XEN) MODULE[0]: 4ac0 - 4ad65000 Xen
> > (XEN) MODULE[1]: 4ae0 - 4ae03000 Device Tree
> > (XEN) MODULE[2]: 42c0 - 4aa8ea8b Ramdisk
> > (XEN) MODULE[3]: 4040 - 42b3 Kernel
> > (XEN)  RESVD[0]: 5000 - 5fff
> > ...
> > (XEN) BANK[0] 0x00c000-0x01 (1024MB)
> > 
> > Linux dom0:
> > [0.00] OF: reserved mem: 0x5000..0x5fff 
> > (262144 KiB) nomap non-reusable test@5000
> 
> So Linux should tell whether a region has been reserved. @Leo, can you share
> with us the serial console? Can you confirm the version of Xen you are
> using?

Sure, I will share the logs and DTS in Dom0 kernel when I can access
my machine in next week (due to I am on vacation in this week).

I am a bit suspicious that I am not aware something in my working
kernel. If have any finding, I will share back.

Thanks,
Leo



Re: RFC: arm64: Handling reserved memory nodes

2023-09-20 Thread Leo Yan
Hi Julien,

On Mon, Sep 18, 2023 at 08:26:21PM +0100, Julien Grall wrote:

[...]

> ... from my understanding reserved-memory are just normal memory that are
> set aside for a specific purpose. So Xen has to create a 'memory' node *and*
> a 'reserved-memory' region.

To be clear, Xen passes the 'reserved-memory' regions as normal memory
nodes, see [1].

> With that the kernel is supposed to exclude all the 'reserved-memory' from
> normal usage unless they have the node contains the property 'reusable'.
> This was more clear before the binding was converted to YAML in [1].

Linux kernel reserves pages for memory ranges in the 'reserved-memory'
node, no matter the 'no-map' property for a range is set or not (see the
function memmap_init_reserved_pages() -> __SetPageReserved() in Linux
kernel).

If a reserved memory range is set with 'no-map' property, the memory
region will be not mapped in the kernel's identical address space.  This
avoids the data corruption caused between the memory speculative fetch
with cachable mapping and the same memory region is used by devices
(e.g. for DMA transferring).

[...]

> > Here the problem is these reserved memory regions are passed as normal
> > memory nodes to Dom0 kernel, then Dom0 kernel allocates pages from
> > these reserved memory regions.  Apparently, this might lead to conflict,
> > e.g. the reserved memory is used by Dom0 kernel, at the meantime the
> > memory is used by another purpose (e.g. by MCU in the system).
> 
> See above. I think this is correct to pass both 'memory' and
> 'reserved-memory'. Now, it is possible that Xen may not create the
> device-tree correctly.

Agreed that now Xen wrongly create DT binding for 'reserved-memory'
node, more specific, the reserved memory nodes are wrongly passed as
normal memory nodes (again, see [1]).

> I would suggest to look how Linux is populating the memory and whether it
> actually skipped the regions.

The Linux kernel reserves the corresponding pages for all reserved
memory regions, which means the kernel page management (buddy
alrogithm) doesn't allocate these pages at all.

With 'no-map' property, the memory range will not be mapped into the
kernel identical address space.

> > Here I am a bit confused for "Xen doesn't have the capability to know
> > the memory attribute".  I looked into the file arch/arm/guest_walk.c,
> > IIUC, it walks through the stage 1's page tables for the virtual
> > machine and get the permission for the mapping, we also can get to
> > know the mapping attribute, right?
> 
> Most of the time, Xen will use the HW to translate the guest virtual address
> to an intermediation physical address. Looking at the specification, it
> looks like that PAR_EL1 will contain the memory attribute which I didn't
> know.
> 
> We would then need to read MAIR_EL1 to find the attribute and also the
> memory attribute in the stage-2 to figure out the final memory attribute.

> This is feasible but the Xen ABI mandates that region passed to Xen have a
> specific memory attributes (see the comment at the top of
> xen/include/public/arch-arm.h).

If you refer to the comment "All memory which is shared with other
entities in the system ... which is mapped as Normal Inner Write-Back
Outer Write-Back Inner-Shareable", I don't think it's relevant with
current issue.  I will explain in details in below.

> Anyway, in your case, Linux is using the buffer is on the stack. So the
> region must have been mapped with the proper attribute.

I think you may misunderstand the issue.  I would like to divide the
issue into two parts:

- The first question is about how to pass reserved memory node from Xen
  hypervisor to Dom0 Linux kernel.  Currently, Xen hypervisor coverts
  the reserved memory ranges and add them into the normal memory node.

  Xen hypervisor should keep the reserved memory node and pass it to
  Dom0 Linux kernel.  With this change, the Dom0 kernel will only
  allocate pages from normal memory node and the data in these pages
  can be shared by Xen hypervisor and Dom0 Linux kernel.

- The second question is for memory attribute for the reserved memory
  node.  Note, the reserved memory ranges are not necessarily _shared_
  between the Xen hypervisor and Dom0 Linux kernel.  I think in most
  cases, the reserved memory will be ioremaped by drivers (for stage-1);
  and the Xen hypervisor should map P2M with the attribute
  p2m_mmio_direct_c, or we can explore a bit based on different
  properties, e.g. for 'no-map' memory range, we map P2M with
  p2m_mmio_direct_c; for 'reusable' memory range, we map with
  attribute 'p2m_ram_rw'.

To simplify the discussion, I think we can firstly finalize the fixing
for the fist question and hold on the second question.  After we fix
the first one, we can come back to think about the second issue.

> > Another question for the attribute for MMIO regions. For mapping MMIO
> > regions, prepare_dtb_hwdom() sets the attribute 'p2m_mmio_direct_c'
> > for the stage 2, but in t

Re: RFC: arm64: Handling reserved memory nodes

2023-09-16 Thread Leo Yan
Hi Julien,

On Thu, Sep 14, 2023 at 10:37:05AM +0100, Julien Grall wrote:

[...]

> > This error is caused by kernel using an invalid memory frame number
> > 0x1a02dc, we can convert it to the address:
> > 
> >0x1a02dc << PAGE_SHIFT = 0x1_a02d_c000
> 
> This error is coming from get_page_from_gva(). The use of the function is
> usually an indication that Xen is trying to access the page. Can you use
> WARN() to provide a full trace?

With adding WARN(), I can get below trace log:

  (XEN) arch/arm/p2m.c:2206: d0v0: Failing to acquire the MFN 0x1a0171
  (XEN) Xen WARN at arch/arm/p2m.c:2208
  (XEN) [ Xen-4.18-unstable  arm64  debug=y  Tainted:  S ]
  (XEN) CPU:0
  (XEN) PC: 7f27540c get_page_from_gva+0x30c/0x334
  (XEN) LR: 7f27540c
  (XEN) SP: 8002ffee7a20
  (XEN) CPSR:   8249 MODE:64-bit EL2h (Hypervisor, handler)
  (XEN)  X0: 7f318038  X1:   X2: 
  (XEN)  X3: 0001  X4: 7f2bf956  X5: 
  (XEN)  X6: 80022150  X7: 80022150  X8: 7f7f7f7f7f7f7f7f
  (XEN)  X9: 0080 X10: 7f7f7f7f7f7f7f7f X11: 0101010101010101
  (XEN) X12: 0008 X13: 0030 X14: 
  (XEN) X15: 3be6b29d X16: 000c X17: 9c4542e5
  (XEN) X18: 0014 X19: 8002fffcd000 X20: 8002ffeef000
  (XEN) X21: 7f08054050b8 X22:  X23: 001a0171
  (XEN) X24: 0001a0171d80 X25: 0004 X26: 
  (XEN) X27: 8002ffeef000 X28: 1000  FP: 8002ffee7a20
  (XEN) 
  (XEN)   VTCR_EL2: 800a3558
  (XEN)  VTTBR_EL2: 00010002fffca000
  (XEN) 
  (XEN)  SCTLR_EL2: 30cd183d
  (XEN)HCR_EL2: 807c663f
  (XEN)  TTBR0_EL2: 20251000
  (XEN) 
  (XEN)ESR_EL2: f201
  (XEN)  HPFAR_EL2: 00176800
  (XEN)FAR_EL2: ffc01190
  (XEN) 
  (XEN) Xen stack trace from sp=8002ffee7a20:
  (XEN)8002ffee7a80 7f26cc44 0d80 0018
  (XEN)0007 0018 8002ffee7cf8 ffc01187bd80
  (XEN)8002ffee7ad0 ffc8 8002ffee7ab0 7f32211e
  (XEN)8002ffee7b00 7f26ce54 ffdf ffc01187bd80
  (XEN)0007  8002fffcd000 ffc0117cc000
  (XEN)0001a01db000   
  (XEN)0007 7f2b0018 8002ffeefe16 
  (XEN)8002ffee7b10 7f21ab3c 8002ffee7e40 7f277c84
  (XEN)8002ffee7ea0 5a000ea1 8002ffee7f20 8002ffeef000
  (XEN)5a000ea1 ffc0117cc000 7f3190a4 7f319080
  (XEN)001b 8002ffee7c20 8002ffee7b90 7f26e27c
  (XEN) 7f26e384 8002ffee7bd0 7f269f88
  (XEN)001b 7f329000 8002ffee7c20 
  (XEN)1fff 8249 8002ffee7c10 7f2797cc
  (XEN)0010 5a000ea1 8002ffee7d28 8249
  (XEN)5a000ea1 ffc011603ba8 8002ffee7d40 7f26232c
  (XEN)8002ffee7c50 7f270a68 8002fffcd000 8002ffeef000
  (XEN)7f0800e4d0a8 8002ffeef000 8002ffee7c70 7f223f48
  (XEN)8002ffee7c90 7f224264 7f2bfa20 8002ffee7dca
  (XEN) 8002ffee7dca 8002ffee7cb0 7f2242dc
  (XEN)02c0 7f24b068 8002ffee7cd0 7f24b0c4
  (XEN)000a 7f26ce54 8002ffee7d40 7f2484c0
  (XEN)000a 5a000ea1 8002ffee7d20 7f224264
  (XEN)7f2bf950 5a000ea1 000a 000a
  (XEN)8002ffee7d40 7f2242bc 000a 02c0
  (XEN)8002ffee7d50 7f24946c 8002ffee7e40 7f277cc0
  (XEN)8002ffee7ea0 5a000ea1 8002ffee7f20 8002ffeef000
  (XEN)5a000ea1 ffc011613d40 ffc011713878 0059
  (XEN)8002ffee7dc0 7f223f48 8002ffee7de0 7f224264
  (XEN)8002ffee7de0 7f223f48 8002ffee7e00 7f224264
  (XEN)8002ffeefe10 8002ffeef000 7f318288 0004
  (XEN)8002ffee7e20 7f2242dc 0240 8002ffeef000
  (XEN)8002ffee7e40 7f27c484 0004 
  (XEN)8002ffee7e70 7f279400 8002ffee7ea0 5a000ea1
  (XEN)8002ffee7fa8 8005 ffc01187bd30 7f262480
  (XEN) ffc0117d2000 ffc01187bd30 7f262474
  (XEN)0007 ffc01187bd80 001a01db 0005
  (XEN)

RFC: arm64: Handling reserved memory nodes

2023-09-14 Thread Leo Yan
Hi all,

I'd like to discuss for how to handle the reserved memory nodes in DT
binding on Xen / Arm64.  Note, now I am using DTB when boot Xen but
not UEFI/ACPI (ACPI is disabled in this case).

## Failure

I ported Xen on a platform, after the kernel booting, the Xen hypervisor
reports error:

  (XEN) arch/arm/p2m.c:2202: d0v0: Failing to acquire the MFN 0x1a02dc

This error is caused by kernel using an invalid memory frame number
0x1a02dc, we can convert it to the address:

  0x1a02dc << PAGE_SHIFT = 0x1_a02d_c000

## Reason

Two important things we need to check.  One is what's the DT binding
passed from the bootloader to Xen hypervisor, and the second thing is
what's the DT binding passed from Xen hypervisor to kernel.

We can see the bootloader passes below memory nodes to Xen hypervisor:

  (XEN) RAM: 2000 - bfff
  (XEN) RAM: 0001a000 - 0002
  (XEN)
  (XEN) MODULE[0]: 2010 - 20265000 Xen
  (XEN) MODULE[1]: 2300 - 23024000 Device Tree
  (XEN) MODULE[2]: 2400 - 2800 Kernel
  (XEN)  RESVD[0]: 2000 - 2000
  (XEN)  RESVD[1]: 4000 - 5fff
  (XEN)  RESVD[2]: 0001a000 - 0001bfff
  (XEN)  RESVD[3]: 2e00 - 2fff

We can see the second DDR section is:

  [0x_0001_a000_ .. 0x_0002__]

And there have reserved memory section is:

  [0x_0001_a000_ .. 0x_0001_bfff_]

When register the boot memory sections, dt_unreserved_regions() will
remove all reserved memory sections, which means the section
[0x_0001_a000_ .. 0x_0001_bfff_] is not managed by Xen
hypervisor at all.  If later kernel uses any pages in this range, Xen
will report the error.

But Dom0 kernel's memory nodes are:

  [0.00] Early memory node ranges
  [0.00]   node   0: [mem 0x2000-0x2000]
  [0.00]   node   0: [mem 0x2e00-0x2fff]
  [0.00]   node   0: [mem 0x4000-0x5fff]
  [0.00]   node   0: [mem 0x6000-0x77bf]
  [0.00]   node   0: [mem 0x0001a000-0x0001bfff]

Based on this log, we can know Xen hypervisor passes the reserved memory
regions to Dom0 kernel, and when Dom0 kernel tries to allocate pages
from these reserved memory regions, Xen hypervisor reports error.

For more specific, this issue is cause by the commit 248faa637d ("xen/arm:
add reserved-memory regions to the dom0 memory node").  When Xen
hypervisor synthesizes DT nodes, it calls below function to generate
_normal_ memory nodes for the reserved memory regions.

  make_memory_node()
  {
 /*
  * Create a second memory node to store the ranges covering
  * reserved-memory regions.
  */
 res = make_memory_node(d, kinfo->fdt, addrcells, sizecells,
&bootinfo.reserved_mem);
 if ( res )
 return res;

 ...
  }

## Fixes

I think it's wrong to add the reserved memory regions into the DT
binding as normal memory nodes for Dom0 kernel.  On the other hand, we
cannot simply remove these reserved memory regions and don't pass to
Dom0 kernel - we might reserve memory for specific purpose, for example,
ramoops [1] for kernel debugging.

The right thing to do is to keep these reserved memory nodes to Dom0
kernel.  So one task is to record properties for these reserved memory
node name and properties and pass to Dom0 kernel.

The difficulty is how we can avoid allocate these reserved memory
regions in Xen hypervisor.  We cannot register the reserved memory
into the boot pages, otherwise, the reserved memory might be allocated
in the early phase.  But we need to register these pages into the
frame management framework and reserve them in the first place, so
that we can allow Dom0 kernel to use them.  (I checked a bit the static
memory mechanism, seems to me we cannot use it to resolve this issue).

Before I proceed, I'd like to check if Xen community has related
discussion or not?  Or any suggestions or input will be appreciate!

Leo

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/hisilicon/hi6220-hikey.dts#n54



[PATCH v3 2/2] xen/arm: Enlarge identity map space to 127TB

2023-09-13 Thread Leo Yan
On ADLink AVA platform (Ampere Altra SoC with 32 Arm Neoverse N1 cores),
the physical memory regions are:

  DRAM memory regions:
Node[0] Region[0]: 0x8000 - 0x
Node[0] Region[1]: 0x0800 - 0x08007fff
Node[0] Region[2]: 0x0801 - 0x0807

The UEFI loads Xen hypervisor and DTB into the high memory, the kernel
and ramdisk images are loaded into the low memory space:

  (XEN) MODULE[0]: 0807f6df - 0807f6f3e000 Xen
  (XEN) MODULE[1]: 0807f8054000 - 0807f8056000 Device Tree
  (XEN) MODULE[2]: fa834000 - fc5de1d5 Ramdisk
  (XEN) MODULE[3]: fc5df000 - ffb3f810 Kernel

In this case, the Xen binary is loaded above 8TB, which exceeds the
maximum supported identity map space of 2TB in Xen. Consequently, the
system fails to boot.

This patch enlarges identity map space to 127TB, allowing module loading
within the range of [0x0 .. 0x7eff__].

Note, despite this expansion of the identity map to 127TB, the frame
table still only supports 2TB.  The reason is the frame table is data
structure for the page management, which does not require coverage of
the memory layout gaps (refer to pfn_pdx_hole_setup() for Xen removing
the biggest gap from memory regions).  Thus, 2TB of memory support
remains sufficient for most use cases.

Fixes: 1c78d76b67 ("xen/arm64: mm: Introduce helpers to prepare/enable/disable")
Reported-by: Alexey Klimov 
Signed-off-by: Leo Yan 
---
 xen/arch/arm/arm64/mm.c   | 6 --
 xen/arch/arm/include/asm/mmu/layout.h | 8 
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/xen/arch/arm/arm64/mm.c b/xen/arch/arm/arm64/mm.c
index 78b7c7eb00..cb69df0661 100644
--- a/xen/arch/arm/arm64/mm.c
+++ b/xen/arch/arm/arm64/mm.c
@@ -41,7 +41,8 @@ static void __init prepare_boot_identity_mapping(void)
 clear_page(boot_third_id);
 
 if ( id_offsets[0] >= IDENTITY_MAPPING_AREA_NR_L0 )
-panic("Cannot handle ID mapping above 2TB\n");
+panic("Cannot handle ID mapping above %uTB\n",
+  IDENTITY_MAPPING_AREA_NR_L0 >> 1);
 
 /* Link first ID table */
 pte = mfn_to_xen_entry(virt_to_mfn(boot_first_id), MT_NORMAL);
@@ -74,7 +75,8 @@ static void __init prepare_runtime_identity_mapping(void)
 DECLARE_OFFSETS(id_offsets, id_addr);
 
 if ( id_offsets[0] >= IDENTITY_MAPPING_AREA_NR_L0 )
-panic("Cannot handle ID mapping above 2TB\n");
+panic("Cannot handle ID mapping above %uTB\n",
+  IDENTITY_MAPPING_AREA_NR_L0 >> 1);
 
 /* Link first ID table */
 pte = pte_of_xenaddr((vaddr_t)xen_first_id);
diff --git a/xen/arch/arm/include/asm/mmu/layout.h 
b/xen/arch/arm/include/asm/mmu/layout.h
index 2cb2382fbf..fa16d07d0d 100644
--- a/xen/arch/arm/include/asm/mmu/layout.h
+++ b/xen/arch/arm/include/asm/mmu/layout.h
@@ -19,11 +19,11 @@
  *   2G -   4G   Domheap: on-demand-mapped
  *
  * ARM64 layout:
- * 0x - 0x01ff (2TB, L0 slots [0..3])
+ * 0x - 0x7eff (127TB, L0 slots [0..253])
  *
  *  Reserved to identity map Xen
  *
- * 0x0200 - 0x027f (512GB, L0 slot [4])
+ * 0x7f00 - 0x7f7f (512GB, L0 slot [254])
  *  (Relative offsets)
  *   0  -   2M   Unmapped
  *   2M -  10M   Xen text, data, bss
@@ -35,7 +35,7 @@
  *
  *  32G -  64G   Frametable: 56 bytes per page for 2TB of RAM
  *
- * 0x0280 - 0x7fff (125TB, L0 slots [5..255])
+ * 0x7f80 - 0x7fff (512GB, L0 slots [255])
  *  Unused
  *
  * 0x8000 - 0x84ff (5TB, L0 slots [256..265])
@@ -49,7 +49,7 @@
 #define XEN_VIRT_START  _AT(vaddr_t, MB(2))
 #else
 
-#define IDENTITY_MAPPING_AREA_NR_L0 4
+#define IDENTITY_MAPPING_AREA_NR_L0 254
 #define XEN_VM_MAPPING  SLOT0(IDENTITY_MAPPING_AREA_NR_L0)
 
 #define SLOT0_ENTRY_BITS  39
-- 
2.34.1




[PATCH v3 1/2] xen/arm: Add macro XEN_VM_MAPPING

2023-09-13 Thread Leo Yan
Xen maps the virtual memory space starting from L0 slot 4, so it's open
coded for macros with the offset '4'.

For more readable, add a new macro XEN_VM_MAPPING which defines the
start slot for Xen virtual memory mapping, and all virtual memory
regions are defined based on it.

Signed-off-by: Leo Yan 
---
 xen/arch/arm/include/asm/mmu/layout.h | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/xen/arch/arm/include/asm/mmu/layout.h 
b/xen/arch/arm/include/asm/mmu/layout.h
index da6be276ac..2cb2382fbf 100644
--- a/xen/arch/arm/include/asm/mmu/layout.h
+++ b/xen/arch/arm/include/asm/mmu/layout.h
@@ -49,11 +49,14 @@
 #define XEN_VIRT_START  _AT(vaddr_t, MB(2))
 #else
 
+#define IDENTITY_MAPPING_AREA_NR_L0 4
+#define XEN_VM_MAPPING  SLOT0(IDENTITY_MAPPING_AREA_NR_L0)
+
 #define SLOT0_ENTRY_BITS  39
 #define SLOT0(slot) (_AT(vaddr_t,slot) << SLOT0_ENTRY_BITS)
 #define SLOT0_ENTRY_SIZE  SLOT0(1)
 
-#define XEN_VIRT_START  (SLOT0(4) + _AT(vaddr_t, MB(2)))
+#define XEN_VIRT_START  (XEN_VM_MAPPING + _AT(vaddr_t, MB(2)))
 #endif
 
 /*
@@ -116,12 +119,10 @@
 
 #else /* ARM_64 */
 
-#define IDENTITY_MAPPING_AREA_NR_L0  4
-
-#define VMAP_VIRT_START  (SLOT0(4) + GB(1))
+#define VMAP_VIRT_START  (XEN_VM_MAPPING + GB(1))
 #define VMAP_VIRT_SIZE   GB(1)
 
-#define FRAMETABLE_VIRT_START  (SLOT0(4) + GB(32))
+#define FRAMETABLE_VIRT_START  (XEN_VM_MAPPING + GB(32))
 #define FRAMETABLE_SIZEGB(32)
 #define FRAMETABLE_NR  (FRAMETABLE_SIZE / sizeof(*frame_table))
 
-- 
2.34.1




[PATCH v3 0/2] xen/arm: Enlarge identity map space

2023-09-13 Thread Leo Yan
The latest Xen fails to boot on ADLink AVA platform.  Alexey Klimov root
caused the issue is related with the commit 1c78d76b67 ("xen/arm64: mm:
Introduce helpers to prepare/enable/disable").

This is because on ADLink AVA platform, it loads Xen hypervisor to the
address above 8TB and hence causes Xen panic (see patch 02 for details).

To fix this issue, this series is to enlarge identity map space to
127 TiB and tested on AVA platform.

Changes from v2:
- Kept macro naming IDENTITY_MAPPING_AREA_NR_L0 and removed introduced
  macros (Julien Grall).
- Minor improvement for coding style (Julien Grall).
- Added platform's detail in the patch 02 commit log (Julien Grall).

Changes from v1:
- Rebased on staging branch (Bertrand);
- Added Alexey Klimov tested tag for patch 01 (Alexey).
- Corrected the printing log with dynamically calculation ID map size.


Leo Yan (2):
  xen/arm: Add macro XEN_VM_MAPPING
  xen/arm: Enlarge identity map space to 127TB

 xen/arch/arm/arm64/mm.c   |  6 --
 xen/arch/arm/include/asm/mmu/layout.h | 17 +
 2 files changed, 13 insertions(+), 10 deletions(-)

-- 
2.34.1




Re: [PATCH RESEND v2 2/2] xen/arm: Enlarge identity map space to 127TiB

2023-09-11 Thread Leo Yan
Hi Henry,

On Tue, Sep 12, 2023 at 03:01:30AM +, Henry Wang wrote:

[...]

> At Arm we also test Xen on AVA in our CI, but our setup is based on Yocto
> meta-adlink-ampere layer [1].

Thanks for sharing the info.  Yes, we (Linaro) also work with Yocto
based on EWAOL [1] or TRS [2].

> >> Also, in v1, the problem was also depending on the firmware version. Do you
> >> know if it fails booting on a new or old firmware?
> > 
> > My colleague Alexey (CCed) and I both tested Xen hypervisor on own AVA
> > machine, Alexey produced this issue on his machine with the old
> > firmware, I upgraded to the new firmware so I cannot see the issue
> > anymore.
> > 
> >>>   (XEN) MODULE[0]: 0807f6df - 0807f6f3e000 Xen
> >>>   (XEN) MODULE[1]: 0807f8054000 - 0807f8056000 Device Tree
> >>>   (XEN) MODULE[2]: fa834000 - fc5de1d5 Ramdisk
> >>>   (XEN) MODULE[3]: fc5df000 - ffb3f810 Kernel
> 
> I tried to dig a bit more about Xen log printed by our AVA setup, and [2] is 
> our
> RAM and MODULES setup, just FYI and hopefully it helps.

Yeah, the log shows the high address space is above 8TB:

  (XEN) RAM: 0800 - 08007fff
  (XEN) RAM: 0801 - 08039bff
  (XEN) RAM: 08039c00 - 0803

Appreciate for the log and it's helpful!

Leo

> [1] https://github.com/ADLINK/meta-adlink-ampere
> [2] https://pastebin.com/raw/0DQpmJP1
> 
> Kind regards,
> Henry
> 
> 



Re: [PATCH RESEND v2 2/2] xen/arm: Enlarge identity map space to 127TiB

2023-09-11 Thread Leo Yan
Hi Julien,

On Mon, Sep 11, 2023 at 04:13:27PM +0100, Julien Grall wrote:
> Hi Leo,
> 
> I know you said you will respin the series. I'd like to leave some comments
> to avoid having another round afterwards.

Thanks!

> On 09/09/2023 09:34, Leo Yan wrote:
> > On some platforms, the memory regions could be:
> 
> Can you add some details in the commit message on which platform you saw the
> issue?

Sure, I will add it.  The issue happens on the ADLink AVA platform.

> Also, in v1, the problem was also depending on the firmware version. Do you
> know if it fails booting on a new or old firmware?

My colleague Alexey (CCed) and I both tested Xen hypervisor on own AVA
machine, Alexey produced this issue on his machine with the old
firmware, I upgraded to the new firmware so I cannot see the issue
anymore.

> >(XEN) MODULE[0]: 0807f6df - 0807f6f3e000 Xen
> >(XEN) MODULE[1]: 0807f8054000 - 0807f8056000 Device Tree
> >(XEN) MODULE[2]: fa834000 - fc5de1d5 Ramdisk
> >(XEN) MODULE[3]: fc5df000 - ffb3f810 Kernel
> > 
> > In this case, the Xen binary is loaded above 2TiB.  2TiB is the maximum
> > identity map space supported by Xen, thus it fails to boot up due to the
> > out of the range.
> > 
> > This patch introduces several macros to present the zeroth page table's
> 
> Typo: s/zeroth/zeroeth/

Will fix.

> > slot numbers for easier readable.  Based on the defined macros, it
> > enlarges identity map space to 127TiB, which can support the memory
> > space [0x0 .. 0x7eff__] so has flexibility for various
> > platforms.
> 
> Reserving 127 TiB for just the identity mapping is quite a lot. How did you
> decide the limit?

When I reviewed the existed code, I found it reserves 125TiB:

  0x0280 - 0x7fff (125TB, L0 slots [5..255])
Unused

Seems to me, we can map this area.  Ideally, if we only map for the
first level's page table, we can just fill the zeroeth page and don't
need to allocate extra page tables.

> What limit do you need on your platform?

On AVA platform, we can see the memory layout is:

DRAM memory regions:
  Node[0] Region[0]: 0x8000 - 0x
  Node[0] Region[1]: 0x0800 - 0x08007fff
  Node[0] Region[2]: 0x0801 - 0x0807

So the phycial memory address is splitted into the low address and the
high address.  The high memory address is above 8TiB.

> > Fixes: 1c78d76b67 ("xen/arm64: mm: Introduce helpers to 
> > prepare/enable/disable")
> > Reported-by: Alexey Klimov 
> > Signed-off-by: Leo Yan 
> > ---
> >   xen/arch/arm/arm64/mm.c   | 12 
> >   xen/arch/arm/include/asm/config.h | 15 ---
> >   xen/arch/arm/mm.c |  2 +-
> >   3 files changed, 17 insertions(+), 12 deletions(-)
> > 
> > diff --git a/xen/arch/arm/arm64/mm.c b/xen/arch/arm/arm64/mm.c
> > index 78b7c7eb00..802170cf29 100644
> > --- a/xen/arch/arm/arm64/mm.c
> > +++ b/xen/arch/arm/arm64/mm.c
> > @@ -40,8 +40,10 @@ static void __init prepare_boot_identity_mapping(void)
> >   clear_page(boot_second_id);
> >   clear_page(boot_third_id);
> > -if ( id_offsets[0] >= IDENTITY_MAPPING_AREA_NR_L0 )
> > -panic("Cannot handle ID mapping above 2TB\n");
> > +if ( id_offsets[0] >= XEN_IDENTITY_MAP_L0_END )
> 
> I don't see the value of renaming IDENTIY_MAPPING_AREA_NR_L0 to
> XEN_IDENTIY_MAP_L0_END. See more below.

Okay.  It's fine for me to keep the naming.

> > +/* 1TiB occupies 2 slots in zeroeth table */
> 
> I don't understand how this comment is related to the message below.
> 
> > +panic("Cannot handle ID mapping above %dTiB\n",
> 
> The value is unsigned, so this you should use "%u". Also we have been using
> "TB" in Xen rather than "TiB". I would rather prefer if we keep the same for
> consistency even if this is not totally accurate.

Sure, will fix it.

> > +  XEN_IDENTITY_MAP_L0_END>>1);
> 
> Coding style: please add a space before and after >>.

Will fix.

> >   /* Link first ID table */
> >   pte = mfn_to_xen_entry(virt_to_mfn(boot_first_id), MT_NORMAL);
> > @@ -73,8 +75,10 @@ static void __init prepare_runtime_identity_mapping(void)
> >   lpae_t pte;
> >   DECLARE_OFFSETS(id_offsets, id_addr);
> > -if ( id_offsets[0] >= IDENTITY_MAPPING_AREA_NR_L0 )
> > -panic("Cannot handle ID mapping above 2TB\n");
> > +if ( id_offsets[0] >= XEN_IDENTITY_MAP_L0_END )
> > +  

Re: [PATCH RESEND v2 0/2] xen/arm: Enlarge identity map space

2023-09-09 Thread Leo Yan
On Sat, Sep 09, 2023 at 04:34:08PM +0800, Leo Yan wrote:
> The latest Xen fails to boot on ADLink AVA platform.  Alexey Klimov root
> caused the issue is related with the commit 1c78d76b67 ("xen/arm64: mm:
> Introduce helpers to prepare/enable/disable").
> 
> This is because on ADLink AVA platform, it loads Xen hypervisor to the
> address above 2TB and hence causes Xen panic:
> 
>   (XEN) MODULE[0]: 0807f6df - 0807f6f3e000 Xen
>   (XEN) MODULE[1]: 0807f8054000 - 0807f8056000 Device Tree
>   (XEN) MODULE[2]: fa834000 - fc5de1d5 Ramdisk
>   (XEN) MODULE[3]: fc5df000 - ffb3f810 Kernel
> 
> To fix this issue, this series is to enlarge identity map space to
> 127 TiB and tested on Telechips Dolphin5 platform.

Ouch, after 'git pull' the latest Xen staging branch and see the
conflict.  I will send new one patch set.  Please ignore this one.

Thanks,
Leo



[PATCH RESEND v2 2/2] xen/arm: Enlarge identity map space to 127TiB

2023-09-09 Thread Leo Yan
On some platforms, the memory regions could be:

  (XEN) MODULE[0]: 0807f6df - 0807f6f3e000 Xen
  (XEN) MODULE[1]: 0807f8054000 - 0807f8056000 Device Tree
  (XEN) MODULE[2]: fa834000 - fc5de1d5 Ramdisk
  (XEN) MODULE[3]: fc5df000 - ffb3f810 Kernel

In this case, the Xen binary is loaded above 2TiB.  2TiB is the maximum
identity map space supported by Xen, thus it fails to boot up due to the
out of the range.

This patch introduces several macros to present the zeroth page table's
slot numbers for easier readable.  Based on the defined macros, it
enlarges identity map space to 127TiB, which can support the memory
space [0x0 .. 0x7eff__] so has flexibility for various
platforms.

Fixes: 1c78d76b67 ("xen/arm64: mm: Introduce helpers to prepare/enable/disable")
Reported-by: Alexey Klimov 
Signed-off-by: Leo Yan 
---
 xen/arch/arm/arm64/mm.c   | 12 
 xen/arch/arm/include/asm/config.h | 15 ---
 xen/arch/arm/mm.c |  2 +-
 3 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/xen/arch/arm/arm64/mm.c b/xen/arch/arm/arm64/mm.c
index 78b7c7eb00..802170cf29 100644
--- a/xen/arch/arm/arm64/mm.c
+++ b/xen/arch/arm/arm64/mm.c
@@ -40,8 +40,10 @@ static void __init prepare_boot_identity_mapping(void)
 clear_page(boot_second_id);
 clear_page(boot_third_id);
 
-if ( id_offsets[0] >= IDENTITY_MAPPING_AREA_NR_L0 )
-panic("Cannot handle ID mapping above 2TB\n");
+if ( id_offsets[0] >= XEN_IDENTITY_MAP_L0_END )
+/* 1TiB occupies 2 slots in zeroeth table */
+panic("Cannot handle ID mapping above %dTiB\n",
+  XEN_IDENTITY_MAP_L0_END>>1);
 
 /* Link first ID table */
 pte = mfn_to_xen_entry(virt_to_mfn(boot_first_id), MT_NORMAL);
@@ -73,8 +75,10 @@ static void __init prepare_runtime_identity_mapping(void)
 lpae_t pte;
 DECLARE_OFFSETS(id_offsets, id_addr);
 
-if ( id_offsets[0] >= IDENTITY_MAPPING_AREA_NR_L0 )
-panic("Cannot handle ID mapping above 2TB\n");
+if ( id_offsets[0] >= XEN_IDENTITY_MAP_L0_END )
+/* 1TiB occupies 2 slots in zeroeth table */
+panic("Cannot handle ID mapping above %dTiB\n",
+  XEN_IDENTITY_MAP_L0_END>>1);
 
 /* Link first ID table */
 pte = pte_of_xenaddr((vaddr_t)xen_first_id);
diff --git a/xen/arch/arm/include/asm/config.h 
b/xen/arch/arm/include/asm/config.h
index 21f4e68a40..b772f1574d 100644
--- a/xen/arch/arm/include/asm/config.h
+++ b/xen/arch/arm/include/asm/config.h
@@ -87,11 +87,11 @@
  *   2G -   4G   Domheap: on-demand-mapped
  *
  * ARM64 layout:
- * 0x - 0x01ff (2TB, L0 slots [0..3])
+ * 0x - 0x7eff (127TB, L0 slots [0..253])
  *
  *  Reserved to identity map Xen
  *
- * 0x0200 - 0x027f (512GB, L0 slot [4])
+ * 0x07f0 - 0x7fff (1TB, L0 slot [254..255])
  *  (Relative offsets)
  *   0  -   2M   Unmapped
  *   2M -  10M   Xen text, data, bss
@@ -103,9 +103,6 @@
  *
  *  32G -  64G   Frametable: 56 bytes per page for 2TB of RAM
  *
- * 0x0280 - 0x7fff (125TB, L0 slots [5..255])
- *  Unused
- *
  * 0x8000 - 0x84ff (5TB, L0 slots [256..265])
  *  1:1 mapping of RAM
  *
@@ -117,8 +114,12 @@
 #define XEN_VIRT_START  _AT(vaddr_t, MB(2))
 #else
 
-#define IDENTITY_MAPPING_AREA_NR_L04
-#define XEN_VM_MAPPING SLOT0(IDENTITY_MAPPING_AREA_NR_L0)
+#define XEN_IDENTITY_MAP_L0_START   0
+#define XEN_IDENTITY_MAP_L0_NUM 254
+#define XEN_IDENTITY_MAP_L0_END (XEN_IDENTITY_MAP_L0_START + \
+ XEN_IDENTITY_MAP_L0_NUM)
+#define XEN_VM_MAP_L0_START (XEN_IDENTITY_MAP_L0_END)
+#define XEN_VM_MAPPING  SLOT0(XEN_VM_MAP_L0_START)
 
 #define SLOT0_ENTRY_BITS  39
 #define SLOT0(slot) (_AT(vaddr_t,slot) << SLOT0_ENTRY_BITS)
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index c34cc94c90..218552783e 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -156,7 +156,7 @@ static void __init __maybe_unused build_assertions(void)
  * with it.
  */
 #define CHECK_OVERLAP_WITH_IDMAP(virt) \
-BUILD_BUG_ON(zeroeth_table_offset(virt) < IDENTITY_MAPPING_AREA_NR_L0)
+BUILD_BUG_ON(zeroeth_table_offset(virt) < XEN_IDENTITY_MAP_L0_END)
 
 CHECK_OVERLAP_WITH_IDMAP(XEN_VIRT_START);
 CHECK_OVERLAP_WITH_IDMAP(VMAP_VIRT_START);
-- 
2.39.2




[PATCH RESEND v2 1/2] xen/arm: Add macro XEN_VM_MAPPING

2023-09-09 Thread Leo Yan
Xen maps the virtual memory space starting from L0 slot 4, so it's open
coded for macros with the offset '4'.

For more readable, add a new macro XEN_VM_MAPPING which defines the
start slot for Xen virtual memory mapping, and all virtual memory
regions are defined based on it.

Signed-off-by: Leo Yan 
Tested-by: Alexey Klimov 
---
 xen/arch/arm/include/asm/config.h | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/xen/arch/arm/include/asm/config.h 
b/xen/arch/arm/include/asm/config.h
index 83cbf6b0cb..21f4e68a40 100644
--- a/xen/arch/arm/include/asm/config.h
+++ b/xen/arch/arm/include/asm/config.h
@@ -117,11 +117,14 @@
 #define XEN_VIRT_START  _AT(vaddr_t, MB(2))
 #else
 
+#define IDENTITY_MAPPING_AREA_NR_L04
+#define XEN_VM_MAPPING SLOT0(IDENTITY_MAPPING_AREA_NR_L0)
+
 #define SLOT0_ENTRY_BITS  39
 #define SLOT0(slot) (_AT(vaddr_t,slot) << SLOT0_ENTRY_BITS)
 #define SLOT0_ENTRY_SIZE  SLOT0(1)
 
-#define XEN_VIRT_START  (SLOT0(4) + _AT(vaddr_t, MB(2)))
+#define XEN_VIRT_START  (XEN_VM_MAPPING + _AT(vaddr_t, MB(2)))
 #endif
 
 /*
@@ -184,12 +187,10 @@
 
 #else /* ARM_64 */
 
-#define IDENTITY_MAPPING_AREA_NR_L0  4
-
-#define VMAP_VIRT_START  (SLOT0(4) + GB(1))
+#define VMAP_VIRT_START  (XEN_VM_MAPPING + GB(1))
 #define VMAP_VIRT_SIZE   GB(1)
 
-#define FRAMETABLE_VIRT_START  (SLOT0(4) + GB(32))
+#define FRAMETABLE_VIRT_START  (XEN_VM_MAPPING + GB(32))
 #define FRAMETABLE_SIZEGB(32)
 #define FRAMETABLE_NR  (FRAMETABLE_SIZE / sizeof(*frame_table))
 
-- 
2.39.2




[PATCH RESEND v2 0/2] xen/arm: Enlarge identity map space

2023-09-09 Thread Leo Yan
The latest Xen fails to boot on ADLink AVA platform.  Alexey Klimov root
caused the issue is related with the commit 1c78d76b67 ("xen/arm64: mm:
Introduce helpers to prepare/enable/disable").

This is because on ADLink AVA platform, it loads Xen hypervisor to the
address above 2TB and hence causes Xen panic:

  (XEN) MODULE[0]: 0807f6df - 0807f6f3e000 Xen
  (XEN) MODULE[1]: 0807f8054000 - 0807f8056000 Device Tree
  (XEN) MODULE[2]: fa834000 - fc5de1d5 Ramdisk
  (XEN) MODULE[3]: fc5df000 - ffb3f810 Kernel

To fix this issue, this series is to enlarge identity map space to
127 TiB and tested on Telechips Dolphin5 platform.

Changes from v1:
- Rebased on staging branch (Bertrand);
- Added Alexey Klimov tested tag for patch 01 (Alexey).
- Corrected the printing log with dynamically calculation ID map size.


Leo Yan (2):
  xen/arm: Add macro XEN_VM_MAPPING
  xen/arm: Enlarge identity map space to 127TiB

 xen/arch/arm/arm64/mm.c   | 12 
 xen/arch/arm/include/asm/config.h | 22 --
 xen/arch/arm/mm.c |  2 +-
 3 files changed, 21 insertions(+), 15 deletions(-)

-- 
2.39.2




[PATCH v2 1/2] xen/arm: Add macro XEN_VM_MAPPING

2023-09-09 Thread Leo Yan
Xen maps the virtual memory space starting from L0 slot 4, so it's open
coded for macros with the offset '4'.

For more readable, add a new macro XEN_VM_MAPPING which defines the
start slot for Xen virtual memory mapping, and all virtual memory
regions are defined based on it.

Signed-off-by: Leo Yan 
---
 xen/arch/arm/include/asm/config.h | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/xen/arch/arm/include/asm/config.h 
b/xen/arch/arm/include/asm/config.h
index 83cbf6b0cb..21f4e68a40 100644
--- a/xen/arch/arm/include/asm/config.h
+++ b/xen/arch/arm/include/asm/config.h
@@ -117,11 +117,14 @@
 #define XEN_VIRT_START  _AT(vaddr_t, MB(2))
 #else
 
+#define IDENTITY_MAPPING_AREA_NR_L04
+#define XEN_VM_MAPPING SLOT0(IDENTITY_MAPPING_AREA_NR_L0)
+
 #define SLOT0_ENTRY_BITS  39
 #define SLOT0(slot) (_AT(vaddr_t,slot) << SLOT0_ENTRY_BITS)
 #define SLOT0_ENTRY_SIZE  SLOT0(1)
 
-#define XEN_VIRT_START  (SLOT0(4) + _AT(vaddr_t, MB(2)))
+#define XEN_VIRT_START  (XEN_VM_MAPPING + _AT(vaddr_t, MB(2)))
 #endif
 
 /*
@@ -184,12 +187,10 @@
 
 #else /* ARM_64 */
 
-#define IDENTITY_MAPPING_AREA_NR_L0  4
-
-#define VMAP_VIRT_START  (SLOT0(4) + GB(1))
+#define VMAP_VIRT_START  (XEN_VM_MAPPING + GB(1))
 #define VMAP_VIRT_SIZE   GB(1)
 
-#define FRAMETABLE_VIRT_START  (SLOT0(4) + GB(32))
+#define FRAMETABLE_VIRT_START  (XEN_VM_MAPPING + GB(32))
 #define FRAMETABLE_SIZEGB(32)
 #define FRAMETABLE_NR  (FRAMETABLE_SIZE / sizeof(*frame_table))
 
-- 
2.39.2




[PATCH v2 0/2] xen/arm: Enlarge identity map space

2023-09-09 Thread Leo Yan
The latest Xen fails to boot on ADLink AVA platform.  Alexey Klimov root
caused the issue is related with the commit 1c78d76b67 ("xen/arm64: mm:
Introduce helpers to prepare/enable/disable").

This is because on ADLink AVA platform, it loads Xen hypervisor to the
address above 2TB and hence causes Xen panic:

  (XEN) MODULE[0]: 0807f6df - 0807f6f3e000 Xen
  (XEN) MODULE[1]: 0807f8054000 - 0807f8056000 Device Tree
  (XEN) MODULE[2]: fa834000 - fc5de1d5 Ramdisk
  (XEN) MODULE[3]: fc5df000 - ffb3f810 Kernel

To fix this issue, this series is to enlarge identity map space to
127 TiB and tested on Telechips Dolphin5 platform.

Changes from v1:
- Rebased on staging branch (Bertrand);
- Added Alexey Klimov tested tag for patch 01 (Alexey).
- Corrected the printing log with dynamically calculation ID map size.


Leo Yan (2):
  xen/arm: Add macro XEN_VM_MAPPING
  xen/arm: Enlarge identity map space to 127TiB

 xen/arch/arm/arm64/mm.c   | 12 
 xen/arch/arm/include/asm/config.h | 22 --
 xen/arch/arm/mm.c |  2 +-
 3 files changed, 21 insertions(+), 15 deletions(-)

-- 
2.39.2




[PATCH v2 2/2] xen/arm: Enlarge identity map space to 127TiB

2023-09-09 Thread Leo Yan
On some platforms, the memory regions could be:

  (XEN) MODULE[0]: 0807f6df - 0807f6f3e000 Xen
  (XEN) MODULE[1]: 0807f8054000 - 0807f8056000 Device Tree
  (XEN) MODULE[2]: fa834000 - fc5de1d5 Ramdisk
  (XEN) MODULE[3]: fc5df000 - ffb3f810 Kernel

In this case, the Xen binary is loaded above 2TiB.  2TiB is the maximum
identity map space supported by Xen, thus it fails to boot up due to the
out of the range.

This patch introduces several macros to present the zeroth page table's
slot numbers for easier readable.  Based on the defined macros, it
enlarges identity map space to 127TiB, which can support the memory
space [0x0 .. 0x7eff__] so has flexibility for various
platforms.

Fixes: 1c78d76b67 ("xen/arm64: mm: Introduce helpers to prepare/enable/disable")
Reported-by: Alexey Klimov 
Signed-off-by: Leo Yan 
---
 xen/arch/arm/arm64/mm.c   | 12 
 xen/arch/arm/include/asm/config.h | 15 ---
 xen/arch/arm/mm.c |  2 +-
 3 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/xen/arch/arm/arm64/mm.c b/xen/arch/arm/arm64/mm.c
index 78b7c7eb00..802170cf29 100644
--- a/xen/arch/arm/arm64/mm.c
+++ b/xen/arch/arm/arm64/mm.c
@@ -40,8 +40,10 @@ static void __init prepare_boot_identity_mapping(void)
 clear_page(boot_second_id);
 clear_page(boot_third_id);
 
-if ( id_offsets[0] >= IDENTITY_MAPPING_AREA_NR_L0 )
-panic("Cannot handle ID mapping above 2TB\n");
+if ( id_offsets[0] >= XEN_IDENTITY_MAP_L0_END )
+/* 1TiB occupies 2 slots in zeroeth table */
+panic("Cannot handle ID mapping above %dTiB\n",
+  XEN_IDENTITY_MAP_L0_END>>1);
 
 /* Link first ID table */
 pte = mfn_to_xen_entry(virt_to_mfn(boot_first_id), MT_NORMAL);
@@ -73,8 +75,10 @@ static void __init prepare_runtime_identity_mapping(void)
 lpae_t pte;
 DECLARE_OFFSETS(id_offsets, id_addr);
 
-if ( id_offsets[0] >= IDENTITY_MAPPING_AREA_NR_L0 )
-panic("Cannot handle ID mapping above 2TB\n");
+if ( id_offsets[0] >= XEN_IDENTITY_MAP_L0_END )
+/* 1TiB occupies 2 slots in zeroeth table */
+panic("Cannot handle ID mapping above %dTiB\n",
+  XEN_IDENTITY_MAP_L0_END>>1);
 
 /* Link first ID table */
 pte = pte_of_xenaddr((vaddr_t)xen_first_id);
diff --git a/xen/arch/arm/include/asm/config.h 
b/xen/arch/arm/include/asm/config.h
index 21f4e68a40..b772f1574d 100644
--- a/xen/arch/arm/include/asm/config.h
+++ b/xen/arch/arm/include/asm/config.h
@@ -87,11 +87,11 @@
  *   2G -   4G   Domheap: on-demand-mapped
  *
  * ARM64 layout:
- * 0x - 0x01ff (2TB, L0 slots [0..3])
+ * 0x - 0x7eff (127TB, L0 slots [0..253])
  *
  *  Reserved to identity map Xen
  *
- * 0x0200 - 0x027f (512GB, L0 slot [4])
+ * 0x07f0 - 0x7fff (1TB, L0 slot [254..255])
  *  (Relative offsets)
  *   0  -   2M   Unmapped
  *   2M -  10M   Xen text, data, bss
@@ -103,9 +103,6 @@
  *
  *  32G -  64G   Frametable: 56 bytes per page for 2TB of RAM
  *
- * 0x0280 - 0x7fff (125TB, L0 slots [5..255])
- *  Unused
- *
  * 0x8000 - 0x84ff (5TB, L0 slots [256..265])
  *  1:1 mapping of RAM
  *
@@ -117,8 +114,12 @@
 #define XEN_VIRT_START  _AT(vaddr_t, MB(2))
 #else
 
-#define IDENTITY_MAPPING_AREA_NR_L04
-#define XEN_VM_MAPPING SLOT0(IDENTITY_MAPPING_AREA_NR_L0)
+#define XEN_IDENTITY_MAP_L0_START   0
+#define XEN_IDENTITY_MAP_L0_NUM 254
+#define XEN_IDENTITY_MAP_L0_END (XEN_IDENTITY_MAP_L0_START + \
+ XEN_IDENTITY_MAP_L0_NUM)
+#define XEN_VM_MAP_L0_START (XEN_IDENTITY_MAP_L0_END)
+#define XEN_VM_MAPPING  SLOT0(XEN_VM_MAP_L0_START)
 
 #define SLOT0_ENTRY_BITS  39
 #define SLOT0(slot) (_AT(vaddr_t,slot) << SLOT0_ENTRY_BITS)
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index c34cc94c90..218552783e 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -156,7 +156,7 @@ static void __init __maybe_unused build_assertions(void)
  * with it.
  */
 #define CHECK_OVERLAP_WITH_IDMAP(virt) \
-BUILD_BUG_ON(zeroeth_table_offset(virt) < IDENTITY_MAPPING_AREA_NR_L0)
+BUILD_BUG_ON(zeroeth_table_offset(virt) < XEN_IDENTITY_MAP_L0_END)
 
 CHECK_OVERLAP_WITH_IDMAP(XEN_VIRT_START);
 CHECK_OVERLAP_WITH_IDMAP(VMAP_VIRT_START);
-- 
2.39.2




Re: [PATCH v1 2/2] xen/arm: Enlarge identity map space to 127TiB

2023-09-04 Thread Leo Yan
Hi Bertrand,

On Mon, Sep 04, 2023 at 01:55:12PM +, Bertrand Marquis wrote:
> Hi Leo,
> 
> > On 31 Aug 2023, at 13:01, Leo Yan  wrote:
> > 
> > On some platforms, the memory regions could be:
> > 
> >  (XEN) MODULE[0]: 0807f6df - 0807f6f3e000 Xen
> >  (XEN) MODULE[1]: 0807f8054000 - 0807f8056000 Device Tree
> >  (XEN) MODULE[2]: fa834000 - fc5de1d5 Ramdisk
> >  (XEN) MODULE[3]: fc5df000 - ffb3f810 Kernel
> > 
> > In this case, the Xen binary is loaded above 2TB, so Xen fails to boot
> > up due to the out of the identity map space.
> > 
> > This patch enlarges identity map space to 127TiB, which can support the
> > memory space [0x0 .. 0x7eff__], thus it has flexibility for
> > support different platforms.
> > 
> > Fixes: 1c78d76b67 ("xen/arm64: mm: Introduce helpers to 
> > prepare/enable/disable")
> > Reported-by: Alexey Klimov 
> > Signed-off-by: Leo Yan 
> 
> This is not based on the current status of staging where this part of the
> code was modified.
> 
> Currently Xen virtual address is at 2TB and the extension you are making
> will potentially make it possible to load Xen at 2TB which will clash for the
> identity mapping handling in Xen.

I will check the stage code if this patch will introduce any clash
with the identity mapping handling.

> Please rebase on the latest staging and check there how you can do as
> this patch if rebased on staging right now with rightfully end in a 
> compilation
> error.

Sure, I will rebase on the latest staging branch.

Thank you for suggestions.

Leo



[PATCH v1 2/2] xen/arm: Enlarge identity map space to 127TiB

2023-08-31 Thread Leo Yan
On some platforms, the memory regions could be:

  (XEN) MODULE[0]: 0807f6df - 0807f6f3e000 Xen
  (XEN) MODULE[1]: 0807f8054000 - 0807f8056000 Device Tree
  (XEN) MODULE[2]: fa834000 - fc5de1d5 Ramdisk
  (XEN) MODULE[3]: fc5df000 - ffb3f810 Kernel

In this case, the Xen binary is loaded above 2TB, so Xen fails to boot
up due to the out of the identity map space.

This patch enlarges identity map space to 127TiB, which can support the
memory space [0x0 .. 0x7eff__], thus it has flexibility for
support different platforms.

Fixes: 1c78d76b67 ("xen/arm64: mm: Introduce helpers to prepare/enable/disable")
Reported-by: Alexey Klimov 
Signed-off-by: Leo Yan 
---
 xen/arch/arm/include/asm/config.h | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/xen/arch/arm/include/asm/config.h 
b/xen/arch/arm/include/asm/config.h
index 21f4e68a40..3e97c95b57 100644
--- a/xen/arch/arm/include/asm/config.h
+++ b/xen/arch/arm/include/asm/config.h
@@ -87,11 +87,11 @@
  *   2G -   4G   Domheap: on-demand-mapped
  *
  * ARM64 layout:
- * 0x - 0x01ff (2TB, L0 slots [0..3])
+ * 0x - 0x7eff (127TB, L0 slots [0..253])
  *
  *  Reserved to identity map Xen
  *
- * 0x0200 - 0x027f (512GB, L0 slot [4])
+ * 0x07f0 - 0x7fff (1TB, L0 slot [254..255])
  *  (Relative offsets)
  *   0  -   2M   Unmapped
  *   2M -  10M   Xen text, data, bss
@@ -103,9 +103,6 @@
  *
  *  32G -  64G   Frametable: 56 bytes per page for 2TB of RAM
  *
- * 0x0280 - 0x7fff (125TB, L0 slots [5..255])
- *  Unused
- *
  * 0x8000 - 0x84ff (5TB, L0 slots [256..265])
  *  1:1 mapping of RAM
  *
@@ -117,7 +114,7 @@
 #define XEN_VIRT_START  _AT(vaddr_t, MB(2))
 #else
 
-#define IDENTITY_MAPPING_AREA_NR_L04
+#define IDENTITY_MAPPING_AREA_NR_L0254
 #define XEN_VM_MAPPING SLOT0(IDENTITY_MAPPING_AREA_NR_L0)
 
 #define SLOT0_ENTRY_BITS  39
-- 
2.39.2




[PATCH v1 0/2] xen/arm: Enlarge identity map space

2023-08-31 Thread Leo Yan
The latest Xen fails to boot on ADLink AVA platform.  Alexey Klimov root
caused the issue is related with the commit 1c78d76b67 ("xen/arm64: mm:
Introduce helpers to prepare/enable/disable").

This is because on ADLink AVA platform, it loads Xen hypervisor to the
address above 2TB and hence causes Xen panic:

  (XEN) MODULE[0]: 0807f6df - 0807f6f3e000 Xen
  (XEN) MODULE[1]: 0807f8054000 - 0807f8056000 Device Tree
  (XEN) MODULE[2]: fa834000 - fc5de1d5 Ramdisk
  (XEN) MODULE[3]: fc5df000 - ffb3f810 Kernel

To fix this issue, this series is to enlarge identity map space to
127 TiB and tested on ADLink AVA platform.

Note, we tested on two ADLind AVA platforms, one machine has this issue
and another machine cannot reproduce the panic.  It's likely they have
different firmware versions so one machine loads Xen hypervisor into the
high memory address and caused booting failure.


Leo Yan (2):
  xen/arm: Add macro XEN_VM_MAPPING
  xen/arm: Enlarge identity map space to 127TiB

 xen/arch/arm/include/asm/config.h | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

-- 
2.39.2




[PATCH v1 1/2] xen/arm: Add macro XEN_VM_MAPPING

2023-08-31 Thread Leo Yan
Xen maps the virtual memory space starting from L0 slot 4, so it's open
coded for macros with the offset '4'.

For more readable, add a new macro XEN_VM_MAPPING which defines the
start slot for Xen virtual memory mapping, and all virtual memory
regions are defined based on it.

Signed-off-by: Leo Yan 
---
 xen/arch/arm/include/asm/config.h | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/xen/arch/arm/include/asm/config.h 
b/xen/arch/arm/include/asm/config.h
index 83cbf6b0cb..21f4e68a40 100644
--- a/xen/arch/arm/include/asm/config.h
+++ b/xen/arch/arm/include/asm/config.h
@@ -117,11 +117,14 @@
 #define XEN_VIRT_START  _AT(vaddr_t, MB(2))
 #else
 
+#define IDENTITY_MAPPING_AREA_NR_L04
+#define XEN_VM_MAPPING SLOT0(IDENTITY_MAPPING_AREA_NR_L0)
+
 #define SLOT0_ENTRY_BITS  39
 #define SLOT0(slot) (_AT(vaddr_t,slot) << SLOT0_ENTRY_BITS)
 #define SLOT0_ENTRY_SIZE  SLOT0(1)
 
-#define XEN_VIRT_START  (SLOT0(4) + _AT(vaddr_t, MB(2)))
+#define XEN_VIRT_START  (XEN_VM_MAPPING + _AT(vaddr_t, MB(2)))
 #endif
 
 /*
@@ -184,12 +187,10 @@
 
 #else /* ARM_64 */
 
-#define IDENTITY_MAPPING_AREA_NR_L0  4
-
-#define VMAP_VIRT_START  (SLOT0(4) + GB(1))
+#define VMAP_VIRT_START  (XEN_VM_MAPPING + GB(1))
 #define VMAP_VIRT_SIZE   GB(1)
 
-#define FRAMETABLE_VIRT_START  (SLOT0(4) + GB(32))
+#define FRAMETABLE_VIRT_START  (XEN_VM_MAPPING + GB(32))
 #define FRAMETABLE_SIZEGB(32)
 #define FRAMETABLE_NR  (FRAMETABLE_SIZE / sizeof(*frame_table))
 
-- 
2.39.2




Re: [PATCH v1] tools/hotplug: systemd: Make dependency on Xen device nodes

2023-08-25 Thread Leo Yan
On Fri, Aug 25, 2023 at 09:37:58AM +0200, Erik Schilling wrote:

[...]

> > > > diff --git a/tools/hotplug/Linux/systemd/xenstored.service.in 
> > > > b/tools/hotplug/Linux/systemd/xenstored.service.in
> > > > index 261077dc92..6e48cdb0e7 100644
> > > > --- a/tools/hotplug/Linux/systemd/xenstored.service.in
> > > > +++ b/tools/hotplug/Linux/systemd/xenstored.service.in
> > > > @@ -1,7 +1,7 @@
> > > >  [Unit]
> > > >  Description=The Xen xenstore
> > > > -Requires=proc-xen.mount
> > > > -After=proc-xen.mount
> > > > +Requires=proc-xen.mount dev-xen-gntdev.device
> > > > +After=proc-xen.mount dev-xen-gntdev.device
> > > 
> > > I must admit that I am a bit confused why this is enough... During our
> > > discussion on Slack, when you quoted from your rule it included
> > > `ENV{SYSTEMD_WANTS}="xenstored.service"`. Did you drop that during later
> > > development? Was there additional reasearch involved in dropping it?
> >
> > Yes, I dropped ENV{SYSTEMD_WANTS}="xenstored.service" and it works well
> > at my side.
> 
> Hm. Interesting. Could you plot the activations after boot?
> 
> systemd-analyze plot > /tmp/plot.svg
> 
> Seeing failure vs success but also the success cases on AVA vs Telechips
> may be interesting.
> 
> I just did some tests on my workstation where I added dependencies on
> a random USB device. If that one was not plugged at all, the service
> still happily started. Yet, it obviously seems to do something in your
> case... So I fear we may not fully understand the real problem yet.
> 
> I must admit that I find this case a bit under-documented. While "Wants"
> explicitly says that failling transactions are ignored, there is no
> clear statement about what happens in that case with "Requires".
> 
> While skimming the docs I also realized that maybe BindsTo= would maybe
> be more suitable compared to Requires= here. But that is unrelated to
> the confusion that I have about the original problem.

Please ignore this patch and sorry for noise.

Erik and me confirmed the systemd has established the dependencies
properly.  The dependency is (the above one depends on the blow one):

  xenstored.service
`> sysinit.target
 `> systemd-modules-load.service (Load xen modules)


My issue is caused by a local customzied systemd-modules-load.service
which breaks the dependency between sysinit.target and
systemd-modules-load.service.

Very appreciate Erik's help for root cause the issue.

Thanks,
Leo



Re: [PATCH v1] tools/hotplug: systemd: Make dependency on Xen device nodes

2023-08-24 Thread Leo Yan
On Fri, Aug 25, 2023 at 07:02:33AM +0200, Erik Schilling wrote:
> On Fri Aug 25, 2023 at 5:36 AM CEST, Leo Yan wrote:
> > When system booting up, the kernel module xen_gntdev.ko is loaded and
> > the device node '/dev/xen/gntdev' is created; later the xenstored
> > service in systemd launches daemon to open this device node.
> >
> > This flow has a race condition between creating the device node in the
> > kernel module and using the device node in the systemd service.  It's
> > possible that the xenstored service fails to open the device node due
> > to the delay of creating the device node.  In the end, xenbus cannot be
> > used between the Dom0 kernel and the Xen hypervisor.
> >
> > To resolve this issue, we need to synchronize between udev and systemd
> > for the device node.  There have an extra change in the udev rules for
> > tagging 'systemd' for Xen device nodes, which notifies device node
> > creating to systemd; besides udev change, this patch adds dependency in
> > systemd service for waiting the device node.
> >
> > Signed-off-by: Leo Yan 
> > ---
> >
> >  The udev rules change is on github:
> >  
> > https://github.com/systemd/systemd/pull/28962/commits/520340dfea3b6cf9fe7a24c9238313b1a5fe8539
> 
> Let's see what they think, but I fear systemd may not be the correct
> upstream to carry this... Skimming through the rules that they have
> there, it mostly looks like they only carry rules that are relevant for
> systemd-related services directly.

Yeah, I share the same concern.  But seems to me, upstreaming change
to the systemd is the most neat fixing.

> >  tools/hotplug/Linux/systemd/xenstored.service.in | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/tools/hotplug/Linux/systemd/xenstored.service.in 
> > b/tools/hotplug/Linux/systemd/xenstored.service.in
> > index 261077dc92..6e48cdb0e7 100644
> > --- a/tools/hotplug/Linux/systemd/xenstored.service.in
> > +++ b/tools/hotplug/Linux/systemd/xenstored.service.in
> > @@ -1,7 +1,7 @@
> >  [Unit]
> >  Description=The Xen xenstore
> > -Requires=proc-xen.mount
> > -After=proc-xen.mount
> > +Requires=proc-xen.mount dev-xen-gntdev.device
> > +After=proc-xen.mount dev-xen-gntdev.device
> 
> I must admit that I am a bit confused why this is enough... During our
> discussion on Slack, when you quoted from your rule it included
> `ENV{SYSTEMD_WANTS}="xenstored.service"`. Did you drop that during later
> development? Was there additional reasearch involved in dropping it?

Yes, I dropped ENV{SYSTEMD_WANTS}="xenstored.service" and it works well
at my side.

My purpose is to upstream the udev rules in systemd as general as
possible.  As you mentioned, the "xenstored.service" is maintained in Xen
but not in systemd, for this reason, I would like avoid to add
"xenstored.service" into udev rules in systemd.

> My understanding was that if devices do not exist when systemd builds
> its transaction model, dependencies on them will just get tossed out[1].
> So I would have expected that there should still be a race if
> the .device does not pull in the service.

Thanks for sharing.  To be honest, I don't know this part.  Doesn't
systemd parse the service script and let service to wait for specific
.device until systemd receives notification for the .device?

> Did you hit the race frequently enough to be sure that this fixes it in
> entirety?

I have two boards.  One on board (Telechips), it's consistently
reproduce this issue, and after applying the udev rules change and
this patch, I confirmed the issue is dismissed entirely.

> Is there a place somewhere in the Xen or kernel code where one
> could add an excessive sleep in order to trigger the race more reliably?

On my AVA board with running Xen, I tried to add a long delay (10
seconds) in the kernel driver 'drivers/xen/gntdev.c', but I can see
systemd will wait for the kernel modules loading, and then it launches
Xen services.  Thus I cannot reproduce this issue on my AVA board.

So in below flow:

- Step 1: drivers/xen/gntdev.c registers misc device;
- Step 2: udev creates device node '/dev/xen/gntdev';
- Step 3: systemd launches xenstored.service.

Seems to me, the race condition happens between step 2 and step 3.  If
there have any delay in udev for creating device node, then the step3
for xenstored.service will fail.

Thanks for review.

Leo

> [1] "citation-needed": But I vaguely remember dealing with a similar
> issue and getting told this on #systemd IRC
> 
> - Erik



[PATCH v1] tools/hotplug: systemd: Make dependency on Xen device nodes

2023-08-24 Thread Leo Yan
When system booting up, the kernel module xen_gntdev.ko is loaded and
the device node '/dev/xen/gntdev' is created; later the xenstored
service in systemd launches daemon to open this device node.

This flow has a race condition between creating the device node in the
kernel module and using the device node in the systemd service.  It's
possible that the xenstored service fails to open the device node due
to the delay of creating the device node.  In the end, xenbus cannot be
used between the Dom0 kernel and the Xen hypervisor.

To resolve this issue, we need to synchronize between udev and systemd
for the device node.  There have an extra change in the udev rules for
tagging 'systemd' for Xen device nodes, which notifies device node
creating to systemd; besides udev change, this patch adds dependency in
systemd service for waiting the device node.

Signed-off-by: Leo Yan 
---

 The udev rules change is on github:
 
https://github.com/systemd/systemd/pull/28962/commits/520340dfea3b6cf9fe7a24c9238313b1a5fe8539

 tools/hotplug/Linux/systemd/xenstored.service.in | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/hotplug/Linux/systemd/xenstored.service.in 
b/tools/hotplug/Linux/systemd/xenstored.service.in
index 261077dc92..6e48cdb0e7 100644
--- a/tools/hotplug/Linux/systemd/xenstored.service.in
+++ b/tools/hotplug/Linux/systemd/xenstored.service.in
@@ -1,7 +1,7 @@
 [Unit]
 Description=The Xen xenstore
-Requires=proc-xen.mount
-After=proc-xen.mount
+Requires=proc-xen.mount dev-xen-gntdev.device
+After=proc-xen.mount dev-xen-gntdev.device
 Before=libvirtd.service libvirt-guests.service
 RefuseManualStop=true
 ConditionPathExists=/proc/xen/capabilities
-- 
2.39.2




[PATCH] docs: Correct name for xen-command-line.pandoc

2023-07-24 Thread Leo Yan
In the commit d661611d08 ("docs/markdown: Switch to using pandoc, and
fix underscore escaping"), the documentation suffix was changed from
".markdown" to ".pandoc"; however, the reference was missed to update.

This patch updates the documentation name to xen-command-line.pandoc.

Fixes: d661611d08 ("docs/markdown: Switch to using pandoc, and fix underscore 
escaping")
Signed-off-by: Leo Yan 
---
 docs/features/sched_credit2.pandoc | 2 +-
 docs/misc/arm/big.LITTLE.txt   | 2 +-
 xen/common/Kconfig | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/features/sched_credit2.pandoc 
b/docs/features/sched_credit2.pandoc
index 436ff9f8f6..ef07e463cb 100644
--- a/docs/features/sched_credit2.pandoc
+++ b/docs/features/sched_credit2.pandoc
@@ -27,7 +27,7 @@ different than `credit2`) parameter is passed to Xen via the
 bootloader.
 
 Other parameters are available for tuning the behavior of Credit2
-(see `docs/misc/xen-command-line.markdown` for a complete list and
+(see `docs/misc/xen-command-line.pandoc` for a complete list and
 for their meaning).
 
 Once the system is live, for creating a cpupool with Credit2 as
diff --git a/docs/misc/arm/big.LITTLE.txt b/docs/misc/arm/big.LITTLE.txt
index b6ea1c9d61..1d11058444 100644
--- a/docs/misc/arm/big.LITTLE.txt
+++ b/docs/misc/arm/big.LITTLE.txt
@@ -42,5 +42,5 @@ The following option runs one domain vcpu as big and one as 
LITTLE:
   cpus = ["0-3", "4-7"]
 
 
-[1] docs/misc/xen-command-line.markdown
+[1] docs/misc/xen-command-line.pandoc
 [2] docs/man/xl.cfg.pod.5
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index dd8d7c3f1c..0d248ab941 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -440,7 +440,7 @@ config DOM0_MEM
  The specified string will be used for the dom0_mem parameter in
  case it was not specified on the command line.
 
- See docs/misc/xen-command-line.markdown for the supported syntax.
+ See docs/misc/xen-command-line.pandoc for the supported syntax.
 
  Leave empty if you are not sure what to specify.
 
-- 
2.39.2




Re: Issue: Networking performance in Xen VM on Arm64

2022-10-28 Thread Leo Yan
On Fri, Oct 28, 2022 at 05:54:05PM +0800, Leo Yan wrote:

[...]

> If map to the code, I think the function xennet_start_xmit() in Xen
> frontend driver is critical for the sending interval in domU.  I can
> see several things cost time when sending a packet:
> 
> - Xen frontend driver needs to setup grant table for every skb, it
>   invokes the function xennet_tx_setup_grant() and
>   gnttab_grant_foreign_access_ref() to prepare grant table;
> 
> - Xen frontend driver sends notification by calling
>   notify_remote_via_irq().  It will trap to Xen hypervisor to send the
>   interrupt, this takes several macro seonds for this step.
> 
> - Xen frontend driver calls xennet_tx_buf_gc(), the interval for this
>   function is vary, it will take ~30us in the case for reclaiming grant
>   table.
> 
> Any thoughts for this?

Supplement info with Ftrace function graph.  You could see below
log which shows the time is spent in xennet_start_xmit():

 1)   |  xennet_start_xmit() {
 1)   |  /* xennet_start_xmit: TSC: 50702652609 */
 1)   0.240 us|_raw_spin_lock_irqsave();
 1)   |netif_skb_features() {
 1)   0.280 us|  skb_network_protocol();
 1)   0.920 us|}
 1)   |gnttab_foreach_grant_in_range() {
 1)   |  xennet_tx_setup_grant() {
 1)   0.280 us|gnttab_claim_grant_reference();
 1)   |gnttab_grant_foreign_access_ref() {
 1)   0.280 us|  gnttab_update_entry_v1();
 1)   0.800 us|}
 1)   |/* id=103 ref=871 offset=2050 len=1514 TSC: 
50702652709 */
 1)   2.200 us|  }
 1)   2.800 us|}
 1)   |  /* xennet_notify_tx_irq: TSC: 50702652741 */
 1)   |notify_remote_via_irq() {
 1)   |  irq_get_irq_data() {
 1)   0.240 us|irq_to_desc();
 1)   0.760 us|  }
 1)   3.880 us|}
 1)   0.280 us|xennet_tx_buf_gc();
 1)   0.240 us|_raw_spin_unlock_irqrestore();
 1) + 11.120 us   |  }
 1)   |  xennet_start_xmit() {
 1)   |  /* xennet_start_xmit: TSC: 50702652974 */
 1)   0.280 us|_raw_spin_lock_irqsave();
 1)   |netif_skb_features() {
 1)   0.240 us|  skb_network_protocol();
 1)   0.760 us|}
 1)   |gnttab_foreach_grant_in_range() {
 1)   |  xennet_tx_setup_grant() {
 1)   0.280 us|gnttab_claim_grant_reference();
 1)   |gnttab_grant_foreign_access_ref() {
 1)   0.360 us|  gnttab_update_entry_v1();
 1)   0.840 us|}
 1)   |/* id=101 ref=869 offset=1026 len=574 TSC: 
50702653093 */
 1)   2.800 us|  }
 1)   3.320 us|}
 1)   |  /* xennet_notify_tx_irq: TSC: 50702653124 */
 1)   |notify_remote_via_irq() {
 1)   |  irq_get_irq_data() {
 1)   0.240 us|irq_to_desc();
 1)   0.760 us|  }
 1)   3.840 us|}
 1)   0.280 us|xennet_tx_buf_gc();
 1)   0.280 us|_raw_spin_unlock_irqrestore();
 1) + 11.800 us   |  }
 1)   |  /* finish_transmit_data */
 1)   |  /* transmit_data */
 1)   |  /* finish_transmit_data */
 1)   |  /* transmit_data */
 1)   |  /* finish_transmit_data */
 1)   |  /* transmit_data */
 1)   |  /* finish_transmit_data */
 1)   |  /* Before_throttle */
 1)   |  xennet_start_xmit() {
 1)   |  /* xennet_start_xmit: TSC: 50702654697 */
 1)   0.280 us|_raw_spin_lock_irqsave();
 1)   |netif_skb_features() {
 1)   0.240 us|  skb_network_protocol();
 1)   0.760 us|}
 1)   |gnttab_foreach_grant_in_range() {
 1)   |  xennet_tx_setup_grant() {
 1)   0.280 us|gnttab_claim_grant_reference();
 1)   |gnttab_grant_foreign_access_ref() {
 1)   0.320 us|  gnttab_update_entry_v1();
 1)   0.920 us|}
 1)   |/* id=3 ref=771 offset=2050 len=1514 TSC: 
50702654801 */
 1)   2.400 us|  }
 1)   2.960 us|}
 1)   |  /* xennet_notify_tx_irq: TSC: 50702654832 */
 1)   |notify_remote_via_irq() {
 1)   |  irq_get_irq_data() {
 1)   0.280 us|irq_to_desc();
 1)   0.760 us|  }
 1)   4.160 us|}
 1)   |xennet_tx_buf_gc() {
 1)   |  gnttab_end_foreign_access_ref() {
 1)   0.480 us|gnttab_end_foreign_access_ref_v1();
 1)   0.960 us|  }
 1)   0.280 us|  gnttab_release_grant_reference();
 1)   |  __dev_kfree_skb_irq() {
 1)   0.240 us|__raise_softirq_irqoff();
 1)   1.240 us|  }
 1)   |  gnttab_end_foreign_access_ref() {
 1)   0.

Re: Issue: Networking performance in Xen VM on Arm64

2022-10-28 Thread Leo Yan
Hi Stefano,

On Tue, Oct 25, 2022 at 04:58:16PM -0700, Stefano Stabellini wrote:
> On Mon, 24 Oct 2022, Leo Yan wrote:
> > > If you are really running with the NULL scheduler, then I would
> > > investigate why the vCPU has is_running == 0 because it should not
> > > happen.
> > 
> > Correct for this: it's my bad that I didn't really enable NULL scheduler
> > in my code base.  After I enabled NULL scheduler, the latency by context
> > switching is dismissed.
> > 
> >  8963  pub-338   [002]   217.777652: bprint:   
> > xennet_tx_setup_grant: id=60 ref=1340 offset=2 len=1514 TSC: 7892178799
> >  8964  pub-338   [002]   217.777662: bprint:   
> > xennet_tx_setup_grant: id=82 ref=1362 offset=2050 len=1006 TSC: 7892179043
> >  8965 ksoftirqd/12-75[012]   255.466914: bprint:   
> > xenvif_tx_build_gops.constprop.0: id=60 ref=1340 offset=2 len=1514 TSC: 
> > 7892179731
> >  8966 ksoftirqd/12-75[012]   255.466915: bprint:   
> > xenvif_tx_build_gops.constprop.0: id=82 ref=1362 offset=2050 len=1006 TSC: 
> > 7892179761
> >  8967  pub-338   [002]   217.778057: bprint:   
> > xennet_tx_setup_grant: id=60 ref=1340 offset=2050 len=1514 TSC: 7892188930
> >  8968  pub-338   [002]   217.778072: bprint:   
> > xennet_tx_setup_grant: id=53 ref=1333 offset=2 len=1514 TSC: 7892189293
> >  8969   containerd-2965  [012]   255.467304: bprint:   
> > xenvif_tx_build_gops.constprop.0: id=60 ref=1340 offset=2050 len=1514 TSC: 
> > 7892189479
> >  8970   containerd-2965  [012]   255.467306: bprint:   
> > xenvif_tx_build_gops.constprop.0: id=53 ref=1333 offset=2 len=1514 TSC: 
> > 7892189533
> 
> I am having difficulty following the messages. Are the two points [a]
> and [b] as described in the previous email shown here?

No, the [b] point is about the trace point in Xen hypvervisor.  

For easier review I combined the trace log from Xen dom0 and dom0
Linux kernels, in above log, the function xennet_tx_setup_grant() is
traced from domU Linux kernel and the function
xenvif_tx_build_gops.constprop.0 from dom0 Linux kernel and I sorted
the logs with TSC (Arm arch timer physical counter).

Sorry for confusion and I didn't explain clearly.

> > So the xennet (Xen net forend driver) and xenvif (net backend driver)
> > work in parallel.  Please note, I didn't see networking performance
> > improvement after changed to use NULL scheduler.
> > 
> > Now I will compare the duration for two directions, one direction is
> > sending data from xennet to xenvif, and another is the reversed
> > direction.  It's very likely the two directions have significant
> > difference for sending data with grant tables, you could see in above
> > log, it takes 20~30us to transmit a data block (we can use the id
> > number and grant table's ref number to match the data block in xennet
> > driver and xenvif driver).
> > 
> > > Now regarding the results, I can see the timestamp 3842008681 for
> > > xennet_notify_tx_irq, 3842008885 for vgic_inject_irq, and 3842008935 for
> > > vcpu_kick. Where is the corresponding TSC for the domain receiving the
> > > notification?
> > > 
> > > Also for the other case, starting at 3842016505, can you please
> > > highlight the timestamp for vgic_inject_irq, vcpu_kick, and also the one
> > > for the domain receiving the notification?
> > > 
> > > The most interesting timestamps would be the timestamp for vcpu_kick in
> > > "notification sending domain" [a], the timestamp for receiving the
> > > interrupt in the Xen on pCPU for the "notification receiving domain"
> > > [b], and the timestamp for the "notification receiving domain" getting
> > > the notification [c].
> > > 
> > > If really context switch is the issue, then the interesting latency
> > > would be between [a] and [b].
> > 
> > Understand.  I agree that I didn't move into more details, the main
> > reason is Xen dmesg buffer is fragile after adding more logs, e.g.
> > after I added log in the function gicv3_send_sgi(), Xen will stuck
> > during the booting phase, and after adding logs in
> > leave_hypervisor_to_guest() it will introduce huge logs (so I need to
> > only trace for first 16 CPUs to mitigate log flood).
> > 
> > I think it would be better to enable xentrace for my profiling at my
> > side.  If I have any further data, will share back.
> 
> Looking forwa

Re: Issue: Networking performance in Xen VM on Arm64

2022-10-24 Thread Leo Yan
On Fri, Oct 21, 2022 at 02:14:04PM -0700, Stefano Stabellini wrote:

[...]

> > > # domU side
> > > xen/arch/arm/vgic-v2.c:vgic_v2_to_sgi
> > > xen/arch/arm/vgic.c:vgic_to_sgi
> > > xen/arch/arm/vgic.c:vgic_inject_irq
> > > xen/arch/arm/vgic.c:vcpu_kick
> > > xen/arch/arm/gic-v2.c:gicv2_send_SGI
> > > 
> > > # dom0 side
> > > xen/arch/arm/gic.c:do_sgi
> > > xen/arch/arm/traps.c:leave_hypervisor_to_guest
> > > 
> > > It would be good to understand why sometimes it takes ~10us and some
> > > other times it takes ~540us
> > 
> > Some updates for why it takes several hundreds us for Xen backend driver
> > to respond interrupt.  The short answer is the vcpu running Xen backend
> > driver needs to switch context, even I have set options "sched=null
> > vwfi=native" in Xen command line.
> > 
> > So please see below detailed logs for how the things happen.
> > 
> > Let's take the timestamp 3842008681 as the start point, it's the time
> > for Xen backend driver sending out notification (xennet_notify_tx_irq);
> > at the timestamp 3842008885 the Xen hypervisor injects the interrupt
> > (it's about ~8us duration from the start point).
> > 
> > And then at the timestamp 3842008935 it invokes vcpu_kick() to kick the
> > virtual CPU for running Xen forend driver, you could see
> > VCPU_PROCESSOR is 11 and VCPU_ID is 9 for dom0, the duration is
> > 10.16us from the start point.
> > 
> > The key point is at this point the vcpu's is_running is 0, this is
> > different from the case without long latency which vcpu's is_running
> > is 1.  IIUC, Xen hypervisor needs to take time to restore the vcpu's
> > context, thus we can see the virtual CPU 9 in Dom0 starts to run at
> > the timestamp 3842016505.
> 
> is_running should be always 1 with the NULL scheduler and vwfi=native.
> That is because VMs are never descheduled. Please double-check.
> 
> If you are really running with the NULL scheduler, then I would
> investigate why the vCPU has is_running == 0 because it should not
> happen.

Correct for this: it's my bad that I didn't really enable NULL scheduler
in my code base.  After I enabled NULL scheduler, the latency by context
switching is dismissed.

 8963  pub-338   [002]   217.777652: bprint:   
xennet_tx_setup_grant: id=60 ref=1340 offset=2 len=1514 TSC: 7892178799
 8964  pub-338   [002]   217.777662: bprint:   
xennet_tx_setup_grant: id=82 ref=1362 offset=2050 len=1006 TSC: 7892179043
 8965 ksoftirqd/12-75[012]   255.466914: bprint:   
xenvif_tx_build_gops.constprop.0: id=60 ref=1340 offset=2 len=1514 TSC: 
7892179731
 8966 ksoftirqd/12-75[012]   255.466915: bprint:   
xenvif_tx_build_gops.constprop.0: id=82 ref=1362 offset=2050 len=1006 TSC: 
7892179761
 8967  pub-338   [002]   217.778057: bprint:   
xennet_tx_setup_grant: id=60 ref=1340 offset=2050 len=1514 TSC: 7892188930
 8968  pub-338   [002]   217.778072: bprint:   
xennet_tx_setup_grant: id=53 ref=1333 offset=2 len=1514 TSC: 7892189293
 8969   containerd-2965  [012]   255.467304: bprint:   
xenvif_tx_build_gops.constprop.0: id=60 ref=1340 offset=2050 len=1514 TSC: 
7892189479
 8970   containerd-2965  [012]   255.467306: bprint:   
xenvif_tx_build_gops.constprop.0: id=53 ref=1333 offset=2 len=1514 TSC: 
7892189533

So the xennet (Xen net forend driver) and xenvif (net backend driver)
work in parallel.  Please note, I didn't see networking performance
improvement after changed to use NULL scheduler.

Now I will compare the duration for two directions, one direction is
sending data from xennet to xenvif, and another is the reversed
direction.  It's very likely the two directions have significant
difference for sending data with grant tables, you could see in above
log, it takes 20~30us to transmit a data block (we can use the id
number and grant table's ref number to match the data block in xennet
driver and xenvif driver).

> Now regarding the results, I can see the timestamp 3842008681 for
> xennet_notify_tx_irq, 3842008885 for vgic_inject_irq, and 3842008935 for
> vcpu_kick. Where is the corresponding TSC for the domain receiving the
> notification?
> 
> Also for the other case, starting at 3842016505, can you please
> highlight the timestamp for vgic_inject_irq, vcpu_kick, and also the one
> for the domain receiving the notification?
> 
> The most interesting timestamps would be the timestamp for vcpu_kick in
> "notification sending domain" [a], the timestamp for receiving the
> interrupt in the Xen on pCPU for the "notification receiving domain"
> [b], and the timestamp for the "notification receiving domain" getting
> the notification [c].
> 
> If really context switch is the issue, then the interesting latency
> would be between [a] and [b].

Understand.  I agree that I didn't move into more details, the main
reason is Xen dmesg buffer is fragile after adding more logs, e.g.
after I added log 

Re: Issue: Networking performance in Xen VM on Arm64

2022-10-21 Thread Leo Yan
Hi Stefano and all,

On Mon, Oct 17, 2022 at 04:50:05PM -0700, Stefano Stabellini wrote:

[...]

> > We can see DomU sends notification with timestamp (raw counter) is
> > 4989078592 and Dom0 receives the interrupt with timestamp 4989092169.
> > Since Dom0 and DomU use the same time counter and the counter
> > frequency is 25MHz, so we can get the delta value (in macroseconds):
> > 
> > (4989092169 - 4989078592) / 2500 * 1000 * 1000
> >   = 543us
> > 
> > Which means it takes 543us to let Dom0 to receive the notification.
> > You could see DomU runs in CPU3 and Dom0 runs on CPU13, there should
> > not have contention for CPU resources.  Seems to me, it's likely Xen
> > hypervisor takes long time to deliver the interrupt, note, it's not
> > take so long time for every skb transferring, sometimes the time for
> > response a notification is short (about ~10us).
> 
> Good find. I think this is worth investigating further. Do you have
> vwfi=native in your Xen command line as well?
> 
> After that, I would add printk also in Xen with the timestamp. The event
> channel notification code path is the following:
> 
> # domU side
> xen/arch/arm/vgic-v2.c:vgic_v2_to_sgi
> xen/arch/arm/vgic.c:vgic_to_sgi
> xen/arch/arm/vgic.c:vgic_inject_irq
> xen/arch/arm/vgic.c:vcpu_kick
> xen/arch/arm/gic-v2.c:gicv2_send_SGI
> 
> # dom0 side
> xen/arch/arm/gic.c:do_sgi
> xen/arch/arm/traps.c:leave_hypervisor_to_guest
> 
> It would be good to understand why sometimes it takes ~10us and some
> other times it takes ~540us

Some updates for why it takes several hundreds us for Xen backend driver
to respond interrupt.  The short answer is the vcpu running Xen backend
driver needs to switch context, even I have set options "sched=null
vwfi=native" in Xen command line.

So please see below detailed logs for how the things happen.

Let's take the timestamp 3842008681 as the start point, it's the time
for Xen backend driver sending out notification (xennet_notify_tx_irq);
at the timestamp 3842008885 the Xen hypervisor injects the interrupt
(it's about ~8us duration from the start point).

And then at the timestamp 3842008935 it invokes vcpu_kick() to kick the
virtual CPU for running Xen forend driver, you could see
VCPU_PROCESSOR is 11 and VCPU_ID is 9 for dom0, the duration is
10.16us from the start point.

The key point is at this point the vcpu's is_running is 0, this is
different from the case without long latency which vcpu's is_running
is 1.  IIUC, Xen hypervisor needs to take time to restore the vcpu's
context, thus we can see the virtual CPU 9 in Dom0 starts to run at
the timestamp 3842016505.

3842008548  pub-310   [001]67.352980: bprint:   
xennet_start_xmit: xennet_start_xmit: TSC: 3842008548
3842008652  pub-310   [001]67.352984: bprint:   
xennet_tx_setup_grant: id=52 ref=820 offset=2 len=1514 TSC: 3842008652
3842008681  pub-310   [001]67.352985: bprint:   
xennet_start_xmit: xennet_notify_tx_irq: TSC: 3842008681
3842008689 (XEN) leave_hypervisor_to_guest: CPU_ID: 0 TSC: 3842008689
3842008766 (XEN) EVTCHNOP_send: CPU_ID: 2 TSC: 3842008766
3842008885 (XEN) vgic_inject_irq: CPU_ID: 2 TSC: 3842008885
3842008929 (XEN) leave_hypervisor_to_guest: CPU_ID: 14 TSC: 3842008929
3842008935 (XEN) vcpu_kick: VCPU_PROCESSOR: 11 VCPU_ID: 9 is_running 0 TSC: 
3842008935
3842009049 (XEN) leave_hypervisor_to_guest: CPU_ID: 2 TSC: 3842009049
3842009322  pub-310   [001]67.353011: bprint:   
xennet_start_xmit: xennet_start_xmit: TSC: 3842009322
3842009374  pub-310   [001]67.353013: bprint:   
xennet_tx_setup_grant: id=12 ref=780 offset=2050 len=1514 TSC: 3842009374
3842009584  pub-310   [001]67.353021: bprint:   
xennet_start_xmit: xennet_start_xmit: TSC: 3842009584
3842009625 (XEN) leave_hypervisor_to_guest: CPU_ID: 15 TSC: 3842009625
3842009633  pub-310   [001]67.353023: bprint:   
xennet_tx_setup_grant: id=83 ref=851 offset=2 len=1514 TSC: 3842009633
3842009853  pub-310   [001]67.353032: bprint:   
xennet_start_xmit: xennet_start_xmit: TSC: 3842009853
3842009899  pub-310   [001]67.353034: bprint:   
xennet_tx_setup_grant: id=5 ref=773 offset=2050 len=1514 TSC: 3842009899
3842010080  pub-310   [001]67.353041: bprint:   
xennet_start_xmit: xennet_start_xmit: TSC: 3842010080
3842010121  pub-310   [001]67.353043: bprint:   
xennet_tx_setup_grant: id=85 ref=853 offset=2 len=1514 TSC: 3842010121
3842010316  pub-310   [001]67.353050: bprint:   
xennet_start_xmit: xennet_start_xmit: TSC: 3842010316
3842010359  pub-310   [001]67.353052: bprint:   
xennet_tx_setup_grant: id=9 ref=777 offset=2050 len=1514 TSC: 3842010359
3842010553  pub-310   [001]67.353060: bprint:

Re: Issue: Networking performance in Xen VM on Arm64

2022-10-18 Thread Leo Yan
On Mon, Oct 17, 2022 at 04:50:05PM -0700, Stefano Stabellini wrote:

[...]

> > Which means it takes 543us to let Dom0 to receive the notification.
> > You could see DomU runs in CPU3 and Dom0 runs on CPU13, there should
> > not have contention for CPU resources.  Seems to me, it's likely Xen
> > hypervisor takes long time to deliver the interrupt, note, it's not
> > take so long time for every skb transferring, sometimes the time for
> > response a notification is short (about ~10us).
> 
> Good find. I think this is worth investigating further. Do you have
> vwfi=native in your Xen command line as well?

Yes, I have added "sched=null" and "vwfi=native" into Xen options:

options=noreboot dom0_mem=4096M bootscrub=0 iommu=on loglvl=error 
guest_loglvl=error sched=null vwfi=native

> After that, I would add printk also in Xen with the timestamp. The event
> channel notification code path is the following:
> 
> # domU side
> xen/arch/arm/vgic-v2.c:vgic_v2_to_sgi
> xen/arch/arm/vgic.c:vgic_to_sgi
> xen/arch/arm/vgic.c:vgic_inject_irq
> xen/arch/arm/vgic.c:vcpu_kick
> xen/arch/arm/gic-v2.c:gicv2_send_SGI
> 
> # dom0 side
> xen/arch/arm/gic.c:do_sgi
> xen/arch/arm/traps.c:leave_hypervisor_to_guest
> 
> It would be good to understand why sometimes it takes ~10us and some
> other times it takes ~540us

Thanks a lot for detailed info.

Just note, in my platform DomU enables GICv3 driver rather than GICv2.
This would be a bit different in the Xen code.  But it should be easy
for me to map to vgic-v3 files.

I have a question for how to trace Xen system.  Outputting chars to UART
is time costy (usually it's millisecond level), it is not friendly to
use console for debugging performance issue.  I searched a bit, either
"xl dmesg" or xentrace can be used for capturing trace logs, one thing
I am not certain is if we can save Xen logs only into log buffer and
doesn't output to UART, so that afterwards we can use "xl dmesg" to
capture the logs.  Could anyone confirm for this is correct usage with
"xl dmesg" or I should use xentrace for this case?

Thanks!

Leo



Re: Issue: Networking performance in Xen VM on Arm64

2022-10-17 Thread Leo Yan
Hi Stefano,

Sorry for late response.  Please see below comments.

On Tue, Oct 11, 2022 at 02:47:00PM -0700, Stefano Stabellini wrote:
> On Tue, 11 Oct 2022, Leo Yan wrote:
> > > > The second question is how to mitigate the long latency when send data
> > > > from DomU.  A possible solution is the Xen network forend driver copies
> > > > skb into mediate (bounce) buffer, just like what does in Xen net
> > > > backend driver with gnttab_batch_copy(), in this way the forend driver
> > > > doesn't need to wait for backend driver response and directly return
> > > > back.
> > > 
> > > About this, I am not super familiar with drivers/net/xen-netfront.c but
> > > I take you are referring to xennet_tx_buf_gc? Is that the function that
> > > is causing xennet_start_xmit to wait?
> > 
> > No.  We can take the whole flow in xen-netfront.c as:
> > 
> >   xennet_start_xmit()
> >  --> notify Xen Dom0 to process skb
> >  <-  Dom0 copies skb and notify back to DomU
> >   xennet_tx_buf_gc()
> >   softirq/NET_TX : __kfree_skb()
> 
> Let me premise again that I am not an expert in PV netfront/netback.
> However, I think the above is only true if DomU and Dom0 are running on
> the same physical CPU. If you use sched=null as I suggested above,
> you'll get domU and dom0 running at the same time on different physical
> CPUs and the workflow doesn't work as described.
> 
> It should be:
> 
> CPU1: xennet_start_xmit() ||  CPU2: doing something else
> CPU1: notify Xen Dom0 to process skb  ||  CPU2: receive notification
> CPU1: return from xennet_start_xmit() ||  CPU2: Dom0 copies skb
> CPU1: do something else   ||  CPU2: notify back to DomU
> CPU1: receive irq, xennet_tx_buf_gc() ||  CPU2: do something else

Yes, I agree this is ideal case.  I tried to set option "sched=null" but
I can observe the latency in the second step when CPU1 notify Xen Dom0,
Dom0 takes 500us+ to receive the notification.

Please see below detailed log:

DomU log:

4989078512  pub-321   [003]   101.150966: bprint:   
xennet_start_xmit: xennet_start_xmit: TSC: 4989078512
4989078573  pub-321   [003]   101.150968: bprint:   
xennet_tx_setup_grant: id=24 ref=1816 offset=2 len=1514 TSC: 4989078573
4989078592  pub-321   [003]   101.150969: bprint:   
xennet_start_xmit: xennet_notify_tx_irq: TSC: 4989078592

Dom0 log:

4989092169   -0 [013]   140.121667: bprint:   
xenvif_tx_interrupt: xenvif_tx_interrupt: TSC: 4989092169
4989092331   -0 [013]   140.121673: bprint:   
xenvif_tx_build_gops.constprop.0: id=24 ref=1816 offset=2 len=1514 TSC: 
4989092331

We can see DomU sends notification with timestamp (raw counter) is
4989078592 and Dom0 receives the interrupt with timestamp 4989092169.
Since Dom0 and DomU use the same time counter and the counter
frequency is 25MHz, so we can get the delta value (in macroseconds):

(4989092169 - 4989078592) / 2500 * 1000 * 1000
  = 543us

Which means it takes 543us to let Dom0 to receive the notification.
You could see DomU runs in CPU3 and Dom0 runs on CPU13, there should
not have contention for CPU resources.  Seems to me, it's likely Xen
hypervisor takes long time to deliver the interrupt, note, it's not
take so long time for every skb transferring, sometimes the time for
response a notification is short (about ~10us).

> > > I didn't think that waiting for the backend is actually required. I
> > > mean, in theory xennet_start_xmit could return without calling
> > > xennet_tx_buf_gc, it is just an optimization. But I am not sure about
> > > this.
> > 
> > The function xennet_start_xmit() will not wait and directly return
> > back, but if we review the whole flow we can see the skb is freed until
> > the softirq NET_TX.
> 
> Is it an issue that the skb is not freed until later? Is that affecting
> the latency results? It shouldn't, right?

I did an extra experiment in Xen net forend driver, I enabled the flag
"info->bounce = true" so the forend driver will use bounce buffer to
store data and release the skb immediately to network core layer.

The throughput can be boosted significantly for this: the netperf
result can be improved from 107.73 Mbits/s to 300+ Mbits/s.

> What matters is when dom0 is
> getting those packets on the physical network interface and that happens
> before the skb is freed. I am just trying to figure out if we are
> focusing on the right problem.

Good point.  I agree that releasing skb earlier only can benefit for
throughput, but we still cannot resolve the latency issue i

Re: Issue: Networking performance in Xen VM on Arm64

2022-10-11 Thread Leo Yan
Hi Stefano,

On Mon, Oct 10, 2022 at 04:50:46PM -0700, Stefano Stabellini wrote:
> +Xen/Linux maintainers

Thanks for reviewing.

[...]

> >   Throughput result:
> > 
> > Profile netperf (Mbits/sec)ddsperf (Mbits/sec)
> > Xen-Dom0939.41 > 620
> > Xen-DomU107.73 4~12
> > 
> >   Latency result:
> > 
> > Profile ddsperf's max ping/pong latency (us)
> > Xen-Dom0200 ~ 1400
> > Xen-DomU> 60,000
> > 
> > ## Analysis
> > 
> > The critical thing for the performance is low level network driver if
> > it uses synchronous or asynchronous mode for skb transferring.
> > 
> > When we transfer data from my x86 machine to Xen DomU, the data flow is:
> > 
> >   bridge -> xenif (Xen network backend driver)   => Dom0
> >   `> xennet driver (Xen net forend driver)   => DomU
> > 
> > In this flow, Xen network backend driver (in Dom0) copies skb into the
> > mediate buffer (gnttab_batch_copy()) and notify Xen VM by sending rx
> > irq, the key point is the backend driver doesn't wait for Xen VM to
> > process the skb and directly return to user space, therefore, Xen Dom0
> > and DomU work in asynchronous mode in this case (Dom0 doesn't need to
> > wait for DomU), the duration for handling a skb is 30+ us.
> > 
> > Conversely, if transmit data from Xen DomU, the flow is:
> > 
> >DomU|   Dom0
> >   -+
> >   xennet driver receives skb   |
> > `> send tx interrupt to Dom0   |
> >|  xenif respond tx interrupt
> >|  Copy skb into mediate buffer
> >|  Notify DomU (send tx irq)
> >   xennet driver handle tx irq  |
> >   free skb |
> > 
> > So we can see when DomU sends out packets, it needs to wait for Dom0 to
> > process the packets, until Dom0 notifies DomU that packet has been
> > processed the net forend driver in DomU releases skb.
> > 
> > This means it's a long way to process skbs: Xen DomU and Dom0 work
> > in synchronous mode, the forend driver in DomU sends out skb and
> > notifies Dom0, Dom0 handles skb and notifies back to DomU, finally DomU
> > knows the skb has been processed and releases it.  The duration between
> > sendind and releasing a skb is about 180+ us.
> 
> 180us is not great but above you wrote > 60,000 us. In what ways the two
> measurements differ?

108us is measured by using ftrace events 'net_dev_queue' and
'kfree_skb'.

60,000us is the measured maximum latency via ddsperf ping & pong [1],
the latency is not only caused by low level driver and hypervisor, it
also contains the latency caused by the test program itself.

AFAIK, ddsperf monitors traffic from userspace and throttle data
transferring when it detects tx rate is low, I think this is one main
reason for the huge latency (60ms+) from ddsperf.

[1] 
https://cyclonedds.io/docs/cyclonedds/latest/getting_started.html#measuring-latency

> > ## Questions
> > 
> > Given Xen network driver has been merged in Linux kernel 2.6 (back in
> > 2007), it's very unlikely I am the first person to observe this issue.
> > 
> > I think this is a common issue and not specific to Arm64 arch, the
> > reason is the long latency is mainly caused by Xen networking driver
> > and I did't see the Xen context switching on Arm64 is abnormal (I saw
> > it takes ~10us for context switching between Xen domains).
>  
> Context switching between domains shouldn't come into the picture. For a
> latency measurement like that I would make sure to:
> 
> - use the null scheduler, sched=null
> - use vwfi=native
> 
> This way, we can be sure both domains are running and there are no
> context switches. It should lead to the best latency measurements. Also
> this is the configuration we use by default at Xilinx.

Okay, will try these two options at my side and share back the testing
result.

> > Could anyone confirm if this is a known issue?
> 
> This is not something that was discussed recently as far as I know.

It is a bit worried me if I am the first one to report this issue.

Either it's because this is a specific issue for Arm64 since Xen is
much less deployed on Arm64 than x86 platforms; or I must miss something
for causing the poor network performance.

> > The second question is how to mitigate the long latency when send data
> > from DomU.  A possible solution is the Xen network forend driver copies
> > skb into mediate (bounce) buffer, just like what does in Xen net
> > backend driver with gnttab_batch_copy(), in this way the forend driver
> > doesn't need to wait for backend driver response and directly return
> > back.
> 
> About this, I am not super familiar with drivers/net/xen-netfront.c but
> I take you are referring to xennet_tx_buf_gc? Is that the function that
> is causing xennet_start_xmit to wait?

No.  We can take th

Issue: Networking performance in Xen VM on Arm64

2022-10-10 Thread Leo Yan
Hi there,

I tested the networking performance on my Arm64 platform in Xen
virtual machine, below I will try to give out summary for testing
result and share some analysis, at the end I want to check a bit
from the community and get suggestion before I can proceed.

First of all, if you want to know more details for the profiling, you
could access the slides:
https://docs.google.com/presentation/d/1iTQRx8-UYnm19eU6CnVUSaAodKZ0JuRiHYaXBomfu3E/edit?usp=sharing

## Testing summary

The TL;DR is that I used two tools: netperf and ddsperf to test the
networking latency and throughput for Xen Dom0 and DomU, the below
result shows the performance for sending data from a Xen domain (Dom0
or DomU) to my x86 PC respectively, and performance is poor when
transmit data from Xen DomU (Note, I used the default networking
bridge configuration when launch Xen VM).

  Throughput result:

Profile netperf (Mbits/sec)ddsperf (Mbits/sec)
Xen-Dom0939.41 > 620
Xen-DomU107.73 4~12

  Latency result:

Profile ddsperf's max ping/pong latency (us)
Xen-Dom0200 ~ 1400
Xen-DomU> 60,000

## Analysis

The critical thing for the performance is low level network driver if
it uses synchronous or asynchronous mode for skb transferring.

When we transfer data from my x86 machine to Xen DomU, the data flow is:

  bridge -> xenif (Xen network backend driver)   => Dom0
  `> xennet driver (Xen net forend driver)   => DomU

In this flow, Xen network backend driver (in Dom0) copies skb into the
mediate buffer (gnttab_batch_copy()) and notify Xen VM by sending rx
irq, the key point is the backend driver doesn't wait for Xen VM to
process the skb and directly return to user space, therefore, Xen Dom0
and DomU work in asynchronous mode in this case (Dom0 doesn't need to
wait for DomU), the duration for handling a skb is 30+ us.

Conversely, if transmit data from Xen DomU, the flow is:

   DomU|   Dom0
  -+
  xennet driver receives skb   |
`> send tx interrupt to Dom0   |
   |  xenif respond tx interrupt
   |  Copy skb into mediate buffer
   |  Notify DomU (send tx irq)
  xennet driver handle tx irq  |
  free skb |

So we can see when DomU sends out packets, it needs to wait for Dom0 to
process the packets, until Dom0 notifies DomU that packet has been
processed the net forend driver in DomU releases skb.

This means it's a long way to process skbs: Xen DomU and Dom0 work
in synchronous mode, the forend driver in DomU sends out skb and
notifies Dom0, Dom0 handles skb and notifies back to DomU, finally DomU
knows the skb has been processed and releases it.  The duration between
sendind and releasing a skb is about 180+ us.

## Questions

Given Xen network driver has been merged in Linux kernel 2.6 (back in
2007), it's very unlikely I am the first person to observe this issue.

I think this is a common issue and not specific to Arm64 arch, the
reason is the long latency is mainly caused by Xen networking driver
and I did't see the Xen context switching on Arm64 is abnormal (I saw
it takes ~10us for context switching between Xen domains).

Could anyone confirm if this is a known issue?
 
The second question is how to mitigate the long latency when send data
from DomU.  A possible solution is the Xen network forend driver copies
skb into mediate (bounce) buffer, just like what does in Xen net
backend driver with gnttab_batch_copy(), in this way the forend driver
doesn't need to wait for backend driver response and directly return
back.  But here I am not clear for the mechanism for Xen grant table,
especially if the Xen grant table is only writtable from Dom0, then it
would be hard for us to optimize the forend driver in DomU by directly
copying skb into the grant table.  Any thoughts for this?

Welcome any suggestion and comments.  Thanks!

Leo



Re: [PATCH] xen: Add macro for version number string

2022-09-07 Thread Leo Yan
On Wed, Sep 07, 2022 at 02:34:25PM +0200, Jan Beulich wrote:
> On 07.09.2022 14:20, Bertrand Marquis wrote:
> > Hi Leo,
> > 
> > Thanks a lot for the quick handling here.
> > 
> >> On 7 Sep 2022, at 13:04, Leo Yan  wrote:
> >>
> >> On Arm64 Linux kernel prints log for Xen version number:
> >>
> >>  Xen XEN_VERSION.XEN_SUBVERSION support found
> >>
> >> The header file "xen/compile.h" is missed so that XEN_VERSION and
> >> XEN_SUBVERSION are not defined, __stringify() wrongly converts them as
> >> strings and concatenate to string "XEN_VERSION.XEN_SUBVERSION".
> >>
> >> This patch introduces a string macro XEN_VERSION_STRING, we can directly
> >> use it as version number string, as a result it drops to use of
> >> __stringify() to make the code more readable.
> >>
> >> The change has been tested on Ampere AVA Arm64 platform.
> >>
> >> Fixes: 5d797ee199b3 ("xen/arm: split domain_build.c")
> >> Suggested-by: Bertrand Marquis 
> >> Signed-off-by: Leo Yan 
> > Reviewed-by: Bertrand Marquis 
> > 
> > Regarding the change suggested by Jan to add spaces, I think it is a
> > good idea so if the commiter agrees to do on it on commit please do,
> > otherwise we can keep this as is.
> 
> If I end up committing this, I'd be happy to add the blanks, and therefore
> I'm inclined to say no need for a re-send.

For easier for maintainers, have sent patch v2 with adding blank and
adding review tags.  Thanks!

Leo



[PATCH v2] xen: Add macro for version number string

2022-09-07 Thread Leo Yan
On Arm64 Linux kernel prints log for Xen version number:

  Xen XEN_VERSION.XEN_SUBVERSION support found

The header file "xen/compile.h" is missed so that XEN_VERSION and
XEN_SUBVERSION are not defined, __stringify() wrongly converts them as
strings and concatenate to string "XEN_VERSION.XEN_SUBVERSION".

This patch introduces a string macro XEN_VERSION_STRING, we can directly
use it as version number string, as a result it drops to use of
__stringify() to make the code more readable.

The change has been tested on Ampere AVA Arm64 platform.

Fixes: 5d797ee199b3 ("xen/arm: split domain_build.c")
Suggested-by: Bertrand Marquis 
Signed-off-by: Leo Yan 
Reviewed-by: Bertrand Marquis 
Reviewed-by: Jan Beulich 
---

Changes from v1:
Added blanks around XEN_VERSION_STRING (Jan);
Added Bertrand's and Jan's reviewed tags.

 xen/arch/arm/acpi/domain_build.c | 3 ++-
 xen/arch/arm/domain_build.c  | 2 +-
 xen/common/efi/boot.c| 4 ++--
 xen/include/xen/compile.h.in | 1 +
 4 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/acpi/domain_build.c b/xen/arch/arm/acpi/domain_build.c
index bbdc90f92c..ed824c0178 100644
--- a/xen/arch/arm/acpi/domain_build.c
+++ b/xen/arch/arm/acpi/domain_build.c
@@ -9,6 +9,7 @@
  * GNU General Public License for more details.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -91,7 +92,7 @@ static int __init acpi_make_hypervisor_node(const struct 
kernel_info *kinfo,
 struct membank tbl_add[])
 {
 const char compat[] =
-"xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
+"xen,xen-" XEN_VERSION_STRING "\0"
 "xen,xen";
 int res;
 /* Convenience alias */
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 3fd1186b53..d242c542c6 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1367,7 +1367,7 @@ static int __init make_hypervisor_node(struct domain *d,
int addrcells, int sizecells)
 {
 const char compat[] =
-"xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
+"xen,xen-" XEN_VERSION_STRING "\0"
 "xen,xen";
 __be32 *reg, *cells;
 gic_interrupt_t intr;
diff --git a/xen/common/efi/boot.c b/xen/common/efi/boot.c
index a5b2d6ddb8..db0340c8e2 100644
--- a/xen/common/efi/boot.c
+++ b/xen/common/efi/boot.c
@@ -1341,8 +1341,8 @@ efi_start(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE 
*SystemTable)
 efi_console_set_mode();
 }
 
-PrintStr(L"Xen " __stringify(XEN_VERSION) "." __stringify(XEN_SUBVERSION)
- XEN_EXTRAVERSION " (c/s " XEN_CHANGESET ") EFI loader\r\n");
+PrintStr(L"Xen " XEN_VERSION_STRING XEN_EXTRAVERSION
+" (c/s " XEN_CHANGESET ") EFI loader\r\n");
 
 efi_arch_relocate_image(0);
 
diff --git a/xen/include/xen/compile.h.in b/xen/include/xen/compile.h.in
index 440ecb25c1..3151d1e7d1 100644
--- a/xen/include/xen/compile.h.in
+++ b/xen/include/xen/compile.h.in
@@ -7,6 +7,7 @@
 
 #define XEN_VERSION@@version@@
 #define XEN_SUBVERSION @@subversion@@
+#define XEN_VERSION_STRING "@@version@@.@@subversion@@"
 #define XEN_EXTRAVERSION   "@@extraversion@@"
 
 #define XEN_CHANGESET  "@@changeset@@"
-- 
2.34.1




Re: [PATCH] xen: Add macro for version number string

2022-09-07 Thread Leo Yan
On Wed, Sep 07, 2022 at 02:31:52PM +0200, Jan Beulich wrote:
> On 07.09.2022 14:28, Leo Yan wrote:
> > A question, since commit 5d797ee199b3 was merged in 4.11.0-rc6, for
> > fixing it, should I explictly add backport tag as below?
> > 
> >   Backport: 4.11+
> 
> That's up to you, I would say. We don't really use that tag all that much,
> the Fixes: tag is more relevant at least based on recent observations.

Thanks a lot for confirmation.  If so, I will just use Fixes tag.

Leo



Re: [PATCH] xen: Add macro for version number string

2022-09-07 Thread Leo Yan
Hi Jan,

On Wed, Sep 07, 2022 at 02:12:25PM +0200, Jan Beulich wrote:
> On 07.09.2022 14:04, Leo Yan wrote:
> > On Arm64 Linux kernel prints log for Xen version number:
> > 
> >   Xen XEN_VERSION.XEN_SUBVERSION support found
> > 
> > The header file "xen/compile.h" is missed so that XEN_VERSION and
> > XEN_SUBVERSION are not defined, __stringify() wrongly converts them as
> > strings and concatenate to string "XEN_VERSION.XEN_SUBVERSION".
> > 
> > This patch introduces a string macro XEN_VERSION_STRING, we can directly
> > use it as version number string, as a result it drops to use of
> > __stringify() to make the code more readable.
> > 
> > The change has been tested on Ampere AVA Arm64 platform.
> > 
> > Fixes: 5d797ee199b3 ("xen/arm: split domain_build.c")
> > Suggested-by: Bertrand Marquis 
> > Signed-off-by: Leo Yan 
> 
> Reviewed-by: Jan Beulich 
> with perhaps a small adjustment (but it'll be the Arm maintainers to judge):
> 
> > @@ -91,7 +92,7 @@ static int __init acpi_make_hypervisor_node(const struct 
> > kernel_info *kinfo,
> >  struct membank tbl_add[])
> >  {
> >  const char compat[] =
> > -
> > "xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
> > +"xen,xen-"XEN_VERSION_STRING"\0"
> 
> I think readability would benefit here from adding blanks around
> XEN_VERSION_STRING here and ...

Agree that adding blanks is better.  Will do.

> 
> > --- a/xen/arch/arm/domain_build.c
> > +++ b/xen/arch/arm/domain_build.c
> > @@ -1367,7 +1367,7 @@ static int __init make_hypervisor_node(struct domain 
> > *d,
> > int addrcells, int sizecells)
> >  {
> >  const char compat[] =
> > -
> > "xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
> > +"xen,xen-"XEN_VERSION_STRING"\0"
> 
> ... here (as an aside I wonder why these variables aren't static
> __initconst), just like ...

Will add blanks.

> > --- a/xen/common/efi/boot.c
> > +++ b/xen/common/efi/boot.c
> > @@ -1341,8 +1341,8 @@ efi_start(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE 
> > *SystemTable)
> >  efi_console_set_mode();
> >  }
> >  
> > -PrintStr(L"Xen " __stringify(XEN_VERSION) "." 
> > __stringify(XEN_SUBVERSION)
> > - XEN_EXTRAVERSION " (c/s " XEN_CHANGESET ") EFI loader\r\n");
> > +PrintStr(L"Xen " XEN_VERSION_STRING XEN_EXTRAVERSION
> > +" (c/s " XEN_CHANGESET ") EFI loader\r\n");
> 
> ... it is here in particular for XEN_CHANGESET.
> 
> The other general remark I have: Please follow patch submission guidelines
> and send To: the list with maintainers on Cc:.

Ah, just now quickly went through docs/process/sending-patches.pandoc,
thanks for reminding.

A question, since commit 5d797ee199b3 was merged in 4.11.0-rc6, for
fixing it, should I explictly add backport tag as below?

  Backport: 4.11+

Thanks,
Leo



[PATCH] xen: Add macro for version number string

2022-09-07 Thread Leo Yan
On Arm64 Linux kernel prints log for Xen version number:

  Xen XEN_VERSION.XEN_SUBVERSION support found

The header file "xen/compile.h" is missed so that XEN_VERSION and
XEN_SUBVERSION are not defined, __stringify() wrongly converts them as
strings and concatenate to string "XEN_VERSION.XEN_SUBVERSION".

This patch introduces a string macro XEN_VERSION_STRING, we can directly
use it as version number string, as a result it drops to use of
__stringify() to make the code more readable.

The change has been tested on Ampere AVA Arm64 platform.

Fixes: 5d797ee199b3 ("xen/arm: split domain_build.c")
Suggested-by: Bertrand Marquis 
Signed-off-by: Leo Yan 
---
 xen/arch/arm/acpi/domain_build.c | 3 ++-
 xen/arch/arm/domain_build.c  | 2 +-
 xen/common/efi/boot.c| 4 ++--
 xen/include/xen/compile.h.in | 1 +
 4 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/acpi/domain_build.c b/xen/arch/arm/acpi/domain_build.c
index bbdc90f92c..b23c7cad7a 100644
--- a/xen/arch/arm/acpi/domain_build.c
+++ b/xen/arch/arm/acpi/domain_build.c
@@ -9,6 +9,7 @@
  * GNU General Public License for more details.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -91,7 +92,7 @@ static int __init acpi_make_hypervisor_node(const struct 
kernel_info *kinfo,
 struct membank tbl_add[])
 {
 const char compat[] =
-"xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
+"xen,xen-"XEN_VERSION_STRING"\0"
 "xen,xen";
 int res;
 /* Convenience alias */
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 3fd1186b53..62602d2b86 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1367,7 +1367,7 @@ static int __init make_hypervisor_node(struct domain *d,
int addrcells, int sizecells)
 {
 const char compat[] =
-"xen,xen-"__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)"\0"
+"xen,xen-"XEN_VERSION_STRING"\0"
 "xen,xen";
 __be32 *reg, *cells;
 gic_interrupt_t intr;
diff --git a/xen/common/efi/boot.c b/xen/common/efi/boot.c
index a5b2d6ddb8..db0340c8e2 100644
--- a/xen/common/efi/boot.c
+++ b/xen/common/efi/boot.c
@@ -1341,8 +1341,8 @@ efi_start(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE 
*SystemTable)
 efi_console_set_mode();
 }
 
-PrintStr(L"Xen " __stringify(XEN_VERSION) "." __stringify(XEN_SUBVERSION)
- XEN_EXTRAVERSION " (c/s " XEN_CHANGESET ") EFI loader\r\n");
+PrintStr(L"Xen " XEN_VERSION_STRING XEN_EXTRAVERSION
+" (c/s " XEN_CHANGESET ") EFI loader\r\n");
 
 efi_arch_relocate_image(0);
 
diff --git a/xen/include/xen/compile.h.in b/xen/include/xen/compile.h.in
index 440ecb25c1..3151d1e7d1 100644
--- a/xen/include/xen/compile.h.in
+++ b/xen/include/xen/compile.h.in
@@ -7,6 +7,7 @@
 
 #define XEN_VERSION@@version@@
 #define XEN_SUBVERSION @@subversion@@
+#define XEN_VERSION_STRING "@@version@@.@@subversion@@"
 #define XEN_EXTRAVERSION   "@@extraversion@@"
 
 #define XEN_CHANGESET  "@@changeset@@"
-- 
2.34.1




Re: [PATCH] xen/arm: acpi: Include header file for version number

2022-09-07 Thread Leo Yan
On Wed, Sep 07, 2022 at 11:12:24AM +, Bertrand Marquis wrote:

[...]

> > Just remind, We need to define XEN_VERSION_STRING in compile.h.in rather
> > than in compile.h, something like:
> > 
> >  #define XEN_VERSION_STRING @@version@@.@@subversion@@
> 
> Very true but you will need the quotes here

Yeah.

>  Quotes at beginning and end should not be there.
> >>> 
> >>> I have to admit that I dislike the STR infix. I'd prefer a suffixed 
> >>> variant
> >>> (e.g. XEN_VERSION_STRING) or one omitting "string" altogether, e.g.
> >>> XEN_FULL_VERSION (albeit I see "full" as being potentially ambiguous here,
> >>> since one might expect that to include XEN_EXTRAVERSION as well then).
> >> 
> >> 
> >> Version is a value so here I though it made sense to distinguish that one 
> >> as it is a string representation of it.
> >> 
> >> XEN_VERSION_STRING is ok I think.
> >> 
> >> I generally dislike anything named FULL, EXTRA, BASE or other which are 
> >> just unclear.
> > 
> > XEN_VERSION_STRING is good for me.
> > 
> > Hi Bertrand, just let me know if you prefer to cook your own patch for
> > this (essentially this idea is coming from you) or you want me to
> > follow up for a new patch?  Either way is fine for me.
> 
> Please push a new patch and add:
> Suggested-by: Bertrand Marquis 

Sure, will do.

Thanks all for suggestions.

Leo



Re: [PATCH] xen/arm: acpi: Include header file for version number

2022-09-07 Thread Leo Yan
On Wed, Sep 07, 2022 at 10:05:54AM +, Bertrand Marquis wrote:

[...]

>  I think a define in compile.h using stringify is the easiest solution:
> >>> 
> >>> Ah! I thought you were suggesting to tweak __stringify. This is ...
> >> 
> >> Also possible but a bit more tricky
> >> 
>  #define XEN_STR_VERSION 
>  "__stringify(XEN_VERSION)"."__stringify(XEN_SUBVERSION)”

Just remind, We need to define XEN_VERSION_STRING in compile.h.in rather
than in compile.h, something like:

  #define XEN_VERSION_STRING @@version@@.@@subversion@@

> >> Quotes at beginning and end should not be there.
> > 
> > I have to admit that I dislike the STR infix. I'd prefer a suffixed variant
> > (e.g. XEN_VERSION_STRING) or one omitting "string" altogether, e.g.
> > XEN_FULL_VERSION (albeit I see "full" as being potentially ambiguous here,
> > since one might expect that to include XEN_EXTRAVERSION as well then).
> 
> 
> Version is a value so here I though it made sense to distinguish that one as 
> it is a string representation of it.
> 
> XEN_VERSION_STRING is ok I think.
> 
> I generally dislike anything named FULL, EXTRA, BASE or other which are just 
> unclear.

XEN_VERSION_STRING is good for me.

Hi Bertrand, just let me know if you prefer to cook your own patch for
this (essentially this idea is coming from you) or you want me to
follow up for a new patch?  Either way is fine for me.

Thanks,
Leo



Re: [PATCH] xen/arm: acpi: Support memory reserve configuration table

2022-09-06 Thread Leo Yan
On Tue, Sep 06, 2022 at 08:53:02AM +0100, Marc Zyngier wrote:

[...]

> > Okay, I think have two questions for you:
> > 
> > - The first question is if we really need to reserve persistent memory
> >   for RD pending table and configuration table when Linux kernel runs
> >   in Xen domain?
> 
> I have no idea, and really I don't want to know. The architecture
> doesn't make it safe to reuse that memory, and the driver does the
> right thing by always reserving that memory when the FW is supposed to
> support it.
> 
> The "oh but it is safe on so and so" approach doesn't scale. If you
> want to have such a thing, just convince people at ARM that it is
> possible to implement a GICv3-compliant system without the RD tables,
> get them to update the architecture to allow this scheme and advertise
> it in a discoverable register. Xen could then implement it, Linux
> could check this bit, and we'd all be a happy family.

I agree that my patch is not based on a scale approach, this is also
my concern.

To be honest, convincing Arm GIC team is a bit out of my working scope.
I am working on automative project, when I saw verbose log with bunch of
kernel warnings with Xen, it motivated me to chase down, this is the
main reason I tried to explore some solutions at here.

> Because that's really what this is: it isn't that you don't care about
> the RD tables being reserved. It is that you don't care about them at
> all because they are never used by Xen as the GIC implementation. Your
> approach of "huh, let's not reserve it" just papers over this.

> > - If the first question's answer is no, so it's not necessary to reserve
> >   RD pending table and configuration table for Xen, then what's the good
> >   way to dismiss the kernel oops?
> 
> A warning, not an oops.
> 
> > 
> > IIUC, you consider the general flow from architecture view, so you prefer
> > to ask Xen to implement EFI stub to comply the general flow for EFI
> > booting sequence, right?
> 
> If you want to use ACPI, you use EFI, and not a vague emulation of
> it. If you use DT, you can reserve the memory upfront. The various
> alternatives are in this thread.

Given I proposed two fixes, one is for Xen and another is for kernel,
but both are rejected, so I would like to hold on a bit for this.

Maybe it's a good point for Xen maintainers to review if it needs to
support EFI stub, or welcome any other suggestions.

> > If the conclusion is to change Xen for support EFI stub, then this
> > would be fine for me and I will hold on and leave Xen developers to work
> > on it.
> > 
> > > > [1] 
> > > > https://lore.kernel.org/lkml/20220906024040.503764-1-leo@linaro.org/T/#u
> > > 
> > > I'm totally baffled by the fact you're trying to add some extra hacks
> > > to Linux just to paper over some of the Xen's own issues.
> > 
> > I have a last question for why kernel reserves RD pending table and
> > configuration table for kexec.  As we know, the primary kernel and
> > the secondary kernel use separate memory regions,
> 
> No, you got it wrong. Only with *kdump* do you get separate memory
> regions. kexec reuses all of the memory visible by the primary kernel.

Thanks for correction.

> > this means there have
> > no race condition that secondary kernel modifies the tables whilist the
> > GIC accesses the table if the secondary kernel allocates new pages for
> > RD tables.  So only one potential issue I can image is the secondary
> > kernel sets new RD pending table and configuration table, which might
> > introduce inconsistent issue with rest RDs in the system.
> > 
> > Could you confirm if my understanding is correct or not?
> 
> It isn't correct.
> 
> - There is no race condition. Once the RD tables are configured, they
>   cannot be changed.
> 
> - When the kdump kernel boots, none of the primary OS memory is
>   reused, so it is safe to continue and use the same tables in place
> 
> - When the kexec kernel boots, all of the memory except for the
>   reserved memory is reused. If your RD tables are used for anything,
>   you'll see memory corruption as the GIC writes pending bits in the
>   pending table, and you'll be unable to configure interrupts
>   correctly.

This gives me impression that when do a power cycle, CPUs are reset
but GIC is still alive, so for every booting time the same memory
region should be reserved for RD tables.

To be honest, it's a bit weird for me that if a system cannot reset
CPUs and GIC together.

> In conclusion, using kexec with GICv3 is completely unsafe if you
> don't reserve the memory allocated to the RDs.
> 
> > Sorry for noise and many questions.  I understand this is a complex
> > and difficult topic for me, and it's very likely that I am absent
> > sufficient knowledge for this part, this is just what I want to
> > learn from the discussion and from you :-)
> 
> I suggest you read the architecture spec, which has all the details.

Thank you for many suggestions!  I read a bit for GICv3/v4 arch spec
at last week, b

Re: [PATCH] xen/arm: acpi: Include header file for version number

2022-09-06 Thread Leo Yan
On Tue, Sep 06, 2022 at 01:09:28PM +, Bertrand Marquis wrote:
> Hi Leo,
> 
> > On 6 Sep 2022, at 12:31, Leo Yan  wrote:
> > 
> > On Arm64 Linux kernel prints log for Xen version number:
> > 
> >  [0.00] Xen XEN_VERSION.XEN_SUBVERSION support found
> > 
> > Because the header file "xen/compile.h" is missed, XEN_VERSION and
> > XEN_SUBVERSION are not defined, thus compiler directly uses the
> > string "XEN_VERSION" and "XEN_SUBVERSION" in the compatible string.
> > 
> > This patch includes the header "xen/compile.h" which defines macros for
> > XEN_VERSION and XEN_SUBVERSION, thus Xen can pass the version number via
> > hypervisor node.
> > 
> > Signed-off-by: Leo Yan 
> 
> Very nice finding and side effect from stringify.
> 
> Reviewed-by: Bertrand Marquis 

Thanks for review, Bertrand.



[PATCH] xen/arm: acpi: Include header file for version number

2022-09-06 Thread Leo Yan
On Arm64 Linux kernel prints log for Xen version number:

  [0.00] Xen XEN_VERSION.XEN_SUBVERSION support found

Because the header file "xen/compile.h" is missed, XEN_VERSION and
XEN_SUBVERSION are not defined, thus compiler directly uses the
string "XEN_VERSION" and "XEN_SUBVERSION" in the compatible string.

This patch includes the header "xen/compile.h" which defines macros for
XEN_VERSION and XEN_SUBVERSION, thus Xen can pass the version number via
hypervisor node.

Signed-off-by: Leo Yan 
---
 xen/arch/arm/acpi/domain_build.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/arm/acpi/domain_build.c b/xen/arch/arm/acpi/domain_build.c
index bbdc90f92c..2649e11fd4 100644
--- a/xen/arch/arm/acpi/domain_build.c
+++ b/xen/arch/arm/acpi/domain_build.c
@@ -9,6 +9,7 @@
  * GNU General Public License for more details.
  */
 
+#include 
 #include 
 #include 
 #include 
-- 
2.34.1




Re: [PATCH] xen/arm: acpi: Support memory reserve configuration table

2022-09-06 Thread Leo Yan
On Tue, Sep 06, 2022 at 03:27:47PM +0800, Leo Yan wrote:
> On Tue, Sep 06, 2022 at 09:22:00AM +0200, Ard Biesheuvel wrote:
> 
> [...]
> 
> > > IIUC, you consider the general flow from architecture view, so you prefer
> > > to ask Xen to implement EFI stub to comply the general flow for EFI
> > > booting sequence, right?
> > >
> > > If the conclusion is to change Xen for support EFI stub, then this
> > > would be fine for me and I will hold on and leave Xen developers to work
> > > on it.
> > >
> > 
> > As I mentioned before, proper EFI boot support in Xen would be nice.
> > *However*, I don't think it makes sense to go through all the trouble
> > of implementing that just to shut up a warning that doesn't affect Xen
> > to begin with.
> 
> Another option is we can set a bit for xen feature, so Linux kernel
> can read out the xen feature and make decision if need to reserve
> memory for RD tables based on the new feature bit.  This is somehow
> a solution is to create a general protocol between Xen and Linux kernel.
> 
> How about you think for this?

Just supplement info.  I tried to set flag EFI_PARAVIRT in Linux
kernel, but kernel cannot boot up successfully on Arm64.  Seems
the Linux kernel will not map memory correctly after settting
this flag.

This is why I didn't move forward with this flag.

Thanks,
Leo



Re: [PATCH] xen/arm: acpi: Support memory reserve configuration table

2022-09-06 Thread Leo Yan
On Tue, Sep 06, 2022 at 09:22:00AM +0200, Ard Biesheuvel wrote:

[...]

> > IIUC, you consider the general flow from architecture view, so you prefer
> > to ask Xen to implement EFI stub to comply the general flow for EFI
> > booting sequence, right?
> >
> > If the conclusion is to change Xen for support EFI stub, then this
> > would be fine for me and I will hold on and leave Xen developers to work
> > on it.
> >
> 
> As I mentioned before, proper EFI boot support in Xen would be nice.
> *However*, I don't think it makes sense to go through all the trouble
> of implementing that just to shut up a warning that doesn't affect Xen
> to begin with.

Another option is we can set a bit for xen feature, so Linux kernel
can read out the xen feature and make decision if need to reserve
memory for RD tables based on the new feature bit.  This is somehow
a solution is to create a general protocol between Xen and Linux kernel.

How about you think for this?

Thanks,
Leo



Re: [PATCH] xen/arm: acpi: Support memory reserve configuration table

2022-09-06 Thread Leo Yan
Hi Marc,

On Tue, Sep 06, 2022 at 07:27:17AM +0100, Marc Zyngier wrote:
> On Tue, 06 Sep 2022 03:52:37 +0100,
> Leo Yan  wrote:
> > 
> > On Thu, Aug 25, 2022 at 10:40:41PM +0800, Leo Yan wrote:
> > 
> > [...]
> > 
> > > > > But here I still cannot create the concept that how GIC RD tables play
> > > > > roles to support the para virtualization or passthrough mode.
> > > >
> > > > I am not sure what you are actually asking. The pending tables are just
> > > > memory you give to the GICv3 to record the state of the interrupts.
> > >
> > > For more specific, Xen has its own RD pending table, and we can use
> > > this pending table to set state for SGI/PPI/LPI for a specific CPU
> > > interface.  Xen works as hypervisor, it saves and restores the pending
> > > table according to switched in VM context, right?
> > >
> > > On the other hand, what's the purpose for Linux kernel's GIC RD
> > > pending table?  Is it only used for nested virtulisation?  I mean if
> > > Linux kernel's GIC RD pending table is not used for the drivers in
> > > Dom0 or DomU, then it's useless to pass it from the primary kernel to
> > > secondary kernel; as result, we don't need to reserve the persistent
> > > memory for the pending table in this case.
> > 
> > I don't receive further confirmation from Marc, anyway, I tried to cook
> > a kernel patch to mute the kernel oops [1].
> 
> What sort of confirmation do you expect from me? None of what you
> write above make much sense in the face of the architecture.

Okay, I think have two questions for you:

- The first question is if we really need to reserve persistent memory
  for RD pending table and configuration table when Linux kernel runs
  in Xen domain?

- If the first question's answer is no, so it's not necessary to reserve
  RD pending table and configuration table for Xen, then what's the good
  way to dismiss the kernel oops?

IIUC, you consider the general flow from architecture view, so you prefer
to ask Xen to implement EFI stub to comply the general flow for EFI
booting sequence, right?

If the conclusion is to change Xen for support EFI stub, then this
would be fine for me and I will hold on and leave Xen developers to work
on it.

> > [1] 
> > https://lore.kernel.org/lkml/20220906024040.503764-1-leo@linaro.org/T/#u
> 
> I'm totally baffled by the fact you're trying to add some extra hacks
> to Linux just to paper over some of the Xen's own issues.

I have a last question for why kernel reserves RD pending table and
configuration table for kexec.  As we know, the primary kernel and
the secondary kernel use separate memory regions, this means there have
no race condition that secondary kernel modifies the tables whilist the
GIC accesses the table if the secondary kernel allocates new pages for
RD tables.  So only one potential issue I can image is the secondary
kernel sets new RD pending table and configuration table, which might
introduce inconsistent issue with rest RDs in the system.

Could you confirm if my understanding is correct or not?

Sorry for noise and many questions.  I understand this is a complex
and difficult topic for me, and it's very likely that I am absent
sufficient knowledge for this part, this is just what I want to
learn from the discussion and from you :-)

Thanks,
Leo



Re: [PATCH] xen/arm: acpi: Support memory reserve configuration table

2022-09-05 Thread Leo Yan
On Thu, Aug 25, 2022 at 10:40:41PM +0800, Leo Yan wrote:

[...]

> > > But here I still cannot create the concept that how GIC RD tables play
> > > roles to support the para virtualization or passthrough mode.
> >
> > I am not sure what you are actually asking. The pending tables are just
> > memory you give to the GICv3 to record the state of the interrupts.
>
> For more specific, Xen has its own RD pending table, and we can use
> this pending table to set state for SGI/PPI/LPI for a specific CPU
> interface.  Xen works as hypervisor, it saves and restores the pending
> table according to switched in VM context, right?
>
> On the other hand, what's the purpose for Linux kernel's GIC RD
> pending table?  Is it only used for nested virtulisation?  I mean if
> Linux kernel's GIC RD pending table is not used for the drivers in
> Dom0 or DomU, then it's useless to pass it from the primary kernel to
> secondary kernel; as result, we don't need to reserve the persistent
> memory for the pending table in this case.

I don't receive further confirmation from Marc, anyway, I tried to cook
a kernel patch to mute the kernel oops [1].

Hope this is not too arbitrary and we can move forward a bit.

Thanks,
Leo

[1] https://lore.kernel.org/lkml/20220906024040.503764-1-leo@linaro.org/T/#u



Re: [PATCH] xen/arm: acpi: Support memory reserve configuration table

2022-08-25 Thread Leo Yan
Hi Julien,

On Thu, Aug 25, 2022 at 01:59:06PM +0100, Julien Grall wrote:

[...]

> > Seems to me, to support para virtualization driver model (like virtio),
> > Dom0 needs to provide the device driver backend, and DomUs enables
> > the forend device drivers.  In this case, the most hardware interrupts
> > (SPIs) are routed to Dom0.
> 
> That's correct. Most of the shared interrupts will be routed to dom0.

Thanks for confirmation.

> > To support passthrough driver model (VFIO), Xen needs to configure the
> > hardware GICv3 to directly route hardware interrupt to the virtual CPU
> > interface.
> 
> Do you mean GICv4 rather than GICv3? In the latter, all the interrupts will
> be received in Xen and then routed to the domain by updating the LRs.

Thanks for clarification.

So GICv3 relies on hypervisor to set LR, and VM can use virtural
interface to response (ACK/EOI) the interrupt.  GICv4 can directly
forward the SPI to the CPU virtual interface (without hypervisor's
interfering).

> > But here I still cannot create the concept that how GIC RD tables play
> > roles to support the para virtualization or passthrough mode.
> 
> I am not sure what you are actually asking. The pending tables are just
> memory you give to the GICv3 to record the state of the interrupts.

For more specific, Xen has its own RD pending table, and we can use
this pending table to set state for SGI/PPI/LPI for a specific CPU
interface.  Xen works as hypervisor, it saves and restores the pending
table according to switched in VM context, right?

On the other hand, what's the purpose for Linux kernel's GIC RD
pending table?  Is it only used for nested virtulisation?  I mean if
Linux kernel's GIC RD pending table is not used for the drivers in
Dom0 or DomU, then it's useless to pass it from the primary kernel to
secondary kernel; as result, we don't need to reserve the persistent
memory for the pending table in this case.

Thanks,
Leo



Re: [PATCH] xen/arm: acpi: Support memory reserve configuration table

2022-08-25 Thread Leo Yan
On Thu, Aug 25, 2022 at 10:07:18AM +0100, Julien Grall wrote:

[...]

> > Xen directly passes ACPI MADT table from UEFI to Linux kernel to Dom0,
> > (see functions acpi_create_madt() and gic_make_hwdom_madt()), which
> > means the Linux kernel Dom0 uses the same ACPI table to initialize GICv3
> > driver, but since Linux kernel Dom0 accesses GIC memory region as IPA,
> > it still trap to Xen in EL2 for stage 2 translation, so finally Xen
> > can emulate the GICv3 device for Dom0.
> 
> In the default setup, dom0 is also the hardware domain. So it owns all of
> the devices but the ones used by Xen (e.g. interrupt controller, SMMU).
> 
> Therefore, dom0 will use the same memory layout as the host. At which point,
> it is a lot more convenient to re-use the host ACPI tables and rewrite only
> what's necessary.

We cannot purely talk about interrupt handling without connecting with
device driver model.

Seems to me, to support para virtualization driver model (like virtio),
Dom0 needs to provide the device driver backend, and DomUs enables
the forend device drivers.  In this case, the most hardware interrupts
(SPIs) are routed to Dom0.

To support passthrough driver model (VFIO), Xen needs to configure the
hardware GICv3 to directly route hardware interrupt to the virtual CPU
interface.

But here I still cannot create the concept that how GIC RD tables play
roles to support the para virtualization or passthrough mode.

Thanks,
Leo



Re: [PATCH] xen/arm: acpi: Support memory reserve configuration table

2022-08-25 Thread Leo Yan
Hi Julien,

On Thu, Aug 25, 2022 at 10:07:18AM +0100, Julien Grall wrote:

[...]

> > In other words, let's assume the Dom0 kernel panic and its secondary
> > kernel is launched by kexec, is it necessarily for the secondary
> > kernel to reuse the primary kernel's RD pending page?
> 
> No.

If the answer is no, then I think it's feasible to pass the same ACPI
table or DT binding for virtual GICv3 from primary kernel to secondary
kernel, then the second kernel can initialize the VGIC and allocate a
new RD tables (and trap to Xen in EL2 to handle the new allocated RD
tables).  How about you think for this?

Thanks a lot for quick response.

Leo



Re: [PATCH] xen/arm: acpi: Support memory reserve configuration table

2022-08-25 Thread Leo Yan
On Fri, Aug 19, 2022 at 01:10:10PM +0100, Marc Zyngier wrote:

[...]

> > > > In the context of Xen, dom0 doesn't have direct access to the host ITS
> > > > because we are emulating it. So I think it doesn't matter for us 
> > > > because we
> > > > can fix our implementation if it is affected.
> > > >
> > > > That said, kexec-ing dom0 (or any other domain) on Xen on Arm would 
> > > > require
> > > > some work to be supported. OOI, @leo is it something you are 
> > > > investigating?
> > >
> > >
> > > Now I am working on automative reference platform; the first thing for
> > > me is to resolve the kernel oops.
> > >
> > > For long term, I think the kexec/kdump would be important to be
> > > supported, to be clear, so far supporting kexec/kdump for Xen/Linux is
> > > not priority for my work.
> > >
> > > Also thanks a lot for Ard and Mark's replying. To be honest, I missed
> > > many prerequisites (e.g. redistributor configurations for GIC in
> > > hypervisor) and seems Xen uses a different way by emulating GICv3
> > > controller for guest OS.  So now I am bit puzzle what's for next step
> > > or just keep as it is?
> > >
> > 
> > If i understand Julien's remark correctly, the dom0 GICv3 is emulated,
> > and so it should not suffer from the issue that we are working around
> > here.

Before proceeding discussion, I would like step back to get clear for
the GIC implementation in Xen, otherwise, it's really hard for me to
catch up the dicussion :)

For me it's clear that Xen emulates GICv3 for DomU, but I am still
confused how GICv3 works for Dom0.

Xen directly passes ACPI MADT table from UEFI to Linux kernel to Dom0,
(see functions acpi_create_madt() and gic_make_hwdom_madt()), which
means the Linux kernel Dom0 uses the same ACPI table to initialize GICv3
driver, but since Linux kernel Dom0 accesses GIC memory region as IPA,
it still trap to Xen in EL2 for stage 2 translation, so finally Xen
can emulate the GICv3 device for Dom0.

This is quite different from DomU.  Xen prepares a DT node for GICv3
rather than directly passing ACPI table, so DomU kernel initializes
GICv3 driver based on the DT binding.

Simply to say, no matter Dom0 using ACPI table or DomU using DT
binding, at the end Xen emulates GICv3 device for all of them.

Another thing is not clear for me is that I can see Xen allocates
redistributor pending page (see gicv3_lpi_set_pendtable()), after Dom0
or DomU kernel boots up, kernel allocates another RD pending page; so
the question is how these two different pending pages co-work
together.

In other words, let's assume the Dom0 kernel panic and its secondary
kernel is launched by kexec, is it necessarily for the secondary
kernel to reuse the primary kernel's RD pending page?  Or in this case
it's no matter for the RD pending page in Dom0 and it's safe for Xen
always maintains its own RD pending page in EL2?

> The problem is that there is no way to distinguish a system that
> suffers from GICR LPI tables being immutable from one that allows the
> LPI configuration being changed (either because the HW allows it or
> because the hypervisor plays other games).

Let me ask a stupid question.  Seems to me, GICR LPI tables can be
configured as below options

- The hypervisor pre-allocates GICR LPI tables and pass the memory
  region info to Dom0 kernel;

- The hypervisor doesn't allocate GICR LPI tables, and then Dom0
  kernel allocates GICR LPI tables for the virtual GICv3, and Dom0
  directly write data to the GICR LPI tables and the table is used by
  physical GICv3;

- The hypervisor pre-allocates GICR LPI tables for phycial GICv3 and
  Dom0 kernel allocates another GICR LPI tables for virtual GICv3,
  and Xen needs to sync between these two tables.

To be clear, all of above three options are hypothesis.  So please
correct me if anything is wrong (or even total are wrong?!).

Thanks a lot for suggestions!

Leo

P.s. sorry for truncating Marc's following comments, just want to
focus on above questions.



Re: [PATCH] xen/arm: acpi: Support memory reserve configuration table

2022-08-18 Thread Leo Yan
On Thu, Aug 18, 2022 at 11:04:48AM +0100, Julien Grall wrote:

[...]

> > > Seems it's broken for kdump/kexec if kernel boots with using DT?
> > > 
> > 
> > EFI supports both DT and ACPI boot, but only ACPI *requires* EFI.
> > 
> > So DT boot on hardware with affected GICv3 implementations works fine
> > with kdump/kexec as long as EFI is being used. Using non-EFI boot and
> > kexec on such systems will likely result in a splat on the second
> > kernel, unless there is another mechanism being used to reserve the
> > memory in DT as well.
> > 
> > Maybe we should wire up the EFI_PARAVIRT flag for Xen dom0 on arm64,
> > so that we can filter out the call above. That would mean that
> > Xen/arm64/dom0/kexec on affected hardware would result in a splat in
> > the second kernel as well, but whether that matters depends on your
> > support scope.
>
> In the context of Xen, dom0 doesn't have direct access to the host ITS
> because we are emulating it. So I think it doesn't matter for us because we
> can fix our implementation if it is affected.
> 
> That said, kexec-ing dom0 (or any other domain) on Xen on Arm would require
> some work to be supported. OOI, @leo is it something you are investigating?


Now I am working on automative reference platform; the first thing for
me is to resolve the kernel oops.

For long term, I think the kexec/kdump would be important to be
supported, to be clear, so far supporting kexec/kdump for Xen/Linux is
not priority for my work.

Also thanks a lot for Ard and Mark's replying. To be honest, I missed
many prerequisites (e.g. redistributor configurations for GIC in
hypervisor) and seems Xen uses a different way by emulating GICv3
controller for guest OS.  So now I am bit puzzle what's for next step
or just keep as it is?

Thanks,
Leo



Re: [PATCH] xen/arm: acpi: Support memory reserve configuration table

2022-08-18 Thread Leo Yan
Hi Ard,

On Wed, Aug 17, 2022 at 03:49:32PM +0200, Ard Biesheuvel wrote:

[...]

> > This header holds ACPI spec defined data structures. This one looks
> > to be a Linux one, and hence shouldn't be defined here. You use it
> > in a single CU only, so I see no reason to define it there.
> >
> > Furthermore - what if Linux decided to change their structure? Or
> > is there a guarantee that they won't?
> 
> No, there is not. The memreserve table is an internal interface
> between the EFI stub loader and the Linux kernel proper.
> 
> As I have argued many times before, booting the arm64 kernel in
> EFI/ACPI mode without going through the EFI stub violates the ACPI
> spec, and relies on interfaces that were not intended for public
> consumption.

Let me ask a question but sorry it might be off topic.

In the Linux kernel function:

  static int gic_reserve_range(phys_addr_t addr, unsigned long size)
  {
  if (efi_enabled(EFI_CONFIG_TABLES))
  return efi_mem_reserve_persistent(addr, size);
  
  return 0;
  }

We can see it will reserve persistent memory region for GIC pending
pages, so my question is if a platform is booting with DT binding
rather than ACPI table, how the primary kernel can reserve the pages
and pass the info to the secondary kernel?

Seems it's broken for kdump/kexec if kernel boots with using DT?

Thanks,
Leo



Re: [PATCH] xen/arm: acpi: Support memory reserve configuration table

2022-08-18 Thread Leo Yan
Hi Jan,

On Thu, Aug 18, 2022 at 09:47:46AM +0200, Jan Beulich wrote:
> On 18.08.2022 09:34, Leo Yan wrote:
> > On Wed, Aug 17, 2022 at 03:17:53PM +0200, Jan Beulich wrote:
> >> Furthermore - what if Linux decided to change their structure? Or
> >> is there a guarantee that they won't? Generally such structures
> >> belong in the public interface, guaranteeing forward compatibility
> >> even if Linux decided to change / extend theirs (at which point
> >> consuming code there would need to do translation, but maybe using
> >> a Xen-defined struct [plus translation in Linux] right away would
> >> be best).
> > 
> > I saw Ard has helped to answer this question in his email.  As Ard
> > said, the general way is to rely on Linux EFI stub to allocate the
> > data structure for MEMRESERVE configuration table.
> > 
> > Given Xen uses pseudo EFI booting (the ACPI table is passed via DT), in
> > short term I don't think Xen can support Linux EFI stub, so we need to
> > maintain this structure in Xen as well.
> > 
> > This structure eventually will not change frequently, so I assume
> > later we will have little effort for maintainence it.
> 
> "Will not change frequently" isn't enough. Imo there needs to be a
> public interface structure in Xen and translation code in Linux.
> That's the only way the consuming code in Linux can remain (largely)
> independent of changes to the structure in Linux (i.e. changes there
> can be expected to be accompanied by updating of this code, perhaps
> simply in order to keep things building).

Yes, actually Xen doesn't care about the structure definition for
linux_efi_memreserve, it just allocates the table and finally used by
Linux kernel.  So another way is we can simply allocate a bigger
memory region (e.g. 64 bytes), which is sufficient than kernel's
structure linux_efi_memreserve size (only 16 bytes), then we can
initilize it as all zeros, and this can be helpful if later kernel
extend the data structure.

But this method is a bit arbitrary, this is why I did't write like this.
As Julien said, I think the critical thing is to make a call to support
EFI stub or not.  If rollback to use current method, then I am happy to
refine the patch with above idea.

Thanks,
Leo



Re: [PATCH] xen/arm: acpi: Support memory reserve configuration table

2022-08-18 Thread Leo Yan
Hi Julien,

On Thu, Aug 18, 2022 at 08:57:20AM +0100, Julien Grall wrote:
> Hi Leo,
> 
> On 18/08/2022 08:34, Leo Yan wrote:
> > On Wed, Aug 17, 2022 at 03:17:53PM +0200, Jan Beulich wrote:
> > > Furthermore - what if Linux decided to change their structure? Or
> > > is there a guarantee that they won't? Generally such structures
> > > belong in the public interface, guaranteeing forward compatibility
> > > even if Linux decided to change / extend theirs (at which point
> > > consuming code there would need to do translation, but maybe using
> > > a Xen-defined struct [plus translation in Linux] right away would
> > > be best).
> > 
> > I saw Ard has helped to answer this question in his email.  As Ard
> > said, the general way is to rely on Linux EFI stub to allocate the
> > data structure for MEMRESERVE configuration table.
> > 
> > Given Xen uses pseudo EFI booting (the ACPI table is passed via DT), in
> > short term I don't think Xen can support Linux EFI stub,
> 
> I agree that using Linux EFI stub will require more effort. I would also
> need to go through the mailing list archives (or maybe Stefano remember?) to
> understand why we decided against using the EFI stub.

Yeah, seems to me using the EFI stub is the right thing to do.

I read Xen code and found Xen doesn't provide boot time callback, this
is the main reason blocked me to move forward to that way.

> > so we need to
> > maintain this structure in Xen as well.
> 
> I have looked at the commit message. IIUC, if this table is not created then
> Kexec will not work. Is there anything else that would not work?

No, AFAIK, kexec/kdump is the only customer to use the persistent
reserved memory in the Linux kernel.

I personally think this will pontentially impact other kernel
debugging modules, like ramoops, it also needs to use persistent
reserved memory when rebooting the kernel.  But now kernel code
doesn't add ramoops memory region into EFI MEMRESERVE table.

> IOW, would Linux be unusable if we don't create MEMRESERVE?

If we don't create EFI MEMRESERVE, kernel still can boot up
successfully, though it cannot add pages into the reserved memory
table and reports oops.

Without EFI MEMRESERVE, the reserved memory table cannot be passed from
the primary kernel to the secondary kernel, this is why it's important
for debugging tools.

> > This structure eventually will not change frequently, so I assume
> > later we will have little effort for maintainence it
>
> The problem is not really "maintainance" here. It is more that if Linux is
> updating the structure in a non-backward compatible way, then new
> version of Linux would not boot on older Xen.
> 
> We would have similar probable with new Xen booting older Xen because the
> hypervisor doesn't know (and should not need to know) which OS is used.

Fair point.

> In the nutshells, Xen should only use stable interface. If MEMRESERVE is
> really necessary then we should either switch to the Linux EFI stub or
> standardize MEMRESERVE.

Yeah, here I really need your (and other maintainers) opinions.  Seems
to me, support Linux EFI stub would be the best thing to do, to be
honest, this is a task which is out of my capability :)

Current approach in this patch is not perfect, it lets things to work.
So please let me know the conclusion from your side.

Thanks,
Leo



Re: [PATCH] xen/arm: acpi: Support memory reserve configuration table

2022-08-18 Thread Leo Yan
On Wed, Aug 17, 2022 at 03:17:53PM +0200, Jan Beulich wrote:

[...]

> Please make sure you Cc all maintainers of all files that you touch.
> Albeit, see below, you could indeed have avoided Cc-ing me if you
> hadn't misplaced stuff in two of the headers that you fiddle with.

Sorry for this.  When I send a new patch in next time, I will use
./scripts/get_maintainer.pl to find out the maintainers.

> > --- a/xen/arch/arm/efi/efi-dom0.c
> > +++ b/xen/arch/arm/efi/efi-dom0.c
> > @@ -38,7 +38,7 @@ size_t __init estimate_efi_size(unsigned int mem_nr_banks)
> >  {
> >  size_t size;
> >  size_t est_size = sizeof(EFI_SYSTEM_TABLE);
> > -size_t ect_size = sizeof(EFI_CONFIGURATION_TABLE);
> > +size_t ect_size = sizeof(EFI_CONFIGURATION_TABLE) * 2;
> >  size_t emd_size = sizeof(EFI_MEMORY_DESCRIPTOR);
> >  size_t fw_vendor_size = sizeof(xen_efi_fw_vendor);
> >  unsigned int acpi_mem_nr_banks = 0;
> > @@ -63,7 +63,8 @@ void __init acpi_create_efi_system_table(struct domain *d,
> >  
> >  table_addr = d->arch.efi_acpi_gpa
> >   + acpi_get_table_offset(tbl_add, TBL_EFIT);
> > -table_size = sizeof(EFI_SYSTEM_TABLE) + sizeof(EFI_CONFIGURATION_TABLE)
> > +table_size = sizeof(EFI_SYSTEM_TABLE)
> > ++ sizeof(EFI_CONFIGURATION_TABLE) * 2
> >   + sizeof(xen_efi_fw_vendor);
> 
> Nit: Indentation.

Will fix.

> > @@ -75,7 +76,7 @@ void __init acpi_create_efi_system_table(struct domain *d,
> >  efi_sys_tbl->Hdr.HeaderSize = table_size;
> >  
> >  efi_sys_tbl->FirmwareRevision = 1;
> > -efi_sys_tbl->NumberOfTableEntries = 1;
> > +efi_sys_tbl->NumberOfTableEntries = 2;
> 
> This is the 3rd magic "2" - I think there wants to be some consolidation,
> such that it becomes obvious which "2"-s really are the same (and would
> change together if, like you do here, another entry is needed).

I will define a macro for the number of configuration table and add
comment for it.

> > @@ -86,6 +87,18 @@ void __init acpi_create_efi_system_table(struct domain 
> > *d,
> >  efi_conf_tbl->VendorTable = (VOID *)tbl_add[TBL_RSDP].start;
> >  efi_sys_tbl->ConfigurationTable = (EFI_CONFIGURATION_TABLE 
> > *)(table_addr
> >+ 
> > offset);
> > +
> > +/*
> > + * Configuration table for MEMRESERVE is used in Linux kernel for
> > + * reserving pages, its main purpose is used for kexec/kdump to
> > + * reserve persistent pages (e.g. GIC pending pages) for the secondary
> > + * kernel.
> > + */
> > +offset += sizeof(EFI_CONFIGURATION_TABLE);
> > +efi_conf_tbl = (EFI_CONFIGURATION_TABLE *)(base_ptr + offset);
> > +efi_conf_tbl->VendorGuid = (EFI_GUID)LINUX_EFI_MEMRESERVE_TABLE_GUID;
> > +efi_conf_tbl->VendorTable = (VOID *)tbl_add[TBL_MRSV].start;
> > +
> >  xz_crc32_init();
> >  efi_sys_tbl->Hdr.CRC32 = xz_crc32((uint8_t *)efi_sys_tbl,
> >efi_sys_tbl->Hdr.HeaderSize, 0);
> 
> Rather than adjusting offset and calculating efi_conf_table fdrom scratch,
> perhaps better simply efi_conf_table++? That way there would be one less
> cast, which are always somewhat risky.

Yeah, using "efi_conf_table++" is much better.  Will do.

> > --- a/xen/include/acpi/actbl.h
> > +++ b/xen/include/acpi/actbl.h
> > @@ -302,6 +302,23 @@ struct acpi_table_fadt {
> >  #define ACPI_FADT_HW_REDUCED(1<<20)/* 20: [V5] ACPI 
> > hardware is not implemented (ACPI 5.0) */
> >  #define ACPI_FADT_LOW_POWER_S0  (1<<21)/* 21: [V5] S0 power 
> > savings are equal or better than S3 (ACPI 5.0) */
> >  
> > +/***
> > + *
> > + * MEMRESERVE - Dummy entry for memory reserve configuration table
> > + *
> > + 
> > **/
> > +
> > +struct acpi_table_memreserve {
> > +   int size;   /* allocated size of the array */
> > +   int count;  /* number of entries used */
> > +   u64 next;   /* pa of next struct instance */
> > +   struct {
> > +   u64 base;
> > +   u64 size;
> > +   } entry[];
> > +};
> 
> This header holds ACPI spec defined data structures. This one looks
> to be a Linux one, and hence shouldn't be defined here. You use it
> in a single CU only, so I see no reason to define it there.

Okay, I will define the structure in the arm specific file, e.g. I
move it to the file xen/arch/arm/acpi/domain_build.c.

> Furthermore - what if Linux decided to change their structure? Or
> is there a guarantee that they won't? Generally such structures
> belong in the public interface, guaranteeing forward compatibility
> even if Linux decided to change / extend theirs (at which point
> consuming code there would need to do translation, but maybe using
> a Xen-defined struct [plus translation in Linux] right away would
> be

[PATCH] xen/arm: acpi: Support memory reserve configuration table

2022-08-17 Thread Leo Yan
On Arm64, when boot Dom0 with the Linux kernel, it reports the warning:

[0.403737] [ cut here ]
[0.403738] WARNING: CPU: 30 PID: 0 at drivers/irqchip/irq-gic-v3-its.c:3074 
its_cpu_init+0x814/0xae0
[0.403745] Modules linked in:
[0.403748] CPU: 30 PID: 0 Comm: swapper/30 Tainted: GW 
5.15.23-ampere-lts-standard #1
[0.403752] pstate: 61c5 (nZCv dAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[0.403755] pc : its_cpu_init+0x814/0xae0
[0.403758] lr : its_cpu_init+0x810/0xae0
[0.403761] sp : 89c03ce0
[0.403762] x29: 89c03ce0 x28: 001e x27: 880711f43000
[0.403767] x26: 8a3c0070 x25: fc1ffe0a4400 x24: 8a3c
[0.403770] x23: 895bc998 x22: 890a6000 x21: 89850cb0
[0.403774] x20: 89701a10 x19: 89701000 x18: 
[0.403777] x17: 3030303035303031 x16: 3030313030303078 x15: 303a30206e6f6967
[0.403780] x14: 6572206530312072 x13: 3030303030353030 x12: 3130303130303030
[0.403784] x11: 78303a30206e6f69 x10: 6765722065303120 x9 : 8870e710
[0.403788] x8 : 6964657220646e75 x7 : 0003 x6 : 
[0.403791] x5 :  x4 : fc00 x3 : 0010
[0.403794] x2 :  x1 : 0001 x0 : ffed
[0.403798] Call trace:
[0.403799]  its_cpu_init+0x814/0xae0
[0.403802]  gic_starting_cpu+0x48/0x90
[0.403805]  cpuhp_invoke_callback+0x16c/0x5b0
[0.403808]  cpuhp_invoke_callback_range+0x78/0xf0
[0.403811]  notify_cpu_starting+0xbc/0xdc
[0.403814]  secondary_start_kernel+0xe0/0x170
[0.403817]  __secondary_switched+0x94/0x98
[0.403821] ---[ end trace f68728a0d3053b70 ]---

In the Linux kernel, the GIC driver tries to reserve ITS interrupt
table, and the reserved pages can survive for kexec/kdump and will be
used for secondary kernel.  Linux kernel relies on MEMRESERVE EFI
configuration table for memory page , but this table is not supported
by Xen.

This patch adds a MEMRESERVE configuration table into the system table,
it points to a dummy data structure acpi_table_memreserve, this
structure has the consistent definition with the Linux kernel.
Following the method in Xen, it creates a table entry for the structure
acpi_table_memreserve.

Cc: Ard Biesheuvel 
Cc: Marc Zyngier 
Cc: Bertrand Marquis 
Cc: Rahul Singh 
Cc: Peter Griffin 
Signed-off-by: Leo Yan 
---
 xen/arch/arm/acpi/domain_build.c | 24 
 xen/arch/arm/efi/efi-dom0.c  | 19 ---
 xen/arch/arm/include/asm/acpi.h  |  1 +
 xen/include/acpi/actbl.h | 17 +
 xen/include/efi/efiapi.h |  2 ++
 5 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/acpi/domain_build.c b/xen/arch/arm/acpi/domain_build.c
index bbdc90f92c..f7d1935f60 100644
--- a/xen/arch/arm/acpi/domain_build.c
+++ b/xen/arch/arm/acpi/domain_build.c
@@ -242,6 +242,26 @@ static int __init acpi_create_rsdp(struct domain *d, 
struct membank tbl_add[])
 return 0;
 }
 
+static int __init acpi_create_memreserve(struct domain *d, struct membank 
tbl_add[])
+{
+struct acpi_table_memreserve *memreserve = NULL;
+u64 table_size = sizeof(struct acpi_table_memreserve);
+u8 *base_ptr;
+
+base_ptr = d->arch.efi_acpi_table
+   + acpi_get_table_offset(tbl_add, TBL_MRSV);
+memreserve = (struct acpi_table_memreserve *)base_ptr;
+
+memreserve->next = 0;
+memreserve->size = 0;
+memreserve->count = 0;
+
+tbl_add[TBL_MRSV].start = d->arch.efi_acpi_gpa
++ acpi_get_table_offset(tbl_add, TBL_MRSV);
+tbl_add[TBL_MRSV].size = table_size;
+return 0;
+}
+
 static void __init acpi_xsdt_modify_entry(u64 entry[],
   unsigned long entry_count,
   char *signature, u64 addr)
@@ -543,6 +563,10 @@ int __init prepare_acpi(struct domain *d, struct 
kernel_info *kinfo)
 if ( rc != 0 )
 return rc;
 
+rc = acpi_create_memreserve(d, tbl_add);
+if ( rc != 0 )
+return rc;
+
 acpi_map_other_tables(d);
 acpi_create_efi_system_table(d, tbl_add);
 acpi_create_efi_mmap_table(d, &kinfo->mem, tbl_add);
diff --git a/xen/arch/arm/efi/efi-dom0.c b/xen/arch/arm/efi/efi-dom0.c
index aae0f97911..4950f9ac99 100644
--- a/xen/arch/arm/efi/efi-dom0.c
+++ b/xen/arch/arm/efi/efi-dom0.c
@@ -38,7 +38,7 @@ size_t __init estimate_efi_size(unsigned int mem_nr_banks)
 {
 size_t size;
 size_t est_size = sizeof(EFI_SYSTEM_TABLE);
-size_t ect_size = sizeof(EFI_CONFIGURATION_TABLE);
+size_t ect_size = sizeof(EFI_CONFIGURATION_TABLE) * 2;
 size_t emd_size = sizeof(EFI_MEMORY_DESCRIPTOR);
 size_t fw_vendor_size = sizeof(xen_efi_fw_vendor);
 unsigned int acpi_mem_nr_banks = 0;
@@ -63,7 +63,8 @@ void __init acpi_create_e

Re: Question: Enable LINUX_EFI_MEMRESERVE_TABLE_GUID in EFI

2022-08-11 Thread Leo Yan
Hi Bertrand, Rahul,

On Fri, Aug 05, 2022 at 12:05:23PM +, Bertrand Marquis wrote:
> > On 5 Aug 2022, at 12:44, Rahul Singh  wrote:

[...]

> >> Looked into the code, the GICv3 driver tries to create persistent
> >> reservations for pending pages, and the persistent reservation table
> >> can be used by kexec/kdump.  For the persistent reservations, it
> >> relies on MEMRESERVE EFI configuration table, but this table is not
> >> supported by xen.efi, I think this is the reason for the above oops.
> > 
> > Yes, you are right above warning is observed because Xen does not support 
> > memreserve efi table. I also observed a similar warning on the N1SDP board.
> >> 
> >> I checked that if I boot a host Linux (without Xen), then the EFI has
> >> provided MEMRESERVE configuration table, I can get below log:
> >> 
> >> #  dmesg | grep MEMRESERVE
> >> [0.00] efi: TPMFinalLog=0x807f9ef ACPI 2.0=0x807fa0d0018 ... 
> >> MEMRESERVE=0x807f8141e98
> >> 
> >> Just want to confirm, is anyone working on enabling MEMRESERVE EFI
> >> configuration table for Xen?  And welcome any comments and
> >> suggestions!
> >> 
> 
> No I do not think anybody is working on this at the moment.
> If you want to work on adding support for this in Xen, we can provide support
> and help on reviewing and testing as we have several targets on which we
> observe this (N1SDP and Ava).

Thanks for your quick response.

I took time to look into the code, below are my conclusions.

For a normal UEFI boot flow, UEFI will invoke Linux kernel's EFI stub,
and the EFI stub will install MEMRESERVE EFI configuration table.
This is accomplished in the Linux function install_memreserve_table().

Secondly, Xen passes DT to kernel, it synthesizes ACPI compatible
nodes in the device tree and finally kernel parses DT and create the
ACPI table.  In this case, Xen doesn't invoke Linux EFI stub.

To be honest, I have very less knowledge for Xen and APCI; just based on
reading code, I think it's hard for Xen to invoke Linux kernel's EFI
stub, this is because Xen needs to provide the EFI runtime services, and
I don't think it's feasible for Xen to pass through UEFI's runtime
service to Linux kernel.  If we implement the EFI runtime services for
Xen, then this would introduce a big implementation.

So another option is we simply add MEMRESERVE EFI configuration table
into device tree, just like Xen does for other ACPI tables (e.g.
RSDP?).  And then in Linux kernel, we need to parse the DT binding and
initialize the corresponding variables in kernel, so we need to add
support in the Linux source drivers/firmware/efi/fdtparams.c.

Before I proceed, just want to check which option would be the right
way to move forward?  And I am open for any other solution and welcome
suggestions.

Thanks a lot!
Leo



Question: Enable LINUX_EFI_MEMRESERVE_TABLE_GUID in EFI

2022-08-04 Thread Leo Yan
Hi there,

Now I am working on Ampere Altra SoC platform, with Xen (4.16) and Linux
kernel (5.15.23).

I observed a warning is reported by Linux kernel in the booting flow:

[0.403737] [ cut here ]
[0.403738] WARNING: CPU: 30 PID: 0 at drivers/irqchip/irq-gic-v3-its.c:3074 
its_cpu_init+0x814/0xae0
[0.403745] Modules linked in:
[0.403748] CPU: 30 PID: 0 Comm: swapper/30 Tainted: GW 
5.15.23-ampere-lts-standard #1
[0.403752] pstate: 61c5 (nZCv dAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[0.403755] pc : its_cpu_init+0x814/0xae0
[0.403758] lr : its_cpu_init+0x810/0xae0
[0.403761] sp : 89c03ce0
[0.403762] x29: 89c03ce0 x28: 001e x27: 880711f43000
[0.403767] x26: 8a3c0070 x25: fc1ffe0a4400 x24: 8a3c
[0.403770] x23: 895bc998 x22: 890a6000 x21: 89850cb0
[0.403774] x20: 89701a10 x19: 89701000 x18: 
[0.403777] x17: 3030303035303031 x16: 3030313030303078 x15: 303a30206e6f6967
[0.403780] x14: 6572206530312072 x13: 3030303030353030 x12: 3130303130303030
[0.403784] x11: 78303a30206e6f69 x10: 6765722065303120 x9 : 8870e710
[0.403788] x8 : 6964657220646e75 x7 : 0003 x6 : 
[0.403791] x5 :  x4 : fc00 x3 : 0010
[0.403794] x2 :  x1 : 0001 x0 : ffed
[0.403798] Call trace:
[0.403799]  its_cpu_init+0x814/0xae0
[0.403802]  gic_starting_cpu+0x48/0x90
[0.403805]  cpuhp_invoke_callback+0x16c/0x5b0
[0.403808]  cpuhp_invoke_callback_range+0x78/0xf0
[0.403811]  notify_cpu_starting+0xbc/0xdc
[0.403814]  secondary_start_kernel+0xe0/0x170
[0.403817]  __secondary_switched+0x94/0x98
[0.403821] ---[ end trace f68728a0d3053b70 ]---

Looked into the code, the GICv3 driver tries to create persistent
reservations for pending pages, and the persistent reservation table
can be used by kexec/kdump.  For the persistent reservations, it
relies on MEMRESERVE EFI configuration table, but this table is not
supported by xen.efi, I think this is the reason for the above oops.

I checked that if I boot a host Linux (without Xen), then the EFI has
provided MEMRESERVE configuration table, I can get below log:

  #  dmesg | grep MEMRESERVE
  [0.00] efi: TPMFinalLog=0x807f9ef ACPI 2.0=0x807fa0d0018 ... 
MEMRESERVE=0x807f8141e98

Just want to confirm, is anyone working on enabling MEMRESERVE EFI
configuration table for Xen?  And welcome any comments and
suggestions!

Thanks,
Leo



Re: [Xen-devel] Hikey: Enable Xen + Mainline Linux Kernel

2019-01-16 Thread Leo Yan
Hi Julien,

On Wed, Jan 16, 2019 at 08:13:29PM +, Julien Grall wrote:

[...]

> > So xen_get_dma_ops() will try to retrieve pointer from
> > 'dev->archdata.dev_dma_ops' but because it's NULL so at the end
> > introduces kernel panic will NULL pointer.
> >
> > Seems to me, we should check two pointers in dma_is_direct(), one
> > is 'dev->dma_ops' and another is 'dev->archdata.dev_dma_ops', should
> > both of them are not NULL pointers then we can run into xen_alloc_xxx
> > related function, otherwise it should fallback to use
> > dma_direct_alloc() to allocate dma pages?
> >
> > Also very welcome if you could work on formal fixing and at my side
> > I am glad to test it!
> 
> I actually reported a very similar bug on linux-iommu today [1]. It happens
> to be an issue with the recent change in the DMA API. The IOMMU
> maintainer suggested a patch that should fix both of our issues.
> I haven't yet tried the patch [2]. Would you mind to give it go?

Thanks for the info.

Tested with patch [2] at my side and it does fix my issue :)

Thanks,
Leo Yan

> [1] https://lists.xen.org/archives/html/xen-devel/2019-01/msg01351.html
> [2] https://lists.xen.org/archives/html/xen-devel/2019-01/msg01358.html

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Hikey: Enable Xen + Mainline Linux Kernel

2019-01-15 Thread Leo Yan
Hi all,

On Tue, Jan 15, 2019 at 10:49:58AM +0800, Leo Yan wrote:

[...]

> [1.807591] Modules linked in:
> [1.810717] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 
> 5.0.0-rc2-1-g5b47dea3757c #3
> [1.818691] Hardware name: HiKey Development Board (DT)
> [1.823983] pstate: 4005 (nZcv daif -PAN -UAO)
> [1.828848] pc : xen_swiotlb_alloc_coherent+0x64/0x1e8
> [1.834044] lr : dma_alloc_attrs+0xf4/0x110
> [1.838289] sp : 10073a50
> [1.841671] x29: 10073a50 x28: 0007 
> [1.847047] x27: 11150068 x26: 80001b6ddd60 
> [1.852429] x25: 10caaa70 x24:  
> [1.857800] x23: 1000 x22: 1000 
> [1.863177] x21:  x20: 80001c2edc10 
> [1.868553] x19: 111fd000 x18:  
> [1.873930] x17:  x16:  
> [1.879306] x15: 111fd6c8 x14: 900737b7 
> [1.884683] x13: 100737c5 x12: 11215000 
> [1.890060] x11: 05f5e0ff x10: 111fd940 
> [1.895436] x9 :  x8 : 80001bb0e700 
> [1.900813] x7 :  x6 :  
> [1.906189] x5 : 105bdbb8 x4 :  
> [1.911566] x3 : 006000c0 x2 : 80001b6ddd60 
> [1.916943] x1 : 1000 x0 :  
> [1.922326] Process swapper/0 (pid: 1, stack limit = 0x(ptrval))
> [1.929084] Call trace:
> [1.931602]  xen_swiotlb_alloc_coherent+0x64/0x1e8
> [1.936456]  dma_alloc_attrs+0xf4/0x110
> [1.940359]  dmam_alloc_attrs+0x64/0xb8
> [1.944264]  dw_mci_probe+0x5f8/0xb00
> [1.947990]  dw_mci_pltfm_register+0xa0/0xd0
> [1.952327]  dw_mci_k3_probe+0x2c/0x38

Some update for this issue after dig a bit for related code; with
below simple hacking, the kernel can boot successfully to rootFS:

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index f6ded992c183..31c7e17f0fe5 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -196,7 +196,8 @@ static inline int dma_mmap_from_global_coherent(struct 
vm_area_struct *vma,

 static inline bool dma_is_direct(const struct dma_map_ops *ops)
 {
-   return likely(!ops);
+   return true;
+   //return likely(!ops);
 }

Though this minor code tweaking can workaround the kernel panic, but
it's not a formal fixing; if we look into the kernel code, we can see
firstly the kernel will initialize dma operation pointer in
arch/arm64/mm/dma-mapping.c, arch_setup_dma_ops():

void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
const struct iommu_ops *iommu, bool coherent)
{
dev->dma_coherent = coherent;
__iommu_setup_dma_ops(dev, dma_base, size, iommu);

#ifdef CONFIG_XEN
if (xen_initial_domain()) {
dev->archdata.dev_dma_ops = dev->dma_ops;   // since 
'dev->dma_ops' is NULL,
// so 
dev->archdata.dev_dma_ops
// will be 
initialized as NULL
dev->dma_ops = xen_dma_ops;
}
#endif
}

So we can see 'dev->archdata.dev_dma_ops' will be NULL and
'dev->dma_ops' is assigned to xen_dma_ops;

In dw mmc driver init function, it will run into below flow:

  dw_mci_init_dma()
`> dmam_alloc_coherent()
 `-> dmam_alloc_attrs()
   `-> dma_alloc_attrs()
 `-> xen_dma_ops::alloc()
   `-> xen_swiotlb_alloc_coherent()
 `-> xen_alloc_coherent_pages()
   `-> xen_get_dma_ops()

So xen_get_dma_ops() will try to retrieve pointer from
'dev->archdata.dev_dma_ops' but because it's NULL so at the end
introduces kernel panic will NULL pointer.

Seems to me, we should check two pointers in dma_is_direct(), one
is 'dev->dma_ops' and another is 'dev->archdata.dev_dma_ops', should
both of them are not NULL pointers then we can run into xen_alloc_xxx
related function, otherwise it should fallback to use
dma_direct_alloc() to allocate dma pages?

Also very welcome if you could work on formal fixing and at my side
I am glad to test it!

Thanks,
Leo Yan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] Hikey: Enable Xen + Mainline Linux Kernel

2019-01-14 Thread Leo Yan
Hi all,

Sorry I bring up this question at here if it's duplicate due I googled
and found on the mailing list there have several related discussion for
enabling Xen on Hikey (I use Hikey but not Hikey960); but what I saw it
seems a new issue.

Below is the configuration at my side:

- Linux kernel: mainline kernel with 5.0-rc2;

  make defconfig
  make -j16 Image dtbs

- Xen: latest code base

  git clone git://xenbits.xen.org/xen.git
  cd xen
  make dist-xen XEN_TARGET_ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-

- Prepare my boot image:

  $ mount -o loop,rw,sync boot-linux.uefi.img boot-fat
  $ cp Image hi6220-hikey.dtb boot-fat/

  # prepare xen.cfg and startup.nsh

  For xen.cfg, it have below content:
  options=dom0_mem=512M dom0_max_vcpus=8 conswitch=x console=dtuart
  dtuart=/smb/uart@f7113000
  kernel=Image console=hvc root=/dev/mmcblk0p9 rootwait rw 3 mem=256M
  dtb=hi6220-hikey.dtb

  For startup.nsh, it have below content:
  FS0:
  Image console=ttyAMA3,115200 root=/dev/mmcblk0p9 rootwait rw efi=noruntime 
earlycon=pl011,0xf7113000

  $ umount boot-fat

- Flash the boot image into 'boot' partition and UEFI will find it by
  default:

  $ fastboot flash boot boot-linux.uefi.img

- After run into UEFI shell, after execute 'xen' command I can see
  below kernel panic; it's related with mmc driver initialization and
  it tries to use xen_swiotlb_alloc_coherent() for DMA memory
  allocation but failed.

  I tried to reduce the memory size in xen.cfg (options=dom0_mem=512M)
  for Xen and Linux kernel (mem=256M) but it doesn't work.

  Also enclosed the complete booting log.

  Very appreciate if you could give some suggestions for this.

Thanks,
Leo Yan

[1.807591] Modules linked in:
[1.810717] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 
5.0.0-rc2-1-g5b47dea3757c #3
[1.818691] Hardware name: HiKey Development Board (DT)
[1.823983] pstate: 4005 (nZcv daif -PAN -UAO)
[1.828848] pc : xen_swiotlb_alloc_coherent+0x64/0x1e8
[1.834044] lr : dma_alloc_attrs+0xf4/0x110
[1.838289] sp : 10073a50
[1.841671] x29: 10073a50 x28: 0007 
[1.847047] x27: 11150068 x26: 80001b6ddd60 
[1.852429] x25: 10caaa70 x24:  
[1.857800] x23: 1000 x22: 1000 
[1.863177] x21:  x20: 80001c2edc10 
[1.868553] x19: 111fd000 x18:  
[1.873930] x17:  x16:  
[1.879306] x15: 111fd6c8 x14: 900737b7 
[1.884683] x13: 100737c5 x12: 11215000 
[1.890060] x11: 05f5e0ff x10: 111fd940 
[1.895436] x9 :  x8 : 80001bb0e700 
[1.900813] x7 :  x6 :  
[1.906189] x5 : 105bdbb8 x4 :  
[1.911566] x3 : 006000c0 x2 : 80001b6ddd60 
[1.916943] x1 : 1000 x0 :  
[1.922326] Process swapper/0 (pid: 1, stack limit = 0x(ptrval))
[1.929084] Call trace:
[1.931602]  xen_swiotlb_alloc_coherent+0x64/0x1e8
[1.936456]  dma_alloc_attrs+0xf4/0x110
[1.940359]  dmam_alloc_attrs+0x64/0xb8
[1.944264]  dw_mci_probe+0x5f8/0xb00
[1.947990]  dw_mci_pltfm_register+0xa0/0xd0
[1.952327]  dw_mci_k3_probe+0x2c/0x38
[1.956145]  platform_drv_probe+0x50/0xa0
[1.960218]  really_probe+0x1f0/0x298
[1.963947]  driver_probe_device+0x58/0x100
[1.968196]  __driver_attach+0xd4/0xd8
[1.972017]  bus_for_each_dev+0x74/0xc8
[1.975914]  driver_attach+0x20/0x28
[1.979556]  bus_add_driver+0x1ac/0x218
[1.983458]  driver_register+0x60/0x110
[1.987361]  __platform_driver_register+0x40/0x48
[1.992138]  dw_mci_k3_pltfm_driver_init+0x18/0x20
[1.996988]  do_one_initcall+0x5c/0x178
[2.000891]  kernel_init_freeable+0x198/0x240
[2.005314]  kernel_init+0x10/0x108
[2.008867]  ret_from_fork+0x10/0x18
[2.012516] Code: f9414280 aa1503e4 aa1a03e2 aa1703e1 (f945) 
[2.018680] ---[ end trace baf3ac16eabafa1f ]---
[2.023394] Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x000b
[2.031068] SMP: stopping secondary CPUs
[2.035101] Kernel Offset: disabled
[2.038611] CPU features: 0x002,24002004
[2.042604] Memory Limit: 256 MB
[2.045898] ---[ end Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x000b ]---
xen
Loading driver at 0x0003783D000 EntryPoint=0x000378E1398
Loading driver at 0x0003783D000 EntryPoint=0x000378E1398 
3hXen 4.12-unstable (c/s Mon Dec 17 09:22:59 2018 + git:a5b0eb3636) EFI 
loader

Using configuration file 'xen.cfg'
hi6220-hikey.dtb: 0x37b1-0x37b195c5
Image: 0x3653b000-0x3783ca00
 Xen 4.12-unstable
(XEN) Xen version 4.12-unstable (leoy@) (aarch64-linux-gnu-gcc (Linaro GCC 
6.2-2016.11) 6.2.1 20161016) debug=y  Mon Jan 14 11:16:40 CST 2019
(XEN) Lat