Hello Oleksandr,
On 12/3/25 12:03 PM, Oleksandr Tyshchenko wrote:
On 02.12.25 23:33, Grygorii Strashko wrote:
On 02.12.25 21:32, Oleksandr Tyshchenko wrote:
Creating a guest with a high vCPU count (e.g., >32) fails because
the guest's device tree buffer (DOMU_DTB_SIZE) overflows during creation.
The FDT nodes for each vCPU quickly exhaust the 4KiB buffer,
causing a guest creation failure.
Increase the buffer size to 16KiB to support guests up to
the MAX_VIRT_CPUS limit (128).
Signed-off-by: Oleksandr Tyshchenko <[email protected]>
---
Noticed when testing the boundary conditions for dom0less guest
creation on Arm64.
Domain configuration:
fdt mknod /chosen domU0
fdt set /chosen/domU0 compatible "xen,domain"
fdt set /chosen/domU0 \#address-cells <0x2>
fdt set /chosen/domU0 \#size-cells <0x2>
fdt set /chosen/domU0 memory <0x0 0x10000 >
fdt set /chosen/domU0 cpus <33>
fdt set /chosen/domU0 vpl011
fdt mknod /chosen/domU0 module@40400000
fdt set /chosen/domU0/module@40400000 compatible "multiboot,kernel"
"multiboot,module"
fdt set /chosen/domU0/module@40400000 reg <0x0 0x40400000 0x0 0x16000 >
fdt set /chosen/domU0/module@40400000 bootargs "console=ttyAMA0"
Failure log:
(XEN) Xen dom0less mode detected
(XEN) *** LOADING DOMU cpus=33 memory=0x10000KB ***
(XEN) Loading d1 kernel from boot module @ 0000000040400000
(XEN) Allocating mappings totalling 64MB for d1:
(XEN) d1 BANK[0] 0x00000040000000-0x00000044000000 (64MB)
(XEN) Device tree generation failed (-22).
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Could not set up domain domU0 (rc = -22)
(XEN) ****************************************
---
---
xen/common/device-tree/dom0less-build.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/xen/common/device-tree/dom0less-build.c b/xen/common/
device-tree/dom0less-build.c
index 3f5b987ed8..d7d0a47b97 100644
--- a/xen/common/device-tree/dom0less-build.c
+++ b/xen/common/device-tree/dom0less-build.c
@@ -461,10 +461,12 @@ static int __init
domain_handle_dtb_boot_module(struct domain *d,
/*
* The max size for DT is 2MB. However, the generated DT is small
(not including
- * domU passthrough DT nodes whose size we account separately), 4KB
are enough
- * for now, but we might have to increase it in the future.
+ * domU passthrough DT nodes whose size we account separately). The
size is
+ * primarily driven by the number of vCPU nodes. The previous 4KiB
buffer was
+ * insufficient for guests with high vCPU counts, so it has been
increased
+ * to support up to the MAX_VIRT_CPUS limit (128).
*/
-#define DOMU_DTB_SIZE 4096
+#define DOMU_DTB_SIZE (4096 * 4)
May be It wants Kconfig?
Or some formula which accounts MAX_VIRT_CPUS?
I agree that using a formula that accounts for MAX_VIRT_CPUS is the most
robust approach.
One option could be to detect the size at runtime, essentially, try to allocate
it, and if an error occurs, increase the fdtsize and try again. I don’t really
like this approach, but I wanted to mention it in case someone finds it useful.
The benefit of this approach is that if, in the future, something else such
as a CPU node contributes to the final FDT size, we won’t need to update the
formula again.
Here is the empirical data (by testing with the maximum number of device
tree nodes (e.g., hypervisor and reserved-memory nodes) and enabling all
optional CPU properties (e.g., clock-frequency)):
cpus=1
(XEN) Final compacted FDT size is: 1586 bytes
cpus=2
(XEN) Final compacted FDT size is: 1698 bytes
cpus=32
(XEN) Final compacted FDT size is: 5058 bytes
cpus=128
(XEN) Final compacted FDT size is: 15810 bytes
static int __init prepare_dtb_domU(struct domain *d, struct kernel_info
*kinfo)
{
int addrcells, sizecells;
@@ -569,6 +569,8 @@ static int __init prepare_dtb_domU(struct domain *d,
struct kernel_info *kinfo)
if ( ret < 0 )
goto err;
+ printk("Final compacted FDT size is: %d bytes\n",
fdt_totalsize(kinfo->fdt));
+
return 0;
err:
This data shows (assuming my testing/calculations are correct):
- A marginal cost of 112 bytes per vCPU in the final, compacted device tree.
- A fixed base size of 1474 bytes for all non-vCPU content.
Based on that I would propose the following formula with the justification:
/*
* The size is calculated from a fixed baseline plus a scalable
* portion for each potential vCPU node up to the system limit
* (MAX_VIRT_CPUS), as the vCPU nodes are the primary consumer
* of space.
*
* The baseline of 2KiB is a safe buffer for all non-vCPU FDT
* content. The 128 bytes per vCPU is derived from a worst-case
* analysis of the FDT construction-time size for a single
* vCPU node.
*/
#define DOMU_DTB_SIZE (2048 + (MAX_VIRT_CPUS * 128))
**********************************************
Please tell me would you be happy with that?
I would also like to note that we probably want to add a BUILD_BUG_ON() check
to ensure that DOMU_DTB_SIZE is not larger than SZ_2M. Otherwise, we would get
a runtime error instead of a build-time failure, since there is code that limits
fdtsize to SZ_2M:
/* Cap to max DT size if needed */
fdt_size = min(fdt_size, SZ_2M);
Thanks.
~ Oleksii