Greetings,
Per my response to Allejandro's message, here is the response sent the
the DTB working group formed last year to discuss DTB parsing for x86.
Original Message:
I have copied everyone that attended the hyperlaunch working group a few
weeks back to ensure everyone has a chance to review and comment.
As a start and to provide a common understanding, first is a quick
overview of Flattened Device Tree and Xen's "Unflattened Device Tree".
The intent is to assist everyone in having an equal footing when
considering the impacts that Device Tree parsing brings.
A Flattened Device Tree (FDT) is a nested linear tree structure that
uses a combination of tags, layout definition, and headers to allow
navigation through the tree. Because the layout is nested, if given the
offset for a node in the FDT, it is possible to start at that node and
jump directly into the tree to access child nodes and properties.
Provided below is a visual representation of what any parent node,
including the root node, may look like:
+------------------------------+
| NODE TAG (parent node) |
+------------------------------+
| Null-term String (node name) |
+------------------------------+
| PROPERTY TAG |
+------------------------------+
| struct property { |
| u32 len |
| u32 name_offset |
| } |
+------------------------------+
| Property Data |
+------------------------------+
| NODE TAG (child node) |
+------------------------------+
| Null-term String (node name) |
+------------------------------+
| PROPERTY TAG |
+------------------------------+
| struct property { |
| u32 len |
| u32 name_offset |
| } |
+------------------------------+
| Property Data |
+------------------------------+
| END NODE TAG (child node) |
+------------------------------+
| END NODE TAG (parent node) |
+------------------------------+
Before moving forward, let us clarify some terminology to ensure a
common understanding when discussing a tree.
- node path: represents a series of hierarchical child nodes starting
at the root node
- adjacent node: the logically next node that is at the same level in
the tree
- child node: a node that is a one level lower leaf to another node,
the parent node
- tree walk: incrementally walking the nodes, to locate a specific
node or to iterate over the whole tree
The libfdt library provides handlers for finding the offset of a node,
as well as handlers to jump to a node offset and iterate only on the
child nodes. While the libfdt is fairly optimized, the reality is that
to find a node, the library must do a tree walk starting with the first
node written in the FDT. If a node is not a path match at the current
depth, it must cross a null terminated string, all the node's property
entries and all children nodes to reach the next adjacent node. Once a
path match for the depth is found, then the search may descend into the
next depth and repeat the process until a match at that level is found.
This brings us to Xen's "Unflattened Device Tree" (UDT), for which I am
quoting as I find myself thinking of it in another way, which IMHO is a
more descriptive name, which is that it is an FDT lookup index. It just
happens that the implementation for the lookup index structure is a tree
structure. UDT uses a structure to represent a node and one to represent
a property. The node structure is a traditional tree structure with
adjacent and child node pointers. The contents of both structures are
pointers to the respective memory locations within the FDT. As with the
FDT, in order to locate a node in the index, a tree walk of the index
must be done. The difference comes when a node is not a path match, to
reach the adjacent node, it only needs to access the node pointed to by
the adjacent node pointer of the current node. UDT provides an API for
walking the node tree, walking the property list for a node, and methods
for type-interpreted extraction of property values. NB: the
type-interpreted extraction API is codified around taking a UDT property
structure, but the interpreted extraction logic isn't UDT specific as it
is still reading the property value from the FDT.
The benefit UDT brings is when repeated node lookups and/or repeated
phandle dereferencing are done. For both FDT and UDT, a tree walk must
be done. The walk will start with a node, either the root node or one
for which a reference has already been found, walking each adjacent node
and descending into a node's children when a path match occurs. For
phandle dereferencing, the benefit is greater due to the fact that when
indexing the FDT, phandles get dereferenced, thus allowing direct
reference in the index. For comparison, a phandle dereference using
libfdt does a walk of the tree to find the node referenced by the phandle.
The UDT, as implemented, is not without cost. The current implementation
takes two complete walks of the entire FDT using libfdt. The first pass
is to obtain the amount of memory that is required to allocate enough
instances of the UDT node and property structures to represent the full
tree. The second pass is when the FDT nodes and properties are indexed
into the UDT.
With the expense of using FDT and UDT established, it is important to
put that expense into context. Consider hyperlaunch on x86 where the
arch itself has no DT requirements. In all likelihood, an FDT
constructed for this arch would only contain the nodes necessary for
hyperlaunch. If hyperlaunch were constructed x86 only, loading the
configuration could be done in a single full walk of the FDT, even when
considering phandle usage. The reason this is true for the phandles
case, is that as nodes known to be a phandle target are encountered,
their offset into the FDT could be stored with dereferencing resolved
post walk.
For DT based archs, currently accepted costs are two FDT node lookups
along with the two full walks to construct the UDT. These first two node
lookups being the memory allocation table and the Xen command line. As
noted above, an FDT node lookup requires a walk of the linear tree until
the node is located. AIUI at this point is that the number of nodes that
must be crossed is indeterminate. I did not see anything in the device
tree specification that the FDT must be packed in the same order as the
string representation. NB: I have not reviewed the DT compiler to see if
it optimizes for early access nodes to be packed at the beginning of the
linear tree to reduce the number of nodes that must be crossed.
While the aforementioned strategy for x86 would be optimal for x86, it
is not necessarily the best for DT based archs. Hyperlaunch started, and
currently is, focused on the x86 arch, but long term it was always
understood that its more expansive design would be desirable by all
archs. Like anything that moves into common, a slightly less efficient
approach for one platform is accepted for the benefit of a common
implementation that reduces the amount of code while increasing the
number of reviewers.
After listening to everyone's concerns, re-reviewing all of Arm's device
tree logic, and considering everything in totality, the conclusion is
that there is a core root cause from with which all the concerns raised
flow. First a summary of the main concerns raised,
- The issue of memory allocator(s) available at the time when the
first FDT walk/parsing occurs.
- Overhead of doing a more than one FDT walk to obtain the hyperlaunch
configuration when phandles are in use.
- Supporting FDT would require the introduction of a
duplicate/competing set of property parsers.
This root cause is due to a design decision difference made for the
hyperlaunch domain builder versus the dom0less domain builder and Arm's
approach to device tree parsing. For dom0less, the approach is to walk
the UDT index tree at the domain construction time, which appears to
stem from Arm's need and practice of repeatedly accessing device tree
entries. Whereas x86 has no need for the device tree and took the
approach to do a single walk to extract its configuration into a code
usable structure.
With that understanding, it is believed that these two approaches are
not diametrically opposed and in fact can be blended together to result
in a generally optimized approach. The approach will be to conduct two
full walks of the FDT, an early boot pass before memory allocation is
available and a second pass after a memory allocator is set up. Both
passes serve to populate the proposed boot info structure, specifically
the scope of these passes are as follows:
Early FDT Walk: (static values)
- calculate the space required for the device tree index
- count the number of domains defined
- Xen command line
- XSM policy
- arch specific static values, via an arch_early_fdt()
Late FDT Walk: (dynamic values)
- construct device tree index, aka "unflattened device tree"
- populate boot modules entries (NB: boot modules are expected to be
static array)
- store device tree node index for top level index and hyperlaunch node
- populate boot domain entries, basis will be device tree index node
- arch specific dynamic values, via an arch_late_fdt()
By taking this approach which is constructed around a set of arch
neutral structures will enable another goal of the hyperlaunch series,
which is to move to having a common domain creation/construction logic.
Currently, there is a significant amount of duplication in each arch's
branch for boot time construction of domains. It will also allow
removing arch specific code from the initialization of common
infrastructure such as XSM.
V/r,
Daniel P. Smith