There is an ongoing disagreement among maintainers about how Xen should
handle deviations from specifications such as ACPI or EFI.

Write up an explicit policy, and include two worked-out examples from
recent discussions.

Signed-off-by: George Dunlap <[email protected]>
---
NB: I gathered the technical descriptions of the costs of the
accommodations (or lack thereof) just from reading the discussions;
I'm not familiar enough with the details to assert things about them.
So please correct any technical issues.
---
 docs/policy/FollowingSpecifications.md | 219 +++++++++++++++++++++++++
 1 file changed, 219 insertions(+)
 create mode 100644 docs/policy/FollowingSpecifications.md

diff --git a/docs/policy/FollowingSpecifications.md b/docs/policy/FollowingSpecifications.md
new file mode 100644
index 0000000000..a197f01f65
--- /dev/null
+++ b/docs/policy/FollowingSpecifications.md
@@ -0,0 +1,219 @@
+# Guidelines for following specifications
+
+## In general, follow specifications
+
+In general, specifications such as ACPI and EFI should be followed.
+
+## Accommodate non-compliant systems if it doesn't affect compliant systems
+
+Sometimes, however, there occur situations where real systems "in the
+wild" violate these specifications, or at least our interpretation of
+them (henceforth called "non-compliant").  If we can accommodate
+non-compliant systems without affecting any compliant systems, then we
+should do so.
+
+## If accommodation would affect theoretical compliant systems that are not known to exist, and Linux and/or Windows take the accommodation, take the accommodation unless there's a reason not to
+
+Sometimes, however, there occur situations where real, non-compliant
+systems "in the wild" cannot be accommodated without affecting
+theoretical compliant systems, but no such compliant systems are known
+to exist.  If Linux and/or Windows take the accommodation, then from a
+cost/benefits perspective it's probably best for us to take the
+accommodation as well.
+
+This is really a special case of the next principle; the "reason not
+to" would be in the form of a cost-benefits analysis as described in
+the next section, showing why the special case doesn't apply to the
+accommodation in question.
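+
+The principles so far can be read as a rough decision procedure.  The
+C sketch below is only an illustration of that reading: the struct,
+its fields and the `decide()` function are hypothetical labels for the
+questions asked above, not anything that exists in the Xen tree.
+
+```c
+#include <assert.h>
+#include <stdbool.h>
+#include <stddef.h>
+
+/* Hypothetical answers to the questions asked by the guidelines. */
+struct accommodation_facts {
+    bool needed_by_real_systems;     /* non-compliant systems exist */
+    bool affects_compliant_systems;  /* even theoretical ones */
+    bool affected_systems_known;     /* such compliant systems exist */
+    bool taken_by_linux_or_windows;
+    bool reason_not_to;              /* e.g. from a cost-benefits analysis */
+};
+
+enum decision {
+    FOLLOW_SPEC,
+    TAKE_ACCOMMODATION,
+    COST_BENEFITS_ANALYSIS,
+};
+
+enum decision decide(const struct accommodation_facts *f)
+{
+    assert(f != NULL);
+
+    /* In general, follow specifications. */
+    if (!f->needed_by_real_systems)
+        return FOLLOW_SPEC;
+
+    /* Accommodate if no compliant system is affected. */
+    if (!f->affects_compliant_systems)
+        return TAKE_ACCOMMODATION;
+
+    /*
+     * Only theoretical compliant systems are affected and Linux and/or
+     * Windows take the accommodation: take it unless there's a reason
+     * not to.
+     */
+    if (!f->affected_systems_known && f->taken_by_linux_or_windows &&
+        !f->reason_not_to)
+        return TAKE_ACCOMMODATION;
+
+    /* Otherwise things aren't clear: weigh the costs and benefits. */
+    return COST_BENEFITS_ANALYSIS;
+}
+```
+
+In this reading, a "reason not to" simply routes the decision to a
+cost-benefits analysis.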
+
+## If things aren't clear, do a cost-benefits analysis
+
+Sometimes, however, things are more complicated or less clear.  In
+that case, we should do a cost-benefits analysis for a particular
+accommodation.  Things which should be factored into the analysis:
+
+* N-1: The number of non-compliant systems that require the accommodation
+    * N-1a: The number of known current systems
+    * N-1b: The probable number of unknown current systems
+    * N-1c: The probable number of unknown future systems
+
+* N-2: The severity of the effect of non-accommodation on these systems
+
+* C-1: The number of compliant systems that would be affected by the accommodation
+    * C-1a: The number of known current systems
+    * C-1b: The probable number of unknown current systems
+    * C-1c: The probable number of unknown future systems
+
+* C-2: The severity of the effect of accommodation on these systems
+
+Intuitively, N-1 * N-2 gives us N, the cost of not making the
+accommodation, and C-1 * C-2 gives us C, the cost of taking the
+accommodation.  If N > C, then we should take the accommodation; if C >
+N, then we shouldn't.
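+
+One way to make this comparison concrete is sketched below in C.  It
+assumes, as a simplification, that N-1 is the sum of N-1a, N-1b and
+N-1c (and likewise for C-1), and that a tie falls back to following
+the spec; the struct and function names are hypothetical and the
+units are arbitrary.
+
+```c
+#include <assert.h>
+#include <stdbool.h>
+
+/* Hypothetical estimates for one side (N or C) of the analysis. */
+struct estimate {
+    double known_current;    /* X-1a */
+    double unknown_current;  /* X-1b (probable) */
+    double unknown_future;   /* X-1c (probable) */
+    double severity;         /* X-2, in arbitrary units */
+};
+
+/* X = X-1 * X-2, taking X-1 as the sum of its three parts. */
+static double cost(const struct estimate *e)
+{
+    double count = e->known_current + e->unknown_current +
+                   e->unknown_future;
+
+    assert(count >= 0 && e->severity >= 0);
+    return count * e->severity;
+}
+
+/* Take the accommodation only if not taking it costs more (N > C). */
+bool should_accommodate(const struct estimate *n, const struct estimate *c)
+{
+    return cost(n) > cost(c);
+}
+```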
+
+The idea isn't to come up with actual numbers to plug in here
+(although that's certainly an option if someone wants to), but to
+explain the general idea we're trying to get at.
+
+A couple of other principles to factor in:
+
+Vendors tend to copy themselves and other vendors.  If one or two
+major vendors are known to build non-compliant systems in a particular
+way, then there are likely to be more unknown and future systems which
+need a similar accommodation; that is, we should raise our estimates
+of N-1{b,c}.  Likewise, if they are known to build compliant systems
+in a particular way, there are likely to be more unknown and future
+systems which would be affected by the accommodation; that is, we
+should raise our estimates of C-1{b,c}.
+
+Some downstreams already implement accommodations, and test on a
+variety of hardware.  If downstreams such as QubesOS or XenServer /
+XCP-ng implement the accommodations, then N-1 * N-2 is likely to be
+non-negligible, and C-1 * C-2 is likely to be negligible.
+
+Windows and Linux are widely tested.  If Windows and/or Linux make a
+particular accommodation, and that accommodation has remained stable
+without being reverted, then it's likely that the number of unknown
+current systems that are affected by the accommodation is negligible;
+that is, we should lower the C-1b estimate.
+
+Vendors tend to test server hardware on Windows and Linux.  If Windows
+and/or Linux make a particular accommodation, then it's unlikely that
+future systems will be affected by the accommodation; that is, we
+should lower the C-1c estimate.
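+
+The heuristics above only say which way to move the estimates, not by
+how much.  The sketch below applies them as multipliers; the factors
+`RAISE` and `LOWER`, and all of the names, are hypothetical choices
+for illustration.  The downstream heuristic is not modelled, since it
+bears on the products N-1 * N-2 and C-1 * C-2 directly rather than on
+individual estimates.
+
+```c
+#include <assert.h>
+#include <stdbool.h>
+#include <stddef.h>
+
+/* Hypothetical flags: which of the heuristics above apply. */
+struct evidence {
+    bool vendors_ship_noncompliant; /* major vendors need the accommodation */
+    bool vendors_ship_compliant;    /* major vendors would be affected by it */
+    bool linux_windows_take_it;     /* and it has not been reverted */
+};
+
+/* Probable numbers of unknown current (b) and future (c) systems. */
+struct unknowns {
+    double n1b, n1c;
+    double c1b, c1c;
+};
+
+/* Arbitrary illustrative factors; the policy only gives a direction. */
+#define RAISE 2.0
+#define LOWER 0.5
+
+void adjust_estimates(struct unknowns *u, const struct evidence *ev)
+{
+    assert(u != NULL && ev != NULL);
+
+    /* Vendors copy themselves and each other. */
+    if (ev->vendors_ship_noncompliant) {
+        u->n1b *= RAISE;
+        u->n1c *= RAISE;
+    }
+    if (ev->vendors_ship_compliant) {
+        u->c1b *= RAISE;
+        u->c1c *= RAISE;
+    }
+
+    if (ev->linux_windows_take_it) {
+        u->c1b *= LOWER; /* widely tested: few unknown current systems hit */
+        u->c1c *= LOWER; /* vendors test new hardware on them */
+    }
+}
+```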
+
+# Example applications
+
+Here are some examples of how these principles can be applied.
+
+## ACPI MADT tables containing ~0
+
+Xen disables certain kinds of features on CPU hotplug systems.  For
+example, since the TSC won't be reliable on a hot-pluggable system,
+Xen avoids using it there, even though it is faster and more power
+efficient, and instead falls back to other timer sources which are
+slower and less power efficient.
+
+Some hardware vendors have (it seems) begun making a single ACPI table
+image for a range of similar systems, with enough MADT entries for the
+system with the most CPUs.  On systems with fewer CPUs, the APIC IDs
+of the unused MADT entries are replaced with ~0 to indicate that those
+entries aren't valid.  These systems are not hotplug capable.
+Sometimes the invalid slots are on a separate socket.
+
+One interpretation of the spec is that a system with such MADT entries
+could actually have an extra socket, and that later the system could
+update the MADT table, populating the APIC IDs with real values.
+
+If Xen finds an MADT where all slots are either populated or filled
+with APIC ID ~0, should it consider it a multi-socket hotplug system,
+and disable features available on single-socket systems?  Or should it
+accommodate the systems above, treating such a system as incapable of
+hotplug?
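+
+To make the question concrete, the accommodation amounts to something
+like the C sketch below: when deciding whether CPUs might be hot-added
+later, disabled MADT entries with a real APIC ID count as possible
+hotplug slots, but entries whose APIC ID is ~0 are ignored as
+permanently invalid.  The `struct madt_cpu` type and the function are
+hypothetical simplifications, not Xen's actual MADT parsing code (real
+local APIC entries, for example, carry an 8-bit APIC ID rather than a
+32-bit one), and the sketch assumes the ~0 entries are not marked
+enabled.
+
+```c
+#include <assert.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+
+/* Hypothetical, simplified view of one MADT processor entry. */
+struct madt_cpu {
+    uint32_t apic_id;
+    bool enabled;
+};
+
+#define APIC_ID_INVALID UINT32_MAX /* ~0 */
+
+/*
+ * Could CPUs be hot-added later?  Disabled entries with a real APIC ID
+ * are treated as possible hotplug slots.  With the accommodation,
+ * disabled entries whose APIC ID is ~0 are ignored as invalid.
+ */
+bool madt_may_hotplug(const struct madt_cpu *cpus, size_t nr,
+                      bool accommodate)
+{
+    assert(cpus != NULL || nr == 0);
+
+    for (size_t i = 0; i < nr; i++) {
+        if (cpus[i].enabled)
+            continue;
+        if (accommodate && cpus[i].apic_id == APIC_ID_INVALID)
+            continue;
+        return true;
+    }
+    return false;
+}
+```
+
+Without the accommodation, a table containing only populated and ~0
+entries makes this return true, and the hotplug restrictions kick in;
+with it, the same table is treated as fixed.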
+
+N-1a: People have clearly found a number of systems in the wild, from
+different vendors, that exhibit this property; it's a non-negligible
+number of systems.
+
+N-1b,c: Since these systems are from different vendors, and there seem to
+be a fair number of them, there are likely to be many more that we
+don't know about; and likely to be many more produced in the future.
+
+N-2: Xen will use more expensive (both time and power-wise) clock
+sources unless the user manually modifies the Xen command-line.
+
+C-1a,b: There are no known systems that implement physical CPU hotplug
+whatsoever, much less a system that uses ~0 for APIC IDs.
+
+There are hypervisors that implement *virtual* CPU hotplug, but they
+don't use ~0 for APIC IDs.
+
+C-1c: It seems that physical CPU hotplug is an unsolved problem: it was
+worked on for quite a while and then abandoned.  So it seems fairly
+unlikely that any physical CPU hotplug systems will come to exist any
+time in the near future.
+
+If any hotplug systems were created, they would only be affected if
+they happened to use ~0 as the APIC ID of the empty slots in the MADT
+table.  This by itself seems unlikely, given the number of vendors who
+are now using that to mean "invalid slot", and the fact that virtual
+hotplug systems don't do this.
+
+Furthermore, Linux has been treating such entries as permanently
+invalid since 2016.  If any system were to implement physical CPU
+hotplug in the future, and use ~0 as a placeholder APIC ID, it's very
+likely they would test it on Linux, discover that it doesn't work, and
+modify the system to enable it to work (perhaps copying QEMU's
+behavior).  It seems likely that Windows will do the same thing,
+further reducing the probability that any system like this will make
+it into production.
+
+So the potential number of future systems affected by this before we
+can implement a fix seems very small indeed.
+
+C-2: If such a system did exist, everything would work fine at boot;
+the only issue would be that when an extra CPU was plugged in, nothing
+would happen.  This could be overridden by a command-line argument.
+
+Adding these all together, there's a widespread, moderate cost to not
+accommodating these systems, and an uncertain and small cost to
+accommodating them.  So it makes sense to apply the accommodation.
+
+## Calling EFI Reboot method
+
+One interpretation of the EFI spec is that operating systems should
+call the EFI ResetSystem method in preference to the ACPI reboot
+method.
+
+However, although the ResetSystem method is required by the EFI spec,
+it doesn't actually work on a large number of different systems, at
+least when called by Xen: many systems don't cleanly reboot after
+calling the EFI ResetSystem method, but rather crash or fail in some
+other random way.
+
+(One reason for this is that the Windows EFI test doesn't call the EFI
+ResetSystem method, but calls the ACPI reboot method.  One possible
+explanation for the repeated pattern is that vendors smoke-test the
+ResetSystem method from the EFI shell, which has its own memory map;
+but fail to test it when running on the OS memory map.)
+
+Should Xen follow our interpretation of the EFI spec, and call the
+ResetSystem method in preference to the ACPI reboot method?  Or should
+Xen accommodate systems with broken ResetSystem methods, and call the
+ACPI reboot method by default?
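+
+The two options differ only in which method is tried by default.  A
+minimal sketch, with hypothetical names, of a default that either
+follows our reading of the spec (`prefer_efi` true) or takes the
+accommodation (`prefer_efi` false); it ignores any other fallback
+methods and any command-line override.
+
+```c
+#include <assert.h>
+#include <stdbool.h>
+
+enum reboot_method {
+    REBOOT_ACPI,
+    REBOOT_EFI,
+    REBOOT_NONE, /* stand-in for whatever fallback would be used */
+};
+
+/*
+ * Pick a default reboot method.  prefer_efi == true follows our
+ * reading of the EFI spec; prefer_efi == false takes the
+ * accommodation and uses ACPI whenever an ACPI reboot is available.
+ */
+enum reboot_method default_reboot_method(bool acpi_available,
+                                         bool efi_available,
+                                         bool prefer_efi)
+{
+    if (prefer_efi && efi_available)
+        return REBOOT_EFI;
+    if (acpi_available)
+        return REBOOT_ACPI;
+    if (efi_available)
+        return REBOOT_EFI;
+    return REBOOT_NONE;
+}
+```
+
+Either way, EFI ResetSystem remains available on systems without an
+ACPI reboot method; the accommodation only changes the order.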
+
+N-1a: There are clearly a large number of systems which exhibit this
+property.
+
+N-1b,c: Given the large number of diverse vendors who make this
+mistake, it seems likely that there are even more that we don't know
+about, and this will continue into the future.
+
+N-2: Systems are incapable of rebooting cleanly unless the right runes
+are put into the Xen command line to make it prefer using the ACPI
+reboot method.
+
+C-1a: A system would only be negatively affected if 1) an ACPI reboot
+method exists, 2) an EFI method exists, and 3) calling the ACPI method
+in preference to the EFI method causes some sort of issue.  So far
+nobody has run into such a system.
+
+C-1b,c: The Windows EFI test explicitly tests the ACPI reboot method
+on EFI systems.  Linux also prefers calling the ACPI reboot method
+even when an EFI method is available.  The chance of someone shipping
+a system with a broken ACPI reboot method under these conditions is
+very small: it basically wouldn't reboot correctly under either of the
+two most important operating systems.
+
+C-2: It seems likely that the worst that could happen is what's
+happening now when calling the EFI method: that the ACPI method would
+cause a weird crash, which then would reboot or hang.
+
+XenServer has shipped this accommodation for several years now.
+
+Adding these all together, the cost of non-accommodation is widespread
+and moderate; that is to say, non-negligible; and the cost of
+accommodation is theoretical and tiny.  So it makes sense to apply the
+accommodation.
\ No newline at end of file
-- 
2.42.0

