Re: [Xen-devel] [DRAFT C] PVH CPU hotplug design document
On 02/07/2017 05:21 AM, Roger Pau Monné wrote:
> Hello Al,
>
> Thanks for your comments, please see below.
>
> On Mon, Feb 06, 2017 at 04:06:45PM -0700, Al Stone wrote:
>> On 01/24/2017 07:20 AM, Boris Ostrovsky wrote:

[snip]

>> Then it gets messy :). The APIC and/or x2APIC subtables of the MADT are not
>> likely to exist on arm64; chances are just about zero, actually. There are
>> other similar MADT subtables for arm64, but APIC, x2APIC and many more just
>> won't be there. There is some overlap with ia64, but not entirely.
>
> ia64 is also out of the picture here, all the more so since Xen doesn't
> support it, and it doesn't look like anyone is working on it.

Aw. That's kind of sad. I worked on Xen/ia64 briefly many, many moons ago.

Yeah, there are arch differences. Once you have the x86 side going, though, I
think adding in arm64 wouldn't be too bad; they're a little simpler, in some
respects.

>> The other issue is that a separate name space for the added CPUs would have
>> to be very carefully done. If not, then the processor hierarchy information
>> in the AML either becomes useless, or at the least inconsistent, and OSPMs
>> are just now beginning to use some of that info to make scheduling
>> decisions. It would be possible to just assume the hot plug CPUs are
>> outside of any existing processor hierarchy, but I would then worry that
>> power management decisions made by the OSPM might be wrong; I can imagine a
>> scenario where a CPU is inserted and shares a power rail with an existing
>> CPU, but the existing CPU is idle so it decides to power off since it's the
>> last in the hierarchy, so the power rail isn't needed, and now the power
>> gets turned off to the unit just plugged in because the OSPM doesn't
>> realize it shares power.
>
> Well, my suggestion was to add the processor objects of the virtual CPUs
> inside an ACPI Module Device that has the _SB.XEN0 namespace.
> However, AFAIK there's no way to reserve the _SB.XEN0 namespace, so a vendor
> could use that for something else. I think the chances of that happening are
> very low, but it's not impossible.
>
> Is there any way in ACPI to reserve a namespace for a certain usage? (ie:
> would it be possible to somehow reserve _SB.XEN0 for Xen usage?)

The only really reserved namespace is "_XXX". The rest is fair game; since one
can only use four characters, I suspect there will be some reluctance to set
aside more. There are the top-level names (mostly just \_SB these days). Maybe
a top-level \_XEN or \_VRT could work, perhaps with some fairly strict rules
on what can be in that subspace. I think the issue at that point would be
whether or not this is a solution to a general problem, or if it is something
that affects only Xen.

> Or if we want to go more generic, we could reserve _SB.VIRT for generic
> hypervisor usage.

Right. And this would be one of the key questions from the ASWG -- can it be
generalized?

> [snip...]

> I'm also a member of the ACPI working group, and I was planning to send this
> design document there for further discussion, just haven't found the time
> yet to write a proper mail :(.
>
> Roger.

No worries. Getting things started is not too bad; it's the discussion
afterward that can go on for a while :-).

--
ciao,
al
---
Al Stone
Software Engineer
Linaro Enterprise Group
al.st...@linaro.org
---

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
Re: [Xen-devel] [DRAFT C] PVH CPU hotplug design document
Hello Al,

Thanks for your comments, please see below.

On Mon, Feb 06, 2017 at 04:06:45PM -0700, Al Stone wrote:
> On 01/24/2017 07:20 AM, Boris Ostrovsky wrote:
> >
> >> Yes, the only remaining risk is some vendor using _SB.XEN0, and AFAICT
> >> there's no way to reserve anything in there (mostly because it's assumed
> >> that ACPI tables will be created by a single entity, I guess).
> >>
> >> I think that the chance of this happening is 0%, and that there's no
> >> single system out there with a _SB.XEN0 node. I've been wondering whether
> >> I should try to post this to the ACPI working group, and try to get some
> >> feedback there.
> >
> > If you end up asking there, I'd suggest including Rafael Wysocki and Len
> > Brown (raf...@kernel.org and l...@kernel.org) and maybe
> > linux-a...@vger.kernel.org as well.
> >
> > -boris
> >
>
> My apologies for not leaping into this discussion earlier; real life has
> been somewhat complicated lately. Hopefully I won't annoy too many people.
>
> So, I am on the ASWG (ACPI Spec Working Group) as a Red Hat and/or Linaro
> representative. To clarify something mentioned quite some time ago, the
> STAO and XENV tables are in the ACPI spec in a special form. Essentially,
> there are two classes of tables within ACPI: official tables defined in the
> spec itself that are meant to be used anywhere ACPI is used, and tables
> whose names are to be recognized but whose content is defined elsewhere.
> The STAO and XENV belong to this second class -- the spec reserves their
> signatures so that others do not use them, but then points to an external
> source -- Xen, specifically -- for the definition. The practical
> implication is that Xen can change the definitions as it wishes, without
> direct oversight by the ASWG. It is considered bad form to do so, however,
> so new revisions should at least be sent to the ASWG for discussion (it may
> make sense to pull the table into the spec itself...).
> Stefano and I worked together to get the original reservation made for the
> STAO and XENV tables.
>
> The other thing I've noticed so far in the discussion is that everything
> discussed may work on x86 or ia64, but will not work at all on arm64. The
> HARDWARE_REDUCED flag in the FADT was mentioned -- this is the crux of the
> problem. For arm64, that flag is required to be set, so overloading it is
> most definitely an issue. More problematic, however, is the notion of using
> GPE blocks; when the HARDWARE_REDUCED flag is set, the spec requires GPE
> block definitions to be ignored.

Yes, this document is specific to x86. I believe that the differences between
x86 and ARM regarding ACPI would make it too complicated to come up with a
solution that's usable on both, mainly because the ACPI tables on ARM and x86
are already too different.

> Then it gets messy :). The APIC and/or x2APIC subtables of the MADT are not
> likely to exist on arm64; chances are just about zero, actually. There are
> other similar MADT subtables for arm64, but APIC, x2APIC and many more just
> won't be there. There is some overlap with ia64, but not entirely.

ia64 is also out of the picture here, all the more so since Xen doesn't
support it, and it doesn't look like anyone is working on it.

> The other issue is that a separate name space for the added CPUs would have
> to be very carefully done. If not, then the processor hierarchy information
> in the AML either becomes useless, or at the least inconsistent, and OSPMs
> are just now beginning to use some of that info to make scheduling
> decisions.
> It would be possible to just assume the hot plug CPUs are outside of any
> existing processor hierarchy, but I would then worry that power management
> decisions made by the OSPM might be wrong; I can imagine a scenario where
> a CPU is inserted and shares a power rail with an existing CPU, but the
> existing CPU is idle so it decides to power off since it's the last in the
> hierarchy, so the power rail isn't needed, and now the power gets turned
> off to the unit just plugged in because the OSPM doesn't realize it shares
> power.

Well, my suggestion was to add the processor objects of the virtual CPUs
inside an ACPI Module Device that has the _SB.XEN0 namespace. However, AFAIK
there's no way to reserve the _SB.XEN0 namespace, so a vendor could use that
for something else. I think the chances of that happening are very low, but
it's not impossible.

Is there any way in ACPI to reserve a namespace for a certain usage? (ie:
would it be possible to somehow reserve _SB.XEN0 for Xen usage?)

Or if we want to go more generic, we could reserve _SB.VIRT for generic
hypervisor usage.

> So at a minimum, it sounds like there would need to be a solution for each
> architecture, with maybe some fiddling around on ia64, too. Unfortunately,
> I believe the ACPI spec provides a way to handle all of the things wanted,
> but an ASL interpreter would be required because it does rely on
Re: [Xen-devel] [DRAFT C] PVH CPU hotplug design document
On 01/24/2017 07:20 AM, Boris Ostrovsky wrote:
>
>> Yes, the only remaining risk is some vendor using _SB.XEN0, and AFAICT
>> there's no way to reserve anything in there (mostly because it's assumed
>> that ACPI tables will be created by a single entity, I guess).
>>
>> I think that the chance of this happening is 0%, and that there's no single
>> system out there with a _SB.XEN0 node. I've been wondering whether I should
>> try to post this to the ACPI working group, and try to get some feedback
>> there.
>
> If you end up asking there, I'd suggest including Rafael Wysocki and Len
> Brown (raf...@kernel.org and l...@kernel.org) and maybe
> linux-a...@vger.kernel.org as well.
>
> -boris
>

My apologies for not leaping into this discussion earlier; real life has been
somewhat complicated lately. Hopefully I won't annoy too many people.

So, I am on the ASWG (ACPI Spec Working Group) as a Red Hat and/or Linaro
representative. To clarify something mentioned quite some time ago, the STAO
and XENV tables are in the ACPI spec in a special form. Essentially, there are
two classes of tables within ACPI: official tables defined in the spec itself
that are meant to be used anywhere ACPI is used, and tables whose names are to
be recognized but whose content is defined elsewhere. The STAO and XENV belong
to this second class -- the spec reserves their signatures so that others do
not use them, but then points to an external source -- Xen, specifically --
for the definition. The practical implication is that Xen can change the
definitions as it wishes, without direct oversight by the ASWG. It is
considered bad form to do so, however, so new revisions should at least be
sent to the ASWG for discussion (it may make sense to pull the table into the
spec itself...). Stefano and I worked together to get the original reservation
made for the STAO and XENV tables.
The other thing I've noticed so far in the discussion is that everything
discussed may work on x86 or ia64, but will not work at all on arm64. The
HARDWARE_REDUCED flag in the FADT was mentioned -- this is the crux of the
problem. For arm64, that flag is required to be set, so overloading it is most
definitely an issue. More problematic, however, is the notion of using GPE
blocks; when the HARDWARE_REDUCED flag is set, the spec requires GPE block
definitions to be ignored.

Then it gets messy :). The APIC and/or x2APIC subtables of the MADT are not
likely to exist on arm64; chances are just about zero, actually. There are
other similar MADT subtables for arm64, but APIC, x2APIC and many more just
won't be there. There is some overlap with ia64, but not entirely.

The other issue is that a separate name space for the added CPUs would have
to be very carefully done. If not, then the processor hierarchy information
in the AML either becomes useless, or at the least inconsistent, and OSPMs
are just now beginning to use some of that info to make scheduling decisions.
It would be possible to just assume the hot plug CPUs are outside of any
existing processor hierarchy, but I would then worry that power management
decisions made by the OSPM might be wrong; I can imagine a scenario where
a CPU is inserted and shares a power rail with an existing CPU, but the
existing CPU is idle so it decides to power off since it's the last in the
hierarchy, so the power rail isn't needed, and now the power gets turned off
to the unit just plugged in because the OSPM doesn't realize it shares power.

So at a minimum, it sounds like there would need to be a solution for each
architecture, with maybe some fiddling around on ia64, too. Unfortunately, I
believe the ACPI spec provides a way to handle all of the things wanted, but
an ASL interpreter would be required because it does rely on executing methods
(e.g., _CRS to determine processor resources on hot plug).
The ACPICA code is dual-licensed, GPL and commercial, and there is the OpenBSD
code. But without an interpreter, it feels like we're trying to push dynamic
behavior into static tables, and they really weren't designed for that.

That's my $0.02 worth, at least.
Re: [Xen-devel] [DRAFT C] PVH CPU hotplug design document
> Yes, the only remaining risk is some vendor using _SB.XEN0, and AFAICT
> there's no way to reserve anything in there (mostly because it's assumed
> that ACPI tables will be created by a single entity, I guess).
>
> I think that the chance of this happening is 0%, and that there's no single
> system out there with a _SB.XEN0 node. I've been wondering whether I should
> try to post this to the ACPI working group, and try to get some feedback
> there.

If you end up asking there, I'd suggest including Rafael Wysocki and Len
Brown (raf...@kernel.org and l...@kernel.org) and maybe
linux-a...@vger.kernel.org as well.

-boris
Re: [Xen-devel] [DRAFT C] PVH CPU hotplug design document
>>> On 23.01.17 at 18:12, wrote:
> On Mon, Jan 23, 2017 at 09:55:19AM -0700, Jan Beulich wrote:
>> >>> On 23.01.17 at 17:42, wrote:
>> > On Mon, Jan 23, 2017 at 09:30:30AM -0700, Jan Beulich wrote:
>> >> >>> On 17.01.17 at 18:14, wrote:
>> >> > This can be solved by using a different ACPI name in order to
>> >> > describe vCPUs in the ACPI namespace. Most hardware vendors tend to
>> >> > use CPU or PR prefixes for the processor objects, so using a 'VP'
>> >> > (ie: Virtual Processor) prefix should prevent clashes.
>> >>
>> >> I continue to think that this is insufficient, without seeing a nice
>> >> clean way to solve the issue properly.
>> >
>> > But in this document the namespace path for processor objects will be
>> > _SB.XEN0.VPXX, which should prevent any namespace clashes. Maybe I
>> > should have updated the wording here; every Xen-related ACPI bit will
>> > be inside the _SB.XEN0 namespace.
>>
>> Well, if we want to introduce our own parent name space, why the
>> special naming convention then? Any name not colliding with other
>> things in _SB.XEN0 should do then, so the only remaining risk would
>> then be that the firmware also has _SB.XEN0.
>
> Right, that's why I say that I should have reworded this. We can then use
> PXXX, CXXX or whatever we want.
>
> Yes, the only remaining risk is some vendor using _SB.XEN0, and AFAICT
> there's no way to reserve anything in there (mostly because it's assumed
> that ACPI tables will be created by a single entity, I guess).

Right.

> I think that the chance of this happening is 0%, and that there's no single
> system out there with a _SB.XEN0 node. I've been wondering whether I should
> try to post this to the ACPI working group, and try to get some feedback
> there.

As you've said during some earlier discussion, it won't hurt to give this a
try.

Jan
Re: [Xen-devel] [DRAFT C] PVH CPU hotplug design document
On Mon, Jan 23, 2017 at 09:55:19AM -0700, Jan Beulich wrote:
> >>> On 23.01.17 at 17:42, wrote:
> > On Mon, Jan 23, 2017 at 09:30:30AM -0700, Jan Beulich wrote:
> >> >>> On 17.01.17 at 18:14, wrote:
> >> > This can be solved by using a different ACPI name in order to describe
> >> > vCPUs in the ACPI namespace. Most hardware vendors tend to use CPU or
> >> > PR prefixes for the processor objects, so using a 'VP' (ie: Virtual
> >> > Processor) prefix should prevent clashes.
> >>
> >> I continue to think that this is insufficient, without seeing a nice
> >> clean way to solve the issue properly.
> >
> > But in this document the namespace path for processor objects will be
> > _SB.XEN0.VPXX, which should prevent any namespace clashes. Maybe I should
> > have updated the wording here; every Xen-related ACPI bit will be inside
> > the _SB.XEN0 namespace.
>
> Well, if we want to introduce our own parent name space, why the
> special naming convention then? Any name not colliding with other
> things in _SB.XEN0 should do then, so the only remaining risk would
> then be that the firmware also has _SB.XEN0.

Right, that's why I say that I should have reworded this. We can then use
PXXX, CXXX or whatever we want.

Yes, the only remaining risk is some vendor using _SB.XEN0, and AFAICT there's
no way to reserve anything in there (mostly because it's assumed that ACPI
tables will be created by a single entity, I guess).

I think that the chance of this happening is 0%, and that there's no single
system out there with a _SB.XEN0 node. I've been wondering whether I should
try to post this to the ACPI working group, and try to get some feedback
there.

Roger.
Re: [Xen-devel] [DRAFT C] PVH CPU hotplug design document
>>> On 23.01.17 at 17:42, wrote:
> On Mon, Jan 23, 2017 at 09:30:30AM -0700, Jan Beulich wrote:
>> >>> On 17.01.17 at 18:14, wrote:
>> > This can be solved by using a different ACPI name in order to describe
>> > vCPUs in the ACPI namespace. Most hardware vendors tend to use CPU or PR
>> > prefixes for the processor objects, so using a 'VP' (ie: Virtual
>> > Processor) prefix should prevent clashes.
>>
>> I continue to think that this is insufficient, without seeing a nice
>> clean way to solve the issue properly.
>
> But in this document the namespace path for processor objects will be
> _SB.XEN0.VPXX, which should prevent any namespace clashes. Maybe I should
> have updated the wording here; every Xen-related ACPI bit will be inside
> the _SB.XEN0 namespace.

Well, if we want to introduce our own parent name space, why the
special naming convention then? Any name not colliding with other
things in _SB.XEN0 should do then, so the only remaining risk would
then be that the firmware also has _SB.XEN0.

Jan
Re: [Xen-devel] [DRAFT C] PVH CPU hotplug design document
On Mon, Jan 23, 2017 at 09:30:30AM -0700, Jan Beulich wrote:
> >>> On 17.01.17 at 18:14, wrote:
> > This can be solved by using a different ACPI name in order to describe
> > vCPUs in the ACPI namespace. Most hardware vendors tend to use CPU or PR
> > prefixes for the processor objects, so using a 'VP' (ie: Virtual
> > Processor) prefix should prevent clashes.
>
> I continue to think that this is insufficient, without seeing a nice
> clean way to solve the issue properly.

But in this document the namespace path for processor objects will be
_SB.XEN0.VPXX, which should prevent any namespace clashes. Maybe I should have
updated the wording here; every Xen-related ACPI bit will be inside the
_SB.XEN0 namespace.

Roger.
Re: [Xen-devel] [DRAFT C] PVH CPU hotplug design document
>>> On 17.01.17 at 18:14, wrote:
> This can be solved by using a different ACPI name in order to describe vCPUs
> in the ACPI namespace. Most hardware vendors tend to use CPU or PR prefixes
> for the processor objects, so using a 'VP' (ie: Virtual Processor) prefix
> should prevent clashes.

I continue to think that this is insufficient, without seeing a nice
clean way to solve the issue properly.

Jan
[Xen-devel] [DRAFT C] PVH CPU hotplug design document
Hello,

Below is a draft of a design document for PVHv2 CPU hotplug. It should cover
both vCPU and pCPU hotplug. It's mainly centered around the hardware domain,
since for unprivileged PVH guests the vCPU hotplug mechanism is already
described in Boris' series [0], and it's shared with HVM.

The aim here is to find a way to use ACPI vCPU hotplug for the hardware
domain, while still being able to properly detect and notify Xen of pCPU
hotplug.

[0] https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg00060.html

---8<---

% CPU hotplug support for PVH
% Roger Pau Monné
% Draft C

# Revision History

| Version | Date        | Changes                                           |
|---------|-------------|---------------------------------------------------|
| Draft A | 5 Jan 2017  | Initial draft.                                    |
|---------|-------------|---------------------------------------------------|
| Draft B | 12 Jan 2017 | Removed the XXX comments and clarified some       |
|         |             | sections.                                         |
|         |             |                                                   |
|         |             | Added a sample of the SSDT ASL code that would be |
|         |             | appended to the hardware domain.                  |
|---------|-------------|---------------------------------------------------|
| Draft C | 17 Jan 2017 | Define a _SB.XEN0 bus device and place all the    |
|         |             | processor objects and the GPE block inside of it. |
|         |             |                                                   |
|         |             | Place the GPE status and enable registers and     |
|         |             | the vCPU enable bitmap in memory instead of IO    |
|         |             | space.                                            |

# Preface

This document aims to describe the interface to use in order to implement CPU
hotplug for PVH guests; this applies to hotplug of both physical and virtual
CPUs.

# Introduction

One of the design goals of PVH is to be able to remove as much Xen PV-specific
code as possible, thus limiting the number of Xen PV interfaces used by
guests, and tending to use native interfaces (as used by bare metal) as much
as possible. This is in line with the efforts also being made by Xen on ARM,
and helps reduce the burden of maintaining huge amounts of Xen PV code inside
guest kernels.

This however presents some challenges due to the model used by the Xen
Hypervisor, where some devices are handled by Xen while others are left for
the hardware domain to manage.
The fact that Xen lacks an AML parser also makes it harder, since it cannot
get the full hardware description from the dynamic ACPI tables (DSDT, SSDT)
without the hardware domain's collaboration.

One such issue is CPU enumeration and hotplug, for both the hardware and
unprivileged domains. The aim is to be able to use the same enumeration and
hotplug interface for all PVH guests, regardless of their privilege.

This document aims to describe the interface used in order to fulfill the
following actions:

 * Virtual CPU (vCPU) enumeration at boot time.
 * Hotplug of vCPUs.
 * Hotplug of physical CPUs (pCPUs) to Xen.

# Prior work

## PV CPU hotplug

CPU hotplug for Xen PV guests is implemented using xenstore and hypercalls.
The guest has to set up a watch event on the "cpu/" xenstore node, and react
to changes in this directory. CPUs are added by creating a new node and
setting its "availability" to online:

    cpu/X/availability = "online"

Where X is the vCPU ID. This is an out-of-band method that relies on
Xen-specific interfaces in order to perform CPU hotplug.

## QEMU CPU hotplug using ACPI

The ACPI tables provided to HVM guests contain processor objects, as created
by libacpi. The number of processor objects in the ACPI namespace matches the
maximum number of processors supported by HVM guests (up to 128 at the time
of writing). Processors currently disabled are marked as such in the MADT and
in their \_MAT and \_STA methods.

A PRST operation region in I/O space is also defined, with a size of 128
bits, that's used as a bitmap of enabled vCPUs on the system. A PRSC method
is provided in order to check for updates to the PRST region and trigger
notifications on the affected processor objects. The execution of the PRSC
method is triggered by a GPE event. Then OSPM checks the value returned by
\_STA for the ACPI\_STA\_DEVICE\_PRESENT flag in order to check whether the
vCPU has been enabled.
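As a rough model of the PRST/PRSC mechanism described above: the sketch below
is only illustrative — the real logic is AML generated by libacpi and run by
the guest's ACPI interpreter on a GPE event, and every name here (`prsc_scan`,
`sta`, `STA_DEVICE_PRESENT`) is invented for the example rather than taken
from the actual tables.

```python
# Toy model of the PRST bitmap scan performed by the PRSC method. In reality
# PRST is a 128-bit operation region and PRSC is AML; here both are plain
# Python, and all names are illustrative.

PRST_BITS = 128  # size of the enabled-vCPU bitmap


def prsc_scan(prst, last_seen):
    """Compare the current bitmap with the last observed state and return the
    vCPU IDs whose enable bit changed (these would get a Notify() in AML)."""
    changed = []
    for vcpu in range(PRST_BITS):
        if (prst >> vcpu) & 1 != (last_seen >> vcpu) & 1:
            changed.append(vcpu)
    return changed, prst  # the new "last seen" state is the current bitmap


STA_DEVICE_PRESENT = 0x1  # bit 0 of the _STA return value


def sta(prst, vcpu):
    """Model of a processor object's _STA: present iff its PRST bit is set."""
    return STA_DEVICE_PRESENT if (prst >> vcpu) & 1 else 0


prst = 0b0111            # vCPUs 0-2 enabled at boot
seen = prst
prst |= 1 << 3           # hotplug vCPU 3: its bit is set and a GPE is raised
notified, seen = prsc_scan(prst, seen)
print(notified)                            # [3]
print(sta(prst, 3) & STA_DEVICE_PRESENT)   # 1
```

The point of the two-step dance is visible even in the toy version: the GPE
only says "something changed", and it is the bitmap comparison plus the per
object \_STA check that tells OSPM *which* vCPU appeared.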
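The PV xenstore protocol described under "Prior work" above can be sketched in
the same spirit. This is a hedged illustration only: the real implementation
is the guest kernel's xenbus driver, the xenstore directory is modeled here as
plain function arguments, and `handle_watch_event` /
`parse_availability_event` are hypothetical names.

```python
# Minimal model of the guest-side reaction to "cpu/X/availability" watch
# events. All names are illustrative; no real xenstore API is used.

def parse_availability_event(path, value):
    """Map a write to "cpu/X/availability" to (vcpu_id, online?)."""
    parts = path.split("/")
    if len(parts) != 3 or parts[0] != "cpu" or parts[2] != "availability":
        return None  # not a CPU availability node; ignore it
    return int(parts[1]), value == "online"


def handle_watch_event(online_vcpus, path, value):
    """Update the set of online vCPUs from a single xenstore write."""
    parsed = parse_availability_event(path, value)
    if parsed is None:
        return
    vcpu, online = parsed
    if online:
        online_vcpus.add(vcpu)      # the kernel would bring the vCPU up here
    else:
        online_vcpus.discard(vcpu)  # ...or tear it down


online = {0}  # vCPU 0 is always present at boot
handle_watch_event(online, "cpu/1/availability", "online")
handle_watch_event(online, "cpu/2/availability", "offline")
print(sorted(online))  # [0, 1]
```

Contrasting this with the ACPI mechanism makes the document's point concrete:
here the hotplug signal is a Xen-specific path convention, whereas the ACPI
route reuses the same notification machinery bare-metal firmware uses.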
## Native CPU hotplug

OSPM waits for a notification from ACPI on the processor object, and when an
event is received the return value from \_STA is checked in order to see if
ACPI\_STA\_DEVICE\_PRESENT has been enabled. This notification is triggered
from the method of a GPE block.

# PVH CPU hotplug

The aim, as stated in the introduction, is to use a method as similar as
possible to bare metal CPU hotplug for PVH; this is