Re: [Xen-devel] [DRAFT C] PVH CPU hotplug design document

2017-02-22 Thread Al Stone
On 02/07/2017 05:21 AM, Roger Pau Monné wrote:
> Hello Al,
> 
> Thanks for your comments, please see below.
> 
> On Mon, Feb 06, 2017 at 04:06:45PM -0700, Al Stone wrote:
>> On 01/24/2017 07:20 AM, Boris Ostrovsky wrote:
[snip]

>> Then it gets messy :).  The APIC and/or x2APIC subtables of the MADT are not
>> likely to exist on arm64; chances are just about zero, actually.  There are
>> other similar MADT subtables for arm64, but APIC, x2APIC and many more just
>> won't be there.  There is some overlap with ia64, but not entirely.
> 
> ia64 is also out of the picture here, especially since Xen doesn't support
> it, and it doesn't look like anyone is working on it.

Aw.  That's kind of sad.  I worked on Xen/ia64 briefly many, many moons ago.

Yeah, there are arch differences.  Once you have the x86 side going, though, I
think adding arm64 wouldn't be too bad; it's a little simpler in some respects.

>> The other issue is that a separate name space for the added CPUs would have
>> to be very carefully done.  If not, then the processor hierarchy information
>> in the AML either becomes useless, or at the least inconsistent, and OSPMs
>> are just now beginning to use some of that info to make scheduling decisions.
>> It would be possible to just assume the hot plug CPUs are outside of any
>> existing processor hierarchy, but I would then worry that power management
>> decisions made by the OSPM might be wrong; I can imagine a scenario where
>> a CPU is inserted and shares a power rail with an existing CPU, but the
>> existing CPU is idle so it decides to power off since it's the last in the
>> hierarchy, so the power rail isn't needed, and now the power gets turned off
>> to the unit just plugged in because the OSPM doesn't realize it shares power.
> 
> Well, my suggestion was to add the processor objects of the virtual CPUs
> inside an ACPI Module Device that has the _SB.XEN0 namespace. However, AFAIK
> there's no way to reserve the _SB.XEN0 namespace, so a vendor could use that
> for something else. I think the chances of that happening are very low, but
> it's not impossible.
> 
> Is there any way in ACPI to reserve a namespace for a certain usage? (ie:
> would it be possible to somehow reserve _SB.XEN0 for Xen usage?)

The only really reserved namespace is "_XXX".  The rest is fair game; since one
can only use four characters, I suspect there will be some reluctance to set
aside more.

There are the top-level names (mostly just \_SB these days).  Maybe a top level
\_XEN or \_VRT could work, perhaps with some fairly strict rules on what can be
in that subspace.  I think the issue at that point would be whether or not this
is a solution to a general problem, or if it is something that affects only Xen.

> Or if we want to go more generic, we could reserve _SB.VIRT for generic
> hypervisor usage.

Right.  And this would be one of the key questions from ASWG -- can it be
generalized?

> [snip...] 
> I'm also a member of the ACPI working group, and I was planning to send this
> design document there for further discussion, just haven't found the time yet
> to write a proper mail :(.
> 
> Roger.
> 

No worries.  Getting things started is not too bad; it's the discussion
afterwards that can go on for a while :-).

-- 
ciao,
al
---
Al Stone
Software Engineer
Linaro Enterprise Group
al.st...@linaro.org
---



Re: [Xen-devel] [DRAFT C] PVH CPU hotplug design document

2017-02-07 Thread Roger Pau Monné
Hello Al,

Thanks for your comments, please see below.

On Mon, Feb 06, 2017 at 04:06:45PM -0700, Al Stone wrote:
> On 01/24/2017 07:20 AM, Boris Ostrovsky wrote:
> > 
> >> Yes, the only remaining risk is some vendor using _SB.XEN0, and AFAICT
> >> there's no way to reserve anything in there (mostly because it's assumed
> >> that ACPI tables will be created by a single entity I guess).
> >>
> >> I think that the chance of this happening is 0%, and that there's no
> >> single system out there with a _SB.XEN0 node. I've been wondering whether
> >> I should try to post this to the ACPI working group, and try to get some
> >> feedback there.
> > 
> > If you end up asking there, I'd suggest including Rafael Wysocki and Len
> > Brown (raf...@kernel.org and l...@kernel.org) and maybe 
> > linux-a...@vger.kernel.org as well.
> > 
> > -boris
> > 
> 
> My apologies for not leaping into this discussion earlier; real life has been
> somewhat complicated lately.  Hopefully I won't annoy too many people.
> 
> So, I am on the ASWG (ACPI Spec Working Group) as a Red Hat and/or Linaro
> representative.  To clarify something mentioned quite some time ago, the STAO
> and XENV tables are in the ACPI spec in a special form.  Essentially, there
> are two classes of tables within ACPI: official tables defined in the spec
> itself that are meant to be used anywhere ACPI is used, and tables whose
> names are to be recognized but whose content is defined elsewhere.  The STAO
> and XENV belong to this second class -- the spec reserves their signatures so
> that others do not use them, but then points to an external source -- Xen,
> specifically -- for the definition.  The practical implication is that Xen
> can change the definitions as it wishes, without direct oversight of the
> ASWG.  Just the same, it is considered bad form to do so, so new revisions
> should at least be sent to the ASWG for discussion (it may make sense to pull
> the table into the spec itself...).  Stefano and I worked together to get the
> original reservation made for the STAO and XENV tables.
> 
> The other thing I've noticed so far in the discussion is that everything
> discussed may work on x86 or ia64, but will not work at all on arm64.  The
> HARDWARE_REDUCED flag in the FADT was mentioned -- this is the crux of the
> problem.  For arm64, that flag is required to be set, so overloading it is
> most definitely an issue.  More problematic, however, is the notion of using
> GPE blocks; when the HARDWARE_REDUCED flag is set, the spec requires that GPE
> block definitions be ignored.

Yes, this document is specific to x86. I believe the differences between x86
and ARM regarding ACPI would make it too complicated to come up with a
solution that's usable on both; the ACPI tables on the two architectures are
already too different.

> Then it gets messy :).  The APIC and/or x2APIC subtables of the MADT are not
> likely to exist on arm64; chances are just about zero, actually.  There are
> other similar MADT subtables for arm64, but APIC, x2APIC and many more just
> won't be there.  There is some overlap with ia64, but not entirely.

ia64 is also out of the picture here, especially since Xen doesn't support it,
and it doesn't look like anyone is working on it.

> The other issue is that a separate name space for the added CPUs would have
> to be very carefully done.  If not, then the processor hierarchy information
> in the AML either becomes useless, or at the least inconsistent, and OSPMs
> are just now beginning to use some of that info to make scheduling decisions.
> It would be possible to just assume the hot plug CPUs are outside of any
> existing processor hierarchy, but I would then worry that power management
> decisions made by the OSPM might be wrong; I can imagine a scenario where
> a CPU is inserted and shares a power rail with an existing CPU, but the
> existing CPU is idle so it decides to power off since it's the last in the
> hierarchy, so the power rail isn't needed, and now the power gets turned off
> to the unit just plugged in because the OSPM doesn't realize it shares power.

Well, my suggestion was to add the processor objects of the virtual CPUs inside
an ACPI Module Device that has the _SB.XEN0 namespace. However, AFAIK there's
no way to reserve the _SB.XEN0 namespace, so a vendor could use that for
something else. I think the chances of that happening are very low, but it's
not impossible.
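
For illustration, such a layout might look roughly like the following ASL
(only a sketch, not the ASL the toolstack would actually emit; the VPxx names,
_UID values and the _STA body are placeholders):

    Scope (\_SB)
    {
        Device (XEN0)
        {
            Name (_HID, "ACPI0004")         // ACPI Module Device
            Name (_UID, "Xen")

            Device (VP00)                   // one device per virtual CPU
            {
                Name (_HID, "ACPI0007")     // Processor Device
                Name (_UID, Zero)
                Method (_STA, 0)
                {
                    Return (0x0F)           // present, enabled, functioning
                }
            }
            // ... further VPxx devices ...
        }
    }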

Is there any way in ACPI to reserve a namespace for a certain usage? (ie: would
it be possible to somehow reserve _SB.XEN0 for Xen usage?)

Or if we want to go more generic, we could reserve _SB.VIRT for generic
hypervisor usage.

> So at a minimum, it sounds like there would need to be a solution for each
> architecture, with maybe some fiddling around on ia64, too.  Unfortunately,
> I believe the ACPI spec provides a way to handle all of the things wanted,
> but an ASL interpreter would be required because it does rely on executing
> methods (e.g., _CRS to determine processor resources on hot plug).

Re: [Xen-devel] [DRAFT C] PVH CPU hotplug design document

2017-02-06 Thread Al Stone
On 01/24/2017 07:20 AM, Boris Ostrovsky wrote:
> 
>> Yes, the only remaining risk is some vendor using _SB.XEN0, and AFAICT
>> there's no way to reserve anything in there (mostly because it's assumed
>> that ACPI tables will be created by a single entity I guess).
>>
>> I think that the chance of this happening is 0%, and that there's no single
>> system out there with a _SB.XEN0 node. I've been wondering whether I should
>> try to post this to the ACPI working group, and try to get some feedback
>> there.
> 
> If you end up asking there, I'd suggest including Rafael Wysocki and Len
> Brown (raf...@kernel.org and l...@kernel.org) and maybe 
> linux-a...@vger.kernel.org as well.
> 
> -boris
> 

My apologies for not leaping into this discussion earlier; real life has been
somewhat complicated lately.  Hopefully I won't annoy too many people.

So, I am on the ASWG (ACPI Spec Working Group) as a Red Hat and/or Linaro
representative.  To clarify something mentioned quite some time ago, the STAO
and XENV tables are in the ACPI spec in a special form.  Essentially, there are
two classes of tables within ACPI: official tables defined in the spec itself
that are meant to be used anywhere ACPI is used, and tables whose names are to
be recognized but whose content is defined elsewhere.  The STAO and XENV belong
to this second class -- the spec reserves their signatures so that others do
not use them, but then points to an external source -- Xen, specifically -- for
the definition.  The practical implication is that Xen can change the
definitions as it wishes, without direct oversight of the ASWG.  Just the same,
it is considered bad form to do so, so new revisions should at least be sent to
the ASWG for discussion (it may make sense to pull the table into the spec
itself...).  Stefano and I worked together to get the original reservation made
for the STAO and XENV tables.

The other thing I've noticed so far in the discussion is that everything
discussed may work on x86 or ia64, but will not work at all on arm64.  The
HARDWARE_REDUCED flag in the FADT was mentioned -- this is the crux of the
problem.  For arm64, that flag is required to be set, so overloading it is most
definitely an issue.  More problematic, however, is the notion of using GPE
blocks; when the HARDWARE_REDUCED flag is set, the spec requires that GPE block
definitions be ignored.

Then it gets messy :).  The APIC and/or x2APIC subtables of the MADT are not
likely to exist on arm64; chances are just about zero, actually.  There are
other similar MADT subtables for arm64, but APIC, x2APIC and many more just
won't be there.  There is some overlap with ia64, but not entirely.

The other issue is that a separate name space for the added CPUs would have
to be very carefully done.  If not, then the processor hierarchy information
in the AML either becomes useless, or at the least inconsistent, and OSPMs
are just now beginning to use some of that info to make scheduling decisions.
It would be possible to just assume the hot plug CPUs are outside of any
existing processor hierarchy, but I would then worry that power management
decisions made by the OSPM might be wrong; I can imagine a scenario where
a CPU is inserted and shares a power rail with an existing CPU, but the
existing CPU is idle so it decides to power off since it's the last in the
hierarchy, so the power rail isn't needed, and now the power gets turned off
to the unit just plugged in because the OSPM doesn't realize it shares power.
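
For concreteness, here is a rough ASL sketch of the kind of hierarchy meant
above (illustrative only; the names and layout are made up) -- two CPUs grouped
under a processor container that the OSPM may treat as a shared power domain:

    Scope (\_SB)
    {
        Device (PKG0)                       // e.g. a package sharing a power rail
        {
            Name (_HID, "ACPI0010")         // Processor Container Device

            Device (CPU0)
            {
                Name (_HID, "ACPI0007")     // Processor Device
                Name (_UID, Zero)
            }
            Device (CPU1)
            {
                Name (_HID, "ACPI0007")
                Name (_UID, One)
            }
        }
    }

A hot plugged vCPU dropped in outside of PKG0 would not be visible to the OSPM
as sharing that power domain, which is exactly the scenario described above.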

So at a minimum, it sounds like there would need to be a solution for each
architecture, with maybe some fiddling around on ia64, too.  Unfortunately,
I believe the ACPI spec provides a way to handle all of the things wanted,
but an ASL interpreter would be required because it does rely on executing
methods (e.g., _CRS to determine processor resources on hot plug).  The ACPICA
code is dual-licensed, GPL and commercial, and there is the OpenBSD code.
But without an interpreter, it feels like we're trying to push dynamic
behavior into static tables, and they really weren't designed for that.
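
As an example (a made-up fragment, not taken from any real firmware), the
per-processor information can live entirely behind a method that has to be
executed, so a static table parser never sees it:

    Device (CP02)
    {
        Name (_HID, "ACPI0007")             // Processor Device
        Name (_UID, 2)
        Method (_MAT, 0)                    // must be evaluated at hot plug time
        {
            // MADT Local APIC entry: type 0, length 8, ACPI ID 2,
            // APIC ID 2, flags = enabled
            Return (Buffer (8) {0x00, 0x08, 0x02, 0x02, 0x01, 0x00, 0x00, 0x00})
        }
    }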

That's my $0.02 worth, at least.

-- 
ciao,
al
---
Al Stone
Software Engineer
Linaro Enterprise Group
al.st...@linaro.org
---



Re: [Xen-devel] [DRAFT C] PVH CPU hotplug design document

2017-01-24 Thread Boris Ostrovsky

> Yes, the only remaining risk is some vendor using _SB.XEN0, and AFAICT there's
> no way to reserve anything in there (mostly because it's assumed that ACPI
> tables will be created by a single entity I guess).
>
> I think that the chance of this happening is 0%, and that there's no single
> system out there with a _SB.XEN0 node. I've been wondering whether I should
> try to post this to the ACPI working group, and try to get some feedback
> there.

If you end up asking there, I'd suggest including Rafael Wysocki and Len
Brown (raf...@kernel.org and l...@kernel.org) and maybe 
linux-a...@vger.kernel.org as well.

-boris




Re: [Xen-devel] [DRAFT C] PVH CPU hotplug design document

2017-01-23 Thread Jan Beulich
>>> On 23.01.17 at 18:12,  wrote:
> On Mon, Jan 23, 2017 at 09:55:19AM -0700, Jan Beulich wrote:
>> >>> On 23.01.17 at 17:42,  wrote:
>> > On Mon, Jan 23, 2017 at 09:30:30AM -0700, Jan Beulich wrote:
>> >> >>> On 17.01.17 at 18:14,  wrote:
>> >> > This can be solved by using a different ACPI name in order to describe
>> >> > vCPUs in the ACPI namespace. Most hardware vendors tend to use CPU or
>> >> > PR prefixes for the processor objects, so using a 'VP' (ie: Virtual
>> >> > Processor) prefix should prevent clashes.
>> >> 
>> >> I continue to think that this is insufficient, without seeing a nice
>> >> clean way to solve the issue properly.
>> > 
>> > But in this document the namespace path for processor objects will be
>> > _SB.XEN0.VPXX, which should prevent any namespace clashes. Maybe I should
>> > have updated the wording here: every Xen-related ACPI bit will be inside
>> > the _SB.XEN0 namespace.
>> 
>> Well, if we want to introduce our own parent name space, why the
>> special naming convention then? Any name not colliding with other
>> things in _SB.XEN0 should do then, so the only remaining risk would
>> then be that the firmware also has _SB.XEN0.
> 
> Right, that's why I said I should have reworded this. We can then use PXXX,
> CXXX or whatever we want.
> 
> Yes, the only remaining risk is some vendor using _SB.XEN0, and AFAICT there's
> no way to reserve anything in there (mostly because it's assumed that ACPI
> tables will be created by a single entity I guess).

Right.

> I think that the chance of this happening is 0%, and that there's no single
> system out there with a _SB.XEN0 node. I've been wondering whether I should
> try to post this to the ACPI working group, and try to get some feedback
> there.

As you've said during some earlier discussion, it won't hurt to give
this a try.

Jan




Re: [Xen-devel] [DRAFT C] PVH CPU hotplug design document

2017-01-23 Thread Roger Pau Monné
On Mon, Jan 23, 2017 at 09:55:19AM -0700, Jan Beulich wrote:
> >>> On 23.01.17 at 17:42,  wrote:
> > On Mon, Jan 23, 2017 at 09:30:30AM -0700, Jan Beulich wrote:
> >> >>> On 17.01.17 at 18:14,  wrote:
> >> > This can be solved by using a different ACPI name in order to describe
> >> > vCPUs in the ACPI namespace. Most hardware vendors tend to use CPU or PR
> >> > prefixes for the processor objects, so using a 'VP' (ie: Virtual
> >> > Processor) prefix should prevent clashes.
> >> 
> >> I continue to think that this is insufficient, without seeing a nice
> >> clean way to solve the issue properly.
> > 
> > But in this document the namespace path for processor objects will be
> > _SB.XEN0.VPXX, which should prevent any namespace clashes. Maybe I should
> > have updated the wording here: every Xen-related ACPI bit will be inside
> > the _SB.XEN0 namespace.
> 
> Well, if we want to introduce our own parent name space, why the
> special naming convention then? Any name not colliding with other
> things in _SB.XEN0 should do then, so the only remaining risk would
> then be that the firmware also has _SB.XEN0.

Right, that's why I said I should have reworded this. We can then use PXXX,
CXXX or whatever we want.

Yes, the only remaining risk is some vendor using _SB.XEN0, and AFAICT there's
no way to reserve anything in there (mostly because it's assumed that ACPI
tables will be created by a single entity I guess).

I think that the chance of this happening is 0%, and that there's no single
system out there with a _SB.XEN0 node. I've been wondering whether I should try
to post this to the ACPI working group, and try to get some feedback there.

Roger.



Re: [Xen-devel] [DRAFT C] PVH CPU hotplug design document

2017-01-23 Thread Jan Beulich
>>> On 23.01.17 at 17:42,  wrote:
> On Mon, Jan 23, 2017 at 09:30:30AM -0700, Jan Beulich wrote:
>> >>> On 17.01.17 at 18:14,  wrote:
>> > This can be solved by using a different ACPI name in order to describe
>> > vCPUs in the ACPI namespace. Most hardware vendors tend to use CPU or PR
>> > prefixes for the processor objects, so using a 'VP' (ie: Virtual
>> > Processor) prefix should prevent clashes.
>> 
>> I continue to think that this is insufficient, without seeing a nice
>> clean way to solve the issue properly.
> 
> But in this document the namespace path for processor objects will be
> _SB.XEN0.VPXX, which should prevent any namespace clashes. Maybe I should have
> updated the wording here: every Xen-related ACPI bit will be inside the
> _SB.XEN0 namespace.

Well, if we want to introduce our own parent name space, why the
special naming convention then? Any name not colliding with other
things in _SB.XEN0 should do then, so the only remaining risk would
then be that the firmware also has _SB.XEN0.

Jan




Re: [Xen-devel] [DRAFT C] PVH CPU hotplug design document

2017-01-23 Thread Roger Pau Monné
On Mon, Jan 23, 2017 at 09:30:30AM -0700, Jan Beulich wrote:
> >>> On 17.01.17 at 18:14,  wrote:
> > This can be solved by using a different ACPI name in order to describe
> > vCPUs in the ACPI namespace. Most hardware vendors tend to use CPU or PR
> > prefixes for the processor objects, so using a 'VP' (ie: Virtual Processor)
> > prefix should prevent clashes.
> 
> I continue to think that this is insufficient, without seeing a nice
> clean way to solve the issue properly.

But in this document the namespace path for processor objects will be
_SB.XEN0.VPXX, which should prevent any namespace clashes. Maybe I should have
updated the wording here: every Xen-related ACPI bit will be inside the
_SB.XEN0 namespace.

Roger.



Re: [Xen-devel] [DRAFT C] PVH CPU hotplug design document

2017-01-23 Thread Jan Beulich
>>> On 17.01.17 at 18:14,  wrote:
> This can be solved by using a different ACPI name in order to describe vCPUs
> in the ACPI namespace. Most hardware vendors tend to use CPU or PR prefixes
> for the processor objects, so using a 'VP' (ie: Virtual Processor) prefix
> should prevent clashes.

I continue to think that this is insufficient, without seeing a nice
clean way to solve the issue properly.

Jan




[Xen-devel] [DRAFT C] PVH CPU hotplug design document

2017-01-17 Thread Roger Pau Monné
Hello,

Below is a draft of a design document for PVHv2 CPU hotplug. It should cover
both vCPU and pCPU hotplug. It's mainly centered around the hardware domain,
since for unprivileged PVH guests the vCPU hotplug mechanism is already
described in Boris' series [0], and it's shared with HVM.

The aim here is to find a way to use ACPI vCPU hotplug for the hardware domain,
while still being able to properly detect and notify Xen of pCPU hotplug.

[0] https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg00060.html

---8<---
% CPU hotplug support for PVH
% Roger Pau Monné 
% Draft C

# Revision History

| Version | Date| Changes   |
|-|-|---|
| Draft A | 5 Jan 2017  | Initial draft.|
|-|-|---|
| Draft B | 12 Jan 2017 | Removed the XXX comments and clarify some |
| | | sections. |
| | |   |
| | | Added a sample of the SSDT ASL code that would be |
| | | appended to the hardware domain.  |
|-|-|---|
| Draft C | 17 Jan 2017 | Define a _SB.XEN0 bus device and place all the|
| | | processor objects and the GPE block inside of it. |
| | |   |
| | | Place the GPE status and enable registers and |
| | | the vCPU enable bitmap in memory instead of IO|
| | | space.|

# Preface

This document aims to describe the interface to use in order to implement CPU
hotplug for PVH guests; this applies to hotplug of both physical and virtual
CPUs.

# Introduction

One of the design goals of PVH is to be able to remove as much Xen PV-specific
code as possible, thus limiting the number of Xen PV interfaces used by guests,
and tending to use native interfaces (as used by bare metal) as much as
possible. This is in line with the efforts also done by Xen on ARM and helps
reduce the burden of maintaining huge amounts of Xen PV code inside guest
kernels.

This however presents some challenges due to the model used by the Xen
Hypervisor, where some devices are handled by Xen while others are left for the
hardware domain to manage. The fact that Xen lacks an AML parser also makes it
harder, since it cannot get the full hardware description from dynamic ACPI
tables (DSDT, SSDT) without the hardware domain's collaboration.

One such issue is CPU enumeration and hotplug, for both the hardware and
unprivileged domains. The aim is to be able to use the same enumeration and
hotplug interface for all PVH guests, regardless of their privilege.

This document aims to describe the interface used in order to fulfill the
following actions:

 * Virtual CPU (vCPU) enumeration at boot time.
 * Hotplug of vCPUs.
 * Hotplug of physical CPUs (pCPUs) to Xen.

# Prior work

## PV CPU hotplug

CPU hotplug for Xen PV guests is implemented using xenstore and hypercalls. The
guest has to set up a watch event on the "cpu/" xenstore node, and react to
changes in this directory. CPUs are added by creating a new node and setting its
"availability" to online:

cpu/X/availability = "online"

Where X is the vCPU ID. This is an out-of-band method that relies on
Xen-specific interfaces in order to perform CPU hotplug.

## QEMU CPU hotplug using ACPI

The ACPI tables provided to HVM guests contain processor objects, as created by
libacpi. The number of processor objects in the ACPI namespace matches the
maximum number of processors supported by HVM guests (up to 128 at the time of
writing). Processors currently disabled are marked as such in the MADT and in
their \_MAT and \_STA methods.

A PRST operation region in I/O space is also defined, with a size of 128 bits,
that's used as a bitmap of enabled vCPUs on the system. A PRSC method is
provided in order to check for updates to the PRST region and trigger
notifications on the affected processor objects. The execution of the PRSC
method is done by a GPE event. The OSPM then checks the value returned by \_STA
for the ACPI\_STA\_DEVICE\_PRESENT flag in order to determine whether the vCPU
has been enabled.
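
The following simplified ASL sketch illustrates the mechanism described above.
It is not the exact output of libacpi; the I/O port, field and method names and
the GPE number are illustrative only:

    OperationRegion (PRST, SystemIO, 0xAF00, 0x10)   // 128-bit vCPU bitmap
    Field (PRST, ByteAcc, NoLock, Preserve)
    {
        PRS, 128
    }

    Method (PRSC, 0)                                 // scan for changes
    {
        Store (ToBuffer (PRS), Local0)
        // For each processor object, compare its bit in Local0 with the last
        // known state and, on a change, issue a device check, e.g.:
        //     Notify (\_SB.PR00, 0x01)
        // so that the OSPM re-evaluates _STA (and _MAT) for that vCPU.
    }

    Scope (\_GPE)
    {
        Method (_E02, 0)                             // GPE event handler
        {
            \PRSC ()
        }
    }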

## Native CPU hotplug

The OSPM waits for a notification from ACPI on the processor object, and when
an event is received the return value of \_STA is checked in order to see
whether ACPI\_STA\_DEVICE\_PRESENT is set. This notification is triggered from
the method of a GPE block.
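
A minimal sketch of that path (the GPE number and device name are made up):

    Scope (\_GPE)
    {
        Method (_E03, 0)                  // hypothetical GPE wired to CPU hotplug
        {
            Notify (\_SB.CPU1, 0x01)      // device check: OSPM re-evaluates _STA
        }
    }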

# PVH CPU hotplug

The aim as stated in the introduction is to use a method as similar as possible
to bare metal CPU hotplug for PVH, this is