Re: [Xen-devel] QEMU XenServer/XenProject Working group meeting 29th September 2016

2016-10-20 Thread Stefano Stabellini
On Thu, 20 Oct 2016, Lars Kurth wrote:
> > On 18 Oct 2016, at 20:54, Stefano Stabellini  wrote:
> > 
> > I think this kind of call should be announced on xen-devel before it
> > happens, to give other people a chance to participate (I cannot
> > promise I would have participated, but it is the principle that counts).
> > 
> > If I missed the announcement, I apologize.
> 
> Stefano, the meeting started off as an internal meeting to brainstorm and
> share experiences and challenges we have with QEMU amongst different Citrix
> teams, with a view to getting a wider dialog started. Maybe we are at the
> stage where it makes sense to open it up.

No worries, I didn't mean to pick on you guys, as I wrote I am not sure
I would actually have participated, but I think the meeting would work
better for you and the Xen community if it was open. In fact I think
that we don't have enough open meetings in the Xen community in general:
for your information Julien and I are going to start organizing one for
Xen on ARM soon, with the intention of making it a regular monthly
meeting.


> > On Fri, 14 Oct 2016, Jennifer Herbert wrote:
> >> XenStore
> >> --------
> >> 
> >> For the non-pv part of QEMU, XenStore is only used in two places.
> >> There is the DM state, and the physmap mechanism.  Although there is a
> >> vague plan for replacing the physmap mechanism, it is some way off.
> >> 
> >> The DM state key is used for knowing when the QEMU process is running,
> >> and so on.  QMP would seem to be an option to replace it; however, there
> >> is no (nice) way to wait on a socket until it has been opened.  One
> >> solution might be to use Xenstore to let you know that the QMP sockets
> >> are available, before QEMU drops privileges, and then QMP could be
> >> used to know QEMU is in the running state.
> >>
> >> To avoid the need to use xs-restrict, you would need to both replace
> >> physmap and rework the QEMU startup procedure. The use of xs-restrict
> >> would be more expedient, and does not look to need that much work.
> >>
> >> There was discussion over how secure it would be to allow a guest access
> >> to these Xenstore keys - it was concluded that a guest could mostly
> >> only mess itself up.  If a guest attempted to prevent itself from being
> >> migrated, the toolstack would time it out, and could kill it.
> >>
> >> There followed a discussion on the Xenbus protocol, and the additions
> >> needed.  The aim is merely to restrict the permissions for the command
> >> to those of the guest whose domid you provide.  It was proposed that
> >> it uses the header as is, with its 16 bytes, with the command
> >> 'one-time-restrict', and then the payload would have two additional
> >> fields at the start.  These two fields would correspond to the domid to
> >> restrict as, and the real command. Transaction ID and tags would be
> >> taken from the real header.
> >>
> >> Although inter-domain xs-restrict is not specifically needed for this
> >> project, it is thought it might be a blocking item for upstream
> >> acceptance.  It is thought these changes would not require that much
> >> work to implement, and may be useful in other use cases. Only a few
> >> changes to QEMU would be needed, and libxl should be able to track
> >> QEMU versions.  Ian Jackson volunteered to look at this, with David
> >> helping with the kernel bits.  Ian won't have time to look at this
> >> until after Xen 4.8 is released.
> >>
> >> There was discussion about what may fail once privileges are taken away,
> >> which would include CDs and PCI passthrough.  It is thought the full
> >> list can only be known by trying.  Not everything needs to work for
> >> acceptance upstream, such as PCI passthrough.  If such an
> >> incompatible feature is needed, restrictions can be turned off.  These
> >> problems can be fixed in a later phase, with CDs likely being at the
> >> top of the list.
> > 
> > One thing to note is that xs-restrict is unimplemented in cxenstored.
> > 
> > 
> >> disaggregation
> >> ==============
> >> 
> >> A disaggregation proposal which had previously been posted to a QEMU
> >> forum was discussed.  It was not previously accepted by all. The big
> >> question was how to separate the device models from the machine, with
> >> a particular point of contention being around PIIX and the idea of
> >> starting a QEMU instance without one.
> > 
> > Right. In particular I tend to agree with the other QEMU maintainers
> > when they say: why ask for a PIIX3 compatible machine, when actually you
> > don't want to be PIIX3 compatible?
> > 
> > 
> >> The general desire from us is
> >> we want to have a specific device emulated and nothing else.
> > 
> > This is really not possible with QEMU, because QEMU is a machine
> > emulator, not a device emulator. BTW who wants this? I mean, why is this
> > part of the QEMU depriv discussion? It is not necessary. I think what we
> > want for QEMU depriv is to be able to build a QEMU PV machine with 

Re: [Xen-devel] QEMU XenServer/XenProject Working group meeting 29th September 2016

2016-10-20 Thread Lars Kurth

> On 18 Oct 2016, at 20:54, Stefano Stabellini  wrote:
> 
> I think this kind of call should be announced on xen-devel before it
> happens, to give other people a chance to participate (I cannot
> promise I would have participated, but it is the principle that counts).
> 
> If I missed the announcement, I apologize.

Stefano, the meeting started off as an internal meeting to brainstorm and share
experiences and challenges we have with QEMU amongst different Citrix teams,
with a view to getting a wider dialog started. Maybe we are at the stage where
it makes sense to open it up.

> On Fri, 14 Oct 2016, Jennifer Herbert wrote:
>> XenStore
>> --------
>> 
>> For the non-pv part of QEMU, XenStore is only used in two places.
>> There is the DM state, and the physmap mechanism.  Although there is a
>> vague plan for replacing the physmap mechanism, it is some way off.
>> 
>> The DM state key is used for knowing when the QEMU process is running,
>> and so on.  QMP would seem to be an option to replace it; however, there
>> is no (nice) way to wait on a socket until it has been opened.  One
>> solution might be to use Xenstore to let you know that the QMP sockets
>> are available, before QEMU drops privileges, and then QMP could be
>> used to know QEMU is in the running state.
>>
>> To avoid the need to use xs-restrict, you would need to both replace
>> physmap and rework the QEMU startup procedure. The use of xs-restrict
>> would be more expedient, and does not look to need that much work.
>>
>> There was discussion over how secure it would be to allow a guest access
>> to these Xenstore keys - it was concluded that a guest could mostly
>> only mess itself up.  If a guest attempted to prevent itself from being
>> migrated, the toolstack would time it out, and could kill it.
>>
>> There followed a discussion on the Xenbus protocol, and the additions
>> needed.  The aim is merely to restrict the permissions for the command
>> to those of the guest whose domid you provide.  It was proposed that
>> it uses the header as is, with its 16 bytes, with the command
>> 'one-time-restrict', and then the payload would have two additional
>> fields at the start.  These two fields would correspond to the domid to
>> restrict as, and the real command. Transaction ID and tags would be
>> taken from the real header.
>>
>> Although inter-domain xs-restrict is not specifically needed for this
>> project, it is thought it might be a blocking item for upstream
>> acceptance.  It is thought these changes would not require that much
>> work to implement, and may be useful in other use cases. Only a few
>> changes to QEMU would be needed, and libxl should be able to track
>> QEMU versions.  Ian Jackson volunteered to look at this, with David
>> helping with the kernel bits.  Ian won't have time to look at this
>> until after Xen 4.8 is released.
>>
>> There was discussion about what may fail once privileges are taken away,
>> which would include CDs and PCI passthrough.  It is thought the full
>> list can only be known by trying.  Not everything needs to work for
>> acceptance upstream, such as PCI passthrough.  If such an
>> incompatible feature is needed, restrictions can be turned off.  These
>> problems can be fixed in a later phase, with CDs likely being at the
>> top of the list.
> 
> One thing to note is that xs-restrict is unimplemented in cxenstored.
> 
> 
>> disaggregation
>> ==============
>> 
>> A disaggregation proposal which had previously been posted to a QEMU
>> forum was discussed.  It was not previously accepted by all. The big
>> question was how to separate the device models from the machine, with
>> a particular point of contention being around PIIX and the idea of
>> starting a QEMU instance without one.
> 
> Right. In particular I tend to agree with the other QEMU maintainers
> when they say: why ask for a PIIX3 compatible machine, when actually you
> don't want to be PIIX3 compatible?
> 
> 
>> The general desire from us is
>> we want to have a specific device emulated and nothing else.
> 
> This is really not possible with QEMU, because QEMU is a machine
> emulator, not a device emulator. BTW who wants this? I mean, why is this
> part of the QEMU depriv discussion? It is not necessary. I think what we
> want for QEMU depriv is to be able to build a QEMU PV machine with just
> the PV backends in it, which is attainable with the current
> architecture. I know there are use cases for having an emulator of just
> one device, but I don't think they should be confused with the more
> important underlying issue here, which is QEMU running with full
> privileges.
> 
> 
>> It is
>> suggested you would have a software interface between each device that
>> looked like a software version of PCI.  The PIIX device could be attached
>> to the CPU via this pseudo PCI interface.  This would fit in well with how
>> the IOREQ server and IOMMU work.  Although this sounds like a large
>> architectural change is wanted, it's suggested that 

Re: [Xen-devel] QEMU XenServer/XenProject Working group meeting 29th September 2016

2016-10-18 Thread Stefano Stabellini
I think this kind of call should be announced on xen-devel before it
happens, to give other people a chance to participate (I cannot
promise I would have participated, but it is the principle that counts).

If I missed the announcement, I apologize.


On Fri, 14 Oct 2016, Jennifer Herbert wrote:
> XenStore
> --------
> 
> For the non-pv part of QEMU, XenStore is only used in two places.
> There is the DM state, and the physmap mechanism.  Although there is a
> vague plan for replacing the physmap mechanism, it is some way off.
> 
> The DM state key is used for knowing when the QEMU process is running,
> and so on.  QMP would seem to be an option to replace it; however, there
> is no (nice) way to wait on a socket until it has been opened.  One
> solution might be to use Xenstore to let you know that the QMP sockets
> are available, before QEMU drops privileges, and then QMP could be
> used to know QEMU is in the running state.
>
> To avoid the need to use xs-restrict, you would need to both replace
> physmap and rework the QEMU startup procedure. The use of xs-restrict
> would be more expedient, and does not look to need that much work.
>
> There was discussion over how secure it would be to allow a guest access
> to these Xenstore keys - it was concluded that a guest could mostly
> only mess itself up.  If a guest attempted to prevent itself from being
> migrated, the toolstack would time it out, and could kill it.
>
> There followed a discussion on the Xenbus protocol, and the additions
> needed.  The aim is merely to restrict the permissions for the command
> to those of the guest whose domid you provide.  It was proposed that
> it uses the header as is, with its 16 bytes, with the command
> 'one-time-restrict', and then the payload would have two additional
> fields at the start.  These two fields would correspond to the domid to
> restrict as, and the real command. Transaction ID and tags would be
> taken from the real header.
>
> Although inter-domain xs-restrict is not specifically needed for this
> project, it is thought it might be a blocking item for upstream
> acceptance.  It is thought these changes would not require that much
> work to implement, and may be useful in other use cases. Only a few
> changes to QEMU would be needed, and libxl should be able to track
> QEMU versions.  Ian Jackson volunteered to look at this, with David
> helping with the kernel bits.  Ian won't have time to look at this
> until after Xen 4.8 is released.
>
> There was discussion about what may fail once privileges are taken away,
> which would include CDs and PCI passthrough.  It is thought the full
> list can only be known by trying.  Not everything needs to work for
> acceptance upstream, such as PCI passthrough.  If such an
> incompatible feature is needed, restrictions can be turned off.  These
> problems can be fixed in a later phase, with CDs likely being at the
> top of the list.

One thing to note is that xs-restrict is unimplemented in cxenstored.

 
> disaggregation
> ==============
> 
> A disaggregation proposal which had previously been posted to a QEMU
> forum was discussed.  It was not previously accepted by all. The big
> question was how to separate the device models from the machine, with
> a particular point of contention being around PIIX and the idea of
> starting a QEMU instance without one.

Right. In particular I tend to agree with the other QEMU maintainers
when they say: why ask for a PIIX3 compatible machine, when actually you
don't want to be PIIX3 compatible?


> The general desire from us is
> we want to have a specific device emulated and nothing else.

This is really not possible with QEMU, because QEMU is a machine
emulator, not a device emulator. BTW who wants this? I mean, why is this
part of the QEMU depriv discussion? It is not necessary. I think what we
want for QEMU depriv is to be able to build a QEMU PV machine with just
the PV backends in it, which is attainable with the current
architecture. I know there are use cases for having an emulator of just
one device, but I don't think they should be confused with the more
important underlying issue here, which is QEMU running with full
privileges.


> It is
> suggested you would have a software interface between each device that
> looked like a software version of PCI.  The PIIX device could be attached to
> the CPU via this pseudo PCI interface.  This would fit in well with how the
> IOREQ server and IOMMU work.  Although this sounds like a large
> architectural change is wanted, it's suggested that actually it's just
> that we're asking them to take a different stability and plug-ability
> posture on the interfaces they already have.
> 
> This architectural issue is the cause behind lots of little
> annoyances, which have been going on for years. Xen is having to make
> up lots of strange stuff to keep QEMU happy, and there is confusion
> over memory ownership.  Fixing the architecture  should make our lives
> much easier.  These architectural issues are also 

[Xen-devel] QEMU XenServer/XenProject Working group meeting 29th September 2016

2016-10-14 Thread Jennifer Herbert

QEMU XenServer/XenProject Working group meeting 29th September 2016
===================================================================

Attendees
---------

David Vrabel
Jennifer Herbert
Ian Jackson
Andrew Cooper
Paul Durrant
Lars Kurth

QEMU depriv
===========

DMOP
----

There has been agreement on list on the DMOP proposal.  The HVMCTL
patch series, which was previously proposed, should need only mechanical
changes to be used as a basis for DMOP.

Privcmd
-------

The privcmd changes should be fairly trivial to implement. Libxc
would need changing, but this code is also in the HVMCTL patch
series.  This means the only thing needed is for QEMU to call the
restrict ioctl to enable it.  If the restrict ioctl is missing, an error
would be returned.  QEMU would probably want an option to indicate that
de-priv is required.  Given this option, QEMU would raise an error
if the restrict ioctl was not present.
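
The behaviour described above could be sketched roughly as follows. This
is illustrative only: `restrict` stands in for whatever wrapper ends up
issuing the privcmd restrict ioctl, `depriv_required` models the proposed
QEMU option, and treating ENOTTY as "ioctl not present" is an assumption
about how a kernel without the ioctl would respond.

```python
import errno


class DeprivUnavailable(RuntimeError):
    """Raised when de-priv was requested but cannot be enforced."""


def apply_restriction(restrict, domid, depriv_required):
    """Try to restrict the privcmd handle to a single domain.

    restrict        -- hypothetical callable wrapping the restrict ioctl;
                       assumed to raise OSError(ENOTTY) when it is missing
    depriv_required -- models the proposed QEMU option
    Returns True if the restriction is in force, False if it was skipped.
    """
    try:
        restrict(domid)
    except OSError as exc:
        if exc.errno != errno.ENOTTY:
            raise                      # a real failure, not a missing ioctl
        if depriv_required:
            raise DeprivUnavailable(
                "restrict ioctl not present, refusing to run") from exc
        return False                   # no ioctl available, run unrestricted
    return True
```

Without the option, a missing ioctl is tolerated and QEMU runs as it does
today; with the option, the same condition is fatal.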

In order to avoid accidents due to ABI instability, old DMOP numbers would
be retired when a DMOP is changed in an ABI-incompatible way - there is
no shortage of new DMOP numbers.
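
As a toy model of that numbering policy (not Xen code; the class and
method names are invented for illustration), an op table can refuse to
reuse a number once the op behind it has changed incompatibly:

```python
class DmopTable:
    """Toy op table in which a retired number is never handed out again."""

    def __init__(self):
        self._next = 1
        self._live = {}          # number -> handler
        self._retired = set()

    def register(self, handler):
        """Allocate the next unused number for a handler."""
        number = self._next
        self._next += 1
        self._live[number] = handler
        return number

    def change_abi(self, old_number, new_handler):
        """Retire old_number and return a fresh number for the new ABI."""
        del self._live[old_number]
        self._retired.add(old_number)
        return self.register(new_handler)

    def dispatch(self, number, *args):
        """Call the handler, rejecting retired and unknown numbers."""
        if number in self._retired:
            raise LookupError(f"DMOP {number} has been retired")
        if number not in self._live:
            raise LookupError(f"DMOP {number} is unknown")
        return self._live[number](*args)
```

A caller built against the old ABI then fails cleanly with an error
rather than having its arguments misinterpreted by the new handler.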

Eventchan
---------

Eventchan has restrictions in 4.7, but the libxc parts need to be done.
This should not be much work.

XenStore
--------

For the non-pv part of QEMU, XenStore is only used in two places.
There is the DM state, and the physmap mechanism.  Although there is a
vague plan for replacing the physmap mechanism, it is some way off.

The DM state key is used for knowing when the QEMU process is running,
and so on.  QMP would seem to be an option to replace it; however, there
is no (nice) way to wait on a socket until it has been opened.  One
solution might be to use Xenstore to let you know that the QMP sockets
are available, before QEMU drops privileges, and then QMP could be
used to know QEMU is in the running state.

To avoid the need to use xs-restrict, you would need to both replace
physmap and rework the QEMU startup procedure. The use of xs-restrict
would be more expedient, and does not look to need that much work.

There was discussion over how secure it would be to allow a guest access
to these Xenstore keys - it was concluded that a guest could mostly
only mess itself up.  If a guest attempted to prevent itself from being
migrated, the toolstack would time it out, and could kill it.

There followed a discussion on the Xenbus protocol, and the additions
needed.  The aim is merely to restrict the permissions for the command
to those of the guest whose domid you provide.  It was proposed that
it uses the header as is, with its 16 bytes, with the command
'one-time-restrict', and then the payload would have two additional
fields at the start.  These two fields would correspond to the domid to
restrict as, and the real command. Transaction ID and tags would be
taken from the real header.
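
To make the proposed framing concrete, here is a rough sketch of how
such a message could be packed and unpacked. Only the 16-byte header
(type, request ID, transaction ID, length) reflects the existing wire
format. The command number for 'one-time-restrict', the 32-bit width of
the two new fields, and reading "tags" as the request ID are all
assumptions made for the example.

```python
import struct

# Existing xenbus header: type, req_id, tx_id, len (four 32-bit words).
HEADER = struct.Struct("=IIII")
# Proposed payload prefix: domid to restrict as, then the real command.
# The 32-bit width of both fields is an assumption.
PREFIX = struct.Struct("=II")

XS_READ = 2                      # existing read command, as in xs_wire.h
XS_ONE_TIME_RESTRICT = 0x7FFF    # hypothetical: no number was agreed


def wrap(domid, real_type, req_id, tx_id, payload):
    """Wrap a real command so it runs with the permissions of domid."""
    body = PREFIX.pack(domid, real_type) + payload
    header = HEADER.pack(XS_ONE_TIME_RESTRICT, req_id, tx_id, len(body))
    return header + body


def unwrap(message):
    """Recover (domid, real_type, req_id, tx_id, payload) from a message."""
    msg_type, req_id, tx_id, length = HEADER.unpack_from(message)
    if msg_type != XS_ONE_TIME_RESTRICT:
        raise ValueError("not a one-time-restrict message")
    body = message[HEADER.size:HEADER.size + length]
    domid, real_type = PREFIX.unpack_from(body)
    return domid, real_type, req_id, tx_id, body[PREFIX.size:]
```

Because the request and transaction IDs stay in the outer header, a
server that understands the new command can strip the prefix and handle
the rest as an ordinary request issued by the given domain.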

Although inter-domain xs-restrict is not specifically needed for this
project, it is thought it might be a blocking item for upstream
acceptance.  It is thought these changes would not require that much
work to implement, and may be useful in other use cases. Only a few
changes to QEMU would be needed, and libxl should be able to track
QEMU versions.  Ian Jackson volunteered to look at this, with David
helping with the kernel bits.  Ian won't have time to look at this
until after Xen 4.8 is released.

There was discussion about what may fail once privileges are taken away,
which would include CDs and PCI passthrough.  It is thought the full
list can only be known by trying.  Not everything needs to work for
acceptance upstream, such as PCI passthrough.  If such an
incompatible feature is needed, restrictions can be turned off.  These
problems can be fixed in a later phase, with CDs likely being at the
top of the list.


Action items
------------

The hypervisor bits are really needed first, but can't be done until 4.8
has been released.

Ian is to look at the Xenstore items; David is to look at the kernel
items.  Paul is to audit the HVMops, checking parameters etc.

It is too late to get this in 4.8, but it is desired to get this in
early into 4.9 so that there can be a period of stabilisation.  With
the release of 4.8 imminent, little work will happen until after that.
However Paul, David and Ian are asked to have a think about their
respective areas, and have a plan for when they can be done.  They are
welcome to start tackling them if they have time.



disaggregation
==============

A disaggregation proposal which had previously been posted to a QEMU
forum was discussed.  It was not previously accepted by all. The big
question was how to separate the device models from the machine, with
a particular point of contention being around PIIX and the idea of
starting a QEMU instance without one.  The general desire from us is
that we want to have a specific device emulated and nothing else.  It is
suggested you would have a software interface between each device that
looked like a software version of PCI.  The PIIX device could be attached to
the CPU via this pseudo PCI interface.  This would fit in well