QEMU XenServer/XenProject Working group meeting 29th September 2016
There has been agreement on the list on the DMOP proposal. The previously
proposed HVMCTL patch series should need only mechanical changes
to serve as a basis for DMOP.
The privcmd changes should be fairly trivial to implement. Libxc
would need changing, but this code is also in the HVMCTL patch
series. This means the only thing QEMU needs to do is call the restrict
ioctl to enable it. If the restrict ioctl is missing, an error would be
returned. QEMU would probably want an option to indicate that
de-priv is required. Given this option, QEMU would raise an error
if the restrict ioctl was not present.
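The intended behaviour of the de-priv option can be sketched as below. This is only an illustration of the decision logic, not QEMU code: the names start_device_model, try_restrict and depriv_required are hypothetical, the restrict call is passed in as a plain callable, and it is an assumption that a kernel without the ioctl reports ENOTTY.

```python
import errno


def try_restrict(restrict_call, domid):
    """Attempt the restrict call; return True on success, False if it is missing."""
    try:
        restrict_call(domid)
        return True
    except OSError as exc:
        # Assumption: a kernel lacking the restrict ioctl fails with ENOTTY.
        if exc.errno == errno.ENOTTY:
            return False
        raise


def start_device_model(restrict_call, domid, depriv_required):
    """Return whether the process ended up restricted.

    If de-priv was requested (hypothetical option) and the restrict call
    is unavailable, refuse to continue rather than run unrestricted.
    """
    restricted = try_restrict(restrict_call, domid)
    if depriv_required and not restricted:
        raise RuntimeError("de-priv required but the restrict ioctl is not present")
    return restricted
```

Without the option, a missing ioctl is tolerated and the device model simply runs unrestricted, which matches the behaviour on older kernels.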
In order to avoid accidents due to ABI instability, old DMOP numbers would
be retired when a DMOP is changed in an ABI-incompatible way - there is
no shortage of new DMOP numbers.
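The retirement rule can be illustrated with a small table that never hands out a number twice. This is a sketch of the policy only; the class OpTable and the op names used with it are hypothetical and do not correspond to real DMOP numbers.

```python
class OpTable:
    """Allocates op numbers and retires them on incompatible changes."""

    def __init__(self):
        self._next = 1
        self._live = {}       # number -> op name
        self._retired = set()  # numbers that must never be reused

    def allocate(self, name):
        """Hand out a fresh number; numbers only ever increase."""
        number = self._next
        self._next += 1
        self._live[number] = name
        return number

    def replace_incompatibly(self, old_number, new_name):
        """Retire the old number and give the changed op a new one."""
        del self._live[old_number]
        self._retired.add(old_number)
        return self.allocate(new_name)

    def lookup(self, number):
        """Callers built against the old ABI get an error, not the new op."""
        if number in self._retired:
            raise LookupError(f"op {number} has been retired")
        return self._live[number]
```

A caller compiled against the old layout then fails cleanly instead of silently invoking an op with a different argument format.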
Eventchan has restrictions in 4.7, but the libxc parts need to be done.
This should not be much work.
For the non-PV part of QEMU, Xenstore is only used in two places:
the DM state and the physmap mechanism. Although there is a
vague plan for replacing the physmap mechanism, it is some way off.
The DM state key is used for knowing when the QEMU process is running
and so on. QMP would seem to be an option to replace it; however, there
is no (nice) way to wait on a socket until it has been opened. One
solution might be to use Xenstore to signal that the QMP sockets
are available, before QEMU drops privileges, and then QMP could be
used to know that QEMU is in the running state.
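The ordering proposed here can be sketched as follows. This is a rough illustration under stated assumptions: the key name in QMP_READY_KEY is hypothetical, the Xenstore is modelled as a plain dict, and the socket and privilege steps are passed in as callables. The QMP command query-status and its "status" field are part of QMP itself.

```python
import json

# Hypothetical key name; the minutes do not specify one.
QMP_READY_KEY = "device-model/{domid}/qmp-ready"


def qemu_startup(store, domid, open_qmp_socket, drop_privileges):
    """Open the QMP socket, announce it via the store, then de-privilege."""
    open_qmp_socket()
    # Written while still privileged, so no restricted Xenstore access is needed.
    store[QMP_READY_KEY.format(domid=domid)] = "1"
    drop_privileges()


def qmp_request(command):
    """Encode a QMP command with no arguments as one JSON line."""
    return (json.dumps({"execute": command}) + "\r\n").encode()


def reports_running(reply_line):
    """Return True if a query-status reply says the VM is running."""
    reply = json.loads(reply_line)
    return reply.get("return", {}).get("status") == "running"
```

The toolstack would watch for the key, connect to the socket, and then poll with qmp_request("query-status") instead of reading the DM state key.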
To avoid the need to use xs-restrict, you would need to both replace
physmap and rework the QEMU startup procedure. The use of xs-restrict would
be more expedient, and does not look to need that much work.
There was discussion over how secure it would be to allow a guest access
to these Xenstore keys - it was concluded that a guest could mostly
only mess itself up. If a guest attempted to prevent itself from being
migrated, the toolstack would time it out, and could kill it.
There followed a discussion on the Xenbus protocol, and the additions
needed. The aim is merely to restrict the permissions for the command
to those of the guest whose domID you provide. It was proposed that
it uses the header as is, with its 16 bytes, with the command
'one-time-restrict', and then the payload would have two additional
fields at the start. These two fields would correspond to the domid to
restrict as, and the real command. Transaction ID and tags would be
taken from the real header.
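The proposed framing might look roughly like this. It is a sketch only: the existing header is four 32-bit fields (type, request tag, transaction ID, payload length), but the command number XS_ONE_TIME_RESTRICT, the widths of the two new payload fields, and the little-endian byte order are all assumptions made for illustration.

```python
import struct

# Existing 16-byte header: type, req_id (tag), tx_id, len.
HEADER = struct.Struct("<IIII")
# Assumed layout of the two new leading payload fields: domid, real command.
PREFIX = struct.Struct("<II")

XS_ONE_TIME_RESTRICT = 0x7F00  # placeholder value, not an allocated number


def wrap_restricted(domid, real_type, req_id, tx_id, real_payload):
    """Wrap a real command so it runs with the permissions of domid."""
    body = PREFIX.pack(domid, real_type) + real_payload
    # The tag and transaction ID stay in the real header, as proposed.
    return HEADER.pack(XS_ONE_TIME_RESTRICT, req_id, tx_id, len(body)) + body


def unwrap_restricted(message):
    """Recover (domid, real_type, req_id, tx_id, real_payload)."""
    msg_type, req_id, tx_id, length = HEADER.unpack_from(message, 0)
    if msg_type != XS_ONE_TIME_RESTRICT:
        raise ValueError("not a one-time-restrict message")
    body = message[HEADER.size:HEADER.size + length]
    domid, real_type = PREFIX.unpack_from(body, 0)
    return domid, real_type, req_id, tx_id, body[PREFIX.size:]
```

Because the outer header is unchanged, a server that does not know the new command can reject it in the usual way, and one that does can dispatch the inner command after switching to the permissions of the given domid.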
Although inter-domain xs-restrict is not specifically needed for this
project, it is thought it might be a blocking item for upstream
acceptance. It is thought these changes would not require that much
work to implement, and may be useful in other use cases. Only a few
changes to QEMU would be needed, and libxl should be able to track
QEMU versions. Ian Jackson volunteered to look at this, with David
helping with the kernel bits. Ian won't have time to look at this
until after Xen 4.8 is released.
There was discussion about what may fail once privileges are taken away,
which would include CDs and PCI passthrough. It is thought the full
list can only be known by trying. Not everything needs to work for
acceptance upstream, PCI passthrough being one example. If such an
incompatible feature is needed, restrictions can be turned off. These
problems can be fixed in a later phase, with CDs likely being at the
top of the list.
Hypervisor bits really needed first, but can't be done until 4.8 has
Ian is to look at the Xenstore items. David is to look at the kernel
items. Paul is to audit the HVMops, checking parameters etc.
It is too late to get this in 4.8, but it is desired to get this in
early into 4.9 so that there can be a period of stabilisation. With
the release of 4.8 imminent, little work will happen until after that.
However Paul, David and Ian are asked to have a think about their
respective areas, and have a plan for when they can be done. They are
welcome to start tackling them if they have time.
A disaggregation proposal which had previously been posted to a QEMU
forum was discussed. It was not previously accepted by all. The big
question was how to separate the device models from the machine, with
a particular point of contention being around PIIX and the idea of
starting a QEMU instance without one. The general desire from us is that
we want to have a specific device emulated and nothing else. It is
suggested you would have a software interface between each device that
looked like a software version of PCI. The PIIX device could be attached to
the CPU via this pseudo-PCI interface. This would fit in well with how the
IOREQ server and IOMMU work. Although this sounds like a large
architectural change is wanted, it is suggested that actually it is just
that we're asking them to take a different stability and pluggability
posture on the interfaces they already have.
This architectural issue is the cause behind lots of little
annoyances, which have been going on for years. Xen is having to make
up lots of strange stuff to keep QEMU happy, and there is confusion
over memory ownership. Fixing the architecture should make our lives
much easier. These architectural issues are also making things
difficult for Intel, who are trying to work around the issue with Xen
changes, which may just worsen the problem. This means this is
effectively blocking them.
It is proposed that instead of having a QEMU binary, what is really
wanted is a QEMU library. With a library you could easily take the
bits needed, create your own main loop and link them to whatever
interface, IOREQ services or IPC mechanism is needed. There would
no longer be a need for the IOREQ server to be in QEMU, which it is
thought should be an attractive idea for the QEMU maintainers. It is
also thought that other projects, such as the clear containers people,
would also benefit from such an architecture. The idea of splitting
out the CPU code from the device code may even be attractive to KVM.
The code in the Xen tools directory would be a small event loop,
probably using glib, a thing that reads ioreqs off a ring, and a
thing that speaks Xenstore. There would be a bunch of initialisation
calls that call into libqemu and initialise the various devices, with
device structures for them indicating where they should be mapped and
so forth. There would be no IDE code in our tree, and no ioreq
server in the QEMU tree.
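The shape of such a device model can be sketched in a few lines. Everything here is illustrative: no libqemu exists, so RegisterDevice stands in for a device a library would supply, IoReq is a simplified stand-in for a request read from the ring, and the ring itself is modelled as a deque.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class IoReq:
    """Simplified stand-in for one request taken off the ring."""
    addr: int
    is_write: bool
    data: int = 0


class RegisterDevice:
    """Stand-in for a device model that a library would provide."""

    def __init__(self, base, size):
        self.base, self.size = base, size  # where the device is mapped
        self.regs = {}

    def claims(self, addr):
        return self.base <= addr < self.base + self.size

    def read(self, offset):
        return self.regs.get(offset, 0)

    def write(self, offset, value):
        self.regs[offset] = value


def run_once(ring, devices):
    """Drain the ring, dispatching each request to the device mapped there.

    Returns the number of requests that some device handled.
    """
    handled = 0
    while ring:
        req = ring.popleft()
        for dev in devices:
            if dev.claims(req.addr):
                offset = req.addr - dev.base
                if req.is_write:
                    dev.write(offset, req.data)
                else:
                    req.data = dev.read(offset)
                handled += 1
                break
    return handled
```

The Xen-side code is just the loop and the list of mapped devices; all device behaviour lives behind the read and write calls, which is where the library boundary would sit.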
The QEMU maintainers should be in favour of removing Xen-specific code
from QEMU, and it is also thought that you could demonstrate how to
use this to make disaggregated device models for KVM's case. It is
further postulated that there may be many people out there with dev
boards and experiments with FPGAs and strange PCI stuff who don't
want to wrestle with QEMU's PCI emulator. With libqemu, it may
just take 50 lines of code for a random developer to plug some
hardware together and make a simulator.
There was discussion on whether a halfway solution might be easier.
However it was concluded that such a solution would likely only
benefit Xen as a quick fix, and not as much as the full libqemu idea,
and so would not look that appealing from QEMU's perspective. The
full libqemu idea, by contrast, would benefit many more people, allowing an
explosion of potential QEMU use cases and users. More people using a
project should mean more contributors.
It was concluded that this was largely a political issue, and that we
need to find out what the objections really are. If we were to
convince everyone of the benefit, then we'd probably need to step up
and do much of the work - however, this is still likely to be less
work than maintaining the current set-up. There was further
discussion on who our allies might be, and the approach we should take to
persuade people. It was stressed that we need to sell the benefits of
this system, i.e. "Releasing its full potential".
The alternative to the politics may be to simply fork the project
again - the previous fork lasted a decade. However it should be much
better to cooperate, and so we must try.
Ian is to reach out to Peter Maydell and discuss the issue, and to
consider writing down a new proposal.
(updated: Probably too early for a write-up.)
Xen-devel mailing list