Re: [Qemu-block] [Qemu-devel] [Xen-devel] Question about xen disk unplug support for ahci missed in qemu

2015-10-16 Thread Laszlo Ersek
On 10/16/15 04:38, Kevin O'Connor wrote:
> On Fri, Oct 16, 2015 at 01:10:54AM +0200, Laszlo Ersek wrote:
>> On 10/14/15 13:27, Ian Campbell wrote:
>>> On Wed, 2015-10-14 at 12:06 +0100, Stefano Stabellini wrote:
>>>>> Can't you just teach SeaBIOS how to deal with your PV disks and then
>>>>> only add that to your VM and forget about IDE/AHCI? I mean, that's how
>>>>> it's done for virtio-blk, and it doesn't involve any insanities like
>>>>> ripping out non-hotpluggable devices.
>>>>
>>>> Teaching SeaBIOS to deal with PV disks can be done, in fact we already
>>>> support PV disks in OVMF. It is possible to boot Windows with OVMF
>>>> without any IDE disks (patch pending for libxl to create a VM without
>>>> emulated IDE disks).
>>>
>>> One stumbling block in the past has been how to know when the PV drivers in
>>> the BIOS are no longer required, such that the ring can be torn down and/or
>>> the connection etc handed over to the OS driver.
> [...]
>>> AFAIK the BIOS interfaces do not have anything as reliable as that.
>>>
>>> How does virtio deal with this in the BIOS case?
>>
>> It doesn't, as far as I can tell.
>>
>> I don't think it has to, though! On a BIOS box, you can always boot DOS,
>> or another operating system that continues to use the BIOS interfaces
>> forever. (Same as if you never call ExitBootServices() in UEFI.)
>>
>> Given that no starter pistol gets fired between the firmware and the OS
>> on such a platform, they must always respect each other. I guess this
>> could occur through the E820 map, or some such.
> 
> One can use the "ACPI enable" SMI event to detect this if they really
> wanted to.  In SeaBIOS one could do this from
> src/fw/smm.c:handle_smi() - however, no other drivers need this
> notification today and it would be a bit ugly to have to handle it
> from an SMI.  (Assuming Xen were to support SMIs.)
> 
>> No clue in what kind of E820 memory SeaBIOS allocates the virtio rings,
>> but I guess the Linux kernel stays away from those areas until it's past
>> device probing and binding.
> 
> In SeaBIOS, the virtio memory is allocated from reserved memory.

Perfect! That gives Xen drivers a precedent for doing the same.

>  (See
> the memalign_high() call in src/hw/virtio-pci.c - the "high" memory
> zone is taken from reserved memory:
> http://seabios.org/Memory_Model#Memory_available_during_initialization
> )
> 
> What's the reason for the "stumbling block" that requires the BIOS to
> tear down the Xen ring prior to the OS being able to replace it?  The
> BIOS disk calls are all synchronous, so the ring won't be active when
> the OS brings up its own ring.

Yes, that's an argument that works well in practice. However...

> Is there some low-level interaction
> that prevents the OS from just resetting the ring prior to enabling
> it?

the assumption was that the ring would be placed into normal memory. If
GRUB or the kernel overwrote the memory (reallocating the same pages for
completely unrelated purposes) that used to contain the ring while
SeaBIOS was serving requests, the hypervisor would be allowed to notice
and act upon writes to those pages *without* any explicit "kick" (=
guest-to-host notification). The hypervisor is allowed to look at the
ring any time it wishes, so guest code uses barriers while populating
the ring, and kicks the hypervisor "just in case it's not looking right
now".

But if the firmware's ring is in reserved memory, then the OS will stay
away forever. That's great -- it answers the question for virtio, and
should also guide a Xen PV driver implementation.
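
A minimal sketch of the check this implies, assuming a simplified E820 table:
an OS leaves the firmware's ring alone if the ring's address range falls
entirely inside a reserved entry. The struct layout and the function name
range_is_reserved() are illustrative and are not taken from SeaBIOS or Linux;
only the type codes 1 (usable RAM) and 2 (reserved) follow the usual E820
convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Conventional E820 type codes. */
enum { E820_RAM = 1, E820_RESERVED = 2 };

/* Simplified E820 entry (illustrative layout, not the firmware's struct). */
struct e820_entry {
    uint64_t base;
    uint64_t len;
    uint32_t type;
};

/*
 * Return 1 if [addr, addr + len) lies entirely inside a single reserved
 * entry of the map, 0 otherwise. The comparisons are ordered so that no
 * addition can overflow.
 */
int range_is_reserved(const struct e820_entry *map, size_t n,
                      uint64_t addr, uint64_t len)
{
    for (size_t i = 0; i < n; i++) {
        const struct e820_entry *e = &map[i];

        if (e->type != E820_RESERVED)
            continue;
        if (addr >= e->base && len <= e->len &&
            addr - e->base <= e->len - len)
            return 1;
    }
    return 0;
}
```

A ring that straddles the boundary between usable RAM and a reserved entry
is reported as not reserved, which is the conservative answer.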

Thanks!
Laszlo

> 
> -Kevin
> 




Re: [Qemu-block] [Qemu-devel] [Xen-devel] Question about xen disk unplug support for ahci missed in qemu

2015-10-16 Thread Stefano Stabellini
On Fri, 16 Oct 2015, Laszlo Ersek wrote:
> On 10/16/15 11:06, Stefano Stabellini wrote:
> > On Thu, 15 Oct 2015, Kevin O'Connor wrote:
> >> On Fri, Oct 16, 2015 at 01:10:54AM +0200, Laszlo Ersek wrote:
> >>> On 10/14/15 13:27, Ian Campbell wrote:
> >>>> On Wed, 2015-10-14 at 12:06 +0100, Stefano Stabellini wrote:
> >>>>>> Can't you just teach SeaBIOS how to deal with your PV disks and then
> >>>>>> only add that to your VM and forget about IDE/AHCI? I mean, that's how
> >>>>>> it's done for virtio-blk, and it doesn't involve any insanities like
> >>>>>> ripping out non-hotpluggable devices.
> >>>>>
> >>>>> Teaching SeaBIOS to deal with PV disks can be done, in fact we already
> >>>>> support PV disks in OVMF. It is possible to boot Windows with OVMF
> >>>>> without any IDE disks (patch pending for libxl to create a VM without
> >>>>> emulated IDE disks).
> >>>>
> >>>> One stumbling block in the past has been how to know when the PV drivers in
> >>>> the BIOS are no longer required, such that the ring can be torn down and/or
> >>>> the connection etc handed over to the OS driver.
> >> [...]
> >>>> AFAIK the BIOS interfaces do not have anything as reliable as that.
> >>>>
> >>>> How does virtio deal with this in the BIOS case?
> >>>
> >>> It doesn't, as far as I can tell.
> >>>
> >>> I don't think it has to, though! On a BIOS box, you can always boot DOS,
> >>> or another operating system that continues to use the BIOS interfaces
> >>> forever. (Same as if you never call ExitBootServices() in UEFI.)
> >>>
> >>> Given that no starter pistol gets fired between the firmware and the OS
> >>> on such a platform, they must always respect each other. I guess this
> >>> could occur through the E820 map, or some such.
> >>
> >> One can use the "ACPI enable" SMI event to detect this if they really
> >> wanted to.  In SeaBIOS one could do this from
> >> src/fw/smm.c:handle_smi() - however, no other drivers need this
> >> notification today and it would be a bit ugly to have to handle it
> >> from an SMI.  (Assuming Xen were to support SMIs.)
> >>
> >>> No clue in what kind of E820 memory SeaBIOS allocates the virtio rings,
> >>> but I guess the Linux kernel stays away from those areas until it's past
> >>> device probing and binding.
> >>
> >> In SeaBIOS, the virtio memory is allocated from reserved memory.  (See
> >> the memalign_high() call in src/hw/virtio-pci.c - the "high" memory
> >> zone is taken from reserved memory:
> >> http://seabios.org/Memory_Model#Memory_available_during_initialization
> >> )
> >>
> >> What's the reason for the "stumbling block" that requires the BIOS to
> >> tear down the Xen ring prior to the OS being able to replace it?  The
> >> BIOS disk calls are all synchronous, so the ring won't be active when
> >> the OS brings up its own ring.  Is there some low-level interaction
> >> that prevents the OS from just resetting the ring prior to enabling
> >> it?
> > 
> > Xen only exports one PV disk interface for each disk to the guest, and
> > each PV interface only supports one frontend -- only SeaBIOS or the OS
> > can be connected to one PV disk, not both. In the case of OVMF, we
> > handle that by disconnecting the PV frontend in OVMF when
> > ExitBootServices is called, so that the OS driver can reconnect later.
> 
> Does the XenBus protocol support a device reset operation, regardless of
> what state the device is currently in? (I don't remember all the state
> transitions any longer, sorry.)

The PV block protocol doesn't unfortunately. At least the block backend
in QEMU doesn't support it.



Re: [Qemu-block] [Qemu-devel] [Xen-devel] Question about xen disk unplug support for ahci missed in qemu

2015-10-16 Thread Laszlo Ersek
On 10/16/15 11:06, Stefano Stabellini wrote:
> On Thu, 15 Oct 2015, Kevin O'Connor wrote:
>> On Fri, Oct 16, 2015 at 01:10:54AM +0200, Laszlo Ersek wrote:
>>> On 10/14/15 13:27, Ian Campbell wrote:
>>>> On Wed, 2015-10-14 at 12:06 +0100, Stefano Stabellini wrote:
>>>>>> Can't you just teach SeaBIOS how to deal with your PV disks and then
>>>>>> only add that to your VM and forget about IDE/AHCI? I mean, that's how
>>>>>> it's done for virtio-blk, and it doesn't involve any insanities like
>>>>>> ripping out non-hotpluggable devices.
>>>>>
>>>>> Teaching SeaBIOS to deal with PV disks can be done, in fact we already
>>>>> support PV disks in OVMF. It is possible to boot Windows with OVMF
>>>>> without any IDE disks (patch pending for libxl to create a VM without
>>>>> emulated IDE disks).
>>>>
>>>> One stumbling block in the past has been how to know when the PV drivers in
>>>> the BIOS are no longer required, such that the ring can be torn down and/or
>>>> the connection etc handed over to the OS driver.
>> [...]
>>>> AFAIK the BIOS interfaces do not have anything as reliable as that.
>>>>
>>>> How does virtio deal with this in the BIOS case?
>>>
>>> It doesn't, as far as I can tell.
>>>
>>> I don't think it has to, though! On a BIOS box, you can always boot DOS,
>>> or another operating system that continues to use the BIOS interfaces
>>> forever. (Same as if you never call ExitBootServices() in UEFI.)
>>>
>>> Given that no starter pistol gets fired between the firmware and the OS
>>> on such a platform, they must always respect each other. I guess this
>>> could occur through the E820 map, or some such.
>>
>> One can use the "ACPI enable" SMI event to detect this if they really
>> wanted to.  In SeaBIOS one could do this from
>> src/fw/smm.c:handle_smi() - however, no other drivers need this
>> notification today and it would be a bit ugly to have to handle it
>> from an SMI.  (Assuming Xen were to support SMIs.)
>>
>>> No clue in what kind of E820 memory SeaBIOS allocates the virtio rings,
>>> but I guess the Linux kernel stays away from those areas until it's past
>>> device probing and binding.
>>
>> In SeaBIOS, the virtio memory is allocated from reserved memory.  (See
>> the memalign_high() call in src/hw/virtio-pci.c - the "high" memory
>> zone is taken from reserved memory:
>> http://seabios.org/Memory_Model#Memory_available_during_initialization
>> )
>>
>> What's the reason for the "stumbling block" that requires the BIOS to
>> tear down the Xen ring prior to the OS being able to replace it?  The
>> BIOS disk calls are all synchronous, so the ring won't be active when
>> the OS brings up its own ring.  Is there some low-level interaction
>> that prevents the OS from just resetting the ring prior to enabling
>> it?
> 
> Xen only exports one PV disk interface for each disk to the guest, and
> each PV interface only supports one frontend -- only SeaBIOS or the OS
> can be connected to one PV disk, not both. In the case of OVMF, we
> handle that by disconnecting the PV frontend in OVMF when
> ExitBootServices is called, so that the OS driver can reconnect later.

Does the XenBus protocol support a device reset operation, regardless of
what state the device is currently in? (I don't remember all the state
transitions any longer, sorry.)

Thanks
Laszlo




Re: [Qemu-block] [Qemu-devel] [Xen-devel] Question about xen disk unplug support for ahci missed in qemu

2015-10-16 Thread Stefano Stabellini
On Thu, 15 Oct 2015, Kevin O'Connor wrote:
> On Fri, Oct 16, 2015 at 01:10:54AM +0200, Laszlo Ersek wrote:
> > On 10/14/15 13:27, Ian Campbell wrote:
> > > On Wed, 2015-10-14 at 12:06 +0100, Stefano Stabellini wrote:
> > >>> Can't you just teach SeaBIOS how to deal with your PV disks and then
> > >>> only add that to your VM and forget about IDE/AHCI? I mean, that's how
> > >>> it's done for virtio-blk, and it doesn't involve any insanities like
> > >>> ripping out non-hotpluggable devices.
> > >>
> > >> Teaching SeaBIOS to deal with PV disks can be done, in fact we already
> > >> support PV disks in OVMF. It is possible to boot Windows with OVMF
> > >> without any IDE disks (patch pending for libxl to create a VM without
> > >> emulated IDE disks).
> > > 
> > > One stumbling block in the past has been how to know when the PV drivers in
> > > the BIOS are no longer required, such that the ring can be torn down and/or
> > > the connection etc handed over to the OS driver.
> [...]
> > > AFAIK the BIOS interfaces do not have anything as reliable as that.
> > > 
> > > How does virtio deal with this in the BIOS case?
> > 
> > It doesn't, as far as I can tell.
> > 
> > I don't think it has to, though! On a BIOS box, you can always boot DOS,
> > or another operating system that continues to use the BIOS interfaces
> > forever. (Same as if you never call ExitBootServices() in UEFI.)
> > 
> > Given that no starter pistol gets fired between the firmware and the OS
> > on such a platform, they must always respect each other. I guess this
> > could occur through the E820 map, or some such.
> 
> One can use the "ACPI enable" SMI event to detect this if they really
> wanted to.  In SeaBIOS one could do this from
> src/fw/smm.c:handle_smi() - however, no other drivers need this
> notification today and it would be a bit ugly to have to handle it
> from an SMI.  (Assuming Xen were to support SMIs.)
> 
> > No clue in what kind of E820 memory SeaBIOS allocates the virtio rings,
> > but I guess the Linux kernel stays away from those areas until it's past
> > device probing and binding.
> 
> In SeaBIOS, the virtio memory is allocated from reserved memory.  (See
> the memalign_high() call in src/hw/virtio-pci.c - the "high" memory
> zone is taken from reserved memory:
> http://seabios.org/Memory_Model#Memory_available_during_initialization
> )
> 
> What's the reason for the "stumbling block" that requires the BIOS to
> tear down the Xen ring prior to the OS being able to replace it?  The
> BIOS disk calls are all synchronous, so the ring won't be active when
> the OS brings up its own ring.  Is there some low-level interaction
> that prevents the OS from just resetting the ring prior to enabling
> it?

Xen only exports one PV disk interface for each disk to the guest, and
each PV interface only supports one frontend -- only SeaBIOS or the OS
can be connected to one PV disk, not both. In the case of OVMF, we
handle that by disconnecting the PV frontend in OVMF when
ExitBootServices is called, so that the OS driver can reconnect later.



Re: [Qemu-block] [Qemu-devel] [Xen-devel] Question about xen disk unplug support for ahci missed in qemu

2015-10-16 Thread Ian Campbell
On Fri, 2015-10-16 at 10:06 +0100, Stefano Stabellini wrote:

> > What's the reason for the "stumbling block" that requires the BIOS to
> > tear down the Xen ring prior to the OS being able to replace it?  The
> > BIOS disk calls are all synchronous, so the ring won't be active when
> > the OS brings up its own ring.  Is there some low-level interaction
> > that prevents the OS from just resetting the ring prior to enabling
> > it?
> 
> Xen only exports one PV disk interface for each disk to the guest, and
> each PV interface only supports one frontend -- only SeaBIOS or the OS
> can be connected to one PV disk, not both.

Which I think is just another way of saying that the Xen PV protocol
currently lacks an explicit requirement for the OS to reset the device (or
indeed the general PV infrastructure, grant tables etc) before use.

Retrofitting that requirement is of course a little tricky.

The unplug protocol might be extensible enough though. IIRC it does include
provisions for the OS to specify a version and for the unplug to be rejected,
so uprevving that to include a reset requirement _might_ be possible. At which
point it can at least be made a config option which can be switched on for
new enough guests.

i.e. if the guest is configured to use PV drivers from SeaBIOS the unplug
protocol would reject the attempt to unplug the (non-existent) IDE devices
and the guest therefore should fail to bind to the PV devices, while a
newer guest which knows it has to do a reset would declare itself to be
newer and succeed in the unplug.
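
A rough sketch of that gating logic, under loudly stated assumptions: the
version threshold, the flag, and the function name below are all hypothetical
and do not correspond to anything in the existing unplug protocol. It only
shows the decision being proposed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: first guest driver version that resets PV devices before use. */
#define HYPOTHETICAL_RESET_AWARE_VERSION 2u

/*
 * Decide whether an unplug request should be accepted.
 *
 * firmware_uses_pv: the guest is configured so that firmware (e.g. SeaBIOS)
 *                   drives the PV disk and may leave it connected.
 * guest_version:    the version the OS driver declares during unplug.
 */
bool unplug_accepted(bool firmware_uses_pv, unsigned guest_version)
{
    if (!firmware_uses_pv)
        return true;    /* nothing left behind by firmware, behave as today */

    /* Only guests that promise to reset the device may take it over. */
    return guest_version >= HYPOTHETICAL_RESET_AWARE_VERSION;
}
```

An old guest on a firmware-PV configuration gets a rejection and so never
binds to the PV device; a reset-aware guest declares the newer version and
proceeds.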

(NB: details of the protocol are sketchy in my memory, and the above may
need actual thought applied to make it practical, but you get the gist I
hope).

Then you are just into some sort of multiyear transition/deprecation
sequence before you make it the default.

>  In the case of OVMF, we
> handle that by disconnecting the PV frontend in OVMF when
> ExitBootServices is called, so that the OS driver can reconnect later.




Re: [Qemu-block] [Qemu-devel] [Xen-devel] Question about xen disk unplug support for ahci missed in qemu

2015-10-16 Thread Kevin O'Connor
On Fri, Oct 16, 2015 at 10:06:48AM +0100, Stefano Stabellini wrote:
> On Thu, 15 Oct 2015, Kevin O'Connor wrote:
> > What's the reason for the "stumbling block" that requires the BIOS to
> > tear down the Xen ring prior to the OS being able to replace it?  The
> > BIOS disk calls are all synchronous, so the ring won't be active when
> > the OS brings up its own ring.  Is there some low-level interaction
> > that prevents the OS from just resetting the ring prior to enabling
> > it?
> 
> Xen only exports one PV disk interface for each disk to the guest, and
> each PV interface only supports one frontend -- only SeaBIOS or the OS
> can be connected to one PV disk, not both. In the case of OVMF, we
> handle that by disconnecting the PV frontend in OVMF when
> ExitBootServices is called, so that the OS driver can reconnect later.

Well, there isn't a requirement for both SeaBIOS and the OS to be
connected at the same time - it's fine for the OS to replace SeaBIOS.
With the hardware I'm familiar with (eg, usb, ahci, virtio) the OS
just ends up replacing SeaBIOS' DMA rings when it configures its own.
I guess something in the low-level interface of Xen makes that not
work.

Is plugging/unplugging very high overhead?  Since the SeaBIOS disk
interface is fully synchronous, in theory one could have it
plug/unplug on every read request.
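
As a sketch of what that could look like, assuming a toy in-memory backend
that enforces the one-frontend rule: every name here (pv_disk, pv_connect,
bios_read_sector) is made up for illustration and is not a Xen or SeaBIOS
interface. The real connect and disconnect cost is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SECTOR_SIZE 512
#define NUM_SECTORS 4

/* Toy backend: one disk, at most one connected frontend. */
struct pv_disk {
    bool frontend_connected;
    unsigned char data[NUM_SECTORS][SECTOR_SIZE];
};

/* Fails with -1 if another frontend already holds the disk. */
int pv_connect(struct pv_disk *d)
{
    if (d->frontend_connected)
        return -1;
    d->frontend_connected = true;
    return 0;
}

void pv_disconnect(struct pv_disk *d)
{
    d->frontend_connected = false;
}

/*
 * Synchronous firmware read: connect, copy one sector, disconnect.
 * The disk is never left connected between two BIOS calls.
 */
int bios_read_sector(struct pv_disk *d, unsigned lba, unsigned char *buf)
{
    if (lba >= NUM_SECTORS || pv_connect(d) != 0)
        return -1;
    memcpy(buf, d->data[lba], SECTOR_SIZE);
    pv_disconnect(d);
    return 0;
}
```

With this shape the OS driver always finds the disk disconnected, at the
price of a full connect cycle per request.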

-Kevin



Re: [Qemu-block] [Qemu-devel] [Xen-devel] Question about xen disk unplug support for ahci missed in qemu

2015-10-15 Thread Laszlo Ersek
CC'ing Kevin O'Connor

On 10/14/15 13:27, Ian Campbell wrote:
> On Wed, 2015-10-14 at 12:06 +0100, Stefano Stabellini wrote:
>>> Can't you just teach SeaBIOS how to deal with your PV disks and then
>>> only add that to your VM and forget about IDE/AHCI? I mean, that's how
>>> it's done for virtio-blk, and it doesn't involve any insanities like
>>> ripping out non-hotpluggable devices.
>>
>> Teaching SeaBIOS to deal with PV disks can be done, in fact we already
>> support PV disks in OVMF. It is possible to boot Windows with OVMF
>> without any IDE disks (patch pending for libxl to create a VM without
>> emulated IDE disks).
> 
> One stumbling block in the past has been how to know when the PV drivers in
> the BIOS are no longer required, such that the ring can be torn down and/or
> the connection etc handed over to the OS driver.
> 
> I think we deal with this in OVMF using ExitBootServices? (TBH I'm not sure 
> how).

Search "XenBusDxe/XenBusDxe.c" in edk2 for "EVT_SIGNAL_EXIT_BOOT_SERVICES".

TBH, the code in NotifyExitBoot() doesn't seem valid. If you check the
UEFI spec (for example, v2.5, but the requirement I'm about to quote is
very old), in the specification of EFI_BOOT_SERVICES.CreateEvent() you find:

    EVT_SIGNAL_EXIT_BOOT_SERVICES

    This event is to be notified by the system when ExitBootServices()
    is invoked. This event is of type EVT_NOTIFY_SIGNAL and should not
    be combined with any other event types. The notification function
    for this event is not allowed to use the Memory Allocation
    Services, or call any functions that use the Memory Allocation
    Services and must only call functions that are known not to use
    Memory Allocation Services, because these services modify the
    current memory map. The notification function must not depend on
    timer events since timer services will be deactivated before any
    notification functions are called.

NotifyExitBoot() in "XenBusDxe/XenBusDxe.c" calls the
DisconnectController() boot service. That in turn leads to calls to
EFI_DRIVER_BINDING_PROTOCOL.Stop() functions (speaking generally), which
inevitably free memory as part of unbinding the device, thereby breaking
the above requirement.

The right solution is the following:
- when a driver binds a device (a "handle"), a piece of the resources
  allocated for that binding should be a new event, to be signaled at
  ExitBootServices() time. The handler function can be shared by all
  such devices. The context passed to the handler should be the (driver-
  specific) structure that represents the binding and the state of the
  device in general.

- When the driver unbinds the device, the event should be closed. This
  will automatically unregister the callback as well.

- Now, when the callback is entered at all, you can be sure that the
  binding still exists. In this case, you should probe into the various
  fields of the context (the device state, practically), to figure out
  if this device "lives" or is dormant. For simpler devices, the answer
  is always "alive", but some devices could have configuration states
  where they are bound yet not configured (using no hw resources etc).

- In case the device is alive, the action to take is to make it abort
  any in-flight transfers or other operations, and re-set / deconfigure
  it *without* touching any memory allocations.
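
The four points above can be modelled in plain C. This is not EDK2 code:
the struct, the function names, and the event_open flag (a stand-in for an
event created with CreateEvent() and closed with CloseEvent()) are
assumptions made for the sketch.

```c
#include <assert.h>
#include <stdlib.h>

enum dev_state { DEV_DORMANT, DEV_ALIVE };

/* Driver-specific structure representing one binding (illustrative). */
struct binding {
    enum dev_state state;
    int in_flight;      /* number of outstanding transfers */
    int event_open;     /* stand-in for the exit-boot event */
    void *ring;         /* memory owned by the binding */
};

/* Bind: allocate resources and register the exit-boot event with them. */
int driver_start(struct binding *b)
{
    b->ring = calloc(1, 4096);
    if (b->ring == NULL)
        return -1;
    b->state = DEV_DORMANT;
    b->in_flight = 0;
    b->event_open = 1;
    return 0;
}

/* Unbind: close the event first, then release memory. */
void driver_stop(struct binding *b)
{
    b->event_open = 0;
    free(b->ring);
    b->ring = NULL;
    b->state = DEV_DORMANT;
}

/* Exit-boot handler: must neither allocate nor free memory. */
void on_exit_boot(struct binding *b)
{
    if (!b->event_open)
        return;                 /* a closed event is never signaled */
    if (b->state != DEV_ALIVE)
        return;                 /* bound but dormant: nothing to quiesce */
    b->in_flight = 0;           /* abort in-flight operations */
    b->state = DEV_DORMANT;     /* reset / deconfigure the device */
    /* b->ring is deliberately left allocated. */
}
```

The handler only changes device state. Freeing the ring stays in the unbind
path, which is not run at ExitBootServices() time.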

You can see this in the virtio-net driver in OVMF. In the
"OvmfPkg/VirtioNetDxe" directory, see the "TechNotes.txt",
"DriverBinding.c" and "Events.c" files. The callback function is
VirtioNetExitBoot(), and the event registration / deregistration happens in:

VirtioNetDriverBindingStart()
  VirtioNetSnpPopulate()
    gBS->CreateEvent(EVT_SIGNAL_EXIT_BOOT_SERVICES)

vs.

VirtioNetDriverBindingStop()
  VirtioNetSnpEvacuate()
    gBS->CloseEvent()

What VirtioNetExitBoot() does is very simple: resetting the virtio
device is a small action, and it covers the responsibilities.


I'll admit that the virtio-scsi and virtio-block drivers play a bit
dirty here. (I've known this for a long time, but been silent about it.)
They should have similar callbacks, but don't. (In theory, all devices
that are bound & alive at ExitBootServices() should be re-set, without
touching the memory services.)

There are two mitigating factors here:
- unlike with virtio-net, the scsi and block drivers in OVMF support
  only synchronous operations. When you are not calling their
  functions, there are no transfers in flight. And when something calls
  ExitBootServices(), that thing is not calling virtio-block /
  virtio-scsi functions.

- The first thing the OS-level virtio drivers do (certainly in Linux,
  hopefully in Windows) is a virtio-reset on each virtio device found.
  (This is actually required by the virtio specification, both old and
  new.)

Now, there's one small window for issues here. If something in the guest
scribbled over the memory that, according to QEMU, still hosts a live
virtio ring, between ExitBootServices() and 

Re: [Qemu-block] [Qemu-devel] [Xen-devel] Question about xen disk unplug support for ahci missed in qemu

2015-10-15 Thread Kevin O'Connor
On Fri, Oct 16, 2015 at 01:10:54AM +0200, Laszlo Ersek wrote:
> On 10/14/15 13:27, Ian Campbell wrote:
> > On Wed, 2015-10-14 at 12:06 +0100, Stefano Stabellini wrote:
> >>> Can't you just teach SeaBIOS how to deal with your PV disks and then
> >>> only add that to your VM and forget about IDE/AHCI? I mean, that's how
> >>> it's done for virtio-blk, and it doesn't involve any insanities like
> >>> ripping out non-hotpluggable devices.
> >>
> >> Teaching SeaBIOS to deal with PV disks can be done, in fact we already
> >> support PV disks in OVMF. It is possible to boot Windows with OVMF
> >> without any IDE disks (patch pending for libxl to create a VM without
> >> emulated IDE disks).
> > 
> > One stumbling block in the past has been how to know when the PV drivers in
> > the BIOS are no longer required, such that the ring can be torn down and/or
> > the connection etc handed over to the OS driver.
[...]
> > AFAIK the BIOS interfaces do not have anything as reliable as that.
> > 
> > How does virtio deal with this in the BIOS case?
> 
> It doesn't, as far as I can tell.
> 
> I don't think it has to, though! On a BIOS box, you can always boot DOS,
> or another operating system that continues to use the BIOS interfaces
> forever. (Same as if you never call ExitBootServices() in UEFI.)
> 
> Given that no starter pistol gets fired between the firmware and the OS
> on such a platform, they must always respect each other. I guess this
> could occur through the E820 map, or some such.

One can use the "ACPI enable" SMI event to detect this if they really
wanted to.  In SeaBIOS one could do this from
src/fw/smm.c:handle_smi() - however, no other drivers need this
notification today and it would be a bit ugly to have to handle it
from an SMI.  (Assuming Xen were to support SMIs.)

> No clue in what kind of E820 memory SeaBIOS allocates the virtio rings,
> but I guess the Linux kernel stays away from those areas until it's past
> device probing and binding.

In SeaBIOS, the virtio memory is allocated from reserved memory.  (See
the memalign_high() call in src/hw/virtio-pci.c - the "high" memory
zone is taken from reserved memory:
http://seabios.org/Memory_Model#Memory_available_during_initialization
)

What's the reason for the "stumbling block" that requires the BIOS to
tear down the Xen ring prior to the OS being able to replace it?  The
BIOS disk calls are all synchronous, so the ring won't be active when
the OS brings up its own ring.  Is there some low-level interaction
that prevents the OS from just resetting the ring prior to enabling
it?

-Kevin