On Mon, Apr 02, 2012 at 12:59:54PM -0600, Alex Williamson wrote: > On Mon, 2012-04-02 at 13:03 +0300, Michael S. Tsirkin wrote: > > piix acpi interface suffers from the following 2 issues: > > > > 1. > > - delete device a > > - quickly add device b in another slot > > > > if we do this before guest reads the down register, > > the down event is discarded and device will never > > be deleted. > > > > 2. > > - delete device a > > - quickly reset before guest can respond > > > > interrupt is reset and guest will never eject the device. > > > > To fix this, we implement three changes: > > > > 1. Do not clear up/down registers on each hotplug event. > > this ensures we don't lose events. > > > > 2. Make existing up/down registers write 1 to clear; > > bios will now write back to these the value it read. > > > > 3. on reset, remove all devices which have DOWN bit set > > > > For compatibility with old guests, we also clear > > the DOWN bit on write to EJ0 for a device. > > > > This patch also extends the documentation for the > > ACPI interface, to make the semantics explicit. > > > > Compatibility notes: migrating guests across qemu versions > > can cause bios from one qemu realease running on another qemu > > release. The following documents the behaviour in this case: > > > > A. new bios running on old qemu > > A bios implementing the change mentioned above will if running on an old > > qemu introduce a race: the registers were writeable there, so if a > > register changes between read and write, we'll clobber the new bits > > losing events. As that version tends to lose events anyway, and as the > > better than complicating the interface. If someone cares enough, the fix > > can go on the stable branch. > > Seems like it'd be easy to have seabios probe this on boot, ex. write > <value> to 0xae00, read 0xae00 to see if value is reflected. If it is, > old qemu, modify dsdt to avoid writes.
This won't help at all: the issue is when we migrate to old qemu after boot. > > B. old bios running on new qemu > > Down bits will get cleared on eject because of 3 above. > > Up bits only get cleared on reset. > > This will cause unnecessary notifications and bus rescans, > > but they are harmless. > > Doesn't windows poll these bits. No. > How can we say that introducing a bus > rescan every poll interval is harmless? It's not done every poll interval - all this does is cause extra rescans when system interrupt is asserted. > > > > Signed-off-by: Michael S. Tsirkin <m...@redhat.com> > > --- > > > > Changes from v2: > > - Use existing register instead of adding a new one, > > and update documentation > > Changes from v1: > > - tweaked a comment about clearing down register on reset > > - documentation update > > > > docs/specs/acpi_pci_hotplug.txt | 55 +++++++++++++++++++++++++++++++----- > > hw/acpi_piix4.c | 58 > > ++++++++++++++++++++++++++++----------- > > 2 files changed, 89 insertions(+), 24 deletions(-) > > > > diff --git a/docs/specs/acpi_pci_hotplug.txt > > b/docs/specs/acpi_pci_hotplug.txt > > index f0f74a7..e4a9556 100644 > > --- a/docs/specs/acpi_pci_hotplug.txt > > +++ b/docs/specs/acpi_pci_hotplug.txt > > @@ -7,31 +7,70 @@ describes the interface between QEMU and the ACPI BIOS. > > ACPI GPE block (IO ports 0xafe0-0xafe3, byte access): > > ----------------------------------------- > > > > -Generic ACPI GPE block. Bit 1 (GPE.1) used to notify PCI hotplug/eject > > +Generic ACPI GPE block. Bit 1 (GPE.1) is used to notify PCI > > hotplug/hotunplug > > event to ACPI BIOS, via SCI interrupt. > > > > -PCI slot injection notification pending (IO port 0xae00-0xae03, 4-byte > > access): > > +Semantics: this event occurs each time a bit in either Pending PCI slot > > +injection notification or Pending PCI slot removal notification registers > > +changes status from 0 to 1. > > Or if your bios doesn't clear it, 1 to 1. > > > + > > +Pending PCI slot injection notification (IO port 0xae00-0xae03, 4-byte > > access): > > --------------------------------------------------------------- > > -Slot injection notification pending. One bit per slot. > > +Slot injection notification is pending. One bit per slot. > > +When written by Guest, this register implements write one to clear > > behaviour. > > So we're writing the mask of the bits (slots) to be cleared? Can > multiple slots be cleared in one write? > > > Read by ACPI BIOS GPE.1 handler to notify OS of injection > > events. > > +Written to by ACPI BIOS GPE.1 handler after reading > > +and before notifying OS of injection events. > > + > > +Semantics: after a slot is populated by hotplug, it is immediately enabled > > +and a bit in this register is set. > > +Writes into this register are used to acknowledge the injection > > notification > > +and do not not affect the slot status. > > +Reset value: 0 > > > > -PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access): > > +Pending PCI slot removal notification (IO port 0xae04-0xae07, 4-byte > > access): > > ----------------------------------------------------- > > -Slot removal notification pending. One bit per slot. > > +Slot removal notification is pending. One bit per slot. > > +When written by Guest, this register implements write one to clear > > behaviour. > > Same clarification as above. > > > Read by ACPI BIOS GPE.1 handler to notify OS of removal > > events. > > > > +Written to by ACPI BIOS GPE.1 handler after reading > > +and before notifying OS of removal events. > > + > > +Semantics: upon hotunplug request, a bit in this register is set > > +and the OSPM is notified about hotunplug request, but a slot remains > > enabled > > +until eject. > > +Writes into this register are used to acknowledge the hotunplug request, > > +and do not not affect the slot status. > > +Reset value: 0 > > + > > PCI device eject (IO port 0xae08-0xae0b, 4-byte access): > > ---------------------------------------- > > > > Used by ACPI BIOS _EJ0 method to request device removal. One bit per slot. > > -Reads return 0. > > +Guests should not read this register. In practice reads return 0. > > Isn't mentioning the read value creating a defacto standard? If we're > not defining we should say reads are undefined. > > > +guests should write values with exactly one bit set, selecting > > +the slot to eject. > > +In practice all bits except the lowest bit that is set are discarded. > > Also in > > practice = standard, define it. > > > +practice writing the value 0 into this register has no effect. > > ditto. > > > +Semantics: selected hotunpluggable slot is disabled and powered off. > > +Has no effect on non-hotunpluggable slots. > > + > > +For compatibility with old BIOS, writes to this register > > +clear bits in pending slot injection notification register. > > > > PCI removability status (IO port 0xae0c-0xae0f, 4-byte access): > > ----------------------------------------------- > > > > -Used by ACPI BIOS _RMV method to indicate removability status to OS. One > > -bit per slot. > > +Used by ACPI BIOS _RMV method to detect removability status > > +and report to OS. One bit per slot. > > +Guests should not write this register, in practice the value > > +written is discarded. > > We should define writes as not supported. "Should not, but ok if you > do" is not a specification. > > > +Semantics: selected slots support hotplug. Contents of this register > > +do not change while guest is running. > > diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c > > index 797ed24..389db16 100644 > > --- a/hw/acpi_piix4.c > > +++ b/hw/acpi_piix4.c > > @@ -287,6 +287,28 @@ static void piix4_update_hotplug(PIIX4PMState *s) > > } > > } > > > > +static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots) > > +{ > > + DeviceState *qdev, *next; > > + BusState *bus = qdev_get_parent_bus(&s->dev.qdev); > > + int slot = ffs(slots) - 1; > > + > > + /* > > + * Clear DOWN register - this is good for a case where guest can't > > + * write to CLR_DOWN because of VM reset, and for compatibility with > > + * old guests which do not have CLR_DOWN. > > + */ > > + s->pci0_status.down &= ~(1U << slot); > > Should we clear "up" here too? > > > + > > + QTAILQ_FOREACH_SAFE(qdev, &bus->children, sibling, next) { > > + PCIDevice *dev = PCI_DEVICE(qdev); > > + PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev); > > + if (PCI_SLOT(dev->devfn) == slot && !pc->no_hotplug) { > > + qdev_free(qdev); > > + } > > + } > > +} > > + > > static void piix4_reset(void *opaque) > > { > > PIIX4PMState *s = opaque; > > @@ -302,6 +324,19 @@ static void piix4_reset(void *opaque) > > pci_conf[0x5B] = 0x02; > > } > > piix4_update_hotplug(s); > > + /* > > + * Guest lost remove events if any. > > + * As it's being reset it's safe to remove the device now. > > + */ > > + while (s->pci0_status.down) { > > + acpi_piix_eject_slot(s, s->pci0_status.down); > > + } > > + /* > > + * Guest lost add events if any. > > + * As it's being reset and will rescan the bus we can discard > > + * past events now. > > + */ > > + s->pci0_status.up = 0; > > } > > > > static void piix4_powerdown(void *opaque, int irq, int power_failing) > > @@ -472,10 +507,10 @@ static void pcihotplug_write(void *opaque, uint32_t > > addr, uint32_t val) > > struct pci_status *g = opaque; > > switch (addr) { > > case PCI_BASE: > > - g->up = val; > > + g->up &= ~val; > > break; > > case PCI_BASE + 4: > > - g->down = val; > > + g->down &= ~val; > > break; > > } > > Ok, so it is a bit(slot) mask and can clear multiple bits. > > > @@ -490,19 +525,12 @@ static uint32_t pciej_read(void *opaque, uint32_t > > addr) > > > > static void pciej_write(void *opaque, uint32_t addr, uint32_t val) > > { > > - BusState *bus = opaque; > > - DeviceState *qdev, *next; > > - int slot = ffs(val) - 1; > > + PIIX4PMState *s = opaque; > > > > - QTAILQ_FOREACH_SAFE(qdev, &bus->children, sibling, next) { > > - PCIDevice *dev = PCI_DEVICE(qdev); > > - PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev); > > - if (PCI_SLOT(dev->devfn) == slot && !pc->no_hotplug) { > > - qdev_free(qdev); > > - } > > + if (val) { > > + acpi_piix_eject_slot(s, val); > > } > > > > - > > PIIX4_DPRINTF("pciej write %x <== %d\n", addr, val); > > } > > > > @@ -532,8 +560,8 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, > > PIIX4PMState *s) > > register_ioport_write(PCI_BASE, 8, 4, pcihotplug_write, pci0_status); > > register_ioport_read(PCI_BASE, 8, 4, pcihotplug_read, pci0_status); > > > > - register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, bus); > > - register_ioport_read(PCI_EJ_BASE, 4, 4, pciej_read, bus); > > + register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, s); > > + register_ioport_read(PCI_EJ_BASE, 4, 4, pciej_read, s); > > > > register_ioport_write(PCI_RMV_BASE, 4, 4, pcirmv_write, s); > > register_ioport_read(PCI_RMV_BASE, 4, 4, pcirmv_read, s); > > @@ -567,8 +595,6 @@ static int piix4_device_hotplug(DeviceState *qdev, > > PCIDevice *dev, > > return 0; > > } > > > > - s->pci0_status.up = 0; > > - s->pci0_status.down = 0; > > if (state == PCI_HOTPLUG_ENABLED) { > > enable_device(s, slot); > > } else { > > So if we have an old bios and do an add, followed by a remove, guest > ACPI finds both "up" and "down" set for the slot and we rely on the > ordering of the AML checking up before down to keep the duct tape and > bailing wire from exploding? Hmm, can't say I'm excited about this hack > either. Thanks, > > Alex