At 03/01/2011 12:11 PM, Isaku Yamahata Write: > Hi. > > I don't suppose that just introducing pending bits solve the issue. > Your test only use single hot plug/unplug. How about mixing of multiple > hot plug/unplug with different slots.
The qemu uses the same thread to deal with monitor command and I/O request from guest OS. So I think mixing of multiple hot plug/unplug with different slots can also work fine. > Zeroing up/down on piix4_device_hotplug() is the culprit. > > State machine of (up, down) would be needed. > (up, down) of each slots should be changed following > something like Why we need this feature? We access s->pci0_status only in main thread and do not accesss it when we deal with signal, so there is no competition to access it. > > - on power-on (0, 0) > - on hot plug (0, 0) -> (1, 0) > if other combination -> error > - on hot unpug (1, 0) or (0, 0) -> (0, 1) > (0, 0) is for cold plugged devices. > (1, 0) is for hot plugged devices. > if other combination -> error > - on pciej_write(write on PCI_EJ_BASE) > (0, 1) -> (0, 0) and qdev_free() > if other combination -> ignore > > When enabling sci, those bits are checked and raise sci if necessary. > > Any comments on this? > Anyway the current pci hotplug-related commands are somewhat broken, > and needs clean up with multifunction hotplug support which is > on my todo list. > > thanks > > On Mon, Feb 28, 2011 at 10:52:40AM +0800, Wen Congyang wrote: >> Hi Markus Armbruster >> >> At 02/23/2011 04:30 PM, Markus Armbruster Write: >>> Isaku Yamahata <yamah...@valinux.co.jp> writes: >>> >> >> <snip> >> >>> >>> I don't think this patch is correct. Let me explain. >>> >>> Device hot unplug is *not* guaranteed to succeed. >>> >>> For some buses, such as USB, it always succeeds immediately, i.e. when >>> the device_del monitor command finishes, the device is gone. Live is >>> good. >>> >>> But for PCI, device_del merely initiates the ACPI unplug rain dance. It >>> doesn't wait for the dance to complete. Why? The dance can take an >>> unpredictable amount of time, including forever. >>> >>> Problem: Subsequent device_add can fail if it reuses the qdev ID or PCI >>> slot, and the unplug has not yet completed (race condition), or it >>> failed. Yes, Virginia, PCI hotplug *can* fail. >>> >>> When unplug succeeds, the qdev is automatically destroyed. >>> pciej_write() does that for PIIX4. Looks like pcie_cap_slot_event() >>> does it for PCIE. >> >> I got a similar problem. When I unplug a pci device by hand, it works >> as expected, and I can hotplug it again. But when I use a srcipt to >> do the same thing, sometimes it failed. I think I may find another bug. >> >> Steps to reproduce this bug: >> 1. cat ./test-e1000.sh # RHEL6RC is domain name >> #! /bin/bash >> >> while true; do >> virsh attach-interface RHEL6RC network default --mac >> 52:54:00:1f:db:c7 --model e1000 >> if [[ $? -ne 0 ]]; then >> break >> fi >> virsh detach-interface RHEL6RC network --mac 52:54:00:1f:db:c7 >> if [[ $? -ne 0 ]]; then >> break >> fi >> sleep 5 >> done >> >> 2. ./test-e1000.sh >> Interface attached successfully >> >> Interface detached successfully >> >> Interface attached successfully >> >> Interface detached successfully >> >> error: Failed to attach interface >> error: operation failed: adding >> e1000,netdev=hostnet1,id=net1,mac=52:54:00:1f:db:c7,bus=pci.0,addr=0x8 >> device failed: Duplicate ID 'net1' for device >> >> >> I use ftrace to trace whether sci interrupt is passed to guest OS, here is >> log: >> # cat trace | grep 'irq=9' >> <idle>-0 [000] 118342.634772: irq_handler_entry: irq=9 >> name=acpi >> <idle>-0 [000] 118342.634831: irq_handler_exit: irq=9 >> ret=handled >> <idle>-0 [000] 118342.693216: irq_handler_entry: irq=9 >> name=acpi >> <idle>-0 [000] 118342.693254: irq_handler_exit: irq=9 >> ret=handled >> <idle>-0 [000] 118347.737689: irq_handler_entry: irq=9 >> name=acpi >> <idle>-0 [000] 118347.737738: irq_handler_exit: irq=9 >> ret=handled >> According to the log, we only receive 3 sci interrupt, and one is lost. >> >> I enable piix4's debug and 1 line printf into piix4_device_hotplug: >> printf("slot: %d, up: %d, down: %d, state: %d\n", slot, >> s->pci0_status.up, s->pci0_status.down, state); >> here is the log: >> ======== >> PM: mapping to 0xb000 >> PM readw port=0x0004 val=0x0000 >> ... >> gpe write afe2 <== 0 >> gpe write afe0 <== 255 >> gpe write afe3 <== 0 >> gpe write afe1 <== 255 >> PM readw port=0x0000 val=0x0001 >> PM readw port=0x0002 val=0x0000 >> gpe read afe0 == 0 >> gpe read afe2 == 0 >> gpe read afe1 == 0 >> gpe read afe3 == 0 >> PM writew port=0x0000 val=0x0020 >> PM readw port=0x0002 val=0x0000 >> PM writew port=0x0002 val=0x0020 >> PM readw port=0x0002 val=0x0020 >> gpe write afe2 <== 255 >> gpe write afe3 <== 255 >> ... >> slot: 6, up: 0, down: 0, state: 1 <==== we attach interface the first time >> PM readw port=0x0000 val=0x0001 >> PM readw port=0x0002 val=0x0120 >> gpe read afe0 == 2 >> gpe read afe2 == ff >> gpe read afe2 == ff >> gpe write afe2 <== 253 >> gpe read afe1 == 0 >> gpe read afe3 == ff >> pcihotplug read ae00 == 40 >> pcihotplug read ae04 == 0 >> ... >> gpe write afe0 <== 2 >> gpe write afe2 <== 255 >> slot: 6, up: 64, down: 0, state: 0 <===== we detach interface the first time >> PM readw port=0x0000 val=0x0001 >> PM readw port=0x0002 val=0x0120 >> gpe read afe0 == 2 >> gpe read afe2 == ff >> gpe read afe2 == ff >> gpe write afe2 <== 253 >> gpe read afe1 == 0 >> gpe read afe3 == ff >> pcihotplug read ae00 == 0 >> pcihotplug read ae04 == 40 >> ... >> pciej write ae08 <== 64 <=== we will call qdev_free() >> pciej write ae08 <== 64 >> gpe write afe0 <== 2 >> gpe write afe2 <== 255 >> slot: 7, up: 0, down: 64, state: 1 <=== we attach interface the second time. >> PM readw port=0x0000 val=0x0001 >> PM readw port=0x0002 val=0x0120 >> gpe read afe0 == 2 >> gpe read afe2 == ff >> gpe read afe2 == ff >> gpe write afe2 <== 253 >> gpe read afe1 == 0 >> gpe read afe3 == ff >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> pcihotplug read ae00 == 80 >> pcihotplug read ae04 == 0 >> slot: 7, up: 128, down: 0, state: 0 <=== we detach interface the second time >> gpe write afe0 <== 2 <=== write 2 to afe0 will cause sci interupt to >> be lost >> gpe write afe2 <== 255 >> =========== >> >> Here is short log, and show the different between the first time and second >> time: >> =========== >> gpe write afe0 <== 2 >> gpe write afe2 <== 255 >> slot: 6, up: 64, down: 0, state: 0 <===== we detach interface the first time >> .... >> slot: 7, up: 128, down: 0, state: 0 <=== we detach interface the second time >> gpe write afe0 <== 2 <=== write 2 to afe0 will cause sci interupt to >> be lost >> gpe write afe2 <== 255 >> >> =========== >> >> Does this behavior is a normal behavior? >> >> The following patch can fix this problem. >> >> From 3c149575b52261b8da9a73d5c4ebdfedce9866c1 Mon Sep 17 00:00:00 2001 >> From: Wen Congyang <we...@cn.fujitsu.com> >> Date: Mon, 28 Feb 2011 10:33:45 +0800 >> Subject: [PATCH] pend hotplug event until last event is handled by guest OS >> >> --- >> hw/acpi_piix4.c | 51 ++++++++++++++++++++++++++++++++++++++++++++------- >> 1 files changed, 44 insertions(+), 7 deletions(-) >> >> diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c >> index b5a2762..4ff3475 100644 >> --- a/hw/acpi_piix4.c >> +++ b/hw/acpi_piix4.c >> @@ -74,6 +74,7 @@ typedef struct PIIX4PMState { >> /* for pci hotplug */ >> struct gpe_regs gpe; >> struct pci_status pci0_status; >> + struct pci_status pci0_status_pending; >> uint32_t pci0_hotplug_enable; >> } PIIX4PMState; >> >> @@ -518,6 +519,19 @@ static void gpe_reset_val(uint16_t *cur, int addr, >> uint32_t val) >> *cur = (*cur & (0xff << (8 - shift))) | (x1 << shift); >> } >> >> +static void raise_pending_hotplug(PIIX4PMState *s) >> +{ >> + if (s->pci0_status_pending.up || s->pci0_status_pending.down) { >> + s->gpe.sts |= PIIX4_PCI_HOTPLUG_STATUS; >> + s->pci0_status.up = s->pci0_status_pending.up; >> + s->pci0_status.down = s->pci0_status_pending.down; >> + s->pci0_status_pending.up = 0; >> + s->pci0_status_pending.down = 0; >> + >> + pm_update_sci(s); >> + } >> +} >> + >> static void gpe_writeb(void *opaque, uint32_t addr, uint32_t val) >> { >> PIIX4PMState *s = opaque; >> @@ -539,6 +553,11 @@ static void gpe_writeb(void *opaque, uint32_t addr, >> uint32_t val) >> pm_update_sci(s); >> >> PIIX4_DPRINTF("gpe write %x <== %d\n", addr, val); >> + >> + if (!(g->sts & PIIX4_PCI_HOTPLUG_STATUS) && g->en & >> PIIX4_PCI_HOTPLUG_STATUS) { >> + /* check and reraise the pending hotplug event */ >> + raise_pending_hotplug(s); >> + } >> } >> >> static uint32_t pcihotplug_read(void *opaque, uint32_t addr) >> @@ -639,12 +658,22 @@ static void enable_device(PIIX4PMState *s, int slot) >> s->pci0_status.up |= (1 << slot); >> } >> >> +static void pend_enable_device(PIIX4PMState *s, int slot) >> +{ >> + s->pci0_status_pending.up |= (1 << slot); >> +} >> + >> static void disable_device(PIIX4PMState *s, int slot) >> { >> s->gpe.sts |= PIIX4_PCI_HOTPLUG_STATUS; >> s->pci0_status.down |= (1 << slot); >> } >> >> +static void pend_disable_device(PIIX4PMState *s, int slot) >> +{ >> + s->pci0_status_pending.down |= (1 << slot); >> +} >> + >> static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, >> PCIHotplugState state) >> { >> @@ -659,15 +688,23 @@ static int piix4_device_hotplug(DeviceState *qdev, >> PCIDevice *dev, >> return 0; >> } >> >> - s->pci0_status.up = 0; >> - s->pci0_status.down = 0; >> - if (state == PCI_HOTPLUG_ENABLED) { >> - enable_device(s, slot); >> + if (s->gpe.sts & PIIX4_PCI_HOTPLUG_STATUS) { >> + if (state == PCI_HOTPLUG_ENABLED) { >> + pend_enable_device(s, slot); >> + } else { >> + pend_disable_device(s, slot); >> + } >> } else { >> - disable_device(s, slot); >> - } >> + s->pci0_status.up = 0; >> + s->pci0_status.down = 0; >> + if (state == PCI_HOTPLUG_ENABLED) { >> + enable_device(s, slot); >> + } else { >> + disable_device(s, slot); >> + } >> >> - pm_update_sci(s); >> + pm_update_sci(s); >> + } >> >> return 0; >> } >> -- >> 1.7.1 >> >>> >>> Your patch adds a *second* qdev_free(). No good. >>> >>> >> >> >