Re: [Qemu-devel] [PATCH] arm_gic: Keep track of GICD_CPENDR and GICD_SPENDR
Hi Christoffer, Not related to the patch, for edge type of interrupt, will setting bit in ICD_SPENDR generate interrupt? Thanks -Bharat -Original Message- From: Christoffer Dall [mailto:christoffer.d...@linaro.org] Sent: Wednesday, October 23, 2013 8:57 PM To: peter.mayd...@linaro.org Cc: patc...@linaro.org; qemu-devel@nongnu.org; kvm...@lists.cs.columbia.edu Subject: [PATCH] arm_gic: Keep track of GICD_CPENDR and GICD_SPENDR If software writes to the ISPENDR and sets the pending state of a level- triggered interrupt, the falling edge of the hardware input must not clear the pending state. Conversely, if software writes to the ICPENDR, the pending state of a level-triggered interrupt should only be cleared if the hardware input is not asserted. This requires an extra state variable to keep track of software writes. Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- hw/intc/arm_gic.c| 20 +--- hw/intc/arm_gic_common.c | 5 +++-- hw/intc/gic_internal.h | 4 3 files changed, 24 insertions(+), 5 deletions(-) diff --git a/hw/intc/arm_gic.c b/hw/intc/arm_gic.c index d1ddac1..db54061 100644 --- a/hw/intc/arm_gic.c +++ b/hw/intc/arm_gic.c @@ -101,6 +101,12 @@ static void gic_clear_pending(GICState *s, int irq, int cm, uint8_t src) { unsigned cpu; +/* If a level-triggered interrupt has been set to pending through the + * GICD_SPENDR, then a falling edge does not clear the pending state. + */ +if (GIC_TEST_SW_PENDING(irq, cm)) +return; + GIC_CLEAR_PENDING(irq, cm); if (irq GIC_NR_SGIS) { cpu = (unsigned)ffs(cm) - 1; @@ -177,8 +183,9 @@ uint32_t gic_acknowledge_irq(GICState *s, int cpu) s-last_active[new_irq][cpu] = s-running_irq[cpu]; /* Clear pending flags for both level and edge triggered interrupts. Level triggered IRQs will be reasserted once they become inactive. */ -gic_clear_pending(s, new_irq, GIC_TEST_MODEL(new_irq) ? ALL_CPU_MASK : cm, - GIC_SGI_SRC(new_irq, cpu)); +cm = GIC_TEST_MODEL(new_irq) ? ALL_CPU_MASK : cm; +GIC_CLEAR_SW_PENDING(new_irq, cm); +gic_clear_pending(s, new_irq, cm, GIC_SGI_SRC(new_irq, cpu)); gic_set_running_irq(s, cpu, new_irq); DPRINTF(ACK %d\n, new_irq); return new_irq; @@ -445,16 +452,23 @@ static void gic_dist_writeb(void *opaque, hwaddr offset, for (i = 0; i 8; i++) { if (value (1 i)) { GIC_SET_PENDING(irq + i, GIC_TARGET(irq + i)); +if (!GIC_TEST_TRIGGER(irq + i)) { +GIC_SET_SW_PENDING(irq + i, GIC_TARGET(irq + i)); +} } } } else if (offset 0x300) { +int cm = (1 cpu); /* Interrupt Clear Pending. */ irq = (offset - 0x280) * 8 + GIC_BASE_IRQ; if (irq = s-num_irq) goto bad_reg; for (i = 0; i 8; i++, irq++) { if (irq GIC_NR_SGIS value (1 i)) { -gic_clear_pending(s, irq, 1 cpu, 0); +GIC_CLEAR_SW_PENDING(irq, cm); +if (GIC_TEST_TRIGGER(irq + i) || !GIC_TEST_LEVEL(irq, cm)) { +GIC_CLEAR_PENDING(irq, cm); +} } } } else if (offset 0x400) { diff --git a/hw/intc/arm_gic_common.c b/hw/intc/arm_gic_common.c index 1d3b738..7f0615f 100644 --- a/hw/intc/arm_gic_common.c +++ b/hw/intc/arm_gic_common.c @@ -43,11 +43,12 @@ static int gic_post_load(void *opaque, int version_id) static const VMStateDescription vmstate_gic_irq_state = { .name = arm_gic_irq_state, -.version_id = 1, -.minimum_version_id = 1, +.version_id = 2, +.minimum_version_id = 2, .fields = (VMStateField[]) { VMSTATE_UINT8(enabled, gic_irq_state), VMSTATE_UINT8(pending, gic_irq_state), +VMSTATE_UINT8(sw_pending, gic_irq_state), VMSTATE_UINT8(active, gic_irq_state), VMSTATE_UINT8(level, gic_irq_state), VMSTATE_BOOL(model, gic_irq_state), diff --git a/hw/intc/gic_internal.h b/hw/intc/gic_internal.h index f9133b9..173c607 100644 --- a/hw/intc/gic_internal.h +++ b/hw/intc/gic_internal.h @@ -43,6 +43,9 @@ #define GIC_SET_PENDING(irq, cm) s-irq_state[irq].pending |= (cm) #define GIC_CLEAR_PENDING(irq, cm) s-irq_state[irq].pending = ~(cm) #define GIC_TEST_PENDING(irq, cm) ((s-irq_state[irq].pending (cm)) != 0) +#define GIC_SET_SW_PENDING(irq, cm) s-irq_state[irq].sw_pending |= +(cm) #define GIC_CLEAR_SW_PENDING(irq, cm) s-irq_state[irq].sw_pending += ~(cm) #define GIC_TEST_SW_PENDING(irq, cm) +((s-irq_state[irq].sw_pending (cm)) != 0) #define GIC_SET_ACTIVE(irq, cm) s-irq_state[irq].active |= (cm) #define GIC_CLEAR_ACTIVE(irq, cm) s-irq_state[irq].active = ~(cm) #define GIC_TEST_ACTIVE(irq, cm) ((s-irq_state[irq].active (cm)) != 0) @@
Re: [Qemu-devel] vfio for platform devices - 9/5/2012 - minutes
-Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Wednesday, September 11, 2013 10:45 PM To: Yoder Stuart-B08248 Cc: Wood Scott-B07421; Sethi Varun-B16395; Bhushan Bharat-R65777; 'Peter Maydell'; 'Santosh Shukla'; 'Alexander Graf'; 'Antonios Motakis'; 'Christoffer Dall'; 'kim.phill...@linaro.org'; kvm...@lists.cs.columbia.edu; kvm- p...@vger.kernel.org; qemu-devel@nongnu.org Subject: Re: vfio for platform devices - 9/5/2012 - minutes On Wed, 2013-09-11 at 16:42 +, Yoder Stuart-B08248 wrote: -Original Message- From: Yoder Stuart-B08248 Sent: Thursday, September 05, 2013 12:51 PM To: Wood Scott-B07421; Sethi Varun-B16395; Bhushan Bharat-R65777; 'Peter Maydell'; 'Santosh Shukla'; 'Alex Williamson'; 'Alexander Graf'; 'Antonios Motakis'; 'Christoffer Dall'; 'kim.phill...@linaro.org' Cc: kvm...@lists.cs.columbia.edu; 'kvm-...@vger.kernel.org'; 'qemu- de...@nongnu.org' Subject: vfio for platform devices - 9/5/2012 - minutes We had a call with those interested and/or working on vfio for platform devices. Participants: Scott Wood, Varun Sethi, Bharat Bhushan, Peter Maydell, Santosh Shukla, Alex Williamson, Alexander Graf, Antonios Motakis, Christoffer Dall, Kim Phillips, Stuart Yoder Several aspects to vfio for platform devices: 1. IOMMU groups -iommu driver needs to register a bus notifier for the platform bus and create groups for relevant platform devices -Antonios is looking at this for several ARM IOMMUs -PAMU (Freescale) driver already does this 2. unbinding device from host PCI: echo :06:0d.0 /sys/bus/pci/devices/:06:0d.0/driver/unbind Platform: echo ffe101300.dma /sys/bus/platform/devices/ffe101300.dma/driver/unbind -don't believe there are issues or work to do here 3. binding device to vfio-platform driver PCI: echo 1102 0002 /sys/bus/pci/drivers/vfio-pci/new_id -this is probably the least understood issue-- platform drivers register themselves with the bus for a specific name string. That is matched with device tree compatible strings later to bind a device to a driver -we want is to have the vfio-platform driver to dynamically bind to a variety of platform devices previously unknown to vfio-platform -ideally unbinding and binding could be an atomic operation -Alex W pointed out that x86 could leverage this work so keep that in mind in what we design -Kim Phillips (Linaro) will start working on this One thing we didn't discuss needs to be considered (probably by Kim who is looking at the 'binding device' issue) is around returning a passthru device back to the host. After a platform device has been bound to vfio and is in use by user space or a virtual machine, we also need to be able to unwind all that and return the device back to the host in a sane state. What happens when user space exits and the vfio file descriptors are closed? For reference, expectations of how vfio-pci handles these situations: For vfio-pci, when the reference count on the device fd drops to zero we call a device disable function that includes disabling the bus master bit in config space stop ongoing DMA. There is no bus mastering for platform devices and device disabling is not a standard like PCI. Do you think that we might need to do some device specific handling. For example for DMA controller, we need to atleast disable the DMA controller and mask its interrupts (may be ensure that there is no pending/running dma transaction, release irqs etc). As we are not yet having any linkage to respective kernel driver then I am not sure how we will do that specific handling? What if the device is still active and doing bus mastering? (e.g. a VM crashed causing a QEMU exit) If the VM crashes the vfio fds get released resulting in the above opportunity for the vfio device driver to quiesce the device. I think the quiesing of devices with we device specific, so the generic vfio=platform driver may not be able to handle that, right? How can the vfio-platform layer in the host kernel get a specific device in a sane state? It's not easy on pci either. We save config space prior to exposing the device and restore config space later, but it's not complete. We mostly rely on device (function) resets, to put things in a sane state, but those aren't always supported. All platform devices also may not have reset capability (and if any then not generic way for all devices). I just introduced patches for v3.12 that enable a PCI bus reset interface, but it's mostly useful for userspace, since on PCI it's often the case that a bus contains multiple devices which don't necessarily align to iommu group boundaries. When a plaform device is 'unbound
Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for target_bit above 61
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Monday, June 10, 2013 11:40 PM To: Wood Scott-B07421 Cc: Bhushan Bharat-R65777; Andreas Färber; qemu-...@nongnu.org; qemu- de...@nongnu.org; Wood Scott-B07421 Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for target_bit above 61 On 10.06.2013, at 19:20, Scott Wood wrote: On 06/10/2013 09:26:18 AM, Alexander Graf wrote: On 06/10/2013 02:47 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Andreas Färber [mailto:afaer...@suse.de] Sent: Monday, June 10, 2013 5:43 PM To: Bhushan Bharat-R65777 Cc: qemu-...@nongnu.org; qemu-devel@nongnu.org; ag...@suse.de; Wood Scott- B07421; Bhushan Bharat-R65777 Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for target_bit above 61 So IIUC we can only allow 63 bits due to signedness, thus a maximum of (1 62), thus target_bit= 61. Any chance at least the comment can be worded to explain that any better? Maybe also use (target-bit + 1= 63) or period INT64_MAX as condition? How about this: /* QEMU timer supports a maximum timer of INT64_MAX (0x7fff_). * Run booke fit/wdog timer when * ((1ULL target_bit + 1) 0x4000_), i.e target_bit = 61. * Also the time with this maximum target_bit (with current range of * CPU frequency PowerPC supports) will be many many years. So it is * pretty safe to stop the timer above this threshold. */ How about /* This timeout will take years to trigger. Treat the timer as disabled. */ There should be at least a brief mention that it's because the QEMU timer can't handle larger values, If it can't handle higher values, maybe it's better to just set the timer value to INT64_MAX when we detect an overflow? That would make the code plainly obvious. What about below comment (a mix of both :)): /* Timeout calculated with (target_bit + 1) 62 will take * hundreds of years to trigger. Treat the timer as disabled. * Also this timeout is within the qemu supported maximum * timeout limit (INT64_MAX.). */ -Bharat Alex with the detailed explanation in the changelog. A better lower bound on the number of years would be nice as well (e.g. hundreds of years). -Scott
Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for target_bit above 61
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Tuesday, June 11, 2013 6:10 PM To: Bhushan Bharat-R65777 Cc: Wood Scott-B07421; Andreas Färber; qemu-...@nongnu.org; qemu- de...@nongnu.org Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for target_bit above 61 On 06/11/2013 01:40 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Monday, June 10, 2013 11:40 PM To: Wood Scott-B07421 Cc: Bhushan Bharat-R65777; Andreas Färber; qemu-...@nongnu.org; qemu- de...@nongnu.org; Wood Scott-B07421 Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for target_bit above 61 On 10.06.2013, at 19:20, Scott Wood wrote: On 06/10/2013 09:26:18 AM, Alexander Graf wrote: On 06/10/2013 02:47 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Andreas Färber [mailto:afaer...@suse.de] Sent: Monday, June 10, 2013 5:43 PM To: Bhushan Bharat-R65777 Cc: qemu-...@nongnu.org; qemu-devel@nongnu.org; ag...@suse.de; Wood Scott- B07421; Bhushan Bharat-R65777 Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for target_bit above 61 So IIUC we can only allow 63 bits due to signedness, thus a maximum of (1 62), thus target_bit= 61. Any chance at least the comment can be worded to explain that any better? Maybe also use (target-bit + 1= 63) or period INT64_MAX as condition? How about this: /* QEMU timer supports a maximum timer of INT64_MAX (0x7fff_). * Run booke fit/wdog timer when * ((1ULL target_bit + 1) 0x4000_), i.e target_bit = 61. * Also the time with this maximum target_bit (with current range of * CPU frequency PowerPC supports) will be many many years. So it is * pretty safe to stop the timer above this threshold. */ How about /* This timeout will take years to trigger. Treat the timer as disabled. */ There should be at least a brief mention that it's because the QEMU timer can't handle larger values, If it can't handle higher values, maybe it's better to just set the timer value to INT64_MAX when we detect an overflow? That would make the code plainly obvious. What about below comment (a mix of both :)): /* Timeout calculated with (target_bit + 1) 62 will take * hundreds of years to trigger. Treat the timer as disabled. * Also this timeout is within the qemu supported maximum * timeout limit (INT64_MAX.). */ Ok, next question: Why does return disable the timer? Actually here disabled means _not_ starting the timer. This function will be called to start timer initially and then later it is called to restart after every expiry. If we do not start then it is as good as stopped/disabled (it is not disabled in TCR). Probably saying do not start qemu timer or something similar is better than saying disabling the timer. Thanks -Bharat Alex
Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for target_bit above 61
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Tuesday, June 11, 2013 6:27 PM To: Bhushan Bharat-R65777 Cc: Wood Scott-B07421; Andreas Färber; qemu-...@nongnu.org; qemu- de...@nongnu.org Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for target_bit above 61 On 06/11/2013 02:47 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Tuesday, June 11, 2013 6:10 PM To: Bhushan Bharat-R65777 Cc: Wood Scott-B07421; Andreas Färber; qemu-...@nongnu.org; qemu- de...@nongnu.org Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for target_bit above 61 On 06/11/2013 01:40 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Monday, June 10, 2013 11:40 PM To: Wood Scott-B07421 Cc: Bhushan Bharat-R65777; Andreas Färber; qemu-...@nongnu.org; qemu- de...@nongnu.org; Wood Scott-B07421 Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for target_bit above 61 On 10.06.2013, at 19:20, Scott Wood wrote: On 06/10/2013 09:26:18 AM, Alexander Graf wrote: On 06/10/2013 02:47 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Andreas Färber [mailto:afaer...@suse.de] Sent: Monday, June 10, 2013 5:43 PM To: Bhushan Bharat-R65777 Cc: qemu-...@nongnu.org; qemu-devel@nongnu.org; ag...@suse.de; Wood Scott- B07421; Bhushan Bharat-R65777 Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for target_bit above 61 So IIUC we can only allow 63 bits due to signedness, thus a maximum of (162), thus target_bit= 61. Any chance at least the comment can be worded to explain that any better? Maybe also use (target-bit + 1= 63) or periodINT64_MAX as condition? How about this: /* QEMU timer supports a maximum timer of INT64_MAX (0x7fff_). * Run booke fit/wdog timer when * ((1ULLtarget_bit + 1)0x4000_), i.e target_bit = 61. * Also the time with this maximum target_bit (with current range of * CPU frequency PowerPC supports) will be many many years. So it is * pretty safe to stop the timer above this threshold. */ How about /* This timeout will take years to trigger. Treat the timer as disabled. */ There should be at least a brief mention that it's because the QEMU timer can't handle larger values, If it can't handle higher values, maybe it's better to just set the timer value to INT64_MAX when we detect an overflow? That would make the code plainly obvious. What about below comment (a mix of both :)): /* Timeout calculated with (target_bit + 1) 62 will take * hundreds of years to trigger. Treat the timer as disabled. * Also this timeout is within the qemu supported maximum * timeout limit (INT64_MAX.). */ Ok, next question: Why does return disable the timer? Actually here disabled means _not_ starting the timer. This function will be called to start timer initially and then later it is called to restart after every expiry. If we do not start then it is as good as stopped/disabled (it is not disabled in TCR). Probably saying do not start qemu timer or something similar is better than saying disabling the timer. Couldn't you simply make things obvious from the code flow without pulling up assumptions? You yourself suggested to stop/disable timer above a threshold :) Something along the lines of if (overflow) { What is overflow? Do you mean something like this: diff --git a/hw/ppc/ppc_booke.c b/hw/ppc/ppc_booke.c index e41b036..5b84b96 100644 --- a/hw/ppc/ppc_booke.c +++ b/hw/ppc/ppc_booke.c @@ -133,15 +133,19 @@ static void booke_update_fixed_timer(CPUPPCState *env, ppc_tb_t *tb_env = env-tb_env; uint64_t lapse; uint64_t tb; -uint64_t period = 1 (target_bit + 1); +uint64_t period; uint64_t now; now = qemu_get_clock_ns(vm_clock); tb = cpu_ppc_get_tb(tb_env, now, tb_env-tb_offset); -lapse = period - ((tb - (1 target_bit)) (period - 1)); - -*next = now + muldiv64(lapse, get_ticks_per_sec(), tb_env-tb_freq); +if (target_bit = 62) { +*next = INT64_MAX; +} else { +period = 1ULL (target_bit + 1); +lapse = period - ((tb - (1 target_bit)) (period - 1)); +*next = now + muldiv64(lapse, get_ticks_per_sec(), tb_env-tb_freq); +} Thanks -Bharat *next = INT64_MAX; } qemu_mod_timer(timer, *next); Then everyone knows what's going on, we can always assume the timer is running and there's no need to understand complex corner cases. It feels more like the timer framework would be the one to decid to ignore timeouts that take years to finish. Alex
Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for target_bit above 61
-Original Message- From: Peter Maydell [mailto:peter.mayd...@linaro.org] Sent: Monday, June 10, 2013 2:56 PM To: Bhushan Bharat-R65777 Cc: qemu-...@nongnu.org; qemu-devel@nongnu.org; ag...@suse.de; Wood Scott- B07421; Bhushan Bharat-R65777 Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for target_bit above 61 On 10 June 2013 08:55, Bharat Bhushan r65...@freescale.com wrote: QEMU timer supports a maximum timer of INT64_MAX. So starting timer only for time which is calculated using target_bit 62 and deactivate/stop timer if the target bit is above 61. Is this really what the hardware does? Or do we need to set a timer for the maximum QEMU allows and then reset it for the residual time when the first timer expires? No, this is not how hardware works. But the time with the maximum target bit of 61 (with current range of CPU frequency PowerPC supports) will be many many years. So I think it is pretty safe to stop the timer. Thanks -Bharat thanks -- PMM
Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for target_bit above 61
-Original Message- From: Andreas Färber [mailto:afaer...@suse.de] Sent: Monday, June 10, 2013 5:43 PM To: Bhushan Bharat-R65777 Cc: qemu-...@nongnu.org; qemu-devel@nongnu.org; ag...@suse.de; Wood Scott- B07421; Bhushan Bharat-R65777 Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for target_bit above 61 Am 10.06.2013 09:55, schrieb Bharat Bhushan: QEMU timer supports a maximum timer of INT64_MAX. So starting timer only for time which is calculated using target_bit 62 and deactivate/stop timer if the target bit is above 61. This patch also fix the time calculation from target_bit. The code was doing (1 (target_bit + 1)) while this should be (1ULL (target_bit + 1)). Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- v1-v2 - Added booke: timer: in patch subject hw/ppc/ppc_booke.c |8 +++- 1 files changed, 7 insertions(+), 1 deletions(-) diff --git a/hw/ppc/ppc_booke.c b/hw/ppc/ppc_booke.c index e41b036..f4eda15 100644 --- a/hw/ppc/ppc_booke.c +++ b/hw/ppc/ppc_booke.c @@ -133,9 +133,15 @@ static void booke_update_fixed_timer(CPUPPCState *env, ppc_tb_t *tb_env = env-tb_env; uint64_t lapse; uint64_t tb; -uint64_t period = 1 (target_bit + 1); +uint64_t period; uint64_t now; +/* Deactivate timer for target_bit 61 */ +if (target_bit 61) +return; Braces missing and trailing whitespace after return. Ok, will correct So IIUC we can only allow 63 bits due to signedness, thus a maximum of (1 62), thus target_bit = 61. Any chance at least the comment can be worded to explain that any better? Maybe also use (target-bit + 1 = 63) or period INT64_MAX as condition? How about this: /* QEMU timer supports a maximum timer of INT64_MAX (0x7fff_). * Run booke fit/wdog timer when * ((1ULL target_bit + 1) 0x4000_), i.e target_bit = 61. * Also the time with this maximum target_bit (with current range of * CPU frequency PowerPC supports) will be many many years. So it is * pretty safe to stop the timer above this threshold. */ Best regards, Andreas + +period = 1ULL (target_bit + 1); + now = qemu_get_clock_ns(vm_clock); tb = cpu_ppc_get_tb(tb_env, now, tb_env-tb_offset); -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH v2] ppc: initialize GPRs as per epapr
-Original Message- From: Wood Scott-B07421 Sent: Monday, April 29, 2013 11:21 PM To: Bhushan Bharat-R65777 Cc: qemu-...@nongnu.org; qemu-devel@nongnu.org; ag...@suse.de; Yoder Stuart- B08248; da...@gibson.dropbear.id.au; Bhushan Bharat-R65777; Bhushan Bharat- R65777 Subject: Re: [PATCH v2] ppc: initialize GPRs as per epapr On 04/29/2013 09:40:56 AM, Bharat Bhushan wrote: ePAPR defines the initial values of cpu registers. This patch initialize the GPRs as per ePAPR specification. This resolves the issue of guest reboot/reset (guest hang on reboot). Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- v1--v2 - Dynemic seting of initial map size in gpr[7] - For this the tlb size calculation is moved into a function. hw/ppc/e500.c | 29 ++--- 1 files changed, 26 insertions(+), 3 deletions(-) diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c index c1bdb6b..decd86c 100644 --- a/hw/ppc/e500.c +++ b/hw/ppc/e500.c @@ -37,6 +37,7 @@ #include qemu/host-utils.h #include hw/pci-host/ppce500.h +#define EPAPR_MAGIC(0x45504150) #define BINARY_DEVICE_TREE_FILEmpc8544ds.dtb #define UIMAGE_LOAD_BASE 0 #define DTC_LOAD_PAD 0x180 @@ -393,11 +394,10 @@ static inline hwaddr booke206_page_size_to_tlb(uint64_t size) return 63 - clz64(size 10); } -static void mmubooke_create_initial_mapping(CPUPPCState *env) +static int booke206_initial_map_tsize(CPUPPCState *env) { struct boot_info *bi = env-load_info; -ppcmas_tlb_t *tlb = booke206_get_tlbm(env, 1, 0, 0); -hwaddr size, dt_end; +hwaddr dt_end; int ps; /* Our initial TLB entry needs to cover everything from 0 to @@ -408,6 +408,23 @@ static void mmubooke_create_initial_mapping(CPUPPCState *env) /* e500v2 can only do even TLB size bits */ ps++; } +return ps; +} +static uint64_t mmubooke_initial_mapsize(CPUPPCState *env) Please leave a blank line after each of those } lines. You change the function name to look like it's simply returning information, but it still creates the TLB entry as far as I can see. This is just returning the tlb size . not creating the tlb entry. Then you go calling it multiple times (why?). This may not be harmful, but it is ugly. +{ +int tsize; + +tsize = booke206_initial_map_tsize(env); +return (1ULL 10 tsize); +} What value does this wrapper have? This returning the size on initial tlb entry in bytes. The caller needs both the tsize and the size in bytes, and you only call this wrapper once, so let the caller do the conversion instead. Caller does not need to know both. It only need to know the size in bytes. I think git diff make a mess of this. just trying to put the code for clarity static int booke206_initial_map_tsize(CPUPPCState *env) { struct boot_info *bi = env-load_info; hwaddr dt_end; int ps; /* Our initial TLB entry needs to cover everything from 0 to the device tree top */ dt_end = bi-dt_base + bi-dt_size; ps = booke206_page_size_to_tlb(dt_end) + 1; if (ps 1) { /* e500v2 can only do even TLB size bits */ ps++; } return ps; } static uint64_t mmubooke_initial_mapsize(CPUPPCState *env) { int tsize; tsize = booke206_initial_map_tsize(env); return (1ULL 10 tsize); } static void mmubooke_create_initial_mapping(CPUPPCState *env) { ppcmas_tlb_t *tlb = booke206_get_tlbm(env, 1, 0, 0); hwaddr size; int ps; ps = booke206_initial_map_tsize(env); size = (ps MAS1_TSIZE_SHIFT); tlb-mas1 = MAS1_VALID | size; tlb-mas2 = 0; tlb-mas7_3 = 0; tlb-mas7_3 |= MAS3_UR | MAS3_UW | MAS3_UX | MAS3_SR | MAS3_SW | MAS3_SX; env-tlb_dirty = true; } -- Thanks -Bharat -Scott
Re: [Qemu-devel] [PATCH] ppc: initialize GPRs as per epapr
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Friday, April 26, 2013 11:51 AM To: Bhushan Bharat-R65777 Cc: qemu-...@nongnu.org; qemu-devel@nongnu.org; Wood Scott-B07421; Bhushan Bharat-R65777; Yoder Stuart-B08248 Subject: Re: [PATCH] ppc: initialize GPRs as per epapr On 26.04.2013, at 08:17, Bharat Bhushan wrote: ePAPR defines the initial values of cpu registers. This patch initialize the GPRs as per ePAPR specification. This resolves the issue of guest reboot/reset (guest hang on reboot). Why does it hang only on reboot, not on initial bootup? may be memory pointed by env pointer are zero initialized initially. Reboot also not always hangs. I have seen reboot mostly working on e500v2/e500mc and mostly hanging on e5500. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com Signed-off-by: Stuart Yoder stuart.yo...@freescale.com --- hw/ppc/e500.c |7 +++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c index c1bdb6b..a47f976 100644 --- a/hw/ppc/e500.c +++ b/hw/ppc/e500.c @@ -37,6 +37,7 @@ #include qemu/host-utils.h #include hw/pci-host/ppce500.h +#define EPAPR_MAGIC(0x45504150) #define BINARY_DEVICE_TREE_FILEmpc8544ds.dtb #define UIMAGE_LOAD_BASE 0 #define DTC_LOAD_PAD 0x180 @@ -444,6 +445,12 @@ static void ppce500_cpu_reset(void *opaque) Does ePAPR mention anything wrt GPR state of secondary CPUs? Yes, I think we handle this in hw/ppc/ppce500_spin.c cs-halted = 0; env-gpr[1] = (1620) - 8; env-gpr[3] = bi-dt_base; +env-gpr[4] = 0; +env-gpr[5] = 0; +env-gpr[6] = EPAPR_MAGIC; +env-gpr[7] = (64 * 1024 * 1024); What is this? Size of initial TLB ( should be big enough to cover kernel handler). I do not see ePAPR defines any value, I set this to 64M. -Bharat Alex +env-gpr[8] = 0; +env-gpr[9] = 0; env-nip = bi-entry; mmubooke_create_initial_mapping(env); } -- 1.7.0.4
Re: [Qemu-devel] RFC: vfio API changes needed for powerpc (v3)
So now the sequence would be something like: 1)VFIO_GROUP_SET_CONTAINER // add groups to the container 2)VFIO_SET_IOMMU(VFIO_FSL_PAMU)// set iommu model 3)count = VFIO_IOMMU_GET_MSI_BANK_COUNT// returns max # of MSI banks 4)VFIO_IOMMU_SET_ATTR(ATTR_GEOMETRY) // set overall aperture 5)VFIO_IOMMU_SET_ATTR(ATTR_WINDOWS) // set # of windows, including MSI banks 6) For (int I = 0; I count; i++) VFIO_IOMMU_PAMU_MAP_MSI_BANK() // map the MSI banks, do not enable aperture here. 7) Memory Listener will call- VFIO_IOMMU_MAP_DMA// map the guest's memory --- kernel enables aperture here on first VFIO_IOMMU_MAP_DMA 8)VFIO_DEVICE_SET_IRQS --- VFIO in kernel makes pci_enable_msix()/pci_enable_msi_block() calls, this sets actual MSI addr/data in physical device. --- As the address set by above APIs is not what we want so - is using MSIX, VFIO will update address in the MSI-X table - if using MSI, update MSI address in PCI configuration space. Thanks -Bharat -Original Message- From: Yoder Stuart-B08248 Sent: Friday, April 05, 2013 3:40 AM To: Alex Williamson Cc: Wood Scott-B07421; ag...@suse.de; Bhushan Bharat-R65777; Sethi Varun-B16395; k...@vger.kernel.org; qemu-devel@nongnu.org; io...@lists.linux-foundation.org Subject: RFC: vfio API changes needed for powerpc (v3) -v3 updates -made vfio_pamu_attr a union, added flags -s/VFIO_PAMU_/VFIO_IOMMU_PAMU_/ for the ioctls to make it more clear which fd is being operated on -added flags to vfio_pamu_msi_bank_map/umap -VFIO_PAMU_GET_MSI_BANK_COUNT now just returns a __u32 not a struct -fixed some typos The Freescale PAMU is an aperture-based IOMMU with the following characteristics. Each device has an entry in a table in memory describing the iova-phys mapping. The mapping has: -an overall aperture that is power of 2 sized, and has a start iova that is naturally aligned -has 1 or more windows within the aperture -number of windows must be power of 2, max is 256 -size of each window is determined by aperture size / # of windows -iova of each window is determined by aperture start iova / # of windows -the mapped region in each window can be different than the window size...mapping must power of 2 -physical address of the mapping must be naturally aligned with the mapping size These ioctls operate on the VFIO file descriptor (/dev/vfio/vfio). /* * VFIO_IOMMU_PAMU_GET_ATTR * * Gets the iommu attributes for the current vfio container. This * ioctl is applicable to an iommu type of VFIO_PAMU only. * Caller sets argsz and attribute. The ioctl fills in * the provided struct vfio_pamu_attr based on the attribute * value that was set. * Return: 0 on success, -errno on failure */ struct vfio_pamu_attr { __u32 argsz; __u32 flags;/* no flags currently */ __u32 attribute; union { /* VFIO_ATTR_GEOMETRY */ struct { __u64 aperture_start; /* first addr that can be mapped */ __u64 aperture_end; /* last addr that can be mapped */ } attr; /* VFIO_ATTR_WINDOWS */ __u32 windows; /* number of windows in the aperture */ /* initially this will be the max number * of windows that can be set */ /* VFIO_ATTR_PAMU_STASH */ struct { __u32 cpu; /* CPU number for stashing */ __u32 cache; /* cache ID for stashing */ } stash; } }; #define VFIO_IOMMU_PAMU_GET_ATTR _IO(VFIO_TYPE, VFIO_BASE + x, struct vfio_pamu_attr) /* * VFIO_IOMMU_PAMU_SET_ATTR * * Sets the iommu attributes for the current vfio container. This * ioctl is applicable to an iommu type of VFIO_PAMU only. * Caller sets struct vfio_pamu attr, including argsz and attribute and * setting any fields that are valid for the attribute. * Return: 0 on success, -errno on failure */ #define VFIO_IOMMU_PAMU_SET_ATTR _IO(VFIO_TYPE, VFIO_BASE + x, struct vfio_pamu_attr) /* * VFIO_IOMMU_PAMU_GET_MSI_BANK_COUNT * * Returns the number of MSI banks for this platform. This tells user space * how many aperture windows should be reserved for MSI banks when setting * the PAMU geometry and window count. * Return: __u32 bank count on success, -errno on failure */ #define VFIO_IOMMU_PAMU_GET_MSI_BANK_COUNT _IO(VFIO_TYPE, VFIO_BASE + x, __u32) /* * VFIO_IOMMU_PAMU_MAP_MSI_BANK
Re: [Qemu-devel] VFIO: Not require to make VFIO_IOMMU_MAP_DMA for MMAPED PCI bars
-Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Thursday, February 14, 2013 11:57 PM To: Bhushan Bharat-R65777 Cc: Yoder Stuart-B08248; Wood Scott-B07421; Graf Alexander-B36701; qemu- de...@nongnu.org; qemu-...@nongnu.org Subject: Re: VFIO: Not require to make VFIO_IOMMU_MAP_DMA for MMAPED PCI bars On Thu, 2013-02-14 at 08:22 +, Bhushan Bharat-R65777 wrote: Hi Alex Williamson, I am able (with some hacks :)) to directly assign the e1000 PCI device to KVM guest using VFIO on Freescale device. One of the problem I am facing is about the DMA mapping in IOMMU for PCI device BARs. On Freescale devices, the mmaped PCI device BARs are not required to be mapped in IOMMU. Typically the flow of in/out transaction (from CPU) is: Incoming flow: -| |--| |---| |-| CORE |-| IOMMU|--| PCI-Controller|--- | PCI device | -| |--| |---| |-| Outgoing Flow: IOMMU is bypassed for out transactions -| |--| |---| |-| CORE |--| | IOMMU|---| PCI-Controller|--- | PCI device | -| | |--| ^ |---| |-| | | |--| Mapping the BAR into the IOMMU isn't for this second use case, CPU to device is never IOMMU translated. Instead, it's for this: |--| |---| |-| | IOMMU|--| PCI-Controller|---| PCI device | |--| |---| |-| | | |---| |-| +- -| PCI-Controller|---| PCI device | |---| |-| It's perfectly fine to not support peer-to-peer DMA, but let's skip it where it's not supported in case it might actually work in some cases. Also because of some hardware limitations on our IOMMU it is difficult to map these BAR regions with RAM (DDR) regions. So on Freescale device we want the VFIO_IOMMU_MAP_DMA ioctl to be called for RAM regions (DDR) only and _not_ for these mmaped ram regions of PCI device bars. I can understand that we need to add these mmaped PCI bars as RAM type in qemu memory_region_*(). So for that I tried to skip these regions in VFIO memory_listeners. Below changes which works for me. I am not sure whether this is correct approach, please suggest. - diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c index c51ae67..63728d8 100644 --- a/hw/vfio_pci.c +++ b/hw/vfio_pci.c @@ -1115,9 +1115,35 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova, return -errno; } -static bool vfio_listener_skipped_section(MemoryRegionSection *section) +static int memory_region_is_mmap_bars(VFIOContainer *container, + MemoryRegionSection *section) { -return !memory_region_is_ram(section-mr); +VFIOGroup *group; +VFIODevice *vdev; +int i; + +QLIST_FOREACH(group, container-group_list, next) { I think you want to start at the global group_list Why you think global? My understanding is that the operation on dma_map/unmap is limited to a group in the given container. +QLIST_FOREACH(vdev, group-device_list, next) { +if (vdev-msix-mmap_mem.ram_addr == + section-mr-ram_addr) Note the comment in memory.h: struct MemoryRegion { /* All fields are private - violators will be prosecuted */ ... ram_addr_t ram_addr; You'll need to invent some kind of memory region comparison interfaces instead of accessing these private fields. You mean adding some api ine memory.c? +return 1; +for (i = 0; i PCI_ROM_SLOT; i++) { +VFIOBAR *bar = vdev-bars[i]; +if (bar-mmap_mem.ram_addr == section-mr-ram_addr) +return 1; +} +} +} + +return 0; +} It's a pretty heavy weight function, I think the memory API should help us differentiate MMIO from RAM. What about adding a bool in struct MemoryRegion { Bool require_iommu_map; There will be APIs to set/clear this bool and default all ram = true, will have require_iommu_map = true if (!(memory_region-ram == true memory_region-require_iommu_map == true) skip_iommumap; +static bool vfio_listener_skipped_section(VFIOContainer *container, + MemoryRegionSection +*section) { +if (!memory_region_is_ram(section-mr)) + return 1; Need
Re: [Qemu-devel] VFIO: Not require to make VFIO_IOMMU_MAP_DMA for MMAPED PCI bars
-Original Message- From: Wood Scott-B07421 Sent: Tuesday, February 19, 2013 9:39 AM To: Bhushan Bharat-R65777 Cc: Alex Williamson; Yoder Stuart-B08248; Wood Scott-B07421; Graf Alexander- B36701; qemu-devel@nongnu.org; qemu-...@nongnu.org Subject: Re: VFIO: Not require to make VFIO_IOMMU_MAP_DMA for MMAPED PCI bars On 02/18/2013 10:05:20 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Thursday, February 14, 2013 11:57 PM To: Bhushan Bharat-R65777 Cc: Yoder Stuart-B08248; Wood Scott-B07421; Graf Alexander-B36701; qemu- de...@nongnu.org; qemu-...@nongnu.org Subject: Re: VFIO: Not require to make VFIO_IOMMU_MAP_DMA for MMAPED PCI bars On Thu, 2013-02-14 at 08:22 +, Bhushan Bharat-R65777 wrote: +static bool vfio_listener_skipped_section(VFIOContainer *container, + MemoryRegionSection +*section) { +if (!memory_region_is_ram(section-mr)) + return 1; Need some kind of conditional here for platforms that support peer-to-peer. Thanks, What about adding a iommu_type will help here (VFIO_TYPE2_IOMMU), which does not require BARs to be mapped in iommu ? What specifically does type 2 mean again? If we're going to use it as the thing to key off of for all sorts of PAMU quirks, might as well just call it VFIO_IOMMU_FSL_PAMU. Just thought of a common name so in case some other IOMMU also fall in this category and add a comment where this will be defined. I am fine with this name as well :) Thanks -Bharat
Re: [Qemu-devel] VFIO: Not require to make VFIO_IOMMU_MAP_DMA for MMAPED PCI bars
-Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Tuesday, February 19, 2013 9:55 AM To: Bhushan Bharat-R65777 Cc: Yoder Stuart-B08248; Wood Scott-B07421; Graf Alexander-B36701; qemu- de...@nongnu.org; qemu-...@nongnu.org Subject: Re: VFIO: Not require to make VFIO_IOMMU_MAP_DMA for MMAPED PCI bars On Tue, 2013-02-19 at 04:05 +, Bhushan Bharat-R65777 wrote: -Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Thursday, February 14, 2013 11:57 PM To: Bhushan Bharat-R65777 Cc: Yoder Stuart-B08248; Wood Scott-B07421; Graf Alexander-B36701; qemu- de...@nongnu.org; qemu-...@nongnu.org Subject: Re: VFIO: Not require to make VFIO_IOMMU_MAP_DMA for MMAPED PCI bars On Thu, 2013-02-14 at 08:22 +, Bhushan Bharat-R65777 wrote: Hi Alex Williamson, I am able (with some hacks :)) to directly assign the e1000 PCI device to KVM guest using VFIO on Freescale device. One of the problem I am facing is about the DMA mapping in IOMMU for PCI device BARs. On Freescale devices, the mmaped PCI device BARs are not required to be mapped in IOMMU. Typically the flow of in/out transaction (from CPU) is: Incoming flow: -| |--| |---| |-| CORE |-| IOMMU|--| PCI-Controller|--- | PCI device | -| |--| |---| |-| Outgoing Flow: IOMMU is bypassed for out transactions -| |--| |---| |-| CORE |--| | IOMMU|---| PCI-Controller|--- | PCI device | -| | |--| ^ |---| |-| | | |--| Mapping the BAR into the IOMMU isn't for this second use case, CPU to device is never IOMMU translated. Instead, it's for this: |--| |---| |-| | IOMMU|--| PCI-Controller|---| PCI device | |--| |---| |-| | | |---| |-| +- -| PCI-Controller|---| PCI device | |---| |-| It's perfectly fine to not support peer-to-peer DMA, but let's skip it where it's not supported in case it might actually work in some cases. Also because of some hardware limitations on our IOMMU it is difficult to map these BAR regions with RAM (DDR) regions. So on Freescale device we want the VFIO_IOMMU_MAP_DMA ioctl to be called for RAM regions (DDR) only and _not_ for these mmaped ram regions of PCI device bars. I can understand that we need to add these mmaped PCI bars as RAM type in qemu memory_region_*(). So for that I tried to skip these regions in VFIO memory_listeners. Below changes which works for me. I am not sure whether this is correct approach, please suggest. - diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c index c51ae67..63728d8 100644 --- a/hw/vfio_pci.c +++ b/hw/vfio_pci.c @@ -1115,9 +1115,35 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova, return -errno; } -static bool vfio_listener_skipped_section(MemoryRegionSection *section) +static int memory_region_is_mmap_bars(VFIOContainer *container, + MemoryRegionSection +*section) { -return !memory_region_is_ram(section-mr); +VFIOGroup *group; +VFIODevice *vdev; +int i; + +QLIST_FOREACH(group, container-group_list, next) { I think you want to start at the global group_list Why you think global? My understanding is that the operation on dma_map/unmap is limited to a group in the given container. There can theoretically be more than one container, so if you're only searching groups within a container then you may not be searching all the devices. +QLIST_FOREACH(vdev, group-device_list, next) { +if (vdev-msix-mmap_mem.ram_addr == + section-mr-ram_addr) Note the comment in memory.h: struct MemoryRegion { /* All fields are private - violators will be prosecuted */ ... ram_addr_t ram_addr; You'll need to invent some kind of memory region comparison interfaces instead of accessing these private fields. You mean adding some api ine memory.c? Yes +return 1; +for (i = 0; i PCI_ROM_SLOT; i++) { +VFIOBAR *bar
[Qemu-devel] VFIO: Not require to make VFIO_IOMMU_MAP_DMA for MMAPED PCI bars
Hi Alex Williamson, I am able (with some hacks :)) to directly assign the e1000 PCI device to KVM guest using VFIO on Freescale device. One of the problem I am facing is about the DMA mapping in IOMMU for PCI device BARs. On Freescale devices, the mmaped PCI device BARs are not required to be mapped in IOMMU. Typically the flow of in/out transaction (from CPU) is: Incoming flow: -| |--| |---| |-| CORE |-| IOMMU|--| PCI-Controller|---| PCI device | -| |--| |---| |-| Outgoing Flow: IOMMU is bypassed for out transactions -| |--| |---| |-| CORE |--| | IOMMU|---| PCI-Controller|---| PCI device | -| | |--| ^ |---| |-| | | |--| Also because of some hardware limitations on our IOMMU it is difficult to map these BAR regions with RAM (DDR) regions. So on Freescale device we want the VFIO_IOMMU_MAP_DMA ioctl to be called for RAM regions (DDR) only and _not_ for these mmaped ram regions of PCI device bars. I can understand that we need to add these mmaped PCI bars as RAM type in qemu memory_region_*(). So for that I tried to skip these regions in VFIO memory_listeners. Below changes which works for me. I am not sure whether this is correct approach, please suggest. - diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c index c51ae67..63728d8 100644 --- a/hw/vfio_pci.c +++ b/hw/vfio_pci.c @@ -1115,9 +1115,35 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova, return -errno; } -static bool vfio_listener_skipped_section(MemoryRegionSection *section) +static int memory_region_is_mmap_bars(VFIOContainer *container, + MemoryRegionSection *section) { -return !memory_region_is_ram(section-mr); +VFIOGroup *group; +VFIODevice *vdev; +int i; + +QLIST_FOREACH(group, container-group_list, next) { +QLIST_FOREACH(vdev, group-device_list, next) { +if (vdev-msix-mmap_mem.ram_addr == section-mr-ram_addr) +return 1; +for (i = 0; i PCI_ROM_SLOT; i++) { +VFIOBAR *bar = vdev-bars[i]; +if (bar-mmap_mem.ram_addr == section-mr-ram_addr) +return 1; +} +} +} + +return 0; +} + +static bool vfio_listener_skipped_section(VFIOContainer *container, + MemoryRegionSection *section) +{ +if (!memory_region_is_ram(section-mr)) + return 1; + +return memory_region_is_mmap_bars(container, section); } static void vfio_listener_region_add(MemoryListener *listener, @@ -1129,7 +1155,7 @@ static void vfio_listener_region_add(MemoryListener *listener, void *vaddr; int ret; -if (vfio_listener_skipped_section(section)) { +if (vfio_listener_skipped_section(container, section)) { DPRINTF(vfio: SKIPPING region_add %HWADDR_PRIx - %PRIx64\n, section-offset_within_address_space, section-offset_within_address_space + section-size - 1); @@ -1173,7 +1199,7 @@ static void vfio_listener_region_del(MemoryListener *listener, hwaddr iova, end; int ret; -if (vfio_listener_skipped_section(section)) { +if (vfio_listener_skipped_section(container, section)) { DPRINTF(vfio: SKIPPING region_del %HWADDR_PRIx - %PRIx64\n, section-offset_within_address_space, section-offset_within_address_space + section-size - 1); - Thanks -Bharat
Re: [Qemu-devel] [PATCH 4/7 v2] KVM regsync: Add register bitmap parameter to do_kvm_cpu_synchronize_state
-Original Message- From: Marcelo Tosatti [mailto:mtosa...@redhat.com] Sent: Wednesday, January 16, 2013 9:36 PM To: Jason J. Herne Cc: ag...@suse.de; borntrae...@de.ibm.com; aligu...@us.ibm.com; qemu- de...@nongnu.org; Bhushan Bharat-R65777; jan.kis...@siemens.com Subject: Re: [PATCH 4/7 v2] KVM regsync: Add register bitmap parameter to do_kvm_cpu_synchronize_state On Thu, Jan 10, 2013 at 10:29:04AM -0500, Jason J. Herne wrote: From: Jason J. Herne jjhe...@us.ibm.com do_kvm_cpu_synchronize_state is called via run_on_cpu, so we can only pass a single argument. Create SyncStateArgs struct for this purpose and add register bitmap data member to it. Signed-off-by: Jason J. Herne jjhe...@us.ibm.com Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com --- include/sysemu/kvm.h |6 ++ kvm-all.c| 27 +-- 2 files changed, 23 insertions(+), 10 deletions(-) diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h index e0738ba..193d1f4 100644 --- a/include/sysemu/kvm.h +++ b/include/sysemu/kvm.h @@ -223,6 +223,12 @@ int kvm_check_extension(KVMState *s, unsigned int extension); uint32_t kvm_arch_get_supported_cpuid(KVMState *env, uint32_t function, uint32_t index, int reg); + +struct kvm_cpu_syncstate_args { +CPUState *cpu; +int regmap; +}; + void kvm_cpu_synchronize_state(CPUArchState *env); void kvm_cpu_synchronize_post_reset(CPUArchState *env); void kvm_cpu_synchronize_post_init(CPUArchState *env); diff --git a/kvm-all.c b/kvm-all.c index 1aa61bb..77ab72a 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -231,7 +231,7 @@ int kvm_init_vcpu(CPUArchState *env) cpu-kvm_fd = ret; cpu-kvm_state = s; -cpu-kvm_vcpu_dirty = true; +cpu-kvm_vcpu_dirty = KVM_REGSYNC_FULL_STATE; mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0); if (mmap_size 0) { @@ -1491,20 +1491,27 @@ void kvm_flush_coalesced_mmio_buffer(void) static void do_kvm_cpu_synchronize_state(void *arg) { -CPUState *cpu = arg; +struct kvm_cpu_syncstate_args *args = arg; -if (!cpu-kvm_vcpu_dirty) { -kvm_arch_get_registers(cpu, KVM_REGSYNC_FULL_STATE); -cpu-kvm_vcpu_dirty = true; +/* Do not sync regs that are already dirty */ +int regs_to_get = args-regmap ~args-cpu-kvm_vcpu_dirty; + +if (regs_to_get) { +kvm_arch_get_registers(args-cpu, regs_to_get); +args-cpu-kvm_vcpu_dirty |= regs_to_get; } } void kvm_cpu_synchronize_state(CPUArchState *env) { CPUState *cpu = ENV_GET_CPU(env); +struct kvm_cpu_syncstate_args args; + +args.cpu = cpu; +args.regmap = KVM_REGSYNC_FULL_STATE; -if (!cpu-kvm_vcpu_dirty) { -run_on_cpu(cpu, do_kvm_cpu_synchronize_state, cpu); +if (args.regmap ~cpu-kvm_vcpu_dirty) { +run_on_cpu(cpu, do_kvm_cpu_synchronize_state, args); } } @@ -1513,7 +1520,7 @@ void kvm_cpu_synchronize_post_reset(CPUArchState *env) CPUState *cpu = ENV_GET_CPU(env); kvm_arch_put_registers(cpu, KVM_REGSYNC_RESET_STATE); -cpu-kvm_vcpu_dirty = false; +cpu-kvm_vcpu_dirty = ~KVM_REGSYNC_RESET_STATE; } void kvm_cpu_synchronize_post_init(CPUArchState *env) @@ -1521,7 +1528,7 @@ void kvm_cpu_synchronize_post_init(CPUArchState *env) CPUState *cpu = ENV_GET_CPU(env); kvm_arch_put_registers(cpu, KVM_REGSYNC_FULL_STATE); -cpu-kvm_vcpu_dirty = false; +cpu-kvm_vcpu_dirty = ~KVM_REGSYNC_FULL_STATE; } int kvm_cpu_exec(CPUArchState *env) @@ -1540,7 +1547,7 @@ int kvm_cpu_exec(CPUArchState *env) do { if (cpu-kvm_vcpu_dirty) { kvm_arch_put_registers(cpu, KVM_REGSYNC_RUNTIME_STATE); -cpu-kvm_vcpu_dirty = false; +cpu-kvm_vcpu_dirty = ~KVM_REGSYNC_RUNTIME_STATE; } 1) This implies a vcpu can enter guest mode with kvm_vcpu_dirty non-zero. I think above code should be: kvm_arch_put_registers(cpu, cpu-kvm_vcpu_dirty); cpu-kvm_vcpu_dirty = false; so vcpu will not enter guest state with dirty registers in qemu. Unrelated: 2) Also, what is the reason for specifying sets of registers in arch-specific code? Is that because it allows PPC to fix their sync-timer register problem? When you are writing generic code, what does it mean to use 'KVM_REGSYNC_{RUNTIME,RESET,FULL}_STATE' ? Answer: it depends on the architecture. 3) On x86, kvm_arch_get_registers(GET_FULL) must not imply kvm_arch_put_registers(PUT_FULL). The S/390 problem, from http://lists.nongnu.org/archive/html/qemu-devel/2012-11/msg02213.html: The kvm register sync needs to happen in the kvm register sync function :) That would eliminate the whole purpose of sync regs and forces us to have an expensive ioctl on lots of exits (again). I
Re: [Qemu-devel] [PATCH 4/7 v2] KVM regsync: Add register bitmap parameter to do_kvm_cpu_synchronize_state
-Original Message- From: Jason J. Herne [mailto:jjhe...@us.ibm.com] Sent: Thursday, January 10, 2013 8:59 PM To: ag...@suse.de; borntrae...@de.ibm.com; aligu...@us.ibm.com; mtosa...@redhat.com; qemu-devel@nongnu.org; Bhushan Bharat-R65777; jan.kis...@siemens.com Cc: Jason J. Herne Subject: [PATCH 4/7 v2] KVM regsync: Add register bitmap parameter to do_kvm_cpu_synchronize_state From: Jason J. Herne jjhe...@us.ibm.com do_kvm_cpu_synchronize_state is called via run_on_cpu, so we can only pass a single argument. Create SyncStateArgs struct for this purpose and add register bitmap data member to it. Signed-off-by: Jason J. Herne jjhe...@us.ibm.com Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com --- include/sysemu/kvm.h |6 ++ kvm-all.c| 27 +-- 2 files changed, 23 insertions(+), 10 deletions(-) diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h index e0738ba..193d1f4 100644 --- a/include/sysemu/kvm.h +++ b/include/sysemu/kvm.h @@ -223,6 +223,12 @@ int kvm_check_extension(KVMState *s, unsigned int extension); uint32_t kvm_arch_get_supported_cpuid(KVMState *env, uint32_t function, uint32_t index, int reg); + +struct kvm_cpu_syncstate_args { +CPUState *cpu; +int regmap; +}; + void kvm_cpu_synchronize_state(CPUArchState *env); void kvm_cpu_synchronize_post_reset(CPUArchState *env); void kvm_cpu_synchronize_post_init(CPUArchState *env); diff --git a/kvm-all.c b/kvm-all.c index 1aa61bb..77ab72a 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -231,7 +231,7 @@ int kvm_init_vcpu(CPUArchState *env) cpu-kvm_fd = ret; cpu-kvm_state = s; -cpu-kvm_vcpu_dirty = true; +cpu-kvm_vcpu_dirty = KVM_REGSYNC_FULL_STATE; mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0); if (mmap_size 0) { @@ -1491,20 +1491,27 @@ void kvm_flush_coalesced_mmio_buffer(void) static void do_kvm_cpu_synchronize_state(void *arg) { -CPUState *cpu = arg; +struct kvm_cpu_syncstate_args *args = arg; -if (!cpu-kvm_vcpu_dirty) { -kvm_arch_get_registers(cpu, KVM_REGSYNC_FULL_STATE); -cpu-kvm_vcpu_dirty = true; +/* Do not sync regs that are already dirty */ +int regs_to_get = args-regmap ~args-cpu-kvm_vcpu_dirty; + +if (regs_to_get) { +kvm_arch_get_registers(args-cpu, regs_to_get); +args-cpu-kvm_vcpu_dirty |= regs_to_get; } } void kvm_cpu_synchronize_state(CPUArchState *env) { CPUState *cpu = ENV_GET_CPU(env); +struct kvm_cpu_syncstate_args args; + +args.cpu = cpu; +args.regmap = KVM_REGSYNC_FULL_STATE; -if (!cpu-kvm_vcpu_dirty) { -run_on_cpu(cpu, do_kvm_cpu_synchronize_state, cpu); +if (args.regmap ~cpu-kvm_vcpu_dirty) { +run_on_cpu(cpu, do_kvm_cpu_synchronize_state, args); } } @@ -1513,7 +1520,7 @@ void kvm_cpu_synchronize_post_reset(CPUArchState *env) CPUState *cpu = ENV_GET_CPU(env); kvm_arch_put_registers(cpu, KVM_REGSYNC_RESET_STATE); -cpu-kvm_vcpu_dirty = false; +cpu-kvm_vcpu_dirty = ~KVM_REGSYNC_RESET_STATE; } void kvm_cpu_synchronize_post_init(CPUArchState *env) @@ -1521,7 +1528,7 @@ void kvm_cpu_synchronize_post_init(CPUArchState *env) CPUState *cpu = ENV_GET_CPU(env); kvm_arch_put_registers(cpu, KVM_REGSYNC_FULL_STATE); -cpu-kvm_vcpu_dirty = false; +cpu-kvm_vcpu_dirty = ~KVM_REGSYNC_FULL_STATE; } int kvm_cpu_exec(CPUArchState *env) @@ -1540,7 +1547,7 @@ int kvm_cpu_exec(CPUArchState *env) do { if (cpu-kvm_vcpu_dirty) { kvm_arch_put_registers(cpu, KVM_REGSYNC_RUNTIME_STATE); s/KVM_REGSYNC_RUNTIME_STATE/cpu-kvm_vcpu_dirty -cpu-kvm_vcpu_dirty = false; +cpu-kvm_vcpu_dirty = ~KVM_REGSYNC_RUNTIME_STATE; cpu-kvm_vcpu_dirty = 0; -Bharat } kvm_arch_pre_run(cpu, run); -- 1.7.9.5
Re: [Qemu-devel] [PATCH 0/7 v2] KVM regsync
-Original Message- From: Jason J. Herne [mailto:jjhe...@linux.vnet.ibm.com] Sent: Thursday, January 10, 2013 8:59 PM To: ag...@suse.de; borntrae...@de.ibm.com; aligu...@us.ibm.com; mtosa...@redhat.com; qemu-devel@nongnu.org; Bhushan Bharat-R65777; jan.kis...@siemens.com Subject: [PATCH 0/7 v2] KVM regsync Rework the method used to synchronize CPU registers between Qemu KVM. This patch set extends kvm_arch_put_registers() and kvm_arch_get_registers() to take a register bitmap parameter. All existing code paths are updated to specify this new parameter. IMPORTANT NOTE: The PPC and i386 implementations are incomplete. I am submitting this code at this time only to get a review on the implementation of the existing code and to perhaps seek assistance with the mentioned architectures. I am not sure who will finish the implementation of PPC/i386 yet. Due to the fact that I am unfamiliar with these architectures at the register level and I do not have test environments I would like to humbly request that a maintainer of these architectures take a look at it. Or perhaps Bharat could handle the PPC code? This would only leave i386 to worry about. If I cannot find someone to handle i386 I will look into the feasibility of completing it myself. In order to complete the missing implementations, kvm_arch_get_registers and kvm_arch_put_registers (and associated helper functions) will need to be updated to only sync the registers contained in the new bitmap argument. Also, each set of registers represented by one of the bits must be mutually exclusive with respect to every other bit. if this is not the case then local register data can be lost when kvm_arch_get_registers is called causing an old register value to overwrite a newer local value. I think net step can be let the register bitmap to go all upto the KVM, so the KVM also read/write the specified registers set only. Thanks -Bharat
Re: [Qemu-devel] [PATCH 0/7 v2] KVM regsync
-Original Message- From: Jason J. Herne [mailto:jjhe...@linux.vnet.ibm.com] Sent: Thursday, January 10, 2013 8:59 PM To: ag...@suse.de; borntrae...@de.ibm.com; aligu...@us.ibm.com; mtosa...@redhat.com; qemu-devel@nongnu.org; Bhushan Bharat-R65777; jan.kis...@siemens.com Subject: [PATCH 0/7 v2] KVM regsync Rework the method used to synchronize CPU registers between Qemu KVM. This patch set extends kvm_arch_put_registers() and kvm_arch_get_registers() to take a register bitmap parameter. All existing code paths are updated to specify this new parameter. IMPORTANT NOTE: The PPC and i386 implementations are incomplete. I am submitting this code at this time only to get a review on the implementation of the existing code and to perhaps seek assistance with the mentioned architectures. I am not sure who will finish the implementation of PPC/i386 yet. Due to the fact that I am unfamiliar with these architectures at the register level and I do not have test environments I would like to humbly request that a maintainer of these architectures take a look at it. Or perhaps Bharat could handle the PPC code? This would only leave I will handle the PPC code. -Bharat i386 to worry about. If I cannot find someone to handle i386 I will look into the feasibility of completing it myself. In order to complete the missing implementations, kvm_arch_get_registers and kvm_arch_put_registers (and associated helper functions) will need to be updated to only sync the registers contained in the new bitmap argument. Also, each set of registers represented by one of the bits must be mutually exclusive with respect to every other bit. if this is not the case then local register data can be lost when kvm_arch_get_registers is called causing an old register value to overwrite a newer local value. -- -- Jason J. Herne (jjhe...@linux.vnet.ibm.com)
Re: [Qemu-devel] [PATCH 3/3 v2] Enable kvm emulated watchdog
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Thursday, January 10, 2013 9:07 PM To: Bhushan Bharat-R65777 Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; Bhushan Bharat-R65777 Subject: Re: [PATCH 3/3 v2] Enable kvm emulated watchdog On 28.12.2012, at 06:16, Bharat Bhushan wrote: Enable the KVM emulated watchdog if KVM supports (use the capability enablement in watchdog handler). Also watchdog exit (KVM_EXIT_WATCHDOG) handling is added. Watchdog state machine is cleared whenever VM state changes to running. This is to handle the cases like return from debug halt etc. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- v2: - access cap_* from target_ppc/kvm.c only. - Added wrapper functions in target_ppc/kvm.c for enable_watchdog and tsr_sregs synchronization. - Incorporated other Review comments hw/ppc.h |1 + hw/ppc_booke.c | 36 +++- target-ppc/kvm.c | 56 ++ target-ppc/kvm_ppc.h | 11 + 4 files changed, 103 insertions(+), 1 deletions(-) diff --git a/hw/ppc.h b/hw/ppc.h index 2f3ea27..6ad9e1f 100644 --- a/hw/ppc.h +++ b/hw/ppc.h @@ -90,3 +90,4 @@ enum { /* ppc_booke.c */ void ppc_booke_timers_init(CPUPPCState *env, uint32_t freq, uint32_t flags); +void ppc_booke_watchdog_clear_tsr(CPUPPCState *env); diff --git a/hw/ppc_booke.c b/hw/ppc_booke.c index 837a5b6..7273259 100644 --- a/hw/ppc_booke.c +++ b/hw/ppc_booke.c @@ -28,7 +28,7 @@ #include nvram.h #include qemu-log.h #include loader.h - +#include kvm_ppc.h /* Timer Control Register */ @@ -203,6 +203,11 @@ static void booke_wdt_cb(void *opaque) booke_timer-wdt_timer); } +void ppc_booke_watchdog_clear_tsr(CPUPPCState *env) { +env-spr[SPR_BOOKE_TSR] = ~(TSR_ENW | TSR_WIS | TSR_WRS_MASK); } + void store_booke_tsr(CPUPPCState *env, target_ulong val) { env-spr[SPR_BOOKE_TSR] = ~val; @@ -241,10 +246,27 @@ static void ppc_booke_timer_reset_handle(void *opaque) booke_update_irq(env); } +static void cpu_state_change_handler(void *opaque, int running, +RunState state) { Needs a comment when this happens +CPUPPCState *env = opaque; + +if (!running) { +return; +} + +/* + * Clear watchdog interrupt condition by clearing TSR. + */ +ppc_booke_watchdog_clear_tsr(env); + +kvmppc_synch_sregs_tsr(env); kvmppc_sync_tsr. Also please add the sync to store_booke_tsr(). Then here, you can just do store_booke_tsr(TSR_ENW | TSR_WIS | TSR_WRS_MASK); +} + void ppc_booke_timers_init(CPUPPCState *env, uint32_t freq, uint32_t flags) { ppc_tb_t *tb_env; booke_timer_t *booke_timer; +int ret = 0; tb_env = g_malloc0(sizeof(ppc_tb_t)); booke_timer = g_malloc0(sizeof(booke_timer_t)); @@ -262,5 +284,17 @@ void ppc_booke_timers_init(CPUPPCState *env, uint32_t freq, uint32_t flags) booke_timer-wdt_timer = qemu_new_timer_ns(vm_clock, booke_wdt_cb, env); +ret = kvmppc_booke_watchdog_enable(env); + +if (ret) { +/* TODO: Start the QEMU emulated watchdog if not running on KVM. + * Also start the QEMU emulated watchdog if KVM does not support + * emulated watchdog or somehow it is not enabled (supported but + * not enabled is though some bug and requires debugging :)). + */ +} + +qemu_add_vm_change_state_handler(cpu_state_change_handler, env); + qemu_register_reset(ppc_booke_timer_reset_handle, env); } diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c index 3f5df57..6828afa 100644 --- a/target-ppc/kvm.c +++ b/target-ppc/kvm.c @@ -32,10 +32,12 @@ #include device_tree.h #include hw/sysbus.h #include hw/spapr.h +#include hw/watchdog.h #include hw/sysbus.h #include hw/spapr.h #include hw/spapr_vio.h +#include hw/ppc.h //#define DEBUG_KVM @@ -61,6 +63,7 @@ static int cap_ppc_smt; static int cap_ppc_rma; static int cap_spapr_tce; static int cap_hior; +static int cap_ppc_watchdog; /* XXX We have a race condition where we actually have a level triggered * interrupt, but the infrastructure can't expose that yet, so the guest @@ -90,6 +93,7 @@ int kvm_arch_init(KVMState *s) cap_ppc_rma = kvm_check_extension(s, KVM_CAP_PPC_RMA); cap_spapr_tce = kvm_check_extension(s, KVM_CAP_SPAPR_TCE); cap_hior = kvm_check_extension(s, KVM_CAP_PPC_HIOR); +cap_ppc_watchdog = kvm_check_extension(s, KVM_CAP_PPC_BOOKE_WATCHDOG); if (!cap_interrupt_level) { fprintf(stderr, KVM: Couldn't find level irq capability. Expect the @@ -823,6 +827,12 @@ int kvm_arch_handle_exit(CPUPPCState *env, struct kvm_run *run) ret = 0; break; #endif +case
Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue
-Original Message- From: Jason J. Herne [mailto:jjhe...@linux.vnet.ibm.com] Sent: Friday, January 04, 2013 11:10 PM To: Alexander Graf Cc: Christian Borntraeger; Anthony Liguori; Marcelo Tosatti; qemu- de...@nongnu.org qemu-devel; Bhushan Bharat-R65777 Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue On 01/04/2013 11:27 AM, Alexander Graf wrote: On 04.01.2013, at 16:25, Jason J. Herne wrote: If I've followed the conversation correctly this is what needs to be done: 1. Remove the level parameters from kvm_arch_get_registers and kvm_arch_put_registers. 2. Add a new bitmap parameter to kvm_arch_get_registers and kvm_arch_put_registers. I would combine these into replace levels with bitmap. 3. Define a bit that correlates to our current notion of all runtime registers. This bit, and all bits in this bitmap, would be architecture specific. Why would that bit be architecture specific? All runtime registers == registers that gdb can access IIRC. The implementation on what exactly that means obviously is architecture specific, but the bit itself would not be, as the gdbstub wants to be able to synchronize in arch independent code. 4. Remove the cpustate-kvm_sync_dirty field. Replace it with a bitmap that tracks which bits are dirty and need to be synced back to KVM-land. 5. As we do today, we'll assume registers are dirty and turn on their corresponding bit in this new bitmap whenever we get the registers from KVM. Yes. Changing these semantics is nothing for today :). 6. Add other bits as needed on a case by case basis. Does this seem to match what was discussed, and what we want to do? It's probably the best way forward, keeping everyone happy. Please coordinate with Bharat on who actually wants to sit down to implement this. Or if you're quick you might be able to beat him to it regardless thanks to time zones :). Hi Bharat, How would you like to handle these changes? I can do them, or you could if you prefer. Please let me know. Hi Jason, I will be happy to see the patch from you and because of some other things if you think that it will take time then let me know, I will do the changes. This framework will be used by my watchdog patches and only thing I want is my watchdog changes get pushed in QEMU. Thanks -Bharat
Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Friday, January 04, 2013 2:11 PM To: Bhushan Bharat-R65777 Cc: Marcelo Tosatti; jjhe...@linux.vnet.ibm.com; Christian Borntraeger; Anthony Liguori; qemu-devel@nongnu.org qemu-devel Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue On 04.01.2013, at 05:22, Bhushan Bharat-R65777 wrote: -Original Message- From: Marcelo Tosatti [mailto:mtosa...@redhat.com] Sent: Friday, January 04, 2013 7:08 AM To: Alexander Graf Cc: jjhe...@linux.vnet.ibm.com; Christian Borntraeger; Anthony Liguori; qemu- de...@nongnu.org qemu-devel; Bhushan Bharat-R65777 Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue On Thu, Jan 03, 2013 at 08:09:22PM +0100, Alexander Graf wrote: On 03.01.2013, at 19:48, Jason J. Herne wrote: On 01/03/2013 08:56 AM, Alexander Graf wrote: static void do_kvm_cpu_synchronize_state(void *_args) { struct kvm_cpu_syncstate_args *args = _args; +CPUArchState *env = args-env; +int register_level = args-register_level; This probably becomes more readable if we explicitly revert back to unsynced state first: /* Write back local modifications at our current level */ if (register_level env-kvm_vcpu_dirty) { kvm_arch_put_registers(...); env-kvm_vcpu_dirty = 0; } and then do the sync we are requested to do: if (!env-kvm_vcpu_dirty) { ... } I agree, but only if we add a second conditional to the if 1st statement as such: if (args-env-kvm_vcpu_dirty register_level env-kvm_vcpu_dirty) This is to cover the case where the caller is asking for register level 1 and we're already dirty at level 2. In this case, nothing should happen and we'll need the args-env-kvm_vcpu_dirty to ensure that is the case. As before, I'd prefer to make this explicit: static void do_kvm_cpu_synchronize_state(void *_args) { struct kvm_cpu_syncstate_args *args = _args; CPUArchState *env = args-env; int register_level = args-register_level; if (register_level env-kvm_vcpu_dirty) { /* We are more dirty than we need to - all is well */ return; } /* Write back local modifications at our current level */ if (args-env-kvm_vcpu_dirty register_level env-kvm_vcpu_dirty) { kvm_arch_put_registers(env, env-kvm_vcpu_dirty); env-kvm_vcpu_dirty = 0; } if (!args-env-kvm_vcpu_dirty) { kvm_arch_get_registers(env, register_level); env-kvm_vcpu_dirty = register_level; } } Do you agree? Thanks for your time. :) Please also check out the discussions I've had with Bharat about his watchdog patches. There we need a mechanism to synchronize registers only when we actually need to, in order to avoid potential race conditions with a kernel timer. That particular case doesn't work well with levels. We can have multiple different potential race producers in the kernel that we need to avoid individually, so we can't always synchronize all of them when only one of them needs to be synchronized. The big question is what we should be doing about this. We basically have 3 options: * implement levels, treat racy registers as manually synchronized, as Bharat's latest patch set does * implement levels, add a bitmap for additional special synchronization bits * replace levels by bitmap I'm quite frankly not sure which one of the 3 would be the best way forward. Alex Implement read/write accessors for individual registers, similarly to how its done in the kernel. See kvm_cache_regs.h. Read/write for individual registers can be heavy for cases where multiple register needed to be updated. Is not it? It depends. We can still have classes of registers to synchronize that we could even synchronize using MANY_REG. For writes, things become even faster with individual accessors since we can easily coalesce all register writes until we hit the vcpu again into a MANY_REG array and write them in one go. For cases where multiple register needed to be synchronized then I would like to elaborate the options as: Option 1: int timer_func(CPUArchState env) { cpu_synchronize_state(env); //update env-timer_registers env-updated_regs_type |= TIMER_REGS_TYPE_BIT; To scale this one, it's probably also more clever to swap the logic: env-kvm_sync_extra |= SYNC_TIMER_REGS; cpu_synchronize_state(env); /* gets timer regs */ /* update env-timer_registers */ /* timer regs will be synchronized later because kvm_sync_extra has SYNC_TIMER_REGS set */ Your case right now is special in that we don't need to get any register state, but only write it. However, if we do cpu_synchronize_state(env); /* will not get timer regs
Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Friday, January 04, 2013 2:11 PM To: Bhushan Bharat-R65777 Cc: Marcelo Tosatti; jjhe...@linux.vnet.ibm.com; Christian Borntraeger; Anthony Liguori; qemu-devel@nongnu.org qemu-devel Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue On 04.01.2013, at 05:22, Bhushan Bharat-R65777 wrote: -Original Message- From: Marcelo Tosatti [mailto:mtosa...@redhat.com] Sent: Friday, January 04, 2013 7:08 AM To: Alexander Graf Cc: jjhe...@linux.vnet.ibm.com; Christian Borntraeger; Anthony Liguori; qemu- de...@nongnu.org qemu-devel; Bhushan Bharat-R65777 Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue On Thu, Jan 03, 2013 at 08:09:22PM +0100, Alexander Graf wrote: On 03.01.2013, at 19:48, Jason J. Herne wrote: On 01/03/2013 08:56 AM, Alexander Graf wrote: static void do_kvm_cpu_synchronize_state(void *_args) { struct kvm_cpu_syncstate_args *args = _args; +CPUArchState *env = args-env; +int register_level = args-register_level; This probably becomes more readable if we explicitly revert back to unsynced state first: /* Write back local modifications at our current level */ if (register_level env-kvm_vcpu_dirty) { kvm_arch_put_registers(...); env-kvm_vcpu_dirty = 0; } and then do the sync we are requested to do: if (!env-kvm_vcpu_dirty) { ... } I agree, but only if we add a second conditional to the if 1st statement as such: if (args-env-kvm_vcpu_dirty register_level env-kvm_vcpu_dirty) This is to cover the case where the caller is asking for register level 1 and we're already dirty at level 2. In this case, nothing should happen and we'll need the args-env-kvm_vcpu_dirty to ensure that is the case. As before, I'd prefer to make this explicit: static void do_kvm_cpu_synchronize_state(void *_args) { struct kvm_cpu_syncstate_args *args = _args; CPUArchState *env = args-env; int register_level = args-register_level; if (register_level env-kvm_vcpu_dirty) { /* We are more dirty than we need to - all is well */ return; } /* Write back local modifications at our current level */ if (args-env-kvm_vcpu_dirty register_level env-kvm_vcpu_dirty) { kvm_arch_put_registers(env, env-kvm_vcpu_dirty); env-kvm_vcpu_dirty = 0; } if (!args-env-kvm_vcpu_dirty) { kvm_arch_get_registers(env, register_level); env-kvm_vcpu_dirty = register_level; } } Do you agree? Thanks for your time. :) Please also check out the discussions I've had with Bharat about his watchdog patches. There we need a mechanism to synchronize registers only when we actually need to, in order to avoid potential race conditions with a kernel timer. That particular case doesn't work well with levels. We can have multiple different potential race producers in the kernel that we need to avoid individually, so we can't always synchronize all of them when only one of them needs to be synchronized. The big question is what we should be doing about this. We basically have 3 options: * implement levels, treat racy registers as manually synchronized, as Bharat's latest patch set does * implement levels, add a bitmap for additional special synchronization bits * replace levels by bitmap I'm quite frankly not sure which one of the 3 would be the best way forward. Alex Implement read/write accessors for individual registers, similarly to how its done in the kernel. See kvm_cache_regs.h. Read/write for individual registers can be heavy for cases where multiple register needed to be updated. Is not it? It depends. We can still have classes of registers to synchronize that we could even synchronize using MANY_REG. For writes, things become even faster with individual accessors since we can easily coalesce all register writes until we hit the vcpu again into a MANY_REG array and write them in one go. For cases where multiple register needed to be synchronized then I would like to elaborate the options as: Option 1: int timer_func(CPUArchState env) { cpu_synchronize_state(env); //update env-timer_registers env-updated_regs_type |= TIMER_REGS_TYPE_BIT; To scale this one, it's probably also more clever to swap the logic: env-kvm_sync_extra |= SYNC_TIMER_REGS; cpu_synchronize_state(env); /* gets timer regs */ /* update env-timer_registers */ /* timer regs will be synchronized later because kvm_sync_extra has SYNC_TIMER_REGS set */ Your case right now is special in that we don't need to get any register state, but only write it. However, if we do cpu_synchronize_state(env); /* will not get timer regs
Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Friday, January 04, 2013 3:59 PM To: Bhushan Bharat-R65777 Cc: Marcelo Tosatti; jjhe...@linux.vnet.ibm.com; Christian Borntraeger; Anthony Liguori; qemu-devel@nongnu.org qemu-devel Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue On 04.01.2013, at 11:23, Bhushan Bharat-R65777 wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Friday, January 04, 2013 2:11 PM To: Bhushan Bharat-R65777 Cc: Marcelo Tosatti; jjhe...@linux.vnet.ibm.com; Christian Borntraeger; Anthony Liguori; qemu-devel@nongnu.org qemu-devel Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue On 04.01.2013, at 05:22, Bhushan Bharat-R65777 wrote: -Original Message- From: Marcelo Tosatti [mailto:mtosa...@redhat.com] Sent: Friday, January 04, 2013 7:08 AM To: Alexander Graf Cc: jjhe...@linux.vnet.ibm.com; Christian Borntraeger; Anthony Liguori; qemu- de...@nongnu.org qemu-devel; Bhushan Bharat-R65777 Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue On Thu, Jan 03, 2013 at 08:09:22PM +0100, Alexander Graf wrote: On 03.01.2013, at 19:48, Jason J. Herne wrote: On 01/03/2013 08:56 AM, Alexander Graf wrote: static void do_kvm_cpu_synchronize_state(void *_args) { struct kvm_cpu_syncstate_args *args = _args; +CPUArchState *env = args-env; +int register_level = args-register_level; This probably becomes more readable if we explicitly revert back to unsynced state first: /* Write back local modifications at our current level */ if (register_level env-kvm_vcpu_dirty) { kvm_arch_put_registers(...); env-kvm_vcpu_dirty = 0; } and then do the sync we are requested to do: if (!env-kvm_vcpu_dirty) { ... } I agree, but only if we add a second conditional to the if 1st statement as such: if (args-env-kvm_vcpu_dirty register_level env-kvm_vcpu_dirty) This is to cover the case where the caller is asking for register level 1 and we're already dirty at level 2. In this case, nothing should happen and we'll need the args-env-kvm_vcpu_dirty to ensure that is the case. As before, I'd prefer to make this explicit: static void do_kvm_cpu_synchronize_state(void *_args) { struct kvm_cpu_syncstate_args *args = _args; CPUArchState *env = args-env; int register_level = args-register_level; if (register_level env-kvm_vcpu_dirty) { /* We are more dirty than we need to - all is well */ return; } /* Write back local modifications at our current level */ if (args-env-kvm_vcpu_dirty register_level env-kvm_vcpu_dirty) { kvm_arch_put_registers(env, env-kvm_vcpu_dirty); env-kvm_vcpu_dirty = 0; } if (!args-env-kvm_vcpu_dirty) { kvm_arch_get_registers(env, register_level); env-kvm_vcpu_dirty = register_level; } } Do you agree? Thanks for your time. :) Please also check out the discussions I've had with Bharat about his watchdog patches. There we need a mechanism to synchronize registers only when we actually need to, in order to avoid potential race conditions with a kernel timer. That particular case doesn't work well with levels. We can have multiple different potential race producers in the kernel that we need to avoid individually, so we can't always synchronize all of them when only one of them needs to be synchronized. The big question is what we should be doing about this. We basically have 3 options: * implement levels, treat racy registers as manually synchronized, as Bharat's latest patch set does * implement levels, add a bitmap for additional special synchronization bits * replace levels by bitmap I'm quite frankly not sure which one of the 3 would be the best way forward. Alex Implement read/write accessors for individual registers, similarly to how its done in the kernel. See kvm_cache_regs.h. Read/write for individual registers can be heavy for cases where multiple register needed to be updated. Is not it? It depends. We can still have classes of registers to synchronize that we could even synchronize using MANY_REG. For writes, things become even faster with individual accessors since we can easily coalesce all register writes until we hit the vcpu again into a MANY_REG array and write them in one go. For cases where multiple register needed to be synchronized then I would like to elaborate the options as: Option 1: int timer_func(CPUArchState env) { cpu_synchronize_state(env); //update env-timer_registers env-updated_regs_type |= TIMER_REGS_TYPE_BIT; To scale this one, it's probably also more clever to swap the logic: env
Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Friday, January 04, 2013 4:07 PM To: Bhushan Bharat-R65777 Cc: Marcelo Tosatti; jjhe...@linux.vnet.ibm.com; Christian Borntraeger; Anthony Liguori; qemu-devel@nongnu.org qemu-devel Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue On 04.01.2013, at 11:32, Bhushan Bharat-R65777 wrote: -Original Message- Int timer_func(CPUxxState env) { Struct timer_regs; read_regs_type((env, timer_regs,TIMER_REGS); //update env-timer_registers Write_regs_type(env, TIMER_REGS) } This will get one type of register_types and can cause multiple IOCTL per entry to QEMU. This is still too explicit. How about static inline void ppc_set_tsr(CPUState *env, target_ulong val) { env-kvm_sync_extra |= SYNC_TIMER_REGS; cpu_synchronize_registers(env); env-spr[SPR_TSR] = val; You also want env-kvm_sync_dirty also, right? Not quite, since SYNC_TIMER_REGS spans more than only TSR. So we need to make sure we fetch the non-TSR register values before we can declare TIMER_REGS as dirty. Right , I actually want communicate mark dirty the TIMER_REGS only :) Imagine TIMER_REGS includes TSR and TCR. Then ppc_set_tsr(); should still work without a respective ppc_set_tcr() call. So we can't just mark it dirty here. The only way we could optimize this bit would be by either splitting the bitmap per register or by extending the setter functions to the full scope of the sync: Are you talking about GET all register of a type, say TIMER (TCR and TSR) on first call to cpu_synchronization_function but SET will be only for dirty register. Say if TSR is changed then setter will only communicate that TSR is changed? Or ( get multiple regs of a type , say TIMER (TCR and TSR), change one or more and mark dirty if any one changed. Set all registers of a type, TIMER (TCR and TSR) if dirty. In this case the first call by ppc_set_tsr() to cpu_synchronization_function will implicitly get all registers of TIMER (TCR and TSR) - means one or more registers of timer can be updated by QEMU. This function will update only env-TSR. Later if ppc_set_tcr() is called, it will also call cpu_synchronization_function but this will see that TIMER registers are already available to QEMU, so it will not fetch again. This function will update the TCR also. Finally they all, TIMER (TCR and TSR) will be pushed to kernel. ) Thanks -Bharat static inline void ppc_set_timer_regs(CPUState *env, target_ulong tsr, target_ulong tcr) { ... } Alex
Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/3 v2] Reset qemu timers when guest reset
-Original Message- From: Wood Scott-B07421 Sent: Friday, January 04, 2013 1:51 AM To: Bhushan Bharat-R65777 Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de; Bhushan Bharat- R65777 Subject: Re: [Qemu-ppc] [PATCH 2/3 v2] Reset qemu timers when guest reset On 12/27/2012 11:16:51 PM, Bharat Bhushan wrote: This patch install the timer reset handler. This will be called when the guest is reset. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- v2: same as v1 hw/ppc_booke.c | 12 1 files changed, 12 insertions(+), 0 deletions(-) diff --git a/hw/ppc_booke.c b/hw/ppc_booke.c index d51e7fa..837a5b6 100644 --- a/hw/ppc_booke.c +++ b/hw/ppc_booke.c @@ -231,6 +231,16 @@ void store_booke_tcr(CPUPPCState *env, target_ulong val) } +static void ppc_booke_timer_reset_handle(void *opaque) { +CPUPPCState *env = opaque; + +env-spr[SPR_BOOKE_TSR] = 0; +env-spr[SPR_BOOKE_TCR] = 0; + +booke_update_irq(env); +} When does KVM_SET_SREGS get called? This is part of reset processing and is not cpu_synchronize_state() called before all reset handlers are called and after that post_synchronize will do the KVM_SET_SREGS in kvm_put_registers(). Thanks -Bharat
Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue
-Original Message- From: Marcelo Tosatti [mailto:mtosa...@redhat.com] Sent: Friday, January 04, 2013 7:08 AM To: Alexander Graf Cc: jjhe...@linux.vnet.ibm.com; Christian Borntraeger; Anthony Liguori; qemu- de...@nongnu.org qemu-devel; Bhushan Bharat-R65777 Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue On Thu, Jan 03, 2013 at 08:09:22PM +0100, Alexander Graf wrote: On 03.01.2013, at 19:48, Jason J. Herne wrote: On 01/03/2013 08:56 AM, Alexander Graf wrote: static void do_kvm_cpu_synchronize_state(void *_args) { struct kvm_cpu_syncstate_args *args = _args; +CPUArchState *env = args-env; +int register_level = args-register_level; This probably becomes more readable if we explicitly revert back to unsynced state first: /* Write back local modifications at our current level */ if (register_level env-kvm_vcpu_dirty) { kvm_arch_put_registers(...); env-kvm_vcpu_dirty = 0; } and then do the sync we are requested to do: if (!env-kvm_vcpu_dirty) { ... } I agree, but only if we add a second conditional to the if 1st statement as such: if (args-env-kvm_vcpu_dirty register_level env-kvm_vcpu_dirty) This is to cover the case where the caller is asking for register level 1 and we're already dirty at level 2. In this case, nothing should happen and we'll need the args-env-kvm_vcpu_dirty to ensure that is the case. As before, I'd prefer to make this explicit: static void do_kvm_cpu_synchronize_state(void *_args) { struct kvm_cpu_syncstate_args *args = _args; CPUArchState *env = args-env; int register_level = args-register_level; if (register_level env-kvm_vcpu_dirty) { /* We are more dirty than we need to - all is well */ return; } /* Write back local modifications at our current level */ if (args-env-kvm_vcpu_dirty register_level env-kvm_vcpu_dirty) { kvm_arch_put_registers(env, env-kvm_vcpu_dirty); env-kvm_vcpu_dirty = 0; } if (!args-env-kvm_vcpu_dirty) { kvm_arch_get_registers(env, register_level); env-kvm_vcpu_dirty = register_level; } } Do you agree? Thanks for your time. :) Please also check out the discussions I've had with Bharat about his watchdog patches. There we need a mechanism to synchronize registers only when we actually need to, in order to avoid potential race conditions with a kernel timer. That particular case doesn't work well with levels. We can have multiple different potential race producers in the kernel that we need to avoid individually, so we can't always synchronize all of them when only one of them needs to be synchronized. The big question is what we should be doing about this. We basically have 3 options: * implement levels, treat racy registers as manually synchronized, as Bharat's latest patch set does * implement levels, add a bitmap for additional special synchronization bits * replace levels by bitmap I'm quite frankly not sure which one of the 3 would be the best way forward. Alex Implement read/write accessors for individual registers, similarly to how its done in the kernel. See kvm_cache_regs.h. Read/write for individual registers can be heavy for cases where multiple register needed to be updated. Is not it? For cases where multiple register needed to be synchronized then I would like to elaborate the options as: Option 1: int timer_func(CPUArchState env) { cpu_synchronize_state(env); //update env-timer_registers env-updated_regs_type |= TIMER_REGS_TYPE_BIT; } Change the kvm_arch_put_registers() to add followings: int kvm_arch_put_registers(CPUArchState *env, int level) { If (env-updated_regs_type) { struct kvm_sregs sregs; If (env-updated_regs_type TIMER_REGS_TYPE_BIT) { // update sregs-timer_registers; // may be we can have a bitmap to tell kernel for what actually updated } If (env-updated_regs_type XX_CPU_REGS_TYPE) { // update respective registers in sregs } ioctl(env, KVM_GET_SREGS, sregs); } } This mechanism will get all registers state but this is required only once per entry in QEMU. Option 2: Define read_regs_type() and write_regs_type() for cases where it requires multiple registers updates: int read_regs_type((CPUArchState env, void *regs_ptr, int reg_type) { - reg_type are bitmap like: TIMER_REGS, DEBUG_REGS, PERFMON_REGS etc. Switch(reg_type) { Case TIMER_REGS: Struct *timer_regs = regs_ptr; Ioctl(KVM_GET_SREGS, timer_regs, CPU_reg_type); Break; Default
Re: [Qemu-devel] [PATCH 3/3] Enable kvm emulated watchdog
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Monday, December 17, 2012 8:09 PM To: Bhushan Bharat-R65777 Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; Bhushan Bharat-R65777 Subject: Re: [PATCH 3/3] Enable kvm emulated watchdog On 17.12.2012, at 07:08, Bharat Bhushan wrote: Enable the KVM emulated watchdog if KVM supports (use the capability enablement in watchdog handler). Also watchdog exit (KVM_EXIT_WATCHDOG) handling is added. Watchdog state machine is cleared whenever VM state changes to running. This is to handle the cases like return from debug halt etc. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- hw/ppc.h |2 + hw/ppc_booke.c | 71 ++ target-ppc/kvm.c | 13 +- 3 files changed, 85 insertions(+), 1 deletions(-) diff --git a/hw/ppc.h b/hw/ppc.h index 2f3ea27..3672fe8 100644 --- a/hw/ppc.h +++ b/hw/ppc.h @@ -44,6 +44,8 @@ struct ppc_tb_t { uint64_t cpu_ppc_get_tb(ppc_tb_t *tb_env, uint64_t vmclk, int64_t tb_offset); clk_setup_cb cpu_ppc_tb_init (CPUPPCState *env, uint32_t freq); +extern int cap_ppc_watchdog; +extern int cap_booke_sregs; No. Never export cap_ variables. They are kvm internal. /* Embedded PowerPC DCR management */ typedef uint32_t (*dcr_read_cb)(void *opaque, int dcrn); typedef void (*dcr_write_cb)(void *opaque, int dcrn, uint32_t val); diff --git a/hw/ppc_booke.c b/hw/ppc_booke.c index 837a5b6..f18df74 100644 --- a/hw/ppc_booke.c +++ b/hw/ppc_booke.c @@ -21,6 +21,8 @@ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN * THE SOFTWARE. */ +#include sysemu.h +#include kvm.h #include hw.h #include ppc.h #include qemu-timer.h @@ -203,6 +205,11 @@ static void booke_wdt_cb(void *opaque) booke_timer-wdt_timer); } +static void ppc_booke_watchdog_clear_tsr(CPUPPCState *env, +target_ulong tsr) { +env-spr[SPR_BOOKE_TSR] = tsr ~(TSR_ENW | TSR_WIS | +TSR_WRS_MASK); } + void store_booke_tsr(CPUPPCState *env, target_ulong val) { env-spr[SPR_BOOKE_TSR] = ~val; @@ -241,6 +248,64 @@ static void ppc_booke_timer_reset_handle(void *opaque) booke_update_irq(env); } +static void cpu_state_change_handler(void *opaque, int running, +RunState state) { +CPUPPCState *env = opaque; + +struct kvm_sregs sregs; + +if (!running) { +return; +} + +/* + * Clear watchdog interrupt condition by clearing TSR. + * Similar logic needed to be implemented for watchdog + * emulation in qemu. + */ + +if (!kvm_enabled()) { +/* FIXME: add handling for qemu emulated case */ +return; +} + +if (cap_booke_sregs cap_ppc_watchdog) { +kvm_vcpu_ioctl(env, KVM_GET_SREGS, sregs); + +/* Clear TSR.ENW, TSR.WIS and TSR.WRS */ +ppc_booke_watchdog_clear_tsr(env, sregs.u.e.tsr); This should happen outside of the if (kvm_enabled()) block. +sregs.u.e.tsr = env-spr[SPR_BOOKE_TSR]; +sregs.u.e.update_special = KVM_SREGS_E_UPDATE_TSR; + +kvm_vcpu_ioctl(env, KVM_SET_SREGS, sregs); Please create a kvmppc_... wrapper for all this in target-ppc/kvm.c. Or maybe even better yet add a helper variable that tells the kvm register sync function to sync TSR as well and just use the normal cpu_synchronize_state() way of pushing register into the CPU. Not sure what type of helper variable you are talking about. What came in my mine is we define a helper variable as per bitmap of SREGS update feature KVM_SREGS_E_UPDATE_* (update_special) in env. Whenever any code changes the env[spr] it will set the update_special. Env-update_special will be checked in put_registers(). Thanks -Bharat +} +} + +static int kvm_booke_watchdog_enable(CPUPPCState *env) { +int ret; +struct kvm_enable_cap encap = {}; + +if (!kvm_enabled()) { +return 0; Why return 0? +} + +if (!cap_ppc_watchdog) { +printf(warning: KVM does not support watchdog); +return 0; Why return 0? +} + +encap.cap = KVM_CAP_PPC_BOOKE_WATCHDOG; +ret = kvm_vcpu_ioctl(env, KVM_ENABLE_CAP, encap); +if (ret 0) { +fprintf(stderr, %s: couldn't enable KVM_CAP_PPC_BOOKE_WATCHDOG: %s\n, +__func__, strerror(-ret)); +return ret; +} Please wrap this in a kvmppc_... function in kvm.c. + +return ret; +} + void ppc_booke_timers_init(CPUPPCState *env, uint32_t freq, uint32_t flags) { ppc_tb_t *tb_env; @@ -262,5 +327,11 @@ void ppc_booke_timers_init(CPUPPCState *env, uint32_t freq, uint32_t flags) booke_timer-wdt_timer = qemu_new_timer_ns(vm_clock, booke_wdt_cb, env); +if (kvm_enabled()) { Why double-check
Re: [Qemu-devel] [PATCH v2] Added uapi directory into linux-header
+++ b/scripts/update-linux-headers.sh @@ -46,14 +46,26 @@ for arch in $ARCHLIST; do make -C $linux INSTALL_HDR_PATH=$tmpdir SRCARCH=$arch headers_install +if [ -e $linux/arch/$arch/include/uapi ] +! [ -e $output/linux-headers/uapi ] ; then +mkdir $output/linux-headers/uapi mkdir -p But looking through this whole thing, it seems like the root cause is actually different. We don't want any uapi directories exposed to user space. So let's go back a step: Why do we need the uapi include dir? Because some header is using it. linux-headers/asm-powerpc/kvm_para.h: The kvm_para.h (also kvm.h) are now defined in include/uapi/asm/ Is not this the correct thing that any header file in include/uapi/asm/ (in this case kvm_para.h) includes another header file (epapr_hcalls.h) in same directory? Also I think now only the uapi/asm/*.h files should be exposed to userspace (QEMU here). make headers_install should basically remove all the uapi magic and give us normal backwards-compatible asm trees :). I am perfectly fine, How we can do this now :) Well, for starters, do the headers work if you apply the patch I sent in a previous mail plus the epapr_hcall.h copy? If so, then that's the way to go :) Are you really sure that applying a patch and then syncing (or other way round) is the way you want to go ? Yes, because I'm quite confident we're generating broken headers right now. Ok, so every time someone does the sync he/she has to do this? Also do we think that sometime in future this will be taken care by make header_install? Thanks -Bharat Alex To me it does not look good, I think we can go with the script changes to make install_header is updated to do the work. -Bharat Alex
Re: [Qemu-devel] [PATCH] Added uapi directory into linux-header
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Monday, December 17, 2012 7:45 PM To: Bhushan Bharat-R65777 Cc: qemu-devel qemu-devel; qemu-...@nongnu.org List; Bhushan Bharat-R65777; Jan Kiszka; Peter Maydell Subject: Re: [PATCH] Added uapi directory into linux-header On 17.12.2012, at 05:08, Bharat Bhushan wrote: Linux ARCH specific header files are now in uapi/ directory also. These header files (epapr_hcalls.h on powerpc) are needed in qemu in linux-headers/uapi/asm/ directory as these are referenced by other header files in linux-headers/asm/. This patch is about changing the scripts for same. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com Please include people who were participating in the discussion previously in CC when you post a new version. --- configure |1 + scripts/update-linux-headers.sh | 10 ++ 2 files changed, 11 insertions(+), 0 deletions(-) diff --git a/configure b/configure index 38b1cc6..51cce6b 100755 --- a/configure +++ b/configure @@ -3724,6 +3724,7 @@ if test $linux = yes ; then # For non-KVM architectures we will not have asm headers if [ -e $source_path/linux-headers/asm-$linux_arch ]; then symlink $source_path/linux-headers/asm-$linux_arch linux-headers/asm + symlink $source_path/linux-headers/uapi/asm-$linux_arch + linux-headers/uapi/asm This is still wrong. You want this guarded by another if [ -e ]. This is not wrong because fi fi diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh index 4c7b566..d40f9c4 100755 --- a/scripts/update-linux-headers.sh +++ b/scripts/update-linux-headers.sh @@ -46,14 +46,24 @@ for arch in $ARCHLIST; do make -C $linux INSTALL_HDR_PATH=$tmpdir SRCARCH=$arch headers_install +if ! [ -e $output/linux-headers/uapi ] ; then +mkdir $output/linux-headers/uapi +fi ... which would mean you don't need this bit. ... because here we ensure that the directory exists. + rm -rf $output/linux-headers/asm-$arch mkdir -p $output/linux-headers/asm-$arch +rm -rf $output/linux-headers/uapi/asm-$arch +mkdir -p $output/linux-headers/uapi/asm-$arch Only if it exists, no? Above it is ensured that the directory always exists .. + for header in kvm.h kvm_para.h; do cp $tmpdir/include/asm/$header $output/linux-headers/asm-$arch done if [ $arch = x86 ]; then cp $tmpdir/include/asm/hyperv.h $output/linux-headers/asm-x86 fi +if [ $arch = powerpc ]; then +cp $linux/arch/$arch/include/uapi/asm/epapr_hcalls.h $output/linux-headers/uapi/asm-$arch/ +fi Why is this pointing at $linux and not $tmpdir? Because I found that $tmpdir does not have epapr_haclls.h, though I do not know why ? any idea why ? Also, no need for $arch - be explicit and make it powerpc. Ok. -Bharat Alex done rm -rf $output/linux-headers/linux -- 1.7.0.4
Re: [Qemu-devel] [PATCH] Added uapi directory into linux-header
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Monday, December 17, 2012 8:24 PM To: Bhushan Bharat-R65777 Cc: qemu-devel qemu-devel; qemu-...@nongnu.org List; Jan Kiszka; Peter Maydell Subject: Re: [PATCH] Added uapi directory into linux-header On 17.12.2012, at 15:51, Bhushan Bharat-R65777 wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Monday, December 17, 2012 7:45 PM To: Bhushan Bharat-R65777 Cc: qemu-devel qemu-devel; qemu-...@nongnu.org List; Bhushan Bharat-R65777; Jan Kiszka; Peter Maydell Subject: Re: [PATCH] Added uapi directory into linux-header On 17.12.2012, at 05:08, Bharat Bhushan wrote: Linux ARCH specific header files are now in uapi/ directory also. These header files (epapr_hcalls.h on powerpc) are needed in qemu in linux-headers/uapi/asm/ directory as these are referenced by other header files in linux-headers/asm/. This patch is about changing the scripts for same. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com Please include people who were participating in the discussion previously in CC when you post a new version. --- configure |1 + scripts/update-linux-headers.sh | 10 ++ 2 files changed, 11 insertions(+), 0 deletions(-) diff --git a/configure b/configure index 38b1cc6..51cce6b 100755 --- a/configure +++ b/configure @@ -3724,6 +3724,7 @@ if test $linux = yes ; then # For non-KVM architectures we will not have asm headers if [ -e $source_path/linux-headers/asm-$linux_arch ]; then symlink $source_path/linux-headers/asm-$linux_arch linux-headers/asm + symlink $source_path/linux-headers/uapi/asm-$linux_arch + linux-headers/uapi/asm This is still wrong. You want this guarded by another if [ -e ]. This is not wrong because fi fi diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh index 4c7b566..d40f9c4 100755 --- a/scripts/update-linux-headers.sh +++ b/scripts/update-linux-headers.sh @@ -46,14 +46,24 @@ for arch in $ARCHLIST; do make -C $linux INSTALL_HDR_PATH=$tmpdir SRCARCH=$arch headers_install +if ! [ -e $output/linux-headers/uapi ] ; then +mkdir $output/linux-headers/uapi +fi ... which would mean you don't need this bit. ... because here we ensure that the directory exists. But that means we create a uapi directory for archs that don't have one, right? Yes, I think that's the direction going forward (user specific header files in uapi/). + rm -rf $output/linux-headers/asm-$arch mkdir -p $output/linux-headers/asm-$arch +rm -rf $output/linux-headers/uapi/asm-$arch +mkdir -p $output/linux-headers/uapi/asm-$arch Only if it exists, no? Above it is ensured that the directory always exists .. + for header in kvm.h kvm_para.h; do cp $tmpdir/include/asm/$header $output/linux-headers/asm-$arch done if [ $arch = x86 ]; then cp $tmpdir/include/asm/hyperv.h $output/linux-headers/asm-x86 fi +if [ $arch = powerpc ]; then +cp $linux/arch/$arch/include/uapi/asm/epapr_hcalls.h $output/linux-headers/uapi/asm-$arch/ +fi Why is this pointing at $linux and not $tmpdir? Because I found that $tmpdir does not have epapr_haclls.h, though I do not know why ? any idea why ? Smells like a kernel bug :). In fact, I remember that we used to forget uapi/asm/epapr_hcalls.h in the header list a while ago. I am not able to locate which command (set of commands) are installing the header files (all/selectively header files of arch/powerpc/include/asm/ and arch/powerp/include/uapi/asm/) in $tmpdir/ Are you sure you're basing on the latest kvm code? kvm-ppc-next branch of http://github.com/agraf/linux-2.6.git -Bharat Alex Also, no need for $arch - be explicit and make it powerpc. Ok. -Bharat Alex done rm -rf $output/linux-headers/linux -- 1.7.0.4
Re: [Qemu-devel] [PATCH 2/3] Reset qemu timers when guest reset
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Monday, December 17, 2012 7:53 PM To: Bhushan Bharat-R65777 Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; Bhushan Bharat-R65777 Subject: Re: [PATCH 2/3] Reset qemu timers when guest reset On 17.12.2012, at 07:08, Bharat Bhushan wrote: This patch install the timer reset handler. This will be called when the guest is reset. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- hw/ppc_booke.c | 12 1 files changed, 12 insertions(+), 0 deletions(-) diff --git a/hw/ppc_booke.c b/hw/ppc_booke.c index d51e7fa..837a5b6 100644 --- a/hw/ppc_booke.c +++ b/hw/ppc_booke.c @@ -231,6 +231,16 @@ void store_booke_tcr(CPUPPCState *env, target_ulong val) } +static void ppc_booke_timer_reset_handle(void *opaque) { +CPUPPCState *env = opaque; + Doesn't this need a cpu_synchronize_state() call? There are some more registered reset_handler which changes the spr's but does not call synchronize.. But is not the qemu_system_reset() ( which calls registered reset handler) synchronizes the cpu state ? -Bharat Alex +env-spr[SPR_BOOKE_TSR] = 0; +env-spr[SPR_BOOKE_TCR] = 0; + +booke_update_irq(env); +} + void ppc_booke_timers_init(CPUPPCState *env, uint32_t freq, uint32_t flags) { ppc_tb_t *tb_env; @@ -251,4 +261,6 @@ void ppc_booke_timers_init(CPUPPCState *env, uint32_t freq, uint32_t flags) qemu_new_timer_ns(vm_clock, booke_fit_cb, env); booke_timer-wdt_timer = qemu_new_timer_ns(vm_clock, booke_wdt_cb, env); + +qemu_register_reset(ppc_booke_timer_reset_handle, env); } -- 1.7.0.4
Re: [Qemu-devel] [PATCH] Added uapi directory into linux-header
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Monday, December 17, 2012 8:39 PM To: Bhushan Bharat-R65777 Cc: qemu-devel qemu-devel; qemu-...@nongnu.org List; Jan Kiszka; Peter Maydell Subject: Re: [PATCH] Added uapi directory into linux-header On 17.12.2012, at 16:02, Bhushan Bharat-R65777 wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Monday, December 17, 2012 8:24 PM To: Bhushan Bharat-R65777 Cc: qemu-devel qemu-devel; qemu-...@nongnu.org List; Jan Kiszka; Peter Maydell Subject: Re: [PATCH] Added uapi directory into linux-header On 17.12.2012, at 15:51, Bhushan Bharat-R65777 wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Monday, December 17, 2012 7:45 PM To: Bhushan Bharat-R65777 Cc: qemu-devel qemu-devel; qemu-...@nongnu.org List; Bhushan Bharat-R65777; Jan Kiszka; Peter Maydell Subject: Re: [PATCH] Added uapi directory into linux-header On 17.12.2012, at 05:08, Bharat Bhushan wrote: Linux ARCH specific header files are now in uapi/ directory also. These header files (epapr_hcalls.h on powerpc) are needed in qemu in linux-headers/uapi/asm/ directory as these are referenced by other header files in linux-headers/asm/. This patch is about changing the scripts for same. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com Please include people who were participating in the discussion previously in CC when you post a new version. --- configure |1 + scripts/update-linux-headers.sh | 10 ++ 2 files changed, 11 insertions(+), 0 deletions(-) diff --git a/configure b/configure index 38b1cc6..51cce6b 100755 --- a/configure +++ b/configure @@ -3724,6 +3724,7 @@ if test $linux = yes ; then # For non-KVM architectures we will not have asm headers if [ -e $source_path/linux-headers/asm-$linux_arch ]; then symlink $source_path/linux-headers/asm-$linux_arch linux-headers/asm + symlink $source_path/linux-headers/uapi/asm-$linux_arch + linux-headers/uapi/asm This is still wrong. You want this guarded by another if [ -e ]. This is not wrong because fi fi diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh index 4c7b566..d40f9c4 100755 --- a/scripts/update-linux-headers.sh +++ b/scripts/update-linux-headers.sh @@ -46,14 +46,24 @@ for arch in $ARCHLIST; do make -C $linux INSTALL_HDR_PATH=$tmpdir SRCARCH=$arch headers_install +if ! [ -e $output/linux-headers/uapi ] ; then +mkdir $output/linux-headers/uapi +fi ... which would mean you don't need this bit. ... because here we ensure that the directory exists. But that means we create a uapi directory for archs that don't have one, right? Yes, I think that's the direction going forward (user specific header files in uapi/). Sure, but we shouldn't be faster than Linux itself here ;). I am fine with what makes you happier :) + rm -rf $output/linux-headers/asm-$arch mkdir -p $output/linux-headers/asm-$arch +rm -rf $output/linux-headers/uapi/asm-$arch +mkdir -p $output/linux-headers/uapi/asm-$arch Only if it exists, no? Above it is ensured that the directory always exists .. + for header in kvm.h kvm_para.h; do cp $tmpdir/include/asm/$header $output/linux-headers/asm-$arch done if [ $arch = x86 ]; then cp $tmpdir/include/asm/hyperv.h $output/linux-headers/asm-x86 fi +if [ $arch = powerpc ]; then +cp $linux/arch/$arch/include/uapi/asm/epapr_hcalls.h $output/linux-headers/uapi/asm-$arch/ +fi Why is this pointing at $linux and not $tmpdir? Because I found that $tmpdir does not have epapr_haclls.h, though I do not know why ? any idea why ? Smells like a kernel bug :). In fact, I remember that we used to forget uapi/asm/epapr_hcalls.h in the header list a while ago. I am not able to locate which command (set of commands) are installing the header files (all/selectively header files of arch/powerpc/include/asm/ and arch/powerp/include/uapi/asm/) in $tmpdir/ There is a make target headers_install to install user space headers into a directory. To be installed by this, the file needs to be listed in arch/powerpc/include/uapi/asm/Kbuild. Commit c99ec973a63e2249020d6d93a46d7572432da6a2 put it in there, so it should get installed... Ahh, thanks . Actually all the files are getting installed in $tmpdir/include/asm/ while I was looking for the header file in $tmpdir/include/uapi/asm/ -Bharat Are you sure you're basing on the latest kvm code? kvm-ppc-next branch of http://github.com/agraf/linux-2.6.git Yeah, that one is good :) Alex
Re: [Qemu-devel] [PATCH v2] Added uapi directory into linux-header
+++ b/scripts/update-linux-headers.sh @@ -46,14 +46,26 @@ for arch in $ARCHLIST; do make -C $linux INSTALL_HDR_PATH=$tmpdir SRCARCH=$arch headers_install +if [ -e $linux/arch/$arch/include/uapi ] +! [ -e $output/linux-headers/uapi ] ; then +mkdir $output/linux-headers/uapi mkdir -p But looking through this whole thing, it seems like the root cause is actually different. We don't want any uapi directories exposed to user space. So let's go back a step: Why do we need the uapi include dir? Because some header is using it. linux-headers/asm-powerpc/kvm_para.h: The kvm_para.h (also kvm.h) are now defined in include/uapi/asm/ Is not this the correct thing that any header file in include/uapi/asm/ (in this case kvm_para.h) includes another header file (epapr_hcalls.h) in same directory? Also I think now only the uapi/asm/*.h files should be exposed to userspace (QEMU here). Waiting for David's comment for better clarity ... -Bharat #include uapi/asm/epapr_hcalls.h This is the root cause of the problem. We must never manually include any uapi header paths. We only ever include their normal asm-counterparts which then may include uapi (in kernel) or actually are uapi (in user space). David, please correct me if I'm wrong. Could you please try and see if this kernel side patch makes things work for you too? Alex diff --git a/arch/powerpc/include/uapi/asm/kvm_para.h b/arch/powerpc/include/uapi/asm/kvm_para.h index ed0e025..e3af328 100644 --- a/arch/powerpc/include/uapi/asm/kvm_para.h +++ b/arch/powerpc/include/uapi/asm/kvm_para.h @@ -78,7 +78,7 @@ struct kvm_vcpu_arch_shared { #define KVM_HCALL_TOKEN(num) _EV_HCALL_TOKEN(EV_KVM_VENDOR_ID, num) -#include uapi/asm/epapr_hcalls.h +#include asm/epapr_hcalls.h #define KVM_FEATURE_MAGIC_PAGE 1
Re: [Qemu-devel] [PATCH v2] Added uapi directory into linux-header
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Tuesday, December 18, 2012 6:51 AM To: Bhushan Bharat-R65777 Cc: qemu-devel qemu-devel; Peter Maydell; Jan Kiszka; qemu-...@nongnu.org List; Marcelo Tosatti; David Howells Subject: Re: [PATCH v2] Added uapi directory into linux-header On 18.12.2012, at 02:14, Bhushan Bharat-R65777 wrote: +++ b/scripts/update-linux-headers.sh @@ -46,14 +46,26 @@ for arch in $ARCHLIST; do make -C $linux INSTALL_HDR_PATH=$tmpdir SRCARCH=$arch headers_install +if [ -e $linux/arch/$arch/include/uapi ] +! [ -e $output/linux-headers/uapi ] ; then +mkdir $output/linux-headers/uapi mkdir -p But looking through this whole thing, it seems like the root cause is actually different. We don't want any uapi directories exposed to user space. So let's go back a step: Why do we need the uapi include dir? Because some header is using it. linux-headers/asm-powerpc/kvm_para.h: The kvm_para.h (also kvm.h) are now defined in include/uapi/asm/ Is not this the correct thing that any header file in include/uapi/asm/ (in this case kvm_para.h) includes another header file (epapr_hcalls.h) in same directory? Also I think now only the uapi/asm/*.h files should be exposed to userspace (QEMU here). make headers_install should basically remove all the uapi magic and give us normal backwards-compatible asm trees :). I am perfectly fine, How we can do this now :) -Bharat Alex Waiting for David's comment for better clarity ... -Bharat #include uapi/asm/epapr_hcalls.h This is the root cause of the problem. We must never manually include any uapi header paths. We only ever include their normal asm-counterparts which then may include uapi (in kernel) or actually are uapi (in user space). David, please correct me if I'm wrong. Could you please try and see if this kernel side patch makes things work for you too? Alex diff --git a/arch/powerpc/include/uapi/asm/kvm_para.h b/arch/powerpc/include/uapi/asm/kvm_para.h index ed0e025..e3af328 100644 --- a/arch/powerpc/include/uapi/asm/kvm_para.h +++ b/arch/powerpc/include/uapi/asm/kvm_para.h @@ -78,7 +78,7 @@ struct kvm_vcpu_arch_shared { #define KVM_HCALL_TOKEN(num) _EV_HCALL_TOKEN(EV_KVM_VENDOR_ID, num) -#include uapi/asm/epapr_hcalls.h +#include asm/epapr_hcalls.h #define KVM_FEATURE_MAGIC_PAGE 1
Re: [Qemu-devel] [PATCH v2] Added uapi directory into linux-header
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Tuesday, December 18, 2012 7:00 AM To: Bhushan Bharat-R65777 Cc: qemu-devel qemu-devel; Peter Maydell; Jan Kiszka; qemu-...@nongnu.org List; Marcelo Tosatti; David Howells Subject: Re: [PATCH v2] Added uapi directory into linux-header On 18.12.2012, at 02:27, Bhushan Bharat-R65777 wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Tuesday, December 18, 2012 6:51 AM To: Bhushan Bharat-R65777 Cc: qemu-devel qemu-devel; Peter Maydell; Jan Kiszka; qemu-...@nongnu.org List; Marcelo Tosatti; David Howells Subject: Re: [PATCH v2] Added uapi directory into linux-header On 18.12.2012, at 02:14, Bhushan Bharat-R65777 wrote: +++ b/scripts/update-linux-headers.sh @@ -46,14 +46,26 @@ for arch in $ARCHLIST; do make -C $linux INSTALL_HDR_PATH=$tmpdir SRCARCH=$arch headers_install +if [ -e $linux/arch/$arch/include/uapi ] +! [ -e $output/linux-headers/uapi ] ; then +mkdir $output/linux-headers/uapi mkdir -p But looking through this whole thing, it seems like the root cause is actually different. We don't want any uapi directories exposed to user space. So let's go back a step: Why do we need the uapi include dir? Because some header is using it. linux-headers/asm-powerpc/kvm_para.h: The kvm_para.h (also kvm.h) are now defined in include/uapi/asm/ Is not this the correct thing that any header file in include/uapi/asm/ (in this case kvm_para.h) includes another header file (epapr_hcalls.h) in same directory? Also I think now only the uapi/asm/*.h files should be exposed to userspace (QEMU here). make headers_install should basically remove all the uapi magic and give us normal backwards-compatible asm trees :). I am perfectly fine, How we can do this now :) Well, for starters, do the headers work if you apply the patch I sent in a previous mail plus the epapr_hcall.h copy? If so, then that's the way to go :) Are you really sure that applying a patch and then syncing (or other way round) is the way you want to go ? To me it does not look good, I think we can go with the script changes to make install_header is updated to do the work. -Bharat Alex
Re: [Qemu-devel] [PATCH RFC] PowerPC: Added uapi directory into linux-header
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Friday, December 14, 2012 5:06 PM To: Bhushan Bharat-R65777 Cc: qemu-devel qemu-devel; qemu-...@nongnu.org List; Bhushan Bharat-R65777; Jan Kiszka Subject: Re: [PATCH RFC] PowerPC: Added uapi directory into linux-header On 14.12.2012, at 12:04, Bharat Bhushan wrote: This is corrently done for powerpc. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com Jan, could you please check if this is correct? --- configure |1 + scripts/update-linux-headers.sh |5 + 2 files changed, 6 insertions(+), 0 deletions(-) diff --git a/configure b/configure index 780b19a..bdc2d5e 100755 --- a/configure +++ b/configure @@ -3701,6 +3701,7 @@ if test $linux = yes ; then # For non-KVM architectures we will not have asm headers if [ -e $source_path/linux-headers/asm-$linux_arch ]; then symlink $source_path/linux-headers/asm-$linux_arch linux-headers/asm + symlink $source_path/linux-headers/uapi/asm-$linux_arch + linux-headers/uapi/asm fi fi diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh index 4c7b566..9f6bf25 100755 --- a/scripts/update-linux-headers.sh +++ b/scripts/update-linux-headers.sh @@ -48,6 +48,11 @@ for arch in $ARCHLIST; do rm -rf $output/linux-headers/asm-$arch mkdir -p $output/linux-headers/asm-$arch +if [ $arch = powerpc ]; then This looks bogus. There shouldn't be any powerpc specifics anywhere in this file. This file have x86 specific also, why ? -Bharat Alex + rm -rf $output/linux-headers/uapi/asm-$arch/* +cp $linux/arch/$arch/include/uapi/asm/epapr_hcalls.h $output/linux-headers/uapi/asm-$arch/ +fi + for header in kvm.h kvm_para.h; do cp $tmpdir/include/asm/$header $output/linux-headers/asm-$arch done -- 1.7.0.4
Re: [Qemu-devel] [PATCH 2/3] e500: Adding CCSR memory region
-Original Message- From: Avi Kivity [mailto:a...@redhat.com] Sent: Tuesday, October 09, 2012 2:35 PM To: Andreas Färber Cc: Bhushan Bharat-R65777; ag...@suse.de; qemu-...@nongnu.org; qemu- de...@nongnu.org; Bhushan Bharat-R65777 Subject: Re: [PATCH 2/3] e500: Adding CCSR memory region On 10/08/2012 07:21 PM, Andreas Färber wrote: Am 08.10.2012 18:46, schrieb Bharat Bhushan: All devices are also placed under CCSR memory region. The CCSR memory region is exported to pci device. The MSI interrupt generation is the main reason to export the CCSR region to PCI device. This put the requirement to move mpic under CCSR region, but logically all devices should be under CCSR. So this patch places all emulated devices under ccsr region. +sysbus_connect_irq(s, 0, mpic[pci_irq_nrs[0]]); +sysbus_connect_irq(s, 1, mpic[pci_irq_nrs[1]]); +sysbus_connect_irq(s, 2, mpic[pci_irq_nrs[2]]); +sysbus_connect_irq(s, 3, mpic[pci_irq_nrs[3]]); +memory_region_add_subregion(ccsr, MPC8544_PCI_REGS_OFFSET, + s-mmio[0].memory); ... I wonder if fiddling with SysBus MMIO is a good idea. s-mmio[0].addr is not getting assigned this way, which is checked as condition for deleting the subregion. But sysbus_mmio_map() only adds to / deletes from get_system_memory(). The alternative would be using a custom field rather than the SysBus-internal one. Avi/Alex? IMO yes. Or not use sysbus at all. What about adding a API: void sysbus_mmio_map_to_mr(SysBusDevice *dev, int n, target_phys_addr_t addr, MemoryRegion *mr) { assert(n = 0 n dev-num_mmio); if (dev-mmio[n].addr == addr) { /* ??? region already mapped here. */ return; } if (dev-mmio[n].addr != (target_phys_addr_t)-1) { /* Unregister previous mapping. */ memory_region_del_subregion(mr, dev-mmio[n].memory); } dev-mmio[n].addr = addr; memory_region_add_subregion(mr, addr, dev-mmio[n].memory); } Thanks -Bharat -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH 2/3] e500: Adding CCSR memory region
-Original Message- From: Avi Kivity [mailto:a...@redhat.com] Sent: Tuesday, October 09, 2012 10:24 PM To: Bhushan Bharat-R65777 Cc: Andreas Färber; ag...@suse.de; qemu-...@nongnu.org; qemu-devel@nongnu.org Subject: Re: [PATCH 2/3] e500: Adding CCSR memory region On 10/09/2012 06:45 PM, Bhushan Bharat-R65777 wrote: What about adding a API: void sysbus_mmio_map_to_mr(SysBusDevice *dev, int n, target_phys_addr_t addr, MemoryRegion *mr) { assert(n = 0 n dev-num_mmio); if (dev-mmio[n].addr == addr) { /* ??? region already mapped here. */ return; } if (dev-mmio[n].addr != (target_phys_addr_t)-1) { /* Unregister previous mapping. */ memory_region_del_subregion(mr, dev-mmio[n].memory); } dev-mmio[n].addr = addr; memory_region_add_subregion(mr, addr, dev-mmio[n].memory); } I think you can just use sysbus_mmio_get_region(). There are plenty of other users, so there's precedent. You mean something like this : memory_region_add_subregion(mr, addr, sysbus_mmio_get_region(dev, 0); Ok, but this will still not resolve the issue of not setting the dev-mmio[n].addr, no ? Thanks -Bharat -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH 2/3] e500: Adding CCSR memory region
-Original Message- From: Avi Kivity [mailto:a...@redhat.com] Sent: Tuesday, October 09, 2012 10:31 PM To: Bhushan Bharat-R65777 Cc: Andreas Färber; ag...@suse.de; qemu-...@nongnu.org; qemu-devel@nongnu.org Subject: Re: [PATCH 2/3] e500: Adding CCSR memory region On 10/09/2012 06:57 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Avi Kivity [mailto:a...@redhat.com] Sent: Tuesday, October 09, 2012 10:24 PM To: Bhushan Bharat-R65777 Cc: Andreas Färber; ag...@suse.de; qemu-...@nongnu.org; qemu-devel@nongnu.org Subject: Re: [PATCH 2/3] e500: Adding CCSR memory region On 10/09/2012 06:45 PM, Bhushan Bharat-R65777 wrote: What about adding a API: void sysbus_mmio_map_to_mr(SysBusDevice *dev, int n, target_phys_addr_t addr, MemoryRegion *mr) { assert(n = 0 n dev-num_mmio); if (dev-mmio[n].addr == addr) { /* ??? region already mapped here. */ return; } if (dev-mmio[n].addr != (target_phys_addr_t)-1) { /* Unregister previous mapping. */ memory_region_del_subregion(mr, dev-mmio[n].memory); } dev-mmio[n].addr = addr; memory_region_add_subregion(mr, addr, dev-mmio[n].memory); } I think you can just use sysbus_mmio_get_region(). There are plenty of other users, so there's precedent. You mean something like this : memory_region_add_subregion(mr, addr, sysbus_mmio_get_region(dev, 0); Ok, but this will still not resolve the issue of not setting the dev- mmio[n].addr, no ? Correct. But there are 20 uses already, so it can't matter much. If someone wants to fix them, they can write a new API and do a sweep. But really, sysbus just needs to go away. It's pointless to give it more and more features. Ok Thanks -Bharat -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH 2/2 v2] Adding BAR0 for e500 PCI controller
-Original Message- From: Wood Scott-B07421 Sent: Wednesday, October 10, 2012 4:08 AM To: Bhushan Bharat-R65777 Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de; afaer...@suse.de; Bhushan Bharat-R65777 Subject: Re: [Qemu-devel] [PATCH 2/2 v2] Adding BAR0 for e500 PCI controller On 10/09/2012 01:19:10 PM, Bharat Bhushan wrote: +static int e500_ccsr_initfn(SysBusDevice *dev) { +PPCE500CCSRState *pci_ccsr; + +pci_ccsr = CCSR(dev); +memory_region_init(pci_ccsr-ccsr_space, e500-ccsr, + MPC8544_CCSRBAR_SIZE); +return 0; +} Is this object supposed to represent CCSR (which is what the type name seems to imply, along with the existence of a different PPCE500PCIBridgeState) or PCI BAR0 (which is what pci_ccsr seems to imply, along with the fact that it's being added in the PCI patch)? It is ccsr, I will correct this naming. Thanks -Bharat
Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller
Which device's initialization function you are talking about? static const TypeInfo e500_host_bridge_info = { .name = e500-host-bridge, .parent= TYPE_PCI_DEVICE, .instance_size = sizeof(PCIDevice), .class_init= e500_host_bridge_class_init, }; This needs to describe a derived class of PCIDevice, hold the BAR as a data member, and register the BAR in the init function (which doesn't exist yet because you haven't subclassed it). This way the BAR lives in the device which exposes it, like BARs everywhere. Do you mean that we add the MemoryRegion bar0 in PCIDevice struct. Do the same thing that I was doing in e500_pcihost_initfn() in the k-init() (will add this) function of e500-host-bridge No, he means that you create a new struct like this: struct foo { PCIDevice p; MemoryRegion bar0; }; Please check out any other random PCI device in QEMU. Almost all of them do this to store more information than their parent class can hold. Just want to be sure I understood you correctly: Do you mean something like this : ( I know I have to switch to QOM mechanism to share parameters) diff --git a/hw/ppce500_pci.c b/hw/ppce500_pci.c index 92b1dc0..a948bc6 100644 --- a/hw/ppce500_pci.c +++ b/hw/ppce500_pci.c @@ -89,6 +89,12 @@ struct PPCE500PCIState { MemoryRegion iomem; }; +struct BHARAT { +PCIDevice p; +void *bar0; MemoryRegion *bar0 If I uses MemoryRegion *bar0 or MemoryRegion bar0 then how the size and address of bar0 will be populated? As far as I can see, other PCI devices using this way to have extra information but they does not get MemoryRegion information from other object. +}; + +typedef struct BHARAT bharat; typedef struct PPCE500PCIState PPCE500PCIState; static uint64_t pci_reg_read4(void *opaque, target_phys_addr_t addr, @@ -307,6 +313,16 @@ static const VMStateDescription vmstate_ppce500_pci = { #include exec-memory.h +static int e500_pcihost_bridge_initfn(PCIDevice *d) { +bharat *b = DO_UPCAST(bharat, p, d); + +printf(Addr = %llx, size = %llx\n, ((MemoryRegion *)b-bar0)-addr, (unsigned long long)int128_get64(((Me +pci_register_bar(d, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, + (MemoryRegion *)b-bar0); That one still has to call its parent initfn, no? I am really sorry, but I did not get. The object says its parent is TYPE_PCI_DEVICE, so which function are you talking about? +return 0; +} + +MemoryRegion ccsr; static int e500_pcihost_initfn(SysBusDevice *dev) { PCIHostState *h; @@ -315,6 +331,7 @@ static int e500_pcihost_initfn(SysBusDevice *dev) int i; MemoryRegion *address_space_mem = get_system_memory(); MemoryRegion *address_space_io = get_system_io(); +PCIDevice *d; h = PCI_HOST_BRIDGE(dev); s = PPC_E500_PCI_HOST_BRIDGE(dev); @@ -328,7 +345,12 @@ static int e500_pcihost_initfn(SysBusDevice *dev) address_space_io, PCI_DEVFN(0x11, 0), 4); h-bus = b; -pci_create_simple(b, 0, e500-host-bridge); +d = pci_create(b, 0, e500-host-bridge); +/* ccsr- should be passed from hw/ppc/e500.c */ +ccsr.addr = 0xE000; +ccsr.size = int128_make64(0x0010); What is this? I am trying to pass the CCSR information to e500-host-bridge. This is currently hardcoded, but finally this will come from hw/ppc/e500.c. Thanks -Bharat Alex +qdev_prop_set_ptr(d-qdev, bar0_region, ccsr); +qdev_init_nofail(d-qdev); memory_region_init(s-container, pci-container, PCIE500_ALL_SIZE); memory_region_init_io(h-conf_mem, pci_host_conf_be_ops, h, @@ -345,11 +367,18 @@ static int e500_pcihost_initfn(SysBusDevice *dev) return 0; } +static Property pci_host_dev_info[] = { +DEFINE_PROP_PTR(bar0_region, bharat, bar0), +DEFINE_PROP_END_OF_LIST(), +}; + static void e500_host_bridge_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); PCIDeviceClass *k = PCI_DEVICE_CLASS(klass); +k-init = e500_pcihost_bridge_initfn; +dc-props = pci_host_dev_info; k-vendor_id = PCI_VENDOR_ID_FREESCALE; k-device_id = PCI_DEVICE_ID_MPC8533E; k-class_id = PCI_CLASS_PROCESSOR_POWERPC; @@ -359,10 +388,11 @@ static void e500_host_bridge_class_init(ObjectClass *klass, void *data) static const TypeInfo e500_host_bridge_info = { .name = e500-host-bridge, .parent= TYPE_PCI_DEVICE, -.instance_size = sizeof(PCIDevice), +.instance_size = sizeof(bharat), .class_init= e500_host_bridge_class_init, }; static void e500_pcihost_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); Thanks -Bharat Alex This way I should be able to met both of my requirements. Thanks -Bharat -- error compiling
Re: [Qemu-devel] [PATCH 3/3] Adding BAR0 for e500 PCI controller
-Original Message- From: Wood Scott-B07421 Sent: Tuesday, October 09, 2012 1:54 AM To: Alexander Graf Cc: Andreas Färber; Bhushan Bharat-R65777; Avi Kivity; qemu-...@nongnu.org; qemu-devel@nongnu.org; Bhushan Bharat-R65777 Subject: Re: [Qemu-devel] [PATCH 3/3] Adding BAR0 for e500 PCI controller On 10/08/2012 02:04:43 PM, Alexander Graf wrote: On 08.10.2012, at 20:00, Andreas Färber wrote: Am 08.10.2012 18:46, schrieb Bharat Bhushan: #define BINARY_DEVICE_TREE_FILEmpc8544ds.dtb #define UIMAGE_LOAD_BASE 0 -#define DTC_LOAD_PAD 0x180 +#define DTC_LOAD_PAD 0x50 #define DTC_PAD_MASK 0xF #define INITRD_LOAD_PAD0x200 #define INITRD_PAD_MASK0xFF Was this change intentional? I don't see it being used here, and commit message doesn't seem to mention it. I'd assume he tried to work around a bug I fixed in between. But this change definitely is not intentional. It looks like an accidental revert of http://patchwork.ozlabs.org/patch/179475/ Oops, it was not intended to be sent as patch. It was not accidental because without this revert I was not able to boot my guest. Thanks -Bharat
Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Thursday, October 04, 2012 9:37 PM To: Bhushan Bharat-R65777 Cc: Avi Kivity; qemu-devel@nongnu.org; qemu-...@nongnu.org Subject: Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller On 04.10.2012, at 18:03, Bhushan Bharat-R65777 wrote: -Original Message- From: Avi Kivity [mailto:a...@redhat.com] Sent: Thursday, October 04, 2012 8:28 PM To: Bhushan Bharat-R65777 Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de Subject: Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller On 10/04/2012 03:46 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Avi Kivity [mailto:a...@redhat.com] Sent: Thursday, October 04, 2012 6:02 PM To: Bhushan Bharat-R65777 Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de; Bhushan Bharat- R65777 Subject: Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller On 10/03/2012 01:50 PM, Bharat Bhushan wrote: sysbus_connect_irq(s, 0, mpic[pci_irq_nrs[0]]); diff --git a/hw/ppce500_pci.c b/hw/ppce500_pci.c index 92b1dc0..16e4af2 100644 --- a/hw/ppce500_pci.c +++ b/hw/ppce500_pci.c @@ -87,6 +87,7 @@ struct PPCE500PCIState { /* mmio maps */ MemoryRegion container; MemoryRegion iomem; +void *bar0; }; void *? Why? Passing parameter via qdev is either possible by value or by void *. That's why I used void *. Then you shouldn't be using qdev for this. But if you make the changes below, it will likely not be necessary. However this is best done from the pci device's initialization function, not here (the MemoryRegion should be a member of the pci device as well). We want to set BAR0 of PCI controller so we did this here. But yes, we also want that all devices under the PCI controller have this mapping, so when they access the MPIC register to send MSI then they can do that. Please elaborate. One way to describe what you want is to issue an 'info mtree' and show what changes you want. Setup is something like this: |-| | PCI device | | | --- | | | | |---| | | | PCI controller| | | | -- | | | BAR0 in | | | | TYPE1| | | | Config HDR | | | -- | | | --- BAR0: it is an inbound window, and on e500 devices this maps the pci bus address to CCSR address. My requirement are: 1) when guest read pci controller BAR0, it is able to get proper size ( Size of CCSR space) 2) Guest can program this to any address in pci address space. When PCI device access this address range then that address should be mapped to CCSR address space. Though this patch only met the requirement number (1). No, it also meets (2). The PCI address space is identical to the CPU memory space in our mapping right now. So if the guest maps BAR0 somewhere, it automatically maps CCSR into CPU address space, which exposes it to PCI address space. Which device's initialization function you are talking about? static const TypeInfo e500_host_bridge_info = { .name = e500-host-bridge, .parent= TYPE_PCI_DEVICE, .instance_size = sizeof(PCIDevice), .class_init= e500_host_bridge_class_init, }; This needs to describe a derived class of PCIDevice, hold the BAR as a data member, and register the BAR in the init function (which doesn't exist yet because you haven't subclassed it). This way the BAR lives in the device which exposes it, like BARs everywhere. Do you mean that we add the MemoryRegion bar0 in PCIDevice struct. Do the same thing that I was doing in e500_pcihost_initfn() in the k-init() (will add this) function of e500-host-bridge No, he means that you create a new struct like this: struct foo { PCIDevice p; MemoryRegion bar0; }; Please check out any other random PCI device in QEMU. Almost all of them do this to store more information than their parent class can hold. Just want to be sure I understood you correctly: Do you mean something like this : ( I know I have to switch to QOM mechanism to share parameters) diff --git a/hw/ppce500_pci.c b/hw/ppce500_pci.c index 92b1dc0..a948bc6 100644 --- a/hw/ppce500_pci.c +++ b/hw/ppce500_pci.c @@ -89,6 +89,12 @@ struct PPCE500PCIState { MemoryRegion iomem; }; +struct BHARAT { +PCIDevice p; +void *bar0; +}; + +typedef struct BHARAT bharat; typedef struct PPCE500PCIState PPCE500PCIState; static uint64_t pci_reg_read4(void *opaque, target_phys_addr_t addr, @@ -307,6 +313,16 @@ static const VMStateDescription vmstate_ppce500_pci = { #include exec-memory.h +static int e500_pcihost_bridge_initfn
Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller
-Original Message- From: Avi Kivity [mailto:a...@redhat.com] Sent: Thursday, October 04, 2012 6:02 PM To: Bhushan Bharat-R65777 Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de; Bhushan Bharat- R65777 Subject: Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller On 10/03/2012 01:50 PM, Bharat Bhushan wrote: sysbus_connect_irq(s, 0, mpic[pci_irq_nrs[0]]); diff --git a/hw/ppce500_pci.c b/hw/ppce500_pci.c index 92b1dc0..16e4af2 100644 --- a/hw/ppce500_pci.c +++ b/hw/ppce500_pci.c @@ -87,6 +87,7 @@ struct PPCE500PCIState { /* mmio maps */ MemoryRegion container; MemoryRegion iomem; +void *bar0; }; void *? Why? Passing parameter via qdev is either possible by value or by void *. That's why I used void *. typedef struct PPCE500PCIState PPCE500PCIState; @@ -315,6 +316,8 @@ static int e500_pcihost_initfn(SysBusDevice *dev) int i; MemoryRegion *address_space_mem = get_system_memory(); MemoryRegion *address_space_io = get_system_io(); +PCIDevice *pdev; +MemoryRegion bar0; h = PCI_HOST_BRIDGE(dev); s = PPC_E500_PCI_HOST_BRIDGE(dev); @@ -342,6 +345,10 @@ static int e500_pcihost_initfn(SysBusDevice *dev) memory_region_add_subregion(s-container, PCIE500_REG_BASE, s-iomem); sysbus_init_mmio(dev, s-container); +bar0 = *(MemoryRegion *)s-bar0; +pdev = pci_find_device(b, 0, 0); +pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, bar0); + This is broken, you're registering an object on the stack which will be freed as soon as the function returns. Just pass s-bar0 as Alex suggests. Ok. However this is best done from the pci device's initialization function, not here (the MemoryRegion should be a member of the pci device as well). We want to set BAR0 of PCI controller so we did this here. But yes, we also want that all devices under the PCI controller have this mapping, so when they access the MPIC register to send MSI then they can do that. Which device's initialization function you are talking about? Thanks -Bharat return 0; } -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller
-Original Message- From: Avi Kivity [mailto:a...@redhat.com] Sent: Thursday, October 04, 2012 8:28 PM To: Bhushan Bharat-R65777 Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de Subject: Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller On 10/04/2012 03:46 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Avi Kivity [mailto:a...@redhat.com] Sent: Thursday, October 04, 2012 6:02 PM To: Bhushan Bharat-R65777 Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de; Bhushan Bharat- R65777 Subject: Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller On 10/03/2012 01:50 PM, Bharat Bhushan wrote: sysbus_connect_irq(s, 0, mpic[pci_irq_nrs[0]]); diff --git a/hw/ppce500_pci.c b/hw/ppce500_pci.c index 92b1dc0..16e4af2 100644 --- a/hw/ppce500_pci.c +++ b/hw/ppce500_pci.c @@ -87,6 +87,7 @@ struct PPCE500PCIState { /* mmio maps */ MemoryRegion container; MemoryRegion iomem; +void *bar0; }; void *? Why? Passing parameter via qdev is either possible by value or by void *. That's why I used void *. Then you shouldn't be using qdev for this. But if you make the changes below, it will likely not be necessary. However this is best done from the pci device's initialization function, not here (the MemoryRegion should be a member of the pci device as well). We want to set BAR0 of PCI controller so we did this here. But yes, we also want that all devices under the PCI controller have this mapping, so when they access the MPIC register to send MSI then they can do that. Please elaborate. One way to describe what you want is to issue an 'info mtree' and show what changes you want. Setup is something like this: |-| | PCI device | | | --- | | | | |---| | | | PCI controller| | | | -- | | | BAR0 in | | | | TYPE1| | | | Config HDR | | | -- | | | --- BAR0: it is an inbound window, and on e500 devices this maps the pci bus address to CCSR address. My requirement are: 1) when guest read pci controller BAR0, it is able to get proper size ( Size of CCSR space) 2) Guest can program this to any address in pci address space. When PCI device access this address range then that address should be mapped to CCSR address space. Though this patch only met the requirement number (1). Which device's initialization function you are talking about? static const TypeInfo e500_host_bridge_info = { .name = e500-host-bridge, .parent= TYPE_PCI_DEVICE, .instance_size = sizeof(PCIDevice), .class_init= e500_host_bridge_class_init, }; This needs to describe a derived class of PCIDevice, hold the BAR as a data member, and register the BAR in the init function (which doesn't exist yet because you haven't subclassed it). This way the BAR lives in the device which exposes it, like BARs everywhere. Do you mean that we add the MemoryRegion bar0 in PCIDevice struct. Do the same thing that I was doing in e500_pcihost_initfn() in the k-init() (will add this) function of e500-host-bridge This way I should be able to met both of my requirements. Thanks -Bharat -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Thursday, October 04, 2012 9:37 PM To: Bhushan Bharat-R65777 Cc: Avi Kivity; qemu-devel@nongnu.org; qemu-...@nongnu.org Subject: Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller On 04.10.2012, at 18:03, Bhushan Bharat-R65777 wrote: -Original Message- From: Avi Kivity [mailto:a...@redhat.com] Sent: Thursday, October 04, 2012 8:28 PM To: Bhushan Bharat-R65777 Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de Subject: Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller On 10/04/2012 03:46 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Avi Kivity [mailto:a...@redhat.com] Sent: Thursday, October 04, 2012 6:02 PM To: Bhushan Bharat-R65777 Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de; Bhushan Bharat- R65777 Subject: Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller On 10/03/2012 01:50 PM, Bharat Bhushan wrote: sysbus_connect_irq(s, 0, mpic[pci_irq_nrs[0]]); diff --git a/hw/ppce500_pci.c b/hw/ppce500_pci.c index 92b1dc0..16e4af2 100644 --- a/hw/ppce500_pci.c +++ b/hw/ppce500_pci.c @@ -87,6 +87,7 @@ struct PPCE500PCIState { /* mmio maps */ MemoryRegion container; MemoryRegion iomem; +void *bar0; }; void *? Why? Passing parameter via qdev is either possible by value or by void *. That's why I used void *. Then you shouldn't be using qdev for this. But if you make the changes below, it will likely not be necessary. However this is best done from the pci device's initialization function, not here (the MemoryRegion should be a member of the pci device as well). We want to set BAR0 of PCI controller so we did this here. But yes, we also want that all devices under the PCI controller have this mapping, so when they access the MPIC register to send MSI then they can do that. Please elaborate. One way to describe what you want is to issue an 'info mtree' and show what changes you want. Setup is something like this: |-| | PCI device | | | --- | | | | |---| | | | PCI controller| | | | -- | | | BAR0 in | | | | TYPE1| | | | Config HDR | | | -- | | | --- BAR0: it is an inbound window, and on e500 devices this maps the pci bus address to CCSR address. My requirement are: 1) when guest read pci controller BAR0, it is able to get proper size ( Size of CCSR space) 2) Guest can program this to any address in pci address space. When PCI device access this address range then that address should be mapped to CCSR address space. Though this patch only met the requirement number (1). No, it also meets (2). The PCI address space is identical to the CPU memory space in our mapping right now. So if the guest maps BAR0 somewhere, it automatically maps CCSR into CPU address space, which exposes it to PCI address space. Really? I think on powerpc the pci address space is defined as: it maps the outbound window just below 0x1__, then CCSR and then inbound window. So inbound window is 1:1 map if guest physical starts from 0x0. But I do not think CCSR is 1:1 map in pci address space and cpu address space. Which device's initialization function you are talking about? static const TypeInfo e500_host_bridge_info = { .name = e500-host-bridge, .parent= TYPE_PCI_DEVICE, .instance_size = sizeof(PCIDevice), .class_init= e500_host_bridge_class_init, }; This needs to describe a derived class of PCIDevice, hold the BAR as a data member, and register the BAR in the init function (which doesn't exist yet because you haven't subclassed it). This way the BAR lives in the device which exposes it, like BARs everywhere. Do you mean that we add the MemoryRegion bar0 in PCIDevice struct. Do the same thing that I was doing in e500_pcihost_initfn() in the k-init() (will add this) function of e500-host-bridge No, he means that you create a new struct like this: struct foo { PCIDevice p; MemoryRegion bar0; }; Please check out any other random PCI device in QEMU. Almost all of them do this to store more information than their parent class can hold. Ok, Thanks -Bharat Alex This way I should be able to met both of my requirements. Thanks -Bharat -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH 1/2] e500: Adding CCSR memory region
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Wednesday, October 03, 2012 5:39 PM To: Bhushan Bharat-R65777 Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; Bhushan Bharat-R65777 Subject: Re: [PATCH 1/2] e500: Adding CCSR memory region On 03.10.2012, at 13:49, Bharat Bhushan wrote: All devices are also placed under CCSR memory region. The CCSR memory region is exported to pci device. The MSI interrupt generation is the main reason to export the CCSR region to PCI device. This put the requirement to move mpic under CCSR region, but logically all devices should be under CCSR. So this patch places all emulated devices under ccsr region. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- hw/ppc/e500.c | 51 +-- 1 files changed, 37 insertions(+), 14 deletions(-) diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c index 5bab340..197411d 100644 --- a/hw/ppc/e500.c +++ b/hw/ppc/e500.c @@ -46,14 +46,23 @@ /* TODO: parameterize */ #define MPC8544_CCSRBAR_BASE 0xE000ULL #define MPC8544_CCSRBAR_SIZE 0x0010ULL -#define MPC8544_MPIC_REGS_BASE (MPC8544_CCSRBAR_BASE + 0x4ULL) -#define MPC8544_SERIAL0_REGS_BASE (MPC8544_CCSRBAR_BASE + 0x4500ULL) -#define MPC8544_SERIAL1_REGS_BASE (MPC8544_CCSRBAR_BASE + 0x4600ULL) -#define MPC8544_PCI_REGS_BASE (MPC8544_CCSRBAR_BASE + 0x8000ULL) +#define MPC8544_MPIC_REGS_OFFSET 0x4ULL +#define MPC8544_MPIC_REGS_BASE (MPC8544_CCSRBAR_BASE + \ +MPC8544_MPIC_REGS_OFFSET) #define +MPC8544_SERIAL0_REGS_OFFSET 0x4500ULL #define +MPC8544_SERIAL0_REGS_BASE (MPC8544_CCSRBAR_BASE + \ +MPC8544_SERIAL0_REGS_OFFSET) +#define MPC8544_SERIAL1_REGS_OFFSET 0x4600ULL #define +MPC8544_SERIAL1_REGS_BASE (MPC8544_CCSRBAR_BASE + \ +MPC8544_SERIAL1_REGS_OFFSET) +#define MPC8544_PCI_REGS_OFFSET0x8000ULL +#define MPC8544_PCI_REGS_BASE (MPC8544_CCSRBAR_BASE + \ +MPC8544_PCI_REGS_OFFSET) You don't use any of the bases anymore, right? Please remove the respective #define's. Alex, some of these bases are used in device tree creation code. Thanks -Bharat The rest of the patch looks very nice. Alex #define MPC8544_PCI_REGS_SIZE 0x1000ULL #define MPC8544_PCI_IO 0xE100ULL #define MPC8544_PCI_IOLEN 0x1ULL -#define MPC8544_UTIL_BASE (MPC8544_CCSRBAR_BASE + 0xeULL) +#define MPC8544_UTIL_OFFSET0xeULL +#define MPC8544_UTIL_BASE (MPC8544_CCSRBAR_BASE + MPC8544_UTIL_OFFSET) #define MPC8544_SPIN_BASE 0xEF00ULL struct boot_info @@ -419,6 +428,8 @@ void ppce500_init(PPCE500Params *params) qemu_irq **irqs, *mpic; DeviceState *dev; CPUPPCState *firstenv = NULL; +MemoryRegion *ccsr; +SysBusDevice *s; /* Setup CPUs */ if (params-cpu_model == NULL) { @@ -474,8 +485,12 @@ void ppce500_init(PPCE500Params *params) vmstate_register_ram_global(ram); memory_region_add_subregion(address_space_mem, 0, ram); +ccsr = g_malloc0(sizeof(MemoryRegion)); +memory_region_init(ccsr, e500-cssr, MPC8544_CCSRBAR_SIZE); +memory_region_add_subregion(address_space_mem, + MPC8544_CCSRBAR_BASE, ccsr); + /* MPIC */ -mpic = mpic_init(address_space_mem, MPC8544_MPIC_REGS_BASE, +mpic = mpic_init(ccsr, MPC8544_MPIC_REGS_OFFSET, smp_cpus, irqs, NULL); if (!mpic) { @@ -484,25 +499,33 @@ void ppce500_init(PPCE500Params *params) /* Serial */ if (serial_hds[0]) { -serial_mm_init(address_space_mem, MPC8544_SERIAL0_REGS_BASE, +serial_mm_init(ccsr, MPC8544_SERIAL0_REGS_OFFSET, 0, mpic[12+26], 399193, serial_hds[0], DEVICE_BIG_ENDIAN); } if (serial_hds[1]) { -serial_mm_init(address_space_mem, MPC8544_SERIAL1_REGS_BASE, +serial_mm_init(ccsr, MPC8544_SERIAL1_REGS_OFFSET, 0, mpic[12+26], 399193, - serial_hds[0], DEVICE_BIG_ENDIAN); + serial_hds[1], DEVICE_BIG_ENDIAN); } /* General Utility device */ -sysbus_create_simple(mpc8544-guts, MPC8544_UTIL_BASE, NULL); +dev = qdev_create(NULL, mpc8544-guts); +qdev_init_nofail(dev); +s = sysbus_from_qdev(dev); +memory_region_add_subregion(ccsr, MPC8544_UTIL_OFFSET, + s-mmio[0].memory); /* PCI */ -dev = sysbus_create_varargs(e500-pcihost, MPC8544_PCI_REGS_BASE, -mpic[pci_irq_nrs[0]], mpic[pci_irq_nrs[1]], -mpic[pci_irq_nrs[2]], mpic[pci_irq_nrs[3]], -NULL); +dev = qdev_create
Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Wednesday, October 03, 2012 5:41 PM To: Bhushan Bharat-R65777 Cc: qemu-devel qemu-devel; qemu-...@nongnu.org List; Bhushan Bharat-R65777; Avi Kivity Subject: Re: [PATCH 2/2] Adding BAR0 for e500 PCI controller On 03.10.2012, at 13:50, Bharat Bhushan wrote: PCI Root complex have TYPE-1 configuration header while PCI endpoint have type-0 configuration header. The type-1 configuration header have a BAR (BAR0). In Freescale PCI controller BAR0 is used for mapping pci address space to CCSR address space. This can used for 2 purposes: 1) for MSI interrupt generation 2) Allow CCSR registers access when configured as PCI endpoint, which I am not sure is a use case with QEMU-KVM guest. What I observed is that when guest read the size of BAR0 of host controller configuration header (TYPE1 header) then it always reads it as 0. When looking into the QEMU hw/ppce500_pci.c, I do not find the PCI controller device registering BAR0. I do not find any other controller also doing so may they do not use BAR0. There are two issues when BAR0 is not there (which I can think of): 1) There should be BAR0 emulated for PCI Root comaplex (TYPE1 header) and when reading the size of BAR0, it should give size as per real h/w. This patch solves this problem. 2) Do we need this BAR0 inbound address translation? When BAR0 is of non-zero size then it will be configured for PCI address space to local address(CCSR) space translation on inbound access. The primary use case is for MSI interrupt generation. The device is configured with a address offsets in PCI address space, which will be translated to MSI interrupt generation MPIC registers. Currently I do not understand the MSI interrupt generation mechanism in QEMU and also IIRC we do not use QEMU MSI interrupt mechanism on e500 guest machines. But this BAR0 will be used when using MSI on e500. I can see one more issue, There are ATMUs emulated in hw/ppce500_pci.c, but i do not see these being used for address translation. So far that works because pci address space and local address space are 1:1 mapped. BAR0 inbound translation + ATMU translation will complete the address translation of inbound traffic. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- hw/ppc/e500.c|1 + hw/ppce500_pci.c | 13 + 2 files changed, 14 insertions(+), 0 deletions(-) diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c index 197411d..c7ae2b6 100644 --- a/hw/ppc/e500.c +++ b/hw/ppc/e500.c @@ -518,6 +518,7 @@ void ppce500_init(PPCE500Params *params) /* PCI */ dev = qdev_create(NULL, e500-pcihost); +qdev_prop_set_ptr(dev, bar0_region, ccsr); qdev_init_nofail(dev); s = sysbus_from_qdev(dev); sysbus_connect_irq(s, 0, mpic[pci_irq_nrs[0]]); diff --git a/hw/ppce500_pci.c b/hw/ppce500_pci.c index 92b1dc0..16e4af2 100644 --- a/hw/ppce500_pci.c +++ b/hw/ppce500_pci.c @@ -87,6 +87,7 @@ struct PPCE500PCIState { /* mmio maps */ MemoryRegion container; MemoryRegion iomem; +void *bar0; }; typedef struct PPCE500PCIState PPCE500PCIState; @@ -315,6 +316,8 @@ static int e500_pcihost_initfn(SysBusDevice *dev) int i; MemoryRegion *address_space_mem = get_system_memory(); MemoryRegion *address_space_io = get_system_io(); +PCIDevice *pdev; +MemoryRegion bar0; h = PCI_HOST_BRIDGE(dev); s = PPC_E500_PCI_HOST_BRIDGE(dev); @@ -342,6 +345,10 @@ static int e500_pcihost_initfn(SysBusDevice *dev) memory_region_add_subregion(s-container, PCIE500_REG_BASE, s-iomem); sysbus_init_mmio(dev, s-container); +bar0 = *(MemoryRegion *)s-bar0; +pdev = pci_find_device(b, 0, 0); +pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, bar0); Any reason you can't just pass in s-bar0 here? Though I'm not sure whether we have to do something special to get a memory region alias. Yes I think this should work: pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, (MemoryRegion *)s-bar0); Thanks -Bharat Avi, any ideas? We want the same memory region mapped twice. Once at a fixed address, once as BAR0 of the PCI host controller. Alex + return 0; } @@ -363,6 +370,11 @@ static const TypeInfo e500_host_bridge_info = { .class_init= e500_host_bridge_class_init, }; +static Property pci_host_dev_info[] = { +DEFINE_PROP_PTR(bar0_region, PPCE500PCIState, bar0), +DEFINE_PROP_END_OF_LIST(), +}; + static void e500_pcihost_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); @@ -370,6 +382,7 @@ static void e500_pcihost_class_init(ObjectClass *klass, void *data) k-init = e500_pcihost_initfn; dc-vmsd = vmstate_ppce500_pci; +dc-props = pci_host_dev_info; } static
Re: [Qemu-devel] [PATCH] Adding BAR0 for e500 PCI controller
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Wednesday, September 19, 2012 4:33 PM To: Bhushan Bharat-R65777 Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; afaer...@suse.de; Bhushan Bharat-R65777 Subject: Re: [PATCH] Adding BAR0 for e500 PCI controller On 19.09.2012, at 09:41, Bharat Bhushan wrote: PCI Root complex have TYPE-1 configuration header while PCI endpoint have type-0 configuration header. The type-1 configuration header have a BAR (BAR0). In Freescale PCI controller BAR0 is used for mapping pci address space to CCSR address space. This can used for 2 purposes: 1) for MSI interrupt generation 2) Allow CCSR registers access when configured as PCI endpoint, which I am not sure is a use case with QEMU-KVM guest. What I observed is that when guest read the size of BAR0 of host controller configuration header (TYPE1 header) then it always reads it as 0. When looking into the QEMU hw/ppce500_pci.c, I do not find the PCI controller device registering BAR0. I do not find any other controller also doing so may they do not use BAR0. There are two issues when BAR0 is not there (which I can think of): 1) There should be BAR0 emulated for PCI Root comaplex (TYPE1 header) and when reading the size of BAR0, it should give size as per real h/w. This patch solves this problem. 2) Do we need this BAR0 inbound address translation? When BAR0 is of non-zero size then it will be configured for PCI address space to local address(CCSR) space translation on inbound access. The primary use case is for MSI interrupt generation. The device is configured with a address offsets in PCI address space, which will be translated to MSI interrupt generation MPIC registers. Currently I do not understand the MSI interrupt generation mechanism in QEMU and also IIRC we do not use QEMU MSI interrupt mechanism on e500 guest machines. But this BAR0 will be used when using MSI on e500. I can see one more issue, There are ATMUs emulated in hw/ppce500_pci.c, but i do not see these being used for address translation. So far that works because pci address space and local address space are 1:1 mapped. BAR0 inbound translation + ATMU translation will complete the address translation of inbound traffic. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- hw/pci_host.h|9 + hw/ppc/e500.c| 21 + hw/ppce500_pci.c |7 +++ 3 files changed, 37 insertions(+), 0 deletions(-) diff --git a/hw/pci_host.h b/hw/pci_host.h index 4b9c300..c1ec7eb 100644 --- a/hw/pci_host.h +++ b/hw/pci_host.h @@ -41,10 +41,19 @@ struct PCIHostState { MemoryRegion data_mem; MemoryRegion mmcfg; MemoryRegion *address_space; +MemoryRegion bar0; uint32_t config_reg; PCIBus *bus; }; +typedef struct PPCE500CCSRState { +SysBusDevice *parent; +MemoryRegion ccsr_space; +} PPCE500CCSRState; + +#define TYPE_CCSR e500-ccsr +#define CCSR(obj) OBJECT_CHECK(PPCE500CCSRState, (obj), TYPE_CCSR) + All of this is e500 specific, so it should definitely not be in any generic pci host header. Yes, ok. /* common internal helpers for PCI/PCIe hosts, cut off overflows */ void pci_host_config_write_common(PCIDevice *pci_dev, uint32_t addr, uint32_t limit, uint32_t val, uint32_t len); diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c index 6f0de6d..1f5da28 100644 --- a/hw/ppc/e500.c +++ b/hw/ppc/e500.c @@ -31,6 +31,7 @@ #include hw/loader.h #include elf.h #include hw/sysbus.h +#include hw/pci_host.h #include exec-memory.h #include host-utils.h @@ -587,3 +588,23 @@ void ppce500_init(PPCE500Params *params) kvmppc_init(); } } + +static void e500_ccsr_type_init(Object *obj) { +PPCE500CCSRState *ccsr = CCSR(obj); +ccsr-ccsr_space.size = int128_make64(MPC8544_CCSRBAR_SIZE); +} + +static const TypeInfo e500_ccsr_info = { +.name = TYPE_CCSR, +.parent= TYPE_SYS_BUS_DEVICE, +.instance_size = sizeof(PPCE500CCSRState), +.instance_init = e500_ccsr_type_init, }; + +static void e500_ccsr_register_types(void) { +type_register_static(e500_ccsr_info); +} + +type_init(e500_ccsr_register_types) diff --git a/hw/ppce500_pci.c b/hw/ppce500_pci.c index 92b1dc0..135f2a6 100644 --- a/hw/ppce500_pci.c +++ b/hw/ppce500_pci.c @@ -315,6 +315,8 @@ static int e500_pcihost_initfn(SysBusDevice *dev) int i; MemoryRegion *address_space_mem = get_system_memory(); MemoryRegion *address_space_io = get_system_io(); +PPCE500CCSRState *ccsr; +PCIDevice *pdev; h = PCI_HOST_BRIDGE(dev); s = PPC_E500_PCI_HOST_BRIDGE(dev); @@ -342,6 +344,11 @@ static int e500_pcihost_initfn(SysBusDevice *dev) memory_region_add_subregion(s-container, PCIE500_REG_BASE, s
Re: [Qemu-devel] [Qemu-ppc] [PATCH: RFC] Adding BAR0 for e500 PCI controller
-Original Message- From: Wood Scott-B07421 Sent: Wednesday, August 15, 2012 6:59 AM To: Bhushan Bharat-R65777 Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de; Bhushan Bharat- R65777 Subject: Re: [Qemu-ppc] [PATCH: RFC] Adding BAR0 for e500 PCI controller On 08/14/2012 07:50 AM, Bharat Bhushan wrote: PCI Root complex have TYPE-1 configuration header while PCI endpoint have type-0 configuration header. The type-1 configuration header have a BAR (BAR0). In Freescale PCI controller BAR0 is used for mapping pci address space to CCSR address space. This can used for 2 purposes: 1) for MSI interrupt generation 2) Allow CCSR registers access when configured as PCI endpoint, which I am not sure is a use case with QEMU-KVM guest. What I observed is that when guest read the size of BAR0 of host controller configuration header (TYPE1 header) then it always reads it as 0. When looking into the QEMU hw/ppce500_pci.c, I do not find the PCI controller device registering BAR0. I do not find any other controller also doing so may they do not use BAR0. There are two issues when BAR0 is not there (which I can think of): 1) There should be BAR0 emulated for PCI Root comaplex (TYPE1 header) and when reading the size of BAR0, it should give size as per real h/w. 2) Do we need this BAR0 inbound address translation? When BAR0 is of non-zero size then it will be configured for PCI address space to local address(CCSR) space translation on inbound access. The primary use case is for MSI interrupt generation. The device is configured with a address offsets in PCI address space, which will be translated to MSI interrupt generation MPIC registers. Currently I do not understand the MSI interrupt generation mechanism in QEMU and also IIRC we do not use QEMU MSI interrupt mechanism on e500 guest machines. But this BAR0 will be used when using MSI on e500. This patch is only trying to address #1, right? I don't see any connection from this BAR to CCSR. +memory_region_init_io(h-bar0, pci_host_conf_be_ops, h, + PCIHOST-bar0, 0x100); 0x0100 is correct for e500mc-based systems, but it should be 0x0010 for e500v2-based systems. Scott, Currently we have a generic e500 machine which have CCSR size 0x0010 (MPC8544_CCSRBAR_SIZE). We do not have e500mc and e500v2 machines. So should we make this 0x0010 as per generic e500 machine? Can we somehow pass this via qdev/varargs from machine emulation code (hw/ppc/e500.c) ? Thanks -Bharat -Scott
Re: [Qemu-devel] Isuue assiging devices using VFIO on x86
-Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Wednesday, August 29, 2012 12:16 PM To: Bhushan Bharat-R65777 Cc: k...@vger.kernel.org; Avi Kivity; qemu-devel@nongnu.org Subject: Re: Isuue assiging devices using VFIO on x86 On Tue, 2012-08-28 at 17:10 +, Bhushan Bharat-R65777 wrote: -Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Tuesday, August 28, 2012 9:27 PM To: Bhushan Bharat-R65777 Cc: k...@vger.kernel.org; Avi Kivity; qemu-devel@nongnu.org Subject: Re: Isuue assiging devices using VFIO on x86 On Tue, 2012-08-28 at 09:23 +, Bhushan Bharat-R65777 wrote: Hi Alex, In my susyem I have following devices: I tried assigning a following PCI devices: 00:03.0 Communication controller: Intel Corporation 4 Series Chipset HECI Controller (rev 03) 00:03.2 IDE interface: Intel Corporation 4 Series Chipset PT IDER Controller (rev 03) 00:03.3 Serial controller: Intel Corporation 4 Series Chipset Serial KT Controller (rev 03) and getting below error: --- Command: --- qemu-system-x86_64 -enable-kvm -nographic -kernel /boot/vmlinuz-3.6.0-rc2+ - initrd /boot/initramfs-3.6.0-rc2+.img -append root=/dev/sda1 -m 1024 -drive file=/home/kvmdev/debian_squeeze_amd64_standard.qcow2 -device vfio- pci,host=:00:03.0 -device vfio-pci,host=:00:03.2 -device vfio- pci,host=:00:03.3 Error prints: qemu-system-x86_64: -device vfio-pci,host=:00:03.0: Warning, device :00:03.0 does not support reset qemu-system-x86_64: -device vfio-pci,host=:00:03.0: VFIO :00:03.0 BAR 0 is too small to mmap, this may affect performance. qemu-system-x86_64: -device vfio-pci,host=:00:03.2: Warning, device :00:03.2 does not support reset qemu-system-x86_64: -device vfio-pci,host=:00:03.3: Warning, device :00:03.3 does not support reset qemu: hardware error: register_ioport_read: invalid opaque for address 0x3f6 CPU #0: EAX=8003 EBX=3ffe0e80 ECX=8003 EDX=0cfc ESI=2800 EDI=80002804 EBP=6fc0 ESP=6f38 EIP=3ffec005 EFL=0086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0010 00c09300 DPL=0 DS [-WA] CS =0008 00c09b00 DPL=0 CS32 [-RA] SS =0010 00c09300 DPL=0 DS [-WA] DS =0010 00c09300 DPL=0 DS [-WA] FS =0010 00c09300 DPL=0 DS [-WA] GS =0010 00c09300 DPL=0 DS [-WA] LDT= 8200 DPL=0 LDT TR = 8b00 DPL=0 TSS32-busy GDT= 000fcd68 0037 IDT= 000fdb60 CR0=0011 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= FCW=037f FSW= [ST=0] FTW=00 MXCSR=1f80 FPR0= FPR1= FPR2= FPR3= FPR4= FPR5= FPR6= FPR7= XMM00= XMM01= XMM02= XMM03= XMM04= XMM05= XMM06= XMM07= Aborted (core dumped) Linux: http://github.com/awilliam/linux-vfio.git next Qemu: https://github.com/awilliam/qemu-vfio.git iommu-group-vfio Any idea what's is the issue. Please try the vfio-pci-for-qemu-1.2-v3 qemu-vfio.git tag, I'm not even sure what's in that old iommu-group-vfio branch. Thanks, Can you share the command to configure qemu? Nothing special: ./configure I sometimes set --prefix or --target-list, but nothing that changes the binaries. Thanks, Hello Alex, I am using ./configure --target-list=x86_64-softmmu --enable-kvm . QEMU repo is at: commit 3b4672f6614e07f9dc0a85a12a8c89d480a2493c Author: Alex Williamson alex.william...@redhat.com Date: Tue Aug 14 14:07:19 2012 -0600 vfio: Enable vfio-pci and mark supported Signed-off-by: Alex Williamson alex.william...@redhat.com Linux Repo is at : commit d9875690d9b89a866022ff49e3fcea892345ad92 Author: Linus Torvalds torva...@linux-foundation.org Date: Thu Aug 16 14:51:24 2012 -0700 Linux 3.6-rc2 --- Then I bound 3 pci devices with VFIO and run below command to launch VM qemu-system-x86_64 -enable-kvm -nographic -kernel /boot/vmlinuz-3.6.0-rc2+ -initrd /boot/initramfs-3.6.0-rc2+.img
Re: [Qemu-devel] Isuue assiging devices using VFIO on x86
-Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Wednesday, August 29, 2012 8:44 PM To: Bhushan Bharat-R65777 Cc: k...@vger.kernel.org; Avi Kivity; qemu-devel@nongnu.org Subject: Re: Isuue assiging devices using VFIO on x86 On Wed, 2012-08-29 at 07:11 +, Bhushan Bharat-R65777 wrote: -Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Wednesday, August 29, 2012 12:16 PM To: Bhushan Bharat-R65777 Cc: k...@vger.kernel.org; Avi Kivity; qemu-devel@nongnu.org Subject: Re: Isuue assiging devices using VFIO on x86 On Tue, 2012-08-28 at 17:10 +, Bhushan Bharat-R65777 wrote: -Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Tuesday, August 28, 2012 9:27 PM To: Bhushan Bharat-R65777 Cc: k...@vger.kernel.org; Avi Kivity; qemu-devel@nongnu.org Subject: Re: Isuue assiging devices using VFIO on x86 On Tue, 2012-08-28 at 09:23 +, Bhushan Bharat-R65777 wrote: Hi Alex, In my susyem I have following devices: I tried assigning a following PCI devices: 00:03.0 Communication controller: Intel Corporation 4 Series Chipset HECI Controller (rev 03) 00:03.2 IDE interface: Intel Corporation 4 Series Chipset PT IDER Controller (rev 03) 00:03.3 Serial controller: Intel Corporation 4 Series Chipset Serial KT Controller (rev 03) Hello Alex, I am using ./configure --target-list=x86_64-softmmu --enable-kvm . QEMU repo is at: commit 3b4672f6614e07f9dc0a85a12a8c89d480a2493c Author: Alex Williamson alex.william...@redhat.com Date: Tue Aug 14 14:07:19 2012 -0600 vfio: Enable vfio-pci and mark supported Signed-off-by: Alex Williamson alex.william...@redhat.com Linux Repo is at : commit d9875690d9b89a866022ff49e3fcea892345ad92 Author: Linus Torvalds torva...@linux-foundation.org Date: Thu Aug 16 14:51:24 2012 -0700 Linux 3.6-rc2 --- Then I bound 3 pci devices with VFIO and run below command to launch VM qemu-system-x86_64 -enable-kvm -nographic -kernel /boot/vmlinuz-3.6.0-rc2+ -initrd /boot/initramfs-3.6.0-rc2+.img -append root=/dev/sda1 -m 1024 -drive file=/home/kvmdev/debian_squeeze_amd64_standard.qcow2 -device vfio-pci,host=:00:03.0 -device vfio-pci,host=:00:03.2 -device vfio-pci,host=:00:03.3 Then I was getting below errors: qemu-system-x86_64: -device vfio-pci,host=:00:03.0: vfio: failed to set iommu for container: Operation not permitted qemu-system-x86_64: -device vfio-pci,host=:00:03.0: vfio: failed to setup container for group 2 qemu-system-x86_64: -device vfio-pci,host=:00:03.0: vfio: failed to get group 2 qemu-system-x86_64: -device vfio-pci,host=:00:03.0: Device 'vfio-pci' could not be initialized - Then I set /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts = 1 And getting below error: qemu-system-x86_64: -device vfio-pci,host=:00:03.0: Warning, device :00:03.0 does not support reset qemu-system-x86_64: -device vfio-pci,host=:00:03.2: Warning, device :00:03.2 does not support reset qemu-system-x86_64: -device vfio-pci,host=:00:03.3: Warning, device :00:03.3 does not support reset qemu: hardware error: register_ioport_read: invalid opaque for address 0x3f6 CPU #0: EAX=8003 EBX=3ffe0e80 ECX=8003 EDX=0cfc ESI=2800 EDI=80002804 EBP=6fc0 ESP=6f38 EIP=3ffec005 EFL=0086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0010 00c09300 DPL=0 DS [-WA] CS =0008 00c09b00 DPL=0 CS32 [-RA] SS =0010 00c09300 DPL=0 DS [-WA] DS =0010 00c09300 DPL=0 DS [-WA] FS =0010 00c09300 DPL=0 DS [-WA] GS =0010 00c09300 DPL=0 DS [-WA] LDT= 8200 DPL=0 LDT TR = 8b00 DPL=0 TSS32-busy GDT= 000fcd68 0037 IDT= 000fdb60 CR0=0011 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= FCW=037f FSW= [ST=0] FTW=00 MXCSR=1f80 FPR0= FPR1= FPR2= FPR3= FPR4= FPR5= FPR6= FPR7= XMM00= XMM01= XMM02= XMM03= XMM04= XMM05= XMM06= XMM07=
[Qemu-devel] Isuue assiging devices using VFIO on x86
Hi Alex, In my susyem I have following devices: I tried assigning a following PCI devices: 00:03.0 Communication controller: Intel Corporation 4 Series Chipset HECI Controller (rev 03) 00:03.2 IDE interface: Intel Corporation 4 Series Chipset PT IDER Controller (rev 03) 00:03.3 Serial controller: Intel Corporation 4 Series Chipset Serial KT Controller (rev 03) and getting below error: --- Command: --- qemu-system-x86_64 -enable-kvm -nographic -kernel /boot/vmlinuz-3.6.0-rc2+ -initrd /boot/initramfs-3.6.0-rc2+.img -append root=/dev/sda1 -m 1024 -drive file=/home/kvmdev/debian_squeeze_amd64_standard.qcow2 -device vfio-pci,host=:00:03.0 -device vfio-pci,host=:00:03.2 -device vfio-pci,host=:00:03.3 Error prints: qemu-system-x86_64: -device vfio-pci,host=:00:03.0: Warning, device :00:03.0 does not support reset qemu-system-x86_64: -device vfio-pci,host=:00:03.0: VFIO :00:03.0 BAR 0 is too small to mmap, this may affect performance. qemu-system-x86_64: -device vfio-pci,host=:00:03.2: Warning, device :00:03.2 does not support reset qemu-system-x86_64: -device vfio-pci,host=:00:03.3: Warning, device :00:03.3 does not support reset qemu: hardware error: register_ioport_read: invalid opaque for address 0x3f6 CPU #0: EAX=8003 EBX=3ffe0e80 ECX=8003 EDX=0cfc ESI=2800 EDI=80002804 EBP=6fc0 ESP=6f38 EIP=3ffec005 EFL=0086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0010 00c09300 DPL=0 DS [-WA] CS =0008 00c09b00 DPL=0 CS32 [-RA] SS =0010 00c09300 DPL=0 DS [-WA] DS =0010 00c09300 DPL=0 DS [-WA] FS =0010 00c09300 DPL=0 DS [-WA] GS =0010 00c09300 DPL=0 DS [-WA] LDT= 8200 DPL=0 LDT TR = 8b00 DPL=0 TSS32-busy GDT= 000fcd68 0037 IDT= 000fdb60 CR0=0011 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= FCW=037f FSW= [ST=0] FTW=00 MXCSR=1f80 FPR0= FPR1= FPR2= FPR3= FPR4= FPR5= FPR6= FPR7= XMM00= XMM01= XMM02= XMM03= XMM04= XMM05= XMM06= XMM07= Aborted (core dumped) Linux: http://github.com/awilliam/linux-vfio.git next Qemu: https://github.com/awilliam/qemu-vfio.git iommu-group-vfio Any idea what's is the issue. Thanks -Bharat
Re: [Qemu-devel] Isuue assiging devices using VFIO on x86
-Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Tuesday, August 28, 2012 9:27 PM To: Bhushan Bharat-R65777 Cc: k...@vger.kernel.org; Avi Kivity; qemu-devel@nongnu.org Subject: Re: Isuue assiging devices using VFIO on x86 On Tue, 2012-08-28 at 09:23 +, Bhushan Bharat-R65777 wrote: Hi Alex, In my susyem I have following devices: I tried assigning a following PCI devices: 00:03.0 Communication controller: Intel Corporation 4 Series Chipset HECI Controller (rev 03) 00:03.2 IDE interface: Intel Corporation 4 Series Chipset PT IDER Controller (rev 03) 00:03.3 Serial controller: Intel Corporation 4 Series Chipset Serial KT Controller (rev 03) and getting below error: --- Command: --- qemu-system-x86_64 -enable-kvm -nographic -kernel /boot/vmlinuz-3.6.0-rc2+ - initrd /boot/initramfs-3.6.0-rc2+.img -append root=/dev/sda1 -m 1024 -drive file=/home/kvmdev/debian_squeeze_amd64_standard.qcow2 -device vfio- pci,host=:00:03.0 -device vfio-pci,host=:00:03.2 -device vfio- pci,host=:00:03.3 Error prints: qemu-system-x86_64: -device vfio-pci,host=:00:03.0: Warning, device :00:03.0 does not support reset qemu-system-x86_64: -device vfio-pci,host=:00:03.0: VFIO :00:03.0 BAR 0 is too small to mmap, this may affect performance. qemu-system-x86_64: -device vfio-pci,host=:00:03.2: Warning, device :00:03.2 does not support reset qemu-system-x86_64: -device vfio-pci,host=:00:03.3: Warning, device :00:03.3 does not support reset qemu: hardware error: register_ioport_read: invalid opaque for address 0x3f6 CPU #0: EAX=8003 EBX=3ffe0e80 ECX=8003 EDX=0cfc ESI=2800 EDI=80002804 EBP=6fc0 ESP=6f38 EIP=3ffec005 EFL=0086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0010 00c09300 DPL=0 DS [-WA] CS =0008 00c09b00 DPL=0 CS32 [-RA] SS =0010 00c09300 DPL=0 DS [-WA] DS =0010 00c09300 DPL=0 DS [-WA] FS =0010 00c09300 DPL=0 DS [-WA] GS =0010 00c09300 DPL=0 DS [-WA] LDT= 8200 DPL=0 LDT TR = 8b00 DPL=0 TSS32-busy GDT= 000fcd68 0037 IDT= 000fdb60 CR0=0011 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= FCW=037f FSW= [ST=0] FTW=00 MXCSR=1f80 FPR0= FPR1= FPR2= FPR3= FPR4= FPR5= FPR6= FPR7= XMM00= XMM01= XMM02= XMM03= XMM04= XMM05= XMM06= XMM07= Aborted (core dumped) Linux: http://github.com/awilliam/linux-vfio.git next Qemu: https://github.com/awilliam/qemu-vfio.git iommu-group-vfio Any idea what's is the issue. Please try the vfio-pci-for-qemu-1.2-v3 qemu-vfio.git tag, I'm not even sure what's in that old iommu-group-vfio branch. Thanks, Can you share the command to configure qemu? Thanks -Bharat Alex
Re: [Qemu-devel] [Qemu-ppc] [PATCH: RFC] Adding BAR0 for e500 PCI controller
-Original Message- From: Wood Scott-B07421 Sent: Wednesday, August 15, 2012 6:59 AM To: Bhushan Bharat-R65777 Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de; Bhushan Bharat- R65777 Subject: Re: [Qemu-ppc] [PATCH: RFC] Adding BAR0 for e500 PCI controller On 08/14/2012 07:50 AM, Bharat Bhushan wrote: PCI Root complex have TYPE-1 configuration header while PCI endpoint have type-0 configuration header. The type-1 configuration header have a BAR (BAR0). In Freescale PCI controller BAR0 is used for mapping pci address space to CCSR address space. This can used for 2 purposes: 1) for MSI interrupt generation 2) Allow CCSR registers access when configured as PCI endpoint, which I am not sure is a use case with QEMU-KVM guest. What I observed is that when guest read the size of BAR0 of host controller configuration header (TYPE1 header) then it always reads it as 0. When looking into the QEMU hw/ppce500_pci.c, I do not find the PCI controller device registering BAR0. I do not find any other controller also doing so may they do not use BAR0. There are two issues when BAR0 is not there (which I can think of): 1) There should be BAR0 emulated for PCI Root comaplex (TYPE1 header) and when reading the size of BAR0, it should give size as per real h/w. 2) Do we need this BAR0 inbound address translation? When BAR0 is of non-zero size then it will be configured for PCI address space to local address(CCSR) space translation on inbound access. The primary use case is for MSI interrupt generation. The device is configured with a address offsets in PCI address space, which will be translated to MSI interrupt generation MPIC registers. Currently I do not understand the MSI interrupt generation mechanism in QEMU and also IIRC we do not use QEMU MSI interrupt mechanism on e500 guest machines. But this BAR0 will be used when using MSI on e500. This patch is only trying to address #1, right? Yes. I don't see any connection from this BAR to CCSR. Yes, it is not there. That can be the next step. +memory_region_init_io(h-bar0, pci_host_conf_be_ops, h, + PCIHOST-bar0, 0x100); 0x0100 is correct for e500mc-based systems, but it should be 0x0010 for e500v2-based systems. Yes, I will correct this. Thanks -Bharat -Scott
Re: [Qemu-devel] Running KVM guest on X86
On Mon, 2012-08-06 at 15:40 +, Bhushan Bharat-R65777 wrote: Hi Avi/All, I am facing issue to boot KVM guest on x86 (I used to work on PowerPC platform and do not have enough knowledge of x86). I am working on making VFIO working on PowerPC Booke, So I have cloned Alex Williamsons git repository, compiled kernel for x86 on fedora with virtualization configuration (selected all kernel config options for same). Run below command to boot Guest (I have not provided vfio device yet): qemu-system-x86_64 -enable-kvm -m 1024 -nographic -kernel arch/x86_64/boot/bzImage -initrd /boot/initramfs-3.5.0-rc4+.img -serial tcp::,server,telnet After the I can see qemu command line (able to run various commands like info registers etc), while guest does not boot (not even the first print comes). Can anyone help in what I am missing or doing wrong? x86 doesn't use the serial port for console by default, so you're making things quite a bit more difficult that way. Typically you'll want to provide a disk image (the -hda option is the easiest way to do this), a display (-vga std -vnc :0 is again easiest), and probably something to install from (-cdrom image.iso). You can also add a -boot d to get it to choose the cdrom the first time for install. Thanks, Thanks Avi and Alex, I can see the KVM guest boot prints by adding -append console=ttyS0 Note, once you get to user space you will need a getty specified in inittab in order to get a login on your serial port. Something like: T0:23:respawn:/sbin/getty -L ttyS0 1) I tried booting with prebuilt qcow2 then it works for me: qemu-system-x86_64 -enable-kvm -nographic -device sga -m 1024 -hda debian_squeeze_amd64_standard.qcow2 Does anyone help on how I can add my kernel to qcow2? Or create a proper qcow2? 2) Also I tried as mentioned in section 3.9 Direct Linux Boot: http://qemu.weilnetz.de/qemu-doc.html#disk_005fimages : qemu-kvm -enable-kvm -nographic -kernel /boot/vmlinuz-3.5.0+ -hda /boot/initramfs-3.5.0+.img -append console=ttyS0 root=/dev/sda -m 1024 -hda /boot/initramfs-3.5.0+.img is incorrect. Should be -hda debian_squeeze_amd64_standard.qcow2 -initrd /boot/initramfs-3.5.0+.img and root=/dev/sda1 probably. I tried : qemu-system-x86_64 -enable-kvm -nographic -kernel /boot/vmlinuz-3.5.0+ -initrd /boot/initramfs-3.5.0+.img -append root=/dev/sda1 rw console=ttyS0 -m 1024 -hda debian_squeeze_amd64_standard.qcow2 With this I get the login prompt, but it is not taking input character from keyboard properly (not able to give login credentials even). Seeing some weird behavior, like sometimes it treat normal character as like ENTER pressed. Below are some boot prints and it is found that there were some junk characters after Setting console screen modes. -- Setting console screen modes. ]Rcannot (un)set powersave mode [9;30][14;30]Skipping font and keymap setup (handled by console-setup). Setting up console font and keymap...done. [ 11.547904] rc (278) used greatest stack depth: 1760 bytes left Starting portmap daemon...Already running.. Starting NFS common utilities: statd. Starting enhanced syslogd: rsyslogd. Starting deferred execution scheduler: atd. Starting ACPI services Starting periodic command scheduler: cron. Starting MTA: exim4. Debian GNU/Linux 6.0 debian-amd64 ttyS0 debian-amd64 login: Thanks -Bharat
Re: [Qemu-devel] Running KVM guest on X86
-Original Message- From: Stuart Yoder [mailto:b08...@gmail.com] Sent: Thursday, August 09, 2012 8:28 PM To: Bhushan Bharat-R65777 Cc: Alex Williamson; qemu-devel@nongnu.org; Avi Kivity Subject: Re: [Qemu-devel] Running KVM guest on X86 On Tue, Aug 7, 2012 at 1:30 AM, Bhushan Bharat-R65777 r65...@freescale.com wrote: -Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Monday, August 06, 2012 9:27 PM To: Bhushan Bharat-R65777 Cc: qemu-devel@nongnu.org; Avi Kivity Subject: Re: Running KVM guest on X86 On Mon, 2012-08-06 at 15:40 +, Bhushan Bharat-R65777 wrote: Hi Avi/All, I am facing issue to boot KVM guest on x86 (I used to work on PowerPC platform and do not have enough knowledge of x86). I am working on making VFIO working on PowerPC Booke, So I have cloned Alex Williamsons git repository, compiled kernel for x86 on fedora with virtualization configuration (selected all kernel config options for same). Run below command to boot Guest (I have not provided vfio device yet): qemu-system-x86_64 -enable-kvm -m 1024 -nographic -kernel arch/x86_64/boot/bzImage -initrd /boot/initramfs-3.5.0-rc4+.img -serial tcp::,server,telnet After the I can see qemu command line (able to run various commands like info registers etc), while guest does not boot (not even the first print comes). Can anyone help in what I am missing or doing wrong? x86 doesn't use the serial port for console by default, so you're making things quite a bit more difficult that way. Typically you'll want to provide a disk image (the -hda option is the easiest way to do this), a display (-vga std -vnc :0 is again easiest), and probably something to install from (-cdrom image.iso). You can also add a -boot d to get it to choose the cdrom the first time for install. Thanks, Thanks Avi and Alex, I can see the KVM guest boot prints by adding -append console=ttyS0 Note, once you get to user space you will need a getty specified in inittab in order to get a login on your serial port. Something like: T0:23:respawn:/sbin/getty -L ttyS0 1) I tried booting with prebuilt qcow2 then it works for me: qemu-system-x86_64 -enable-kvm -nographic -device sga -m 1024 -hda debian_squeeze_amd64_standard.qcow2 Does anyone help on how I can add my kernel to qcow2? Or create a proper qcow2? 2) Also I tried as mentioned in section 3.9 Direct Linux Boot: http://qemu.weilnetz.de/qemu-doc.html#disk_005fimages : qemu-kvm -enable-kvm -nographic -kernel /boot/vmlinuz-3.5.0+ -hda /boot/initramfs-3.5.0+.img -append console=ttyS0 root=/dev/sda -m 1024 I get below error : [1.299225] No filesystem could mount root, tried: ext3 ext2 ext4 iso9660 [1.303232] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(8,0) [1.307683] Pid: 1, comm: swapper/0 Not tainted 3.3.5-2.fc16.x86_64 #1 [1.311201] Call Trace: [1.312548] [815eac62] panic+0xba/0x1cd [1.315160] [81cf1075] mount_block_root+0x258/0x283 [1.318275] [81cf10f3] mount_root+0x53/0x57 [1.321047] [81cf1234] prepare_namespace+0x13d/0x176 [1.324206] [81cf0d59] kernel_init+0x156/0x15b [1.327114] [81089587] ? schedule_tail+0x27/0xb0 [1.330102] [815fd6a4] kernel_thread_helper+0x4/0x10 [1.333413] [81cf0c03] ? start_kernel+0x3c5/0x3c5 [1.336446] [815fd6a0] ? gs_change+0x13/0x13 Thanks -Bharat
Re: [Qemu-devel] Running KVM guest on X86
-Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Thursday, August 09, 2012 11:25 PM To: Bhushan Bharat-R65777 Cc: Stuart Yoder; qemu-devel@nongnu.org; Avi Kivity Subject: Re: [Qemu-devel] Running KVM guest on X86 On Thu, 2012-08-09 at 17:39 +, Bhushan Bharat-R65777 wrote: -Original Message- From: Stuart Yoder [mailto:b08...@gmail.com] Sent: Thursday, August 09, 2012 8:28 PM To: Bhushan Bharat-R65777 Cc: Alex Williamson; qemu-devel@nongnu.org; Avi Kivity Subject: Re: [Qemu-devel] Running KVM guest on X86 On Tue, Aug 7, 2012 at 1:30 AM, Bhushan Bharat-R65777 r65...@freescale.com wrote: -Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Monday, August 06, 2012 9:27 PM To: Bhushan Bharat-R65777 Cc: qemu-devel@nongnu.org; Avi Kivity Subject: Re: Running KVM guest on X86 On Mon, 2012-08-06 at 15:40 +, Bhushan Bharat-R65777 wrote: Hi Avi/All, I am facing issue to boot KVM guest on x86 (I used to work on PowerPC platform and do not have enough knowledge of x86). I am working on making VFIO working on PowerPC Booke, So I have cloned Alex Williamsons git repository, compiled kernel for x86 on fedora with virtualization configuration (selected all kernel config options for same). Run below command to boot Guest (I have not provided vfio device yet): qemu-system-x86_64 -enable-kvm -m 1024 -nographic -kernel arch/x86_64/boot/bzImage -initrd /boot/initramfs-3.5.0-rc4+.img -serial tcp::,server,telnet After the I can see qemu command line (able to run various commands like info registers etc), while guest does not boot (not even the first print comes). Can anyone help in what I am missing or doing wrong? x86 doesn't use the serial port for console by default, so you're making things quite a bit more difficult that way. Typically you'll want to provide a disk image (the -hda option is the easiest way to do this), a display (-vga std -vnc :0 is again easiest), and probably something to install from (-cdrom image.iso). You can also add a -boot d to get it to choose the cdrom the first time for install. Thanks, Thanks Avi and Alex, I can see the KVM guest boot prints by adding -append console=ttyS0 Note, once you get to user space you will need a getty specified in inittab in order to get a login on your serial port. Something like: T0:23:respawn:/sbin/getty -L ttyS0 1) I tried booting with prebuilt qcow2 then it works for me: qemu-system-x86_64 -enable-kvm -nographic -device sga -m 1024 -hda debian_squeeze_amd64_standard.qcow2 Does anyone help on how I can add my kernel to qcow2? Or create a proper qcow2? 2) Also I tried as mentioned in section 3.9 Direct Linux Boot: http://qemu.weilnetz.de/qemu-doc.html#disk_005fimages : qemu-kvm -enable-kvm -nographic -kernel /boot/vmlinuz-3.5.0+ -hda /boot/initramfs-3.5.0+.img -append console=ttyS0 root=/dev/sda -m 1024 I get below error : [1.299225] No filesystem could mount root, tried: ext3 ext2 ext4 iso9660 [1.303232] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(8,0) Chances are your root is a partition on /dev/sda, not the disk itself (ex. sda1, sda2)... Why exactly are you causing yourself so much pain by doing these direct boot, no VGA options instead of starting with an ISO image, installing it, then setting up a serial console if you want serial access? This may be the norm for powerpc VMs, but it's not how I think most people setup x86 VMs. Thanks, I have a fedora machine to which I do not have direct access (but I can reboot remotely, have a console). So far what I was trying direct booting VM using same initramfs and bzimage as of host, Alex, How I can create a ISO image with my kernel? Where I should place that on host? Thanks -Bharat Alex [1.307683] Pid: 1, comm: swapper/0 Not tainted 3.3.5-2.fc16.x86_64 #1 [1.311201] Call Trace: [1.312548] [815eac62] panic+0xba/0x1cd [1.315160] [81cf1075] mount_block_root+0x258/0x283 [1.318275] [81cf10f3] mount_root+0x53/0x57 [1.321047] [81cf1234] prepare_namespace+0x13d/0x176 [1.324206] [81cf0d59] kernel_init+0x156/0x15b [1.327114] [81089587] ? schedule_tail+0x27/0xb0 [1.330102] [815fd6a4] kernel_thread_helper+0x4/0x10 [1.333413] [81cf0c03] ? start_kernel+0x3c5/0x3c5 [1.336446] [815fd6a0] ? gs_change+0x13/0x13 Thanks -Bharat
Re: [Qemu-devel] Running KVM guest on X86
-Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Monday, August 06, 2012 9:27 PM To: Bhushan Bharat-R65777 Cc: qemu-devel@nongnu.org; Avi Kivity Subject: Re: Running KVM guest on X86 On Mon, 2012-08-06 at 15:40 +, Bhushan Bharat-R65777 wrote: Hi Avi/All, I am facing issue to boot KVM guest on x86 (I used to work on PowerPC platform and do not have enough knowledge of x86). I am working on making VFIO working on PowerPC Booke, So I have cloned Alex Williamsons git repository, compiled kernel for x86 on fedora with virtualization configuration (selected all kernel config options for same). Run below command to boot Guest (I have not provided vfio device yet): qemu-system-x86_64 -enable-kvm -m 1024 -nographic -kernel arch/x86_64/boot/bzImage -initrd /boot/initramfs-3.5.0-rc4+.img -serial tcp::,server,telnet After the I can see qemu command line (able to run various commands like info registers etc), while guest does not boot (not even the first print comes). Can anyone help in what I am missing or doing wrong? x86 doesn't use the serial port for console by default, so you're making things quite a bit more difficult that way. Typically you'll want to provide a disk image (the -hda option is the easiest way to do this), a display (-vga std -vnc :0 is again easiest), and probably something to install from (-cdrom image.iso). You can also add a -boot d to get it to choose the cdrom the first time for install. Thanks, Thanks Avi and Alex, I can see the KVM guest boot prints by adding -append console=ttyS0 Now my exact command is like: qemu-system-x86_64 -enable-kvm -nographic -nodefconfig -kernel /boot/vmlinuz-3.5.0+ -initrd /boot/initramfs-3.5.0+.img -append console=ttyS0 -hda fedora.qcow -m 1024 Where fedora.qcow is created by qemu-img create fedora.qcow 5G With this it is falling to Dracut, below are the error prints: [2.288931] dracut: FATAL: No or empty root= argument [2.291808] dracut: Refusing to continue [2.294721] dracut Warning: Signal caught! dracut Warning: Signal caught! [2.298894] dracut Warning: dracut: FATAL: No or empty root= argument dracut Warning: dracut: FATAL: No or empty root= argument [2.304713] dracut Warning: dracut: Refusing to continue dracut Warning: dracut: Refusing to continue [2.320311] init (1) used greatest stack depth: 2664 bytes left [2.323851] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0100 [2.323851] [2.324551] Pid: 1, comm: init Not tainted 3.5.0+ #7 [2.324551] Call Trace: [2.324551] [8171c34e] panic+0xc6/0x1e1 [2.324551] [81729be0] ? _raw_write_unlock_irq+0x30/0x50 [2.324551] [810c0020] do_exit+0xa20/0xb70 [2.324551] [8172a215] ? retint_swapgs+0x13/0x1b [2.324551] [810c04bf] do_group_exit+0x4f/0xc0 [2.324551] [810c0547] sys_exit_group+0x17/0x20 [2.324551] [81732d69] system_call_fastpath+0x16/0x1b - Should I mount the root-file-system in disk and pass -append root=mount-path. I had no luck when I tried qemu-system-x86_64 -device virtio-scsi-pci,id=scsi -enable-kvm -nographic -nodefconfig -nodefaults -chardev stdio,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -drive file=fedora.qcow,if=none,id=sda -device scsi-disk,bus=scsi.0,drive=sda -device sga -kernel /boot/vmlinuz-3.5.0+ -initrd /boot/initramfs-3.5.0+.img -append console=ttyS0 -m 1024 Thanks in Advance -Bharat
[Qemu-devel] Running KVM guest on X86
Hi Avi/All, I am facing issue to boot KVM guest on x86 (I used to work on PowerPC platform and do not have enough knowledge of x86). I am working on making VFIO working on PowerPC Booke, So I have cloned Alex Williamsons git repository, compiled kernel for x86 on fedora with virtualization configuration (selected all kernel config options for same). Run below command to boot Guest (I have not provided vfio device yet): qemu-system-x86_64 -enable-kvm -m 1024 -nographic -kernel arch/x86_64/boot/bzImage -initrd /boot/initramfs-3.5.0-rc4+.img -serial tcp::,server,telnet After the I can see qemu command line (able to run various commands like info registers etc), while guest does not boot (not even the first print comes). Can anyone help in what I am missing or doing wrong? Thanks -Bharat
Re: [Qemu-devel] [PATCH 4/4] Enable kvm emulated watchdog
-Original Message- From: Wood Scott-B07421 Sent: Wednesday, August 01, 2012 11:31 PM To: Bhushan Bharat-R65777 Cc: Alexander Graf; qemu-...@nongnu.org List; kvm-...@vger.kernel.org; qemu- devel qemu-devel; KVM list Subject: Re: [PATCH 4/4] Enable kvm emulated watchdog On 08/01/2012 12:27 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Wednesday, August 01, 2012 7:57 AM To: Bhushan Bharat-R65777 Cc: qemu-...@nongnu.org List; kvm-...@vger.kernel.org; Bhushan Bharat-R65777; qemu-devel qemu-devel; KVM list Subject: Re: [PATCH 4/4] Enable kvm emulated watchdog On 20.07.2012, at 07:23, Bharat Bhushan wrote: @@ -384,6 +437,15 @@ int kvm_arch_init_vcpu(CPUPPCState *cenv) return ret; } +if (enable_watchdog_support) { +ret = kvm_watchdog_enable(cenv); Do you think this is a good idea? Why would real hardware not implement a watchdog just because the user didn't select an action? If there is no watchdog action then why we want to run watchdog timer? On real hardware, if software sets WRC to a non-zero value, the watchdog action is a system reset. The user doesn't have to do anything special. Scott, maybe I misunderstood you comment, did not you commented to enable kvm watchdog if there is watchdog action provided by use. Thanks -Bharat
Re: [Qemu-devel] [PATCH 4/4] Enable kvm emulated watchdog
-Original Message- From: Wood Scott-B07421 Sent: Friday, August 03, 2012 2:02 AM To: Bhushan Bharat-R65777 Cc: Wood Scott-B07421; Alexander Graf; qemu-...@nongnu.org List; kvm- p...@vger.kernel.org; qemu-devel qemu-devel; KVM list Subject: Re: [PATCH 4/4] Enable kvm emulated watchdog On 08/02/2012 10:56 AM, Bhushan Bharat-R65777 wrote: -Original Message- From: Wood Scott-B07421 Sent: Wednesday, August 01, 2012 11:31 PM To: Bhushan Bharat-R65777 Cc: Alexander Graf; qemu-...@nongnu.org List; kvm-...@vger.kernel.org; qemu- devel qemu-devel; KVM list Subject: Re: [PATCH 4/4] Enable kvm emulated watchdog On 08/01/2012 12:27 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Wednesday, August 01, 2012 7:57 AM To: Bhushan Bharat-R65777 Cc: qemu-...@nongnu.org List; kvm-...@vger.kernel.org; Bhushan Bharat-R65777; qemu-devel qemu-devel; KVM list Subject: Re: [PATCH 4/4] Enable kvm emulated watchdog On 20.07.2012, at 07:23, Bharat Bhushan wrote: @@ -384,6 +437,15 @@ int kvm_arch_init_vcpu(CPUPPCState *cenv) return ret; } +if (enable_watchdog_support) { +ret = kvm_watchdog_enable(cenv); Do you think this is a good idea? Why would real hardware not implement a watchdog just because the user didn't select an action? If there is no watchdog action then why we want to run watchdog timer? On real hardware, if software sets WRC to a non-zero value, the watchdog action is a system reset. The user doesn't have to do anything special. Scott, maybe I misunderstood you comment, did not you commented to enable kvm watchdog if there is watchdog action provided by use. I changed my mind. :-) The main difference between the user specifically asking for an action of system reset, and the user not asking for anything, is that only in the former case should we error out if watchdog functionality isn't available -- but if it's a pain to distinguish, don't error out in either case. :-) .. ok -Bharat -Scott
Re: [Qemu-devel] [PATCH 4/4] Enable kvm emulated watchdog
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Wednesday, August 01, 2012 7:57 AM To: Bhushan Bharat-R65777 Cc: qemu-...@nongnu.org List; kvm-...@vger.kernel.org; Bhushan Bharat-R65777; qemu-devel qemu-devel; KVM list Subject: Re: [PATCH 4/4] Enable kvm emulated watchdog On 20.07.2012, at 07:23, Bharat Bhushan wrote: Enable the KVM emulated watchdog if KVM supports (use the capability enablement in watchdog handler). Also watchdog exit (KVM_EXIT_WATCHDOG) handling is added. Watchdog state machine is cleared whenever VM state changes to running. This is to handle the cases like return from debug halt etc. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- v4: Enbale watchdog support only when user specifies watchdog-action Earlier this was [3/3] of the patch series. Now because i separated linux-header updation in separate patch, so this become [4/4] v3: - TSR clearing is removed in whatchdog exit handling as this is no more needed. v2: - Merged ([PATCH 3/4] Watchdog exit handling support) and ([PATCH 4/4] Enable to use kvm emulated watchdog) - Clear watchdog state machine when VM state changes to running. hw/ppc_booke.c |5 sysemu.h |1 + target-ppc/cpu.h |1 + target-ppc/kvm.c | 69 ++ vl.c |2 + 5 files changed, 78 insertions(+), 0 deletions(-) diff --git a/hw/ppc_booke.c b/hw/ppc_booke.c index 837a5b6..478dbcd 100644 --- a/hw/ppc_booke.c +++ b/hw/ppc_booke.c @@ -203,6 +203,11 @@ static void booke_wdt_cb(void *opaque) booke_timer-wdt_timer); } +void ppc_booke_watchdog_clear_tsr(CPUPPCState *env, target_ulong tsr) +{ +env-spr[SPR_BOOKE_TSR] = tsr ~(TSR_ENW | TSR_WIS | +TSR_WRS_MASK); } + void store_booke_tsr(CPUPPCState *env, target_ulong val) { env-spr[SPR_BOOKE_TSR] = ~val; diff --git a/sysemu.h b/sysemu.h index bc2c788..fc388b7 100644 --- a/sysemu.h +++ b/sysemu.h @@ -131,6 +131,7 @@ extern uint8_t *boot_splash_filedata; extern int boot_splash_filedata_size; extern uint8_t qemu_extra_params_fw[2]; extern QEMUClock *rtc_clock; +extern int enable_watchdog_support; #define MAX_NODES 64 extern int nb_numa_nodes; diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h index ca2fc21..163389a 100644 --- a/target-ppc/cpu.h +++ b/target-ppc/cpu.h @@ -1191,6 +1191,7 @@ void store_40x_dbcr0 (CPUPPCState *env, uint32_t val); void store_40x_sler (CPUPPCState *env, uint32_t val); void store_booke_tcr (CPUPPCState *env, target_ulong val); void store_booke_tsr (CPUPPCState *env, target_ulong val); +void ppc_booke_watchdog_clear_tsr(CPUPPCState *env, target_ulong +tsr); void booke206_flush_tlb(CPUPPCState *env, int flags, const int check_iprot); target_phys_addr_t booke206_tlb_to_page_size(CPUPPCState *env, ppcmas_tlb_t *tlb); int ppcmas_tlb_check(CPUPPCState *env, ppcmas_tlb_t *tlb, diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c index b6ef72d..0226b5e 100644 --- a/target-ppc/kvm.c +++ b/target-ppc/kvm.c @@ -32,6 +32,7 @@ #include device_tree.h #include hw/sysbus.h #include hw/spapr.h +#include hw/watchdog.h #include hw/sysbus.h #include hw/spapr.h @@ -60,6 +61,7 @@ static int cap_booke_sregs; static int cap_ppc_smt; static int cap_ppc_rma; static int cap_spapr_tce; +static int cap_ppc_watchdog; /* XXX We have a race condition where we actually have a level triggered * interrupt, but the infrastructure can't expose that yet, so the guest @@ -86,6 +88,7 @@ int kvm_arch_init(KVMState *s) cap_ppc_smt = kvm_check_extension(s, KVM_CAP_PPC_SMT); cap_ppc_rma = kvm_check_extension(s, KVM_CAP_PPC_RMA); cap_spapr_tce = kvm_check_extension(s, KVM_CAP_SPAPR_TCE); +cap_ppc_watchdog = kvm_check_extension(s, KVM_CAP_PPC_BOOKE_WATCHDOG); if (!cap_interrupt_level) { fprintf(stderr, KVM: Couldn't find level irq capability. Expect the @@ -168,6 +171,30 @@ static int kvm_booke206_tlb_init(CPUPPCState *env) return 0; } +static int kvm_watchdog_enable(CPUPPCState *env) +{ +int ret; +struct kvm_enable_cap encap = {}; + +if (!kvm_enabled()) { +return 0; +} + +if (!cap_ppc_watchdog) { +printf(warning: KVM does not support watchdog); +return 0; +} + +encap.cap = KVM_CAP_PPC_BOOKE_WATCHDOG; +ret = kvm_vcpu_ioctl(env, KVM_ENABLE_CAP, encap); +if (ret 0) { +fprintf(stderr, %s: couldn't enable KVM_CAP_PPC_BOOKE_WATCHDOG: %s\n, +__func__, strerror(-ret)); +return ret; +} + +return ret; +} #if defined(TARGET_PPC64) static void kvm_get_fallback_smmu_info(CPUPPCState *env, @@ -371,6 +398,32 @@ static inline void kvm_fixup_page_sizes(CPUPPCState *env) #endif /* !defined
Re: [Qemu-devel] [PATCH 4/4] Enable kvm emulated watchdog
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Wednesday, August 01, 2012 7:57 AM To: Bhushan Bharat-R65777 Cc: qemu-...@nongnu.org List; kvm-...@vger.kernel.org; Bhushan Bharat-R65777; qemu-devel qemu-devel; KVM list Subject: Re: [PATCH 4/4] Enable kvm emulated watchdog On 20.07.2012, at 07:23, Bharat Bhushan wrote: Enable the KVM emulated watchdog if KVM supports (use the capability enablement in watchdog handler). Also watchdog exit (KVM_EXIT_WATCHDOG) handling is added. Watchdog state machine is cleared whenever VM state changes to running. This is to handle the cases like return from debug halt etc. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- v4: Enbale watchdog support only when user specifies watchdog-action Earlier this was [3/3] of the patch series. Now because i separated linux-header updation in separate patch, so this become [4/4] v3: - TSR clearing is removed in whatchdog exit handling as this is no more needed. v2: - Merged ([PATCH 3/4] Watchdog exit handling support) and ([PATCH 4/4] Enable to use kvm emulated watchdog) - Clear watchdog state machine when VM state changes to running. hw/ppc_booke.c |5 sysemu.h |1 + target-ppc/cpu.h |1 + target-ppc/kvm.c | 69 ++ vl.c |2 + 5 files changed, 78 insertions(+), 0 deletions(-) diff --git a/hw/ppc_booke.c b/hw/ppc_booke.c index 837a5b6..478dbcd 100644 --- a/hw/ppc_booke.c +++ b/hw/ppc_booke.c @@ -203,6 +203,11 @@ static void booke_wdt_cb(void *opaque) booke_timer-wdt_timer); } +void ppc_booke_watchdog_clear_tsr(CPUPPCState *env, target_ulong tsr) +{ +env-spr[SPR_BOOKE_TSR] = tsr ~(TSR_ENW | TSR_WIS | +TSR_WRS_MASK); } + void store_booke_tsr(CPUPPCState *env, target_ulong val) { env-spr[SPR_BOOKE_TSR] = ~val; diff --git a/sysemu.h b/sysemu.h index bc2c788..fc388b7 100644 --- a/sysemu.h +++ b/sysemu.h @@ -131,6 +131,7 @@ extern uint8_t *boot_splash_filedata; extern int boot_splash_filedata_size; extern uint8_t qemu_extra_params_fw[2]; extern QEMUClock *rtc_clock; +extern int enable_watchdog_support; #define MAX_NODES 64 extern int nb_numa_nodes; diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h index ca2fc21..163389a 100644 --- a/target-ppc/cpu.h +++ b/target-ppc/cpu.h @@ -1191,6 +1191,7 @@ void store_40x_dbcr0 (CPUPPCState *env, uint32_t val); void store_40x_sler (CPUPPCState *env, uint32_t val); void store_booke_tcr (CPUPPCState *env, target_ulong val); void store_booke_tsr (CPUPPCState *env, target_ulong val); +void ppc_booke_watchdog_clear_tsr(CPUPPCState *env, target_ulong +tsr); void booke206_flush_tlb(CPUPPCState *env, int flags, const int check_iprot); target_phys_addr_t booke206_tlb_to_page_size(CPUPPCState *env, ppcmas_tlb_t *tlb); int ppcmas_tlb_check(CPUPPCState *env, ppcmas_tlb_t *tlb, diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c index b6ef72d..0226b5e 100644 --- a/target-ppc/kvm.c +++ b/target-ppc/kvm.c @@ -32,6 +32,7 @@ #include device_tree.h #include hw/sysbus.h #include hw/spapr.h +#include hw/watchdog.h #include hw/sysbus.h #include hw/spapr.h @@ -60,6 +61,7 @@ static int cap_booke_sregs; static int cap_ppc_smt; static int cap_ppc_rma; static int cap_spapr_tce; +static int cap_ppc_watchdog; /* XXX We have a race condition where we actually have a level triggered * interrupt, but the infrastructure can't expose that yet, so the guest @@ -86,6 +88,7 @@ int kvm_arch_init(KVMState *s) cap_ppc_smt = kvm_check_extension(s, KVM_CAP_PPC_SMT); cap_ppc_rma = kvm_check_extension(s, KVM_CAP_PPC_RMA); cap_spapr_tce = kvm_check_extension(s, KVM_CAP_SPAPR_TCE); +cap_ppc_watchdog = kvm_check_extension(s, KVM_CAP_PPC_BOOKE_WATCHDOG); if (!cap_interrupt_level) { fprintf(stderr, KVM: Couldn't find level irq capability. Expect the @@ -168,6 +171,30 @@ static int kvm_booke206_tlb_init(CPUPPCState *env) return 0; } +static int kvm_watchdog_enable(CPUPPCState *env) +{ +int ret; +struct kvm_enable_cap encap = {}; + +if (!kvm_enabled()) { +return 0; +} + +if (!cap_ppc_watchdog) { +printf(warning: KVM does not support watchdog); +return 0; +} + +encap.cap = KVM_CAP_PPC_BOOKE_WATCHDOG; +ret = kvm_vcpu_ioctl(env, KVM_ENABLE_CAP, encap); +if (ret 0) { +fprintf(stderr, %s: couldn't enable KVM_CAP_PPC_BOOKE_WATCHDOG: %s\n, +__func__, strerror(-ret)); +return ret; +} + +return ret; +} #if defined(TARGET_PPC64) static void kvm_get_fallback_smmu_info(CPUPPCState *env, @@ -371,6 +398,32 @@ static inline void kvm_fixup_page_sizes(CPUPPCState *env) #endif /* !defined
Re: [Qemu-devel] [RFC] Proposal: PCI/PCIe: inbound BAR0 emulation for PCI controller (Root Complex)
-Original Message- From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org] Sent: Friday, June 08, 2012 4:32 PM To: Bhushan Bharat-R65777 Cc: qemu-devel@nongnu.org; Wood Scott-B07421; Yoder Stuart-B08248 Subject: Re: [RFC] Proposal: PCI/PCIe: inbound BAR0 emulation for PCI controller (Root Complex) On Fri, 2012-06-08 at 09:03 +, Bhushan Bharat-R65777 wrote: Hi All, When Freescale PCI controller configured in Root Complex mode then, its configuration header (type 1) have one inbound BAR (BAR0, called as CCSRBAR). And rest of BARs (inbound and outbound) are supported by ATMU registers, which are outside the Type 1 configuration header. This BAR0 of Type 1 configuration header always translate to CCSR space and is of the size of CCSR. This BAR0 (inbound window) is required for MSI interrupts support. With this window, the pci devices can write to the MPIC MSI registers to generate MSI interrupt. So you should start with giving an overview, nobody here knows what ATMU or CCSR are, those are device specific acronyms ATMU - Address Translation and Mapping Unit :- This facilitate flexibility in defining the address maps for the external interfaces (PCI, Rapidio etc). This basically provides following features: 1. Translating the local address to an external address space 2. Translating external addresses to the local address space CCSR - Configuration, Control, and Status Register (CCSR) Space. This is memory mapped system register space. In PCI/PCIe context this can be visualized as: -|--| Local| |PCI controller (Root Complex)|| Address | | ___| Space|-| |ATMU | | Type 1||--PCI Address Space--- | | |__| | Config Header || | |||| -|--| As a matter of fact it looks like I misunderstood you on IRC :-) IE. I thought your problem was that BAR0 is 'special' as it represents the inbound memory window (as it does on some PPC 4xx for example and a whole other collection of embedded devices). You seem to indicate that it's in fact MMIO: In some SOCs, BAR0 always maps to CCSR (MMIO) while in some new SOCs, the BAR0 default maps to CCSR space while it can be programmed to map system memory in local address space. As far as I know, as of now no emulated PCI controller supports this BAR0 in type 1 configuration header. But probably (I think so) that supporting this is not of big concern, but the point is that this window (BAR0) translate to mmio-regs (CCSR) and not to DDR memory. So the 4xx case or the case where your BAR 0 actually represents system memory are something that is bothering me but not what you are trying to discuss here. IE. Normal BAR operations should work for an MMIO BAR, ie register it normally as a PCIe device and devices accessing those addresses will do the right thing. From there, AFAIK, the MSI code will simply do stl_le_phys, which I -believe- will hit a BAR that does MMIO decoding for those addresses, but I'll let people knowing qemu more in depth reply whether that's true or not. There's very few devices with MSI support in hw/* so it's hard to test with anything other than virtio. Probably testing with virtio should be ok as of now ( to put the framework in place). Thanks -Bharat So I have couple of concerns here: 1. Whenever PCI device does need DMA then these windows (inbound and outbound ATMUs registers) need to used to translate pci address to system physical address (Sometime we also call this as cpu address space). This will probably be done by : [Qemu-devel] [PATCH 00/12] IOMMU Infrastructure : patch-set ( I am trying to understand these patches :-)) Yes, that's basically it. The patches allow you to add a set of routines that will be used for translating DMA accesses to system memory along with map/unmap operations etc... Beware that this only works with drivers that use the proper accessors. The patch series fixes some of them but not everything. I don't know off hand (I don't have the code at hand right now) whether the MSI code goes through that or not, if it does, you may need to be careful to -not- hit the translation layer and pass the accesses on. 2. Hook up this inbound BAR0 in the above patch-set to translate to mmio-regs. As this would be controller specific, a callback will be registered for translation. Now it will be upto the controller specific code on how it translates. As this is needed only for MSI interrupt, maybe, initially we do not do anything initially, till we want MSI emulation in QEMU. Well, virtio can make good use of MSI emulation ... Cheers, Ben. Please provide your comment. Thanks -Bharat
Re: [Qemu-devel] [RFC] Proposal: PCI/PCIe: inbound BAR0 emulation for PCI controller (Root Complex)
-Original Message- From: Wood Scott-B07421 Sent: Saturday, June 09, 2012 12:27 AM To: Bhushan Bharat-R65777 Cc: qemu-devel@nongnu.org; b...@kernel.crashing.org; Wood Scott-B07421; Yoder Stuart-B08248 Subject: Re: [RFC] Proposal: PCI/PCIe: inbound BAR0 emulation for PCI controller (Root Complex) On 06/08/2012 04:03 AM, Bhushan Bharat-R65777 wrote: Hi All, When Freescale PCI controller configured in Root Complex mode then, its configuration header (type 1) have one inbound BAR (BAR0, called as CCSRBAR). It maps to CCSRBAR, but it's called PCICSRBAR/PEXCSRBAR. And rest of BARs (inbound and outbound) are supported by ATMU registers, which are outside the Type 1 configuration header. This BAR0 of Type 1 configuration header always translate to CCSR space and is of the size of CCSR. This BAR0 (inbound window) is required for MSI interrupts support. Some sort of inbound mapping is required for MSIs. It's typically this one or PEXMSIBAR (which is a similar concept but more full-featured, and not in config space), but doesn't have to be (e.g. we use normal inbound windows for MSI on Topaz, because of PAMU limitations that prevent us from using the CCSR address as the guest physical MSI destination). When guest access BAR0 of configuration space or it accesses PEXMSIBAR, a common inbound mapping will translate the address to CCSR (Now the translated address may be further translated to real physical by IOMMU translation layer if configured in QEMU). As far as I know, as of now no emulated PCI controller supports this BAR0 in type 1 configuration header. But probably (I think so) that supporting this is not of big concern, but the point is that this window (BAR0) translate to mmio-regs (CCSR) and not to DDR memory. So I have couple of concerns here: 1. Whenever PCI device does need DMA then these windows (inbound and outbound ATMUs registers) need to used to translate pci address to system physical address (Sometime we also call this as cpu address space). This will probably be done by : [Qemu-devel] [PATCH 00/12] IOMMU Infrastructure : patch-set ( I am trying to understand these patches :-)) 2. Hook up this inbound BAR0 in the above patch-set to translate to mmio-regs. As this would be controller specific, a callback will be registered for translation. Now it will be upto the controller specific code on how it translates. As this is needed only for MSI interrupt, maybe, initially we do not do anything initially, till we want MSI emulation in QEMU. MSIs may be the only thing that we plan to use it for, but if you want to properly emulate the chip you need to fully implement all the translation windows. Do you mean address translation using ATMU inbound/outbound windows? Or complete translation support of CCSR space? The former, I think, not sure, should get supported by IOMMU Infrastructure patch set. The later is not supported. And if I understood correctly then, I think the I agree with you on emulating complete CCSR space. Do you think that we can start with supporting the MSI region mapping (MPIC mmio regs) and design it in way that later if needed it can be extended for rest of CCSR space. If in kernel MPIC is there, then the QEMU will use the in-kernel MPIC to generate MSI interrupt or it will use QEMU emulated MPIC to generate MSI interrupt. We also want the BAR itself to be properly emulated, as otherwise software can get confused when it reads it and sees zero as the location of the PCI DMA memory hole. Probably I did not understood clearly. Can you please elaborate of what type of confusion. Thanks -Bharat
[Qemu-devel] [RFC] Proposal: PCI/PCIe: inbound BAR0 emulation for PCI controller (Root Complex)
Hi All, When Freescale PCI controller configured in Root Complex mode then, its configuration header (type 1) have one inbound BAR (BAR0, called as CCSRBAR). And rest of BARs (inbound and outbound) are supported by ATMU registers, which are outside the Type 1 configuration header. This BAR0 of Type 1 configuration header always translate to CCSR space and is of the size of CCSR. This BAR0 (inbound window) is required for MSI interrupts support. With this window, the pci devices can write to the MPIC MSI registers to generate MSI interrupt. As far as I know, as of now no emulated PCI controller supports this BAR0 in type 1 configuration header. But probably (I think so) that supporting this is not of big concern, but the point is that this window (BAR0) translate to mmio-regs (CCSR) and not to DDR memory. So I have couple of concerns here: 1. Whenever PCI device does need DMA then these windows (inbound and outbound ATMUs registers) need to used to translate pci address to system physical address (Sometime we also call this as cpu address space). This will probably be done by : [Qemu-devel] [PATCH 00/12] IOMMU Infrastructure : patch-set ( I am trying to understand these patches :-)) 2. Hook up this inbound BAR0 in the above patch-set to translate to mmio-regs. As this would be controller specific, a callback will be registered for translation. Now it will be upto the controller specific code on how it translates. As this is needed only for MSI interrupt, maybe, initially we do not do anything initially, till we want MSI emulation in QEMU. Please provide your comment. Thanks -Bharat
Re: [Qemu-devel] [RFC] Proposal: PCI/PCIe: inbound BAR0 emulation for PCI controller (Root Complex)
Just adding Alex Graf and qemu-ppc@. Will add my response in reply. -Original Message- From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org] Sent: Friday, June 08, 2012 4:32 PM To: Bhushan Bharat-R65777 Cc: qemu-devel@nongnu.org; Wood Scott-B07421; Yoder Stuart-B08248 Subject: Re: [RFC] Proposal: PCI/PCIe: inbound BAR0 emulation for PCI controller (Root Complex) On Fri, 2012-06-08 at 09:03 +, Bhushan Bharat-R65777 wrote: Hi All, When Freescale PCI controller configured in Root Complex mode then, its configuration header (type 1) have one inbound BAR (BAR0, called as CCSRBAR). And rest of BARs (inbound and outbound) are supported by ATMU registers, which are outside the Type 1 configuration header. This BAR0 of Type 1 configuration header always translate to CCSR space and is of the size of CCSR. This BAR0 (inbound window) is required for MSI interrupts support. With this window, the pci devices can write to the MPIC MSI registers to generate MSI interrupt. So you should start with giving an overview, nobody here knows what ATMU or CCSR are, those are device specific acronyms As a matter of fact it looks like I misunderstood you on IRC :-) IE. I thought your problem was that BAR0 is 'special' as it represents the inbound memory window (as it does on some PPC 4xx for example and a whole other collection of embedded devices). You seem to indicate that it's in fact MMIO: As far as I know, as of now no emulated PCI controller supports this BAR0 in type 1 configuration header. But probably (I think so) that supporting this is not of big concern, but the point is that this window (BAR0) translate to mmio-regs (CCSR) and not to DDR memory. So the 4xx case or the case where your BAR 0 actually represents system memory are something that is bothering me but not what you are trying to discuss here. IE. Normal BAR operations should work for an MMIO BAR, ie register it normally as a PCIe device and devices accessing those addresses will do the right thing. From there, AFAIK, the MSI code will simply do stl_le_phys, which I -believe- will hit a BAR that does MMIO decoding for those addresses, but I'll let people knowing qemu more in depth reply whether that's true or not. There's very few devices with MSI support in hw/* so it's hard to test with anything other than virtio. So I have couple of concerns here: 1. Whenever PCI device does need DMA then these windows (inbound and outbound ATMUs registers) need to used to translate pci address to system physical address (Sometime we also call this as cpu address space). This will probably be done by : [Qemu-devel] [PATCH 00/12] IOMMU Infrastructure : patch-set ( I am trying to understand these patches :-)) Yes, that's basically it. The patches allow you to add a set of routines that will be used for translating DMA accesses to system memory along with map/unmap operations etc... Beware that this only works with drivers that use the proper accessors. The patch series fixes some of them but not everything. I don't know off hand (I don't have the code at hand right now) whether the MSI code goes through that or not, if it does, you may need to be careful to -not- hit the translation layer and pass the accesses on. 2. Hook up this inbound BAR0 in the above patch-set to translate to mmio-regs. As this would be controller specific, a callback will be registered for translation. Now it will be upto the controller specific code on how it translates. As this is needed only for MSI interrupt, maybe, initially we do not do anything initially, till we want MSI emulation in QEMU. Well, virtio can make good use of MSI emulation ... Cheers, Ben. Please provide your comment. Thanks -Bharat
Re: [Qemu-devel] qemu_power_down_request handling
Also why not use qemu_system_shutdown_request() rather than qemu_system_powerdown_request() for -watchdog-action shutdown ? Thanks -Bharat -Original Message- From: Bhushan Bharat-R65777 Sent: Monday, April 16, 2012 1:37 PM To: 'qemu-devel@nongnu.org' Cc: 'Alexander Graf'; Wood Scott-B07421 Subject: qemu_power_down_request handling Hi All, The -watchdog-action shutdown uses the monitor system_shutdown mechanism but this does not actually powerdown the system. In code, If there is pending powerdown_request (qemu_system_powerdown_request() called) then it try to raise an irq. What is that irq supposed to do. Currently it in NULL pointer so does not do anything. Code snapshot -- vl.c : main_loop_should_exit() if (qemu_powerdown_requested()) { monitor_protocol_event(QEVENT_POWERDOWN, NULL); qemu_irq_raise(qemu_system_powerdown); } . . . . Is the powerdown_request handling not complete? Thanks -Bharat
[Qemu-devel] qemu_power_down_request handling
Hi All, The -watchdog-action shutdown uses the monitor system_shutdown mechanism but this does not actually powerdown the system. In code, If there is pending powerdown_request (qemu_system_powerdown_request() called) then it try to raise an irq. What is that irq supposed to do. Currently it in NULL pointer so does not do anything. Code snapshot -- vl.c : main_loop_should_exit() if (qemu_powerdown_requested()) { monitor_protocol_event(QEVENT_POWERDOWN, NULL); qemu_irq_raise(qemu_system_powerdown); } . . . . Is the powerdown_request handling not complete? Thanks -Bharat
Re: [Qemu-devel] qemu_power_down_request handling
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Monday, April 16, 2012 1:47 PM To: Bhushan Bharat-R65777 Cc: qemu-devel@nongnu.org; Wood Scott-B07421 Subject: Re: qemu_power_down_request handling On 16.04.2012, at 10:07, Bhushan Bharat-R65777 wrote: Hi All, The -watchdog-action shutdown uses the monitor system_shutdown mechanism but this does not actually powerdown the system. In code, If there is pending powerdown_request (qemu_system_powerdown_request() called) then it try to raise an irq. What is that irq supposed to do. Currently it in NULL pointer so does not do anything. Code snapshot -- vl.c : main_loop_should_exit() if (qemu_powerdown_requested()) { monitor_protocol_event(QEVENT_POWERDOWN, NULL); qemu_irq_raise(qemu_system_powerdown); } . . . . Is the powerdown_request handling not complete? The watchdog shutdown event is a soft power off. Imagine a PC where you press the power button, the OS gets notified through ACPI that it's supposed to shut down and shuts down the machine. I don't know of any BookE system that has a comparable mechanism. We could implement it through qemu-ga and fall back to hard power off maybe? Till someone develop this interface, does it make sense to use qemu_system_shutdown_request() rather than qemu_system_powerdown_request() for -watchdog-action shutdown ? Thanks -Bharat
Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework
-Original Message- From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Stuart Yoder Sent: Friday, December 02, 2011 8:11 PM To: Alex Williamson Cc: Alexey Kardashevskiy; aafab...@cisco.com; k...@vger.kernel.org; p...@au1.ibm.com; qemu-devel@nongnu.org; joerg.roe...@amd.com; konrad.w...@oracle.com; ag...@suse.de; d...@au1.ibm.com; chrisw@sous- sol.org; Yoder Stuart-B08248; io...@lists.linux-foundation.org; a...@redhat.com; linux-...@vger.kernel.org; Wood Scott-B07421; be...@cisco.com Subject: Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework On Thu, Dec 1, 2011 at 3:25 PM, Alex Williamson alex.william...@redhat.com wrote: On Thu, 2011-12-01 at 14:58 -0600, Stuart Yoder wrote: One other mechanism we need as well is the ability to enable/disable a domain. For example-- suppose a device is assigned to a VM, the device is in use when the VM is abruptly terminated. The VM terminate would shut off DMA at the IOMMU, but now the device is in an indeterminate state. Some devices have no simple reset bit and getting the device back into a sane state could be complicated-- something the hypervisor doesn't want to do. So now KVM restarts the VM, vfio init happens for the device and the IOMMU for that device is re-configured, etc, but we really can't re-enable DMA until the guest OS tells us (via an hcall) that it is ready. The guest needs to get the assigned device in a sane state before DMA is enabled. Giant red flag. We need to paravirtualize the guest? Not on x86. It's the reality we have to deal with, but doing this would obviously only apply to platforms that need it. Some devices are better for assignment than others. PCI devices are moving towards supporting standard reset mechanisms. Does this warrant a new domain enable/disable API, or should we make this part of the setup API we are discussing here? What's wrong with simply not adding any DMA mapping entries until you think your guest is ready? Isn't that effectively the same thing? Unmap ~= disable. If the IOMMU API had a mechanism to toggle the iommu domain on and off, I wouldn't be opposed to adding an ioctl to do it, but it really seems like just a shortcut vs map/unmap. Thanks, Yes, we could do something like that I guess. How do we determine whether guest is ready or not? There can be multiple device get ready at different time. Further if guest have given the device to it user level process or its guest. Should not there be some mechanism for a guest to enable/disable on per device or group? Thanks -Bharat
Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework
-Original Message- From: Wood Scott-B07421 Sent: Friday, December 02, 2011 11:57 PM To: Bhushan Bharat-R65777 Cc: Stuart Yoder; Alex Williamson; Alexey Kardashevskiy; aafab...@cisco.com; k...@vger.kernel.org; p...@au1.ibm.com; qemu- de...@nongnu.org; joerg.roe...@amd.com; konrad.w...@oracle.com; ag...@suse.de; d...@au1.ibm.com; chr...@sous-sol.org; Yoder Stuart-B08248; io...@lists.linux-foundation.org; a...@redhat.com; linux- p...@vger.kernel.org; Wood Scott-B07421; be...@cisco.com Subject: Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework On 12/02/2011 12:11 PM, Bhushan Bharat-R65777 wrote: How do we determine whether guest is ready or not? There can be multiple device get ready at different time. The guest makes a hypercall with a device handle -- at least that's how we do it in Topaz. Further if guest have given the device to it user level process or its guest. Should not there be some mechanism for a guest to enable/disable on per device or group? Yes, the same mechanism can be used for that -- though in that case we'll also want the ability for the guest to be able to control another layer of mapping (which will get quite tricky with PAMU's limitations). This isn't really VFIO's concern, though (unless you're talking about the VFIO implementation in the guest). Scott, I am not sure if there is any real use case where device needed to assigned beyond 2 level (host + immediate guest) in nested virtualization. But if there any exists then will not it be better to virtualizes the iommu (PAMU for Freescale)? Thanks -Bharat
Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework
-Original Message- From: Wood Scott-B07421 Sent: Friday, December 02, 2011 11:57 PM To: Bhushan Bharat-R65777 Cc: Stuart Yoder; Alex Williamson; Alexey Kardashevskiy; aafab...@cisco.com; k...@vger.kernel.org; p...@au1.ibm.com; qemu- de...@nongnu.org; joerg.roe...@amd.com; konrad.w...@oracle.com; ag...@suse.de; d...@au1.ibm.com; chr...@sous-sol.org; Yoder Stuart-B08248; io...@lists.linux-foundation.org; a...@redhat.com; linux- p...@vger.kernel.org; Wood Scott-B07421; be...@cisco.com Subject: Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework On 12/02/2011 12:11 PM, Bhushan Bharat-R65777 wrote: How do we determine whether guest is ready or not? There can be multiple device get ready at different time. The guest makes a hypercall with a device handle -- at least that's how we do it in Topaz. Yes, it is ok from hcall with device handle perspective. But I could not understood how cleanly this can be handled with the idea of enabling iommu when guest is ready. Thanks -Bharat Further if guest have given the device to it user level process or its guest. Should not there be some mechanism for a guest to enable/disable on per device or group? Yes, the same mechanism can be used for that -- though in that case we'll also want the ability for the guest to be able to control another layer of mapping (which will get quite tricky with PAMU's limitations). This isn't really VFIO's concern, though (unless you're talking about the VFIO implementation in the guest). May be thinking too ahead, but will not something like this will be needed for nested virtualization? Thanks -Bharat
[Qemu-devel] Query: gdbstub software breakpoint removal
Hi All, I can see that the software breakpoint queue is cleared in kvm_remove_breakpoint() in kvm-all.c. I.e QTAILQ_REMOVE(current_env-kvm_state-kvm_sw_breakpoints, bp, entry); is called when the breakpoint is cleared. While the queue is not cleared on kvm_remove_all_breakpoints(); Is there any specific reason for that? Thanks -Bharat
[Qemu-devel] RE: Query: gdbstub software breakpoint removal
Hi Jan, I will send a patch. Thanks -Bharat -Original Message- From: Jan Kiszka [mailto:jan.kis...@siemens.com] Sent: Monday, March 21, 2011 5:04 PM To: Bhushan Bharat-R65777 Cc: qemu-devel@nongnu.org Subject: Re: Query: gdbstub software breakpoint removal On 2011-03-21 12:01, Bhushan Bharat-R65777 wrote: Hi All, I can see that the software breakpoint queue is cleared in kvm_remove_breakpoint() in kvm-all.c. I.e QTAILQ_REMOVE(current_env- kvm_state-kvm_sw_breakpoints, bp, entry); is called when the breakpoint is cleared. While the queue is not cleared on kvm_remove_all_breakpoints(); Is there any specific reason for that? Hmm, no particular reason that I recall. This rather looks like a very old leak. Patch welcome. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux